Self Ensemble
class dalib.adaptation.self_ensemble.ConsistencyLoss(distance_measure, reduction='mean')[source]

Consistency loss between the student model output and the teacher model output. Given a distance measure \(d\), student model output \(y\), teacher model output \(y_{teacher}\), and binary mask \(mask\), the consistency loss is

\[d(y, y_{teacher}) * mask\]

- Parameters
distance_measure (callable) – Distance measure function.
reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': the sum of the output will be divided by the number of elements in the output; 'sum': the output will be summed. Default: 'mean'
- Inputs:
y: predictions from student model
y_teacher: predictions from teacher model
mask: binary mask
- Shape:
y, y_teacher: \((N, C)\) where C means the number of classes.
mask: \((N, )\) where N means mini-batch size.
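To make the masking and reduction behavior concrete, here is a minimal functional sketch of such a consistency loss, assuming the distance measure returns one value per sample (shape \((N,)\)); the function name `consistency_loss` is hypothetical, not the library API:

```python
import torch


def consistency_loss(y, y_teacher, mask, distance_measure, reduction='mean'):
    # Hypothetical sketch: compute per-sample distances, gate each sample
    # by the binary mask, then apply the requested reduction.
    d = distance_measure(y, y_teacher)  # shape (N,)
    masked = d * mask                   # masked-out samples contribute 0
    if reduction == 'mean':
        return masked.mean()
    if reduction == 'sum':
        return masked.sum()
    return masked                       # 'none'


# Example: per-sample squared L2 distance as the distance measure.
y = torch.randn(4, 3)
y_teacher = torch.randn(4, 3)
mask = torch.tensor([1., 1., 0., 1.])
loss = consistency_loss(y, y_teacher, mask,
                        lambda a, b: ((a - b) ** 2).sum(dim=1))
```

Note that with `reduction='mean'` the masked entries still count toward the denominator, which matches dividing by the total number of elements in the output.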
class dalib.adaptation.self_ensemble.L2ConsistencyLoss(reduction='mean')[source]

L2 consistency loss. Given student model predictions \(y\), teacher model predictions \(y_{teacher}\), and binary mask \(mask\), the L2 consistency loss is
\[(\mathbb{E}_{i}\Vert y^i - y_{teacher}^i \Vert_2^2) * mask\]
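As a sketch, the masked L2 term can be written directly as a per-sample squared L2 distance gated by the mask and averaged over the mini-batch; the function name `l2_consistency_loss` is hypothetical:

```python
import torch


def l2_consistency_loss(y, y_teacher, mask):
    # Per-sample squared L2 distance between student and teacher
    # predictions, gated by the binary mask, then averaged.
    per_sample = ((y - y_teacher) ** 2).sum(dim=1)  # shape (N,)
    return (per_sample * mask).mean()
```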
class dalib.adaptation.self_ensemble.ClassBalanceLoss(num_classes)[source]

Class balance loss that penalises the network for making predictions that exhibit large class imbalance. Given predictions \(y\) with dimension \((N, C)\), we first take the mean across the mini-batch dimension, yielding the mini-batch mean per-class probability \(y_{mean}\) with dimension \((C, )\):

\[y_{mean}^j = \frac{1}{N} \sum_{i=1}^N y_i^j\]

We then compute the binary cross entropy loss between \(y_{mean}\) and the uniform probability vector \(u\) of the same dimension, where \(u^j = \frac{1}{C}\):

\[loss = BCELoss(y_{mean}, u)\]

- Parameters
num_classes (int) – Number of classes
- Inputs:
y (tensor): predictions from classifier
- Shape:
y: \((N, C)\) where C means the number of classes.
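The two steps above (mini-batch mean, then BCE against a uniform target) can be sketched as follows, assuming `y` already holds probabilities (e.g. softmax outputs); the function name `class_balance_loss` is hypothetical:

```python
import torch
import torch.nn.functional as F


def class_balance_loss(y):
    # y: probabilities of shape (N, C).
    num_classes = y.shape[1]
    y_mean = y.mean(dim=0)  # mini-batch mean per-class probability, shape (C,)
    u = torch.full_like(y_mean, 1.0 / num_classes)  # uniform target
    return F.binary_cross_entropy(y_mean, u)
```

Since BCE against a fixed target \(u\) is minimized when the input equals \(u\), the loss is smallest when the batch-mean prediction is uniform and grows as predictions concentrate on few classes.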
class dalib.adaptation.self_ensemble.EmaTeacher(model, alpha)[source]

Exponential moving average model used in Self-ensembling for Visual Domain Adaptation (ICLR 2018).

We define \(\theta_t'\) at training step \(t\) as the EMA of successive \(\theta\) weights, with \(\alpha\) as the decay rate. Then

\[\theta_t'=\alpha \theta_{t-1}' + (1-\alpha)\theta_t\]

- Parameters
model (torch.nn.Module) – student model
alpha (float) – decay rate for EMA.
- Inputs:
x (tensor): input data fed to teacher model
Examples:
>>> classifier = ImageClassifier(backbone, num_classes=31, bottleneck_dim=256).to(device)
>>> # initialize teacher model
>>> teacher = EmaTeacher(classifier, 0.9)
>>> num_iterations = 1000
>>> for _ in range(num_iterations):
>>>     # x denotes input of one mini-batch
>>>     # you can get the teacher model's output by teacher(x)
>>>     y_teacher = teacher(x)
>>>     # when you want to update the teacher, call teacher.update()
>>>     teacher.update()
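For reference, the EMA update rule above can be sketched as a small standalone class; this is a hypothetical re-implementation of the idea, not the library's actual code, and the class name `EmaTeacherSketch` is an assumption:

```python
import copy

import torch
import torch.nn as nn


class EmaTeacherSketch:
    """Sketch of an EMA teacher: keeps a frozen copy of the student whose
    weights track theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t."""

    def __init__(self, model, alpha):
        self.alpha = alpha
        self.model = model
        self.teacher = copy.deepcopy(model)
        for p in self.teacher.parameters():
            p.requires_grad_(False)  # teacher is never trained directly

    @torch.no_grad()
    def update(self):
        # Apply the EMA update to every teacher parameter in place.
        for tp, sp in zip(self.teacher.parameters(), self.model.parameters()):
            tp.mul_(self.alpha).add_(sp, alpha=1 - self.alpha)

    def __call__(self, x):
        return self.teacher(x)
```

A higher `alpha` makes the teacher change more slowly, which is what gives the ensembled (averaged) predictions their stability.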