Self Ensemble

class dalib.adaptation.self_ensemble.ConsistencyLoss(distance_measure, reduction='mean')
- Consistency loss between student model output \(y\) and teacher model output \(y_{teacher}\). Given a distance measure \(d\), student model output \(y\), teacher model output \(y_{teacher}\), and a binary mask \(mask\), the consistency loss is (a usage sketch follows the shape section below)

  \[d(y, y_{teacher}) * mask\]

- Parameters
- distance_measure (callable) – Distance measure function. 
- reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': the sum of the output will be divided by the number of elements in the output; 'sum': the output will be summed. Default: 'mean'
 
 - Inputs:
- y: predictions from student model 
- y_teacher: predictions from teacher model 
- mask: binary mask 
 
- Shape:
- y, y_teacher: \((N, C)\) where C means the number of classes. 
- mask: \((N, )\) where N means mini-batch size. 
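A minimal usage sketch. The squared-error distance and the confidence-threshold mask below are illustrative assumptions, not part of the class; it also assumes distance_measure returns a per-sample distance of shape \((N, )\) so the mask can be applied sample-wise:

  >>> import torch
  >>> import torch.nn.functional as F
  >>> # illustrative distance: per-sample squared error, shape (N,)
  >>> distance = lambda y, y_t: F.mse_loss(y, y_t, reduction='none').sum(dim=1)
  >>> consistency_loss = ConsistencyLoss(distance, reduction='mean')
  >>> y = torch.randn(4, 31).softmax(dim=1)           # student predictions, (N, C)
  >>> y_teacher = torch.randn(4, 31).softmax(dim=1)   # teacher predictions, (N, C)
  >>> # illustrative mask: keep only confident teacher predictions, shape (N,)
  >>> mask = (y_teacher.max(dim=1).values > 0.9).float()
  >>> loss = consistency_loss(y, y_teacher, mask)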
 
 
class dalib.adaptation.self_ensemble.L2ConsistencyLoss(reduction='mean')

- L2 consistency loss. Given student model predictions \(y\), teacher model predictions \(y_{teacher}\), and a binary mask \(mask\), the L2 consistency loss is

  \[\mathbb{E}_{i}\left[\Vert y^i - y_{teacher}^i \Vert_2^2 \cdot mask^i\right]\]
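For intuition, a hand-rolled version of this masked squared-L2 computation with reduction='mean' (a sketch for illustration, following the formula above, not the class internals):

  >>> import torch
  >>> y = torch.randn(4, 31)
  >>> y_teacher = torch.randn(4, 31)
  >>> mask = torch.tensor([1., 1., 0., 1.])
  >>> per_sample = ((y - y_teacher) ** 2).sum(dim=1)  # squared L2 distance per sample, shape (N,)
  >>> loss = (per_sample * mask).mean()               # mask each sample, then average over the batch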
class dalib.adaptation.self_ensemble.ClassBalanceLoss(num_classes)

- Class balance loss that penalises the network for making predictions that exhibit large class imbalance. Given predictions \(y\) with dimension \((N, C)\), we first calculate the mean across the mini-batch dimension, resulting in the mini-batch mean per-class probability \(y_{mean}\) with dimension \((C, )\):

  \[y_{mean}^j = \frac{1}{N} \sum_{i=1}^N y_i^j\]

  Then we calculate the binary cross entropy loss between \(y_{mean}\) and a uniform probability vector \(u\) of the same dimension, where \(u^j = \frac{1}{C}\) (a hand-rolled sketch of these two steps follows the shape section below):

  \[loss = BCELoss(y_{mean}, u)\]

- Parameters
- num_classes (int) – Number of classes 
 - Inputs:
- y (tensor): predictions from classifier 
 
- Shape:
- y: \((N, C)\) where C means the number of classes. 
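For intuition, a hand-rolled version of the two steps above (a sketch for illustration; the actual class may differ in details such as numerical clamping):

  >>> import torch
  >>> import torch.nn.functional as F
  >>> y = torch.randn(8, 31).softmax(dim=1)             # classifier predictions, (N, C)
  >>> y_mean = y.mean(dim=0)                            # mini-batch mean per-class probability, (C,)
  >>> u = torch.full_like(y_mean, 1. / y_mean.numel())  # uniform vector with u^j = 1/C
  >>> loss = F.binary_cross_entropy(y_mean, u)          # BCELoss(y_mean, u)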
 
 
class dalib.adaptation.self_ensemble.EmaTeacher(model, alpha)

- Exponential moving average model used in Self-ensembling for Visual Domain Adaptation (ICLR 2018).

  We define \(\theta_t'\) at training step \(t\) as the EMA of successive \(\theta\) weights, with \(\alpha\) as the decay rate. Then

  \[\theta_t' = \alpha \theta_{t-1}' + (1-\alpha)\theta_t\]

- Parameters
- model (torch.nn.Module) – student model 
- alpha (float) – decay rate for EMA. 
 
 - Inputs:
- x (tensor): input data fed to teacher model 
- Examples:

  >>> classifier = ImageClassifier(backbone, num_classes=31, bottleneck_dim=256).to(device)
  >>> # initialize teacher model
  >>> teacher = EmaTeacher(classifier, 0.9)
  >>> num_iterations = 1000
  >>> for _ in range(num_iterations):
  >>>     # x denotes input of one mini-batch
  >>>     # you can get teacher model's output by teacher(x)
  >>>     y_teacher = teacher(x)
  >>>     # when you want to update teacher, you should call teacher.update()
  >>>     teacher.update()
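For intuition, a hand-rolled version of the EMA update rule \(\theta_t' = \alpha \theta_{t-1}' + (1-\alpha)\theta_t\) (a sketch for illustration, not the class internals; it assumes the teacher starts as a deep copy of the student, and reuses the classifier from the example above):

  >>> import copy
  >>> import torch
  >>> student_model = classifier                    # from the example above
  >>> teacher_model = copy.deepcopy(student_model)  # teacher starts as a copy of the student
  >>> alpha = 0.9
  >>> with torch.no_grad():                         # the EMA update needs no gradients
  >>>     for p_t, p_s in zip(teacher_model.parameters(), student_model.parameters()):
  >>>         # theta' = alpha * theta' + (1 - alpha) * theta
  >>>         p_t.mul_(alpha).add_(p_s, alpha=1 - alpha)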