Self Ensemble
class dalib.adaptation.self_ensemble.ConsistencyLoss(distance_measure, reduction='mean')[source]

Consistency loss between the student model output and the teacher model output. Given a distance measure \(d\), student model output \(y\), teacher model output \(y_{teacher}\), and binary mask \(mask\), the consistency loss is

\[d(y, y_{teacher}) * mask\]

- Parameters
distance_measure (callable) – Distance measure function.
reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': the sum of the output will be divided by the number of elements in the output; 'sum': the output will be summed. Default: 'mean'
- Inputs:
y: predictions from student model
y_teacher: predictions from teacher model
mask: binary mask
- Shape:
y, y_teacher: \((N, C)\) where C means the number of classes.
mask: \((N, )\) where N means mini-batch size.
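To make the masking and reduction behavior concrete, here is a minimal functional sketch of such a consistency loss, assuming the distance measure returns one value per sample (shape \((N,)\)); the function name `consistency_loss` is hypothetical, not the library API:

```python
import torch


def consistency_loss(y, y_teacher, mask, distance_measure, reduction='mean'):
    # Hypothetical sketch: compute per-sample distances, gate each sample
    # by the binary mask, then apply the requested reduction.
    d = distance_measure(y, y_teacher)  # shape (N,)
    masked = d * mask                   # masked-out samples contribute 0
    if reduction == 'mean':
        return masked.mean()
    if reduction == 'sum':
        return masked.sum()
    return masked                       # 'none'


# Example: per-sample squared L2 distance as the distance measure.
y = torch.randn(4, 3)
y_teacher = torch.randn(4, 3)
mask = torch.tensor([1., 1., 0., 1.])
loss = consistency_loss(y, y_teacher, mask,
                        lambda a, b: ((a - b) ** 2).sum(dim=1))
```

Note that with `reduction='mean'` the masked entries still count toward the denominator, which matches dividing by the total number of elements in the output.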
class dalib.adaptation.self_ensemble.L2ConsistencyLoss(reduction='mean')[source]

L2 consistency loss. Given student model predictions \(y\), teacher model predictions \(y_{teacher}\), and binary mask \(mask\), the L2 consistency loss is
\[(\mathbb{E}_{i}\Vert y^i - y_{teacher}^i \Vert_2^2) * mask\]
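As a sketch, the masked L2 term can be written directly as a per-sample squared L2 distance gated by the mask and averaged over the mini-batch; the function name `l2_consistency_loss` is hypothetical:

```python
import torch


def l2_consistency_loss(y, y_teacher, mask):
    # Per-sample squared L2 distance between student and teacher
    # predictions, gated by the binary mask, then averaged.
    per_sample = ((y - y_teacher) ** 2).sum(dim=1)  # shape (N,)
    return (per_sample * mask).mean()
```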
class dalib.adaptation.self_ensemble.ClassBalanceLoss(num_classes)[source]

Class balance loss that penalises the network for making predictions that exhibit large class imbalance. Given predictions \(y\) with dimension \((N, C)\), we first take the mean across the mini-batch dimension, yielding the mini-batch mean per-class probability \(y_{mean}\) with dimension \((C, )\):

\[y_{mean}^j = \frac{1}{N} \sum_{i=1}^N y_i^j\]

We then compute the binary cross entropy loss between \(y_{mean}\) and the uniform probability vector \(u\) of the same dimension, where \(u^j = \frac{1}{C}\):

\[loss = BCELoss(y_{mean}, u)\]

- Parameters
num_classes (int) – Number of classes
- Inputs:
y (tensor): predictions from classifier
- Shape:
y: \((N, C)\) where C means the number of classes.
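The two steps above (mini-batch mean, then BCE against a uniform target) can be sketched as follows, assuming `y` already holds probabilities (e.g. softmax outputs); the function name `class_balance_loss` is hypothetical:

```python
import torch
import torch.nn.functional as F


def class_balance_loss(y):
    # y: probabilities of shape (N, C).
    num_classes = y.shape[1]
    y_mean = y.mean(dim=0)  # mini-batch mean per-class probability, shape (C,)
    u = torch.full_like(y_mean, 1.0 / num_classes)  # uniform target
    return F.binary_cross_entropy(y_mean, u)
```

Since BCE against a fixed target \(u\) is minimized when the input equals \(u\), the loss is smallest when the batch-mean prediction is uniform and grows as predictions concentrate on few classes.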
class dalib.adaptation.self_ensemble.EmaTeacher(model, alpha)[source]

Exponential moving average model used in Self-ensembling for Visual Domain Adaptation (ICLR 2018).

We define \(\theta_t'\) at training step \(t\) as the EMA of successive \(\theta\) weights, with \(\alpha\) as the decay rate. Then

\[\theta_t'=\alpha \theta_{t-1}' + (1-\alpha)\theta_t\]

- Parameters
model (torch.nn.Module) – student model
alpha (float) – decay rate for EMA.
- Inputs:
x (tensor): input data fed to teacher model
Examples:
>>> classifier = ImageClassifier(backbone, num_classes=31, bottleneck_dim=256).to(device)
>>> # initialize teacher model
>>> teacher = EmaTeacher(classifier, 0.9)
>>> num_iterations = 1000
>>> for _ in range(num_iterations):
>>>     # x denotes input of one mini-batch
>>>     # you can get the teacher model's output by teacher(x)
>>>     y_teacher = teacher(x)
>>>     # when you want to update the teacher, call teacher.update()
>>>     teacher.update()
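For reference, the EMA update rule above can be sketched as a small standalone class; this is a hypothetical re-implementation of the idea, not the library's actual code, and the class name `EmaTeacherSketch` is an assumption:

```python
import copy

import torch
import torch.nn as nn


class EmaTeacherSketch:
    """Sketch of an EMA teacher: keeps a frozen copy of the student whose
    weights track theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t."""

    def __init__(self, model, alpha):
        self.alpha = alpha
        self.model = model
        self.teacher = copy.deepcopy(model)
        for p in self.teacher.parameters():
            p.requires_grad_(False)  # teacher is never trained directly

    @torch.no_grad()
    def update(self):
        # Apply the EMA update to every teacher parameter in place.
        for tp, sp in zip(self.teacher.parameters(), self.model.parameters()):
            tp.mul_(self.alpha).add_(sp, alpha=1 - self.alpha)

    def __call__(self, x):
        return self.teacher(x)
```

A higher `alpha` makes the teacher change more slowly, which is what gives the ensembled (averaged) predictions their stability.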