Margin Disparity Discrepancy (MDD)
class dalib.adaptation.mdd.MarginDisparityDiscrepancy(source_disparity, target_disparity, margin=4, reduction='mean')[source]

The margin disparity discrepancy (MDD) proposed in Bridging Theory and Algorithm for Domain Adaptation (ICML 2019).
MDD can measure the distribution discrepancy in domain adaptation.
The \(y^s\) and \(y^t\) are logits output by the main head on the source and target domain respectively. The \(y_{adv}^s\) and \(y_{adv}^t\) are logits output by the adversarial head.
The definition can be described as:
\[\mathcal{D}_{\gamma}(\hat{\mathcal{S}}, \hat{\mathcal{T}}) = -\gamma \mathbb{E}_{y^s, y_{adv}^s \sim\hat{\mathcal{S}}} L_s (y^s, y_{adv}^s) + \mathbb{E}_{y^t, y_{adv}^t \sim\hat{\mathcal{T}}} L_t (y^t, y_{adv}^t),\]where \(\gamma\) is a margin hyper-parameter, \(L_s\) refers to the disparity function defined on the source domain and \(L_t\) refers to the disparity function defined on the target domain.
- Parameters
source_disparity (callable) – The disparity function defined on the source domain, \(L_s\).
target_disparity (callable) – The disparity function defined on the target domain, \(L_t\).
margin (float) – margin \(\gamma\). Default: 4
reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'
- Inputs:
y_s: output \(y^s\) by the main head on the source domain
y_s_adv: output \(y_{adv}^s\) by the adversarial head on the source domain
y_t: output \(y^t\) by the main head on the target domain
y_t_adv: output \(y_{adv}^t\) by the adversarial head on the target domain
w_s (optional): instance weights for source domain
w_t (optional): instance weights for target domain
Examples:
>>> import torch
>>> import torch.nn.functional as F
>>> from dalib.adaptation.mdd import MarginDisparityDiscrepancy
>>> num_outputs = 2
>>> batch_size = 10
>>> loss = MarginDisparityDiscrepancy(margin=4., source_disparity=F.l1_loss, target_disparity=F.l1_loss)
>>> # output from source domain and target domain
>>> y_s, y_t = torch.randn(batch_size, num_outputs), torch.randn(batch_size, num_outputs)
>>> # adversarial output from source domain and target domain
>>> y_s_adv, y_t_adv = torch.randn(batch_size, num_outputs), torch.randn(batch_size, num_outputs)
>>> output = loss(y_s, y_s_adv, y_t, y_t_adv)
MDD for Classification
class dalib.adaptation.mdd.ClassificationMarginDisparityDiscrepancy(margin=4, **kwargs)[source]

The margin disparity discrepancy (MDD) proposed in Bridging Theory and Algorithm for Domain Adaptation (ICML 2019).
It measures the distribution discrepancy in domain adaptation for classification.
When margin is equal to 1, it’s also called disparity discrepancy (DD).
The \(y^s\) and \(y^t\) are logits output by the main classifier on the source and target domain respectively. The \(y_{adv}^s\) and \(y_{adv}^t\) are logits output by the adversarial classifier. They are expected to contain raw, unnormalized scores for each class.
The definition can be described as:
\[\mathcal{D}_{\gamma}(\hat{\mathcal{S}}, \hat{\mathcal{T}}) = \gamma \mathbb{E}_{y^s, y_{adv}^s \sim\hat{\mathcal{S}}} \log\left(\frac{\exp(y_{adv}^s[h_{y^s}])}{\sum_j \exp(y_{adv}^s[j])}\right) + \mathbb{E}_{y^t, y_{adv}^t \sim\hat{\mathcal{T}}} \log\left(1-\frac{\exp(y_{adv}^t[h_{y^t}])}{\sum_j \exp(y_{adv}^t[j])}\right),\]where \(\gamma\) is a margin hyper-parameter and \(h_y\) refers to the predicted label when the logits output is \(y\). You can see more details in Bridging Theory and Algorithm for Domain Adaptation.
- Parameters
margin (float) – margin \(\gamma\). Default: 4
reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'
- Inputs:
y_s: logits output \(y^s\) by the main classifier on the source domain
y_s_adv: logits output \(y_{adv}^s\) by the adversarial classifier on the source domain
y_t: logits output \(y^t\) by the main classifier on the target domain
y_t_adv: logits output \(y_{adv}^t\) by the adversarial classifier on the target domain
- Shape:
Inputs: \((minibatch, C)\) where C = number of classes, or \((minibatch, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.
Output: scalar. If reduction is 'none', then the same size as the target: \((minibatch)\), or \((minibatch, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.
Examples:
>>> import torch
>>> from dalib.adaptation.mdd import ClassificationMarginDisparityDiscrepancy
>>> num_classes = 2
>>> batch_size = 10
>>> loss = ClassificationMarginDisparityDiscrepancy(margin=4.)
>>> # logits output from source domain and target domain
>>> y_s, y_t = torch.randn(batch_size, num_classes), torch.randn(batch_size, num_classes)
>>> # adversarial logits output from source domain and target domain
>>> y_s_adv, y_t_adv = torch.randn(batch_size, num_classes), torch.randn(batch_size, num_classes)
>>> output = loss(y_s, y_s_adv, y_t, y_t_adv)
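For intuition, the two expectation terms in the definition above can be approximated directly from logits. The sketch below continues the tensors from the example above and follows the formula itself rather than the library internals, so the intermediate names (pred_s, source_term, target_term, approx_mdd) are illustrative only:

>>> import torch.nn.functional as F
>>> # predicted labels of the main classifier, i.e. h_{y^s} and h_{y^t}
>>> pred_s, pred_t = y_s.argmax(dim=1), y_t.argmax(dim=1)
>>> margin = 4.
>>> # gamma * E[log softmax(y_s_adv)[h_{y^s}]], the source term
>>> source_term = margin * (-F.cross_entropy(y_s_adv, pred_s))
>>> # E[log(1 - softmax(y_t_adv)[h_{y^t}])], the target term, clamped to avoid log(0)
>>> probs_t = F.softmax(y_t_adv, dim=1).gather(1, pred_t.unsqueeze(1))
>>> target_term = torch.log(torch.clamp(1. - probs_t, min=1e-6)).mean()
>>> approx_mdd = source_term + target_term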
class dalib.adaptation.mdd.ImageClassifier(backbone, num_classes, bottleneck_dim=1024, width=1024, grl=None, finetune=True)[source]

Classifier for MDD.
Classifier for MDD has one backbone, one bottleneck, and two classifier heads. The first classifier head is used for final predictions. The adversarial classifier head is only used when calculating MarginDisparityDiscrepancy.
- Parameters
backbone (torch.nn.Module) – Any backbone to extract 1-d features from data
num_classes (int) – Number of classes
bottleneck_dim (int, optional) – Feature dimension of the bottleneck layer. Default: 1024
width (int, optional) – Feature dimension of the classifier head. Default: 1024
grl (nn.Module) – Gradient reverse layer. Will use default parameters if None. Default: None.
finetune (bool, optional) – Whether to use a 10x smaller learning rate for the backbone. Default: True
- Inputs:
x (tensor): input data
- Outputs:
outputs: logits output by the main classifier
outputs_adv: logits output by the adversarial classifier
- Shapes:
x: \((minibatch, *)\), same shape as the input of the backbone.
outputs, outputs_adv: \((minibatch, C)\), where C means the number of classes.
Note
Remember to call step() after forward() during the training phase. For instance,
>>> # x is inputs, classifier is an ImageClassifier
>>> outputs, outputs_adv = classifier(x)
>>> classifier.step()
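For a fuller picture, the following is a minimal sketch of one training step that combines ImageClassifier with ClassificationMarginDisparityDiscrepancy. The data iterators, optimizer, and trade_off weight are hypothetical names assumed by this sketch, and the sign convention on the transfer loss is an assumption of the sketch (the gradient reverse layer inside the classifier handles the adversarial maximization):

>>> import torch
>>> import torch.nn.functional as F
>>> mdd = ClassificationMarginDisparityDiscrepancy(margin=4.)
>>> x_s, labels_s = next(train_source_iter)  # labeled source batch (hypothetical iterator)
>>> x_t, _ = next(train_target_iter)         # unlabeled target batch (hypothetical iterator)
>>> x = torch.cat((x_s, x_t), dim=0)
>>> outputs, outputs_adv = classifier(x)
>>> classifier.step()                        # per the note above, step() follows forward()
>>> y_s, y_t = outputs.chunk(2, dim=0)
>>> y_s_adv, y_t_adv = outputs_adv.chunk(2, dim=0)
>>> cls_loss = F.cross_entropy(y_s, labels_s)
>>> transfer_loss = -mdd(y_s, y_s_adv, y_t, y_t_adv)
>>> loss = cls_loss + trade_off * transfer_loss  # trade_off: hypothetical weighting hyper-parameter
>>> optimizer.zero_grad()
>>> loss.backward()
>>> optimizer.step()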
dalib.adaptation.mdd.shift_log(x, offset=1e-06)[source]

First shift, then take the logarithm, which can be described as:
\[y = \log(\min(x + \text{offset}, 1))\]

This is used to avoid the gradient explosion problem of \(\log(x)\) when \(x = 0\).
- Parameters
x (torch.Tensor) – input tensor
offset (float, optional) – offset size. Default: 1e-6
Note
The input tensor is expected to fall in \([0, 1]\), and the output tensor falls in \([\log(\text{offset}), 0]\).
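For illustration, a minimal usage sketch (the probability tensor below is just an arbitrary example input):

>>> import torch
>>> from dalib.adaptation.mdd import shift_log
>>> p = torch.tensor([0., 0.5, 1.])
>>> shift_log(p)  # finite even at p = 0, unlike torch.log(p)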
MDD for Regression
class dalib.adaptation.mdd.RegressionMarginDisparityDiscrepancy(margin=1, loss_function=<function mse_loss>, **kwargs)[source]

The margin disparity discrepancy (MDD) proposed in Bridging Theory and Algorithm for Domain Adaptation (ICML 2019).
It measures the distribution discrepancy in domain adaptation for regression.
The \(y^s\) and \(y^t\) are logits output by the main regressor on the source and target domain respectively. The \(y_{adv}^s\) and \(y_{adv}^t\) are logits output by the adversarial regressor. They are expected to contain normalized values for each factor.
The definition can be described as:
\[\mathcal{D}_{\gamma}(\hat{\mathcal{S}}, \hat{\mathcal{T}}) = -\gamma \mathbb{E}_{y^s, y_{adv}^s \sim\hat{\mathcal{S}}} L (y^s, y_{adv}^s) + \mathbb{E}_{y^t, y_{adv}^t \sim\hat{\mathcal{T}}} L (y^t, y_{adv}^t),\]where \(\gamma\) is a margin hyper-parameter and \(L\) refers to the disparity function defined on both domains. You can see more details in Bridging Theory and Algorithm for Domain Adaptation.
- Parameters
loss_function (callable) – The disparity function defined on both domains, \(L\).
margin (float) – margin \(\gamma\). Default: 1
reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'
- Inputs:
y_s: logits output \(y^s\) by the main regressor on the source domain
y_s_adv: logits output \(y_{adv}^s\) by the adversarial regressor on the source domain
y_t: logits output \(y^t\) by the main regressor on the target domain
y_t_adv: logits output \(y_{adv}^t\) by the adversarial regressor on the target domain
- Shape:
Inputs: \((minibatch, F)\) where F = number of factors, or \((minibatch, F, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.
Output: scalar by default. If reduction is 'none', then the same size as the target: \((minibatch)\), or \((minibatch, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.
Examples:
>>> import torch
>>> import torch.nn.functional as F
>>> from dalib.adaptation.mdd import RegressionMarginDisparityDiscrepancy
>>> num_outputs = 2
>>> batch_size = 10
>>> loss = RegressionMarginDisparityDiscrepancy(margin=4., loss_function=F.l1_loss)
>>> # output from source domain and target domain
>>> y_s, y_t = torch.randn(batch_size, num_outputs), torch.randn(batch_size, num_outputs)
>>> # adversarial output from source domain and target domain
>>> y_s_adv, y_t_adv = torch.randn(batch_size, num_outputs), torch.randn(batch_size, num_outputs)
>>> output = loss(y_s, y_s_adv, y_t, y_t_adv)
class dalib.adaptation.mdd.ImageRegressor(backbone, num_factors, bottleneck_dim=1024, width=1024, finetune=True)[source]

Regressor for MDD.
Regressor for MDD has one backbone, one bottleneck, and two regressor heads. The first regressor head is used for final predictions. The adversarial regressor head is only used when calculating MarginDisparityDiscrepancy.
- Parameters
backbone (torch.nn.Module) – Any backbone to extract 1-d features from data
num_factors (int) – Number of factors
bottleneck_dim (int, optional) – Feature dimension of the bottleneck layer. Default: 1024
width (int, optional) – Feature dimension of the regressor head. Default: 1024
finetune (bool, optional) – Whether to use a 10x smaller learning rate for the backbone. Default: True
- Inputs:
x (Tensor): input data
- Outputs: (outputs, outputs_adv)
outputs: outputs by the main regressor
outputs_adv: outputs by the adversarial regressor
- Shapes:
x: \((minibatch, *)\), same shape as the input of the backbone.
outputs, outputs_adv: \((minibatch, F)\), where F means the number of factors.
Note
Remember to call step() after forward() during the training phase. For instance,
>>> # x is inputs, regressor is an ImageRegressor
>>> outputs, outputs_adv = regressor(x)
>>> regressor.step()