Loss
Knowledge Distillation
class common.loss.KnowledgeDistillationLoss(T=1.0, reduction='batchmean')

Knowledge Distillation Loss.
- Parameters
  T (double) – Temperature. Default: 1.0
  reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'batchmean' | 'mean' | 'sum'. 'none': no reduction will be applied, 'batchmean': the sum of the output will be divided by the batch size, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'batchmean'
- Inputs:
y_student (tensor): logits output of the student
y_teacher (tensor): logits output of the teacher
- Shape:
y_student: (minibatch, num_classes)
y_teacher: (minibatch, num_classes)
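A minimal usage sketch (illustrative, assuming the common.loss package layout shown above). The reference computation at the end follows the standard temperature-scaled formulation of Hinton et al. (2015) and may not match the library's exact internal scaling.

    import torch
    import torch.nn.functional as F
    from common.loss import KnowledgeDistillationLoss  # package path as documented above

    # Student and teacher logits, both of shape (minibatch, num_classes).
    student_logits = torch.randn(32, 10, requires_grad=True)
    teacher_logits = torch.randn(32, 10)

    kd_criterion = KnowledgeDistillationLoss(T=4.0, reduction='batchmean')
    loss = kd_criterion(student_logits, teacher_logits)
    loss.backward()

    # Assumed reference formulation (Hinton et al., 2015), for comparison only:
    # KL divergence between temperature-softened distributions, scaled by T**2.
    T = 4.0
    reference = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction='batchmean',
    ) * (T ** 2)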
Cross Entropy with Label Smoothing
class common.vision.models.reid.loss.CrossEntropyLossWithLabelSmooth(num_classes, epsilon=0.1)

Cross entropy loss with label smoothing from Rethinking the Inception Architecture for Computer Vision (CVPR 2016).
Given one-hot labels \(labels \in R^C\), where \(C\) is the number of classes, the smoothed labels are calculated as

\[smoothed\_labels = (1 - \epsilon) \times labels + \epsilon \times \frac{1}{C}\]

We use the smoothed labels when calculating the cross entropy loss, which helps prevent over-fitting.
- Parameters
  num_classes (int) – number of classes
  epsilon (float) – smoothing parameter \(\epsilon\). Default: 0.1
- Inputs:
y (tensor): unnormalized classifier predictions, \(y\)
labels (tensor): ground truth labels, \(labels\)
- Shape:
y: \((minibatch, C)\), where \(C\) is the number of classes
labels: \((minibatch, )\)
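A minimal usage sketch (illustrative, assuming the common.vision.models.reid.loss package layout shown above). The last lines re-derive the loss directly from the smoothing formula given earlier, as a reference computation rather than the library's internal code.

    import torch
    import torch.nn.functional as F
    from common.vision.models.reid.loss import CrossEntropyLossWithLabelSmooth

    num_classes = 751  # e.g. the number of training identities in a ReID dataset
    criterion = CrossEntropyLossWithLabelSmooth(num_classes=num_classes, epsilon=0.1)

    y = torch.randn(32, num_classes, requires_grad=True)  # unnormalized predictions, shape (minibatch, C)
    labels = torch.randint(0, num_classes, (32,))         # ground-truth class indices, shape (minibatch,)

    loss = criterion(y, labels)
    loss.backward()

    # Reference computation following the smoothing formula above (illustrative only):
    epsilon = 0.1
    one_hot = F.one_hot(labels, num_classes).float()
    smoothed = (1 - epsilon) * one_hot + epsilon / num_classes
    reference = (-smoothed * F.log_softmax(y, dim=1)).sum(dim=1).mean()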