Vision Models

Image Classification

ResNets

Modified from torchvision.models.resnet. Author: Junguang Jiang (JiangJunguang1123@outlook.com)
class common.vision.models.resnet.ResNet(*args, **kwargs)
    ResNets without the fully connected layer.

    property out_features
        The dimension of output features.
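A minimal feature-extraction sketch using the resnet18 constructor documented just below. The 224 x 224 input size and the pooled 2-d output shape are assumptions based on the torchvision conventions this module modifies:

    >>> import torch
    >>> from common.vision.models.resnet import resnet18
    >>> backbone = resnet18(pretrained=True)   # ResNet-18 without the fully connected layer
    >>> x = torch.randn(2, 3, 224, 224)        # dummy batch of two images
    >>> f = backbone(x)                        # pooled features; shape (2, backbone.out_features) assumed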
common.vision.models.resnet.resnet18(pretrained=False, progress=True, **kwargs)
    ResNet-18 model from "Deep Residual Learning for Image Recognition".
common.vision.models.resnet.resnet34(pretrained=False, progress=True, **kwargs)
    ResNet-34 model from "Deep Residual Learning for Image Recognition".
common.vision.models.resnet.resnet50(pretrained=False, progress=True, **kwargs)
    ResNet-50 model from "Deep Residual Learning for Image Recognition".
common.vision.models.resnet.resnet101(pretrained=False, progress=True, **kwargs)
    ResNet-101 model from "Deep Residual Learning for Image Recognition".
common.vision.models.resnet.resnet152(pretrained=False, progress=True, **kwargs)
    ResNet-152 model from "Deep Residual Learning for Image Recognition".
common.vision.models.resnet.resnext50_32x4d(pretrained=False, progress=True, **kwargs)
    ResNeXt-50 32x4d model from "Aggregated Residual Transformations for Deep Neural Networks".
common.vision.models.resnet.resnext101_32x8d(pretrained=False, progress=True, **kwargs)
    ResNeXt-101 32x8d model from "Aggregated Residual Transformations for Deep Neural Networks".
common.vision.models.resnet.wide_resnet50_2(pretrained=False, progress=True, **kwargs)
    Wide ResNet-50-2 model from "Wide Residual Networks".

    The model is the same as ResNet except that the number of channels in the bottleneck is twice as large in every block. The number of channels in the outer 1x1 convolutions is unchanged, e.g. the last block in ResNet-50 has 2048-512-2048 channels, while in Wide ResNet-50-2 it has 2048-1024-2048.
common.vision.models.reid.wide_resnet101_2(pretrained=False, progress=True, **kwargs)
    Wide ResNet-101-2 model from "Wide Residual Networks".

    The model is the same as ResNet except that the number of channels in the bottleneck is twice as large in every block. The number of channels in the outer 1x1 convolutions is unchanged, e.g. the last block in ResNet-50 has 2048-512-2048 channels, while in Wide ResNet-50-2 it has 2048-1024-2048.
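The channel doubling can be checked directly on the torchvision-style layer attributes; this sketch assumes the module preserves torchvision's layer4 block naming:

    >>> from common.vision.models.resnet import resnet50, wide_resnet50_2
    >>> resnet50().layer4[2].conv2.out_channels          # bottleneck 3x3 width in the last block
    512
    >>> wide_resnet50_2().layer4[2].conv2.out_channels   # twice as large
    1024
    >>> wide_resnet50_2().layer4[2].conv3.out_channels   # outer 1x1 width unchanged
    2048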
LeNet

LeNet model from "Gradient-based learning applied to document recognition".

Parameters
    num_classes (int) – number of classes. Default: 10

Note
    The input image size must be 28 x 28.
DTN

DTN model.

Parameters
    num_classes (int) – number of classes. Default: 10

Note
    The input image size must be 32 x 32.
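A quick sketch for both digit models. The import path common.vision.models.digits and the input channel counts (1 for LeNet, 3 for DTN) are assumptions; only the spatial sizes are stated above:

    >>> import torch
    >>> from common.vision.models.digits import LeNet, DTN   # import path assumed
    >>> lenet = LeNet(num_classes=10)
    >>> out = lenet(torch.randn(2, 1, 28, 28))   # 28 x 28 input required; 1 channel assumed
    >>> dtn = DTN(num_classes=10)
    >>> out = dtn(torch.randn(2, 3, 32, 32))     # 32 x 32 input required; 3 channels assumed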
Semantic Segmentation

Keypoint Detection

PoseResNet
common.vision.models.keypoint_detection.pose_resnet.pose_resnet101(num_keypoints, pretrained_backbone=True, deconv_with_bias=False, finetune=False, progress=True, **kwargs)
    Constructs a Simple Baseline model with a ResNet-101 backbone.

    Parameters
        num_keypoints (int) – number of keypoints
        pretrained_backbone (bool, optional) – If True, uses a backbone pre-trained on ImageNet. Default: True
        deconv_with_bias (bool, optional) – Whether to use bias in the deconvolution layers. Default: False
        finetune (bool, optional) – Whether to use a 10x smaller learning rate for the backbone. Default: False
        progress (bool, optional) – If True, displays a progress bar of the download to stderr. Default: True
class common.vision.models.keypoint_detection.pose_resnet.PoseResNet(backbone, upsampling, feature_dim, num_keypoints, finetune=False)
    Simple Baseline for keypoint detection.

    Parameters
        backbone (torch.nn.Module) – Backbone to extract 2-d features from data
        upsampling (torch.nn.Module) – Layer to upsample image features to heatmap size
        feature_dim (int) – The dimension of the features from the upsampling layer
        num_keypoints (int) – Number of keypoints
        finetune (bool, optional) – Whether to use a 10x smaller learning rate for the backbone. Default: False
class common.vision.models.keypoint_detection.pose_resnet.Upsampling(in_channel=2048, hidden_dims=(256, 256, 256), kernel_sizes=(4, 4, 4), bias=False)
    The 3-layer deconvolution head used in Simple Baseline.
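A construction sketch for the Simple Baseline model. The 256 x 256 input and the resulting 64 x 64 heatmaps follow the usual Simple Baseline setup and are assumptions here, as is the illustrative num_keypoints=21:

    >>> import torch
    >>> from common.vision.models.keypoint_detection.pose_resnet import pose_resnet101
    >>> model = pose_resnet101(num_keypoints=21, pretrained_backbone=True)
    >>> images = torch.randn(2, 3, 256, 256)
    >>> heatmaps = model(images)   # expected shape (2, 21, 64, 64) under these assumptions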
Joint Loss

class common.vision.models.keypoint_detection.loss.JointsMSELoss(reduction='mean')
    Typical MSE loss for keypoint detection.

    Parameters
        reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean'. 'none': no reduction will be applied; 'mean': the sum of the output will be divided by the number of elements in the output. Default: 'mean'

    Inputs
        output (tensor): heatmap predictions
        target (tensor): heatmap labels
        target_weight (tensor): whether each keypoint is visible. If None, all keypoints are treated as visible. Default: None

    Shape
        output: \((minibatch, K, H, W)\), where K is the number of keypoints and H and W are the height and width of the heatmap respectively
        target: \((minibatch, K, H, W)\)
        target_weight: \((minibatch, K)\)
        Output: scalar by default. If reduction is 'none', then \((minibatch, K)\)
class common.vision.models.keypoint_detection.loss.JointsKLLoss(reduction='mean', epsilon=0.0)
    KL divergence loss for keypoint detection, proposed in "Regressive Domain Adaptation for Unsupervised Keypoint Detection".

    Parameters
        reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean'. 'none': no reduction will be applied; 'mean': the sum of the output will be divided by the number of elements in the output. Default: 'mean'

    Inputs
        output (tensor): heatmap predictions
        target (tensor): heatmap labels
        target_weight (tensor): whether each keypoint is visible. If None, all keypoints are treated as visible. Default: None

    Shape
        output: \((minibatch, K, H, W)\), where K is the number of keypoints and H and W are the height and width of the heatmap respectively
        target: \((minibatch, K, H, W)\)
        target_weight: \((minibatch, K)\)
        Output: scalar by default. If reduction is 'none', then \((minibatch, K)\)
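A usage sketch for both losses, exercising the documented input order output, target, target_weight with dummy heatmaps (whether the losses normalize the heatmaps internally is not stated here):

    >>> import torch
    >>> from common.vision.models.keypoint_detection.loss import JointsMSELoss, JointsKLLoss
    >>> B, K, H, W = 2, 21, 64, 64
    >>> output = torch.randn(B, K, H, W)      # heatmap predictions
    >>> target = torch.randn(B, K, H, W)      # heatmap labels
    >>> target_weight = torch.ones(B, K)      # all keypoints visible
    >>> mse = JointsMSELoss()(output, target, target_weight)   # scalar with reduction='mean'
    >>> kl = JointsKLLoss()(output, target, target_weight)     # same call contract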
Re-Identification

Models

class common.vision.models.reid.resnet.ReidResNet(*args, **kwargs)
    Modified ResNet architecture for ReID from "Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification" (ICLR 2020). We change the stride of layer4_group1_conv2 and layer4_group1_downsample1 to 1. During the forward pass, we do not activate self.relu. Please refer to the source code for details.

    Author: Baixu Chen (cbx_99_hasta@outlook.com)
common.vision.models.reid.resnet.reid_resnet18(pretrained=False, progress=True, **kwargs)
    Constructs a Reid-ResNet-18 model.
common.vision.models.reid.resnet.reid_resnet34(pretrained=False, progress=True, **kwargs)
    Constructs a Reid-ResNet-34 model.
common.vision.models.reid.resnet.reid_resnet50(pretrained=False, progress=True, **kwargs)
    Constructs a Reid-ResNet-50 model.
common.vision.models.reid.resnet.reid_resnet101(pretrained=False, progress=True, **kwargs)
    Constructs a Reid-ResNet-101 model.
class common.vision.models.reid.identifier.ReIdentifier(backbone, num_classes, bottleneck=None, bottleneck_dim=-1, finetune=True, pool_layer=None)
    Person re-identifier from "Bag of Tricks and A Strong Baseline for Deep Person Re-identification" (CVPR 2019). Given 2-d features \(f\) from the backbone network, the authors pass \(f\) through an additional BatchNorm1d layer to obtain \(bn\_f\), which is then passed through a Linear layer to produce predictions. During training, \(f\) is used to compute the triplet loss, while during testing, \(bn\_f\) serves as the feature. This may be a little confusing; the figures in the original paper will help you understand better.

    property features_dim
        The dimension of features before the final head layer.
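A construction sketch combining the pieces above (num_classes=751, the Market-1501 identity count, is only an illustrative value):

    >>> from common.vision.models.reid.resnet import reid_resnet50
    >>> from common.vision.models.reid.identifier import ReIdentifier
    >>> backbone = reid_resnet50(pretrained=True)
    >>> model = ReIdentifier(backbone, num_classes=751)
    >>> dim = model.features_dim   # dimension of features before the final head layer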
Loss

class common.vision.models.reid.loss.TripletLoss(margin, normalize_feature=False)
    Triplet loss augmented with batch-hard mining from "In Defense of the Triplet Loss for Person Re-Identification" (ICCV 2017).
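A sketch of the loss on dummy features. Only the constructor arguments are documented above; the call signature criterion(features, labels) and the margin value are assumptions:

    >>> import torch
    >>> from common.vision.models.reid.loss import TripletLoss
    >>> criterion = TripletLoss(margin=0.3, normalize_feature=True)
    >>> f = torch.randn(8, 2048)                          # 2-d features from the backbone
    >>> labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])   # two instances per identity
    >>> loss = criterion(f, labels)                       # call signature assumed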
Sampler

class common.utils.data.RandomMultipleGallerySampler(dataset, num_instances=4)
    Sampler from "In Defense of the Triplet Loss for Person Re-Identification" (ICCV 2017). Assume there are \(N\) identities in the dataset; this implementation samples \(K\) images for every identity to form an iterator of size \(N\times K\). During training, we call the __iter__ method of the PyTorch dataloader again once we reach a StopIteration, which guarantees that every image in the dataset will eventually be selected and no training data is wasted.
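A sketch of how the sampler typically plugs into a PyTorch DataLoader. Here dataset stands for a ReID dataset in whatever item format the sampler expects (see the source); choosing batch_size as a multiple of num_instances mirrors the \(N\times K\) scheme:

    >>> from torch.utils.data import DataLoader
    >>> from common.utils.data import RandomMultipleGallerySampler
    >>> sampler = RandomMultipleGallerySampler(dataset, num_instances=4)
    >>> loader = DataLoader(dataset, batch_size=32, sampler=sampler, drop_last=True)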