Vision Models¶

Image Classification¶

ResNets¶

Modified based on torchvision.models.resnet. @author: Junguang Jiang @contact: JiangJunguang1123@outlook.com

class common.vision.models.resnet.ResNet(*args, **kwargs)[source]¶

ResNets without fully connected layer

copy_head()[source]¶: Copy the origin fully connected layer

property out_features¶: The dimension of output features

common.vision.models.resnet.resnet18(pretrained=False, progress=True, **kwargs)[source]¶

ResNet-18 model from “Deep Residual Learning for Image Recognition”

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.resnet34(pretrained=False, progress=True, **kwargs)[source]¶

ResNet-34 model from “Deep Residual Learning for Image Recognition”

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.resnet50(pretrained=False, progress=True, **kwargs)[source]¶

ResNet-50 model from “Deep Residual Learning for Image Recognition”

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.resnet101(pretrained=False, progress=True, **kwargs)[source]¶

ResNet-101 model from “Deep Residual Learning for Image Recognition”

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.resnet152(pretrained=False, progress=True, **kwargs)[source]¶

ResNet-152 model from “Deep Residual Learning for Image Recognition”

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.resnext50_32x4d(pretrained=False, progress=True, **kwargs)[source]¶

ResNeXt-50 32x4d model from “Aggregated Residual Transformation for Deep Neural Networks”

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.resnext101_32x8d(pretrained=False, progress=True, **kwargs)[source]¶

ResNeXt-101 32x8d model from “Aggregated Residual Transformation for Deep Neural Networks”

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.wide_resnet50_2(pretrained=False, progress=True, **kwargs)[source]¶

Wide ResNet-50-2 model from “Wide Residual Networks”

The model is the same as ResNet except for the bottleneck number of channels which is twice larger in every block. The number of channels in outer 1x1 convolutions is the same, e.g. last block in ResNet-50 has 2048-512-2048 channels, and in Wide ResNet-50-2 has 2048-1024-2048.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.wide_resnet101_2(pretrained=False, progress=True, **kwargs)[source]¶

Wide ResNet-101-2 model from “Wide Residual Networks”

The model is the same as ResNet except for the bottleneck number of channels which is twice larger in every block. The number of channels in outer 1x1 convolutions is the same, e.g. last block in ResNet-50 has 2048-512-2048 channels, and in Wide ResNet-50-2 has 2048-1024-2048.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

LeNet¶

LeNet model from “Gradient-based learning applied to document recognition”

param num_classes: number of classes. Default: 10
type num_classes: int

Note

The input image size must be 28 x 28.

DTN¶

DTN model

param num_classes: number of classes. Default: 10
type num_classes: int

Note

The input image size must be 32 x 32.

Semantic Segmentation¶

common.vision.models.segmentation.deeplabv2.deeplabv2_resnet101(num_classes=19, pretrained_backbone=True)[source]¶

Constructs a DeepLabV2 model with a ResNet-101 backbone.

Parameters

num_classes (int, optional) – number of classes. Default: 19
pretrained_backbone (bool, optional) – If True, returns a model pre-trained on ImageNet. Default: True.

Keypoint Detection¶

PoseResNet¶

common.vision.models.keypoint_detection.pose_resnet.pose_resnet101(num_keypoints, pretrained_backbone=True, deconv_with_bias=False, finetune=False, progress=True, **kwargs)[source]¶

Constructs a Simple Baseline model with a ResNet-101 backbone.

Parameters

num_keypoints (int) – number of keypoints
pretrained_backbone (bool, optional) – If True, returns a model pre-trained on ImageNet. Default: True.
deconv_with_bias (bool, optional) – Whether use bias in the deconvolution layer. Default: False
finetune (bool, optional) – Whether use 10x smaller learning rate in the backbone. Default: False
progress (bool, optional) – If True, displays a progress bar of the download to stderr. Default: True

class common.vision.models.keypoint_detection.pose_resnet.PoseResNet(backbone, upsampling, feature_dim, num_keypoints, finetune=False)[source]¶

Simple Baseline for keypoint detection.

Parameters

backbone (torch.nn.Module) – Backbone to extract 2-d features from data
upsampling (torch.nn.Module) – Layer to upsample image feature to heatmap size
feature_dim (int) – The dimension of the features from upsampling layer.
num_keypoints (int) – Number of keypoints
finetune (bool, optional) – Whether use 10x smaller learning rate in the backbone. Default: False

class common.vision.models.keypoint_detection.pose_resnet.Upsampling(in_channel=2048, hidden_dims=(256, 256, 256), kernel_sizes=(4, 4, 4), bias=False)[source]¶: 3-layers deconvolution used in Simple Baseline.

Joint Loss¶

class common.vision.models.keypoint_detection.loss.JointsMSELoss(reduction='mean')[source]¶

Typical MSE loss for keypoint detection.

Parameters: reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output. Default: 'mean'

Inputs:

output (tensor): heatmap predictions
target (tensor): heatmap labels
target_weight (tensor): whether the keypoint is visible. All keypoint is visible if None. Default: None.

Shape:

output: \((minibatch, K, H, W)\) where K means the number of keypoints, H and W is the height and width of the heatmap respectively.
target: \((minibatch, K, H, W)\).
target_weight: \((minibatch, K)\).
Output: scalar by default. If reduction is 'none', then \((minibatch, K)\).

class common.vision.models.keypoint_detection.loss.JointsKLLoss(reduction='mean', epsilon=0.0)[source]¶

KL Divergence for keypoint detection proposed by Regressive Domain Adaptation for Unsupervised Keypoint Detection.

Parameters: reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output. Default: 'mean'

Inputs:

output (tensor): heatmap predictions
target (tensor): heatmap labels
target_weight (tensor): whether the keypoint is visible. All keypoint is visible if None. Default: None.

Shape:

output: \((minibatch, K, H, W)\) where K means the number of keypoints, H and W is the height and width of the heatmap respectively.
target: \((minibatch, K, H, W)\).
target_weight: \((minibatch, K)\).
Output: scalar by default. If reduction is 'none', then \((minibatch, K)\).

Re-Identification¶

Models¶

class common.vision.models.reid.resnet.ReidResNet(*args, **kwargs)[source]¶: Modified ResNet architecture for ReID from Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification (ICLR 2020). We change stride of \(layer4\_group1\_conv2, layer4\_group1\_downsample1\) to 1. During forward pass, we will not activate self.relu. Please refer to source code for details.

@author: Baixu Chen @contact: cbx_99_hasta@outlook.com

common.vision.models.reid.resnet.reid_resnet18(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a Reid-ResNet-18 model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.reid.resnet.reid_resnet34(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a Reid-ResNet-34 model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.reid.resnet.reid_resnet50(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a Reid-ResNet-50 model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.reid.resnet.reid_resnet101(pretrained=False, progress=True, **kwargs)[source]¶

Constructs a Reid-ResNet-101 model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr

class common.vision.models.reid.identifier.ReIdentifier(backbone, num_classes, bottleneck=None, bottleneck_dim=-1, finetune=True, pool_layer=None)[source]¶

Person reIdentifier from Bag of Tricks and A Strong Baseline for Deep Person Re-identification (CVPR 2019). Given 2-d features \(f\) from backbone network, the authors pass \(f\) through another BatchNorm1d layer and get \(bn\_f\), which will then pass through a Linear layer to output predictions. During training, we use \(f\) to compute triplet loss. While during testing, \(bn\_f\) is used as feature. This may be a little confusing. The figures in the origin paper will help you understand better.

property features_dim¶: The dimension of features before the final head layer

get_parameters(base_lr=1.0, rate=0.1)[source]¶: A parameter list which decides optimization hyper-parameters, such as the relative learning rate of each layer

Loss¶

class common.vision.models.reid.loss.TripletLoss(margin, normalize_feature=False)[source]¶

Triplet loss augmented with batch hard from In defense of the Triplet Loss for Person Re-Identification (ICCV 2017).

Parameters

margin (float) – margin of triplet loss
normalize_feature (bool, optional) – if True, normalize features into unit norm first before computing loss. Default: False.

Sampler¶

class common.utils.data.RandomMultipleGallerySampler(dataset, num_instances=4)[source]¶

Sampler from In defense of the Triplet Loss for Person Re-Identification (ICCV 2017). Assume there are \(N\) identities in the dataset, this implementation simply samples \(K\) images for every identity to form an iter of size \(N\times K\). During training, we will call __iter__ method of pytorch dataloader once we reach a StopIteration, this guarantees every image in the dataset will eventually be selected and we are not wasting any training data.

Parameters

dataset (list) – each element of this list is a tuple (image_path, person_id, camera_id)
num_instances (int, optional) – number of images to sample for every identity (\(K\) here)

Vision Models¶

Image Classification¶

ResNets¶

LeNet¶

DTN¶

Semantic Segmentation¶

Keypoint Detection¶

PoseResNet¶

Joint Loss¶

Re-Identification¶

Models¶

Loss¶

Sampler¶

Docs

Tutorials