Vision Models

Image Classification

ResNets

Modified from torchvision.models.resnet. @author: Junguang Jiang @contact: JiangJunguang1123@outlook.com

class common.vision.models.resnet.ResNet(*args, **kwargs)[source]

ResNets without the final fully connected layer

copy_head()[source]

Copy the original fully connected layer

property out_features

The dimension of output features
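
Example: a minimal usage sketch of the feature-extractor interface, using the resnet18 constructor documented below. The feature dimension of 512 and the 1000-way head are assumptions based on the standard ImageNet ResNet-18:

    import torch
    from common.vision.models.resnet import resnet18

    backbone = resnet18(pretrained=True)   # forward returns features, not class logits
    images = torch.randn(4, 3, 224, 224)   # a dummy batch of ImageNet-sized images
    features = backbone(images)            # shape (4, backbone.out_features); 512 here
    head = backbone.copy_head()            # a copy of the original fully connected layer
    logits = head(features)                # shape (4, 1000), the ImageNet classes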

common.vision.models.resnet.resnet18(pretrained=False, progress=True, **kwargs)[source]

ResNet-18 model from “Deep Residual Learning for Image Recognition”

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.resnet34(pretrained=False, progress=True, **kwargs)[source]

ResNet-34 model from “Deep Residual Learning for Image Recognition”

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.resnet50(pretrained=False, progress=True, **kwargs)[source]

ResNet-50 model from “Deep Residual Learning for Image Recognition”

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.resnet101(pretrained=False, progress=True, **kwargs)[source]

ResNet-101 model from “Deep Residual Learning for Image Recognition”

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.resnet152(pretrained=False, progress=True, **kwargs)[source]

ResNet-152 model from “Deep Residual Learning for Image Recognition”

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.resnext50_32x4d(pretrained=False, progress=True, **kwargs)[source]

ResNeXt-50 32x4d model from “Aggregated Residual Transformations for Deep Neural Networks”

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.resnext101_32x8d(pretrained=False, progress=True, **kwargs)[source]

ResNeXt-101 32x8d model from “Aggregated Residual Transformations for Deep Neural Networks”

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.wide_resnet50_2(pretrained=False, progress=True, **kwargs)[source]

Wide ResNet-50-2 model from “Wide Residual Networks”

The model is the same as ResNet except that the number of channels in the bottleneck is twice as large in every block. The number of channels in the outer 1x1 convolutions is unchanged, e.g. the last block in ResNet-50 has 2048-512-2048 channels, while in Wide ResNet-50-2 it has 2048-1024-2048.

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.resnet.wide_resnet101_2(pretrained=False, progress=True, **kwargs)[source]

Wide ResNet-101-2 model from “Wide Residual Networks”

The model is the same as ResNet except that the number of channels in the bottleneck is twice as large in every block. The number of channels in the outer 1x1 convolutions is unchanged, e.g. the last block in ResNet-50 has 2048-512-2048 channels, while in Wide ResNet-50-2 it has 2048-1024-2048.

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

LeNet

LeNet model from “Gradient-based learning applied to document recognition”

Parameters
  • num_classes (int, optional) – number of classes. Default: 10

Note

The input image size must be 28 x 28.
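
Example: a minimal sketch. The import path and the single input channel are assumptions, since the module path is not shown above and LeNet is typically used on grayscale digit images:

    import torch
    from common.vision.models.digits import LeNet  # import path assumed

    model = LeNet(num_classes=10)
    x = torch.randn(8, 1, 28, 28)   # the required 28 x 28 input; 1 channel assumed
    logits = model(x)               # expected shape: (8, 10)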

DTN

DTN model

Parameters
  • num_classes (int, optional) – number of classes. Default: 10

Note

The input image size must be 32 x 32.
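
Example: the same sketch for DTN. The import path and the three input channels are assumptions, since DTN is typically used on 32 x 32 RGB digit images:

    import torch
    from common.vision.models.digits import DTN  # import path assumed

    model = DTN(num_classes=10)
    x = torch.randn(8, 3, 32, 32)   # the required 32 x 32 input; 3 channels assumed
    logits = model(x)               # expected shape: (8, 10)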

Semantic Segmentation

common.vision.models.segmentation.deeplabv2.deeplabv2_resnet101(num_classes=19, pretrained_backbone=True)[source]

Constructs a DeepLabV2 model with a ResNet-101 backbone.

Parameters
  • num_classes (int, optional) – number of classes. Default: 19

  • pretrained_backbone (bool, optional) – If True, returns a model pre-trained on ImageNet. Default: True.
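
Example: a minimal construction sketch. The comment on the output is an assumption, since DeepLabV2 variants usually predict at a reduced output stride:

    import torch
    from common.vision.models.segmentation.deeplabv2 import deeplabv2_resnet101

    model = deeplabv2_resnet101(num_classes=19, pretrained_backbone=True)
    images = torch.randn(2, 3, 512, 512)
    outputs = model(images)  # per-pixel scores over 19 classes; the spatial size
                             # is typically smaller than the input and needs upsampling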

Keypoint Detection

PoseResNet

common.vision.models.keypoint_detection.pose_resnet.pose_resnet101(num_keypoints, pretrained_backbone=True, deconv_with_bias=False, finetune=False, progress=True, **kwargs)[source]

Constructs a Simple Baseline model with a ResNet-101 backbone.

Parameters
  • num_keypoints (int) – number of keypoints

  • pretrained_backbone (bool, optional) – If True, returns a model pre-trained on ImageNet. Default: True.

  • deconv_with_bias (bool, optional) – Whether to use bias in the deconvolution layers. Default: False

  • finetune (bool, optional) – Whether to use a 10x smaller learning rate for the backbone. Default: False

  • progress (bool, optional) – If True, displays a progress bar of the download to stderr. Default: True
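
Example: a minimal sketch. The 64 x 64 heatmap size is an assumption that follows from a 256 x 256 input and three stride-2 deconvolution layers:

    import torch
    from common.vision.models.keypoint_detection.pose_resnet import pose_resnet101

    model = pose_resnet101(num_keypoints=21, pretrained_backbone=True)
    images = torch.randn(2, 3, 256, 256)
    heatmaps = model(images)  # expected shape: (2, 21, 64, 64), one heatmap per keypoint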

class common.vision.models.keypoint_detection.pose_resnet.PoseResNet(backbone, upsampling, feature_dim, num_keypoints, finetune=False)[source]

Simple Baseline for keypoint detection.

Parameters
  • backbone (torch.nn.Module) – Backbone to extract 2-d features from data

  • upsampling (torch.nn.Module) – Layer to upsample image features to the heatmap size

  • feature_dim (int) – The dimension of the features from the upsampling layer.

  • num_keypoints (int) – Number of keypoints

  • finetune (bool, optional) – Whether to use a 10x smaller learning rate for the backbone. Default: False

class common.vision.models.keypoint_detection.pose_resnet.Upsampling(in_channel=2048, hidden_dims=(256, 256, 256), kernel_sizes=(4, 4, 4), bias=False)[source]

3-layer deconvolution used in Simple Baseline.

Joint Loss

class common.vision.models.keypoint_detection.loss.JointsMSELoss(reduction='mean')[source]

Typical MSE loss for keypoint detection.

Parameters

reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output. Default: 'mean'

Inputs:
  • output (tensor): heatmap predictions

  • target (tensor): heatmap labels

  • target_weight (tensor): whether each keypoint is visible. All keypoints are treated as visible if None. Default: None.

Shape:
  • output: \((minibatch, K, H, W)\) where K is the number of keypoints and H and W are the height and width of the heatmap, respectively.

  • target: \((minibatch, K, H, W)\).

  • target_weight: \((minibatch, K)\).

  • Output: scalar by default. If reduction is 'none', then \((minibatch, K)\).
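
Example: a minimal sketch that follows the documented input shapes:

    import torch
    from common.vision.models.keypoint_detection.loss import JointsMSELoss

    criterion = JointsMSELoss(reduction='mean')
    output = torch.randn(4, 21, 64, 64)   # heatmap predictions, (minibatch, K, H, W)
    target = torch.randn(4, 21, 64, 64)   # heatmap labels
    target_weight = torch.ones(4, 21)     # 1 where the keypoint is visible
    loss = criterion(output, target, target_weight)  # scalar under 'mean' reduction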

class common.vision.models.keypoint_detection.loss.JointsKLLoss(reduction='mean', epsilon=0.0)[source]

KL divergence loss for keypoint detection, proposed in Regressive Domain Adaptation for Unsupervised Keypoint Detection.

Parameters

reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output. Default: 'mean'

Inputs:
  • output (tensor): heatmap predictions

  • target (tensor): heatmap labels

  • target_weight (tensor): whether each keypoint is visible. All keypoints are treated as visible if None. Default: None.

Shape:
  • output: \((minibatch, K, H, W)\) where K is the number of keypoints and H and W are the height and width of the heatmap, respectively.

  • target: \((minibatch, K, H, W)\).

  • target_weight: \((minibatch, K)\).

  • Output: scalar by default. If reduction is 'none', then \((minibatch, K)\).

Re-Identification

Models

class common.vision.models.reid.resnet.ReidResNet(*args, **kwargs)[source]

Modified ResNet architecture for ReID from Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification (ICLR 2020). We change the stride of \(layer4\_group1\_conv2\) and \(layer4\_group1\_downsample1\) to 1. During the forward pass, self.relu is not applied. Please refer to the source code for details.

@author: Baixu Chen @contact: cbx_99_hasta@outlook.com

common.vision.models.reid.resnet.reid_resnet18(pretrained=False, progress=True, **kwargs)[source]

Constructs a Reid-ResNet-18 model.

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr
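
Example: a minimal sketch. The input resolution is just a common ReID choice, and the comment on the output is an assumption; whether the forward pass returns spatial maps or pooled features depends on the implementation:

    import torch
    from common.vision.models.reid.resnet import reid_resnet18

    backbone = reid_resnet18(pretrained=True)
    images = torch.randn(4, 3, 256, 128)  # a common person-ReID input resolution
    features = backbone(images)           # with stride 1 in layer4, spatial maps are
                                          # twice as large as in a standard ResNet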

common.vision.models.reid.resnet.reid_resnet34(pretrained=False, progress=True, **kwargs)[source]

Constructs a Reid-ResNet-34 model.

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.reid.resnet.reid_resnet50(pretrained=False, progress=True, **kwargs)[source]

Constructs a Reid-ResNet-50 model.

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

common.vision.models.reid.resnet.reid_resnet101(pretrained=False, progress=True, **kwargs)[source]

Constructs a Reid-ResNet-101 model.

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

class common.vision.models.reid.identifier.ReIdentifier(backbone, num_classes, bottleneck=None, bottleneck_dim=-1, finetune=True, pool_layer=None)[source]

Person re-identifier from Bag of Tricks and A Strong Baseline for Deep Person Re-identification (CVPR 2019). Given 2-d features \(f\) from the backbone network, the authors pass \(f\) through an additional BatchNorm1d layer to obtain \(bn\_f\), which is then fed into a Linear layer to output predictions. During training, \(f\) is used to compute the triplet loss, while during testing \(bn\_f\) is used as the feature. The figures in the original paper illustrate this two-feature design.

property features_dim

The dimension of features before the final head layer

get_parameters(base_lr=1.0, rate=0.1)[source]

Return a parameter list that decides optimization hyper-parameters, such as the relative learning rate of each layer.
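
Example: a minimal sketch of combining a ReID backbone with the identifier and its parameter groups. The number of classes is just an example (e.g. 751 identities in Market-1501), and torch.optim.SGD still expects an explicit lr even though get_parameters sets per-group rates:

    from torch.optim import SGD
    from common.vision.models.reid.resnet import reid_resnet50
    from common.vision.models.reid.identifier import ReIdentifier

    backbone = reid_resnet50(pretrained=True)
    model = ReIdentifier(backbone, num_classes=751)
    # parameter groups with relative learning rates, e.g. smaller for the backbone
    optimizer = SGD(model.get_parameters(base_lr=0.1), lr=0.1,
                    momentum=0.9, weight_decay=5e-4)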

Loss

class common.vision.models.reid.loss.TripletLoss(margin, normalize_feature=False)[source]

Triplet loss with batch-hard mining from In Defense of the Triplet Loss for Person Re-Identification (ICCV 2017).

Parameters
  • margin (float) – margin of triplet loss

  • normalize_feature (bool, optional) – if True, normalize features to unit norm before computing the loss. Default: False.
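
Example: a minimal sketch. The call signature (features, labels) is an assumption inferred from the description; check the source for the exact interface:

    import torch
    from common.vision.models.reid.loss import TripletLoss

    criterion = TripletLoss(margin=0.3, normalize_feature=True)
    features = torch.randn(16, 2048)     # 2-d features f from the backbone
    labels = torch.randint(0, 4, (16,))  # person identities: 4 ids, 4 instances each
    loss = criterion(features, labels)   # batch-hard triplet loss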

Sampler

class common.utils.data.RandomMultipleGallerySampler(dataset, num_instances=4)[source]

Sampler from In Defense of the Triplet Loss for Person Re-Identification (ICCV 2017). Assuming there are \(N\) identities in the dataset, this implementation samples \(K\) images for every identity to form an iteration of size \(N\times K\). During training, the __iter__ method of the PyTorch dataloader is called again whenever a StopIteration is raised, which guarantees that every image in the dataset is eventually selected and no training data is wasted.

Parameters
  • dataset (list) – each element of this list is a tuple (image_path, person_id, camera_id)

  • num_instances (int, optional) – number of images to sample for every identity (\(K\) here)
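
Example: a minimal sketch with a dummy dataset. The DummyReidSet class is purely illustrative; any Dataset whose indices match the sample list works:

    import torch
    from torch.utils.data import DataLoader, Dataset
    from common.utils.data import RandomMultipleGallerySampler

    # a dummy dataset in the documented (image_path, person_id, camera_id) format
    samples = [("img_%d.jpg" % i, i % 8, i % 2) for i in range(64)]

    class DummyReidSet(Dataset):
        def __len__(self):
            return len(samples)
        def __getitem__(self, idx):
            path, pid, cid = samples[idx]
            return torch.randn(3, 256, 128), pid, cid

    sampler = RandomMultipleGallerySampler(samples, num_instances=4)
    # pick batch_size as a multiple of num_instances, so each batch holds
    # batch_size / num_instances identities with num_instances images each
    loader = DataLoader(DummyReidSet(), batch_size=32, sampler=sampler, drop_last=True)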
