Vision Transforms¶

Classification¶

class common.vision.transforms.DeNormalizeAndTranspose(mean=(104.00698793, 116.66876762, 122.67891434))[source]¶: First, convert a tensor image from shape (C x H x W) to shape (H x W x C). Then, denormalize it by adding back the per-channel mean. The signature takes only mean, so no standard deviation is applied.

class common.vision.transforms.Denormalize(mean, std)[source]¶

Denormalize a tensor image with mean and standard deviation. Given mean: (mean[1], ..., mean[n]) and std: (std[1], ..., std[n]) for n channels, this transform denormalizes each channel of the input torch.*Tensor, i.e., output[channel] = input[channel] * std[channel] + mean[channel].

Note

This transform acts out of place, i.e., it does not mutate the input tensor.

Parameters

mean (sequence) – Sequence of means for each channel.
std (sequence) – Sequence of standard deviations for each channel.
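
The per-channel formula above can be sketched in plain Python. This is only an illustration on nested lists shaped [C][H][W], not on a torch tensor, and `denormalize_sketch` is a hypothetical name that is not part of the library.

```python
def denormalize_sketch(image, mean, std):
    """Undo per-channel normalization on a nested list shaped [C][H][W].

    Hypothetical helper for illustration; the real transform operates on
    torch tensors. A new structure is built, so the input is not mutated.
    """
    return [
        [[value * s + m for value in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]


normalized = [[[0.0, 1.0]], [[-1.0, 0.5]]]  # 2 channels, 1 x 2 pixels each
restored = denormalize_sketch(normalized, mean=(0.5, 0.25), std=(0.5, 0.5))
```

Because the result is a fresh list, `normalized` keeps its original values, which mirrors the out-of-place behaviour described in the note.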

class common.vision.transforms.MultipleApply(transforms)[source]¶

Apply a list of transformations to an image and get multiple transformed images.

Parameters: transforms (list or tuple) – list of transformations

Example

>>> transform1 = T.Compose([
...     ResizeImage(256),
...     T.RandomCrop(224)
... ])
>>> transform2 = T.Compose([
...     ResizeImage(256),
...     T.RandomCrop(224),
... ])
>>> multiply_transform = MultipleApply([transform1, transform2])
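
As a rough sketch of what "multiple transformed images" means, the class below applies every transform to the same input and collects the results. `MultipleApplySketch` is a hypothetical stand-in, and returning a list is an assumption; toy functions on lists replace real image transforms.

```python
class MultipleApplySketch:
    """Hypothetical stand-in: apply every transform to the same input."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, image):
        # One output per transform, in the order the transforms were given.
        return [transform(image) for transform in self.transforms]


# Toy "transforms" on a list of pixel values instead of a PIL Image.
def brighten(pixels):
    return [p + 10 for p in pixels]


def mirror(pixels):
    return pixels[::-1]


views = MultipleApplySketch([brighten, mirror])([1, 2, 3])
```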

class common.vision.transforms.NormalizeAndTranspose(mean=(104.00698793, 116.66876762, 122.67891434))[source]¶: First, normalize a tensor image by subtracting the per-channel mean. The signature takes only mean, so no standard deviation is applied. Then, convert the shape (H x W x C) to shape (C x H x W).

class common.vision.transforms.RandomErasing(probability=0.5, sl=0.02, sh=0.4, r1=0.3, mean=(0.4914, 0.4822, 0.4465))[source]¶

Random erasing augmentation from Random Erasing Data Augmentation (Zhong et al., arXiv 2017; published at AAAI 2020). This augmentation randomly selects a rectangular region in an image and erases its pixels.

Parameters

probability (float) – The probability that the Random Erasing operation will be performed.
sl (float) – Minimum proportion of the erased area relative to the input image.
sh (float) – Maximum proportion of the erased area relative to the input image.
r1 (float) – Minimum aspect ratio of erased area.
mean (sequence) – Value to fill the erased area.
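
How sl, sh and r1 could interact when picking the rectangle is sketched below. The function name, the retry loop, and the upper aspect-ratio bound of 1 / r1 are assumptions modeled on the Random Erasing paper; this page does not confirm them.

```python
import math
import random


def sample_erase_box(height, width, sl=0.02, sh=0.4, r1=0.3,
                     attempts=100, rng=random):
    """Return (top, left, h, w) of a rectangle to erase, or None.

    Assumptions (not stated on this page): the aspect ratio is drawn from
    [r1, 1 / r1], and sampling is retried until the box fits in the image.
    """
    area = height * width
    for _ in range(attempts):
        target_area = rng.uniform(sl, sh) * area
        aspect_ratio = rng.uniform(r1, 1 / r1)
        h = int(round(math.sqrt(target_area * aspect_ratio)))
        w = int(round(math.sqrt(target_area / aspect_ratio)))
        if 0 < h < height and 0 < w < width:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            return top, left, h, w
    return None


box = sample_erase_box(32, 32, rng=random.Random(0))
```

In a full transform, the returned box would be filled with the mean values, and the whole step would be skipped with probability 1 - probability.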

class common.vision.transforms.ResizeImage(size)[source]¶

Resize the input PIL Image to the given size.

Parameters: size (sequence or int) – Desired output size. If size is a sequence like (h, w), the output size will be matched to this. If size is an int, the output size will be (size, size).

Segmentation¶

@author: Junguang Jiang @contact: JiangJunguang1123@outlook.com

common.vision.transforms.segmentation.ColorJitter¶: alias of common.vision.transforms.segmentation.wrapper.<locals>.WrapperTransform

class common.vision.transforms.segmentation.Compose(transforms)[source]¶

Composes several transforms together.

Parameters: transforms (list) – list of transforms to compose.

Example

>>> Compose([
...     Resize((512, 512)),
...     RandomHorizontalFlip()
... ])
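
The segmentation transforms on this page take an image and a label together, so composing them means threading the pair through each step. The sketch below uses a hypothetical `ComposeSketch` class and toy functions on lists to show that chaining; it is not the library's implementation.

```python
class ComposeSketch:
    """Hypothetical stand-in: chain transforms over an (image, label) pair."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, label):
        for transform in self.transforms:
            # Each step receives and returns both the image and the label.
            image, label = transform(image, label)
        return image, label


def flip(image, label):
    # Geometric change: applied to image and label alike.
    return image[::-1], label[::-1]


def scale_image(image, label):
    # Photometric change: the label passes through untouched.
    return [2 * p for p in image], label


image, label = ComposeSketch([flip, scale_image])([1, 2, 3], [0, 0, 1])
```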

common.vision.transforms.segmentation.MultipleApply¶: alias of common.vision.transforms.segmentation.wrapper.<locals>.WrapperTransform

common.vision.transforms.segmentation.Normalize¶: alias of common.vision.transforms.segmentation.wrapper.<locals>.WrapperTransform

common.vision.transforms.segmentation.NormalizeAndTranspose¶: alias of common.vision.transforms.segmentation.wrapper.<locals>.WrapperTransform

class common.vision.transforms.segmentation.RandomApply(transforms, p=0.5)[source]¶

Randomly apply a list of transformations with a given probability.

Parameters

transforms (list or tuple or torch.nn.Module) – list of transformations
p (float) – probability
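
A minimal sketch of this behaviour follows, assuming all-or-nothing semantics as in torchvision's RandomApply: either every transform in the list runs, or none does. `RandomApplySketch` and its `rng` argument are hypothetical names introduced for illustration.

```python
import random


class RandomApplySketch:
    """Hypothetical stand-in for an all-or-nothing RandomApply on pairs."""

    def __init__(self, transforms, p=0.5, rng=random):
        self.transforms = transforms
        self.p = p
        self.rng = rng

    def __call__(self, image, label):
        # With probability 1 - p, return the pair untouched.
        if self.rng.random() >= self.p:
            return image, label
        for transform in self.transforms:
            image, label = transform(image, label)
        return image, label


def invert(image, label):
    return [255 - p for p in image], label


always = RandomApplySketch([invert], p=1.0)([0, 255], [1, 0])
never = RandomApplySketch([invert], p=0.0)([0, 255], [1, 0])
```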

class common.vision.transforms.segmentation.RandomChoice(transforms)[source]¶: Apply a single transformation randomly picked from a list.

class common.vision.transforms.segmentation.RandomCrop(size)[source]¶

Crop the given image at a random location. The image can be a PIL Image.

Parameters: size (sequence) – Desired output size of the crop.

forward(image, label)[source]¶

Parameters

image – (PIL Image): Image to be cropped.
label – (PIL Image): Segmentation label to be cropped.

Returns

Cropped image, cropped segmentation label.

class common.vision.transforms.segmentation.RandomHorizontalFlip(p=0.5)[source]¶

Horizontally flip the given PIL Image randomly with a given probability.

Parameters: p (float) – probability of the image being flipped. Default value is 0.5.

forward(image, label)[source]¶

Parameters

image – (PIL Image): Image to be flipped.
label – (PIL Image): Segmentation label to be flipped.

Returns

Randomly flipped image, randomly flipped segmentation label.
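
The key point is that the image and its label must be flipped by the same random decision, otherwise they fall out of alignment. A small sketch on nested lists shaped [H][W], with the hypothetical helper `random_hflip_pair`:

```python
import random


def random_hflip_pair(image, label, p=0.5, rng=random):
    """Flip both nested lists [H][W] left-to-right with probability p."""
    if rng.random() < p:
        # A single random draw decides the flip for image and label together.
        image = [row[::-1] for row in image]
        label = [row[::-1] for row in label]
    return image, label


image = [[1, 2, 3], [4, 5, 6]]
label = [[0, 0, 1], [1, 0, 0]]
flipped_image, flipped_label = random_hflip_pair(image, label, p=1.0)
same_image, same_label = random_hflip_pair(image, label, p=0.0)
```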

class common.vision.transforms.segmentation.RandomResizedCrop(size, scale=(0.5, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=3)[source]¶

Crop the given image to random size and aspect ratio. The image can be a PIL Image.

A crop of random size (default: 0.5 to 1.0 of the original size) and random aspect ratio (default: 3/4 to 4/3 of the original aspect ratio) is made. This crop is finally resized to the given size.

Parameters

size (int or sequence) – expected output size of each edge. If size is an int instead of sequence like (h, w), a square output size (size, size) is made. If provided a tuple or list of length 1, it will be interpreted as (size[0], size[0]).
scale (tuple of float) – range of the crop size relative to the original image size.
ratio (tuple of float) – range of the crop aspect ratio relative to the original aspect ratio.
interpolation – Default: PIL.Image.BICUBIC (the integer value 3 in the signature above).

forward(image, label)[source]¶

Parameters

image – (PIL Image): Image to be cropped and resized.
label – (PIL Image): Segmentation label to be cropped and resized.

Returns

Randomly cropped and resized image, randomly cropped and resized segmentation label.

static get_params(img, scale, ratio)[source]¶

Get parameters for crop for a random sized crop.

Parameters

img (PIL Image) – Input image.
scale (list) – range of the crop size relative to the original image size
ratio (list) – range of the crop aspect ratio relative to the original aspect ratio

Returns

params (i, j, h, w) to be passed to crop for a random sized crop.
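
One plausible way to sample such parameters is sketched below. It is modeled on torchvision's RandomResizedCrop and treats scale as a fraction of the image area; the retry count, the whole-image fallback, and the use of plain width and height instead of a PIL Image are simplifying assumptions, not taken from this page.

```python
import math
import random


def get_params_sketch(width, height, scale=(0.5, 1.0), ratio=(3 / 4, 4 / 3),
                      attempts=10, rng=random):
    """Sample (i, j, h, w): top, left, height and width of a crop box.

    Hypothetical helper; the retry count and the fallback are assumptions.
    """
    area = width * height
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(attempts):
        target_area = rng.uniform(*scale) * area
        # Sample the aspect ratio log-uniformly so 3/4 and 4/3 are symmetric.
        aspect_ratio = math.exp(rng.uniform(*log_ratio))
        w = int(round(math.sqrt(target_area * aspect_ratio)))
        h = int(round(math.sqrt(target_area / aspect_ratio)))
        if 0 < w <= width and 0 < h <= height:
            i = rng.randint(0, height - h)
            j = rng.randint(0, width - w)
            return i, j, h, w
    return 0, 0, height, width  # fallback: keep the whole image


i, j, h, w = get_params_sketch(640, 480, rng=random.Random(0))
```

The same (i, j, h, w) would then be used to crop both the image and the segmentation label before resizing.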

class common.vision.transforms.segmentation.Resize(image_size, label_size=None)[source]¶

Resize the input image and the corresponding label to the given size. The image should be a PIL Image.

Parameters

image_size (sequence) – The requested image size in pixels, as a 2-tuple: (width, height).
label_size (sequence, optional) – The requested segmentation label size in pixels, as a 2-tuple: (width, height). The same as image_size if None. Default: None.

forward(image, label)[source]¶

Parameters

image – (PIL Image): Image to be scaled.
label – (PIL Image): Segmentation label to be scaled.

Returns

Rescaled image, rescaled segmentation label.

common.vision.transforms.segmentation.ToPILImage¶: alias of common.vision.transforms.segmentation.wrapper.<locals>.WrapperTransform

common.vision.transforms.segmentation.ToTensor¶: alias of common.vision.transforms.segmentation.wrapper.<locals>.WrapperTransform

common.vision.transforms.segmentation.wrapper(transform)[source]¶

Wrap a classification transform so that it can be used as a segmentation transform. Note that the segmentation label is left unchanged by the wrapped transform.

Parameters: transform (class, callable) – transform for classification
Returns: transform for segmentation
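
The alias entries above point at a `WrapperTransform` class defined inside `wrapper`, which suggests a subclass-based design along the following lines. This is a hedged re-creation, not the library source: `wrapper_sketch` and the toy `AddOffset` transform are hypothetical.

```python
def wrapper_sketch(transform_cls):
    """Hypothetical re-creation: lift an image-only transform class so that
    it accepts (image, label) and passes the label through untouched."""

    class WrapperTransform(transform_cls):
        def __call__(self, image, label):
            # Delegate the image to the original transform; keep the label.
            image = super().__call__(image)
            return image, label

    return WrapperTransform


class AddOffset:
    """Toy classification-style transform acting on a list of pixels."""

    def __init__(self, offset):
        self.offset = offset

    def __call__(self, image):
        return [p + self.offset for p in image]


PairedAddOffset = wrapper_sketch(AddOffset)
image, label = PairedAddOffset(5)([1, 2, 3], [0, 1, 0])
```

Wrapping this way is only safe for transforms that do not move pixels, such as colour changes or normalization. Geometric transforms need a dedicated paired implementation like the classes documented above.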

Keypoint Detection¶

@author: Junguang Jiang @contact: JiangJunguang1123@outlook.com

class common.vision.transforms.keypoint_detection.CenterCrop(size)[source]¶: Crops the given PIL Image at the center.

common.vision.transforms.keypoint_detection.ColorJitter¶: alias of common.vision.transforms.keypoint_detection.wrapper.<locals>.WrapperTransform

class common.vision.transforms.keypoint_detection.Compose(transforms)[source]¶

Composes several transforms together.

Parameters: transforms (list of Transform objects) – list of transforms to compose.

common.vision.transforms.keypoint_detection.Normalize¶: alias of common.vision.transforms.keypoint_detection.wrapper.<locals>.WrapperTransform

class common.vision.transforms.keypoint_detection.RandomApply(transforms, p=0.5)[source]¶

Randomly apply a list of transformations with a given probability.

Parameters

transforms (list or tuple or torch.nn.Module) – list of transformations
p (float) – probability

class common.vision.transforms.keypoint_detection.RandomResizedCrop(size, scale=(0.6, 1.3), interpolation=2)[source]¶

Crop the given PIL Image to random size and aspect ratio.

A crop of random size (default: 0.6 to 1.3 of the original size, matching the scale default in the signature above) and a random aspect ratio (default: 3/4 to 4/3 of the original aspect ratio) is made. This crop is finally resized to the given size. This is popularly used to train the Inception networks.

Parameters

size – expected output size of each edge
scale – range of the crop size relative to the original image size
ratio – range of the crop aspect ratio relative to the original aspect ratio
interpolation – Default: PIL.Image.BILINEAR

static get_params(img, scale)[source]¶

Get parameters for crop for a random sized crop.

Parameters

img (PIL Image) – Image to be cropped.
scale (tuple) – range of the crop size relative to the original image size

Returns

params (i, j, h, w) to be passed to crop for a random sized crop.

Return type

tuple

class common.vision.transforms.keypoint_detection.RandomRotation(degrees)[source]¶

Rotate the image by angle.

Parameters: degrees (sequence or float or int) – Range of degrees to select from. If degrees is a number instead of sequence like (min, max), the range of degrees will be (-degrees, +degrees).

static get_params(degrees)[source]¶

Get parameters for rotate for a random rotation.

Returns: params to be passed to rotate for random rotation.
Return type: sequence
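
The handling of `degrees` described above can be sketched as follows. Both helper names are hypothetical, and drawing the angle uniformly from the range is an assumption borrowed from torchvision's RandomRotation.

```python
import random


def rotation_range(degrees):
    """Normalize `degrees` to a (min, max) pair as described above."""
    if isinstance(degrees, (int, float)):
        return (-degrees, degrees)
    low, high = degrees
    return (low, high)


def sample_angle(degrees, rng=random):
    # Assumption: the angle is drawn uniformly from the normalized range.
    low, high = rotation_range(degrees)
    return rng.uniform(low, high)


angle = sample_angle(30, rng=random.Random(0))
```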

class common.vision.transforms.keypoint_detection.Resize(size, interpolation=2)[source]¶: Resize the input PIL Image to the given size.

class common.vision.transforms.keypoint_detection.ResizePad(size, interpolation=2)[source]¶: Pad the given image on all sides with the given “pad” value to resize the image to the given size.

common.vision.transforms.keypoint_detection.ToTensor¶: alias of common.vision.transforms.keypoint_detection.wrapper.<locals>.WrapperTransform

common.vision.transforms.keypoint_detection.center_crop(image, output_size, keypoint2d)[source]¶

Crop the given PIL Image at the center.

Parameters

image (PIL Image) – Image to be cropped. (0,0) denotes the top left corner of the image.
output_size (sequence or int) – (height, width) of the crop box. If int, it is used for both directions.

Returns

Cropped image.

Return type

PIL Image

common.vision.transforms.keypoint_detection.resized_crop(img, top, left, height, width, size, interpolation=2, keypoint2d=None, intrinsic_matrix=None)[source]¶

Crop the given PIL Image and resize it to the desired size.

Notably used in RandomResizedCrop.

Parameters

img (PIL Image) – Image to be cropped. (0,0) denotes the top left corner of the image.
top (int) – Vertical component of the top left corner of the crop box.
left (int) – Horizontal component of the top left corner of the crop box.
height (int) – Height of the crop box.
width (int) – Width of the crop box.
size (sequence or int) – Desired output size. Same semantics as resize.
interpolation (int, optional) – Desired interpolation. Default is PIL.Image.BILINEAR.

Returns

Cropped image.

Return type

PIL Image
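
The signature also accepts keypoint2d, which this page does not document. When an image is cropped and then resized, keypoints have to follow the same geometry, and the sketch below shows that coordinate mapping. The function name, the (x, y) ordering, and the separate output height and width are assumptions for illustration only.

```python
def remap_keypoints(keypoints, top, left, height, width,
                    out_height, out_width):
    """Map (x, y) keypoints from the source image into the resized crop.

    Pure geometry sketch; the name and the (x, y) ordering are assumptions,
    since this page does not describe the keypoint2d layout.
    """
    scale_x = out_width / width
    scale_y = out_height / height
    # Shift into the crop's coordinate frame, then scale to the output size.
    return [((x - left) * scale_x, (y - top) * scale_y) for x, y in keypoints]


points = remap_keypoints([(30, 50), (70, 90)], top=40, left=20,
                         height=100, width=100,
                         out_height=200, out_width=200)
```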

common.vision.transforms.keypoint_detection.wrapper(transform)[source]¶

Wrap a classification transform so that it can be used as a keypoint detection transform. Note that the keypoint detection label is left unchanged by the wrapped transform.

Parameters: transform (class, callable) – transform for classification
Returns: transform for keypoint detection
