Domain Translation
CycleGAN: Cycle-Consistent Adversarial Networks
Discriminator
dalib.translation.cyclegan.pixel(ndf, input_nc=3, norm='batch', init_type='normal', init_gain=0.02)

1x1 PixelGAN discriminator that classifies whether each pixel is real or fake. It encourages greater color diversity but has no effect on spatial statistics.

- Parameters
  - ndf (int) – the number of filters in the first conv layer
  - input_nc (int) – the number of channels in input images. Default: 3
  - norm (str) – the type of normalization layers used in the network. Default: 'batch'
  - init_type (str) – the name of the initialization method. Choices include: 'normal' | 'xavier' | 'kaiming' | 'orthogonal'. Default: 'normal'
  - init_gain (float) – scaling factor for normal, xavier and orthogonal initialization. Default: 0.02
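A minimal usage sketch (assuming the factory returns a standard torch.nn.Module, as the signature suggests):

>>> import torch
>>> from dalib.translation.cyclegan import pixel
>>> netD = pixel(ndf=64)  # 1x1 PixelGAN discriminator
>>> images = torch.randn(4, 3, 256, 256)
>>> scores = netD(images)  # unnormalized per-pixel real/fake scores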
dalib.translation.cyclegan.patch(ndf, input_nc=3, norm='batch', n_layers=3, init_type='normal', init_gain=0.02)

PatchGAN classifier described in the original pix2pix paper. It classifies whether 70×70 overlapping patches are real or fake. Such a patch-level discriminator has fewer parameters than a full-image discriminator and can work on arbitrarily-sized images in a fully convolutional fashion.

- Parameters
  - ndf (int) – the number of filters in the first conv layer
  - input_nc (int) – the number of channels in input images. Default: 3
  - norm (str) – the type of normalization layers used in the network. Default: 'batch'
  - n_layers (int) – the number of conv layers in the discriminator. Default: 3
  - init_type (str) – the name of the initialization method. Choices include: 'normal' | 'xavier' | 'kaiming' | 'orthogonal'. Default: 'normal'
  - init_gain (float) – scaling factor for normal, xavier and orthogonal initialization. Default: 0.02
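Because the discriminator is fully convolutional, it accepts inputs of different sizes; a minimal sketch:

>>> import torch
>>> from dalib.translation.cyclegan import patch
>>> netD = patch(ndf=64, n_layers=3)  # 70x70 PatchGAN discriminator
>>> for size in (128, 256):  # fully convolutional, so different sizes work
...     scores = netD(torch.randn(2, 3, size, size))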
Generator

dalib.translation.cyclegan.resnet_9(ngf, input_nc=3, output_nc=3, norm='batch', use_dropout=False, init_type='normal', init_gain=0.02)

ResNet-based generator with 9 ResNet blocks.

- Parameters
  - ngf (int) – the number of filters in the last conv layer
  - input_nc (int) – the number of channels in input images. Default: 3
  - output_nc (int) – the number of channels in output images. Default: 3
  - norm (str) – the type of normalization layers used in the network. Default: 'batch'
  - use_dropout (bool) – whether to use dropout. Default: False
  - init_type (str) – the name of the initialization method. Choices include: 'normal' | 'xavier' | 'kaiming' | 'orthogonal'. Default: 'normal'
  - init_gain (float) – scaling factor for normal, xavier and orthogonal initialization. Default: 0.02
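A minimal sketch (that the generator preserves the input resolution is standard CycleGAN behavior, assumed here):

>>> import torch
>>> from dalib.translation.cyclegan import resnet_9
>>> netG = resnet_9(ngf=64)
>>> source = torch.randn(2, 3, 256, 256)
>>> translated = netG(source)  # translated images at the same spatial size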
dalib.translation.cyclegan.resnet_6(ngf, input_nc=3, output_nc=3, norm='batch', use_dropout=False, init_type='normal', init_gain=0.02)

ResNet-based generator with 6 ResNet blocks.

- Parameters
  - ngf (int) – the number of filters in the last conv layer
  - input_nc (int) – the number of channels in input images. Default: 3
  - output_nc (int) – the number of channels in output images. Default: 3
  - norm (str) – the type of normalization layers used in the network. Default: 'batch'
  - use_dropout (bool) – whether to use dropout. Default: False
  - init_type (str) – the name of the initialization method. Choices include: 'normal' | 'xavier' | 'kaiming' | 'orthogonal'. Default: 'normal'
  - init_gain (float) – scaling factor for normal, xavier and orthogonal initialization. Default: 0.02
dalib.translation.cyclegan.unet_256(ngf, input_nc=3, output_nc=3, norm='batch', use_dropout=False, init_type='normal', init_gain=0.02)

U-Net generator for 256x256 input images. The size of the input image should be a multiple of 256.

- Parameters
  - ngf (int) – the number of filters in the last conv layer
  - input_nc (int) – the number of channels in input images. Default: 3
  - output_nc (int) – the number of channels in output images. Default: 3
  - norm (str) – the type of normalization layers used in the network. Default: 'batch'
  - use_dropout (bool) – whether to use dropout. Default: False
  - init_type (str) – the name of the initialization method. Choices include: 'normal' | 'xavier' | 'kaiming' | 'orthogonal'. Default: 'normal'
  - init_gain (float) – scaling factor for normal, xavier and orthogonal initialization. Default: 0.02
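A minimal sketch; per the constraint above, the input height and width must be multiples of 256:

>>> import torch
>>> from dalib.translation.cyclegan import unet_256
>>> netG = unet_256(ngf=64)
>>> images = torch.randn(2, 3, 256, 256)  # H and W must be multiples of 256
>>> outputs = netG(images)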
dalib.translation.cyclegan.unet_128(ngf, input_nc=3, output_nc=3, norm='batch', use_dropout=False, init_type='normal', init_gain=0.02)

U-Net generator for 128x128 input images. The size of the input image should be a multiple of 128.

- Parameters
  - ngf (int) – the number of filters in the last conv layer
  - input_nc (int) – the number of channels in input images. Default: 3
  - output_nc (int) – the number of channels in output images. Default: 3
  - norm (str) – the type of normalization layers used in the network. Default: 'batch'
  - use_dropout (bool) – whether to use dropout. Default: False
  - init_type (str) – the name of the initialization method. Choices include: 'normal' | 'xavier' | 'kaiming' | 'orthogonal'. Default: 'normal'
  - init_gain (float) – scaling factor for normal, xavier and orthogonal initialization. Default: 0.02
GAN Loss

class dalib.translation.cyclegan.LeastSquaresGenerativeAdversarialLoss(reduction='mean')

Loss for the Least Squares Generative Adversarial Network (LSGAN).

- Parameters
  - reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'
- Inputs:
  - prediction (tensor): unnormalized discriminator predictions
  - real (bool): whether the ground-truth label is for real images or fake images. Default: True

Warning
Do not use a sigmoid as the last layer of the discriminator.
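A minimal sketch of a discriminator and generator step, assuming the module is called with the documented inputs (prediction, real):

>>> import torch
>>> from dalib.translation.cyclegan import patch, LeastSquaresGenerativeAdversarialLoss
>>> netD = patch(ndf=64)  # no sigmoid at the end, per the warning above
>>> gan_loss = LeastSquaresGenerativeAdversarialLoss()
>>> real_images = torch.randn(4, 3, 256, 256)
>>> fake_images = torch.randn(4, 3, 256, 256)  # stand-in for generator outputs
>>> # discriminator step: push real predictions towards 1, fake predictions towards 0
>>> loss_d = gan_loss(netD(real_images), real=True) + gan_loss(netD(fake_images.detach()), real=False)
>>> # generator step: make the discriminator label the fakes as real
>>> loss_g = gan_loss(netD(fake_images), real=True)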
class dalib.translation.cyclegan.VanillaGenerativeAdversarialLoss(reduction='mean')

Loss for the vanilla (original) Generative Adversarial Network.

- Parameters
  - reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'
- Inputs:
  - prediction (tensor): unnormalized discriminator predictions
  - real (bool): whether the ground-truth label is for real images or fake images. Default: True

Warning
Do not use a sigmoid as the last layer of the discriminator.
class dalib.translation.cyclegan.WassersteinGenerativeAdversarialLoss(reduction='mean')

Loss for the Wasserstein Generative Adversarial Network (WGAN).

- Parameters
  - reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'
- Inputs:
  - prediction (tensor): unnormalized discriminator predictions
  - real (bool): whether the ground-truth label is for real images or fake images. Default: True

Warning
Do not use a sigmoid as the last layer of the discriminator.
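All three criteria share the same documented inputs (prediction, real), so they can be swapped without changing the rest of the training loop; a minimal sketch:

>>> from dalib.translation.cyclegan import (
...     VanillaGenerativeAdversarialLoss,
...     LeastSquaresGenerativeAdversarialLoss,
...     WassersteinGenerativeAdversarialLoss,
... )
>>> # pick one criterion; the discriminator and generator updates are unchanged
>>> gan_loss = LeastSquaresGenerativeAdversarialLoss()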
Translation

class dalib.translation.cyclegan.Translation(generator, device=device(type='cpu'), mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))

Image translation transform module.

- Parameters
  - generator (torch.nn.Module) – an image generator, e.g. resnet_9()
  - device (torch.device) – the device on which to put the generator. Default: 'cpu'
  - mean (tuple) – the mean used to normalize images
  - std (tuple) – the std used to normalize images
- Input:
  - image (PIL.Image): raw image in shape H x W x C
- Output:
  - translated image in shape H x W x 3
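A minimal sketch of using Translation as a preprocessing transform (the image path is hypothetical, and in practice the generator should first be loaded with trained weights):

>>> import torch
>>> from PIL import Image
>>> from dalib.translation.cyclegan import resnet_9, Translation
>>> netG = resnet_9(ngf=64)  # load trained weights into the generator first
>>> translate = Translation(netG, device=torch.device('cpu'))
>>> image = Image.open("path/to/source_image")  # hypothetical path
>>> translated = translate(image)  # the source image rendered in the target style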
Util

class dalib.translation.cyclegan.util.ImagePool(pool_size)

An image buffer that stores previously generated images.

This buffer enables us to update discriminators using a history of generated images rather than only the ones produced by the latest generators.

- Parameters
  - pool_size (int) – the size of the image buffer. If pool_size=0, no buffer is created.

query(images)

Return images from the pool.

- Parameters
  - images (torch.Tensor) – the latest images generated by the generator
- Returns
  With probability 0.5, the buffer returns the input images. With probability 0.5, it returns images previously stored in the buffer and inserts the current images into the buffer.
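A minimal sketch of the buffer in a discriminator update (the pool size of 50 is an assumption, matching common CycleGAN training setups):

>>> import torch
>>> from dalib.translation.cyclegan.util import ImagePool
>>> fake_pool = ImagePool(pool_size=50)
>>> fake_images = torch.randn(4, 3, 256, 256)  # stand-in for the latest generator outputs
>>> # a mix of current and historical fakes for the discriminator update
>>> fake_for_d = fake_pool.query(fake_images)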
CyCADA: Cycle-Consistent Adversarial Domain Adaptation

class dalib.translation.cycada.SemanticConsistency(ignore_index=(), reduction='mean')

Semantic consistency loss, introduced by CyCADA: Cycle-Consistent Adversarial Domain Adaptation (ICML 2018). It helps to prevent label flipping during image translation.

- Parameters
  - ignore_index (tuple, optional) – Specifies target values that are ignored and do not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets. Default: ().
  - reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the weighted mean of the output is taken, 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated; in the meantime, specifying either of those two args will override reduction. Default: 'mean'
- Shape:
  - Input: \((N, C)\) where C = number of classes, or \((N, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.
  - Target: \((N)\) where each value satisfies \(0 \leq \text{targets}[i] \leq C-1\), or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.
  - Output: scalar. If reduction is 'none', then the same size as the target: \((N)\), or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.
Examples:

>>> loss = SemanticConsistency()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
SPGAN: Similarity Preserving Generative Adversarial Network

Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification. SPGAN is based on CycleGAN. An additional Siamese network is adopted to force the generator to produce images that differ from the identities in the target dataset.

Siamese Network
Contrastive Loss
class dalib.translation.spgan.loss.ContrastiveLoss(margin=2.0)

Contrastive loss from Dimensionality Reduction by Learning an Invariant Mapping (CVPR 2006).

Given output features \(f_1, f_2\), let \(D\) denote the pairwise Euclidean distance between them, \(Y\) the ground-truth labels, and \(m\) a pre-defined margin. The contrastive loss is calculated as

\[(1 - Y)\frac{1}{2}D^2 + Y\frac{1}{2}\max(0, m-D)^2\]

- Parameters
  - margin (float, optional) – margin for the contrastive loss. Default: 2.0
- Inputs:
  - output1 (tensor): feature representations of the first set of samples (\(f_1\) here).
  - output2 (tensor): feature representations of the second set of samples (\(f_2\) here).
  - label (tensor): labels (\(Y\) here).
- Shape:
  - output1, output2: \((minibatch, F)\) where F is the dimension of the input features.
  - label: \((minibatch, )\)
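A minimal sketch, assuming the module is called with the documented inputs in the order (output1, output2, label):

>>> import torch
>>> from dalib.translation.spgan.loss import ContrastiveLoss
>>> criterion = ContrastiveLoss(margin=2.0)
>>> f1 = torch.randn(8, 128)  # features of the first set of samples
>>> f2 = torch.randn(8, 128)  # features of the second set of samples
>>> # Y = 0 pulls a pair together, Y = 1 pushes it apart up to the margin
>>> label = torch.randint(0, 2, (8,)).float()
>>> loss = criterion(f1, f2, label)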
FDA: Fourier Domain Adaptation

class dalib.translation.fourier_transform.FourierTransform(image_list, amplitude_dir, beta=1, rebuild=False)

Fourier transform, introduced by FDA: Fourier Domain Adaptation for Semantic Segmentation (CVPR 2020).

The transform replaces the low-frequency component of the amplitude of the source image with that of the target image. Denote by \(M_{\beta}\) a mask whose value is zero except for the center region:

\[M_{\beta}(h,w) = \mathbb{1}_{(h,w) \in [-\beta, \beta] \times [-\beta, \beta]}\]

Given an image \(x^s\) from the source domain and an image \(x^t\) from the target domain, the source image in the target style is

\[x^{s \rightarrow t} = \mathcal{F}^{-1}([M_{\beta} \circ \mathcal{F}^A(x^t) + (1-M_{\beta}) \circ \mathcal{F}^A(x^s), \mathcal{F}^P(x^s)])\]

where \(\mathcal{F}^A\) and \(\mathcal{F}^P\) are the amplitude and phase components of the Fourier transform \(\mathcal{F}\) of an RGB image.

- Parameters
  - image_list (sequence[str]) – a sequence of image paths from the target domain.
  - amplitude_dir (str) – the directory in which to store the amplitude components of the target images.
  - beta (int, optional) – \(\beta\). Default: 1.
  - rebuild (bool, optional) – whether to rebuild the amplitude components of the target images in the given directory.
- Inputs:
  - image (PIL Image): image from the source domain, \(x^s\).
Examples

>>> import numpy as np
>>> from dalib.translation.fourier_transform import FourierTransform
>>> image_list = ["target_image_path1", "target_image_path2"]
>>> amplitude_dir = "path/to/amplitude_dir"
>>> fourier_transform = FourierTransform(image_list, amplitude_dir, beta=1, rebuild=False)
>>> source_image = np.random.rand(256, 256, 3)  # image from the source domain
>>> source_image_in_target_style = fourier_transform(source_image)
Note
The meaning of \(\beta\) is different from that in the original paper. Experimentally, we found that the size of the center region in frequency space should stay constant as the image size increases. Thus we make the size of the center region independent of the image size. A recommended value for \(\beta\) is 1.

Note
The image structure of the source domain and the target domain should be as similar as possible. Thus, for segmentation tasks, FourierTransform should be applied before RandomResizedCrop and other transformations.

Note
The image sizes of the source domain and the target domain need to match, so before FourierTransform you should use Resize to convert the source image to the target image size.
Examples

>>> from dalib.translation.fourier_transform import FourierTransform
>>> import common.vision.datasets.segmentation.transforms as T
>>> from PIL import Image
>>> target_image_list = ["target_image_path1", "target_image_path2"]
>>> amplitude_dir = "path/to/amplitude_dir"
>>> # build a fourier transform that translates source images to the target style
>>> fourier_transform = T.wrapper(FourierTransform)(target_image_list, amplitude_dir)
>>> transforms = T.Compose([
...     # convert the source image to the size of the target image before the fourier transform
...     T.Resize((2048, 1024)),
...     fourier_transform,
...     T.RandomResizedCrop((1024, 512)),
...     T.RandomHorizontalFlip(),
... ])
>>> source_image = Image.open("path/to/source_image")  # image from the source domain
>>> source_image_in_target_style = transforms(source_image)
dalib.translation.fourier_transform.low_freq_mutate(amp_src, amp_trg, beta=1)

- Parameters
  - amp_src (numpy.ndarray) – amplitude component of the Fourier transform of the source image
  - amp_trg (numpy.ndarray) – amplitude component of the Fourier transform of the target image
  - beta (int, optional) – the size of the center region to be replaced. Default: 1
- Returns
  The amplitude component of the Fourier transform of the source image, with its low-frequency component replaced by that of the target image.
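A minimal sketch, computing the amplitude components with numpy; the channels-first layout is an assumption:

>>> import numpy as np
>>> from dalib.translation.fourier_transform import low_freq_mutate
>>> src = np.random.rand(3, 256, 256)  # source image, assumed channels-first
>>> trg = np.random.rand(3, 256, 256)  # target image of the same size
>>> amp_src = np.abs(np.fft.fft2(src))  # amplitude of the 2D Fourier transform
>>> amp_trg = np.abs(np.fft.fft2(trg))
>>> amp_mutated = low_freq_mutate(amp_src, amp_trg, beta=1)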
dalib.adaptation.fda.robust_entropy(y, ita=1.5, num_classes=19, reduction='mean')

Robust entropy proposed in FDA: Fourier Domain Adaptation for Semantic Segmentation (CVPR 2020).

- Parameters
  - y (tensor) – logits output of the segmentation model, in shape \((N, C, H, W)\)
  - ita (float, optional) – parameter for robust entropy. Default: 1.5
  - num_classes (int, optional) – number of classes. Default: 19
  - reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output. Default: 'mean'
- Returns
  Scalar by default. If reduction is 'none', then \((N, )\).
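A minimal sketch with random segmentation logits:

>>> import torch
>>> from dalib.adaptation.fda import robust_entropy
>>> y = torch.randn(4, 19, 64, 128)  # segmentation logits in shape (N, C, H, W)
>>> loss = robust_entropy(y, ita=1.5, num_classes=19)  # scalar, since reduction='mean'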