diffusion_models.models.vae.ResNet18Encoder

class diffusion_models.models.vae.ResNet18Encoder(in_channels, hidden_dim=256)

Bases: Module

Class implementing a ResNet-18 encoder.

For exact details, see He et al., Deep Residual Learning for Image Recognition (2015).

Implementation

  1. The input image is initially the usual ImageNet crop of 224x224.

  2. Before the repeated residual blocks begin, the channels are increased to 64 and the image size is reduced to 56x56.

  3. The residual blocks are split into 4 submodules, each consisting of 2 residual blocks:
    a. both blocks are standard residual blocks

    b. the first residual block increases the channels to 128 and halves the size with stride 2; the second is standard

    c. like b., but to 256 channels

    d. like b., but to 512 channels

For a 224x224 input, the output of the residual blocks has size 7x7 with 512 channels.
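The size reductions above can be checked with the standard output-size formula for convolution and pooling layers, floor((size + 2 * padding - kernel) / stride) + 1. The sketch below is a minimal, framework-free trace; it assumes the stem is the 7x7 stride-2 convolution plus 3x3 stride-2 max pooling from He et al. and that every residual convolution is 3x3 with padding 1. Those kernel sizes are taken from the paper, not from this module's source, and the helper names `conv_out` and `trace_resnet18_shapes` are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_resnet18_shapes(size: int = 224) -> list[tuple[str, int, int]]:
    """Return (stage name, channels, spatial size) after the stem and each stage."""
    shapes = []
    # Assumed stem (He et al.): 7x7 conv with stride 2, then 3x3 max pool with stride 2
    size = conv_out(size, kernel=7, stride=2, padding=3)
    size = conv_out(size, kernel=3, stride=2, padding=1)
    shapes.append(("stem", 64, size))
    # (output channels, stride of the first block) for each of the four submodules
    stages = [(64, 1), (128, 2), (256, 2), (512, 2)]
    for i, (channels, stride) in enumerate(stages, start=1):
        # The first 3x3 conv of the submodule sets the new spatial size
        size = conv_out(size, kernel=3, stride=stride, padding=1)
        # The remaining 3x3 convs use stride 1 and padding 1, so the size is unchanged
        shapes.append((f"stage {i}", channels, size))
    return shapes


for name, channels, size in trace_resnet18_shapes():
    print(f"{name}: {channels} x {size} x {size}")
```

Under these assumptions the trace gives 56x56 after the stem and 56, 28, 14 and 7 after the four submodules, matching the 7x7 x 512 output stated above.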

__init__(in_channels, hidden_dim=256)

Methods

__init__(in_channels[, hidden_dim])

forward(x)

Return type:

Tuple[Tensor, Tensor], where each tensor has shape (batch, hidden_dim).
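For illustration, a minimal PyTorch sketch of an encoder with the structure described above might look as follows, assuming `torch` is installed. This is not the module's actual source: the basic residual block and the stem follow He et al., while the global average pooling and the two linear projections (the hypothetical `head_a` and `head_b`) are assumptions chosen only so that `forward` returns two tensors of shape (batch, hidden_dim), as the return type states. The class name `SketchResNet18Encoder` is also hypothetical.

```python
import torch
from torch import nn


class BasicBlock(nn.Module):
    """Residual block: two 3x3 convolutions plus a skip connection (He et al., 2015)."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Identity skip for standard blocks; 1x1 projection when channels or size change
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.shortcut(x))


class SketchResNet18Encoder(nn.Module):
    """Hypothetical ResNet-18-style encoder returning two (batch, hidden_dim) tensors."""

    def __init__(self, in_channels: int, hidden_dim: int = 256) -> None:
        super().__init__()
        # Stem (assumed, per He et al.): 224x224 -> 112x112 (conv) -> 56x56 (max pool)
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, 64, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),
        )
        # Four submodules of two blocks each: 64 -> 128 -> 256 -> 512 channels
        self.stages = nn.Sequential(
            BasicBlock(64, 64), BasicBlock(64, 64),
            BasicBlock(64, 128, stride=2), BasicBlock(128, 128),
            BasicBlock(128, 256, stride=2), BasicBlock(256, 256),
            BasicBlock(256, 512, stride=2), BasicBlock(512, 512),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Hypothetical output heads, one per returned tensor
        self.head_a = nn.Linear(512, hidden_dim)
        self.head_b = nn.Linear(512, hidden_dim)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        h = self.stages(self.stem(x))       # (batch, 512, 7, 7) for a 224x224 input
        h = torch.flatten(self.pool(h), 1)  # (batch, 512)
        return self.head_a(h), self.head_b(h)


enc = SketchResNet18Encoder(in_channels=3, hidden_dim=256)
a, b = enc(torch.randn(2, 3, 224, 224))
print(a.shape, b.shape)  # torch.Size([2, 256]) torch.Size([2, 256])
```

In a VAE, two encoder outputs of this shape are commonly used as the mean and log-variance of the latent distribution, but the documentation above does not state what the two returned tensors represent.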