
Here we detail the functions and classes available in the model module of DMLP, describing their functionality and usage.

  • Abstract Models
  • Models
  • My Transformers
    • Model Class

Abstract Models

We provide abstract model classes that serve as the backbone for text diffusion models. Our default models are built on these abstract classes, and users can implement their own models by subclassing them. Import the classes with from DMLP.models.abstract_models import *. The following sections detail each abstract class and the abstract methods it contains.

VAE abs

Class DMLP.models.abstract_models.VAE_Abs(encoder, decoder, *args, device=None, **kwargs)

Base class for a variational autoencoder. Your models should be subclasses of this class.

Args
encoder: model used to encode input text into the latent space
decoder: model used to decode latent-space representations into text

from DMLP.models.abstract_models import VAE_Abs

class VAE(VAE_Abs):
    def __init__(self, encoder, decoder, tokenizer_encoder, tokenizer_decoder, latent_size, output_dir, device=None):
        super(VAE, self).__init__(encoder, decoder, tokenizer_encoder, tokenizer_decoder, latent_size, output_dir, device=device)
        self.encoder = encoder
        self.decoder = decoder

        self.tokenizer_decoder = tokenizer_decoder
        self.tokenizer_encoder = tokenizer_encoder

        self.eos_token_id = tokenizer_decoder.convert_tokens_to_ids([tokenizer_decoder.eos_token])[0]
        self.pad_token_id = tokenizer_decoder.convert_tokens_to_ids([tokenizer_decoder.pad_token])[0]
        self.bos_token_id = tokenizer_decoder.convert_tokens_to_ids([tokenizer_decoder.bos_token])[0]
    def forward(self, inputs, labels):
        attention_mask = (inputs!=self.tokenizer_encoder.pad_token_id).float()

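        # Assumption for this sketch: the encoder returns the latent code directly.
        # Replace this line with your own mapping from the encoder output to latent_z.
        latent_z = self.encoder(inputs, attention_mask)
        out = self.decoder(input_ids=labels, past=latent_z, labels=labels, label_ignore=self.pad_token_id)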
        return out

DDPM abs

Class DMLP.models.abstract_models.DDPM_Abs(eps_model, betas, n_T, criterion, ddpm_schedule, *args, **kwargs)

Base class for a Denoising Diffusion Probabilistic Model (DDPM). Your models should be subclasses of this class.

Args
eps_model: \(P_{\theta}\), the model for the backward denoising process; should be some type of neural network
betas: parameters for the DDPM scheduler
n_T: number of diffusion/denoising steps
criterion: objective function used to calculate the diffusion loss
ddpm_schedule: scheduler that returns pre-computed schedules for the DDPM sampling and training processes

An example of usage can be found here


VAE_DDPM abs

Class DMLP.models.abstract_models.VAE_DDPM_Abs(model_vae, ddpm, ddpm_weight, *args, **kwargs)

Base class for the complete VAE_DDPM model. It combines an initialized VAE and an initialized DDPM into a new VAE_DDPM object.

Args
model_vae: initialized variational autoencoder; should be a subclass of VAE_Abs
ddpm: initialized DDPM; should be a subclass of DDPM_Abs
ddpm_weight: hyperparameter \(\alpha\) that adjusts the weight of the DDPM loss in the total loss

$$\textbf{Loss} = \textbf{reconstruction loss} + \alpha \cdot \textbf{ddpm loss}$$

from DMLP.models.abstract_models import VAE_DDPM_Abs

class VAE_DDPM(VAE_DDPM_Abs):
    def __init__(self, model_vae, ddpm, ddpm_weight):
        super(VAE_DDPM, self).__init__(model_vae, ddpm, ddpm_weight)
        self.model_vae = model_vae
        self.ddpm = ddpm
        self.ddpm_weight = ddpm_weight

    def forward(self,inputs, labels): 
        loss_rec, loss_kl, loss, latent_z, mu = self.model_vae(inputs, labels)
        ddpm_loss, loss_weight = self.ddpm.forward(latent_z, mu)
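        if self.ddpm_weight > 0:
            # Rescale the VAE loss by the DDPM loss weight, then add the weighted DDPM loss
            loss = (1 / (loss_weight * self.ddpm.n_T) * loss).mean() + self.ddpm_weight * ddpm_loss.mean()
        else:
            # The DDPM term is multiplied by zero, so it does not change the loss value
            loss = loss.mean() + 0.0 * ddpm_loss.mean()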
        return loss_rec, loss_kl, loss, latent_z, mu, ddpm_loss, loss_weight
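
The loss weighting in the forward example above can be checked by hand with a small pure-Python sketch. The helper name `combine_losses` and the toy numbers are hypothetical, and plain lists stand in for tensors; it only mirrors the arithmetic of the example, not the library's implementation.

```python
def combine_losses(vae_loss, ddpm_loss, loss_weight, n_T, ddpm_weight):
    """Hypothetical helper mirroring the weighting in the forward example.

    vae_loss, ddpm_loss, loss_weight: per-sample values (plain lists here).
    """
    def mean(values):
        return sum(values) / len(values)

    if ddpm_weight > 0:
        # Rescale each per-sample VAE loss by 1 / (loss_weight * n_T)
        scaled = [l / (w * n_T) for l, w in zip(vae_loss, loss_weight)]
        return mean(scaled) + ddpm_weight * mean(ddpm_loss)
    # With a zero weight, only the VAE loss contributes
    return mean(vae_loss)


vae_loss = [2.0, 4.0]
ddpm_loss = [1.0, 3.0]
loss_weight = [1.0, 2.0]

with_ddpm = combine_losses(vae_loss, ddpm_loss, loss_weight, n_T=2, ddpm_weight=0.5)
without_ddpm = combine_losses(vae_loss, ddpm_loss, loss_weight, n_T=2, ddpm_weight=0.0)
```

With these toy values, both rescaled VAE losses equal 1.0 and the mean DDPM loss is 2.0, so the combined loss is 1.0 + 0.5 * 2.0.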


Models

This module provides implementations of default models that users can use out of the box.


Class DMLP.models.models.VAE(encoder, decoder, tokenizer_encoder, tokenizer_decoder, latent_size, output_dir, device=None)

Implementation of the default variational autoencoder with a transformer encoder and decoder.

Args
encoder: model used to encode input text into the latent space
decoder: model used to decode latent-space representations into text
tokenizer_encoder: tokenizer the encoder uses to convert text to tokens
tokenizer_decoder: tokenizer the decoder uses to convert tokens to text
latent_size: hyperparameter setting the size of the latent representation
output_dir: directory in which checkpoints are saved

reparametrized(mu, logvar, nsamples)
Sample from the posterior Gaussian family.

Args
mu: Tensor, mean of the Gaussian distribution with shape (batch, nz)
logvar: Tensor, log-variance of the Gaussian distribution with shape (batch, nz)

Returns
Tensor, sampled z with shape (batch, nz)
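
As an illustration of what sampling from the posterior Gaussian involves, here is a minimal pure-Python sketch of the standard reparameterization trick, z = mu + eps * exp(0.5 * logvar) with eps drawn from N(0, 1). The function name `reparameterize_sketch` is hypothetical, nested lists stand in for tensors, and one sample is drawn per row; the library method may differ in details such as how nsamples is handled.

```python
import math
import random


def reparameterize_sketch(mu, logvar, rng):
    """Draw one z per row: z = mu + eps * std, with std = exp(0.5 * logvar).

    mu, logvar: nested lists of shape (batch, nz). Returns a list of the same shape.
    """
    z = []
    for mu_row, logvar_row in zip(mu, logvar):
        row = []
        for m, lv in zip(mu_row, logvar_row):
            std = math.exp(0.5 * lv)      # log-variance -> standard deviation
            eps = rng.gauss(0.0, 1.0)     # standard normal noise
            row.append(m + eps * std)
        z.append(row)
    return z


rng = random.Random(0)
mu = [[0.0, 1.0, -1.0], [2.0, 0.5, 0.0]]
# A very negative log-variance makes the first row nearly deterministic
logvar = [[-40.0, -40.0, -40.0], [0.0, 0.0, 0.0]]
z = reparameterize_sketch(mu, logvar, rng)
```

Because the noise enters only through `eps`, the sample stays a differentiable function of mu and logvar when the same formula is written with tensors.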

forward(inputs, labels)
Define the forward computation of the VAE and compute the reconstruction losses.

Args
inputs: text tokenized for the encoder
labels: text tokenized for the decoder

Returns
loss_rec: loss between the generated text and the target text
loss_kl: KL distance between generated text and the diffusion model
loss: loss_rec / sentence length
latent_z: latent representation of the input text
mu: Tensor, mean of the Gaussian distribution with shape (batch, nz)


Class DMLP.models.models.DDPM(eps_model, betas, n_T, criterion, ddpm_schedule)
Implementation of the default DDPM.

Args
eps_model: \(P_{\theta}\), the model for the backward denoising process; should be some type of neural network
betas: parameters for the DDPM scheduler
n_T: number of diffusion/denoising steps
criterion: objective function used to calculate the diffusion loss
ddpm_schedule: scheduler that returns pre-computed schedules for the DDPM sampling and training processes
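
To make the roles of betas, n_T, and ddpm_schedule concrete, here is a minimal pure-Python sketch of a linear DDPM noise schedule. The function name `linear_ddpm_schedule`, the returned keys, and the treatment of betas as a (start, end) pair are assumptions made for illustration; the scheduler shipped with DMLP may compute or name these quantities differently.

```python
import math


def linear_ddpm_schedule(beta1, beta2, n_T):
    """Hypothetical helper: linearly spaced betas and the quantities derived from them."""
    assert 0.0 < beta1 < beta2 < 1.0, "betas must satisfy 0 < beta1 < beta2 < 1"
    # beta_t for t = 0..n_T, interpolated linearly between beta1 and beta2
    beta_t = [beta1 + (beta2 - beta1) * t / n_T for t in range(n_T + 1)]
    # alphabar_t is the running product of (1 - beta_t)
    alphabar_t = []
    running = 1.0
    for b in beta_t:
        running *= 1.0 - b
        alphabar_t.append(running)
    return {
        "beta_t": beta_t,
        "alphabar_t": alphabar_t,
        "sqrtab": [math.sqrt(ab) for ab in alphabar_t],          # scales the clean signal
        "sqrtmab": [math.sqrt(1.0 - ab) for ab in alphabar_t],   # scales the added noise
    }


schedule = linear_ddpm_schedule(1e-4, 0.02, n_T=1000)
```

Precomputing these lists once means that training and sampling only need to index them by step instead of recomputing products at every call.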

forward(x, mu)
Runs the forward diffusion to obtain x_t and tries to guess the epsilon value from x_t using eps_model.

Args
x: latent representation from the encoder
mu: mean of the latent representation distribution from the encoder

Returns
loss: loss between the DDPM prediction \(z_0\) and the actual \(x_0\)
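
The "noise, then guess the noise" step described above can be sketched in pure Python using the standard DDPM forward process, x_t = sqrt(alphabar_t) * x_0 + sqrt(1 - alphabar_t) * eps. Everything named here (`forward_diffusion_step`, `zero_model`, the linear schedule, the mean squared error standing in for criterion) is a hypothetical stand-in, and the sketch does not show how the library uses mu or the loss weight.

```python
import math
import random


def forward_diffusion_step(x0, eps_model, alphabar, rng):
    """Noise x0 to a random step t and score the model's guess of the noise."""
    n_T = len(alphabar) - 1
    t = rng.randint(1, n_T)                     # random diffusion step in [1, n_T]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]     # the noise the model should predict
    sqrtab = math.sqrt(alphabar[t])
    sqrtmab = math.sqrt(1.0 - alphabar[t])
    x_t = [sqrtab * x + sqrtmab * e for x, e in zip(x0, eps)]
    eps_hat = eps_model(x_t, t / n_T)           # model sees x_t and the normalized step
    # Mean squared error stands in for the configurable criterion
    loss = sum((e - h) ** 2 for e, h in zip(eps, eps_hat)) / len(x0)
    return loss, x_t, t, eps


# Assumed linear schedule between 1e-4 and 0.02
n_T = 100
alphabar, running = [], 1.0
for step in range(n_T + 1):
    running *= 1.0 - (1e-4 + (0.02 - 1e-4) * step / n_T)
    alphabar.append(running)


def zero_model(x_t, t_frac):
    """Toy eps_model that always predicts zero noise."""
    return [0.0] * len(x_t)


rng = random.Random(0)
x0 = [0.5, -1.0, 2.0, 0.0]
loss, x_t, t, eps = forward_diffusion_step(x0, zero_model, alphabar, rng)
```

With the toy zero predictor, the loss reduces to the mean of the squared noise, which is a convenient sanity check before plugging in a real network.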

sample(n_sample, size, device, fp16=False)
Generate latent representations using the denoising process.

Args
n_sample: number of sentences to generate
size: length of the sentence
device: GPU or CPU
fp16: whether to compute at lower precision

Returns
x_i: latent representation of the generated sentences
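
For intuition about the denoising process, here is a pure-Python sketch of the standard DDPM ancestral sampling loop for a single latent vector. The names `sample_sketch` and `zero_model`, the linear schedule, and the default betas are assumptions for illustration; the library's sample method additionally handles batching, device placement, and fp16, none of which is shown.

```python
import math
import random


def sample_sketch(eps_model, size, n_T, rng, beta1=1e-4, beta2=0.02):
    """Start from Gaussian noise and denoise step by step (one latent vector)."""
    # Assumed linear schedule and its running product alphabar
    beta = [beta1 + (beta2 - beta1) * t / n_T for t in range(n_T + 1)]
    alpha = [1.0 - b for b in beta]
    alphabar, running = [], 1.0
    for a in alpha:
        running *= a
        alphabar.append(running)

    x = [rng.gauss(0.0, 1.0) for _ in range(size)]          # x_T ~ N(0, I)
    for i in range(n_T, 0, -1):
        eps = eps_model(x, i / n_T)                          # predicted noise at step i
        coef = (1.0 - alpha[i]) / math.sqrt(1.0 - alphabar[i])
        noise_scale = math.sqrt(beta[i]) if i > 1 else 0.0   # no noise on the last step
        x = [
            (xj - coef * ej) / math.sqrt(alpha[i]) + noise_scale * rng.gauss(0.0, 1.0)
            for xj, ej in zip(x, eps)
        ]
    return x


def zero_model(x_t, t_frac):
    """Toy eps_model that always predicts zero noise."""
    return [0.0] * len(x_t)


x_i = sample_sketch(zero_model, size=8, n_T=50, rng=random.Random(0))
```

In practice the resulting latent vector would be passed to the VAE decoder to produce text; that step is outside this sketch.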


Class DMLP.models.models.VAE_DDPM(model_vae, ddpm, ddpm_weight)
Implementation of the complete VAE_DDPM model.

Args
model_vae: variational autoencoder
ddpm: DDPM
ddpm_weight: hyperparameter \(\alpha\) that adjusts the weight of the DDPM loss in the total loss

$$\textbf{Loss} = \textbf{reconstruction loss} + \alpha \cdot \textbf{ddpm loss}$$

forward(inputs, labels)
Forward computation of VAE_DDPM.

Args
inputs: input text tokens
labels: output text tokens

Returns
All outputs from the VAE and the DDPM; see the descriptions above for details.


(TransformerNet, LinearModel, and ResidualLinear take similar inputs.)

Class DMLP.models.models.MLPSkipNet(latent_dim)
Implementation of an MLP with skip connections. A neural network that mimics \(q_{\theta}\) in the forward process.

Args
latent_dim: size of the latent representation

forward(x, t, z_sem=None)
Forward computation of MLPSkipNet.

Args
x: sampled x_t
t: number of diffusion steps

Returns
h: latent-space representation of the generated text

My Transformers

We provide 15 different transformer implementations as options for the VAE encoder/decoder. To access all models, import MODEL_CLASS from DMLP.models.my_transformers.

Model Class

MODEL_CLASS = {'BertForLatentConnector':BertForLatentConnector,

Our implementations are based on six commonly used large language models: BERT, RoBERTa, DeBERTa, T5, GPT2, and ALBERT. Click through the links to read about how to download each pretrained model. Examples are also provided here.