Transformer

What is a transformer?

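A transformer is a deep learning model architecture that uses self-attention to transform a sequence of input tokens into contextualized representations. These learned representations can be reused for new tasks, often by retraining or replacing only the downstream layers.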

Types of transformers

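Encoder-only models (autoencoding models), e.g. BERT: every token attends to the whole input; suited to understanding tasks such as classification.

Decoder-only models (autoregressive models), e.g. GPT: each token attends only to earlier tokens; suited to text generation.

Encoder-decoder models (sequence-to-sequence models), e.g. T5 and BART: the encoder reads the input and the decoder generates the output; suited to translation and summarization.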
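The three types differ mainly in which positions each token may attend to. The sketch below assumes a boolean mask where `True` means "may attend" and builds the two basic patterns; `attention_mask` and `show` are hypothetical helpers for illustration, not a library API.

```python
def attention_mask(n: int, causal: bool) -> list[list[bool]]:
    """Return an n x n mask; mask[i][j] is True if token i may attend to token j."""
    # Encoder-only (causal=False): every token sees the whole sequence.
    # Decoder-only (causal=True): token i sees only tokens 0..i, never the future.
    return [[(j <= i) if causal else True for j in range(n)] for i in range(n)]


def show(mask: list[list[bool]]) -> str:
    """Render a mask as rows of 1s (allowed) and 0s (blocked)."""
    return "\n".join("".join("1" if ok else "0" for ok in row) for row in mask)


print("bidirectional (encoder-only):")
print(show(attention_mask(4, causal=False)))
print("causal (decoder-only):")
print(show(attention_mask(4, causal=True)))
```

The causal mask prints as a lower-triangular pattern (1000, 1100, 1110, 1111). An encoder-decoder model combines both: the encoder uses the bidirectional mask, the decoder's self-attention uses the causal mask, and its cross-attention lets every decoder position see all encoder positions.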

Transformer architecture

The original transformer architecture (Vaswani et al., 2017) combines an encoder and a decoder. Each is a stack of identical layers (six of each in the original model), fed by an input embedding step. The key components are:

Input embedding
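A minimal sketch of the input embedding step, assuming a toy lookup table of random vectors in place of learned weights: each token ID is mapped to a vector, scaled by sqrt(d_model), and summed with the sinusoidal positional encoding from the original paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The function names are illustrative.

```python
import math
import random


def positional_encoding(seq_len: int, d_model: int) -> list[list[float]]:
    """Sinusoidal encoding: even dims use sin, odd dims use cos, at geometric frequencies."""
    pe = []
    for pos in range(seq_len):
        row = []
        for k in range(d_model):
            i = k // 2  # dims 2i and 2i+1 share the same frequency
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def embed(token_ids: list[int], table: list[list[float]]) -> list[list[float]]:
    """Look up each token's vector, scale by sqrt(d_model), and add its position's encoding."""
    d_model = len(table[0])
    pe = positional_encoding(len(token_ids), d_model)
    scale = math.sqrt(d_model)
    return [
        [scale * table[t][k] + pe[pos][k] for k in range(d_model)]
        for pos, t in enumerate(token_ids)
    ]


# Toy setup (assumption): a random 10-token vocabulary with 4-dimensional embeddings.
rng = random.Random(0)
vocab_size, d_model = 10, 4
table = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(vocab_size)]
x = embed([3, 1, 4], table)
print(len(x), len(x[0]))  # 3 4
```

Without the positional term, the same token would get the same vector wherever it appears, because attention by itself ignores word order.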

Multi-head attention

Steps of self-attention
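The steps of scaled dot-product self-attention can be sketched in plain Python. This assumes small list-of-lists matrices and projection weights `w_q`, `w_k`, `w_v` supplied by the caller (in a real model they are learned). The steps are: project the inputs into queries, keys, and values; score each query against every key; scale by sqrt(d_k); optionally mask future positions; apply a softmax; and take the weighted sum of the values. Together they compute softmax(QK^T / sqrt(d_k)) V.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix,
                   causal: bool = False) -> tuple[Matrix, Matrix]:
    # Step 1: project each token vector into a query, a key, and a value.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Step 2: dot every query with every key, scaled by sqrt(d_k).
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    # Step 3 (decoder only): hide future positions by setting their scores to -inf.
    if causal:
        for i, row in enumerate(scores):
            for j in range(i + 1, len(row)):
                row[j] = float("-inf")
    # Step 4: softmax turns each row of scores into weights that sum to 1.
    weights = [softmax(row) for row in scores]
    # Step 5: each output vector is the weighted sum of the value vectors.
    return matmul(weights, v), weights


identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(tokens, identity, identity, identity, causal=True)
print(weights[0])  # [1.0, 0.0]: the first token can only attend to itself
```

Scaling by sqrt(d_k) keeps the dot products from growing with the vector size, which would otherwise push the softmax toward one-hot weights.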

Combining heads into multi-head attention
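Multi-head attention runs several attention heads in parallel, each with its own projection weights and a smaller head dimension, concatenates their outputs, and mixes them with an output projection W_O. The sketch below assumes random toy weights and 2 heads of size 4 over d_model = 8 (the original paper used 8 heads of size 64 with d_model = 512); helper names are illustrative.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """One head of scaled dot-product self-attention (no mask)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)


def multi_head_attention(x: Matrix, heads: list[tuple[Matrix, Matrix, Matrix]],
                         w_o: Matrix) -> Matrix:
    """Run every head, concatenate per-token outputs, then apply the output projection."""
    outputs = [attention_head(x, w_q, w_k, w_v) for w_q, w_k, w_v in heads]
    # Token i's concatenated vector is head_1[i] + head_2[i] + ... (list concatenation).
    concat = [sum((head_out[i] for head_out in outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_model, n_heads = 3, 8, 2
d_head = d_model // n_heads
x = random_matrix(seq_len, d_model, rng)
heads = [tuple(random_matrix(d_model, d_head, rng) for _ in range(3)) for _ in range(n_heads)]
w_o = random_matrix(n_heads * d_head, d_model, rng)
out = multi_head_attention(x, heads, w_o)
print(len(out), len(out[0]))  # 3 8
```

Splitting d_model across heads keeps the total cost close to that of a single full-width head while letting each head learn a different attention pattern.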

Layer normalization & residual connections

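Residual connections add each sublayer's input to its output, giving gradients a direct path through a deep stack of layers. Layer normalization rescales each token's features to zero mean and unit variance (followed by a learned scale and shift), keeping activations in a stable range. Together they make deep transformer models stable to train. In the original architecture, each sublayer's output is LayerNorm(x + Sublayer(x)).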
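A minimal sketch of the post-layer-norm arrangement used in the original transformer, LayerNorm(x + Sublayer(x)), applied to a single token vector. Here `gamma` and `beta` stand in for the learned scale and shift, and the sublayer can be any function, such as attention or the feedforward layer; the names and the toy `double` sublayer are illustrative.

```python
import math
from typing import Callable

Vector = list[float]


def layer_norm(x: Vector, gamma: Vector, beta: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one token's features to zero mean and unit variance, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var + eps) + b for v, g, b in zip(x, gamma, beta)]


def residual_block(x: Vector, sublayer: Callable[[Vector], Vector],
                   gamma: Vector, beta: Vector) -> Vector:
    """Post-LN residual block: add the sublayer output to its input, then normalize."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)], gamma, beta)


def double(v: Vector) -> Vector:
    # Stand-in sublayer for illustration only.
    return [2.0 * t for t in v]


x = [1.0, 2.0, 3.0, 4.0]
gamma, beta = [1.0] * 4, [0.0] * 4
out = residual_block(x, double, gamma, beta)
print([round(t, 3) for t in out])  # [-1.342, -0.447, 0.447, 1.342]
```

Many later models (GPT-2, for example) use the pre-LN variant x + Sublayer(LayerNorm(x)) instead, which is often reported to train more stably in deep stacks.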

Feedforward layer
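The feedforward layer is applied to each token position independently with shared weights: two linear maps with a ReLU in between, FFN(x) = max(0, xW1 + b1)W2 + b2. In the original paper the inner dimension is 2048 for d_model = 512. The sketch below assumes tiny hand-picked weights so the result can be checked by hand; function names are illustrative.

```python
Vector = list[float]
Matrix = list[list[float]]


def feed_forward(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Position-wise FFN for one token: max(0, x W1 + b1) W2 + b2."""
    # Expand to the inner dimension and apply ReLU.
    hidden = [max(0.0, sum(xi * w1[i][j] for i, xi in enumerate(x)) + b1[j])
              for j in range(len(b1))]
    # Project back down to d_model.
    return [sum(h * w2[j][k] for j, h in enumerate(hidden)) + b2[k]
            for k in range(len(b2))]


def apply_to_sequence(tokens: list[Vector], w1: Matrix, b1: Vector,
                      w2: Matrix, b2: Vector) -> list[Vector]:
    """Same weights at every position; tokens do not interact in this layer."""
    return [feed_forward(t, w1, b1, w2, b2) for t in tokens]


# Toy weights (assumption): d_model = 2, inner dimension = 3.
w1 = [[1.0, 0.0, 1.0],
      [0.0, 1.0, -1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0],
      [0.0, 1.0],
      [1.0, 1.0]]
b2 = [0.0, 0.0]
print(apply_to_sequence([[1.0, -1.0], [0.5, 2.0]], w1, b1, w2, b2))  # [[3.0, 2.0], [0.5, 2.0]]
```

Attention mixes information across tokens, while this layer transforms each token's vector on its own; a transformer layer alternates the two.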