Bardia Koopah

bkoop at berkeley dot edu

www.linkedin.com/in/bardiakoopah

https://github.com/BardiaKoopah


Projects

GPT-2 From Scratch

Decoder-only Transformer • Byte-level BPE • Full training pipeline reproduction

Reimplemented GPT-2 (a 12-layer, 768-dim, 12-head decoder-only Transformer) entirely from scratch in PyTorch, including a byte-level BPE tokenizer, a custom dataloader, weight tying, causal self-attention, and a full training loop with learning-rate warmup and cosine decay, reproducing the original paper's setup. Focused on architectural fidelity, training stability, and benchmarking on WikiText-2, validating perplexity and scaling behavior end to end.
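
For a flavor of the core component, here is a minimal sketch of a GPT-2-style causal self-attention block with a fused QKV projection. The hyperparameter names and defaults (n_embd, n_head, block_size) are illustrative, not taken from the repo:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask (GPT-2 style)."""

    def __init__(self, n_embd: int = 768, n_head: int = 12, block_size: int = 1024):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)  # fused Q, K, V projection
        self.proj = nn.Linear(n_embd, n_embd)     # output projection
        # Lower-triangular mask: position t may only attend to positions <= t
        self.register_buffer(
            "mask",
            torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape each to (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # Scaled dot-product attention with the causal mask applied
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```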

GitHub: https://github.com/BardiaKoopah/GPT-2_from_scratch
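
The warmup-plus-cosine learning-rate schedule mentioned above could look like the sketch below; the step counts and learning rates here are placeholder values, not the ones used in training:

```python
import math

def lr_at_step(step: int, max_lr: float = 6e-4, min_lr: float = 6e-5,
               warmup_steps: int = 2000, max_steps: int = 600_000) -> float:
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps  # linear warmup
    if step >= max_steps:
        return min_lr                              # floor after decay ends
    # Cosine decay: coefficient goes smoothly from 1 to 0
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (max_lr - min_lr)
```

In a training loop this is typically applied each step by writing the result into every param group, e.g. `for g in optimizer.param_groups: g["lr"] = lr_at_step(step)`.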

Transformer from Scratch

PyTorch • Multi30k • Full encoder–decoder architecture

Implemented a full encoder–decoder Transformer from scratch in PyTorch, including multi-head attention, positional encodings, custom tokenizers, and a training loop on the Multi30k dataset. Focused on reproducing the details of "Attention Is All You Need" and understanding every component end to end.

GitHub: https://github.com/BardiaKoopah/my-transformer-from-scratch
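
As one example of the components listed above, here is a minimal sketch of the paper's sinusoidal positional encoding; the d_model and max_len defaults are illustrative, and the repo's implementation may differ in detail:

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding from 'Attention Is All You Need'."""

    def __init__(self, d_model: int = 512, max_len: int = 5000):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(max_len).unsqueeze(1).float()
        # Geometric frequency progression: 1 / 10000^(2i / d_model)
        div_term = torch.exp(
            torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(position * div_term)  # even dims: sine
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dims: cosine
        self.register_buffer("pe", pe.unsqueeze(0))   # shape (1, max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); add encodings for the first seq_len positions
        return x + self.pe[:, : x.size(1)]
```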