CatGPT (a real transformer language model)

NONormallyNormal • Shared March 10, 2026

57 views

Love/View Ratio: 3.51%

Instructions

Press start and enter a short prompt (fewer than 20 characters; only letters and spaces are allowed).

This project implements a real transformer-based language model entirely in Scratch. The model has around 9,400 parameters, which is tiny by modern standards, but every part of the architecture is there: token and positional embeddings, causal multi-head self-attention (2 heads), layer normalization, a feed-forward network with ReLU activation, residual connections, and a weight-tied output layer. It generates one token (character) at a time, just as full-scale GPT models do.

With only about 9,400 parameters and a 32-character context window, the responses aren't great. The model can barely string a sentence together, but it does produce real English words and occasionally coherent replies.
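For readers who want to see the character-at-a-time generation spelled out, here is a minimal Python sketch of such a loop. It is not the project's Scratch code: the vocabulary (lowercase letters plus space), the `toy_logits` stand-in for the transformer forward pass, and the choice between greedy and sampled decoding are all assumptions made for illustration. Only the 32-character context window comes from the description above.

```python
import math
import random

VOCAB = list("abcdefghijklmnopqrstuvwxyz ")  # assumed vocabulary: letters + space
CONTEXT = 32  # context window size stated in the project description


def softmax(logits):
    """Turn raw scores into probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_logits(window):
    """Hypothetical stand-in for the model: favors the next letter in VOCAB."""
    last = VOCAB.index(window[-1]) if window else 0
    target = (last + 1) % len(VOCAB)
    return [5.0 if i == target else 0.0 for i in range(len(VOCAB))]


def generate(prompt, n_new, logits_fn=toy_logits, greedy=True, seed=0):
    """Append n_new characters, one at a time, feeding the output back in."""
    rng = random.Random(seed)
    text = prompt
    for _ in range(n_new):
        window = text[-CONTEXT:]  # the model only sees the last 32 characters
        probs = softmax(logits_fn(window))
        if greedy:
            next_char = VOCAB[probs.index(max(probs))]  # most likely character
        else:
            next_char = rng.choices(VOCAB, weights=probs, k=1)[0]  # sample
        text += next_char
    return text


reply = generate("ab", 3)
```

Swapping `toy_logits` for a real forward pass would leave the loop unchanged: the model scores every character in the vocabulary, one character is picked, and the extended text becomes the next input.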

Notes & Credits

Trained in PyTorch 2.10 with the roskoN/dailydialog dataset. Weights are exported to text files to import into Scratch lists.
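The export step can be sketched roughly as follows. This is an illustration under assumptions, not the author's script: it assumes each weight matrix is flattened in row-major order and written one number per line, which is the layout Scratch expects when importing a text file into a list. The function names, the `tok_emb.txt` file name, and the six-significant-digit format are hypothetical. With PyTorch, the nested list could come from `tensor.tolist()`.

```python
import os
import tempfile


def flatten(values):
    """Yield the numbers of an arbitrarily nested list in row-major order."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from flatten(v)
        else:
            yield float(v)


def export_list(values, path, digits=6):
    """Write one number per line so the file can be imported as a Scratch list."""
    flat = list(flatten(values))
    with open(path, "w", encoding="utf-8") as f:
        for x in flat:
            f.write(f"{x:.{digits}g}\n")
    return len(flat)


# Example with a made-up 2x2 matrix standing in for a real weight tensor.
weights = [[0.5, -1.25], [2.0, 3.0]]
path = os.path.join(tempfile.mkdtemp(), "tok_emb.txt")  # hypothetical file name
count = export_list(weights, path)
```

Because a Scratch list is one-dimensional, the reading side has to recompute the index itself, for example `row * n_cols + col + 1` for a row-major matrix, since Scratch lists are 1-indexed.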

Project Details

Project ID: 1288884594

Search Index: Unindexed / NFE

Created: March 10, 2026

Last Modified: March 12, 2026

Shared: March 10, 2026

Comments: Allowed