This is a fully working tokenizer, created by me :) It is the first step in my project to build an entire (small) language model inside a Scratch project. [A language model is a kind of AI. ChatGPT is a language model.]
This is the entire tokenizer of the language model SmolLM2-135M-Instruct by Hugging Face, recreated in Scratch, so thanks to them. (I left "Run without screen refresh" off for the main loop so you can see it working. The real application will have it turned on.)