Technically, this is not a Transformer, because it doesn't have attention. It works at about the level of an iPhone's predictive text feature. Btw, in the newest update I added a small delay between tokens to give it more of a ChatGPT feel.

Before a transformer sees any text, a tokenizer converts characters or chunks of characters (tokens) into items in a list (tokenizing). The model is then trained on long sequences of tokens so that it learns which token is likely to come next. ChatGPT uses a transformer model, which is why GPT stands for "Generative Pre-trained Transformer".

IMPROVEMENTS TO BE MADE:
As the title of the original transformer paper says, "Attention Is All You Need". Attention is basically the model being able to look at all of the previous tokens, and weigh how much each one matters, when picking a new token. Just imagine for a second how many tokens you would have to look at and how long the training would take. ☠️

Original paper: "Attention Is All You Need": https://research.google/pubs/attention-is-all-you-need/
Helpful video: "AI Language Models & Transformers - Computerphile": https://www.youtube.com/watch?v=rURRYI66E54
If you want to make a better tokenizer: https://www.youtube.com/watch?v=zduSFxRajkE
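
To make the "predictive text without attention" idea above a bit more concrete, here is a minimal Python sketch. It is not this project's actual code: the function names (`tokenize`, `train_bigrams`, `predict_next`, `generate`) and the tiny training sentence are made up for illustration, and splitting on spaces is an assumed stand-in for a real tokenizer. The point is that the model only ever looks at the single most recent token, which is exactly what attention would improve on.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Turn text into a list of lowercase word tokens (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_bigrams(tokens):
    """For every token, count which tokens follow it in the training text."""
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]


def generate(follows, start, max_tokens=5):
    """Repeatedly pick the likeliest next token, looking only at the last one."""
    output = [start]
    for _ in range(max_tokens):
        nxt = predict_next(follows, output[-1])
        if nxt is None:  # dead end: this token never had a follower in training
            break
        output.append(nxt)
    return output


# Hypothetical toy training text, just for the demo.
tokens = tokenize("the cat sat on the mat and the cat slept")
follows = train_bigrams(tokens)
print(predict_next(follows, "the"))
print(" ".join(generate(follows, "on", max_tokens=3)))
```

Because `predict_next` only sees `output[-1]`, the sketch has no memory of anything earlier in the sentence. A model with attention would instead take every previous token into account when choosing the next one, which is also why it costs so much more to train.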