This is a BPE (byte pair encoding) tokenizer that uses the same vocabulary and merge rules as GPT-4o, GPT-4.5, o1, o3, and many other recent OpenAI models. The vocab is called o200k_base because it has about 200k tokens, which is right at the 200,000-item limit for Scratch lists. It can be slow on very long texts, though, so if you need a faster tokenizer, use a smaller vocab. For example, GPT-2's vocab has only about 50k tokens.
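If you are curious how BPE merge rules work, here is a tiny Python sketch of the idea. It is not this project's code and not how tiktoken is implemented: the real o200k_base tokenizer works on bytes, pre-splits text with a regex, and has roughly 200k entries. The function name bpe_encode and the three-rule TOY_MERGES list are made up for illustration.

```python
def bpe_encode(text, merges):
    """Split text into characters, then apply merge rules by priority."""
    # Rules earlier in the list have higher priority (lower rank).
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(text)
    while len(symbols) > 1:
        # Rank every adjacent pair; pairs without a rule get infinity.
        candidates = [
            (ranks.get((a, b), float("inf")), i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
        ]
        best_rank, i = min(candidates)
        if best_rank == float("inf"):
            break  # no rule applies to any remaining pair
        # Replace the best pair with a single merged symbol.
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Hypothetical toy rules; a real vocab has far more of these.
TOY_MERGES = [("l", "o"), ("lo", "w"), ("e", "r")]

print(bpe_encode("lower", TOY_MERGES))  # ['low', 'er']
```

Each pass scans every adjacent pair, so a long text with many possible merges takes many passes. That repeated scanning is one reason a simple BPE implementation gets slow on very long inputs.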
Text Engine by @PixelBuzz
See tiktoken by OpenAI: https://github.com/openai/tiktoken
See Tiktokenizer: https://tiktokenizer.vercel.app/?model=o200k_base
Learn about tokenization: https://platform.openai.com/tokenizer