100% by me, not yet finished.
It will be for my transformer, and maybe also for a string compression script...
id=1 means the token is [UNK] (unknown).
Case sensitivity: please wait 10 seconds at the beginning on Scratch; that is the time needed to generate the data for case sensitivity.
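For anyone wondering what the id=1 rule looks like outside Scratch, here is a small Python sketch. It is not the project's code: the whitespace splitting, the tiny vocabulary, and the helper names (`build_vocab`, `encode`, `decode`) are all made up for illustration. The only things taken from the notes are that id 1 is reserved for [UNK] and that tokens are case sensitive.

```python
UNK_ID = 1  # from the notes: id=1 means the token is [UNK]


def build_vocab(tokens):
    """Give each known token an id, starting at 2 so id 1 stays reserved."""
    vocab = {"[UNK]": UNK_ID}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab) + 1
    return vocab


def encode(text, vocab):
    """Map each whitespace-separated word to its id, or to UNK_ID if unknown.

    The lookup is case sensitive: "Hello" and "hello" are different tokens.
    Splitting on whitespace is an assumption made only for this sketch.
    """
    return [vocab.get(word, UNK_ID) for word in text.split()]


def decode(ids, vocab):
    """Turn ids back into text; any id not in the vocabulary becomes [UNK]."""
    id_to_token = {i: tok for tok, i in vocab.items()}
    return " ".join(id_to_token.get(i, "[UNK]") for i in ids)


vocab = build_vocab(["Hello", "world"])
print(encode("Hello world", vocab))  # both words are known
print(encode("hello world", vocab))  # lowercase "hello" is unknown, so it gets id 1
print(decode(encode("hello world", vocab), vocab))
```

Because an unknown word collapses to id 1, decoding cannot bring the original word back, which matters if the tokenizer is also used for string compression.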
IT'S FINISHED, ONLY THE TRAINING ISN'T FINISHED.
I picked the "training" data from Project Gutenberg: the first 20,000 books. I will train it with 75k books.
Merge optimizer: idea (but not code) from: https://scratch.mit.edu/projects/1137519813/