This is essentially a properly functioning version of what I was trying to do earlier.

This string similarity calculator takes position into account: AAAB and AAAB are more similar than AAAB and BAAA, but AAAB and BAAA are still more similar than AAAB and AAAA. If you scramble the words in a sentence, you'll get a higher similarity to the original sentence than if you scramble every letter. The relative positioning of substrings affects the similarity score.

This algorithm is relatively efficient and is extremely useful for AIML purposes. For example:

- Finding which known string an unknown string is closest to
- Finding the closest word in a wordlist to an incorrectly spelt word (a much less naive spellcheck!)
- Ranking the average similarity of a string to hundreds of different strings of one emotion versus another batch of strings of another emotion, so that the emotion being expressed in that string can be deduced
- Ranking the average similarity of a string to corpora in various languages to tell (offline!) what language it might be written in
- Given a dataset of chunks of a few words and

This algorithm is entirely different from traditional similarity calculators, which check whether the nth characters of both strings line up. It understands that "hey" and " hey" (with a space before) are basically the same, and that "hey" and "head" are actually much more different.

The algorithm essentially splits the first string into consecutive segments of every size (KITTY into K I T T Y, KI TT Y, KIT TY, KITT Y, KITTY), and then checks how many of those segments appear in the second string. Then the ratio of segments found somewhere in the second string to total segments produced is calculated, giving a completely normalised similarity between 0 and 1.

Feel free to use this in whichever creative way you'd like, just give a little credit!
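To make the segment idea concrete, here is a minimal Python sketch of the scoring as described above. It is a reading of the description, not necessarily the project's actual code: the names `chunks`, `chunk_similarity` and `closest` are made up for illustration, and the handling of empty strings is an assumption.

```python
def chunks(text, size):
    """Split text into consecutive pieces of `size` characters (the last may be shorter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_similarity(a, b):
    """Fraction of all segments of `a` (over every segment size) found somewhere in `b`."""
    if not a:
        # Assumption: two empty strings count as identical, empty vs non-empty as unrelated.
        return 1.0 if not b else 0.0
    total = 0
    found = 0
    for size in range(1, len(a) + 1):
        for piece in chunks(a, size):
            total += 1
            if piece in b:  # plain substring test, position in b does not matter
                found += 1
    return found / total


def closest(unknown, candidates):
    """Return the candidate that scores highest against `unknown` (hypothetical helper)."""
    return max(candidates, key=lambda c: chunk_similarity(unknown, c))


if __name__ == "__main__":
    for other in ("AAAB", "BAAA", "AAAA"):
        print("AAAB vs", other, "->", round(chunk_similarity("AAAB", other), 3))
    print(closest("helo", ["hello", "help", "yellow"]))
```

Note that in this sketch the score is not symmetric: only the first string is split into segments, so `chunk_similarity(a, b)` and `chunk_similarity(b, a)` can differ.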
Once a simple AI algorithm is devised, how the "likeness" of two strings is defined is often at the very heart of how well it performs. Even a perfectly good chatbot algorithm will fail if its fuzzy logic is just not that great.

For strings of regular conversational/text-chat sentence length, the algorithm might take around 8.5 seconds on average to process 2000 data points. Almost certainly, all the ideas I provided above are entirely realistic: words are even shorter and so can be quickly compared tens of thousands of times throughout a wordlist; only a few hundred examples per language or emotion are really necessary; and most chatbot datasets of conversational-length strings have no reason to go beyond 10k messages at most. Generally, for the quality of the algorithm, the efficiency is good enough to support decent AI projects. If you're still worried about performance, run a benchmark here:
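If you'd rather time things locally, a rough harness like the one below is one way to do it. This is only a sketch under assumptions: `chunk_similarity` is a simplified re-implementation of the segment-matching score described earlier (not the project's own code), `run_benchmark` and `random_sentence` are hypothetical helpers, and the random test strings are stand-ins for real chat messages. Timings will vary with the machine, the string lengths and the implementation.

```python
import random
import string
import time


def chunk_similarity(a, b):
    """Simplified segment-matching score: share of a's segments that occur in b."""
    if not a:
        return 1.0 if not b else 0.0
    total = 0
    found = 0
    for size in range(1, len(a) + 1):
        for start in range(0, len(a), size):
            total += 1
            if a[start:start + size] in b:
                found += 1
    return found / total


def random_sentence(rng, length):
    """Build a random lowercase string with spaces (stand-in for a chat message)."""
    alphabet = string.ascii_lowercase + " "
    return "".join(rng.choice(alphabet) for _ in range(length))


def run_benchmark(n_points=2000, length=40, seed=0):
    """Time one query string against n_points random strings; return (seconds, scores)."""
    rng = random.Random(seed)
    query = random_sentence(rng, length)
    data = [random_sentence(rng, length) for _ in range(n_points)]
    start = time.perf_counter()
    scores = [chunk_similarity(query, item) for item in data]
    elapsed = time.perf_counter() - start
    return elapsed, scores


if __name__ == "__main__":
    elapsed, _ = run_benchmark()
    print(f"2000 comparisons took {elapsed:.3f} s")
```

Swapping the random strings for your own dataset, and `length` for your typical message length, gives a more representative number.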