Emoji detector

Sheep_maker • Shared July 30, 2020

827 views

Love/View Ratio: 4.96%

Notes & Credits

Scratch and JavaScript use UTF-16 to encode strings, so most emojis require a surrogate pair. This just means that these emojis seem to have a length of two, when in reality they're just one Unicode code point.

To determine which characters to use, I ran

    // Group every two-unit character on the page by its first (high) surrogate, in hex
    const codePoints = {}
    for (const c of document.body.textContent) {
      if (c.length === 2) {
        const codePoint = c[0].charCodeAt().toString(16)
        if (codePoints[codePoint]) {
          codePoints[codePoint].push(c)
        } else {
          codePoints[codePoint] = [c]
        }
      }
    }

on a random website that claimed to have all the emojis. It seems that the high surrogates in use for emojis are U+D83C, U+D83D, and U+D83E. U+D83F seems to be used for some invisible (maybe joining?) characters. [1] says that the first (high) surrogates are in the U+D800 to U+DBFF block, so there's a lot of uncharted territory. Perhaps characters from either end of this block can be used to properly detect these LARGE characters, but I can't be bothered to find one that renders properly for me.

To see if a character is within a certain range, I think you can use the _ > _ and _ < _ blocks:

    not (char < min or char > max)

What are use cases for this? Maybe this can be used to detect LARGE characters so they can be kept together when doing character-by-character things like reversing a string.

Apparently there are also "single-character" emoji in the Dingbats and Miscellaneous Symbols Unicode blocks (see [2] and [3]).

[1]: https://docs.microsoft.com/en-us/windows/win32/intl/surrogates-and-supplementary-characters#about-supplementary-characters
[2]: https://en.wikipedia.org/wiki/Dingbat#Compact_table
[3]: https://en.wikipedia.org/wiki/Miscellaneous_Symbols#Compact_table
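The range check and the string-reversal use case from the notes could look roughly like this in JavaScript. This is only a sketch, not the project's actual code: the helper names (inRange, isHighSurrogate, isLowSurrogate, splitKeepingPairs, reverseKeepingPairs) are made up for illustration, and the U+DC00 to U+DFFF range for second (low) surrogates is taken from the UTF-16 definition rather than from the notes.

```javascript
// Hypothetical helpers for illustration; none of these names come from the project.

// Same shape as the Scratch check: not (value < min or value > max).
function inRange(value, min, max) {
  return !(value < min || value > max);
}

// First (high) surrogates occupy U+D800..U+DBFF.
function isHighSurrogate(unit) {
  return inRange(unit, 0xd800, 0xdbff);
}

// Second (low) surrogates occupy U+DC00..U+DFFF (assumption from UTF-16, not the notes).
function isLowSurrogate(unit) {
  return inRange(unit, 0xdc00, 0xdfff);
}

// Split a string into pieces, keeping each surrogate pair together as one piece.
function splitKeepingPairs(text) {
  const pieces = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const hasNext = i + 1 < text.length;
    if (isHighSurrogate(unit) && hasNext && isLowSurrogate(text.charCodeAt(i + 1))) {
      pieces.push(text.slice(i, i + 2)); // take both halves of the pair
      i++; // skip the low surrogate that was just consumed
    } else {
      pieces.push(text[i]);
    }
  }
  return pieces;
}

// Reverse a string without tearing surrogate pairs apart.
function reverseKeepingPairs(text) {
  return splitKeepingPairs(text).reverse().join("");
}
```

For example, reverseKeepingPairs("a\uD83D\uDE00b") gives "b\uD83D\uDE00a", with the emoji left intact. Emoji built from several code points (joined sequences, skin-tone modifiers) would still get split up by this approach, since it only looks at one surrogate pair at a time.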

Project Details

Project ID: 414891510

Search Index: Indexed (Visible)

Created: July 30, 2020

Last Modified: August 23, 2020

Shared: July 30, 2020

Comments: Allowed