Unicode encoding

From WHY42

Unicode

At the time of writing, the latest version of the Unicode Standard is 16.0, which contains 154,998 characters in total. [1]


UTF-8, UTF-16, UTF-32
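The three names above are the encoding forms defined by the Unicode Standard. UTF-8 uses one to four bytes per code point, UTF-16 uses one or two 16-bit code units (a surrogate pair above U+FFFF), and UTF-32 always uses a single 32-bit unit. The sketch below, which is not from the original page, encodes a code point by hand in each form. encodeUtf8, encodeUtf16, encodeUtf32, byteLengths and hex are hypothetical helpers for illustration. In real code, the standard TextEncoder API produces UTF-8 bytes directly.

```javascript
// Sketch: encoding one code point by hand in UTF-8, UTF-16 and UTF-32.
// encodeUtf8 / encodeUtf16 / encodeUtf32 / byteLengths / hex are hypothetical helpers.

function assertScalar(cp) {
  // Unicode scalar values: U+0000..U+10FFFF, excluding surrogates U+D800..U+DFFF.
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError(`not a Unicode scalar value: ${cp}`);
  }
}

function encodeUtf8(cp) {
  assertScalar(cp);
  if (cp < 0x80) return [cp]; // 1 byte: 0xxxxxxx
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 110xxxxx 10xxxxxx
  if (cp < 0x10000) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

function encodeUtf16(cp) {
  assertScalar(cp);
  if (cp < 0x10000) return [cp]; // BMP: one 16-bit code unit
  const v = cp - 0x10000; // 20 bits left to split
  return [0xd800 | (v >> 10), 0xdc00 | (v & 0x3ff)]; // high and low surrogate
}

function encodeUtf32(cp) {
  assertScalar(cp);
  return [cp]; // always one 32-bit code unit
}

// Total size in bytes of a whole string in each encoding form.
function byteLengths(str) {
  const totals = { utf8: 0, utf16: 0, utf32: 0 };
  for (const ch of str) { // for...of walks the string by code point
    const cp = ch.codePointAt(0);
    totals.utf8 += encodeUtf8(cp).length;
    totals.utf16 += encodeUtf16(cp).length * 2;
    totals.utf32 += encodeUtf32(cp).length * 4;
  }
  return totals;
}

// Format code units as zero-padded uppercase hex.
const hex = (units, width) =>
  units.map((u) => u.toString(16).toUpperCase().padStart(width, "0")).join(" ");

// "A", "é", "中", "😄"
for (const cp of [0x41, 0xe9, 0x4e2d, 0x1f604]) {
  console.log(
    `U+${cp.toString(16).toUpperCase().padStart(4, "0")}`,
    "UTF-8:", hex(encodeUtf8(cp), 2),
    "UTF-16:", hex(encodeUtf16(cp), 4),
    "UTF-32:", hex(encodeUtf32(cp), 8),
  );
}
// Last line: U+1F604 UTF-8: F0 9F 98 84 UTF-16: D83D DE04 UTF-32: 0001F604

console.log(byteLengths("A中😄")); // { utf8: 8, utf16: 8, utf32: 12 }
```

Note how the same three-character string costs 8 bytes in both UTF-8 and UTF-16 but 12 bytes in UTF-32: which encoding is smallest depends on the mix of characters.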

Unicode in programming languages

JavaScript represents strings as sequences of UTF-16 code units. Each Unicode code point is encoded as one or two code units, so the value of a string's length property, which counts code units, may not match the number of characters in the string. For common scripts such as Latin and Cyrillic, and for frequently used CJK characters, this is rarely an issue: these characters lie in the Basic Multilingual Plane (U+0000 to U+FFFF) and take one code unit each. Characters outside that plane, such as many emoji, mathematical alphanumeric symbols like 𝑥, rare CJK ideographs, and scripts such as Adlam, take two code units (a surrogate pair), so when working with them you may need to account for the difference between code units and characters.

For example, length counts code units, while spreading a string into an array iterates over it by code point:

const emoji = "😄";
console.log(emoji.length); // 2 (UTF-16 code units)
console.log([...emoji].length); // 1 (code points)
const adlam = "𞤲𞥋𞤣𞤫";
console.log(adlam.length); // 8
console.log([...adlam].length); // 4
const formula = "∀𝑥∈ℝ,𝑥²≥0";
console.log(formula.length); // 11
console.log([...formula].length); // 9
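The two-unit results come from surrogate pairs: a code point above U+FFFF is split into a high surrogate (U+D800 to U+DBFF) and a low surrogate (U+DC00 to U+DFFF). The sketch below shows the code units behind a single emoji and how to get back to code points. charCodeAt, codePointAt and String.fromCodePoint are standard String methods; combineSurrogates and countCodePoints are hypothetical helpers written for illustration.

```javascript
// Sketch: the UTF-16 code units behind a supplementary character,
// and ways of getting back to code points.

const face = "😄"; // U+1F604, outside the Basic Multilingual Plane

// length, indexing and charCodeAt all work on code units.
const units = [];
for (let i = 0; i < face.length; i++) units.push(face.charCodeAt(i));
console.log(units.map((u) => u.toString(16)).join(" ")); // d83d de04

// Hypothetical helper: rebuild the code point from a surrogate pair.
// The high unit carries the top 10 bits, the low unit the bottom 10 bits,
// and the result is offset by 0x10000.
function combineSurrogates(high, low) {
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}
console.log(combineSurrogates(units[0], units[1]).toString(16)); // 1f604

// Standard methods that already understand surrogate pairs.
console.log(face.codePointAt(0).toString(16)); // 1f604
console.log(String.fromCodePoint(0x1f604) === face); // true

// Hypothetical helper: count code points by consuming both halves of each pair.
// A lone surrogate counts as one code point, matching [...str].length.
function countCodePoints(str) {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    const next = str.charCodeAt(i + 1); // NaN past the end, so isPair is false
    const isPair = unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
    if (isPair) i++; // skip the low surrogate: it belongs to this code point
    count++;
  }
  return count;
}
console.log(countCodePoints("∀𝑥∈ℝ,𝑥²≥0")); // 9

// Slicing by code unit can split a pair and leave a lone surrogate.
console.log(face.slice(0, 1).length); // 1, but it is not a valid character on its own
```

The same pitfall applies to substring, slice and index access in general: they operate on code units, so cutting text at an arbitrary index can break a character in two.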

Unicode combining characters


ส็็็็็็็็็็็็็็็็็็็็็็็็็༼ ຈل͜ຈ༽ส้้้้้้้้้้้้้้้้้้้้้้้
ส็็็็็็็็็็็็็็็็็็็็็็็็็༼ ಠ_ಠ ༽ส้้้้้้้้้้้้้้้้้้้้้้้
ส็็็็็็็็็็็็็็็็็็็( ͡° ͜ʖ ͡°)
ส้้้้้้้้้้้้้้้้้้้้้้้ ส็็็็็็็็็็็็็็็็็็็็็็็็็ 
S̢͎̳̞̲͈̪̳̻ͮͩt̟̳̏ͬ̔͒̈́ͦ͠a̞̤̝̟ͫ̽̂̈́ͪ͐͘n͕͐͑ͪ͐ͦ͋ͮ̅d͚̗̙̎ͫ̌â̗̬͓͍͍̳̥͆̕͠r̢̘ͣ̀d̢̢̢̘̲̺͙̂̈́̊ͬ ͎͎̫͚̣̺̤̖͊̏̀ͬ͞u̧͆ͩ́͒҉͔̠̪̖̹̠̰͎ṇ̸̛͚̟̫͎̟̣̜͋̈́ͧͯi̲̲̺͑̐ͣ͗̿̕͘͝c̦͈͇̦͈ͦ̆ͨ͝o̟̭̫̥͎̹͆́ͥ͊ͬ̏͝d̪͔̯̥̩͙̝ͩ̏͒̈́ͩ̿́̕͜ͅe͍͓̻̊͛ͅ ̸̧̻̺̤̠͙ͪ̋̽l̛̥̥ͬ͂̈́ͤ̓̀̓̚͘ͅͅͅǒ̮͓̼ͭ̂̆̇̕͘ͅl̯̯̟̗͔̳͉̰ͫ̒ͧͦͩͦ̓̓͢ͅs̝͎͚̗̮̟̒̔͛̈̊͋͒ͩͅ Cool!
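Text art like this is built from combining marks (Unicode general category M), which attach to the preceding base character; piling many of them onto one letter makes renderers draw them stacked above and below it. A minimal sketch of measuring such a string, assuming an engine with Unicode property escapes (ES2018) and Intl.Segmenter (for example Node.js 16 or later). The sample string is illustrative and written with escapes, not copied from the page.

```javascript
// Sketch: counting a string decorated with combining marks.
// Assumes Unicode property escapes (ES2018) and Intl.Segmenter (e.g. Node.js 16+).

// Illustrative sample: "Cool" with combining diacritics (U+0300..U+0308) after each letter.
const zalgo = "C\u0300\u0301o\u0302\u0303\u0304o\u0305l\u0306\u0307\u0308";

const codeUnits = zalgo.length; // UTF-16 code units: 13
const codePoints = [...zalgo].length; // code points: 13 (all in the BMP)
const marks = (zalgo.match(/\p{M}/gu) || []).length; // combining marks: 9

// Grapheme clusters approximate "user-perceived characters" (UAX #29).
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const graphemes = [...segmenter.segment(zalgo)].length; // 4

console.log({ codeUnits, codePoints, marks, graphemes });

// Stripping every mark recovers the base letters.
console.log(zalgo.replace(/\p{M}/gu, "")); // Cool
```

This is also why [...str].length is not a full answer to "how many characters": it counts code points, while a reader sees one letter per grapheme cluster.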



I generated one too: Rͨ̍̀̐iͩͤͦ̈́́̓g̃ͬ̾u̓͆ͬ̐̎ͨ͋̆z̑ͤͯ̒ͦ͗̿̍ ͤ̇̒L͒̂͑̎ͣͣͯ̉e̊e̐̏̏̆̑͗ͥ́ For more details, see here.
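A generator like the one used above can be sketched by appending randomly chosen combining marks from the Combining Diacritical Marks block (U+0300 to U+036F) after each non-whitespace character. This is an assumption about how such tools work, not the generator the author used; zalgoify and its parameters are hypothetical, and the random source is passed in so the output can be made reproducible.

```javascript
// Hypothetical helper, not the generator used on this page: append random
// combining marks from the Combining Diacritical Marks block (U+0300..U+036F).
const MARK_FIRST = 0x0300;
const MARK_LAST = 0x036f;

function zalgoify(text, marksPerChar = 5, rng = Math.random) {
  let out = "";
  for (const ch of text) { // for...of iterates by code point
    out += ch;
    if (/\s/u.test(ch)) continue; // leave whitespace undecorated
    for (let i = 0; i < marksPerChar; i++) {
      // rng() is expected to return a number in [0, 1), like Math.random.
      const offset = Math.floor(rng() * (MARK_LAST - MARK_FIRST + 1));
      out += String.fromCodePoint(MARK_FIRST + offset);
    }
  }
  return out;
}

console.log(zalgoify("Riguz Lee", 6)); // different output on every run
```

Passing a fixed function as rng, for example () => 0, makes the output deterministic, which is convenient for tests.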