Unicode encoding:修订间差异
imported>Riguz 无编辑摘要 |
标签:2017版源代码编辑 |
||
(未显示同一用户的4个中间版本) | |||
第1行: | 第1行: | ||
= Unicode = | |||
Latest unicode standard is 16.0, which contains 154,998 characters in total. <ref>https://www.unicode.org/versions/Unicode16.0.0/</ref> | |||
== UTF-8, UTF-16, UTF-32 == | |||
== Unicode in programing languages == | |||
JavaScript uses UTF-16 encoding, where each Unicode character may be encoded as one or two code units, so it's possible for the value returned by length to not match the actual number of Unicode characters in the string. For common scripts like Latin, Cyrillic, wellknown CJK characters, etc., this should not be an issue, but if you are working with certain scripts, such as emojis, mathematical symbols, or obscure Chinese characters, you may need to account for the difference between code units and characters. | |||
Javascript uses UTF-16 to represent strings. | |||
<syntaxhighlight lang="javascript"> | |||
const emoji = "😄"; | |||
console.log(emoji.length); // 2 | |||
console.log([...emoji].length); // 1 | |||
const adlam = "𞤲𞥋𞤣𞤫"; | |||
console.log(adlam.length); // 8 | |||
console.log([...adlam].length); // 4 | |||
const formula = "∀𝑥∈ℝ,𝑥²≥0"; | |||
console.log(formula.length); // 11 | |||
console.log([...formula].length); // 9 | |||
</syntaxhighlight> | |||
== Unicode == | |||
<pre> | <pre> | ||
2024年11月13日 (三) 11:31的最新版本
Unicode
Latest unicode standard is 16.0, which contains 154,998 characters in total. [1]
UTF-8, UTF-16, UTF-32
Unicode in programing languages
JavaScript uses UTF-16 encoding, where each Unicode character may be encoded as one or two code units, so it's possible for the value returned by length to not match the actual number of Unicode characters in the string. For common scripts like Latin, Cyrillic, wellknown CJK characters, etc., this should not be an issue, but if you are working with certain scripts, such as emojis, mathematical symbols, or obscure Chinese characters, you may need to account for the difference between code units and characters.
Javascript uses UTF-16 to represent strings.
const emoji = "😄";
console.log(emoji.length); // 2
console.log([...emoji].length); // 1
const adlam = "𞤲𞥋𞤣𞤫";
console.log(adlam.length); // 8
console.log([...adlam].length); // 4
const formula = "∀𝑥∈ℝ,𝑥²≥0";
console.log(formula.length); // 11
console.log([...formula].length); // 9
Unicode
ส็็็็็็็็็็็็็็็็็็็็็็็็็༼ ຈل͜ຈ༽ส้้้้้้้้้้้้้้้้้้้้้้้ ส็็็็็็็็็็็็็็็็็็็็็็็็็༼ ಠ_ಠ ༽ส้้้้้้้้้้้้้้้้้้้้้้้ ส็็็็็็็็็็็็็็็็็็็( ͡° ͜ʖ ͡°) ส้้้้้้้้้้้้้้้้้้้้้้้ ส็็็็็็็็็็็็็็็็็็็็็็็็็ S̢͎̳̞̲͈̪̳̻ͮͩt̟̳̏ͬ̔͒̈́ͦ͠a̞̤̝̟ͫ̽̂̈́ͪ͐͘n͕͐͑ͪ͐ͦ͋ͮ̅d͚̗̙̎ͫ̌â̗̬͓͍͍̳̥͆̕͠r̢̘ͣ̀d̢̢̢̘̲̺͙̂̈́̊ͬ ͎͎̫͚̣̺̤̖͊̏̀ͬ͞u̧͆ͩ́͒҉͔̠̪̖̹̠̰͎ṇ̸̛͚̟̫͎̟̣̜͋̈́ͧͯi̲̲̺͑̐ͣ͗̿̕͘͝c̦͈͇̦͈ͦ̆ͨ͝o̟̭̫̥͎̹͆́ͥ͊ͬ̏͝d̪͔̯̥̩͙̝ͩ̏͒̈́ͩ̿́̕͜ͅe͍͓̻̊͛ͅ ̸̧̻̺̤̠͙ͪ̋̽l̛̥̥ͬ͂̈́ͤ̓̀̓̚͘ͅͅͅǒ̮͓̼ͭ̂̆̇̕͘ͅl̯̯̟̗͔̳͉̰ͫ̒ͧͦͩͦ̓̓͢ͅs̝͎͚̗̮̟̒̔͛̈̊͋͒ͩͅ Cool!
我也来生成一个: Rͨ̍̀̐iͩͤͦ̈́́̓g̃ͬ̾u̓͆ͬ̐̎ͨ͋̆z̑ͤͯ̒ͦ͗̿̍ ͤ̇̒L͒̂͑̎ͣͣͯ̉e̊e̐̏̏̆̑͗ͥ́ 了解更多,参见这里