Unicode Character Set in JavaScript


In the Unicode Escape Sequences blog, we learned what Unicode escape sequences and normalization are. Both are important tools in JavaScript for handling a wide range of characters and keeping text consistent. Escape sequences let us write characters from various languages, while normalization gives equivalent text a single uniform representation, reducing inconsistencies.

Using these tools makes JavaScript code more robust, promotes compatibility across languages, and creates a more inclusive web development environment. In this blog, we will go through the concept of the Unicode character set.


What is a Unicode Character Set?

Before we dive into Unicode, let's explore how data is stored. Data is stored in bits, which are 0s and 1s. For instance, to represent the number 28, we use the binary form 11100. To store a character such as b, we first need to map it to a number, and the oldest widely used mapping is ASCII. ASCII, or American Standard Code for Information Interchange, assigns the numbers 0 to 127 to basic Western characters, supporting 128 characters in total. Characters such as 👍 (thumbs up) or é fall outside that range. For example, the string HeAlo is encoded as follows:


| String | H | e | A | l | o |
| --- | --- | --- | --- | --- | --- |
| ASCII value | 72 | 101 | 65 | 108 | 111 |
| Binary | 0100 1000 | 0110 0101 | 0100 0001 | 0110 1100 | 0110 1111 |
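The table above can be reproduced in JavaScript. The following is a minimal sketch; the helper name `toAsciiRows` is made up for this illustration and is not part of any library.

```javascript
// Convert each character of an ASCII string to its decimal code
// and its 8-bit binary form.
function toAsciiRows(text) {
  return Array.from(text, (char) => {
    const code = char.charCodeAt(0);
    return { char, code, binary: code.toString(2).padStart(8, "0") };
  });
}

const rows = toAsciiRows("HeAlo");
console.log(rows.map((r) => r.code).join(" "));
// 72 101 65 108 111
console.log(rows.map((r) => r.binary).join(" "));
// 01001000 01100101 01000001 01101100 01101111
```

`charCodeAt(0)` is enough here because every ASCII character fits in a single UTF-16 code unit.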

In this encoding, each character is stored as 8 bits, or 1 byte (ASCII itself needs only 7 bits, so the leading bit is always 0). But what if we want to display non-Western characters or other writing systems, like Chinese or Arabic? That's where Unicode comes in. Unicode defines well over 100,000 characters, covering more than 150 scripts used by the world's languages.
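To see where ASCII stops and Unicode takes over, here is a small sketch. The helpers `isAscii` and `codePointsOf` are hypothetical names chosen for this example. The non-ASCII characters are written as escape sequences so the result does not depend on how the file is saved.

```javascript
// True when every character of the string is in the ASCII range 0-127.
function isAscii(text) {
  return /^[\x00-\x7F]*$/.test(text);
}

// List the Unicode code points of a string as "U+XXXX" labels.
// Array.from iterates by code point, so an emoji counts as one item.
function codePointsOf(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(isAscii("Hello"));   // true
console.log(isAscii("\u00e9"));  // false (é is code point 233)

// b, é, the Chinese character 中, and the thumbs-up emoji
console.log(codePointsOf("b\u00e9\u4e2d\u{1F44D}").join(" "));
// U+0062 U+00E9 U+4E2D U+1F44D

// JavaScript strings are UTF-16, so the emoji takes two code units.
console.log("\u{1F44D}".length); // 2
```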

To display characters or symbols in Unicode, we use code points, which can be combined. For instance, d is represented by a single code point, 100 (U+0064), while é can be represented either by a single code point, 233 (U+00E9), or by two code points: e (101, U+0065) followed by the combining acute accent (769, U+0301).
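The two ways of writing é can be compared directly in JavaScript. This sketch uses only the built-in `String.fromCodePoint` and `String.prototype.normalize`; the variable names are illustrative.

```javascript
// é as a single code point (233) and as e (101) + combining acute accent (769)
const precomposed = String.fromCodePoint(233);
const decomposed = String.fromCodePoint(101, 769);

// They look the same on screen but are different strings.
console.log(precomposed === decomposed);            // false
console.log(precomposed.length, decomposed.length); // 1 2

// Normalizing to the same form makes them comparable.
console.log(precomposed === decomposed.normalize("NFC")); // true
console.log(precomposed.normalize("NFD") === decomposed); // true
```

NFC composes characters where possible and NFD decomposes them, so normalizing both sides to one form before comparing avoids this kind of mismatch.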


ASCII Table

The previous section introduced ASCII. Let's look at all 128 ASCII characters with their decimal, hexadecimal, and binary representations:

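| Decimal | Hexadecimal | Binary | Char | Description |
| --- | --- | --- | --- | --- |
| 0 | 00 | 00000000 | NUL | Null |
| 1 | 01 | 00000001 | SOH | Start of Heading |
| 2 | 02 | 00000010 | STX | Start of Text |
| 3 | 03 | 00000011 | ETX | End of Text |
| 4 | 04 | 00000100 | EOT | End of Transmission |
| 5 | 05 | 00000101 | ENQ | Enquiry |
| 6 | 06 | 00000110 | ACK | Acknowledge |
| 7 | 07 | 00000111 | BEL | Bell |
| 8 | 08 | 00001000 | BS | Backspace |
| 9 | 09 | 00001001 | HT | Horizontal Tab |
| 10 | 0A | 00001010 | LF | Line Feed |
| 11 | 0B | 00001011 | VT | Vertical Tab |
| 12 | 0C | 00001100 | FF | Form Feed |
| 13 | 0D | 00001101 | CR | Carriage Return |
| 14 | 0E | 00001110 | SO | Shift Out |
| 15 | 0F | 00001111 | SI | Shift In |
| 16 | 10 | 00010000 | DLE | Data Link Escape |
| 17 | 11 | 00010001 | DC1 | Device Control 1 |
| 18 | 12 | 00010010 | DC2 | Device Control 2 |
| 19 | 13 | 00010011 | DC3 | Device Control 3 |
| 20 | 14 | 00010100 | DC4 | Device Control 4 |
| 21 | 15 | 00010101 | NAK | Negative Acknowledge |
| 22 | 16 | 00010110 | SYN | Synchronize |
| 23 | 17 | 00010111 | ETB | End of Transmission Block |
| 24 | 18 | 00011000 | CAN | Cancel |
| 25 | 19 | 00011001 | EM | End of Medium |
| 26 | 1A | 00011010 | SUB | Substitute |
| 27 | 1B | 00011011 | ESC | Escape |
| 28 | 1C | 00011100 | FS | File Separator |
| 29 | 1D | 00011101 | GS | Group Separator |
| 30 | 1E | 00011110 | RS | Record Separator |
| 31 | 1F | 00011111 | US | Unit Separator |
| 32 | 20 | 00100000 | space | Space |
| 33 | 21 | 00100001 | ! | exclamation mark |
| 34 | 22 | 00100010 | " | double quote |
| 35 | 23 | 00100011 | # | number |
| 36 | 24 | 00100100 | $ | dollar |
| 37 | 25 | 00100101 | % | percent |
| 38 | 26 | 00100110 | & | ampersand |
| 39 | 27 | 00100111 | ' | single quote |
| 40 | 28 | 00101000 | ( | left parenthesis |
| 41 | 29 | 00101001 | ) | right parenthesis |
| 42 | 2A | 00101010 | * | asterisk |
| 43 | 2B | 00101011 | + | plus |
| 44 | 2C | 00101100 | , | comma |
| 45 | 2D | 00101101 | - | minus |
| 46 | 2E | 00101110 | . | period |
| 47 | 2F | 00101111 | / | slash |
| 48 | 30 | 00110000 | 0 | zero |
| 49 | 31 | 00110001 | 1 | one |
| 50 | 32 | 00110010 | 2 | two |
| 51 | 33 | 00110011 | 3 | three |
| 52 | 34 | 00110100 | 4 | four |
| 53 | 35 | 00110101 | 5 | five |
| 54 | 36 | 00110110 | 6 | six |
| 55 | 37 | 00110111 | 7 | seven |
| 56 | 38 | 00111000 | 8 | eight |
| 57 | 39 | 00111001 | 9 | nine |
| 58 | 3A | 00111010 | : | colon |
| 59 | 3B | 00111011 | ; | semicolon |
| 60 | 3C | 00111100 | < | less than |
| 61 | 3D | 00111101 | = | equality sign |
| 62 | 3E | 00111110 | > | greater than |
| 63 | 3F | 00111111 | ? | question mark |
| 64 | 40 | 01000000 | @ | at sign |
| 65 | 41 | 01000001 | A | |
| 66 | 42 | 01000010 | B | |
| 67 | 43 | 01000011 | C | |
| 68 | 44 | 01000100 | D | |
| 69 | 45 | 01000101 | E | |
| 70 | 46 | 01000110 | F | |
| 71 | 47 | 01000111 | G | |
| 72 | 48 | 01001000 | H | |
| 73 | 49 | 01001001 | I | |
| 74 | 4A | 01001010 | J | |
| 75 | 4B | 01001011 | K | |
| 76 | 4C | 01001100 | L | |
| 77 | 4D | 01001101 | M | |
| 78 | 4E | 01001110 | N | |
| 79 | 4F | 01001111 | O | |
| 80 | 50 | 01010000 | P | |
| 81 | 51 | 01010001 | Q | |
| 82 | 52 | 01010010 | R | |
| 83 | 53 | 01010011 | S | |
| 84 | 54 | 01010100 | T | |
| 85 | 55 | 01010101 | U | |
| 86 | 56 | 01010110 | V | |
| 87 | 57 | 01010111 | W | |
| 88 | 58 | 01011000 | X | |
| 89 | 59 | 01011001 | Y | |
| 90 | 5A | 01011010 | Z | |
| 91 | 5B | 01011011 | [ | left square bracket |
| 92 | 5C | 01011100 | \ | backslash |
| 93 | 5D | 01011101 | ] | right square bracket |
| 94 | 5E | 01011110 | ^ | caret / circumflex |
| 95 | 5F | 01011111 | _ | underscore |
| 96 | 60 | 01100000 | ` | grave / accent |
| 97 | 61 | 01100001 | a | |
| 98 | 62 | 01100010 | b | |
| 99 | 63 | 01100011 | c | |
| 100 | 64 | 01100100 | d | |
| 101 | 65 | 01100101 | e | |
| 102 | 66 | 01100110 | f | |
| 103 | 67 | 01100111 | g | |
| 104 | 68 | 01101000 | h | |
| 105 | 69 | 01101001 | i | |
| 106 | 6A | 01101010 | j | |
| 107 | 6B | 01101011 | k | |
| 108 | 6C | 01101100 | l | |
| 109 | 6D | 01101101 | m | |
| 110 | 6E | 01101110 | n | |
| 111 | 6F | 01101111 | o | |
| 112 | 70 | 01110000 | p | |
| 113 | 71 | 01110001 | q | |
| 114 | 72 | 01110010 | r | |
| 115 | 73 | 01110011 | s | |
| 116 | 74 | 01110100 | t | |
| 117 | 75 | 01110101 | u | |
| 118 | 76 | 01110110 | v | |
| 119 | 77 | 01110111 | w | |
| 120 | 78 | 01111000 | x | |
| 121 | 79 | 01111001 | y | |
| 122 | 7A | 01111010 | z | |
| 123 | 7B | 01111011 | { | left curly bracket |
| 124 | 7C | 01111100 | \| | vertical bar |
| 125 | 7D | 01111101 | } | right curly bracket |
| 126 | 7E | 01111110 | ~ | tilde |
| 127 | 7F | 01111111 | DEL | delete |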
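Any row of this table can also be computed rather than looked up. The sketch below is one possible way to do it; `asciiRow` and `CONTROL_NAMES` are names invented for this example, and the abbreviations are copied from the table above.

```javascript
// Abbreviations for the 32 control characters (codes 0-31), as in the table.
const CONTROL_NAMES = [
  "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
  "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
  "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
  "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
];

// Build one table row (decimal, hex, binary, char) for an ASCII code.
function asciiRow(code) {
  if (!Number.isInteger(code) || code < 0 || code > 127) {
    throw new RangeError("ASCII codes range from 0 to 127");
  }
  let char;
  if (code < 32) char = CONTROL_NAMES[code];
  else if (code === 32) char = "space";
  else if (code === 127) char = "DEL";
  else char = String.fromCharCode(code);

  return {
    decimal: code,
    hex: code.toString(16).toUpperCase().padStart(2, "0"),
    binary: code.toString(2).padStart(8, "0"),
    char,
  };
}

const row = asciiRow(65);
console.log(row.decimal, row.hex, row.binary, row.char);
// 65 41 01000001 A
```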

Latin-1 Table

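Latin-1 (ISO 8859-1) extends ASCII from 128 to 256 characters, and the first 256 Unicode code points match it. The table below shows the decimal and hexadecimal value of each Latin-1 character. The printable ASCII characters 33 to 126 are the same as in the ASCII table and are omitted here: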

| Decimal | Hexadecimal | Description |
| --- | --- | --- |
| 0 | 00 | null |
| 1 | 01 | start of heading |
| 2 | 02 | start of text |
| 3 | 03 | end of text |
| 4 | 04 | end of transmission |
| 5 | 05 | enquiry |
| 6 | 06 | acknowledge |
| 7 | 07 | bell |
| 8 | 08 | backspace |
| 9 | 09 | character tabulation |
| 10 | 0A | line feed |
| 11 | 0B | line tabulation |
| 12 | 0C | form feed |
| 13 | 0D | carriage return |
| 14 | 0E | shift out |
| 15 | 0F | shift in |
| 16 | 10 | datalink escape |
| 17 | 11 | device control one |
| 18 | 12 | device control two |
| 19 | 13 | device control three |
| 20 | 14 | device control four |
| 21 | 15 | negative acknowledge |
| 22 | 16 | synchronous idle |
| 23 | 17 | end of transmission block |
| 24 | 18 | cancel |
| 25 | 19 | end of medium |
| 26 | 1A | substitute |
| 27 | 1B | escape |
| 28 | 1C | file separator |
| 29 | 1D | group separator |
| 30 | 1E | record separator |
| 31 | 1F | unit separator |
| 32 | 20 | space |
| 127 | 7F | delete |
| 128 | 80 | padding character |
| 129 | 81 | high octet preset |
| 130 | 82 | break permitted here |
| 131 | 83 | no break here |
| 132 | 84 | index |
| 133 | 85 | next line |
| 134 | 86 | start of selected area |
| 135 | 87 | end of selected area |
| 136 | 88 | character tabulation set |
| 137 | 89 | character tabulation with justification |
| 138 | 8A | line tabulation set |
| 139 | 8B | partial line forward |
| 140 | 8C | partial line backward |
| 141 | 8D | reverse line feed |
| 142 | 8E | single shift two |
| 143 | 8F | single shift three |
| 144 | 90 | device control string |
| 145 | 91 | private use one |
| 146 | 92 | private use two |
| 147 | 93 | set transmit state |
| 148 | 94 | cancel character |
| 149 | 95 | message waiting |
| 150 | 96 | start of guarded area |
| 151 | 97 | end of guarded area |
| 152 | 98 | start of string |
| 153 | 99 | single graphic character introducer |
| 154 | 9A | single character introducer |
| 155 | 9B | control sequence introducer |
| 156 | 9C | string terminator |
| 157 | 9D | operating system command |
| 158 | 9E | privacy message |
| 159 | 9F | application program command |
| 160 | A0 | non-breaking space |
| 161 | A1 | inverted exclamation mark |
| 162 | A2 | cent sign |
| 163 | A3 | pound sterling sign |
| 164 | A4 | currency sign |
| 165 | A5 | yen sign |
| 166 | A6 | broken bar |
| 167 | A7 | section sign |
| 168 | A8 | diaeresis (umlaut) |
| 169 | A9 | copyright sign |
| 170 | AA | feminine ordinal |
| 171 | AB | left angle quote |
| 172 | AC | not sign |
| 173 | AD | soft hyphen |
| 174 | AE | registered sign |
| 175 | AF | macron |
| 176 | B0 | degree sign |
| 177 | B1 | plus-minus sign |
| 178 | B2 | superscript two |
| 179 | B3 | superscript three |
| 180 | B4 | acute accent |
| 181 | B5 | micro sign |
| 182 | B6 | paragraph sign (pilcrow) |
| 183 | B7 | middle dot |
| 184 | B8 | cedilla |
| 185 | B9 | superscript one |
| 186 | BA | masculine ordinal |
| 187 | BB | right angle quote |
| 188 | BC | one-fourth fraction |
| 189 | BD | one-half fraction |
| 190 | BE | three-quarter fraction |
| 191 | BF | inverted question mark |
| 192 | C0 | capital a with grave accent |
| 193 | C1 | capital a with acute accent |
| 194 | C2 | capital a with circumflex |
| 195 | C3 | capital a with tilde |
| 196 | C4 | capital a with diaeresis |
| 197 | C5 | capital a with ring |
| 198 | C6 | capital ae ligature |
| 199 | C7 | capital c with cedilla |
| 200 | C8 | capital e with grave accent |
| 201 | C9 | capital e with acute accent |
| 202 | CA | capital e with circumflex |
| 203 | CB | capital e with diaeresis |
| 204 | CC | capital i with grave accent |
| 205 | CD | capital i with acute accent |
| 206 | CE | capital i with circumflex |
| 207 | CF | capital i with diaeresis |
| 208 | D0 | capital eth |
| 209 | D1 | capital n with tilde |
| 210 | D2 | capital o with grave accent |
| 211 | D3 | capital o with acute accent |
| 212 | D4 | capital o with circumflex |
| 213 | D5 | capital o with tilde |
| 214 | D6 | capital o with diaeresis |
| 215 | D7 | multiplication sign |
| 216 | D8 | capital o with slash |
| 217 | D9 | capital u with grave accent |
| 218 | DA | capital u with acute accent |
| 219 | DB | capital u with circumflex |
| 220 | DC | capital u with diaeresis |
| 221 | DD | capital y with acute accent |
| 222 | DE | capital thorn |
| 223 | DF | small sharp s |
| 224 | E0 | small a with grave accent |
| 225 | E1 | small a with acute accent |
| 226 | E2 | small a with circumflex |
| 227 | E3 | small a with tilde |
| 228 | E4 | small a with diaeresis |
| 229 | E5 | small a with ring |
| 230 | E6 | small ae ligature |
| 231 | E7 | small c with cedilla |
| 232 | E8 | small e with grave accent |
| 233 | E9 | small e with acute accent |
| 234 | EA | small e with circumflex |
| 235 | EB | small e with diaeresis |
| 236 | EC | small i with grave accent |
| 237 | ED | small i with acute accent |
| 238 | EE | small i with circumflex |
| 239 | EF | small i with diaeresis |
| 240 | F0 | small eth |
| 241 | F1 | small n with tilde |
| 242 | F2 | small o with grave accent |
| 243 | F3 | small o with acute accent |
| 244 | F4 | small o with circumflex |
| 245 | F5 | small o with tilde |
| 246 | F6 | small o with diaeresis |
| 247 | F7 | division sign |
| 248 | F8 | small o with slash |
| 249 | F9 | small u with grave accent |
| 250 | FA | small u with acute accent |
| 251 | FB | small u with circumflex |
| 252 | FC | small u with diaeresis |
| 253 | FD | small y with acute accent |
| 254 | FE | small thorn |
| 255 | FF | small y with diaeresis |
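Because the Latin-1 values line up with the first 256 Unicode code points, the decimal values in the table can be passed straight to `String.fromCharCode`. A rough sketch follows; the helper `isLatin1` is a hypothetical name used only for this example.

```javascript
// Look up Latin-1 characters by their decimal or hexadecimal value.
const eAcute = String.fromCharCode(233);   // small e with acute accent
const cCedilla = String.fromCharCode(0xe7); // small c with cedilla

// True when every code point of the string fits in the Latin-1 range 0-255.
function isLatin1(text) {
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xff) return false;
  }
  return true;
}

console.log(eAcute === "\u00e9");   // true
console.log(cCedilla === "\u00e7"); // true
console.log(isLatin1("caf\u00e9")); // true
console.log(isLatin1("\u20ac"));    // false (the euro sign is U+20AC)
```

A check like this can be useful before handing text to an API that only accepts Latin-1.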

Unicode helps websites work well with different languages, making the online experience more connected and accessible. In coding, knowing and using Unicode is essential for creating websites that speak a global language.