Unicode Character Set in JavaScript

Last modified: Mar 17, 2024 by Atrowel

In the Unicode Escape Sequences blog, we learned what Unicode escape sequences are and how normalization works. Both are important tools for handling text in JavaScript. Escape sequences let us write characters from any language by their code point, while normalization makes sure that text which looks the same is also represented the same way, reducing inconsistencies.

Using these tools makes JavaScript code more robust when it handles text in many languages, which helps create a more inclusive web. In this blog, we will go through the concept of the Unicode character set.

What is a Unicode Character Set?

Before we dive into Unicode, let's explore how data is stored. Data is stored in bits, which are 0s and 1s. For instance, the number 28 is written in binary as 11100. To store text, every character must also be mapped to a number. The classic scheme for this is ASCII (American Standard Code for Information Interchange), which assigns the numbers 0 to 127 to basic English letters, digits, punctuation, and control characters, 128 characters in total. ASCII can represent a character like b, but it has no codes for characters such as é or 👍 (thumbs up). For example,

Character	H	e	A	l	o
ASCII value (decimal)	72	101	65	108	111
Binary	0100 1000	0110 0101	0100 0001	0110 1100	0110 1111
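The values in this table can be checked directly in JavaScript. This minimal sketch uses the built-in `charCodeAt()` and `Number.prototype.toString(2)`; the helper name `toBinary8` is our own, chosen only for illustration.

```javascript
// Format a number as an 8-bit binary string, e.g. 65 -> "01000001"
function toBinary8(n) {
  return n.toString(2).padStart(8, "0");
}

// 28 in binary, without padding
console.log((28).toString(2)); // 11100

// Rebuild the rows of the table for the string "HeAlo"
const text = "HeAlo";
for (const ch of text) {
  const code = ch.charCodeAt(0); // ASCII value (same as the Unicode code point for these characters)
  console.log(ch, code, toBinary8(code));
}
// H 72 01001000
// e 101 01100101
// A 65 01000001
// l 108 01101100
// o 111 01101111
```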

ASCII itself is a 7-bit code, but each character is normally stored in one 8-bit byte, as the binary row above shows. But what if we want to display characters from other writing systems, like Chinese or Arabic, or symbols such as é and 👍? That's where Unicode comes in. Unicode assigns a unique number to well over 100,000 characters, covering more than 150 scripts used by languages around the world, along with symbols and emoji.
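To see the difference in practice, the sketch below reads the code point of characters from several writing systems with the built-in `codePointAt()`. Anything above 127 falls outside ASCII. The helper `isAscii` is hypothetical, written for this example, not a built-in.

```javascript
// Returns true only if every character in `str` has a code point in the ASCII range 0-127
function isAscii(str) {
  for (const ch of str) { // for...of visits whole characters (code points)
    if (ch.codePointAt(0) > 127) return false;
  }
  return true;
}

const samples = ["H", "é", "中", "م", "👍"];
for (const s of samples) {
  console.log(s, s.codePointAt(0), isAscii(s));
}
// H 72 true
// é 233 false
// 中 20013 false
// م 1605 false
// 👍 128077 false
```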

Each Unicode character is identified by a number called a code point, and a visible character can be built from one code point or from several combined. For instance, d is represented by a single code point, 100 (U+0064). The letter é (e with an acute accent) can be represented either by a single code point, 233 (U+00E9), or by two code points: e (101, U+0065) followed by the combining acute accent (769, U+0301).
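In JavaScript, the two forms of é are different strings even though they render the same; `String.prototype.normalize()` (covered in the previous blog) brings them to a common form. A minimal sketch; `codePoints` is a helper defined here for illustration.

```javascript
// List the decimal code points that make up a string
function codePoints(str) {
  return Array.from(str, (ch) => ch.codePointAt(0)); // Array.from splits a string by code point
}

const single = "\u00E9";    // é as one code point (233)
const combined = "e\u0301"; // e (101) followed by the combining acute accent (769)

console.log(codePoints("d").join(" "));      // 100
console.log(codePoints(single).join(" "));   // 233
console.log(codePoints(combined).join(" ")); // 101 769

// Same appearance, different code points
console.log(single === combined); // false
// NFC composes e + U+0301 into the single code point U+00E9
console.log(single === combined.normalize("NFC")); // true
```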

ASCII Table

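In the section above, we learned about ASCII. The table below lists all 128 ASCII codes in decimal, hexadecimal, and binary, together with the character and a short description. Codes 0 to 31 and 127 are non-printing control characters.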

Decimal	Hexadecimal	Binary	Char	Description
0	00	00000000	NUL	Null
1	01	00000001	SOH	Start of Heading
2	02	00000010	STX	Start of Text
3	03	00000011	ETX	End of Text
4	04	00000100	EOT	End of Transmission
5	05	00000101	ENQ	Enquiry
6	06	00000110	ACK	Acknowledge
7	07	00000111	BEL	Bell
8	08	00001000	BS	Backspace
9	09	00001001	HT	Horizontal Tab
10	0A	00001010	LF	Line Feed
11	0B	00001011	VT	Vertical Tab
12	0C	00001100	FF	Form Feed
13	0D	00001101	CR	Carriage Return
14	0E	00001110	SO	Shift Out
15	0F	00001111	SI	Shift In
16	10	00010000	DLE	Data Link Escape
17	11	00010001	DC1	Device Control 1
18	12	00010010	DC2	Device Control 2
19	13	00010011	DC3	Device Control 3
20	14	00010100	DC4	Device Control 4
21	15	00010101	NAK	Negative Acknowledge
22	16	00010110	SYN	Synchronize
23	17	00010111	ETB	End of Transmission Block
24	18	00011000	CAN	Cancel
25	19	00011001	EM	End of Medium
26	1A	00011010	SUB	Substitute
27	1B	00011011	ESC	Escape
28	1C	00011100	FS	File Separator
29	1D	00011101	GS	Group Separator
30	1E	00011110	RS	Record Separator
31	1F	00011111	US	Unit Separator
32	20	00100000	space	Space
33	21	00100001	!	exclamation mark
34	22	00100010	"	double quote
35	23	00100011	#	number sign
36	24	00100100	$	dollar sign
37	25	00100101	%	percent sign
38	26	00100110	&	ampersand
39	27	00100111	'	single quote
40	28	00101000	(	left parenthesis
41	29	00101001	)	right parenthesis
42	2A	00101010	*	asterisk
43	2B	00101011	+	plus
44	2C	00101100	,	comma
45	2D	00101101	-	hyphen-minus
46	2E	00101110	.	period
47	2F	00101111	/	slash
48	30	00110000	0	zero
49	31	00110001	1	one
50	32	00110010	2	two
51	33	00110011	3	three
52	34	00110100	4	four
53	35	00110101	5	five
54	36	00110110	6	six
55	37	00110111	7	seven
56	38	00111000	8	eight
57	39	00111001	9	nine
58	3A	00111010	:	colon
59	3B	00111011	;	semicolon
60	3C	00111100	<	less than
61	3D	00111101	=	equals sign
62	3E	00111110	>	greater than
63	3F	00111111	?	question mark
64	40	01000000	@	at sign
65	41	01000001	A
66	42	01000010	B
67	43	01000011	C
68	44	01000100	D
69	45	01000101	E
70	46	01000110	F
71	47	01000111	G
72	48	01001000	H
73	49	01001001	I
74	4A	01001010	J
75	4B	01001011	K
76	4C	01001100	L
77	4D	01001101	M
78	4E	01001110	N
79	4F	01001111	O
80	50	01010000	P
81	51	01010001	Q
82	52	01010010	R
83	53	01010011	S
84	54	01010100	T
85	55	01010101	U
86	56	01010110	V
87	57	01010111	W
88	58	01011000	X
89	59	01011001	Y
90	5A	01011010	Z
91	5B	01011011	[	left square bracket
92	5C	01011100	\	backslash
93	5D	01011101	]	right square bracket
94	5E	01011110	^	caret / circumflex
95	5F	01011111	_	underscore
96	60	01100000	`	grave accent (backtick)
97	61	01100001	a
98	62	01100010	b
99	63	01100011	c
100	64	01100100	d
101	65	01100101	e
102	66	01100110	f
103	67	01100111	g
104	68	01101000	h
105	69	01101001	i
106	6A	01101010	j
107	6B	01101011	k
108	6C	01101100	l
109	6D	01101101	m
110	6E	01101110	n
111	6F	01101111	o
112	70	01110000	p
113	71	01110001	q
114	72	01110010	r
115	73	01110011	s
116	74	01110100	t
117	75	01110101	u
118	76	01110110	v
119	77	01110111	w
120	78	01111000	x
121	79	01111001	y
122	7A	01111010	z
123	7B	01111011	{	left curly bracket
124	7C	01111100	|	vertical bar
125	7D	01111101	}	right curly bracket
126	7E	01111110	~	tilde
127	7F	01111111	DEL	delete
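Each row of the ASCII table can be generated from its decimal value. The sketch below uses `toString(16)` and `toString(2)` for the hexadecimal and binary columns and `String.fromCharCode()` for the character. The helper `asciiRow` and its " | " output format are hypothetical, written only for this post; control characters (0 to 31 and 127) have no visible glyph, so it prints a placeholder for them.

```javascript
// Build one row of the ASCII table: decimal | hex | 8-bit binary | character
function asciiRow(code) {
  if (!Number.isInteger(code) || code < 0 || code > 127) {
    throw new RangeError("ASCII codes run from 0 to 127");
  }
  const hex = code.toString(16).toUpperCase().padStart(2, "0");
  const binary = code.toString(2).padStart(8, "0");
  // 0-31 and 127 are control characters with no printable glyph
  const isControl = code < 32 || code === 127;
  const char = isControl ? "(control)" : String.fromCharCode(code);
  return [code, hex, binary, char].join(" | ");
}

console.log(asciiRow(65));  // 65 | 41 | 01000001 | A
console.log(asciiRow(122)); // 122 | 7A | 01111010 | z
console.log(asciiRow(10));  // 10 | 0A | 00001010 | (control)

// Going the other way: hexadecimal "7E" -> decimal 126 -> "~"
console.log(String.fromCharCode(parseInt("7E", 16))); // ~
```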

Latin-1 Table

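Latin-1 (ISO/IEC 8859-1) extends ASCII to 256 codes. Values 0 to 127 are the same as ASCII, and values 128 to 255 add further control characters (128 to 159) plus accented letters and symbols used in Western European languages. The first 256 Unicode code points (U+0000 to U+00FF) correspond to Latin-1. The table below shows the decimal and hexadecimal value and the name of each character; the printable ASCII characters 33 to 126 are omitted because they already appear in the ASCII table: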

Decimal	Hexadecimal	Description
0	00	null
1	01	start of heading
2	02	start of text
3	03	end of text
4	04	end of transmission
5	05	enquiry
6	06	acknowledge
7	07	bell
8	08	backspace
9	09	character tabulation
10	0A	line feed
11	0B	line tabulation
12	0C	form feed
13	0D	carriage return
14	0E	shift out
15	0F	shift in
16	10	data link escape
17	11	device control one
18	12	device control two
19	13	device control three
20	14	device control four
21	15	negative acknowledge
22	16	synchronous idle
23	17	end of transmission block
24	18	cancel
25	19	end of medium
26	1A	substitute
27	1B	escape
28	1C	file separator
29	1D	group separator
30	1E	record separator
31	1F	unit separator
32	20	space
127	7F	delete
128	80	padding character
129	81	high octet preset
130	82	break permitted here
131	83	no break here
132	84	index
133	85	next line
134	86	start of selected area
135	87	end of selected area
136	88	character tabulation set
137	89	character tabulation with justification
138	8A	line tabulation set
139	8B	partial line forward
140	8C	partial line backward
141	8D	reverse line feed
142	8E	single shift two
143	8F	single shift three
144	90	device control string
145	91	private use one
146	92	private use two
147	93	set transmit state
148	94	cancel character
149	95	message waiting
150	96	start of guarded area
151	97	end of guarded area
152	98	start of string
153	99	single graphic character introducer
154	9A	single character introducer
155	9B	control sequence introducer
156	9C	string terminator
157	9D	operating system command
158	9E	privacy message
159	9F	application program command
160	A0	non-breaking space
161	A1	inverted exclamation mark
162	A2	cent sign
163	A3	pound sterling sign
164	A4	currency sign
165	A5	yen sign
166	A6	broken bar
167	A7	section sign
168	A8	diaeresis (umlaut)
169	A9	copyright sign
170	AA	feminine ordinal
171	AB	left angle quote
172	AC	not sign
173	AD	soft hyphen
174	AE	registered sign
175	AF	macron
176	B0	degree sign
177	B1	plus-minus sign
178	B2	superscript two
179	B3	superscript three
180	B4	acute accent
181	B5	micro sign
182	B6	paragraph sign (pilcrow)
183	B7	middle dot
184	B8	cedilla
185	B9	superscript one
186	BA	masculine ordinal
187	BB	right angle quote
188	BC	one-fourth fraction
189	BD	one-half fraction
190	BE	three-quarter fraction
191	BF	inverted question mark
192	C0	capital a with grave accent
193	C1	capital a with acute accent
194	C2	capital a with circumflex
195	C3	capital a with tilde
196	C4	capital a with diaeresis
197	C5	capital a with ring
198	C6	capital ae ligature
199	C7	capital c with cedilla
200	C8	capital e with grave accent
201	C9	capital e with acute accent
202	CA	capital e with circumflex
203	CB	capital e with diaeresis
204	CC	capital i with grave accent
205	CD	capital i with acute accent
206	CE	capital i with circumflex
207	CF	capital i with diaeresis
208	D0	capital eth
209	D1	capital n with tilde
210	D2	capital o with grave accent
211	D3	capital o with acute accent
212	D4	capital o with circumflex
213	D5	capital o with tilde
214	D6	capital o with diaeresis
215	D7	multiplication sign
216	D8	capital o with slash
217	D9	capital u with grave accent
218	DA	capital u with acute accent
219	DB	capital u with circumflex
220	DC	capital u with diaeresis
221	DD	capital y with acute accent
222	DE	capital thorn
223	DF	small sharp s
224	E0	small a with grave accent
225	E1	small a with acute accent
226	E2	small a with circumflex
227	E3	small a with tilde
228	E4	small a with diaeresis
229	E5	small a with ring
230	E6	small ae ligature
231	E7	small c with cedilla
232	E8	small e with grave accent
233	E9	small e with acute accent
234	EA	small e with circumflex
235	EB	small e with diaeresis
236	EC	small i with grave accent
237	ED	small i with acute accent
238	EE	small i with circumflex
239	EF	small i with diaeresis
240	F0	small eth
241	F1	small n with tilde
242	F2	small o with grave accent
243	F3	small o with acute accent
244	F4	small o with circumflex
245	F5	small o with tilde
246	F6	small o with diaeresis
247	F7	division sign
248	F8	small o with slash
249	F9	small u with grave accent
250	FA	small u with acute accent
251	FB	small u with circumflex
252	FC	small u with diaeresis
253	FD	small y with acute accent
254	FE	small thorn
255	FF	small y with diaeresis
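Because the first 256 Unicode code points correspond to Latin-1, a JavaScript string can hold any Latin-1 character directly, and the `\xHH` hex escape covers exactly this range. The sketch below looks up the hexadecimal column of the table for a character and checks whether a whole string fits in Latin-1. The helper names `latin1Hex` and `isLatin1` are our own, chosen for illustration.

```javascript
// Hexadecimal column of the Latin-1 table for a single character
function latin1Hex(ch) {
  const code = ch.codePointAt(0);
  if (code > 0xff) throw new RangeError(`${ch} is outside Latin-1`);
  return code.toString(16).toUpperCase().padStart(2, "0");
}

// True if every character of `str` is in the Latin-1 range U+0000 to U+00FF
function isLatin1(str) {
  for (const ch of str) {
    if (ch.codePointAt(0) > 0xff) return false;
  }
  return true;
}

console.log("\xE9" === "\u00E9"); // true: both escapes produce é (233)
console.log(latin1Hex("ç"), latin1Hex("è")); // E7 E8
console.log(isLatin1("Crème brûlée")); // true
console.log(isLatin1("Price: 5 €"));   // false: € is U+20AC
```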

Unicode lets websites display and process text in many languages consistently, making the online experience more connected and accessible. For JavaScript developers, understanding code points and how Unicode relates to ASCII and Latin-1 is essential for building websites that work for a global audience.
