HTML Character Sets
In this article, we will discuss HTML character sets (charsets). To display an HTML page correctly, a web browser must know which character set the page uses.
ASCII to UTF-8
Character encoding on the web started with ASCII. ASCII defines 128 characters, including digits (0-9), English letters (A-Z and a-z), and some special characters such as ! $ + - ( ) @ < >.
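To make ASCII's 128-character limit concrete, here is a minimal Python sketch built on the standard `ascii` codec. The helper name `is_ascii_text` is our own illustration, not a standard API.

```python
def is_ascii_text(text: str) -> bool:
    """Return True if every character in text has an ASCII code (0 to 127)."""
    try:
        text.encode("ascii")  # raises UnicodeEncodeError for any code above 127
        return True
    except UnicodeEncodeError:
        return False


# Each ASCII character is stored as one byte whose value is its code.
print(list("A@<".encode("ascii")))     # [65, 64, 60]
print(is_ascii_text("Hello, world!"))  # True
print(is_ascii_text("café"))           # False: "é" has no ASCII code
```

Python 3.7 and later also offer the built-in `str.isascii()` for the same check.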
The default character set for HTML 4 was ISO-8859-1, which supports 256 character codes. HTML 4 also supported UTF-8. The original character set for Windows was ANSI (Windows-1252). ANSI is identical to ISO-8859-1 except for the 32 code positions from 128 to 159: ISO-8859-1 reserves them for control codes, while ANSI uses them for extra characters such as € and ™ (five of these positions are unused).
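The difference between ANSI and ISO-8859-1 can be checked with Python's `cp1252` and `latin-1` codecs. This sketch (the function name `cp1252_extras` is hypothetical) collects the printable characters Windows-1252 assigns to the positions 128 to 159; it assumes Python 3.9+ for the `dict[int, str]` annotation.

```python
def cp1252_extras() -> dict[int, str]:
    """Map each byte value 128-159 to the character Windows-1252 assigns to it."""
    extras = {}
    for value in range(128, 160):
        try:
            extras[value] = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined.
            pass
    return extras


extras = cp1252_extras()
print(extras[128], extras[153])  # € ™
print(len(extras))               # 27 of the 32 positions are assigned
# ISO-8859-1 maps the same byte to an invisible C1 control code instead.
print(repr(bytes([128]).decode("latin-1")))  # '\x80'
```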
The HTML5 specification encourages web developers to use the UTF-8 character set, which covers almost all of the characters and symbols in the world.
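UTF-8 can cover so many characters because it is a variable-length encoding: each character takes one to four bytes depending on its Unicode code point. The small Python sketch below (the helper `utf8_byte_count` is our own name) prints the byte length and the bytes for a few sample characters.

```python
def utf8_byte_count(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to store the single character ch."""
    return len(ch.encode("utf-8"))


# "\U0001F600" is the grinning-face emoji, a code point far above 255.
for ch in ["A", "é", "€", "\U0001F600"]:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {utf8_byte_count(ch)} byte(s) -> {encoded.hex(' ')}")
```

For these four samples the counts are 1, 2, 3, and 4 bytes. ASCII characters keep their single byte, which is why UTF-8 stays compatible with plain ASCII text.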
The HTML charset Attribute
A web browser must be able to determine which character set an HTML page uses.
The character set is declared with the charset attribute of the <meta> tag, inside the <head> element:
<meta charset="UTF-8">
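To illustrate how a browser can find such a declaration, here is a deliberately simplified Python sketch. It only runs a regular expression over the first 1024 bytes, the window within which the HTML standard requires the declaration to appear. Real browsers follow a longer encoding-sniffing algorithm (byte order mark, HTTP Content-Type header, a full prescan parser), and the names `sniff_meta_charset` and `META_CHARSET` are hypothetical.

```python
import re
from typing import Optional

# The charset declaration must appear within the first 1024 bytes of the document.
PRESCAN_LIMIT = 1024

# Simplified pattern: matches <meta charset="..."> and also the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def sniff_meta_charset(document: bytes) -> Optional[str]:
    """Return the charset label declared in a <meta> tag, or None if none is found."""
    match = META_CHARSET.search(document[:PRESCAN_LIMIT])
    return match.group(1).decode("ascii").lower() if match else None


page = (
    "<!DOCTYPE html>\n<html>\n<head>\n"
    '<meta charset="UTF-8">\n<title>Café</title>\n'
    "</head>\n<body></body>\n</html>\n"
)
print(sniff_meta_charset(page.encode("utf-8")))  # utf-8
```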
Differences Between Character Sets
The table below compares the characters that ASCII, ANSI (Windows-1252), ISO-8859-1, and UTF-8 assign to the values 32 to 255:
Number | ASCII | ANSI | ISO-8859-1 | UTF-8 | Description |
---|---|---|---|---|---|
32 | (space) | (space) | (space) | (space) | Space |
33 | ! | ! | ! | ! | Exclamation mark |
34 | " | " | " | " | Quotation mark |
35 | # | # | # | # | Number sign |
36 | $ | $ | $ | $ | Dollar sign |
37 | % | % | % | % | Percent sign |
38 | & | & | & | & | Ampersand |
39 | ' | ' | ' | ' | Apostrophe |
40 | ( | ( | ( | ( | Left parenthesis |
41 | ) | ) | ) | ) | Right parenthesis |
42 | * | * | * | * | Asterisk |
43 | + | + | + | + | Plus sign |
44 | , | , | , | , | Comma |
45 | - | - | - | - | Hyphen-minus |
46 | . | . | . | . | Full stop (period) |
47 | / | / | / | / | Solidus (slash) |
48 | 0 | 0 | 0 | 0 | Digit zero |
49 | 1 | 1 | 1 | 1 | Digit one |
50 | 2 | 2 | 2 | 2 | Digit two |
51 | 3 | 3 | 3 | 3 | Digit three |
52 | 4 | 4 | 4 | 4 | Digit four |
53 | 5 | 5 | 5 | 5 | Digit five |
54 | 6 | 6 | 6 | 6 | Digit six |
55 | 7 | 7 | 7 | 7 | Digit seven |
56 | 8 | 8 | 8 | 8 | Digit eight |
57 | 9 | 9 | 9 | 9 | Digit nine |
58 | : | : | : | : | Colon |
59 | ; | ; | ; | ; | Semicolon |
60 | < | < | < | < | Less-than sign |
61 | = | = | = | = | Equals sign |
62 | > | > | > | > | Greater-than sign |
63 | ? | ? | ? | ? | Question mark |
64 | @ | @ | @ | @ | Commercial at |
65 | A | A | A | A | Latin capital letter A |
66 | B | B | B | B | Latin capital letter B |
67 | C | C | C | C | Latin capital letter C |
68 | D | D | D | D | Latin capital letter D |
69 | E | E | E | E | Latin capital letter E |
70 | F | F | F | F | Latin capital letter F |
71 | G | G | G | G | Latin capital letter G |
72 | H | H | H | H | Latin capital letter H |
73 | I | I | I | I | Latin capital letter I |
74 | J | J | J | J | Latin capital letter J |
75 | K | K | K | K | Latin capital letter K |
76 | L | L | L | L | Latin capital letter L |
77 | M | M | M | M | Latin capital letter M |
78 | N | N | N | N | Latin capital letter N |
79 | O | O | O | O | Latin capital letter O |
80 | P | P | P | P | Latin capital letter P |
81 | Q | Q | Q | Q | Latin capital letter Q |
82 | R | R | R | R | Latin capital letter R |
83 | S | S | S | S | Latin capital letter S |
84 | T | T | T | T | Latin capital letter T |
85 | U | U | U | U | Latin capital letter U |
86 | V | V | V | V | Latin capital letter V |
87 | W | W | W | W | Latin capital letter W |
88 | X | X | X | X | Latin capital letter X |
89 | Y | Y | Y | Y | Latin capital letter Y |
90 | Z | Z | Z | Z | Latin capital letter Z |
91 | [ | [ | [ | [ | Left square bracket |
92 | \ | \ | \ | \ | Reverse solidus (backslash) |
93 | ] | ] | ] | ] | Right square bracket |
94 | ^ | ^ | ^ | ^ | Circumflex accent |
95 | _ | _ | _ | _ | Low line (underscore) |
96 | ` | ` | ` | ` | Grave accent |
97 | a | a | a | a | Latin small letter a |
98 | b | b | b | b | Latin small letter b |
99 | c | c | c | c | Latin small letter c |
100 | d | d | d | d | Latin small letter d |
101 | e | e | e | e | Latin small letter e |
102 | f | f | f | f | Latin small letter f |
103 | g | g | g | g | Latin small letter g |
104 | h | h | h | h | Latin small letter h |
105 | i | i | i | i | Latin small letter i |
106 | j | j | j | j | Latin small letter j |
107 | k | k | k | k | Latin small letter k |
108 | l | l | l | l | Latin small letter l |
109 | m | m | m | m | Latin small letter m |
110 | n | n | n | n | Latin small letter n |
111 | o | o | o | o | Latin small letter o |
112 | p | p | p | p | Latin small letter p |
113 | q | q | q | q | Latin small letter q |
114 | r | r | r | r | Latin small letter r |
115 | s | s | s | s | Latin small letter s |
116 | t | t | t | t | Latin small letter t |
117 | u | u | u | u | Latin small letter u |
118 | v | v | v | v | Latin small letter v |
119 | w | w | w | w | Latin small letter w |
120 | x | x | x | x | Latin small letter x |
121 | y | y | y | y | Latin small letter y |
122 | z | z | z | z | Latin small letter z |
123 | { | { | { | { | Left curly bracket |
124 | \| | \| | \| | \| | Vertical line |
125 | } | } | } | } | Right curly bracket |
126 | ~ | ~ | ~ | ~ | Tilde |
127 | DEL | DEL | DEL | DEL | Delete (control character) |
128 | | € | | | Euro sign |
129 | | | | | Not used |
130 | | ‚ | | | Single low-9 quotation mark |
131 | | ƒ | | | Latin small letter f with hook |
132 | | „ | | | Double low-9 quotation mark |
133 | | … | | | Horizontal ellipsis |
134 | | † | | | Dagger |
135 | | ‡ | | | Double dagger |
136 | | ˆ | | | Modifier letter circumflex accent |
137 | | ‰ | | | Per mille sign |
138 | | Š | | | Latin capital letter S with caron |
139 | | ‹ | | | Single left-pointing angle quotation mark |
140 | | Œ | | | Latin capital ligature OE |
141 | | | | | Not used |
142 | | Ž | | | Latin capital letter Z with caron |
143 | | | | | Not used |
144 | | | | | Not used |
145 | | ‘ | | | Left single quotation mark |
146 | | ’ | | | Right single quotation mark |
147 | | “ | | | Left double quotation mark |
148 | | ” | | | Right double quotation mark |
149 | | • | | | Bullet |
150 | | – | | | En dash |
151 | | — | | | Em dash |
152 | | ˜ | | | Small tilde |
153 | | ™ | | | Trade mark sign |
154 | | š | | | Latin small letter s with caron |
155 | | › | | | Single right-pointing angle quotation mark |
156 | | œ | | | Latin small ligature oe |
157 | | | | | Not used |
158 | | ž | | | Latin small letter z with caron |
159 | | Ÿ | | | Latin capital letter Y with diaeresis |
160 | | (nbsp) | (nbsp) | (nbsp) | No-break space |
161 | | ¡ | ¡ | ¡ | Inverted exclamation mark |
162 | | ¢ | ¢ | ¢ | Cent sign |
163 | | £ | £ | £ | Pound sign |
164 | | ¤ | ¤ | ¤ | Currency sign |
165 | | ¥ | ¥ | ¥ | Yen sign |
166 | | ¦ | ¦ | ¦ | Broken bar |
167 | | § | § | § | Section sign |
168 | | ¨ | ¨ | ¨ | Diaeresis |
169 | | © | © | © | Copyright sign |
170 | | ª | ª | ª | Feminine ordinal indicator |
171 | | « | « | « | Left-pointing double angle quotation mark |
172 | | ¬ | ¬ | ¬ | Not sign |
173 | | (invisible) | (invisible) | (invisible) | Soft hyphen |
174 | | ® | ® | ® | Registered sign |
175 | | ¯ | ¯ | ¯ | Macron |
176 | | ° | ° | ° | Degree sign |
177 | | ± | ± | ± | Plus-minus sign |
178 | | ² | ² | ² | Superscript two |
179 | | ³ | ³ | ³ | Superscript three |
180 | | ´ | ´ | ´ | Acute accent |
181 | | µ | µ | µ | Micro sign |
182 | | ¶ | ¶ | ¶ | Pilcrow sign |
183 | | · | · | · | Middle dot |
184 | | ¸ | ¸ | ¸ | Cedilla |
185 | | ¹ | ¹ | ¹ | Superscript one |
186 | | º | º | º | Masculine ordinal indicator |
187 | | » | » | » | Right-pointing double angle quotation mark |
188 | | ¼ | ¼ | ¼ | Vulgar fraction one quarter |
189 | | ½ | ½ | ½ | Vulgar fraction one half |
190 | | ¾ | ¾ | ¾ | Vulgar fraction three quarters |
191 | | ¿ | ¿ | ¿ | Inverted question mark |
192 | | À | À | À | Latin capital letter A with grave |
193 | | Á | Á | Á | Latin capital letter A with acute |
194 | | Â | Â | Â | Latin capital letter A with circumflex |
195 | | Ã | Ã | Ã | Latin capital letter A with tilde |
196 | | Ä | Ä | Ä | Latin capital letter A with diaeresis |
197 | | Å | Å | Å | Latin capital letter A with ring above |
198 | | Æ | Æ | Æ | Latin capital letter AE |
199 | | Ç | Ç | Ç | Latin capital letter C with cedilla |
200 | | È | È | È | Latin capital letter E with grave |
201 | | É | É | É | Latin capital letter E with acute |
202 | | Ê | Ê | Ê | Latin capital letter E with circumflex |
203 | | Ë | Ë | Ë | Latin capital letter E with diaeresis |
204 | | Ì | Ì | Ì | Latin capital letter I with grave |
205 | | Í | Í | Í | Latin capital letter I with acute |
206 | | Î | Î | Î | Latin capital letter I with circumflex |
207 | | Ï | Ï | Ï | Latin capital letter I with diaeresis |
208 | | Ð | Ð | Ð | Latin capital letter eth |
209 | | Ñ | Ñ | Ñ | Latin capital letter N with tilde |
210 | | Ò | Ò | Ò | Latin capital letter O with grave |
211 | | Ó | Ó | Ó | Latin capital letter O with acute |
212 | | Ô | Ô | Ô | Latin capital letter O with circumflex |
213 | | Õ | Õ | Õ | Latin capital letter O with tilde |
214 | | Ö | Ö | Ö | Latin capital letter O with diaeresis |
215 | | × | × | × | Multiplication sign |
216 | | Ø | Ø | Ø | Latin capital letter O with stroke |
217 | | Ù | Ù | Ù | Latin capital letter U with grave |
218 | | Ú | Ú | Ú | Latin capital letter U with acute |
219 | | Û | Û | Û | Latin capital letter U with circumflex |
220 | | Ü | Ü | Ü | Latin capital letter U with diaeresis |
221 | | Ý | Ý | Ý | Latin capital letter Y with acute |
222 | | Þ | Þ | Þ | Latin capital letter thorn |
223 | | ß | ß | ß | Latin small letter sharp s |
224 | | à | à | à | Latin small letter a with grave |
225 | | á | á | á | Latin small letter a with acute |
226 | | â | â | â | Latin small letter a with circumflex |
227 | | ã | ã | ã | Latin small letter a with tilde |
228 | | ä | ä | ä | Latin small letter a with diaeresis |
229 | | å | å | å | Latin small letter a with ring above |
230 | | æ | æ | æ | Latin small letter ae |
231 | | ç | ç | ç | Latin small letter c with cedilla |
232 | | è | è | è | Latin small letter e with grave |
233 | | é | é | é | Latin small letter e with acute |
234 | | ê | ê | ê | Latin small letter e with circumflex |
235 | | ë | ë | ë | Latin small letter e with diaeresis |
236 | | ì | ì | ì | Latin small letter i with grave |
237 | | í | í | í | Latin small letter i with acute |
238 | | î | î | î | Latin small letter i with circumflex |
239 | | ï | ï | ï | Latin small letter i with diaeresis |
240 | | ð | ð | ð | Latin small letter eth |
241 | | ñ | ñ | ñ | Latin small letter n with tilde |
242 | | ò | ò | ò | Latin small letter o with grave |
243 | | ó | ó | ó | Latin small letter o with acute |
244 | | ô | ô | ô | Latin small letter o with circumflex |
245 | | õ | õ | õ | Latin small letter o with tilde |
246 | | ö | ö | ö | Latin small letter o with diaeresis |
247 | | ÷ | ÷ | ÷ | Division sign |
248 | | ø | ø | ø | Latin small letter o with stroke |
249 | | ù | ù | ù | Latin small letter u with grave |
250 | | ú | ú | ú | Latin small letter u with acute |
251 | | û | û | û | Latin small letter u with circumflex |
252 | | ü | ü | ü | Latin small letter u with diaeresis |
253 | | ý | ý | ý | Latin small letter y with acute |
254 | | þ | þ | þ | Latin small letter thorn |
255 | | ÿ | ÿ | ÿ | Latin small letter y with diaeresis |
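The filled and blank cells of a table like this can be regenerated from Python's built-in codecs. In this sketch (the names `printable_or_none` and `table_row` are our own), `None` stands for a blank cell, meaning a control code or an unused position. One caveat: DEL (127) is itself a control character, so the sketch reports it as `None` even though the table lists it by name. The code assumes Python 3.9+ for the `tuple[...]` annotation.

```python
import unicodedata
from typing import Optional


def printable_or_none(ch: str) -> Optional[str]:
    """Return ch, or None if it is a control character (Unicode category "Cc")."""
    return None if unicodedata.category(ch) == "Cc" else ch


def table_row(value: int) -> tuple[Optional[str], ...]:
    """Return the (ASCII, ANSI, ISO-8859-1, UTF-8) characters for a value from 0 to 255.

    None stands for a blank cell: a control code or an unused position.
    """
    raw = bytes([value])
    ascii_ch = printable_or_none(raw.decode("ascii")) if value < 128 else None
    try:
        ansi_ch = printable_or_none(raw.decode("cp1252"))
    except UnicodeDecodeError:
        ansi_ch = None  # one of the five unused Windows-1252 positions
    latin1_ch = printable_or_none(raw.decode("latin-1"))
    utf8_ch = printable_or_none(chr(value))  # the Unicode code point with the same number
    return (ascii_ch, ansi_ch, latin1_ch, utf8_ch)


for value in (65, 128, 129, 233):
    print(value, table_row(value))
```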
ASCII characters:
ASCII does not use the values 128 to 255. The values 0 to 31 (and 127) are control characters.
ASCII uses the values 32 to 126 for letters, digits, and symbols.
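The split between control and printable ASCII values can be confirmed with Python's `str.isprintable()`, which treats control characters as non-printable and the ordinary space (32) as printable. A short sketch:

```python
# Classify the 128 ASCII values into control characters and printable characters.
control = [n for n in range(128) if not chr(n).isprintable()]
printable = [n for n in range(128) if chr(n).isprintable()]

print(control == list(range(32)) + [127])           # True: 0-31 and 127 are control characters
print(printable[0], printable[-1], len(printable))  # 32 126 95
```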
Character set ISO-8859-1:
ISO-8859-1 does not use the values 128 to 159 for printable characters; they are reserved for control codes. From 0 to 127, ISO-8859-1 is identical to ASCII.
For the values 160 to 255, ISO-8859-1 assigns the same characters as UTF-8 (that is, the Unicode code points with the same numbers).
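"Identical" here means identical character numbers: ISO-8859-1 (called `latin-1` in Python) maps every byte value 0 to 255 to the Unicode code point with the same number. This sketch decodes all 256 byte values and checks that mapping.

```python
# Decode all 256 possible byte values as ISO-8859-1.
decoded = bytes(range(256)).decode("latin-1")

# Each byte value becomes the Unicode code point with the same number.
same_numbers = all(ord(ch) == value for value, ch in enumerate(decoded))
print(same_numbers)                     # True
print(decoded[233], ord(decoded[233]))  # é 233
```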
ANSI characters (Windows-1252):
The characters in the range 128 to 159 are specific to ANSI. For the values 0 to 127, ANSI is identical to ASCII.
For the values 160 to 255, ANSI is identical to ISO-8859-1 and UTF-8.
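A quick way to confirm that ANSI and ISO-8859-1 differ only in the range 128 to 159 is to decode every byte with both of Python's codecs and compare. The helper name `differs` is our own.

```python
def differs(value: int) -> bool:
    """Return True if Windows-1252 and ISO-8859-1 decode this byte value differently."""
    raw = bytes([value])
    try:
        ansi = raw.decode("cp1252")
    except UnicodeDecodeError:
        return True  # unused in Windows-1252, but a control code in ISO-8859-1
    return ansi != raw.decode("latin-1")


different = [v for v in range(256) if differs(v)]
print(different == list(range(128, 160)))  # True
```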
UTF-8 Character Set:
UTF-8 does not use the values 128 to 159 for printable characters; they are control codes. Starting at 256, UTF-8 supports far more characters: it can encode every character defined by Unicode, which numbers well over 100,000.
For the values 0 to 127, UTF-8 is identical to ASCII.
For the values 160 to 255, UTF-8 assigns the same characters as ANSI and ISO-8859-1, although it stores each of them in two bytes instead of one.
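Because the same character can be stored as different bytes, a page that is saved in one character set but read as another shows garbled text (often called "mojibake"). This is why the charset declaration matters. The Python sketch below encodes "é" (code point 233) both ways and then decodes the UTF-8 bytes with the wrong codec.

```python
text = "é"  # code point 233 (U+00E9) in ISO-8859-1, ANSI and Unicode alike

print(text.encode("latin-1"))  # b'\xe9'      one byte in ISO-8859-1 / ANSI
print(text.encode("utf-8"))    # b'\xc3\xa9'  two bytes in UTF-8

# Reading UTF-8 bytes as Windows-1252 turns one character into two wrong ones.
print(text.encode("utf-8").decode("cp1252"))  # Ã©
```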
Benefits of HTML Character Sets
Choosing and declaring the right character set can improve the appearance, accessibility, and clarity of content in an HTML document. Here are some of the benefits:
- Language Support: Different languages use different characters. Using an appropriate character set ensures that all the characters of a language are displayed correctly.
- Symbol Support: Character sets include symbols and special characters (such as ©, €, or ½) that can make content more engaging and readable and help convey meaning more effectively.
- Compatibility: A standardized character set such as Unicode (usually encoded as UTF-8) displays consistently across devices and operating systems, because modern operating systems and web browsers support Unicode.
- Accessibility: When characters are encoded and declared correctly, assistive technologies such as screen readers can interpret them properly. Certain characters or symbols can also provide additional cues or context that make content easier to read and understand.
- Clarity: Appropriate characters and symbols improve readability, making it easier for users to scan and understand the information presented and leading to a better user experience.
- Consistency: Using one character set throughout an HTML document ensures that all characters display properly and helps maintain a consistent visual style and tone, which adds to the overall quality of the content.