ISO/IEC 8859

ISO 8859 encoding family
Standard	ISO/IEC 8859
Classification	8-bit extended ASCII, ISO/IEC 4873 level 1
Extends	US-ASCII
Preceded by	ISO/IEC 646
Succeeded by	ISO/IEC 10646 (Unicode)
Other related encoding(s)	ISO/IEC 10367, Windows-125x

ISO/IEC 8859 is a joint ISO and IEC series of standards for 8-bit character encodings. The series of standards consists of numbered parts, such as ISO/IEC 8859-1, ISO/IEC 8859-2, etc. There are 15 parts, excluding the abandoned ISO/IEC 8859-12.^[1] The ISO working group maintaining this series of standards has been disbanded.

ISO/IEC 8859 parts 1, 2, 3, and 4 were originally Ecma International standard ECMA-94.

Introduction

While the bit patterns of the 95 printable ASCII characters are sufficient to exchange information in modern English, most other languages that use Latin alphabets need additional symbols not covered by ASCII. Early encodings were limited to 7 bits, partly because of restrictions in some data transmission protocols and partly for historical reasons. ISO/IEC 8859 sought to remedy this problem by using the eighth bit of an 8-bit byte to provide positions for another 96 printable characters. However, more characters were needed than could fit in a single 8-bit character encoding, so several mappings were developed, including at least ten suitable for various Latin alphabets.
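Every part shares the ASCII range and differs only above it. As a result, the same high byte stands for a different character depending on which part a text was written in. The following minimal sketch uses Python's standard codecs, which are named `iso8859_<n>`. It decodes the single byte 0xE6 under four parts. The helper name `decode_under` is illustrative only.

```python
# Decode one high byte (0xE6) under several ISO/IEC 8859 parts.
# Python ships a codec for each published part, named "iso8859_<n>".
BYTE = bytes([0xE6])

def decode_under(parts):
    """Return {part number: character} for BYTE under each listed part."""
    return {n: BYTE.decode(f"iso8859_{n}") for n in parts}

result = decode_under([1, 2, 5, 7])
for part, char in result.items():
    print(f"ISO/IEC 8859-{part}: 0xE6 -> {char!r} (U+{ord(char):04X})")

# The ASCII range (0x00-0x7F) is the same in every part.
assert all(b"A".decode(f"iso8859_{n}") == "A" for n in [1, 2, 5, 7])
```

The four results (æ, ć, ц, ζ) match the 0xE6 row of the comparison table later in this article.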

The ISO/IEC 8859 standard parts define only printable characters. They explicitly set apart the byte ranges 0x00–1F and 0x7F–9F as "combinations that do not represent graphic characters" (that is, positions reserved for control characters), in accordance with ISO/IEC 4873. The parts were designed to be used in conjunction with a separate standard that defines the control functions associated with these bytes, such as ISO 6429 or ISO 6630.^[2] To this end, a series of encodings registered with the IANA add the C0 control set (control characters mapped to bytes 0 to 31) from ISO 646 and the C1 control set (control characters mapped to bytes 128 to 159) from ISO 6429. The result is a full 8-bit character map with most, if not all, bytes assigned. These sets have ISO-8859-n as their preferred MIME name or, where no preferred MIME name is specified, as their canonical name. Many people use the terms ISO/IEC 8859-n and ISO-8859-n interchangeably. ISO/IEC 8859-11 did not get such a charset assigned, presumably because it was almost identical to TIS 620.
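Python's `iso8859_<n>` codecs appear to follow the IANA-style full 8-bit mapping described above. The sketch below checks that every byte in 0x00–0x1F and 0x7F–0x9F decodes to a Unicode character in general category `Cc` (control). The helper `control_positions_are_controls` is a hypothetical name. Note that this behaviour belongs to the charset; the ISO standard itself assigns nothing to these positions.

```python
import unicodedata

# Byte ranges that ISO/IEC 8859 reserves for non-graphic (control) use.
CONTROL_BYTES = list(range(0x00, 0x20)) + list(range(0x7F, 0xA0))

def control_positions_are_controls(codec: str) -> bool:
    """True if every reserved byte decodes to a Unicode control character."""
    decoded = bytes(CONTROL_BYTES).decode(codec)
    return all(unicodedata.category(ch) == "Cc" for ch in decoded)

for n in (1, 2, 5, 15):
    print(f"iso8859_{n}:", control_positions_are_controls(f"iso8859_{n}"))

# Byte 0x85 decodes to the C1 control NEXT LINE (NEL), U+0085.
nel = b"\x85".decode("iso8859_1")
print(hex(ord(nel)))
```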

Characters

The ISO/IEC 8859 standard is designed for reliable information exchange, not typography; the standard omits symbols needed for high-quality typography, such as optional ligatures, curly quotation marks, dashes, etc. As a result, high-quality typesetting systems often use proprietary or idiosyncratic extensions on top of the ASCII and ISO/IEC 8859 standards, or use Unicode instead.

An inexact rule based on practical experience states that if a character or symbol was not already part of a widely used data-processing character set and was also not usually provided on typewriter keyboards for a national language, it did not get in. Hence the directional double quotation marks « and » used for some European languages were included, but not the directional double quotation marks “ and ” used for English and some other languages.

French did not get its œ and Œ ligatures because they could be typed as 'oe'. Ÿ, which is needed for all-caps text, was also left out.^[3]^[4]^[5] These three characters were later added, albeit at different code points, in ISO/IEC 8859-15 (1999), which also introduced the new euro sign €. Similarly, Dutch did not get the ĳ and Ĳ letters, because Dutch speakers had become used to typing these as two letters instead.
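Python's standard codecs show the practical effect. Text containing Œ, Ÿ or € cannot be encoded in ISO/IEC 8859-1, but it can be encoded in ISO/IEC 8859-15. The `encodable` helper and the sample sentence are illustrative only.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

sample = "Œuvre à L'HAŸ-LES-ROSES, 20 €"
print(encodable(sample, "iso8859_1"))   # Latin-1 lacks Œ, Ÿ and €
print(encodable(sample, "iso8859_15"))  # Latin-9 includes them
print("Œ€Ÿ".encode("iso8859_15"))       # b'\xbc\xa4\xbe'
```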

Romanian did not initially get its Ș/ș and Ț/ț (with comma) letters. The Unicode Consortium had unified them with Ş/ş and Ţ/ţ (with cedilla), treating the comma-below shapes as glyph variants of the cedilla shapes. The letters with an explicit comma below were later added to the Unicode Standard, and they are also included in ISO/IEC 8859-16.
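According to the comparison table in this article, ISO/IEC 8859-16 places the comma-below letters at the same byte positions (0xAA, 0xBA, 0xDE, 0xFE) where ISO/IEC 8859-2 has the cedilla letters. Python's standard `iso8859_2` and `iso8859_16` codecs, together with `unicodedata.name`, can be used to check this. The two Unicode names also show that the shapes are now separate characters.

```python
import unicodedata

# The four Romanian letters occupy the same byte positions in both parts.
ROMANIAN_BYTES = bytes([0xAA, 0xBA, 0xDE, 0xFE])

latin2 = ROMANIAN_BYTES.decode("iso8859_2")    # cedilla forms
latin10 = ROMANIAN_BYTES.decode("iso8859_16")  # comma-below forms

for old, new in zip(latin2, latin10):
    print(f"{old} {unicodedata.name(old):40} -> {new} {unicodedata.name(new)}")
```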

Most of the ISO/IEC 8859 encodings provide diacritic marks required for various European languages using the Latin script. Others provide non-Latin alphabets: Greek, Cyrillic, Hebrew, Arabic and Thai. Most of the encodings contain only spacing characters, although the Thai, Hebrew, and Arabic ones do also contain combining characters.
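The Unicode general category of a decoded character tells you whether it is spacing or combining: non-spacing combining marks have category `Mn`. The sketch below decodes one byte each from ISO/IEC 8859-6, -11 and -5. The byte values come from the comparison table in this article, and the helper name `describe` is illustrative.

```python
import unicodedata

# (codec, byte) pairs: an Arabic vowel sign, a Thai tone mark, a Cyrillic letter.
SAMPLES = [("iso8859_6", 0xEE), ("iso8859_11", 0xE8), ("iso8859_5", 0xE6)]

def describe(codec: str, byte: int):
    """Decode one byte; return its Unicode name and general category."""
    ch = bytes([byte]).decode(codec)
    return unicodedata.name(ch), unicodedata.category(ch)

for codec, byte in SAMPLES:
    name, category = describe(codec, byte)
    kind = "combining mark" if category == "Mn" else "spacing character"
    print(f"{codec} 0x{byte:02X}: {name} [{category}] -> {kind}")
```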

The standard makes no provision for the scripts of East Asian languages (CJK), as their ideographic writing systems require many thousands of code points. Vietnamese uses Latin-based characters, but it does not fit into 96 positions either, unless combining diacritics are used as in Windows-1258. Each Japanese syllabic alphabet (hiragana or katakana) would fit, as in JIS X 0201, but like several other alphabets of the world, they are not encoded in the ISO/IEC 8859 system.

The parts of ISO/IEC 8859

ISO/IEC 8859 is divided into the following parts:

Part	Name	Revisions	Other standards	Description
Part 1	Latin-1 Western European	1987, 1998	ECMA-94 (1985, 1986)	Perhaps the most widely used part of ISO/IEC 8859, covering most Western European languages: Danish (partial),^{[nb 1]} Dutch (partial),^{[nb 2]} English, Faroese, Finnish (partial),^{[nb 3]} French (partial),^{[nb 3]} German, Icelandic, Irish, Italian, Norwegian, Portuguese, Rhaeto-Romanic, Scottish Gaelic, Spanish, Catalan, and Swedish. Languages from other parts of the world are also covered, including Albanian in Eastern Europe, Indonesian in Southeast Asia, and the African languages Afrikaans and Swahili. A modification of DEC MCS; the first (1985) standard version at the ECMA level lacked the times sign and division obelus, which were added the next year. The missing euro sign and capital Ÿ are in the revised version ISO/IEC 8859-15 (see below). The corresponding IANA character set is ISO-8859-1.
Part 2	Latin-2 Central European	1987, 1999	ECMA-94 (1986)^{[nb 4]}	Supports those Central and Eastern European languages that use the Latin alphabet, including Bosnian, Polish, Croatian, Czech, Slovak, Slovene, Serbian, and Hungarian. The euro sign, which this part lacks, is available in ISO/IEC 8859-16.
Part 3	Latin-3 South European	1988, 1999		Turkish, Maltese, and Esperanto. Largely superseded by ISO/IEC 8859-9 for Turkish.
Part 4	Latin-4 North European	1988, 1998		Estonian, Latvian, Lithuanian, Greenlandic, and Sami.
Part 5	Latin/Cyrillic	1988, 1999	ECMA-113 (1988, 1999)^{[nb 5]}	Covers mostly Slavic languages that use a Cyrillic alphabet, including Belarusian, Bulgarian, Macedonian, Russian, Serbian, and Ukrainian (partial).^{[nb 6]}
Part 6	Latin/Arabic	1987, 1999	ASMO 708 (1986) ECMA-114 (1986, 2000)	Covers the most common Arabic language characters. Does not support other languages using the Arabic script. Text requires bidirectional (BiDi) processing and cursive joining for display.
Part 7	Latin/Greek	1987, 2003	ELOT 928 (1986) ECMA-118 (1986)	Covers the modern Greek language (monotonic orthography). Can also be used for Ancient Greek written without accents or in monotonic orthography, but lacks the diacritics for polytonic orthography. These were introduced with Unicode. Updated 2003 to add the euro sign, drachma sign and spacing ypogegrammeni.
Part 8	Latin/Hebrew	1988, 1999	ECMA-121 (1987, 2000) SI 1311 (2002)	Covers the modern Hebrew alphabet as used in Israel. In practice, two different encodings exist: logical order, which requires BiDi processing for display, and visual (left-to-right) order, which in effect stores text as it appears after BiDi processing and line breaking. Updated 1999 to add LRM and RLM. Updated at national standard level in 2002 to add euro and shekel signs and more bidirectional format effectors; the 2002 additions were never incorporated back into the ISO standard version.
Part 9	Latin-5 Turkish	1989, 1999	TS 5881 (1988) ECMA-128 (1988, 1999)	Largely the same as ISO/IEC 8859-1, replacing the rarely used Icelandic letters with Turkish ones.
Part 10	Latin-6 Nordic	1992, 1998	ECMA-144 (1990, 1992, 2000)	A rearrangement of Latin-4. Considered more useful for Nordic languages. Baltic languages use Latin-4 more.
Part 11	Latin/Thai	2001	TIS-620 (1986, 1990)	Contains characters needed for the Thai language. First revision established in 1986 at national standard level as TIS 620. Elevated to ISO standard status as a part of ISO 8859 in 2001, with the addition of a non-breaking space.
~~Part 12~~	Latin/Devanagari	N/A	-	The work in making a part of 8859 for Devanagari was officially abandoned in 1997. ISCII and Unicode/ISO/IEC 10646 cover Devanagari.
Part 13	Latin-7 Baltic Rim	1998	-	Added some characters for Baltic languages which were missing from Latin-4 and Latin-6. Related to the earlier-published^{[nb 7]} Windows-1257.
Part 14	Latin-8 Celtic	1998	-	Covers Celtic languages such as Gaelic and the Breton language. Welsh letters correspond to the earlier (1994) ISO-IR-182.
Part 15	Latin-9	1999	-	A revision of 8859-1 that removes some little-used symbols, replacing them with the euro sign € and the letters Š, š, Ž, ž, Œ, œ, and Ÿ, which completes the coverage of French, Finnish and Estonian.
Part 16	Latin-10 South-Eastern European	2001	SR 14111 (1998)	Intended for Albanian, Croatian, Hungarian, Italian, Polish, Romanian and Slovene, but also Finnish, French, German and Irish Gaelic (new orthography). The focus lies more on letters than symbols. The currency sign is replaced with the euro sign.
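The Part 15 entry can be checked mechanically. The sketch below uses Python's standard `iso8859_1` and `iso8859_15` codecs and lists every byte in the upper half where the two parts disagree. The exchanged positions turn out to be those of the euro sign and the seven letters named in the Part 15 row. The generator name `differing_positions` is illustrative.

```python
# Compare Latin-1 (8859-1) and Latin-9 (8859-15) over the upper half, 0xA0-0xFF.
def differing_positions(codec_a: str, codec_b: str):
    """Yield (byte, char_a, char_b) wherever the two codecs disagree."""
    for b in range(0xA0, 0x100):
        raw = bytes([b])
        a, c = raw.decode(codec_a), raw.decode(codec_b)
        if a != c:
            yield b, a, c

diffs = list(differing_positions("iso8859_1", "iso8859_15"))
for b, old, new in diffs:
    print(f"0x{b:02X}: {old} -> {new}")
```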

Each part of ISO/IEC 8859 is designed to support languages that often borrow from each other, so the characters needed by each language are usually accommodated by a single part. However, there are some characters and language combinations that are not accommodated without transcriptions. Efforts were made to make conversions as smooth as possible. For example, German has all of its seven special characters at the same positions in all Latin variants (1–4, 9, 10, 13–16), and in many positions the characters only differ in the diacritics between the sets. In particular, variants 1–4 were designed jointly, and have the property that every encoded character appears either at a given position or not at all.
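Both properties described above can be tested against Python's standard codecs. The sketch first encodes the seven German letters in every Latin part, then checks the rule for parts 1–4: a character shared by two parts must occupy the same byte in both. Positions a codec leaves unassigned are skipped. The helper name `same_position_or_absent` is illustrative.

```python
from itertools import combinations

GERMAN = "ÄÖÜäöüß"
LATIN_PARTS = [1, 2, 3, 4, 9, 10, 13, 14, 15, 16]

# The German letters should produce one identical byte string in every Latin part.
encodings = {n: GERMAN.encode(f"iso8859_{n}") for n in LATIN_PARTS}
print(set(encodings.values()))

def same_position_or_absent(part_a: int, part_b: int) -> bool:
    """A character present in both parts must sit at the same byte in each."""
    for byte in range(0xA0, 0x100):
        try:
            ch = bytes([byte]).decode(f"iso8859_{part_a}")
        except UnicodeDecodeError:
            continue  # unassigned position in part_a
        try:
            if ch.encode(f"iso8859_{part_b}") != bytes([byte]):
                return False
        except UnicodeEncodeError:
            pass  # character absent from part_b, which the rule allows
    return True

print(all(same_position_or_absent(a, b) for a, b in combinations([1, 2, 3, 4], 2)))
```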

Table

Comparison of the various parts (1–16) of ISO/IEC 8859
Binary	Oct	Dec	Hex	1	2	3	4	5	6	7	8	9	10	11	13	14	15	16
1010 0000	240	160	A0	Non-breaking space (NBSP)
1010 0001	241	161	A1	¡	Ą	Ħ	Ą	Ё		‘		¡	Ą	ก	”	Ḃ	¡	Ą
1010 0010	242	162	A2	¢	˘		ĸ	Ђ		’	¢		Ē	ข	¢	ḃ	¢	ą
1010 0011	243	163	A3	£	Ł	£	Ŗ	Ѓ		£			Ģ	ฃ	£			Ł
1010 0100	244	164	A4	¤				Є	¤	€	¤		Ī	ค	¤	Ċ	€
1010 0101	245	165	A5	¥	Ľ		Ĩ	Ѕ		₯	¥		Ĩ	ฅ	„	ċ	¥	„
1010 0110	246	166	A6	¦	Ś	Ĥ	Ļ	І		¦			Ķ	ฆ	¦	Ḋ	Š
1010 0111	247	167	A7	§				Ї		§				ง	§
1010 1000	250	168	A8	¨				Ј		¨			Ļ	จ	Ø	Ẁ	š
1010 1001	251	169	A9	©	Š	İ	Š	Љ		©			Đ	ฉ	©
1010 1010	252	170	AA	ª	Ş		Ē	Њ		ͺ	×	ª	Š	ช	Ŗ	Ẃ	ª	Ș
1010 1011	253	171	AB	«	Ť	Ğ	Ģ	Ћ		«			Ŧ	ซ	«	ḋ	«
1010 1100	254	172	AC	¬	Ź	Ĵ	Ŧ	Ќ	،	¬			Ž	ฌ	¬	Ỳ	¬	Ź
1010 1101	255	173	AD	Soft hyphen (SHY)										ญ	SHY
1010 1110	256	174	AE	®	Ž		Ž	Ў			®		Ū	ฎ	®			ź
1010 1111	257	175	AF	¯	Ż		¯	Џ		―	¯		Ŋ	ฏ	Æ	Ÿ	¯	Ż
1011 0000	260	176	B0	°				А		°				ฐ	°	Ḟ	°
1011 0001	261	177	B1	±	ą	ħ	ą	Б		±			ą	ฑ	±	ḟ	±
1011 0010	262	178	B2	²	˛	²	˛	В		²			ē	ฒ	²	Ġ	²	Č
1011 0011	263	179	B3	³	ł	³	ŗ	Г		³			ģ	ณ	³	ġ	³	ł
1011 0100	264	180	B4	´				Д		΄	´		ī	ด	“	Ṁ	Ž
1011 0101	265	181	B5	µ	ľ	µ	ĩ	Е		΅	µ		ĩ	ต	µ	ṁ	µ	”
1011 0110	266	182	B6	¶	ś	ĥ	ļ	Ж		Ά	¶		ķ	ถ	¶
1011 0111	267	183	B7	·	ˇ	·	ˇ	З		·				ท	·	Ṗ	·
1011 1000	270	184	B8	¸				И		Έ	¸		ļ	ธ	ø	ẁ	ž
1011 1001	271	185	B9	¹	š	ı	š	Й		Ή	¹		đ	น	¹	ṗ	¹	č
1011 1010	272	186	BA	º	ş		ē	К		Ί	÷	º	š	บ	ŗ	ẃ	º	ș
1011 1011	273	187	BB	»	ť	ğ	ģ	Л	؛	»			ŧ	ป	»	Ṡ	»
1011 1100	274	188	BC	¼	ź	ĵ	ŧ	М		Ό	¼		ž	ผ	¼	ỳ	Œ
1011 1101	275	189	BD	½	˝	½	Ŋ	Н		½			―	ฝ	½	Ẅ	œ
1011 1110	276	190	BE	¾	ž		ž	О		Ύ	¾		ū	พ	¾	ẅ	Ÿ
1011 1111	277	191	BF	¿	ż		ŋ	П	؟	Ώ		¿	ŋ	ฟ	æ	ṡ	¿	ż
1100 0000	300	192	C0	À	Ŕ	À	Ā	Р		ΐ		À	Ā	ภ	Ą	À
1100 0001	301	193	C1	Á				С	ء	Α		Á		ม	Į	Á
1100 0010	302	194	C2	Â				Т	آ	Β		Â		ย	Ā	Â
1100 0011	303	195	C3	Ã	Ă		Ã	У	أ	Γ		Ã		ร	Ć	Ã		Ă
1100 0100	304	196	C4	Ä				Ф	ؤ	Δ		Ä		ฤ	Ä
1100 0101	305	197	C5	Å	Ĺ	Ċ	Å	Х	إ	Ε		Å		ล	Å			Ć
1100 0110	306	198	C6	Æ	Ć	Ĉ	Æ	Ц	ئ	Ζ		Æ		ฦ	Ę	Æ
1100 0111	307	199	C7	Ç			Į	Ч	ا	Η		Ç	Į	ว	Ē	Ç
1100 1000	310	200	C8	È	Č	È	Č	Ш	ب	Θ		È	Č	ศ	Č	È
1100 1001	311	201	C9	É				Щ	ة	Ι		É		ษ	É
1100 1010	312	202	CA	Ê	Ę	Ê	Ę	Ъ	ت	Κ		Ê	Ę	ส	Ź	Ê
1100 1011	313	203	CB	Ë				Ы	ث	Λ		Ë		ห	Ė	Ë
1100 1100	314	204	CC	Ì	Ě	Ì	Ė	Ь	ج	Μ		Ì	Ė	ฬ	Ģ	Ì
1100 1101	315	205	CD	Í				Э	ح	Ν		Í		อ	Ķ	Í
1100 1110	316	206	CE	Î				Ю	خ	Ξ		Î		ฮ	Ī	Î
1100 1111	317	207	CF	Ï	Ď	Ï	Ī	Я	د	Ο		Ï		ฯ	Ļ	Ï
1101 0000	320	208	D0	Ð	Đ		Đ	а	ذ	Π		Ğ	Ð	ะ	Š	Ŵ	Ð
1101 0001	321	209	D1	Ñ	Ń	Ñ	Ņ	б	ر	Ρ		Ñ	Ņ	ั	Ń	Ñ		Ń
1101 0010	322	210	D2	Ò	Ň	Ò	Ō	в	ز			Ò	Ō	า	Ņ	Ò
1101 0011	323	211	D3	Ó			Ķ	г	س	Σ		Ó		ำ	Ó
1101 0100	324	212	D4	Ô				д	ش	Τ		Ô		ิ	Ō	Ô
1101 0101	325	213	D5	Õ	Ő	Ġ	Õ	е	ص	Υ		Õ		ี	Õ			Ő
1101 0110	326	214	D6	Ö				ж	ض	Φ		Ö		ึ	Ö
1101 0111	327	215	D7	×				з	ط	Χ		×	Ũ	ื	×	Ṫ	×	Ś
1101 1000	330	216	D8	Ø	Ř	Ĝ	Ø	и	ظ	Ψ		Ø		ุ	Ų	Ø		Ű
1101 1001	331	217	D9	Ù	Ů	Ù	Ų	й	ع	Ω		Ù	Ų	ู	Ł	Ù
1101 1010	332	218	DA	Ú				к	غ	Ϊ		Ú		ฺ	Ś	Ú
1101 1011	333	219	DB	Û	Ű	Û		л		Ϋ		Û			Ū	Û
1101 1100	334	220	DC	Ü				м		ά		Ü			Ü
1101 1101	335	221	DD	Ý		Ŭ	Ũ	н		έ		İ	Ý		Ż	Ý		Ę
1101 1110	336	222	DE	Þ	Ţ	Ŝ	Ū	о		ή		Ş	Þ		Ž	Ŷ	Þ	Ț
1101 1111	337	223	DF	ß				п		ί	‗	ß		฿	ß
1110 0000	340	224	E0	à	ŕ	à	ā	р	ـ	ΰ	א	à	ā	เ	ą	à
1110 0001	341	225	E1	á				с	ف	α	ב	á		แ	į	á
1110 0010	342	226	E2	â				т	ق	β	ג	â		โ	ā	â
1110 0011	343	227	E3	ã	ă		ã	у	ك	γ	ד	ã		ใ	ć	ã		ă
1110 0100	344	228	E4	ä				ф	ل	δ	ה	ä		ไ	ä
1110 0101	345	229	E5	å	ĺ	ċ	å	х	م	ε	ו	å		ๅ	å			ć
1110 0110	346	230	E6	æ	ć	ĉ	æ	ц	ن	ζ	ז	æ		ๆ	ę	æ
1110 0111	347	231	E7	ç			į	ч	ه	η	ח	ç	į	็	ē	ç
1110 1000	350	232	E8	è	č	è	č	ш	و	θ	ט	è	č	่	č	è
1110 1001	351	233	E9	é				щ	ى	ι	י	é		้	é
1110 1010	352	234	EA	ê	ę	ê	ę	ъ	ي	κ	ך	ê	ę	๊	ź	ê
1110 1011	353	235	EB	ë				ы	ً	λ	כ	ë		๋	ė	ë
1110 1100	354	236	EC	ì	ě	ì	ė	ь	ٌ	μ	ל	ì	ė	์	ģ	ì
1110 1101	355	237	ED	í				э	ٍ	ν	ם	í		ํ	ķ	í
1110 1110	356	238	EE	î				ю	َ	ξ	מ	î		๎	ī	î
1110 1111	357	239	EF	ï	ď	ï	ī	я	ُ	ο	ן	ï		๏	ļ	ï
1111 0000	360	240	F0	ð	đ		đ	№	ِ	π	נ	ğ	ð	๐	š	ŵ	ð	đ
1111 0001	361	241	F1	ñ	ń	ñ	ņ	ё	ّ	ρ	ס	ñ	ņ	๑	ń	ñ		ń
1111 0010	362	242	F2	ò	ň	ò	ō	ђ	ْ	ς	ע	ò	ō	๒	ņ	ò
1111 0011	363	243	F3	ó			ķ	ѓ		σ	ף	ó		๓	ó
1111 0100	364	244	F4	ô				є		τ	פ	ô		๔	ō	ô
1111 0101	365	245	F5	õ	ő	ġ	õ	ѕ		υ	ץ	õ		๕	õ			ő
1111 0110	366	246	F6	ö				і		φ	צ	ö		๖	ö
1111 0111	367	247	F7	÷				ї		χ	ק	÷	ũ	๗	÷	ṫ	÷	ś
1111 1000	370	248	F8	ø	ř	ĝ	ø	ј		ψ	ר	ø		๘	ų	ø		ű
1111 1001	371	249	F9	ù	ů	ù	ų	љ		ω	ש	ù	ų	๙	ł	ù
1111 1010	372	250	FA	ú				њ		ϊ	ת	ú		๚	ś	ú
1111 1011	373	251	FB	û	ű	û		ћ		ϋ		û		๛	ū	û
1111 1100	374	252	FC	ü				ќ		ό		ü			ü
1111 1101	375	253	FD	ý		ŭ	ũ	§		ύ	LRM	ı	ý		ż	ý		ę
1111 1110	376	254	FE	þ	ţ	ŝ	ū	ў		ώ	RLM	ş	þ		ž	ŷ	þ	ț
1111 1111	377	255	FF	ÿ	˙			џ				ÿ	ĸ		’	ÿ

unassigned code points.
new additions in ISO/IEC 8859-7:2003 and ISO/IEC 8859-8:1999 versions, previously unassigned.
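Any row of the comparison table can be regenerated from Python's standard codecs. The sketch below treats a decoding error as an unassigned position and rebuilds the 0xA5 row. Part 12 is omitted because it was never published. Results for revised parts, such as the 2003 additions to part 7, depend on the mapping tables bundled with the interpreter. The name `comparison_row` is illustrative.

```python
PARTS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16]

def comparison_row(byte: int) -> dict:
    """Map each part number to the character at `byte`, or None if unassigned."""
    row = {}
    for n in PARTS:
        try:
            row[n] = bytes([byte]).decode(f"iso8859_{n}")
        except UnicodeDecodeError:
            row[n] = None  # the codec has no character at this position
    return row

row = comparison_row(0xA5)
print("  ".join(f"{n}:{row[n] or '-'}" for n in PARTS))
```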

Relationship to Unicode and the UCS

Since 1991, the Unicode Consortium has been working with ISO and IEC to develop the Unicode Standard and ISO/IEC 10646, the Universal Character Set (UCS), in tandem. Newer editions of ISO/IEC 8859 express characters in terms of their Unicode/UCS names and the U+nnnn notation. This effectively makes each part of ISO/IEC 8859 a Unicode/UCS character encoding scheme that maps a very small subset of the UCS to single 8-bit bytes. The first 256 characters in Unicode and the UCS are identical to those in ISO/IEC 8859-1 (Latin-1).
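Because the first 256 UCS code points coincide with Latin-1, decoding ISO/IEC 8859-1 amounts to reading each byte value as a code point. The following quick check uses Python's standard `iso8859_1` codec, which, like the IANA charset, also maps the control ranges.

```python
# Decoding every byte value as Latin-1 yields U+0000..U+00FF in order.
all_bytes = bytes(range(256))
latin1_text = all_bytes.decode("iso8859_1")
print(all(ord(ch) == i for i, ch in enumerate(latin1_text)))  # True

# So Latin-1 decoding is simply "code point = byte value".
e_acute = b"\xe9".decode("iso8859_1")
print(hex(ord(e_acute)), e_acute)  # 0xe9 é
```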

Single-byte character sets, including the parts of ISO/IEC 8859 and their derivatives, were favoured throughout the 1990s. They had the advantages of being well established and easier to implement in software: the one-byte-per-character model is simple and adequate for most single-language applications, and there are no combining characters or variant forms. As Unicode-enabled operating systems became more widespread, ISO/IEC 8859 and other legacy encodings became less popular. Remnants of ISO 8859 and single-byte character models remain entrenched in many operating systems, programming languages, data storage systems, networking applications, display hardware, and end-user application software. Even so, most modern computing applications use Unicode internally and rely on conversion tables to map to and from other encodings when necessary.
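The pattern described above, with Unicode internally and conversion tables at the boundaries, can be sketched in Python. Bytes in ISO/IEC 8859-5 are decoded into a Unicode string and re-encoded as UTF-8 for output. The sample text is arbitrary.

```python
# Legacy bytes in, Unicode inside, UTF-8 out.
original = "Привет, мир"
legacy = original.encode("iso8859_5")      # one byte per character
print(len(legacy), len(original))          # 11 11

text = legacy.decode("iso8859_5")          # table lookup into Unicode
utf8 = text.encode("utf-8")
print(len(utf8))                           # 20: each Cyrillic letter takes 2 bytes
```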

Current status

The ISO/IEC 8859 standard was maintained by ISO/IEC Joint Technical Committee 1, Subcommittee 2, Working Group 3 (ISO/IEC JTC 1/SC 2/WG 3). In June 2004, WG 3 was disbanded, and maintenance duties were transferred to SC 2. The standard is not currently being updated, as the Subcommittee's only remaining working group, WG 2, is concentrating on developing the Universal Coded Character Set (ISO/IEC 10646).

The WHATWG Encoding Standard specifies the character encodings that HTML5-compliant browsers must support.^[7] It includes most parts of ISO/IEC 8859,^[8] except for parts 1, 9 and 11, which are instead interpreted as Windows-1252, Windows-1254 and Windows-874 respectively.^[9] Authors of new pages and designers of new protocols are instructed to use UTF-8 instead.^[9]
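The effect of the label overrides can be approximated with Python's standard codecs. In the sketch below, `WHATWG_OVERRIDES` and `decode_like_browser` are hypothetical names, and the table covers only the three overrides named above; the real label table in the Encoding Standard has many more entries. Python's `cp1252` is also only an approximation of WHATWG windows-1252. For example, it raises an error on the few bytes Microsoft left undefined, whereas WHATWG windows-1252 maps them to C1 controls.

```python
# Hypothetical label table covering only the three overrides described above.
WHATWG_OVERRIDES = {
    "iso-8859-1": "cp1252",
    "iso-8859-9": "cp1254",
    "iso-8859-11": "cp874",
}

def decode_like_browser(data: bytes, label: str) -> str:
    """Decode data, applying the WHATWG override for the given label, if any."""
    label = label.strip().lower()
    codec = WHATWG_OVERRIDES.get(label, label)
    return data.decode(codec)

payload = b"\x93Caf\xe9\x94 costs \x805"
print(repr(payload.decode("iso8859_1")))           # bytes 0x80-0x9F stay C1 controls
print(decode_like_browser(payload, "ISO-8859-1"))  # “Café” costs €5
```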

Notes

[nb 1] Missing several accented vowels, including Ǿ and ǿ. These can be replaced with non-accented vowels at the cost of increased ambiguity.
[nb 2] Only the letter Ĳ/ĳ (IJ) is missing; it is usually represented as the two letters IJ.
[nb 3] The missing characters are in ISO/IEC 8859-15.
[nb 4] The 1985 edition includes only a version of ISO-8859-1.
[nb 5] The 1986 edition defines KOI8-E, which is an entirely different encoding.
[nb 6] ISO/IEC 8859-5 lacks the letter Ґ/ґ, which was reintroduced into the Ukrainian alphabet in 1990.
[nb 7] Published 1995, registered 1996.^[6]

References

[1] Chaudhuri, Arindam; Mandaviya, Krupa; Badelia, Pratixa; Ghosh, Soumya K. (2016-12-24), "Optical Character Recognition Systems for French Language", Optical Character Recognition Systems for Different Languages with Soft Computing, Cham: Springer International Publishing, pp. 109–136, ISBN 978-3-319-50251-9, retrieved 2023-12-04
[2] ISO/IEC JTC 1/SC 2/WG 3 (1998-02-12). Final Text of DIS 8859-1, 8-bit single-byte coded graphic character sets—Part 1: Latin alphabet No.1 (PDF). ISO/IEC FDIS 8859-1:1998; JTC1/SC2/N2988; WG3/N411. This set of coded graphic characters may be regarded as a version of an 8-bit code according to ISO/IEC 2022 or ISO/IEC 4873 at level 1. […] The shaded positions in the code table correspond to bit combinations that do not represent graphic characters. Their use is outside the scope of ISO/IEC 8859; it is specified in other International Standards, for example ISO/IEC 6429.
[3] Haralambous, Yannis (September 2007). Fonts & Encodings. Translated by Horne, P. Scott (1st ed.). Sebastopol, California, USA: O'Reilly Media, Inc. pp. 37–38. ISBN 978-0-596-10242-5. […] According to an urban legend, the French delegate was out sick the day when the standard came up for a vote and had to have his Belgian counterpart act as his proxy. In fact, the French delegate was an engineer, who was convinced that this ligature was useless, and the Swiss and German representatives pressed hard to have the mathematical symbols × and ÷ included at the positions where Œ and œ would logically appear. […]
[4] André, Jacques (2003-10-15) [2003-10-02]. André, Bernard; Baron, Georges-Louis; Bruillard, Éric (eds.). "Histoire d'Œ, histoire d'@ des rumeurs typographiques et de leurs enseignements". Traitement de Texte et Production de Documents INRP/GEDIAPS (in French): 19–34. Archived from the original on 2016-12-08. Retrieved 2016-12-09.
[5] André, Jacques (November 1996). "ISO Latin-1, norme de codage des caractères européens? trois caractères français en sont absents!" (PDF). Cahiers GUTenberg (in French) (25): 65–77. Archived from the original (PDF) on 2008-11-30.
[6] Lazhintseva, Katya (1996-05-03). "Registration of new MIME charset: Windows-1257". IANA.
[7] "8.2.2.3. Character encodings". HTML 5.1 2nd Edition. W3C. User agents must support the encodings defined in the WHATWG Encoding standard, including, but not limited to […]
[8] van Kesteren, Anne. "Legacy single-byte encodings". Encoding Standard. WHATWG.
[9] van Kesteren, Anne. "Names and labels". Encoding Standard. WHATWG.

Published versions of each part of ISO/IEC 8859 are available, for a fee, from the ISO catalogue site and from the IEC Webstore.
PDF versions of the final drafts of some parts of ISO/IEC 8859, as submitted to ISO/IEC JTC 1/SC 2/WG 3 for review and publication, are available at the WG 3 web site:
- ISO/IEC 8859-1:1998 - 8-bit single-byte coded graphic character sets, Part 1: Latin alphabet No. 1 (draft dated February 12, 1998, published April 15, 1998)
- ISO/IEC 8859-4:1998 - 8-bit single-byte coded graphic character sets, Part 4: Latin alphabet No. 4 (draft dated February 12, 1998, published July 1, 1998)
- ISO/IEC 8859-7:1999 - 8-bit single-byte coded graphic character sets, Part 7: Latin/Greek alphabet (draft dated June 10, 1999; superseded by ISO/IEC 8859-7:2003, published October 10, 2003)
- ISO/IEC 8859-10:1998 - 8-bit single-byte coded graphic character sets, Part 10: Latin alphabet No. 6 (draft dated February 12, 1998, published July 15, 1998)
- ISO/IEC 8859-11:1999 - 8-bit single-byte coded graphic character sets, Part 11: Latin/Thai character set (draft dated June 22, 1999; superseded by ISO/IEC 8859-11:2001, published 15 December 2001)
- ISO/IEC 8859-13:1998 - 8-bit single-byte coded graphic character sets, Part 13: Latin alphabet No. 7 (draft dated April 15, 1998, published October 15, 1998)
- ISO/IEC 8859-15:1998 - 8-bit single-byte coded graphic character sets, Part 15: Latin alphabet No. 9 (draft dated August 1, 1997; superseded by ISO/IEC 8859-15:1999, published March 15, 1999)
- ISO/IEC 8859-16:2000 - 8-bit single-byte coded graphic character sets, Part 16: Latin alphabet No. 10 (draft dated November 15, 1999; superseded by ISO/IEC 8859-16:2001, published July 15, 2001)
ECMA standards, which in intent correspond exactly to the ISO/IEC 8859 character set standards, can be found at:
- Standard ECMA-94: 8-Bit Single Byte Coded Graphic Character Sets - Latin Alphabets No. 1 to No. 4 2nd edition (June 1986)
- Standard ECMA-113: 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Cyrillic Alphabet 3rd edition (December 1999)
- Standard ECMA-114: 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Arabic Alphabet 2nd edition (December 2000)
- Standard ECMA-118: 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Greek Alphabet (December 1986)
- Standard ECMA-121: 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Hebrew Alphabet 2nd edition (December 2000)
- Standard ECMA-128: 8-Bit Single-Byte Coded Graphic Character Sets - Latin Alphabet No. 5 2nd edition (December 1999)
- Standard ECMA-144: 8-Bit Single-Byte Coded Character Sets - Latin Alphabet No. 6 3rd edition (December 2000)
ISO/IEC 8859-1 to Unicode mapping tables as plain text files are at the Unicode FTP site.
Informal descriptions and code charts for most ISO/IEC 8859 standards are available in ISO/IEC 8859 Alphabet Soup (Mirror)

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.


