7-bit character sets

ASCII, ISO 646 and IA5 are well-known 128-character sets that are fundamental to computing. Several versions of these standards have been released over the years. This article discusses the history of ASCII and related national 7-bit character sets as well as the differences between their versions. Character tables are included.

When computers were young in the early 1960s, it was decided that text should be represented with 7 bits for each character. Seven bits are enough to represent 128 different characters, including letters, numbers, symbols and the required control codes. Six bits were too few, and eight bits were considered too many, so the standard became 7.

ASCII (American Standard Code for Information Interchange) was the first 7-bit character set to be standardized. Over the years, several revisions of ASCII were published. ASCII-based character sets became immensely widespread. Most character sets in current use are based on ASCII in one way or another.

ISO 646 and IA5 (CCITT International Alphabet No. 5) are the international counterparts of ASCII. They define more or less the same 128-character set as ASCII; the main difference is that these standards are international. ASCII is the U.S. national version of ISO 646.

National character sets

From the very beginning it was realized that 128 characters were not enough for international use. Each country required its own national characters with letters, accents and other diacritics. To meet this need, ISO 646 defined an International Reference Version (IRV), which each country could adapt to its national needs. Certain character positions could be replaced by national characters. A large number of country-specific versions were standardized. The versions were registered in the ISO/IEC International Register of Coded Character Sets (IR). In addition to these sets, several other 7-bit character sets were defined.
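The replacement of certain positions by national characters can be illustrated with a short Python sketch. It is not taken from any of the standards: the names `ISO646_DE` and `decode_iso646` are hypothetical, and the German substitutions shown are the ones commonly documented for DIN 66 003 (ISO-IR 021), so they should be checked against the standard before being relied on.

```python
from typing import Mapping

# Assumed substitutions for the German national version (DIN 66 003, ISO-IR 021).
# Keys are 7-bit code positions; the comment shows the ASCII character there.
ISO646_DE = {
    0x40: "§",  # @
    0x5B: "Ä",  # [
    0x5C: "Ö",  # \
    0x5D: "Ü",  # ]
    0x7B: "ä",  # {
    0x7C: "ö",  # |
    0x7D: "ü",  # }
    0x7E: "ß",  # ~
}


def decode_iso646(data: bytes, national: Mapping[int, str]) -> str:
    """Decode 7-bit data, replacing the positions listed in `national`.

    Positions not listed are decoded as in ASCII.
    """
    chars = []
    for byte in data:
        if byte > 0x7F:
            raise ValueError(f"not a 7-bit code: {byte:#04x}")
        chars.append(national.get(byte, chr(byte)))
    return "".join(chars)


sample = b"Gr}~e aus K|ln"
print(decode_iso646(sample, {}))         # read as ASCII
print(decode_iso646(sample, ISO646_DE))  # read as the German version
```

The same seven-bit data reads differently depending on which national version the receiver assumes, which is why sender and recipient had to agree on the version in use.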
7-bit sets have since been replaced with 8-bit, 16-bit and even 32-bit character sets. Still, the 7-bit sets are the fundamental building blocks of almost all of today's character encoding systems.

See also: Character sets

Revisions of ASCII

ASCII has undergone several revisions to become the character set we know today. The history of ASCII is not always fully understood. As an example, IANA lists ASCII as the same thing as ANSI_X3.4-1968 and ANSI_X3.4-1986. This is not entirely accurate. The 1968 revision was ambiguous. The ambiguities were fixed later, making the 1986 revision different from the 1968 revision.

The ASCII version that became standard was first published in 1967 and 1968. The character set of these two versions is identical. However, it is not actually what we currently think of as ASCII. The Number Sign (#) could be replaced by the symbol £. The character in position 7C was actually a broken vertical bar (¦). It was broken to prevent confusion with logical OR (|). In later standards this position is either a vertical bar (|) or a national character. According to ASCII-1967 and ASCII-1968, two characters could be "stylized": "!" could be stylized as "|" to represent logical OR, and "^" could be stylized as "¬" (logical NOT). The character "˜" was called an overline when used as punctuation, and a tilde when used as a diacritic. It could also be used for another accent.

ASCII-1963 (ASA standard X3.4-1963) was the initial release of ASCII. It was in many ways different from the ASCII in current use. ASCII-1963 did not gain wide acceptance. One of the reasons is that IBM chose to use EBCDIC, an IBM proprietary character set, in its successful System/360 series of computers released in 1964.

ASCII-1965 was an unpublished major revision. It looked a lot like the current ASCII, even though certain characters differed. ASCII-1965 was accepted as a standard, but it went unpublished and unused.
ASCII-1967 (USAS X3.4-1967) was a major revision of the previous versions of ASCII. This was the version that eventually evolved into the ASCII we know today. ASCII-1967 was not exactly what we currently think of as ASCII. The differences are as follows.

ASCII-1967 offered some options for certain characters, and one character was totally ambiguous. The Number Sign (#) could be replaced by the symbol £. Two characters could be stylized. The Exclamation Point (!) could be stylized as a logical OR (|) and the Circumflex (^) could be stylized as a logical NOT (¬).

Character 7C, even though called a Vertical Line, looked like a broken vertical bar (¦). It looked that way to avoid confusion with a solid vertical bar (|) used as a logical OR. In other words, since character 21 could sometimes look like (|), 7C had to look like (¦).

Character 7E was ambiguous. This character had three functions. It was 1) Overline when used as punctuation, 2) Tilde when used as a diacritic, and 3) General Accent, yet another diacritic which could be used for other accents not specifically provided. The character appeared in two shapes, upper tilde (˜) and midline tilde (~), interchangeably. No explanation was provided as to which shape to use and when. The character did not look like an overline (¯), even when it was called Overline. It is as if the authors couldn't decide what this character really was for. The midline shape (~) may have been unintentional. The midline position conflicts with the intended use either as a diacritic or as an overline. The ambiguity regarding the shape seems to have originated in ASCII-1965, where it may have been a typographical error or restriction.

ASCII-1968 (USAS X3.4-1968) was a minor revision. It didn't change any of the graphic characters. The only change was to the "newline" function. LF could now be used alone as a newline. The previous versions required the use of CR LF (or LF CR). The 1968 standard also gave the code its name, ASCII or USASCII.
ASCII-1977 (ANSI X3.4-1977) fixed some of the ambiguities of ASCII-1967 and ASCII-1968. The Number Sign (#) could no longer be replaced by the Pound (£). Character 7C was now a Vertical Line (|) that no longer looked like a broken vertical bar. One could no longer stylize the Exclamation Point (!) as a logical OR (|) or the Circumflex (^) as a logical NOT (¬). The Overline was no longer present; character 7E was simply a Tilde (˜, not ~). That character could no longer be used as a General Accent either.

ASCII-1977 also changed the definitions of several control characters. The changes did not necessarily change the intended use of these characters. An essential change concerned VT and FF: an "optional implicit CR" could now be allowed after VT and FF, in the same way as was already possible with LF. More changes can be found in Control characters in ASCII and Unicode.

ASCII-1986 (ANSI X3.4-1986) changed neither the character set nor the control characters.

Revisions of ISO 646

ASCII was accepted, with modifications, as an ISO recommendation in 1967. ISO 646 (officially, 7-bit coded character set for information processing interchange) was an inherently international standard. The basis was the "IRV", an International Reference Version, which could be adapted to national needs. ASCII was the US national version of ISO 646. Other national versions were published for Canada, Finland, France and so on by replacing certain graphic characters with national characters.

ISO R 646-1967 was the first official version of the standard (then called a recommendation). This version didn't provide an IRV yet, but only a skeleton chart to be filled in by national standards organizations. The character set was similar to that of ASCII-1977 with the following differences: In place of the Number sign (#) there was a Currency symbol (£). Characters \ { | } were totally missing; their locations were empty. Character 7E was an Overline (¯).
National versions could be produced by assigning national characters in place of those characters that in ASCII are @ [ \ ] { | }. In ISO R 646-1967, though, only @ [ and ] were in place, and the remaining slots were empty. When more national characters were needed, characters ^ ` ¯ could also be replaced by national characters. Specifically, character 7E (¯) could be used as ˜ or another diacritical sign. £ could be replaced by # in countries where £ was not needed.

A special Sterling rule existed for the two characters immediately succeeding digit 9, namely the colon (:) and semicolon (;). These characters could be replaced by symbols for 10 and 11, respectively. This was to facilitate the adoption of ASCII in the sterling monetary area. In the old British monetary system, a pound was 20 shillings and a shilling was 12 pence.

ISO 646-1973 was the second version of the standard. This was the first version to define an IRV. The IRV was similar to ASCII-1977 with the following differences: In place of the Dollar sign ($) there was a Currency sign (¤). Character 7E (¯) was called Overline, Tilde, but it was supposed to look like an overline in the IRV.

National versions could be produced by assigning national characters in place of characters @ [ \ ] { | }. When more national characters were needed, characters ^ ` ¯ could be used for the same purpose. Specifically, character 7E (¯) could be used as ˜ or another diacritical sign. Thus, national characters would appear at the same positions as before. The allowed characters in the "currency positions" were now (£ or #) for position 23 and ($ or ¤) for position 24. The Sterling rule was dropped now that the British Isles had moved to a decimal monetary system (in 1971).

ISO 646-1983 was not available at the time of writing this article. Based on references in other sources, the IRV kept the Currency sign (¤). A change appears to have been made in the IRV as regards the Overline or Tilde character.
Related standards interpret this character differently. ECMA-6 (1985) lists this character as TILDE, OVERLINE (~). In IA5 (1988) the character was Tilde, overline (¯). In the IBM codepage 1009, which is based on ISO 646-1983, it is (˜). The ISO International Registry appears to list the ISO 646-1983 character set as set number 002, but the document registered there is actually from ISO 646-1973.

ISO/IEC 646:1991 is the current release. The IRV of 1991 replaced the Currency sign (¤) by the dollar ($).

Revisions of International Alphabet No. 5 (IA5 / IRA)

CCITT standardized the International Alphabet No. 5 (or just IA5). It was meant for data transmission on the general telephone network or on telegraph networks. IA5 is closely related to ISO 646.

IA5, 1968 version (V.3) was the initial standard. This standard was not available at the time of writing this article.

IA5, 1972 version (V.3) amended the 1968 version. IA5 of 1972 is an almost word-for-word copy of ISO 646-1973. Character 7E (¯) was called Overline, tilde. It looked like (¯) in the IRV, but could be used as ˜ or another diacritical sign in national use.

IA5, 1988 version (T.50) corresponds to ISO 646-1983. The character set was an exact copy of that of the 1972 version. Confusingly, the IRV character 7E (¯) was now called Tilde, overline. It looked like an overline with no alternative representation as a tilde. No explanation was given as to its use. In national versions of IA5, no specific character was given in this position; it was expected to vary from nation to nation.

IRA, 1992 version (T.50) is the current standard. IA5 is now called IRA, International Reference Alphabet. IRA is technically equivalent to ISO/IEC 646:1991. Changes in the IRV relative to the 1988 version of IA5 were as follows. The Currency sign (¤) was replaced by the Dollar sign ($). Character 7E was now Tilde. Confusingly, the Tilde appears as ~ on page 9 and as ˜ on page 12 of the document.
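The two tilde shapes that these documents mix up are separate characters in Unicode today. The following Python sketch, which assumes only the standard `unicodedata` module, prints the current Unicode names of four tilde-like characters and shows the diacritic use by composing "n" with a combining tilde. The helper name `describe` is hypothetical, and the names printed are the current ones, which may differ from those used in Unicode 1.0.

```python
import unicodedata

# Four tilde-like characters: midline tilde, upper (spacing) tilde,
# combining tilde, and the mathematical tilde operator.
TILDES = ["\u007E", "\u02DC", "\u0303", "\u223C"]


def describe(ch: str) -> str:
    """Return the code point and current Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in TILDES:
    print(describe(ch))

# Diacritic use: a base letter followed by the combining tilde
# composes into a single precomposed letter under NFC normalization.
composed = unicodedata.normalize("NFC", "n\u0303")
print(composed, describe(composed))
```

Only the combining character forms a letter such as ñ; the midline tilde at position 7E stays a separate spacing character.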
Tilde

The tilde (position 7E) is a character that Unicode and ASCII disagree on. The tilde was originally meant as a diacritic. It sat high on the line (˜) rather than in the middle (~), making it possible to use it as a diacritic to form characters such as õ or ñ. The tilde looks like (˜) in ASCII-1977 and ASCII-1968, and also in ISO 646 and IA5.

The midline tilde (~) seems to have originated from a typographical error or restriction. The tilde first appeared in ASCII-1965, as printed in ACM Vol 8 Nr 4. This version had a tilde, intended as a diacritic only, but printed as both upper and midline tilde interchangeably. There was a character table with an upper tilde (˜), but in the text the midline version was used instead. The text clearly refers to use as a diacritic only, which would not make sense with a midline tilde. Thus, the midline version was not intended. The same ambiguity was inherited by ASCII-1967 and ASCII-1968. Their text seems to require the upper position as well (see above).

Unicode 1.0 re-defined the character as TILDE (U+007E), which was a spacing character, not a diacritic. Unicode 1.0 accepted both versions (~ and ˜) as alternative representations of the same character. In addition, three other tildes were encoded: the ASCII-style "upper tilde" (˜) became available as two additional characters, SPACING TILDE (U+02DC) and NON-SPACING TILDE (U+0303). A midline tilde was also encoded as TILDE OPERATOR (U+223C). Since Unicode 2.0, the regular TILDE is represented as a midline tilde (~). Later Unicode versions have added even more tildes.

Table of differences

The following table lists the differences of the character sets with respect to ASCII-1986. The reference line "ASCII" is at the top. An empty cell means the same character is used both in ASCII-1986 and the other set. A gray cell means no character was defined in that position. A cell with 2 or 3 characters means alternative characters were available in that position.
IR = Number in International Register of Coded Character Sets. ASCII-1963 is very different from all the other sets, see below.

ASCII-1963: ASA X3.4-1963.
ASCII-1965: X3.4-1965. ¬ is called overline. The hook appears to distinguish it from underline. X3.4-1965 was approved as a standard, but not published.
ASCII-1967 and ASCII-1968: USAS X3.4-1967 and USAS X3.4-1968. Where "#" is not required, it can be replaced by "£". "!" could be stylized as "|" to represent logical OR, and "^" could be stylized as "¬" (logical NOT). "¦" appears in two parts to prevent confusion with logical OR "|". The character "˜" is called an overline when used as punctuation, and a tilde when used as a diacritic. It can also be used for another accent.
ASCII-1977 and ASCII-1986: ANSI X3.4-1977 and X3.4-1986. The tilde (˜) was meant to be an accent, so it should appear high rather than in the middle (~). ISO-IR 006 is similar to ASCII-1977 and ASCII-1986, despite it saying it was based on ASCII-1968.
ISO646 Invariant: ISO/IEC 646:1992 (ISO-IR 170). 82 invariant graphic characters of all versions of ISO/IEC 646.
ISO R / 646-1967: ISO R / 646-1967. Where "£" is not required, it can be replaced by "#". The empty slots and parenthesized slots are primarily for national characters.
ISO646 IRV (1973), IA5 IRV (1973, 1988): ISO 646-1973, CCITT V.3-1973, ITU-T T.50 (1988).
ISO646 IRV (1991), IA5 IRV (1992): ISO/IEC 646:1991, ITU-T T.50 (1992). Similar to US-ASCII and also ISO-IR 006.
ISO646-CA Canada: CSA Z243.4-1985 (ISO-IR 121). Alternate Primary Graphic Set Nr. 1.
ISO646-CA2 Canada: CSA Z243.4-1985 (ISO-IR 122). Alternate Primary Graphic Set Nr. 2.
ISO646-CN China: GB 1988-80 (ISO-IR 057).
ISO646-CU Cuba: NC 99-10:81 (ISO-IR 151).
ISO646-DE German: DIN 66 003 (ISO-IR 021).
ISO646-ES Spanish (Olivetti): Variant of ISO 646 for the Spanish language (ISO-IR 017).
ISO646-ES2 Spanish languages: A version of ISO 646 for the Spanish Languages (ISO-IR 085).
ISO646-FI Finland, ISO646-SE Swedish: SFS 4017, SEN 85 02 00 Annex B (ISO-IR 010). Finland: Basic version.
Finland Extended version: SFS 4017. The five positions allow an alternate symbol, if agreed on between sender and recipient.
ISO646-SE2 Swedish for official writing of names: SEN 85 02 00 Annex C (ISO-IR 011).
ISO646-FR1 French (1973): NF Z 62-010 (1973) (ISO-IR 025). Withdrawn.
ISO646-FR French (1982): NF Z 62-010 (1982) (ISO-IR 069). Revised.
ISO646-GB UK: BS 4730 (ISO-IR 004).
ISO646-HU Hungarian: MSZ 7795/3 (ISO-IR 086).
ISO646-IT Italian (Olivetti): Variant of ISO-7 for Italian (ISO-IR 015).
ISO646-JP Japanese Roman: JIS C 6220 1969 (ISO-IR 014).
ISO646-JP-OCR-B Japanese OCR-B: JIS C 6229-1984 (ISO-IR 092).
ISO646-NO Norwegian: NS 4551 Version 1 (ISO-IR 060).
ISO646-NO2 Norwegian v2 (withdrawn): NS 4551 Version 2 (ISO-IR 061).
ISO646-PT Portuguese (Olivetti): A version of ISO 646 for the Portuguese Language (ISO-IR 016).
ISO646-PT2 Portuguese (IBM): A version of ISO 646 for the Portuguese Language (ISO-IR 084).
ISO646-YU Serbocroatian and Slovenian: JUS I.B1. 002 (ISO-IR 141).
Irish (Gaelic): Irish Standard 433:1996 (ISO-IR 207).
T.61 Teletext: CCITT T.61 (ISO-IR 102).
NATS Finland and Sweden: ISO-IR 008-1. Newspaper text transmission. 45=long dash, minus sign. UA=Unit space A. UB=Unit space B. 94=solid.
NATS Denmark and Norway: ISO-IR 009-1. Newspaper text transmission. 45=long dash, minus sign. UA=Unit space A. UB=Unit space B. 94=solid.
Viewdata and Teletext (UK): ISO-IR 047. Alphanumerics for viewdata and broadcast teletext.
HP German: HP PCL5.
HP Spanish: HP PCL5.
Unicode 1.0: Unicode 1.0. Two alternative representations (~|˜) exist for the TILDE character. Similarly, the DOLLAR SIGN ($) has two representations, with one or two vertical bars.
Unicode 2.0.0 and later: Unicode 2.0.0. Alternative representations are no longer given. The TILDE character has a mid-line representation (~).
Last updated in January 2016: tilde, sterling. ©Aivosto Oy