The Internet uses address-based switching to transmit information.
As of 2014, the Internet is ubiquitous (omnipresent, available everywhere).
XHTML stands for
E(x)tensible (H)yper(T)ext (M)arkup (L)anguage
(W3C standard, XML syntax rules, well-formedness, validatable;
a text-based markup language for documents, text content, images, hyperlinks).
For the longevity of stored information,
the stability of the encoding formats (Unicode) is essential,
alongside the technical processing and
access methods (operating systems, hardware).
Today UTF-8 (as specified in RFC 3629) is commonly used.
The encoding length per character under ISO 10646 (UCS-4) is 32 bits;
the code points of Unicode 4.0 (U+0000 to U+10FFFF) fit in 21 bits.
Timeline: 1989 draft proposal DP 10646; 1991 Unicode 1.0; 1993 alignment with ISO 10646.
XML can use the UTF encodings. UTF is the abbreviation for Unicode Transformation Format.
With UTF-8, practically all of the world's written characters can be represented
(for handling character encodings in HTML and CSS, beginners should see
w3.org: Internationalization, Character encoding, HTML5 UTF-8 (overlong forms);
unicode.org: Unicode standards).
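The "overlong forms" named in the link list are byte sequences that encode a code point with more bytes than necessary; RFC 3629 forbids them. The following minimal sketch relies on Python's built-in strict UTF-8 decoder to reject such sequences. The helper name `is_valid_utf8` is hypothetical and chosen only for this illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (overlong forms are rejected)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "/" (U+002F) is validly encoded as the single byte 2F.
print(is_valid_utf8(b"/"))
# C0 AF and E0 80 AF are overlong two- and three-byte forms of the same character.
print(is_valid_utf8(b"\xc0\xaf"))
print(is_valid_utf8(b"\xe0\x80\xaf"))
```

Rejecting overlong forms matters because otherwise the same character could slip past byte-level filters in several disguises.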
An editor that cannot handle the Unicode character repertoire when saving
then writes, for example, the letter ü as a character reference:
&#252;
(numeric: decimal notation) or
&#xFC;
(numeric: hexadecimal notation) or
&uuml;
(named: notation using a short name).
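The three notations can be derived mechanically from a character's code point. The sketch below uses Python's standard `html` module; the function name `character_references` is an assumption made for this example, not part of any standard.

```python
import html
from html.entities import codepoint2name


def character_references(ch: str) -> dict:
    """Return the decimal, hexadecimal and (if defined) named reference of one character."""
    cp = ord(ch)
    refs = {"decimal": f"&#{cp};", "hexadecimal": f"&#x{cp:X};"}
    name = codepoint2name.get(cp)  # only characters with an HTML entity name have one
    if name is not None:
        refs["named"] = f"&{name};"
    return refs


for ch in "ü€":
    refs = character_references(ch)
    print(ch, refs)
    # Every notation must resolve back to the original character.
    for ref in refs.values():
        assert html.unescape(ref) == ch
```

Numeric references work for any code point, whereas named references exist only for the characters listed in the HTML entity tables.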
Unicode attempts to represent the written characters of the world.
For example, the euro sign corresponds to
Unicode U+20AC,
in the named XHTML notation
&euro;
(displayed as: €),
in the decimal numeric XHTML notation
&#8364;
(displayed as: €),
and in the hexadecimal numeric notation
&#x20AC;
(displayed as: €).
UTF-8, a Unicode Transformation Format, is the most widely used encoding for Unicode characters.
encoding="UTF-8"
stands for an international encoding
based on the ISO/IEC 10646 / Unicode standard (RFC 3629).
The Unicode code space has 1,114,112 code points (U+0000 to U+10FFFF).
UTF-8 assigns each character a specially encoded byte sequence of variable length (1 to 4 bytes)
(see unicode.org).
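How the variable-length byte sequence is built can be sketched by hand. The function below follows the bit layout of RFC 3629; its name `utf8_encode` is hypothetical, and real programs would simply call `str.encode("utf-8")`, which the loop uses as a cross-check.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point < 0x80:        # 7 bits  -> 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 11 bits -> 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 16 bits -> 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 21 bits -> 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for ch in "Aü€😀":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # cross-check against the built-in codec
    print(f"U+{ord(ch):04X} -> {encoded.hex().upper()} ({len(encoded)} bytes)")
```

For the euro sign U+20AC this yields the three bytes E2 82 AC.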
The first bytes of a file may carry a byte order mark (BOM), which helps identify the character encoding:
BOM (hex) | Encoding |
---|---|
EF BB BF | UTF-8 |
FE FF | UTF-16 Big Endian |
FF FE | UTF-16 Little Endian |
00 00 FE FF | UTF-32 Big Endian |
FF FE 00 00 | UTF-32 Little Endian |
0E FE FF | SCSU |
DD 73 66 73 | UTF-EBCDIC |
FB EE 28 | BOCU-1 |
2B 2F 76 38 2D | UTF-7 |
2B 2F 76 38 | UTF-7 |
2B 2F 76 39 | UTF-7 |
2B 2F 76 2B | UTF-7 |
2B 2F 76 2F | UTF-7 |

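A BOM check can be sketched in a few lines. The example covers only the UTF-8, UTF-16 and UTF-32 signatures from the list above; the names `BOMS` and `detect_bom` are assumptions made for this illustration. Longer signatures are tested first, because the UTF-32 Little Endian BOM (FF FE 00 00) begins with the UTF-16 Little Endian BOM (FF FE).

```python
# Ordered so that longer signatures are tested before their shorter prefixes.
BOMS = [
    (bytes.fromhex("0000FEFF"), "UTF-32 Big Endian"),
    (bytes.fromhex("FFFE0000"), "UTF-32 Little Endian"),
    (bytes.fromhex("EFBBBF"), "UTF-8"),
    (bytes.fromhex("FEFF"), "UTF-16 Big Endian"),
    (bytes.fromhex("FFFE"), "UTF-16 Little Endian"),
]


def detect_bom(data: bytes):
    """Return the encoding name signalled by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(detect_bom(b"\xef\xbb\xbf<html>"))   # UTF-8 signature
print(detect_bom(b"\xff\xfeh\x00i\x00"))   # UTF-16 Little Endian
print(detect_bom(b"<html>"))               # no BOM present
```

The result is only a heuristic: a missing BOM says nothing about the encoding, and UTF-16 Little Endian text that starts with U+0000 is indistinguishable from a UTF-32 Little Endian BOM.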
All new Internet communication protocols should support UTF-8. The following table summarizes some properties of the UTF encodings:
Name | UTF-8 | UTF-16 | UTF-16BE | UTF-16LE | UTF-32 | UTF-32BE | UTF-32LE |
---|---|---|---|---|---|---|---|
Smallest code point | 0000 | 0000 | 0000 | 0000 | 0000 | 0000 | 0000 |
Largest code point | 10FFFF | 10FFFF | 10FFFF | 10FFFF | 10FFFF | 10FFFF | 10FFFF |
Code unit size | 8 bits | 16 bits | 16 bits | 16 bits | 32 bits | 32 bits | 32 bits |
Byte order | N/A | <BOM> | big-endian | little-endian | <BOM> | big-endian | little-endian |
Minimal bytes/character | 1 | 2 | 2 | 2 | 4 | 4 | 4 |
Maximal bytes/character | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
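The minimum and maximum byte counts in the table can be checked with Python's built-in codecs. This is a small sketch; the helper name `encoded_lengths` is hypothetical. The explicit big-endian variants are used so that no BOM is added to the byte count.

```python
def encoded_lengths(ch: str) -> dict:
    """Return the number of bytes one character needs in each encoding form."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be")}


for ch in "Aü€😀":
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))

# The largest code point, U+10FFFF, needs 21 bits and 4 bytes in every encoding form.
print((0x10FFFF).bit_length(), encoded_lengths(chr(0x10FFFF)))
```

Characters outside the Basic Multilingual Plane, such as U+1F600, take 4 bytes in all three forms, while ASCII characters take 1, 2 and 4 bytes respectively.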
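HTML 4.01 should always be served with the HTML MIME type (text/html).
XHTML 1.x doctypes can be served with either the HTML or an XML MIME type (application/xhtml+xml).
Pages without a doctype are (as of 2011) processed by the HTML5 parser, although browsers render them in quirks mode.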
Some links: W3C Internationalization Checker; W3C BOM tester; W3C tester: UTF-8 signature (byte order mark, hex EF BB BF); W3C validator: validator.w3.org; rexswain.com: HTTP Viewer; Mozilla: Web-Sniffer.
In typography, a glyph is the graphical representation of a written character (e.g. a letter, a syllabic sign, a ligature, or part of a letter). A glyph forms a graphical unit in itself.
Developing characters that always combine into an aesthetic whole when typeset is a difficult, extensive design process. When using fonts and glyphs, pay attention to the rights. Rights holders include, for example, Adobe Systems Incorporated, Monotype Imaging, Apple Computer Inc., Atelier Fluxus Virus, Beijing Zhong Yi (Zheng Code) Electronics Company, DecoType Inc., Evertype, Hapax, IBM Corporation, Microsoft Corporation, Peking University Founder Group Corporation, Production First Software, SIL International, STAR - Sylheti Translation And Research, and others.
A registered character set name (identified in MIBs by its MIBenum value) may contain up to 40 characters. The following standards (UCS = Universal Character Set) define character sets:
There are many character sets (RFC, ISO, UTF, DIN, IBM, HP, etc.), which are increasingly being unified and superseded by the "Unicode Character Database".
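Many registered names and aliases are understood directly by programming languages. As a sketch, Python's standard `codecs` module resolves several ISO_8859-1 aliases from the IANA registry to one codec, ignoring case. The alias selection below is an assumption about what the Python standard library ships, not an exhaustive mapping of the registry, and `resolve_charset` is a hypothetical helper name.

```python
import codecs


def resolve_charset(name: str):
    """Return Python's canonical codec name for a charset label, or None if unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


# A few aliases of ISO_8859-1:1987 (MIBenum 4); lookup is case-insensitive.
aliases = ["ISO-8859-1", "latin1", "L1", "IBM819", "cp819"]
resolved = {alias: resolve_charset(alias) for alias in aliases}
print(resolved)

# All aliases should map to one and the same codec.
assert len(set(resolved.values())) == 1

# Labels that Python does not know yield None instead of raising.
print(resolve_charset("no-such-charset"))
```

Resolving a label to a canonical name before comparing or storing it avoids treating "latin1" and "ISO-8859-1" as different encodings.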
===================================================================
These are the official names for character sets that may be used in the Internet and may be referred to in Internet documentation. These names are expressed in ANSI_X3.4-1968 which is commonly called US-ASCII or simply ASCII. The character set most commonly use in the Internet and used especially in protocol standards is US-ASCII, this is strongly encouraged. The use of the name US-ASCII is also encouraged.

The character set names may be up to 40 characters taken from the printable characters of US-ASCII. However, no distinction is made between use of upper and lower case letters.

The MIBenum value is a unique value for use in MIBs to identify coded character sets.

The value space for MIBenum values has been divided into three regions. The first region (3-999) consists of coded character sets that have been standardized by some standard setting organization. This region is intended for standards that do not have subset implementations. The second region (1000-1999) is for the Unicode and ISO/IEC 10646 coded character sets together with a specification of a (set of) sub-repertoires that may occur. The third region (>1999) is intended for vendor specific coded character sets.

Assigned MIB enum Numbers
-------------------------
0-2 Reserved
3-999 Set By Standards Organizations
1000-1999 Unicode / 10646
2000-2999 Vendor

The aliases that start with "cs" have been added for use with the IANA-CHARSET-MIB as originally defined in RFC3808, and as currently maintained by IANA at http://www.iana.org/assignments/ianacharset-mib. Note that the ianacharset-mib needs to be kept in sync with this registry. These aliases that start with "cs" contain the standard numbers along with suggestive names in order to facilitate applications that want to display the names in user interfaces.
The "cs" stands for character set and is provided for applications that need a lower case first letter but want to use mixed case thereafter that cannot contain any special characters, such as underbar ("_") and dash ("-"). If the character set is from an ISO standard, its cs alias is the ISO standard number or name. If the character set is not from an ISO standard, but is registered with ISO (IPSJ/ITSCJ is the current ISO Registration Authority), the ISO Registry number is specified as ISOnnn followed by letters suggestive of the name or standards number of the code set. When a national or international standard is revised, the year of revision is added to the cs alias of the new character set entry in the IANA Registry in order to distinguish the revised character set from the original character set. Character Set Reference ------------- --------- Name: ANSI_X3.4-1968 [RFC1345,KXS2] MIBenum: 3 Source: ECMA registry Alias: iso-ir-6 Alias: ANSI_X3.4-1986 Alias: ISO_646.irv:1991 Alias: ASCII Alias: ISO646-US Alias: US-ASCII (preferred MIME name) Alias: us Alias: IBM367 Alias: cp367 Alias: csASCII Name: ISO_8859-1:1987 [RFC1345,KXS2] MIBenum: 4 Source: ECMA registry Alias: iso-ir-100 Alias: ISO_8859-1 Alias: ISO-8859-1 (preferred MIME name) Alias: latin1 Alias: l1 Alias: IBM819 Alias: CP819 Alias: csISOLatin1 Name: ISO_8859-2:1987 [RFC1345,KXS2] MIBenum: 5 Source: ECMA registry Alias: iso-ir-101 Alias: ISO_8859-2 Alias: ISO-8859-2 (preferred MIME name) Alias: latin2 Alias: l2 Alias: csISOLatin2 Name: ISO_8859-3:1988 [RFC1345,KXS2] MIBenum: 6 Source: ECMA registry Alias: iso-ir-109 Alias: ISO_8859-3 Alias: ISO-8859-3 (preferred MIME name) Alias: latin3 Alias: l3 Alias: csISOLatin3 Name: ISO_8859-4:1988 [RFC1345,KXS2] MIBenum: 7 Source: ECMA registry Alias: iso-ir-110 Alias: ISO_8859-4 Alias: ISO-8859-4 (preferred MIME name) Alias: latin4 Alias: l4 Alias: csISOLatin4 Name: ISO_8859-5:1988 [RFC1345,KXS2] MIBenum: 8 Source: ECMA registry Alias: iso-ir-144 Alias: 
ISO_8859-5 Alias: ISO-8859-5 (preferred MIME name) Alias: cyrillic Alias: csISOLatinCyrillic Name: ISO_8859-6:1987 [RFC1345,KXS2] MIBenum: 9 Source: ECMA registry Alias: iso-ir-127 Alias: ISO_8859-6 Alias: ISO-8859-6 (preferred MIME name) Alias: ECMA-114 Alias: ASMO-708 Alias: arabic Alias: csISOLatinArabic Name: ISO_8859-7:1987 [RFC1947,RFC1345,KXS2] MIBenum: 10 Source: ECMA registry Alias: iso-ir-126 Alias: ISO_8859-7 Alias: ISO-8859-7 (preferred MIME name) Alias: ELOT_928 Alias: ECMA-118 Alias: greek Alias: greek8 Alias: csISOLatinGreek Name: ISO_8859-8:1988 [RFC1345,KXS2] MIBenum: 11 Source: ECMA registry Alias: iso-ir-138 Alias: ISO_8859-8 Alias: ISO-8859-8 (preferred MIME name) Alias: hebrew Alias: csISOLatinHebrew Name: ISO_8859-9:1989 [RFC1345,KXS2] MIBenum: 12 Source: ECMA registry Alias: iso-ir-148 Alias: ISO_8859-9 Alias: ISO-8859-9 (preferred MIME name) Alias: latin5 Alias: l5 Alias: csISOLatin5 Name: ISO-8859-10 (preferred MIME name) [RFC1345,KXS2] MIBenum: 13 Source: ECMA registry Alias: iso-ir-157 Alias: l6 Alias: ISO_8859-10:1992 Alias: csISOLatin6 Alias: latin6 Name: ISO_6937-2-add [RFC1345,KXS2] MIBenum: 14 Source: ECMA registry and ISO 6937-2:1983 Alias: iso-ir-142 Alias: csISOTextComm Name: JIS_X0201 [RFC1345,KXS2] MIBenum: 15 Source: JIS X 0201-1976. One byte only, this is equivalent to JIS/Roman (similar to ASCII) plus eight-bit half-width Katakana Alias: X0201 Alias: csHalfWidthKatakana Name: JIS_Encoding MIBenum: 16 Source: JIS X 0202-1991. Uses ISO 2022 escape sequences to shift code sets as documented in JIS X 0202-1991. Alias: csJISEncoding Name: Shift_JIS (preferred MIME name) MIBenum: 17 Source: This charset is an extension of csHalfWidthKatakana by adding graphic characters in JIS X 0208. The CCS's are JIS X0201:1997 and JIS X0208:1997. The complete definition is shown in Appendix 1 of JIS X0208:1997. This charset can be used for the top-level media type "text". 
Alias: MS_Kanji Alias: csShiftJIS Name: Extended_UNIX_Code_Packed_Format_for_Japanese MIBenum: 18 Source: Standardized by OSF, UNIX International, and UNIX Systems Laboratories Pacific. Uses ISO 2022 rules to select code set 0: US-ASCII (a single 7-bit byte set) code set 1: JIS X0208-1990 (a double 8-bit byte set) restricted to A0-FF in both bytes code set 2: Half Width Katakana (a single 7-bit byte set) requiring SS2 as the character prefix code set 3: JIS X0212-1990 (a double 7-bit byte set) restricted to A0-FF in both bytes requiring SS3 as the character prefix Alias: csEUCPkdFmtJapanese Alias: EUC-JP (preferred MIME name) Name: Extended_UNIX_Code_Fixed_Width_for_Japanese MIBenum: 19 Source: Used in Japan. Each character is 2 octets. code set 0: US-ASCII (a single 7-bit byte set) 1st byte = 00 2nd byte = 20-7E code set 1: JIS X0208-1990 (a double 7-bit byte set) restricted to A0-FF in both bytes code set 2: Half Width Katakana (a single 7-bit byte set) 1st byte = 00 2nd byte = A0-FF code set 3: JIS X0212-1990 (a double 7-bit byte set) restricted to A0-FF in the first byte and 21-7E in the second byte Alias: csEUCFixWidJapanese Name: BS_4730 [RFC1345,KXS2] MIBenum: 20 Source: ECMA registry Alias: iso-ir-4 Alias: ISO646-GB Alias: gb Alias: uk Alias: csISO4UnitedKingdom Name: SEN_850200_C [RFC1345,KXS2] MIBenum: 21 Source: ECMA registry Alias: iso-ir-11 Alias: ISO646-SE2 Alias: se2 Alias: csISO11SwedishForNames Name: IT [RFC1345,KXS2] MIBenum: 22 Source: ECMA registry Alias: iso-ir-15 Alias: ISO646-IT Alias: csISO15Italian Name: ES [RFC1345,KXS2] MIBenum: 23 Source: ECMA registry Alias: iso-ir-17 Alias: ISO646-ES Alias: csISO17Spanish Name: DIN_66003 [RFC1345,KXS2] MIBenum: 24 Source: ECMA registry Alias: iso-ir-21 Alias: de Alias: ISO646-DE Alias: csISO21German Name: NS_4551-1 [RFC1345,KXS2] MIBenum: 25 Source: ECMA registry Alias: iso-ir-60 Alias: ISO646-NO Alias: no Alias: csISO60DanishNorwegian Alias: csISO60Norwegian1 Name: NF_Z_62-010 [RFC1345,KXS2] MIBenum: 
26 Source: ECMA registry Alias: iso-ir-69 Alias: ISO646-FR Alias: fr Alias: csISO69French Name: ISO-10646-UTF-1 MIBenum: 27 Source: Universal Transfer Format (1), this is the multibyte encoding, that subsets ASCII-7. It does not have byte ordering issues. Alias: csISO10646UTF1 Name: ISO_646.basic:1983 [RFC1345,KXS2] MIBenum: 28 Source: ECMA registry Alias: ref Alias: csISO646basic1983 Name: INVARIANT [RFC1345,KXS2] MIBenum: 29 Alias: csINVARIANT Name: ISO_646.irv:1983 [RFC1345,KXS2] MIBenum: 30 Source: ECMA registry Alias: iso-ir-2 Alias: irv Alias: csISO2IntlRefVersion Name: NATS-SEFI [RFC1345,KXS2] MIBenum: 31 Source: ECMA registry Alias: iso-ir-8-1 Alias: csNATSSEFI Name: NATS-SEFI-ADD [RFC1345,KXS2] MIBenum: 32 Source: ECMA registry Alias: iso-ir-8-2 Alias: csNATSSEFIADD Name: NATS-DANO [RFC1345,KXS2] MIBenum: 33 Source: ECMA registry Alias: iso-ir-9-1 Alias: csNATSDANO Name: NATS-DANO-ADD [RFC1345,KXS2] MIBenum: 34 Source: ECMA registry Alias: iso-ir-9-2 Alias: csNATSDANOADD Name: SEN_850200_B [RFC1345,KXS2] MIBenum: 35 Source: ECMA registry Alias: iso-ir-10 Alias: FI Alias: ISO646-FI Alias: ISO646-SE Alias: se Alias: csISO10Swedish Name: KS_C_5601-1987 [RFC1345,KXS2] MIBenum: 36 Source: ECMA registry Alias: iso-ir-149 Alias: KS_C_5601-1989 Alias: KSC_5601 Alias: korean Alias: csKSC56011987 Name: ISO-2022-KR (preferred MIME name) [RFC1557,Choi] MIBenum: 37 Source: RFC-1557 (see also KS_C_5601-1987) Alias: csISO2022KR Name: EUC-KR (preferred MIME name) [RFC1557,Choi] MIBenum: 38 Source: RFC-1557 (see also KS_C_5861-1992) Alias: csEUCKR Name: ISO-2022-JP (preferred MIME name) [RFC1468,Murai] MIBenum: 39 Source: RFC-1468 (see also RFC-2237) Alias: csISO2022JP Name: ISO-2022-JP-2 (preferred MIME name) [RFC1554,Ohta] MIBenum: 40 Source: RFC-1554 Alias: csISO2022JP2 Name: JIS_C6220-1969-jp [RFC1345,KXS2] MIBenum: 41 Source: ECMA registry Alias: JIS_C6220-1969 Alias: iso-ir-13 Alias: katakana Alias: x0201-7 Alias: csISO13JISC6220jp Name: JIS_C6220-1969-ro 
[RFC1345,KXS2] MIBenum: 42 Source: ECMA registry Alias: iso-ir-14 Alias: jp Alias: ISO646-JP Alias: csISO14JISC6220ro Name: PT [RFC1345,KXS2] MIBenum: 43 Source: ECMA registry Alias: iso-ir-16 Alias: ISO646-PT Alias: csISO16Portuguese Name: greek7-old [RFC1345,KXS2] MIBenum: 44 Source: ECMA registry Alias: iso-ir-18 Alias: csISO18Greek7Old Name: latin-greek [RFC1345,KXS2] MIBenum: 45 Source: ECMA registry Alias: iso-ir-19 Alias: csISO19LatinGreek Name: NF_Z_62-010_(1973) [RFC1345,KXS2] MIBenum: 46 Source: ECMA registry Alias: iso-ir-25 Alias: ISO646-FR1 Alias: csISO25French Name: Latin-greek-1 [RFC1345,KXS2] MIBenum: 47 Source: ECMA registry Alias: iso-ir-27 Alias: csISO27LatinGreek1 Name: ISO_5427 [RFC1345,KXS2] MIBenum: 48 Source: ECMA registry Alias: iso-ir-37 Alias: csISO5427Cyrillic Name: JIS_C6226-1978 [RFC1345,KXS2] MIBenum: 49 Source: ECMA registry Alias: iso-ir-42 Alias: csISO42JISC62261978 Name: BS_viewdata [RFC1345,KXS2] MIBenum: 50 Source: ECMA registry Alias: iso-ir-47 Alias: csISO47BSViewdata Name: INIS [RFC1345,KXS2] MIBenum: 51 Source: ECMA registry Alias: iso-ir-49 Alias: csISO49INIS Name: INIS-8 [RFC1345,KXS2] MIBenum: 52 Source: ECMA registry Alias: iso-ir-50 Alias: csISO50INIS8 Name: INIS-cyrillic [RFC1345,KXS2] MIBenum: 53 Source: ECMA registry Alias: iso-ir-51 Alias: csISO51INISCyrillic Name: ISO_5427:1981 [RFC1345,KXS2] MIBenum: 54 Source: ECMA registry Alias: iso-ir-54 Alias: ISO5427Cyrillic1981 Name: ISO_5428:1980 [RFC1345,KXS2] MIBenum: 55 Source: ECMA registry Alias: iso-ir-55 Alias: csISO5428Greek Name: GB_1988-80 [RFC1345,KXS2] MIBenum: 56 Source: ECMA registry Alias: iso-ir-57 Alias: cn Alias: ISO646-CN Alias: csISO57GB1988 Name: GB_2312-80 [RFC1345,KXS2] MIBenum: 57 Source: ECMA registry Alias: iso-ir-58 Alias: chinese Alias: csISO58GB231280 Name: NS_4551-2 [RFC1345,KXS2] MIBenum: 58 Source: ECMA registry Alias: ISO646-NO2 Alias: iso-ir-61 Alias: no2 Alias: csISO61Norwegian2 Name: videotex-suppl [RFC1345,KXS2] MIBenum: 59 Source: ECMA 
registry Alias: iso-ir-70 Alias: csISO70VideotexSupp1 Name: PT2 [RFC1345,KXS2] MIBenum: 60 Source: ECMA registry Alias: iso-ir-84 Alias: ISO646-PT2 Alias: csISO84Portuguese2 Name: ES2 [RFC1345,KXS2] MIBenum: 61 Source: ECMA registry Alias: iso-ir-85 Alias: ISO646-ES2 Alias: csISO85Spanish2 Name: MSZ_7795.3 [RFC1345,KXS2] MIBenum: 62 Source: ECMA registry Alias: iso-ir-86 Alias: ISO646-HU Alias: hu Alias: csISO86Hungarian Name: JIS_C6226-1983 [RFC1345,KXS2] MIBenum: 63 Source: ECMA registry Alias: iso-ir-87 Alias: x0208 Alias: JIS_X0208-1983 Alias: csISO87JISX0208 Name: greek7 [RFC1345,KXS2] MIBenum: 64 Source: ECMA registry Alias: iso-ir-88 Alias: csISO88Greek7 Name: ASMO_449 [RFC1345,KXS2] MIBenum: 65 Source: ECMA registry Alias: ISO_9036 Alias: arabic7 Alias: iso-ir-89 Alias: csISO89ASMO449 Name: iso-ir-90 [RFC1345,KXS2] MIBenum: 66 Source: ECMA registry Alias: csISO90 Name: JIS_C6229-1984-a [RFC1345,KXS2] MIBenum: 67 Source: ECMA registry Alias: iso-ir-91 Alias: jp-ocr-a Alias: csISO91JISC62291984a Name: JIS_C6229-1984-b [RFC1345,KXS2] MIBenum: 68 Source: ECMA registry Alias: iso-ir-92 Alias: ISO646-JP-OCR-B Alias: jp-ocr-b Alias: csISO92JISC62991984b Name: JIS_C6229-1984-b-add [RFC1345,KXS2] MIBenum: 69 Source: ECMA registry Alias: iso-ir-93 Alias: jp-ocr-b-add Alias: csISO93JIS62291984badd Name: JIS_C6229-1984-hand [RFC1345,KXS2] MIBenum: 70 Source: ECMA registry Alias: iso-ir-94 Alias: jp-ocr-hand Alias: csISO94JIS62291984hand Name: JIS_C6229-1984-hand-add [RFC1345,KXS2] MIBenum: 71 Source: ECMA registry Alias: iso-ir-95 Alias: jp-ocr-hand-add Alias: csISO95JIS62291984handadd Name: JIS_C6229-1984-kana [RFC1345,KXS2] MIBenum: 72 Source: ECMA registry Alias: iso-ir-96 Alias: csISO96JISC62291984kana Name: ISO_2033-1983 [RFC1345,KXS2] MIBenum: 73 Source: ECMA registry Alias: iso-ir-98 Alias: e13b Alias: csISO2033 Name: ANSI_X3.110-1983 [RFC1345,KXS2] MIBenum: 74 Source: ECMA registry Alias: iso-ir-99 Alias: CSA_T500-1983 Alias: NAPLPS Alias: csISO99NAPLPS Name: 
T.61-7bit [RFC1345,KXS2] MIBenum: 75 Source: ECMA registry Alias: iso-ir-102 Alias: csISO102T617bit Name: T.61-8bit [RFC1345,KXS2] MIBenum: 76 Alias: T.61 Source: ECMA registry Alias: iso-ir-103 Alias: csISO103T618bit Name: ECMA-cyrillic MIBenum: 77 Source: ISO registry (formerly ECMA registry) http://www.itscj.ipsj.jp/ISO-IR/111.pdf Alias: iso-ir-111 Alias: KOI8-E Alias: csISO111ECMACyrillic Name: CSA_Z243.4-1985-1 [RFC1345,KXS2] MIBenum: 78 Source: ECMA registry Alias: iso-ir-121 Alias: ISO646-CA Alias: csa7-1 Alias: ca Alias: csISO121Canadian1 Name: CSA_Z243.4-1985-2 [RFC1345,KXS2] MIBenum: 79 Source: ECMA registry Alias: iso-ir-122 Alias: ISO646-CA2 Alias: csa7-2 Alias: csISO122Canadian2 Name: CSA_Z243.4-1985-gr [RFC1345,KXS2] MIBenum: 80 Source: ECMA registry Alias: iso-ir-123 Alias: csISO123CSAZ24341985gr Name: ISO_8859-6-E [RFC1556,IANA] MIBenum: 81 Source: RFC1556 Alias: csISO88596E Alias: ISO-8859-6-E (preferred MIME name) Name: ISO_8859-6-I [RFC1556,IANA] MIBenum: 82 Source: RFC1556 Alias: csISO88596I Alias: ISO-8859-6-I (preferred MIME name) Name: T.101-G2 [RFC1345,KXS2] MIBenum: 83 Source: ECMA registry Alias: iso-ir-128 Alias: csISO128T101G2 Name: ISO_8859-8-E [RFC1556,Nussbacher] MIBenum: 84 Source: RFC1556 Alias: csISO88598E Alias: ISO-8859-8-E (preferred MIME name) Name: ISO_8859-8-I [RFC1556,Nussbacher] MIBenum: 85 Source: RFC1556 Alias: csISO88598I Alias: ISO-8859-8-I (preferred MIME name) Name: CSN_369103 [RFC1345,KXS2] MIBenum: 86 Source: ECMA registry Alias: iso-ir-139 Alias: csISO139CSN369103 Name: JUS_I.B1.002 [RFC1345,KXS2] MIBenum: 87 Source: ECMA registry Alias: iso-ir-141 Alias: ISO646-YU Alias: js Alias: yu Alias: csISO141JUSIB1002 Name: IEC_P27-1 [RFC1345,KXS2] MIBenum: 88 Source: ECMA registry Alias: iso-ir-143 Alias: csISO143IECP271 Name: JUS_I.B1.003-serb [RFC1345,KXS2] MIBenum: 89 Source: ECMA registry Alias: iso-ir-146 Alias: serbian Alias: csISO146Serbian Name: JUS_I.B1.003-mac [RFC1345,KXS2] MIBenum: 90 Source: ECMA registry 
Alias: macedonian Alias: iso-ir-147 Alias: csISO147Macedonian Name: greek-ccitt [RFC1345,KXS2] MIBenum: 91 Source: ECMA registry Alias: iso-ir-150 Alias: csISO150 Alias: csISO150GreekCCITT Name: NC_NC00-10:81 [RFC1345,KXS2] MIBenum: 92 Source: ECMA registry Alias: cuba Alias: iso-ir-151 Alias: ISO646-CU Alias: csISO151Cuba Name: ISO_6937-2-25 [RFC1345,KXS2] MIBenum: 93 Source: ECMA registry Alias: iso-ir-152 Alias: csISO6937Add Name: GOST_19768-74 [RFC1345,KXS2] MIBenum: 94 Source: ECMA registry Alias: ST_SEV_358-88 Alias: iso-ir-153 Alias: csISO153GOST1976874 Name: ISO_8859-supp [RFC1345,KXS2] MIBenum: 95 Source: ECMA registry Alias: iso-ir-154 Alias: latin1-2-5 Alias: csISO8859Supp Name: ISO_10367-box [RFC1345,KXS2] MIBenum: 96 Source: ECMA registry Alias: iso-ir-155 Alias: csISO10367Box Name: latin-lap [RFC1345,KXS2] MIBenum: 97 Source: ECMA registry Alias: lap Alias: iso-ir-158 Alias: csISO158Lap Name: JIS_X0212-1990 [RFC1345,KXS2] MIBenum: 98 Source: ECMA registry Alias: x0212 Alias: iso-ir-159 Alias: csISO159JISX02121990 Name: DS_2089 [RFC1345,KXS2] MIBenum: 99 Source: Danish Standard, DS 2089, February 1974 Alias: DS2089 Alias: ISO646-DK Alias: dk Alias: csISO646Danish Name: us-dk [RFC1345,KXS2] MIBenum: 100 Alias: csUSDK Name: dk-us [RFC1345,KXS2] MIBenum: 101 Alias: csDKUS Name: KSC5636 [RFC1345,KXS2] MIBenum: 102 Alias: ISO646-KR Alias: csKSC5636 Name: UNICODE-1-1-UTF-7 [RFC1642] MIBenum: 103 Source: RFC 1642 Alias: csUnicode11UTF7 Name: ISO-2022-CN [RFC1922] MIBenum: 104 Source: RFC-1922 Name: ISO-2022-CN-EXT [RFC1922] MIBenum: 105 Source: RFC-1922 Name: UTF-8 [RFC3629] MIBenum: 106 Source: RFC 3629 Alias: None Name: ISO-8859-13 MIBenum: 109 Source: ISO See (http://www.iana.org/assignments/charset-reg/ISO-8859-13)[Tumasonis] Alias: None Name: ISO-8859-14 MIBenum: 110 Source: ISO See (http://www.iana.org/assignments/charset-reg/ISO-8859-14) [Simonsen] Alias: iso-ir-199 Alias: ISO_8859-14:1998 Alias: ISO_8859-14 Alias: latin8 Alias: iso-celtic Alias: l8 
Name: ISO-8859-15 MIBenum: 111 Source: ISO Please see: <http://www.iana.org/assignments/charset-reg/ISO-8859-15> Alias: ISO_8859-15 Alias: Latin-9 Name: ISO-8859-16 MIBenum: 112 Source: ISO Alias: iso-ir-226 Alias: ISO_8859-16:2001 Alias: ISO_8859-16 Alias: latin10 Alias: l10 Name: GBK MIBenum: 113 Source: Chinese IT Standardization Technical Committee Please see: <http://www.iana.org/assignments/charset-reg/GBK> Alias: CP936 Alias: MS936 Alias: windows-936 Name: GB18030 MIBenum: 114 Source: Chinese IT Standardization Technical Committee Please see: <http://www.iana.org/assignments/charset-reg/GB18030> Alias: None Name: OSD_EBCDIC_DF04_15 MIBenum: 115 Source: Fujitsu-Siemens standard mainframe EBCDIC encoding Please see: <http://www.iana.org/assignments/charset-reg/OSD-EBCDIC-DF04-15> Alias: None Name: OSD_EBCDIC_DF03_IRV MIBenum: 116 Source: Fujitsu-Siemens standard mainframe EBCDIC encoding Please see: <http://www.iana.org/assignments/charset-reg/OSD-EBCDIC-DF03-IRV> Alias: None Name: OSD_EBCDIC_DF04_1 MIBenum: 117 Source: Fujitsu-Siemens standard mainframe EBCDIC encoding Please see: <http://www.iana.org/assignments/charset-reg/OSD-EBCDIC-DF04-1> Alias: None Name: ISO-11548-1 MIBenum: 118 Source: See <http://www.iana.org/assignments/charset-reg/ISO-11548-1> [Thibault] Alias: ISO_11548-1 Alias: ISO_TR_11548-1 Alias: csISO115481 Name: KZ-1048 MIBenum: 119 Source: See <http://www.iana.org/assignments/charset-reg/KZ-1048> [Veremeev, Kikkarin] Alias: STRK1048-2002 Alias: RK1048 Alias: csKZ1048 Name: ISO-10646-UCS-2 MIBenum: 1000 Source: the 2-octet Basic Multilingual Plane, aka Unicode this needs to specify network byte order: the standard does not specify (it is a 16-bit integer space) Alias: csUnicode Name: ISO-10646-UCS-4 MIBenum: 1001 Source: the full code space. (same comment about byte order, these are 31-bit numbers. Alias: csUCS4 Name: ISO-10646-UCS-Basic MIBenum: 1002 Source: ASCII subset of Unicode. 
Basic Latin = collection 1 See ISO 10646, Appendix A Alias: csUnicodeASCII Name: ISO-10646-Unicode-Latin1 MIBenum: 1003 Source: ISO Latin-1 subset of Unicode. Basic Latin and Latin-1 Supplement = collections 1 and 2. See ISO 10646, Appendix A. See RFC 1815. Alias: csUnicodeLatin1 Alias: ISO-10646 Name: ISO-10646-J-1 Source: ISO 10646 Japanese, see RFC 1815. Name: ISO-Unicode-IBM-1261 MIBenum: 1005 Source: IBM Latin-2, -3, -5, Extended Presentation Set, GCSGID: 1261 Alias: csUnicodeIBM1261 Name: ISO-Unicode-IBM-1268 MIBenum: 1006 Source: IBM Latin-4 Extended Presentation Set, GCSGID: 1268 Alias: csUnicodeIBM1268 Name: ISO-Unicode-IBM-1276 MIBenum: 1007 Source: IBM Cyrillic Greek Extended Presentation Set, GCSGID: 1276 Alias: csUnicodeIBM1276 Name: ISO-Unicode-IBM-1264 MIBenum: 1008 Source: IBM Arabic Presentation Set, GCSGID: 1264 Alias: csUnicodeIBM1264 Name: ISO-Unicode-IBM-1265 MIBenum: 1009 Source: IBM Hebrew Presentation Set, GCSGID: 1265 Alias: csUnicodeIBM1265 Name: UNICODE-1-1 [RFC1641] MIBenum: 1010 Source: RFC 1641 Alias: csUnicode11 Name: SCSU MIBenum: 1011 Source: SCSU See (http://www.iana.org/assignments/charset-reg/SCSU) [Scherer] Alias: None Name: UTF-7 [RFC2152] MIBenum: 1012 Source: RFC 2152 Alias: None Name: UTF-16BE [RFC2781] MIBenum: 1013 Source: RFC 2781 Alias: None Name: UTF-16LE [RFC2781] MIBenum: 1014 Source: RFC 2781 Alias: None Name: UTF-16 [RFC2781] MIBenum: 1015 Source: RFC 2781 Alias: None Name: CESU-8 [Phipps] MIBenum: 1016 Source: <http://www.unicode.org/unicode/reports/tr26> Alias: csCESU-8 Name: UTF-32 [Davis] MIBenum: 1017 Source: <http://www.unicode.org/unicode/reports/tr19/> Alias: None Name: UTF-32BE [Davis] MIBenum: 1018 Source: <http://www.unicode.org/unicode/reports/tr19/> Alias: None Name: UTF-32LE [Davis] MIBenum: 1019 Source: <http://www.unicode.org/unicode/reports/tr19/> Alias: None Name: BOCU-1 [Scherer] MIBenum: 1020 Source: http://www.unicode.org/notes/tn6/ Alias: csBOCU-1 Name: ISO-8859-1-Windows-3.0-Latin-1 [HP-PCL5] 
MIBenum: 2000 Source: Extended ISO 8859-1 Latin-1 for Windows 3.0. PCL Symbol Set id: 9U Alias: csWindows30Latin1 Name: ISO-8859-1-Windows-3.1-Latin-1 [HP-PCL5] MIBenum: 2001 Source: Extended ISO 8859-1 Latin-1 for Windows 3.1. PCL Symbol Set id: 19U Alias: csWindows31Latin1 Name: ISO-8859-2-Windows-Latin-2 [HP-PCL5] MIBenum: 2002 Source: Extended ISO 8859-2. Latin-2 for Windows 3.1. PCL Symbol Set id: 9E Alias: csWindows31Latin2 Name: ISO-8859-9-Windows-Latin-5 [HP-PCL5] MIBenum: 2003 Source: Extended ISO 8859-9. Latin-5 for Windows 3.1 PCL Symbol Set id: 5T Alias: csWindows31Latin5 Name: hp-roman8 [HP-PCL5,RFC1345,KXS2] MIBenum: 2004 Source: LaserJet IIP Printer User's Manual, HP part no 33471-90901, Hewlet-Packard, June 1989. Alias: roman8 Alias: r8 Alias: csHPRoman8 Name: Adobe-Standard-Encoding [Adobe] MIBenum: 2005 Source: PostScript Language Reference Manual PCL Symbol Set id: 10J Alias: csAdobeStandardEncoding Name: Ventura-US [HP-PCL5] MIBenum: 2006 Source: Ventura US. ASCII plus characters typically used in publishing, like pilcrow, copyright, registered, trade mark, section, dagger, and double dagger in the range A0 (hex) to FF (hex). PCL Symbol Set id: 14J Alias: csVenturaUS Name: Ventura-International [HP-PCL5] MIBenum: 2007 Source: Ventura International. ASCII plus coded characters similar to Roman8. PCL Symbol Set id: 13J Alias: csVenturaInternational Name: DEC-MCS [RFC1345,KXS2] MIBenum: 2008 Source: VAX/VMS User's Manual, Order Number: AI-Y517A-TE, April 1986. 
Alias: dec Alias: csDECMCS Name: IBM850 [RFC1345,KXS2] MIBenum: 2009 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp850 Alias: 850 Alias: csPC850Multilingual Name: PC8-Danish-Norwegian [HP-PCL5] MIBenum: 2012 Source: PC Danish Norwegian 8-bit PC set for Danish Norwegian PCL Symbol Set id: 11U Alias: csPC8DanishNorwegian Name: IBM862 [RFC1345,KXS2] MIBenum: 2013 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp862 Alias: 862 Alias: csPC862LatinHebrew Name: PC8-Turkish [HP-PCL5] MIBenum: 2014 Source: PC Latin Turkish. PCL Symbol Set id: 9T Alias: csPC8Turkish Name: IBM-Symbols [IBM-CIDT] MIBenum: 2015 Source: Presentation Set, CPGID: 259 Alias: csIBMSymbols Name: IBM-Thai [IBM-CIDT] MIBenum: 2016 Source: Presentation Set, CPGID: 838 Alias: csIBMThai Name: HP-Legal [HP-PCL5] MIBenum: 2017 Source: PCL 5 Comparison Guide, Hewlett-Packard, HP part number 5961-0510, October 1992 PCL Symbol Set id: 1U Alias: csHPLegal Name: HP-Pi-font [HP-PCL5] MIBenum: 2018 Source: PCL 5 Comparison Guide, Hewlett-Packard, HP part number 5961-0510, October 1992 PCL Symbol Set id: 15U Alias: csHPPiFont Name: HP-Math8 [HP-PCL5] MIBenum: 2019 Source: PCL 5 Comparison Guide, Hewlett-Packard, HP part number 5961-0510, October 1992 PCL Symbol Set id: 8M Alias: csHPMath8 Name: Adobe-Symbol-Encoding [Adobe] MIBenum: 2020 Source: PostScript Language Reference Manual PCL Symbol Set id: 5M Alias: csHPPSMath Name: HP-DeskTop [HP-PCL5] MIBenum: 2021 Source: PCL 5 Comparison Guide, Hewlett-Packard, HP part number 5961-0510, October 1992 PCL Symbol Set id: 7J Alias: csHPDesktop Name: Ventura-Math [HP-PCL5] MIBenum: 2022 Source: PCL 5 Comparison Guide, Hewlett-Packard, HP part number 5961-0510, October 1992 PCL Symbol Set id: 6M Alias: csVenturaMath Name: Microsoft-Publishing [HP-PCL5] MIBenum: 2023 Source: PCL 5 Comparison Guide, Hewlett-Packard, HP part number 5961-0510, October 1992 PCL Symbol Set id: 6J Alias: csMicrosoftPublishing Name: Windows-31J MIBenum: 2024 Source: Windows 
Japanese. A further extension of Shift_JIS to include NEC special characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119). The CCS's are JIS X0201:1997, JIS X0208:1997, and these extensions. This charset can be used for the top-level media type "text", but it is of limited or specialized use (see RFC2278). PCL Symbol Set id: 19K Alias: csWindows31J Name: GB2312 (preferred MIME name) MIBenum: 2025 Source: Chinese for People's Republic of China (PRC) mixed one byte, two byte set: 20-7E = one byte ASCII A1-FE = two byte PRC Kanji See GB 2312-80 PCL Symbol Set Id: 18C Alias: csGB2312 Name: Big5 (preferred MIME name) MIBenum: 2026 Source: Chinese for Taiwan Multi-byte set. PCL Symbol Set Id: 18T Alias: csBig5 Name: macintosh [RFC1345,KXS2] MIBenum: 2027 Source: The Unicode Standard ver1.0, ISBN 0-201-56788-1, Oct 1991 Alias: mac Alias: csMacintosh Name: IBM037 [RFC1345,KXS2] MIBenum: 2028 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp037 Alias: ebcdic-cp-us Alias: ebcdic-cp-ca Alias: ebcdic-cp-wt Alias: ebcdic-cp-nl Alias: csIBM037 Name: IBM038 [RFC1345,KXS2] MIBenum: 2029 Source: IBM 3174 Character Set Ref, GA27-3831-02, March 1990 Alias: EBCDIC-INT Alias: cp038 Alias: csIBM038 Name: IBM273 [RFC1345,KXS2] MIBenum: 2030 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP273 Alias: csIBM273 Name: IBM274 [RFC1345,KXS2] MIBenum: 2031 Source: IBM 3174 Character Set Ref, GA27-3831-02, March 1990 Alias: EBCDIC-BE Alias: CP274 Alias: csIBM274 Name: IBM275 [RFC1345,KXS2] MIBenum: 2032 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: EBCDIC-BR Alias: cp275 Alias: csIBM275 Name: IBM277 [RFC1345,KXS2] MIBenum: 2033 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: EBCDIC-CP-DK Alias: EBCDIC-CP-NO Alias: csIBM277 Name: IBM278 [RFC1345,KXS2] MIBenum: 2034 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP278 Alias: ebcdic-cp-fi Alias: ebcdic-cp-se Alias: csIBM278 Name: IBM280 [RFC1345,KXS2] 
MIBenum: 2035 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP280 Alias: ebcdic-cp-it Alias: csIBM280 Name: IBM281 [RFC1345,KXS2] MIBenum: 2036 Source: IBM 3174 Character Set Ref, GA27-3831-02, March 1990 Alias: EBCDIC-JP-E Alias: cp281 Alias: csIBM281 Name: IBM284 [RFC1345,KXS2] MIBenum: 2037 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP284 Alias: ebcdic-cp-es Alias: csIBM284 Name: IBM285 [RFC1345,KXS2] MIBenum: 2038 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP285 Alias: ebcdic-cp-gb Alias: csIBM285 Name: IBM290 [RFC1345,KXS2] MIBenum: 2039 Source: IBM 3174 Character Set Ref, GA27-3831-02, March 1990 Alias: cp290 Alias: EBCDIC-JP-kana Alias: csIBM290 Name: IBM297 [RFC1345,KXS2] MIBenum: 2040 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp297 Alias: ebcdic-cp-fr Alias: csIBM297 Name: IBM420 [RFC1345,KXS2] MIBenum: 2041 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990, IBM NLS RM p 11-11 Alias: cp420 Alias: ebcdic-cp-ar1 Alias: csIBM420 Name: IBM423 [RFC1345,KXS2] MIBenum: 2042 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp423 Alias: ebcdic-cp-gr Alias: csIBM423 Name: IBM424 [RFC1345,KXS2] MIBenum: 2043 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp424 Alias: ebcdic-cp-he Alias: csIBM424 Name: IBM437 [RFC1345,KXS2] MIBenum: 2011 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp437 Alias: 437 Alias: csPC8CodePage437 Name: IBM500 [RFC1345,KXS2] MIBenum: 2044 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP500 Alias: ebcdic-cp-be Alias: ebcdic-cp-ch Alias: csIBM500 Name: IBM851 [RFC1345,KXS2] MIBenum: 2045 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp851 Alias: 851 Alias: csIBM851 Name: IBM852 [RFC1345,KXS2] MIBenum: 2010 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp852 Alias: 852 Alias: csPCp852 Name: IBM855 [RFC1345,KXS2] MIBenum: 2046 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp855 Alias: 855 Alias: csIBM855 Name: IBM857 [RFC1345,KXS2] 
MIBenum: 2047 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp857 Alias: 857 Alias: csIBM857 Name: IBM860 [RFC1345,KXS2] MIBenum: 2048 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp860 Alias: 860 Alias: csIBM860 Name: IBM861 [RFC1345,KXS2] MIBenum: 2049 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp861 Alias: 861 Alias: cp-is Alias: csIBM861 Name: IBM863 [RFC1345,KXS2] MIBenum: 2050 Source: IBM Keyboard layouts and code pages, PN 07G4586 June 1991 Alias: cp863 Alias: 863 Alias: csIBM863 Name: IBM864 [RFC1345,KXS2] MIBenum: 2051 Source: IBM Keyboard layouts and code pages, PN 07G4586 June 1991 Alias: cp864 Alias: csIBM864 Name: IBM865 [RFC1345,KXS2] MIBenum: 2052 Source: IBM DOS 3.3 Ref (Abridged), 94X9575 (Feb 1987) Alias: cp865 Alias: 865 Alias: csIBM865 Name: IBM868 [RFC1345,KXS2] MIBenum: 2053 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP868 Alias: cp-ar Alias: csIBM868 Name: IBM869 [RFC1345,KXS2] MIBenum: 2054 Source: IBM Keyboard layouts and code pages, PN 07G4586 June 1991 Alias: cp869 Alias: 869 Alias: cp-gr Alias: csIBM869 Name: IBM870 [RFC1345,KXS2] MIBenum: 2055 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP870 Alias: ebcdic-cp-roece Alias: ebcdic-cp-yu Alias: csIBM870 Name: IBM871 [RFC1345,KXS2] MIBenum: 2056 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP871 Alias: ebcdic-cp-is Alias: csIBM871 Name: IBM880 [RFC1345,KXS2] MIBenum: 2057 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp880 Alias: EBCDIC-Cyrillic Alias: csIBM880 Name: IBM891 [RFC1345,KXS2] MIBenum: 2058 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp891 Alias: csIBM891 Name: IBM903 [RFC1345,KXS2] MIBenum: 2059 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp903 Alias: csIBM903 Name: IBM904 [RFC1345,KXS2] MIBenum: 2060 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: cp904 Alias: 904 Alias: csIBBM904 Name: IBM905 [RFC1345,KXS2] MIBenum: 2061 Source: IBM 3174 Character Set Ref, 
GA27-3831-02, March 1990 Alias: CP905 Alias: ebcdic-cp-tr Alias: csIBM905 Name: IBM918 [RFC1345,KXS2] MIBenum: 2062 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP918 Alias: ebcdic-cp-ar2 Alias: csIBM918 Name: IBM1026 [RFC1345,KXS2] MIBenum: 2063 Source: IBM NLS RM Vol2 SE09-8002-01, March 1990 Alias: CP1026 Alias: csIBM1026 Name: EBCDIC-AT-DE [RFC1345,KXS2] MIBenum: 2064 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csIBMEBCDICATDE Name: EBCDIC-AT-DE-A [RFC1345,KXS2] MIBenum: 2065 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICATDEA Name: EBCDIC-CA-FR [RFC1345,KXS2] MIBenum: 2066 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICCAFR Name: EBCDIC-DK-NO [RFC1345,KXS2] MIBenum: 2067 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICDKNO Name: EBCDIC-DK-NO-A [RFC1345,KXS2] MIBenum: 2068 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICDKNOA Name: EBCDIC-FI-SE [RFC1345,KXS2] MIBenum: 2069 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICFISE Name: EBCDIC-FI-SE-A [RFC1345,KXS2] MIBenum: 2070 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICFISEA Name: EBCDIC-FR [RFC1345,KXS2] MIBenum: 2071 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICFR Name: EBCDIC-IT [RFC1345,KXS2] MIBenum: 2072 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICIT Name: EBCDIC-PT [RFC1345,KXS2] MIBenum: 2073 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICPT Name: EBCDIC-ES [RFC1345,KXS2] MIBenum: 2074 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICES Name: EBCDIC-ES-A [RFC1345,KXS2] MIBenum: 2075 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICESA Name: EBCDIC-ES-S [RFC1345,KXS2] MIBenum: 2076 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: 
csEBCDICESS Name: EBCDIC-UK [RFC1345,KXS2] MIBenum: 2077 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICUK Name: EBCDIC-US [RFC1345,KXS2] MIBenum: 2078 Source: IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987 Alias: csEBCDICUS Name: UNKNOWN-8BIT [RFC1428] MIBenum: 2079 Alias: csUnknown8BiT Name: MNEMONIC [RFC1345,KXS2] MIBenum: 2080 Source: RFC 1345, also known as "mnemonic+ascii+38" Alias: csMnemonic Name: MNEM [RFC1345,KXS2] MIBenum: 2081 Source: RFC 1345, also known as "mnemonic+ascii+8200" Alias: csMnem Name: VISCII [RFC1456] MIBenum: 2082 Source: RFC 1456 Alias: csVISCII Name: VIQR [RFC1456] MIBenum: 2083 Source: RFC 1456 Alias: csVIQR Name: KOI8-R (preferred MIME name) [RFC1489] MIBenum: 2084 Source: RFC 1489, based on GOST-19768-74, ISO-6937/8, INIS-Cyrillic, ISO-5427. Alias: csKOI8R Name: HZ-GB-2312 MIBenum: 2085 Source: RFC 1842, RFC 1843 [RFC1842, RFC1843] Name: IBM866 [Pond] MIBenum: 2086 Source: IBM NLDG Volume 2 (SE09-8002-03) August 1994 Alias: cp866 Alias: 866 Alias: csIBM866 Name: IBM775 [HP-PCL5] MIBenum: 2087 Source: HP PCL 5 Comparison Guide (P/N 5021-0329) pp B-13, 1996 Alias: cp775 Alias: csPC775Baltic Name: KOI8-U [RFC2319] MIBenum: 2088 Source: RFC 2319 Name: IBM00858 MIBenum: 2089 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM00858) [Mahdi] Alias: CCSID00858 Alias: CP00858 Alias: PC-Multilingual-850+euro Name: IBM00924 MIBenum: 2090 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM00924) [Mahdi] Alias: CCSID00924 Alias: CP00924 Alias: ebcdic-Latin9--euro Name: IBM01140 MIBenum: 2091 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01140) [Mahdi] Alias: CCSID01140 Alias: CP01140 Alias: ebcdic-us-37+euro Name: IBM01141 MIBenum: 2092 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01141) [Mahdi] Alias: CCSID01141 Alias: CP01141 Alias: ebcdic-de-273+euro Name: IBM01142 MIBenum: 2093 Source: IBM See 
(http://www.iana.org/assignments/charset-reg/IBM01142) [Mahdi] Alias: CCSID01142 Alias: CP01142 Alias: ebcdic-dk-277+euro Alias: ebcdic-no-277+euro Name: IBM01143 MIBenum: 2094 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01143) [Mahdi] Alias: CCSID01143 Alias: CP01143 Alias: ebcdic-fi-278+euro Alias: ebcdic-se-278+euro Name: IBM01144 MIBenum: 2095 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01144) [Mahdi] Alias: CCSID01144 Alias: CP01144 Alias: ebcdic-it-280+euro Name: IBM01145 MIBenum: 2096 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01145) [Mahdi] Alias: CCSID01145 Alias: CP01145 Alias: ebcdic-es-284+euro Name: IBM01146 MIBenum: 2097 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01146) [Mahdi] Alias: CCSID01146 Alias: CP01146 Alias: ebcdic-gb-285+euro Name: IBM01147 MIBenum: 2098 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01147) [Mahdi] Alias: CCSID01147 Alias: CP01147 Alias: ebcdic-fr-297+euro Name: IBM01148 MIBenum: 2099 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01148) [Mahdi] Alias: CCSID01148 Alias: CP01148 Alias: ebcdic-international-500+euro Name: IBM01149 MIBenum: 2100 Source: IBM See (http://www.iana.org/assignments/charset-reg/IBM01149) [Mahdi] Alias: CCSID01149 Alias: CP01149 Alias: ebcdic-is-871+euro Name: Big5-HKSCS [Yick] MIBenum: 2101 Source: See (http://www.iana.org/assignments/charset-reg/Big5-HKSCS) Alias: None Name: IBM1047 [Robrigado] MIBenum: 2102 Source: IBM1047 (EBCDIC Latin 1/Open Systems) http://www-1.ibm.com/servers/eserver/iseries/software/globalization/pdf/cp01047z.pdf Alias: IBM-1047 Name: PTCP154 [Uskov] MIBenum: 2103 Source: See (http://www.iana.org/assignments/charset-reg/PTCP154) Alias: csPTCP154 Alias: PT154 Alias: CP154 Alias: Cyrillic-Asian Name: Amiga-1251 MIBenum: 2104 Source: See (http://www.amiga.ultranet.ru/Amiga-1251.html) Alias: Ami1251 Alias: Amiga1251 Alias: Ami-1251 (Aliases are provided for 
historical reasons and should not be used) [Malyshev] Name: KOI7-switched MIBenum: 2105 Source: See <http://www.iana.org/assignments/charset-reg/KOI7-switched> Aliases: None Name: BRF MIBenum: 2106 Source: See <http://www.iana.org/assignments/charset-reg/BRF> [Thibault] Alias: csBRF Name: TSCII MIBenum: 2107 Source: See <http://www.iana.org/assignments/charset-reg/TSCII> [Kalyanasundaram] Alias: csTSCII Name: windows-1250 MIBenum: 2250 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1250) [Lazhintseva] Alias: None Name: windows-1251 MIBenum: 2251 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1251) [Lazhintseva] Alias: None Name: windows-1252 MIBenum: 2252 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1252) [Wendt] Alias: None Name: windows-1253 MIBenum: 2253 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1253) [Lazhintseva] Alias: None Name: windows-1254 MIBenum: 2254 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1254) [Lazhintseva] Alias: None Name: windows-1255 MIBenum: 2255 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1255) [Lazhintseva] Alias: None Name: windows-1256 MIBenum: 2256 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1256) [Lazhintseva] Alias: None Name: windows-1257 MIBenum: 2257 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1257) [Lazhintseva] Alias: None Name: windows-1258 MIBenum: 2258 Source: Microsoft (http://www.iana.org/assignments/charset-reg/windows-1258) [Lazhintseva] Alias: None Name: TIS-620 MIBenum: 2259 Source: Thai Industrial Standards Institute (TISI) [Tantsetthi] REFERENCES ---------- [RFC1345] Simonsen, K., "Character Mnemonics & Character Sets", RFC 1345, Rationel Almen Planlaegning, Rationel Almen Planlaegning, June 1992. 
[RFC1428] Vaudreuil, G., "Transition of Internet Mail from Just-Send-8 to 8bit-SMTP/MIME", RFC1428, CNRI, February 1993. [RFC1456] Vietnamese Standardization Working Group, "Conventions for Encoding the Vietnamese Language VISCII: VIetnamese Standard Code for Information Interchange VIQR: VIetnamese Quoted-Readable Specification Revision 1.1", RFC 1456, May 1993. [RFC1468] Murai, J., Crispin, M., and E. van der Poel, "Japanese Character Encoding for Internet Messages", RFC 1468, Keio University, Panda Programming, June 1993. [RFC1489] Chernov, A., "Registration of a Cyrillic Character Set", RFC1489, RELCOM Development Team, July 1993. [RFC1554] Ohta, M., and K. Handa, "ISO-2022-JP-2: Multilingual Extension of ISO-2022-JP", RFC1554, Tokyo Institute of Technology, ETL, December 1993. [RFC1556] Nussbacher, H., "Handling of Bi-directional Texts in MIME", RFC1556, Israeli Inter-University, December 1993. [RFC1557] Choi, U., Chon, K., and H. Park, "Korean Character Encoding for Internet Messages", KAIST, Solvit Chosun Media, December 1993. [RFC1641] Goldsmith, D., and M. Davis, "Using Unicode with MIME", RFC1641, Taligent, Inc., July 1994. [RFC1642] Goldsmith, D., and M. Davis, "UTF-7", RFC1642, Taligent, Inc., July 1994. [RFC1815] Ohta, M., "Character Sets ISO-10646 and ISO-10646-J-1", RFC 1815, Tokyo Institute of Technology, July 1995. [Adobe] Adobe Systems Incorporated, PostScript Language Reference Manual, second edition, Addison-Wesley Publishing Company, Inc., 1990. [ECMA Registry] ISO-IR: International Register of Escape Sequences http://www.itscj.ipsj.or.jp/ISO-IE/ Note: The current registration authority is IPSJ/ITSCJ, Japan. [HP-PCL5] Hewlett-Packard Company, "HP PCL 5 Comparison Guide", (P/N 5021-0329) pp B-13, 1996. [IBM-CIDT] IBM Corporation, "ABOUT TYPE: IBM's Technical Reference for Core Interchange Digitized Type", Publication number S544-3708-01 [RFC1842] Wei, Y., J. Li, and Y. 
Jiang, "ASCII Printable Characters-Based Chinese Character Encoding for Internet Messages", RFC 1842, Harvard University, Rice University, University of Maryland, August 1995. [RFC1843] Lee, F., "HZ - A Data Format for Exchanging Files of Arbitrarily Mixed Chinese and ASCII Characters", RFC 1843, Stanford University, August 1995. [RFC2152] Goldsmith, D., M. Davis, "UTF-7: A Mail-Safe Transformation Format of Unicode", RFC 2152, Apple Computer, Inc., Taligent Inc., May 1997. [RFC2279] Yergeau, F., "UTF-8, A Transformation Format of ISO 10646", RFC 2279, Alis Technologies, January, 1998. [RFC2781] Hoffman, P., Yergeau, F., "UTF-16, an encoding of ISO 10646", RFC 2781, February 2000. [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", RFC3629, November 2003. PEOPLE ------ [KXS2] Keld Simonsen <Keld.Simonsen&dkuug.dk> [Choi] Woohyong Choi <whchoi&cosmos.kaist.ac.kr> [Davis] Mark Davis, <mark&unicode.org>, April 2002. [Kalyanasundaram] Kuppuswamy Kalyanasundaram, <kalyan.geo@yahoo. com>, 14 May 2007. [Kikkarin] Sairan M. Kikkarin, <sairan&sci.kz>, 7 December 2006. [Lazhintseva] Katya Lazhintseva, <katyal&MICROSOFT.com>, May 1996. [Mahdi] Tamer Mahdi, <tamer&ca.ibm.com>, August 2000. [Malyshev] Michael Malyshev, <michael_malyshev&mail.ru>, January 2004 [Murai] Jun Murai <jun&wide.ad.jp> [Nussbacher] Hank Nussbacher, <hank&vm.tau.ac.il> [Ohta] Masataka Ohta, <mohta&cc.titech.ac.jp>, July 1995. [Phipps] Toby Phipps, <tphipps&peoplesoft.com>, March 2002. [Pond] Rick Pond, <rickpond&vnet.ibm.com>, March 1997. [Robrigado] Reuel Robrigado, <reuelr&ca.ibm.com>, September 2002. [Scherer] Markus Scherer, <markus.scherer&jtcsv.com>, August 2000, September 2002. [Simonsen] Keld Simonsen, <Keld.Simonsen&rap.dk>, August 2000. [Tantsetthi] Trin Tantsetthi, <trin&mozart.inet.co.th>, September 1998. [Thibault] Samuel Thibault, <samuel.thibault&ens-lyon.org>, 7 December 2006. [Tumasonis] Vladas Tumasonis, <vladas.tumasonis&maf.vu.lt>, August 2000. 
[Uskov] Alexander Uskov, <auskov&idc.kz>, September 2002. [Veremeev] Alexei Veremeev, <Alexey.Veremeev&oracle.com>, 7 December 2006. [Wendt] Chris Wendt, <christw&microsoft.com>, December 1999. [Yick] Nicky Yick, <cliac&itsd.gcn.gov.hk>, October 2000.
# Blocks-5.1.0.txt # Date: 2008-03-20, 17:41:00 PDT [KW] # # Unicode Character Database # Copyright (c) 1991-2008 Unicode, Inc. # For terms of use, see http://www.unicode.org/terms_of_use.html # For documentation, see UCD.html # # Note: The casing of block names is not normative. # For example, "Basic Latin" and "BASIC LATIN" are equivalent. # # Format: # Start Code..End Code; Block Name # ================================================ # Note: When comparing block names, casing, whitespace, hyphens, # and underbars are ignored. # For example, "Latin Extended-A" and "latin extended a" are equivalent. # For more information on the comparison of property values, # see UCD.html. # # All code points not explicitly listed for Block # have the value No_Block. # Property: Block # # @missing: 0000..10FFFF; No_Block 0000..007F; Basic Latin 0080..00FF; Latin-1 Supplement 0100..017F; Latin Extended-A 0180..024F; Latin Extended-B 0250..02AF; IPA Extensions 02B0..02FF; Spacing Modifier Letters 0300..036F; Combining Diacritical Marks 0370..03FF; Greek and Coptic 0400..04FF; Cyrillic 0500..052F; Cyrillic Supplement 0530..058F; Armenian 0590..05FF; Hebrew 0600..06FF; Arabic 0700..074F; Syriac 0750..077F; Arabic Supplement 0780..07BF; Thaana 07C0..07FF; NKo 0900..097F; Devanagari 0980..09FF; Bengali 0A00..0A7F; Gurmukhi 0A80..0AFF; Gujarati 0B00..0B7F; Oriya 0B80..0BFF; Tamil 0C00..0C7F; Telugu 0C80..0CFF; Kannada 0D00..0D7F; Malayalam 0D80..0DFF; Sinhala 0E00..0E7F; Thai 0E80..0EFF; Lao 0F00..0FFF; Tibetan 1000..109F; Myanmar 10A0..10FF; Georgian 1100..11FF; Hangul Jamo 1200..137F; Ethiopic 1380..139F; Ethiopic Supplement 13A0..13FF; Cherokee 1400..167F; Unified Canadian Aboriginal Syllabics 1680..169F; Ogham 16A0..16FF; Runic 1700..171F; Tagalog 1720..173F; Hanunoo 1740..175F; Buhid 1760..177F; Tagbanwa 1780..17FF; Khmer 1800..18AF; Mongolian 1900..194F; Limbu 1950..197F; Tai Le 1980..19DF; New Tai Lue 19E0..19FF; Khmer Symbols 1A00..1A1F; Buginese 1B00..1B7F; Balinese 
1B80..1BBF; Sundanese 1C00..1C4F; Lepcha 1C50..1C7F; Ol Chiki 1D00..1D7F; Phonetic Extensions 1D80..1DBF; Phonetic Extensions Supplement 1DC0..1DFF; Combining Diacritical Marks Supplement 1E00..1EFF; Latin Extended Additional 1F00..1FFF; Greek Extended 2000..206F; General Punctuation 2070..209F; Superscripts and Subscripts 20A0..20CF; Currency Symbols 20D0..20FF; Combining Diacritical Marks for Symbols 2100..214F; Letterlike Symbols 2150..218F; Number Forms 2190..21FF; Arrows 2200..22FF; Mathematical Operators 2300..23FF; Miscellaneous Technical 2400..243F; Control Pictures 2440..245F; Optical Character Recognition 2460..24FF; Enclosed Alphanumerics 2500..257F; Box Drawing 2580..259F; Block Elements 25A0..25FF; Geometric Shapes 2600..26FF; Miscellaneous Symbols 2700..27BF; Dingbats 27C0..27EF; Miscellaneous Mathematical Symbols-A 27F0..27FF; Supplemental Arrows-A 2800..28FF; Braille Patterns 2900..297F; Supplemental Arrows-B 2980..29FF; Miscellaneous Mathematical Symbols-B 2A00..2AFF; Supplemental Mathematical Operators 2B00..2BFF; Miscellaneous Symbols and Arrows 2C00..2C5F; Glagolitic 2C60..2C7F; Latin Extended-C 2C80..2CFF; Coptic 2D00..2D2F; Georgian Supplement 2D30..2D7F; Tifinagh 2D80..2DDF; Ethiopic Extended 2DE0..2DFF; Cyrillic Extended-A 2E00..2E7F; Supplemental Punctuation 2E80..2EFF; CJK Radicals Supplement 2F00..2FDF; Kangxi Radicals 2FF0..2FFF; Ideographic Description Characters 3000..303F; CJK Symbols and Punctuation 3040..309F; Hiragana 30A0..30FF; Katakana 3100..312F; Bopomofo 3130..318F; Hangul Compatibility Jamo 3190..319F; Kanbun 31A0..31BF; Bopomofo Extended 31C0..31EF; CJK Strokes 31F0..31FF; Katakana Phonetic Extensions 3200..32FF; Enclosed CJK Letters and Months 3300..33FF; CJK Compatibility 3400..4DBF; CJK Unified Ideographs Extension A 4DC0..4DFF; Yijing Hexagram Symbols 4E00..9FFF; CJK Unified Ideographs A000..A48F; Yi Syllables A490..A4CF; Yi Radicals A500..A63F; Vai A640..A69F; Cyrillic Extended-B A700..A71F; Modifier Tone Letters 
A720..A7FF; Latin Extended-D A800..A82F; Syloti Nagri A840..A87F; Phags-pa A880..A8DF; Saurashtra A900..A92F; Kayah Li A930..A95F; Rejang AA00..AA5F; Cham AC00..D7AF; Hangul Syllables D800..DB7F; High Surrogates DB80..DBFF; High Private Use Surrogates DC00..DFFF; Low Surrogates E000..F8FF; Private Use Area F900..FAFF; CJK Compatibility Ideographs FB00..FB4F; Alphabetic Presentation Forms FB50..FDFF; Arabic Presentation Forms-A FE00..FE0F; Variation Selectors FE10..FE1F; Vertical Forms FE20..FE2F; Combining Half Marks FE30..FE4F; CJK Compatibility Forms FE50..FE6F; Small Form Variants FE70..FEFF; Arabic Presentation Forms-B FF00..FFEF; Halfwidth and Fullwidth Forms FFF0..FFFF; Specials 10000..1007F; Linear B Syllabary 10080..100FF; Linear B Ideograms 10100..1013F; Aegean Numbers 10140..1018F; Ancient Greek Numbers 10190..101CF; Ancient Symbols 101D0..101FF; Phaistos Disc 10280..1029F; Lycian 102A0..102DF; Carian 10300..1032F; Old Italic 10330..1034F; Gothic 10380..1039F; Ugaritic 103A0..103DF; Old Persian 10400..1044F; Deseret 10450..1047F; Shavian 10480..104AF; Osmanya 10800..1083F; Cypriot Syllabary 10900..1091F; Phoenician 10920..1093F; Lydian 10A00..10A5F; Kharoshthi 12000..123FF; Cuneiform 12400..1247F; Cuneiform Numbers and Punctuation 1D000..1D0FF; Byzantine Musical Symbols 1D100..1D1FF; Musical Symbols 1D200..1D24F; Ancient Greek Musical Notation 1D300..1D35F; Tai Xuan Jing Symbols 1D360..1D37F; Counting Rod Numerals 1D400..1D7FF; Mathematical Alphanumeric Symbols 1F000..1F02F; Mahjong Tiles 1F030..1F09F; Domino Tiles 20000..2A6DF; CJK Unified Ideographs Extension B 2F800..2FA1F; CJK Compatibility Ideographs Supplement E0000..E007F; Tags E0100..E01EF; Variation Selectors Supplement F0000..FFFFF; Supplementary Private Use Area-A 100000..10FFFF; Supplementary Private Use Area-B # EOF
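The "Start Code..End Code; Block Name" format of the block list above is easy to read by program. The following is only a minimal sketch: the names `SAMPLE`, `parse_blocks` and `block_of` are made up for this illustration, and the sample text is a short excerpt rather than the full file.

```python
import re

# A few lines in the "Start Code..End Code; Block Name" format quoted above.
SAMPLE = """\
# Blocks-5.1.0.txt (excerpt)
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
20A0..20CF; Currency Symbols
2700..27BF; Dingbats
100000..10FFFF; Supplementary Private Use Area-B
"""

LINE = re.compile(r"^([0-9A-F]{4,6})\.\.([0-9A-F]{4,6});\s*(.+?)\s*$")

def parse_blocks(text):
    """Return a list of (start, end, name) tuples, skipping comments and blank lines."""
    blocks = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop everything after a '#'
        if not line:
            continue
        match = LINE.match(line)
        if match:
            start, end, name = match.groups()
            blocks.append((int(start, 16), int(end, 16), name))
    return blocks

def block_of(code_point, blocks):
    """Return the block name for a code point, or "No_Block" if none is listed."""
    for start, end, name in blocks:
        if start <= code_point <= end:
            return name
    return "No_Block"

blocks = parse_blocks(SAMPLE)
print(block_of(ord("€"), blocks))  # Currency Symbols
print(block_of(0x0900, blocks))    # No_Block (Devanagari is not in this excerpt)
```

The fallback value "No_Block" follows the `@missing` line of the file; with the complete file as input, U+0900 would of course resolve to Devanagari.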
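Examples of named and numbered XHTML character references:
&amp; = &#38; (display: &),
&lt; = &#60; (display: <),
&gt; = &#62; (display: >),
&rArr; = &#8658; = &#x21D2; (display: ⇒),
&rarr; = &#8594; = &#x2192; (display: →),
&auml; (display: ä),
&ouml; (display: ö),
&uuml; (display: ü),
&Auml; (display: Ä),
&Ouml; (display: Ö),
&Uuml; (display: Ü),
&szlig; (display: ß),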
⟨ = 〈 〈,
⟩ = 〉 〉,
– =   =   = –,
— =   =   =—,
...
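The equivalence of the named, decimal and hexadecimal notations can be checked with a few lines of Python. This is a sketch only, using the standard-library module `html`; note that `html.unescape` knows the HTML5 entity list, which is larger than the set of names defined for XHTML 1.0. The helper name `decode_all` is made up for this illustration.

```python
import html

def decode_all(references):
    """Decode a list of character references into the characters they stand for."""
    return [html.unescape(reference) for reference in references]

# Three spellings of the same character: named, decimal and hexadecimal.
print(decode_all(["&uuml;", "&#252;", "&#xFC;"]))      # ['ü', 'ü', 'ü']
print(decode_all(["&rArr;", "&#8658;", "&#x21D2;"]))   # ['⇒', '⇒', '⇒']

# The markup characters &, < and > have to be escaped in running text.
print(html.escape("a < b & c > d"))                    # a &lt; b &amp; c &gt; d
```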
Here are some (numbered) Unicode characters:
❐ = ❐, inverse = ❐ ,
➥ = ➥ = ➥ ,
☓ = ☓ = ☓ ,
⊗ = ⊗ ,
✺ = ✺ ,
❍ = ❍ ,
〇 = 〇 compare: 'O', '0',
⊗
〣 = 〣 ,
─ ─
│ │
┌ ┌
┐ ┐
└ └
┘ ┘
├ ├
┤ ┤
┬ ┬
┴ ┴
┼ ┼
═ ═
║ ║
╒ ╒
╓ ╓
╔ ╔
╕ ╕
╖ ╖
╗ ╗
╘ ╘
╙ ╙
╚ ╚
╛ ╛
╜ ╜
╝ ╝
╞ ╞
╟ ╟
╠ ╠
╡ ╡
╢ ╢
╣ ╣
╤ ╤
╥ ╥
╦ ╦
╧ ╧
╨ ╨
╩ ╩
╪ ╪
╫ ╫
╬ ╬
▀ ▀
▄ ▄
█ █
▌ ▌
▐ ▐
░ ░
▒ ▒
▓ ▓
■ ■
▪ ▪
▫ ▫
▬ ▬
▲ ▲
► ►
▼ ▼
◄ ◄
◊ ◊
○ ○
● ●
◘ ◘
◙ ◙
◦ ◦
☺ ☺
☻ ☻
☼ ☼
♀ ♀
♂ ♂
♠ ♠
♣ ♣
♥ ♥
♦ ♦
♪ ♪
♫ ♫
♰ ♰
♱ ♱
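The numbered notation for any of the characters listed above can be derived from its code point. A small sketch, assuming nothing beyond the Python standard library; the function name `numeric_references` is made up for this illustration.

```python
def numeric_references(char):
    """Return the decimal and the hexadecimal character reference for one character."""
    code_point = ord(char)
    return f"&#{code_point};", f"&#x{code_point:X};"

for char in "─│┌┐└┘█♠♥":
    decimal, hexadecimal = numeric_references(char)
    print(f"U+{ord(char):04X}  {decimal:<8} {hexadecimal:<9} {char}")

# For an ASCII-only file, Python can substitute decimal references on its own.
print("Grüße €".encode("ascii", "xmlcharrefreplace"))  # b'Gr&#252;&#223;e &#8364;'
```

The error handler `xmlcharrefreplace` matches the situation described earlier: an editor or file format that cannot store the character itself falls back to the numbered notation.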
1. Example of a Unicode block: wikipedia.org: U+0000–U+007F Unicode block Basic Latin, or decodeunicode.org: visual U+0000–U+007F (Basic Latin), unicode.coeurlumiere: U+0000–U+0FFF table (hex, dec)
2. Example of a Unicode block: wikipedia.org: U+2300–U+23FF Unicode block Miscellaneous Technical, or decodeunicode.org: visual U+2300–U+23FF Miscellaneous Technical (technical symbols)
3. Example of a Unicode block: wikipedia.org: U+25A0–U+25FF Unicode block Geometric Shapes, or decodeunicode.org: visual U+25A0–U+25FF Geometric Shapes
Name and link to the Unicode chart | Block |
---|---|
Basic Latin (ASCII code table) | U+0000 to U+007F |
Latin-1 Supplement (code table of ISO 8859-1) | U+0080 to U+00FF |
Latin Extended-A | U+0100 to U+017F |
Latin Extended-B | U+0180 to U+024F |
IPA Extensions | U+0250 to U+02AF |
Spacing Modifier Letters | U+02B0 to U+02FF |
Combining Diacritical Marks | U+0300 to U+036F |
Greek and Coptic | U+0370 to U+03FF |
Cyrillic | U+0400 to U+04FF |
Armenian | U+0530 to U+058F |
Hebrew | U+0590 to U+05FF |
Arabic | U+0600 to U+06FF |
Devanagari | U+0900 to U+097F |
Bengali | U+0980 to U+09FF |
Gurmukhi | U+0A00 to U+0A7F |
Gujarati | U+0A80 to U+0AFF |
Oriya | U+0B00 to U+0B7F |
Tamil | U+0B80 to U+0BFF |
Telugu | U+0C00 to U+0C7F |
Kannada | U+0C80 to U+0CFF |
Malayalam | U+0D00 to U+0D7F |
Thai | U+0E00 to U+0E7F |
Lao | U+0E80 to U+0EFF |
Tibetan | U+0F00 to U+0FFF |
Georgian | U+10A0 to U+10FF |
Hangul Jamo | U+1100 to U+11FF |
Latin Extended Additional | U+1E00 to U+1EFF |
Greek Extended | U+1F00 to U+1FFF |
General Punctuation | U+2000 to U+206F |
Superscripts and Subscripts | U+2070 to U+209F |
Currency Symbols | U+20A0 to U+20CF (euro sign: U+20AC; in HTML also &euro; or &#8364;) |
Combining Diacritical Marks for Symbols | U+20D0 to U+20FF |
Letterlike Symbols | U+2100 to U+214F |
Number Forms | U+2150 to U+218F |
Arrows | U+2190 to U+21FF |
Mathematical Operators | U+2200 to U+22FF |
Miscellaneous Technical | U+2300 to U+23FF |
Control Pictures | U+2400 to U+243F |
Optical Character Recognition | U+2440 to U+245F |
Enclosed Alphanumerics | U+2460 to U+24FF |
Box Drawing | U+2500 to U+257F |
Block Elements | U+2580 to U+259F |
Geometric Shapes | U+25A0 to U+25FF |
Miscellaneous Symbols | U+2600 to U+26FF |
Dingbats | U+2700 to U+27BF |
CJK Symbols and Punctuation | U+3000 to U+303F |
Hiragana | U+3040 to U+309F |
Katakana | U+30A0 to U+30FF |
Bopomofo | U+3100 to U+312F |
Hangul Compatibility Jamo | U+3130 to U+318F |
Kanbun | U+3190 to U+319F |
Enclosed CJK Letters and Months | U+3200 to U+32FF |
CJK Compatibility | U+3300 to U+33FF |
CJK Unified Ideographs | U+4E00 to U+9FFF |
Hangul Syllables | U+AC00 to U+D7AF |
High Surrogates | U+D800 to U+DB7F |
High Private Use Surrogates | U+DB80 to U+DBFF |
Low Surrogates | U+DC00 to U+DFFF |
Private Use Area | U+E000 to U+F8FF |
CJK Compatibility Ideographs | U+F900 to U+FAFF |
Alphabetic Presentation Forms | U+FB00 to U+FB4F |
Arabic Presentation Forms-A | U+FB50 to U+FDFF |
Combining Half Marks | U+FE20 to U+FE2F |
CJK Compatibility Forms | U+FE30 to U+FE4F |
Small Form Variants | U+FE50 to U+FE6F |
Arabic Presentation Forms-B | U+FE70 to U+FEFF |
Halfwidth and Fullwidth Forms | U+FF00 to U+FFEF |
Specials | U+FFF0 to U+FFFF |
Name and link to the Unicode chart | Block |
---|---|
Linear B Syllabary | U+10000 to U+1007F |
Linear B Ideograms | U+10080 to U+100FF |
Aegean Numbers | U+10100 to U+1013F |
Ancient Greek Numbers | U+10140 to U+1018F |
Old Italic | U+10300 to U+1032F |
Gothic | U+10330 to U+1034F |
Ugaritic | U+10380 to U+1039F |
Old Persian | U+103A0 to U+103DF |
Deseret | U+10400 to U+1044F |
Shavian | U+10450 to U+1047F |
Osmanya | U+10480 to U+104AF |
Cypriot Syllabary | U+10800 to U+1083F |
Kharoshthi | U+10A00 to U+10A5F |
Byzantine Musical Symbols | U+1D000 to U+1D0FF |
Musical Symbols | U+1D100 to U+1D1FF |
Ancient Greek Musical Notation | U+1D200 to U+1D24F |
Tai Xuan Jing Symbols | U+1D300 to U+1D35F |
Mathematical Alphanumeric Symbols | U+1D400 to U+1D7FF |
CJK Unified Ideographs Extension B | U+20000 to U+2A6DF |
CJK Compatibility Ideographs Supplement | U+2F800 to U+2FA1F |
Tags | U+E0000 to U+E007F |
Variation Selectors Supplement | U+E0100 to U+E01EF |
Supplementary Private Use Area-A | U+F0000 to U+FFFFF |
Supplementary Private Use Area-B | U+100000 to U+10FFFF |
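Code points from U+10000 upwards do not fit into a single 16-bit unit. In UTF-16 they are written as a pair taken from the High Surrogates and Low Surrogates ranges of the first table. A minimal sketch of that calculation; the function name `to_surrogate_pair` is made up for this illustration.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # upper 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # lower 10 bits select the low surrogate
    return high, low

# U+1D11E MUSICAL SYMBOL G CLEF lies in the block Musical Symbols.
high, low = to_surrogate_pair(0x1D11E)
print(f"U+{high:04X} U+{low:04X}")    # U+D834 U+DD1E

# Cross-check against Python's own UTF-16 encoder (big-endian, no byte order mark).
expected = bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])
print(chr(0x1D11E).encode("utf-16-be") == expected)  # True
```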
As of 2009, browsers do not yet support all Unicode blocks.
→ U+0000 - U+007F Basic Latin
→ U+0080 - U+00FF Latin-1 Supplement
→ U+0100 - U+017F Latin Extended-A
→ U+0180 - U+024F Latin Extended-B
→ U+02B0 - U+02FF Spacing Modifier Letters
→ U+0300 - U+036F Combining Diacritical Marks
→ U+0370 - U+03FF Greek and Coptic
→ U+1DC0 - U+1DFF Combining Diacritical Marks Supplement
→ U+1F00 - U+1FFF Greek Extended
→ U+2070 - U+209F Superscripts and Subscripts
→ U+20A0 - U+20CF Currency Symbols
→ U+2100 - U+214F Letterlike Symbols
→ U+2150 - U+218F Number Forms
→ U+2190 - U+21FF Arrows
→ U+2400 - U+243F Control Pictures
→ U+2440 - U+245F Optical Character Recognition
→ U+2460 - U+24FF Enclosed Alphanumerics
→ U+2580 - U+259F Block Elements
→ U+25A0 - U+25FF Geometric Shapes
→ U+2600 - U+26FF Miscellaneous Symbols
→ U+27F0 - U+27FF Supplemental Arrows-A
→ U+2800 - U+28FF Braille Patterns
→ U+FE70 - U+FEFF Arabic Presentation Forms-B
Unicode-Block: General-Punctuation U+2000(8192) – U+206F(8303) → Unicode.org chart U+2000(8192) – U+206F(8303) (PDF)
Unicode-Block: General-Punctuation U+2000(8192) – U+206F(8303) | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
U+200x | | | | | | | | | | | | | | | | |
U+201x | ‐ | ‑ | ‒ | – | — | ― | ‖ | ‗ | ‘ | ’ | ‚ | ‛ | “ | ” | „ | ‟ |
U+202x | † | ‡ | • | ‣ | ․ | ‥ | … | ‧ | | | | | | | | |
U+203x | ‰ | ‱ | ′ | ″ | ‴ | ‵ | ‶ | ‷ | ‸ | ‹ | › | ※ | ‼ | ‽ | ‾ | ‿ |
U+204x | ⁀ | ⁁ | ⁂ | ⁃ | ⁄ | ⁅ | ⁆ | ⁇ | ⁈ | ⁉ | ⁊ | ⁋ | ⁌ | ⁍ | ⁎ | ⁏ |
U+205x | ⁐ | ⁑ | ⁒ | ⁓ | ⁔ | ⁕ | ⁖ | ⁗ | ⁘ | ⁙ | ⁚ | ⁛ | ⁜ | ⁝ | ⁞ | |
U+206x | | | | | | | | | | | | | | | | |
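Tables of this kind, with 16 characters per row, can be generated instead of typed. The following is a sketch under assumptions: it uses the standard-library module `unicodedata`, it leaves cells empty for spaces, format characters and unassigned code points (general categories starting with Z or C), and the function name `chart_rows` is made up for this illustration.

```python
import unicodedata

def chart_rows(start, end):
    """Yield (row_label, cells) for each group of 16 code points in start..end."""
    for row_start in range(start & ~0xF, end + 1, 16):
        cells = []
        for code_point in range(row_start, row_start + 16):
            char = chr(code_point)
            # Invisible characters (separators, controls, format, unassigned) stay empty.
            invisible = unicodedata.category(char)[0] in "ZC"
            cells.append("" if invisible else char)
        yield f"U+{row_start >> 4:03X}x", cells

for label, cells in chart_rows(0x2010, 0x202F):
    print(label, "|", " | ".join(cells), "|")
```

Which cells come out empty depends on the Unicode version shipped with the Python interpreter, so the output for recently added characters may differ between installations.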
Unicode-Block: Letterlike-Symbols U+2100(8448) – U+214f(8527) → Unicode.org chart U+2100(8448) – U+214f(8527) (PDF)
Unicode-Letterlike-Symbols U+2100(8448) – U+214f(8527) | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
U+210x | ℀ | ℁ | ℂ | ℃ | ℄ | ℅ | ℆ | ℇ | ℈ | ℉ | ℊ | ℋ | ℌ | ℍ | ℎ | ℏ |
U+211x | ℐ | ℑ | ℒ | ℓ | ℔ | ℕ | № | ℗ | ℘ | ℙ | ℚ | ℛ | ℜ | ℝ | ℞ | ℟ |
U+212x | ℠ | ℡ | ™ | ℣ | ℤ | ℥ | Ω | ℧ | ℨ | ℩ | K | Å | ℬ | ℭ | ℮ | ℯ |
U+213x | ℰ | ℱ | Ⅎ | ℳ | ℴ | ℵ | ℶ | ℷ | ℸ | ℹ | ℺ | ℻ | ℼ | ℽ | ℾ | ℿ |
U+214x | ⅀ | ⅁ | ⅂ | ⅃ | ⅄ | ⅅ | ⅆ | ⅇ | ⅈ | ⅉ | ⅊ | ⅋ | ⅌ | ⅍ | ⅎ | ⅏ |
Unicode-Block: Block-Elements U+2580 (9600) - U+259f (9631) → Unicode.org chart U+2580 (9600) - U+259f (9631) (PDF)
Unicode-Block-Elements U+2580 (9600) - U+259f (9631) | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
U+258x | ▀ | ▁ | ▂ | ▃ | ▄ | ▅ | ▆ | ▇ | █ | ▉ | ▊ | ▋ | ▌ | ▍ | ▎ | ▏ |
U+259x | ▐ | ░ | ▒ | ▓ | ▔ | ▕ | ▖ | ▗ | ▘ | ▙ | ▚ | ▛ | ▜ | ▝ | ▞ | ▟ |
Unicode-Block: Geometric Shapes U+25a0 (9632) - U+25ff (9727) → Unicode.org chart U+25a0 (9632) - U+25ff (9727) (PDF)
Examples from U+25a0 (9632) - U+25ff (9727):
▲   ▲  
◄ ► ◄   ►
▼   ▼  
◤▴◥ ◤▴◥
◂◌▶ ◂◌▶
◣▾◢ ◣▾◢
Unicode-Geometric-Shapes U+25a0 (9632) - U+25ff (9727) | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
U+25ax | ■ | □ | ▢ | ▣ | ▤ | ▥ | ▦ | ▧ | ▨ | ▩ | ▪ | ▫ | ▬ | ▭ | ▮ | ▯ |
U+25bx | ▰ | ▱ | ▲ | △ | ▴ | ▵ | ▶ | ▷ | ▸ | ▹ | ► | ▻ | ▼ | ▽ | ▾ | ▿ |
U+25cx | ◀ | ◁ | ◂ | ◃ | ◄ | ◅ | ◆ | ◇ | ◈ | ◉ | ◊ | ○ | ◌ | ◍ | ◎ | ● |
U+25dx | ◐ | ◑ | ◒ | ◓ | ◔ | ◕ | ◖ | ◗ | ◘ | ◙ | ◚ | ◛ | ◜ | ◝ | ◞ | ◟ |
U+25ex | ◠ | ◡ | ◢ | ◣ | ◤ | ◥ | ◦ | ◧ | ◨ | ◩ | ◪ | ◫ | ◬ | ◭ | ◮ | ◯ |
U+25fx | ◰ | ◱ | ◲ | ◳ | ◴ | ◵ | ◶ | ◷ | ◸ | ◹ | ◺ | ◻ | ◼ | ◽ | ◾ | ◿ |
Unicode-Block: Miscellaneous-Symbols U+2600(9728) – U+26FF(9983) → Unicode.org chart U+2600(9728) – U+26FF(9983) (PDF)
Examples from U+2600(9728) – U+26FF(9983):
☀ = ☀,
☁ = ☁,
☂ = ☂,
☃ = ☃,
♩ = ♩,
♪ = ♪,
♫ = ♫,
♬ = ♬,
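The code point, the decimal number and the official character name of such symbols can be looked up with the standard-library module `unicodedata`. A short sketch; the function name `describe` is made up for this illustration, and the names returned depend on the Unicode version of the Python installation.

```python
import unicodedata

def describe(char):
    """Return code point (hex and decimal), the character and its official name."""
    code_point = ord(char)
    name = unicodedata.name(char, "<no name>")
    return f"U+{code_point:04X} ({code_point}) {char} {name}"

for char in "☀☁☂☃♩♪♫♬":
    print(describe(char))

# The reverse direction: from the official name to the character.
print(unicodedata.lookup("BLACK SCISSORS"))  # ✂
```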
Unicode-Miscellaneous-Symbols U+2600(9728) – U+26FF(9983) | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
U+260x | ☀ | ☁ | ☂ | ☃ | ☄ | ★ | ☆ | ☇ | ☈ | ☉ | ☊ | ☋ | ☌ | ☍ | ☎ | ☏ |
U+261x | ☐ | ☑ | ☒ | ☓ | ☔ | ☕ | ☖ | ☗ | ☘ | ☙ | ☚ | ☛ | ☜ | ☝ | ☞ | ☟ |
U+262x | ☠ | ☡ | ☢ | ☣ | ☤ | ☥ | ☦ | ☧ | ☨ | ☩ | ☪ | ☫ | ☬ | ☭ | ☮ | ☯ |
U+263x | ☰ | ☱ | ☲ | ☳ | ☴ | ☵ | ☶ | ☷ | ☸ | ☹ | ☺ | ☻ | ☼ | ☽ | ☾ | ☿ |
U+264x | ♀ | ♁ | ♂ | ♃ | ♄ | ♅ | ♆ | ♇ | ♈ | ♉ | ♊ | ♋ | ♌ | ♍ | ♎ | ♏ |
U+265x | ♐ | ♑ | ♒ | ♓ | ♔ | ♕ | ♖ | ♗ | ♘ | ♙ | ♚ | ♛ | ♜ | ♝ | ♞ | ♟ |
U+266x | ♠ | ♡ | ♢ | ♣ | ♤ | ♥ | ♦ | ♧ | ♨ | ♩ | ♪ | ♫ | ♬ | ♭ | ♮ | ♯ |
U+267x | ♰ | ♱ | ♲ | ♳ | ♴ | ♵ | ♶ | ♷ | ♸ | ♹ | ♺ | ♻ | ♼ | ♽ | ♾ | ♿ |
Unicode-Block: Dingbats-Block U+2701 (9985) - U+27BE (10174) → Unicode.org chart Dingbats-Block U+2701 (9985) - U+27BE (10174) (PDF)
Unicode number | Character | XHTML code | Description | Official name |
---|---|---|---|---|
U+2701 (9985) | ✁ | &#9985; | Scissors with upper blade | UPPER BLADE SCISSORS |
U+2702 (9986) | ✂ | &#9986; | Black scissors | BLACK SCISSORS |
U+2703 (9987) | ✃ | &#9987; | Scissors with lower blade | LOWER BLADE SCISSORS |
U+2704 (9988) | ✄ | &#9988; | White scissors | WHITE SCISSORS |
U+2706 (9990) | ✆ | &#9990; | Telephone location sign (cf. U+2121 TELEPHONE SIGN ℡) | TELEPHONE LOCATION SIGN |
U+2707 (9991) | ✇ | &#9991; | Tape drive | TAPE DRIVE |
U+2708 (9992) | ✈ | &#9992; | Airplane, symbol for airport | AIRPLANE |
U+2709 (9993) | ✉ | &#9993; | Envelope, symbol for mail | ENVELOPE |
U+270C (9996) | ✌ | &#9996; | Victory sign | VICTORY HAND |
U+270D (9997) | ✍ | &#9997; | Writing hand; symbol for handwritten, authorship | WRITING HAND |
U+270E (9998) | ✎ | &#9998; | Pencil pointing to the lower right | LOWER RIGHT PENCIL |
U+270F (9999) | ✏ | &#9999; | Pencil | PENCIL |
U+2710 (10000) | ✐ | &#10000; | Pencil pointing to the upper right | UPPER RIGHT PENCIL |
U+2711 (10001) | ✑ | &#10001; | White nib | WHITE NIB |
U+2712 (10002) | ✒ | &#10002; | Black nib | BLACK NIB |
U+2713 (10003) | ✓ | &#10003; | Check mark (cf. U+2611 BALLOT BOX WITH CHECK ☑) | CHECK MARK |
U+2714 (10004) | ✔ | &#10004; | Heavy check mark | HEAVY CHECK MARK |
U+2715 (10005) | ✕ | &#10005; | Small cross as multiplication sign (mathematics); cf. U+00D7 MULTIPLICATION SIGN ×, U+2573 BOX DRAWINGS LIGHT DIAGONAL CROSS ╳ | MULTIPLICATION X |
U+2716 (10006) | ✖ | &#10006; | Heavy small cross as multiplication sign | HEAVY MULTIPLICATION X |
U+2717 (10007) | ✗ | &#10007; | Cross mark (literally "ballot X"; cf. U+2612 BALLOT BOX WITH X, U+2613 SALTIRE ☓) | BALLOT X |
U+2718 (10008) | ✘ | &#10008; | Heavy cross mark | HEAVY BALLOT X |
U+2719 (10009) | ✙ | &#10009; | Outlined Greek cross | OUTLINED GREEK CROSS |
U+271A (10010) | ✚ | &#10010; | Heavy Greek cross | HEAVY GREEK CROSS |
U+271B (10011) | ✛ | &#10011; | Cross with open centre | OPEN CENTRE CROSS |
U+271C (10012) | ✜ | &#10012; | Heavy cross with open centre | HEAVY OPEN CENTRE CROSS |
U+271D (10013) | ✝ | &#10013; | Latin cross | LATIN CROSS |
U+271E (10014) | ✞ | &#10014; | Shadowed white Latin cross | SHADOWED WHITE LATIN CROSS |
U+271F (10015) | ✟ | &#10015; | Outlined Latin cross | OUTLINED LATIN CROSS |
U+2720 (10016) | ✠ | &#10016; | Maltese cross | MALTESE CROSS |
U+2721 (10017) | ✡ | &#10017; | Star of David | STAR OF DAVID |
U+2722 (10018) | ✢ | &#10018; | Four-armed teardrop asterisk (teardrop cross) | FOUR TEARDROP-SPOKED ASTERISK |
U+2723 (10019) | ✣ | &#10019; | Four-armed balloon asterisk (balloon cross) | FOUR BALLOON-SPOKED ASTERISK |
U+2724 (10020) | ✤ | &#10020; | Heavy four-armed balloon asterisk | HEAVY FOUR BALLOON-SPOKED ASTERISK |
U+2725 (10021) | ✥ | &#10021; | Cloverleaf asterisk (cloverleaf cross) | FOUR CLUB-SPOKED ASTERISK |
U+2726 (10022) | ✦ | &#10022; | Filled four-pointed star | BLACK FOUR POINTED STAR |
U+2727 (10023) | ✧ | &#10023; | White four-pointed star | WHITE FOUR POINTED STAR |
U+2729 (10025) | ✩ | &#10025; | White five-pointed star | STRESS OUTLINED WHITE STAR |
U+272A (10026) | ✪ | &#10026; | White five-pointed star in a filled circle | CIRCLED WHITE STAR |
U+272B (10027) | ✫ | &#10027; | Black five-pointed star with open centre | OPEN CENTRE BLACK STAR |
U+272C (10028) | ✬ | &#10028; | White five-pointed star with black centre | BLACK CENTRE WHITE STAR |
U+272D (10029) | ✭ | &#10029; | Black five-pointed star, filled, with outline | OUTLINED BLACK STAR |
U+272E (10030) | ✮ | &#10030; | Heavy black five-pointed star, filled, with outline | HEAVY OUTLINED BLACK STAR |
U+272F (10031) | ✯ | &#10031; | Five-pointed compass rose | PINWHEEL STAR |
U+2730 (10032) | ✰ | &#10032; | White five-pointed star, empty, with shadow | SHADOWED WHITE STAR |
U+2731 (10033) | ✱ | &#10033; | Large asterisk | HEAVY ASTERISK |
U+2732 (10034) | ✲ | &#10034; | Asterisk with open centre | OPEN CENTRE ASTERISK |
U+2733 (10035) | ✳ | &#10035; | Eight-armed asterisk | EIGHT SPOKED ASTERISK |
U+2734 (10036) | ✴ | &#10036; | Black eight-pointed star | EIGHT POINTED BLACK STAR |
U+2735 (10037) | ✵ | &#10037; | Eight-armed pinwheel | EIGHT POINTED PINWHEEL STAR |
U+2736 (10038) | ✶ | &#10038; | Six-pointed black star | SIX POINTED BLACK STAR |
U+2737 (10039) | ✷ | ✷ |
Geradlinig achtzackiger schwarzer Stern | EIGHT POINTED RECTILINEAR BLACK STAR |
U+2738 (10040) | ✸ | ✸ |
Dicker geradlinig achtzackiger schwarzer Stern | HEAVY EIGHT POINTED RECTILINEAR BLACK STAR |
U+2739 (10041) | ✹ | ✹ |
Schwarzer zwölfzackiger Stern | TWELVE POINTED BLACK STAR |
U+273A (10042) | ✺ | ✺ |
Sechszehnarmiges Sternchen | SIXTEEN POINTED ASTERISK |
U+273B (10043) | ✻ | ✻ |
Tropfensternchen | TEARDROP-SPOKED ASTERISK |
U+273C (10044) | ✼ | ✼ |
Tropfensternchen mit offener Mitte | OPEN CENTRE TEARDROP-SPOKED ASTERISK |
U+273D (10045) | ✽ | ✽ |
Dickes Tropfensternchen | HEAVY TEARDROP-SPOKED ASTERISK |
U+273E (10046) | ✾ | ✾ |
Sechsblättrige Blüte, je schwarz und weiß | SIX PETALLED BLACK AND WHITE FLORETTE |
U+273F (10047) | ✿ | ✿ |
Schwarze fünfblättrige Blüte | BLACK FLORETTE |
U+2740 (10048) | ❀ | ❀ |
Weiße fünfblättrige Blüte | WHITE FLORETTE |
U+2741 (10049) | ❁ | ❁ |
Achtblättrige umrandete schwarze Blüte | EIGHT PETALLED OUTLINED BLACK FLORETTE |
U+2742 (10050) | ❂ | ❂ |
Achtzackiger Stern mit offener Mitte im Kreis | CIRCLED OPEN CENTRE EIGHT POINTED STAR |
U+2743 (10051) | ❃ | ❃ |
Dickes tropfenförmiges Windrädchen | HEAVY TEARDROP-SPOKED PINWHEEL ASTERISK |
U+2744 (10052) | ❄ | ❄ |
Schneeflocke | SNOWFLAKE |
U+2745 (10053) | ❅ | ❅ |
Knapp gegabelte Schneeflocke | TIGHT TRIFOLIATE SNOWFLAKE |
U+2746 (10054) | ❆ | ❆ |
Astige Schneeflocke mit dicken Winkeln | HEAVY CHEVRON SNOWFLAKE |
U+2747 (10055) | ❇ | ❇ |
Funken | SPARKLE |
U+2748 (10056) | ❈ | ❈ |
Dicker Funken | HEAVY SPARKLE |
U+2749 (10057) | ❉ | ❉ |
Kugelsternchen | BALLOON-SPOKED ASTERISK |
U+274A (10058) | ❊ | ❊ |
Propellersternchen aus acht Tropfen | EIGHT TEARDROP-SPOKED PROPELLER ASTERISK |
U+274B (10059) | ❋ | ❋ |
Dickes Propellersternchen aus acht Tropfen | HEAVY EIGHT TEARDROP-SPOKED PROPELLER ASTERISK |
U+274D (10061) | ❍ | ❍ |
Weißer Kreis nach rechts schattiert | SHADOWED WHITE CIRCLE |
U+274F (10063) | ❏ | ❏ |
Weißes Quadrat unten rechts abgetrennt schattiert | LOWER RIGHT DROP-SHADOWED WHITE SQUARE |
U+2750 (10064) | ❐ | ❐ |
Weißes Quadrat oben rechts abgetrennt schattiert | UPPER RIGHT DROP-SHADOWED WHITE SQUARE |
U+2751 (10065) | ❑ | ❑ |
Weißes Quadrat nach unten rechts schattiert | LOWER RIGHT SHADOWED WHITE SQUARE |
U+2752 (10066) | ❒ | ❒ |
Weißes Quadrat nach oben rechts schattiert | UPPER RIGHT SHADOWED WHITE SQUARE |
U+2756 (10070) | ❖ | ❖ |
Schwarzes Karo ohne weißem X | BLACK DIAMOND MINUS WHITE X |
U+2758 (10072) | ❘ | ❘ |
Dünner senkrechter Strich | LIGHT VERTICAL BAR |
U+2759 (10073) | ❙ | ❙ |
Mittlererstarker senkrechter Strich | MEDIUM VERTICAL BAR |
U+275A (10074) | ❚ | ❚ |
Dicker senkrechter Strich | HEAVY VERTICAL BAR |
U+275B (10075) | ❛ | ❛ |
Dickes öffnendes halbes Anführungszeichen (englisch) | HEAVY SINGLE TURNED COMMA QUOTATION MARK ORNAMENT |
U+275C (10076) | ❜ | ❜ |
Dickes schliessendes halbes Anführungszeichen (englisch) | HEAVY SINGLE COMMA QUOTATION MARK ORNAMENT |
U+275D (10077) | ❝ | ❝ |
Dickes öffnendes Anführungszeichen (englisch) | HEAVY DOUBLE TURNED COMMA QUOTATION MARK ORNAMENT |
U+275E (10078) | ❞ | ❞ |
Dickes schliessendes Anführungszeichen (englisch) | HEAVY DOUBLE COMMA QUOTATION MARK ORNAMENT |
U+2761 (10081) | ❡ | ❡ |
Geschwungenes Absatzzeichen | CURVED STEM PARAGRAPH SIGN ORNAMENT |
U+2762 (10082) | ❢ | ❢ |
Dickes geschwungenes Ausrufezeichen | HEAVY EXCLAMATION MARK ORNAMENT |
U+2763 (10083) | ❣ | ❣ |
Dickes herzförmiges geschwungenes Ausrufezeichen | HEAVY HEART EXCLAMATION MARK ORNAMENT |
U+2764 (10084) | ❤ | ❤ |
Dickes schwarzes Herz | HEAVY BLACK HEART |
U+2765 (10085) | ❥ | ❥ |
Dickes schwarzes Herz, gegen den Uhrzeigersinn gedreht (Aufzählungszeichen) | ROTATED HEAVY BLACK HEART BULLET |
U+2766 (10086) | ❦ | ❦ |
Aldusblatt (wörtl. „florales Herz“) | FLORAL HEART |
U+2767 (10087) | ❧ | ❧ |
Aldusblatt, gegen den Uhrzeigersinn gedreht (Aufzählungszeichen, wörtl. „gedrehtes florales Herz“, vgl. U+2619 REVERSED ROTATED FLORAL HEART BULLET ☙) | ROTATED FLORAL HEART BULLET |
U+2768 (10088) | ❨ | ❨ |
Öffnende runde Klammer | MEDIUM LEFT PARENTHESIS ORNAMENT |
U+2769 (10089) | ❩ | ❩ |
Schließende runde Klammer | MEDIUM RIGHT PARENTHESIS ORNAMENT |
U+276A (10090) | ❪ | ❪ |
Abgeflachte öffnende runde Klammer | MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT |
U+276B (10091) | ❫ | ❫ |
Abgeflachte schließende runde Klammer | MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT |
U+276C (10092) | ❬ | ❬ |
Öffnende Winkelklammer | MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT |
U+276D (10093) | ❭ | ❭ |
Schließende Winkelklammern | MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT |
U+276E (10094) | ❮ | ❮ |
Fettes linksweisendes einfaches Guillemet/Chevron (Anführungszeichen) | HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT |
U+276F (10095) | ❯ | ❯ |
Fettes rechtsweisendes einfaches Guillemet/Chevron (Anführungszeichen) | HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT |
U+2770 (10096) | ❰ | ❰ |
Fette öffnende Winkelklammer | HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT |
U+2771 (10097) | ❱ | ❱ |
Fette schließende Winkelklammern | HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT |
U+2772 (10098) | ❲ | ❲ |
Feine linke schildpattförmige (?) Zierklammer | LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT |
U+2773 (10099) | ❳ | ❳ |
Feine rechte schildpattförmige (?) Zierklammer | LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT |
U+2774 (10100) | ❴ | ❴ |
Öffnende geschweifte Klammer | MEDIUM LEFT CURLY BRACKET ORNAMENT |
U+2775 (10101) | ❵ | ❵ |
Schließende geschweifte Klammer | MEDIUM RIGHT CURLY BRACKET ORNAMENT |
U+2776 (10102) | ❶ | ❶ |
Eins im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT ONE |
U+2777 (10103) | ❷ | ❷ |
Zwei im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT TWO |
U+2778 (10104) | ❸ | ❸ |
Drei im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT THREE |
U+2779 (10105) | ❹ | ❹ |
Vier im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT FOUR |
U+277A (10106) | ❺ | ❺ |
Fünf im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT FIVE |
U+277B (10107) | ❻ | ❻ |
Sechs im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT SIX |
U+277C (10108) | ❼ | ❼ |
Sieben im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT SEVEN |
U+277D (10109) | ❽ | ❽ |
Acht im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT EIGHT |
U+277E (10110) | ❾ | ❾ |
Neun im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED DIGIT NINE |
U+277F (10111) | ❿ | ❿ |
Zehn im gefüllten Kreis | DINGBAT NEGATIVE CIRCLED NUMBER TEN |
U+2780 (10112) | ➀ | ➀ |
Eins im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT ONE |
U+2781 (10113) | ➁ | ➁ |
Zwei im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT TWO |
U+2782 (10114) | ➂ | ➂ |
Drei im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT THREE |
U+2783 (10115) | ➃ | ➃ |
Vier im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT FOUR |
U+2784 (10116) | ➄ | ➄ |
Fünf im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT FIVE |
U+2785 (10117) | ➅ | ➅ |
Sechs im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT SIX |
U+2786 (10118) | ➆ | ➆ |
Sieben im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT SEVEN |
U+2787 (10119) | ➇ | ➇ |
Acht im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT EIGHT |
U+2788 (10120) | ➈ | ➈ |
Neun im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF DIGIT NINE |
U+2789 (10121) | ➉ | ➉ |
Zehn im Kreis, serifenlos | DINGBAT CIRCLED SANS-SERIF NUMBER TEN |
U+278A (10122) | ➊ | ➊ |
Eins im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ONE |
U+278B (10123) | ➋ | ➋ |
Zwei im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT TWO |
U+278C (10124) | ➌ | ➌ |
Drei im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT THREE |
U+278D (10125) | ➍ | ➍ |
Vier im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT FOUR |
U+278E (10126) | ➎ | ➎ |
Fünf im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT FIVE |
U+278F (10127) | ➏ | ➏ |
Sechs im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT SIX |
U+2790 (10128) | ➐ | ➐ |
Sieben im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT SEVEN |
U+2791 (10129) | ➑ | ➑ |
Acht im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT EIGHT |
U+2792 (10130) | ➒ | ➒ |
Neun im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT NINE |
U+2793 (10131) | ➓ | ➓ |
Zehn im gefüllten Kreis, serifenlos | DINGBAT NEGATIVE CIRCLED SANS-SERIF NUMBER TEN |
U+2794 (10132) | ➔ | ➔ |
Dicker Pfeil nach rechts mit breiter Spitze | HEAVY WIDE-HEADED RIGHTWARDS ARROW |
U+2798 (10136) | ➘ | ➘ |
Dicker Pfeil nach Südost | HEAVY SOUTH EAST ARROW |
U+2799 (10137) | ➙ | ➙ |
Dicker Pfeil nach rechts | HEAVY RIGHTWARDS ARROW |
U+279A (10138) | ➚ | ➚ |
Dicker Pfeil nach Nordost | HEAVY NORTH EAST ARROW |
U+279B (10139) | ➛ | ➛ |
Bemaßungspfeil nach rechts | DRAFTING POINT RIGHTWARDS ARROW |
U+279C (10140) | ➜ | ➜ |
Dicker Pfeil nach rechts, mit abgerundeten Balken | HEAVY ROUND-TIPPED RIGHTWARDS ARROW |
U+279D (10141) | ➝ | ➝ |
Pfeil nach rechts mit Dreiecksspitze | TRIANGLE-HEADED RIGHTWARDS ARROW |
U+279E (10142) | ➞ | ➞ |
Dicker Pfeil nach rechts mit Dreiecksspitze | HEAVY TRIANGLE-HEADED RIGHTWARDS ARROW |
U+279F (10143) | ➟ | ➟ |
Strichlierter Pfeil nach rechts mit Dreiecksspitze | DASHED TRIANGLE-HEADED RIGHTWARDS ARROW |
U+27A0 (10144) | ➠ | ➠ |
Dicker strichlierter Pfeil nach rechts mit Dreiecksspitze | HEAVY DASHED TRIANGLE-HEADED RIGHTWARDS ARROW |
U+27A1 (10145) | ➡ | ➡ |
Schwarzer Pfeil nach rechts | BLACK RIGHTWARDS ARROW |
U+27A2 (10146) | ➢ | ➢ |
Dreidimensionaler Pfeil nach rechts, oben weiß | THREE-D TOP-LIGHTED RIGHTWARDS ARROWHEAD |
U+27A3 (10147) | ➣ | ➣ |
Dreidimensionaler Pfeil nach rechts, unten weiß | THREE-D BOTTOM-LIGHTED RIGHTWARDS ARROWHEAD |
U+27A4 (10148) | ➤ | ➤ |
Schwarze Pfeilspitze nach rechts | BLACK RIGHTWARDS ARROWHEAD |
U+27A5 (10149) | ➥ | ➥ |
Dicker schwarzer Pfeil, nach unten und rechts gebogen | HEAVY BLACK CURVED DOWNWARDS AND RIGHTWARDS ARROW |
U+27A6 (10150) | ➦ | ➦ |
Dicker schwarzer Pfeil, nach oben und rechts gebogen | HEAVY BLACK CURVED UPWARDS AND RIGHTWARDS ARROW |
U+27A7 (10151) | ➧ | ➧ |
Gestauchter schwarzer Pfeil nach rechts | SQUAT BLACK RIGHTWARDS ARROW |
U+27A8 (10152) | ➨ | ➨ |
Dicker konkavspitzer schwarzer Pfeil nach rechts | HEAVY CONCAVE-POINTED BLACK RIGHTWARDS ARROW |
U+27A9 (10153) | ➩ | ➩ |
Weißer Pfeil nach rechts mit Rechtsschatten | RIGHT-SHADED WHITE RIGHTWARDS ARROW |
U+27AA (10154) | ➪ | ➪ |
Weißer Pfeil nach rechts mit Linksschatten | LEFT-SHADED WHITE RIGHTWARDS ARROW |
U+27AB (10155) | ➫ | ➫ |
Nach hinten gekippter Pfeil nach rechts mit Schatten | BACK-TILTED SHADOWED WHITE RIGHTWARDS ARROW |
U+27AC (10156) | ➬ | ➬ |
Nach vorn gekippter Pfeil nach rechts mit Schatten | FRONT-TILTED SHADOWED WHITE RIGHTWARDS ARROW |
U+27AD (10157) | ➭ | ➭ |
Dicker weißer Pfeil nach rechts mit Schatten rechts unten | HEAVY LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW |
U+27AE (10158) | ➮ | ➮ |
Dicker weißer Pfeil nach rechts mit Schatten rechts oben | HEAVY UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW |
U+27AF (10159) | ➯ | ➯ |
Gekerbter Pfeil nach rechts mit Schatten rechts unten | NOTCHED LOWER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW |
U+27B1 (10161) | ➱ | ➱ |
Gekerbter Pfeil nach rechts mit Schatten rechts oben | NOTCHED UPPER RIGHT-SHADOWED WHITE RIGHTWARDS ARROW |
U+27B2 (10162) | ➲ | ➲ |
Dicker weißer Pfeil nach rechts im Kreis | CIRCLED HEAVY WHITE RIGHTWARDS ARROW |
U+27B3 (10163) | ➳ | ➳ |
Weiß-gefiederter Pfeil nach rechts | WHITE-FEATHERED RIGHTWARDS ARROW |
U+27B4 (10164) | ➴ | ➴ |
Schwarz-gefiederter Pfeil nach Südosten | BLACK-FEATHERED SOUTH EAST ARROW |
U+27B5 (10165) | ➵ | ➵ |
Schwarz-gefiederter Pfeil nach rechts | BLACK-FEATHERED RIGHTWARDS ARROW |
U+27B6 (10166) | ➶ | ➶ |
Schwarz-gefiederter Pfeil nach Nordosten | BLACK-FEATHERED NORTH EAST ARROW |
U+27B7 (10167) | ➷ | ➷ |
Dicker schwarz-gefiederter Pfeil nach Südosten | HEAVY BLACK-FEATHERED SOUTH EAST ARROW |
U+27B8 (10168) | ➸ | ➸ |
Dicker schwarz-gefiederter Pfeil nach rechts | HEAVY BLACK-FEATHERED RIGHTWARDS ARROW |
U+27B9 (10169) | ➹ | ➹ |
Dicker schwarz-gefiederter Pfeil nach Nordosten | HEAVY BLACK-FEATHERED NORTH EAST ARROW |
U+27BA (10170) | ➺ | ➺ |
Pfeil nach rechts mit tropfenförmigen Widerhaken | TEARDROP-BARBED RIGHTWARDS ARROW |
U+27BB (10171) | ➻ | ➻ |
Dicker Pfeil nach rechts mit tropfenförmigem Schaft | HEAVY TEARDROP-SHANKED RIGHTWARDS ARROW |
U+27BC (10172) | ➼ | ➼ |
Keilschwänziger Pfeil nach rechts | WEDGE-TAILED RIGHTWARDS ARROW |
U+27BD (10173) | ➽ | ➽ |
Dicker keilschwänziger Pfeil nach rechts | HEAVY WEDGE-TAILED RIGHTWARDS ARROW |
U+27BE (10174) | ➾ | ➾ |
Offen konturierter Pfeil nach rechts | OPEN-OUTLINED RIGHTWARDS ARROW |
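The row labels of the table above follow a fixed pattern: the code point in hexadecimal, its decimal value in parentheses, the character, and its numbered reference. The following is a minimal ECMAScript sketch of how such a label could be generated; the function names `formatCodePoint` and `dingbatRow` are illustrative and not part of the original page.

```javascript
// Format a code point the way the table rows do, e.g. "U+2702 (9986)".
function formatCodePoint(codePoint) {
  var hex = codePoint.toString(16).toUpperCase().padStart(4, "0");
  return "U+" + hex + " (" + codePoint + ")";
}

// Build the first line of a table entry: label, character, numbered reference.
function dingbatRow(codePoint) {
  return formatCodePoint(codePoint) + " | " +
    String.fromCodePoint(codePoint) + " | &#" + codePoint + ";";
}

console.log(dingbatRow(0x2702));
```

The numbered reference is written out as text here; a browser that receives it as markup would display the character instead.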
Unicode block: CJK Symbols and Punctuation, U+3000 (12288) – U+303F (12351) → Unicode.org chart U+3000 (12288) – U+303F (12351) (PDF)
Unicode CJK Symbols and Punctuation U+3000 (12288) – U+303F (12351) | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
U+300x | 　 | 、 | 。 | 〃 | 〄 | 々 | 〆 | 〇 | 〈 | 〉 | 《 | 》 | 「 | 」 | 『 | 』 |
U+301x | 【 | 】 | 〒 | 〓 | 〔 | 〕 | 〖 | 〗 | 〘 | 〙 | 〚 | 〛 | 〜 | 〝 | 〞 | 〟 |
U+302x | 〠 | 〡 | 〢 | 〣 | 〤 | 〥 | 〦 | 〧 | 〨 | 〩 | 〪 | 〫 | 〬 | 〭 | 〮 | 〯 |
U+303x | 〰 | 〱 | 〲 | 〳 | 〴 | 〵 | 〶 | 〷 | 〸 | 〹 | 〺 | 〻 | 〼 | 〽 | 〾 | 〿 |
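A 16-column grid like the one above can be derived directly from the first and last code point of a block. The following sketch assumes a hypothetical helper `buildBlockGrid`; it is only meant to illustrate the row/column arithmetic, not how the original page was produced.

```javascript
// Split a Unicode block into rows of 16 code points each.
// Each row gets a label such as "U+300x" (the code point without its last hex digit).
function buildBlockGrid(first, last) {
  var rows = [];
  for (var base = first; base <= last; base += 16) {
    var label = "U+" + (base >> 4).toString(16).toUpperCase() + "x";
    var cells = [];
    for (var cp = base; cp < base + 16 && cp <= last; cp++) {
      cells.push(String.fromCodePoint(cp));
    }
    rows.push({ label: label, cells: cells });
  }
  return rows;
}

var grid = buildBlockGrid(0x3000, 0x303f);
grid.forEach(function (row) {
  console.log(row.label + " | " + row.cells.join(" | "));
});
```

The first cell of the first row is U+3000 IDEOGRAPHIC SPACE, which is why it looks empty when printed.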
There are numerous scripts for the world's languages. India alone, for example, lists 22 scheduled languages in its constitution. An XHTML page can use country-specific character sets, and XHTML supports meta information that controls how Unicode characters are displayed. How are these scripts supported in an XHTML page?
What does the typical structure of an XHTML page (with meta information) look like? An XHTML page is a hierarchically structured document. An XHTML page that starts with an XML declaration (first line) typically looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="de" xml:lang="de">
<head>
  <title> my-title </title>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <meta http-equiv="Content-Script-Type" content="text/javascript" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta http-equiv="expires" content="0" />
  <!-- reload the page from its original address after 5 seconds -->
  <meta http-equiv="refresh" content="5; URL=http://www.fh-giessen.de/~hg54/" />
  <meta name="author" content="my name" />
  <meta name="copyright" content="all rights reserved" />
  <meta name="description" content="short description" />
  <meta name="keywords" content="content keywords" />
  <link type='text/css' rel='stylesheet' href='html-standard.css' />
</head>
<body> ... </body>
</html>
How are XHTML's own markup characters escaped? The following characters have a special ("switching") meaning in XHTML and must be escaped whenever the character itself is to be displayed on an HTML page.
Character | Meaning | Named entity | Numeric reference |
---|---|---|---|
& | "ampersand" | &amp; | &#38; |
< | "less than" | &lt; | &#60; |
> | "greater than" | &gt; | &#62; |
" | "quote" | &quot; | &#34; |
' | "apostrophe" (XML) | &apos; | &#39; |
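The escaping rule from the table can be applied programmatically. The following is a small ECMAScript sketch; the names `XML_ESCAPES` and `escapeXml` are chosen for illustration and do not come from the original page.

```javascript
// Map each markup-significant character to its named entity.
var XML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&apos;"
};

// Replace every occurrence in one pass, so that the "&" of an
// entity that was just inserted is never escaped a second time.
function escapeXml(text) {
  return text.replace(/[&<>"']/g, function (ch) {
    return XML_ESCAPES[ch];
  });
}

console.log(escapeXml('<a title="T&C">'));
```

Doing the replacement in a single pass avoids the ordering problem that arises when `&` is replaced after the other characters.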
A single-byte "ANSI" encoding (such as charset=iso-8859-1) only allows characters that the current code page supports. This limits international use.
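Whether a text fits into such a single-byte encoding can be checked before choosing the charset. The sketch below assumes strict ISO-8859-1, where every character must have a code point of at most 255; the function name `isLatin1Encodable` is hypothetical.

```javascript
// True if every character of the text has a code point in 0..255,
// i.e. the text can be stored in ISO-8859-1 without loss.
function isLatin1Encodable(text) {
  for (var ch of text) {
    if (ch.codePointAt(0) > 0xff) {
      return false;
    }
  }
  return true;
}

console.log(isLatin1Encodable("Müller")); // umlauts are part of Latin-1
console.log(isLatin1Encodable("€"));      // U+20AC lies outside Latin-1
```

Characters that fail this check have to be written as numbered or named references, or the page has to switch to UTF-8.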
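What are "named characters" (named entities)? A named entity is a mnemonic short name, written between & and ;, that stands for a single character. Examples (entity, then the character it displays):
&amp; (&), &alpha; (α), &sum; (∑), &lceil; (⌈), &uarr; (↑), &bull; (•), &permil; (‰), &mdash; (—), &ndash; (–), &copy; (©), &reg; (®), &cent; (¢), &pound; (£), &brvbar; (¦)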
How can German umlauts be written as "named characters" (named entities) in XHTML?
HTML can use Unicode (the ISO 10646 standard corresponds to the Unicode standard), which contains the German umlauts and the sharp s. The euro sign ( € ), for example, can be written as &euro; in the HTML source. If the character encoding in use is not declared, German umlauts should be replaced by named and/or numbered character references.
Character | ä | Ä | ö | Ö | ü | Ü | ß |
---|---|---|---|---|---|---|---|
HTML named entity | &auml; | &Auml; | &ouml; | &Ouml; | &uuml; | &Uuml; | &szlig; |
Example of HTML source text with named entities:
XHTML source: Fr&auml;ulein M&uuml;llers m&ouml;chte einen Ku&szlig;.
Browser display: Fräulein Müllers möchte einen Kuß.
To replace characters in a string var str;
the built-in ECMAScript method
str = str.replace(FindRe, Replstring)
can be used.
As an example, some pairs of /RegExpr/ and "ReplacementStr" (the g flag replaces every occurrence, not only the first one):
/ä/g replace with "&auml;", /Ä/g replace with "&Auml;", /ö/g replace with "&ouml;", /Ö/g replace with "&Ouml;", /ü/g replace with "&uuml;", /Ü/g replace with "&Uuml;", /ß/g replace with "&szlig;"
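The individual replacements can be combined into one call. The following sketch uses a lookup table and a single regular expression; the names `UMLAUT_ENTITIES` and `umlautsToEntities` are illustrative. It assumes that the umlauts are stored as single precomposed characters (NFC), not as a base letter followed by a combining diaeresis.

```javascript
// Lookup table: German special character -> named entity.
var UMLAUT_ENTITIES = {
  "ä": "&auml;", "Ä": "&Auml;",
  "ö": "&ouml;", "Ö": "&Ouml;",
  "ü": "&uuml;", "Ü": "&Uuml;",
  "ß": "&szlig;"
};

// Replace all umlauts and the sharp s in one pass.
function umlautsToEntities(str) {
  return str.replace(/[äÄöÖüÜß]/g, function (ch) {
    return UMLAUT_ENTITIES[ch];
  });
}

console.log(umlautsToEntities("Fräulein Müllers möchte einen Kuß."));
```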
The files lat1.ent, symbol.ent and special.ent are part of the XHTML DTD (DTD = Document Type Definition, also referred to as a schema definition or DOCTYPE):
The XHTML DTD xhtml1-....dtd includes the files lat1.ent, symbol.ent and special.ent, which contain the "ENTITIES" character definitions. These "named" (given a short name) and "numbered" characters can be used in XHTML. Not all characters are currently supported by all browsers (see the file lat1.ent).
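ENTITIES Latin 1 for XHTML (lat1.ent): character, numbered reference, named entity | ||||||||
---|---|---|---|---|---|---|---|---|
  | &#160; | &nbsp; | ¡ | &#161; | &iexcl; | ¢ | &#162; | &cent; |
£ | &#163; | &pound; | ¤ | &#164; | &curren; | ¥ | &#165; | &yen; |
¦ | &#166; | &brvbar; | § | &#167; | &sect; | ¨ | &#168; | &uml; |
© | &#169; | &copy; | ª | &#170; | &ordf; | « | &#171; | &laquo; |
¬ | &#172; | &not; | (soft hyphen) | &#173; | &shy; | ® | &#174; | &reg; |
¯ | &#175; | &macr; | ° | &#176; | &deg; | ± | &#177; | &plusmn; |
² | &#178; | &sup2; | ³ | &#179; | &sup3; | ´ | &#180; | &acute; |
µ | &#181; | &micro; | ¶ | &#182; | &para; | · | &#183; | &middot; |
¸ | &#184; | &cedil; | ¹ | &#185; | &sup1; | º | &#186; | &ordm; |
» | &#187; | &raquo; | ¼ | &#188; | &frac14; | ½ | &#189; | &frac12; |
¾ | &#190; | &frac34; | ¿ | &#191; | &iquest; | À | &#192; | &Agrave; |
Á | &#193; | &Aacute; | Â | &#194; | &Acirc; | Ã | &#195; | &Atilde; |
Ä | &#196; | &Auml; | Å | &#197; | &Aring; | Æ | &#198; | &AElig; |
Ç | &#199; | &Ccedil; | È | &#200; | &Egrave; | É | &#201; | &Eacute; |
Ê | &#202; | &Ecirc; | Ë | &#203; | &Euml; | Ì | &#204; | &Igrave; |
Í | &#205; | &Iacute; | Î | &#206; | &Icirc; | Ï | &#207; | &Iuml; |
Ð | &#208; | &ETH; | Ñ | &#209; | &Ntilde; | Ò | &#210; | &Ograve; |
Ó | &#211; | &Oacute; | Ô | &#212; | &Ocirc; | Õ | &#213; | &Otilde; |
Ö | &#214; | &Ouml; | × | &#215; | &times; | Ø | &#216; | &Oslash; |
Ù | &#217; | &Ugrave; | Ú | &#218; | &Uacute; | Û | &#219; | &Ucirc; |
Ü | &#220; | &Uuml; | Ý | &#221; | &Yacute; | Þ | &#222; | &THORN; |
ß | &#223; | &szlig; | à | &#224; | &agrave; | á | &#225; | &aacute; |
â | &#226; | &acirc; | ã | &#227; | &atilde; | ä | &#228; | &auml; |
å | &#229; | &aring; | æ | &#230; | &aelig; | ç | &#231; | &ccedil; |
è | &#232; | &egrave; | é | &#233; | &eacute; | ê | &#234; | &ecirc; |
ë | &#235; | &euml; | ì | &#236; | &igrave; | í | &#237; | &iacute; |
î | &#238; | &icirc; | ï | &#239; | &iuml; | ð | &#240; | &eth; |
ñ | &#241; | &ntilde; | ò | &#242; | &ograve; | ó | &#243; | &oacute; |
ô | &#244; | &ocirc; | õ | &#245; | &otilde; | ö | &#246; | &ouml; |
÷ | &#247; | &divide; | ø | &#248; | &oslash; | ù | &#249; | &ugrave; |
ú | &#250; | &uacute; | û | &#251; | &ucirc; | ü | &#252; | &uuml; |
ý | &#253; | &yacute; | þ | &#254; | &thorn; | ÿ | &#255; | &yuml; |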
The symbol characters defined in symbol.ent can also be used in XHTML as "named" and "numbered" characters (see symbol.ent). Not all characters are currently supported by all browsers.
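ENTITIES Symbols for XHTML (symbol.ent): character, numbered reference, named entity | ||||||||
---|---|---|---|---|---|---|---|---|
ƒ | &#402; | &fnof; | Α | &#913; | &Alpha; | Β | &#914; | &Beta; |
Γ | &#915; | &Gamma; | Δ | &#916; | &Delta; | Ε | &#917; | &Epsilon; |
Ζ | &#918; | &Zeta; | Η | &#919; | &Eta; | Θ | &#920; | &Theta; |
Ι | &#921; | &Iota; | Κ | &#922; | &Kappa; | Λ | &#923; | &Lambda; |
Μ | &#924; | &Mu; | Ν | &#925; | &Nu; | Ξ | &#926; | &Xi; |
Ο | &#927; | &Omicron; | Π | &#928; | &Pi; | Ρ | &#929; | &Rho; |
Σ | &#931; | &Sigma; | Τ | &#932; | &Tau; | Υ | &#933; | &Upsilon; |
Φ | &#934; | &Phi; | Χ | &#935; | &Chi; | Ψ | &#936; | &Psi; |
Ω | &#937; | &Omega; | α | &#945; | &alpha; | β | &#946; | &beta; |
γ | &#947; | &gamma; | δ | &#948; | &delta; | ε | &#949; | &epsilon; |
ζ | &#950; | &zeta; | η | &#951; | &eta; | θ | &#952; | &theta; |
ι | &#953; | &iota; | κ | &#954; | &kappa; | λ | &#955; | &lambda; |
μ | &#956; | &mu; | ν | &#957; | &nu; | ξ | &#958; | &xi; |
ο | &#959; | &omicron; | π | &#960; | &pi; | ρ | &#961; | &rho; |
ς | &#962; | &sigmaf; | σ | &#963; | &sigma; | τ | &#964; | &tau; |
υ | &#965; | &upsilon; | φ | &#966; | &phi; | χ | &#967; | &chi; |
ψ | &#968; | &psi; | ω | &#969; | &omega; | ϑ | &#977; | &thetasym; |
ϒ | &#978; | &upsih; | ϖ | &#982; | &piv; | • | &#8226; | &bull; |
… | &#8230; | &hellip; | ′ | &#8242; | &prime; | ″ | &#8243; | &Prime; |
‾ | &#8254; | &oline; | ⁄ | &#8260; | &frasl; | ℘ | &#8472; | &weierp; |
ℑ | &#8465; | &image; | ℜ | &#8476; | &real; | ™ | &#8482; | &trade; |
ℵ | &#8501; | &alefsym; | ← | &#8592; | &larr; | ↑ | &#8593; | &uarr; |
→ | &#8594; | &rarr; | ↓ | &#8595; | &darr; | ↔ | &#8596; | &harr; |
↵ | &#8629; | &crarr; | ⇐ | &#8656; | &lArr; | ⇑ | &#8657; | &uArr; |
⇒ | &#8658; | &rArr; | ⇓ | &#8659; | &dArr; | ⇔ | &#8660; | &hArr; |
∀ | &#8704; | &forall; | ∂ | &#8706; | &part; | ∃ | &#8707; | &exist; |
∅ | &#8709; | &empty; | ∇ | &#8711; | &nabla; | ∈ | &#8712; | &isin; |
∉ | &#8713; | &notin; | ∋ | &#8715; | &ni; | ∏ | &#8719; | &prod; |
∑ | &#8721; | &sum; | − | &#8722; | &minus; | ∗ | &#8727; | &lowast; |
√ | &#8730; | &radic; | ∝ | &#8733; | &prop; | ∞ | &#8734; | &infin; |
∠ | &#8736; | &ang; | ∧ | &#8743; | &and; | ∨ | &#8744; | &or; |
∩ | &#8745; | &cap; | ∪ | &#8746; | &cup; | ∫ | &#8747; | &int; |
∴ | &#8756; | &there4; | ∼ | &#8764; | &sim; | ≅ | &#8773; | &cong; |
≈ | &#8776; | &asymp; | ≠ | &#8800; | &ne; | ≡ | &#8801; | &equiv; |
≤ | &#8804; | &le; | ≥ | &#8805; | &ge; | ⊂ | &#8834; | &sub; |
⊃ | &#8835; | &sup; | ⊄ | &#8836; | &nsub; | ⊆ | &#8838; | &sube; |
⊇ | &#8839; | &supe; | ⊕ | &#8853; | &oplus; | ⊗ | &#8855; | &otimes; |
⊥ | &#8869; | &perp; | ⋅ | &#8901; | &sdot; | ⌈ | &#8968; | &lceil; |
⌉ | &#8969; | &rceil; | ⌊ | &#8970; | &lfloor; | ⌋ | &#8971; | &rfloor; |
〈 | &#9001; | &lang; | 〉 | &#9002; | &rang; | ◊ | &#9674; | &loz; |
♠ | &#9824; | &spades; | ♣ | &#9827; | &clubs; | ♥ | &#9829; | &hearts; |
♦ | &#9830; | &diams; |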
The characters defined in special.ent can be used in XHTML in the same way (see the file special.ent). Not all characters are currently supported by all browsers.
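ENTITIES Special for XHTML (special.ent): character, numbered reference, named entity | ||||||||
---|---|---|---|---|---|---|---|---|
" | &#34; | &quot; | & | &#38; | &amp; | < | &#60; | &lt; |
> | &#62; | &gt; | ' | &#39; | &apos; | Œ | &#338; | &OElig; |
œ | &#339; | &oelig; | Š | &#352; | &Scaron; | š | &#353; | &scaron; |
Ÿ | &#376; | &Yuml; | ˆ | &#710; | &circ; | ˜ | &#732; | &tilde; |
(en space) | &#8194; | &ensp; | (em space) | &#8195; | &emsp; | (thin space) | &#8201; | &thinsp; |
(zero width non-joiner) | &#8204; | &zwnj; | (zero width joiner) | &#8205; | &zwj; | (left-to-right mark) | &#8206; | &lrm; |
(right-to-left mark) | &#8207; | &rlm; | – | &#8211; | &ndash; | — | &#8212; | &mdash; |
‘ | &#8216; | &lsquo; | ’ | &#8217; | &rsquo; | ‚ | &#8218; | &sbquo; |
“ | &#8220; | &ldquo; | ” | &#8221; | &rdquo; | „ | &#8222; | &bdquo; |
† | &#8224; | &dagger; | ‡ | &#8225; | &Dagger; | ‰ | &#8240; | &permil; |
‹ | &#8249; | &lsaquo; | › | &#8250; | &rsaquo; | € | &#8364; | &euro; |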
ECMAScript (standard ECMA-262) is colloquially called JavaScript and is a standardized scripting language
(modern, lean, dynamically typed, object-oriented on the basis of prototypes rather than classes;
it can be used procedurally, functionally or in an object-oriented way, for example for DOM scripting in web browsers).
The following ECMAScript program takes the XHTML lat1 characters and,
with the help of unescape("%"+j.toString(16));
builds the table of "numbered" and "named" characters.
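function build_html_zeichen_tabelle() {
  // named entities of lat1.ent in code point order, start_idx = 160
  var aa = ["&nbsp;", "&iexcl;", "&cent;", "&pound;", "&curren;", "&yen;", "&brvbar;", "&sect;",
    "&uml;", "&copy;", "&ordf;", "&laquo;", "&not;", "&shy;", "&reg;", "&macr;",
    "&deg;", "&plusmn;", "&sup2;", "&sup3;", "&acute;", "&micro;", "&para;", "&middot;",
    "&cedil;", "&sup1;", "&ordm;", "&raquo;", "&frac14;", "&frac12;", "&frac34;", "&iquest;",
    "&Agrave;", "&Aacute;", "&Acirc;", "&Atilde;", "&Auml;", "&Aring;", "&AElig;", "&Ccedil;",
    "&Egrave;", "&Eacute;", "&Ecirc;", "&Euml;", "&Igrave;", "&Iacute;", "&Icirc;", "&Iuml;",
    "&ETH;", "&Ntilde;", "&Ograve;", "&Oacute;", "&Ocirc;", "&Otilde;", "&Ouml;", "&times;",
    "&Oslash;", "&Ugrave;", "&Uacute;", "&Ucirc;", "&Uuml;", "&Yacute;", "&THORN;", "&szlig;",
    "&agrave;", "&aacute;", "&acirc;", "&atilde;", "&auml;", "&aring;", "&aelig;", "&ccedil;",
    "&egrave;", "&eacute;", "&ecirc;", "&euml;", "&igrave;", "&iacute;", "&icirc;", "&iuml;",
    "&eth;", "&ntilde;", "&ograve;", "&oacute;", "&ocirc;", "&otilde;", "&ouml;", "&divide;",
    "&oslash;", "&ugrave;", "&uacute;", "&ucirc;", "&uuml;", "&yacute;", "&thorn;", "&yuml;"];
  var j, start_idx = 160, s = "";
  for (var i = 0; i < aa.length; i++) {
    j = start_idx + i;
    s += "<br />\"" + unescape("%" + j.toString(16)); // the character itself
    s += "\", \"" + "&#" + j + ";";                   // numbered reference
    s += "\", \"" + aa[i];                             // named entity
    s += "\",";
  }
  document.write(s);
}
build_html_zeichen_tabelle();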
The letter A can be written as a numbered XHTML/XML character reference: &#65;
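A numbered reference can be derived from any character via its code point. The following is a minimal sketch in ECMAScript; the function names `toDecimalReference` and `toHexReference` are illustrative.

```javascript
// Decimal form, e.g. "A" -> "&#65;"
function toDecimalReference(ch) {
  return "&#" + ch.codePointAt(0) + ";";
}

// Hexadecimal form, e.g. "€" -> "&#x20AC;"
function toHexReference(ch) {
  return "&#x" + ch.codePointAt(0).toString(16).toUpperCase() + ";";
}

console.log(toDecimalReference("A"), toHexReference("A"));
console.log(toDecimalReference("€"), toHexReference("€"));
```

`codePointAt(0)` is used instead of `charCodeAt(0)` so that characters outside the Basic Multilingual Plane also yield a single reference.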
The built-in ECMAScript function unescape() decodes a %XX escape sequence into the corresponding character. It is a legacy function; String.fromCharCode(i) is the usual replacement today.
The following ECMAScript code uses unescape() to show each character next to its numbered character reference (such as &#65;).
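<textarea id="SRC" cols="90" rows="22"> </textarea>
<script type="text/javascript">
// Builds the HTML source of a table with 10 code points per row:
// for each i the character from unescape() and the numbered reference &#i;
// Note: for i < 16, i.toString(16) is a single hex digit, so unescape()
// finds no valid %XX sequence and returns the input unchanged (e.g. "%a").
var s = "<table border='1'>";
for (var i = 0; i < 256; i++) {
  if ((i % 10) == 0) {
    if (i == 0) s += "\n<tr><th colspan=\"2\">" + i + ":";
    else s += "\n</th></tr><tr><th colspan=\"2\">" + i + ":";
  }
  s += "</th><td>" + unescape("%" + i.toString(16));
  s += "</td><th>&#" + i + ";";
}
s += "</th></tr></table>";
document.getElementById("SRC").value = s;
</script>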
Anzeige von unescape("%"+i.toString(16)) und &#i | |||||||||||||||||||||
i | % | &# | % | &# | % | &# | % | &# | % | &# | % | &# | % | &# | % | &# | % | &# | % | &# | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0: | %0 | | %1 | | %2 | | %3 | | %4 | | %5 | | %6 | | %7 | | %8 | | %9 | | |
10: | %a | | %b | | %c | | %d | | %e | | %f | | | | | | | | | | |
20: | | | | | | | | | | | | | | | | | | | | | |
30: | | | | | | | ! | ! | " | " | # | # | $ | $ | % | % | & | & | ' | ' | |
40: | ( | ( | ) | ) | * | * | + | + | , | , | - | - | . | . | / | / | 0 | 0 | 1 | 1 | |
50: | 2 | 2 | 3 | 3 | 4 | 4 | 5 | 5 | 6 | 6 | 7 | 7 | 8 | 8 | 9 | 9 | : | : | ; | ; | |
60: | < | < | = | = | > | > | ? | ? | @ | @ | A | A | B | B | C | C | D | D | E | E | |
70: | F | F | G | G | H | H | I | I | J | J | K | K | L | L | M | M | N | N | O | O | |
80: | P | P | Q | Q | R | R | S | S | T | T | U | U | V | V | W | W | X | X | Y | Y | |
90: | Z | Z | [ | [ | \ | \ | ] | ] | ^ | ^ | _ | _ | ` | ` | a | a | b | b | c | c | |
100: | d | d | e | e | f | f | g | g | h | h | i | i | j | j | k | k | l | l | m | m | |
110: | n | n | o | o | p | p | q | q | r | r | s | s | t | t | u | u | v | v | w | w | |
120: | x | x | y | y | z | z | { | { | | | | | } | } | ~ | ~ |  | € |  | ||||
130: | ‚ | ƒ | „ | … | † | ‡ | ˆ | ‰ | Š | ‹ | |||||||||||
140: | Œ |  | Ž |  |  | ‘ | ’ | “ | ” | • | |||||||||||
150: | – | — | ˜ | ™ | š | › | œ |  | ž | Ÿ | |||||||||||
160: |   | ¡ | ¡ | ¢ | ¢ | £ | £ | ¤ | ¤ | ¥ | ¥ | ¦ | ¦ | § | § | ¨ | ¨ | © | © | ||
170: | ª | ª | « | « | ¬ | ¬ | | ­ | ® | ® | ¯ | ¯ | ° | ° | ± | ± | ² | ² | ³ | ³ | |
180: | ´ | ´ | µ | µ | ¶ | ¶ | · | · | ¸ | ¸ | ¹ | ¹ | º | º | » | » | ¼ | ¼ | ½ | ½ | |
190: | ¾ | ¾ | ¿ | ¿ | À | À | Á | Á | Â | Â | Ã | Ã | Ä | Ä | Å | Å | Æ | Æ | Ç | Ç | |
200: | È | È | É | É | Ê | Ê | Ë | Ë | Ì | Ì | Í | Í | Î | Î | Ï | Ï | Ð | Ð | Ñ | Ñ | |
210: | Ò | Ò | Ó | Ó | Ô | Ô | Õ | Õ | Ö | Ö | × | × | Ø | Ø | Ù | Ù | Ú | Ú | Û | Û | |
220: | Ü | Ü | Ý | Ý | Þ | Þ | ß | ß | à | à | á | á | â | â | ã | ã | ä | ä | å | å | |
230: | æ | æ | ç | ç | è | è | é | é | ê | ê | ë | ë | ì | ì | í | í | î | î | ï | ï | |
240: | ð | ð | ñ | ñ | ò | ò | ó | ó | ô | ô | õ | õ | ö | ö | ÷ | ÷ | ø | ø | ù | ù | |
250: | ú | ú | û | û | ü | ü | ý | ý | þ | þ | ÿ | ÿ |
XML documents are accompanied by a document type definition (DTD, also referred to as a schema definition or DOCTYPE), which is a set of rules for documents. A DTD defines the valid structure of a document, i.e. the order and nesting of the elements and the permitted kinds of attribute content.
In XML, the tags and the associated .dtd files (or .xsd files) are developed by the user to fit the data structures. Numbered Unicode character references can be used directly. For "named" characters, the "ENTITIES" character definitions have to be written first, because XML itself defines no short names for Unicode characters apart from the five predefined entities &amp;, &lt;, &gt;, &quot; and &apos;.
In any character system, the "blank" characters play an important role.
In XML, data in which white space must be preserved, and the transformation of such data, need particular attention (for example the so-called "non-printable characters").
Most Unicode characters lie in the ranges #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF.
Some values are not allowed, for example #xFFFE and #xFFFF.
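The ranges above, together with the three control characters tab, line feed and carriage return, make up the characters that XML 1.0 allows in a document. The following sketch checks a code point against these ranges; the function name `isXmlChar` is illustrative.

```javascript
// True if the code point is allowed in an XML 1.0 document:
// #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF.
function isXmlChar(cp) {
  return cp === 0x9 || cp === 0xa || cp === 0xd ||
    (cp >= 0x20 && cp <= 0xd7ff) ||
    (cp >= 0xe000 && cp <= 0xfffd) ||
    (cp >= 0x10000 && cp <= 0x10ffff);
}

console.log(isXmlChar(0x41));   // letter A
console.log(isXmlChar(0xfffe)); // not allowed
```

The gap between #xD7FF and #xE000 is the surrogate range, which is reserved for UTF-16 and never denotes a character on its own.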
White-space characters (control characters) are, for example:
tab character (#x09)
line feed character (#x0A)
carriage return character (#x0D)
ordinary space character (#x20)
Unicode "blank characters" are:
0009 = HT
000a = LF
000b = VT
000c = FF
000d = CR
0020 = space
0085 = next line
00a0 = non-breaking space
1680 = Ogham space mark
180e = Mongolian vowel separator
2000 - 200b = spaces of different sizes, including zero
2028 = line separator
2029 = paragraph separator
202f = narrow no-break space
205f = medium mathematical space
3000 = ideographic space
feff = zero-width no-break space
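Because these characters are invisible, it helps to make them visible when debugging text. The sketch below encodes the list above as ranges and replaces every blank character by a marker such as [U+00A0]; the names `BLANK_RANGES`, `isBlankCharacter` and `revealBlanks` are illustrative.

```javascript
// The "blank characters" listed above, as inclusive [from, to] ranges.
var BLANK_RANGES = [
  [0x0009, 0x000d], [0x0020, 0x0020], [0x0085, 0x0085], [0x00a0, 0x00a0],
  [0x1680, 0x1680], [0x180e, 0x180e], [0x2000, 0x200b], [0x2028, 0x2029],
  [0x202f, 0x202f], [0x205f, 0x205f], [0x3000, 0x3000], [0xfeff, 0xfeff]
];

function isBlankCharacter(cp) {
  return BLANK_RANGES.some(function (range) {
    return cp >= range[0] && cp <= range[1];
  });
}

// Replace each blank character by a visible marker like "[U+00A0]".
function revealBlanks(text) {
  var out = "";
  for (var ch of text) {
    var cp = ch.codePointAt(0);
    out += isBlankCharacter(cp)
      ? "[U+" + cp.toString(16).toUpperCase().padStart(4, "0") + "]"
      : ch;
  }
  return out;
}

console.log(revealBlanks("A\u00a0B"));
```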
Here is a table of Unicode separators (spaces):
Unicode and separator, space | |||||
---|---|---|---|---|---|
Symbol | Hex code | Dec code | Unicode name | Block | Version |
A B | &#x20; | 32 | SPACE | Basic Latin | 2.1 |
A B | &#xA0; | 160 | NO-BREAK SPACE | Latin-1 Supplement | 2.1 |
A B | &#x2000; | 8192 | EN QUAD | General Punctuation | 2.1 |
A B | &#x2001; | 8193 | EM QUAD | General Punctuation | 2.1 |
A B | &#x2002; | 8194 | EN SPACE | General Punctuation | 2.1 |
A B | &#x2003; | 8195 | EM SPACE | General Punctuation | 2.1 |
A B | &#x2004; | 8196 | THREE-PER-EM SPACE | General Punctuation | 2.1 |
A B | &#x2005; | 8197 | FOUR-PER-EM SPACE | General Punctuation | 2.1 |
A B | &#x2006; | 8198 | SIX-PER-EM SPACE | General Punctuation | 2.1 |
A B | &#x2007; | 8199 | FIGURE SPACE | General Punctuation | 2.1 |
A B | &#x2008; | 8200 | PUNCTUATION SPACE | General Punctuation | 2.1 |
A B | &#x2009; | 8201 | THIN SPACE | General Punctuation | 2.1 |
A B | &#x200A; | 8202 | HAIR SPACE | General Punctuation | 2.1 |
A　B | &#x3000; | 12288 | IDEOGRAPHIC SPACE | CJK Symbols and Punctuation | 2.1 |
A B | &#x202F; | 8239 | NARROW NO-BREAK SPACE | General Punctuation | 3.0 |
A B | &#x1680; | 5760 | OGHAM SPACE MARK | Ogham | 3.0 |
A᠎B | &#x180E; | 6158 | MONGOLIAN VOWEL SEPARATOR | Mongolian | 3.0 |
A B | &#x205F; | 8287 | MEDIUM MATHEMATICAL SPACE | General Punctuation | 3.2 |
XML documents may contain CDATA sections. The parser does not interpret their content as markup; it passes the content on as plain character data.
<![CDATA[<Element>dieses Element wird nur als Zeichenfolge ausgegeben</Element>]]>
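A CDATA section ends at the first occurrence of ]]>, so text that itself contains this sequence cannot be wrapped as it is. A common workaround is to split the text at that point into two adjacent CDATA sections. The following sketch shows this; the function name `wrapInCdata` is illustrative.

```javascript
// Wrap arbitrary text in a CDATA section. Every "]]>" inside the text is
// split so that "]]" ends one section and ">" starts the next one.
function wrapInCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

console.log(wrapInCdata("<Element>text</Element>"));
console.log(wrapInCdata("a]]>b"));
```

A parser reads the two adjacent sections back as the original, uninterrupted character data.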
How are HTML characters included in a .dtd?
The percent sign % marks a parameter entity: its content becomes part of the current DTD.
.dtd example for named characters
In XML, the user-defined characters &smiley_traurig; and &smiley_froehlich; are declared in a file my_smilies.dtd (provided the browser supports Unicode sufficiently).
<!-- in my_smilies.dtd -->
<!ENTITY smiley_traurig "⍩" >
<!ENTITY smiley_froehlich "⍪" >
<!ELEMENT smilies (#PCDATA)>

<!-- in my_smilies.xml -->
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE smilies SYSTEM "my_smilies.dtd">
<smilies> &smiley_froehlich; or &smiley_traurig;? </smilies>
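Entity declarations of this kind can also be generated. The sketch below writes the replacement text as a hexadecimal character reference, so that the .dtd file stays pure ASCII and does not depend on the encoding it is saved in. The function name `entityDeclaration` is hypothetical.

```javascript
// Build an <!ENTITY ...> declaration for a single character.
// The character is written as a hexadecimal character reference.
function entityDeclaration(name, character) {
  var hex = character.codePointAt(0).toString(16).toUpperCase();
  return '<!ENTITY ' + name + ' "&#x' + hex + ';">';
}

console.log(entityDeclaration("smiley_traurig", "\u2369"));
console.log(entityDeclaration("smiley_froehlich", "\u236A"));
```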
<!ENTITY % HTML_Chars PUBLIC "-//W3C//ENTITIES Latin1//EN//HTML" > %HTML_Chars;
XML and external resources
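<!-- news.dtd: -->
<!ELEMENT news (newsdaten)*>
<!-- an attribute of type ENTITY must name an unparsed entity, hence NDATA;
     the notation name "text" is an assumption chosen for this example -->
<!NOTATION text SYSTEM "text/plain">
<!ENTITY datenquelle SYSTEM "news.txt" NDATA text>
<!ELEMENT newsdaten EMPTY>
<!ATTLIST newsdaten quelle ENTITY #REQUIRED>

<!-- .xml: -->
<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE news SYSTEM "news.dtd">
<news>
  <newsdaten quelle="datenquelle" />
</news>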
In the following example the percent sign % declares the parameter entity produktdaten; the reference %produktdaten; makes the content of produkt.dtd part of Bestellungen.dtd.
<!-- produkt.dtd: -->
<!ELEMENT produkt (warennummer,bezeichnung,hersteller)>
<!ELEMENT warennummer (#PCDATA)>
<!ELEMENT bezeichnung (#PCDATA)>
<!ELEMENT hersteller (#PCDATA)>

<!-- Bestellungen.dtd: -->
<!ENTITY % produktdaten SYSTEM "produkt.dtd" >
%produktdaten;
<!ELEMENT bestellungen (bestellung)*>
<!ELEMENT bestellung (produkt,besteller,anzahl,preis)*>
<!ELEMENT besteller (#PCDATA)>
<!ELEMENT anzahl (#PCDATA)>
<!ELEMENT preis (#PCDATA)>