LUCR

LaTeX Unicode Character References

Aim:

Unique "canonical" LaTeX representation (LICR) for every Unicode character that can be represented by LaTeX (standard + commonly used packages)

Answer to the question: "Where can I find an official list of LaTeX macros for Unicode characters?"

Rationale:

Define a 7-bit ASCII representation of Unicode text that allows lossless bidirectional conversion between LaTeX-code <-> Unicode
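
The round-trip requirement can be illustrated with a minimal Python sketch. The three-entry table, the function names and the convention of terminating every macro with "{}" are assumptions made for this illustration only; they are not part of any existing package, and other LaTeX special characters (%, {, }, ...) are ignored here.

```python
import re

# Hypothetical excerpt of a LUCR table: Unicode character -> LICR macro.
LICR_TABLE = {
    "\\": r"\textbackslash",      # escape the backslash itself, so literal
                                  # macro-like text survives the round trip
    "\u00b5": r"\textmu",         # MICRO SIGN
    "\u00a9": r"\textcopyright",  # COPYRIGHT SIGN
}
REVERSE = {macro: char for char, macro in LICR_TABLE.items()}

# Assumed convention: every generated macro is terminated by "{}".
MACRO_RE = re.compile(r"(\\[A-Za-z]+)\{\}")


def to_latex(text):
    """Encode Unicode text as 7-bit ASCII using the table above."""
    out = []
    for ch in text:
        if ch in LICR_TABLE:
            out.append(LICR_TABLE[ch] + "{}")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            raise ValueError("no LICR known for U+%04X" % ord(ch))
    return "".join(out)


def from_latex(code):
    """Decode the output of to_latex() back to Unicode text."""
    return MACRO_RE.sub(lambda m: REVERSE[m.group(1)], code)


sample = "5 \u00b5m \u00a9 2010"
encoded = to_latex(sample)          # 5 \textmu{}m \textcopyright{} 2010
assert encoded.isascii()
assert from_latex(encoded) == sample
```

The conversion is lossless only because the table is one-to-one: each character has exactly one macro and each macro exactly one character, which is what a canonical LICR list would have to guarantee.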

Use cases:
  • *.dfu files for the utf8 inputenc option
  • xunicode for XeTeX
  • inputenc for LuaTeX
  • LaTeX frontends (LyX, Docutils, ...)
  • encoding converters (recode, Python codec module, ...)

inputenc's extensible utf8 support is coupled to the declared font encodings. However, font encodings overlap. Canonical LICRs help to avoid incompatibilities.

Considering the complete Unicode set helps to avoid shortcuts such as conflating similar-looking characters, e.g.

\textmu = µ = MICRO SIGN ≠ μ = GREEK SMALL LETTER MU
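
The difference between the two characters can be checked with Python's standard unicodedata module. This is only an illustration of the Unicode side; the variable names are chosen for this sketch.

```python
import unicodedata

micro = "\u00b5"   # the character that \textmu stands for
mu = "\u03bc"      # the Greek letter

micro_name = unicodedata.name(micro)   # 'MICRO SIGN'
mu_name = unicodedata.name(mu)         # 'GREEK SMALL LETTER MU'

# The two are distinct code points ...
distinct = micro != mu

# ... canonical normalization (NFC) keeps them apart ...
nfc_keeps_micro = unicodedata.normalize("NFC", micro) == micro

# ... and only compatibility normalization (NFKC) folds MICRO SIGN into mu.
nfkc_folds_micro = unicodedata.normalize("NFKC", micro) == mu

print(micro_name, "/", mu_name, distinct, nfc_keeps_micro, nfkc_folds_micro)
```

A converter that maps both characters to the same macro would therefore lose information on the way back.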

1 State of the Art

Standard (inputenc) support for Unicode is very limited:

LICR

LaTeX internal character representation

  • not formally defined (described in the "LaTeX Companion")

  • 7-bit character representation between inputenc and fontenc (ASCII-encoded)

  • only text mode (math is handled separately)

  • macros defined with \DeclareTextSymbol or \DeclareTextCommand, e.g. \textbackslash, \eth

    combinations of accent macros (defined with \DeclareTextAccent) with ASCII chars or other LICRs, e.g. \"a, \@tabacckludge`\cyrg or the (silly) \u\textcopyright

Comparison with Unicode character codes

LICR                                        Unicode
------------------------------------------  ------------------------------------------
macros with descriptive names               code point (natural number) +
                                            Unicode Character Name

∃ aliases for convenience or due to         unique (with few exceptions)
"historic reasons" or different
naming by different packages

decorations via accent macros               decorations via combining chars
(\accent{\basechar})                        (basechar + combining char)
                                            or with pre-composed chars

features "programmed" into macros           features described by character
or via "knowledge" of the macro             classes
in other parts/packages
(e.g. @uclclist)
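
The "decorations" row can be made concrete with a short Python sketch that turns an accent macro plus base character into the Unicode form. The ACCENTS table and the function name are hypothetical and cover only a handful of accents; unicodedata.normalize is the standard-library call that picks the pre-composed character where one exists.

```python
import unicodedata

# Hypothetical excerpt: LaTeX accent macro character -> combining character.
ACCENTS = {
    '"': "\u0308",  # COMBINING DIAERESIS
    "'": "\u0301",  # COMBINING ACUTE ACCENT
    "`": "\u0300",  # COMBINING GRAVE ACCENT
    "^": "\u0302",  # COMBINING CIRCUMFLEX ACCENT
    "~": "\u0303",  # COMBINING TILDE
}


def accent_to_unicode(accent, base):
    """Return base + combining char, pre-composed (NFC) where possible."""
    return unicodedata.normalize("NFC", base + ACCENTS[accent])


a_umlaut = accent_to_unicode('"', "a")   # one code point: U+00E4
q_umlaut = accent_to_unicode('"', "q")   # stays "q" + U+0308

print(len(a_umlaut), len(q_umlaut))
```

For the reverse direction, unicodedata.decomposition("\u00e4") yields "0061 0308", i.e. base character plus combining character, which corresponds to the \"a form on the LICR side.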

XML entity names

There is a large set of short mnemonic names designed for ASCII input of Unicode characters:

XML Entity Definitions for Characters

W3C Recommendation 01 April 2010

This document defines several sets of names, so that to each name is assigned a Unicode character or sequence of characters.
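
Python's standard html.entities module offers a quick way to look at such names. Its tables follow the HTML named character references; that these agree with the W3C entity sets for the names used below is an assumption of this sketch.

```python
import html
from html.entities import html5, name2codepoint

# Mnemonic names keep MICRO SIGN and GREEK SMALL LETTER MU apart.
micro_cp = name2codepoint["micro"]   # 0xB5
mu_cp = name2codepoint["mu"]         # 0x3BC

# Decoding an ASCII-only string with entity references.
decoded = html.unescape("5&nbsp;&micro;m")

# A name can also stand for a sequence of characters.
not_equal_tilde = html5["NotEqualTilde;"]

print(hex(micro_cp), hex(mu_cp), repr(decoded), len(not_equal_tilde))
```

Like LICRs, these names are plain ASCII, but they carry no typesetting semantics of their own.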