Cyrillic in LaTeX and Postscript and Unicode

Let's see how to deal with the Cyrillic alphabet in LaTeX, Postscript, and Unicode. You might also be interested in this free online journal on Postscript and PDF. And if you need a PDF to Word converter, use OpenOffice with its PDF Import Extension: you can import a PDF and export it as a Word document, all with free software! Here are some other great sources of detailed information on how to deal with LaTeX fonts:

Cyrillic in LaTeX

The following produces Cyrillic Postscript output for me. There are other ways of doing this; see this tug.org page for a starting point.

1 — Use the cyrillic package
Include this line in your latex preamble:
\usepackage{cyrillic}
If that fails with an error saying the cyrillic package cannot be found, and you cannot find the right software package to install, you could try my cyrillic.sty file. Save it somewhere and load it by its path, like this:
\usepackage{/home/cromwell/.latex/cyrillic}
Note that you do not include the ".sty" part of the file name.

2 — Define some Cyrillic fonts
Include these lines in your latex preamble, right after the above \usepackage line:

\newcommand{\cyrrm}{\fontencoding{OT2}\selectfont\textcyrup}
\newcommand{\cyrit}{\fontencoding{OT2}\selectfont\textcyrit}
\newcommand{\cyrsl}{\fontencoding{OT2}\selectfont\textcyrsl}
\newcommand{\cyrsf}{\fontencoding{OT2}\selectfont\textcyrsf}
\newcommand{\cyrbf}{\fontencoding{OT2}\selectfont\textcyrbf}
\newcommand{\cyrsc}{\fontencoding{OT2}\selectfont\textcyrsc}
%%%% cyrrm = "Roman", or really upright, normal font
%%%% cyrit = Italic (cursive forms of letters)
%%%% cyrsl = Slanted (the upright, non-cursive letter forms, tilted)
%%%% cyrsf = Sans-serif
%%%% cyrbf = Bold-face
%%%% cyrsc = Small capitals

3 — Use transliteration
For the most part, latex will "do the right thing" and turn your ASCII typing into Russian, if you are careful. Check your output and adjust as needed. I have no idea about transliteration of other Slavic languages that use Cyrillic. Ukrainian and Belarusian are probably close enough, but for Serbian, Macedonian, Bulgarian, and other South Slavic languages, look at some of those web sites above.

Some special characters:

  \cprime    ь   "soft sign"
  \cdprime   ъ   "hard sign"
  \u{i}      й   "i-kratkaya"
  \"{e}      ё   "yoh"
  \`{e}      э   "e-oborotnoye"
  \`{E}      Э   "E-oborotnoye"

For the last two, э/Э, notice that the accent character is the backquote (`), which slopes down like an opening quote mark in LaTeX text, and not the far more common apostrophe ('). That is, ASCII 0x60 and not 0x27.

You may need to specify that a letter or a letter pair should stand on its own. For example, this:
{\cyrrm{Tsiolkovski\u{i}} y Krushchev}
will generate this:
Циолковский ы Крущев
But if you really want this instead:
Тсиолковский ы Крушчев
you need to specify that the T and sh should not be combined with what follows. Put them inside curly braces:
{\cyrrm{{T}siolkovski\u{i}} y Kru{sh}chev}
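To see why the braces matter, here is a toy Python model of the behavior described above: the longest matching letter group wins, and braces act as walls that no group can cross. The rule table is a small hypothetical subset chosen for illustration, not the real OT2 ligature table, and TeX really does this inside the font's ligature program rather than with a lookup like this one.

```python
import re

# HYPOTHETICAL toy table: a small subset of Latin-to-Cyrillic rules.
# This is NOT the real OT2 ligature program, just enough to show the idea.
SINGLE = {
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е",
    "z": "з", "i": "и", "k": "к", "l": "л", "m": "м", "n": "н",
    "o": "о", "p": "п", "r": "р", "s": "с", "t": "т", "u": "у",
    "f": "ф", "y": "ы",
}
MULTI = {"shch": "щ", "sh": "ш", "ch": "ч", "ts": "ц", "kh": "х", "zh": "ж"}


def build_table() -> dict:
    """Add a capitalized variant of every rule, e.g. "Ts" -> "Ц"."""
    table = {}
    for latin, cyrillic in {**SINGLE, **MULTI}.items():
        table[latin] = cyrillic
        table[latin.capitalize()] = cyrillic.upper()
    return table


TABLE = build_table()
LONGEST = max(len(key) for key in TABLE)


def transliterate(text: str) -> str:
    """Greedy longest-match transliteration; braces are hard boundaries."""
    out = []
    # Splitting on braces means no rule can match across "{" or "}".
    for segment in re.split(r"[{}]", text):
        i = 0
        while i < len(segment):
            for n in range(min(LONGEST, len(segment) - i), 0, -1):
                chunk = segment[i:i + n]
                if chunk in TABLE:
                    out.append(TABLE[chunk])
                    i += n
                    break
            else:
                out.append(segment[i])  # no rule: pass the character through
                i += 1
    return "".join(out)


print(transliterate("Tsiolkovski y Krushchev"))      # Циолковски ы Крущев
print(transliterate("{T}siolkovski y Kru{sh}chev"))  # Тсиолковски ы Крушчев
```

Without braces, "Ts" and "shch" are swallowed as single letters; with {T} and {sh} walled off, each piece is converted on its own.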

Here is an example, the same silly text in several fonts:

%% Start the document
\documentclass[letterpaper,12pt]{letter}
\usepackage[dvips]{color}
%% Cyrillic font definitions
\usepackage{/home/cromwell/.latex/cyrillic}
\newcommand{\cyrrm}{\fontencoding{OT2}\selectfont\textcyrup}
\newcommand{\cyrit}{\fontencoding{OT2}\selectfont\textcyrit}
\newcommand{\cyrsl}{\fontencoding{OT2}\selectfont\textcyrsl}
\newcommand{\cyrsf}{\fontencoding{OT2}\selectfont\textcyrsf}
\newcommand{\cyrbf}{\fontencoding{OT2}\selectfont\textcyrbf}
\newcommand{\cyrsc}{\fontencoding{OT2}\selectfont\textcyrsc}
\newcommand{\lat}{\fontencoding{OT1}\selectfont}

%%% Support for "\begin{alltt}...\end{alltt}"
\usepackage{alltt}
%%% Support \euro for Euro symbol
\usepackage{textcomp}
\newcommand{\euro}{\textsf{\texteuro}}

\begin{document}

{\cyrrm{Zdravstvu\u{i}te! \\
Krasivaya sobaka ili krasivie sobaki. \\
Ob{\cdprime}ekha\u{i}te Rossii! \\
Kreml\cprime -- doma Krushcheva i Gorbach\"{e}va.}}

{\cyrsl{Zdravstvu\u{i}te! \\
Krasivaya sobaka ili krasivie sobaki. \\
Ob{\cdprime}ekha\u{i}te Rossii! \\
Kreml\cprime -- doma Krushcheva i Gorbach\"{e}va.}}

{\cyrit{Zdravstvu\u{i}te! \\
Krasivaya sobaka ili krasivie sobaki. \\
Ob{\cdprime}ekha\u{i}te Rossii! \\
Kreml\cprime -- doma Krushcheva i Gorbach\"{e}va.}}

{\cyrsf{Zdravstvu\u{i}te! \\
Krasivaya sobaka ili krasivie sobaki. \\
Ob{\cdprime}ekha\u{i}te Rossii! \\
Kreml\cprime -- doma Krushcheva i Gorbach\"{e}va.}}

{\cyrbf{Zdravstvu\u{i}te! \\
Krasivaya sobaka ili krasivie sobaki. \\
Ob{\cdprime}ekha\u{i}te Rossii! \\
Kreml\cprime -- doma Krushcheva i Gorbach\"{e}va.}}

{\cyrsc{Zdravstvu\u{i}te! \\
Krasivaya sobaka ili krasivie sobaki. \\
Ob{\cdprime}ekha\u{i}te Rossii! \\
Kreml\cprime -- doma Krushcheva i Gorbach\"{e}va.}}

\end{document} 

And here is the result, after running latex and dvips to generate Postscript, converting that to PNG, and cropping it with convert from the ImageMagick suite:

Alternative method, using the Babel package

As an alternative, you can use the Babel package. This allows you to type Cyrillic characters and get them nicely typeset. The problem is that you are required to type Cyrillic! If you want LaTeX to transliterate ASCII-based input instead, use the method shown above.

Greek, on the other hand, works easily with the Babel package:

\documentclass[letterpaper,12pt]{article}
\usepackage[LGR,T2A,T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[russian,greek,english]{babel}

\begin{document}

The last language listed will be the active
(or default) one.  The others can be chosen
for large blocks:

\selectlanguage{russian}

Горбачёв

\selectlanguage{greek}

Ellhniko keimeno.

\selectlanguage{english}

You can also insert short pieces of text in
arbitrary languages, even within paragraphs
of a different language:

The capital of Russia is
\foreignlanguage{russian}{Москва.}

The capital of Greece is
\foreignlanguage{greek}{Ajhna.}

\end{document}


Here is the mapping from ASCII input to Greek output:

  a  b  g  d  e  z  h  j  i  k  l  m  n  x  o  p  r  s  c  t  u  f  q  y  w
  α  β  γ  δ  ε  ζ  η  θ  ι  κ  λ  μ  ν  ξ  ο  π  ρ  σ  ς  τ  υ  φ  χ  ψ  ω

  'a  'e  'h  'i  "i  "u  'o  'u  'w
  ά   έ   ή   ί   ϊ   ϋ   ό   ύ   ώ

An s at the end of a word automatically becomes the final form ς; c gives ς explicitly.
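For checking what a given ASCII string should produce, here is a rough Python model of the plain-letter and accent mappings. It assumes babel's LGR convention, in which s gives σ (turning into final ς at the end of a word) and c always gives ς. The function name to_greek is hypothetical, and babel does all of this inside the font, not with code like this.

```python
# Plain letters, in table order (LGR convention: s -> σ, c -> final ς).
LETTERS = dict(zip("abgdezhjiklmnxoprsctufqyw", "αβγδεζηθικλμνξοπρσςτυφχψω"))
ACCENTED = {"'a": "ά", "'e": "έ", "'h": "ή", "'i": "ί", '"i': "ϊ",
            '"u': "ϋ", "'o": "ό", "'u": "ύ", "'w": "ώ"}


def to_greek(text: str) -> str:
    """Map ASCII input to Greek following the table (a sketch)."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ACCENTED:  # accent + vowel, e.g. 'a -> ά
            out.append(ACCENTED[pair])
            i += 2
            continue
        ch = text[i]
        greek = LETTERS.get(ch.lower())
        if greek is None:
            out.append(ch)  # spaces, punctuation, etc. pass through
        else:
            if greek == "σ":
                nxt = text[i + 1:i + 2]
                # s at the end of a word becomes final sigma
                if nxt == "" or not (nxt.isalpha() or nxt in "'\""):
                    greek = "ς"
            out.append(greek.upper() if ch.isupper() else greek)
        i += 1
    return "".join(out)


print(to_greek("Ellhniko keimeno"))  # Ελληνικο κειμενο
print(to_greek("logos"))             # λογος
```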


Cyrillic in Postscript

The theory is that you can do something like the following and get Postscript that renders Cyrillic:

%!
%%Creator: Your Name Here
%%BoundingBox: 0 0 792 611
%%
%% Postscript Cyrillic demo
%%
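%% Define measurements in millimeters, 1 mm = 2.834645 Postscript points
/mm { 2.834645 mul } def
%% midshow: show a string centered horizontally on the current point
/midshow { dup stringwidth pop 2 div neg 0 rmoveto show } def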
%% Use the Cyrillic-Italic font.  Could be just Cyrillic, etc:
/Cyrillic-Italic findfont 12 scalefont setfont
%% Move to the location (50mm, 50mm) and Russify my name:
50 mm 50 mm moveto (Robert Vilhelmoviq Kromvell) midshow
showpage 

You have to figure out the font's quirky ASCII-to-Cyrillic mapping. Some letters are obvious: the ASCII letter whose sound in a Roman-alphabet language is close to the Cyrillic letter's sound in a Slavic language. Others are not, such as those in the following list.

The one that I cannot figure out is the Cyrillic character "ya" or я. If you know how to get it with this ASCII encoding, without remapping your keyboard to a Cyrillic character set, please let me know!

-/_  for  "eh/EH"
j/J  for  "zh/ZH"
y/Y  for  "i-kratkaya/I-KRATKAYA"
[/{  for  "yuri/YURI"
]/}  for  "yu/YU"
h/H  for  "kh/KH"
q/Q  for  "ch/CH"
w/W  for  "sh/SH"
x/X  for  "shch/SHCH"
c/C  for  "ts/TS"
+/\# for  "YAT/yat" 
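If you need to produce many strings for this font, a small Python helper can apply the mapping and quote the result as a PostScript string. This is a sketch under stated assumptions: the listed entries are taken from the list (reading y/Y as й), в to v comes from the name example, and the other "obvious" letters are assumed to map to the same-sounding ASCII letter. The [/{ entry is left out because it is unclear which letter "yuri" means, and letters with unknown codes (я, ы, ь, ъ, ё) raise an error instead of being guessed. The helper names are hypothetical.

```python
# ASSUMPTION: "obvious" letters map to the same-sounding ASCII letter.
OBVIOUS = dict(zip("абвгдезиклмнопрстуф", "abvgdeziklmnoprstuf"))
# Entries taken from the list above (y/Y read as й/Й).
LISTED = {
    "э": "-", "Э": "_", "ж": "j", "Ж": "J", "й": "y", "Й": "Y",
    "ю": "]", "Ю": "}", "х": "h", "Х": "H", "ч": "q", "Ч": "Q",
    "ш": "w", "Ш": "W", "щ": "x", "Щ": "X", "ц": "c", "Ц": "C",
}


def build_font_table() -> dict:
    """Combine both tables, adding uppercase forms of the obvious letters."""
    table = dict(LISTED)
    for cyrillic, ascii_char in OBVIOUS.items():
        table[cyrillic] = ascii_char
        table[cyrillic.upper()] = ascii_char.upper()
    return table


FONT_TABLE = build_font_table()


def to_font_ascii(text: str) -> str:
    """Turn Russian text into the ASCII this font expects."""
    out = []
    for ch in text:
        if ch in FONT_TABLE:
            out.append(FONT_TABLE[ch])
        elif "\u0400" <= ch <= "\u04ff":
            raise ValueError(f"no known ASCII code for {ch!r}")
        else:
            out.append(ch)  # spaces, digits, punctuation
    return "".join(out)


def ps_string(text: str) -> str:
    """Quote text as a PostScript string literal, escaping \\, ( and )."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


name = to_font_ascii("Роберт Вилхелмович Кромвелл")
print(f"50 mm 50 mm moveto {ps_string(name)} show")
# 50 mm 50 mm moveto (Robert Vilhelmoviq Kromvell) show
```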

Cyrillic in Unicode

The definitive answer is the Cyrillic code chart on the Unicode Consortium's site. I keep the table below for my own use: I have a copy on my laptop, so I don't have to bother rendering the Unicode PDF file. It also shows how well your browser renders Unicode. Both Firefox and Konqueror do a fine job on Linux and OpenBSD.

Unicode describes the codes as:
0400-040f — Cyrillic extensions
0410-044f — Basic Russian alphabet
0450-045f — Cyrillic extensions
0460-0481 — Historic letters
0482-0489 — Historic miscellaneous
048a-04f9 — Cyrillic extensions
04fa-04ff — Additions for Nivkh
0500-050f — Komi letters
0510-0513 — Cyrillic extensions
Codes 048a-04ff are mostly for Cyrillic representation of non-Slavic languages like Sami, Azerbaijani, Yakut, Tatar, and so on. Codes 0500-0513 are entirely for Cyrillic representation of Komi, Enets, Khanty, Chukchi, etc. Read the Unicode pages to see how arcane some of these are, and to get explanations, or at least names and language attributions, for all the characters.
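A short Python sketch that reports which of the ranges above a character falls in, together with its official Unicode name from the standard unicodedata module. The range table simply copies the list above; describe is a hypothetical helper name.

```python
import unicodedata

# The ranges as listed above (code points, inclusive).
RANGES = [
    (0x0400, 0x040F, "Cyrillic extensions"),
    (0x0410, 0x044F, "Basic Russian alphabet"),
    (0x0450, 0x045F, "Cyrillic extensions"),
    (0x0460, 0x0481, "Historic letters"),
    (0x0482, 0x0489, "Historic miscellaneous"),
    (0x048A, 0x04F9, "Cyrillic extensions"),
    (0x04FA, 0x04FF, "Additions for Nivkh"),
    (0x0500, 0x050F, "Komi letters"),
    (0x0510, 0x0513, "Cyrillic extensions"),
]


def describe(ch: str):
    """Return (range name, Unicode name), or None outside these ranges."""
    cp = ord(ch)
    for low, high, group in RANGES:
        if low <= cp <= high:
            return group, unicodedata.name(ch, "<unnamed>")
    return None


for ch in "дёѢ":
    print(f"U+{ord(ch):04X}", ch, describe(ch))
```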

To use this table in HTML, put the code between &#x and ; to form a numeric character reference. For example, the Russian word да is written as:
 &#x0434;&#x0430;
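Typing these references by hand gets tedious, so here is a small Python sketch that writes text as hex references in the same &#x....; style, leaving plain ASCII alone by default. The function name to_ncr is hypothetical; html.unescape from the standard library turns the references back into text.

```python
import html


def to_ncr(text: str, keep_ascii: bool = True) -> str:
    """Write each (non-ASCII) character as an HTML hex reference."""
    return "".join(
        ch if keep_ascii and ord(ch) < 128 else f"&#x{ord(ch):04x};"
        for ch in text
    )


encoded = to_ncr("да")
print(encoded)                 # &#x0434;&#x0430;
print(html.unescape(encoded))  # да
```

Python can also do this without a helper: "да".encode("ascii", "xmlcharrefreplace") gives b'&#1076;&#1072;', the same references in decimal, which browsers accept as well.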

Cyrillic code chart (hexadecimal). Each row holds 16 consecutive characters: the row label is the code of the first one, and the column heading gives the last hex digit. For example, д sits in row 0430, column 4, so its code is 0434. The basic Russian alphabet is rows 0410 through 0440.

        0 1 2 3 4 5 6 7 8 9 a b c d e f
  0400  Ѐ Ё Ђ Ѓ Є Ѕ І Ї Ј Љ Њ Ћ Ќ Ѝ Ў Џ
  0410  А Б В Г Д Е Ж З И Й К Л М Н О П
  0420  Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я
  0430  а б в г д е ж з и й к л м н о п
  0440  р с т у ф х ц ч ш щ ъ ы ь э ю я
  0450  ѐ ё ђ ѓ є ѕ і ї ј љ њ ћ ќ ѝ ў џ
  0460  Ѡ ѡ Ѣ ѣ Ѥ ѥ Ѧ ѧ Ѩ ѩ Ѫ ѫ Ѭ ѭ Ѯ ѯ
  0470  Ѱ ѱ Ѳ ѳ Ѵ ѵ Ѷ ѷ Ѹ ѹ Ѻ ѻ Ѽ ѽ Ѿ ѿ
  0480  Ҁ ҁ ҂ ҃ ҄ ҅ ҆ ҇ ҈ ҉ Ҋ ҋ Ҍ ҍ Ҏ ҏ
  0490  Ґ ґ Ғ ғ Ҕ ҕ Җ җ Ҙ ҙ Қ қ Ҝ ҝ Ҟ ҟ
  04a0  Ҡ ҡ Ң ң Ҥ ҥ Ҧ ҧ Ҩ ҩ Ҫ ҫ Ҭ ҭ Ү ү
  04b0  Ұ ұ Ҳ ҳ Ҵ ҵ Ҷ ҷ Ҹ ҹ Һ һ Ҽ ҽ Ҿ ҿ
  04c0  Ӏ Ӂ ӂ Ӄ ӄ Ӆ ӆ Ӈ ӈ Ӊ ӊ Ӌ ӌ Ӎ ӎ ӏ
  04d0  Ӑ ӑ Ӓ ӓ Ӕ ӕ Ӗ ӗ Ә ә Ӛ ӛ Ӝ ӝ Ӟ ӟ
  04e0  Ӡ ӡ Ӣ ӣ Ӥ ӥ Ӧ ӧ Ө ө Ӫ ӫ Ӭ ӭ Ӯ ӯ
  04f0  Ӱ ӱ Ӳ ӳ Ӵ ӵ Ӷ ӷ Ӹ ӹ Ӻ ӻ Ӽ ӽ Ӿ ӿ
  0500  Ԁ ԁ Ԃ ԃ Ԅ ԅ Ԇ ԇ Ԉ ԉ Ԋ ԋ Ԍ ԍ Ԏ ԏ
  0510  Ԑ ԑ Ԓ ԓ