Taming the wild unicode
December 2nd, 2010 10:39:10 pm pst by Sterling Camden
This post serves a dual purpose:
- Introduce a new utility, urxvt-selfont, which allows you to select a font for rxvt-unicode (urxvt)
- Share some tips and tricks for getting Unicode to really work in urxvt
urxvt-selfont: select font for urxvt
Choosing fonts for urxvt isn’t that difficult. You can either set the URxvt.font resource in .Xdefaults, or select a font at runtime with:
printf '\33]50;%s\007' fontname
For XFT fonts, the fontname must be preceded by ‘xft:’. To list fonts, you can use fc-list.
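If you switch fonts often, the escape sequence above can be wrapped in a small shell function. This is only a sketch: the name urxvt_font_seq is made up for illustration and is not part of urxvt or urxvt-selfont. It adds the ‘xft:’ prefix and, if you pass a second argument, a pixelsize qualifier.

```shell
# Hypothetical helper (not part of urxvt or urxvt-selfont): emit the
# OSC 50 sequence shown above for an Xft font, with an optional pixel size.
urxvt_font_seq() {
    family=$1
    size=${2:-}          # optional pixelsize; empty means "use the font's default"
    name="xft:$family"
    if [ -n "$size" ]; then
        name="$name:pixelsize=$size"
    fi
    # ESC ] 50 ; <fontname> BEL
    printf '\033]50;%s\007' "$name"
}

# Example, run inside urxvt:
#   urxvt_font_seq "DejaVu Sans Mono" 12
```

Run fc-list first to find the exact family name to pass in.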
To save iteration time when searching for acceptable fonts, I created the Ruby script urxvt-selfont, which you can download from one of the links at the bottom of this post. It presents the user with a list of fonts and their available style combinations as reported by fc-list. If the user selects one, urxvt-selfont then selects it for the current urxvt and prints its name on stdout. It can also take an optional -p switch (or --pixelsize, for the GNUly-initiated verbosophiles) to specify the desired pixelsize for the font. Otherwise, the default size is used, which for some fonts can be quite large.
Unicode support
The urxvt terminal contains excellent support for Unicode, but you probably won’t get the desired result from the default configuration. After I got it all figured out, it seems obvious and follows the documentation. But since someone as brilliant as I (judge for yourself where that places the bar) required several attempts at this problem before getting it right, perhaps someone else out there who is at least as stupid as I am could benefit from my experience.
First and foremost, you must set LC_CTYPE in the environment before you start urxvt. You can get uxterm (the other Unicode X terminal) to behave without this environment variable, but urxvt seems to require it. Mine, for instance, is now set on login as follows:
export LC_CTYPE=en_US.UTF-8
This tells urxvt that I generally use US English, and more importantly that my encoding is UTF-8. With this setting, it is not necessary to manipulate character sets in utilities like mutt — urxvt handles it for you.
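Before blaming the fonts, it can help to confirm that the environment really does ask for UTF-8. The sketch below assumes the usual precedence among the locale variables (LC_ALL overrides LC_CTYPE, which overrides LANG). The function names are invented for this example.

```shell
# Hypothetical helper: print the character-type locale a program would see,
# assuming the usual precedence LC_ALL > LC_CTYPE > LANG.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    else
        printf '%s\n' "${LANG:-}"
    fi
}

# Succeeds when that locale names a UTF-8 encoding (en_US.UTF-8, en_US.utf8, ...).
ctype_is_utf8() {
    case $(effective_ctype) in
        *.UTF-8*|*.utf8*|*.utf-8*|*.UTF8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example, in a login script before urxvt is started:
#   ctype_is_utf8 || export LC_CTYPE=en_US.UTF-8
```

Note that if LC_ALL is set to a non-UTF-8 locale, setting LC_CTYPE alone will not override it.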
The second step involves selecting fonts that support all of the Unicode code points that you want to be able to see. A good test for this can be found in the CSV download on this page. Simply cat the file in your urxvt session to see how many characters print correctly versus those that present a little cell-sized square.
The Code2000 font provides the best coverage of any that I’ve found, but it isn’t a very legible font for programmers like myself. I much prefer DejaVu Sans Mono for its Latin character glyphs (easy to distinguish 0/O, 1/l, etc.) and its clean anti-aliasing within urxvt. DejaVu provides pretty strong Unicode coverage, but it doesn’t cover all of the scripts that I like to be able to read — for instance, Hebrew.
Fortunately, urxvt allows you to use both. Here’s what my URxvt.font resource looks like:
URxvt.font: xft:DejaVu Sans Mono:pixelsize=12,xft:Code 2000:pixelsize=10
Allow me to parse that for you. The comma delimits multiple fonts that can be selected automatically by urxvt based on their support for each glyph. The selection gives priority to the first font specified, but if it does not provide a specific glyph then the next font is queried until one is found that does provide the glyph. Failing that, urxvt falls back to the first font.
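To make the comma-delimited priority order concrete, here is a rough sketch that assembles such a resource line from a list of fonts. The function name and the "Family=pixelsize" argument format are inventions for this example, not anything urxvt defines. The first argument becomes the preferred font.

```shell
# Hypothetical helper: print a URxvt.font resource line from
# "Family=pixelsize" arguments, in priority order (first = preferred font).
urxvt_font_resource() {
    value=
    for spec in "$@"; do
        family=${spec%=*}     # text before the last '='
        size=${spec##*=}      # text after the last '='
        entry="xft:$family:pixelsize=$size"
        if [ -z "$value" ]; then
            value=$entry
        else
            value="$value,$entry"
        fi
    done
    printf 'URxvt.font: %s\n' "$value"
}

# Reproduces the resource line shown above.
urxvt_font_resource "DejaVu Sans Mono=12" "Code 2000=10"
```

Piping the output to xrdb -merge should make the setting available to newly started terminals, though I would still keep the line in .Xdefaults as the permanent copy.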
If one of the fonts specified does not exist, urxvt silently ignores it. So watch out for spelling and punctuation errors! Those led me down many a blind alley.
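Since urxvt won’t tell you about a misspelled font, one way to catch it is to check each family name against what fc-list reports. The following is a sketch only. The helper name is made up, it reads family names on stdin so it can be fed from `fc-list : family`, and it deliberately ignores case and blanks when comparing, which is an assumption about how lenient you want the check to be.

```shell
# Hypothetical helper: succeed if the family named in $1 appears in a list
# of family names read from stdin (one font per line, as printed by
# `fc-list : family`). The comparison ignores case and blanks.
family_in_list() {
    want=$(printf '%s' "$1" | tr -d ' ' | tr '[:upper:]' '[:lower:]')
    # A line may hold several comma-separated names for one font.
    tr ',' '\n' | tr -d ' ' | tr '[:upper:]' '[:lower:]' | grep -qxF -- "$want"
}

# Example:
#   fc-list : family | family_in_list "Code2000" \
#       || echo "Code2000 does not appear to be installed" >&2
```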
So, with this configuration, urxvt prefers the DejaVu Sans Mono font. If it doesn’t provide one of the glyphs displayed, however, then urxvt will use Code2000 if that font provides it.
You may notice the different pixelsize qualifiers on the two fonts. The first font always determines the size of urxvt’s character cells. The glyphs in Code2000 are in some cases (especially Hebrew) too large for DejaVu’s cell size. I therefore shrink Code2000’s glyph size a bit so it will fit with perceptible spacing.
Here’s the result, using part of the example file I mentioned above:

You can see that the only script in this example that remains completely unsupported is the one used in Bhutan. One character seems to be missing from the Ethiopian set, but otherwise I’ve got full coverage for these examples.
As far as I can tell, urxvt does not support right-to-left text rendering, so for scripts such as Hebrew and Arabic you have to emit the characters in reverse order, left to right. In the image above you can see that the examples do not do this, so you have to read them bass-ackwards.





