The following is an archive of a post made to our 'vox-tech mailing list' by one of its subscribers.

Re: [vox-tech] vim and utf-8 support (newbie alert)

On Mon 09 Jun 03,  4:57 PM, Mark K. Kim <markslist@cbreak.org> said:
> On Mon, 9 Jun 2003, Peter Jay Salzman wrote:
> > the language i'm thinking of is hebrew, but with some important issues.
> >
> > 1. i need vowel support.
> > 2. i really want to have mixed hebrew/english
> >
> > i believe taken together, i want to use ISO 10646 which can represent
> > all languages at the same time.
> Unfortunately I don't know the Hebrew language so I don't know what the
> difficulties are.  For both Korean and Japanese, we use two bytes to
> represent a single Asian "character", while maintaining backwards
> compatibility with ASCII by using the MSb on the first byte to flag a
> multibyte character.
> If Hebrew does the same thing, there is no technical reason why it can't
> use both English and Hebrew.

fwiw, i happen to know this is the case.   :)    i think you just
described (part of the) utf-8 encoding... the portion of the encoding
which ensures backwards compatibility.

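the msb being set on every byte of a multibyte character is also part of
utf-8 encoding.  it means no byte of a multibyte character can ever be
zero (or look like any other ascii character), and a stray zero byte
would wreak havoc on C string handling.  that's why you don't see ucs-2
and ucs-4 encoding used much on linux: they don't set the msb, so
ordinary text ends up full of zero bytes.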
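
to make the zero byte point concrete, here's a small python sketch (not
anything from vim or xterm).  the sample string and the helper names
has_nul_byte and msb_set are made up for illustration.  it encodes mixed
english/hebrew text three ways and checks which encodings contain zero
bytes:

```python
# a sketch: compare how utf-8, utf-16 and utf-32 lay out mixed text.
# the sample string and helper names are hypothetical, for illustration only.

def has_nul_byte(data: bytes) -> bool:
    """True if any byte is 0x00, which would end a C string early."""
    return 0 in data

def msb_set(byte: int) -> bool:
    """True if the most significant bit of the byte is set."""
    return byte & 0x80 != 0

# "shalom " in ascii followed by the same word in hebrew with vowel points
text = "shalom \u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")   # 2 bytes per character here
utf32 = text.encode("utf-32-le")   # 4 bytes per character

for name, data in (("utf-8", utf8), ("utf-16", utf16), ("utf-32", utf32)):
    print(name, len(data), "bytes, contains NUL:", has_nul_byte(data))

# the ascii part is unchanged in utf-8; every hebrew byte has the msb set
ascii_part, hebrew_part = utf8[:7], utf8[7:]
print(ascii_part, all(msb_set(b) for b in hebrew_part))
```

the ascii prefix comes through utf-8 byte for byte, while the 16 and
32 bit encodings pad each ascii character with zero bytes.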

> > as a first stab at getting utf-8 capable xterms, i set:
> >
> >    LC_CTYPE=en_US.UTF-8
> >
> > but weird things started to happen, like mutt's threading lines turned
> > into really strange characters.  i guess the applications themselves
> > need to be utf-8 aware too.
> UTF-8 is compatible only with the standard ASCII set.  The threading lines
> are in the extended ASCII set (it uses the MSb), not the standard ASCII
> set.  They clash because UTF-8 uses the MSb to signal a multibyte
> character, while the extended ASCII set uses the MSb for its own
> single-byte characters.
> I recommend just ignoring it (you get used to it).  If not, I think you
> can tell Mutt to use standard ASCII for threading lines (using +, -, |,
> etc.)
unicode includes mathematical and scientific symbols, so those extended
characters are in there somewhere.  it's probably just a matter of
whether you can tell mutt which characters to use for threading (and
how, of course).
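
a rough python sketch of the clash with the threading lines.  it
assumes the line-drawing character comes from the cp437 code page, which
is only a guess at what "extended ASCII" means here; the helper
decodes_as_utf8 is made up for illustration:

```python
# a sketch: one line-drawing character as a legacy 8-bit byte and as utf-8.
# cp437 is an assumed example of an "extended ascii" code page.

def decodes_as_utf8(data: bytes) -> bool:
    """True if the bytes form valid utf-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

horizontal = "\u2500"                      # BOX DRAWINGS LIGHT HORIZONTAL

legacy_byte = horizontal.encode("cp437")   # one byte with the msb set
utf8_bytes = horizontal.encode("utf-8")    # three bytes, all with the msb set

print(legacy_byte, decodes_as_utf8(legacy_byte))
print(utf8_bytes, decodes_as_utf8(utf8_bytes))
```

the single legacy byte looks like the start of a multibyte character to
a utf-8 terminal, so it can't be displayed as intended, while the same
character encoded as utf-8 is perfectly valid.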

heh.  i read that klingon and the tengwar have even been proposed for unicode.

unicode has everything we need.  it's "just" a matter of getting
software to use it correctly.  but boy oh boy are there a lot of details
in that word "just"...   :(

> It's one of the reasons I have WindowsXP.  The international language
> support is so amazing.  I can read multi-language data files with so much
> ease.  I've seen Windows2000 also do a very nice job.

you're breaking my heart...   :(


GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D