Unicode and font files question

Hi there;

We have an application running under QNX 4.25 / Photon 1.14, and we’d like to
add foreign language support. Most of the draw operations in a significant
part of the system pass through one function: PgDrawText.

To summarize my questions at the start:

  1. How do I make the system show me specific characters from a given
    character set?

  2. How do I find out the relationship between 16-bit Unicode values and the
    characters that will be displayed for them?

I’ve played around with this, and I’ve had some luck making non-US
characters appear.

The approach I’ve tried is:

static PxTransCtrl *myTrans;   /* translation control, set up once */
int srctaken, dstmade;         /* bytes consumed from str, bytes written to utf */
char str[SOME_BYTES];          /* input from calling function */
char utf[1024 * MB_LEN_MAX];   /* UTF-8 output buffer */

myTrans = PxTranslateSet(NULL, "ISO_8859-2:1987");   /* done just once */

PxTranslateToUTF(myTrans, str, strlen(str), &srctaken,
                 utf, sizeof(utf), &dstmade);
PgSetFont("pcterm12");
PgDrawText(utf, dstmade, &Pos, Pg_BACK_FILL);

This gives me plain ASCII (US) characters when the input characters stay
below 0x7F, just as I expect. When I give it characters above 0x7F, the
output of PxTranslateToUTF starts doing its translation thing, and the
characters drawn by PgDrawText take on that international look I’m after.
All well and good.

The big question is: how do I make it show me specific characters from a
specific set? An interesting set to look at would be Korean, or Cyrillic.

A limitation I can see here is that the data I’m feeding my function are 8
bit values. How do I find out what characters go with 16 bit Unicode values?
Is there a way to look at a font and see the characters in it? Some of the
font files in the /usr/photon/font directory look kind of small (some as
small as 3k). No way are they completely populated with 64k characters.

I can look at the characters that come out of the PxTranslateToUTF function.
They look like normal ASCII characters as long as I stay below 0x7F (I
expected this).

When I put characters above 0x7F into the str array, then I get “translated”
characters out the other side. This is again kind of what I would expect.

Is there a way to look at a font file and see what characters are available
in it?

Thanks for any help you can give.

Steve Shumway
Software Engineer
Facts, Inc.

sshumway@facts-inc.com

> 1. How do I make the system show me specific characters from a given
>    character set?
>
> 2. How do I find out the relationship between 16 bit Unicode values and the
>    characters that will be displayed for them?

I’m a little confused about what you are asking for, but here is
a brief summary.

Unicode is a very large, predefined character code set that includes
many separate character sets. 16-bit Unicode characters must be
converted to an 8-bit encoding called UTF-8 before they can be displayed
in a text widget. UTF-8 is not so much a translation as an unrolling
of the Unicode value: you can convert algorithmically between Unicode
and UTF-8 and back. QSSL provides routines that do this. Take a look
at: mblen(), mbstowcs(), mbtowc(), wcstombs(), wctomb()

While a Unicode character is always one 16-bit value, a UTF-8
character can be 1, 2, or 3 8-bit bytes. As you’ve found, 0x00-0x7F
in UTF-8 is equivalent to 0x0000-0x007F in Unicode.
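
As a rough illustration of the "unrolling" described above, here is a minimal C sketch that encodes one 16-bit Unicode value as 1 to 3 UTF-8 bytes. The function name ucs2_to_utf8 is hypothetical (it is not a QNX or Photon routine), and the sketch assumes the caller supplies a buffer of at least 3 bytes. Surrogate values (0xD800-0xDFFF) get no special treatment.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one 16-bit Unicode value as UTF-8.
 * Writes 1 to 3 bytes into out (at least 3 bytes long) and
 * returns the number of bytes written.
 * Hypothetical helper for illustration only. */
size_t ucs2_to_utf8(unsigned int code, unsigned char *out)
{
    assert(out != NULL);
    code &= 0xFFFFu;                      /* keep only the low 16 bits */

    if (code < 0x80) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)code;
        return 1;
    }
    if (code < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (code >> 6));
        out[1] = (unsigned char)(0x80 | (code & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (code >> 12));
    out[1] = (unsigned char)(0x80 | ((code >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (code & 0x3F));
    return 3;
}
```

For example, 0x0041 stays a single byte 0x41, while 0x3260 becomes the three bytes 0xE3 0x89 0xA0.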

In order to display a specific Unicode/UTF-8 character, you first
must be using a font that provides that character. I found a
fairly complete listing describing Unicode somewhere on the internet.
Here is a short section that I believe pertains to Korean characters.

If you send me email, I’ll send you the entire listing.

3260;CIRCLED HANGUL KIYEOK;So;0;L; 1100;;;;N;CIRCLED HANGUL GIYEOG;;;;
3261;CIRCLED HANGUL NIEUN;So;0;L; 1102;;;;N;;;;;
3262;CIRCLED HANGUL TIKEUT;So;0;L; 1103;;;;N;CIRCLED HANGUL DIGEUD;;;;
3263;CIRCLED HANGUL RIEUL;So;0;L; 1105;;;;N;CIRCLED HANGUL LIEUL;;;;
3264;CIRCLED HANGUL MIEUM;So;0;L; 1106;;;;N;;;;;
3265;CIRCLED HANGUL PIEUP;So;0;L; 1107;;;;N;CIRCLED HANGUL BIEUB;;;;
3266;CIRCLED HANGUL SIOS;So;0;L; 1109;;;;N;;;;;
3267;CIRCLED HANGUL IEUNG;So;0;L; 110B;;;;N;;;;;
3268;CIRCLED HANGUL CIEUC;So;0;L; 110C;;;;N;CIRCLED HANGUL JIEUJ;;;;
3269;CIRCLED HANGUL CHIEUCH;So;0;L; 110E;;;;N;CIRCLED HANGUL CIEUC;;;;
326A;CIRCLED HANGUL KHIEUKH;So;0;L; 110F;;;;N;CIRCLED HANGUL KIYEOK;;;;
326B;CIRCLED HANGUL THIEUTH;So;0;L; 1110;;;;N;CIRCLED HANGUL TIEUT;;;;
326C;CIRCLED HANGUL PHIEUPH;So;0;L; 1111;;;;N;CIRCLED HANGUL PIEUP;;;;
326D;CIRCLED HANGUL HIEUH;So;0;L; 1112;;;;N;;;;;
326E;CIRCLED HANGUL KIYEOK A;So;0;L; 1100 1161;;;;N;CIRCLED HANGUL GA;;;;
326F;CIRCLED HANGUL NIEUN A;So;0;L; 1102 1161;;;;N;CIRCLED HANGUL NA;;;;
3270;CIRCLED HANGUL TIKEUT A;So;0;L; 1103 1161;;;;N;CIRCLED HANGUL DA;;;;
3271;CIRCLED HANGUL RIEUL A;So;0;L; 1105 1161;;;;N;CIRCLED HANGUL LA;;;;
3272;CIRCLED HANGUL MIEUM A;So;0;L; 1106 1161;;;;N;CIRCLED HANGUL MA;;;;
3273;CIRCLED HANGUL PIEUP A;So;0;L; 1107 1161;;;;N;CIRCLED HANGUL BA;;;;
3274;CIRCLED HANGUL SIOS A;So;0;L; 1109 1161;;;;N;CIRCLED HANGUL SA;;;;
3275;CIRCLED HANGUL IEUNG A;So;0;L; 110B 1161;;;;N;CIRCLED HANGUL A;;;;
3276;CIRCLED HANGUL CIEUC A;So;0;L; 110C 1161;;;;N;CIRCLED HANGUL JA;;;;
3277;CIRCLED HANGUL CHIEUCH A;So;0;L; 110E 1161;;;;N;CIRCLED HANGUL CA;;;;
3278;CIRCLED HANGUL KHIEUKH A;So;0;L; 110F 1161;;;;N;CIRCLED HANGUL KA;;;;
3279;CIRCLED HANGUL THIEUTH A;So;0;L; 1110 1161;;;;N;CIRCLED HANGUL TA;;;;
327A;CIRCLED HANGUL PHIEUPH A;So;0;L; 1111 1161;;;;N;CIRCLED HANGUL PA;;;;
327B;CIRCLED HANGUL HIEUH A;So;0;L; 1112 1161;;;;N;CIRCLED HANGUL HA;;;;
327F;KOREAN STANDARD SYMBOL;So;0;L;;;;;N;;;;;
3280;CIRCLED IDEOGRAPH ONE;No;0;L; 4E00;;;1;N;;;;;
3281;CIRCLED IDEOGRAPH TWO;No;0;L; 4E8C;;;2;N;;;;;
3282;CIRCLED IDEOGRAPH THREE;No;0;L; 4E09;;;3;N;;;;;
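
Each record in the listing above is a semicolon-separated line whose first field is the code value in hexadecimal and whose second field is the character name. A minimal C sketch for pulling those two fields out of one line might look like this. The function name parse_unicode_record is hypothetical, and the remaining fields are simply ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Extract the hex code value and the character name from one record,
 * e.g. "3260;CIRCLED HANGUL KIYEOK;So;0;L;...".
 * Returns 0 on success, -1 if the line does not have that shape.
 * The name is truncated if it does not fit in name_size bytes.
 * Hypothetical helper for illustration only. */
int parse_unicode_record(const char *line, unsigned long *code,
                         char *name, size_t name_size)
{
    const char *first, *second;
    char *end;
    size_t n;

    assert(line != NULL && code != NULL && name != NULL && name_size > 0);

    first = strchr(line, ';');            /* end of the code field */
    if (first == NULL)
        return -1;
    second = strchr(first + 1, ';');      /* end of the name field */
    if (second == NULL)
        return -1;

    *code = strtoul(line, &end, 16);
    if (end == line || end != first)      /* field must be all hex digits */
        return -1;

    n = (size_t)(second - first - 1);
    if (n >= name_size)
        n = name_size - 1;
    memcpy(name, first + 1, n);
    name[n] = '\0';
    return 0;
}
```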

Mitchell Schoenbrun --------- maschoen@pobox.com

I now have a somewhat clearer understanding of Unicode, thank you.

I think I now understand how this works. Let me see if I have this right:

  1. A momentary rehash: It looks to me like passing a single byte character
    in the range 0 through 0x7F into PxTranslateToUTF along with the PxTransCtrl
    structure (from PxTranslateSet) gives back normal old ASCII chars.

  2. Now to the new stuff: If I pass a character in the range 0x80 through
    0xFF into PxTranslateToUTF the same way, it gets converted into between 1
    and 3 UTF8 bytes.

  3. I then pass these UTF8 bytes into the PgDrawText function. If I have a
    font file selected which contains the specifically requested characters,
    they will be drawn / displayed.

One of my original questions (although poorly stated perhaps) is how do I
know what characters are actually in a font file? What do they look like?

A new question would seem to be: how do the characters in the range of 0x80
through 0xFF map into the Unicode character set? Clearly the mapping is
selected by the PxTranslateSet function. Given this translation operation,
how do we know where, for instance, 0x80 winds up? This brings up
the next question: 0x80 through 0xFF represents 128 characters. Is this
enough for, say, Japanese or Chinese?

Steve Shumway
sshumway@facts-inc.com



Previously, Steve Shumway wrote in qdn.public.qnx4.photon:

> One of my original questions (although poorly stated perhaps) is how do I
> know what characters are actually in a font file? What do they look like?

I don’t really know the answer to this. There might have been a utility
around that displayed the contents.

> A new question would seem to be: how do the characters in the range of 0x80
> through 0xFF map into the Unicode character set? Clearly this mapping is
> performed by the PxTranslateSet function.

When you say “characters in the range of 0x80 through 0xFF”, what
specifically do you mean? There is more than one encoding of
these characters.

PxTranslateSet() will answer this question if you know the code set that
you are translating from to Unicode.

> Given this translation operation,
> how do we know where for instance 0x80 winds up mapped to?

That would depend on what character you mean by 0x80.
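
To illustrate why the answer depends on the code set, here is a small C sketch showing where the single byte 0xA1 lands in Unicode under three different ISO 8859 parts. The values come from the published ISO 8859 mapping tables; the function name and the label strings are hypothetical and are not necessarily the names PxTranslateSet() accepts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One byte value, three source code sets, three different results. */
struct a1_mapping {
    const char  *charset;   /* label only, chosen for this sketch */
    unsigned int unicode;   /* Unicode value that byte 0xA1 maps to */
};

static const struct a1_mapping a1_table[] = {
    { "ISO 8859-1", 0x00A1 },   /* INVERTED EXCLAMATION MARK */
    { "ISO 8859-2", 0x0104 },   /* LATIN CAPITAL LETTER A WITH OGONEK */
    { "ISO 8859-5", 0x0401 },   /* CYRILLIC CAPITAL LETTER IO */
};

/* Return the Unicode value for byte 0xA1 in the named code set,
 * or 0 if the code set is not in this small table. */
unsigned int unicode_for_byte_a1(const char *charset)
{
    size_t i;

    assert(charset != NULL);
    for (i = 0; i < sizeof a1_table / sizeof a1_table[0]; i++) {
        if (strcmp(a1_table[i].charset, charset) == 0)
            return a1_table[i].unicode;
    }
    return 0;
}
```

The same input byte is a punctuation mark, an accented Latin letter, or a Cyrillic letter depending solely on which source code set is assumed.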

> This brings up
> the next question: 0x80 through 0xFF represents 128 characters. Is this
> enough for say Japanese?, Chinese?

No, and why do you think it would need to be? Once translated to Unicode,
you are working in a set of 64K characters.

I hope you are not confusing the two translations we are talking about.
PxTranslateSet() is for translating characters from one character set
to another. This is quite different from translating Unicode back and
forth to UTF-8. Unicode and UTF-8 are different representations of the
same character set. One is 16 bits per character, and the other is 1 to 3
8-bit bytes per character.
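
Because the two forms represent the same character set, going from UTF-8 back to the 16-bit value is purely mechanical. Here is a minimal C sketch under that assumption. The function name utf8_to_ucs2 is hypothetical, it handles only 1 to 3 byte sequences, and it does not reject overlong encodings.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence of 1 to 3 bytes into a 16-bit Unicode value.
 * Returns the number of bytes consumed, or 0 if the input is truncated,
 * malformed, or starts a sequence longer than 3 bytes.
 * Hypothetical helper for illustration only. */
size_t utf8_to_ucs2(const unsigned char *in, size_t len, unsigned int *code)
{
    assert(in != NULL && code != NULL);

    if (len >= 1 && in[0] < 0x80) {                    /* 0xxxxxxx */
        *code = in[0];
        return 1;
    }
    if (len >= 2 && (in[0] & 0xE0) == 0xC0 &&          /* 110xxxxx */
        (in[1] & 0xC0) == 0x80) {                      /* 10xxxxxx */
        *code = ((in[0] & 0x1Fu) << 6) | (in[1] & 0x3Fu);
        return 2;
    }
    if (len >= 3 && (in[0] & 0xF0) == 0xE0 &&          /* 1110xxxx */
        (in[1] & 0xC0) == 0x80 &&
        (in[2] & 0xC0) == 0x80) {
        *code = ((in[0] & 0x0Fu) << 12) |
                ((in[1] & 0x3Fu) << 6) |
                (in[2] & 0x3Fu);
        return 3;
    }
    return 0;
}
```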



Mitchell Schoenbrun --------- maschoen@pobox.com