Text Tool and Unicode

 From:  Michael Gibson
2464.5 In reply to 2464.4 
Hi Tim, thanks for the link.

I've looked into this today and unfortunately it appears that there are limitations in Windows which prevent this from working.

MoI calls a Windows function called GetGlyphOutline to retrieve the geometry for a given character glyph.

Previously, MoI was not decoding "surrogate pairs" properly so it would see Unicode characters above U+FFFF as 2 garbage characters instead of as a single character value.
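
To illustrate what decoding a surrogate pair involves, here is a minimal sketch in Python. It is not MoI's actual code; the function name decode_utf16_units is made up for this example, and it simply passes unpaired surrogates through unchanged rather than treating them as errors.

```python
def decode_utf16_units(units):
    """Combine a list of 16-bit UTF-16 code units into Unicode code points.

    Hypothetical helper for illustration only, not taken from MoI.
    """
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # A high surrogate followed by a low surrogate encodes one
            # character above U+FFFF: 10 bits from each half, plus 0x10000.
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # Ordinary character (or an unpaired surrogate, passed through as-is).
            code_points.append(unit)
            i += 1
    return code_points


# U+10024 is stored in UTF-16 as the two units 0xD800 and 0xDC24.
print([hex(cp) for cp in decode_utf16_units([0xD800, 0xDC24])])
```

Without the pairing step, the two units 0xD800 and 0xDC24 would be treated as two separate (and meaningless) characters, which matches the old behavior described above.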

I have now fixed this part of MoI for the next v2 beta, so the Text command can correctly decode character values above U+FFFF. However, when MoI calls the Windows function GetGlyphOutline with a character value greater than U+FFFF, the function appears to ignore the high bits and use only the low 16 bits.

So for example, when MoI requests the geometry for character U+10024, Windows instead returns the geometry for character U+0024.
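
The arithmetic behind that example can be checked with a small Python sketch. This only models the observed result by masking off everything above the low 16 bits; it is an assumption about the effect, not a description of how Windows is implemented internally, and the function name low_16_bits is made up for this example.

```python
def low_16_bits(code_point):
    """Keep only the low 16 bits of a code point.

    Hypothetical model of the observed truncation, for illustration only.
    """
    return code_point & 0xFFFF


requested = 0x10024
returned = low_16_bits(requested)
print(f"requested U+{requested:04X}, got U+{returned:04X}")
```

Any character above U+FFFF would collide with some character in the U+0000 to U+FFFF range in the same way.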

I've got it all set up now so that MoI passes the correct value to Windows, so hopefully if Microsoft corrects this problem in the future it will then start to work. But until then there doesn't seem to be anything that I can do about it, short of developing a completely custom font parser that reads the font data directly rather than using the Windows API call. Unfortunately that would take a significant amount of work. There are some libraries out there that could help with this, but even incorporating a library can take a fair amount of time.

If you know of any other programs that are able to extract geometry from Unicode characters above U+FFFF, please let me know and maybe I can see what they are doing to accomplish that. But my guess is that other programs will likely have the same issue as MoI.

Other parts of Windows, such as the font rasterizer, are able to understand these special characters, but it seems that this specific GetGlyphOutline function with the GGO_NATIVE flag unfortunately was not updated along with the rest of Windows when Unicode was expanded to include them. The original Unicode 1.0 specification only had characters with values up to U+FFFF.

- Michael