Thread

Topic: Unicode source character set (Was: pi, epsilon and others)

Author: kanze@gabi-soft.de (James Kanze)
Date: Sun, 1 Dec 2002 18:40:20 +0000 (UTC)

K.Hagan@thermoteknix.co.uk ("Ken Hagan") wrote in message
news:<as4qog$1m5$1$8302bc10@news.demon.co.uk>...
> John Nagle <nagle@animats.com> wrote in message
> > I'd argue for a standardized name for the value of "pi", at least.

> James Kanze <kanze@gabi-soft.de> wrote:
> >And that the symbol name should be std::\u03C0, of course.

> "Ross Ridge" <rridge@calum.csclub.uwaterloo.ca> wrote...
> > I wish I knew if you were serious or not.

> But of course he's serious. :)

I'm always serious.

Except when I'm joking, of course.  (Regrettably, I generally forget the
smiley to indicate that I'm joking.  As in the preceding paragraph.)

Seriously: standard C++ already uses too many characters, and I would
strongly oppose any additions in the standard library that use
characters other than those in the basic character set.

> If I might lift Thomas Mang's proposal in the thread "Three features
> you would like to have not in C++" (The argument presumably applies to
> C as well)...

>  > Interestingly, it would be quite easy to write a program that takes
>  > as input some C++ code with the 'explicit' syntax, and produce as
>  > output C++ code that has 'implicit' syntax.

> It would be equally easy to write a program that took source files
> written in UTF-8 and wrote source files that used things like
> std::\u03C0. James may already have one.

Not as a stand-alone program, but I do have a filtering wstreambuf which
works in the opposite direction: the output contains no \u or \U
sequences.
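
As an illustration of that direction only (this is not the filtering
wstreambuf mentioned above), here is a minimal sketch of a function that
replaces \uXXXX and \UXXXXXXXX sequences with the UTF-8 encoding of the
named code point.  The names expand_ucns, append_utf8 and hex_value are
hypothetical, and the sketch assumes the output encoding is UTF-8.  It
does not treat an escaped backslash before the 'u' specially, and it
does not reject surrogate code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of code point cp to out.
void append_utf8(std::string& out, unsigned long cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Return the numeric value of hex digit c, or -1 if c is not a hex digit.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Copy in to the result, replacing each well-formed \uXXXX or \UXXXXXXXX
// with UTF-8 bytes.  Anything that does not parse is copied unchanged.
std::string expand_ucns(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] == '\\' && i + 1 < in.size()
                && (in[i + 1] == 'u' || in[i + 1] == 'U')) {
            const std::size_t digits = (in[i + 1] == 'u') ? 4 : 8;
            if (i + 2 + digits <= in.size()) {
                unsigned long cp = 0;
                bool ok = true;
                for (std::size_t k = 0; ok && k < digits; ++k) {
                    const int v = hex_value(in[i + 2 + k]);
                    if (v < 0) {
                        ok = false;
                    } else {
                        cp = cp * 16 + static_cast<unsigned long>(v);
                    }
                }
                if (ok && cp <= 0x10FFFF) {
                    append_utf8(out, cp);
                    i += 2 + digits;
                    continue;
                }
            }
        }
        out += in[i++];
    }
    return out;
}
```

Under those assumptions, expand_ucns applied to the text std::\u03C0
yields std:: followed by the two bytes 0xCF 0x80.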

> I'd like to suggest building this facility into the preprocessor.

> Microsoft support "#pragma code_page(n)" in their resource compiler,
> which directs its preprocessor to interpret all subsequent bytes in
> the code page "n". The directive probably shouldn't apply to files
> subsequently #include-d.

We've already discussed the problem of how a compiler should handle
characters in the extended character set in one or more threads in the
past.  The pragma is a good idea.  Or would be, if it had a standard
name; "code_page" and the corresponding numeric values are really too
Microsoft specific.

> With this support in place, it would be fairly easy for IDEs and
> editors to read and write source files with that pragma as their first
> line, and let programmers play with the whole Unicode character set
> without ever seeing "std::\u03C0".

IMHO, this was the intent.  IMHO, it was also the intent with regard to
trigraphs.

At present, regrettably, support for both is at about the same level.
And in portable code, you don't use trigraphs, and you don't use
characters outside of the basic character set.  I could almost live with
this for symbol names, but it means having to hack a bit for any string
constants.
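
One possible form of such a hack, offered as a sketch and not as a
recommendation: spell the bytes of the character with hex escapes, so
that the source file itself stays within the basic character set.  This
assumes the program wants UTF-8 at run time; the names pi_label,
pi_suffix and check_pi_literals are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// U+03C0 is the two bytes 0xCF 0x80 in UTF-8 (an assumption about the
// encoding wanted at run time; the source uses only basic characters).
const char pi_label[] = "\xCF\x80 = 3.14159";

// A hex escape consumes every hex digit that follows it, so a following
// 'a' must go in a separate literal; adjacent literals are concatenated.
const char pi_suffix[] = "\xCF\x80" "a";

// Check the byte counts the two literals are expected to have.
void check_pi_literals()
{
    assert(std::strlen(pi_label) == 12);   // 2 bytes for pi + 10 ASCII
    assert(std::strlen(pi_suffix) == 3);   // 2 bytes for pi + 'a'
}
```

The price is that the literal is unreadable in the source and is tied to
one run-time encoding.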

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: K.Hagan@thermoteknix.co.uk ("Ken Hagan")
Date: Thu, 28 Nov 2002 20:03:29 +0000 (UTC)

John Nagle <nagle@animats.com> wrote in message
> I'd argue for a standardized name for the value of "pi", at least.

James Kanze <kanze@gabi-soft.de> wrote:
>And that the symbol name should be std::\u03C0, of course.

"Ross Ridge" <rridge@calum.csclub.uwaterloo.ca> wrote...
> I wish I knew if you were serious or not.

But of course he's serious. :)

If I might lift Thomas Mang's proposal in the thread "Three features you
would like to have not in C++" (The argument presumably applies to C as
well)...

 > Interestingly, it would be quite easy to write a program that takes
 > as input some C++ code with the 'explicit' syntax, and produce as
 > output C++ code that has 'implicit' syntax.

It would be equally easy to write a program that took source files
written in UTF-8 and wrote source files that used things like
std::\u03C0. James may already have one.
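
A minimal sketch of what the core of such a program might look like,
assuming the input really is UTF-8.  The name escape_non_ascii is
hypothetical.  The sketch makes no distinction between identifiers,
string literals and comments, copies malformed bytes through unchanged,
and does not reject overlong encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Replace every multi-byte UTF-8 sequence in `in` with \uXXXX (code points
// below 0x10000) or \UXXXXXXXX (everything else).  ASCII is copied as is.
std::string escape_non_ascii(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        if (lead < 0x80) {
            out += in[i++];
            continue;
        }

        // Work out the sequence length and payload bits from the lead byte.
        std::size_t len = 0;
        unsigned long cp = 0;
        if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }

        // Each continuation byte must look like 10xxxxxx.
        bool ok = (len != 0) && (i + len <= in.size());
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) {
                ok = false;
            } else {
                cp = (cp << 6) | (c & 0x3F);
            }
        }
        if (!ok) {
            out += in[i++];   // malformed input: copy the byte unchanged
            continue;
        }

        char buf[11];         // backslash, 'U', 8 hex digits, terminator
        int n;
        if (cp < 0x10000) {
            n = std::snprintf(buf, sizeof buf, "\\u%04lX", cp);
        } else {
            n = std::snprintf(buf, sizeof buf, "\\U%08lX", cp);
        }
        assert(n > 0 && n < static_cast<int>(sizeof buf));
        (void)n;
        out += buf;
        i += len;
    }
    return out;
}
```

Given std:: followed by the UTF-8 bytes for the Greek letter pi, this
returns the text std::\u03C0.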

I'd like to suggest building this facility into the preprocessor.

Microsoft support "#pragma code_page(n)" in their resource compiler,
which directs its preprocessor to interpret all subsequent bytes in the
code page "n". The directive probably shouldn't apply to files
subsequently #include-d.

With this support in place, it would be fairly easy for IDEs and editors
to read and write source files with that pragma as their first line, and
let programmers play with the whole Unicode character set without ever
seeing "std::\u03C0".
