Topic: Once again a Plea for proper International Character support in C++.


Author: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: Fri, 18 Oct 2002 14:14:12 CST
rmaddox@isicns.com (Randy Maddox) writes:

> > Yes, this is what windows NT/2000/XP does. No, it is not
> > trivial. For characters outside the basic (7-bit) ASCII character
> > set, the conversion depends on the current code page. Therefore,
> > if you change the code page, you might not be able to read back
> > the file, as the necessary characters might have different
> > (narrow) code points, or might not be present.
>
> Yes, but if the user changes the code page all bets are off anyway.

The problem is worse: Some file names are not accessible through the
narrow string API, but still might be accessible to the wide string
API. So even though other programs on the system can happily access
the files, standard C++ programs cannot.

> And since that is a user-settable function at the OS level, it's
> completely outside the scope of C++.  The user might also change
> permissions on the file so that we couldn't read it back too.  There
> are any number of other scenarios, but the common factor is that they
> are outside the scope of C++ and so do not, IMHO, constitute an
> argument against supporting wide character file names.

If it were only a matter of users changing some system setting, I'd
agree. However, users can be confronted with files that they can
access without problems when using programs not written in standard
C++.

> We are already implicitly using wide-character file names in any C++
> code that runs under NT-based versions of Windows anyway, and thus are
> already subject to the code page change problem, so how can explicit
> C++ support for wide character file names make this situation any
> worse?

I don't understand this remark. Exposing wide character file names
would not make the situation worse, but better. Also, the wide strings
are not affected by code page changes - the narrow strings are.

It is the wide strings that are native to the system: the narrow
string file names are derived.
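
A minimal sketch of why a derived narrow name can fail to exist at all,
assuming only the standard C conversion functions. The helper name
survives_narrow_round_trip is hypothetical; it asks whether a wide name
can be converted to the current locale's narrow encoding and back
without loss. A name that fails this check is the kind that a purely
narrow API cannot reach.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: returns true if `wide` survives a conversion to the
// current locale's narrow (multibyte) encoding and back again unchanged.
bool survives_narrow_round_trip(const std::wstring& wide)
{
    // Worst case: MB_CUR_MAX bytes per wide character, plus the terminator.
    std::vector<char> narrow(wide.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(narrow.data(), wide.c_str(), narrow.size());
    if (n == static_cast<std::size_t>(-1))
        return false;                  // some character has no narrow form

    // Convert back; the result cannot be longer than the original.
    std::vector<wchar_t> back(wide.size() + 1);
    std::size_t m = std::mbstowcs(back.data(), narrow.data(), back.size());
    if (m == static_cast<std::size_t>(-1))
        return false;

    return std::wstring(back.data(), m) == wide;
}
```

In the default "C" locale a plain ASCII name passes, while a name
containing U+20AC does not. What happens under other locales or code
pages depends on the platform.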

Regards,
Martin

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: kanze@gabi-soft.de (James Kanze)
Date: Fri, 18 Oct 2002 18:05:00 CST
hyrosen@mail.com (Hyman Rosen) wrote in message
news:<1034880394.30252@master.nyc.kbcfp.com>...
> James Kanze wrote:
>  > I wouldn't be surprised if Java didn't systematically convert the
>  > filename to narrow characters even under Windows.

> I tried the sample Java notepad application in Windows.  I gave it a
> filename full of foreign characters, including Chinese, and it
> faithfully created a file with that name, at least as displayed by
> Explorer.

> So it looks like Java has taken the same hacker's approach that lots
> of us here are advocating - do it wide if you can, and do something
> arbitrary and possibly appropriate if you can't.

Maybe.  To be truthful, I'd like to know what Java really does.  Both on
Windows and in the case of Unix (Solaris).  But whatever it does is
existing practice, of sorts, which is a significant advance.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung


Author: kanze@gabi-soft.de (James Kanze)
Date: Fri, 18 Oct 2002 18:05:08 CST
eldiener@earthlink.net ("Edward Diener") wrote in message
news:<hMNq9.35079$OB5.2747089@newsread2.prod.itd.earthlink.net>...
> "James Kanze" <kanze@gabi-soft.de> wrote in message
> news:d6651fb6.0210140503.3c2542ba@posting.google.com...

> > I also noticed something else when doing these tests.  There is
> > another issue that no one seems to have raised.  If I understand
> > correctly, Windows also offers native wide character files.  But
> > neither Java nor C++ can read or write directly using wide
> > characters -- both convert systematically to bytes on input and
> > output.

> Why can not C++ read or write wide characters on Windows ? Please
> explain this.

It can.  In the same way it supports Posix threads under Solaris: a
system specific extension, for something that is not generally
available.  The fact that one particular operating system has an
interface which is not available via standard C or C++ functions is NOT
an argument for extending the standard.  (There are valid arguments for
wide character filenames, and I'd be for them, if I only had some idea
what to expect from an implementation.  But the fact that Windows
happens to have a direct system call isn't an argument, anymore than the
fact that Unix has a function named fork is an argument for supporting a
specific model of multiple processes.)

> I see nothing preventing a C++ implementation from using the Windows
> API to read and write wide characters for a Windows file that consists
> of wide characters. Similarly there is nothing preventing a C++
> implementation from opening or creating a wide character filename file
> on Windows using the Windows API. And finally, opening/creating a wide
> character filename has nothing to do with reading/writing wide
> characters in a file on Windows.

So we're fully agreed, and there is no need to modify anything with
regards to the standard.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung


Author: rmaddox@isicns.com (Randy Maddox)
Date: Thu, 17 Oct 2002 18:38:03 +0000 (UTC)
anthony.williamsNOSPAM@anthonyw.cjb.net wrote in message news:<d6qau63g.fsf@anthonyw.cjb.net>...
> rmaddox@isicns.com (Randy Maddox) writes:
>
> > The, IMHO, trivial issue is going from narrow to wide characters.
> > This is trivial because this is a well defined conversion.  My
> > understanding of Windows, at least the NT based versions, is that this
> > is the conversion done now whenever we open a file with a narrow
> > character name.  Internally the name is widened, and the wide name is
> > used to open the file.
>
> Yes, this is what windows NT/2000/XP does. No, it is not trivial. For
> characters outside the basic (7-bit) ASCII character set, the conversion
> depends on the current code page. Therefore, if you change the code page, you
> might not be able to read back the file, as the necessary characters might
> have different (narrow) code points, or might not be present.

Yes, but if the user changes the code page all bets are off anyway.
And since that is a user-settable function at the OS level, it's
completely outside the scope of C++.  The user might also change
permissions on the file so that we couldn't read it back too.  There
are any number of other scenarios, but the common factor is that they
are outside the scope of C++ and so do not, IMHO, constitute an
argument against supporting wide character file names.

In the particular case of changing the code page, that problem already
exists if the user chooses to use any character outside the 7-bit
ASCII range in a file name, regardless of whether C++ is in the
picture or not.

We are already implicitly using wide-character file names in any C++
code that runs under NT-based versions of Windows anyway, and thus are
already subject to the code page change problem, so how can explicit
C++ support for wide character file names make this situation any
worse?

Randy.

>
> Anthony
> --
> Anthony Williams
> Senior Software Engineer, Beran Instruments Ltd.
> Remove NOSPAM when replying, for timely response.

Author: anthony.williamsNOSPAM@anthonyw.cjb.net (Anthony Williams)
Date: Fri, 18 Oct 2002 11:48:55 +0000 (UTC)
rmaddox@isicns.com (Randy Maddox) writes:

> anthony.williamsNOSPAM@anthonyw.cjb.net wrote in message
> news:<d6qau63g.fsf@anthonyw.cjb.net>...
> > rmaddox@isicns.com (Randy Maddox) writes:
> >
> > > The, IMHO, trivial issue is going from narrow to wide characters.
> > > This is trivial because this is a well defined conversion.  My
> > > understanding of Windows, at least the NT based versions, is that this
> > > is the conversion done now whenever we open a file with a narrow
> > > character name.  Internally the name is widened, and the wide name is
> > > used to open the file.
> >
> > Yes, this is what windows NT/2000/XP does. No, it is not trivial. For
> > characters outside the basic (7-bit) ASCII character set, the conversion
> > depends on the current code page. Therefore, if you change the code page, you
> > might not be able to read back the file, as the necessary characters might
> > have different (narrow) code points, or might not be present.
>
> Yes, but if the user changes the code page all bets are off anyway.
> And since that is a user-settable function at the OS level, it's
> completely outside the scope of C++.  The user might also change
> permissions on the file so that we couldn't read it back too.  There
> are any number of other scenarios, but the common factor is that they
> are outside the scope of C++ and so do not, IMHO, constitute an
> argument against supporting wide character file names.

> We are already implicitly using wide-character file names in any C++
> code that runs under NT-based versions of Windows anyway, and thus are
> already subject to the code page change problem, so how can explicit
> C++ support for wide character file names make this situation any
> worse?

You seem to be reading my comments in exactly the opposite way to how they
were intended. My point is that use of narrow character filenames on Windows
NT produces dependencies on the code page. In particular, each thread in a
program could use a different code page. Therefore, by adding wide character
support to C++, we allow programmers to eliminate this problem on NT by using
wide characters, which are not locale/code page dependent, for filenames. At
the same time, the locale-specific wide-to-narrow conversions necessary on
systems that don't support native wide character filenames don't introduce any
problems not already present with the use of narrow character filenames on a
major platform (Windows NT).
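
The split described above can be sketched roughly as follows. This is an
illustration under stated assumptions, not a proposed standard
interface: open_for_reading is a hypothetical name, _wfopen is the
wide-name variant of fopen in Microsoft's C runtime, and everywhere else
the name is narrowed with wcstombs according to the current LC_CTYPE
setting.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper, not a standard function: open a file for reading
// given a wide-character name. Returns nullptr on failure.
std::FILE* open_for_reading(const std::wstring& name)
{
#ifdef _WIN32
    // The wide name goes straight to the runtime, so no code-page-dependent
    // narrowing happens in this code.
    return _wfopen(name.c_str(), L"rb");
#else
    // Elsewhere, narrow the name using the current locale (LC_CTYPE).
    std::vector<char> narrow(name.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(narrow.data(), name.c_str(), narrow.size());
    if (n == static_cast<std::size_t>(-1))
        return nullptr;            // name not representable in this locale
    return std::fopen(narrow.data(), "rb");
#endif
}
```

On the narrow branch the result still depends on the locale in effect at
the time of the call, which is exactly the dependency that already
exists for narrow names.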

Anthony
--
Anthony Williams
Senior Software Engineer, Beran Instruments Ltd.
Remove NOSPAM when replying, for timely response.


Author: kanze@gabi-soft.de (James Kanze)
Date: Mon, 14 Oct 2002 17:27:19 +0000 (UTC)
hyrosen@mail.com (Hyman Rosen) wrote in message
news:<1034361105.892453@master.nyc.kbcfp.com>...
> James Kanze wrote:
> > Can you guarantee me that if wide character filenames are adopted,
> > Sun, g++, Dinkumware and the STLport will implement them with the
> > same semantics, so that the different libraries will be
> > interchangeable? If not, we have a problem which must be addressed.

> Easy enough. Java runs on all of these platforms, and Java strings and
> thus filenames are wide. So C++ implementors will adopt whatever
> convention Java uses.

Which is?  Or is this some attempt to reduce everything to the lowest
common denominator:-).

Seriously, the idea sounded reasonable.  (I'll ignore the fact for the
moment that on my platform, C++ wchar_t is 32 bits, whereas Java char is
only 16 bits.)  For once, someone else has done the experimenting for
us.  As far as I can tell, Java doesn't document what they do here,
however.  So I did some experimenting.  With several interesting
results:

  - First, the interpretation of my String constants depended on my
    environment variables during compilation.  This has nothing to do
    with the issue at hand, directly, but it is interesting with regards
    to other threads that occurred: the legality of a program depends on
    the environment in which it is compiled, and the error message is an
    internal compiler error.

    What C++ compilers do in this case varies, but it's never worse:-).

  - I changed the program so that the String constants used UCN's for
    everything that wasn't in the (C) basic character set, and tried
    using two different names "\u00E9t\u00E9" ("été", displayable with
    the font I use) and "\u20A0" (a Euro character, not displayable with
    the font I use).  I then ran the program, changing LC_CTYPE each
    time.  (I would have liked to have experimented with an environment
    and a font using 8859-15, where the Euro character would be
    displayable, but the necessary locales aren't installed on this
    machine.)

    Apparently, under Solaris, Java converts the filename according to
    the LC_CTYPE shell variable, replacing non-representable characters
    with a '?'.  I suppose that I could live with this, except that
    non-representable characters should cause an error, but I'm not sure
    that it is universally accepted as the correct way a quality
    implementation should behave.  Especially since if I set LC_CTYPE to
    "en_US.UTF-8", Java generated filenames that no other program could
    exploit -- like it or not, the other programs, like ls, don't
    understand multibyte characters.  (This may, however, be due to the
    fact that I don't have a Unicode font available on this machine.
    But the formatting of ls doesn't seem to take multibytes into
    consideration, even when LC_CTYPE says that UTF-8 is being used.)
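
The conversion observed above (narrow according to LC_CTYPE, put a '?'
where a character cannot be represented) can be approximated in C++ with
wcrtomb. This is only a sketch of the observed behaviour, not a claim
about how Java implements it; narrow_with_placeholder and its `lossy`
flag are hypothetical. The flag lets a caller treat a lossy conversion
as an error instead of silently accepting the substituted name.

```cpp
#include <climits>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: narrow `wide` using the current LC_CTYPE encoding,
// substituting '?' for any character that encoding cannot represent.
// If `lossy` is non-null, it is set to true when a substitution was made.
std::string narrow_with_placeholder(const std::wstring& wide,
                                    bool* lossy = nullptr)
{
    std::string out;
    std::mbstate_t state = std::mbstate_t();
    bool replaced = false;
    for (wchar_t wc : wide) {
        char buf[MB_LEN_MAX];
        std::size_t n = std::wcrtomb(buf, wc, &state);
        if (n == static_cast<std::size_t>(-1)) {
            out += '?';
            replaced = true;
            state = std::mbstate_t();  // state is unspecified after an error
        } else {
            out.append(buf, n);
        }
    }
    if (lossy)
        *lossy = replaced;
    return out;
}
```

In the default "C" locale, L"a\u20ACb" becomes "a?b" with the flag set,
while an ASCII name passes through unchanged.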

Again, I'm not fundamentally against wide character filenames.  But I am
against forcing implementors to implement something which they will then
have to support, when we don't really know what is needed.

I also noticed something else when doing these tests.  There is another
issue that no one seems to have raised.  If I understand correctly,
Windows also offers native wide character files.  But neither Java nor
C++ can read or write directly using wide characters -- both convert
systematically to bytes on input and output.  (It would also be
interesting to know whether Java passes wide character filenames to
Windows on open or not.  I wouldn't be surprised if Java didn't
systematically convert the filename to narrow characters even under
Windows.)

> This illustrates more clearly than ever that the rules of association
> between wide and narrow filenames belong to the platform ABI, not to
> the programming language.

No one is saying anything else.  What I am saying is that we should have
some idea what the goals of these rules are before requiring
implementors to invent them.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung


Author: kanze@gabi-soft.de (James Kanze)
Date: Mon, 14 Oct 2002 17:28:02 +0000 (UTC)
hyrosen@mail.com (Hyman Rosen) wrote in message
news:<1034361398.476589@master.nyc.kbcfp.com>...
> James Kanze wrote:
> > If wide character filenames are adopted, the newer versions of the
> > libraries will implement them.  But what will they implement?  Will
> > the g++ library still have the same semantics as the Sun library?
> > Could I replace one of the libraries with a library from Dinkumware
> > or the STLport with no change in semantics?

> You can get Java systems from more than one maker on UNIX (eg., Sun
> and IBM). Java strings and thus filenames are wide. We can just adopt
> whatever solution Java used to avoid the worry you state.

You mean: if the characters in the name aren't agreeable to the
compile-time environment, the compiler aborts?  (Sorry, I couldn't
resist:-).  I know it's not an argument.)

Seriously, this is a sort of compatibility.  Sort of leveling by the
bottom ("nivellement par le bas"), but it wouldn't be the first time.
On the other hand, I'm not sure that it would satisfy some of the people
who want wide character filenames.  My impression was that some people
wanted an interface to the specific OS function, and not that the C++
translated them and systematically called open with a narrow character
filename, even on systems which supported wide character filenames.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung


Author: pdimov@mmltd.net (Peter Dimov)
Date: Mon, 14 Oct 2002 17:54:15 +0000 (UTC)
hyrosen@mail.com (Hyman Rosen) wrote in message news:<1034350225.340511@master.nyc.kbcfp.com>...
> Peter Dimov wrote:
> > On the other hand, a context-independent mapping ensures that if you
> > open the file today using wchar_t sequence X, you will be able to open
> > the same file tomorrow using that same sequence X (assuming the file
> > hasn't been deleted or moved of course.)
>
> I don't understand why this is the programming language's problem to
> solve, though. And in any case, this condition of yours doesn't hold
> for char filenames, so why should it hold for wide names?
>
> In VMS files have version numbers. When you modify a file, the old
> version is kept, and a new copy is created with a higher version
> number. When you open a file with no version specified, you get the
> one with the highest version. So if I open "a.txt" today, and then
> someone changes it, and I open "a.txt" tomorrow, I am not getting
> the same file, even though that same file still exists (and is
> accessible by explicitly giving the version).

Odd example. On non-VMS, if you make a file named a.txt and someone
modifies it, when you open a.txt you get the modified file. The
"condition of mine" doesn't guarantee that file contents are stable,
just that you are able to open the same file.
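
For illustration, one possible context-independent mapping is a fixed
UTF-8 encoding of the wide name, which gives the same bytes whatever the
locale or code page is. This is a sketch under assumptions, not
something the thread agreed on: to_utf8 is a hypothetical helper, and it
assumes each wchar_t holds one Unicode code point (where wchar_t is 16
bits it covers only the BMP and does not combine surrogate pairs).

```cpp
#include <string>

// Hypothetical helper: encode a wide string as UTF-8, independent of the
// current locale. Assumes one Unicode code point per wchar_t and does not
// validate the input.
std::string to_utf8(const std::wstring& wide)
{
    std::string out;
    for (wchar_t wc : wide) {
        unsigned long cp = static_cast<unsigned long>(wc);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With such a mapping, the sequence X yields the same narrow name today
and tomorrow. Whether other programs on the system interpret that name
the same way is a separate question.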


Author: news_comp.std.c++_expires-2002-10-01@nmhq.net (Niklas Matthies)
Date: Mon, 14 Oct 2002 21:11:39 +0000 (UTC)
On Mon, 14 Oct 2002 16:45:48 +0000 (UTC), James Kanze <kanze@gabi-soft.de> wrote:
>  news_comp.std.c++_expires-2002-10-01@nmhq.net (Niklas Matthies) wrote in
> > On 11 Oct 2002 17:10:01 GMT, James Kanze <kanze@gabi-soft.de> wrote:
> > [...]
> > >    - The issue of what is expected, for portability between compilers on
> > >      the same system.
>
> > Compilers on the same system don't even need to agree on the same char
> > and wchar_t sizes.
>
>  The standard doesn't require it, but all do.  That's because we know
>  what is expected of a quality implementation for a given platform, and
>  no one intentionally implements bad quality.

We didn't know what is expected before the implementations introduced
it, though.

> > I don't see how the standard could impose portability of
> > filenames. Even the notion of "same system" can be rather vague.
>
> > >    - Consider the following code:
>
> > >          std::string filename ;
> > >          std::cin >> filename ;
> > >          std::ifstream file( filename.c_str() ) ;
>
> > >      This is pretty straight forward, and I think that we all know
> > >      what to expect of it.
>
> > I'm not sure how straightforward it is. More importantly, I don't
> > think the standard provides a lot of guarantees for that code.
>
>  It would help discussion if people would actually read what others are
>  writing.  No one said that the standard guaranteed any more than that
>  these functions exist.  And there are doubtlessly platforms where the
>  usual expectations about this code don't apply.

My point was that nothing more is expected from the standard with regard
to a wide-character filename interface by those who wish for it.
When there are, on a given system, legitimate expectations on what the
equivalent of the above code for wide character strings will do, then a
quality implementation will quite likely come up to those expectations.

> > The system defines what a correct wide character filename is, the same
> > as it does with narrow character filenames.  Where do you think lies
> > the difference, with regard to the above code?
>
>  Systems have defined what the correct narrow character filenames can be.
>  We know what to expect.  They haven't defined anything for wide
>  characters.  We don't know what to expect.

We know what to expect on Windows, I think there's no debate about that.
On Unix, the convention I'm familiar with is to use the multibyte encoding
specified by the locale, since this is what you get when you work with
files in a shell. (For example, on my machine at home, I use ISO-8859-1,
UTF-8 and EUC-JP filenames, depending on whether I work with a latin1
xterm, a UTF-8 xterm or an EUC-JP kterm. Programs like ls and the shell
itself behave as expected.)

If on some system there's no convention for storing filenames from
a codeset which is the same as or a superset of the codeset the
implementation chooses for wchar_t for a given locale, then the
implementation can either make the encoding of wide-character filenames
user-selectable or restrict them to the codeset of narrow-character
filenames. This is fine, since there are no conventions that would
legitimate any expectations.

One might argue that on such systems, there is no advantage in using
wide-character filenames. I quite agree, but no one would think there
is and use them when writing programs for such systems in the first
place.

-- Niklas Matthies
--
Save Farscape - get informed and involved: http://farscape.wdsection.com/
Together, we can get Farscape back on air. Crackers *do* matter.


Author: bdawes@acm.org (Beman Dawes)
Date: Tue, 15 Oct 2002 00:13:09 +0000 (UTC)
news_comp.std.c++_expires-2002-10-01@nmhq.net (Niklas Matthies) wrote in message news:<slrnaqlo05.1dq.news_comp.std.c++_expires-2002-10-01@nightrunner.nmhq.net>...
> On Mon, 14 Oct 2002 15:05:34 +0000 (UTC), Beman Dawes <bdawes@acm.org> wrote:
> >  news_comp.std.c++_expires-2002-10-01@nmhq.net (Niklas Matthies) wrote in message news:<slrnaqdrjn.396.news_comp.std.c++_expires-2002-10-01@moongate.nmhq.net>...
> >  If I understand the question, and past requests made to the committee
> >  by National Bodies from Asia, it matters a great deal to those who
> >  routinely deal with languages which require wide-characters for easy
> >  manipulation of filenames, but have to run on operating systems which
> >  use narrow-character filenames. They consider these conversions so
> >  important that they have asked that no conversions be mandated without
> >  their prior participation.
>
> This is quite understandable. What is unclear to me is a different
> issue, though: Why should the standard mandate any particular such
> conversions?

The implementors say that otherwise they don't know what to implement.
The users say that otherwise they have no semantics they can depend
on. Both point out that it isn't at all like the narrow-string
situation, where implementors and users both know what to
expect, even though the standard is silent.

--Beman


Author: eldiener@earthlink.net ("Edward Diener")
Date: Tue, 15 Oct 2002 17:23:42 +0000 (UTC)
"James Kanze" <kanze@gabi-soft.de> wrote in message
news:d6651fb6.0210140508.56269c38@posting.google.com...
> news_comp.std.c++_expires-2002-10-01@nmhq.net (Niklas Matthies) wrote in
> message
> > The system defines what a correct wide character filename is, the same
> > as it does with narrow character filenames.  Where do you think lies
> > the difference, with regard to the above code?
>
> Systems have defined what the correct narrow character filenames can be.
> We know what to expect.  They haven't defined anything for wide
> characters.  We don't know what to expect.

This is clearly not true. Systems which support wide character filenames
have defined what the correct wide character filenames are and, naturally,
systems which don't support wide character filenames have not.


Author: eldiener@earthlink.net ("Edward Diener")
Date: Tue, 15 Oct 2002 17:41:28 +0000 (UTC)
"James Kanze" <kanze@gabi-soft.de> wrote in message
news:d6651fb6.0210140503.3c2542ba@posting.google.com...
>
> I also noticed something else when doing these tests.  There is another
> issue that no one seems to have raised.  If I understand correctly,
> Windows also offers native wide character files.  But neither Java nor
> C++ can read or write directly using wide characters -- both convert
> systematically to bytes on input and output.

Why can not C++ read or write wide characters on Windows ? Please explain
this. I see nothing preventing a C++ implementation from using the Windows
API to read and write wide characters for a Windows file that consists of
wide characters. Similarly there is nothing preventing a C++ implementation
from opening or creating a wide character filename file on Windows using the
Windows API. And finally, opening/creating a wide character filename has
nothing to do with reading/writing wide characters in a file on Windows.


Author: alf_p_steinbach@yahoo.no.invalid (Alf P. Steinbach)
Date: Tue, 15 Oct 2002 17:41:40 +0000 (UTC)
Just some more factoids, don't know whether it will help.


On Mon, 14 Oct 2002 17:27:19 +0000 (UTC), kanze@gabi-soft.de (James Kanze) wrote:
>
>I also noticed something else when doing these tests.  There is another
>issue that no one seems to have raised.  If I understand correctly,
>Windows also offers native wide character files.

Not really, although many tools, such as the Windows standard editor,
understand them and can generate them (using byte-level functions).
However, Windows NT does offer a 16-bit character interface to console
windows, which provides a way of reading and writing uninterpreted
character codes (i.e. locale independent).  But this interface does not
map down to file handling; it's the other way around.


> But neither Java nor C++ can read or write directly using wide
> characters -- both convert systematically to bytes on input and output.

I remember an article in DDJ where the author, a std. library expert,
was surprised by this behavior.

And a lot of other instances where people have been seriously surprised.

It's almost the same problem as with filenames, because there's no single
way to do it "correctly"; it might serve as an example of a trap one
should avoid.
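
The surprise is easy to reproduce. The sketch below (the helper name
bytes_written_by_wide_stream is made up for this example) writes a wide
string through std::wofstream and then reads the file back as raw bytes.
With the default "C" locale the stream's codecvt facet narrows each
character on the way out, so the file holds one byte per ASCII character
and no 2- or 4-byte wchar_t values.

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Writes `text` through a wide stream, reads the file back as raw bytes so
// the caller can see what actually reached the disk, then removes the file.
std::string bytes_written_by_wide_stream(const std::wstring& text,
                                         const char* path)
{
    {
        std::wofstream out(path, std::ios::binary);
        out << text;               // wchar_t in, bytes out (via codecvt)
    }                              // leaving the scope flushes and closes
    std::string raw;
    {
        std::ifstream in(path, std::ios::binary);
        raw.assign(std::istreambuf_iterator<char>(in),
                   std::istreambuf_iterator<char>());
    }
    std::remove(path);             // clean up the temporary file
    return raw;
}
```

What the stream does with characters the facet cannot convert is a
separate matter, and behaviour there varies between implementations.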



>(It would also be interesting to know whether Java passes wide character
>filenames to Windows on open or not.  I wouldn't be surprised if Java
>[did] systematically convert the filename to narrow characters even under
>Windows.)

That I do not know, sorry, but that's one trap to try to avoid.


>
>> This illustrates more clearly than ever that the rules of association
>> between wide and narrow filenames belong to the platform ABI, not to
>> the programming language.
>
>No one is saying anything else.  What I am saying is that we should have
>some idea what the goals of these rules are before requiring
>implementors to invent them.

How about this as a MAIN GOAL:


  * Support applications written exclusively using wchar_t as character
    type (char may be used as a byte type in such an application).


It seems to me that all the rest is in support of this goal.

But unfortunately it brings in all the other baggage  --  stream data
conversion, exception text, etc.  --  so for the purposes of this thread it
would have to be this goal "with respect to file names only".

Cheers,


- Alf


Author: kanze@gabi-soft.de (James Kanze)
Date: Tue, 15 Oct 2002 18:28:35 +0000 (UTC)
hyrosen@mail.com (Hyman Rosen) wrote in message
news:<1034621558.517591@master.nyc.kbcfp.com>...
> Peter Dimov wrote:
> > Odd example. On non-VMS, if you make a file named a.txt and someone
> > modifies it, when you open a.txt you get the modified file. The
> > "condition of mine" doesn't guarantee that file contents are stable,
> > just that you are able to open the same file.

> In VMS, "a.txt;1" (I think that's the syntax)

(I thought it was a comma, but it doesn't matter.)

> continues to exist and is the same file as originally created.

For what definition of the same file?  Under Unix, if I open a file for
writing, modify it, then close it, is it the same file?  For what
definition of the same file?

My editor does versioning of files under Unix.  After I've edited the
file "a.txt", which is the same file: the new "a.txt", or the original
file, which has been renamed "a.txt.~1~"?

> Subsequent versions are new files.

Which is generally true under Unix as well, except that it is the
responsibility of the application to manage this (and most applications
screw it up badly when hard links are involved).

> Anyway, if we're going to get into an argument over the semantics of
> being "the same file", that's all the more reason for not requiring
> that the meaning of a wide filename be "the same file" each time the
> program is run.

> Here's a simpler argument. If I specify a name like "joe.txt", its
> meaning depends upon another external context, namely the current
> directory when the program is executed (at least on UNIX and Windows).
> Why should it be OK to depend on current directory for augmenting the
> meaning of a filename, but not current language?

Because I control (more or less) in what directory I am.

Let's put it differently: it is perfectly acceptable to document that a
program will execute in a certain directory, or even that it must be
started in that directory.  Would you consider it acceptable that the
program must execute in a certain language environment?  And if so, why
bother with internationalization at all?

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: rmaddox@isicns.com (Randy Maddox)
Date: Tue, 15 Oct 2002 19:38:14 +0000 (UTC)
Raw View
pjp@dinkumware.com ("P.J. Plauger") wrote in message news:<3da5fd28$0$13811$4c41069e@reader1.ash.ops.us.uu.net>...
> "Niklas Matthies" <news_comp.std.c++_expires-2002-10-01@nmhq.net> wrote in message
> news:slrnaqbp5t.ad1.news_comp.std.c++_expires-2002-10-01@moongate.nmhq.net...
>
> > On Thu, 10 Oct 2002 19:21:01 +0000 (UTC), "P.J. Plauger" <pjp@dinkumware.com> wrote:
> > >  It's the *relationship* between "abc" and L"abc" that matters to some
> > >  of us, and it is important even if you can convince yourself that
> > >  some OS somewhere is permitted to reject either or both names.
> >
> > Why?
>
> Never mind. It's clearly unimportant to some people.

I personally don't believe that it is unimportant, it's just that
there are two issues, one of which is trivial, and the other of which
is not solvable.  The concern is with the above stated relation
between "abc" and L"abc".

The, IMHO, trivial issue is going from narrow to wide characters.
This is trivial because this is a well defined conversion.  My
understanding of Windows, at least the NT based versions, is that this
is the conversion done now whenever we open a file with a narrow
character name.  Internally the name is widened, and the wide name is
used to open the file.
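
For illustration only, here is roughly what such a widening looks like
when done through the standard C library instead of inside Windows. It
is a sketch under assumptions: the helper name widen is hypothetical,
and std::mbstowcs converts according to the current C locale, which
merely stands in for whatever conversion the OS applies internally.

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: widen a narrow string using the current C locale.
// Returns std::nullopt if the bytes are not valid in that locale.
std::optional<std::wstring> widen(const std::string& narrow)
{
    // A multibyte string never yields more wide characters than it has bytes.
    std::wstring wide(narrow.size(), L'\0');
    const std::size_t n =
        std::mbstowcs(wide.data(), narrow.c_str(), wide.size());
    if (n == static_cast<std::size_t>(-1)) {
        return std::nullopt;  // invalid multibyte sequence
    }
    wide.resize(n);
    return wide;
}
```

Note that the result depends on the locale in effect at the time of the
call, so the same bytes need not always widen to the same name.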

The other direction, from wide back to narrow, appears from the
discussion in this thread to be unsolvable: it does not seem possible
to define a consistent mapping, or at least to agree on one.  However,
if other OSes behave the same way that Windows NT does, then this too
is not really an issue, since that conversion will never occur.

Could this be a way out of the morass?  Or am I still missing
something important?

>
> I dislike shouting matches that don't lead to consensus. I'm
> checking out of this discussion.

Who can blame you?  This has been a very frustrating discussion, but
it also seems to be one that is going to keep coming back until it is
resolved.  Your particular input could be very valuable, so don't give
up too soon.

Randy.

>
> P.J. Plauger
> Dinkumware, Ltd.
> http://www.dinkumware.com
>
>






Author: allan_w@my-dejanews.com (Allan W)
Date: Tue, 15 Oct 2002 22:16:34 +0000 (UTC)
Raw View
hyrosen@mail.com (Hyman Rosen) wrote
> It's not up to the C or C++ standard to dictate to the
> operating system what its filenames mean. If the OS has some
> filenames which can only be opened as narrow chars, and some
> that can only be opened as wide chars then your application
> will have to deal with it. I don't know why you would expect
> otherwise.

That's an interesting point.

Does the standard say anyplace that a program can open any file
on the whole system? Obviously not, or we could never implement
standard C++ on any OS that understands ownership and permissions
and access control lists.

Some people in this debate have been arguing how to implement
multiple filename styles and still allow access to all files.
But if the standard doesn't say you have access to all files in
the first place, then having a mechanism simply fail for some
files can't make things worse.

Apart from QOI issues, there may be nothing wrong with a blanket
statement like this:
    On OS/17, files either have wide character names or narrow
    character names. Although this compiler complies with the
    latest international language Standard, it cannot directly
    open files with wide character names. If you wish to use
    such files with C++, you have several options:
      1. Rename the file to use narrow character names.
      2. Write a subroutine in Lithp or Fifth that opens the file
         and reads the data, then link it with your C++ program.
      3. Use OS/17 Adventurer to add a narrow character alias, and
         then use the alias in your C++ program. To create a
         narrow-character alias, press Red+Grape+Help+Power+F192,
         then type in the new name using only the narrow portion
         of your keyboard. See the OS/17 user documentation for
         more details.
      4. Call the OS/17 file primitives directly. You must use
         #include <os17.h> in your program, and you must link
         os17.lib into your image. We wanted to supply sample code
         to demonstrate this technique, but frankly, we only really
         understand the first 12 parameters to primitives such as
         OS17WideFileOpenRandomYesYadda, and calls to the makers of
         OS/17 have not been illuminating, which is why we don't
         support wide filenames in the first place.

         P.S. Anyone who really, completely understands the 16th
         parameter of OS17WideFileOpenRandomYesYadda, and can write
         a program that reliably uses it without the "trial and error"
         approach, is asked to please give us a call...






Author: anthony.williamsNOSPAM@anthonyw.cjb.net
Date: Wed, 16 Oct 2002 16:08:30 +0000 (UTC)
Raw View
rmaddox@isicns.com (Randy Maddox) writes:

> The, IMHO, trivial issue is going from narrow to wide characters.
> This is trivial because this is a well defined conversion.  My
> understanding of Windows, at least the NT based versions, is that this
> is the conversion done now whenever we open a file with a narrow
> character name.  Internally the name is widened, and the wide name is
> used to open the file.

Yes, this is what windows NT/2000/XP does. No, it is not trivial. For
characters outside the basic (7-bit) ASCII character set, the conversion
depends on the current code page. Therefore, if you change the code page, you
might not be able to read back the file, as the necessary characters might
have different (narrow) code points, or might not be present.
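
The code page dependence can be made concrete with a single byte. The
sketch below hard-codes one sample, the byte 0xE9, under three code
pages. The enum and function names are hypothetical, and a real
conversion would go through the platform's tables, not a switch.

```cpp
#include <optional>

// Three sample code pages (hypothetical enum, for illustration only).
enum class CodePage { Windows1252, Windows1251, Dos437 };

// Widen one narrow byte to a Unicode code point under the given code page.
// Only the 7-bit range and the sample byte 0xE9 are covered here.
std::optional<char32_t> widen_byte(CodePage cp, unsigned char byte)
{
    if (byte < 0x80) {
        return static_cast<char32_t>(byte);  // 7-bit range agrees in all three
    }
    if (byte != 0xE9) {
        return std::nullopt;  // not part of this sample table
    }
    switch (cp) {
    case CodePage::Windows1252: return U'\u00E9';  // e with acute accent
    case CodePage::Windows1251: return U'\u0439';  // Cyrillic short i
    case CodePage::Dos437:      return U'\u0398';  // Greek capital theta
    }
    return std::nullopt;
}
```

A file whose narrow name contains 0xE9 therefore corresponds to three
different wide names, depending on which code page is active.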

Anthony
--
Anthony Williams
Senior Software Engineer, Beran Instruments Ltd.
Remove NOSPAM when replying, for timely response.






Author: pdimov@mmltd.net (Peter Dimov)
Date: Wed, 16 Oct 2002 16:08:50 +0000 (UTC)
Raw View
hyrosen@mail.com (Hyman Rosen) wrote in message news:<1034621558.517591@master.nyc.kbcfp.com>...
> Peter Dimov wrote:
> > Odd example. On non-VMS, if you make a file named a.txt and someone
> > modifies it, when you open a.txt you get the modified file. The
> > "condition of mine" doesn't guarantee that file contents are stable,
> > just that you are able to open the same file.
>
> In VMS, "a.txt;1" (I think that's the syntax) continues to exist
> and is the same file as originally created. Subsequent versions
> are new files. Anyway, if we're going to get into an argument over
> the semantics of being "the same file", that's all the more reason
> for not requiring that the meaning of a wide filename be "the same
> file" each time the program is run.

I don't intend to get into that argument. I know what I meant, and you
know full well what I meant. What's more important, users do
understand the meaning of "same file" well.

Getting into the "same file" argument would be a distraction.

> Here's a simpler argument. If I specify a name like "joe.txt", its
> meaning depends upon another external context, namely the current
> directory when the program is executed (at least on UNIX and Windows).
> Why should it be OK to depend on current directory for augmenting the
> meaning of a filename, but not current language?

Good point. Novice computer users are confused by every bit of file
name context dependence, including the current directory. The
difference is that there exists a mechanism of creating a file name
that is not dependent on the current directory. There should exist a
mechanism of creating a file name that is not dependent on the current
language. I view wchar_t file names as a good candidate for this
mechanism; one might argue that code page independence is one of the
driving forces behind wchar_t.
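
The current-directory half of that comparison can be sketched as
follows. This assumes std::filesystem from C++17, which postdates this
thread, and the function name resolve is hypothetical. It only combines
names lexically and does not touch the disk.

```cpp
#include <filesystem>

// Hypothetical helper: what a file name means given a current directory.
// An absolute name is returned unchanged; a relative one is resolved
// against current_dir.
std::filesystem::path resolve(const std::filesystem::path& current_dir,
                              const std::filesystem::path& name)
{
    return name.is_absolute() ? name : current_dir / name;
}
```

With this, "joe.txt" resolves differently under "/home/a" and "/home/b",
while a rooted name such as "/etc/joe.txt" resolves to the same path
under both. The thread is asking for a comparable escape hatch from the
current language.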






Author: news_comp.std.c++_expires-2002-10-01@nmhq.net (Niklas Matthies)
Date: Sat, 12 Oct 2002 05:53:48 +0000 (UTC)
Raw View
On 11 Oct 2002 17:10:01 GMT, James Kanze <kanze@gabi-soft.de> wrote:
[...]
>    - The issue of what is expected, for portability between compilers on
>      the same system.

Compilers on the same system don't even need to agree on the same char
and wchar_t sizes. I don't see how the standard could impose portability
of filenames. Even the notion of "same system" can be rather vague.

>    - Consider the following code:
>
>          std::string filename ;
>          std::cin >> filename ;
>          std::ifstream file( filename.c_str() ) ;
>
>      This is pretty straightforward, and I think that we all know what
>      to expect of it.

I'm not sure how straightforward it is. More importantly, I don't
think the standard provides a lot of guarantees for that code.

>      What does the equivalent using wide characters mean?  Can the
>      user enter a wide character filename, and expect the system to
>      process it correctly?

The system defines what a correct wide character filename is, the same
as it does with narrow character filenames. Where do you think lies the
difference, with regard to the above code?
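
For reference, one possible shape of "the equivalent using wide
characters" is sketched below. It relies on std::filesystem::path from
C++17, which did not exist when this was written, and the helper name
open_wide is hypothetical. How the wide name maps onto the system's
native file names is left to the implementation, which is precisely
the point under dispute.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: open a file given a wide-character name.
// std::filesystem::path performs whatever wide-to-native conversion
// the implementation defines for this platform.
std::ifstream open_wide(const std::wstring& filename)
{
    return std::ifstream(std::filesystem::path(filename));
}
```

A caller would read the name with std::wcin >> filename, pass it to
open_wide, and check is_open() on the result.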

-- Niklas Matthies
--
Save Farscape - get informed and involved: http://farscape.wdsection.com/
Together, we can get Farscape back on air. Crackers *do* matter.






Author: news_comp.std.c++_expires-2002-10-01@nmhq.net (Niklas Matthies)
Date: Sat, 12 Oct 2002 09:23:44 +0000 (UTC)
Raw View
On Fri, 11 Oct 2002 17:41:11 +0000 (UTC), James Kanze <kanze@gabi-soft.de> wrote:
[...]
>  But for Windows and Unix, there is a generally acknowledged semantics
>  for narrow character filenames, which is universally implemented.

What are these common semantics? I'm currently a little at a loss about that.
Even different Windows versions don't agree on what is the set of valid
filenames. And remote-mounted file systems have still different ideas.

-- Niklas Matthies
--
Save Farscape - get informed and involved: http://farscape.wdsection.com/
Together, we can get Farscape back on air. Crackers *do* matter.






Author: news_comp.std.c++_expires-2002-10-01@nmhq.net (Niklas Matthies)
Date: Sun, 13 Oct 2002 04:13:29 +0000 (UTC)
Raw View
On Sat, 12 Oct 2002 05:52:31 +0000 (UTC), Philip Guenther <guenther+expires-2002-11@gac.edu> wrote:
[...]
>  What is of concern is the relationship between the two namespaces for
>  filenames.  For a given implementation and locale, it would seem there
>  are four possibilities:
>
>  1) there are files openable using wide characters that are not
>     openable using narrow characters
>
>  2) there are files openable using narrow characters that are not
>     openable using wide characters
>
>  3) both (1) and (2), so that you need to use both interfaces to
>     have access to the entire set of files openable in (extended) C++
>
>  4) you can open the same set of files using either interface
>
>
>  Which of those possibilities, if any, do you feel that the standard
>  should disallow and consider non-compliant?

None. If there are systems on which any of these possibilities hold,
a C++ implementation for that system should be allowed to map to the
system in a way that maintains them.
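
The four possibilities amount to a comparison of two sets. As a sketch,
assuming each file can be given some identifier and that the two sets
of openable files are known (both assumptions, and all names here are
hypothetical), the classification could be written as:

```cpp
#include <algorithm>
#include <set>
#include <string>

// Numbered to match the four possibilities in the quoted list.
enum class Relationship {
    WideOnlyExtras   = 1,  // some files are reachable only via wide names
    NarrowOnlyExtras = 2,  // some files are reachable only via narrow names
    BothHaveExtras   = 3,  // each interface reaches files the other cannot
    SameSet          = 4   // both interfaces reach exactly the same files
};

Relationship classify(const std::set<std::string>& narrow_openable,
                      const std::set<std::string>& wide_openable)
{
    // std::set iterates in sorted order, as std::includes requires.
    const bool narrow_in_wide =
        std::includes(wide_openable.begin(), wide_openable.end(),
                      narrow_openable.begin(), narrow_openable.end());
    const bool wide_in_narrow =
        std::includes(narrow_openable.begin(), narrow_openable.end(),
                      wide_openable.begin(), wide_openable.end());
    if (narrow_in_wide && wide_in_narrow) return Relationship::SameSet;
    if (narrow_in_wide) return Relationship::WideOnlyExtras;
    if (wide_in_narrow) return Relationship::NarrowOnlyExtras;
    return Relationship::BothHaveExtras;
}
```

Under the position taken above, an implementation may report any of the
four results, as long as it mirrors what the underlying system does.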

>  Based on your answer,
>  please respond to the appropriate following paragraph:
>
>  As an application writer striving for maximal portability, I would
>  prefer that my programs be able, on all compliant systems, to open
>  all the files that they possibly could.  So, if possibility (3) is
>  permitted by the standard, or if both possibilities (1) and (2) are
>  permitted, then my program would need to utilize _both_ interfaces.
>  If you think the standard should _not_ disallow possibility (3),
>  or should permit both possibilities (1) and (2), please suggest how
>  I should present to the user, in documentation and interface, the
>  distinction between the two filename APIs.

The user usually selects the character encoding along with the locale,
and the character encoding will imply whether narrow or wide character
filenames are used.

I don't think this is a practical problem though, or can you point me to
any system where (3) holds?

[...]
>  If the standard allows possibility (1) but not (2) or (3), so that
>  any file openable via the narrow character interface would be
>  openable via the wide character interface, but where the reverse isn't
>  true, then all existing programs compliant with the C++ standard
>  would implicitly become 'second class citizens' compared to programs
>  that utilized the wide character interface (or both).

Most likely they already are, on systems for which (1) is true (e.g.
Windows). On systems for which (1) is not true, (1) doesn't need to be
true for the C++ implementation either.

>  If you support this combination, how would you suggest programmers deal
>  with the transition period, where they have a choice between greater
>  portability (by using only the narrow character interface) and
>  complete support for platforms where (1) is true?

Supporting (1) doesn't necessarily imply less portability, since the
wide character support can be implemented as an addition to a narrow
character interface, instead of as a replacement.

>  Do you feel this would have any effects on maintainability of C++
>  code?

Of course, a program which supports both narrow and wide character
filenames can be more costly to maintain than a program which supports
only narrow character filenames.

-- Niklas Matthies
--
Save Farscape - get informed and involved: http://farscape.wdsection.com/
Together, we can get Farscape back on air. Crackers *do* matter.






Author: bdawes@acm.org (Beman Dawes)
Date: Mon, 14 Oct 2002 15:05:34 +0000 (UTC)
Raw View
news_comp.std.c++_expires-2002-10-01@nmhq.net (Niklas Matthies) wrote in message news:<slrnaqdrjn.396.news_comp.std.c++_expires-2002-10-01@moongate.nmhq.net>...
> On Thu, 10 Oct 2002 23:18:04 +0000 (UTC), "P.J. Plauger" <pjp@dinkumware.com> wrote:
> >  "Niklas Matthies" <news_comp.std.c++_expires-2002-10-01@nmhq.net> wrote in message
> >  news:slrnaqbp5t.ad1.news_comp.std.c++_expires-2002-10-01@moongate.nmhq.net...
> >
> > > On Thu, 10 Oct 2002 19:21:01 +0000 (UTC), "P.J. Plauger" <pjp@dinkumware.com> wrote:
> > > >  It's the *relationship* between "abc" and L"abc" that matters to some
> > > >  of us, and it is important even if you can convince yourself that
> > > >  some OS somewhere is permitted to reject either or both names.
> > >
> > > Why?
> >
> >  Never mind. It's clearly unimportant to some people.
>
> It seems that you think that it is obvious why that relationship is
> important to the extent that a facility for opening files with wide-
> character names should not be provided unless some (which kind of?)
> guarantees with regard to that relationship are made.
>
> Unfortunately, it's not obvious at all to me. I'm even hard-pressed to
> imagine an application that would need to rely on such a guarantee.
> Therefore, and since I can't remember that any illustrations were given
> yet with regard to that matter, I had hoped you could provide some
> insightful explanations of why you believe that it is that important.

If I understand the question, and past requests made to the committee
by National Bodies from Asia, it matters a great deal to those who
routinely deal with languages which require wide-characters for easy
manipulation of filenames, but have to run on operating systems which
use narrow-character filenames. They consider these conversions so
important that they have asked that no conversions be mandated without
their prior participation.

--Beman






Author: kanze@gabi-soft.de (James Kanze)
Date: Mon, 14 Oct 2002 15:06:07 +0000 (UTC)
Raw View
alf_p_steinbach@yahoo.no.invalid (Alf P. Steinbach) wrote in message
news:<3da721f1.52550593@news.bluecom.no>...

>    In Windows, a wide character filename may be up to 32.000
>    characters long -- no matter that e.g. Explorer doesn't support it.

>    A narrow character filename may only be up to 260 characters long.

>    Wide character filenames allow additional syntax.

> So it's not just a question of character/encoding conversion.

You've convinced me.  Any Windows compiler should offer wide character
filenames as an extension.  And map narrow character filenames to the
wide character system interface.
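
To make the quoted limits concrete, here is a small sketch in portable
C++ that only manipulates strings. It assumes that the "additional
syntax" refers to the \\?\ prefix that the wide Windows API accepts for
long paths, and that 260 is the value of MAX_PATH, which counts the
terminating null. The function names are hypothetical, and only
drive-letter paths are handled; UNC paths need a different prefix.

```cpp
#include <cstddef>
#include <string>

// Narrow-API limit quoted in the thread (Windows MAX_PATH, assumed here
// to include the terminating null character).
constexpr std::size_t kMaxPath = 260;

// True if the name is short enough for the narrow interface.
bool fits_narrow_limit(const std::wstring& path)
{
    return path.size() < kMaxPath;
}

// Add the extended-length prefix to an absolute drive-letter path,
// unless it is already present.
std::wstring to_extended_length(const std::wstring& absolute_path)
{
    const std::wstring prefix = L"\\\\?\\";
    if (absolute_path.compare(0, prefix.size(), prefix) == 0) {
        return absolute_path;
    }
    return prefix + absolute_path;
}
```

A wrapper that maps narrow names onto the wide interface could check
fits_narrow_limit first and apply the prefix only to names that exceed
it.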

I don't quite see the fact that one particular system happens to have
done something even more stupidly than usual as a justification for
changing the standard.  (I know, I know: there are a lot of stupid
things in C that come from the Unix world.  But that's history, and they
generally date to before standardization.  Just because we've made some
mistakes in the past, doesn't mean we have to continue in the same
path.)

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: kanze@gabi-soft.de (James Kanze)
Date: Mon, 14 Oct 2002 16:45:48 +0000 (UTC)
Raw View
news_comp.std.c++_expires-2002-10-01@nmhq.net (Niklas Matthies) wrote in
message
news:<slrnaqesc6.j8.news_comp.std.c++_expires-2002-10-01@nightrunner.nmhq.net>...
> On 11 Oct 2002 17:10:01 GMT, James Kanze <kanze@gabi-soft.de> wrote:
> [...]
> >    - The issue of what is expected, for portability between compilers on
> >      the same system.

> Compilers on the same system don't even need to agree on the same char
> and wchar_t sizes.

The standard doesn't require it, but all do.  That's because we know
what is expected of a quality implementation for a given platform, and
no one intentionally implements bad quality.

> I don't see how the standard could impose portability of
> filenames. Even the notion of "same system" can be rather vague.

> >    - Consider the following code:

> >          std::string filename ;
> >          std::cin >> filename ;
> >          std::ifstream file( filename.c_str() ) ;

> >      This is pretty straightforward, and I think that we all know
> >      what to expect of it.

> I'm not sure how straightforward it is. More importantly, I don't
> think the standard provides a lot of guarantees for that code.

It would help discussion if people would actually read what others are
writing.  No one said that the standard guaranteed any more than that
these functions exist.  And there are doubtlessly platforms where the
usual expectations about this code don't apply.

Nevertheless, it is quite portable, working as it does on all Unix and
all Windows systems, and probably on most other systems for general
purpose computers.

> >      What does the equivalent using wide characters mean?  Can the
> >      user enter a wide character filename, and expect the system to
> >      process it correctly?

> The system defines what a correct wide character filename is, the same
> as it does with narrow character filenames.  Where do you think lies
> the difference, with regard to the above code?

Systems have defined what the correct narrow character filenames can be.
We know what to expect.  They haven't defined anything for wide
characters.  We don't know what to expect.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: kanze@gabi-soft.de (James Kanze)
Date: Mon, 14 Oct 2002 16:45:53 +0000 (UTC)
Raw View
hyrosen@mail.com (Hyman Rosen) wrote in message
news:<1034350225.340511@master.nyc.kbcfp.com>...
> Peter Dimov wrote:
> > On the other hand, a context-independent mapping ensures that if you
> > open the file today using wchar_t sequence X, you will be able to
> > open the same file tomorrow using that same sequence X (assuming the
> > file hasn't been deleted or moved of course.)

> I don't understand why this is the programming language's problem to
> solve, though. And in any case, this condition of yours doesn't hold
> for char filenames, so why should it hold for wide names?

> In VMS files have version numbers. When you modify a file, the old
> version is kept, and a new copy is created with a higher version
> number. When you open a file with no version specified, you get the
> one with the highest version. So if I open "a.txt" today, and then
> someone changes it, and I open "a.txt" tomorrow, I am not getting the
> same file, even though that same file still exists (and is accessible
> by explicitly giving the version).

Guess what?  If someone modifies the file under Unix or Windows, I also
see the changed file.  The only difference in VMS is that I can also see
older versions.  But I don't normally.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: news_comp.std.c++_expires-2002-10-01@nmhq.net (Niklas Matthies)
Date: Mon, 14 Oct 2002 16:49:09 +0000 (UTC)
Raw View
On Mon, 14 Oct 2002 15:05:34 +0000 (UTC), Beman Dawes <bdawes@acm.org> wrote:
>  news_comp.std.c++_expires-2002-10-01@nmhq.net (Niklas Matthies) wrote in message news:<slrnaqdrjn.396.news_comp.std.c++_expires-2002-10-01@moongate.nmhq.net>...
> > On Thu, 10 Oct 2002 23:18:04 +0000 (UTC), "P.J. Plauger" <pjp@dinkumware.com> wrote:
> > >  "Niklas Matthies" <news_comp.std.c++_expires-2002-10-01@nmhq.net> wrote in message
> > > > On Thu, 10 Oct 2002 19:21:01 +0000 (UTC), "P.J. Plauger" <pjp@dinkumware.com> wrote:
> > > > >  It's the *relationship* between "abc" and L"abc" that matters
> > > > >  to some of us, and it is important even if you can convince
> > > > >  yourself that some OS somewhere is permitted to reject either
> > > > >  or both names.
> > > >
> > > > Why?
> > >
> > >  Never mind. It's clearly unimportant to some people.
> >
> > It seems that you think that it is obvious why that relationship is
> > important to the extent that a facility for opening files with wide-
> > character names should not be provided unless some (which kind of?)
> > guarantees with regard to that relationship are made.
> >
> > Unfortunately, it's not obvious at all to me. I'm even hard-pressed to
> > imagine an application that would need to rely on such a guarantee.
> > Therefore, and since I can't remember that any illustrations were given
> > yet with regard to that matter, I had hoped you could provide some
> > insightful explanations of why you believe that it is that important.
>
>  If I understand the question, and past requests made to the committee
>  by National Bodies from Asia, it matters a great deal to those who
>  routinely deal with languages which require wide-characters for easy
>  manipulation of filenames, but have to run on operating systems which
>  use narrow-character filenames. They consider these conversions so
>  important that they have asked that no conversions be mandated without
>  their prior participation.

This is quite understandable. What is unclear to me is a different
issue, though: Why should the standard mandate any particular such
conversions?

-- Niklas Matthies
--
Save Farscape - get informed and involved: http://farscape.wdsection.com/
Together, we can get Farscape back on air. Crackers *do* matter.






Author: alf_p_steinbach@yahoo.no.invalid (Alf P. Steinbach)
Date: Thu, 10 Oct 2002 20:54:46 +0000 (UTC)
Raw View
On Thu, 10 Oct 2002 19:21:01 +0000 (UTC), pjp@dinkumware.com ("P.J. Plauger") wrote:

>""Edward Diener"" <eldiener@earthlink.net> wrote in message
>news:wkjp9.20209$OB5.1934654@newsread2.prod.itd.earthlink.net...
>
>> > So we standardize something that's so wishy washy today that it'll be
>> > equally wishy washy in ten years, and equally beneficial to writing
>> > portable programs?
>>
>> I don't think that supporting programmers who want to create filenames on
>> their own operating system in a foreign language which they understand is a
>> "wishy washy" enterprise,
>
>Nor do I. Nor did I say that.
>
>>                          nor that giving C++ the ability to do so is
>> either.
>
>Depends on how you do it. If you produce an International Standard that
>says, here's how you can *try* to open a file with a wide-character name,
>but no implementation is obliged to succeed for *any* request, and every
>implementation is at liberty to do something different -- that, my friend,
>is wishy washy.

I think the point here is that that particular wishy washiness does
__not__ matter.  It's present in current C++ support for narrow filenames.
So it is a red herring, and should not need to be discussed further,
except if someone sees a way to get rid of it (which would be good).


>...
>It's the *relationship* between "abc" and L"abc" that matters to some of us,
>and it is important even if you can convince yourself that some OS somewhere
>is permitted to reject either or both names.

Right.

I have a suggestion about the wide-to-narrow conversion (half of the
problem!), based on the discussion so far.


  * By default, no character codes are changed.

    I think this is what James Kanze is suggesting.

    This default does nothing wrong, but doesn't necessarily do all
    that we'd wish for right, either.  It's something the programmer
    can rely on always being the same.  And it is therefore portable
    to the same degree that wide character strings are portable.

    - Requires that the narrow ch. set is a subset of the wide ch. set.
    - All file names can be represented as wide character strings.
    - Conversion error on characters not representable as narrow chars.


  * Selectable by the programmer: an unspecified system-dependent
    conversion.

    In other words, no particular conversion is selected, but the
    programmer says: "Use the most appropriate/natural conversion for
    this system and perhaps for the current system configuration".
    Examples might be UTF-8, UTF-8 with non-display prefix, locale
    specific conversion (e.g. some IBM PC codepage).

    - Possibility of filenames not representable as wide character string,
      because the conversion might be anything.
    - Conversion error on characters not representable as narrow chars.


  * Selectable by the programmer: particular encoding.

    Perhaps based on current locale support?

    - Conversion error on characters not representable as narrow chars.


A nice-to-have feature would be to have conversion error or not as
a run-time parameter or compile-time policy.  Then one could choose
to say "go ahead, convert that name to the closest approximation you
can come up with".  Another nice to have feature would then be the
ability to check what the generated name would be, so that one could
e.g. present it to the user or log it for later processing, whatever.
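
As a rough illustration of the first bullet combined with the "conversion error or not" parameter, here is a minimal sketch. It is not a proposed interface: the names narrow_unchanged and OnUnrepresentable are hypothetical, it uses C++11 syntax, and it assumes that "representable as a narrow char" simply means "the code fits in an unsigned char".

```cpp
#include <climits>
#include <stdexcept>
#include <string>

// Hypothetical run-time policy: what to do with a wide character that
// has no narrow counterpart when no character codes are changed.
enum class OnUnrepresentable { Throw, Replace };

// Hypothetical helper, not part of any standard library.
// Each code is copied unchanged; a code outside 0..UCHAR_MAX is either
// reported as a conversion error or replaced by `fallback`.
std::string narrow_unchanged(const std::wstring& wide,
                             OnUnrepresentable policy = OnUnrepresentable::Throw,
                             char fallback = '?')
{
    std::string narrow;
    narrow.reserve(wide.size());
    for (wchar_t wc : wide) {
        // wchar_t may be signed or unsigned, so widen before comparing.
        const long long code = static_cast<long long>(wc);
        if (code >= 0 && code <= UCHAR_MAX) {
            narrow.push_back(static_cast<char>(static_cast<unsigned char>(code)));
        } else if (policy == OnUnrepresentable::Replace) {
            narrow.push_back(fallback);
        } else {
            throw std::range_error("wide character has no narrow counterpart");
        }
    }
    return narrow;
}
```

Because the function returns the generated name instead of opening anything, a caller could show it to the user or log it before deciding whether to use it, which is the second nice-to-have feature mentioned above.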

Since I'm not very familiar with the locale support (shame to say),
"someone" (TM) will have to flesh out the last point above if that
"someone" has arguments for or against  --  I don't feel competent
to participate in a pure locale support subthread.


Cheers,

- Alf

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: news_comp.std.c++_expires-2002-10-01@nmhq.net (Niklas Matthies)
Date: Thu, 10 Oct 2002 21:31:12 +0000 (UTC)
Raw View
On Thu, 10 Oct 2002 19:21:01 +0000 (UTC), "P.J. Plauger" <pjp@dinkumware.com> wrote:
>  It's the *relationship* between "abc" and L"abc" that matters to some
>  of us, and it is important even if you can convince yourself that
>  some OS somewhere is permitted to reject either or both names.

Why?

-- Niklas Matthies
--
Save Farscape - get informed and involved: http://farscape.wdsection.com/
Together, we can get Farscape back on air. Crackers *do* matter.






Author: pjp@dinkumware.com ("P.J. Plauger")
Date: Thu, 10 Oct 2002 23:18:04 +0000 (UTC)
Raw View
"Niklas Matthies" <news_comp.std.c++_expires-2002-10-01@nmhq.net> wrote in message
news:slrnaqbp5t.ad1.news_comp.std.c++_expires-2002-10-01@moongate.nmhq.net...

> On Thu, 10 Oct 2002 19:21:01 +0000 (UTC), "P.J. Plauger" <pjp@dinkumware.com> wrote:
> >  It's the *relationship* between "abc" and L"abc" that matters to some
> >  of us, and it is important even if you can convince yourself that
> >  some OS somewhere is permitted to reject either or both names.
>
> Why?

Never mind. It's clearly unimportant to some people.

I dislike shouting matches that don't lead to consensus. I'm
checking out of this discussion.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com








Author: pdimov@mmltd.net (Peter Dimov)
Date: Thu, 10 Oct 2002 23:32:35 +0000 (UTC)
Raw View
kanze@gabi-soft.de (James Kanze) wrote in message news:<d6651fb6.0210100721.11e257a2@posting.google.com>...
> pdimov@mmltd.net (Peter Dimov) wrote in message
> news:<7dc3b1ea.0210090707.cf95281@posting.google.com>...
>
> > The perfect mapping has some pretty obvious properties:
>
> > * Equal wchar_t sequences map to equal byte sequences, independent of
> > OS/runtime/compiler state.
>
> Independent of the current locale setting?  Which may be used to write
> the filename into another file?

Yes. What you later do with the name is not under the OS/stdlib
control. You may write it to a file using whatever encoding you want,
regardless of any locale settings the library performing the mapping
has access to.

> This means that if I write the filename to a file, and then use it to
> create the file, and I (or another program) then reread it as a narrow
> character sequence and try to open the file, the file won't be
> there, because two different mappings were used to convert it to narrow
> characters.

Yes.

The problem with the above scenario is that you are trying to open the
same file using a different name. In the general case, this will not
work and cannot be made to work.

On the other hand, a context-independent mapping ensures that if you
open the file today using wchar_t sequence X, you will be able to open
the same file tomorrow using that same sequence X (assuming the file
hasn't been deleted or moved of course.)

> > * Different wchar_t sequences map to different byte sequences.
>
> Including if they contain combining sequences?  What about the
> positional variants of the Arabic alphabets.

Sorry, this is outside my area of experience. We may look at what
Windows does with such wchar_t sequences.

> > * Identity mapping for file names in a suitably limited character set.
>
> The basic character set.

At a minimum, yes. I was thinking of a suitable superset of the basic
character set, the "native character set" for our platform, if one
exists.
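
To make the three properties discussed in this exchange concrete, here is a sketch of one mapping that has them: plain UTF-8 applied to each wchar_t value. This is an assumption chosen for illustration, not necessarily the mapping I described in the other post. The function name to_utf8 is hypothetical, each wchar_t is treated as one code point (no UTF-16 surrogate pairing is attempted), and C++11 syntax is used.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: encodes every wchar_t value as UTF-8.
// The result depends only on the input sequence, never on the locale,
// so equal inputs give equal bytes and different inputs give different
// bytes. Values below 0x80 (which include the basic character set on
// ASCII-based systems) map to themselves.
std::string to_utf8(const std::wstring& wide)
{
    std::string out;
    for (wchar_t wc : wide) {
        const unsigned long cp = static_cast<unsigned long>(wc);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp <= 0x10FFFF) {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            throw std::range_error("value is not a valid code point");
        }
    }
    return out;
}
```

Note that such a mapping treats a precomposed character and the equivalent combining sequence as two different names, which is exactly the case raised in the question about combining sequences.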

> I think that's the one requirement which will find more or less
> universal agreement.
>
> > I believe that it is possible to invent a mapping that is very close
> > to perfect. See my answer to Beman's post.
>
> It's possible to invent any number of mappings which meet your
> criteria.  I think you forgot the most important one: that a character
> in the wide character set map to the same character in the narrow
> character set (provided such a character exists).  But meeting this
> criterion involves determining what the narrow character set is, which is
> almost impossible.

This is definitely impossible if the narrow character set can change.
Impossible requirements aren't useful. :-)

[...]
> Unix, as a whole, certainly does interpret filenames.  The shells and
> find do pattern matching on them, ls (and echo, and a number of other
> programs) display them, etc. Suppose I generate a configuration file,
> and at the same time, output the filename to a startup script that I
> generate.  The output of the filename will use the current locale
> (unless I explicitly imbue another locale in the filebuf).  Are you
> saying that the filename in the startup script should be different than
> the one generated in the file system?

Difficult question. If your file name is in the "identity" subset, all
is well. If it's not...

[...]
> Finally, I'd like to stress that I'm brainstorming here.  I DON'T know
> what the correct solution is.  I don't even know how to tell whether a
> solution is correct or not.  And I'm convinced that I can find problems
> (like the above) with any solution.
>
> What I'm arguing against isn't your solution.  What I'm arguing against
> is the idea that this is a trivial undertaking, with "obvious"
> solutions.  That doesn't mean that we shouldn't undertake it.  But it
> does mean not to expect miracles anytime soon.  And while I generally
> think that some support for wide character filenames is necessary and
> inevitable, I'd rather have no support, than to have something that
> doesn't work, and that we have to change later.

What I have tried to accomplish is to get the debate past the "theory"
stage that, for an indifferent observer, might have looked a bit...
unproductive.

Having a real proposed solution helps us concentrate on real problems
with that solution rather than wasting time.

Any solution will have problems, but if we get to the point where we
agree that solution X quite likely has only the problems that will
inherently be part of any other reasonable solution, we'll probably
have a "release candidate".

Of course we shouldn't forget that one could always use narrow
character file names if they are more appropriate, so we should view
the wchar_t alternative with that in mind.






Author: eldiener@earthlink.net ("Edward Diener")
Date: Fri, 11 Oct 2002 11:48:56 +0000 (UTC)
Raw View
"James Kanze" <kanze@gabi-soft.de> wrote in message
news:d6651fb6.0210100702.79d02e12@posting.google.com...
> eldiener@earthlink.net ("Edward Diener") wrote in message
> news:<at3p9.18378$OB5.1806394@newsread2.prod.itd.earthlink.net>...
> > I hope that C++ will not make that mistake and provide support for
> > wide character filenames which an implementation/OS is allowed to
> > define for itself. In that case, if OSs do coagulate around a
> > standard, C++ will not be out-of-date as these other languages are or
> > may well be in the future.
>
> I agree that what constitutes a filename, and how the wide characters
> are converted, must be implementation defined.  But I would like to know
> what to expect in some typical cases.

Do you want the C++ standard to tell you what to expect in typical cases
when what constitutes a filename is implementation defined ? That seems to
me a contradiction. I would rather have the OS tell me what to expect since
each OS may well be different. C++ can not predict the future in its
specifications, and the best that can be expected is that C++ would tell you
for now how each OS might deal with the issue. But I don't think that is
something that belongs in the standard but rather in an individual's
understanding of what implementation defined means for a particular
implementation on a particular OS. I wouldn't mind if the C++ standard
offered suggestions of what might be done if an OS does not natively support
wide character filenames in any specific way in that OS's API, but that to
me is as far as the standard can go, and frankly I don't think that even
that serves much of a purpose.






Author: pdimov@mmltd.net (Peter Dimov)
Date: Fri, 11 Oct 2002 17:05:14 +0000 (UTC)
Raw View
hyrosen@mail.com (Hyman Rosen) wrote in message news:<1034272803.962195@master.nyc.kbcfp.com>...
> Peter Dimov wrote:
>
> > Because, if the forward mapping is state-dependent, the reverse
> > mapping will be state-dependent, too. Therefore, in order to decode a
> > filename, you will need not only the byte sequence, but the state used
> > at the time the forward mapping was performed, which is one of the
> > main disadvantages of char[] based names (to interpret them you need
> > to know the code page).
>
>
> Why is that a problem?

Because context independence ensures that file names are stable. The
user has created a file using wchar_t sequence X as a name, and he
most likely prefers to be able to refer to that same file via that
same name X in the future.

> Or rather, why is it more of a problem
> than any other locale-dependent conversions?

I don't understand this question.






Author: kanze@gabi-soft.de (James Kanze)
Date: Fri, 11 Oct 2002 12:09:44 CST
Raw View
alf_p_steinbach@yahoo.no.invalid (Alf P. Steinbach) wrote in message
news:<3da5d5f3.620062250@news.bluecom.no>...
> On Thu, 10 Oct 2002 19:21:01 +0000 (UTC), pjp@dinkumware.com
> ("P.J. Plauger") wrote:

> >""Edward Diener"" <eldiener@earthlink.net> wrote in message
> >news:wkjp9.20209$OB5.1934654@newsread2.prod.itd.earthlink.net...

> >> > So we standardize something that's so wishy washy today that it'll be
> >> > equally wishy washy in ten years, and equally beneficial to writing
> >> > portable programs?

> >> I don't think that supporting programmers who want to create
> >> filenames on their own operating system in a foreign language which
> >> they understand is a "wishy washy" enterprise,

> >Nor do I. Nor did I say that.

> >>                          nor that giving C++ the ability to do so is
> >> either.

> >Depends on how you do it. If you produce an International Standard
> >that says, here's how you can *try* to open a file with a
> >wide-character name, but no implementation is obliged to succeed for
> >*any* request, and every implementation is at liberty to do something
> >different -- that, my friend, is wishy washy.

> I think the point here is that that particular wishy washiness does
> __not__ matter.  It's present in current C++ support for narrow
> filenames.  So it is a red herring, and should not need to be
> discussed further, except if someone sees a way to get rid of it
> (which would be good).

The current situation is that all of the compilers for Unix I know of
implement narrow character filenames in the same way.  That's NOT
wishy-washy.  Even if the standard makes allowances for systems without
traditional file systems, everyone knows what narrow character filenames
*should* do under Unix.

Can you guarantee me that if wide character filenames are adopted, Sun,
g++, Dinkumware and the STLport will implement them with the same
semantics, so that the different libraries will be interchangeable?
If not, we have a problem which must be addressed.

> >...
> >It's the *relationship* between "abc" and L"abc" that matters to some
> >of us, and it is important even if you can convince yourself that
> >some OS somewhere is permitted to reject either or both names.

> Right.

> I have a suggestion about the wide-to-narrow conversion (half of the
> problem!), based on the discussion so far.

>   * By default, no character codes are changed.

>     I think this is what James Kanze is suggesting.

Not particularly.  For anything anyone suggests, I can suggest something
else which is just as reasonable.  And THAT is the reason why we
shouldn't adopt them yet.

>     This default does nothing wrong, but doesn't necessarily do all
>     that we'd wish for right, either.

For what definitions of right and wrong?  I've seen a number of
suggestions for different mappings, but I've yet to see any real
discussion of what the actual goals are (except to allow a standardized
access to Windows wide character filenames).

>     It's something the programmer can rely on always being the same.
>     And it is therefore portable

>     - Requires that the narrow ch. set is a subset of the wide ch. set.

Not necessarily.  It requires that all of the narrow characters possible
in a legal filename be a subset of the wide character set.  To be
practically useful, it also requires that the characters in this subset
have the same numerical value in both sets.  (Roughly speaking -- if
char is signed, they probably won't have, but they may work anyway.)

Obviously, if you have a system where narrow characters are EBCDIC, and
wide characters Unicode, this mapping is a disaster.  Still, such
systems are rare enough that I'm willing to accept that we don't know
what the system will do with the "implementation defined".

>     - All file names can be represented as wide character strings.

I'm less sure of this.  On systems which support them, can all wide
character filenames be represented as a narrow character string?


More generally, I think that there are still a number of open points
concerning wide character filenames, which any proposal will have to
address if it is to be considered:

  - The issue of what is expected, for portability between compilers on
    the same system.

  - Consider the following code:

        std::string filename ;
        std::cin >> filename ;
        std::ifstream file( filename.c_str() ) ;

    This is pretty straightforward, and I think that we all know what
    to expect of it.  What does the equivalent using wide characters
    mean?  Can the user enter a wide character filename, and expect the
    system to process it correctly?

  - Interaction with other parts of the system.  (You can doubtlessly
    argue that this doesn't work with narrow characters, either, but I
    see no need to make it worse.)  Normally, if I am a user of the
    above code, I would like to cut and paste the output of ls (in
    another window) into my input.  I don't expect a perfect solution
    for this -- I don't think one exists, even for narrow characters.
    But I do expect to see some discussion of the issue.
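
To make the second point concrete, here is one possible reading of "the equivalent using wide characters", sketched under assumptions. It converts the wide name with std::wcstombs, i.e. with whatever the current C locale says, and then uses the existing narrow open. The helper names narrow_with_current_locale and open_wide are hypothetical, and this is only one of several equally defensible mappings, which is precisely the problem.

```cpp
#include <climits>
#include <cstdlib>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a wide name to a narrow one using the
// current C locale. The same wide name can therefore yield different
// byte sequences under different locales.
std::string narrow_with_current_locale(const std::wstring& wide)
{
    // Worst case: every wide character expands to MB_LEN_MAX bytes.
    std::vector<char> buffer(wide.size() * MB_LEN_MAX + 1);
    const std::size_t written =
        std::wcstombs(buffer.data(), wide.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("name not representable in the current locale");
    }
    return std::string(buffer.data(), written);
}

// Hypothetical wide-name open, expressed in terms of the narrow one.
bool open_wide(std::ifstream& file, const std::wstring& name)
{
    file.open(narrow_with_current_locale(name).c_str());
    return file.is_open();
}
```

A caller would read the name with std::wcin >> name and pass it to open_wide. Whether the bytes that reach the file system match what ls printed in the other window then depends on the locale in effect in each program.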

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: news_comp.std.c++_expires-2002-10-01@nmhq.net (Niklas Matthies)
Date: Fri, 11 Oct 2002 17:25:12 +0000 (UTC)
Raw View
On Thu, 10 Oct 2002 23:18:04 +0000 (UTC), "P.J. Plauger" <pjp@dinkumware.com> wrote:
>  "Niklas Matthies" <news_comp.std.c++_expires-2002-10-01@nmhq.net> wrote in message
>  news:slrnaqbp5t.ad1.news_comp.std.c++_expires-2002-10-01@moongate.nmhq.net...
>
> > On Thu, 10 Oct 2002 19:21:01 +0000 (UTC), "P.J. Plauger" <pjp@dinkumware.com> wrote:
> > >  It's the *relationship* between "abc" and L"abc" that matters to some
> > >  of us, and it is important even if you can convince yourself that
> > >  some OS somewhere is permitted to reject either or both names.
> >
> > Why?
>
>  Never mind. It's clearly unimportant to some people.

It seems that you think that it is obvious why that relationship is
important to the extent that a facility for opening files with wide-
character names should not be provided unless some (which kind of?)
guarantees with regard to that relationship are made.

Unfortunately, it's not obvious at all to me. I'm even hard-pressed to
imagine an application that would need to rely on such a guarantee.
Therefore, and since I can't remember that any illustrations were given
yet with regard to that matter, I had hoped you could provide some
insightful explanations of why you believe that it is that important.

>  I dislike shouting matches that don't lead to consensus.

Neither do I.

>  I'm checking out of this discussion.

I'm sorry for that, since honestly I've just been checking into this
discussion to try to understand your viewpoint.

-- Niklas Matthies
--
Save Farscape - get informed and involved: http://farscape.wdsection.com/
Together, we can get Farscape back on air. Crackers *do* matter.






Author: semzx@newmail.ru (Alexander E. Patrakov)
Date: Fri, 11 Oct 2002 17:25:19 +0000 (UTC)
Raw View
bdawes@acm.org (Beman Dawes) wrote in message news:<70fa0367.0210061718.3b8747e8@posting.google.com>...
> ron@sensor.com ("Ron Natalie") wrote in message news:<sgmn9.11631$Jw5.6647@fe04>...
>
> > I therefore suggest the following changes:
> >
> > Add the following members to parallel the existing const char* arg'd versions
> >
> > [wchar_t* arg'd overloads]
> With what semantics on operating systems which do not support wchar_t
> filenames?
>
Valid question. And I have already posted a proposal to this thread
which avoids the problem and still allows using wide character
filenames on platforms supporting them.
I post it once again, since I have got only the answer that

> The problem is not with the type of the elements used to represent
> characters. It's with the relationship between byte- and wide-character
> sequences used to represent file names.

and I am not satisfied with that (I think that both problems do exist
and I solved one of them - am I right?).

Now, the details:
1) typedef implementation_defined filenamechar_t;

where implementation_defined is char on platforms which accept narrow
filenames natively and implementation_defined is wchar_t on platforms
on which native filename handling routines accept wchar_t*.

2) Add overloads of above mentioned functions with filenamechar_t
arguments on platforms where filenamechar_t is not char. Note that
there are no overloads with wchar_t* on platforms accepting only char*
natively, and therefore there is no question on correct semantics of
such overloads.

The good point:

A library writer can use vector<basic_string<filenamechar_t> > as the
return type of a function enumerating all files in a directory. This
offers some (maybe false) feeling of portability.

And the bad point:

My proposal does not solve the problem of constructing filenames
dynamically (e.g. by adding a default extension to the user-entered
filename) since there is no default conversion between narrow and wide
strings and they cannot be easily concatenated.
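
A minimal sketch of points 1) and the bad point, under stated assumptions: the platform test on _WIN32 stands in for "implementation_defined", the names filename_string, from_basic and with_default_extension are hypothetical, and the workaround only handles extensions written in the basic character set, assuming that a basic character has the same value as char and as wchar_t.

```cpp
#include <string>

// Stand-in for "typedef implementation_defined filenamechar_t;".
#if defined(_WIN32)
typedef wchar_t filenamechar_t;   // native file routines accept wchar_t*
#else
typedef char filenamechar_t;      // native file routines accept char*
#endif

typedef std::basic_string<filenamechar_t> filename_string;

// Hypothetical helper: builds a filename_string from a narrow literal,
// one character at a time. Only meant for the basic character set.
filename_string from_basic(const char* text)
{
    filename_string out;
    for (; *text != '\0'; ++text) {
        out += static_cast<filenamechar_t>(*text);
    }
    return out;
}

// Hypothetical helper: appends `extension` when `name` contains no dot.
// Deliberately simplistic: a dot in a directory part also counts.
filename_string with_default_extension(const filename_string& name,
                                       const char* extension)
{
    const filenamechar_t dot = static_cast<filenamechar_t>('.');
    if (name.find(dot) != filename_string::npos) {
        return name;
    }
    return name + from_basic(extension);
}
```

The same source compiles on both kinds of platform, but anything outside the basic character set still needs a real conversion, so the bad point is only worked around, not solved.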

Any other good and bad points?

Alexander E. Patrakov






Author: loewis@informatik.hu-berlin.de (Martin v. =?iso-8859-1?q?L=F6wis?=)
Date: Fri, 11 Oct 2002 17:36:58 +0000 (UTC)
Raw View
eldiener@earthlink.net ("Edward Diener") writes:

> By creating a proposal which others would find acceptable because the idea
> as proposed ( and possibly the implementation of that idea ) behind it is a
> good one.

You are certainly free to create a proposal, and I'm sure many
committee members will consider it, as will compiler vendors.

You just can't expect to get a *formal* notification of acceptance or
rejection, if you don't follow the procedures.

Regards,
Martin






Author: kanze@gabi-soft.de (James Kanze)
Date: Fri, 11 Oct 2002 17:41:11 +0000 (UTC)
Raw View
pjp@dinkumware.com ("P.J. Plauger") wrote in message
news:<3da5cff0$0$13809$4c41069e@reader1.ash.ops.us.uu.net>...

> >        I don't believe that C++ should wait ten years to make such a
> > decision and even waiting six years, circa 2008 when I am told the
> > next C++ standard will be ratified, seems too long.

> Too long for what? People can open files with wide names right now, as
> James Kanze keeps pointing out. They just can't do it in a
> standard-conforming way.

Actually, I don't remember raising that particular point.  What I have
been saying is that I don't want anything standardized without our
having a very good idea what to expect from a quality implementation.

It's important to realize what the effect of standardizing something
is.  In this case, I'm quite convinced that any final definition in the
standard will be that it is implementation defined.  This IS the case,
more or less for narrow character filenames already, and I certainly
don't want the standard to impose some sort of behavior that isn't
implementable on, say, a Palm.

What is important to realize is that, implementation defined or not, I
know what to expect from an open with narrow character filenames on the
systems I have to deal with.  Under Solaris (my main platform at the
moment), I currently use four different compilers (Sun CC 4.2, Sun CC
5.0, g++ 2.95.2 and g++ 3.0), with at least three different libraries.
Nevertheless, and even though it is implementation defined, they all
do the same thing for an open with narrow characters.  I haven't tried
it, but I feel very certain that if I downloaded the Dinkumware library,
or the STLport, that they would also do the same thing.

If wide character filenames are adopted, the newer versions of the
libraries will implement them.  But what will they implement?  Will the
g++ library still have the same semantics as the Sun library?  Could I
replace one of the libraries with a library from Dinkumware or the
STLport with no change in semantics?

I expect minor changes in semantics when moving from Unix to Windows
(but they are very minor).  I don't expect the filenames I currently use
to work at all on a Palm.  But for Windows and Unix, there is a
generally acknowledged semantics for narrow character filenames, which
is universally implemented.  Without a generally acknowledged semantics
for wide character filenames, at least in the usual environments, their
standardization is premature.

The current situation is that not only is there no general
acknowledgement as to what the semantics should be, but that I don't
even know what they should be for my own personal case.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: alf_p_steinbach@yahoo.no.invalid (Alf P. Steinbach)
Date: 11 Oct 2002 20:10:13 GMT
Raw View
On Fri, 11 Oct 2002 17:41:11 +0000 (UTC), kanze@gabi-soft.de (James Kanze) wrote:

>pjp@dinkumware.com ("P.J. Plauger") wrote in message
>news:<3da5cff0$0$13809$4c41069e@reader1.ash.ops.us.uu.net>...
>
>> >        I don't believe that C++ should wait ten years to make such a
>> > decision and even waiting six years, circa 2008 when I am told the
>> > next C++ standard will be ratified, seems too long.
>
>> Too long for what? People can open files with wide names right now, as
>> James Kanze keeps pointing out. They just can't do it in a
>> standard-conforming way.
>
>Actually, I don't remember raising that particular point.  What I have
>been saying is that I don't want anything standardized without our
>having a very good idea what to expect from a quality implementation.
>
>It's important to realize what the effect of standardizing something
>is.  In this case, I'm quite convinced that any final definition in the
>standard will be that it is implementation defined.  This IS the case,
>more or less for narrow character filenames already, and I certainly
>don't want the standard to impose some sort of behavior that isn't
>implementable on, say, a Palm.
>
>What is important to realize is that, implementation defined or not, I
>know what to expect from an open with narrow character filenames on the
>systems I have to deal with.  Under Solaris (my main platform at the
>moment), I currently use four different compilers (Sun CC 4.2, Sun CC
>5.0, g++ 2.95.2 and g++ 3.0), with at least three different libraries.
>Nevertheless, and even though it is implementation defined, they all
>do the same thing for an open with narrow characters.  I haven't tried
>it, but I feel very certain that if I downloaded the Dinkumware library,
>or the STLport, that they would also do the same thing.
>
>If wide character filenames are adopted, the newer versions of the
>libraries will implement them.  But what will they implement?  Will the
>g++ library still have the same semantics as the Sun library?  Could I
>replace one of the libraries with a library from Dinkumware or the
>STLport with no change in semantics?
>
>I expect minor changes in semantics when moving from Unix to Windows
>(but they are very minor).  I don't expect the filenames I currently use
>to work at all on a Palm.  But for Windows and Unix, there is a
>generally acknowledged semantics for narrow character filenames, which
>is universally implemented.  Without a generally acknowledged semantics
>for wide character filenames, at least in the usual environments, their
>standardization is premature.
>
>The current situation is that not only is there no general
>acknowledgement as to what the semantics should be, but that I don't
>even know what they should be for my own personal case.

James, I'll reply to your reply to another posting of mine (the
"suggestion") later, need to think about it.

But with respect to the above, consider:


   In Windows, a wide character filename may be up to roughly 32,000
   characters long  --  no matter that e.g. Explorer doesn't support it.

   A narrow character filename may only be up to 260 characters long.

   Wide character filenames allow additional syntax.


So it's not just a question of character/encoding conversion.

Cheers,


- Alf

(Throwing a few refreshing facts into the discussion, always fun.  Most
of the doc about Windows filenames is found with CreateFile().)






Author: guenther+expires-2002-11@gac.edu (Philip Guenther)
Date: Sat, 12 Oct 2002 05:52:31 +0000 (UTC)
Raw View
hyrosen@mail.com (Hyman Rosen) wrote in message news:<1034281467.63765@master.nyc.kbcfp.com>...
...
> So we define a std::wfopen whose first argument is a const wchar_t *.
> In exactly the same way as fopen, this function is defined to open the
> file whose name is that string. We declare open methods for file streams
> and buffers which take wide string names, and define them to open the
> files "as if" by std::wfopen.
>
> The semantics are exactly the same. I fail to see why we need any more
> comprehensive definition for wfopen than for fopen.

What is of concern is the relationship between the two namespaces for
filenames.  For a given implementation and locale, it would seem there
are four possibilities:

1) there are files openable using wide characters that are not
   openable using narrow characters

2) there are files openable using narrow characters that are not
   openable using wide characters

3) both (1) and (2), so that you need to use both interfaces to
   have access to the entire set of files openable in (extended) C++

4) you can open the same set of files using either interface
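
The four possibilities are just the four ways two sets can relate, which can be sketched as follows. This is a toy model under assumptions: the sets hold abstract file identifiers (not names), the enum Reach and the function classify are hypothetical, and C++11 syntax is used.

```cpp
#include <algorithm>
#include <set>
#include <string>

// Numbered to match the four possibilities listed above.
enum class Reach {
    WideReachesMore = 1,    // (1) some files are openable only via wide names
    NarrowReachesMore = 2,  // (2) some files are openable only via narrow names
    EachReachesMore = 3,    // (3) both (1) and (2)
    Same = 4                // (4) both interfaces reach the same files
};

// Hypothetical classifier. Each set contains identifiers of the files
// that are openable through the corresponding interface.
Reach classify(const std::set<std::string>& via_narrow,
               const std::set<std::string>& via_wide)
{
    // std::includes(a, b) is true when b is a subset of a.
    const bool narrow_covers_wide =
        std::includes(via_narrow.begin(), via_narrow.end(),
                      via_wide.begin(), via_wide.end());
    const bool wide_covers_narrow =
        std::includes(via_wide.begin(), via_wide.end(),
                      via_narrow.begin(), via_narrow.end());

    if (narrow_covers_wide && wide_covers_narrow) return Reach::Same;
    if (wide_covers_narrow) return Reach::WideReachesMore;
    if (narrow_covers_wide) return Reach::NarrowReachesMore;
    return Reach::EachReachesMore;
}
```

Only in the last case, Reach::Same, could a program use a single interface without losing access to any file.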


Which of those possibilities, if any, do you feel that the standard
should disallow and consider non-compliant?  Based on your answer,
please respond to the appropriate following paragraph:

As an application writer striving for maximal portability, I would
prefer that my programs be able, on all compliant systems, to open
all the files that they possibly could.  So, if possibility (3) is
permitted by the standard, or if both possibilities (1) and (2) are
permitted, then my program would need to utilize _both_ interfaces.
If you think the standard should _not_ disallow possibility (3),
or should permit both possibilities (1) and (2), please suggest how
I should present to the user, in documentation and interface, the
distinction between the two filename APIs.

If the standard were to disallow both (1) and (3), so that the wide
character interfaces provide no additional 'reach', then there would
clearly exist a mapping from wide character filenames to narrow
character filenames.  If you support wide character filenames but
think the standard should disallow possibilities (1) and (3), please
explain why the standard shouldn't just standardize a function that
performs the (possibly locale-dependent) mapping.

If the standard allows possibility (1) but not (2) or (3), so that
any file openable via the narrow character interface would be
openable via the wide character interface, but where the reverse isn't
true, then all existing programs compliant with the C++ standard
would implicitly become 'second class citizens' compared to programs
that utilized the wide character interface (or both).  If you support
this combination, how would you suggest programmers deal with the
transition period, where they have a choice between greater portability
(by using only the narrow character interface) and complete support
for platforms where (1) is true?  Do you feel this would have any
effects on maintainability of C++ code?


Philip Guenther

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: r-smith@ihug.co.nz (Ross Smith)
Date: Sat, 12 Oct 2002 05:52:54 +0000 (UTC)
Raw View
James Kanze wrote:
>
> What I'm arguing against isn't your solution.  What I'm arguing
> against is the idea that this is a trivial undertaking, with "obvious"
> solutions.  That doesn't mean that we shouldn't undertake it.  But it
> does mean not to expect miracles anytime soon.  And while I generally
> think that some support for wide character filenames is necessary and
> inevitable, I'd rather have no support, than to have something that
> doesn't work, and that we have to change later.

I think you're asking too much of C++.

As far as I can see, there's nothing wrong with the simple, obvious
solution: add language to the standard about opening files with wide
character names that simply mirrors the existing language about narrow
characters. (Well, the existing language refers to C's <stdio.h> so we
can't just literally cut-and-paste the wording, but importing the
relevant parts of the C standard's wording, mutatis mutandis, should be
practical.) This would solve the basic problem: the need to open files
with wide character names.

You and P. J. Plauger and others keep bringing up the issue of
converting between wide and narrow file names. I don't see why the
standard has to address this at all, beyond what it already has about
wide/narrow string conversion (not a lot). It seems to me to be a
completely orthogonal issue, and I, and apparently several others,
can't understand why you insist that it has to be solved before we can
introduce wide character file names.

What's wrong with the obvious approach: allow people to use WC file
names in the same way we use NC ones now, treat them as two disjoint
namespaces, and just say nothing at all about conversion between the
two?

You keep insisting that we need to be able to determine whether a WC
name and a NC name refer to the same file. But _we don't have that
now_, with NC names alone, so I can't understand why you consider it a
reasonable thing to demand before WC names can be introduced.

--
Ross Smith ......... r-smith@ihug.co.nz ......... Auckland, New Zealand

"I'm deeply concerned about a leader who has ignored the United Nations
for all these years, refused to conform to resolution after resolution
after resolution, who has weapons of mass destruction."
                                                -- George W. Bush, Jr.

---





Author: pdimov@mmltd.net (Peter Dimov)
Date: Tue, 8 Oct 2002 17:39:10 +0000 (UTC)
Raw View
bdawes@acm.org (Beman Dawes) wrote in message news:<70fa0367.0210061718.3b8747e8@posting.google.com>...
> ron@sensor.com ("Ron Natalie") wrote in message news:<sgmn9.11631$Jw5.6647@fe04>...
>
> > I therefore suggest the following changes:

[wchar_t filenames]

> With what semantics on operating systems which do not support wchar_t
> filenames?

What semantics do the existing functions have on systems which only
support wchar_t filenames? Let's be fair. Some systems have to convert
today. Some systems will have to convert tomorrow.

> Based on past committee discussions, numerous members of the standards
> committee's library working group would like to provide for wchar_t
> filenames. But not a single LWG member has been willing to speak in
> favor of doing so without explicit semantics.
>
> "Implementation defined" won't fly; that's just an "illusion of
> portability" without any underlying reality. The implementors say they
> have no idea what semantics they would be defining.

The semantics _must_ be implementation defined. The meaning of a file
name has always been implementation defined.

That said, let me outline a "reference conversion" on a system where
the native file name is an NTBS, and wchar_t is UCS-x.

* If the wchar_t sequence contains only characters in the [1..255]
range and does not start with 0xEF 0xBB 0xBF, use the corresponding
byte sequence;

* Otherwise, convert the wchar_t sequence to UTF-8 and prepend 0xEF
0xBB 0xBF.

Is this good enough? If not, why not?
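
The two rules above can be sketched in code. This is only an illustration under stated assumptions: each wchar_t is taken to hold one whole UCS code point (no surrogate pairs), and the names reference_conversion and append_utf8 are hypothetical helpers, not part of any standard or proposal.

```cpp
#include <cassert>
#include <string>

// Append one code point to `out` as UTF-8.
// Assumption: each wchar_t holds a whole UCS code point (no surrogate pairs).
void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper implementing the two rules quoted above.
std::string reference_conversion(const std::wstring& name) {
    // Rule 1 applies only if every character lies in [1..255] ...
    bool byte_range = true;
    for (wchar_t c : name) {
        unsigned long cp = static_cast<unsigned long>(c);
        if (cp < 1 || cp > 255) { byte_range = false; break; }
    }
    // ... and the name does not itself begin with the marker values.
    bool has_marker = name.size() >= 3 &&
        name[0] == 0xEF && name[1] == 0xBB && name[2] == 0xBF;

    std::string out;
    if (byte_range && !has_marker) {
        // Rule 1: copy each value as a single byte.
        for (wchar_t c : name) {
            out += static_cast<char>(static_cast<unsigned char>(c));
        }
        return out;
    }
    // Rule 2: the marker bytes followed by the UTF-8 encoding of the name.
    out = "\xEF\xBB\xBF";
    for (wchar_t c : name) {
        append_utf8(out, static_cast<unsigned long>(c));
    }
    return out;
}
```

The marker check in the first rule seems to be what keeps the mapping unambiguous: a byte-range name that happens to start with 0xEF 0xBB 0xBF is sent through the second rule, so it cannot collide with an encoded name.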

---





Author: ron@sensor.com ("Ron Natalie")
Date: Tue, 8 Oct 2002 17:39:49 +0000 (UTC)
Raw View
"Francis Glassborow" <francis.glassborow@ntlworld.com> wrote in message news:EW7Uh0Lsxgo9EwsU@robinton.demon.co.uk...

> No, we are quite willing to claim credit for implementing excellent
> ideas and campaigning for them. What we are not willing to accept is the
> principle that someone should do something about an idea. If you have an
> idea and really want to see it discussed you must do the spade work. I
> have quite enough good ideas of my own without being asked to work up
> proposals based on someone else's sketchy suggestion.

It's not sketchy.  It's simple and straight-forward and we've already done it
for the  (at least DEBUG source version) shipped with Visual C++ 6 SP 5.
All you do is add the appropriate interfaces I detailed to call the existing
implementation wide char file interfaces (in the VC++ case these appear
to be wide char versions of the POSIX open routine, which at a lower
level call the wide char CreateFile).

This, by the way, demonstrates the use in both a WCHAR_T based system
and a non-WCHAR_T based system as WIN32 ships on both systems
(compile time flags select which it is using natively).

Similar wrappers were done for Sun's Forte compiler.    It's just not all that
hard.   If I have to wrap my brain around yet another library implementation
like STLPort, that's fine, but it's demonstrably not a hard problem.   Nobody
has come up with a single issue that introduces any complexity to this thing
as long as we are sticking to the fstream/streambuf issues.
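
A wrapper of the kind described might look roughly like the sketch below. It is not the code shipped with any compiler. The name open_wide_name is hypothetical, the Windows branch assumes the Microsoft C runtime's _wfopen, and the other branch assumes a system with no native wide interface, where the name is converted with wcstombs in the current C locale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical wrapper name; not part of any standard or vendor library.
std::FILE* open_wide_name(const std::wstring& name, const char* mode) {
#if defined(_WIN32)
    // The Microsoft C runtime provides _wfopen for wide names;
    // the mode string has to be widened as well.
    std::wstring wmode(mode, mode + std::strlen(mode));
    return _wfopen(name.c_str(), wmode.c_str());
#else
    // Assumption: no native wide interface, so convert the name using
    // the current C locale. MB_CUR_MAX bounds the bytes per character.
    std::vector<char> buf(name.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), name.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        return nullptr;  // name not representable as a narrow string
    }
    return std::fopen(buf.data(), mode);
#endif
}
```

The same dispatch could sit behind an fstream or filebuf open() overload; the sketch uses FILE* only to stay short.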



---





Author: ron@sensor.com ("Ron Natalie")
Date: Tue, 8 Oct 2002 17:54:30 +0000 (UTC)
Raw View
"James Kanze" <kanze@gabi-soft.de> wrote in message news:d6651fb6.0210080618.bdb1bfe@posting.google.com...

>   - If the system *only* uses wide character filenames, I would expect
>     it to accept narrow character filenames anyway.  Since these are the
>     only ones which are standard at present.

Yes, this is true.

>   - If I specify a filename "abc", I expect the system to find the file
>     with that name (supposing it exists), even if the file has a wide
>     character filename.

Yes.

>   - I would certainly expect that "abc" and L"abc" refer to the same
>     file.  But I could be wrong in my expectations.

This is a slightly different issue.   The standard says NOTHING about the encoding
of string literals.   It's hard to imagine a system where the wide strings aren't supersets
of what's available in the narrow string literals, but I'm not sure it says that.   However,
I don't think it makes a difference.

>   - If wide character filenames are supported, I would expect to be able
>     to use them to specify filenames on a system like Plan 9, where the
>     actual filenames are in UTF-8.  (I believe that this is also the
>     direction that Linux is going.)

It's possible.  It depends on what the implementation uses for the encoding
of the wide strings.   On Solaris, wide strings are UTF-32 and the file system
interfaces all accept UTF-8 conversions from this.

> The question is, of course, what does it mean to say that a file can be
> opened with wchar_t based names.  If the proposal is just to add the
> necessary functions, but that they will fail systematically on most
> systems (which don't support wide character filenames), then I fail to
> see how that would help portability.

It's NOT a portability issue.   Filenames are HIGHLY machine specific.  Even
for char based file names there is no guarantee that an arbitrary
sequence is a valid file name.

The issue is that the language is broken, there is no way to create files with
wide character names on machines where the interface requires wide names.
There's no real good workaround.    I'm specifically limiting myself to this
which is quite clear as opposed to the other morass (like the fact that strings
are totally ignorant of multibyte and the contents of the streams fare little better).



---





Author: rmaddox@isicns.com (Randy Maddox)
Date: Tue, 8 Oct 2002 21:40:53 +0000 (UTC)
Raw View
francis.glassborow@ntlworld.com (Francis Glassborow) wrote in message news:<EW7Uh0Lsxgo9EwsU@robinton.demon.co.uk>...
> In article <ONjo9.13879$OB5.1397398@newsread2.prod.itd.earthlink.net>,
> Edward Diener <eldiener@earthlink.net> writes
> >I will parse it for you. Various people who answered previously mentioned
> >that the only way which would exist for wide character filename support to
> >be taken seriously as a proposal is if an implementation for such support
> >were done and shown to the C++ committee. I still have no idea of the formal
> >way in which a proposal is made to the C++ committee for changes to the C++
> >language or C++ standard libraries, and gathered from the previous
> >discussion that only implementations of proposed changes, and not just
> >ideas, will somehow be considered.
>
> No, we are quite willing to claim credit for implementing excellent
> ideas and campaigning for them. What we are not willing to accept is the
> principle that someone should do something about an idea. If you have an
> idea and really want to see it discussed you must do the spade work. I
> have quite enough good ideas of my own without being asked to work up
> proposals based on someone else's sketchy suggestion.
>
>
> --
> Francis Glassborow      ACCU
> 64 Southfield Rd
> Oxford OX4 1PA          +44(0)1865 246490
> All opinions are mine and do not represent those of any organisation
>

Here, I think, is the nub of the problem.  These ideas vis-a-vis wide
character file names have been discussed to death.  Those in favor
cannot see what the problem with these ideas is, i.e., what are the
technical arguments against them?  Those opposed have not presented
anything that, IMHO, represented a valid technical argument, and
instead have responded "Go implement it and get back to us", or "do
the spade work".

That's all well and good, but what does it mean?  I personally don't
have the time to implement these changes, nor would my employer be
happy with my mucking about with our standard library code.  Others
who support these ideas are no doubt in the same boat.  Does that mean
that any idea we might have, no matter how good, is to be ignored?

So what do we do?  We have ideas that seem good and useful, and we
would like to see them move forward somehow.  Instead of telling us
that isn't good enough, what about some concrete suggestions about
what we might be better doing instead?  Is there some other forum we
should be addressing?  Are there compiler vendors who might be
interested?  Who might they be?  And how might we contact them?  I
know that many compiler vendors follow this newsgroup.  Why aren't
they picking up on this discussion and responding if they are
interested?

On the one hand, I find myself in reluctant agreement that a standard
is not the place for invention, i.e., the point of a standard is to
document widely used standard practice so that we can all count on it.
 On the other hand, invention too is necessary, but how and where does
it take place?

Another way of looking at this is that those who are posting these
ideas are users of C++, and we are providing user feedback on what
would make C++ a better product for us.  Products that ignore feedback
from their users are generally destined to lose market share to
products that pay attention to their users.  From this point of view,
it would make sense for the committee to help C++ users with good
ideas to move those ideas forward.  How do we do that?

Thanks.

Randy.

---





Author: ron@sensor.com ("Ron Natalie")
Date: Tue, 8 Oct 2002 22:24:43 +0000 (UTC)
Raw View
"Hyman Rosen" <hyrosen@mail.com> wrote in message news:1034107932.395971@master.nyc.kbcfp.com...
> James Kanze wrote:
> >> Specifically, how it is to be implemented for operating systems that
> >> do not support wide character file names.
> >
> > That is the big question.  I don't think it is a killer problem, but to
> > date, no one has really decided to address it.
>
> Since wcsrtombs is part of standard C and C++,
> I really don't see the problem here. We have a
> standard way to convert a wide string to a
> multi-byte string, so if the OS can't open a
> wide-named file directly, convert the name
> using this function, and use that name instead.

That is a solution (note that wcsrtombs does demonstrate a standardization
DEFECT, in that C++ refers to it in the context of the C90 spec, in which it doesn't
appear; C99 defines it, but C90 only defines wcstombs).

It may or may not be possible to do such a conversion however.   In some cases
this may clash with existing non-MB legal characters.   Again the whole character
mapping is by nature of the beast highly machine dependent.
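
For concreteness, the wcsrtombs fallback under discussion could be sketched as follows. The helper name to_narrow_name is hypothetical, and the result depends entirely on the current C locale. The failure return is the case where no conversion is possible.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: converts `wide` to a multibyte string using the
// current C locale. Returns false if some character has no representation.
bool to_narrow_name(const std::wstring& wide, std::string& narrow) {
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = wide.c_str();

    // With a null destination, wcsrtombs only reports the required length.
    std::size_t len = std::wcsrtombs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1)) {
        return false;  // unconvertible character in this locale
    }

    // Second pass: perform the conversion into a buffer of the right size.
    std::vector<char> buf(len + 1);
    state = std::mbstate_t();
    src = wide.c_str();
    std::wcsrtombs(buf.data(), &src, buf.size(), &state);
    narrow.assign(buf.data(), len);
    return true;
}
```

A caller would pass narrow.c_str() to fopen on success. What to do on failure is exactly the open question, and the sketch does not address the possible clash with existing narrow names.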



---





Author: rmaddox@isicns.com (Randy Maddox)
Date: Wed, 9 Oct 2002 01:19:36 +0000 (UTC)
Raw View
bop2@telia.com ("Bo Persson") wrote in message news:<g6no9.721$hV3.29062@newsb.telia.net>...
>
> I think the message from the committee is that they already *have*
> considered the idea, and rejected it. To have them reconsider, someone has
> to show how it can be of general use. Specifically, how it is to be
> implemented for operating systems that do not support wide character file
> names.
>
>
> Bo Persson
> bop2@telia.com

That's an easy one.  Using a wide-character file name to open a file
on an OS that does not support wide-character file names will fail.
Seems simple enough to implement to me.  Where is the problem?

Randy.

---





Author: pjp@dinkumware.com ("P.J. Plauger")
Date: Wed, 9 Oct 2002 01:20:27 +0000 (UTC)
Raw View
""Ron Natalie"" <ron@sensor.com> wrote in message news:ErIo9.267632$F21.203329@fe02...

> That is a solution (note that wcsrtombs does demonstrate a standardization
> DEFECT, in that C++ refers to it in the context of the C90 spec, in which it doesn't
> appear; C99 defines it, but C90 only defines wcstombs).
>
> It may or may not be possible to do such a conversion however.   In some cases
> this may clash with existing non-MB legal characters.   Again the whole character
> mapping is by nature of the beast highly machine dependent.

Uh huh.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com



---





Author: andys@evo6.com (Andy Sawyer)
Date: Wed, 9 Oct 2002 08:49:48 +0000 (UTC)
Raw View
rmaddox@isicns.com (Randy Maddox) writes:

> bop2@telia.com ("Bo Persson") wrote in message
> news:<g6no9.721$hV3.29062@newsb.telia.net>...
> >
> > I think the message from the committee is that they already *have*
> > considered the idea, and rejected it. To have them reconsider, someone has
> > to show how it can be of general use. Specifically, how it is to be
> > implemented for operating systems that do not support wide character file
> > names.
> >
> >
> > Bo Persson
> > bop2@telia.com
>
> That's an easy one.  Using a wide-character file name to open a file
> on an OS that does not support wide-character file names will fail.
> Seems simple enough to implement to me.  Where is the problem?

So where's the benefit to you in having it? Apart from making your
programs "portable" to an environment where they are guaranteed to
fail. Please show how it can be of general use.

Regards,
 Andy S.
--
"Light thinks it travels faster than anything but it is wrong. No matter
 how fast light travels it finds the darkness has always got there first,
 and is waiting for it."                  -- Terry Pratchett, Reaper Man

---





Author: eldiener@earthlink.net ("Edward Diener")
Date: Wed, 9 Oct 2002 08:50:10 +0000 (UTC)
Raw View
"Francis Glassborow" <francis.glassborow@ntlworld.com> wrote in message
news:EW7Uh0Lsxgo9EwsU@robinton.demon.co.uk...
> In article <ONjo9.13879$OB5.1397398@newsread2.prod.itd.earthlink.net>,
> Edward Diener <eldiener@earthlink.net> writes
> >I will parse it for you. Various people who answered previously mentioned
> >that the only way which would exist for wide character filename support to
> >be taken seriously as a proposal is if an implementation for such support
> >were done and shown to the C++ committee. I still have no idea of the formal
> >way in which a proposal is made to the C++ committee for changes to the C++
> >language or C++ standard libraries, and gathered from the previous
> >discussion that only implementations of proposed changes, and not just
> >ideas, will somehow be considered.
>
> No, we are quite willing to claim credit for implementing excellent
> ideas and campaigning for them. What we are not willing to accept is the
> principle that someone should do something about an idea. If you have an
> idea and really want to see it discussed you must do the spade work. I
> have quite enough good ideas of my own without being asked to work up
> proposals based on someone else's sketchy suggestion.

I am still waiting for a C++ committee member to explain to me how a formal
proposal of change to the C++ language or libraries is submitted to the
committee. I assume you are just such a member. Would you, or any other
member, care to explain this step by step ?


---





Author: francis.glassborow@ntlworld.com (Francis Glassborow)
Date: Wed, 9 Oct 2002 11:50:05 +0000 (UTC)
Raw View
In article <5%No9.16510$OB5.1669518@newsread2.prod.itd.earthlink.net>,
Edward Diener <eldiener@earthlink.net> writes
>I am still waiting for a C++ committee member to explain to me how a formal
>proposal of change to the C++ language or libraries is submitted to the
>committee. I assume you are just such a member. Would you, or any other
>member, care to explain this step by step ?

Write a paper providing


1) Rationale (i.e. why this extension/change)
2) What the change is including proposed changes to the wording of the
current standard
3) Impact statement where you itemise the places within the standard
that your proposal will affect.

Many proposals will fall at 1) fence because the rationale will be
considered to be inadequate for the cost  and complexity of the change.
Others will fall because 2) fails to identify the majority of places
where changed wording will be necessary. The fact that the proposed
wording is not exactly correct would not be reason for rejection.
If you can actually write 3) you probably should be an active member of
the committee even if only through electronic participation.

4) Get the support of at least one National Body, preferably that of the
country in which you reside.

If you cannot persuade at least one NB to support your proposal you do
not have much chance.

--
Francis Glassborow      ACCU
64 Southfield Rd
Oxford OX4 1PA          +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

---





Author: pjp@dinkumware.com ("P.J. Plauger")
Date: Wed, 9 Oct 2002 15:50:24 +0000 (UTC)
Raw View
"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:8c8b368d.0210081152.3d9dc06@posting.google.com...

> Here, I think, is the nub of the problem.  These ideas vis-a-vis wide
> character file names have been discussed to death.  Those in favor
> cannot see what the problem with these ideas is, i.e., what are the
> technical arguments against them?  Those opposed have not presented
> anything that, IMHO, represented a valid technical argument, and
> instead have responded "Go implement it and get back to us", or "do
> the spade work".
>
> That's all well and good, but what does it mean?  I personally don't
> have the time to implement these changes, nor would my employer be
> happy with my mucking about with our standard library code.  Others
> who support these ideas are no doubt in the same boat.  Does that mean
> that any idea we might have, no matter how good, is to be ignored?

Yes. If *you* can't justify the time to pursue your obviously
correct and ONE RIGHT idea, why should those of us who question
its ONEness or its RIGHTness?

> So what do we do?  We have ideas that seem good and useful, and we
> would like to see them move forward somehow.  Instead of telling us
> that isn't good enough, what about some concrete suggestions about
> what we might be better doing instead?

We've told you. You just don't like the answer.

>                                       Is there some other forum we
> should be addressing?  Are there compiler vendors who might be
> interested?

Maybe.

>             Who might they be?

Microsoft. Metrowerks. GCC. etc. etc.

>                               And how might we contact them?

If you don't know...

>                                                               I
> know that many compiler vendors follow this newsgroup.  Why aren't
> they picking up on this discussion and responding if they are
> interested?

Uh, maybe they're not convinced you have the ONE RIGHT idea?

> On the one hand, I find myself in reluctant agreement that a standard
> is not the place for invention, i.e., the point of a standard is to
> document widely used standard practice so that we can all count on it.
>  On the other hand, invention too is necessary, but how and where does
> it take place?

By inventors, who believe strongly enough in what they see that they
actually follow through?

> Another way of looking at this is that those who are posting these
> ideas are users of C++, and we are providing user feedback on what
> would make C++ a better product for us.  Products that ignore feedback
> from their users are generally destined to lose market share to
> products that pay attention to their users.

True enough. And products that respond to vague requests such as:
``Make it faster,'' or ``Make it cheaper,'' or ``Do what I mean''
don't last long either.

>                                             From this point of view,
> it would make sense for the committee to help C++ users with good
> ideas to move those ideas forward.  How do we do that?

You've been told repeatedly. You just don't like the answer.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com



---





Author: eldiener@earthlink.net ("Edward Diener")
Date: Wed, 9 Oct 2002 15:51:21 +0000 (UTC)
Raw View
"Francis Glassborow" <francis.glassborow@ntlworld.com> wrote in message
news:brGD+LCax$o9Ewdj@robinton.demon.co.uk...
> In article <5%No9.16510$OB5.1669518@newsread2.prod.itd.earthlink.net>,
> Edward Diener <eldiener@earthlink.net> writes
> >I am still waiting for a C++ committee member to explain to me how a formal
> >proposal of change to the C++ language or libraries is submitted to the
> >committee. I assume you are just such a member. Would you, or any other
> >member, care to explain this step by step ?
>
> Write a paper providing
>
>
> 1) Rationale (i.e. why this extension/change)
> 2) What the change is including proposed changes to the wording of the
> current standard
> 3) Impact statement where you itemise the places within the standard
> that your proposal will affect.

Is there any particular form each of these parts take, or is it just a
free-form document ? To whom in particular does this document go ?

>
> Many proposals will fall at 1) fence because the rationale will be
> considered to be inadequate for the cost  and complexity of the change.
> Others will fall because 2) fails to identify the majority of places
> where changed wording will be necessary. The fact that the proposed
> wording is not exactly correct would not be reason for rejection.
> If you can actually write 3) you probably should be an active member of
> the committee even if only through electronic participation.
>
> 4) Get the support of at least one National Body, preferably that of the
> country in which you reside.

What is a "National Body" in the country where I reside ? I have no idea to what
this refers.

>
> If you cannot persuade at least one NB to support your proposal you do
> not have much chance.

Why should it not be adequate merely to persuade the C++ standards committee
?


---





Author: rmaddox@isicns.com (Randy Maddox)
Date: Wed, 9 Oct 2002 15:51:31 +0000 (UTC)
Raw View
kanze@gabi-soft.de (James Kanze) wrote in message news:<d6651fb6.0210080618.bdb1bfe@posting.google.com>...
> ron@sensor.com ("Ron Natalie") wrote in message
> news:<NXno9.3334$cG.719@fe04>...
>
> Not having any experience with a system using wide character file names,
> it's hard to say.  Still
>
>   - If the system *only* uses wide character filenames, I would expect
>     it to accept narrow character filenames anyway.  Since these are the
>     only ones which are standard at present.
>
>   - If I specify a filename "abc", I expect the system to find the file
>     with that name (supposing it exists), even if the file has a wide
>     character filename.
>
>   - I would certainly expect that "abc" and L"abc" refer to the same
>     file.  But I could be wrong in my expectations.

I would probably rephrase that to say "I would certainly expect that,
on some systems, "abc" and L"abc" refer to the same file".  This is in
line with the current use of filenames where on some systems, e.g.,
Windows, "abc" and "ABC" refer to the same file, while on others,
e.g., Unix, "abc" and "ABC" refer to different files.  The bottom line
here is that the file system is part of the OS rather than part of
C++.

>
>   - If wide character filenames are supported, I would expect to be able
>     to use them to specify filenames on a system like Plan 9, where the
>     actual filenames are in UTF-8.  (I believe that this is also the
>     direction that Linux is going.)
>
> The question is, of course, what does it mean to say that a file can be
> opened with wchar_t based names.  If the proposal is just to add the
> necessary functions, but that they will fail systematically on most
> systems (which don't support wide character filenames), then I fail to
> see how that would help portability.

Actually, there are a great number of OSes that support wide character
file names, including all NT-based versions of Windows and many
versions of Unix and Linux.  The growing number of OSes that support
this is a good part of the reason this question keeps coming up.  Real
developers working on real programs keep running into a problem that
C++ is not helping them with.  Of course there are workarounds, but
when many people keep having to solve the same problem, is that not at
least potentially a reason for extending C++ to include a standard
solution?

IMHO, it looks like a good part of the issue is not whether wide
character file names should be supported by the library or not, but
rather the mapping between file names in different character sets.
The latter issue is, from what I have read in discussions on this
newsgroup, ambiguous, contentious and not likely to be resolved.  More
importantly, the mapping issue is, again IMHO, not properly part of
C++ but rather part of the OS.  I am not talking about any "illusion
of portability" here.  The plain fact is that what constitutes a valid
file name has always been defined by the OS and not by C++.  Nothing
that has been proposed here would change that.  Those who write code
to run on multiple OS platforms will have to deal with this issue,
just as they do now.  The proposed additions would simply make that a
bit easier to do.

Randy.

>
> --
> James Kanze                           mailto:jkanze@caicheuvreux.com
> Conseils en informatique orientée objet/
>                     Beratung in objektorientierter Datenverarbeitung

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: kanze@gabi-soft.de (James Kanze)
Date: Wed, 9 Oct 2002 15:51:51 +0000 (UTC)
Raw View
ron@sensor.com ("Ron Natalie") wrote in message
news:<jMEo9.265415$F21.113410@fe02>...
> "James Kanze" <kanze@gabi-soft.de> wrote in message
> news:d6651fb6.0210080618.bdb1bfe@posting.google.com...

> >   - I would certainly expect that "abc" and L"abc" refer to the same
> >     file.  But I could be wrong in my expectations.

> This is a slightly different issue.  The standard says NOTHING about
> the encoding of string literals.  It's hard to imagine a system where
> the wide strings aren't supersets of what's available in the narrow
> string literals, but I'm not sure it says that.  However, I don't think it
> makes a difference.

I think that an implementation is required to support characters in the
basic character set in all extended character sets.  So as long as the
characters are in the basic character set...

In fact, I'm not really sure what the answer should be here, but I am
sure that any proposal concerning wide character filenames will have to
address it.

> >   - If wide character filenames are supported, I would expect to be
> >     able to use them to specify filenames on a system like Plan 9,
> >     where the actual filenames are in UTF-8.  (I believe that this
> >     is also the direction that Linux is going.)

> It's possible.  It depends on what the implementation uses for the
> encoding of the wide strings.  On Solaris, wide strings are UTF-32 and
> the file system interfaces all accept UTF-8 conversions from this.

I understood part of the suggestion to be that if a system didn't
support wide character filenames (with system functions), a request to
filebuf::open with a wide character filename would fail.  Plan 9 and
Solaris don't support wide character filenames, but at least under Plan
9, I would expect this to work.

I think that this is really more of a quality of implementation issue,
rather than something that should be specified in the standard.  But I
think that if we adopt something into the standard, we should have a
pretty good idea of what a good quality implementation should do with it
in some typical cases.

> > The question is, of course, what does it mean to say that a file can
> > be opened with wchar_t based names.  If the proposal is just to add
> > the necessary functions, but that they will fail systematically on
> > most systems (which don't support wide character filenames), then I
> > fail to see how that would help portability.

> It's NOT a portability issue.  Filenames are HIGHLY machine specific.
> Even for char based file names there is no guarantee that an
> arbitrary sequence is a valid file name.

No.  But there is (almost) a guarantee that the user can specify one.

The current situation is a bit awkward.  Consider:

    std::wstring filename ;
    std::wcout << L"Enter filename:" ;
    std::wcin >> filename ;
    std::wifstream file( filename ) ;

I'm sure that most people would like to see this work.  I'm far from
sure that we have any consensus on the semantics, however, if the OS
doesn't support wide character filenames directly.
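
For comparison, here is a minimal sketch of the workaround people tend to
write today: narrow the name through the current C locale with
std::wcstombs, then open with the char-based interface.  The helper names
narrow_filename and open_wide are hypothetical, and choosing the current
locale as the encoding is only one of several possible semantics, not a
settled answer.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the standard library): converts a wide
// file name to the multibyte encoding of the current C locale.
// Returns false if some character has no representation in that encoding.
bool narrow_filename(const std::wstring& wide, std::string& narrow)
{
    // Worst case: MB_CUR_MAX bytes per wide character, plus the terminator.
    std::vector<char> buffer(wide.size() * MB_CUR_MAX + 1);
    std::size_t written =
        std::wcstombs(buffer.data(), wide.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return false;   // unrepresentable character in the current locale
    }
    assert(written < buffer.size());
    narrow.assign(buffer.data(), written);
    return true;
}

// Hypothetical wrapper: opens a wide stream from a wide name, and puts the
// stream in a failed state when the name cannot be narrowed.
bool open_wide(std::wifstream& file, const std::wstring& name)
{
    std::string narrow;
    if (!narrow_filename(name, narrow)) {
        file.setstate(std::ios_base::failbit);
        return false;
    }
    file.open(narrow.c_str());
    return file.is_open();
}
```

Note that the result depends on whatever locale happens to be active at
the call, which is exactly the kind of semantics on which there is no
consensus.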

Of course, this doesn't just affect filenames.  If I copy std::wcin to a
std::wofstream, then view the stream in another window, using different
fonts, it is far from certain that what I see corresponds to what I
enter.  I think in this case we are in a domain where we still don't
really know what questions we should be asking, much less know the
answers.

> The issue is that the language is broken, there is no way to create
> files with wide character names on machines where the interface
> requires wide names.

That's a real problem, and it is a very strong motivation for doing
something.  On the other hand, we have to be very careful about what we
do, lest the cure be worse than the disease.

> There's no real good workaround.  I'm specifically limiting myself to
> this which is quite clear as opposed to the other morass (like the
> fact that strings are totally ignorant of multibyte and the contents
> of the streams fare little better).

If the system supports wide character filenames, what a quality
implementation should do is obvious (I think).  The problem is that
there are still some systems which don't.  And the question is what they
should do.  (Even if, of course, what they should do ends up
"implementation defined".  We still want to know what to expect.)  I've
tried to show that just returning an error, regardless of the wide
character filename, is wrong.  It's the easiest solution, but it's
wrong.  Which just means, of course, that more work will be necessary in
order to find a reasonable solution.

One reasonable solution might be to reject anything with characters not
in the basic character set, or in straight, seven bit ASCII, on the
grounds that we don't know which encoding to use.  Another might be
UTF-8, although my systems currently support 8859-1 -- if I have
filenames in UTF-8, they're going to look funny.  Other solutions might
be based on the current locale, or a default locale, or who knows what
locale.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: andys@evo6.com (Andy Sawyer)
Date: Wed, 9 Oct 2002 15:56:21 +0000 (UTC)
Raw View
rmaddox@isicns.com (Randy Maddox) writes:


> Those opposed have not presented anything that, IMHO, represented a
> valid technical argument

That is, as you say, your opinion. IMHO, there are valid technical
arguments for not _requiring_ implementors to implement the
functionality you want, some of which have been presented on this
thread. But not all of the reasons for the inclusion or exclusion of a
feature from the standard are purely technical.

 If the standard requires the changes you want, then implementations
for platforms where wide character filenames can _never_ be meaningful
will _have_ to provide those signatures you require to achieve
conformance. In other words, they have to write code (which costs time
and money) knowing it would never be used (which is, I assure you from
personal experience, somewhat soul destroying).

 On the other hand, with the current standard, implementations are
free to implement the signatures you require as extensions to the
current library (more on this later). You are free to take advantage
of those extensions, but of course the portability of your code will
be compromised (your choice). But I don't think you'll get very far
requesting a change in the standard specifically to make your
non-portable code less unportable.

 Clearly, this isn't a technical reason - it's a commercial one, but
it is (IMHO) a valid reason. The fact that you don't like it doesn't
make it any less true.

> I personally don't have the time to implement these changes,

If the changes were as important as you seem to think, you'd make the
time. But even if you really can't make the time, who do you think
ought to be implementing these changes for you? And what makes you
think that /they/ have the time?

> nor would my employer be happy with my mucking about with our
> standard library code.

Surely if you explained your motivation to them, they would
understand? Or isn't the extra functionality that important to them?

> Others who support these ideas are no doubt in the same boat.  Does
> that mean that any idea we might have, no matter how good, is to be
> ignored?

No, but if the people who _do_ support the idea are not prepared to
implement it, then I don't see how they can be justified in expecting
those who don't support the idea to do so for them. Perhaps you can
explain that to me?

> So what do we do?  We have ideas that seem good and useful, and we
> would like to see them move forward somehow.  Instead of telling us
> that isn't good enough, what about some concrete suggestions about
> what we might be better doing instead?  Is there some other forum we
> should be addressing?

If you're as keen as you seem to be to see a change in the standard,
then you're going to _have_ to do some legwork. Write a proposal
detailing the changes/additions - and a reference implementation will
certainly add substantial weight to your proposal. Submit that
proposal to your national standards body. Be prepared to travel to
WG21 meetings to present your proposal (it's a fact of life that it's
much more likely that time will be spent discussing it if you present
it in person rather than as part of a mailing.)

> Are there compiler vendors who might be interested? Who might they
> be?  And how might we contact them?

IIRC, there's at least one library vendor who has posted in response
to this thread. As for compiler vendors, presumably you know whose
compiler(s) you are currently using, so they might be a good start. Of
course, if you indicate to your compiler vendor(s) how important this
is to you, then you may be able to encourage them to make the changes
you desire - and, of course, since _a_ reference implementation is a Good
Thing, imagine how much help _several_ reference implementations -
possibly on a variety of platforms - would be.

> I know that many compiler vendors follow this newsgroup.  Why aren't
> they picking up on this discussion and responding if they are
> interested?

"if". That's the important word there. That /may/ be an indication
that the vendors who do follow the group aren't interested. Or it may
be an indication of something else entirely - I can't honestly speak
for them.

> On the one hand, I find myself in reluctant agreement that a standard
> is not the place for invention, i.e., the point of a standard is to
> document widely used standard practice so that we can all count on it.
>  On the other hand, invention too is necessary, but how and where does
> it take place?

Mostly outside of the committee room. A thoroughly researched, mature
and demonstrably implementable proposal is much more likely to be
approved than "wouldn't it be nice if...".

> Another way of looking at this is that those who are posting these
> ideas are users of C++, and we are providing user feedback on what
> would make C++ a better product for us.

This point of view is valid up to a point - but what many of us forget
is that C++ is used in a huge variety of environments, and most of us
are only familiar with a handful of them. Something that might make
C++ a better product for me might also make it a worse product for
you, and I'm sure you'd argue against that change.

> Products that ignore feedback from their users are generally
> destined to lose market share to products that pay attention to
> their users.

So you're suggesting that C++ will fall into disuse because the
filestream classes don't have wchar_t constructors? Yes, I see that
happening (NOT!) Of course, if that really is the case then all the
compiler and library vendors will rush out and implement your desired
features so they stay in business, and that will add even more weight
to a proposal to add those features to the standard.

> From this point of view, it would make sense for the committee to
> help C++ users with good ideas to move those ideas forward.  How do
> we do that?

Hopefully the above will at least give you some pointers.

Regards,
 Andy S.
--
"Light thinks it travels faster than anything but it is wrong. No matter
 how fast light travels it finds the darkness has always got there first,
 and is waiting for it."                  -- Terry Pratchett, Reaper Man






Author: eldiener@earthlink.net ("Edward Diener")
Date: Wed, 9 Oct 2002 17:03:08 +0000 (UTC)
Raw View
"Andy Sawyer" <andys@evo6.com> wrote in message
news:elb0s2v3.fsf@ender.evo6.com...
> rmaddox@isicns.com (Randy Maddox) writes:
>
> > bop2@telia.com ("Bo Persson") wrote in message
> > news:<g6no9.721$hV3.29062@newsb.telia.net>...
> > >
> > > I think the message from the committee is that they already *have*
> > > considered the idea, and rejected it. To have them reconsider, someone has
> > > to show how it can be of general use. Specifically, how it is to be
> > > implemented for operating systems that do not support wide character file
> > > names.
> > >
> > >
> > > Bo Persson
> > > bop2@telia.com
> >
> > That's an easy one.  Using a wide-character file name to open a file
> > on an OS that does not support wide-character file names will fail.
> > Seems simple enough to implement to me.  Where is the problem?
>
> So where's the benefit to you in having it? Apart from making your
> programs "portable" to an environment where they are guaranteed to
> fail. Please show how it can be of general use.

Is there no benefit of having it for operating systems which do support wide
character filenames ? I am getting the idea that you believe that anything
which is not automatically supported on all operating systems should not be
part of the C++ standard.







Author: kanze@gabi-soft.de (James Kanze)
Date: Wed, 9 Oct 2002 17:04:03 +0000 (UTC)
Raw View
rmaddox@isicns.com (Randy Maddox) wrote in message
news:<8c8b368d.0210081152.3d9dc06@posting.google.com>...
> francis.glassborow@ntlworld.com (Francis Glassborow) wrote in message
> news:<EW7Uh0Lsxgo9EwsU@robinton.demon.co.uk>...
> > In article <ONjo9.13879$OB5.1397398@newsread2.prod.itd.earthlink.net>,
> > Edward Diener <eldiener@earthlink.net> writes

> > >I will parse it for you. Various people who answered previously
> > >mentioned that the only way which would exist for wide character
> > >filename support to be taken seriously as a proposal is if an
> > >implementation for such support were done and shown to the C++
> > >committee. I still have no idea of the formal way in which a
> > >proposal is made to the C++ committee for changes to the C++
> > >language or C++ standard libraries, and gathered from the previous
> > >discussion that only implementations of proposed changes, and not
> > >just ideas, will somehow be considered.

> > No, we are quite willing to claim credit for implementing excellent
> > ideas and campaigning for them. What we are not willing to accept is
> > the principle that someone should do something about an idea. If you
> > have an idea and really want to see it discussed you must do the
> > spade work. I have quite enough good ideas of my own without being
> > asked to work up proposals based on someone else's sketchy
> > suggestion.

> Here, I think, is the nub of the problem.  These ideas vis-a-vis wide
> character file names have been discussed to death.  Those in favor
> cannot see what the problem with these ideas is, i.e., what are the
> technical arguments against them?  Those opposed have not presented
> anything that, IMHO, represented a valid technical argument, and
> instead have responded "Go implement it and get back to us", or "do
> the spade work".

That is a misrepresentation.  There's no doubt in my mind that someone
like Dinkumware, the authors of STLport, or the authors of the g++
library could implement wide character filenames without any excessive
effort.  If they knew what to implement.  The problem to date is that
all of the proposals have been simple: we want wide character
filenames.  When asked what the semantics should be, I've yet to see any
consensus -- some of the proponents wanted one thing, some others
something else.

As long as the proponents continue to take a naïve approach, and refuse
to recognize that there are some important questions that have to be
answered first, the proposal won't go anywhere.

(Before I go any further, I might add that I have the impression that
this may not be the case this time around, and that Ron seems to be
willing to listen to what the problems are, and to address them.  (And
also not mix in superfluous issues like exception::what which cause
untold implementation problems, and are of little use.)  If this is the
case, then there is a good chance that he will succeed.)

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: kanze@gabi-soft.de (James Kanze)
Date: Wed, 9 Oct 2002 12:05:50 CST
Raw View
pdimov@mmltd.net (Peter Dimov) wrote in message
news:<7dc3b1ea.0210080714.6bd1e1d5@posting.google.com>...
> bdawes@acm.org (Beman Dawes) wrote in message
> news:<70fa0367.0210061718.3b8747e8@posting.google.com>...

> > ron@sensor.com ("Ron Natalie") wrote in message
> > news:<sgmn9.11631$Jw5.6647@fe04>...

> > > I therefore suggest the following changes:

> [wchar_t filenames]

> > With what semantics on operating systems which do not support
> > wchar_t filenames?

> What semantics do the existing functions have on systems which only
> support wchar_t filenames?  Let's be fair. Some systems have to
> convert today.  Some systems will have to convert tomorrow.

Certainly.

> > Based on past committee discussions, numerous members of the
> > standards committee's library working group would like to provide
> > for wchar_t filenames. But not a single LWG member has been willing
> > to speak in favor of doing so without explicit semantics.

> > "Implementation defined" won't fly; that's just an "illusion of
> > portability" without any underlying reality. The implementors say
> > they have no idea what semantics they would be defining.

> The semantics _must_ be implementation defined. The meaning of a file
> name has always been implementation defined.

Agreed.  There are still two open points: 1) implementation defined or
not, there should be some semantics -- an attempt to open using a wide
character filename cannot be standardized if the behavior is to fail on
almost all existing systems, and 2) we must have some idea of what to
reasonably expect from a quality implementation (of C++) on the usual
systems (Unix, Windows -- but I think we already have an answer for
Windows).

> That said, let me outline a "reference conversion" on a system where
> the native file name is an NTBS, and wchar_t is UCS-x.

> * If the wchar_t sequence contains only characters in the [1..255]
> range and does not start with 0xEF 0xBB 0xBF, use the corresponding
> byte sequence;

> * Otherwise, convert the wchar_t sequence to UTF-8 and prepend 0xEF
> 0xBB 0xBF.

> Is this good enough? If not, why not?
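
To make the discussion concrete, the two rules quoted above can be
sketched roughly as follows.  This assumes that wchar_t values are
Unicode code points (the "UCS-x" case); the function names append_utf8
and reference_conversion are hypothetical, and code points above
0x10FFFF are replaced by U+FFFD as a choice of this sketch, not of the
proposal.

```cpp
#include <cassert>
#include <string>

// Appends the UTF-8 encoding of one code point to out.
void append_utf8(std::string& out, unsigned long cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Sketch of the quoted "reference conversion" from a wide name to an NTBS.
std::string reference_conversion(const std::wstring& name)
{
    // Rule 1 applies only if every character is in [1..255] ...
    bool single_byte = true;
    for (wchar_t c : name) {
        unsigned long cp = static_cast<unsigned long>(c);
        if (cp < 1 || cp > 255) single_byte = false;
    }
    // ... and the name does not itself begin with 0xEF 0xBB 0xBF.
    bool starts_with_marker = name.size() >= 3 &&
        name[0] == 0xEF && name[1] == 0xBB && name[2] == 0xBF;

    std::string result;
    if (single_byte && !starts_with_marker) {
        for (wchar_t c : name) {
            result += static_cast<char>(static_cast<unsigned char>(c));
        }
        return result;
    }
    // Rule 2: the marker bytes followed by the UTF-8 form of the name.
    result = "\xEF\xBB\xBF";
    for (wchar_t c : name) {
        unsigned long cp = static_cast<unsigned long>(c);
        append_utf8(result, cp <= 0x10FFFF ? cp : 0xFFFD);
    }
    return result;
}
```

Under these assumptions L"abc" maps to the three bytes "abc", while a
name containing U+20AC comes out as the marker followed by E2 82 AC.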

To begin with, it might leave ME with filenames I can't display
correctly.

This is a tricky question, and goes well beyond the simple question of
filenames.  My system (or systems -- both Linux and Solaris work pretty
much the same way here) stores filenames as a string of bytes.  It
pretty much handles transparently whatever bytes are given to it; '\0'
and '/' are exceptions, but they don't really affect anything I'm about
to say.

My system displays such filenames (using the 'ls' program for example)
by simply copying them to the screen window.  It does some preliminary
checks, to replace non-printable characters with a '?', for example,
using the current locale.  How the characters actually appear on the
screen depends on the current font.

When available, I generally use fonts and locales for 8859-15. This
means, for example, that the wide character '\u20A0' should be
translated to the narrow character if it is to appear correctly as the
Euro sign. Using your scheme, I will see six characters: "ï»¿â??", where
the last two question marks replace non-printable characters.

While not perfect, an acceptable implementation would be to use a
character between [0...255] directly, and reject any wide character
filename which doesn't contain characters in this range.
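
A rough sketch of that alternative, assuming again that wchar_t values
are Unicode code points: values up to 255 are copied through as single
bytes, and anything else causes the whole name to be rejected.  The name
narrow_direct is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: copies code points 0..255 through unchanged and
// rejects the whole name if any character lies outside that range.
// On rejection, out is left untouched.
bool narrow_direct(const std::wstring& name, std::string& out)
{
    std::string result;
    for (wchar_t c : name) {
        unsigned long cp = static_cast<unsigned long>(c);
        if (cp > 255) {
            return false;   // no single-byte equivalent under this policy
        }
        result += static_cast<char>(static_cast<unsigned char>(cp));
    }
    assert(result.size() == name.size());
    out.swap(result);
    return true;
}
```

Note that this direct copy corresponds to 8859-1.  The Euro sign U+20AC
is rejected here even though 8859-15 has a Euro sign at 0xA4; getting
that right would need a mapping that knows which 8-bit set is in use.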

Now, I don't expect the standards committee to give special attention to
my personal environment. But it is one of the most frequent environments
for Unix machines in Europe. I would expect it to work more or less
correctly with a quality implementation. And of course, a quality
implementation has to work with the existing OS. (If we could also
influence the OS, your suggestion has a lot going for it. Filenames are
8859-1, unless the name is at least three characters, and starts with
"ï»¿", in which case it is Unicode, starting with a zero width no-break
space.  The problem is that the decision doesn't rest with the C++
implementor, and I haven't heard about any great movement in this
direction from Sun (for Solaris) or even Linux.)

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: kanze@gabi-soft.de (James Kanze)
Date: Wed, 9 Oct 2002 17:48:53 +0000 (UTC)
Raw View
ron@sensor.com ("Ron Natalie") wrote in message
news:<fvCo9.263073$F21.178897@fe02>...
> "Francis Glassborow" <francis.glassborow@ntlworld.com> wrote in
> message news:EW7Uh0Lsxgo9EwsU@robinton.demon.co.uk...

> > No, we are quite willing to claim credit for implementing excellent
> > ideas and campaigning for them. What we are not willing to accept is
> > the principle that someone should do something about an idea. If you
> > have an idea and really want to see it discussed you must do the
> > spade work. I have quite enough good ideas of my own without being
> > asked to work up proposals based on someone else's sketchy
> > suggestion.

> It's not sketchy.  It's simple and straight-forward and we've already
> done it for the (at least DEBUG source version) shipped with Visual
> C++ 6 SP 5.  All you do is add the appropriate interfaces I detailed
> to call the existing implementation wide char file interfaces (in the
> VC++ case these appear to be wide char versions of the POSIX open
> routine, which at a lower level call the wide char CreateFile).

> This by the way demonstrates the use in both a WCHAR_T based system and
> a non-WCHAR_T based system as WIN32 ships on both systems (compile
> time flags select which it is using natively).

> Similar wrappers were done for Sun's Forte compiler.

Excellent.  What are the semantics?  And what do the people using them
say about them?

> It's just not all that hard.

Regardless of what semantics we decide, the implementation shouldn't be
that hard.  Deciding what semantics are appropriate is the only problem
I see.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: pdimov@mmltd.net (Peter Dimov)
Date: Wed, 9 Oct 2002 17:51:01 +0000 (UTC)
Raw View
kanze@gabi-soft.de (James Kanze) wrote in message news:<d6651fb6.0210080633.e8ff293@posting.google.com>...
> ron@sensor.com ("Ron Natalie") wrote in message
> news:<XQno9.3253$cG.781@fe04>...
> > Of all the "objections" to your fstream proposals are:
>
> >    In the standard, this will be "implementation defined", but we
> >    don't want to add anything "implementation defined" to the standard
> >    without a fairly good idea of what we should expect from a quality
> >    implementation in a specific context
>
> That is, I believe, a fair summary.  (It's my position, at least:-).)
>
> > The answer is that a quality implementation does what is right for it.
>
> The problem is that many implementors don't know what is right for them.
> I'm not an implementor, but I have no idea what is right for Unix, for
> example. So they don't want to have to implement it.
>
> > If there is a mapping from wide string to a narrow one, then it can do
> > the conversion, otherwise it has no choice but to return an error.
>
> The problem is that there are potentially many mappings.  And
> implementors don't (yet) know how to choose.

The perfect mapping has some pretty obvious properties:

* Equal wchar_t sequences map to equal byte sequences, independent of
OS/runtime/compiler state.

* Different wchar_t sequences map to different byte sequences.

* Identity mapping for file names in a suitably limited character set.

I believe that it is possible to invent a mapping that is very close
to perfect. See my answer to Beman's post.

> > As you state, this is NO different than feeding inappropriate
> > characters (say things like slashes or characters > 128) to existing
> > implementations.
>
> I feed characters > 128 to my implementation all of the time.  That may
> be why I'm sensitized to the problem; the results aren't always what
> one might expect.
>
> > If the implementation can't represent the given string (wide or
> > narrow) as a filename, then it has to return error.
>
> If the system has a very limited idea of what a filename can be (say,
> like on the old PDP-11), I don't think that the problem is very complex.
> The real problem is with systems like Unix, which allow almost anything
> in a filename, but interpret them according to context.  Create a file
> using accented characters on Solaris.  Then try invoking ls while
> changing the environment variable LC_ALL, and watch the filename
> change.  The problem isn't trivial.  (That doesn't mean that it doesn't
> have a usable answer.)

Unix does not interpret file names in any way. It uses a byte sequence
as a file name, the same byte sequence always names the same file, and
the names of the existing files are locale-independent. It is 'ls'
that interprets the bytes as locale-dependent characters.
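
A small sketch of what that means in practice: the same stored bytes
yield different characters depending on which encoding the reader
assumes.  The two decoders below are hypothetical illustrations (the
UTF-8 one is minimal and does not reject overlong forms); they are not
what 'ls' actually does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Reads each byte as one ISO 8859-1 character.
std::vector<unsigned long> decode_latin1(const std::string& bytes)
{
    std::vector<unsigned long> cps;
    for (unsigned char b : bytes) {
        cps.push_back(b);
    }
    return cps;
}

// Reads the bytes as UTF-8; each undecodable byte becomes U+FFFD.
std::vector<unsigned long> decode_utf8(const std::string& bytes)
{
    std::vector<unsigned long> cps;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        unsigned long cp = 0;
        std::size_t extra = 0;          // number of continuation bytes
        bool ok = true;
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else                            { ok = false; }
        if (ok && i + extra >= bytes.size()) ok = false;   // truncated
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char t = static_cast<unsigned char>(bytes[i + k]);
            if ((t & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (t & 0x3F);
        }
        if (ok) { cps.push_back(cp); i += extra + 1; }
        else    { cps.push_back(0xFFFD); ++i; }
        assert(i <= bytes.size());
    }
    return cps;
}
```

The byte pair C3 A9 names one and the same file either way, but it reads
as the single character U+00E9 under UTF-8 and as the two characters
U+00C3 U+00A9 under 8859-1.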






Author: rjcox@cix-remove-me-.co.uk (Richard J Cox)
Date: Wed, 9 Oct 2002 17:58:33 +0000 (UTC)
Raw View
In article <70fa0367.0210061718.3b8747e8@posting.google.com>,
bdawes@acm.org (Beman Dawes) wrote:

> Based on past committee discussions, numerous members of the standards
> committee's library working group would like to provide for wchar_t
> filenames. But not a single LWG member has been willing to speak in
> favor of doing so without explicit semantics.

Which I would certainly like; however, I think this is less important than
having the capability.

> "Implementation defined" won't fly; that's just an "illusion of
> portability" without any underlying reality. The implementors say they
> have no idea what semantics they would be defining.

Also true, but the current reality within which we work.

> "Unspecified" won't fly either; unlike narrow character filenames,
> there is no agreed upon behavior for an implementation to fall back
> upon.

Where is the "agreed upon behaviour" for narrow (char*) filenames defined?

> Perhaps there are some reasonably portable semantics for wide names on
> narrow platforms, and the LWG has somehow overlooked them. But until
> someone steps forward and explains what those semantics are, the LWG
> can't standardize them.

I am not sure what the compromise should be in this case; however I do not
think that trying to define narrowing semantics is actually very helpful
at this point (too many variables to consider). I think that we should
start by defining widening semantics where there is no issue of
unrepresentable characters (i.e. all narrow character sets are proper
subsets of the filesystem's character set).

This means we can focus on perhaps the most contentious issue which I
believe is: should file name character set conversions be locale relative
(i.e. does the same set of byte values always convert to the same filename
even if the user has changed locales)?

Once this is resolved other questions should become easier.

IOW let us solve a simpler problem first.

Richard

--
rjcox at cix dot co dot uk






Author: alf_p_steinbach@yahoo.no.invalid (Alf P. Steinbach)
Date: Wed, 9 Oct 2002 17:59:24 +0000 (UTC)
Raw View
On Wed, 9 Oct 2002 15:56:21 +0000 (UTC), andys@evo6.com (Andy Sawyer) wrote:

>rmaddox@isicns.com (Randy Maddox) writes:
>...
>> I personally don't have the time to implement these changes,
>
>If the changes were as important as you seem to think, you'd make the
>time. But even if you really can't make the time, who do you think
>ought to be implementing these changes for you? And what makes you
>think that /they/ have the time?

Andy may be correct regarding the specific issue under discussion,
since it's easy to implement, and has been implemented.

But the general meaning of the comment is false.

In some cases, what is very difficult and time-consuming for one
person or company can be much easier for someone else, e.g.
someone with detailed knowledge of a compiler implementation.

If, say, I noticed that the square wheels of my car are the main
cause of the car's very rough ride, it would be unreasonable for
the car manufacturer to require *me* to fully analyze, manufacture
and test my proposed "round wheels", especially since other cars
already have round wheels.

Of course that analogy doesn't hold completely (it's questionable
whether the lack of wchar_t support actually is a defect or just
a would-be-nice-to-have feature, C++ as a language is not a
product, etc.), but I think the general idea is valid; just not
always applicable.  In the particular case I think it is probably
not applicable.  In general, I think it often is applicable.

- Alf






Author: alf_p_steinbach@yahoo.no.invalid (Alf P. Steinbach)
Date: Wed, 9 Oct 2002 18:35:16 +0000 (UTC)
On 9 Oct 2002 17:10:01 GMT, kanze@gabi-soft.de (James Kanze) wrote:

>pdimov@mmltd.net (Peter Dimov) wrote in message
>news:<7dc3b1ea.0210080714.6bd1e1d5@posting.google.com>...
>> bdawes@acm.org (Beman Dawes) wrote in message
>> news:<70fa0367.0210061718.3b8747e8@posting.google.com>...
>
>> > ron@sensor.com ("Ron Natalie") wrote in message
>> > news:<sgmn9.11631$Jw5.6647@fe04>...
>
>> > > I therefore suggest the following changes:
>
>> [wchar_t filenames]
>
>> > With what semantics on operating systems which do not support
>> > wchar_t filenames?
>
>> What semantics do the existing functions have on systems which only
>> support wchar_t filenames?  Let's be fair. Some systems have to
>> convert today.  Some systems will have to convert tomorrow.
>
>Certainly.
>
>> > Based on past committee discussions, numerous members of the
>> > standards committee's library working group would like to provide
>> > for wchar_t filenames. But not a single LWG member has been willing
>> > to speak in favor of doing so without explicit semantics.
>
>> > "Implementation defined" won't fly; that's just an "illusion of
>> > portability" without any underlying reality. The implementors say
>> > they have no idea what semantics they would be defining.
>
>> The semantics _must_ be implementation defined. The meaning of a file
>> name has always been implementation defined.
>
>Agreed.  There are still two open points: 1) implementation defined or
>not, there should be some semantics -- an attempt to open using a wide
>character filename cannot be standardized if the behavior is to fail on
>almost all existing systems, and 2) we must have some idea of what to
>reasonably expect from a quality implementation (of C++) on the usual
>systems (Unix, Windows -- but I think we already have an answer for
>Windows).
>
>> That said, let me outline a "reference conversion" on a system where
>> the native file name is an NTBS, and wchar_t is UCS-x.
>
>> * If the wchar_t sequence contains only characters in the [1..255]
>> range and does not start with 0xEF 0xBB 0xBF, use the corresponding
>> byte sequence;
>
>> * Otherwise, convert the wchar_t sequence to UTF-8 and prepend 0xEF
>> 0xBB 0xBF.
>
>> Is this good enough? If not, why not?
>
>To begin with, it might leave ME with filenames I can't display
>correctly.
>
>This is a tricky question, and goes well beyond the simple question of
>filenames.  My system (or systems -- both Linux and Solaris work pretty
>much the same way here) stores filenames as a string of bytes.  It
>pretty much handles transparently whatever bytes are given to it; '\0'
>and '/' are exceptions, but they don't really affect anything I'm about
>to say.
>
>My system displays such filenames (using the 'ls' program for example)
>by simply copying them to the screen window.  It does some preliminary
>checks, to replace non-printable characters with a '?', for example,
>using the current locale.  How the characters actually appear on the
>screen depends on the current font.
>
>When available, I generally use fonts and locales for 8859-15. This
>means, for example, that the wide character '\u20A0' should be
>translated to the narrow character if it is to appear correctly as the
>Euro sign. Using your scheme, I will see six characters: "      ??", where
>the last two question marks replace non-printable characters.
>
>While not perfect, an acceptable implementation would be to use a
>character between [0...255] directly, and reject any wide character
>filename which doesn't contain characters in this range.

You mean, reject any wide character filename which contains characters
outside of this range?  The difference isn't trivial.  But I guess that
that was a typo?


>Now, I don't expect the standards committee to give special attention to
>my personal environment. But it is one of the most frequent environments
>for Unix machines in Europe. I would expect it to work more or less
>correctly with a quality implementation. And of course, a quality
>implementation has to work with the existing OS. (If we could also
>influence the OS, your suggestion has a lot going for it. Filenames are
>8859-1, unless the name is at least three characters, and starts with
>"ï»¿", in which case it is Unicode, starting with a zero width no-break
>space.  The problem is that the decision doesn't rest with the C++
>implementor, and I haven't heard about any great movement in this
>direction from Sun (for Solaris) or even Linux.)

It seems to me that there are several "good" conversions down to
narrow characters, namely


  * No change of character codes.
    - Assumes the system filename character set is a subset of the
      C++ wide character set.
    - Error on filenames that contain characters outside this set.

  * Convert to some MBCS character set NS other than UTF-8.
    - Assumes the system treats filenames as NS.
    - Error possible on characters not representable in NS.

  * Convert to UTF-8 with non-display prefix, as necessary.
    - Assumes the system treats filenames as UTF-8 (otherwise
      display of filenames may be impossible to grok).
    - No errors on the filename conversion.


The key here is, IMHO, the word "Assumes".  So I don't think it's
possible to select one particular conversion as being the "best"
compromise.  Instead, the conversion should match the system, but
that match can only partially be specified by the standard (and
should perhaps not be specified at all, but left to the compiler).
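
To make the first option above concrete, here is a minimal sketch of a
"no change of character codes" conversion. The name narrow_unchanged is
hypothetical, and the [1..255] range is an assumption: it presumes the
system filename character set is a single-byte set such as ISO 8859-1
whose codes coincide with the wide character codes. A real implementation
would have to match whatever the system actually uses.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of any standard library: narrows a wide
// filename by copying character codes unchanged. Assumes a single-byte
// system filename set that is a subset of the wide character set.
std::string narrow_unchanged(const std::wstring& wide)
{
    std::string narrow;
    narrow.reserve(wide.size());
    for (wchar_t wc : wide) {
        // wchar_t may be signed; negative values become huge and are rejected.
        unsigned long code = static_cast<unsigned long>(wc);
        if (code < 1 || code > 255) {
            // Error on filenames that contain characters outside the set.
            throw std::range_error("filename not representable as narrow");
        }
        narrow.push_back(static_cast<char>(code));
    }
    return narrow;
}
```

The other two options differ only in what happens where this sketch
throws: convert to some other multibyte set, or fall back to UTF-8.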


***

Now, at the risk of spinning off counter-arguments to what I'd
dearly like to see in a revised standard (namely wchar_t support),
since Java and MS Windows have conspired to make UCS-2  --  16-bit
Unicode  --  a de facto standard, it isn't unlikely that many C++
implementations will also in the future have wchar_t as 16 bits.

I think this has been discussed at length, but without any firm
conclusion (the issue is the ability to represent any character
by a single element in a sequence of character type elements),
so I suggest let's skip that particular debate; it's just the
motivation for my point, which follows.

The point I'm raising is this: in the future we may not only need
conversion from "wide character" to "narrow character" filenames,
but conversions from UCS-4 to UCS-2, and down to single-byte.  So
the C++ model of either wide character or narrow character may no
longer match reality.  Is there any way we can prepare for this
situation, so that the language won't be in-principle incompatible?
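
As a small illustration of why a UCS-4 to UCS-2 step is not always
possible, here is a sketch that narrows one element at a time and reports
failure for anything that does not fit in a single 16-bit element. The
name ucs4_to_ucs2 is hypothetical, and the use of the char16_t and
char32_t string types is an assumption of this sketch rather than
anything proposed in the thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts UCS-4 to UCS-2 element by element.
// Returns false, leaving 'out' untouched, if any code point cannot be
// represented as a single 16-bit element.
bool ucs4_to_ucs2(const std::u32string& in, std::u16string& out)
{
    std::u16string result;
    result.reserve(in.size());
    for (char32_t cp : in) {
        if (cp > 0xFFFF) {
            return false;   // outside the 16-bit range
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return false;   // surrogate values are not characters
        }
        result.push_back(static_cast<char16_t>(cp));
    }
    out = result;
    return true;
}
```

The same shape of check applies one level further down, from 16-bit
elements to a single-byte set.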


Cheers,


- Alf

(with apologies for any abuse of terminology)

---





Author: andys@evo6.com (Andy Sawyer)
Date: Wed, 9 Oct 2002 23:57:18 +0000 (UTC)
eldiener@earthlink.net ("Edward Diener") writes:

> "Andy Sawyer" <andys@evo6.com> wrote in message
> news:elb0s2v3.fsf@ender.evo6.com...
> > rmaddox@isicns.com (Randy Maddox) writes:
> >
> > > That's an easy one.  Using a wide-character file name to open a
> > > file on an OS that does not support wide-character file names
> > > will fail.  Seems simple enough to implement to me.  Where is
> > > the problem?
> >
> > So where's the benefit to you in having it? Apart from making your
> > programs "portable" to an environment where they are guaranteed to
> > fail. Please show how it can be of general use.
>
> Is there no benefit of having it for operating systems which do
> support wide character filenames ?

Quite probably. And - as I have said elsewhere - it is quite
permissible (indeed, in Randy's case, desirable) for language &
library vendors to implement those functions as extensions to the
standard. Once again, where is the benefit to Randy of a library
vendor implementing a function that is required to fail? Of course, it
won't fail on /his/ platform(s), but it will fail on the platforms
other people are using (not that he cares about that). And the reason
it will fail could be as simple as a typo (for instance, an extra L),
which _could_ have been detected as a compile-time error on those
platforms. So adding these functions to the standard is not only of no
benefit in some cases, it is actually dangerous in some cases.

 Of course, if you think run-time failures due to compile-time
diagnosable errors are preferable to compile-time diagnosis, then
please feel free to submit (and support) your proposal through the
standards process.

> I am getting the idea that you believe that anything which is not
> automatically supported on all operating systems should not be part
> of the C++ standard.

You're entitled to your beliefs. You're also entitled to be wrong. If
that is what you believe that I believe, then in this case you are
wrong. However, I do feel the argument that "add this function so that
my program is portable to environments where it is guaranteed to fail"
is a little on the weak side.

 For the record, much of my development work is on platforms that _do_
support wide-character filenames, and I can live with the current
situation. Having said that, I also work on platforms which do _not_
support wide-character filenames, which means I see both sides of the
coin here. Lack of standard library support for wide-character names
has never been a particularly great burden to me (nor has lack of
standard library support for reference counting pointers, or vulgar
fractions, or non-normalised rational numbers, or singly linked lists
or any one of a number of features I would like to see standardised -
many of which would be generally useful).

 In the case of an International Standard (and all that that implies),
when no clear case can be seen for or against a change, one must err
on the side of no change.

Regards,
 Andy S.
--
"Light thinks it travels faster than anything but it is wrong. No matter
 how fast light travels it finds the darkness has always got there first,
 and is waiting for it."                  -- Terry Pratchett, Reaper Man

---





Author: eldiener@earthlink.net ("Edward Diener")
Date: Thu, 10 Oct 2002 00:38:40 +0000 (UTC)
"James Kanze" <kanze@gabi-soft.de> wrote in message
news:d6651fb6.0210090754.2d2cbf1d@posting.google.com...
> Now, I don't expect the standards committee to give special attention to
> my personal environment. But it is one of the most frequent environments
> for Unix machines in Europe. I would expect it to work more or less
> correctly with a quality implementation. And of course, a quality
> implementation has to work with the existing OS. (If we could also
> influence the OS, your suggestion has a lot going for it. Filenames are
> 8859-1, unless the name is at least three characters, and starts with
> "ï»¿", in which case it is Unicode, starting with a zero width no-break
> space.  The problem is that the decision doesn't rest with the C++
> implementor, and I haven't heard about any great movement in this
> direction from Sun (for Solaris) or even Linux.)

You have made an excellent case for the fact that the semantics of wide
character filenames must be defined by the implementation/OS on which they
are used. While one may say, here are the things which C++ recommends should
be done if the OS does not natively support wide character filenames in its
API, it is impossible to come up with a set of rules which tells an
implementation on every OS what it must do. If the intent is to not
support wide character filenames in the C++ file streams and message facets
because not every OS has committed to natively supporting them in its API
and its file system, then I think that is certainly a valid choice, but a
wrong one. I say it is wrong because it seems apparent to me that the
general movement is toward OSs that cater to foreign languages whose
character sets cannot be encompassed by a string of narrow characters. By
supporting wide character filenames for OSs which do support them in their
API and file system, C++ will lead the way in supporting an idea whose time
has come rather than embracing a backward path.

Other languages (Java, C#) already support wide character filenames,
rightly or wrongly, in the form of Unicode strings of some character width.
They have provided semantics and now their support is already out-of-date in
many instances. I hope that C++ will not make that mistake, and will instead
provide support for wide character filenames whose semantics an
implementation/OS is allowed to define for itself. In that case, if OSs do
coalesce around a standard, C++ will not be out-of-date as these other
languages are or may well be in the future.


---





Author: francis.glassborow@ntlworld.com (Francis Glassborow)
Date: Thu, 10 Oct 2002 02:27:45 +0000 (UTC)
In article <b1Wo9.17213$OB5.1719713@newsread2.prod.itd.earthlink.net>,
Edward Diener <eldiener@earthlink.net> writes
>Why should it not be adequate merely to persuade the C++ standards committee
>?
>

That would be perfectly adequate. But your chance of doing that without
having someone defend your views/proposals is pretty small.
To answer an earlier question, formally the people who present material
to the Standards committees are its members. I.e. Members of J16 or NBs
who are members of WG21. Informally, we never refuse to listen to good
ideas that are well presented.

Most Standards are very much more closed than this one. In many cases
even to be heard requires very substantial cash payments. Try changing
the ISO Standard on grass cutting equipment. You might find it somewhat
hard to even find someone to listen to you informally.




--
Francis Glassborow      ACCU
64 Southfield Rd
Oxford OX4 1PA          +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

---





Author: semzx@newmail.ru (Alexander E. Patrakov)
Date: Thu, 10 Oct 2002 08:24:02 +0000 (UTC)
Let's reconsider some earlier statements made by various people:

> Implementors (and users) need some clear
> guidance on what the correspondence might be between narrow and wide
> names, in the presence of parallel signatures. It's not that there might
> not be a way -- there always is. It's that there might be more than one
> way, and no clear rule to choose.

> Not having a way to resolve the ambiguity means that there exists no
> correspondence.

That's true, and that's why we cannot use wchar_t * as a filename on
platforms where "native" characters in filenames are of type char.


> What semantics do the existing functions have on systems which only
> support wchar_t filenames? Let's be fair. Some systems have to convert
> today. Some systems will have to convert tomorrow.

Also valid. We are in trouble using char * as a filename on platforms
where "native" characters in filenames are of type wchar_t, since some
files are then inaccessible.

Conclusion: there is NO single acceptable character set for filenames.
My proposal is to say this explicitly:

typedef platform_dependent filenamechar_t;

and to add overloads to file-opening functions accepting
filenamechar_t * on platforms where filenamechar_t is not char.

So, we lose nothing (compared to the wchar_t * proposal) on
platforms which accept wchar_t * as filenames and (maybe) gain some
portability to other systems.
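
A minimal sketch of how such a typedef might look in practice. Everything
here beyond the name filenamechar_t is an assumption made for
illustration: the choice is keyed on the _WIN32 macro, the
FILENAME_LITERAL macro and open_for_reading function are hypothetical,
and _wfopen is a Microsoft-specific extension, not standard C or C++.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Assumption of this sketch: select the native filename character type
// by platform macro. The proposal leaves the selection to the platform.
#ifdef _WIN32
typedef wchar_t filenamechar_t;
#define FILENAME_LITERAL(s) L##s
#else
typedef char filenamechar_t;
#define FILENAME_LITERAL(s) s
#endif

// A string type built on the native filename character type.
typedef std::basic_string<filenamechar_t> filename_string;

// Hypothetical helper: opens a file for reading using the native
// filename type. Returns a null pointer on failure, like std::fopen.
std::FILE* open_for_reading(const filenamechar_t* name)
{
#ifdef _WIN32
    return _wfopen(name, L"rb");    // Microsoft-specific extension
#else
    return std::fopen(name, "rb");
#endif
}
```

Code written against filenamechar_t compiles on both kinds of platform,
but literals need a macro such as the one above, which is part of the
cost of this approach.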

Alexander E. Patrakov

---





Author: pdimov@mmltd.net (Peter Dimov)
Date: Thu, 10 Oct 2002 11:48:50 CST
hyrosen@mail.com (Hyman Rosen) wrote in message news:<1034188726.855265@master.nyc.kbcfp.com>...
> Peter Dimov wrote:
> > The perfect mapping has some pretty obvious properties:
> > * Equal wchar_t sequences map to equal byte sequences, independent of
> > OS/runtime/compiler state.
>
> Why should this be the case? I would think the locale in effect
> would dictate the conversion from wide-char to multi-byte sequence.

Because, if the forward mapping is state-dependent, the reverse
mapping will be state-dependent, too. Therefore, in order to decode a
filename, you will need not only the byte sequence, but the state used
at the time the forward mapping was performed, which is one of the
main disadvantages of char[] based names (to interpret them you need
to know the code page).
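
A small sketch of the state dependence described above: the same filename
byte decodes to different characters depending on which code page is
assumed. The names CodePage and decode_byte are hypothetical; the table
lists the eight positions where ISO 8859-15 differs from ISO 8859-1.

```cpp
#include <cassert>

enum CodePage { Latin1, Latin9 };   // ISO 8859-1 and ISO 8859-15

// Hypothetical helper: maps one filename byte to a Unicode code point
// under the given code page.
unsigned long decode_byte(unsigned char byte, CodePage cp)
{
    if (cp == Latin9) {
        switch (byte) {
        case 0xA4: return 0x20AC;   // EURO SIGN
        case 0xA6: return 0x0160;   // S with caron
        case 0xA8: return 0x0161;   // s with caron
        case 0xB4: return 0x017D;   // Z with caron
        case 0xB8: return 0x017E;   // z with caron
        case 0xBC: return 0x0152;   // OE ligature
        case 0xBD: return 0x0153;   // oe ligature
        case 0xBE: return 0x0178;   // Y with diaeresis
        default:   break;
        }
    }
    // ISO 8859-1 maps every byte to the code point of the same value.
    return byte;
}
```

Without knowing which CodePage value was in effect when the name was
written, a reader of the byte 0xA4 cannot tell the currency sign from
the Euro sign.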

---





Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: 10 Oct 2002 16:50:08 GMT
""Edward Diener"" <eldiener@earthlink.net> wrote in message
news:at3p9.18378$OB5.1806394@newsread2.prod.itd.earthlink.net...

> You have made an excellent case for the fact that the semantics of wide
> character filenames must be defined by the implementation/OS for which it is
> used. While one may say, here are the things which C++ recommends should be
> done if the OS does not natively support wide character filenames in its
> API, it is impossible to come up with a set of rules which tells an
> implementation on every OS what they must do. If the intent is to not
> support wide character filenames in the C++ file streams and message facets
> because not every OS has committed to natively supporting it in their API
> and their file system, then I think that is a valid choice certainly but
> that it is wrong. I say it is wrong because it seems apparent to me that the
> general movement is toward OSs that cater to foreign languages whose
> character set can not be encompassed by a string of narrow characters. By
> supporting wide character filenames for OSs which do support them in their
> API and file system, C++ will lead the way in supporting an idea whose time
> has come rather than embracing a backward path.

Why is it an either/or choice? You're arguing that because there's a
``movement toward'' some technology, that we should make our best guess
at standardizing what things should look like when we get there. That's
*never* been a good strategy, particularly in software. We just don't
know enough.

> Other languages ( Java, C# ) already support wide character filename,
> rightly or wrongly, in the form of Unicode strings of some character width.
> They have provided semantics and now their support is already out-of-date in
> many instances.

EXACTLY. And the designers of both languages had the information they
needed at design time to avoid becoming out of date. They just guessed
wrong at the time, because of insufficient practical experience (IMO).

>          I hope that C++ will not make that mistake and provide
> support for wide character filenames which an implementation/OS is allowed
> to define for itself. In that case, if OSs do coagulate around a standard,
> C++ will not be out-of-date as these other languages are or may well be in
> the future.

So we standardize something that's so wishy washy today that it'll be
equally wishy washy in ten years, and equally beneficial to writing
portable programs?

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com



---





Author: pjp@dinkumware.com ("P.J. Plauger")
Date: Thu, 10 Oct 2002 16:54:06 +0000 (UTC)
"Alexander E. Patrakov" <semzx@newmail.ru> wrote in message
news:642ac601.0210091922.192a8c4@posting.google.com...

> Also valid. We are in trouble using char * as a filename on platforms
> where "native" characters in filenames are of type wchar_t since there
> are inaccessible files.

People keep saying that, but I haven't found it to be true in practice.
There's *always* at least one multibyte encoding that will represent
an arbitrary sequence of wide characters. The problem is that there's
usually more than one.

> Conclusion: there is NO single acceptable character set for filenames.
> My proposal is to say this explicitly:
>
> typedef platform_dependent filenamechar_t;
>
> and to add overloads to file-opening functions accepting
> filenamechar_t * on platforms where filenamechar_t is not char.
>
> So, we lose nothing (compared to wchar_t * related proposal) on
> platforms which accept wchar_t * as filenames and (maybe) gain some
> portability to other systems.

The problem is not with the type of the elements used to represent
characters. It's with the relationship between byte- and wide-character
sequences used to represent file names.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com



---





Author: kanze@gabi-soft.de (James Kanze)
Date: Thu, 10 Oct 2002 11:50:30 CST
eldiener@earthlink.net ("Edward Diener") wrote in message
news:<b1Wo9.17213$OB5.1719713@newsread2.prod.itd.earthlink.net>...
> "Francis Glassborow" <francis.glassborow@ntlworld.com> wrote in
> message news:brGD+LCax$o9Ewdj@robinton.demon.co.uk...
> > In article <5%No9.16510$OB5.1669518@newsread2.prod.itd.earthlink.net>,
> > Edward Diener <eldiener@earthlink.net> writes
> > >I am still waiting for a C++ committee member to explain to me how
> > >a formal proposal of change to the C++ language or libraries is
> > >submitted to the committee. I assume you are just such a
> > >member. Would you, or any other member, care to explain this step
> > >by step ?

> > Write a paper providing

> > 1) Rationale (i.e. why this extension/change)
> > 2) What the change is including proposed changes to the wording of the
> > current standard
> > 3) Impact statement where you itemise the places within the standard
> > that your proposal will affect.

> Is there any particular form each of these parts take, or is it just a
> free-form document ? To whom in particular does this document go ?

The form isn't that important, as long as the document is
understandable, and the information is there.  I don't know if this is
still the case, but it used to be that you could send it to the
secretary of the committee.

> > Many proposals will fall at 1) fence because the rationale will be
> > considered to be inadequate for the cost and complexity of the
> > change.  Others will fall because 2) fails to identify the majority
> > of places where changed wording will be necessary. The fact that the
> > proposed wording is not exactly correct would not be reason for
> > rejection.  If you can actually write 3) you probably should be an
> > active member of the committee even if only through electronic
> > participation.

> > 4) Get the support of at least one National Body, preferably that of the
> > country in which you reside.

> What is a "National Body" in the country I preside ?

That depends where you reside.  In France, the national body is AFNOR,
in Germany DIN and in Great Britain BSI.

But Francis' explanation presents an international point of view.  Due
to its size, and its exceptional role in standardization, I think
the situation is slightly different for ANSI (the American national
body).  For starters, the ANSI committee meets "in parallel" with the
ISO committee.

Note too that many of the national bodies will listen to you even if you
aren't a national.

> I have no idea to what this refers.

ISO is made up of "national bodies": the standardization groups in each
of its member countries.  The standardization of C++ is the activity of
a working group of ISO, and the members of that working group are the
members of the corresponding working groups in the national bodies.
However, a lot of national bodies don't have working groups for C++;
in that case, try the national body of a neighboring country, or a
country which has historical links to your country.

Failing that, ANSI has no nationality requirements for membership.

> > If you cannot persuade at least one NB to support your proposal you
> > do not have much chance.

> Why should it not be adequate merely to persuade the C++ standards
> committee ?

The C++ standards committee is made up of representatives of the
national bodies.  If you cannot persuade at least one member to support
your proposal, how do you expect to persuade the organization as a
whole?

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung

---





Author: pdimov@mmltd.net (Peter Dimov)
Date: Thu, 10 Oct 2002 17:34:41 +0000 (UTC)
kanze@gabi-soft.de (James Kanze) wrote in message news:<d6651fb6.0210090754.2d2cbf1d@posting.google.com>...
> pdimov@mmltd.net (Peter Dimov) wrote in message
> news:<7dc3b1ea.0210080714.6bd1e1d5@posting.google.com>...
> > That said, let me outline a "reference conversion" on a system where
> > the native file name is an NTBS, and wchar_t is UCS-x.
>
> > * If the wchar_t sequence contains only characters in the [1..255]
> > range and does not start with 0xEF 0xBB 0xBF, use the corresponding
> > byte sequence;
>
> > * Otherwise, convert the wchar_t sequence to UTF-8 and prepend 0xEF
> > 0xBB 0xBF.
>
> > Is this good enough? If not, why not?
>
> To begin with, it might leave ME with filenames I can't display
> correctly.

As is already the case. Currently you can't display _any_ filename
correctly, with the standard 'ls', with a custom 'ls', or within your
programs, _unless_ you know the code page used at the time the file
was created.

Having a standard context-independent mapping would enable you at
least to write your own 'ls' that displays file names correctly, and
to handle file names correctly in your own programs. Which is a
reasonable step forward.

> This is a tricky question, and goes well beyond the simple question of
> filenames.  My system (or systems -- both Linux and Solaris work pretty
> much the same way here) stores filenames as a string of bytes.  It
> pretty much handles transparently whatever bytes are given to it; '\0'
> and '/' are exceptions, but they don't really affect anything I'm about
> to say.
>
> My system displays such filenames (using the 'ls' program for example)
> by simply copying them to the screen window.  It does some preliminary
> checks, to replace non-printable characters with a '?', for example,
> using the current locale.  How the characters actually appear on the
> screen depends on the current font.
>
> When available, I generally use fonts and locales for 8859-15. This
> means, for example, that the wide character '\u20A0' should be
> translated to the narrow character if it is to appear correctly as the
> Euro sign. Using your scheme, I will see six characters: "      ??", where
> the last two question marks replace non-printable characters.

True. What I outlined is a "reference mapping", an answer to the (too)
general question "but what would an implementation do?" (The only
reasonable answer is "it depends" but I tried to get the discussion
past the obvious stalemate.)

If the "natural" encoding for a concrete implementation is not 8859-1
but 8859-15 (or windows-1252), the "identity" part of the mapping can
be modified appropriately. The "natural" encoding _was_ 8859-1 at one
time, so I went with it. :-)
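
For concreteness, here is a minimal sketch of the reference mapping as
outlined: identity for names made only of characters in [1..255] that do
not start with 0xEF 0xBB 0xBF, and otherwise UTF-8 with that three-byte
prefix. The function names are hypothetical, and the sketch assumes each
wchar_t element holds one complete code point no larger than 0x10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends the UTF-8 encoding of one code point (assumed <= 0x10FFFF).
void append_utf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical implementation of the "reference conversion".
std::string reference_convert(const std::wstring& name)
{
    bool all_narrow = true;
    for (wchar_t wc : name) {
        unsigned long cp = static_cast<unsigned long>(wc);
        if (cp < 1 || cp > 255) { all_narrow = false; break; }
    }
    bool has_prefix = name.size() >= 3 &&
        name[0] == 0xEF && name[1] == 0xBB && name[2] == 0xBF;

    std::string out;
    if (all_narrow && !has_prefix) {
        // Identity part: use the corresponding byte sequence.
        for (wchar_t wc : name) out += static_cast<char>(wc);
        return out;
    }
    // Otherwise: UTF-8, marked with the 0xEF 0xBB 0xBF prefix.
    out = "\xEF\xBB\xBF";
    for (wchar_t wc : name) append_utf8(out, static_cast<unsigned long>(wc));
    return out;
}
```

Because the result depends only on the input sequence, equal wide names
always map to equal byte sequences. Swapping the identity part for a
different single-byte set would change only the first branch.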

> While not perfect, an acceptable implementation would be to use a
> character between [0...255] directly, and reject any wide character
> filename which doesn't contain characters in this range.

This is a subset of the above, is it not?

> Now, I don't expect the standards committee to give special attention to
> my personal environment. But it is one of the most frequent environments
> for Unix machines in Europe. I would expect it to work more or less
> correctly with a quality implementation. And of course, a quality
> implementation has to work with the existing OS. (If we could also
> influence the OS, your suggestion has a lot going for it. Filenames are
> 8859-1, unless the name is at least three characters, and starts with
> "ï»¿", in which case it is Unicode, starting with a zero width no-break
> space.  The problem is that the decision doesn't rest with the C++
> implementor, and I haven't heard about any great movement in this
> direction from Sun (for Solaris) or even Linux.)

This is one of my points. We _can_ influence the OS, by standardizing
the wide character based interface, and perhaps adding some
non-normative encouragements. (A C-based interface would be even more
influential.) You will not hear about great movements from Sun/Linux
before we decide that wchar_t filenames are important for us. They are
important for you. Let's do something. :-)

---





Author: kanze@gabi-soft.de (James Kanze)
Date: Thu, 10 Oct 2002 17:35:58 +0000 (UTC)
eldiener@earthlink.net ("Edward Diener") wrote in message
news:<at3p9.18378$OB5.1806394@newsread2.prod.itd.earthlink.net>...
> "James Kanze" <kanze@gabi-soft.de> wrote in message
> news:d6651fb6.0210090754.2d2cbf1d@posting.google.com...
> > Now, I don't expect the standards committee to give special
> > attention to my personal environment. But it is one of the most
> > frequent environments for Unix machines in Europe. I would expect it
> > to work more or less correctly with a quality implementation. And of
> > course, a quality implementation has to work with the existing
> > OS. (If we could also influence the OS, your suggestion has a lot
> > going for it. Filenames are 8859-1, unless the name is at least
> > three characters, and starts with "ï»¿", in which case it is
> > Unicode, starting with a zero width no-break space.  The problem is
> > that the decision doesn't rest with the C++ implementor, and I
> > haven't heard about any great movement in this direction from Sun
> > (for Solaris) or even Linux.)

> You have made an excellent case for the fact that the semantics of
> wide character filenames must be defined by the implementation/OS for
> which it is used. While one may say, here are the things which C++
> recommends should be done if the OS does not natively support wide
> character filenames in its API, it is impossible to come up with a set
> of rules which tells an implementation on every OS what they must
> do. If the intent is to not support wide character filenames in the
> C++ file streams and message facets because not every OS has committed
> to natively supporting it in their API and their file system, then I
> think that is a valid choice certainly but that it is wrong.

It's a possible choice.  I think it is too early to say whether it is
wrong or right.

> I say it is wrong because it seems apparent to me that the general
> movement is toward OSs that cater to foreign languages whose character
> set can not be encompassed by a string of narrow characters.

All of the Unix systems I use support only narrow character filenames.  All of
them allow any possible character in the filename, except '/' and '\0'.
(All of them *allow* UTF-8 filenames, for example.)  The problem isn't
just what the OS allows, it is also what other tools in the tool set do
with it.

This is one place where C++ must NOT be at the leading edge.  Until I
know how ls processes the filenames, in order to display them, I cannot
know how I want to process them, in order to generate them.  The last
thing I want is to generate a filename with a Euro character in it (for
example), and not be able to access the file from other programs,
because they have no way of specifying the filename, or even displaying
it.

> By supporting wide character filenames for OSs which do support them
> in their API and file system, C++ will lead the way in supporting an
> idea whose time has come rather than embracing a backward path.

The problem is, in this domain, we don't want to lead the way.

> Other languages ( Java, C# ) already support wide character filename,
> rightly or wrongly, in the form of Unicode strings of some character
> width.  They have provided semantics and now their support is already
> out-of-date in many instances.

What are the semantics of Java?  The API says "in a system dependent
manner", but what are the actual semantics?  (I tried to create a file
with a Euro character in the filename, and apparently, the actual file
created has a '?' in its place.)

> I hope that C++ will not make that mistake and provide support for
> wide character filenames which an implementation/OS is allowed to
> define for itself. In that case, if OSs do coagulate around a
> standard, C++ will not be out-of-date as these other languages are or
> may well be in the future.

I agree that what constitutes a filename, and how the wide characters
are converted, must be implementation defined.  But I would like to know
what to expect in some typical cases.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: kanze@gabi-soft.de (James Kanze)
Date: Thu, 10 Oct 2002 18:07:37 +0000 (UTC)
alf_p_steinbach@yahoo.no.invalid (Alf P. Steinbach) wrote in message
news:<3da46942.526701890@news.bluecom.no>...

> >> That said, let me outline a "reference conversion" on a system
> >> where the native file name is an NTBS, and wchar_t is UCS-x.

> >> * If the wchar_t sequence contains only characters in the [1..255]
> >> range and does not start with 0xEF 0xBB 0xBF, use the corresponding
> >> byte sequence;

> >> * Otherwise, convert the wchar_t sequence to UTF-8 and prepend 0xEF
> >> 0xBB 0xBF.

> >> Is this good enough? If not, why not?

> >To begin with, it might leave ME with filenames I can't display
> >correctly.

> >This is a tricky question, and goes well beyond the simple question
> >of filenames.  My system (or systems -- both Linux and Solaris work
> >pretty much the same way here) stores filenames as a string of bytes.
> >It pretty much handles transparently whatever bytes are given to it;
> >'\0' and '/' are exceptions, but they don't really affect anything
> >I'm about to say.

> >My system displays such filenames (using the 'ls' program for
> >example) by simply copying them to the screen window.  It does some
> >preliminary checks, to replace non-printable characters with a '?',
> >for example, using the current locale.  How the characters actually
> >appear on the screen depends on the current font.

> >When available, I generally use fonts and locales for 8859-15. This
> >means, for example, that the wide character '\u20A0' should be
> >translated to the narrow character if it is to appear correctly as
> >the Euro sign. Using your scheme, I will see six characters:
> >"      ??", where the last two question marks replace non-printable
> >characters.

> >While not perfect, an acceptable implementation would be to use a
> >character between [0...255] directly, and reject any wide character
> >filename which doesn't contain characters in this range.

> You mean, reject any wide character filename which contains characters
> outside of this range?  The difference isn't trivial.  But I guess
> that that was a typo?

Right.

> >Now, I don't expect the standards committee to give special attention
> >to my personal environment. But it is one of the most frequent
> >environments for Unix machines in Europe. I would expect it to work
> >more or less correctly with a quality implementation. And of course,
> >a quality implementation has to work with the existing OS. (If we
> >could also influence the OS, your suggestion has a lot going for
> >it. Filenames are 8859-1, unless the name is at least three
> >characters, and starts with "ï»¿", in which case it is Unicode,
> >starting with a zero width no-break space.  The problem is that the
> >decision doesn't rest with the C++ implementor, and I haven't heard
> >about any great movement in this direction from Sun (for Solaris) or
> >even Linux.)

> It seems to me that there are several "good" conversions down to
> narrow characters,

There are.  That's why it is so difficult to get a consensus.

> namely

>   * No change of character codes.
>     - Assumes the system filename character set is a subset of the
>       C++ wide character set.
>     - Error on filenames that contain characters outside this set.

That's basically what I'm proposing.  There are two widely used
character sets in western Europe: 8859-1 and 8859-15.  They only differ
in a few characters, and 8859-1 is a subset of Unicode -- it represents
the first 256 code points.  So this solution is probably acceptable for
western Europe, and it is certainly easy to implement.
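
A minimal sketch of this "no change of character codes" option, assuming
that wchar_t holds Unicode code points and that the system takes file names
as 8859-1 bytes.  The name narrow_identity is invented for illustration and
is not part of any library:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: maps a wide file name to a narrow one without
// changing any character code, and rejects anything outside [1..255].
std::string narrow_identity(const std::wstring& wide)
{
    std::string narrow;
    narrow.reserve(wide.size());
    for (std::wstring::size_type i = 0; i < wide.size(); ++i) {
        // A negative wchar_t becomes a huge value here and is rejected too.
        const unsigned long code = static_cast<unsigned long>(wide[i]);
        if (code == 0 || code > 255) {
            throw std::range_error("file name not representable in 8 bits");
        }
        narrow += static_cast<char>(code);
    }
    return narrow;
}
```

With this, L"caf\u00e9" becomes the four bytes 'c' 'a' 'f' 0xE9, while a name
containing U+20AC is refused rather than silently mangled.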

>   * Convert to some MBCS character set NS other than UTF-8.
>     - Assumes the system treats filenames as NS.
>     - Error possible on characters not representable in NS.

      - How do you determine the character set?

>   * Convert to UTF-8 with non-display prefix, as necessary.
>     - Assumes the system treats filenames as UTF-8 (otherwise
>       display of filenames may be impossible to grok).
>     - No errors on the filename conversion.

> The key here is, IMHO, the word "Assumes".  So I don't think it's
> possible to select one particular conversion as being the "best"
> compromise.  Instead, the conversion should match the system, but that
> match can only partially be specified by the standard (and should
> perhaps not be specified at all, but left to the compiler).

I suspect that the actual behavior will have to be implementation
defined.  What I'm looking for is what our expectations should be for a
quality implementation on the most widespread systems.  For systems
which support wide character filenames directly, I think that the
implementation is obvious.  For others, I'm not sure what it should be,
but I think we should have some idea before we rush in and standardize
anything, even implementation defined behavior.
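
For comparison, the "reference conversion" quoted at the top of this message
(bytes used directly for [1..255], otherwise UTF-8 behind an 0xEF 0xBB 0xBF
prefix) could be sketched roughly as follows.  This assumes wchar_t holds
whole Unicode code points (UCS-4, or UCS-2 without surrogates); the function
names are invented for illustration:

```cpp
#include <stdexcept>
#include <string>

// Appends the UTF-8 encoding of one code point (caller ensures <= 0x10FFFF).
void append_utf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical implementation of the quoted "reference conversion".
std::string reference_conversion(const std::wstring& wide)
{
    bool fits_in_bytes = true;
    for (std::wstring::size_type i = 0; i < wide.size(); ++i) {
        const unsigned long code = static_cast<unsigned long>(wide[i]);
        if (code == 0 || code > 0x10FFFF) {
            throw std::range_error("not a usable code point");
        }
        if (code > 255) fits_in_bytes = false;
    }
    const bool starts_with_prefix = wide.size() >= 3
        && wide[0] == 0xEF && wide[1] == 0xBB && wide[2] == 0xBF;

    std::string out;
    if (fits_in_bytes && !starts_with_prefix) {
        // First rule: use the corresponding byte sequence unchanged.
        for (std::wstring::size_type i = 0; i < wide.size(); ++i) {
            out += static_cast<char>(static_cast<unsigned long>(wide[i]));
        }
        return out;
    }
    // Second rule: UTF-8, marked with the three prefix bytes.
    out = "\xEF\xBB\xBF";
    for (std::wstring::size_type i = 0; i < wide.size(); ++i) {
        append_utf8(out, static_cast<unsigned long>(wide[i]));
    }
    return out;
}
```

Under these assumptions a name consisting of U+20AC alone becomes the six
bytes EF BB BF E2 82 AC, which is what a tool unaware of the convention
would then have to display.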

> ***

> Now, at the risk of spinning off counter-arguments to what I'd dearly
> like to see in a revised standard (namely wchar_t support), since Java
> and MS Windows have conspired to make UCS-2 -- 16-bit Unicode -- a de
> facto standard, it isn't unlikely that many C++ implementations will
> also in the future have wchar_t as 16 bits.

Most C++ implementations have to contend with existing practice on their
machines.  The existing practice under Solaris and Linux, at least, is
that wchar_t is 32 bits -- I think that this is also the case for HP/UX
and Compaq ex-Digital systems.

I think it is almost certain that in the near future, we will have to
deal with both 16 bit and 32 bit wchar_t.
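
As a rough illustration of what dealing with both widths means in code, the
sketch below builds a wide string from one code point and falls back to a
surrogate pair when wchar_t is too small.  The function name is invented
here, and the surrogate fallback is an assumption about how a 16-bit
implementation would choose to represent such characters:

```cpp
#include <cwchar>
#include <string>

// Hypothetical helper: stores one Unicode code point (<= 0x10FFFF) in
// however many wchar_t units this implementation needs.
std::wstring from_code_point(unsigned long cp)
{
    std::wstring out;
    if (WCHAR_MAX >= 0x10FFFF || cp < 0x10000) {
        // Wide enough (e.g. 32-bit wchar_t), or the character fits in 16 bits.
        out += static_cast<wchar_t>(cp);
    } else {
        // 16-bit wchar_t: split into a high and a low surrogate.
        cp -= 0x10000;
        out += static_cast<wchar_t>(0xD800 + (cp >> 10));
        out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
    }
    return out;
}
```

Code that counts "characters" by counting wchar_t elements gives different
answers on the two kinds of implementation for anything above U+FFFF.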

> I think this has been discussed at length, but without any firm
> conclusion (the issue is the ability to represent any character by a
> single element in a sequence of character type elements), so I suggest
> let's skip that particular debate; it's just the motivation for my
> point, which follows.

> The point I'm raising is this: in the future we may not only need
> conversion from "wide character" to "narrow character" filenames, but
> conversions from UCS-4 to UCS-2, and down to single-byte.  So the C++
> model of either wide character or narrow character may no longer match
> reality.  Is there any way we can prepare for this situation, so that
> the language won't be in-principle incompatible?

No answer.  I wish I had one, though.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: kanze@gabi-soft.de (James Kanze)
Date: Thu, 10 Oct 2002 18:13:43 +0000 (UTC)
pdimov@mmltd.net (Peter Dimov) wrote in message
news:<7dc3b1ea.0210090707.cf95281@posting.google.com>...
> kanze@gabi-soft.de (James Kanze) wrote in message
> news:<d6651fb6.0210080633.e8ff293@posting.google.com>...
> > ron@sensor.com ("Ron Natalie") wrote in message
> > news:<XQno9.3253$cG.781@fe04>...
> > > Of all the "objections" to your fstream proposals are:

> > >    In the standard, this will be "implementation defined", but we
> > >    don't want to add anything "implementation defined" to the
> > >    standard without a fairly good idea of what we should expect
> > >    from a quality implementation in a specific context

> > That is, I believe, a fair summary.  (It's my position, at least:-).)

> > > The answer is that a quality implementation does what is right for
> > > it.

> > The problem is that many implementors don't know what is right for
> > them.  I'm not an implementor, but I have no idea what is right for
> > Unix, for example. So they don't want to have to implement it.

> > > If there is a mapping from wide string to a narrow one, then it
> > > can do the conversion, otherwise it has no choice but to return an
> > > error.

> > The problem is that there are potentially many mappings.  And
> > implementors don't (yet) know how to choose.

> The perfect mapping has some pretty obvious properties:

> * Equal wchar_t sequences map to equal byte sequences, independent of
> OS/runtime/compiler state.

Independent of the current locale setting?  Which may be used to write
the filename into another file?

This means that if I write the filename to a file, then use it to
create the file, and I (or another program) then reread it as a narrow
character sequence and try to open the file, the file won't be
there, because two different mappings were used to convert it to narrow
characters.
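
To make the mismatch concrete, here is a small sketch with two hypothetical
mappings (to_latin9 and to_utf8, both invented for this illustration and
deliberately incomplete).  It assumes wchar_t holds Unicode code points; the
same wide name yields two different byte sequences, so a file created
through one mapping cannot be found through the other:

```cpp
#include <stdexcept>
#include <string>

// Mapping A: ISO 8859-15, restricted here to ASCII plus the Euro sign.
std::string to_latin9(const std::wstring& wide)
{
    std::string out;
    for (wchar_t c : wide) {
        if (c == 0x20AC) {
            out += '\xA4';                       // Euro sign in 8859-15
        } else if (c > 0 && c < 0x80) {
            out += static_cast<char>(c);
        } else {
            throw std::range_error("not handled in this sketch");
        }
    }
    return out;
}

// Mapping B: UTF-8, restricted here to code points below U+10000.
std::string to_utf8(const std::wstring& wide)
{
    std::string out;
    for (wchar_t c : wide) {
        const unsigned long cp = static_cast<unsigned long>(c);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            throw std::range_error("not handled in this sketch");
        }
    }
    return out;
}
```

For the wide name L"price\u20ac.txt", mapping A gives a 10-byte name
containing 0xA4, and mapping B gives a 12-byte name containing E2 82 AC.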

> * Different wchar_t sequences map to different byte sequences.

Including if they contain combining sequences?  What about the
positional variants of the Arabic alphabets?

> * Identity mapping for file names in a suitably limited character set.

The basic character set.

I think that's the one requirement which will find more or less
universal agreement.

> I believe that it is possible to invent a mapping that is very close
> to perfect. See my answer to Beman's post.

It's possible to invent any number of mappings which meet your
criteria.  I think you forgot the most important one: that a character
in the wide character set map to the same character in the narrow
character set (provided such a character exists).  But meeting this
criterion involves determining what the narrow character set is, which is
almost impossible.

> > > As you state, this is NO different than feeding inappropriate
> > > characters (say things like slashes or characters > 128) to
> > > existing implementations.

> > I feed characters > 128 to my implementation all of the time.  That
> > may be why I'm sensitized to the problem; the results aren't
> > always what one might expect.

> > > If the implementation can't represent the given string (wide or
> > > narrow) as a filename, then it has to return error.

> > If the system has a very limited idea of what a filename can be
> > (say, like on the old PDP-11), I don't think that the problem is
> > very complex.  The real problem is with systems like Unix, which
> > allow almost anything in a filename, but interpret them according
> > to context.  Create a file using accented characters on Solaris.
> > Then try invoking ls while changing the environment variable LC_ALL,
> > and watch the filename change.  The problem isn't trivial.  (That
> > doesn't mean that it doesn't have a usable answer.)

> Unix does not interpret file names in any way.

The Unix kernel doesn't interpret filenames beyond detecting '/' as a
directory separator and '\0' as the end of the path.  I think that '.'
also gets some special treatment, at least in some contexts (loading a
program, perhaps).

Unix, as a whole, certainly does interpret filenames.  The shells and
find do pattern matching on them, ls (and echo, and a number of other
programs) display them, etc. Suppose I generate a configuration file,
and at the same time, output the filename to a startup script that I
generate.  The output of the filename will use the current locale
(unless I explicitly imbue another locale in the filebuf).  Are you
saying that the filename in the startup script should be different than
the one generated in the file system?

> It uses a byte sequence as a file name, the same byte sequence always
> names the same file, and the names of the existing files are
> locale-independent. It is 'ls' that interprets the bytes as
> locale-dependent characters.

It's ls, and the shells, and find, and ... All part of Unix, according
to IEEE Std 1003.1.  (And of course, the kernel *does* look at the
bytes, at least for '/' and '\0', and perhaps for '.' in certain cases.)

But that's really irrelevant.  My program doesn't run in a vacuum.  Even
more important than what the standard says is what the other programs on
the machine do.  And the problem is that a lot of them are very, very
naïve: "ls" takes the current locale into account when displaying a file
name, for example, which suggests that a C++ program should also do this
when creating the file.

Finally, I'd like to stress that I'm brainstorming here.  I DON'T know
what the correct solution is.  I don't even know how to tell whether a
solution is correct or not.  And I'm convinced that I can find problems
(like the above) with any solution.

What I'm arguing against isn't your solution.  What I'm arguing against
is the idea that this is a trivial undertaking, with "obvious"
solutions.  That doesn't mean that we shouldn't undertake it.  But it
does mean not to expect miracles anytime soon.  And while I generally
think that some support for wide character filenames is necessary and
inevitable, I'd rather have no support, than to have something that
doesn't work, and that we have to change later.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: eldiener@earthlink.net ("Edward Diener")
Date: Thu, 10 Oct 2002 18:19:32 +0000 (UTC)
"James Kanze" <kanze@gabi-soft.de> wrote in message
news:d6651fb6.0210100630.7c9adf55@posting.google.com...
> eldiener@earthlink.net ("Edward Diener") wrote in message
> news:<b1Wo9.17213$OB5.1719713@newsread2.prod.itd.earthlink.net>...
> > Why should it not be adequate merely to persuade the C++ standards
> > committee ?
>
> The C++ standards committee is made up of representatives of the
> national bodies.  If you cannot persuade at least one member to support
> your proposal, how do you expect to persuade the organization as a
> whole.

By creating a proposal which others would find acceptable because the idea
as proposed ( and possibly the implementation of that idea ) behind it is a
good one.

I find it sad, not personally but intellectually, that the C++ language has
devolved into a situation where politics takes precedence over creativity. I
do not believe that that was what Bjarne Stroustrup had in mind when he
created the language.






Author: eldiener@earthlink.net ("Edward Diener")
Date: Thu, 10 Oct 2002 18:21:02 +0000 (UTC)
"P.J. Plauger" <pjp@dinkumware.com> wrote in message
news:3da5896b$0$17185$4c41069e@reader0.ash.ops.us.uu.net...
> ""Edward Diener"" <eldiener@earthlink.net> wrote in message
> news:at3p9.18378$OB5.1806394@newsread2.prod.itd.earthlink.net...
> > Other languages ( Java, C# ) already support wide character filename,
> > rightly or wrongly, in the form of Unicode strings of some character
> > width. They have provided semantics and now their support is already
> > out-of-date in many instances.
>
> EXACTLY. And the designers of both languages had the information they
> needed at design time to avoid becoming out of date. They just guessed
> wrong at the time, because of insufficient practical experience (IMO).
>
> >          I hope that C++ will not make that mistake and provide
> > support for wide character filenames which an implementation/OS is
> > allowed to define for itself. In that case, if OSs do coagulate around
> > a standard, C++ will not be out-of-date as these other languages are
> > or may well be in the future.
>
> So we standardize something that's so wishy washy today that it'll be
> equally wishy washy in ten years, and equally beneficial to writing
> portable programs?

I don't think that supporting programmers who want to create filenames on
their own operating system in a foreign language which they understand is a
"wishy washy" enterprise, nor that giving C++ the ability to do so is
either. I don't believe that C++ should wait ten years to make such a
decision and even waiting six years, circa 2008 when I am told the next C++
standard will be ratified, seems too long. Giving C++ the ability to support
such filenames now and then refining the meaning of what that means as
various OSs adopt the idea seems preferable to me than waiting indefinitely,
even if the first definition of what wide character filenames mean is almost
completely indefinite and just suggestive of a variety of current common
practices.

As has been pointed out numerous times without any valid counterargument,
there is nothing inherently portable about narrow character filenames so I
don't see a portability issue with wide character filenames. At most, with
narrow character filenames, there may be an understanding by the programmer
that at its simplest form, which includes no possible path or, on systems
which support it, drive information in a filename, disparate OSs such as
Linux, OS/2, Windows, and Mac OS may generally treat the name as referring
to a file in the current directory ( or folder as some OSs call it ). But
this reference is purely OS defined and is not of course spelled out in any
way in the C++ standard. Why you are concerned with the portability of
wide character filenames on the theoretical level is therefore something I
do not understand. At the practical level, and as an implementor of
cross-platform libraries, I understand your concern, but I don't see
anything in the current C++ standard which would give you filename
information as opposed to your knowledge of the filename characteristics of
the various OSs for which you create your libraries. As such I see the
implementation of wide character filenames as equally OS dependent.







Author: francis.glassborow@ntlworld.com (Francis Glassborow)
Date: Thu, 10 Oct 2002 18:40:10 +0000 (UTC)
In article <CRip9.20151$OB5.1931283@newsread2.prod.itd.earthlink.net>,
Edward Diener <eldiener@earthlink.net> writes
>I find it sad, not personally but intellectually, that the C++ language has
>devolved into a situation where politics takes precedence over creativity. I
>do not believe that that was what Bjarne Stroustrup had in mind when he
>created the language.

I think you are mistaken on just about all counts. Bjarne has said many
times that having a good creative idea is not enough. Nine out of ten
good ideas were discarded during the standardisation of C++ and Bjarne
was one of the main forces behind such parsimony. In addition those
involved at WG21 level are not bureaucrats but they do have a wider
view. For example three excellent ideas may, if all accepted, lead to
disaster. There are those who would say that templates, exceptions and
namespaces are an existing example.


--
Francis Glassborow      ACCU
64 Southfield Rd
Oxford OX4 1PA          +44(0)1865 246490
All opinions are mine and do not represent those of any organisation






Author: pjp@dinkumware.com ("P.J. Plauger")
Date: Thu, 10 Oct 2002 19:21:01 +0000 (UTC)
""Edward Diener"" <eldiener@earthlink.net> wrote in message
news:wkjp9.20209$OB5.1934654@newsread2.prod.itd.earthlink.net...

> > So we standardize something that's so wishy washy today that it'll be
> > equally wishy washy in ten years, and equally beneficial to writing
> > portable programs?
>
> I don't think that supporting programmers who want to create filenames on
> their own operating system in a foreign language which they understand is a
> "wishy washy" enterprise,

Nor do I. Nor did I say that.

>                          nor that giving C++ the ability to do so is
> either.

Depends on how you do it. If you produce an International Standard that
says, here's how you can *try* to open a file with a wide-character name,
but no implementation is obliged to succeed for *any* request, and every
implementation is at liberty to do something different -- that, my friend,
is wishy washy.

>        I don't believe that C++ should wait ten years to make such a
> decision and even waiting six years, circa 2008 when I am told the next C++
> standard will be ratified, seems too long.

Too long for what? People can open files with wide names right now, as
James Kanze keeps pointing out. They just can't do it in a standard-conforming
way.

>                                           Giving C++ the ability to support
> such filenames now and then refining the meaning of what that means as
> various OSs adopt the idea seems preferable to me than waiting indefinitely,
> even if the first definition of what wide character filenames mean is almost
> completely indefinite and just suggestive of a variety of current common
> practices.

You mean, giving *Standard C++* the ability, without saying what that means.
That's not wishy washy? And what happens if the obvious way to do the job
today, however wishy washy, proves to be restrictive once we finally get
enough experience to know how to do the job properly? If you don't think
backward compatibility is that big a deal, try getting Java to switch to
four-byte characters, now that two bytes is clearly too small.

> As has been pointed out numerous times without any valid counterargument,
> there is nothing inherently portable about narrow character filenames so I
> don't see a portability issue with wide character filenames. At most, with
> narrow character filenames, there may be an understanding by the programmer
> that at its simplest form, which includes no possible path or, on systems
> which support it, drive information in a filename, disparate OSs such as
> Linux, OS/2, Windows, and Mac OS may generally treat the name as referring
> to a file in the current directory ( or folder as some OSs call it ). But
> this reference is purely OS defined and is not of course spelled out in any
> way in the C++ standard. Why you are concerned with the portability of
> wide character filenames on the theoretical level is therefore something I
> do not understand. At the practical level, and as an implementor of
> cross-platform libraries, I understand your concern, but I don't see
> anything in the current C++ standard which would give you filename
> information as opposed to your knowledge of the filename characteristics of
> the various OSs for which you create your libraries. As such I see the
> implementation of wide character filenames as equally OS dependent.

You keep harping on this point, but that's not where the criticism has
been directed. It's the *relationship* between "abc" and L"abc" that
matters to some of us, and it is important even if you can convince yourself
that some OS somewhere is permitted to reject either or both names. I've
seen plenty of discussions, over the years, that prove conclusively that
various programming-language standards are vacuous, unimplementable,
toothless, or otherwise not worth the effort it took to produce them.
Sometimes the reductio ad absurdum argument is used, as you are doing,
as license to do whatever you want and claim conformance. But some of us
will still keep trying to provide the best guidance we can for the legions
of programmers (and managers) who think standards are worth something. The
hardest, and most valuable, lesson I've learned in over two decades of
such efforts is that my One Right Way to do a thing may not be universally
shared. You might try that thought on for size.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com








Author: ron@sensor.com ("Ron Natalie")
Date: Fri, 4 Oct 2002 20:04:37 +0000 (UTC)
Once again both internal to our company and in comp.lang.c++ the ugly
fact that C++ does not completely support character sets is rearing its
head.

There are a small, but fairly important number of interfaces in the standard
that do not support wchar_t based arguments.   The effect of this is that on
implementations that can only represent certain characters with wchar_t
(i.e., there is no wide char to multibyte char mapping available), there is no
way to use fstreams with filenames with these characters.

I therefore suggest the following changes:

Add the following members to parallel the existing const char* arg'd versions

27.8.1.3   basic_filebuf<charT,traits>* open(const wchar_t* s, ios_base::openmode mode);
27.8.1.6   explicit basic_ifstream(const wchar_t* s, ios_base::openmode mode = ios_base::in);
27.8.1.7   void open(const wchar_t* s, ios_base::openmode mode = ios_base::in);
27.8.1.9   explicit basic_ofstream(const wchar_t* s, ios_base::openmode mode = ios_base::out);
27.8.1.10  void open(const wchar_t* s, ios_base::openmode mode = ios_base::out);
27.8.1.12  explicit basic_fstream(const wchar_t* s, ios_base::openmode mode = ios_base::in|ios_base::out);
27.8.1.13  void open(const wchar_t* s, ios_base::openmode mode = ios_base::in|ios_base::out);

Further, I would also suggest requiring

    main(int argc, wchar_t* argv[]) { /* ... */ }

to be added to the list of required signatures of main in 3.6.1.

These will provide the necessary changes that are visible outside the program (I see no reason to require
certain other things such as type_info::name or exception::what to be similarly widened).
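
On a system whose native file names are narrow, one way an implementation
might provide such an overload is sketched below.  This is only an
assumption about a possible strategy (conversion through the current C
locale with wcstombs); the helper names are invented, and an implementation
with a native wide-character file API would presumably call that instead:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cwchar>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: converts a wide name to a multibyte name using the
// current C locale.  Returns false if some character has no mapping.
bool narrow_for_current_locale(const wchar_t* wide, std::string& narrow)
{
    // Each wide character needs at most MB_CUR_MAX bytes; +1 for the '\0'.
    std::vector<char> buffer(std::wcslen(wide) * MB_CUR_MAX + 1);
    const std::size_t n = std::wcstombs(&buffer[0], wide, buffer.size());
    if (n == static_cast<std::size_t>(-1)) {
        return false;
    }
    narrow.assign(&buffer[0], n);
    return true;
}

// Hypothetical stand-in for the proposed basic_ofstream::open(const wchar_t*).
bool open_wide(std::ofstream& out, const wchar_t* name,
               std::ios_base::openmode mode = std::ios_base::out)
{
    std::string narrow;
    if (!narrow_for_current_locale(name, narrow)) {
        out.setstate(std::ios_base::failbit);   // no mapping: report an error
        return false;
    }
    out.open(narrow.c_str(), mode);
    return out.is_open();
}
```

The result depends on the locale in effect: a program that has not called
setlocale runs in the "C" locale, where characters beyond the basic set
typically have no mapping and the open fails.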

[C suffers the same problems with stdio and certain other file handling library functions.]









Author: alf_p_steinbach@yahoo.no.invalid (Alf P. Steinbach)
Date: Fri, 4 Oct 2002 21:23:47 +0000 (UTC)
On Fri, 4 Oct 2002 20:04:37 +0000 (UTC), ron@sensor.com ("Ron Natalie") wrote:
>...

I haven't checked the details.

But I'm willing to argue to the best of my ability for or against the general
idea, whichever is most likely to bring it about.

Perhaps the latter?  ;-)


Cheers,

- Alf






Author: andys@evo6.com (Andy Sawyer)
Date: Fri, 4 Oct 2002 21:37:04 +0000 (UTC)
ron@sensor.com ("Ron Natalie") writes:

> Further, I would also suggest requiring
>
>     main(int argc, wchar_t* argv[]) { /* ... */ }
>
> to be added to the list of required signatures of main in 3.6.1.

It's unnecessary as a required signature, since the current

    int main( int argc, char *argv[] ) { /* ... */ }

can convey all of the necessary information. (the argv array is an
array of pointers to NTMBS - which can use pretty much any character
encoding the implementation desires). Of course, there's nothing to
stop a vendor adding your desired "main" signature as an extension.
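
A minimal sketch of how a program could recover wide strings from such an
NTMBS argv, assuming the arguments are encoded according to the current C
locale (the helper name is invented for illustration):

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: decodes one NTMBS into a wide string using the
// current C locale.  Returns false on an invalid multibyte sequence.
bool widen(const char* ntmbs, std::wstring& wide)
{
    // A multibyte string never decodes to more wide characters than bytes.
    std::vector<wchar_t> buffer(std::strlen(ntmbs) + 1);
    const std::size_t n = std::mbstowcs(&buffer[0], ntmbs, buffer.size());
    if (n == static_cast<std::size_t>(-1)) {
        return false;
    }
    wide.assign(&buffer[0], n);
    return true;
}
```

A program would call this on each argv[i] after selecting the user's locale
with setlocale(LC_ALL, ""); it only helps where every argument of interest
actually has a multibyte representation.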

Regards,
 Andy S.
--
"Light thinks it travels faster than anything but it is wrong. No matter
 how fast light travels it finds the darkness has always got there first,
 and is waiting for it."                  -- Terry Pratchett, Reaper Man






Author: andys@evo6.com (Andy Sawyer)
Date: Fri, 4 Oct 2002 22:33:44 +0000 (UTC)
andys@evo6.com (Andy Sawyer) writes:

> ron@sensor.com ("Ron Natalie") writes:
>
> > Further, I would also suggest requiring
> >
> >     main(int argc, wchar_t* argv[]) { /* ... */ }
> >
> > to be added to the list of required signatures of main in 3.6.1.
>
> It's unnecessary as a required signature, since the current
>
>     int main( int argc, char *argv[] ) { /* ... */ }
>
> can convey all of the necessary information.

I just re-read Ron's original post and realised that I had
mis-interpreted

   "implementations that can only represent certain characters with
   wchar_t"

to mean that the range of characters that wchar_t may represent is
limited, not that certain characters could not be represented /except/
by wchar_t. In those environments, the NTMBS argv clearly is not
sufficient. However, I maintain my position that the wchar_t signature
should not be mandated by the standard, but (as now) vendors are free to
offer it as an extension.

Regards,
 Andy S.
--
"Light thinks it travels faster than anything but it is wrong. No matter
 how fast light travels it finds the darkness has always got there first,
 and is waiting for it."                  -- Terry Pratchett, Reaper Man






Author: eldiener@earthlink.net ("Edward Diener")
Date: Mon, 7 Oct 2002 17:07:20 +0000 (UTC)
Previous threads initiated by me, Edward Diener, and Randy Maddox have
brought up the same points to little avail. Here are some more issues you
can add to your list:

1) The message facets, both character and wide character, specify a filename
of only characters and not wide characters.

2) The exception hierarchy returns its what() string as a character string
with no provision for wide character string returns.

The previous answer to all the points cited was essentially that only a used
C++ standard library implementation which supports these additional wide
character implementations will further the cause of a proposal for including
such additions to the library.

""Ron Natalie"" <ron@sensor.com> wrote in message
news:sgmn9.11631$Jw5.6647@fe04...
> Once again both internal to our company and in comp.lang.c++ the ugly
> fact that C++ does not completely support character sets is rearing it's
> head.
>
> There are a small, but fairly important number of interfaces in the standard
> that do not support wchar_t based arguments.   The effect of this is that on
> implementations that can only represent certain characters with wchar_t
> (i.e., there is no wide char to multibyte char mapping available), there is no
> way to use fstreams with filenames with these characers.
>
> I therefore suggest the following changes:
>
> Add the following members to parallel the existing const char* arg'd versions
>
> 27.18.1.3    basic_filebuf<charT,traits>* open(const wchar_t* s, ios_base::openmode mode);
> 27.18.1.6    explicit basic_ifstream(const wchar_t* s, ios_base::openmode mode = ios_base::in);
> 27.8.1.7   void open(const wchar_t* s, ios_base::openmode mode = ios_base::in);
> 27.8.1.9   explicit basic_ofstream(const wchar_t* s, ios_base::openmode mode = ios_base::out);
> 27.8.1.10  void open(const char* s, ios_base::openmode mode = ios_base::out);
> 27.8.1.11  explicit basic_fstream(const wchar_t* s, ios_base::openmode mode = ios_base::in|ios_base::out);
> 27.8.1.11  void open(const wchar_t* s, ios_base::openmode mode = ios_base::in|ios_base::out);
>
> Further, I would also suggest requiring
>
>     main(int argc, wchar_t* argv[]) { /* ... */ }
>
> to be added to the list of required signatures of main in 3.6.1.
>
> These will provide the necessary changes that are visible outside the program (I see no reason to require
> certain other things such as the typeinfo::name or extension::what  to be similarly widened).
>
> [C suffers the same problems with stdio and certain other file handling library functions.]







Author: ron@sensor.com ("Ron Natalie")
Date: Mon, 7 Oct 2002 17:36:34 +0000 (UTC)
""Edward Diener"" <eldiener@earthlink.net> wrote in message news:CoOn9.10944$OB5.1113866@newsread2.prod.itd.earthlink.net...

> 1) The message facets, both character and wide character, specify a filename
> of only characters and not wide characters.

Agreed.

>
> 2) The exception hierarchy returns its what() string as a character string
> with no provision for wide character string returns.

I mentioned this.   Things like exception::what and type_info::name are not
really exposed outside of the program so are of lesser significance.   The
tools of programming themselves are going to be stuck in the char world.

> The previous answer to all the points cited was essentially that only a used
> C++ standard library implementation which supports these additional wide
> character implementations will further the cause of a proposal for including
> such additions to the library.

I can't parse what the above says, but we've already extended our streambufs
to have the wchar_t overloads.   It's not all that complicated, just makes our code
not portable.








Author: eldiener@earthlink.net ("Edward Diener")
Date: Mon, 7 Oct 2002 18:37:05 +0000 (UTC)
""Ron Natalie"" <ron@sensor.com> wrote in message
news:%mjo9.1100$cG.109@fe04...
>
> ""Edward Diener"" <eldiener@earthlink.net> wrote in message news:CoOn9.10944$OB5.1113866@newsread2.prod.itd.earthlink.net...

> > The previous answer to all the points cited was essentially that only a used
> > C++ standard library implementation which supports these additional wide
> > character implementations will further the cause of a proposal for including
> > such additions to the library.
>
> I can't parse what the above says, but we've already extended our streambufs
> to have the wchar_t overloads.   It's not all that complicated, just makes our code
> not portable.

I will parse it for you. Various people who answered previously mentioned
that the only way which would exist for wide character filename support to
be taken seriously as a proposal is if an implementation for such support
were done and shown to the C++ committee. I still have no idea of the formal
way in which a proposal is made to the C++ committee for changes to the C++
language or C++ standard libraries, and gathered from the previous
discussion that only implementations of proposed changes, and not just
ideas, will somehow be considered.






Author: bdawes@acm.org (Beman Dawes)
Date: Mon, 7 Oct 2002 21:46:17 +0000 (UTC)
ron@sensor.com ("Ron Natalie") wrote in message news:<sgmn9.11631$Jw5.6647@fe04>...

> I therefore suggest the following changes:
>
> Add the following members to parallel the existing const char* arg'd versions
>
> 27.18.1.3    basic_filebuf<charT,traits>* open(const wchar_t* s, ios_base::openmode mode);
> 27.18.1.6    explicit basic_ifstream(const wchar_t* s, ios_base::openmode mode = ios_base::in);
> 27.8.1.7   void open(const wchar_t* s, ios_base::openmode mode = ios_base::in);
> 27.8.1.9   explicit basic_ofstream(const wchar_t* s, ios_base::openmode mode = ios_base::out);
> 27.8.1.10  void open(const char* s, ios_base::openmode mode = ios_base::out);
> 27.8.1.11  explicit basic_fstream(const wchar_t* s, ios_base::openmode mode = ios_base::in|ios_base::out);
> 27.8.1.11  void open(const wchar_t* s, ios_base::openmode mode = ios_base::in|ios_base::out);

With what semantics on operating systems which do not support wchar_t
filenames?

Based on past committee discussions, numerous members of the standards
committee's library working group would like to provide for wchar_t
filenames. But not a single LWG member has been willing to speak in
favor of doing so without explicit semantics.

"Implementation defined" won't fly; that's just an "illusion of
portability" without any underlying reality. The implementors say they
have no idea what semantics they would be defining.

"Unspecified" won't fly either; unlike narrow character filenames,
there is no agreed upon behavior for an implementation to fall back
upon.

Perhaps there are some reasonably portable semantics for wide names on
narrow platforms, and the LWG has somehow overlooked them. But until
someone steps forward and explains what those semantics are, the LWG
can't standardize them.

--Beman






Author: ron@sensor.com ("Ron Natalie")
Date: Mon, 7 Oct 2002 21:46:42 +0000 (UTC)
"Andy Sawyer" <andys@evo6.com> wrote in message news:n0ptc2fb.fsf@ender.evo6.com...

> can convey all of the necessary information. (the argv array is an
> array of pointers to NTMBS - which can use pretty much any character
> encoding the implementation desires).

I strongly disagree.  The general assumption that there exists a unique
(or in some cases ANY) multibyte representation that fully represents
the wide character set is an assumption that was made repeatedly
in the standard that does not hold up in practice.   It was only BARELY
true for the UNIX implementations that essentially were just international
character sets shoe-horned into the traditional 8 bit implementation.

This is not the case.  In Windows, there may not be any multibyte set
that is appropriate; in some cases there is more than one.   You just
can't assume there is a 1-to-1 mapping between a single wchar_t
representation and a single NTMBS.








Author: rmaddox@isicns.com (Randy Maddox)
Date: Mon, 7 Oct 2002 21:48:17 +0000 (UTC)
ron@sensor.com ("Ron Natalie") wrote in message news:<sgmn9.11631$Jw5.6647@fe04>...
> Once again both internal to our company and in comp.lang.c++ the ugly
> fact that C++ does not completely support character sets is rearing it's
> head.
>

Once again, indeed.  See also the thread
"Internationalization/localization support in stdlib" where similar
issues were brought up, and severely hammered.

Best of luck with this one.  :-)

Randy.






Author: bop2@telia.com ("Bo Persson")
Date: Mon, 7 Oct 2002 21:54:16 +0000 (UTC)
""Edward Diener"" <eldiener@earthlink.net> skrev i meddelandet
news:ONjo9.13879$OB5.1397398@newsread2.prod.itd.earthlink.net...
> ""Ron Natalie"" <ron@sensor.com> wrote in message
> news:%mjo9.1100$cG.109@fe04...
> >
> > ""Edward Diener"" <eldiener@earthlink.net> wrote in message news:CoOn9.10944$OB5.1113866@newsread2.prod.itd.earthlink.net...
>
> > > The previous answer to all the points cited was essentially that only a used
> > > C++ standard library implementation which supports these additional wide
> > > character implementations will further the cause of a proposal for including
> > > such additions to the library.
> >
> > I can't parse what the above says, but we've already extended our streambufs
> > to have the wchar_t overloads.   It's not all that complicated, just makes our code
> > not portable.
>
> I will parse it for you. Various people who answered previously mentioned
> that the only way which would exist for wide character filename support to
> be taken seriously as a proposal is if an implementation for such support
> were done and shown to the C++ committee. I still have no idea of the formal
> way in which a proposal is made to the C++ committee for changes to the C++
> language or C++ standard libraries, and gathered from the previous
> discussion that only implementations of proposed changes, and not just
> ideas, will somehow be considered.
>

I think the message from the committee is that they already *have*
considered the idea, and rejected it. To have them reconsider, someone has
to show how it can be of general use. Specifically, how it is to be
implemented for operating systems that do not support wide character file
names.


Bo Persson
bop2@telia.com






Author: pjp@dinkumware.com ("P.J. Plauger")
Date: Mon, 7 Oct 2002 22:13:45 +0000 (UTC)
""Ron Natalie"" <ron@sensor.com> wrote in message news:kYfo9.207802$F21.72447@fe02...

> "Andy Sawyer" <andys@evo6.com> wrote in message news:n0ptc2fb.fsf@ender.evo6.com...
>
> > can convey all of the necessary information. (the argv array is an
> > array of pointers to NTMBS - which can use pretty much any character
> > encoding the implementation desires).
>
> I strongly disagree.  The general assumtion that a there exist a unique
> (or in some cases ANY) multibyte representation that fully represents
> the wide character set is an assumption that was made repeatedly
> in the standard that does not hold up in practice.   It was only BARELY
> true for the UNIX implementations that essentially were just international
> character sets shoe-horned into the traditional 8 bit implementation.
>
> This is not the case.

Uh, our CoreX Library includes an extended UCS-4 to UTF-8 mapping that's
32-bit transparent. Existence proof.
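
For illustration only, here is a minimal sketch of the standard (not extended) UTF-8 encoding of a single code point, limited to U+10FFFF. It is not the CoreX code, which is not shown in this thread, and the function name to_utf8 is an assumption made for the example.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one code point as standard UTF-8.
// Returns an empty string for values above U+10FFFF; surrogate
// code points are not rejected in this simplified sketch.
std::string to_utf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {                        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {              // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {            // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

A mapping of this kind shows that some wide-to-narrow encoding can exist; it does not by itself say that the operating system will interpret the resulting bytes as the intended file name.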

>                       In Windows, there may not be any multibyte set
> that is appropriate, in some cases there is more than one.   You just
> can't assume there is a 1-to-1 mapping between the a single wchar_t
> representation and a single NTMBS.

And *that's* the problem. Implementors (and users) need some clear
guidance on what the correspondence might be between narrow and wide
names, in the presence of parallel signatures. It's not that there might
not be a way -- there always is. It's that there might be more than one
way, and no clear rule to choose.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com








Author: ron@sensor.com ("Ron Natalie")
Date: Mon, 7 Oct 2002 23:12:19 +0000 (UTC)
"Randy Maddox" <rmaddox@isicns.com> wrote in message news:8c8b368d.0210071008.6c67de6f@posting.google.com...
> ron@sensor.com ("Ron Natalie") wrote in message news:<sgmn9.11631$Jw5.6647@fe04>...
> > Once again both internal to our company and in comp.lang.c++ the ugly
> > fact that C++ does not completely support character sets is rearing it's
> > head.
> >
>
> Once again, indeed.  See also the thread
> "Internationalization/localization support in stdlib" where similar
> issues were brought up, and severely hammered.
>
I read that thread.  It seems mostly griping about exception::what(), for which I am
not convinced there is an overwhelming need to provide some alternate
scheme for wchar_t (and I don't have any better answers than the rest of
you as to how that would be implemented).  I believe your comments on
fstream echo mine and they were right then and they are right now.

I see no reason that these issues are necessarily linked.
It's a royal pain in the ass that I'm rewriting a perfectly compliant library
implementation on Windows to get around the assumption the standard
makes.

Among the "objections" to your fstream proposals:

   In the standard, this will be "implementation defined", but we don't
   want to add anything "implementation defined" to the standard without
   a fairly good idea of what we should expect from a quality
   implementation in a specific context

The answer is that a quality implementation does what is right for it.   If there
is a mapping from the wide string to a narrow one, then it can do the conversion,
otherwise it has no choice but to return an error.   As you state, this is NO
different than feeding inappropriate characters (say things like slashes or
characters > 128) to existing implementations.   If the implementation can't
represent the given string (wide or narrow) as a filename, then it has to
return an error.
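
A minimal sketch of those semantics for a platform with only narrow file names, written as a free function because the member overload does not exist in the standard. The name open_wide is hypothetical, and the conversion uses std::wcstombs under whatever C locale is current, which is only one possible choice of mapping.

```cpp
#include <cstdlib>
#include <cwchar>
#include <fstream>
#include <vector>

// Hypothetical stand-in for the proposed
// basic_ifstream::open(const wchar_t*, ios_base::openmode).
bool open_wide(std::ifstream& in, const wchar_t* name,
               std::ios_base::openmode mode = std::ios_base::in)
{
    // Worst case: every wide character needs MB_CUR_MAX bytes.
    std::vector<char> buf(std::wcslen(name) * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), name, buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        // No narrow spelling of this name exists: report an error,
        // exactly as for any other name that cannot be opened.
        in.setstate(std::ios_base::failbit);
        return false;
    }
    in.open(buf.data(), mode);      // sets failbit itself on failure
    return in.is_open();
}
```

On a platform whose file system takes wide names natively, an implementation would presumably skip the conversion and pass the wide string straight to the operating system.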

Pete Becker's suggestion:

  You do what C++ programmers do: you write code. If adding a constructor
  to fstream, say, is too daunting then just do the work somewhere else:
  write a function to convert wide character file names to byte names.

Doesn't work.   You can't do that in all cases (and he should know, he wrote the
library on the platform that suffers most from this).    The further assertion from
Allan W:

   Presumably, data storage (such as a file) can hold data which does not
   belong to the implementation's basic character set. But if the file system
   understands 16-bit characters, then (IMHO) the "implementation's basic
  character set" should also be 16-bit characters.

doesn't hold up well, due to the unfortunate double duty that the char type
plays as both the basic character type and as the base addressable unit.
If you switch char to be 16 bits, you lose the ability to address 8 bit storage.
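
The double duty is visible in the language itself: sizeof(char) is 1 by definition, so char is always the unit in which every other object size is counted. A small sketch (the two function names are made up for the example):

```cpp
#include <climits>
#include <cstddef>

// sizeof(char) is 1 by definition, so widening char widens the
// smallest addressable unit along with it.
static_assert(sizeof(char) == 1, "char is the unit of sizeof");

// Bits in the smallest addressable unit of this implementation.
constexpr int bits_per_char() { return CHAR_BIT; }

// How many addressable units one wchar_t occupies here.
constexpr std::size_t chars_per_wchar() { return sizeof(wchar_t); }
```

An implementation with a 16-bit char would report CHAR_BIT as 16 and would have no object type smaller than that.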

I also decry the facetious remarks (especially on the part of Mr. Becker who
should have known better having provided the library implementation on
the system in question) that no system required such a feature.

This should be a simple addition to the language.    The semantics are
simple and in line with the current behavior of the char-based file names.














Author: ron@sensor.com ("Ron Natalie")
Date: Mon, 7 Oct 2002 23:12:24 +0000 (UTC)
""Bo Persson"" <bop2@telia.com> wrote in message news:g6no9.721$hV3.29062@newsb.telia.net...

> I think the message from the committee is that they already *have*
> considered the idea, and rejected it. To have them reconsider, someone has
> to show how it can be of general use. Specifically, how it is to be
> implemented for operating systems that do not support wide character file
> names.

It's simple: if the file can be opened with a wchar_t based name, it succeeds.
If not, it fails.   It's no different now than if you are trying to use char based names
on a file system that uses wide characters in its names.







Author: ron@sensor.com ("Ron Natalie")
Date: Mon, 7 Oct 2002 23:13:06 +0000 (UTC)
""P.J. Plauger"" <pjp@dinkumware.com> wrote in message news:3da205f1$0$23004$4c41069e@reader1.ash.ops.us.uu.net...
> ""Ron Natalie"" <ron@sensor.com> wrote in message news:kYfo9.207802$F21.72447@fe02...
>
> > "Andy Sawyer" <andys@evo6.com> wrote in message news:n0ptc2fb.fsf@ender.evo6.com...
> >
> > > can convey all of the necessary information. (the argv array is an
> > > array of pointers to NTMBS - which can use pretty much any character
> > > encoding the implementation desires).
> >
> > I strongly disagree.  The general assumtion that a there exist a unique
> > (or in some cases ANY) multibyte representation that fully represents
> > the wide character set is an assumption that was made repeatedly
> > in the standard that does not hold up in practice.   It was only BARELY
> > true for the UNIX implementations that essentially were just international
> > character sets shoe-horned into the traditional 8 bit implementation.
> >
> > This is not the case.
>
> Uh, our CoreX Library includes an extended UCS-4 to UTF-8 mapping that's
> 32-bit transparent. Existence proof.

This is a non sequitur.   Showing one case doesn't prove that all implementations
exhibit that behavior.


> And *that's* the problem. Implementors (and users) need some clear
> guidance on what the correspondence might be between narrow and wide
> names, in the presence of parallel signatures. It's not that there might
> not be a way -- there always is. It's that there might be more than one
> way, and no clear rule to choose.

Not having a way to resolve the ambiguity means that there exists no
correspondence.

However, as you should know, on NT-derived Windows systems, if you want
wide character file names you must use the wide character CreateFile.  While there
is also a narrow CreateFile, it doesn't support MBCS of any form.   There in fact
exists NO SUCH CONVERSION.   While an implementation theoretically could
"invent" an arbitrary wide-to-narrow conversion and then unfold it later on, this seems
to be a lot of headstanding.   You couldn't necessarily just use UTF-8 for this (if the
characters with the high bits were perfectly legal characters in this particular
implementation).   If you did that, then legal char based strings would have to
be processed by some algorithm to distinguish them from the multibyte input.
This now breaks the behavior of the existing standard (in that such a check would
be necessary).

The only clear way to solve this problem without breaking existing code is to
add the suggested overloads.
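
The ambiguity can be made concrete with a small sketch. The function name looks_like_utf8 is hypothetical, and the check is deliberately simplified: it tests only the lead/continuation byte structure, not overlong forms or surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical check: does this narrow string have the byte
// structure of UTF-8 (lead byte followed by continuation bytes)?
bool looks_like_utf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;                       // continuation bytes expected
        if (c < 0x80)                extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;                       // stray continuation or bad lead
        if (extra > s.size() - i - 1)
            return false;                        // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char t = static_cast<unsigned char>(s[i + k]);
            if ((t & 0xC0) != 0x80)
                return false;                    // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

The two bytes C3 A9 pass this check and would decode as U+00E9, yet the same two bytes are also a legal two-character name in an 8-bit code page such as Latin-1. The check alone cannot tell which one the caller meant, which is why a convention layered on the existing char interface changes the meaning of names that are valid today.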









Author: NOSPAMsjhowe@dial.pipex.com ("Stephen Howe")
Date: Mon, 7 Oct 2002 23:13:20 +0000 (UTC)
> I will parse it for you. Various people who answered previously mentioned
> that the only way which would exist for wide character filename support to
> be taken seriously as a proposal is if an implementation for such support
> were done and shown to the C++ committee. I still have no idea of the formal
> way in which a proposal is made to the C++ committee for changes to the C++
> language or C++ standard libraries, and gathered from the previous
> discussion that only implementations of proposed changes, and not just
> ideas, will somehow be considered.

Which means, Ron, that if the group that represents STLPort accepts it as a logical
extension then it _might_ be added to the next standard.

It would be worthwhile finding out whether the group that represents STLPort is
interested, as they are platform-neutral and compiler-neutral.

Stephen Howe







Author: eldiener@earthlink.net ("Edward Diener")
Date: Mon, 7 Oct 2002 23:48:57 +0000 (UTC)
"Beman Dawes" <bdawes@acm.org> wrote in message
news:70fa0367.0210061718.3b8747e8@posting.google.com...
> ron@sensor.com ("Ron Natalie") wrote in message news:<sgmn9.11631$Jw5.6647@fe04>...
>
> > I therefore suggest the following changes:
> >
> > Add the following members to parallel the existing const char* arg'd versions
> >
> > 27.18.1.3    basic_filebuf<charT,traits>* open(const wchar_t* s, ios_base::openmode mode);
> > 27.18.1.6    explicit basic_ifstream(const wchar_t* s, ios_base::openmode mode = ios_base::in);
> > 27.8.1.7   void open(const wchar_t* s, ios_base::openmode mode = ios_base::in);
> > 27.8.1.9   explicit basic_ofstream(const wchar_t* s, ios_base::openmode mode = ios_base::out);
> > 27.8.1.10  void open(const char* s, ios_base::openmode mode = ios_base::out);
> > 27.8.1.11  explicit basic_fstream(const wchar_t* s, ios_base::openmode mode = ios_base::in|ios_base::out);
> > 27.8.1.11  void open(const wchar_t* s, ios_base::openmode mode = ios_base::in|ios_base::out);
>
> With what semantics on operating systems which do not support wchar_t
> filenames?
>
> Based on past committee discussions, numerous members of the standards
> committee's library working group would like to provide for wchar_t
> filenames. But not a single LWG member has been willing to speak in
> favor of doing so without explicit semantics.
>
> "Implementation defined" won't fly; that's just an "illusion of
> portability" without any underlying reality. The implementors say they
> have no idea what semantics they would be defining.

There is no portability for narrow character filenames, nor should there be
any. A given string of chars on one OS may mean something entirely different
as a filename than the same string on another OS. The C++ standard says
nothing that I can see about the characters of a filename or what
constitutes a filename, because it would be an impossible task to tell
various OSs that they must conform in their filenames to some C++
specification.

The same should go for wide character filenames. It has to be implementation
defined because portability for any file names is just an illusion anyway,
whether characters or wide characters.

The only semantics that need be defined is that the wide character filename
is a string of wide characters. If you ask, what character set, my answer is
that it is an OS issue and not a C++ issue. On Windows it would normally be
the character set of the Windows user locale, on some other OS it may be
something entirely different.

As I see it, the whole point of supporting wide character filenames is not
to specify what the wide characters actually mean in all C++ implementations
but to allow various implementations to use the wide character filename
constructors and member functions to implement what it deems to be the most
normal implementation for a given OS. If there is no wide character support
on a given C++ implementation/OS, then the attempt to open such a wide
character file fails in the same way as opening a non-existent char filename
fails.

I personally wouldn't mind if the C++ committee attempted to find some
lowest common denominator for wide character filenames as part of the
semantics, but I don't see any for char filenames other than a string of
characters. Yet I don't understand rejecting the issue entirely because no
semantics have been provided. If there are semantics for char filenames, I
would be glad if you or anybody would point them out to me.






Author: francis.glassborow@ntlworld.com (Francis Glassborow)
Date: Tue, 8 Oct 2002 12:34:09 +0000 (UTC)
In article <ONjo9.13879$OB5.1397398@newsread2.prod.itd.earthlink.net>,
Edward Diener <eldiener@earthlink.net> writes
>I will parse it for you. Various people who answered previously mentioned
>that the only way which would exist for wide character filename support to
>be taken seriously as a proposal is if an implementation for such support
>were done and shown to the C++ committee. I still have no idea of the formal
>way in which a proposal is made to the C++ committee for changes to the C++
>language or C++ standard libraries, and gathered from the previous
>discussion that only implementations of proposed changes, and not just
>ideas, will somehow be considered.

No, we are quite willing to claim credit for implementing excellent
ideas and campaigning for them. What we are not willing to accept is the
principle that someone should do something about an idea. If you have an
idea and really want to see it discussed you must do the spade work. I
have quite enough good ideas of my own without being asked to work up
proposals based on someone else's sketchy suggestion.


--
Francis Glassborow      ACCU
64 Southfield Rd
Oxford OX4 1PA          +44(0)1865 246490
All opinions are mine and do not represent those of any organisation






Author: rmaddox@isicns.com (Randy Maddox)
Date: Tue, 8 Oct 2002 16:57:23 +0000 (UTC)
ron@sensor.com ("Ron Natalie") wrote in message news:<XQno9.3253$cG.781@fe04>...
> "Randy Maddox" <rmaddox@isicns.com> wrote in message news:8c8b368d.0210071008.6c67de6f@posting.google.com...
> > ron@sensor.com ("Ron Natalie") wrote in message news:<sgmn9.11631$Jw5.6647@fe04>...
> > > Once again both internal to our company and in comp.lang.c++ the ugly
> > > fact that C++ does not completely support character sets is rearing it's
> > > head.
> > >
> The answer is that a quality implementation does what is right for it.   If there
> is a mapping from wide string to a narrow one, then it can do the conversion,
> otherwise it has no choice but to return an error.   As you state, this is NO
> different than feeding inappropriate characters (say things like slashes or
> characters > 128) to existing implementations.   If the implementation can't
> represent the given string (wide or narrow) as a filename, then it has to
> return error.
>

I agree with you completely, support your position 100%, and am
completely at a loss to understand why this is even an issue.

What constitutes a valid file name is defined by the OS and not by
C++.  What happens when we try to open a file using an invalid file
name is that the open fails.  The reasons why any particular attempt
to open a file might fail, and there are many, are defined by the OS
and not by C++.

These statements are equally true whether narrow or wide characters
are involved, and the objections raised to date make no attempt to
assert otherwise.  All I have seen are contrived situations that
purport to be reasons why this is a bad idea, but that have really
appeared, IMHO, to be either bad programming or inconsistent
assumptions.

Why is the C++ community so strongly against providing better support
for different character sets?  It seems to me that the committee put
in a lot of work and went a long way toward this goal, only to falter
now that suggestions are being made to clean up the last few places
where single byte character use is hardwired into C++.

So, if anyone out there has some good reasons why this is a bad idea,
then I for one would dearly love to hear about them.

Randy.






Author: kanze@gabi-soft.de (James Kanze)
Date: Tue, 8 Oct 2002 17:12:26 +0000 (UTC)
Raw View
bop2@telia.com ("Bo Persson") wrote in message
news:<g6no9.721$hV3.29062@newsb.telia.net>...
> "Edward Diener" <eldiener@earthlink.net> wrote in message
> news:ONjo9.13879$OB5.1397398@newsread2.prod.itd.earthlink.net...
> > "Ron Natalie" <ron@sensor.com> wrote in message
> > news:%mjo9.1100$cG.109@fe04...
> > > "Edward Diener" <eldiener@earthlink.net> wrote in message
> > > news:CoOn9.10944$OB5.1113866@newsread2.prod.itd.earthlink.net...

> > > > The previous answer to all the points cited was essentially that
> > > > only a used C++ standard library implementation which supports
> > > > these additional wide character implementations will further the
> > > > cause of a proposal for including such additions to the library.

> > > I can't parse what the above says, but we've already extended our
> > > streambufs to have the wchar_t overloads. It's not all that
> > > complicated, just makes our code not portable.

I'm not too sure what the "not portable" is meant to refer to. A library
implementation which added an open( wchar_t const* ...) to filebuf
should not break any existing code -- I don't know whether the standard
formally allows such extensions, but they certainly don't cause any
problems in practice.  And of course, there is no way to implement
filebuf portably anyway (except maybe by forwarding to stdio.h, but that
just moves the problem down one level).

The only real problem is that code which uses your extension won't be
portable.
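
As a rough illustration of the kind of extension discussed above, here is
a minimal sketch of a non-member helper that opens a stream from a wide
name. The names narrow_file_name and open_wide are hypothetical, not part
of any standard or vendor library. The sketch assumes the conversion is
done with std::wcstombs under the current C locale, which is only one of
several possible mappings, and it reports an unrepresentable name as an
ordinary open failure.

```cpp
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not a standard function): converts a wide file name
// to a narrow one using the current C locale. Returns false if some
// character has no narrow representation in that locale.
bool narrow_file_name(const std::wstring& wide, std::string& narrow)
{
    // Worst case: MB_CUR_MAX bytes per wide character, plus the terminator.
    std::vector<char> buffer(wide.size() * MB_CUR_MAX + 1);
    std::size_t written =
        std::wcstombs(buffer.data(), wide.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return false;  // no mapping for at least one character
    }
    narrow.assign(buffer.data(), written);
    return true;
}

// Hypothetical wrapper: opens the stream only if the name could be
// converted; otherwise the stream is put into the failed state.
bool open_wide(std::ifstream& in, const std::wstring& wide_name)
{
    std::string narrow;
    if (!narrow_file_name(wide_name, narrow)) {
        in.setstate(std::ios_base::failbit);
        return false;
    }
    in.open(narrow.c_str());
    return in.is_open();
}
```

Code calling open_wide is exactly as non-portable as code calling a
vendor's wchar_t overload; the sketch only shows that the conversion
policy has to be chosen by someone.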

> > I will parse it for you. Various people who answered previously
> > mentioned that the only way which would exist for wide character
> > filename support to be taken seriously as a proposal is if an
> > implementation for such support were done and shown to the C++
> > committee. I still have no idea of the formal way in which a
> > proposal is made to the C++ committee for changes to the C++
> > language or C++ standard libraries, and gathered from the previous
> > discussion that only implementations of proposed changes, and not
> > just ideas, will somehow be considered.

> I think the message from the committee is that they already *have*
> considered the idea, and rejected it.

The message from the committee was that they vaguely considered the
idea, but since they really didn't know what the semantics should be,
the time wasn't yet ripe.

> To have them reconsider, someone has to show how it can be of general
> use.

To have them consider any idea, it must be shown generally useful and
implementable.  It must also be precise, which is where the problems
with wchar_t filenames lie.  In the previous thread, a number of people
wanted wchar_t filenames; all of them had a very definite idea as to the
desired semantics, and all of the definite ideas were different.

> Specifically, how it is to be implemented for operating systems that
> do not support wide character file names.

That is the big question.  I don't think it is a killer problem, but to
date, no one has really decided to address it.

Given that there are a number of possible solutions (of which several
are probably generally useful), it would be nice if there were some
experience on which to base a choice.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: kanze@gabi-soft.de (James Kanze)
Date: Tue, 8 Oct 2002 17:12:51 +0000 (UTC)
Raw View
ron@sensor.com ("Ron Natalie") wrote in message
news:<NXno9.3334$cG.719@fe04>...
> "Bo Persson" <bop2@telia.com> wrote in message
> news:g6no9.721$hV3.29062@newsb.telia.net...
> > I think the message from the committee is that they already *have*
> > considered the idea, and rejected it. To have them reconsider,
> > someone has to show how it can be of general use. Specifically, how
> > it is to be implemented for operating systems that do not support
> > wide character file names.

> It's simple: if the file can be opened with a wchar_t based name, it
> succeeds. If not, it fails. It's no different now than if you are
> trying to use char based names on a file system that uses wide
> characters in its names.

Not having any experience with a system using wide character file names,
it's hard to say.  Still:

  - If the system *only* uses wide character filenames, I would expect
    it to accept narrow character filenames anyway, since these are the
    only ones which are standard at present.

  - If I specify a filename "abc", I expect the system to find the file
    with that name (supposing it exists), even if the file has a wide
    character filename.

  - I would certainly expect that "abc" and L"abc" refer to the same
    file.  But I could be wrong in my expectations.

  - If wide character filenames are supported, I would expect to be able
    to use them to specify filenames on a system like Plan 9, where the
    actual filenames are in UTF-8.  (I believe that this is also the
    direction that Linux is going.)

The question is, of course, what it means to say that a file can be
opened with wchar_t based names.  If the proposal is just to add the
necessary functions, but they will fail systematically on most
systems (which don't support wide character filenames), then I fail to
see how that would help portability.
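
One possible reading of the expectations listed above, for a system
whose filenames are UTF-8, can be sketched as follows. The function
to_utf8 is a hypothetical helper, and the sketch assumes that each
wchar_t holds one Unicode code point (no surrogate pairs, no
validation). Under that assumption "abc" and L"abc" produce the same
bytes, and so name the same file.

```cpp
#include <string>

// Hypothetical helper: encodes each wchar_t value as UTF-8.
// Assumption: every wchar_t is a single Unicode code point.
std::string to_utf8(const std::wstring& wide)
{
    std::string out;
    for (std::wstring::size_type i = 0; i < wide.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(wide[i]);
        if (cp < 0x80) {
            // ASCII maps to itself, one byte.
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // Two-byte sequence.
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // Three-byte sequence.
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // Four-byte sequence.
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

This is only one candidate semantics. A system with a different
filename encoding would need a different mapping, which is the point
of the question.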

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: pjp@dinkumware.com ("P.J. Plauger")
Date: Tue, 8 Oct 2002 17:21:13 +0000 (UTC)
Raw View
"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:8c8b368d.0210080458.443bf6e@posting.google.com...

> Why is the C++ community so strongly against providing better support
> for different character sets?

Easy answer -- it isn't.

>                               It seems to me that the committee put
> in a lot of work and went a long way toward this goal, only to falter
> now that suggestions are being made to clean up the last few places
> where single byte character use is hardwired into C++.

Faltering is in the eye of the beholder.

> So, if anyone out there has some good reasons why this is a bad idea,
> then I for one would dearly love to hear about them.

You've heard them. You just can't hear them.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com








Author: kanze@gabi-soft.de (James Kanze)
Date: Tue, 8 Oct 2002 17:38:35 +0000 (UTC)
Raw View
ron@sensor.com ("Ron Natalie") wrote in message
news:<XQno9.3253$cG.781@fe04>...
> "Randy Maddox" <rmaddox@isicns.com> wrote in message
> news:8c8b368d.0210071008.6c67de6f@posting.google.com...
> > ron@sensor.com ("Ron Natalie") wrote in message
> > news:<sgmn9.11631$Jw5.6647@fe04>...
> > > Once again both internal to our company and in comp.lang.c++ the
> > > ugly fact that C++ does not completely support character sets is
> > > rearing its head.

> > Once again, indeed. See also the thread
> > "Internationalization/localization support in stdlib" where similar
> > issues were brought up, and severely hammered.

> I read that thread. It seems mostly griping about exception::what(),
> for which I am not convinced there is an overwhelming need to provide
> some alternate scheme for wchar_t (and I don't have any better answers
> than the rest of you as to how that would be implemented). I believe
> your comments on fstream echo mine and they were right then and they
> are right now.

> I see no reason that these issues are necessarily linked.

Agreed.  It should be possible to discuss them separately.

> It's a royal pain in the ass that I'm rewriting a perfectly compliant
> library implementation on Windows to get around the assumption the
> standard makes.

> The "objections" to your fstream proposals are:

>    In the standard, this will be "implementation defined", but we
>    don't want to add anything "implementation defined" to the standard
>    without a fairly good idea of what we should expect from a quality
>    implementation in a specific context

That is, I believe, a fair summary.  (It's my position, at least:-).)

> The answer is that a quality implementation does what is right for it.

The problem is that many implementors don't know what is right for them.
I'm not an implementor, but I have no idea what is right for Unix, for
example. So they don't want to have to implement it.

> If there is a mapping from wide string to a narrow one, then it can do
> the conversion, otherwise it has no choice but to return an error.

The problem is that there are potentially many mappings.  And
implementors don't (yet) know how to choose.

> As you state, this is NO different than feeding inappropriate
> characters (say things like slashes or characters > 128) to existing
> implementations.

I feed characters > 128 to my implementation all of the time.  That may
be why I'm sensitized to the problem; the results aren't always what
one might expect.

> If the implementation can't represent the given string (wide or
> narrow) as a filename, then it has to return an error.

If the system has a very limited idea of what a filename can be (say,
like on the old PDP-11), I don't think that the problem is very complex.
The real problem is with systems like Unix, which allow almost anything
in a filename, but interpret it according to context.  Create a file
using accented characters on Solaris.  Then try invoking ls while
changing the environment variable LC_ALL, and watch the filename
change.  The problem isn't trivial.  (That doesn't mean that it doesn't
have a usable answer.)
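
To make the effect concrete, here is a small sketch of how the same
stored bytes can read as different names. It is an illustration under
assumed encodings (ISO 8859-1 and UTF-8), not a description of what ls
does on Solaris. The functions decode_latin1 and decode_utf8 are
hypothetical helpers, and the UTF-8 decoder assumes well-formed input
and does no validation.

```cpp
#include <string>
#include <vector>

// Interpret each byte as one ISO 8859-1 character (byte value == code point).
std::vector<unsigned long> decode_latin1(const std::string& bytes)
{
    std::vector<unsigned long> out;
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        out.push_back(static_cast<unsigned char>(bytes[i]));
    }
    return out;
}

// Interpret the bytes as UTF-8. Minimal decoder: assumes well-formed input.
std::vector<unsigned long> decode_utf8(const std::string& bytes)
{
    std::vector<unsigned long> out;
    std::string::size_type i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        unsigned long cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        // Fold in six payload bits from each continuation byte.
        for (; extra > 0 && i < bytes.size(); --extra, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

For the two bytes 0xC3 0xA9, the UTF-8 reading gives the single
character U+00E9 (e with acute accent), while the ISO 8859-1 reading
gives two characters, U+00C3 and U+00A9. The bytes on disk never change;
only the interpretation does.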

> Pete Becker's suggestion:

>   You do what C++ programmers do: you write code. If adding a
>   constructor to fstream, say, is too daunting then just do the work
>   somewhere else: write a function to convert wide character file
>   names to byte names.

> Doesn't work.  You can't do that in all cases (and he should know, he
> wrote the library on the platform that suffers most from this).  The
> further assertion from

> Allan W:

>    Presumably, data storage (such as a file) can hold data which does
>    not belong to the implementation's basic character set. But if the
>    file system understands 16-bit characters, then (IMHO) the
>    "implementation's basic character set" should also be 16-bit
>    characters.

> doesn't hold up well due to the unfortunate double duty that the char
> type plays as both the basic character type and as the base
> addressable unit.  If you switch char to be 16 bits, you lose the
> ability to address 8 bit storage.

That sounds like a very legitimate core issue:-).  If you can find a
solution to it which doesn't break existing code, I suspect that you
will make a lot of people happy.
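
The double duty mentioned in the quoted text can be seen directly in the
language rules. The following is a small sketch, with hypothetical
function names, showing that char is by definition the unit in which
sizeof measures everything, so its width is also the granularity of
addressing.

```cpp
#include <climits>
#include <cstddef>

// sizeof(char) is 1 by definition, so no object can be smaller than a char.
static_assert(sizeof(char) == 1, "char is the unit of sizeof");

// Hypothetical helper: bits in the smallest addressable unit.
// If an implementation made char 16 bits wide, this would be 16 and
// 8-bit storage could no longer be addressed directly.
inline int bits_per_addressable_unit()
{
    return CHAR_BIT;
}

// Hypothetical helper: how many addressable units one wchar_t occupies.
inline std::size_t units_per_wide_char()
{
    return sizeof(wchar_t);
}
```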

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
