Thread

Topic: Internationalization/localization support in stdlib

Author: Daniel Miller <daniel.miller@tellabs.com>
Date: Tue, 6 Aug 2002 00:34:45 GMT

Randy Maddox wrote:

> "Ken Shaw" <ken@DIESPAMDIEcompinnovations.com> wrote in message news:<aibu89$228@dispatch.concentric.net>...
>
>>"Randy Maddox" <rmaddox@isicns.com> wrote in message
>>news:8c8b368d.0208010531.1c697dbf@posting.google.com...
>>
>>>However, when you look at that statement a bit more deeply, it becomes
>>>clear that we are talking about a mapping from 16-bit values to 8-bit
>>>values, i.e., mapping a set of 65,536 values onto a set of 256 values,
>>>which means, assuming even distribution, that each character in the
>>>smaller set would correspond to 256 characters in the larger set.  No
>>>way that can ever make sense, although the reverse mapping from the
>>>smaller set to the larger set does work just fine.
>>>
>>actually that is a simple matter of converting UTF-16 to UTF-8 if Unicode is
>>the character mapping of choice for 16 bit wchar_t and 8 bit char.

     Ken,
   I dispute that *variable-width* UTF-series encodings are some sort of
universally preferred encoding over fixed-width UCS-series encodings for
string manipulation.  It is generally preferred that text-processing
be permitted to have stateless random access to a string's entire
content without the stateful escape-character mechanisms imposed by the
UTF series encodings in Unicode.  The UCS series permit such stateless
random access to a string's entire content from beginning to end without
the need for parsing of escape characters from the beginning.  UCS-4's
over 4 billion graphemes (over 4000 million graphemes in British
culture or over 4 milliard graphemes in European/Chuquet culture) is
plenty to accommodate every grapheme ever known by every era of
humankind since the beginning of time (and if SETI@home is successful
someday, probably all of the writing systems of multiple species of
space aliens too).

   The UTF-series encodings may in fact be the preferred *compacted*
form when the string's content is not being manipulated piecemeal, such
as the name of a file in a filesystem.

   Also I dispute the implicit claim that the universally-preferred size
of wchar_t is 16 bits.  Note that g++ endows its wchar_t with a full 32
bits on most (if not all) platforms.  The popularity of g++ must be
reckoned with.

> OK.  Since it's so simple, let's just try a simple example.  Let us
> postulate two character sets, one much smaller than the other.  The
> first set contains the characters {a, b, c}, while the second contains
> the characters {A, B, C, D, E, F}.
>
> How do we map the larger set onto the smaller set?  Well, A -> a, B ->
> b, and C -> c seems obvious enough, but what then do we do with the
> remaining characters in the larger set?  Well, the easiest thing would
> be just to loop over the smaller set again so that D -> a, E -> b, and
> F -> c.  OK.  We have a defined mapping.  That was simple.  Now let's
> try to map some strings:
>
>   "AD" -> "aa"  - That was simple.  Maybe this isn't as difficult as I
> thought.
>
>   "DA" -> "aa"  - Ooops!
>
> This loss of information will always occur when mapping from a larger
> set to a smaller set, and for UTF-16 to UTF-8 we are talking not about
> a factor of 2, as with this simple example, but a factor of 256.
>
> So tell me again how simple this is.

   Randy,
   Your line of reasoning here mistakenly ignores any escape/shift
character mechanism.  There is, however, a different downside of
larger-character-set to smaller-character-set encodings that I will
offer in its place: annoying stateful variable-width encoding instead of
convenient stateless fixed-width encoding.

   Your line of reasoning here applies to trying to map UCS-4 to UCS-2
which are fixed-width character set encodings without Unicode-provided
escape/shift mechanisms.  UTF-8 and UTF-16 do have escape mechanisms
defined to map larger planes into smaller-sized character-sets at the
price of giving up on fixed-width characters.

   So in your examples, we could mnemonically pick c to be the esCape
character.  ccc would represent an instance of c (i.e., c outside of the
escape-convention encoding, as \\ in UNIX culture represents a single
instance of the ASCII \ grapheme).  Thus

   A -> a
   B -> b
   C -> ccc
   D -> ca
   E -> cb
   F -> cca

   The downside with this encoding (and thus with the UTF-series
encodings) is that knowing the offset of the beginning of a character
requires a stateful analysis of the string from its beginning.  For
example, quick, tell me how many graphemes are in the string ADBCFECF.
Eight.  Quick, tell me how many graphemes are in the string
acabcccccacbccccca.  Eight again, because it is a variable-width
encoding of the same string, but the answer was not so easily determined
with variable-width encodings.
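
The toy encoding above can be decoded mechanically. What follows is only a minimal sketch, assuming the six-symbol mapping given in the post; the function names decode_toy and count_graphemes are made up for illustration. It shows that the mapping loses no information, and that counting symbols means walking the whole string.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode the toy escape encoding from the post:
//   a -> A, b -> B, ca -> D, cb -> E, ccc -> C, cca -> F
// decode_toy is a hypothetical helper name, not part of any library.
std::string decode_toy(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    // Fetch the next code unit, failing on a truncated escape sequence.
    auto next = [&]() -> char {
        if (i >= in.size()) {
            throw std::invalid_argument("truncated escape sequence");
        }
        return in[i++];
    };
    while (i < in.size()) {
        const char u = next();
        if (u == 'a') {
            out += 'A';
        } else if (u == 'b') {
            out += 'B';
        } else if (u == 'c') {  // 'c' is the escape character
            const char v = next();
            if (v == 'a') {
                out += 'D';
            } else if (v == 'b') {
                out += 'E';
            } else if (v == 'c') {
                const char w = next();
                if (w == 'c') {
                    out += 'C';
                } else if (w == 'a') {
                    out += 'F';
                } else {
                    throw std::invalid_argument("bad escape sequence");
                }
            } else {
                throw std::invalid_argument("bad escape sequence");
            }
        } else {
            throw std::invalid_argument("unknown code unit");
        }
    }
    return out;
}

// Counting symbols requires a full left-to-right pass over the input.
std::size_t count_graphemes(const std::string& encoded) {
    return decode_toy(encoded).size();
}
```

Because no code word is a prefix of another, the decoder never has to backtrack, but it still cannot jump to the n-th symbol without reading everything before it.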

   It is for this reason that many of us prefer manipulating strings in
UCS-4 (or even UCS-2 if we care only about characters currently in use
in languages whose writing system is mature/nonexperimental) to gain the
UCS-series' fixed-width properties, so that accessing graphemes in the
interior of a string is of UCS's fast constant time instead of UTF's
slow linear time.
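
The constant-time versus linear-time contrast can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the input is assumed to be well-formed UTF-8, the names utf8_offset_of and ucs4_at are hypothetical, and the code counts code points, which is what the post loosely calls graphemes.

```cpp
#include <cstddef>
#include <string>

// Byte offset of the n-th code point (0-based) in well-formed UTF-8.
// Returns std::string::npos if the string holds fewer code points.
// Linear time: every byte before the answer has to be inspected.
std::size_t utf8_offset_of(const std::string& s, std::size_t n) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Continuation bytes look like 10xxxxxx; any other byte starts a code point.
        const bool starts_code_point =
            (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80;
        if (starts_code_point) {
            if (seen == n) {
                return i;
            }
            ++seen;
        }
    }
    return std::string::npos;
}

// With a fixed-width 32-bit representation the n-th code point is a plain index.
// The caller must ensure n < s.size().
char32_t ucs4_at(const std::u32string& s, std::size_t n) {
    return s[n];
}
```

For the three code points a, U+00E9, z, the UTF-8 form occupies four bytes, so the third code point sits at byte offset 3, while the 32-bit form finds it at index 2 directly.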

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "James Russell Kuyper Jr." <kuyper@wizard.net>
Date: Sat, 10 Aug 2002 11:35:12 GMT

Randy Maddox wrote:
>
> "James Russell Kuyper Jr." <kuyper@wizard.net> wrote in message news:<3D51B95C.A60D5368@wizard.net>...
> > ....
> > The key problem is that I could not reasonably consider it abuse, if the
> > standard itself provides the class template they would be using. I'd
> > consider it a perfectly ordinary way of using something provided by the
> > standard. As such, I can't condemn the users for using it, but only the
> > people who decided that the standard should provide it (much the same
> > way that I feel about gets()).
> >
>
> But if you could not reasonably consider it abuse to do something
> which the standard allows, then do you now consider it abuse for
> someone to throw an exception not derived from std::exception?  This
> is no violation of the standard.

No, I would not consider it abuse; I'd consider it bad design. I'd
continue to consider it bad design to use any exception class not
derived from std::exception, even if your basic_exception<> template
class is approved. That's the problem: the standard would then be
implicitly endorsing bad design, by containing a construct whose only
possible use constitutes bad design.

---

Author: "James Russell Kuyper Jr." <kuyper@wizard.net>
Date: Sat, 10 Aug 2002 11:47:25 GMT

James Kanze wrote:
>
> "James Russell Kuyper Jr." <kuyper@wizard.net> wrote in message
> news:<3D530686.D08ABFF0@wizard.net>...
...
> > I'm in (almost) perfect agreement with that; portable code certainly
> > can only specialize basic_string<> or the iostream templates on a
> > user-defined type, and must specialize char_traits<> on that same type
> > if it's to make any use of them.
>
> > However, I believe that non-portable code is allowed to make use of
> > implementation-defined instantiations for types that are defined by
> > the standard or by the implementation, or to make use of an
> > implementation-defined generic definition for those templates. I also
> > believe that non-portable code may take advantage of
> > implementation-specific permission to define such instantiations
> > themselves.
>
> Certainly.  Non-portable code can do anything the author can get away
> with on his compiler :-).

But if we require that the compiler remain conforming, then there are a
great many things that the author could never get away with (at least,
not without a diagnostic). I believe that fully conforming
implementations can provide all of the features I described, without
triggering any mandatory diagnostics when using them. In any event, the
option of using a user-defined type, and explicitly defining
instantiations for all of those templates for that type, is completely
portable.

If my code (which is, ideally, completely portable) gets linked under
such an implementation to non-portable code written to use those
features, then (in some cases) I want my code to be able to catch any
exceptions that pass through it, coming from the non-portable code. I
can't do that; the best I can do is catch(std::exception&e), but what
I'd want to be able to write, if this proposal is approved, is
catch(std::basic_exception<charT>&e) for arbitrary charT, not just
'char' and 'wchar_t'. And that, of course, would not be possible.
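
To make the difficulty concrete, here is a minimal sketch of one possible shape for the proposed template. The thread never shows its definition, so the class below, its namespace sketch, and the helper classify are all assumptions made for illustration. The point it demonstrates is a language rule: a catch handler names one concrete type, so there is no way to write a single handler for basic_exception<charT> over all charT.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

namespace sketch {  // hypothetical; nothing in this namespace is in the standard library

template <class CharT>
class basic_exception {
public:
    explicit basic_exception(std::basic_string<CharT> msg) : msg_(std::move(msg)) {}
    virtual ~basic_exception() = default;
    virtual const CharT* what() const noexcept { return msg_.c_str(); }

private:
    std::basic_string<CharT> msg_;
};

}  // namespace sketch

// Stand-ins for third-party code, each throwing a different type.
inline void throws_narrow() { throw std::runtime_error("narrow failure"); }
inline void throws_wide() { throw sketch::basic_exception<wchar_t>(L"wide failure"); }
inline void throws_other() { throw sketch::basic_exception<char16_t>(u"other failure"); }

// Portable code has to list every instantiation it knows about. Anything
// else falls through to catch (...), where no message can be retrieved.
template <class F>
std::string classify(F f) {
    try {
        f();
    } catch (const std::exception&) {
        return "std::exception";
    } catch (const sketch::basic_exception<char>&) {
        return "basic_exception<char>";
    } catch (const sketch::basic_exception<wchar_t>&) {
        return "basic_exception<wchar_t>";
    } catch (...) {
        return "unknown";
    }
    return "no exception";
}
```

Here classify(throws_other) lands in catch (...), because the code was written without knowledge of the char16_t instantiation.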

---

Author: "Edward Diener" <eldiener@earthlink.net>
Date: Mon, 5 Aug 2002 04:41:31 GMT

"James Russell Kuyper Jr." <kuyper@wizard.net> wrote in message
news:3D4C35C4.CA29AF3D@wizard.net...
> Edward Diener wrote:
> ....
> > the future. If users then decided to create their own exceptions based on
> > other basic types, or their own user-defined types, that's up to them, but
> > that wouldn't be part of the C++ standard library.
>
> If they're specializing a template that's declared by the C++ standard
> library, they're using part of that library.

"Using part of the C++ standard library" and "being part of the C++ standard
library" are not the same thing. The difference is crucial. "Being part of
the C++ standard library" means that C++ implementations must follow the
guidelines of the C++ standard. "Using part of the C++ standard library"
means that a user is free to specialize or derive in his own way, and
hopefully do it intelligently. I am only concerned in my suggestion of
parallel exception hierarchies based on native character types with the
"being part of the C++ standard library" definition.

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Mon, 5 Aug 2002 16:20:39 GMT

Allan_W@my-dejanews.com (Allan W) wrote in message news:<23b84d65.0208021351.38bc35a7@posting.google.com>...
> rmaddox@isicns.com (Randy Maddox) wrote
> >
> > So I do indeed stand by my statement that since the implementation of
> > the standard exception classes is tied to the implementation of the
> > standard string classes, there is absolutely no danger of "infinite"
> > exception types.  Only the built-in types meet the requirements for
> > character types, and only the integer types really make any sense as
> > character types.
>
> There are certainly a lot more than that!

There really are not that many types that can be used as character
types.  Look at clause 3.9 of the Standard, paragraphs 1 through 4.
Only POD types can be used for character types.  And character types
are explicitly NOT allowed to require any of the following:

  User-defined ctor
  User-defined copy ctor
  User-defined assignment operator
  User-defined dtor

What does that leave us with?  Only those types that can be correctly
copied using memcpy(), i.e., only value types whose bit representation
contains all required information.  We're talking here only about
built-in types, or user-defined types composed solely of built-in
types.

Also, there is a required mapping between some integer type and a
given character type.  In particular, the following must hold for some
character type, charT, and a character in that set, ch:

  ch == char_traits<charT>::to_char_type(char_traits<charT>::to_int_type(ch))

Thus, any POD type that can take on more values than can be
represented in some integer type is not really useful in that only the
subset of mappable values may actually be used.  Practical reality
trumps theoretical concern.
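
The two properties discussed above can be checked directly for the built-in character types. This is a small sketch, assuming a C++11 or later compiler (the type traits used did not exist in 2002); round_trips and memcpy_copyable are hypothetical helper names.

```cpp
#include <string>
#include <type_traits>

// True if ch survives the trip through the traits' integer type,
// compared with the traits' own notion of equality.
template <class CharT>
bool round_trips(CharT ch) {
    using traits = std::char_traits<CharT>;
    return traits::eq(ch, traits::to_char_type(traits::to_int_type(ch)));
}

// True if objects of type T may be copied with memcpy(), which is the
// practical consequence of the POD requirement described above.
template <class T>
constexpr bool memcpy_copyable() {
    return std::is_trivially_copyable<T>::value;
}
```

A type such as std::string fails the second check, since it has a user-defined copy constructor and destructor.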

If the concern about "infinite" exception types is correct, then where
is the corresponding concern about "infinite" string types?  The
situation is exactly the same.  The string base class is templated on
its character type, identical to my proposal for the exception base
class.  Yet I hear not one word from anyone worried about "infinite"
string types.  Nor, despite the fact that the string classes have been
exposed to the possibility of "infinite" string types for several years,
do I see any string types other than string and wstring.  Why is that?
Perhaps this concern really is misplaced?

Randy.

---

Author: "James Russell Kuyper Jr." <kuyper@wizard.net>
Date: Mon, 5 Aug 2002 16:20:31 GMT

Edward Diener wrote:
>
> "James Russell Kuyper Jr." <kuyper@wizard.net> wrote in message
> news:3D4C35C4.CA29AF3D@wizard.net...
> > Edward Diener wrote:
> > ....
> > > > the future. If users then decided to create their own exceptions based on
> > > > other basic types, or their own user-defined types, that's up to them, but
> > > > that wouldn't be part of the C++ standard library.
> >
> > If they're specializing a template that's declared by the C++ standard
> > library, they're using part of that library.
>
> "Using part of the C++ standard library" and "being part of the C++ standard
> library" are not the same thing. The difference is crucial. "Being part of
> the C++ standard library" means that C++ implementations must follow the
> guidelines of the C++ standard. "Using part of the C++ standard library"
> means that a user is free to specialize or derive in his own way, and
> hopefully do it intelligently. ...

When specializing a standard library template for a user-defined type,
the user is still required to meet the standard's requirements for that
template.  I presume that there would be no special exemption from this
requirement for std::basic_exception<charT>. So the difference is not as
great as you suggest.

> ... I am only concerned in my suggestion of

I really don't care what you're concerned about. My concerns are about
the fact that I would be unable to catch std::basic_exception<charT>
for arbitrary charT meeting the character requirements. If it were true
that char and wchar_t were the only possibilities, I would only be
mildly disgruntled by this suggestion; I could replace a single catch
statement with a pair of catch statements, and it wouldn't be horribly
difficult to automate the process. However, that's not the case.

An implementation is currently free to provide a definition of
std::char_traits<> and std::basic_string<> that works in some reasonable
fashion for a wide variety of types, or has implementation-provided
specializations for various standard-defined or implementation-defined
types. A user is free to define any one of a literally infinite variety
of meaningfully different class types that meet all of the character
type requirements, and to specialize those templates for that type.
Unless std::basic_exception specifically disallows doing so (which would
break the analogy currently being made with basic_string), any of those
types could also be used to specialize std::basic_exception<charT>. And
I would, under some circumstances, have a corresponding need to be able
to catch them in code that knows nothing about that particular value of
charT. That is a need that I would be completely unable to satisfy.

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Mon, 5 Aug 2002 21:12:44 GMT

"Ken Shaw" <ken@DIESPAMDIEcompinnovations.com> wrote in message news:<aibu89$228@dispatch.concentric.net>...
> "Randy Maddox" <rmaddox@isicns.com> wrote in message
> news:8c8b368d.0208010531.1c697dbf@posting.google.com...
> > However, when you look at that statement a bit more deeply, it becomes
> > clear that we are talking about a mapping from 16-bit values to 8-bit
> > values, i.e., mapping a set of 65,536 values onto a set of 256 values,
> > which means, assuming even distribution, that each character in the
> > smaller set would correspond to 256 characters in the larger set.  No
> > way that can ever make sense, although the reverse mapping from the
> > smaller set to the larger set does work just fine.
>
> actually that is a simple matter of converting UTF-16 to UTF-8 if Unicode is
> the character mapping of choice for 16 bit wchar_t and 8 bit char.
>

OK.  Since it's so simple, let's just try a simple example.  Let us
postulate two character sets, one much smaller than the other.  The
first set contains the characters {a, b, c}, while the second contains
the characters {A, B, C, D, E, F}.

How do we map the larger set onto the smaller set?  Well, A -> a, B ->
b, and C -> c seems obvious enough, but what then do we do with the
remaining characters in the larger set?  Well, the easiest thing would
be just to loop over the smaller set again so that D -> a, E -> b, and
F -> c.  OK.  We have a defined mapping.  That was simple.  Now let's
try to map some strings:

  "AD" -> "aa"  - That was simple.  Maybe this isn't as difficult as I
thought.

  "DA" -> "aa"  - Ooops!

This loss of information will always occur when mapping from a larger
set to a smaller set, and for UTF-16 to UTF-8 we are talking not about
a factor of 2, as with this simple example, but a factor of 256.

So tell me again how simple this is.
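
The wrap-around mapping in this example is easy to write down. The sketch below assumes the input contains only the letters A through F, and fold is a made-up name. It reproduces the collision described above: two different source strings map to the same result, so the mapping cannot be inverted.

```cpp
#include <string>

// The wrap-around mapping from the example: A,D -> a; B,E -> b; C,F -> c.
// Assumes every input character is in the range 'A'..'F'.
std::string fold(const std::string& in) {
    std::string out;
    for (char ch : in) {
        out += static_cast<char>('a' + (ch - 'A') % 3);
    }
    return out;
}
```

Both fold("AD") and fold("DA") yield "aa". Note that this models a fixed one-to-one-symbol mapping only; it says nothing about encodings that spend more than one output symbol per input symbol.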

Randy.

---

Author: "Edward Diener" <eldiener@earthlink.net>
Date: Mon, 5 Aug 2002 23:47:40 GMT

"James Russell Kuyper Jr." <kuyper@wizard.net> wrote in message
news:3D4E6CEA.5D3CB490@wizard.net...
> Edward Diener wrote:
> >
> > "James Russell Kuyper Jr." <kuyper@wizard.net> wrote in message
> > news:3D4C35C4.CA29AF3D@wizard.net...
> > > Edward Diener wrote:
> > > ....
> > > > > the future. If users then decided to create their own exceptions based on
> > > > > other basic types, or their own user-defined types, that's up to them, but
> > > > > that wouldn't be part of the C++ standard library.
> > >
> > > If they're specializing a template that's declared by the C++ standard
> > > library, they're using part of that library.
> >
> > "Using part of the C++ standard library" and "being part of the C++ standard
> > library" are not the same thing. The difference is crucial. "Being part of
> > the C++ standard library" means that C++ implementations must follow the
> > guidelines of the C++ standard. "Using part of the C++ standard library"
> > means that a user is free to specialize or derive in his own way, and
> > hopefully do it intelligently. ...
>
> When specializing a standard library template for a user-defined type,
> the user is still required to meet the standard's requirements for that
> template.  I presume that there would be no special exemption from this
> requirement for std::basic_exception<charT>.

Of course not.

> So the difference is not as
> great as you suggest.

I don't see that that follows.

>
> > ... I am only concerned in my suggestion of
>
> I really don't care what you're concerned about. My concerns are about
> the fact that I would be unable to catch std::basic_exception<charT>
> for arbitrary charT meeting the character requirements. If it were true
> that char and wchar_t were the only possibilities, I would only be
> mildly disgruntled by this suggestion; I could replace a single catch
> statement with a pair of catch statements, and it wouldn't be horribly
> difficult to automate the process. However, that's not the case.

A user can throw any type of exception anyway for you to catch, and it
doesn't have to have anything to do with the C++ standard library
exceptions. In that case your current single std::exception catch or the
pair of catch statements with the proposal would do you no good in catching
all exceptions anyway.

I don't see any practical difference between the end user creating other
types on which to base their exception hierarchies using the parallel
template class and the end user just creating their own exceptions which are
not based on the C++ standard library.

I view the former case as less likely since the entire reason for the
parallel exception hierarchy is to allow a message to be returned for
different character types and language encodings. An end user creating
another parallel exception hierarchy based on their own type would serve
no purpose, since it is highly unlikely that their own type could return
a message in a character encoding that a basic character type could not.
The only possibility I see in this area is for the end user to adopt one
of the semi-official Unicode encodings as his "type", in which case more
power to him.

The latter case, which exists today, may be done by anyone who doesn't want
to follow the C++ standard library exception classes. This may not be
"right" but it is certainly allowed.

I can understand your saying that if the exception hierarchy was template
based it would allow end users to create hierarchies based on non-basic
character types, and you don't believe that freedom is a good idea when it
comes to having to catch exceptions. I disagree because I don't think that
freedom will be any more readily abused than the current case where
exceptions are created which have nothing to do with the C++ standard
library and have to be caught.

---

Author: "James Russell Kuyper Jr." <kuyper@wizard.net>
Date: Sat, 3 Aug 2002 00:07:35 GMT

Randy Maddox wrote:
>
> OK.  Thanks to James Kuyper, who has had some trouble posting to this
> discussion and has instead sent many emails directly to me, I finally

I finally got my new computer system working and connected, so I'm
finally able to fully re-join this forum.

....
> exception types.  Only the built-in types meet the requirements for
> character types, and only the integer types really make any sense as

Citation, please? As far as I can tell, there's no reason a POD struct
could not meet those requirements as well. This would require
specialization of std::char_traits and std::basic_string for that type,
which is a lot of work if the implementation does not choose to provide
a generic definition of those templates. Still, it would be legal.

Between the operator overloads of that POD struct, and the
specializations of those templates, there's an enormous potential for
interestingly non-trivial behavior of such a user-defined type, with a
correspondingly large number of different reasons why someone might want
to define one. Indeed, the effort that's been put into char_traits<>
(particularly into the genericity of the Table 37 requirements) argues
strongly that the writers expected user-defined types with non-trivial
implementations of those functions.
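
As a rough illustration of the kind of type described above, the sketch below defines a POD struct and a traits class for it, then instantiates std::basic_string on the pair. Everything named here (Glyph, GlyphTraits, GlyphString, from_ascii) is hypothetical. The traits are passed as the second template argument rather than by specializing std::char_traits, purely to keep the sketch short, and the member list follows the usual character-traits requirements as I understand them.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <ios>
#include <string>
#include <type_traits>

// A user-defined character type: a POD struct wrapping a 16-bit code value.
struct Glyph {
    unsigned short code;
};
static_assert(std::is_trivial<Glyph>::value && std::is_standard_layout<Glyph>::value,
              "Glyph must be a POD type to serve as a character type");

struct GlyphTraits {
    using char_type = Glyph;
    using int_type = unsigned long;  // wide enough for every code plus eof()
    using off_type = std::streamoff;
    using pos_type = std::streampos;
    using state_type = std::mbstate_t;

    static void assign(char_type& dst, const char_type& src) noexcept { dst = src; }
    static bool eq(char_type a, char_type b) noexcept { return a.code == b.code; }
    static bool lt(char_type a, char_type b) noexcept { return a.code < b.code; }

    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (s[n].code != 0) ++n;  // code 0 plays the role of the terminator
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& c) {
        for (std::size_t i = 0; i < n; ++i) {
            if (eq(s[i], c)) return s + i;
        }
        return nullptr;
    }
    static char_type* move(char_type* dst, const char_type* src, std::size_t n) {
        if (n != 0) std::memmove(dst, src, n * sizeof(char_type));
        return dst;
    }
    static char_type* copy(char_type* dst, const char_type* src, std::size_t n) {
        if (n != 0) std::memcpy(dst, src, n * sizeof(char_type));
        return dst;
    }
    static char_type* assign(char_type* dst, std::size_t n, char_type c) {
        for (std::size_t i = 0; i < n; ++i) dst[i] = c;
        return dst;
    }
    static char_type to_char_type(int_type i) noexcept {
        return Glyph{static_cast<unsigned short>(i)};
    }
    static int_type to_int_type(char_type c) noexcept { return c.code; }
    static bool eq_int_type(int_type a, int_type b) noexcept { return a == b; }
    static int_type eof() noexcept { return static_cast<int_type>(-1); }
    static int_type not_eof(int_type i) noexcept { return i == eof() ? 0 : i; }
};

using GlyphString = std::basic_string<Glyph, GlyphTraits>;

// Convenience for the sketch: widen a narrow ASCII literal into Glyphs.
GlyphString from_ascii(const char* text) {
    GlyphString out;
    for (; *text != '\0'; ++text) {
        out.push_back(Glyph{static_cast<unsigned short>(static_cast<unsigned char>(*text))});
    }
    return out;
}
```

A traits class like this is free to define eq and lt however it likes (case-insensitively, for instance), which is one source of the "interestingly non-trivial behavior" mentioned above.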

---

Author: "Edward Diener" <eldiener@earthlink.net>
Date: Sat, 3 Aug 2002 10:38:54 GMT

"James Russell Kuyper Jr." <kuyper@wizard.net> wrote in message
news:3D4B158F.6A6D5BC5@wizard.net...
> Randy Maddox wrote:
> >
> > OK.  Thanks to James Kuyper, who has had some trouble posting to this
> > discussion and has instead sent many emails directly to me, I finally
>
> I finally got my new computer system working and connected, so I'm
> finally able to fully re-join this forum.
>
> ....
> > exception types.  Only the built-in types meet the requirements for
> > character types, and only the integer types really make any sense as
>
> Citation, please? As far as I can tell, there's no reason a POD struct
> could not meet those requirements as well. This would require
> specialization of std::char_traits and std::basic_string for that type,
> which is a lot of work if the implementation does not choose to provide
> a generic definition of those templates. Still, it would be legal.
>
> Between the operator overloads of that POD struct, and the
> specializations of those templates, there's an enormous potential for
> interestingly non-trivial behavior of such a user-defined type, with a
> correspondingly large number of different reasons why someone might want
> to define one. Indeed, the effort that's been put into char_traits<>
> (particularly into the genericity of the Table 37 requirements) argues
> strongly that the writers expected user-defined types with non-trivial
> implementations of those functions.

My own point is that exception hierarchies based on the character type, in
order to return a what() explanation using that character type, would be
implemented as part of the C++ standard library. The current character types
are "char" and "wchar_t" but more basic character types might be added in
the future. If users then decided to create their own exceptions based on
other basic types, or their own user-defined types, that's up to them, but
that wouldn't be part of the C++ standard library.

The fear that there may be an "infinite" number of exception hierarchies, if
C++ were to have parallel exception hierarchies based on the basic character
types, is unwarranted. If C++ can not introduce a new idea and
implementation without being afraid that end users will push that idea to
the point of absurdity, then the language itself is in bigger trouble than
just considering this suggestion as a possibility. But I don't think for one
second that is the case. C++ has always trusted the programmer to be
intelligent and know what he is doing and I don't see why that philosophy
should change or serve as an excuse for not considering changes to the
language itself or the C++ standard library.

---

Author: Allan_W@my-dejanews.com (Allan W)
Date: Sat, 3 Aug 2002 19:03:43 GMT

rmaddox@isicns.com (Randy Maddox) wrote

a lot more calmly than previously -- thank you.

> Due to the intractability of the mapping problem I would say that
[the approach of converting wide character strings to narrow ones]
> simply cannot be made to work.  Thus I stand by my proposed
> solution of a templated base class.  At least this would give anyone
> using wide-character exceptions a single point to hook derived
> exception classes to, and both char and wchar_t exceptions derived
> from the standard exception base classes could all be caught with only
> two catch statements.
>
> That last statement has been argued against in several postings about
> the "infinite" number of exception base classes made possible by
> templating basic_exception on its character type.  People have
> postulated basic_exception<int>, basic_exception<MyCharType>, and
> basic_exception<AnyClassILike>.  If true this would indeed be a
> serious issue, but, as I have pointed out previously, this is not in
> fact the case.  There are a great number of restrictions on what can
> be used as a character type.  For details, see any of the following
> citations:
>
> C++ Standard:  clause 21, paragraph 1, "... characters may be of any
> POD type ...".
>
> C++ Standard:  clause 3.9, paragraphs 1 through 4 describe a POD type.
>
> Stroustrup, "The C++ Programming Language", Special Edition, section
> 20.2.1, "... requires that a type used as its character type does not
> have user-defined copy operations ...".
>
> Langer and Kreft, "Standard C++ IOStreams and Locales", section
> 2.3.3.1 describes in great detail the requirements for types that may
> be used as characters.
>
> So I do indeed stand by my statement that since the implementation of
> the standard exception classes is tied to the implementation of the
> standard string classes, there is absolutely no danger of "infinite"
> exception types.  Only the built-in types meet the requirements for
> character types, and only the integer types really make any sense as
> character types.

There are certainly a lot more than that!

    enum sevenbit { min=0, max=127 };
    enum percent { zero=0, onehundred=100 };
    enum arbitrary { fee=123, fie=234, foe=345, fum=4567 };

It's true that enums have an "underlying" type which accepts only integral
values. But from a language standpoint, sevenbit is completely distinct
from either 8- or 16-bit integers.

    typedef std::basic_exception<sevenbit> sevenbitexception;
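
The distinctness claim can be checked at compile time. This is a sketch assuming C++11 type traits; the empty basic_exception template below is only a hypothetical stand-in for the proposed class, placed in a made-up namespace sketch, since all that matters here is that each instantiation is its own type.

```cpp
#include <type_traits>

namespace sketch {  // hypothetical names, for illustration only

enum sevenbit { min = 0, max = 127 };
enum percent { zero = 0, onehundred = 100 };

// Stand-in for the proposed template; only its identity matters here.
template <class CharT>
struct basic_exception {};

typedef basic_exception<sevenbit> sevenbitexception;
typedef basic_exception<percent> percentexception;

// Two enums with overlapping value ranges are still different types.
constexpr bool enums_are_distinct = !std::is_same<sevenbit, percent>::value;

// An enum is not the same type as the integer type that stores it.
constexpr bool enum_is_not_its_underlying_type =
    !std::is_same<sevenbit, std::underlying_type<sevenbit>::type>::value;

// So each enum yields a separate exception type a handler would have to name.
constexpr bool exceptions_are_distinct =
    !std::is_same<sevenbitexception, percentexception>::value;

// Enums are trivially copyable, so they pass the memcpy()-style POD test.
constexpr bool enum_is_trivially_copyable = std::is_trivially_copyable<sevenbit>::value;

}  // namespace sketch
```

Every new enum declaration therefore adds one more candidate character type, and with it one more exception type.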

> We already have 8-bit (char) and 16-bit (wchar_t)
> character types, so I submit that at most we might have to someday
> worry about a 32-bit (long_wchar_t ?) character type.  So the concern
> of "infinite" exception types, which is shared by many, while
> understandable, is based on a lack of information. If any type could
> be used as a character type, then this would be a serious objection,
> but as the facts clearly demonstrate, it is really just a misdirected
> concern.

I respectfully disagree.

---

Author: "James Russell Kuyper Jr." <kuyper@wizard.net>
Date: Mon, 5 Aug 2002 02:22:24 GMT

Edward Diener wrote:
....
> the future. If users then decided to create their own exceptions based on
> other basic types, or their own user-defined types, that's up to them, but
> that wouldn't be part of the C++ standard library.

If they're specializing a template that's declared by the C++ standard
library, they're using part of that library.

---

Author: Allan_W@my-dejanews.com (Allan W)
Date: 30 Jul 2002 22:45:01 GMT Raw View

"Edward Diener" <eldiener@earthlink.net> wrote
> "Peter Dimov" <pdimov@mmltd.net> wrote
> > There is nothing inherently wrong with parallel exception hierarchies
> > per se.
> >
> > What is wrong with the proposed std::basic_exception is not that it's
> > evil; it's that it is not a solution to a practical problem. Actual
> > experience with parallel exception hierarchies suggests that
> > basic_exception<wchar_t> is rarely, if ever, useful. It will remain
> > useless even with std:: in front; its main purpose would be to
> > generate questions of the sort: "why doesn't std::wstring throw
> > wlength_error and wout_of_range?"
>
> If it existed, would it not be useful when throwing an exception from some
> standard library wide character implementation ?

Maybe, but not in any way that is as obvious as you seem to think.

> As an example, for
> std::wstring::at(size_type pos) if pos >= size(), wouldn't it logically make
> more sense to throw a std::basic_out_of_range<wchar_t> exception, if it
> existed, and pass a wide character string to it ? To me it does.

What wide character string would you pass to it? The contents of the
wstring?


Author: Allan_W@my-dejanews.com (Allan W)
Date: Tue, 30 Jul 2002 23:56:13 GMT Raw View

"Edward Diener" <eldiener@earthlink.net> wrote
> Really, I don't think the C++ language and library can always
> intelligently move forward and address new ideas and uses without correcting
> some of the possible design mistakes of the past.

Really, I don't think that every inconsistency is necessarily a
"design mistake" -- and among those that are, not all of them need
to be corrected. Weigh the likely costs against the likely benefits!


Author: rmaddox@isicns.com (Randy Maddox)
Date: Wed, 31 Jul 2002 15:03:28 GMT Raw View

Allan_W@my-dejanews.com (Allan W) wrote in message news:<23b84d65.0207301436.10e83867@posting.google.com>...
> "Edward Diener" <eldiener@earthlink.net> wrote
> > "Peter Dimov" <pdimov@mmltd.net> wrote
> > As an example, for
> > std::wstring::at(size_type pos) if pos >= size(), wouldn't it logically make
> > more sense to throw a std::basic_out_of_range<wchar_t> exception, if it
> > existed, and pass a wide character string to it ? To me it does.
>
> What wide character string would you pass to it? The contents of the
> wstring?
>

Most likely something along the lines of L"Position dd is invalid", or
something like that, an informative error message at any rate.

I see the point Mr. Diener is making here:  If a program is dealing
only with wide-character strings, why should it have to get an
exception message from one of those wide-character strings as a char
string?  That forces the program to deal with both char and wchar_t
strings when it would be just as happy dealing only with wchar_t
strings.
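
For anyone who wants to see the current behavior, a small sketch follows.
The function name is hypothetical; the only facts it relies on are that
std::wstring::at throws std::out_of_range and that what() returns a
const char*. The actual message text is implementation-defined.

```cpp
#include <stdexcept>
#include <string>

// Indexes a wide string out of range and reports whether the standard
// std::out_of_range was caught. The narrow what() text is copied into
// 'message'; its content is implementation-defined.
bool wide_at_throws_narrow(std::string& message) {
    const std::wstring text = L"abc";
    try {
        wchar_t c = text.at(10);   // pos >= size(), so at() throws
        (void)c;
    } catch (const std::out_of_range& ex) {
        message = ex.what();       // what() yields const char*, not wchar_t
        return true;
    }
    return false;
}
```

So a program that otherwise handles only wchar_t text still ends up with
a char string in its hands at this point.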

Although I see the validity of this point, I do not agree that the
stdlib should be modified to throw anything other than what it already
does.  The effort of updating existing code to deal with this would,
IMHO, seem to outweigh the benefit.  And if existing code were not
updated, then it would be broken when an unexpected, and uncaught,
wchar_t exception was thrown from the stdlib.  Breaking existing code
with a stdlib upgrade is not what I would call a good thing.

Randy.


Author: "Edward Diener" <eldiener@earthlink.net>
Date: Wed, 31 Jul 2002 17:54:42 GMT Raw View

"Allan W" <Allan_W@my-dejanews.com> wrote in message
news:23b84d65.0207301436.10e83867@posting.google.com...
> "Edward Diener" <eldiener@earthlink.net> wrote
> > "Peter Dimov" <pdimov@mmltd.net> wrote
> > > There is nothing inherently wrong with parallel exception hierarchies
> > > per se.
> > >
> > > What is wrong with the proposed std::basic_exception is not that it's
> > > evil; it's that it is not a solution to a practical problem. Actual
> > > experience with parallel exception hierarchies suggests that
> > > basic_exception<wchar_t> is rarely, if ever, useful. It will remain
> > > useless even with std:: in front; its main purpose would be to
> > > generate questions of the sort: "why doesn't std::wstring throw
> > > wlength_error and wout_of_range?"
> >
> > If it existed, would it not be useful when throwing an exception from some
> > standard library wide character implementation ?
>
> Maybe, but not in any way that is as obvious as you seem to think.
>
> > As an example, for
> > std::wstring::at(size_type pos) if pos >= size(), wouldn't it logically make
> > more sense to throw a std::basic_out_of_range<wchar_t> exception, if it
> > existed, and pass a wide character string to it ? To me it does.
>
> What wide character string would you pass to it? The contents of the
> wstring?

Yes, and a message stating in the language of the implementation that the
position x is out of range.

Two points:

1) The actual message passed in the standard exceptions is not specified by
the C++ standard and is therefore implementation dependent.

2) The idea is to simply allow implementors of wide character C++ standard
library classes to pass back intelligent messages in their own language
encoding.



Author: "Edward Diener" <eldiener@earthlink.net>
Date: Wed, 31 Jul 2002 17:58:27 GMT Raw View

"Allan W" <Allan_W@my-dejanews.com> wrote in message
news:23b84d65.0207301452.22c0b848@posting.google.com...
> "Edward Diener" <eldiener@earthlink.net> wrote
> > Really, I don't think the C++ language and library can always
> > intelligently move forward and address new ideas and uses without correcting
> > some of the possible design mistakes of the past.
>
> Really, I don't think that every inconsistency is necessarily a
> "design mistake" -- and among those that are, not all of them need
> to be corrected. Weigh the likely costs against the likely benefits!

I agree. But in this case it is a general inconsistency which says: you can
not pass back exception messages in a wide character language encoding but
instead you must come up with another methodology for doing this. A number
of other methodologies have been proposed, none of which seemed very
satisfactory to me, but evidently are satisfactory to others. Playing
devil's advocate, among them are:

1) Pass back a numeric string and document what this means or use a message
catalog with this number.
2) Pass back an MBCS string.

To which might be added:

3) Pass back an English ASCII string and assume that most programmers know
English.
4) Derive one's own documented type from one of the standard exception types
and document what this means without having to pass back any string.
5) The what() string is generally pretty useless anyway.
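
As an illustration of option 4, here is a rough sketch. The class
position_out_of_range and the helper checked_at are hypothetical names
invented for this example; the idea is only that the exception carries
data, and any localized text is built by whoever catches it.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical user-defined type: carries the offending position and the
// string size as data instead of relying on the what() text.
class position_out_of_range : public std::out_of_range {
public:
    position_out_of_range(std::size_t pos, std::size_t size)
        : std::out_of_range("position_out_of_range"),
          pos_(pos), size_(size) {}
    std::size_t position() const { return pos_; }
    std::size_t size() const { return size_; }
private:
    std::size_t pos_;
    std::size_t size_;
};

// Hypothetical checked accessor that throws the documented type.
wchar_t checked_at(const std::wstring& s, std::size_t pos) {
    if (pos >= s.size())
        throw position_out_of_range(pos, s.size());
    return s[pos];
}
```

Because the type derives from std::out_of_range, existing handlers for
std::exception still catch it, and the narrow what() string is only a
fixed tag that needs no translation.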



Author: "Edward Diener" <eldiener@earthlink.net>
Date: Wed, 31 Jul 2002 23:16:59 GMT Raw View

"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:8c8b368d.0207310519.54c350d2@posting.google.com...
> Allan_W@my-dejanews.com (Allan W) wrote in message news:<23b84d65.0207301436.10e83867@posting.google.com>...
> > "Edward Diener" <eldiener@earthlink.net> wrote
> > > "Peter Dimov" <pdimov@mmltd.net> wrote
> > > As an example, for
> > > std::wstring::at(size_type pos) if pos >= size(), wouldn't it logically make
> > > more sense to throw a std::basic_out_of_range<wchar_t> exception, if it
> > > existed, and pass a wide character string to it ? To me it does.
> >
> > What wide character string would you pass to it? The contents of the
> > wstring?
> >
>
> Most likely something along the lines of L"Position dd is invalid", or
> something like that, an informative error message at any rate.
>
> I see the point Mr. Diener is making here:  If a program is dealing
> only with wide-character strings, why should it have to get an
> exception message from one of those wide-character strings as a char
> string?  That forces the program to deal with both char and wchar_t
> strings when it would be just as happy dealing only with wchar_t
> strings.
>
> Although I see the validity of this point, I do not agree that the
> stdlib should be modified to throw anything other than what it already
> does.  The effort of updating existing code to deal with this would,
> IMHO, seem to outweigh the benefit.  And if existing code were not
> updated, then it would be broken when an unexpected, and uncaught,
> wchar_t exception was thrown from the stdlib.  Breaking existing code
> with a stdlib upgrade is not what I would call a good thing.

I can live with the anomaly of existing standard library wide character
implementations throwing a narrow character exception in a world where wide
character exceptions existed. I can think of only one case where this
currently happens in the standard library anyway, which is with
std::wstring::at, although there may be some others.

In this hypothetical world where wide character exceptions existed, the
future may well see more standard library implementations templated on
the character type. I would then encourage any exceptions thrown by those
wide character implementations to be wide character exceptions. In that case
the anomaly would show up more clearly and the documentation of the standard
library might read for std::wstring::at "This implementation throws a narrow
character exception only because of legacy code from the early days of C++,
while other wide character implementations in the standard library throw the
normal appropriate wide character exceptions". Of course my point is, if one
breaks code now, one avoids the more obvious anomaly later and the need to
remember it, but I do understand that breaking any existing code is one of
the taboos of C++ and not a pleasant experience for C++ programmers.



Author: rmaddox@isicns.com (Randy Maddox)
Date: Thu, 1 Aug 2002 15:09:49 GMT Raw View

OK.  Thanks to James Kuyper, who has had some trouble posting to this
discussion and has instead sent many emails directly to me, I finally
understand one of the points that some people have been making in this
thread:

Even if wide-character exceptions were added, the argument is that
people still want to be able to catch all expected exceptions as
std::exception.  To this end the suggestion is to derive the
wide-character exceptions from std::exception.  I never understood
the rationale behind this technique, since it is so obviously not the
best idea in a number of regards, not the least of which is the
difficulty of having both a char and a wchar_t string in the same
object.

Mr. Kuyper explained the rationale of still being able to catch even a
wide-character exception as std::exception, which I now understand,
and responded to my question about what the base class, i.e.,
char-based, what() member should return, or how the char what_arg
string of the base class could even be initialized by the derived
wide-character exception.  His response was that the base class
what_arg string should be a "suitably converted version of the
wide-character string", which sounds plausible on the face of it.

However, when you look at that statement a bit more deeply, it becomes
clear that we are talking about a mapping from 16-bit values to 8-bit
values, i.e., mapping a set of 65,536 values onto a set of 256 values,
which means, assuming even distribution, that each character in the
smaller set would correspond to 256 characters in the larger set.  No
way that can ever make sense, although the reverse mapping from the
smaller set to the larger set does work just fine.

Due to the intractability of the mapping problem I would say that this
approach simply cannot be made to work.  Thus I stand by my proposed
solution of a templated base class.  At least this would give anyone
using wide-character exceptions a single point to hook derived
exception classes to, and both char and wchar_t exceptions derived
from the standard exception base classes could all be caught with only
two catch statements.
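
To show what I mean by "only two catch statements", here is a minimal
sketch. Everything in it (the sketch namespace, basic_exception,
basic_range_error, classify) is hypothetical and written only for this
posting; it is not proposed wording.

```cpp
#include <string>

namespace sketch {

// Hypothetical templated base class, parameterized on the character type.
template <class CharT>
class basic_exception {
public:
    explicit basic_exception(const std::basic_string<CharT>& what_arg)
        : what_(what_arg) {}
    virtual ~basic_exception() {}
    virtual const CharT* what() const { return what_.c_str(); }
private:
    std::basic_string<CharT> what_;
};

// Hypothetical derived exception hooked onto the templated base.
template <class CharT>
class basic_range_error : public basic_exception<CharT> {
public:
    explicit basic_range_error(const std::basic_string<CharT>& what_arg)
        : basic_exception<CharT>(what_arg) {}
};

// Throws either a char-based or a wchar_t-based exception and returns
// 'n' or 'w' depending on which of the two handlers caught it.
char classify(bool wide) {
    try {
        if (wide)
            throw basic_range_error<wchar_t>(L"Position is invalid");
        throw basic_range_error<char>("Position is invalid");
    } catch (const basic_exception<char>&) {
        return 'n';
    } catch (const basic_exception<wchar_t>&) {
        return 'w';
    }
}

}  // namespace sketch
```

Note that in this sketch the two specializations share no common base,
so a single handler cannot catch both; that is exactly the trade-off
being debated.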

That last statement has been argued against in several postings about
the "infinite" number of exception base classes made possible by
templating basic_exception on its character type.  People have
postulated basic_exception<int>, basic_exception<MyCharType>, and
basic_exception<AnyClassILike>.  If true this would indeed be a
serious issue, but, as I have pointed out previously, this is not in
fact the case.  There are a great number of restrictions on what can
be used as a character type.  For details, see any of the following
citations:

C++ Standard:  clause 21, paragraph 1, "... characters may be of any POD
type ...".

C++ Standard:  clause 3.9, paragraphs 1 through 4 describe a POD type.

Stroustrup, "The C++ Programming Language", Special Edition, section 20.2.1,
"... requires that a type used as its character type does not have
user-defined copy operations ...".

Langer and Kreft, "Standard C++ IOStreams and Locales", section 2.3.3.1
describes in great detail the requirements for types that may be used as
characters.

So I do indeed stand by my statement that since the implementation of
the standard exception classes is tied to the implementation of the
standard string classes, there is absolutely no danger of "infinite"
exception types.  Only the built-in types meet the requirements for
character types, and only the integer types really make any sense as
character types.  We already have 8-bit (char) and 16-bit (wchar_t)
character types, so I submit that at most we might have to someday
worry about a 32-bit (long_wchar_t ?) character type.  So the concern
of "infinite" exception types, which is shared by many, while
understandable, is based on a lack of information.  If any type could
be used as a character type, then this would be a serious objection,
but as the facts clearly demonstrate, it is really just a misdirected
concern.

I hope that this goes some way toward addressing the concerns that
have been raised in this thread, and I thank everyone who has
participated for their interest and feedback.

Randy.


Author: "Ken Shaw" <ken@DIESPAMDIEcompinnovations.com>
Date: Thu, 1 Aug 2002 21:52:28 GMT Raw View

"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:8c8b368d.0208010531.1c697dbf@posting.google.com...
> OK.  Thanks to James Kuyper, who has had some trouble posting to this
> discussion and has instead sent many emails directly to me, I finally
> understand one of the points that some people have been making in this
> thread:
>
> Even if wide-character exceptions were added, the argument is that
> they still want to be able to catch all expected exceptions as
> std::exception.  To this end the suggestion is to derive the
> wide-character exceptions from std::exception, a technique I never
> understood the rationale behind since it is so obviously not the best
> idea in a number of regards, not the least of which is the
> difficulties involved in having both a char and wchar_t string in the
> same object, which I was completely at a loss to grasp the reasoning
> behind.
>
> Mr. Kuyper explained the rationale of still being able to catch even a
> wide-character exception as std::exception, which I now understand,
> and responded to my question about what the base class, i.e.,
> char-based, what() member should return, or how the char what_arg
> string of the base class could even be initialized by the derived
> wide-character exception.  His response was that the base class
> what_arg string should be a "suitably converted version of the
> wide-character string", which sounds plausible on the face of it.
>
> However, when you look at that statement a bit more deeply, it becomes
> clear that we are talking about a mapping from 16-bit values to 8-bit
> values, i.e., mapping a set of 65,536 values onto a set of 256 values,
> which means, assuming even distribution, that each character in the
> smaller set would correspond to 256 characters in the larger set.  No
> way that can ever make sense, although the reverse mapping from the
> smaller set to the larger set does work just fine.

actually that is a simple matter of converting UTF-16 to UTF-8 if Unicode is
the character mapping of choice for 16 bit wchar_t and 8 bit char.
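
The reason this works is that UTF-8 is variable-width: one character may
become several chars, so nothing has to be squeezed into 256 values. A
rough sketch of the encoding step follows. The helper to_utf8 is a
hypothetical name; to avoid depending on the width of wchar_t it takes
whole code points as input, so the decoding of UTF-16 surrogate pairs is
left out, and it does no validation of its input.

```cpp
#include <string>

// Hypothetical helper: encodes Unicode code points as UTF-8.
// Assumes every input value is a valid code point (no surrogates,
// nothing above U+10FFFF); no error checking is done.
std::string to_utf8(const std::u32string& code_points) {
    std::string out;
    for (char32_t cp : code_points) {
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For instance, U+20AC (the euro sign) comes out as the three chars
E2 82 AC, so the conversion loses no information.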

>
> Due to the intractability of the mapping problem I would say that this
> approach simply cannot be made to work.  Thus I stand by my proposed
> solution of a templated base class.  At least this would give anyone
> using wide-character exceptions a single point to hook derived
> exception classes to, and both char and wchar_t exceptions derived
> from the standard exception base classes could all be caught with only
> two catch statements.
>
> That last statement has been argued against in several postings about
> the "infinite" number of exception base classes made possible by
> templating basic_exception on its character type.  People have
> postulated basic_exception<int>, basic_exception<MyCharType>, and
> basic_exception<AnyClassILike>.  If true this would indeed be a
> serious issue, but, as I have pointed out previously, this is not in
> fact the case.  There are a great number of restrictions on what can
> be used as a character type.  For details, see any of the following
> citations:
>
> C++ Standard:  clause 21, paragraph 1, "... characters may be of any POD
> type ...".
>
> C++ Standard:  clause 3.9, paragraphs 1 through 4 describe a POD type.
>
> Stroustrup, "The C++ Programming Language", Special Edition, section 20.2.1,
> "... requires that a type used as its character type does not have
> user-defined copy operations ...".
>
> Langer and Kreft, "Standard C++ IOStreams and Locales", section 2.3.3.1
> describes in great detail the requirements for types that may be used as
> characters.
>
> So I do indeed stand by my statement that since the implementation of
> the standard exception classes is tied to the implementation of the
> standard string classes, there is absolutely no danger of "infinite"
> exception types.  Only the built-in types meet the requirements for
> character types, and only the integer types really make any sense as
> character types.  We already have 8-bit (char) and 16-bit (wchar_t)
> character types, so I submit that at most we might have to someday
> worry about a 32-bit (long_wchar_t ?) character type.  So the concern
> of "infinite" exception types, which is shared by many, while
> understandable, is based on a lack of information.  If any type could
> be used as a character type, then this would be a serious objection,
> but as the facts clearly demonstrate, it is really just a misdirected
> concern.

Actually, g++ makes wchar_t 32 bits on most if not all systems it supports.
Also, Unicode does presently support a 4-byte encoding, UCS-4. I think the
primary problem having to do with std::exception and wide character support
is a failing in the locale part of the standard library. We're supposed to
have a message catalog feature and simple encoding conversions available, but
those facts and implementation details are buried so deep in the library
that, with the exception of forthcoming material from Dinkumware, I have never
seen any commercial product provide any sort of reasonable standard library
locale support.

>
> I hope that this goes some way toward addressing the concerns that
> have been raised in this thread, and I thank everyone who has
> participated for their interest and feedback.
>
> Randy.
>



Author: rmaddox@isicns.com (Randy Maddox)
Date: Wed, 17 Jul 2002 15:51:50 GMT Raw View

"Hillel Y. Sims" <usenet@phatbasset.com> wrote in message news:<8yMY8.47900$6r.1539212@news4.srv.hcvlny.cv.net>...
> Throwing exception objects not derived from std::exception is simply a very
> bad idea ("shooting yourself in the foot"), and not likely a good footing to
> prove a point. C++ is great about letting users do what they want (sometimes
> ;-). catch(...) is very bad and should be avoided because it can catch
> non-C++ exceptions such as OS-specific SEH-style exceptions on various
> platforms, which are often intended to be fatal errors (such as
> access-violations). Only through strict adherence to the rule of all
> exception objects inherit from std::exception (or subclasses) can proper C++
> catch-all exception handling really be done, via use of
> catch(std::exception&). Changing std::exception from an ultimate base-class
> to a sibling class with infinite other classes is not acceptable.

There is no change suggested here as far as use of std::exception as
"an ultimate base class" is concerned.  std::exception is currently
only one of many possible base classes for exceptions, and that would
remain exactly as it is now, so characterizing it as a change is
simply false.

>
>
> Why does it have to be this way? Because there is no other choice, it was
> already released with std::exception as the ultimate base class, and there's
> no way to change that now; it must always be the ultimate base class for now
> and forever.

See Chapter 15 (Exception Handling) of the C++ standard.  The standard
exception hierarchy is not even mentioned there.  There is nothing in
C++ that says std::exception is the ultimate base class of any but the
exceptions that are explicitly thrown by the stdlib.
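
A quick sketch makes the point. The type my_error and the function
caught_by are hypothetical names used only here; the language lets any
copyable type be thrown, whether or not it derives from std::exception.

```cpp
#include <exception>

// Hypothetical user type that does not derive from std::exception.
struct my_error {
    int code;
};

// Returns 1 if the std::exception handler caught the object,
// 2 if only the my_error handler did.
int caught_by() {
    try {
        throw my_error{42};
    } catch (const std::exception&) {
        return 1;   // not reached: my_error is unrelated to std::exception
    } catch (const my_error&) {
        return 2;
    }
}
```

This is legal today, which is why I say std::exception is one possible
base class among many rather than a universal one.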

> It is too late to change the what() interface. Maybe a language-independent
> message ID interface could be added to std::exception in the future which
> could be of use for internationalization concerns and wouldn't break
> existing code.

Again, your assertion that this would break existing code is simply
erroneous.  Check it out.  Look it up.  You are arguing based on a
misconception or misunderstanding, which is no basis for a valid
argument.

Randy.

>
> thanks,
> hys
>
> --
> Hillel Y. Sims
> FactSet Research Systems
> hsims AT factset.com


Author: rmaddox@isicns.com (Randy Maddox)
Date: Wed, 17 Jul 2002 15:51:59 GMT Raw View

"Edward Diener" <eldiener@earthlink.net> wrote in message news:<YQ1Z8.25437$A43.2538691@newsread2.prod.itd.earthlink.net>...
> "Randy Maddox" <rmaddox@isicns.com> wrote in message
> news:8c8b368d.0207161053.64cc8afb@posting.google.com...
> > "Edward Diener" <eldiener@earthlink.net> wrote in message
>  news:<QNIY8.23386$A43.2347915@newsread2.prod.itd.earthlink.net>...
> > > "James Dennett" <jdennett@acm.org> wrote in message
> > > news:3D2DD9D1.7010109@acm.org...
> > > > Randy Maddox wrote:
> > > Regarding your first statement, it is true that it breaks the idiom of
> > > catching all possible exceptions by catching std::exception. Perhaps
>  this is
> > > not such a serious problem as it is viewed.
> > >
> >
> > I see a point of confusion here.  It is NOT true that
> >
> >   catch(const std::exception & ex)
> >
> > will catch "all possible exceptions".
>
> You are correct. I phrased that incorrectly above. I should have written "it
> is true it breaks the idiom of catching all possible exceptions that are
> thrown directly by standard library classes". Further on in that same
> message I clarified why it was not "such a serious" problem in arguments
> which are very similar to what you have written below.
>

I thank you for your support during this discussion, but must point
out here that I am NOT suggesting anything that would change the set
of exceptions currently explicitly thrown by the stdlib.  That would
indeed break existing code and I am highly opposed to that.

Randy.


Author: rmaddox@isicns.com (Randy Maddox)
Date: Wed, 17 Jul 2002 15:52:21 GMT Raw View

Daniel Miller <daniel.miller@tellabs.com> wrote in message news:<3D3487D5.7010806@tellabs.com>...
> Peter Dimov wrote:
>
> > "Edward Diener" <eldiener@earthlink.net> wrote in message news:<QNIY8.23386$A43.2347915@newsread2.prod.itd.earthlink.net>...
> >
> >>"James Dennett" <jdennett@acm.org> wrote in message
> >>news:3D2DD9D1.7010109@acm.org...
> >>
> > [...]
> >
> >>>* it fails to meet the needs of internationalised applications
> >>>   anyway, as they would typically throw an exception with a
> >>>   message id if they needed text to be presented to a user,
> >>>   and for other text a narrow string generally suffices
> >>>
> >>Yes, they could throw a message ID, an argument Peter Dimov originally
> >>brought up, when I first brought up this subject. But why should they have
> >>to ?
> >>
> >
> > The argument is not that "they could throw a message ID as a
> > workaround." The argument is: "in my experience, when writing a robust
> > localizable application, you _must_ throw a message ID, and not text."
>
>
>    The fundamental difference of understanding between Edward Diener
> versus Peter Dimov is for how many locales is the software to be
> localized.  Peter's technique of passing message-ids around in the
> transnational bulk of software, delaying conversion of each message-id
> to the installed locale to a linguistic/user-interface/outer-perimeter
> layer of software is prepared to have, say, 37 locales which it
> supports.  Throwing a type containing a localized string from some deep
> layer of software requires that layer of software to know about all 37
> locales and no longer be simply obliviously/blithely metanational.

Nobody is arguing that hard-coding text strings into the code of an
application is a good idea.  However, there are many possible
approaches to design of an application that works well in i18n
situations.  The approach of throwing a message id is a common
solution, but is NOT the only possible solution.

>
>    I agree with Peter Dimov.
>    "Wouldn't it be better to let those using a wide character encoding
> incorporate a wide character error string in their exception?"  No, it
> is better to have some single message-id key known by lower layers of
> software than to have every layer of software (no matter how low-level)
> know about all 37 languages for which that software has been localized.
>   Passing around message-ids in the layers of software which are
> transnational simplifies those layers.  Isolating conversion of the
> message-id to an i18n layer of software keeps the linguistic
> sophistication isolated from the metanational bulk of the software.

Isolating the translation from message id to message text is a sound
design principle.  The choice of where that translation is done,
however, is not fixed.  It may be done at upper layers, or it may be
done at lower layers.  You seem to be saying that doing this
translation at the uppermost layer is the only valid approach.  I
don't think it's really reasonable to insist that one approach that
works must be the only approach that anyone can ever use.
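
For readers who have not seen the message-id technique, a minimal sketch
follows. All the names in it (message_id, id_exception, message_catalog,
translate, low_level_operation, report) are hypothetical, and the
in-memory std::map merely stands in for whatever catalog mechanism a real
application would use.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical message identifiers shared by all layers.
enum message_id { msg_position_invalid = 1 };

// Hypothetical exception that carries an id instead of localized text.
class id_exception : public std::runtime_error {
public:
    explicit id_exception(message_id id)
        : std::runtime_error("id_exception"), id_(id) {}
    message_id id() const { return id_; }
private:
    message_id id_;
};

// Stand-in for a per-locale message catalog.
typedef std::map<message_id, std::wstring> message_catalog;

// Maps an id to text in the given catalog, with a fixed fallback.
std::wstring translate(const message_catalog& cat, message_id id) {
    message_catalog::const_iterator it = cat.find(id);
    if (it != cat.end())
        return it->second;
    return L"(unknown message)";
}

// Low layer: knows nothing about languages, throws only the id.
void low_level_operation() {
    throw id_exception(msg_position_invalid);
}

// Some other layer: catches the id and performs the translation.
std::wstring report(const message_catalog& cat) {
    try {
        low_level_operation();
    } catch (const id_exception& ex) {
        return translate(cat, ex.id());
    }
    return std::wstring();
}
```

Nothing in the sketch forces report() to live in the uppermost layer;
the translation can be placed wherever the catalog is available.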

Randy.


Author: pdimov@mmltd.net (Peter Dimov)
Date: Wed, 17 Jul 2002 15:52:41 GMT Raw View

"Edward Diener" <eldiener@earthlink.net> wrote in message news:<0F1Z8.25412$A43.2537905@newsread2.prod.itd.earthlink.net>...
>
> Perhaps I do not have enough experience in throwing and catching exceptions,
> but I truly do not see what is so negative in having parallel exception
> hierarchies based on the C++ intrinsic character types in order to return
> strings of that type, whether message IDs or actual locale-based grammatical
> text.

There is nothing inherently wrong with parallel exception hierarchies
per se.

What is wrong with the proposed std::basic_exception is not that it's
evil; it's that it is not a solution to a practical problem. Actual
experience with parallel exception hierarchies suggests that
basic_exception<wchar_t> is rarely, if ever, useful. It will remain
useless even with std:: in front; its main purpose would be to
generate questions of the sort: "why doesn't std::wstring throw
wlength_error and wout_of_range?"


Author: rmaddox@isicns.com (Randy Maddox)
Date: Wed, 17 Jul 2002 17:22:10 GMT Raw View

pdimov@mmltd.net (Peter Dimov) wrote in message news:<7dc3b1ea.0207160546.3dd559f2@posting.google.com>...
> "Edward Diener" <eldiener@earthlink.net> wrote in message news:<QNIY8.23386$A43.2347915@newsread2.prod.itd.earthlink.net>...
> > "James Dennett" <jdennett@acm.org> wrote in message
> > news:3D2DD9D1.7010109@acm.org...
>  [...]
> > > * it fails to meet the needs of internationalised applications
> > >    anyway, as they would typically throw an exception with a
> > >    message id if they needed text to be presented to a user,
> > >    and for other text a narrow string generally suffices
> >
> > Yes, they could throw a message ID, an argument Peter Dimov originally
> > brought up, when I first brought up this subject. But why should they have
> > to ?
>
> The argument is not that "they could throw a message ID as a
> workaround." The argument is: "in my experience, when writing a robust
> localizable application, you _must_ throw a message ID, and not text."

That certainly is one approach, but would you mandate that it be the
only approach allowed?  Part of the whole philosophy of C++ is to
support many different programming styles and idioms.  The fact that
one technique works well for you in no way implies that must be the
only technique allowed.  As an alternative, what if you used message
catalogs and threw a message from the catalog?  The point here is not
to disagree with your approach, which since it works for you is
demonstrably valid, but again only to argue that there is no reason
other approaches should not also be allowed.

>
> > Wouldn't it be better to let those using a wide character encoding
> > incorporate a wide character error string in their exception ?
>
> The surprising answer is "no." Throwing a wide character string
> doesn't solve any real i18n problems. In a real application, you
> simply don't _have_ a wide character string to throw; there is no text
> anywhere in the program.
>

Again, that is one approach.  There could easily be other solutions
that did not involve hard-coded text in the program and yet worked
well with throwing exceptions containing useful messages in the local
character set.
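
A minimal sketch of one way the two positions above could meet, assuming C++11 or later. The exception carries only a message ID, and the handler turns that ID into wide text from a catalog. The names message_id, localized_error, lookup_message and report_failure are hypothetical, and the in-memory table is an invented stand-in for a real message catalog.

```cpp
#include <cassert>
#include <exception>
#include <map>
#include <string>

// Hypothetical message identifiers; a real application would take these
// from its message catalog.
enum class message_id { file_not_found, disk_full };

// Hypothetical exception type: it carries an ID, not user-visible text.
class localized_error : public std::exception {
public:
    explicit localized_error(message_id id) : id_(id) {}
    message_id id() const noexcept { return id_; }
    // Narrow text intended for logs and debugging only.
    const char* what() const noexcept override { return "localized_error"; }
private:
    message_id id_;
};

// Stand-in for a catalog lookup; the table contents are invented.
inline std::wstring lookup_message(message_id id, const std::string& locale) {
    static const std::map<std::string, std::map<message_id, std::wstring>> catalog = {
        {"en", {{message_id::file_not_found, L"File not found"},
                {message_id::disk_full, L"Disk full"}}},
        {"de", {{message_id::file_not_found, L"Datei nicht gefunden"},
                {message_id::disk_full, L"Speicher voll"}}},
    };
    return catalog.at(locale).at(id);
}

// Throws an ID and translates it at the catch site, where the locale is known.
inline std::wstring report_failure(const std::string& locale) {
    try {
        throw localized_error(message_id::file_not_found);
    } catch (const localized_error& e) {
        return lookup_message(e.id(), locale);
    }
}
```

Calling report_failure("de") yields the German wide string without the throwing code ever holding any translated text.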

Randy.

---

Author: "Hillel Y. Sims" <usenet@phatbasset.com>
Date: Wed, 17 Jul 2002 19:07:52 GMT Raw View

"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:8c8b368d.0207161053.64cc8afb@posting.google.com...
> "Edward Diener" <eldiener@earthlink.net> wrote in message
> news:<QNIY8.23386$A43.2347915@newsread2.prod.itd.earthlink.net>...
> > "James Dennett" <jdennett@acm.org> wrote in message
> > news:3D2DD9D1.7010109@acm.org...
> > > Randy Maddox wrote:
> > Regarding your first statement, it is true that it breaks the idiom of
> > catching all possible exceptions by catching std::exception. Perhaps this is
> > not such a serious problem as it is viewed.
> >
>
> I see a point of confusion here.  It is NOT true that
>
>   catch(const std::exception & ex)
>
> will catch "all possible exceptions".

If you are only dealing with exceptions that are publicly derived from
std::exception (as you always should be), then it does catch all possible
exceptions. If you are dealing with any exceptions that are not publicly
derived from std::exception, how do you ensure properly catching all
exceptions in nothrow situations? catch(...) is not a good answer for
_handling_ exceptions, because the exception does not even have a name.
Contrary to popular advice, you should never actually do "catch (...) {}" to
achieve nothrow code, but you can safely do "catch (std::exception&) {}" if
all exception types are inherited from std::exception. You cannot correctly
handle an exception that you do not know what type it is and cannot even
give it a name.

> It is NOT even true that this
> will catch all exceptions occurring during use of the stdlib.

? As far as I know, all exceptions thrown by std library components are
inherited from std::exception...?

>
> It is ONLY true that this will catch all exceptions that are, or are
> publicly derived from, std::exception.  Any user-defined exception
> that is not publicly derived from std::exception may still occur while
> using the stdlib, since that library uses the ctor, copy ctor and
> assignment operator of the class being used with the stdlib.

The fact that user-code throws a custom exception has nothing to do with the
types of exceptions being thrown by the standard library.

>  That
> class may throw any exception, of any type, it desires.  There is no
> requirement that such an exception be in any way related to
> std::exception, which means that catching only std::exception may not
> catch any of a whole universe of possible exceptions.

Nope, but think about what benefits you do get if all custom exception types
_are_ derived from std::exception.

>
> My proposal to implement std::exception as a typedef of
> std::basic_exception makes no difference to this situation.  Catching
> only std::exception will still catch only any and all exceptions
> publicly derived from std::exception.

....which should be all user-defined exception types in well-written
software. Changing std::exception to no longer even possibly be the ultimate
base class of all C++ exceptions would result in a fundamental flaw in the
C++ exception-handling mechanism.

thanks,
hys

--
Hillel Y. Sims
FactSet Research Systems
hsims AT factset.com

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Wed, 17 Jul 2002 19:07:59 GMT Raw View

"Hillel Y. Sims" <usenet@phatbasset.com> wrote in message news:<MI4Z8.62622$6r.1977537@news4.srv.hcvlny.cv.net>...
> "Randy Maddox" <rmaddox@isicns.com> wrote in message
> news:8c8b368d.0207161044.5e284c8c@posting.google.com...
> > >
> > > MyClass::~MyClass()
> > > {
> > >   try {
> > >     .. cleanup code that may throw exceptions ..
> > >   }
> > >   catch (std::exception&) {}
> > > }
> > >
> > > How can this idiom work correctly in the presence of templated exception
> > > base classes (except via a fictional templated catch clause)?
> >
> > This works just fine because you can throw pretty much anything that
> > has a copy constructor, and catch the same.
>
> Yes, you _can_ also dereference off the end of the bounds of an array. Does
> that mean you _should_ do it? User exceptions should always be derived from
> std::exception (or subclass). All library exceptions adhere to this
> strategy. There are various reasons why it is a fairly bad idea to
> flippantly use exceptions that are not derived from std::exception or
> catch(...) (I have detailed this argument in
> http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&frame=right&
> th=1218f94d7b6d779a&seekm=3D331092.EDAD6B0A%40web.de#link19 ) Basing a
> templated exception strategy on a flawed exception-handling technique is a
> bad idea.

Well, there is a rather big difference there in that dereferencing
beyond the end of an array means that your program behavior is
undefined, while throwing an exception of some type not derived from
std::exception is well defined.  So equating these two is a false
comparison.

And yes it is a good idea to derive user exceptions from
std::exception, but just because something is in general a good idea
does not then imply that it must necessarily be the only thing
allowed.  The C++ committee quite wisely decided that the standard
exception hierarchy, while sufficient for the needs of the stdlib, was
not appropriate to impose as a one-size-fits-all solution that would
disallow all other approaches.

Are you proposing that throwing anything not publicly derived from
std::exception should be disallowed?

Randy.

---

Author: news_comp.std.c++_expires-2002-09-01@nmhq.net (Niklas Matthies)
Date: Wed, 17 Jul 2002 19:08:19 GMT Raw View

On Wed, 17 Jul 2002 15:52:41 GMT, Peter Dimov <pdimov@mmltd.net> wrote:
>  "Edward Diener" <eldiener@earthlink.net> wrote in message news:<0F1Z8.25412$A43.2537905@newsread2.prod.itd.earthlink.net>...
> >
> > Perhaps I do not have enough experience in throwing and catching exceptions,
> > but I truly do not see what is so negative in having parallel exception
> > hierarchies based on the C++ intrinsic character types in order to return
> > strings of that type, whether message IDs or actual locale-based grammatical
> > text.
>
>  There is nothing inherently wrong with parallel exception hierarchies
>  per se.
>
>  What is wrong with the proposed std::basic_exception is not that it's
>  evil; it's that it is not a solution to a practical problem. Actual
>  experience with parallel exception hierarchies suggests that
>  basic_exception<wchar_t> is rarely, if ever, useful. It will remain
>  useless even with std:: in front; its main purpose would be to
>  generate questions of the sort: "why doesn't std::wstring throw
>  wlength_error and wout_of_range?"

How about instead adding std::wexception, derived from std::exception and
providing an additional member function `wchar_t const * wwhat() const'?
Not that I'm dramatically in favor of supporting "wide exceptions", but
it seems to me that this would be a simple solution that doesn't hurt
anyone and provides what is being asked for in this thread.
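
A minimal sketch of what such a class might look like, assuming C++11 or later. It lives in a hypothetical namespace sketch rather than std, and the helper wide_text_via_base_handler is invented for illustration.

```cpp
#include <cassert>
#include <exception>
#include <string>

namespace sketch {   // hypothetical namespace; nothing here is part of std

// Derived from std::exception, so existing handlers still catch it.
class wexception : public std::exception {
public:
    explicit wexception(const std::wstring& msg) : msg_(msg) {}
    const char* what() const noexcept override { return "sketch::wexception"; }
    // The additional member function suggested above.
    virtual const wchar_t* wwhat() const noexcept { return msg_.c_str(); }
private:
    std::wstring msg_;
};

} // namespace sketch

// A plain std::exception handler catches the wide exception, but the wide
// text is reachable only after a downcast.
inline std::wstring wide_text_via_base_handler() {
    try {
        throw sketch::wexception(L"wide message");
    } catch (const std::exception& e) {
        if (const auto* w = dynamic_cast<const sketch::wexception*>(&e))
            return w->wwhat();
        return L"";
    }
}
```

The sketch keeps catch (std::exception&) working, at the cost of a dynamic_cast wherever the wide text is wanted.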

-- Niklas Matthies
--
Good judgement comes from experience,
and experience comes from bad judgement.

---

Author: kanze@gabi-soft.de (James Kanze)
Date: Wed, 17 Jul 2002 23:05:52 GMT Raw View

"Edward Diener" <eldiener@earthlink.net> wrote in message
news:<HfVY8.24530$A43.2448625@newsread2.prod.itd.earthlink.net>...

> What I would like to see is a serious discussion by the C++ standards
> committee of the internationalization issues represented in this
> thread, rather than a rejection of them by members of the committee in
> this thread solely on the basis of these ideas not having been
> implemented yet by anybody. I can respect the C++ committee wanting
> implementations of ideas but I can not understand the C++ committee
> summarily rejecting any well-thought out idea, argued and discussed
> cogently, simply because an implementation does not exist.

But that is neither the purpose of a standard, nor the purpose of the
standardization committee, to try out new ideas, no matter how
well-thought out.  The standardization committee is not, or should not
be, a research organization.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung

---

Author: Allan_W@my-dejanews.com (Allan W)
Date: Wed, 17 Jul 2002 19:54:12 CST Raw View

"Edward Diener" <eldiener@earthlink.net> wrote
> but I truly do not see what is so negative in having parallel exception
> hierarchies based on the C++ intrinsic character types in order to return
> strings of that type, whether message IDs or actual locale-based grammatical
> text.

It's true; you don't.

You've put your finger on it, though -- the parallel exception hierarchies.
I think that a lot of C++ experts have dismissed the whole discussion
under an assumption that "it's obvious." But one man's obvious is
another man's quandary. I think that once you understand the problem with
parallel exception hierarchies, then all of the other remarks will make a
lot more sense.

I'm not sure that I'm up to explaining the basic problem, but I'm going
to try -- and I welcome additional comments from others to help clarify
the whole issue.

The main problem with parallel exception hierarchies, IMHO, is exactly
the same problem as a certain pattern of switch/case statements. So I'm
going to fall back on the overused "shape" hierarchy. If you were going
to code this in C, you might do it something like this:

    #include <assert.h>

    enum shapetype { none, circle, square, triangle };
    struct point { float x, y; };
    struct shapecircle { struct point center; float radius; };
    struct shapesquare { struct point upperleft; float sidelength; };
    struct shapetriangle { struct point vertex[3]; };
    struct shape {
        enum shapetype ShapeType;
        void *shapedata;   /* points to the struct matching ShapeType */
    };
    void drawCircle(struct shape*);
    void drawSquare(struct shape*);
    void drawTriangle(struct shape*);
    void drawShape(struct shape *s) {
        switch(s->ShapeType) {
            case circle:   drawCircle(s);   break;
            case square:   drawSquare(s);   break;
            case triangle: drawTriangle(s); break;
            default:       assert(0);       break; /* Should never get here */
        }
    }

This code demonstrates one of the fundamental problems that C++ tries to
solve: high code maintenance needs. What happens when we try to add a new
shape to the list above?
  * We have to change the ENUM for a new constant, (say) rectangle.
  * We have to define the new struct shaperectangle.
  * We have to create a new drawRectangle() function.
  * We have to modify drawShape!
That last one is the worst part -- here is a single function that has
to know about every possible shape, or else it won't work right! There
are strategies that could be used to automate parts of this, but basically
everything is manual.

Contrast this to the "C++ way."

    struct point { float x, y; }; // Could do more here, but...
    class shape { public:
        virtual ~shape() {}
        virtual void draw() = 0;
    };
    class circle : public shape {
        point center;
        float radius;
    public:
        ~circle() {}
        virtual void draw() { /* ... */ }
    };
    class square : public shape {
        point upperleft;
        float sidelength;
    public:
        ~square() {}
        virtual void draw() { /* ... */ }
    };
    class triangle : public shape {
        point vertex[3];
    public:
        ~triangle() {}
        virtual void draw() { /* ... */ }
    };
I'm sure you've seen this type of demonstration 1,000 times before.
But the main thing I'd like you to take away from it today is,
class shape does not know ANYTHING about the other shapes. In fact,
there isn't any one piece of code that has to know about all of the
others! This means that adding a rectangle is simple:
    class rectangle : public shape {
        point upperleft, lowerright;
    public:
        ~rectangle() {}
        virtual void draw() { /* ... */ }
    };
Now code that calls draw() through a shape pointer or reference correctly
handles the new case without having changed even one line of its source
(in fact, without having to recompile it!)
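
A small usage sketch of that point, assuming C++11 or later. To make the result checkable, draw() here appends the shape's name to a string instead of rendering anything. That signature, and the helpers draw_all and demo, are assumptions of this sketch and not part of the classes above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Stand-in for drawing: draw() records the shape's name in a log string.
class shape {
public:
    virtual ~shape() {}
    virtual void draw(std::string& log) const = 0;
};
class circle : public shape {
public:
    void draw(std::string& log) const override { log += "circle;"; }
};
class rectangle : public shape {   // added later; draw_all is untouched
public:
    void draw(std::string& log) const override { log += "rectangle;"; }
};

// Knows only about shape, never about the concrete classes.
inline std::string draw_all(const std::vector<std::unique_ptr<shape>>& shapes) {
    std::string log;
    for (const auto& s : shapes) s->draw(log);
    return log;
}

// Builds a mixed collection and draws it through the base interface.
inline std::string demo() {
    std::vector<std::unique_ptr<shape>> shapes;
    shapes.push_back(std::unique_ptr<shape>(new circle));
    shapes.push_back(std::unique_ptr<shape>(new rectangle));
    return draw_all(shapes);
}
```

draw_all was written without any knowledge of rectangle, yet it dispatches to rectangle::draw through the virtual call.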

How does all of this apply to parallel exception hierarchies? It's
really the same thing, except that switch/case is spelled try/catch.

Have you ever studied relational databases? When you design two tables
to relate to each other, there are usually three cases to consider:
   1->0 (the record in table A does not correspond to one in table B)
   1->1 (the record in table A corresponds to one in table B)
   1->N, aka 1->many (the record in table A corresponds to multiple
         records in table B).

This fundamental concept recognizes a simple but essential point: Once
you're allowed two of something, you soon may have three, four, six,
twenty...

Notwithstanding all of the C++ code that would break if
   catch(std::logic_error) { /* ... */ }
had to be replaced by
   catch(std::logic_error<wchar_t>) { /* ... */ }
   catch(std::logic_error<char>) { /* ... */ }
there's still the simple problem that these two won't be enough.
There are a lot more than two types of characters in some systems.
There are certainly a lot more than two locales. "Explosive combinations."
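
A minimal sketch of the scenario described above, assuming C++11 or later. It illustrates exception classes templated on the character type in general, not any concrete proposal. The namespace sketch, the class basic_logic_error and the helper which_handler_for are all hypothetical.

```cpp
#include <cassert>
#include <string>

namespace sketch {   // hypothetical; not what std contains or would contain

template <class CharT>
class basic_logic_error {
public:
    explicit basic_logic_error(const std::basic_string<CharT>& msg) : msg_(msg) {}
    const std::basic_string<CharT>& what() const { return msg_; }
private:
    std::basic_string<CharT> msg_;
};

} // namespace sketch

// Each instantiation is an unrelated type, so each needs its own handler.
template <class CharT>
std::string which_handler_for(const std::basic_string<CharT>& msg) {
    try {
        throw sketch::basic_logic_error<CharT>(msg);
    } catch (const sketch::basic_logic_error<char>&) {
        return "narrow handler";
    } catch (const sketch::basic_logic_error<wchar_t>&) {
        return "wide handler";
    } catch (...) {
        return "no matching handler";
    }
}
```

A third character type, such as char16_t, falls through both named handlers, which is the "once you have two, you soon have more" problem in miniature.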

I'm sure that I haven't given this topic due justice, but perhaps others
can explain it more clearly than I have.

---

Author: "Edward Diener" <eldiener@earthlink.net>
Date: Thu, 18 Jul 2002 07:57:21 GMT Raw View

"Peter Dimov" <pdimov@mmltd.net> wrote in message
news:7dc3b1ea.0207170553.76fad632@posting.google.com...
> "Edward Diener" <eldiener@earthlink.net> wrote in message
> news:<0F1Z8.25412$A43.2537905@newsread2.prod.itd.earthlink.net>...
> >
> > Perhaps I do not have enough experience in throwing and catching exceptions,
> > but I truly do not see what is so negative in having parallel exception
> > hierarchies based on the C++ intrinsic character types in order to return
> > strings of that type, whether message IDs or actual locale-based grammatical
> > text.
>
> There is nothing inherently wrong with parallel exception hierarchies
> per se.
>
> What is wrong with the proposed std::basic_exception is not that it's
> evil; it's that it is not a solution to a practical problem. Actual
> experience with parallel exception hierarchies suggests that
> basic_exception<wchar_t> is rarely, if ever, useful. It will remain
> useless even with std:: in front; its main purpose would be to
> generate questions of the sort: "why doesn't std::wstring throw
> wlength_error and wout_of_range?"

If it existed, would it not be useful when throwing an exception from some
standard library wide character implementation ? As an example, for
std::wstring::at(size_type pos) if pos >= size(), wouldn't it logically make
more sense to throw a std::basic_out_of_range<wchar_t> exception, if it
existed, and pass a wide character string to it ? To me it does. And of
course it would then allow exceptions thrown from user-defined wide
character implementations to follow the standard library wide character
exception hierarchy.
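
For comparison, a short sketch of the current behavior: std::wstring::at reports a bad index with the ordinary std::out_of_range, whose what() returns narrow text. The helper name wide_at_throws_narrow_out_of_range is invented for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Returns true if std::wstring::at reports a bad index by throwing
// std::out_of_range, whose what() is a narrow (char) string.
inline bool wide_at_throws_narrow_out_of_range() {
    const std::wstring s = L"abc";
    try {
        (void)s.at(10);               // pos >= size()
    } catch (const std::out_of_range& e) {
        const char* text = e.what();  // narrow text; content is implementation-defined
        return text != nullptr;
    }
    return false;
}
```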


---

Author: pdimov@mmltd.net (Peter Dimov)
Date: Thu, 18 Jul 2002 12:04:31 GMT Raw View

news_comp.std.c++_expires-2002-09-01@nmhq.net (Niklas Matthies) wrote in message news:<slrnajb7kc.s6.news_comp.std.c++_expires-2002-09-01@nightrunner.nmhq.net>...
>
> How about instead adding std::wexception, derived from std::exception and
> providing an additional member function `wchar_t const * wwhat() const'?
> Not that I'm dramatically in favor of supporting "wide exceptions", but
> it seems to me that this would be a simple solution [...]

... to what problem?

> that doesn't hurt
> anyone and provides what is being asked for in this thread.

If it provides what is being asked, why don't people simply implement
wexception and use it throughout their projects? (As some of us
actually did, and dropped it.)

The std:: prefix in front has some obvious implications: it raises
expectations that standard library components know about, and
sometimes throw, a wexception. This means that it is not a cosmetic,
pure addition.

---

Author: Alexander Terekhov <terekhov@web.de>
Date: Thu, 18 Jul 2002 07:39:10 CST Raw View

"Hillel Y. Sims" wrote:
[...]
> Contrary to popular advice, you should never actually do "catch (...) {}" to
> achieve nothrow code, but you can safely do "catch (std::exception&) {}" if
> all exception types are inherited from std::exception.

I guess, you have a rather strange opinion with respect to "nothrow code",
Hillel. Yes, "nothrow code" isn't meant to throw, but it also just-can't-FAIL
[funny destructors with sort-of "error logging" aside] meaning that the caller
is given the guarantee that the operation WILL succeed... or the program will
"die" in core-/crash-/whatever-dump or simply "JIT" debugger. ;-)

> You cannot correctly
> handle an exception that you do not know what type it is and cannot even
> give it a name.

And how do you know "what type it is" if you catch via std::exception or alike?
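
One partial answer, sketched here under the assumption that the handler cares only about a few known standard types: probe the caught base reference with dynamic_cast. The helpers classify and thrown_as are hypothetical, and the sketch says nothing about types the handler has never heard of.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Classifies an exception that was caught by reference to the base class.
// More derived types must be tested before their bases.
inline std::string classify(const std::exception& e) {
    if (dynamic_cast<const std::out_of_range*>(&e))  return "out_of_range";
    if (dynamic_cast<const std::logic_error*>(&e))   return "logic_error";
    if (dynamic_cast<const std::runtime_error*>(&e)) return "runtime_error";
    return "unknown std::exception";
}

// Throws one of three standard exceptions and reports how it was classified.
inline std::string thrown_as(int which) {
    try {
        if (which == 0) throw std::out_of_range("index");
        if (which == 1) throw std::domain_error("domain");   // a logic_error
        throw std::runtime_error("io");
    } catch (const std::exception& e) {
        return classify(e);
    }
}
```

Anything outside the probed set still lands in the last branch, so the base-class handler knows only that it holds some std::exception.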

regards,
alexander.

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Thu, 18 Jul 2002 15:23:31 GMT Raw View

Allan_W@my-dejanews.com (Allan W) wrote in message news:<23b84d65.0207171652.68a03cf@posting.google.com>...
> Notwithstanding all of the C++ code that would break if
>    catch(std::logic_error) { /* ... */ }
> had to be replaced by
>    catch(std::logic_error<wchar_t>) { /* ... */ }
>    catch(std::logic_error<char>) { /* ... */ }
> there's still the simple problem that these two won't be enough.
> There are a lot more than two types of characters in some systems.
> There are certainly a lot more than two locales. "Explosive combinations."
>
> I'm sure that I haven't given this topic due justice, but perhaps others
> can explain it more clearly than I have.
>

Hello, how many times do I have to explain that what you are saying
above is simply not correct?  I am NOT suggesting any change to the
standard exception hierarchy, nor to the set of exceptions explicitly
thrown by the stdlib.  There would NOT, repeat NOT, be any
std::logic_error<wchar_t> since std::logic_error is already defined to
be derived from std::exception, which would not change.

I don't mind arguing fact-based technical arguments, but it gets kind
of tiring when incorrect statements are made as facts to be rebutted.
Please check your sources of information before stating such obvious
errors as valid arguments.

Finally, in response to your comment about "a lot more than two types
of characters", there are only two types of characters with standard
support:  char and wchar_t, for which standard traits classes are
defined.  Thus the combinatorial explosion you predict is highly
unlikely given that the exception classes make use of the string
classes, which are pretty much restricted to char and wchar_t.  This
situation will remain true for all except those who need to go through
the effort to provide their own traits classes for their own character
type, which, if you look into it, is still fairly restrictive.  You
cannot use just any old type as a char type.  See Langer and Kreft
"Standard C++ IOStreams and Locales" for more info on this.

Randy.

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Thu, 18 Jul 2002 17:29:13 GMT Raw View

"Hillel Y. Sims" <usenet@phatbasset.com> wrote in message news:<cU4Z8.62930$6r.1978746@news4.srv.hcvlny.cv.net>...
> > It is NOT even true that this
> > will catch all exceptions occurring during use of the stdlib.
>
> ? As far as I know, all exceptions thrown by std library components are
> inherited from std::exception...?

Again, not completely correct.  The stdlib components use members of
the classes with which the stdlib is used, and those classes may throw
an exception during a stdlib operation, which will then percolate out
of the stdlib function during which it occurred.
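
A minimal sketch of that situation, assuming C++11 or later. The types bad_copy_error and fragile and the helper handler_for_copy_failure are hypothetical. The element type's copy constructor throws something unrelated to std::exception while std::vector::push_back is copying it.

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <vector>

// Deliberately unrelated to std::exception.
struct bad_copy_error {};

// Hypothetical user type whose copy constructor always throws.
struct fragile {
    fragile() = default;
    fragile(const fragile&) { throw bad_copy_error(); }
};

// Returns which handler saw the exception raised inside push_back.
inline std::string handler_for_copy_failure() {
    std::vector<fragile> v;
    fragile f;
    try {
        v.push_back(f);           // copies f, so fragile's copy ctor throws
    } catch (const std::exception&) {
        return "std::exception";
    } catch (const bad_copy_error&) {
        return "bad_copy_error";
    }
    return "none";
}
```

The std::exception handler is skipped; the exception leaves push_back as a bad_copy_error and only the second handler sees it.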

>
> >
> > It is ONLY true that this will catch all exceptions that are, or are
> > publicly derived from, std::exception.  Any user-defined exception
> > that is not publicly derived from std::exception may still occur while
> > using the stdlib, since that library uses the ctor, copy ctor and
> > assignment operator of the class being used with the stdlib.
>
> The fact that user-code throws a custom exception has nothing to do with the
> types of exceptions being thrown by the standard library.

See above.

> >
> > My proposal to implement std::exception as a typedef of
> > std::basic_exception makes no difference to this situation.  Catching
> > only std::exception will still catch only any and all exceptions
> > publicly derived from std::exception.
>
> ....which should be all user-defined exception types in well-written
> software. Changing std::exception to no longer even possibly be the ultimate
> base class of all C++ exceptions would result in a fundamental flaw in the
> C++ exception-handling mechanism.

Since you see so clearly the value of using std::exception as a base
class for your user-defined exceptions, i.e., you use std::exception
as a well-known and standard hook for your own exceptions, perhaps you
may also see the value in providing a similar hook for those who need
to use wide-character exceptions?

Randy.

>
> thanks,
> hys
>
> --
> Hillel Y. Sims
> FactSet Research Systems
> hsims AT factset.com

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Thu, 18 Jul 2002 17:51:33 GMT Raw View

"P.J. Plauger" <pjp@dinkumware.com> wrote in message news:<3d356251$0$17232$724ebb72@reader2.ash.ops.us.uu.net>...
> "Randy Maddox" <rmaddox@isicns.com> wrote in message
> news:8c8b368d.0207160434.229c6802@posting.google.com...
>
> > > Once upon a time, people thought it wise to first look at existing
> > > practice to see what was worth standardizing. Standards are at their
> > > best when they codify existing practice, often at their worst when
> > > they invent. You, and others in this thread, take it as axiomatic
> > > that your current bright ideas should be stuffed into an International
> > > Standard, the sooner the better, to force those recalcitrant users and
> > > implementors to try them out. While I agree there's precedent for that
> > > approach, it's *not* your inalienable right. And it often leads to bad
> > > standards.
> >
> > Sorry, but I must humbly and respectfully disagree with your
> > characterization of my attitude.  I do not assume, and have never
> > suggested, any inalienable right to have any ideas incorporated into
> > the standard.  Nonetheless, I do feel that I do indeed have a right to
> > suggest ideas regarding the standard, and to have those ideas
> > reasonably considered and discussed, whether or not I have the time or
> > resources to be able to implement those ideas myself.
>
> And you do. And you have, at least by my metric. But you have clearly
> set the bar rather higher for what constitutes ``reasonably considered
> and discussed.'' On 3 July, for example, you stated rather petulantly:
>
> >> So be it.  I think your concept is wrong-headed and needlessly
> >> discouraging to developers who work with C++ on a daily basis and may
> >> have some good thoughts about how the language might best evolve to
> >> meet their real needs.  At any rate, you have certainly succeeded in
> >> discouraging me.  I have no intention of participating further in this
> >> discussion.  I've already wasted enough time that was clearly not
> >> appreciated.
> >>
> >> Thanks.  You win.
>
> The strong signal I've been getting from this thread is that there's
> only one reasonable outcome -- the C++ Committee should get their butts
> in gear and do this obvious right thing. Well, our mileage may vary.

You take my remark out of context here.  My complaint was based on the
argument that no one who has not the time or resources to implement an
idea should be allowed to suggest an idea.  Clearly it is simply not
conceivable that the entire universe of C++ developers, most of whom
are too busy delivering product to be able to muck about with changes
to their copy of the stdlib, would never ever have a single idea
worthy of consideration in spite of that defect.  We work with C++
every day, and I suspect that more than a few of us actually do have
good ideas that could be justifiably included in the standard.

>
> > Much of the discussion here has consisted of incorrect arguments
> > imputing results that simply cannot be supported by the facts, or
> > suggestion of additional changes that would impact existing code or
> > practices.
>
> That's your opinion. And yet, there seem to be groups of people who
> have formed a shared world view that differs from yours. Strange.

Yes, there is an entire group in this discussion who cannot see that
simply adding a template base class would have no impact on the
current standard exception hierarchy, or insist that implementation
would require the use of mythical templated catch statements.  These
arguments are simply incorrect technically and therefore not a solid
counter argument.  The mere fact that some people insist on continuing
to restate that which is demonstrably incorrect does not lessen the
technical merit of my suggestions.

Randy.

>
> >         I was extremely careful to suggest ideas that could be
> > implemented in such a way as to have zero impact on existing code or
> > practices, yet resistance to supposed changes required to existing
> > code or practice has been the largest component of the counter
> > arguments.
>
> Also your opinion. I, and others, have expressed opposition for other
> reasons.
>
> > I enjoy reasonable discussion of alternatives, pros and cons, etc.,
> > and I was eager to see enlightened technical arguments on the merits
> > of my proposal.  So much of the argument has been misdirected that the
> > hoped for technical merit discussion has been largely obscured.
>
> Why do I get the feeling that ``enlightened technical arguments on the
> merits'' should include more praise and support than you've garnered?
>
> > I have yet to see any substantive technical argument against this
> > proposal that was not completely rebutted.
>
> I've *never* seen an argument about anything as abstract as a programming
> language that was completely rebutted to everyone's satisfaction. That
> doesn't make any of the arguments, pro or con, dead right or dead wrong.
>
> >                                           Of course, it is pretty
> > much impossible to rebut an argument that is simply incorrect when the
> > presenter of that argument is not willing to listen, but there is
> > nothing to be done about that.
>
> Now *that* I will agree with.
>
> P.J. Plauger
> Dinkumware, Ltd.
> http://www.dinkumware.com
>

---

Author: Pete Becker <petebecker@acm.org>
Date: Thu, 18 Jul 2002 19:24:58 GMT Raw View

Randy Maddox wrote:
>
> My complaint was based on the
> argument that no one who has not the time or resources to implement an
> idea should be allowed to suggest an idea.

You are complaining about an argument that hasn't been made.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Thu, 18 Jul 2002 19:29:18 GMT Raw View

pdimov@mmltd.net (Peter Dimov) wrote in message news:<7dc3b1ea.0207180337.17ff7ac@posting.google.com>...

> The std:: prefix in front has some obvious implications: it raises
> expectations that standard library components know about, and
> sometimes throw, a wexception. This means that it is not a cosmetic,
> pure addition.
>

Au contraire.  My expectation would be that the stdlib itself
explicitly throws only those exceptions which the standard documents
it to throw, which is not the entire set of standard exceptions as it
now exists.  Since I am not proposing any change to the standard
exception hierarchy, or to the stdlib, why would you make this
unwarranted assumption?

Randy.

---

Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: Thu, 18 Jul 2002 15:05:48 CST Raw View

"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:8c8b368d.0207180436.460e0954@posting.google.com...

> > The strong signal I've been getting from this thread is that there's
> > only one reasonable outcome -- the C++ Committee should get their butts
> > in gear and do this obvious right thing. Well, our mileage may vary.
>
> You take my remark out of context here.  My complaint was based on the
> argument that no one who has not the time or resources to implement an
> idea should be allowed to suggest an idea.

Don't recall anybody ever saying that. The issue is how much work others
should have to do just because someone makes a suggestion.

>                                           Clearly it is simply not
> conceivable that the entire universe of C++ developers, most of whom
> are too busy delivering product to be able to muck about with changes
> to their copy of the stdlib, would never ever have a single idea
> worthy of consideration in spite of that defect.

I agree that it's not conceivable. It's also not relevant to this
discussion.

>                                                    We work with C++
> every day, and I suspect that more than a few of us actually do have
> good ideas that could be justifiably included in the standard.

Could be. And you have had described to you on multiple occasions
some of the ways you get things included into an ISO programming
language standard. What I hear in return is whingeing that this is
too much work for all you ``idea'' people out there. And I detect
a strong sense of entitlement -- you have what you think is a good
idea, the C++ committee should do the work to make it a part of a
subsequent standard. Never mind whether the good idea even belongs
in a standard to begin with.

Well it don't work that way. I've accumulated over three years of
total effort, over the past couple of decades, working on several
ANSI and ISO software standards. For practically all that time I've
either been self employed or worked for a company that I own. That's
a nontrivial investment by any metric, not even counting all the
travel costs. If I were to list the things I got put into these
standards, in return for all that effort, it would be a very short
and modest list indeed. Nevertheless I do it, for a host of reasons
good and bad.

You want the C++ committee to consider your good ideas, you have
a role model. However questionable.

> > > Much of the discussion here has consisted of incorrect arguments
> > > imputing results that simply cannot be supported by the facts, or
> > > suggestion of additional changes that would impact existing code or
> > > practices.
> >
> > That's your opinion. And yet, there seem to be groups of people who
> > have formed a shared world view that differs from yours. Strange.
>
> Yes, there is an entire group in this discussion who cannot see that
> simply adding a template base class would have no impact on the
> current standard exception hierarchy, or insist that implementation
> would require the use of mythical templated catch statements.  These
> arguments are simply incorrect technically and therefore not a solid
> counter argument.  The mere fact that some people insist on continuing
> to restate that which is demonstrably incorrect does not lessen the
> technical merit of my suggestions.

It's extremely frustrating when people are slow to get your point.
Imagine for a moment how you might look to some of them.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com



---

Author: pdimov@mmltd.net (Peter Dimov)
Date: Thu, 18 Jul 2002 21:26:14 GMT Raw View

"Edward Diener" <eldiener@earthlink.net> wrote in message news:<qhqZ8.1968$_C2.126446@newsread2.prod.itd.earthlink.net>...
> "Peter Dimov" <pdimov@mmltd.net> wrote in message
> news:7dc3b1ea.0207170553.76fad632@posting.google.com...
> >
> > What is wrong with the proposed std::basic_exception is not that it's
> > evil; it's that it is not a solution to a practical problem. Actual
> > experience with parallel exception hierarchies suggests that
> > basic_exception<wchar_t> is rarely, if ever, useful. It will remain
> > useless even with std:: in front; its main purpose would be to
> > generate questions of the sort: "why doesn't std::wstring throw
> > wlength_error and wout_of_range?"
>
> If it existed, would it not be useful when throwing an exception from some
> standard library wide character implementation ? As an example, for
> std::wstring::at(size_type pos) if pos >= size(), wouldn't it logically make
> more sense to throw a std::basic_out_of_range<wchar_t> exception, if it
> existed, and pass a wide character string to it ? To me it does.

No. That is part of the problem. There is no connection between the
character type of the string, and the character type of the exception.
The former is determined by the contents of the string, and the latter
is determined by the language/encoding you want the error message in.
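
A minimal sketch of the point, not taken from the original post: in the
library as it stands, std::wstring::at() already reports a bad index with
the ordinary std::out_of_range, whose what() is narrow. The function name
below is hypothetical and exists only for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Indexes past the end of a wide string and reports whether the library
// signalled it with the ordinary narrow-message std::out_of_range.
bool wide_at_throws_narrow_out_of_range()
{
    const std::wstring wide = L"abc";
    assert(wide.size() == 3);           // position 10 is therefore invalid
    try {
        (void)wide.at(10);              // throws because pos >= size()
    } catch (const std::out_of_range& e) {
        const char* message = e.what(); // narrow text, whatever the string's character type
        return message != 0;
    }
    return false;                       // not reached on a conforming library
}
```

The string holds wchar_t, yet the handler receives a const char* message;
the two character types are chosen independently.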

Another example:

FILE* fopen(char const * name, char const * mode);

The first 'char' is different from the second 'char' (and both are
different from the two 'chars' discussed above.) The easy, and
incorrect, way to "go international" is to mechanically replace 'char'
with 'wchar_t' everywhere, like Microsoft did, for example:

FILE* _wfopen(wchar_t const * name, wchar_t const * mode);

---

Author: "Edward Diener" <eldiener@earthlink.net>
Date: Thu, 18 Jul 2002 22:52:07 GMT Raw View

"Peter Dimov" <pdimov@mmltd.net> wrote in message
news:7dc3b1ea.0207181112.2b0e6a7@posting.google.com...
> "Edward Diener" <eldiener@earthlink.net> wrote in message
> news:<qhqZ8.1968$_C2.126446@newsread2.prod.itd.earthlink.net>...
> > "Peter Dimov" <pdimov@mmltd.net> wrote in message
> > news:7dc3b1ea.0207170553.76fad632@posting.google.com...
> > >
> > > What is wrong with the proposed std::basic_exception is not that it's
> > > evil; it's that it is not a solution to a practical problem. Actual
> > > experience with parallel exception hierarchies suggests that
> > > basic_exception<wchar_t> is rarely, if ever, useful. It will remain
> > > useless even with std:: in front; its main purpose would be to
> > > generate questions of the sort: "why doesn't std::wstring throw
> > > wlength_error and wout_of_range?"
> >
> > If it existed, would it not be useful when throwing an exception from some
> > standard library wide character implementation ? As an example, for
> > std::wstring::at(size_type pos) if pos >= size(), wouldn't it logically make
> > more sense to throw a std::basic_out_of_range<wchar_t> exception, if it
> > existed, and pass a wide character string to it ? To me it does.
>
> No. That is part of the problem. There is no connection between the
> character type of the string, and the character type of the exception.
> The former is determined by the contents of the string, and the latter
> is determined by the language/encoding you want the error message in.
>

I think there is a connection. A wide character implementation of something
which parallels a similar narrow character implementation of the same
concept, such as std::wstring paralleling std::string, is much more likely
to benefit from a parallel wide character exception in the place of the
equivalent narrow character exception because:

1) The programmer using the wide character implementation is much more
likely to be able to catch and handle the wide character exception
effectively since that programmer has already decided to use the parallel
wide character implementation.

2) The wide character exception may provide a what() wide character string
to the catch handler which the programmer catching the exception can use in
that programmer's language encoding

3) The thrower of the exception can incorporate a wide character string as
the what() value which reflects a particular wide character encoding, one
that may even be related to the wide character encoding of the std::wstring
itself.

While there is no mandated connection between a wide character
implementation, such as std::wstring and my theoretical wide character
exception, logically I believe that a very pragmatic connection should
exist.

My final argument regarding this is that wide character encodings are
normally a superset of narrow character encodings so that if our mythical
implementor of the std::wstring at() exception did have a parallel wide
character exception at his disposal, it would be easy to incorporate a
narrow character message such as L"Index value XX is out of bounds" if
necessary. The opposite is not true. A Japanese implementor of a Japanese
version of the C++ standard library and the std::wstring at() exception has
no way to incorporate a wide character Kanji message in the current narrow
character out of range exception. Having a wide character exception for the
wide character implementation would allow that.

BTW I am aware of PJ Plauger's argument that they could use a MBCS string. I
just don't regard it as the best ideal solution and would rather see the C++
standard library move in the direction of greater wide character acceptance
rather than rely on what I believe will be largely an outdated technology in
the future as wide characters and Unicode encodings gain more acceptance and
become more the norm.

---

Author: "Hillel Y. Sims" <usenet@phatbasset.com>
Date: Fri, 19 Jul 2002 15:27:17 GMT Raw View

"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:8c8b368d.0207181127.8168346@posting.google.com...
> Au contraire.  My expectation would be that the stdlib itself
> explicitly throws only those exceptions which the standard documents
> it to throw, which is not the entire set of standard exceptions as it
> now exists.  Since I am not proposing any change to the standard
> exception hierarchy, or to the stdlib, why would you make this
> unwarranted assumption?
>

Now you are contradicting your argument that there would be no impact on
current exception handling -- "not proposing any change to the standard
exception hierarchy"?? std::exception is the base class of all standard
library exceptions. Your proposal for a templated basic_exception<T> which
std::exception would be derived from would totally alter that dynamic...

thanks,
hys

--
Hillel Y. Sims
FactSet Research Systems
hsims AT factset.com

---

Author: "Hillel Y. Sims" <usenet@phatbasset.com>
Date: Fri, 19 Jul 2002 15:28:12 GMT Raw View

"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:8c8b368d.0207180454.7edbd121@posting.google.com...
> Allan_W@my-dejanews.com (Allan W) wrote in message
> news:<23b84d65.0207171652.68a03cf@posting.google.com>...
> > Notwithstanding all of the C++ code that would break if
> >    catch(std::logic_error) { /* ... */ }
> > had to be replaced by
> >    catch(std::logic_error<wchar>) { /* ... */ }
> >    catch(std::logic_error<char>) { /* ... */ }
> > there's still the simple problem that these two won't be enough.
> > There are a lot more than two types of characters in some systems.
> > There are certainly a lot more than two locales. "Explosive
> > combinations."
> >
> > I'm sure that I haven't given this topic due justice, but perhaps others
> > can explain it more clearly than I have.
> >
>
> Hello, how many times do I have to explain that what you are saying
> above is simply not correct?  I am NOT suggesting any change to the
> standard exception hierarchy,

From my interpretation, you most certainly are. You would like to change
std::exception from its status as THE ultimate base class of std library
exceptions (and idealistically, all user exceptions) to something with an
unbounded number of sibling classes.

> nor to the set of exceptions explicitly
> thrown by the stdlib.  There would NOT, repeat NOT, be any
> std::logic_error<wchar_t> since std::logic_error is already defined to
> be derived from std::exception, which would not change.

Then what is the point of this whole discussion? Why does std::exception
need to change at all then? Just make your own custom exception class(es)
that use wchar_t and use them however you like. Even better, you could
_inherit_ them from std::exception too! (You know you want to...)
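
A minimal sketch of that suggestion, not from the original post: a
user-defined exception that carries a wide message and still derives from
std::exception. The names wide_error and wwhat() are hypothetical, and the
noexcept/override spellings are from later C++ than this thread.

```cpp
#include <cassert>
#include <exception>
#include <string>

// Hypothetical user-defined class: the names wide_error and wwhat() are
// illustrative only and come from no standard or library.
class wide_error : public std::exception
{
public:
    explicit wide_error(const std::wstring& message) : message_(message) {}

    // Narrow fallback for handlers that only know std::exception.
    const char* what() const noexcept override { return "wide_error: see wwhat()"; }

    // Wide text for handlers that know this class.
    const wchar_t* wwhat() const noexcept { return message_.c_str(); }

private:
    std::wstring message_;
};

// Throws a wide_error and reports whether a plain std::exception handler
// receives it.
bool caught_by_std_exception_handler()
{
    try {
        throw wide_error(L"index out of bounds");
    } catch (const std::exception& e) {
        assert(e.what() != 0);
        return true;
    }
    return false;
}
```

Existing catch (const std::exception&) handlers keep working, and code that
knows about wide_error can catch it by name and read the wide text.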


>
> I don't mind arguing fact-based technical arguments, but it gets kind
> of tiring when incorrect statements are made as facts to be rebutted.
> Please check your sources of information before stating such obvious
> errors as valid arguments.
>

The only "sources" of information here come from a rambling thread on usenet
with about a million posts at this point; I'm not sure it is quite fair to
make this kind of statement. Perhaps consider why so many people are
seemingly misinterpreting your intent so frequently.

thanks,
hys

--
Hillel Y. Sims
FactSet Research Systems
hsims AT factset.com

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Fri, 19 Jul 2002 19:01:00 GMT Raw View

"Edward Diener" <eldiener@earthlink.net> wrote in message news:<pAHZ8.3683$_C2.261992@newsread2.prod.itd.earthlink.net>...
>
> BTW I am aware of PJ Plauger's argument that they could use a MBCS string. I
> just don't regard it as the best ideal solution and would rather see the C++
> standard library move in the direction of greater wide character acceptance
> rather than rely on what I believe will be largely an outdated technology in
> the future as wide characters and Unicode encodings gain more acceptance and
> become more the norm.

Absolutely!  Putting an MBCS string into a narrow character exception
seems like a throwback to old-style C code.  And if you are catching
exceptions that are all narrow character, how do you know which are
actually MBCS strings as opposed to simply char strings?

At least with both narrow and wide character exceptions you would know
for sure what type was really being returned by the what() member.
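
A minimal sketch of the ambiguity, assuming for illustration that the
multibyte encoding is UTF-8 (the thread does not name one): the message is a
single CJK character, U+7BC4, stored in a narrow exception. The function
name is hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>

// Builds a narrow exception whose message is one CJK character (U+7BC4)
// encoded as three UTF-8 bytes, and returns the byte length a handler sees.
std::size_t narrow_message_byte_length()
{
    const std::runtime_error error("\xE7\xAF\x84");
    const char* text = error.what();   // the type says "char", not which encoding
    assert(text != 0);
    return std::strlen(text);          // counts bytes, not characters
}
```

Nothing in the type of what() tells the handler whether those three bytes are
one multibyte character or three single-byte ones.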

Randy.

---

Author: pdimov@mmltd.net (Peter Dimov)
Date: Fri, 19 Jul 2002 21:06:50 GMT Raw View

"Edward Diener" <eldiener@earthlink.net> wrote in message news:<pAHZ8.3683$_C2.261992@newsread2.prod.itd.earthlink.net>...
> "Peter Dimov" <pdimov@mmltd.net> wrote in message
> news:7dc3b1ea.0207181112.2b0e6a7@posting.google.com...
> > "Edward Diener" <eldiener@earthlink.net> wrote in message
>  news:<qhqZ8.1968$_C2.126446@newsread2.prod.itd.earthlink.net>...
> > > "Peter Dimov" <pdimov@mmltd.net> wrote in message
> > > news:7dc3b1ea.0207170553.76fad632@posting.google.com...
> > > >
> > > > What is wrong with the proposed std::basic_exception is not that it's
> > > > evil; it's that it is not a solution to a practical problem. Actual
> > > > experience with parallel exception hierarchies suggests that
> > > > basic_exception<wchar_t> is rarely, if ever, useful. It will remain
> > > > useless even with std:: in front; its main purpose would be to
> > > > generate questions of the sort: "why doesn't std::wstring throw
> > > > wlength_error and wout_of_range?"
> > >
> > > If it existed, would it not be useful when throwing an exception from some
> > > standard library wide character implementation ? As an example, for
> > > std::wstring::at(size_type pos) if pos >= size(), wouldn't it logically make
> > > more sense to throw a std::basic_out_of_range<wchar_t> exception, if it
> > > existed, and pass a wide character string to it ? To me it does.
> >
> > No. That is part of the problem. There is no connection between the
> > character type of the string, and the character type of the exception.
> > The former is determined by the contents of the string, and the latter
> > is determined by the language/encoding you want the error message in.
> >
>
> I think there is a connection. A wide character implementation of something
> which parallels a similar narrow character implementation of the same
> concept, such as std::wstring paralleling std::string, is much more likely
> to benefit from a parallel wide character exception in the place of the
> equivalent narrow character exception because:

[...]

OK. What should basic_string<MyChar> throw? vector<wchar_t>?
my_algorithm(It first, It last) when invoked with wide_str.begin(),
wide_str.end()?

[...]

> My final argument regarding this is that wide character encodings are
> normally a superset of narrow character encodings so that if our mythical
> implementor of the std::wstring at() exception did have a parallel wide
> character exception at his disposal, it would be easy to incorporate a
> narrow character message such as L"Index value XX is out of bounds" if
> necessary. The opposite is not true. A Japanese implementor of a Japanese
> version of the C++ standard library and the std::wstring at() exception has
> no way to incorporate a wide character Kanji message in the current narrow
> character out of range exception. Having a wide character exception for the
> wide character implementation would allow that.

Good argument, let's turn it around. If std::wstring throws a wide
exception, why should the Japanese implementor be constrained to
throwing a narrow exception type from std::string? The "no connection"
argument works the other way, too. The choice of the appropriate
exception depends, in this case, on the nationality of the vendor, not
on the contents of the string being manipulated.

---

Author: pdimov@mmltd.net (Peter Dimov)
Date: Fri, 19 Jul 2002 21:07:07 GMT Raw View

rmaddox@isicns.com (Randy Maddox) wrote in message news:<8c8b368d.0207181127.8168346@posting.google.com>...
> pdimov@mmltd.net (Peter Dimov) wrote in message news:<7dc3b1ea.0207180337.17ff7ac@posting.google.com>...
>
> > The std:: prefix in front has some obvious implications: it raises
> > expectations that standard library components know about, and
> > sometimes throw, a wexception. This means that it is not a cosmetic,
> > pure addition.
> >
>
> Au contraire.  My expectation would be that the stdlib itself
> explicitly throws only those exceptions which the standard documents
> it to throw, which is not the entire set of standard exceptions as it
> now exists.  Since I am not proposing any change to the standard
> exception hierarchy, or to the stdlib, why would you make this
> unwarranted assumption?

Which unwarranted assumption?

There are two cases: either the standard library throws wexception, or
it doesn't.

Case 1: it does throw wexception. See above.

Case 2: it doesn't throw wexception.

2.1. people will ask why. Trust me.
2.2. if wexception has nothing in common with the stdlib, why should
it be standard? What is the added value that the std:: prefix will
bring? If a standard 'wexception' is needed, why is there no existing
implementation acting as a de-facto standard?

---

Author: "Edward Diener" <eldiener@earthlink.net>
Date: Sat, 20 Jul 2002 04:54:31 CST Raw View

"Peter Dimov" <pdimov@mmltd.net> wrote in message
news:7dc3b1ea.0207190247.5db6bf67@posting.google.com...
> "Edward Diener" <eldiener@earthlink.net> wrote in message
> news:<pAHZ8.3683$_C2.261992@newsread2.prod.itd.earthlink.net>...
> > "Peter Dimov" <pdimov@mmltd.net> wrote in message
> > news:7dc3b1ea.0207181112.2b0e6a7@posting.google.com...
> > > "Edward Diener" <eldiener@earthlink.net> wrote in message
> > >  news:<qhqZ8.1968$_C2.126446@newsread2.prod.itd.earthlink.net>...
> > > > "Peter Dimov" <pdimov@mmltd.net> wrote in message
> > > > news:7dc3b1ea.0207170553.76fad632@posting.google.com...
> > > > >
> > > > > What is wrong with the proposed std::basic_exception is not that it's
> > > > > evil; it's that it is not a solution to a practical problem. Actual
> > > > > experience with parallel exception hierarchies suggests that
> > > > > basic_exception<wchar_t> is rarely, if ever, useful. It will remain
> > > > > useless even with std:: in front; its main purpose would be to
> > > > > generate questions of the sort: "why doesn't std::wstring throw
> > > > > wlength_error and wout_of_range?"
> > > >
> > > > If it existed, would it not be useful when throwing an exception from some
> > > > standard library wide character implementation ? As an example, for
> > > > std::wstring::at(size_type pos) if pos >= size(), wouldn't it logically make
> > > > more sense to throw a std::basic_out_of_range<wchar_t> exception, if it
> > > > existed, and pass a wide character string to it ? To me it does.
> > >
> > > No. That is part of the problem. There is no connection between the
> > > character type of the string, and the character type of the exception.
> > > The former is determined by the contents of the string, and the latter
> > > is determined by the language/encoding you want the error message in.
> > >
> >
> > I think there is a connection. A wide character implementation of something
> > which parallels a similar narrow character implementation of the same
> > concept, such as std::wstring paralleling std::string, is much more likely
> > to benefit from a parallel wide character exception in the place of the
> > equivalent narrow character exception because:
>
> [...]
>
> OK. What should basic_string<MyChar> throw?

If it's not specialized, a narrow character exception. One can decide to
specialize in order to throw the narrow character or wide character
exception as appropriate.

> vector<wchar_t>?

Wide character exception if the vector is specialized to do so, otherwise
narrow character exceptions.

> my_algorithm(It first, It last) when invoked with wide_str.begin(),
> wide_str.end()?

Unless specialized for some wide character sequence, it doesn't know what it
is invoked with, so it would throw some narrow character exception.

All of your examples are good and there is no rule that needs to be set in
stone but, for the most part, good pragmatic decisions can be made, both for
the standard library exceptions and for user-defined exceptions on their
class templates and specializations. I totally agree that there can often be
a gray area, and I am not striving for absolute rules. I think good
documentation is the key once a choice is made, just like any programmer
catching exceptions should understand what are the possible exceptions which
can be thrown and what are the reasons a possible exception might be thrown.
I do understand your point. The closest pragmatic rule which I would make is
that the wide character exception should be thrown when a class template is
clearly specialized for wide characters as opposed to narrow characters.
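
One way such a rule could be sketched, under the assumption that a wide
counterpart to std::out_of_range existed: a small trait picks the exception
type from the character type. Every name here (wout_of_range,
range_error_for, checked_at, kind_thrown, check_rule) is hypothetical and
not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wide counterpart of std::out_of_range; no such class is standard.
class wout_of_range
{
public:
    explicit wout_of_range(const std::wstring& message) : message_(message) {}
    const wchar_t* what() const { return message_.c_str(); }
private:
    std::wstring message_;
};

// Primary template: any character type gets the narrow exception.
template <typename CharT>
struct range_error_for
{
    static std::out_of_range make() { return std::out_of_range("index out of range"); }
};

// Explicit specialization: wchar_t gets the wide exception.
template <>
struct range_error_for<wchar_t>
{
    static wout_of_range make() { return wout_of_range(L"index out of range"); }
};

// Bounds-checked access that throws whichever type the trait selects.
template <typename CharT>
CharT checked_at(const std::basic_string<CharT>& text, std::size_t pos)
{
    if (pos >= text.size())
        throw range_error_for<CharT>::make();
    return text[pos];
}

// Returns 'n' if a narrow exception was thrown, 'w' if a wide one, '-' if none.
template <typename CharT>
char kind_thrown(const std::basic_string<CharT>& text, std::size_t pos)
{
    try {
        (void)checked_at(text, pos);
    } catch (const std::out_of_range&) {
        return 'n';
    } catch (const wout_of_range&) {
        return 'w';
    }
    return '-';
}

// Exercises the rule for both character types.
void check_rule()
{
    assert(kind_thrown(std::string("abc"), 10) == 'n');
    assert(kind_thrown(std::wstring(L"abc"), 10) == 'w');
    assert(kind_thrown(std::string("abc"), 1) == '-');
}
```

Note that generic callers then need a handler for each exception type, which
is the cost the other side of this discussion is pointing at.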

>
> [...]
>
> > My final argument regarding this is that wide character encodings are
> > normally a superset of narrow character encodings so that if our mythical
> > implementor of the std::wstring at() exception did have a parallel wide
> > character exception at his disposal, it would be easy to incorporate a
> > narrow character message such as L"Index value XX is out of bounds" if
> > necessary. The opposite is not true. A Japanese implementor of a Japanese
> > version of the C++ standard library and the std::wstring at() exception has
> > no way to incorporate a wide character Kanji message in the current narrow
> > character out of range exception. Having a wide character exception for the
> > wide character implementation would allow that.
>
> Good argument, let's turn it around. If std::wstring throws a wide
> exception, why should the Japanese implementor be constrained to
> throwing a narrow exception type from std::string?

Because he is dealing with a narrow character string and his
user-programmers catching the exception have chosen to use a narrow
character string and therefore should accept dealing with a narrow character
exception.

> The "no connection"
> argument works the other way, too. The choice of the appropriate
> exception depends, in this case, on the nationality of the vendor, not
> on the contents of the string being manipulated.

That wasn't really my point, that just because there is an implementation in
a language encoding the type of exception must follow the language encoding.
I was just arguing that some implementation in a wide character language
encoding should have the facility to create wide character exceptions for
wide character implementations and this would happen if the standard library
were changed to support a parallel wide character exception hierarchy and if
some of the most obvious wide character implementations were changed to
throw wide character exceptions instead of narrow character ones. What those
standard library wide character implementations might be is open to
debate as per above.

Yes, it would mean, and this is purely my conception, besides adding a
parallel exception hierarchy for wide character exceptions,
actually changing the standard library and "breaking" some code. There, I
have said it; the cat is out of the bag and I have committed the ultimate
C++ sin <g>. Really, I don't think the C++ language and library can always
intelligently move forward and address new ideas and uses without correcting
some of the possible design mistakes of the past. It's necessary to get over
this idea that moving forward must be done in such a way as to never break
any code that might exist from the past. It has become a mania of the C++
ethos and, like any "foolish consistency", needs to be reconsidered. While
striving to maintain backward compatibility is a noble goal, if it impedes
the future excellence of a computer language and its library in some
important instance, it needs to be disregarded for that instance.


---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Mon, 22 Jul 2002 15:04:50 GMT Raw View

"Hillel Y. Sims" <usenet@phatbasset.com> wrote in message news:<aMMZ8.87570$6r.2942855@news4.srv.hcvlny.cv.net>...
> "Randy Maddox" <rmaddox@isicns.com> wrote in message
> news:8c8b368d.0207180454.7edbd121@posting.google.com...
> > Allan_W@my-dejanews.com (Allan W) wrote in message
>  news:<23b84d65.0207171652.68a03cf@posting.google.com>...
> > > Notwithstanding all of the C++ code that would break if
> > >    catch(std::logic_error) { /* ... */ }
> > > had to be replaced by
> > >    catch(std::logic_error<wchar>) { /* ... */ }
> > >    catch(std::logic_error<char>) { /* ... */ }
> > > there's still the simple problem that these two won't be enough.
> > > There are a lot more than two types of characters in some systems.
> > > There are certainly a lot more than two locales. "Explosive
> > > combinations."
> > >
> > > I'm sure that I haven't given this topic due justice, but perhaps others
> > > can explain it more clearly than I have.
> > >
> >
> > Hello, how many times do I have to explain that what you are saying
> > above is simply not correct?  I am NOT suggesting any change to the
> > standard exception hierarchy,
>
> From my interpretation, you most certainly are. You would like to change
> std::exception from its status as THE ultimate base class of std library
> exceptions (and idealistically, all user exceptions) to something with an
> unbounded number of sibling classes.
>
> > nor to the set of exceptions explicitly
> > thrown by the stdlib.  There would NOT, repeat NOT, be any
> > std::logic_error<wchar_t> since std::logic_error is already defined to
> > be derived from std::exception, which would not change.
>
> Then what is the point of this whole discussion? Why does std::exception
> need to change at all then? Just make your own custom exception class(es)
> that use wchar_t and use them however you like. Even better, you could
> _inherit_ them from std::exception too! (You know you want to...)

First, std::exception would really NOT change at all.  What I am
suggesting is:

  template <typename Char_t>
  class basic_exception
  {
  public:
    explicit basic_exception(const std::basic_string<Char_t> & what_arg);

    const Char_t * what() const;

    // other members as currently exist
  };

  typedef basic_exception<char> exception;

At this point, std::exception is exactly the same class, and exactly
the same code, as it ever was.  It is NOT derived from a template, it
is a specific instantiation of a template.  As such it is a specific
type, and the code generated from the template is exactly the same
code as currently exists for std::exception.  Since the other standard
exceptions are all derived from std::exception, they too are exactly
as they ever were.  And since no change is proposed to the stdlib,
everything there is exactly the same as it ever was.

The one difference, however, is that now anyone wanting to use
wide-character exceptions will have a standard base class to derive
from.  Since it has so often been pointed out how important this
technique is for narrow character exceptions I am sure the value of
the corresponding technique for wide character exceptions is self
evident.  The suggestion of inheriting wide character exceptions from
std::exception would obviously not work as well due to the fact that
std::exception is currently hard-wired to support only narrow
character exceptions.

Of course, anyone wanting to use wide character exceptions now can
implement their own.  Anyone using any exceptions can implement their
own for that matter.  The reason we customarily derive exceptions from
std::exception is exactly the reason we need wexception as the common,
standard base for wide character exceptions.
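
A minimal sketch of the arrangement described above, placed in a hypothetical namespace `sketch` so that it does not collide with the real `std::exception`. The `public:` specifier, the virtual destructor, the stored message member, the `wexception` typedef and the `wide_file_error` class are assumptions added so that the fragment compiles; none of them is part of any standard library.

```cpp
#include <string>

namespace sketch {  // hypothetical namespace; nothing here is in std

template <typename Char_t>
class basic_exception
{
public:
    explicit basic_exception(const std::basic_string<Char_t>& what_arg)
        : msg_(what_arg) {}
    virtual ~basic_exception() {}

    // Returns the message in the character type of this instantiation.
    virtual const Char_t* what() const { return msg_.c_str(); }

private:
    std::basic_string<Char_t> msg_;
};

typedef basic_exception<char>    exception;   // narrow base
typedef basic_exception<wchar_t> wexception;  // wide base

// Hypothetical user-defined wide-character exception.
class wide_file_error : public wexception
{
public:
    explicit wide_file_error(const std::wstring& name)
        : wexception(L"cannot open " + name) {}
};

// Throws a wide_file_error and returns the message seen through the
// wide base class.
inline std::wstring demo_message()
{
    std::wstring seen;
    try {
        throw wide_file_error(L"data.txt");
    } catch (const wexception& e) {
        seen = e.what();
    }
    return seen;
}

}  // namespace sketch
```

In this sketch `sketch::exception` and `sketch::wexception` are two unrelated types, so a handler written for one does not catch objects derived from the other.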

Now am I making sense?

Randy.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Pete Becker <petebecker@acm.org>
Date: Sat, 13 Jul 2002 16:42:58 GMT Raw View

Pete Becker wrote:
>
> "Martin v. Löwis" wrote:
> >
> > Pete Becker <petebecker@acm.org> writes:
> >
> > > > > What should the name L"aebc" mean when an application is compiled for an
> > > > > OS that doesn't support file names with character sizes corresponding to
> > > > > wchar_t?
> > > >
> > > > Well, something similar to "aebc", i.e. whatever the implementation
> > > > likes. What does "aebc" mean when an application is compiled for
> > > > an OS that doesn't support file names with character sizes
> > > > corresponding to /char/ ?
> > >
> > > I don't know. What is your experience with such systems?
> >
> > In my experience, on such systems, the char file names are converted
> > to wchar file names, using mbstowcs (they call that
> > MultiByteToWideChar, as they need to pass an additional parameter
> > depending on the type of binary you are executing).
> >
>
> If MultiByteToWideChar is the Win32 API function, then you aren't
> talking about a system that doesn't support file names with character
> sizes corresponding to char.
>

Unless, of course, you're talking about Windows CE. <g>

But to get back to my point, is it your contention that the mere
existence of Windows CE invalidates 20 years of experience in using
portable file names? In which case, of course, there simply is no way to
write file names that are reasonably portable. So, naturally, the
solution is to add still more ways of naming files that we don't
understand.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

---

Author: Alexander Terekhov <terekhov@web.de>
Date: Sat, 13 Jul 2002 19:32:21 GMT Raw View

Pete Becker wrote:
>
> Richard J Cox wrote:
> >
> > In article <3D2B2DC8.C2C2A633@acm.org>, petebecker@acm.org (Pete Becker)
> > wrote:
> >
> > > Further, "aebc" contains the same characters without regard to OS,
> > > compiler, or locale.
> >
> > Does it? What about ASCII vs. EBCDIC?
>
> Makes no difference: the array holds the same characters. "aebc"[1] ==
> 'e', regardless of the underlying encoding.

Well, consider: < ;-) ;-) >

http://www.ibm.com/software/ad/c390/czos/versions/czosv1r2.html
("ASCII Support")

http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/EDCLB120/3.6?SHELF=CBCBS120
("3.6 Enhanced ASCII Support")

http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/CBCPG120/7.7.4?SHELF=CBCBS120
("7.7.4 Coded Character Set Independence in Developing Applications")

http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/CBCPG120/7.7.4.2.1?SHELF=CBCBS120
("7.7.4.2.1 CONVLIT Compiler Option")

"....
 Consider the following program:

     /* header.h */
     char *text="Hello World";


     /* test.c */
     #pragma convlit(suspend)
     #pragma comment (user, "A user comment")

     #include <stdio.h>
     #include "header.h"
     #pragma convlit(resume)

     main (){
         char *text2 ="Hi There!";
            }


 When this program is compiled with the CONVLIT(ISO8859-1) option,
 the string "Hi There!" will be converted to an ASCII string, but
 the string "Hello World" will not be converted.
 ----------------------------------------------------------------
 © Copyright IBM Corp. 1996, 2002"

regards,
alexander.

---

Author: Pete Becker <petebecker@acm.org>
Date: Sun, 14 Jul 2002 09:56:01 GMT Raw View

Peter Dimov wrote:
>
> Pete Becker <petebecker@acm.org> wrote in message news:<3D2DB549.8E373619@acm.org>...
> > "Martin v. Löwis" wrote:
> > >
> > > In most
> > > systems using byte-based file names, the meaning of a file name (in
> > > terms of characters) can vary with the locale.
> > >
> >
> > There is a well-understood set of rules for creating portable names.
>
> There sure is, but this doesn't help a C++ program that needs to open
> an existing file with a non-portable name. When the OS uses wide
> character file names, the same narrow character literal doesn't
> necessarily name the same file.
>
> When the input character type, in_char_t, and the OS character type,
> os_char_t, are different, we encounter implementation defined
> translation, with its associated issues. This is true for in_char_t ==
> wchar_t and os_char_t == char, but it's equally true for in_char_t ==
> char and os_char_t == wchar_t.
>

Is it your position, then, that the issues involved in writing portable
file names with wide characters are as well understood today as the
issues involved in writing portable file names with ordinary characters?

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

---

Author: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: Mon, 15 Jul 2002 07:50:52 CST Raw View

Pete Becker <petebecker@acm.org> writes:

> If MultiByteToWideChar is the Win32 API function, then you aren't
> talking about a system that doesn't support file names with character
> sizes corresponding to char.

I sure do. On the NTFS file system, you cannot store file names as
sequences of bytes (likewise for Joliet, and VFAT with long file
names). They do provide an API that gives the illusion of char* file
names, but all this API does is convert the char* file names to wide
strings before passing them to the file system.

As a result, changing the locale may change the char strings that
you have to use to access the file names.
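
The locale dependence described above can be sketched with the standard C function `std::mbstowcs`, which converts according to the `LC_CTYPE` category of the current C locale. This is only an illustration of the general mechanism, not of what any particular file system API does internally; the helper name `narrow_to_wide` is hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Converts a narrow (multibyte) name to a wide string using the current
// C locale's LC_CTYPE rules. Returns false if the bytes do not form a
// valid multibyte sequence in that locale. Hypothetical helper.
inline bool narrow_to_wide(const std::string& narrow, std::wstring& wide)
{
    // A wide string never needs more elements than the narrow one has
    // bytes; one extra element leaves room for the terminator.
    std::vector<wchar_t> buf(narrow.size() + 1);
    const std::size_t n = std::mbstowcs(&buf[0], narrow.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return false;  // invalid multibyte sequence for this locale
    wide.assign(&buf[0], n);
    return true;
}
```

Calling `std::setlocale(LC_CTYPE, ...)` with a different locale before the conversion can make the same bytes produce a different wide string, or fail to convert at all, which is the ambiguity at issue.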

Regards,
Martin

---

Author: pdimov@mmltd.net (Peter Dimov)
Date: Mon, 15 Jul 2002 14:46:09 GMT Raw View

Pete Becker <petebecker@acm.org> wrote in message news:<3D2F7FA3.F7922B9B@acm.org>...
> Peter Dimov wrote:
> >
> > Pete Becker <petebecker@acm.org> wrote in message news:<3D2DB549.8E373619@acm.org>...
> > > "Martin v. Löwis" wrote:
> > > >
> > > > In most
> > > > systems using byte-based file names, the meaning of a file name (in
> > > > terms of characters) can vary with the locale.
> > > >
> > >
> > > There is a well-understood set of rules for creating portable names.
> >
> > There sure is, but this doesn't help a C++ program that needs to open
> > an existing file with a non-portable name. When the OS uses wide
> > character file names, the same narrow character literal doesn't
> > necessarily name the same file.
> >
> > When the input character type, in_char_t, and the OS character type,
> > os_char_t, are different, we encounter implementation defined
> > translation, with its associated issues. This is true for in_char_t ==
> > wchar_t and os_char_t == char, but it's equally true for in_char_t ==
> > char and os_char_t == wchar_t.
> >
>
> Is it your position, then, that the issues involved in writing portable
> file names with wide characters are as well understood today as the
> issues involved in writing portable file names with ordinary characters?

Interesting question. As we have already established, "writing" a
portable wide character string is pretty much impossible in C++, no
matter whether it denotes a file name or not.

This aside, I admit that it is possible that wide character file names
are not as well understood as narrow character file names.

It is my position, however, that this doesn't affect our discussion in
any significant way. If you want a portable file name, use a narrow
character string. If you need to open an existing file on a wide
character OS without resorting to platform-specific facilities, you
must either be able to represent all wide character file names as a
narrow character string (not possible today), _or_ be able to name the
file using a wide character string (not possible today.)

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Mon, 15 Jul 2002 14:46:25 GMT Raw View

James Dennett <jdennett@acm.org> wrote in message news:<3D2DD9D1.7010109@acm.org>...
> Randy Maddox wrote:
> > Since the thread here is getting so deeply nested, let's see
> > if I can summarize the arguments again.
> >
> > First, there no longer seems to be any argument about wide
> > character support in template what arguments to be provided
> > by a templatized base class along the lines of what was done
> > for strings with basic_string, string and wstring.  Reusing
> > a previous solution seems like a good idea, and this would
> > have zero impact on existing code.  Those who need and want
> > wide character exception what arguments could use exceptions
> > derived from wexception, while those who don't would merely
> > continue to use exceptions derived from exception.  The
> > entire std::exception hierarchy would remain exactly as is.
> >
> > I may be wrong about that discussion being over, but no one
> > seems to be saying anything about it anymore.
>
> It seemed to me that the proposal "lost" -- there was no good
> argument for adding anything like std::basic_exception, and
> there are good reasons to avoid it, including
> * it breaks the idiom of catching std::exception&, as that
>    would no longer catch the whole standard exception hierarchy;
>    hence, it does have impact on existing code in libraries in
>    spite of the claim made above, because those libraries will
>    be re-used in future and would require modification

Why would you make this statement as if it were true?  Please explain
to me how the following would in any way have this impact:

template <typename Char_t>
class basic_exception
{
  explicit basic_exception(const basic_string<Char_t> & what_arg);

  const Char_t * what() const;

  // other members as currently exist
};

typedef basic_exception<char> exception;

// rest of standard exception hierarchy as currently exists

If your argument were true, then it would directly imply that the
existing typedef of string as basic_string<char> would cause problems
with the string class, which is demonstrably not true.  I am not
proposing some off-the-wall change, but rather a tried-and-true
technique that is already used in the stdlib.  Since use of this
technique has already been demonstrated to work with the string
classes, why would you believe that it would not work identically with
the exception classes?

Randy.

---

Author: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: Mon, 15 Jul 2002 14:46:03 GMT Raw View

Pete Becker <petebecker@acm.org> writes:

> Unless, of course, you're talking about Windows CE. <g>

I was not talking about Windows CE; I was talking about the file
systems of Windows NT.

> But to get back to my point, is it your contention that the mere
> existence of Windows CE invalidates 20 years of experience in using
> portable file names? In which case, of course, there simply is no way to
> write file names that are reasonably portable. So, naturally, the
> solution is to add still more ways of naming files that we don't
> understand.

If applications restrict their filenames to the POSIX Portable
Filename Character Set, then there is no ambiguity on Win32, either.
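
As a rough illustration, the POSIX Portable Filename Character Set consists of the letters A-Z and a-z, the digits 0-9, period, underscore and hyphen-minus, and POSIX advises against a leading hyphen. A check along those lines might look like the sketch below; the function name is hypothetical. The alphabet is spelled out rather than tested with range comparisons so that the code does not assume the letters are contiguous in the execution character set.

```cpp
#include <string>

// True if `name` uses only characters from the POSIX Portable Filename
// Character Set and does not start with a hyphen. Hypothetical helper.
inline bool is_portable_filename(const std::string& name)
{
    static const std::string allowed =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789._-";
    if (name.empty() || name[0] == '-')
        return false;
    return name.find_first_not_of(allowed) == std::string::npos;
}
```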

However, users are unwilling to restrict themselves to this narrow set
of characters, and application developers need a way to fulfil the
user demands.

Regards,
Martin

---

Author: James Dennett <jdennett@acm.org>
Date: Mon, 15 Jul 2002 13:24:44 CST Raw View

Randy Maddox wrote:
> James Dennett <jdennett@acm.org> wrote in message news:<3D2DD9D1.7010109@acm.org>...
>
>>Randy Maddox wrote:
>>
>>>Since the thread here is getting so deeply nested, let's see
>>>if I can summarize the arguments again.
>>>
>>>First, there no longer seems to be any argument about wide
>>>character support in template what arguments to be provided
>>>by a templatized base class along the lines of what was done
>>>for strings with basic_string, string and wstring.  Reusing
>>>a previous solution seems like a good idea, and this would
>>>have zero impact on existing code.  Those who need and want
>>>wide character exception what arguments could use exceptions
>>>derived from wexception, while those who don't would merely
>>>continue to use exceptions derived from exception.  The
>>>entire std::exception hierarchy would remain exactly as is.
>>>
>>>I may be wrong about that discussion being over, but no one
>>>seems to be saying anything about it anymore.
>>
>>It seemed to me that the proposal "lost" -- there was no good
>>argument for adding anything like std::basic_exception, and
>>there are good reasons to avoid it, including
>>* it breaks the idiom of catching std::exception&, as that
>>   would no longer catch the whole standard exception hierarchy;
>>   hence, it does have impact on existing code in libraries in
>>   spite of the claim made above, because those libraries will
>>   be re-used in future and would require modification
>
>
> Why would you make this statement as if it were true?  Please explain
> to me how the following would in any way have this impact:
>
> template <typename Char_t>
> class basic_exception
> {
>   explicit basic_exception(const basic_string<Char_t> & what_arg);
>
>   const Char_t * what() const;
>
>   // other members as currently exist
> };
>
> typedef basic_exception<char> exception;
>
> // rest of standard exception hierarchy as currently exists

Explanation repeated further down.  It's also been explained
by a number of other posters, if my explanation is unclear.

>
> If your argument were true, then it would directly imply that the
> existing typedef of string as basic_string<char> would cause problems
> with the string class, which is demonstrably not true.

It would not, and does not, imply any such thing.

> I am not
> proposing some off-the-wall change, but rather a tried-and-true
> technique that is already used in the stdlib.  Since use of this
> technique has already been demonstrated to work with the string
> classes, why would you believe that it would not work identically with
> the exception classes?
>
> Randy.

There's no deep magic here.  Library code currently
written to catch _all exceptions in the standard
exception hierarchy_, and documented as such, would
no longer catch all such exceptions after your change,
because you've changed the definition of what it means
to be in the standard exception hierarchy.

The analogy to std::basic_string does not disprove
this.  It's a very weak analogy.

There are uses for code which does, for example,

try {
   callback_that_may_throw_any_standard_derived_exception();
} catch (std::exception &) {
   do_some_cleanup();
   throw;
}

and your proposal would make this code less useful,
by allowing/encouraging exceptions that are not
derived from std::exception.
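
A small self-contained demonstration of that effect, under the assumption that wide-character exceptions come from a base unrelated to std::exception. The `wide_error` type and the helper names are hypothetical stand-ins.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical wide-character exception that does NOT derive from
// std::exception, standing in for something built on a parallel base.
struct wide_error
{
    std::wstring message;
    explicit wide_error(const std::wstring& m) : message(m) {}
};

// Runs `callback` and reports which handler, if any, saw the exception.
template <typename Callback>
std::string which_handler(Callback callback)
{
    try {
        try {
            callback();
        } catch (std::exception&) {
            // The cleanup-and-rethrow idiom would do its work here.
            return "std::exception handler";
        }
    } catch (...) {
        // Reached only by exceptions the inner handler did not match.
        return "bypassed std::exception handler";
    }
    return "no exception";
}

inline void throws_narrow() { throw std::runtime_error("narrow"); }
inline void throws_wide()   { throw wide_error(L"wide"); }
```

`which_handler(throws_narrow)` goes through the inner handler, while `which_handler(throws_wide)` skips it and is seen only by the catch-all.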

It could be that we're all explaining really badly,
or that you actually have a counterargument which you've
not yet managed to explain to us.  I cannot understand
why you cannot simply accept that the code above is
affected by your proposal.

std::basic_exception<T> does not, in my opinion, have
enough upside to warrant the significant cost of adding
it to C++, if indeed the upside even outweighs the down.

-- James Dennett

---

Author: bdawes@acm.org (Beman Dawes)
Date: Mon, 15 Jul 2002 23:20:37 GMT Raw View

Daniel Miller <daniel.miller@tellabs.com> wrote in message news:<3D2DD530.2070702@tellabs.com>...

>    Assume in coming months or years that at least one person in the
> world invests the time & money to build out a
> proposal-for-standardization & implementation.  Rather than relying only
> on off-the-cuff divergent hearsay in newsgroups, that person would want
> to rely more heavily on an exhaustive survey-research of how all popular
> modern operating systems work with wide-character filesystem identifiers
> (where "popular modern" is defined very generously, e.g., at least:
> Windows variants; SGI IRIX; HP-UX; Tru64; OpenVMS; Caldera OpenUnix 8=
> System V Release 5, the successor to UnixWare; Solaris; Linux
> (variants); AIX; MacOS-X; BSD; OS/390; PalmOS; Plan 9; any
> wide-character-supporting internationalized RTOSes (do any exist?);
> emerging/forthcoming OSes).

Be careful. I've been dealing with a somewhat similar question
recently:

"What characters does an operating system permit in
(narrow-char-based) filenames?"

What I've been finding is that you cannot rely on the operating
system's documentation.  You have to write a program to probe what is
really permitted and not permitted.  Some operating systems don't have
a single "standard-like" document which spells out their specs. Their
general documentation will say one thing in one place, and something
slightly different in another place.

When you run a probe program, you find that corner cases like the
first character of a filename, the last character of a filename, or
single character filenames, may be handled differently than what the
docs specify.  Ditto directory names.

So if someone does ever run a survey of how wide-character names are
handled, they might want both to consult the OS documentation and to
probe programmatically to see what is really allowed.
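
A probe of the kind described can be sketched in portable C++ as below. This is an assumption about what such a program might look like, not the program actually used; the helper names and the `probe_` file name pattern are hypothetical. It should be run in an empty scratch directory, and it deliberately refuses to touch files that already exist.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Tries to create, then delete, a file called `name`. Returns true only
// if the file did not exist beforehand and creation succeeded. A fuller
// survey would also check that the stored name matches the request.
inline bool can_create(const std::string& name)
{
    {
        std::ifstream existing(name.c_str());
        if (existing)
            return false;  // never touch a file that is already there
    }
    {
        std::ofstream out(name.c_str());
        if (!out)
            return false;  // the OS rejected the name
    }  // stream is closed here, before removal
    std::remove(name.c_str());
    return true;
}

// Probes each candidate character in the middle of a file name and
// returns the characters for which creation succeeded.
inline std::string accepted_characters(const std::string& candidates)
{
    std::string accepted;
    for (std::string::size_type i = 0; i < candidates.size(); ++i) {
        const std::string name =
            std::string("probe_") + candidates[i] + ".tmp";
        if (can_create(name))
            accepted += candidates[i];
    }
    return accepted;
}
```

The same loop can be repeated with the candidate as the first character, the last character, or the whole name to cover the corner cases mentioned above.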

--Beman

---

Author: "Edward Diener" <eldiener@earthlink.net>
Date: Mon, 15 Jul 2002 23:27:37 GMT Raw View

"James Dennett" <jdennett@acm.org> wrote in message
news:3D2DD9D1.7010109@acm.org...
> Randy Maddox wrote:
> > Since the thread here is getting so deeply nested, let's see
> > if I can summarize the arguments again.
> >
> > First, there no longer seems to be any argument about wide
> > character support in template what arguments to be provided
> > by a templatized base class along the lines of what was done
> > for strings with basic_string, string and wstring.  Reusing
> > a previous solution seems like a good idea, and this would
> > have zero impact on existing code.  Those who need and want
> > wide character exception what arguments could use exceptions
> > derived from wexception, while those who don't would merely
> > continue to use exceptions derived from exception.  The
> > entire std::exception hierarchy would remain exactly as is.
> >
> > I may be wrong about that discussion being over, but no one
> > seems to be saying anything about it anymore.
>
> It seemed to me that the proposal "lost" -- there was no good
> argument for adding anything like std::basic_exception, and
> there are good reasons to avoid it, including

The argument for adding a
std::basic_exception<wchar_t> was so that a wide character error message
string could be returned from wide character implementations of standard
library and user-defined classes.

> * it breaks the idiom of catching std::exception&, as that
>    would no longer catch the whole standard exception hierarchy;
>    hence, it does have impact on existing code in libraries in
>    spite of the claim made above, because those libraries will
>    be re-used in future and would require modification

Regarding your first statement, it is true that it breaks the idiom of
catching all possible exceptions by catching std::exception. Perhaps this is
not as serious a problem as it is made out to be.

The only modification to the library that a wide character exception
hierarchy might bring in the future is that wide character implementations
which are currently in the library and throw std::exception derived
exceptions ( are there currently any that do ? ) could be changed to throw
std::basic_exception<wchar_t> derived exceptions instead.

> * it involves significant amounts of work to write a proposal
>    and to gather experience of use

This is true of any change or enhancement to C++.

> * users are already free to do similar things

Users are always free to "do similar things" for any
proposed change to the C++ standard library.

> * it fails to meet the needs of internationalised applications
>    anyway, as they would typically throw an exception with a
>    message id if they needed text to be presented to a user,
>    and for other text a narrow string generally suffices

Yes, they could throw a message ID, an argument Peter Dimov originally
brought up, when I first brought up this subject. But why should they have
to ? Wouldn't it be better to let those using a wide character encoding
incorporate a wide character error string in their exception ? As for
"narrow string generally suffices", it may for you but I wonder if it does
for Japanese, Chinese, and other nationalities whose encodings are wide
character. Are they all trained to know English and ASCII and comfortable
with the fact that their exception error messages must be in a language
which is not their own, nor their end users ?

> * no evidence of widespread "prior art" has been shown, and
>    standardisation is not supposed to be an inventive process.

This is a classical Catch-22 situation. No evidence of widespread "prior art"
has been shown because it does not exist in the current C++ standard, and
it is not to be considered as an addition to the C++ standard because no
evidence of widespread "prior art" has been shown.

>
> Possibly the conversation died away because these arguments
> were unanswered, and those who presented them assumed that
> the lack of a good basis for a proposal meant that the idea
> had been given up.

Actually, when I first brought this up, I was the last to answer and justify my
reasons for the parallel wide character exception hierarchy in a discussion
with Peter Dimov in which he made some very good points. I never made a
formal proposal but just brought it up for discussion. Subsequently Randy
Maddox, totally independently of me, brought it up again in his post on
further proposed wide character changes in the C++ standard library. For
all I know someone else had brought up the idea before either Randy Maddox
or I did.

Since the strongest argument is, quite rightly, that with a parallel
exception hierarchy all exceptions will not necessarily be caught by
catching std::exception, let me offer a few practical considerations, not in
an effort to dispute that truth but in an effort to put a more pragmatic
viewpoint on it.

Firstly, as far as the C++ standard library is concerned,
std::basic_exception<wchar_t> would almost certainly only be thrown for wide
character implementations whether currently or in the future. Programmers in
using the standard library would be well aware when they are using a wide
character library implementation and would therefore be poised to catch
appropriate wide character exceptions as well as narrow character exceptions
in their catch blocks. Many programmers, knowing that they are not using
wide characters, or are not using wide characters in conjunction with
standard library functionality, or are not using wide characters in
conjunction with standard library functionality which actually throws wide
character exceptions, will never need to put a catch for wide character
exceptions in their try/catch blocks.

Secondly, even in our current situation std::exception does not guarantee
that all exceptions are caught, only that all C++ standard library
exceptions are caught. While I, like many programmers, would encourage other
programmers to use the C++ standard exception library as the basis for all
their own exceptions, C++, in the spirit of a language which normally gives
maximum freedom and responsibility to programmers, does not forbid a
programmer from throwing his own exception type. Of course, just as in the C++
standard library, good documentation should exist explicitly specifying if a
wide character exception can be thrown in user-defined functionality.

Finally, there is a cogent argument I initially made which I will bring up
again, even if others do not see the point of worrying too much about the
future. I think it is fairly possible that if C++ evolves as a language into
the future that other basic character types will be added to the language.
Perhaps, as just one possible example, some universally accepted Unicode
variant will become so popular that C++ will add this character type in
some XX number of years. Luckily, C++ has an advanced technique called
templates (and for this I am very thankful for Bjarne Stroustrup's foresight),
which can make the transition to new types much less painful with
regard to the C++ standard library. If we plan now, adding a new character
type in the future to the language will not entail serious changes to the
C++ standard library as long as we seek to templatize in the standard
library all situations where the character or character string has possible
importance as an actual language encoding. I believe the what() exception
message string does have this importance. Although it could be used simply
as some sort of internal id, I believe the original idea for it must have
been a way for an intelligent message to be returned to the programmer when
an exception occurred.

---

Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: Tue, 16 Jul 2002 01:25:56 GMT Raw View

"Edward Diener" <eldiener@earthlink.net> wrote in message
news:QNIY8.23386$A43.2347915@newsread2.prod.itd.earthlink.net...

> Yes, they could throw a message ID, an argument Peter Dimov originally
> brought up, when I first brought up this subject. But why should they have
> to ? Wouldn't it be better to let those using a wide character encoding
> incorporate a wide character error string in their exception ? As for
> "narrow string generally suffices", it may for you but I wonder if it does
> for Japanese, Chinese, and other nationalities whose encodings are wide
> character. Are they all trained to know English and ASCII and comfortable
> with the fact that their exception error messages must be in a language
> which is not their own, nor their end users ?

They are all trained to use multibyte encodings. In fact, the Japanese
pioneered the use of multibyte encodings. The rest of us have been
playing catch up for the past few decades.
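
To illustrate the point about multibyte encodings: the existing narrow `what()` interface can already carry non-ASCII text as a multibyte string. The sketch below assumes UTF-8 purely as an example encoding; the bytes spell the three katakana characters U+30A8 U+30E9 U+30FC, and the helper names are hypothetical.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Nine UTF-8 bytes encoding three katakana characters. UTF-8 is an
// assumption here; any multibyte encoding travels the same way.
inline const char* multibyte_message()
{
    return "\xE3\x82\xA8\xE3\x83\xA9\xE3\x83\xBC";
}

// Throws a standard narrow exception carrying the multibyte message and
// returns what the handler reads back from what().
inline std::string round_trip()
{
    std::string seen;
    try {
        throw std::runtime_error(multibyte_message());
    } catch (const std::exception& e) {
        seen = e.what();
    }
    return seen;
}
```

The message survives unchanged because `what()` simply returns bytes; interpreting them as characters is left to whoever displays the text.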

> > * no evidence of widespread "prior art" has been shown, and
> >    standardisation is not supposed to be an inventive process.
>
> This is a classical Catch-22 situation. No evidence of widespread "prior art"
> has been shown because it does not exist in the current C++ standard, and
> it is not to be considered as an addition to the C++ standard because no
> evidence of widespread "prior art" has been shown.

Once upon a time, people thought it wise to first look at existing
practice to see what was worth standardizing. Standards are at their
best when they codify existing practice, often at their worst when
they invent. You, and others in this thread, take it as axiomatic
that your current bright ideas should be stuffed into an International
Standard, the sooner the better, to force those recalcitrant users and
implementors to try them out. While I agree there's precedent for that
approach, it's *not* your inalienable right. And it often leads to bad
standards.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com



---

Author: "Anthony Williams" <anthwil@nortelnetworks.com>
Date: Tue, 16 Jul 2002 11:52:23 GMT Raw View

"Beman Dawes" <bdawes@acm.org> wrote in message
news:70fa0367.0207151447.3a2d4998@posting.google.com...
> "What characters does an operating system permit in
> (narrow-char-based) filenames?"

> What I've been finding is that you cannot rely on the operating
> system's documentation.  You have to write a program to probe what is
> really permitted and not permitted.  Some operating systems don't have
> a single "standard-like" document which spells out their specs. Their
> general documentation will say one thing in one place, and something
> slightly different in another place.

> When you run a probe program, you find that corner cases like the
> first character of a filename, the last character of a filename, or
> single character filenames, may be handled differently than what the
> docs specify.  Ditto directory names.


Not to mention the fact that, for example "con", "con.abc",
"CON.someOtherExtension" are all invalid filenames on Windows NT, despite
meeting all the general requirements, as they name a system device with an
optional extension. The same applies to 22 other base names, including
"nul", "prn" and "clock$".
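
A minimal sketch of the check this implies, assuming the rule is "the part of the name before the first '.' is compared case-insensitively against a list of device names". The list below holds only the four names cited in this post and is deliberately incomplete; the function names are hypothetical and are not part of any Windows API.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Illustrative and incomplete: only the device names mentioned in the post.
static const char* const kReservedBaseNames[] = { "con", "nul", "prn", "clock$" };

// Returns the part of `filename` before the first '.', lower-cased.
std::string base_name_lower(const std::string& filename)
{
    // find() returns npos when there is no '.', so substr() keeps the whole name.
    std::string base = filename.substr(0, filename.find('.'));
    std::transform(base.begin(), base.end(), base.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return base;
}

// True when the base name matches one of the listed device names,
// whatever extension follows it.
bool is_reserved_device_name(const std::string& filename)
{
    const std::string base = base_name_lower(filename);
    for (const char* reserved : kReservedBaseNames) {
        if (base == reserved) {
            return true;
        }
    }
    return false;
}
```

Under these assumptions, "con", "con.abc" and "CON.someOtherExtension" are all rejected, while "console.txt" is accepted; for example, assert(is_reserved_device_name("con.abc")) holds.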

Anthony
--
Anthony Williams
Software Engineer, Nortel Networks Optical Components Ltd
The opinions expressed in this message are not necessarily those of my
employer




---

Author: "Hillel Y. Sims" <usenet@phatbasset.com>
Date: Tue, 16 Jul 2002 11:52:05 GMT Raw View

"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:8c8b368d.0207150608.7e210c1d@posting.google.com...
> Why would you make this statement as if it were true?  Please explain
> to me how the following would in any way have this impact:
>
> template <typename Char_t>
> class basic_exception
> {
>   explicit basic_exception(const basic_string<Char_t> & what_arg);
>
>   const Char_t * what() const;
>
>   // other members as currently exist
> };
>
> typedef basic_exception<char> exception;
>
> // rest of standard exception hierarchy as currently exists
>

MyClass::~MyClass()
{
  try {
    // ... cleanup code that may throw exceptions ...
  }
  catch (std::exception&) {}
}

How can this idiom work correctly in the presence of templated exception
base classes (except via a fictional templated catch clause)?
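
The concern can be made concrete with a small sketch. The basic_exception template below is a hypothetical stand-in for the proposed class (it is not a real library type), and narrow_exception plays the role of the proposed typedef. The point it illustrates is that basic_exception<char> and basic_exception<wchar_t> are unrelated types, so a handler for one does not catch the other.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the proposed std::basic_exception.
template <typename CharT>
class basic_exception
{
public:
    explicit basic_exception(const std::basic_string<CharT>& what_arg) : msg_(what_arg) {}
    virtual ~basic_exception() {}
    virtual const CharT* what() const { return msg_.c_str(); }
private:
    std::basic_string<CharT> msg_;
};

// Plays the role of "typedef basic_exception<char> exception;" in the proposal.
typedef basic_exception<char> narrow_exception;

enum Outcome { caught_as_narrow, escaped_narrow_handler };

// Throws basic_exception<CharT> and reports whether a handler written
// for the narrow instantiation caught it.
template <typename CharT>
Outcome throw_and_catch(const std::basic_string<CharT>& text)
{
    Outcome outcome = escaped_narrow_handler;
    try {
        try {
            throw basic_exception<CharT>(text);
        } catch (narrow_exception&) {
            outcome = caught_as_narrow;        // only reached when CharT is char
        }
    } catch (...) {
        outcome = escaped_narrow_handler;      // the wide instantiation lands here
    }
    return outcome;
}
```

With this sketch, throw_and_catch(std::string("x")) yields caught_as_narrow, while throw_and_catch(std::wstring(L"x")) yields escaped_narrow_handler. In a destructor written like the one above, the wide exception would therefore propagate out.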

thanks,
hys

--
Hillel Y. Sims
FactSet Research Systems
hsims AT factset.com

---

Author: "Hillel Y. Sims" <usenet@phatbasset.com>
Date: Tue, 16 Jul 2002 16:54:10 GMT Raw View

"Edward Diener" <eldiener@earthlink.net> wrote in message
news:QNIY8.23386$A43.2347915@newsread2.prod.itd.earthlink.net...
> Regarding your first statement, it is true that it breaks the idiom of
> catching all possible exceptions by catching std::exception. Perhaps this is
> not such a serious problem as it is viewed.
>

template <typename charT> class ThirdPartyLibClass {
  public:
   ...
   ~ThirdPartyLibClass()
   {
     try {
       // ... cleanup code that may throw exceptions ...
     }
     catch (std::exception&) {}
   }
};

> Firstly, as far as the C++ standard library is concerned,
> std::basic_exception<wchar_t> would almost certainly only be thrown for wide
> character implementations whether currently or in the future. Programmers
> using the standard library would be well aware when they are using a wide
> character library implementation and would therefore be poised to catch
> appropriate wide character exceptions as well as narrow character exceptions
> in their catch blocks. Many programmers, knowing that they are not using
> wide characters, or are not using wide characters in conjunction with
> standard library functionality, or are not using wide characters in
> conjunction with standard library functionality which actually throws wide
> character exceptions, will never need to also put a catch for wide character
> exceptions in their try/catch blocks.

See above example.

>
> Secondly, even in our current situation std::exception does not guarantee
> that all exceptions are caught, only that all C++ standard library
> exceptions are caught. While I, like many programmers, would encourage other
> programmers to use the C++ standard exception library as the basis for all
> their own exceptions, C++, in the spirit of a language which normally gives
> maximum freedom and responsibility to programmers, does not mandate that a
> programmer can not throw his own exception. Of course, just as in the C++
> standard library, good documentation should exist explicitly specifying if a
> wide character exception can be thrown in user-defined functionality.

Throwing exception objects not derived from std::exception is simply a very
bad idea ("shooting yourself in the foot"), and not likely a good footing to
prove a point. C++ is great about letting users do what they want (sometimes
;-). catch(...) is very bad and should be avoided because it can catch
non-C++ exceptions such as OS-specific SEH-style exceptions on various
platforms, which are often intended to be fatal errors (such as
access-violations). Only through strict adherence to the rule that all
exception objects inherit from std::exception (or subclasses) can proper C++
catch-all exception handling really be done, via use of
catch(std::exception&). Changing std::exception from an ultimate base-class
to one sibling among arbitrarily many other classes is not acceptable.


Why does it have to be this way? Because there is no other choice, it was
already released with std::exception as the ultimate base class, and there's
no way to change that now; it must always be the ultimate base class for now
and forever.


>
> Finally, there is a cogent argument I initially made which I will bring up
> again, even if others do not see the point of worrying too much about the
> future. I think it is fairly possible that if C++ evolves as a language into
> the future that other basic character types will be added to the language.
> Perhaps, as just a possible example, some universally accepted Unicode
> variant will become so popular, that C++ will add this character type in
> some XX number of years. Luckily for C++ it has an advanced technique, and
> for this I am very thankful for Bjarne Stroustrup's foresight, called
> templates, which can make the transition to new types much less painful in
> regards to the C++ standard library.

Sure, because templated character support was provided for those objects
out-of-the-box. However, it didn't actually happen for std::exception (for
whatever reason), and now it is too late to break that functionality out,
because too much existing code depends upon it.

> If we plan now, adding a new character
> type in the future to the language will not entail serious changes to the
> C++ standard library as long as we seek to templatize in the standard
> library all situations where the character or character string has possible
> importance as an actual language encoding. I believe the what() exception
> message string does have this importance. Although it could be used simply
> as some sort of internal id, I believe the original idea for it must have
> been a way for an intelligent message to be returned to the programmer when
> an exception occurred.
>

It is too late to change the what() interface. Maybe a language-independent
message ID interface could be added to std::exception in the future which
could be of use for internationalization concerns and wouldn't break
existing code.
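
One way such an interface might look, as a rough sketch only: since std::exception itself cannot be modified here, the message ID lives in a hypothetical derived class, localizable_error, and the MessageId values are invented for illustration. Existing catch(std::exception&) handlers keep working, and handlers that know about the ID can ask for it.

```cpp
#include <cassert>
#include <exception>

// Hypothetical message identifiers; a real application would define its own.
enum MessageId { msg_file_not_found = 1001, msg_disk_full = 1002 };

// Hypothetical class: derives from std::exception so existing handlers still catch it.
class localizable_error : public std::exception
{
public:
    explicit localizable_error(MessageId id) : id_(id) {}

    // Narrow, language-neutral fallback for code that only knows what().
    const char* what() const noexcept override { return "localizable_error"; }

    // Language-independent key that an outer layer can translate.
    MessageId message_id() const noexcept { return id_; }

private:
    MessageId id_;
};

// Throws a localizable_error and returns the id recovered by a handler
// that only names std::exception; returns 0 when no id is available.
int id_seen_by_generic_handler(MessageId id)
{
    int seen = 0;
    try {
        throw localizable_error(id);
    } catch (const std::exception& e) {
        const localizable_error* le = dynamic_cast<const localizable_error*>(&e);
        if (le != 0) {
            seen = static_cast<int>(le->message_id());
        }
    }
    return seen;
}
```

For example, assert(id_seen_by_generic_handler(msg_disk_full) == 1002) holds: the exception is caught through the std::exception handler and the ID is still recoverable.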

thanks,
hys

--
Hillel Y. Sims
FactSet Research Systems
hsims AT factset.com

---

Author: "Edward Diener" <eldiener@earthlink.net>
Date: Tue, 16 Jul 2002 16:56:48 GMT Raw View

"P.J. Plauger" <pjp@dinkumware.com> wrote in message
news:3d336e8d$0$255$4c4eb88e@reader.news.uu.net...
> "Edward Diener" <eldiener@earthlink.net> wrote in message
> news:QNIY8.23386$A43.2347915@newsread2.prod.itd.earthlink.net...
>
> > Yes, they could throw a message ID, an argument Peter Dimov originally
> > brought up, when I first brought up this subject. But why should they have
> > to ? Wouldn't it be better to let those using a wide character encoding
> > incorporate a wide character error string in their exception ? As for
> > "narrow string generally suffices", it may for you but I wonder if it does
> > for Japanese, Chinese, and other nationalities whose encodings are wide
> > character. Are they all trained to know English and ASCII and comfortable
> > with the fact that their exception error messages must be in a language
> > which is not their own, nor their end users ?
>
> They are all trained to use multibyte encodings. In fact, the Japanese
> pioneered the use of multibyte encodings. The rest of us have been
> playing catch up for the past few decades.

OK, that is a good technical point. Having dealt with multibyte encodings in
Windows modules, I find them more difficult to manipulate at the
programming level than wide characters, although that shouldn't matter
much if they are only being passed back in what() error strings. I do
believe, however, that the programming world in general is moving more
toward wide character sets via the flavors of Unicode than multibyte
encodings. Perhaps the Japanese programmers have been trained to use MBCS
because none of the Unicode standards were formalized when they had to use
some technique of passing strings around using their own encodings.
Personally I see the reliance on multibyte encodings as an older technology
whose efficacy has been superseded by wide characters and Unicode.

>
> > > * no evidence of widespread "prior art" has been shown, and
> > >    standardisation is not supposed to be an inventive process.
> >
> > This is a classical Catch-22 situation. No evidence of widespread "prior art"
> > has been shown because it does not exist in the current C++ standard, and
> > it is not to be considered as an addition to the C++ standard because no
> > evidence of widespread "prior art" has been shown.
>
> Once upon a time, people thought it wise to first look at existing
> practice to see what was worth standardizing. Standards are at their
> best when they codify existing practice, often at their worst when
> they invent. You, and others in this thread, take it as axiomatic
> that your current bright ideas should be stuffed into an International
> Standard, the sooner the better, to force those recalcitrant users and
> implementors to try them out.

I can't speak for others but that's not my intention, nor have I ever said
such a thing, so I don't understand why you would say that.

What I would like to see is a serious discussion by the C++ standards
committee of the internationalization issues represented in this thread,
rather than a rejection of them by members of the committee in this thread
solely on the basis of these ideas not having been implemented yet by
anybody. I can respect the C++ committee wanting implementations of ideas but
I can not understand the C++ committee summarily rejecting any well-thought
out idea, argued and discussed cogently, simply because an implementation
does not exist.

It is easier for you, as a compiler writer and standard library implementor,
to insist on an implementation as a proof of concept, and I do believe I
understand where you are coming from on this issue and how much you value an
actual implementation since you have been down that road many times already.
But for a C++ programmer such as I, even given that I can provide a standard
library implementation for myself by hacking the source code of my favorite
compiler or 3rd party library, what good will this do other than to say that
I have successfully done X for myself and as far as I can see it is working
the way I believe it should work ? I do not, and of course probably can not
legally, distribute my library change to thousands of developers as
Dinkumware does. I am only a single developer with what I believe are some
good ideas for improving C++. And of course it is far beyond the ken of
most C++ developers, including myself, to make a language change even to a
freely distributed compiler, such as gcc, in order to offer a proof of
concept of a suggestion or proposal for a change to the C++ language.

I assume that part of the use of this NG is to discuss new ideas in order to
attract the notice of C++ committee members to actually thinking about
improvements to the C++ language. If the only new ideas to be discussed must
come from language and library implementors, then the use of this NG in
presenting ideas from intelligent C++ programmers, who are not compiler or
library writers, is severely limited.

If there is a formal proposal mechanism I would be glad to find out what it
is, but so far the only information I have gotten is that one should present
ideas on this NG and be willing to discuss them with others, possible C++
committee members who might be interested in them and, if they support such
an idea, might be willing to argue it in front of the committee. That is
what I, and Randy Maddox in a later post, have done regarding some
internationalization issues which have been brought up.


---

Author: pdimov@mmltd.net (Peter Dimov)
Date: Tue, 16 Jul 2002 11:57:07 CST Raw View

"Edward Diener" <eldiener@earthlink.net> wrote in message news:<QNIY8.23386$A43.2347915@newsread2.prod.itd.earthlink.net>...
> "James Dennett" <jdennett@acm.org> wrote in message
> news:3D2DD9D1.7010109@acm.org...
[...]
> > * it fails to meet the needs of internationalised applications
> >    anyway, as they would typically throw an exception with a
> >    message id if they needed text to be presented to a user,
> >    and for other text a narrow string generally suffices
>
> Yes, they could throw a message ID, an argument Peter Dimov originally
> brought up, when I first brought up this subject. But why should they have
> to ?

The argument is not that "they could throw a message ID as a
workaround." The argument is: "in my experience, when writing a robust
localizable application, you _must_ throw a message ID, and not text."

> Wouldn't it be better to let those using a wide character encoding
> incorporate a wide character error string in their exception ?

The surprising answer is "no." Throwing a wide character string
doesn't solve any real i18n problems. In a real application, you
simply don't _have_ a wide character string to throw; there is no text
anywhere in the program.

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Tue, 16 Jul 2002 18:12:52 GMT Raw View

"P.J. Plauger" <pjp@dinkumware.com> wrote in message news:<3d336e8d$0$255$4c4eb88e@reader.news.uu.net>...
> "Edward Diener" <eldiener@earthlink.net> wrote in message
> news:QNIY8.23386$A43.2347915@newsread2.prod.itd.earthlink.net...

> Once upon a time, people thought it wise to first look at existing
> practice to see what was worth standardizing. Standards are at their
> best when they codify existing practice, often at their worst when
> they invent. You, and others in this thread, take it as axiomatic
> that your current bright ideas should be stuffed into an International
> Standard, the sooner the better, to force those recalcitrant users and
> implementors to try them out. While I agree there's precedent for that
> approach, it's *not* your inalienable right. And it often leads to bad
> standards.
>
> P.J. Plauger
> Dinkumware, Ltd.
> http://www.dinkumware.com

Sorry, but I must humbly and respectfully disagree with your
characterization of my attitude.  I do not assume, and have never
suggested, any inalienable right to have any ideas incorporated into
the standard.  Nonetheless, I do feel that I do indeed have a right to
suggest ideas regarding the standard, and to have those ideas
reasonably considered and discussed, whether or not I have the time or
resources to be able to implement those ideas myself.

Much of the discussion here has consisted of incorrect arguments
imputing results that simply cannot be supported by the facts, or
suggestion of additional changes that would impact existing code or
practices.  I was extremely careful to suggest ideas that could be
implemented in such a way as to have zero impact on existing code or
practices, yet resistance to supposed changes required to existing
code or practice has been the largest component of the counter
arguments.

I enjoy reasonable discussion of alternatives, pros and cons, etc.,
and I was eager to see enlightened technical arguments on the merits
of my proposal.  So much of the argument has been misdirected that the
hoped for technical merit discussion has been largely obscured.

I have yet to see any substantive technical argument against this
proposal that was not completely rebutted.  Of course, it is pretty
much impossible to rebut an argument that is simply incorrect when the
presenter of that argument is not willing to listen, but there is
nothing to be done about that.

Randy.

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Tue, 16 Jul 2002 18:12:48 GMT Raw View

James Dennett <jdennett@acm.org> wrote in message news:<3D331079.4040909@acm.org>...
> Randy Maddox wrote:
> > James Dennett <jdennett@acm.org> wrote in message news:<3D2DD9D1.7010109@acm.org>...
>
> There's no deep magic here.  Library code currently
> written to catch _all exceptions in the standard
> exception hierarchy_, and documented as such, would
> no longer catch all such exceptions after your change,
> because you've changed the definition of what it means
> to be in the standard exception hierarchy.
>
> The analogy to std::basic_string does not disprove
> this.  It's a very weak analogy.
>
> There are uses for code which does, for example,
>
> try {
>    callback_that_may_throw_any_standard_derived_exception();
> } catch (std::exception &) {
>    do_some_cleanup();
>    throw;
> }
>
> and your proposal would make this code less useful,
> by allowing/encouraging exceptions that are not
> derived from std::exception.
>
> It could be that we're all explaining really badly,
> or that you actually have a counterargument which you've
> not yet managed to explain to us.  I cannot understand
> why you cannot simply accept that the code above is
> affected by your proposal.
>

It may very well be that I have not explained my idea carefully
enough, or perhaps it has not been listened to well enough.  In either
case, your code snippet above will currently catch only exceptions
derived from std::exception, which includes all explicit stdlib
exceptions.  There may be other user-defined exceptions in any case,
and since they are not required to be derived from std::exception,
your snippet will not catch them.

In my proposal, std::exception would still exist, would still be the
base class for the standard exception hierarchy, and all exceptions
explicitly thrown by the stdlib would be the exact same exceptions
they are now.

Note too that even though your code snippet catches all standard
exceptions, there is no guarantee that the stdlib will not throw an
exception of some other user-defined type.  What if you are copying
elements into a container and the assignment operator for the element
class throws an exception?  Nothing forces that exception to be
derived from std::exception.

Take a look back at the code snippet in my last response to you and
think about this again.  Your point here is simply incorrect.

Randy.

> std::basic_exception<T> does not, in my opinion, have
> enough upside to warrant the significant cost of adding
> it to C++, if indeed the upside even outweighs the down.
>
> -- James Dennett

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Tue, 16 Jul 2002 19:22:12 GMT Raw View

"Edward Diener" <eldiener@earthlink.net> wrote in message news:<QNIY8.23386$A43.2347915@newsread2.prod.itd.earthlink.net>...
> "James Dennett" <jdennett@acm.org> wrote in message
> news:3D2DD9D1.7010109@acm.org...
> > Randy Maddox wrote:
> Regarding your first statement, it is true that it breaks the idiom of
> catching all possible exceptions by catching std::exception. Perhaps this is
> not such a serious problem as it is viewed.
>

I see a point of confusion here.  It is NOT true that

  catch(const std::exception & ex)

will catch "all possible exceptions".  It is NOT even true that this
will catch all exceptions occurring during use of the stdlib.

It is ONLY true that this will catch all exceptions that are, or are
publicly derived from, std::exception.  Any user-defined exception
that is not publicly derived from std::exception may still occur while
using the stdlib, since that library uses the ctor, copy ctor and
assignment operator of the class being used with the stdlib.  That
class may throw any exception, of any type, it desires.  There is no
requirement that such an exception be in any way related to
std::exception, which means that catching only std::exception may not
catch any of a whole universe of possible exceptions.
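
A small sketch of that situation, with hypothetical types: fragile is an element class whose copy constructor throws copy_failed, a type unrelated to std::exception. When std::vector copies the element, the exception passes through the library unchanged and skips the std::exception handler.

```cpp
#include <cassert>
#include <exception>
#include <vector>

// Hypothetical user-defined exception type, unrelated to std::exception.
struct copy_failed {};

// Hypothetical element type whose copy constructor always throws.
struct fragile
{
    fragile() {}
    fragile(const fragile&) { throw copy_failed(); }
};

enum Handler { no_handler, std_exception_handler, user_type_handler };

// Reports which handler sees the exception raised while the library
// copies an element into the container.
Handler which_handler_runs()
{
    Handler handler = no_handler;
    std::vector<fragile> elements;
    fragile element;
    try {
        try {
            elements.push_back(element);   // the library calls fragile's copy constructor
        } catch (const std::exception&) {
            handler = std_exception_handler;
        }
    } catch (const copy_failed&) {
        handler = user_type_handler;       // reached: copy_failed is not a std::exception
    }
    return handler;
}
```

Here which_handler_runs() returns user_type_handler, so assert(which_handler_runs() == user_type_handler) holds. This is true of the current library and does not depend on the proposal.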

My proposal to implement std::exception as a typedef of
std::basic_exception makes no difference to this situation.  Catching
only std::exception will still catch only any and all exceptions
publicly derived from std::exception.

Hope this clarifies things.

Randy.

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Tue, 16 Jul 2002 14:50:00 CST Raw View

"Hillel Y. Sims" <usenet@phatbasset.com> wrote in message news:<nmMY8.47888$6r.1516572@news4.srv.hcvlny.cv.net>...
> "Randy Maddox" <rmaddox@isicns.com> wrote in message
> news:8c8b368d.0207150608.7e210c1d@posting.google.com...
> > Why would you make this statement as if it were true?  Please explain
> > to me how the following would in any way have this impact:
> >
> > template <typename Char_t>
> > class basic_exception
> > {
> >   explicit basic_exception(const basic_string<Char_t> & what_arg);
> >
> >   const Char_t * what() const;
> >
> >   // other members as currently exist
> > };
> >
> > typedef basic_exception<char> exception;
> >
> > // rest of standard exception hierarchy as currently exists
> >
>
> MyClass::~MyClass()
> {
>   try {
>     .. cleanup code that may throw exceptions ..
>   }
>  catch (std::exception&) {}
> }
>
> How can this idiom work correctly in the presence of templated exception
> base classes (except via a fictional templated catch clause)?

This works just fine because you can throw pretty much anything that
has a copy constructor, and catch the same.  See Chapter 14 of
Stroustrup's 3rd edition of "The C++ Programming Language" where he
offers lots of examples of throwing and catching objects that are not
part of the std::exception hierarchy.  The fact that std::exception
would, under my proposal, be a typedef for a specific template
instantiation makes no difference here.  You could, for example, throw
an std::string object, and catch it as:  catch(const std::string &),
and that would work just fine.

The mere fact that a template is involved does not imply a requirement
for templated catch statements.  As long as you are referring to a
specific template instantiation then you are referring to a specific
type, and the fact that type is based on a template makes no
difference.

What cannot be done, and what I am in no way suggesting should be
supported, is to have a templated catch statement something like:

  template <typename T> catch(const std::basic_exception<T> &)

The syntax here is just made up since it does not exist, but you can
clearly see the difference between this and the previous example

  catch(std::exception &)

which is equivalent, under my proposal, to

  catch(std::basic_exception<char> &)

Here the type of the caught object is known at compile time, whereas
with the fictional, and not needed or suggested, templated catch
statement the type of the caught object is not known at compile time.
And therein lies the key difference.
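
The claim about specific instantiations can be sketched as follows. Everything sits in a hypothetical namespace sketch so that it does not collide with the real library; basic_exception, the exception typedef and runtime_error are illustrative stand-ins for the proposed hierarchy, not actual standard classes.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical stand-in for the proposed std::basic_exception.
template <typename CharT>
class basic_exception
{
public:
    explicit basic_exception(const std::basic_string<CharT>& what_arg) : msg_(what_arg) {}
    virtual ~basic_exception() {}
    virtual const CharT* what() const { return msg_.c_str(); }
private:
    std::basic_string<CharT> msg_;
};

// The proposed typedef: one specific, ordinary type.
typedef basic_exception<char> exception;

// Stand-in for "the rest of the hierarchy", derived from the typedef.
class runtime_error : public exception
{
public:
    explicit runtime_error(const std::string& what_arg) : exception(what_arg) {}
};

// A handler naming the typedef catches classes derived from it.
std::string caught_via_typedef()
{
    std::string seen;
    try {
        throw runtime_error("disk full");
    } catch (exception& e) {
        seen = e.what();
    }
    return seen;
}

// Any copyable object can be thrown and caught by its own type.
std::string caught_plain_string()
{
    std::string seen;
    try {
        throw std::string("plain string");
    } catch (const std::string& s) {
        seen = s;
    }
    return seen;
}

} // namespace sketch
```

Under these assumptions, assert(sketch::caught_via_typedef() == "disk full") and assert(sketch::caught_plain_string() == "plain string") both hold. The sketch does not address exceptions built on other instantiations, such as basic_exception<wchar_t>, which are separate types.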

Randy.

>
> thanks,
> hys
>
> --
> Hillel Y. Sims
> FactSet Research Systems
> hsims AT factset.com

---

Author: Allan_W@my-dejanews.com (Allan W)
Date: Tue, 16 Jul 2002 19:51:58 GMT Raw View

pdimov@mmltd.net (Peter Dimov) wrote
> Interesting question. As we have already established, "writing" a
> portable wide character string is pretty much impossible in C++, no
> matter whether it denotes a file name or not.

I suppose you could say this about narrow character strings, too.
The existing standard does not specify ASCII over EBCDIC, for instance.

Despite this limitation, I note that most C++ programmers are able to
read and write files that can be exchanged with other programmers,
without creating or buying extensive libraries. There have even been
rumors of C++ programs exchanging files with programs written in
(gasp!) other languages.

---

Author: Daniel Miller <daniel.miller@tellabs.com>
Date: Tue, 16 Jul 2002 21:04:53 GMT Raw View

Peter Dimov wrote:

> "Edward Diener" <eldiener@earthlink.net> wrote in message news:<QNIY8.23386$A43.2347915@newsread2.prod.itd.earthlink.net>...
>
>>"James Dennett" <jdennett@acm.org> wrote in message
>>news:3D2DD9D1.7010109@acm.org...
>>
> [...]
>
>>>* it fails to meet the needs of internationalised applications
>>>   anyway, as they would typically throw an exception with a
>>>   message id if they needed text to be presented to a user,
>>>   and for other text a narrow string generally suffices
>>>
>>Yes, they could throw a message ID, an argument Peter Dimov originally
>>brought up, when I first brought up this subject. But why should they have
>>to ?
>>
>
> The argument is not that "they could throw a message ID as a
> workaround." The argument is: "in my experience, when writing a robust
> localizable application, you _must_ throw a message ID, and not text."

   The fundamental difference of understanding between Edward Diener
and Peter Dimov is how many locales the software is to be localized
for.  Peter's technique passes message-ids around in the transnational
bulk of the software and delays conversion of each message-id to the
installed locale until a linguistic/user-interface/outer-perimeter
layer of software; it is prepared to support, say, 37 locales.
Throwing a type containing a localized string from some deep
layer of software requires that layer of software to know about all 37
locales and no longer be simply obliviously/blithely metanational.

   In the ASCII world, software was written by Americans for Americans.
  In the i18n world, software is not necessarily written by Russians for
Russians only, different software written by Germans for Germans only,
and still different software written by Swedes for Swedes only.

   In the i18n world, the same software is written by citizens of
country c for use by citizens of a set of countries S, where |S| (the
number of countries in which the software is to be installed) is
typically greater than 1.  S might or might not include c (because c
might be a country to which the software was out-sourced as an off-shore
development and where c is not a target market of the software).
Because |S| is greater than 1, which member country of S would Edward
Diener's technique pick to be more equal than the rest to be the locale
in which error strings are to be presented?  If the software is
installed in Canada, would we still want the error messages to be in the
engineer's native German?  Or if the software were to be installed in
USA, Mexico, Germany, Russia, communist China (simplified Chinese),
nationalist China (traditional Chinese), Japan, Brazil, and Saudi
Arabia, would we want the error messages to be in the programmer's
native Bengali?  Likewise if the software were to be installed in USA,
Mexico, Germany, Russia, the two Chinas, Japan, Brazil, Saudi Arabia,
and India, would we want *every* layer of software which throws
exceptions to be aware of all of the linguistic i18n complexities of
English, Spanish, German, Russian, simplified Chinese, traditional
Chinese, Japanese, Portuguese, Arabic, and Hindi/Urdu/Bengali each and
every time that they throw an exception (or detect any other form of
error which deserves some form of error message)?  No, the localization
should be quarantined to certain layers of software (and possibly
outsourced to linguistically-knowledgeable experts), not strewn
throughout every layer of software which throws exceptions.

>>Wouldn't it be better to let those using a wide character encoding
>>incorporate a wide character error string in their exception ?
>>
>
> The surprising answer is "no." Throwing a wide character string
> doesn't solve any real i18n problems. In a real application, you
> simply don't _have_ a wide character string to throw; there is no text
> anywhere in the program.

   I agree with Peter Dimov.
   "Wouldn't it be better to let those using a wide character encoding
incorporate a wide character error string in their exception?"  No, it
is better to have some single message-id key known by lower layers of
software than to have every layer of software (no matter how low-level)
know about all 37 languages for which that software has been localized.
  Passing around message-ids in the layers of software which are
transnational simplifies those layers.  Isolating conversion of the
message-id to an i18n layer of software keeps the linguistic
sophistication isolated from the metanational bulk of the software.
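
The layering described above might be sketched like this. All names are hypothetical: app_error carries only a message-id, write_block stands for some deep layer, and the catalog with its two locales and strings is invented for illustration. Only the outer function knows about languages.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical language-neutral message identifiers.
enum MessageId { msg_disk_full = 1 };

// Hypothetical low-level error: carries only the id, no text.
struct app_error
{
    MessageId id;
};

// Deep layer: knows nothing about locales.
void write_block()
{
    app_error error = { msg_disk_full };
    throw error;
}

// Outer (i18n) layer: text is looked up by (locale, message-id).
typedef std::map<std::pair<std::string, MessageId>, std::string> Catalog;

Catalog make_catalog()
{
    Catalog catalog;
    catalog[std::make_pair(std::string("en"), msg_disk_full)] = "Disk full";
    catalog[std::make_pair(std::string("es"), msg_disk_full)] = "Disco lleno";
    return catalog;
}

// Runs the deep layer and converts any message-id into text for `locale`.
// Returns an empty string when no error occurred.
std::string run_and_report(const std::string& locale)
{
    static const Catalog catalog = make_catalog();
    std::string report;
    try {
        write_block();
    } catch (const app_error& e) {
        Catalog::const_iterator it = catalog.find(std::make_pair(locale, e.id));
        if (it != catalog.end()) {
            report = it->second;
        } else {
            // No translation installed: fall back to the bare id.
            report = "error #" + std::to_string(static_cast<int>(e.id));
        }
    }
    return report;
}
```

With this sketch, run_and_report("en") gives "Disk full", run_and_report("es") gives "Disco lleno", and an unsupported locale such as "fr" falls back to "error #1". Supporting another locale means adding catalog entries; write_block is untouched.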

---

Author: "Edward Diener" <eldiener@earthlink.net>
Date: Tue, 16 Jul 2002 23:26:05 GMT Raw View

"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:8c8b368d.0207161053.64cc8afb@posting.google.com...
> "Edward Diener" <eldiener@earthlink.net> wrote in message
news:<QNIY8.23386$A43.2347915@newsread2.prod.itd.earthlink.net>...
> > "James Dennett" <jdennett@acm.org> wrote in message
> > news:3D2DD9D1.7010109@acm.org...
> > > Randy Maddox wrote:
> > Regarding your first statement, it is true that it breaks the idiom of
> > catching all possible exceptions by catching std::exception. Perhaps this is
> > not such a serious problem as it is viewed.
> >
>
> I see a point of confusion here.  It is NOT true that
>
>   catch(const std::exception & ex)
>
> will catch "all possible exceptions".

You are correct. I phrased that incorrectly above. I should have written "it
is true it breaks the idiom of catching all possible exceptions that are
thrown directly by standard library classes". Further on in that same
message I clarified why it was not "such a serious" problem in arguments
which are very similar to what you have written below.

>  It is NOT even true that this
> will catch all exceptions occurring during use of the stdlib.
>
> It is ONLY true that this will catch all exceptions that are, or are
> publicly derived from, std::exception.  Any user-defined exception
> that is not publicly derived from std::exception may still occur while
> using the stdlib, since that library uses the ctor, copy ctor and
> assignment operator of the class being used with the stdlib.  That
> class may throw any exception, of any type, it desires.  There is no
> requirement that such an exception be in any way related to
> std::exception, which means that catching only std::exception may not
> catch any of a whole universe of possible exceptions.
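
The point quoted above can be illustrated with a small sketch. The types
NotAStdException and Widget are hypothetical; the only library behaviour
relied on is that std::vector::push_back copies its argument, so whatever
the element's copy constructor throws passes straight through the library.

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <vector>

// Hypothetical user exception type, unrelated to std::exception.
struct NotAStdException { int code; };

// Hypothetical element type whose copy constructor always throws.
struct Widget {
    Widget() {}
    Widget(const Widget&) { throw NotAStdException{42}; }
};

// Reports which handler, if any, sees the exception raised inside push_back.
std::string classify_push_back() {
    std::vector<Widget> v;
    Widget w;
    try {
        try {
            v.push_back(w);  // the library copies w; the copy ctor throws
            return "no exception";
        } catch (const std::exception&) {
            return "caught as std::exception";
        }
    } catch (const NotAStdException&) {
        assert(v.empty());   // the failed push_back left the vector unchanged
        return "escaped the std::exception handler";
    }
}
```

Here the inner handler is skipped entirely, even though the throw happened
during a call into the standard library.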


---

Author: "Edward Diener" <eldiener@earthlink.net>
Date: Wed, 17 Jul 2002 01:04:52 GMT Raw View

"Daniel Miller" <daniel.miller@tellabs.com> wrote in message
news:3D3487D5.7010806@tellabs.com...
> Peter Dimov wrote:
>
> > "Edward Diener" <eldiener@earthlink.net> wrote in message
news:<QNIY8.23386$A43.2347915@newsread2.prod.itd.earthlink.net>...
> >
> >>"James Dennett" <jdennett@acm.org> wrote in message
> >>news:3D2DD9D1.7010109@acm.org...
> >>
> > [...]
> >
> >>>* it fails to meet the needs of internationalised applications
> >>>   anyway, as they would typically throw an exception with a
> >>>   message id if they needed text to be presented to a user,
> >>>   and for other text a narrow string generally suffices
> >>>
> >>Yes, they could throw a message ID, an argument Peter Dimov originally
> >>brought up, when I first brought up this subject. But why should they have
> >>to ?
> >>
> >
> > The argument is not that "they could throw a message ID as a
> > workaround." The argument is: "in my experience, when writing a robust
> > localizable application, you _must_ throw a message ID, and not text."
>
>
>    The fundamental difference of understanding between Edward Diener
> and Peter Dimov is how many locales the software is to be localized
> for.  Peter's technique of passing message-ids around in the
> transnational bulk of software, delaying conversion of each message-id
> to the installed locale until a linguistic/user-interface/outer-perimeter
> layer of software, is prepared to support, say, 37 locales.
> Throwing a type containing a localized string from some deep
> layer of software requires that layer of software to know about all 37
> locales and no longer be simply obliviously/blithely metanational.

My fundamental point in this particular aspect of the discussion is not that
passing a message ID instead of an actual textual message is a bad idea, but
that one can do one or the other when programming for a wide string locale,
given a facility which allows wide character what() strings in exceptions. I
am promoting the flexibility of doing that. I
understand that many people do not believe that flexibility is enough to
warrant parallel exception hierarchies based on intrinsic character types.

I agree that it is just about always a better design in programs whose
messages will be translated to different locales to not hardcode any message
strings. I will even say that in any program, whether for different locales
or not, it is most often better not to hard-code message strings of any kind
but create them as resources instead and find them from a resource ID during
run-time. If I had been a member of the C++ committee when the structure of
the std::exception class was designed, I would have vociferously argued
against any string at all as part of the class, since strings are heavily
locale dependent, and would have argued for an "unsigned long" ID instead
and encouraged plentiful documentation regarding the meaning of any such ID
thrown by current standard library exceptions. However this was not the
decision made by the C++ committee designers and, given that the decision
was made to allow passing back of an exception string, I think it behooves
the committee to consider a parallel exception hierarchy which would also
allow the passing back of a wide string.

Perhaps I do not have enough experience in throwing and catching exceptions,
but I truly do not see what is so negative in having parallel exception
hierarchies based on the C++ intrinsic character types in order to return
strings of that type, whether message IDs or actual locale-based grammatical
text. There is a sort of practical fervor which I evidently don't share
regarding this, about not being able to catch all exceptions which are
generated off the standard exception hierarchy in one catch statement. I
believe I have always thought about, generated, and caught exceptions which
were specific to some implementation and found the more general "catch
(std::exception)" to be fairly useless. Furthermore if one wanted to
guarantee catching all exceptions which could be thrown as opposed to all
the exceptions thrown off of the current standard library exception
hierarchy, one could use "catch(...)" instead and then one would catch all
exceptions even if a separate exception hierarchy based on a wide character
type existed.
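
A rough sketch of what I mean follows. The class wexception is purely
hypothetical (nothing like it exists in the standard library) and the
function which_handler is only there to show which catch clause fires.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical wide-character counterpart of std::exception.
// It is deliberately NOT derived from std::exception.
class wexception {
public:
    explicit wexception(const std::wstring& msg) : msg_(msg) {}
    virtual ~wexception() {}
    virtual const wchar_t* what() const { return msg_.c_str(); }
private:
    std::wstring msg_;
};

enum Handler { kNone, kNarrow, kWide, kCatchAll };

// kind 0 throws a narrow standard exception, 1 a wide one,
// 2 something unrelated to either hierarchy, 3 nothing at all.
Handler which_handler(int kind) {
    assert(kind >= 0 && kind <= 3);
    try {
        if (kind == 0) throw std::runtime_error("narrow message");
        if (kind == 1) throw wexception(L"wide message");
        if (kind == 2) throw 7;
        return kNone;
    } catch (const std::exception&) {
        return kNarrow;    // sees only the existing hierarchy
    } catch (const wexception&) {
        return kWide;      // sees only the parallel hierarchy
    } catch (...) {
        return kCatchAll;  // sees everything else
    }
}
```

Dropping the two named handlers and keeping only catch(...) would still
catch all three throws, which is the guarantee I was referring to.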

In actual usage, with a parallel exception hierarchy for wide character
exception what() strings, the addition would hardly be noticed by the vast
majority of programmers in day to day use, since nearly all exceptions being
thrown would still be off of the std::exception hierarchy. There might be a
few exceptions thrown off of the wide character hierarchy for specific wide
character implementations in the standard library, as well as wide character
exceptions based on the standard library classes by user-defined
implementations, most especially in countries where wide character encodings
were the norm, but for the most part I would expect little change in the use
of the standard library by the majority of programmers. Of course, in this
scenario, if there are a great deal of programmers who program in such a way
that they feel absolutely guaranteed of catching all exceptions in their
code by using the "catch (std::exception & )" idiom, they could be
disappointed if they are working with wide character standard library
classes or third-party implementations. As Randy Maddox and I have both
pointed out, in different ways in different threads of this subject, even
this feeling of guarantee is not valid unless one knows absolutely that all
of the low-level library code being used is that of the C++ standard library
and no user code or 3rd party code being used throws any possible exceptions
except those based off of the standard library hierarchy.

---

Author: "Hillel Y. Sims" <usenet@phatbasset.com>
Date: Tue, 16 Jul 2002 23:59:05 CST Raw View

"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:8c8b368d.0207161044.5e284c8c@posting.google.com...
> >
> > MyClass::~MyClass()
> > {
> >   try {
> >     .. cleanup code that may throw exceptions ..
> >   }
> >  catch (std::exception&) {}
> > }
> >
> > How can this idiom work correctly in the presence of templated exception
> > base classes (except via a fictional templated catch clause)?
>
> This works just fine because you can throw pretty much anything that
> has a copy constructor, and catch the same.

Yes, you _can_ also dereference off the end of the bounds of an array. Does
that mean you _should_ do it? User exceptions should always be derived from
std::exception (or subclass). All library exceptions adhere to this
strategy. There are various reasons why it is a fairly bad idea to
flippantly use exceptions that are not derived from std::exception or
catch(...) (I have detailed this argument in
http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&oe=UTF-8&frame=right&
th=1218f94d7b6d779a&seekm=3D331092.EDAD6B0A%40web.de#link19 ) Basing a
templated exception strategy on a flawed exception-handling technique is a
bad idea.

>  See Chapter 14 of
> Stroustrup's 3rd edition of "The C++ Programming Language" where he
> offers lots of examples of throwing and catching objects that are not
> part of the std::exception hierarchy.  The fact that std::exception
> would, under my proposal, be a typedef for a specific template
> instantiation makes no difference here.  You could, for example, throw
> an std::string object, and catch it as:  catch(const std::string &),
> and that would work just fine.

How does this help improve program stability and robustness? How do you know
what to catch then in nothrow code where you need to be able to catch all
C++ exceptions? catch(...) is not the answer (see the google link to some
reasons why it is not the answer). You should not really even be allowed to
throw anything other than derived from std::exception, except that this is
C++ and you are allowed to shoot yourself in the foot, so you _can_ throw
pretty much anything that has a copy constructor. Except you should only
throw stuff inherited from a single common C++ base-class, and that is
std::exception (as the std library does in all cases).

> What cannot be done, and what I am in no way suggesting should be
> supported, is to have a templated catch statement something like:
>
>   template <typename T> catch(const std::basic_exception<T> &)
>
> The syntax here is just made up since it does not exist, but you can
> clearly see the difference between this and the previous example
>
>   catch(std::exception &)

Indeed!
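
To make the difference concrete, here is a sketch of the templated
arrangement under discussion. It is an assumption about what such a
proposal might look like, placed in a made-up namespace "sketch" so it
cannot be mistaken for the real std::exception. Because there is no
templated catch clause, each instantiation needs its own handler.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical exception base parameterized on the character type.
template <typename CharT>
class basic_exception {
public:
    explicit basic_exception(const std::basic_string<CharT>& msg) : msg_(msg) {}
    virtual ~basic_exception() {}
    virtual const CharT* what() const { return msg_.c_str(); }
private:
    std::basic_string<CharT> msg_;
};

// The narrow name becomes a typedef for one specific instantiation.
typedef basic_exception<char>    exception;
typedef basic_exception<wchar_t> wexception;

}  // namespace sketch

// Returns 1 if the narrow handler fired, 2 if the wide handler fired.
int handler_index(bool wide) {
    try {
        if (wide) throw sketch::wexception(L"wide failure");
        throw sketch::exception("narrow failure");
    } catch (const sketch::exception& e) {
        assert(e.what()[0] == 'n');   // what() yields const char* here
        return 1;
    } catch (const sketch::wexception& e) {
        assert(e.what()[0] == L'w');  // and const wchar_t* here
        return 2;
    }
    return 0;  // not reached: every path above throws and is handled
}
```

The two instantiations are unrelated types, so a destructor that swallows
only sketch::exception would let a sketch::wexception escape.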

thanks,
hys


--
Hillel Y. Sims
FactSet Research Systems
hsims AT factset.com

---

Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: Wed, 17 Jul 2002 15:50:58 GMT Raw View

"Edward Diener" <eldiener@earthlink.net> wrote in message
news:HfVY8.24530$A43.2448625@newsread2.prod.itd.earthlink.net...

> > They are all trained to use multibyte encodings. In fact, the Japanese
> > pioneered the use of multibyte encodings. The rest of us have been
> > playing catch up for the past few decades.
>
> OK, that is a good technical point. Having dealt with multibyte encodings in
> Windows modules, it seems to me more difficult to manipulate on the
> programming level than wide characters although that shouldn't come into
> effect much if they are being passed back in What() error strings. I do
> believe, however, that the programming world in general is moving more
> toward wide character sets via the flavors of Unicode than multibyte
> encodings.

That may indeed be true, but it's not necessarily reason enough to add
complexity to such a low-level concept as exception objects. For the
limited purpose of passing back a brief message, a multibyte string may
be *good enough*. And it avoids adding any more complexity to exception
classes.

>            Perhaps the Japanese programmers have been trained to use MBCS
> because none of the Unicode standards were formalized when they had to use
> some technique of passing strings around using their own encodings.


The Japanese were developing what we now call wide-character encodings
for many a year before Unicode came along. We don't need to teach them
how to suck eggs.

> Personally I see the reliance on multibyte encodings as an older technology
> whose efficacy has been superseded by wide characters and Unicode.

That may also be true. But see above.

> > > This is a classical Catch-22 situation. No evidence of widespread "prior art"
> > > has been shown because it does not exist in the current C++ standard, and
> > > it is not to be considered as an addition to the C++ standard because no
> > > evidence of widespread "prior art" has been shown.
> >
> > Once upon a time, people thought it wise to first look at existing
> > practice to see what was worth standardizing. Standards are at their
> > best when they codify existing practice, often at their worst when
> > they invent. You, and others in this thread, take it as axiomatic
> > that your current bright ideas should be stuffed into an International
> > Standard, the sooner the better, to force those recalcitrant users and
> > implementors to try them out.
>
> I can't speak for others but that's not my intention, nor have I ever said
> such a thing, so I don't understand why you would say that.

Sorry if I offended you, but that's the signal I've been getting repeatedly
from this thread.

> What I would like to see is a serious discussion by the C++ standards
> committee of the internationalization issues represented in this thread,
> rather than a rejection of them by members of the committee in this thread
> solely on the basis of these ideas not having been implemented yet by
> anybody.

We think we've had serious discussions. We've tried to communicate some
of our deliberations and the reasons for our current conclusions. You,
and others, have concluded that we haven't worked as hard at what *you*
want us to do as you'd like. TS, we're all volunteers. If you want a
discussion that's adequately serious by your metric, you'll have to
become more active in the committee. Then you'll have to take your
chances on getting the amount of agenda time you think you deserve.
And you may still not get the next C++ Standard changed the way you'd
like.

> I can respect the C++ committee wanting implementations of ideas but
> I can not understand the C++ committee summarily rejecting any well-thought
> out idea, argued and discussed cogently, simply because an implementation
> does not exist.

We haven't summarily rejected anything and we haven't acted simply because
of any one reason. You'd notice that if you reviewed this thread more
carefully. But I, for one, make no apologies about requiring proven prior
art before I take seriously even an allegedly well thought out idea.
Experience has proven repeatedly that ideas suffer useful amendment in
the process of being implemented. They improve more when subject to
user experience and feedback. And they improve even more when they
survive the Second System Effect. If I had my druthers, I wouldn't
standardize anything that hadn't reached V3.0 in the field. (Luckily
for all the visionaries in the world, I don't always get my way in the
development of programming language standards.)

> It is easier for you, as a compiler writer and standard library implementor,
> to insist on an implementation as a proof of concept, and I do believe I
> understand where you are coming from on this issue and how much you value an
> actual implementation since you have been down that road many times already.
> But for a C++ programmer such as I, even given that I can provide a standard
> library implementation for myself by hacking the source code of my favorite
> compiler or 3rd party library, what good will this do other than to say that
> I have successfully done X for myself and as far as I can see it is working
> the way I believe it should work ?

Not much. But if an idea is worthy of being put in an international
standard, it should have at least some demand pull from the field.
You'll never see that demand pull until you give the world something
real to pull and explain why they might want to demand it.

>                                  I do not, and of course probably can not
> legally, distribute my library change to thousands of developers as
> Dinkumware does.

Sure you can. There are several free libraries kicking around out there
that you could modify and post for easy downloading.

>                I am only a single developer with some, what I believe are,
> good ideas for the improvement of C++. And of course it is far beyond the ken of
> most C++ developers, including myself, to make a language change even to a
> freely distributed compiler, such as gcc, in order to offer a proof of
> concept of a suggestion or proposal for a change to the C++ language.

We've been talking here about a smallish change to a few library functions.
That's a much lower overhead proposition.

> I assume that part of the use of this NG is to discuss new ideas in order to
> attract the notice of C++ committee members to actually thinking about
> improvements to the C++ language.

Yes. And this thread has succeeded in that.

>                                If the only new ideas to be discussed must
> come from language and library implementors, then the use of this NG in
> presenting ideas from intelligent C++ programmers, who are not compiler or
> library writers, is severely limited.

You're stretching a point rather thin here. Pick up a copy of Libstdc++
from Gnu, paste in an overload to basic_filebuf<T>::open, and you've
joined the lofty ranks of library implementors. Not a very exclusive club,
by that metric. Of course, if you want to put your kids through school
by implementing commercial libraries (as I have done for lo these many
decades) you'll have to get a bit more serious about it. But nobody said
you have to be a trained professional to provide a proof of concept of
a bright idea.

> If there is a formal proposal mechanism I would be glad to find out what it
> is, but so far the only information I have gotten is that one should present
> ideas on this NG and be willing to discuss them with others, possibly C++
> committee members who might be interested in them and, if they support such
> an idea, might be willing to argue it in front of the committee. That is
> what I, and Randy Maddox in a later post, have done regarding some
> internationalization issues which have been brought up.

I've heard people say more, but it might have got lost in the general hubbub.
Visit the WG21 web site and read more about the business of writing and
submitting proposals.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com



---

Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: Wed, 17 Jul 2002 15:51:17 GMT Raw View

"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:8c8b368d.0207160434.229c6802@posting.google.com...

> > Once upon a time, people thought it wise to first look at existing
> > practice to see what was worth standardizing. Standards are at their
> > best when they codify existing practice, often at their worst when
> > they invent. You, and others in this thread, take it as axiomatic
> > that your current bright ideas should be stuffed into an International
> > Standard, the sooner the better, to force those recalcitrant users and
> > implementors to try them out. While I agree there's precedent for that
> > approach, it's *not* your inalienable right. And it often leads to bad
> > standards.
>
> Sorry, but I must humbly and respectfully disagree with your
> characterization of my attitude.  I do not assume, and have never
> suggested, any inalienable right to have any ideas incorporated into
> the standard.  Nonetheless, I do feel that I do indeed have a right to
> suggest ideas regarding the standard, and to have those ideas
> reasonably considered and discussed, whether or not I have the time or
> resources to be able to implement those ideas myself.

And you do. And you have, at least by my metric. But you have clearly
set the bar rather higher for what constitutes ``reasonably considered
and discussed.'' On 3 July, for example, you stated rather petulantly:

>> So be it.  I think your concept is wrong-headed and needlessly
>> discouraging to developers who work with C++ on a daily basis and may
>> have some good thoughts about how the language might best evolve to
>> meet their real needs.  At any rate, you have certainly succeeded in
>> discouraging me.  I have no intention of participating further in this
>> discussion.  I've already wasted enough time that was clearly not
>> appreciated.
>>
>> Thanks.  You win.

The strong signal I've been getting from this thread is that there's
only one reasonable outcome -- the C++ Committee should get their butts
in gear and do this obvious right thing. Well, our mileage may vary.

> Much of the discussion here has consisted of incorrect arguments
> imputing results that simply cannot be supported by the facts, or
> suggestion of additional changes that would impact existing code or
> practices.

That's your opinion. And yet, there seem to be groups of people who
have formed a shared world view that differs from yours. Strange.

>         I was extremely careful to suggest ideas that could be
> implemented in such a way as to have zero impact on existing code or
> practices, yet resistance to supposed changes required to existing
> code or practice has been the largest component of the counter
> arguments.

Also your opinion. I, and others, have expressed opposition for other
reasons.

> I enjoy reasonable discussion of alternatives, pros and cons, etc.,
> and I was eager to see enlightened technical arguments on the merits
> of my proposal.  So much of the argument has been misdirected that the
> hoped for technical merit discussion has been largely obscured.

Why do I get the feeling that ``enlightened technical arguments on the
merits'' should include more praise and support than you've garnered?

> I have yet to see any substantive technical argument against this
> proposal that was not completely rebutted.

I've *never* seen an argument about anything as abstract as a programming
language that was completely rebutted to everyone's satisfaction. That
doesn't make any of the arguments, pro or con, dead right or dead wrong.

>                                           Of course, it is pretty
> much impossible to rebut an argument that is simply incorrect when the
> presenter of that argument is not willing to listen, but there is
> nothing to be done about that.

Now *that* I will agree with.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com



---

Author: rjcox@cix-remove-me-.co.uk (Richard J Cox)
Date: Wed, 10 Jul 2002 16:31:11 GMT Raw View

In article <3D2A1F03.8F1E5341@acm.org>, petebecker@acm.org (Pete Becker)
wrote:

> Randy Maddox wrote:
> >
> > Here is a more fundamental question:  Why should the stdlib care about
> > this?  Is it not a question of how the OS deals with this?
>
> What should the name L"aebc" mean when an application is compiled for an
> OS that doesn't support file names with character sizes corresponding to
> wchar_t?
>

What should "aebc" mean as a filename? Or rather where is it defined what
it means as a filename (since it is certainly usable now on many systems)?


--
rjcox at cix dot co dot uk

---

Author: pdimov@mmltd.net (Peter Dimov)
Date: Wed, 10 Jul 2002 17:13:54 GMT Raw View

Pete Becker <petebecker@acm.org> wrote in message news:<3D2B2DC8.C2C2A633@acm.org>...
> Peter Dimov wrote:
> >
> > Pete Becker <petebecker@acm.org> wrote in message news:<3D25CB23.9772D5A5@acm.org>...
> > > Randy Maddox wrote:
> > > >
> > > > Now, you have repeatedly stated that these ideas have been considered
> > > > in the past and rejected as bad ideas.  OK.  I can accept that, but
> > > > could you please share the basis for that rejection?  It's rather
> > > > difficult to attempt any refutation without knowing what the technical
> > > > arguments are.
> > > >
> > >
> > > Once again, in miniature: when do "aebc", L"aebc", and mbstowcs("aebc")
> > > [roughly speaking] name the same file? Now generalize.
> >
> > I never understood that argument. When do "abcdefghi", "Abcdefghi",
> > "abcdefgh", "abcdefghi.", "./abcdefghi", ".\\abcdefghi" name the same
> > file?
> >
>
> False analogy. "There are some things that vary" isn't the same as
> "everything can vary."

I'm trying to understand. The meaning of a 'char' file name is
implementation defined. This is acceptable. The behavior of a
'wchar_t' file name is implementation defined. This is unacceptable.
Why?

> > Even better, when do "aebc" and "aebc" name the same file?
>
> Yes, much better. When run in applications on the same system that refer
> to the same directory they always name the same file. Unless, of course,
> you take the nihilistic view that since the standard doesn't say how the
> names should be mapped it's impossible to talk about what file names
> mean. Common experience indicates otherwise. On the other hand, there is
> no common experience to indicate how any non-trivial wide character
> string maps to the name of a file.

One might argue that unless there is a standard wchar_t based fopen()
that common experience will never materialize. Why should it?

> Further, "aebc" contains the same characters without regard to OS,
> compiler, or locale. L"aebc", on the other hand, is an array of wide
> characters whose contents are implementation-defined, possibly depending
> on the locale that was set on the system at the time the code was
> compiled. So when you say ofstream out(L"aebc") you may be using a
> different name from the name used in ifstream in(L"aebc") in an
> application compiled with a different compiler.

Yes, but this is a general problem with L"literals", not file names.
There's almost never a reason to use a wide literal as a file name,
unless you are willing to accept the L"nonportability"; you could have
chosen a narrow literal just as well. Wide file names are typically
used to name an existing file, and there is no portability problem
with that. A C++ friendly file system would be able to map UCS-?? file
names to UTF-8 and return an NTBS, but existing practice shows that
this is rarely the case.
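
For what it's worth, the mapping itself is small. The sketch below is
only an illustration under a stated assumption: each wchar_t holds one
Unicode code point (UCS-4 style), which the standard does not guarantee;
a 16-bit wchar_t carrying surrogate pairs is not handled and comes out
as '?'. The function name to_utf8 is made up for this example.

```cpp
#include <cassert>
#include <string>

// Encodes a wide string as UTF-8, assuming one Unicode code point per wchar_t.
// Surrogate values and anything above U+10FFFF are replaced by '?'.
std::string to_utf8(const std::wstring& in) {
    std::string out;
    for (std::wstring::size_type i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        if (cp < 0x80) {                          // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                  // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp >= 0xD800 && cp <= 0xDFFF) { // lone surrogate: not a character
            out += '?';
        } else if (cp < 0x10000) {                // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x110000) {               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                  // outside the Unicode range
            out += '?';
        }
    }
    assert(out.size() >= in.size());  // every input unit yields at least one byte
    return out;
}
```

The result's c_str() is an ordinary NTBS that could be handed to the
existing narrow fopen() or ifstream constructor; whether the file system
then interprets those bytes as UTF-8 is a separate, platform-specific
question.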

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Wed, 10 Jul 2002 12:19:52 CST Raw View

beman_d@yahoo.com (Beman Dawes) wrote in message news:<fbc37fef.0207081738.2e2a1af5@posting.google.com>...
> rmaddox@isicns.com (Randy Maddox) wrote in message news:<8c8b368d.0207080559.548a5e9e@posting.google.com>...

> When the standard library provides a function with behavior which is
> entirely implementation defined, an illusion of portability has been
> created where no real portability exists.

But isn't the file system itself implementation defined?  What about
OS-specific file permissions?  I might have a perfectly legal file
name and not be able to open it because I don't have read permission.
This just causes the open to fail.

>
> But in the case of wide character file names on systems which don't
> support them, there is no well known, agreed upon behavior.

I still have not seen any argument as to why the library has to worry
about this.  If the underlying file system doesn't support wide
character file names then any attempt to use them will fail.  No one
is trying to suggest that the library should provide support that is
not provided by the underlying file system.
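
To show the kind of behavior I have in mind, here is a sketch only. The
helper open_wide is hypothetical, and converting through std::wcstombs
in the current C locale is just one possible implementation choice, not
a claim about what any library does. If the name cannot be represented,
or the file system rejects it, the open simply fails like any other.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Tries to open a file named by a wide string using only narrow facilities.
// Returns false, with failbit set on the stream, if the name cannot be
// converted in the current locale or if the underlying open fails.
bool open_wide(std::ifstream& in, const std::wstring& name) {
    // Worst case: every wide character needs MB_CUR_MAX bytes, plus the null.
    std::vector<char> buf(name.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(&buf[0], name.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) {
        in.setstate(std::ios::failbit);  // unrepresentable name: plain failure
        return false;
    }
    assert(n < buf.size());              // room was reserved for the terminator
    buf[n] = '\0';
    in.open(&buf[0]);                    // may still fail, as with any name
    return in.is_open();
}
```

The caller checks the result exactly as for a narrow name that does not
exist or cannot be read.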

Randy.

---

Author: Pete Becker <petebecker@acm.org>
Date: Thu, 11 Jul 2002 00:22:54 GMT Raw View

Peter Dimov wrote:
>
> Pete Becker <petebecker@acm.org> wrote in message news:<3D2B2DC8.C2C2A633@acm.org>...
> > Peter Dimov wrote:
> > >
> > > Pete Becker <petebecker@acm.org> wrote in message news:<3D25CB23.9772D5A5@acm.org>...
> > > > Randy Maddox wrote:
> > > > >
> > > > > Now, you have repeatedly stated that these ideas have been considered
> > > > > in the past and rejected as bad ideas.  OK.  I can accept that, but
> > > > > could you please share the basis for that rejection?  It's rather
> > > > > difficult to attempt any refutation without knowing what the technical
> > > > > arguments are.
> > > > >
> > > >
> > > > Once again, in miniature: when do "aebc", L"aebc", and mbstowcs("aebc")
> > > > [roughly speaking] name the same file? Now generalize.
> > >
> > > I never understood that argument. When do "abcdefghi", "Abcdefghi",
> > > "abcdefgh", "abcdefghi.", "./abcdefghi", ".\\abcdefghi" name the same
> > > file?
> > >
> >
> > False analogy. "There are some things that vary" isn't the same as
> > "everything can vary."
>
> I'm trying to understand. The meaning of a 'char' file name is
> implementation defined. This is acceptable. The behavior of a
> 'wchar_t' file name is implementation defined. This is unacceptable.
> Why?

Because the context is different. Character-based file names are well
understood and universally supported. Neither is true of wide character
filenames.

>
> > > Even better, when do "aebc" and "aebc" name the same file?
> >
> > Yes, much better. When run in applications on the same system that refer
> > to the same directory they always name the same file. Unless, of course,
> > you take the nihilistic view that since the standard doesn't say how the
> > names should be mapped it's impossible to talk about what file names
> > mean. Common experience indicates otherwise. On the other hand, there is
> > no common experience to indicate how any non-trivial wide character
> > string maps to the name of a file.
>
> One might argue that unless there is a standard wchar_t based fopen()
> that common experience will never materialize. Why should it?

That's not a standards issue. It's not the purpose of the C++ standard
to provide an experimental framework for cool new ideas. Those should be
explored in other places.

>
> > Further, "aebc" contains the same characters without regard to OS,
> > compiler, or locale. L"aebc", on the other hand, is an array of wide
> > characters whose contents are implementation-defined, possibly depending
> > on the locale that was set on the system at the time the code was
> > compiled. So when you say ofstream out(L"aebc") you may be using a
> > different name from the name used in ifstream in(L"aebc") in an
> > application compiled with a different compiler.
>
> Yes, but this is a general problem with L"literals", not file names.

Literals are one way of creating wide character strings which can be
used as file names. The translation issues that they present become much
more acute in this context.

> There's almost never a reason to use a wide literal as a file name,
> unless you are willing to accept the L"nonportability"; you could have
> chosen a narrow literal just as well.

Whether there's a reason, of course, has very little to do with whether
programmers will do it.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: Allan_W@my-dejanews.com (Allan W)
Date: Thu, 11 Jul 2002 13:46:20 GMT

> Pete Becker <petebecker@acm.org> wrote in message
> > Once again, in miniature: when do "aebc", L"aebc", and mbstowcs("aebc")
> > [roughly speaking] name the same file? Now generalize.

rmaddox@isicns.com (Randy Maddox) wrote
> Here is a more fundamental question:  Why should the stdlib care about
> this?  Is it not a question of how the OS deals with this?  Under
> Windows if I have a file named "aebc" I can ask for it as "aebc" or
> "AEBC" or even "AeBc" and I get the same file.  Under Unix I can only
> ask for it as "aebc".  I don't expect my code to work the same on
> Windows and Unix, so why should I not expect some differences between
> an OS that supports wide-character file names and one that does not?
> I would certainly not expect the library to hide these differences,
> just as I don't expect it to hide the difference in case handling in
> file names between Windows and Unix.

Name an OS that uses both Unicode and non-Unicode characters in
filenames. Is there such a beast? If not, then why would we need two
versions of functions that name files?


Author: Michiel.Salters@cmg.nl (Michiel Salters)
Date: Thu, 11 Jul 2002 13:46:33 GMT

Pete Becker <petebecker@acm.org> wrote in message news:<3D2A1F03.8F1E5341@acm.org>...
> Randy Maddox wrote:
> >
> > Here is a more fundamental question:  Why should the stdlib care about
> > this?  Is it not a question of how the OS deals with this?
>
> What should the name L"aebc" mean when an application is compiled for an
> OS that doesn't support file names with character sizes corresponding to
> wchar_t?

Well, something similar to "aebc", i.e. whatever the implementation
likes. What does "aebc" mean when an application is compiled for
an OS that doesn't support file names with character sizes
corresponding to /char/ ?

I think the only requirement that should be imposed is that identical
filenames denote the same file, if the application doesn't switch
locales or restart between the two uses of that name. That is,
L"aebc" is the same file as L"aebc", and "aebc" equals "aebc" if
the locale isn't changed. It would remain up to the implementation
whether "aebc" equals "aebc:", "Aebc", L"aebc", ".\aebc" and by
the same logic L"aebc" might or might not refer to the same file as
L"Aebc" etc.

For all I care a legal implementation of wchar_t file names on
systems supporting only ASCII filenames natively would be to map
all Unicode names to "Unicode_not_supported.txt". The point is that
if there are a number of similar reasonable implementations we
can leave the implementation to QoI and specify only the interface.

Regards,
--
Michiel Salters


Author: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: Thu, 11 Jul 2002 13:46:27 GMT

Pete Becker <petebecker@acm.org> writes:

> Yes, much better. When run in applications on the same system that refer
> to the same directory they always name the same file. Unless, of course,
> you take the nihilistic view that since the standard doesn't say how the
> names should be mapped it's impossible to talk about what file names
> mean. Common experience indicates otherwise. On the other hand, there is
> no common experience to indicate how any non-trivial wide character
> string maps to the name of a file.

There sure is. Some systems (notably Windows NT) do support arbitrary
wide character file names (with usual limitations that some characters
cannot be used in a file name); on those systems, the same wide string
always refers to the same file - independent of the locale.

It is the narrow character strings whose interpretation varies with
locale on such systems.

> Further, "aebc" contains the same characters without regard to OS,
> compiler, or locale.

That does not mean that those characters always name the same file; on
many systems, file names are byte strings, not character strings.

> mbstowcs, of course, maps NTBS's to wide character arrays in accordance
> with the locale that is set at the time the function is called. So the
> name you get with it can change during execution of a single
> application, and you may find that you can't read the data that you
> wrote earlier.

The same may happen with a narrow character string.

Regards,
Martin


Author: pdimov@mmltd.net (Peter Dimov)
Date: Thu, 11 Jul 2002 13:46:40 GMT

beman_d@yahoo.com (Beman Dawes) wrote in message news:<fbc37fef.0207081738.2e2a1af5@posting.google.com>...
> rmaddox@isicns.com (Randy Maddox) wrote in message news:<8c8b368d.0207080559.548a5e9e@posting.google.com>...
> >
> > > Once again, in miniature: when do "aebc", L"aebc", and mbstowcs("aebc")
> > > [roughly speaking] name the same file? Now generalize.
> > >
> >
> > Here is a more fundamental question:  Why should the stdlib care about
> > this?
>
> When the standard library provides a function with behavior which is
> entirely implementation defined, an illusion of portability has been
> created where no real portability exists.

But the meaning of a file name _is_ already entirely implementation defined.


Author: rmaddox@isicns.com (Randy Maddox)
Date: Thu, 11 Jul 2002 13:47:01 GMT

Pete Becker <petebecker@acm.org> wrote in message news:<3D2B372B.1B5CD755@acm.org>...
> >
> > You are saying that the conversion to and from narrow character strings and
> > wide character strings is affected by the locale. This is known. But it is a
> > programmer's problem, not the language or library.
>
> That's one view. If you can't read your data because the locale changed
> it's not very comforting.

Even if the locale changes you should still be able to read your data
by using the original file name.  Again, changing locales changes a
lot of things, and to expect to be able to generate a new file name
under a different locale and have it match a file name generated under
a different locale is simply not reasonable.  Plus which, forget about
file names, if the locale changes how do you know you will even be
able to understand the data that was written under another locale?

Here's an analogy:  You would not expect the translation of an English
word into French to match the translation of the same English word
into German, so why would you expect equivalent behavior for file
names?

> Once again: implement it and get some real-world experience with it.
>
> That is one view. Implement it, use it, and see if you still believe it.
>

You're dropping back into your previous habit of not refuting the
argument but rather changing the subject by making a suggestion which
is intended only to discourage the commenter.  Does this indicate that
your technical arguments are too weak to stand on their own?

Randy.


Author: rmaddox@isicns.com (Randy Maddox)
Date: Thu, 11 Jul 2002 13:47:04 GMT

Pete Becker <petebecker@acm.org> wrote in message news:<3D2B2DC8.C2C2A633@acm.org>...
> Peter Dimov wrote:
> >
> > Pete Becker <petebecker@acm.org> wrote in message news:<3D25CB23.9772D5A5@acm.org>...

> mbstowcs, of course, maps NTBS's to wide character arrays in accordance
> with the locale that is set at the time the function is called. So the
> name you get with it can change during execution of a single
> application, and you may find that you can't read the data that you
> wrote earlier.
>
> --
> Pete Becker
> Dinkumware, Ltd. (http://www.dinkumware.com)
>

Absolutely correct!  So given that you don't necessarily expect to be
able to read the data written under a different locale, how can you
object when we say that you should likewise not expect to be able to
use a file name translated under a different locale?  Locale
changes directly imply that everything character related may change.
Why should file names be any different?

Randy.


Author: rmaddox@isicns.com (Randy Maddox)
Date: Thu, 11 Jul 2002 13:47:34 GMT

Since the thread here is getting so deeply nested, let's see if I can
summarize the arguments again.

First, there no longer seems to be any argument about wide character
support in template what arguments to be provided by a templatized base
class along the lines of what was done for strings with basic_string,
string and wstring.  Reusing a previous solution seems like a good idea,
and this would have zero impact on existing code.  Those who need and
want wide character exception what arguments could use exceptions
derived from wexception, while those who don't would merely continue to
use exceptions derived from exception.  The entire std::exception
hierarchy would remain exactly as is.
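
As an illustration only, here is a minimal sketch of what a templatized
base along the lines of basic_string might look like.  The name
basic_exception, the constructor, and the use of C++11 noexcept are all
assumptions made for this sketch; nothing like it exists in the standard
library, and the thread does not specify a design.

```cpp
#include <string>

// Hypothetical class template, modelled loosely on basic_string.
// It is deliberately unrelated to std::exception, so the existing
// std::exception hierarchy is left untouched.
template <class CharT>
class basic_exception {
public:
    explicit basic_exception(const std::basic_string<CharT>& msg)
        : msg_(msg) {}
    virtual ~basic_exception() {}

    // Same role as std::exception::what(), but in the chosen character type.
    virtual const CharT* what() const noexcept { return msg_.c_str(); }

private:
    std::basic_string<CharT> msg_;
};

// Assumed typedef, by analogy with wstring.
typedef basic_exception<wchar_t> wexception;
```

Code that wants wide messages would throw and catch wexception (or a
class derived from it); code that does not would keep using
std::exception unchanged.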

I may be wrong about that discussion being over, but no one seems to be
saying anything about it anymore.  Instead, the discussion has centered
on wide character file names.

Here the main objection seems to be related to the following scenario
provided by Pete Becker.

1. Start with a file name string and a locale.
2. Translate that file name string in a locale dependent way.
3. Use the translated string to open a file.
4. Write to the file, then close it.
5. Change to a different locale.
6. Translate the original file name string in a locale dependent way.
7. Try to use the newly translated string to open the same file.

The issue here is that the second open may fail because the OS may not
be able to translate the new name to match the name the file was
originally opened with.  However, Pete Becker also pointed out that even
if the open succeeds, we may not be able to read the data in the file
anymore since it was written under a different locale.
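
The translation in steps 2 and 6 of the scenario above can be sketched
as follows.  This is only an illustration, assuming the translation is
done with mbstowcs; the helper name widen_in_current_locale is
hypothetical and not taken from the thread.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: widen a narrow name using whatever C locale is
// in effect at the time of the call (steps 2 and 6 of the scenario).
std::wstring widen_in_current_locale(const std::string& narrow)
{
    // A multibyte string never yields more wide characters than bytes,
    // so size() + 1 leaves room for the terminating null.
    std::vector<wchar_t> buf(narrow.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), narrow.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return std::wstring();   // invalid sequence in the current locale
    return std::wstring(buf.data(), n);
}
```

Calling this helper twice without changing the locale gives the same
wide string both times.  Calling it once, then switching LC_CTYPE with
std::setlocale and calling it again, may give a different wide string
for names that contain bytes outside the basic character set, which is
the situation step 7 runs into.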

None of that can be disagreed with, but my counter-argument is that,
given that we cannot expect to read the data under different locales,
why should we expect to be able to open the same file with names
translated under different locales?  The latter expectation is
inconsistent and should not, IMHO, be supported.  Stating the obvious
fact that these expectations are not consistent with each other is an
argument that supports that position, rather than an argument against
wide character file name support in general.

A secondary objection revolves around the issue of file systems that do
not support wide character file names.  Since what constitutes a valid
file name, as well as what actual file a valid file name is mapped to,
are both already implementation defined I don't see the problem here.
If the OS does not support wide character file names then, by
definition, no wide character file name is a valid file name in that
implementation, unless the OS happens to provide a translation
mechanism.  In any case, that is an implementation issue and not a
library issue.  If the open succeeds it succeeds.  If it fails, it
fails.

Coupled with these issues is the claim that providing wide character
file name support in the stdlib provides an illusion of portability that
does not exist.  IMHO, this is not a valid argument unless you also want
to argue that any wide character support in C++ provides the same
illusion.  The fact that we can use different character sets and locales
in C++ does not imply that we can use any character set and locale and
still expect to be able to read and write files, or even screen output,
that can be understood under every character set and locale combination.
Again, stating the obvious fact that everything character related may
change between different character sets and locales is not a valid
argument against providing support for different character sets and
locales.  Instead it is an argument that developers who use such support
must be careful about how they do so.

A major part of the philosophy behind C++ is that it should allow the
programmer the freedom to do what is necessary in a natural way.  The
language should not prevent the programmer from accomplishing those
tasks necessary to deliver what is needed.

The flip side of power is always responsibility to use that power
correctly.  Since C++ does not needlessly restrict what programmers may
do, it also allows the programmer to shoot him/herself in the foot.
Support for different character sets and locales is no different.  If we
accept as valid the argument that programmers can get into trouble using
these features, then what other unsafe features should we eliminate?
Should we get rid of pointers because they are a source of many program
errors?  I don't think anyone (other than Java developers :-)) will make
that argument.  So why should the fact that wide character file name
support might also lead to program errors be an argument against
providing that support?

Randy.


Author: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: Thu, 11 Jul 2002 13:48:21 GMT

Pete Becker <petebecker@acm.org> writes:

> Because the context is different. Character-based file names are well
> understood and universally supported.

That is not true. Character-based file names are misunderstood, and
byte-based file names is what is universally supported. In most
systems using byte-based file names, the meaning of a file name (in
terms of characters) can vary with the locale.

Regards,
Martin


Author: Pete Becker <petebecker@acm.org>
Date: Thu, 11 Jul 2002 13:47:58 GMT

Richard J Cox wrote:
>
> > Nope. "abc\\def" means the same thing at any time during the running of
> > an application. mbstowcs("aebc") can change when you change locales.
>
> Not as a filename, or rather if used as a filename it is implementation
> defined whether it means anything whatsoever.

Sigh. The contents of the string "abc\\def" do not change from OS to OS,
from compiler to compiler, or from locale to locale. The same cannot be
said of L"aebc" nor of mbstowcs("aebc"). If you want to open the same
file it's much easier if you can be fairly sure that you're using the
same name.

>
> I.e. where do the standards (C or C++) say that the two calls to fopen
> here
>
>   FILE* f = fopen("a", "r");
>   fclose(f);
>   f = fopen("a", "r");

See my reply to Peter Dimov, who raised the same red herring.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: Pete Becker <petebecker@acm.org>
Date: Thu, 11 Jul 2002 13:48:11 GMT

Randy Maddox wrote:
>
> If the OS doesn't support it then it should mean nothing except that
> trying to use it to open a file won't work.  That seems simple enough.
>

In fact it's beyond simple enough: it's too simple. It means that you
cannot write reasonably portable code that uses it.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: kanze@gabi-soft.de (James Kanze)
Date: Thu, 11 Jul 2002 15:26:39 GMT

pdimov@mmltd.net (Peter Dimov) wrote in message
news:<7dc3b1ea.0207100355.46120119@posting.google.com>...

> I'm trying to understand. The meaning of a 'char' file name is
> implementation defined. This is acceptable. The behavior of a
> 'wchar_t' file name is implementation defined. This is unacceptable.
> Why?

Because although the standard leaves it implementation defined, we
know more or less what to expect from an implementation of a
reasonable quality on a typical OS.  (The "implementation defined", in
this case, is to make allowances for untypical OS's.)

When we standardize something as "implementation defined", that
doesn't mean that we shouldn't have the slightest idea of what it
might look like on, say, Unix or Windows, even if the standard doesn't
want to impose that behavior on all machines.  In this case, we don't
have the slightest idea what a Unix implementation should do; every
time the question comes up, there are several posters who point out
"obvious" semantics -- but the obvious semantics are different for
each of the posters.  To me, this is a proof that we don't yet know.

> > > Even better, when do "aebc" and "aebc" name the same file?

> > Yes, much better. When run in applications on the same system that
> > refer to the same directory they always name the same
> > file. Unless, of course, you take the nihilistic view that since
> > the standard doesn't say how the names should be mapped it's
> > impossible to talk about what file names mean. Common experience
> > indicates otherwise. On the other hand, there is no common
> > experience to indicate how any non-trivial wide character string
> > maps to the name of a file.

> One might argue that unless there is a standard wchar_t based
> fopen() that common experience will never materialize. Why should
> it?

Good question.  Why should it?  If there is no need for it, or the
need is so slight that no one is willing to invest the time and money
to implement it unless the standard mandates it, maybe we are better
off without it.  It's not as if C++ suffered from being too small and
too simple.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung


Author: Pete Becker <petebecker@acm.org>
Date: Thu, 11 Jul 2002 16:51:45 GMT

"Martin v. Löwis" wrote:
>
> In most
> systems using byte-based file names, the meaning of a file name (in
> terms of characters) can vary with the locale.
>

There is a well-understood set of rules for creating portable names.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: Pete Becker <petebecker@acm.org>
Date: Thu, 11 Jul 2002 16:51:54 GMT

Peter Dimov wrote:
>
> beman_d@yahoo.com (Beman Dawes) wrote in message news:<fbc37fef.0207081738.2e2a1af5@posting.google.com>...
> > rmaddox@isicns.com (Randy Maddox) wrote in message news:<8c8b368d.0207080559.548a5e9e@posting.google.com>...
> > >
> > > > Once again, in miniature: when do "aebc", L"aebc", and mbstowcs("aebc")
> > > > [roughly speaking] name the same file? Now generalize.
> > > >
> > >
> > > Here is a more fundamental question:  Why should the stdlib care about
> > > this?
> >
> > When the standard library provides a function with behavior which is
> > entirely implementation defined, an illusion of portability has been
> > created where no real portability exists.
>
> But the meaning of a file name _is_ already entirely implementation defined.
>

Yes, Beman's comment was a little too narrow. There are well-understood
ways of writing char-based names that are portable, and that's part of
the context of the C/C++ rules for file names.

A good illustration of the difference between char-based file names and
wide-character-based file names is the number of messages in this thread
that have given examples of non-portable char-based names, and the lack
of messages (other than mine) that have given examples of non-portable
wide-character-based names. There's a shared context for char-based
names; there is none for wide-character-based names.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: Pete Becker <petebecker@acm.org>
Date: Thu, 11 Jul 2002 16:52:03 GMT

"Martin v. Löwis" wrote:
>
> Pete Becker <petebecker@acm.org> writes:
>
> > Yes, much better. When run in applications on the same system that refer
> > to the same directory they always name the same file. Unless, of course,
> > you take the nihilistic view that since the standard doesn't say how the
> > names should be mapped it's impossible to talk about what file names
> > mean. Common experience indicates otherwise. On the other hand, there is
> > no common experience to indicate how any non-trivial wide character
> > string maps to the name of a file.
>
> There sure is. Some systems (notably Windows NT) do support arbitrary
> wide character file names (with usual limitations that some characters
> cannot be used in a file name); on those systems, the same wide string
> always refers to the same file - independent of the locale.

That's experience, but it's system-specific, not common.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: Pete Becker <petebecker@acm.org>
Date: Thu, 11 Jul 2002 17:49:49 GMT

Michiel Salters wrote:
>
> Pete Becker <petebecker@acm.org> wrote in message news:<3D2A1F03.8F1E5341@acm.org>...
> > Randy Maddox wrote:
> > >
> > > Here is a more fundamental question:  Why should the stdlib care about
> > > this?  Is it not a question of how the OS deals with this?
> >
> > What should the name L"aebc" mean when an application is compiled for an
> > OS that doesn't support file names with character sizes corresponding to
> > wchar_t?
>
> Well, something similar to "aebc", i.e. whatever the implementation
> likes. What does "aebc" mean when an application is compiled for
> an OS that doesn't support file names with character sizes
> corresponding to /char/ ?

I don't know. What is your experience with such systems?

>
> I think the only requirement that should be imposed is that identical
> filenames denote the same file, if the application doesn't switch
> locales or restart between the two uses of that name. That is,
> L"aebc" is the same file as L"aebc", and "aebc" equals "aebc" if
> the locale isn't changed. It would remain up to the implementation
> whether "aebc" equals "aebc:", "Aebc", L"aebc", ".\aebc" and by
> the same logic L"aebc" might or might not refer to the same file as
> L"Aebc" etc.

Maybe. Implement it and get some real-world experience with it. If you
still think those are good rules, write it up and propose it.

>
> For all I care a legal implementation of wchar_t file names on
> systems supporting only ASCII filenames natively would be to map
> all Unicode names to "Unicode_not_supported.txt". The point is that
> if there are a number of similar reasonable implementations we
> can leave the implementation to QoI and specify only the interface.
>

Fine. Produce one of those implementations so that people can judge its
quality.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: rjcox@cix-remove-me-.co.uk (Richard J Cox)
Date: Thu, 11 Jul 2002 19:03:45 GMT

In article <3D2B2DC8.C2C2A633@acm.org>, petebecker@acm.org (Pete Becker)
wrote:

> Further, "aebc" contains the same characters without regard to OS,
> compiler, or locale.

Does it? What about ASCII vs. EBCDIC? (modulo my memory of the
abbreviations :-) (Given filenames will always need to go beyond the
bounds of C++'s implementation character set and into the OS' current
character set).

> L"aebc", on the other hand, is an array of wide
> characters whose contents are implementation-defined, possibly depending
> on the locale that was set on the system at the time the code was
> compiled.

Perhaps this is part of the problem: C++ doesn't actually define
sufficient semantics for wide characters to be able to really start
thinking about narrowing operations (or vice versa).

Perhaps rather than trying to specify what L"A" means as a filename, a
step back is needed to ask what the relationship between the semantics of
L"a" and "a" is when moving to the host environment.

PS. For an implementation of wide character support in filenames, I refer
you to MS' implementation of _tfopen (see open.c in the standard lib
source). When _UNICODE is defined this takes wide character arguments.

(To an extent this implementation is cheating: it does not need to
narrow. If narrowing is necessary (not on NTFS), the OS API (CreateFileW)
will do it. However, it does show an implementation that has been around
for quite a while.)
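
As a rough sketch of the two strategies mentioned here (pass the wide
name straight through where the platform accepts it, otherwise narrow
it), one might write something like the following.  The function name
open_wide is hypothetical; _wfopen is the Microsoft runtime's wide
counterpart of fopen, and the non-Windows branch simply assumes that
narrowing with wcstombs under the current locale is acceptable, which is
exactly the point under dispute in this thread.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical wrapper, not part of any standard library.
std::FILE* open_wide(const std::wstring& name, const char* mode)
{
#ifdef _WIN32
    // Microsoft runtime: no narrowing needed, hand the wide name over.
    std::wstring wmode(mode, mode + std::strlen(mode));
    return _wfopen(name.c_str(), wmode.c_str());
#else
    // Elsewhere (assumption): narrow the name in the current C locale.
    std::vector<char> buf(name.size() * MB_CUR_MAX + 1);
    std::size_t n = std::wcstombs(buf.data(), name.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return nullptr;          // name not representable in this locale
    return std::fopen(buf.data(), mode);
#endif
}
```

On the narrowing branch the file actually opened depends on the locale
in effect at the time of the call, so the same wide name can map to
different byte strings at different times.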

--
rjcox at cix dot co dot uk


Author: rmaddox@isicns.com (Randy Maddox)
Date: Thu, 11 Jul 2002 19:04:04 GMT

Allan_W@my-dejanews.com (Allan W) wrote in message news:<23b84d65.0207091445.72e78f3d@posting.google.com>...
> > Pete Becker <petebecker@acm.org> wrote in message
> > > Once again, in miniature: when do "aebc", L"aebc", and mbstowcs("aebc")
> > > [roughly speaking] name the same file? Now generalize.
>
> Name an OS that uses both Unicode and non-Unicode characters in
> filenames. Is there such a beast? If not, then why would we need two
> versions of functions that name files?
>

I don't know of any OS that mixes both Unicode and non-Unicode
characters in the same file name, nor do I expect such to exist since
that really would make no sense.  However, Windows NT and its
derivatives, such as XP and 2000, all use Unicode internally, which
means that when you open a file with a char string it is translated to
Unicode internally, and that translation is almost certainly dependent
on the current locale.  So providing both would allow code that
already exists to continue to work as it does, but would also allow
the use of wide character file names as NT uses internally anyway.

Randy.


Author: Daniel Miller <daniel.miller@tellabs.com>
Date: Thu, 11 Jul 2002 19:05:23 GMT

James Kanze wrote:

> pdimov@mmltd.net (Peter Dimov) wrote in message
[...snip...]
>>One might argue that unless there is a standard wchar_t based
>>fopen() that common experience will never materialize. Why should
>>it?
>>
>
> Good question.  Why should it?  If there is no need for it, or the
> need is so slight that no one is willing to invest the time and money
> to implement it unless the standard mandates it, maybe we are better
> off without it.  It's not as if C++ suffered from being too small and
> too simple.

   Assume in coming months or years that at least one person in the
world invests the time & money to build out a
proposal-for-standardization & implementation.  Rather than relying only
on off-the-cuff divergent hearsay in newsgroups, that person would want
to rely more heavily on an exhaustive survey-research of how all popular
modern operating systems work with wide-character filesystem identifiers
(where "popular modern" is defined very generously, e.g., at least:
Windows variants; SGI IRIX; HP-UX; Tru64; OpenVMS; Caldera OpenUnix 8=
System V Release 5, the successor to UnixWare; Solaris; Linux
(variants); AIX; MacOS-X; BSD; OS/390; PalmOS; Plan 9; any
wide-character-supporting internationalized RTOSes (do any exist?);
emerging/forthcoming OSes).

   I suspect that such a survey would expose more convergence &
commonality than divergence & dispute regarding wide/multibyte
characters in general and Unicode in particular.  (But I could be wrong,
since I have neither written nor read such an exhaustive survey; getting
an accurate existential-generalization depiction of reality is what the
exhaustive survey would be for.)

   I have immensely enjoyed this thread.  There is one component which I
have found lacking though: the pragmatic reality dictated by the real
world.  On the proponent side, I would request that the exhaustive
survey-research be performed to fortify the argument that at least one
form of sane commonality already exists as a fruit which is ripe for
standardization.  On the contrarian side, I would ask that those people
whose views appear to argue wholesale against standardizing
wide-character/MBCS/Unicode file-names at this time use concrete examples
from real-world operating systems today, showing that the wide-character
filesystem identifier topic is in the complete nonportable shambles that
is claimed and that all of the alternative roads lead to heck too.
Unlike the proponents who really need to perform the aforementioned
exhaustive survey-research of *all* modern wide-character-capable
operating systems to show enough commonality to standardize, the
contrarians need only show two scenarios  (maybe even on the same
real-world operating system) :
   1) between which portability is so effectively complicated/inhibited
as to preclude standardization of wide-character filesystem identifiers
any time soon and
   2) for which they prove that there exists absolutely no viable
alternative which does not have that complexity/inhibition.  This second
criterion is to eliminate the playing of spurious trump cards in an
attempt to cast aside the entire topic for no good reason.

   Discussing a topic without referring to real-world scenarios at all
permits a style of argumentation which can be purely obstructive, when
in fact the obstruction is not observable in reality.  For example, let
us assume that neither C nor C++ ever had either a ^ operator or a ^=
operator.  Discussing the potential standardization of some XOR proposal
might similarly divide into two viewpoints: proponents who want to see
XOR standardized soon and contrarians who want to at least hold off from
standardizing XOR or at most never standardize XOR.  The contrarians
could theoretically/hypothetically discuss thought experiments about
processors which do not implement XOR in their instruction set as
justification for their nonproponent viewpoint regarding
XOR-standardization.  The proponents could perform an exhaustive
survey-research of all processor instruction sets so far invented by
humankind, say, all 817 of them.

   The proponents would find that, say, 798 of the instruction sets had
a suitable XOR instruction and of the remaining 19, 15 were last sold
commercially in the early 1960s, and of the remaining four, three are
extremely unpopular/trivial platforms which are unlikely to have C++
(e.g., a child's MyFirstComputerBreadboard toy) and the remaining one
which is to support C++ has other instructions which can easily
implement XOR efficiently via multiple instructions.  Thus by
existential generalization the proponents of XOR operators could fortify
their argument that sufficient commonality exists on all modern
platforms to consider standardization.  (And if this is how the survey
were to turn out, the contrarians would be hard-pressed to find two
scenarios between which XOR portability would be complex/inhibited in
real-world platforms, making discussing their viewpoint using only
theoretical/hypothetical/fictional thought-experiments their preferred
comfortable style of argumentation.  Conversely, if the survey's results
were fabricated in the proponents' favor in some way, the contrarians
would have something concretely founded in reality to criticize without
relying solely on pure thought experiments.)

  In short, where there is a will (to model reality) there is a way (provided to us by reality).  (We merely need to discover/uncover/expose it.)

Author: Pete Becker <petebecker@acm.org>
Date: Thu, 11 Jul 2002 22:05:21 GMT Raw View

Richard J Cox wrote:
>
> In article <3D2B2DC8.C2C2A633@acm.org>, petebecker@acm.org (Pete Becker)
> wrote:
>
> > Further, "aebc" contains the same characters without regard to OS,
> > compiler, or locale.
>
> Does it? What about ASCII vs. EBCDIC?

Makes no difference: the array holds the same characters. "aebc"[1] ==
'e', regardless of the underlying encoding.
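
A small sketch of that point, with hypothetical function names: the comparison is made between characters of the execution character set, so it holds under any encoding, while only the numeric code behind the character varies.

```cpp
#include <cassert>

// Index 1 of the literal is the character 'e' in the execution character
// set, whatever numeric code that set assigns to it.
bool second_is_e() {
    return "aebc"[1] == 'e';
}

// The numeric code is the encoding-specific part: 0x65 under ASCII,
// 0x85 under EBCDIC. Code that compares against 'e' never sees the
// difference.
int code_of_second() {
    return static_cast<unsigned char>("aebc"[1]);
}
```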

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Author: "Edward Diener" <eldiener@earthlink.net>
Date: Fri, 12 Jul 2002 03:50:46 GMT Raw View

"Allan W" <Allan_W@my-dejanews.com> wrote in message
news:23b84d65.0207091445.72e78f3d@posting.google.com...
> > Pete Becker <petebecker@acm.org> wrote in message
> > > Once again, in miniature: when do "aebc", L"aebc", and mbstowcs("aebc")
> > > [roughly speaking] name the same file? Now generalize.
>
> rmaddox@isicns.com (Randy Maddox) wrote
> > Here is a more fundamental question:  Why should the stdlib care about
> > this?  Is it not a question of how the OS deals with this?  Under
> > Windows if I have a file named "aebc" I can ask for it as "aebc" or
> > "AEBC" or even "AeBc" and I get the same file.  Under Unix I can only
> > ask for it as "aebc".  I don't expect my code to work the same on
> > Windows and Unix, so why should I not expect some differences between
> > an OS that supports wide-character file names and one that does not?
> > I would certainly not expect the library to hide these differences,
> > just as I don't expect it to hide the difference in case handling in
> > file names between Windows and Unix.
>
> Name an OS that uses both Unicode and non-Unicode characters in
> filenames. Is there such a beast? If not, then why would we need two
> versions of functions that name files?

Microsoft Windows. Of course this is an unimportant and little-used OS of no
real consequence, but you asked for one so I dug one up for you.

Author: pdimov@mmltd.net (Peter Dimov)
Date: Fri, 12 Jul 2002 16:12:48 GMT Raw View

kanze@gabi-soft.de (James Kanze) wrote in message news:<d6651fb6.0207110615.23bb1c2b@posting.google.com>...
> pdimov@mmltd.net (Peter Dimov) wrote in message
> news:<7dc3b1ea.0207100355.46120119@posting.google.com>...
>
> > I'm trying to understand. The meaning of a 'char' file name is
> > implementation defined. This is acceptable. The behavior of a
> > 'wchar_t' file name is implementation defined. This is unacceptable.
> > Why?
>
> Because although the standard leaves it implementation defined, we
> know more or less what to expect from an implementation of a
> reasonable quality on a typical OS.  (The "implementation defined", in
> this case, is to make allowances for untypical OS's.)

Let me put it this way. A portable interface doesn't guarantee
portability. But the lack of a portable interface guarantees lack of
portability. Do we want that portability or not? If we do, it is our
responsibility to provide the portable interface.

> When we standardize something as "implementation defined", that
> doesn't mean that we shouldn't have the slightest idea of what it
> might look like on, say, Unix or Windows, even if the standard doesn't
> want to impose that behavior on all machines.

OSes exist (Windows NT) that store file names using wide characters.
Those OSes have to translate narrow file names. It's only fair to
provide a wfopen() that will require no translation.

> In this case, we don't
> have the slightest idea what a Unix implementation should do; every
> time the question comes up, there are several posters who point out
> "obvious" semantics -- but the obvious semantics are different for
> each of the posters.  To me, this is a proof that we don't yet know.

The Unix implementation should do whatever the market decides it
should do. In time, a de-facto standard will emerge. (If I had to
implement wfopen() for Unix, I'd simply use UTF-8.)
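
A rough sketch of the UTF-8 approach, under a loudly stated assumption: each `wchar_t` holds one Unicode code point, as on typical Unix implementations with a 32-bit `wchar_t`. The names `append_utf8`, `to_utf8` and `wfopen_sketch` are made up for illustration; this is not any vendor's `wfopen()`.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Encode one code point as UTF-8 and append the bytes to out.
void append_utf8(std::string& out, unsigned long cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Assumption: one wchar_t == one code point (not true where wchar_t is
// 16 bits and names may contain surrogate pairs).
std::string to_utf8(const wchar_t* wide) {
    std::string out;
    for (; *wide != L'\0'; ++wide)
        append_utf8(out, static_cast<unsigned long>(*wide));
    return out;
}

// Hypothetical wfopen() for a byte-oriented file system: the translation
// is fixed, so it does not change when the locale does.
std::FILE* wfopen_sketch(const wchar_t* name, const char* mode) {
    return std::fopen(to_utf8(name).c_str(), mode);
}
```

Because the mapping does not consult the locale, the same wide name always yields the same byte sequence, which sidesteps the "change locale, lose the file" scenario at the cost of assuming the file system's names are UTF-8.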

It doesn't matter that posters are unable to come up with a single
obvious definition. They never are.

> > One might argue that unless there is a standard wchar_t based
> > fopen() that common experience will never materialize. Why should
> > it?
>
> Good question.  Why should it?  If there is no need for it, or the
> need is so slight that no one is willing to invest the time and money
> to implement it unless the standard mandates it, maybe we are better
> off without it.

We're back to the original question. Do we need a portable way to open
a file with a wide character name? "No" is a legitimate answer, feel
free to defend it. :-)

On the other hand, if the answer is "yes," that portable way isn't
likely to appear without a std::wfopen. People will use _wfopen,
std::ifstream(FILE*), and whatever platform-specific ways are
available. You might argue that there's nothing wrong with that, and
this might be true, but this has absolutely nothing to do with the
non-argument that wfopen() is a terrible idea because of its
implementation defined semantics.

Author: kanze@gabi-soft.de (James Kanze)
Date: Fri, 12 Jul 2002 16:13:53 GMT Raw View

rmaddox@isicns.com (Randy Maddox) wrote in message
news:<8c8b368d.0207100638.225abf1e@posting.google.com>...
> Since the thread here is getting so deeply nested, let's see if I
> can summarize the arguments again.

> First, there no longer seems to be any argument about wide character
> support in template what arguments to be provided by a templatized
> base class along the lines of what was done for strings with
> basic_string, string and wstring.

Right.  That was a no-runner from the start (unless someone seriously
wants templated catch blocks).

> Reusing a previous solution seems like a good idea, and this would
> have zero impact on existing code.  Those who need and want wide
> character exception what arguments could use exceptions derived from
> wexception,

No they couldn't, because the standard library will throw exception,
and not wexception.

> while those who don't would merely continue to use exceptions
> derived from exception.  The entire std::exception hierarchy would
> remain exactly as is.

Right.  Everything will remain exactly as is, since there is neither a
need for nor a reasonable implementation of wide character exceptions.

Can you understand that those of us who actually work in an
international environment, have a need for wide characters in general,
and have practical experience with the problems involved, neither need
nor want wide character exceptions?

> I may be wrong about that discussion being over, but no one seems to
> be saying anything about it anymore.  Instead, the discussion has
> centered on wide character file names.

Right.  Because in this case, there is a reasonable need.

> Here the main objection seems to be related to the following scenario
> provided by Pete Becker.

> 1. Start with a file name string and a locale.
> 2. Translate that file name string in a locale dependent way.
> 3. Use the translated string to open a file.
> 4. Write to the file, then close it.
> 5. Change to a different locale.
> 6. Translate the original file name string in a locale dependent way.
> 7. Try to use the newly translated string to open the same file.

You seem to have misunderstood Pete's main objection.  It has nothing
to do with any particular scenario.  It is simply that for most of the
possible scenarios, several solutions are possible.  Only one is
correct, but without experience, we don't know which one.

The entire argument against wide character file names rests on a lack
of experience with them.  And no amount of discussion here is going to
create that experience, so no amount of discussion here will create a
movement for them.  I'm fully convinced of their utility, but I
haven't the slightest idea what their semantics should be.  And every
time I ask, there are several claims that the semantics are obvious.
Each claim with different "obvious" semantics.

There's an interesting article by James Gosling at
http://java.sun.com/people/jag/StandardsPhases/index.html, where he
discusses when something should be standardized.  (The article
precedes the development of Java, and it is interesting to consider
what it might mean in standardizing Java, but that isn't relevant
here.)  In the case of wide character file names, it is clear (at
least to me, who has to deal with multiple languages every day) that
we are still at a point where the technology has not been fully
understood.

While this hasn't prevented adding things to the standard in the past,
I hope we have learned a bit since then.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung

Author: pdimov@mmltd.net (Peter Dimov)
Date: Fri, 12 Jul 2002 11:55:32 CST Raw View

Pete Becker <petebecker@acm.org> wrote in message news:<3D2DB549.8E373619@acm.org>...
> "Martin v. Löwis" wrote:
> >
> > In most
> > systems using byte-based file names, the meaning of a file name (in
> > terms of characters) can vary with the locale.
> >
>
> There is a well-understood set of rules for creating portable names.

There sure is, but this doesn't help a C++ program that needs to open
an existing file with a non-portable name. When the OS uses wide
character file names, the same narrow character literal doesn't
necessarily name the same file.

When the input character type, in_char_t, and the OS character type,
os_char_t, are different, we encounter implementation defined
translation, with its associated issues. This is true for in_char_t ==
wchar_t and os_char_t == char, but it's equally true for in_char_t ==
char and os_char_t == wchar_t.
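
The symmetry can be written down as a small compile-time sketch. The trait name `needs_translation` is invented here; its two parameters play the roles of in_char_t and os_char_t from the paragraph above.

```cpp
#include <cassert>
#include <type_traits>

// In is the character type the program passes; Os is the character type
// the OS stores file names in. Translation is needed whenever they differ.
template <class In, class Os>
struct needs_translation
    : std::integral_constant<bool, !std::is_same<In, Os>::value> {};

// Wide-name OS (the Windows NT case): char input must be translated.
static_assert(needs_translation<char, wchar_t>::value, "char -> wchar_t");
// Byte-name OS: wchar_t input must be translated instead.
static_assert(needs_translation<wchar_t, char>::value, "wchar_t -> char");
// Matching types pass straight through.
static_assert(!needs_translation<char, char>::value, "no translation");
static_assert(!needs_translation<wchar_t, wchar_t>::value, "no translation");
```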

Author: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: Fri, 12 Jul 2002 18:35:39 GMT Raw View

Pete Becker <petebecker@acm.org> writes:

> > > What should the name L"aebc" mean when an application is compiled for an
> > > OS that doesn't support file names with character sizes corresponding to
> > > wchar_t?
> >
> > > Well, something similar to "aebc", i.e., whatever the implementation
> > > likes. What does "aebc" mean when an application is compiled for
> > > an OS that doesn't support file names with character sizes
> > > corresponding to /char/ ?
>
> I don't know. What is your experience with such systems?

In my experience, on such systems, the char file names are converted
to wchar file names, using mbstowcs (they call that
MultiByteToWideChar, as they need to pass an additional parameter
depending on the type of binary you are executing).

Regards,
Martin

Author: James Dennett <jdennett@acm.org>
Date: Fri, 12 Jul 2002 18:36:03 GMT Raw View

Randy Maddox wrote:
> Since the thread here is getting so deeply nested, let's see
> if I can summarize the arguments again.
>
> First, there no longer seems to be any argument about wide
> character support in template what arguments to be provided
> by a templatized base class along the lines of what was done
> for strings with basic_string, string and wstring.  Reusing
> a previous solution seems like a good idea, and this would
> have zero impact on existing code.  Those who need and want
> wide character exception what arguments could use exceptions
> derived from wexception, while those who don't would merely
> continue to use exceptions derived from exception.  The
> entire std::exception hierarchy would remain exactly as is.
>
> I may be wrong about that discussion being over, but no one
> seems to be saying anything about it anymore.

It seemed to me that the proposal "lost" -- there was no good
argument for adding anything like std::basic_exception, and
there are good reasons to avoid it, including
* it breaks the idiom of catching std::exception&, as that
   would no longer catch the whole standard exception hierarchy;
   hence, it does have impact on existing code in libraries in
   spite of the claim made above, because those libraries will
   be re-used in future and would require modification
* it involves significant amounts of work to write a proposal
   and to gather experience of use
* users are already free to do similar things
* it fails to meet the needs of internationalised applications
   anyway, as they would typically throw an exception with a
   message id if they needed text to be presented to a user,
   and for other text a narrow string generally suffices
* no evidence of widespread "prior art" has been shown, and
   standardisation is not supposed to be an inventive process.
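
A minimal sketch of the message-id approach mentioned in the list above. The class `localized_error`, the constant `MSG_FILE_NOT_FOUND` and the function `handle` are all hypothetical; the point is only that the type stays inside the `std::exception` hierarchy, so the usual `catch (const std::exception&)` idiom keeps working.

```cpp
#include <cassert>
#include <exception>
#include <string>

// Hypothetical exception type: carries a message id that the UI layer can
// look up in the user's language, plus a narrow what() text for logs.
class localized_error : public std::exception {
public:
    localized_error(int message_id, const char* log_text)
        : id_(message_id), log_text_(log_text) {}
    int message_id() const { return id_; }
    const char* what() const noexcept override { return log_text_.c_str(); }
private:
    int id_;
    std::string log_text_;
};

// Hypothetical id; a real application would define its own catalogue.
const int MSG_FILE_NOT_FOUND = 1001;

// One handler still catches the whole hierarchy. Returns the message id
// when the exception carries one, 0 otherwise.
int handle() {
    try {
        throw localized_error(MSG_FILE_NOT_FOUND, "file not found");
    } catch (const std::exception& e) {
        const localized_error* le = dynamic_cast<const localized_error*>(&e);
        return le ? le->message_id() : 0;
    }
}
```

The translated, possibly wide, text is produced only at the point of display, from the id; nothing wide has to travel through `what()`.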

Possibly the conversation died away because these arguments
were unanswered, and those who presented them assumed that
the lack of a good basis for a proposal meant that the idea
had been given up.

--
James Dennett <jdennett@acm.org>

Author: "Ken Hagan" <K.Hagan@thermoteknix.co.uk>
Date: Fri, 12 Jul 2002 18:36:27 GMT Raw View

"Allan W" <Allan_W@my-dejanews.com> wrote...
>
> Name an OS that uses both Unicode and non-Unicode characters in
> filenames. Is there such a beast?

MS Windows supports a variety of file systems (NTFS, FAT, CDFS ...).
The more recent ones store filenames as UTF-16 on the disc. The ones
inherited from DOS use 8-bit characters in whatever code page was the
default for that country's version of DOS.

Windows 2000 and XP let you mount one file-system on top of another.
(I know, UNIX nerds will be stunned at such an idea :-) Thus, a single
filename in Win2K might have components that use more than one character
set. I recall some discussion of this when the boost file-system library
was being debated, but I'm not sure if anyone came up with a solution
or whether indeed it was considered much of a problem (in that context).

> If not, then why would we need two versions of functions that name files?

Even if so, why do we need two versions? Personally I'm happy with a
locale dependent mapping. I'm happy that it is possible for one program
to name files that another program (different locale) can't see. I'm
also happy that if I change my locale then I might change the set of
visible files. If I were seeking to write highly portable code, I'd
use a character encoding like UTF-16 that had a fighting chance of being
able to represent all files, and if someone changes my locale without
my knowledge, then that's their problem just as much as if they deleted
a file I was using.


Author: Pete Becker <petebecker@acm.org>
Date: Sat, 13 Jul 2002 14:34:12 GMT Raw View

"Martin v. Löwis" wrote:
>
> Pete Becker <petebecker@acm.org> writes:
>
> > > > What should the name L"aebc" mean when an application is compiled for an
> > > > OS that doesn't support file names with character sizes corresponding to
> > > > wchar_t?
> > >
> > > Well, something similar to "aebc", i.e., whatever the implementation
> > > likes. What does "aebc" mean when an application is compiled for
> > > an OS that doesn't support file names with character sizes
> > > corresponding to /char/ ?
> >
> > I don't know. What is your experience with such systems?
>
> In my experience, on such systems, the char file names are converted
> to wchar file names, using mbstowcs (they call that
> MultiByteToWideChar, as they need to pass an additional parameter
> depending on the type of binary you are executing).
>

If MultiByteToWideChar is the Win32 API function, then you aren't
talking about a system that doesn't support file names with character
sizes corresponding to char.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Author: rmaddox@isicns.com (Randy Maddox)
Date: Mon, 8 Jul 2002 12:32:38 CST Raw View

Pete Becker <petebecker@acm.org> wrote in message news:<3D25CB23.9772D5A5@acm.org>...
> Randy Maddox wrote:
> >
> > Now, you have repeatedly stated that these ideas have been considered
> > in the past and rejected as bad ideas.  OK.  I can accept that, but
> > could you please share the basis for that rejection?  It's rather
> > difficult to attempt any refutation without knowing what the technical
> > arguments are.
> >
>
> Once again, in miniature: when do "aebc", L"aebc", and mbstowcs("aebc")
> [roughly speaking] name the same file? Now generalize.

Is there any reason this needs to be more complicated than:

  "aebc" is not equal to L"aebc" or to mbstowcs("aebc") so it can
never name the same file as either of those.

  If L"aebc" == mbstowcs("aebc") then they name the same file,
otherwise not.

That is, a narrow character name matches only the same narrow
character file name, and a wide character name matches only the same
wide character file name.  Why should this be handled any differently
than the issue of case in file names in Unix?  A match is a match.  A
non-match is not.

Randy.

Author: rmaddox@isicns.com (Randy Maddox)
Date: Mon, 8 Jul 2002 12:33:08 CST Raw View

Pete Becker <petebecker@acm.org> wrote in message news:<3D25CB23.9772D5A5@acm.org>...
> Randy Maddox wrote:
> >
> > Now, you have repeatedly stated that these ideas have been considered
> > in the past and rejected as bad ideas.  OK.  I can accept that, but
> > could you please share the basis for that rejection?  It's rather
> > difficult to attempt any refutation without knowing what the technical
> > arguments are.
> >
>
> Once again, in miniature: when do "aebc", L"aebc", and mbstowcs("aebc")
> [roughly speaking] name the same file? Now generalize.
>

Here is a more fundamental question:  Why should the stdlib care about
this?  Is it not a question of how the OS deals with this?  Under
Windows if I have a file named "aebc" I can ask for it as "aebc" or
"AEBC" or even "AeBc" and I get the same file.  Under Unix I can only
ask for it as "aebc".  I don't expect my code to work the same on
Windows and Unix, so why should I not expect some differences between
an OS that supports wide-character file names and one that does not?
I would certainly not expect the library to hide these differences,
just as I don't expect it to hide the difference in case handling in
file names between Windows and Unix.

Randy.

> --
> Pete Becker
> Dinkumware, Ltd. (http://www.dinkumware.com)
>
> ---
> [ comp.std.c++ is moderated.  To submit articles, try just posting with ]
> [ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
> [              --- Please see the FAQ before posting. ---               ]
> [ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

======================================= MODERATOR'S COMMENT:
 Please trim footers when replying to posts

Author: Pete Becker <petebecker@acm.org>
Date: Tue, 9 Jul 2002 08:31:20 GMT Raw View

Randy Maddox wrote:
>

Whoops, the example I gave was wrong. Try this:

char name[] = "aebc";
wchar_t wname0[] = L"aebc";
wchar_t wname1[WIDE_NAME_MAX];
ofstream out(L"aebc");   // assumes the proposed wide-name constructor
// write some data
// change locale
mbstate_t mbs = {0};
const char *src = name;
mbsrtowcs(wname1, &src, WIDE_NAME_MAX, &mbs);   // narrow -> wide, per the new locale
ifstream in(wname1); // whoops, where'd my data go?

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Author: Pete Becker <petebecker@acm.org>
Date: Tue, 9 Jul 2002 16:16:33 GMT Raw View

Randy Maddox wrote:
>
> Here is a more fundamental question:  Why should the stdlib care about
> this?  Is it not a question of how the OS deals with this?

What should the name L"aebc" mean when an application is compiled for an
OS that doesn't support file names with character sizes corresponding to
wchar_t?

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Author: Pete Becker <petebecker@acm.org>
Date: Tue, 9 Jul 2002 16:16:45 GMT Raw View

Randy Maddox wrote:
>
>   If L"aebc" == mbstowcs("aebc") then they name the same file,
> otherwise not.
>

The reason this doesn't work is that the matching rules then depend on
what compiler you used and on what locale you're using, so the meaning
of a name can change not only from compiler to compiler but during a
single execution of the application.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Author: Pete Becker <petebecker@acm.org>
Date: Tue, 9 Jul 2002 16:16:38 GMT Raw View

Richard J Cox wrote:
>
> In article <3D25CB23.9772D5A5@acm.org>, petebecker@acm.org (Pete Becker)
> wrote:
>
> > Randy Maddox wrote:
> > >
> > > Now, you have repeatedly stated that these ideas have been considered
> > > in the past and rejected as bad ideas.  OK.  I can accept that, but
> > > could you please share the basis for that rejection?  It's rather
> > > difficult to attempt any refutation without knowing what the technical
> > > arguments are.
> > >
> >
> > Once again, in miniature: when do "aebc", L"aebc", and mbstowcs("aebc")
> > [roughly speaking] name the same file? Now generalize.
> >
>
> Exactly when "abc\\def" and "abc/def" mean the same thing. Since all
> filenames used in library functions are implementation defined (C99
> 7.19.3/8) the standard need say nothing.

Nope. "abc\\def" means the same thing at any time during the running of
an application. mbstowcs("aebc") can change when you change locales.

ofstream out(L"aebc");
// write some data
// do something that changes the locale
ifstream in(L"aebc"); // whoops, where's my data?
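
The locale dependence itself can be probed with standard calls only. The following is a sketch, not a claim about any particular system: the helper names `widen_in` and `meaning_differs` are invented, and which locale names exist (for example a Latin-1 or a UTF-8 locale) is an assumption that varies from one installation to the next.

```cpp
#include <cassert>
#include <clocale>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Widen narrow under the named C locale, then restore the previous
// LC_CTYPE setting. Returns false if the locale is unavailable or the
// bytes are not valid in it.
bool widen_in(const char* locale_name, const char* narrow, std::wstring& out) {
    const char* cur = std::setlocale(LC_CTYPE, nullptr);
    std::string saved = cur ? cur : "C";
    if (!std::setlocale(LC_CTYPE, locale_name)) return false;
    // A multibyte string never yields more wide characters than bytes.
    std::vector<wchar_t> buf(std::strlen(narrow) + 1);
    std::size_t n = std::mbstowcs(buf.data(), narrow, buf.size());
    std::setlocale(LC_CTYPE, saved.c_str());
    if (n == static_cast<std::size_t>(-1)) return false;
    out.assign(buf.data(), n);
    return true;
}

// True if the same bytes widen to different wide strings under the two
// locales, or if widening succeeds under only one of them (which includes
// the case where one locale is not installed).
bool meaning_differs(const char* loc_a, const char* loc_b, const char* narrow) {
    std::wstring a, b;
    bool ok_a = widen_in(loc_a, narrow, a);
    bool ok_b = widen_in(loc_b, narrow, b);
    return ok_a != ok_b || (ok_a && a != b);
}
```

Plain ASCII names such as "aebc" are expected to widen identically everywhere; the trouble starts with bytes above 0x7F, whose interpretation is exactly what the locale decides.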

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

Author: kanze@gabi-soft.de (James Kanze)
Date: Tue, 9 Jul 2002 16:17:34 GMT Raw View

rmaddox@isicns.com (Randy Maddox) wrote in message
news:<8c8b368d.0207080546.7412eba7@posting.google.com>...

> Pete Becker <petebecker@acm.org> wrote in message
> news:<3D25CB23.9772D5A5@acm.org>...

> > Randy Maddox wrote:

> > > Now, you have repeatedly stated that these ideas have been
> > > considered in the past and rejected as bad ideas.  OK.  I can
> > > accept that, but could you please share the basis for that
> > > rejection?  It's rather difficult to attempt any refutation
> > > without knowing what the technical arguments are.

> > Once again, in miniature: when do "aebc", L"aebc", and
> > mbstowcs("aebc") [roughly speaking] name the same file? Now
> > generalize.

> Is there any reason this needs to be more complicated than:

>   "aebc" is not equal to L"aebc" or to mbstowcs("aebc") so it can
> never name the same file as either of those.

>   If L"aebc" == mbstowcs("aebc") then they name the same file,
> otherwise not.

> That is, a narrow character name matches only the same narrow
> character file name, and a wide character name matches only the same
> wide character file name.  Why should this be handled any
> differently than the issue of case in file names in Unix?

Possibly because what you propose is probably not implementable on the
most widespread system that does allow wide character file names.

However, the strongest argument I see against premature
standardization of this is precisely in the responses to Pete's
posting.  What should be done is obvious.  So obvious that the two
responders proposed completely different "obvious" semantics.

I'd very much like to see some support for wide character file names.
But I'm not going to push for it until I have some idea what the
semantics should be.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung


Author: pdimov@mmltd.net (Peter Dimov)
Date: Tue, 9 Jul 2002 16:17:28 GMT

Pete Becker <petebecker@acm.org> wrote in message news:<3D25CB23.9772D5A5@acm.org>...
> Randy Maddox wrote:
> >
> > Now, you have repeatedly stated that these ideas have been considered
> > in the past and rejected as bad ideas.  OK.  I can accept that, but
> > could you please share the basis for that rejection?  It's rather
> > difficult to attempt any refutation without knowing what the technical
> > arguments are.
> >
>
> Once again, in miniature: when do "aebc", L"aebc", and mbstowcs("aebc")
> [roughly speaking] name the same file? Now generalize.

I never understood that argument. When do "abcdefghi", "Abcdefghi",
"abcdefgh", "abcdefghi.", "./abcdefghi", ".\\abcdefghi" name the same
file? Even better, when do "aebc" and "aebc" name the same file?


Author: beman_d@yahoo.com (Beman Dawes)
Date: Tue, 9 Jul 2002 16:59:38 GMT

rmaddox@isicns.com (Randy Maddox) wrote in message news:<8c8b368d.0207080559.548a5e9e@posting.google.com>...
>
> > Once again, in miniature: when do "aebc", L"aebc", and mbstowcs("aebc")
> > [roughly speaking] name the same file? Now generalize.
> >
>
> Here is a more fundamental question:  Why should the stdlib care about
> this?

When the standard library provides a function with behavior which is
entirely implementation defined, an illusion of portability has been
created where no real portability exists.

Now if it is well known to implementors what behavior they are to
deliver, this might be (barely) acceptable.

But in the case of wide character file names on systems which don't
support them, there is no well known, agreed upon behavior.

>  Is it not a question of how the OS deals with this?  Under
> Windows if I have a file named "aebc" I can ask for it as "aebc" or
> "AEBC" or even "AeBc" and I get the same file.  Under Unix I can only
> ask for it as "aebc".  I don't expect my code to work the same on
> Windows and Unix, ...

Then why not just use the native APIs?  They aren't portable, but
then you are saying you have no expectation of portability.

--Beman


Author: "Ken Hagan" <K.Hagan@thermoteknix.co.uk>
Date: Tue, 9 Jul 2002 17:03:49 GMT

"Randy Maddox" <rmaddox@isicns.com> wrote...
>
> Is there any reason this needs to be more complicated than:
>
>   "aebc" is not equal to L"aebc" or to mbstowcs("aebc") so it can
> never name the same file as either of those.
>
>   If L"aebc" == mbstowcs("aebc") then they name the same file,
> otherwise not.

Presumably, wcstombs(L"aebc") might be the same file as "aebc".

Now we have two distinct files. Let the end-user do a directory listing.
Suppose they want to delete one of them. Which do they pick?

In short, what do they see?

> Why should this be handled any differently than the issue of case in
> file names in Unix?  A match is a match.  A non-match is not.

A match may or may not be a match if one looks at pixels rather than bits.



Author: "Edward Diener" <eldiener@earthlink.net>
Date: Tue, 9 Jul 2002 17:04:30 GMT

"Pete Becker" <petebecker@acm.org> wrote in message
news:3D2A23A4.6DF08E9F@acm.org...
> Randy Maddox wrote:
> >
>
> Whoops, the example I gave was wrong. Try this:
>
> char name[] = "aebc";
> wchar_t wname0[] = L"aebc";
> whcar_t wname1[WIDE_NAME_MAX];

wchar_t  wname1[WIDE_NAME_MAX];

> ofstream out(L"aebc");
> // write some data
> // change locale
> mbstate_t mbs = {0};
> wcstombs("aebc", wname1, WIDE_NAME_MAX, &mbs);

mbstowcs ?

> ifstream in(wname1); // whoops, where'd my data go?

You are saying that the conversion to and from narrow character strings and
wide character strings is affected by the locale. This is known. But it is a
programmer's problem, not the language or library. Narrow character file
names and, in our discussion, theoretical wide character file names are two
different file names. C++ should not guarantee a conversion between them as
representing the same actual file nor should it worry about the programmer
erroneously converting the name back and forth with a resulting different
file name.

There are two different issues here. One is the file name and two is the
actual file. C++ has never, to my understanding, tried to adjudicate that
different file names must not represent the same file or even that the same
file name must not represent different files. I believe the mapping of a
file name to an actual file must be implementation dependent and that
therefore ideas of guaranteed portability as far as file names referring to
actual files are not apropos to the C++ language specification.

The issue with wide character file names is simply to say that C++ should
support wide character file names in the C++ standard library and will leave
it up to the implementation to define how these names are mapped to actual
files. It should be enough to specify that file name "abc" is not the same
file name as L"abc" and leave it at that. Whether it is the same file or not
is something which the C++ standard should not attempt to specify.

My reason for wanting wide character file names support in the C++ standard
library is in order to support operating systems and locales where actual
files are more likely to have wide character names in some sort of Unicode
variant. Saying to the programmers of those locales, "sorry but if you want
to use the C++ standard library you can only use narrow character file names
rather than ones which you will more easily understand in your own language"
seems really retrograde to me. C++ should at least make it possible. How the
particular implementation handles it is not the language's concern.


Author: Pete Becker <petebecker@acm.org>
Date: Tue, 9 Jul 2002 19:05:29 GMT

Peter Dimov wrote:
>
> Pete Becker <petebecker@acm.org> wrote in message news:<3D25CB23.9772D5A5@acm.org>...
> > Randy Maddox wrote:
> > >
> > > Now, you have repeatedly stated that these ideas have been considered
> > > in the past and rejected as bad ideas.  OK.  I can accept that, but
> > > could you please share the basis for that rejection?  It's rather
> > > difficult to attempt any refutation without knowing what the technical
> > > arguments are.
> > >
> >
> > Once again, in miniature: when do "aebc", L"aebc", and mbstowcs("aebc")
> > [roughly speaking] name the same file? Now generalize.
>
> I never understood that argument. When do "abcdefghi", "Abcdefghi",
> "abcdefgh", "abcdefghi.", "./abcdefghi", ".\\abcdefghi" name the same
> file?
>

False analogy. "There are some things that vary" isn't the same as
"everything can vary."

> Even better, when do "aebc" and "aebc" name the same file?

Yes, much better. When run in applications on the same system that refer
to the same directory they always name the same file. Unless, of course,
you take the nihilistic view that since the standard doesn't say how the
names should be mapped it's impossible to talk about what file names
mean. Common experience indicates otherwise. On the other hand, there is
no common experience to indicate how any non-trivial wide character
string maps to the name of a file.

Further, "aebc" contains the same characters without regard to OS,
compiler, or locale. L"aebc", on the other hand, is an array of wide
characters whose contents are implementation-defined, possibly depending
on the locale that was set on the system at the time the code was
compiled. So when you say ofstream out(L"aebc") you may be using a
different name from the name used in ifstream in(L"aebc") in an
application compiled with a different compiler.

mbstowcs, of course, maps NTBS's to wide character arrays in accordance
with the locale that is set at the time the function is called. So the
name you get with it can change during execution of a single
application, and you may find that you can't read the data that you
wrote earlier.
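
[Editorial illustration, not part of the original post.] A minimal C++11
sketch of the mbstowcs point, assuming a hosted implementation. The helper
names widen_in_current_locale and same_wide_name are hypothetical, and the
locale names accepted by setlocale are platform-specific.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: widen a narrow multibyte string using whatever
// LC_CTYPE locale is current at the moment of the call.
// Returns an empty string if the bytes are not valid in that locale.
std::wstring widen_in_current_locale(const std::string& narrow)
{
    // A multibyte string never yields more wide characters than it has bytes.
    std::vector<wchar_t> buffer(narrow.size() + 1);
    std::size_t written =
        std::mbstowcs(buffer.data(), narrow.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1))
        return std::wstring();   // invalid sequence for the current locale
    return std::wstring(buffer.data(), written);
}

// Widen the same bytes under two locales and report whether the results
// agree. If a locale name is unavailable, setlocale leaves the locale
// unchanged, so the comparison then says nothing about that locale.
bool same_wide_name(const std::string& narrow,
                    const char* locale_a, const char* locale_b)
{
    std::string saved = std::setlocale(LC_CTYPE, nullptr);
    std::setlocale(LC_CTYPE, locale_a);
    std::wstring first = widen_in_current_locale(narrow);
    std::setlocale(LC_CTYPE, locale_b);
    std::wstring second = widen_in_current_locale(narrow);
    std::setlocale(LC_CTYPE, saved.c_str());   // restore the caller's locale
    return first == second;
}
```

For a plain ASCII name such as "aebc" the two widenings are expected to
agree. For names with accented or multibyte characters, same_wide_name may
return false when the two locales use different encodings, which is the
situation described above.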

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: Pete Becker <petebecker@acm.org>
Date: Tue, 9 Jul 2002 19:24:55 GMT

Edward Diener wrote:
>
> "Pete Becker" <petebecker@acm.org> wrote in message
> news:3D2A23A4.6DF08E9F@acm.org...
> > Randy Maddox wrote:
> > >
> >
> > Whoops, the example I gave was wrong. Try this:
> >
> > char name[] = "aebc";
> > wchar_t wname0[] = L"aebc";
> > whcar_t wname1[WIDE_NAME_MAX];
>
> wchar_t  wname1[WIDE_NAME_MAX];
>
> > ofstream out(L"aebc");
> > // write some data
> > // change locale
> > mbstate_t mbs = {0};
> > wcstombs("aebc", wname1, WIDE_NAME_MAX, &mbs);
>
> mbstowcs ?

Yup. sorry...

>
> > ifstream in(wname1); // whoops, where'd my data go?
>
> You are saying that the conversion to and from narrow character strings and
> wide character strings is affected by the locale. This is known. But it is a
> programmer's problem, not the language or library.

That's one view. If you can't read your data because the locale changed
it's not very comforting.

> Narrow character file
> names and, in our discussion, theoretical wide character file names are two
> different file names. C++ should not guarantee a conversion between them as
> representing the same actual file nor should it worry about the programmer
> erroneously converting the name back and forth with a resulting different
> file name.

That's one view.

>
> There are two different issues here. One is the file name and two is the
> actual file. C++ has never, to my understanding, tried to adjudicate that
> different file names must not represent the same file or even that the same
> file name must not represent different files. I believe the mapping of a
> file name to an actual file must be implementation dependent and that
> therefore ideas of guaranteed portability as far as file names referring to
> actual files are not apropos to the C++ language specification.

Nobody's talking about guaranteed portability. What's important is
portability in practice, which is well understood for char-based file
names, and, more important, was well understood at the time the C
standard was adopted.

>
> The issue with wide character file names is simply to say that C++ should
> support wide character file names in the C++ standard library and will leave
> it up to the implementation to define how these names are mapped to actual
> files. It should be enough to specify that file name "abc" is not the same
> file name as L"abc" and leave it at that. Whether it is the same file or not
> is something which the C++ standard should not attempt to specify.

Once again: implement it and get some real-world experience with it.

>
> My reason for wanting wide character file names support in the C++ standard
> library is in order to support operating systems and locales where actual
> files are more likely to be a wide character names in some sort of Unicode
> variant. Saying to the programmers of those locales, "sorry but if you want
> to use the C++ standard library you can only use narrow character file names
> rather than ones which you will more easily understand in your own language"
> seems really retrograde to me. C++ should at least make it possible. How the
> particular implementation handles it is not the language's concern.
>

That is one view. Implement it, use it, and see if you still believe it.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: rmaddox@isicns.com (Randy Maddox)
Date: Tue, 9 Jul 2002 16:14:20 CST

Pete Becker <petebecker@acm.org> wrote in message news:<3D2A23A4.6DF08E9F@acm.org>...
> Randy Maddox wrote:
> >
>
> Whoops, the example I gave was wrong. Try this:
>
> char name[] = "aebc";
> wchar_t wname0[] = L"aebc";
> whcar_t wname1[WIDE_NAME_MAX];
> ofstream out(L"aebc");
> // write some data
> // change locale
> mbstate_t mbs = {0};
> wcstombs("aebc", wname1, WIDE_NAME_MAX, &mbs);
> ifstream in(wname1); // whoops, where'd my data go?
>
Gosh, Pete, that's a pretty contrived example.  And all you're saying
is that translating a string under different locales may not result in
strings that match.  Why is this an issue for the library?

I could do the same thing under Unix by creating a file using a
lowercase name, then uppercasing that name and trying to open the file
with that uppercased name.  The open would fail.  So what?

A legal file name that matches the name of an existing file can be
used to open that file.  A file name, legal or not, that does not
match the name of an existing file will cause the open to fail.  The
OS determines what "legal file name" and "matches" mean.  The library
doesn't care one way or the other since file names are implementation
defined.

Randy.


Author: Allan_W@my-dejanews.com (Allan W)
Date: Wed, 10 Jul 2002 00:04:42 GMT

"Ken Hagan" <K.Hagan@thermoteknix.co.uk> wrote
> Presumably, wcstombs(L"aebc") might be the same file as "aebc".
>
> Now we have two distinct files. Let the end-user do a directory listing.
> Suppose they want to delete one of them. Which do they pick?
>
> In short, what do they see?
>
> > Why should this be handled any differently than the issue of case in
> > file names in Unix?  A match is a match.  A non-match is not.
>
> A match may or may not be a match if one looks at pixels rather than bits.

Okay, let's make this *REALLY* simple.

Change C++ specifications to always allow Unicode filenames. If the
code specifies anything else, it's automatically converted to Unicode
first.

Now, find every single platform that doesn't accept Unicode, and every
single C++ implementation for those platforms. Put a little sticker on
them that says, "non-Standard."

Nobody would object to that, right?

...What? You're saying that a language standard shouldn't make this
type of imposition on the OS? That's like saying that C++ ought to
work on more than one platform! It's preposterous!

:-)


Author: Pete Becker <petebecker@acm.org>
Date: Wed, 10 Jul 2002 00:17:30 GMT

Randy Maddox wrote:
>
> Pete Becker <petebecker@acm.org> wrote in message news:<3D2A23A4.6DF08E9F@acm.org>...
> > Randy Maddox wrote:
> > >
> >
> > Whoops, the example I gave was wrong. Try this:
> >
> > char name[] = "aebc";
> > wchar_t wname0[] = L"aebc";
> > whcar_t wname1[WIDE_NAME_MAX];
> > ofstream out(L"aebc");
> > // write some data
> > // change locale
> > mbstate_t mbs = {0};
> > wcstombs("aebc", wname1, WIDE_NAME_MAX, &mbs);
> > ifstream in(wname1);  // whoops, where'd my data go?
> >
> Gosh, Pete, that's a pretty contrived example.

Simple examples always look contrived. Use your imagination: this could
easily occur in a large application.

> And all you're saying
> is that translating a string under different locales may not result in
> strings that match.  Why is this an issue for the library?

It is an issue for designing an interface that is reasonably robust. If
you think it's unimportant then implement it, get some real-world
experience with it, and if you still believe it's appropriate, write it
up and propose it.

>
> I could do the same thing under Unix by creating a file using a
> lowercase name, then uppercasing that name and trying to open the file
> with that uppercased name.  The open would fail.  So what?

Lowercase vs. uppercase is well understood in the industry. Can you
explain when the meaning of a file name changes under your proposal?

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: "Anthony Williams" <anthwil@nortelnetworks.com>
Date: Wed, 10 Jul 2002 16:20:50 GMT

"James Kanze" <kanze@gabi-soft.de> wrote in message
news:d6651fb6.0207090730.9ef039@posting.google.com...
> rmaddox@isicns.com (Randy Maddox) wrote in message
> news:<8c8b368d.0207080546.7412eba7@posting.google.com>...
>
> > Pete Becker <petebecker@acm.org> wrote in message
> > news:<3D25CB23.9772D5A5@acm.org>...
>
> > > Randy Maddox wrote:
>
> > > > Now, you have repeatedly stated that these ideas have been
> > > > considered in the past and rejected as bad ideas.  OK.  I can
> > > > accept that, but could you please share the basis for that
> > > > rejection?  It's rather difficult to attempt any refutation
> > > > without knowing what the technical arguments are.
>
> > > Once again, in miniature: when do "aebc", L"aebc", and
> > > mbstowcs("aebc") [roughly speaking] name the same file? Now
> > > generalize.

On systems that store filenames as MBCS, the actual characters represented
by an MBCS may vary depending on the current locale. Not only that, but
different MBCS filenames may represent the same character sequence, under
different locales. However, once you have an MBCS, it will either match the
filename or not. Except that on a case-insensitive system, case-folding
could be locale-dependent, so two MBCS's may fold the same under one locale,
and differently under another.

On systems that store filenames as Wide character strings, it is likely that
a WCS always represents the same sequence of characters, regardless of
locale. Therefore once you have a WCS, it will either match or not, and
case-folding is largely locale-independent, so this even applies on a
case-insensitive system (though there are issues with single lowercase
characters that match multiple uppercase characters, and a couple of
locale-dependent case-foldings).

Since the conversion of MBCS to WCS (and vice-versa) is locale dependent,
this means that it is not possible to portably use MBCS names on systems
that store WCS filenames, or use WCS names on systems that store MBCS
filenames. In practice, you are OK if you use only a subset of characters
which has a fixed translation between wide character values and narrow
character values --- given the example of "aebc", I would expect it to work
under most locales, as the Latin letters tend to have the same values.
However, if the string contained accented characters, or Chinese or Japanese
characters, I would expect that the MBCS, the WCS literal, and the WCS
obtained from the MBCS may or may not refer to the same name, depending on
the locale.
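
[Editorial illustration, not part of the original post.] One way to make
the "safe subset" concrete is to check a name against a fixed set of
characters. The sketch below assumes the POSIX portable filename character
set is a reasonable stand-in for that subset; the function name is
hypothetical, and the post itself does not name a specific set.

```cpp
#include <string>

// Assumption: treat the POSIX "portable filename character set"
// (A-Z, a-z, 0-9, period, underscore, hyphen) as the subset whose
// narrow and wide values are the same under most locales.
bool uses_only_portable_characters(const std::string& name)
{
    static const std::string allowed =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789._-";
    // An empty name is rejected; otherwise every byte must be in the set.
    return !name.empty() &&
           name.find_first_not_of(allowed) == std::string::npos;
}
```

A name such as "aebc" passes this check, while any name containing an
accented or multibyte character does not, so only the former would be
expected to survive a round trip through different locales.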

This problem already exists, in disguise, on a widely used platform ---
Microsoft Windows stores filenames as Unicode. Therefore a given MBCS
filename may or may not refer to the same file at different points in a
program's execution, depending on the currently active locale. This is the
only scenario supported by the current C++ APIs.

Therefore, I do not see that adding APIs that support wide character names
adds any problems that don't already exist. Indeed, on Windows it would mean
that the wide character filenames could be specified, and they _could_
therefore be reliably used to refer to the same file later in the program,
regardless of any locale changes. Admittedly, it would propagate the issue
so that systems with MBCS filenames would now have the problem with wide
character names, but in reality this exists anyway --- if you allow the user
to enter a filename that contains characters that aren't in a subset with a
locale-independent MBCS representation, then the MBCS obtained for the
filename depends on the current locale, and may vary during program
execution, and thus if the user enters the same sequence of characters as a
filename twice in the same program, it may or may not map to the same file.

I think it would be sufficient for the standard to state that opening a file
with a wide character name on systems that don't directly support wide
character filenames is equivalent to opening the file with the MBCS name
obtained by converting the filename under the "current" locale --- where
"current" could imply either the current global locale, or the
currently-imbued locale of the stream used to open the file, or even a fixed
locale (the "C" locale?) --- the committee should decide which. Likewise, on
systems that don't directly support MBCS filenames, opening a file with an
MBCS name is equivalent to opening the file with the WCS name obtained by
converting the filename under the current locale.
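
[Editorial illustration, not part of the original post.] A minimal C++11
sketch of the first rule, for a system that only accepts narrow names. It
assumes that "current" means the global C locale (one of the three options
listed above); narrow_file_name and open_for_reading are hypothetical
helpers, not standard functions.

```cpp
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: narrow a wide file name using the current C locale
// (LC_CTYPE). Returns false if some character has no multibyte
// representation in that locale.
bool narrow_file_name(const std::wstring& wide, std::string& narrow)
{
    // MB_CUR_MAX is the largest number of bytes one character can need
    // in the current locale.
    std::vector<char> buffer(wide.size() * MB_CUR_MAX + 1);
    std::size_t written =
        std::wcstombs(buffer.data(), wide.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1))
        return false;
    narrow.assign(buffer.data(), written);
    return true;
}

// Open for reading "as if" the converted narrow name had been passed.
bool open_for_reading(std::ifstream& in, const std::wstring& wide_name)
{
    std::string narrow;
    if (!narrow_file_name(wide_name, narrow))
        return false;            // name cannot be expressed in this locale
    in.open(narrow.c_str());
    return in.is_open();
}
```

Because the conversion happens at the time of the call, the same wide name
can map to different narrow names if the locale changes between two opens,
which is the trade-off discussed in this thread.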

Porting software that refers to files to a system that stores filenames
differently may already have an impact, due to the issues mentioned above.
Adding APIs for wide character filenames doesn't actually have any tangible
impact on the portability of such programs, but it might make programs that
use e.g. Chinese or Japanese characters simpler --- on a system that stores
wide character filenames, why should I have to convert my filename to an
MBCS name, just so the OS can convert it back again? And on a system that
stores MBCS filenames, wide character names have to be converted to MBCS
names somewhere, so why should I have to do it, rather than the C++ library,
or the OS?

> However, the strongest argument I see against premature
> standardization of this is precisely in the responses to Pete's
> posting.  What should be done is obvious.  So obvious that the two
> responders proposed completely different "obvious" semantics.

How true.

> I'd very much like to see some support for wide character file names.
> But I'm not going to push for it until I have some idea what the
> semantics should be.

This is a very sensible standpoint. However, we do need to push for
agreement on what those semantics should be, through discussions such as
this, and those on the Standard Committee's reflector.

Anthony
--
Anthony Williams
Software Engineer, Nortel Networks Optical Components Ltd
The opinions expressed in this message are not necessarily those of my
employer


Author: rmaddox@isicns.com (Randy Maddox)
Date: Wed, 10 Jul 2002 16:21:48 GMT

Pete Becker <petebecker@acm.org> wrote in message news:<3D2A1F03.8F1E5341@acm.org>...
> Randy Maddox wrote:
> >
> > Here is a more fundamental question:  Why should the stdlib care about
> > this?  Is it not a question of how the OS deals with this?
>
> What should the name L"aebc" mean when an application is compiled for an
> OS that doesn't support file names with character sizes corresponding to
> wchar_t?
>
> --
> Pete Becker
> Dinkumware, Ltd. (http://www.dinkumware.com)

If the OS doesn't support it then it should mean nothing except that
trying to use it to open a file won't work.  That seems simple enough.

Randy.


Author: rjcox@cix-remove-me-.co.uk (Richard J Cox)
Date: Wed, 10 Jul 2002 16:31:06 GMT

In article <3D2A1FCD.47FB913B@acm.org>, petebecker@acm.org (Pete Becker)
wrote:

> Richard J Cox wrote:
> >
> > In article <3D25CB23.9772D5A5@acm.org>, petebecker@acm.org (Pete
> > Becker) wrote:
> >
> > > Randy Maddox wrote:
> > > >
> > > > Now, you have repeatedly stated that these ideas have been
> > > > considered in the past and rejected as bad ideas.  OK.  I can
> > > > accept that, but could you please share the basis for that
> > > > rejection?  It's rather difficult to attempt any refutation
> > > > without knowing what the technical arguments are.
> > > >
> > >
> > > Once again, in miniature: when do "aebc", L"aebc", and
> > > mbstowcs("aebc") [roughly speaking] name the same file? Now
> > > generalize.
> > >
> >
> > Exactly when "abc\\def" and "abc/def" mean the same thing. Since all
> > filenames used in library functions are implementation defined (C99
> > 7.19.3/8) the standard need say nothing.
>
> Nope. "abc\\def" means the same thing at any time during the running of
> an application. mbstowcs("aebc") can change when you change locales.

Not as a filename, or rather if used as a filename it is implementation
defined whether it means anything whatsoever.

I.e. where do the standards (C or C++) say that the two calls to
fopen here

  FILE* f = fopen("a", "r");
  fclose(f);
  f = fopen("a", "r");

treat their first parameter in the same way? They don't even say if the
second call in

  FILE* f = fopen("a", "r");
  FILE* g = fopen("a", "r");

will work or not.

> ofstream out(L"aebc");
> // write some data
> // do something that changes the locale
> ifstream in(L"aebc"); // whoops, where's my data?

ofstream out("aebc");
// write some data, close out.
// Don't bother to change the locale.
ifstream in("aebc");

Can I, in standard compliant code, use in to read back what I've just
written to out?

What says that the second file operation acts on the same file as the
first, even with char * arguments?

Unless I'm missing something, neither C++98 nor C99 says anything at all
beyond it being up to the implementation.

From C++98, 27.8.1.3/1-2

  basic_filebuf<charT,traits>* open(
                                  const char* s,
                                  ios_base::openmode mode );
  Effects: If is_open() != false, returns a null pointer. Otherwise,
  initializes the filebuf as required. It then opens a file, if
  possible, whose name is the NTBS s ("as if" by calling
  std::fopen(s, modstr)).

C++98 does not define fopen itself but refers to C89 (which I don't have
a copy of, unfortunately), which I understand defines it in the same way
as C99.

C99 section 7.19.5.3

  The fopen function

  Synopsis

    #include <stdio.h>
    FILE *fopen(const char * restrict filename,
                const char * restrict mode);

  Description

  The fopen function opens the file whose name is the string pointed to by
  filename, and associates a stream with it.

"filename" is not further defined here, and the only definition I can find
of what a filename is, is in 7.19.3/8:

  Functions that open additional (non temporary) files require a file
  name, which is a string. The rules for composing valid file names are
  implementation-defined. Whether the same file can be simultaneously
  open multiple times is also implementation-defined.

This could (modulo being in a C++0x standard rather than a C0x standard)
become:

  Functions that open additional (non temporary) files require a file
  name, which is a string of either characters or wide characters
  depending on the definition of the function to which it is passed.
  The rules for composing valid file names are implementation-defined.
  [...]

fopen gains an overload of:

      FILE *fopen(const wchar_t * filename,
                  const wchar_t * mode);

with the only significant difference in wording being for the mode
definitions (in this case we simply define that L"r" is equivalent to
"r", L"r+" to "r+", and so on).

(It might, for the sake of C consistency, be better to have fopenw rather
than an overload.)
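
[Editorial illustration, not part of the original post.] The mode mapping
described above can be sketched in C++11 as follows. narrow_mode is a
hypothetical helper; it accepts only the characters C99 uses in mode
strings and does not check that they appear in a valid order.

```cpp
#include <string>

// Hypothetical helper for the proposed wide overload: map a wide mode
// string such as L"r+" onto the equivalent narrow mode "r+".
// Any character outside r, w, a, b and + yields an empty string.
std::string narrow_mode(const std::wstring& wide_mode)
{
    std::string narrow;
    for (wchar_t wc : wide_mode) {
        switch (wc) {
        case L'r': narrow += 'r'; break;
        case L'w': narrow += 'w'; break;
        case L'a': narrow += 'a'; break;
        case L'b': narrow += 'b'; break;
        case L'+': narrow += '+'; break;
        default:   return std::string();   // not a mode character
        }
    }
    return narrow;
}
```

Since the mode characters all belong to the basic character set, this
mapping does not depend on the locale, unlike the file name itself.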

Continuing on this theme filebuf can be extended with overloads of its
open method:

  basic_filebuf<charT,traits>* open(
                                  const wchar_t* s,
                                  ios_base::openmode mode );

  basic_filebuf<charT,traits>* open(
                                  wstring const & ss,
                                  ios_base::openmode mode );

  basic_filebuf<charT,traits>* open(
                                  string const & ss,
                                  ios_base::openmode mode );

The first is defined on the same basis, in terms of fopenw above, and the
latter two behave as if ss.c_str() had been passed rather than ss.
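
[Editorial illustration, not part of the original post.] The "as if
ss.c_str() had been passed" rule for the narrow string overload can be
sketched with a free function; open_with_string is a hypothetical name
standing in for the proposed member. (C++11 later added an open overload
taking const std::string&.)

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical free function standing in for the proposed member overload:
// behaves as if name.c_str() had been passed to the existing open().
// Returns a null pointer on failure, like basic_filebuf::open.
std::filebuf* open_with_string(std::filebuf& buf, const std::string& name,
                               std::ios_base::openmode mode)
{
    return buf.open(name.c_str(), mode);
}
```

The wide overloads cannot be written this way on top of the existing
interface, since they need some rule for mapping the wide name onto
whatever the host file system accepts.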

---

Essentially I am arguing that C++ does not need to define semantics for
file names because it quite evidently works without currently defining any
such semantics.

The question of how a wide character string is treated on a filesystem
with narrow character support only is interesting. However it can only be
answered with knowledge of that file system's namespace. What kind of
characters can be used for filenames?

With the information about the actual host it should be possible to define
a set of consistent semantics. I would argue that the narrowing, if
applicable, should be locale dependent; this should be no worse than

  ostringstream os;
  os << boolalpha << true << " " << false;
  string s = os.str();
  // Change locale.
  istringstream is(s);
  bool a, b;
  is >> boolalpha >> a >> b;
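
The fragment above can be made runnable as follows. This is a sketch, not a
claim about any particular platform locale: german_bool is a hypothetical
facet standing in for "a different locale", so that the example does not
depend on which named locales are installed.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: a locale whose boolalpha names differ from "true"/"false".
struct german_bool : std::numpunct<char> {
protected:
    std::string do_truename() const override { return "wahr"; }
    std::string do_falsename() const override { return "falsch"; }
};

// A locale that is classic except for the boolean names.
inline std::locale german_like()
{
    // The locale takes ownership of the facet.
    return std::locale(std::locale::classic(), new german_bool);
}

// Write two bools with the classic locale, read them back with `reader`.
// Returns true only if both values survive the round trip.
inline bool round_trips(const std::locale& reader)
{
    std::ostringstream os;
    os.imbue(std::locale::classic());
    os << std::boolalpha << true << " " << false;  // writes "true false"
    assert(os.good());

    std::istringstream is(os.str());
    is.imbue(reader);                              // the "change locale" step
    bool a = false, b = true;
    is >> std::boolalpha >> a >> b;
    return !is.fail() && a && !b;
}
```

round_trips(std::locale::classic()) succeeds, while
round_trips(german_like()) fails because "true" matches neither "wahr" nor
"falsch". Text written under one locale is already not guaranteed to read
back under another, which is the precedent being appealed to.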

--
rjcox at cix dot co dot uk

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: Pete Becker <petebecker@acm.org>
Date: Fri, 5 Jul 2002 20:03:00 GMT Raw View

Randy Maddox wrote:
>
> Now, you have repeatedly stated that these ideas have been considered
> in the past and rejected as bad ideas.  OK.  I can accept that, but
> could you please share the basis for that rejection?  It's rather
> difficult to attempt any refutation without knowing what the technical
> arguments are.
>

Once again, in miniature: when do "aebc", L"aebc", and mbstowcs("aebc")
[roughly speaking] name the same file? Now generalize.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

---

Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: Sat, 6 Jul 2002 06:19:22 GMT Raw View

"Julian Smith" <jules@REMOVETHIS.op59.net> wrote in message
news:20020704011723.246d5e3a.jules@REMOVETHIS.op59.net...

> The trouble with separate hierarchies is that you can no longer catch all
> exceptions with `catch( std::exception& e) {...}'. This is a pretty
> fundamental part of how exception objects are handled (why else would
> there be a std::exception base class?).

Indeed.

> At the risk of repeating myself, I think that the only good way of
> extending fundamental interfaces like std::exception is to implement some
> sort of multimethod-style dispatch.
>
> Whenever I mention these issues on c.s.c++ or c.l.c++.moderated, I get a
> deafening silence in response. If I'm talking nonsense, will someone
> please tell me? Otherwise, will someone at least explain why there is no
> interest in even discussing these issues in more detail?

I don't think it's nonsense, but I do feel that it's largely a nonproblem.
A nul-terminated multibyte string is the nearest thing we have to a lingua
franca in C/C++. You can encode sequences of characters from any large set
with a notation that looks for all the world like a good old-fashioned
NTBS from K&R C. Why gild the lily? And, as others have pointed out, it's
generally bad style to expose thrown messages directly to the end user.
(That's what the Blue Screen of Death is for, in some circles.) So you
can just as easily throw a catalog index and use it to look up the
Croatian version of the nasty message.
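
A minimal sketch of the "throw a catalog index" idea, under stated
assumptions: message_id, catalog_error, message_catalog and the two helper
functions are hypothetical names, the catalog is a plain in-memory map
rather than a real message-catalog facility, and the Croatian strings are
illustrative.

```cpp
#include <cassert>
#include <exception>
#include <map>
#include <string>

// Hypothetical message identifiers shared by thrower and handler.
enum message_id { msg_file_not_found = 1, msg_disk_full = 2 };

// The exception carries only an index; what() stays a plain NTBS for logs.
class catalog_error : public std::exception {
public:
    explicit catalog_error(message_id id) : id_(id) {}
    message_id id() const { return id_; }
    const char* what() const noexcept override { return "catalog_error"; }
private:
    message_id id_;
};

typedef std::map<message_id, std::wstring> message_catalog;

// Illustrative catalog; a real one would be loaded per user locale.
inline message_catalog croatian_catalog_sketch()
{
    message_catalog c;
    c[msg_file_not_found] = L"Datoteka nije prona\u0111ena";
    c[msg_disk_full] = L"Disk je pun";
    return c;
}

// Look up the user-facing text at the point where it is displayed.
inline std::wstring localized_text(const message_catalog& c,
                                   const catalog_error& e)
{
    assert(!c.empty());
    message_catalog::const_iterator it = c.find(e.id());
    return it != c.end() ? it->second : std::wstring(L"(unknown error)");
}
```

A handler that catches std::exception& can dynamic_cast to catalog_error and
call localized_text; everything else still falls back to what().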

Several places within the Standard C++ library have been pointed out as
candidates for augmenting with wide-string equivalents. I have yet to
hear a compelling reason why a multibyte string can't do the job
adequately. (Wide-character filenames are arguably a separable matter.
For them, a number of other issues have been raised and, IMO, not yet
well addressed.)

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com



---

Author: Alexander Terekhov <terekhov@web.de>
Date: Sat, 6 Jul 2002 06:19:50 GMT Raw View

Julian Smith wrote:
[...]
> The C++ standards people invented something that, at the time, was the
> best interface they could come up with for exception objects. This
> interface has a pure virtual `const char* std::exception::what() const'

Note that this thing usually "Returns: An implementation-defined NTBS."
[but don't miss P.S. section below and... things like "Notes: The result
of calling what() on the newly constructed object is implementation-defined"
and "Notes: The effects of calling what() after assignment are
implementation-defined" aside].

> method. Nowadays, people are asking to change the std::exception interface
> so that it can return wide-character strings.

Don't really know why folks are asking for yet another USELESS and rather
confusing "what". I know for sure, however, that personally I'd be quite
happy if that "what()" would become DEPRECATED. It doesn't help to write
RECOVERY handlers; it doesn't help to provide meaningful info to the USERS
[support folks usually prefer 'core dumps' -- complete (debug-able) program
state recorded >>at throw point<<]. It's 'brain-dead', so to speak.

[...]
> The trouble with separate hierarchies is that you can no longer catch all
> exceptions with `catch( std::exception& e) {...}'. This is a pretty
> fundamental part of how exception objects are handled (why else would
> there be a std::exception base class?).

It's just sort-of ``Trojan horse'' [deadly virus] -- so that you could
easily write yourself [and 'enjoy'] code like "catch(const std::exception&
e) { std::cout << e.what(); }" or, even more sophisticated: "catch(const
std::exception&) { /**/ throw; }". It's an incredibly useful thing [w.r.t.
silly catching], indeed. ;-)

regards,
alexander.

P.S. ISO/IEC 14882:1998(E), 18.6.1 Class exception

"....
 virtual const char* what() const throw();

 8 Returns: An implementation-defined NTBS.

 9 Notes: The message may be a null-terminated
   multibyte string (17.3.2.1.3.2), suitable for
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   conversion and display as a wstring (21.2,
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   22.2.1.5)
 ...."
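
The note quoted above suggests converting what() for wide display. A
minimal sketch of that conversion, assuming the message really is a valid
multibyte string in the current C locale; widen_what and report_sketch are
hypothetical names, and std::mbsrtowcs does the actual work.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Convert e.what() (an NTMBS) to a wide string using the current C locale.
// Returns an empty string if the bytes are not valid in that locale.
inline std::wstring widen_what(const std::exception& e)
{
    const char* src = e.what();
    assert(src != nullptr);
    std::mbstate_t state = std::mbstate_t();
    // With a null destination, mbsrtowcs only reports the required length.
    const std::size_t n = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (n == static_cast<std::size_t>(-1))
        return std::wstring();

    std::vector<wchar_t> buf(n + 1);
    state = std::mbstate_t();
    src = e.what();
    std::mbsrtowcs(buf.data(), &src, buf.size(), &state);
    return std::wstring(buf.data(), n);
}

// Usage illustration: what a handler might do with the widened text.
inline std::wstring report_sketch()
{
    try {
        throw std::runtime_error("disk full");
    } catch (const std::exception& e) {
        return widen_what(e);
    }
}
```

The result depends on the locale in force at the catch site, not at the
throw site, which is one reason the message text is a fragile carrier for
user-visible information.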

---

Author: rjcox@cix.co.uk (Richard J Cox)
Date: Mon, 8 Jul 2002 16:05:51 GMT Raw View

In article <3D25CB23.9772D5A5@acm.org>, petebecker@acm.org (Pete Becker)
wrote:

> Randy Maddox wrote:
> >
> > Now, you have repeatedly stated that these ideas have been considered
> > in the past and rejected as bad ideas.  OK.  I can accept that, but
> > could you please share the basis for that rejection?  It's rather
> > difficult to attempt any refutation without knowing what the technical
> > arguments are.
> >
>
> Once again, in miniature: when do "aebc", L"aebc", and mbstowcs("aebc")
> [roughly speaking] name the same file? Now generalize.
>

Exactly when "abc\\def" and "abc/def" mean the same thing. Since all
filenames used in library functions are implementation defined (C99
7.19.3/8) the standard need say nothing.

Of course as a QoI issue one would expect that when working with a narrow
filename on a Unicode-based file system with a Western European locale the
above names would all be the same.

When moving outside this case, an implementation on a narrow character
based file system, would have two choices, either to
. Raise an error
. Munge the filename into something that would be allowed.

This is much the same as when the implementation is faced with a filename
that is locally invalid due to either length of filenames (e.g.
"ABCDEFGHI.WXYZ" on FAT under MS-DOS), or when using an invalid character
(like, in most cases, ':' on all FAT/NTFS type file systems).

Since neither C99 nor C++98 impose what to do with an invalid filename
each implementation is free to do something that is sensible locally...
for example

x = fopen(L"WIDE", L"r")

could well be mapped to fopen(wcstombs(L"WIDE"), wcstombs(L"r")) [roughly
speaking], if the resultant filename is invalid it would be treated in the
normal way for an invalid filename in that circumstance.

OTOH on a wide character enabled file system, the wide versions of the
functions could be the native ones and the char * interfaces use a locale
relative widening.
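
The two choices listed above for a narrow-only file system (raise an error,
or munge the name) can be sketched as a policy parameter. This is an
illustration only: narrowing_policy and narrow_filename are hypothetical
names, std::wctob handles just the single-byte case, and '_' is an
arbitrary replacement character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>   // EOF
#include <cwchar>   // std::wctob
#include <string>

enum narrowing_policy { raise_error, munge };

// Narrow `wide` one character at a time using the current C locale.
// Under raise_error, returns false at the first unrepresentable character.
// Under munge, replaces each such character with '_' and always succeeds.
inline bool narrow_filename(const std::wstring& wide, narrowing_policy policy,
                            std::string& out)
{
    out.clear();
    for (std::size_t i = 0; i < wide.size(); ++i) {
        const int c = std::wctob(wide[i]);
        if (c != EOF)
            out += static_cast<char>(c);
        else if (policy == munge)
            out += '_';
        else
            return false;
    }
    assert(out.size() == wide.size());  // exactly one byte per wide character
    return true;
}
```

Either result is then handed to the ordinary narrow open, so an
unrepresentable name ends up on the same error path as a name that is too
long or contains a forbidden character.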

Richard

--
rjcox at cix dot co dot uk

---

Author: Daniel Miller <daniel.miller@tellabs.com>
Date: Wed, 3 Jul 2002 00:20:45 GMT Raw View

P.J. Plauger wrote:

> "Daniel Miller" <daniel.miller@tellabs.com> wrote in message news:3D20F54E.5050004@tellabs.com...
>
>
>>>The C++ committee's Library Working Group just discussed the issue of
>>>wide-character file names again in some detail this past Spring, both
>>>on the committee's library reflector and at the Curacao meeting.
>>>
>>>The consensus was clear that (1) file names based on types other than
>>>char are extremely non-portable,
>>>
>>   File names based on a sequence of octets are extremely nonportable
>>too!  (But that unfortunate fact doesn't seem to stop anyone over the
>>past decade from having standards-based filesystem operations.)
>>
>
> Indeed not, because there is a nontrivial portable *subset* of filenames
> you can use when you have to specify filenames for a program you intend
> to keep highly portable.
>
>
>>                                                                On
>>MSDOS the file name is restricted to <=8 characters followed by an
>>optional <=3 characters preceded by a period.  No other operating system
>>has that restriction.
>>
>
> MSDOS no longer has that restriction. At the time the C Standard was
> developed, *several* operating systems had similar restrictions. That's
> why the C Standard requires only 6.2 names, which are even more restrictive
> than the 8.3 of DOS and its predecessors.

   Similar approaches to marking out a safe/portable
to-be-well-travelled path can be devised for wide-character filesystem
identifiers.  (And for wide-character string processing in general.)

>>                       The System V file-system have a maximum
>>14-character length limit per component.  Other operating systems had a
>>32-character length limit per component.  Each operating system has its
>>own peculiar restrictions regarding which characters are permitted
>>versus prohibited and which one must be escaped to avoid special
>>meaning, as well as how they must be escaped.
>>
>
> Indeed, just as each computer has its own restriction on the number of
> bits in a long double mantissa. The idea of standards is to define a
> reasonable common denominator that you can rely on across systems. Where
> that RCD is not yet clear, standardization is premature.

   Until there is either standardization of wide-character string
processing in general and of wide-character filesystem-identifiers in
particular or an analogous widely-disseminated set of
not-(yet-)standardized libraries, the support for wide character sets in
many modern operating systems goes largely unused by the bulk of C and
C++ software.

>>                                             Wide-character
>>filesystem-identifiers continues this multi-decade-practiced
>>hardly-portable tradition.
>>
>
> And add to the problem in ways that have no widely accepted solutions.

   Popularity is but one metric of what constitutes a good idea.
Popularity up to and including a point of time is largely a posteriori
(or in other words: a descriptive technique).

   An alternative metric is to judge whether a design forms a cohesive
consistent system of thought which handles all (or a vast subset of) the
cases.  C and C++ are not popular so much because they were previously
popular as much as because they were well-thought-out on Day One to
service a particular interesting set of goals.  This is largely a priori
(or in some sense, prescriptive: this is how the problem ought to be
solved well).  The designer through experience & insight & creative
fortitude figures out what axiom system would be useful to express a
solution-space covering the intended problem space.  The usefulness
exhibited by the C and C++ languages and UNIX operating system was
largely designed a priori by individuals of immense creative vision,
instead of accidentally stumbled across after the fact via endorsing that
era's market popularity or via political lobbying.

>>>(2) there are no agreed upon
>>>semantics for conversion between wide-character and narrow-character
>>>names for file systems which do not support wide-character name,
>>>
>>   Instead of using this as an excuse for a scorched-earth approach to
>>the topic,
>>
>
> Who's scorching earth?

   By dismissing an entire topic due to a few annoying trouble spots
(which I consider exaggerated), the entire topic does not get
standardized at all.  That leaves the divergent tribes in a
disorganized/proprietary state-of-affairs without the
unification-of-vision permitted by a well-founded civilization.  (In
computer technology, a well-founded civilization is usually created by
either a standards body or a centrally-controlled cultural movement such
as UNIX in the 1970s & 1980s.)  As long as a topic remains in a tribal
state-of-affairs, the cultural traditions of a civilization do not
develop and thus a single cohesive existing practice does not usually
develop due to the tribes going their separate ways as they wander in
the desert of a multitude of inferior solutions or, more commonly, going
nowhere at all.  By focusing on a can't-do viewpoint, innovation is
suppressed by the prevailing negativity.  Thus such a topic does not get
standardized and without standardization languishes in unpopular
obscurity of a perceived intellectual back-water (because people tend to
gravitate to what is already extant for effort-reduction reasons).  By
focusing on a can-do viewpoint, the well-thought-out ways of handling a
topic can be either 1) devised as a side effect of an optimistic free &
open thought process or 2) harvested from the good ideas in intellectual
Backwaterville and brought to the benefit of a larger community.

> We went to lots of meetings and discussed this
> point on more than one occasion.

   Without a strong leadership position from some cartographer, few
individual explorers will chart original courses into the sea either of
wide character-set strings in general or of wide-character
filesystem-identifiers in particular.  Most times discovery of the New
World requires a Queen Isabella facilitator to foster the correct
environment for discovery (as opposed on the other hand to the bickering
political gridlock of partisan politicians in a legislature).  If the
waters are left uncharted (or sparsely charted by amateur
cartographers), few ships bravely embark for the new world, due to
risk-avoidance.

   I ask that WG21 (and subcomponents) foster on all topics an attitude
of 1) encouraging new research into topics for which the current
candidates are lack-luster or nonexistent or 2) encouraging harvesting
existing good work which is less well-known (even if that good work
travels to WG21 along an unexpected avenue).

 > Didn't see you there.

   Would you like to see me there?

>>          then simply don't convert between the two in the C++
>>language.  Char-based filesystem-identifiers should stay char-based
>>without conversion.  Wide-character-based filesystem-identifiers should
>>stay wide-character-based without conversion.
>>
>
> I got that *you* find this an acceptable solution.

   yes

> Others have not, so far.

   Why have these other people been insisting on such conversions?
Given a Chinese language filename (in written ideographs), I do not see
any interesting transliteration conversions to octet-sized char-strings
other than:
   1) bopomofo, a Nationalist Chinese phonetic nonWestern alphabet which
is not widely used outside of the Republic of China.  This scheme would
suffer from homophone conflation.
   2) pinyin romanization, a Communist Chinese phonetic roman alphabet
(with diaereses/umlauts) which is accessible to Western European speakers
given a little training (e.g., c is as in Lithuanian: a TS sound; x is
like SH; q is like CH; r is approximated by English's meaSURE).  This
scheme would suffer from homophone conflation.
   3) Gwoyeu Romatzyh, a Nationalist Chinese phonetic roman alphabet.
This scheme would suffer from homophone conflation.
   4) Wade-Giles romanization, a precursor of pinyin popular in the
English-speaking world.  This scheme would suffer from homophone conflation.
   5) École française d'Extrême-Orient romanization used in France.  This
scheme would suffer from homophone conflation.
   6) Lessing system used in Germany.  This scheme would suffer from
homophone conflation.
   7) translation to English et al.  This scheme would suffer from
semantic conflation & dispersion.

http://lcweb.loc.gov/catdir/pinyin/romcover.html
http://www.wlu.edu/~hhill/tlit.html
http://www.whiteclouds.com/iclc/cliej/cl4ao.htm

   But all of these useful aforementioned transliterations are different
than simple character-set mappings in the normal sense.  They are all
prohibitive higher-level cognitive operations: either conflative
phonetic romanization or full-blown semantic translation similar to
Babel Fish on the WWW.  I would claim that this cognitive-layer versus
mechanical-layer mismatch is at the heart of the complications arising
from the (gratuitous) desire for mechanized mapping of, say, Chinese
ideographs into ASCII.

   Because of these extreme complications which beget complete inaction
on the whole topic, I would say that this is a textbook case for the
KISS principle: do not down-convert from wide-character strings to
octet-based strings.  (Up-conversion seems much less problematic for
popular octet-based character-sets to Unicode, but if it has
complications for, say hypothetically, Cyrillic ISO8859 to Big 5
Chinese, throw up-conversion overboard too to remove the gratuitous
complexity.)

>>   If an operating system is unfit to support the optional
>>wide-character filesystem identifiers, then that option in C++0x would
>>be turned off on that platform.  There would be some #include <options>
>>header which would indicate which optional portions of C++0x are present
>>versus absent on a particular platform.
>>
>
> #include <options>
> ...
> #if AINT_GOT_NO_WIDE_CHARACTER_FILE_NAMES
>     <<do what?>>

   Use the
char-/octet-/ISO8859-/ASCII/POSIX-Portable-Character-Set-based filenames
which we do today.
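
   A sketch of what that fallback might look like in code.  Everything
here is an assumption for illustration: no <options> header exists, so a
hypothetical macro SKETCH_HAS_WIDE_FILENAMES stands in for it, and
platform_wide_open_sketch is an undefined placeholder for whatever
primitive a wide-capable platform would supply.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical feature macro; assumed to be set by the platform or build.
#ifndef SKETCH_HAS_WIDE_FILENAMES
#define SKETCH_HAS_WIDE_FILENAMES 0
#endif

#if SKETCH_HAS_WIDE_FILENAMES
// Hypothetical platform-supplied primitive; declared only, not defined here.
std::FILE* platform_wide_open_sketch(const wchar_t* name, const wchar_t* mode);
#endif

struct file_name_sketch {
    std::string narrow;   // always available
#if SKETCH_HAS_WIDE_FILENAMES
    std::wstring wide;    // present only where the platform opts in
#endif
};

// Prefer the wide name where the option is on; otherwise use the char-based
// name exactly as programs do today.  No conversion between the two happens.
inline std::FILE* open_for_reading(const file_name_sketch& name)
{
#if SKETCH_HAS_WIDE_FILENAMES
    if (!name.wide.empty())
        return platform_wide_open_sketch(name.wide.c_str(), L"r");
#endif
    assert(!name.narrow.empty());
    return std::fopen(name.narrow.c_str(), "r");
}
```

   With the macro off, the wide member does not exist at all, so code that
depends on it fails at compile time rather than silently narrowing.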

>>>(3) even the committee members from the international community most
>>>interested in wide-character names don't think that wide-character
>>>names are a good idea in the context of the standard library.
>>>
>>   Why?  Are they fundamentally opposed to having file names in
>>non-ISO-8859 languages?  Do they prefer multiple proprietary approaches
>>instead of a single C++0x-based approach?  Or do they prefer having no
>>approach at all as a way of suppressing wide-character
>>filesystem-identifiers via starvation of Unicode?
>>
>
> You've cited three less-than-admirable possible reasons. Some people may
> share some or all of those views. Others may have other reasons.

   Again, what are those reasons?  Beman Dawes's statement (quoted as
(3) above) implies that certain international members of WG21 have
substantial reasons for why wide-character filesystem-identifiers are
(at the broadly wholesale level) not "a good idea in the context of the
standard library".

> One
> way to find out is to make a proposal and defend it. Perhaps you'll find
> that people will like it. Or perhaps you'll find that people will oppose
> it for reasons that may even border on the rational. One way to find out.

   I agree.  Other positive & productive variations on this theme exist
as well.

>>>Wide-character file names would provide an illusion of portability
>>>where portability does not in fact exist. Behavior would be completely
>>>different on operating systems (Windows, for example) that support
>>>wide-character names, than on systems which don't.
>>>
>>   The C++0x option to support wide-character filesystem-identifiers
>>should be turned off for an operating systems which does not support
>>wide character names.  The portability of wide-character
>>filesystem-identifiers is only among operating systems which support
>>that feature in the first place. Likewise with any optional portion of
>>C++0x: the portability is only among platforms which support that
>>optionality.  A platform which does not support the optionality does not
>>participate in the portability.
>>
>
> Again, that's one way to get a weak form of standardization. Others
> prefer to define a feature in such a way that it has a meaningful
> implementation on most or all systems that currently support C++.
> (Vendors who are excluded from full conformance naturally object to
> the loss of cachet. Not to mention contracts.)

   Loss of cachet and loss of contracts can apply a desirable
survival-of-the-fittest Darwinian stressor to that vendor.

>>   I will grant you that some operating system might permit only one
>>Unicode encoding (e.g., UTF-8) whereas another operating system might
>>permit only a different Unicode encoding (e.g., UTF-32).  Handling that
>>difference between Unicode encodings is not merely a deficiency of some
>>wide-character filesystem-identifier proposal, but rather a gross
>>deficiency in C++ (and C) in general.  For an interesting attempt at
>>solving aspects of this problem, take a look at IBM's ICU library.
>>
>
> Actually, we worried more about the opposite problem. C++ (and C) in
> general let you work with a number of different wide-character and
> multibyte encodings, all within the same program. IBM's ICU library is
> just one, non-standard, approach to managing part of the complexity
> that causes. The unsolved problem is determining how best to specify
> which of a host of possible encodings, or interconversions, to apply
> in a given context. This rather gross deficiency stems from a
> worldwide lack of experience; it is not specifically a failure of
> Standard C or Standard C++. They're just the commonest tractable
> host languages.

   Failure of the sum total of the C++ community to provide
widely-disseminated well-thought-out solutions for each category of
problem tars & feathers the entire C++ community without ability of one
subcommunity to blame another.  Here "C++ community" includes the
standards bodies as well as the various open source movements among others.

>>>So until someone steps forward with an acceptable portable
>>>specification for wide-character file names, the idea is a
>>>non-starter.
>>>
>>   Because not all operating systems support multithreaded
>>programming,---using the line of reasoning presented along this
>>thread---MT support in C++0x would likewise merely "provide an illusion
>>of portability where portability does not in fact exist".  Will MT be a
>>"nonstarter" too for similar reasons?
>>
>
> Depends on how it's specified. Boost has a C++ package, which Dinkumware
> has extended to include C as well, that lets us write the kind of
> portable multithreaded code we find commercially useful. It helps
> that MT is particularly easy to implement in a system that supports
> at most one thread of control...

   I agree that interthread synchronization points degenerate to nothing
in a strictly single-threaded environment.  But conversely, the concepts
of start-a-new-thread and thread-join for example, have no (portable)
corresponding substantially-similar behavior on strictly single-threaded
environments (e.g., nonpreemptive event loops; multiple address-spaces
each with its one thread of control).  Thus I reiterate my question:
if portions of MT can optionally be supported on certain platforms but
cannot on others, then that would be largely the same
optionality-in-C++0x situation as supporting wide-character
filesystem-identifiers on certain platforms but not on others.  How can
such warm embracement of MT (not necessarily Boost.threads) be justified
vis a vis the cold dismissal of wide-character filesystem-identifiers?

> People can debate whether the Boost approach is suitable for adding
> to the C++ Standard, but it's clearly a candidate.

   I was discussing the larger topic of MT independent of Boost.threads.

   In fact in my postings on this thread, beyond merely MT & wide-chars,
I am discussing the even larger topic of which guiding principles will
C++0x have for topics which are supported on one platform but not on
others.  Multiple techniques exist, including:
   1) the (misnamed) lowest-common denominator:  The worst boat-anchor
platforms drag all of the other greater-capability platforms down so
that the standardized form of the topic lives down to a low-aspiration goal.
   2) the can-do stretch-goal:  The worst boat-anchor platforms are
laboriously brought up to match the feature-set which is used with ease
in the majority of platforms.
   3) Draconian scorched earth:  The worst boat-anchor platforms cause
the topic to not be standardized at all.  Even worse is if lack of
leadership (e.g., standardization) overtly causes unpopularity of the topic.
   4) overlapping go-forth-&-prosper alternatives:  Multiple
alternatives are standardized, where each alternative serves a niche set
of applications best.  The applying engineer then chooses among the
alternatives for the one which best matches each niche application.  (I
claim that this is what the POSIX threads & real-time extensions have
done: standardize the set {mutex, spinlock, semaphore} because each has
its strengths & weaknesses & problem-space for which it is the best or
only solution.)
   5) optionality:  Platforms which are fit to support a topic have the
topic, whereas unfit platforms opt out.  Portability is between only fit
platforms for each topic.  (This also is what POSIX threads & real-time
extensions have done.)
   6) the vagueness criterion:  Don't really do anything tangible, but
rather only standardize a vague framework for the topic which itself is
not terribly useful without some form of add-on beyond what it standardized.

   Deciding on one or more such governing principles will permit those
principles to guide what to do with internationalization,
multithreading, network protocols, and many other topics which are
likely to come up in C++0x standardization (especially library
standardization).

   For example, there has been stated a desire to standardize some form
of network protocol library in C++0x.  Would that networking library
limit itself to IPv4 because not all platforms can support IPv6?  Would
that networking library limit itself to only IP, TCP, UDP, and their
IETF-specified compatriots?  Would that networking library limit itself
to vaguely describing wispy ethereal network programmingness without
bothering itself with any particular protocol's conventions (which would
then be brought down to reality in a proprietary manner, or each
app-domain rolls its own)?  Would the lack of universality of IP versus
OSI protocol-stacks or the lack of universality of IPv4 versus IPv6 (or
IPv4 VPNs) cause everyone to throw up their hands and not standardize
anything at all?

>>                                     If not, then please explain how
>>operating systems which support MT (versus those which do not) permit an
>>entirely different warmly-accommodating attitude, whereas operating
>>systems which support wide-character filesystem-identifiers (versus
>>those which do not) beget a far harsher attitude.
[...snip..]

---

Author: "Hans Bos" <hans.bos@xelion.nl>
Date: Wed, 3 Jul 2002 00:32:14 GMT Raw View

"P.J. Plauger" <pjp@dinkumware.com> wrote in message
news:3d219926$0$21242$724ebb72@reader2.ash.ops.us.uu.net...
> "Daniel Miller" <daniel.miller@tellabs.com> wrote in message
news:3D20F54E.5050004@tellabs.com...
>
> > > The C++ committee's Library Working Group just discussed the issue of
> > > wide-character file names again in some detail this past Spring, both
> > > on the committee's library reflector and at the Curacao meeting.
> > >
> > > The consensus was clear that (1) file names based on types other than
> > > char are extremely non-portable,
> >
> >    File names based on a sequence of octets are extremely nonportable
> > too!  (But that unfortunate fact doesn't seem to stop anyone over the
> > past decade from having standards-based filesystem operations.)
>
> Indeed not, because there is a nontrivial portable *subset* of filenames
> you can use when you have to specify filenames for a program you intend
> to keep highly portable.

The portability of file names is not because of the C++ standard.
In C90 (iostream is defined in terms of C90 FILE *) 7.9.3 states:
    The rules for composing valid file names are implementation-defined.

So an implementation that allows only one-letter file names conforms to the
standard as long as that is documented.

>
> >                                                                 On
> > MSDOS the file name is restricted to <=8 characters followed by an
> > optional <=3 characters preceded by a period.  No other operating system
> > has that restriction.
>
> MSDOS no longer has that restriction. At the time the C Standard was
> developed, *several* operating systems had similar restrictions. That's
> why the C Standard requires only 6.2 names, which are even more restrictive
> than the 8.3 of DOS and its predecessors.

I think you are confusing #include names (C90 6.8.2 says you can use 6.1
names) with the stdio file name rules.
That clause says nothing about file names you can use with fopen.

Since the rules for char * names are implementation defined I don't see the
big problem in using wchar_t * names with the same rules (implementation
defined).

In that case "char *" file names are as portable as "wchar_t *" names (as
far as the C++ standard is concerned).
At least you have a portable interface when you need wide character names.

On the other hand, maybe this is not the place to invent new C++ libraries.
People who need an interface for wide character file names can also complain
to their compiler/library vendors.
If someone can convince Microsoft that a wide character interface is needed
for iostream, Dinkumware will put it into their libraries.
Eventually this may make it into the standard.

Greetings,
Hans.



---

Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: Wed, 3 Jul 2002 12:09:49 CST

"Daniel Miller" <daniel.miller@tellabs.com> wrote in message news:3D22428F.2060500@tellabs.com...

   Until there is either standardization of wide-character string
processing in general and of wide-character filesystem-identifiers in
particular or an analogous widely-disseminated set of
not-(yet-)standardized libraries, the support for wide character sets in
many modern operating systems goes largely unused by the bulk of C and
C++ software.

[pjp] Yep. And vendors give customers what they ask for when they
ask for it.

   Popularity is but one metric of what constitutes a good idea.
Popularity up to and including a point of time is largely a posteriori
(or in other words: a descriptive technique).

   An alternative metric is to judge whether a design forms a cohesive
consistent system of thought which handles all (or a vast subset of) the
cases.  C and C++ are not popular so much because they were previously
popular as much as because they were well-thought-out on Day One to
service a particular interesting set of goals.  This is largely a priori
(or in some sense, prescriptive: this is how the problem ought to be
solved well).  The designer through experience & insight & creative
fortitude figures out what axiom system would be useful to express a
solution-space covering the intended problem space.  The usefulness
exhibited by the C and C++ languages and UNIX operating system was
largely designed a priori by individuals of immense creative vision,
instead of accidentally stumbled across after the fact via endorsing that
era's market popularity or via political lobbying.

[pjp] Sorry, but that's not how C, Unix, or C++ evolved. I've worked
with the first two from the earliest days, and I saw *lots* of changes.
Yes, C and Unix had good firm foundations, but what made them both
so successful was the rapid, continuous, and pragmatic give and take
between designers, implementors, and users -- particularly in the
early days before the user community got too large to suffer change
gladly. C++ has a similar history, from what I've seen.

> Who's scorching earth?

   By dismissing an entire topic due to a few annoying trouble spots
(which I consider exaggerated), the entire topic does not get
standardized at all.

[pjp] If that's your idea of scorched earth, I suggest you avoid
serious games of shuffleboard.

                    Leaving the divergent tribes in a
disorganized/proprietary state-of-affairs without the
unification-of-vision permitted by a well-founded civilization.

[pjp] [Reams of similar rhetoric omitted.]

 > Didn't see you there.

   Would you like to see me there?

[pjp] Truth to tell, I'm indifferent. But if you did attend C++
meetings, you'd have a greater chance to argue your cause.
(BTW, you'd probably get 10 to 20 minutes of subcommittee time
to ride your hobby, in trade for 4+ days of work on a variety of
topics.)


[pjp] [Reams of similar rhetoric omitted, including phrases like
``conflative phonetic romanization'' which I happened to kinda
like.]

> You've cited three less-than-admirable possible reasons. Some people may
> share some or all of those views. Others may have other reasons.

   Again, what are those reasons?  Beman Dawes's statement (quoted as
(3) above) implies that certain international members of WG21 have
substantial reasons for why wide-character filesystem-identifiers are
(at the broadly wholesale level) not "a good idea in the context of the
standard library".

[pjp] I recall several additional reasons advanced earlier in this
thread. Review it if you're looking for more input. I don't feel
obliged to list all possible reasons, good and bad.

> Again, that's one way to get a weak form of standardization. Others
> prefer to define a feature in such a way that it has a meaningful
> implementation on most or all systems that currently support C++.
> (Vendors who are excluded from full conformance naturally object to
> the loss of cachet. Not to mention contracts.)

   Loss of cachet and loss of contracts can apply a desirable
survival-of-the-fittest Darwinian stressor to that vendor.

[pjp] Yeah, the C++ committee tried that with export. And with
putting the C library in namespace std. Look what it's got 'em
so far.

>           The unsolved problem is determining how best to specify
> which of a host of possible encodings, or interconversions, to apply
> in a given context. This rather gross deficiency stems from a
> worldwide lack of experience; it is not specifically a failure of
> Standard C or Standard C++. They're just the commonest tractable
> host languages.

   Failure of the sum total of the C++ community to provide
widely-disseminated well-thought-out solutions for each category of
problem tars & feathers the entire C++ community without ability of one
subcommunity to blame another.  Here "C++ community" includes the
standards bodies as well as the various open source movements among others.

[pjp] I wasn't spreading tar and feathers; I was merely observing that
no good solutions have been found in this area. That said, the *last*
group I want to hammer out a trial solution is a standards committee.
Been there, done that, too many times.

   I agree that interthread synchronization points degenerate to nothing
in a strictly single-threaded environment.  But conversely, the concepts
of start-a-new-thread and thread-join for example, have no (portable)
corresponding substantially-similar behavior on strictly single-threaded
environments (e.g., nonpreemptive event loops; multiple address-spaces
each with its one thread of control).

[pjp] Packed decimal is not implemented in hardware on all popular
architectures, either. And that's one reason why it's not part of the
C Standard. You don't have to standardize *everything*. Nor should you
try.

                                     Thus I reiterate my question:
if portions of MT can optionally be supported on certain platforms but
cannot on others, then that would be largely the same
optionality-in-C++0x situation as supporting wide-character
filesystem-identifiers on certain platforms but not on others.  How can
such warm embracement of MT (not necessarily Boost.threads) be justified
vis a vis the cold dismissal of wide-character filesystem-identifiers?

[pjp] I know of no pending MT proposal that takes this Chinese menu
approach. And I'd certainly be cold to one if it came up. Equally,
I'd probably be warm to a wide-character filename proposal that had
reasonable semantics on all popular operating systems.

   I was discussing the larger topic of MT independent of Boost.threads.

   In fact in my postings on this thread, beyond merely MT & wide-chars,
I am discussing the even larger topic of which guiding principles will
C++0x have for topics which are supported on one platform but not on
others.  Multiple techniques exist, including:

[pjp] [More reams omitted.] Yes, there are many guiding principles
for deciding what to standardize and how. They even conflict with
each other. The process of choosing which to apply in each case is
nontrivial, but it almost always ends in deliberation at a C++
committee meeting. And someone is certain to be unhappy with every
nontrivial decision made.

Deal.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com




Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: Wed, 3 Jul 2002 17:10:20 GMT

"Hans Bos" <hans.bos@xelion.nl> wrote in message
news:3d2234e1$0$12287$e4fe514c@dreader4.news.xs4all.nl...

> I think you are confusing #include names (in C90:6.8.2 says you can use 6.1
> names) with the stdio file name rules.
> This says nothing about file names you can use with fopen.

Could be. Been 20 years since I wrote and/or edited that stuff for the
C Standard.

> Since the rules for char * names are implementation defined I don't see the
> big problem in using wchar_t * names with the same rules (implementation
> defined).

The slight difference is that every OS I know that supports hosted C or C++
at all has a history of supporting nul-terminated char sequences. I seriously
doubt any proposed addition to C++ will get much support unless it can be
mapped to that universal method of opening files.
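
A minimal sketch of what such a mapping could look like, assuming the wide
name can be converted with std::wcstombs under the current C locale. The
names narrow_name and open_wide are hypothetical; they are not part of any
standard or vendor library, and a real design would still have to say what
happens when a character has no narrow representation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical helper: convert a wide-character name to a nul-terminated
// multibyte char sequence using the current C locale.
// Returns false if some character cannot be represented in that locale.
bool narrow_name(const wchar_t* wide, std::string& out)
{
    // Each wide character needs at most MB_CUR_MAX bytes; +1 for the nul.
    std::size_t len = std::wcslen(wide);
    std::vector<char> buf(len * MB_CUR_MAX + 1);

    std::size_t written = std::wcstombs(buf.data(), wide, buf.size());
    if (written == static_cast<std::size_t>(-1))
        return false;                      // unconvertible character

    out.assign(buf.data(), written);
    return true;
}

// Hypothetical wide-name open that maps onto the char-based fopen.
std::FILE* open_wide(const wchar_t* wide, const char* mode)
{
    std::string narrow;
    if (!narrow_name(wide, narrow))
        return nullptr;                    // no agreed-upon fallback exists
    return std::fopen(narrow.c_str(), mode);
}
```

The failure branch is where the hard question sits: the sketch simply gives
up, which is one possible policy among several.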

> On the other hand, maybe this is not the place to invent new C++ libraries.

Indeed. If only that principle were better adhered to in the past.

> People who need an interface for wide character file names can also complain
> to their compiler/library vendors.
> If someone can convince Microsoft that a wide character interface is needed
> for iostream, Dinkumware will put it into their libraries.

That's a pretty safe bet.

> Eventually this may make it into the standard.

And that's a good way to prove in ideas before ``standardizing'' them,
as I have been saying all along.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com




Author: rmaddox@isicns.com (Randy Maddox)
Date: Wed, 3 Jul 2002 17:36:07 GMT

Pete Becker <petebecker@acm.org> wrote in message news:<3D20F16D.CF4B1ECC@acm.org>...
> Randy Maddox wrote:
> > Perhaps the C++ library group should take a look at this and decide whether
> > or not to pursue it further.
> >
>
> As Beman Dawes pointed out, we have looked at it and found it wanting.
> If you're serious about this you should implement it, get people to use
> it in non-trivial applications, document it, and propose it.
>
> --
> Pete Becker
> Dinkumware, Ltd. (http://www.dinkumware.com)

Let's see if I've got this right.  What you seem to be saying is that
any idea that has not been implemented and used in real applications
is not worthy of consideration.  Now I don't know about all
developers, but I do know that a lot of us may have some very good
ideas, but at the same time not have either the time or the resources
to implement those ideas.  You would disqualify all such ideas because
of that.

I certainly could implement the suggestions I have made.  We have the
library source code and I could dig into it and make the suggested
changes.  But I doubt very much whether my employer would approve of
my taking the time away from our schedule, or of my mucking about with
the standard library used by our code.  Because of that constraint you
are not willing to consider my idea.

So be it.  I think your concept is wrong-headed and needlessly
discouraging to developers who work with C++ on a daily basis and may
have some good thoughts about how the language might best evolve to
meet their real needs.  At any rate, you have certainly succeeded in
discouraging me.  I have no intention of participating further in this
discussion.  I've already wasted enough time that was clearly not
appreciated.

Thanks.  You win.

Randy.

>

======================================= MODERATOR'S COMMENT:
 Please be careful to avoid personal flames.


Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: Wed, 3 Jul 2002 21:01:24 GMT

"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:8c8b368d.0207030932.5f99f2b5@posting.google.com...

> Let's see if I've got this right.  What you seem to be saying is that
> any idea that has not been implemented and used in real applications
> is not worthy of consideration.

As an addition to an International Standard, yes. Where committees have
deviated from this policy, they've usually paid a high price. (Witness
export and namespace.)

>                                 Now I don't know about all
> developers, but I do know that a lot of us may have some very good
> ideas, but at the same time not have either the time or the resources
> to implement those ideas.  You would disqualify all such ideas because
> of that.

Yes. The world is full of ideas, some of them good. Few of even the
best ideas merit standardization.

> I certainly could implement the suggestions I have made.  We have the
> library source code and I could dig into it and make the suggested
> changes.  But I doubt very much whether my employer would approve of
> my taking the time away from our schedule, or of my mucking about with
> the standard library used by our code.  Because of that constraint you
> are not willing to consider my idea.

We're willing to consider it. We've already done so and rejected it.

> So be it.  I think your concept is wrong-headed and needlessly
> discouraging to developers who work with C++ on a daily basis and may
> have some good thoughts about how the language might best evolve to
> meet their real needs.  At any rate, you have certainly succeeded in
> discouraging me.  I have no intention of participating further in this
> discussion.  I've already wasted enough time that was clearly not
> appreciated.
>
> Thanks.  You win.

No. You lose. This is not a zero-sum game.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com




Author: "David Abrahams" <david.abrahams@rcn.com>
Date: Wed, 3 Jul 2002 22:56:40 GMT

"P.J. Plauger" <pjp@dinkumware.com> wrote in message
news:3d2364b0$0$14258$4c41069e@reader1.ash.ops.us.uu.net...
> "Randy Maddox" <rmaddox@isicns.com> wrote in message
> news:8c8b368d.0207030932.5f99f2b5@posting.google.com...

> > I certainly could implement the suggestions I have made.  We have the
> > library source code and I could dig into it and make the suggested
> > changes.  But I doubt very much whether my employer would approve of
> > my taking the time away from our schedule, or of my mucking about with
> > the standard library used by our code.  Because of that constraint you
> > are not willing to consider my idea.
>
> We're willing to consider it. We've already done so and rejected it.

Just to put a little more of a detached perspective on this, the members of
the LWG are volunteers (as you would be, Randy, if you were going to
meetings). We all have employers to whom we need to justify the time we
spend on considering ideas. It's pretty hard to make a case for spending
time going back to consider ideas that have already been hashed through and
rejected, when there are plenty of other difficult issues on which to make
progress.

-Dave




Author: Pete Becker <petebecker@acm.org>
Date: 3 Jul 2002 23:25:08 GMT

Randy Maddox wrote:
>
> Let's see if I've got this right.

You've got it wrong.

> What you seem to be saying is that
> any idea that has not been implemented and used in real applications
> is not worthy of consideration.  Now I don't know about all
> developers, but I do know that a lot of us may have some very good
> ideas, but at the same time not have either the time or the resources
> to implement those ideas.  You would disqualify all such ideas because
> of that.

If you're not willing to put in the time you've got no basis to complain
that somebody else didn't do it for you.

>
> I certainly could implement the suggestions I have made.  We have the
> library source code and I could dig into it and make the suggested
> changes.  But I doubt very much whether my employer would approve of
> my taking the time away from our schedule, or of my mucking about with
> the standard library used by our code.  Because of that constraint you
> are not willing to consider my idea.

Nope. I'm not willing to consider it because, after fifteen years of
looking at internationalization, I still don't know how to do it.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: Julian Smith <jules@REMOVETHIS.op59.net>
Date: Thu, 4 Jul 2002 00:26:09 GMT

On Mon, 1 Jul 2002 16:57:23 GMT
rmaddox@isicns.com (Randy Maddox) wrote:

> Julian Smith <jules@REMOVETHIS.op59.net> wrote in message
> news:<20020628113056.531fee56.jules@REMOVETHIS.op59.net>...
> > On Thu, 27 Jun 2002 20:27:24 GMT
> > Michiel.Salters@cmg.nl (Michiel Salters) wrote:

> > > I've just reread the thread, and I'm still undecided about the wchar_t
> > > exceptions. Clearly we want to keep a single hierarchy, but having only
> > > chars is starting to hurt. Up to 1999 the Netherlands had no problems
> > > with ASCII, but today throwing a domain_error with a Euro-sign in the
> > > what() is quite reasonable. That's not always present in the narrow
> > > character set (must be ISO8859-15, -1 is still common). A wchar_t
> > > interface could help here.
> >
> > I think the problem here is that C++ interfaces aren't capable of
> > doing what people want:
> >
> > The C++ standards people invented something that, at the time, was the
> > best interface they could come up with for exception objects. This
> > interface has a virtual `const char* std::exception::what() const'
> > method. Nowadays, people are asking to change the std::exception interface
> > so that it can return wide-character strings. We still want old code to
> > recompile and run as before though.
>
> I personally am most definitely NOT asking to modify in any way the
> interface provided by std::exception.  Instead, what I am suggesting

I know this.

I wasn't responding to one of your posts - I was replying to Michiel
Salters's post about the possibility of adding a `virtual wchar_t*
wwhat()' method to std::exception.


> The entire std::exception hierarchy would then be EXACTLY as it is now
> with no change required to any existing code.  Those who need wide
> character exceptions could use them.  Those who don't could be
> blissfully unaware of their existence.  And, just as with string and
> wstring, we have no need to conjoin functionality that does not go
> together.  Instead we have separate class hierarchies to address
> separate needs.  Given which the remainder of the points in this
> original posting become moot.  There is no need to expand the
> interface provided by std::exception, no multimethods, no magic
> linkers, etc.  All of that imagined complexity is just another red
> herring in this discussion.

The trouble with separate hierarchies is that you can no longer catch all
exceptions with `catch( std::exception& e) {...}'. This is a pretty
fundamental part of how exception objects are handled (why else would
there be a std::exception base class?).
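
A minimal sketch of the problem, assuming a separate wide-character
hierarchy whose root does not derive from std::exception. The class
wexception and the helper which_handler are hypothetical names made up for
this illustration; nothing like them exists in the standard library.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical root of a separate wide-character hierarchy, deliberately
// NOT derived from std::exception.
class wexception {
public:
    explicit wexception(const std::wstring& msg) : msg_(msg) {}
    virtual ~wexception() {}
    virtual const wchar_t* what() const { return msg_.c_str(); }
private:
    std::wstring msg_;
};

// Reports which handler catches whatever `thrower` throws.
template <class F>
const char* which_handler(F thrower)
{
    try {
        thrower();
    } catch (std::exception&) {
        return "std::exception";   // the usual catch-all for library errors
    } catch (wexception&) {
        return "wexception";       // needs its own handler everywhere
    }
    return "none";
}

void throw_narrow() { throw std::domain_error("bad amount"); }
void throw_wide()   { throw wexception(L"bad amount: \u20AC"); }
```

Every existing `catch( std::exception& e)' site would need a second handler
like the one above before it could see the wide exceptions.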

At the risk of repeating myself, I think that the only good way of
extending fundamental interfaces like std::exception is to implement some
sort of multimethod-style dispatch.

Whenever I mention these issues on c.s.c++ or c.l.c++.moderated, I get a
deafening silence in response. If I'm talking nonsense, will someone
please tell me? Otherwise, will someone at least explain why there is no
interest in even discussing these issues in more detail?

- Julian

--
http://www.op59.net


Author: Alexander Terekhov <terekhov@web.de>
Date: Thu, 4 Jul 2002 17:52:41 GMT

Pete Becker wrote:
[...]
> Nope. I'm not willing to consider it because, after fifteen years of
> looking at internationalization, I still don't know how to do it.

After fifteen minutes [or so] looking at POSIX standard [freely
available on the Net, BTW], I already know how to do it. ;-) ;-)

http://www.opengroup.org/onlinepubs/007904975/xrat/xbd_chap03.html#tag_01_03_00_46

"Portable Filename Character Set

 The encoding of this character set is not specified; specifically,
 ASCII is not required. But the implementation must provide a unique
 character code for each of the printable graphics specified by POSIX.1;
 see also Filenames.

 Situations where characters beyond the portable filename character set
 (or historically ASCII or the ISO/IEC 646:1991 standard) would be used
 (in a context where the portable filename character set or  the ISO/IEC
 646:1991 standard is required by POSIX.1) are expected to be common.
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^

 Although such a situation renders the use technically non-compliant,
 mutual agreement among the users of an extended character set will
 make such use portable between those users. Such a mutual agreement
 could be formalized as an optional extension to POSIX.1. (Making it
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 required would eliminate too many possible systems, as even those
 systems using the ISO/IEC 646:1991 standard as a base character
 set extend their character sets for Western Europe and the rest
 of the world in different ways.)

 Nothing in POSIX.1 is intended to preclude the use of extended characters
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 where interchange is not required or where mutual agreement is obtained.
 It has been suggested that in several places "should" be used instead of
 "shall". Because (in the worst case) use of any character beyond the
 portable filename character set would render the program or data not
 portable to all possible systems, no extensions are permitted in this
 context.
 ...."

http://www.opengroup.org/onlinepubs/007904975/xrat/xbd_chap04.html#tag_01_04_06

"Filenames

 <snip>

 Many East Asian languages, including Japanese, Chinese, and Korean, do not
 distinguish case and are sometimes encoded in character sets that use more
 than one byte per character.

 Multiple character codes may be used on the same machine simultaneously.
 There are several ISO character sets for European alphabets. In Japan,
 several Japanese character codes are commonly used together, sometimes
 even in filenames; this is evidently also the case in China. To handle
 case insensitivity, the kernel would have to at least be able to distinguish
 for which character sets the  concept made sense.
 ...."

http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap06.html

"....
 C Language Wide-Character Codes

 In the shell, the standard utilities are written so that the encodings of
 characters are described by the locale's LC_CTYPE definition (see LC_CTYPE )
 and there is no differentiation between characters consisting of single
 octets (8-bit bytes) or multiple bytes. However, in the C language, a
 differentiation is made. To ease the handling of variable length characters,
 the C language has introduced the concept of wide-character codes.

 All wide-character codes in a given process consist of an equal number of
 bits. This is in contrast to characters, which can consist of a variable
 number of bytes. The byte or byte sequence that represents a character can
 also be represented as a wide-character code. Wide-character codes thus
 provide a uniform size for manipulating text data. A wide-character code
 having all bits zero is the null wide-character code (see Null Wide-Character
 Code ), and terminates wide-character strings (see Wide-Character Code
 (C Language) ). The wide-character value for each member of the portable
 character set shall equal its value when used as the lone character in
 an integer character constant. Wide-character codes for other characters
 are locale and implementation-defined. State shift bytes shall not have
 a wide-character code representation.
 ...."
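
The fixed-width property described in that passage can be illustrated with
a minimal sketch built on std::mbstowcs, which converts a multibyte char
sequence into wide-character codes according to the current C locale. The
helper name widen is hypothetical, and returning an empty string on a
conversion failure is an assumption chosen only to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: convert a multibyte string to wide-character codes
// using the current C locale. Returns an empty string if the input is not
// a valid multibyte sequence in that locale.
std::wstring widen(const std::string& narrow)
{
    // A wide string never has more elements than the narrow one has bytes.
    std::vector<wchar_t> buf(narrow.size() + 1);

    std::size_t n = std::mbstowcs(buf.data(), narrow.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return std::wstring();             // invalid multibyte sequence

    // Every element of the result has the same size: sizeof(wchar_t).
    return std::wstring(buf.data(), n);
}
```

After conversion, indexing and length computations work per character
rather than per byte, which is the convenience the quoted text describes.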

So, I guess, [POSIX 'byte'/'char' vs. standard C/C++ 'byte'/'char' aside]
the concept of 'filename' [for "optional" wfopen(), for example] should
probably be extended to allow specification of filenames [probably in
'C Language Wide-Character Codes'] according to the current [or specified
as "optional" wfopen argument] >>Locale<<:

http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap07.html
(Locale)

regards,
alexander.


Author: Pete Becker <petebecker@acm.org>
Date: Thu, 4 Jul 2002 19:44:45 GMT

Alexander Terekhov wrote:
>
> Pete Becker wrote:
> [...]
> > Nope. I'm not willing to consider it because, after fifteen years of
> > looking at internationalization, I still don't know how to do it.
>
> After fifteen minutes [or so] looking at POSIX standard [freely
> available on the Net, BTW], I already know how to do it. ;-) ;-)

Yup, the POSIX standard has a bunch of words about characters beyond the
portable filename character set. Ultimately those words say "it would be
nice, but we don't know how to do it." And on top of that, believe it or
not, there are systems in the world that don't conform to POSIX.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: rmaddox@isicns.com (Randy Maddox)
Date: Fri, 5 Jul 2002 16:28:15 GMT

Pete Becker <petebecker@acm.org> wrote in message news:<3D248FFC.DBE23514@acm.org>...
> Alexander Terekhov wrote:
> >
> > Pete Becker wrote:
> > [...]
> > > Nope. I'm not willing to consider it because, after fifteen years of
> > > looking at internationalization, I still don't know how to do it.
> >
> > After fifteen minutes [or so] looking at POSIX standard [freely
> > available on the Net, BTW], I already know how to do it. ;-) ;-)
>
> Yup, the POSIX standard has a bunch of words about characters beyond the
> portable filename character set. Ultimately those words say "it would be
> nice, but we don't know how to do it." And on top of that, believe it or
> not, there are systems in the world that don't conform to POSIX.
>

One last shot at this.  IMHO, C++ provides pretty good support for
internationalization/localization, but there seem to be a very few,
very small holes in that support, specifically in the areas of
exception what_args and wide character file names.  I offered what
seemed to me to be carefully considered suggestions for addressing
these holes with no impact on existing code, programming styles, or
habits.  I would further suggest that figuring out how best to support
internationalization/localization might proceed more smoothly if these
few small holes did not make that effort more difficult than
necessary.

Now, you have repeatedly stated that these ideas have been considered
in the past and rejected as bad ideas.  OK.  I can accept that, but
could you please share the basis for that rejection?  It's rather
difficult to attempt any refutation without knowing what the technical
arguments are.

Thanks.

Randy.

> --
> Pete Becker
> Dinkumware, Ltd. (http://www.dinkumware.com)
>


Author: Pete Becker <petebecker@acm.org>
Date: 2 Jul 2002 00:50:07 GMT

Randy Maddox wrote:
>
> There was also sentiment that the suggestion was unnecessary if no operating
> system provided support for wide character file names.

Citation, please?

>
> Perhaps the C++ library group should take a look at this and decide whether
> or not to pursue it further.
>

As Beman Dawes pointed out, we have looked at it and found it wanting.
If you're serious about this you should implement it, get people to use
it in non-trivial applications, document it, and propose it.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: Daniel Miller <daniel.miller@tellabs.com>
Date: Tue, 2 Jul 2002 03:50:59 GMT

Beman Dawes wrote:

[...snip prior poster's partial postings...]
> The C++ committee's Library Working Group just discussed the issue of
> wide-character file names again in some detail this past Spring, both
> on the committee's library reflector and at the Curacao meeting.
>
> The consensus was clear that (1) file names based on types other than
> char are extremely non-portable,

   File names based on a sequence of octets are extremely nonportable
too!  (But that unfortunate fact doesn't seem to stop anyone over the
past decade from having standards-based filesystem operations.)  On
MSDOS the file name is restricted to <=8 characters followed by an
optional <=3 characters preceded by a period.  No other operating system
has that restriction.  The System V file system has a maximum
14-character length limit per component.  Other operating systems had a
32-character length limit per component.  Each operating system has its
own peculiar restrictions regarding which characters are permitted
versus prohibited and which one must be escaped to avoid special
meaning, as well as how they must be escaped.  Wide-character
filesystem-identifiers continue this multi-decade-practiced
hardly-portable tradition.
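
The length limits listed above can be sketched as simple checks, which
shows how a name that is fine on one system fails on another. The function
names are hypothetical, only lengths are tested (legal characters and
escaping rules are ignored), and treating an empty extension after the
period as invalid is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical check for the MS-DOS "8.3" shape: a base of 1 to 8
// characters, optionally followed by '.' and 1 to 3 characters.
bool fits_8_3(const std::string& name)
{
    std::string::size_type dot = name.find('.');
    if (dot == std::string::npos)
        return !name.empty() && name.size() <= 8;

    std::string base = name.substr(0, dot);
    std::string ext  = name.substr(dot + 1);
    return !base.empty() && base.size() <= 8
        && !ext.empty() && ext.size() <= 3
        && ext.find('.') == std::string::npos;   // only one period allowed
}

// Hypothetical check for the 14-character-per-component limit cited for
// System V file systems.
bool fits_sysv14(const std::string& component)
{
    return !component.empty() && component.size() <= 14;
}
```

A name such as "longfilename.txt" fails both checks, while "README.TXT"
passes both, even though each is just a sequence of octets.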

> (2) there are no agreed upon
> semantics for conversion between wide-character and narrow-character
> names for file systems which do not support wide-character names,

   Instead of using this as an excuse for a scorched-earth approach to
the topic, simply don't convert between the two in the C++
language.  Char-based filesystem-identifiers should stay char-based
without conversion.  Wide-character-based filesystem-identifiers should
stay wide-character-based without conversion.

   If an operating system is unfit to support the optional
wide-character filesystem identifiers, then that option in C++0x would
be turned off on that platform.  There would be some #include <options>
header which would indicate which optional portions of C++0x are present
versus absent on a particular platform.
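
   For concreteness, here is a minimal sketch of how such an options
header might be consumed.  The macro name STDX_HAS_WIDE_FILENAMES and
both helper functions are hypothetical, invented for this example and
not taken from any standard or proposal; the _WIN32 test is only an
assumption about which platforms would switch the option on.

```cpp
// Sketch of an <options>-style header plus client code.
// STDX_HAS_WIDE_FILENAMES is a hypothetical macro invented for this example.
#include <string>

#ifndef STDX_HAS_WIDE_FILENAMES
#  if defined(_WIN32)
#    define STDX_HAS_WIDE_FILENAMES 1  // assumption: this platform takes wide names
#  else
#    define STDX_HAS_WIDE_FILENAMES 0  // option switched off elsewhere
#  endif
#endif

// Run-time view of the compile-time option.
inline bool wide_filenames_available() {
    return STDX_HAS_WIDE_FILENAMES != 0;
}

// Client code picks one branch per platform; no char/wchar_t conversion occurs.
inline std::string filename_kind() {
#if STDX_HAS_WIDE_FILENAMES
    return "wide";
#else
    return "narrow";
#endif
}
```

   A program that needs the feature unconditionally could instead
issue an #error in the branch where the option is off.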

> and
> (3) even the committee members from the international community most
> interested in wide-character names don't think that wide-character
> names are a good idea in the context of the standard library.

   Why?  Are they fundamentally opposed to having file names in
non-ISO-8859 languages?  Do they prefer multiple proprietary approaches
instead of a single C++0x-based approach?  Or do they prefer having no
approach at all as a way of suppressing wide-character
filesystem-identifiers via starvation of Unicode?

> Wide-character file names would provide an illusion of portability
> where portability does not in fact exist. Behavior would be completely
> different on operating systems (Windows, for example) that support
> wide-character names, than on systems which don't.

   The C++0x option to support wide-character filesystem-identifiers
should be turned off for an operating system which does not support
wide-character names.  The portability of wide-character
filesystem-identifiers is only among operating systems which support
that feature in the first place.  Likewise with any optional portion of
C++0x: the portability is only among platforms which support that
optionality.  A platform which does not support the optionality does not
participate in the portability.

> Providing
> functionality that appears to provide portability but in fact delivers
> only system-specific behavior is highly undesirable.

   I will grant you that some operating system might permit only one
Unicode encoding (e.g., UTF-8) whereas another operating system might
permit only a different Unicode encoding (e.g., UTF-32).  Handling that
difference between Unicode encodings is not merely a deficiency of some
wide-character filesystem-identifier proposal, but rather a gross
deficiency in C++ (and C) in general.  For an interesting attempt at
solving aspects of this problem, take a look at IBM's ICU library.

> So until someone steps forward with an acceptable portable
> specification for wide-character file names, the idea is a
> non-starter.

   Not all operating systems support multithreaded programming either.
By the line of reasoning presented in this thread, MT support in C++0x
would likewise merely "provide an illusion of portability where
portability does not in fact exist".  Will MT be a "nonstarter" too for
similar reasons?  If not, then please explain why operating systems
which support MT (versus those which do not) merit an entirely
different, warmly accommodating attitude, whereas operating systems
which support wide-character filesystem-identifiers (versus those which
do not) beget a far harsher attitude.

   C++0x standardization must face the reality that not every operating
system will be able to support every topic in C++0x.  In addition to a
required base (e.g., C++98 plus language-layer features), C++0x will
need to have some sort of optionality mechanism as oft-practiced in ITU
and other ISO/IEC standards (e.g., datacom & telecom standards), so that
certain features are either 1) standardized if supportable on a
particular platform or 2) absent from that platform.

---

Author: "P.J. Plauger" <pjp@dinkumware.com>
Date: Tue, 2 Jul 2002 15:47:51 GMT Raw View

"Daniel Miller" <daniel.miller@tellabs.com> wrote in message news:3D20F54E.5050004@tellabs.com...

> > The C++ committee's Library Working Group just discussed the issue of
> > wide-character file names again in some detail this past Spring, both
> > on the committee's library reflector and at the Curacao meeting.
> >
> > The consensus was clear that (1) file names based on types other than
> > char are extremely non-portable,
>
>    File names based on a sequence of octets are extremely nonportable
> too!  (But that unfortunate fact doesn't seem to stop anyone over the
> past decade from having standards-based filesystem operations.)

Indeed not, because there is a nontrivial portable *subset* of filenames
you can use when you have to specify filenames for a program you intend
to keep highly portable.

>                                                                 On
> MSDOS the file name is restricted to <=8 characters followed by an
> optional <=3 characters preceded by a period.  No other operating system
> has that restriction.

MSDOS no longer has that restriction. At the time the C Standard was
developed, *several* operating systems had similar restrictions. That's
why the C Standard requires only 6.2 names, which are even more restrictive
than the 8.3 of DOS and its predecessors.

>                        The System V file system has a maximum
> 14-character length limit per component.  Other operating systems had a
> 32-character length limit per component.  Each operating system has its
> own peculiar restrictions regarding which characters are permitted
> versus prohibited and which one must be escaped to avoid special
> meaning, as well as how they must be escaped.

Indeed, just as each computer has its own restriction on the number of
bits in a long double mantissa. The idea of standards is to define a
reasonable common denominator that you can rely on across systems. Where
that RCD is not yet clear, standardization is premature.

>                                              Wide-character
> filesystem-identifiers continue this multi-decade-practiced
> hardly-portable tradition.

And add to the problem in ways that have no widely accepted solutions.

> > (2) there are no agreed upon
> > semantics for conversion between wide-character and narrow-character
> > names for file systems which do not support wide-character name,
>
>    Instead of using this as an excuse for a scorched-earth approach to
> the topic,

Who's scorching earth? We went to lots of meetings and discussed this
point on more than one occasion. Didn't see you there.

>           then simply don't convert between the two in the C++
> language.  Char-based filesystem-identifiers should stay char-based
> without conversion.  Wide-character-based filesystem-identifiers should
> stay wide-character-based without conversion.

I got that *you* find this an acceptable solution. Others have not,
so far.

>    If an operating system is unfit to support the optional
> wide-character filesystem identifiers, then that option in C++0x would
> be turned off on that platform.  There would be some #include <options>
> header which would indicate which optional portions of C++0x are present
> versus absent on a particular platform.

#include <options>
...
#if AINT_GOT_NO_WIDE_CHARACTER_FILE_NAMES
    <<do what?>>

> > (3) even the committee members from the international community most
> > interested in wide-character names don't think that wide-character
> > names are a good idea in the context of the standard library.
>
>    Why?  Are they fundamentally opposed to having file names in
> non-ISO-8859 languages?  Do they prefer multiple proprietary approaches
> instead of a single C++0x-based approach?  Or do they prefer having no
> approach at all as a way of suppressing wide-character
> filesystem-identifiers via starvation of Unicode?

You've cited three less-than-admirable possible reasons. Some people may
share some or all of those views. Others may have other reasons. One
way to find out is to make a proposal and defend it. Perhaps you'll find
that people will like it. Or perhaps you'll find that people will oppose
it for reasons that may even border on the rational. One way to find out.

> > Wide-character file names would provide an illusion of portability
> > where portability does not in fact exist. Behavior would be completely
> > different on operating systems (Windows, for example) that support
> > wide-character names, than on systems which don't.
>
>    The C++0x option to support wide-character filesystem-identifiers
> should be turned off for an operating system which does not support
> wide character names.  The portability of wide-character
> filesystem-identifiers is only among operating systems which support
> that feature in the first place. Likewise with any optional portion of
> C++0x: the portability is only among platforms which support that
> optionality.  A platform which does not support the optionality does not
> participate in the portability.

Again, that's one way to get a weak form of standardization. Others
prefer to define a feature in such a way that it has a meaningful
implementation on most or all systems that currently support C++.
(Vendors who are excluded from full conformance naturally object to
the loss of cachet. Not to mention contracts.)

>    I will grant you that some operating system might permit only one
> Unicode encoding (e.g., UTF-8) whereas another operating system might
> permit only a different Unicode encoding (e.g., UTF-32).  Handling that
> difference between Unicode encodings is not merely a deficiency of some
> wide-character filesystem-identifier proposal, but rather a gross
> deficiency in C++ (and C) in general.  For an interesting attempt at
> solving aspects of this problem, take a look at IBM's ICU library.

Actually, we worried more about the opposite problem. C++ (and C) in
general let you work with a number of different wide-character and
multibyte encodings, all within the same program. IBM's ICU library is
just one, non-standard, approach to managing part of the complexity
that causes. The unsolved problem is determining how best to specify
which of a host of possible encodings, or interconversions, to apply
in a given context. This rather gross deficiency stems from a
worldwide lack of experience; it is not specifically a failure of
Standard C or Standard C++. They're just the commonest tractable
host languages.

> > So until someone steps forward with an acceptable portable
> > specification for wide-character file names, the idea is a
> > non-starter.
>
>    Because not all operating systems support multithreaded
> programming,---using the line of reasoning presented along this
> thread---MT support in C++0x would likewise merely "provide an illusion
> of portability where portability does not in fact exist".  Will MT be a
> "nonstarter" too for similar reasons?

Depends on how it's specified. Boost has a C++ package, which Dinkumware
has extended to include C as well, that lets us write the kind of
portable multithreaded code we find commercially useful. It helps
that MT is particularly easy to implement in a system that supports
at most one thread of control...

People can debate whether the Boost approach is suitable for adding
to the C++ Standard, but it's clearly a candidate.

>                                      If not, then please explain how
> operating systems which support MT (versus those which do not) permit an
> entirely different warmly-accommodating attitude, whereas operating
> systems which support wide-character filesystem-identifiers (versus
> those which do not) beget a far harsher attitude.

The warm accommodation that I've seen toward the Boost proposal I
can attribute to the fact that a) it's well thought out, b) it has
been implemented, c) people have been exercising it, d) it has a
precise enough description to serve as a proposal, and e) its
sponsors have attended C++ meetings.

You encounter a far harsher attitude when you misstate facts and
use such pejorative terms as ``scorched earth'', ``multiple
proprietary approaches'', ``starvation of Unicode'', and ``gross
deficiency''.

A wink is as good as a nod.

>    C++0x standardization must face reality that not every operating
> system will be able to support every topic in C++0x.  In addition to a
> required base (e.g., C++98 plus language-layer features), C++0x will
> need to have some sort of optionality mechanism as oft-practiced in ITU
> and other ISO/IEC standards (e.g., datacom & telecom standards), so that
> certain features are either 1) standardized if supportable on a
> particular platform or 2) absent from that platform.

That is indeed one approach. It has been used far more widely in C99
than in C90. And C99 has been criticized in some circles for walking
that path. (We used to laugh at the Cobol standard that had twelve
optional modules, giving a total of 4,096 possible dialects.)

Still another approach is *not* to standardize something until there's
enough existing practice to give clear guidance.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com



---

Author: Alexander Terekhov <terekhov@web.de>
Date: Tue, 2 Jul 2002 12:29:05 CST Raw View

Daniel Miller wrote:
>
> Beman Dawes wrote:
>
> [...snip prior poster's partial postings...]
> > The C++ committee's Library Working Group just discussed the issue of
> > wide-character file names again in some detail this past Spring, both
> > on the committee's library reflector and at the Curacao meeting.
> >
> > The consensus was clear that (1) file names based on types other than
> > char are extremely non-portable,
>
>    File names based on a sequence of octets are extremely nonportable
> too!  (But that unfortunate fact doesn't seem to stop anyone over the
> past decade from having standards-based filesystem operations.)  On
> MSDOS the file name is restricted to <=8 characters followed by an
> optional <=3 characters preceded by a period.  No other operating system
> has that restriction.  The System V file system has a maximum
> 14-character length limit per component.  Other operating systems had a
> 32-character length limit per component.  Each operating system has its
> own peculiar restrictions regarding which characters are permitted
> versus prohibited and which one must be escaped to avoid special
> meaning, as well as how they must be escaped.

http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap03.html#tag_03_170

"....
 Filename Portability

 Filenames should be constructed from the portable filename character set
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 because the use of other characters can be confusing or ambiguous in
 certain contexts. (For example, the use of a colon ( ':' ) in a pathname
 could cause ambiguity if that pathname were included in a PATH definition.)
 ...."

http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap03.html#tag_03_275

"....
 Portable Character Set

 The collection of characters that are required to be present in all locales
 supported by conforming systems.

 Note:
     The Portable Character Set is defined in detail in Portable Character Set.

 This term is contrasted against the smaller portable filename character set;
 see also Portable Filename Character Set.
 ...."

http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap03.html#tag_03_276

"....
 Portable Filename Character Set
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The set of characters from which portable filenames are constructed.

     A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
     a b c d e f g h i j k l m n o p q r s t u v w x y z
     0 1 2 3 4 5 6 7 8 9 . _ -


 The last three characters are the period, underscore, and hyphen
 characters, respectively.
 ...."
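
A check against the set quoted above can be sketched in a few lines of
C++.  This is only an illustration: the helper name is hypothetical, and
the decision to reject empty names is an assumption of the sketch, not
something taken from the quoted definition.

```cpp
#include <string>

// Returns true when every character of `name` belongs to the portable
// filename character set quoted above: A-Z, a-z, 0-9, '.', '_' and '-'.
// Hypothetical helper; empty names are rejected by assumption.
bool is_portable_filename(const std::string& name) {
    static const char allowed[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789._-";
    if (name.empty()) {
        return false;
    }
    // find_first_not_of reports the first character outside the set, if any.
    return name.find_first_not_of(allowed) == std::string::npos;
}
```

The check deliberately ignores length limits and path separators, which
vary between systems as discussed earlier in the thread.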

> Wide-character
> filesystem-identifiers continue this multi-decade-practiced
> hardly-portable tradition.

http://www.opengroup.org/onlinepubs/007904975/xrat/xbd_chap03.html#tag_01_03_00_46
(Portable Filename Character Set)

http://www.opengroup.org/onlinepubs/007904975/xrat/xbd_chap04.html#tag_01_04_06
(Filenames)

http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap06.html
(Character Set)

http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap07.html
(Locale)

regards,
alexander.

---

Author: Julian Smith <jules@REMOVETHIS.op59.net>
Date: Fri, 28 Jun 2002 17:17:36 GMT Raw View

On Thu, 27 Jun 2002 20:27:24 GMT
Michiel.Salters@cmg.nl (Michiel Salters) wrote:

(Apologies for the length of this post. Hopefully some people will find it interesting enough to read to the end.)

> Pete Becker <petebecker@acm.org> wrote in message news:<3D1A53C3.83C1F498@acm.org>...

> > No, they weren't missed. Those were deliberate decisions, and the
> > reasons for those decisions have been explained several times in this
> > thread, and many times in the past.
>
> I've just reread the thread, and I'm still undecided about the wchar_t
> exceptions. Clearly we want to keep a single hierarchy, but having only
> chars is starting to hurt. Up to 1999 the Netherlands had no problems
> with ASCII, but today throwing a domain_error with a Euro-sign in the
> what() is quite reasonable. That's not always present in the narrow
> character set (must be ISO8859-15, -1 is still common). A wchar_t
> interface could help here.

I think the problem here is that C++ interfaces aren't capable of doing what people want:

The C++ standards people invented something that, at the time, was the best interface they could come up with for exception objects. This interface has a virtual `const char* std::exception::what() const' method. Nowadays, people are asking to change the std::exception interface so that it can return wide-character strings. We still want old code to recompile and run as before though.

I don't think there is any nice solution to this sort of problem in C++. Which is a shame because, even with the best intentions, interfaces often need to change after they have been used in other code. The approach outlined by Michiel Salters (quoted below) works by adding stuff to std::exception, stuff that will not be used by code that isn't interested in wide character error messages. Legend tells of linkers that can miss out virtual functions that are never called, but a lot of systems will end up linking in the `const wchar_t* std::exception::wwhat() const' methods every time.

Even if your linker is clever enough to do this optimisation, the interface is poor from an aesthetic point of view - it has methods for everything that may be required of it, irrespective of whether they will ever be used.

For example, the discussion so far has focused on just two output formats - char* and wchar_t*, but there are many more ways of outputting the information. Maybe the application will open a dialogue box showing the details of the error, or write to a log file, or send an email. It may be required to output information in English or Japanese. It could draw a diagram. It could do more than one of these things.

So how can we do this? Surely not by defining a pure virtual function for each possible output method:

    class   exception
    {
    public:
        virtual const char*     what() const = 0;
        virtual const wchar_t*  wwhat() const = 0;
        virtual dialoguebox&    dbox() const = 0;
        virtual dialoguebox&    dbox_japanese() const = 0;
        virtual void            email_person_who_supplied_inputdata() const = 0;
        virtual void            email_boss_of_person_who_supplied_inputdata() const = 0;
        ...
    };

I think it's interesting to view the problem differently, in a non-object-oriented way (gasp!).

We have to defer the decision of how to represent the exception data to the user, because usually the code that calls throw doesn't know how the exception will be communicated to the user.

So, forget about virtual methods. Treat an exception object (the thing that is thrown) as /data/. No code. No virtual functions. Just plain typed data that encodes whatever information is needed to represent what went wrong. That's the most we can do.

How do we output the exception information in an appropriate format? Well, unfortunately, it isn't currently possible to do this systematically, because the solution requires Multimethods. We want to say `output this std::exception object as Japanese text to this std::wstring', or `output this std::exception object as English text to this std::string', or `output this std::exception object in as complete a way as possible to mail to the developers of the programme as part of a bug report'.

With multimethods, it's pretty easy. We can define a multimethod function for our particular application for the Japanese market, looking like:

    std::wstring    OutputJapanese( virtual std::exception& e);
    // Using Stroustrup's suggested multimethod syntax in the D&E.

- and then write implementations of this function specialised for each class that can be thrown by our application. For example:

    std::wstring    OutputJapanese_( std::runtime_error& e) { ...}
    std::wstring    OutputJapanese_( CustomException& e)    { ...}
    std::wstring    OutputJapanese_( std::logic_error& e)   { ...}
    // Using Cmm's multimethod implementation syntax

Exception handlers can then output Japanese text very easily:

    catch ( std::exception& e)
    {   std::wcout << OutputJapanese( e);
        // calls whatever is the best matching OutputJapanese_() function.
        ...
    }

What's happening is that we're effectively adding to the std::exception interface, without modifying the std::exception class itself. Various people (Scott Meyers, Andrei Alexandrescu) have written about the equivalence of global functions and class methods. Here, the OutputJapanese() virtual function is doing pretty much the same thing as a `virtual std::wstring std::exception::OutputJapanese()' would do. The crucial difference is that we have added it as part of our application without changing the std::exception class itself.

Despite using the term Multimethods here, OutputJapanese() only has one virtual parameter. Multimethods is used simply as a way of effectively adding conventional virtual functions to an existing class.

However, support for more than one virtual parameter can get us more good stuff: multilingual support for outputting errors. Instead of defining an OutputJapanese method, we define a generic OutputText() function, which also takes a virtual parameter to identify the language to use:

    std::wstring    OutputText( virtual std::exception& e, virtual Language& l);

`Language' is an empty base class. We can derive new classes from it, such as English, French, Japanese etc.

We now write implementations of the OutputText function for all the combinations of error type and language that will arise in the application:

    std::wstring    OutputText_( std::runtime_error& e, English& l) {...}
    std::wstring    OutputText_( CustomException& e, English& l) {...}

    std::wstring    OutputText_( std::runtime_error& e, Japanese& l) {...}
    std::wstring    OutputText_( CustomException& e, Japanese& l) {...}

This may look like a lot of work, and indeed it is. But it's no more work than we'd have to do anyway - each error type/language combination needs separate handling somehow. Often it's in the form of message IDs and message templates; using individual functions may be more verbose, but it is much more powerful, and there is plenty of scope for the functions to share common code.

Our application can have a global `Language* default_language;' that refers to whatever is the currently configured language, and do:

    catch ( std::exception& e)
    {   std::wcout << OutputText( e, *default_language);
        // Calls whatever is the best matching OutputText_() function.
        ...
    }
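
For readers without a multimethod-capable compiler, the two-parameter dispatch can be roughly approximated in standard C++ with a lookup table keyed on the dynamic types of both arguments. The following is only a sketch under stated assumptions: it is not Cmm's implementation, it uses C++11 library facilities (std::type_index, std::function, lambdas), the names register_formatter, formatters and install_example_formatters are hypothetical, and it matches exact types only rather than finding the best match. The Japanese message is a placeholder string.

```cpp
#include <exception>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Language hierarchy as described above; polymorphic so typeid sees the dynamic type.
struct Language { virtual ~Language() {} };
struct English  : Language {};
struct Japanese : Language {};

typedef std::function<std::wstring(const std::exception&, const Language&)> Formatter;
typedef std::pair<std::type_index, std::type_index> Key;

// Global table mapping (exception type, language type) to a formatter.
std::map<Key, Formatter>& formatters() {
    static std::map<Key, Formatter> table;
    return table;
}

// Registers a formatter for the exact pair of types E and L (hypothetical helper).
template <class E, class L>
void register_formatter(Formatter f) {
    formatters()[Key(std::type_index(typeid(E)), std::type_index(typeid(L)))] = f;
}

// Dispatches on the dynamic types of both arguments. Exact match only:
// a derived exception type does not fall back to a formatter for its base.
std::wstring OutputText(const std::exception& e, const Language& l) {
    const Key key(std::type_index(typeid(e)), std::type_index(typeid(l)));
    std::map<Key, Formatter>::const_iterator it = formatters().find(key);
    if (it == formatters().end()) {
        return L"(no formatter registered)";
    }
    return it->second(e, l);
}

// Example registrations; the message texts are placeholders.
void install_example_formatters() {
    register_formatter<std::runtime_error, English>(
        [](const std::exception&, const Language&) {
            return std::wstring(L"Runtime error");
        });
    register_formatter<std::runtime_error, Japanese>(
        [](const std::exception&, const Language&) {
            return std::wstring(L"[ja] runtime error");
        });
}
```

The missing best-match step is exactly the part a real multimethod implementation supplies; with this table, every type/language pair has to be registered explicitly.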

>
> I think it's reasonable to require that in C++0x,
> * the exception ctors are overloaded for wchar_t*,
> * what() is supplemented by wwhat() or a similar function returning wchar_t,
> * calling what() (to get a char*) when the std::exception was initialized
>   with a wchar_t* returns an implementation-defined string
>   ( returning "No mapping from wchar_t* to char*" is legal, like returning
>     all character values modulo UCHAR_MAX )
> * calling wwhat()( to get a wchar_t* )when the exception was initialized with a
>   char* uses an implementation-defined reversible mapping to convert
>   the string.
> * exceptions thrown by the implementation (like std::bad_alloc) are initialized
>   with a char* (like in C++98).
>
> The main issue here is that the implementation-defined reversible
> mapping from char* to wchar_t* takes memory, which may be in short supply.
> However, the wchar_t* wwhat() return value for the predefined exceptions
> (like std::bad_alloc) could be statically allocated while other exceptions
> can safely throw std::bad_alloc during construction. ( IIRC, it is
> legal to throw B; from A::A in a throw A() statement ).

To sum up, I think that the issue of adding a std::exception::wwhat() function is a particular case of a much larger problem, whose solution requires a substantial addition to C++ - multimethods.

I've done as much as I can to encourage people to consider multimethods for incorporation into C++ (including writing a working implementation - see http://www.op59.net/cmm/readme.html), but so far there has been no interest from anyone connected with the standards committee.

This being the case, I think that any modification to std::exception to output wchar_t* will turn out to be a hack for a problem that is basically not solvable in the language as it currently exists.

- Julian

--
http://www.op59.net

---

Author: pdimov@mmltd.net (Peter Dimov)
Date: Fri, 28 Jun 2002 17:18:03 GMT Raw View

rmaddox@isicns.com (Randy Maddox) wrote in message news:<8c8b368d.0206270723.1767efa0@posting.google.com>...
>
> Again, the mere fact that a particular decision was made at some point
> in the past is not a valid argument that said decision must remain the
> same for all time.  And I am scarcely claiming American jingoism,
> merely stating that, upon careful review under present conditions,
> there may be some basis for revisiting a few past decisions.

Fine, but why don't you let the non-ASCII developers speak for
themselves? They surely understand the problems better.

Let me address the 'wchar_t const * what()' problem (I've done so in
the past.)

First, what gets displayed to the user is not a programmer's decision.
It's a designer (sometimes marketing) decision, and it's definitely
not low-level library decision. This in practice means that what()
does not contain the string that the user sees, merely some kind of
reference to it, so it can be looked up in some kind of message
catalog.

Second, the language of the what() string is typically not a low-level
library decision. In order to supply the proper wide string to the
exception constructor, the low-level library needs to somehow obtain
the language that the catch() site will later use to present the
string to the user.

Third, the catch site may need to display a dialog box with the
exception description that has a 'change language' button/combo box.

Fourth, the catch site may need to obtain two different descriptions
for the same exception, if the log file is in English but the UI is in
Bulgarian, for example.
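
A minimal sketch of that arrangement, in which what() carries only a
key and the catch site chooses the language.  Everything here is an
assumption made for illustration: the Catalog type, the function names,
the key "disk.full", and the placeholder standing in for a real
Bulgarian message.

```cpp
#include <exception>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Maps (language code, message key) to the text shown to the user.
typedef std::map<std::pair<std::string, std::string>, std::wstring> Catalog;

// Builds a tiny example catalog; a real one would be loaded from resources.
Catalog make_example_catalog() {
    Catalog c;
    c[std::make_pair(std::string("en"), std::string("disk.full"))] = L"The disk is full.";
    c[std::make_pair(std::string("bg"), std::string("disk.full"))] = L"[bg] disk full";  // placeholder text
    return c;
}

// Looks up the description of `e` in language `lang`, using what() as the key.
// Falls back to the widened key itself when no entry exists (keys assumed ASCII).
std::wstring describe(const Catalog& c, const std::string& lang, const std::exception& e) {
    const std::string key = e.what();
    Catalog::const_iterator it = c.find(std::make_pair(lang, key));
    if (it != c.end()) {
        return it->second;
    }
    return std::wstring(key.begin(), key.end());
}
```

The same exception can then be described twice at the catch site, once
with "en" for the log file and once with "bg" for the UI, without the
throwing library knowing about either language.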

And of course the standard what() strings rarely contain anything
useful. :-)

> I would also note that Windows NT, since version 3.5.2, back in 1996,
> has and does provide support for wide character file names.  NT
> derivatives such as Windows 2000, which is pretty widely used, also
> provide this support.  And please correct me if I'm wrong here, but I
> also seem to recall reading something about wide character file name
> support in Linux.  I was not able to locate the reference, so I cannot
> state that as a fact.
>
> In any case, the fact that a major vendor like Microsoft saw a need
> for wide character file name support, and went through the trouble to
> build it into their OS, would seem to indicate that I am not alone in
> thinking this.

I wouldn't say that wchar_t support for file names is a bad idea, or
useless, but it's a bit less useful than it seems. :-)

---

Author: "Ken Shaw" <ken@DIE_SPAM_DIE_compinnovations.com>
Date: Fri, 28 Jun 2002 17:26:17 GMT Raw View

"Daniel Miller" <daniel.miller@tellabs.com> wrote in message
news:3D19DD33.9030306@tellabs.com...
> Pete Becker wrote:
>
> > Daniel Miller wrote:
> >
> >>   If type-parameterized catch blocks "shouldn't be very hard to
> >>implement", please describe exactly how the compiler would arrive at
> >>exactly which set of types the type-parameterized catch block would be
> >>specialized at compile-time (or exactly how run-time specialization of
> >>templates would work).
> >>
> >>
> >
> > Magic linkers can do this, of course.
>
>
>    Such link-time-based type-parameterized catch blocks would move the
> C++ community from *permitting* such "magic linkers" as QoI improvements
> to *demanding* such "magic linkers" as minimum acceptable
> implementation.  Is C++0x standardization prepared to make the
> aggressive leap of requiring *every* linker for C++0x to be such a
> "magic linker"?
>
>    If the answer to that question is "no" (i.e., if the answer is C++0x
> should not place such a burden for "magic linkers" on every platform),
> then my request for an exact & thorough explanation of a reference
> implementation for compile-time or run-time implementation of
> type-parameterized catch blocks still stands unaddressed.  If
> type-parameterized catch blocks "shouldn't be very hard to implement",
> please describe exactly how the compiler would arrive at
> exactly which set of types the type-parameterized catch block would
> specialize at compile-time (or exactly how run-time specialization of
> templates would work) *without* relying on some optional/QoI "magic
> linker".
>

A reasonably simple solution for a standard-conforming compiler is to
instantiate the catch block for every type explicitly thrown inside the
corresponding try block, plus the types named in the exception
specifications of all functions called inside the try block.

If this isn't possible would someone at least tell me why not?

Ken Shaw




---

Author: kanze@gabi-soft.de (James Kanze)
Date: Fri, 28 Jun 2002 21:40:59 GMT Raw View

Michiel.Salters@cmg.nl (Michiel Salters) wrote in message
news:<cefd6cde.0206270231.53c5983c@posting.google.com>...
> Pete Becker <petebecker@acm.org> wrote in message
> news:<3D1A53C3.83C1F498@acm.org>...

> > Randy Maddox wrote:

> > > All I am saying is that the effort to support
> > > internationalization/localization in C++ missed a few small
> > > points,

> > No, they weren't missed. Those were deliberate decisions, and the
> > reasons for those decisions have been explained several times in
> > this thread, and many times in the past.

> I've just reread the thread, and I'm still undecided about the
> wchar_t exceptions. Clearly we want to keep a single hierarchy, but
> having only chars is starting to hurt. Up to 1999 the Netherlands
> had no problems with ASCII, but today throwing a domain_error with a
> Euro-sign in the what() is quite reasonable. That's not always
> present in the narrow character set (must be ISO8859-15, -1 is still
> common). A wchar_t interface could help here.

The real question is: what do the strings in the exception messages
represent?  From what you say, you are using them directly as error
messages (for the user).  I'm not sure that this is a good idea, and
it doesn't fit well with internationalization (support for multiple
locales -- as opposed to support for just one non-American locale).
My tendency is to consider the strings simply as "keys" which are
readable for debugging purposes (and not much else).  For display to a
user, I would normally use them to select a message (using the gettext
interface in Unix).
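
A minimal sketch of that "key" approach, for illustration only. The
names Catalogue, localized_message and demo_lookup are hypothetical, and
a std::map stands in for whatever message catalogue (gettext or
otherwise) a real program would use:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical catalogue for one locale: exception key -> display text.
// A real program would load this per locale instead of building it inline.
typedef std::map<std::string, std::wstring> Catalogue;

// Returns the text registered for e.what(), or a generic fallback
// when the catalogue has no entry for that key.
std::wstring localized_message(const Catalogue& cat, const std::exception& e)
{
    Catalogue::const_iterator it = cat.find(e.what());
    return it != cat.end() ? it->second : std::wstring(L"Unknown error");
}

// The thrown string is only a narrow, debug-readable key; the Euro sign
// lives in the wide-character catalogue entry.
std::wstring demo_lookup()
{
    Catalogue cat;
    cat["price.out_of_range"] = L"Amount must be between \u20AC1 and \u20AC100";

    std::wstring result;
    try {
        throw std::domain_error("price.out_of_range");
    } catch (const std::exception& e) {
        result = localized_message(cat, e);
    }
    assert(!result.empty());
    return result;
}
```

With this layout the exception classes themselves never need to carry
wide characters; only the display layer does.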

Other than that, I find your proposal rather well thought out.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)69 63198627

---

Author: bdawes@acm.org (Beman Dawes)
Date: Sat, 29 Jun 2002 20:57:16 GMT Raw View

rmaddox@isicns.com (Randy Maddox) wrote in message news:<8c8b368d.0206270723.1767efa0@posting.google.com>...
> Pete Becker <petebecker@acm.org> wrote in message news:<3D1A53C3.83C1F498@acm.org>...
> > Randy Maddox wrote:
> > >
> > > All I am saying is that the effort to support
> > > internationalization/localization in C++ missed a few small points,
> >
> >
> > No, they weren't missed. Those were deliberate decisions, and the
> > reasons for those decisions have been explained several times in this
> > thread, and many times in the past.
>
> Just because a decision is deliberate is no indication that decision
> will remain valid for all time.  Things change, and sometimes past
> decisions need to be revisited.  So it goes.

The C++ committee's Library Working Group just discussed the issue of
wide-character file names again in some detail this past Spring, both
on the committee's library reflector and at the Curacao meeting.

The consensus was clear that (1) file names based on types other than
char are extremely non-portable, (2) there are no agreed upon
semantics for conversion between wide-character and narrow-character
names for file systems which do not support wide-character names, and
(3) even the committee members from the international community most
interested in wide-character names don't think that wide-character
names are a good idea in the context of the standard library.

Wide-character file names would provide an illusion of portability
where portability does not in fact exist. Behavior would be completely
different on operating systems (Windows, for example) that support
wide-character names, than on systems which don't. Providing
functionality that appears to provide portability but in fact delivers
only system-specific behavior is highly undesirable.

So until someone steps forward with an acceptable portable
specification for wide-character file names, the idea is a
non-starter.

--Beman Dawes

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Mon, 1 Jul 2002 16:57:23 GMT Raw View

Julian Smith <jules@REMOVETHIS.op59.net> wrote in message news:<20020628113056.531fee56.jules@REMOVETHIS.op59.net>...
> On Thu, 27 Jun 2002 20:27:24 GMT
> Michiel.Salters@cmg.nl (Michiel Salters) wrote:
>
> (Apologies for the length of this post. Hopefully some people will find it interesting enough to read to the end.)
>
> > Pete Becker <petebecker@acm.org> wrote in message news:<3D1A53C3.83C1F498@acm.org>...
>
> > > No, they weren't missed. Those were deliberate decisions, and the
> > > reasons for those decisions have been explained several times in this
> > > thread, and many times in the past.
> >
> > I've just reread the thread, and I'm still undecided about the wchar_t
> > exceptions. Clearly we want to keep a single hierarchy, but having only
> > chars is starting to hurt. Up to 1999 the Netherlands had no problems
> > with ASCII, but today throwing a domain_error with a Euro-sign in the
> > what() is quite reasonable. That's not always present in the narrow
> > character set (must be ISO8859-15, -1 is still common). A wchar_t
> > interface could help here.
>
> I think the problem here is that C++ interfaces aren't capable of doing what people want:
>
> The C++ standards people invented something that, at the time, was the best interface they could come up with for exception objects. This interface has a virtual `const char* std::exception::what() const' method. Nowadays, people are asking to change the std::exception interface so that it can return wide-character strings. We still want old code to recompile and run as before though.

I personally am most definitely NOT asking to modify in any way the
interface provided by std::exception.  Instead, what I am suggesting
is that the same technique used in the std::string and std::wstring
classes be applied to the std::exception classes.  That is, just as
std::string and std::wstring are merely typedefs of std::basic_string,
I am proposing a new class, std::basic_exception that would,
partially, look like:

  template <typename Char_t>
  class basic_exception
  {
  public:

    explicit basic_exception(const basic_string<Char_t>& what_arg);

    virtual const Char_t * what() const;

    ...    // other members
  };

  typedef basic_exception<char> exception;
  typedef basic_exception<wchar_t> wexception;

The entire std::exception hierarchy would then be EXACTLY as it is now
with no change required to any existing code.  Those who need wide
character exceptions could use them.  Those who don't could be
blissfully unaware of their existence.  And, just as with string and
wstring, we have no need to conjoin functionality that does not go
together.  Instead we have separate class hierarchies to address
separate needs.  Given which the remainder of the points in this
original posting become moot.  There is no need to expand the
interface provided by std::exception, no multimethods, no magic
linkers, etc.  All of that imagined complexity is just another red
herring in this discussion.
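
For readers who want to try the idea outside the standard library, here
is a rough, compilable sketch. Everything sits in a hypothetical
namespace sketch rather than std, the member what_ and the helper
which_handler are illustrative names, and the derived classes of the
real hierarchy (logic_error and friends) are not modelled:

```cpp
#include <cassert>
#include <string>

namespace sketch {  // hypothetical; nothing here is part of namespace std

template <typename CharT>
class basic_exception
{
public:
    explicit basic_exception(const std::basic_string<CharT>& what_arg)
        : what_(what_arg) {}
    virtual ~basic_exception() {}

    // The pointer stays valid for the lifetime of this exception object.
    virtual const CharT* what() const { return what_.c_str(); }

private:
    std::basic_string<CharT> what_;
};

typedef basic_exception<char>    exception;
typedef basic_exception<wchar_t> wexception;

// Returns 1 if the narrow handler ran, 2 if the wide handler ran.
// The two instantiations are unrelated types, so each needs its own handler.
template <typename E>
int which_handler(const E& e)
{
    try {
        throw e;
    } catch (const exception&) {
        return 1;
    } catch (const wexception&) {
        return 2;
    }
    return 0;  // not reached for the two typedefs above
}

// Usage: narrow and wide exceptions travel through separate handlers.
inline void demo()
{
    assert(which_handler(exception("narrow")) == 1);
    assert(which_handler(wexception(L"wide")) == 2);
}

}  // namespace sketch
```

The sketch also shows the flip side: code that catches only the narrow
typedef will not see a wexception, because the two instantiations share
no base class.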

Randy.

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Mon, 1 Jul 2002 22:09:26 GMT Raw View

Let me see if I can summarize the discussion to date:

I made a suggestion that it might be a good thing for the standard C++
library to fix a few small spots where support for only narrow characters
is hardwired.  Specifically this suggestion addressed exception what_arg
strings, file names, and message catalog names.  I was careful to suggest
ways these changes could be made so as to have no impact on existing code,
and to require no change in programming style or habits of any developer who
did not require the additional functionality.

Apparently I did not cover that last point clearly enough because the greater
portion of initial objections centered around resistance to breaking existing
code.  Hopefully I have been clear enough since then and those objections do
appear to have been assuaged.

Next there was a sideshow involving templated catch statements, which on the
surface have a certain appeal, but which on closer inspection seem to be not
possible to implement.  So it goes.  In any case, that part of the discussion
has no bearing on the original suggestion.

There was also sentiment that the suggestion was unnecessary if no operating
system provided support for wide character file names.  As it turns out,
however, at least Windows NT, and its derivatives such as Windows 2000, have
and do provide such support.  So it seems there is a market perception that
wide character file names actually are useful.

Additional support appeared in the form of a posting from Edward Diener who
indicated that he too had run into issues with needing wide character support
in exception what_arg strings.  And finally there was a posting from Michiel
Salters indicating that some actual problems had been encountered trying to
use the Euro sign in exception what_arg strings.

Despite strong and vocal opposition from Pete Becker, whose specific
objections I still do not quite clearly grasp despite his protestations that
all has been explained, it appears that there is some support for this
suggestion, and that there is indeed a market need to address the issue.

Perhaps the C++ library group should take a look at this and decide whether
or not to pursue it further.

Randy.

---

Author: Pete Becker <petebecker@acm.org>
Date: Tue, 25 Jun 2002 21:34:18 GMT Raw View

Randy Maddox wrote:
>
> Pete Becker <petebecker@acm.org> wrote in message news:<3D146B7D.4DDB23C6@acm.org>...
>
> > Implement it, get some experience with it, write it up, and propose it.
> > Perhaps you've had some brilliant insight that has escaped the rest of
> > us.
>
> I'm not
> arguing that implementing these suggestions and trying them out is a
> bad idea.  Au contraire.  It is indeed an excellent idea.  However, I
> don't see how a non-stdlib-implementor can do much with them.  Perhaps
> you would like to take a shot at it?

In case you haven't noticed, I think the ideas proposed in this thread
are not useful. While I appreciate your willingness to volunteer my time
for what I think is a waste of time, I must decline. I have more
productive work to do.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

---

Author: Michiel.Salters@cmg.nl (Michiel Salters)
Date: Tue, 25 Jun 2002 21:34:46 GMT Raw View

Hyman Rosen <hyrosen@mail.com> wrote in message news:<3D180171.4080109@mail.com>...
> dmeyer@dmeyer.net wrote:
> > You're not kidding.
>
> Yes he is, you just didn't get the joke. After all
> the discussion of export, the logical next step is
> to take template instantiation to run time!

Which, despite your assumptions, isn't a joke either. Todd
Veldhuizen (at Indiana University, IIRC) has been working
on this.

"There are more things at compiletime and runtime, Horatio
 Than are dreamt of in your philosophy. " :-)



Regards,
--
Michiel Salters

---

Author: Daniel Miller <daniel.miller@tellabs.com>
Date: Tue, 25 Jun 2002 21:35:28 GMT Raw View

Pete Becker wrote:

> Allan W wrote:
>
>>Code that currently catches std::logic_error would now have to catch
>>both std::logic_error<char> and std::logic_error<wchar_t>. And what
>>about std::logic_error<int> -- unlikely, perhaps, but not impossible.
>>
>>
>
> But all that's needed for that is templatized catch clauses. Someone
> should propose this:
>
> try {
>     // code here
>     }
> template <class T> catch(const std::logic_error<T>&)
>     {
>     }

   As stated, this idea has not addressed the quantification (as in
UG/UI/EG/EI quantification theory in logic) of which types T shall take
on and thus for which types the compiler will specialize such a
T-parameterized catch block.

   [ UI OF THE UNIVERSE OF ALL OF THAT COMPILATION-UNIT'S TYPES =
DISADVANTAGEOUS BLOAT ]

   As stated, this would require run-time specialization of that
type-parameterized catch block based on which type is thrown at
run-time.  In C++ there is no overt representation of what Alexander
Stepanov calls a "concept".  Thus the compiler does not know which type
(nor which set of types) can play the role of T other than to infer that
T may take on some subset of the, say, 891 types which have been
previously declared in that compilation unit.  For the 891
afore-declared types in this compilation unit should the compiler
attempt to generate up to 891 catch blocks at compile-time, silently
discarding those whose syntactic interfaces do not conform to the
interface required by the concept T?

   [ UI OF ONE SET/CONCEPT-CATEGORY OF TYPES = LACK OF EXPRESSIVITY IN
C++98 ]

   Or more conservatively, should the compiler specialize that catch
block for all types which have been declared to be of concept-category
"character set"?  Oops, C++ has no overt concepts; C++ has no overt
declaration of sets of types, each of whose interface satisfies a
concept-category.

   [ EI = VIOLATION OF SINGLE COMPILATION-UNIT COMPILATION ]

   Or even more conservatively, should the compiler only existentially
specialize the actual types thrown in actual code in the corresponding
try block?  Oops, C++ cannot infer the actual types thrown by inspecting
a single compilation unit nor which types will be thrown by
function-templates or class-templates.

   The "someone" who would propose type-parameterized catch blocks would
be biting off some challenging homework.  That homework would involve
one or more of the following:  some for-alls, some there-exists, and
some set theory.

   [ GLOSSARY ]
   UI = universal instantiation
   UG = universal generalization
   EI = existential instantiation
   EG = existential generalization

> The compiler should turn this into code that catches any object whose
> type is an instance of this new logic_error template. That shouldn't be
> very hard to implement. And it would be so useful...

   If type-parameterized catch blocks "shouldn't be very hard to
implement", please describe exactly how the compiler would arrive at
exactly which set of types the type-parameterized catch block would be
specialized at compile-time (or exactly how run-time specialization of
templates would work).
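
For comparison, the effect can be approximated in C++98 without the
compiler having to decide any set of T at the catch site, provided every
instantiation derives from a non-template base. The sketch below assumes
such a layout; logic_error_base and caught_by_base are hypothetical
names, and none of this is part of any proposal in this thread:

```cpp
#include <cassert>
#include <string>

namespace sketch {  // hypothetical; unrelated to std::logic_error

// Non-template root shared by every instantiation of the template below.
class logic_error_base
{
public:
    virtual ~logic_error_base() {}
};

template <typename CharT>
class logic_error : public logic_error_base
{
public:
    explicit logic_error(const std::basic_string<CharT>& what_arg)
        : what_(what_arg) {}
    const CharT* what() const { return what_.c_str(); }

private:
    std::basic_string<CharT> what_;
};

// One ordinary handler on the base catches logic_error<T> for any T.
// The handler needs dynamic_cast if it wants the typed text back.
template <typename CharT>
bool caught_by_base(const std::basic_string<CharT>& text)
{
    try {
        throw logic_error<CharT>(text);
    } catch (const logic_error_base& e) {
        return dynamic_cast<const logic_error<CharT>*>(&e) != 0;
    }
    return false;  // only reached if something else was thrown
}

// Usage: the same handler serves both character types.
inline void demo()
{
    assert(caught_by_base(std::string("narrow")));
    assert(caught_by_base(std::wstring(L"wide")));
}

}  // namespace sketch
```

The cost is that the handler sees only the base type, which is exactly
the information a type-parameterized catch block was meant to preserve.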

[...snip...]

---

Author: Pete Becker <petebecker@acm.org>
Date: Tue, 25 Jun 2002 22:04:34 GMT Raw View

Daniel Miller wrote:
>
>    If type-parameterized catch blocks "shouldn't be very hard to
> implement", please describe exactly how the compiler would arrive at
> exactly which set of types the type-parameterized catch block would be
> specialized at compile-time (or exactly how run-time specialization of
> templates would work).
>

Magic linkers can do this, of course.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

---

Author: "Hillel Y. Sims" <usenet@phatbasset.com>
Date: Tue, 25 Jun 2002 22:15:08 CST Raw View

"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:8c8b368d.0206250429.6f094007@posting.google.com...
..
>
> Negatory!  You have missed a key point here.  My suggestion was for a
> new, templated exception class basic_exception that could support
> different character sets, but with the standard std::exception being a
> typedef as:  typedef basic_exception<char> exception.  Thus any and
> all code that uses the standard exception hierarchy would still see
> that hierarchy exactly as it is now.
>

Of course, nothing prevents you from defining such a custom hierarchy in
your own code and using it as desired. Since std::exception is totally
orthogonal to any other "basic_exception<>" types anyhow under your scheme,
and templated catch blocks don't exist (at least until we get some of those
magic linkers previously referenced..), there seems to be no reason why you
couldn't do this immediately in your own code.

hys

--
Hillel Y. Sims
hsims AT factset.com

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Wed, 26 Jun 2002 16:34:23 GMT Raw View

"Hillel Y. Sims" <usenet@phatbasset.com> wrote in message news:<j2aS8.15372$5M2.808311@news4.srv.hcvlny.cv.net>...

> Of course, nothing prevents you from defining such a custom hierarchy in
> your own code and using it as desired. Since std::exception is totally
> orthogonal to any other "basic_exception<>" types anyhow under your scheme,
> and templated catch blocks don't exist (at least until we get some of those
> magic linkers previously referenced..), there seems to be no reason why you
> couldn't do this immediately in your own code.
>

Pardon me, but this whole discussion has gotten way off track.  The
entire issue of templated catch blocks is a sideshow that has nothing
to do with either my suggestions or the basic issue, which is this:

C++ has wisely evolved into a language with fairly extensive support
for internationalization/localization.  Given the fact that software
development is clearly now a global enterprise this was entirely
necessary in order for C++ to become the major programming language
that it is today.

All I am saying is that the effort to support
internationalization/localization in C++ missed a few small points,
which is certainly not unexpected, and that it would probably be a
good thing if those omissions were now corrected, and more
specifically, were corrected in a fashion that would have zero impact
on existing code.

Now, as far as exceptions go, I could certainly develop my own
exception hierarchy to provide support for different character sets.
Any half-way competent C++ developer could do so as well.  But why
should we have a multitude of individual solutions when direct support
at the library level would make things easier and more consistent?

As for the issues with fstream and message catalogs, these cannot
really be addressed efficiently except at the library level.  And
again, even if it were done by individual developers, would it not be
better overall to build this support directly into the standard
library?

We developers here in America, and other fortunate developers for whom
the ASCII character set is fully adequate, are provided with complete
support for our character set by C++.  Other developers, however, are
forced to work around these few small holes that can be easily
corrected with no impact on existing code, and without requiring any
change in programming style or habits by those of us for whom ASCII is
sufficient.  How is it appropriate that we as a community should be so
self-centered as to deny the same level of support to other C++
developers around the world?  How is it in the best interests of C++
as a language to deny that support?

Randy.

---

Author: Daniel Miller <daniel.miller@tellabs.com>
Date: Wed, 26 Jun 2002 16:35:18 GMT Raw View

Pete Becker wrote:

> Daniel Miller wrote:
>
>>   If type-parameterized catch blocks "shouldn't be very hard to
>>implement", please describe exactly how the compiler would arrive at
>>exactly which set of types the type-parameterized catch block would be
>>specialized at compile-time (or exactly how run-time specialization of
>>templates would work).
>>
>>
>
> Magic linkers can do this, of course.

   Such link-time-based type-parameterized catch blocks would move the
C++ community from *permitting* such "magic linkers" as QoI improvements
to *demanding* such "magic linkers" as minimum acceptable
implementation.  Is C++0x standardization prepared to make the
aggressive leap of requiring *every* linker for C++0x to be such a
"magic linker"?

   If the answer to that question is "no" (i.e., if the answer is C++0x
should not place such a burden for "magic linkers" on every platform),
then my request for an exact & thorough explanation of a reference
implementation for compile-time or run-time implementation of
type-parameterized catch blocks still stands unaddressed.  If
type-parameterized catch blocks "shouldn't be very hard to implement",
please describe exactly how the compiler would arrive at
exactly which set of types the type-parameterized catch block would
specialize at compile-time (or exactly how run-time specialization of
templates would work) *without* relying on some optional/QoI "magic linker".

   If the answer to that question is "yes", then what has changed over
the years across all compiler writers and/or all operating systems to
permit such additional burdens on the linker which were inconvenient,
unagreeable/politically-incorrect, or impossible in years gone by?

---

Author: Pete Becker <petebecker@acm.org>
Date: Wed, 26 Jun 2002 23:55:37 GMT Raw View

Randy Maddox wrote:
>
> All I am saying is that the effort to support
> internationalization/localization in C++ missed a few small points,

No, they weren't missed. Those were deliberate decisions, and the
reasons for those decisions have been explained several times in this
thread, and many times in the past.

>
> Now, as far as exceptions go, I could certainly develop my own
> exception hierarchy to provide support for different character sets.
> Any half-way competent C++ developer could do so as well.  But why
> should we have a multitude of individual solutions when direct support
> at the library level would make things easier and more consistent?
>

The reason you should do it, and get some actual experience with it, is
so that you'll understand what a bad idea it is.

> As for the issues with fstream and message catalogs, these cannot
> really be addressed efficiently except at the library level.  And
> again, even if it were done by individual developers, would it not be
> better overall to build this support directly into the standard
> library?

Sure, if supporting those features was actually a good idea. If you want
to convince people of that you need implementations and real-world
experience. Stamping your foot and saying "You're wrong, you're wrong"
won't change anyone's mind.

Incidentally, the original design for templated iostreams was done by
members of the Japanese delegation to WG21. They didn't see a need for
wide character file names.

>
> We developers here in America, and other fortunate developers for whom
> the ASCII character set is fully adequate, are provided with complete
> support for our character set by C++.  Other developers, however, are
> forced to work around these few small holes that can be easily
> corrected with no impact on existing code, and without requiring any
> change in programming style or habits by those of us for whom ASCII is
> sufficient.  How is it appropriate that we as a community should be so
> self-centered as to deny the same level of support to other C++
> developers around the world?  How is it in the best interests of C++
> as a language to deny that support?
>

I don't have the complete list handy, but Japan, Germany, France, and
Russia, all of whom use characters outside the ASCII character set, are
among the national bodies who approved the present international
standard. Hardly demonstrative of the American jingoism that you claim.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

---

Author: Pete Becker <petebecker@acm.org>
Date: Thu, 27 Jun 2002 15:51:45 GMT Raw View

Pete Becker wrote:
>
> Incidentally, the original design for templated iostreams was done by
> members of the Japanese delegation to WG21. They didn't see a need for
> wide character file names.
>

Whoops, I was thinking of basic_string. So ignore this.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)

---

Author: rmaddox@isicns.com (Randy Maddox)
Date: Thu, 27 Jun 2002 15:58:38 GMT Raw View

Pete Becker <petebecker@acm.org> wrote in message news:<3D1A53C3.83C1F498@acm.org>...
> Randy Maddox wrote:
> >
> > All I am saying is that the effort to support
> > internationalization/localization in C++ missed a few small points,
>
>
> No, they weren't missed. Those were deliberate decisions, and the
> reasons for those decisions have been explained several times in this
> thread, and many times in the past.

Just because a decision is deliberate is no indication that decision
will remain valid for all time.  Things change, and sometimes past
decisions need to be revisited.  So it goes.

>
> >
> > Now, as far as exceptions go, I could certainly develop my own
> > exception hierarchy to provide support for different character sets.
> > Any half-way competent C++ developer could do so as well.  But why
> > should we have a multitude of individual solutions when direct support
> > at the library level would make things easier and more consistent?
> >
>
> The reason you should do it, and get some actual experience with it, is
> so that you'll understand what a bad idea it is.

I have done so in the past and it worked just fine.  Perhaps your
experience has been different and you could offer some valuable
lessons on exactly why this is such a bad idea?  That would certainly
be appreciated.

>
> > As for the issues with fstream and message catalogs, these cannot
> > really be addressed efficiently except at the library level.  And
> > again, even if it were done by individual developers, would it not be
> > better overall to build this support directly into the standard
> > library?
>
> Sure, if supporting those features was actually a good idea. If you want
> to convince people of that you need implementations and real-world
> experience. Stamping your foot and saying "You're wrong, you're wrong"
> won't change anyone's mind.

Neither will simply asserting that something is a bad idea without
offering any rationale for that view.  Again, if you do have specific
objections I would be pleased to hear them.

>
> Incidentally, the original design for templated iostreams was done by
> members of the Japanese delegation to WG21. They didn't see a need for
> wide character file names.

I believe that statement could be reasonably amended to:  They didn't
see a need for wide character file names at that point in time.

> I don't have the complete list handy, but Japan, Germany, France, and
> Russia, all of whom use characters outside the ASCII character set, are
> among the national bodies who approved the present international
> standard. Hardly demonstrative of the American jingoism that you claim.

Again, the mere fact that a particular decision was made at some point
in the past is not a valid argument that said decision must remain the
same for all time.  And I am scarcely claiming American jingoism,
merely stating that, upon careful review under present conditions,
there may be some basis for revisiting a few past decisions.

I would also note that Windows NT, since version 3.5.2, back in 1996,
has and does provide support for wide character file names.  NT
derivatives such as Windows 2000, which is pretty widely used, also
provide this support.  And please correct me if I'm wrong here, but I
also seem to recall reading something about wide character file name
support in Linux.  I was not able to locate the reference, so I cannot
state that as a fact.

In any case, the fact that a major vendor like Microsoft saw a need
for wide character file name support, and went through the trouble to
build it into their OS, would seem to indicate that I am not alone in
thinking this.

Personally, I have been quite surprised at the vehemence of the
objections to the suggestion of a few very small changes that would
have zero impact on existing code, and that would require no change
whatsoever in the programming style or habits of any developer who did
not need the additional features.  I would have thought that these
suggestions would have been welcomed as a good way to extend the
usefulness and usability of C++ in global markets.  Clearly I was
wrong about that!  :-)

Randy.


Author: Michiel.Salters@cmg.nl (Michiel Salters)
Date: Thu, 27 Jun 2002 20:27:24 GMT Raw View

Pete Becker <petebecker@acm.org> wrote in message news:<3D1A53C3.83C1F498@acm.org>...
> Randy Maddox wrote:
> >
> > All I am saying is that the effort to support
> > internationalization/localization in C++ missed a few small points,
>
> No, they weren't missed. Those were deliberate decisions, and the
> reasons for those decisions have been explained several times in this
> thread, and many times in the past.

I've just reread the thread, and I'm still undecided about the wchar_t
exceptions. Clearly we want to keep a single hierarchy, but having only
chars is starting to hurt. Up to 1999 the Netherlands had no problems
with ASCII, but today throwing a domain_error with a Euro-sign in the
what() is quite reasonable. That's not always present in the narrow
character set (must be ISO8859-15, -1 is still common). A wchar_t
interface could help here.

I think it's reasonable to require that in C++0x,
* the exception ctors are overloaded for wchar_t*,
* what() is supplemented by wwhat() or a similar function returning wchar_t,
* calling what() (to get a char*) when the std::exception was initialized
  with a wchar_t* returns an implementation-defined string
  ( returning "No mapping from wchar_t* to char*" is legal, like returning
    all character values modulo UCHAR_MAX )
* calling wwhat() (to get a wchar_t*) when the exception was initialized with a
  char* uses an implementation-defined reversible mapping to convert
  the string.
* exceptions thrown by the implementation (like std::bad_alloc) are initialized
  with a char* (like in C++98).

The main issue here is that the implementation-defined reversible
mapping from char* to wchar_t* takes memory, which may be in short supply.
However, the wchar_t* wwhat() return value for the predefined exceptions
(like std::bad_alloc) could be statically allocated while other exceptions
can safely throw std::bad_alloc during construction. ( IIRC, it is
legal to throw B; from A::A in a throw A() statement ).
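
As a rough illustration of the list above, here is a sketch of how one
exception class could carry both forms of the message. The names
dual_exception and wwhat() are hypothetical, the char to wchar_t mapping is
a plain per-character widening chosen only for simplicity, and the fixed
fallback text is the example result mentioned in the list. None of this is
standard library API, and it assumes C++11 or later for noexcept and
override.

```cpp
#include <exception>
#include <string>

// Hypothetical class: illustrates the what()/wwhat() pairing, nothing more.
class dual_exception : public std::exception {
public:
    // Narrow message: wwhat() is produced by widening each char.
    // Construction may throw std::bad_alloc, as discussed above.
    explicit dual_exception(const char* msg)
        : narrow_(msg), wide_(widen(narrow_)) {}

    // Wide message: what() falls back to a fixed placeholder, one of the
    // implementation-defined results the proposal would permit.
    explicit dual_exception(const wchar_t* msg)
        : narrow_("No mapping from wchar_t* to char*"), wide_(msg) {}

    const char* what() const noexcept override { return narrow_.c_str(); }
    const wchar_t* wwhat() const noexcept { return wide_.c_str(); }

private:
    // Assumed mapping: each char value becomes the same wchar_t value.
    static std::wstring widen(const std::string& s) {
        std::wstring out;
        out.reserve(s.size());
        for (unsigned char c : s) out.push_back(static_cast<wchar_t>(c));
        return out;
    }

    std::string narrow_;
    std::wstring wide_;
};
```

Because dual_exception still derives from std::exception, an existing
catch (const std::exception&) handler keeps working and simply sees the
narrow string.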

Regards,
--
Michiel Salters


Author: Pete Becker <petebecker@acm.org>
Date: Thu, 27 Jun 2002 20:28:21 GMT Raw View

Randy Maddox wrote:
>
> Pete Becker <petebecker@acm.org> wrote in message news:<3D1A53C3.83C1F498@acm.org>...
> > Randy Maddox wrote:
> > >
> > > All I am saying is that the effort to support
> > > internationalization/localization in C++ missed a few small points,
> >
> >
> > No, they weren't missed. Those were deliberate decisions, and the
> > reasons for those decisions have been explained several times in this
> > thread, and many times in the past.
>
> Just because a decision is deliberate is no indication that decision
> will remain valid for all time.

Nobody has made such a claim. I was responding to your assertion that
"C++ missed a few small points...." It didn't.


> >
> > >
> > > Now, as far as exceptions go, I could certainly develop my own
> > > exception hierarchy to provide support for different character sets.
> > > Any half-way competent C++ developer could do so as well.  But why
> > > should we have a multitude of individual solutions when direct support
> > > at the library level would make things easier and more consistent?
> > >
> >
> > The reason you should do it, and get some actual experience with it, is
> > so that you'll understand what a bad idea it is.
>
> I have done so in the past and it worked just fine.

Really? How did you handle code that used multiple libraries with
different sets of exception instances? How did you write throw
specifiers?

>
> Neither will simply asserting that something is a bad idea without
> offering any rationale for that view.  Again, if you do have specific
> objections I would be pleased to hear them.

You've ignored everyone who has given you those explanations.

> > I don't have the complete list handy, but Japan, Germany, France, and
> > Russia, all of whom use characters outside the ASCII character set, are
> > among the national bodies who approved the present international
> > standard. Hardly demonstrative of the American jingoism that you claim.
>
> Again, the mere fact that a particular decision was made at some point
> in the past is not a valid argument that said decision must remain the
> same for all time.

In the absence of new facts there's no reason to change a decision. You
have cited nothing new. Everything you say was considered at the time,
and your position was not accepted. Live with it.

> And I am scarcely claiming American jingoism,

I see no other way of interpreting your statement:

>> How is it appropriate that we as a community should be so
>> self-centered as to deny the same level of support to other C++
>> developers around the world?

>
> Personally, I have been quite surprised at the vehemence of the
> objections to the suggestion of a few very small changes that would
> have zero impact on existing code, and that would require no change
> whatsoever in the programming style or habits of any developer who did
> not need the additional features.  I would have thought that these
> suggestions would have been welcomed as a good way to extend the
> usefulness and usability of C++ in global markets.  Clearly I was
> wrong about that!  :-)

Superficial analysis of complex issues usually gets that reaction.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: kanze@gabi-soft.de (James Kanze)
Date: 22 Jun 2002 00:35:29 GMT Raw View

"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:<000e01c21639$3fc95020$3502a8c0@nancy>...

> I have noticed some possible holes in support for
> internationalization/localization in the stdlib, i.e., places where
> char is used explicitly, preventing use of other character types.

You're not the first.

Saying that something is missing isn't sufficient.  Someone has to
specify what should be there instead, and what it should mean.

> The two specific places this occurs that have caused headaches for me are:

> 1) The exception classes, in which the return value of what() is
> specified as char *, and the constructors are all hardwired to use
> std::string => std::basic_string<char>.  It would be helpful to
> provide a class std::basic_exception, similar to class
> std::basic_string, which could be parameterized with the character
> type of the string passed to the constructor.

So how would you use such a class?

> Of course, the return value of the what() member would also be so
> parameterized.

So what should someException.what< MyClass const* >() mean for an
exception initialized with std::string?  Or for std::bad_alloc?

It's not enough to say that the templated functions should exist; you
have to specify what they mean.

> Then class std::exception could become a typedef for
> std::basic_exception<char>, exactly parallel to the typedef for
> std::string as std::basic_string<char>.  This would have no impact
> on existing code, but would allow exceptions to contain strings in
> local character sets.

And make it impossible to catch all of the standard exceptions with a
single catch clause.

> 2) The fstream classes, which are already templated on the character
> type contained in the file itself, but the constructor and open()
> members require a char *, which limits their usefulness with file
> systems that support non-char file names.  It would be helpful to
> provide a templated ctor and open() member that would support other
> character types.  Again, this would have no impact on existing code,
> but would be more friendly for both developers and users whose file
> system supports their local character set.

And what should it do if the file system doesn't support other
character sets?  One might expect opening L"abc" and "abc" to get the
same file, for example.  But how?

> In personal email correspondence with Herb Sutter, he noted the
> following places in the Standard where similar problems may occur:

> > In clause 18, type_info::name(), bad_cast::what(),
> > bad_typeid::what(), exception::what(), and bad_exception::what()
> > are all possibly unhelpfully specified to return a char* that "MAY
> > be a null-terminated multibyte string, suitable for conversion and
> > display as a wstring" (emphasis mine) which isn't very portable.

Guess what.  The return values of type_info::name() and the
exception::what() *are* pretty useless; an implementation can simply
return the empty string in all cases.

The intent, of course, is that the string contain *some* information.
But what information, and in what format, are not specified, so there
is nothing you can do with it portably.

This problem has nothing to do with different character types, of
course.

> > In clause 22, locale names are hardwired to be
> > basic_string<char>. The grouping stuff is probably fine as char
> > strings are sufficient to specify grouping rules.  (Randy:
> > Although it seems a bit odd to me that to specify the name of a
> > locale using a non-char character type I have to use a locale name
> > composed of chars.)

Again, the problem is one of interpretation.  What are the different
types of names supposed to mean?

> Following Herb's lead I also searched through the Standard and note
> that in clause 22 the messages in a message catalog are templated on
> character type, but the name of a message catalogue is hardwired as
> basic_string<char>.  This too could be addressed by a templated
> open() member to support catalogue names in the local character set.

What does this gain us?  The most frequent convention for locale names
is based on the ISO standard abbreviations for the country and the
language.  These are easily expressed in ASCII.

There is already more liberty here than I like.

> In my humble opinion these minor inconsistencies should be fixed, in
> the spirit of more fully supporting local character sets.  I believe
> that the fixes suggested will work, with no impact on existing code.

For the moment, your fix has been just to provide additional
functions.  Until you have specified a semantic for these functions,
we can't begin.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)69 63198627


Author: "Edward Diener" <eldiener@earthlink.net>
Date: 22 Jun 2002 00:35:46 GMT Raw View

"Randy Maddox" <rmaddox@isicns.com> wrote in message
news:000e01c21639$3fc95020$3502a8c0@nancy...
> Hello,
>
> I have noticed some possible holes in support for
> internationalization/localization in the stdlib, i.e., places where char is
> used explicitly, preventing use of other character types.
>
> The two specific places this occurs that have caused headaches for me are:
>
> 1) The exception classes, in which the return value of what() is specified
> as char *, and the constructors are all hardwired to use std::string =>
> std::basic_string<char>.  It would be helpful to provide a class
> std::basic_exception, similar to class std::basic_string, which could be
> parameterized with the character type of the string passed to the
> constructor.  Of course, the return value of the what() member would also be
> so parameterized.  Then class std::exception could become a typedef for
> std::basic_exception<char>, exactly parallel to the typedef for std::string
> as std::basic_string<char>.  This would have no impact on existing code, but
> would allow exceptions to contain strings in local character sets.

I had previously started a thread pointing this out and suggesting that the
standard exception hierarchy be templatized on the character type, with the
return value of what() being based on the character type. This would allow a
possible error message thrown by some sort of wide character related
implementation to return a wide string error message.

The downside of this would be that code might have to catch an exception
based on both a narrow character and a wide character thrown exception.
However for implementations throwing wide character exceptions, the fact
should be well documented. If this change were made, the current use of the
exception hierarchy could be typedefs for the narrow character exceptions so
as not to affect already written code.

The one person who responded was Peter Dimov. Among other discussions, he
correctly pointed out that the result from a failure of "new" is a
std::bad_alloc exception whose what() is a narrow character string. My
suggestion to counter this is that the syntax "new<char_type> etc>" be
allowed for the sole reason that if this failed, the std::bad_alloc
exception thrown could be once more templatized on the character type. Even
without this syntax change, the other point I made regarding this is that
the what() from std::bad_alloc is pretty irrelevant anyway.

>
> 2) The fstream classes, which are already templated on the character type
> contained in the file itself, but the constructor and open() members require
> a char *, which limits their usefulness with file systems that support
> non-char file names.  It would be helpful to provide a templated ctor and
> open() member that would support other character types.  Again, this would
> have no impact on existing code, but would be more friendly for both
> developers and users whose file system supports their local character set.

Once again I have brought this up in a discussion and totally agree with
the point you make and I have made. C++ should support wide character file
names in the C++ standard library and in the fstream templates and
everywhere else. How the library specifies the mapping of this name to
operating systems which do not support wide character file names is the main
crux of the issue. My proposal was that it should be implementation defined
with suggestive guidelines along the lines of possible alternatives, such as
mapping the wide character name to a narrow character name if possible or
just failing with an exception thrown if not.
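
A minimal sketch of the "map if possible, otherwise fail" alternative,
assuming the mapping is done with std::wcstombs in the current C locale.
The helper names narrow_file_name and open_wide are hypothetical, and which
wide names actually convert depends on the platform and the locale in
effect, which is exactly the implementation-defined part.

```cpp
#include <cstdlib>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a wide file name to a multibyte one using
// the current C locale, and throws when some character has no mapping.
inline std::string narrow_file_name(const std::wstring& wide) {
    // Worst case is MB_CUR_MAX bytes per character, plus the terminator.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    const std::size_t n = std::wcstombs(buf.data(), wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("file name has no narrow-character mapping");
    return std::string(buf.data(), n);
}

// Hypothetical stand-in for a wide-name open(): narrow the name first,
// then call the existing char-based open().
inline void open_wide(std::ifstream& in, const std::wstring& name) {
    in.open(narrow_file_name(name).c_str());
}
```

With this approach open_wide(in, L"abc") and in.open("abc") name the same
file, while a name that cannot be narrowed is reported by an exception
instead of opening some other file.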

>
> In personal email correspondence with Herb Sutter, he noted the following
> places in the Standard where similar problems may occur:
>
> > In clause 18, type_info::name(), bad_cast::what(), bad_typeid::what(),
> > exception::what(), and bad_exception::what() are all possibly
> > unhelpfully specified to return a char* that "MAY be a null-terminated
> > multibyte string, suitable for conversion and display as a wstring"
> > (emphasis mine) which isn't very portable.
>
> > In clause 22, locale names are hardwired to be basic_string<char>. The
> > grouping stuff is probably fine as char strings are sufficient to
> > specify grouping rules.  (Randy:  Although it seems a bit odd to me that
> > to specify the name of a locale using a non-char character type I have to
> > use a locale name composed of chars.)
>
> Following Herb's lead I also searched through the Standard and note that in
> clause 22 the messages in a message catalog are templated on character type,
> but the name of a message catalogue is hardwired as basic_string<char>.
> This too could be addressed by a templated open() member to support
> catalogue names in the local character set.

I have also brought up this issue since I have been working on an
implementation for wide character support which needs to use message
catalogs. I have always found it amusing that the wide character message
catalog facet opens a narrow character message catalog. When I brought this
up, PJ Plauger responded saying, essentially, that the C++ standards
committee didn't quite work through all the issues regarding this when the
message catalog facet was defined. I believe quite simply that the message
catalog opened should also be parameterized on the character type. Again
this is an issue of what to do if the operating system does not support wide
character file names.

>
> In my humble opinion these minor inconsistencies should be fixed, in the
> spirit of more fully supporting local character sets.  I believe that the
> fixes suggested will work, with no impact on existing code.

I don't think these are just minor inconsistencies but view them as major
issues in supporting C++ for locales which are fully dependent on wide
character sets, such as Japan and China. Otherwise large areas of the world
will adopt a language ( Java ? C# ? ) which already has full support for
wide characters in all of its functionality rather than a superior language
such as C++. While changes should not be made to a computer language on the
basis of attempts at popularity, I think it is imperative that wide
character support be added to the C++ standard library on a consistent basis
using the C++ template facility.

Another reason for regularizing wide character support in the standard
library in all situations using the C++ template facility is that C++ in the
future may add more basic character types as the computer world standardizes
on another character type ( some implementation of "Unicode" is a possible
example ). By using templates in all situations where character types are to
be considered in the C++ standard library, the addition in the future of
possible further basic character types to C++ will be relatively painless as
far as changes to the C++ standard are concerned.



Author: Pete Becker <petebecker@acm.org>
Date: Sat, 22 Jun 2002 17:06:54 GMT Raw View

Edward Diener wrote:
>
> Again
> this is an issue of what to do if the operating system does not support wide
> character file names.
>

"Other than that, Mrs. Lincoln, what did you think of the play?"

> I don't think these are just minor inconsistencies but view them as major
> issues in supporting C++ for locales which are fully dependent on wide
> character sets, such as Japan and China.

Oddly enough, neither the Japanese nor the Chinese are asking for this
sort of feature. Last I heard the Japanese position was that it's too
soon to do this; it's not understood well enough for standardization.

> Otherwise large areas of the world
> will adopt a language ( Java ? C# ? ) which already has full support for
> wide characters in all of its functionality rather than a superior language
> such as C++.

Java has a major problem: it standardized on 16-bit characters, and
Unicode no longer fits in 16 bits. If you're willing to live with
multi-character string representations (e.g. UTF-16) then you've already
got that capability in C and C++, with multibyte characters.

> While changes should not be made to a computer language on the
> basis of attempts at popularity, I think it is imperative that wide
> character support be added to the C++ standard library on a consistent basis
> using the C++ template facility.

It's already been done. The issue you're raising is that you don't like,
or perhaps don't understand, the decisions that were made and applied
consistently.

>
> Another reason for regularizing wide character support in the standard
> library in all situations using the C++ template facility is that C++ in the
> future may add more basic character types as the computer world standardizes
> on another character type ( some implementation of "Unicode" is a possible
> example ). By using templates in all situations where character types are to
> be considered in the C++ standard library, the addition in the future of
> possible further basic character types to C++ will be relatively painless as
> far as changes to the C++ standard are concerned.
>

Implement it, get some experience with it, write it up, and propose it.
Perhaps you've had some brilliant insight that has escaped the rest of
us.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: allan_W@my-dejanews.com (Allan W)
Date: 24 Jun 2002 22:40:08 GMT Raw View

"Randy Maddox" <rmaddox@isicns.com> wrote
> 1) The exception classes, in which the return value of what() is specified
> as char *, and the constructors are all hardwired to use std::string =>
> std::basic_string<char>.

That does seem inconsistent, but I think there were specific reasons for this.

> It would be helpful to provide a class
> std::basic_exception, similar to class std::basic_string, which could be
> parameterized with the character type of the string passed to the
> constructor.  Of course, the return value of the what() member would also be
> so parameterized.  Then class std::exception could become a typedef for
> std::basic_exception<char>, exactly parallel to the typedef for std::string
> as std::basic_string<char>.

Heavens, no! We need to be able to catch exceptions.

Code that currently catches std::logic_error would now have to catch
both std::logic_error<char> and std::logic_error<wchar_t>. And what
about std::logic_error<int> -- unlikely, perhaps, but not impossible.
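
To make the catching problem concrete, here is a small sketch under the
assumption that the hierarchy were templated as proposed. The namespace
sketch and the names basic_logic_error and classify are hypothetical and
exist only for this illustration.

```cpp
#include <string>

namespace sketch {  // hypothetical; not the real std hierarchy

template <typename CharT>
class basic_logic_error {
public:
    explicit basic_logic_error(const std::basic_string<CharT>& what_arg)
        : msg_(what_arg) {}
    const CharT* what() const { return msg_.c_str(); }

private:
    std::basic_string<CharT> msg_;
};

// The typedef that is supposed to keep existing code unchanged.
typedef basic_logic_error<char> logic_error;

// Throws basic_logic_error<CharT> and reports which handler caught it.
template <typename CharT>
const char* classify(const std::basic_string<CharT>& text) {
    try {
        throw basic_logic_error<CharT>(text);
    } catch (const logic_error&) {  // the handler existing code has
        return "caught by logic_error handler";
    } catch (const basic_logic_error<wchar_t>&) {
        return "needed a separate wchar_t handler";
    }
}

}  // namespace sketch
```

The two instantiations are unrelated types, so the handler written for
logic_error never sees the wchar_t exception, and any further character
type would need yet another handler.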

> This would have no impact on existing code

You need to think about this some more. It has major impact on code.

> but
> would allow exceptions to contain strings in local character sets.

Maybe something simpler could solve the same problem. Provide a string
constructor to convert narrow- to wide- characters and vice-versa.

> 2) The fstream classes, which are already templated on the character type
> contained in the file itself, but the constructor and open() members require
> a char *, which limits their usefulness with file systems that support
> non-char file names.

Are there any?

3.9.1/1: "Objects declared as characters (char) shall be large enough to
store any member of the implementation's basic character set."

Presumably, data storage (such as a file) can hold data which does not
belong to the implementation's basic character set. But if the file system
understands 16-bit characters, then (IMHO) the "implementation's basic
character set" should also be 16-bit characters.

> It would be helpful to provide a templated ctor and
> open() member that would support other character types. Again, this would
> have no impact on existing code, but would be more friendly for both
> developers and users whose file system supports their local character set.

At least there I agree with you... overloading a new member function
wouldn't require anyone to call it.

> Any feedback on these suggestions will be appreciated.  In particular, I am
> specifically interested in how to get from posting this message to
> submitting a formal change request, and whether or not that is even a good
> idea.

Let's make an effort NOT to break every existing C++ program on the planet!


Author: Pete Becker <petebecker@acm.org>
Date: Tue, 25 Jun 2002 00:09:36 GMT Raw View

Allan W wrote:
>
> Code that currently catches std::logic_error would now have to catch
> both std::logic_error<char> and std::logic_error<wchar_t>. And what
> about std::logic_error<int> -- unlikely, perhaps, but not impossible.
>

But all that's needed for that is templatized catch clauses. Someone
should propose this:

try {
    // code here
    }
template <class T> catch(const std::logic_error<T>&)
    {
    }

The compiler should turn this into code that catches any object whose
type is an instance of this new logic_error template. That shouldn't be
very hard to implement. And it would be so useful...

> > This would have no impact on existing code
>
> You need to think about this some more. It has major impact on code.
>
> > but
> > would allow exceptions to contain strings in local character sets.
>
> Maybe something simpler could solve the same problem. Provide a string
> constructor to convert narrow- to wide- characters and vice-versa.

Which locale should it use?
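
The question can be made concrete with a sketch of such a conversion,
assuming it is built on the existing std::ctype<wchar_t>::widen facet
member. The free function widen below is hypothetical. Its explicit locale
parameter is the point: a converting string constructor would have to pick
that argument silently.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: widens a narrow string using the ctype facet of the
// locale the caller supplies.
inline std::wstring widen(const std::string& narrow, const std::locale& loc) {
    const std::ctype<wchar_t>& ct = std::use_facet<std::ctype<wchar_t> >(loc);
    std::wstring wide(narrow.size(), L'\0');
    if (!narrow.empty())
        ct.widen(narrow.data(), narrow.data() + narrow.size(), &wide[0]);
    return wide;
}
```

A call such as widen(text, std::locale::classic()) states the choice
openly. For characters outside the basic set, a different locale may
produce a different wide string from the same bytes.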

>
> > 2) The fstream classes, which are already templated on the character type
> > contained in the file itself, but the constructor and open() members require
> > a char *, which limits their usefulness with file systems that support
> > non-char file names.
>
> Are there any?

None that require them, which is why "limit their usefulness" is a
political statement, not a technical one.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: dmeyer@dmeyer.net ()
Date: Tue, 25 Jun 2002 05:00:52 GMT Raw View

According to Pete Becker  <petebecker@acm.org>:
> Allan W wrote:
> >
> > Code that currently catches std::logic_error would now have to catch
> > both std::logic_error<char> and std::logic_error<wchar_t>. And what
> > about std::logic_error<int> -- unlikely, perhaps, but not impossible.
> >
>
> But all that's needed for that is templatized catch clauses. Someone
> should propose this:
>
> try {
>     // code here
>     }
> template <class T> catch(const std::logic_error<T>&)
>     {
>     }
>
> The compiler should turn this into code that catches any object whose
> type is an instance of this new logic_error template. That shouldn't be
> very hard to implement. And it would be so useful...

You're not kidding.  I hate to think how much time I've spent trying
to figure out what exception some library is throwing.  If I
could have replaced catch (...) with

template<typename T> catch (const T& e) {
   std::cerr << "It threw a " << typeid(e).name() << std::endl;
}

it would have saved me endless grief.
--
Dave Meyer
dmeyer@dmeyer.net


Author: Hyman Rosen <hyrosen@mail.com>
Date: Tue, 25 Jun 2002 05:00:11 GMT Raw View

Pete Becker wrote:
> But all that's needed for that is templatized catch clauses.
> Someone should propose this:
>
> try { /* code here */ }
> template <class T> catch(const std::logic_error<T>&) { }
>
> The compiler should turn this into code that catches any object whose
> type is an instance of this new logic_error template. That shouldn't be
> very hard to implement. And it would be so useful...

You forgot the smiley :-)


Author: rmaddox@isicns.com (Randy Maddox)
Date: Tue, 25 Jun 2002 06:52:49 GMT Raw View

Pete Becker <petebecker@acm.org> wrote in message news:<3D146B7D.4DDB23C6@acm.org>...

> Implement it, get some experience with it, write it up, and propose it.
> Perhaps you've had some brilliant insight that has escaped the rest of
> us.

I claim no brilliant insight.  I am merely pointing out that while the
C++ stdlib does provide full support for ASCII, it seems to have a
couple of small holes in its support for other character sets.  As for
your suggestion above, how do you propose that I, a C++ developer like
millions of others and not a C++ stdlib implementor, do so?  I'm not
arguing that implementing these suggestions and trying them out is a
bad idea.  Au contraire.  It is indeed an excellent idea.  However, I
don't see how a non-stdlib-implementor can do much with them.  Perhaps
you would like to take a shot at it?  IMHO that could only be a good
thing.



Author: rmaddox@isicns.com (Randy Maddox)
Date: Tue, 25 Jun 2002 06:52:32 GMT Raw View

kanze@gabi-soft.de (James Kanze) wrote in message news:<d6651fb6.0206210423.1ca048a5@posting.google.com>...
> "Randy Maddox" <rmaddox@isicns.com> wrote in message
> news:<000e01c21639$3fc95020$3502a8c0@nancy>...
>
> > I have noticed some possible holes in support for
> > internationalization/localization in the stdlib, i.e., places where
> > char is used explicitly, preventing use of other character types.
>
> You're not the first.
>
> Saying that something is missing isn't sufficient.  Someone has to
> specify what should be there instead, and what it should mean.

I thought that my suggestion of a class std::basic_exception analogous
to class std::basic_string was sufficient illustration.  Perhaps the
following snippet, assumed to reside in namespace std, will help:

  template <typename Char_t>
  class basic_exception
  {
  public:

    basic_exception(const basic_string<Char_t> & what_arg);

    const Char_t * what() const;

    ...    // other members
  };

The declaration of this class could then be followed by a typedef for
class exception as:  typedef basic_exception<char> exception.  The
rest of the standard exception hierarchy would remain exactly as is,
i.e., no existing code would be broken, but international users could
derive directly from std::basic_exception to create exceptions that
use their own character set, which would be a big convenience for
them.

>
> > The two specific places this occurs that have caused headaches for me are:
>
> > 1) The exception classes, in which the return value of what() is
> > specified as char *, and the constructors are all hardwired to use
> > std::string => std::basic_string<char>.  It would be helpful to
> > provide a class std::basic_exception, similar to class
> > std::basic_string, which could be parameterized with the character
> > type of the string passed to the constructor.
>
> So how would you use such a class?

  class MyException : public std::basic_exception<MyChar_t>
  {
  public:

    MyException(const std::basic_string<MyChar_t> & what_arg)
      : std::basic_exception<MyChar_t>(what_arg)
    { }
  };

  throw MyException(L"...");    // assuming wide chars

>
> > Of course, the return value of the what() member would also be so
> > parameterized.
>
> So what should someException.what< MyClass const* >() mean for an
> exception initialized with std::string?  Or for std::bad_alloc?

See snippet in previous response above.  The what() member would
return a pointer to an array of the type of character used to
instantiate the templated exception class.  Thus an exception type
that used std::string would return a const char *.  This would have no
effect on std::bad_alloc, or any of the other standard exceptions.

>
> It's not enough to say that the templated functions should exist; you
> have to specify what they mean.

They mean the same as they ever did, i.e., the templated version
changes no semantics, it only allows developers to use their native
character set.

>
> > Then class std::exception could become a typedef for
> > std::basic_exception<char>, exactly parallel to the typedef for
> > std::string as std::basic_string<char>.  This would have no impact
> > on existing code, but would allow exceptions to contain strings in
> > local character sets.
>
> And make it impossible to catch all of the standard exceptions with a
> single catch clause.

How so?  None of the standard exceptions would be affected by this.

>
> > 2) The fstream classes, which are already templated on the character
> > type contained in the file itself, but the constructor and open()
> > members require a char *, which limits their usefulness with file
> > systems that support non-char file names.  It would be helpful to
> > provide a templated ctor and open() member that would support other
> > character types.  Again, this would have no impact on existing code,
> > but would be more friendly for both developers and users whose file
> > system supports their local character set.
>
> And what should it do if the file system doesn't support other
> character sets?  One might expect opening L"abc" and "abc" to get the
> same file, for example.  But how?

IMHO, in any case where the name passed to fstream::open() does not
match an existing file, the call should simply fail.  This is exactly
what happens with fstream::open() right now, and I see no need for
that to change.  All I am suggesting is that in class fstream add the
following member:

  template <typename Char_t>
  int open(const std::basic_string<Char_t> & fileName);

>
> > In personal email correspondence with Herb Sutter, he noted the
> > following places in the Standard where similar problems may occur:
>
> > > In clause 18, type_info::name(), bad_cast::what(),
> > > bad_typeid::what(), exception::what(), and bad_exception::what()
> > > are all possibly unhelpfully specified to return a char* that "MAY
> > > be a null-terminated multibyte string, suitable for conversion and
> > > display as a wstring" (emphasis mine) which isn't very portable.
>
> Guess what.  The return values of type_info::name() and the
> exception::what() *are* pretty useless; an implementation can simply
> return the empty string in all cases.
>
> The intent, of course, is that the string contain *some* information.
> But what information, and in what format, are not specified, so there
> is nothing you can do with it portably.
>
> This problem has nothing to do with different character types, of
> course.
>
> > > In clause 22, locale names are hardwired to be
> > > basic_string<char>. The grouping stuff is probably fine as char
> > > strings are sufficient to specify grouping rules.  (Randy:
> > > Although it seems a bit odd to me that to specify the name of a
> > > locale using a non-char character type I have to use a locale name
> > > composed of chars.)
>
> Again, the problem is one of interpretation.  What are the different
> types of names supposed to mean?

They just serve to identify a locale and have no inherent meaning
other than that.  How does allowing a user to specify a name that is
meaningful to them have any impact?

>
> > Following Herb's lead I also searched through the Standard and note
> > that in clause 22 the messages in a message catalog are templated on
> > character type, but the name of a message catalogue is hardwired as
> > basic_string<char>.  This too could be addressed by a templated
> > open() member to support catalogue names in the local character set.
>
> What does this gain us?  The most frequent convention for locale names
> is based on the ISO standard abbreviations for the country and the
> language.  These are easily expressed in ASCII.

These are easily expressed in ASCII by those for whom ASCII is their
native character set, but perhaps a developer who speaks only
<substitute your favorite non-English language here> might find it
less so.

>
> There is already more liberty here than I like.

That may well be true, but what bearing does that have here?  The
issue is not what any of us like, but rather what makes C++ easier to
use with the native character sets of other countries.  We already
have full support for our native character set, why should we skimp on
support for developers and users in other countries?

>
> > In my humble opinion these minor inconsistencies should be fixed, in
> > the spirit of more fully supporting local character sets.  I believe
> > that the fixes suggested will work, with no impact on existing code.
>
> For the moment, your fix has been just to provide additional
> functions.  Until you have specified a semantic for these functions,
> we can't begin.
>
> --
> James Kanze                           mailto:jkanze@caicheuvreux.com
> Conseils en informatique orientée objet/
>                     Beratung in objektorientierter Datenverarbeitung
> Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)69 63198627
>


Author: rmaddox@isicns.com (Randy Maddox)
Date: Tue, 25 Jun 2002 06:53:14 GMT Raw View

"Edward Diener" <eldiener@earthlink.net> wrote in message news:<yHHQ8.9665$Fv1.970767@newsread2.prod.itd.earthlink.net>...
> "Randy Maddox" <rmaddox@isicns.com> wrote in message
> news:000e01c21639$3fc95020$3502a8c0@nancy...
> > Hello,
> >
> > I have noticed some possible holes in support for
> > internationalization/localization in the stdlib, i.e., places where char
> > is used explicitly, preventing use of other character types.
> I had previously started a thread pointing this out and suggesting that the
> standard exception hierarchy be templatized on the character type, with the
> return value of what() being based on the character type. This would allow a
> possible error message thrown by some sort of wide character related
> implementation to return a wide string error message.

Glad to see I wasn't the first to notice this.

> >
> > 2) The fstream classes, which are already templated on the character type
> > contained in the file itself, but the constructor and open() members
> > require a char *, which limits their usefulness with file systems that support
> > non-char file names.  It would be helpful to provide a templated ctor and
> > open() member that would support other character types.  Again, this would
> > have no impact on existing code, but would be more friendly for both
> > developers and users whose file system supports their local character set.
>
> Once again I have brought this up in a discussion and totally agree with
> the point you make and I have made. C++ should support wide character file
> names in the C++ standard library and in the fstream templates and
> everywhere else. How the library specifies the mapping of this name to
> operating systems which do not support wide character file names is the main
> crux of the issue. My proposal was that it should be implementation defined
> with suggestive guidelines along the lines of possible alternatives, such as
> mapping the wide character name to a narrow character name if possible or
> just failing with an exception thrown if not.

I must respectfully disagree with some of this.  First, I'm not sure
that any mapping between wide and narrow character file names is
necessary.  If the file system supports only one or the other, which
seems likely, and certainly simplest, then no mapping is necessary.
If a file system supports both, then it may be that both a wide and
narrow character name are mapped to the same file, in which case using
either should work.  If only one name is mapped to the file, then only
that name should work.  In the case of failure I don't think it is
appropriate to throw an exception because that is not the current
behavior and changing that behavior would break existing code.

> > Following Herb's lead I also searched through the Standard and note that
> > in clause 22 the messages in a message catalog are templated on character
> > type, but the name of a message catalogue is hardwired as basic_string<char>.
> > This too could be addressed by a templated open() member to support
> > catalogue names in the local character set.
>
> I have also brought up this issue since I have been working on an
> implementation for wide character support which needs to use message
> catalogs. I have always found it amusing that the wide character message
> catalog facet opens a narrow character message catalog. When I brought this
> up, PJ Plauger responded saying, essentially, that the C++ standards
> committee didn't quite work through all the issues regarding this when the
> message catalog facet was defined. I believe quite simply that the message
> catalog opened should also be parameterized on the character type. Again
> this is an issue of what to do if the operating system does not support wide
> character file names.

Interesting ...  IMHO, if a wide character name is supplied but not
supported, then the call should fail in the same way that it would
now if a narrow character name is not found.  No new behavior should
be introduced at this point.

> Another reason for regularizing wide character support in the standard
> library in all situations using the C++ template facility is that C++ in the
> future may add more basic character types as the computer world standardizes
> on another character type ( some implementation of "Unicode" is a possible
> example ). By using templates in all situations where character types are to
> be considered in the C++ standard library, the addition in the future of
> possible further basic character types to C++ will be relatively painless as
> far as changes to the C++ standard are concerned.

Excellent point!

>
>


Author: Hyman Rosen <hyrosen@mail.com>
Date: Tue, 25 Jun 2002 06:53:53 GMT Raw View

dmeyer@dmeyer.net wrote:
> You're not kidding.

Yes he is, you just didn't get the joke. After all
the discussion of export, the logical next step is
to take template instantiation to run time!


Author: rmaddox@isicns.com (Randy Maddox)
Date: Tue, 25 Jun 2002 15:21:56 GMT Raw View

allan_W@my-dejanews.com (Allan W) wrote in message news:<69eb7585.0206241323.7b686d58@posting.google.com>...
> "Randy Maddox" <rmaddox@isicns.com> wrote
> > 1) The exception classes, in which the return value of what() is specified
> > as char *, and the constructors are all hardwired to use std::string =>
> > std::basic_string<char>.
>
> That does seem inconsistent, but I think there were specific reasons for this.
>
> > It would be helpful to provide a class
> > std::basic_exception, similar to class std::basic_string, which could be
> > parameterized with the character type of the string passed to the
> > constructor.  Of course, the return value of the what() member would also be
> > so parameterized.  Then class std::exception could become a typedef for
> > std::basic_exception<char>, exactly parallel to the typedef for std::string
> > as std::basic_string<char>.
>
> Heavens, no! We need to be able to catch exceptions.
>
> Code that currently catches std::logic_error would now have to catch
> both std::logic_error<char> and std::logic_error<wchar_t>. And what
> about std::logic_error<int> -- unlikely, perhaps, but not impossible.

Negatory!  You have missed a key point here.  My suggestion was for a
new, templated exception class basic_exception that could support
different character sets, but with the standard std::exception being a
typedef as:  typedef basic_exception<char> exception.  Thus any and
all code that uses the standard exception hierarchy would still see
that hierarchy exactly as it is now.

I too am highly opposed to anything that would break existing code and
would never make a suggestion that would do so.  At least not unless
there were some extremely overpowering reason to do so, which there
definitely is not in this case.

> Let's make an effort NOT to break every existing C++ program on the planet!
>


Author: Pete Becker <petebecker@acm.org>
Date: Tue, 25 Jun 2002 15:22:46 GMT Raw View

Randy Maddox wrote:
>
> Pete Becker <petebecker@acm.org> wrote in message news:<3D146B7D.4DDB23C6@acm.org>...
>
> > Implement it, get some experience with it, write it up, and propose it.
> > Perhaps you've had some brilliant insight that has escaped the rest of
> > us.
>
> I claim no brilliant insight.  I am merely pointing out that while the
> C++ stdlib does provide full support for ASCII, it seems to have a
> couple of small holes in its support for other character sets.  As for
> your suggestion above, how do you propose that I, a C++ developer like
> millions of others and not a C++ stdlib implementor, do so?

You do what C++ programmers do: you write code. If adding a constructor
to fstream, say, is too daunting then just do the work somewhere else:
write a function to convert wide character file names to byte names. Get
people to use it in real code. See what the problems are. Once you've
solved them then you're in a position to propose it. Chanting 'wide
characters are good' doesn't show that you understand the issues that
they pose, nor does it provide a sufficient basis for standardization.
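
A minimal sketch of the kind of helper suggested here, assuming the conversion is done with std::wcstombs in whatever C locale is current. The function name `narrow_file_name` and its bool-plus-out-parameter interface are hypothetical choices for illustration; real code would also have to decide which locale should govern the conversion, which is exactly the open question.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts a wide file name to a multibyte name
// using the current C locale.  Returns false if some character has no
// multibyte representation in that locale; 'narrow' is then unchanged.
bool narrow_file_name(const std::wstring & wide, std::string & narrow)
{
    // MB_CUR_MAX bytes per wide character is the worst case; +1 for '\0'.
    std::vector<char> buffer(wide.size() * MB_CUR_MAX + 1);

    const std::size_t written =
        std::wcstombs(&buffer[0], wide.c_str(), buffer.size());

    if (written == static_cast<std::size_t>(-1))
        return false;                  // unconvertible character

    narrow.assign(&buffer[0], written);
    return true;
}
```

On success the result can be handed to the existing interface, e.g. `std::ifstream in(narrow.c_str());`. On failure the caller simply does not attempt the open.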

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)


Author: kanze@gabi-soft.de (James Kanze)
Date: Tue, 25 Jun 2002 15:22:19 GMT Raw View

rmaddox@isicns.com (Randy Maddox) wrote in message
news:<8c8b368d.0206241048.6511f4d7@posting.google.com>...
> Pete Becker <petebecker@acm.org> wrote in message
> news:<3D146B7D.4DDB23C6@acm.org>...

> > Implement it, get some experience with it, write it up, and
> > propose it.  Perhaps you've had some brilliant insight that has
> > escaped the rest of us.

> I claim no brilliant insight.  I am merely pointing out that while
> the C++ stdlib does provide full support for ASCII, it seems to have
> a couple of small holes in its support for other character sets.  As
> for your suggestion above, how do you propose that I, a C++
> developer like millions of others and not a C++ stdlib implementor,
> do so?  I'm not arguing that implementing these suggestions and
> trying them out is a bad idea.  Au contraire.  It is indeed an
> excellent idea.  However, I don't see how a non-stdlib-implementor
> can do much with them.  Perhaps you would like to take a shot at it?
> IMHO that could only be a good thing.

You seem to have missed Pete's point.  Pete is saying that he doesn't
have the insight.  And that unless someone has it, and can show him
(and others, like myself) what has to be done, we will have to do
without it.

Off hand, I can definitely see the usefulness of allowing filenames
using wchar_t.  Regretfully, I don't see exactly what the semantics
should be.  For the rest, I find that having more than one hierarchy
of exceptions would be a disaster, and I think that allowing more
freedom in naming locales will not help portability either.

And I'm someone who consistently argues for a maximum of
internationalization, and who regularly works in a multi-lingual
environment.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)69 63198627


Author: kanze@gabi-soft.de (James Kanze)
Date: Tue, 25 Jun 2002 16:34:07 CST Raw View

rmaddox@isicns.com (Randy Maddox) wrote in message
news:<8c8b368d.0206241024.8c2a561@posting.google.com>...
> kanze@gabi-soft.de (James Kanze) wrote in message
> news:<d6651fb6.0206210423.1ca048a5@posting.google.com>...
> > "Randy Maddox" <rmaddox@isicns.com> wrote in message
> > news:<000e01c21639$3fc95020$3502a8c0@nancy>...

> > > I have noticed some possible holes in support for
> > > internationalization/localization in the stdlib, i.e., places
> > > where char is used explicitly, preventing use of other character
> > > types.

> > You're not the first.

> > Saying that something is missing isn't sufficient.  Someone has to
> > specify what should be there instead, and what it should mean.

> I thought that my suggestion of a class std::basic_exception
> analogous to class std::basic_string was sufficient illustration.
> Perhaps the following snippet, assumed to reside in namespace std
> will help:

>   template <typename Char_t>
>   class basic_exception
>   {
>   public:
>
>     basic_exception(const basic_string<Char_t> & what_arg);
>
>     const Char_t * what() const;
>
>     ...    // other members
>   };

> The declaration of this class could then be followed by a typedef
> for class exception as: typedef basic_exception<char> exception.
> The rest of the standard exception hierarchy would remain exactly as
> is, i.e., no existing code would be broken, but international users
> could derive directly from std::basic_exception to create exceptions
> that use their own character set, which would be a big convenience
> for them.

As an international user, I don't really see what it buys me except
additional complexity.

> > > The two specific places this occurs that have caused headaches
> > > for me are:

> > > 1) The exception classes, in which the return value of what() is
> > > specified as char *, and the constructors are all hardwired to
> > > use std::string => std::basic_string<char>.  It would be helpful
> > > to provide a class std::basic_exception, similar to class
> > > std::basic_string, which could be parameterized with the
> > > character type of the string passed to the constructor.

> > So how would you use such a class?

>   class MyException : public std::basic_exception<MyChar_t>
>   {
>   public:
>
>     MyException(const std::basic_string<MyChar_t> & what_arg)
>       : std::basic_exception<MyChar_t>(what_arg)
>     { }
>   };

>   throw MyException(L"...");    // assuming wide chars

That's the easy part.  Just throwing exceptions is uninteresting
unless I intend to catch them somewhere.  How should I write a catch
clause to catch all possible exceptions, and display the error
message?

> > > Of course, the return value of the what() member would also be
> > > so parameterized.

> > So what should someException.what< MyClass const* >() mean for an
> > exception initialized with std::string?  Or for std::bad_alloc?

> See snippet in previous response above.  The what() member would
> return a pointer to an array of the type of character used to
> instantiate the templated exception class.  Thus an exception type
> that used std::string would return a const char *.  This would have
> no effect on std::bad_alloc, or any of the other standard
> exceptions.

In practice, and the standard conforms to that practice, I want all of
my exceptions to have a common base class; e.g. std::exception.  That
class has a single function, what(), which may return a string which I
can potentially display.

What you are proposing is an infinite number of unrelated exception
hierarchies.  How can I regroup exception handling in such cases?

> > It's not enough to say that the templated functions should exist;
> > you have to specify what they mean.

> They mean the same as they ever did, i.e., the templated version
> changes no semantics, it only allows developers to use their native
> character set.

I can understand an argument which said that the exception hierarchy
should only use wchar_t, instead of char.  (I don't accept it, because
of a lack of any existing practice, but I can understand it.)  I can't
understand an argument that says that we should support an infinite
number of unrelated exception hierarchies.  I don't even understand
one which says we should support two.

> > > Then class std::exception could become a typedef for
> > > std::basic_exception<char>, exactly parallel to the typedef for
> > > std::string as std::basic_string<char>.  This would have no
> > > impact on existing code, but would allow exceptions to contain
> > > strings in local character sets.

> > And make it impossible to catch all of the standard exceptions
> > with a single catch clause.

> How so?  None of the standard exceptions would be affected by this.

Today, if I write:

    catch ( std::exception const& error ) { ... }

I am pretty sure of catching *all* exceptions related to the standard
exceptions.  How do I do this if in fact, there are an infinite number
of hierarchies?
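
The point can be checked with a small sketch.  It assumes the proposed template is modelled in a hypothetical namespace `sketch`, with `sketch::exception` standing in for what std::exception would become; `which_handler` is a name invented for this illustration.

```cpp
#include <string>

namespace sketch {  // hypothetical; models the proposed template

template <typename Char_t>
class basic_exception
{
public:
    explicit basic_exception(const std::basic_string<Char_t> & what_arg)
        : what_(what_arg) { }
    virtual ~basic_exception() { }
    virtual const Char_t * what() const { return what_.c_str(); }
private:
    std::basic_string<Char_t> what_;
};

typedef basic_exception<char> exception;   // stands in for std::exception

}  // namespace sketch

// Returns 1 if the "catch everything standard" handler ran,
// 2 if only catch (...) was able to catch the exception.
template <typename Char_t>
int which_handler(const std::basic_string<Char_t> & message)
{
    try {
        throw sketch::basic_exception<Char_t>(message);
    }
    catch (const sketch::exception &) {
        return 1;   // reached only when Char_t is char
    }
    catch (...) {
        return 2;   // every other instantiation ends up here
    }
}
```

Each instantiation of a class template is a distinct, unrelated type, so a handler for `basic_exception<char>` never sees a `basic_exception<wchar_t>`; only `catch (...)` does, and there what() is no longer reachable.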

> > > 2) The fstream classes, which are already templated on the
> > > character type contained in the file itself, but the constructor
> > > and open() members require a char *, which limits their
> > > usefulness with file systems that support non-char file names.
> > > It would be helpful to provide a templated ctor and open()
> > > member that would support other character types.  Again, this
> > > would have no impact on existing code, but would be more
> > > friendly for both developers and users whose file system
> > > supports their local character set.

> > And what should it do if the file system doesn't support other
> > character sets?  One might expect opening L"abc" and "abc" to get
> > the same file, for example.  But how?

> IMHO, in any case where the name passed to fstream::open() does not
> match an existing file, the call should simply fail.  This is
> exactly what happens with fstream::open() right now, and I see no
> need for that to change.

Fine.  But how do you define "matches an existing file"?

In the standard, this will be "implementation defined", but we don't
want to add anything "implementation defined" to the standard without
a fairly good idea of what we should expect from a quality
implementation in a specific context.

> All I am suggesting is that in class fstream add the following
> member:

>   template <typename Char_t>
>   int open(const std::basic_string<Char_t> & fileName);

I understand that.  What I'm asking is: what should the semantic of
this function be?

    [...]
> > > > In clause 22, locale names are hardwired to be
> > > > basic_string<char>. The grouping stuff is probably fine as char
> > > > strings are sufficient to specify grouping rules.  (Randy:
> > > > Although it seems a bit odd to me that to specify the name of a
> > > > locale using a non-char character type I have to use a locale name
> > > > composed of chars.)

> > Again, the problem is one of interpretation.  What are the
> > different types of names supposed to mean?

> They just serve to identify a locale and have no inherent meaning
> other than that.  How does allowing a user to specify a name that is
> meaningful to them have any impact?

Currently, it is the implementation, and not the user, who specifies
the names and their meanings.  And the only "standard" in this regard
is based on ISO 639 and ISO 3166, both of which use only the 26
letters of the basic Latin alphabet.

If you could show us a system which required, say, Chinese characters
to specify the locale, you would have a stronger argument.  But since
the locale specifies which characters are actually available, it is
preferable to use the lowest common denominator to specify it.

> > > Following Herb's lead I also searched through the Standard and
> > > note that in clause 22 the messages in a message catalog are
> > > templated on character type, but the name of a message catalogue
> > > is hardwired as basic_string<char>.  This too could be addressed
> > > by a templated open() member to support catalogue names in the
> > > local character set.

> > What does this gain us?  The most frequent convention for locale
> > names is based on the ISO standard abbreviations for the country
> > and the language.  These are easily expressed in ASCII.

> These are easily expressed in ASCII by those for whom ASCII is their
> native character set, but perhaps a developer who speaks only
> <substitute your favorite non-English language here> might find it
> less so.

In the same way he might find "while" less intuitive than "pendant"?
Do you want to "internationalize" the C++ keywords as well?

The locales are something which must be addressed outside all locales,
just as the language must be.  Until I have specified a locale other
than the initial "C" locale, I don't have any other characters.  This
is something fundamental.
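
As a small illustration of that starting point, the sketch below queries the classic locale through the existing std::locale interface.  The two function names are invented for the example; the library calls themselves are the standard ones.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// The name of the classic locale is a plain char string.
std::string classic_locale_name()
{
    return std::locale::classic().name();   // "C"
}

// Locales are constructed from narrow names.  "C" and "" are the only
// names the standard itself spells out; all others are implementation
// defined, and an unknown name makes the constructor throw.
bool c_locale_is_classic()
{
    try {
        std::locale loc("C");
        return loc == std::locale::classic();
    }
    catch (const std::runtime_error &) {
        return false;
    }
}
```

Any richer name, such as one built from ISO language and country codes, is accepted or rejected at the implementation's discretion.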

> > There is already more liberty here than I like.

> That may well be true, but what bearing does that have here?  The
> issue is not what any of us like, but rather what makes C++ easier
> to use with the native character sets of other countries.  We
> already have full support for our native character set, why should
> we skimp on support for developers and users in other countries?

You may have full support for your native character set, but I still
have to pay attention.  The languages I use daily require more
characters than those in the basic character set, and I already suffer
from the fact that with many implementations, they cannot be directly
represented in a char.  (I have to use negative values, rather than
the correct positive ones.  Which in turn causes problems with the
functions in ctype.h.)
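
The usual workaround, sketched here with a hypothetical wrapper name, is to convert to unsigned char before calling anything from ctype.h, since those functions are only defined for values representable as unsigned char (or EOF):

```cpp
#include <cctype>

// Hypothetical wrapper: safe even where plain char is signed and the
// character has a negative value.
bool is_alpha_char(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) != 0;
}
```

Whether an accented character then actually counts as alphabetic still depends on the locale in effect; the cast only removes the undefined behavior.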

For things specified within the program, the problem is (or
historically, was) the reverse -- the C language requires too many
characters, including some which weren't available in the seven bit
code sets I used.  Today, eight bit code sets have eliminated this
problem for most people, I think.

But it is still essential that things like exception messages and
locale specifiers be specified in a least common denominator -- in the
case of locale names, if anything, the standard should go further, and
require that the names contain only characters from the basic
character set (or maybe only alphanumerics plus two or three
additional characters).

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)69 63198627


Author: rmaddox@isicns.com (Randy Maddox)
Date: Tue, 25 Jun 2002 21:34:02 GMT Raw View

Pete Becker <petebecker@acm.org> wrote in message news:<3D17B44B.467FECC2@acm.org>...
> Allan W wrote:
> >
> > Code that currently catches std::logic_error would now have to catch
> > both std::logic_error<char> and std::logic_error<wchar_t>. And what
> > about std::logic_error<int> -- unlikely, perhaps, but not impossible.
> >

See my previous response, message 3 in this thread.  The statement
above is not correct based on what I have suggested, which will indeed
have no impact on the existing standard exception hierarchy.  My
suggested change is orthogonal to that hierarchy.

>
> But all that's needed for that is templatized catch clauses. Someone
> should propose this:
>
> try {
>     // code here
>     }
> template <class T> catch(const std::logic_error<T>&)
>     {
>     }
>
> The compiler should turn this into code that catches any object whose
> type is an instance of this new logic_error template. That shouldn't be
> very hard to implement. And it would be so useful...
>

This is a very interesting idea, and one I had wished for myself, but
not at all necessary to support my suggestion here.

> --
> Pete Becker
> Dinkumware, Ltd. (http://www.dinkumware.com)
>


Author: "Randy Maddox" <rmaddox@isicns.com>
Date: Thu, 20 Jun 2002 17:28:37 GMT Raw View

Hello,

I have noticed some possible holes in support for
internationalization/localization in the stdlib, i.e., places where char is
used explicitly, preventing use of other character types.

The two specific places this occurs that have caused headaches for me are:

1) The exception classes, in which the return value of what() is specified
as const char *, and the constructors that take a message are all hardwired
to use std::string, i.e., std::basic_string<char>.  It would be helpful to
provide a class
std::basic_exception, similar to class std::basic_string, which could be
parameterized with the character type of the string passed to the
constructor.  Of course, the return value of the what() member would also be
so parameterized.  Then class std::exception could become a typedef for
std::basic_exception<char>, exactly parallel to the typedef for std::string
as std::basic_string<char>.  This would have no impact on existing code, but
would allow exceptions to contain strings in local character sets.

2) The fstream classes, which are already templated on the character type
contained in the file itself, but whose constructors and open() members
require a const char * file name, which limits their usefulness with file
systems that support non-char file names.  It would be helpful to provide
a templated ctor and
open() member that would support other character types.  Again, this would
have no impact on existing code, but would be more friendly for both
developers and users whose file system supports their local character set.
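
To make suggestion 1 concrete, here is a minimal sketch of what such a
class might look like.  It is deliberately kept apart from the real
std::exception; basic_exception, narrow_exception, wide_exception and
describe_failure are hypothetical names used only for illustration:

```cpp
#include <string>

namespace sketch {

// Hypothetical class template; nothing like it exists in the Standard.
template <class CharT>
class basic_exception {
public:
    typedef std::basic_string<CharT> string_type;

    explicit basic_exception(const string_type& msg) : msg_(msg) {}
    virtual ~basic_exception() {}

    // Returns the message in the character type chosen by the thrower.
    virtual const CharT* what() const { return msg_.c_str(); }

private:
    string_type msg_;
};

// Parallel to std::string / std::wstring.
typedef basic_exception<char>    narrow_exception;
typedef basic_exception<wchar_t> wide_exception;

// Throws and catches a wide-character exception, returning its message.
inline std::wstring describe_failure() {
    try {
        throw wide_exception(L"fichier introuvable");
    } catch (const wide_exception& e) {
        return e.what();
    }
}

}  // namespace sketch
```

Note that basic_exception<char> and basic_exception<wchar_t> are
unrelated types in this sketch, so a handler for one does not catch the
other.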

In personal email correspondence with Herb Sutter, he noted the following
places in the Standard where similar problems may occur:

> In clause 18, type_info::name(), bad_cast::what(), bad_typeid::what(),
> exception::what(), and bad_exception::what() are all possibly
> unhelpfully specified to return a char* that "MAY be a null-terminated
> multibyte string, suitable for conversion and display as a wstring"
> (emphasis mine) which isn't very portable.

> In clause 22, locale names are hardwired to be basic_string<char>. The
> grouping stuff is probably fine as char strings are sufficient to
> specify grouping rules.

(Randy:  It does seem a bit odd to me, though, that to name a locale for
use with a non-char character type I have to use a locale name composed
of chars.)

Following Herb's lead I also searched through the Standard and note that in
clause 22 the messages in a message catalog are templated on character type,
but the name of a message catalog is hardwired as basic_string<char>.
This too could be addressed by a templated open() member to support
catalog names in the local character set.
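
Absent such templated members, code that holds its names as wide
strings has to narrow them before calling open().  A minimal sketch of
that workaround follows, assuming the current C locale's multibyte
encoding can represent the name; narrow_name, open_wide and
open_catalog_wide are hypothetical helpers, not library functions:

```cpp
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <locale>
#include <string>
#include <vector>

namespace sketch {

// Converts a wide name to the current C locale's multibyte encoding.
// Returns false if some character cannot be represented.
inline bool narrow_name(const std::wstring& wide, std::string& out) {
    // Worst case is MB_CUR_MAX bytes per wide character, plus a terminator.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    const std::size_t n = std::wcstombs(&buf[0], wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return false;
    out.assign(&buf[0], n);
    return true;
}

// Opens any file stream from a wide name by narrowing it first.
template <class Stream>
bool open_wide(Stream& s, const std::wstring& name,
               std::ios_base::openmode mode) {
    std::string narrow;
    if (!narrow_name(name, narrow)) {
        s.setstate(std::ios_base::failbit);
        return false;
    }
    s.open(narrow.c_str(), mode);
    return s.is_open();
}

// Same idea for a wide-character message catalog: the messages are
// wchar_t, yet the catalog name must still be narrowed.
inline std::messages_base::catalog
open_catalog_wide(const std::wstring& name, const std::locale& loc) {
    std::string narrow;
    if (!narrow_name(name, narrow)) return -1;
    return std::use_facet<std::messages<wchar_t> >(loc).open(narrow, loc);
}

}  // namespace sketch
```

The weak point is narrow_name(): any name the narrow encoding cannot
express fails before the file system is ever consulted, which is
exactly the limitation a templated open() would remove.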

In my humble opinion these minor inconsistencies should be fixed, in the
spirit of more fully supporting local character sets.  I believe that the
fixes suggested will work, with no impact on existing code.

Any feedback on these suggestions will be appreciated.  In particular, I am
interested in how to get from posting this message to submitting a formal
change request, and whether or not that is even a good idea.

Thanks.

Randy.


__________________________________________________________________________

Randall A. Maddox
C++ Author, Architect, Developer    Innovative Solutions International
Phone:  703-883-8088 ext. 119       1608 Spring Hill Road, Suite 200
email:  rmaddox@isicns.com          Vienna, VA  22182  USA
__________________________________________________________________________
