Thread

Topic: codecvt_byname -- part of the standard or not?

Author: Dietmar Kuehl <dietmar.kuehl@claas-solutions.de>
Date: 2000/05/31

Hi,
In article <O1LX4.667$TZ2.45131@newsread1.prod.itd.earthlink.net>,
  "Joe O'Leary" <joleary@artisoft.com> wrote:
> I'm looking into devising codecvt specializations to convert among the
> various character encodings out there (ISO-8859, EBCDIC, US-ASCII, etc).

Sounds reasonable: This is what 'std::codecvt' is intended for.

> However, since these will all end up using codecvt<char, char, mbstate_t>,
> I'm guessing that I'll need something like codecvt_byname.

Depends on what you want to do: If you just want to use code
conversions provided by the standard library implementation you are
using, you probably want the '*_byname' version with the appropriate
names from your library's documentation. Note, though, that the
library is only required to provide two conversions: the conversion
between 'wchar_t' and 'char', and the trivial conversion from 'char'
to 'char' (these are the ones used by the file streams).
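A minimal sketch of those two required conversions, assuming C++11 or
later and taking both facets from the classic "C" locale: the
'char'/'char' facet reports 'always_noconv()', and the 'wchar_t'/'char'
facet narrows plain ASCII text. The helper names 'narrow_with_classic'
and 'char_char_is_trivial' are hypothetical, and the error handling is
deliberately crude.

```cpp
#include <cstddef>
#include <cwchar>   // std::mbstate_t
#include <locale>
#include <string>

using wide_cvt = std::codecvt<wchar_t, char, std::mbstate_t>;
using narrow_cvt = std::codecvt<char, char, std::mbstate_t>;

// Hypothetical helper: narrow a wide string with the wchar_t -> char facet
// of the classic "C" locale. Returns an empty string on any non-ok result.
std::string narrow_with_classic(const std::wstring& ws) {
    const wide_cvt& cvt = std::use_facet<wide_cvt>(std::locale::classic());
    std::mbstate_t state{};  // initial conversion state
    std::string out(ws.size() * static_cast<std::size_t>(cvt.max_length()), '\0');
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    wide_cvt::result r = cvt.out(state, ws.data(), ws.data() + ws.size(), from_next,
                                 &out[0], &out[0] + out.size(), to_next);
    if (r != wide_cvt::ok) return std::string();
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}

// The char -> char facet is the degenerate conversion: always_noconv() is true.
bool char_char_is_trivial() {
    return std::use_facet<narrow_cvt>(std::locale::classic()).always_noconv();
}
```

For example, 'narrow_with_classic(L"abc")' should yield "abc".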

>  I can see it in the standard.

The interface is there, the definition is there, but no instantiations
are required. Personally, I would guess that no current library
implementation really supports a 'std::codecvt_byname'... *checking*
Surprise! SGI STL/STLport at least has a non-trivial implementation of
'std::codecvt_byname', but I haven't looked in detail and can't tell
whether this implementation provides multiple named conversions
(Matt? Boris?). The other three implementations I have checked
(including mine - well, I didn't need to check this one...) either do
not instantiate 'std::codecvt_byname<...>' at all or have a trivial
implementation.

To do specific code conversions, you probably have to implement them
by deriving from 'std::codecvt<...>'. Once implemented, you would then
just create a suitable locale object containing the facet and install
it (e.g. via 'imbue()') where needed.
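A minimal sketch of the derive-and-install approach, assuming C++11 or
later. The facet name 'ascii_only_codecvt', its lossy ISO-8859-1 to
US-ASCII mapping (bytes >= 0x80 become '?') and the helper
'write_then_read_raw' are hypothetical illustrations. Because the
mapping is strictly one byte to one byte, the base class defaults for
'encoding()', 'max_length()', 'length()' and 'unshift()' still fit, so
only 'do_in()', 'do_out()' and 'do_always_noconv()' are overridden.
The file stream buffer ('std::basic_filebuf') is what consults the
facet (string streams do not), so the locale is imbued on a
'std::ofstream' before the file is opened.

```cpp
#include <cstddef>
#include <cstdio>    // std::remove
#include <cwchar>    // std::mbstate_t
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical facet: lossy ISO-8859-1 -> US-ASCII on output, pass-through on input.
class ascii_only_codecvt : public std::codecvt<char, char, std::mbstate_t> {
public:
    explicit ascii_only_codecvt(std::size_t refs = 0)
        : std::codecvt<char, char, std::mbstate_t>(refs) {}

protected:
    // The base class says "no conversion"; we must say otherwise.
    bool do_always_noconv() const noexcept override { return false; }

    result do_out(state_type&, const char* from, const char* from_end,
                  const char*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        while (from != from_end && to != to_end) {
            unsigned char c = static_cast<unsigned char>(*from++);
            *to++ = c < 0x80 ? static_cast<char>(c) : '?';  // non-ASCII -> '?'
        }
        from_next = from;
        to_next = to;
        return from == from_end ? ok : partial;  // partial: output buffer full
    }

    result do_in(state_type&, const char* from, const char* from_end,
                 const char*& from_next, char* to, char* to_end,
                 char*& to_next) const override {
        while (from != from_end && to != to_end) *to++ = *from++;  // copy as-is
        from_next = from;
        to_next = to;
        return from == from_end ? ok : partial;
    }
};

// Hypothetical helper: write `text` through the facet, then read the raw bytes back.
std::string write_then_read_raw(const char* path, const std::string& text) {
    {
        std::ofstream out;
        // The locale adopts the facet (refs == 0) and deletes it when done.
        out.imbue(std::locale(std::locale::classic(), new ascii_only_codecvt));
        out.open(path, std::ios::binary);
        out << text;
    }  // destructor closes the file, flushing the converted bytes
    std::string bytes;
    {
        std::ifstream in;
        in.imbue(std::locale::classic());  // classic facet: no conversion on read
        in.open(path, std::ios::binary);
        bytes.assign(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
    }
    std::remove(path);
    return bytes;
}
```

With this facet installed, writing "caf\xE9" (an ISO-8859-1 'é') should
leave "caf?" in the file; the file name used is arbitrary.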

>     "The codecvt_byname facets are not required by
>      the final C++ standard.  At the time of this writing,
>      the question of whether this situation is a defect
>      in the standard or whether the facets were
>      omitted intentionally is under discussion."
>
> Does anyone know the status of this?  In the future can I count on
> codecvt_byname or not?

Personally, I consider it a defect that any '_byname' facet
instantiation is required without the standard also specifying the
names and the associated values or semantics. What is the point? You
cannot rely on any specific name being supported except "C", which
happens to have the same semantics as the base class anyway.
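To illustrate the point about names, a small sketch assuming C++11 or
later and a library that instantiates
'codecvt_byname<char, char, mbstate_t>': the 'std::locale' constructor
throws 'std::runtime_error' for a name the implementation does not
recognize, so "C" is the only name a portable program can count on,
and a 'codecvt_byname' built from "C" behaves like the plain base
facet. The helper names and the bogus locale name are made up for the
example.

```cpp
#include <cwchar>     // std::mbstate_t
#include <locale>
#include <stdexcept>

// Hypothetical helper: does this implementation accept `name` as a locale name?
// The std::locale constructor throws std::runtime_error for unknown names.
bool locale_name_supported(const char* name) {
    try {
        std::locale loc(name);
        (void)loc;
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}

// A codecvt_byname constructed from "C" keeps the char -> char conversion a no-op.
bool c_byname_is_trivial() {
    using byname = std::codecvt_byname<char, char, std::mbstate_t>;
    using base = std::codecvt<char, char, std::mbstate_t>;
    // codecvt_byname has a protected destructor; the locale owns and deletes it.
    std::locale loc(std::locale::classic(), new byname("C"));
    return std::use_facet<base>(loc).always_noconv();
}
```

Any other name, e.g. one taken from a library's documentation, may or
may not be accepted by a given implementation.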

> Also in general, does writing my own templates for these encoding
> conversions sound like perhaps not such a good idea to anyone?  Is it
> overkill?  Has someone done it already?  Is there a better way?

My guess is that writing specific conversion facets for a small set of
conversions sounds reasonable. For example, writing translations
between character sets that use single bytes as their representation
is actually fairly easy: You basically set up a translation table and
convert between the corresponding characters. Conversions for wide
characters are somewhat harder, and for multi-byte representations
with shift states I can't even tell whether it is possible at all:
Because you have to use 'mbstate_t' to hold the intermediate state, it
is unclear whether partial characters can be stored, and if they can,
doing so is clearly non-portable...
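The translation-table approach can be sketched as a facet that maps
each internal byte through a 256-entry table on output and through the
inverse table on input (C++11 or later assumed). The class
'table_codecvt', the helpers 'make_shift_table' and 'run_facet', and
the demo table are hypothetical; the demo table merely shifts
lowercase ASCII letters by one as a stand-in, since a real ISO-8859 or
EBCDIC table would have to be filled in from the published code page.
The table must be a permutation of all 256 byte values, otherwise the
inverse is not well defined.

```cpp
#include <array>
#include <cstddef>
#include <cwchar>    // std::mbstate_t
#include <locale>
#include <string>

using byte_cvt = std::codecvt<char, char, std::mbstate_t>;

// Hypothetical table-driven facet: bytes go through `to_ext_` on output
// (internal -> external) and through its inverse `to_int_` on input.
class table_codecvt : public byte_cvt {
public:
    using table = std::array<unsigned char, 256>;

    explicit table_codecvt(const table& to_ext, std::size_t refs = 0)
        : byte_cvt(refs), to_ext_(to_ext) {
        for (std::size_t i = 0; i < to_ext_.size(); ++i)
            to_int_[to_ext_[i]] = static_cast<unsigned char>(i);  // build inverse
    }

protected:
    bool do_always_noconv() const noexcept override { return false; }

    result do_out(state_type&, const char* from, const char* from_end,
                  const char*& from_next, char* to, char* to_end,
                  char*& to_next) const override {
        return translate(to_ext_, from, from_end, from_next, to, to_end, to_next);
    }

    result do_in(state_type&, const char* from, const char* from_end,
                 const char*& from_next, char* to, char* to_end,
                 char*& to_next) const override {
        return translate(to_int_, from, from_end, from_next, to, to_end, to_next);
    }

private:
    // One byte in, one byte out; stops early (partial) if the output is full.
    static result translate(const table& t, const char* from, const char* from_end,
                            const char*& from_next, char* to, char* to_end,
                            char*& to_next) {
        while (from != from_end && to != to_end)
            *to++ = static_cast<char>(t[static_cast<unsigned char>(*from++)]);
        from_next = from;
        to_next = to;
        return from == from_end ? ok : partial;
    }

    table to_ext_;
    table to_int_{};
};

// Demo table (stand-in only): shift lowercase ASCII letters by one, 'z' wraps to 'a'.
table_codecvt::table make_shift_table() {
    table_codecvt::table t{};
    for (std::size_t i = 0; i < t.size(); ++i) t[i] = static_cast<unsigned char>(i);
    for (int c = 0; c < 26; ++c)
        t[static_cast<std::size_t>('a' + c)] =
            static_cast<unsigned char>('a' + (c + 1) % 26);
    return t;
}

// Hypothetical helper: run a 1:1 byte facet over `s` in either direction.
std::string run_facet(const byte_cvt& cvt, const std::string& s, bool to_external) {
    std::mbstate_t state{};
    std::string out(s.size(), '\0');  // 1:1 mapping, so the length is unchanged
    const char* from_next = nullptr;
    char* to_next = nullptr;
    const char* b = s.data();
    if (to_external)
        cvt.out(state, b, b + s.size(), from_next, &out[0], &out[0] + out.size(), to_next);
    else
        cvt.in(state, b, b + s.size(), from_next, &out[0], &out[0] + out.size(), to_next);
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

With the shift table installed in a locale, 'out()' should turn "abz!"
into "bca!" and 'in()' should turn it back.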
--
<mailto:dietmar.kuehl@claas-solutions.de>
homepage: <http://www.informatik.uni-konstanz.de/~kuehl>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: Martin Sebor <marts@att.net>
Date: 2000/05/29

Joe O'Leary wrote:
>
> I'm looking into devising codecvt specializations to convert among the
> various character encodings out there (ISO-8859, EBCDIC, US-ASCII, etc).
> However, since these will all end up using codecvt<char, char, mbstate_t>,
> I'm guessing that I'll need something like codecvt_byname. I can see it in
> the standard.
>
> However, in my copy of "Standard C++ Iostreams and Locales" by Langer and
> Kreft, I came across the following ominous footnote (page 302):
>
>     "The codecvt_byname facets are not required by
>      the final C++ standard.  At the time of this writing,
>      the question of whether this situation is a defect
>      in the standard or whether the facets were
>      omitted intentionally is under discussion."
>
> Does anyone know the status of this?  In the future can I count on
> codecvt_byname or not?

An implementation is not required (but it is not prohibited, either,
although this is currently open library issue 120) to provide
instantiations (specializations) of the codecvt_byname<> template, but
the primary template must still be defined (see 22.2.1.6). Unless your
specialization differs from all of those provided by the
implementation (e.g., by specializing on your own stateT), your code
would fail to compile.

>
> Also in general, does writing my own templates for these encoding
> conversions sound like perhaps not such a good idea to anyone?  Is it
> overkill?  Has someone done it already?  Is there a better way?

If all you want to do is convert between the character sets you show,
you should probably derive from codecvt<char, char, mbstate_t> instead
of specializing either template. Choose codecvt<> over codecvt_byname<>
if your conversion is locale-independent, and use derivation rather
than specialization since none of the template parameters differs.

--Martin

Author: "Joe O'Leary" <joleary@artisoft.com>
Date: 2000/05/28

I'm looking into devising codecvt specializations to convert among the
various character encodings out there (ISO-8859, EBCDIC, US-ASCII, etc).
However, since these will all end up using codecvt<char, char, mbstate_t>,
I'm guessing that I'll need something like codecvt_byname. I can see it in
the standard.

However, in my copy of "Standard C++ Iostreams and Locales" by Langer and
Kreft, I came across the following ominous footnote (page 302):

    "The codecvt_byname facets are not required by
     the final C++ standard.  At the time of this writing,
     the question of whether this situation is a defect
     in the standard or whether the facets were
     omitted intentionally is under discussion."

Does anyone know the status of this?  In the future can I count on
codecvt_byname or not?

Also in general, does writing my own templates for these encoding
conversions sound like perhaps not such a good idea to anyone?  Is it
overkill?  Has someone done it already?  Is there a better way?

Thanks

Joe O'
