Topic: iostreams: Does imbue() need to be called before open()?
Author: juergen@monocerus.manannan.org (Juergen Heinzl)
Date: Mon, 13 Oct 2003 17:48:11 +0000 (UTC)
In article <2Ehib.745634$Ho3.185470@sccrnsc03>, "Chuck McDevitt" wrote:
> Someone told me that when using standard iostreams, if you want to use
> imbue(), you have to call imbue() before opening the iostream to a file.
>
> Is this true? This would be a really nasty restriction:
[-]
No, but ...
The but, to the best of my knowledge, is that code conversions can be
state-dependent and if your iostream is buffered it's possible you're
trying to imbue a new locale in the middle of a state-dependent
conversion.
So it is best either to imbue before doing anything else, or to do so
only after having flushed the stream's buffer.
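A minimal sketch of the first option, assuming a narrow-character input
file. open_with_locale is a hypothetical helper name, not part of the
standard library. The locale is imbued while the filebuf is still closed,
so no conversion can be in progress at that point:

```cpp
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: imbue first, open second.
// Returns true if the file could be opened.
bool open_with_locale(std::ifstream& in,
                      const std::string& path,
                      const std::locale& loc)
{
    in.imbue(loc);          // file not open yet: nothing buffered or converted
    in.open(path.c_str());  // the filebuf starts out with the imbued locale
    return in.is_open();
}
```

Called as open_with_locale(in, "data.txt", someLocale), where the file name
and locale are whatever the program needs.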
Hoping not to be totally wrong 8-}
Juergen
--
\ Real name : Juergen Heinzl \ no flames /
\ EMail Private : juergen@manannan.org \ send money instead /
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: kanze@gabi-soft.fr
Date: Tue, 14 Oct 2003 21:06:45 +0000 (UTC)
Chuck_McDevitt@c-o-m-c-a-s-t.net ("Chuck McDevitt") wrote in message
news:<2Ehib.745634$Ho3.185470@sccrnsc03>...
> Someone told me that when using standard iostreams, if you want to use
> imbue(), you have to call imbue() before opening the iostream to a
> file.
> Is this true?
It's not quite that bad, but...
> This would be a really nasty restriction:
There are more than a few nasty restrictions, linked with the use of the
codecvt facet in the filebuf. Basically, if rdbuf() != 0,
basic_ios::imbue calls rdbuf()->imbue. In the case of a filebuf, there
is a precondition: "If the file is not positioned at its beginning and
the encoding of the current locale as determined by
a_codecvt.encoding() is state-dependent then that facet is the same as
the corresponding facet of loc."
This means that if you are using a non-state-dependent encoding, you can
change facets at will. If the file was once imbued with a
state-dependent facet, however, you must take care to ensure that any imbued
locale contains the same codecvt facet; something along the lines of the
following would be safe:
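    // Cvt is the code-conversion facet type a filebuf uses for char streams.
    typedef std::codecvt< char, char, std::char_traits< char >::state_type >
            Cvt ;
    // locale::combine copies desiredLocale but takes the Cvt facet from the
    // stream buffer's current locale, so the conversion facet is unchanged.
    stream.imbue(
        desiredLocale.combine< Cvt >( stream.rdbuf()->getloc() ) ) ;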
Alternatively, you can avoid touching the locale of the filebuf by using
something like the following:
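    // source must be seen as a std::ios (or istream/ostream): the file
    // stream classes hide the one-argument rdbuf().
    std::streambuf* save = source.rdbuf() ;
    source.rdbuf( NULL ) ;       // detach the buffer (sets badbit)
    source.imbue( newLocale ) ;  // changes only the stream's own locale
    source.rdbuf( save ) ;       // reattach (resets the error state)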
This solution actually appeals to me as a means of managing the code
conversion in the filebuf separately from the locale used for
formatting/parsing.
> 1) you couldn't change locales part way through writing to/reading from a
> stream.
You can't in general. If the goal is to change the way e.g. numbers or
dates are parsed, see above. If the goal is to change the way the
stream is encoded, you can only do this if you have been using an
encoding without state previously.
For file types like HTML2 or XML, where you have to read into the file
in order to determine the encoding, you must start with plain ASCII (or
some other single byte encoding, like ISO 8859-1), and use that, hoping
for the best, until you can actually determine the encoding.
> 2) you couldn't construct an fstream from a FILE * and then call
> imbue(), since constructing from a FILE * means the file is already
> open.
You can't currently construct an fstream from a FILE* in a strictly
standard fstream; the ability to do so is an extension. So you must
check what the implementation says concerning the extension. Typically,
I would expect the same rules as after an open, i.e. those above.
> My guess is that it is legal to call imbue() at any time. What's the
> real answer?
Your guess is wrong.
--
James Kanze GABI Software mailto:kanze@gabi-soft.fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
Author: rmaddox@isicns.com (Randy Maddox)
Date: Wed, 15 Oct 2003 02:05:26 +0000 (UTC)
Chuck_McDevitt@c-o-m-c-a-s-t.net ("Chuck McDevitt") wrote in message news:<2Ehib.745634$Ho3.185470@sccrnsc03>...
> Someone told me that when using standard iostreams, if you want to use
> imbue(), you have to call imbue() before opening the iostream to a file.
>
> Is this true? This would be a really nasty restriction:
>
> 1) you couldn't change locales part way through writing to/reading from a
> stream.
> 2) you couldn't construct an fstream from a FILE * and then call imbue(),
> since constructing from a FILE * means the file is already open.
>
> My guess is that it is legal to call imbue() at any time. What's the real
> answer?
>
>
The real answer is that while it is legal to call imbue() at any time,
the fact of the matter is that this can be quite unsafe. If you look
at any of the examples in the Standard (e.g., 22.2.8/4/5) you will
note that each stream is imbued with a locale PRIOR to any
input/output, which should be safe.
Also, if you have access to the Langer and Kreft text "Standard C++
IOStreams and Locales", which should definitely be on your bookshelf,
a note in 2.1.4 headed "A Note on Proper Imbuing" points out some of
the hazards of changing a stream's locale during ongoing input/output.
Bottom line, as long as the call to imbue precedes any input/output
you should be good.
I believe this to be good advice to the best of my knowledge, but if
anyone else out there has better advice, please do jump right in. :-)
Randy.
Author: kanze@gabi-soft.fr
Date: Thu, 16 Oct 2003 08:47:38 +0000 (UTC)
rmaddox@isicns.com (Randy Maddox) wrote in message
news:<8c8b368d.0310140703.33216140@posting.google.com>...
> The real answer is that while it is legal to call imbue() at any time,
The real answer is that std::basic_ios::imbue calls rdbuf()->imbue, and
that the standard places some very concrete preconditions on
filebuf::imbue. Violating a precondition is NOT legal, and results in
undefined behavior.
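As a rough sketch of how one might test the relevant part of that
precondition before imbuing an open file: codecvt::encoding() returns -1
when the external encoding is state-dependent. The function name
encoding_is_state_dependent is hypothetical:

```cpp
#include <cwchar>   // std::mbstate_t
#include <locale>

// Hypothetical helper: true if the codecvt facet that a
// basic_filebuf<CharT> would take from 'loc' reports a
// state-dependent external encoding.
template <class CharT>
bool encoding_is_state_dependent(const std::locale& loc)
{
    typedef std::codecvt<CharT, char, std::mbstate_t> Cvt;
    return std::use_facet<Cvt>(loc).encoding() == -1;
}
```

If encoding_is_state_dependent<char>(file.rdbuf()->getloc()) is false, the
quoted precondition places no restriction on the new locale's codecvt
facet. If it is true and the file is not at its beginning, the new locale
has to carry the same facet.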
On the other hand, you really have to know where to look to find this
information. (Dietmar Kuehl pointed it out to me -- before that, I
pretty much thought that it was legal too.) You really shouldn't have
to search under filebuf to find a precondition for basic_ios (which
isn't always one, because if you know that your basic_ios is really a
stringstream, there's no problem).
--
James Kanze GABI Software mailto:kanze@gabi-soft.fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
Author: rmaddox@isicns.com (Randy Maddox)
Date: Fri, 17 Oct 2003 04:11:59 +0000 (UTC)
kanze@gabi-soft.fr wrote in message news:<d6652001.0310150522.c04bbe@posting.google.com>...
> rmaddox@isicns.com (Randy Maddox) wrote in message
> news:<8c8b368d.0310140703.33216140@posting.google.com>...
>
> > The real answer is that while it is legal to call imbue() at any time,
>
> The real answer is that std::basic_ios::imbue calls rdbuf()->imbue, and
> that the standard places some very concrete preconditions on
> filebuf::imbue. Violating a precondition is NOT legal, and results in
> undefined behavior.
>
> On the other hand, you really have to know where to look to find this
> information. (Dietmar Kuehl pointed it out to me -- before that, I
> pretty much thought that it was legal too.) You really shouldn't have
> to search under filebuf to find a precondition for basic_ios (which
> isn't always one, because if you know that your basic_ios is really a
> stringstream, there's no problem).
>
James,
Thanks for the clarification and direction. Following your pointer I
ended up in the Standard at 27.8.1.4/17-19 and found text quite
similar to the warnings given by Langer and Kreft. Good to know where
in the Standard those warnings came from. As noted there, it seems
that you should be OK in the case of non-state-dependent encoding, or,
as you note, with a stringstream. Is that accurate advice?
Also, if you would be so kind, could you please provide any insight
into the note in para. 19 about possibly requiring reconversion of
previously converted characters or reconstruction of the original file
contents. I'm not sure what that means, but it sounds scary.
In any case, it seems clear that changing the locale may drastically
alter how input/output data is converted, which in turn will certainly
have an impact on the calling code. One instance in which this seems
useful, and even necessary, is the case of an XML parser that reads an
initial header in the file to determine the encoding of the rest of
the file and then switches to that encoding.
Randy.
Author: kanze@gabi-soft.fr
Date: Sun, 19 Oct 2003 02:15:07 +0000 (UTC)
rmaddox@isicns.com (Randy Maddox) wrote in message
news:<8c8b368d.0310160704.2aa5526d@posting.google.com>...
> kanze@gabi-soft.fr wrote in message
> news:<d6652001.0310150522.c04bbe@posting.google.com>...
> > rmaddox@isicns.com (Randy Maddox) wrote in message
> > news:<8c8b368d.0310140703.33216140@posting.google.com>...
> > > The real answer is that while it is legal to call imbue() at any
> > > time,
> > The real answer is that std::basic_ios::imbue calls rdbuf()->imbue,
> > and that the standard places some very concrete preconditions on
> > filebuf::imbue. Violating a precondition is NOT legal, and results
> > in undefined behavior.
> > On the other hand, you really have to know where to look to find
> > this information. (Dietmar Kuehl pointed it out to me -- before
> > that, I pretty much thought that it was legal too.) You really
> > shouldn't have to search under filebuf to find a precondition for
> > basic_ios (which isn't always one, because if you know that your
> > basic_ios is really a stringstream, there's no problem).
> Thanks for the clarification and direction. Following your pointer I
> ended up in the Standard at 27.8.1.4/17-19 and found text quite
> similar to the warnings given by Langer and Kreft. Good to know where
> in the Standard those warnings came from. As noted there, it seems
> that you should be OK in the case of non-state-dependent encoding, or,
> as you note, with a stringstream. Is that accurate advice?
For the standard streambufs, yes; strstreambuf can of course be added to
the list of safe types. The problem is really that the standard gives no
specifications as to what the contract is -- a user supplied streambuf
can do anything it wishes. And when all you have got is an istream& or
an ostream&, you really have to suppose that you might be dealing with a
user defined streambuf.
In practice, I generally suppose that the only streambuf which actually
uses the locale is a filebuf, and suppose thus that its guarantees
hold. Unless, of course, I know the actual type of the streambuf -- in
such cases, you can count on the actual behavior of the streambuf.
> Also, if you would be so kind, could you please provide any insight
> into the note in para. 19 about possibly requiring reconversion of
> previously converted characters or reconstruction of the original file
> contents. I'm not sure what that means, but it sounds scary.
I think that it is only meant to be the rationale behind the
pre-conditions. I don't really understand it either, but if you have
problems with previously converted characters, it can only mean that
you've already read some.
Another thing that isn't clear is the relationship between the locale
and seeking. I would guess that the only case where imbue would be
allowed is after a seek to the beginning, but it would be nice if the
standard said so.
> In any case, it seems clear that changing the locale may drastically
> alter how input/output data is converted, which in turn will certainly
> have an impact on the calling code. One instance in which this seems
> useful, and even necessary, is the case of an XML parser that reads an
> initial header in the file to determine the encoding of the rest of
> the file and then switches to that encoding.
In the case of XML, it would seem that the information concerning the
code set can only appear after very limited header information, which
can only contain a limited number of characters and has a simple
structure. So you read ASCII (probably locale "C"), check the first few
characters, and rewind and shift to EBCDIC if necessary. Then you
finish reading up to the codeset information, and set the codeset. No
problem, since both ASCII and EBCDIC are single byte character codes,
without state information.
The case of HTML2 is somewhat more complex -- the codeset information is
nested deep in the <head> structure. And other elements, which may
precede it, may contain just about any character. The procedure for XML
should work, but anything with non-ASCII characters preceding the
codeset information will have been lost. In real life, I rather suspect
that I would read everything through the </head> tag into a string, more
or less byte by byte, and then work on the string. That way, once I
knew the codeset, I could easily start over. Of course, this means that
my code will be doing the translation (at least in the header), rather
than the filebuf, but I don't really see any other solution -- even if
it were guaranteed that I could call imbue after a rewind, this wouldn't
help if I were reading from a socket (not unlikely in the case of HTML),
or some other source which doesn't support rewind.
--
James Kanze GABI Software mailto:kanze@gabi-soft.fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16
Author: Chuck_McDevitt@c-o-m-c-a-s-t.net ("Chuck McDevitt")
Date: Mon, 13 Oct 2003 06:50:15 +0000 (UTC)
Someone told me that when using standard iostreams, if you want to use
imbue(), you have to call imbue() before opening the iostream to a file.
Is this true? This would be a really nasty restriction:
1) you couldn't change locales part way through writing to/reading from a
stream.
2) you couldn't construct an fstream from a FILE * and then call imbue(),
since constructing from a FILE * means the file is already open.
My guess is that it is legal to call imbue() at any time. What's the real
answer?