Thread

Topic: FDIS <locale> questions

Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/04/16

Bjorn Fahller <Bjorn.Fahller@ebc.ericsson.se> wrote:
>Nathan Myers wrote:
>
>Thanks, this clears up a bit, although everything isn't clear yet.
>
>> >1. Immutable.
>> >22.1.1/6 says that locale objects are immutable and especially that a
>> >facet reference retrieved from one is valid as long as the locale
>> >object exists. This is emphasised further by 22.1.2/1-4, use_facet.
>> >This, I think, contradicts the very existence of locale::operator=()
>> >(22.1.1.2/3-5).
>>
>> Locale objects have something like reference semantics.

Let's simplify a bit.

  std::locale a = std::locale("fr");
  std::locale b = std::locale("de");
  typedef std::ctype<char> Ct;
  const Ct& ct = std::use_facet<Ct>(a);
  a = b;
  ct.is(Ct::upper, 'a');  // error

>As I can see it, there are 3 alternatives.
>
>1. a is changed so that locale("fr") is now deleted, in which case ct is not
>   valid (in violation with 22.1.1/6 and 22.1.2/2)
>
>2. a refers to both locale("fr") and locale("de")
>   (if so, which is returned when calling use_facet<...>(a)?)
>
>3. a forgets about locale("fr") and we have a memory leak.
>
>Is there any other scenario that I've missed? Alternative 2 seems to make
>sense, although it means that locales must also have a collection of no longer
>used facets.

The value, locale("fr"), that was referenced by a exists in the program
until it is replaced by a copy of locale("de").  After that, references
to facets of the value locale("fr") are invalid.  This does not
violate 22.1.1/6 or 22.1.2/2.

In reference semantics it's important to keep the distinction
between variables and values.  Variables hold references to
values.  When a variable refers to a different value, the old
value may disappear.
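
To illustrate the distinction (a minimal sketch, not from the original post): if a second variable still refers to the old value, a facet reference obtained from it should stay usable after the first variable is reassigned. The helper name facet_survives_reassignment is hypothetical, and the sketch uses std::locale::classic() instead of named locales such as "fr", which may not be installed.

```cpp
#include <cassert>
#include <locale>

// Hypothetical helper: returns true if a facet reference still works
// after the variable it was obtained from has been reassigned.
bool facet_survives_reassignment() {
    typedef std::ctype<char> Ct;
    std::locale a = std::locale::classic();
    std::locale keep = a;                  // second variable, same value
    const Ct& ct = std::use_facet<Ct>(a);
    a = std::locale();                     // 'a' no longer pins the old value
    // 'keep' still refers to the value ct came from, so ct remains valid.
    return ct.is(Ct::upper, 'A') && !ct.is(Ct::upper, 'a');
}
```

Without the extra variable keep, nothing in the program would hold the old value any more, which is the case the example above marks as an error.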

>> >2. *_get<charT>::do_get and error flags.
>> >As I read 22.2.2.1.2/16 (num_get<T>::do_get for bool when boolalpha is
>> >set,) eofbit should not be set if eof is encountered (in==end) when the
>> >last character which forms a complete match is read. Also, it appears
>> >like failbit should not be set (only eofbit) if eof is found before
>> >having read the complete true/false word (but they match as far as
>> >have been read.) These two cases contradict the behaviour for numeric
>> >values, for which eof while parsing is not an error if the characters
>> >extracted so far yield a legal value for the data type (22.2.2.1.2/1-13).
>> >money_get also disallows success when eof is encountered
>> >(22.2.6.1.2/1.) num_get explicitly sets the error flag to goodbit on
>> >success (22.2.2.1.2/16,) while money_get promises not to change it
>> >(22.2.6.1.2/1.) num_get also sets the error flag to failbit and eofbit
>> >when needed (22.2.6.1.2/16,) while money_get adds the bits (bitwise
>> >or) (22.2.6.1.2/1.) Why this difference?
>>
>> These are results of different authors.  They should all be the same.
>
>Where "same" is what? To set failbit if eof is found after having extracted
>enough data to yield a valid value, or not to? To set eofbit if eof is found
>after having extracted enough data to yield a valid value or not to? To
>explicitly set error state, or to add with logical or?

Failbit should not be set if you got a valid value, regardless of
whether EOF showed up after.  Failbit should be set if you didn't get
a valid value, regardless of whether EOF is the reason.  eofbit
should be set if it got EOF when that is the reason for failure.
Should eofbit be set when you got a valid parse, but EOF delimited
it?  I don't recall.
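
The failbit rule can be checked with a small sketch (not from the original post). The struct ParseResult and the function parse_long are hypothetical names; the sketch calls num_get<char>::get directly in the classic locale and deliberately makes no claim about eofbit after a valid parse, since that is the open question above.

```cpp
#include <cassert>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical result type: the parsed value plus the reported error state.
struct ParseResult {
    long value;
    std::ios_base::iostate state;
};

// Parse a long from text with num_get and report the resulting state.
ParseResult parse_long(const std::string& text) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    typedef std::istreambuf_iterator<char> Iter;
    const std::num_get<char>& ng =
        std::use_facet<std::num_get<char> >(in.getloc());
    ParseResult r = { 0, std::ios_base::goodbit };
    ng.get(Iter(in), Iter(), in, r.state, r.value);
    return r;
}
```

With this, parse_long("42") is expected to yield 42 without failbit even though the input ends right after the digits, while parse_long("abc") is expected to report failbit.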

>> In numeric, and particularly monetary, parsing, there are no minor errors.
>> Any error could mean anything.   Grouping for both numbers and money
>> should be the same: if there is no grouping, money parsing should stop
>> at the first separator.
>
>So, if I want a liberal interpretation of grouping, I must extract the
>characters into a string, remove all thousands separators, imbue an
>istringstream object with a locale that doesn't expect grouping, and
>extract the value?

Any locale will accept a number with no grouping.  Toss out the
separators and feed it back to the same locale.  You're not doing
users any favors, though, unless you format it back out and let
the user confirm the result.
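
A rough sketch of that suggestion (not from the original post) follows. The function name parse_liberal is hypothetical; it drops every occurrence of the locale's thousands separator and then parses the remainder with the same locale. It does not check that the whole string was consumed, and it does not echo the value back for confirmation, which the advice above recommends doing.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: strip the locale's thousands separators, then let
// the same locale parse what is left. Returns false if no number was read.
bool parse_liberal(const std::string& text, const std::locale& loc, long& out) {
    const char sep = std::use_facet<std::numpunct<char> >(loc).thousands_sep();
    std::string cleaned;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != sep) {
            cleaned += text[i];
        }
    }
    std::istringstream in(cleaned);
    in.imbue(loc);
    in >> out;
    return !in.fail();
}
```

In the classic locale the separator is ',', so "1,234,567" and the oddly grouped "12,34567" would both be accepted, which is exactly why confirming the result with the user matters.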

>> >Why is a postfix currency symbol not extracted by
>> >money_get<T>::do_get,
>> >unless required to complete a negative number pattern? Making
>> >money_get<T>::do_get always extract the entire currency string (if not
>> >in error) would not make the library much heavier, but would
>> >drastically ease application programming. Now the application
>> >programmer must check for the negative pattern, the result
>> >(if it's negative) and then maybe read a currency symbol.
>>
>> An optional trailing currency symbol would require infinite
>> push-back, but the input iterator offers no pushback at all.
>
>Or, one could just set failbit and abort parsing if an unexpected
>character is found while extracting the currency symbol, which seems
>to be the case when it must be extracted (if the negative or positive
>pattern so requires.)

But then it's not optional.  What if the first character doesn't match?
How do we know if that's an error or just the next field?  Monetary
parsing is hard if there are too many options.  You need to constrain
the problem space before you can say or do anything useful.

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: Bjorn Fahller <Bjorn.Fahller@ebc.ericsson.se>
Date: 1998/04/16

Nathan Myers wrote:

OK, now I know how it's supposed to work.

> Let's simplify a bit.
>
>   std::locale a = std::locale("fr");
>   std::locale b = std::locale("de");
>   typedef std::ctype<char> Ct;
>   const Ct& ct = std::use_facet<Ct>(a);
>   a = b;
>   ct.is(Ct::upper, 'a');  // error

> The value, locale("fr"), that was referenced by a exists in the program
> until it is replaced by a copy of locale("de").  After that, references
> to facets of the value locale("fr") are invalid.  This does not
> violate 22.1.1/6 or 22.1.2/2.

Reading 22.1.1/6 more carefully, I agree. It does indeed say the reference is valid
as long as the locale value exists. 22.1.2/4 says the reference is valid at least as
long as any copy of the locale object exists, which is clearly not the case. I
assume the latter is a typo, though.   _
/Bjorn.

Author: bjorn@algonet.se (Bjorn Fahller)
Date: 1998/04/13

I've been reading through FDIS (Nov 13 1997) chapter 22, locale, and
there are a number of things I wonder about.

1. Immutable.
22.1.1/6 says that locale objects are immutable and especially that a
facet reference retrieved from one is valid as long as the locale
object exists. This is emphasised further by 22.1.2/1-4, use_facet.
This, I think, contradicts the very existence of locale::operator=()
(22.1.1.2/3-5).

2. *_get<charT>::do_get and error flags.
As I read 22.2.2.1.2/16 (num_get<T>::do_get for bool when boolalpha is
set,) eofbit should not be set if eof is encountered (in==end) when the
last character which forms a complete match is read. Also, it appears
like failbit should not be set (only eofbit) if eof is found before
having read the complete true/false word (but they match as far as
have been read.) These two cases contradict the behaviour for numeric
values, for which eof while parsing is not an error if the characters
extracted so far yield a legal value for the data type (22.2.2.1.2/1-13).
money_get also disallows success when eof is encountered
(22.2.6.1.2/1.) num_get explicitly sets the error flag to goodbit on
success (22.2.2.1.2/16,) while money_get promises not to change it
(22.2.6.1.2/1.) num_get also sets the error flag to failbit and eofbit
when needed (22.2.6.1.2/16,) while money_get adds the bits (bitwise
or) (22.2.6.1.2/1.) Why this difference?

3. num_get and money_get on grouping
Why so strict on checking the grouping? Just about the only situation
when grouping can be expected is in user interaction, and that is
exactly the situation when it's necessary to be a little liberal with
interpreting the input. Also, why the difference in treatment of
grouping errors when grouping is not enabled? num_get extracts all
characters and then sets failbit if unexpected thousands separators
were found (22.2.2.1.2/8-12,) while money_get stops with failbit
when hitting the first thousands separator (22.2.6.1.2/1.)

4. currency symbol
How come the currency symbol is optional if showbase is disabled
(22.2.6.1.2/2,) when thousands separators are errors when grouping is
disabled (22.2.6.1.2/1)? This feels inconsistent.
Why is a postfix currency symbol not extracted by
money_get<T>::do_get,
unless required to complete a negative number pattern? Making
money_get<T>::do_get always extract the entire currency string (if not
in error) would not make the library much heavier, but would
drastically ease application programming. Now the application
programmer must check for the negative pattern, the result
(if it's negative) and then maybe read a currency symbol.

5. ctype<charT>::to_upper
22.2.1.1.2/7-8 describes the single character, and the string form of
the to_upper member function. For the single character version, it is
said "returns the corresponding upper-case character if it is known to
exist, or its argument if it is not." For the string version, it is said
"replaces each character *p in the range [low,high) for which a
corresponding upper-case character exists, with that character." What
does the string version do if such a character does not exist? As a
real-world example of such, consider the German double-s (looks almost
like the Greek letter beta,) which in its upper case incarnation is
"SS".

6. time (22.2.5)
Is there a special reason why there is no time_punct facet for use by
time_get and time_put, when there are money_punct and num_punct? It
would seem both useful, and more consistent.

7. locale::combine
22.1.1.3/1-4 describes the "combine" member function, which appears to
be non-mutating. As such, should it not be a const member function?

8. return value from num_get<T>::do_get
What is returned? 22.2.2.1.2/1-13 does not say.

9. Facet constructor
22.1.1.2/2 mentions how "facet" behaves for "refs" values of 0 and 1,
but what about the other 2^sizeof(size_t)-2 values? They are not stated
to be illegal, yield undefined behaviour or anything to that effect. In
fact, they're not mentioned at all.

10. category enumeration
22.1.1.1.1/1 lists the members and requirements on the "category"
enumeration. Is "all" supposed to be part of the left hand side
expression?

11. collate<T>::do_transform
Here I guess I'm just dense, but I've read 22.2.4.1.2/2 several times
over, and I can't make any sense out of the 27 words. How does it work
and how is it supposed to be used?

   _
/Bjorn.

Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1998/04/14

Bjorn Fahller <bjorn@algonet.se> wrote:
>I've been reading through FDIS (Nov 13 1997) chapter 22, locale, and
>there are a number of things I wonder about.

There are some good questions here.  Thank you for reading carefully.

>1. Immutable.
>22.1.1/6 says that locale objects are immutable and especially that a
>facet reference retrieved from one is valid as long as the locale
>object exists. This is emphasised further by 22.1.2/1-4, use_facet.
>This, I think, contradicts the very existence of locale::operator=()
>(22.1.1.2/3-5).

Locale objects have something like reference semantics.

>2. *_get<charT>::do_get and error flags.
>As I read 22.2.2.1.2/16 (num_get<T>::do_get for bool when boolalpha is
>set,) eofbit should not be set if eof is encountered (in==end) when the
>last character which forms a complete match is read. Also, it appears
>like failbit should not be set (only eofbit) if eof is found before
>having read the complete true/false word (but they match as far as
>have been read.) These two cases contradict the behaviour for numeric
>values, for which eof while parsing is not an error if the characters
>extracted so far yield a legal value for the data type (22.2.2.1.2/1-13).
>money_get also disallows success when eof is encountered
>(22.2.6.1.2/1.) num_get explicitly sets the error flag to goodbit on
>success (22.2.2.1.2/16,) while money_get promises not to change it
>(22.2.6.1.2/1.) num_get also sets the error flag to failbit and eofbit
>when needed (22.2.6.1.2/16,) while money_get adds the bits (bitwise
>or) (22.2.6.1.2/1.) Why this difference?

These are results of different authors.  They should all be the same.

>3. num_get and money_get on grouping
>Why so strict on checking the grouping? Just about the only situation
>when grouping can be expected is in user interaction, and that is
>exactly the situation when it's necessary to be a little liberal with
>interpreting the input. Also, why the difference in treatment of
>grouping errors when grouping is not enabled? num_get extracts all
>characters and then sets failbit if unexpected thousands separators
>were found (22.2.2.1.2/8-12,) while money_get stops with failbit
>when hitting the first thousands separator (22.2.6.1.2/1.)

In numeric, and particularly monetary, parsing, there are no minor errors.
Any error could mean anything.   Grouping for both numbers and money
should be the same: if there is no grouping, money parsing should stop
at the first separator.

>4. currency symbol
>How come the currency symbol is optional if showbase is disabled
>(22.2.6.1.2/2,) when thousands separators are errors when grouping is
>disabled (22.2.6.1.2/1)? This feels inconsistent.

Perhaps it is inconsistent.  It doesn't seem wrong to me.

>Why is a postfix currency symbol not extracted by
>money_get<T>::do_get,
>unless required to complete a negative number pattern? Making
>money_get<T>::do_get always extract the entire currency string (if not
>in error) would not make the library much heavier, but would
>drastically ease application programming. Now the application
>programmer must check for the negative pattern, the result
>(if it's negative) and then maybe read a currency symbol.

An optional trailing currency symbol would require infinite
push-back, but the input iterator offers no pushback at all.

>5. ctype<charT>::to_upper
>22.2.1.1.2/7-8 describes the single character, and the string form of
>the to_upper member function. For the single character version, it is
>said "returns the corresponding upper-case character if it is known to
>exist, or its argument if it is not." For the string version, it is said
>"replaces each character *p in the range [low,high) for which a
>corresponding upper-case character exists, with that character." What
>does the string version do if such a character does not exist? As a
>real-world example of such, consider the German double-s (looks almost
>like the Greek letter beta,) which in its upper case incarnation is
>"SS".

The to_upper behavior is based on the C library toupper.  What
do C libraries do?  Probably these features will be found to be
insufficient for many applications, and will be superseded by more
appropriate extensions that will then be standardized.

The C++ locale's current built-in facets only implement what is
obtainable from standard C locale descriptions.  Later I expect
there will be other standard facilities to draw upon, and C++
facet bindings for them.
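
As a sketch of the range form under discussion (not from the original post), the function below upper-cases a copy of a string with the ctype facet, whose member is spelled toupper in the library. The name upper_copy is hypothetical. The sketch only uses the classic locale, so it cannot show the German double-s case; it shows that characters with no upper-case counterpart, such as digits and punctuation, are left unchanged.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: upper-case a copy of s using the ctype<char> facet
// of loc. The range form works in place on [low, high).
std::string upper_copy(const std::locale& loc, std::string s) {
    if (!s.empty()) {
        std::use_facet<std::ctype<char> >(loc).toupper(&s[0], &s[0] + s.size());
    }
    return s;
}
```

Because the mapping is one character to one character, a result like "SS" for a single input character cannot be produced by this interface.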

>6. time (22.2.5)
>Is there a special reason why there is no time_punct facet for use by
>time_get and time_put, when there are money_punct and num_punct?  It
>would seem both useful, and more consistent.

No special reason.  It would be hard to specify such a facet, although
it may not be impossible.  Perhaps such a facet will be specified in
the next standard.

>7. locale::combine
>22.1.1.3/1-4 describes the "combine" member function, which appears to
>be non-mutating. As such, should it not be a const member function?

That's a (minor) bug.  Expect it to be fixed via defect report.
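
A small sketch of combine used as a non-mutating operation (not from the original post): DotSep and merge_numpunct are hypothetical names. The call through a const reference compiles only where combine is declared const, which is the fix anticipated above.

```cpp
#include <cassert>
#include <locale>

// Hypothetical facet: a numpunct<char> whose thousands separator is '.'.
struct DotSep : std::numpunct<char> {
protected:
    char do_thousands_sep() const { return '.'; }
};

// Build a new locale that is 'base' with the numpunct<char> facet taken
// from 'other'. Neither argument is modified.
std::locale merge_numpunct(const std::locale& base, const std::locale& other) {
    return base.combine<std::numpunct<char> >(other);
}
```

A caller could build other as std::locale(std::locale::classic(), new DotSep) and pass the classic locale as base; the classic locale itself keeps its ',' separator.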

>8. return value from num_get<T>::do_get
>What is returned? 22.2.2.1.2/1-13 does not say.

It's supposed to say it returns the input iterator argument pointing
one past the last character consumed, just like all the other get()
functions.
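
The intended return value can be observed with a short sketch (not from the original post). The function name first_unconsumed is hypothetical; it parses a long with num_get<char>::get and returns the character the returned iterator points at, or '\0' if the input was exhausted.

```cpp
#include <cassert>
#include <ios>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: parse a long from text and report the first
// character that num_get did not consume ('\0' if nothing is left).
char first_unconsumed(const std::string& text) {
    std::istringstream in(text);
    in.imbue(std::locale::classic());
    typedef std::istreambuf_iterator<char> Iter;
    const std::num_get<char>& ng =
        std::use_facet<std::num_get<char> >(in.getloc());
    std::ios_base::iostate err = std::ios_base::goodbit;
    long value = 0;
    Iter rest = ng.get(Iter(in), Iter(), in, err, value);
    return rest == Iter() ? '\0' : *rest;
}
```

For the input "42;rest" the returned iterator is expected to point at ';', one past the last digit consumed.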

>9. Facet constructor
>22.1.1.2/2 mentions how "facet" behaves for "refs" values of 0 and 1,
>but what about the other 2^sizeof(size_t)-2 values? They are not stated
>to be illegal, yield undefined behaviour or anything to that effect. In
>fact, they're not mentioned at all.

This is deliberate; it leaves room for vendor extensions.  (In practice,
it's a reference count.)
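
For the two documented values, a sketch of a user-defined facet may help (not from the original post). Greeting, with_greeting and greeting_from are hypothetical names. With refs == 0 the locale machinery deletes the facet when the last locale holding it goes away; with refs == 1 the lifetime is left to the program.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <string>

// Hypothetical user-defined facet with its own id.
class Greeting : public std::locale::facet {
public:
    static std::locale::id id;
    explicit Greeting(std::size_t refs = 0) : std::locale::facet(refs) {}
    std::string text() const { return "hello"; }
};
std::locale::id Greeting::id;

// refs == 0: the returned locale (and its copies) own the new facet.
std::locale with_greeting() {
    return std::locale(std::locale::classic(), new Greeting);
}

// Look the facet up, returning an empty string if it is not installed.
std::string greeting_from(const std::locale& loc) {
    return std::has_facet<Greeting>(loc) ? std::use_facet<Greeting>(loc).text()
                                         : std::string();
}
```

Nothing here relies on values of refs other than 0 and 1, which stay unspecified as described above.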

>10. category enumeration
>22.1.1.1.1/1 lists the members and requirements on the "category"
>enumeration. Is "all" supposed to be part of the left hand side
>expression?

Yes.  This leaves room for vendor extensions.

>11. collate<T>::do_transform
>Here I guess I'm just dense, but I've read 22.2.4.1.2/2 several times
>over, and I can't make any sense out of the 27 words. How does it work
>and how is it supposed to be used?

This is the same as the C library function strxfrm().  The point is
that if you are doing (e.g.) a binary search of a table of strings,
you don't want to run compare() on both the target string and each
table element, because that runs the state machine against the target
string for each comparison.  Instead, you can run it once on the
target string, then once on each element it is being compared against.
(You could even store the result in the table, for each element.)
The result of transform can be compared lexicographically, which
may be orders of magnitude faster.
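
A sketch of that usage (not from the original post): sort_key and sorted_by_key are hypothetical names. Each string is transformed once through the public collate<char>::transform, and the sort then compares the stored keys with ordinary string comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <locale>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: the collation key of s in loc.
std::string sort_key(const std::locale& loc, const std::string& s) {
    const std::collate<char>& coll = std::use_facet<std::collate<char> >(loc);
    return coll.transform(s.data(), s.data() + s.size());
}

// Sort words by locale collation order, transforming each word only once.
std::vector<std::string> sorted_by_key(const std::locale& loc,
                                       std::vector<std::string> words) {
    std::vector<std::pair<std::string, std::string> > keyed;
    for (std::size_t i = 0; i < words.size(); ++i) {
        keyed.push_back(std::make_pair(sort_key(loc, words[i]), words[i]));
    }
    std::sort(keyed.begin(), keyed.end());  // compares the keys first
    for (std::size_t i = 0; i < words.size(); ++i) {
        words[i] = keyed[i].second;
    }
    return words;
}
```

Whether this is actually faster than calling compare() depends on the locale and the implementation; the sketch only shows the pattern.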

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/

Author: Bjorn Fahller <Bjorn.Fahller@ebc.ericsson.se>
Date: 1998/04/14

Nathan Myers wrote:

Thanks, this clears up a bit, although everything isn't clear yet.

> >1. Immutable.
> >22.1.1/6 says that locale objects are immutable and especially that a
> >facet reference retrieved from one is valid as long as the locale
> >object exists. This is emphasised further by 22.1.2/1-4, use_facet.
> >This, I think, contradicts the very existence of locale::operator=()
> >(22.1.1.2/3-5).
>
> Locale objects have something like reference semantics.

Hmmm...

class MyFacetBase : public std::locale::facet
{
public:
  MyFacetBase() : facet(0) {} // use ref-count
  static std::locale::id id;
  ...
};

class Facet1 : public MyFacetBase ...

class Facet2 : public MyFacetBase ...

// Facet1 and Facet2 have the same category since they share id.

std::locale loc1(std::locale(), new Facet1);

std::locale loc2(std::locale(), new Facet2);

// The Facet1 and Facet2 objects should be deleted when the last locale object
// referring to them is destroyed, according to 22.1.1.1.2/2

const MyFacetBase& f1 = std::use_facet<MyFacetBase>(loc1);

// f1 MUST be valid as long as any copy of loc1 exists, according
// to 22.1.1/6 and 22.1.2/2

loc1 = loc2; // What happens here?


As I can see it, there are 3 alternatives.

1. loc1 is changed so that Facet1 is now deleted, in which case f1 is not
valid (in violation with 22.1.1/6 and 22.1.2/2)

2. loc1 refers to both Facet1 and Facet2 (if so, which is returned when
calling use_facet<MyFacetBase>(loc1)?)

3. loc1 forgets about Facet1 and we have a memory leak.

Is there any other scenario that I've missed? Alternative 2 seems to make
sense, although it means that locales must also have a collection of no longer
used facets.


> >2. *_get<charT>::do_get and error flags.
> >As I read 22.2.2.1.2/16 (num_get<T>::do_get for bool when boolalpha is
> >set,) eofbit should not be set if eof is encountered (in==end) when the
> >last character which forms a complete match is read. Also, it appears
> >like failbit should not be set (only eofbit) if eof is found before
> >having read the complete true/false word (but they match as far as
> >have been read.) These two cases contradict the behaviour for numeric
> >values, for which eof while parsing is not an error if the characters
> >extracted so far yield a legal value for the data type (22.2.2.1.2/1-13).
> >money_get also disallows success when eof is encountered
> >(22.2.6.1.2/1.) num_get explicitly sets the error flag to goodbit on
> >success (22.2.2.1.2/16,) while money_get promises not to change it
> >(22.2.6.1.2/1.) num_get also sets the error flag to failbit and eofbit
> >when needed (22.2.6.1.2/16,) while money_get adds the bits (bitwise
> >or) (22.2.6.1.2/1.) Why this difference?
>
> These are results of different authors.  They should all be the same.

Where "same" is what? To set failbit if eof is found after having extracted
enough data to yield a valid value, or not to? To set eofbit if eof is found
after having extracted enough data to yield a valid value or not to? To
explicitly set error state, or to add with logical or?

> In numeric, and particularly monetary, parsing, there are no minor errors.
> Any error could mean anything.   Grouping for both numbers and money
> should be the same: if there is no grouping, money parsing should stop
> at the first separator.

So, if I want a liberal interpretation of grouping, I must extract the
characters into a string, remove all thousands separators, imbue an
istringstream object with a locale that doesn't expect grouping, and extract
the value?

> >Why is a postfix currency symbol not extracted by
> >money_get<T>::do_get,
> >unless required to complete a negative number pattern? Making
> >money_get<T>::do_get always extract the entire currency string (if not
> >in error) would not make the library much heavier, but would
> >drastically ease application programming. Now the application
> >programmer must check for the negative pattern, the result
> >(if it's negative) and then maybe read a currency symbol.
>
> An optional trailing currency symbol would require infinite
> push-back, but the input iterator offers no pushback at all.

Or, one could just set failbit and abort parsing if an unexpected character is
found while extracting the currency symbol, which seems to be the case when it
must be extracted (if the negative or positive pattern so requires.)   _
/Bjorn