Topic: P0085R0: Adding 0o for octal


Author: Thiago Macieira <thiago@macieira.org>
Date: Thu, 01 Oct 2015 11:24:29 -0700
The paper is good and follows the direction of the discussion that happened
on the mailing lists. But it has a few side effects, probably unintentional,
that need to be fixed.

The paper states

> Effects on existing code
>
> This proposal does not invalidate the existing syntax rule for integer
> literals starting with a zero. Also, under the current standard, any
> sequence starting with 0o is illegal. As a consequence, the proposed
> modification will not break existing code.

But it does. The problems are in the character literals. The paper proposes:

>       octal-escape-sequence:
>+            \o octal-digit
>             \ octal-digit
>-            \ octal-digit octal-digit
>-            \ octal-digit octal-digit octal-digit
>+            octal-escape-sequence octal-digit

It also removes

> The escape \ooo consists of the backslash followed by one, two, or three
> octal digits that are taken to specify the value of the desired character.

and replaces that with

> The escape \onnn consists of the backslash followed o followed by one or
> more octal digits that are taken to specify the value of the desired
> character. The escape \ooo consists of the backslash followed by one or
> more octal digits that are taken to specify the value of the desired
> character.

It goes further and adds "an octal or" in

> There is no limit to the number of digits in an octal or a hexadecimal
> sequence.

The side effect, probably unintended, is that the following code breaks:

 const char s[] = "\0567";

The variable s, in current C++ as well as in C, is a string containing two
characters: the first is 0o56 = 0x2E = 46 = '.' (in ASCII) and the second is
'7'. With the change, the above would be a string containing a single
character, 0o567 = 375, which is out of range on systems with 8-bit
characters.
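
To make the current behaviour concrete, here is a minimal sketch that checks
the two-character reading of the literal above. The function name is made up
for illustration; the check itself only relies on the existing rule that an
octal escape stops after three digits.

```cpp
// Hypothetical helper: returns true when "\0567" has the two-character
// reading described above (0x2E followed by '7').
bool octal_escape_stops_after_three_digits()
{
    const char s[] = "\0567";

    // Two characters plus the terminating NUL.
    static_assert(sizeof(s) == 3, "expected two characters and a NUL");

    // Octal 56 and hex 2E name the same value; the '7' is a separate character.
    return s[0] == '\x2E' && s[1] == '7' && s[2] == '\0';
}
```

With the grammar change proposed in the paper, the same literal would instead
be read as one escape followed by the terminator, so the static_assert would
no longer hold.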

Please don't break that. When used in character literals, octal sequences are
often used *because* their length is limited. That lets code generators be
stateless: each character can be escaped without looking at the one that
follows it. The equivalent hex sequence, by contrast, needs a non-hex break:

 const char s[] = "\x2E""7";
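
As an illustration of what "stateless" means here, this is a rough sketch of
such a generator. Both function names are hypothetical, and the code assumes
8-bit bytes, so that every value fits in three octal digits.

```cpp
#include <string>

// Hypothetical helper: emits one byte as a fixed-width, three-digit octal
// escape. Assumes 8-bit bytes (values 0..255, i.e. at most octal 377).
std::string octal_escape(unsigned char byte)
{
    std::string out = "\\";
    out += static_cast<char>('0' + ((byte >> 6) & 7));
    out += static_cast<char>('0' + ((byte >> 3) & 7));
    out += static_cast<char>('0' + (byte & 7));
    return out;
}

// Hypothetical helper: escapes every byte independently. No byte needs to
// know what follows it, because each escape is exactly three digits long.
std::string escape_all(const std::string &input)
{
    std::string out;
    for (unsigned char byte : input)
        out += octal_escape(byte);
    return out;
}
```

A hex-based generator could not be written this way: after emitting \x2E it
would have to check whether the next output character is a hex digit and, if
so, close and reopen the string literal as shown above.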

Therefore, I urge the following:

1) do not remove the 3-digit limitation from character literal octal escape
   sequences

2) if possible, require that the new \oNNN sequence be composed of exactly
   three digits, just as the Unicode escape sequences have a fixed length
   (\uNNNN and \UNNNNNNNN). I understand this would cause problems for
   sequences on machines with characters wider than 9 bits, but that would
   not be a new limitation and such environments are exceedingly rare.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358
