Topic: parsing strings at compile-time


Author: restor <akrzemi1@gmail.com>
Date: Wed, 23 Jun 2010 17:20:40 CST
Hi,
Compile-time string parsing would be a very useful tool for
implementing user-defined literals with a custom syntax or format. Two
examples:

 Date date1 = "29-Feb-2012"_date;

The compiler checks whether 2012 is a leap year, and hence whether it should
allow 29 as a day in February. If not, an error is reported during compilation.

 Regex regex1 = "[a-zA-Z_]+[0-9]*"_regex;

The compiler checks that we haven't made any syntax error in the expression.

The C++ Standards Committee rejected this addition on the grounds that it
would make parsing of string literals too troublesome for compiler
implementers (if I understood it correctly).

Now, I was thinking that we could achieve the same functionality by
allowing a small extension to constexpr functions:
allow references to arrays of a known size as arguments to constexpr
functions (or are they already allowed?), and allow the indexing
operator[] (of the built-in array type) to be an acceptable constexpr
operation on such arguments. I.e. the following should be valid:

 template< size_t N > constexpr
 char fun( char (&arr) [N], int i )
 {
   return arr[i];
 }

Having this, we can add both run-time and compile-time reporting of
invalid index values:

 template< size_t N > constexpr
 char fun( char (&arr) [N], int i )
 {
   return ( i >= 0 && i < N )
            ? arr[i]
                : throw Exception();
 }

This works at compile-time because for valid values of i the
throw expression is not evaluated, and for invalid values of i we
get a compilation error saying that the call is not a constant expression.
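
A minimal sketch of this idea, assuming a compiler that implements C++11 as finally published (the draft under discussion may treat the throw differently). The name at and the use of std::out_of_range are illustrative choices, not part of the proposal:

```cpp
#include <cstddef>
#include <stdexcept>

// Bounds-checked element access usable in constant expressions.
// The throw is reached only when the index is out of range.
template <std::size_t N>
constexpr char at(const char (&arr)[N], std::size_t i)
{
    return i < N ? arr[i]
                 : throw std::out_of_range("index out of range");
}

// Evaluated at compile time: the throw branch is never taken.
static_assert(at("abc", 1) == 'b', "second character is 'b'");

// Uncommenting the next line is expected to be rejected by the compiler,
// because evaluating the throw is not a constant expression:
// static_assert(at("abc", 7) == 'x', "out of range");
```

Called with an index that is only known at run time, the same function would throw instead of failing to compile.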

Below is an attempt to implement a compile-time string parsing
tool that would be useful for checking the validity of a date. I.e. it
would parse strings like "12-JUL-2007" and either produce a Date or fail
to compile. I skip some parts, as I only wanted to prove the possibility
of implementing the extraction of substrings, iteration, and error
reporting.

 // ------ THE FACILITY FOR EXTRACTING SUBSTRINGS ------

 template< size_t N >
 struct SubStr
 {
       const char (&arr) [N + 1];
       size_t beg, end;
       constexpr SubStr( const char (&arr) [N + 1], size_t, size_t );
       constexpr char operator[]( size_t i ) {
               return i >= end - beg ? throw Error()
                    : arr[beg + i];
       }
 };

 template< size_t N > constexpr
 SubStr<N> subStr( const char (&arr) [N + 1], size_t beg, size_t end )
 {
   return SubStr<N>( arr, beg, end );
 }
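
A compilable sketch of the substring facility, under two assumptions that differ from the code above: the template parameter N is the full array extent (including the terminating '\0'), so that it can be deduced from the argument, and std::out_of_range stands in for the unspecified Error type. The size() member is an addition for illustration. C++11 as published is assumed:

```cpp
#include <cstddef>
#include <stdexcept>

// View over the half-open range [beg, end) of a character array of extent N.
template <std::size_t N>
struct SubStr
{
    const char (&arr)[N];
    std::size_t beg, end;

    // Rejects ranges that are reversed or extend past the array.
    constexpr SubStr(const char (&a)[N], std::size_t b, std::size_t e)
        : arr(a),
          beg(b <= e && e <= N ? b : throw std::out_of_range("bad range")),
          end(e)
    {}

    constexpr std::size_t size() const { return end - beg; }

    // Index relative to beg; out-of-range access throws.
    constexpr char operator[](std::size_t i) const
    {
        return i < end - beg ? arr[beg + i]
                             : throw std::out_of_range("bad index");
    }
};

template <std::size_t N>
constexpr SubStr<N> subStr(const char (&arr)[N], std::size_t beg, std::size_t end)
{
    return SubStr<N>(arr, beg, end);
}

static_assert(subStr("12-JUL-2007", 3, 6).size() == 3, "month has three letters");
static_assert(subStr("12-JUL-2007", 3, 6)[0] == 'J', "month starts with 'J'");
```

Storing a reference to the array keeps the view a literal type, which is what allows it to be passed between constexpr functions.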

 // ---- FUNCTIONS FOR PARSING COMPILETIME STRINGS ----

 template< size_t N, size_t M > constexpr
 bool equals( SubStr<N>, SubStr<M> );

 template< size_t N, size_t M > constexpr
 bool equals( SubStr<N> str, const char (&arr) [M + 1] )
 {
   return equals( str, subStr(arr, 0, M) );
 }

 template< size_t N > constexpr
 int toInt( SubStr<N> );

 // ----

 template< size_t N > constexpr
 unsigned month( SubStr<N> str )
 {
   return equals(str, "JAN") ? 1
            : equals(str, "FEB") ? 2
                  ...
                : equals(str, "DEC") ? 12
                : throw InvalidDate();
 }

 constexpr
 Date date( int day, int mon, int year )
 {
       return (day < 1 || day > 31) ? throw InvalidDate()
                : invalid31mon(day, mon) ? throw InvalidDate()
                : invalid30mon(day, mon) ? throw InvalidDate()
                : invalid29mon(day, mon, year) ? throw InvalidDate()
                : Date( day, mon, year );
 }

 constexpr
 Date date( const char (&arr) [11 + 1] )
 {
   return date(
         toInt( subStr(arr, 0, 2) ),
         month( subStr(arr, 3, 6) ),
         toInt( subStr(arr, 7, 11) )
       );
 }
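
The day/month/year validation can be sketched without the undefined helpers invalid31mon, invalid30mon and invalid29mon by computing the number of days in the month instead. This is a substitute technique, not the helpers intended above; isLeap, daysInMonth, makeDate and the aggregate Date are hypothetical names, std::invalid_argument stands in for InvalidDate, and C++11 as published is assumed:

```cpp
#include <stdexcept>

// Gregorian leap-year rule.
constexpr bool isLeap(int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

// Number of days in the given month (1..12) of the given year.
constexpr int daysInMonth(int mon, int year)
{
    return (mon < 1 || mon > 12) ? throw std::invalid_argument("bad month")
         : mon == 2 ? (isLeap(year) ? 29 : 28)
         : (mon == 4 || mon == 6 || mon == 9 || mon == 11) ? 30
         : 31;
}

// Hypothetical stand-in for the Date class used in the post.
struct Date { int day, mon, year; };

// Produces a Date, or throws (at compile time: fails to compile) if invalid.
constexpr Date makeDate(int day, int mon, int year)
{
    return (day < 1 || day > daysInMonth(mon, year))
               ? throw std::invalid_argument("bad day")
               : Date{day, mon, year};
}

static_assert(makeDate(29, 2, 2012).day == 29, "2012 is a leap year");

// Expected to be rejected at compile time, since 2011 is not a leap year:
// static_assert(makeDate(29, 2, 2011).day == 29, "invalid date");
```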

 //  ------ FUNCTIONS FOR ITERATING OVER THE STRING (2 DIRECTIONS) ----

 template< size_t N, typename Trnsf, typename State > constexpr
 State iterFwd( SubStr<N> str, size_t I, Trnsf transform, State state )
 {
   return (I  > N) ? throw Exception()
         : (I == N) ? state
         : iterFwd( str, I + 1, transform, transform(state, str[I]) );
 }

 template< size_t N, typename Trnsf, typename State > constexpr
 State iterBack( SubStr<N> str, size_t I, Trnsf transform, State state )
 {
   return (I  >= N) ? throw Exception()
         : (I == 0) ? transform( state, str[0] )
         : iterBack( str, I - 1, transform, transform(state, str[I]) );
 }

 // ---- EXAMPLE OF USAGE OF THE ITERATION FUNCTION ----

 struct GetNum;
 struct NumCollector; // have to be literal types

 template< size_t N > constexpr
 int toInt( SubStr<N> str )
 {
   return iterBack( str, N - 1, GetNum(), NumCollector() );
 }
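
The iteration scheme amounts to a recursive fold over the characters. A minimal sketch with the transform hard-coded as decimal accumulation, instead of the generic Trnsf/State parameters and the undefined GetNum and NumCollector; it folds forwards over the array directly rather than backwards over a SubStr. The helper digit, the four-parameter toInt signature and the exception types are assumptions made for illustration, and C++11 as published is assumed:

```cpp
#include <cstddef>
#include <stdexcept>

// Value of a decimal digit character; anything else is an error.
constexpr int digit(char c)
{
    return (c >= '0' && c <= '9') ? c - '0'
                                  : throw std::invalid_argument("not a digit");
}

// Left fold over s[pos, end): acc holds the value parsed so far.
template <std::size_t N>
constexpr int toInt(const char (&s)[N], std::size_t pos, std::size_t end,
                    int acc = 0)
{
    return end > N    ? throw std::out_of_range("range exceeds array")
         : pos >= end ? acc
         : toInt(s, pos + 1, end, acc * 10 + digit(s[pos]));
}

static_assert(toInt("12-JUL-2007", 0, 2) == 12, "day");
static_assert(toInt("12-JUL-2007", 7, 11) == 2007, "year");
```

Each recursive call consumes one character, so the recursion depth equals the length of the range being parsed.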

Regards,
&rzej

--
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@netlab.cs.rpi.edu ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: Daniel Krügler <daniel.kruegler@googlemail.com>
Date: Thu, 24 Jun 2010 12:19:57 CST
On 24 Jun., 01:20, restor <akrze...@gmail.com> wrote:
[..]
> Now, I was thinking that we could achieve the same functionality by
> allowing a small extension to constexpr functions:
> allow references to arrays of a known size as arguments to constexpr
> functions (or are they already allowed?), and allow the indexing
> operator[] (of the built-in array type) to be an acceptable constexpr
> operation on such arguments. I.e. the following should be valid:
>
>  template< size_t N > constexpr
>  char fun( char (&arr) [N], int i )
>  {
>    return arr[i];
>  }

This constexpr function is already valid as of the current FCD.

> Having this, we can add both run-time and compile-time reporting of
> invalid index values:
>
>  template< size_t N > constexpr
>  char fun( char (&arr) [N], int i )
>  {
>    return ( i >= 0 && i < N )
>             ? arr[i]
>                 : throw Exception();
>  }

This won't compile in the run-time case, because the throw-
expression is always a non-constant expression.

> Below is an attempt to implement a compile-time string parsing
> tool that would be useful for checking the validity of a date. I.e. it
> would parse strings like "12-JUL-2007" and either produce a Date or fail
> to compile. I skip some parts, as I only wanted to prove the possibility
> of implementing the extraction of substrings, iteration, and error
> reporting.
>
>  // ------ THE FACILITY FOR EXTRACTING SUBSTRINGS ------
>
>  template< size_t N >
>  struct SubStr
>  {
>        const char (&arr) [N + 1];
>        size_t beg, end;
>        constexpr SubStr( const char (&arr) [N + 1], size_t, size_t );
>        constexpr char operator[]( size_t i ) {
>                return i >= end - beg ? throw Error()
>                     : arr[beg + i];
>        }
>  };

This function (and, as far as I can see, the remaining ones as well) has
the same problem as above.

HTH & Greetings from Bremen,

Daniel Krügler









Author: Mathias Gaunard <loufoque@gmail.com>
Date: Thu, 24 Jun 2010 12:23:24 CST
On Jun 24, 12:20 am, restor <akrze...@gmail.com> wrote:
> Hi,
> Compile-time string parsing would be a very useful tool for
> implementing user-defined literals with a custom syntax or format. Two
> examples:
>
>  Date date1 = "29-Feb-2012"_date;
>
> The compiler checks whether 2012 is a leap year, and hence whether it should
> allow 29 as a day in February. If not, an error is reported during compilation.
>
>  Regex regex1 = "[a-zA-Z_]+[0-9]*"_regex;
>
> The compiler checks that we haven't made any syntax error in the expression.
>
> The C++ Standards Committee rejected this addition on the grounds that it
> would make parsing of string literals too troublesome for compiler
> implementers (if I understood it correctly).

There was also a proposal to turn foo<"bar"> into foo<'b', 'a', 'r'>.
I don't know what happened to that one.
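
For illustration, a sketch of what the expanded form looks like when written by hand with a variadic template over characters. This only shows the target of such a transformation, not the proposal's actual wording; the members and helpers of foo shown here (length, first) are hypothetical:

```cpp
#include <cstddef>

// A template parameterised directly on the characters of the string.
template <char... Cs>
struct foo
{
    static constexpr std::size_t length = sizeof...(Cs);
};

// Extracts the first character of a non-empty pack.
template <char C, char... Rest>
constexpr char first(foo<C, Rest...>)
{
    return C;
}

// What foo<"bar"> would stand for under such a rewrite:
typedef foo<'b', 'a', 'r'> bar;

static_assert(bar::length == 3, "three characters");
static_assert(first(bar()) == 'b', "first character is 'b'");
```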

