Thread

Topic: Shell wildcard-like support in TR1 regex?

Author: James Dennett <jdennett@acm.org>
Date: Tue, 2 Jan 2007 12:16:46 CST

Greg Herlihy wrote:
> Pete Becker wrote:
>> It was never the goal of the TR1 regular expressions to enable
>> interpreting regular expressions without knowing the grammar that
>> defines their meanings. Indeed, that's an inherent contradiction. The
>> meaning of a regular expression is determined by the grammar in use.
>> Context matters.
>
> Since you have provided no alternate explanation why file globbing is
> not supported in the current library

One obvious explanation is that no proposal was made to
include it.

>, the only conclusion is that its
> absence is due to an oversight - and one that we should expect will be
> corrected in the future.

I've provided an alternative, more reasonable, conclusion
above.

> Therefore I assume that there is a proposal to add file globbing
> support to the Standard Library regular expression library.

Such a proposal will exist only if a volunteer writes it.
To date, I'm not aware of such a volunteer.

> This
> proposal would no doubt also explain why the addition of an alternate
> and confusingly similar regular expression syntax is not as
> questionable an idea as it would at first appear, but in reality would
> actually "enhance" the currently-supported regular expression syntax.

I cannot see how there can be "no doubt" about the content
of a hypothetical proposal; if you can provide a reference
to such a proposal (or write one up), we will be in a better
position to discuss its contents.

-- James

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]

Author: pjp@dinkumware.com ("P.J. Plauger")
Date: Tue, 2 Jan 2007 18:18:08 GMT

"Greg Herlihy" <greghe@pacbell.net> wrote in message
news:1167449736.022981.205570@73g2000cwn.googlegroups.com...

> Pete Becker wrote:
>> It was never the goal of the TR1 regular expressions to enable
>> interpreting regular expressions without knowing the grammar that
>> defines their meanings. Indeed, that's an inherent contradiction. The
>> meaning of a regular expression is determined by the grammar in use.
>> Context matters.
>
> Since you have provided no alternate explanation why file globbing is
> not supported in the current library, the only conclusion is that its
> absence is due to an oversight - and one that we should expect will be
> corrected in the future.

What a rash assumption. I can make up a dozen alternate explanations
without even trying.

> Therefore I assume that there is a proposal to add file globbing
> support to the Standard Library regular expression library.

Why should you assume that? Because nature abhors a vacuum? If you
think this is a serious oversight, make a proposal and it'll get due
consideration. If you don't, nothing is likely to change.

>                                                            This
> proposal would no doubt also explain why the addition of an alternate
> and confusingly similar regular expression syntax is not as
> questionable an idea as it would at first appear, but in reality would
> actually "enhance" the currently-supported regular expression syntax.

I personally doubt that it will explain that, unless you take the trouble
to do the work...

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com


---

Author: "Greg Herlihy" <greghe@pacbell.net>
Date: Thu, 4 Jan 2007 10:37:22 CST

"P.J. Plauger" wrote:
> "Greg Herlihy" <greghe@pacbell.net> wrote in message
> news:1167449736.022981.205570@73g2000cwn.googlegroups.com...
>
> > Pete Becker wrote:
> >> It was never the goal of the TR1 regular expressions to enable
> >> interpreting regular expressions without knowing the grammar that
> >> defines their meanings. Indeed, that's an inherent contradiction. The
> >> meaning of a regular expression is determined by the grammar in use.
> >> Context matters.
> >
> > Since you have provided no alternate explanation why file globbing is
> > not supported in the current library, the only conclusion is that its
> > absence is due to an oversight - and one that we should expect will be
> > corrected in the future.
>
> What a rash assumption. I can make up a dozen alternate explanations
> without even trying.

But are any of those dozen explanations for the lack of file globbing
support in the regular expression anywhere near as preposterous as the
one I came up with? (I claimed that the omission was an "oversight").
My motivation for my post was simply this: after my earlier explanation
was evidently a few converts short of attaining universal acceptance,
I figured that I should examine competing explanations (even though
none had been articulated) to ensure that explanations other than the
one I had offered would receive a full airing.

So to sustain the discussion, I had to pitch in by proposing an answer
so egregiously mistaken (and one which uses its own certitude as its
supporting evidence) that no one could possibly accept it as correct.
In fact, I reasoned that anyone who had even a slightly better
explanation to offer than mine - would face an overwhelming urge to
post it in response. After all, the best way to get a question answered
on USENET is not to ask it and wait for a response (from someone who
knows the answer and has the motivation to write a response). Instead,
the much more effective way (especially in the sleepier newsgroups) is to
post a ludicrous explanation brimming with self-assuredness. Now
whether anyone with the information will post a response or not - is no
longer in any doubt.

Greg

---

Author: pete@versatilecoding.com (Pete Becker)
Date: Thu, 28 Dec 2006 03:48:38 GMT

Greg Herlihy wrote:
>
> There is also a practical reason for leaving globbing out of the
> regular expression library: syntax. Now, while it is true that the
> current library supports several regular expression "dialects", the
> variation is largely in the richness of the vocabulary, and not due to
> differences over the interpretation of shared forms (although there are
> exceptions).  Globbing syntax, in contrast, would not be just another
> "dialect" (defined as the same language spoken in two different ways) -
> but would instead create the opposite situation (two languages which
> sound identical, but differ in meaning). For example:
>
>     Pattern: foo.*
>
>         As RE: match "foo" and any characters up to a new line
>         Equivalent to Glob: foo*
>
>         As Glob: match "foo." and whatever follows (if anything)
>         Equivalent to RE: foo\..*
>
>      Pattern: foo?
>
>         As RE: match "fo" or "foo"
>         Equivalent to Glob: n/a
>
>         As Glob: match "foo" and the next character
>         Equivalent to RE: foo.
>

Horrors! But the same sort of thing occurs with the existing grammars.

 Pattern: (a)

 As BRE: match "(a)" literally
 As ERE: match any character "a", and treat the match
  as a capture group

 Pattern: a|b
 As BRE: match "a|b" literally
 As ERE: match a character "a" or "b"

I don't give this argument much weight.
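
The two examples above can be checked mechanically. What follows is a minimal sketch, assuming C++11's std::regex (the standardized successor of the TR1 component) and its basic and extended syntax flags; the names matches_under and demo_bre_versus_ere are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: does `text` match `pattern` in its entirety when the
// pattern is compiled under the given grammar flag?
bool matches_under(std::regex::flag_type grammar,
                   const std::string& pattern,
                   const std::string& text)
{
    const std::regex re(pattern, grammar);
    return std::regex_match(text, re);
}

void demo_bre_versus_ere()
{
    // "(a)": literal parentheses in BRE, a capture group in ERE.
    assert(matches_under(std::regex::basic, "(a)", "(a)"));
    assert(!matches_under(std::regex::basic, "(a)", "a"));
    assert(matches_under(std::regex::extended, "(a)", "a"));
    assert(!matches_under(std::regex::extended, "(a)", "(a)"));

    // "a|b": a literal three-character string in BRE, alternation in ERE.
    assert(matches_under(std::regex::basic, "a|b", "a|b"));
    assert(!matches_under(std::regex::basic, "a|b", "a"));
    assert(matches_under(std::regex::extended, "a|b", "a"));
    assert(!matches_under(std::regex::extended, "a|b", "a|b"));
}
```

The same pattern text is handed to the same class both times; only the grammar flag changes what it means.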

--

 -- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)

---

Author: "Greg Herlihy" <greghe@pacbell.net>
Date: Fri, 29 Dec 2006 12:31:15 CST

Pete Becker wrote:
> Greg Herlihy wrote:
> >
> > There is also a practical reason for leaving globbing out of the
> > regular expression library: syntax. Now, while it is true that the
> > current library supports several regular expression "dialects", the
> > variation is largely in the richness of the vocabulary, and not due to
> > differences over the interpretation of shared forms (although there are
> > exceptions).  Globbing syntax, in contrast, would not be just another
> > "dialect" (defined as the same language spoken in two different ways) -
> > but would instead create the opposite situation (two languages which
> > sound identical, but differ in meaning). For example:
> >
> >     Pattern: foo.*
> >
> >         As RE: match "foo" and any characters up to a new line
> >         Equivalent to Glob: foo*
> >
> >         As Glob: match "foo." and whatever follows (if anything)
> >         Equivalent to RE: foo\..*
> >
> >      Pattern: foo?
> >
> >         As RE: match "fo" or "foo"
> >         Equivalent to Glob: n/a
> >
> >         As Glob: match "foo" and the next character
> >         Equivalent to RE: foo.
> >
>
> Horrors! But the same sort of thing occurs with the existing grammars.
>
>  Pattern: (a)
>
>  As BRE: match "(a)" literally
>  As ERE: match any character "a", and treat the match
>   as a capture group
>
>  Pattern: a|b
>  As BRE: match "a|b" literally
>  As ERE: match a character "a" or "b"
>
> I don't give this argument much weight.

Your examples completely miss the point. In both cases above, the
choice is whether to interpret the pattern as a literal or as a
non-literal pattern. And since the likelihood that anyone would ever
search for a "a|b" or "(a)" as a literal pattern is probably close to
nothing, it can safely be assumed that those forms specify non-literal
patterns. But even more importantly - when interpreted as non-literal
patterns, each form allows one - and only one - possible
interpretation.

File name globbing would prevent that kind of unambiguous
interpretation for any search pattern containing a "?", "*" or a "."
(which is just about all of them). In other words, there would be no
way for anyone to interpret a regular expression pattern correctly
absent a context - a context which may or may not be available. And
certainly context-free pattern specifications are far more useful and
flexible than those with context dependencies. Perl avoids such
dependencies in its regular expression support. And I would like to
think that the fact that the C++ Library does the same owes more to a
conscious design decision (on someone's part) than to an accident of
sheer good luck.

Greg

---

Author: jdennett@acm.org (James Dennett)
Date: Fri, 29 Dec 2006 20:03:24 GMT

Greg Herlihy wrote:
> Pete Becker wrote:
>> Greg Herlihy wrote:
>>> There is also a practical reason for leaving globbing out of the
>>> regular expression library: syntax. Now, while it is true that the
>>> current library supports several regular expression "dialects", the
>>> variation is largely in the richness of the vocabulary, and not due to
>>> differences over the interpretation of shared forms (although there are
>>> exceptions).  Globbing syntax, in contrast, would not be just another
>>> "dialect" (defined as the same language spoken in two different ways) -
>>> but would instead create the opposite situation (two languages which
>>> sound identical, but differ in meaning). For example:
>>>
>>>     Pattern: foo.*
>>>
>>>         As RE: match "foo" and any characters up to a new line
>>>         Equivalent to Glob: foo*
>>>
>>>         As Glob: match "foo." and whatever follows (if anything)
>>>         Equivalent to RE: foo\..*
>>>
>>>      Pattern: foo?
>>>
>>>         As RE: match "fo" or "foo"
>>>         Equivalent to Glob: n/a
>>>
>>>         As Glob: match "foo" and the next character
>>>         Equivalent to RE: foo.
>>>
>> Horrors! But the same sort of thing occurs with the existing grammars.
>>
>>  Pattern: (a)
>>
>>  As BRE: match "(a)" literally
>>  As ERE: match any character "a", and treat the match
>>   as a capture group
>>
>>  Pattern: a|b
>>  As BRE: match "a|b" literally
>>  As ERE: match a character "a" or "b"
>>
>> I don't give this argument much weight.
>
> Your examples completely miss the point.

If so, your point isn't clear to me either; it seemed to
be a claim that a regexp's meaning can be understood almost
entirely without additional context, and Pete's examples
above show that to be far from true.

> In both cases above, the
> choice is whether to interpret the pattern as a literal or as a
> non-literal pattern.

No; in both cases it is a pattern.  In the case of basic
regexp syntax, the pattern happens to be matched literally
as there are no metacharacters.

> And since the likelihood that anyone would ever
> search for a "a|b" or "(a)" as a literal pattern is probably close to
> nothing, it can safely be assumed that those forms specify non-literal
> patterns.

Code manipulating regular expressions is very likely to
do such literal matches, for one example, and it's not
hard to think of other contexts; I often work with data
in textual formats where '|' is a delimiter, for example.
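
To make the delimiter case concrete, here is a minimal sketch, assuming C++11's std::regex with its default ECMAScript grammar, where '|' has to be escaped to be matched literally. The names split_on_pipe and demo_split are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Split a record such as "id|name|city" on the literal character '|'.
// Under the ECMAScript grammar an unescaped '|' means alternation, so the
// pattern escapes it; under the basic grammar a bare "|" is already literal.
std::vector<std::string> split_on_pipe(const std::string& record)
{
    static const std::regex delimiter("\\|");
    // Submatch index -1 selects the text between delimiter matches.
    std::sregex_token_iterator first(record.begin(), record.end(), delimiter, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}

void demo_split()
{
    const std::vector<std::string> fields = split_on_pipe("id|name|city");
    assert(fields.size() == 3);
    assert(fields[0] == "id");
    assert(fields[1] == "name");
    assert(fields[2] == "city");
}
```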

> But even more importantly - when interpreted as non-literal
> patterns, each form allows one - and only one - possible
> interpretation.

Both forms above are interpreted as patterns, with two
different interpretations.  Maybe you mean to make the
weaker claim that, when interpreted according to two
grammars with the same metacharacters, the meanings are
the same?

-- James

---

Author: pete@versatilecoding.com (Pete Becker)
Date: Fri, 29 Dec 2006 23:57:10 GMT

Greg Herlihy wrote:
>
> Your examples completely miss the point. In both cases above, the
> choice is whether to interpret the pattern as a literal or as a
> non-literal pattern.

That's a distinction without a difference. Each grammar tells you what
set of strings any particular regular expression matches. When you
change grammars, whether within the current set of grammars or to some
other one that isn't part of the current set, the meaning of a regular
expression can change. Whether you thought some particular character
matched exactly one character or more than one character doesn't really
matter. What matters is that it now means something different, and you
have to adjust your thinking accordingly.

>
> File name globbing would prevent that kind of unambiguous
> interpretation for any search pattern containing a "?", "*" or a "."
> (which is just about all of them).

And grammar differences prevent that kind of unambiguous interpretation
for any search pattern containing a "{", "(", or "|".

> In other words, there would be no
> way for anyone to interpret a regular expression pattern correctly
> absent a context - a context which may or may not be available.

It was never the goal of the TR1 regular expressions to enable
interpreting regular expressions without knowing the grammar that
defines their meanings. Indeed, that's an inherent contradiction. The
meaning of a regular expression is determined by the grammar in use.
Context matters.

--

 -- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)

---

Author: "Greg Herlihy" <greghe@pacbell.net>
Date: Sat, 30 Dec 2006 12:19:06 CST

James Dennett wrote:
> No; in both cases it is a pattern.  In the case of basic
> regexp syntax, the pattern happens to be matched literally
> as there are no metacharacters.

With no metacharacters in the search pattern, there is no reason to use
a regular expression library to perform a search in the first place.
After all, std::string::find() would work just as well. The only reason
to use a regular expression library is for - rather unsurprisingly -
non-literal search patterns.

> > And since the likelihood that anyone would ever
> > search for a "a|b" or "(a)" as a literal pattern is probably close to
> > nothing, it can safely be assumed that those forms specify non-literal
> > patterns.
>
> Code manipulating regular expressions is very likely to
> do such literal matches, for one example, and it's not
> hard to think of other contexts; I often work with data
> in textual formats where '|' is a delimiter, for example.

It's not hard to think of any number of equally far-fetched scenarios.
Whether any exist in practice is the more pertinent issue. Because any
scenario in which a regular expression library is needed to perform
literal string searches is one that simply defies common sense.

> > But even more importantly - when interpreted as non-literal
> > patterns, each form allows one - and only one - possible
> > interpretation.
>
> Both forms above are interpreted as patterns, with two
> different interpretations.  Maybe you mean to make the
> weaker claim that, when interpreted according to two
> grammars with the same metacharacters, the meanings are
> the same?

Both forms have only a single interpretation when interpreted as a
(non-literal) regular expression - the only interpretation that makes
sense given the context.

Greg

---

Author: "Greg Herlihy" <greghe@pacbell.net>
Date: Sat, 30 Dec 2006 12:19:16 CST

Pete Becker wrote:
> It was never the goal of the TR1 regular expressions to enable
> interpreting regular expressions without knowing the grammar that
> defines their meanings. Indeed, that's an inherent contradiction. The
> meaning of a regular expression is determined by the grammar in use.
> Context matters.

Since you have provided no alternate explanation why file globbing is
not supported in the current library, the only conclusion is that its
absence is due to an oversight - and one that we should expect will be
corrected in the future.

Therefore I assume that there is a proposal to add file globbing
support to the Standard Library regular expression library. This
proposal would no doubt also explain why the addition of an alternate
and confusingly similar regular expression syntax is not as
questionable an idea as it would at first appear, but in reality would
actually "enhance" the currently-supported regular expression syntax.

Greg

---

Author: jdennett@acm.org (James Dennett)
Date: Sat, 30 Dec 2006 19:28:13 GMT

Greg Herlihy wrote:
> James Dennett wrote:
>> No; in both cases it is a pattern.  In the case of basic
>> regexp syntax, the pattern happens to be matched literally
>> as there are no metacharacters.
>
> With no metacharacters in the search pattern, there is no reason to use
> a regular expression library to perform a search in the first place.

Not so; the input pattern may come from a user.

> After all, std::string::find() would work just as well.

Not if the user wished to have the ability to use
metacharacters.

> The only reason
> to use a regular expression library is for - rather unsurprisingly -
> non-literal search patterns.

As demonstrated above, this is not quite true; using
such a library is right whenever we want, at runtime,
the *ability* to use non-literal patterns.
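
A minimal sketch of that situation, assuming C++11's std::regex and a tool that receives its pattern at runtime; user_search and demo_user_search are hypothetical names. The calling code is the same whether or not the user's text happens to contain metacharacters.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical search routine for a program that accepts its pattern from a
// user. Returns false when the text is not a valid pattern in the default
// (ECMAScript) grammar.
bool user_search(const std::string& text, const std::string& user_pattern)
{
    try {
        const std::regex re(user_pattern);
        return std::regex_search(text, re);
    } catch (const std::regex_error&) {
        return false;
    }
}

void demo_user_search()
{
    const std::string line = "error: file not found";
    assert(user_search(line, "file"));      // no metacharacters: acts like a substring search
    assert(user_search(line, "f.le"));      // same call, now with a metacharacter
    assert(!user_search(line, "warning"));  // pattern that does not occur
}
```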

>>> And since the likelihood that anyone would ever
>>> search for a "a|b" or "(a)" as a literal pattern is probably close to
>>> nothing, it can safely be assumed that those forms specify non-literal
>>> patterns.
>> Code manipulating regular expressions is very likely to
>> do such literal matches, for one example, and it's not
>> hard to think of other contexts; I often work with data
>> in textual formats where '|' is a delimiter, for example.
>
> It's not hard to think of any number of equally far-fetched scenarios.

Far-fetched is plain wrong here; these are simple, concrete
and common scenarios.

> Whether any exist in practice is the more pertinent issue.

That's the issue I addressed.

> Because any
> scenario in which a regular expression library is needed to perform
> literal string searches is one that simply defies common sense.

"Common sense" is not sufficient for good design.

>>> But even more importantly - when interpreted as non-literal
>>> patterns, each form allows one - and only one - possible
>>> interpretation.
>> Both forms above are interpreted as patterns, with two
>> different interpretations.  Maybe you mean to make the
>> weaker claim that, when interpreted according to two
>> grammars with the same metacharacters, the meanings are
>> the same?
>
> Both forms have only a single interpretation when interpreted as a
> (non-literal) regular expression - the only interpretation that makes
> sense given the context.

I can define additional regular expression grammars where that
is not the case.  That would be artificial.  Or we can take a
step backwards and look at all the cases where, even using the
same metacharacters, the same text denotes different patterns
for different grammars today.

-- James

---

Author: bdawes@acm.org (Beman Dawes)
Date: Sun, 31 Dec 2006 02:10:14 GMT

Greg Herlihy wrote:
> Pete Becker wrote:
>> It was never the goal of the TR1 regular expressions to enable
>> interpreting regular expressions without knowing the grammar that
>> defines their meanings. Indeed, that's an inherent contradiction. The
>> meaning of a regular expression is determined by the grammar in use.
>> Context matters.
>
> Since you have provided no alternate explanation why file globbing is
> not supported in the current library, the only conclusion is that its
> absence is due to an oversight - and one that we should expect will be
> corrected in the future.
>
> Therefore I assume that there is a proposal to add file globbing
> support to the Standard Library regular expression library. This
> proposal would no doubt also explain why the addition of an alternate
> and confusingly similar regular expression syntax is not as
> questionable an idea as it would at first appear, but in reality would
> actually "enhance" the currently-supported regular expression syntax.

I may write such a proposal someday, but for the filesystem library
rather than the regular expression library. The traditional context for
such searches is filesystem searches; I'm not aware of any need for
wildcard/glob searches elsewhere, although the mechanism would be to
convert a wildcard/glob expression into a regular expression, so it
could be used in any context a regular expression is used.

Such a function would be proposed for Boost first, and would only be
proposed to the C++ committee if a Boost implementation is found useful
by users. There have been several requests for such a feature in
Boost.Filesystem.

As far as the exact grammar, I would base it on the POSIX Pattern
Matching Notation rules, which are quite explicit and already part of an
ISO standard. The rules for converting a glob expression into a regular
expression are given in The Open Group Base Specifications Issue 6 IEEE
Std 1003.1, 2004 Edition, Shell Command Language, 2.13 Pattern Matching
Notation.

--Beman Dawes

---

Author: "James Kanze" <james.kanze@gmail.com>
Date: Mon, 1 Jan 2007 13:57:08 CST

Denise Kleingeist wrote:
> Scott Meyers wrote:
> > The other day I was asked whether TR1's regex support includes things
> > like shell wildcards, e.g., the ability to say things like "foo.*"
> > meaning "a file named foo with any extension."

> There is no support for "shell wildcards": only proper regular
> expression notation is used.

Strictly speaking, that's not true.  A proper regular expression
has support for concatenation, Kleene closure and or'ing, and
that's about it.  A proper regular expression can be converted
to a DFA.  (Presumably, extensions which are easily mapped to
the basic operators are acceptable.  My own implementation
supports ? and +, in addition to Kleene closure, for example.)
To the best of my knowledge, it's not possible to extend it to
capture substrings.  (If anyone knows differently, I'd be very
interested in hearing about it.  I have a situation where it is
necessary to generate a DFA; I would very much like to extend my
regular expressions to support substring capture as well, but I
don't currently know how to do it.)
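
As an illustration of the DFA claim, here is a minimal sketch of a table-driven recognizer, assuming the textbook expression (a|b)*abb, which uses only alternation, Kleene closure and concatenation. It is not taken from the implementation mentioned above; dfa_matches_abb and demo_dfa are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hand-built DFA for the regular expression (a|b)*abb.
// State n means "the last n characters read are the first n of abb",
// so state 3 is the only accepting state.
bool dfa_matches_abb(const std::string& input)
{
    // next_state[state][symbol], with symbol 0 for 'a' and 1 for 'b'.
    static const int next_state[4][2] = {
        {1, 0},
        {1, 2},
        {1, 3},
        {1, 0},
    };
    int state = 0;
    for (char c : input) {
        if (c != 'a' && c != 'b') {
            return false;  // symbol outside the alphabet
        }
        state = next_state[state][c == 'b' ? 1 : 0];
    }
    return state == 3;
}

void demo_dfa()
{
    assert(dfa_matches_abb("abb"));
    assert(dfa_matches_abb("babb"));
    assert(dfa_matches_abb("aabb"));
    assert(!dfa_matches_abb("abab"));
    assert(!dfa_matches_abb("abba"));
    assert(!dfa_matches_abb(""));
}
```

Each input character costs one table lookup and the recognizer keeps nothing but the current state, which is also why there is no obvious place in it to record where a substring began.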

What TR1 does support is Posix's extended regular expressions
(and some other extended regular expressions).  They map to
NFA's, but because of notation of the individual NFA states in
the implementation of capture, the NFA states cannot be merged
to form a DFA.

All of which is a nit, of course---filename globbing cannot be
implemented even with an NFA, and requires some ad hoc
constructs (including e.g. knowing the directory path
separator).

> Table 23 (syntax_option_type effects) lists the flavours
> of supported regular expressions which are essentially POSIX regexes
> plus ECMAScript regexes (the latter being a superset of POSIX regexes
> covering popular extensions in a referenceable standard). However, it is
> not too hard to arrive from shell wildcards at supported regular
> expressions!
> Just apply the following three transformations (in the order listed):
> - replace "." by "\\." (using C++ notation to escape the escape)
> - replace "?" by "."
> - replace "*" by ".*"

 -  Ensure that a literal . in the first position only matches a
    literal . in the expression.

 -  Ensure correct handling of the directory separator.

 -  Under Windows, ensure special handling for an initial "x:"
    (where x is an alphabetic character).

Probably a few other points I've missed as well.  (I seem to
recall reading somewhere an explanation of why "{a,b,c}" could
not be replaced with "(a|b|c)", but I can't remember the
details.)  An implementation of regular expressions (extended or
not) could definitely be useful in the implementation of such a
feature (although it could be overkill), but the semantics are
different enough that you would definitely want a different
class (and probably a different interface).
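
Two of the extra rules above (the leading '.' and the directory separator) can be sketched on top of a regular expression engine. This is only a sketch under stated assumptions: C++11's std::regex with the ECMAScript grammar, '/' as the separator, and no handling of bracket expressions, brace lists or Windows drive prefixes. The names path_glob_to_regex, path_glob_match and demo_path_glob are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Translate a shell wildcard into an ECMAScript regular expression.
std::string path_glob_to_regex(const std::string& glob)
{
    static const std::string meta = "\\^$.|+()[]{}";
    std::string re;
    bool component_start = true;  // true at the start and right after '/'
    for (char c : glob) {
        if ((c == '*' || c == '?') && component_start) {
            re += "(?!\\.)";      // a wildcard must not match a leading '.'
        }
        if (c == '*') {
            re += "[^/]*";        // any run of characters within one component
        } else if (c == '?') {
            re += "[^/]";         // exactly one character, never the separator
        } else {
            if (meta.find(c) != std::string::npos) {
                re += '\\';       // escape regex metacharacters
            }
            re += c;              // everything else is matched literally
        }
        component_start = (c == '/');
    }
    return re;
}

bool path_glob_match(const std::string& glob, const std::string& path)
{
    return std::regex_match(path, std::regex(path_glob_to_regex(glob)));
}

void demo_path_glob()
{
    assert(path_glob_match("*.txt", "notes.txt"));
    assert(!path_glob_match("*.txt", ".hidden.txt"));      // leading '.' needs a literal '.'
    assert(!path_glob_match("*.txt", "dir/notes.txt"));    // '*' does not cross '/'
    assert(path_glob_match("src/*.cpp", "src/main.cpp"));
    assert(!path_glob_match("src/*.cpp", "src/sub/main.cpp"));
    assert(path_glob_match(".*rc", ".bashrc"));
    assert(!path_glob_match("src/*", "src/.git"));
}
```

Even this small subset needs a lookahead and a per-component flag, which fits the suggestion that globbing wants its own class rather than another grammar flag.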

--
James Kanze (Gabi Software)            email: james.kanze@gmail.com
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

---

Author: usenet@aristeia.com (Scott Meyers)
Date: Sat, 23 Dec 2006 18:29:35 GMT

The other day I was asked whether TR1's regex support includes things
like shell wildcards, e.g., the ability to say things like "foo.*"
meaning "a file named foo with any extension."  I said I didn't think
there was any such support, but given that TR1 supports multiple regular
expression languages, it's possible that there is a capability I'm not
aware of, and it's hard to determine from documentation whether
something is *not* available.

So, is there any TR1 support for shell-like wildcards?

Thanks,

Scott

---

Author: "Denise Kleingeist" <denise.kleingeist@googlemail.com>
Date: Sun, 24 Dec 2006 11:42:34 CST

Hi Scott!
Scott Meyers wrote:
> The other day I was asked whether TR1's regex support includes things
> like shell wildcards, e.g., the ability to say things like "foo.*"
> meaning "a file named foo with any extension."

There is no support for "shell wildcards": only proper regular
expression notation is used. Table 23 (syntax_option_type effects)
lists the flavours of supported regular expressions, which are
essentially POSIX regexes plus ECMAScript regexes (the latter being
a superset of POSIX regexes covering popular extensions in a
referenceable standard). However, it is not too hard to arrive from
shell wildcards at supported regular expressions!
Just apply the following three transformations (in the order listed):
- replace "." by "\\." (using C++ notation to escape the escape)
- replace "?" by "."
- replace "*" by ".*"
Some shells have character class support using regular expression
notation - there is obviously no need to translate those.
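
The three transformations can be sketched directly. This is a minimal sketch, assuming C++11's std::regex in place of the TR1 class; replace_all, wildcard_to_regex and demo_wildcard are hypothetical names. It applies only the three listed steps, so other regular expression metacharacters that may occur in file names (such as "+" or "(") are not escaped.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Replace every occurrence of `from` in `s` with `to`, scanning left to right
// and never rescanning text that was just inserted. `from` must be non-empty.
std::string replace_all(std::string s, const std::string& from, const std::string& to)
{
    for (std::size_t pos = s.find(from); pos != std::string::npos;
         pos = s.find(from, pos + to.size())) {
        s.replace(pos, from.size(), to);
    }
    return s;
}

// Apply the three transformations in the order listed.
std::string wildcard_to_regex(std::string pattern)
{
    pattern = replace_all(pattern, ".", "\\.");  // must run first, see below
    pattern = replace_all(pattern, "?", ".");
    pattern = replace_all(pattern, "*", ".*");
    return pattern;
}

void demo_wildcard()
{
    assert(wildcard_to_regex("foo.*") == "foo\\..*");
    assert(wildcard_to_regex("foo?") == "foo.");

    const std::regex re(wildcard_to_regex("foo.*"));
    assert(std::regex_match("foo.txt", re));
    assert(!std::regex_match("foobar", re));
}
```

The order matters because the later steps introduce "." characters of their own; escaping dots last would turn those into literal dots as well.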

Good luck, Denise!

---

Author: "Greg Herlihy" <greghe@pacbell.net>
Date: Sun, 24 Dec 2006 11:46:39 CST

Scott Meyers wrote:
> The other day I was asked whether TR1's regex support includes things
> like shell wildcards, e.g., the ability to say things like "foo.*"
> meaning "a file named foo with any extension."  I said I didn't think
> there was any such support, but given that TR1 supports multiple regular
> expression languages, it's possible that there is a capability I'm not
> aware of, and it's hard to determine from documentation whether
> something is *not* available.
>
> So, is there any TR1 support for shell-like wildcards?

No. For one, file name "globbing" is too limited in its pattern
matching abilities to qualify as a regular expression syntax (in
particular, globbing lacks support for quantification or alternation of
pattern sequences). And file globbing is certainly far less capable
than any of the regular expression syntaxes that the Library currently
recognizes.

There is also a practical reason for leaving globbing out of the
regular expression library: syntax. Now, while it is true that the
current library supports several regular expression "dialects", the
variation is largely in the richness of the vocabulary, and not due to
differences over the interpretation of shared forms (although there are
exceptions). Globbing syntax, in contrast, would not be just another
"dialect" (defined as the same language spoken in two different ways) -
but would instead create the opposite situation (two languages which
sound identical, but differ in meaning). For example:

    Pattern: foo.*

        As RE: match "foo" and any characters up to a new line
        Equivalent to Glob: foo*

        As Glob: match "foo." and whatever follows (if anything)
        Equivalent to RE: foo\..*

     Pattern: foo?

        As RE: match "fo" or "foo"
        Equivalent to Glob: n/a

        As Glob: match "foo" and the next character
        Equivalent to RE: foo.
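
The two readings above can be compared side by side. This is a minimal sketch, assuming C++11's std::regex with its default ECMAScript grammar; since the library has no glob grammar, the glob reading is expressed through the "Equivalent to RE" forms given above. The names full_match and demo_re_versus_glob are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Whole-string match under the default (ECMAScript) grammar.
bool full_match(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

void demo_re_versus_glob()
{
    // "foo.*" read as a regular expression: "foo" followed by anything.
    assert(full_match("foo.*", "foobar"));
    assert(full_match("foo.*", "foo.txt"));
    // The glob reading of foo.* written as a regular expression: the dot is literal.
    assert(full_match("foo\\..*", "foo.txt"));
    assert(!full_match("foo\\..*", "foobar"));

    // "foo?" read as a regular expression: the last 'o' is optional.
    assert(full_match("foo?", "fo"));
    assert(full_match("foo?", "foo"));
    assert(!full_match("foo?", "food"));
    // The glob reading of foo? written as a regular expression: one more character.
    assert(full_match("foo.", "food"));
    assert(!full_match("foo.", "foo"));
}
```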

So in light of its limited pattern matching capabilities, paired with a
nearly unlimited potential to sow confusion, file globbing support
seems an unlikely candidate as a future addition to the Standard
Library - at least I hope that is the case.

Greg

---