Topic: Template virtuals and locale facets


Author: "Anthony Williams" <anthwil@nortelnetworks.com>
Date: Tue, 25 Sep 2001 15:34:25 GMT
"Pete Becker" <petebecker@acm.org> wrote in message
news:3BAF55F4.634FBE7E@acm.org...
> Anthony Williams wrote:
> >
> > However, I believe that the interface would benefit greatly from enabling
> > the use of general InputIterators and OutputIterators, as in:
> >
> > ...
> >
> > This would be more in keeping with the rest of the Standard Library, with
> > full support for iterator ranges, rather than just arrays.
> >
>
> What problems would this solve that cannot be solved fairly easily with
> the current language definition?

New code to read and convert a whole file:

std::ifstream infile("myfile.txt");
std::istreambuf_iterator<char> from(infile);
std::istreambuf_iterator<char> from_end;

std::wstring wideString;
std::back_insert_iterator<std::wstring> to(wideString);
const unsigned maxOut=std::wstring::npos;

// use_facet returns a const reference, not a pointer
const std::codecvt<wchar_t,char,mbstate_t>&
cvt=std::use_facet<std::codecvt<wchar_t,char,mbstate_t> >(std::locale());
mbstate_t state=mbstate_t(); // initial conversion state

std::istreambuf_iterator<char> endOfInput;
std::back_insert_iterator<std::wstring> endOfOutput(wideString); // no default constructor

cvt.in(state,from,from_end,endOfInput,
    to,maxOut,endOfOutput);

Equivalent code using the current standard, not assuming anything about the
iterator types:

std::ifstream infile("myfile.txt");
std::istreambuf_iterator<char> from(infile);
std::istreambuf_iterator<char> from_end;

std::wstring wideString;
std::back_insert_iterator<std::wstring> to(wideString);
const unsigned maxOut=std::wstring::npos;

// use_facet returns a const reference, not a pointer
const std::codecvt<wchar_t,char,mbstate_t>&
cvt=std::use_facet<std::codecvt<wchar_t,char,mbstate_t> >(std::locale());
mbstate_t state=mbstate_t(); // initial conversion state

std::istreambuf_iterator<char> from_next=from;
unsigned count=0;
while(from_next!=from_end && count!=maxOut)
{
    char inbuf[1];
    wchar_t outbuf[1];

    inbuf[0]=*from_next++;

    const char* nextIn;
    wchar_t* nextOut=outbuf;
    cvt.in(state,inbuf,inbuf+1,nextIn,outbuf,outbuf+1,nextOut);
    if(nextOut!=outbuf)
    {
        *to++=outbuf[0];
        ++count;
    }
}

Yes, you can do it, but it is more long-winded: it requires intermediate
buffering, which complicates user code and is a source of errors. The
main loop is 16 lines instead of 3, after the common setup code. If the user
needs to do this for different iterator types (say a std::vector<wchar_t>
for output, or a socket-reading iterator for input), then they have to repeat
the whole loop again, making sure they make the correct changes. Obviously
you can unroll the loop or use std::copy to copy the input to a buffer,
convert the buffer, and copy out again, but this is still unnecessary
complication.
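
The copy-to-a-buffer, convert, copy-out approach mentioned above can at least
be written once as a function template, so the loop is not repeated for each
iterator type. What follows is only a minimal sketch against the current
interface, not anything from the Standard: the name convert_in and the
64-character buffer size are assumptions made for illustration, and it assumes
every wide character consumes at least one external character.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical helper, not part of the Standard Library.
template<typename InputIterator,typename OutputIterator>
std::codecvt_base::result convert_in(
    const std::codecvt<wchar_t,char,std::mbstate_t>& cvt,
    std::mbstate_t& state,
    InputIterator from,InputIterator from_end,
    OutputIterator to)
{
    const std::size_t bufSize=64; // arbitrary size chosen for this sketch
    char inbuf[bufSize];
    wchar_t outbuf[bufSize];
    std::size_t pending=0; // unconverted characters carried over

    while(from!=from_end || pending!=0)
    {
        // Top up the input buffer from the iterator range.
        std::size_t filled=pending;
        while(filled!=bufSize && from!=from_end)
        {
            inbuf[filled++]=*from;
            ++from;
        }

        const char* nextIn=inbuf;
        wchar_t* nextOut=outbuf;
        const std::codecvt_base::result res=
            cvt.in(state,inbuf,inbuf+filled,nextIn,
                   outbuf,outbuf+bufSize,nextOut);
        if(res==std::codecvt_base::error || res==std::codecvt_base::noconv)
            return res;

        // Copy the converted characters to the output iterator.
        to=std::copy(outbuf,nextOut,to);

        const std::size_t consumed=static_cast<std::size_t>(nextIn-inbuf);
        if(consumed==0 && nextOut==outbuf)
            return std::codecvt_base::partial; // no progress is possible

        // Move any unconverted tail to the front of the buffer.
        pending=filled-consumed;
        std::memmove(inbuf,nextIn,pending);
    }
    return std::codecvt_base::ok;
}
```

The same template then serves a std::back_inserter on a std::wstring, a
std::vector<wchar_t>, or any other output iterator, but the intermediate
buffering is still there; it has only been moved out of sight.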

IMO to argue that this is unnecessary is to argue that many of the STL
algorithms should only be implemented for raw pointers, as loops like the
above could be written as wrappers for raw pointer versions e.g.

template<typename Iterator>
void sort(Iterator begin,Iterator end)
{
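    // "typename" is required for the dependent type name
    std::vector<typename std::iterator_traits<Iterator>::value_type> buf;

    std::copy(begin,end,std::back_inserter(buf));

    if(buf.empty())
        return; // front() and back() are undefined on an empty vector

    // sort a range specified by raw pointers only.
    std::sort(&buf.front(),&buf.back()+1);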

    std::copy(buf.begin(),buf.end(),begin);
}

Anthony
--
Anthony Williams
Software Engineer, Nortel Networks Optoelectronics
The opinions expressed in this message are not necessarily those of my
employer



---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]





Author: Valentin.Bonnard@free.fr (Valentin Bonnard)
Date: Thu, 27 Sep 2001 08:47:37 GMT
I don't think anyone questions the fact that codecvt<> has
the worst interface anyone could imagine. It is also
extremely badly specified.

I personally believe that it should be removed and replaced
with something sensible and understandable rather than
incompletely fixed, because the cost of reverse engineering
and patching can be extreme compared to clean redesign.

  --   VB







Author: Pete Becker <petebecker@acm.org>
Date: Thu, 27 Sep 2001 17:57:11 GMT
Valentin Bonnard wrote:
>
> I don't think anyone questions the fact that codecvt<> has
> the worst interface anyone could imagine. It is also
> extremely badly specified.
>
> I personally believe that it should be removed and replaced
> with something sensible and understandable rather than
> incompletely fixed, because the cost of reverse engineering
> and patching can be extreme compared to clean redesign.
>

We eagerly await your detailed proposal.

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)






Author: "Anthony Williams" <anthwil@nortelnetworks.com>
Date: Mon, 24 Sep 2001 15:28:11 GMT
A short while back, there was a thread on comp.lang.c++.moderated about
member templates and virtual functions ("Member templates cannot be virtual.
Why?"), in which it was proposed that member templates be permitted to be
virtual, and various implementation consequences were discussed.

I believe that not only is this possible (with help from the linker - see my
posts in that thread), but highly desirable, and I believe there are cases
in the Standard that would benefit from such a feature.

For example, consider the std::codecvt facets. They have public interface
functions which operate on sequences of characters, all of which delegate to
protected virtual functions of the same signature to do the actual work. Due
to the limitations of the virtual mechanism, these sequences are limited to
character arrays, with raw pointers to the start and end.
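
As an illustration of that delegation, here is a minimal sketch of a facet
derived from std::codecvt<char,char,mbstate_t> that overrides the protected
virtual do_in, with the public in() called through a base-class reference. The
names upper_codecvt and convert_with are hypothetical, and the upper-casing
"conversion" is chosen only because its effect is easy to observe.

```cpp
#include <cctype>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical facet: "converts" by upper-casing each character.
class upper_codecvt:
    public std::codecvt<char,char,std::mbstate_t>
{
protected:
    // Overrides the protected virtual; note the raw-pointer ranges.
    result do_in(std::mbstate_t&,
        const char* from,const char* from_end,const char*& from_next,
        char* to,char* to_limit,char*& to_next) const
    {
        while(from!=from_end && to!=to_limit)
        {
            *to++=static_cast<char>(
                std::toupper(static_cast<unsigned char>(*from++)));
        }
        from_next=from;
        to_next=to;
        return from==from_end ? ok : partial;
    }
};

// Hypothetical helper: calls the public, non-virtual in() on the base
// class, which forwards to whichever do_in the dynamic type provides.
std::string convert_with(
    const std::codecvt<char,char,std::mbstate_t>& cvt,
    const std::string& input)
{
    std::string output(input.size(),'\0');
    if(input.empty())
        return output;

    std::mbstate_t state=std::mbstate_t();
    const char* from_next=0;
    char* to_next=0;
    cvt.in(state,input.data(),input.data()+input.size(),from_next,
           &output[0],&output[0]+output.size(),to_next);
    output.resize(static_cast<std::string::size_type>(to_next-&output[0]));
    return output;
}
```

Because do_in is virtual, its parameter types are fixed at const char* and
char*; that is the limitation the rest of this post is concerned with.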

However, I believe that the interface would benefit greatly from enabling
the use of general InputIterators and OutputIterators, as in:

template<typename internT,typename externT,typename stateT>
class codecvt:
    public locale::facet,public codecvt_base
{
public:
    template<typename InputIterator,typename OutputIterator>
    result out(stateT& state,
         InputIterator from, InputIterator from_end,
         InputIterator& from_next,
         OutputIterator to, unsigned maxOutputChars,
         OutputIterator& to_next) const;
    template<typename OutputIterator>
    result unshift(stateT& state,
         OutputIterator to, unsigned maxOutputChars,
         OutputIterator& to_next) const;
    template<typename InputIterator,typename OutputIterator>
    result in(stateT& state,
        InputIterator from, InputIterator from_end,
        InputIterator& from_next,
        OutputIterator to, unsigned maxOutputChars,
        OutputIterator& to_next) const;
    template<typename InputIterator>
    int length(stateT& state, InputIterator from, InputIterator end,
        size_t max) const;
// etc.
};

(Note the use of maxOutputChars as a count rather than to_limit as the
end-of-range value, since OutputIterators don't necessarily have an
end-of-range value (e.g. std::back_inserter))
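
To make the count-versus-limit point concrete, here is a small sketch of a
copy bounded by a count rather than by an output end iterator. The helper name
copy_at_most is hypothetical; it is not a proposed library function, only an
illustration of how a maximum count can bound writes through an iterator such
as the one returned by std::back_inserter, which has no end value to compare
against.

```cpp
#include <iterator>
#include <string>
#include <utility>

// Hypothetical helper: copies until the input is exhausted or maxCount
// elements have been written, whichever comes first, and returns the
// positions reached in both sequences.
template<typename InputIterator,typename OutputIterator>
std::pair<InputIterator,OutputIterator> copy_at_most(
    InputIterator from,InputIterator from_end,
    OutputIterator to,unsigned maxCount)
{
    unsigned count=0;
    while(from!=from_end && count!=maxCount)
    {
        *to=*from;
        ++to;
        ++from;
        ++count;
    }
    return std::make_pair(from,to);
}
```

The returned pair plays the role of from_next and to_next in the interface
above.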

This would be more in keeping with the rest of the Standard Library, with
full support for iterator ranges, rather than just arrays.

We could implement such an interface at the moment, with the wrapper
functions reading one character at a time into a single-character buffer,
and forwarding to the virtual methods dealing in arrays of characters and
raw pointers. However, this is likely to be very inefficient due to the
repeated virtual calls.

What we need, therefore, is to enable the virtual implementation functions
to be templates too, so the wrapper functions can just pass the parameters
straight through, without having to buffer the characters and process one
character at a time.

Anthony
--
Anthony Williams
Software Engineer, Nortel Networks Optoelectronics
The opinions expressed in this message are not necessarily those of my
employer







Author: Pete Becker <petebecker@acm.org>
Date: Mon, 24 Sep 2001 17:27:31 GMT
Anthony Williams wrote:
>
> However, I believe that the interface would benefit greatly from enabling
> the use of general InputIterators and OutputIterators, as in:
>
> ...
>
> This would be more in keeping with the rest of the Standard Library, with
> full support for iterator ranges, rather than just arrays.
>

What problems would this solve that cannot be solved fairly easily with
the current language definition?

--
Pete Becker
Dinkumware, Ltd. (http://www.dinkumware.com)
