Topic: Interest in transducers framework for Unicode


Author: Mathias Gaunard <mathias@gaunard.com>
Date: Wed, 4 Nov 2015 12:21:00 +0000

Hi,

There has been a lot of discussion about character encoding conversion and
Unicode, but as far as I know it never went anywhere.

To address the character encoding conversion problem (e.g. conversion
between UTF-8, UTF-16, UTF-32, narrow and wide encodings), I suggest
proposing a transducer framework.

A transducer would be a range conversion facility that works in terms of
chunks and can be applied either eagerly to a whole range or lazily as the
range is traversed. It could also be used to build an adaptor on streams.
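
To make the idea concrete, a transducer might be sketched roughly as below.
This is only an illustration under stated assumptions, not the design from
the 2009 work: the names utf8_decoder, transduce and decode_utf8 are
hypothetical, a chunk is taken to be one code point, and the decoder does not
reject overlong or surrogate encodings.

```cpp
#include <cassert>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical transducer: consumes one chunk (one UTF-8 sequence) from
// [first, last), writes one code point to out, returns the new input position.
struct utf8_decoder {
    template <class In, class Out>
    In operator()(In first, In last, Out& out) const {
        assert(first != last);  // a chunk needs at least one byte
        const unsigned char lead = static_cast<unsigned char>(*first++);
        char32_t cp;
        int trail;  // number of continuation bytes still expected
        if (lead < 0x80)               { cp = lead;        trail = 0; }
        else if ((lead >> 5) == 0x06)  { cp = lead & 0x1F; trail = 1; }
        else if ((lead >> 4) == 0x0E)  { cp = lead & 0x0F; trail = 2; }
        else if ((lead >> 3) == 0x1E)  { cp = lead & 0x07; trail = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        for (; trail > 0; --trail) {
            if (first == last)
                throw std::invalid_argument("truncated UTF-8 sequence");
            const unsigned char c = static_cast<unsigned char>(*first++);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        *out++ = cp;
        return first;
    }
};

// Eager application: run the transducer chunk by chunk over the whole range.
template <class Transducer, class In, class Out>
Out transduce(Transducer t, In first, In last, Out out) {
    while (first != last) first = t(first, last, out);
    return out;
}

// Convenience wrapper (hypothetical): UTF-8 string in, UTF-32 string out.
std::u32string decode_utf8(const std::string& s) {
    std::u32string result;
    transduce(utf8_decoder(), s.begin(), s.end(), std::back_inserter(result));
    return result;
}
```

A lazy adaptor would call the same utf8_decoder from its increment operation
instead of from the loop in transduce, which is the part that is hard to make
as fast as the eager form.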

I already did some preliminary work on this back in 2009, but one of the
challenges was mapping this efficiently to the iterator adaptor model
without compromising performance of eager evaluation. I think this could be
revisited now with the ranges TS.

Any interest, or people willing to work together to get a proposal ready?

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.


.


Author: Robert Ramey <ramey@rrsd.com>
Date: Wed, 4 Nov 2015 08:40:55 -0800
On 11/4/15 4:21 AM, Mathias Gaunard wrote:
> Hi,
>
> A lot of discussion has been had over character encoding conversion and
> Unicode, but never went anywhere as far as I know.

<snip>

Have you looked at Boost.Locale? If so, what is your assessment of it?

Robert Ramey



.


Author: Mathias Gaunard <mathias@gaunard.com>
Date: Wed, 4 Nov 2015 16:55:35 +0000

On Wed, Nov 4, 2015 at 4:40 PM, Robert Ramey <ramey@rrsd.com> wrote:

> On 11/4/15 4:21 AM, Mathias Gaunard wrote:
>
>> Hi,
>>
>> A lot of discussion has been had over character encoding conversion and
>> Unicode, but never went anywhere as far as I know.
>>
>
> <snip>
>
> Have you looked at Boost.Locale ?  If so what is your assessment of this?


I'm aware of Boost.Locale and even reviewed it when it was submitted to
Boost. However, I'm not aware of Boost.Locale ever being proposed as-is for
inclusion in the standard. Have I missed something?

Boost.Locale covers a wide spectrum of Unicode features; what I'm proposing
is just a generic mechanism that could be used to address the conversion
aspect.
In terms of conversions, Boost.Locale takes a pair of pointers and returns
a basic_string. What I think would be interesting is a mechanism where the
same code can perform the conversion either like that or as an iterator
adaptor (like the ones in boost/regex/pending/unicode_iterator.hpp).
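
As a rough sketch of "both from the same code", the example below drives one
encoder two ways: eagerly, from a pair of pointers to a string, and lazily,
through an input iterator that buffers one chunk at a time. All names here
(utf8_encoder, to_utf8, utf8_encoding_iterator, to_utf8_lazily) are
hypothetical and are not Boost.Locale or Boost.Regex APIs; code points are
assumed to be valid and are not checked.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical transducer: consumes one code point and writes its
// 1 to 4 UTF-8 bytes to out. Input is not validated.
struct utf8_encoder {
    template <class In, class Out>
    In operator()(In first, In /*last*/, Out& out) const {
        const char32_t cp = *first++;
        if (cp < 0x80) {
            *out++ = static_cast<char>(cp);
        } else if (cp < 0x800) {
            *out++ = static_cast<char>(0xC0 | (cp >> 6));
            *out++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            *out++ = static_cast<char>(0xE0 | (cp >> 12));
            *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *out++ = static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            *out++ = static_cast<char>(0xF0 | (cp >> 18));
            *out++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *out++ = static_cast<char>(0x80 | (cp & 0x3F));
        }
        return first;
    }
};

// Eager use: pair of pointers in, string out.
std::string to_utf8(const char32_t* first, const char32_t* last) {
    std::string result;
    std::back_insert_iterator<std::string> out(result);
    utf8_encoder encode;
    while (first != last) first = encode(first, last, out);
    return result;
}

// Lazy use: an input iterator that runs the same encoder one chunk at a time.
template <class It>
class utf8_encoding_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char;
    using difference_type = std::ptrdiff_t;
    using pointer = const char*;
    using reference = char;

    utf8_encoding_iterator(It pos, It end) : cur_(pos), pos_(pos), end_(end) { fill(); }

    char operator*() const { assert(idx_ < len_); return buf_[idx_]; }
    utf8_encoding_iterator& operator++() {
        if (++idx_ == len_) fill();  // chunk exhausted: encode the next one
        return *this;
    }
    utf8_encoding_iterator operator++(int) { auto tmp = *this; ++*this; return tmp; }
    friend bool operator==(const utf8_encoding_iterator& a, const utf8_encoding_iterator& b) {
        return a.cur_ == b.cur_ && a.idx_ == b.idx_;
    }
    friend bool operator!=(const utf8_encoding_iterator& a, const utf8_encoding_iterator& b) {
        return !(a == b);
    }

private:
    void fill() {
        cur_ = pos_;
        idx_ = 0;
        len_ = 0;
        if (pos_ != end_) {
            char* out = buf_;
            pos_ = utf8_encoder()(pos_, end_, out);
            len_ = static_cast<std::size_t>(out - buf_);
        }
    }
    It cur_;            // start of the chunk currently buffered
    It pos_;            // start of the next chunk
    It end_;
    char buf_[4] = {};  // one encoded code point
    std::size_t idx_ = 0, len_ = 0;
};

std::string to_utf8_lazily(const char32_t* first, const char32_t* last) {
    using iterator = utf8_encoding_iterator<const char32_t*>;
    return std::string(iterator(first, last), iterator(last, last));
}
```

The per-chunk buffer in the adaptor is where the eager form tends to win:
the eager loop writes straight into the destination, while the adaptor pays
for the intermediate copy and the extra bookkeeping on every increment.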


.


Author: Robert Ramey <ramey@rrsd.com>
Date: Wed, 4 Nov 2015 10:57:00 -0800
On 11/4/15 8:55 AM, Mathias Gaunard wrote:

> I'm aware of Boost.Locale and even reviewed it during its entry in Boost.
> I'm not aware however of Boost.Locale ever being proposed as-is for
> inclusion in the standard, have I missed something?

> Boost.Locale covers a wide spectrum of Unicode features, what I'm
> proposing is just a generic mechanism that could be used to address the
> conversion aspect.
> In terms of conversions, Boost.Locale takes a pair of pointers and
> returns a basic_string. What I think would be interesting is to have a
> mechanism where conversion can either be performed like that or as an
> iterator adaptor (like the ones in
> boost/detail/pending/unicode_iterator.hpp) from the same code.

This seems to be a rich subject.

I know that the current standard contains codecvt facets for
transforming from one character stream to another:

http://en.cppreference.com/w/cpp/locale/codecvt

In correspondence, the author of Boost.Locale has raised a lot of
criticism of these.  On the other hand, the documentation of
Boost.Locale hasn't been as helpful to me as I would like.

And these facets could be wrapped in a different interface.
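
As one possible illustration of such a wrapper, the sketch below hides the
seven-argument codecvt::out call behind a function that takes a wide string
and returns a narrow one. The function name narrow is hypothetical, the
default of the classic "C" locale is an assumption made only to keep the
example self-contained, and the final unshift step needed by stateful
encodings is left out.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a standard or Boost function): narrows a wide
// string through the codecvt facet of the given locale.
std::string narrow(const std::wstring& in,
                   const std::locale& loc = std::locale::classic()) {
    using facet_type = std::codecvt<wchar_t, char, std::mbstate_t>;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    std::mbstate_t state = std::mbstate_t();
    std::string result;
    char buf[16];  // small on purpose, so the chunked loop is exercised
    const wchar_t* from = in.data();
    const wchar_t* const from_end = from + in.size();

    while (from != from_end) {
        const wchar_t* from_next = from;
        char* to_next = buf;
        const std::codecvt_base::result r = cvt.out(
            state, from, from_end, from_next, buf, buf + sizeof buf, to_next);
        // This facet is a converting one, so noconv is not expected here.
        assert(r != std::codecvt_base::noconv);
        if (r == std::codecvt_base::error || from_next == from)
            throw std::runtime_error("cannot convert character");
        result.append(buf, to_next);
        from = from_next;  // 'partial' means the buffer filled up: go around again
    }
    return result;
}
```

The loop over a fixed buffer is already a chunk-at-a-time conversion, so an
iterator or range adaptor could in principle be layered on the same facet
call.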

I have had to dip into this subject on multiple occasions and it's
always been a source of frustration.  I like the iterator adaptor
approach; I've embodied it in Boost.Serialization (other classes /
dataflow iterators).  I would like to see something like this, based on
the new ranges, become the basis of a whole library for string
character conversion.  Maybe that's similar to what you're suggesting.
Or maybe it's already included in Boost.Locale and I just haven't been
able to find it yet.

Still a ripe subject and a big job.

Robert Ramey


.