Topic: Text_view: A C++ concepts and range based character
Author: Tom Honermann <Thomas.Honermann@synopsys.com>
Date: Mon, 8 Feb 2016 06:27:09 +0000
I am planning to submit a paper for the Jacksonville mailing (submission
deadline this Friday!) discussing a library I've been developing that
provides code point enumeration support for modern and legacy character
encodings.  I will be attending the Jacksonville meeting and hope to
present the paper there.  The intent of this email is to request some
early feedback to help guide writing the paper and to prepare myself to
address concerns raised.

The library is named text_view and is available at
https://github.com/tahonermann/text_view

The readme file found there provides a short overview, feature
description, terminology description, list of supported character
encodings, and a specification of the interface.  The readme file is
still rough and lacking in prose to describe many of the classes.  I
plan to improve it soon, but am hopeful that it suffices to at least
provide a sense of the library and how to use it.  Contributions welcome!

Text_view avoids introducing another string type.  Instead, it provides
facilities for constructing a view over any range, view, or container
that holds a code unit sequence; the view associates an encoding with
the code unit sequence and provides iterators that decode the sequence
and produce code point values.  The value type of the iterator type is a
character type that associates the code point value with a character set.

An example taken from the overview follows.  Note that \u00F8 (LATIN
SMALL LETTER O WITH STROKE) is encoded as UTF-8 using two code units
(\xC3\xB8), but iterator-based enumeration sees just the single code point.

using CT = utf8_encoding::character_type;
auto tv = make_text_view<utf8_encoding>(u8"J\u00F8erg is my friend");
auto it = tv.begin();
assert(*it++ == CT{0x004A}); // 'J'
assert(*it++ == CT{0x00F8}); // 'ø'
assert(*it++ == CT{0x0065}); // 'e'
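
For readers who want to see where the two code units go, here is a
standalone sketch of the decoding step that is independent of text_view.
The function name decode_utf8 is hypothetical and is not part of the
library's interface; it assumes well-formed UTF-8 input and performs no
validation, so treat it as an illustration of the idea rather than as how
text_view is implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, NOT part of text_view: decodes a UTF-8 code unit
// sequence into code point values. Assumes well-formed input; malformed
// sequences are not detected.
std::vector<std::uint32_t> decode_utf8(const std::string& units) {
    std::vector<std::uint32_t> code_points;
    std::size_t i = 0;
    while (i < units.size()) {
        const unsigned char lead = static_cast<unsigned char>(units[i]);
        std::uint32_t cp = 0;
        std::size_t trailing = 0;  // number of continuation code units

        if (lead < 0x80) {                  // 0xxxxxxx: one code unit
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) { // 110xxxxx: two code units
            cp = lead & 0x1F;
            trailing = 1;
        } else if ((lead & 0xF0) == 0xE0) { // 1110xxxx: three code units
            cp = lead & 0x0F;
            trailing = 2;
        } else {                            // 11110xxx: four code units
            cp = lead & 0x07;
            trailing = 3;
        }
        ++i;

        // Each continuation code unit (10xxxxxx) contributes six bits.
        for (std::size_t k = 0; k < trailing && i < units.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(units[i]) & 0x3F);
        }
        code_points.push_back(cp);
    }
    return code_points;
}
```

Given the six code units of "J\xC3\xB8" "erg" (the literal is split so
that the hex escape does not absorb the following 'e'), this sketch
produces five code points, the second of which is 0x00F8.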

Please see the readme file at [1] for more examples and details.

I see this library as a very small, but fundamental step towards
improving support for Unicode within the standard library.  Thank you
for any feedback!

Tom.

[1]: Text_view: A C++ Concepts based character encoding and code point
      enumeration library
      https://github.com/tahonermann/text_view
