Topic: Text_view: A C++ concepts and range based character
Author: Tom Honermann <Thomas.Honermann@synopsys.com>
Date: Mon, 8 Feb 2016 06:27:09 +0000
I am planning to submit a paper for the Jacksonville mailing (submission
deadline this Friday!) discussing a library I've been developing that
provides code point enumeration support for modern and legacy character
encodings. I will be attending the Jacksonville meeting and hope to
present the paper there. The intent of this email is to request some
early feedback to help guide writing the paper and to prepare myself to
address concerns raised.
The library is named text_view and is available at
https://github.com/tahonermann/text_view
The readme file found there provides a short overview, feature
description, terminology description, list of supported character
encodings, and a specification of the interface. The readme file is
still rough and lacking in prose to describe many of the classes. I
plan to improve it soon, but am hopeful that it suffices to at least
provide a sense of the library and how to use it. Contributions welcome!
Text_view avoids introducing another string type. Instead, it provides
facilities for constructing a view over any range, view, or container
that holds a code unit sequence; the view associates an encoding with
the code unit sequence and provides iterators that decode the sequence
and produce code point values. The value type of the iterator type is a
character type that associates the code point value with a character set.
An example taken from the overview follows. Note that \u00F8 (LATIN
SMALL LETTER O WITH STROKE) is encoded as UTF-8 using two code units
(\xC3\xB8), but iterator-based enumeration sees just the single code point.
using CT = utf8_encoding::character_type;
auto tv = make_text_view<utf8_encoding>(u8"J\u00F8erg is my friend");
auto it = tv.begin();
assert(*it++ == CT{0x004A}); // 'J'
assert(*it++ == CT{0x00F8}); // 'ø'
assert(*it++ == CT{0x0065}); // 'e'
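For readers who want to see the code unit to code point relationship without
the library, here is a minimal standalone sketch. It does not use text_view;
decode_utf8 and check_example are hypothetical names introduced only for this
illustration, and the decoder assumes well-formed UTF-8 (no error handling
for truncated or ill-formed sequences).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of text_view: decodes a UTF-8 code unit
// sequence into code points. Assumes the input is well-formed UTF-8.
std::vector<char32_t> decode_utf8(const std::string& units) {
    std::vector<char32_t> code_points;
    std::size_t i = 0;
    while (i < units.size()) {
        const unsigned char lead = static_cast<unsigned char>(units[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        // The lead code unit determines the sequence length and
        // contributes the high-order bits of the code point.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else                            { cp = lead & 0x07; len = 4; }
        // Each trailing code unit contributes its low six bits.
        for (std::size_t k = 1; k < len && i + k < units.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(units[i + k]) & 0x3F);
        }
        code_points.push_back(cp);
        i += len;
    }
    return code_points;
}

// Hypothetical usage check mirroring the first three assertions above.
void check_example() {
    // The literal is split so that 'e' is not parsed as part of the \xB8
    // hexadecimal escape.
    const std::string units = "J\xC3\xB8" "erg";
    const std::vector<char32_t> cps = decode_utf8(units);
    assert(units.size() == 6);   // six code units
    assert(cps.size() == 5);     // five code points
    assert(cps[0] == 0x004A);    // 'J'
    assert(cps[1] == 0x00F8);    // 'ø', decoded from \xC3\xB8
    assert(cps[2] == 0x0065);    // 'e'
}
```

The sketch decodes eagerly into a vector, whereas text_view as described above
exposes the decoding lazily through iterators over the underlying code unit
sequence.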
Please see the readme file at [1] for more examples and details.
I see this library as a very small, but fundamental step towards
improving support for Unicode within the standard library. Thank you
for any feedback!
Tom.
[1]: Text_view: A C++ Concepts based character encoding and code point
enumeration library
https://github.com/tahonermann/text_view