Topic: Language support for effective string constants
Author: Michiel Salters<Michiel.Salters@cmg.nl>
Date: Thu, 7 Jun 2001 16:45:57 GMT Raw View
In article <9flqf3$24h8$1@news.vol.cz>, Mirek Fidler says...
>> > I hope it is clear... It is backward compatible too... It costs 4 bytes
>> > per string constant on static memory, but spares strlen bytes on free
>> > storage and strlen + memcpy complexity.
>> Well, implementors can do so. Could you detect it in a conforming
>> C++ program ? No, therefore it is allowed.
> Yes, but it would eventually be good to have it in every implementation.
It's a bad idea to mandate space/speed tradeoffs for each and every
implementation. For game consoles, program size might be a more limiting
factor than free store (only so many bytes in a game ROM). Do you
want to tell them that they can't have their compiler compress strings?
Regards,
Michiel Salters
--
Michiel Salters
Consultant Technical Software Engineering
CMG Trade, Transport & Industry
Michiel.Salters@cmg.nl
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: Jens Kilian <Jens_Kilian@agilent.com>
Date: Thu, 7 Jun 2001 17:00:22 GMT Raw View
Michiel Salters<Michiel.Salters@cmg.nl> writes:
> > I hope it is clear... It is backward compatible too... It costs 4 bytes
> >per string constant on static memory, but spares strlen bytes on free
> >storage and strlen + memcpy complexity.
>
> Well, implementors can do so. Could you detect it in a conforming
> C++ program ? No, therefore it is allowed.
>
> PS. How big would the saving be ? Many strings are really short, I've
> heard averages of just 6 characters. I'm not sure if this applies to
> literals.
You could store the length using an encoding scheme like UTF-8, which would
reduce the space overhead for short strings.
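A minimal sketch of such a variable-length prefix, assuming a LEB128-style scheme (7 payload bits per byte, high bit set when more bytes follow) rather than UTF-8 proper. The names encode_length and decode_length are invented for illustration; nothing like them is proposed in this thread.

```cpp
#include <cstddef>
#include <vector>

// Encode `len` using 7 bits per byte; the high bit means "more bytes follow".
// Lengths below 128 cost a single byte.
std::vector<unsigned char> encode_length(std::size_t len) {
    std::vector<unsigned char> out;
    while (len >= 0x80) {
        out.push_back(static_cast<unsigned char>((len & 0x7F) | 0x80));
        len >>= 7;
    }
    out.push_back(static_cast<unsigned char>(len));
    return out;
}

// Decode a length written by encode_length.
// `used` receives the number of prefix bytes consumed.
std::size_t decode_length(const unsigned char* p, std::size_t& used) {
    std::size_t len = 0;
    unsigned shift = 0;
    used = 0;
    for (;;) {
        unsigned char b = p[used++];
        len |= static_cast<std::size_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) break;  // last byte of the prefix
        shift += 7;
    }
    return len;
}
```

With this encoding a 6-character string pays one byte of overhead instead of four. The price is that the prefix no longer has a fixed size, so it cannot simply be read at a fixed negative offset from the first character.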
--
mailto:jjk@acm.org phone:+49-7031-464-7698 (TELNET 778-7698)
http://www.bawue.de/~jjk/ fax:+49-7031-464-7351
PGP: 06 04 1C 35 7B DC 1F 26 As the air to a bird, or the sea to a fish,
0x555DA8B5 BB A2 F0 66 77 75 E1 08 so is contempt to the contemptible. [Blake]
Author: Gustavo Guerra <gmcg@mega.ist.utl.pt>
Date: Tue, 5 Jun 2001 15:36:55 GMT Raw View
What if you do this:
string s = string("Hello world!");
Shouldn't the compiler optimize it to a static constant?
On Tue, 5 Jun 2001, Mirek Fidler wrote:
> > string s = "Hello world !";
> >
> > Any string implementation is forced to do something like this in s
> > constructor (very simplified form):
> >
> > string::string(const char *s) {
> > len = strlen(s);
> > ptr = new char[len];
> > memcpy(ptr, s, len + 1);
> > }
>
> >It is not. It can recognize applications of the std::string constructor
> >to literals and treat them specially.
>
> But it is AFAIK impossible to create an efficient implementation for such
> a case - one that would be efficient for normal heap-allocated strings as
> well. If you have any tips, you are welcome.
>
> >Problems with such std::string literals are different:
>
> >- '\0' characters will break.
>
> Yes (actually, the problem is that \0 will NOT break).
>
> >- Since literals don't have the std::string type, the user can't treat
> > them as of type std::string in all contexts but must remember that
> > they are born as raw character arrays.
>
> Yes.
>
> >This doesn't work. The std::string constructor can be used not only
> >on literals but on arbitrary pointers to char, and you can't guarantee
> >that all arrays of characters will have a prefix:
> > char cs[12];
> > write_something_to (cs);
> > string s (cs + 6);
>
> What I was actually thinking about was something like
>
> string_const("Any text?");
>
> But anyway, it was just an idea. Perhaps not the best one ;-)))
>
> Mirek
by gugu
Author: "Mirek Fidler" <cxl@volny.cz>
Date: Mon, 4 Jun 2001 17:49:33 GMT Raw View
Perhaps it is only a minor problem, but consider the following statement:
    string s = "Hello world !";
Any string implementation is forced to do something like this in the
constructor of s (very simplified form):
    string::string(const char *s) {
        len = strlen(s);
        ptr = new char[len];
        memcpy(ptr, s, len + 1);
    }
I think that it could be improved so that none of strlen, memcpy or new is
used. What I am proposing is adding length information BEFORE the string
constant in memory. Also, this length information could be encoded in a way
that lets the string detect that it is a constant, so you would not have to
allocate free storage at all. Perhaps setting the MSB to 1 would be OK.
Then you could have, e.g., the following standard functions to work with
that (I am adding an implementation example too):
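    size_t get_string_const_length(const char *s) {
        // low 31 bits of the prefix hold the length (assumes a 32-bit prefix)
        return ((const size_t *)s)[-1] & 0x7fffffff;
    }
    bool is_string_constant(const char *s) {
        // MSB of the prefix marks a string constant
        return (((const size_t *)s)[-1] & 0x80000000) != 0;
    }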
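To make the layout concrete, here is a minimal runnable sketch, assuming the compiler would emit a size_t header directly before the characters. It only emulates that layout at run time in a byte buffer. The names make_prefixed_constant, read_header, prefixed_length and prefixed_is_constant are invented for illustration, and the flag is derived from the width of size_t rather than hard-coded for 32 bits.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Top bit of the header marks "this is a string constant".
const std::size_t kConstFlag = ~(~static_cast<std::size_t>(0) >> 1);

// Emulates what a compiler might emit for a literal under the proposal:
// a size_t header (length | flag) immediately followed by the characters.
// The text starts at storage.data() + sizeof(std::size_t).
std::vector<char> make_prefixed_constant(const char* text) {
    std::size_t len = std::strlen(text);
    std::size_t header = len | kConstFlag;
    std::vector<char> storage(sizeof header + len + 1);
    std::memcpy(storage.data(), &header, sizeof header);
    std::memcpy(storage.data() + sizeof header, text, len + 1);
    return storage;
}

// Read the header stored just before `s`. memcpy sidesteps alignment and
// aliasing problems. Only valid for pointers produced by the scheme above.
std::size_t read_header(const char* s) {
    std::size_t header;
    std::memcpy(&header, s - sizeof header, sizeof header);
    return header;
}

std::size_t prefixed_length(const char* s) {
    return read_header(s) & ~kConstFlag;
}

bool prefixed_is_constant(const char* s) {
    return (read_header(s) & kConstFlag) != 0;
}
```

Passing any pointer that does not carry such a header to these functions reads unrelated memory, which is why the scheme needs a way to tell prefixed constants apart from ordinary char pointers.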
Based on this, one could implement string somewhat like this:
    class string {
        struct Info {
            int count;   // reference count
            int alloc;   // allocated capacity
            int length;  // sits directly before the characters, like the literal's prefix
        };
        const char *ptr;
        bool  is_constant() const { return is_string_constant(ptr); }
        Info *get_info() const    { return (Info *)ptr - 1; }
    public:
        int length() const { return (int)get_string_const_length(ptr); }
        string(const string& s) {
            ptr = s.ptr;
            if(!is_constant())
                get_info()->count++;    // share the heap buffer
        }
        string(const char *s) {
            ptr = s;                    // s must be a prefixed string constant
        }
        ~string() {
            if(!is_constant()) {
                Info *info = get_info();
                if(--info->count == 0)  // last owner frees the block
                    delete[] (char *) info;
            }
        }
    };
I hope it is clear... It is backward compatible too... It costs 4 bytes
per string constant on static memory, but spares strlen bytes on free
storage and strlen + memcpy complexity.
Mirek
Author: Ron Natalie <ron@spamcop.net>
Date: Mon, 4 Jun 2001 18:40:30 GMT Raw View
Mirek Fidler wrote:
> string s = "Hello world !";
> string::string(const char *s) {
> len = strlen(s);
> ptr = new char[len];
> memcpy(ptr, s, len + 1);
> }
If you're talking specifically about string literals, you could fix it
by using a template that takes an array argument rather than char*.
By the way, the above code invokes undefined behavior.
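A minimal sketch of the array-argument template, assuming a toy class named lit_string (hypothetical, not std::string). The array bound N is deduced at compile time, so strlen is not needed; the copy into free store still happens.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical minimal string class, for illustration only.
class lit_string {
    char* ptr_;
    std::size_t len_;
public:
    // Binds to arrays of char such as string literals. N counts the
    // trailing '\0', so the length is N - 1 and known at compile time.
    template <std::size_t N>
    lit_string(const char (&s)[N]) : ptr_(new char[N]), len_(N - 1) {
        std::memcpy(ptr_, s, N);  // copies the characters and the terminator
    }
    ~lit_string() { delete[] ptr_; }

    // Copying is left out to keep the sketch short.
    lit_string(const lit_string&) = delete;
    lit_string& operator=(const lit_string&) = delete;

    std::size_t length() const { return len_; }
    const char* c_str() const { return ptr_; }
};
```

Note that the template also binds to any other char array, for example a partly filled char buf[12], and would then report the array size minus one rather than the length of the text in it.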
> I think that it could be improved so that neither strlen, memcpy or new is
> used. What I am proposing is adding length information
> BEFORE string constant in memory.
This is how Microsoft's BSTRs work (except they use wide chars in their
implementation).
However, this only works for strings allocated that way. In the case of
string literals, you can accomplish the goal (finding the length) by
applying sizeof to them. For other arrays of char, you lose the general
ability to address a substring within the string, plus you incur a
space penalty.
> Also, this length information could be
> changed in a way that could be used to detect that it is constant - so you
> would not have to allocate free storage at all. Perhaps setting msb to 1
> would be ok.
You've lost me here. It's not only the constness of the source that is the
issue with the dynamic allocation, but the immutability of the generated
string.
> I hope it is clear... It is backward compatible too... It costs 4 bytes
> per string constant on static memory, but spares strlen bytes on free
> storage and strlen + memcpy complexity.
>
Excuse me, where does it save any space?
Author: "Mirek Fidler" <cxl@volny.cz>
Date: Mon, 4 Jun 2001 19:55:55 GMT Raw View
> > string s = "Hello world !";
> > string::string(const char *s) {
> > len = strlen(s);
> > ptr = new char[len];
> > memcpy(ptr, s, len + 1);
> > }
>
> If you're talking specifically about string literals, you could fix it
> by using a template that takes an array argument rather than char*.
> By the way, the above code invokes undefined behavior.
Yes, (new char[len + 1] ;-). It was more a list of operations than
actual code.
> > I think that it could be improved so that neither strlen, memcpy or new
> > is used. What I am proposing is adding length information
> > BEFORE string constant in memory.
>
> However, this only works for strings allocated that way. In the case of
> string literals, you can accomplish the goal (finding the length) by
> applying sizeof to them. For other arrays of char, you lose the general
> ability to address a substring within the string, plus you incur a
> space penalty.
You are right. It would lead to a special constructor (perhaps in a derived
class string_const) just for string literals (converted to string
constants).
> > Also, this length information could be
> > changed in a way that could be used to detect that it is constant - so you
> > would not have to allocate free storage at all. Perhaps setting msb to 1
> > would be ok.
>
> You've lost me here. It's not only the constness of the source that is the
> issue with the dynamic allocation, but the immutability of the generated
> string.
You would use is_string_constant to detect whether a string is based on a
string literal or is allocated on the heap. There is an implementation trick
(perhaps system dependent, but that is OK in a standard library):
Info::length sits at the same offset from char *ptr as the length
information stored before a string literal. Both store the length of the
string, but in the case of a literal the MSB is also set to one, so string
can detect that ptr is a literal and change its operations accordingly.
> > I hope it is clear... It is backward compatible too... It costs 4 bytes
> > per string constant on static memory, but spares strlen bytes on free
> > storage and strlen + memcpy complexity.
> >
> Excuse me, where does it save any space?
You do not have to allocate heap memory to hold a copy of the string
literal, as is the case with the current string. Sure, you could create such
a string_const implementation now, but it would not be as efficient: at the
least, you would have to store more than a single pointer in each instance.
Mirek
Author: "Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl>
Date: Mon, 4 Jun 2001 20:05:48 GMT Raw View
Mon, 4 Jun 2001 17:49:33 GMT, Mirek Fidler <cxl@volny.cz> writes:
> string s = "Hello world !";
>
> Any string implementation is forced to do something like this in s
> constructor (very simplified form):
>
> string::string(const char *s) {
> len = strlen(s);
> ptr = new char[len];
> memcpy(ptr, s, len + 1);
> }
It is not. It can recognize applications of the std::string constructor
to literals and treat them specially.
Problems with such std::string literals are different:
- '\0' characters will break.
- Since literals don't have the std::string type, the user can't treat
them as of type std::string in all contexts but must remember that
they are born as raw character arrays.
> I think that it could be improved so that neither strlen, memcpy or
> new is used. What I am proposing is adding length information BEFORE
> string constant in memory. Also, this length information could be
> changed in a way that could be used to detect that it is constant -
> so you would not have to allocate free storage at all.
This doesn't work. The std::string constructor can be used not only
on literals but on arbitrary pointers to char, and you can't guarantee
that all arrays of characters will have a prefix:
char cs[12];
write_something_to (cs);
string s (cs + 6);
--
__("< Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
\__/
^^ SYGNATURA ZASTĘPCZA
QRCZAK
Author: "Mirek Fidler" <cxl@volny.cz>
Date: Tue, 5 Jun 2001 14:39:58 GMT Raw View
> string s = "Hello world !";
>
> Any string implementation is forced to do something like this in s
> constructor (very simplified form):
>
> string::string(const char *s) {
> len = strlen(s);
> ptr = new char[len];
> memcpy(ptr, s, len + 1);
> }
>It is not. It can recognize applications of the std::string constructor
>to literals and treat them specially.
But it is AFAIK impossible to create an efficient implementation for such
a case - one that would be efficient for normal heap-allocated strings as
well. If you have any tips, you are welcome.
>Problems with such std::string literals are different:
>- '\0' characters will break.
Yes (actually, the problem is that \0 will NOT break).
>- Since literals don't have the std::string type, the user can't treat
> them as of type std::string in all contexts but must remember that
> they are born as raw character arrays.
Yes.
>This doesn't work. The std::string constructor can be used not only
>on literals but on arbitrary pointers to char, and you can't guarantee
>that all arrays of characters will have a prefix:
> char cs[12];
> write_something_to (cs);
> string s (cs + 6);
What I was actually thinking about was something like
string_const("Any text?");
But anyway, it was just an idea. Perhaps not the best one ;-)))
Mirek
Author: Michiel Salters<Michiel.Salters@cmg.nl>
Date: Wed, 6 Jun 2001 16:20:14 GMT Raw View
In article <9ffis5$l1j$1@news.vol.cz>, Mirek Fidler says...
>
>Perhaps it is only a minor problem, but consider following statement
>
>string s = "Hello world !";
>
>Any string implementation is forced to do something like this in s
>constructor (very simplified form):
>
>string::string(const char *s) {
> len = strlen(s);
> ptr = new char[len];
> memcpy(ptr, s, len + 1);
>}
No. An implementor might do something like this: the result will be an
object with bit pattern { 00 00 00 0D 01 48 65 6C 6C 6F 20 ... }, so add
these bytes to the executable, and compile string s = "Hello world !" as a
memcpy of these bytes.
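A rough sketch of that idea, assuming a hypothetical string type whose whole object is a flat block of bytes (inline storage, no heap pointer). The type inline_string and the function make_hello are invented here; a real std::string usually holds a pointer, so its image could not be prepared quite this simply.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical fixed-layout string with inline storage, so the whole
// object is a plain block of bytes that memcpy can reproduce.
struct inline_string {
    unsigned char len;
    char text[23];
};

// What the compiler could place in the executable: the finished object image.
static const inline_string hello_image = { 13, "Hello world !" };

// Initialising the object then reduces to one fixed-size copy:
// no strlen and no new.
inline_string make_hello() {
    inline_string s;
    std::memcpy(&s, &hello_image, sizeof s);
    return s;
}
```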
>I think that it could be improved so that neither strlen, memcpy or new is
>used. What I am proposing is adding length information
>BEFORE string constant in memory. Also, this length information could be
>changed in a way that could be used to detect that it is constant - so you
>would not have to allocate free storage at all. Perhaps setting msb to 1
>would be ok.
[ snip example of use ]
> I hope it is clear... It is backward compatible too... It costs 4 bytes
>per string constant on static memory, but spares strlen bytes on free
>storage and strlen + memcpy complexity.
Well, implementors can do so. Could you detect it in a conforming
C++ program ? No, therefore it is allowed.
PS. How big would the saving be ? Many strings are really short, I've
heard averages of just 6 characters. I'm not sure if this applies to
literals.
Regards,
Michiel Salters
--
Michiel Salters
Consultant Technical Software Engineering
CMG Trade, Transport & Industry
Michiel.Salters@cmg.nl
Author: "Mirek Fidler" <cxl@volny.cz>
Date: Wed, 6 Jun 2001 20:18:42 GMT Raw View
> > I hope it is clear... It is backward compatible too... It costs 4 bytes
> >per string constant on static memory, but spares strlen bytes on free
> >storage and strlen + memcpy complexity.
>
> Well, implementors can do so. Could you detect it in a conforming
> C++ program ? No, therefore it is allowed.
Yes, but it would eventually be good to have it in every implementation.
> PS. How big would the saving be ? Many strings are really short, I've
> heard averages of just 6 characters. I'm not sure if this applies to
> literals.
Well, but at the moment a string literal is copy-constructed into a string,
the result is at least 20 bytes long in a normal string implementation (I
have added 4 bytes for heap overhead).
In fact, that is what this proposal was about.
Mirek
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Wed, 6 Jun 2001 20:18:56 GMT Raw View
I have this idea that now that C++ has a std::string type, it should be
possible to produce a corresponding (length, char*) pair. It could for
example be strings of the form
`Hello World'
and one should then be able to somehow pick up both the char* pointer and
the length.
Then std::string could be written so that it is possible for const strings
to point at such a `...' string.
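For comparison, C++17 later added std::string_view, which is such a (pointer, length) pair, together with the sv literal suffix that builds one from an ordinary literal. The sketch below uses only those two standard facilities; the variable names are arbitrary. std::string itself still copies when constructed from a view, so this covers the pair part of the idea, not a std::string that points at the literal.

```cpp
#include <cstddef>
#include <string_view>

using namespace std::string_view_literals;

// A string_view is a (pointer, length) pair; it neither owns nor copies
// the characters. The sv suffix fixes the length at compile time.
constexpr std::string_view greeting = "Hello World"sv;

// Embedded '\0' characters survive, because the length does not come
// from strlen.
constexpr std::string_view with_nul = "ab\0cd"sv;
```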
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>