Topic: memstream discussion (was strstream reinstatement)


Author: AlbertoBarbati@libero.it (Alberto Ganesh Barbati)
Date: Wed, 13 Sep 2006 18:20:47 GMT
werasm wrote:
>> // this function is called by some legacy framework,
>> // it's expected to fill the given buffer with something
>> void legacy_callback(char* buf, size_t buflen)
>> {
>>   omemstream formatter(buf, buflen);
>>   /* use formatter to fill the buffer and check overflow */
>> }
>
> This case IMO is so vastly different from the other, that it justifies
> having a unique/different ostream entity to handle it. Why provide a
> (do-it-all) interface. That is the problem with the current ostrstream.

What makes this example so different from the other one? I don't
understand. I'm not trying to do-it-all with one component. They really
look like the same problem to me.

> I think we need a new entity that wraps around an array reference and
> performs operations on the referenced array... array_ostream? I'll have
> a look at what boost offers in terms of this.

I'm still not convinced that having a separate wrapper is a good thing,
or that it is necessary.

About Boost, if we had the iostream library then we would need no
memstreams, because they'd be already there. Have a look at classes
basic_array_source and basic_array_sink. Boost.iostream is certainly a
superior solution to this and many other problems. It simply hasn't been
formally proposed (yet), otherwise I would not have submitted my proposal.

> template <unsigned N, bool AddNull = true>
> class memstream
> {
>   public: //...
>
>   enum {SzToAdd = AddNull ? 1 : 0};
>   enum{ Sz = N+SzToAdd };
>
>   memstream( char (&rarray)[Sz] )
>    : rarray_( rarray )
>   {
>     rarray_[Sz-1] = '\0';
>   }
>
>    char (&rarray_)[Sz];
> };
>
> If he specifies the input buffer size as erroneous, then compilation
> error. See why specifying size becomes safe/handy.

Hmmm... What I don't understand is why bother the programmer to remember
the size of the array and punish him if he gets it wrong? Why don't we
just use a template to select automatically the correct size for him? In
*addition* to the constructor:

  memstream(char* s, size_t n);

we could have

  template <int N>
  explicit memstream(char (&s)[N]);

which automatically selects the right size. The programmer can't go
wrong and his life would be easier.
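
A minimal sketch of the two constructors side by side. The class name
buffer_ref is hypothetical (it only records the pointer and the size, it
is not the proposed stream), and std::size_t is assumed for the deduced
bound instead of int:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the proposed memstream: it only records
// which buffer it was given and how large that buffer is.
class buffer_ref
{
public:
  // General form: the caller supplies the size explicitly.
  buffer_ref(char* s, std::size_t n) : data_(s), size_(n) {}

  // Array form: N is deduced from the type of the argument.
  template <std::size_t N>
  explicit buffer_ref(char (&s)[N]) : data_(s), size_(N) {}

  char* data() const { return data_; }
  std::size_t size() const { return size_; }

private:
  char* data_;
  std::size_t size_;
};

// Returns the size deduced for a 100-char array.
inline std::size_t deduced_size()
{
  char a[100];
  buffer_ref r(a);          // picks the array constructor, N == 100
  assert(r.data() == a);    // still refers to the caller's array
  return r.size();
}
```

The pointer-and-size form stays available for buffers whose size is only
known at run time, such as the legacy_callback case.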

>> 2) Alternatively, you could zero-initialize the buffer on the class
>> constructor, but then we would fail the goal to be as lightweight as
>> possible, performing an operation that the user might not want to pay.
>
> Only necessary to set that last character to NULL, no great penalty.

You don't seem to be joking, so it must be me that is missing
something... How could a NUL in the last position be enough? What if I
don't fill the buffer entirely? Suppose I use the memstream as in *your*
example:

  char a[100];
  memstream<99, true> s(a); // Sz == 100, so a[99] is now '\0'
  s << "X"; // just fill the first char of the buffer
  legacy_function(a); // oops!

you are passing to legacy_function a buffer with a 'X', 98 rubbish chars
and a '\0'. Yes, we won't have UB, but legacy_function will get
rubbish... I don't think that's tolerable. You can't even be sure about
the length of the string, because there may or may not be other NULs in
the rubbish.
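
The effect can be reproduced without any stream class. This is only a
simulation under stated assumptions: the buffer is pre-filled with '?' so
that the "rubbish" is deterministic (in the real case those bytes are
indeterminate), and the function name is made up for the illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simulates terminating only the last byte and then writing one char.
inline std::size_t length_seen_by_legacy_code()
{
  char a[100];
  std::memset(a, '?', sizeof a);   // stand-in for uninitialized rubbish
  a[sizeof a - 1] = '\0';          // what the constructor would do
  a[0] = 'X';                      // what `s << "X"` would store
  return std::strlen(a);           // what a C function would see
}
```

With this fill pattern the C function sees a 99-character string, not a
1-character one.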

>> So, you see, #1 is a no-no, #2 and #3 introduce a non-negligible cost
>> and potential confusion in certain situations.
>
> #2 IMO is negligible. Null terminating the last position...

If it worked...

>> Moreover I disagree with you that the ends is so easy to
>> forget. That would explain my resistance to have null-
>> termination as the default behaviour.
>
> Adding ends is inconsistent with how output streams are used in
> general, and for this reason easy to forget (or not know in the first
> place). Forcing NULL termination at least prevents (or lessens the
> chance of) UB, which is my objective.

Here we are in the realm of opinions. I don't find the usage of
std::ends inconsistent in general, just very uncommon. Compare it with
std::endl which is used when writing to a file to force a flush.
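
For comparison, a small sketch of what the two manipulators insert, using
std::ostringstream only because its contents are easy to inspect (the
function names are made up for the illustration):

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// std::ends inserts a single '\0' character and does not flush.
inline std::string with_ends()
{
  std::ostringstream os;
  os << "X" << std::ends;
  return os.str();          // two characters: 'X' and '\0'
}

// std::endl inserts '\n' and then flushes the stream.
inline std::string with_endl()
{
  std::ostringstream os;
  os << "X" << std::endl;
  return os.str();          // two characters: 'X' and '\n'
}
```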

>> However, #3 looks
>> very promising as it would be optional and I am indeed inclined to
>> consider it as soon as I find a way to properly handle the overflow. I
>> have to thank you for making me think about it.
>
> Pleasure, #3 looks promising to me too and is more consistent with how
> other output streams are used.
>

If you speak of inconsistency, then #3 is indeed a departure from the
standard stream classes because it would be the first case where a
stream class writes something in the buffer that is not a direct result
of the application data.

Anyway, the basic controversy here is if (output) memstreams should be a
facility to build null-terminated strings or not. My proposal stems from
the assumption that "The proposed templates also don't try
to provide any view of the underlying buffer as a string" and is
consistent with that. I understand that this assumption is perhaps too
draconian and may limit the usefulness of the component, but it helps to
keep the interface simple and efficient. If the committee decides that
this makes my memstreams not useful enough or, worse, too error-prone,
then I'm ready to relax the assumption and study other options. But,
please don't say that I want to do-it-all if the component you have in
mind stems from opposite assumptions than mine.

Regards,

Ganesh

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]





Author: "werasm" <w_erasm@telkomsa.net>
Date: Thu, 14 Sep 2006 07:57:37 CST
Alberto Ganesh Barbati wrote:
> werasm wrote:

> > This case IMO is so vastly different from the other, that it justifies
> > having a unique/different ostream entity to handle it. Why provide a
> > (do-it-all) interface. That is the problem with the current ostrstream.
>
> What makes this example so different from the other one? I don't
> understand. I'm not trying to do-it-all with one component. They really
> look like the same problem to me.

OK, vastly is perhaps too strong a word. From a usage perspective, there
exists enough compile-time information in the one case to use a
reference to array, and in the other case not. This does not matter
though (looked at your proposal, BTW :-), you're right.

template <int N>
basic_membuf( char (&s)[N],
  std::ios_base::openmode mode =
    std::ios_base::in | std::ios_base::out )
{
  if( N > max_size() )
  { throw std::length_error( "buffer size too large" ); }
  bufsize_ = N;
  // N is the deduced array bound
  if( mode & std::ios_base::out )
  { this->setp( s, s + N ); }
  if( mode & std::ios_base::in )
  { this->setg( s, s, s + N ); }
}
Maybe overloading with this is a possibility... The stream still does
not assume ownership. No big difference, you're right.
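
A self-contained sketch of such an overload on a non-owning stream
buffer. The name array_membuf and the helpers are hypothetical, and the
max_size()/bufsize_ bookkeeping of the proposal is left out; only
std::streambuf's own setp/setg are used:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ios>
#include <istream>
#include <ostream>
#include <streambuf>

// Hypothetical non-owning buffer with both constructor forms.
class array_membuf : public std::streambuf
{
public:
  array_membuf(char* s, std::size_t n,
               std::ios_base::openmode mode =
                 std::ios_base::in | std::ios_base::out)
  { init(s, n, mode); }

  // Array form: the bound N is deduced, the caller cannot get it wrong.
  template <std::size_t N>
  explicit array_membuf(char (&s)[N],
                        std::ios_base::openmode mode =
                          std::ios_base::in | std::ios_base::out)
  { init(s, N, mode); }

private:
  void init(char* s, std::size_t n, std::ios_base::openmode mode)
  {
    if (mode & std::ios_base::out) setp(s, s + n);      // put area
    if (mode & std::ios_base::in)  setg(s, s, s + n);   // get area
  }
};

// Writes six chars into a 4-char array; true if the stream reports it.
inline bool overflow_is_reported(char (&out)[4])
{
  array_membuf mb(out, std::ios_base::out);
  std::ostream os(&mb);
  os << "abcdef";   // the default overflow() fails once the array is full
  return !os;
}

// Reads the first integer from a caller-owned array.
template <std::size_t N>
int first_int(char (&in)[N])
{
  array_membuf mb(in, std::ios_base::in);
  std::istream is(&mb);
  int v = 0;
  is >> v;
  return v;
}
```

Nothing is copied in either direction: the stream reads from and writes
to the caller's array directly.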

> > If he specifies the input buffer size as erroneous, then compilation
> > error. See why specifying size becomes safe/handy.
>
> Hmmm... What I don't understand is why bother the programmer to remember
> the size of the array and punish him if he gets it wrong? Why don't we
> just use a template to select automatically the correct size for him? In
> *addition* to the constructor:
>
>   memstream(char* s, size_t n);
>
> we could have
>
>   template <int N>
>   explicit memstream(char (&s)[N]);

Yes.

> which automatically selects the right size. The programmer can't go
> wrong and his life would be easier.

Yes, I just wanted to emphasize that the last character should be left
untouched (and NULL). The user does not use it anyway. In addition to
this, streaming can always add a NULL (#3). I think, although this is
not consistent in the sense you've mentioned, it is consistent from a
usage perspective (the user never explicitly needs to NULL-terminate - as
with std::cout, for instance). For the other cases (stringstream) NULL
termination is not necessary as std::string is per definition NULL
terminated.

> > Only necessary to set that last character to NULL, no great penalty.
>
> You don't seem to be joking, so it must be me that is missing
> something... How could a NUL in the last position be enough? What if I
> don't fill the buffer entirely? Suppose I use the memstream as in *your*
> example:

If you use #3, this becomes redundant, if you don't - you have a trade
off. Either zero-initialise all (for the price of efficiency), or at
least the last character and leave it untouched. This allows the user
to blindly stream data to the buffer, and not verify whether the
streaming operation failed (only using the buffer). Sometimes users
don't care when textual data is truncated, or whether streaming was
totally successful (we've had those cases). They may want to use the
buffer despite truncation, but then they require the NULL termination -
at least.

char buffer[x];
std::strstream out( x, sizeof(x) );
out << "more data than x can take...."
".............................................."; //Oops, forgot ends!
//works fine, buffer null terminated.
legacy_f_requiring_null_termination( xstr );

I know this is liberal and you may differ in opinion. We've used this
when streaming diagnostic data over UDP, and truncation does not matter
so much.

> you are passing to legacy_function a buffer with a 'X', 98 rubbish chars
> and a '\0'. Yes, we won't have UB, but legacy_function will get
> rubbish... I don't think that's tolerable. You can't even be sure about
> the length of the string, because there may or may not be other NULs in
> the rubbish.

Trade offs... either #3, or the less efficient zero initialization. I
like #3. True, sometimes it is not tolerable, other times it is. In
fact, I like #3 most of all.

> If you speak of inconsistency, then #3 is indeed a departure from the
> standard stream classes because it would be the first case where a
> stream class writes something in the buffer that is not a direct result
> of the application data.

Yes, but from a visible usage perspective it is consistent, in that for
other cases streaming std::ends hardly ever happens. Do you ever see:

std::cout << std::ends; ? or
std::stringstream inst;
inst << std::ends;?

Bear in mind, there are many bad programmers out there, not reading the
standards as you do. I have, on a large project, previously searched for
the lack of ends, and found it, fixing some interesting bugs. People
don't necessarily buy the STL bible when they want to use strstream.

> Anyway, the basic controversy here is if (output) memstreams should be a
> facility to build null-terminated strings or not. My proposal stems from
> the assumption that "The proposed templates also don't try
> to provide any view of the underlying buffer as a string" and is
> consistent with that.

Yes, unfortunately the result can (and (most) often will) be used as a
c string. I think this assumption is too strict, and with slight change
can be enforced. I like the rest though. The assumption of never owning
the memory is an excellent one.

> But,
> please don't say that I want to do-it-all if the component you have in
> mind stems from opposite assumptions than mine.

Yes, accept defeat wrt. this point :-). What you are proposing is
certainly not a do-it-all interface. strstream was, though.

Regards,

Werner






Author: AlbertoBarbati@libero.it (Alberto Ganesh Barbati)
Date: Thu, 14 Sep 2006 17:59:52 GMT
werasm wrote:
>
> template <int N>
> basic_membuf( char (&s)[N],
>   std::ios_base::openmode mode =
>     std::ios_base::in | std::ios_base::out )
> {
>   if( N > max_size() )
>   { throw std::length_error( "buffer size too large" ); }
>   bufsize_ = N;
>   if( mode & std::ios_base::out )
>   { this->setp( s, s + N ); }
>   if( mode & std::ios_base::in )
>   { this->setg( s, s, s + N ); }
> }
> Maybe overloading with this is a possibility... The stream still does
> not assume ownership. No big difference, you're right.

Good, this one goes in the TODO list for future revisions.

> Yes, I just wanted to emphasize that the last character should be left
> untouched (and NULL). The user does not use it anyway. In addition to
> this streaming can always add a NULL (#3). I think, although this is
> not consistent in the sense you've mentioned, it is consistent from a
> usage perspective (the user never explicitly needs to NULL-terminate - as
> with  std::cout, for instance). For the other cases (stringstream) NULL
> termination is not necessary as std::string is per definition NULL
> terminated.

Well, I have to give you a big displeasure, but in fact, despite common
understanding, std::string is *not* null-terminated *by definition*.
It's never said anywhere in the standard. It's only the c_str() member
that is required to return a null-terminated version of the controlled
sequence. Notice that the wording is carefully chosen so to allow
implementations to do a copy of the entire string to accomplish this
on-the-fly. That's why there's the data() member that doesn't try to
null-terminate the string, because it might be more efficient.
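
A small sketch of the distinction between the controlled sequence and
what c_str() returns. Note that the description above is of the standard
in force at the time of this thread; since C++11, data() is required to
return a null-terminated array as well. The function name is made up for
the illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// The controlled sequence may itself contain '\0'; size() counts it,
// while C functions reading c_str() stop at the first one.
inline std::size_t c_length_of_embedded_nul()
{
  std::string s("ab\0cd", 5);            // five characters, one is '\0'
  assert(s.size() == 5);
  assert(s.c_str()[s.size()] == '\0');   // terminator supplied by c_str()
  return std::strlen(s.c_str());         // counts only up to the first '\0'
}
```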

>> Anyway, the basic controversy here is if (output) memstreams should be a
>> facility to build null-terminated strings or not. My proposal stems from
>> the assumption that "The proposed templates also don't try
>> to provide any view of the underlying buffer as a string" and is
>> consistent with that.
>
> Yes, unfortunately the result can (and (most) often will) be used as a
> c string. I think this assumption is too strict, and with slight change
> can be enforced. I like the rest though. The assumption of never owning
> the memory is an excellent one.

Ok. I am now convinced that having a buffer object that provides some
form of automatic null-termination facility could be useful. By now we
agree that one promising implementation strategy is to:

a) reserve the last char of the buffer
b) force ios_base::unitbuf so sync() is called after each output operation
c) let sync() append a null without moving the pointer
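
Steps a) to c) can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the proposal's implementation:
the names zt_obuf and terminated_length are hypothetical, the buffer is
output-only, and the reserved last char is what keeps a truncated write
terminated (the stream need not call sync() once it has failed):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ios>
#include <ostream>
#include <streambuf>

// Hypothetical output-only buffer that keeps its contents terminated.
class zt_obuf : public std::streambuf
{
public:
  zt_obuf(char* s, std::size_t n)
  {
    assert(n >= 1);            // room for the terminator is required
    s[n - 1] = '\0';           // (a) reserve the last char
    setp(s, s + (n - 1));      // the put area excludes it
    *pptr() = '\0';            // empty string until something is written
  }

protected:
  int sync() override
  {
    *pptr() = '\0';            // (c) append a null, pointer not moved
    return 0;
  }
};

// Streams `text` into buf and returns the length a C function would see.
inline std::size_t terminated_length(char* buf, std::size_t n,
                                     const char* text)
{
  zt_obuf zb(buf, n);
  std::ostream os(&zb);
  os.setf(std::ios_base::unitbuf);   // (b) sync() after each output
  os << text;
  return std::strlen(buf);
}
```

If the text does not fit, the put area is full and the reserved char
still terminates the truncated result.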

However, I feel that we have only scratched the surface... Before we
could make that into a full-fledged proposal, we have to answer a lot of
other questions:

1) what about seeking?

2) shall we also have null-terminated input buffers? for example, such a
buffer could return an eof condition as soon as a null is extracted. It
might be useful if you have a short null-terminated string in a large
buffer. Of course you might just use strlen() with imemstream, but you
shouldn't have to.

3) what about i/o streams? that seems to be the most difficult task.
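
For question 2, one possible shape is sketched below. It is an
assumption, not part of the proposal: the hypothetical zt_ibuf simply
looks for the first null at construction time (the strlen() approach
moved inside the class) rather than detecting it lazily during
extraction:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <streambuf>

// Hypothetical input-only buffer whose sequence ends at the first '\0'
// or after n chars, whichever comes first.
class zt_ibuf : public std::streambuf
{
public:
  zt_ibuf(char* s, std::size_t n)
  {
    const void* nul = std::memchr(s, '\0', n);
    const std::size_t len =
      nul ? static_cast<std::size_t>(static_cast<const char*>(nul) - s)
          : n;
    setg(s, s, s + len);   // default underflow() reports eof at the end
  }
};

// Sums all integers found before the terminator.
inline int sum_of_ints(char* buf, std::size_t n)
{
  zt_ibuf zb(buf, n);
  std::istream is(&zb);
  int total = 0;
  for (int v; is >> v; )
    total += v;
  return total;
}
```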

Granted that I prefer to wait for the Portland meeting feedback before
making further proposals, I can see two alternatives:

1) provide different class templates for null-terminated memstreams;

2) tweak the memstream to allow alternative null-terminated behaviour
(yes, the do-it-all class ;-). Which behaviour should be the default
should be decided.

I feel solution 1) to be more promising. Maybe limiting ourselves to the
output case with limited seeking support (for example, rewinding could
be supported easily) could be a start. What do you think?

Ganesh






Author: "werasm" <w_erasm@telkomsa.net>
Date: Thu, 14 Sep 2006 13:00:25 CST
werasm wrote:

> char buffer[x];
> std::strstream out( x, sizeof(x) );
> out << "more data than x can take...."
> ".............................................."; //Oops, forgot ends!
> //works fine, buffer null terminated.
> legacy_f_requiring_null_termination( xstr );

This should read...

 char buffer[x];
 buffer[sizeof(buffer)-1] = '\0';

 std::strstream out( buffer, sizeof(buffer)-1 );
 out << "more data than x can take...."
   ".............................................." << std::ends;
 //works fine, buffer null terminated.
 legacy_f_requiring_null_termination( buffer );

...which is a case that we've used in the past. <buffer> is used
regardless of whether streaming was successful or not.

> Regards,
>
> Werner






Author: "werasm" <w_erasm@telkomsa.net>
Date: Fri, 15 Sep 2006 09:43:48 CST
Alberto Ganesh Barbati wrote:

> Well, I have to give you a big displeasure, but in fact, despite common
> understanding, std::string is *not* null-terminated *by definition*.
> It's never said anywhere in the standard. [snip]

Yes, I'm a string user, not a string implementor :-). From my
perspective, when I call c_str(), things are NULL terminated. "Common
understanding" stems from the usage perspective (IMHO).

> Ok. I am now convinced that having a buffer object that provides some
> form of automatic null-termination facility could be useful. By now we
> agree that one promising implementation strategy is to:
>
> a) reserve the last char of the buffer
> b) force ios_base::unitbuf so sync() is called after each output operation
> c) let sync() append a null without moving the pointer

Yes, if the user writes to the last non-null char, the eof flag is set.
He should not be allowed to attempt to write/read the NULL char. On the
attempt the failbit would be set. I think that if you use option #3
consistently you would have to reserve the last character.

Admittedly, I do not have as much experience as you wrt. the
implementation side of things. I have up till now only used the
iostream library out of the bag (without extension). I will therefore
let you contemplate... One suggestion I have is to leave space as
follows (it has to be carried through - I'm proposing this naively):

template <int N>
basic_membuf (
  char(&s)[N],
  std :: ios_base :: openmode mode =
    std :: ios_base :: in | std ::ios_base :: out)
{
  enum{ usedSz = N-1 };
  if(usedSz > max_size () )
  {  throw std :: length_error (" buffer   size  too  large "); }
  bufsize_ = usedSz;

  s[usedSz] = '\0';
// I think this would cause the last not to be touched...

  if ( mode & std :: ios_base :: out)
  {  this -> setp (s, s + usedSz); }
  if ( mode & std :: ios_base :: in)
  {  this -> setg (s, s, s + usedSz); }
}

The biggest problem with this approach is that it may corrupt input (if
input was not terminated in the first place). It's actually UB as input
is const anyway.

Since we are only wrapping the memory, so to speak (and the mem_buf is
lightweight wrt. size), would it not be possible to have two separate
underlying (buffer) entities for input and output? Output is then treated
as we suggest, input is treated with the NULL streamable as the last item
in the sequence. Do I make sense? I don't know how this would affect
iostream though. A possibility may be one buffer that switches between
implementations as modes are switched between input and output.

> However, I feel that we have only scratched the surface.

Yes, certainly.

> 1) what about seeking?

In output mode, pre-last character considered to be the last. In input
mode? Depends on what is possible wrt. implementation. I would say the
last character could be considered the last. I must emphasize my
naivety, though - I'm only answering because you ask :-)

> 2) shall we also have null-terminated input buffers?

Is it possible? Given that input is const, you cannot force
termination. Do we throw during construction if the last character is not
a terminator? Is the approach of having two separate underlying buffers
viable (or one buffer that switches between two underlying ones
depending on the mode (for iostreams))?

class iostream_mem_buf
{
  //interface...

  //Adheres to mem_buf. shares data.
  // Only one is used at any given time.
  in_mem_buf ibuff_;
  out_mem_buf obuff_;

  //set iaw with used mode
  gen_buff* used_;
};

for example, such a
> buffer could return an eof condition as soon as a null is extracted. It
> might be useful if you have a short null-terminated string in a large
> buffer. Of course you might just use strlen() with imemstream, but you
> shouldn't have to.

Yes, what if it is initialised with a buffer and a size where the size
includes the last true position, and is required to get the correct
result? I suppose having the pre-requisite that the input must be null
terminated, is an option.

> 3) what about i/o streams? that seems to be the most difficult task.

See above.

> Granted that I prefer to wait for the Portland meeting feedback before
> making further proposals, I can see two alternatives:

> 1) provide different class templates for null-terminated memstreams;
>
> 2) tweak the memstream to allow alternative null-terminated behaviour
> (yes, the do-it-all class ;-). Which behaviour should be the default
> should be decided.
>
> I feel solution 1) to be more promising. Maybe limiting ourselves to the
> output case with limited seeking support (for example, rewinding could
> be supported easily) could be a start. What do you think?

I feel honoured that you ask ;_). #2 would be difficult, as you've
pointed out. Whether it is truly do-it-all is a different matter. But
forcing null termination and not doing so are separate concerns, which
in principle should ask for separate entities.

Thanks for your responses,

Kind regards,

Werner






Author: AlbertoBarbati@libero.it (Alberto Ganesh Barbati)
Date: Sat, 9 Sep 2006 14:28:38 GMT
werasm wrote:
>
> How about having a new stream type that forces this issue. You have to
> provide the size of the buffer. This is most often known at compile
> time, therefore:
>
> void output()
> {
>   omemstream<BUFLEN> stream;
>   stream << "Integer Set: " << 5 <<  ", " << 10 << ", etc. "
>     << std::endl;
>   c_legacy_function_that_reads_buffer( stream.c_str(), stream.length()
> );
>   //length() returns bytes streamed, size() return BUFLEN?
> }

I don't find it very attractive to have the memstream own the buffer
because there would still be use-case scenarios where a copy of the
entire buffer is required. That is one thing that my proposal precisely
tries to avoid. In particular, in the input case you would always
require to copy the input data, but also in output case you might need
to copy the output data. Consider this case and how it's handled with my
proposed interface:

struct LegacyStruct
{
  char s[BUFLEN];
  int p;
};

void foo()
{
  LegacyStruct ls;
  omemstream formatter(ls.s, BUFLEN);
  formatter << /* something here */ << ends; // appends null terminator!
  if(!formatter)
  {
    // buffer overflow! do something here
  }
  ls.p = 12345;
  legacy_function(&ls);
}

With your proposed interface, I would need to copy the data out of the
memstream and into ls.s.
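
For readers who want to try the pattern, here is a compilable
approximation. Everything in it is an assumption made for the sketch:
fixed_obuf is a hypothetical stand-in for the proposed buffer class,
BUFLEN is set to 16, and the formatted text is invented:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ostream>
#include <streambuf>

enum { BUFLEN = 16 };   // hypothetical size for the sketch

struct LegacyStruct
{
  char s[BUFLEN];
  int p;
};

// Hypothetical stand-in for the proposed output buffer: it writes
// straight into the caller's memory and never owns or copies it.
struct fixed_obuf : std::streambuf
{
  fixed_obuf(char* s, std::size_t n) { setp(s, s + n); }
};

// Formats directly into ls.s; returns false on buffer overflow.
inline bool fill(LegacyStruct& ls, int value)
{
  fixed_obuf ob(ls.s, BUFLEN);
  std::ostream formatter(&ob);
  formatter << "value=" << value << std::ends;  // explicit terminator
  if (!formatter)
    return false;        // text plus terminator did not fit
  ls.p = 12345;
  return true;
}
```

The formatted text ends up in ls.s itself, so nothing has to be copied
out of the stream afterwards.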

> I think it would be good to discern also, between a stream that adds a
> NULL terminator by default, and one that does not. This could be an
> instantiation option with the default choice being the safer route. For
> safety purposes implementations could also specify the actual size to
> be one greater than BUFLEN, and force the last to remain NULL
> terminated and untouched.
>
> template <unsigned size>
> class X
> {
>   enum{ StreamSz = size+1 };
>   char buffer_[StreamSz];
> };

I disagree that nulls should have a special treatment. That would make
the interface more complex and, in the end, the design would be as
error-prone as not providing the feature at all. For what it's worth, just
look how easily I handled the null terminator in the example above.

> I have not looked at your proposal, but could this be considered?

Unfortunately the link I posted in my previous post is no longer valid.
I have submitted a heavily revised paper to the committee; I don't
know how long it will take for it to be available to the public. In case
you are interested, I can send it to you directly.

Carl Barron wrote:
> In article <1157636428.873405.22210@m79g2000cwm.googlegroups.com>,
> werasm <w_erasm@telkomsa.net> wrote:
>
>    A lot of the excess copying can be avoided by being less terse, by
> using a default constructed stringstream and copying with
> an ostreambuf_iterator<char> to the stream and then seeking the
> beginning read [seekg(0)] and then reading the stream.

Precisely: you can avoid a lot of the excess copying, but the point here
is to avoid *all* copying *plus* avoiding the automatic buffer
management provided by stringstream. What's the point in having the
stringstream allocate a dynamically-sized buffer (something that might
require multiple accesses to the heap -> expensive) when you know in
advance the maximum size of the buffer and you already have a suitable
buffer already allocated?

As you see, passing std::string objects around is a nuisance that I aim
to avoid but it's not the only reason that makes, IMHO, this proposal
interesting.

> prepending an ostream can be read into a properly sized char array
> or vector<char> via std::copy(std::istreambuf_iterator<char>
> (stream),...);
>
>    <snip>
>
>    initializing a string stream from a '\0'-terminated char
> array does not need to be converted to a string first, since something
> like
>
>    <snip>
> note this does no unnecessary conversions to std::string.

I hope you are not saying that that is a valid workaround! The goal is
to try to make life easier to the user. I don't think any programmer
would like to write that... (at least I don't).

> seems like the only thing is simple usages that can be handled by simple
> stream buffers that don't seek; if you don't seek the stream
> [seekg,seekp] then the streambuf's defaults [always fail] will work.
> so the only thing needed is a constructor to set the buffer and an end()
> function for the output-only memory buf
>
>    struct isimplemembuf:std::streambuf
>    {
>       isimplemembuf(char *start,std::ptrdiff_t size)
>       {setg(start,start,start+size);}
>       char *begin() {return eback();}
>       char *end() {return egptr();}
>    };
>
>     struct osimplemembuf:std::streambuf
>    {
>       osimplemembuf(char *start,std::ptrdiff_t size)
>       { setp(start,start+size);}
>       char *begin() {return pbase();}
>       char *end() {return pptr();}
>    };
>
> look simple enough:)
>     The seekpos and seekoff functions can be written if needed.
> Tedious but not complicated...

Yes, that's the idea, but I prefer having only one streambuf-derived
class able to manage both the input and output sequences. I did not
include begin/end as I don't see a valid reason for having them. It's
not that hard actually to provide seek support, so my proposal provides
that and little more. The proposal contains a working complete
implementation too.
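
To give an idea of how little is needed, here is a sketch of seek
support for an output-only buffer. It is not the implementation from the
proposal: the name seekable_obuf is hypothetical, and treating "end" as
the end of the whole buffer is an assumption of this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <ostream>
#include <streambuf>

// Hypothetical output-only buffer that supports repositioning.
class seekable_obuf : public std::streambuf
{
public:
  seekable_obuf(char* s, std::size_t n) { setp(s, s + n); }

protected:
  pos_type seekoff(off_type off, std::ios_base::seekdir dir,
                   std::ios_base::openmode which) override
  {
    if (!(which & std::ios_base::out))
      return pos_type(off_type(-1));        // no input sequence here
    const off_type size = epptr() - pbase();
    off_type base = 0;                      // std::ios_base::beg
    if (dir == std::ios_base::cur)
      base = pptr() - pbase();
    else if (dir == std::ios_base::end)
      base = size;                          // assumption: end of buffer
    const off_type target = base + off;
    if (target < 0 || target > size)
      return pos_type(off_type(-1));        // outside the buffer
    setp(pbase(), epptr());                 // resets pptr() to pbase()
    pbump(static_cast<int>(target));        // then moves it to target
    return pos_type(target);
  }

  pos_type seekpos(pos_type pos, std::ios_base::openmode which) override
  {
    return seekoff(off_type(pos), std::ios_base::beg, which);
  }
};

// Writes "abc", rewinds, overwrites the first char, reports position.
inline std::streamoff rewind_and_overwrite(char* buf, std::size_t n)
{
  seekable_obuf sb(buf, n);
  std::ostream os(&sb);
  os << "abc";
  os.seekp(0);     // rewind
  os << "X";       // overwrites 'a'
  return static_cast<std::streamoff>(os.tellp());
}
```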

Ganesh






Author: "werasm" <w_erasm@telkomsa.net>
Date: Mon, 11 Sep 2006 09:40:39 CST
Alberto Ganesh Barbati wrote:
> werasm wrote:
> >
> > How about having a new stream type that forces this issue.
[snip]

> I don't find it very attractive to have the memstream own the buffer
> because there would still be use-case scenarios where a copy of the
> entire buffer is required. That is one thing that my proposal precisely
> tries to avoid. In particular, in the input case you would always
> require to copy the input data, but also in output case you might need
> to copy the output data. Consider this case and how it's handled with my
> proposed interface:
[snip]

> With your proposed interface, I would need to copy the data out of the
> memstream and into ls.s.

Not necessarily. See code below:

template <unsigned N>
struct array
{
  array( char (&rarray)[N+1] ): rarray_( rarray ){ }

  char (&rarray_)[N+1];
};

enum{ BUFLEN = 10 };

struct LegacyStruct
{
  char s[BUFLEN];
  int p;
};

int main()
{
  LegacyStruct ls;
  array<BUFLEN-1> a( ls.s );
  return 0;
}

Considering this code, memstream can now hold the reference to the
array:

template <int N>
class memstream
{
//...
char (&rarray_)[N+1];
};

This would work without any unnecessary copying.

> > I think it would be good to discern also, between a stream that adds a
> > NULL terminator by default, and one that does not. This could be an
> > instantiation option with the default choice being the safer route. For
> > safety purposes implementations could also specify the actual size to
> > be one greater than BUFLEN, and force the last to remain NULL
> > terminated and untouched.
> >
> > template <unsigned size>
> > class X
> > {
> >   enum{ StreamSz = size+1 };
> >   char buffer_[StreamSz];
> > };
>
> I disagree that nulls should have a special treatment. That would make
> the interface more complex and, in the end, the design would be as
> error-prone as not providing the feature at all. For what it's worth, just
> look how easily I handled the null terminator in the example above.

IMHO it's so easy to forget that little <ends> at the end. If I had
control over the size (which I now do have :-) ), I would add the NULL
by default, always. I agree that sometimes one may not want the NULL.
Those cases could be handled separately (see below):

template <unsigned N, bool AddNull = true>
struct array
{
  enum {SzToAdd = AddNull ? 1 : 0};
  array( char (&rarray)[N+SzToAdd] )
  : rarray_( rarray ){ }

  char (&rarray_)[N+SzToAdd];
};

> Unfortunately the link I posted in my previous post is no longer valid.
> I have a submitted a heavily revised paper to the committee, I don't
> know how long it will take for it to be available to the public. In case
> you are interested, I can send it to you directly.

I noticed (the link no longer worked), yes I would not mind if you did.

> > Carl Barron wrote:
[snip]
Alberto Ganesh Barbati wrote:
> I hope you are not saying that that is a valid workaround! The goal is
> to try to make life easier to the user. I don't think any programmer
> would like to write that... (at least I don't).

I absolutely agree, the whole idea is to abstract things, not to
complicate it :-)

Regards,

Werner






Author: AlbertoBarbati@libero.it (Alberto Ganesh Barbati)
Date: Mon, 11 Sep 2006 23:53:54 GMT
werasm ha scritto:
>
>> With your proposed interface, I would need to copy the data out of the
>> memstream and into ls.s.
>
> Not necessarily. See code below:
>
> <snip>
>
> Considering this code, memstream can now hold the reference to the
> array:
>
> template <int N>
> class memstream
> {
> //...
> char (&rarray_)[N+1];
> };
>
> This would work without any unnecessary copying.

This code is quite different from the one you posted earlier. The
previous one seemed to own the buffer, while this one just keeps a
reference. It has now become much more similar to my design. Just a
couple of remarks on yours:

1) the need to introduce a new type (array<N>) seems an unnecessary
complication to me;

2) I still fail to see the advantage of passing the size of the buffer
as a template parameter. In fact I find it a hindrance, making the
memstream useless in this real-life scenario:

// this function is called by some legacy framework,
// it's expected to fill the given buffer with something
void legacy_callback(char* buf, size_t buflen)
{
  omemstream formatter(buf, buflen);
  /* use formatter to fill the buffer and check overflow */
}
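
For illustration only, the runtime-size case can be approximated today
with a small stream buffer that calls setp() on caller-owned memory.
This is a sketch under assumptions: fixed_buf and fill_buffer are
hypothetical names and are not the omemstream interface of the
proposal. It relies on the default streambuf::overflow(), which reports
failure, so writing past the end makes the stream go bad.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>

// Hypothetical helper: writes into caller-owned memory and never grows.
class fixed_buf : public std::streambuf {
public:
    fixed_buf(char* buf, std::size_t len) { setp(buf, buf + len); }
    // overflow() is not overridden: the default returns eof, so a write
    // past the end of the buffer fails and the stream state reflects it.
};

// Hypothetical callback body: formats "id=<id>" plus a terminating NUL.
// Returns false if the text did not fit into the buffer.
bool fill_buffer(char* buf, std::size_t buflen, int id)
{
    fixed_buf sb(buf, buflen);
    std::ostream formatter(&sb);
    formatter << "id=" << id << std::ends;
    return formatter.good();
}

void example()
{
    char big[16];
    assert(fill_buffer(big, sizeof big, 42));       // fits, NUL included

    char small[4];
    assert(!fill_buffer(small, sizeof small, 42));  // overflow is reported
}
```

The buffer length is an ordinary run-time value here, which is the
point of the scenario: a size passed as a template parameter could not
be supplied from buflen.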

>> I disagree that nulls should have a special treatment. That would make
>> the interface more complex and, in the end, the design would be as
>> error-prone as not providing the feature at all. For what it's worth, just
>> look how easily I handled the null terminator in the example above.
>
> IMHO it's so easy to forget that little <ends> at the end. If I had
> control over the size (which I now do have :-) ), I would add the NULL
> by default, always. I agree that sometimes one may not want the NULL.
> Those cases could be handled separately (see below):
>
> template <unsigned N, bool AddNull = true>
> struct array
> {
>   enum {SzToAdd = AddNull ? 1 : 0};
>   array( char (&rarray)[N+SzToAdd] )
>   : rarray_( rarray ){ }
>
>   char (&rarray_)[N+SzToAdd];
> };

You've got a point, but I'm still not convinced. Anyway, let's answer
this question first: *when* to add the NUL?

1) The first place I can think of is in the destructor of membuf. This
is IMHO very error prone! Consider the plain typical scenario:

void foo()
{
  char buf[BUFLEN];
  omemstream formatter(buf, BUFLEN);
  formatter << /* something here */; // no ends
  /* buffer overflow check here */;
  legacy_function(buf); // oops! NUL not yet appended!
} // destructor appends NUL here: too late!

2) Alternatively, you could zero-initialize the buffer in the class
constructor, but then we would miss the goal of being as lightweight as
possible, performing an operation that the user might not want to pay for.

3) Last option, we could add a NUL after each output operation. This
would ensure that the buffer is always null-terminated without the
expensive operation of initializing the entire buffer. Of course it
should be made optional. ...I just realized that there's a very nice and
quick way to implement this feature! I could just make the virtual
function membuf::sync() append the NUL without advancing the output
pointer. This is consistent with the description of streambuf::sync().
Given that, to have automatic NUL-termination you can just set flag
ios_base::unitbuf, et voilà: the destructor of the output sentry object
will call sync() eventually after each output operation. That would also
make flush() append the NUL without advancing the output pointer, which
could be useful if you need the intermediate results in the buffer but
you need to resume writing at a later time. Nifty!
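
A minimal sketch of the sync()-based idea, assuming a hypothetical
class nul_sync_buf (it is not the membuf of the proposal). The sketch
reserves the last byte of the buffer for the NUL; that is only one
possible way to deal with overflow and is an assumption made here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ostream>
#include <streambuf>

// Hypothetical buffer: sync() writes a NUL at the current output position
// without advancing it, so later output overwrites the terminator.
class nul_sync_buf : public std::streambuf {
public:
    // len must be at least 1; the last byte is kept out of the put area
    // so that there is always room for the terminator.
    nul_sync_buf(char* buf, std::size_t len) { setp(buf, buf + len - 1); }

protected:
    int sync() override
    {
        *pptr() = '\0';  // pptr() <= epptr(), which is still inside the buffer
        return 0;
    }
};

void example()
{
    char buf[8];
    nul_sync_buf sb(buf, sizeof buf);
    std::ostream os(&sb);
    os.setf(std::ios_base::unitbuf);  // sync() after each output operation

    os << "ab";
    assert(std::strcmp(buf, "ab") == 0);    // terminated without ends

    os << 12;
    assert(std::strcmp(buf, "ab12") == 0);  // writing resumed over the NUL
}
```

Without unitbuf the same effect is available on demand through flush(),
so the user only pays for the termination when asking for it.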

However, whatever solution you choose, you have to be careful with
seeking and/or the possibility to use the same buffer for both output
and input operations at the same time. In these respects, I still find
solution #3 to be superior.

So, you see, #1 is a no-no, #2 and #3 introduce a non-negligible cost
and potential confusion in certain situations. Moreover I disagree with
you that the ends is so easy to forget. That would explain my resistance
to have null-termination as the default behaviour. However, #3 looks
very promising as it would be optional and I am indeed inclined to
consider it as soon as I find a way to properly handle the overflow. I
have to thank you for making me think about it. However, I guess every
change to the proposal should be postponed until after the Portland
meeting in mid October.

> I noticed (the link no longer worked), yes I would not mind if you did.

The paper is now available on the C++ committee website at
<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/#mailing2006-09>.
The doc number is N2065.

Ganesh

PS: Although the ISO 6429 name for the character '\0' is NULL, the name
NULL in C/C++ usually refers to a null pointer and this might create
confusion. That's why I prefer using the ASCII name NUL (with only one
L) or write null in lowercase.






Author: "werasm" <w_erasm@telkomsa.net>
Date: Tue, 12 Sep 2006 12:34:49 CST
Raw View
Alberto Ganesh Barbati wrote:
> 2) I still fail to see the advantage of passing the size of the buffer
> as a template parameter. In fact I find it a hindrance, making the
> memstream useless in this real-life scenario:

Granted, (static) size is restrictive.  It does allow the ostream to
own the buffer completely though, and have control over the terminator.

We use ostrstream most often (>80%) in the following scenario (instead
of ostringstream, because of its heap usage):

{
  enum{ Size = 50 };
  char buffer[Size+1];
  buffer[Size] = 0; //terminate only last entry
  //lie about size as we want the last byte to remain untouched
  // and terminated.
  ostrstream os( buffer, Size );

  //output to ostrstream using formatting. If size
  // exceeded, last byte should not be touched and
  // buffer will remain terminated. Technically we can
  // forget adding ends, and we should not get UB
  // as streaming over last byte is not allowed.

  //buffer can now be used in call to legacy function
}

> // this function is called by some legacy framework,
> // it's expected to fill the given buffer with something
> void legacy_callback(char* buf, size_t buflen)
> {
>   omemstream formatter(buf, buflen);
>   /* use formatter to fill the buffer and check overflow */
> }

This case IMO is so vastly different from the other, that it justifies
having a unique/different ostream entity to handle it. Why provide a
(do-it-all) interface? That is the problem with the current ostrstream.
We only use it because there is no alternative for what we want to use
it for, and we know how to use it correctly. I suppose the example that
you mention above is more consistent with ostrstream's current usage,
though.

I think we need a new entity that wraps around an array reference and
performs operations on the referenced array... array_ostream? I'll have
a look at what boost offers in terms of this.

[snip]
> > Those cases could be handled separately (see below):
> >
> > template <unsigned N, bool AddNull = true>
> > struct array
> > {
> >   enum {SzToAdd = AddNull ? 1 : 0};
> >   array( char (&rarray)[N+SzToAdd] )
> >   : rarray_( rarray ){ }
> >
> >   char (&rarray_)[N+SzToAdd];
> > };
>
> You've got a point, but I'm still not convinced. Anyway, let's answer
> this question first: *when* to add the NUL?

Easy - right in the beginning - at construction, and never to be
touched again (we overflow before the last character is touched). The
user has to consider this (the NULL terminator) when specifying the
size, else he would get a compilation error (which is great). I did not
show it though - but here:

template <unsigned N, bool AddNull = true>
class memstream
{
  public: //...

  enum {SzToAdd = AddNull ? 1 : 0};
  enum{ Sz = N+SzToAdd };

  memstream( char (&rarray)[Sz] )
   : rarray_( rarray )
  {
    // terminate only when a terminator slot was reserved
    if( AddNull ){ rarray_[Sz-1] = '\0'; }
  }

   char (&rarray_)[Sz];
};

If he specifies the wrong input buffer size, he gets a compilation
error. See why specifying the size becomes safe/handy. Even though
specifying the size is restrictive, it is not restrictive for the case
most often used (by us anyway). One can create separate entities for
streaming to vector<char> and char*, IMHO. Streaming to char* should
only remain for legacy purposes, though (which is a valid reason). For
this purpose, I suppose what you are proposing is good (I must read it
and will; thanks for the link).
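
The compile-time check can be sketched as follows. This is only an
illustration under assumptions: array_ref is a hypothetical name, the
stream part is left out, and C++11 facilities (constexpr,
static_assert, <type_traits>) are used to make the check visible.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical wrapper: N is the number of usable characters; when
// AddNull is true one extra byte is required for the terminator.
template <std::size_t N, bool AddNull = true>
class array_ref {
public:
    static constexpr std::size_t Sz = N + (AddNull ? 1 : 0);

    explicit array_ref(char (&rarray)[Sz]) : rarray_(rarray)
    {
        if (AddNull) rarray_[Sz - 1] = '\0';  // written once, never touched again
    }

    char* data() const { return rarray_; }
    static constexpr std::size_t capacity() { return N; }

private:
    char (&rarray_)[Sz];
};

// Only an array of exactly N+1 chars is accepted.
static_assert(std::is_constructible<array_ref<50>, char (&)[51]>::value,
              "char[51] holds 50 characters plus the terminator");
static_assert(!std::is_constructible<array_ref<50>, char (&)[50]>::value,
              "char[50] leaves no room for the terminator");

void example()
{
    char buffer[51];
    array_ref<50> ref(buffer);  // OK: sizes agree
    assert(buffer[50] == '\0');
    assert(ref.data() == buffer);
    // char small[50]; array_ref<50> bad(small);  // would not compile
}
```

A stream built on top of such a wrapper would use data() and
capacity() as its put area, leaving the last byte untouched.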

> 1) The first place I can think of is in the destructor of membuf. This
> is IMHO very error prone! Consider the plain typical scenario:

Yes, the thought never crossed my mind :-)

> 2) Alternatively, you could zero-initialize the buffer in the class
> constructor, but then we would miss the goal of being as lightweight as
> possible, performing an operation that the user might not want to pay for.

Only necessary to set that last character to NULL, no great penalty.


> So, you see, #1 is a no-no, #2 and #3 introduce a non-negligible cost
> and potential confusion in certain situations.

#2 IMO is negligible. Null terminating the last position...

> Moreover I disagree with you that the ends is so easy to
> forget. That would explain my resistance to have null-
> termination as the default behaviour.

Adding ends is inconsistent with how output streams are used in
general, and for this reason easy to forget (or not know in the first
place). Forcing NULL termination at least prevents (or lessens the
chance of) UB, which is my objective.

> However, #3 looks
> very promising as it would be optional and I am indeed inclined to
> consider it as soon as I find a way to properly handle the overflow. I
> have to thank you for making me think about it.

My pleasure. #3 looks promising to me too and is more consistent with
how other output streams are used.

> The paper is now available on the C++ committee website at
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/#mailing2006-09>.
> The doc number is N2065.

Will look, thank you :-).


> PS: Although the ISO 6429 name for the character '\0' is NULL, the name
> NULL in C/C++ usually refers to a null pointer and this might create
> confusion. That's why I prefer using the ASCII name NUL (with only one
> L) or write null in lowercase.

Makes sense to discern, Regards, Werner
