Topic: Should istream::read() be allowed to clobber


Author: Thiago Macieira <thiago@macieira.org>
Date: Fri, 21 Sep 2018 10:39:07 -0700
On Friday, 21 September 2018 00:05:32 PDT Alberto Barbati wrote:
> If it allows a significant optimization, as stated by the Microsoft
> developers involved, definitely it should, in my opinion. Accessing the
> rest of the buffer pretending that it did not get overwritten is such a
> special and, no offense, rather useless scenario. I would much rather have
> the main scenario optimized than have a specific corner case
> "correct". Again, that's my opinion.

If you had important data that you expected to access after the function call,
you should not have left it in the buffer. You should always assume that the
contents of the rest of the buffer, beyond what the function reported writing
to, are unspecified values. For example, a security-conscious function may
want to simply zero those bytes out to prevent leaking any information; see
also the distinction between strncpy and strlcpy.
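A minimal sketch of that security-conscious behaviour, assuming a std::istream source: read_and_scrub is a hypothetical helper (not part of any library) that calls std::istream::read, takes the extracted count from gcount(), and zeroes everything after it. Whatever the callee does with the tail, the caller can rely only on the first gcount() bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <sstream>  // std::istringstream, used in the usage example below

// Hypothetical helper, not a standard API: reads up to n chars into buf,
// then zeroes the unused tail so no stale data survives in the buffer.
// Callers must still treat only buf[0, return value) as meaningful.
std::streamsize read_and_scrub(std::istream& in, char* buf, std::streamsize n) {
    assert(buf != nullptr && n >= 0);
    in.read(buf, n);
    const std::streamsize got = in.gcount();  // chars actually extracted
    std::fill(buf + got, buf + n, '\0');      // overwrite everything after them
    return got;
}
```

For example, reading from std::istringstream("abc") into an 8-byte buffer returns 3 and leaves bytes 3 through 7 as '\0', whatever they held before.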

For systems with CRLF line endings, performing the conversion in place is a
huge performance optimisation. The implementation can read the number of bytes
you asked for from the underlying device, then go over what it read and check
whether there's any CRLF to be transformed. That's much cheaper than doing a
byte-by-byte read to make sure it never writes more to your buffer than what
you expected.
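To make the in-place conversion concrete, here is a rough sketch (not the actual Microsoft implementation, which this thread doesn't show): one bulk read of up to n bytes straight into the caller's buffer, then a single forward pass that collapses each CRLF pair to LF. read_text and collapse_crlf_in_place are hypothetical names, and the sketch ignores a CR that ends one chunk while its LF arrives in the next.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, used in the usage example below

// Collapses each "\r\n" pair in buf[0, n) to "\n", compacting in place.
// Returns the new length; bytes in buf[new_len, n) keep stale values.
std::size_t collapse_crlf_in_place(char* buf, std::size_t n) {
    std::size_t out = 0;
    for (std::size_t in = 0; in < n; ++in) {
        // Drop a '\r' only when the very next byte in this chunk is '\n'.
        if (buf[in] == '\r' && in + 1 < n && buf[in + 1] == '\n')
            continue;
        buf[out++] = buf[in];  // out <= in, so forward copying is safe
    }
    return out;
}

// Hypothetical text-mode read: one bulk read of up to n bytes into the
// caller's buffer, then one in-place pass. Simplification: a "\r" that ends
// this chunk while its "\n" starts the next one is not collapsed.
std::streamsize read_text(std::istream& raw, char* buf, std::streamsize n) {
    assert(buf != nullptr && n >= 0);
    raw.read(buf, n);
    return static_cast<std::streamsize>(
        collapse_crlf_in_place(buf, static_cast<std::size_t>(raw.gcount())));
}
```

Reading "a\r\nb\r\n" into a 16-byte buffer reports 4 bytes ("a\nb\n"), yet bytes 4 and 5 still hold the raw "\r\n" that read() put there: the function wrote past the count it reported, which is exactly the behaviour under discussion.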

See also two recent posts on Raymond Chen's blog, similar to this:

https://blogs.msdn.microsoft.com/oldnewthing/20180608-00/?p=98945 - Why does
GetServiceDisplayNameA report a larger required buffer size than actually
necessary?

https://blogs.msdn.microsoft.com/oldnewthing/20180523-00/?p=98815 - If you say
that your buffer can hold 200 characters, then it had better hold 200
characters

In the latter, he links to this 2006 blog posting
https://blogs.msdn.microsoft.com/oldnewthing/20060320-13/?p=31853 (Basic
ground rules for programming – function parameters and how they are used),
where he declares:

"A function is permitted to read from the full extent of the buffer provided
by the caller, even if not all of the buffer is required to determine the
result" and "A function is permitted to write to the full extent of the buffer
provided by the caller, even if not all of the buffer is required to hold the
result."
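On the caller's side, those rules boil down to trusting only the count the function reports. A sketch of the usual chunked-read loop over std::istream (slurp is a hypothetical name; the 4-byte buffer is deliberately tiny so the loop iterates several times):

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, used in the usage example below
#include <string>

// Reads everything from `in` through a fixed-size scratch buffer. Only the
// first gcount() bytes of each read are used; whatever read() may have left
// in the rest of the buffer is never looked at.
std::string slurp(std::istream& in) {
    std::string result;
    char chunk[4];  // deliberately tiny so the example needs several reads
    while (in.read(chunk, sizeof chunk) || in.gcount() > 0) {
        // The reported count never exceeds the buffer we handed over.
        assert(in.gcount() <= static_cast<std::streamsize>(sizeof chunk));
        result.append(chunk, static_cast<std::size_t>(in.gcount()));
    }
    return result;
}
```

The `|| in.gcount() > 0` clause picks up the final short chunk, for which read() leaves the stream in the failed state but still reports how many bytes it extracted.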

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center



--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/7370612.cpNcS2rTLB%40tjmaciei-mobl1.

.


Author: Edward Catmur <ed@catmur.co.uk>
Date: Sat, 22 Sep 2018 10:59:19 -0700 (PDT)

On Friday, 21 September 2018 19:39:14 UTC+2, Thiago Macieira wrote:
> See also two recent posts on Raymond Chen's blog, similar to this:
>
> https://blogs.msdn.microsoft.com/oldnewthing/20180608-00/?p=98945 - Why does
> GetServiceDisplayNameA report a larger required buffer size than actually
> necessary?
>
> https://blogs.msdn.microsoft.com/oldnewthing/20180523-00/?p=98815 - If you say
> that your buffer can hold 200 characters, then it had better hold 200
> characters
>
> In the latter, he links to this 2006 blog posting
> https://blogs.msdn.microsoft.com/oldnewthing/20060320-13/?p=31853 (Basic
> ground rules for programming – function parameters and how they are used),
> where he declares:
>
> "A function is permitted to read from the full extent of the buffer provided
> by the caller, even if not all of the buffer is required to determine the
> result" and "A function is permitted to write to the full extent of the
> buffer provided by the caller, even if not all of the buffer is required to
> hold the result."

The first of these dictates (reading) is a corollary of
[res.on.arguments]/1.2, which parallels C 7.1.4p1.

The second doesn't appear to be stated anywhere; I agree that it would be
beneficial for it to be added, other than where explicitly contradicted,
e.g. for C fgets:

> If end-of-file is encountered and no characters have been read into the
> array, the contents of the array remain unchanged and a null pointer is
> returned.
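The fgets exception quoted above can be checked directly. A small sketch, assuming std::tmpfile() can create an empty temporary file in the environment (the helper name is hypothetical): the first fgets call hits end-of-file without reading anything, so the C standard requires a null return and an untouched buffer.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical check: calls std::fgets on an empty stream (so end-of-file is
// hit before any character is read) and reports whether fgets returned a
// null pointer and left the caller's buffer exactly as it was.
bool fgets_leaves_buffer_alone_at_eof() {
    std::FILE* f = std::tmpfile();  // assumption: temporary files are available
    assert(f != nullptr && "std::tmpfile() failed");
    if (f == nullptr)
        return false;  // cannot run the check without a stream
    char buf[8];
    char before[8];
    std::memset(buf, 'Z', sizeof buf);
    std::memcpy(before, buf, sizeof buf);
    const char* r = std::fgets(buf, static_cast<int>(sizeof buf), f);
    std::fclose(f);
    return r == nullptr && std::memcmp(before, buf, sizeof buf) == 0;
}
```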


.


Author: Bengt Gustafsson <bengt.gustafsson@beamways.com>
Date: Sat, 22 Sep 2018 23:02:29 -0700 (PDT)

The C standard description mandates that fread be implemented using fgetc. Even if this is taken "as if", it seems pretty obvious that it is not allowed to use the remainder of the buffer as scratch pad memory.

Noting this does not mean that I'm personally invested in formally preventing Microsoft's implementation; I'm just reinforcing that the clarification is needed.
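For reference, a literal rendering of the C specification of fread, where each element is filled by `size` successive fgetc calls into the bytes overlaying it, might look like the following. fread_via_fgetc is a hypothetical name, not how any particular C library implements fread; note that it never touches bytes past the last character it stored, which is the property this reading of the standard relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical, literal rendering of fread's specification: each of the nmemb
// elements is filled by `size` successive fgetc calls, stored in order into
// the bytes overlaying that element. Nothing past the last stored byte is
// ever written.
std::size_t fread_via_fgetc(void* ptr, std::size_t size, std::size_t nmemb,
                            std::FILE* stream) {
    assert(stream != nullptr);
    if (size == 0 || nmemb == 0)
        return 0;  // buffer and stream left unchanged
    unsigned char* out = static_cast<unsigned char*>(ptr);
    for (std::size_t elem = 0; elem < nmemb; ++elem) {
        for (std::size_t i = 0; i < size; ++i) {
            const int c = std::fgetc(stream);
            if (c == EOF)
                return elem;  // EOF or error: a partial element is indeterminate
            *out++ = static_cast<unsigned char>(c);
        }
    }
    return nmemb;
}
```

With "abcde" as input, size 2 and nmemb 4, it returns 2; the stray 'e' lands in the third element (whose value the C standard calls indeterminate for a partial read) and bytes 5 through 7 keep their previous contents.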


.


Author: Thiago Macieira <thiago@macieira.org>
Date: Sun, 23 Sep 2018 10:40:29 -0700
On Saturday, 22 September 2018 23:02:29 PDT Bengt Gustafsson wrote:
> The C standard description mandates that fread be implemented using fgetc.
> Even if this is taken "as if" it seems pretty obvious that it is not
> allowed to use the remainder of the buffer as scratch pad memory.
>
> Having noted this does not mean that I'm personally invested in formally
> preventing Microsoft's implementation, just reinforcing that the
> clarification is needed.

I don't agree. Even if the implementation actually did fgetc for each byte,
there's nothing that says it cannot use the rest of the buffer for some other
purpose. I stand by Raymond Chen's blog: unless otherwise specified, the
entire input buffer may be read and the entire output buffer may be written
to, regardless of how much the function actually needs to operate.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center




.