Topic: Closing file(s) in basic_filebuf dtor problematic on Linux


Author: Alexander Sieb <sieb@sscd.de>
Date: Fri, 2 Aug 2002 16:36:26 GMT
James Kanze wrote:
>
> Alexander Sieb <sieb@sscd.de> wrote in message
> news:<3D46BD66.B494A9FF@sscd.de>...
>
> > James Kanze wrote:
>
> > [ I deleted the beginning as we weren't really disagreeing ]
>
> > > To your credit, the Linux notes that you quote seem as vague --
> > > first they say you should always check the return status of close,
> > > then they say it doesn't really mean anything:-).
>
> > As I stated previously, the ambiguity results from the fact that
> > only certain filesystems (NFS, Coda, AFS) sync the buffers when a file
> > is closed. These report errors. For other filesystems, close will
> > always succeed (except EINTR, EBADF), because they do not sync on
> > close and an error may go unnoticed.
>
> So if you really need to be sure, you make sure that your file is on an
> NFS system or local.
>

You basically wrote: if you want to be sure, use a filesystem! ;-)

Seriously, if you want to be sure, you have to know your system.
Implementations have bugs. If you don't know them (in advance), you
are out of luck.

This is why I prefer Linux. I can look into the sources at any time
and if I don't understand something I can ask the person who
wrote it.

Btw, you have to fsync directories and files separately. fsync'ing a
directory doesn't mean the files it contains are also sync'ed.
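For concreteness, roughly like this (a sketch only, plain POSIX, with the
error handling compressed into a single return value; write_durably and the
path arguments are just illustrative names):

    // Sketch: make a newly written file durable by syncing both the file
    // and the directory that holds its entry (POSIX calls).
    #include <fcntl.h>
    #include <unistd.h>

    bool write_durably(const char* path, const char* dir,
                       const void* buf, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return false;
        bool ok = write(fd, buf, len) == (ssize_t)len
               && fsync(fd) == 0;           // flush the file's data/metadata
        ok = (close(fd) == 0) && ok;        // close(2) may still report an error
        int dfd = open(dir, O_RDONLY);      // now sync the directory entry too
        if (dfd < 0)
            return false;
        ok = (fsync(dfd) == 0) && ok;
        close(dfd);
        return ok;
    }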

[snip]

> > Yes, it is a good practice and in the normal case everything is fine.
> > I guess what I'm worried about is that a failure of close/fclose won't
> > get noticed in 100% of the cases. It's useful to have the dtors clean
> > up your resources, but then you need a mechanism to handle said
> > errors, whatever mechanism that turns out to be. That is why I have my
> > own streambuf which warns me about such errors.
>
> I agree, but I'm not too sure how the standard library can do better.
> Every project I've worked on has had some sort of logging mechanism;
> ideally, you'd like it logged that the file was closed in a destructor.
> But every project has had a different logging mechanism, and I don't see
> how a class in the standard library could be required to interact with
> an application specific log (which isn't even present on small, command
> line programs).
>

Well, why not add a logging library to the standard? Every project needs one.
Once you have that, the rest of the libraries can make use of it.


[snip]

>
> > > O_SYNC should suffice.  Round trip or no.
>
> > I hope you disabled the write cache on your HDDs.  :-)
>
> If I recall correctly, the HDDs had battery backup and a logic to flush
> the cache in case of failure.  (The disks were mirrored, of course.)
>
> The point shouldn't be forgotten, however.  There's no use making your
> software more resilient than the hardware it runs on.
>

I agree, but people keep trying hard to do so.


[snip]

> > Obviously, you want to know the failing system call + error code.
> > But, yes, it's difficult to decide what to do in such situations.
> > However, it's important to know about the problem.
>
> > (I tend to work in a distributed environment, which introduces a whole
> > new set of problems. It's really sad how many apps fail when you use
> > them with NFS. Using O_SYNC over NFS is a story for itself, btw)
>
> Agreed.  We're using it currently, and I'm not pleased about it.  (The
> decision was made before I came on the project.)  On the other hand,
> frankly, I don't think we need O_SYNC at all, given what the application
> does.  All of the data we write can be reconstituted from other sources.
>

Frankly, I wouldn't want to rely on NFS for our apps. But if I had to, I
would have redundant network elements (NICs, switches, routers, etc.) with
tested failover semantics. The server would also have to be configured to
work in sync mode; there is also an async mode, where the server replies to
the RPC immediately without carrying out the request first. O_SYNC is
useless in that case, of course.


[snip]

> > You are probably right, I should have posted this to c.l.c++.m in the
> > first place. I thought the standard should provide users with the
> > highest degree of data integrity possible.
>
> Dream on.  C++ is a general purpose language.  It's fine as is for
> general purpose applications, but for critical software, you have to
> know what you are doing, and go beyond the standard, to get additional
> guarantees from the implementation.
>
> There are very strong historical reasons for this.  C++ derives from C.
> C was born on Unix.  And early Unix is not a system which could be used
> for critical applications anyway.  Even today, the attitude is generally
> don't slow down the typical application, even if the results aren't
> acceptable in a critical application.
>

I agree, I don't want to make C or C++ something it wasn't intended to be.


> > Losing error states from system calls makes me nervous no matter how
> > poor the quality of an app is intended to be ;-)
>
> I agree, but I'm afraid we're fighting a losing battle.
>

Yes, but it's still worth fighting it.

--
Regards,
 Alexander

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: Alexander Sieb <sieb@sscd.de>
Date: Tue, 30 Jul 2002 16:39:40 GMT
James Kanze wrote:
>

[ I deleted the beginning as we weren't really disagreeing ]

>
> To your credit, the Linux notes that you quote seem as vague -- first
> they say you should always check the return status of close, then they
> say it doesn't really mean anything:-).
>

As I stated previously, the ambiguity results from the fact that only certain
filesystems (NFS, Coda, AFS) sync the buffers when a file is closed. These
report errors. For other filesystems, close will always succeed (except EINTR, EBADF),
because they do not sync on close and an error may go unnoticed.

> Seriously, for many applications, getting the data to the system is
> sufficient.  The probability of a failure after that is low, and such
> failures tend not to go unremarked, so the user will know that he
> probably has to regenerate the data.  The exception is when the "user"
> is a remotely connected machine, which bases further actions on the fact
> that you have memorized what it requested correctly.
>
> I think that the current situation is probably optimal, at least for the
> mainstream Unix platforms like Solaris and HP/UX.  The typical user
> doesn't need fully synchronous IO, and shouldn't be made to pay the
> price.  If you need more security, the first thing you normally do IS
> verify what the system actually provides.  In the case of Solaris, for
> example, you can use the O_SYNC flag on open, or set it later with
> fcntl, or you can explicitly call fsync after a critical series of
> writes.
>
> If your system doesn't provide something similar, then it is unusable
> for certain types of work.
>

I agree.

> > > It's well known in the programming circles I frequent that if you
> > > need the data on disk (as opposed to simply out of your program and
> > > into the operating system), you either set the file to a
> > > synchronized mode (so that write will wait for the actual write, and
> > > *will* report the error) or you use fdsync.  And you don't
> > > acknowledge to your clients that the request has been fulfilled
> > > until the sync has finished.
>
> > I know, but do the stream classes support this?
>
> Not directly.  The semantics of the stream classes is defined in terms
> of fopen, etc.  These functions haven't changed since C90.  When C90 was
> being formulated, most Unixes didn't yet have any of the necessary
> support.  Since it was considered unacceptable to specify C in such a
> way as to render an implementation under Unix impossible...
>
> It is a fact of life that to write critical software, you depend on
> guarantees and functions not in the standard.  Of course, if you are
> writing software that needs full data integrity, then you are probably
> using sockets as well (not in the standard), perhaps threads (not in the
> standard), and you are certainly concerned by more signals than those
> defined in the standard.
>

True.

> > > > I checked STLport and libstdc++-v3 (from gcc 3.1) and none of them
> > > > handles this specific problem. If close(2) indicates an io error
> > > > and you rely on automatic resource management of e.g. ofstream you
> > > > can end up with a corrupted file.
>
> > > If the streams are buffered (the usual case), and you use this
> > > idiom, you can get a corrupted stream even if it is in synchronized
> > > mode; the filebuf::close function begins by flushing, which
> > > eventually calls write.  I fail to see the problem with close.
>
> > Sorry, I don't understand the last sentence.
>
> What I'm trying to say is that (obviously), the system synchronization
> only concerns the system buffers, and is oblivious to any buffering in
> filebuf or stdio.  If you count on the close in the destructor, you
> can't check the error, and there may be a significant error.
>
> My point is simply that this is the way filebuf was designed to work.
> This is the way it *must* work; if filebuf didn't close in the
> destructor, it would leak resources if the user didn't close otherwise
> (e.g. during a stack walkback due to an exception.)
>

See below.

> I'd like to see how you could get a compile time error for this.
>

Sorry about that one. I shouldn't answer messages at 4am :-)

> > The problem I have with the way it is now is that novice users will
> > mimic this kind of design in their own projects. It's problematic at
> > least from an educational point of view.
>
> Have you really looked at the standard library some?  Have you found
> anything in it that could serve as a good example?  IMHO, the iostream
> hierarchy (with the separation of concerns between [io]stream and
> streambuf, the intelligent use of the template pattern in streambuf,
> etc.) is probably the best part of the library.
>
> I'm not too sure either what you are criticizing.  Calling close in
> filebuf::~filebuf is good defensive programming; it prevents a serious
> resource leak in the case of sloppy user programming.  IMHO, it is also
> a useful feature in certain specific cases of error handling; I don't
> need a try block to close the file after an exception.  It's obviously
> an error to count on it in the "standard" case.  Any textbook presenting
> the standard library should point this out.  The standard is not a
> textbook, though, and should only say what the implementation should do,
> not why.
>

Sorry about the confusion.

Yes, it is a good practice and in the normal case everything is fine.
I guess what I'm worried about is that a failure of close/fclose won't get
noticed in 100% of the cases. It's useful to have the dtors clean up your
resources, but then you need a mechanism to handle said errors, whatever
mechanism that turns out to be. That is why I have my own streambuf which
warns me about such errors.
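Roughly along these lines, for illustration only (this is not my actual
class, just a minimal sketch of the idea; checked_filebuf is a made-up name):

    #include <fstream>
    #include <iostream>

    // Sketch: a filebuf whose destructor at least reports a close failure
    // instead of silently swallowing it.
    class checked_filebuf : public std::filebuf {
    public:
        ~checked_filebuf() {
            // std::filebuf::close() returns a null pointer on failure.
            if (is_open() && close() == 0)
                std::cerr << "warning: close failed in ~checked_filebuf\n";
        }
    };

One would then hand it to an ostream via rdbuf(), or wrap it in a small
stream class; a real version would report through whatever logging
mechanism the application has.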

[snip]

>
> > As I stated above. You can never be sure unless you have a full round
> > trip from user space to your storage and back again (using O_SYNC of
> > course).
>
> O_SYNC should suffice.  Round trip or no.
>

I hope you disabled the write cache on your HDDs.  :-)


[snip]

> > I agree, but you would also need an indication of what happened.  A
> > failure like this won't be easily reproducible.  Ideally, you would
> > get a core dump plus a syslog message.  You could do that by adding a
> > fatal_io_error callback which is to be called for situations described
> > here.
>
> So what if the failure occurs after you've called exit?  (Sorry, I don't
> have any good answers.  As I say, I don't do many of these type of
> applications.  The applications I do do tend to make extensive use of
> O_SYNC or fsync.  The more critical applications ensure that the
> critical files are on the local disk, too.  No need to go looking for
> trouble:-).)
>

Obviously, you want to know the failing system call + error code.
But, yes, it's difficult to decide what to do in such situations.
However, it's important to know about the problem.

(I tend to work in a distributed environment, which introduces a whole new
set of problems. It's really sad how many apps fail when you use them
with NFS. Using O_SYNC over NFS is a story for itself, btw)

> Anyway, as I said, I'm not sure what your point is.  I agree with the
> need for conservative caution, and a lot more mistrust with regards to
> what is actually happening in the system.  But I don't see a standards
> issue here; at most, a quality of implementation question.  In this
> sense, the points are worth insisting upon, but in
> comp.lang.c++.moderated, rather than here.  And you mustn't forget that
> different applications have different requirements with regards to
> quality.  Linux, or even Windows NT, seems to be adequate for Web site
> hosting, for example.  But I don't know anyone using them for a
> telephone node, or for critical banking applications.  (Windows is very
> popular for the client applications, which don't contain any critical
> data or functionality, but the servers tend to come from Sun, HP or IBM,
> and they all run Open Systems Unix.)
>

You are probably right, I should have posted this to c.l.c++.m in the first
place. I thought the standard should provide users with the highest
degree of data integrity possible. Losing error states from system calls makes
me nervous no matter how poor the quality of an app is intended to be ;-)

[ Linux is really worth looking at. It is becoming more and more popular these
days, as it has proven to be reliable and secure. The best example is probably
Google, which relies completely on Linux for its ~10,000 servers. ]


--
Regards,
 Alexander






Author: kanze@gabi-soft.de (James Kanze)
Date: Wed, 31 Jul 2002 17:58:59 GMT
Alexander Sieb <sieb@sscd.de> wrote in message
news:<3D46BD66.B494A9FF@sscd.de>...

> James Kanze wrote:

> [ I deleted the beginning as we weren't really disagreeing ]

> > To your credit, the Linux notes that you quote seem as vague --
> > first they say you should always check the return status of close,
> > then they say it doesn't really mean anything:-).

> As I stated previously, the ambiguity results from the fact that
> only certain filesystems (NFS, Coda, AFS) sync the buffers when a file
> is closed. These report errors. For other filesystems, close will
> always succeed (except EINTR, EBADF), because they do not sync on
> close and an error may go unnoticed.

So if you really need to be sure, you make sure that your file is on an
NFS system or local.

    [...]
> > I'm not too sure either what you are criticizing.  Calling close in
> > filebuf::~filebuf is good defensive programming; it prevents a
> > serious resource leak in the case of sloppy user programming.  IMHO,
> > it is also a useful feature in certain specific cases of error
> > handling; I don't need a try block to close the file after an
> > exception.  It's obviously an error to count on it in the "standard"
> > case.  Any textbook presenting the standard library should point
> > this out.  The standard is not a textbook, though, and should only
> > say what the implementation should do, not why.

> Sorry about the confusion.

> Yes, it is a good practice and in the normal case everything is fine.
> I guess what I'm worried about is that a failure of close/fclose won't
> get noticed in 100% of the cases. It's useful to have the dtors clean
> up your resources, but then you need a mechanism to handle said
> errors, whatever mechanism that turns out to be. That is why I have my
> own streambuf which warns me about such errors.

I agree, but I'm not too sure how the standard library can do better.
Every project I've worked on has had some sort of logging mechanism;
ideally, you'd like it logged that the file was closed in a destructor.
But every project has had a different logging mechanism, and I don't see
how a class in the standard library could be required to interact with
an application specific log (which isn't even present on small, command
line programs).

> [snip]

> > > As I stated above. You can never be sure unless you have a full
> > > round trip from user space to your storage and back again (using
> > > O_SYNC of course).

> > O_SYNC should suffice.  Round trip or no.

> I hope you disabled the write cache on your HDDs.  :-)

If I recall correctly, the HDDs had battery backup and logic to flush
the cache in case of failure.  (The disks were mirrored, of course.)

The point shouldn't be forgotten, however.  There's no use making your
software more resilient than the hardware it runs on.

> [snip]

> > > I agree, but you would also need an indication of what happened.
> > > A failure like this won't be easily reproducible.  Ideally, you
> > > would get a core dump plus a syslog message.  You could do that by
> > > adding a fatal_io_error callback which is to be called for
> > > situations described here.

> > So what if the failure occurs after you've called exit?  (Sorry, I
> > don't have any good answers.  As I say, I don't do many of these
> > type of applications.  The applications I do do tend to make
> > extensive use of O_SYNC or fsync.  The more critical applications
> > ensure that the critical files are on the local disk, too.  No need
> > to go looking for trouble:-).)

> Obviously, you want to know the failing system call + error code.
> But, yes, it's difficult to decide what to do in such situations.
> However, it's important to know about the problem.

> (I tend to work in a distributed environment, which introduces a whole
> new set of problems. It's really sad how many apps fail when you use
> them with NFS. Using O_SYNC over NFS is a story for itself, btw)

Agreed.  We're using it currently, and I'm not pleased about it.  (The
decision was made before I came on the project.)  On the other hand,
frankly, I don't think we need O_SYNC at all, given what the application
does.  All of the data we write can be reconstituted from other sources.

> > Anyway, as I said, I'm not sure what your point is.  I agree with the
> > need for conservative caution, and a lot more mistrust with regards
> > to what is actually happening in the system.  But I don't see a
> > standards issue here; at most, a quality of implementation question.
> > In this sense, the points are worth insisting upon, but in
> > comp.lang.c++.moderated, rather than here.  And you mustn't forget
> > that different applications have different requirements with regards
> > to quality.  Linux, or even Windows NT, seems to be adequate for Web
> > site hosting, for example.  But I don't know anyone using them for a
> > telephone node, or for critical banking applications.  (Windows is
> > very popular for the client applications, which don't contain any
> > critical data or functionality, but the servers tend to come from
> > Sun, HP or IBM, and they all run Open Systems Unix.)

> You are probably right, I should have posted this to c.l.c++.m in the
> first place. I thought the standard should provide users with the
> highest degree of data integrity possible.

Dream on.  C++ is a general purpose language.  It's fine as is for
general purpose applications, but for critical software, you have to
know what you are doing, and go beyond the standard, to get additional
guarantees from the implementation.

There are very strong historical reasons for this.  C++ derives from C.
C was born on Unix.  And early Unix is not a system which could be used
for critical applications anyway.  Even today, the attitude is generally
don't slow down the typical application, even if the results aren't
acceptable in a critical application.

> Losing error states from system calls makes me nervous no matter how
> poor the quality of an app is intended to be ;-)

I agree, but I'm afraid we're fighting a losing battle.

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: kanze@gabi-soft.de (James Kanze)
Date: 27 Jul 2002 17:45:32 GMT
Alexander Sieb <sieb@sscd.de> wrote in message
news:<3D408DFC.8F546FF8@sscd.de>...

> it is hopefully^W well known that ignoring the return value of
> close(2) is a programming error on Linux.

Ignoring the return value of most system functions is generally an
error.  Close is probably an exception; there are some situations where
ignoring the return value is perfectly acceptable.  (Note that in this
context, close is significantly different from fclose or
filebuf::close, because there is no local buffering.)

> The close(2) man page describes it as follows:

> <quote>
> NOTES
>        Not checking the return value of close is a common  but  neverthe-
>        less serious programming error.  File system implementations which
>        use techniques as ``write-behind''  to  increase  performance  may
>        lead  to write(2) succeeding, although the data has not been writ-
>        ten yet.  The error status may be reported at a later write opera-
>        tion,  but  it  is  guaranteed to be reported on closing the file.
>        Not checking the return value when closing the file  may  lead  to
>        silent loss of data.  This can especially be observed with NFS and
>        disk quotas.
>
>        A successful close does not guarantee that the data has been  suc-
>        cessfully  saved  to  disk, as the kernel defers writes. It is not
>        common for a filesystem to flush the buffers when  the  stream  is
>        closed.  If you need to be sure that the data is physically stored
>        use fsync(2) or sync(2), they will get you closer to that goal (it
>        will depend on the disk hardware at this point).
> </quote>

An interesting quote.  First, it says that you shouldn't ignore the
return value of close, because the write may not report the error.  Then
it says that close may not report the error.

It's well known in the programming circles I frequent that if you need
the data on disk (as opposed to simply out of your program and into the
operating system), you either set the file to a synchronized mode (so
that write will wait for the actual write, and *will* report the error)
or you use fdsync.  And you don't acknowledge to your clients that the
request has been fulfilled until the sync has finished.

> The same is probably true for other operating systems but not
> documented.  I'm writing about Linux here.

I think that the behavior is general Unix.  It was true with version 7
(back in the early 80's).  What has changed is that you now have the
option to synchronize.

> Now what happens if close(2) is called from within a dtor and returns
> EIO ?  How can an application find out that not all data has made it
> to the filesystem? (Note: Even if no error is reported in user space,
> a million things can go wrong in kernel space before your data
> *really* hits the platters of your HDD. But that's a totally different
> story)

I would guess the same thing as what happens if close doesn't report an
error, but the system crashes before disks were sync'ed.

> As I read the Standard, basic_filebuf::~basic_filebuf is supposed to
> close the file if it hasn't been done before. But nothing is being
> said about what should happen if the underlying system call close(2)
> fails.

The close should fail.  But the standard doesn't address this level.
All the standard discusses is getting the data into the system.  For
stricter requirements, you can't use the standard streams.  Unless, of
course, the implementor has provided some extensions.  (Most Unix
implementations offer some means of getting at the file descriptor.  And
with the file descriptor, you can place the output in synchronized
mode.)

> I checked STLport and libstdc++-v3 (from gcc 3.1) and none of them
> handles this specific problem. If close(2) indicates an io error and
> you rely on automatic resource management of e.g. ofstream you can end
> up with a corrupted file.

If the streams are buffered (the usual case), and you use this idiom,
you can get a corrupted stream even if it is in synchronized mode; the
filebuf::close function begins by flushing, which eventually calls
write.  I fail to see the problem with close.

It is, I think, generally well known that you should *normally* close
the file before calling the destructor, at least for output, so that you
can handle eventual errors.  This is just standard good programming
practice, in effect just about everywhere I've worked.  That doesn't
mean that you shouldn't close in the destructor.  Closing an already
closed file is not a serious error, and not closing a file, even in the
case of a programming error, is a resource leak.

In a correct program, about the only time the file will be open when the
destructor is called is during stack walkback during exception handling.
In this case, you have an exceptional case.  In many cases, you will
throw out the file you have been writing anyway, because it probably is
corrupt (even without an error in close).  In other cases, you have to
do something to recover; what is not obvious.  Typically, if you get an
error from flush or close, the resulting file is unusable anyway.  (In
my current application, we first write recognizable dummy data of the
target length, and check the return status, so that disk full errors can
be caught before writing half a record.  But it still isn't 100%
safe; the data can span several disk blocks, and the system can crash
after writing one, but before writing the second.)
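(Schematically, the idea is something like this -- not our actual code, just
the shape of it; the interface and the filler byte are made up:)

    #include <cstddef>
    #include <cstdio>
    #include <string>

    // Sketch: reserve a record's space with recognizable filler first, so
    // that a disk-full error surfaces before any real data is half written.
    bool write_record(std::FILE* fp, const char* data, std::size_t record_size)
    {
        long start = std::ftell(fp);
        std::string filler(record_size, '\xDB');          // arbitrary filler
        if (std::fwrite(filler.data(), 1, record_size, fp) != record_size
                || std::fflush(fp) != 0)
            return false;                   // e.g. ENOSPC caught here, early
        if (std::fseek(fp, start, SEEK_SET) != 0)
            return false;
        return std::fwrite(data, 1, record_size, fp) == record_size
            && std::fflush(fp) == 0;
    }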

> So is it possible to change the Standard to handle this situation
> without breaking existing applications?

> IMHO, the only solution at the moment is to make sure a file stream
> has been closed before its dtor is called.

This is standard practice except in error cases where the file will be
discarded as a result of the error anyway.

> Ideas anybody?

> PS: What happens if an application's cout is redirected to a file
> located on a NFS mounted filesystem, but cout.close() fails?

You're out of luck.

You've actually got a good point there.  Outputting to standard out is
generally characteristic of command line interface programs, and not
others.  Command line programs are used within a larger context, and it
is important, yea essential, that the larger context be informed of the
failure.  Logically, the standard should be modified such that if the
close of cout fails, the program returns EXIT_FAILURE, regardless of
what the programmer has passed to exit or returned from main.  Except
that cout is never closed.  About the best one can hope for is to not
use cout in static destructors, do a flush on it at the end of the
program, and check the return value of that flush.
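That is, something along these lines (just a sketch):

    #include <cstdlib>
    #include <iostream>

    int main()
    {
        // ... all of the program's output to std::cout ...

        std::cout.flush();                  // push the last buffer out
        return std::cout ? EXIT_SUCCESS     // a failed flush shows up in
                         : EXIT_FAILURE;    // the stream state
    }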

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: Alexander Sieb <sieb@sscd.de>
Date: Sun, 28 Jul 2002 21:54:11 GMT
James Kanze wrote:
>
> Alexander Sieb <sieb@sscd.de> wrote in message
> news:<3D408DFC.8F546FF8@sscd.de>...
>
> > it is hopefully^W well known that ignoring the return value of
> > close(2) is a programming error on Linux.
>
> Ignoring the return value of most system functions is generally an
> error.  Close is probably an exception; there are some situations where
> ignoring the return value is perfectly acceptable.  (Note that in this
> context, close is significantly different from fclose or
> filebuf::close, because there is no local buffering.)
>

Can you please explain why user space buffering makes a difference?
fclose() calls fflush() (which calls write(2)) and then close(2).
If any of those functions fails, errno is set and fclose() fails.
filebuf::close calls fflush + fclose() or write(2) + close(2) depending on
how the iostreams library has been compiled (to use stdio or system calls).
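So at the stdio level the check itself is trivial (a sketch; perror stands in
for whatever real error handling the application wants):

    #include <cstdio>

    // Sketch: the error from the final flush/write/close surfaces in
    // fclose's return value, so it must not be ignored.
    bool write_and_close(const char* path, const char* text)
    {
        std::FILE* fp = std::fopen(path, "w");
        if (fp == 0)
            return false;
        bool ok = std::fputs(text, fp) >= 0;
        if (std::fclose(fp) == EOF) {        // flushes, then calls close(2)
            std::perror("fclose");           // EIO, ENOSPC, EDQUOT, ...
            ok = false;
        }
        return ok;
    }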


> > The close(2) man page describes it as follows:
>
> > <quote>
> > NOTES
> >        Not checking the return value of close is a common  but  neverthe-
> >        less serious programming error.  File system implementations which
> >        use techniques as ``write-behind''  to  increase  performance  may
> >        lead  to write(2) succeeding, although the data has not been writ-
> >        ten yet.  The error status may be reported at a later write opera-
> >        tion,  but  it  is  guaranteed to be reported on closing the file.
> >        Not checking the return value when closing the file  may  lead  to
> >        silent loss of data.  This can especially be observed with NFS and
> >        disk quotas.
> >
> >        A successful close does not guarantee that the data has been  suc-
> >        cessfully  saved  to  disk, as the kernel defers writes. It is not
> >        common for a filesystem to flush the buffers when  the  stream  is
> >        closed.  If you need to be sure that the data is physically stored
> >        use fsync(2) or sync(2), they will get you closer to that goal (it
> >        will depend on the disk hardware at this point).
> > </quote>
>
> An interesting quote.  First, it says that you shouldn't ignore the
> return value of close, because the write may not report the error.  Then
> it says that close may not report the error.
>

The error can only be reported if the filesystem implements certain
methods (f_op->flush). Currently only NFS, Coda and AFS do that AFAIK.
So you had better run your test suites with your data located on an NFS server
and see what happens when you cut the connection to it.

In addition, I'd like to quote a message Alan Cox wrote on the Linux
kernel mailing list.

<quote>
close() checking is not about physical disk guarantees. It's about more
basic "I/O completed". In some future Linux only close() might tell you
about some kinds of I/O error. The fact it doesn't do it now is no
excuse for sloppy programming
</quote>


> It's well known in the programming circles I frequent that if you need
> the data on disk (as opposed to simply out of your program and into the
> operating system), you either set the file to a synchronized mode (so
> that write will wait for the actual write, and *will* report the error)
> or you use fdsync.  And you don't acknowledge to your clients that the
> request has been fulfilled until the sync has finished.
>

I know, but do the stream classes support this?

In fact to be absolutely sure you have to do a write-read-verify because
some storage devices - especially IDE - don't implement a 'verify' command or
if they do it is buggy or the driver is buggy, or...

> > I checked STLport and libstdc++-v3 (from gcc 3.1) and none of them
> > handle s this specific problem. If close(2) indicates an io error and
> > you rely on automatic resource management of e.g. ofstream you can end
> > up with a corrupted file.
>
> If the streams are buffered (the usual case), and you use this idiom,
> you can get a corrupted stream even if it is in synchronized mode; the
> filebuf::close function begins by flushing, which eventually calls
> write.  I fail to see the problem with close.
>

Sorry, I don't understand the last sentence.

> It is, I think, generally well known that you should *normally* close
> the file before calling the destructor, at least for output, so that you
> can handle eventual errors.  This is just standard good programming
> practice, in effect just about everywhere I've worked.  That doesn't
> mean that you shouldn't close in the destructor.  Closing an already
> closed file is not a serious error, and not closing a file, even in the
> case of a programming error, is a resource leak.
>

It is a good practice, but shouldn't it be enforced by the design?
IMHO, an application shouldn't even compile if you forget to close
the stream before destructing it.
The problem I have with the way it is now is that novice users will mimic
this kind of design in their own projects. It's problematic at least
from an educational point of view.


> In a correct program, about the only time the file will be open when the
> destructor is called is during stack walkback during exception handling.
> In this case, you have an exceptional case.  In many cases, you will
> throw out the file you have been writing anyway, because it probably is
> corrupt (even without an error in close).  In other cases, you have to
> do something to recover; what is not obvious.  Typically, if you get an
> error from flush or close, the resulting file is unusable anyway.  (In
> my current application, we first write recognizable dummy data of the
> target length, and check the return status, so that disk full errors can
> be caught before writing half a record.  But it still isn't 100%
> safe; the data can span several disk blocks, and the system can crash
> after writing one, but before writing the second.)
>

As I stated above. You can never be sure unless you have a full round trip
from user space to your storage and back again (using O_SYNC of course).
But in case of a failure I would like to have the full state of every
single object in the system available to run consistency checks.

> > PS: What happens if an application's cout is redirected to a file
> > located on a NFS mounted filesystem, but cout.close() fails?
>
> You're out of luck.
>
> You've actually got a good point there.  Outputting to standard out is
> generally characteristic of command line interface programs, and not
> others.  Command line programs are used within a larger context, and it
> is important, yea essential, that the larger context be informed of the
> failure.  Logically, the standard should be modified such that if the
> close of cout fails, the program returns EXIT_FAILURE, regardless of
> what the programmer has passed to exit or returned from main.  Except
> that cout is never closed.  About the best one can hope for is to not
> use cout in static destructors, do a flush on it at the end of the
> program, and check the return value of that flush.
>

I agree, but you would also need an indication of what happened. A failure
like this won't be easily reproducible. Ideally, you would get a core
dump plus a syslog message. You could do that by adding a fatal_io_error
callback which is to be called for situations described here.
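The interface might look roughly like this (purely hypothetical -- nothing
of the kind exists in the standard library; the names are made up):

    #include <cstdlib>
    #include <syslog.h>

    // Hypothetical hook the library would invoke when an I/O error can no
    // longer be reported through normal return paths (e.g. in a destructor
    // or after exit).
    typedef void (*fatal_io_error_handler)(const char* what, int error);

    fatal_io_error_handler set_fatal_io_error(fatal_io_error_handler h);

    // An application might install something like this:
    void log_and_abort(const char* what, int error)
    {
        syslog(LOG_CRIT, "unreported I/O failure in %s: errno=%d", what, error);
        std::abort();                        // leave a core dump behind
    }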

--
Regards,
 Alexander






Author: kanze@gabi-soft.de (James Kanze)
Date: Mon, 29 Jul 2002 16:41:26 GMT
Alexander Sieb <sieb@sscd.de> wrote in message
news:<3D43502B.A252D18C@sscd.de>...
> James Kanze wrote:

> > Alexander Sieb <sieb@sscd.de> wrote in message
> > news:<3D408DFC.8F546FF8@sscd.de>...

> > > it is hopefully^W well known that ignoring the return value of
> > > close(2) is a programming error on Linux.

> > Ignoring the return value of most system functions is generally an
> > error.  Close is probably an exception; there are some situations
> > where ignoring the return value is perfectly acceptable.  (Note that
> > in this context, close is significantly different from fclose or
> > filebuf::close, because there is no local buffering.)

> Can you please explain why user space buffering makes a difference?

Because that's where the standard draws its line. ISO/IEC 9899:1999
7.19.5.2/2: "[...] the fflush function causes any unwritten data for
that stream to be delivered to the host environment to be written to the
file; [...]".  Note the "to be written"; it is quite clear here that
fflush can return before the actual write to disk takes place.  The
fclose function has exactly the same words.

I had some trouble composing an answer to your initial posting because I
was not fully clear what point you were trying to make.  It is a serious
programming error not to check the return status of fclose or
filebuf::close before using the file being written; in cases where
another program may attempt to use the file, as in the case of command
line functions, it is essential to reflect the error in the return
status.

And yes, this does mean that you cannot *normally* count on the close in
the destructor.  You must explicitly call close, testing the status.
Unless, of course, another error has already made the file unusable.  In
this case, the close in the destructor (called for example when stack
unwinding as a result of an exception) is fine, since the file will be
thrown away anyway, regardless of whether the close succeeds or not.

This is one point.  It is, or should be, well known, but it seems to be
forgotten often enough that repeating it cannot do any harm.

The second point (also important) is that even after a successful close
(or flush), the data may not be on the disk, and it may never get there,
because of the effects of system buffering.  The standard explicitly
sidesteps this point, doubtlessly for the reason that in earlier Unix
(including those current when the first version of the C standard was
being written), requiring anything else would have resulted in a
language which couldn't be implemented under Unix.  Even today,
requiring anything else will result in an implementation significantly
slower.  (I know.  My applications *require* synchronous writes.  And
they are slow.)

> fclose() calls fflush() (which calls write(2)) and then close(2).  If
> any of those functions fails errno is set and fclose() fails.
> filebuf::close calls fflush + fclose() or write(2) + close(2)
> depending on how the iostreams library has been compiled (to use stdio
> or system calls).

>
> > > The close(2) man page describes it as follows:
>
> > > <quote>
> > > NOTES
> > >        Not checking the return value of close is a common  but  neverthe-
> > >        less serious programming error.  File system implementations which
> > >        use techniques as ``write-behind''  to  increase  performance  may
> > >        lead  to write(2) succeeding, although the data has not been writ-
> > >        ten yet.  The error status may be reported at a later write opera-
> > >        tion,  but  it  is  guaranteed to be reported on closing the file.
> > >        Not checking the return value when closing the file  may  lead  to
> > >        silent loss of data.  This can especially be observed with NFS and
> > >        disk quotas.

> > >        A successful close does not guarantee that the data has been  suc-
> > >        cessfully  saved  to  disk, as the kernel defers writes. It is not
> > >        common for a filesystem to flush the buffers when  the  stream  is
> > >        closed.  If you need to be sure that the data is physically stored
> > >        use fsync(2) or sync(2), they will get you closer to that goal (it
> > >        will depend on the disk hardware at this point).
> > > </quote>

To your credit, the Linux notes that you quote seem as vague -- first
they say you should always check the return status of close, then they
say it doesn't really mean anything:-).

Seriously, for many applications, getting the data to the system is
sufficient.  The probability of a failure after that is low, and such
failures tend not to go unremarked, so the user will know that he
probably has to regenerate the data.  The exception is when the "user"
is a remotely connected machine, which bases further actions on the fact
that you have correctly recorded what it requested.

I think that the current situation is probably optimal, at least for the
mainstream Unix platforms like Solaris and HP/UX.  The typical user
doesn't need fully synchronous IO, and shouldn't be made to pay the
price.  If you need more security, the first thing you normally do IS
verify what the system actually provides.  In the case of Solaris, for
example, you can use the O_SYNC flag on open, or set it later with
fcntl, or you can explicitly call fsync after a critical series of
writes.

If your system doesn't provide something similar, then it is unusable
for certain types of work.
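Concretely, any of the following, sketched here in POSIX terms with error
checking omitted (whether F_SETFL can turn on O_SYNC after the fact is
itself system dependent):

    #include <fcntl.h>
    #include <unistd.h>

    void synchronous_io_options(const char* path)
    {
        // 1. Open the file in synchronized mode from the start.
        int fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0644);

        // 2. Or try to turn O_SYNC on later with fcntl (honored on some
        //    systems, silently ignored on others).
        int flags = fcntl(fd, F_GETFL);
        fcntl(fd, F_SETFL, flags | O_SYNC);

        // 3. Or keep ordinary writes and call fsync after a critical
        //    series of them.
        // write(fd, ...);
        fsync(fd);

        close(fd);
    }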

> > An interesting quote.  First, it says that you shouldn't ignore the
> > return value of close, because the write may not report the error.
> > Then it says that close may not report the error.

> The error can only be reported if the filesystem implements certain
> methods (f_op->flush). Currently only NFS, Coda and AFS do that AFAIK.
> So you had better run your test suites with your data located on an NFS
> server and see what happens when you cut the connection to it.

I'm not interested in the means.  All I am concerned with is the end.

> In addition, I'd like to quote a message Alan Cox wrote on the Linux
> kernel mailing list.

> <quote>
> close() checking is not about physical disk guarantees. It's about more
> basic "I/O completed". In some future Linux only close() might tell you
> about some kinds of I/O error. The fact it doesn't do it now is no
> excuse for sloppy programming
> </quote>

I totally agree with this message.

As I said, for most programs, just getting the data to the system is
sufficient.

> > It's well known in the programming circles I frequent that if you
> > need the data on disk (as opposed to simply out of your program and
> > into the operating system), you either set the file to a
> > synchronized mode (so that write will wait for the actual write, and
> > *will* report the error) or you use fdsync.  And you don't
> > acknowledge to your clients that the request has been fulfilled
> > until the sync has finished.

> I know, but do the stream classes support this?

Not directly.  The semantics of the stream classes is defined in terms
of fopen, etc.  These functions haven't changed since C90.  When C90 was
being formulated, most Unixes didn't yet have any of the necessary
support.  Since it was considered unacceptable to specify C in such a
way as to render an implementation under Unix impossible...

It is a fact of life that to write critical software, you depend on
guarantees and functions not in the standard.  Of course, if you are
writing software that needs full data integrity, then you are probably
using sockets as well (not in the standard), perhaps threads (not in the
standard), and you are certainly concerned by more signals than those
defined in the standard.

> In fact to be absolutely sure you have to do a write-read-verify
> because some storage devices - especially IDE - don't implement a
> 'verify' command or if they do it is buggy or the driver is buggy,
> or...

The answer to that is that if you need data integrity, you don't use
these types of storage devices.

> > > I checked STLport and libstdc++-v3 (from gcc 3.1) and none of them
> > > handles this specific problem. If close(2) indicates an io error
> > > and you rely on automatic resource management of e.g. ofstream you
> > > can end up with a corrupted file.

> > If the streams are buffered (the usual case), and you use this
> > idiom, you can get a corrupted stream even if it is in synchronized
> > mode; the filebuf::close function begins by flushing, which
> > eventually calls write.  I fail to see the problem with close.

> Sorry, I don't understand the last sentence.

What I'm trying to say is that (obviously), the system synchronization
only concerns the system buffers, and is oblivious to any buffering in
filebuf or stdio.  If you count on the close in the destructor, you
can't check the error, and there may be a significant error.

My point is simply that this is the way filebuf was designed to work.
This is the way it *must* work; if filebuf didn't close in the
destructor, it would leak resources if the user didn't close otherwise
(e.g. during a stack walkback due to an exception.)

> > It is, I think, generally well known that you should *normally*
> > close the file before calling the destructor, at least for output,
> > so that you can handle eventual errors.  This is just standard good
> > programming practice, in effect just about everywhere I've worked.
> > That doesn't mean that you shouldn't close in the destructor.
> > Closing an already closed file is not a serious error, and not
> > closing a file, even in the case of a programming error, is a
> > resource leak.

> It is a good practice, but shouldn't it be enforced by the design?
> IMHO, an application shouldn't even compile if you forget to close the
> stream before destructing it.

I'd like to see how you could get a compile time error for this.

But seriously, as I said above, the close in the destructor *is* useful
in conjunction with exceptions, and it is perfectly valid to count on it
when handling an error condition which will result in the file being
thrown away anyway.
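In code, the usual idiom is roughly (a sketch):

    #include <fstream>
    #include <stdexcept>
    #include <string>

    // Sketch: close explicitly and check in the normal path; the destructor
    // only serves as a backstop when an exception unwinds the stack, in
    // which case the file would be thrown away anyway.
    void save(const char* path, const std::string& data)
    {
        std::ofstream out(path);
        out << data;
        out.close();                     // explicit close in the normal path
        if (!out)                        // a failed flush or close shows here
            throw std::runtime_error("error writing file");
    }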

> The problem I have with the way it is now is that novice users will
> mimic this kind of design in their own projects. It's problematic at
> least from an educational point of view.

Have you really looked at the standard library some?  Have you found
anything in it that could serve as a good example?  IMHO, the iostream
hierarchy (with the separation of concerns between [io]stream and
streambuf, the intelligent use of the template pattern in streambuf,
etc.) is probably the best part of the library.

I'm not too sure either what you are criticizing.  Calling close in
filebuf::~filebuf is good defensive programming; it prevents a serious
resource leak in the case of sloppy user programming.  IMHO, it is also
a useful feature in certain specific cases of error handling; I don't
need a try block to close the file after an exception.  It's obviously
an error to count on it in the "standard" case.  Any textbook presenting
the standard library should point this out.  The standard is not a
textbook, though, and should only say what the implementation should do,
not why.

> > In a correct program, about the only time the file will be open when
> > the destructor is called is during stack walkback during exception
> > handling.  In this case, you have an exceptional case.  In many
> > cases, you will throw out the file you have been writing anyway,
> > because it probably is corrupt (even without an error in close).  In
> > other cases, you have to do something to recover; what is not
> > obvious.  Typically, if you get an error from flush or close, the
> > resulting file is unusable anyway.  (In my current application, we
> > first write recognizable dummy data of the target length, and check
> > the return status, so that disk full errors can be caught before
> > writing half a record.  But it still isn't 100% safe; the data can
> > span several disk blocks, and the system can crash after writing
> > one, but before writing the second.)

> As I stated above. You can never be sure unless you have a full round
> trip from user space to your storage and back again (using O_SYNC of
> course).

O_SYNC should suffice.  Round trip or no.

> But in case of a failure I would like to have the full state of every
> single object in the system available to run consistency checks.

> > > PS: What happens if an application's cout is redirected to a file
> > > located on a NFS mounted filesystem, but cout.close() fails?

> > You're out of luck.

> > You've actually got a good point there.  Outputting to standard out
> > is generally characteristic of command line interface programs, and
> > not others.  Command line programs are used within a larger context,
> > and it is important, yea essential, that the larger context be
> > informed of the failure.  Logically, the standard should be modified
> > such that if the close of cout fails, the program returns
> > EXIT_FAILURE, regardless of what the programmer has passed to exit
> > or returned from main.  Except that cout is never closed.  About the
> > best one can hope for is to not use cout in static destructors, do a
> > flush on it at the end of the program, and check the return value of
> > that flush.

> I agree, but you would also need an indication of what happened.  A
> failure like this won't be easily reproducible.  Ideally, you would
> get a core dump plus a syslog message.  You could do that by adding a
> fatal_io_error callback which is to be called for situations described
> here.

So what if the failure occurs after you've called exit?  (Sorry, I don't
have any good answers.  As I say, I don't do many of these type of
applications.  The applications I do do tend to make extensive use of
O_SYNC or fsync.  The more critical applications ensure that the
critical files are on the local disk, too.  No need to go looking for
trouble:-).)

Anyway, as I said, I'm not sure what your point is.  I agree with the
need for conservative caution, and a lot more mistrust with regards to
what is actually happening in the system.  But I don't see a standards
issue here; at most, a quality of implementation question.  In this
sense, the points are worth insisting upon, but in
comp.lang.c++.moderated, rather than here.  And you mustn't forget that
different applications have different requirements with regards to
quality.  Linux, or even Windows NT, seems to be adequate for Web site
hosting, for example.  But I don't know anyone using them for a
telephone node, or for critical banking applications.  (Windows is very
popular for the client applications, which don't contain any critical
data or functionality, but the servers tend to come from Sun, HP or IBM,
and they all run Open Systems Unix.)

--
James Kanze                           mailto:jkanze@caicheuvreux.com
Conseils en informatique orientée objet/
                    Beratung in objektorientierter Datenverarbeitung






Author: John Nagle <nagle@animats.com>
Date: Tue, 30 Jul 2002 15:55:42 GMT
James Kanze wrote:

> Alexander Sieb <sieb@sscd.de> wrote in message
> news:<3D43502B.A252D18C@sscd.de>...
>
>>James Kanze wrote:
>>
>
>>>Alexander Sieb <sieb@sscd.de> wrote in message
>>>news:<3D408DFC.8F546FF8@sscd.de>...
>>>
>
>>>>it is hopefully^W well known that ignoring the return value of
>>>>close(2) is a programming error on Linux.


    I think we're stuck with that for historical reasons.

    Arguably, destroying a file object in error status
should throw an exception.  But exceptions in destructors create
major problems.

    The really hardcore position would be that if you don't
close the object properly before it is destroyed, you get
a rollback, in the database sense, to the initial state of
the file.  UCLA Locus actually implemented such semantics;
if a program exited via "abort()" or an uncaught signal,
all the open files rolled back to their initial state.
(You could "commit" a file while open, thus keeping all
changes up to the commit, but the default
action was to rollback on error termination.)  Thus, you
always got either a clean update or no update, never
a half-written file.  That was a good idea, but alien
to the C/C++/UNIX/Win32 community.

     We're probably stuck with "traditional UNIX semantics" here.
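     About the closest the traditional semantics come is the
write-to-a-temporary-and-rename idiom, which gives clean-update-or-no-update
for a single file (a sketch only, and nothing like the general rollback
Locus provided; the temporary name is just illustrative):

    #include <cstdio>
    #include <fstream>

    // Sketch: write the new contents to a temporary file, check that they
    // reached the system, then atomically replace the old file.  (A careful
    // version would also fsync the temporary before the rename.)
    bool replace_file(const char* path, const char* tmp_path,
                      const char* contents)
    {
        {
            std::ofstream out(tmp_path);
            out << contents;
            out.close();
            if (!out)
                return false;            // don't touch the original
        }
        // rename is atomic on POSIX within the same filesystem.
        return std::rename(tmp_path, path) == 0;
    }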

     John Nagle
     Animats
