Thread

Topic: Using "start" to open fstream

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: 2000/11/10

In article <ur94ny2cl.fsf@mail.crocodial.de>, Benjamin Riefenstahl
<Benjamin.Riefenstahl@ision.net> wrote:
>AFAICS people seem to prefer the distant possibility of that kind of
>unexpected behavior to having to take great care about case in
>filenames.  Case-insensitive systems (Windows and Mac) are a fact of
>life and seem to cover the vast majority of the desktop computers.

Just a note: The MacOS does _not_ use a case insensitive file system. It
is in fact perfectly OK for the MacOS to have files in the same directory
that only differ in upper/lower case. However, the user front end called
the "Finder" uses look-up routines that treat upper/lower case the same.
So it is not possible to put such files in the same directory via the
Finder.

Actually, I used to run the Tenon Mach/BSD, a UNIX that runs on top of
pre-X MacOS and uses the MacOS file system. There, the MacOS file system
does not complain if file names differ only in upper/lower case.

Now for details under MacOS X, see quote in one of my earlier posts: MacOS
X does support Unicode filenames, but I got the impression this was done
via conversions to ASCII or 8-bit characters.

The main thing for the C++ standard, though, is that there already are
(or will be in the near future) commonplace OS's that use wstring filenames.

So definitely, the C++ standard ought to support this.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html                ]
[ Note that the FAQ URL has changed!  Please update your bookmarks.     ]

Author: Benjamin Riefenstahl <Benjamin.Riefenstahl@ision.net>
Date: 2000/11/07

Hi James,

> Benjamin Riefenstahl <Benjamin.Riefenstahl@ision.net> writes:
>> [...]  applications will have to expect more and more file names
>> that are only representable in Unicode, but not in the single-byte
>> mode of Windows programming.  These files can not be opened by
>> standard C++ iostreams (at least not by name).

kanze@gabi-soft.de writes:
> Why not?  I can represent any Unicode character with 8 bit bytes.  I
> may generally need more than one of them, but I can still represent
> it.  All the system needs to do is define a standard encoding (like
> UTF-8).

Currently the "system" (Windows + any available compiler) doesn't do
so.  The compiler runtime could do it, but I doubt that anybody on the
platform actually wants that.  There already are existing local 8-bit
character encodings in use and in my experience programmers expect to
have exactly the one that is configured on the user's computer when
they use "char".  UTF-8 is not a choice there AFAIK.  The platform
guidelines say, if you want to support file names that are not covered
by that encoding, you use Unicode (wchar_t implemented as UCS-2).

> The problem isn't there.  The problem is how to interpret these
> filenames if the system is supposed to be case insensitive.  Is 'i'
> the same as 'I', or a capital I with a dot over it?
> [...]
> Practically, the only really acceptable solution would be to make
> the filenames case sensitive.  If not, you either have unexpected
> behavior for the user (e.g. the Turkish i),

AFAICS people seem to prefer the distant possibility of that kind of
unexpected behavior to having to take great care about case in
filenames.  Case-insensitive systems (Windows and Mac) are a fact of
life and seem to cover the vast majority of the desktop computers.

I don't think that this is a question of how things could theoretically
be; they certainly *could* use UTF-8 and case-sensitive file names.
It's rather a question of what people (and programmers) want.

so long, benny
--
ISION Internet AG
Benjamin Riefenstahl
mailto:benjamin.riefenstahl@ision.net

Ruhrstrasse 61
D-22761 Hamburg
http://www.ision.net

---

Author: Benjamin Riefenstahl <Benjamin.Riefenstahl@ision.net>
Date: 2000/11/05

Hi James,

This is most probably off-topic, but anyway just FYI:

James.Kanze@dresdner-bank.com writes:
> Seriously, I'm not aware of any file system which supports Unicode
> file names, at least not completely.  I've heard that Windows NT
> does, but I've not seen the actual documentation.

It does.  Win2000 is even better by bringing the feature to the actual
user.  It supports a large number of scripts OOTB.  They can be
installed very simply by the average user and all these scripts are
available for file names.  If this is widely used, applications will
have to expect more and more file names that are only representable in
Unicode, but not in the single-byte mode of Windows programming.
These files can not be opened by standard C++ iostreams (at least not
by name).

Novell Netware also has support for Unicode file names.  Some Unix
type systems can use UTF-8 (Plan 9) or have at least plans to support
that (Linux).

While I haven't seen the feature on MacOS X yet, the latest file
system versions from Apple do support Unicode internally, so it's a
good bet that we may also see APIs to use it some day.

> Since filenames under Windows are case insensitive, this should lead
> to some interesting problems in comparing filenames.  Whether two
> filenames designate the same file depends on the locale, which can
> change in time and with user.

Yes that's an interesting problem ;-).  A quick look at the docs seems
to indicate that NT uses a simple fixed table, stored on the disk.
It's probably done this way so that the interpretation of file names
does *not* change with the locale, but is instead fixed at the point
of the creation of the file system.

so long, benny
--
ISION Internet AG
Benjamin Riefenstahl
mailto:benjamin.riefenstahl@ision.net

Ruhrstrasse 61
D-22761 Hamburg
http://www.ision.net

---

Author: kanze@gabi-soft.de
Date: 2000/11/06

Benjamin Riefenstahl <Benjamin.Riefenstahl@ision.net> writes:

|>  James.Kanze@dresdner-bank.com writes:
|>  > Seriously, I'm not aware of any file system which supports Unicode
|>  > file names, at least not completely.  I've heard that Windows NT
|>  > does, but I've not seen the actual documentation.

|>  It does.  Win2000 is even better by bringing the feature to the
|>  actual user.  It supports a large number of scripts OOTB.  They can
|>  be installed very simply by the average user and all these scripts
|>  are available for file names.  If this is widely used, applications
|>  will have to expect more and more file names that are only
|>  representable in Unicode, but not in the single-byte mode of Windows
|>  programming.  These files can not be opened by standard C++
|>  iostreams (at least not by name).

Why not?  I can represent any Unicode character with 8 bit bytes.  I may
generally need more than one of them, but I can still represent it.  All
the system needs to do is define a standard encoding (like UTF-8).
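
To make the "more than one byte" point concrete, here is a minimal sketch
of a UTF-8 encoder for a single code point. The function name `to_utf8` is
only an illustration and not part of any standard library.

```cpp
#include <string>

// Encodes one Unicode code point (U+0000..U+10FFFF) as UTF-8.
// Returns an empty string for values outside that range and for
// surrogate code points (U+D800..U+DFFF), which UTF-8 does not encode.
inline std::string to_utf8(unsigned long cp)
{
    std::string out;
    if (cp < 0x80) {
        // 7-bit ASCII is stored unchanged in a single byte.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

A name built this way is an ordinary char string, so it can be passed to
the existing narrow-character open functions, provided the system agrees
to interpret the bytes as UTF-8.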

The problem isn't there.  The problem is how to interpret these
filenames if the system is supposed to be case insensitive.  Is 'i' the
same as 'I', or a capital I with a dot over it?

|>  Novell Netware also has support for Unicode file names.  Some Unix
|>  type systems can use UTF-8 (Plan 9) or have at least plans to
|>  support that (Linux).

Exactly.  UTF-8 is an eight bit multibyte encoding stream.  According to
the standard, an implementation can already use it for narrow string
literals.  Off hand, I don't know of any that do, but the problem isn't
with the standard.

|>  While I haven't seen the feature on MacOS X yet, the latest file
|>  system versions from Apple do support Unicode internally, so it's a
|>  good bet that we may also see APIs to use it some day.

|>  > Since filenames under Windows are case insensitive, this should lead
|>  > to some interesting problems in comparing filenames.  Whether two
|>  > filenames designate the same file depends on the locale, which can
|>  > change in time and with user.

|>  Yes that's an interesting problem ;-).  A quick look at the docs
|>  seems to indicate that NT uses a simple fixed table, stored on the
|>  disk.  It's probably done this way so that the interpretation of
|>  file names does *not* change with the locale, but is instead fixed
|>  at the point of the creation of the file system.

Practically, the only really acceptable solution would be to make the
filenames case sensitive.  If not, you either have unexpected behavior
for the user (e.g. the Turkish i), or you have locale specific behavior,
which, as you seem to realize, is going to pose a whole new set of
problems.
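
The Turkish i problem can be sketched with two deliberately simplified
folding rules. Everything here (`fold_default`, `fold_turkish`,
`same_name`) is a hypothetical illustration covering only ASCII letters
and the two Turkish special cases; it is not how any particular file
system does it.

```cpp
#include <cstddef>
#include <string>

// Simplified folding for illustration: only ASCII letters are folded.
// A real file system would need a full Unicode folding table.
inline wchar_t fold_default(wchar_t c)
{
    if (c >= L'A' && c <= L'Z') return static_cast<wchar_t>(c - L'A' + L'a');
    return c;
}

// Turkish rules: 'I' pairs with dotless U+0131, and dotted capital
// U+0130 pairs with 'i'. Everything else folds as in fold_default.
inline wchar_t fold_turkish(wchar_t c)
{
    if (c == L'I') return L'\u0131';
    if (c == L'\u0130') return L'i';
    return fold_default(c);
}

// Compares two names character by character after folding with `fold`.
inline bool same_name(const std::wstring& a, const std::wstring& b,
                      wchar_t (*fold)(wchar_t))
{
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (fold(a[i]) != fold(b[i])) return false;
    return true;
}
```

With these rules, "FILE" and "file" name the same file under
`fold_default` but different files under `fold_turkish`, which is exactly
the kind of locale-dependent answer described above.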

-- 
James Kanze                               mailto:kanze@gabi-soft.de
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(069)63198627

---

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Mon, 30 Oct 2000 22:55:05 GMT

In article <8tk4cn$kfp$1@nnrp1.deja.com>, James.Kanze@dresdner-bank.com wrote:
>Why?  You can't read or write them to/from a file, so why should the
>names be any different:-).

I think that MacOS X, which exists in beta, is Unicode based as far as
reading from and writing to files is concerned.

>Seriously, I'm not aware of any file system which supports Unicode
>file names, at least not completely.  I've heard that Windows NT does,
>but I've not seen the actual documentation.  Since filenames under
>Windows are case insensitive, this should lead to some interesting
>problems in comparing filenames.  Whether two filenames designate the
>same file depends on the locale, which can change in time and with
>user.

Actually, the document
http://developer.apple.com/techpubs/macosx/SystemOverview/SystemOverview.pdf
says on page 181, under the heading "HFS+", that MacOS X already has a
facility for allowing Unicode file and directory names.

>(Note that I'm in favor of allowing wchar_t filenames.  But I can see
>a bit of work being necessary before any such proposal is ready for
>the standard.)

I think this will go the same way as the extension 7 bit->8 bit: OS's
support 8 bit nowadays, but that only works with special font encodings.
But in language localizations, one will use those character extensions in
files. Then, as Unicode becomes available in text-files, the need for
having it on filenames as well will emerge.

If one is using a language for which 7-bit ASCII suffices (is there any
other than English?), it can be hard to think of a need for more.

So I feel sure OS's will have it; if Unicode now becomes widely available
on PCs (as on Macs), the change can happen fairly fast.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---

Author: James.Kanze@dresdner-bank.com
Date: Mon, 30 Oct 2000 20:14:25 GMT

In article
<remove.haberg-2910002159350001@du138-226.ppp.su-anst.tninet.se>,
  remove.haberg@matematik.su.se (Hans Aberg) wrote:

> Another problem is that Unicode is now coming along, and one
> would expect filenames to be able to use Unicode wide characters.

Why?  You can't read or write them to/from a file, so why should the
names be any different:-).

Seriously, I'm not aware of any file system which supports Unicode
file names, at least not completely.  I've heard that Windows NT does,
but I've not seen the actual documentation.  Since filenames under
Windows are case insensitive, this should lead to some interesting
problems in comparing filenames.  Whether two filenames designate the
same file depends on the locale, which can change in time and with
user.

When reading and writing wchar_t to a file, the system uses a codecvt
facet to convert from wchar_t to/from byte (char).  I'm not sure how
this would apply to filenames.
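
One way it might apply, as a rough sketch only: run the wide name through
the same codecvt facet before opening the file. The helper name
`narrow_name` is hypothetical, and whether the resulting bytes mean
anything to the file system depends entirely on the locale and the
platform.

```cpp
#include <cwchar>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: converts a wide file name to a narrow one using
// the codecvt facet of `loc`. Returns false if the conversion does not
// complete, for example because a character has no representation in
// that locale's narrow encoding.
inline bool narrow_name(const std::wstring& wide, const std::locale& loc,
                        std::string& narrow)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    narrow.clear();
    if (wide.empty()) return true;

    std::mbstate_t state = std::mbstate_t();
    // max_length() bounds the number of chars produced per wide character.
    std::vector<char> buffer(wide.size() * cvt.max_length() + 1);
    const wchar_t* from_next = 0;
    char* to_next = 0;
    std::codecvt_base::result r =
        cvt.out(state,
                wide.data(), wide.data() + wide.size(), from_next,
                &buffer[0], &buffer[0] + buffer.size(), to_next);
    if (r != std::codecvt_base::ok) return false;

    narrow.assign(&buffer[0], to_next);
    return true;
}
```

The narrow result could then be handed to the existing open functions.
The open question remains which locale to use, since the name has to be
converted the same way every time the file is looked up.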

(Note that I'm in favor of allowing wchar_t filenames.  But I can see
a bit of work being necessary before any such proposal is ready for
the standard.)


---

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Sun, 29 Oct 2000 02:58:25 GMT

The following does not work
  std::string file_name = "my_file";
  std::ifstream my_ifs(file_name);
It seems one has to write
  std::ifstream my_ifs(file_name.c_str());

Isn't that strange, that in the C++ standard, one must revert from the new
std::string class to use the old C-strings? (Breaking the C++ string
paradigm.)
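
For illustration, the c_str() call can at least be hidden in one place.
This is only a sketch; the name `open_input` is made up and is not a
standard function.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not part of the standard library): opens `stream`
// for reading from the file named by a std::string, hiding the c_str()
// call. Returns true if the file was opened successfully.
inline bool open_input(std::ifstream& stream, const std::string& file_name)
{
    stream.open(file_name.c_str());
    return stream.is_open();
}
```

The stream is passed by reference because stream objects cannot be
copied, so the helper cannot simply return one by value.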

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---

Author: James Dennett <james@evtechnology.com>
Date: Sun, 29 Oct 2000 17:29:43 GMT

Hans Aberg wrote:

> The following does not work
>   std::string file_name = "my_file";
>   std::ifstream my_ifs(file_name);
> It seems one has to write
>   std::ifstream my_ifs(file_name.c_str());
>
> Isn't that strange, that in the C++ standard, one must revert from the new
> std::string class to use the old C-strings? (Breaking the C++ string
> paradigm.)
>

This subject has been covered a number of times here and on
comp.lang.c++.moderated.

Yes, it's a little strange.  It's mostly a result of the way the various
pieces of the C++ Standard have evolved and been pulled together.

Some have said that it helps to reduce coupling, but that appears
to be a somewhat bankrupt argument.

I don't think that this counts as a defect (in the ISO sense) with the
current Standard, but it would be nice to see it fixed next time around
by adding an explicit constructor from std::string to ifstream.  There
are issues to consider (such as the fact that std::string is a typedef for
std::basic_string<char> -- should we allow other std::basic_string
types?) but I'm sure we can do better than forcing people to use a
raw char pointer.
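
As a sketch of one possible answer to the basic_string question, a
free-standing helper could accept any basic_string whose character type
is char and reject the rest at compile time. The name `open_input_any` is
hypothetical; this is not a proposal text, just an illustration.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: accepts any std::basic_string whose character type
// is char, whatever its traits or allocator. A std::wstring does not
// match this template, because open() only takes a narrow name.
template <class Traits, class Alloc>
bool open_input_any(std::ifstream& stream,
                    const std::basic_string<char, Traits, Alloc>& file_name)
{
    stream.open(file_name.c_str());
    return stream.is_open();
}
```

Because the parameter is a template, a string literal will not convert
implicitly; the caller has to pass an actual string object. For what it
is worth, the 2011 revision of the standard did add std::string overloads
to the file stream constructors and to open().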

-- James Dennett <jdennett@acm.org>

---

Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Mon, 30 Oct 2000 14:45:50 GMT

In article <39FC7272.8C04990E@evtechnology.com>, James Dennett
<james@evtechnology.com> wrote:
>> Isn't that strange, that in the C++ standard, one must revert from the new
>> std::string class to use the old C-strings [when opening fstreams]?
>> (Breaking the C++ string paradigm.)
...
>Yes, it's a little strange.  It's mostly a result of the way the various
>pieces of the C++ Standard have evolved and been pulled together.
...
>I don't think that this counts as a defect (in the ISO sense) with the
>current Standard, but it would be nice to see it fixed next time around
>by adding an explicit constructor from std::string to ifstream.  There
>are issues to consider (such as the fact that std::string is a typedef for
>std::basic_string<char> -- should we allow other std::basic_string
>types?) but I'm sure we can do better than forcing people to use a
>raw char pointer.

C++ has been described as a multi-paradigm language. One would expect
though that there is a new set of C++ paradigms, like "string" and
"stream" classes without having to pass over the "compatibility with C"
paradigm.

Another problem is that Unicode is now coming along, and one would
expect filenames to be able to use Unicode wide characters.

  Hans Aberg      * Anti-spam: remove "remove." from email address.
                  * Email: Hans Aberg <remove.haberg@member.ams.org>
                  * Home Page: <http://www.matematik.su.se/~haberg/>
                  * AMS member listing: <http://www.ams.org/cml/>

---