Topic: [C++0x] Type system, binary data support, etc
Author: "Al Grant" <tnarga@arm.REVERSE-NAME.com>
Date: Wed, 30 May 2001 10:43:43 GMT
"James Kuyper Jr." <kuyper@wizard.net> wrote in message
news:3B13A89A.6432757B@wizard.net...
> I'm just suggesting an intermediate work-around; if it were possible to
> remove the distinctions between user-defined and built-in types, I'd be
> all in favor of it, but I think that would require a major re-design.
> Most of those distinctions were there for backward compatibility with C;
> and by now there's also a huge body of C++ code dependent on them.
The two distinctions I am thinking of are the inability to
have structs with constructors in unions - even when the
no-parameters constructor has the do-nothing semantics it
has for built-in types - and the fact that 'volatile'
structs have assignment semantics that differ from those of
built-in types, or even of C structs - i.e. C programs with
volatile struct assignment may not compile as C++!
> > But it is already possible to use a traits type:
> >
> > _int<32>::TheType k;
> >
> > Would it be sufficient to have a new syntactic sugar
> > that allowed you to omit the 'TheType'? That would be
> > a much more general new feature.
>
> I'm not at all sure how you envision this feature. I doubt its general
> usefulness. Could you please describe in more detail?
It would just be a way of saying "when this struct
appears, its value is not a struct type, but the value
of one of its members". For example in
template<int nBits> struct _int {
    auto typedef
        COND<(nBits > 8*sizeof(short)), int, short>::TheType
        TheType;
};
If you then said
_int<16> n;
the template would designate, not an empty struct, but
whatever was designated by its (unique) 'auto' member,
in this case an integral type. In other cases you might
have the 'auto' member be a constant value. (The COND
template above could also use it for its 'result' type.)
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: "Radoslav Getov" <nospam@mai.com>
Date: Wed, 30 May 2001 16:34:46 GMT
"Niklas Matthies" <news/comp.std.c++@nmhq.net> wrote in message
news:slrn9gnhmv.6qs.news/comp.std.c++@ns.nmhq.net...
: Integer types larger than 64 bits will only be necessary for specialized
: applications (when you need to count more than 18 quintillion things).
: While a generic means for specifying N-bit integer types is certainly
: something to be favored, it's not very likely that there will be a
: shortage of convenient names for fundamental types.
There are domains in which, no matter how fast and wide computers become,
that will not be sufficient. Cryptography, which everybody uses, is one
such domain.
The assumption that integral types are used only or mainly for storing
counts is not justified. Some people use them for calculations as well.
This reminds me of some folks who were sure (many years ago) that a 6 MHz
CPU and a 1.44 MB diskette were more than *anybody* would *ever* need.
Radoslav Getov
---
Author: "Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl>
Date: Wed, 30 May 2001 18:20:29 GMT
Tue, 29 May 2001 17:22:24 GMT, James Kuyper Jr. <kuyper@wizard.net> writes:
>> _int<32>::TheType k;
>>
>> Would it be sufficient to have a new syntactic sugar
>> that allowed you to omit the 'TheType'? That would be
>> a much more general new feature.
>
> I'm not at all sure how you envision this feature. I doubt its
> general usefulness. Could you please describe in more detail?
Template typedefs with specialization of course.
template<int n> class _GenericInt
{
    // Generic but possibly slow implementation.
};
template<int n> typedef _GenericInt<n> _int;
template<> typedef int16_t _int<16>;
template<> typedef int32_t _int<32>;
--
__("< Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^ SYGNATURA ZASTĘPCZA
QRCZAK
---
Author: "Al Grant" <tnarga@arm.REVERSE-NAME.com>
Date: Tue, 29 May 2001 12:43:17 GMT
"James Kuyper Jr." <kuyper@wizard.net> wrote in message
news:3B0E5AEF.8D8BC53B@wizard.net...
> Al Grant wrote:
> > "James Kuyper Jr." <kuyper@wizard.net> wrote in message
> > news:3B0DDA1B.E9C6D@wizard.net...
> > > int<32> k; // Implies, correctly, that this could be a template.
> >
> > A template what? If you used a template struct or class then
> > it couldn't be used the same way as a real int. It is not
>
> You're correct; there are a few important ways in which structs cannot
> precisely emulate built-in types. However, they can come very close.
> What I was trying to imply in a very brief comment is that you can
> emulate this right now, modulo those problems
Yes, my point is that every time things like this are
discussed, these problems are modulo'd away. Why doesn't
someone fix the problems, then C++ would have a much
more powerful mechanism for wrapping the built-in types.
Sized ints would then be easy to implement this way.
> implementation-specific template named something like _int<>. If it were
> standardized, it could be defined as borrowing many of the features of
> template struct, while lacking the defects.
But it is already possible to use a traits type:
_int<32>::TheType k;
Would it be sufficient to have a new syntactic sugar
that allowed you to omit the 'TheType'? That would be
a much more general new feature.
---
Author: "Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl>
Date: Tue, 29 May 2001 17:19:30 GMT
Thu, 24 May 2001 19:46:51 GMT, Chris Newton <chrisnewton@no.junk.please.btinternet.com> writes:
> My point was that using multiple read statements (as many suggested in
> response to my original post) is not in any way guaranteed not to go an=
d
> perform multiple reads, which may be very inefficient.
I would read data into a buffer of unsigned chars and extract integers
with specific sizes and endiannesses into a C++ struct by bit shifts.
There is no need to extend the core language here; doing so would only
create the illusion that binary reading of a bunch of shorts and ints is
portable.
--
__("< Marcin Kowalczyk * qrczak@knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^ SYGNATURA ZASTĘPCZA
QRCZAK
---
Author: "glancaster" <glancaster@ntlworld.com>
Date: Tue, 29 May 2001 17:21:04 GMT
Garry Lancaster:
> > I would write something like this:
> >
> > // Library code.
> > template <typename T>
> > void ReadFromFile(std::ifstream& stm, T* dst)
> > {
> > char buf[ sizeof(T) ];
> >
> > stm.read( buf, sizeof(T) );
> > std::memcpy( dst, buf, sizeof(T) );
> > }
> >
> > // Code for reading a particular structure.
> > void ReadHeader(std::ifstream& file)
> > {
> > Header dst;
> >
> > ReadFromFile( file, &dst.m_signature );
> > ReadFromFile( file, &dst.m_a );
> > ReadFromFile( file, &dst.m_b );
> > }
> >
> > Note I don't use this mysterious File type you made use of in
> > your example.
Chris Newton:
> My problem with the above is still that it's awkward. Have you ever seen
> anyone write anything like that code in real life? I haven't.
If you mean member by member binary I/O, then, yes, I have seen it and,
indeed, used it.
> Most seriously, you have information about the Header struct in more
> than one place; every time you change it, you need to update ReadHeader
> (and presumably WriteHeader, etc.) as well. That's a readability and
> maintenance issue, and a potential source of bugs.
Agreed. However, although for your simple Header struct it worked OK, often
the struct-layout information does not provide sufficient information
anyway, as I explain below.
> You're also introducing the somewhat arbitrary overhead of copying
> around memory for every read, just to add a degree of type safety. I
> accept that in most environments, the cost of that copy will be
> insignificant, but I still don't like the idea that we're introducing
> overhead for no particularly good reason.
The total number of bytes copied is exactly the same as required in your
scheme to copy from packed to non-packed. There is nothing arbitrary about
it - the number of bytes copied is always exactly equal to the number of
bytes read.
If you rewrite ReadFromFile as
template <typename T>
inline void ReadFromFile(std::ifstream& stm, T* dst)
{
    stm.read( reinterpret_cast<char*>( dst ), sizeof( T ) );
}
the copying is entirely eliminated (unlike in your example). I believe this
is safe for the int16 and int32 types. However, I wanted to follow your
non-cast edict. Incidentally, you never showed us how you implemented the
mysterious File::read function without a cast or extra copy.
> I accept that the challenge I set can be achieved in standard C++, as we
> all knew already.
If you knew that already I wonder why in a previous post you wrote "At the
moment, C++ provides *ZERO* support for doing this [binary file I/O] in
*ANY* portable [within hardware families] way, AFAIK"?
> However, I think you have made my point for me. Doing
> it in a portable way is so cumbersome that most people just write their
> #pragma pack(1)
> or whatever before their structs, and use their CFile, FileReader or
> some such class to do the I/O. They do that because it's easier, clearer
> and often more efficient. Unfortunately, in the process, they write
> highly non-portable code -- not that they realise until later, of
> course, when suddenly all hell breaks loose as they try to move to a new
> platform and everything breaks.
Anyone who doesn't realise these things are non-portable is rather naive
with respect to standard C++. I think people actually use them when they are
just not interested in portable-just-within-hardware-families binary file
I/O. They either want greater portability, for which packed is insufficient,
or don't need any type of portability outside their chosen compiler or
framework at all.
> All in all, I think I stand by my original argument that standardising
> such functionality would be widely beneficial.
No-one is arguing that packed would not occasionally lead to less source
code for binary file I/O. This is not an especially strong claim. However,
packed cannot deal with the following cases:
1. When the struct is a non-POD type.
2. When a member is a pointer (OK for some temporary files).
3. When a member is a bit-field.
4. When you do not want one or more members to be persisted.
5. When you wish to have different in-memory and on-disk representations for
the top-level struct or any member.
Items 1 to 3 restrict the kind of structs for which packed helps binary I/O
to a less-than-C subset of C++ (C--?). Overall these restrictions mean that
the improvements to binary I/O offered by packed are quite limited, whereas
the existing approach offers immense flexibility to overcome *all* these
hurdles.
Furthermore, it is not sufficient for a proposed new feature to have
advantages - to be acceptable its advantages must outweigh its disadvantages
both to programmers and implementors. There is a word for software that
contains every feature someone once thought was cool: bloatware.
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
---
Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Tue, 29 May 2001 17:22:24 GMT
Al Grant wrote:
"James Kuyper Jr." <kuyper@wizard.net> wrote in message
...
> Yes, my point is that every time things like this are
> discussed, these problems are modulo'd away. Why doesn't
> someone fix the problems, then C++ would have a much
> more powerful mechanism for wrapping the built-in types.
> Sized ints would then be easy to implement this way.
I'm just suggesting an intermediate work-around; if it were possible to
remove the distinctions between user-defined and built-in types, I'd be
all in favor of it, but I think that would require a major re-design.
Most of those distinctions were there for backward compatibility with C;
and by now there's also a huge body of C++ code dependent on them.
> > implementation-specific template named something like _int<>. If it were
> > standardized, it could be defined as borrowing many of the features of
> > template struct, while lacking the defects.
>
> But it is already possible to use a traits type:
>
> _int<32>::TheType k;
>
> Would it be sufficient to have a new syntactic sugar
> that allowed you to omit the 'TheType'? That would be
> a much more general new feature.
I'm not at all sure how you envision this feature. I doubt its general
usefulness. Could you please describe in more detail?
---
Author: "Chris Newton" <chrisnewton@no.junk.please.btinternet.com>
Date: Mon, 28 May 2001 23:00:25 GMT
glancaster <glancaster@ntlworld.com> wrote...
> Chris Newton:
> > My point was that using multiple read statements (as many
> > suggested in response to my original post) is not in any way
> > guaranteed not to go and perform multiple reads, ...
>
> Neither is a single read. You are completely abstracted from
> the underlying file I/O: once by the stream buffer, probably
> once by the operating system, and probably once by the hard
> drive's own hardware cache. That's what James and others
> have been explaining.
Of course. And as I've been trying to explain in return, that makes
using multiple read statements when you only want to read one composite
thing a sub-optimal solution. The abstractions remove some degree of
control from the programmer, and as a consequence, the programmer can do
best by making his intent as clear as possible. In particular, not all
compilers, libraries, OSes and/or hardware platforms provide good
buffering, and in cases where they don't, multiple read statements
*will* be a major performance hit. On systems where the caching is
good, a single statement is "only" more readable and perhaps marginally
more efficient, but certainly no worse.
> Have you any proof that this micro-optimization yields
> significantly faster I/O? You seem very convinced.
In an application that reads many formatted structures, possibly from
many files, it can make a huge difference. Granted that's not usually
the case, but why impose the need to "lie" to C++ about your intent and
potentially damage those cases?
> Have you not seen my previous post? I provided a completely
> standard-compliant implementation, apart from the fixed-size
> types you wanted to use. And you're right: it wasn't hard.
> Furthermore, even if I wasn't allowed to use your types, it
> still wouldn't be hard.
Apologies; my ISP seems to be dropping the odd post in this thread from
its news server, so I hadn't seen it. I've since looked it up on Google,
however.
[The following quotations are from Garry's other post...]
> > 2. Different input functions to read every type in the known
> > universe
>
> To get around this requirement in your code you appear to
> introduce a read member function of an unknown type called
> File. It looks like the first parameter is a void* and the second
> std::size_t.
>
> I prefer something like:
>
> template <typename T>
> void File::read(T* data); // uses sizeof(T) internally.
>
> This avoids the possibility of someone passing the wrong size.
It might be nice to provide such a templated version on safety grounds.
I elected to use a void* version for illustration, simply because you
can build safety on top of efficiency, but not vice versa here.
(I'm not really trying to define the "random-access binary file I/O"
protocol I advocated in my original post here, just using a simple
example for illustration purposes.)
> But then, as each different template function instantiation is
> strictly a new function for each type, this interface, which is
> safer than your original, violates your rule. Therefore it is a
> good thing to break this rule.
I have no problem with the template version. My objection is to having
to write a new function every time I want to read/write a new type in
binary form. That's a maintenance headache and a recipe for bugs.
> I would write something like this:
>
> // Library code.
> template <typename T>
> void ReadFromFile(std::ifstream& stm, T* dst)
> {
> char buf[ sizeof(T) ];
>
> stm.read( buf, sizeof(T) );
> std::memcpy( dst, buf, sizeof(T) );
> }
>
> // Code for reading a particular structure.
> void ReadHeader(std::ifstream& file)
> {
> Header dst;
>
> ReadFromFile( file, &dst.m_signature );
> ReadFromFile( file, &dst.m_a );
> ReadFromFile( file, &dst.m_b );
> }
>
> Note I don't use this mysterious File type you made use of in
> your example.
My problem with the above is still that it's awkward. Have you ever seen
anyone write anything like that code in real life? I haven't.
Most seriously, you have information about the Header struct in more
than one place; every time you change it, you need to update ReadHeader
(and presumably WriteHeader, etc.) as well. That's a readability and
maintenance issue, and a potential source of bugs.
You're also introducing the somewhat arbitrary overhead of copying
around memory for every read, just to add a degree of type safety. I
accept that in most environments, the cost of that copy will be
insignificant, but I still don't like the idea that we're introducing
overhead for no particularly good reason.
I accept that the challenge I set can be achieved in standard C++, as we
all knew already. However, I think you have made my point for me. Doing
it in a portable way is so cumbersome that most people just write their
#pragma pack(1)
or whatever before their structs, and use their CFile, FileReader or
some such class to do the I/O. They do that because it's easier, clearer
and often more efficient. Unfortunately, in the process, they write
highly non-portable code -- not that they realise until later, of
course, when suddenly all hell breaks loose as they try to move to a new
platform and everything breaks.
All in all, I think I stand by my original argument that standardising
such functionality would be widely beneficial.
Regards,
Chris
---
Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Fri, 25 May 2001 04:55:42 GMT
Phil Edwards wrote:
...
> Maybe allow the arithmetic tokens to appear in types?
>
> short^4 int x = 0; // x might be 65Kbits
> int^4 x = 0; // if sizeof(int)==32 then sizeof x == 1Mbits
> int*4 x = 0; // "x is four times as large as an int, however
> // big that happens to be"
>
> Maybe only the last one makes sense. On the other hand, in ten years the
> concept of allocating a million bits on the stack for an integer might seem
> perfectly normal. Maybe you're calculating Chairman Gates' net worth.
>
> Don't ask me about keyboards/charsets which don't support the ^ character.
> I haven't figured that one yet. :-)
There are lots of better options. For one thing, sizes specified in bits
rather than bytes are more flexible, and are a better match to what most
people want. Next, there are at least three different ways of doing it
that each have some advantage over the ones you suggested:
int32_t i; // Compatible with C99
int:32 j; // Modelled on bit-field syntax.
int<32> k; // Implies, correctly, that this could be a template.
---
Author: "Al Grant" <tnarga@arm.REVERSE-NAME.com>
Date: Fri, 25 May 2001 10:43:35 GMT
"James Kuyper Jr." <kuyper@wizard.net> wrote in message
news:3B0DDA1B.E9C6D@wizard.net...
> int<32> k; // Implies, correctly, that this could be a template.
A template what? If you used a template struct or class then
it couldn't be used the same way as a real int. It is not
possible to do both
    int<32> k = 1;
and
    union { int<32> n; };
There are also problems with 'volatile'. Fixing these
problems would make things like this a lot easier.
You'd still be left not being able to use the type for
bitfields though.
---
Author: "glancaster" <glancaster@ntlworld.com>
Date: Fri, 25 May 2001 11:12:57 GMT
Chris Newton:
> > > I will forbid the use of the following in your answer.
> > >
> > > 1. More than one disk access to read the data from the file
James Kuyper Jr.:
> > OK - so you've already prohibited anything even faintly
> > resembling C++; C++ defines only buffered I/O, it provides
> > no primitives for bypassing the buffering and directly
> > controlling how many disk accesses are made.
Chris Newton:
> My point was that using multiple read statements (as many suggested in
> response to my original post) is not in any way guaranteed not to go and
> perform multiple reads, ...
Neither is a single read. You are completely abstracted from the underlying
file I/O: once by the stream buffer, probably once by the operating system,
and probably once by the hard drive's own hardware cache. That's what James
and others have been explaining.
> which may be very inefficient.
> My proposals allow you to tell the language exactly what you want. It
> can buffer it however it likes; at least it has all the information to
> make an informed choice, which is more than it does reading a variable
> at a time.
Have you any proof that this micro-optimization yields significantly faster
I/O? You seem very convinced.
> If you don't like the condition, or the way I phrased it, please ignore
> it. I'm still waiting for anyone to show me anything approximating an
> answer to my challenge. Come on, it wasn't hard. I can do it in two
> minutes using any number of third party libraries and
> compiler-specifics. Can't the standard keep up? :-)
Have you not seen my previous post? I provided a completely
standard-compliant implementation, apart from the fixed-size types you
wanted to use. And you're right: it wasn't hard. Furthermore, even if I
wasn't allowed to use your types, it still wouldn't be hard.
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
---
Author: "Paul Mensonides" <pmenso57@home.com>
Date: Fri, 25 May 2001 20:08:45 GMT
"James Kuyper Jr." <kuyper@wizard.net> wrote in message
news:3B0DDA1B.E9C6D@wizard.net...
| Phil Edwards wrote:
| ...
| > Maybe allow the arithmetic tokens to appear in types?
| >
| > short^4 int x = 0; // x might be 65Kbits
| > int^4 x = 0; // if sizeof(int)==32 then sizeof x == 1Mbits
| > int*4 x = 0; // "x is four times as large as an int, however
| > // big that happens to be"
| >
| > Maybe only the last one makes sense. On the other hand, in ten years the
| > concept of allocating a million bits on the stack for an integer might seem
| > perfectly normal. Maybe you're calculating Chairman Gates' net worth.
| >
| > Don't ask me about keyboards/charsets which don't support the ^ character.
| > I haven't figured that one yet. :-)
|
| There are lots of better options. For one thing, sizes specified in bits
| rather than bytes are more flexible, and are a better match to what most
| people want. Next, there are at least three different ways of doing it
| that each have some advantage over the ones you suggested:
|
| int32_t i; // Compatible with C99
| int:32 j; // Modelled on bit-field syntax.
| int<32> k; // Implies, correctly, that this could be a template.
You would also want some mechanism to specify "different from int". A lot of
software is written assuming "int" is the natural size and the fastest,
so the source code scales with the hardware. However, for obvious
reasons, requiring a character type to be stored in 128 bits is unreasonable,
but having integer arithmetic that can reach that range isn't, nor are heavy
floating-point calculations. Of course, it should be implementation-defined
whether or not a given bit size is supported (and it shouldn't be "rounded up"
to the nearest size that fits; it should be a compile-time error if it isn't
supported directly).
Paul Mensonides
---
Author: "glancaster" <glancaster@ntlworld.com>
Date: Fri, 25 May 2001 20:10:51 GMT
Stephen Howe:
> > Have I missed something?
> > What is wrong with fstream.read() & fstream.write()?
Chris Newton:
> In no particular order...
>
> First off, they don't belong there. The IOstreams classes' principal
> purpose in life is to represent *streams*.
And the read and write functions read and write to a stream, so what's the
problem?
> The IOstreams library is full
> of obscurity at the best of times. They have cryptically named methods.
'read' and 'write' do exactly what they say on the tin. So do most of the
others. It is impossible to have method names that everyone will think are
perfect.
> They have obscure error handling protocols.
I don't find them obscure. In general exceptions would be too inconvenient
for common I/O failure conditions, but streams can be configured to use them
if you wish.
> Extending them is so
> absurdly complicated that people have written whole books on the
> subject, which few professional programmers ever read IME.
Whole books certainly exist for those who are really interested e.g.
implementors. Most programmers are only interested in using them (a lot) and
extending them (a little). For this purpose I find chapter 13 of Josuttis is
more than adequate.
> Secondly, these methods aren't exactly the height of convenient
> extensibility.
The read and write methods are extremely easy to extend, by wrapping them,
just as you can do for any accessible function.
> Look at the hoops I have to jump through, just to get a
> memory mapped file to look like a disk file.
We will have no idea exactly which hoops you jumped through unless you
describe them.
> What we need is a clear
> interface, with a small number of obviously named methods, for binary
> I/O.
This is exactly what we have at the moment with the binary I/O subset of the
file stream interfaces.
> It would be useful to provide common implementations supporting
> files, perhaps memory-mapped I/O, and the like.
Memory-mapped I/O would not be a fully portable facility, but could
conceivably be offered as an optional part of the standard.
> Thirdly, the methods are fundamentally unsafe under common
> circumstances. See below.
They are low-level, but if used correctly are perfectly safe. If you prefer
a higher level interface to stop yourself or your colleagues making common
programming errors, you are free to wrap them in your own classes or
functions. A standard higher level binary file stream would be nice to have,
but AFAIK no-one offered one to the standardization committee.
> > They read/write binary data.
>
> That's about the only good thing going for them, yep.
And this is the only thing they *should* do.
[snip]
> > Granted you need a reinterpret_cast for types other than
> > char but this mirrors the functions fread(), fwrite() for stdio.
Not strictly true - you could use memcpy instead to copy your type to a char
buffer - see my earlier post. Of course, memcpy is likely to use
reinterpret_cast or its equivalent internally. (And BTW, like memcpy, fread
and fwrite have a void* interface, so no cast is normally needed to call them.)
> As soon as you start casting around from char buffers in the world of
> binary I/O, you introduce an enormous risk of errors. It will happen. It
> does happen!
The conversion of a value to one or more chars is inevitable in order to
write it to a char-based stream. The only question is: where does it happen?
As the existing facilities are low-level, you are expected to do it
yourself.
If you believe that a binary stream should be anything other than
char-based, you should explain exactly what you are suggesting.
> Why are we defending the existing facilities instead of
> doing something about it?
Or in your case, why are you *criticizing* the existing facilities instead
of doing something about it?
IMHO there's nothing intrinsically wrong with the existing facilities.
> Let's use this opportunity to bring a useful,
> sane extension to the library.
You are free to do so.
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
---
Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Fri, 25 May 2001 20:11:48 GMT
Al Grant wrote:
>
> "James Kuyper Jr." <kuyper@wizard.net> wrote in message
> news:3B0DDA1B.E9C6D@wizard.net...
> > int<32> k; // Implies, correctly, that this could be a template.
>
> A template what? If you used a template struct or class then
> it couldn't be used the same way as a real int. It is not
You're correct; there are a few important ways in which structs cannot
precisely emulate built-in types. However, they can come very close.
What I was trying to imply in a very brief comment is that you can
emulate this right now, modulo those problems, with an
implementation-specific template named something like _int<>. If it were
standardized, it could be defined as borrowing many of the features of
template struct, while lacking the defects. My model for this is
static_cast<>, which syntactically looks a lot like a template function.
static_cast<> could, as a stop-gap measure, have been implemented as one
by the user, if they were using a non-conforming (or pre-conforming)
implementation that supported function templates but did not have
static_cast<> yet.
...
> You'd still be left not being able to use the type for
> bitfields though.
That's true only with respect to my suggestion that it could be
implemented as a template right now, without a change to the standard.
If and when it were properly standardized, one of the things that would
have to be done is to figure out how size-named types (regardless of the
syntax used to name them) and bit-fields should interact. There's some
very tricky issues there, but I doubt that they're unresolvable.
---
Author: "Chris Newton" <chrisnewton@no.junk.please.btinternet.com>
Date: Thu, 24 May 2001 19:10:02 GMT Raw View
Niklas Matthies <news/comp.std.c++@nmhq.net> wrote...
> Me, and I'm certainly not alone. As someone else already
> explained in this thread, the main purpose is I/O of binary
> formats that have a non-padded (or at least implementation-
> independently padded) and sometimes misaligned structure,
and to safely and portably read and write to the fields of
> such structures.
Absolutely. Let's not forget that, please!
While my original post did rather hint at certain ideas I've been
thinking about, I welcome debate if it leads to a better solution. Let's
just remember that it's a simple, but common, problem that we set out to
solve, and not try to fix the whole universe in the process! :-)
Niklas Matthies <news/comp.std.c++@nmhq.net> wrote...
> > > This proposed extension allows them to let the compiler
> > > choose the most efficient way to do this, and to not clutter
> > > the source code with these operations.
glancaster <glancaster@ntlworld.com> wrote:
> > And by hiding what is actually happening, increasing the
> > likelihood that people will use this inefficient technique,
> > even when they don't actually have to.
You can write your functions taking a C instead of a const C&, too, if
you want. Smart programmers don't.
> I don't really buy your argument; people who decide to
> use packed structures without being aware that they trade
> off space for time will get what they deserve. This is similar
> to, say, using bitfields where they aren't appropriate.
Exactly.
[Now discussing one possibility for using the "packed" keyword...]
> > This is a can of worms. Avoid it. Ban non-POD types from
> > packed structs (I think the OP may have suggested this.)
>
> I agree that this would be a reasonable trade-off.
Yes, I did suggest that, and for precisely the reasons you two have been
considering.
Now, while we're discussing methods on packed structs, and what a
"packed struct" actually is...
First off, let me suggest a simple relationship between packed and
unpacked structures. They are different types, period. As far as
definitions and the "packed" keyword go, I suggest that
struct A
{
    int item;
    char another;
};
be used as usual, and "packed A" be an independent type, which happens
to have the same members with the same names. This allows for convenient
typedefs, etc. We can just make "packed A" illegal code if A doesn't
meet our requirements for a packed structure.
BTW, I would not allow any conversion between an A* and a packed A*, and
in similar cases. I think they are different types, end of story.
I don't see any problem with allowing methods per se, just polymorphism.
In a private correspondence off the group, we've been discussing the
idea of compiler-generated methods, similar to the automatically
generated constructors, etc. In particular, it might be very useful to
have compiler-generated memberwise "packing" and "unpacking" c-tors
and op=s. That is,
A fred = {1, 'A'};
packed A bill = fred;
should simply initialise bill's members by performing the usual sort of
memberwise action. These can be used to simplify and legitimise
conversions where they are intended.
Since the compiler writer would have to generate code to handle any odd
alignments, etc., anyway, this is a very low effort requirement. It
does, however, make it much easier to read data into a packed struct and
then immediately move it to an unpacked struct for efficiency, for
example.
I'll leave you to discuss the other issues in this subthread separately;
personally, I think talk of new casts, conversions between references to
A and packed A, yada yada is massively overcomplicating a simple issue.
:-)
Regards,
Chris
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: news/comp.std.c++@nmhq.net (Niklas Matthies)
Date: Thu, 24 May 2001 19:40:58 GMT Raw View
On Thu, 24 May 2001 12:42:28 GMT, Eugene Karpachov <jk@steel.orel.ru> wrote:
> Wed, 23 May 2001 18:29:23 GMT Niklas Matthies wrote:
> >Integer types larger than 64 bits will only be necessary for specialized
> >applications (when you need to count more than 18 quintillion things).
>
> What if someone wants a fast bitvector implementation? Or to implement
> a set with no more than 100 elements?
Certainly you are aware of the possibility of using arrays.
I'm not sure what your point is. The posting I was replying to implied
that the size of native types available on architectures will continue
to double every N years as it has up to now. I was pointing out that
beyond a certain number of bits, the general usefulness of such integer
types decreases, and hence it's unlikely that native types provided by
architectures will infinitely continue to grow, even if technology would
allow it. Wider data paths are more efficiently put to use for parallel
processing, for example.
-- Niklas Matthies
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: "Chris Newton" <chrisnewton@no.junk.please.btinternet.com>
Date: Thu, 24 May 2001 19:46:51 GMT Raw View
James Kuyper Jr. <kuyper@wizard.net> wrote...
> Chris Newton wrote:
> ...
> > Please tell me how you are going to get the above into
> > a meaningful "Header" struct using only currently
> > standard C++. To make things reasonably realistic, I
> > will forbid the use of the following in your answer.
> >
> > 1. More than one disk access to read the data from the file
>
> OK - so you've already prohibited anything even faintly
> resembling C++; C++ defines only buffered I/O, it provides
> no primitives for bypassing the buffering and directly
> controlling how many disk accesses are made.
My point was that using multiple read statements (as many suggested in
response to my original post) carries no guarantee against multiple disk
accesses, which may be very inefficient.
My proposals allow you to tell the language exactly what you want. It
can buffer it however it likes; at least it has all the information to
make an informed choice, which is more than it does reading a variable
at a time.
If you don't like the condition, or the way I phrased it, please ignore
it. I'm still waiting for anyone to show me anything approximating an
answer to my challenge. Come on, it wasn't hard. I can do it in two
minutes using any number of third party libraries and
compiler-specifics. Can't the standard keep up? :-)
Regards,
Chris
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: "Chris Newton" <chrisnewton@no.junk.please.btinternet.com>
Date: Thu, 24 May 2001 19:46:16 GMT Raw View
Stephen Howe <SPAMstephen.howeGUARD@tnsofres.com> wrote...
> > 3. Random access binary file I/O
> >
> > Speaking of reading and writing binary files, I think
> > it's long past time C++ acknowledged the existence
> > of structured files as well as streamed I/O. A simple
> > framework allowing random-access file I/O is well
> > overdue.
>
> Have I missed something?
> What is wrong with fstream.read() & fstream.write()?
In no particular order...
First off, they don't belong there. The IOstreams classes' principal
purpose in life is to represent *streams*. The IOstreams library is full
of obscurity at the best of times. They have cryptically named methods.
They have obscure error handling protocols. Extending them is so
absurdly complicated that people have written whole books on the
subject, which few professional programmers ever read IME.
Secondly, these methods aren't exactly the height of convenient
extensibility. Look at the hoops I have to jump through, just to get a
memory mapped file to look like a disk file. What we need is a clear
interface, with a small number of obviously named methods, for binary
I/O. It would be useful to provide common implementations supporting
files, perhaps memory-mapped I/O, and the like.
Thirdly, the methods are fundamentally unsafe under common
circumstances. See below.
> They read/write binary data.
That's about the only good thing going for them, yep. Of course, they
only do so if that data happens to be stored in an array of char, as you
point out yourself:
> Granted you need a reinterpret_cast for types other than
> char but this mirrors the functions fread(), fwrite() for stdio.
As soon as you start casting around from char buffers in the world of
binary I/O, you introduce an enormous risk of errors. It will happen. It
does happen! Why are we defending the existing facilities instead of
doing something about it? Let's use this opportunity to bring a useful,
sane extension to the library.
Regards,
Chris
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: "Paul Mensonides" <pmenso57@home.com>
Date: Thu, 24 May 2001 23:51:54 GMT Raw View
"Niklas Matthies" <news/comp.std.c++@nmhq.net> wrote in message
news:slrn9gq3cs.4aa.news/comp.std.c++@ns.nmhq.net...
> On Thu, 24 May 2001 12:42:28 GMT, Eugene Karpachov <jk@steel.orel.ru> wrote:
> > Wed, 23 May 2001 18:29:23 GMT Niklas Matthies wrote:
> > >Integer types larger than 64 bits will only be necessary for specialized
> > >applications (when you need to count more than 18 quintillion things).
> >
> > What if someone wants a fast bitvector implementation? Or to implement
> > a set with no more than 100 elements?
>
> Certainly you are aware of the possibility of using arrays.
> I'm not sure what your point is. The posting I was replying to implied
> that the size of native types available on architectures will continue
> to double every N years as it has up to now. I was pointing out that
> beyond a certain number of bits, the general usefulness of such integer
> types decreases, and hence it's unlikely that native types provided by
> architectures will infinitely continue to grow, even if technology would
> allow it. Wider data paths are more efficiently put to use for parallel
> processing, for example.
> -- Niklas Matthies
The size of the instruction sets will grow, increasing the speed of the
processor indirectly. Likewise, 128-bit integers are used on many systems for
unique identifiers. They have to be hacked on the typical PC, of course. But
to say that larger sizes won't be used because they are impractical is a big
assumption.
signed char x = -1, y = 10;   // 8-bit on my impl.
signed char z = x + y;
// becomes, conceptually:
signed char z = demote_to_char_and_store(
    promote_to_128_bit_int(x) + promote_to_128_bit_int(y));
How about memory addresses on a super-computer?
On my implementation I have:
char = 8
short = 16
int = 32
long = 32
__int64 = 64 // they should have used *long*
GUID = 128 // hacked 128-bit integer (the 64-bit one is hack also).
So, say, on a 128-bit machine, all these sizes must still be supported, so you
might get (since "int" is usually the native type):
short char = 8
char = 16
wchar_t = 32
short short = 32
short = 64
int = 128
long = 256
long long = ??
This undoubtedly *will* happen whether it is practical or not. Also, consider
floating-point storage on such a machine, where floating-point numbers are
calculated at, say, 240 bits, which 3D game programmers will love and which
*will* be used in mission-critical places. The point is that it is not just
*range* but *accuracy*. Also, I was not saying this will happen any time soon.
I'm just looking at the future in general, and "long long" is *not* a solution
to the problem.
Paul Mensonides
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: Phil Edwards <pedwards@dmapub.dma.org>
Date: Fri, 25 May 2001 00:38:55 GMT Raw View
> We are also going to run out of *usable* keywords as processors get larger and
> larger. Platform extensions might be fine for specialized systems but not for
> mass-market personal computers of all types. Say, in ten years you have the
> Ultra-Tritanium 256-bit processor. I'm not looking forward to writing: short
> short short short int x = 0;
Maybe allow the arithmetic tokens to appear in types?
short^4 int x = 0; // x might be 65Kbits
int^4 x = 0; // if sizeof(int)==32 then sizeof x == 1Mbits
int*4 x = 0; // "x is four times as large as an int, however
// big that happens to be"
Maybe only the last one makes sense. On the other hand, in ten years the
concept of allocating a million bits on the stack for an integer might seem
perfectly normal. Maybe you're calculating Chairman Gates' net worth.
Don't ask me about keyboards/charsets which don't support the ^ character.
I haven't figured that one yet. :-)
Phil
--
pedwards at disaster dot jaj dot com | pme at sources dot redhat dot com
devphil at several other less interesting addresses in various dot domains
The gods do not protect fools. Fools are protected by more capable fools.
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: "glancaster" <glancaster@ntlworld.com>
Date: Wed, 23 May 2001 15:05:00 GMT Raw View
Ross Smith suggested:
> Four new keywords: packed, big_endian, little_endian, and native_endian.
Isn't native_endian redundant? It's just the absence of a big_endian or
little_endian qualification.
Maybe a library solution for endianness would be better than a language one.
I can see class templates big_endian<T> and little_endian<T>.
Implementation is left as an exercise for the reader ;-)
Kind regards
Garry Lancaster
Codemill Ltd
mailto << "glancaster" << at << "codemill" << dot << "net";
Visit our web site at http://www.codemill.net
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Wed, 23 May 2001 15:07:04 GMT Raw View
Paul Mensonides wrote:
...
> We are also going to run out of *usable* keywords as processors get larger and
> larger. Platform extensions might be fine for specialized systems but not for
> mass-market personal computers of all types. Say, in ten years you have the
> Ultra-Tritanium 256-bit processor. I'm not looking forward to writing: short
> short short short int x = 0; We will also have to deal *normally* with 8, 16,
> 32, 64-bit character sets, etc.. I don't like: unsigned short long wchar_t c =
> 'a'; much either!
> In other words, the C99 solution is only a temporary fix.
But that's not the C99 solution to that problem. "long long" is a
syntactic abomination, it doesn't represent the way anyone intends C to
continue evolving. The size-named types are the approach that's intended
to handle this situation, and they're completely scaleable. It's already
quite legal for a C99 implementation to support int256_t,
int_least256_t, and int_fast256_t, if they want to. The only thing the
committee might want to change is to make the least and fast forms
mandatory, as they currently are for 8, 16, 32, and 64 bits.
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: Michael Lee Finney <michael.finney@acm.org>
Date: Wed, 23 May 2001 15:06:42 GMT Raw View
In article <3B09A1FE.FEC234EF@ihug.co.nz>,
ross.s@ihug.co.nz says...
> Example:
>
> packed struct ip_header {
> unsigned<8> version_ihl;
> unsigned<8> tos;
> big_endian unsigned<16> length;
> big_endian unsigned<16> identification;
> big_endian unsigned<16> flags_offset;
> unsigned<8> ttl;
> unsigned<8> protocol;
> big_endian unsigned<16> checksum;
> big_endian unsigned<32> source;
> big_endian unsigned<32> destination;
> };
I would argue that the only irreducible component of
your suggestion is the ability to declare a struct/class
as "packed". Everything else can be done with
templates. For example (quick off the cuff, no
compiler verification):
template <class X> class BigEndian
{
    X value;
public:
    BigEndian();
    BigEndian(X x);
    operator X();
    X operator=(X x);
};
I didn't provide any implementation, but basically
operator X() reads the value, and converts from big
endian format (if necessary, depending on the
execution platform) and operator=() does the opposite.
I used X instead of X const & in places because it is
better suited to the primitive types. You can
separately create typedefs for each of the basic sizes
(given that you are not trying to mix sizes too badly;
otherwise you might need to create some special types
to use as "X", and then use partial specialization to
get what you want). Again, that would depend on your
platform. But, given the above, you could then write
your example as:
packed struct ip_header
{
    unsigned1 version_ihl;
    unsigned1 tos;
    BigEndian<unsigned2> length;
    BigEndian<unsigned2> identification;
    BigEndian<unsigned2> flags_offset;
    unsigned1 ttl;
    unsigned1 protocol;
    BigEndian<unsigned2> checksum;
    BigEndian<unsigned4> source;
    BigEndian<unsigned4> destination;
};
Also, of course, most compilers would align the above
correctly even without the packed keyword -- but that
is not generally true.
Even that could possibly be handled by internally
using an array of the right length (which would be
byte aligned) and then loading/storing fields a byte at
a time. A bit slower, but it always works and the
fields where that is necessary are usually not in the
heavy internal computation part of a program. So,
replace
X value;
with:
unsigned char value[sizeof(X)];
and then modify the access operators appropriately.
There still might be some compilers that just would
not align things correctly (you can always insert pad
bytes yourself, it is unwanted pad bytes that are the
problem) so there is still justification for the
"packed" keyword.
Many compilers have the equivalent as "#pragma
packed" or "#pragma pack(1)" or as a compile line
option. But, it really does need to be a compiler
keyword to be used on structures.
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Wed, 23 May 2001 15:07:40 GMT Raw View
Chris Newton wrote:
...
> Please tell me how you are going to get the above into a meaningful
> "Header" struct using only currently standard C++. To make things
> reasonably realistic, I will forbid the use of the following in your
> answer.
>
> 1. More than one disk access to read the data from the file
OK - so you've already prohibited anything even faintly resembling C++;
C++ defines only buffered I/O, it provides no primitives for bypassing
the buffering and directly controlling how many disk accesses are made.
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: news/comp.std.c++@nmhq.net (Niklas Matthies)
Date: Wed, 23 May 2001 15:08:28 GMT Raw View
On Tue, 22 May 2001 20:30:33 GMT, glancaster <glancaster@ntlworld.com> wrote:
> There is still a lot of detail missing about packed (this is not surprising
> for a brainstorming exercise) but for me the required changes to the
> language are *already* too great for the benefit given.
>
> If an alignment standard is required, I suggest natural alignment. This is
> where each fundamental type of size n is aligned on an n-byte boundary.
> Aggregates are aligned identically to their largest member. E.g.
>
> natural struct MyNatural
> {
> int8_t c;
> int16_t s;
> int32_t l;
> };// sizeof(MyNatural) == 8, aligned on 4 byte boundaries.
>
> (Assume the intX_t types are the obvious typedefs or new language defined
> fixed size types like C99's.)
>
> AFAIK all architectures treat natural alignment as aligned, whatever other
> alignment options they offer, so the only language extension this requires
> is a natural keyword to prefix a struct definition.
It is not the architectures that are the issue, but the binary formats for data
exchange (via files, streams or other means of transmission). There are
binary formats out there where for example 32-bit entities are only
16-bit aligned (presumably because the architectures where they
originated only required at most 16-bit alignment). Generally,
architectures only require alignment up to a certain number of bytes,
which is not necessarily the size of the largest type of the
architecture, which in turn is not necessarily at least as large as the
largest fundamental C++ type.
A structure laid out with "natural alignment" might hence have even
more padding than a regular structure (for example,
`struct { uint8_t u8; uint64_t u64; };' has size 12 bytes and an
alignment requirement of 4 bytes on many current implementations,
whereas natural alignment would give it size 16 bytes and alignment
8 bytes).
Packed structures are wanted exactly because the requirements of
architectures differ, but binary formats need to be handled in an
architecture-independent way. Since the compiler knows the target
architecture, it only makes sense that the compiler should bother about
the architecture-specific issues, and not the programmer who aspires to
write portable source code for the handling of some binary format.
-- Niklas Matthies
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: "glancaster" <glancaster@ntlworld.com>
Date: Wed, 23 May 2001 17:40:37 GMT Raw View
Chris Newton:
> Let us suppose we have a binary file to read from disk, which begins
> with some header information as follows.
>
> Signature: 16-bit unsigned integer
> Value A: 32-bit signed integer
> Value B: 32-bit signed integer
>
> Please tell me how you are going to get the above into a meaningful
> "Header" struct using only currently standard C++. To make things
> reasonably realistic, I will forbid the use of the following in your
> answer.
I don't understand the thinking behind most of your rules.
> 1. More than one disk access to read the data from the file
You have very little control over this. Don't assume that one function call
= one disk access. Implementations and operating systems implement caching,
so looking at your source code doesn't tell you how many disk accesses occur.
> 2. Different input functions to read every type in the known universe
To get around this requirement in your code you appear to introduce a read
member function of an unknown type called File. It looks like the first
parameter is a void* and the second std::size_t. I prefer something like:
template <typename T>
void File::read(T* data); // uses sizeof(T) internally.
This avoids the possibility of someone passing the wrong size.
But then, as each different template function instantiation is strictly a
new function for each type, this interface, which is safer than your
original, violates your rule. Therefore it is a good thing to break this
rule.
> 3. Casts
Some casts are perfectly safe and portable. Even reinterpret_cast can be
safe and portable in some contexts.
> 4. Any typedef, #pragma or other code that would likely need adjusting
> when porting to a new compiler on the same platform.
OK.
I assume I'm allowed (nay, required) to use your original header definition,
as we're not now discussing the fixed size types.
> struct Header
> {
> unsigned int<16> m_signature;
> int<32> m_a;
> int<32> m_b;
> };
> OK, let's see you do better with the current standard. :-)
I would write something like this:
// Library code.
#include <fstream>   // std::ifstream
#include <cstring>   // std::memcpy

template <typename T>
void ReadFromFile(std::ifstream& stm, T* dst)
{
    char buf[ sizeof(T) ];
    stm.read( buf, sizeof(T) );
    std::memcpy( dst, buf, sizeof(T) );
}

// Code for reading a particular structure.
void ReadHeader(std::ifstream& file)
{
    Header dst;
    ReadFromFile( file, &dst.m_signature );
    ReadFromFile( file, &dst.m_a );
    ReadFromFile( file, &dst.m_b );
}
Note I don't use this mysterious File type you made use of in your example.
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: news/comp.std.c++@nmhq.net (Niklas Matthies)
Date: Wed, 23 May 2001 18:28:22 GMT Raw View
On Wed, 23 May 2001 09:12:54 GMT, Ross Smith <ross.s@ihug.co.nz> wrote:
> Niklas Matthies wrote:
> > On Mon, 21 May 2001 18:45:18 GMT, glancaster <glancaster@ntlworld.com> wrote:
> > >
> > > (b) create a binary standard file layout with a
> > > single call to a binary file output function, passing a pointer to the
> > > struct.
> >
> > The sketched mechanism is sufficiently helpful for (b) for my tastes.
> > It solves one particular and well-defined problem, namely the
> > implementation-definedness and non-suppressability of the placement of padding
> > bytes within structures. There's a lot of things it doesn't solve, like
> > endianness, represention of signed integers and of floating types and
> > the like, and the inherent and practically unfixable tiedness of binary
> > formats (and I/O) to the architecture-dependent byte size.
>
> But unless you do solve at least some of those, you still don't have
> binary compatibility between implementations.
I certainly agree that packed structures meet only a partial goal, but
it is already a very helpful one in its own right.
-- Niklas Matthies
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: news/comp.std.c++@nmhq.net (Niklas Matthies)
Date: Wed, 23 May 2001 18:29:23 GMT Raw View
On Wed, 23 May 2001 09:18:04 GMT, Paul Mensonides <pmenso57@home.com> wrote:
> "Anthony Williams" <anthwil@nortelnetworks.com> wrote in message
> news:9eb44c$1tbqm$1@ID-49767.news.dfncis.de...
> | This is a fundamental issue, and needs to be addressed before such packed
> | structs can be used for IO. Basically, we need some support for bitstreams
> | (rather than char streams), with defined "endianness", so a multi-bit object
> | is always written and read the same on all systems.
>
> We are also going to run out of *usable* keywords as processors get
> larger and larger. Platform extensions might be fine for specialized
> systems but not for mass-market personal computers of all types. Say,
> in ten years you have the Ultra-Tritanium 256-bit processor. I'm not
> looking forward to writing: short short short short int x = 0; We
> will also have to deal *normally* with 8, 16, 32, 64-bit character
> sets, etc.. I don't like: unsigned short long wchar_t c = 'a'; much
> either!
> In other words, the C99 solution is only a temporary fix.
Integer types larger than 64 bits will only be necessary for specialized
applications (when you need to count more than 18 quintillion things).
While a generic mean for specifying N-bit integer types is certainly
something to be favored, it's not very likely that there will be a
shortage of convenient names for fundamental types.
-- Niklas Matthies
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: "glancaster" <glancaster@ntlworld.com>
Date: Wed, 23 May 2001 22:01:27 GMT Raw View
> > If an alignment standard is required, I suggest natural alignment. This is
> > where each fundamental type of size n is aligned on an n-byte boundary.
...and aggregates align to the lowest common multiple of their members'
alignment.
> > AFAIK all architectures treat natural alignment as aligned, whatever other
> > alignment options they offer, so the only language extension this requires
> > is a natural keyword to prefix a struct definition.
> Generally,
> architectures only require alignment up to a certain number of bytes,
> which is not necessarily the size of the largest type of the
> architecture, which in turn is not necessarily at least as large as the
> largest fundamental C++ type.
>
> A structure laid out with "natural alignment" might hence have even
> more padding than a regular structure (
Exactly! You can think of it as "worst-case alignment" if you want (although
that may sometimes be an oversimplification). Because of this it's supported
everywhere AFAIK.
> Packed structures are wanted exactly because the requirements of
> architectures differ, but binary formats need to be handled in an
> architecture-independent way.
Provided the binary format is itself naturally aligned, natural alignment
fulfils this requirement to the same extent as packed structs. Moreover it
does so without the restrictions and implementation complications we have
thrashed out for packed structs. You can use it for non-PODS, you don't need
new pointer and reference types, and the compiler doesn't have to emit extra
instructions for dealing with non-aligned data.
Many structs intended for binary compatibility purposes already use natural
alignment, albeit sometimes with dummy data members to enforce the correct
padding when compiler alignment settings are less strict. So in this sense,
natural alignment could be seen as just standardising existing practice.
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: Sean Parent <sparent@Adobe.COM>
Date: Thu, 24 May 2001 12:39:39 GMT Raw View
in article 9ec8t7$okg$1@neptunium.btinternet.com, Chris Newton at
chrisnewton@no.junk.please.btinternet.com wrote on 5/23/01 2:14 AM:
> <sigh> I'm still not getting through to people, I can tell... :-)
>
> Let us suppose we have a binary file to read from disk, which begins
> with some header information as follows.
>
> Signature: 16-bit unsigned integer
> Value A: 32-bit signed integer
> Value B: 32-bit signed integer
I think what would be very helpful for these cases is a standard binary
stream. A binary stream would have formatting options for little-endian vs.
big-endian byte order. You would be able to specify how many bytes to read
for each data type (with good defaults), and what value to write for bools.
Then your example would look like:

struct header
{
    header(std::binary_stream& stream)
    {
        stream.set_endian(std::big_endian);
        stream >> fSignature;
        stream >> fValueA;
        stream >> fValueB;
    }

    unsigned short fSignature;
    int fValueA;
    int fValueB;
};
How efficient this is depends greatly on the implementation of your OS I/O
and your streams - I use a library that does async read-ahead and
write-behind on streams and optimizes for disk head movement. This wouldn't
even hit the disk, since the first page of the file was fetched when the
stream was opened (although it may block momentarily if enough of the file
isn't read in yet).
--
Sean Parent
Sr. Engineering Manager
Adobe Workgroup Services
sparent@adobe.com
Author: jk@steel.orel.ru (Eugene Karpachov)
Date: Thu, 24 May 2001 12:42:28 GMT
Wed, 23 May 2001 18:29:23 GMT Niklas Matthies wrote:
>Integer types larger than 64 bits will only be necessary for specialized
>applications (when you need to count more than 18 quintillion things).
What if someone wants a fast bitvector implementation? Or to implement a
set with no more than 100 elements?
--
jk
Author: "glancaster" <glancaster@ntlworld.com>
Date: Tue, 22 May 2001 20:30:33 GMT
There is still a lot of detail missing about packed (this is not surprising
for a brainstorming exercise) but for me the required changes to the
language are *already* too great for the benefit given.
If an alignment standard is required, I suggest natural alignment. This is
where each fundamental type of size n is aligned on an n-byte boundary.
Aggregates are aligned identically to their largest member. E.g.
natural struct MyNatural
{
    int8_t  c;
    int16_t s;
    int32_t l;
};  // sizeof(MyNatural) == 8, aligned on 4-byte boundaries.
(Assume the intX_t types are the obvious typedefs or new language-defined
fixed-size types like C99's.)
AFAIK all architectures treat natural alignment as aligned, whatever other
alignment options they offer, so the only language extension this requires
is a natural keyword to prefix a struct definition.
Kind regards
Garry Lancaster
Codemill Ltd
mailto << "glancaster" << at "codemill" << dot << "net";
Visit our web site at http://www.codemill.net
Author: "C. M. Heard" <heard@vvnet.com>
Date: Tue, 22 May 2001 20:30:55 GMT
"Chris Newton" wrote:
[ ... ]
> Nope, sorry, I want *exactly* fixed types. I want
> sizeof(my struct) to equal sum(sizeof(members of
> my struct)). Anything else defeats my purpose,
> which is the ability to handle precisely defined binary
> formats. I will reiterate, this wouldn't be possible on
> a small number of implementations, but would be
> useful on the vast majority.
[ ... ]
> Everyone seems to be missing my point, so I'm obviously not making it
> very well. What I am arguing for is a new possibility, so that I can
> read in non-aligned data if I need to, in a standard way. This would
> typically be to support the binary formats I keep mentioning, although
> maybe it has applications elsewhere as well. I do not propose forcing
> the use of packed structs for routine work in any way; there is no
> overhead here if packed structs are not used.
A very common use for this sort of structure is in processing packet
headers in communication protocols. However, even if packed structs were
universally supported, a program that used them would
not be portable unless the binary representation of the fields were
also standardized. In particular, it would be necessary to standardize
the byte order of multibyte quantities.
> People seem to be ignoring the fact that *this already happens*. I would
> go so far as to say that the vast majority of serious applications I've
> ever worked on would have benefitted from this being standardised. The
> alternative is that, every time you want to perform any sort of binary
> I/O, you create an array of char, use that as some sort of monolithic
> buffer, and then cast everything into or out of that array to get at the
> actual data. Typically, that casting is itself done using #defined or
> typedef'd types, which either differ between implementations, or have to
> be provided and changed on a per-platform basis by people porting the
> code. It's all very awkward and error-prone, and yet people are doing it
> all the time, because they don't have a choice.
A better idea, if you want to be portable, is to use an array of unsigned
char and write explicit conversion functions using shifting and masking
to extract what you want. This works on arbitrary architectures and
does not require the use of any exactly-sized types; it only requires
types that are at least as large as the sizes you want.
> Standardising this behaviour by adding the packed structs and fixed-size
> types I proposed would have several immediate and highly visible
> effects.
>
> (a) It makes programs more readable. The buffer need no longer be an
> array of char. A packed struct can be written to reflect the way a
> binary data format really is. This allows helpfully named and typed
> members - no more lists of enumerated offsets that, hopefully, match the
> array you're talking about, no more trying to remember whether that's a
> two-byte or four-byte integer every time you read/write the structure so
> you can cast properly.
In my experience this turns out not to be so helpful if byte-ordering is
an issue. A good example is the BSD implementation of the TCP/IP protocol
suite. The headers in this protocol were carefully designed to allow
the headers to be overlaid with structs on common 32-bit machines. The
BSD code takes advantage of this. Alas, much of the kernel code, as well
as some extensions such as the ipfilter package, tend to be littered with
htons/ntohs and friends to deal with byte ordering issues. It's very
easy to forget to put one of those suckers in when it's needed.
> (b) It makes programs safer. Accessing data in the buffer is done via
> the named member of the packed struct - no more off-by-one errors where
> you read the wrong data because the offset enum wasn't changed, no more
> hiding errors because you're casting everything from unformatted bytes
> into whatever type it's supposed to be.
>
> (c) It makes programs easier to write. Defining a packed structure is
> easier and less error-prone than defining a suitable size for an array
> of char and a list of offset constants. Using that structure does not
> require casting clutter.
It is a pain in the neck to generate the list of offset constants and
explicit conversion functions (_not_ casts please if you value portability).
I will grant that. However, when byte-ordering is an issue, I have found
that technique tends to be far less prone to portability problems than
using structs with htons/ntohs and friends. Programs that are originally
written on machines where the byte re-ordering functions are no-ops tend
to have some of them missing (casts have the same problems, BTW). Programs
that use offset lists and explicit conversion functions sometimes take more
effort to write, but are usually easier to get right the first time.
> In fact, the list of advantages is pretty much the same as the list of
> advantages brought by having any struct construction in the language.
> All I'm asking is that a possibility be added so that structures can be
> used in one of the most important scenarios for structured data: binary
> I/O protocols.
Most binary I/O protocols ultimately rely on transferring sequences of
8-bit bytes. What you want the packed structures to do for you is to
map data structures onto such byte sequences. For transferring the
mapped data structures between _unlike machines_ you need not only to
fix the sizes of all fields but also all aspects of the external
representation (such as byte order, 2's vs 1's complement, etc.).
Mike
Author: news/comp.std.c++@nmhq.net (Niklas Matthies)
Date: Wed, 23 May 2001 08:59:07 GMT
On Mon, 21 May 2001 18:45:18 GMT, glancaster <glancaster@ntlworld.com> wrote:
> NM = Niklas Matthies
> GL = Garry Lancaster
>
> NM: Again, these operations are something that people who need to access
> NM: data records with packed members need to do anyway.
>
> GL: Who are these people? What are they actually trying to do? See
> GL: point 6 below.
Me, and I'm certainly not alone. As someone else already explained in
this thread, the main purpose is I/O of binary formats that have a
non-padded (or at least implementation-independently padded) and
sometimes misaligned structure, and to safely and portably read and
write the fields of such structures.
> NM: This proposed
> NM: extension allows them to let the compiler choose the most efficient way
> NM: to do this, and to not clutter the source code with these operations.
>
> GL: And by hiding what is actually happening, increasing the likelihood
> GL: that people will use this inefficient technique, even when they
> GL: don't actually have to.
It's still within the spirit of C++ to allow people to shoot themselves
in their foot. I don't really buy your argument; people who decide to
use packed structures without being aware that they trade off time for
space will get what they deserve. This is similar to, say, using
bitfields where they aren't appropriate.
> NM: One approach that seems straightforward to me
> NM: is that `packed' would be a qualifier in object declarations (much like
> NM: const and volatile), not in type definitions. E.g.:
>
> NM: A a; // regular non-packed A
> NM: packed A pa; // packed A
>
> NM: All members of a packed-declared object would be "packed" themselves,
> NM: recursively. Alternative versions of member functions would be generated
> NM: when necessary.
>
> GL: Sometimes the compiler would not have the information to do this i.e.
> GL: if the definition of a member function was in another translation unit.
> GL: You need help from the linker or a pre-link stage. This is a can of
> GL: worms. Avoid it. Ban non-POD types from packed structs (I think the
> GL: OP may have suggested this.)
I agree that this would be a reasonable trade-off.
> NM: It would be undefined behaviour to access packed objects
> NM: through pointers to the non-packed-qualified type (similar as with const
> NM: and volatile). For non-aggregate types, "packed" translates to merely
> NM: "not necessarily aligned". E.g. in
>
> NM: struct A { char c; int i; };
> NM: packed A pa;
>
> NM: the type of `pa.i' would be `packed int', the type of `&pa.i' would be
> NM: `packed int *', and the compiler would know that for accesses to `packed
> NM: int' it needs to generate misalignment-safe code.
>
> GL: So you have abandoned your original idea of disallowing pointers and
> GL: references to members and now added new types of pointers and
> GL: references to the language.
Yes.
> GL: Which conversions will you allow between
> GL: the different pointer and reference types?
> GL: It seems that T* => packed T* and T& => packed T& would be an
> GL: obvious ones to allow, but are flawed.
For non-aggregate types, T -> packed T should be fine. For aggregate
types, member accesses wouldn't generally behave as expected any more; I
suppose that's what you mean by their being flawed.
> What about packed T* => void*?
Seems okay to me.
> GL: Are any conversions implicit or do they require const_cast<>,
> GL: static_cast<> or reinterpret_cast<>?
I personally wouldn't mind at all banning implicit conversions.
The introduction of a packed_cast<> that is valid for those conversions
where member accesses using the target type still have well-defined
behavior, would make sense. const_cast<> doesn't make sense (or would
even be dangerous), and reinterpret_cast<> would take away too much of
the benefit of packed structures (i.e. safe accesses). Extending
static_cast<> to cover this case might be an option, but a dedicated
cast operator seems more sensible.
> GL: Existing C++98 code will not accept your new pointer and
> GL: reference types, unless you define conversions
> GL: from packed to current pointer/ref types.
The only reasonable types for passing pointers to existing code are
void * and unsigned char *. The purpose of the proposed packed structure
facility is exactly for such code that needs to handle binary structures.
Most often, this code will be located within a single module, or a small
number thereof, so I wouldn't expect passing pointers to such
structures around to be a very common need, at least not where the type of
the pointer is of interest.
> GL: The standard library will of course need to be updated to
> GL: accept packed refs and pointers when sensible - this could greatly
> GL: increase its size.
I don't think so. Packed pointers can have the same representation as
void * or unsigned char *, so there should be no problem passing them to
library functions that process raw memory or character strings. I see no
need to have them passed to library functions for other purposes. What
functions are you thinking of?
> GL: BTW, what is the type of &pa in your example? packed A* or A*?
It is packed A*.
> NM: Well, I didn't make the original proposal, I'm just defending its
> NM: purpose, and yes, I obviously need to make things up as we go along,
> NM: just as you point out possible problems. IMHO that's a Good Thing.
>
> GL: I hope you think it is fair to point out how much you have extended the
> GL: original, intuitively appealing, idea, in order to get it to work.
Yes. I don't think I have extended the original idea, though, but just
pointed out mechanisms that would make it work, as objections were
raised to the implementability of the idea.
> GL: So far the mods are:
>
> GL: 1. Compiler to add extra invisible copies to cope with non-alignment.
I don't conceptually see them as copies, no more than in arithmetic
expressions, the values of subexpressions are "copies", or the accesses
of bit-fields involve copies. Copying into an aligned object is one
possible implementation strategy, yes, but certainly not a generally
required one and only needed on particular architectures. The rationale
is that the compiler should know the most efficient way to implement
this, so the programmer should not have to fiddle around with
complicated expressions involving unsafe casts and hope for the
compiler's optimizer to be smart enough (which it generally isn't) to
see that what the programmer actually wants is to read the value out of
a possibly misaligned object.
> GL: 2. New types of pointers and references added to language and
> GL: library e.g.
> GL: packed int*. How these work exactly is not currently defined.
To the language. As said above, I currently don't see the library
relevance. As for how they work, details would certainly need to be
hammered out, but as long as packed T pointers can be passed
around and can be converted to void * and character pointers, I guess
that everyone that has a need for packed structures will be content.
Remember that I didn't attempt at any time to make any proposal readily
fit to be added to the standard, but merely defend the implementability
of the concept of a packed structure.
> GL: 3. (a - If you take my advice) Non-POD types cannot be packed.
> GL: (b - If you don't) Requirement for pre-linker or linker enhancements.
Yes.
Since I currently do not plan at all to write any specific proposal,
the question of whether I "take your advice" is moot. We are merely
discussing the possibilities to get a grip on their feasibility,
benefits and drawbacks.
> GL: And there's some other points that need addressing.
>
> GL: 4. Do you allow bit-fields in packed structs? If so, will you specify
> GL: their binary layout (currently implementation defined)?
It might make sense to do so. But it may possibly be wiser to let it be,
or to provide this functionality by different means, to prevent people
from coming to the wrong conclusion that bit-fields in non-packed structures
have an implementation-independent layout.
> GL: 5. Do you want to pack down to the *bit* level? Important for bit-fields
> GL: and (for some people's ideas of) the proposed fixed-size types.
No, of course not, and I don't think some people's ideas of the proposed
fixed-size types are very well founded if we want CHAR_BIT to continue
to be implementation-defined.
> GL: 6. Last, but not least, what do you want packed structs for? In my
> GL: experience people ask for something like this to
>
> GL: (a) map exactly to a hardware structure
>
> GL: (b) create a binary standard file layout with a
> GL: single call to a binary file output function, passing a pointer to the
> GL: struct.
>
> GL: The proposed solution is massive overkill for (a) and
> GL: insufficient for (b).
As I wrote above, primarily (b), although it is certainly suitable for
(a), and might also be used in situations unrelated to binary I/O where
space efficiency at any cost has highest priority.
The sketched mechanism is sufficiently helpful for (b) for my tastes.
It solves one particular and well-defined problem, namely the
implementation-definedness and non-suppressibility of the placement of
padding bytes within structures. There's a lot of things it doesn't solve,
like endianness, representation of signed integers and of floating types
and the like, and the inherent and practically unfixable tie between
binary formats (and I/O) and the architecture-dependent byte size.
-- Niklas Matthies
Author: Ross Smith <ross.s@ihug.co.nz>
Date: Wed, 23 May 2001 09:12:54 GMT
Niklas Matthies wrote:
>
> On Mon, 21 May 2001 18:45:18 GMT, glancaster <glancaster@ntlworld.com> wrote:
> >
> > (b) create a binary standard file layout with a
> > single call to a binary file output function, passing a pointer to the
> > struct.
>
> The sketched mechanism is sufficiently helpful for (b) for my tastes.
> It solves one particular and well-defined problem, namely the
> implementation-definedness and non-suppressibility of the placement of
> padding bytes within structures. There's a lot of things it doesn't solve,
> like endianness, representation of signed integers and of floating types
> and the like, and the inherent and practically unfixable tie between
> binary formats (and I/O) and the architecture-dependent byte size.
But unless you do solve at least some of those, you still don't have
binary compatibility between implementations.
Let's take a fresh look at all this. (Translation: I've waded through
everybody else's opinions so now I'm going to make you lot wade through
mine. :-) )
The basic goal is to make it possible to read and write data structures
whose binary formats are externally imposed, and which may not be
compatible with the compiler's normal ideas about structure layout and
value representation.
Start by eliminating the things that are just plain Too Hard. Certainly
forget about floating point; requiring compilers to translate arbitrary
FP formats is obviously way beyond the bounds of reason. Second, forget
about bitfields; trying to specify nine and sixty ways of constructing
tribal lays on a bit-by-bit basis just opens too many cans of worms.
Third, forget about compatibility between systems with different byte
sizes. That last one is going to be a bit controversial, so I'll go into
more detail.
There are basically two classes of system with CHAR_BIT != 8. One is old
boxes with 36/48/72-bit words and 9-bit characters. Moving data between
these systems and the 8-bits-to-the-byte world necessarily involves
custom hardware that does complicated bit twiddling; this kind of thing
_can't_ be made portable, it's always going to depend on the specific
details of the systems involved, and trying to standardise it would be a
waste of time.
The other is DSP chips that do everything in fixed-size chunks, where
CHAR_BIT along with everything else is 32 or 64 bits. Here a case could
be made for portability; these chips often share a box with more
conventional CPUs, and their word size is a multiple of 8 so packing
8-bit bytes into a 32/64-bit word makes sense. But the nature of the
processing involved suggests that this will rarely be needed -- a DSP
treats its data as a single chunk more or less by definition; if it was
likely to need to break it down into smaller parts, the compiler
implementers would presumably have provided the appropriate types and a
smaller CHAR_BIT in the first place.
Then there's the signed integer problem. All versions of C and C++ agree
on the representation of positive integers, regardless of whether the
data type is signed or not. C89 and C++ both leave the representation of
negative integers entirely up to the implementation, while C99 restricts
it to one of three allowed formats (two's complement, ones' complement,
or sign and magnitude). The formats are
simple enough that restricting integers (at least in packed structs) to
these and providing conversions between them would be technically
possible, but I have my doubts about whether it would be worth the
trouble. (I suspect that virtually all binary formats use two's
complement anyway -- perhaps a compromise would be to restrict packed
data to that form.) I'll leave this one open.
Next problem: integer sizes. Given that I'm not attempting compatibility
between systems with different byte sizes, it's only necessary to
specify integer sizes in bytes, not bits; there's no need to force a
36-bit system to handle 32-bit integers. This is obviously going to
interact with whatever approach is adopted to integer sizes in general
-- C99's <stdint.h> and so on. Rather than try to duplicate that effort
here, I'll just assume that some way is provided of specifying integers
of known size, restricted to those sizes naturally supported by the
architecture, and that integers with the same specified size (e.g. "at
least 32 bits") can be trusted to be the same size on every system with
the same CHAR_BIT.
Finally: endianness. This is complicated by the existence of
"middle-endian" systems that don't fit either of the obvious models, but
not, I think, so much that we should simply exclude such systems by
fiat. Any system capable of IP already has htonl() and its relatives;
all we need to do to allow direct handling of structures with externally
specified endianness is to build the equivalent into the compiler. We
can allow for middle-endian systems without having to allow for
middle-endian _external_ formats.
Putting it all together into the vague outline of a proposal:
I'll assume the existence of a sized-integer syntax in which something
like "int<32>" means "the smallest integer with at least 32 bits", and
that we can trust this to always be the same size on two systems with
the same CHAR_BIT. I'll punt the negative representation issue for the
moment.
Four new keywords: packed, big_endian, little_endian, and native_endian.
A struct or class (I'll use struct from here on and take "or class" as
read) can be declared packed, by adding that keyword before the struct
keyword. A packed struct must be a POD type; its data members are
further restricted to primitive integer types (no bitfields), other
packed structs, and arrays of such types.
The size of a packed struct is always exactly equal to the sum of the
sizes of its data members. Packed structs whose size is not appropriate
to the underlying system's alignment may not be placed in an array;
exactly which sizes allow this is implementation-defined. It is illegal
to take the address of a member of a packed struct.
(Open question: What about STL containers? It makes sense for vectors to
be restricted in the same way as arrays, but what about the others?)
Integer members of a packed struct may be tagged as big_endian,
little_endian, or native_endian. A complete packed struct type can also
be given an endianness tag; all integers within the struct will then
default to that endianness (native_endian if the struct isn't tagged).
The compiler will insert the code to convert between native and
non-native formats in assignments involving packed struct members. For
types whose sizeof is 1, endianness tags have no effect.
Example:
packed struct ip_header {
    unsigned<8> version_ihl;
    unsigned<8> tos;
    big_endian unsigned<16> length;
    big_endian unsigned<16> identification;
    big_endian unsigned<16> flags_offset;
    unsigned<8> ttl;
    unsigned<8> protocol;
    big_endian unsigned<16> checksum;
    big_endian unsigned<32> source;
    big_endian unsigned<32> destination;
};
(Equivalently, declare the whole thing big_endian and leave it off the
fields.)
This gets you as much portability as can reasonably be expected. The
version_ihl and flags_offset fields will still need to be split into
their component bitfields by hand. It won't work on CHAR_BIT!=8 systems,
but IP on such systems is going to require a whole bunch of non-portable
black magic anyway.
--
Ross Smith <ross.s@ihug.co.nz> The Internet Group, Auckland, New Zealand
========================================================================
"Hungarian notation is the tactical nuclear weapon of
source code obfuscation techniques." -- Roedy Green
Author: "Chris Newton" <chrisnewton@no.junk.please.btinternet.com>
Date: Wed, 23 May 2001 09:14:34 GMT
glancaster <glancaster@ntlworld.com> wrote...
> We're in agreement then. Not portable to all existing systems
> that support C++, but possible in an optional part of the standard.
Yes.
> [Chris Newton wrote...]
> > Again, that's true, but again, it's missing the point. Much of
> > the binary data read and written in real programs is shared
> > between different applications on the same machine (or at
> > least, the same family of hardware). At the moment, C++
> > provides *ZERO* support for doing this in *ANY* portable
> > way, AFAIK.
>
> Not true. Within your limited definition of portability (i.e.
> portable within hardware families), C++ supports portable
> binary I/O in the form of read and get member functions
> in std::istream and put and write member functions in
> std::ostream. There are also legacy functions from the C
> standard library.
<sigh> I'm still not getting through to people, I can tell... :-)
Let us suppose we have a binary file to read from disk, which begins
with some header information as follows.
Signature: 16-bit unsigned integer
Value A: 32-bit signed integer
Value B: 32-bit signed integer
Please tell me how you are going to get the above into a meaningful
"Header" struct using only currently standard C++. To make things
reasonably realistic, I will forbid the use of the following in your
answer.
1. More than one disk access to read the data from the file
2. Different input functions to read every type in the known universe
3. Casts
4. Any typedef, #pragma or other code that would likely need adjusting
when porting to a new compiler on the same platform.
I will claim, playing devil's advocate somewhat, that each of the above
either does not scale well (causing performance and/or maintenance
problems) and/or significantly increases the risk of bugs. If you
disagree on any point, feel free to use it, but please explain why you
disagree.
For comparison, here is the sort of code I would like to be able to
write, assuming the presence of the extensions I propose.
struct Header
{
    unsigned int<16> m_signature;
    int<32> m_a;
    int<32> m_b;
};

// The following use of a "packed" keyword might be
// convenient, allowing a single struct declaration to
// be used to define corresponding packed and
// unpacked structures
typedef Header UnpackedHeader;
typedef packed Header PackedHeader;

void ReadHeader(File & file)
{
    PackedHeader ph;
    file.read(&ph, sizeof(ph)); // Throws on failure, perhaps

    // Now either access the odd member directly,
    // accepting a possible performance hit
    // if the generated code has to work around
    // awkward alignments...
    unsigned int<16> sig = ph.m_signature;

    // ...or copy memberwise to an unpacked
    // version using a compiler-generated
    // "unpacking" assignment operator
    UnpackedHeader uh = ph;
}
For the programmer, this is easy to write, easy to read, easy to
maintain and hard to get wrong (without generating an obvious
compile-time error).
The only awkward point I can see for compiler writers is generating code
to work around alignment issues to access the members of a packed struct
(from which things like the "unpacking" op= automagically follow). The
compiler is free to optimize the handling of the packed struct in any
way possible on the particular platform, with no assistance needed from
the programmer.
OK, let's see you do better with the current standard. :-)
Regards,
Chris
Author: "Paul Mensonides" <pmenso57@home.com>
Date: Wed, 23 May 2001 09:18:04 GMT
"Anthony Williams" <anthwil@nortelnetworks.com> wrote in message
news:9eb44c$1tbqm$1@ID-49767.news.dfncis.de...
| This is a fundamental issue, and needs to be addressed before such packed
| structs can be used for IO. Basically, we need some support for bitstreams
| (rather than char streams), with defined "endianness", so a multi-bit object
| is always written and read the same on all systems.
|
| Anthony
We are also going to run out of *usable* keywords as processors get larger
and larger. Platform extensions might be fine for specialized systems, but
not for mass-market personal computers of all types. Say, in ten years you
have the Ultra-Tritanium 256-bit processor. I'm not looking forward to
writing: short short short short int x = 0; We will also have to deal
*normally* with 8-, 16-, 32-, and 64-bit character sets, etc. I don't like:
unsigned short long wchar_t c = 'a'; much either!
In other words, the C99 solution is only a temporary fix.
Paul Mensonides
---
Author: "glancaster" <glancaster@ntlworld.com>
Date: Wed, 23 May 2001 13:43:42 GMT Raw View
> If an alignment standard is required, I suggest natural alignment. This is
> where each fundamental type of size n is aligned on an n-byte boundary.
> Aggregates are aligned identically to their largest member.
I should have written "aggregates are aligned according to the lowest common
multiple of their members' alignments". Largest member doesn't work for
certain cases e.g. 6 byte and 8 byte members.
Kind regards
Garry Lancaster
Codemill Ltd
mailto: glancaster@codemill.net
Visit our web site at http://www.codemill.net
---
Author: "glancaster" <glancaster@ntlworld.com>
Date: Sat, 19 May 2001 18:45:22 GMT Raw View
Garry Lancaster:
> > I'm curious as to how you would implement, say, an 8 bit type on a
machine
> > with 9 bit bytes.
James Kuyper Jr:
> By using only 8 of the 9 bits, the same way that bit-fields are
> implemented in structures.
And do you also want to artificially limit the range of this type to that of
an 8 bit type?
If yes, rather inefficient for some machines and arguably not in the spirit
of C++. If no, I would say you actually have a "fixed minimum size type"
rather than a "fixed size type".
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
---
Author: "Chris Newton" <chrisnewton@no.junk.please.btinternet.com>
Date: Sat, 19 May 2001 18:45:11 GMT Raw View
glancaster <glancaster@ntlworld.com> wrote...
> Chris Newton suggested:
> > 1. Fixed-size integral types
>
> Which ones?
Any that make sense. That's the point. You can have int<8>, int<16>,
unsigned int <32> or even int<17> if your platform happens to support
it.
All of the features I suggested may not apply to every C++
implementation in existence, but are of significant use to very, very
many developers. I suggest that they simply be made optional, such that
requesting an int<17> or packed struct on a platform that can't sensibly
do it results in a compile-time error.
At that point, you can replace the offending code with something
platform-specific instead. Note that this is exactly what you would have
had to do for *every* platform before, if you used
implementation-specific macros or keywords to fix the sizes of your
structures (which you did, since at present you don't have any choice).
> Bear in mind that, as the standard currently works, they
> would have to be implementable on all machines that can
> currently support C++. I fear this would not be portable.
That, again, depends on how you define portable. I would find the
ability to write code that was portable between pretty much every
compiler and OS on a given hardware platform very valuable indeed. It
removes the barriers to swapping compilers that currently exist (e.g.,
those platform-specific macros) and makes it much easier to port between
different OSes.
Certainly, there are still other issues to consider, including
endianness and platforms with unusual natural integer sizes. However,
note that my suggestions do nothing to make them any worse than they are
now. You can still have int, long, short, etc. as
implementation-dependent sizes, too.
> > (and other useful common types)
>
> I certainly agree with this, but it isn't terribly specific ;-)
I was talking about the char<> and fixed<> ideas. :-)
> > 2. Packed structures
>
> Different machines have different alignment requirements. So,
> again, I can't see a portable way of doing this.
Again, that's true, but again, it's missing the point. Much of the
binary data read and written in real programs is shared between
different applications on the same machine (or at least, the same family
of hardware). At the moment, C++ provides *ZERO* support for doing this
in *ANY* portable way, AFAIK.
I see no reason to cripple the vast numbers of people who would find
this useful, purely in order to include nothing but 100% universal
content in the standard. Trying to keep all of standard C++ 100%
universal is *the* surest way of killing it (or having it overrun with
implementation-specific extensions) IMHO.
Regards,
Chris
---
Author: "Chris Newton" <chrisnewton@no.junk.please.btinternet.com>
Date: Sat, 19 May 2001 19:49:31 GMT Raw View
Niklas Matthies <news/comp.std.c++@nmhq.net> wrote [abridged]...
> glancaster <glancaster@ntlworld.com> wrote:
> > The standard does not currently require fixed-size integral
> > types *at all*.
Indeed. That is precisely the flaw I hope to address here.
> > I'm curious as to how you would implement, say, an 8 bit
> > type on a machine with 9 bit bytes.
You probably wouldn't. You're still working on this assumption that
standard C++ must be 100% portable. Please, let's stop crippling C++ so
that it works exactly the same on 5% of implementations, at the expense
of the other 95%. At the moment, portability across totally different
platforms (which is rarely needed in practice) is screwing up
portability across compilers or OSes (which is needed every day by many
people). Why?
If you want implementation-specific sensible integer sizes, you can use
short, int or long. I am not proposing any change to that, nor would I
want to see one. However, if you want to match a defined binary data
standard, for example reading a binary file from disk, you have to use a
suitable fixed size type, such as those I'm proposing. The real world
already does this every day, but it uses implementation-defined macros
such as WORD or BYTE to do it, instead of a nice, standard type.
> At least I assume that the original poster doesn't actually
> want a type that always occupies exactly N bits (and no
> more) in memory, but one whose range of values is exactly
> [0, 2^N -1] or [-2^(N-1), 2^(N-1)-1].
Nope, sorry, I want *exactly* fixed types. I want sizeof(my struct) to
equal sum(sizeof(members of my struct)). Anything else defeats my
purpose, which is the ability to handle precisely defined binary
formats. I will reiterate, this wouldn't be possible on a small number
of implementations, but would be useful on the vast majority.
Regards,
Chris
---
Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Sat, 19 May 2001 19:51:09 GMT Raw View
glancaster wrote:
...
> > At least I assume that the original poster doesn't actually want a type
> > that always occupies exactly N bits (and no more) in memory, but one
> > whose range of values is exactly [0, 2^N -1] or [-2^(N-1), 2^(N-1)-1].
>
> This would be rather inefficient and annoying to implement on machines that
> don't have N size bytes, since it requires extra operations to keep them in
> range. This "tyranny of the many" is arguably against the "spirit" of C++.
> As I said before, I think the idea has merit as an optional part of the
> standard for certain machine classes.
Keep in mind that you pay the price of the inefficient implementation
only when you use the type; requiring that the types be supported
imposes an unavoidable burden only on the implementor, not on users of
the implementation. Still, I appreciate the C99 approach, which is to
not make any of the exact-sized types mandatory, and to specify macros
whose definition can be tested for, to determine whether any particular
exact-sized type is supported. I also think that least-sized types,
which are mandatory in C99 for 4 different sizes, are far more important
than fixed-size types. Of course, a lot of idiots out there are going to
use int16_t even when all they need is int_least16_t, if only because it
has the shorter name (a bad decision, IMO, and one that C++ needn't
duplicate).
Terminology:
exact-size = takes up exactly the number of bits specified, and uses
them all.
fixed-size = uses exactly the number of bits specified, may have unused
padding bits.
least-size = uses at least the number of bits specified, but may use
more, and may have unused padding bits.
...
> For example, you mean that on some machines a structure such as this:
>
> packed struct A
> {
> char c;
> int i; // Not aligned.
> };
>
> would require a simple ++i to generate:
>
> 1. A non-aligned copy to an aligned integer i_temp.
> 2. Increment aligned i_temp.
> 3. A non-aligned copy back to non-aligned i.
The copy can, of course, be in a register.
> Although C++ often inserts extra non-visible code behind the scenes AFAIK it
> has never done this before for basic operations on fundamental types like
> ints.
It does it all the time, on some architectures. The C++ standard does
not specify how the semantics it defines are to be implemented,
and the required semantics for some of the standard types can be
expensive to emulate on some platforms. There used to be many platforms
(there's probably still a few) where there was no hardware support for
'long'; it had to be implemented as, in effect, int[2]. All of the
attendant complexity is handled by the compiler - as far as the user is
concerned, 'long' is just a larger and slower version of 'int'.
On many platforms, short poses a problem, because it's shorter than the
smallest register size. Implementing it often requires bit-masking
operations.
It's normally the case that either 'signed char' or 'unsigned char' is
the natural type for a given platform; that's the one that gets used as
'char'; the other one is usually a little more expensive to emulate.
---
Author: Matthew Austern <austern@research.att.com>
Date: Sat, 19 May 2001 20:33:25 GMT Raw View
"Chris Newton" <chrisnewton@no.junk.please.btinternet.com> writes:
> Nope, sorry, I want *exactly* fixed types. I want sizeof(my struct) to
> equal sum(sizeof(members of my struct)). Anything else defeats my
> purpose, which is the ability to handle precisely defined binary
> formats. I will reiterate, this wouldn't be possible on a small number
> of implementations, but would be useful on the vast majority.
I think it's the other way around, actually. If you've got a struct X
with three members, each of which has size 2, and one member with size
1, then on most platforms I think you'll find that sizeof(X) == 8.
Alignment is a very real concern on most architectures.
---
Author: news/comp.std.c++@nmhq.net (Niklas Matthies)
Date: Sun, 20 May 2001 10:47:35 GMT Raw View
On Sat, 19 May 2001 08:57:37 GMT, glancaster <glancaster@ntlworld.com> wrote:
[···]
> > At least I assume that the original poster doesn't actually want a type
> > that always occupies exactly N bits (and no more) in memory, but one
> > whose range of values is exactly [0, 2^N -1] or [-2^(N-1), 2^(N-1)-1].
>
> This would be rather inefficient and annoying to implement on machines
> that don't have N size bytes, since it requires extra operations to
> keep them in range.
The assumption is that the programmer (for whatever reason) needs a
fixed-size integer type. Currently, if portability is important, he has
to pick some integer type that is guaranteed by the standard to be at
least as large, and which might unnecessarily waste space if it is much
larger, and code those extra operations himself, which is non-trivial,
error-prone and often with sub-optimal runtime efficiency (I've done
that myself a couple of times, it really is annoying).
In practice, the usual strategy is to not go that route, but pick one
type that is known to happen to be a fixed-sized type of the needed size
on the particular implementation, and to change that type whenever the
program is ported to another implementation. C++-provided fixed-size
types would really help for those applications.
> > > > > > 2. Packed structures
[···]
> > > For POD types the *programmer* has the option of doing this *now*
> > > (3.9/2). You don't need an extension.
> >
> > You need an extension to declare and access such misaligned members in
> > the same simple syntax as you use for regular aligned members today.
> > Ideally, inserting or removing some qualifier (say, "packed") in the
> > type's declaration should be sufficient to switch the type between being
> > packed or being non-packed, and the code accessing its members needn't
> > have to be changed.
>
> I take it you mean the "source code" when you say "code" in your last
> sentence?
Yes.
> The code the compiler generates will have to change a lot.
Of course.
> For example, you mean that on some machines a structure such as this:
>
> packed struct A
> {
> char c;
> int i; // Not aligned.
> };
>
> would require a simple ++i to generate:
>
> 1. A non-aligned copy to an aligned integer i_temp.
> 2. Increment aligned i_temp.
> 3. A non-aligned copy back to non-aligned i.
This is one option how to implement this on a machine that doesn't
support unaligned accesses. Another option often is that i_temp is
actually a register which can be loaded and stored byte-wise, where
necessary using shifts, masks and a second register.
> Although C++ often inserts extra non-visible code behind the scenes
> AFAIK it has never done this before for basic operations on
> fundamental types like ints.
Again, these operations are something that people who need to access
data records with packed members need to do anyway. This proposed
extension allows them to let the compiler choose the most efficient way
to do this, and to not clutter the source code with these operations.
> > I'm truly confused about what problem you're seeing here. I'm
> > certainly not arguing anything like bypassing custom copy ctors and
> > custom assignment.
>
> For example,
>
> class WithCustomCopy {...}; // Requires 4 byte alignment.
>
> packed struct A
> {
> char c;
> WithCustomCopy wcc; // Unaligned
> };
>
> How do you copy a normal WithCustomCopy object to wcc or vice versa?
> You can't use the copy assignment operator it defines because that
> won't copy to or from an unaligned location (plus no references
> allowed, remember?). Ctors and dtors have the same problems.
I now see the problem. One approach that seems straightforward to me is
that `packed' would be a qualifier in object declarations (much like
const and volatile), not in type definitions. E.g.:
A a; // regular non-packed A
packed A pa; // packed A
All members of a packed-declared object would be "packed" themselves,
recursively. Alternative versions of member functions would be generated
when necessary. It would be undefined behaviour to access packed objects
through pointers to the non-packed-qualified type (similar as with const
and volatile). For non-aggregate types, "packed" translates to merely
"not necessarily aligned". E.g. in
struct A { char c; int i; };
packed A pa;
the type of `pa.i' would be `packed int', the type of `&pa.i' would be
`packed int *', and the compiler would know that for accesses to `packed
int' it needs to generate misalignment-safe code.
> (Incidentally, you are wrong about register variables (7.1.1/3) and,
Ah, yes, sorry, I was thinking C.
> It seems to me that what you propose is not really a "packed structure" at
> all, but a new type of aggregate that has some of the rules of structures
> and some completely new rules that you just made up.
Well, I didn't make the original proposal, I'm just defending its
purpose, and yes, I obviously need to make things up as we go along,
just as you point out possible problems. IMHO that's a Good Thing.
What is wanted is an aggregate structure that doesn't contain padding
bytes. This may lead to misaligned objects, of course, and the machine
code to access such objects generally needs to differ from the code that
accesses aligned objects (provided we want the latter to have maximum
efficiency). The rules for structures need not change, as far as I
can currently see.
-- Niklas Matthies
---
Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Sun, 20 May 2001 10:48:55 GMT Raw View
glancaster wrote:
>
> Garry Lancaster:
> > > I'm curious as to how you would implement, say, an 8 bit type on a
> machine
> > > with 9 bit bytes.
>
> James Kuyper Jr:
> > By using only 8 of the 9 bits, the same way that bit-fields are
> > implemented in structures.
>
> And do you also want to artificially limit the range of this type to that of
> an 8 bit type?
As I understood it, that's what you wanted. I'm just telling you how to
get it. You asked how to implement an 8-bit type. If you didn't want an
8-bit type, why did you ask for one?
> If yes, rather inefficient for some machines and arguably not in the spirit
> of C++. ...
Agreed. However, there's no more efficient way to have an 8-bit type on
such a machine, so that's not a disadvantage. If you want efficient
implementation on such a machine, you'll have to accept a 9-bit type. I
presumed that was not what you were asking for, because the
implementation of such a type on such a machine is trivial.
C99 introduces some new typedefs that could provide a common ground for
discussion of what C++ should do:
[u]intN_t: Has exactly N bits, no padding bits, and the signed types are
required to use 2's complement representation. Not required to be
supported.
[u]int_leastN_t: The smallest supported integer type with at least N
bits. Required to be defined for N=8,16,32, and 64.
[u]int_fastN_t: The fastest supported integer type with at least N bits.
Required to be defined for N=8,16,32, and 64.
While the C99 standard does not require support for any values of N
other than those, it allows an implementation to provide any of them
that it chooses. There are macros for each of these, such as INT16_MAX,
whose presence can be tested with #ifdef to determine whether the
corresponding type is supported.
I'm not suggesting that C++ must adopt these types, in fact I think
they're badly named. intN_t should have been a least-sized or a fastest
integer, since that's what most people need in most contexts; of the
three types, exact-sized types are by far the least common need (though
there's no shortage of idiots who'll think they need one when they
don't). By giving the shortest name to the least useful form, C99 will
encourage such misuse. Arguably int<N>, int_exact<N>, and int_least<N>
would be more appropriate names in C++; another possibility is int:N,
borrowed from bit-fields.
My main point is that you should try to describe what you want using
terminology similar to that used by C99. That will provide a common
point of reference.
---
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Sun, 20 May 2001 11:38:35 GMT Raw View
In article <3B067364.EB319E2F@wizard.net>, "James Kuyper Jr."
<kuyper@wizard.net> wrote:
>... Still, I appreciate the C99 approach, which is to
>not make any of the exact-sized types mandatory, and to specify macros
>whose definition can be tested for, to determine whether any particular
>exact-sized type is supported.
I think this must be the approach: with some compilers, only some of the
binary specifications will be supported, whereas on some platforms they
will be essential. Further, as it may take time to implement the new
features, it will be important that compiler writers can pick the
features that are most important for their particular platform.
Also, suppose that one tries to port a software package to a new platform.
Then one will want to know which parts do not work, so one can figure out
fixes.
>Terminology:
>exact-size = takes up exactly the number of bits specified, and uses
>them all.
>fixed-size = uses exactly the number of bits specified, may have unused
>padding bits.
>least-size = uses at least the number of bits specified, but may use
>more, and may have unused padding bits.
I think this is also a very important question: how to introduce padding.
In many cases one may want to use an exact binary model in order to
access a binary interface of some kind (say a binary specification
standard), but to use padding internally in the program for the sake of
speed.
One example is Unicode: one wants to ensure that files are read and
written in 32-bit format or one of the other encoding formats such as
UTF-8, but internally it suffices to know that the character type holds
as many bits as Unicode requires.
But sometimes one may want an exact binary implementation too: say one
introduces a type binary<n>, with exactly n bits. Then it may happen
that one wants to create an array with no padding at all, in order to
ensure that the bits are contiguous in memory. Of course this would be
slower than using padding, so one would only want to use it when
absolutely necessary.
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
Author: remove.haberg@matematik.su.se (Hans Aberg)
Date: Sun, 20 May 2001 11:39:15 GMT Raw View
In article <9e3kbm$oal$1@plutonium.btinternet.com>, "Chris Newton"
<chrisnewton@no.junk.please.btinternet.com> wrote:
>Any that make sense. That's the point. You can have int<8>, int<16>,
>unsigned int <32> or even int<17> if your platform happens to support
>it.
I suggested such a type in another post, except that I suggested it be
called binary<n> instead, in order not to confuse it with integers. In
addition to the traditional int type of operations, it must have
implementation-independent shift and (possibly) rotate operations.
In article <9e5m33$s4d$1@uranium.btinternet.com>, "Chris Newton"
<chrisnewton@no.junk.please.btinternet.com> wrote:
>... if you want to match a defined binary data
>standard, for example reading a binary file from disk, you have to use a
>suitable fixed size type, such as those I'm proposing. The real world
>already does this every day, but it uses implementation-defined macros
>such as WORD or BYTE to do it, instead of a nice, standard type.
The new C++ standard is to support distributed programming, and also
embedded programming, so I figure that some way to specify the
underlying binary model must be introduced in an appropriate manner.
It is also necessary if one is to be able to produce Unicode files.
Also, when implementing polymorphic structures, it is sometimes
necessary to know more about the underlying implementation than is
possible now in C++. (For example, when self-mutating "unboxed"
elements of derived classes, I think.)
So I figure some facility for specifying more about the underlying
binary model must be added to the C++ standard.
Hans Aberg * Anti-spam: remove "remove." from email address.
* Email: Hans Aberg <remove.haberg@member.ams.org>
* Home Page: <http://www.matematik.su.se/~haberg/>
* AMS member listing: <http://www.ams.org/cml/>
---
Author: "Chris Newton" <chrisnewton@no.junk.please.btinternet.com>
Date: Sun, 20 May 2001 23:35:03 CST Raw View
Matthew Austern <austern@research.att.com> wrote...
> "Chris Newton" <chrisnewton@no.junk.please.btinternet.com> writes:
> > Nope, sorry, I want *exactly* fixed types. I want
> > sizeof(my struct) to equal sum(sizeof(members of
> > my struct)). Anything else defeats my purpose,
> > which is the ability to handle precisely defined binary
> > formats. I will reiterate, this wouldn't be possible on
> > a small number of implementations, but would be
> > useful on the vast majority.
>
> I think it's the other way around, actually. If you've got
> a struct X with three members, each of which has size 2,
> and one member with size 1, then on most platforms I
> think you'll find that sizeof(X) == 8. Alignment is a very
> real concern on most architectures.
That may well be true, and I'm not arguing for a change in the status quo as
far as alignment of structs/classes generally goes. (I do note, however,
that alignment is often chosen for efficiency reasons, and is not always
an absolute requirement.)
Everyone seems to be missing my point, so I'm obviously not making it
very well. What I am arguing for is a new possibility, so that I can
read in non-aligned data if I need to, in a standard way. This would
typically be to support the binary formats I keep mentioning, although
maybe it has applications elsewhere as well. I do not propose forcing
the use of packed structs for routine work in any way; there is no
overhead here if packed structs are not used.
People seem to be ignoring the fact that *this already happens*. I would
go so far as to say that the vast majority of serious applications I've
ever worked on would have benefitted from this being standardised. The
alternative is that, every time you want to perform any sort of binary
I/O, you create an array of char, use that as some sort of monolithic
buffer, and then cast everything into or out of that array to get at the
actual data. Typically, that casting is itself done using #defined or
typedef'd types, which either differ between implementations, or have to
be provided and changed on a per-platform basis by people porting the
code. It's all very awkward and error-prone, and yet people are doing it
all the time, because they don't have a choice.
Standardising this behaviour by adding the packed structs and fixed-size
types I proposed would have several immediate and highly visible
effects.
(a) It makes programs more readable. The buffer need no longer be an
array of char. A packed struct can be written to reflect the way a
binary data format really is. This allows helpfully named and typed
members - no more lists of enumerated offsets that, hopefully, match the
array you're talking about, no more trying to remember whether that's a
two-byte or four-byte integer every time you read/write the structure so
you can cast properly.
(b) It makes programs safer. Accessing data in the buffer is done via
the named member of the packed struct - no more off-by-one errors where
you read the wrong data because the offset enum wasn't changed, no more
hiding errors because you're casting everything from unformatted bytes
into whatever type it's supposed to be.
(c) It makes programs easier to write. Defining a packed structure is
easier and less error-prone than defining a suitable size for an array
of char and a list of offset constants. Using that structure does not
require casting clutter.
In fact, the list of advantages is pretty much the same as the list of
advantages brought by having any struct construction in the language.
All I'm asking is that a possibility be added so that structures can be
used in one of the most important scenarios for structured data: binary
I/O protocols.
Regards,
Chris
---
Author: "glancaster" <glancaster@ntlworld.com>
Date: Mon, 21 May 2001 08:42:14 CST Raw View
> > Chris Newton suggested:
> > > 1. Fixed-size integral types
Garry Lancaster asked:
> > Which ones?
Chris Newton replied:
> Any that make sense. That's the point. You can have int<8>, int<16>,
> unsigned int <32> or even int<17> if your platform happens to support
> it.
We're in agreement then. Not portable to all existing systems that support
C++, but possible in an optional part of the standard.
[...]
> > > 2. Packed structures
> >
> > Different machines have different alignment requirements. So,
> > again, I can't see a portable way of doing this.
>
> Again, that's true, but again, it's missing the point. Much of the
> binary data read and written in real programs is shared between
> different applications on the same machine (or at least, the same family
> of hardware). At the moment, C++ provides *ZERO* support for doing this
> in *ANY* portable way, AFAIK.
Not true. Within your limited definition of portability (i.e. portable
within hardware families), C++ supports portable binary I/O in the form of
read and get member functions in std::istream and put and write member
functions in std::ostream. There are also legacy functions from the C
standard library.
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
---
Author: jk@steel.orel.ru (Eugene Karpachov)
Date: Mon, 21 May 2001 18:42:51 GMT Raw View
Sun, 20 May 2001 23:35:03 CST, Chris Newton wrote:
>Everyone seems to be missing my point, so I'm obviously not making it
No, not everyone; your point is very reasonable.
>People seem to be ignoring the fact that *this already happens*. I would
Let me go further: it is completely impossible to have binary libraries
without binary data support, because the application compiler is not
always the same as the library compiler. If the goal is "to make the
language better as a system and library tool", I can't see how to
achieve this goal without standard binary data support and standard
packing.
--
jk
---
Author: "glancaster" <glancaster@ntlworld.com>
Date: Mon, 21 May 2001 18:45:18 GMT Raw View
NM = Niklas Matthies
GL = Garry Lancaster
NM: Again, these operations are something that people who need to access
NM: data records with packed members need to do anyway.
GL: Who are these people? What are they actually trying to do? See
GL: point 6 below.
NM: This proposed
NM: extension allows them to let the compiler choose the most efficient way
NM: to do this, and to not clutter the source code with these operations.
GL: And by hiding what is actually happening, increasing the likelihood
GL: that people will use this inefficient technique, even when they
GL: don't actually have to.
GL: So, to summarize our disagreement:
GL: "say what you mean" vs. "let the compiler do the work for you".
GL: A common dilemma in the design of C++, and not one with a
GL: consistent answer, although when dealing with fundamental
GL: operations on fundamental types, generally the former has been
GL: preferred in the past.
GL: class WithCustomCopy {...}; // Requires 4 byte alignment.
GL: packed struct A
GL: {
GL: char c;
GL: WithCustomCopy wcc; // Unaligned
GL: };
GL: How do you copy a normal WithCustomCopy object to wcc or vice versa?
GL: You can't use the copy assignment operator it defines, because that
GL: won't copy to or from an unaligned location (plus no references
GL: allowed, remember?). Ctors and dtors have the same problems.
NM: I now see the problem.
GL: Glad I was able to explain it.
NM: One approach that seems straightforward to me
NM: is that `packed' would be a qualifier in object declarations (much like
NM: const and volatile), not in type definitions. E.g.:
NM: A a; // regular non-packed A
NM: packed A pa; // packed A
NM: All members of a packed-declared object would be "packed" themselves,
NM: recursively. Alternative versions of member functions would be generated
NM: when necessary.
GL: Sometimes the compiler would not have the information to do this, e.g.
GL: if the definition of a member function is in another translation unit.
GL: You need help from the linker or a pre-link stage. This is a can of
GL: worms. Avoid it. Ban non-POD types from packed structs (I think the
GL: OP may have suggested this.)
NM: It would be undefined behaviour to access packed objects
NM: through pointers to the non-packed-qualified type (similar as with const
NM: and volatile). For non-aggregate types, "packed" translates to merely
NM: "not necessarily aligned". E.g. in
NM: struct A { char c; int i; };
NM: packed A pa;
NM: the type of `pa.i' would be `packed int', the type of `&pa.i' would be
NM: `packed int *', and the compiler would know that for accesses to `packed
NM: int' it needs to generate misalignment-safe code.
GL: So you have abandoned your original idea of disallowing pointers and
GL: references to members and now added new types of pointers and
GL: references to the language.
GL: Which conversions will you allow between
GL: the different pointer and reference types? It seems that:
GL: T* => packed T* and T& => packed T& would be obvious ones
GL: to allow, but are flawed. What about packed T* => void*?
GL: Are any conversions
GL: implicit or do they require const_cast<>, static_cast<> or
GL: reinterpret_cast<>?
GL: Existing C++98 code will not accept your new pointer and
GL: reference types, unless you define conversions
GL: from packed to current pointer/ref types.
GL: The standard library will of course need to be updated to
GL: accept packed refs and pointers when sensible - this could greatly
GL: increase its size.
GL: BTW, what is the type of &pa in your example? packed A* or A*?
GL: It seems to me that what you propose is not really a "packed
GL: structure" at all, but a new type of aggregate that has some of the
GL: rules of structures and some completely new rules that you just made
GL: up.
NM: Well, I didn't make the original proposal, I'm just defending its
NM: purpose, and yes, I obviously need to make things up as we go along,
NM: just as you point out possible problems. IMHO that's a Good Thing.
GL: I hope you think it is fair to point out how much you have extended the
GL: original, intuitively appealing, idea, in order to get it to work. So
GL: far the mods are:
GL: 1. Compiler to add extra invisible copies to cope with non-alignment.
GL: 2. New types of pointers and references added to language and
GL: library e.g.
GL: packed int*. How these work exactly is not currently defined.
GL: 3. (a - If you take my advice) Non-POD types cannot be packed.
GL: (b - If you don't) Requirement for pre-linker or linker enhancements.
GL: And there's some other points that need addressing.
GL: 4. Do you allow bit-fields in packed structs? If so, will you specify
GL: their binary layout (currently implementation-defined)?
GL: 5. Do you want to pack down to the *bit* level? Important for bit-fields
GL: and (for some people's ideas of) the proposed fixed-size types.
GL: 6. Last, but not least, what do you want packed structs for? In my
GL: experience people ask for something like this to
GL: (a) map exactly to a hardware structure
GL: (b) create a binary standard file layout with a
GL: single call to a binary file output function, passing a pointer to the
GL: struct.
GL: The proposed solution is massive overkill for (a) and
GL: insufficient for (b).
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
Author: "glancaster" <glancaster@ntlworld.com>
Date: Mon, 21 May 2001 18:45:49 GMT
Garry Lancaster:
> > Although C++ often inserts extra non-visible code behind the scenes,
> > AFAIK it has never done this before for basic operations on fundamental
> > types like ints.
James Kuyper Jr:
> It does it all the time, on some architectures. The C++ standard does
> not specify how the semantics it specifies are to be implemented,
> and the required semantics for some of the standard types can be
> expensive to emulate on some platforms.
It's true the standard does not specify this directly.
However, I believe the rules for the current fundamental types are framed
in such a way as to allow most architectures to implement them in an
efficient way. Packed struct code is absolutely *guaranteed* to be
inefficient on many machines. So, there's definitely a difference in
emphasis.
> There used to be many platforms
> (there's probably still a few) where there was no hardware support for
> 'long'; it had to be implemented as, in effect, int[2]. All of the
> attendant complexity is handled by the compiler - as far as the user is
> concerned, 'long' is just a larger and slower version of 'int'.
>
> On many platforms, short poses a problem, because it's shorter than the
> smallest register size. Implementing it often requires bit-masking
> operations.
Since sizeof(char) == sizeof(short) == sizeof(int) == sizeof(long) is a
legal implementation (3.9.1), the implementors you mention weren't
*required* to implement short or long this way, they *chose* to.
> It's normally the case that either 'signed char' or 'unsigned char' is
> the natural type for a given platform; that's the one that gets used as
> 'char'; the other one is usually a little more expensive to emulate.
I'll take your word on this, although I doubt the overhead is usually
comparable to packed struct code.
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
Author: "glancaster" <glancaster@ntlworld.com>
Date: Mon, 21 May 2001 18:46:20 GMT
> > Garry Lancaster:
> > > > I'm curious as to how you would implement, say, an 8 bit type on a
> > machine
> > > > with 9 bit bytes.
> >
> > James Kuyper Jr:
> > > By using only 8 of the 9 bits, the same way that bit-fields are
> > > implemented in structures.
Garry Lancaster:
> > And do you also want to artificially limit the range of this type to
> > that of an 8 bit type?
James Kuyper Jr:
> As I understood it, that's what you wanted. I'm just telling you how to
> get it. You asked how to implement an 8-bit type. If you didn't want an
> 8-bit type, why did you ask for one?
Others proposed it, not me. I'm interested in how they would achieve it.
The 3 options are:
1. As an optional part of the standard. Unimplementable on a machine that
doesn't have the architecture to support that size exactly *with no
padding*. Like C99 intN_t.
2. A type with padding and range limiting as required, as you suggest. Not
in C99.
3. A minimum fixed size type. Like C99 int_leastN_t or int_fastN_t.
Looking at this thread, it's clear the original proposal meant different
things to different people.
[snip!]
> C99 introduces some new typedefs that could provide a common ground for
> discussion of what C++ should do:
>
> [u]intN_t: Has exactly N bits, no padding bits, and the signed types are
> required to use 2's complement representation. Not required to be
> supported.
>
> [u]int_leastN_t: The smallest supported integer type with at least N
> bits. Required to be defined for N=8,16,32, and 64.
>
> [u]int_fastN_t: The fastest supported integer type with at least N bits.
> Required to be defined for N=8,16,32, and 64.
>
> While the C99 standard does not require support for any values of N
> other than those, it allows an implementation to provide any of them
> that it chooses. There are macros for each of these, such as INT16_MAX,
> whose presence can be tested with #ifdef to determine whether the
> corresponding type is supported.
>
> I'm not suggesting that C++ must adopt these types, in fact I think
> they're badly named. intN_t should have been a least-sized or a fastest
> integer, since that's what most people need in most contexts; of the
> three types, exact-sized types are by far the least common need (though
> there's no shortage of idiots who'll think they need one when they
> don't). By giving the shortest name to the least useful form, C99 will
> encourage such misuse. Arguably int<N>, int_exact<N>, and int_least<N>
> would be more appropriate names in C++; another possibility is int:N,
> borrowed from bit-fields.
IMHO if the C99 types are adopted, then the names should be too. Anything
else is gratuitous incompatibility.
If people find the macro usage offensive then an *additional* alternative
could be supplied.
> My main point is that you should try to describe what you want using
> terminology similar to that used by C99. That will provide a common
> point of reference.
Seems sensible. But bear in mind that the option (2) isn't in C99.
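For reference, the C99 pattern described above looks like this when spelled in C++ (a sketch; <cstdint> is where C++ later adopted C99's <stdint.h>):

```cpp
#include <cstdint>   // C++'s adoption of C99's <stdint.h>
#include <climits>

// [u]intN_t is optional: it exists only if the implementation has a
// type of exactly N bits with no padding. Probe with the limit macro.
#ifdef INT16_MAX
typedef std::int16_t exact16;       // exactly 16 bits, 2's complement
#endif

// The least/fast variants are required for N = 8, 16, 32, 64.
typedef std::int_least16_t least16; // smallest type with >= 16 bits
typedef std::int_fast16_t  fast16;  // fastest type with >= 16 bits
```

Code that only needs a guaranteed minimum range uses the least/fast forms and remains portable even where no exact-width type exists.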
Kind regards
Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net
Author: "Anthony Williams" <anthwil@nortelnetworks.com>
Date: Mon, 21 May 2001 18:48:14 GMT
"Chris Newton" <chrisnewton@no.junk.please.btinternet.com> wrote in message
news:9e8d81$s5l$1@uranium.btinternet.com...
> Everyone seems to be missing my point, so I'm obviously not making it
> very well. What I am arguing for is a new possibility, so that I can
> read in non-aligned data if I need to, in a standard way. This would
> typically be to support the binary formats I keep mentioning, although
> maybe it has applications elsewhere as well. I do not propose forcing
> the use of packed structs for routine work in any way; there is no
> overhead here if packed structs are not used.
Fine for in-memory stuff. Handle mis-aligned data correctly with packed
structs.
Thinking about it, though, why can't we just use bitfields?
struct x
{
unsigned char a:8,b:6,c:23;
};
Obviously, there's an issue of alignment (some systems impose an alignment
on structs, whatever their contents), but we get that problem anyway - what
if "char" on the host system is larger (in bits) than the size of the
struct? sizeof(x) would be 1, even if not all the bits were used, and
reading/writing "char" would skip bits anyway.
> People seem to be ignoring the fact that *this already happens*. I would
> go so far as to say that the vast majority of serious applications I've
> ever worked on would have benefited from this being standardised. The
> alternative is that, every time you want to perform any sort of binary
> I/O, you create an array of char, use that as some sort of monolithic
> buffer, and then cast everything into or out of that array to get at the
> actual data. Typically, that casting is itself done using #defined or
> typedef'd types, which either differ between implementations, or have to
> be provided and changed on a per-platform basis by people porting the
> code. It's all very awkward and error-prone, and yet people are doing it
> all the time, because they don't have a choice.
>
> Standardising this behaviour by adding the packed structs and fixed-size
> types I proposed would have several immediate and highly visible
> effects.
>
> (a) It makes programs more readable. The buffer need no longer be an
> array of char. A packed struct can be written to reflect the way a
> binary data format really is. This allows helpfully named and typed
> members - no more lists of enumerated offsets that, hopefully, match the
> array you're talking about, no more trying to remember whether that's a
> two-byte or four-byte integer every time you read/write the structure so
> you can cast properly.
>
> (b) It makes programs safer. Accessing data in the buffer is done via
> the named member of the packed struct - no more off-by-one errors where
> you read the wrong data because the offset enum wasn't changed, no more
> hiding errors because you're casting everything from unformatted bytes
> into whatever type it's supposed to be.
>
> (c) It makes programs easier to write. Defining a packed structure is
> easier and less error-prone than defining a suitable size for an array
> of char and a list of offset constants. Using that structure does not
> require casting clutter.
>
> In fact, the list of advantages is pretty much the same as the list of
> advantages brought by having any struct construction in the language.
> All I'm asking is that a possibility be added so that structures can be
> used in one of the most important scenarios for structured data: binary
> I/O protocols.
I agree with all the possible advantages, however, one problem associated
with this is endianness - if I have a fixed size integer that is not the
same size as the basic character type of the host system, how do I know how
to write it to the binary IO stream, or read it from such a stream? Take a
simple example - a fixed, 16-bit type, on a system with 8-bit char. Do we
read/write high byte first, or low byte first? OK, now move to a system with
a 16-bit char, what order will the bits be in now? On a system with a 32-bit
char, which 16 bits of our single read "byte" do we need, and in what order?
On more esoteric systems, with 9-bit char, or 17-bit char, what do we do
then?
This is a fundamental issue, and needs to be addressed before such packed
structs can be used for IO. Basically, we need some support for bitstreams
(rather than char streams), with defined "endianness", so a multi-bit object
is always written and read the same on all systems.
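One conventional answer to the byte-order question is to fix the wire order in code, with shifts, so the host's endianness never enters into it (a sketch, assuming 8-bit bytes on the wire):

```cpp
#include <cstdint>

// Encode a 16-bit value big-endian (high byte first) regardless of the
// host's native byte order; decoding reverses the shifts. The wire
// format is defined by this code, not by the machine.
void put_u16_be(unsigned char* out, std::uint16_t v) {
    out[0] = static_cast<unsigned char>(v >> 8);
    out[1] = static_cast<unsigned char>(v & 0xFFu);
}

std::uint16_t get_u16_be(const unsigned char* in) {
    return static_cast<std::uint16_t>((in[0] << 8) | in[1]);
}
```

This is exactly the kind of "defined endianness" layer a bitstream facility would have to standardize; on machines with wider chars the mapping from these logical 8-bit bytes to storage units still has to be specified separately.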
Anthony
--
Anthony Williams
Software Engineer, Nortel Networks Optoelectronics
The opinions expressed in this message are not necessarily those of my
employer
Author: news/comp.std.c++@nmhq.net (Niklas Matthies)
Date: Mon, 21 May 2001 20:25:09 GMT
On Mon, 21 May 2001 18:45:18 GMT, glancaster <glancaster@ntlworld.com> wrote:
> NM =Niklas Matthies
> GL = Garry Lancaster
>
> NM: Again, these operations are something that people who need to access
> NM: data records with packed members need to do anyway.
>
> GL: Who are these people? What are they actually trying to do? See
> GL: point 6 below.
Me, and I'm certainly not alone. As someone else already explained in
this thread, the main purpose is I/O of binary formats that have a
non-padded (or at least implementation-independently padded) and
sometimes misaligned structure, and to safely and portably read and
write to the fields of such structures.
> NM: This proposed
> NM: extension allows them to let the compiler choose the most efficient way
> NM: to do this, and to not clutter the source code with these operations.
>
> GL: And by hiding what is actually happening, increasing the liklihood
> GL: that people will use this inefficient technique, even when they
> GL: don't actually have to.
It's still within the spirit of C++ to allow people to shoot themselves
in the foot. I don't really buy that argument; people who decide to
use packed structures without being aware that they trade off speed for
space will get what they deserve. This is similar to, say, using
bitfields where they aren't appropriate.
> NM: One approach that seems straightforward to me
> NM: is that `packed' would be a qualifier in object declarations (much like
> NM: const and volatile), not in type definitions. E.g.:
>
> NM: A a; // regular non-packed A
> NM: packed A pa; // packed A
>
> NM: All members of a packed-declared object would be "packed" themselves,
> NM: recursively. Alternative versions of member functions would be generated
> NM: when necessary.
>
> GL: Sometimes the compiler would not have the information to do this i.e.
> GL: if the definition of a member function was in another translation unit.
> GL: You need help from the linker or a pre-link stage. This is a can of
> GL: worms. Avoid it. Ban non-POD types from packed structs (I think the
> GL: OP may have suggested this.)
I agree that this would be a reasonable trade-off.
> NM: It would be undefined behaviour to access packed objects
> NM: through pointers to the non-packed-qualified type (similar as with const
> NM: and volatile). For non-aggregate types, "packed" translates to merely
> NM: "not necessarily aligned". E.g. in
>
> NM: struct A { char c; int i; };
> NM: packed A pa;
>
> NM: the type of `pa.i' would be `packed int', the type of `&pa.i' would be
> NM: `packed int *', and the compiler would know that for accesses to `packed
> NM: int' it needs to generate misalignment-safe code.
>
> GL: So you have abandoned your original idea of disallowing pointers and
> GL: references to members and now added new types of pointers and
> GL: references to the language.
Yes.
> GL: Which conversions will you allow between
> GL: the different pointer and reference types?
> GL: It seems that T* => packed T* and T& => packed T& would be
> GL: obvious ones to allow, but are flawed.
For non-aggregate types, T -> packed T should be fine. For aggregate
types, member accesses wouldn't generally behave as expected any more; I
suppose that's what you mean by their being flawed.
> What about packed T* => void*?
Seems okay to me.
> GL: Are any conversions implicit or do they require const_cast<>,
> GL: static_cast<> or reinterpret_cast<>?
I personally wouldn't mind at all banning implicit conversions.
The introduction of a packed_cast<> that is valid for those conversions
where member accesses using the target type still have well-defined
behavior would make sense. const_cast<> doesn't make sense (or would
even be dangerous), and reinterpret_cast<> would take away too much of
the benefit of packed structures (i.e. safe accesses). Extending
static_cast<> to cover this case might be an option, but a dedicated
cast operator seems more sensible.
> GL: Existing C++98 code will not accept your new pointer and
> GL: reference types, unless you define conversions
> GL: from packed to current pointer/ref types.
The only reasonable types for passing pointers to existing code are
void * and unsigned char *. The purpose of the proposed packed structure
facility is exactly for such code that needs to handle binary structures.
Most often, this code will be located within a single module, or a small
number thereof, so I wouldn't expect passing pointers to such
structures around to be a very common need, at least not where the type
of the pointer is of interest.
> GL: The standard library will of course need to be updated to
> GL: accept packed refs and pointers when sensible - this could greatly
> GL: increase it's size.
I don't think so. Packed pointers can have the same representation as
void * or unsigned char *, so there should be no problem passing them to
library functions that process raw memory or character strings. I see no
need to have them passed to library functions for other purposes. What
functions are you thinking of?
> GL: BTW, what is the type of &pa in your example? packed A* or A*?
It is packed A*.
> NM: Well, I didn't make the original proposal, I'm just defending its
> NM: purpose, and yes, I obviously need to make things up as we go along,
> NM: just as you point out possible problems. IMHO that's a Good Thing.
>
> GL: I hope you think it is fair to point out how much you have extended the
> GL: original, intuitively appealing, idea, in order to get it to work.
Yes. I don't think I have extended the original idea, though, but just
pointed out mechanisms that would make it work, as objections were
raised to the implementability of the idea.
> GL: So far the mods are:
>
> GL: 1. Compiler to add extra invisible copies to cope with non-alignment.
I don't conceptually see them as copies, no more than in arithmetic
expressions, the values of subexpressions are "copies", or the accesses
of bit-fields involve copies. Copying into an aligned object is one
possible implementation strategy, yes, but certainly not a generally
required one and only needed on particular architectures. The rationale
is that the compiler should know the most efficient way to implement
this, so the programmer should not have to fiddle around with
complicated expressions involving unsafe casts and hope for the
compiler's optimizer to be smart enough (which it generally isn't) to
see that what the programmer actually wants is to read the value out of
a possibly misaligned object.
> GL: 2. New types of pointers and references added to language and
> GL: library e.g.
> GL: packed int*. How these work exactly is not currently defined.
To the language. As said above, I currently don't see the library
relevance. As for how they work, details would certainly need to be
hammered out, but as long as packed T pointers can be passed
around and can be converted to void * and character pointers, I guess
that everyone who has a need for packed structures will be content.
Remember that I didn't attempt at any time to make any proposal ready
to be added to the standard, but merely to defend the implementability
of the concept of a packed structure.
> GL: 3. (a - If you take my advice) Non-POD types cannot be packed.
> GL: (b - If you don't) Requirement for pre-linker or linker
> enhancements.
Yes.
Since I currently do not plan at all to write any specific proposal,
the question of whether I "take your advice" is moot. We are merely
discussing the possibilities to get a grip on their feasibility,
benefits and drawbacks.
> GL: And there's some other points that need addressing.
>
> GL: 4. Do you allow bit-fields in packed structs. If so, will you specify
> GL: their binary layout (currently implementation defined)?
It might make sense to do so. But it may possibly be wiser to let it be,
or to provide this functionality by different means, to prevent people
from coming to the wrong conclusion that bit-fields in non-packed
structures have an implementation-independent layout.
> GL: 5. Do you want to pack down to the *bit* level? Important for bit-fields
> GL: and (for some people's ideas of) the proposed fixed-size types.
No, of course not, and I don't think some people's ideas of the proposed
fixed-size types are very well founded if we want CHAR_BIT to continue
to be implementation-defined.
> GL: 6. Last, but not least, what do you want packed structs for? In my
> GL: experience people ask for something like this to
>
> GL: (a) map exactly to a hardware structure
>
> GL: (b) create a binary standard file layout with a
> GL: single call to a binary file output function, passing a pointer to the
> GL: struct.
>
> GL: The proposed solution is massive overkill for (a) and
> GL: insufficient for (b).
As I wrote above, primarily (b), although it is certainly suitable for
(a), and might also be used in situations unrelated to binary I/O where
space efficiency at any cost has highest priority.
The sketched mechanism is sufficiently helpful for (b) for my tastes.
It solves one particular and well-defined problem, namely the
implementation-definedness and non-suppressibility of the placement of
padding bytes within structures. There are a lot of things it doesn't
solve, like endianness, representation of signed integers and of
floating types and the like, and the inherent and practically
unfixable tie of binary formats (and I/O) to the architecture-dependent
byte size.
-- Niklas Matthies
Author: jk@steel.orel.ru (Eugene Karpachov)
Date: Tue, 22 May 2001 20:27:43 GMT
Mon, 21 May 2001 18:48:14 GMT, Anthony Williams wrote:
>Thinking about it, though, why can't we just use bitfields?
Because their placement is not standardized.
--
jk
Author: "glancaster" <glancaster@ntlworld.com>
Date: Fri, 18 May 2001 11:57:21 GMT
> > Chris Newton suggested:
> > > 1. Fixed-size integral types
> >
> > Which ones? Bear in mind that, as the standard currently works, they
> > would have to be implementable on all machines that can currently
> > support C++. I fear this would not be portable.
Niklas Matthies:
> All fixed-size integral types are implementable on all machines. Not
> necessarily using architecture-built-in "types", but the standard does
> not require this anyway.
The standard does not currently require fixed-size integral types *at all*.
I'm curious as to how you would implement, say, an 8 bit type on a machine
with 9 bit bytes.
> > > 2. Packed structures
> >
> > Different machines have different alignment requirements. So, again, I
> > can't see a portable way of doing this.
>
> The compiler always has the option to generate code that loads and
> stores members of packed structures with multiple (aligned) loads and
> stores of smaller memory objects (bytes, in the worst case).
For POD types the *programmer* has the option of doing this *now* (3.9/2).
You don't need an extension.
For other types, C++ does not currently require the necessary copy operation
to be supported. For a good example why not, consider a class with a custom
copy ctor and custom copy assignment. Essentially, you seem to be arguing
that it is OK to bypass these and use bitwise copy. I'm sure you are
familiar with the arguments against this.
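The option 3.9/2 gives the programmer for POD types can be sketched as a memcpy into an aligned object, so no unaligned pointer is ever dereferenced (the record layout here is hypothetical):

```cpp
#include <cstring>
#include <cstdint>
#include <cstddef>

// Read a 32-bit POD field from an arbitrary (possibly misaligned)
// offset in a raw record buffer. Copying into an aligned local via
// memcpy is what 3.9/2 guarantees to work for POD types.
std::uint32_t read_field(const unsigned char* record, std::size_t offset) {
    std::uint32_t v;
    std::memcpy(&v, record + offset, sizeof v);
    return v;
}
```

Note that the value read is in the host's byte order; a defined wire format would still need explicit byte-order handling on top of this.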
> Of course,
> taking pointers or references from members of packed structures would
> have to be disallowed
A significant change and not one that makes the language more intuitive.
> (or their dereferencing undefined),
No. Some machines place requirements on pointers even when they are not
explicitly dereferenced (see the recent thread on "deleted pointers in a
container").
Kind regards
Garry Lancaster
Codemill Ltd
mailto << "glancaster" << at << "codemill" << "net";
Visit our web site at http://www.codemill.net
Author: news/comp.std.c++@nmhq.net (Niklas Matthies)
Date: Fri, 18 May 2001 16:41:33 GMT
On Fri, 18 May 2001 11:57:21 GMT, glancaster <glancaster@ntlworld.com> wrote:
> > > Chris Newton suggested:
> > > > 1. Fixed-size integral types
> > >
> > > Which ones? Bear in mind that, as the standard currently works, they
> > > would have to be implementable on all machines that can currently
> > > support C++. I fear this would not be portable.
>
> Niklas Matthies:
> > All fixed-size integral types are implementable on all machines. Not
> > necessarily using architecture-built-in "types", but the standard does
> > not require this anyway.
>
> The standard does not currently require fixed-size integral types *at all*.
I meant to say that the standard doesn't require the C++ built-in types
to be implemented as architecture-built-in types. Hence this shouldn't
be an impediment for new C++ built-in types.
> I'm curious as to how you would implement, say, an 8 bit type on a
> machine with 9 bit bytes.
By letting it have 1 padding bit.
At least I assume that the original poster doesn't actually want a type
that always occupies exactly N bits (and no more) in memory, but one
whose range of values is exactly [0, 2^N -1] or [-2^(N-1), 2^(N-1)-1].
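The value-range reading given here — exactly [0, 2^N - 1] regardless of storage width — can be approximated today at the library level with masking (a sketch; the name is made up):

```cpp
#include <cstdint>

// Reduce an unsigned value to N bits, yielding the range [0, 2^N - 1]
// no matter how wide the underlying storage type is. On a 9-bit-byte
// machine this is exactly the "one padding bit" implementation.
template <unsigned N>
std::uint32_t wrap_to_bits(std::uint32_t v) {
    static_assert(N > 0 && N < 32, "N must be in (0, 32)");
    return v & ((std::uint32_t(1) << N) - 1);
}
```

A built-in fixed-size type would do this masking implicitly on stores, which is what makes it implementable on any byte size.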
> > > > 2. Packed structures
> > >
> > > Different machines have different alignment requirements. So, again, I
> > > can't see a portable way of doing this.
> >
> > The compiler always has the option to generate code that loads and
> > stores members of packed structures with multiple (aligned) loads and
> > stores of smaller memory objects (bytes, in the worst case).
>
> For POD types the *programmer* has the option of doing this *now* (3.9/2).
> You don't need an extension.
You need an extension to declare and access such misaligned members in
the same simple syntax as you use for regular aligned members today.
Ideally, inserting or removing some qualifier (say, "packed") in the
type's declaration should be sufficient to switch the type between being
packed or being non-packed, and the code accessing its members needn't
be changed.
> For other types, C++ does not currently require the necessary copy
> operation to be supported. For a good example why not, consider a
> class with a custom copy ctor and custom copy assignment. Essentially,
> you seem to be arguing that it is OK to bypass these and use bitwise
> copy. I'm sure you are familiar with the arguments against this.
I'm truly confused about what problem you're seeing here. I'm certainly
not arguing anything like bypassing custom copy ctors and custom
assignment.
> > Of course, taking pointers or references from members of packed
> > structures would have to be disallowed
>
> A significant change and not one that makes the language more intuitive.
You already have that situation for register variables, bit fields and
temporaries. The features proposed here are not to make the language
more intuitive, but more useful.
> > (or their dereferencing undefined),
>
> No. Some machines place requirements on pointers even when they are not
> explicitly dereferenced (see the recent thread on "deleted pointers in a
> container").
OK, make their use undefined (since it isn't possible on all
architectures to handle comparisons and casts correctly for such
pointers).
Making just their dereferencing undefined was intended to allow the
address to be taken of members of packed structures that happen to be
aligned, or on architectures where misalignment results in no more than
a performance penalty.
-- Niklas Matthies
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: "James Kuyper Jr." <kuyper@wizard.net>
Date: Fri, 18 May 2001 16:47:27 GMT Raw View
glancaster wrote:
...
> I'm curious as to how you would implement, say, an 8 bit type on a machine
> with 9 bit bytes.
By using only 8 of the 9 bits, the same way that bit-fields are
implemented in structures.
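For reference, the trick can be sketched with an ordinary bit-field (the struct and function names here are invented for illustration): only 8 bits participate in the value, and any remaining bits of the storage unit are padding.

```cpp
#include <cassert>

// An "8-bit" unsigned type carved out of the machine's natural storage
// unit: only 8 bits participate in the value, and any remaining bits
// (the 9th, on a 9-bit-byte machine) are simply padding.
struct UInt8Field {
    unsigned int value : 8;
};

// Excess bits are discarded on assignment, so arithmetic wraps mod 2^8.
unsigned int wrap_add(unsigned int a, unsigned int b) {
    UInt8Field r;
    r.value = a + b;
    return r.value;
}
```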
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.research.att.com/~austern/csc/faq.html ]
Author: "glancaster" <glancaster@ntlworld.com>
Date: Sat, 19 May 2001 08:57:37 GMT Raw View
> > > > Chris Newton suggested:
> > > > > 1. Fixed-size integral types
Garry Lancaster:
> > > > Which ones? Bear in mind that, as the standard currently works, they
> > > > would have to be implementable on all machines that can currently
> > > > support C++. I fear this would not be portable.
> > Niklas Matthies:
> > > All fixed-size integral types are implementable on all machines. Not
> > > necessarily using architecture built-in "types", but the standard does
> > > not require this anyway.
Garry Lancaster:
> > The standard does not currently require fixed-size integral types *at all*.
Niklas Matthies:
> I meant to say that the standard doesn't require the C++ built-in types
> to be implemented as architecture built-in types. Hence this shouldn't be
> an impediment for new C++ built-in types.
Because the ranges of the current builtins have very few constraints, it is
(almost?) always possible for them to be implemented as architecture-builtin
types, even though the standard doesn't say so.
> > I'm curious as to how you would implement, say, an 8 bit type on a
> > machine with 9 bit bytes.
>
> By letting it have 1 padding bit.
Ah. I see where you're coming from now.
> At least I assume that the original poster doesn't actually want a type
> that always occupies exactly N bits (and no more) in memory, but one
> whose range of values is exactly [0, 2^N -1] or [-2^(N-1), 2^(N-1)-1].
This would be rather inefficient and annoying to implement on machines
that don't have N-bit bytes, since it requires extra operations to keep
values in range. This "tyranny of the many" is arguably against the
"spirit" of C++.
As I said before, I think the idea has merit as an optional part of the
standard for certain machine classes.
> > > > > 2. Packed structures
> > > >
> > > > Different machines have different alignment requirements. So, again, I
> > > > can't see a portable way of doing this.
> > >
> > > The compiler has always the option to generate code that loads and
> > > stores members of packed structures with multiple (aligned) loads and
> > > stores of smaller memory objects (bytes, in the worst case).
> >
> > For POD types the *programmer* has the option of doing this *now* (3.9/2).
> > You don't need an extension.
>
> You need an extension to declare and access such misaligned members in
> the same simple syntax as you use for regular aligned members today.
> Ideally, inserting or removing some qualifier (say, "packed") in the
> type's declaration should be sufficient to switch the type between being
> packed or being non-packed, and the code accessing its members needn't
> have to be changed.
I take it you mean the "source code" when you say "code" in your last
sentence? The code the compiler generates will have to change a lot.
For example, you mean that on some machines a structure such as this:
packed struct A
{
    char c;
    int i;  // Not aligned.
};
would require a simple ++i to generate:
1. A non-aligned copy to an aligned integer i_temp.
2. Increment aligned i_temp.
3. A non-aligned copy back to non-aligned i.
Although C++ often inserts extra non-visible code behind the scenes,
AFAIK it has never before done this for basic operations on fundamental
types like ints.
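For POD members those three steps can already be written out by hand with memcpy (per 3.9/2), which is presumably the code a packed-aware compiler would generate. A sketch, with all names invented:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hand-written equivalent of ++i for a misaligned int living at some
// byte offset inside a packed buffer:
//   1. copy the bytes out to an aligned temporary,
//   2. increment the aligned temporary,
//   3. copy the bytes back to the misaligned location.
void increment_misaligned(unsigned char* bytes, std::size_t offset) {
    int i_temp;
    std::memcpy(&i_temp, bytes + offset, sizeof i_temp);
    ++i_temp;
    std::memcpy(bytes + offset, &i_temp, sizeof i_temp);
}
```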
> > For other types, C++ does not currently require the necessary copy
> > operation to be supported. For a good example why not, consider a
> > class with a custom copy ctor and custom copy assignment. Essentially,
> > you seem to be arguing that it is OK to bypass these and use bitwise
> > copy. I'm sure you are familiar with the arguments against this.
>
> I'm truly confused about what problem you're seeing here. I'm certainly
> not arguing anything like bypassing custom copy ctors and custom
> assignment.
For example,
class WithCustomCopy {...};  // Requires 4-byte alignment.

packed struct A
{
    char c;
    WithCustomCopy wcc;  // Unaligned.
};
How do you copy a normal WithCustomCopy object to wcc or vice versa? You
can't use the copy assignment operator it defines because that won't copy to
or from an unaligned location (plus no references allowed, remember?). Ctors
and dtors have the same problems.
> > > Of course, taking pointers or references from members of packed
> > > structures would have to be disallowed
> >
> > A significant change and not one that makes the language more intuitive.
>
> You already have that situation for register variables, bit fields and
> temporaries. The features proposed here are not to make the language
> more intuitive, but more useful.
Intuitive and useful go hand in hand. (Incidentally, you are wrong about
register variables (7.1.1/3) and, in some situations, temporaries - you can
bind a const reference parameter to a temporary.)
It seems to me that what you propose is not really a "packed structure" at
all, but a new type of aggregate that has some of the rules of structures
and some completely new rules that you just made up.
For me, the cost/benefit balance is not persuasive.
[snip!]
Kind regards
Garry Lancaster
Codemill Ltd
mailto << "glancaster" << at << "codemill" << dot << "net";
Visit our web site at http://www.codemill.net
Author: "glancaster" <glancaster@ntlworld.com>
Date: Thu, 17 May 2001 17:37:38 GMT Raw View
Chris Newton suggested:
> 1. Fixed-size integral types
Which ones? Bear in mind that, as the standard currently works, they would
have to be implementable on all machines that can currently support C++. I
fear this would not be portable.
> (and other useful common types)
I certainly agree with this, but it isn't terribly specific ;-)
> 2. Packed structures
Different machines have different alignment requirements. So, again, I can't
see a portable way of doing this.
Basically, given the range of machines C++ currently supports, the
standard can't offer these things. Breaking with this pattern and
offering optional parts of the standard, for certain machine classes, is
certainly being considered, according to presentations from C++ standard
committee members at the recent ACCU conference.
For instance, 8, 16 and 32 (and 64?) bit fixed-size integrals could
certainly be added to an optional part of a standard for "machines that can
run Java" only. But it has to be made clear that anyone using these features
is sacrificing some portability.
Kind regards
Garry Lancaster
Codemill Ltd
mailto << "glancaster" << at << "codemill" << dot << "net";
Visit our web site at http://www.codemill.net
Author: news/comp.std.c++@nmhq.net (Niklas Matthies)
Date: Thu, 17 May 2001 18:21:53 GMT Raw View
On Thu, 17 May 2001 17:37:38 GMT, glancaster <glancaster@ntlworld.com> wrote:
> Chris Newton suggested:
> > 1. Fixed-size integral types
>
> Which ones? Bear in mind that, as the standard currently works, they
> would have to be implementable on all machines that can currently
> support C++. I fear this would not be portable.
All fixed-size integral types are implementable on all machines. Not
necessarily using architecture built-in "types", but the standard does
not require this anyway.
> > 2. Packed structures
>
> Different machines have different alignment requirements. So, again, I
> can't see a portable way of doing this.
The compiler has always the option to generate code that loads and
stores members of packed structures with multiple (aligned) loads and
stores of smaller memory objects (bytes, in the worst case). Of course,
taking pointers or references from members of packed structures would
have to be disallowed (or their dereferencing undefined), but otherwise
I see nothing that prevents implementing packed structures on any
machine.
-- Niklas Matthies
Author: "Stephen Howe" <SPAMstephen.howeGUARD@tnsofres.com>
Date: Wed, 16 May 2001 19:59:48 GMT Raw View
> 3. Random access binary file I/O
>
> Speaking of reading and writing binary files, I think it's long past
> time C++ acknowledged the existence of structured files as well as
> streamed I/O. A simple framework allowing random-access file I/O is well
> overdue.
Have I missed something?
What is wrong with fstream.read() & fstream.write()? They read/write
binary data. Granted, you need a reinterpret_cast for types other than
char, but this mirrors the stdio functions fread() and fwrite().
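A minimal round trip with the existing interface, for illustration (the function and file names are invented):

```cpp
#include <cassert>
#include <fstream>

// Write an int to a binary file and read it back with the existing
// fstream interface; the reinterpret_cast mirrors the void* parameters
// of fread()/fwrite().
bool round_trip(const char* path, int value, int& result) {
    {
        std::ofstream out(path, std::ios::binary);
        if (!out) return false;
        out.write(reinterpret_cast<const char*>(&value), sizeof value);
    }
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.read(reinterpret_cast<char*>(&result), sizeof result);
    return !in.fail();
}
```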
Stephen Howe
Author: "Al Grant" <tnarga@arm.REVERSE-NAME.com>
Date: Tue, 15 May 2001 17:27:27 GMT Raw View
"Chris Newton" <chrisnewton@no.junk.please.btinternet.com> wrote in message
news:9dpi6h$rcr$1@uranium.btinternet.com...
> This appears to be a "must have" in many people's books, and rightly so,
> IMHO. I would prefer a parameterized notation, e.g., int<32>
You can do this with a traits type. It should also be
possible to say things like
    template<class T> typename Num<T>::Unsigned abs(T n) ...
Easy to define this yourself, but it should be in the
standard.
Making the types any more "built in" than that would be
confusing, as they are only aliases for some base type,
and overloading etc. will be in terms of that base type,
not the multiple synonyms by which it is referred.
(Of course C++ should adopt the C99 int stuff too.)
> While we're on the subject of types, I'd also like to see basic
> fixed-point support, either as a fundamental type or as a small addition
> to the library. All you need is basic arithmetic, basic I/O support and
> the *guarantee* that no rounding errors are going to creep in. Would
> suggesting something like fixed<N> (where N is the number of fixed
> places) be too obvious? :-) (OK, you probably also want to allow
> selection of the underlying integral type somehow.)
I would much rather see C++ advance to the point where
it is possible to define a fixed-point number class and
have it implemented efficiently. That means, for
example, supporting
    fixed<5> n;            // Null operation
    fixed<7> n(20);        // But has a constructor for this
    union { fixed<6> n; ... };
    volatile fixed<8> n;   // Natural copy semantics
The problem with making fixed<> a built-in type (and as
an ex-PL/1 programmer I accept the usefulness) is that
the same problems remain for the next programmer who
wants a special type - saturating arithmetic being one
obvious example.
> 2. Packed structures
>
> I would like to see the addition of a keyword forcing a "packed"
> structure with no extra padding or alignment requirements (in
> particular, the sizeof the structure is the sum of the sizeof each
> member of the structure). You would also need a guarantee that the
> members of the structure would be stored in the order they were named.
Yes, this is long overdue and already implemented
(differently) by any real-world compiler. But there
are degrees of alignment between "unaligned" and
"natural alignment" and perhaps this should be reflected.
> This, combined with the fixed-size types above, would finally allow C++
> to read in data from a binary file on disk without hassle
Apart from the endianness problem. This should also
be solved. I have in the past done this with a set of
"wrong-endian" integer wrapper classes (whose assignment
operators etc. do the byte swapping) but ran into the
fundamental efficiency problems I mentioned above.
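For illustration, such a wrapper might look like the sketch below (names invented). The efficiency problem is visible: every single access pays for a byte-by-byte swap.

```cpp
#include <cassert>

// A 32-bit integer stored big-endian in memory regardless of host byte
// order. Assignment and the conversion operator do the byte swapping,
// which is exactly where the per-access cost comes from.
class BigEndian32 {
    unsigned char b[4];
public:
    BigEndian32& operator=(unsigned long v) {
        b[0] = static_cast<unsigned char>(v >> 24);
        b[1] = static_cast<unsigned char>(v >> 16);
        b[2] = static_cast<unsigned char>(v >> 8);
        b[3] = static_cast<unsigned char>(v);
        return *this;
    }
    operator unsigned long() const {
        return (static_cast<unsigned long>(b[0]) << 24)
             | (static_cast<unsigned long>(b[1]) << 16)
             | (static_cast<unsigned long>(b[2]) << 8)
             |  static_cast<unsigned long>(b[3]);
    }
};
```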
> 3. Random access binary file I/O
>
> Speaking of reading and writing binary files, I think it's long past
> time C++ acknowledged the existence of structured files as well as
> streamed I/O.
You mean, wrap the C library functions fseek, ftell etc.?
This would be a good idea, if not exactly urgent
(because of the C functions).
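Such a wrapper is little more than this sketch (class and member names invented):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Minimal random-access file: read or write a block at an absolute
// byte position, wrapping fopen()/fseek()/fread()/fwrite().
class RandomAccessFile {
    std::FILE* f;
public:
    explicit RandomAccessFile(const char* path)
        : f(std::fopen(path, "w+b")) {}
    ~RandomAccessFile() { if (f) std::fclose(f); }
    bool ok() const { return f != 0; }
    bool write_at(long pos, const void* data, std::size_t n) {
        return std::fseek(f, pos, SEEK_SET) == 0
            && std::fwrite(data, 1, n, f) == n;
    }
    bool read_at(long pos, void* data, std::size_t n) {
        return std::fseek(f, pos, SEEK_SET) == 0
            && std::fread(data, 1, n, f) == n;
    }
};
```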
It would also be nice if there were more C functions
(wrappable in C++ classes) for things such as sparse
file maps, multiple name spaces, multiple data streams,
and locks.
But what I'd really like to see, and where C++ could
show off, is a class that provided a "memory mapped"
view of a random-access file - where you could obtain
data structure pointers directly into file buffers,
rather than copying the data to and from data
structures you allocate yourself. This is not
trivial (unless the OS provides it as a primitive,
which in practice requires 64-bit pointers), but even
a half-decent implementation would make binary file
updating a lot easier.
Author: "Chris Newton" <chrisnewton@no.junk.please.btinternet.com>
Date: Mon, 14 May 2001 23:06:15 GMT Raw View
Dear all,
After following the recent threads on C++0x, I am surprised to find so
few comments about support for binary data. I see this as one of the
biggest problems with the C++ standard library, and to a lesser extent
the language itself, at present.
I would like to invite comments on the following new features to be
added.
1. Fixed-size integral types (and other useful common types)
2. Packed structures
3. Random access binary file I/O
1. Fixed-size integral types (and other useful common types)
This appears to be a "must have" in many people's books, and rightly so,
IMHO. I would prefer a parameterized notation, e.g., int<32>, to
something like __int32 on flexibility grounds. The former could allow
symbolic constants to represent the width of the type, easy selection of
the largest available integral type with help from <limits>, etc. The
latter just seems unnecessarily restrictive. At that point, short, int
and long simply map onto int<N> for suitable N, and there is no need to
anticipate the names of longer types, or of types of unusual size on
unusual platforms.
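Pending new syntax, a library-only approximation of int<N> can already select among the existing built-ins via templates (names invented; only short and int are handled in this sketch, but the full set of built-ins is handled the same way):

```cpp
#include <cassert>
#include <climits>

// Map a requested bit width onto the smaller of two existing built-in
// types that is at least that wide; a full version would also consider
// signed char, long, and so on.
template<int N, bool FitsShort = (N <= CHAR_BIT * (int)sizeof(short))>
struct Int { typedef int Type; };             // fall back to int

template<int N>
struct Int<N, true> { typedef short Type; };  // short is wide enough
```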
I'd also like to see an overhaul of character representation. In
particular, I'd like to break the association between char (as an entry
in the character set, with matching literals, etc.) and char (as the
only sensible way to represent an 8-bit byte on many current platforms).
Perhaps a new "byte" type to match short, int and long, breaking the
association with char would be in order? That could, of course, just be
a familiar alias for int<8>. Actually, it might also be sensible to
allow char<N> representation for fixed-size characters (e.g., matching
UTF-8, UTF-16 and UTF-32). Again, char and wchar_t then simply become
shorthand for suitable char<N> types. As internationalisation and larger
character sets become more important over the next few years, it would
be good to see C++ having a head start in supporting them.
While we're on the subject of types, I'd also like to see basic
fixed-point support, either as a fundamental type or as a small addition
to the library. All you need is basic arithmetic, basic I/O support and
the *guarantee* that no rounding errors are going to creep in. Would
suggesting something like fixed<N> (where N is the number of fixed
places) be too obvious? :-) (OK, you probably also want to allow
selection of the underlying integral type somehow.)
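As a library sketch, a fixed<N> holding N binary (rather than decimal) fraction places in a scaled long gives exact addition and subtraction within range (the names Fixed and from_raw are invented):

```cpp
#include <cassert>

// Fixed-point value with N binary fraction places, stored as a scaled
// long. Addition and subtraction are exact as long as values stay in
// range; no rounding error can creep in.
template<int N>
class Fixed {
    long raw;  // stored value times 2^N
public:
    explicit Fixed(long units = 0) : raw(units << N) {}
    static Fixed from_raw(long r) { Fixed f; f.raw = r; return f; }
    Fixed operator+(Fixed o) const { return from_raw(raw + o.raw); }
    Fixed operator-(Fixed o) const { return from_raw(raw - o.raw); }
    bool operator==(Fixed o) const { return raw == o.raw; }
};
```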
2. Packed structures
I would like to see the addition of a keyword forcing a "packed"
structure with no extra padding or alignment requirements (in
particular, the sizeof the structure is the sum of the sizeof each
member of the structure). You would also need a guarantee that the
members of the structure would be stored in the order they were named.
This, combined with the fixed-size types above, would finally allow C++
to read in data from a binary file on disk without hassle and
platform-specific extensions to guarantee that the data is read as you
want. I'm sure a sensible and obvious set of rules for when packed is
allowed could be quickly worked out, perhaps starting from the
requirement that a packed struct be a POD type.
3. Random access binary file I/O
Speaking of reading and writing binary files, I think it's long past
time C++ acknowledged the existence of structured files as well as
streamed I/O. A simple framework allowing random-access file I/O is well
overdue. Just provide a hierarchy by analogy with the fstream hierarchy,
but with input and output methods taking a position, a length and a
void* for the data (which might, he noted, point to a packed struct
known to be in the correct format for the file's data).
Each of the above features is used every day in thousands of programs,
yet currently requires platform- or compiler-specific extensions or
guarantees to work. While, inevitably, there will be a small number of
specialist platforms where these features are not appropriate and/or
viable, they would be useful to very, very many C++ programmers. I see
them as simple additions, long overdue, and hope that the community here
will support them.
Regards,
Chris