Topic: Serialization and RTTI/Introspection
Author: allan_w@my-dejanews.com (Allan W)
Date: Tue, 8 Apr 2003 20:33:31 +0000 (UTC)
eforren@cox.net ("Eddie Forren") wrote
> Currently, RTTI works for classes that have virtual functions.
> This makes sense because polymorphic type info needs to be
> accessed at run time. Perhaps this is not general enough
> though. Maybe RTTI should work for anything that gets allocated
> using the new operator ( or an overloaded, rtti version of the
> new operator). This would allow run time type information to be
> accessed for any pointer that is created using the "rtti" new
> operator(s).
What you're proposing would require massive amounts of memory at
runtime.
int * i = new int(3);
unsigned * j = new unsigned;
Now we need RTTI for type int.
What happens when we try to get RTTI information for RTTI information?
Currently most systems treat int identically to either short or long.
But now we would need to carry around extra information so that the RTTI
on a short would give us different information than the RTTI on a long.
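For what it's worth, standard C++ already provides a std::type_info for every type, fundamental types included, and int, short, and long compare unequal even where two of them share a size. What it does not do is attach that information to a heap allocation: typeid on a non-polymorphic expression just reports the static type. A small check, assuming any C++11 compiler (the function names are illustrative only):

```cpp
#include <cassert>
#include <typeinfo>

// typeid works on fundamental types with no opt-in. The type_info objects
// have static storage duration and exist per type, not per allocation.
bool distinct_builtin_types() {
    return typeid(int) != typeid(short) &&
           typeid(int) != typeid(long) &&
           typeid(short) != typeid(long);
}

// For a non-polymorphic operand, typeid reports the static type of the
// expression: the heap int is "int" only because the pointer type says so.
bool heap_int_is_int() {
    int* i = new int(3);
    bool same = (typeid(*i) == typeid(int));
    delete i;
    return same;
}
```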
> After all, void* is a polymorphic pointer (sort of). One should be
> able to determine the data type and length that is referenced by
> void*.
Wouldn't the RTTI just tell you that the type is void?
> This solution could also solve the ambiguity of normal type*
> declarations and char* declarations. Type infomation for
> all non-pointer types should be available to the compiler at
> compile time.
>
> One way to do this:
>
> Add a new specifier called "rtti" that can be used for
> any class, struct, enum, or type definition. It flags a data type
> as needing run time type identification. You could also have a
> compiler flag to add rtti to all data types that are compiled
> in a module.
>
> rtti struct my_struct
> {
> int* my_array;
> }
>
> When new is invoked for and "rtti" data type, operator new
> with the following signature is invoked.
>
> void* operator new(size_t size, type_info* rtti_type_info)
> void* operator new[](size_t size, type_info* rtti_type_info)
>
> The implementer of operator new can use "cookie" mechanisms to
> associate the type information with the data instance.
Instead of changing the language to improve hacks like RTTI,
I think we should do things that encourage programmers to avoid
the need for them. Thoughtful use of virtual functions can (almost?)
always eliminate the need for RTTI, other than during debugging efforts.
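As a sketch of the virtual-function alternative, assuming serialization is the goal: each class writes itself through a virtual member, so the caller never has to ask what type it holds. The class and function names here are hypothetical.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Common interface: each class knows how to write itself.
struct serializable {
    virtual ~serializable() = default;
    virtual void serialize(std::ostream& out) const = 0;
};

struct point : serializable {
    int x = 0, y = 0;
    point(int x_, int y_) : x(x_), y(y_) {}
    void serialize(std::ostream& out) const override {
        out << "point " << x << ' ' << y << '\n';
    }
};

struct label : serializable {
    std::string text;
    explicit label(std::string t) : text(std::move(t)) {}
    void serialize(std::ostream& out) const override {
        out << "label " << text << '\n';
    }
};

// The caller works through the base class only; no type query is needed.
std::string dump(const serializable& s) {
    std::ostringstream out;
    s.serialize(out);
    return out.str();
}
```

The cost is that every serializable class must derive from the common base and implement the function by hand, which is the boilerplate the extended-RTTI proposals in this thread are trying to remove.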
The OP (Eddie Forren) had a good point about pointers being
ambiguous -- sometimes you can use them as the base of a hierarchy
of inherited types, and sometimes you can use them as a pointer to
the first item in an array, but never both at once -- and the pointer
itself doesn't even try to handle this information.
What can we do about it at this late date? Probably nothing. It might
have been nice to make these two declarations mean different things:
int *mypointer;
int myarray[];
but C++'s original design goal of C compatibility has rendered this
idea moot. With so much legacy code, making the change now is nigh
impossible.
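The ambiguity is easy to demonstrate in current C++ (C++11 or later assumed): in a parameter list, int a[] is adjusted to int* a, so both spellings produce the same function type, while outside a parameter list the array bound is part of the type and can be recovered. length_of is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// In a parameter list, "int a[]" (and even "int a[10]") is adjusted to
// "int* a", so the array-ness is lost in the function type.
static_assert(std::is_same<void(int*), void(int[])>::value,
              "int[] parameter is adjusted to int*");
static_assert(std::is_same<void(int*), void(int[10])>::value,
              "the bound is discarded too");

// Outside a parameter list, the array type and its bound survive.
int myarray[10];
static_assert(std::extent<decltype(myarray)>::value == 10,
              "bound is part of the type");

// Binding to a reference-to-array keeps the length available.
template <std::size_t N>
std::size_t length_of(const int (&)[N]) { return N; }
```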
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: eforren@cox.net ("Eddie Forren")
Date: Thu, 10 Apr 2003 05:48:56 +0000 (UTC)
> What you're requiring would require massive amounts of memory at
> runtime.
My assumption was that the amount of RTTI generated is controlled by the
user through the rtti specifier. I also got an email from Aatu Koskensilta
that detailed an approach that uses template specialization to control which
meta-information gets generated by the compiler. Perhaps this is needed with
my overloaded new operator instead of the rtti specifier.
To me, if you keep the meta-information, the new operator idea doesn't
change the memory requirements much. You still have one copy of
meta-information for each data type. If you choose to overload the rtti new
operator, you may choose to store a pointer to it with each data instance,
or you may choose some other scheme to associate type information with
instances at run time.
> What happens when we try to get RTTI information for RTTI information?
It doesn't have to be defined that way: don't use the rtti specifier or
Aatu's mechanisms, and the meta-information won't exist for the type_info
data type. Using Aatu's mechanisms may be better (if I understand them)
because you may be able to control how much type information is generated.
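Aatu's actual approach isn't shown in this thread, so the following is only one possible reading of "template specialization controls which meta-information gets generated": a trait whose primary template carries nothing, with metadata existing only for types that are explicitly specialized. type_meta, meta_name, and their members are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Primary template: no metadata. Types opt in by specializing.
template <typename T>
struct type_meta {
    static constexpr bool available = false;
};

struct my_struct {
    int id;
    double weight;
};

// Opt-in: metadata exists only for types someone chose to describe,
// so every other type (including std::type_info) pays nothing.
template <>
struct type_meta<my_struct> {
    static constexpr bool available = true;
    static constexpr const char* name = "my_struct";
    static constexpr std::size_t field_count = 2;
};

// Asking for metadata that was never generated is a compile-time error.
template <typename T>
const char* meta_name() {
    static_assert(type_meta<T>::available, "no metadata generated for T");
    return type_meta<T>::name;
}
```

Because type_meta<std::type_info> is never specialized, asking for metadata about the metadata fails at compile time rather than recursing.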
> Currently most systems treat int either identically to short or long.
> But now we would need to carry around extra information so that the RTTI
> on a short would give us different information than the RTTI on a long.
The idea here is that this is optional, and that it would be equivalent to
Java's Integer, Float, and other "primitive" wrapper objects.
> Wouldn't the RTTI just tell you that the type is void?
Depends on how the new operator is implemented. If the new
operator chose to put a type information pointer as a cookie before the
memory of every allocation, then void* would basically be polymorphic.
> Instead of changing the language to improve the hacks like RTTI,
> I think it is doing things to encourage programmers to avoid the
> need. Thoughtful use of virtual functions can (almost?) always
> eliminate the need for RTTI, other than during debugging efforts.
I think this is frequently true, but I think having full and complete
self-description capabilities in a language is worth small language
extensions if:
1. It can be made optional so that its overhead is not forced upon
those who do not need it.
2. It can be reasonably efficient when it is used.
3. The original spirit and intent of the language is maintained.
(For C++, this is mostly efficiency.)
The power of self-description is evident in languages like Smalltalk and
Java. Extremely generic software can be written when it is needed. (Of
course, this feature can be over-used when simpler designs are sufficient.)
My current interest is in using these self-description techniques in
heterogeneous, distributed systems, but they can also be used in databases.
Any other apps? Note: currently, C and C++ approaches to serialization,
remote procedure calls, and databases frequently require significant
external languages. If "complete" self-description capabilities are provided
as part of most languages, the need for some of these external languages
goes away (or at least parts of them can be generated from the native
language(s)). This seems to fall within a language's area of responsibility
when it can be designed in a way that meets the intent and spirit of the
language.
If "complete" self-description capabilities are not provided, then
cumbersome third-party languages that duplicate information already held in
the programming language remain necessary for several generic applications
(IDL for distributed systems, DDL for databases). Applications like generic
GUIs and GUI builders are more difficult without this feature.
If the desired functionality (perhaps what I have stated isn't desired) can
be implemented without language extensions, I'm all for it.
> What can we do about it at this late date? Probably nothing.
My answer would probably be "nothing" if I weren't interested in providing
"complete" self-description information. If self-description capabilities
are really being considered for C++, I think completeness of the type
information at compile time (new array and string declarations, etc.) or at
run time (rtti new operator) may be necessary to make those self-description
capabilities as powerful as they need to be.
> but C++'s original design goal of C compatibility has rendered this
> idea moot.
If new declarations were added, they should be added to C also.
If they are added to both languages, they are still backward compatible,
just not forward compatible (old software running against new features),
and almost nothing is forward compatible. Legacy code would only need to
"evolve" forward on an "as needed" basis.
Thanks for your input.
---
Author: eforren@cox.net ("Eddie Forren")
Date: Fri, 4 Apr 2003 05:24:25 +0000 (UTC)
Thanks to your earlier email, I learned about the array "cookie" that
indicates the length of the array when using the new operator. I have also
been using this technique in some of my own APIs (C, not C++). This is also
a response to Allan W's comments.
Given this info, I will ask my original question again (a little differently
this time).
Why not allow declarations of dynamic arrays in C++? The main purpose is
to improve the quality of the type information maintained by an extended run
time type identification system. If programmers started declaring arrays
with the new declaration, the ambiguity of the type* declaration would be
removed for most normal data structures.
Something like:
class my_class
{
    int my_array[]; // The compiler may choose to implement this with int*
                    // and an array "cookie" for the length. It could also
                    // be implemented as a structure with the array length
                    // followed by an int*.
};
These arrays would require a few new standard C/C++ functions:
malloc_array  - Same parameters as calloc, except an array is returned.
                An array consists of the length of the array and its data.
                Of course, it can be accessed with the [] operator just like
                a normal buffer.
realloc_array - Same parameters as malloc_array, except a parameter is
                added to pass in the array to be reallocated.
free_array    - Frees an array allocated or reallocated using malloc_array
                or realloc_array.
init_array    - Allows an array to be initialized from an external buffer.
                Same parameters as realloc_array. No allocation occurs for
                this function; the assumption is that the user is
                maintaining the memory somewhere (static, on the stack, in a
                memory-mapped file, on the heap, etc.).
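A minimal sketch of the proposed array functions, assuming the "cookie" implementation: a length header sits directly in front of the elements and callers get a pointer to the elements, so ordinary [] indexing still works. The function names come from the proposal, but the signatures are assumptions, array_len is one possible spelling of the length accessor, and init_array is omitted because a caller-owned buffer would also need room reserved for the header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical length "cookie" stored immediately before the elements.
// Aligned like max_align_t so the element storage after it stays aligned.
struct alignas(std::max_align_t) array_header {
    std::size_t length;
};

// Computes header + count * size, returning false on size_t overflow.
static bool block_size(std::size_t count, std::size_t size, std::size_t& out) {
    if (size != 0 && count > (SIZE_MAX - sizeof(array_header)) / size) return false;
    out = sizeof(array_header) + count * size;
    return true;
}

// Like calloc(count, size), but the returned block remembers its length.
void* malloc_array(std::size_t count, std::size_t size) {
    std::size_t bytes;
    if (!block_size(count, size, bytes)) return nullptr;
    array_header* h = static_cast<array_header*>(std::calloc(1, bytes));
    if (!h) return nullptr;
    h->length = count;
    return h + 1;  // caller sees only the elements
}

// Resizes the array. Newly added elements are NOT zeroed. On failure the
// old array is left untouched and nullptr is returned.
void* realloc_array(void* data, std::size_t count, std::size_t size) {
    if (!data) return malloc_array(count, size);
    std::size_t bytes;
    if (!block_size(count, size, bytes)) return nullptr;
    void* raw = std::realloc(static_cast<array_header*>(data) - 1, bytes);
    if (!raw) return nullptr;
    array_header* h = static_cast<array_header*>(raw);
    h->length = count;
    return h + 1;
}

void free_array(void* data) {
    if (data) std::free(static_cast<array_header*>(data) - 1);
}

// Reads the length back out of the cookie.
std::size_t array_len(const void* data) {
    return (static_cast<const array_header*>(data) - 1)->length;
}
```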
The default new and delete array operators would use some of the new
functions internally:
new operator for arrays - calls malloc_array
placement new operator for arrays - calls init_array
delete operator for arrays - calls free_array
delete operator for arrays allocated with placement new?? - I don't
remember how this works now. How do you keep from calling free/free_array
here?
There has to be a function or a field that can be used to access the length
of the array:
int my_array[];
my_array = new int[10];
cout << "Length of array is " << my_array.len << endl;
or
cout << "Length of array is " << array_len(my_array) << endl;
Advantages:
- Run time type information is more complete. This will make the
  extended run time type information more powerful.
- It is a very small extension that is easy to implement.
- There are no backward compatibility issues(?). Old-style type* arrays
  are still allowed. Hopefully, use of the new array declaration would be
  encouraged. Of course, serialization of "old"-style declarations using
  extended run time type information would be difficult.
- These extensions are also applicable to C. They are not as useful there
  because C isn't considering the implementation of extended run time type
  information (at least, I don't think so).
- Because of the completeness of the new extended run time type
  information, automatic serialization can easily be implemented for a
  large set of C++ data structures without knowledge of non-primitive
  data types.
Disadvantages:
- ????????
Although I don't have all the details completely worked out, I can make a
similar argument for standardizing the declaration of strings at the C level
of the language. The idea is to disambiguate the char* declaration by
introducing a new built-in type called string. Other than the declaration,
very little about strings would change:
struct my_struct
{
    string my_string;          // This is a string. Probably implemented with
                               // a string "cookie" that maintains the
                               // allocated length of the string.
    char my_array[];           // This is a new-style dynamic array.
    char* my_old_style_buffer; // This is an old-style buffer with an
                               // ambiguous declaration. You can't tell if
                               // it is a string, a pointer to a character,
                               // or a pointer to a character array.
};
malloc_string, realloc_string, free_string, init_string??
new (string) and delete(string)
str.buf_len or string_buflen(string)
Existing string functions would still work if the "cookie" implementation
was chosen. New standard string functions could be implemented to
automatically reallocate strings when they are updated.
Example:
string my_string;
my_string = new string("");
my_string = str_sprintf(my_string, "Update a string and automatically reallocate its length to %d bytes", 66);
printf("Real string length is %zu\n", strlen(my_string));
etc....
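A sketch of the self-growing str_sprintf idea, written against std::string rather than a new built-in string type, so it drops the proposal's first argument (std::string already owns and resizes its buffer). It assumes C++11 for std::vsnprintf and va_copy; the function itself is hypothetical.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats into a buffer that is sized to fit, so the caller never
// computes a length by hand.
std::string str_sprintf(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    va_list copy;
    va_copy(copy, args);
    int needed = std::vsnprintf(nullptr, 0, fmt, copy);  // measure only
    va_end(copy);
    std::string result;
    if (needed > 0) {
        result.resize(static_cast<std::size_t>(needed) + 1);  // room for '\0'
        std::vsnprintf(&result[0], result.size(), fmt, args);
        result.resize(static_cast<std::size_t>(needed));       // drop the '\0'
    }
    va_end(args);
    return result;
}
```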
Any opinions are greatly appreciated.
"Rayiner Hashem" <heliosc@mindspring.com> wrote in message
news:a3995c0d.0303312112.7ea013f8@posting.google.com...
> To tell the truth, I really don't think there is any other way to
> support certain cases. For the most part, you can use RTTI to
> statically determine the size of objects. For example, the static size
> of all stack variables can be determined by the compiler. For *most*
> objects allocated on the heap, you can also RTTI. If you're
> serializing a pointer to a non-polymorphic type, then you're sure of
> the static size. If you're serializing a pointer to a polymorphic
> type, you can use the RTTI info in the v-table to get the size. The
> only case where you *need* the memory manager is in the case of
> arrays. Existing C++ constructs already require the memory manager to
> keep some accounting information. Take an operation like:
> int * i = new int[256];
> Usually, the compiler will turn this into __vec_new(size_t sz, size_t
> count). __vec_new() is pretty much required to store count before the
> start of the array, so the array can be freed later. There isn't
> really another way of handling it. So if you want to serialize the
> array 'i', you can use the same 'count' token that delete[] would use.
>
> > Using the C++ memory manager is great for prototyping the extended type
> > information interfaces + serialization code, but I'm not sure it is the best
> > language choice from the perspective of implementing a generic serializer
> > library that uses run time type information.
---
Author: eforren@cox.net ("Eddie Forren")
Date: Fri, 4 Apr 2003 05:26:04 +0000 (UTC)
===================================== MODERATOR'S COMMENT:
Please do not overquote in responses, and ideally follow
the convention of placing your response after the material to which
you are replying.
===================================== END OF MODERATOR'S COMMENT
> The major snag I've run into is how to write the serialization
> routines. If a class declares the serialization routine as 'friend'
> then it is fairly easy for an external tool to generate a function
> that can use the compiler to access each member.
I thought about this a little bit. If I understand your solution correctly,
you are planning to declare serialization or type information access
routines as friends so that they can access private fields for the purposes
of extracting the offset of fields from the beginning of a class or
structure?
To get the compiler to generate the extended type information (types,
offsets, and length of fields/methods), some options are:
1. Use friend declarations to allow external routines to have access to
private declarations within a class. Generate friend routines that declare
a static instance of the data type and populate a static type information
structure with the offsets, lengths, and types of each field in a structure
or class. I don't know how to figure out the address of the vtable(s) using
this mechanism. There are other restrictions here: no type information for
classes with pure virtual functions (this is probably okay for
serialization purposes).
FYI:
I have implemented a set of macros that allows self-descriptive information
to be "declared" manually within a class implementation file. This is very
similar to the information you would be generating for a serializer. I
didn't use friend, although I probably should have. Instead, I used a macro
to declare the necessary type information access functions in the class
itself (one static function and one virtual function to access the
"extended" type information). Macros were also provided to automatically
implement those functions:
Something like this in the .h file
class my_class
{
int my_field_1;
char* my_str_field_2;
DECLARE_CLASS(my_class)
};
In the cpp file (these are not exactly the macros):
BEGIN_CLASS_DEF(my_class)
INT_FIELD(my_field_1);
STRING_FIELD(my_str_field_2);
END_CLASS_DEF(my_class)
2. Override the implementation of the compiler-generated virtual and static
functions used to access run time type information. I'm not sure how
feasible this is. I think you can make the linker drop the compiler-generated
implementation of these functions and replace them with your own
implementation. Those functions should have access to all of the private
fields in a class. The serializer would then use the "extended" type
information to do its work.
3. Does the information in GCC's -fdump-translation-unit output contain
field offset information? What about offset information for vtbls? If it
does, then use this information instead of generating a C++ source file that
uses the compiler to figure out the offsets. We would still have to provide
some "rtti"-like virtual function on the class?
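A minimal sketch of the table-of-fields idea, with hypothetical macro names modelled on DECLARE_CLASS / BEGIN_CLASS_DEF above: because the accessor is a static member function, its definition can apply offsetof to private fields, and a generic dumper then needs to know only the primitive field kinds. It assumes my_class is standard-layout (offsetof is only conditionally supported otherwise), and unlike the original macros the field entries take no trailing semicolons, because they expand into a braced initializer list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

enum class field_kind { int_field, string_field };

struct field_info {
    const char* name;
    field_kind kind;
    std::size_t offset;  // byte offset from the start of the object
};

struct class_info {
    const char* name;
    std::vector<field_info> fields;
};

// Declares a static accessor inside the class; as a member, its
// definition may name private fields.
#define DECLARE_CLASS(cls) \
public:                    \
    static const class_info& static_class_info();

#define BEGIN_CLASS_DEF(cls)                     \
    const class_info& cls::static_class_info() { \
        using self = cls;                        \
        static const class_info info{#cls, {
#define INT_FIELD(f) {#f, field_kind::int_field, offsetof(self, f)},
#define STRING_FIELD(f) {#f, field_kind::string_field, offsetof(self, f)},
#define END_CLASS_DEF(cls) }}; return info; }

class my_class {
    int my_field_1;
    const char* my_str_field_2;
public:
    my_class(int v, const char* s) : my_field_1(v), my_str_field_2(s) {}
    DECLARE_CLASS(my_class)
};

BEGIN_CLASS_DEF(my_class)
    INT_FIELD(my_field_1)
    STRING_FIELD(my_str_field_2)
END_CLASS_DEF(my_class)

// Generic dumper: knows only the primitive field kinds, not my_class.
std::string dump(const void* obj, const class_info& ci) {
    const char* base = static_cast<const char*>(obj);
    std::string out;
    for (const field_info& f : ci.fields) {
        out += f.name;
        out += '=';
        if (f.kind == field_kind::int_field) {
            int v;
            std::memcpy(&v, base + f.offset, sizeof v);
            out += std::to_string(v);
        } else {
            const char* s;
            std::memcpy(&s, base + f.offset, sizeof s);
            out += s ? s : "";
        }
        out += '\n';
    }
    return out;
}
```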
"Rayiner Hashem" <heliosc@mindspring.com> wrote in message
news:a3995c0d.0303252120.1a2ba638@posting.google.com...
> You're definately going to need runtime support to implement
> serialization. The C++ memory manager has enough information about the
> exact sizes of objects, while the compiler knows about where pointers
> to other objects are. Put the two together, and you've probably got
> enough to implement reasonably transparent serialization. Nothing will
> ever handle stuff like this:
>
> int * ptrTable[256];
>
> struct my_struct
> {
> int idx; //index into ptrTable
> };
>
> But it's not entirely clear that you'd want the runtime to handle
> stuff like this anyway (too ambiguous, probably don't want to store
> the entire referred to by ptrTable along with my_struct, etc). I've
> been looking into hacking up some serialization code, and I think I'm
> making some (conceptually, at least) progress. There is a project
> (http://www.omegahat.org/GccTranslationUnit/) that has written a
> Python utility to parse GCC's internal compiler information. This
> might very well provide enough information for generic serialization.
> The major snag I've run into is how to write the serialization
> routines. If a class declares the serialization routine as 'friend'
> then it is fairly easy for an external tool to generate a function
> that can use the compiler to access each member. If you want
> transparency, then you're going to have to figure out how the compiler
> lays out data structures and emulate the algorithm in the external
> tool. This is possible for GCC, because the C++ ABI it uses is
> publically documented at CodeSourcery, but might not be possible for
> something like Visual C++. If anybody can come up with test cases that
> can't be deduced either from compiler info, or by the runtime, I'd
> appreciate it if you'd post on this thread.
>
> >
> > struct my_struct
> > {
> > int* my_int;
> > int* my_int_array;
> > }
> >
---
Author: heliosc@mindspring.com (Rayiner Hashem)
Date: Fri, 4 Apr 2003 11:30:58 +0000 (UTC)
Of course, I realize the mechanism will be different in each case, I'm
simply giving examples from my experience with GCC. Since
serialization will most likely require support from the runtime
library (like exception handling, RTTI, operator new/delete, etc) you
can count on the serialization code knowing how the sizes of arrays
allocated by new[] are tracked. The point is that since delete[] must
know the size, the runtime must be able to find the size out *somehow*
and the serialization code can use the same mechanism.
Again, serialization will most likely be runtime and compiler
specific, just like RTTI and EH. In the case below, and in all stack
object cases, the compiler knows statically the exact sizes of all
arrays, because it lays out the stack frame itself. It can easily
pass this information to the runtime library, like it does for EH.
> But this simply can't work. First of all, the code that destroys an
> array isn't always called due to a delete[].
> foo() {
> MyObject o[10];
> }
Overriding new is the kicker. If the user overrides new[], then you're
right, the runtime library will have no clue how large the array is.
Only delete[] knows that information, and since delete[] is user
provided, the runtime can't use any information in it. Now if the user
just overrides scalar new (a much more common occurrence?) then
uncertainty only kicks in for some corner cases. If the user is
allocating a polymorphic type, then RTTI can be used to find the exact
type and size of the object. However, the following situation breaks:
struct base {int var1;};
struct derived : public base {int var2;};
//overrides operator new
base * b = new derived;
serialize(b);
Now, the problem is that the user-provided operator delete knows the
exact size of the object, because it must track it somewhere, but the
serialize routine has no idea of what 'b' actually points to, and it
can't use RTTI to find out. Of course, this code is rather contrived
--- you shouldn't be declaring 'base' with a non-virtual destructor,
but I think it's still legal C++ and it would be nice to support most
legal C++ constructs in a serialization mechanism.
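The difference RTTI makes in that example can be checked directly. A sketch (C++11 assumed; the function names are illustrative): with no virtual functions, typeid through a base pointer sees only the static type, while adding a virtual destructor lets it recover the dynamic type. A serializer could then map that type_info to a size or field table of its own; the standard does not provide one.

```cpp
#include <cassert>
#include <typeinfo>

// Non-polymorphic hierarchy, as in the example above.
struct base { int var1; };
struct derived : public base { int var2; };

// Same shape, but a virtual destructor makes the types polymorphic.
struct pbase { virtual ~pbase() = default; int var1; };
struct pderived : public pbase { int var2; };

// typeid on a non-polymorphic lvalue reports the static type only.
bool static_type_seen() {
    derived d{};
    base* b = &d;
    return typeid(*b) == typeid(base);
}

// With a vtable, typeid finds the dynamic (most-derived) type.
bool dynamic_type_seen() {
    pderived d;
    pbase* b = &d;
    return typeid(*b) == typeid(pderived);
}
```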
---
Author: eforren@cox.net (Eddie Forren)
Date: Tue, 8 Apr 2003 08:33:32 +0000 (UTC)
> Again, serialization will most likely be runtime and compiler
> specific,
It would be nice if only RTTI were compiler specific and the serialization
routines simply used compiler-specific information provided by extended
RTTI.
> The point is that since delete[] must
> know the size, the runtime must be able to find the size out *somehow*
> and the serialization code can use the same mechanism.
Adding very simple dynamic arrays (essentially, a dynamic array declaration,
a way to determine the length of the array, and not much else) seems to be a
good, non-compiler-specific way to solve this problem for the delete
operator and the serializer. A standard way to determine the length is all
that is needed for the delete operator. The serializer needs the length,
plus RTTI information that is not ambiguous for pointer and array fields and
not ambiguous for char* arrays, char* pointers, and string fields. Another
application for the extended RTTI information is remote procedure calls.
Non-ambiguous type declarations would help this situation also, because
array and string parameters would be easily identifiable. This does bring
up a problem: since static arrays can already be passed using a dynamic
array declaration, dynamic and static arrays would need to be compatible in
some way (both have a length field?).
void process_arrays(int array1[], int array2[]);
int static_array[10];
int dynamic_array[];
dynamic_array = new int[20];
process_arrays(static_array, dynamic_array);
RTTI information for the process_arrays function would indicate
that both parameters are arrays of ints. The length of each array
should be determined using the same function or field (Java uses a field).
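One way to make static and dynamic arrays "compatible" today, without a language change, is a small pointer-plus-length parameter type that both convert to, so the callee reads the length the same way in both cases. int_array_ref is a hypothetical name; C++20 later standardized the same idea as std::span.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "array with length" parameter type.
struct int_array_ref {
    int* data;
    std::size_t len;

    // Static array: the length comes from the array type itself.
    template <std::size_t N>
    int_array_ref(int (&a)[N]) : data(a), len(N) {}

    // Dynamic array: the length comes from the container.
    int_array_ref(std::vector<int>& v) : data(v.data()), len(v.size()) {}
};

// Both parameters expose their length through the same field.
std::size_t process_arrays(int_array_ref array1, int_array_ref array2) {
    return array1.len + array2.len;
}
```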
> Now if the user
> just overrides scaler new (a much more common occurance?) then
> uncertainty only kicks in for some corner cases. If the user is
> allocating a polymorphic type, then RTTI can be used to find the exact
> type and size of the object. However, the following situation breaks:
>
> struct base {int var1;};
> struct derived : public base {int var2;};
> //overrides operator new
> base * b = new derived;
> serialize(b);
One solution is to force or automate the
declaration/implementation of virtual destructors for base classes.
Do we need a way to explicitly include RTTI in a class?
What about explicitly including extended RTTI (vs. normal RTTI) in a class?
One way is:
// class with extended run time type information
struct derived : public extended_rtti { int var2; };
and
// class with normal run time type information
struct derived : public rtti { int var2; };
These declarations would define and implement the
virtual rtti function that provides access to extended
or normal run time type information.
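The rtti base class idea can be approximated in current C++ with a virtual interface plus a CRTP helper that writes the overrides for each class. This is only a sketch: rtti, with_rtti, type(), and size() are hypothetical names, and it provides ordinary type identity and size rather than the "extended" field-level information. The virtual destructor in the root also covers the base/derived deletion problem raised earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <typeinfo>

// Hypothetical root: anything deriving from it can report its type and size.
struct rtti {
    virtual ~rtti() = default;
    virtual const std::type_info& type() const = 0;
    virtual std::size_t size() const = 0;
};

// CRTP helper that writes the overrides once per class.
template <typename Derived, typename Base = rtti>
struct with_rtti : Base {
    const std::type_info& type() const override { return typeid(Derived); }
    std::size_t size() const override { return sizeof(Derived); }
};

struct base : with_rtti<base> { int var1 = 0; };
struct derived : with_rtti<derived, base> { int var2 = 0; };
```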
---
Author: eforren@cox.net ("Eddie Forren")
Date: Tue, 8 Apr 2003 08:34:13 +0000 (UTC)
I had some different thoughts on this subject because of some
self-description APIs I am working on for C (not C++ yet).
Currently, RTTI works for classes that have virtual functions. This makes
sense because polymorphic type info needs to be accessed at run time.
Perhaps this is not general enough, though. Maybe RTTI should work for
anything that gets allocated using the new operator (or an overloaded, rtti
version of the new operator). This would allow run time type information to
be accessed for any pointer that is created using the "rtti" new
operator(s).
After all, void* is a polymorphic pointer (sort of). One should be able to
determine the data type and length that is referenced by void*. This
solution could also solve the ambiguity of normal type* declarations and
char* declarations. Type information for all non-pointer types should be
available to the compiler at compile time.
One way to do this:
Add a new specifier called "rtti" that can be used for
any class, struct, enum, or type definition. It flags a data type
as needing run time type identification. You could also have a
compiler flag to add rtti to all data types that are compiled
in a module.
rtti struct my_struct
{
int* my_array;
};
When new is invoked for an "rtti" data type, operator new
with the following signature is invoked:
void* operator new(size_t size, type_info* rtti_type_info);
void* operator new[](size_t size, type_info* rtti_type_info);
The implementer of operator new can use "cookie" mechanisms to
associate the type information with the data instance.
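A sketch of how an implementer might use that signature today, assuming a placement-form overload invoked as new (&typeid(T)) T(...): the overload stores the type_info pointer in a header in front of the object, after which a bare void* can be asked what it points to. type_cookie, type_of, and rtti_delete are hypothetical names. The header is aligned for std::max_align_t, so over-aligned types are not handled, and rtti_delete must be given a pointer of the object's complete type, because a plain delete expression would free the wrong address.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <typeinfo>

// Hypothetical cookie placed in front of each allocation from the overload.
struct alignas(std::max_align_t) type_cookie {
    const std::type_info* type;
};

// The proposed signature as a placement-form overload:
//   T* p = new (&typeid(T)) T(...);
void* operator new(std::size_t size, const std::type_info* rtti_type_info) {
    void* raw = ::operator new(sizeof(type_cookie) + size);
    type_cookie* c = static_cast<type_cookie*>(raw);
    c->type = rtti_type_info;
    return c + 1;  // the object starts right after the cookie
}

// Matching placement delete: called only if the constructor throws.
void operator delete(void* p, const std::type_info*) noexcept {
    ::operator delete(static_cast<type_cookie*>(p) - 1);
}

// With the cookie in place, a void* can be asked what it points to.
const std::type_info& type_of(const void* p) {
    return *(static_cast<const type_cookie*>(p) - 1)->type;
}

// Destroys the object and frees the whole block, cookie included.
template <typename T>
void rtti_delete(T* p) {
    if (!p) return;
    p->~T();
    ::operator delete(static_cast<type_cookie*>(static_cast<void*>(p)) - 1);
}
```

In use, void* v = new (&typeid(my_struct)) my_struct{}; makes type_of(v) == typeid(my_struct) hold, and the object is released with rtti_delete(static_cast<my_struct*>(v)).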
.....
"Allan W" <allan_w@my-dejanews.com> wrote in message
news:7f2735a5.0303251946.133fec69@posting.google.com...
> eforren@cox.net ("Eddie Forren") wrote
> > Doesn't C and C++ have a fundamental type weakness in its declarations?
> > For example, if I declare a struct
> >
> > struct my_struct
> > {
> > int* my_int;
> > int* my_int_array;
> > }
> >
> > The meaning of the int* is ambiguous. One int* refers to a single
> > value and one int* refers to an array. Length information is
> > usually contained in a different field for arrays.
>
> The type of array you're suggesting is already part of standard C++.
> It's called "std::vector."
>
> In my opinion, std::vector should be considered beginner-level C++
> in schools, and "C-style" arrays should be considered advanced stuff.
> Unfortunately, as a C++ teacher I cannot use this with current
> textbooks, which require me to teach pointers and arrays pretty
> much at the same time.
>
> > This type weakness makes it difficult to implement serialization
> > generically using self-descriptive information. When I say generically, I
> > mean that that the serializer should have knowlege of only a fixed number of
> > pre-defined types and all user types should be constructed from those types.
> > It would be nice if the meta-information was complete enough so that STL
> > style collections could be serialized without knowlege of specific
> > collection types. One solution to this would be to add some sort of dynamic
> > array declaration and/or possible primitive operatorations so that the above
> > declaration would become:
>
> One solution would be to write a template class, which accepts a
> vector<T> and serializes the elements. Of course, it has to know how
> to serialize each element somehow, but the same problem would be
> present in ANY attempt to globally serialize an array.
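A sketch of the template approach Allan describes, with element serialization handled by ordinary overloads; the text format and the names serialize and to_text are made up for illustration. Nested vectors work because the function template can call itself for vector elements.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Element-level overloads: the "has to know how to serialize each element"
// part. Supporting a new element type means adding an overload.
inline void serialize(std::ostream& out, int v) { out << v; }
inline void serialize(std::ostream& out, const std::string& s) {
    out << s.size() << ':' << s;  // length-prefixed so spaces survive
}

// Container-level template: the length travels with the data.
template <typename T>
void serialize(std::ostream& out, const std::vector<T>& v) {
    out << v.size() << '[';
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i != 0) out << ',';
        serialize(out, v[i]);
    }
    out << ']';
}

template <typename T>
std::string to_text(const T& value) {
    std::ostringstream out;
    serialize(out, value);
    return out.str();
}
```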
---
Author: eforren@cox.net ("Eddie Forren")
Date: Tue, 25 Mar 2003 17:36:39 +0000 (UTC)
I didn't really ask the question clearly (it happens sometimes).
Roger, I am familiar with most of the information you discussed, even if it
didn't appear to be so. Thanks for the info.
Don't C and C++ have a fundamental type weakness in their declarations?
For example, if I declare a struct
struct my_struct
{
int* my_int;
int* my_int_array;
};
The meaning of int* is ambiguous. One int* refers to a single value and one
int* refers to an array. Length information is usually contained in a
different field for arrays. This type weakness makes it difficult to
implement serialization generically using self-descriptive information. When
I say generically, I mean that the serializer should have knowledge of only
a fixed number of predefined types, and all user types should be
constructed from those types. It would be nice if the meta-information were
complete enough that STL-style collections could be serialized without
knowledge of specific collection types. One solution would be to add some
sort of dynamic array declaration and/or possibly primitive operations, so
that the above declaration would become:
struct my_struct
{
int* my_int;
int my_int_array[];
};
Now the intent is clear from the declaration. The length of the array would
be accessible/updateable somehow. If STL collections used these arrays
instead of conventional C/C++ weakly typed arrays when blocks of memory were
needed, then they could be serialized generically using meta-information.
Roger,
I've read a little about .NET, but I have not used it. My impression (not
very knowledgeable) is that .NET imposes too many Java-like rules (like
garbage collection) on languages. How does .NET handle the problem stated
above? Is the .NET abstraction elegant? Does it keep the spirit of C/C++
(i.e., garbage collection is an option, not mandatory) when it provides
self-descriptive information?
""Roger Orr"" <rogero@howzatt.demon.co.uk> wrote in message
news:b5am02$pj6$1$8302bc10@news.demon.co.uk...
>
> ""Eddie Forren"" <eforren@cox.net> wrote in message
> news:SL4da.58995$JE5.24324@news2.central.cox.net...
> [snip]
> > My question is this:
> > When Java implements automatic serialization, I believe it does so
> > without any class specific operations. It does not know about
> > specific collection classes like vector or hash table. I think it
> > does this because it knows the complete type of every attribute
> > in a class. (Everything goes through the new keyword).
>
> Java does know the type of everything - but that's not because of the
> 'new' keyword but because the Java specification defines 'meta'
> information for all classes, arrays and primitive data types. The
> compiler must create such information and the virtual machine makes
> use of it. See the 'java.lang.Class' class for more information.
>
> In order for C++ to deal with the issue in a flexible way C++ would
> need an infrastructure, supported at both compile and run time, to
> produce and manage type information.
> At present the C++ standard does not make any requirements of this
> type, but various people do seem to be interested in proposing
> standard way(s) of representing and accessing such meta information.
> I suspect such extended RTTI could become an optional part of the
> language at some future date, but I don't think it is likely to
> become mandatory.
>
> Some implementations of C++ may support this sort of information
> already - however at present the way such information is presented
> is non-standard and hence non-portable. One example is Microsoft's
> managed C++ (which runs inside the .NET infrastructure) which does
> provide C++ code with access to comprehensive information about
> class types, etc. although this language does have some restrictions
> compared to 'standard' C++.
>
> Hope this helps,
> Roger Orr
> --
> MVP in C++ at www.brainbench.com
---
Author: eldiener@earthlink.net ("Edward Diener")
Date: Tue, 25 Mar 2003 22:35:22 +0000 (UTC)
"Eddie Forren" wrote:
> I didn't really ask the question clearly (it happens sometimes) --
> Roger, I am familiar with most of the information you discussed even
> if it didn't appear to be so - thanks for the info.
>
> Don't C and C++ have a fundamental type weakness in their
> declarations?
> For example, if I declare a struct
>
> struct my_struct
> {
> int* my_int;
> int* my_int_array;
> };
>
> The meaning of the int* is ambiguous. One int* refers to a single
> value and one int* refers to an array.
No, both declarations refer to a pointer to an int. Just because you say
'my_int_array' refers to an array doesn't make it so. If you want an array
you can write:
int my_int_array[n]; // where n is a compile-time constant
or use std::vector instead. Both are superior IMO to using int * to actually
point to a sequence of ints as opposed to its normal usage as a pointer to a
single int, which is what the declaration tells you.
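A sketch of the two alternatives Edward mentions, assuming `n` is a compile-time constant (standard C++ requires that for a built-in array member). In both cases the length travels with the declaration or with the object, so code can recover it; the helper name `element_count` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t n = 4;  // must be a compile-time constant for a built-in array

struct fixed_version {
    int my_int_array[n];  // the member's type is int[4]; the length is part of the type
};

struct dynamic_version {
    std::vector<int> my_int_array;  // the length is stored in the object itself
};

// Built-in array: the length is recovered by template argument deduction.
template <typename T, std::size_t N>
std::size_t element_count(const T (&)[N]) { return N; }

// std::vector: the length is a run-time property.
template <typename T>
std::size_t element_count(const std::vector<T>& v) { return v.size(); }
```

Neither form needs a naming convention or a separate length field, which is the information a plain `int*` cannot carry.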
---
Author: allan_w@my-dejanews.com (Allan W)
Date: Wed, 26 Mar 2003 18:20:47 +0000 (UTC)
eforren@cox.net ("Eddie Forren") wrote
> Don't C and C++ have a fundamental type weakness in their declarations?
> For example, if I declare a struct
>
> struct my_struct
> {
> int* my_int;
> int* my_int_array;
> };
>
> The meaning of the int* is ambiguous. One int* refers to a single
> value and one int* refers to an array. Length information is
> usually contained in a different field for arrays.
The type of array you're suggesting is already part of standard C++.
It's called "std::vector."
In my opinion, std::vector should be considered beginner-level C++
in schools, and "C-style" arrays should be considered advanced stuff.
Unfortunately, as a C++ teacher I cannot use this with current
textbooks, which require me to teach pointers and arrays pretty
much at the same time.
> This type weakness makes it difficult to implement serialization
> generically using self-descriptive information. When I say generically, I
> mean that the serializer should have knowledge of only a fixed number of
> pre-defined types and all user types should be constructed from those types.
> It would be nice if the meta-information was complete enough so that STL
> style collections could be serialized without knowledge of specific
> collection types. One solution to this would be to add some sort of dynamic
> array declaration and/or possibly primitive operations so that the above
> declaration would become:
One solution would be to write a template class, which accepts a
vector<T> and serializes the elements. Of course, it has to know how
to serialize each element somehow, but the same problem would be
present in ANY attempt to globally serialize an array.
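A minimal sketch of the template Allan suggests, assuming a toy whitespace-separated text format and hypothetical `serialize` overloads. The generic part handles any `std::vector<T>`, while each element type still needs its own overload, which is exactly the limitation he points out.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Per-element serializers: the template below must find one of these for
// each T ("it has to know how to serialize each element somehow").
inline void serialize(std::ostream& os, int x) { os << x << ' '; }
inline void serialize(std::ostream& os, const std::string& s) {
    os << s.size() << ' ' << s << ' ';  // length prefix, then the characters
}

// Generic part: the element count, then each element in order.
template <typename T>
void serialize(std::ostream& os, const std::vector<T>& v) {
    os << v.size() << ' ';
    for (const T& elem : v)
        serialize(os, elem);  // int, string, or (recursively) another vector
}
```

Because the template calls `serialize` on each element, a `std::vector<std::vector<int>>` is handled by recursion with no extra code.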
---
Author: eforren@cox.net ("Eddie Forren")
Date: Wed, 26 Mar 2003 18:33:01 +0000 (UTC)
I must really be off in left field, or I'm just not making my idea clear. I
will try one more time. If this doesn't work I will just move on to some
other subject.
First some assumptions:
a. I heard some discussions about adding extended run time type
   information to the C++ standard. (Bjarne Stroustrup - Software
   Development and Expo 2001-04-10. Audio can be found at
   http://technetcast.ddj.com/tnc_catalog.html in the C++ section.)
b. I assume this optional extended run time type information would include
   compile time descriptive information (field data types, field offsets,
   field names, parameter types passed in method function calls, etc.)
   about declared fields and methods, and a standard way to access the
   descriptive information.
c. I thought one of the uses for this information would be to implement
   generic serialization/deserialization routines for arbitrary, variable
   length C++ data structures. (Java implements this using its run time
   type information.)
d. This could be very useful in a multi-language, distributed environment,
   and it would be easier than using IDL mechanisms. (It is very easy to
   pass arbitrary, variable length data structures from client to server
   using standard serialization in Java.) Of course, to be truly
   multi-language, other languages besides Java would have to add similar
   self-description/serialization facilities. I think .NET tries to do
   some of this by defining a computing platform and extensions for
   languages, but I would like to see a smaller set of extensions that do
   not violate the original intent and spirit of each language.
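To make assumption b concrete, here is a sketch of the kind of field table such extended RTTI might produce, written by hand today with `offsetof`. The names `field_kind`, `field_info`, `point`, and `describe` are hypothetical, the text output format is an assumption, and `offsetof` is only guaranteed to work for standard-layout types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hand-maintained stand-in for compiler-generated descriptive information.
enum class field_kind { int_field, double_field };

struct field_info {
    const char* name;
    field_kind  kind;
    std::size_t offset;
};

struct point {  // standard-layout, so offsetof is well defined
    int    x;
    int    y;
    double weight;
};

const field_info point_fields[] = {
    {"x",      field_kind::int_field,    offsetof(point, x)},
    {"y",      field_kind::int_field,    offsetof(point, y)},
    {"weight", field_kind::double_field, offsetof(point, weight)},
};

// Driven only by the table: it knows the fixed set of field kinds,
// not the struct definition itself.
std::string describe(const point& p) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&p);
    std::string out;
    for (const field_info& f : point_fields) {
        out += f.name;
        out += '=';
        if (f.kind == field_kind::int_field) {
            int v;
            std::memcpy(&v, base + f.offset, sizeof v);
            out += std::to_string(v);
        } else {
            double v;
            std::memcpy(&v, base + f.offset, sizeof v);
            out += std::to_string(v);
        }
        out += ';';
    }
    return out;
}
```

The table has to be kept in sync with the struct by hand; generating it automatically is what the proposed extended RTTI would add.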
Using a std::vector is fine for dynamic arrays unless you want to implement
a generic serializer that uses extended run time type information to
serialize/deserialize any arbitrary C++ data structure. Since other STL
collections do not use vector when they need an array of data (I think they
tend to use a structure similar to the one in my previous message), the
generic serializer would need knowledge of the internal implementation
details of each STL collection type to serialize it. If new collection
types were added in the future, then the serializer would have to be
updated. Am I missing something? If I had extended type information in a
compiler today, could I write a generic serializer for arbitrary data
structures declared in C++? (The serializer should only have knowledge of a
fixed set of pre-defined data types.) It seems like it would be useful to
add dynamic arrays to the language and encourage their use, so that run
time type information would be completely descriptive for any arbitrary
data structure and generic serialization would then be feasible (at least
when dynamic arrays were used).
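One possible answer to the question about collection internals, sketched under the assumption that a toy text format is acceptable: a template written against the public iterator interface (`size()`, `begin()`, `end()`) works for `vector`, `list`, `set`, and `map` without knowing how any of them store their elements. The `write` overloads are hypothetical; element types still need their own overloads, and deserialization would additionally need to know the target container type.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <ostream>
#include <set>
#include <sstream>
#include <utility>
#include <vector>

// Element-level overload for a basic type.
inline void write(std::ostream& os, int x) { os << x << ' '; }

// Declared before the container template so map elements (std::pair)
// resolve to it when the template is instantiated.
template <typename A, typename B>
void write(std::ostream& os, const std::pair<A, B>& p);

// Any type with size(), begin() and end() is treated as a collection;
// other types are removed from overload resolution (SFINAE).
template <typename Container>
auto write(std::ostream& os, const Container& c)
    -> decltype((void)c.size(), (void)c.begin(), (void)c.end()) {
    os << c.size() << ' ';
    for (const auto& elem : c)
        write(os, elem);
}

template <typename A, typename B>
void write(std::ostream& os, const std::pair<A, B>& p) {
    write(os, p.first);
    write(os, p.second);
}
```

The serializer depends only on the interface every standard container shares, so a new container that offers the same interface would need no changes here.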
I think one of the main reasons Java can implement automatic serialization
and deserialization is that all fundamental types are clearly defined, the
length of each fundamental type can be determined, and all other types are
built only from those fundamental data types. I don't think this is true
for C++: the length of the memory buffer referenced by int*, char*, etc.
cannot easily be determined by a generic serializer, even one with access
to extended run time type information. In Java, a pointer (hidden from the
user) only references a single instance of a data type, and dynamic arrays
are used when a block of memory contains a list of some data type.
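On the point about fundamental type lengths: C++ leaves `sizeof(int)` and byte order to the implementation, but a serializer can fix both itself. This sketch uses the fixed-width types from `<cstdint>` (standardized later, in C++11) and hypothetical helpers `put_u32`/`get_u32` that always use four bytes, least significant first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append exactly 4 bytes, least significant first. The on-the-wire length
// and byte order are fixed by this code, not by the platform.
void put_u32(std::vector<unsigned char>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<unsigned char>((v >> (8 * i)) & 0xFFu));
}

// Read 4 bytes starting at pos, in the same order put_u32 wrote them.
std::uint32_t get_u32(const std::vector<unsigned char>& in, std::size_t pos) {
    assert(pos + 4 <= in.size());  // caller must supply enough bytes
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    return v;
}
```

This fixes the encoding of fundamental values; it does not solve the separate problem of how many elements an `int*` addresses.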
Questions:
Is the idea I'm trying to communicate here clear?
Is there some other alternative?
Is extended type information being considered in part to make automatic
serialization/deserialization possible?
If not, what will be some of the main uses of extended type information?
"Edward Diener" <eldiener@earthlink.net> wrote in message
news:0J4ga.19572$jA2.1752373@newsread2.prod.itd.earthlink.net...
> "Eddie Forren" wrote:
> > I didn't really ask the question clearly (it happens sometimes) --
> > Roger, I am familiar with most of the information you discussed even
> > if it didn't appear to be so - thanks for the info.
> >
> > Don't C and C++ have a fundamental type weakness in their
> > declarations?
> > For example, if I declare a struct
> >
> > struct my_struct
> > {
> > int* my_int;
> > int* my_int_array;
> > };
> >
> > The meaning of the int* is ambiguous. One int* refers to a single
> > value and one int* refers to an array.
>
> No, both declarations refer to a pointer to an int. Just because you say
> 'my_int_array' refers to an array doesn't make it so. If you want an
> array you can write:
>
> int my_int_array[n]; // where n is some number
>
> or use std::vector instead. Both are superior IMO to using int * to
> actually point to a sequence of ints as opposed to its normal usage as a
> pointer to a single int, which is what the declaration tells you.
---
Author: heliosc@mindspring.com (Rayiner Hashem)
Date: Wed, 26 Mar 2003 22:47:29 +0000 (UTC)
You're definitely going to need runtime support to implement
serialization. The C++ memory manager has enough information about the
exact sizes of objects, while the compiler knows where the pointers
to other objects are. Put the two together, and you've probably got
enough to implement reasonably transparent serialization. Nothing will
ever handle stuff like this:
int * ptrTable[256];
struct my_struct
{
int idx; //index into ptrTable
};
But it's not entirely clear that you'd want the runtime to handle
stuff like this anyway (it's too ambiguous, you probably don't want to
store everything referred to by ptrTable along with my_struct, etc.). I've
been looking into hacking up some serialization code, and I think I'm
making some progress (conceptually, at least). There is a project
(http://www.omegahat.org/GccTranslationUnit/) that has written a
Python utility to parse GCC's internal compiler information. This
might very well provide enough information for generic serialization.
The major snag I've run into is how to write the serialization
routines. If a class declares the serialization routine as a 'friend',
then it is fairly easy for an external tool to generate a function
that can use the compiler to access each member. If you want
transparency, then you're going to have to figure out how the compiler
lays out data structures and emulate the algorithm in the external
tool. This is possible for GCC, because the C++ ABI it uses is
publicly documented at CodeSourcery, but might not be possible for
something like Visual C++. If anybody can come up with test cases that
can't be deduced either from compiler info or by the runtime, I'd
appreciate it if you'd post on this thread.
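A sketch of the 'friend' approach described above, with a hypothetical class `account` and a hand-written `serialize` standing in for what an external tool might generate. Because the function is a friend, it reads the private members through ordinary member access, so the tool never needs to know the compiler's object layout. The output format is an assumption.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

class account {
public:
    account(std::string owner, long balance)
        : owner_(std::move(owner)), balance_(balance) {}

    // The one line a tool (or the class author) has to add to the class.
    friend void serialize(std::ostream& os, const account& a);

private:
    std::string owner_;
    long        balance_;
};

// What a generator might emit: one write per member, in declaration order.
void serialize(std::ostream& os, const account& a) {
    os << a.owner_.size() << ' ' << a.owner_ << ' ' << a.balance_ << ' ';
}
```

The transparent alternative (no friend declaration) would have to reproduce the compiler's layout rules, which is the ABI dependence mentioned in the post.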
>
> struct my_struct
> {
> int* my_int;
> int* my_int_array;
> };
>
---
Author: eldiener@earthlink.net ("Edward Diener")
Date: Sat, 29 Mar 2003 21:58:18 +0000 (UTC)
> "Eddie Forren" <eforren@cox.net> wrote in message
> news:JB8ga.26822$jN6.14965@news2.central.cox.net...
> I must really be off in left field or I'm just not making my idea clear.
> I will try one more time. If this doesn't work I will just move on to
> some other subject.
> First some assumptions:
> I heard some discussions about adding extended run time type information
> to the C++ standard. (Bjarne Stroustrup - Software Development and Expo
> 2001-04-10 Audio can be found at
> http://technetcast.ddj.com/tnc_catalog.html in the c++ section.)
> I assume this optional extended run time type information would include
> compile time descriptive information (field data types, field offsets,
> field names, parameter types passed in method function calls, etc) about
> declared fields and methods and a standard way to access the descriptive
> information.
> I thought one of the uses for this information would be to implement
> generic serialization/deserialization routines for arbitrary, variable
> length, C++ data structures. (Java implements this using their run time
> type information).
> This could be very useful in a multi-language, distributed environment
> and it would be easier than using IDL mechanisms (It is very easy to pass
> arbitrary, variable length data structures from client to server using
> standard serialization in Java). Of course, to be truly multi-language,
> other languages besides Java would have to add similar
> self-description/serialization facilities. I think .NET tries to do some
> of this by defining a computing platform and extensions for languages,
> but I would like to see a smaller set of extensions that do not violate
> the original intent and spirit of each language.
> Using a std::vector is fine for dynamic arrays unless you want to
> implement a generic serializer that uses extended run time type
> information to serialize/deserialize any arbitrary C++ data structure.
> Since other STL collections do not use vector when they need an array of
> data (I think they tend to use a structure similar to the one in my
> previous message), the generic serializer would need knowledge of
> internal implementation details of each STL collection type to serialize
> each collection type. If new collection types were added in the future,
> then the serializer would have to be updated. Am I missing something? If
> I had extended type information in a compiler today, could I write a
> generic serializer for arbitrary data structures declared in C++? (The
> serializer should only have knowledge of a fixed set of pre-defined data
> types). It seems like it would be useful to add dynamic arrays to the
> language and encourage their use so that run time type information would
> be completely descriptive for any arbitrary data structure and generic
> serialization would then be feasible (at least when dynamic arrays were
> used).
You are assuming that a generic serializer will be built directly into the
C++ language and that it will have no part which uses the idea of a C++
library. This may not be the case. The generic serializer may consist of
language additions purely, it may consist of library additions purely, or it
may consist of a mix of both.
The other thing which I think you are missing is that all complex types are
built on top of basic types. So the std::vector<> object, C++'s idea of a
dynamic array, could be serialized by serializing its basic type data just
as any complex class could be whether now or in the future.
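A sketch of the point that complex types are aggregations of basic types, assuming a toy text format and hypothetical `save`/`load` overloads. Only the `int` overloads touch the stream; `point` and `std::vector<T>` are handled by composing them, in both directions.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Basic type: the only overloads that touch the stream directly.
inline void save(std::ostream& os, int x) { os << x << ' '; }
inline void load(std::istream& is, int& x) { is >> x; }

// A user-defined type is saved and loaded member by member...
struct point { int x, y; };
inline void save(std::ostream& os, const point& p) { save(os, p.x); save(os, p.y); }
inline void load(std::istream& is, point& p) { load(is, p.x); load(is, p.y); }

// ...and std::vector<T> as an element count followed by the elements.
template <typename T>
void save(std::ostream& os, const std::vector<T>& v) {
    save(os, static_cast<int>(v.size()));
    for (const T& e : v) save(os, e);
}

template <typename T>
void load(std::istream& is, std::vector<T>& v) {
    int n = 0;
    load(is, n);
    assert(n >= 0);  // a negative count would mean corrupt input
    v.assign(static_cast<std::size_t>(n), T());
    for (T& e : v) load(is, e);
}
```

Each new type needs only a pair of functions written in terms of existing ones, which is the aggregation argument in code form.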
I personally believe the introduction of more built-in types, to replace
types which are implemented by libraries via templates, just to support
serialization, is a horrible idea. I don't want to go backward into more C
style constructs to support such an idea. But as I pointed out previously,
it is not needed. Even in the case where serialization may be just a
language addition to handle all basic types without any library support
whatever, it is not needed since complex types are just aggregations of
basic types.
---
Author: eforren@cox.net ("Eddie Forren")
Date: Sun, 30 Mar 2003 20:46:41 +0000 (UTC)
===================================== MODERATOR'S COMMENT:
Please don't overquote.
===================================== END OF MODERATOR'S COMMENT
> Nothing will ever handle stuff like this:
>
> int * ptrTable[256];
>
> struct my_struct
> {
> int idx; //index into ptrTable
> };
This is a good reason for the serialization interface to be extensible.
Java allows standard serialization to be overridden. C++ could use the
same mechanism, or it could use function objects or function pointers to
allow the standard serialization behavior to be overridden for a specific
class.
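A sketch of overriding the standard behavior with function objects, as suggested above. The `serializer_registry` class and `write_index` are hypothetical; overrides are keyed by the static type `T` through `std::type_index`, and the example override resolves the `ptrTable` index into the value it refers to instead of writing the raw index.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

int* ptrTable[256];             // from the post
struct my_struct { int idx; };  // index into ptrTable

// Default behavior for my_struct: write the raw index.
std::string write_index(const my_struct& m) { return std::to_string(m.idx); }

// Hypothetical registry of per-type overrides, keyed by the static type T.
class serializer_registry {
public:
    template <typename T>
    void set_override(std::function<std::string(const T&)> f) {
        overrides_[std::type_index(typeid(T))] =
            [f](const void* p) { return f(*static_cast<const T*>(p)); };
    }

    // Uses the override registered for T if there is one,
    // otherwise the supplied default function object.
    template <typename T, typename Default>
    std::string write(const T& obj, Default default_writer) const {
        auto it = overrides_.find(std::type_index(typeid(T)));
        if (it != overrides_.end())
            return it->second(&obj);
        return default_writer(obj);
    }

private:
    std::map<std::type_index, std::function<std::string(const void*)>> overrides_;
};
```

The default path needs no registration at all; only classes like `my_struct`, whose meaning depends on outside data, register a replacement.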
> The major snag I've run into is how to write the serialization
> routines. If a class declares the serialization routine as 'friend'
> then it is fairly easy for an external tool to generate a function
> that can use the compiler to access each member. If you want
> transparency, then you're going to have to figure out how the compiler
> lays out data structures and emulate the algorithm in the external
> tool. This is possible for GCC, because the C++ ABI it uses is
> publicly documented at CodeSourcery, but might not be possible for
> something like Visual C++.
friend is okay. I would prefer transparency.
"Rayiner Hashem" <heliosc@mindspring.com> wrote in message
news:a3995c0d.0303252120.1a2ba638@posting.google.com...
> [snip]
---
Author: eforren@cox.net ("Eddie Forren")
Date: Sun, 30 Mar 2003 20:46:52 +0000 (UTC)
===================================== MODERATOR'S COMMENT:
Please don't overquote.
===================================== END OF MODERATOR'S COMMENT
> You are assuming that a generic serializer will be built directly into the
> C++ language and that it will have no part which uses the idea of a C++
> library. This may not be the case. The generic serializer may consist of
> language additions purely, it may consist of library additions purely, or
> it may consist of a mix of both.
I was assuming the serialization code would exist in a library, not in the
language. The serializer is not part of the Java language either.
> The other thing which I think you are missing is that all complex types
> are built on top of basic types. So the std::vector<> object, C++'s idea
> of a dynamic array, could be serialized by serializing its basic type
> data just as any complex class could be whether now or in the future.
The problem with this is that std::vector is not always used when you need
a dynamic array. I'm not sure, but I don't think it is always used within
the STL when a dynamic array is needed. I will find out the answer to this
question soon, when I write my serializer against the existing language
and STL implementation(s). As long as the STL always uses vector internally
for a dynamic array, the only STL type that the serializer needs to know
about will be vector, and that is perfectly fine with me.
> I personally believe the introduction of more built-in types, to replace
> types which are implemented by libraries via templates, just to support
> serialization, is a horrible idea.
It may be a bad idea, or it may be an idea that violates the spirit of
C/C++. I knew this when I wrote the original email, though I'm not sure
why. To me, the pre-defined type information in C/C++ is more incomplete
than it needs to be. The user of a reflection API should be able to
determine the type and length of something from its type information. This
should be true whether the information is allocated on the heap, in a
memory-mapped file, or even in a simple array by overloading the placement
new operator.
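For comparison with what standard RTTI provides today, a small sketch (the type names and `dynamic_type_of` are hypothetical): `typeid` reports the dynamic type only for polymorphic classes, and for any class it says nothing about how many objects a pointer addresses or how the memory was obtained.

```cpp
#include <cassert>
#include <typeinfo>

struct plain_base { int x = 0; };                 // no virtual functions
struct plain_derived : plain_base { int y = 0; };

struct poly_base { virtual ~poly_base() {} };     // polymorphic
struct poly_derived : poly_base { int y = 0; };

// For a polymorphic class, typeid(*p) yields the most-derived type of the
// object; for a non-polymorphic class it yields only the static type Base.
// Neither case reveals an element count or whether the object came from
// new, new[], placement new, or a memory-mapped buffer.
template <typename Base>
const std::type_info& dynamic_type_of(const Base* p) { return typeid(*p); }
```

This is the gap the post describes: a reflection API would need both the dynamic type and the extent of the storage, and standard RTTI supplies at most the former.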
"Edward Diener" <eldiener@earthlink.net> wrote in message
news:wB8ha.865$rN3.37636@newsread2.prod.itd.earthlink.net...
> "Eddie Forren" <eforren@cox.net> wrote in message
> news:JB8ga.26822$jN6.14965@news2.central.cox.net...
> > I must really be off in left field or I'm just not making my idea clear. I
> > will try one more time. If this doesn't work I will just move on to some
> > other subject.
>
> > First some assumptions:
>
> > I heard some discussions about adding extended run time type information
> > to the C++ standard. (Bjarne Stroustrup - Software Development and Expo
> > 2001-04-10. Audio can be found at
> > http://technetcast.ddj.com/tnc_catalog.html in the C++ section.)
> > I assume this optional extended run time type information would include
> > compile time descriptive information (field data types, field offsets,
> > field names, parameter types passed in method function calls, etc.) about
> > declared fields and methods, and a standard way to access the descriptive
> > information.
> > I thought one of the uses for this information would be to implement
> > generic serialization/deserialization routines for arbitrary, variable
> > length C++ data structures. (Java implements this using its run time type
> > information.)
> > This could be very useful in a multi-language, distributed environment,
> > and it would be easier than using IDL mechanisms (it is very easy to pass
> > arbitrary, variable length data structures from client to server using
> > standard serialization in Java). Of course, to be truly multi-language,
> > other languages besides Java would have to add similar
> > self-description/serialization facilities. I think .NET tries to do some
> > of this by defining a computing platform and extensions for languages, but
> > I would like to see a smaller set of extensions that do not violate the
> > original intent and spirit of each language.
> > Using a std::vector is fine for dynamic arrays unless you want to
> > implement a generic serializer that uses extended run time type
> > information to serialize/deserialize any arbitrary C++ data structure.
> > Since other STL collections do not use vector when they need an array of
> > data (I think they tend to use a structure similar to the one in my
> > previous message), the generic serializer would need knowledge of the
> > internal implementation details of each STL collection type to serialize
> > each collection type. If new collection types were added in the future,
> > then the serializer would have to be updated. Am I missing something? If
> > I had extended type information in a compiler today, could I write a
> > generic serializer for arbitrary data structures declared in C++? (The
> > serializer should only have knowledge of a fixed set of predefined data
> > types.) It seems like it would be useful to add dynamic arrays to the
> > language and encourage their use, so that run time type information would
> > be completely descriptive for any arbitrary data structure and generic
> > serialization would then be feasible (at least when dynamic arrays were
> > used).
>
> You are assuming that a generic serializer will be built directly into the
> C++ language and that it will have no part which uses the idea of a C++
> library. This may not be the case. The generic serializer may consist of
> language additions purely, it may consist of library additions purely, or it
> may consist of a mix of both.
>
> The other thing which I think you are missing is that all complex types are
> built on top of basic types. So the std::vector<> object, C++'s idea of a
> dynamic array, could be serialized by serializing its basic type data just
> as any complex class could be whether now or in the future.
>
> I personally believe the introduction of more built-in types, to replace
> types which are implemented by libraries via templates, just to support
> serialization, is a horrible idea. I don't want to go backward into more
> C-style constructs to support such an idea. But as I pointed out
> previously, it is not needed. Even in the case where serialization may be
> just a language addition to handle all basic types without any library
> support whatsoever, it is not needed, since complex types are just
> aggregations of basic types.
>
---
Author: eforren@cox.net ("Eddie Forren")
Date: Mon, 31 Mar 2003 04:24:36 +0000 (UTC)
Sorry for the previous partial response. I only have one comment to add:
> You're definitely going to need runtime support to implement
> serialization. The C++ memory manager has enough information about the
> exact sizes of objects, while the compiler knows about where pointers
> to other objects are. Put the two together, and you've probably got
> enough to implement reasonably transparent serialization.
Using the C++ memory manager is great for prototyping the extended type
information interfaces and serialization code, but I'm not sure it is the
best design choice for implementing a generic serializer library that uses
run time type information.
What if an object is allocated in a memory-mapped file?
What if an object is allocated inside an array by an overloaded operator new?
I think the generic serializer should serialize most "ordinary" C++ data
structures. In Java, ordinary data structures are always on the heap, but
assuming that "ordinary" data structures are always on the heap may be
inconsistent with the spirit and intent of C/C++. In C++, a pointer or an
array reference can refer to memory that is not on the heap.
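A minimal sketch of the second question, assuming a hypothetical `Node` type with a class-specific `operator new` that hands out slots from a static array. Client code still writes plain `new Node{...}`, but the general heap allocator has no record of these blocks, so a serializer that asks the memory manager for block sizes would come back empty-handed.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical list node whose objects are carved out of a static array by a
// class-specific operator new; the general-purpose heap never sees them.
struct Node {
    int value;
    Node* next;

    static void* operator new(std::size_t size);
    static void operator delete(void*) noexcept {}  // slots are never reused
};

namespace {
alignas(Node) unsigned char pool[64 * sizeof(Node)];  // backing storage
std::size_t pool_used = 0;                            // bytes handed out so far
}

void* Node::operator new(std::size_t size) {
    if (size > sizeof(pool) - pool_used) throw std::bad_alloc();
    void* slot = pool + pool_used;  // next free slot inside the static array
    pool_used += size;
    return slot;
}
```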
"Rayiner Hashem" <heliosc@mindspring.com> wrote in message
news:a3995c0d.0303252120.1a2ba638@posting.google.com...
> You're definitely going to need runtime support to implement
> serialization. The C++ memory manager has enough information about the
> exact sizes of objects, while the compiler knows about where pointers
> to other objects are. Put the two together, and you've probably got
> enough to implement reasonably transparent serialization. Nothing will
> ever handle stuff like this:
>
> int * ptrTable[256];
>
> struct my_struct
> {
> int idx; //index into ptrTable
> };
>
> But it's not entirely clear that you'd want the runtime to handle
> stuff like this anyway (too ambiguous, probably don't want to store
> everything referred to by ptrTable along with my_struct, etc). I've
> been looking into hacking up some serialization code, and I think I'm
> making some progress (conceptually, at least). There is a project
> (http://www.omegahat.org/GccTranslationUnit/) that has written a
> Python utility to parse GCC's internal compiler information. This
> might very well provide enough information for generic serialization.
> The major snag I've run into is how to write the serialization
> routines. If a class declares the serialization routine as 'friend'
> then it is fairly easy for an external tool to generate a function
> that can use the compiler to access each member. If you want
> transparency, then you're going to have to figure out how the compiler
> lays out data structures and emulate the algorithm in the external
> tool. This is possible for GCC, because the C++ ABI it uses is
> publicly documented at CodeSourcery, but might not be possible for
> something like Visual C++. If anybody can come up with test cases that
> can't be deduced either from compiler info, or by the runtime, I'd
> appreciate it if you'd post on this thread.
>
> >
> > struct my_struct
> > {
> > int* my_int;
> > int* my_int_array;
> > };
> >
>
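Rayiner's friend-based approach might look like the following sketch. The `serialize` function stands in for what a hypothetical external tool could generate; `Buffer`, `put` and the `my_int_count` member are assumptions added for illustration. The friend declaration lets generated code reach private members, but the tool still has to be told that `my_int_array` points to several ints, since `int*` by itself does not say so.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

using Buffer = std::vector<unsigned char>;

// Append the bytes of a trivially copyable value (host byte order).
template <class T>
void put(Buffer& out, const T& value) {
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));
    out.insert(out.end(), bytes, bytes + sizeof(T));
}

class my_struct {
public:
    my_struct(int* one, int* many, std::size_t count)
        : my_int(one), my_int_array(many), my_int_count(count) {}

    // The class opts in by befriending the (tool-generated) routine.
    friend void serialize(Buffer& out, const my_struct& s);

private:
    int* my_int;               // points to exactly one int
    int* my_int_array;         // points to my_int_count ints
    std::size_t my_int_count;  // hypothetical extra member: int* alone can't say this
};

// What a hypothetical external tool might emit for my_struct. The friend
// declaration grants access to private members; the element count has to
// come from somewhere other than the type of my_int_array.
void serialize(Buffer& out, const my_struct& s) {
    put(out, *s.my_int);
    put(out, s.my_int_count);
    for (std::size_t i = 0; i < s.my_int_count; ++i) put(out, s.my_int_array[i]);
}
```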
---
Author: heliosc@mindspring.com (Rayiner Hashem)
Date: Tue, 1 Apr 2003 05:22:21 +0000 (UTC)
To tell the truth, I really don't think there is any other way to
support certain cases. For the most part, you can use RTTI to
statically determine the size of objects. For example, the static size
of all stack variables can be determined by the compiler. For *most*
objects allocated on the heap, you can also use RTTI. If you're
serializing a pointer to a non-polymorphic type, then you're sure of
the static size. If you're serializing a pointer to a polymorphic
type, you can use the RTTI info in the v-table to get the size. The
only case where you *need* the memory manager is in the case of
arrays. Existing C++ constructs already require the memory manager to
keep some accounting information. Take an operation like:
int * i = new int[256];
Usually, the compiler will turn this into __vec_new(size_t sz, size_t
count). __vec_new() is pretty much required to store count before the
start of the array, so the array can be freed later. There isn't
really another way of handling it. So if you want to serialize the
array 'i', you can use the same 'count' token that delete[] would use.
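The cookie technique described above can be sketched as follows. `vec_new`, `vec_count` and `vec_delete` are hypothetical stand-ins for the compiler's internal helpers, not real library functions, and a real ABI may lay things out differently; for example, the Itanium C++ ABI used by GCC stores no cookie at all when the element type has a trivial destructor, as `int` does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Bytes reserved in front of the elements for the count, rounded up so the
// first element stays suitably aligned for T.
template <class T>
constexpr std::size_t cookie_size() {
    return (sizeof(std::size_t) + alignof(T) - 1) / alignof(T) * alignof(T);
}

// Hypothetical stand-in for __vec_new: allocate, record the count in the
// cookie, and value-initialize each element. (No exception safety.)
template <class T>
T* vec_new(std::size_t count) {
    unsigned char* raw = static_cast<unsigned char*>(
        ::operator new(cookie_size<T>() + count * sizeof(T)));
    std::memcpy(raw, &count, sizeof count);
    T* first = reinterpret_cast<T*>(raw + cookie_size<T>());
    for (std::size_t i = 0; i < count; ++i) new (first + i) T();
    return first;
}

// Read the count back from the cookie, as delete[] (or a serializer) would.
template <class T>
std::size_t vec_count(const T* first) {
    const unsigned char* raw =
        reinterpret_cast<const unsigned char*>(first) - cookie_size<T>();
    std::size_t count;
    std::memcpy(&count, raw, sizeof count);
    return count;
}

// Hypothetical stand-in for the delete[] helper: destroy in reverse order,
// then release the whole block including the cookie.
template <class T>
void vec_delete(T* first) {
    std::size_t count = vec_count(first);
    for (std::size_t i = count; i > 0; --i) first[i - 1].~T();
    ::operator delete(reinterpret_cast<unsigned char*>(first) - cookie_size<T>());
}
```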
> Using the C++ memory manager is great for prototyping the extended type
> information interfaces + serialization code, but I'm not sure it is the best
> language choice from the perspective of implementing a generic serializer
> library that uses run time type information.
---
Author: allan_w@my-dejanews.com (Allan W)
Date: Thu, 3 Apr 2003 01:28:41 +0000 (UTC)
heliosc@mindspring.com (Rayiner Hashem) wrote
> Existing C++ constructs already require the memory manager to
> keep some accounting information. Take an operation like:
> int * i = new int[256];
> Usually, the compiler will turn this into __vec_new(size_t sz, size_t
> count). __vec_new() is pretty much required to store count before the
> start of the array, so the array can be freed later. There isn't
> really another way of handling it.
Two nits here.
Obviously the name "__vec_new" isn't mandated; the run-time system can
use any name it wants, so long as it won't conflict with your names.
Also, there's a practical requirement to store the amount of memory
*somewhere* that can be retrieved when you call delete[]. It does
not have to be "before the start of the array," although that's a common
technique. Another possible method would have the run-time system
maintain a data structure such as a map, which could associate an
address with the original allocation size. There are others, I'm sure.
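The map-based alternative might be sketched like this; `array_counts`, `side_table_new` and `side_table_delete` are hypothetical names. Nothing is stored next to the elements, so there is no word in front of the array for anyone to read; only code that knows about the table can recover the count.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <new>

// Hypothetical run-time bookkeeping kept outside the allocated blocks:
// a map from each array's address to its element count.
std::map<const void*, std::size_t>& array_counts() {
    static std::map<const void*, std::size_t> table;
    return table;
}

// Allocate and value-initialize count elements, then record the count.
// (No exception safety, for brevity.)
template <class T>
T* side_table_new(std::size_t count) {
    T* first = static_cast<T*>(::operator new(count * sizeof(T)));
    for (std::size_t i = 0; i < count; ++i) new (first + i) T();
    array_counts()[first] = count;
    return first;
}

// Look the count up again, destroy in reverse order, and release the block.
template <class T>
void side_table_delete(T* first) {
    auto entry = array_counts().find(first);
    std::size_t count = entry->second;
    array_counts().erase(entry);
    for (std::size_t i = count; i > 0; --i) first[i - 1].~T();
    ::operator delete(first);
}
```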
> So if you want to serialize the
> array 'i', you can use the same 'count' token that delete[] would use.
I've been down this slope. It's very compelling. At runtime we have to
"know" the size of every memory allocation, and we have to "know" the
number of elements in an array. Why not combine them?
But this simply can't work. First of all, the code that destroys an
array isn't always called due to a delete[].
void foo() {
    MyObject o[10];
}
When foo exits, it needs to destroy all ten MyObject objects. But
these objects never were on the heap, so we have to have some other
way to know how many objects were in the array.
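The `foo` example can be checked with a small sketch; the destruction counter in `MyObject` is an addition for illustration. All ten destructors run when `foo` returns, although the array never touched the heap, so the compiler must take the count from the declaration rather than from any allocator.

```cpp
#include <cassert>

// Hypothetical element type that counts its own destructions, so the
// effect of leaving foo() can be observed.
struct MyObject {
    static int destroyed;
    ~MyObject() { ++destroyed; }
};
int MyObject::destroyed = 0;

void foo() {
    MyObject o[10];  // automatic array: no operator new[] and no delete[]
}  // leaving scope destroys all ten elements; the count comes from the
   // declaration, which the compiler knows statically
```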
But this could be a parameter to the internal delete-an-array
logic, right? After all, we know the size of the o array, so it
still seems to work similarly.
No. The real problem is that global operator new can be replaced by
the user. If the user does replace global operator new, then of course
the user must also keep track of the size of this block. But there's
nothing that says the user has to do it the same way as the operator
new in the standard library. Perhaps the run-time system stores the
size just before the allocated memory, as you suggested. But perhaps
the user has a hash map instead (this would be very common for
debugging allocators that want to track problems with heap usage). So
now the user's code "knows" how large the block is, and the original
run-time code does not.
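A minimal sketch of this point, assuming a debugging allocator that replaces the global `operator new` and `operator delete` and keeps each block's size in a header of its own design. `debug_block_size` is a hypothetical name chosen by the user; nothing in the language lets compiler-generated code discover or call it. The array forms are not replaced here because their default versions forward to these functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Bytes reserved in front of each block, rounded up to keep the result
// aligned for any fundamental type.
constexpr std::size_t kAlign = alignof(std::max_align_t);
constexpr std::size_t kHeader =
    (sizeof(std::size_t) + kAlign - 1) / kAlign * kAlign;

// User replacement of the global allocation functions. The requested size
// lives in a header that only this allocator knows how to read.
void* operator new(std::size_t size) {
    void* raw = std::malloc(kHeader + size);
    if (raw == nullptr) throw std::bad_alloc();
    std::memcpy(raw, &size, sizeof size);
    return static_cast<unsigned char*>(raw) + kHeader;
}

void operator delete(void* p) noexcept {
    if (p != nullptr) std::free(static_cast<unsigned char*>(p) - kHeader);
}

// Sized and nothrow forms, routed through the replacements above.
void operator delete(void* p, std::size_t) noexcept { ::operator delete(p); }

void* operator new(std::size_t size, const std::nothrow_t&) noexcept {
    try {
        return ::operator new(size);
    } catch (...) {
        return nullptr;
    }
}

void operator delete(void* p, const std::nothrow_t&) noexcept { ::operator delete(p); }

// Hypothetical query with a name the user chose. Code generated for
// delete[] cannot call it: nothing tells the compiler it exists.
std::size_t debug_block_size(const void* p) {
    std::size_t size;
    std::memcpy(&size, static_cast<const unsigned char*>(p) - kHeader, sizeof size);
    return size;
}
```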
Many systems have some way of asking, "here's the address of something
on the heap -- how big is it?" But this is non-standard. Even if the
user does implement such a mechanism, the original run-time system
cannot possibly know the name of it. So it cannot request this
information.
The compiler cannot ask the heap for the size of a block, in order to
"know" how many objects are in the array. In order to handle destructors
properly, it is forced to keep track of the size in some other way.
(But knowing it at compile-time might be enough -- it doesn't have to
hold it in any particular place in memory.) Perhaps THAT number could
also be used for serialization, but the size of the heap allocation cannot.
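For a genuine array object, "knowing it at compile-time" is something standard C++ already exposes: the element count is part of the array's type and a template can deduce it. `element_count` and `copy_out` below are hypothetical helpers, a sketch of how a serializer could use that compile-time number.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For a genuine array object the element count is part of the static type,
// so a template can recover it without any run-time bookkeeping.
template <class T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) {
    return N;
}

// Hypothetical serializer entry point that takes the length from the type.
template <class T, std::size_t N>
std::vector<T> copy_out(const T (&a)[N]) {
    return std::vector<T>(a, a + N);
}

// Passing a plain int* to either template does not compile: once the array
// has decayed to a pointer, the count is no longer in the type.
```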
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: rogero@howzatt.demon.co.uk ("Roger Orr")
Date: Thu, 20 Mar 2003 18:12:54 +0000 (UTC)
"Eddie Forren" <eforren@cox.net> wrote in message
news:SL4da.58995$JE5.24324@news2.central.cox.net...
[snip]
> My question is this:
> When Java implements automatic serialization, I believe it does so
> without any class specific operations. It does not know about specific
> collection classes like vector or hash table. I think it does this because
> it knows the complete type of every attribute in a class. (Everything goes
> through the new keyword).
Java does know the type of everything - but that's not because of the 'new'
keyword but because the Java specification defines 'meta' information for
all classes, arrays and primitive data types. The compiler must create such
information and the virtual machine makes use of it. See the
'java.lang.Class' class for more information.
In order for C++ to deal with the issue in a flexible way C++ would need an
infrastructure, supported at both compile and run time, to produce and
manage type information.
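To picture the compile-time half of such an infrastructure, here is a hand-written field table of the kind a compiler could emit automatically. `FieldInfo`, `FieldKind`, `Point` and `describe` are hypothetical names; nothing like this is generated by standard C++, which is why the table has to be maintained by hand in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical descriptor for one data member: name, kind and offset.
enum class FieldKind { Int, Double };

struct FieldInfo {
    const char* name;
    FieldKind kind;
    std::size_t offset;
};

struct Point {
    int id;
    double x;
    double y;
};

// Written by hand here; every name and offset must be kept in sync manually.
const FieldInfo point_fields[] = {
    {"id", FieldKind::Int, offsetof(Point, id)},
    {"x", FieldKind::Double, offsetof(Point, x)},
    {"y", FieldKind::Double, offsetof(Point, y)},
};

// A generic routine that walks the table instead of knowing Point directly.
std::string describe(const void* object, const FieldInfo* fields, std::size_t count) {
    const unsigned char* base = static_cast<const unsigned char*>(object);
    std::string out;
    for (std::size_t i = 0; i < count; ++i) {
        if (i != 0) out += ' ';
        out += fields[i].name;
        out += '=';
        if (fields[i].kind == FieldKind::Int) {
            int v;
            std::memcpy(&v, base + fields[i].offset, sizeof v);
            out += std::to_string(v);
        } else {
            double v;
            std::memcpy(&v, base + fields[i].offset, sizeof v);
            out += std::to_string(v);
        }
    }
    return out;
}
```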
At present the C++ standard does not make any requirements of this type, but
various people do seem to be interested in proposing standard way(s) of
representing and accessing such meta information.
I suspect such extended RTTI could become an optional part of the language
at some future date, but I don't think it is likely to become mandatory.
Some implementations of C++ may support this sort of information already -
however at present the way such information is presented is non-standard and
hence non-portable. One example is Microsoft's managed C++ (which runs
inside the .NET infrastructure) which does provide C++ code with access to
comprehensive information about class types, etc. although this language
does have some restrictions compared to 'standard' C++.
Hope this helps,
Roger Orr
--
MVP in C++ at www.brainbench.com
---
Author: eforren@cox.net ("Eddie Forren")
Date: Mon, 17 Mar 2003 21:31:30 +0000 (UTC)
Hello everyone. I'm new, so please feel free to ignore/correct any
ignorance on my part.
I read most of the commentary on properties and RTTI.
My current opinions are:
1. The property issue should be decided independently of
   introspection issues.
2. I don't understand the difference between a public attribute
   and a property. Since elegant abstractions are not always possible, I do
   believe public attributes and/or properties are useful.
My interest in introspection for C++ is as follows:
1. Java provides automatic serialization/deserialization for application
   data. I assume Java uses self-description capabilities to implement this.
   This is very useful in distributed computing environments.
2. I would like to see introspection/self-description capabilities added to
   many languages: C++, FORTRAN, C if it does not merge with C++, etc. This
   would potentially allow automatic serialization/deserialization of
   information in a multi-language environment without IDL. I think it is
   important that the self-description capabilities of each language be
   consistent with the spirit of each language.
3. I think introspection can be used for much more than GUI builders. It may
   be (is?) useful in making many vertical applications much more flexible
   and generic.
My question is this:
When Java implements automatic serialization, I believe it does so without
any class-specific operations. It does not know about specific collection
classes like vector or hash table. I think it does this because it knows the
complete type of every attribute in a class. (Everything goes through the new
keyword.)
How should C++ deal with this issue, since the user can allocate data
structures with a typeless malloc function call? For example, a user might
have a structure or class with the following:
class X
{
    std::vector<char> my_list_1;
    std::vector<short> my_list_2;
};
The serializer could either:
1. Use introspection and be knowledgeable of std collection types. (Not very
   flexible, because new collection types will impact the serializer.)
2. Make all memory allocations typed in some way, i.e. malloc would allocate
   an array of bytes, while new allocations are typed.
????
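Option 2 can be approximated in library form, assuming hypothetical `typed_new`, `raw_alloc` and `typed_delete` functions that record each allocation's element type and count in a registry. It only works when every allocation goes through these functions, and it costs an extra record per allocation; a malloc-style request is recorded simply as an array of bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <typeindex>
#include <typeinfo>

// Hypothetical record attached to every allocation under option 2.
struct AllocInfo {
    std::type_index type;
    std::size_t count;
};

std::map<const void*, AllocInfo>& registry() {
    static std::map<const void*, AllocInfo> table;
    return table;
}

// Typed allocation: remembers element type and count alongside the block.
template <class T>
T* typed_new(std::size_t count) {
    T* p = new T[count]();
    registry().emplace(p, AllocInfo{std::type_index(typeid(T)), count});
    return p;
}

// malloc-like allocation: all that is known is "an array of bytes".
void* raw_alloc(std::size_t bytes) {
    return typed_new<unsigned char>(bytes);
}

// Forget the record, then free the array.
template <class T>
void typed_delete(T* p) {
    registry().erase(p);
    delete[] p;
}
```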
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]