Thread

Topic: sizeof functions

Author: qrczak@knm.org.pl ("Marcin 'Qrczak' Kowalczyk")
Date: Tue, 6 Jan 2004 20:55:35 +0000 (UTC)

On Tue, 06 Jan 2004 15:35:45 +0000, galathaea wrote:

> And I would imagine that as long as the function body was above
> the sizeof in the translation unit, it would be able to calculate a machine
> code function size for the block prior to translating the sizeof.

What if the compiler outputs assembler source and not binary machine code,
like GCC? Now it doesn't care how many bytes will each instruction take.

--
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: galathaea@excite.com (galathaea)
Date: Sun, 11 Jan 2004 16:15:22 +0000 (UTC)

[Note to moderators: please disregard if this is a repost.  I have not
received receipt notification after several days, and felt a resubmit
was appropriate]

"Francis Glassborow" wrote:
: galathaea writes:
: >Do you believe that it might be necessary to get something
: >working with a compiler like the gnu g++ to further such a
: >proposal?  It would take time to learn the core, but I
: >could look at it as a spare time activity.
:
: I believe that you would need to demonstrate that the costs
: of implementation (I think considerable when I reflect on
: the way that front-ends and back-ends work coupled with
: increasing use of intermediate languages and delay of
: implementation to pre-link stages) would justify the gain.
: I have serious doubts about this and think that it is very
: unlikely ever to get past the very early stage of evolution
: group winnowing.

I thank you very much for your appraisal, and had a few questions
concerning it.  First, do you know of a good metric on the community
support per cost at which a proposal would have a better chance of
passing, or is this usually a non-verbal evaluation for the most part?
In particular, do you think it would be useful if I presented other
communities that have use of such facilities?  Because the work-around
techniques I mentioned arose first in the kernel debugging community,
where API interception techniques are very useful, and these
techniques have appeared in several books now in the system internals
community.  Maybe, combined with the security communities I represent
(both systems security as well as software protection and general
DRM), the total community served, coupled with the absence of a stable
workaround inside the language and a demonstration that the
technology is not too difficult to implement, would possibly be
enough to change the evaluation?

I don't want to waste my time on something that has very little
chance, but I wouldn't mind putting some effort into its passage if it
had a chance.

Thank you again.

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

galathaea: prankster, fablist, magician, liar


Author: galathaea@excite.com (galathaea)
Date: Mon, 12 Jan 2004 17:49:43 +0000 (UTC)

[Note to moderators: please ignore if this is a repost.  I have not
received receipt notification after several days and felt a repost was
appropriate]

"Marcin 'Qrczak' Kowalczyk" wrote:
: galathaea wrote:
:
: > And I would imagine that as long as the function body was above
: > the sizeof in the translation unit, it would be able to calculate a
: > machine code function size for the block prior to translating the
: > sizeof.
:
: What if the compiler outputs assembler source and not binary machine code,
: like GCC? Now it doesn't care how many bytes will each instruction take.

Invoke the assembler! =)

Seriously, that's what I'd do.  Somewhere in that system has got to be the
assembler that does the final transformation to machine code so that I
can execute the output.  If it's attached strongly to the linker, a
little bit of refactoring should be able to free it from any coupling
so it can be invoked elsewhere.  At least, that is what I suspect, but
I am still quite ignorant of the architecture and need to do some
reading...

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

galathaea: prankster, fablist, magician, liar


Author: francis@robinton.demon.co.uk (Francis Glassborow)
Date: Mon, 12 Jan 2004 17:50:01 +0000 (UTC)

In article <b22ffac3.0401110009.402af39a@posting.google.com>, galathaea
<galathaea@excite.com> writes
>I don't want to waste my time on something that has very little
>chance, but I wouldn't mind putting some effort into its passage if it
>had a chance.

I am only a single voice with my own opinion; however, I do not think a
proposal for enabling sizeof to apply to a function would meet any of
the criteria for the evolution of C++ to the next release.

Given that, I think it would need a convincing demonstration that the
technique has very real benefits to a sub-community and no substantial
cost to the broader C++ community.

An early step would be to produce a full C++ compiler that implemented
it. Actually when I think about it, an important step would be to
produce a C compiler that implemented it (it seems to me that having
such would meet almost all the needs that you have described and would
be very much easier to implement).


--
Francis Glassborow      ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
                             or http://www.robinton.demon.co.uk
Happy Xmas, Hanukkah, Yuletide, Winter/Summer Solstice to all.


Author: galathaea@excite.com (galathaea)
Date: Mon, 5 Jan 2004 02:08:04 +0000 (UTC)

I am curious as to whether a mechanism for determining the c++ "byte" size
(ie. equivalent # of chars) of a function block has ever been considered.
It might be difficult to use the existing sizeof as I don't see a natural
way to get around the function pointer conversion on the function instance
name, but something like

fsizeof ( function-instance-name )

seems pretty straightforward (though in my eyes, it has an ugly feel to it
and is reminiscent of some c standard lib functions).

The reason this type of need comes up in my work is that I build operating
system extensions for security applications and there is a need to "inject"
or marshall function blocks across processes and so I need to know how much
memory to move.  The operating systems I work in offer standard dynamic
library / shared object mechanisms, but unfortunately they are not robust
enough to always meet my needs.

There is a standard trick / hack that works on some compiler / linker sets
to just make use of the "next function's pointer" and find the ptrdiff_t.
But it's a fragile technique which depends on there being no code
rearrangement or removal going on behind the scenes.
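
As an illustration of the hack just described, here is a minimal sketch.
The names target, target_end_marker and estimated_size_of_target are
hypothetical, and the sketch assumes a flat address space in which a
function pointer fits in std::uintptr_t.  Nothing in the language promises
that the two functions are adjacent or laid out in source order, so the
number it yields is only a hint.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Two hypothetical functions defined back to back in the source. The
// toolchain is free to reorder, pad, merge or discard them.
int target(int x) { return x + 1; }
int target_end_marker(int x) { return x - 1; }

// Assumption: a function pointer fits in uintptr_t on this platform.
static_assert(sizeof(std::uintptr_t) >= sizeof(int (*)(int)),
              "sketch assumes function pointers fit in uintptr_t");

// Fragile estimate of target's size: the distance between the two entry
// addresses. The pointer-to-integer mapping is implementation-defined.
std::ptrdiff_t estimated_size_of_target() {
    const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(&target);
    const std::uintptr_t end =
        reinterpret_cast<std::uintptr_t>(&target_end_marker);
    return static_cast<std::ptrdiff_t>(end - begin);
}

// The estimate may be negative or include padding, so callers should
// reject anything outside a range they consider sane.
bool estimate_is_plausible(std::ptrdiff_t upper_bound) {
    assert(upper_bound > 0);
    const std::ptrdiff_t n = estimated_size_of_target();
    return n > 0 && n < upper_bound;
}
```

A caller would check estimate_is_plausible(4096) or similar before trusting
the value, which is exactly the kind of guesswork a language-level facility
would remove.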

There are three other places I have found I may have uses for such a
language facility: function serialisation, interception stubs, and self
modifying code.  Function serialisation and dynamic updating, like the
marshalling, can again be done through a dynamic library / shared object
feature of the OS, but sometimes that mechanism brings more overhead than
desired (particularly when there are many different functions being released
for serialisation at different times).  Interception stubs are little pieces
of thunk code patched over the head of an API function used to intercept
calls to the API and forward them off to some management trampoline, which is
one of several useful interception techniques used by a number of security
products (for example, in digital rights management).  And self-modifying
code is one of the more effective methods of software protection.

All of the uses I mention require more than just the language feature I am
asking about.  In particular, there is rebasing to consider, but I am more
interested at the moment as to the viability of just the language capability
to find function size.  Would such a language feature unnecessarily restrict
the abstract machine and translation models targetted by the language?
Would the presence of inlining and optimisations that require code movement
inhibit such a feature?  I can imagine a language rule that states that all
function instance names used in a fsizeof should have their associated
function blocks translated to a contiguous memory for which the statement
will refer (as will all function pointers to the instance), but calls to the
function inside the code do not necessarily need to go through this
contiguous version and optimisations still applied.

Does this sound feasible?  I am mostly trying to suggest a language solution
to a problem I regularly encounter for which the alternatives are either a
fragile hack or may introduce unnecessary overhead for my needs.

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

galathaea: prankster, fablist, magician, liar


Author: do-not-spam-benh@bwsint.com (Ben Hutchings)
Date: Mon, 5 Jan 2004 16:10:35 +0000 (UTC)

galathaea wrote:
> I am curious as to whether a mechanism for determining the c++ "byte" size
> (ie. equivalent # of chars) of a function block has ever been considered.
> It might be difficult to use the existing sizeof as I don't see a natural
> way to get around the function pointer conversion on the function instance
> name,

sizeof(function) is currently ill-formed so it would be possible to extend
it.  sizeof can be applied to both lvalues and rvalues so there would be
no pointer-to-function conversion (just as there is no pointer-to-array
conversion in sizeof(array)).
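
The array analogy can be checked directly.  The following is a small
sketch (it uses static_assert, so it assumes C++11 or later; the names
table, callee and table_bytes are made up for illustration).  It shows that
an array operand is measured whole, while a function can today only be
measured through an explicit pointer to it.

```cpp
#include <cassert>
#include <cstddef>

int table[10];
int callee(int x) { return x; }

// sizeof(table) measures the whole array: no array-to-pointer conversion
// is applied to the operand.
static_assert(sizeof(table) == 10 * sizeof(int),
              "array operand does not decay");

// An explicit pointer to the first element is a different operand.
static_assert(sizeof(&table[0]) == sizeof(int*),
              "explicit pointer operand");

// A pointer to a function is an object type, so sizeof applies to it.
static_assert(sizeof(&callee) == sizeof(int (*)(int)),
              "pointer-to-function operand");

// sizeof(callee) itself is ill-formed under the standard, which is the
// gap the proposal would fill. Some compilers may accept it as an
// extension, so it is left commented out here.
// std::size_t n = sizeof(callee);

std::size_t table_bytes() { return sizeof(table); }
```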

> but something like
>
> fsizeof ( function-instance-name )
>
> seems pretty straightforward (though in my eyes, it has an ugly feel to it
> and is reminiscent of some c standard lib functions).
>
> The reason this type of need comes up in my work is that I build operating
> system extensions for security applications and there is a need to "inject"
> or marshall function blocks across processes and so I need to know how much
> memory to move.

This is impossible to do in a portable way even if the size of the code
is known.  There is no standard conversion from function pointers to data
pointers, and I doubt there ever will be.  Code and data may be stored in
separate address spaces (Harvard architecture).
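
To make the point about conversions concrete, here is a hedged sketch (the
names FnPtr, probe, as_data_pointer and as_function_pointer are
hypothetical).  It assumes C++11 or later, where the explicit cast between
function and object pointers is conditionally-supported; the standards
current at the time of this thread did not allow it at all, and a Harvard
architecture implementation may still reject it.

```cpp
#include <cassert>
#include <type_traits>

using FnPtr = int (*)(int);

int probe(int x) { return x * 2; }

// There is no implicit (standard) conversion from a function pointer to
// void*, unlike for object pointers.
static_assert(!std::is_convertible<FnPtr, void*>::value,
              "function pointers do not convert implicitly to void*");
static_assert(std::is_convertible<int*, void*>::value,
              "object pointers do convert implicitly to void*");

// Conditionally-supported since C++11: an implementation may refuse these
// casts. Where they are supported, the round trip yields the original
// pointer value.
void* as_data_pointer(FnPtr f) { return reinterpret_cast<void*>(f); }
FnPtr as_function_pointer(void* p) { return reinterpret_cast<FnPtr>(p); }
```

On the common desktop platforms as_function_pointer(as_data_pointer(&probe))
compares equal to &probe, but that is a property of those platforms rather
than of the language.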

<snip>
> All of the uses I mention require more than just the language feature I am
> asking about.  In particular, there is rebasing to consider, but I am more
> interested at the moment as to the viability of just the language capability
> to find function size.

What could you use it for, except to move functions - which you recognise
requires much more information?  Wouldn't it make more sense to propose
additions to the library that do all the dirty work?

> Would such a language feature unnecessarily restrict
> the abstract machine and translation models targetted by the language?
>
> Would the presence of inlining and optimisations that require code movement
> inhibit such a feature?

Virtual functions often have multiple entry points.  Of course, if
you're thinking purely of static functions, this isn't a concern.
Certainly inlining would prevent you from replacing the body of an
existing function.  There was some proposal for a do-not-inline keyword
some time back; I forget what happened to it.
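
There is no standard do-not-inline keyword, but several compilers offer a
vendor extension with that effect.  The sketch below wraps two of them in a
hypothetical macro, DEMO_NOINLINE; the functions checked_add and call_site
are likewise invented for illustration.  On compilers that match neither
branch the macro expands to nothing and inlining is not suppressed.

```cpp
#include <cassert>

// Vendor-specific spellings of "do not inline this function".
#if defined(_MSC_VER)
#  define DEMO_NOINLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define DEMO_NOINLINE __attribute__((noinline))
#else
#  define DEMO_NOINLINE /* no known extension: no guarantee at all */
#endif

// The compiler is asked to keep an out-of-line body for this function
// instead of expanding it into its callers.
DEMO_NOINLINE int checked_add(int a, int b) { return a + b; }

// A call site that would otherwise be an obvious inlining candidate.
int call_site() { return checked_add(2, 3); }

void self_check() { assert(call_site() == 5); }
```

Such an attribute only asks for an out-of-line copy to exist; it says
nothing about that copy being contiguous or about its size.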

> I can imagine a language rule that states that all
> function instance names used in a fsizeof should have their associated
> function blocks translated to a contiguous memory for which the statement
> will refer (as will all function pointers to the instance), but calls to the
> function inside the code do not necessarily need to go through this
> contiguous version and optimisations still applied.

I don't think that would work with the separate compilation model that
the C++ standard still supports (excepting 'export').

> Does this sound feasible?  I am mostly trying to suggest a language solution
> to a problem I regularly encounter for which the alternatives are either a
> fragile hack or may introduce unnecessary overhead for my needs.

Can you not make use of debugging symbols?


Author: francis@robinton.demon.co.uk (Francis Glassborow)
Date: Tue, 6 Jan 2004 03:00:10 +0000 (UTC)

In article <slrnbvj27o.e4.do-not-spam-benh@tin.bwsint.com>, Ben
Hutchings <do-not-spam-benh@bwsint.com> writes
>sizeof(function) is currently ill-formed so it would be possible to extend
>it.  sizeof can be applied to both lvalues and rvalues so there would be
>no pointer-to-function conversion (just as there is no pointer-to-array
>conversion in sizeof(array)).

How is a compiler going to determine the size of a function? We are
beginning to see compilers that inline code at link time. What is the
sizeof an inlined function? Eventually (and for all I know some already
do) we will have compilers that hoist common code from functions or use
partial inlining etc. How do we define sizeof in such circumstances? The
decision not to allow sizeof to apply to functions was not just some
form of arbitrary 'laziness' but motivated by it being difficult to do
without applying implementation constraints that we did not want.


--
Francis Glassborow      ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
                             or http://www.robinton.demon.co.uk
Happy Xmas, Hanukkah, Yuletide, Winter/Summer Solstice to all.


Author: galathaea@excite.com ("galathaea")
Date: Tue, 6 Jan 2004 05:20:49 +0000 (UTC)

"Ben Hutchings" wrote
: galathaea wrote:
: > I am curious as to whether a mechanism for determining the c++ "byte" size
: > (ie. equivalent # of chars) of a function block has ever been considered.
: > It might be difficult to use the existing sizeof as I don't see a natural
: > way to get around the function pointer conversion on the function instance
: > name,
:
: sizeof(function) is currently ill-formed so it would be possible to extend
: it.  sizeof can be applied to both lvalues and rvalues so there would be
: no pointer-to-function conversion (just as there is no pointer-to-array
: conversion in sizeof(array)).

That's even better (and prettier)!  I should have definitely checked the
standard and my compiler before posting, but after the fact it seems obvious
that since there is no evaluation of the operand (4.1.2, 5.3.3.1), there
wouldn't be any need for an actual representation in need of a conversion.
In fact, after a consultation of the standard in this area, this is all made
quite clear, and is stated several times, including the function-to-pointer
conversion not being applied (5.3.3.4).  I did think of the standard array
example prior to posting, but had confused myself about technicalities of
function types and some of their asymmetry to object types in expressions,
but I think now that was just obfuscation on my part.
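
The "no evaluation of the operand" rule is easy to observe.  In this small
sketch (the names calls, counted and size_without_evaluation are
hypothetical) the function named inside sizeof is never called; only the
type of the call expression is inspected.

```cpp
#include <cassert>
#include <cstddef>

int calls = 0;

// Counts how often it is actually executed.
int counted() {
    ++calls;
    return 0;
}

// The operand of sizeof is unevaluated: counted() is not called here, and
// the result is simply the size of its return type.
std::size_t size_without_evaluation() {
    const std::size_t n = sizeof(counted());
    assert(calls == 0);
    return n;
}
```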

: > but something like
: >
: > fsizeof ( function-instance-name )
: >
: > seems pretty straightforward (though in my eyes, it has an ugly feel to it
: > and is reminiscent of some c standard lib functions).
: >
: > The reason this type of need comes up in my work is that I build operating
: > system extensions for security applications and there is a need to "inject"
: > or marshall function blocks across processes and so I need to know how much
: > memory to move.
:
: This is impossible to do in a portable way even if the size of the code
: is known.  There is no standard conversion from function pointers to data
: pointers, and I doubt there ever will be.  Code and data may be stored in
: separate address spaces (Harvard architecture).

Yes, but c++ doesn't recognise the notion of processes yet anyway, so there
is going to need to be system specific work going on to accomplish this.
And this is the type of "grungy" work where studying up on a compiler's
implementation of reinterpret_cast and its implementation-specific
capabilities can work around any type system problems that may arise, so
although unspecified, I would hold my compilers to the "unsurprising to
those who know the addressing structure of the underlying machine" intent of
the law.

: <snip>
: > All of the uses I mention require more than just the language feature I am
: > asking about.  In particular, there is rebasing to consider, but I am more
: > interested at the moment as to the viability of just the language capability
: > to find function size.
:
: What could you use it for, except to move functions - which you recognise
: requires much more information?  Wouldn't it make more sense to propose
: additions to the library that do all the dirty work?

I think that's the major thing that all of my uses would target.  Perhaps
there might be uses in things like profiling and test base libraries to
display and break down that information as well, but the most common uses of
sizeof in general seem to usually be associated with moving data around
address spaces, serialising, and such.

Additionally, as generative techniques become more popular, I can imagine
another use as well.  It is common in the generative styles of programming
to find empty functions lying around as a "do nothing" alternative for a
feature point.  If these need to be called through a function pointer (not
common in my experience, but my experience is certainly limited), that can
inhibit inlining.  Having something like

sizeof(someFunction) == sizeof(knownEmptyFunction)

to pass around to something like apply_if could certainly be handy (of
course someFunction would likely be a macro parameter here unless / until
some safer metaprogramming paradigm like metacode becomes a part of the
standard).  And this highlights what I think may be the biggest difficulty
with such a feature -- making it work during translation when the
information would be most easily available afterwards.  Of course, it's not a
logical paradox or anything crazy like that, but it does seem to require
that a reference be tossed to the machine code which can be evaluated during
machine code output and can be used in such mechanisms as non-type template
instantiation.  It seems to me, though, that some such mechanism is already
required for function pointers.

As for library work, I think that the main distinction here would be that it
might be nice to make the evaluation during translation for reasons similar
to the previous one I mention, but if this presents too many difficulties
for implementors or is not predicted to be that useful, then tossing even
just the sizeof functionality off to some library routine would be an
acceptable alternative.  I would love it if the library provided utilities
for the entire process of function movement, but I would not find such a
mechanism very useful except when used across process boundaries, which is
not standardly recognised (yet).

: > Would such a language feature unnecessarily restrict
: > the abstract machine and translation models targetted by the language?
: >
: > Would the presence of inlining and optimisations that require code movement
: > inhibit such a feature?
:
: Virtual functions often have multiple entry points.  Of course, if
: you're thinking purely of static functions, this isn't a concern.
: Certainly inlining would prevent you from replacing the body of an
: existing function.  There was some proposal for a do-not-inline keyword
: some time back; I forget what happened to it.

Yes, I am thinking that this utility would only need to occur on statically
resolved function instance names and members.  Dynamically determined
instances can be accomplished through wrapping, much as in "virtual static"
and related idioms, so there is already that capability available with just
static determination included.
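
One possible reading of the wrapping idea is sketched below.  The types
Filter and Doubler and the member apply_impl are invented for illustration,
and the sketch assumes C++11 for override.  The virtual function only
forwards to a static member, so there is always a statically resolved name
that a size operator could in principle be applied to, while callers still
get dynamic dispatch.

```cpp
#include <cassert>

// Interface used through dynamic dispatch.
struct Filter {
    virtual ~Filter() {}
    virtual int apply(int x) const = 0;
};

struct Doubler : Filter {
    // Statically resolved implementation: this is the name a hypothetical
    // sizeof or fsizeof would be applied to.
    static int apply_impl(int x) { return 2 * x; }

    // Thin virtual wrapper that forwards to the static implementation.
    int apply(int x) const override { return apply_impl(x); }
};

// Callers that only know the interface still reach the same code.
int run(const Filter& f, int x) { return f.apply(x); }

void self_check() {
    Doubler d;
    assert(run(d, 4) == Doubler::apply_impl(4));
}
```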

I do agree that inlining and possibly other optimisations that assume a
function will never change its implementation during runtime will need to be
suppressed if such a feature were implemented for the purpose of
self-modifying code.  But I do think self-modifying code has a rather strong
niche nowadays in the software protection domain, where it is quite common
to have movable code segments that also get encrypted / decrypted on the fly
to prevent patching (and knowing a function block size would make this task
much easier!).  So perhaps if I were to sponsor a proposal on this, I would
include the inline suppression as a possible dependency.

: > I can imagine a language rule that states that all
: > function instance names used in a fsizeof should have their associated
: > function blocks translated to a contiguous memory for which the statement
: > will refer (as will all function pointers to the instance), but calls to the
: > function inside the code do not necessarily need to go through this
: > contiguous version and optimisations still applied.
:
: I don't think that would work with the separate compilation model that
: the C++ standard still supports (excepting 'export').

Could you expand on this a bit?  Particularly, what if any functions that
were to be sizeof'd must have visible definitions in the translation unit
prior to the operator being applied, very much like some common requirements
for inlining?

: > Does this sound feasible?  I am mostly trying to suggest a language solution
: > to a problem I regularly encounter for which the alternatives are either a
: > fragile hack or may introduce unnecessary overhead for my needs.
:
: Can you not make use of debugging symbols?

You know, I had never thought about using them for this purpose before!  I
think I would be inhibited from using them in my personal circumstances,
because the products I work on often have strong security requirements that
may prohibit the inclusion of symbols in the release.  And in the Windows
arena, the format is undocumented (though Sven Schreiber has published a lot
of the gory details) and must be parsed through the OS API
SymEnumerateSymbols, for which the SDK documentation explicitly cautions that
the SymbolSize returned is a best-guess value and can be zero.  According to
Sven, even the address can be zero, and he has often found that the two
values are two's complements of each other and thus add to zero!
So at least on this platform, I would be cautious about such an approach,
and I might not gain any stability over the other methods.

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

galathaea: prankster, fablist, magician, liar


Author: galathaea@excite.com ("galathaea")
Date: Tue, 6 Jan 2004 15:35:45 +0000 (UTC)

"Francis Glassborow" wrote:
: How is a compiler going to determine the size of a function?
: We are beginning to see compilers that inline code at link
: time. What is the sizeof an inlined function? Eventually
: (and for all I know some already do) we will have compilers
: that hoist common code from functions or use partial inlining
: etc. How do we define sizeof in such circumstances? The
: decision not to allow sizeof to apply to functions was not
: just some form of arbitrary 'laziness' but motivated by it
: being difficult to do without applying implementation
: constraints that we did not want.

I really hope I did not sound like I was implying any laziness.  I pretty
much figured that there was a reason for it not being included, but I do not
see insurmountable difficulties.  I know we aren't usually supposed to
mention implementation details when speaking of the standard, since we don't
want to colour our expectations out of the abstract machine the language
specifies, but I figure most compilers have pretty much an elaborated
structure very similar to those of standard compiler textbooks (like the
dragon).  And I would imagine that as long as the function body was above
the sizeof in the translation unit, it would be able to calculate a machine
code function size for the block prior to translating the sizeof.  I
understand that there is likely at least one intermediate form that is
compiled to and optimisations may be applied at convenient points along the
intermediate languages, so that this can introduce code mixing / flow.  But
I do not see anything that prohibits marking a function for non-inlining to
maintain one contiguous machine code block which can be referred to, moved
around, etc.  The keyword could even be : contiguous : to stress that the
point is to maintain one copy that the function pointer would then point to
and which accomplishes the function.  It does appear that such a request has
been made prior for different reasons, so it seems to me like this could be
a rather straightforward language modification that takes care of a couple
of minor but useful tasks and helps fill out and complete the capabilities
of the language.  From the evolution of the language as I have watched it on
these boards and in the books, it seems like a handy candidate for proposal,
which is why I chose it.

Do you believe that it might be necessary to get something working with a
compiler like the gnu g++ to further such a proposal?  It would take time to
learn the core, but I could look at it as a spare time activity.


--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

galathaea: prankster, fablist, magician, liar


Author: francis@robinton.demon.co.uk (Francis Glassborow)
Date: Tue, 6 Jan 2004 18:14:56 +0000 (UTC)

In article <8TxKb.7280$vc7.2251@newssvr25.news.prodigy.com>, galathaea
<galathaea@excite.com> writes
>Do you believe that it might be necessary to get something working with a
>compiler like the gnu g++ to further such a proposal?  It would take time to
>learn the core, but I could look at it as a spare time activity.

I believe that you would need to demonstrate that the costs of
implementation (I think considerable when I reflect on the way that
front-ends and back-ends work coupled with increasing use of
intermediate languages and delay of implementation to pre-link stages)
would justify the gain. I have serious doubts about this and think that
it is very unlikely ever to get past the very early stage of evolution
group winnowing.


--
Francis Glassborow      ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
                             or http://www.robinton.demon.co.uk
Happy Xmas, Hanukkah, Yuletide, Winter/Summer Solstice to all.
