Topic: 1/3 - Some C/C++ Optimization Idioms I Wish Compilers Supported (Part


Author: junkmacc1@hotmail.com ("Willow Schlanger (junkmacc1)")
Date: Sun, 3 Nov 2002 22:17:32 +0000 (UTC)
Raw View
Note:
- This is part 1 of 3 of the posting with subject "Some C/C++
Optimization Idioms I Wish Compilers Supported," dated Oct 28, 2002.

Title:
- Some C/C++ Optimization Idioms I Wish Compilers Supported

Author:
- Willow Schlanger (junkmacc@yahoo.com)

Copyright:
- (C) Copyright 2002 Willow Schlanger
- Permission is hereby granted to use this however you like provided
   modified versions are not distributed and provided copyright and
   other information is intact on all copies. No rights whatsoever
   are claimed on the idioms presented; these are placed in the
   public domain.

Contents:
- [[Part I]]   "Returning a Class Method"  (**this post**)
- [[Part II]]  "Final Virtual Method Idiom"  (--see 2nd posting--)
- [[Part III]] "Template Virtual Method Idiom" (--see 3rd posting--)

[[Part I]]   "Returning a Class Method"

I was wonderring if someone could shed some light for me on why so
compilers appear to be missing simple, well-known optimization idioms.
Reading this will require a lot of knowledege of C and x86 assembly
language. In this document I will be using Borland's 32-bit compiler but
rest assured that even g++ has similiar difficulities, as one is
encouraged to verify for oneself. "g()" is declared external to simulate
a real-world situation, since some compilers might inline "g()"
otherwise (but they would not if it were contained in a separate object
file).

Exhibit A1
---------- -----------------
struct s
{
 int x[1024];
};

extern s g();

s g()
{
 s t;
 t.x[0] = t.x[1023] = 0;
 return t;
}

int main()
{
 s p;
 p = g();

 return 0;
}
---------- End of Exhibit A1

Executing "bcc32 -S -O2 <filename>" produces the following
assembly-language source code for the g() function:

Exhibit A2
---------- -----------------
@@g$qv
proc
near
?live16385@0:
  ;
  ; s g()
  ;
@1:
 push ebp
 mov ebp,esp
 add esp,-4092
  ;
  ; {
  ;  s t;
  ;  t.x[0] = t.x[1023] = 0;
  ;
?live16385@16: ; EAX = return
 xor edx,edx
  ;
  ;  return t;
  ;
 mov ecx,1024
?live16385@48: ;
 push eax
 push esi
 push edi
 mov eax,dword ptr [ebp+8]
?live16385@64: ; EAX = return
 mov dword ptr [ebp-4],edx
 mov dword ptr [ebp-4096],edx
 mov edi,eax
 lea esi,dword ptr [ebp-4096]
 rep movsd
  ;
  ; }
  ;
?live16385@96: ;
@3:
@2:
 pop edi
 pop esi
 mov esp,ebp
 pop ebp
 ret
@@g$qv
endp

;...

;int main():
 add esp,-4092 ; why is this only 4092?
 push eax
 push esp
 call @@g$qv
 pop ecx
 xor eax,eax
 add esp,4096
 ret

---------- End of Exhibit A2

As you can see, the compiler idiom used for when a local variable that
is a structure is returned, is to _copy_ the structure to the returned
structure, a pointer to which is provided on the stack.

In short, the compiler is treating the matter essentially the same as
the following situation:

void g(s *ret)
{
 s t;
 for(int i = 0; i < 1024; ++i)
  t.x[i] = i;
 *ret = t;
}

int main()
{
 s p;
 g(&p);

 return 0;
}

In the latter case, the compiler must copy the structure when "*ret =
t;" is encountered. But in the former case, where "t" is returned
instead of being assigned to *ret, an optimization that is fairly
obvious is possible.

In particular, many people are aware that returning a structure will
result in a copy of the entire structure and therefore they write code
like this:

Exhibit B1
---------- -----------------
struct s
{
 int x[1024];
};

extern s g();

void g(s *ret)
{
 ret->x[0] = ret->x[1023] = 0;
}

int main()
{
 s p;
 g(&p);

 return 0;
}
---------- -----------------

This produces the following code:

Exhibit B2
---------- -----------------
@@g$qp1s
proc
near
?live16385@0:
  ;
  ; void g(s *ret)
  ;
@1:
 push ebp
 mov ebp,esp
 mov eax,dword ptr [ebp+8]
  ;
  ; {
  ;  ret->x[0] = ret->x[1023] = 0;
  ;
?live16385@16: ; EAX = ret
 xor edx,edx
 mov dword ptr [eax+4092],edx
 mov dword ptr [eax],edx
  ;
  ; }
  ;
?live16385@32: ;
@2:
 pop ebp
 ret
@@g$qp1s
endp

;...

; int main():
@_main
proc
near
 add esp,-4092
 push eax
 push esp
 call @@g$qp1s
 pop ecx
 xor eax,eax
 add esp,4096
 ret
@_main
endp
---------- -----------------

Observe that no string move instructions are used! E.g. no copying of
the structure from a local variable to the place the caller has set
aside to store the value returned. The function simply writes directly
to that place!

The semantics for using g() has changed, though: one must now do "g(&p)"
instead of "p = g()." This is error-prone, because it is not obvious
that 'ret' is being entirely modified, whereas returning something
gaurentees that (or implies that effect, in C++).

One might think that there is a practical reason for the compiler to do
this. But one could easilly construct a preprocessor to convert the
former code (exhibit A1) into code that compiles similiar to the latter
code (e.g. produces code similiar to exhibit B2).

Exhibit C1
---------- -----------------
struct s
{
 int x[1024];
};

extern s g();

struct s_g : s
{
 s_g()
 {
  s &t = *this;
  t.x[0] = t.x[1023] = 0;
 }
};

s g()
{
 return s_g();
}

int main()
{
 s p;
 p = g();

 return 0;
}
---------- End of Exhibit C1

And observe the assembly-language code produced:

Exhibit C2
---------- -----------------
@@g$qv
proc
near
?live16385@0:
  ;
  ; s g()
  ;
@1:
 push ebp
 mov ebp,esp
 mov eax,dword ptr [ebp+8]
  ;
  ; {
  ;  return s_g();
  ;
?live16385@16: ; EAX = return
 mov edx,eax
 xor ecx,ecx
 mov dword ptr [edx+4092],ecx
 mov dword ptr [edx],ecx
  ;
  ; }
  ;
?live16385@32: ;
@3:
@2:
 pop ebp
 ret
@@g$qv
endp

;...

;int main():
 add esp,-4092
 push eax

 push esp
 call @@g$qv
 pop ecx

 xor eax,eax
 add esp,4096
 ret
---------- End of Exhibit C2

As you can see, the compiler now acts as if the code in Exhibit B1 were
written, yet the semantics of g() is now such that "p = g()" can be used!

We now have the safety and efficiency of g() having the prototype "s g();"

The fact that a preprocessor can be constructed to trick the compiler
into producing better code is a sure sign that the compiler SHOULD be
doing this 'preprocessing' all by itself!

Consider the concept of the C++ constructor, e.g. the following:

void main()
{
 s b;
 //...
 s a = f(g(b));
}

The ability to demand 'a' be valid after it is defined and undefined
before hand (so that using it will trigger a compile-time error) makes
'a' very safe. But "s a = f(g(b))" produces move instructions, so often
people are forced into doing the following instead:

void main()
{
 s b;
 //...
 s tmp1;
 g(b, &tmp);
 s a;
 g(tmp1, &a);
}

This takes up more lines, and 'a' is not valid immediately after it is
defined, e.g. we aren't using the safety of the C++ constructor concept.
Instead, it's only valid after the last call to g(). Worse, losing the
ability to chain function calls precludes an arbitrary precision library
from being able to do e.g. "x = arbitrary_sum(arbitrary_product(a, b),
c)" because that would cause structures to be copied since they are
being returned. Thus instead one would need to split that line into
multiple lines, as was done in the example above. Thus the conciseness
of mathematical notation is lost, as it must be split to multiple lines!
That is the price for efficiency, unless compiler writers start
implementing common programming idioms.

It is true that constructing a preprocessor to solve this is not by any
means trivial since all possibilities must be considered but forcing
people to call f(&x) instead of being able to do x = f() is really not
as safe and is in any case less convenient since one can't do a =
f(g(b)). Such a preprocessor would not use the "struct s_g : s" idiom
but would instead use the "void g(s *ret)" method, but because of the
phase in compiling that this would be done in, it would not be unsafe.

In the mean time, I advise using my "struct s_g : s" idiom, or else
resorting to the unsafe "void g(s *ret)" method that is already in
widespread use.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]