Thread

Topic: CRVO: Let's start the hunt for the last temporary

Author: dave@boost-consulting.com (David Abrahams)
Date: Tue, 21 Mar 2006 19:33:26 GMT Raw View

"Ion Gazta=F1aga" <igaztanaga@gmail.com> writes:

> Since I know you know a lot about exceptions and their implementation
> do you see some evident problem with this approach? The "maybe this is
> already constructed" approach is difficult to implement?

My point was that the semantics of the function in the face of an
exception might change.  If you're going for full backward
compatibility (I don't know if you are), a silent change in semantics
is unacceptable.

--=20
Dave Abrahams
Boost Consulting
www.boost-consulting.com

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]

Author: "=?iso-8859-1?q?Ion_Gazta=F1aga?=" <igaztanaga@gmail.com>
Date: Mon, 20 Mar 2006 15:21:12 CST Raw View

Hi to all,

 Recently reading the move semantics proposal and looking the interface
of some other standard proposals like filesystem, I remembered my old
performance-paranoia ambition to avoid any unnecessary temporary
object, mainly, when the temporary object leads to memory allocation.
This temporary is something that I consider a weak point of otherwise
excellent C++ performance.

1. SOME EXAMPLES AND THEIR PERFORMANCE
----------------------------------------------------------------------------

 Let me present some examples to explain what I call "the last
temporary" that can't be avoided even when we apply return value
optimization and move semantics.

--> EXAMPLE 1, RVO applied:

   std::string get_long_string()
   {
      return std::string("A long string to avoid small string
optimization");
   }

   int main ()
   {
      std::string mystr = get_long_string();
      return 0;
   }

In my Visual 7.1 compiler this leads to a single std::string
construction, since RVO is applied succesfully:

String operations: 1 allocation + 1 copy

--> EXAMPLE 2, assigning to a constructed string:

   #include <string>

   std::string get_long_string()
   {
      return std::string("A long string to avoid small string
optimization. Copy 1");
   }

   std::string get_another_long_string()
   {
      return std::string("A long string to avoid small string
optimization. Copy 2");
   }

   int main ()
   {
      std::string mystr = get_long_string();
      mystr = get_another_long_string();
      return 0;
   }

The "mystr = get_another_long_string();" sentence is executed like
this:

   tmpstr("Another long string to avoid small string optimization. Copy
2");
   mystr = tmpstr;
   //Temp destroyed

If has enough storage to hold the second string, this leads to:

1 allocation + 2 copies + 1 deallocation

If mystr has no enough storage to hold the second string:

2 allocations + 2 copies + 2 deallocations

As we can see, assigning to an already constructed object forces the
creation/destruction of a temporary and an assignment.

--> EXAMPLE 3, assigning to a constructed string with move semantics:

If we activate move semantics to EXAMPLE 2, "mystr =
get_another_long_string();" is executed like this:

   tmpstr("Another long string to avoid small string optimization. Copy
2");
   mystr = std::move(tmpstr);
   //Temp destroyed

So this leads to:
1 allocation + 1 copy + 1 deallocation

If mystr has no enough storage to hold the second string, we get the
same performance, since my_str buffer is always deallocated and the
tmpstr buffer is moved to mystr:

1 allocation + 1 copy + 1 deallocation

So move semantics are really useful to avoid 1 copy in the first case
and 1 allocation + 1 copy + 1 deallocation when mystr has no enough
storage to hold the new string.

As we can see, assigning to an already constructed object with
semantics forces also the creation of a temporary, although that
temporary is moved to the target instead of copied.

--> EXAMPLE 4, the good old C++ advice:

Can we do it better? Yes, let's remember the old C++ advice: "Pass
heavy objects always by reference". If that's a good advice to pass
parameters, why don't we apply it in when returning values?

   #include <string>

   void get_long_string(std::string &ret)
   {
      ret = "A long string to avoid small string optimization. Copy 1";
   }

   void get_another_long_string(std::string &ret)
   {
      ret = "A long string to avoid small string optimization. Copy 2";
   }

   int main ()
   {
      std::string mystr;
      get_long_string(mystr);
      get_another_long_string(mystr);
      return 0;
   }

Let's see the performance of "get_another_long_string(mystr);" sentence

If mystr has enough storage to hold the new string:

0 allocations + 1 copy + 0 deallocations

When mystr has no enough storage:

1 allocation + 1 copy + 1 deallocation

So obtaining results by reference is more efficient because we can
reuse resources. But obviously this syntax is very ugly and that's why
even the standard library likes to return std::string by value in
several classes and
functions:

   std::string mystr("mystr");
   std::string subs = mystr.substr(0, 1);

or

   std::stringstream stringstr;
   std::string result = stringstr.str();

However, the performance penalty can be high if the returned temporary
string has to allocate memory. stringstream example is, in my opinion,
a perfect example of a class that has a big performance disadvantage
when comparing to the C sprintf family, because:

1) Since stringstream can be implemented with another mechanism
different from std::string, it must return a copy of the string to the
user. The most efficient would be to require stringstream to be
implement with a string (after, all, it's called string_stream :-) )
2) The string is returned by value, which creates a temporary.

This makes in my opinion, string manipulation suboptimal. substr() is
also another example. "void substr(std::string &fill_it);" would be
much more effective.

2. SUMMARY OF LEARNT LESSONS
--------------------------------------

-> Returning a heavy object by value is not expensive when the target
is not already constructed (you have to construct the object anyway).
RVO or NRVO come to the rescue.

-> Returning a heavy object by value when the target is already
constructed is certainly expensive. As expensive as passing the value
as a parameter by value. We would almost never pass an string by value.
But we accept that when returning objects.

-> Move semantics give us a big performance boost. But when returning
by value to a constructed target, the move semantics can avoid all
temporaries except for our beloved "last temporary".

-> The most efficient way to obtain a value is using a reference to a
constructed object, so that we can reuse the storage of the object. If
we can't reuse the storage, we have nearly the same performance as with
move semantics.

-> Returning objects by reference leads to an ugly syntax and more
typing. Returning values is a more natural way to express a function.

2. HOW CAN WE AVOID THIS? A CRAZY IDEA TO START
--------------------------------------------------------------------------------------

The purpose of this post is to ask if we can imagine a new C++
mechanism to kill the last temporary. I've been thinking about a
simple, incomplete, and almost wrong proposal, but I would like to set
it as a possible discussion starting point:

Let's implement a conversion function to convert a std::vector in a
std::deque using classic return by value:

   template<class T>
   std::deque<T> vector_to_deque(const std::vector<T> &vect)
   {
      return std::deque<T>(vect.begin(), vect.end());
   }

   int main ()
   {
      std::vector<int> vect1 (/*Fill vector*/);
      std::vector<int> vect2 (/*Fill vector*/);

      //...
      //some other operations
      //...

      //This is optimal. Just one deque is constructed
      std::deque<int> deque = vector_to_deque(vect1);

      //We can't reuse deque's storage for the new value.
      //Temporary will be created and moved (C++0x) to target
      deque = vector_to_deque(vect2);

      return 0;
   }


Imagine now that we can know if the target object is already
constructed and we can construct or prepare the returned object.

   template<class T>
   std::deque<T> vector_to_deque(const std::vector<T> &vect)
   {
      if(std::return_object_constructed){
         //Target constructed, just assign to try to reuse storage
         std::return_object.assign(vect.begin, vect.end());
      }
      else{
         //Initialize return object with a constructor
         std::return_object(vect.begin, vect.end());
      }
      return std::return_object;
   }

   int main ()
   {
      std::vector<int> vect1 (/*Fill vector*/);
      std::vector<int> vect2 (/*Fill vector*/);

      //...
      //some other operations
      //...

      //This is optimal. Just one deque is constructed
      std::deque<int> deque = vector_to_deque(vect1);

      //This is also optimal. If deque has enough storage we
      //have no allocation. Even if there is no enouth storage
      //the deque will just allocate additional new buffers
      //until it can hold the new values, reusing old buffers.
      //
      //This will be always more efficient that moving
      //a temporary to the target.
      deque = vector_to_deque(vect2);

      return 0;
   }

I will call this std::return_object trick as Constructed Return Value
Optimization (CRVO). To make implementation easier we can have
restrictions:

-> We can only check once if the target is already constructed.

-> The check must have both conditions filled. If true, we can do
whatever, if false, we must construct the std::return_object

   if(std::return_object_constructed){
      std::return_object.assign(/**/);
   }
   else{
      Construction
   }
   std::return_object(/**/);

-> We must always return std::return_object at the end with "return
std::return_object" statement.

3. A CLASSIC EXAMPLE
--------------------

Let's think about one of the most used C++ examples after "class foo":

class customer
{
   public:
   customer(/**/);
   /*Other functions*/
   std::string get_name()  const;
   std::string get_surname()  const;
   std::string get_phone() const;
   std::string get_address() const;
};


This class is quite easy to design but has bad performance if we don't
have reference counted strings and small string optimization is not
enoguth. Another option would be returning const std::string &, but
that would require exposing implementation details or using complicated
object caches. But sometimes exposing a const reference is not
possible/reasonable.

Returning heavy objects by value is what I would call "premature
pessimization", because there is no way to avoid the temporary and
passing a string/vector by value is something we avoid always. Let's
imagine we want to minimize memory allocations:allocations:

//Fill a vector of customers
std::vector<customer> customers;

//If we want to avoid reallocations, reserve beforehand
customers.reserve(/**/)

for(/**/){
   //God bless move semantics!
   customer cust (/**/)
   customers.push_back(std::move(cust));
}

//Now iterate and show data
for(auto it = customers.begin(), itend = customers.end()
   ;it != itend;
   ++it){
   std::cout << "name: "      << it->get_name()     << std::endl
                 << "surname: " << it->get_surname() << std::endl
                 << "address: "  << it->get_address()  << std::endl;
}

So we get (customers.end() - customers.begin()) x 3 allocations,
deallocations and copies. Imagine now we can do the CRVO:

//If we want to avoid reallocations, reserve beforehand
customers.reserve(/**/)

//Now iterate and show data
std::string aux;
aux.reserve(/*big enough to avoid reallocations*/);

for(auto it = customers.begin(), itend = customers.end()
   ;it != itend;
   ++it){

   //No temporary since CRVO is applied
   aux = it->get_name();
   std::cout << "name: "    << aux << std::endl;

   //No temporary since CRVO is applied
   aux = it->surname();
   std::cout << "surname: " << aux << std::endl

   //No temporary since CRVO is applied
   aux = it->get_address();
   std::cout << "address: " << aux << std::endl;
}

I know we require more typing, but this code is much more efficient.
And the syntax is also attractive.


4. THIS IS A CRAZY IDEA THAT HAS MANY PROBLEMS
-----------------------------------------------------

--> Changes operations semantics since operator = is never called:

mystr = get_str();

This is similar to RVO:

std::string str = get_str();
std::string str2(get_str());

where the compiler is free to optimize the temporary and avoid
side-effects that could
be caused by the copy constructor from the temporary.


--> What happens when we have chained operations?

Consider:

mystr = get_long_string() + "some string" + get_another_long_string();

Should we pass mystr storage to get_another_long_string() and after
that use mystr in every operation and avoid any temporary? Until we
find a rule to do this, we can disable the optimization for chained
operations.

--> Self modification danger? This would require disabling this
optimization when we pass target as an argument
to the function.

Consider:

   std::string add_mp3_extension_to_filename (const std::string
&filename)
   {
      if(std::return_object_constructed){
         /*Oooops, if source is equal to target, we can have problems*/
         return_object.clear();
      }
      else{
         //Default construction
         return_object();
         return_object.reserve(filename.size() + 4);
      }
      return_object =  filename;
      return_object += ".mp3"
   }

   int main()
   {
      std::string filename("/home/user/music/album/song");
      filename = add_mp3_extension_to_filename(filename);
      std::cout << filename;
      return 0;
   }

This function would print ".mp3", instead of
"/home/user/music/album/song.mp3" with the optimization activated. If
we remove "return_object.clear();" statement (which is not necessary,
anyway) the output is correct. The same could happen when we assign a
member variable using a member function that uses that member variable.

5. HOW TO IMPLEMENT THIS
----------------------------------------------

Not being a compiler expert, one thing is clear, CRVO is not a easy
optimization. Moreover, since the called function can be compiled in a
library, the only way to know if the target is already constructed is a
run-time parameter. Could this require an extra parameter to functions
to know if the target is already constructed. Would this break binary
compatibility between old libraries compiled without the optimization
and new ones? This is a question for compiler experts, but an efficient
way to implement CRVO would be:

Consider a generic function:

template <class StringType>
StringType from get_some_text()
{
   if(std::return_object_constructed){
      std::return_object.assign("Some text");
   }
   else{
      std::return_object("Some text");
   }
   return std::return_object;
}

If the target type is always aligned we can use the lower bit of the
target:

   my_str = get_some_text<std::string>();

becomes (pseudo code)

   std::string *my_str_addr = &my_str;
   ++my_str_addr;
   get_some_text_mangled(my_str_addr);

and get_some_text into:

   template <class StringType>
   void from get_some_text_mangled(StringType *target)
   {
      if(target & 0x00000001){
         --target;
         target->assign("Some text");
      }
      else{
         new (target) StringType("Some text");
      }
   }

-> Possible optimization: If the target has trivial constructor,
although the syntax is valid, only the not constructed branch is
generated.

-> Possible optimization: If the function is inline, the compiler
already knows if the target is constructed. It can apply dead code
elimination and eliminate the branch.

--> Possible optimization: Chained return types. If the innermost
function implements the optimization and the outermost function just
returns the value the innermost returns, the optimization can avoid
also the temporary:

Consider:

class ExpensiveHolder
{
   public:
   ExpensiveObject get_expensive_object_impl(const std::string &name)
   {
      auto it = map.find(name);

      if(it == map.end()){
         throw BadSearch();
      }

      if(std::return_object_constructed){
         //If it's already constructed, assign to use external
resources
         std::return_object = it->second;
      }
      else{
         std::return_object(it->second)
      }
      std::return return_object;
   }

   ExpensiveObject get_expensive_object(const std::string &name)
   {
      std::cout << "Returning a expensive object";
      return get_expensive_object_impl();
   }

   //....

   private:
   ExpensiveObjectMap map;
};

ExpensiveObject myobj(/*construct expensive object*/);

//Other operations

myobj = get_expensive_object();

Since the return address is passed down to the innermost function, this
can have a reference to the constructed object.


6. TO SUM UP
------------------------

Returning objects in C++ functions can be expensive. Using tuples to
return several objects is a good idea but returning a tuple of heavy
objects requieres the creation of a temporary when the target is
already constructed and we can't reuse already reserved resources. If
we always suggest passing expensive objects to functions by reference,
we should try to get expensive values by reference. This can help many
C++ areas, specially current string manipulation functions.

This pseudo-proposal has many flaws. But this suggestion just tries to
start a discussion to know if we can somehow avoid this overhead and
use already constructed resources instead of moving temporaries to the
target (without move semantics, this is even worse). The proposed
syntax is just an idea that has problems (some of them presented in
point 3). Can we find a way to hunt the last temporary that will
survive after we apply move semantics?

Regards and sorry for the looooong post,

Ion

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]

Author: David Abrahams <dave@boost-consulting.com>
Date: Mon, 20 Mar 2006 23:28:38 CST Raw View

"Ion Gazta   aga" <igaztanaga@gmail.com> writes:

> This pseudo-proposal has many flaws. But this suggestion just tries to
> start a discussion to know if we can somehow avoid this overhead and
> use already constructed resources instead of moving temporaries to the
> target (without move semantics, this is even worse). The proposed
> syntax is just an idea that has problems (some of them presented in
> point 3). Can we find a way to hunt the last temporary that will
> survive after we apply move semantics?

Of course the first thing I thought of when looking over your proposal
was what happens when an exception is thrown.  I'm not sure this is
what you're saying, but if you're thinking of optimizing by replacing
assignment with move construction, I think that might be an important
question.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]

Author: "=?iso-8859-1?q?Ion_Gazta=F1aga?=" <igaztanaga@gmail.com>
Date: Tue, 21 Mar 2006 09:36:32 CST Raw View

> Of course the first thing I thought of when looking over your proposal
> was what happens when an exception is thrown.  I'm not sure this is
> what you're saying, but if you're thinking of optimizing by replacing
> assignment with move construction, I think that might be an important
> question.

The function would be a mixture between:

   Target function ();
and
   void function (Target &)

I'm don't know much about exceptions (as you know ;-)), but the code
pseudo-code

void function(Target *target)
{
   if(target & 0x00000001){
      --target;
      target->some_func() ;
   }
   else{
      new (target)Target(value);
   }
}

does not seem very difficult to implement exceptions. If the first
condition throws you can view as if the external assignment throws. The
second part would be the usual C++ temporary construction. Of course,
you have a problem if the function you use when the target is already
constructed (some_func) and the assignment throws different exceptions,
because you would expect operator= to be executed.

Target target;

try{
   target = func();
}
catch(assignment_exception &){

}
catch(...){
  //You can end here if some_func does not throw assignment_exception
}

I suppose this is similar to RVO when the copy constructor might throw
a different exception than a normal constructor. Regarding
implementation issues and more advanced exception problems, I must
admit I don't know much about exceptions, even less about their
implementation, so I need a bit of help with this.

Since I know you know a lot about exceptions and their implementation
do you see some evident problem with this approach? The "maybe this is
already constructed" approach is difficult to implement?

Regards,

Ion

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html                      ]