Topic: move semantic in C++
Author: Michael Pryhodko <mpryhodko@westpac.com.au>
Date: Fri, 3 Jun 2005 15:21:01 +0000 (UTC) Raw View
David Abrahams wrote:
> "Michael Pryhodko" <mpryhodko@westpac.com.au> writes:
>
> > {I have an increasing feeling that this thread would be much more at
> > home in comp.std.c++. clc++m is really intended for matters concerned
> > with using the current Standard C++ rather than an in depth discussion
> > of a proposal for the next version. -mod}
>
> OK, followups to this message are directed there.
David did a mistake typing comp.std.c++, so this message will apear on
comp.std.c++ first. :)
> > Ok, you got me here. Here is another idea:
> > How we are doing the same thing in current C++? We are creating class
> > like that:
> >
> > class T
> > {
> > ... // members
> > bool bFlag;
> > ~T()
> > {
> > if (!bFlag)
> > {
> > ... // do usual destruction work
> > }
> > }
> > };
> >
> > and then pass pointer to such object into f:
> >
> > {
> > T var(...);
> > f(&var); // inside f will call T::close() --> 'm_bFlag = true'
> > ...
> > }
> >
> > all is ok. Now it should be clear how it could be implemented with
> > destructive move:
> >
> > class T
> > {
> > ... // no flag here
> > ~T()
> > {
> > ... // usual destruction work
> > }
> > };
> >
> > void f(T% var_name)
> > {
> > T tmp <== var_name; // destructive move
> > }
> >
> > ...
> > {
> > T var(...);
> > f(var);
> > }
> >
> > Description of compiler behavior:
> > for every scope-tied(local/temporary/static) variable compiler
> > generates extra flag which follows that variable in memory and used as
> > a 'destroyed' flag whenever variable should be automatically destroyed
> > (also it could be used in DEBUG builds for diagnostics). Compiler
> > supposed to detect every variable that could be destroyed
> > 'unnaturally'. T% in f's declaration could be used as an indication
> > that:
> > - you could pass only scope-based variables here
>
> That makes destructive move useless for very important things like
> moving the elements of a vector.
Wait a second. It does not make whole 'destructive move' useless for
this. Only this function becomes useless. You can't get away from it!
If you are to move something scope-based you need to update something
(so that leaving the scope won't kill you). If you are moving value
from non-scoped variable -- you do not need to update anything.
Unfortunately that means we have a branch here. This problem could be
solved. For example I have some ideas:
- passing hidden flag (scope-based/free)
- checking in runtime variable address (stack-based variables sometimes
detectable in this way)
- having two function declaration that are doing the same -- one for
scope-based, another -- for the rest.
anyway it is solveable.
> > David, did I proved to you that this mechanism could be implemented
> > efficiently?
>
> No, this is just exactly the mechanism I thought you were proposing in
> the first place.
Cool... That means we invented this independendly :). But you should
agree -- this mechanism is efficient! It could even rise slightly
performance, since check for flag will be always 'inlined', while, when
flag is realized by developer, check for flag lies in destructor which
is not always inlined. In this way we avoid function call for already
destroyed variables.
> >> > 1. Hmm, strange question. How do you measure 'C++ with templates'
> >> > against 'C++ without them'? It is a feature -- you can not measure it's
> >> > "performance".
> >>
> >> 'scuse me, but it's perfectly possible to make useful and instructive
> >> measurements. Howard Hinnant has done real performance measurements
> >> of a standard library implemented with non-destructive move semantics
> >> vs. one implemented without move semantics.
> >
> > I understand, but sometimes it is easy to estimate. In this case I do
> > not have nor time nor sources nor experience building C++ compiler. So
> > 'estimate using logic' is my only option.
>
> No, there are still a few options left. You can write the "equivalent
> C++98 code" just like you were doing above as a way of demonstrating
> how the compiler would implement move. Then compare performance.
> BTW, implicit move (destructive or otherwise) can be completely
> emulated with library techniques so this should not be _too_ painful.
> You can build a move-enabled STL library based on STLPort, for
> example, to see what the effects are.
Hmm... I should have thought about that. Shame on me :)
> >> > 2. Difference between non- and destructive move is the same as between
> >> > using swap and move -- in first case you will pay twice: for swap and
> >> > for destruction of empty object, in second case you'll pay less than
> >> > for 'swap'.
> >>
> >> Except that you additionally incur runtime costs for passing extra
> >> parameters (according to you), and storing/checking these flags.
> >> Until you measure how those costs trade off against the cost of
> >> destroying empty objects, you can't claim your scheme is better.
> >
> > Considering poem above, can I claim it now?
>
> 'Fraid not.
You are unbearable!
> >> > If I'd have exact syntax, with every feature considered and described
> >> > -- I'd put this post on comp.std.c++ :). Also let me point that C++ is
> >> > ALREADY complicated and hard-to-use -- and 'move' feature could make it
> >> > less complicated for use by removing some problems.
> >>
> >> Show me how.
> >
> > I am trying...
> > 1. you could now hold complex values in the container without thinking
> > how it will be moved inside
>
> Already possible, with non-destructive move or even without a language
> extension.
.. but you always need to consider consequences. I think your
objective is wrong.
> > 2. operator new now could return object instead of raw pointer (see 'my
> > C++' for clues) -- this could solve everlasting memory leak problem (or
> > at least it will make it hard to do any kind of leak)
>
> I don't see how that helps.
It solves major problem of C++: newbies always mess things up. Now it
would be hard to make leak.
> > 3. look here:
> > http://blogs.msdn.com/oldnewthing/archive/2005/05/18/419130.aspx
> > What do you see here? I see: "If you wish speed and efficiency do not
> > use classes -- use PODs". And this guy has quite big audience. That
> > kind of crap will go to history -- aren't that great?
>
> a. No it won't. People are superstitious and will remain so
> b. non-destructive move also allows speed and efficiency.
I told later that these arguments apply to both types of 'move'. People
tends to change.
> > Please, note that applies to both kind of 'move'... simply destructive
> > one is more efficient
>
> I am not convinced of that.
Could you give an example of case where nondestructive move is more
efficient?
> > while giving one extra opportunity to shoot yourself. Also my idea
> > with two phased move and passing move context is 'more generic' --
> > i.e. you could move objects which can not be moved by other means,
>
> Example?
see parallel conversation with Howard (insert_move and insert_copy).
> > also it could be used to create complex atomic-like updates.
>
> What use is an "atomic-like" update?
For example giving strong exception safety guarantee for
vector.insert.May I remind you that C++ class constructor is the fine
example of atomic-like update -- and it is used quite extensively.
Bye.
Sincerely yours, Michael.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: Michael Pryhodko <mpryhodko@westpac.com.au>
Date: Fri, 3 Jun 2005 15:21:01 +0000 (UTC) Raw View
[according to moderator's will I am trying to move this thread to
comp.std.c++'s processor :-)]
> Note that we do not require strong exception safety for this case, only
> basic. And again the reason is performance.
Hmm... anyway -- we could get strong guarantee here with efficient
implementation. If every vector's operation could have it -- it is
easier for developer to use vector, because now he won't need to care
about it -- any operation either fail, leaving vector in the previous
state, or succeeds completely. How do you think -- what percentage of
C++ developers over the world knows what is 'strong exception safety'?
;)
> > template<class T>
> > void insert_move(size_t pos, T const& var)
> > {
> > // phase 1 -- could throw
> > // move_1 right part of buffer
> > for(size_t i = size(); i > pos; --i)
> > {
> > move_1(m_buf[i - 1] ==> m_buf[i]);
> > }
> >
> > // remember m_buf[pos]'s binary representation
> > // and returns it back in case of exception
> > auto_hold_data<T> auto_data(m_buf[pos]);
> >
> > // copy construct T at m_buf[pos]
> > m_buf[pos].T(var);
> > // or in pseudocode: construct(m_buf[pos], var);
> >
> > // phase 2 -- throw()
> > // finalize right part movement
> > for(i = size(); i > pos; --i)
> > {
> > move_2(m_buf[i - 1] ==> m_buf[i]);
> > }
> >
> > // deactivate auto_data
> > auto_data.deactivate();
> > }
> >
> >
> > Looks really ugly, but as efficient as it could be. Maybe it is
> > possible to create nice syntax for that? And in this way it is possible
> > to insert-copy array of values atomically.
>
> I'm confused (which is my normal state ;-) ): If I just concentrate on
> what is happening at m_buf[pos] I see:
>
> move_1(m_buf[pos] ==> m_buf[pos+1]);
> auto_hold_data<T> auto_data(m_buf[pos]);
> construct(m_buf[pos], var);
> move_2(m_buf[pos] ==> m_buf[pos+1]);
>
> Exactly what are move_1 and move_2 doing to this location?
move_1 does not change source. It is purpose:
if object can be moved by memcpy (has not external references) does
nothing at all
otherwise -- prepares helper_obj (located for example in local storage)
of type defined by move_1's implementation. helper_obj state will be
used in move_2 to 'finalize' move or to 'undo' move preparations in
case of exception.
move_2:
implicitly copies object binary representation into new place and
executes whatever instruction T has in it's move_2. Exceptions are
forbidden.
Please note that this example gives strong guarantee, and for types
with no-throw copy constructor it is possible to get top performance by
detecting that 'auto_data' is unnecessary. Have not got a foggiest idea
how to do it -- maybe smart optimizer? :).
> > Hmm... it seems I missed something, but in case of nondestructive move
> > you are supposed to:
> > - construct default value at n+1'th position
> > - for every move from [i] to [i+1] you need to fill [i] with meaningful
> > value -- if this is done with 'swap' you'll also have one extra copy to
> > temporary variable
[skip]
> For user-defined types, a valid technique would be to default construct
> the target and then to swap source and target, as you suggest above.
> Another valid technique would be to memcpy from source to target and
> then to memset the source to 0. Still another valid technique would be
> to memcpy from source to target and leave the source intact (just as is
> done for scalars). And there will be types with internal references
> where a move construction is more complicated than any of these options.
> All of these techniques will be optimal for some types and not others.
> Only the class author can know for sure. So just as the class author
> must write a copy constructor, he too must write a move constructor.
It is still leaves us with extra-work for user-defined types:
for each i in [size(),pos)
{
when moving from [i] to [i+1] do extra work:
make it so that [i] is left in valid destructible state
}
> Unlike the copy constructor case, the current proposal does not advocate
> a compiler generated move constructor by default. However there has
> been some discussion about letting the type author explicitly request a
> compiler generated move constructor (which would move construct each
> sub-object).
Hmm... in my case move_X are implicitly generated -- just like
constructors. And behaves in ways similar to constructors -- makes it
look more convenient for C++ developer to accept (I hope).
> > This results in O(n) + C extra operations (O(2*n) + C in case of
> > 'swap')
> > This is still a bit more than in 'destructive move' case. This extra
> > could be insignificant if you use complex types, but for small and
> > easy-to-move types it could be relatively big. For example for this
> > type:
>
> <nod> Not pessimizing vector<pod> has been a driving force from the
> beginning. Years ago a common suggestion was to just have vector always
> use swap when scooting around elements internally. But as you point
> out, this can be disastrous for vector<pod>. Thus the need for uniform
> syntax that means:
>
> copy construct (source remains unchanged)
> copy assign (source remains unchanged)
> move construct (source remains constructed)
> move assign (source remains constructed)
I can't see where vector needs these two. Well, it could be useful for
fixed-size vector, where insert into full vector causes highest value
to 'fall out'. But as I'll show later it should be part of vector, not
move semantic and could be implemented efficiently without these two.
> swap (source and target values swapped)
In my case 'swap' is provided by language itself based on T's move
definition.
> and possibly:
>
> move construct (source is destructed)
>
> and I've even stumbled across minor use cases for:
>
> move assign (source is destructed)
>
> -------
>
> Let's look a little deeper at the situation where we need to move/copy a
> vector element from one place within the vector's size to another place
> within the vector's size. Such an operation is common for both the
> erase and insert functions.
>
> move(m_buf[i] ==> m_buf[j]);
>
> Before the move, both m_buf[i] and m_buf[j] are constructed and possibly
> own resources.
>
> But after the move what of m_buf[i]? Left in a constructed or
> destructed state?
1. you can not put two values into one variable. That means if you
whant to put giraffe into fridge you need to take out elephant first.
:))
2. should be implemented like this:
move_1(m_buf[i] ==> m_buf[j]); // could throw
m_buf[j].~T(); // should not throw
move_2(m_buf[i] ==> m_buf[j]); // throw()
on success: m_buf[j] hold previos value of m_buf[i], m_buf[i] is
undefined (i.e. this variable has no value in it)
this gives us atomic move to defined variable.
> Before this (insert or whatever) operation is over, something from
> somewhere is either going to be moved into or copied into m_buf[i] so
> that it is once again in a valid constructed state.
Wait a second! If you a going to insert something into vector you do
not ever need to perform 'move to defined variable' operation. In
pseudocode above every move is 'from defined to undefined'.
> If the move left
> m_buf[i] in a destructed state, the total sequence of operations on
> m_buf[i] will look something like:
>
> transfer value from m_buf[i] to m_buf[j]
> m_buf[i].~T();
> ::new (m_buf + i) T(new_value);
>
> If the move left m_buf[i] in a constructed state, the total sequence of
> operations will look something like:
>
> transfer value from m_buf[i] to m_buf[j]
> m_buf[i] = new_value;
>
> In general it has been my experience that it is faster to copy or move a
> new value into an already constructed object than to go through a
> destruct-construct sequence. It isn't an order of magnitude difference.
> But this type of operation is going to be happening a lot, and we don't
> want to prematurely pessimize it. For example consider the case that
> the value_type of the vector is vector<int> (i.e. the entire object is a
> vector<vector<int>>):
>
> In the former (destruct) case ownership of the internal vector buffer
> has been transferred from m_buf[i] to m_buf[j], and m_buf[j] has deleted
> any buffer it might have previously owned. Then, in the case that we
> are copying from new_value, a new buffer must be allocated for m_buf[i]
> in order to copy construct from new_value.
>
> In the latter (non-destruct) case ownership of m_buf[i]'s buffer has
> again been transferred to m_buf[j]. But in this case m_buf[i] is left
> in a constructed state. The logical thing to do is have m_buf[j]
> transfer ownerhip of its buffer to m_buf[i] instead of deallocating it
> (just do a swap(m_buf[i], m_buf[j])). Now when we (copy) assign
> new_value to m_buf[i], there is a decent chance that m_buf[i] already
> has sufficient capacity (donated from m_buf[j]) to handle the assignment.
>
> Using destructive move semantics for this case guarantees two trips to
> the heap: a deallocation followed by an allocation.
>
> Using non-destructive move semantics for this case, one might still have
> two trips to the heap. But one might also get away with zero trips to
> the heap. In a completely random situation, those odds are 50%, which
> would average out to one trip to the heap - or roughly twice as fast as
> the destructive case.
This advantage comes from the knowledge of std::vector's nature:
- vector's resource is a chunk of memory filled with values ot type T
- if target's buffer is large enough we could 'take a shortcut' by
moving elements instead of destroying target and moving vector's state.
But:
- there is a price: target's reserved space could be too big -- I do
not like the idea of big chunk of memory forever hanging around
- the same effect could be reached with destructive move:
insert_copy(T const& new_val)
{
swap(m_buf[i], m_buf[j]); // it could be implemented using move, see
'paper'
m_buf[i] = new_val; // here vector will decide: 2 trips to heap or 0
}
Anyway -- problem you have described is not a problem of 'move'. It is
a problem of 'how to implement insert_copy which will involve moving to
defined variable internally'. And I think that problem should be
adressed by insert_copy implementation based on knowledge of current
move semantic implementation or T's specific. As I showed to you it
implements independently of move semantic type.
More thoughts:
suppose we are working with vector1<vector2<T>>
since advantage comes from vector's ability to use big enough MEMORY
chunk from other vector we could:
1. use 'generic' method described at the beginning
2. vector2's allocator could reuse freed chunks (for example through
static cache)
In this way we will get almost the same situation: 50% chance of going
into the heap. Difference will be that in this case insert_copy will
generate N calls to T's copy constructor, while in swap'based it will
be N T's operator=s. And now it depends on T's implementation what is
faster.
Nah, definitely that sort of optimization should be part of vector's
logic, not move semantic. Insert_copy's implemetor will have two
options: either use generic one or specific one -- and this is good to
have a choice.
> > Objects could have external references that points to them. During
> > 'move' they need to be updated and this 'update' could throw.
>
> <nod>
>
> > 1. What 'current' nondestructive semantic has to solve this task?
>
> Nothing really. The class author must code the move constructor or move
> assignment manually, and thus will manually fix up such references. The
> Metrowerks implementation of all of the node-based containers do exactly
> this (they all are self referencing). Same issue with swap. For us
> these are all no-throw operations.
>
> The current proposal requires for the containers (quoted from N1771):
>
> > If the contained value_type alters the value of the source (even if the
> > source is an rvalue) when constructing or assigning, then the operation must
> > not throw an exception.
>
> I.e. If you want to put a type into a std::container, then it must have
> a no-throw move constructor and a no-throw move assignment, else it must
> be CopyConstructible and CopyAssignable (which are allowed to throw, but
> are not allowed to modify the source).
>
> The requirements for the std::algorithms are more relaxed. They will
> maintain basic exception safety even if a move constructor or move
> assignment throws.
>
> > 2. What will happen in insert_move example if one of objects' move
> > constructor will throw during 'external references update'?
>
> Exactly. Disaster. And the only cure I know of is to say "don't do
> that", just as we currently do for destructors (17.4.3.6p2). But we
> still must allow copy constructors and copy assignment to throw and
> maintain existing exception safety guarantees.
I do not like it. :-| No offence. All of this sounds like complication
of existing state of things.
Bye.
Sincerely yours, Michael.
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: hinnant@metrowerks.com (Howard Hinnant)
Date: Fri, 3 Jun 2005 20:56:09 GMT Raw View
In article <1117760565.159819.142460@g47g2000cwa.googlegroups.com>,
Michael Pryhodko <mpryhodko@westpac.com.au> wrote:
> > move_1(m_buf[pos] ==> m_buf[pos+1]);
> > auto_hold_data<T> auto_data(m_buf[pos]);
> > construct(m_buf[pos], var);
> > move_2(m_buf[pos] ==> m_buf[pos+1]);
> >
> > Exactly what are move_1 and move_2 doing to this location?
>
> move_1 does not change source. It is purpose:
> if object can be moved by memcpy (has not external references) does
> nothing at all
> otherwise -- prepares helper_obj (located for example in local storage)
> of type defined by move_1's implementation. helper_obj state will be
> used in move_2 to 'finalize' move or to 'undo' move preparations in
> case of exception.
>
> move_2:
> implicitly copies object binary representation into new place and
> executes whatever instruction T has in it's move_2. Exceptions are
> forbidden.
I'm sorry, but I'm still confused. If move_1 does nothing at all for a
pod, and move_2 memcpy's the pod, it looks to me like move_2 is going to
be copying the wrong value because we've already constructed var at
m_buf[pos].
> > For user-defined types, a valid technique would be to default construct
> > the target and then to swap source and target, as you suggest above.
> > Another valid technique would be to memcpy from source to target and
> > then to memset the source to 0. Still another valid technique would be
> > to memcpy from source to target and leave the source intact (just as is
> > done for scalars). And there will be types with internal references
> > where a move construction is more complicated than any of these options.
> > All of these techniques will be optimal for some types and not others.
> > Only the class author can know for sure. So just as the class author
> > must write a copy constructor, he too must write a move constructor.
>
> It is still leaves us with extra-work for user-defined types:
> for each i in [size(),pos)
I don't understand what the range [size(),pos) means. Did you mean
[pos, size())?
> {
> when moving from [i] to [i+1] do extra work:
> make it so that [i] is left in valid destructible state
> }
> > Unlike the copy constructor case, the current proposal does not advocate
> > a compiler generated move constructor by default. However there has
> > been some discussion about letting the type author explicitly request a
> > compiler generated move constructor (which would move construct each
> > sub-object).
>
> Hmm... in my case move_X are implicitly generated -- just like
> constructors. And behaves in ways similar to constructors -- makes it
> look more convenient for C++ developer to accept (I hope).
Implicitly generated copy constructors and assignment operators have
been and will continue to be the source of a great many bugs. This was
a necessary evil for C compatibility. We don't need to go that route
for move semantics.
If the language can't guarantee that an implicitly generated move
constructor (or assignment) is correct 100% of the time, we shouldn't be
implicitly generating them.
> > move construct (source remains constructed)
> > move assign (source remains constructed)
>
> I can't see where vector needs these two. Well, it could be useful for
> fixed-size vector, where insert into full vector causes highest value
> to 'fall out'. But as I'll show later it should be part of vector, not
> move semantic and could be implemented efficiently without these two.
move construct is needed during insert (sufficient capacity case) for
each existing element that needs to be moved beyond the current size().
move assign is needed during erase for each element moved to a lower
position in the array, and for which the source of the move is to be
left constructed after the erase (i.e. most elements if erasing only a
few elements far from the end). It is also needed during insert
(sufficient capacity case) for each existing element that needs to be
moved from its current location to a higher spot in the array but within
the current size() (i.e. most elements if inserting only a few elements
far from the end).
> > swap (source and target values swapped)
>
> In my case 'swap' is provided by language itself based on T's move
> definition.
This can be suboptimal for those classes that aren't movable with
memcpy. There are more reference fixup operations associated with 3
moves than with a single swap.
> > Let's look a little deeper at the situation where we need to move/copy a
> > vector element from one place within the vector's size to another place
> > within the vector's size. Such an operation is common for both the
> > erase and insert functions.
> >
> > move(m_buf[i] ==> m_buf[j]);
> >
> > Before the move, both m_buf[i] and m_buf[j] are constructed and possibly
> > own resources.
> >
> > But after the move what of m_buf[i]? Left in a constructed or
> > destructed state?
>
> 1. you can not put two values into one variable. That means if you
> whant to put giraffe into fridge you need to take out elephant first.
> :))
<shrug> Thanks, I'll keep that in mind.
> 2. should be implemented like this:
> move_1(m_buf[i] ==> m_buf[j]); // could throw
> m_buf[j].~T(); // should not throw
Why is m_buf[j] being destructed here? Or is this a type-o?
> move_2(m_buf[i] ==> m_buf[j]); // throw()
>
> on success: m_buf[j] hold previos value of m_buf[i], m_buf[i] is
> undefined (i.e. this variable has no value in it)
>
> this gives us atomic move to defined variable.
Sorry, this is still unclear to me. I'm afraid I need a bug-free
description. I'm making too many guesses at what you mean.
> > Before this (insert or whatever) operation is over, something from
> > somewhere is either going to be moved into or copied into m_buf[i] so
> > that it is once again in a valid constructed state.
>
> Wait a second! If you a going to insert something into vector you do
> not ever need to perform 'move to defined variable' operation. In
> pseudocode above every move is 'from defined to undefined'.
<sigh>
vector-> _ _ _ _ _
^
|
insert -----+
After the insert the element at v[2] will be moved to the location of
v[3]. The element at v[2] will have a new value after the insert.
Before the insert both v[2] and v[3] are constructed.
After the insert both v[2] and v[3] are constructed.
What you do during the insert is an implementation detail. If you want
to destruct v[2] and re-construct it fine. But I'm arguing that this is
not efficient. It is more efficient to leave v[2] constructed
throughout the insert operation. Same assertion applies to elements
v[3] through v[size()-1].
> insert_copy(T const& new_val)
> {
> swap(m_buf[i], m_buf[j]); // it could be implemented using move, see
> 'paper'
> m_buf[i] = new_val; // here vector will decide: 2 trips to heap or 0
> }
The swap is inefficient for pods. swap is the wrong semantics here.
The value of m_buf[i] is not important after your swap statement because
we are about to assign it a new value. Move assign from m_buf[i] to
m_buf[j] is what is required. If a type implements move assign as a
swap, fine. But for pods it should be a plain copy.
> Anyway -- problem you have described is not a problem of 'move'. It is
> a problem of 'how to implement insert_copy which will involve moving to
> defined variable internally'. And I think that problem should be
> adressed by insert_copy implementation based on knowledge of current
> move semantic implementation or T's specific. As I showed to you it
> implements independently of move semantic type.
<shrug>
> More thoughts:
> suppose we are working with vector1<vector2<T>>
>
> since advantage comes from vector's ability to use big enough MEMORY
> chunk from other vector we could:
> 1. use 'generic' method described at the beginning
> 2. vector2's allocator could reuse freed chunks (for example through
> static cache)
allocator caching is not something we want to tread on here. That is an
independent topic, at least if we want to continue to support user
defined allocators.
> Nah, definitely that sort of optimization should be part of vector's
> logic, not move semantic. Insert_copy's implemetor will have two
> options: either use generic one or specific one -- and this is good to
> have a choice.
Move detection? How? I know of two solutions: traits and concepts.
Traits are intrusive and error prone (requires type author to register
with trait). Concepts may or may not happen. I don't want to hinge the
success of move semantics on concepts.
-Howard
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: dave@boost-consulting.com (David Abrahams)
Date: Sat, 4 Jun 2005 06:25:22 GMT Raw View
Michael Pryhodko <mpryhodko@westpac.com.au> writes:
> David Abrahams wrote:
>> "Michael Pryhodko" <mpryhodko@westpac.com.au> writes:
>>
>> > {I have an increasing feeling that this thread would be much more at
>> > home in comp.std.c++. clc++m is really intended for matters concerned
>> > with using the current Standard C++ rather than an in depth discussion
>> > of a proposal for the next version. -mod}
>>
>> OK, followups to this message are directed there.
>
> David did a mistake typing comp.std.c++, so this message will apear on
> comp.std.c++ first. :)
I don't know what you mean about mistakes. Directing followups to
comp.std.c++ was entirely intentional.
>> > David, did I proved to you that this mechanism could be implemented
>> > efficiently?
>>
>> No, this is just exactly the mechanism I thought you were proposing in
>> the first place.
>
> Cool... That means we invented this independendly :). But you should
> agree -- this mechanism is efficient!
I don't see why I should agree. You're inserting tests and branches
that would otherwise be unneeded.
> It could even rise slightly performance, since check for flag will
> be always 'inlined', while, when flag is realized by developer,
> check for flag lies in destructor which is not always inlined. In
> this way we avoid function call for already destroyed variables.
And if there's no flag at all it's even more efficient.
>> >> > 2. Difference between non- and destructive move is the same as between
>> >> > using swap and move -- in first case you will pay twice: for swap and
>> >> > for destruction of empty object, in second case you'll pay less than
>> >> > for 'swap'.
>> >>
>> >> Except that you additionally incur runtime costs for passing extra
>> >> parameters (according to you), and storing/checking these flags.
>> >> Until you measure how those costs trade off against the cost of
>> >> destroying empty objects, you can't claim your scheme is better.
>> >
>> > Considering poem above, can I claim it now?
>>
>> 'Fraid not.
>
> You are unbearable!
Please refrain from name-calling. You don't have to carry on this
discussion with me if you find me unbearable.
>> >> > If I'd have exact syntax, with every feature considered and described
>> >> > -- I'd put this post on comp.std.c++ :). Also let me point that C++ is
>> >> > ALREADY complicated and hard-to-use -- and 'move' feature could make it
>> >> > less complicated for use by removing some problems.
>> >>
>> >> Show me how.
>> >
>> > I am trying...
>> > 1. you could now hold complex values in the container without thinking
>> > how it will be moved inside
>>
>> Already possible, with non-destructive move or even without a language
>> extension.
>
> .. but you always need to consider consequences. I think your
> objective is wrong.
I have no clue what you're talking about.
>> > 2. operator new now could return object instead of raw pointer (see 'my
>> > C++' for clues) -- this could solve everlasting memory leak problem (or
>> > at least it will make it hard to do any kind of leak)
>>
>> I don't see how that helps.
>
> It solves major problem of C++: newbies always mess things up. Now it
> would be hard to make leak.
I don't see how the change you're proposing makes it harder to leak.
You'd have to show some detailed examples for me to get it.
>> > 3. look here:
>> > http://blogs.msdn.com/oldnewthing/archive/2005/05/18/419130.aspx
>> > What do you see here? I see: "If you wish speed and efficiency do not
>> > use classes -- use PODs". And this guy has quite big audience. That
>> > kind of crap will go to history -- aren't that great?
>>
>> a. No it won't. People are superstitious and will remain so
>> b. non-destructive move also allows speed and efficiency.
>
> I told later that these arguments apply to both types of 'move'. People
> tends to change.
>
>
>> > Please, note that applies to both kind of 'move'... simply destructive
>> > one is more efficient
>>
>> I am not convinced of that.
>
> Could you give an example of case where nondestructive move is more
> efficient?
I didn't say nondestructive move was more efficient. I only said that
I'm not convinced your destructive move is more efficient than
nondestructive move.
>> > while giving one extra opportunity to shoot yourself. Also my idea
>> > with two phased move and passing move context is 'more generic' --
>> > i.e. you could move objects which can not be moved by other means,
>>
>> Example?
>
> see parallel conversation with Howard (insert_move and insert_copy).
Sorry, I've been watching, but I don't see such an example in
evidence. If you don't feel like repeating yourself, post a google
groups link to the article and tell me what text to search for to see
the example.
>> > also it could be used to create complex atomic-like updates.
>>
>> What use is an "atomic-like" update?
>
> For example giving strong exception safety guarantee for
> vector.insert.
The strong guarantee is available for vector.insert today, in the case
where neither the element's copy nor its assign can throw an
exception.
With nondestructive move, we'll have the strong guarantee also in the
case where move construct and move assign can't throw an exception.
What new case of the strong guarantee does destructive move provide
for?
> May I remind you that C++ class constructor is the fine
> example of atomic-like update -- and it is used quite extensively.
No, you can't "remind" me, because I don't know what "atomic-like" is
supposed to mean, and I have never considered C++ ctors to be
"atomic-like."
Either an update is atomic or it's not. An atomic-like operation is
not atomic, and I don't see how it being somehow "atomic-like" can be
useful to anyone.
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "Michael Pryhodko" <mpryhodko@westpac.com.au>
Date: 4 Jun 2005 19:50:01 GMT Raw View
> > David Abrahams wrote:
> >> OK, followups to this message are directed there.
> >
> > David did a mistake typing comp.std.c++, so this message will apear on
> > comp.std.c++ first. :)
>
> I don't know what you mean about mistakes. Directing followups to
> comp.std.c++ was entirely intentional.
- go to: http://tinyurl.com/8h3yr
- click 'show options'
- I see here something like this:
Newsgroups: comp.lang.c++.moderated, comp.sdtd.c++
Followup-To: comp.std.c++
From: David Abrahams <d...@boost-consulting.com>
as you can see newsgroups list has mistype in the newsgroup name. If
this is not your fault (maybe moderator's?) -- please, accept my
apologies, I thought you did it.
> > But you should agree -- this mechanism is efficient!
>
> I don't see why I should agree. You're inserting tests and branches
> that would otherwise be unneeded.
I'll explain my point:
- we started with a problem:
template <class T>
void f(T& x)
{
T y = destructive_move(x);
}
> How does code that calls f know whether to generate a destructor for
> the argument to f?
- I propose to use marking to indicate that f could destroy variable:
void f(T% x)
{
T y = destructive_move(x);
}
we have two possibility:
1. if f was called with local/temporary/static variable as argument:
- compiler must detect it (it is easy) and append flag to variable's
memory representation (no losses here, since it is done on compile
stage)
- if f destroys x it should change this flag
- whenever execution leaves caller scope this flag must be used to
detect if destructor should be called
2. if f is called with variable with unknown origin it should be
considered that it does not bound to any automatic destruction
mechanism, we could:
- forbid passing such variables into f, because '%' means:
local/temp/static only. That is inconvenient -- user must overload f
with exactly the same body, but different header:
void f(T& x)
{
T y = destructive_move(x);
}
- or compiler could help us here by:
a) generating new f's body if he detects that f is called with another
type of variable.
b) or generating f's body in a special way, for example:
-> entry point for local/temp/static calls
set flag to true // really depends on platform or compiler vendor
-> entry point for vars with unknown status (flag is false by
default)
...
destructive_move code
if (flag)
mark_variable_destroyed
(later-addition:
hmm... if we have N marked arguments, f's code should take care about
2^N possible arguments configuration, luckily we will rarely have many
marked arguments, but for cases of big N compiler could switch to:
z) or by passing additional hidden parameter which holds information
necessary to find if K-th argument was passed as local/temp/static
variable.
end-of-later-addition)
c) or some platforms could use tricks to pass this information: for
example if variable's address is less than X -> it is not from heap ->
it is local/temp/static variable.
(later-addition:
won't work if you pass address of local object's subobject (e.g element
of array)
end-of-later-addition)
well, can't invent anything else at this moment...
Also compiler could help us by detecting and warning against such code:
void f(T& x)
{
T y = destructive_move(x); // caller is not prepared
}
Now I'll try to compare non- and destructive moves:
A) destructive.
losses:
1. extra bool in stack/static storage for variables that passed as
marked argument
2. f's payment for variable status detection support
3. f's payment for setting 'dead' flag if variable was destroyed
4. check for flag before calling destructor
gains:
5. destructor won't be called for moved variable
B) nondestructive:
losses:
1. payment for leaving source in valid 'minimal' state
2. payment for 'minimal state', e.g. checking for INVALID_HANDLE_VALUE
in destructor
gains:
<nothing that is not already listed in A)losses>
Here are some thoughts:
- A's losses are constant, B's -- increases with T's complexity (e.g.
if T is derived from another type), so in following notes I'll consider
that T is simple type, but we'll keep in mind that T's movement price
is a sum of subobjects' prices (in non-destructive case)
- sometimes T's 'minimal state' can not be expressed as NULL value of
one of it's members -- in this case we need to add bool member flag --
as result every instance of T will have increased size, while in
destructive case -- only those that are passed as marked arguments
- A.4 == to B.2 (if T is simple)
Hmm, from all this analysis it is hard to say which approach is
better... I am close to propose to forbid this 'marking' functionality
-- this will cross out A.1-A.4, making destructive move winner, but
leaving us without possibility to move local variables. Well, comments
are welcome.
> >> > 1. you could now hold complex values in the container without thinking
> >> > how it will be moved inside
> >>
> >> Already possible, with non-destructive move or even without a language
> >> extension.
> >
> > .. but you always need to consider consequences. I think your
> > objective is wrong.
>
> I have no clue what you're talking about.
I was talking about 'move' in general, i.e. with any 'move' destructive
or not.
Also 'two-phased move' (I do not know if it could be implemented with
non-destructive move) makes it even more better:
- complex updates to container's state could have strong safety
guarantee (i.e. either succeeded or failed and container is in previous
state) -- i.e. vector.insert now could have strong guarantee
In the light of these discussions I start thinking that 'two-phased'
feature maybe could be discussed separately from 'non- vs destructive'
subject. If it is possible to separate it from destructive move...
By the way, thanks to this discussion I did very disturbing discovery:
- in one of my projects I used small_map<T> class -- this is basically
a map built on top of vector (it was designed to hold small number of
values)
- vector is always sorted, small_map::insert is implemented with
vector::insert
- that means that if T's operator= throws vector will be left in valid
but undefined state, ruining program state
- glance into vector's code confirms my fears
Hmm... This reveleation comes to me as a shock. T could be a structure
with two std::string for example. This showed me how fundamental
problem exception safety is... And how badly we need strong guarantee
at as much places as we could get.
> > It solves major problem of C++: newbies always mess things up. Now it
> > would be hard to make leak.
>
> I don't see how the change you're proposing makes it harder to leak.
> You'd have to show some detailed examples for me to get it.
Note again that this apply for any kind of move. Also I suggest you to
look upon 'How C++ could look like' chapter in my paper, it could help
to understand my point. Now suppose:
- we have 'move' in C++ (plus implicit move) -- that means it is ok to
return object by value
- 'new' returns object, that will free associated memory in it's
destructor (it's more precise description is in paper)
Now:
- it is hard to make leak (you need to call specific function
explicitly to break link between handler and resource)
- and you do not risk to do an expensive copy accidentally, because
every time your handler will be moved, not copied
Pseudocode:
template<class T> handler<T> new(...);
handler<T> t = new T(...);
Also for convenience I was proposing:
- to merge C++ reference and C++ pointer functionality and use '&'
symbol to denote reference
- to use '*' symbol to denote handler
This idea is extremely 'raw', do not judge it too harshly. Any comments
are welcome.
> > Could you give an example of case where nondestructive move is more
> > efficient?
>
> I didn't say nondestructive move was more efficient. I only said that
> I'm not convinced your destructive move is more efficient than
> nondestructive move.
If you take away 'two-phased' feature, my destructive move is typical
destructive move. What makes you uncertain about efficiency of
destructive move?
> >> > while giving one extra opportunity to shoot yourself. Also my idea
> >> > with two phased move and passing move context is 'more generic' --
> >> > i.e. you could move objects which can not be moved by other means,
> >>
> >> Example?
> >
> > see parallel conversation with Howard (insert_move and insert_copy).
>
> Sorry, I've been watching, but I don't see such an example in
> evidence. If you don't feel like repeating yourself, post a google
> groups link to the article and tell me what text to search for to see
> the example.
> >> > also it could be used to create complex atomic-like updates.
> >>
> >> What use is an "atomic-like" update?
> >
> > For example giving strong exception safety guarantee for
> > vector.insert.
>
> The strong guarantee is available for vector.insert today, in the case
> where neither the element's copy nor its assign can throw an
> exception.
>
> With nondestructive move, we'll have the strong guarantee also in the
> case where move construct and move assign can't throw an exception.
>
> What new case of the strong guarantee does destructive move provide
> for?
destructive move itself does not make situation better. I was speaking
about 'my destructive move' which have 'two-phase' feature. It could
give strong guarantee while still being reasonable efficient.
> > May I remind you that C++ class constructor is the fine
> > example of atomic-like update -- and it is used quite extensively.
>
> No, you can't "remind" me, because I don't know what "atomic-like" is
> supposed to mean, and I have never considered C++ ctors to be
> "atomic-like."
For me 'atomic-like' means: ability of given operation to succeed
completely, or fail leaving environment in previous state. For example
if given function:
void f()
{
m_vector1.push_back(1);
m_vector2.push_back(1);
}
written so that it will behave 'atomic-like' it is very convenient to
use it. You never need to worry about state of your program when you
use it; just call it, catch exceptions and continue work. Generally if
you have to work only with 'atomic-like' functions developing is much
easier.
> Either an update is atomic or it's not. An atomic-like operation is
> not atomic, and I don't see how it being somehow "atomic-like" can be
> useful to anyone.
Sorry for messy definition. Should have used something more
appropriate.
Bye.
Sincerely yours, Michael.
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: dave@boost-consulting.com (David Abrahams)
Date: Mon, 6 Jun 2005 02:10:32 GMT Raw View
"Michael Pryhodko" <mpryhodko@westpac.com.au> writes:
> Hmm, from all this analysis it is hard to say which approach is
> better...
That was my point. They are close enough in efficiency from an
analytical point-of-view that only real performance measurements can
tell us anything.
> I am close to propose to forbid this 'marking' functionality
> -- this will cross out A.1-A.4, making destructive move winner, but
> leaving us without possibility to move local variables. Well, comments
> are welcome.
If you find yourself tempted to change your proposal in order to make
it "win" the contest, you probably are more attached to the idea of
having destructive move "win" than you are interested in having the
best alternative in the language.
>> >> > 1. you could now hold complex values in the container without
>> >> > thinking how it will be moved inside
>> >>
>> >> Already possible, with non-destructive move or even without a language
>> >> extension.
>> >
>> > .. but you always need to consider consequences. I think your
>> > objective is wrong.
>>
>> I have no clue what you're talking about.
>
> I was talking about 'move' in general, i.e. with any 'move' destructive
> or not.
"move" is a wrong objective?
Sorry, I'm baffled.
> Also 'two-phased move' (I do not know if it could be implemented with
> non-destructive move) makes it even more better:
> - complex updates to container's state could have strong safety
> guarantee (i.e. either succeeded or failed and container is in previous
> state) -- i.e. vector.insert now could have strong guarantee
I don't see how that could possibly be the case unless you are willing
to pay O(N) additional storage for the partially-moved objects.
> In the light of these discussions I start thinking that 'two-phased'
> feature maybe could be discussed separately from 'non- vs destructive'
> subject. If it is possible to separate it from destructive move...
>
> By the way, thanks to this discussion I did very disturbing discovery:
> - in one of my projects I used small_map<T> class -- this is basically
> a map built on top of vector (it was designed to hold small number of
> values)
> - vector is always sorted, small_map::insert is implemented with
> vector::insert
> - that means that if T's operator= throws vector will be left in valid
> but undefined state, ruining program state
> - glance into vector's code confirms my fears
>
> Hmm... This reveleation comes to me as a shock. T could be a structure
> with two std::string for example. This showed me how fundamental
> problem exception safety is...
Yes. Well, sort of; it really has nothing to do with exceptions
per se. The fundamental problem is how to maintain program invariants
in the face of operations that can fail (by any means).
> And how badly we need strong guarantee at as much places as we could
> get.
No, read http://www.boost.org/more/generic_exception_safety.html
(originally published in M. Jazayeri, R. Loos, D. Musser (eds.):
Generic Programming, Proc. of a Dagstuhl Seminar, Lecture Notes on
Computer Science. Volume. 1766)
We need the strong guarantee exactly where it is "natural" for the
operation to provide it without imposing a big-O efficiency cost. You
can always add a copy/swap layer on top of whatever is natural if
you're willing to make the operation expensive in order to get the
strong guarantee.
>> > It solves major problem of C++: newbies always mess things up. Now it
>> > would be hard to make leak.
>>
>> I don't see how the change you're proposing makes it harder to leak.
>> You'd have to show some detailed examples for me to get it.
>
> Note again that this apply for any kind of move. Also I suggest you to
> look upon 'How C++ could look like' chapter in my paper, it could help
> to understand my point.
I find most of your paper very hard to read, so it doesn't help me to
understand very much of what you're saying. However, what I can
understand seems to have some major holes. You can't implement shared
ownership without GC or reference counting no matter how much move
semantics you are given.
> This idea is extremely 'raw', do not judge it too harshly. Any
> comments are welcome.
My only comment is that when proposing something as a better
alternative to a very complete and refined proposal, it's probably a
good idea not to argue too strongly while the idea is still 'raw.'
>> > Could you give an example of case where nondestructive move is more
>> > efficient?
>>
>> I didn't say nondestructive move was more efficient. I only said that
>> I'm not convinced your destructive move is more efficient than
>> nondestructive move.
>
> If you take away 'two-phased' feature, my destructive move is typical
> destructive move.
Well, two-phase and marking are completely independent issues, so no,
"your move" is not what people typically thought of as destructive
move in the past. But if you take away marking...
> What makes you uncertain about efficiency of
> destructive move?
..people will need to reinvent marking "manually" somehow. Plus the
programming model is very difficult.
>> What new case of the strong guarantee does destructive move provide
>> for?
>
> destructive move itself does not make situation better. I was speaking
> about 'my destructive move' which have 'two-phase' feature. It could
> give strong guarantee while still being reasonable efficient.
I doubt it, without extra storage. I think you're ignoring composite
operations.
>> > May I remind you that C++ class constructor is the fine
>> > example of atomic-like update -- and it is used quite extensively.
>>
>> No, you can't "remind" me, because I don't know what "atomic-like" is
>> supposed to mean, and I have never considered C++ ctors to be
>> "atomic-like."
>
> For me 'atomic-like' means: ability of given operation to succeed
> completely, or fail leaving environment in previous state.
That's just "atomic."
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "Michael Pryhodko" <mpryhodko@westpac.com.au>
Date: 7 Jun 2005 03:10:05 GMT Raw View
> > move_2:
> > implicitly copies object binary representation into new place and
> > executes whatever instruction T has in it's move_2. Exceptions are
> > forbidden.
>
> I'm sorry, but I'm still confused. If move_1 does nothing at all for a
> pod, and move_2 memcpy's the pod, it looks to me like move_2 is going to
> be copying the wrong value because we've already constructed var at
> m_buf[pos].
Sorry for confusion, it is my mistake. memory for implicit memcpy
should be taken from auto_data. This makes it looking even more ugly
than it was before.
Hmm... I guess this happens because copy is one-shot operation, while
move is two-phased. If we could have two-phased copy (built in the same
way as 'move') it will fit nicely here:
move_1(m_buf[pos] ==> m_buf[pos+1]);
copy_1(var ==> m_buf[pos]);
move_2(m_buf[pos] ==> m_buf[pos+1]);
copy_2(var ==> m_buf[pos]);
but this is not gonna ever happen. So we have only options:
- trying to live with this uglification
- forfeit two-phased move (no strong guarantee in inserts, no complex
atomic updates)
> > It is still leaves us with extra-work for user-defined types:
> > for each i in [size(),pos)
>
> I don't understand what the range [size(),pos) means. Did you mean
> [pos, size())?
Well, actually I tried to say (size(), pos], but it is not my fault --
it is all my keyboard. .) Strange order is to show that we traverse
backwards.
> > > move construct (source remains constructed)
> > > move assign (source remains constructed)
> >
> > I can't see where vector needs these two. Well, it could be useful for
> > [skip]
>
> move construct is needed during insert (sufficient capacity case) for
> [skip]
> move assign is needed during erase for each element moved to a lower
> [skip]
You are thinking in terms of non-destructive move. But vector
operations could be implemented with destructive move (imho more
efficiently) -- in this case they are not needed.
> > In my case 'swap' is provided by language itself based on T's move
> > definition.
>
> This can be suboptimal for those classes that aren't movable with
> memcpy. There are more reference fixup operations associated with 3
> moves than with a single swap.
No, there will be exactly two reference fixups. Pseudocode:
swap(x <==> y):
move_1(x==>y)
move_1(y==>x)
swap_memory_bits
move_2(x==>y)
move_2(y==>x)
Please, note that:
- while I write second phase as move_2(A==>B), move_2 implementation
has access only to heper_obj and B.
- obviously move_2 should not do implicit memcpy, but since 'swap' is
entirely generated by compiler -- he could do everything here (or we
could select implicit memcpy of move_2 as independent step, thus
actually creating 3-phased move, exactly like I thought it should be
about 1 year ago)
> > 1. you can not put two values into one variable. That means if you
> > whant to put giraffe into fridge you need to take out elephant first.
> > :))
>
> <shrug> Thanks, I'll keep that in mind.
Ok, I'll be serious.
> > 2. should be implemented like this:
> > move_1(m_buf[i] ==> m_buf[j]); // could throw
> > m_buf[j].~T(); // should not throw
>
> Why is m_buf[j] being destructed here? Or is this a type-o?
it is not a mistype. move_1 prepares value to be moved from buf[i] to
buf[j]. But before move is finalized we need to free buf[j] from value
it holds.
> > move_2(m_buf[i] ==> m_buf[j]); // throw()
> >
> > on success: m_buf[j] hold previos value of m_buf[i], m_buf[i] is
> > undefined (i.e. this variable has no value in it)
> >
> > this gives us atomic move to defined variable.
>
> Sorry, this is still unclear to me. I'm afraid I need a bug-free
> description. I'm making too many guesses at what you mean.
I proposed a way to do 'move-to-constructed' without need of
move-assignment:
- prepare move [i] -> [j]
- destroy [j]
- finalize move [i] -> [j]
this will give us strong guarantee regardless of type T.
I've spent some time thinking about nature of operator= and 'move
assign'. They have quite interesting features. At the end of this post
I'll try to list my thoughts, maybe you you'd like to comment.
> > > Before this (insert or whatever) operation is over, something from
> > > somewhere is either going to be moved into or copied into m_buf[i] so
> > > that it is once again in a valid constructed state.
> >
> > Wait a second! If you a going to insert something into vector you do
> > not ever need to perform 'move to defined variable' operation. In
> > pseudocode above every move is 'from defined to undefined'.
>
> <sigh>
>
> vector-> _ _ _ _ _
> ^
> |
> insert -----+
>
> After the insert the element at v[2] will be moved to the location of
> v[3]. The element at v[2] will have a new value after the insert.
>
> Before the insert both v[2] and v[3] are constructed.
>
> After the insert both v[2] and v[3] are constructed.
>
> What you do during the insert is an implementation detail. If you want
> to destruct v[2] and re-construct it fine. But I'm arguing that this is
> not efficient. It is more efficient to leave v[2] constructed
> throughout the insert operation. Same assertion applies to elements
> v[3] through v[size()-1].
we have: vector1<vector2<int>>
First: when you insert something into the vector1 at position [i] you
will never get anything useful here, because:
for each i in (size(), pos] { [i - 1]'s buffer will go to [i] }
that means after we finish all our moves [i] will hold something
default (zeros for example). that means in case of insert_copy there
will be always 1 trip to heap.
Second: I think I understood what you mean -- you think that 'move
assign' operator is needed in the same way why we need operator=. So
that for given type author could use alternative way for 'move'
designed using specific application rules. And you think if 'move
assign' is not destructive we could win in some scenarious. Right?
I'll try to describe such scenario:
- we have vector with a cap, i.e. it has maximum size 10
- vector1<vector2<int>> is full
- we are inserting at [i]
> It is more efficient to leave v[2] constructed
> throughout the insert operation.
1. No, it is not. It COULD be more efficient -- it depends on type T
specific.
2. you propose:
- to use use 'move assign' in order to shift elements in vector
- type T (vector2<int>) will swap internal state in it's 'move assign'
operator
- during [i] = new_val assignment if [i] is large enough we well have
zero trips to heap
But also this trick will give us inefficiency in some scenarios, e.g.
when you erase first element of vector: last element will be swapped
with previous and then destroyed, which is not as efficient as it could
be -- i.e. while you are gaining at one end, you lose at another. For
me it is disappointing because theoretically we could win at both ends.
I think vector1 should decide what to use to move element -- swap-based
non-destructive move or destructive move. And this decision should
based on operation type (insert, erase) and T's specific (int or
vector2<int>). Do not know how to do it, however. For example in this
case(using my move) vector could do:
- perform destructive move "in ring" i.e.: [size()-1]->[i], [i]->[i+1],
., [size()-2]->[size()-1]
- do copy: [i] = new_val
but for PODs one move ([last] -> [i]) will be not needed (it is still
better than non-destructive). Also trying to implement this trick will
prevent us from having strong guarantee. Maybe it is still a good idea
to implement:
insert_copy(size_t pos, T const& val)
{
insert_move(pos, T(val));
}
- for PODs it will be efficient
- for others we will get strong guarantee in exchange for some
efficiency
In realworld application I'd choose strong guarantee, because if this
operation fails vector will become destructible but invalid -- and
sometimes it is impossible to recreate it. I'd vote for this:
- vector is implemented with 'my' two-phased destructive move
- if vector's user thinks that assignment is better here he should:
a) ask vector to do 'ring shift'
b) assign [i] = new_val
b1) if T's assignment is no-throw he will get strong guarantee
b2) otherwise he may live with basic guarantee (for example if he
sure that no exception could happen, for example due to plentiness of
memory)
b3) or if 'move' is a no-throw and T's assignment has a strong
guarantee, he could shift, assign, do backward shift if failed.
b4) or use insert_move(pos, T(val))
Anyway decision is left to developer.
> Move detection? How? I know of two solutions: traits and concepts.
> Traits are intrusive and error prone (requires type author to register
> with trait). Concepts may or may not happen. I don't want to hinge the
> success of move semantics on concepts.
I don't know. Maybe it is impossible to implement every possible
optimization using only move constructor and move assignment.
/////////////////////////////////////////////////////////////////
// general thoughts about copy constructors, operator= and others
copy constructor:
- used to make a copy at unitialized destination
- gives strong guarantee (as long as you nice in constructors' bodies)
- it is easy to get strong guarantee for multiple copy-construct
operations: just setup local object that will destroy already created
objects. Non-trivial objects are created in this way.
-----------------------
operator=:
now we need to copy to constructed variable. It could be done with code
like this:
target.~T();
target.T(source);
not very useful, however. Could be better:
copy_1(source ==> target); // prepares copy, uses extra memory if
needed
target.~T();
copy_2(source ==> target); // can not throw
this is much more useful:
- it gives strong guarantee for any T
- we could do more complex things like copying multiple variables and
still having a strong guarantee
- but this comes with a price of efficiency: we need to be pessimistic
and prepare very assignment using additional memory.
- also for some types it would be more efficient to reuse target's
state
Maybe that is why C++ used another approach -- separate function
'operator=':
- it is by default implemented as a sequence of member's operator=
- that means by default operator= gives only basic guarantee! it was a
surprise for me once:
struct { string s1; string s2;}
and
pair<int, string>
could break application in very mysterious way if you forget about it,
code
my_vec[5] = make_pair(15, "This is 15");
could fail leaving my_vec in destructible but unusable state. This
makes writing fail-proof code hard. You could try to get around that by
creating strong guarantee operator=:
operator=(T const& src, T& trg)
{
T tmp(src);
tmp.swap(trg);
}
this will have the same complexity as copy_1/copy_2 based approach. By
sacrificing optimistic performance we could gain strong guarantee for
one operation. But update to environment could be more that one
assignment and trying to make it having strong guarantee will involve
more losses:
- more operations needs to be 'prepared'
- more memory will be used
- more operations which are necessary only in pessimistic case
as problem escalates it becomes unmanageable. And there is no
copy_1/copy_2 in the language -- that means you will need to invent
your own mechanisms. I remember once I have implemented framework for
such 'transactional' programming. Needless to say performance was bad,
program execution path was very similar to RDMS query execution: every
action was creating an entry in transaction log (heap-based stack
structure) which used to rollback changes made so far if something goes
wrong. I had problems with updates from another threads, problems when
execution leaves my code. Basically at the end application was very
close to small memory-based database. That what happens when you try to
make totally fail-safe application by means of providing strong
guarantee on every operation.
As far as I understand to create reliable and efficient applications
you need to 'live with' basic guarantee, but never forget about
operations which have it. That means dividing application state into
weakly-dependent parts which could change state independently. And if
some basic guarantee operation fails -- drop/recreate whole part. For
example: server, where for each connected client there is a structure
of serving objects in memory -- if during request processing
basic-guarantee operation fails (e.g. due to std::bad_alloc) you could
drop connection and all state associated with it. Client will reconnect
later, but server application will be fail-safe and still efficient --
this scheme is a rollback but whithout expensive 'every action' logging
and rolling them back.
Also one more caveat -- in general compiler generated operator= does
not provide even basic guarantee. Example:
class A
{
int m_v1;
string m_data;
int m_v2; // this should be equal to (m_v1 + 10)
};
If during assignment m_data's operator= fails target will be in invalid
(but technically destructible) state making program state corrupted.
During following destruction of such object result is undefined because
object invariant was not hold.
-----------------------
move constructor:
we need to move value from one place to another not initialized place
(lets leave non- vs destructive topic alone for a moment). Obviously,
objects without external references could be moved by memcpy. Lets
consider general case where reference fixups could throw.
- it is easy to get strong guarantee for trivial object
- it is impossible to get it for non-trivial objects without mechanism
similar to 'two-phased' move. To make things worse in general it is
impossible to get even basic guarantee! This is a major difference from
copy constructor -- we can't easily organise even basic guarantee for
multiple movement (i.e. movement of all T's subobjects). Suppose we
have:
class A : class B {};
A a_src(...);
.
A a_trg <== a_src;
if B was moved but A throws a_src will have state when B has 'what is
left after move' (destroyed in case of destructive move and 'empty' in
case of non-destructive) A-part will have constructed state. If this
state is valid (belongs to space of all possible a_src's states) -- we
will get basic guarantee here, otherwise -- not. Obviously it is almost
impossible to get it for destructive move in this case.
We could put two-phased move into the language (thus forever
sacrificing efficiency for optimistic case). Or we could use the same
approach as with operator=:
- if move fails source invariant should not be violated (rule wich must
be followed by developer) -- i.e. if you move any subobject of given
object, this object's state should be valid (with respect to
invariantness)
- this gives us basic guarantee (it is still worse than copy
constructor which gives strong g.) -- I am really worried how it will
be to develop with this 'peculiarity', it looks dangerous.
- if developer needs something more strong he should organise its own
transaction context (like with operator= -- making copies, whatever).
This is similar to this example:
T v1(...);
T v2 <== v1; // this gives us basic guarantee
if we need strong g. we need to do additional work:
T v1(...);
T v2 <== T(v1); // this gives us strong basic guarantee
This makes 'my two-phased' feature unnecessary and conforms to spirit
of C++ (efficiency is a top priority). I think, presence of
'two-phased' feature in language makes it easier to write code wich
gives strong guarantee, but now I am quite far from thought that it
*should* be in the language -- it seems it will never get there, maybe
for good. (thanks for paper, David, it really cleans brains)
------------------------
move assignment:
we need to move value to already constructed variable.
since 'two-phased' feature is inappropriate, everything here is similar
to operator=:
- move assignment should give at least basic guarantee:
that means leaving BOTH target and source in valid state (hmm... it
could be a challenge in some cases, I think)
- all requirements from move constructor applies here
Consclusions:
-------------
- never let compiler to generate 'operator=' for you
- all pecularities with strong/basic guarantees should be clearly
described (in C++ standard, for example) -- so that newbies could get
necessary information immediately, instead of going through long and
painful self-learning process
Destructive vs Nondestructive:
------------------------------
Now based on previous thoughts I'll try to put some ideas about type of
move:
- it is clear that destructive move creates a problem: destroyed
subobject in C++ never was considered as a part of valid object state.
In discussion with David Abrahams there is a description of how
destructive move could be implemented in respect to move from local
storage -- performance is reasonable, but clearly this approach is
unsuitable for cases when we need to move subobject. And we need to do
it because two-phased move is basically out of field of view. That
leaves us with non-destructive move as the only option for generic move
implementation.
- on other hand there are plenty of cases where destructive move could
be more efficient that non-destructive -- for example inside of vector
some operations clearly more efficient with destructive move.
I have an idea:
- when do we need 'make source empty but valid' part of move?:
a) if object bound to any auto-destruction mechanism
b) otherwise only when something goes wrong
- what if we divide non-destructive move into two parts:
1. destructive move part -- it moves object state to new variable
without concerning about source
2. 'move init' function -- it is logically equivalent to default
constructor:
- it should init object to some valid 'empty state'
- it should never throw (very important)
we could either create new member type (move_init) or state: in order
to get basic guarantee for move operations this object's default
constructor should not throw
- during move destructive part should be used, if it succeeds move_init
should be applied to source or not applied depending on circumstances.
In case of failure move_init will be called for every destroyed
subobject and execution will continue as usual (unwinding).
This mechanism will give us ability to implement wide range of
optimizations for vector's operations, while still giving top
performance in optimistic case. For example:
T a1(...);
T a2(...);
T a3 <== a2;
a2 <== a1;
in optimistic path of execution could look like:
create a1, a2;
move_construct_destructive a2 to a3;
move_construct_destructive a1 to a2;
move_init a1;
if any operation fails pessimistic execution path will move_init
necessary sub-/objects and use usual unwinding mechanism.
It is still unclear how to do vector<vector<int>>::insert_copy
optimization here... I need to think about it -- this post is already
too large.
(later) Hmm... it could be something like this:
T a1(...);
T a2(...);
T a3(...);
a2 ==> a3;
a1 ==> a2;
to use be able to use this technique 'move assignment' should be able
to indicate that move_init is unnecessary. Hmm... Looks ugly. Maybe you
have any ideas?
Bye.
Sincerely yours, Michael.
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]
Author: "Michael Pryhodko" <mpryhodko@westpac.com.au>
Date: 7 Jun 2005 03:10:02 GMT Raw View
> > Hmm, from all this analysis it is hard to say which approach is
> > better...
>
> That was my point. They are close enough in efficiency from an
> analytical point-of-view that only real performance measurements can
> tell us anything.
You are right. I found some logical problems with this approach. E.g.
it is still a problem to move subobjects and so on. Well, forget it --
it was a bad idea.
> If you find yourself tempted to change your proposal in order to make
> it "win" the contest, you probably are more attached to the idea of
> having destructive move "win" than you are interested in having the
> best alternative in the language.
I am trying to push destructive move because in wide range of cases it
gives better performance and looks more 'logical', for example when you
need to shift values inside of vector. But it seems that you can not
get away from non-destructive semantic completely.
> "move" is a wrong objective?
> Sorry, I'm baffled.
Forget this too, I used wrong word 'objective' instead of 'objection'.
> > Also 'two-phased move' (I do not know if it could be implemented with
> > non-destructive move) makes it even more better:
> > - complex updates to container's state could have strong safety
> > guarantee (i.e. either succeeded or failed and container is in previous
> > state) -- i.e. vector.insert now could have strong guarantee
>
> I don't see how that could possibly be the case unless you are willing
> to pay O(N) additional storage for the partially-moved objects.
Exactly -- in general case you'll need O(N) aditional storage. This
two-phased feature was invented in order to make move constructor
having strong guarantee. Your paper and some thoughts did a very strong
blow to the idea of having it. Now I think it is not a very good idea.
> > And how badly we need strong guarantee at as much places as we could
> > get.
>
> No, read http://www.boost.org/more/generic_exception_safety.html
> (originally published in M. Jazayeri, R. Loos, D. Musser (eds.):
> Generic Programming, Proc. of a Dagstuhl Seminar, Lecture Notes on
> Computer Science. Volume. 1766)
Very good paper. Cleans some crap in my head.
> We need the strong guarantee exactly where it is "natural" for the
> operation to provide it without imposing a big-O efficiency cost. You
> can always add a copy/swap layer on top of whatever is natural if
> you're willing to make the operation expensive in order to get the
> strong guarantee.
Unfortunately this makes it harder and more costly to get strong
guarantee. For example in case of of vector::insert generally it is
enough to backup only values right to the point of insertion. It is
possible to write such code, but hard... so usually everybody do a full
copy. This makes code that needs strong guarantee to perform slower
than needed.
> I find most of your paper very hard to read, so it doesn't help me to
> understand very much of what you're saying. However, what I can
> understand seems to have some major holes. You can't implement shared
> ownership without GC or reference counting no matter how much move
> semantics you are given.
You can... explicitly extract pointer from pointer_to_T object with
'detach()' and use it in usual way (pass it to shared_ptr constructor).
Main idea is to protect resources by making every allocation function
to return object, which encapsulates resource release, implicitly.
> My only comment is that when proposing something as a better
> alternative to a very complete and refined proposal, it's probably a
> good idea not to argue too strongly while the idea is still 'raw.'
Ok, I'll try. What do you think about this idea? Having move semantic
will make it possible to return object efficiently. To stop
auto_release mechanism you need to detach resource explicitly. I
believe this could reduce leaks possibility considerably.
> > If you take away 'two-phased' feature, my destructive move is typical
> > destructive move.
>
> Well, two-phase and marking are completely independent issues, so no,
> "your move" is not what people typically thought of as destructive
> move in the past. But if you take away marking...
>
> > What makes you uncertain about efficiency of
> > destructive move?
>
> ..people will need to reinvent marking "manually" somehow. Plus the
> programming model is very difficult.
Yes, but getting strong guarantee becomes a bit easier. Anyway, I
already de-illusioned with this feature.
> >> What new case of the strong guarantee does destructive move provide
> >> for?
> >
> > destructive move itself does not make situation better. I was speaking
> > about 'my destructive move' which have 'two-phase' feature. It could
> > give strong guarantee while still being reasonable efficient.
>
> I doubt it, without extra storage. I think you're ignoring composite
> operations.
It was designed to use extra storage. And specifically designed to give
strong guarantee for composite operations.
> > For me 'atomic-like' means: ability of given operation to succeed
> > completely, or fail leaving environment in previous state.
>
> That's just "atomic."
I thought it will create confusion with 'atomic' from multithreading.
Bye.
Sincerely yours, Michael.
---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std-c++@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]