Topic: Sorting records using sort()


Author: francis@robinton.demon.co.uk (Francis Glassborow)
Date: Sun, 4 Jan 2004 01:15:57 +0000 (UTC)
Raw View
In article <pan.2004.01.02.12.56.21.625773@remove.this.part.rtij.nl>,
Martijn Lievaart <m@remove.this.part.rtij.nl> writes
>On Fri, 02 Jan 2004 03:13:26 -0800, Elijah Bailey wrote:
>
>[ Please don't top post, thank you. M4 ]
>
>[ Crossposted to csc++, is this standard complient? ]
>
>[ Short description of the problem ]
>
>We have an array of n*m bytes. It holds n objects of size m. Both are only
>known at runtime. For some strange reason we want to sort this using
>std::sort. We know there are other solutions, the question is, can it be
>done using std::sort.
>
>Constraints: Not clear, but memory seems to be an issue, creating an array
>of pointers and sort that is out.

But if n and m are only known at execution time the array must be
created dynamically which immediately makes me think of using a vector.
The following trivial program demonstrates that.

   #include <iostream>
   #include <vector>
   #include <algorithm>
   using namespace std;

  int main(){
        int n, m;
        cout << "N and m";
        cin >> n >> m;
        vector< vector<char> > a(n, m);
        for(int i(0); i != n; ++i){
                for(int j(0); j != m; ++j){
                        a[i][j] = 'a' - i - j;
                }
        }

        for(int i(0); i != n; ++i){
                for(int j(0); j != m; ++j){
                        cout << a[i][j] << " ";
                }
                cout << '\n';
        }
        sort(a.begin(), a.end());
        cout << "\n\n";
        for(int i(0); i != n; ++i){
                for(int j(0); j != m; ++j){
                        cout << a[i][j] << " ";
                }
                cout << '\n';
        }
}






--
Francis Glassborow      ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
                            or http://www.robinton.demon.co.uk
Happy Xmas, Hanukkah, Yuletide, Winter/Summer Solstice to all.

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: geomrock@hotmail.com (Elijah Bailey)
Date: Sun, 4 Jan 2004 04:15:33 +0000 (UTC)
Raw View
m@remove.this.part.rtij.nl (Martijn Lievaart) wrote in message news:<pan.2004.01.02.12.56.21.625773@remove.this.part.rtij.nl>...
> On Fri, 02 Jan 2004 03:13:26 -0800, Elijah Bailey wrote:
>
> [ Please don't top post, thank you. M4 ]
>
> [ Crossposted to csc++, is this standard complient? ]
>
> [ Short description of the problem ]
>
> We have an array of n*m bytes. It holds n objects of size m. Both are only
> known at runtime. For some strange reason we want to sort this using
> std::sort. We know there are other solutions, the question is, can it be
> done using std::sort.
>

Thanks Martijn. This is a much better description than i came up with.

> Constraints: Not clear, but memory seems to be an issue, creating an array
> of pointers and sort that is out.

Constraints: O(n) space is not allowed. So you can not create a
pointer
to each object and sort them. O(log(n)) space is perfectly ok.
O(log(n)*m) is also allowed if that simplifies things.

My data has n >> m, i.e. n is much greater than m.

>
> > "Ron Natalie" <ron@sensor.com> wrote in message
> > news:<3ff4297b$0$31857$9a6e19ea@news.newshosting.com>...
> >> "Jeff Schwab" <jeffplus@comcast.net> wrote in message
> >> news:WoidnaAaas0i8mmiRVn-hA@comcast.com...
> >>
> >> > Writing your own iterators would not be trivial, but probably would
> >> > be a educational.  Having such iterators available would make your
> >> > data structure compatible with most of the standard library, and
> >> > might avoid countless future headaches.
> >>
> >> The trouble with writing iterators for this problem is that he needs it
> >> to be variable size at run time and that the iterators have to be
> >> derferenceable and the derefenced types assignable.    Add his
> >> requirement to not allocate even enough memory to hold pointers to each
> >> object, I can't figure out how you could do it.
> >
> > Thanks Ron, I think that is definitely a problem. But I still "think" it
> > can be done, dont know how to thou...:)
>
> I also think it can be done. However, it requires so much trickery that it
> is not interesting to do so. Any of the other options quickly become very
> appealing. My approach uses some dynamic memory, but probably less than an
> array of pointers would take. The actual amount depends on the number of
> copies std::sort makes internally, there is no way around this. I guess
> for a typical implementation this would be around log(n) copies, unless
> the implementation takes special care to dispose of temporaries. You
> cannot count on any of this though.
>

Hopefully, by the end of this thread, we'll have something simple
enough
to implement...:)

> OK, how can we do it? Obviously every iterator must know the size of the
> object it is supposed to handle. The main problems we are facing:
>
> - We cannot create real objects (directly), the size of the real object is
> only known at runtime.
> - std::sort is allowed to (and most frequently does) make extra copies of
> objects.
>
> Asusmptions:
> - There are functions to copy these objects and to compare these objects.
> - The objects to be sorted don't need special construction or destruction
> and can be copied by the previously mentioned routine. They can be
> constructed by copying into a new memory area.
>  - std::sort never uses pointers to objects, only iterators. I think the
> standard mandates this, but couldn't find it.
>

Would be interesting to know, what std::sort() is required to have
from
the standard. I think you are perfectly right that sort doesnt use
pointers
to the objects, only iterators and derefernces(*) the iterators...

> We create a proxy class and let the iterator return proxy objects. These
> proxy objects store a pointer to some master descriptor that stores the
> location and size of the array and the size of the individual objects. We
> could store this in the proxy object itself, but that seems wasteful.
> Also, the proxy object stores a void* to store the data for the real
> object.
>
> This proxy class is what our iterators value_type is.
>
> Now the proxy object implements some intelligent constructors. I doubt we
> need a default constructor, the value type needs to be assignable only
> (this follows from the iterator requirements). So we create a constructor
> that is used only inside the iterator, it sets the data pointer to point
> inside the array at the correct location. If the copy constructor is
> called (which would be by std::sort) we dynamically allocate memory to
> hold the object and use the copying routine to fill it.
>
> The assignment operator must use the copying routine if the destination is
> inside the array. If not, it can either use copy-construct-and-swap or the
> copying routine.
>
> The destructor would check if the pointer points inside the array and if
> not deletes the object.
>
> Obviously, we need to define an operator< for the proxy objects, this
> calls the comparison function above. If we template the proxy objects, we
> can use a functor for the comparison, potentially inlining the comaparison
> (this would be the only reason to use std::sort anyhow, so it makes
> sense).
>
> Looking at the standard, this would fullfill all requirements for
> std::sort I could find. Somehow I think I did overlook something here, so
> scrutiny by others is appreciated. I also left obvious details out,
> obviously :-)
>
> A note on exceptions. This would give the basic guarentee, if the copying
> and comparison routines give at least the basic guarentee.
>
> A note on  memory usage. Assuming that std::sort holds log(n) copies of
> objects internally, this aproach uses less memory than the
> sort-array-of-pointers aproach if log(n)*m < n*sizeof(void*). So this
> greatly depends on these variables, but quickly holds for large n and gets
> less interesting for large m. Assume sizeof(void*)==4, we get m <
> 4n/log(n). For n=100, m should be less than 60. For n=1000, m should be
> less than 400. These are gross approximations but should give you a feel.
>

Usually this is certainly the case for my data.

> I hope anyone sees that you must have good reasons to do this. It is ugly,
> errorprone, prone to assumptions your implementation of std::sort makes
> (in this implementation &*it does not return a pointer to the real object,
> any pointer arithmetic will fail), difficult to maintain. There is no real
> reason why you should not call qsort in the first place, except as an
> accademic exercise.
>

First problem to qsort:     It's slower than using sort()
Second problem to qsort: Later I might want to use other STL
algorithms on this data!
                                       like for_each()? find()? ...?

Thanks a lot for your comments,
--Elijah

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: utc.4eecd8c5@ntlworld.com ("david m-")
Date: Mon, 5 Jan 2004 02:10:22 +0000 (UTC)
Raw View
Sounds like some sort of intermediate class is required in order to spoof a
RandomAccessIterator as required by the STL sort() function.



i.e. you can create these to adapt the dynamic size of the elements to be
sorted to the static requirements of an iterator.



Though whether this is technically possible I'm not sure.



david m.



"Elijah Bailey" <geomrock@hotmail.com> wrote in message
news:e008fef8.0401031936.312ffa9a@posting.google.com...
> m@remove.this.part.rtij.nl (Martijn Lievaart) wrote in message
news:<pan.2004.01.02.12.56.21.625773@remove.this.part.rtij.nl>...
> > On Fri, 02 Jan 2004 03:13:26 -0800, Elijah Bailey wrote:
> >
> > [ Please don't top post, thank you. M4 ]
> >
> > [ Crossposted to csc++, is this standard complient? ]
> >
> > [ Short description of the problem ]
> >
> > We have an array of n*m bytes. It holds n objects of size m. Both are
only
> > known at runtime. For some strange reason we want to sort this using
> > std::sort. We know there are other solutions, the question is, can it be
> > done using std::sort.
> >
>
> Thanks Martijn. This is a much better description than i came up with.
>
> > Constraints: Not clear, but memory seems to be an issue, creating an
array
> > of pointers and sort that is out.
>
> Constraints: O(n) space is not allowed. So you can not create a
> pointer
> to each object and sort them. O(log(n)) space is perfectly ok.
> O(log(n)*m) is also allowed if that simplifies things.
>
> My data has n >> m, i.e. n is much greater than m.
>
> >
> > > "Ron Natalie" <ron@sensor.com> wrote in message
> > > news:<3ff4297b$0$31857$9a6e19ea@news.newshosting.com>...
> > >> "Jeff Schwab" <jeffplus@comcast.net> wrote in message
> > >> news:WoidnaAaas0i8mmiRVn-hA@comcast.com...
> > >>
> > >> > Writing your own iterators would not be trivial, but probably would
> > >> > be a educational.  Having such iterators available would make your
> > >> > data structure compatible with most of the standard library, and
> > >> > might avoid countless future headaches.
> > >>
> > >> The trouble with writing iterators for this problem is that he needs
it
> > >> to be variable size at run time and that the iterators have to be
> > >> derferenceable and the derefenced types assignable.    Add his
> > >> requirement to not allocate even enough memory to hold pointers to
each
> > >> object, I can't figure out how you could do it.
> > >
> > > Thanks Ron, I think that is definitely a problem. But I still "think"
it
> > > can be done, dont know how to thou...:)
> >
> > I also think it can be done. However, it requires so much trickery that
it
> > is not interesting to do so. Any of the other options quickly become
very
> > appealing. My approach uses some dynamic memory, but probably less than
an
> > array of pointers would take. The actual amount depends on the number of
> > copies std::sort makes internally, there is no way around this. I guess
> > for a typical implementation this would be around log(n) copies, unless
> > the implementation takes special care to dispose of temporaries. You
> > cannot count on any of this though.
> >
>
> Hopefully, by the end of this thread, we'll have something simple
> enough
> to implement...:)
>
> > OK, how can we do it? Obviously every iterator must know the size of the
> > object it is supposed to handle. The main problems we are facing:
> >
> > - We cannot create real objects (directly), the size of the real object
is
> > only known at runtime.
> > - std::sort is allowed to (and most frequently does) make extra copies
of
> > objects.
> >
> > Asusmptions:
> > - There are functions to copy these objects and to compare these
objects.
> > - The objects to be sorted don't need special construction or
destruction
> > and can be copied by the previously mentioned routine. They can be
> > constructed by copying into a new memory area.
> >  - std::sort never uses pointers to objects, only iterators. I think the
> > standard mandates this, but couldn't find it.
> >
>
> Would be interesting to know, what std::sort() is required to have
> from
> the standard. I think you are perfectly right that sort doesnt use
> pointers
> to the objects, only iterators and derefernces(*) the iterators...
>
> > We create a proxy class and let the iterator return proxy objects. These
> > proxy objects store a pointer to some master descriptor that stores the
> > location and size of the array and the size of the individual objects.
We
> > could store this in the proxy object itself, but that seems wasteful.
> > Also, the proxy object stores a void* to store the data for the real
> > object.
> >
> > This proxy class is what our iterators value_type is.
> >
> > Now the proxy object implements some intelligent constructors. I doubt
we
> > need a default constructor, the value type needs to be assignable only
> > (this follows from the iterator requirements). So we create a
constructor
> > that is used only inside the iterator, it sets the data pointer to point
> > inside the array at the correct location. If the copy constructor is
> > called (which would be by std::sort) we dynamically allocate memory to
> > hold the object and use the copying routine to fill it.
> >
> > The assignment operator must use the copying routine if the destination
is
> > inside the array. If not, it can either use copy-construct-and-swap or
the
> > copying routine.
> >
> > The destructor would check if the pointer points inside the array and if
> > not deletes the object.
> >
> > Obviously, we need to define an operator< for the proxy objects, this
> > calls the comparison function above. If we template the proxy objects,
we
> > can use a functor for the comparison, potentially inlining the
comaparison
> > (this would be the only reason to use std::sort anyhow, so it makes
> > sense).
> >
> > Looking at the standard, this would fullfill all requirements for
> > std::sort I could find. Somehow I think I did overlook something here,
so
> > scrutiny by others is appreciated. I also left obvious details out,
> > obviously :-)
> >
> > A note on exceptions. This would give the basic guarentee, if the
copying
> > and comparison routines give at least the basic guarentee.
> >
> > A note on  memory usage. Assuming that std::sort holds log(n) copies of
> > objects internally, this aproach uses less memory than the
> > sort-array-of-pointers aproach if log(n)*m < n*sizeof(void*). So this
> > greatly depends on these variables, but quickly holds for large n and
gets
> > less interesting for large m. Assume sizeof(void*)==4, we get m <
> > 4n/log(n). For n=100, m should be less than 60. For n=1000, m should be
> > less than 400. These are gross approximations but should give you a
feel.
> >
>
> Usually this is certainly the case for my data.
>
> > I hope anyone sees that you must have good reasons to do this. It is
ugly,
> > errorprone, prone to assumptions your implementation of std::sort makes
> > (in this implementation &*it does not return a pointer to the real
object,
> > any pointer arithmetic will fail), difficult to maintain. There is no
real
> > reason why you should not call qsort in the first place, except as an
> > accademic exercise.
> >
>
> First problem to qsort:     It's slower than using sort()
> Second problem to qsort: Later I might want to use other STL
> algorithms on this data!
>                                        like for_each()? find()? ...?
>
> Thanks a lot for your comments,
> --Elijah
>
> ---
> [ comp.std.c++ is moderated.  To submit articles, try just posting with ]
> [ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
> [              --- Please see the FAQ before posting. ---               ]
> [ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]
>

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]





Author: m@remove.this.part.rtij.nl (Martijn Lievaart)
Date: Sat, 3 Jan 2004 06:20:09 +0000 (UTC)
Raw View
On Fri, 02 Jan 2004 03:13:26 -0800, Elijah Bailey wrote:

[ Please don't top post, thank you. M4 ]

[ Crossposted to csc++, is this standard complient? ]

[ Short description of the problem ]

We have an array of n*m bytes. It holds n objects of size m. Both are only
known at runtime. For some strange reason we want to sort this using
std::sort. We know there are other solutions, the question is, can it be
done using std::sort.

Constraints: Not clear, but memory seems to be an issue, creating an array
of pointers and sort that is out.

> "Ron Natalie" <ron@sensor.com> wrote in message
> news:<3ff4297b$0$31857$9a6e19ea@news.newshosting.com>...
>> "Jeff Schwab" <jeffplus@comcast.net> wrote in message
>> news:WoidnaAaas0i8mmiRVn-hA@comcast.com...
>>
>> > Writing your own iterators would not be trivial, but probably would
>> > be a educational.  Having such iterators available would make your
>> > data structure compatible with most of the standard library, and
>> > might avoid countless future headaches.
>>
>> The trouble with writing iterators for this problem is that he needs it
>> to be variable size at run time and that the iterators have to be
>> derferenceable and the derefenced types assignable.    Add his
>> requirement to not allocate even enough memory to hold pointers to each
>> object, I can't figure out how you could do it.
>
> Thanks Ron, I think that is definitely a problem. But I still "think" it
> can be done, dont know how to thou...:)

I also think it can be done. However, it requires so much trickery that it
is not interesting to do so. Any of the other options quickly become very
appealing. My approach uses some dynamic memory, but probably less than an
array of pointers would take. The actual amount depends on the number of
copies std::sort makes internally, there is no way around this. I guess
for a typical implementation this would be around log(n) copies, unless
the implementation takes special care to dispose of temporaries. You
cannot count on any of this though.

OK, how can we do it? Obviously every iterator must know the size of the
object it is supposed to handle. The main problems we are facing:

- We cannot create real objects (directly), the size of the real object is
only known at runtime.
- std::sort is allowed to (and most frequently does) make extra copies of
objects.

Asusmptions:
- There are functions to copy these objects and to compare these objects.
- The objects to be sorted don't need special construction or destruction
and can be copied by the previously mentioned routine. They can be
constructed by copying into a new memory area.
 - std::sort never uses pointers to objects, only iterators. I think the
standard mandates this, but couldn't find it.

We create a proxy class and let the iterator return proxy objects. These
proxy objects store a pointer to some master descriptor that stores the
location and size of the array and the size of the individual objects. We
could store this in the proxy object itself, but that seems wasteful.
Also, the proxy object stores a void* to store the data for the real
object.

This proxy class is what our iterators value_type is.

Now the proxy object implements some intelligent constructors. I doubt we
need a default constructor, the value type needs to be assignable only
(this follows from the iterator requirements). So we create a constructor
that is used only inside the iterator, it sets the data pointer to point
inside the array at the correct location. If the copy constructor is
called (which would be by std::sort) we dynamically allocate memory to
hold the object and use the copying routine to fill it.

The assignment operator must use the copying routine if the destination is
inside the array. If not, it can either use copy-construct-and-swap or the
copying routine.

The destructor would check if the pointer points inside the array and if
not deletes the object.

Obviously, we need to define an operator< for the proxy objects, this
calls the comparison function above. If we template the proxy objects, we
can use a functor for the comparison, potentially inlining the comaparison
(this would be the only reason to use std::sort anyhow, so it makes
sense).

Looking at the standard, this would fullfill all requirements for
std::sort I could find. Somehow I think I did overlook something here, so
scrutiny by others is appreciated. I also left obvious details out,
obviously :-)

A note on exceptions. This would give the basic guarentee, if the copying
and comparison routines give at least the basic guarentee.

A note on  memory usage. Assuming that std::sort holds log(n) copies of
objects internally, this aproach uses less memory than the
sort-array-of-pointers aproach if log(n)*m < n*sizeof(void*). So this
greatly depends on these variables, but quickly holds for large n and gets
less interesting for large m. Assume sizeof(void*)==4, we get m <
4n/log(n). For n=100, m should be less than 60. For n=1000, m should be
less than 400. These are gross approximations but should give you a feel.

I hope anyone sees that you must have good reasons to do this. It is ugly,
errorprone, prone to assumptions your implementation of std::sort makes
(in this implementation &*it does not return a pointer to the real object,
any pointer arithmetic will fail), difficult to maintain. There is no real
reason why you should not call qsort in the first place, except as an
accademic exercise.

HTH,
M4

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]