Topic: Adding Sorting with suffix subordering for suffix arrays
Author: joshua.r.marshall.1991@gmail.com
Date: Mon, 4 Jun 2018 08:14:52 -0700 (PDT)
Raw View
------=_Part_33593_124879047.1528125292355
Content-Type: multipart/alternative;
boundary="----=_Part_33594_1438869695.1528125292356"
------=_Part_33594_1438869695.1528125292356
Content-Type: text/plain; charset="UTF-8"
A more limited change to the sorts made available is adding a discrete
suffix array function. For the uninitiated, a suffix array is unlike a
typical sorting algorithm in two respects: the returned object is a
sequence of indexes or reference into the original container, and in the
case of ties, subordering is dictated by a comparison of each index's or
reference's following value in the original container with the terminal
value being the global minimum value.
The two significant algorithms for generating suffix arrays right now are
SACA-K and divsufsort.
I think a good call interface for this would be as follows:
template< typename _Bidirectional_Iterator, typename _Output_Iterator >
void get_suffix_array(_Bidirectional_Iterator begin,
_Bidirectional_Iterator end, _Output_Iterator out);
This would have applications in compression, text searching, and sequence
alignment.
--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/da9287b6-3218-445a-b079-6b9bf37f4060%40isocpp.org.
------=_Part_33594_1438869695.1528125292356
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr"><div>A more limited change to the sorts made available is =
adding a discrete suffix array function.=C2=A0 For the uninitiated, a suffi=
x array is unlike a typical sorting algorithm in two respects: the returned=
object is a sequence of indexes or reference into the original container, =
and in the case of ties, subordering is dictated by a comparison of each in=
dex's or reference's following value in the original container with=
the terminal value being the global minimum value.</div><div><br></div><di=
v>The two significant algorithms for generating suffix arrays right now are=
SACA-K and divsufsort.</div><div><br></div><div>I think a good call interf=
ace for this would be as follows:</div><div><br></div><div>template< typ=
ename _Bidirectional_Iterator, typename _Output_Iterator ></div><div>voi=
d get_suffix_array(_Bidirectional_Iterator begin, _Bidirectional_Iterator e=
nd, _Output_Iterator out);</div><div><br></div><div>This would have applica=
tions in compression, text searching, and sequence alignment.<br></div></di=
v>
<p></p>
-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals" group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/da9287b6-3218-445a-b079-6b9bf37f4060%=
40isocpp.org?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.=
com/a/isocpp.org/d/msgid/std-proposals/da9287b6-3218-445a-b079-6b9bf37f4060=
%40isocpp.org</a>.<br />
------=_Part_33594_1438869695.1528125292356--
------=_Part_33593_124879047.1528125292355--
.
Author: florian.csdt@gmail.com
Date: Mon, 4 Jun 2018 08:36:49 -0700 (PDT)
Raw View
------=_Part_19178_563515407.1528126609452
Content-Type: multipart/alternative;
boundary="----=_Part_19179_1736443434.1528126609452"
------=_Part_19179_1736443434.1528126609452
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
I have the feeling we don't a new interface for that:
template< typename RandomAccess_Iterator, typename Output_Iterator >
void get_suffix_array(RandomAccess_Iterator begin, RandomAccess_Iterator en=
d
, Output_Iterator out_begin) {
// few preparations
const auto n =3D end - begin;
auto out_end =3D out_begin + n;
// actual work
std::iota(out_begin, out_end, 0);
std::sort(out_begin, out_end, [&begin](auto lhs, auto rhs) { return=20
begin[lhs] < begin[rhs]; });
}
Would that be enough?
Le lundi 4 juin 2018 17:14:52 UTC+2, joshua.r.ma...@gmail.com a =C3=A9crit =
:
>
> A more limited change to the sorts made available is adding a discrete=20
> suffix array function. For the uninitiated, a suffix array is unlike a=
=20
> typical sorting algorithm in two respects: the returned object is a=20
> sequence of indexes or reference into the original container, and in the=
=20
> case of ties, subordering is dictated by a comparison of each index's or=
=20
> reference's following value in the original container with the terminal=
=20
> value being the global minimum value.
>
> The two significant algorithms for generating suffix arrays right now are=
=20
> SACA-K and divsufsort.
>
> I think a good call interface for this would be as follows:
>
> template< typename _Bidirectional_Iterator, typename _Output_Iterator >
> void get_suffix_array(_Bidirectional_Iterator begin,=20
> _Bidirectional_Iterator end, _Output_Iterator out);
>
> This would have applications in compression, text searching, and sequence=
=20
> alignment.
>
--=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp=
..org/d/msgid/std-proposals/57aff230-b879-4205-9cc6-dc3382c7d580%40isocpp.or=
g.
------=_Part_19179_1736443434.1528126609452
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr">I have the feeling we don't a new interface for that:<=
br><div style=3D"background-color: rgb(250, 250, 250); border-color: rgb(18=
7, 187, 187); border-style: solid; border-width: 1px; overflow-wrap: break-=
word;" class=3D"prettyprint"><code class=3D"prettyprint"><div class=3D"subp=
rettyprint"><div style=3D"color: #000000;background-color: #fffffe;font-fam=
ily: Consolas, "><div><span style=3D"color: #0000ff;"><span style=3D"color:=
#008;" class=3D"styled-by-prettify">template</span></span><span style=3D"c=
olor: #000000;"><span style=3D"color: #660;" class=3D"styled-by-prettify">&=
lt;</span><span style=3D"color: #000;" class=3D"styled-by-prettify"> </span=
></span><span style=3D"color: #0000ff;"><span style=3D"color: #008;" class=
=3D"styled-by-prettify">typename</span></span><span style=3D"color: #000000=
;"><span style=3D"color: #000;" class=3D"styled-by-prettify"> </span><span =
style=3D"color: #606;" class=3D"styled-by-prettify">RandomAccess_Iterator</=
span><span style=3D"color: #660;" class=3D"styled-by-prettify">,</span><spa=
n style=3D"color: #000;" class=3D"styled-by-prettify"> </span></span><span =
style=3D"color: #0000ff;"><span style=3D"color: #008;" class=3D"styled-by-p=
rettify">typename</span></span><span style=3D"color: #000000;"><span style=
=3D"color: #000;" class=3D"styled-by-prettify"> </span><span style=3D"color=
: #606;" class=3D"styled-by-prettify">Output_Iterator</span><span style=3D"=
color: #000;" class=3D"styled-by-prettify"> </span><span style=3D"color: #6=
60;" class=3D"styled-by-prettify">></span></span></div><div><span style=
=3D"color: #0000ff;"><span style=3D"color: #008;" class=3D"styled-by-pretti=
fy">void</span></span><span style=3D"color: #000000;"><span style=3D"color:=
#000;" class=3D"styled-by-prettify"> get_suffix_array</span><span style=3D=
"color: #660;" class=3D"styled-by-prettify">(</span><span style=3D"color: #=
606;" class=3D"styled-by-prettify">RandomAccess_Iterator</span><span style=
=3D"color: #000;" class=3D"styled-by-prettify"> </span><span style=3D"color=
: #008;" class=3D"styled-by-prettify">begin</span><span style=3D"color: #66=
0;" class=3D"styled-by-prettify">,</span><span style=3D"color: #000;" class=
=3D"styled-by-prettify"> </span><span style=3D"color: #606;" class=3D"style=
d-by-prettify">RandomAccess_Iterator</span><span style=3D"color: #000;" cla=
ss=3D"styled-by-prettify"> </span><span style=3D"color: #008;" class=3D"sty=
led-by-prettify">end</span><span style=3D"color: #660;" class=3D"styled-by-=
prettify">,</span><span style=3D"color: #000;" class=3D"styled-by-prettify"=
> </span><span style=3D"color: #606;" class=3D"styled-by-prettify">Output_I=
terator</span><span style=3D"color: #000;" class=3D"styled-by-prettify"> ou=
t_begin</span><span style=3D"color: #660;" class=3D"styled-by-prettify">)</=
span><span style=3D"color: #000;" class=3D"styled-by-prettify"> </span><spa=
n style=3D"color: #660;" class=3D"styled-by-prettify">{</span></span></div>=
<div><span style=3D"color: #000000;"><span style=3D"color: #000;" class=3D"=
styled-by-prettify"> =C2=A0 =C2=A0</span></span><span style=3D"color: #0080=
00;"><span style=3D"color: #800;" class=3D"styled-by-prettify">// few prepa=
rations</span></span></div><div><span style=3D"color: #000000;"><span style=
=3D"color: #800;" class=3D"styled-by-prettify"> =C2=A0 =C2=A0</span></span>=
<span style=3D"color: #0000ff;"><span style=3D"color: #800;" class=3D"style=
d-by-prettify">const</span></span><span style=3D"color: #000000;"><span sty=
le=3D"color: #800;" class=3D"styled-by-prettify"> </span></span><span style=
=3D"color: #0000ff;"><span style=3D"color: #800;" class=3D"styled-by-pretti=
fy">auto</span></span><span style=3D"color: #000000;"><span style=3D"color:=
#800;" class=3D"styled-by-prettify"> n =3D end - begin;</span></span></div=
><div><span style=3D"color: #000000;"><span style=3D"color: #800;" class=3D=
"styled-by-prettify"> =C2=A0 =C2=A0auto out_end =3D out_begin + n;</span></=
span></div><span style=3D"color: #000;" class=3D"styled-by-prettify"><br></=
span><div><span style=3D"color: #000000;"><span style=3D"color: #000;" clas=
s=3D"styled-by-prettify">=C2=A0 =C2=A0 </span></span><span style=3D"color: =
#008000;"><span style=3D"color: #800;" class=3D"styled-by-prettify">// actu=
al work</span></span></div><div><span style=3D"color: #000000;"><span style=
=3D"color: #800;" class=3D"styled-by-prettify"> =C2=A0 =C2=A0std::iota(out_=
begin, out_end, </span></span><span style=3D"color: #09885a;"><span style=
=3D"color: #800;" class=3D"styled-by-prettify">0</span></span><span style=
=3D"color: #000000;"><span style=3D"color: #800;" class=3D"styled-by-pretti=
fy">);</span></span></div><div><span style=3D"color: #000000;"><span style=
=3D"color: #800;" class=3D"styled-by-prettify"> =C2=A0 =C2=A0std::sort(out_=
begin, out_end, [&begin](</span></span><span style=3D"color: #0000ff;">=
<span style=3D"color: #800;" class=3D"styled-by-prettify">auto</span></span=
><span style=3D"color: #000000;"><span style=3D"color: #800;" class=3D"styl=
ed-by-prettify"> lhs, </span></span><span style=3D"color: #0000ff;"><span s=
tyle=3D"color: #800;" class=3D"styled-by-prettify">auto</span></span><span =
style=3D"color: #000000;"><span style=3D"color: #800;" class=3D"styled-by-p=
rettify"> rhs) { </span></span><span style=3D"color: #0000ff;"><span style=
=3D"color: #800;" class=3D"styled-by-prettify">return</span></span><span st=
yle=3D"color: #000000;"><span style=3D"color: #800;" class=3D"styled-by-pre=
ttify"> begin[lhs] < begin[rhs]; });</span></span></div><div><span style=
=3D"color: #000000;"><span style=3D"color: #800;" class=3D"styled-by-pretti=
fy">}</span></span></div></div></div></code></div><br>Would that be enough?=
<br><br>Le lundi 4 juin 2018 17:14:52 UTC+2, joshua.r.ma...@gmail.com a =C3=
=A9crit=C2=A0:<blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-l=
eft: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;"><div dir=3D"ltr"=
><div>A more limited change to the sorts made available is adding a discret=
e suffix array function.=C2=A0 For the uninitiated, a suffix array is unlik=
e a typical sorting algorithm in two respects: the returned object is a seq=
uence of indexes or reference into the original container, and in the case =
of ties, subordering is dictated by a comparison of each index's or ref=
erence's following value in the original container with the terminal va=
lue being the global minimum value.</div><div><br></div><div>The two signif=
icant algorithms for generating suffix arrays right now are SACA-K and divs=
ufsort.</div><div><br></div><div>I think a good call interface for this wou=
ld be as follows:</div><div><br></div><div>template< typename _Bidirecti=
onal_Iterator, typename _Output_Iterator ></div><div>void get_suffix_arr=
ay(_<wbr>Bidirectional_Iterator begin, _Bidirectional_Iterator end, _Output=
_Iterator out);</div><div><br></div><div>This would have applications in co=
mpression, text searching, and sequence alignment.<br></div></div></blockqu=
ote></div>
<p></p>
-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals" group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/57aff230-b879-4205-9cc6-dc3382c7d580%=
40isocpp.org?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.=
com/a/isocpp.org/d/msgid/std-proposals/57aff230-b879-4205-9cc6-dc3382c7d580=
%40isocpp.org</a>.<br />
------=_Part_19179_1736443434.1528126609452--
------=_Part_19178_563515407.1528126609452--
.
Author: joshua.r.marshall.1991@gmail.com
Date: Mon, 4 Jun 2018 08:51:48 -0700 (PDT)
Raw View
------=_Part_33699_1337372668.1528127509002
Content-Type: multipart/alternative;
boundary="----=_Part_33700_1576302153.1528127509002"
------=_Part_33700_1576302153.1528127509002
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Unfortunately, that approach computes and is correct but is not efficient=
=20
enough to be seriously considered in my area. That approach ends up being=
=20
something like O(n^2 log n). Naive tuned algorithms are still in the realm=
=20
on O(n^2) when using a normally linear sort. SACA-K is linear time and=20
constant workspace, and divsufsort is n log n time with some acceptable=20
workspace.
On Monday, June 4, 2018 at 11:36:49 AM UTC-4, floria...@gmail.com wrote:
>
> I have the feeling we don't a new interface for that:
> template< typename RandomAccess_Iterator, typename Output_Iterator >
> void get_suffix_array(RandomAccess_Iterator begin, RandomAccess_Iterator=
=20
> end, Output_Iterator out_begin) {
> // few preparations
> const auto n =3D end - begin;
> auto out_end =3D out_begin + n;
>
> // actual work
> std::iota(out_begin, out_end, 0);
> std::sort(out_begin, out_end, [&begin](auto lhs, auto rhs) { return=20
> begin[lhs] < begin[rhs]; });
> }
>
> Would that be enough?
>
> Le lundi 4 juin 2018 17:14:52 UTC+2, joshua.r.ma...@gmail.com a =C3=A9cri=
t :
>>
>> A more limited change to the sorts made available is adding a discrete=
=20
>> suffix array function. For the uninitiated, a suffix array is unlike a=
=20
>> typical sorting algorithm in two respects: the returned object is a=20
>> sequence of indexes or reference into the original container, and in the=
=20
>> case of ties, subordering is dictated by a comparison of each index's or=
=20
>> reference's following value in the original container with the terminal=
=20
>> value being the global minimum value.
>>
>> The two significant algorithms for generating suffix arrays right now ar=
e=20
>> SACA-K and divsufsort.
>>
>> I think a good call interface for this would be as follows:
>>
>> template< typename _Bidirectional_Iterator, typename _Output_Iterator >
>> void get_suffix_array(_Bidirectional_Iterator begin,=20
>> _Bidirectional_Iterator end, _Output_Iterator out);
>>
>> This would have applications in compression, text searching, and sequenc=
e=20
>> alignment.
>>
>
--=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp=
..org/d/msgid/std-proposals/bdda0670-d00a-49da-ba5b-1a477bd717ef%40isocpp.or=
g.
------=_Part_33700_1576302153.1528127509002
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr"><div>Unfortunately, that approach computes and is correct =
but is not efficient enough to be seriously considered in my area.=C2=A0 Th=
at approach ends up being something like O(n^2 log n).=C2=A0 Naive tuned al=
gorithms are still in the realm on O(n^2) when using a normally linear sort=
..=C2=A0 SACA-K is linear time and constant workspace, and divsufsort is n l=
og n time with some acceptable workspace.<br></div><div><br></div><br>On Mo=
nday, June 4, 2018 at 11:36:49 AM UTC-4, floria...@gmail.com wrote:<blockqu=
ote class=3D"gmail_quote" style=3D"margin: 0;margin-left: 0.8ex;border-left=
: 1px #ccc solid;padding-left: 1ex;"><div dir=3D"ltr">I have the feeling we=
don't a new interface for that:<br><div style=3D"background-color:rgb(=
250,250,250);border-color:rgb(187,187,187);border-style:solid;border-width:=
1px"><code><div><div><div><span style=3D"color:#0000ff"><span style=3D"colo=
r:#008">template</span></span><span style=3D"color:#000000"><span style=3D"=
color:#660"><</span><span style=3D"color:#000"> </span></span><span styl=
e=3D"color:#0000ff"><span style=3D"color:#008">typename</span></span><span =
style=3D"color:#000000"><span style=3D"color:#000"> </span><span style=3D"c=
olor:#606">RandomAccess_Iterator</span><span style=3D"color:#660">,</span><=
span style=3D"color:#000"> </span></span><span style=3D"color:#0000ff"><spa=
n style=3D"color:#008">typename</span></span><span style=3D"color:#000000">=
<span style=3D"color:#000"> </span><span style=3D"color:#606">Output_Iterat=
or</span><span style=3D"color:#000"> </span><span style=3D"color:#660">>=
</span></span></div><div><span style=3D"color:#0000ff"><span style=3D"color=
:#008">void</span></span><span style=3D"color:#000000"><span style=3D"color=
:#000"> get_suffix_array</span><span style=3D"color:#660">(</span><span sty=
le=3D"color:#606">RandomAccess_<wbr>Iterator</span><span style=3D"color:#00=
0"> </span><span style=3D"color:#008">begin</span><span style=3D"color:#660=
">,</span><span style=3D"color:#000"> </span><span style=3D"color:#606">Ran=
domAccess_Iterator</span><span style=3D"color:#000"> </span><span style=3D"=
color:#008">end</span><span style=3D"color:#660">,</span><span style=3D"col=
or:#000"> </span><span style=3D"color:#606">Output_Iterator</span><span sty=
le=3D"color:#000"> out_begin</span><span style=3D"color:#660">)</span><span=
style=3D"color:#000"> </span><span style=3D"color:#660">{</span></span></d=
iv><div><span style=3D"color:#000000"><span style=3D"color:#000"> =C2=A0 =
=C2=A0</span></span><span style=3D"color:#008000"><span style=3D"color:#800=
">// few preparations</span></span></div><div><span style=3D"color:#000000"=
><span style=3D"color:#800"> =C2=A0 =C2=A0</span></span><span style=3D"colo=
r:#0000ff"><span style=3D"color:#800">const</span></span><span style=3D"col=
or:#000000"><span style=3D"color:#800"> </span></span><span style=3D"color:=
#0000ff"><span style=3D"color:#800">auto</span></span><span style=3D"color:=
#000000"><span style=3D"color:#800"> n =3D end - begin;</span></span></div>=
<div><span style=3D"color:#000000"><span style=3D"color:#800"> =C2=A0 =C2=
=A0auto out_end =3D out_begin + n;</span></span></div><span style=3D"color:=
#000"><br></span><div><span style=3D"color:#000000"><span style=3D"color:#0=
00">=C2=A0 =C2=A0 </span></span><span style=3D"color:#008000"><span style=
=3D"color:#800">// actual work</span></span></div><div><span style=3D"color=
:#000000"><span style=3D"color:#800"> =C2=A0 =C2=A0std::iota(out_begin, out=
_end, </span></span><span style=3D"color:#09885a"><span style=3D"color:#800=
">0</span></span><span style=3D"color:#000000"><span style=3D"color:#800">)=
;</span></span></div><div><span style=3D"color:#000000"><span style=3D"colo=
r:#800"> =C2=A0 =C2=A0std::sort(out_begin, out_end, [&begin](</span></s=
pan><span style=3D"color:#0000ff"><span style=3D"color:#800">auto</span></s=
pan><span style=3D"color:#000000"><span style=3D"color:#800"> lhs, </span><=
/span><span style=3D"color:#0000ff"><span style=3D"color:#800">auto</span><=
/span><span style=3D"color:#000000"><span style=3D"color:#800"> rhs) { </sp=
an></span><span style=3D"color:#0000ff"><span style=3D"color:#800">return</=
span></span><span style=3D"color:#000000"><span style=3D"color:#800"> begin=
[lhs] < begin[rhs]; });</span></span></div><div><span style=3D"color:#00=
0000"><span style=3D"color:#800">}</span></span></div></div></div></code></=
div><br>Would that be enough?<br><br>Le lundi 4 juin 2018 17:14:52 UTC+2, <=
a>joshua.r.ma...@gmail.com</a> a =C3=A9crit=C2=A0:<blockquote class=3D"gmai=
l_quote" style=3D"margin:0;margin-left:0.8ex;border-left:1px #ccc solid;pad=
ding-left:1ex"><div dir=3D"ltr"><div>A more limited change to the sorts mad=
e available is adding a discrete suffix array function.=C2=A0 For the unini=
tiated, a suffix array is unlike a typical sorting algorithm in two respect=
s: the returned object is a sequence of indexes or reference into the origi=
nal container, and in the case of ties, subordering is dictated by a compar=
ison of each index's or reference's following value in the original=
container with the terminal value being the global minimum value.</div><di=
v><br></div><div>The two significant algorithms for generating suffix array=
s right now are SACA-K and divsufsort.</div><div><br></div><div>I think a g=
ood call interface for this would be as follows:</div><div><br></div><div>t=
emplate< typename _Bidirectional_Iterator, typename _Output_Iterator >=
;</div><div>void get_suffix_array(_<wbr>Bidirectional_Iterator begin, _Bidi=
rectional_Iterator end, _Output_Iterator out);</div><div><br></div><div>Thi=
s would have applications in compression, text searching, and sequence alig=
nment.<br></div></div></blockquote></div></blockquote></div>
<p></p>
-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals" group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/bdda0670-d00a-49da-ba5b-1a477bd717ef%=
40isocpp.org?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.=
com/a/isocpp.org/d/msgid/std-proposals/bdda0670-d00a-49da-ba5b-1a477bd717ef=
%40isocpp.org</a>.<br />
------=_Part_33700_1576302153.1528127509002--
------=_Part_33699_1337372668.1528127509002--
.
Author: florian.csdt@gmail.com
Date: Mon, 4 Jun 2018 09:02:48 -0700 (PDT)
Raw View
------=_Part_33993_1036016547.1528128168356
Content-Type: multipart/alternative;
boundary="----=_Part_33994_1777521046.1528128168356"
------=_Part_33994_1777521046.1528128168356
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
How is it O(n=C2=B2 log n)?
end - begin is O(1) (random access iterator)
out_begin + n is O(1) (random access iterator)
std::iota(out_begin, out_end, 0) is O(n)
std::sort(out_begin, out_end, [&begin](auto lhs, auto rhs) { return=20
begin[lhs] < begin[rhs]; }) is O(n log n) ( begin[] is O(1))
In the end, this is O(n log n).
Maybe I have assumed random access iterators where they are not, but=20
otherwise, it is O(n log n).
Le lundi 4 juin 2018 17:51:49 UTC+2, joshua.r.ma...@gmail.com a =C3=A9crit =
:
>
> Unfortunately, that approach computes and is correct but is not efficient=
=20
> enough to be seriously considered in my area. That approach ends up bein=
g=20
> something like O(n^2 log n). Naive tuned algorithms are still in the rea=
lm=20
> on O(n^2) when using a normally linear sort. SACA-K is linear time and=
=20
> constant workspace, and divsufsort is n log n time with some acceptable=
=20
> workspace.
>
>
> On Monday, June 4, 2018 at 11:36:49 AM UTC-4, floria...@gmail.com wrote:
>>
>> I have the feeling we don't a new interface for that:
>> template< typename RandomAccess_Iterator, typename Output_Iterator >
>> void get_suffix_array(RandomAccess_Iterator begin, RandomAccess_Iterator=
=20
>> end, Output_Iterator out_begin) {
>> // few preparations
>> const auto n =3D end - begin;
>> auto out_end =3D out_begin + n;
>>
>> // actual work
>> std::iota(out_begin, out_end, 0);
>> std::sort(out_begin, out_end, [&begin](auto lhs, auto rhs) { return=
=20
>> begin[lhs] < begin[rhs]; });
>> }
>>
>> Would that be enough?
>>
>> Le lundi 4 juin 2018 17:14:52 UTC+2, joshua.r.ma...@gmail.com a =C3=A9cr=
it :
>>>
>>> A more limited change to the sorts made available is adding a discrete=
=20
>>> suffix array function. For the uninitiated, a suffix array is unlike a=
=20
>>> typical sorting algorithm in two respects: the returned object is a=20
>>> sequence of indexes or reference into the original container, and in th=
e=20
>>> case of ties, subordering is dictated by a comparison of each index's o=
r=20
>>> reference's following value in the original container with the terminal=
=20
>>> value being the global minimum value.
>>>
>>> The two significant algorithms for generating suffix arrays right now=
=20
>>> are SACA-K and divsufsort.
>>>
>>> I think a good call interface for this would be as follows:
>>>
>>> template< typename _Bidirectional_Iterator, typename _Output_Iterator >
>>> void get_suffix_array(_Bidirectional_Iterator begin,=20
>>> _Bidirectional_Iterator end, _Output_Iterator out);
>>>
>>> This would have applications in compression, text searching, and=20
>>> sequence alignment.
>>>
>>
--=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp=
..org/d/msgid/std-proposals/ffee2e05-2287-4044-9947-f19f21a389b6%40isocpp.or=
g.
------=_Part_33994_1777521046.1528128168356
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr">How is it O(n=C2=B2 log n)?<br><span style=3D"font-family:=
courier new, monospace;">end - begin</span> is O(1) (random access iterato=
r)<br><span style=3D"font-family: courier new, monospace;">out_begin + n</s=
pan> is O(1) (random access iterator)<br><span style=3D"font-family: courie=
r new, monospace;">std::iota(out_begin, out_end, 0)</span> is O(n)<br><span=
style=3D"font-family: courier new, monospace;">std::sort(out_begin, out_en=
d, [&begin](auto lhs, auto rhs) { return begin[lhs] < begin[rhs]; })=
</span> is O(n log n)=C2=A0 ( begin[] is O(1))<br><br>In the end, this is O=
(n log n).<br>Maybe I have assumed random access iterators where they are n=
ot, but otherwise, it is O(n log n).<br><br>Le lundi 4 juin 2018 17:51:49 U=
TC+2, joshua.r.ma...@gmail.com a =C3=A9crit=C2=A0:<blockquote class=3D"gmai=
l_quote" style=3D"margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;=
padding-left: 1ex;"><div dir=3D"ltr"><div>Unfortunately, that approach comp=
utes and is correct but is not efficient enough to be seriously considered =
in my area.=C2=A0 That approach ends up being something like O(n^2 log n).=
=C2=A0 Naive tuned algorithms are still in the realm on O(n^2) when using a=
normally linear sort.=C2=A0 SACA-K is linear time and constant workspace, =
and divsufsort is n log n time with some acceptable workspace.<br></div><di=
v><br></div><br>On Monday, June 4, 2018 at 11:36:49 AM UTC-4, <a>floria...@=
gmail.com</a> wrote:<blockquote class=3D"gmail_quote" style=3D"margin:0;mar=
gin-left:0.8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir=3D"ltr=
">I have the feeling we don't a new interface for that:<br><div style=
=3D"background-color:rgb(250,250,250);border-color:rgb(187,187,187);border-=
style:solid;border-width:1px"><code><div><div><div><span style=3D"color:#00=
00ff"><span style=3D"color:#008">template</span></span><span style=3D"color=
:#000000"><span style=3D"color:#660"><</span><span style=3D"color:#000">=
</span></span><span style=3D"color:#0000ff"><span style=3D"color:#008">typ=
ename</span></span><span style=3D"color:#000000"><span style=3D"color:#000"=
> </span><span style=3D"color:#606">RandomAccess_Iterator</span><span style=
=3D"color:#660">,</span><span style=3D"color:#000"> </span></span><span sty=
le=3D"color:#0000ff"><span style=3D"color:#008">typename</span></span><span=
style=3D"color:#000000"><span style=3D"color:#000"> </span><span style=3D"=
color:#606">Output_Iterator</span><span style=3D"color:#000"> </span><span =
style=3D"color:#660">></span></span></div><div><span style=3D"color:#000=
0ff"><span style=3D"color:#008">void</span></span><span style=3D"color:#000=
000"><span style=3D"color:#000"> get_suffix_array</span><span style=3D"colo=
r:#660">(</span><span style=3D"color:#606">RandomAccess_<wbr>Iterator</span=
><span style=3D"color:#000"> </span><span style=3D"color:#008">begin</span>=
<span style=3D"color:#660">,</span><span style=3D"color:#000"> </span><span=
style=3D"color:#606">RandomAccess_Iterator</span><span style=3D"color:#000=
"> </span><span style=3D"color:#008">end</span><span style=3D"color:#660">,=
</span><span style=3D"color:#000"> </span><span style=3D"color:#606">Output=
_Iterator</span><span style=3D"color:#000"> out_begin</span><span style=3D"=
color:#660">)</span><span style=3D"color:#000"> </span><span style=3D"color=
:#660">{</span></span></div><div><span style=3D"color:#000000"><span style=
=3D"color:#000"> =C2=A0 =C2=A0</span></span><span style=3D"color:#008000"><=
span style=3D"color:#800">// few preparations</span></span></div><div><span=
style=3D"color:#000000"><span style=3D"color:#800"> =C2=A0 =C2=A0</span></=
span><span style=3D"color:#0000ff"><span style=3D"color:#800">const</span><=
/span><span style=3D"color:#000000"><span style=3D"color:#800"> </span></sp=
an><span style=3D"color:#0000ff"><span style=3D"color:#800">auto</span></sp=
an><span style=3D"color:#000000"><span style=3D"color:#800"> n =3D end - be=
gin;</span></span></div><div><span style=3D"color:#000000"><span style=3D"c=
olor:#800"> =C2=A0 =C2=A0auto out_end =3D out_begin + n;</span></span></div=
><span style=3D"color:#000"><br></span><div><span style=3D"color:#000000"><=
span style=3D"color:#000">=C2=A0 =C2=A0 </span></span><span style=3D"color:=
#008000"><span style=3D"color:#800">// actual work</span></span></div><div>=
<span style=3D"color:#000000"><span style=3D"color:#800"> =C2=A0 =C2=A0std:=
:iota(out_begin, out_end, </span></span><span style=3D"color:#09885a"><span=
style=3D"color:#800">0</span></span><span style=3D"color:#000000"><span st=
yle=3D"color:#800">);</span></span></div><div><span style=3D"color:#000000"=
><span style=3D"color:#800"> =C2=A0 =C2=A0std::sort(out_begin, out_end, [&a=
mp;begin](</span></span><span style=3D"color:#0000ff"><span style=3D"color:=
#800">auto</span></span><span style=3D"color:#000000"><span style=3D"color:=
#800"> lhs, </span></span><span style=3D"color:#0000ff"><span style=3D"colo=
r:#800">auto</span></span><span style=3D"color:#000000"><span style=3D"colo=
r:#800"> rhs) { </span></span><span style=3D"color:#0000ff"><span style=3D"=
color:#800">return</span></span><span style=3D"color:#000000"><span style=
=3D"color:#800"> begin[lhs] < begin[rhs]; });</span></span></div><div><s=
pan style=3D"color:#000000"><span style=3D"color:#800">}</span></span></div=
></div></div></code></div><br>Would that be enough?<br><br>Le lundi 4 juin =
2018 17:14:52 UTC+2, <a>joshua.r.ma...@gmail.com</a> a =C3=A9crit=C2=A0:<bl=
ockquote class=3D"gmail_quote" style=3D"margin:0;margin-left:0.8ex;border-l=
eft:1px #ccc solid;padding-left:1ex"><div dir=3D"ltr"><div>A more limited c=
hange to the sorts made available is adding a discrete suffix array functio=
n.=C2=A0 For the uninitiated, a suffix array is unlike a typical sorting al=
gorithm in two respects: the returned object is a sequence of indexes or re=
ference into the original container, and in the case of ties, subordering i=
s dictated by a comparison of each index's or reference's following=
value in the original container with the terminal value being the global m=
inimum value.</div><div><br></div><div>The two significant algorithms for g=
enerating suffix arrays right now are SACA-K and divsufsort.</div><div><br>=
</div><div>I think a good call interface for this would be as follows:</div=
><div><br></div><div>template< typename _Bidirectional_Iterator, typenam=
e _Output_Iterator ></div><div>void get_suffix_array(_<wbr>Bidirectional=
_Iterator begin, _Bidirectional_Iterator end, _Output_Iterator out);</div><=
div><br></div><div>This would have applications in compression, text search=
ing, and sequence alignment.<br></div></div></blockquote></div></blockquote=
></div></blockquote></div>
<p></p>
-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals" group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/ffee2e05-2287-4044-9947-f19f21a389b6%=
40isocpp.org?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.=
com/a/isocpp.org/d/msgid/std-proposals/ffee2e05-2287-4044-9947-f19f21a389b6=
%40isocpp.org</a>.<br />
------=_Part_33994_1777521046.1528128168356--
------=_Part_33993_1036016547.1528128168356--
.
Author: joshua.r.marshall.1991@gmail.com
Date: Mon, 4 Jun 2018 09:15:03 -0700 (PDT)
Raw View
------=_Part_33998_398333938.1528128903915
Content-Type: multipart/alternative;
boundary="----=_Part_33999_1984808436.1528128903916"
------=_Part_33999_1984808436.1528128903916
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Not quite. Because these have to track backwards from the each starting=20
point, there are many successive comparisons, especially over the common=20
case of small alphabets. So take your expected length before mismatch of=
=20
strings, O(n/2) worst case, O(log(n)/log(|alphabet|))) best case, and=20
multiply each comparison by that. Comparisons are not O(1).
On Monday, June 4, 2018 at 12:02:48 PM UTC-4, floria...@gmail.com wrote:
>
> How is it O(n=C2=B2 log n)?
> end - begin is O(1) (random access iterator)
> out_begin + n is O(1) (random access iterator)
> std::iota(out_begin, out_end, 0) is O(n)
> std::sort(out_begin, out_end, [&begin](auto lhs, auto rhs) { return=20
> begin[lhs] < begin[rhs]; }) is O(n log n) ( begin[] is O(1))
>
> In the end, this is O(n log n).
> Maybe I have assumed random access iterators where they are not, but=20
> otherwise, it is O(n log n).
>
> Le lundi 4 juin 2018 17:51:49 UTC+2, joshua.r.ma...@gmail.com a =C3=A9cri=
t :
>>
>> Unfortunately, that approach computes and is correct but is not efficien=
t=20
>> enough to be seriously considered in my area. That approach ends up bei=
ng=20
>> something like O(n^2 log n). Naive tuned algorithms are still in the re=
alm=20
>> on O(n^2) when using a normally linear sort. SACA-K is linear time and=
=20
>> constant workspace, and divsufsort is n log n time with some acceptable=
=20
>> workspace.
>>
>>
>> On Monday, June 4, 2018 at 11:36:49 AM UTC-4, floria...@gmail.com wrote:
>>>
>>> I have the feeling we don't a new interface for that:
>>> template< typename RandomAccess_Iterator, typename Output_Iterator >
>>> void get_suffix_array(RandomAccess_Iterator begin, RandomAccess_Iterato=
r=20
>>> end, Output_Iterator out_begin) {
>>> // few preparations
>>> const auto n =3D end - begin;
>>> auto out_end =3D out_begin + n;
>>>
>>> // actual work
>>> std::iota(out_begin, out_end, 0);
>>> std::sort(out_begin, out_end, [&begin](auto lhs, auto rhs) { return=
=20
>>> begin[lhs] < begin[rhs]; });
>>> }
>>>
>>> Would that be enough?
>>>
>>> Le lundi 4 juin 2018 17:14:52 UTC+2, joshua.r.ma...@gmail.com a =C3=A9c=
rit :
>>>>
>>>> A more limited change to the sorts made available is adding a discrete=
=20
>>>> suffix array function. For the uninitiated, a suffix array is unlike =
a=20
>>>> typical sorting algorithm in two respects: the returned object is a=20
>>>> sequence of indexes or reference into the original container, and in t=
he=20
>>>> case of ties, subordering is dictated by a comparison of each index's =
or=20
>>>> reference's following value in the original container with the termina=
l=20
>>>> value being the global minimum value.
>>>>
>>>> The two significant algorithms for generating suffix arrays right now=
=20
>>>> are SACA-K and divsufsort.
>>>>
>>>> I think a good call interface for this would be as follows:
>>>>
>>>> template< typename _Bidirectional_Iterator, typename _Output_Iterator =
>
>>>> void get_suffix_array(_Bidirectional_Iterator begin,=20
>>>> _Bidirectional_Iterator end, _Output_Iterator out);
>>>>
>>>> This would have applications in compression, text searching, and=20
>>>> sequence alignment.
>>>>
>>>
--=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp=
..org/d/msgid/std-proposals/65056fa4-3634-4430-9ca3-ca2254eb308b%40isocpp.or=
g.
------=_Part_33999_1984808436.1528128903916
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr">Not quite.=C2=A0 Because these have to track backwards fro=
m the each starting point, there are many successive comparisons, especiall=
y over the common case of small alphabets.=C2=A0 So take your expected leng=
th before mismatch of strings, O(n/2) worst case, O(log(n)/log(|alphabet|))=
) best case, and multiply each comparison by that.=C2=A0 Comparisons are no=
t O(1).<br><br>On Monday, June 4, 2018 at 12:02:48 PM UTC-4, floria...@gmai=
l.com wrote:<blockquote class=3D"gmail_quote" style=3D"margin: 0;margin-lef=
t: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;"><div dir=3D"ltr">H=
ow is it O(n=C2=B2 log n)?<br><span style=3D"font-family:courier new,monosp=
ace">end - begin</span> is O(1) (random access iterator)<br><span style=3D"=
font-family:courier new,monospace">out_begin + n</span> is O(1) (random acc=
ess iterator)<br><span style=3D"font-family:courier new,monospace">std::iot=
a(out_begin, out_end, 0)</span> is O(n)<br><span style=3D"font-family:couri=
er new,monospace">std::sort(out_begin, out_end, [&begin](auto lhs, auto=
rhs) { return begin[lhs] < begin[rhs]; })</span> is O(n log n)=C2=A0 ( =
begin[] is O(1))<br><br>In the end, this is O(n log n).<br>Maybe I have ass=
umed random access iterators where they are not, but otherwise, it is O(n l=
og n).<br><br>Le lundi 4 juin 2018 17:51:49 UTC+2, <a>joshua.r.ma...@gmail.=
com</a> a =C3=A9crit=C2=A0:<blockquote class=3D"gmail_quote" style=3D"margi=
n:0;margin-left:0.8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir=
=3D"ltr"><div>Unfortunately, that approach computes and is correct but is n=
ot efficient enough to be seriously considered in my area.=C2=A0 That appro=
ach ends up being something like O(n^2 log n).=C2=A0 Naive tuned algorithms=
are still in the realm on O(n^2) when using a normally linear sort.=C2=A0 =
SACA-K is linear time and constant workspace, and divsufsort is n log n tim=
e with some acceptable workspace.<br></div><div><br></div><br>On Monday, Ju=
ne 4, 2018 at 11:36:49 AM UTC-4, <a>floria...@gmail.com</a> wrote:<blockquo=
te class=3D"gmail_quote" style=3D"margin:0;margin-left:0.8ex;border-left:1p=
x #ccc solid;padding-left:1ex"><div dir=3D"ltr">I have the feeling we don&#=
39;t a new interface for that:<br><div style=3D"background-color:rgb(250,25=
0,250);border-color:rgb(187,187,187);border-style:solid;border-width:1px"><=
code><div><div><div><span style=3D"color:#0000ff"><span style=3D"color:#008=
">template</span></span><span style=3D"color:#000000"><span style=3D"color:=
#660"><</span><span style=3D"color:#000"> </span></span><span style=3D"c=
olor:#0000ff"><span style=3D"color:#008">typename</span></span><span style=
=3D"color:#000000"><span style=3D"color:#000"> </span><span style=3D"color:=
#606">RandomAccess_Iterator</span><span style=3D"color:#660">,</span><span =
style=3D"color:#000"> </span></span><span style=3D"color:#0000ff"><span sty=
le=3D"color:#008">typename</span></span><span style=3D"color:#000000"><span=
style=3D"color:#000"> </span><span style=3D"color:#606">Output_Iterator</s=
pan><span style=3D"color:#000"> </span><span style=3D"color:#660">></spa=
n></span></div><div><span style=3D"color:#0000ff"><span style=3D"color:#008=
">void</span></span><span style=3D"color:#000000"><span style=3D"color:#000=
"> get_suffix_array</span><span style=3D"color:#660">(</span><span style=3D=
"color:#606">RandomAccess_<wbr>Iterator</span><span style=3D"color:#000"> <=
/span><span style=3D"color:#008">begin</span><span style=3D"color:#660">,</=
span><span style=3D"color:#000"> </span><span style=3D"color:#606">RandomAc=
cess_Iterator</span><span style=3D"color:#000"> </span><span style=3D"color=
:#008">end</span><span style=3D"color:#660">,</span><span style=3D"color:#0=
00"> </span><span style=3D"color:#606">Output_Iterator</span><span style=3D=
"color:#000"> out_begin</span><span style=3D"color:#660">)</span><span styl=
e=3D"color:#000"> </span><span style=3D"color:#660">{</span></span></div><d=
iv><span style=3D"color:#000000"><span style=3D"color:#000"> =C2=A0 =C2=A0<=
/span></span><span style=3D"color:#008000"><span style=3D"color:#800">// fe=
w preparations</span></span></div><div><span style=3D"color:#000000"><span =
style=3D"color:#800"> =C2=A0 =C2=A0</span></span><span style=3D"color:#0000=
ff"><span style=3D"color:#800">const</span></span><span style=3D"color:#000=
000"><span style=3D"color:#800"> </span></span><span style=3D"color:#0000ff=
"><span style=3D"color:#800">auto</span></span><span style=3D"color:#000000=
"><span style=3D"color:#800"> n =3D end - begin;</span></span></div><div><s=
pan style=3D"color:#000000"><span style=3D"color:#800"> =C2=A0 =C2=A0auto o=
ut_end =3D out_begin + n;</span></span></div><span style=3D"color:#000"><br=
></span><div><span style=3D"color:#000000"><span style=3D"color:#000">=C2=
=A0 =C2=A0 </span></span><span style=3D"color:#008000"><span style=3D"color=
:#800">// actual work</span></span></div><div><span style=3D"color:#000000"=
><span style=3D"color:#800"> =C2=A0 =C2=A0std::iota(out_begin, out_end, </s=
pan></span><span style=3D"color:#09885a"><span style=3D"color:#800">0</span=
></span><span style=3D"color:#000000"><span style=3D"color:#800">);</span><=
/span></div><div><span style=3D"color:#000000"><span style=3D"color:#800"> =
=C2=A0 =C2=A0std::sort(out_begin, out_end, [&begin](</span></span><span=
style=3D"color:#0000ff"><span style=3D"color:#800">auto</span></span><span=
style=3D"color:#000000"><span style=3D"color:#800"> lhs, </span></span><sp=
an style=3D"color:#0000ff"><span style=3D"color:#800">auto</span></span><sp=
an style=3D"color:#000000"><span style=3D"color:#800"> rhs) { </span></span=
><span style=3D"color:#0000ff"><span style=3D"color:#800">return</span></sp=
an><span style=3D"color:#000000"><span style=3D"color:#800"> begin[lhs] <=
; begin[rhs]; });</span></span></div><div><span style=3D"color:#000000"><sp=
an style=3D"color:#800">}</span></span></div></div></div></code></div><br>W=
ould that be enough?<br><br>Le lundi 4 juin 2018 17:14:52 UTC+2, <a>joshua.=
r.ma...@gmail.com</a> a =C3=A9crit=C2=A0:<blockquote class=3D"gmail_quote" =
style=3D"margin:0;margin-left:0.8ex;border-left:1px #ccc solid;padding-left=
:1ex"><div dir=3D"ltr"><div>A more limited change to the sorts made availab=
le is adding a discrete suffix array function.=C2=A0 For the uninitiated, a=
suffix array is unlike a typical sorting algorithm in two respects: the re=
turned object is a sequence of indexes or reference into the original conta=
iner, and in the case of ties, subordering is dictated by a comparison of e=
ach index's or reference's following value in the original containe=
r with the terminal value being the global minimum value.</div><div><br></d=
iv><div>The two significant algorithms for generating suffix arrays right n=
ow are SACA-K and divsufsort.</div><div><br></div><div>I think a good call =
interface for this would be as follows:</div><div><br></div><div>template&l=
t; typename _Bidirectional_Iterator, typename _Output_Iterator ></div><d=
iv>void get_suffix_array(_<wbr>Bidirectional_Iterator begin, _Bidirectional=
_Iterator end, _Output_Iterator out);</div><div><br></div><div>This would h=
ave applications in compression, text searching, and sequence alignment.<br=
></div></div></blockquote></div></blockquote></div></blockquote></div></blo=
ckquote></div>
<p></p>
-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals" group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/65056fa4-3634-4430-9ca3-ca2254eb308b%=
40isocpp.org?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.=
com/a/isocpp.org/d/msgid/std-proposals/65056fa4-3634-4430-9ca3-ca2254eb308b=
%40isocpp.org</a>.<br />
------=_Part_33999_1984808436.1528128903916--
------=_Part_33998_398333938.1528128903915--
.
Author: florian.csdt@gmail.com
Date: Mon, 4 Jun 2018 09:29:27 -0700 (PDT)
Raw View
------=_Part_25863_481596994.1528129767281
Content-Type: multipart/alternative;
boundary="----=_Part_25864_1229247947.1528129767281"
------=_Part_25864_1229247947.1528129767281
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
My bad, I did not understand the algorithm. And my implementation is not=20
correct (my comparison predicate compares single characters).
you should probably try to explain better what it is to avoid ambiguities.
Le lundi 4 juin 2018 18:15:03 UTC+2, joshua.r.ma...@gmail.com a =C3=A9crit =
:
>
> Not quite. Because these have to track backwards from the each starting=
=20
> point, there are many successive comparisons, especially over the common=
=20
> case of small alphabets. So take your expected length before mismatch of=
=20
> strings, O(n/2) worst case, O(log(n)/log(|alphabet|))) best case, and=20
> multiply each comparison by that. Comparisons are not O(1).
>
> On Monday, June 4, 2018 at 12:02:48 PM UTC-4, floria...@gmail.com wrote:
>>
>> How is it O(n=C2=B2 log n)?
>> end - begin is O(1) (random access iterator)
>> out_begin + n is O(1) (random access iterator)
>> std::iota(out_begin, out_end, 0) is O(n)
>> std::sort(out_begin, out_end, [&begin](auto lhs, auto rhs) { return=20
>> begin[lhs] < begin[rhs]; }) is O(n log n) ( begin[] is O(1))
>>
>> In the end, this is O(n log n).
>> Maybe I have assumed random access iterators where they are not, but=20
>> otherwise, it is O(n log n).
>>
>> Le lundi 4 juin 2018 17:51:49 UTC+2, joshua.r.ma...@gmail.com a =C3=A9cr=
it :
>>>
>>> Unfortunately, that approach computes and is correct but is not=20
>>> efficient enough to be seriously considered in my area. That approach =
ends=20
>>> up being something like O(n^2 log n). Naive tuned algorithms are still=
in=20
>>> the realm on O(n^2) when using a normally linear sort. SACA-K is linea=
r=20
>>> time and constant workspace, and divsufsort is n log n time with some=
=20
>>> acceptable workspace.
>>>
>>>
>>> On Monday, June 4, 2018 at 11:36:49 AM UTC-4, floria...@gmail.com wrote=
:
>>>>
>>>> I have the feeling we don't a new interface for that:
>>>> template< typename RandomAccess_Iterator, typename Output_Iterator >
>>>> void get_suffix_array(RandomAccess_Iterator begin,=20
>>>> RandomAccess_Iterator end, Output_Iterator out_begin) {
>>>> // few preparations
>>>> const auto n =3D end - begin;
>>>> auto out_end =3D out_begin + n;
>>>>
>>>> // actual work
>>>> std::iota(out_begin, out_end, 0);
>>>> std::sort(out_begin, out_end, [&begin](auto lhs, auto rhs) { return=
=20
>>>> begin[lhs] < begin[rhs]; });
>>>> }
>>>>
>>>> Would that be enough?
>>>>
>>>> Le lundi 4 juin 2018 17:14:52 UTC+2, joshua.r.ma...@gmail.com a =C3=A9=
crit :
>>>>>
>>>>> A more limited change to the sorts made available is adding a discret=
e=20
>>>>> suffix array function. For the uninitiated, a suffix array is unlike=
a=20
>>>>> typical sorting algorithm in two respects: the returned object is a=
=20
>>>>> sequence of indexes or reference into the original container, and in =
the=20
>>>>> case of ties, subordering is dictated by a comparison of each index's=
or=20
>>>>> reference's following value in the original container with the termin=
al=20
>>>>> value being the global minimum value.
>>>>>
>>>>> The two significant algorithms for generating suffix arrays right now=
=20
>>>>> are SACA-K and divsufsort.
>>>>>
>>>>> I think a good call interface for this would be as follows:
>>>>>
>>>>> template< typename _Bidirectional_Iterator, typename _Output_Iterator=
>
>>>>> void get_suffix_array(_Bidirectional_Iterator begin,=20
>>>>> _Bidirectional_Iterator end, _Output_Iterator out);
>>>>>
>>>>> This would have applications in compression, text searching, and=20
>>>>> sequence alignment.
>>>>>
>>>>
--=20
You received this message because you are subscribed to the Google Groups "=
ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp=
..org/d/msgid/std-proposals/bd3e0341-6843-43d2-82d6-27fb7a9529d7%40isocpp.or=
g.
------=_Part_25864_1229247947.1528129767281
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr">My bad, I did not understand the algorithm. And my impleme=
ntation is not correct (my comparison predicate compares single characters)=
..<br><br>you should probably try to explain better what it is to avoid ambi=
guities.<br><br><br>Le lundi 4 juin 2018 18:15:03 UTC+2, joshua.r.ma...@gma=
il.com a =C3=A9crit=C2=A0:<blockquote class=3D"gmail_quote" style=3D"margin=
: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;padding-left: 1ex;"><div=
dir=3D"ltr">Not quite.=C2=A0 Because these have to track backwards from th=
e each starting point, there are many successive comparisons, especially ov=
er the common case of small alphabets.=C2=A0 So take your expected length b=
efore mismatch of strings, O(n/2) worst case, O(log(n)/log(|alphabet|))) be=
st case, and multiply each comparison by that.=C2=A0 Comparisons are not O(=
1).<br><br>On Monday, June 4, 2018 at 12:02:48 PM UTC-4, <a>floria...@gmail=
..com</a> wrote:<blockquote class=3D"gmail_quote" style=3D"margin:0;margin-l=
eft:0.8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir=3D"ltr">How=
is it O(n=C2=B2 log n)?<br><span style=3D"font-family:courier new,monospac=
e">end - begin</span> is O(1) (random access iterator)<br><span style=3D"fo=
nt-family:courier new,monospace">out_begin + n</span> is O(1) (random acces=
s iterator)<br><span style=3D"font-family:courier new,monospace">std::iota(=
out_begin, out_end, 0)</span> is O(n)<br><span style=3D"font-family:courier=
new,monospace">std::sort(out_begin, out_end, [&begin](auto lhs, auto r=
hs) { return begin[lhs] < begin[rhs]; })</span> is O(n log n)=C2=A0 ( be=
gin[] is O(1))<br><br>In the end, this is O(n log n).<br>Maybe I have assum=
ed random access iterators where they are not, but otherwise, it is O(n log=
n).<br><br>Le lundi 4 juin 2018 17:51:49 UTC+2, <a>joshua.r.ma...@gmail.co=
m</a> a =C3=A9crit=C2=A0:<blockquote class=3D"gmail_quote" style=3D"margin:=
0;margin-left:0.8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir=
=3D"ltr"><div>Unfortunately, that approach computes and is correct but is n=
ot efficient enough to be seriously considered in my area.=C2=A0 That appro=
ach ends up being something like O(n^2 log n).=C2=A0 Naive tuned algorithms=
are still in the realm on O(n^2) when using a normally linear sort.=C2=A0 =
SACA-K is linear time and constant workspace, and divsufsort is n log n tim=
e with some acceptable workspace.<br></div><div><br></div><br>On Monday, Ju=
ne 4, 2018 at 11:36:49 AM UTC-4, <a>floria...@gmail.com</a> wrote:<blockquo=
te class=3D"gmail_quote" style=3D"margin:0;margin-left:0.8ex;border-left:1p=
x #ccc solid;padding-left:1ex"><div dir=3D"ltr">I have the feeling we don&#=
39;t a new interface for that:<br><div style=3D"background-color:rgb(250,25=
0,250);border-color:rgb(187,187,187);border-style:solid;border-width:1px"><=
code><div><div><div><span style=3D"color:#0000ff"><span style=3D"color:#008=
">template</span></span><span style=3D"color:#000000"><span style=3D"color:=
#660"><</span><span style=3D"color:#000"> </span></span><span style=3D"c=
olor:#0000ff"><span style=3D"color:#008">typename</span></span><span style=
=3D"color:#000000"><span style=3D"color:#000"> </span><span style=3D"color:=
#606">RandomAccess_Iterator</span><span style=3D"color:#660">,</span><span =
style=3D"color:#000"> </span></span><span style=3D"color:#0000ff"><span sty=
le=3D"color:#008">typename</span></span><span style=3D"color:#000000"><span=
style=3D"color:#000"> </span><span style=3D"color:#606">Output_Iterator</s=
pan><span style=3D"color:#000"> </span><span style=3D"color:#660">></spa=
n></span></div><div><span style=3D"color:#0000ff"><span style=3D"color:#008=
">void</span></span><span style=3D"color:#000000"><span style=3D"color:#000=
"> get_suffix_array</span><span style=3D"color:#660">(</span><span style=3D=
"color:#606">RandomAccess_<wbr>Iterator</span><span style=3D"color:#000"> <=
/span><span style=3D"color:#008">begin</span><span style=3D"color:#660">,</=
span><span style=3D"color:#000"> </span><span style=3D"color:#606">RandomAc=
cess_Iterator</span><span style=3D"color:#000"> </span><span style=3D"color=
:#008">end</span><span style=3D"color:#660">,</span><span style=3D"color:#0=
00"> </span><span style=3D"color:#606">Output_Iterator</span><span style=3D=
"color:#000"> out_begin</span><span style=3D"color:#660">)</span><span styl=
e=3D"color:#000"> </span><span style=3D"color:#660">{</span></span></div><d=
iv><span style=3D"color:#000000"><span style=3D"color:#000"> =C2=A0 =C2=A0<=
/span></span><span style=3D"color:#008000"><span style=3D"color:#800">// fe=
w preparations</span></span></div><div><span style=3D"color:#000000"><span =
style=3D"color:#800"> =C2=A0 =C2=A0</span></span><span style=3D"color:#0000=
ff"><span style=3D"color:#800">const</span></span><span style=3D"color:#000=
000"><span style=3D"color:#800"> </span></span><span style=3D"color:#0000ff=
"><span style=3D"color:#800">auto</span></span><span style=3D"color:#000000=
"><span style=3D"color:#800"> n =3D end - begin;</span></span></div><div><s=
pan style=3D"color:#000000"><span style=3D"color:#800"> =C2=A0 =C2=A0auto o=
ut_end =3D out_begin + n;</span></span></div><span style=3D"color:#000"><br=
></span><div><span style=3D"color:#000000"><span style=3D"color:#000">=C2=
=A0 =C2=A0 </span></span><span style=3D"color:#008000"><span style=3D"color=
:#800">// actual work</span></span></div><div><span style=3D"color:#000000"=
><span style=3D"color:#800"> =C2=A0 =C2=A0std::iota(out_begin, out_end, </s=
pan></span><span style=3D"color:#09885a"><span style=3D"color:#800">0</span=
></span><span style=3D"color:#000000"><span style=3D"color:#800">);</span><=
/span></div><div><span style=3D"color:#000000"><span style=3D"color:#800"> =
=C2=A0 =C2=A0std::sort(out_begin, out_end, [&begin](</span></span><span=
style=3D"color:#0000ff"><span style=3D"color:#800">auto</span></span><span=
style=3D"color:#000000"><span style=3D"color:#800"> lhs, </span></span><sp=
an style=3D"color:#0000ff"><span style=3D"color:#800">auto</span></span><sp=
an style=3D"color:#000000"><span style=3D"color:#800"> rhs) { </span></span=
><span style=3D"color:#0000ff"><span style=3D"color:#800">return</span></sp=
an><span style=3D"color:#000000"><span style=3D"color:#800"> begin[lhs] <=
; begin[rhs]; });</span></span></div><div><span style=3D"color:#000000"><sp=
an style=3D"color:#800">}</span></span></div></div></div></code></div><br>W=
ould that be enough?<br><br>Le lundi 4 juin 2018 17:14:52 UTC+2, <a>joshua.=
r.ma...@gmail.com</a> a =C3=A9crit=C2=A0:<blockquote class=3D"gmail_quote" =
style=3D"margin:0;margin-left:0.8ex;border-left:1px #ccc solid;padding-left=
:1ex"><div dir=3D"ltr"><div>A more limited change to the sorts made availab=
le is adding a discrete suffix array function.=C2=A0 For the uninitiated, a=
suffix array is unlike a typical sorting algorithm in two respects: the re=
turned object is a sequence of indexes or reference into the original conta=
iner, and in the case of ties, subordering is dictated by a comparison of e=
ach index's or reference's following value in the original containe=
r with the terminal value being the global minimum value.</div><div><br></d=
iv><div>The two significant algorithms for generating suffix arrays right n=
ow are SACA-K and divsufsort.</div><div><br></div><div>I think a good call =
interface for this would be as follows:</div><div><br></div><div>template&l=
t; typename _Bidirectional_Iterator, typename _Output_Iterator ></div><d=
iv>void get_suffix_array(_<wbr>Bidirectional_Iterator begin, _Bidirectional=
_Iterator end, _Output_Iterator out);</div><div><br></div><div>This would h=
ave applications in compression, text searching, and sequence alignment.<br=
></div></div></blockquote></div></blockquote></div></blockquote></div></blo=
ckquote></div></blockquote></div>
<p></p>
-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals" group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/bd3e0341-6843-43d2-82d6-27fb7a9529d7%=
40isocpp.org?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.=
com/a/isocpp.org/d/msgid/std-proposals/bd3e0341-6843-43d2-82d6-27fb7a9529d7=
%40isocpp.org</a>.<br />
------=_Part_25864_1229247947.1528129767281--
------=_Part_25863_481596994.1528129767281--
.
Author: joshua.r.marshall.1991@gmail.com
Date: Wed, 6 Jun 2018 04:03:17 -0700 (PDT)
Raw View
------=_Part_46960_1204789278.1528282997107
Content-Type: multipart/alternative;
boundary="----=_Part_46961_1380035335.1528282997108"
------=_Part_46961_1380035335.1528282997108
Content-Type: text/plain; charset="UTF-8"
If there is nothing more to say on this, I'll go ahead and add a proposal
for the feature.
On Monday, June 4, 2018 at 11:14:52 AM UTC-4, joshua.r.ma...@gmail.com
wrote:
>
> A more limited change to the sorts made available is adding a discrete
> suffix array function. For the uninitiated, a suffix array is unlike a
> typical sorting algorithm in two respects: the returned object is a
> sequence of indexes or reference into the original container, and in the
> case of ties, subordering is dictated by a comparison of each index's or
> reference's following value in the original container with the terminal
> value being the global minimum value.
>
> The two significant algorithms for generating suffix arrays right now are
> SACA-K and divsufsort.
>
> I think a good call interface for this would be as follows:
>
> template< typename _Bidirectional_Iterator, typename _Output_Iterator >
> void get_suffix_array(_Bidirectional_Iterator begin,
> _Bidirectional_Iterator end, _Output_Iterator out);
>
> This would have applications in compression, text searching, and sequence
> alignment.
>
--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/0c4b217f-7f50-478f-b82d-3931122fc543%40isocpp.org.
------=_Part_46961_1380035335.1528282997108
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div dir=3D"ltr">If there is nothing more to say on this, I'll go ahead=
and add a proposal for the feature.<br><br>On Monday, June 4, 2018 at 11:1=
4:52 AM UTC-4, joshua.r.ma...@gmail.com wrote:<blockquote class=3D"gmail_qu=
ote" style=3D"margin: 0;margin-left: 0.8ex;border-left: 1px #ccc solid;padd=
ing-left: 1ex;"><div dir=3D"ltr"><div>A more limited change to the sorts ma=
de available is adding a discrete suffix array function.=C2=A0 For the unin=
itiated, a suffix array is unlike a typical sorting algorithm in two respec=
ts: the returned object is a sequence of indexes or reference into the orig=
inal container, and in the case of ties, subordering is dictated by a compa=
rison of each index's or reference's following value in the origina=
l container with the terminal value being the global minimum value.</div><d=
iv><br></div><div>The two significant algorithms for generating suffix arra=
ys right now are SACA-K and divsufsort.</div><div><br></div><div>I think a =
good call interface for this would be as follows:</div><div><br></div><div>=
template< typename _Bidirectional_Iterator, typename _Output_Iterator &g=
t;</div><div>void get_suffix_array(_<wbr>Bidirectional_Iterator begin, _Bid=
irectional_Iterator end, _Output_Iterator out);</div><div><br></div><div>Th=
is would have applications in compression, text searching, and sequence ali=
gnment.<br></div></div></blockquote></div>
<p></p>
-- <br />
You received this message because you are subscribed to the Google Groups &=
quot;ISO C++ Standard - Future Proposals" group.<br />
To unsubscribe from this group and stop receiving emails from it, send an e=
mail to <a href=3D"mailto:std-proposals+unsubscribe@isocpp.org">std-proposa=
ls+unsubscribe@isocpp.org</a>.<br />
To post to this group, send email to <a href=3D"mailto:std-proposals@isocpp=
..org">std-proposals@isocpp.org</a>.<br />
To view this discussion on the web visit <a href=3D"https://groups.google.c=
om/a/isocpp.org/d/msgid/std-proposals/0c4b217f-7f50-478f-b82d-3931122fc543%=
40isocpp.org?utm_medium=3Demail&utm_source=3Dfooter">https://groups.google.=
com/a/isocpp.org/d/msgid/std-proposals/0c4b217f-7f50-478f-b82d-3931122fc543=
%40isocpp.org</a>.<br />
------=_Part_46961_1380035335.1528282997108--
------=_Part_46960_1204789278.1528282997107--
.