Thread

Topic: a library to provide portable SIMD types

Author: Matthias Kretz <kretz@compeng.uni-frankfurt.de>
Date: Thu, 14 Feb 2013 09:58:23 +0100

This is a multi-part message in MIME format.

--nextPart2787195.XHjFYzjCva
Content-Type: text/plain; charset=ISO-8859-1

Hello.

I would like to help with making C++ a better language with regard to SIMD
programming.

Since 2009, I have been working on a C++ library that provides SIMD types
(http://code.compeng.uni-frankfurt.de/projects/vc). My work shows that it is
possible to create those types without having to abandon portability. This
library has allowed several codes in high-energy physics to benefit from
SSE/AVX. Often those codes could not benefit from the auto-vectorizer. Where
the inherent limitations of auto-vectorization (and thus also of explicit forms
of loop vectorization) prevent further improvement, explicit vectorization
with SIMD types can often help.
The use of SIMD types also helps developers to better design their algorithms
for SIMD hardware. By knowing the available types and operations, the developer
learns to design data structures and algorithms more suitable for SIMD
execution.
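
To make the idea of a portable SIMD type concrete, here is a minimal sketch. It is not the Vc interface: the names float4_sketch and saxpy_sketch are made up for illustration, and the type assumes either an x86 target with SSE or, failing that, a plain scalar fallback. User code is written once against the type and does not mention intrinsics.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// Hypothetical four-lane float vector, named float4_sketch for illustration only.
class float4_sketch {
public:
    // Broadcast one scalar to all four lanes.
    explicit float4_sketch(float s) {
#ifdef SKETCH_HAVE_SSE
        v_ = _mm_set1_ps(s);
#else
        for (std::size_t i = 0; i < 4; ++i) v_[i] = s;
#endif
    }

    // Load four consecutive floats; no alignment is required.
    static float4_sketch load(const float* p) {
        float4_sketch r(0.0f);
#ifdef SKETCH_HAVE_SSE
        r.v_ = _mm_loadu_ps(p);
#else
        for (std::size_t i = 0; i < 4; ++i) r.v_[i] = p[i];
#endif
        return r;
    }

    // Store the four lanes to consecutive floats.
    void store(float* p) const {
#ifdef SKETCH_HAVE_SSE
        _mm_storeu_ps(p, v_);
#else
        for (std::size_t i = 0; i < 4; ++i) p[i] = v_[i];
#endif
    }

    // Lane-wise addition.
    friend float4_sketch operator+(float4_sketch a, const float4_sketch& b) {
#ifdef SKETCH_HAVE_SSE
        a.v_ = _mm_add_ps(a.v_, b.v_);
#else
        for (std::size_t i = 0; i < 4; ++i) a.v_[i] += b.v_[i];
#endif
        return a;
    }

    // Lane-wise multiplication.
    friend float4_sketch operator*(float4_sketch a, const float4_sketch& b) {
#ifdef SKETCH_HAVE_SSE
        a.v_ = _mm_mul_ps(a.v_, b.v_);
#else
        for (std::size_t i = 0; i < 4; ++i) a.v_[i] *= b.v_[i];
#endif
        return a;
    }

private:
#ifdef SKETCH_HAVE_SSE
    __m128 v_;
#else
    float v_[4];
#endif
};

// y[i] = a * x[i] + y[i], written once against the portable type.
inline void saxpy_sketch(float a, const float* x, float* y, std::size_t n) {
    const float4_sketch va(a);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        (va * float4_sketch::load(x + i) + float4_sketch::load(y + i)).store(y + i);
    }
    for (; i < n; ++i) y[i] = a * x[i] + y[i];  // scalar tail for leftover elements
}
```

A real library would presumably not fix the width at four lanes; it is hard-coded here only to keep the sketch short.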

I believe it is useful for a wide audience to easily have access to explicit
SIMD programming. The trend on the x86 architecture is showing that SIMD is
becoming more and more important. Compiler extensions (intrinsics) and coding
in assembly should not be the only answer to explicit vectorization.

Also, my experience shows that it would be much better to have compiler
developers working more closely on this: I have to debug and work around
miscompilations and half-baked compiler extensions (SIMD intrinsics, explicit
aliasing) very often.

I read N3419 and agree that it is important to have a way to annotate loops as
vectorizable. This helps to move assumptions about the work of the compiler
into explicit code. I don't see my proposal as a replacement for the proposal
in N3419. Instead, I think it depends on the specific problem whether the loop
approach or SIMD types are the best solution.

There's more to be said if you need more convincing. I'd be happy to write a
formal proposal. Let me know what you think.

Regards,
 Matthias
--
Dipl.-Phys. Matthias Kretz

Web:   http://compeng.uni-frankfurt.de/?mkretz

SIMD easy and portable: http://compeng.uni-frankfurt.de/?vc

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/?hl=en.



--nextPart2787195.XHjFYzjCva--


.

Author: snk_kid <korcan.hussein@googlemail.com>
Date: Thu, 14 Feb 2013 02:03:32 -0800 (PST)

------=_Part_592_537429.1360836212369
Content-Type: text/plain; charset=ISO-8859-1

I too think we should definitely standardize vector types in both C and C++.
However, I believe that vector types should be built-in fundamental types,
like in shader languages. I don't think an API approach is enough: the APIs
I've seen are either too low-level or too tied to specific register sizes, and
they have inefficiency issues in particular cases.

For example, take access to a single component. A more efficient method,
which most APIs do not use, is to return a proxy type that behaves like a
C/C++ primitive type but is actually a SIMD register, with the other
components either zeroed out or left alone. Instruction sets like SSE/AVX
have versions of various instructions that operate only on a single
component, so this would avoid the overhead of loads/stores. E.g.

instead of this:

struct float4 { float x() const { ... } };

A better method would be something like this:

// behaves exactly like the built-in float and converts to float only when
// absolutely necessary
struct float1 { __m128 v; };

struct float4 { float1 x() const { ... } };

This would be easier to abstract in pure C++, but not so in C. If vector
types were built-in, the above wouldn't be a concern, because a compiler
should be able to optimize these cases more easily without the need for
proxy types.
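
A rough sketch of how such a proxy could look, assuming an x86 target with SSE. The member functions and the sum_xy helper are illustrative additions, not part of any existing API. Arithmetic on float1 uses the single-lane (_ss) intrinsics, so a component never has to leave the register until a plain float is really required.

```cpp
#include <xmmintrin.h>

// Hypothetical proxy for one float that lives in lane 0 of an SSE register.
struct float1 {
    __m128 v;

    // Scalar arithmetic via the single-lane instructions; the upper lanes of
    // the result are copied from the left operand and are never inspected.
    friend float1 operator+(float1 a, float1 b) { return float1{_mm_add_ss(a.v, b.v)}; }
    friend float1 operator*(float1 a, float1 b) { return float1{_mm_mul_ss(a.v, b.v)}; }

    // Leave the register only when a plain float is really required.
    explicit operator float() const { return _mm_cvtss_f32(v); }
};

struct float4 {
    __m128 v;

    float4(float x, float y, float z, float w) : v(_mm_setr_ps(x, y, z, w)) {}

    // Lane 0 is already in place, so no shuffle is needed.
    float1 x() const { return float1{v}; }
    // Move lane 1 into lane 0 with a shuffle; the value stays in a register.
    float1 y() const { return float1{_mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1))}; }
};

// Usage: the sum is formed in a register and only the final conversion
// produces a float.
inline float sum_xy(const float4& a) { return static_cast<float>(a.x() + a.y()); }
```

Whether this actually saves loads/stores compared to returning float depends on the compiler and would have to be checked in the generated code.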

There are also issues with wrapping intrinsic SIMD data types in
structs/classes. With some compilers, such as Visual C++, extra load/store
operations are generated when intrinsic operations are used inside a function
that takes and returns the wrapper type, e.g.

#include <xmmintrin.h>

struct float4
{
    __m128 v;
    float4(const __m128& x):v(x){}
};

// the __m128 result converts implicitly back to the wrapper type
inline float4 foo(const float4& lhs, const float4& rhs)
{
    return _mm_add_ps(lhs.v, rhs.v);
}

If you look at the generated code for this on VC++ and compare it to the
generated code using the bare intrinsic data type, you will see extra
load/store instructions. GCC and Clang, however, generate the same code in
both cases.
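
One way to set up that comparison, sketched under the assumption of an x86 target with SSE. All names here (float4_wrapped, add_bare, add_wrapped, test_bare, test_wrapped, same_lanes) are made up for the experiment. The two non-inline entry points perform the same additions, once on the bare type and once through the wrapper.

```cpp
#include <xmmintrin.h>

// Wrapper type as in the example above, renamed to keep this sketch standalone.
struct float4_wrapped {
    __m128 v;
    float4_wrapped(const __m128& x) : v(x) {}
};

// Variant 1: the function works on the bare intrinsic type.
inline __m128 add_bare(__m128 lhs, __m128 rhs) { return _mm_add_ps(lhs, rhs); }

// Variant 2: the same operation through the wrapper type.
inline float4_wrapped add_wrapped(const float4_wrapped& lhs, const float4_wrapped& rhs) {
    return _mm_add_ps(lhs.v, rhs.v);
}

// Non-inline entry points so that both variants show up in the assembly listing.
__m128 test_bare(__m128 a, __m128 b, __m128 c) {
    return add_bare(add_bare(a, b), c);
}

__m128 test_wrapped(__m128 a, __m128 b, __m128 c) {
    return add_wrapped(add_wrapped(float4_wrapped(a), float4_wrapped(b)),
                       float4_wrapped(c)).v;
}

// Helper to check that both variants agree: true if all four lanes compare equal.
inline bool same_lanes(__m128 a, __m128 b) {
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}
```

Compiling this with optimization and an assembly listing (for example -O2 -S with GCC or Clang, /O2 /FA with Visual C++) and comparing test_bare with test_wrapped should show whether the wrapper costs extra loads/stores on a given compiler.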

I want to see vector types be built-in types for both C and C++ and have
the same interface as vector/matrix types in shader languages like HLSL.


------=_Part_592_537429.1360836212369--

.

Author: Matthias Kretz <kretz@compeng.uni-frankfurt.de>
Date: Thu, 14 Feb 2013 13:08:39 +0100

This is a multi-part message in MIME format.

--nextPart9817129.HUSsmlJMF2
Content-Type: text/plain; charset=ISO-8859-1

You're covering a lot of interesting points. But apparently you have not taken
the time to look at the Vc library I linked to. I have worked on all of the
points you mention.
My conclusion from my work is that the library approach is the most flexible
and the least invasive one (to the core language and to the compilers). At the
same time it can be as efficient as built-in fundamental types. This obviously
depends on the optimizations provided by the compiler. A library must also
depend on compiler extensions to access the SIMD registers and instructions.
With a library approach the compiler vendor is free to use an extension of his
choice. This also implies that any shortcomings in current compilers are
certainly fixable.

Now to a few concrete answers:

On Thursday 14 February 2013 02:03:32 snk_kid wrote:
> I don't think an API approach is enough,

Fundamental types would also be an API, but I assume you mean to say that a
library is not enough.

> the APIs I've seen are either too low-level or too tied to specific
> register sizes, and they have inefficiency issues in particular cases.

I agree, except for the library that I developed. ;) No, really, as I said:
I'm prepared to do a more detailed writeup of what a SIMD library could look
like and how it could be implemented.
Obviously any abstraction of SIMD (and this is what a target-independent
language like C++ would have to provide) cannot easily cover all special
instructions provided by a given CPU.

> For example, take access to a single component. A more efficient method,
> which most APIs do not use, is to return a proxy type that behaves like a
> C/C++ primitive type but is actually a SIMD register, with the other
> components either zeroed out or left alone. Instruction sets like SSE/AVX
> have versions of various instructions that operate only on a single
> component, so this would avoid the overhead of loads/stores.

My experience from 64-bit Linux: if you alias float[4] and __m128 where this
is needed, you get perfectly optimized code. Compilers know that an XMM
register aliases a float, a double, and all the __m128 types.

If you were to introduce a float1 type that has a larger sizeof than float
itself, then you might actually introduce unnecessary load/store bandwidth. Vc
shows that aliasing a SIMD vector and its entries is possible and, depending
on the compiler, can be optimally translated to machine code.
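
As a sketch of what such aliasing can look like (this is not Vc's actual implementation; float4_alias, add, entry and make_float4 are names invented here), a union can overlay the register with an array of its entries. It assumes an x86 target with SSE, and it relies on reading a union member other than the one last written, which GCC documents as a supported extension but which standard C++ does not guarantee.

```cpp
#include <xmmintrin.h>

// Hypothetical vector type whose register and entries share storage.
// Reading the member that was not written last is a compiler extension,
// not standard C++; this sketch assumes the compiler supports it.
union float4_alias {
    __m128 v;    // whole register, for vertical operations
    float f[4];  // entry-wise view of the same 16 bytes
};

// Build a vector from four entries.
inline float4_alias make_float4(float x, float y, float z, float w) {
    float4_alias r;
    r.v = _mm_setr_ps(x, y, z, w);
    return r;
}

// Vertical operation on the whole register.
inline float4_alias add(float4_alias a, float4_alias b) {
    float4_alias r;
    r.v = _mm_add_ps(a.v, b.v);
    return r;
}

// Entry access goes through the aliased array and returns a plain float,
// so no proxy type with a larger sizeof than float is involved.
inline float entry(const float4_alias& a, int i) { return a.f[i]; }
```

Whether entry() compiles to a register shuffle or to a store followed by a load is up to the optimizer, which is the compiler dependence mentioned above.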

> There are also issues with wrapping intrinsic SIMD data types in
> structs/classes. With some compilers, such as Visual C++, extra load/store
> operations are generated when intrinsic operations are used inside a function
> that takes and returns the wrapper type

Inlining is of course crucial, which is why most functions in Vc are marked as
__forceinline / __attribute__((__always_inline__)). Larger functions don't
lose much, especially since we have efficient store-to-load forwarding on x86.
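
A small sketch of how the two spellings can be hidden behind one macro. SKETCH_ALWAYS_INLINE, scaled and square are hypothetical names chosen for this example, not Vc's; the wrapper is kept scalar so that the sketch builds on any target.

```cpp
// Hypothetical macro name; the two spellings are the compiler extensions
// mentioned above.
#if defined(_MSC_VER)
#define SKETCH_ALWAYS_INLINE __forceinline
#elif defined(__GNUC__)  // also defined by Clang
#define SKETCH_ALWAYS_INLINE inline __attribute__((__always_inline__))
#else
#define SKETCH_ALWAYS_INLINE inline  // fall back to a plain inline hint
#endif

// Small wrapper type standing in for a SIMD vector class.
struct scaled {
    float value;
};

// Marked for forced inlining so that no call, and ideally no store/reload of
// the wrapper object, remains in the caller.
SKETCH_ALWAYS_INLINE scaled operator*(scaled a, scaled b) {
    scaled r = {a.value * b.value};
    return r;
}

SKETCH_ALWAYS_INLINE float square(float x) {
    scaled s = {x};
    return (s * s).value;
}
```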

> If you look at the generated code for this on VC++ and compare it to the
> generated code using the bare intrinsic data type, you will see extra
> load/store instructions. GCC and Clang, however, generate the same code in
> both cases.

I already tried to find out what the rationale behind the Windows ABI is. So
far it still looks arbitrary to me:
- on 32-bit, xmm0-xmm4 can be used for __m128 parameters (by value)
- on 64-bit, __m128 parameters will never be passed via register (because there
is an associated stack location for every function parameter, and that
location has only 8 bytes)
- with MSVC, any type that contains a __m128 cannot be used as a function
parameter

If better SIMD support in C++ goes forward I'm sure Microsoft will be able to
improve on this. At least I don't see any showstopper there.

> I want to see vector types be built-in types for both C and C++

Of course, if the goal is to have SIMD types in C then a library approach will
not work.

--
Dipl.-Phys. Matthias Kretz

Web:   http://compeng.uni-frankfurt.de/?mkretz

SIMD easy and portable: http://compeng.uni-frankfurt.de/?vc




--nextPart9817129.HUSsmlJMF2--


.

Author: Lawrence Crowl <crowl@googlers.com>
Date: Thu, 14 Feb 2013 14:29:49 -0800 Raw View

On 2/14/13, Matthias Kretz <kretz@compeng.uni-frankfurt.de> wrote:
> You're covering a lot of interesting points. But apparently you
> have not taken the time to look at the Vc library I linked to. I
> have worked on all of the points you mention.  My conclusion from
> my work is that the library approach is the most flexible and
> least invasive (to the core language and to the compilers) one. At
> the same time it can be as efficient as built-in fundamental
> types. It obviously depends on the optimizations provided by the
> compiler. Also a library must depend on compiler extensions to
> access the SIMD registers and instructions.
>
> With a library approach the compiler vendor is free to use an
> extension of his choice. This also implies that any shortcoming
> in current compilers are certainly fixable.

I agree that we should prefer a library approach.

Can you explain why std::valarray and its supporting types are
not sufficient?

--
Lawrence Crowl




.

Author: Marc <marc.glisse@gmail.com>
Date: Thu, 14 Feb 2013 16:26:21 -0800 (PST) Raw View

------=_Part_1259_31360675.1360887981950
Content-Type: text/plain; charset=ISO-8859-1

On Thursday, February 14, 2013 9:58:23 AM UTC+1, Matthias Kretz wrote:
> Since 2009, I have been working on a C++ library that provides SIMD types
> (http://code.compeng.uni-frankfurt.de/projects/vc).

I just had a quick look and noticed that you seem to use dynamically sized
vectors. That doesn't match my experience. In most cases where a dynamically
sized vector makes sense, auto-vectorization works. And when I want to
hand-write vector code, I want a fixed size that I choose and no allocation
overhead (more like array than valarray). Sorry if I misunderstood how your
library works; this was a very quick glance at the docs.




------=_Part_1259_31360675.1360887981950--

.

Author: Matthias Kretz <kretz@compeng.uni-frankfurt.de>
Date: Fri, 15 Feb 2013 15:34:14 +0100 Raw View

This is a multi-part message in MIME format.

--nextPart6327529.orCdBEfr7d
Content-Type: text/plain; charset=ISO-8859-1

On Thursday 14 February 2013 14:29:49 Lawrence Crowl wrote:
> Can you explain why std::valarray and its supporting types are
> not sufficient?

Because valarray is variably sized, and worse, sized at runtime. Normally one
must keep the working set small for best cache usage. Therefore it is often
best to use exactly one SIMD register. In valarray terms you'd need to
represent a vectorized point (well, four points in this example) like this:

#include <valarray>

struct PointVec {
  std::valarray<float> x, y, z;
  PointVec() : x(4), y(4), z(4) {}
  // One length per point, so the result is itself a valarray.
  std::valarray<float> length() const {
    return std::sqrt(x * x + y * y + z * z);
  }
};

You want the length function to compile to three mulps, two addps, and one
sqrtps instruction (plus one ret) and nothing more. No test and jump anywhere.
It will not be possible to optimize valarray that way.
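
For contrast, here is a minimal sketch of the same structure with a width that is fixed at compile time. The names FixedPointVec and kWidth are made up for this illustration and are not Vc API; std::array merely stands in for a SIMD register. With a constant trip count and no heap allocation, a compiler is free to unroll the loop completely, although whether it actually emits packed instructions depends on the compiler and its flags.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical illustration, not Vc API: four points stored as
// structure-of-arrays with a width fixed at compile time.
constexpr std::size_t kWidth = 4;  // assumed: one SSE register of floats

struct FixedPointVec {
  std::array<float, kWidth> x{}, y{}, z{};

  // Returns one length per point. No heap allocation, no runtime size.
  std::array<float, kWidth> length() const {
    std::array<float, kWidth> r{};
    for (std::size_t i = 0; i < kWidth; ++i) {
      r[i] = std::sqrt(x[i] * x[i] + y[i] * y[i] + z[i] * z[i]);
    }
    return r;
  }
};
```

The object is exactly three arrays of four floats, so it can live on the stack or in registers, which is the property the valarray version cannot offer.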

In my experience it is best to provide types that are fixed-size. The size
equals the capabilities of the hardware. Thus a float SIMD vector has exactly
4 entries with SSE, 8 entries with AVX, 16 entries on the Xeon Phi, and only
one entry if compiled without SIMD support.

--
Dipl.-Phys. Matthias Kretz

Web:   http://compeng.uni-frankfurt.de/?mkretz

SIMD easy and portable: http://compeng.uni-frankfurt.de/?vc




--nextPart6327529.orCdBEfr7d--


.

Author: Matthias Kretz <kretz@compeng.uni-frankfurt.de>
Date: Fri, 15 Feb 2013 15:51:09 +0100 Raw View

This is a multi-part message in MIME format.

--nextPart23903025.WO8zadYTuQ
Content-Type: text/plain; charset=ISO-8859-1

On Thursday 14 February 2013 16:26:21 Marc wrote:
> I just had a quick look and notice that you seem to use dynamically sized
> vectors.

No, I propose to provide fixed-size types. The size is determined by the
target. Maybe I should provide a little code example to give you a better idea
what I'm talking about.

Let's start very simple (you can also find an elaborate version in the Vc
documentation,
http://code.compeng.uni-frankfurt.de/docs/Vc-master/ex-polarcoord.html): we
have a function that is supposed to convert a 2D Cartesian coordinate to a
polar coordinate:

void convert(float &x_r, float &y_phi)
{
  float r = std::sqrt(x_r * x_r + y_phi * y_phi);
  y_phi = std::atan2(y_phi, x_r) * float(180/M_PI);
  x_r = r;
}

Now instead of just converting a single coordinate at a time we can convert as
many as fit into one SIMD vector:

using Vc::float_v;
void convert(float_v &x_r, float_v &y_phi)
{
  float_v r = std::sqrt(x_r * x_r + y_phi * y_phi);
  y_phi = std::atan2(y_phi, x_r) * float(180/M_PI);
  x_r = r;
}

That's it. Now, if you compile for SSE the convert function converts 4
coordinates per call, 8 with AVX, 16 with Xeon Phi ... no #ifdef involved.

(note that auto-vectorization can only vectorize such a function if it is
inlined)
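
To make the "same body, different width" idea concrete without depending on Vc, the following sketch uses a made-up Pack<N> type (an assumption for illustration, not how Vc is implemented) whose operators work element by element. A single templated convert then serves both plain float and Pack<N>; the namespace demo and all names in it are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

namespace demo {

// Hypothetical stand-in for a SIMD float vector with N lanes.
template <std::size_t N>
struct Pack {
  std::array<float, N> v{};
};

template <std::size_t N>
Pack<N> operator*(const Pack<N>& a, const Pack<N>& b) {
  Pack<N> r;
  for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] * b.v[i];
  return r;
}

template <std::size_t N>
Pack<N> operator+(const Pack<N>& a, const Pack<N>& b) {
  Pack<N> r;
  for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] + b.v[i];
  return r;
}

// Scale every lane by the same scalar.
template <std::size_t N>
Pack<N> operator*(const Pack<N>& a, float s) {
  Pack<N> r;
  for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] * s;
  return r;
}

template <std::size_t N>
Pack<N> sqrt(const Pack<N>& a) {
  Pack<N> r;
  for (std::size_t i = 0; i < N; ++i) r.v[i] = std::sqrt(a.v[i]);
  return r;
}

template <std::size_t N>
Pack<N> atan2(const Pack<N>& y, const Pack<N>& x) {
  Pack<N> r;
  for (std::size_t i = 0; i < N; ++i) r.v[i] = std::atan2(y.v[i], x.v[i]);
  return r;
}

// One body for every width: the unqualified calls resolve to std:: for
// float and, via argument-dependent lookup, to the overloads above for Pack.
template <class T>
void convert(T& x_r, T& y_phi) {
  using std::atan2;
  using std::sqrt;
  const float rad_to_deg = 180.0f / 3.14159265358979323846f;
  T r = sqrt(x_r * x_r + y_phi * y_phi);
  y_phi = atan2(y_phi, x_r) * rad_to_deg;
  x_r = r;
}

}  // namespace demo
```

Calling demo::convert with two float references converts one coordinate; calling it with two demo::Pack<4> references converts four, with no change to the function body.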

The types for the different SIMD widths are not the same internally. So if you
compile this convert function with SSE and compile the caller of that function
with AVX, it will not link. At compile time the type resolves to either
Vc::Scalar::Vector<float>, Vc::SSE::Vector<float>, or Vc::AVX::Vector<float>.
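
A rough sketch of why the link step fails, under the assumption that the alias is selected from the compiler's predefined target macros (__AVX__, __SSE__ in GCC and Clang; _M_X64 for 64-bit MSVC). The namespace sketch and the array-based layout are invented for this illustration; the real Vc classes wrap intrinsic types.

```cpp
#include <type_traits>

namespace sketch {

// Hypothetical layout, loosely modelled on the namespaces named above.
namespace Scalar {
template <class T> struct Vector { T data[1]; };
}
namespace SSE {
template <class T> struct Vector { alignas(16) T data[16 / sizeof(T)]; };
}
namespace AVX {
template <class T> struct Vector { alignas(32) T data[32 / sizeof(T)]; };
}

// The alias is chosen once per translation unit from the target macros.
#if defined(__AVX__)
using float_v = AVX::Vector<float>;
#elif defined(__SSE__) || defined(_M_X64)
using float_v = SSE::Vector<float>;
#else
using float_v = Scalar::Vector<float>;
#endif

// The parameter type, and therefore the mangled symbol name, differs
// between an SSE build and an AVX build of this declaration.
void convert(float_v& x_r, float_v& y_phi);

static_assert(!std::is_same<SSE::Vector<float>, AVX::Vector<float>>::value,
              "each instruction set gets its own distinct type");

}  // namespace sketch
```

Because the namespace is part of the type, an object file built for one target asks the linker for a symbol that an object file built for another target never defines, which turns a silent width mismatch into a link error.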

In essence, the Vc library allows one to write code that is as efficient as
with intrinsics. I have done considerable review of the machine code compilers
generate. From that experience I can say that with current GCC and clang this
works almost perfectly.

--
Dipl.-Phys. Matthias Kretz

Web:   http://compeng.uni-frankfurt.de/?mkretz

SIMD easy and portable: http://compeng.uni-frankfurt.de/?vc




--nextPart23903025.WO8zadYTuQ
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset=ISO-8859-1

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-=
html40/strict.dtd">
<html><head><meta name=3D"qrichtext" content=3D"1" /><style type=3D"text/cs=
s">
p, li { white-space: pre-wrap; }
</style></head><body style=3D" font-family:'Monospace'; font-size:9pt; font=
-weight:400; font-style:normal;">
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">On Thursday=
 14 February 2013 16:26:21 Marc wrote:</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">&gt; I just=
 had a quick look and notice that you seem to use dynamically sized</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">&gt; vector=
s.</p>
<p style=3D"-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; ma=
rgin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nb=
sp;</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">No, I propo=
se to provide fixed-size types. The size is determined by the target. Maybe=
 I should provide a little code example to give you a better idea what I'm =
talking about.</p>
<p style=3D"-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; ma=
rgin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nb=
sp;</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Let's start=
 very simple (you can also find an elaborate version in the Vc documentatio=
n http://code.compeng.uni-frankfurt.de/docs/Vc-master/ex-polarcoord.html): =
we have a function that is supposed to convert a 2d cartesian coordinate to=
 a polar coordinate:</p>
<p style=3D"-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; ma=
rgin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nb=
sp;</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">void conver=
t(float &amp;x_r, float &amp;y_phi)</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">{</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">  float r =
=3D std::sqrt(x_r * x_r + y_phi * y_phi);</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">  y_phi =3D=
 std::atan2(y_phi, x_r) * float(180/M_PI);</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">  x_r =3D r=
;</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">}</p>
<p style=3D"-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; ma=
rgin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nb=
sp;</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Now instead=
 of just converting a single coordinate at a time we can convert as many as=
 fit into one SIMD vector:</p>
<p style=3D"-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; ma=
rgin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nb=
sp;</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">using Vc::f=
loat_v;</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">void conver=
t(float_v &amp;x_r, float_v &amp;y_phi)</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">{</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">  float_v r=
 =3D std::sqrt(x_r * x_r + y_phi * y_phi);</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">  y_phi =3D=
 std::atan2(y_phi, x_r) * float(180/M_PI);</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">  x_r =3D r=
;</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">}</p>
<p style=3D"-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; ma=
rgin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nb=
sp;</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">That's it. =
Now, if you compile for SSE the convert function converts 4 coordinates per=
 call, 8 with AVX, 16 with Xeon Phi ... no #ifdef involved.</p>
<p style=3D"-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; ma=
rgin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nb=
sp;</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">(note that =
auto-vectorization can only vectorize such a function if it is inlined)</p>
<p style=3D"-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; ma=
rgin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nb=
sp;</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">The types f=
or the different SIMD widths are not the same internally. So if you compile=
 this convert function with SSE and compile the caller to that function wit=
h AVX it will not link. At compile time the type resolves to either Vc::Sca=
lar::Vector&lt;float&gt;, Vc::SSE::Vector&lt;float&gt;, or Vc::AVX::Vector&=
lt;float&gt;. </p>
<p style=3D"-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; ma=
rgin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nb=
sp;</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">In essence,=
 the Vc library allows one to write code that is as efficient as with intri=
nsics. I have done considerable review of the machine code compilers genera=
te. From that experience I can say that with current GCC and clang this wor=
ks almost perfectly.</p>
<p style=3D"-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; ma=
rgin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nb=
sp;</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">-- </p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Dipl.-Phys.=
 Matthias Kretz</p>
<p style=3D"-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; ma=
rgin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nb=
sp;</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Web:   http=
://compeng.uni-frankfurt.de/?mkretz</p>
<p style=3D"-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; ma=
rgin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; ">&nb=
sp;</p>
<p style=3D" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-rig=
ht:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">SIMD easy a=
nd portable: http://compeng.uni-frankfurt.de/?vc</p></body></html>


--nextPart23903025.WO8zadYTuQ--


.

Author: Marc <marc.glisse@gmail.com>
Date: Fri, 15 Feb 2013 15:18:45 -0800 (PST) Raw View

------=_Part_26_30624274.1360970325495
Content-Type: text/plain; charset=ISO-8859-1

On Friday, February 15, 2013 3:51:09 PM UTC+1, Matthias Kretz wrote:
> On Thursday 14 February 2013 16:26:21 Marc wrote:
> > I just had a quick look and notice that you seem to use dynamically sized
> > vectors.
>
> No, I propose to provide fixed-size types. The size is determined by the
> target.

Ah, ok.

> Let's start very simple (you can also find an elaborate version in the Vc
> documentation
> http://code.compeng.uni-frankfurt.de/docs/Vc-master/ex-polarcoord.html):
> we have a function that is supposed to convert a 2d cartesian coordinate to
> a polar coordinate:
>
> void convert(float &x_r, float &y_phi)
> {
>   float r = std::sqrt(x_r * x_r + y_phi * y_phi);
>   y_phi = std::atan2(y_phi, x_r) * float(180/M_PI);
>   x_r = r;
> }
>
> Now instead of just converting a single coordinate at a time we can
> convert as many as fit into one SIMD vector:
>
> using Vc::float_v;
> void convert(float_v &x_r, float_v &y_phi)
> {
>   float_v r = std::sqrt(x_r * x_r + y_phi * y_phi);
>   y_phi = std::atan2(y_phi, x_r) * float(180/M_PI);
>   x_r = r;
> }
>
> That's it. Now, if you compile for SSE the convert function converts 4
> coordinates per call, 8 with AVX, 16 with Xeon Phi ... no #ifdef involved.

Reminds me of Cilk, which automatically generates the second function if
you annotate the scalar one. Makes it hard from an ABI point of view.

> (note that auto-vectorization can only vectorize such a function if it is
> inlined)
>

Not quite, but close enough.

So with your library, on a recent Intel processor, I cannot use vectors of
size 2?
Again, it sounds like you are targeting iteration on very large arrays (the
same thing auto-vectorization targets) and not small fixed-size arrays.
Which can be fine, I am just trying to understand.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-proposals@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/?hl=en.

------=_Part_26_30624274.1360970325495--

.

Author: Lawrence Crowl <crowl@googlers.com>
Date: Fri, 15 Feb 2013 17:51:07 -0800 Raw View

On 2/15/13, Matthias Kretz <kretz@compeng.uni-frankfurt.de> wrote:
> On Thursday 14 February 2013 16:26:21 Marc wrote:
> > I just had a quick look and notice that you seem to use
> > dynamically sized vectors.
>
> No, I propose to provide fixed-size types. The size is determined
> by the target. Maybe I should provide a little code example to
> give you a better idea what I'm talking about.

While I have no issue with fixed-size types, making that size
dependent on the platform seems like a problem.  Wouldn't we
generally rather want the programmer to write an algorithm solving
the problem than adapting to the machine?

> Let's start very simple (you can also find
> an elaborate version in the Vc documentation
> http://code.compeng.uni-frankfurt.de/docs/Vc-master/ex-polarcoord.html):
> we have a function that is supposed to convert a 2d cartesian
> coordinate to a polar coordinate:
>
> void convert(float &x_r, float &y_phi)
> {
>   float r = std::sqrt(x_r * x_r + y_phi * y_phi);
>   y_phi = std::atan2(y_phi, x_r) * float(180/M_PI);
>   x_r = r;
> }
>
> Now instead of just converting a single coordinate at a time we
> can convert as many as fit into one SIMD vector:
>
> using Vc::float_v;
> void convert(float_v &x_r, float_v &y_phi)
> {
>   float_v r = std::sqrt(x_r * x_r + y_phi * y_phi);
>   y_phi = std::atan2(y_phi, x_r) * float(180/M_PI);
>   x_r = r;
> }
>
> That's it. Now, if you compile for SSE the convert function
> converts 4 coordinates per call, 8 with AVX, 16 with Xeon Phi
> ... no #ifdef involved.  (note that auto-vectorization can only
> vectorize such a function if it is inlined)

That can't be 'it'.  Where is the step that converts my data into
a float_v?

> The types for the different SIMD widths are not the same
> internally. So if you compile this convert function with SSE and
> compile the caller to that function with AVX it will not link. At
> compile time the type resolves to either Vc::Scalar::Vector<float>,
> Vc::SSE::Vector<float>, or Vc::AVX::Vector<float>.

IMHO, compile-time binding to the architecture is fine.

> In essence, the Vc library allows one to write code that is as
> efficient as with intrinsics. I have done considerable review of
> the machine code compilers generate. From that experience I can
> say that with current GCC and clang this works almost perfectly.

That's good.  I'm still concerned about usability and portability.

--
Lawrence Crowl




.

Author: Matthias Kretz <kretz@compeng.uni-frankfurt.de>
Date: Tue, 19 Feb 2013 20:20:43 +0100 Raw View

This is a multi-part message in MIME format.

--nextPart1546083.uk5qDH2B3c
Content-Type: text/plain; charset=ISO-8859-1

On Friday 15 February 2013 15:18:45 Marc wrote:
> Reminds me of Cilk, which automatically generates the second function if
> you annotate the scalar one. Makes it hard from an ABI point of view.

Yes, Cilk's function annotations are a good alternative for many cases. What I
like about the explicit use of SIMD types, though, is the increased awareness
for the developer about the code. When you can only write code with scalar
types you're not writing down what you actually want to happen, which makes it
somewhat harder to learn to design algorithms and data structures for SIMD.

ABI is certainly a very important topic (important for my SIMD types in C++,
but should ultimately be discussed in a separate issue, in my opinion). But
this is true whenever you start to use instructions from optional/new
instruction sets. Your target suddenly isn't x86 anymore - it's now way more
specific and includes microarchitectural features. In a way what happens is
comparable to compiling for different architectures, e.g. ia32 vs. amd64.
You'd never expect the two to be linked into one binary. You'd also never
expect all types to have the same sizeof. Now, once you take SIMD into the
target, x86 appears as a quickly changing target. I'd like to work on this
issue more thoroughly. Right now I've only sketched out some patterns for
solving the problem portably and without compiler extensions.
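
A minimal sketch of why mixing targets fails at link time, assuming two
hypothetical namespaces `simd_scalar` and `simd_sse` that merely stand in for
target-specific implementations (the names, the `SKETCH_TARGET_SSE` macro and
the struct layouts are illustrative assumptions, not Vc's actual code). Since
the alias `float_v` names a different type per target, any function taking a
`float_v` gets a different mangled name and a different object layout:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for target-specific vector types.
namespace simd_scalar {
struct float_vector {
    static constexpr std::size_t Size = 1;
    float data[Size];
};
}  // namespace simd_scalar

namespace simd_sse {
struct float_vector {
    static constexpr std::size_t Size = 4;
    alignas(16) float data[Size];
};
}  // namespace simd_sse

// Hypothetical switch; a real library would more likely look at the
// compiler's own target macros.
#ifdef SKETCH_TARGET_SSE
using float_v = simd_sse::float_vector;
#else
using float_v = simd_scalar::float_vector;
#endif

// The parameter type is part of the function's mangled name, so a caller
// built with one alias cannot resolve a definition built with the other.
inline std::size_t lanes(const float_v &) { return float_v::Size; }

static_assert(!std::is_same<simd_scalar::float_vector,
                            simd_sse::float_vector>::value,
              "each target has its own distinct type");
static_assert(sizeof(simd_scalar::float_vector) !=
                  sizeof(simd_sse::float_vector),
              "sizeof differs between targets");
```

The link failure is arguably the desirable outcome here: silently linking two
translation units that disagree on `sizeof(float_v)` would be far worse.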

But I don't know if C++ should deviate from its target-agnostic view of the
world...

> > (note that auto-vectorization can only vectorize such a function if it is
> > inlined)
>
> Not quite, but close enough.

Care to elaborate on "not quite"? If I'm missing something on the auto-
vectorization front I'd really like to understand. (Don't want to spread
misinformation...)

> So with your library, on a recent Intel processor, I cannot use vectors of
> size 2?

Yes and no. You can use Vc::SSE::double_v, which has two entries. I'd really
like to have a more generic class in one of the next releases. This class
should provide a bit more flexibility in the vector sizes. But for most
applications my experience is that one should stick to what the register width
provides. Types that are larger than the register width can easily increase
register pressure and decrease cache efficiency. And using only a part of the
vector is normally no big deal (the machine could calculate more - but if
there is no more data-parallelism...).

> Again, it sounds like you are targeting iteration on very large arrays (the
> same thing auto-vectorization targets) and not small fixed-size arrays.
> Which can be fine, I am just trying to understand.

I'm targeting anything that has data-parallelism that can be reasonably used
for SIMD. I'm especially interested in vectorization of algorithms that
require non-countable loops. E.g. many track-reconstruction problems in
High-Energy-Physics applications are of this kind: One has to fit many tracks
with equal code for every iteration step, but the measurement inputs are
scattered more or less randomly in memory and the length of the tracks is
unknown a priori.
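
A rough sketch of such a non-countable loop, assuming plain arrays that
emulate the lanes of one SIMD vector (the lane count `W`, the type aliases and
the function names are hypothetical and this is not Vc's mask API). Every lane
runs the same step, a mask records which lanes still have work, and the loop
ends once no lane is active:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t W = 4;  // assumed number of lanes
using float_lanes = std::array<float, W>;
using int_lanes = std::array<int, W>;
using mask_lanes = std::array<bool, W>;

// True if at least one lane still has work to do.
inline bool any_active(const mask_lanes &m) {
    for (bool b : m) {
        if (b) return true;
    }
    return false;
}

// Per lane: halve the value until it drops below 1 and count the steps.
// The trip count differs per lane and is not known before the loop runs.
inline int_lanes halvings_until_below_one(float_lanes x) {
    int_lanes steps{};  // all lanes start at zero
    mask_lanes active{};
    for (std::size_t i = 0; i < W; ++i) active[i] = x[i] >= 1.0f;

    while (any_active(active)) {
        for (std::size_t i = 0; i < W; ++i) {
            if (active[i]) {  // masked update: finished lanes stay untouched
                x[i] *= 0.5f;
                ++steps[i];
                active[i] = x[i] >= 1.0f;
            }
        }
    }
    return steps;
}
```

With real SIMD types the inner loop over `i` would be a single masked vector
operation; the sketch only shows the control flow.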

I'm also interested in enabling explicit expression of what the developer
knows about the data-parallelism in the problem. I find loop and function
annotations somewhat lacking in that area. I strongly believe that such
explicit expression increases the maintainability of vectorized codes (as long
as the SIMD abstraction works well enough for current and future targets).

--
Dipl.-Phys. Matthias Kretz

Web:   http://compeng.uni-frankfurt.de/?mkretz

SIMD easy and portable: http://compeng.uni-frankfurt.de/?vc




--nextPart1546083.uk5qDH2B3c--


.

Author: Matthias Kretz <kretz@compeng.uni-frankfurt.de>
Date: Tue, 19 Feb 2013 20:36:12 +0100 Raw View

This is a multi-part message in MIME format.

--nextPart4384819.PNjGTnL0Vi
Content-Type: text/plain; charset=ISO-8859-1

On Friday 15 February 2013 17:51:07 Lawrence Crowl wrote:
> While I have no issue with fixed-size types, making that size
> dependent on the platform seems like a problem.  Wouldn't we
> generally rather want the programmer to write an algorithm solving
> the problem than adapting to the machine?

Full ack to your question. I guess I understand why it would seem like a
problem. My experience is that this is not a problem, as long as the developer
does *not make an assumption about the size of the vector*, i.e. does not
adapt the algorithm for the machine - only for SIMD.
The library provides compile-time constants for the sizes of the vector types;
you can always rely on those and design your algorithm accordingly, i.e. so
that it works well on a CPU without SIMD extensions, with SSE, AVX, AltiVec,
NEON, the Xeon Phi, GPUs (this is not part of my work, but from what I
researched there is no fundamental show-stopper; only the tools need to be
adapted), or whatever else the future will give us.
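
A minimal sketch of relying on such a compile-time constant, assuming a
hypothetical `float_vec<N>` whose `Size` member plays the role of the
target-provided width (the type and the `scale` function are illustrative,
not part of Vc). The algorithm is written only against `V::Size`, so the same
source gives the same result for any width:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-width vector; N stands in for whatever the target has.
template <std::size_t N>
struct float_vec {
    static constexpr std::size_t Size = N;
    float lane[N];
};

// Scale every element by `factor` without assuming a concrete width.
template <typename V>
void scale(std::vector<float> &data, float factor) {
    const std::size_t n = data.size();
    std::size_t i = 0;
    // Full vectors: the step is the compile-time constant V::Size.
    for (; i + V::Size <= n; i += V::Size) {
        V v;
        for (std::size_t k = 0; k < V::Size; ++k) v.lane[k] = data[i + k];
        for (std::size_t k = 0; k < V::Size; ++k) v.lane[k] *= factor;
        for (std::size_t k = 0; k < V::Size; ++k) data[i + k] = v.lane[k];
    }
    // Scalar remainder for sizes that are not a multiple of V::Size.
    for (; i < n; ++i) data[i] *= factor;
}
```

Calling `scale<float_vec<1>>`, `scale<float_vec<4>>` and `scale<float_vec<8>>`
on the same input should produce identical output, which is the property the
paragraph above asks the developer to preserve.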

> > That's it. Now, if you compile for SSE the convert function
> > converts 4 coordinates per call, 8 with AVX, 16 with Xeon Phi
> > ... no #ifdef involved.  (note that auto-vectorization can only
> > vectorize such a function if it is inlined)
>
> That can't be 'it'.  Where is the step that converts my data into
> a float_v?

Please take a look at the linked documentation. I discuss it in detail there.
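
As a rough illustration of that step (not taken from the Vc documentation),
here is a sketch with a hypothetical 4-wide `float_v` that has `load` and
`store` members; the type, its member names and `convert_all` are assumptions
made for this example only. The data is kept as two separate arrays of x and y
values, and each iteration loads one vector's worth of coordinates, converts
them, and stores the result back. The radius is computed as
sqrt(x*x + y*y):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 4-wide vector type emulated with a plain array.
struct float_v {
    static constexpr std::size_t Size = 4;
    float lane[Size];

    // Read Size consecutive floats starting at mem.
    static float_v load(const float *mem) {
        float_v v;
        for (std::size_t k = 0; k < Size; ++k) v.lane[k] = mem[k];
        return v;
    }
    // Write the lanes back to Size consecutive floats starting at mem.
    void store(float *mem) const {
        for (std::size_t k = 0; k < Size; ++k) mem[k] = lane[k];
    }
};

// Lane-wise Cartesian -> polar conversion, angle in degrees.
inline void convert(float_v &x_r, float_v &y_phi) {
    const float rad2deg = 180.0f / 3.14159265358979323846f;
    for (std::size_t k = 0; k < float_v::Size; ++k) {
        const float x = x_r.lane[k];
        const float y = y_phi.lane[k];
        x_r.lane[k] = std::sqrt(x * x + y * y);
        y_phi.lane[k] = std::atan2(y, x) * rad2deg;
    }
}

// Convert all coordinates in place. For brevity the sketch requires the
// array length to be a multiple of float_v::Size.
inline void convert_all(std::vector<float> &xs, std::vector<float> &ys) {
    assert(xs.size() == ys.size() && xs.size() % float_v::Size == 0);
    for (std::size_t i = 0; i < xs.size(); i += float_v::Size) {
        float_v x = float_v::load(&xs[i]);
        float_v y = float_v::load(&ys[i]);
        convert(x, y);
        x.store(&xs[i]);
        y.store(&ys[i]);
    }
}
```

Keeping x and y in separate arrays is a deliberate choice in this sketch: it
lets each load fetch contiguous values of one component, whereas an array of
(x, y) pairs would need a shuffle or gather first.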

> I'm still concerned about usability and portability.

Good, that's definitely what a SIMD library must provide to be worthwhile. My
three main concerns when developing Vc have been usability, portability, and
efficiency. I think I've succeeded in many areas and learned a lot in the
process.

--
Dipl.-Phys. Matthias Kretz

Web:   http://compeng.uni-frankfurt.de/?mkretz

SIMD easy and portable: http://compeng.uni-frankfurt.de/?vc


--nextPart4384819.PNjGTnL0Vi--


.

Author: Lawrence Crowl <crowl@googlers.com>
Date: Tue, 19 Feb 2013 15:01:07 -0800 Raw View

On 2/19/13, Matthias Kretz <kretz@compeng.uni-frankfurt.de> wrote:
> On Friday 15 February 2013 17:51:07 Lawrence Crowl wrote:
> > While I have no issue with fixed-size types, making that size
> > dependent on the platform seems like a problem.  Wouldn't we
> > generally rather want the programmer to write an algorithm
> > solving the problem than adapting to the machine?
>
> Full ack to your question. I guess I understand why it would seem
> like a problem. My experience is that this is not a problem, as
> long as the developer does *not make an assumption about the size
> of the vector*, i.e. does not adapt the algorithm for the machine -
> only for SIMD.  The library provides compile-time constants for the
> sizes of the vector types - you can always rely on those. And then
> you can design your algorithm accordingly. I.e. design it thus,
> that it works well on a CPU without SIMD extensions, with SSE, AVX,
> AltiVec, NEON, the Xeon Phi, GPUs (this is not part of my work,
> but from what I researched there is no fundamental show-stopper -
> only the tools need to be adapted), or whatever else the future
> will give us.

I understand why you say this.  However, I think the code can be
substantially simpler.

> > > That's it. Now, if you compile for SSE the convert function
> > > converts 4 coordinates per call, 8 with AVX, 16 with Xeon
> > > Phi ... no #ifdef involved.  (note that auto-vectorization
> > > can only vectorize such a function if it is inlined)
> >
> > That can't be 'it'.  Where is the step that converts my data
> > into a float_v?
>
> Please take a look at the linked documentation. I discuss it in
> detail there.

Stripping out redundancies, your example has:

  Vc::Memory<float_v, 1000> x_mem;
  for (size_t i = 0; i < x_mem.vectorsCount(); ++i)
    x_mem.vector(i) = float_v::Random() * 2.f - 1.f;

Essentially, it requires users to change their sequential loop
into another sequential loop with some unspecified chunk size.
I don't see a significant benefit to the programmer given a good
vectorizing compiler.

Why not something like the following?

  Vc::simd_vector<float, 1000> x_mem;
  x_mem = x_mem.random() * 2.f - 1.f;

Making it work would probably require a template wrapping your
float_v type.  However, the conciseness has value in and of itself.
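
A rough sketch of such a wrapping template follows. Everything in it is an assumption made for illustration: simd_vector is not part of Vc, the Chunk parameter merely stands in for the native vector width, and a deterministic filled() replaces random() so the result can be checked.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical whole-array wrapper (simd_vector does not exist in Vc).
// Chunk stands in for the native register width of the target.
template <typename T, std::size_t N, std::size_t Chunk = 4>
class simd_vector {
    static_assert(N % Chunk == 0, "sketch assumes N is a multiple of Chunk");
    std::array<T, N> data_{};

    // Applies op to every element, one chunk at a time. The inner loop has
    // the fixed trip count Chunk, which is where a real implementation
    // would use one native vector operation instead.
    template <typename Op>
    simd_vector map(Op op) const {
        simd_vector r;
        for (std::size_t i = 0; i < N; i += Chunk)
            for (std::size_t j = 0; j < Chunk; ++j)
                r.data_[i + j] = op(data_[i + j]);
        return r;
    }

public:
    // Returns a vector with every element set to v.
    static simd_vector filled(T v) {
        simd_vector r;
        r.data_.fill(v);
        return r;
    }
    T operator[](std::size_t i) const {
        assert(i < N);
        return data_[i];
    }
    // Whole-array arithmetic with a scalar; the chunked loop stays hidden.
    friend simd_vector operator*(const simd_vector& a, T s) {
        return a.map([s](T x) { return x * s; });
    }
    friend simd_vector operator-(const simd_vector& a, T s) {
        return a.map([s](T x) { return x - s; });
    }
};
```

With this, `auto y = simd_vector<float, 1000>::filled(0.75f) * 2.f - 1.f;` reads like the two-line version above. Note that this eager form creates a temporary per operator; avoiding that would need something like expression templates, which the sketch leaves out.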

Have I made my concerns clear?

> > I'm still concerned about usability and portability.
>
> Good, that's definitely what a SIMD library must provide to be
> of worth. My three main concerns when developing Vc have been
> usability, portability, and efficiency. I think I've succeeded
> in many areas and learned a lot in the process.
>
> Web:   http://compeng.uni-frankfurt.de/?mkretz
>
> SIMD easy and portable: http://compeng.uni-frankfurt.de/?vc

--
Lawrence Crowl


.

Author: Marc <marc.glisse@gmail.com>
Date: Thu, 21 Feb 2013 14:44:38 -0800 (PST) Raw View

------=_Part_229_5197603.1361486678273
Content-Type: text/plain; charset=ISO-8859-1

On Tuesday, February 19, 2013 8:20:43 PM UTC+1, Matthias Kretz wrote:
>
> ABI is certainly a very important topic (important for my SIMD types in
> C++, but should ultimately be discussed in a separate issue, in my opinion).
>

Sure, it can be discussed separately, but clarifying the ABI implications
is imho a prerequisite to the addition of such a feature to the standard.

> But this is true whenever you start to use instructions from optional/new
> instruction sets. Your target suddenly isn't x86 anymore - it's now way
> more specific and includes microarchitectural features. In a way what
> happens is comparable to compiling for different architectures, e.g. ia32
> vs. amd64. You'd never expect the two to be linked into one binary. You'd
> also never expect all types to have the same sizeof. Now, once you take
> SIMD into the target, x86 appears as a quickly changing target. I'd like to
> work on this issue more thoroughly. Right now I've only sketched out some
> patterns how to solve the problem portably and without compiler extensions.
>

A related issue is intmax_t, which prevents for instance gcc from
documenting __int128 as an extended integer as that would break the ABI
(and a future __int256 would break it again, and a parameterized __int_N
would make it meaningless).

> > > (note that auto-vectorization can only vectorize such a function if it
> > > is inlined)
> >
> > Not quite, but close enough.
>
> Care to elaborate on "not quite"? If I'm missing something on the
> auto-vectorization front I'd really like to understand. (Don't want to
> spread misinformation...)
>
Well, you need the compilations of the 2 functions to communicate, but the
compiler, while it compiles one, could notice that this function would like
a vectorized version of the other, and ask for it. It might then produce a
vector version of the other function, without necessarily inlining it.
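
One mechanism in this spirit, not mentioned in this thread and shown only as an illustration, is OpenMP's `declare simd` directive (OpenMP 4.0 and later). The function names below are made up for the example. A compiler without OpenMP support enabled ignores the pragmas and compiles plain scalar code.

```cpp
#include <cassert>
#include <cstddef>

// Asks an OpenMP-aware compiler to emit a vector variant of this function
// in addition to the scalar one. In a real project the declaration would
// sit in a header and the definition in another translation unit, so the
// caller below could use the vector variant without inlining the body.
#pragma omp declare simd
float scale_and_shift(float x) { return x * 2.f - 1.f; }

// The loop is marked as vectorizable; a supporting compiler may then call
// the vector variant of scale_and_shift for several elements at once.
void transform(const float* in, float* out, std::size_t n) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
        out[i] = scale_and_shift(in[i]);
}
```

Whether a vector variant is actually generated and called depends on the compiler and its flags; the results are the same either way.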


> > So with your library, on a recent Intel processor, I cannot use vectors
> > of size 2?
>
> Yes and no. You can use Vc::SSE::double_v, which has two entries. I'd
> really like to have a more generic class in one of the next releases. This
> class should provide a bit more flexibility in the vector sizes. But for
> most applications my experience is that one should stick to what the
> register width provides. Types that are larger than the register width can
> easily increase register pressure and decrease cache efficiency. And using
> only a part of the vector is normally no big deal (the machine could
> calculate more - but if there is no more data-parallelism...).
>
That's cool for people iterating over large arrays. My use case for SIMD
(which is likely a small proportion of all SIMD uses) is fixed sizes. If I
do interval arithmetic, my vector has size 2 (lower and upper bounds). If
the platform doesn't have vectors, I'd like to know it and I'll use a
scalar version, although in most cases a synthesized pair of scalars would
be fine. On an AVX platform, I don't want to waste half of the space when I
store these vectors in memory (and I doubt the 256-bit operations are quite
as fast as the 128-bit ones). The only case where I'd be happy with 256
bits is if, my interval being considered as a scalar, the code got
vectorized to work on 2 intervals at once, but that's not the general case.
I also use SIMD to store small vectors (size 2, 3, 4) and compute
determinants of sets of vectors, and in this code I make heavy use of gcc's
__builtin_shuffle. An unknown-sized vector wouldn't be very convenient.

Again, what I am describing may not be the most common use, but I wanted to
stress that for some people, std::array (with many operations added) is
closer to what they need.
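
A minimal sketch of that fixed-size style, using plain std::array so it stays portable. The interval alias and the helper names are hypothetical, and the directed rounding that real interval arithmetic needs is deliberately left out. Whether the compiler maps these operations to 128-bit instructions is up to the optimizer.

```cpp
#include <array>
#include <cassert>

// Hypothetical fixed-size interval: element 0 is the lower bound,
// element 1 the upper bound. Rounding-mode control is omitted here.
using interval = std::array<double, 2>;

inline interval make_interval(double lo, double hi) {
    assert(lo <= hi);
    return {{lo, hi}};
}

// Lane-wise addition: [a.lo + b.lo, a.hi + b.hi].
inline interval add(const interval& a, const interval& b) {
    return {{a[0] + b[0], a[1] + b[1]}};
}

// Negation swaps and negates the bounds: -[lo, hi] = [-hi, -lo].
// The lane swap is the kind of step a shuffle would perform on a
// hardware vector.
inline interval neg(const interval& a) {
    return {{-a[1], -a[0]}};
}

// Subtraction expressed through the two operations above.
inline interval sub(const interval& a, const interval& b) {
    return add(a, neg(b));
}
```

The size 2 is part of the type here, so storage is exactly two doubles regardless of the register width of the target.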


------=_Part_229_5197603.1361486678273--

.

Author: VinceRev <vince.rev@gmail.com>
Date: Fri, 22 Feb 2013 06:44:36 -0800 (PST) Raw View

------=_Part_451_747603.1361544276837
Content-Type: text/plain; charset=ISO-8859-1

A while ago, I started a discussion about adding a technique to
automatically provide mathematical operators to simple vectors:
https://groups.google.com/a/isocpp.org/d/topic/std-proposals/xzQIfMeNFWs/discussion
I have nearly finished implementing that for my own library, but I think
that the conclusion of the topic was "it has to be peer-reviewed, tested,
optimized through the Boost process before going any further".


------=_Part_451_747603.1361544276837--

.