Thread

Topic: 32-bit product of 16-bit multiply

Author: James.Kanze@dresdner-bank.com
Date: 1999/04/13 Raw View

In article <370E3761.631934F6@technologist.com>,
  David R Tribble <dtribble@technologist.com> wrote:

    [...]
> Now for my question:  Is there a more optimal way of producing
> a 32-bit product of two 16-bit integers than [B] (casting the
> operands to long first)?  (I think the answer is "No".)

As this is comp.STD.c++, the answer is no.  From the description, it
sounds like portability is not a particular issue for you friend; if
this is the case, it may be possible to write an inline function using
an asm directive.

> Or, to state it another way, is there some sort of hint we can
> give the compiler that we want code like [3] instead of [2]?
> (I think the answer is "No, but good optimizing compilers ought
> to be capable of this".)

I would expect almost any compiler to be able to optimize this; even
simple peep-hole optimization will do the trick.  Basically, the
generated code consists of a series of multiplications and additions,
but the compiler can see that for all but one of the multiplications, at
least one of the multiplicands is the constant 0.

Or were the original values signed?  That would make the problem
slightly more difficult.

--
James Kanze                         mailto: James.Kanze@dresdner-bank.com
Conseils en informatique orient   e objet/
                        Beratung in objekt orientierter Datenverarbeitung
Ziegelh   ttenweg 17a, 60598 Frankfurt, Germany  Tel. +49 (069) 63 19 86 27

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/       Search, Read, Discuss, or Start Your Own
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: ncm@nospam.cantrip.org (Nathan Myers)
Date: 1999/04/11 Raw View

David R Tribble <dtribble@technologist.com> wrote:
>A friend of mine brought up a "problem" with C/C++ the other day.
>He has code similar to the following:
>
>    short   a,b; long  x;
>    x = a * b;  // [A] 16-bit result placed into 32-bit var
>
>Now for my question:  Is there a more optimal way of producing
>a 32-bit product of two 16-bit integers than [B] (casting the
>operands to long first)?  (I think the answer is "No".)

The best I know is

   x = a * (long)b;

>Or, to state it another way, is there some sort of hint we can
>give the compiler that we want code like [3] instead of [2]?

No, but good optimizing compilers would be capable of this --
for a permitted definition of "good".  Any compiler intended
for DSP certainly should be capable of it.  Do we know that
your friend's isn't?

--
Nathan Myers
ncm@nospam.cantrip.org  http://www.cantrip.org/
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: "Douglas A. Gwyn" <DAGwyn@null.net>
Date: 1999/04/11 Raw View

David R Tribble wrote:
>     short   a;  // 16 bits
>     short   b;  // 16 bits
>     long    x;  // 32 bits
>     x = a * b;  // [A] 16-bit result placed into 32-bit var
> What he wanted, though, was to obtain the full 32-bit result of the
> multiplication; for this, he had to code it differently:
>     x = (long)a * (long)b;   // [B] Force a 32-bit multiply
> The problem with this, he pointed out, was that this seemed to
> generate suboptimal code, especially on CPUs having a single
> instruction ...
> I then explained the "as if" rule, that the result of a*b had to
> act like a 16-bit int, regardless of the size of the variable it
> was subsequently assigned to (x).

Note that
 x = a * b;
will produce a numerically correct answer when the product is
representable in a short.  A compiler that so chooses can *also*
produce the apparently desired result when the product overflows
(is not representable in a short), since the C standard imposes
no requirements in such a case (the code is not strictly
conforming if it lets that situation arise).  If this is
important behavior for a compiler's customers, then the vendor
can make it work that way, at least in certain cases.

> Now for my question:  Is there a more optimal way of producing
> a 32-bit product of two 16-bit integers than [B] (casting the
> operands to long first)?  (I think the answer is "No".)

Not in strictly conforming code.

> Or, to state it another way, is there some sort of hint we can
> give the compiler that we want code like [3] instead of [2]?
> (I think the answer is "No, but good optimizing compilers ought
> to be capable of this".)

When a compiler sees
 x = (long)a * (long)b;
it is allowed to take advantage of its knowledge that a and b
are shorts to generate special code that works only in that
case; as you say, an optimization.  This is preferable to the
nonstandard approach mentioned above.
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: Andreas Schwab <schwab@issan.informatik.uni-dortmund.de>
Date: 1999/04/12 Raw View

"Douglas A. Gwyn" <DAGwyn@null.net> writes:

|> David R Tribble wrote:
|> >     short   a;  // 16 bits
|> >     short   b;  // 16 bits
|> >     long    x;  // 32 bits
|> >     x = a * b;  // [A] 16-bit result placed into 32-bit var
|> > What he wanted, though, was to obtain the full 32-bit result of the
|> > multiplication; for this, he had to code it differently:
|> >     x = (long)a * (long)b;   // [B] Force a 32-bit multiply
|> > The problem with this, he pointed out, was that this seemed to
|> > generate suboptimal code, especially on CPUs having a single
|> > instruction ...
|> > I then explained the "as if" rule, that the result of a*b had to
|> > act like a 16-bit int, regardless of the size of the variable it
|> > was subsequently assigned to (x).
|>
|> Note that
|>  x = a * b;
|> will produce a numerically correct answer when the product is
|> representable in a short.  A compiler that so chooses can *also*
|> produce the apparently desired result when the product overflows
|> (is not representable in a short), since the C standard imposes
|> no requirements in such a case (the code is not strictly
|> conforming if it lets that situation arise).

s/short/int/.  Types narrower than int are always promoted to int first.
(Of course, that does not make a difference if sizeof(int) ==
sizeof(short)).

--
Andreas Schwab                                      "And now for something
schwab@issan.cs.uni-dortmund.de                      completely different"
schwab@gnu.org
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: "Douglas A. Gwyn" <DAGwyn@null.net>
Date: 1999/04/13 Raw View

Andreas Schwab wrote:
> s/short/int/.  Types narrower than int are always promoted to int
> first.

Well, okay, but I was assuming the context of the original question.
The issue was how to multiply shorts to get a long without getting
unduly inefficient code due to such promotions, or how the compiler
could optimize such constructs.
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]

Author: David R Tribble <dtribble@technologist.com>
Date: 1999/04/11 Raw View

A friend of mine brought up a "problem" with C/C++ the other day.
He has code similar to the following:

    short   a;  // 16 bits
    short   b;  // 16 bits
    long    x;  // 32 bits

    x = a * b;  // [A] 16-bit result placed into 32-bit var

(This assumes a system with 16-bit ints.)

He was surprised to find that the value of x was only the lower
16 bits of the product of a and b.  I explained to him the rules
of integer promotions and assured him that the compiler is
justified in doing what it does.  The pseudo-assembler code looks
something like this:

    ; [1]
    mov  a, r1     ; r1 <- a
    mov  b, r2     ; r2 <- b
    mulw r2, r1    ; r0:r1 <- r1 x r2
    mov  r1, x+4   ; x.lsw <- r1
    mov  0, x+0    ; x.msw <- 0, throw away upper 16 bits (r0)

(This assumes a CPU with 16-bit registers having a multiply
instruction that yields a 32-bit product in a register pair.)

What he wanted, though, was to obtain the full 32-bit result of the
multiplication; for this, he had to code it differently:

    x = (long)a * (long)b;   // [B] Force a 32-bit multiply

The problem with this, he pointed out, was that this seemed to
generate suboptimal code, especially on CPUs having a single
instruction for multiplying two 16-bit values to obtain a 32-bit
product.  (On 16-bit CPUs like he uses, a 32-bit multiply operation
is much slower than a 16-bit multiply.)  The typical assembler code
looks like this:

    ; [2]
    mov  a, r1     ; r1 <- a.lsw
    mov  r0, 0     ; r0 <- a.msw; r0:r1 <- (long)a
    mov  b, r5     ; r5 <- b.lsw
    mov  r4, 0     ; r4 <- b.msw; r4:r5 <- (long)b
    muld r0, r2    ; r0:r1:r2:r3 <- r0:r1 x r4:r5
    mov  r3, x+4   ; x.lsw <- r3
    mov  r2, x+0   ; x.msw <- r2
                   ; throw away upper 32 bits (r0:r1)

He pointed out that this seems like such a basic operation,
especially in the kinds of embedded environments he programmed in,
that is, to get the 32-bit product of two 16-bit integers.  He
considered it a lost opportunity for the compiler to optimize
this kind of expression, taking advantage of the full register
pair result instead of simply throwing away the upper half,
especially since the compiler knows that the result is being
placed into a 32-bit integer.  He thought code like this would
be more suitable:

    ; [3]
    mov  a, r1     ; r1 <- a
    mov  b, r2     ; r2 <- b
    mulw r2, r1    ; r0:r1 <- r1 x r2
    mov  r1, x+4   ; x.lsw <- r1
    mov  r2, x+0   ; x.msw <- r2, use full 32-bit result

I then explained the "as if" rule, that the result of a*b had to
act like a 16-bit int, regardless of the size of the variable it
was subsequently assigned to (x).

Now for my question:  Is there a more optimal way of producing
a 32-bit product of two 16-bit integers than [B] (casting the
operands to long first)?  (I think the answer is "No".)

Or, to state it another way, is there some sort of hint we can
give the compiler that we want code like [3] instead of [2]?
(I think the answer is "No, but good optimizing compilers ought
to be capable of this".)

-- David R. Tribble, dtribble@technologist.com --
---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://reality.sgi.com/austern_mti/std-c++/faq.html              ]