Thread

Topic: inline assembly

Author: "kanze" <kanze@gabi-soft.fr>
Date: Thu, 3 Nov 2005 09:39:19 CST

Alf P. Steinbach wrote:
> * kanze -> ayart3:
> > > align, align x, even
> > > Makes the assembler emit nop instructions to align the
> > > next instruction on the default, x, or even boundary.
> > > Note: x must be a power of 2.

> > Even on machines where word size is 6?

> You don't happen to have an example of such a machine? ;-)

Not physically at hand, but they have certainly existed (Unisys
Series A, for example).

The fact remains that the C/C++ standards try to support such
machines.  (Whether such support is still relevant is another
question.  About the only "modern" machine I know of which isn't
based on the 8/16/32/64 bit model is the Unisys 2200.)
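
To make the point concrete, here is a minimal sketch of how a program can ask the implementation for its actual sizes instead of assuming 8/16/32/64. CHAR_BIT and std::numeric_limits are standard; the function names bits_per_byte, value_bits and fixed_sizes_fit are made up for this illustration.

```cpp
#include <climits>
#include <limits>

// Bits in the smallest addressable unit ("byte") of this implementation.
// The standard only guarantees CHAR_BIT >= 8, so 9 is a legal value.
inline int bits_per_byte() { return CHAR_BIT; }

// Number of value bits in an unsigned integer type T.
template <typename T>
inline int value_bits() { return std::numeric_limits<T>::digits; }

// True only if a "byte" has exactly 8 bits and unsigned int exactly 32,
// which is what fixed-size data directives would silently assume.
inline bool fixed_sizes_fit() {
    return bits_per_byte() == 8 && value_bits<unsigned int>() == 32;
}
```

On a machine with 9-bit bytes and 36-bit words, fixed_sizes_fit() would return false, which is exactly the case the fixed-width directives do not cover.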

> [snip]
> > > c++ variables should be accessible from the inline
> > > assembly;

> > That's a bit tricky.  How?  Often, you cannot access a C++
> > variable with a single instruction.  On most architectures,
> > you need to use registers in order to access a static
> > variable; how is the compiler supposed to know which
> > registers are free?

> It's not tricky at all.  After all, is there any example of
> inline assembly that doesn't support that?  Specifying it in a
> general way is, however, tricky and probably impossible.

I think that that is really my point.  The inline assembly will
obviously never be portable, and trying to make it more or less
so is tricky, as anything you do will probably end up causing
problems on some machine.  The goal of being able to access
variables declared in the C++ part is good, but given the way
different machine architectures work, I'm sceptical.

Interactions with the optimizer must be considered, too.  On the
machines I usually work on, if I declare "int i;" as a local
variable, and don't take its address, then it will almost
certainly be in a register.  I don't want the compiler to move
it into memory just because there is inline assembler; I want
some way of 1) telling the compiler whether my inline assembler
uses it or not, and 2) if the inline assembler uses it, for the
compiler to tell me whether it's in a register or not (and if it
is in a register, which one, of course).
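
As an illustration of what such an interface can look like, here is a minimal sketch using the operand constraints of GCC's extended asm. It is a vendor extension, not anything in the standard, and it assumes GCC or Clang on x86 with the default AT&T syntax; other targets fall back to plain C++. The function name add_ints is made up for the example.

```cpp
// Adds b to a. With GCC or Clang on x86/x86-64 the addition is done by
// inline assembly; on any other compiler or target plain C++ is used.
inline int add_ints(int a, int b) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    // "+r"(a): a is both read and written, and is kept in a register
    //          chosen by the compiler (referred to as %0).
    // "r"(b):  b is only read, also from a compiler-chosen register (%1).
    // "cc":    the condition flags are modified by the instruction.
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;
#endif
}
```

The programmer never names a register here; the compiler substitutes whichever ones it has allocated, so a local that lives in a register does not have to be moved to memory.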

I might add that one important thing he completely forgot was
register allocation.  Either the compiler needs to know which
registers I use in the inline assembler, or I have to know which
ones it uses (which will depend on the various optimization
options, and can change in unexpected ways as a result of
minimum changes in the source code).
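
The first alternative (the compiler is told which registers the assembler uses) can be sketched with the clobber list of GCC's extended asm. Again this is a vendor extension, shown under the assumption of GCC or Clang on x86 with AT&T syntax; the function name times_three is made up, and other targets use the plain C++ fallback.

```cpp
// Returns 3 * x (modulo 2^N for unsigned). On GCC/Clang for x86 the
// assembly uses eax as a hard-coded scratch register and says so.
inline unsigned times_three(unsigned x) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned result;
    __asm__("movl %1, %%eax\n\t"     // eax = x
            "addl %%eax, %%eax\n\t"  // eax = 2 * x
            "addl %1, %%eax\n\t"     // eax = 3 * x
            "movl %%eax, %0"         // result = eax
            : "=r"(result)           // output: any register
            : "r"(x)                 // input: any register
            : "eax", "cc");          // clobbers: eax and the flags
    return result;
#else
    return 3u * x;
#endif
}
```

Because eax is listed as clobbered, the compiler keeps neither operand in it and saves anything live in it around the block, whatever the optimization options.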

> I agree with the original poster that the current asm
> construct is, in my words, no good at all, a wart on the
> language  --  to wit, compiler vendors ignore it and provide
> their own non-standard replacements.

> So it should be cleaned up and brought in line with _existing
> practice_, like, removing that silly "string literal" in the
> syntax and adding support for functions implemented in inline
> assembly ("naked" functions), or alternatively, it should be
> deprecated, but, is there any example where the existing
> practice argument has actually worked?

I can agree with that.  Something like asm { ... } would seem
more reasonable.

The reason behind the string literal is simple: it's about the
only thing the compiler doesn't syntax check.  Off hand, "asm
{...}" seems reasonable, but suppose the assembler uses } as
some sort of a meta-character.  That seems unlikely enough to
me that I'd take the risk.  Suppose the assembler uses some sort
of character quoting that isn't compatible with C++, and the
user needs a '}'.  Suppose...

The problem is that assemblers are very varied, for a number of
reasons.  Off hand, I'd settle for: asm is a keyword, and the
syntax for what follows and its scope are implementation
defined.

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

---
[ comp.std.c++ is moderated.  To submit articles, try just posting with ]
[ your news-reader.  If that fails, use mailto:std-c++@ncar.ucar.edu    ]
[              --- Please see the FAQ before posting. ---               ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html                       ]

Author: "Joe" <jgreer@nsisoftware.com>
Date: Thu, 3 Nov 2005 09:40:37 CST

I don't know about a 6-bit word, but it wouldn't surprise me if some of
the embedded processors had such a thing.  I remember working on
the Cyber 70s, where the word size was 60 bits, the byte size was 6 bits,
the character size was either 6 or 12 bits (depending upon which
character set you were using) and the address registers were 18 bits.
The previous example would look something like:

         asm "Cyber70" {
               Set X1 = 0x123
               Set A1 = gSomeGlobalVar  // has side effect of loading X1 with word
               Set X2 = X2 + X1
               Set X7 = X2
               Set A7 = gSomeGlobalVar  // has side effect of writing X7 to memory
               ret
         }

There are 3 sets of registers: data registers X0-X7, address
registers A0-A7 and temp registers B0-B7.  Assigning a value to A1-A5
caused the associated data register X1-X5 to be loaded.  Assigning a
value to A6 or A7 caused the associated data register X6 or X7 to be
written.  Register B0 was hardwired to 0.  The machine was massively
parallel so many instructions could be executed in the same clock
cycle.  Anyway, the point is really that capturing an assembly language
like this is hard to do in a generalized way, though I can sympathize
with the desire.

joe


Author: "//ayart3" <nayart3@gmail.com>
Date: Tue, 1 Nov 2005 22:33:48 CST

Here's my 2 cents:

In the current standard the asm declaration is defined as:
    asm ( string-literal );

The asm declaration is conditionally supported, meaning
implementation-defined.

Current compilers, GCC and MSVC++, support inline assembly using a
custom implementation, which makes inline assembly code non-portable
across compilers. Neither GCC nor MSVC++ makes use of the keyword
asm, although both reserve it; instead they use the __asm__ and _asm
keywords respectively.

My proposal:
Keep the current definition of the asm keyword: asm ( string-literal );
And add the additional definition:
asm "machine" {
       asm-instruction;
}

asm-instruction:
 label: asm-instruction
 align
 align x
 even
 db Operands
 ds Operands
 di Operands
 dl Operands
 df Operands
 dd Operands
 Opcode
 Opcode Operands
Operands:
 operand1, operand2, ... , operandn




machine
Stands for the target machine to which the assembly code applies: x86,
ppc, mips, etc...
When compiled for a target machine for which no assembly was written,
in the presence of assembly written for other machines, the compiler
should give an error.
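
For comparison, the closest thing available today is selecting code by predefined compiler macros. The sketch below assumes the usual GCC/Clang and MSVC macro names (__i386__, __x86_64__, _M_IX86, _M_X64, __mips__, __powerpc__, __ppc__); the function target_machine is made up for the illustration, and where the proposal demands an error this sketch only returns "unknown" (a real program could use #error in that branch).

```cpp
#include <cstring>

// Names the target machine using compiler-specific predefined macros.
// The strings mirror the "machine" tags used in the proposal.
inline const char* target_machine() {
#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
    return "x86";
#elif defined(__mips__)
    return "mips";
#elif defined(__powerpc__) || defined(__ppc__)
    return "ppc";
#else
    // The proposal would make this case a compile-time error.
    return "unknown";
#endif
}
```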

label
Just like labels in C++, these can be the target of gotos and, in the
case of assembly, of branches and jumps.

align, align x, even
Makes the assembler emit nop instructions to align the next
instruction on the default, x, or even boundary. Note: x must be a
power of 2.

db, ds, di, dl, df, dd
Used to insert raw data: db stands for byte (8 bits), ds for short
(16 bits), di for integer (32 bits), dl for long (64 bits), df for float
(32 bits) and dd for double (64 bits).

Opcode
The opcodes are the assembly instructions supported by the native
machine for which the compilation is being targeted. They are written
in lower case only. The virtual machine instructions are optional but
the native instructions must all be supported. The names of the
instructions should be as specified by the machine vendor.

Operands
Operands stand for machine registers, C++ variables, and immediates.
Registers must be written in lower case. General-purpose registers are
named r0, r1, r2, ..., rN, where N is the total number of general-purpose
registers minus 1; general-purpose registers must also be nameable
by their vendor names (example x86: eax, ebx, ebp, ...; mips: ra0, rs0,
rt0, ...). All other machine registers, which are not considered
general purpose, are named as specified by the machine vendor. C++
variables should be accessible from the inline assembly. Immediates
can be decimal, hexadecimal or binary numbers, and can also be chars;
they are bound by the C++ rules (ex: 0x0A, 0101b, 123, 'A').

Pointers
The syntax for treating operands as pointers can be one of the
following:
 [operand + offset]
 offset[operand]
 struct_def_name[operand].struct_member
Some machines require a prefix to tell the size of the memory to
address; in x86 those are: byte, word, dword, qword. Note that some
implementations also use the keyword 'ptr' following one of the
byte, word, ... prefixes; this is not required, as using byte, word, ...
should suffice.

Notes:
- Inside the braces of the asm statement, C++ keywords (except the
inline assembly pseudo-ops) have no meaning, in order to avoid name
conflicts with opcodes (example: the int of C++ with the int of x86).
- The inline assembly should fit into the C++ world: it must use the same
tokens as C++; operands and opcodes must all be written in lower case,
following the C++ naming convention (no dots in opcode names, no $ in
register names); operands are evaluated just like in C++; comments must
be written just like in C++ (no #comment or ;comment); asm instructions
are terminated with a ';' and not with an end of line.
- Some compilers introduce the naked keyword prefixing the function
name as a directive for the compiler not to generate prolog or epilog
code for the function. I suggest that instead of 'naked' we use the
'asm' keyword in order to avoid introducing a new keyword into the
core language.
- The point of this is to create a set of rules to ease the
implementation of inline assembly across compilers in the most portable
manner possible.

Examples:

int gSomeGlobalVar;

void asm foo() //no prolog or epilog
{
 asm "x86" {
  mov eax, 0x123;                  //eax = 0x123
  mov eax, dword [gSomeGlobalVar]; //eax = *gSomeGlobalVar
  add ebx, eax;                    //ebx += eax
  mov dword [gSomeGlobalVar], ebx; //*gSomeGlobalVar = ebx
  ret;
 }
 asm "mips" {
  li rt0, 0x123;            //rt0 = 0x123
  lw rt0, [gSomeGlobalVar]; //rt0 = *gSomeGlobalVar
  add rt1, rt1, rt0;        //rt1 += rt0
  sw [gSomeGlobalVar], rt1; //*gSomeGlobalVar = rt1
  j rra;
 }
 //should give an error if not compiled for mips or x86, something like:
 // "Error: No inline assembly for target machine"
}

struct my_t { int a; char b;};

void bar(my_t* mt) {
 asm "x86" {
  mov eax, mt;          //eax = mt
  mov eax, my_t[eax].a; //eax = mt->a
 }
 asm "mips" {
  lw rt0, my_t[ra0].a;  //rt0 = mt->a
 }
}
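
What my_t[operand].a has to compute is the base address plus the byte offset of the member. A minimal sketch of that address arithmetic in plain C++, using the standard offsetof macro, follows; the function name load_a is made up for the illustration.

```cpp
#include <cstddef>
#include <cstring>

struct my_t { int a; char b; };

// Reads mt->a the way an assembler would: take the base address,
// add the byte offset of the member, and load sizeof(int) bytes.
inline int load_a(const my_t* mt) {
    const char* base = reinterpret_cast<const char*>(mt);
    int value;
    std::memcpy(&value, base + offsetof(my_t, a), sizeof value);
    return value;
}
```

The offset is only known to the compiler, which is why the assembly block needs the struct name spelled out alongside the register.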


Comments? Suggestions?


Author: nagle@animats.com (John Nagle)
Date: Wed, 2 Nov 2005 07:41:15 GMT

//ayart3 wrote:

> Here's my 2 cents:
>
> In the current standard the asm declaration is defined as:
>     asm ( string-literal );
>
> The asm declaration is conditionally supported, meaning
> implementation-defined.
>
> Current compilers, GCC and MSVC++, support inline assembly using a
> custom implementation, which makes inline assembly code non-portable
> across compilers. Neither GCC nor MSVC++ makes use of the keyword
> asm, although both reserve it; instead they use the __asm__ and _asm
> keywords respectively.
>
> My proposal:
> Keep the current definition of the asm keyword: asm ( string-literal );
> And add the additional definition:
> asm "machine" {
>        asm-instruction;
> }

     Inline assembly code is always difficult.  On machines where
register assignment is not totally stupid.


Author: "kanze" <kanze@gabi-soft.fr>
Date: Wed, 2 Nov 2005 09:50:39 CST

//ayart3 wrote:
> Here's my 2 cents:

> In the current standard the asm declaration is defined as:
>     asm ( string-literal );

> The asm declaration is conditionally supported, meaning
> implementation-defined.

> Current compilers, GCC and MSVC++, support inline assembly
> using a custom implementation, which makes inline assembly
> code non-portable across compilers. Neither GCC nor MSVC++
> makes use of the keyword asm, although both reserve it; instead
> they use the __asm__ and _asm keywords respectively.

> My proposal:

> Keep the current definition of the asm keyword: asm (
> string-literal ); And add the additional definition:

> asm "machine" {
>        asm-instruction;
> }

> asm-instruction:
>  label: asm-instruction
>  align
>  align x
>  even
>  db Operands
>  ds Operands
>  di Operands
>  dl Operands
>  df Operands
>  dd Operands
>  Opcode
>  Opcode Operands
> Operands:
>  operand1, operand2, ... , operandn

> machine

> Stands for the target machine to which the assembly code
> applies: x86, ppc, mips, etc...

> When compiled for a target machine for which no assembly was
> written, in the presence of assembly written for other
> machines, the compiler should give an error.

> label

> Just like labels in C++, these can be the target of gotos and,
> in the case of assembly, of branches and jumps.

Allowing it as a target of a goto doesn't seem like a good idea to me.
Typically, if I'm writing assembler code, I know what's in my
registers; I'm likely using a conditional branch to set up a
specific register with one value or another.  A goto from
outside of the code could only be an error.

> align, align x, even
> Makes the assembler emit nop instructions to align the next
> instruction on the default, x, or even boundary. Note: x must
> be a power of 2.

Even on machines where word size is 6?

> db, ds, di, dl, df, dd

> Used to insert raw data: db stands for byte (8 bits), ds for
> short (16 bits), di for integer (32 bits), dl for long (64 bits),
> df for float (32 bits) and dd for double (64 bits).

Even on machines with 9 bit bytes?

> Opcode
> The opcodes are the assembly instructions supported by the
> native machine for which the compilation is being targeted.
> They are written in lower case only.  The virtual machine
> instructions are optional but the native instructions must
> all be supported.  The names of the instructions should be as
> specified by the machine vendor.

What if the architecture has several vendors, which use
different names?  (This was the case for Intel 8080 and Zilog
Z80.)

> Operands

> Operands stand for machine registers, C++ variables, and
> immediates. Registers must be written in lower case.
> General-purpose registers are named r0, r1, r2, ..., rN, where
> N is the total number of general-purpose registers minus 1;
> general-purpose registers must also be nameable by their vendor
> names (example x86: eax, ebx, ebp, ...; mips: ra0, rs0, rt0, ...).
> All other machine registers, which are not considered general
> purpose, are named as specified by the machine vendor;

How general purpose does a general purpose register have to be?
I can't see any sense in trying to normalize register names
between different architectures; the register names typically
have some specific meaning, like the i<n>, o<n>, l<n> and g<n>
on a Sparc.

> c++ variables should be accessible from the inline assembly;

That's a bit tricky.  How?  Often, you cannot access a C++
variable with a single instruction.  On most architectures, you
need to use registers in order to access a static variable; how
is the compiler supposed to know which registers are free?

> immediates: can be decimal, hexadecimal or binary numbers and
> also can be chars, they are bound to the C++ rules (ex: 0x0A,
> 0101b, 123, 'A').

> Pointers
> The syntax for treating operands as pointers can be one of the
> following: [operand + offset];
> offset[operand];
> struct_def_name[operand].struct_member.
> Some machines require a prefix to tell the size of the memory
> to address, in x86 those are: byte, word, dword, qword. Note
> that some implementations also use the keyword 'ptr' following
> one of the byte, word, ... prefixes, this is not required as
> using byte, word, ... should suffice.

And most machines reflect the size of the operand in the machine
op code, e.g. mov, movb, etc.

> Notes:
> - Inside the braces of the asm statement, C++ keywords
> (except the inline assembly pseudo-ops) have no meaning, in
> order to avoid name conflicts with opcodes (example: the int
> of C++ with the int of x86).

What happens if a machine has an opcode even, or dl?

> - The inline assembly should fit into the C++ world: must use
> the same tokens as C++; operands and opcodes must be all
> written in lower case following the C++ naming convention (no
> dots in opcode names, no $ in register names); operands are
> evaluated just like in C++; comments must be written just like
> in C++ (no #comment or ;comment); asm instructions are
> terminated with a ';' and not with an end of line.

> - Some compilers introduce the naked keyword prefixing the
> function name as a directive for the compiler not to generate
> prolog or epilog code for the function. I suggest that instead
> of 'naked' we use the 'asm' keyword in order to avoid
> introducing a new keyword into the core language.

> - The point of this is to create a set of rules to ease the
> implementation of inline assembly across compilers in the most
> portable manner possible.

By definition, assembler isn't portable.  So what's the point?
What does this buy us that we don't have already?

> Examples:

> int gSomeGlobalVar;
>
> void asm foo() //no prolog or epilog
> {
>  asm "x86" {
>   mov eax, 0x123;                  //eax = 0x123
>   mov eax, dword [gSomeGlobalVar]; //eax = *gSomeGlobalVar
>   add ebx, eax;                    //ebx += eax
>   mov dword [gSomeGlobalVar], ebx; //*gSomeGlobalVar = ebx
>   ret;
>  }
>  asm "mips" {
>   li rt0, 0x123;            //rt0 = 0x123
>   lw rt0, [gSomeGlobalVar]; //rt0 = *gSomeGlobalVar
>   add rt1, rt1, rt0;        //rt1 += rt0
>   sw [gSomeGlobalVar], rt1; //*gSomeGlobalVar = rt1
>   j rra;
>  }
>  //should give an error if not compiled for mips or x86, something like:
>  // "Error: No inline assembly for target machine"
> }

This I really don't understand.  The point of inline assembler
is to generate code "inline".  Here, there's absolutely no reason
not to use the native assembler directly for this.

> struct my_t { int a; char b;};

> void bar(my_t* mt) {
>  asm "x86" {
>   mov eax, mt;          //eax = mt
>   mov eax, my_t[eax].a; //eax = mt->a
>  }
>  asm "mips" {
>   lw rt0, my_t[ra0].a;  //rt0 = mt->a
>  }
> }

> Comments ? Suggestions?

Seems like a waste of time.  (It also seems like you're not
familiar with too many different assemblers, or machine
architectures.)

--
James Kanze                                           GABI Software
Conseils en informatique orientée objet/
                   Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34



Author: alfps@start.no (Alf P. Steinbach)
Date: Wed, 2 Nov 2005 18:42:08 GMT

* kanze -> ayart3:
> > align, align x, even
> > Makes the assembler emit nop instructions to align the next
> > instruction on the default, x, or even boundary. Note: x must
> > be a power of 2.
>
> Even on machines where word size is 6?

You don't happen to have an example of such a machine? ;-)


[snip]
> > c++ variables should be accessible from the inline assembly;
>
> That's a bit tricky.  How?  Often, you cannot access a C++
> variable with a single instruction.  On most architectures, you
> need to use registers in order to access a static variable; how
> is the compiler supposed to know which registers are free?

It's not tricky at all.  After all, is there any example of inline
assembly that doesn't support that?  Specifying it in a general way is,
however, tricky and probably impossible.

I agree with the original poster that the current asm construct is, in
my words, no good at all, a wart on the language  --  to wit, compiler
vendors ignore it and provide their own non-standard replacements.

So it should be cleaned up and brought in line with _existing practice_,
like, removing that silly "string literal" in the syntax and adding
support for functions implemented in inline assembly ("naked"
functions), or alternatively, it should be deprecated, but, is there any
example where the existing practice argument has actually worked?

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
