
Syntax

This chapter describes the machine-independent syntax allowed in a source file. The assembler's syntax is similar to what many other assemblers use; it is inspired by the BSD 4.2 assembler.

Preprocessing

The assembler's internal preprocessor adjusts and removes extra whitespace, removes comments, and converts character constants into numeric values.

It does not do macro processing, include file handling, or anything else you may get from your C compiler's preprocessor. You can do include file processing with the .include directive (see section .include "file"). You can use the GNU C compiler driver to get other "CPP" style preprocessing, by giving the input file a `.S' suffix. See section `Options Controlling the Kind of Output' in Using GNU CC.

Excess whitespace, comments, and character constants cannot be used in the portions of the input text that are not preprocessed.

If the first line of an input file is #NO_APP or if you use the `-f' option, whitespace and comments are not removed from the input file. Within an input file, you can ask for whitespace and comment removal in specific portions of the file by putting a line that says #APP before the text that may contain whitespace or comments, and putting a line that says #NO_APP after this text. This feature is mainly intended to support asm statements in compilers whose output is otherwise free of comments and whitespace.

Whitespace

Whitespace is one or more blanks or tabs, in any order. Whitespace is used to separate symbols, and to make programs neater for people to read. Unless within character constants (see section Character Constants), any whitespace means the same as exactly one space.

Comments

There are two ways of rendering comments to the assembler. In both cases the comment is equivalent to one space.

Anything from `/*' through the next `*/' is a comment. This means you may not nest these comments.

/*
  The only way to include a newline ('\n') in a comment
  is to use this sort of comment.
*/

/* This sort of comment does not nest. */
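
As a rough model of this rule, the following Python sketch replaces each span from `/*' through the next `*/' with one space. The name strip_block_comments is hypothetical, and the sketch makes no attempt to handle string or character constants; it is an illustration, not the assembler's implementation.

```python
import re

# Non-greedy match from "/*" through the NEXT "*/".
# DOTALL lets the comment span newlines.
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_block_comments(text: str) -> str:
    """Replace each /* ... */ comment with a single space (hypothetical helper)."""
    return _BLOCK_COMMENT.sub(" ", text)
```

Because the match stops at the first `*/', an attempt to nest comments leaves the tail of the outer comment in the text, which is why nesting is not possible.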

Anything from the line comment character to the next newline is considered a comment and is ignored. The line comment character depends on the target machine; see section Machine Dependencies.

To be compatible with past assemblers, lines that begin with `#' have a special interpretation. Following the `#' should be an absolute expression (see section Expressions): the logical line number of the next line. Then a string (see section Strings) is allowed: if present it is a new logical file name. The rest of the line, if any, should be whitespace.

If the first non-whitespace characters on the line are not numeric, the line is ignored. (Just like a comment.)

                          # This is an ordinary comment.
# 42-6 "new_file_name"    # New logical file name
                          # This is logical line # 36.
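
The interpretation of such a line can be sketched in Python as follows. The function name parse_hash_line is hypothetical, and the sketch is deliberately limited: it evaluates only sums and differences of decimal integers rather than full absolute expressions.

```python
import re

_DECIMAL = "0123456789"
# A decimal number, optionally followed by +/- terms, then an optional string.
_HASH_LINE = re.compile(r'(\d+(?:\s*[+-]\s*\d+)*)\s*(?:"([^"]*)")?')


def parse_hash_line(line):
    """Return (logical_line, file_name_or_None), or None for an ordinary comment."""
    if not line.startswith("#"):
        return None
    body = line[1:].lstrip()
    # If the first non-whitespace character is not numeric, the line is ignored.
    if not body or body[0] not in _DECIMAL:
        return None
    m = _HASH_LINE.match(body)
    # Assumption: only + and - on decimal integers are evaluated here.
    terms = re.findall(r"[+-]?\d+", re.sub(r"\s+", "", m.group(1)))
    return sum(int(t) for t in terms), m.group(2)
```

Applied to the second line of the example above, the sketch yields logical line 36 (42-6) and the file name new_file_name.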

This feature is deprecated, and may disappear from future versions of the assembler.

Symbols

A symbol is one or more characters chosen from the set of all letters (both upper and lower case), digits and the three characters `_.$'. No symbol may begin with a digit. Case is significant. There is no length limit: all characters are significant. Symbols are delimited by characters not in that set, or by the beginning of a file (since the source program must end with a newline, the end of a file is not a possible symbol delimiter). See section Symbols.
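
The character rules above can be expressed as a regular expression. This Python sketch assumes "letters" means the ASCII letters, and the name is_symbol is hypothetical.

```python
import re

# First character: a letter or one of _ . $ (never a digit).
# Remaining characters: letters, digits, or _ . $
_SYMBOL = re.compile(r"[A-Za-z_.$][A-Za-z0-9_.$]*")


def is_symbol(text: str) -> bool:
    """True if text is a well-formed symbol under the rules above."""
    return _SYMBOL.fullmatch(text) is not None
```

For instance, is_symbol("_start") and is_symbol(".L1") hold, while is_symbol("1abc") does not because of the leading digit.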

Statements

A statement ends at a newline character (`\n') or at a semicolon (`;'). The newline or semicolon is considered part of the preceding statement. Newlines and semicolons within character constants are an exception: they do not end statements.

It is an error to end any statement with end-of-file: the last character of any input file should be a newline.

You may write a statement on more than one line if you put a backslash (\) immediately in front of any newlines within the statement. When the assembler reads a backslashed newline, both characters are ignored. You can even put backslashed newlines in the middle of symbol names without changing the meaning of your source program.

An empty statement is allowed, and may include whitespace. It is ignored.
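
A minimal Python sketch of these rules follows. The name split_statements is hypothetical, and the sketch assumes the input contains no comments or character constants, so every newline and semicolon it sees ends a statement.

```python
def split_statements(source: str) -> list[str]:
    """Split source text into non-empty statements (simplified model)."""
    # A backslash immediately before a newline: both characters are ignored.
    joined = source.replace("\\\n", "")
    # A newline or a semicolon ends a statement.
    pieces = joined.replace(";", "\n").split("\n")
    # Empty statements, including whitespace-only ones, are ignored.
    return [p.strip() for p in pieces if p.strip()]
```

Note that a backslashed newline in the middle of a name, as in "mov\" followed by "l" on the next line, is joined back into the single name "movl" before any splitting happens.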

A statement begins with zero or more labels, optionally followed by a key symbol which determines what kind of statement it is. The key symbol determines the syntax of the rest of the statement. If the symbol begins with a dot `.' then the statement is an assembler directive: typically valid for any computer. If the symbol begins with a letter the statement is an assembly language instruction: it assembles into a machine language instruction.

A label is a symbol immediately followed by a colon (:). Whitespace before a label or after a colon is permitted, but you may not have whitespace between a label's symbol and its colon. See section Labels.

label:     .directive    followed by something
another_label:           # This is an empty statement.
           instruction   operand_1, operand_2, ...
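
The shape of a statement described above (zero or more labels, then an optional key symbol) can be modeled in Python as follows. The name parse_statement and the returned kind strings are hypothetical; the sketch classifies the statement only by the first character of its key symbol.

```python
import re

# A label: optional leading whitespace, a symbol, then a colon with
# no whitespace between the symbol and the colon.
_LABEL = re.compile(r"\s*([A-Za-z_.$][A-Za-z0-9_.$]*):")


def parse_statement(stmt: str):
    """Return (labels, kind, rest) for one statement (simplified model)."""
    labels = []
    pos = 0
    while (m := _LABEL.match(stmt, pos)):
        labels.append(m.group(1))
        pos = m.end()
    rest = stmt[pos:].strip()
    if not rest:
        kind = "empty"
    elif rest[0] == ".":
        kind = "directive"
    elif rest[0].isalpha():
        kind = "instruction"
    else:
        kind = "unknown"
    return labels, kind, rest
```

Run on the three example lines above (with the trailing comment removed from the second), the sketch reports a directive with one label, an empty statement with one label, and an instruction with no labels.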

Constants

A constant is a number, written so that its value is known by inspection, without knowing any context. Like this:

.byte  74, 0112, 092, 0x4A, 0X4a, 'J, '\J # All the same value.
.ascii "Ring the bell\7"                  # A string constant.
.octa  0x123456789abcdef0123456789ABCDEF0 # A bignum.
.float 0f-314159265358979323846264338327\
95028841971.693993751E-40                 # - pi, a flonum.

Character Constants

There are two kinds of character constants. A character stands for one character in one byte and its value may be used in numeric expressions. String constants (properly called string literals) are potentially many bytes and their values may not be used in arithmetic expressions.

Strings

A string is written between double-quotes. It may contain double-quotes or null characters. The way to get special characters into a string is to escape these characters: precede them with a backslash `\' character. For example `\\' represents one backslash: the first \ is an escape which tells the assembler to interpret the second character literally as a backslash (which prevents the assembler from recognizing the second \ as an escape character). The complete list of escapes follows.

\b
Mnemonic for backspace; for ASCII this is octal code 010.
\f
Mnemonic for FormFeed; for ASCII this is octal code 014.
\n
Mnemonic for newline; for ASCII this is octal code 012.
\r
Mnemonic for carriage-Return; for ASCII this is octal code 015.
\t
Mnemonic for horizontal Tab; for ASCII this is octal code 011.
\ digit digit digit
An octal character code. The numeric code is 3 octal digits. For compatibility with other Unix systems, 8 and 9 are accepted as digits: for example, \008 has the value 010, and \009 the value 011.
\\
Represents one `\' character.
\"
Represents one `"' character. Needed in strings to represent this character, because an unescaped `"' would end the string.
\ anything-else
Any other character when escaped by \ gives a warning, but assembles as if the `\' was not present. The idea is that if you used an escape sequence you clearly didn't want the literal interpretation of the following character. However, the assembler has no other interpretation, so it knows it is giving you the wrong code and warns you of the fact.

Which characters are escapable, and what those escapes represent, varies widely among assemblers. The current set is what we think the BSD 4.2 assembler recognizes, and is a subset of what most C compilers recognize. If you are in doubt, do not use an escape sequence.
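
The escape list above can be modeled with a short Python decoder. This is a sketch, not the assembler's implementation: the name decode_string_body is hypothetical, it takes the text between the double-quotes, it accepts one to three digits in a numeric escape (the `\7' in the Constants example has only one), and it masks every value to one byte, which is an assumption.

```python
import warnings

_DIGITS = "0123456789"  # 8 and 9 are accepted for compatibility
_NAMED = {"b": 0o10, "f": 0o14, "n": 0o12, "r": 0o15, "t": 0o11,
          "\\": ord("\\"), '"': ord('"')}


def decode_string_body(body: str) -> bytes:
    """Decode the escapes in the text between a string's double-quotes."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        i += 1
        if ch != "\\":
            out.append(ord(ch) & 0xFF)  # assumption: one byte per character
            continue
        if i == len(body):
            raise ValueError("backslash at end of string")
        esc = body[i]
        if esc in _NAMED:
            out.append(_NAMED[esc])
            i += 1
        elif esc in _DIGITS:
            value, count = 0, 0
            while count < 3 and i < len(body) and body[i] in _DIGITS:
                value = value * 8 + int(body[i])  # base 8, even for 8 and 9
                i += 1
                count += 1
            out.append(value & 0xFF)  # assumption: truncate to one byte
        else:
            warnings.warn(f"unknown escape \\{esc}; using the literal character")
            out.append(ord(esc) & 0xFF)
            i += 1
    return bytes(out)
```

Treating 8 and 9 as ordinary base-8 digit values reproduces the values given above: `\008' becomes 0*64 + 0*8 + 8, which is octal 010, and `\009' becomes octal 011.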

Characters

A single character may be written as a single quote immediately followed by that character. The same escapes apply to characters as to strings. So if you want to write the character backslash, you must write '\\ where the first \ escapes the second \. As you can see, the quote is an acute accent, not a grave accent. A newline (or semicolon `;') immediately following an acute accent is taken as a literal character and does not count as the end of a statement. The value of a character constant in a numeric expression is the machine's byte-wide code for that character. The assembler assumes your character code is ASCII: 'A means 65, 'B means 66, and so on.

Number Constants

The assembler distinguishes three kinds of numbers according to how they are stored in the target machine. Integers are numbers that would fit into an int in the C language. Bignums are integers, but they are stored in more than 32 bits. Flonums are floating point numbers, described below.

Integers

A binary integer is `0b' or `0B' followed by zero or more of the binary digits `01'.

An octal integer is `0' followed by zero or more of the octal digits (`01234567').

A decimal integer starts with a non-zero digit followed by zero or more digits (`0123456789').

A hexadecimal integer is `0x' or `0X' followed by one or more hexadecimal digits chosen from `0123456789abcdefABCDEF'.

Integers have the usual values. To denote a negative integer, use the prefix operator `-' discussed under expressions (see section Prefix Operator).
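
The four integer forms can be recognized by prefix, as in the following Python sketch. The name parse_integer is hypothetical, and the sketch implements only the four rules exactly as stated in this section; anything else is rejected.

```python
import re

# (pattern, base) pairs, tried in order.
_FORMS = [
    (re.compile(r"0[bB]([01]*)"), 2),           # binary: zero or more digits
    (re.compile(r"0[xX]([0-9a-fA-F]+)"), 16),   # hexadecimal: one or more digits
    (re.compile(r"0([0-7]*)"), 8),              # octal: zero or more digits
    (re.compile(r"([1-9][0-9]*)"), 10),         # decimal: non-zero first digit
]


def parse_integer(text: str) -> int:
    """Return the value of an unsigned integer constant."""
    for pattern, base in _FORMS:
        m = pattern.fullmatch(text)
        if m:
            digits = m.group(1)
            # "Zero or more" digits: an empty digit string has the value 0.
            return int(digits, base) if digits else 0
    raise ValueError(f"not an integer constant: {text!r}")
```

With this sketch, 74, 0112, 0x4A, 0X4a and 0b1001010 all parse to the same value, 74.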

Bignums

A bignum has the same syntax and semantics as an integer except that the number (or its negative) takes more than 32 bits to represent in binary. The distinction is made because in some places integers are permitted while bignums are not.

Flonums

A flonum represents a floating point number. The translation is indirect: a decimal floating point number from the text is converted by the assembler to a generic binary floating point number of more than sufficient precision. This generic floating point number is converted to a particular computer's floating point format (or formats) by a portion of the assembler specialized to that computer.

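A flonum is written by writing, in order: the digit `0'; a letter that marks the rest of the number as a flonum (the Constants example above uses `f'); an optional sign, either `+' or `-'; an optional integer part of zero or more decimal digits; an optional fractional part, `.' followed by zero or more decimal digits; and an optional exponent, `E' or `e' followed by an optional sign and one or more decimal digits.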

At least one of the integer part or the fractional part must be present. The floating point number has the usual base-10 value.
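
A Python sketch of a flonum recognizer is given here. The grammar it encodes (the digit `0', one letter, an optional sign, an optional integer part, an optional fractional part, an optional exponent) is an assumption modeled on the `.float' line in the Constants example and on the rule above; the name parse_flonum is hypothetical. The value is returned as an exact fraction so that the sketch itself does not depend on floating point hardware.

```python
import re
from fractions import Fraction

# 0, a letter, optional sign, optional integer part,
# optional "." fraction, optional exponent.
_FLONUM = re.compile(r"0[A-Za-z]([+-]?)([0-9]*)(?:\.([0-9]*))?(?:[eE]([+-]?[0-9]+))?")


def parse_flonum(text: str) -> Fraction:
    """Return the base-10 value of a flonum as an exact Fraction."""
    m = _FLONUM.fullmatch(text)
    if not m:
        raise ValueError(f"not a flonum: {text!r}")
    sign, int_part, frac_part, exponent = m.groups()
    # At least one of the integer part or the fractional part must be present.
    if not int_part and frac_part is None:
        raise ValueError(f"flonum needs an integer or fractional part: {text!r}")
    return Fraction(f"{sign}{int_part or '0'}.{frac_part or '0'}e{exponent or '0'}")
```

For example, parse_flonum("0e1.5") gives the fraction 3/2, and the joined text of the `.float' example converts to a value that agrees with -pi to double precision.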

The assembler does all processing using integers. Flonums are computed independently of any floating point hardware in the computer running the assembler.

The assembler writes a bit-field constant into a field whose width depends on which assembler directive has the bit-field as its argument. Overflow (a result from the bitwise and requiring more binary digits to represent) is not an error; instead, more constants are generated, of the specified width, beginning with the least significant digits.

The directives .byte, .hword, .int, .long, .short, and .word accept bit-field arguments.

