4.17. Hardware Models and Configurations

Earlier we discussed the standard option -b which chooses among different installed compilers for completely different target machines, such as 68000 vs. 80386.

In addition, each of these target machine types can have its own special options, starting with -m, to choose among various hardware models or configurations--for example, 68010 vs 68020, floating coprocessor or none. A single installed version of the compiler can compile for any model or configuration, according to the options specified.

Some configurations of the compiler also support additional special options, usually for compatibility with other compilers on the same platform.

These options are defined by the macro TARGET_SWITCHES in the machine description. The default for the options is also defined by that macro, which enables you to change the defaults.

4.17.1. SPARC Options

These -m switches are supported on the SPARC:

-mno-app-regs, -mapp-regs

Specify -mapp-regs to generate output using the global registers 2 through 4, which the SPARC SVR4 ABI reserves for applications. This is the default.

To be fully SVR4 ABI compliant at the cost of some performance loss, specify -mno-app-regs. You should compile libraries and system software with this option.

-mfpu, -mhard-float

Generate output containing floating point instructions. This is the default.

-mno-fpu, -msoft-float

Generate output containing library calls for floating point. Warning: the requisite libraries are not available for all SPARC targets. Normally the facilities of the machine's usual C compiler are used, but this cannot be done directly in cross-compilation. You must make your own arrangements to provide suitable library functions for cross-compilation. The embedded targets sparc-*-aout and sparclite-*-* do provide software floating point support.

-msoft-float changes the calling convention in the output file; therefore, it is only useful if you compile all of a program with this option. In particular, you need to compile libgcc.a, the library that comes with GCC, with -msoft-float in order for this to work.

-mhard-quad-float

Generate output containing quad-word (long double) floating point instructions.

-msoft-quad-float

Generate output containing library calls for quad-word (long double) floating point instructions. The functions called are those specified in the SPARC ABI. This is the default.

As of this writing, there are no SPARC implementations that have hardware support for the quad-word floating point instructions. They all invoke a trap handler for one of these instructions, and then the trap handler emulates the effect of the instruction. Because of the trap handler overhead, this is much slower than calling the ABI library routines. Thus the -msoft-quad-float option is the default.

-mno-flat, -mflat

With -mflat, the compiler does not generate save/restore instructions and will use a "flat" or single register window calling convention. This model uses %i7 as the frame pointer and is compatible with the normal register window model. Code from either may be intermixed. The local registers and the input registers (0-5) are still treated as "call saved" registers and will be saved on the stack as necessary.

With -mno-flat (the default), the compiler emits save/restore instructions (except for leaf functions). This is the normal mode of operation.

-mno-unaligned-doubles, -munaligned-doubles

Assume that doubles have 8 byte alignment. This is the default.

With -munaligned-doubles, GCC assumes that doubles have 8 byte alignment only if they are contained in another type, or if they have an absolute address. Otherwise, it assumes they have 4 byte alignment. Specifying this option avoids some rare compatibility problems with code generated by other compilers. It is not the default because it results in a performance loss, especially for floating point code.

-mno-faster-structs, -mfaster-structs

With -mfaster-structs, the compiler assumes that structures should have 8 byte alignment. This enables the use of pairs of ldd and std instructions for copies in structure assignment, in place of twice as many ld and st pairs. However, the use of this changed alignment directly violates the SPARC ABI. Thus, it is intended only for use on targets where the developer acknowledges that their resulting code will not be directly in line with the rules of the ABI.

-mv8, -msparclite

These two options select variations on the SPARC architecture.

By default (unless specifically configured for the Fujitsu SPARClite), GCC generates code for the v7 variant of the SPARC architecture.

-mv8 will give you SPARC v8 code. The only difference from v7 code is that the compiler emits the integer multiply and integer divide instructions which exist in SPARC v8 but not in SPARC v7.

-msparclite will give you SPARClite code. This adds the integer multiply, integer divide step and scan (ffs) instructions which exist in SPARClite but not in SPARC v7.

These options are deprecated and will be deleted in a future GCC release. They have been replaced with -mcpu=xxx.

-mcypress, -msupersparc

These two options select the processor for which the code is optimized.

With -mcypress (the default), the compiler optimizes code for the Cypress CY7C602 chip, as used in the SPARCStation/SPARCServer 3xx series. This is also appropriate for the older SPARCStation 1, 2, IPX etc.

With -msupersparc the compiler optimizes code for the SuperSPARC cpu, as used in the SPARCStation 10, 1000 and 2000 series. This flag also enables use of the full SPARC v8 instruction set.

These options are deprecated and will be deleted in a future GCC release. They have been replaced with -mcpu=xxx.

-mcpu=cpu_type

Set the instruction set, register set, and instruction scheduling parameters for machine type cpu_type. Supported values for cpu_type are v7, cypress, v8, supersparc, sparclite, hypersparc, sparclite86x, f930, f934, sparclet, tsc701, v9, ultrasparc, and ultrasparc3.

Default instruction scheduling parameters are used for values that select an architecture and not an implementation. These are v7, v8, sparclite, sparclet, v9.

Here is a list of the supported architectures and their supported implementations.

    v7:             cypress
    v8:             supersparc, hypersparc
    sparclite:      f930, f934, sparclite86x
    sparclet:       tsc701
    v9:             ultrasparc, ultrasparc3

-mtune=cpu_type

Set the instruction scheduling parameters for machine type cpu_type, but do not set the instruction set or register set that the option -mcpu=cpu_type would.

The same values for -mcpu=cpu_type can be used for -mtune=cpu_type, but the only useful values are those that select a particular cpu implementation. Those are cypress, supersparc, hypersparc, f930, f934, sparclite86x, tsc701, ultrasparc, and ultrasparc3.

These -m switches are supported in addition to the above on the SPARCLET processor.

-mlittle-endian

Generate code for a processor running in little-endian mode.

-mlive-g0

Treat register %g0 as a normal register. GCC will continue to clobber it as necessary but will not assume it always reads as 0.

-mbroken-saverestore

Generate code that does not use non-trivial forms of the save and restore instructions. Early versions of the SPARCLET processor do not correctly handle save and restore instructions used with arguments. They correctly handle them used without arguments. A save instruction used without arguments increments the current window pointer but does not allocate a new stack frame. It is assumed that the window overflow trap handler will properly handle this case as will interrupt handlers.

These -m switches are supported in addition to the above on SPARC V9 processors in 64-bit environments.

-mlittle-endian

Generate code for a processor running in little-endian mode.

-m32, -m64

Generate code for a 32-bit or 64-bit environment. The 32-bit environment sets int, long and pointer to 32 bits. The 64-bit environment sets int to 32 bits and long and pointer to 64 bits.
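The sizes described above can be checked from C. The following is a minimal sketch, not part of GCC: the function name data_model_name is hypothetical, and the labels ILP32 and LP64 are the conventional names for these two size combinations rather than terms used by the options themselves.

```c
#include <stddef.h>

/* Hypothetical helper: name the data model this translation unit was
   compiled for, judged only by the sizes of int, long and pointers. */
const char *data_model_name(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";   /* the sizes the text gives for -m32 */
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";    /* the sizes the text gives for -m64 */
    return "other";       /* any other combination */
}
```

Compiling the same file once with -m32 and once with -m64 on a target that supports both should change the string returned.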

-mcmodel=medlow

Generate code for the Medium/Low code model: the program must be linked in the low 32 bits of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked.

-mcmodel=medmid

Generate code for the Medium/Middle code model: the program must be linked in the low 44 bits of the address space, the text segment must be less than 2G bytes, and data segment must be within 2G of the text segment. Pointers are 64 bits.

-mcmodel=medany

Generate code for the Medium/Anywhere code model: the program may be linked anywhere in the address space, the text segment must be less than 2G bytes, and data segment must be within 2G of the text segment. Pointers are 64 bits.

-mcmodel=embmedany

Generate code for the Medium/Anywhere code model for embedded systems: assume a 32-bit text and a 32-bit data segment, both starting anywhere (determined at link time). Register %g4 points to the base of the data segment. Pointers are still 64 bits. Programs are statically linked, PIC is not supported.

-mstack-bias, -mno-stack-bias

With -mstack-bias, GCC assumes that the stack pointer, and frame pointer if present, are offset by -2047 which must be added back when making stack frame references. Otherwise, assume no such offset is present.
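The arithmetic implied by the bias can be sketched as follows. This is an illustration only: the macro STACK_BIAS and the function unbias_stack_address are hypothetical names, and the value 2047 is taken from the description above.

```c
#include <stdint.h>

/* Assumption: 2047 is the bias magnitude stated in the text. */
#define STACK_BIAS 2047

/* Given the biased value held in the stack or frame pointer, return the
   address that a stack frame reference actually uses. */
uint64_t unbias_stack_address(uint64_t biased_pointer)
{
    return biased_pointer + STACK_BIAS;
}
```

With -mno-stack-bias no such adjustment would be needed, because the register already holds the real address.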

4.17.2. IBM RS/6000 and PowerPC Options

These -m options are defined for the IBM RS/6000 and PowerPC:

-mpower, -mno-power, -mpower2, -mno-power2, -mpowerpc, -mno-powerpc, -mpowerpc-gpopt, -mno-powerpc-gpopt, -mpowerpc-gfxopt, -mno-powerpc-gfxopt, -mpowerpc64, -mno-powerpc64

GCC supports two related instruction set architectures for the RS/6000 and PowerPC. The POWER instruction set consists of those instructions supported by the rios chip set used in the original RS/6000 systems, and the PowerPC instruction set is the architecture of the Motorola MPC5xx, MPC6xx, MPC8xx microprocessors, and the IBM 4xx microprocessors.

Neither architecture is a subset of the other. However there is a large common subset of instructions supported by both. An MQ register is included in processors supporting the POWER architecture.

You use these options to specify which instructions are available on the processor you are using. The default value of these options is determined when configuring GCC. Specifying the -mcpu=cpu_type overrides the specification of these options. We recommend you use the -mcpu=cpu_type option rather than the options listed above.

The -mpower option allows GCC to generate instructions that are found only in the POWER architecture and to use the MQ register. Specifying -mpower2 implies -mpower and also allows GCC to generate instructions that are present in the POWER2 architecture but not the original POWER architecture.

The -mpowerpc option allows GCC to generate instructions that are found only in the 32-bit subset of the PowerPC architecture. Specifying -mpowerpc-gpopt implies -mpowerpc and also allows GCC to use the optional PowerPC architecture instructions in the General Purpose group, including floating-point square root. Specifying -mpowerpc-gfxopt implies -mpowerpc and also allows GCC to use the optional PowerPC architecture instructions in the Graphics group, including floating-point select.

The -mpowerpc64 option allows GCC to generate the additional 64-bit instructions that are found in the full PowerPC64 architecture and to treat GPRs as 64-bit, doubleword quantities. GCC defaults to -mno-powerpc64.

If you specify both -mno-power and -mno-powerpc, GCC will use only the instructions in the common subset of both architectures plus some special AIX common-mode calls, and will not use the MQ register. Specifying both -mpower and -mpowerpc permits GCC to use any instruction from either architecture and to allow use of the MQ register; specify this for the Motorola MPC601.

-mnew-mnemonics, -mold-mnemonics

Select which mnemonics to use in the generated assembler code. With -mnew-mnemonics, GCC uses the assembler mnemonics defined for the PowerPC architecture. With -mold-mnemonics it uses the assembler mnemonics defined for the POWER architecture. Instructions defined in only one architecture have only one mnemonic; GCC uses that mnemonic irrespective of which of these options is specified.

GCC defaults to the mnemonics appropriate for the architecture in use. Specifying -mcpu=cpu_type sometimes overrides the value of these options. Unless you are building a cross-compiler, you should normally not specify either -mnew-mnemonics or -mold-mnemonics, but should instead accept the default.

-mcpu=cpu_type

Set architecture type, register usage, choice of mnemonics, and instruction scheduling parameters for machine type cpu_type. Supported values for cpu_type are rios, rios1, rsc, rios2, rs64a, 601, 602, 603, 603e, 604, 604e, 620, 630, 740, 7400, 7450, 750, power, power2, powerpc, 403, 505, 801, 821, 823, 860, and common.

-mcpu=common selects a completely generic processor. Code generated under this option will run on any POWER or PowerPC processor. GCC will use only the instructions in the common subset of both architectures, and will not use the MQ register. GCC assumes a generic processor model for scheduling purposes.

-mcpu=power, -mcpu=power2, -mcpu=powerpc, and -mcpu=powerpc64 specify generic POWER, POWER2, pure 32-bit PowerPC (i.e., not MPC601), and 64-bit PowerPC architecture machine types, with an appropriate, generic processor model assumed for scheduling purposes.

The other options specify a specific processor. Code generated under those options will run best on that processor, and may not run at all on others.

The -mcpu options automatically enable or disable other -m options as follows:

common

-mno-power, -mno-powerpc

power, power2, rios1, rios2, rsc

-mpower, -mno-powerpc, -mno-new-mnemonics

powerpc, rs64a, 602, 603, 603e, 604, 620, 630, 740, 7400, 7450, 750, 505

-mno-power, -mpowerpc, -mnew-mnemonics

601

-mpower, -mpowerpc, -mnew-mnemonics

403, 821, 860

-mno-power, -mpowerpc, -mnew-mnemonics, -msoft-float

-mtune=cpu_type

Set the instruction scheduling parameters for machine type cpu_type, but do not set the architecture type, register usage, or choice of mnemonics, as -mcpu=cpu_type would. The same values for cpu_type are used for -mtune as for -mcpu. If both are specified, the code generated will use the architecture, registers, and mnemonics set by -mcpu, but the scheduling parameters set by -mtune.

-maltivec, -mno-altivec

These switches enable or disable the use of built-in functions that allow access to the AltiVec instruction set. You may also need to set -mabi=altivec to adjust the current ABI with AltiVec ABI enhancements.

-mabi=spe

Extend the current ABI with SPE ABI extensions. This does not change the default ABI, instead it adds the SPE ABI extensions to the current ABI.

-mabi=no-spe

Disable Booke SPE ABI extensions for the current ABI.

-misel=yes/no, -misel

This switch enables or disables the generation of ISEL instructions.

-mspe=yes/no, -mspe

This switch enables or disables the generation of SPE SIMD instructions.

-mfloat-gprs=yes/no, -mfloat-gprs

This switch enables or disables the generation of floating point operations on the general purpose registers for architectures that support it. This option is currently only available on the MPC8540.

-mfull-toc, -mno-fp-in-toc, -mno-sum-in-toc, -mminimal-toc

Modify generation of the TOC (Table Of Contents), which is created for every executable file. The -mfull-toc option is selected by default. In that case, GCC will allocate at least one TOC entry for each unique non-automatic variable reference in your program. GCC will also place floating-point constants in the TOC. However, only 16,384 entries are available in the TOC.

If you receive a linker error message saying you have overflowed the available TOC space, you can reduce the amount of TOC space used with the -mno-fp-in-toc and -mno-sum-in-toc options. -mno-fp-in-toc prevents GCC from putting floating-point constants in the TOC and -mno-sum-in-toc forces GCC to generate code to calculate the sum of an address and a constant at run-time instead of putting that sum into the TOC. You may specify one or both of these options. Each causes GCC to produce very slightly slower and larger code in exchange for conserving TOC space.

If you still run out of space in the TOC even when you specify both of these options, specify -mminimal-toc instead. This option causes GCC to make only one TOC entry for every file. When you specify this option, GCC will produce code that is slower and larger but which uses extremely little TOC space. You may wish to use this option only on files that contain less frequently executed code.

-maix64, -maix32

Enable 64-bit AIX ABI and calling convention: 64-bit pointers, 64-bit long type, and the infrastructure needed to support them. Specifying -maix64 implies -mpowerpc64 and -mpowerpc, while -maix32 disables the 64-bit ABI and implies -mno-powerpc64. GCC defaults to -maix32.

-mxl-call, -mno-xl-call

On AIX, pass floating-point arguments to prototyped functions beyond the register save area (RSA) on the stack in addition to argument FPRs. The AIX calling convention was extended but not initially documented to handle an obscure K&R C case of calling a function that takes the address of its arguments with fewer arguments than declared. AIX XL compilers access floating point arguments which do not fit in the RSA from the stack when a subroutine is compiled without optimization. Because always storing floating-point arguments on the stack is inefficient and rarely needed, this option is not enabled by default and only is necessary when calling subroutines compiled by AIX XL compilers without optimization.

-mpe

Support IBM RS/6000 SP Parallel Environment (PE). Link an application written to use message passing with special startup code to enable the application to run. The system must have PE installed in the standard location (/usr/lpp/ppe.poe/), or the specs file must be overridden with the -specs= option to specify the appropriate directory location. The Parallel Environment does not support threads, so the -mpe option and the -pthread option are incompatible.

-malign-natural, -malign-power

On AIX and 64-bit PowerPC Linux, the option -malign-natural overrides the ABI-defined alignment of larger types, such as floating-point doubles, and aligns them on their natural size-based boundary. The option -malign-power instructs GCC to follow the ABI-specified alignment rules. GCC defaults to the standard alignment defined in the ABI.

-msoft-float, -mhard-float

Generate code that does not use (uses) the floating-point register set. Software floating point emulation is provided if you use the -msoft-float option, and pass the option to GCC when linking.

-mmultiple, -mno-multiple

Generate code that uses (does not use) the load multiple word instructions and the store multiple word instructions. These instructions are generated by default on POWER systems, and not generated on PowerPC systems. Do not use -mmultiple on little endian PowerPC systems, since those instructions do not work when the processor is in little endian mode. The exceptions are PPC740 and PPC750, which permit use of these instructions in little endian mode.

-mstring, -mno-string

Generate code that uses (does not use) the load string instructions and the store string word instructions to save multiple registers and do small block moves. These instructions are generated by default on POWER systems, and not generated on PowerPC systems. Do not use -mstring on little endian PowerPC systems, since those instructions do not work when the processor is in little endian mode. The exceptions are PPC740 and PPC750, which permit use of these instructions in little endian mode.

-mupdate, -mno-update

Generate code that uses (does not use) the load or store instructions that update the base register to the address of the calculated memory location. These instructions are generated by default. If you use -mno-update, there is a small window between the time that the stack pointer is updated and the address of the previous frame is stored, which means code that walks the stack frame across interrupts or signals may get corrupted data.

-mfused-madd, -mno-fused-madd

Generate code that uses (does not use) the floating point multiply and accumulate instructions. These instructions are generated by default if hardware floating is used.
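A fused multiply and accumulate rounds once, after the addition, instead of rounding the product first. The following sketch picks operands for which that difference is visible. The function name product_plus_constant is hypothetical, and the result on a given system depends on the target, the options in effect, and how the compiler evaluates the expression.

```c
/* Returns x*x + c for values chosen so that rounding the product
   before the addition changes the answer.  The exact square of x is
   1 + 2^-29 + 2^-60, and the last term does not fit in a double. */
double product_plus_constant(void)
{
    volatile double x = 1.0 + 0x1p-30;      /* exactly representable */
    volatile double c = -(1.0 + 0x1p-29);   /* minus the rounded square of x */
    return x * x + c;
}
```

If the product is rounded to double before the addition, the function returns 0. If the multiply and add are fused, or the product is held in a wider format, it returns 2 to the power -60.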

-mno-bit-align, -mbit-align

On embedded PowerPC systems do not (do) force structures and unions that contain bit-fields to be aligned to the base type of the bit-field.

For example, by default a structure containing nothing but 8 unsigned bit-fields of length 1 would be aligned to a 4 byte boundary and have a size of 4 bytes. By using -mno-bit-align, the structure would be aligned to a 1 byte boundary and be one byte in size.
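The structure from this example can be written out and measured. This is a minimal sketch: the type and function names are hypothetical, and the alignment is obtained indirectly, from the offset of a member that follows a single char.

```c
#include <stddef.h>

/* Eight unsigned bit-fields of length 1, as in the example in the text. */
struct flags {
    unsigned int f0 : 1, f1 : 1, f2 : 1, f3 : 1;
    unsigned int f4 : 1, f5 : 1, f6 : 1, f7 : 1;
};

/* Size of the structure in bytes under the current options. */
size_t flags_size(void)
{
    return sizeof(struct flags);
}

/* Alignment of the structure, measured as the offset of a member
   that follows a single char. */
struct flags_probe { char c; struct flags f; };

size_t flags_alignment(void)
{
    return offsetof(struct flags_probe, f);
}
```

According to the description above, both functions would be expected to return 4 by default on an embedded PowerPC target and 1 with -mno-bit-align. Other targets may give different values.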

-mno-strict-align, -mstrict-align

On embedded PowerPC systems do not (do) assume that unaligned memory references will be handled by the system.

-mrelocatable, -mno-relocatable

On embedded PowerPC systems generate code that allows (does not allow) the program to be relocated to a different address at runtime. If you use -mrelocatable on any module, all objects linked together must be compiled with -mrelocatable or -mrelocatable-lib.

-mrelocatable-lib, -mno-relocatable-lib

On embedded PowerPC systems generate code that allows (does not allow) the program to be relocated to a different address at runtime. Modules compiled with -mrelocatable-lib can be linked with either modules compiled without -mrelocatable and -mrelocatable-lib or with modules compiled with the -mrelocatable option.

-mno-toc, -mtoc

On embedded PowerPC systems do not (do) assume that register 2 contains a pointer to a global area pointing to the addresses used in the program.

-mlittle, -mlittle-endian

On embedded PowerPC systems compile code for the processor in little endian mode. The -mlittle-endian option is the same as -mlittle.

-mbig, -mbig-endian

On embedded PowerPC systems compile code for the processor in big endian mode. The -mbig-endian option is the same as -mbig.
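The byte order that the generated code assumes can be observed at run time. The following sketch uses a hypothetical function name, is_little_endian, and relies only on standard C.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the least significant byte of a 32-bit value is stored
   at the lowest address (little endian), 0 otherwise. */
int is_little_endian(void)
{
    uint32_t word = 1;
    unsigned char first_byte;

    memcpy(&first_byte, &word, 1);   /* read the byte at the lowest address */
    return first_byte == 1;
}
```

Code compiled with -mlittle would be expected to return 1 and code compiled with -mbig to return 0, provided the processor really runs in the matching mode.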

-mcall-sysv

On embedded PowerPC systems compile code using calling conventions that adhere to the March 1995 draft of the System V Application Binary Interface, PowerPC processor supplement. This is the default unless you configured GCC using powerpc-*-eabiaix.

-mcall-sysv-eabi

Specify both -mcall-sysv and -meabi options.

-mcall-sysv-noeabi

Specify both -mcall-sysv and -mno-eabi options.

-mcall-solaris

On embedded PowerPC systems compile code for the Solaris operating system.

-mcall-linux

On embedded PowerPC systems compile code for the Linux-based GNU system.

-mcall-gnu

On embedded PowerPC systems compile code for the Hurd-based GNU system.

-mcall-netbsd

On embedded PowerPC systems compile code for the NetBSD operating system.

-maix-struct-return

Return all structures in memory (as specified by the AIX ABI).

-msvr4-struct-return

Return structures smaller than 8 bytes in registers (as specified by the SVR4 ABI).

-mabi=altivec

Extend the current ABI with AltiVec ABI extensions. This does not change the default ABI, instead it adds the AltiVec ABI extensions to the current ABI.

-mabi=no-altivec

Disable AltiVec ABI extensions for the current ABI.

-mprototype, -mno-prototype

On embedded PowerPC systems assume that all calls to variable argument functions are properly prototyped. Otherwise, the compiler must insert an instruction before every non-prototyped call to set or clear bit 6 of the condition code register (CR) to indicate whether floating point values were passed in the floating point registers, in case the function takes variable arguments. With -mprototype, only calls to prototyped variable argument functions will set or clear the bit.
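A properly prototyped variable argument function, in the sense used here, is one whose declaration with the trailing ellipsis is visible at every call site. A minimal sketch, with the hypothetical name sum_doubles:

```c
#include <stdarg.h>

/* The prototype is visible before any call, so the compiler knows this
   is a variable argument function that may receive floating point
   values. */
double sum_doubles(int count, ...);

double sum_doubles(int count, ...)
{
    va_list args;
    double total = 0.0;
    int i;

    va_start(args, count);
    for (i = 0; i < count; i++)
        total += va_arg(args, double);   /* each extra argument is a double */
    va_end(args);
    return total;
}
```

A call such as sum_doubles(3, 1.0, 2.0, 3.0) made where this prototype is in scope is the kind of call that -mprototype assumes throughout the program.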

-msim

On embedded PowerPC systems, assume that the startup module is called sim-crt0.o and that the standard C libraries are libsim.a and libc.a. This is the default for powerpc-*-eabisim configurations.

-mmvme

On embedded PowerPC systems, assume that the startup module is called crt0.o and the standard C libraries are libmvme.a and libc.a.

-mads

On embedded PowerPC systems, assume that the startup module is called crt0.o and the standard C libraries are libads.a and libc.a.

-myellowknife

On embedded PowerPC systems, assume that the startup module is called crt0.o and the standard C libraries are libyk.a and libc.a.

-mvxworks

On embedded PowerPC systems, specify that you are compiling for a VxWorks system.

-mwindiss

Specify that you are compiling for the WindISS simulation environment.

-memb

On embedded PowerPC systems, set the PPC_EMB bit in the ELF flags header to indicate that eabi extended relocations are used.

-meabi, -mno-eabi

On embedded PowerPC systems do (do not) adhere to the Embedded Applications Binary Interface (eabi) which is a set of modifications to the System V.4 specifications. Selecting -meabi means that the stack is aligned to an 8 byte boundary, a function __eabi is called from main to set up the eabi environment, and the -msdata option can use both r2 and r13 to point to two separate small data areas. Selecting -mno-eabi means that the stack is aligned to a 16 byte boundary, no initialization function is called from main, and the -msdata option will only use r13 to point to a single small data area. The -meabi option is on by default if you configured GCC using one of the powerpc*-*-eabi* options.

-msdata=eabi

On embedded PowerPC systems, put small initialized const global and static data in the .sdata2 section, which is pointed to by register r2. Put small initialized non-const global and static data in the .sdata section, which is pointed to by register r13. Put small uninitialized global and static data in the .sbss section, which is adjacent to the .sdata section. The -msdata=eabi option is incompatible with the -mrelocatable option. The -msdata=eabi option also sets the -memb option.

-msdata=sysv

On embedded PowerPC systems, put small global and static data in the .sdata section, which is pointed to by register r13. Put small uninitialized global and static data in the .sbss section, which is adjacent to the .sdata section. The -msdata=sysv option is incompatible with the -mrelocatable option.

-msdata=default, -msdata

On embedded PowerPC systems, if -meabi is used, compile code the same as -msdata=eabi, otherwise compile code the same as -msdata=sysv.

-msdata=data

On embedded PowerPC systems, put small global and static data in the .sdata section. Put small uninitialized global and static data in the .sbss section. Do not use register r13 to address small data however. This is the default behavior unless other -msdata options are used.

-msdata=none, -mno-sdata

On embedded PowerPC systems, put all initialized global and static data in the .data section, and all uninitialized data in the .bss section.

-G num

On embedded PowerPC systems, put global and static items less than or equal to num bytes into the small data or bss sections instead of the normal data or bss section. By default, num is 8. The -G num switch is also passed to the linker. All modules should be compiled with the same -G num value.
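The size rule can be sketched as a simple comparison. The function is_small_data and the two sample objects are hypothetical and serve only to illustrate the rule as stated above. They do not query the compiler or the linker.

```c
#include <stddef.h>

/* Sample items, named here only for illustration. */
int    event_counter;      /* typically 4 bytes */
double lookup_table[4];    /* typically 32 bytes */

/* Mirrors the rule in the text: an item is placed in the small data or
   bss sections when its size is less than or equal to num bytes. */
int is_small_data(size_t item_size, size_t num)
{
    return item_size <= num;
}
```

With the default value of 8 for num, event_counter would qualify as small data and lookup_table would not.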

-mregnames, -mno-regnames

On embedded PowerPC systems do (do not) emit register names in the assembly language output using symbolic forms.

-mlongcall, -mno-longcall

Default to making all function calls via pointers, so that functions which reside further than 32 megabytes (33,554,432 bytes) from the current location can be called. This setting can be overridden by the shortcall function attribute, or by #pragma longcall(0).

Some linkers are capable of detecting out-of-range calls and generating glue code on the fly. On these systems, long calls are unnecessary and generate slower code. As of this writing, the AIX linker can do this, as can the GNU linker for PowerPC/64. It is planned to add this feature to the GNU linker for 32-bit PowerPC systems as well.

In the future, we may cause GCC to ignore all longcall specifications when the linker is known to generate glue.
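The shortcall attribute mentioned above is applied to a function declaration. The following is a sketch under assumptions: the macro SHORTCALL and the function names are hypothetical, and the attribute is applied only when one of the usual PowerPC predefined macros is present, so that the file still compiles on other targets.

```c
/* Assumption: apply the attribute only when compiling for PowerPC;
   elsewhere the macro expands to nothing. */
#if defined(__powerpc__) || defined(__PPC__)
#define SHORTCALL __attribute__((shortcall))
#else
#define SHORTCALL
#endif

/* Known to be linked close to its callers, so a direct branch suffices
   even when the file is compiled with -mlongcall. */
SHORTCALL int nearby_helper(int value);

int nearby_helper(int value)
{
    return value + 1;
}

/* Calls the helper; with -mlongcall in effect, this call would
   otherwise be made through a pointer. */
int call_nearby(int value)
{
    return nearby_helper(value);
}
```

Marking only the functions known to be in range keeps the cost of -mlongcall confined to calls that may really be far away.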

-pthread

Adds support for multithreading with the pthreads library. This option sets flags for both the preprocessor and linker.

4.17.3. Intel 386 and AMD x86-64 Options

These -m options are defined for the i386 and x86-64 family of computers:

-mtune=cpu-type

Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions. The choices for cpu-type are i386, i486, i586, i686, pentium, pentium-mmx, pentiumpro, pentium2, pentium3, pentium4, k6, k6-2, k6-3, athlon, athlon-tbird, athlon-4, athlon-xp, athlon-mp, winchip-c6, winchip2, k8, c3 and c3-2.

While picking a specific cpu-type will schedule things appropriately for that particular chip, the compiler will not generate any code that does not run on the i386 without the -march=cpu-type option being used. i586 is equivalent to pentium and i686 is equivalent to pentiumpro. k6 and athlon are the AMD chips as opposed to the Intel ones.

-march=cpu-type

Generate instructions for the machine type cpu-type. The choices for cpu-type are the same as for -mtune. Moreover, specifying -march=cpu-type implies -mtune=cpu-type.

-mcpu=cpu-type

A deprecated synonym for -mtune.

-m386, -m486, -mpentium, -mpentiumpro

These options are synonyms for -mtune=i386, -mtune=i486, -mtune=pentium, and -mtune=pentiumpro respectively. These synonyms are deprecated.

-mfpmath=unit

Generate floating point arithmetic for the selected unit unit. The choices for unit are:

387

Use the standard 387 floating point coprocessor present on the majority of chips and emulated otherwise. Code compiled with this option will run almost everywhere. The temporary results are computed in 80-bit precision instead of the precision specified by the type, resulting in slightly different results compared to most other chips. See -ffloat-store for a more detailed description.

This is the default choice for the i386 compiler.
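Whether temporaries are kept in a wider precision can be queried through the C99 macro FLT_EVAL_METHOD from float.h. The following sketch assumes a C99 implementation. The function name evaluation_method is hypothetical.

```c
#include <float.h>

/* Returns the C99 FLT_EVAL_METHOD value: 0 means operations are
   evaluated in the precision of their type, 1 means float operations
   are evaluated as double, 2 means all operations are evaluated as
   long double, and a negative value means the method cannot be
   determined. */
int evaluation_method(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1;   /* macro not provided by this implementation */
#endif
}
```

Code compiled for 387 arithmetic would typically be expected to report 2 and code compiled for SSE arithmetic to report 0, although the value also depends on the compiler version and the language standard selected.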

sse

Use scalar floating point instructions present in the SSE instruction set. This instruction set is supported by Pentium3 and newer chips, and in the AMD line by Athlon-4, Athlon-xp and Athlon-mp chips. The earlier version of the SSE instruction set supports only single precision arithmetic, thus double and extended precision arithmetic is still done using the 387. A later version, present only in Pentium4 and the future AMD x86-64 chips, supports double precision arithmetic too.

For the i386 compiler, you need to use the -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default.

The resulting code should be considerably faster in the majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80-bit.

This is the default choice for the x86-64 compiler.

sse,387

Attempt to utilize both instruction sets at once. This effectively doubles the number of available registers and, on chips with separate execution units for 387 and SSE, the execution resources too. Use this option with care, as it is still experimental: the GCC register allocator does not model separate functional units well, resulting in unstable performance.

-masm=dialect

Output asm instructions using the selected dialect. Supported choices are intel and att (the default).

-mieee-fp, -mno-ieee-fp

Control whether or not the compiler uses IEEE floating point comparisons. These handle correctly the case where the result of a comparison is unordered.
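An unordered result arises when at least one operand of a comparison is a NaN. The following sketch checks the behavior that IEEE comparisons require. The function names are hypothetical.

```c
/* A NaN produced at run time; volatile keeps the division from being
   folded away at compile time. */
double make_nan(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}

/* Under IEEE rules every ordered comparison with a NaN is false and
   only != is true.  Returns 1 if the compiled code behaves that way. */
int nan_compares_as_unordered(void)
{
    double n = make_nan();

    return !(n < 1.0) && !(n > 1.0) && !(n == 1.0)
        && !(n == n) && (n != n);
}
```

Code that does not handle the unordered case correctly could return 0 here, because one of the comparisons would then be treated as true or false incorrectly.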

-msoft-float

Generate output containing library calls for floating point. Warning: the requisite libraries are not part of GCC. Normally the facilities of the machine's usual C compiler are used, but this can't be done directly in cross-compilation. You must make your own arrangements to provide suitable library functions for cross-compilation.

On machines where a function returns floating point results in the 80387 register stack, some floating point opcodes may be emitted even if -msoft-float is used.

-mno-fp-ret-in-387

Do not use the FPU registers for return values of functions.

The usual calling convention has functions return values of types float and double in an FPU register, even if there is no FPU. The idea is that the operating system should emulate an FPU.

The option -mno-fp-ret-in-387 causes such values to be returned in ordinary CPU registers instead.

-mno-fancy-math-387

Some 387 emulators do not support the sin, cos and sqrt instructions for the 387. Specify this option to avoid generating those instructions. This option is the default on FreeBSD, OpenBSD and NetBSD. This option is overridden when -march indicates that the target cpu will always have an FPU and so the instruction will not need emulation. As of revision 2.6.1, these instructions are not generated unless you also use the -funsafe-math-optimizations switch.

-malign-double, -mno-align-double

Control whether GCC aligns double, long double, and long long variables on a two word boundary or a one word boundary. Aligning double variables on a two word boundary will produce code that runs somewhat faster on a Pentium at the expense of more memory.

Warning: if you use the -malign-double switch, structures containing the above types will be aligned differently than the published application binary interface specifications for the 386 and will not be binary compatible with structures in code compiled without that switch.
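The layout difference described in the warning can be made visible with offsetof. This is a sketch only; the structure and function names are hypothetical, and the offsets in the comments assume a 4-byte word and an 8-byte double.

```c
#include <stddef.h>

/* A char followed by a double: the padding inserted before d depends
   on the alignment the compiler gives double inside structures. */
struct sample {
    char   tag;
    double d;
};

/* Offset of the double member: 4 when double is aligned on a one
   word boundary, 8 when it is aligned on a two word boundary. */
size_t double_member_offset(void)
{
    return offsetof(struct sample, d);
}

/* Total size of the structure, including padding. */
size_t sample_size(void)
{
    return sizeof(struct sample);
}
```

Two object files that disagree on these values cannot safely exchange a struct sample, which is the binary incompatibility the warning refers to.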

-m96bit-long-double, -m128bit-long-double

These switches control the size of long double type. The i386 application binary interface specifies the size to be 96 bits, so -m96bit-long-double is the default in 32 bit mode.

Modern architectures (Pentium and newer) would prefer long double to be aligned to an 8 or 16 byte boundary. In arrays or structures conforming to the ABI, this would not be possible. Specifying -m128bit-long-double therefore aligns long double to a 16 byte boundary by padding it with an additional 32 bits of zeros.

In the x86-64 compiler, -m128bit-long-double is the default choice, as its ABI specifies that long double is to be aligned on a 16 byte boundary.

Notice that neither of these options enables any extra precision over the x87 standard of 80 bits for a long double.

Warning: if you override the default value for your target ABI, structures and arrays containing long double will change size, and the calling convention for functions that take long double will change as well. Such code will therefore not be binary compatible with code compiled without that switch.
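The distinction between storage size and precision can be checked from C. In the sketch below the function names are hypothetical, and the 80-bit figure applies only to the x87 format; other targets may use a different long double representation entirely.

```c
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* Storage size of long double in bits. On x86 this is 96 or 128
   depending on the switch in effect. */
size_t long_double_storage_bits(void)
{
    return sizeof(long double) * CHAR_BIT;
}

/* Number of mantissa bits. For the x87 format this is 64 whichever
   storage size is chosen, since padding adds no precision. */
int long_double_mantissa_bits(void)
{
    return LDBL_MANT_DIG;
}

/* Padding bits in a given storage size, assuming the x87 format
   with 80 significant bits. */
size_t x87_padding_bits(size_t storage_bits)
{
    return storage_bits - 80;
}
```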

-mrtd

Use a different function-calling convention, in which functions that take a fixed number of arguments return with the ret num instruction, which pops their arguments while returning. This saves one instruction in the caller since there is no need to pop the arguments there.

You can specify that an individual function is called with this calling sequence with the function attribute stdcall. You can also override the -mrtd option by using the function attribute cdecl. Refer to Section 6.26 Declaring Attributes of Functions.

Warning: this calling convention is incompatible with the one normally used on Unix, so you cannot use it if you need to call libraries compiled with the Unix compiler.

Also, you must provide function prototypes for all functions that take variable numbers of arguments (including printf); otherwise incorrect code will be generated for calls to those functions.

In addition, seriously incorrect code will result if you call a function with too many arguments. (Normally, extra arguments are harmlessly ignored.)
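The per-function attributes mentioned above might be applied as in the following sketch. The macro and function names are hypothetical. The attributes are only meaningful on 32-bit x86, so the macros expand to nothing on other targets, which keeps the example compilable everywhere.

```c
#include <stdarg.h>

#if defined(__GNUC__) && defined(__i386__)
#define CALLEE_POPS __attribute__((stdcall))
#define CALLER_POPS __attribute__((cdecl))
#else
#define CALLEE_POPS
#define CALLER_POPS
#endif

/* Fixed number of arguments: eligible for the "ret num" convention. */
CALLEE_POPS int add3(int a, int b, int c);

/* Variable number of arguments: the callee cannot know how much to
   pop, so this keeps the usual convention. The prototype must be
   visible at every call site. */
CALLER_POPS int sum_n(int count, ...);

CALLEE_POPS int add3(int a, int b, int c)
{
    return a + b + c;
}

CALLER_POPS int sum_n(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```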

-mregparm=num

Control how many registers are used to pass integer arguments. By default, no registers are used to pass arguments, and at most 3 registers can be used. You can control this behavior for a specific function by using the function attribute regparm. See Section 6.26 Declaring Attributes of Functions.

Warning: if you use this switch, and num is nonzero, then you must build all modules with the same value, including any libraries. This includes the system libraries and startup modules.
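Where changing the convention for a whole program is impractical, the regparm attribute limits the change to selected functions. A minimal sketch, assuming a GCC-compatible compiler; the macro and function names are hypothetical, and the attribute is dropped on targets other than 32-bit x86.

```c
#if defined(__GNUC__) && defined(__i386__)
#define REGPARM3 __attribute__((regparm(3)))
#else
#define REGPARM3
#endif

/* The declaration carries the attribute so that every caller that
   sees it passes the three integer arguments in registers. */
REGPARM3 int clamp(int value, int lo, int hi);

REGPARM3 int clamp(int value, int lo, int hi)
{
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}
```

Because the attribute is part of the declaration, callers and the definition agree on the convention without requiring the system libraries to be rebuilt.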

-mpreferred-stack-boundary=num

Attempt to keep the stack boundary aligned to a 2 raised to num byte boundary. If -mpreferred-stack-boundary is not specified, the default is 4 (16 bytes or 128 bits), except when optimizing for code size (-Os), in which case the default is the minimum correct alignment (4 bytes for x86, and 8 bytes for x86-64).

On Pentium and PentiumPro, double and long double values should be aligned to an 8 byte boundary (see -malign-double) or suffer significant run time performance penalties. On Pentium III, the Streaming SIMD Extension (SSE) data type __m128 suffers similar penalties if it is not 16 byte aligned.

To ensure proper alignment of these values on the stack, the stack boundary must be as aligned as that required by any value stored on the stack. Further, every function must be generated such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary will most likely misalign the stack. It is recommended that libraries that use callbacks always use the default setting.

This extra alignment does consume extra stack space, and generally increases code size. Code that is sensitive to stack space usage, such as embedded systems and operating system kernels, may want to reduce the preferred alignment to -mpreferred-stack-boundary=2.
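The relation between num and the resulting alignment is a power of two, as the following sketch shows. The helper names are hypothetical and the code only illustrates the arithmetic; it does not inspect the real stack pointer.

```c
#include <stdint.h>

/* Alignment in bytes requested by -mpreferred-stack-boundary=num,
   that is, 2 raised to the power num. */
unsigned long boundary_bytes(unsigned num)
{
    return 1UL << num;
}

/* Returns 1 if the address is a multiple of the boundary for num. */
int is_aligned(uintptr_t address, unsigned num)
{
    return (address & (boundary_bytes(num) - 1)) == 0;
}
```

For example, the default of 4 gives a 16 byte boundary, while 2 gives a 4 byte boundary; an address such as 36 satisfies the latter but not the former.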

-mmmx, -mno-mmx, -msse, -mno-sse, -msse2, -mno-sse2, -m3dnow, -mno-3dnow

These switches enable or disable the use of built-in functions that allow direct access to the MMX, SSE and 3Dnow extensions of the instruction set.

See Section 6.47.1 X86 Built-in Functions, for details of the functions enabled and disabled by these switches.

To have SSE/SSE2 instructions generated automatically from floating-point code, see -mfpmath=sse.
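Code that uses these extensions directly usually needs a fallback for builds where the switch is off. The sketch below assumes that the compiler predefines __SSE2__ when SSE2 is enabled and that the <emmintrin.h> header is available, which is the case for current GCC releases but is not stated in this section. The function name is hypothetical.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Compute out[i] = a[i] + b[i] for a pair of doubles. */
void add_pairs(const double a[2], const double b[2], double out[2])
{
#if defined(__SSE2__)
    /* Unaligned loads and stores, so callers need not align arrays. */
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));
#else
    /* Scalar fallback when SSE2 is not enabled. */
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
#endif
}
```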

-mpush-args, -mno-push-args

Use PUSH operations to store outgoing parameters. This method is shorter and usually as fast as the method using SUB/MOV operations, and it is enabled by default. In some cases disabling it may improve performance because of improved scheduling and reduced dependencies.

-maccumulate-outgoing-args

If enabled, the maximum amount of space required for outgoing arguments will be computed in the function prologue. This is faster on most modern CPUs because of reduced dependencies, improved scheduling and reduced stack usage when the preferred stack boundary is not equal to 2. The drawback is a notable increase in code size. This switch implies -mno-push-args.

-mthreads

Support thread-safe exception handling on Mingw32. Code that relies on thread-safe exception handling must compile and link all code with the -mthreads option. When compiling, -mthreads defines -D_MT; when linking, it links in a special thread helper library -lmingwthrd which cleans up per thread exception handling data.

-mno-align-stringops

Do not align the destination of inlined string operations. This switch reduces code size and improves performance in cases where the destination is already aligned but GCC does not know about it.

-minline-all-stringops

By default GCC inlines string operations only when the destination is known to be aligned to at least a 4 byte boundary. This option enables more inlining and increases code size, but may improve performance of code that depends on fast memcpy, strlen and memset for short lengths.

-momit-leaf-frame-pointer

Don't keep the frame pointer in a register for leaf functions. This avoids the instructions to save, set up and restore frame pointers and makes an extra register available in leaf functions. The option -fomit-frame-pointer removes the frame pointer for all functions which might make debugging harder.

-mtls-direct-seg-refs, -mno-tls-direct-seg-refs

Controls whether TLS variables may be accessed with offsets from the TLS segment register (%gs for 32-bit, %fs for 64-bit), or whether the thread base pointer must be added. Whether or not this is legal depends on the operating system, and whether it maps the segment to cover the entire TLS area.

For systems that use GNU libc, the default is on.

These -m switches are supported in addition to the above on AMD x86-64 processors in 64-bit environments.

-m32, -m64

Generate code for a 32-bit or 64-bit environment. The 32-bit environment sets int, long and pointer to 32 bits and generates code that runs on any i386 system. The 64-bit environment sets int to 32 bits and long and pointer to 64 bits and generates code for AMD's x86-64 architecture.
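The two sets of type sizes described above are commonly called ILP32 and LP64; those names are conventional and do not come from this section. The following sketch, with hypothetical function names, classifies a set of sizes and reports which one the current compilation uses.

```c
#include <stddef.h>

/* Classify a data model from the sizes, in bytes, of int, long and
   pointer. "ILP32" matches the 32-bit environment described above
   and "LP64" the 64-bit one. */
const char *data_model_name(size_t int_bytes, size_t long_bytes,
                            size_t pointer_bytes)
{
    if (int_bytes == 4 && long_bytes == 4 && pointer_bytes == 4)
        return "ILP32";
    if (int_bytes == 4 && long_bytes == 8 && pointer_bytes == 8)
        return "LP64";
    return "other";
}

/* The data model of the translation unit being compiled. */
const char *current_data_model(void)
{
    return data_model_name(sizeof(int), sizeof(long), sizeof(void *));
}
```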

-mno-red-zone

Do not use a so-called red zone for x86-64 code. The red zone is mandated by the x86-64 ABI. It is a 128-byte area beyond the location of the stack pointer that will not be modified by signal or interrupt handlers and therefore can be used for temporary data without adjusting the stack pointer. The flag -mno-red-zone disables this red zone.

-mcmodel=small

Generate code for the small code model: the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.

-mcmodel=kernel

Generate code for the kernel code model. The kernel runs in the negative 2 GB of the address space. This model has to be used for Linux kernel code.

-mcmodel=medium

Generate code for the medium model: the program is linked in the lower 2 GB of the address space, but symbols can be located anywhere in the address space. Programs can be statically or dynamically linked, but building shared libraries is not supported with the medium model.

-mcmodel=large

Generate code for the large model: This model makes no assumptions about addresses and sizes of sections. Currently GCC does not implement this model.

4.17.4. IA-64 Options

These are the -m options defined for the Intel IA-64 architecture.

-mbig-endian

Generate code for a big endian target. This is the default for HP-UX.

-mlittle-endian

Generate code for a little endian target. This is the default for AIX5 and Linux.

-mgnu-as, -mno-gnu-as

Generate (or don't) code for the GNU assembler. This is the default.

-mgnu-ld, -mno-gnu-ld

Generate (or don't) code for the GNU linker. This is the default.

-mno-pic

Generate code that does not use a global pointer register. The result is not position independent code, and violates the IA-64 ABI.

-mvolatile-asm-stop, -mno-volatile-asm-stop

Generate (or don't) a stop bit immediately before and after volatile asm statements.

-mb-step

Generate code that works around Itanium B step errata.

-mregister-names, -mno-register-names

Generate (or don't) in, loc, and out register names for the stacked registers. This may make assembler output more readable.

-mno-sdata, -msdata

Disable (or enable) optimizations that use the small data section. This may be useful for working around optimizer bugs.

-mconstant-gp

Generate code that uses a single constant global pointer value. This is useful when compiling kernel code.

-mauto-pic

Generate code that is self-relocatable. This implies -mconstant-gp. This is useful when compiling firmware code.

-minline-float-divide-min-latency

Generate code for inline divides of floating point values using the minimum latency algorithm.

-minline-float-divide-max-throughput

Generate code for inline divides of floating point values using the maximum throughput algorithm.

-minline-int-divide-min-latency

Generate code for inline divides of integer values using the minimum latency algorithm.

-minline-int-divide-max-throughput

Generate code for inline divides of integer values using the maximum throughput algorithm.

-mno-dwarf2-asm, -mdwarf2-asm

Don't (or do) generate assembler code for the DWARF2 line number debugging info. This may be useful when not using the GNU assembler.

-mfixed-range=register-range

Generate code treating the given register range as fixed registers. A fixed register is one that the register allocator cannot use. This is useful when compiling kernel code. A register range is specified as two registers separated by a dash. Multiple register ranges can be specified, separated by commas.

-mearly-stop-bits, -mno-early-stop-bits

Allow stop bits to be placed earlier than immediately preceding the instruction that triggered the stop bit. This can improve instruction scheduling, but does not always do so.

4.17.5. S/390 and zSeries Options

These are the -m options defined for the S/390 and zSeries architecture.

-mhard-float, -msoft-float

Use (do not use) the hardware floating-point instructions and registers for floating-point operations. When -msoft-float is specified, functions in libgcc.a will be used to perform floating-point operations. When -mhard-float is specified, the compiler generates IEEE floating-point instructions. This is the default.

-mbackchain, -mno-backchain

Generate (or do not generate) code which maintains an explicit backchain within the stack frame that points to the caller's frame. This is currently needed to allow debugging. The default is to generate the backchain.

-msmall-exec, -mno-small-exec

Generate (or do not generate) code using the bras instruction to do subroutine calls. This only works reliably if the total executable size does not exceed 64k. The default is to use the basr instruction instead, which does not have this limitation.

-m64, -m31

When -m31 is specified, generate code compliant to the Linux for S/390 ABI. When -m64 is specified, generate code compliant to the Linux for zSeries ABI. This allows GCC in particular to generate 64-bit instructions. For the s390 targets, the default is -m31, while the s390x targets default to -m64.

-mzarch, -mesa

When -mzarch is specified, generate code using the instructions available on z/Architecture. When -mesa is specified, generate code using the instructions available on ESA/390. Note that -mesa is not possible with -m64. For the s390 targets, the default is -mesa, while the s390x targets default to -mzarch.

-mmvcle, -mno-mvcle

Generate (or do not generate) code using the mvcle instruction to perform block moves. When -mno-mvcle is specified, use a mvc loop instead. This is the default.

-mdebug, -mno-debug

Print (or do not print) additional debug information when compiling. The default is to not print debug information.

-march=arch

Generate code that will run on arch, which is the name of a system representing a certain processor type. Possible values for arch are g5, g6 and z900.

-mtune=arch

Tune to arch everything applicable about the generated code, except for the ABI and the set of available instructions. The list of arch values is the same as for -march.