3.10 Options That Control Optimization
These options control various sorts of optimizations.
Without any optimization option, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results. Statements are independent: if you stop the program with a breakpoint between statements, you can then assign a new value to any variable or change the program counter to any other statement in the function and get exactly the results you expect from the source code.
Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program.
The compiler performs optimization based on the knowledge it has of the program. Compiling multiple files at once to a single output file allows the compiler to use information gained from all of the files when compiling each of them.
Not all optimizations are controlled directly by a flag. Only optimizations that have a flag are listed in this section.
Most optimizations are only enabled if an -O level is set on the command line. Otherwise they are disabled, even if individual optimization flags are specified.
Depending on the target and how GCC was configured, a slightly different set of optimizations may be enabled at each -O level than those listed here. You can invoke GCC with -Q --help=optimizers to find out the exact set of optimizations that are enabled at each level. See Overall Options, for examples.
-O
-O1
- Optimize. Optimizing compilation takes somewhat more time, and a lot more memory for a large function.
With -O, the compiler tries to reduce code size and execution time, without performing any optimizations that take a great deal of compilation time.
-O turns on the following optimization flags:
-fauto-inc-dec -fcompare-elim -fcprop-registers -fdce -fdefer-pop -fdelayed-branch -fdse -fguess-branch-probability -fif-conversion2 -fif-conversion -fipa-pure-const -fipa-profile -fipa-reference -fmerge-constants -fsplit-wide-types -ftree-bit-ccp -ftree-builtin-call-dce -ftree-ccp -fssa-phiopt -ftree-ch -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre -ftree-phiprop -ftree-slsr -ftree-sra -ftree-pta -ftree-ter -funit-at-a-time
-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.
-O2
- Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. As compared to
-O, this option increases both compilation time and the performance of the generated code.
-O2 turns on all optimization flags specified by -O. It also turns on the following optimization flags:
-fthread-jumps -falign-functions -falign-jumps -falign-loops -falign-labels -fcaller-saves -fcrossjumping -fcse-follow-jumps -fcse-skip-blocks -fdelete-null-pointer-checks -fdevirtualize -fdevirtualize-speculatively -fexpensive-optimizations -fgcse -fgcse-lm -fhoist-adjacent-loads -finline-small-functions -findirect-inlining -fipa-sra -fisolate-erroneous-paths-dereference -foptimize-sibling-calls -fpartial-inlining -fpeephole2 -freorder-blocks -freorder-functions -frerun-cse-after-loop -fsched-interblock -fsched-spec -fschedule-insns -fschedule-insns2 -fstrict-aliasing -fstrict-overflow -ftree-switch-conversion -ftree-tail-merge -ftree-pre -ftree-vrp
Please note the warning under -fgcse about invoking -O2 on programs that use computed gotos.
-O3
- Optimize yet more. -O3 turns on all optimizations specified by
-O2 and also turns on the -finline-functions,
-funswitch-loops, -fpredictive-commoning,
-fgcse-after-reload, -ftree-loop-vectorize,
-ftree-slp-vectorize, -fvect-cost-model,
-ftree-partial-pre and -fipa-cp-clone options.
-O0
- Reduce compilation time and make debugging produce the expected results. This is the default.
-Os
- Optimize for size. -Os enables all
-O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.
-Os disables the following optimization flags:
-falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks -freorder-blocks-and-partition -fprefetch-loop-arrays
-Ofast
- Disregard strict standards compliance.
-Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on
-ffast-math and the Fortran-specific
-fno-protect-parens and -fstack-arrays.
-Og
- Optimize debugging experience.
-Og enables optimizations that do not interfere with debugging. It should be the optimization level of choice for the standard edit-compile-debug cycle, offering a reasonable level of optimization while maintaining fast compilation
and a good debugging experience.
If you use multiple -O options, with or without level numbers, the last such option is the one that is effective.
Options of the form -fflag specify machine-independent flags. Most flags have both positive and negative forms; the negative form of -ffoo is -fno-foo. In the table below, only one of the forms is listed—the one you typically use. You can figure out the other form by either removing ‘no-’ or adding it.
The following options control specific optimizations. They are either activated by -O options or are related to ones that are. You can use the following flags in the rare cases when “fine-tuning” of optimizations to be performed is desired.
-fno-defer-pop
- Always pop the arguments to each function call as soon as that function returns. For machines that must pop arguments after a function call, the compiler normally lets arguments accumulate on the stack
for several function calls and pops them all at once.
Disabled at levels -O, -O2, -O3, -Os.
-fforward-propagate
- Perform a forward propagation pass on RTL. The pass tries to combine two instructions and checks if the result can be simplified. If loop unrolling is active, two passes are performed and the second
is scheduled after loop unrolling.
This option is enabled by default at optimization levels -O, -O2, -O3, -Os.
-ffp-contract=style
- -ffp-contract=off disables floating-point expression contraction.
-ffp-contract=fast enables floating-point expression contraction such as forming of fused multiply-add operations if the target has native support for them.
-ffp-contract=on enables floating-point expression contraction if allowed by the language standard. This is currently not implemented and treated equal to
-ffp-contract=off.
The default is -ffp-contract=fast.
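For illustration only (this fragment is not from the manual; the function name is made up), a contraction candidate looks like the following:

     /* Hypothetical example: with -ffp-contract=fast, the expression
        a * b + c may be emitted as a single fused multiply-add on targets
        that provide one, skipping the intermediate rounding of a * b.
        With -ffp-contract=off the multiply and add stay separate.  */
     double muladd (double a, double b, double c)
     {
       return a * b + c;
     }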
-fomit-frame-pointer
- Don"t keep the frame pointer in a register for functions that don"t need one. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available
in many functions. It also makes debugging impossible on some machines.
On some machines, such as the VAX, this flag has no effect, because the standard calling sequence automatically handles the frame pointer and nothing is saved by pretending it doesn"t exist. The machine-description macro
FRAME_POINTER_REQUIRED
controls whether a target machine supports this flag. See Register Usage.Starting with GCC version 4.6, the default setting (when not optimizing for size) for 32-bit GNU/Linux x86 and 32-bit Darwin x86 targets has been changed to -fomit-frame-pointer. The default can be reverted to -fno-omit-frame-pointer by configuring GCC with the --enable-frame-pointer configure option.
Enabled at levels -O, -O2, -O3, -Os.
-foptimize-sibling-calls
- Optimize sibling and tail recursive calls.
Enabled at levels -O2, -O3, -Os.
-fno-inline
- Do not expand any functions inline apart from those marked with the always_inline attribute. This is the default when not optimizing.
Single functions can be exempted from inlining by marking them with the noinline attribute.
-finline-small-functions
- Integrate functions into their callers when their body is smaller than expected function call code (so overall size of program gets smaller). The compiler heuristically decides which functions
are simple enough to be worth integrating in this way. This inlining applies to all functions, even those not declared inline.
Enabled at level -O2.
-findirect-inlining
- Also inline indirect calls that are discovered to be known at compile time thanks to previous inlining. This option has an effect only when inlining itself is turned on by the
-finline-functions or -finline-small-functions options.
Enabled at level -O2.
-finline-functions
- Consider all functions for inlining, even if they are not declared inline. The compiler heuristically decides which functions are worth integrating in this way.
If all calls to a given function are integrated, and the function is declared static, then the function is normally not output as assembler code in its own right.
Enabled at level -O3.
-finline-functions-called-once
- Consider all static functions called once for inlining into their caller even if they are not marked inline. If a call to a given function is integrated, then the function is not output as assembler code in its own right.
Enabled at levels -O1, -O2, -O3 and -Os.
-fearly-inlining
- Inline functions marked by always_inline and functions whose body seems smaller than the function call overhead early before doing -fprofile-generate instrumentation and real inlining pass. Doing so makes profiling significantly cheaper and usually inlining faster on programs having large chains of nested wrapper functions.
Enabled by default.
-fipa-sra
- Perform interprocedural scalar replacement of aggregates, removal of unused parameters and replacement of parameters passed by reference by parameters passed by value.
Enabled at levels -O2, -O3 and -Os.
-finline-limit=n
- By default, GCC limits the size of functions that can be inlined. This flag allows coarse control of this limit. n is the size of functions that can be inlined in number of pseudo instructions.
Inlining is actually controlled by a number of parameters, which may be specified individually by using --param name=value. The -finline-limit=n option sets some of these parameters as follows:
max-inline-insns-single
- is set to n/2.
max-inline-insns-auto
- is set to n/2.
See below for documentation of the individual parameters controlling inlining and for the defaults of these parameters.
Note: there may be no value to -finline-limit that results in default behavior.
Note: a pseudo instruction represents, in this particular context, an abstract measurement of a function's size. In no way does it represent a count of assembly instructions and as such its exact meaning might change from one release to another.
-fno-keep-inline-dllexport
- This is a more fine-grained version of -fkeep-inline-functions, which applies only to functions that are declared using the dllexport attribute or declspec (See Declaring Attributes of Functions.)
-fkeep-inline-functions
- In C, emit static functions that are declared inline into the object file, even if the function has been inlined into all of its callers. This switch does not affect functions using the extern inline extension in GNU C90. In C++, emit any and all inline functions into the object file.
-fkeep-static-consts
- Emit variables declared static const when optimization isn't turned on, even if the variables aren't referenced.
GCC enables this option by default. If you want to force the compiler to check if a variable is referenced, regardless of whether or not optimization is turned on, use the -fno-keep-static-consts option.
-fmerge-constants
- Attempt to merge identical constants (string constants and floating-point constants) across compilation units.
This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.
Enabled at levels -O, -O2, -O3, -Os.
-fmerge-all-constants
- Attempt to merge identical constants and identical variables.
This option implies -fmerge-constants. In addition to -fmerge-constants this considers e.g. even constant initialized arrays or initialized constant variables with integral or floating-point types. Languages like C or C++ require each variable, including multiple instances of the same variable in recursive calls, to have distinct locations, so using this option results in non-conforming behavior.
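As a hypothetical illustration (not part of the manual), the non-conformance can matter to code that relies on distinct objects having distinct addresses:

     /* Hypothetical example: with -fmerge-all-constants, the two constant
        arrays below have identical contents and may be merged into one
        object, so the comparison that a conforming implementation must
        make nonzero can instead evaluate to 0.  */
     static const int a[3] = { 1, 2, 3 };
     static const int b[3] = { 1, 2, 3 };

     int distinct (void)
     {
       return a != b;   /* may be 0 under -fmerge-all-constants */
     }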
-fmodulo-sched
- Perform swing modulo scheduling immediately before the first scheduling pass. This pass looks at innermost loops and reorders their instructions by overlapping different iterations.
-fmodulo-sched-allow-regmoves
- Perform more aggressive SMS-based modulo scheduling with register moves allowed. By setting this flag certain anti-dependence edges are deleted, which triggers the generation of reg-moves
based on the life-range analysis. This option is effective only with -fmodulo-sched enabled.
-fno-branch-count-reg
- Do not use “decrement and branch” instructions on a count register, but instead generate a sequence of instructions that decrement a register, compare it against zero, then branch based upon the
result. This option is only meaningful on architectures that support such instructions, which include x86, PowerPC, IA-64 and S/390.
The default is -fbranch-count-reg.
-fno-function-cse
- Do not put function addresses in registers; make each instruction that calls a constant function contain the function's address explicitly.
This option results in less efficient code, but some strange hacks that alter the assembler output may be confused by the optimizations performed when this option is not used.
The default is -ffunction-cse.
-fno-zero-initialized-in-bss
- If the target supports a BSS section, GCC by default puts variables that are initialized to zero into BSS. This can save space in the resulting code.
This option turns off this behavior because some programs explicitly rely on variables going to the data section—e.g., so that the resulting executable can find the beginning of that section and/or make assumptions based on that.
The default is -fzero-initialized-in-bss.
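A small sketch (hypothetical, not from the manual) of which objects the option affects:

     /* Hypothetical example: by default both zero-initialized globals go
        into the BSS section; with -fno-zero-initialized-in-bss they are
        placed in the data section, like the nonzero one.  */
     int zero_explicit = 0;   /* BSS by default */
     int zero_implicit;       /* BSS by default */
     int nonzero = 42;        /* always in the data section */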
-fthread-jumps
- Perform optimizations that check to see if a jump branches to a location where another comparison subsumed by the first is found. If so, the first branch is redirected to either the destination of the
second branch or a point immediately following it, depending on whether the condition is known to be true or false.
Enabled at levels -O2, -O3, -Os.
-fsplit-wide-types
- When using a type that occupies multiple registers, such as long long on a 32-bit system, split the registers apart and allocate them independently. This normally generates better code for those types, but may make debugging more difficult.
Enabled at levels -O, -O2, -O3, -Os.
-fcse-follow-jumps
- In common subexpression elimination (CSE), scan through jump instructions when the target of the jump is not reached by any other path. For example, when CSE encounters an if statement with an else clause, CSE follows the jump when the condition tested is false.
Enabled at levels -O2, -O3, -Os.
-fcse-skip-blocks
- This is similar to -fcse-follow-jumps, but causes CSE to follow jumps that conditionally skip over blocks. When CSE encounters a simple if statement with no else clause, -fcse-skip-blocks causes CSE to follow the jump around the body of the if.
Enabled at levels -O2, -O3, -Os.
-frerun-cse-after-loop
- Re-run common subexpression elimination after loop optimizations are performed.
Enabled at levels -O2, -O3, -Os.
-fgcse
- Perform a global common subexpression elimination pass. This pass also performs global constant and copy propagation.
Note: When compiling a program using computed gotos, a GCC extension, you may get better run-time performance if you disable the global common subexpression elimination pass by adding -fno-gcse to the command line.
Enabled at levels -O2, -O3, -Os.
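For illustration (a hypothetical GNU C fragment, not from the manual), the note above concerns code like the following computed-goto dispatch loop, for which adding -fno-gcse is sometimes worth trying:

     /* Hypothetical example of the GCC computed-goto extension: a small
        threaded-dispatch loop.  Such code may run faster when compiled
        with -O2 -fno-gcse, as the note above suggests.  */
     int run (const unsigned char *op)
     {
       static void *dispatch[] = { &&do_halt, &&do_inc };
       int acc = 0;
       goto *dispatch[*op];
     do_inc:
       acc++;
       op++;
       goto *dispatch[*op];
     do_halt:
       return acc;
     }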
-fgcse-lm
- When -fgcse-lm is enabled, global common subexpression elimination attempts to move loads that are only killed by stores into themselves. This allows a loop containing
a load/store sequence to be changed to a load outside the loop, and a copy/store within the loop.
Enabled by default when -fgcse is enabled.
-fgcse-sm
- When -fgcse-sm is enabled, a store motion pass is run after global common subexpression elimination. This pass attempts to move stores out of loops. When used in conjunction
with -fgcse-lm, loops containing a load/store sequence can be changed to a load before the loop and a store after the loop.
Not enabled at any optimization level.
-fgcse-las
- When -fgcse-las is enabled, the global common subexpression elimination pass eliminates redundant loads that come after stores to the same memory location (both partial
and full redundancies).
Not enabled at any optimization level.
-fgcse-after-reload
- When -fgcse-after-reload is enabled, a redundant load elimination pass is performed after reload. The purpose of this pass is to clean up redundant spilling.
-faggressive-loop-optimizations
- This option tells the loop optimizer to use language constraints to derive bounds for the number of iterations of a loop. This assumes that loop code does not invoke undefined behavior
by for example causing signed integer overflows or out-of-bound array accesses. The bounds for the number of iterations of a loop are used to guide loop unrolling and peeling and loop exit test optimizations. This option is enabled by default.
-funsafe-loop-optimizations
- This option tells the loop optimizer to assume that loop indices do not overflow, and that loops with nontrivial exit condition are not infinite. This enables a wider range of loop optimizations
even if the loop optimizer itself cannot prove that these assumptions are valid. If you use
-Wunsafe-loop-optimizations, the compiler warns you if it finds this kind of loop.
-fcrossjumping
- Perform cross-jumping transformation. This transformation unifies equivalent code and saves code size. The resulting code may or may not perform better than without cross-jumping.
Enabled at levels -O2, -O3, -Os.
-fauto-inc-dec
- Combine increments or decrements of addresses with memory accesses. This pass is always skipped on architectures that do not have instructions to support this. Enabled by default at
-O and higher on architectures that support this.
-fdce
- Perform dead code elimination (DCE) on RTL. Enabled by default at
-O and higher.
-fdse
- Perform dead store elimination (DSE) on RTL. Enabled by default at
-O and higher.
-fif-conversion
- Attempt to transform conditional jumps into branch-less equivalents. This includes use of conditional moves, min, max, set flags and abs instructions, and some tricks doable by standard arithmetic. The use of conditional execution on chips where it is available is controlled by if-conversion2.
Enabled at levels -O, -O2, -O3, -Os.
-fif-conversion2
- Use conditional execution (where available) to transform conditional jumps into branch-less equivalents.
Enabled at levels -O, -O2, -O3, -Os.
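As an illustrative sketch (not taken from the manual; the function name is made up), if-conversion targets code such as:

     /* Hypothetical example: the branch below can be replaced by a
        branch-less conditional move, or by a max instruction on targets
        that have one.  */
     int clamp_low (int x, int lo)
     {
       if (x < lo)
         x = lo;
       return x;
     }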
-fdeclone-ctor-dtor
- The C++ ABI requires multiple entry points for constructors and destructors: one for a base subobject, one for a complete object, and one for a virtual destructor that calls operator delete afterwards.
For a hierarchy with virtual bases, the base and complete variants are clones, which means two copies of the function. With this option, the base and complete variants are changed to be thunks that call a common implementation.
Enabled by -Os.
-fdelete-null-pointer-checks
- Assume that programs cannot safely dereference null pointers, and that no code or data element resides there. This enables simple constant folding optimizations at all optimization levels.
In addition, other optimization passes in GCC use this flag to control global dataflow analyses that eliminate useless checks for null pointers; these assume that if a pointer is checked after it has already been dereferenced, it cannot be null.
Note however that in some environments this assumption is not true. Use -fno-delete-null-pointer-checks to disable this optimization for programs that depend on that behavior.
Some targets, especially embedded ones, disable this option at all levels. Otherwise it is enabled at all levels: -O0, -O1, -O2, -O3, -Os. Passes that use the information are enabled independently at different optimization levels.
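For illustration (a hypothetical example, not part of the manual), the kind of check this assumption lets GCC remove looks like this:

     #include <stddef.h>

     /* Hypothetical example: because *p is dereferenced unconditionally
        first, the later "p == NULL" test may be deleted as dead under
        -fdelete-null-pointer-checks.  */
     int first_or_minus_one (int *p)
     {
       int v = *p;          /* unconditional dereference */
       if (p == NULL)       /* this check may be removed */
         return -1;
       return v;
     }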
-fdevirtualize
- Attempt to convert calls to virtual functions to direct calls. This is done both within a procedure and interprocedurally as part of indirect inlining (-findirect-inlining) and interprocedural constant propagation (-fipa-cp).
Enabled at levels -O2, -O3, -Os.
-fdevirtualize-speculatively
- Attempt to convert calls to virtual functions to speculative direct calls. Based on the analysis of the type inheritance graph, determine for a given call the set of likely targets. If the set is small, preferably of size 1, change the call into a conditional deciding between a direct and an indirect call. The speculative calls enable more optimizations, such as inlining. When they seem useless after further optimization, they are converted back into their original form.
-fexpensive-optimizations
- Perform a number of minor optimizations that are relatively expensive.
Enabled at levels -O2, -O3, -Os.
-free
- Attempt to remove redundant extension instructions. This is especially helpful for the x86-64 architecture, which implicitly zero-extends in 64-bit registers after writing to their lower 32-bit half.
Enabled for Alpha, AArch64 and x86 at levels -O2, -O3, -Os.
-flive-range-shrinkage
- Attempt to decrease register pressure through register live range shrinkage. This is helpful for fast processors with small or moderate size register sets.
-fira-algorithm=algorithm
- Use the specified coloring algorithm for the integrated register allocator. The algorithm argument can be ‘priority’, which specifies Chow's priority coloring, or ‘CB’, which specifies Chaitin-Briggs coloring. Chaitin-Briggs coloring is not implemented
for all architectures, but for those targets that do support it, it is the default because it generates better code.
-fira-region=region
- Use specified regions for the integrated register allocator. The region argument should be one of the following:
- ‘all’
- Use all loops as register allocation regions. This can give the best results for machines with a small and/or irregular register set.
- ‘mixed’
- Use all loops except for loops with small register pressure as the regions. This value usually gives the best results in most cases and for most architectures, and is enabled by default when compiling with optimization for speed (-O,
-O2, ...).
- ‘one’
- Use all functions as a single region. This typically results in the smallest code size, and is enabled by default for -Os or -O0.
-fira-hoist-pressure
- Use IRA to evaluate register pressure in the code hoisting pass for decisions to hoist expressions. This option usually results in smaller code, but it can slow the compiler down.
This option is enabled at level -Os for all targets.
-fira-loop-pressure
- Use IRA to evaluate register pressure in loops for decisions to move loop invariants. This option usually results in generation of faster and smaller code on machines with large register files (>=
32 registers), but it can slow the compiler down.
This option is enabled at level -O3 for some targets.
-fno-ira-share-save-slots
- Disable sharing of stack slots used for saving call-used hard registers living through a call. Each hard register gets a separate stack slot, and as a result function stack frames are larger.
-fno-ira-share-spill-slots
- Disable sharing of stack slots allocated for pseudo-registers. Each pseudo-register that does not get a hard register gets a separate stack slot, and as a result function stack frames are
larger.
-fira-verbose=n
- Control the verbosity of the dump file for the integrated register allocator. The default value is 5. If the value n is greater than or equal to 10, the dump output is sent to stderr using the same format as n minus 10.
-fdelayed-branch
- If supported for the target machine, attempt to reorder instructions to exploit instruction slots available after delayed branch instructions.
Enabled at levels -O, -O2, -O3, -Os.
-fschedule-insns
- If supported for the target machine, attempt to reorder instructions to eliminate execution stalls due to required data being unavailable. This helps machines that have slow floating point or memory
load instructions by allowing other instructions to be issued until the result of the load or floating-point instruction is required.
Enabled at levels -O2, -O3.
-fschedule-insns2
- Similar to -fschedule-insns, but requests an additional pass of instruction scheduling after register allocation has been done. This is especially useful on
machines with a relatively small number of registers and where memory load instructions take more than one cycle.
Enabled at levels -O2, -O3, -Os.
-fno-sched-interblock
- Don"t schedule instructions across basic blocks. This is normally enabled by default when scheduling before register allocation, i.e. with
-fschedule-insns or at -O2 or higher.
-fno-sched-spec
- Don"t allow speculative motion of non-load instructions. This is normally enabled by default when scheduling before register allocation, i.e. with
-fschedule-insns or at -O2 or higher.
-fsched-pressure
- Enable register pressure sensitive insn scheduling before register allocation. This only makes sense when scheduling before register allocation is enabled, i.e. with
-fschedule-insns or at -O2 or higher. Usage of this option can improve the generated code and decrease its size by preventing register pressure increase above the number of available
hard registers and subsequent spills in register allocation.
-fsched-spec-load
- Allow speculative motion of some load instructions. This only makes sense when scheduling before register allocation, i.e. with
-fschedule-insns or at -O2 or higher.
-fsched-spec-load-dangerous
- Allow speculative motion of more load instructions. This only makes sense when scheduling before register allocation, i.e. with
-fschedule-insns or at -O2 or higher.
-fsched-stalled-insns
-fsched-stalled-insns=n
- Define how many insns (if any) can be moved prematurely from the queue of stalled insns into the ready list during the second scheduling pass.
-fno-sched-stalled-insns means that no insns are moved prematurely,
-fsched-stalled-insns=0 means there is no limit on how many queued insns can be moved prematurely.
-fsched-stalled-insns without a value is equivalent to
-fsched-stalled-insns=1.
-fsched-stalled-insns-dep
-fsched-stalled-insns-dep=n
- Define how many insn groups (cycles) are examined for a dependency on a stalled insn that is a candidate for premature removal from the queue of stalled insns. This has an effect only during
the second scheduling pass, and only if -fsched-stalled-insns is used.
-fno-sched-stalled-insns-dep is equivalent to
-fsched-stalled-insns-dep=0. -fsched-stalled-insns-dep without a value is equivalent to
-fsched-stalled-insns-dep=1.
-fsched2-use-superblocks
- When scheduling after register allocation, use superblock scheduling. This allows motion across basic block boundaries, resulting in faster schedules. This option is experimental, as not all
machine descriptions used by GCC model the CPU closely enough to avoid unreliable results from the algorithm.
This only makes sense when scheduling after register allocation, i.e. with -fschedule-insns2 or at -O2 or higher.
-fsched-group-heuristic
- Enable the group heuristic in the scheduler. This heuristic favors the instruction that belongs to a schedule group. This is enabled by default when scheduling is enabled, i.e. with
-fschedule-insns or -fschedule-insns2 or at
-O2 or higher.
-fsched-critical-path-heuristic
- Enable the critical-path heuristic in the scheduler. This heuristic favors instructions on the critical path. This is enabled by default when scheduling is enabled, i.e. with
-fschedule-insns or -fschedule-insns2 or at
-O2 or higher.
-fsched-spec-insn-heuristic
- Enable the speculative instruction heuristic in the scheduler. This heuristic favors speculative instructions with greater dependency weakness. This is enabled by default when scheduling
is enabled, i.e. with -fschedule-insns or
-fschedule-insns2 or at -O2 or higher.
-fsched-rank-heuristic
- Enable the rank heuristic in the scheduler. This heuristic favors the instruction belonging to a basic block with greater size or frequency. This is enabled by default when scheduling is enabled,
i.e. with -fschedule-insns or -fschedule-insns2 or at
-O2 or higher.
-fsched-last-insn-heuristic
- Enable the last-instruction heuristic in the scheduler. This heuristic favors the instruction that is less dependent on the last instruction scheduled. This is enabled by default when scheduling
is enabled, i.e. with -fschedule-insns or
-fschedule-insns2 or at -O2 or higher.
-fsched-dep-count-heuristic
- Enable the dependent-count heuristic in the scheduler. This heuristic favors the instruction that has more instructions depending on it. This is enabled by default when scheduling is enabled,
i.e. with -fschedule-insns or -fschedule-insns2 or at
-O2 or higher.
-freschedule-modulo-scheduled-loops
- Modulo scheduling is performed before traditional scheduling. If a loop is modulo scheduled, later scheduling passes may change its schedule. Use this option to control that behavior.
-fselective-scheduling
- Schedule instructions using selective scheduling algorithm. Selective scheduling runs instead of the first scheduler pass.
-fselective-scheduling2
- Schedule instructions using selective scheduling algorithm. Selective scheduling runs instead of the second scheduler pass.
-fsel-sched-pipelining
- Enable software pipelining of innermost loops during selective scheduling. This option has no effect unless one of
-fselective-scheduling or -fselective-scheduling2 is turned on.
-fsel-sched-pipelining-outer-loops
- When pipelining loops during selective scheduling, also pipeline outer loops. This option has no effect unless
-fsel-sched-pipelining is turned on.
-fsemantic-interposition
- Some object formats, like ELF, allow interposing of symbols by the dynamic linker. This means that for symbols exported from the DSO, the compiler cannot perform interprocedural propagation, inlining
and other optimizations in anticipation that the function or variable in question may change. While this feature is useful, for example, to rewrite memory allocation functions by a debugging implementation, it is expensive in terms of code quality. With
-fno-semantic-interposition the compiler assumes that if interposition happens for functions, the overriding function will have precisely the same semantics (and side effects). Similarly if interposition happens for variables,
the constructor of the variable will be the same. The flag has no effect for functions explicitly declared inline (where interposition changing semantics is never allowed) and for symbols explicitly declared weak.
-fshrink-wrap
- Emit function prologues only before parts of the function that need it, rather than at the top of the function. This flag is enabled by default at
-O and higher.
-fcaller-saves
- Enable allocation of values to registers that are clobbered by function calls, by emitting extra instructions to save and restore the registers around such calls. Such allocation is done only when it
seems to result in better code.
This option is always enabled by default on certain machines, usually those which have no call-preserved registers to use instead.
Enabled at levels -O2, -O3, -Os.
-fcombine-stack-adjustments
- Tracks stack adjustments (pushes and pops) and stack memory references and then tries to find ways to combine them.
Enabled by default at -O1 and higher.
-fuse-caller-save
- Use caller save registers for allocation if those registers are not used by any called function. In that case it is not necessary to save and restore them around calls. This is only possible if called functions are part of the same compilation unit as the current
function and they are compiled before it.
Enabled at levels -O2, -O3, -Os.
-fconserve-stack
- Attempt to minimize stack usage. The compiler attempts to use less stack space, even if that makes the program slower. This option implies setting the
large-stack-frame parameter to 100 and the
large-stack-frame-growth parameter to 400.
-ftree-reassoc
- Perform reassociation on trees. This flag is enabled by default at
-O and higher.
-ftree-pre
- Perform partial redundancy elimination (PRE) on trees. This flag is enabled by default at
-O2 and -O3.
-ftree-partial-pre
- Make partial redundancy elimination (PRE) more aggressive. This flag is enabled by default at
-O3.
-ftree-forwprop
- Perform forward propagation on trees. This flag is enabled by default at
-O and higher.
-ftree-fre
- Perform full redundancy elimination (FRE) on trees. The difference between FRE and PRE is that FRE only considers expressions that are computed on all paths leading to the redundant computation. This analysis
is faster than PRE, though it exposes fewer redundancies. This flag is enabled by default at
-O and higher.
-ftree-phiprop
- Perform hoisting of loads from conditional pointers on trees. This pass is enabled by default at
-O and higher.
-fhoist-adjacent-loads
- Speculatively hoist loads from both branches of an if-then-else if the loads are from adjacent locations in the same structure and the target architecture has a conditional move instruction. This
flag is enabled by default at -O2 and higher.
-ftree-copy-prop
- Perform copy propagation on trees. This pass eliminates unnecessary copy operations. This flag is enabled by default at
-O and higher.
-fipa-pure-const
- Discover which functions are pure or constant. Enabled by default at
-O and higher.
-fipa-reference
- Discover which static variables do not escape the compilation unit. Enabled by default at
-O and higher.
-fipa-pta
- Perform interprocedural pointer analysis and interprocedural modification and reference analysis. This option can cause excessive memory and compile-time usage on large compilation units. It is not enabled
by default at any optimization level.
-fipa-profile
- Perform interprocedural profile propagation. The functions called only from cold functions are marked as cold. Also functions executed once (such as cold, noreturn, static constructors or destructors) are identified. Cold functions and loop-less parts of functions executed once are then optimized for size.
Enabled by default at -O and higher.
-fipa-cp
- Perform interprocedural constant propagation. This optimization analyzes the program to determine when values passed to functions are constants and then optimizes accordingly. This optimization can substantially
increase performance if the application has constants passed to functions. This flag is enabled by default at
-O2, -Os and
-O3.
-fipa-cp-clone
- Perform function cloning to make interprocedural constant propagation stronger. When enabled, interprocedural constant propagation performs function cloning when an externally visible function can be called with constant arguments. Because this optimization can create multiple copies of functions, it may significantly increase code size (see --param ipcp-unit-growth=value). This flag is enabled by default at
-O3.
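As a rough sketch (hypothetical code, not from the manual), the call pattern that interprocedural constant propagation and cloning can exploit looks like this:

     /* Hypothetical example: every call site passes scale = 16, so with
        -fipa-cp (and -fipa-cp-clone for externally visible functions) GCC
        may propagate the constant into a specialized copy of scale_by and
        fold the multiplication into a shift.  */
     static int scale_by (int x, int scale)
     {
       return x * scale;
     }

     int f (int a) { return scale_by (a, 16); }
     int g (int b) { return scale_by (b, 16); }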
-fisolate-erroneous-paths-dereference
- Detect paths which trigger erroneous or undefined behaviour due to dereferencing a NULL pointer. Isolate those paths from the main control flow and turn the statement with erroneous or undefined behaviour into a trap.
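For illustration (a hypothetical fragment, not part of the manual), a path of the kind this option isolates:

     #include <stddef.h>

     /* Hypothetical example: on the path where p == NULL the final
        dereference is erroneous; -fisolate-erroneous-paths-dereference
        splits that path off and replaces the dereference on it with a
        trap.  */
     int length_plus_flag (int *p, int flag)
     {
       int n = 0;
       if (p == NULL)
         n = flag;      /* on this path p is known to be NULL ...        */
       return n + *p;   /* ... so this dereference traps on that path    */
     }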
-fisolate-erroneous-paths-attribute
- Detect paths which trigger erroneous or undefined behaviour due to a NULL value being used in a way which is forbidden by a returns_nonnull or nonnull attribute. Isolate those paths from the main control flow and turn the statement with erroneous or undefined behaviour into a trap. This is not currently enabled, but may be enabled by -O2 in the future.
-ftree-sink
- Perform forward store motion on trees. This flag is enabled by default at
-O and higher.
-ftree-bit-ccp
- Perform sparse conditional bit constant propagation on trees and propagate pointer alignment information. This pass only operates on local scalar variables and is enabled by default at
-O and higher. It requires that -ftree-ccp is enabled.
-ftree-ccp
- Perform sparse conditional constant propagation (CCP) on trees. This pass only operates on local scalar variables and is enabled by default at
-O and higher.
-fssa-phiopt
- Perform pattern matching on SSA PHI nodes to optimize conditional code. This pass is enabled by default at
-O and higher.
-ftree-switch-conversion
- Perform conversion of simple initializations in a switch to initializations from a scalar array. This flag is enabled by default at
-O2 and higher.
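A sketch of the kind of switch this conversion targets (hypothetical example, not from the manual):

     /* Hypothetical example: the switch only assigns simple constants, so
        with -ftree-switch-conversion it may be rewritten as a load from a
        small static array indexed by `code'.  */
     int weekday_hours (int code)
     {
       int hours;
       switch (code)
         {
         case 0:  hours = 8; break;
         case 1:  hours = 8; break;
         case 2:  hours = 6; break;
         case 3:  hours = 4; break;
         default: hours = 0; break;
         }
       return hours;
     }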
-ftree-tail-merge
- Look for identical code sequences. When found, replace one with a jump to the other. This optimization is known as tail merging or cross jumping. This flag is enabled by default at
-O2 and higher. The compilation time in this pass can be limited using
max-tail-merge-comparisons parameter and
max-tail-merge-iterations parameter.
-ftree-dce
- Perform dead code elimination (DCE) on trees. This flag is enabled by default at
-O and higher.
-ftree-builtin-call-dce
- Perform conditional dead code elimination (DCE) for calls to built-in functions that may set errno but are otherwise side-effect free. This flag is enabled by default at -O2 and higher if -Os is not also specified.
-ftree-dominator-opts
- Perform a variety of simple scalar cleanups (constant/copy propagation, redundancy elimination, range propagation and expression simplification) based on a dominator tree traversal. This also
performs jump threading (to reduce jumps to jumps). This flag is enabled by default at
-O and higher.
-ftree-dse
- Perform dead store elimination (DSE) on trees. A dead store is a store into a memory location that is later overwritten by another store without any intervening loads. In this case the earlier store can
be deleted. This flag is enabled by default at -O and higher.
-ftree-ch
- Perform loop header copying on trees. This is beneficial since it increases effectiveness of code motion optimizations. It also saves one jump. This flag is enabled by default at
-O and higher. It is not enabled for
-Os, since it usually increases code size.
-ftree-loop-optimize
- Perform loop optimizations on trees. This flag is enabled by default at
-O and higher.
-ftree-loop-linear
- Perform loop interchange transformations on trees. Same as
-floop-interchange. To use this code transformation, GCC has to be configured with
--with-ppl and --with-cloog to enable the Graphite loop transformation infrastructure.
-floop-interchange
- Perform loop interchange transformations on loops. Interchanging two nested loops switches the inner and outer loops. For example, given a loop like:
     DO J = 1, M
       DO I = 1, N
         A(J, I) = A(J, I) * C
       ENDDO
     ENDDO
loop interchange transforms the loop as if it were written:
     DO I = 1, N
       DO J = 1, M
         A(J, I) = A(J, I) * C
       ENDDO
     ENDDO
which can be beneficial when N is larger than the caches, because in Fortran, the elements of an array are stored in memory contiguously by column, and the original loop iterates over rows, potentially creating at each access a cache miss. This optimization applies to all the languages supported by GCC and is not limited to Fortran. To use this code transformation, GCC has to be configured with --with-ppl and --with-cloog to enable the Graphite loop transformation infrastructure.
-floop-strip-mine
- Perform loop strip mining transformations on loops. Strip mining splits a loop into two nested loops. The outer loop has strides equal to the strip size and the inner loop has strides of the original
loop within a strip. The strip length can be changed using the loop-block-tile-size parameter. For example, given a loop like:
     DO I = 1, N
       A(I) = A(I) + C
     ENDDO
loop strip mining transforms the loop as if it were written:
     DO II = 1, N, 51
       DO I = II, min (II + 50, N)
         A(I) = A(I) + C
       ENDDO
     ENDDO
This optimization applies to all the languages supported by GCC and is not limited to Fortran. To use this code transformation, GCC has to be configured with --with-ppl and --with-cloog to enable the Graphite loop transformation infrastructure.
-floop-block
- Perform loop blocking transformations on loops. Blocking strip mines each loop in the loop nest such that the memory accesses of the element loops fit inside caches. The strip length can be changed using
the loop-block-tile-size parameter. For example, given a loop like:
     DO I = 1, N
       DO J = 1, M
         A(J, I) = B(I) + C(J)
       ENDDO
     ENDDO
loop blocking transforms the loop as if it were written:
     DO II = 1, N, 51
       DO JJ = 1, M, 51
         DO I = II, min (II + 50, N)
           DO J = JJ, min (JJ + 50, M)
             A(J, I) = B(I) + C(J)
           ENDDO
         ENDDO
       ENDDO
     ENDDO
which can be beneficial when M is larger than the caches, because the innermost loop iterates over a smaller amount of data which can be kept in the caches. This optimization applies to all the languages supported by GCC and is not limited to Fortran. To use this code transformation, GCC has to be configured with --with-ppl and --with-cloog to enable the Graphite loop transformation infrastructure.
-fgraphite-identity
- Enable the identity transformation for graphite. For every SCoP we generate the polyhedral representation and transform it back to gimple. Using
-fgraphite-identity we can check the costs or benefits of the GIMPLE -> GRAPHITE -> GIMPLE transformation. Some minimal optimizations are also performed by the code generator CLooG, like index splitting and dead code
elimination in loops.
-floop-nest-optimize
- Enable the ISL based loop nest optimizer. This is a generic loop nest optimizer based on the Pluto optimization algorithms. It calculates a loop structure optimized for data-locality and parallelism.
This option is experimental.
-floop-parallelize-all
- Use the Graphite data dependence analysis to identify loops that can be parallelized. Parallelize all the loops that can be analyzed to not contain loop carried dependences without checking that
it is profitable to parallelize the loops.
-fcheck-data-deps
- Compare the results of several data dependence analyzers. This option is used for debugging the data dependence analyzers.
-ftree-loop-if-convert
- Attempt to transform conditional jumps in the innermost loops to branch-less equivalents. The intent is to remove control-flow from the innermost loops in order to improve the ability of the vectorization pass to handle these loops. This is enabled by default
if vectorization is enabled.
-ftree-loop-if-convert-stores
- Attempt to also if-convert conditional jumps containing memory writes. This transformation can be unsafe for multi-threaded programs as it transforms conditional memory writes into unconditional memory writes. For example,
     for (i = 0; i < N; i++)
       if (cond)
         A[i] = expr;
is transformed to
     for (i = 0; i < N; i++)
       A[i] = cond ? expr : A[i];
potentially producing data races.
-ftree-loop-distribution
- Perform loop distribution. This flag can improve cache performance on big loop bodies and allow further loop optimizations, like parallelization or vectorization, to take place. For example, the loop
     DO I = 1, N
       A(I) = B(I) + C
       D(I) = E(I) * F
     ENDDO
is transformed to
     DO I = 1, N
       A(I) = B(I) + C
     ENDDO
     DO I = 1, N
       D(I) = E(I) * F
     ENDDO
-ftree-loop-distribute-patterns
- Perform loop distribution of patterns that can be code generated with calls to a library. This flag is enabled by default at
-O3.
This pass distributes the initialization loops and generates a call to memset zero. For example, the loop
     DO I = 1, N
       A(I) = 0
       B(I) = A(I) + I
     ENDDO
is transformed to
     DO I = 1, N
       A(I) = 0
     ENDDO
     DO I = 1, N
       B(I) = A(I) + I
     ENDDO
and the initialization loop is transformed into a call to memset zero.
-ftree-loop-im
- Perform loop invariant motion on trees. This pass moves only invariants that are hard to handle at RTL level (function calls, operations that expand to nontrivial sequences of insns). With
-funswitch-loops it also moves operands of conditions that are invariant out of the loop, so that we can use just trivial invariantness analysis in loop unswitching. The pass also includes store motion.
-ftree-loop-ivcanon
- Create a canonical counter for number of iterations in loops for which determining number of iterations requires complicated analysis. Later optimizations then may determine the number easily. Useful
especially in connection with unrolling.
-fivopts
- Perform induction variable optimizations (strength reduction, induction variable merging and induction variable elimination) on trees.
-ftree-parallelize-loops=n
- Parallelize loops, i.e., split their iteration space to run in n threads. This is only possible for loops whose iterations are independent and can be arbitrarily reordered. The optimization
is only profitable on multiprocessor machines, for loops that are CPU-intensive, rather than constrained e.g. by memory bandwidth. This option implies
-pthread, and thus is only supported on targets that have support for
-pthread.
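As an illustration under the stated assumptions (hypothetical code, not from the manual), a loop whose iterations are independent and therefore a candidate for this option:

     /* Hypothetical example: each iteration writes a distinct element and
        reads only its own inputs, so with -ftree-parallelize-loops=4 the
        iteration space may be split across four threads.  */
     void saxpy (int n, float a, const float *x, const float *y, float *out)
     {
       for (int i = 0; i < n; i++)
         out[i] = a * x[i] + y[i];
     }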
-ftree-pta
- Perform function-local points-to analysis on trees. This flag is enabled by default at
-O and higher.
-ftree-sra
- Perform scalar replacement of aggregates. This pass replaces structure references with scalars to prevent committing structures to memory too early. This flag is enabled by default at
-O and higher.
-ftree-copyrename
- Perform copy renaming on trees. This pass attempts to rename compiler temporaries to other variables at copy locations, usually resulting in variable names which more closely resemble the original
variables. This flag is enabled by default at -O and higher.
-ftree-coalesce-inlined-vars
- Tell the copyrename pass (see
-ftree-copyrename) to attempt to combine small user-defined variables too, but only if they were inlined from other functions. It is a more limited form of
-ftree-coalesce-vars. This may harm debug information of such inlined variables, but it will keep variables of the inlined-into function apart from each other, such that they are more likely to contain the expected values
in a debugging session. This was the default in GCC versions older than 4.7.
-ftree-coalesce-vars
- Tell the copyrename pass (see
-ftree-copyrename) to attempt to combine small user-defined variables too, instead of just compiler temporaries. This may severely limit the ability to debug an optimized program compiled with
-fno-var-tracking-assignments. In the negated form, this flag prevents SSA coalescing of user variables, including inlined ones. This option is enabled by default.
-ftree-ter
- Perform temporary expression replacement during the SSA->normal phase. Single use/single def temporaries are replaced at their use location with their defining expression. This results in non-GIMPLE code,
but gives the expanders much more complex trees to work on resulting in better RTL generation. This is enabled by default at
-O and higher.
-ftree-slsr
- Perform straight-line strength reduction on trees. This recognizes related expressions involving multiplications and replaces them by less expensive calculations when possible. This is enabled by default
at -O and higher.
-ftree-vectorize
- Perform vectorization on trees. This flag enables
-ftree-loop-vectorize and -ftree-slp-vectorize if not explicitly specified.
-ftree-loop-vectorize
- Perform loop vectorization on trees. This flag is enabled by default at
-O3 and when -ftree-vectorize is enabled.
-ftree-slp-vectorize
- Perform basic block vectorization on trees. This flag is enabled by default at
-O3 and when -ftree-vectorize is enabled.
-fvect-cost-model=model
- Alter the cost model used for vectorization. The model argument should be one of unlimited, dynamic or cheap. With the unlimited model the vectorized code-path is assumed to be profitable while with the dynamic model a runtime check will guard the vectorized code-path to enable it only for iteration counts that will likely execute faster than when executing the original scalar loop. The cheap model will disable vectorization of loops where doing so would be cost prohibitive, for example due to required runtime checks for data dependence or alignment, but otherwise is equal to the dynamic model. The default cost model depends on other optimization flags and is either dynamic or cheap.
-fsimd-cost-model=model
- Alter the cost model used for vectorization of loops marked with the OpenMP or Cilk Plus simd directive. The model argument should be one of unlimited, dynamic, cheap. All values of model have the same meaning as described in -fvect-cost-model and by default a cost model defined with -fvect-cost-model is used.
-ftree-vrp
- Perform Value Range Propagation on trees. This is similar to the constant propagation pass, but instead of values, ranges of values are propagated. This allows the optimizers to remove unnecessary range
checks like array bound checks and null pointer checks. This is enabled by default at
-O2 and higher. Null pointer check elimination is only done if
-fdelete-null-pointer-checks is enabled.
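A minimal sketch (hypothetical code, not part of the manual) of a range check that value range propagation can remove:

     /* Hypothetical example: inside the loop i is known to lie in
        [0, 15], so after inlining checked_get, -ftree-vrp may fold the
        bounds test away.  */
     static int checked_get (const int *a, int i)
     {
       if (i < 0 || i >= 16)   /* range check */
         return 0;
       return a[i];
     }

     int sum16 (const int *a)
     {
       int s = 0;
       for (int i = 0; i < 16; i++)
         s += checked_get (a, i);
       return s;
     }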
-ftracer
- Perform tail duplication to enlarge superblock size. This transformation simplifies the control flow of the function allowing other optimizations to do a better job.
-funroll-loops
- Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop.
-funroll-loops implies -frerun-cse-after-loop. This option makes code larger, and may or may not make it run faster.
-funroll-all-loops
- Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly.
-funroll-all-loops implies the same options as
-funroll-loops.
-fsplit-ivs-in-unroller
- Enables expression of values of induction variables in later iterations of the unrolled loop using the value in the first iteration. This breaks long dependency chains, thus improving efficiency
of the scheduling passes.
A combination of -fweb and CSE is often sufficient to obtain the same effect. However, that is not reliable in cases where the loop body is more complicated than a single basic block. It also does not work at all on some architectures due to restrictions in the CSE pass.
This optimization is enabled by default.
-fvariable-expansion-in-unroller
- With this option, the compiler creates multiple copies of some local variables when unrolling a loop, which can result in superior code.
-fpartial-inlining
- Inline parts of functions. This option has an effect only when inlining itself is turned on by the
-finline-functions or -finline-small-functions options.
Enabled at level -O2.
-fpredictive-commoning
- Perform predictive commoning optimization, i.e., reusing computations (especially memory loads and stores) performed in previous iterations of loops.
This option is enabled at level -O3.
-fprefetch-loop-arrays
- If supported by the target machine, generate instructions to prefetch memory to improve the performance of loops that access large arrays.
This option may generate better or worse code; results are highly dependent on the structure of loops within the source code.
Disabled at level -Os.
-fno-peephole
-fno-peephole2
- Disable any machine-specific peephole optimizations. The difference between
-fno-peephole and -fno-peephole2 is in how they are implemented in the compiler; some targets use one, some use the other, a few use both.
-fpeephole is enabled by default. -fpeephole2 is enabled at levels -O2, -O3, -Os.
-fno-guess-branch-probability
- Do not guess branch probabilities using heuristics.
GCC uses heuristics to guess branch probabilities if they are not provided by profiling feedback (-fprofile-arcs). These heuristics are based on the control flow graph. If some branch probabilities are specified by ‘__builtin_expect’, then the heuristics are used to guess branch probabilities for the rest of the control flow graph, taking the ‘__builtin_expect’ info into account.
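As a hedged illustration (hypothetical code, not part of the manual), __builtin_expect supplies a probability hint that these heuristics then combine with their own guesses:

     /* Hypothetical example: the hint tells GCC the error path is
        unlikely, so the branch-probability heuristics lay out the common
        path as the fall-through.  */
     int checked_div (int a, int b)
     {
       if (__builtin_expect (b == 0, 0))
         return 0;            /* unlikely error path */
       return a / b;          /* expected hot path */
     }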