Performance Optimization: Just How Powerful Are the Compiler Optimization Options -O2/-O3?

Posted by Zijian/TENG on 2024-03-15

Original URL: https://www.cnblogs.com/tengzijian/p/18075365

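My earlier article, "General Strategies and Methods for Performance Optimization", covered a variety of optimization techniques. In my project experience, turning on compiler optimization options is probably the approach with the most immediate payoff, the lowest cost, and the best results.

That may still sound abstract, so here is a real example: in an autonomous driving project I worked on, adding nothing but the -O2 option, without changing a single line of code, brought the process's overall CPU load down from 50% to about 30%, and cut the execution time of some key functions from 1700 us to about 700 us.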
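To try a comparison like this on your own machine, a minimal C++ sketch might look like the following. It is only an illustration: `sum_of_squares`, `Timing`, and `time_sum_of_squares` are hypothetical names standing in for a hot function and a timing helper, not code from the project described above. The one compiler-specific piece is `__OPTIMIZE__`, a macro that GCC predefines in optimizing compilations, which lets the program report how it was built.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <vector>

// GCC predefines __OPTIMIZE__ in optimizing compilations (for example -O2).
constexpr bool built_with_optimization() {
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// Hypothetical stand-in for a hot function.
std::uint64_t sum_of_squares(const std::vector<std::uint32_t>& values) {
    std::uint64_t total = 0;
    for (std::uint32_t v : values) {
        total += static_cast<std::uint64_t>(v) * v;
    }
    return total;
}

struct Timing {
    double average_us;       // mean wall-clock time per call, in microseconds
    std::uint64_t checksum;  // sum of all results, so the calls are not dead code
};

// Calls sum_of_squares `repeats` times, changing one element per iteration
// so that the result cannot simply be computed once and reused.
Timing time_sum_of_squares(std::vector<std::uint32_t> values, int repeats) {
    assert(!values.empty() && repeats > 0);
    using Clock = std::chrono::steady_clock;
    std::uint64_t checksum = 0;
    const auto start = Clock::now();
    for (int i = 0; i < repeats; ++i) {
        values[0] = static_cast<std::uint32_t>(i);
        checksum += sum_of_squares(values);
    }
    const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;
    return Timing{elapsed.count() / repeats, checksum};
}
```

Call `time_sum_of_squares` from `main` with the same input in two builds, one compiled without any -O option and one with -O2, and print `average_us` together with `built_with_optimization()`. The numbers depend heavily on the workload and the hardware, so treat this as a sanity check rather than a benchmark.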

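The compiler's ability to optimize is far greater than you might imagine! Flip ahead to the appendix and look at the dizzying number of optimization options, and you will see that a lot of manual optimization is unnecessary: the compiler does it faster, better, and more safely. Manual optimization not only hurts readability and maintainability, it also makes it very easy to introduce bugs!

In fact, both -O2 and -O3 are just collections of individual optimization options. To find out exactly what they turn on, pass -c -Q --help=optimizers to gcc/g++.

For example, with the aarch64-unknown-nto-qnx7.1.0-g++ compiler that I use, the following three commands show which optimizations are enabled by adding -O2: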

$ aarch64-unknown-nto-qnx7.1.0-g++ -c -Q -O2 --help=optimizers > /tmp/O2-opts
$ aarch64-unknown-nto-qnx7.1.0-g++ -c -Q --help=optimizers > /tmp/O-opts
$ diff /tmp/O2-opts /tmp/O-opts | grep enabled
<   -fdevirtualize                              [enabled]
<   -finline-functions-called-once              [enabled]
<   -finline-small-functions                    [enabled]
<   -foptimize-strlen                           [enabled]
<   -freorder-blocks                            [enabled]
<   -freorder-functions                         [enabled]
<   -ftree-switch-conversion                    [enabled]
<   -ftree-tail-merge                           [enabled]
...

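A quick look at just a few of them is enough to get a sense of how powerful these options are:

-finline-xxx: inlines functions to avoid function call overhead. As an aside, the inline keyword in source code is only a hint to the compiler; the compiler makes its own choice based on the situation, whether or not the keyword is present
-fdevirtualize: tries to convert virtual function calls into direct calls, avoiding the extra overhead of virtual dispatch
-freorder-blocks: reorders the basic blocks within a function to reduce the number of taken branches and improve code locality
-freorder-functions: reorders the functions in the object file to improve code locality, placing frequently executed functions in the ".text.hot" section and rarely executed functions in the ".text.unlikely" section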
...
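As a rough illustration of the kind of code the inlining and devirtualization options target, consider the sketch below. The types `Shape` and `Square` and the functions `square` and `total_area` are hypothetical names invented for this example. Whether a given compiler version actually inlines or devirtualizes these calls is an assumption you would need to confirm by inspecting the generated assembly.

```cpp
#include <vector>

// A small helper without the `inline` keyword: still a typical candidate
// for -finline-small-functions.
static int square(int x) { return x * x; }

struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

// `final` tells the compiler that no further override can exist, which
// makes calls through a Square reference easier to devirtualize.
struct Square final : Shape {
    explicit Square(int side) : side_(side) {}
    int area() const override { return square(side_); }

private:
    int side_;
};

// The static type here is Square, so the virtual call to area() can be
// resolved at compile time and is then a candidate for inlining.
int total_area(const std::vector<Square>& squares) {
    int total = 0;
    for (const Square& s : squares) {
        total += s.area();
    }
    return total;
}
```

Compiling this file with `g++ -S` once without optimization and once with -O2, then comparing the two assembly listings for `total_area`, is one way to see what the compiler did with the calls.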

The full list of optimizations is long; for the exact meaning of each option, consult the compiler manual.

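Summary

If performance is disappointing, first check whether compiler optimization is enabled. This is probably the fastest and most effective fix.
The compiler's ability to optimize is far greater than you might imagine!
Don't start blindly rewriting code before optimization is turned on. Much of that effort is wasted, and it can even reduce performance or introduce bugs; the compiler's optimizations are faster and safer.
If your program misbehaves once optimization is enabled, don't blame the compiler. Most likely your code is non-conforming and relies on behavior that C/C++ leaves "undefined".
Note that the automotive domain places some restrictions on optimization options. In my project, for example, the compiler's Safety Manual explicitly states that -O2 is the highest optimization level supported.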
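To make the point about undefined behavior concrete, here is a small sketch of a classic case, signed integer overflow. The function names are hypothetical. The first version appears to work in an unoptimized build, but because signed overflow is undefined in C++, an optimizing compiler is allowed to assume it never happens and may simplify the check away.

```cpp
#include <limits>

// Relies on undefined behavior: when x is the largest int, x + 1 overflows.
// An optimizing compiler may assume that never happens and reduce this
// function to `return false`.
bool increment_overflows_naive(int x) {
    return x + 1 < x;
}

// Well-defined alternative: compare against the limit before doing any
// arithmetic, so no overflow can occur.
bool increment_overflows(int x) {
    return x == std::numeric_limits<int>::max();
}
```

For ordinary inputs the two functions agree. They can differ only for the largest `int` value, which is exactly the case the naive version was written to detect.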

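Appendix

Teach a Person to Fish

When this question came up, my first thought was to ask ChatGPT, but the results were not satisfying. My next thought was: RTFM!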

man gcc

Online version: https://manpages.org/gcc

Searching for the keyword with /optimiz quickly turned up the answer I was looking for:

Optimization options supported by gcc

 Optimization Options
           -faggressive-loop-optimizations -falign-functions[=n[:m:[n2[:m2]]]] -falign-jumps[=n[:m:[n2[:m2]]]] -falign-labels[=n[:m:[n2[:m2]]]]
           -falign-loops[=n[:m:[n2[:m2]]]] -fno-allocation-dce -fallow-store-data-races -fassociative-math  -fauto-profile  -fauto-profile[=path]
           -fauto-inc-dec  -fbranch-probabilities -fcaller-saves -fcombine-stack-adjustments  -fconserve-stack -fcompare-elim  -fcprop-registers
           -fcrossjumping -fcse-follow-jumps  -fcse-skip-blocks  -fcx-fortran-rules -fcx-limited-range -fdata-sections  -fdce  -fdelayed-branch
           -fdelete-null-pointer-checks  -fdevirtualize  -fdevirtualize-speculatively -fdevirtualize-at-ltrans  -fdse -fearly-inlining  -fipa-sra
           -fexpensive-optimizations  -ffat-lto-objects -ffast-math  -ffinite-math-only  -ffloat-store  -fexcess-precision=style -ffinite-loops
           -fforward-propagate  -ffp-contract=style  -ffunction-sections -fgcse  -fgcse-after-reload  -fgcse-las  -fgcse-lm  -fgraphite-identity
           -fgcse-sm  -fhoist-adjacent-loads  -fif-conversion -fif-conversion2  -findirect-inlining -finline-functions  -finline-functions-called-once
           -finline-limit=n -finline-small-functions -fipa-modref -fipa-cp  -fipa-cp-clone -fipa-bit-cp  -fipa-vrp  -fipa-pta  -fipa-profile
           -fipa-pure-const -fipa-reference  -fipa-reference-addressable -fipa-stack-alignment  -fipa-icf  -fira-algorithm=algorithm
           -flive-patching=level -fira-region=region  -fira-hoist-pressure -fira-loop-pressure  -fno-ira-share-save-slots -fno-ira-share-spill-slots
           -fisolate-erroneous-paths-dereference  -fisolate-erroneous-paths-attribute -fivopts  -fkeep-inline-functions  -fkeep-static-functions
           -fkeep-static-consts  -flimit-function-alignment  -flive-range-shrinkage -floop-block  -floop-interchange  -floop-strip-mine
           -floop-unroll-and-jam  -floop-nest-optimize -floop-parallelize-all  -flra-remat  -flto  -flto-compression-level -flto-partition=alg
           -fmerge-all-constants -fmerge-constants  -fmodulo-sched  -fmodulo-sched-allow-regmoves -fmove-loop-invariants  -fno-branch-count-reg
           -fno-defer-pop  -fno-fp-int-builtin-inexact  -fno-function-cse -fno-guess-branch-probability  -fno-inline  -fno-math-errno  -fno-peephole
           -fno-peephole2  -fno-printf-return-value  -fno-sched-interblock -fno-sched-spec  -fno-signed-zeros -fno-toplevel-reorder  -fno-trapping-math
           -fno-zero-initialized-in-bss -fomit-frame-pointer  -foptimize-sibling-calls -fpartial-inlining  -fpeel-loops  -fpredictive-commoning
           -fprefetch-loop-arrays -fprofile-correction -fprofile-use  -fprofile-use=path -fprofile-partial-training -fprofile-values
           -fprofile-reorder-functions -freciprocal-math  -free  -frename-registers  -freorder-blocks -freorder-blocks-algorithm=algorithm
           -freorder-blocks-and-partition  -freorder-functions -frerun-cse-after-loop  -freschedule-modulo-scheduled-loops -frounding-math
           -fsave-optimization-record -fsched2-use-superblocks  -fsched-pressure -fsched-spec-load  -fsched-spec-load-dangerous
           -fsched-stalled-insns-dep[=n]  -fsched-stalled-insns[=n] -fsched-group-heuristic  -fsched-critical-path-heuristic -fsched-spec-insn-heuristic
           -fsched-rank-heuristic -fsched-last-insn-heuristic  -fsched-dep-count-heuristic -fschedule-fusion -fschedule-insns  -fschedule-insns2
           -fsection-anchors -fselective-scheduling  -fselective-scheduling2 -fsel-sched-pipelining  -fsel-sched-pipelining-outer-loops
           -fsemantic-interposition  -fshrink-wrap  -fshrink-wrap-separate -fsignaling-nans -fsingle-precision-constant  -fsplit-ivs-in-unroller
           -fsplit-loops -fsplit-paths -fsplit-wide-types  -fsplit-wide-types-early  -fssa-backprop  -fssa-phiopt -fstdarg-opt  -fstore-merging
           -fstrict-aliasing -fthread-jumps  -ftracer  -ftree-bit-ccp -ftree-builtin-call-dce  -ftree-ccp  -ftree-ch -ftree-coalesce-vars
           -ftree-copy-prop  -ftree-dce  -ftree-dominator-opts -ftree-dse  -ftree-forwprop  -ftree-fre  -fcode-hoisting -ftree-loop-if-convert
           -ftree-loop-im -ftree-phiprop  -ftree-loop-distribution  -ftree-loop-distribute-patterns -ftree-loop-ivcanon  -ftree-loop-linear
           -ftree-loop-optimize -ftree-loop-vectorize -ftree-parallelize-loops=n  -ftree-pre  -ftree-partial-pre  -ftree-pta -ftree-reassoc
           -ftree-scev-cprop  -ftree-sink  -ftree-slsr  -ftree-sra -ftree-switch-conversion  -ftree-tail-merge -ftree-ter  -ftree-vectorize  -ftree-vrp
           -funconstrained-commons -funit-at-a-time  -funroll-all-loops  -funroll-loops -funsafe-math-optimizations  -funswitch-loops -fipa-ra
           -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt -fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs --param
           name=value -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og

Optimization options that aarch64-unknown-nto-qnx7.1.0-g++ enables with `-O2` compared to the default without `-O2` (complete list)

$ aarch64-unknown-nto-qnx7.1.0-g++ -c -Q -O2 --help=optimizers > /tmp/O2-opts
$ aarch64-unknown-nto-qnx7.1.0-g++ -c -Q --help=optimizers > /tmp/O-opts
$ diff /tmp/O2-opts /tmp/O-opts | grep enabled
<   -falign-labels                              [enabled]
<   -fbranch-count-reg                          [enabled]
<   -fcaller-saves                              [enabled]
<   -fcode-hoisting                             [enabled]
<   -fcombine-stack-adjustments                 [enabled]
<   -fcompare-elim                              [enabled]
<   -fcprop-registers                           [enabled]
<   -fcrossjumping                              [enabled]
<   -fcse-follow-jumps                          [enabled]
<   -fdefer-pop                                 [enabled]
<   -fdevirtualize                              [enabled]
<   -fdevirtualize-speculatively                [enabled]
<   -fexpensive-optimizations                   [enabled]
<   -fforward-propagate                         [enabled]
<   -fgcse                                      [enabled]
<   -fguess-branch-probability                  [enabled]
<   -fhoist-adjacent-loads                      [enabled]
<   -fif-conversion                             [enabled]
<   -fif-conversion2                            [enabled]
<   -findirect-inlining                         [enabled]
<   -finline-functions-called-once              [enabled]
<   -finline-small-functions                    [enabled]
<   -fipa-bit-cp                                [enabled]
<   -fipa-cp                                    [enabled]
<   -fipa-icf                                   [enabled]
<   -fipa-icf-functions                         [enabled]
<   -fipa-icf-variables                         [enabled]
<   -fipa-profile                               [enabled]
<   -fipa-pure-const                            [enabled]
<   -fipa-ra                                    [enabled]
<   -fipa-reference                             [enabled]
<   -fipa-sra                                   [enabled]
<   -fipa-vrp                                   [enabled]
<   -fisolate-erroneous-paths-dereference       [enabled]
<   -flra-remat                                 [enabled]
<   -fmove-loop-invariants                      [enabled]
<   -foptimize-sibling-calls                    [enabled]
<   -foptimize-strlen                           [enabled]
<   -fpartial-inlining                          [enabled]
<   -fpeephole2                                 [enabled]
<   -freorder-blocks                            [enabled]
<   -freorder-functions                         [enabled]
<   -frerun-cse-after-loop                      [enabled]
<   -fsched-pressure                            [enabled]
<   -fschedule-insns                            [enabled]
<   -fschedule-insns2                           [enabled]
<   -fsection-anchors                           [enabled]
<   -fshrink-wrap                               [enabled]
<   -fsplit-wide-types                          [enabled]
<   -fssa-phiopt                                [enabled]
<   -fstore-merging                             [enabled]
<   -fstrict-aliasing                           [enabled]
<   -fthread-jumps                              [enabled]
<   -ftree-bit-ccp                              [enabled]
<   -ftree-builtin-call-dce                     [enabled]
<   -ftree-ccp                                  [enabled]
<   -ftree-ch                                   [enabled]
<   -ftree-coalesce-vars                        [enabled]
<   -ftree-copy-prop                            [enabled]
<   -ftree-dce                                  [enabled]
<   -ftree-dominator-opts                       [enabled]
<   -ftree-dse                                  [enabled]
<   -ftree-fre                                  [enabled]
<   -ftree-pre                                  [enabled]
<   -ftree-pta                                  [enabled]
<   -ftree-sink                                 [enabled]
<   -ftree-slsr                                 [enabled]
<   -ftree-sra                                  [enabled]
<   -ftree-switch-conversion                    [enabled]
<   -ftree-tail-merge                           [enabled]
<   -ftree-ter                                  [enabled]
<   -ftree-vrp                                  [enabled]
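
One entry in this list, -fstrict-aliasing, is a common reason that code which seems fine without optimization misbehaves at -O2. With it enabled, the compiler may assume that objects of unrelated types do not share the same address. The sketch below contrasts a type pun that breaks this rule with a well-defined alternative. The function names are hypothetical, and the example assumes a 32-bit `float`.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Undefined behavior: reads a float object through a uint32_t lvalue.
// Shown for contrast only; avoid this pattern in real code.
inline std::uint32_t float_bits_unsafe(const float& f) {
    return *reinterpret_cast<const std::uint32_t*>(&f);
}

// Well-defined: copy the object representation byte by byte.
// Optimizing compilers typically reduce this memcpy to a plain move.
inline std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

The `memcpy` version costs nothing extra in practice once optimization is on, which is another case where the readable, conforming code and the fast code turn out to be the same.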
