Kernel/Optimization
- Fix GCC PGO
- Add other ways to apply patches due to new kernel package like sys-kernel/gentoo-kernel
Please read the Prerequisites section before reading the other sections.
This article describes various optimizations for the Linux kernel, including speed and hardening.
Prerequisites
Some optimization methods described here MAY break the kernel in unexpected ways, including slowing the kernel down or even rendering the system UNUSABLE. It is highly recommended to back up the system or to use a virtual machine. Some methods also require downloading git patches. Read each patch in full before applying it, as patches may contain malicious code. After reading, ensure that the version of each patch matches the kernel version in use.
Some CONFIG options require other CONFIG options to be set or unset, and some are only available if the architecture or compiler supports them. An optimization method described here may come at the expense of another kind of optimization, e.g. enabling register zeroing on function exit increases hardening at the cost of performance.
A newer version of GCC, Clang, or the kernel does not necessarily produce better-optimized results, because software regressions occur. Lists of known regressions are available for GCC, Clang, and the official kernel.
The article assumes the user is using sys-kernel/gentoo-sources
and that /usr/src/linux is the symbolic link to the current kernel. Change directory to /usr/src/linux before continuing:
user $
cd /usr/src/linux
One way to optimize the kernel is to remove what is not needed. For example, if KVM is not used, disable CONFIG_KVM:
Virtualization --->
< > Kernel-based Virtual Machine (KVM) support
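To confirm how an option is set before or after editing the menu, the .config file can be inspected directly. The following is a minimal sketch; the helper name config_state is hypothetical and not part of the kernel tree.

```shell
# config_state FILE OPTION
# Print the value of CONFIG_<OPTION> in FILE (y, m, or a literal value),
# or "n" when the option is absent or marked "is not set".
# Hypothetical helper for illustration; not part of Kbuild.
config_state() {
    file=$1
    opt=$2
    line=$(grep "^CONFIG_${opt}=" "$file") || { echo n; return 0; }
    echo "${line#*=}"
}

# Example (run from /usr/src/linux):
#   config_state .config KVM
```

The kernel tree also ships scripts/config, which can change an option non-interactively, for example scripts/config --disable KVM followed by make olddefconfig to resolve dependent options.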
Kbuild
The kernel build system (Kbuild) can change how the kernel is built in more advanced ways than make *config, similar to GNU Make. Kbuild also supports environment variables such as LLVM=1. For example, the following command builds the kernel with LLVM and aggressive optimization flags:
root #
make LLVM=1 KCFLAGS="-O3 -march=native -pipe"
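Kbuild records the command line used for each object in a hidden .cmd file next to it (for example kernel/.fork.o.cmd), so it is possible to check whether a flag from KCFLAGS actually reached the compiler. The sketch below assumes the command is on the first line of that file; the helper name built_with_flag is hypothetical.

```shell
# built_with_flag CMDFILE FLAG
# Succeed if the compiler command recorded in CMDFILE contains FLAG as a
# whole word. Hypothetical helper; CMDFILE is one of Kbuild's .*.o.cmd files.
built_with_flag() {
    # Assumption: the first line holds the command, later lines list
    # the source file and its dependencies.
    head -n 1 "$1" | grep -q -w -F -e "$2"
}

# Example (run from /usr/src/linux after a build):
#   built_with_flag kernel/.fork.o.cmd -O3 && echo "-O3 was used"
```

Running make V=1 prints the same command lines during the build.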
Experimental USE flag
The user may enable the experimental USE flag to gain access to more features, such as -march=native:
sys-kernel/gentoo-sources experimental
Clang/LLVM
DO NOT MIX GNU binutils and LLVM binutils for casual usage! For example, make CC=gcc LD=ld.lld AR=llvm-ar will not work because LLVM's ar and ld are not compatible with GCC.
Refer to The Linux Kernel Organization's latest document Building Linux with Clang/LLVM for more information.
Make sure the LLVM toolchain is installed before proceeding:
user $
emerge --pretend --noreplace sys-devel/clang sys-devel/llvm sys-libs/compiler-rt sys-libs/llvm-libunwind sys-devel/lld
By default, the kernel is built with GNU binutils. The toolchain is selected through the following variables: CC, LD, AR, NM, STRIP, OBJCOPY, OBJDUMP, READELF, HOSTCC, HOSTCXX, HOSTAR, and HOSTLD. Alternatively, the kernel may be built with LLVM binutils:
root #
make LLVM=1 LLVM_IAS=1
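After booting the new kernel, /proc/version names the compiler that built it, which gives a quick way to confirm that LLVM=1 took effect. A minimal sketch, assuming the banner contains either "clang version" or "gcc"; the helper name compiler_of is hypothetical.

```shell
# compiler_of VERSION_STRING
# Classify the compiler named in a /proc/version style banner.
# Hypothetical helper for illustration.
compiler_of() {
    case $1 in
        *"clang version"*) echo clang ;;
        *gcc*|*GCC*)       echo gcc ;;
        *)                 echo unknown ;;
    esac
}

# Example on a running system:
#   compiler_of "$(cat /proc/version)"
```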
*FLAGS
For more *FLAGS to experiment with, see the GCC manual and the Clang manual, or run man 1 gcc and man 1 clang.
By default, most of the kernel is built with -O2 (some code, such as random number generation, does not work with optimizations, and this is sometimes checked with the C macro __OPTIMIZE__). This can be changed via Kbuild. Before setting KCFLAGS or similar flags, check which flags the kernel's Makefiles already set or disable. For example, -fallow-store-data-races is disabled in this Makefile.
-O3
The command to add this flag is:
root #
make KCFLAGS="-O3"
There was an official attempt to add -O3 to the kernel, but Linus Torvalds rejected it because -O3 has historically produced worse code than -O2. Phoronix ran a -O3 kernel benchmark and found no measurable benefit in nearly all tested programs.
Performance
Performance means how fast the kernel runs.
Link Time Optimization
LTO memory usage may exceed the memory available on 32-bit systems, so it may need to be disabled there.
Enabling Link Time Optimization is not as simple as make KCFLAGS="-flto". Except with Clang's ThinLTO, the whole kernel is recompiled whenever at least one CONFIG option changes. See Clang LTO and GCC LTO for more information.
GCC LTO
Andi Kleen and others maintain experimental patches for GCC LTO, which are used here. For more information, see the LWN article.
First, download the following two patches from CachyOS's kernel patches:
user $
curl -o gcc-lto.patch https://raw.githubusercontent.com/CachyOS/kernel-patches/refs/heads/master/6.5/misc/gcc-lto/0001-gcc-lto.patch
user $
curl -o gcc-lto-no-pie.patch https://raw.githubusercontent.com/CachyOS/kernel-patches/master/6.5/misc/gcc-lto/0002-gcc-lto-no-pie.patch
Then, apply the patches using git:
root #
git apply gcc-lto.patch
root #
git apply gcc-lto-no-pie.patch
Alternatively, when using distribution kernels, users can place these patches in /etc/portage/patches/sys-kernel/*-kernel/*.patch.
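For a distribution kernel such as sys-kernel/gentoo-kernel, placing the patches could be sketched as follows. The helper name stage_kernel_patches and the PATCH_ROOT override are hypothetical and exist only so the function can be tried against a scratch directory first.

```shell
# stage_kernel_patches PACKAGE PATCH...
# Copy patch files into Portage's user patch directory for sys-kernel/PACKAGE.
# PATCH_ROOT is a hypothetical override for dry runs; it defaults to
# /etc/portage/patches.
stage_kernel_patches() {
    pkg=$1
    shift
    dest=${PATCH_ROOT:-/etc/portage/patches}/sys-kernel/$pkg
    mkdir -p "$dest" || return 1
    for p in "$@"; do
        cp -- "$p" "$dest/" || return 1
    done
}

# Example (as root):
#   stage_kernel_patches gentoo-kernel 0001-gcc-lto.patch 0002-gcc-lto-no-pie.patch
```

Portage applies user patches in lexical order of their file names, so keeping the numbered upstream names (0001-..., 0002-...) preserves the intended order, whereas gcc-lto-no-pie.patch would sort before gcc-lto.patch.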
Afterwards, enable GCC LTO in the kernel configuration:
root #
make oldconfig
Link Time Optimization (LTO)
> 1. None (LTO_NONE)
  2. gcc LTO (LTO_GCC) (NEW)
choice[1-2?]: 2
Allow aggressive cloning for function specialization (LTO_CP_CLONE) [N/y/?] (NEW) n
To remove the patches:
root #
git apply gcc-lto.patch --reverse
root #
git apply gcc-lto-no-pie.patch --reverse
root #
rm gcc-lto.patch gcc-lto-no-pie.patch
Clang LTO
Clang's Link Time Optimization can be either Full LTO or ThinLTO on Linux kernel 5.12 and later:
General architecture-dependent options --->
Link Time Optimization (LTO) (Clang ThinLTO (EXPERIMENTAL)) --->
( ) None
( ) Clang Full LTO (EXPERIMENTAL)
(X) Clang ThinLTO (EXPERIMENTAL)
The difference between the two is that ThinLTO builds faster, thanks to parallelization, and uses less memory. ThinLTO may either improve or reduce runtime performance.
Profile Guided Optimization
The Clang package sys-devel/clang-runtime pulls in sys-libs/compiler-rt-sanitizers by default via the sanitize USE flag, which makes Profile Guided Optimization available. Users who customize their USE flags and do not want the extra Clang sanitizers need to ensure that profile and orc are set locally in /etc/portage/package.use:
root #
nano /etc/portage/package.use/compiler-rt-sanitizers.use
# required USE flags for pgo
sys-libs/compiler-rt-sanitizers profile orc
Install the Clang sanitizers:
root #
emerge --ask --changed-use sys-libs/compiler-rt-sanitizers
GCC PGO
To use Profile-Guided Optimization, activate debugfs and gcov support (See this for modern info):
Kernel hacking --->
Generic Kernel Debugging Instruments --->
[*] Debug Filesystem
General architecture-dependent options --->
GCOV-based kernel profiling --->
[*] Enable gcov-based kernel profiling
[*] Profile entire Kernel
The variable CFLAGS_GCOV, used when CONFIG_GCOV_KERNEL is enabled, defaults to -fprofile-arcs -ftest-coverage, but can be changed to -fprofile-generate -ftest-coverage or similar (see Instrumentation Options):
root #
make CFLAGS_GCOV="-fprofile-generate -ftest-coverage"
Then build as usual, set up the kernel, and reboot the system:
root #
reboot
The kernel will run slower and grow in size because it has been instrumented to collect data such as how many times each line of code executes. This data is needed to build the PGO kernel.
After booting back into the system, exercise it with many programs: play sound, run games, run Firefox, and so on. The longer the system runs and the more varied the programs, the more instrumentation data is collected. When satisfied with the collected data, copy the /sys/kernel/debug/gcov/usr/src/linux/*gcda files to /usr/src/linux:
root #
cd /sys/kernel/debug/gcov/usr/src/linux
root #
find . -name '*.gcda' -exec cp {} /usr/src/linux/{} \;
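A slightly more defensive variant of the copy step is sketched below. It creates missing target directories and reads each file with cat, on the assumption that debugfs entries may report a size of zero and confuse tools that rely on the reported size. The helper name collect_gcda is hypothetical.

```shell
# collect_gcda SRC DST
# Copy every .gcda file under SRC to the same relative path under DST.
# Hypothetical helper for illustration.
collect_gcda() {
    src=$1
    dst=$2
    # List the files relative to SRC so the layout can be mirrored in DST.
    ( cd "$src" && find . -name '*.gcda' -type f ) | while IFS= read -r f; do
        mkdir -p "$dst/$(dirname "$f")" || exit 1
        # Read with cat instead of cp (assumption: reported size may be 0).
        cat "$src/$f" > "$dst/$f" || exit 1
    done
}

# Example (as root):
#   collect_gcda /sys/kernel/debug/gcov/usr/src/linux /usr/src/linux
```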
Then disable CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL and edit the KCFLAGS:
General architecture-dependent options --->
GCOV-based kernel profiling --->
[ ] Enable gcov-based kernel profiling
[ ] Profile entire Kernel
root #
make KCFLAGS="-fprofile-use -fprofile-correction -Wno-error=missing-profile -Wno-error=coverage-mismatch"
As before, set up the kernel and reboot. To remove the /usr/src/linux/*gcda files, run:
root #
cd /usr/src/linux
root #
find . -name '*.gcda' -exec rm {} \;
Clang PGO
Download and apply the patch:
user $
curl -o clang-pgo.patch https://raw.githubusercontent.com/CachyOS/kernel-patches/refs/heads/master/6.5/misc/0001-Clang-PGO.patch
root #
git apply clang-pgo.patch
Configure the kernel with LLVM:
root #
make menuconfig LLVM=1
Configure the kernel as follows:
General architecture-dependent options --->
Link Time Optimization (LTO) (None) --->
(X) None
( ) Clang Full LTO (EXPERIMENTAL)
( ) Clang ThinLTO (EXPERIMENTAL)
Profile Guided Optimization (PGO) (EXPERIMENTAL) --->
[*] Enable clang's PGO-based kernel profiling
Then build as usual, set up the kernel, and reboot the system:
root #
reboot
The kernel will run slower and grow in size because it has been instrumented to collect data such as how many times each line of code executes. This data is needed to build the PGO kernel.
After booting back into the system, clear any existing PGO data:
root #
echo 1 | tee /proc/pgo/reset
Exercise the system with many programs: play sound, run games, run Firefox, and so on. The longer the system runs and the more varied the programs, the more instrumentation data is collected. When satisfied with the collected data, copy the raw profile data:
root #
cp -a /proc/pgo/vmlinux.profraw /tmp/vmlinux.profraw
Then process the raw profile data using llvm-profdata:
user $
cd /usr/src/linux
root #
llvm-profdata merge --output=vmlinux.profdata /tmp/vmlinux.profraw
Disable Clang's PGO and optionally enable Clang's LTO:
root #
make menuconfig LLVM=1
General architecture-dependent options --->
Link Time Optimization (LTO) (None) --->
( ) None
( ) Clang Full LTO (EXPERIMENTAL)
(X) Clang ThinLTO (EXPERIMENTAL)
Profile Guided Optimization (PGO) (EXPERIMENTAL) --->
[ ] Enable clang's PGO-based kernel profiling
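Then compile the kernel with Clang using the merged profile. Because -fprofile-use treats a directory argument as the location of a file named default.profdata, pass the full path to vmlinux.profdata:
root #
make LLVM=1 KCFLAGS=-fprofile-use=/usr/src/linux/vmlinux.profdata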
The user may now remove the patch:
root #
git apply clang-pgo.patch --reverse
root #
rm clang-pgo.patch
Hardened
Hardening refers to reducing the potential for malware to damage the system.
Removing module support (CONFIG_MODULES) prevents the kernel from loading code at runtime, but many drivers will not work without modules. An alternative is to allow only signed modules.
Processor type and features --->
[*] Randomize the address of the kernel image (KASLR)
Power management and ACPI options --->
[ ] Hibernation (aka 'suspend to disk')
Memory Management options --->
[ ] Disable heap randomization
Security options --->
Kernel hardening options --->
Memory initialization --->
Initialize kernel stack variables at function entry (zero-init everything (strongest and safest)) --->
( ) no automatic stack variable initialization (weakest)
( ) pattern-init everything (strongest)
(X) zero-init everything (strongest and safest)
[*] Poison kernel stack before returning from syscalls
[*] Enable heap memory zeroing on allocation by default
[*] Enable heap memory zeroing on free by default
[*] Enable register zeroing on function exit
Kernel hacking --->
Memory Debugging --->
[*] Debug VM translations
Pietinger, Kicksecure, and Clip OS list more hardening config options for the kernel.
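The menu selections above can be checked against a .config file with a short script. This is a sketch only: the mapping from menu entries to CONFIG symbols is an assumption that should be verified with the help text in menuconfig, and the stack-poisoning option is left out because its symbol name differs between kernel versions. The helper name check_hardening is hypothetical.

```shell
# check_hardening FILE
# Print every option whose state in FILE differs from the selections shown
# in the menu listing; return non-zero if any differ.
# Assumption: the symbol names below match the menu entries.
check_hardening() {
    file=$1
    rc=0
    # Options expected to be enabled (=y).
    for opt in RANDOMIZE_BASE INIT_STACK_ALL_ZERO INIT_ON_ALLOC_DEFAULT_ON \
               INIT_ON_FREE_DEFAULT_ON ZERO_CALL_USED_REGS DEBUG_VIRTUAL; do
        if ! grep -q "^CONFIG_${opt}=y\$" "$file"; then
            echo "expected enabled: CONFIG_${opt}"
            rc=1
        fi
    done
    # Options expected to be disabled (absent or "is not set").
    for opt in HIBERNATION COMPAT_BRK; do
        if grep -q "^CONFIG_${opt}=" "$file"; then
            echo "expected disabled: CONFIG_${opt}"
            rc=1
        fi
    done
    return $rc
}

# Example (run from /usr/src/linux):
#   check_hardening .config
```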
Size
This section describes reducing kernel memory usage (useful for embedded systems).
Kernels 5.4 and later officially support the -Os flag:
General setup --->
Compiler optimization level (Optimize for size (-Os)) --->
( ) Optimize for performance (-O2)
(X) Optimize for size (-Os)
-Oz may be used instead to reduce size more aggressively than -Os:
root #
make KCFLAGS="-Oz"
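Whether a size flag helped can be checked by comparing the resulting images. A minimal sketch, assuming a copy of the previous image was kept under a name such as vmlinux.O2 (a hypothetical file name); the helper name size_delta is also hypothetical.

```shell
# size_delta OLD NEW
# Print the size change in bytes from OLD to NEW (negative means smaller).
# Hypothetical helper for illustration.
size_delta() {
    old=$(wc -c < "$1") || return 1
    new=$(wc -c < "$2") || return 1
    echo $((new - old))
}

# Example (run from /usr/src/linux):
#   size_delta vmlinux.O2 vmlinux
```

File size is only a rough proxy for memory usage; the size tool from binutils reports per-section sizes of vmlinux.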