Kernel/Optimization

From Gentoo Wiki
Article status
This article has some todo items:
  • Fix GCC PGO


Important
Please read the Prerequisites section before reading the other sections.

This article describes various optimizations for the Linux kernel, including speed and hardening.

Prerequisites

Warning
Some optimization methods described here MAY break the kernel in unexpected ways, including making the kernel run slower or even rendering the system UNUSABLE. It is highly recommended to back up the system or to use a virtual machine. Some methods also require downloading patches. Read each patch in full before applying it, as a patch may contain malicious code. After reading, ensure that the patch version matches the kernel version in use.
Note
Some CONFIG options require other CONFIG options to be set or unset, and some are available only if the architecture or compiler supports them. An optimization method described here may come at the expense of another kind of optimization, e.g. enabling register zeroing on function exit increases hardening at the cost of performance.
Note
A newer version of GCC/Clang or of the kernel does not necessarily produce better-optimized results, because software regressions occur. Full lists of regressions are available for GCC, Clang, and the official kernel.

This article assumes /usr/src/linux is the symbolic link to the current kernel. Change directory to /usr/src/linux before continuing:

user $cd /usr/src/linux

One way to optimize the kernel is to remove features that are not needed. For example, if KVM is not used, disable CONFIG_KVM:

KERNEL Disable KVM (CONFIG_KVM) support
Virtualization --->
  < >   Kernel-based Virtual Machine (KVM) support
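
Before and after editing the configuration, it can help to confirm how an option is actually recorded. The following is a minimal sketch and not part of the kernel tooling: config_state is a hypothetical helper that reads a .config-style file and prints y, m, or n for one option, treating an absent or "is not set" entry as n.

```shell
# config_state FILE OPTION
# Hypothetical helper: print y, m, or n for OPTION as recorded in FILE.
# An option that is absent or marked "is not set" is reported as n.
config_state() {
    file=$1
    option=$2
    # Match whole lines such as CONFIG_KVM=y or CONFIG_KVM=m.
    value=$(sed -n "s/^${option}=\([ym]\)\$/\1/p" "$file" | head -n 1)
    printf '%s\n' "${value:-n}"
}
```

For instance, config_state /usr/src/linux/.config CONFIG_KVM would print n once KVM support has been disabled as shown above.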

Kbuild

The kernel build system (Kbuild) can change how the kernel is built in more advanced ways than make *config, similar to GNU Make. Kbuild also supports environment variables such as LLVM=1. For example, the following command builds the kernel with LLVM and with aggressive optimization flags:

root #make LLVM=1 KCFLAGS="-O3 -march=native -pipe"

Distribution

Distribution kernels offer another route: the USE="experimental" setting, combined with a kernel config snippet and a Portage environment file.

FILE /etc/kernel/config.d/USE-experimental-x86-64-v2-linux6-1-111.config
# CONFIG_GENERIC_CPU is not set
CONFIG_GENERIC_CPU2=y
FILE /etc/portage/env/gentoo-kernel
LLVM=1
KCFLAGS="-mtune=native"
KCPPFLAGS="-mtune=native"
FILE /etc/portage/package.env/gentoo-kernel
sys-kernel/gentoo-kernel gentoo-kernel

Clang/LLVM

Warning
DO NOT MIX GNU binutils and LLVM binutils for casual usage! For example, make CC=gcc LD=ld.lld AR=llvm-ar will not work because LLVM's ar and ld are not compatible with GCC.
Note
Refer to The Linux Kernel Organization's latest document Building Linux with Clang/LLVM for more information.

Make sure the LLVM toolchain is installed before proceeding:

user $emerge --pretend --noreplace sys-devel/clang sys-devel/llvm sys-libs/compiler-rt sys-libs/llvm-libunwind sys-devel/lld

By default, the kernel is built with GNU binutils. The following environment variables are used: CC, LD, AR, NM, STRIP, OBJCOPY, OBJDUMP, READELF, HOSTCC, HOSTCXX, HOSTAR, and HOSTLD. Alternatively, the kernel may be built with LLVM binutils:

root #make LLVM=1 LLVM_IAS=1
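
A build with LLVM=1 expects the LLVM counterparts of the tools listed above to be found on PATH. As a small sketch (missing_tools is a hypothetical helper, not something the kernel provides), the following prints every named tool that cannot be found, so an incomplete toolchain is noticed before the build starts:

```shell
# missing_tools NAME...
# Hypothetical helper: print each NAME that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

A possible call is missing_tools clang ld.lld llvm-ar llvm-nm llvm-strip llvm-objcopy llvm-objdump llvm-readelf; empty output means all of them were found.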

*FLAGS

Note
For more *FLAGS to experiment with, see the GCC manual and the Clang manual, or run man 1 gcc and man 1 clang.

By default, most of the kernel is built with -O2 (some code, such as random number generation, does not work with optimizations, and this is sometimes checked with the C macro __OPTIMIZE__). This can be changed via Kbuild. Before setting KCFLAGS or similar variables, check the kernel's Makefiles, because they already set or disable some flags. For example, -fallow-store-data-races is disabled in this Makefile.
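
One way to check the Makefiles for a flag before overriding it is a fixed-string search. The sketch below assumes nothing beyond grep; find_flag is a hypothetical helper name, and it prints the file name, line number, and text of every line that mentions the flag.

```shell
# find_flag FLAG FILE...
# Hypothetical helper: list the lines in the given files that mention FLAG.
find_flag() {
    flag=$1
    shift
    # -F treats FLAG as a fixed string, -e keeps a leading dash from being
    # parsed as a grep option, and /dev/null forces file names in the output.
    grep -n -F -e "$flag" "$@" /dev/null
}
```

For example, find_flag allow-store-data-races Makefile run from /usr/src/linux shows whether and where the top-level Makefile touches that flag. The exit status is non-zero when nothing matches.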

-O3

The command to add this flag is:

root #make KCFLAGS="-O3"

There was an official attempt to add it to the kernel, but Linus Torvalds rejected it because -O3 has historically produced worse code than -O2. Phoronix ran a -O3 kernel benchmark and found that nearly all tested programs showed no measurable benefit.

-flto

Note
LTO memory usage may exceed the available RAM on 32-bit systems, so LTO may need to be disabled there.

Enabling Link Time Optimization is not as simple as make KCFLAGS="-flto". Except with Clang's ThinLTO, the whole kernel is recompiled if even one CONFIG option changes. See Clang LTO and GCC LTO for more information.

Performance

Performance means how fast the kernel runs.

GCC LTO

Andi Kleen and others have written experimental patches that add GCC LTO support to the kernel; they are used here. For more information, see the LWN article.

First, download the two patches, gcc-lto.patch and gcc-lto-no-pie.patch, from CachyOS's kernel patches.

Then, apply the patches using git:

root #git apply gcc-lto.patch
root #git apply gcc-lto-no-pie.patch
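
Because a patch written for another kernel version may not fit, a dry run with git apply --check before each real application can catch a mismatch early. The following is only a sketch: apply_checked is a hypothetical wrapper that stops at the first patch that does not apply cleanly. Patches applied before that point stay applied.

```shell
# apply_checked PATCH...
# Hypothetical helper: apply the patches in order, testing each one with
# "git apply --check" first and stopping at the first that does not fit.
apply_checked() {
    for patch in "$@"; do
        if ! git apply --check "$patch"; then
            printf 'refusing to apply %s\n' "$patch" >&2
            return 1
        fi
        git apply "$patch" || return 1
    done
}
```

With the patch names used above, the call would be apply_checked gcc-lto.patch gcc-lto-no-pie.patch.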

Alternatively, when using distribution kernels, users can place these patches in /etc/portage/patches/sys-kernel/*-kernel/*.patch.

Afterwards, enable GCC LTO on the kernel and enjoy:

root #make oldconfig
Link Time Optimization (LTO)
> 1. None (LTO_NONE)
  2. gcc LTO (LTO_GCC) (NEW)
choice[1-2?]: 2
Allow aggressive cloning for function specialization (LTO_CP_CLONE) [N/y/?] (NEW) n

To remove the patches:

root #git apply gcc-lto.patch --reverse
root #git apply gcc-lto-no-pie.patch --reverse
root #rm gcc-lto.patch gcc-lto-no-pie.patch

Clang LTO

Clang's Link Time Optimization can be either FullLTO or ThinLTO on Linux kernel 5.12 and later:

KERNEL Enable Clang's LTO (CONFIG_LTO_CLANG_FULL and CONFIG_LTO_CLANG_THIN) support
General architecture-dependent options --->
  Link Time Optimization (LTO) (Clang ThinLTO (EXPERIMENTAL)) --->
    ( ) None
    ( ) Clang Full LTO (EXPERIMENTAL)
    (X) Clang ThinLTO (EXPERIMENTAL)

The difference between the two is that ThinLTO compiles faster, thanks to parallelization, and uses less memory. ThinLTO may either improve or decrease performance.

Snippet

FILE /etc/kernel/config.d/clang-thin-lto-linux6-1-111.config
CONFIG_LTO=y
CONFIG_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG=y
CONFIG_ARCH_SUPPORTS_LTO_CLANG_THIN=y
CONFIG_HAS_LTO_CLANG=y
# CONFIG_LTO_NONE is not set
# CONFIG_LTO_CLANG_FULL is not set
CONFIG_LTO_CLANG_THIN=y

GCC PGO

The information in this section is probably outdated. You can help the Gentoo community by verifying and updating this section.
Note
The following instructions are from Yuan-ApSys-14, Yuan-APSys-15, and Yuan-ScienceChina-18.

To use Profile-Guided Optimization, activate debugfs and gcov support (See this for modern info):

KERNEL Enable debugfs (CONFIG_DEBUG_FS) and gcov (CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL) support
Kernel hacking --->
  Generic Kernel Debugging Instruments --->
    [*] Debug Filesystem
General architecture-dependent options --->
  GCOV-based kernel profiling --->
    [*] Enable gcov-based kernel profiling
    [*] Profile entire Kernel

The environment variable CFLAGS_GCOV, used when CONFIG_GCOV_KERNEL is on, defaults to -fprofile-arcs -ftest-coverage, but can be changed to -fprofile-generate -ftest-coverage or similar in Instrumentation Options:

root #make CFLAGS_GCOV="-fprofile-generate -ftest-coverage"

Then build as usual, set up the kernel, and reboot the system:

root #reboot
Important
The kernel will run slower and be larger because it has been instrumented to collect data, such as how many times each line of code executes. This data is necessary to build the PGO kernel.

After booting back into the system, exercise it with many programs: play sound, run a game, run Firefox, and so on. The longer the system runs and the more varied the programs, the more profile data is collected. When satisfied with the collected data, copy the /sys/kernel/debug/gcov/usr/src/linux/*gcda files to /usr/src/linux:

root #cd /sys/kernel/debug/gcov/usr/src/linux
root #find . -name '*.gcda' -exec cp {} /usr/src/linux/{} \;
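
The copy step can also be written as a small function that creates missing destination directories first. This is a sketch rather than a tested procedure for every kernel version: copy_gcda is a hypothetical name, and it reads each file with cat on the assumption that files under debugfs may not report a meaningful size.

```shell
# copy_gcda SRC DST
# Hypothetical helper: copy every *.gcda file under SRC to the same
# relative path under DST. DST must be an absolute path.
copy_gcda() {
    src=$1
    dst=$2
    (
        cd "$src" || exit 1
        find . -name '*.gcda' | while IFS= read -r file; do
            mkdir -p "$dst/$(dirname "$file")" || exit 1
            # Read the file as a stream instead of relying on its size.
            cat "$file" > "$dst/$file" || exit 1
        done
    )
}
```

With the paths used in this section, the call would be copy_gcda /sys/kernel/debug/gcov/usr/src/linux /usr/src/linux.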

Then disable CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL and edit the KCFLAGS:

KERNEL Disable gcov (CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL) support
General architecture-dependent options --->
  GCOV-based kernel profiling --->
    [ ] Enable gcov-based kernel profiling
    [ ] Profile entire Kernel
root #make KCFLAGS="-fprofile-use -fprofile-correction -Wno-error=missing-profile -Wno-error=coverage-mismatch"

As before, set up the kernel and reboot. To remove the /usr/src/linux/*gcda files, run:

root #cd /usr/src/linux
root #find . -name '*.gcda' -exec rm {} \;

Clang PGO

Note
The following instructions are from CachyOS's kernel patch.

Download the patch and apply it:

root #git apply clang-pgo.patch

Configure the kernel with LLVM:

root #make menuconfig LLVM=1

Configure the kernel as follows:

KERNEL Disable Clang's LTO (CONFIG_LTO_CLANG_FULL and CONFIG_LTO_CLANG_THIN) support. Then enable Clang's PGO (CONFIG_PGO_CLANG)
General architecture-dependent options --->
  Link Time Optimization (LTO) (None) --->
    (X) None
    ( ) Clang Full LTO (EXPERIMENTAL)
    ( ) Clang ThinLTO (EXPERIMENTAL)
  Profile Guided Optimization (PGO) (EXPERIMENTAL)  --->
    [*] Enable clang's PGO-based kernel profiling

Then build as usual, set up the kernel, and reboot the system:

root #reboot
Important
The kernel will run slower and be larger because it has been instrumented to collect data, such as how many times each line of code executes. This data is necessary to build the PGO kernel.

After booting back into the system, clear any PGO data:

root #echo 1 | tee /proc/pgo/reset

Exercise the system with many programs: play sound, run a game, run Firefox, and so on. The longer the system runs and the more varied the programs, the more profile data is collected. When satisfied with the collected data, copy the raw profile data:

root #cp -a /proc/pgo/vmlinux.profraw /tmp/vmlinux.profraw

Then process the raw profile data using llvm-profdata:

user $cd /usr/src/linux
root #llvm-profdata merge --output=vmlinux.profdata /tmp/vmlinux.profraw

Disable Clang's PGO and optionally enable Clang's LTO:

root #make menuconfig LLVM=1
KERNEL Optionally enable Clang's LTO (CONFIG_LTO_CLANG_FULL and CONFIG_LTO_CLANG_THIN) support. Then disable Clang's PGO (CONFIG_PGO_CLANG)
General architecture-dependent options --->
  Link Time Optimization (LTO) (Clang ThinLTO (EXPERIMENTAL)) --->
    ( ) None
    ( ) Clang Full LTO (EXPERIMENTAL)
    (X) Clang ThinLTO (EXPERIMENTAL)
  Profile Guided Optimization (PGO) (EXPERIMENTAL)  --->
    [ ] Enable clang's PGO-based kernel profiling

Then compile the kernel with Clang's PGO and enjoy the faster kernel:

root #make LLVM=1 KCFLAGS=-fprofile-use=/usr/src/linux/vmlinux.profdata

The user may now remove the patch:

root #git apply clang-pgo.patch --reverse
root #rm clang-pgo.patch

Hardened

Hardening refers to reducing the potential for malware to damage the system.

Important
Removing module support (CONFIG_MODULES) prevents the kernel from loading code at runtime, but many drivers will not work without modules. The alternative is to allow only signed modules.
KERNEL Enable hardening
Processor type and features  --->
  [*]   Randomize the address of the kernel image (KASLR)
Power management and ACPI options  --->
  [ ] Hibernation (aka 'suspend to disk')
Memory Management options  --->
  [ ] Disable heap randomization
Security options --->
  Kernel hardening options --->
    Memory initialization --->
      Initialize kernel stack variables at function entry (zero-init everything (strongest and safest)) --->
        ( ) no automatic stack variable initialization (weakest)
        ( ) pattern-init everything (strongest)
        (X) zero-init everything (strongest and safest)
      [*] Poison kernel stack before returning from syscalls
      [*] Enable heap memory zeroing on allocation by default 
      [*] Enable heap memory zeroing on free by default
      [*] Enable register zeroing on function exit
Kernel hacking  --->
  Memory Debugging  --->
    [*] Debug VM translations
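
To check an existing .config against the menu settings above, a short audit function can help. This is a sketch under stated assumptions: audit_config is a hypothetical helper, and the CONFIG symbol names in it are an assumed mapping from the menu entries, so verify them against the kernel version in use. The stack poisoning entry is left out because its symbol name is not given in this article.

```shell
# audit_config FILE
# Hypothetical helper: report options in FILE that differ from the
# hardening settings shown above. Symbol names are assumptions.
audit_config() {
    file=$1
    status=0
    want_on="CONFIG_RANDOMIZE_BASE CONFIG_INIT_STACK_ALL_ZERO
             CONFIG_INIT_ON_ALLOC_DEFAULT_ON CONFIG_INIT_ON_FREE_DEFAULT_ON
             CONFIG_ZERO_CALL_USED_REGS CONFIG_DEBUG_VIRTUAL"
    want_off="CONFIG_HIBERNATION CONFIG_COMPAT_BRK"
    for opt in $want_on; do
        grep -q -x -e "${opt}=y" "$file" ||
            { printf 'not enabled: %s\n' "$opt"; status=1; }
    done
    for opt in $want_off; do
        grep -q -e "^${opt}=[ym]\$" "$file" &&
            { printf 'still enabled: %s\n' "$opt"; status=1; }
    done
    return "$status"
}
```

Running audit_config /usr/src/linux/.config prints nothing and returns 0 when every listed option matches.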

Pietinger, Kicksecure, and Clip OS list further hardening config options for the kernel.

Size

This section describes reducing kernel memory usage (useful for embedded systems).

Kernel 5.4 and later officially support the -Os flag:

KERNEL Enable -Os (CONFIG_CC_OPTIMIZE_FOR_SIZE)
General setup --->
  Compiler optimization level (Optimize for size (-Os)) --->
    ( ) Optimize for performance (-O2)
    (X) Optimize for size (-Os)

-Oz may be used instead to reduce size more aggressively than -Os:

root #make KCFLAGS="-Oz"

References