Power management/Processor
This article describes the setup of power management for processors.
CPU frequency scaling
CPU frequency scaling is a technique whereby the frequency (and voltage) of a processor is adjusted automatically "on the fly" to conserve power. This improves the battery life of mobile devices and reduces the amount of heat generated by the chip, which in turn lessens the cooling requirements. The scaling can react to system load, be controlled by userspace tools, or react to ACPI events.
The ACPI specification describes the scaling mechanism as performance states - P-states or Processor Performance States.[1] The state labeled as P0 is used for the processor's highest possible frequency and P1-Pn states are used for lower frequencies.
A lower processor frequency means fewer instructions are processed per unit of time, so a balance between power saving and performance has to be found.
The kernel CPUFreq subsystem[2] is responsible for handling the frequency scaling. This subsystem provides two basic means of changing the scaling behavior:
- Scaling Governors - provide different approaches to estimate the desired processor frequency using different scaling algorithms.
- Scaling Drivers - provide an interface between scaling governors and the specific hardware. A scaling driver can read and write hardware-specific values on behalf of the governor.
The CPUFreq subsystem exposes multiple sysfs interfaces. The most useful one is the per-processor directory /sys/devices/system/cpu/cpu*/cpufreq/. This directory contains various files, like:
- cpuinfo_cur_freq - current frequency in kHz as reported by the processor.
- cpuinfo_min_freq - minimal possible frequency in kHz as reported by the processor.
- cpuinfo_max_freq - maximal possible frequency in kHz as reported by the processor.
- scaling_governor - currently used scaling governor. It can be changed by writing to this file.
- scaling_driver - currently used scaling driver. This file is read-only.
- scaling_min_freq - minimal processor frequency in kHz to be used by the governor. It can be set by writing to this file.
- scaling_max_freq - maximal processor frequency in kHz to be used by the governor. It can be set by writing to this file.
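As a rough illustration, the files listed above can be read in one pass with a small shell function. This is only a sketch: the function name cpufreq_summary and its optional argument (an alternative sysfs root, handy for dry runs) are made up for this example and are not part of any package.

```shell
# Print driver, governor and frequency limits for every CPU that has a
# cpufreq directory. The optional argument overrides the sysfs root
# (hypothetical convenience, only used for testing).
cpufreq_summary() {
    local cpu_root="${1:-/sys/devices/system/cpu}"
    local dir cpu driver governor min_khz max_khz
    for dir in "$cpu_root"/cpu[0-9]*/cpufreq; do
        # Skip the unexpanded glob when no CPU exposes cpufreq.
        [ -d "$dir" ] || continue
        cpu="${dir%/cpufreq}"
        cpu="${cpu##*/}"
        driver=$(cat "$dir/scaling_driver")
        governor=$(cat "$dir/scaling_governor")
        min_khz=$(cat "$dir/scaling_min_freq")
        max_khz=$(cat "$dir/scaling_max_freq")
        # sysfs reports kHz; integer division converts to MHz.
        echo "$cpu: driver=$driver governor=$governor range=$((min_khz / 1000))-$((max_khz / 1000)) MHz"
    done
}
```

Calling cpufreq_summary without arguments prints one line per CPU; reading these files does not normally require root privileges.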
Installation
BIOS
Some functions can be enabled or disabled in the BIOS. Check that the following, if available, are enabled:
- "Processor C1E support"
- "Enhanced Intel SpeedStep (EIST)"
- "AMD Cool'n'Quiet (C&Q)"
- "AMD PowerNow!"
Kernel
Activate the following kernel options:
Power management and ACPI options --->
[*] ACPI (Advanced Configuration and Power Interface) Support --->
<*> Processor
CPU Frequency scaling --->
-*- CPU Frequency scaling
[*] CPU frequency transition statistics
Default CPUFreq governor (ondemand) --->
Select a default governor; see below table
Default is 'ondemand'
*** CPU frequency scaling drivers ***
Select a driver; see below table
Both a CPUFreq governor and a scaling driver need to be enabled. The available governors are:
Option | Module | Description | Note |
---|---|---|---|
'performance' governor | cpufreq_performance | Sets the frequency statically to the highest available processor frequency as defined by the file /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq. | For recent Intel Core processors, this should be selected as default. [3] [4] |
'powersave' governor | cpufreq_powersave | Sets the frequency statically to the lowest available processor frequency as defined by the file /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq. | Can't be set as default. |
'userspace' governor for userspace frequency scaling | cpufreq_userspace | To set the CPU frequency manually (via the file /sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed) or when a userspace program shall be able to set the processor frequency dynamically. | |
'ondemand' cpufreq policy governor | cpufreq_ondemand | Does a periodic polling and immediately changes frequency based on the processor load. | For processors other than Intel Core, this should be selected as default. |
'conservative' cpufreq governor | cpufreq_conservative | Similar to 'ondemand'. The frequency is increased and decreased gradually rather than jumping straight to the maximum when speed is required. | |
'schedutil' cpufreq policy governor | cpufreq_schedutil | Aimed at driving the frequency changes by the kernel scheduler.[5] | |
The name of the active CPUFreq governor is available in: /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
The behavior of the active governor can be further configured via tunables exposed through sysfs. For more details see the dedicated documentation. Commonly used sysfs tunables include:
- schedutil - /sys/devices/system/cpu/cpufreq/schedutil/rate_limit_us sets the minimal interval in μs between consecutive governor runs.
- ondemand - /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate sets the interval in μs between consecutive load sampling runs.
- conservative - /sys/devices/system/cpu/cpufreq/conservative/freq_step sets the maximal frequency change step as % of scaling_max_freq.
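The tunables of a governor can be inspected with a short helper such as the one below. It is a sketch that assumes the global tunables directory used in the list above; on some systems the tunables may instead live in a per-policy directory, in which case this path does not exist. The function name governor_tunables and its second argument are hypothetical.

```shell
# List every tunable of the given governor as name=value.
# $1: governor name (e.g. ondemand)
# $2: optional sysfs root override (hypothetical, for testing)
governor_tunables() {
    local governor="$1"
    local cpu_root="${2:-/sys/devices/system/cpu}"
    local dir="$cpu_root/cpufreq/$governor"
    local file
    if [ ! -d "$dir" ]; then
        echo "no tunables directory for '$governor' at $dir" >&2
        return 1
    fi
    for file in "$dir"/*; do
        [ -f "$file" ] || continue
        echo "${file##*/}=$(cat "$file")"
    done
}
```

For example, governor_tunables ondemand lists sampling_rate together with the other ondemand tunables. A value is changed by writing to the corresponding file as root.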
Option | Module / Kernel symbol | Supported Processors | Note |
---|---|---|---|
Intel P state control | intel_pstate (CONFIG_X86_INTEL_PSTATE) | recent (Sandy Bridge+) Intel Core | Implements an internal scaling governor. Shows itself as intel_cpufreq on Intel processors lacking Hardware P-States (HWP) (hwp CPU flag) support.[6] |
AMD Processor P-State driver[7] | amd-pstate (CONFIG_X86_AMD_PSTATE) | AMD Zen 2 and newer | Provides more fine-grained frequency steps compared to the standard acpi-cpufreq driver.[7] Shows itself as amd_pstate_epp when its internal scaling governor implementation is active. Requires kernel v5.17 or newer. |
ACPI Processor P-States driver | acpi-cpufreq (CONFIG_X86_ACPI_CPUFREQ) | AMD Zen 1-based EPYC/Ryzen, older Intel Core (pre-Sandy Bridge)/Xeon, AMD Opteron/Phenom, Intel Atom, Intel Pentium M | Acts as a generic CPUFreq driver. Utilizes ACPI Performance States. Note, for AMD processors it is limited to only 3 frequency steps unlike amd-pstate.[7] |
AMD Opteron/Athlon64 PowerNow! | powernow-k8 (CONFIG_X86_POWERNOW_K8) | K8-based AMD Opteron, AMD Athlon 64, AMD Turion 64 | Supports older AMD K8-based processors. |
Intel Enhanced SpeedStep (deprecated) | speedstep-centrino (CONFIG_X86_SPEEDSTEP_CENTRINO) | Intel Pentium M (Centrino)/Xeon | Deprecated, use ACPI Processor P-States driver instead. |
Intel Pentium 4 clock modulation | p4-clockmod (CONFIG_X86_P4_CLOCKMOD) | Intel Pentium 4/Xeon | Not recommended - causes severe slowdowns and noticeable latency. |
Processor Clocking Control interface driver | pcc-cpufreq (CONFIG_X86_PCC_CPUFREQ) | x86 processors supporting the Processor Clocking Control (PCC) interface | Adds support for the PCC interface. Might be useful for HP servers supporting the interface.[8] |
The availability of drivers depends on the processor architecture.
The name of the active CPUFreq driver is available in: /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
Specific CPU scaling drivers settings
Intel P-state
This driver implements internal scaling governors (roughly similar to CPUFreq's powersave and performance) and works based on the processor load. It is intended for recent Intel Core series of processors (based on the Sandy Bridge microarchitecture or newer).
This driver works in either active mode (intel_pstate), for processors featuring Hardware P-States (HWP), or passive mode (intel_cpufreq). Passive mode applies to processors without HWP support, that is, generations prior to the Skylake microarchitecture, on which no hwp CPU flag is present.
In active mode the processor autonomously sets the frequency based on the provided CPUFreq parameters, which passes control of frequency scaling to the processor itself. In passive mode, on the other hand, the driver behaves similarly to the generic acpi-cpufreq driver and cooperates with the regular scaling governors, although it can use the full range of frequency steps.[9]
In the active mode case, the userspace, ondemand, and conservative scaling governors are unnecessary. The performance governor should be selected as the default. [10]
Power management and ACPI options --->
[*] CPU Frequency scaling --->
Default CPUFreq governor (performance) --->
-*- 'performance' governor
<*> Intel P state control
There is a sysfs interface exposed by the driver. Its root is located at the /sys/devices/system/cpu/intel_pstate/ directory. There are files like:
- no_turbo - disables the Intel Turbo Boost feature (1 means disabled and 0 means enabled). The state can be changed by writing to this file.
- status - displays the status of the driver. The value is one of off, passive, or active.
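The two files can be combined into a one-line report, sketched below. The function name intel_pstate_report and its optional directory argument are invented for this example; only the status and no_turbo files described above are read.

```shell
# Report the intel_pstate mode and whether Turbo Boost is enabled.
# $1: optional override of the driver's sysfs directory (for testing)
intel_pstate_report() {
    local dir="${1:-/sys/devices/system/cpu/intel_pstate}"
    local status no_turbo
    if [ ! -d "$dir" ]; then
        echo "intel_pstate: not present"
        return 0
    fi
    status=$(cat "$dir/status")
    no_turbo=$(cat "$dir/no_turbo")
    # no_turbo uses inverted logic: 1 means Turbo Boost is disabled.
    if [ "$no_turbo" = "1" ]; then
        echo "intel_pstate: status=$status turbo=disabled"
    else
        echo "intel_pstate: status=$status turbo=enabled"
    fi
}
```

Turbo Boost itself is toggled by writing 1 or 0 to no_turbo as root.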
AMD P-State
This driver is available in kernel v5.17 or newer.[11] It aims to provide a more effective alternative to the generic acpi-cpufreq driver and is based on Collaborative Processor Performance Control (CPPC)[12], which provides fine-grained frequency steps. The motivation is that acpi-cpufreq offers only 3 frequency control options, and its lowest frequency is typically higher than the lowest one available with amd-pstate, which makes it less effective at maximizing battery life.
It is intended for AMD Ryzen/EPYC processors based on the Zen 2 or newer microarchitecture. If the hardware support and the configuration do not match, the scaling driver falls back to acpi-cpufreq.
To verify that the driver in use did not fall back to acpi-cpufreq, read: /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver.
In order to use this driver, the "CPPC", "ACPI CPPC", or similar BIOS setting must be set to enabled or auto.
Power management and ACPI options --->
[*] CPU Frequency scaling --->
Default CPUFreq governor (performance) --->
-*- 'performance' governor
[*] AMD Processor P-State driver
<M> selftest for AMD Processor P-State driver
There is a sysfs interface exposed by the driver. Its root is located at the /sys/devices/system/cpu/amd_pstate/ directory. There are files like:
- status - displays the status of the driver. The value is one of active, passive, guided, or disable.
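The fallback check described above can be scripted roughly as follows. The function name amd_pstate_check and its optional argument are hypothetical, and the driver name patterns are an assumption meant to cover both the dash and underscore spellings of the driver name.

```shell
# Tell whether amd-pstate is in use or the system fell back to
# acpi-cpufreq. $1 optionally overrides the sysfs root (for testing).
amd_pstate_check() {
    local cpu_root="${1:-/sys/devices/system/cpu}"
    local driver status
    # Missing files are treated as empty values instead of errors.
    driver=$(cat "$cpu_root/cpu0/cpufreq/scaling_driver" 2>/dev/null) || driver=""
    status=$(cat "$cpu_root/amd_pstate/status" 2>/dev/null) || status=""
    case "$driver" in
        amd-pstate*|amd_pstate*)
            echo "amd-pstate in use (driver=$driver, status=${status:-unknown})" ;;
        acpi-cpufreq)
            echo "fell back to acpi-cpufreq" ;;
        *)
            echo "other driver: ${driver:-none}" ;;
    esac
}
```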
If the driver in use falls back to acpi-cpufreq, the following kernel command-line parameters can help the amd-pstate driver load:
- Zen 2 processors: add amd_pstate.shared_mem=1 to enable amd-pstate using its shared memory implementation.[13]
- Zen 3 or newer processors: add amd_pstate=passive. Zen 3 or newer also supports CPPC.[12]
Kernel 6.3 extended the available AMD P-State options with Energy Performance Preference (EPP) modes.[14] The new driver is referred to as amd_pstate_epp. It allows new combinations of drivers and governors such as "amd_pstate_epp powersave performance" or "amd_pstate_epp performance performance". Some benchmarks are available.
For further details on the AMD P-state driver see the documentation available upstream.
Manual governor/driver change
The active CPU governor can be changed with a simple command:
root # echo ondemand | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
The command can be executed on startup by means of the system's init system.
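A slightly more defensive variant checks that the governor is actually offered before writing it. The sketch below relies on the scaling_available_governors file, which sits next to scaling_governor but is not covered elsewhere in this article; the function name set_governor and its optional second argument are made up for illustration.

```shell
# Set the given governor on every CPU, refusing names the kernel
# does not offer. Must run as root on a real system.
# $1: governor name
# $2: optional sysfs root override (hypothetical, for testing)
set_governor() {
    local governor="$1"
    local cpu_root="${2:-/sys/devices/system/cpu}"
    local dir
    for dir in "$cpu_root"/cpu[0-9]*/cpufreq; do
        [ -d "$dir" ] || continue
        # scaling_available_governors is a space-separated list.
        if ! grep -qw -- "$governor" "$dir/scaling_available_governors"; then
            echo "$governor is not available for ${dir%/cpufreq}" >&2
            return 1
        fi
        echo "$governor" > "$dir/scaling_governor"
    done
}
```

The function stops at the first CPU that does not offer the governor, for example when the corresponding module is not loaded.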
Set governor at boot time
It is possible to set the default governor via the cpufreq.default_governor kernel command-line parameter, for example cpufreq.default_governor=schedutil.
This parameter requires kernel v5.9+.[15]
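Whether the parameter reached the running kernel can be checked against /proc/cmdline. The helper below is a sketch; the name cmdline_param and the optional file argument are hypothetical.

```shell
# Print the value of a name=value kernel command-line parameter.
# Returns non-zero when the parameter is absent.
# $1: parameter name, e.g. cpufreq.default_governor
# $2: optional file to read instead of /proc/cmdline (for testing)
cmdline_param() {
    local name="$1"
    local file="${2:-/proc/cmdline}"
    awk -v key="$name=" '
        {
            for (i = 1; i <= NF; i++) {
                # Only accept words that start with "name=".
                if (index($i, key) == 1) {
                    print substr($i, length(key) + 1)
                    found = 1
                    exit
                }
            }
        }
        END { exit !found }
    ' "$file"
}
```

For instance, cmdline_param cpufreq.default_governor prints the requested default governor, which can then be compared with the content of scaling_governor.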
Scheduling-Clock Ticks
The processor saves the most energy when it stays in its power saving mode for longer periods, so it is desirable to reduce the number of actions that wake the processor up. One of those actions is the scheduling-clock interrupt, also known as a "tick". Details about the available "tickless" modes can be found in the kernel documentation.
Installation
BIOS
Some functions can be enabled or disabled in the BIOS. Check that the following settings are enabled:
- "High Precision Event Timer"
- "HPET"
- "Multimedia timer"
Kernel
Activate the following kernel options for power saving features:
General setup --->
Timers subsystem --->
[*] Idle dynticks system (tickless idle)
[*] High Resolution Timer Support
Device Drivers --->
Character devices --->
[*] HPET Timer Support
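The options above can be verified against a kernel configuration file, as sketched below. The symbol names CONFIG_NO_HZ_IDLE, CONFIG_HIGH_RES_TIMERS and CONFIG_HPET are assumed to correspond to the three menu entries, and the function name check_kconfig is hypothetical. The configuration file location depends on the system, for example the .config file in the kernel source tree.

```shell
# Report whether each listed symbol is enabled (=y or =m) in a kernel
# configuration file. Returns non-zero if any symbol is missing.
# $1: path to the configuration file; remaining arguments: symbols
check_kconfig() {
    local config="$1"
    shift
    local symbol
    local missing=0
    for symbol in "$@"; do
        if grep -q "^$symbol=[ym]\$" "$config"; then
            echo "$symbol: enabled"
        else
            echo "$symbol: not set"
            missing=1
        fi
    done
    return "$missing"
}
```

A possible invocation is check_kconfig /usr/src/linux/.config CONFIG_NO_HZ_IDLE CONFIG_HIGH_RES_TIMERS CONFIG_HPET, assuming the kernel sources are located there.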
CPU Idle
Modern multi-core processors are often not fully loaded, which brings an opportunity to suspend the unused parts and save power. The hardware transitions the unused parts to idle states. The kernel then does not schedule regular tasks on the idle parts, only special idle tasks.
The ACPI specification describes those idle states as C-states or Processor Power States.[16] There are usually multiple C-states implemented, starting from the C0 state for a regularly running processor and continuing with C1, C2, and deeper idle states. The deeper the idle state, the greater the power saving, but also the longer the transition back to the running state.
The kernel CPUIdle subsystem[17] is responsible for handling the idle state management. Similarly to CPUFreq, this subsystem provides two basic means of idle state management: a governor and a driver. The governor attempts to predict the optimal C-state and the driver performs the transition on the hardware.
The CPUIdle subsystem exposes a sysfs interface. It is available at /sys/devices/system/cpu/cpuidle/. This directory contains various files, like:
- current_governor - currently used idle governor. It can be changed by writing to this file.
- available_governors - list of available idle governors.
- current_driver - currently used idle driver information.
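The three files can be read together with a helper like the following sketch. The function name cpuidle_report and its optional directory argument are invented for this example.

```shell
# Print the active idle driver, the active idle governor and the
# list of available governors on one line.
# $1: optional override of the cpuidle sysfs directory (for testing)
cpuidle_report() {
    local dir="${1:-/sys/devices/system/cpu/cpuidle}"
    local driver governor available
    if [ ! -d "$dir" ]; then
        echo "cpuidle: not present"
        return 0
    fi
    driver=$(cat "$dir/current_driver")
    governor=$(cat "$dir/current_governor")
    available=$(cat "$dir/available_governors")
    echo "driver=$driver governor=$governor available=$available"
}
```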
Installation
BIOS
Check that the following settings are enabled in BIOS:
- "C-States"
- "ACPI C states"
Kernel
Name | Module / Kernel symbol | Supported Processors | Note |
---|---|---|---|
Intel Idle Time Driver | intel_idle (CONFIG_INTEL_IDLE) | recent (Nehalem+) Intel Core[18] | Asks the processor part to enter the idle state using the MWAIT instruction. |
ACPI Idle Driver | acpi_idle (CONFIG_ACPI_PROCESSOR_IDLE) | AMD processors, old Intel processors | Generic idle driver |
Name | Module / Kernel symbol | Note |
---|---|---|
Ladder Governor | ladder (CONFIG_CPU_IDLE_GOV_LADDER) | Default governor for systems with allowed scheduler ticks in idle - CONFIG_NO_HZ_IDLE=n. |
Menu Governor | menu (CONFIG_CPU_IDLE_GOV_MENU) | Default governor for tickless systems - CONFIG_NO_HZ_IDLE=y. |
Timer events oriented (TEO) governor | teo (CONFIG_CPU_IDLE_GOV_TEO) | Alternative governor for tickless systems - CONFIG_NO_HZ_IDLE=y. |
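Once a driver and a governor are active, the idle states they manage can be listed per processor. The sketch below assumes the per-CPU directory /sys/devices/system/cpu/cpu0/cpuidle/ with state* subdirectories containing the name, latency and usage files; this layout is not described elsewhere in this article, and the function name cpuidle_states is hypothetical.

```shell
# List the idle states of one CPU with their exit latency (in μs)
# and the number of times each state was entered.
# $1: optional override of the per-CPU cpuidle directory (for testing)
cpuidle_states() {
    local dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    local state
    for state in "$dir"/state[0-9]*; do
        # Skip the unexpanded glob when no states are exposed.
        [ -d "$state" ] || continue
        echo "${state##*/}: $(cat "$state/name") latency=$(cat "$state/latency")us usage=$(cat "$state/usage")"
    done
}
```

Deeper states typically show a higher latency value, which reflects the longer transition back to the running state.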
Tools
PowerTOP
PowerTOP is a utility designed to measure, explain and minimize a computer's electrical power consumption.
When it is run, it sorts the running processes in order of how often they cause the processor to wake up. For details on installation, configuration and usage see the separate PowerTOP article.
cpupower
The sys-power/cpupower package provides a set of tools to comfortably manage and monitor processor powersaving features. The tools include cpupower frequency-info, cpupower frequency-set, and cpupower monitor.
hprofile
hprofile allows automating some of the decisions involved in governing CPU frequency. For instance, when not wired to AC power, most users would like to have the system in a power saving mode.
This is where hprofile comes into play. Please refer to its article for more information and configuration.
See also
- ACPI — a power management system that is part of the BIOS.
External resources
- What exactly is a P-state? (Pt. 1) - An Intel article (kind of) explaining P-states.
- Linux's "Ondemand" Governor Is No Longer Fit - Explains why ondemand should not be used for newer Intel Core processors.
References
- ↑ 8. Processor Configuration and Control — ACPI Specification 6.4 documentation, UEFI Forum, Inc. Retrieved 9 September 2023.
- ↑ CPU Performance Scaling, The kernel development community. Retrieved 9 September 2023.
- ↑ Dominik Brodowski. Intel P-State driver, CPU frequency and voltage scaling code in the Linux(TM) kernel. Retrieved 12 June 2016.
- ↑ Michael Larabel. Linux's "Ondemand" Governor Is No Longer Fit. Retrieved 15 October 2016.
- ↑ Improvements in CPU frequency management, LWN.net, Neil Brown, 6 April 2016. Retrieved 12 January 2022.
- ↑ intel_pstate CPU Performance Scaling Driver, kernel.org, Rafael J. Wysocki. Retrieved 12 January 2022.
- ↑ 7.0 7.1 7.2 amd-pstate CPU Performance Scaling Driver, The kernel development community. Retrieved 9 September 2023.
- ↑ Platform-based Power Management and Linux, Bdale Garbee and Naga Chumbalkar. Retrieved 9 September 2023.
- ↑ intel_pstate CPU Performance Scaling Driver, The kernel development community. Retrieved 9 September 2023.
- ↑ Dominik Brodowski. Intel P-State driver, CPU frequency and voltage scaling code in the Linux(TM) kernel. Retrieved 12 June 2016.
- ↑ AMD P-State Driver To Premiere In Linux 5.17 With Aim To Deliver Better Power Efficiency, Michael Larabel. Retrieved 9 September 2023.
- ↑ 12.0 12.1 Collaborative Processor Performance Control (CPPC), The kernel development community. Retrieved 9 September 2023.
- ↑ How to enable amd-pstate?, Manjaro.org. Retrieved 9 September 2023.
- ↑ Ryzen Mobile Power/Performance With Linux 6.3's New AMD P-State EPP Driver, Michael Larabel. Retrieved 9 September 2023.
- ↑ The kernel’s command-line parameters, The kernel development community. Retrieved 9 September 2023.
- ↑ 8.1. Processor Power States — ACPI Specification 6.4 documentation, UEFI Forum, Inc. Retrieved 10 September 2023.
- ↑ CPU Idle Time Management, The kernel development community. Retrieved 10 September 2023.
- ↑ intel_idle CPU Idle Time Management Driver, The kernel development community. Retrieved 10 September 2023.