DTrace
DTrace is a dynamic tracing tool for analysing or debugging the whole system. It can analyse and inspect both the kernel (inc. modules) and userland applications. It might be used for both analysing performance issues, but also analysing unexpected or wrong behavior from either userland or the kernel.
Installation
Kernel configuration
Kernel debug info is incompatible with CONFIG_GCC_PLUGIN_RANDSTRUCT, see https://gcc.gnu.org/PR84052. A workaround is disabling hardened on dist-kernels if using them for now.
This modern DTrace implementation is based on eBPF and requires a suite of kernel configuration options which the ebuild checks for. In particular:
General Setup -> Compile-time checks and compiler options -> <*> Configure standard kernel features (expert users) -> <*> Load all symbols for debugging/ksymoops <*> Include all symbols in kallsyms Search for <code>CONFIG_KALLSYMS_ALL</code> to find this item. File systems -> <*> FUSE (Filesystem in Userspace) support Search for <code>CONFIG_FUSE_FS</code> to find this item. <*> Character device in Userspace support Search for <code>CONFIG_CUSE</code> to find this item. Kernel Hacking -> <*> Tracers -> Search for <code>CONFIG_FTRACE</code> to find this item. <*> Trace syscalls Search for <code>CONFIG_FTRACE_SYSCALLS</code> to find this item. <*> Enable uprobes-based dynamic events Search for <code>CONFIG_UPROBE_EVENTS</code> to find this item. <*> enable/disable function tracing dynamically Search for <code>CONFIG_DYNAMIC_FTRACE</code> to find this item. <*> Kernel Function Tracer Search for <code>CONFIG_FUNCTION_TRACER</code> to find this item. <*> Generate BTF typeinfo Search for <code>CONFIG_DEBUG_INFO_BTF</code> to find this item.
dist-kernel
When using a Distribution Kernel, enabling debug for the relevant *-kernel package covers some of these.
If using gentoo-kernel[hardened], the following workaround config.d snippet can be used:
# CONFIG_RANDSTRUCT_NONE is not set
# CONFIG_RANDSTRUCT_FULL is not set
# CONFIG_RANDSTRUCT_PERFORMANCE is not set
# CONFIG_RANDSTRUCT is not set
# CONFIG_GCC_PLUGIN_RANDSTRUCT is not set
A general snippet for DTrace is below:
CONFIG_CUSE=m
CONFIG_FPROBE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_NONE is not set
CONFIG_DEBUG_INFO_DWARF5=y
# CONFIG_DEBUG_INFO_REDUCED is not set
# CONFIG_DEBUG_INFO_COMPRESSED_NONE is not set
CONFIG_DEBUG_INFO_COMPRESSED_ZLIB=y
# CONFIG_DEBUG_INFO_COMPRESSED_ZSTD is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
CONFIG_DEBUG_INFO_BTF=y
CONFIG_PAHOLE_HAS_SPLIT_BTF=y
CONFIG_PAHOLE_HAS_LANG_EXCLUDE=y
CONFIG_DEBUG_INFO_BTF_MODULES=y
# CONFIG_MODULE_ALLOW_BTF_MISMATCH is not set
# CONFIG_GDB_SCRIPTS is not set
CONFIG_PROBE_EVENTS_BTF_ARGS=y
Remember, the above only works with sys-kernel/gentoo-kernel
Userland
Modern versions of DTrace are Extended Berkeley Packet Filter (eBPF)-based, so there's no need for a separate kernel module, or any patches.
Simply install the userland tools:
root #
emerge --ask dev-debug/dtrace
sys-devel/bpf-toolchain is used to avoid needing to manually build a cross-compiler with Crossdev.
systemd
root #
systemctl enable --now dtprobed
OpenRC
root #
rc-update add dtprobed default && rc-service dtprobed start
Getting started
The DTrace framework is centred around two concepts:
- D scripts (not to be confused with D programming language, also called Dlang): these can be one-liners or whole scripts with a traditional shebang for convenience;
- probes which attach to some system component of interest, like a part of a kernel module, or userland application
Other terminology:
- providers publishes available probes
- modules provide an implementation of a probe
For example, the syscall provider, via the vmlinux module, has various probes for each syscall at entry and return.
Following the official D script tutorial is strongly recommended.
Syntax
D Script's syntax is roughly Awk-like. Below is a basic skeleton:
#!/usr/bin/env dtrace -s
PROBE
/optional predicate condition/
{
/* Action */
}
BEGIN and END blocks are special, like constructors/destructors:
#!/usr/bin/env dtrace -s
/* Sample script which starts up, prints a message, then exits and shows
another message before terminating. */
BEGIN
{
i = 0;
printf("Starting up with i=%d\n", i);
exit(0);
}
END
{
i++;
printf("Shutting down with i=%d\n", i);
}
A fully qualified probe name takes the form provider:module:function:name. Wildcards can be used to match multiple probes. To leave a field blank, use a colon-separator, e.g. provider::function:name. In general, DTrace tries to solve ambiguous names rather than bailing out.
For more information, see the D Program Syntax Reference.
Usage
DTrace shines in investigating emerging incidents where one-liners or small scripts can quickly be written to both gather information but also to prove or disprove a theory about root causes. As a result, the most important thing for using DTrace is being familiar with its primitives, rather than having a large corpus of pre-existing scripts ready to use.
Investigating locks
Suppose a system has a mysterious lock file being rapidly created and then deleted.
Artificially create such a scenario with a loop:
root #
while true ; do flock -s $(mktemp) sleep 1 ; done
This can be easily investigated:
#!/usr/bin/env dtrace -s
#pragma D option quiet
syscall::openat:entry
{
/* Record the pathname arg to openat(2). */
self->path = args[1];
}
syscall::openat:return
{
/* Store the fd that openat(2) returns, corresponding to the pathname from earlier. */
map[args[0]] = self->path;
}
syscall::flock:entry
{
/* Log the fd we're locking. */
trace(stringof(args[0]));
/* Use the information we recorded earlier to show what PID is locking what file. */
printf("Lock was taken by pid=%d for file=%s\n", pid, stringof(map[args[0]]));
}
Probing a probe
When writing a script, one might need to know the argument types (prototype) of a probe. For example, to investigate the parameters for syscall::vmlinux::openat and syscall::vmlinux::openat2:
root #
dtrace -l -n openat*:entry -v
ID PROVIDER MODULE FUNCTION NAME 124115 syscall vmlinux openat entry [...] Argument Types args[0]: int args[1]: char * args[2]: int args[3]: umode_t 124113 syscall vmlinux openat2 entry [...] Argument Types args[0]: int args[1]: char * args[2]: struct open_how * args[3]: size_t
Troubleshooting
Basic sanity checks
First, have DTrace enable a trivial probe and return once it is hit:
root #
dtrace -n 'BEGIN { exit(0); }'
dtrace: description 'BEGIN ' matched 1 probe CPU ID FUNCTION:NAME 28 1 :BEGIN
Check whether the syscall probe has very basic functionality:
root #
dtrace -n 'syscall:::entry { exit(0); }'
dtrace: description 'syscall:::entry ' matched 344 probes CPU ID FUNCTION:NAME 3 124015 bpf:entry
If either of these commands hang or don't produce a table as above, DTrace isn't working correctly. It may be caused by exotic optimization flags.
All the available probes can be listed with dtrace -l:
root #
dtrace -l
ID PROVIDER MODULE FUNCTION NAME 1 dtrace BEGIN 2 dtrace END 3 dtrace ERROR 4 cpc perf_count_hw_cpu_cycles-all-1000000000 5 cpc cycles-all-1000000000 6 cpc cpu_cycles-all-1000000000 7 cpc perf_count_hw_instructions-all-1000000000 8 cpc instructions-all-1000000000 [...]
As of August 2024, on an ~amd64 system with linux-6.6, around 125000 probes are registered. If the number is substantially lower than that, it's possible some required kernel config options are not enabled.
Faulting on a bad address
Using a valid argument in an entry probe which the process hasn't yet accessed might cause a fault. To workaround this, store the argument in this->foo with copyinstr.
Online examples not working
Some examples online may not work for a few reasons:
- Relying on unstable kernel ABI so 'function' names or types may have changed
- The example relies on assumptions about libc and how it wraps syscalls
- Overly specific names
Below are a few examples of both a problem and analysis to fix it. The same methods can be applied to other bad examples on the internet.
execveat_common
For example, consider this script for tracing execve:
root #
dtrace -n 'proc::do_execveat_common:exec { trace(stringof(args[0])); }'
dtrace: invalid probe specifier proc::do_execveat_common:exec { trace(stringof(args[0])); }: probe description proc::do_execveat_common:exec does not match any probes
The specification proc::do_execveat_common:exec is prone to internal changes. BPF-based DTrace provides a tracepoint for exec elsewhere.
To fix it, use the more generic proc tracepoint:
root #
dtrace -n 'proc:::exec { trace(stringof(args[0])); }'
dtrace: description 'proc:::exec ' matched 1 probe CPU ID FUNCTION:NAME 2 123651 :exec /usr/sbin/utempter 3 123651 :exec /usr/sbin/utempter 10 123651 :exec /bin/bash 1 123651 :exec /usr/bin/dircolors
open
For example, consider this script for tracing open:
root #
dtrace -n 'syscall::open:entry { printf("%-16s %-16s\n",execname,copyinstr(arg0)); }'
dtrace: description 'syscall::open:entry ' matched 1 probe [Likely to have no further output]
Most functions used to open files don't actually use the open syscall in glibc, but one of the other wrappers instead. Note that arg0 changes to arg1 here too as we see openat is being used which has different parameters:
root #
dtrace -n 'syscall::open*:entry { printf("%-16s %-16s\n",execname,copyinstr(arg1)); }'
dtrace: description 'syscall::open*:entry ' matched 5 probes CPU ID FUNCTION:NAME 5 128754 openat:entry cc1plus /usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algobase.h 2 128754 openat:entry cc1plus /usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_heap.h 2 128754 openat:entry cc1plus /usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/uniform_int_dist.h 2 128754 openat:entry cc1plus /usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_construct.h 2 128754 openat:entry cc1plus /usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_tempbuf.h
See also
- SystemTap — powerful tool that provides an infrastructure to simplify the gathering of information about the running Linux kernel or userspace programs