Sysbench
sysbench provides benchmarking capabilities for Linux. sysbench supports testing CPU, memory, file I/O, mutex performance, and even MySQL benchmarking.
Installation
USE flags
USE flags for app-benchmarks/sysbench A scriptable multi-threaded benchmark tool based on LuaJIT
+aio
|
Add support for AIO. |
+largefile
|
Add support for large files. |
attachsql
|
Add support for AttachSQL. |
drizzle
|
Add support for Drizzles. |
mysql
|
Add mySQL Database support |
postgres
|
Add support for the postgresql database |
test
|
Enable dependencies and/or preparations necessary to run tests (usually controlled by FEATURES=test but can be toggled independently) |
Emerge
Install app-benchmarks/sysbench:
root #
emerge --ask sysbench
Usage
As mentioned, sysbench supports several benchmark workloads: fileio, cpu, memory, threads, mutex, oltp.
Using the fileio workload
When using fileio, you will need to create a set of test files to work on. It is recommended that the size is larger than the available memory to ensure that file caching does not influence the workload too much.
user $
sysbench --test=fileio --file-total-size=128G prepare
user $
sysbench --test=fileio --file-total-size=128G --file-test-mode=rndrw --max-time=300 --max-requests=0 run
user $
sysbench --test=fileio --file-total-size=128G cleanup
As this is I/O benchmarking, you can tell sysbench which kind of workload you want to run: sequential reads, writes or random reads, writes, or a combination. In the above example, random read/write is used (rndrw). The duration of the test is given through the --max-time
option (in seconds).
The output of a run is shown below:
user $
sysbench --test=fileio --file-total-size=32G --file-test-mode=rndrw --max-time=300 --max-requests=0 run
sysbench 0.4.12: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 1 Extra file open flags: 0 128 files, 256Mb each 32Gb total file size Block size 16Kb Number of random requests for random IO: 0 Read/Write ratio for combined random IO test: 1.50 Periodic FSYNC enabled, calling fsync() each 100 requests. Calling fsync() at the end of test, Enabled. Using synchronous I/O mode Doing random r/w test Threads started! Time limit exceeded, exiting... Done. Operations performed: 14788 Read, 9858 Write, 31488 Other = 56134 Total Read 231.06Mb Written 154.03Mb Total transferred 385.09Mb (1.2836Mb/sec) 82.15 Requests/sec executed Test execution summary: total time: 300.0149s total number of events: 24646 total time taken by event execution: 173.9797 per-request statistics: min: 0.01ms avg: 7.06ms max: 92.72ms approx. 95 percentile: 16.57ms Threads fairness: events (avg/stddev): 24646.0000/0.00 execution time (avg/stddev): 173.9797/0.00
The important part to look at is the information regarding the operations:
Operations performed: 14788 Read, 9858 Write, 31488 Other = 56134 Total Read 231.06Mb Written 154.03Mb Total transferred 385.09Mb (1.2836Mb/sec)
These numbers can be compared with runs on different file systems, other systems, etc.
Using the CPU workload
When running with the CPU workload, sysbench will verify prime numbers by doing standard division of the number by all numbers between 2 and the square root of the number. If any number gives a remainder of 0, the next number is calculated. As you can imagine, this will put some stress on the CPU, but only on a very limited set of the CPUs features.
The benchmark can be configured with the number of simultaneous threads and the maximum number to verify if it is a prime.
user $
sysbench --test=cpu --cpu-max-prime=20000 --num-threads=2 run
sysbench 0.4.12: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 2 Doing CPU performance benchmark Threads started! Done. Maximum prime number checked in CPU test: 20000 Test execution summary: total time: 18.0683s total number of events: 10000 total time taken by event execution: 36.1322 per-request statistics: min: 3.44ms avg: 3.61ms max: 6.77ms approx. 95 percentile: 5.05ms Threads fairness: events (avg/stddev): 5000.0000/7.00 execution time (avg/stddev): 18.0661/0.00
The number to verify with other systems is given by the execution summary:
total time: 18.0683s total time taken by event execution: 36.1322
The event execution time is the pure calculation part. If you run the test with multiple threads, it is the sum of the time of all threads. The total time is the end-to-end time, and as such includes the overhead of shared memory access for the threads (although this is usually negligible). Unlike the event execution time, the total time is the duration from start to finish (so no culmination of individual times of the threads).
While older versions of sysbench limit the number of events, sysbench >=1.0 limits the total execution time by default, in which case the total number of events or the number of events per second should be used as a performance indicator instead of the total execution time[1].
Using the threads workload
With the threads workload, each worker thread will be allocated a mutex (a sort of lock) and will, for each execution, loop a number of times (documented as the number of yields) in which it takes the lock, yields (meaning it asks the scheduler to stop itself from running and put it back and the end of the runqueue) and then, when it is scheduled again for execution, unlock.
By tuning the various parameters, one can simulate situations with high concurrent threading with the same lock, or high concurrent threading with several different locks, etc.
user $
sysbench --test=threads --thread-locks=1 --max-time=20s run
sysbench 0.4.12: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 1 Doing thread subsystem performance test Thread yields per test: 1000 Locks used: 1 Threads started! Time limit exceeded, exiting... Done. Test execution summary: total time: 20.0052s total number of events: 1894 total time taken by event execution: 19.9653 per-request statistics: min: 10.22ms avg: 10.54ms max: 13.42ms approx. 95 percentile: 10.71ms Threads fairness: events (avg/stddev): 1894.0000/0.00 execution time (avg/stddev): 19.9653/0.00
When using the --max-time
option, the number you want to use for comparing systems is the per-request statistic. In the above case, a single request (of on average 10.54ms) has run the lock-yield-unlock process 1000 times. Or put differently, on average, the lock-yield-unlock process took about 10.54 microseconds on average.
Using the mutex workload
When using the mutex workload, the sysbench application will run a single request per thread. This request will first put some stress on the CPU (using a simple incremental loop, through the --mutex-loops
parameter) after which it takes a random mutex (lock), increments a global variable and releases the lock again. This process is repeated several times identified by the number of locks (--mutex-locks
). The random mutex is taken from a pool sized by the --mutex-num
parameter.
user $
sysbench --test=mutex --num-threads=64 run
sysbench 0.4.12: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 64 Doing mutex performance test Threads started! Done. Test execution summary: total time: 0.5510s total number of events: 64 total time taken by event execution: 27.9994 per-request statistics: min: 15.40ms avg: 437.49ms max: 520.51ms approx. 95 percentile: 516.51ms Threads fairness: events (avg/stddev): 1.0000/0.00 execution time (avg/stddev): 0.4375/0.07
The duration of such a run here is important, although one has to take into account that the threads will take a random mutex from the available pool. This random factor might influence the results a bit.
Using the memory workload
When using the memory test in sysbench, the benchmark application will allocate a memory buffer and then read or write from it, each time for the size of a pointer (so 32bit or 64bit), and each execution until the total buffer size has been read from or written to. This is then repeated until the provided volume (--memory-total-size
) is reached. Users can provide multiple threads (--num-threads
), different sizes in buffer (--memory-block-size
) and the type of requests (read or write, sequential or random).
user $
sysbench --test=memory --num-threads=4 run
sysbench 0.4.12: multi-threaded system evaluation benchmark Running the test with following options: Number of threads: 4 Doing memory operations speed test Memory block size: 1K Memory transfer size: 102400M Memory operations type: write Memory scope type: global Threads started! Done. Operations performed: 104857600 (2320057.25 ops/sec) 102400.00 MB transferred (2265.68 MB/sec) Test execution summary: total time: 45.1961s total number of events: 104857600 total time taken by event execution: 117.7050 per-request statistics: min: 0.00ms avg: 0.00ms max: 18.72ms approx. 95 percentile: 0.00ms Threads fairness: events (avg/stddev): 26214400.0000/124550.07 execution time (avg/stddev): 29.4263/0.18
The important number to compare (given the same or similar parameters) is the throughput and operations per second:
Operations performed: 104857600 (2320057.25 ops/sec) 102400.00 MB transferred (2265.68 MB/sec)