TL;DR – ashift=12 performed noticeably better than ashift=13.
I recently installed Proxmox on a new build and couldn't find any information about the best ashift values for my new NVMe SSD drives. Since I was setting up a brand new server anyway, I had a chance to do some quick testing. Here's what I found.
Hardware setup
- Asus Pro WS W680-ACE IPMI (without the IPMI card installed)
- 64GB of DDR5-4800 ECC memory
- 2x 2TB Samsung 990 PRO NVMe SSDs
The SSDs are installed on the motherboard’s M.2 slots, one in the slot directly attached to the CPU and the other in a slot connected through the W680 controller.
Software setup
For each ashift value, I did a clean install of Proxmox VE 8.1.3 with both SSDs in a RAID1 (mirror) configuration. All zpool/vdev settings were left at their defaults (compression=lz4, checksum=on, copies=1, recordsize=128K, etc.) except for the ashift value being tested. After the installation, I ran the standard apt-get updates to get current and then installed fio for the testing.
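For reference, the Proxmox installer exposes ashift under the advanced ZFS options; if you build a pool by hand instead, ashift is fixed per vdev at creation time. A minimal sketch, assuming a manually created mirror (the pool name and device paths below are placeholders, not my actual setup):

# Create a mirror with 4 KiB sectors (ashift=12); replace the device paths with your own
zpool create -o ashift=12 tank mirror /dev/disk/by-id/nvme-DISK1 /dev/disk/by-id/nvme-DISK2

# Check which ashift the vdevs actually ended up with
zdb -C tank | grep ashift   # on a stock Proxmox install the pool is named rpool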
Test setup
The tests I ran are from Jim Salter's Ars Technica article, which gives good detail on how to run fio to test disk performance. I ran each test four times, back-to-back. Here's the script I used to run the tests:
echo "Test 1 - Single 4KiB random write process"
fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
echo "Test 2 - 16 parallel 64KiB random write processes"
fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=64k --size=256m --numjobs=16 --iodepth=16 --runtime=60 --time_based --end_fsync=1
echo "Test 3 - Single 1MiB random write process"
fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=1m --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
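If you'd rather pull the headline bandwidth numbers out programmatically than read fio's text summary, JSON output plus jq is one option. A sketch, assuming jq is installed; fio's JSON puts write bandwidth in jobs[].write.bw in KiB/s (field names can vary slightly between fio versions, so check against yours):

# Test 1 again, but emit JSON and sum write bandwidth across jobs (KiB/s)
fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1 \
    --output-format=json | jq '[.jobs[].write.bw] | add'

Note that each run also leaves its data files (random-write.*) behind in the working directory.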
Results
ashift=12 was faster (on average) than ashift=13 in every test. For Test 1 it was 10.3% (15 MB/s) faster, for Test 2 it was a whopping 22.3% (499 MB/s) faster, and for Test 3 it was 8.5% (117 MB/s) faster. Those are pretty big differences – I was surprised they weren't lost in the noise. That made it an easy call to set my drives to ashift=12, which also happens to be the common wisdom for all drives today.
All values are write throughput in MB/s.

Test | ashift | Iteration 1 | Iteration 2 | Iteration 3 | Iteration 4 | Average |
---- | ------ | ----------- | ----------- | ----------- | ----------- | ------- |
1 | 12 | 176 | 92.6 | 150 | 163 | 145.4 |
2 | 12 | 2844 | 1126 | 3865 | 1104 | 2234.75 |
3 | 12 | 1139 | 1320 | 1550 | 1473 | 1370.5 |
1 | 13 | 177 | 72.7 | 118 | 154 | 130.425 |
2 | 13 | 3199 | 1473 | 993 | 1280 | 1736.25 |
3 | 13 | 1030 | 1221 | 1476 | 1287 | 1253.5 |
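For anyone checking the math: the percentages quoted above are relative to the ashift=12 averages. A quick awk sketch that reproduces them from the raw iterations in the table:

# Recompute the averages and the ashift=12 advantage from the table above
awk 'BEGIN {
  a = (176 + 92.6 + 150 + 163) / 4;    b = (177 + 72.7 + 118 + 154) / 4
  printf "Test 1: %.1f vs %.1f MB/s -> %.1f%% faster\n", a, b, (a - b) / a * 100
  a = (2844 + 1126 + 3865 + 1104) / 4; b = (3199 + 1473 + 993 + 1280) / 4
  printf "Test 2: %.1f vs %.1f MB/s -> %.1f%% faster\n", a, b, (a - b) / a * 100
  a = (1139 + 1320 + 1550 + 1473) / 4; b = (1030 + 1221 + 1476 + 1287) / 4
  printf "Test 3: %.1f vs %.1f MB/s -> %.1f%% faster\n", a, b, (a - b) / a * 100
}'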
Why didn’t you test ashift 9?
It looks to be faster than both ashift 12 and 13: https://feldspaten.org/2024/05/05/Performance-impact-of-different-blocksizes-on-a-Samsung-Pro-SSD-while-using-zfs/
The drives report a 512-byte block size but might be lying:
# nvme id-ns -H /dev/nvme0n1 | grep "Relative Performance"
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0 Best (in use)
Everything I read at the time pointed to 512-byte sectors (ashift=9) not being worth testing, and the general guidance was essentially "if you're going to be wrong, be wrong high". If you run tests with it, please let me know what you find.
An ashift value of 9 can only perform better than a higher value on a device with 512-byte physical sectors.
Correct. But the whole point is that SSDs often lie:
"Most client-oriented storage operates by default in '512-bytes emulation' mode, where although the logical sector size is 512 bytes/sector, internally the firmware uses 4096 bytes/sector. Storage with a 4096 byte size for both logical and physical sectors operates in what is commonly called '4K native' mode" – link
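For what it's worth, one quick way to see what the OS thinks the sector sizes are (the device name here is just my first namespace; on many consumer NVMe drives both values come back as 512 even though the flash underneath uses much larger pages, which is exactly the lying in question):

# Logical vs. physical sector size as reported to the kernel
cat /sys/block/nvme0n1/queue/logical_block_size
cat /sys/block/nvme0n1/queue/physical_block_size

# Or for every block device at once
lsblk -o NAME,LOG-SEC,PHY-SEC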
Please feel free to test 9 and report back with your results.
That said, Jim Salter's comments in this thread do have me wondering whether ashift=9 would have performed better. He was my go-to resource when exploring this topic.
https://discourse.practicalzfs.com/t/block-size-alignment-with-512b-ssd/1441