
storage - Linux - real-world hardware RAID controller tuning (scsi and cciss)



Most of the Linux systems I manage feature hardware RAID controllers (mostly HP Smart Array). They're all running RHEL or CentOS.



I'm looking for real-world tunables to help optimize performance for setups that incorporate hardware RAID controllers with SAS disks (Smart Array, Perc, LSI, etc.) and battery-backed or flash-backed cache. Assume RAID 1+0 and multiple spindles (4+ disks).




I spend a considerable amount of time tuning Linux network settings for low-latency and financial trading applications. But many of those options are well-documented (changing send/receive buffers, modifying TCP window settings, etc.). What are engineers doing on the storage side?



Historically, I've made changes to the I/O scheduling elevator, recently opting for the deadline and noop schedulers to improve performance within my applications. As RHEL versions have progressed, I've also noticed that the compiled-in defaults for SCSI and CCISS block devices have changed, which has affected the recommended storage subsystem settings over time. However, it's been a while since I've seen any clear recommendations, and I know that the OS defaults aren't optimal. For example, the default read-ahead buffer of 128 KB seems extremely small for a deployment on server-class hardware.
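For reference, the current defaults can be checked per device; a quick sketch, assuming a /dev/sda SCSI device (substitute your controller's device name):

# Active I/O scheduler (the current one is shown in brackets)
cat /sys/block/sda/queue/scheduler
# Read-ahead in 512-byte sectors (256 sectors = 128 KB) and in KB
blockdev --getra /dev/sda
cat /sys/block/sda/queue/read_ahead_kb
# Request queue depth
cat /sys/block/sda/queue/nr_requests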



The following articles explore the performance impact of changing read-ahead cache and nr_requests values on the block queues.


http://zackreed.me/articles/54-hp-smart-array-p410-controller-tuning

http://www.overclock.net/t/515068/tuning-a-hp-smart-array-p400-with-linux-why-tuning-really-matters

http://yoshinorimatsunobu.blogspot.com/2009/04/linux-io-scheduler-queue-size-and.html




For example, these are suggested changes for an HP Smart Array RAID controller:



echo "noop" > /sys/block/cciss\!c0d0/queue/scheduler 
blockdev --setra 65536 /dev/cciss/c0d0
echo 512 > /sys/block/cciss\!c0d0/queue/nr_requests
echo 2048 > /sys/block/cciss\!c0d0/queue/read_ahead_kb
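These sysfs settings are lost at reboot; one way to make them persistent, assuming the same cciss device name, is a small boot-time script such as /etc/rc.local (a udev rule or a tuned profile would work equally well):

# /etc/rc.local (excerpt) - reapply the block-queue tuning above at boot
echo noop > /sys/block/cciss\!c0d0/queue/scheduler
echo 512 > /sys/block/cciss\!c0d0/queue/nr_requests
echo 2048 > /sys/block/cciss\!c0d0/queue/read_ahead_kb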


What else can be reliably tuned to improve storage performance?


I'm specifically looking for sysctl and sysfs options in production scenarios.


Answer



I've found that when I've had to tune for lower latency versus throughput, I've tuned nr_requests down from its default (to as low as 32). The idea is that smaller batches mean lower latency.
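A minimal sketch of that change, assuming a /dev/sda device (swap in your own sd or cciss device name):

# Shrink the request queue for lower latency (the default is typically 128)
echo 32 > /sys/block/sda/queue/nr_requests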



Also, for read_ahead_kb, I've found that increasing the value offers better throughput for sequential reads/writes, but this option really depends on your workload and I/O pattern. For example, on a database system that I recently tuned, I changed this value to match a single database page size, which helped reduce read latency; increasing or decreasing it beyond that value hurt performance in my case.
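As an illustration, assuming a 16 KB database page size (e.g. InnoDB's default) and a /dev/sda device:

# Match read-ahead to a single database page to keep random-read latency down
echo 16 > /sys/block/sda/queue/read_ahead_kb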



As for other options or settings for block device queues:



max_sectors_kb = I've set this value to match what the hardware allows for a single transfer (check the read-only max_hw_sectors_kb file in sysfs to see what's allowed).
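A sketch of that check-then-set sequence, assuming /dev/sda and a hypothetical hardware limit of 1024 KB:

# Check what the hardware allows for a single transfer (read-only attribute)
cat /sys/block/sda/queue/max_hw_sectors_kb
# Then raise max_sectors_kb to match; e.g. if the file above reported 1024:
echo 1024 > /sys/block/sda/queue/max_sectors_kb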




nomerges = this lets you disable or adjust the lookup logic for merging I/O requests. Turning merging off can save some CPU cycles, but I haven't seen any benefit when changing this on my systems, so I've left it at the default.
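If you do want to experiment with it, the attribute sits alongside the others (a sketch assuming /dev/sda; 0 enables all merges, 1 disables the more expensive merge lookups, 2 disables merging entirely):

# Disable all merge attempts to shave CPU cycles per request
echo 2 > /sys/block/sda/queue/nomerges
# Restore the default behaviour (all merges enabled)
echo 0 > /sys/block/sda/queue/nomerges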



rq_affinity = I haven't tried this yet, but here is the explanation behind it from the kernel docs:




If this option is '1', the block layer will migrate request completions to the
cpu "group" that originally submitted the request. For some workloads this
provides a significant reduction in CPU cycles due to caching effects.
For storage configurations that need to maximize distribution of completion
processing setting this option to '2' forces the completion to run on the
requesting cpu (bypassing the "group" aggregation logic)
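Should you want to try it, it is set through the same queue directory (a sketch assuming /dev/sda; the '2' setting is only available on relatively recent kernels):

# Complete requests on the CPU group that submitted them (caching benefit)
echo 1 > /sys/block/sda/queue/rq_affinity
# Or force completion onto the exact submitting CPU (newer kernels only)
echo 2 > /sys/block/sda/queue/rq_affinity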





scheduler = you said that you tried deadline and noop. I've tested both, and deadline wins out in the testing I've done most recently for a database server.



NOOP performed well, but for our database server I was still able to achieve better performance by adjusting the deadline scheduler.
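Switching the elevator per device is a one-liner (a sketch assuming /dev/sda; to make it the system-wide default you can also pass elevator=deadline on the kernel command line):

# Select the deadline elevator for this device
echo deadline > /sys/block/sda/queue/scheduler
# Verify: the active scheduler is shown in square brackets
cat /sys/block/sda/queue/scheduler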



Options for the deadline scheduler are located under /sys/block/{sd,cciss,dm-}*/queue/iosched/ (a combined example follows the list of tunables):



fifo_batch = similar in spirit to nr_requests, but specific to the scheduler; it controls the batch size of read and write requests. The rule of thumb is to tune this down for lower latency or up for throughput.




write_expire = sets the expiration time for write batches; the default is 5000 ms. Once again, decreasing this value reduces write latency, while increasing it improves throughput.



read_expire = sets the expiration time for read batches; the default is 500 ms. The same rules apply here.



front_merges = I tend to turn this off; it's on by default. I don't see the need for the scheduler to waste CPU cycles trying to front-merge I/O requests.



writes_starved = since deadline is geared toward reads, the default is to process two read batches before a write batch is processed. I found the default of 2 to be good for my workload.
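Pulling those together, a latency-leaning sketch for a single device, assuming /dev/sda and purely illustrative values (tune them against your own workload):

# Deadline tunables for /dev/sda, biased toward low read latency
echo 8 > /sys/block/sda/queue/iosched/fifo_batch        # smaller batches (default 16)
echo 250 > /sys/block/sda/queue/iosched/read_expire     # tighter read deadline (default 500 ms)
echo 2500 > /sys/block/sda/queue/iosched/write_expire   # tighter write deadline (default 5000 ms)
echo 0 > /sys/block/sda/queue/iosched/front_merges      # skip front-merge attempts (default on)
echo 2 > /sys/block/sda/queue/iosched/writes_starved    # keep the default read preference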

