I had a project on a shoestring budget where we had to use some old hardware to implement a virtualized environment. Most of the environment consisted of virtualized desktops, so IOPS rather than throughput was what mattered. As we added more desktops, the system started to crawl and Nexenta began dropping paths and could not recover without a reboot. Everything pointed to the iSCSI daemon running on the Nexenta boxes.

Hardware

Dell PowerEdge 2900
2 x Quad Xeons
32 GB RAM
2 x SSD
6 x 1TB 7200rpm SAS drives, RAID 5 through a Perc controller with 512MB cache
Nexenta Setup

6 x 1TB drives in RAID 5 through the Perc controller and presented to Nexenta as one large volume (I should have set these up as individual drives, I realize now)
SSD1 as a cache device (read cache, L2ARC)
SSD2 as the log device (ZIL/SLOG, i.e. write log); see the command sketch below
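
For reference, this layout would look roughly like the following at the ZFS level. This is a sketch only: the pool name "tank" and the device names are placeholders, and on Nexenta these steps are normally done through the web GUI rather than the shell.

# SSD2 as the log device (ZIL/SLOG) -- absorbs synchronous writes
zpool add tank log c1t2d0
# SSD1 as a cache device (L2ARC) -- read cache
zpool add tank cache c1t3d0
# Confirm the log and cache vdevs show up
zpool status tank
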
VMware Setup

ESXi 5u1, connected through iSCSI to the Nexenta SAN. Default parameters on both. 500GB iSCSI LUNs with 10-12 VMs per LUN, ~50 VMs total.
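
If you are wiring up the iSCSI side from the ESXi command line, it looks roughly like this. This is a sketch: vmhba33 and the target portal address 10.0.0.10 are assumptions, substitute your own.

# Enable the software iSCSI initiator (if not already enabled)
esxcli iscsi software set --enabled=true
# Point dynamic discovery at the Nexenta target portal
esxcli iscsi adapter discovery sendtarget add -A vmhba33 -a 10.0.0.10:3260
# Rescan so the LUNs show up
esxcli storage core adapter rescan -A vmhba33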

Notes

ESXi sends all write requests as synchronous, so while async operations can be buffered in RAM on the Nexenta side, sync operations must be written to stable storage before they are acknowledged.
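
You can watch this happening on the Nexenta box: under a sync-heavy workload, nearly all write activity shows up on the log vdev. Pool and zvol names below are placeholders.

# Per-vdev I/O, refreshed every 5 seconds -- sync writes land on the log device
zpool iostat -v tank 5
# Check how sync requests are currently handled on a zvol
zfs get sync,logbias tank/vmstore01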

Fixes

While these fixes cost some speed, they lead to a pretty stable system for us on our low-end hardware. Cloning a 25GB VM takes roughly 20 minutes. All of our VDI desktops are fast enough that people don't complain, and most importantly, the system is stable.

Fix 1: Load up on RAM. The more RAM you have, the larger the ARC (Adaptive Replacement Cache) can be.

Reason: This frees the drives to do other work.
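
To see how much of that RAM the ARC is actually using, the standard illumos kstat counters work on Nexenta (values are in bytes):

# Current ARC size and the maximum it is allowed to grow to
kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max
# Hit/miss counts give a feel for whether more RAM would help
kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses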

Fix 2: Get at least two SSDs, preferably three. They can be cheap (we're using OCZ Vertex 4s). Use one as the ZIL (write log; use two in a mirror here if at all possible) and the other as a cache device (read cache, aka L2ARC). Your ZIL can be small (I've never seen it grow beyond 1-2GB); your L2ARC should be as large as the budget allows.

Reason: This buffers the hard drives. Writes can temporarily be held until the hard drives have time to commit them, and reads can be cached and pulled directly from RAM/SSD. It also allows write operations to be reordered for best performance when they are actually executed.
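
If you can spare two SSDs for the ZIL, mirroring it is a one-liner; an unmirrored SLOG that dies during a crash can cost you the last few seconds of sync writes. Device names below are placeholders.

# Mirrored log (SLOG) out of two SSDs
zpool add tank log mirror c2t0d0 c2t1d0
# A third SSD as L2ARC -- no redundancy needed, it only holds cached copies of data on the pool
zpool add tank cache c2t2d0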

Fix 3: Use compression. There is NO reason not to on any machine (unless it is a 486-Dx2).

Reason: This reduces the number of IOPS your disks have to do and it increases throughput.
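On the ZFS side this is a single property per dataset (the dataset name below is a placeholder; "on" uses the default lzjb algorithm on ZFS of this vintage):

# Turn compression on for the zvol backing a LUN
zfs set compression=on tank/vmstore01
# See how much space it is actually saving
zfs get compressratio tank/vmstore01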

Fix 4: Make sure you have Nexenta set up to use your log devices properly. For each of your zvols, set the following parameters: Primary Cache: All. Secondary Cache: All. Compression: On. Log Bias: Latency. Sync: Always. Writeback cache: On.

Reason: This forces all sync requests to your ZIL, reducing the load on your disks. When the disks catch up, the pending writes are flushed out to them.
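
The same zvol settings can be applied from the shell, one property per command on ZFS of this era. The dataset name and LU GUID are placeholders; the writeback cache flag lives on the COMSTAR logical unit rather than on ZFS (wcd = "write cache disable", so false means writeback cache on).

zfs set primarycache=all tank/vmstore01
zfs set secondarycache=all tank/vmstore01
zfs set compression=on tank/vmstore01
zfs set logbias=latency tank/vmstore01
zfs set sync=always tank/vmstore01
stmfadm modify-lu -p wcd=false <lu-guid>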

Fix 5: Disable VAAI, UNMAP, and Delayed ACK on your ESXi hosts. To do this, SSH into your ESXi hosts and execute the following:

esxcli system settings advanced set --int-value 0 --option /VMFS3/EnableBlockDelete
esxcli system settings advanced set --int-value 0 --option /DataMover/HardwareAcceleratedMove
esxcli system settings advanced set --int-value 0 --option /DataMover/HardwareAcceleratedInit
esxcli system settings advanced set --int-value 0 --option /VMFS3/HardwareAcceleratedLocking
vmkiscsi-tool vmhba33 -W -a delayed_ack=0
(Note: I have had problems with this command setting the parameter correctly. Use the instructions below if it does not.)
Log in to the vSphere Client and select the host. Navigate to the Configuration tab. Select Storage Adapters. Select the iSCSI vmhba to be modified. Click Properties. Select the General tab. Click Advanced. In the Advanced Settings dialog box, scroll down to the Delayed ACK setting. Uncheck "Inherit From parent". Uncheck DelayedAck. Reboot the host.

Reason: VAAI and UNMAP seem to cause problems with the iSCSI daemon. DelayedACK seems to aggravate congestion problems, also an issue with the iSCSI daemon.
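
You can confirm the VAAI/UNMAP settings actually took with the matching list commands; Int Value should read 0 for each:

esxcli system settings advanced list --option /VMFS3/EnableBlockDelete
esxcli system settings advanced list --option /DataMover/HardwareAcceleratedMove
esxcli system settings advanced list --option /DataMover/HardwareAcceleratedInit
esxcli system settings advanced list --option /VMFS3/HardwareAcceleratedLocking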

Fix 6: Reduce sys_zfs_vdev_max_pending. Go into Settings -> Appliance -> System -> sys_zfs_vdev_max_pending and change the value to 1, or edit your /etc/system file and add "set zfs:zfs_vdev_max_pending = 1" to it.

Reason: This controls how many outstanding I/O requests ZFS queues to each drive. By reducing this number, you reduce the load on your drives. This is VERY important with SATA drives, which lack TCQ (but do have NCQ).
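
The /etc/system entry only takes effect at boot. If you want to check or change the live value without rebooting, the usual illumos approach is mdb (a sketch; run as root on the Nexenta box):

# Show the current per-vdev queue depth
echo zfs_vdev_max_pending/D | mdb -k
# Set it to 1 on the running kernel (0t1 = decimal 1)
echo zfs_vdev_max_pending/W0t1 | mdb -kw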

Fix 7: Enable sys_zfs_nocacheflush. Go into Settings -> Appliance -> System -> sys_zfs_nocacheflush and set it to yes, or edit your /etc/system file and add "set zfs:zfs_nocacheflush = 1".

Reason: "Disable device cache flushes issued by ZFS during write operation processing. "

Fix 8: Put each interface on its own subnet, unless you are using bonding/teaming.

Reason: It prevents ARP Flux and contention.
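
For example, two iSCSI vmkernel ports on an ESXi host could be split across subnets like this (the interface names and addresses are made up for illustration):

esxcli network ip interface ipv4 set --interface-name=vmk1 --ipv4=10.10.1.21 --netmask=255.255.255.0 --type=static
esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=10.10.2.21 --netmask=255.255.255.0 --type=static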

Links

http://blogs.vmware.com/vsphere/2012/04/vaai-thin-provisioning-block-reclaimunmap-in-action.html
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002598
http://en.wikipedia.org/wiki/Serial_Attached_SCSI
http://en.wikipedia.org/wiki/Tagged_Command_Queuing
http://en.wikipedia.org/wiki/Native_Command_Queuing