Recently we have been seeing extremely slow boot times on a couple of hosts that we use for a new customer. We didn’t really notice this at first, until we added a host to the existing cluster and the reboot of that host took over 3 hours!

I checked the hardware and configuration and everything checked out, but the host still appeared to be broken. On the console it was stuck on the “vmw_vaaip_cx loaded successfully” message in the hypervisor boot screen. Shortly after I started investigating, I remembered that these hosts have LUNs that are used as physical Raw Device Mappings (RDMs) by Windows MSCS clusters. And that was the problem! During the boot process, ESXi scans the so-called storage mid-layer and attempts to discover all devices presented to the host during the device claiming phase. This is a problem for RDMs used in MSCS clusters, because they have a permanent SCSI reservation on them. Because of this the ESXi host cannot interrogate the LUN during boot and has to wait for the device scan to time out, which takes a long time. In this KB VMware says that with 10 RDMs the boot process already takes 30 minutes (excluding physical hardware boot times). Our hosts have over 40 RDMs, so that took way too long.

Fortunately, this can be fixed by setting an ESXi local device parameter on the LUN. This setting is called “perennially reserved”. It tells ESXi that there is a permanent SCSI reservation on the LUN, so ESXi no longer tries to scan the device during the boot process.

You can easily set this for a LUN on an ESXi host with the following commands:
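As a minimal sketch, assuming the LUN's identifier is `naa.id` (a placeholder, look up the real one with `esxcli storage core device list`), the call looks roughly like this. The `run` wrapper and the `DRY_RUN` variable are my own additions for illustration: by default the command is only printed so you can review it before touching a host.

```shell
#!/bin/sh
# Placeholder device identifier; replace it with the naa ID of the RDM LUN.
DEVICE="${DEVICE:-naa.id}"

# Hypothetical helper: print the command unless DRY_RUN=0 is set,
# in which case the command is actually executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mark the LUN as perennially reserved on this ESXi host.
run esxcli storage core device setconfig -d "$DEVICE" --perennially-reserved=true
```

Run it with `DRY_RUN=0 DEVICE=naa.<your id>` on the ESXi shell to apply the setting for real.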

Check whether the command succeeded with the following command and look at the “Is Perennially Reserved” value:
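A small sketch of that check, again with `naa.id` as a placeholder. The helper name `is_perennially_reserved` is made up for this example; it simply looks for the “Is Perennially Reserved: true” line in the device listing that it reads from standard input.

```shell
#!/bin/sh
# Reads the output of `esxcli storage core device list -d <id>` on stdin
# and succeeds (exit code 0) when the device is flagged as perennially reserved.
is_perennially_reserved() {
    grep -q 'Is Perennially Reserved: true'
}

# Usage on the ESXi host (naa.id is a placeholder):
#   esxcli storage core device list -d naa.id | is_perennially_reserved \
#       && echo "naa.id is perennially reserved"
```

If you just want to eyeball the value, `esxcli storage core device list -d naa.id` on its own prints the same field.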

This is easy if you have just a single host, but we have an environment with loads of hosts for this customer, and I am not going to execute this by hand for each RDM on each host. So I wrote a script for it. The script is pretty simple: it checks a given cluster for VMs with (p/v)RDMs and makes those (p/v)RDMs perennially reserved on each host in the cluster. It also checks whether a LUN has already been set to perennially reserved, and if so it skips that LUN.
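To illustrate just the skip-if-already-set logic, here is a rough shell sketch for a single host. This is not the PowerCLI script itself: it assumes you already know the device IDs of the RDM LUNs and pass them in as arguments, and the function name and the messages it prints are invented for this example.

```shell
#!/bin/sh
# Flag every given device as perennially reserved, unless it already is.
# Arguments: one or more device IDs (for example naa.id1 naa.id2).
reserve_devices() {
    for dev in "$@"; do
        if esxcli storage core device list -d "$dev" \
                | grep -q 'Is Perennially Reserved: true'; then
            # Already flagged, so there is nothing to do for this LUN.
            echo "$dev: already perennially reserved, skipping"
        else
            esxcli storage core device setconfig -d "$dev" --perennially-reserved=true \
                && echo "$dev: set to perennially reserved"
        fi
    done
}

# Usage on the ESXi host (placeholder IDs):
#   reserve_devices naa.id1 naa.id2
```

You would still have to run this on every host in the cluster, which is exactly the manual work the cluster-wide script avoids.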

Output from the script should be something like this:

Or:

I did find some example scripts on the web, but I couldn’t find any that used ESXCLI V2 commands and suited my situation, so I created my own. I am going to turn this into a function sometime this year so that it will be easier to use, but this will do for now.

The “Perennially Reserved” setting can also be edited in a Host Profile since ESXi 5.1. You can do this by browsing to the host profile -> Edit Host Profile -> Storage Configuration -> Pluggable Storage Architecture (PSA) configuration -> PSA Device Setting and set the flag “Device perennially reserved status” to “Enabled”. See the screenshot below for more information:

Host Profile Perennially Reserved flag setting

So there you have it! Are you experiencing extremely long boot times and using (p/v)RDMs? You will want to save this script to make your life easier!
