Archive for the ‘SRM’ Category

NetApp unified SRA requires NAS and SAN RBAC

Tuesday, June 29, 2010
posted by Andy Daniel

Rawlinson Rivera blogged about a known issue with the 1.4.3 version of the NetApp SRA a while back. If you’re not familiar, this version marries the separate NAS and SAN versions that previously existed. If you read through the comments, you’ll discover that the documented issue occurs in NFS-only environments where no iSCSI or FCP LUNs exist. As explained in the NetApp knowledgebase article (NOW access required), the workaround is to create a dummy VMware igroup:

igroup create -i -t vmware sra_dummy_igroup

After recently implementing the workaround, we continued to receive a LUN discovery error until we finally remembered that the filer permissions required for the older NAS and SAN SRAs were different. During our original setup, we had configured role-based access control (RBAC) with the requirements for NAS only. Unfortunately, to complete a successful discovery with the new unified SRA, it appears that RBAC must now include the permissions for both NAS and SAN, even if you’re only using one of them. Luckily for us, a quick look back at the NetApp SRA RBAC rights knowledgebase article (NOW access required) got us back on track.

Configuring the NetApp SRA to use SSL

Saturday, April 3, 2010
posted by Andy Daniel

Cormac Hogan produced a great Proven Practice document a while back detailing the steps needed for configuring the NetApp SRA to use SSL communication instead of unencrypted HTTP. The steps involve using PPM to download the Perl libraries needed for SSL communication. If the 27-page document seems a little daunting, and RTFM is not for you, then how about the two steps below:

1. Download the libraries here and drop them into C:\Program Files\VMware\VMware vCenter Site Recovery Manager\external\perl-5.8.8

2. Modify the ontap_config.txt file found in C:\Program Files\VMware\VMware vCenter Site Recovery Manager\scripts\SAN\ONTAP_NAS and set the SSL option to on.

Now, obviously this assumes you’ve already enabled SSL Secure Admin on your filer, etc. If not, take a look at Cormac’s guide. Either way, hopefully the libraries above will save you some time. :)

SRM 4 and disappearing iSCSI LUNs

Thursday, March 18, 2010
posted by Andy Daniel

It appears that Site Recovery Manager 4 ships with a default advanced option that is potentially very dangerous for iSCSI users who run production VMs at the recovery site. The option is “SanProvider.removeStaticIscsiTargets” and it is enabled by default. It isn’t clear why it ships enabled, but the result is that during the cleanup of a recovery test, all iSCSI targets are removed. This includes any targets whose LUNs may be hosting running production virtual machines! A couple of colleagues stumbled across this issue during a multi-site deployment last week. In this case, SRM removed an iSCSI target with LUNs containing not only production machines, but the DR vCenter and SRM machines too! Hopefully this will be corrected in a future update, but for now, go disable this option!

[Image: iscsi_targets]

SRM 4.0.1 is out!

Saturday, February 27, 2010
posted by Andy Daniel

You can find it here. This update is near and dear to my heart because it solves an issue I recently ran into:

“a problem that could cause intermittent site disconnections when there was a firewall between the sites that was configured to close connections due to inactivity”

Quick turnaround to get this one fixed! Here is a complete summary of fixes:

  • Test recovery times have been improved for ESX 4.0.1 hosts that use iSCSI arrays.
  • Fixed a problem that could cause a recovery plan to hang while powering off virtual machines at the protected site if the virtual machine’s storage goes offline while the plan is running.
  • Customization is now supported for virtual machines running Windows 7 and Windows 2008 R2.
  • Fixed a problem that could prevent IP customization from updating the /etc/hosts file on a protected virtual machine running Linux.
  • Fixed a problem that could cause intermittent site disconnections when there was a firewall between the sites that was configured to close connections due to inactivity.
  • Fixed a problem that could cause test and recovery networks to be swapped in a recovery plan after the SRM service was restarted.
  • Fixed a problem that could cause datastore group calculation to fail with a “Not initialized” exception when encountering a virtual machine with an RDM device for which the lunUuid is not set.
  • Fixed a problem that could cause a recovered virtual machine to be deleted if an administrator manually removed it from a protection group while a recovery plan was being run.
  • Fixed a problem that could cause the SRM Installer to fail to update vCenter credentials when running in Repair mode.
  • Fixed a problem that caused the Perl installation created by SRM to be incompatible with some Perl packages. This fix eliminates the need to create the temporary Perl installation mentioned in VMware Knowledge Base article 1014232.
  • Fixed a problem that could cause the SRM Service to hang when a Configure All operation configured more than 300 virtual machines.
  • Fixed a problem that could cause recovery plan failures with hardware iSCSI HBAs connected to CLARiiON arrays.

Undocumented SRM communication

Saturday, February 27, 2010
posted by Andy Daniel

I’ve had the pleasure of implementing SRM within a highly firewalled environment over the last couple of weeks. With the help of a skilled customer and Wireshark, we successfully identified undocumented (at least as far as I could find) intra-site SRM communication. A call into VMware support and subsequent escalation to the SRM engineering team helped to verify that our findings were correct. In our case, the ESX Service Console network was completely isolated from the SRM server, and we ran into issues when executing a recovery plan with protected VMs that had RDMs and when doing IP customization. We found that the SRM server at the recovery site must connect directly to ESX hosts over port 902 (NFC). In the RDM case, SRM must modify the vmx file, since it contains references to the RDM disk path (UUID), which changes because of resignaturing. Although there may be other instances, the three cases we confirmed where SRM must establish a connection directly with ESX hosts are:

  • VM with RDM(s)
  • VM with Snapshot(s)
  • VM where IP customization is performed

To solve our issue, we simply multihomed our SRM server and enabled communication into the Service Console network.
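
If you want to confirm from the SRM server that the path to the ESX hosts is actually open before running a recovery plan, a simple connection test can help. The following is only a minimal sketch, assuming NFC on port 902 is plain TCP and that a successful TCP handshake is a good enough signal that the firewall permits the traffic. The host names are hypothetical placeholders, and the helper names (can_connect, check_hosts) are made up for illustration; none of this is part of SRM.

```python
import socket

# Hypothetical host names; replace with the ESX hosts at your recovery site.
ESX_HOSTS = ["esx01.example.com", "esx02.example.com"]
NFC_PORT = 902  # port named in the post; assumed here to be TCP


def can_connect(host, port=NFC_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_hosts(hosts, port=NFC_PORT, timeout=3.0):
    """Map each host name to the result of a TCP connection attempt."""
    return {host: can_connect(host, port, timeout) for host in hosts}


if __name__ == "__main__":
    for host, ok in check_hosts(ESX_HOSTS).items():
        status = "reachable" if ok else "blocked or unreachable"
        print(f"{host}:{NFC_PORT} {status}")
```

Run it from the SRM server itself, since the point is to test the path that SRM would use. A failure only tells you the connection could not be established; it does not distinguish a firewall drop from a host that is down.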