Managing vSAN 6.6 Fault Domains

 

Back in the vSAN 6.0 release (when the product name was still written VSAN), VMware introduced fault domains to provide rack awareness and some control over where vSAN places data objects. The idea behind fault domains is to tolerate the failure of a whole group of hosts (a chassis or a rack) without requiring additional data copies. The implementation lets vSAN place replica copies of the virtual machine data in different domains, for example, different racks of compute.

Fault domains enable you to protect against rack or chassis failure if your vSAN cluster spans across multiple racks or blade server chassis. You can create fault domains and add one or more hosts to each fault domain.

A fault domain consists of one or more vSAN hosts grouped according to their physical location in the data center. When configured, fault domains enable vSAN to tolerate failures of entire physical racks as well as failures of a single host, capacity device, network link, or a network switch dedicated to a fault domain.

Each host in a vSAN cluster is an implicit fault domain. vSAN automatically distributes components of a vSAN object across fault domains in a cluster based on the Number of Failures to Tolerate rule in the assigned storage policy. When you configure fault domains on a rack and provision a new virtual machine, vSAN ensures that protection objects, such as replicas and witnesses, are placed in different fault domains.
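
To make the placement rule concrete, here is a minimal Python sketch of it. It is a toy model and not how vSAN is actually implemented: the function name place_raid1 and the host names are made up for illustration, and it assumes a RAID-1 object with n failures to tolerate is laid out as n+1 replicas plus n witnesses, each in its own fault domain. Real vSAN placement also weighs capacity, votes and stripe width.

```python
def place_raid1(fault_domains, ftt=1):
    """Toy placement: put each protection component in a distinct fault domain.

    fault_domains: dict mapping fault domain name -> list of host names.
    Returns a list of (role, fault_domain, host) tuples.
    """
    needed = 2 * ftt + 1  # assumed layout: ftt+1 replicas plus ftt witnesses
    if len(fault_domains) < needed:
        raise ValueError(
            f"need {needed} fault domains, have {len(fault_domains)}"
        )
    roles = ["replica"] * (ftt + 1) + ["witness"] * ftt
    chosen = sorted(fault_domains)[:needed]  # deterministic pick for the demo
    # Use the first host of each chosen fault domain.
    return [(role, fd, fault_domains[fd][0]) for role, fd in zip(roles, chosen)]


# Hypothetical lab: three fault domains with one host each.
lab = {"FD-01": ["esxi-01"], "FD-02": ["esxi-02"], "FD-03": ["esxi-03"]}
for role, fd, host in place_raid1(lab, ftt=1):
    print(role, fd, host)
```

Because every component lands in a different fault domain, losing any single fault domain in this model still leaves a majority of the components available.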

I am going to show how to enable and configure fault domains in vSAN 6.6 and then look at how the data is actually distributed across the fault domains (FDs).

First, navigate to the vSAN cluster in the vSphere Web Client. Click Configure -> Fault Domains and Stretched Cluster, then click the Create a new fault domain icon.

Enter a name for the new fault domain, select the hosts that will be part of this FD, and click OK.

The selected hosts appear in the fault domain. In my lab I created three fault domains with one ESXi host each: FD-01, FD-02 and FD-03.

Fault domains let you group hosts together so that vSAN treats each group as a single unit of failure. This means that no two copies/replicas of a virtual machine’s data are placed in the same fault domain. Depending on the storage policy you create and attach to a virtual machine, vSAN distributes the components of that VM on hosts in different fault domains.

We are going to create a new vSAN storage policy to show how fault domains work. In the vSphere Web Client, go to Home -> VM Storage Policies and click the icon for creating a new VM storage policy.

Enter the name for the new storage policy.

On the Common rules tab, just click Next to continue to the rule set.

The most important storage policy rules for a fault domain configuration are:

  • Failure Tolerance Method (FTM): Defines the data placement or parity method used to tolerate a failure. The FTM can be set to “RAID-1 (Mirroring)” or “RAID-5/6 (Erasure Coding)”.
  • Failures to Tolerate (FTT): Defines the number of failures an object can tolerate while still being accessible. Valid preset FTT values for RAID-1 object mirroring would be from 0 – 3, while RAID-5/6 supports an FTT of 1 – 2.
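
As a quick sanity check on these two rules, the following Python sketch validates an FTM/FTT combination against the ranges listed above and reports the resulting layout. The function name describe_policy and the returned strings are my own and are not part of any vSAN API.

```python
# Valid FTT values per failure tolerance method, as listed above.
VALID_FTT = {
    "RAID-1 (Mirroring)": range(0, 4),         # FTT 0 to 3
    "RAID-5/6 (Erasure Coding)": range(1, 3),  # FTT 1 to 2
}


def describe_policy(ftm, ftt):
    """Return a short description of the layout for an FTM/FTT pair.

    Raises ValueError for an unknown FTM or an FTT outside the valid range.
    """
    if ftm not in VALID_FTT:
        raise ValueError(f"unknown failure tolerance method: {ftm!r}")
    if ftt not in VALID_FTT[ftm]:
        raise ValueError(f"FTT={ftt} is not valid for {ftm}")
    if ftm == "RAID-1 (Mirroring)":
        if ftt == 0:
            return "single copy, no redundancy"
        return f"RAID-1 with {ftt + 1} replicas"
    # Erasure coding: FTT=1 gives RAID-5, FTT=2 gives RAID-6.
    return "RAID-5" if ftt == 1 else "RAID-6"


print(describe_policy("RAID-1 (Mirroring)", 1))
print(describe_policy("RAID-5/6 (Erasure Coding)", 2))
```

Note that the RAID level follows from the combination of both rules: FTT=1 alone does not tell you whether the object is mirrored or erasure coded.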

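In our example, I am going with PFTT=1 (Primary level of Failures to Tolerate). On its own this does not select a RAID level: with the RAID-1 (Mirroring) failure tolerance method, assigned VMs get two replicas plus a witness, while with RAID-5/6 (Erasure Coding) they use RAID-5, which requires at least four fault domains.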

Now we are going to apply the storage policy we just created to a virtual machine. Right-click the VM and select VM Policies -> Edit VM Storage Policies.

Choose the policy from the dropdown list and click Apply to all. You can check the impact of applying this policy. Click OK.

Now you can check how the components of the VM are spread and what their status is: on the Monitor tab of the VM, go to Policies -> Physical Disk Placement.

As the Physical Disk Placement view shows, the VM is compliant with this policy, and its components are distributed across two different fault domains: one component on FD-01 and the second component (replica) on FD-02. With RAID-5, there would instead be four components (three data plus one parity), each in a different fault domain.

Keep in mind that:

  • A minimum of three (3) fault domains is required; standard practice is to configure four (4).
  • When creating fault domains, give each fault domain the same number of hosts.
  • For vSAN objects protected with Mirroring, there must be 2n+1 hosts or fault domains, where n is the number of failures to tolerate. With Erasure Coding, there must be 2n+2 hosts or fault domains.
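
The sizing rule in the last bullet is simple arithmetic, so here is a small Python sketch of it. The function name min_fault_domains is only illustrative; n is the number of failures to tolerate, and the accepted ranges are the FTT ranges given earlier (0 to 3 for Mirroring, 1 to 2 for Erasure Coding).

```python
def min_fault_domains(ftt, erasure_coding=False):
    """Minimum number of hosts or fault domains for a given FTT.

    Mirroring needs 2n+1, erasure coding needs 2n+2 (n = ftt).
    """
    if erasure_coding:
        if ftt not in (1, 2):
            raise ValueError("erasure coding supports FTT 1 or 2")
        return 2 * ftt + 2
    if not 0 <= ftt <= 3:
        raise ValueError("mirroring supports FTT 0 to 3")
    return 2 * ftt + 1


for ftt in (1, 2):
    print(
        f"FTT={ftt}: mirroring needs {min_fault_domains(ftt)}, "
        f"erasure coding needs {min_fault_domains(ftt, erasure_coding=True)}"
    )
```

Applied to the lab above, three fault domains are enough for Mirroring with FTT=1 (2*1+1 = 3), but RAID-5 would need a fourth (2*1+2 = 4).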
