Back in the vSAN 6.0 release (when the product name was still written VSAN), VMware introduced fault domains to provide rack awareness and some control over where vSAN places data objects. The idea behind fault domains is that we want to be able to tolerate the failure of a whole group of hosts (a chassis or a rack) without requiring additional data copies. The implementation allows vSAN to store the replica copies of the virtual machine data in different domains, for example, in different racks of compute.
Fault domains enable you to protect against rack or chassis failure if your vSAN cluster spans across multiple racks or blade server chassis. You can create fault domains and add one or more hosts to each fault domain.
A fault domain consists of one or more vSAN hosts grouped according to their physical location in the data center. When configured, fault domains enable vSAN to tolerate failures of entire physical racks as well as failures of a single host, capacity device, network link, or a network switch dedicated to a fault domain.
Each host in a vSAN cluster is an implicit fault domain. vSAN automatically distributes components of a vSAN object across fault domains in a cluster based on the Number of Failures to Tolerate rule in the assigned storage policy. When you configure fault domains on a rack and provision a new virtual machine, vSAN ensures that protection objects, such as replicas and witnesses, are placed in different fault domains.
I am going to show how to enable and configure fault domains in vSAN 6.6 and see how the data is actually distributed across the fault domains (FDs).
First, navigate to the vSAN cluster in the vSphere Web Client. Click Configure -> Fault Domains and Stretched Cluster and click the Create a new fault domain icon.
Enter the name of the new fault domain, check the hosts that will be part of this FD, and click OK.
The selected hosts appear in the fault domain. In my lab I created 3 fault domains with one ESXi host each: FD-01, FD-02 and FD-03.
Fault domains group hosts together so that vSAN treats each group as a single unit of failure. This means that no two copies/replicas of the virtual machine's data will be placed in the same fault domain. Depending on the storage policy you create and attach to a virtual machine, vSAN will distribute the components of that VM to hosts in different fault domains.
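The placement rule described above can be sketched in a few lines of Python. This is only a simplified model, not how vSAN actually implements placement: it assumes a mirrored object with FTT=n is made of n+1 replicas plus n witnesses, and it ignores capacity, striping, and votes. The function name `place_mirror_components` and the host names `esxi-01` to `esxi-03` are hypothetical; only the fault domain names come from the lab.

```python
from typing import Dict, List


def place_mirror_components(fault_domains: Dict[str, List[str]], ftt: int) -> Dict[str, str]:
    """Put each replica and witness of a mirrored object in its own fault domain.

    Simplified illustration: n+1 replicas plus n witnesses (2n+1 components),
    one component per fault domain, always on the first host of the domain.
    """
    needed = 2 * ftt + 1
    if len(fault_domains) < needed:
        raise ValueError(
            f"FTT={ftt} with mirroring needs {needed} fault domains, "
            f"only {len(fault_domains)} available"
        )
    roles = [f"replica-{i + 1}" for i in range(ftt + 1)]
    roles += [f"witness-{i + 1}" for i in range(ftt)]
    placement = {}
    # zip() pairs each component with a different fault domain, so no two
    # components of the same object ever share a domain.
    for role, (fd_name, hosts) in zip(roles, sorted(fault_domains.items())):
        placement[role] = f"{fd_name}/{hosts[0]}"
    return placement


# Lab layout from this post; host names are made up for the example.
lab = {"FD-01": ["esxi-01"], "FD-02": ["esxi-02"], "FD-03": ["esxi-03"]}
placement = place_mirror_components(lab, ftt=1)
for role, location in placement.items():
    print(role, "->", location)
```

With three single-host fault domains and FTT=1, every component ends up in a different fault domain, so losing any one domain still leaves a majority of components available.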
We are going to create a new vSAN storage policy to show how fault domains work. To do that, go to vSphere Web Client -> Home and VM Storage Policies. Click the highlighted icon to create a new VM storage policy.
Enter the name for the new storage policy.
On the Common rules tab, just click Next to continue to the rule set.
The most important storage policy rules for a fault domain configuration are:
- Failure Tolerance Method (FTM): Defines the actual data placement, or parity method used to tolerate a failure. The FTM can be set to “RAID-1 (Mirroring)” or “RAID-5/6 (Erasure Coding).”
- Failures to Tolerate (FTT): Defines the number of failures an object can tolerate while still being accessible. Valid preset FTT values for RAID-1 object mirroring would be from 0 – 3, while RAID-5/6 supports an FTT of 1 – 2.
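In our example, I am going with PFTT=1. The resulting RAID level depends on the failure tolerance method: PFTT=1 gives RAID-1 with mirroring and RAID-5 with erasure coding. Because this lab has only three fault domains and RAID-5 needs at least four, the assigned VMs here are mirrored (RAID-1).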
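To make the relationship between the two rules concrete, here is a small Python sketch that maps a failure tolerance method and an FTT value to the resulting RAID level, using the FTT ranges listed above. The function name `resolve_raid_level` and the short method keys `"mirroring"` and `"erasure_coding"` are my own labels for this example, not vSAN identifiers.

```python
# Valid FTT values per failure tolerance method, as listed above.
VALID_FTT = {
    "mirroring": range(0, 4),       # RAID-1: FTT 0 to 3
    "erasure_coding": range(1, 3),  # RAID-5/6: FTT 1 to 2
}


def resolve_raid_level(ftm: str, ftt: int) -> str:
    """Return the RAID level implied by a method/FTT pair, or raise if invalid."""
    if ftm not in VALID_FTT:
        raise ValueError(f"unknown failure tolerance method: {ftm!r}")
    allowed = VALID_FTT[ftm]
    if ftt not in allowed:
        raise ValueError(
            f"{ftm} supports FTT {allowed.start} to {allowed.stop - 1}, got {ftt}"
        )
    if ftm == "mirroring":
        # FTT=0 means a single copy of the data with no protection.
        return "RAID-1" if ftt > 0 else "single copy (no redundancy)"
    # Erasure coding: FTT=1 is RAID-5, FTT=2 is RAID-6.
    return "RAID-5" if ftt == 1 else "RAID-6"


for ftm, ftt in [("mirroring", 1), ("erasure_coding", 1), ("erasure_coding", 2)]:
    print(f"FTM={ftm}, FTT={ftt} -> {resolve_raid_level(ftm, ftt)}")
```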
Now we are going to apply the storage policy we just created to a virtual machine. Right-click the VM -> VM Policies and Edit VM Storage Policies.
Choose the policy from the dropdown list and click Apply to all. You can check the impact of applying this policy. Click OK.
Now you can check how the components of the VM are spread and what their status is: on the Monitor tab of the VM, go to Policies -> Physical Disk Placement.
As the output shows, the VM is compliant with this policy, and its components are distributed across 2 different fault domains: one component on FD-01 and a second component (the replica) on FD-02. By comparison, with RAID-5 there would be 3 data components and a parity component, each in its own fault domain, which is why RAID-5 needs at least four fault domains.
Keep in mind that:
- A minimum of three (3) fault domains is required; standard practice is to configure four (4).
- When creating fault domains, give each fault domain the same number of hosts.
- For vSAN objects protected with mirroring, there must be 2n+1 hosts or fault domains, where n is the number of failures to tolerate. With erasure coding, there must be 2n+2 hosts or fault domains.
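The 2n+1 and 2n+2 rules are easy to check with a quick calculation. The following Python sketch computes the minimum number of fault domains for a given method and FTT and compares it with the three fault domains in my lab. The helper name `min_fault_domains` and the method keys are hypothetical, chosen only for this example.

```python
def min_fault_domains(ftm: str, ftt: int) -> int:
    """Minimum hosts or fault domains needed for the chosen protection level."""
    if ftt < 0:
        raise ValueError("FTT cannot be negative")
    if ftm == "mirroring":
        return 2 * ftt + 1  # 2n+1
    if ftm == "erasure_coding":
        return 2 * ftt + 2  # 2n+2
    raise ValueError(f"unknown failure tolerance method: {ftm!r}")


LAB_FAULT_DOMAINS = 3  # FD-01, FD-02 and FD-03

for ftm, ftt in [("mirroring", 1), ("mirroring", 2),
                 ("erasure_coding", 1), ("erasure_coding", 2)]:
    needed = min_fault_domains(ftm, ftt)
    verdict = "fits" if needed <= LAB_FAULT_DOMAINS else "does not fit"
    print(f"{ftm:<15} FTT={ftt}: needs {needed} fault domains, {verdict} in the lab")
```

Only mirroring with FTT=1 fits in a three fault domain setup; RAID-5 (erasure coding with FTT=1) already needs four, which is one reason configuring four fault domains is the standard practice.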