Understanding vSAN Storage Policies

 

Storage Policy Based Management (SPBM) is a storage policy framework that provides a single unified control plane across a broad range of data services and storage solutions. The framework helps to align storage with application demands of your virtual machines.

SPBM enables the following mechanisms:

  • Advertisement of storage capabilities and data services that storage arrays and other entities, such as I/O filters, offer.
  • Bidirectional communications between ESXi and vCenter Server on one side, and storage arrays and entities on the other.
  • Virtual machine provisioning based on VM storage policies.

 

Storage Policy-Based Management (SPBM) from VMware enables precise control of storage services. Like other storage solutions, vSAN provides services such as availability levels, capacity consumption, and stripe widths for performance. A storage policy contains one or more rules that define service levels.


Storage policies are created and managed using the vSphere Web Client. Policies can be assigned to virtual machines and individual objects such as a virtual disk. Storage policies are easily changed or reassigned if application requirements change. These modifications are performed with no downtime and without the need to migrate virtual machines from one datastore to another. SPBM makes it possible to assign and modify service levels with precision on a per-virtual machine basis. Each virtual machine deployed to vSAN datastores is assigned at least one virtual machine storage policy. You can assign storage policies when you create or edit virtual machines.

We are going to cover the main capabilities and storage policies of vSAN.

 

  1. Number of disk stripes per object

The minimum number of capacity devices across which each replica of a virtual machine object is striped, commonly referred to as stripe width.

Striping may help performance if certain virtual machines are I/O intensive and others are not. With striping, a virtual machine's data is spread across more drives, which all contribute to the overall storage performance experienced by that virtual machine. In a hybrid configuration, this striping is across magnetic disks. In an all-flash configuration, the striping is across whatever flash devices make up the capacity layer. There are two main sizing considerations when it comes to stripe width:

The first consideration is whether there are enough physical devices in the various hosts and across the cluster to accommodate the requested stripe width.

The second consideration is whether the value chosen for stripe width will require a significant number of components and use up a large share of the host component count.

Default value for stripe width is 1. Maximum value is 12. However, for the most part, VMware recommends leaving striping at the default value of 1. A value higher than 1 might result in better performance, but also results in higher use of system resources.
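
To see why a higher stripe width consumes the host component count, here is a minimal sketch in Python. It assumes RAID-1 mirroring, where each of the n+1 replicas (n being the failures to tolerate, covered in the next section) is split into one component per stripe. The function name is hypothetical, and witness components and any extra splitting of very large objects are deliberately ignored, so treat the result as a lower bound rather than an exact figure.

```python
def data_components(stripe_width: int, failures_to_tolerate: int) -> int:
    """Estimate data components for one mirrored (RAID-1) object.

    Assumption: every replica is split into `stripe_width` components.
    Witness components are not counted here.
    """
    if not 1 <= stripe_width <= 12:
        raise ValueError("stripe width must be between 1 and 12")
    if not 0 <= failures_to_tolerate <= 3:
        raise ValueError("failures to tolerate must be between 0 and 3")
    replicas = failures_to_tolerate + 1  # n failures -> n+1 copies
    return stripe_width * replicas


if __name__ == "__main__":
    for width in (1, 2, 4, 12):
        print(f"stripe width {width:2d}, FTT 1 -> "
              f"{data_components(width, 1)} data components")
```

Even with this simplified model, going from the default stripe width of 1 to 4 quadruples the number of data components per object, which is one reason the default is usually left alone.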

 

  1. Primary level of failures to tolerate (PFTT)

Defines the number of host and device failures that a virtual machine object can tolerate. For n failures tolerated, each piece of data written is stored in n+1 places. This is the number of concurrent host, network, or disk failures that may occur in the cluster while still ensuring the availability of the object. Default value is 1. Maximum value is 3. If you do not choose a storage policy when provisioning a virtual machine, vSAN assigns the default virtual machine storage policy, which has PFTT = 1. If fault domains are configured, 2n+1 fault domains with hosts contributing capacity are required.
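
The n+1 and 2n+1 rules above can be written out as a small Python sketch. The function name and the returned dictionary keys are hypothetical, chosen only for illustration; the arithmetic itself comes straight from the paragraph above.

```python
def mirroring_requirements(failures_to_tolerate: int) -> dict:
    """Return the copies and fault domains needed to tolerate n failures."""
    n = failures_to_tolerate
    if not 0 <= n <= 3:
        raise ValueError("PFTT must be between 0 and 3")
    return {
        "copies": n + 1,                 # data is stored in n+1 places
        "min_fault_domains": 2 * n + 1,  # fault domains contributing capacity
    }


if __name__ == "__main__":
    for n in range(4):
        print(n, mirroring_requirements(n))
```

For the default PFTT = 1 this gives two copies of the data and at least three fault domains.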

In a vSAN stretched cluster configuration, this rule defines the number of site failures that a virtual machine object can tolerate. You can use PFTT together with SFTT to provide local fault protection for objects within your data sites. The maximum value for a stretched cluster is 1.

  1. Secondary level of failures to tolerate (SFTT)

This storage policy is used only in vSAN stretched cluster configurations. In a stretched cluster, this rule defines the number of additional host failures that the object can tolerate after the number of site failures defined by PFTT is reached. Default value is 1. Maximum value is 3. This policy can be combined with PFTT to ensure high availability of data across the cluster. For example, if PFTT = 1 and SFTT = 2, and one site is unavailable, then the cluster can tolerate two additional host failures.

 

  1. Flash read cache reservation

This is the amount of flash capacity reserved on the SSD as read cache for the storage object. It is specified as a percentage of the logical size of the storage object (i.e. VMDK). Reserved flash capacity cannot be used by other objects. Unreserved flash is shared fairly among all objects. You do not have to set a reservation to get cache. Setting read cache reservations might cause a problem when you move the virtual machine object because the cache reservation settings are always included with the object.

The reservation should be left at 0 (default) unless you are trying to solve a real performance problem and you believe dedicating read cache is the solution. Default value is 0%. Maximum value is 100%. The Flash Read Cache Reservation storage policy attribute is supported only for hybrid configurations.

 

  1. Object space reservation

By default, all objects deployed on vSAN are thinly provisioned. This policy defines the percentage of the logical size of the virtual machine disk (.vmdk) object, that is, of the total object address space, that must be reserved (thick provisioned) when deploying virtual machines. Default value is 0%. Maximum value is 100%.
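
Both Object space reservation and Flash read cache reservation are expressed as a percentage of the object's logical size, so the reserved amount is simple arithmetic. A minimal Python sketch follows; the function name is hypothetical, and the result is the logical reservation only, before any replication overhead is applied.

```python
def reserved_gb(logical_size_gb: float, reservation_percent: float) -> float:
    """Reserved capacity for an object, given a percentage-based policy rule."""
    if not 0 <= reservation_percent <= 100:
        raise ValueError("reservation must be between 0 and 100 percent")
    return logical_size_gb * reservation_percent / 100


if __name__ == "__main__":
    # A 100 GB VMDK with a 25% object space reservation
    print(reserved_gb(100, 25), "GB reserved")
```

With the default of 0% nothing is reserved and the object stays thin; at 100% the full logical size is reserved up front.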

 

  1. Failure tolerance method

This policy specifies whether the data replication method optimizes for Performance or Capacity. vSAN supports three RAID types as the failure tolerance method (FTM) for an object: RAID 1, RAID 5, and RAID 6. RAID-5/6 is also called Erasure Coding.

RAID-1 (Mirroring) – Performance: vSAN uses more disk space to place the components of objects but provides better performance for accessing the objects. For each object with a RAID-1 policy and PFTT = 1, vSAN creates a mirror copy of that object and places it on another host, so the object can tolerate one host failure.

RAID-5/6 (Erasure Coding) – Capacity: vSAN uses less disk space, but the performance is reduced. Erasure coding can provide the same level of data protection as mirroring (RAID 1), while using less storage capacity. RAID 5 or RAID 6 erasure coding enables vSAN to tolerate the failure of up to two capacity devices in the datastore. You can configure RAID 5 on all-flash clusters with four or more fault domains. You can configure RAID 5 or RAID 6 on all-flash clusters with six or more fault domains.


This storage policy is combined with the PFTT policy to select the RAID level and the degree of protection against data loss.

– To use RAID 5, set Failure tolerance method to RAID-5/6 (Erasure Coding) – Capacity and Primary level of failures to tolerate to 1.

– To use RAID 6, set Failure tolerance method to RAID-5/6 (Erasure Coding) – Capacity and Primary level of failures to tolerate to 2.
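
To make the capacity trade-off concrete, here is a rough Python sketch that estimates raw capacity consumed for a given logical size. It assumes the commonly described vSAN layouts, which are not stated in the text above: RAID 1 stores PFTT+1 full copies, RAID 5 uses 3 data + 1 parity components, and RAID 6 uses 4 data + 2 parity components. The function name and method strings are hypothetical, and witness components and metadata overhead are ignored.

```python
def raw_capacity_gb(logical_gb: float, method: str, pftt: int) -> float:
    """Estimate raw capacity consumed by an object (assumed layouts, see above)."""
    if method == "RAID-1":
        if not 0 <= pftt <= 3:
            raise ValueError("RAID-1 supports PFTT 0 to 3")
        multiplier = pftt + 1        # n+1 full copies
    elif method == "RAID-5/6":
        if pftt == 1:
            multiplier = 4 / 3       # RAID 5: 3 data + 1 parity (assumed)
        elif pftt == 2:
            multiplier = 6 / 4       # RAID 6: 4 data + 2 parity (assumed)
        else:
            raise ValueError("erasure coding requires PFTT of 1 or 2")
    else:
        raise ValueError(f"unknown failure tolerance method: {method}")
    return logical_gb * multiplier


if __name__ == "__main__":
    for method, pftt in (("RAID-1", 1), ("RAID-5/6", 1),
                         ("RAID-1", 2), ("RAID-5/6", 2)):
        print(f"{method:9s} PFTT={pftt}: "
              f"{raw_capacity_gb(100, method, pftt):.1f} GB raw per 100 GB")
```

Under these assumptions, erasure coding protects against the same number of failures as mirroring while consuming noticeably less raw capacity, which is the trade-off the Capacity option refers to.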

 

  1. Force provisioning

If the option is set to Yes, the object is provisioned even if the Primary level of failures to tolerate, Number of disk stripes per object, and Flash read cache reservation policies specified in the storage policy cannot be satisfied by the datastore. A virtual machine provisioned this way is shown as non-compliant in the VM Summary tab and in the relevant VM Storage Policy views in the UI. The default No is acceptable for most production environments. With No, vSAN fails to provision a virtual machine when the policy requirements are not met, although it still creates the user-defined storage policy successfully.

 

  1. IOPS limit for object

Defines the IOPS limit for an object, such as a VMDK. With this policy setting, you can set an IOPS limit on a per-object basis (typically per VMDK), which guarantees that the object cannot exceed this number of IOPS. IOPS is calculated as the number of I/O operations, using a weighted size. If the system uses the default base size of 32 KB, a 64-KB I/O represents two I/O operations.

If the IOPS limit for object is set to 0, IOPS limits are not enforced. vSAN allows the object to double the rate of the IOPS limit during the first second of operation or after a period of inactivity.
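
The weighted I/O counting described above can be sketched in a few lines of Python. The 32 KB base size and the 64 KB example come from the text; the function name is hypothetical, and rounding partial multiples up (so that a 48 KB I/O counts as two) is an assumption rather than something the text states.

```python
import math

BASE_IO_SIZE_KB = 32  # default base size from the text above


def weighted_io_count(io_size_kb: float) -> int:
    """Number of operations one I/O counts as against the IOPS limit.

    Assumption: sizes are rounded up to the next multiple of the base size,
    and any I/O counts as at least one operation.
    """
    if io_size_kb <= 0:
        raise ValueError("I/O size must be positive")
    return max(1, math.ceil(io_size_kb / BASE_IO_SIZE_KB))


if __name__ == "__main__":
    for size in (4, 32, 64, 256):
        print(f"{size:3d} KB I/O counts as {weighted_io_count(size)} operation(s)")
```

This is why a workload issuing large I/Os can reach its IOPS limit sooner than the raw operation count would suggest.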

When working with virtual machine storage policies, you must understand how the storage capabilities affect the consumption of storage capacity in the vSAN cluster, and then choose the storage policies that best fit your workloads.
