Thursday, December 23, 2021

vSAN DEDUPLICATION AND COMPRESSION


vSAN can perform block-level deduplication and compression to save storage space. When you enable deduplication and compression on a vSAN all-flash cluster, redundant data within each disk group is reduced.

Deduplication removes redundant data blocks, whereas compression removes additional redundant data within each data block. These techniques work together to reduce the amount of space required to store the data. vSAN applies deduplication and then compression as it moves data from the cache tier to the capacity tier. Use compression-only vSAN for workloads that do not benefit from deduplication, such as online transactional processing.

Deduplication occurs inline when data is written back from the cache tier to the capacity tier. The deduplication algorithm uses a fixed block size and is applied within each disk group. Redundant copies of a block within the same disk group are deduplicated.

Deduplication and compression are enabled as a cluster-wide setting, but they are applied on a disk group basis. When you enable deduplication and compression on a vSAN cluster, redundant data within a particular disk group is reduced to a single copy.
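The following is a minimal sketch of the idea described above, not vSAN's actual implementation. It splits data into fixed-size blocks, keeps one copy of each unique block per disk group, and compresses each block it stores. The 4096-byte block size, the SHA-1 digest, the zlib compressor, and the names `destage`, `physical_bytes`, `group_a`, and `group_b` are all assumptions made for illustration.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


def destage(data: bytes, store: dict) -> list:
    """Deduplicate fixed-size blocks, then compress each new unique block.

    `store` stands in for the capacity tier of ONE disk group: it maps a
    block digest to the compressed block. Returns the ordered list of
    digests needed to reassemble `data`.
    """
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha1(block).hexdigest()
        if digest not in store:
            # Deduplication keeps a single copy; compression shrinks that copy.
            store[digest] = zlib.compress(block)
        recipe.append(digest)
    return recipe


def physical_bytes(store: dict) -> int:
    """Bytes actually held by one disk group after data reduction."""
    return sum(len(blob) for blob in store.values())


# Each disk group has its own store, so the same block written to two
# disk groups is stored once in each of them, not once overall.
group_a, group_b = {}, {}
payload = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
recipe_a = destage(payload, group_a)
recipe_b = destage(payload, group_b)
```

Here the four logical blocks in `payload` collapse to two stored blocks in each group, and nothing is shared between `group_a` and `group_b`, which mirrors the per-disk-group scope of deduplication.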

Notes:

  • Compression-only vSAN is applied on a per-disk basis.
  • Deduplication and compression might not be effective for encrypted VMs, because VM Encryption encrypts data on the host before it is written out to storage. Consider storage tradeoffs when using VM Encryption.

When you enable or disable deduplication and compression, vSAN performs a rolling reformat of every disk group on every host. Depending on the data stored on the vSAN datastore, this process might take a long time. Do not perform these operations frequently. If you plan to disable deduplication and compression, you must first verify that enough physical capacity is available to hold your data once it is no longer reduced.
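As a rough illustration of that capacity check, the sketch below compares the logical size of the data (the space it needs without deduplication and compression) against the physical capacity of the datastore. The function name and the `headroom_fraction` parameter are hypothetical; the headroom is not a VMware-documented threshold, only a safety margin you might choose yourself.

```python
def can_disable_space_efficiency(logical_used_gb: float,
                                 physical_capacity_gb: float,
                                 headroom_fraction: float = 0.0) -> bool:
    """Return True if the unreduced data would fit in the datastore.

    headroom_fraction is a hypothetical safety margin: 0.3 means that
    30 percent of the physical capacity must remain free afterwards.
    """
    if not 0.0 <= headroom_fraction < 1.0:
        raise ValueError("headroom_fraction must be in [0, 1)")
    usable_gb = physical_capacity_gb * (1.0 - headroom_fraction)
    return logical_used_gb <= usable_gb


fits = can_disable_space_efficiency(3000, 4000)
fits_with_margin = can_disable_space_efficiency(3000, 4000, headroom_fraction=0.3)
```

With 3000 GB of logical data and 4000 GB of capacity, the data fits as-is, but not if you insist on keeping 30 percent of the capacity free.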

How to Manage Disks in a Cluster with Deduplication and Compression

Consider the following guidelines when managing disks in a cluster with deduplication and compression enabled. These guidelines do not apply to compression-only vSAN.

  • Avoid adding disks to a disk group incrementally. For more efficient deduplication and compression, consider adding a disk group to increase the cluster storage capacity.
  • When you add a disk group manually, add all the capacity disks at the same time.
  • You cannot remove a single disk from a disk group. You must remove the entire disk group to make modifications.
  • A single disk failure causes the entire disk group to fail.

Verifying Space Savings from Deduplication and Compression

The amount of storage reduction from deduplication and compression depends on many factors, including the type of data stored and the number of duplicate blocks. Larger disk groups tend to provide a higher deduplication ratio. You can check the results of deduplication and compression by viewing the Usage breakdown before dedup and compression in the vSAN Capacity monitor.

In the vSphere Client, the Usage breakdown before dedup and compression appears when you monitor vSAN capacity, and it shows the results of deduplication and compression. The Used Before space indicates the logical space required before applying deduplication and compression, while the Used After space indicates the physical space used afterward. The Used After space also shows the amount of space saved and the Deduplication and Compression ratio.

The Deduplication and Compression ratio is based on the logical (Used Before) space required to store data before applying deduplication and compression, in relation to the physical (Used After) space required after applying deduplication and compression. Specifically, the ratio is the Used Before space divided by the Used After space. For example, if the Used Before space is 3 GB, but the physical Used After space is 1 GB, the deduplication and compression ratio is 3x.
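The calculation is simple enough to express directly. This small sketch reproduces the 3 GB / 1 GB example; the function names are illustrative and are not part of any vSAN API.

```python
def reduction_ratio(used_before_gb: float, used_after_gb: float) -> float:
    """Deduplication and compression ratio: Used Before / Used After."""
    if used_after_gb <= 0:
        raise ValueError("used_after_gb must be positive")
    return used_before_gb / used_after_gb


def space_saved_gb(used_before_gb: float, used_after_gb: float) -> float:
    """Space saved by data reduction, in the same unit as the inputs."""
    return used_before_gb - used_after_gb


ratio = reduction_ratio(3, 1)   # 3.0, shown as "3x"
saved = space_saved_gb(3, 1)    # 2 GB saved
```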

When deduplication and compression are enabled on the vSAN cluster, it might take several minutes for capacity updates to be reflected in the Capacity monitor as disk space is reclaimed and reallocated.

Deduplication and Compression Design Considerations

Consider these guidelines when you configure deduplication and compression in a vSAN cluster.

  • Deduplication and compression are available only on all-flash disk groups.
  • On-disk format version 3.0 or later is required to support deduplication and compression.
  • You must have a valid license to enable deduplication and compression on a cluster.
  • When you enable deduplication and compression on a vSAN cluster, all disk groups participate in data reduction through deduplication and compression.
  • vSAN can eliminate duplicate data blocks within each disk group, but not across disk groups.
  • Capacity overhead for deduplication and compression is approximately five percent of total raw capacity.
  • Policies must have either 0 percent or 100 percent object space reservations. Policies with 100 percent object space reservations are always honored, but can make deduplication and compression less efficient.
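Two of these guidelines lend themselves to a quick sanity check. The sketch below estimates the capacity overhead from the "approximately five percent" figure and tests whether an object space reservation value is one of the two supported settings. The function names are hypothetical, and the overhead is only an estimate.

```python
OVERHEAD_FRACTION = 0.05  # "approximately five percent of total raw capacity"


def estimated_overhead_gb(raw_capacity_gb: float) -> float:
    """Rough capacity overhead of deduplication and compression."""
    return raw_capacity_gb * OVERHEAD_FRACTION


def osr_is_supported(osr_percent: int) -> bool:
    """Object space reservation must be either 0 or 100 percent."""
    return osr_percent in (0, 100)


overhead = estimated_overhead_gb(20000)  # about 1000 GB on a 20 TB cluster
```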

Enable Deduplication and Compression on a New vSAN Cluster

You can enable deduplication and compression when you configure a new vSAN all-flash cluster.

  1. Navigate to a new all-flash vSAN cluster.
  2. Click the Configure tab.
  3. Under vSAN, select Services.
    a. Click to edit Space Efficiency.
    b. Select a space efficiency option: Deduplication and compression, or Compression only.
    c. (Optional) Select Allow Reduced Redundancy. If needed, vSAN reduces the protection level of your VMs while enabling Deduplication and Compression. For more details, see Reducing VM Redundancy for vSAN Cluster.
  4. Complete your cluster configuration.

The New "Compression only" Option in vSAN 7 U1

The "Compression only" option addresses a limitation of enabling deduplication and compression together: workloads that gain little from deduplication still pay its performance cost. vSAN administrators can use this setting for clusters with demanding workloads that typically cannot take advantage of deduplication techniques. It accommodates today’s economics of flash storage while maintaining an emphasis on delivering performance for high demand, latency-sensitive workloads.

Selecting the desired space efficiency option is easy. At the cluster level, the vCenter Server UI now presents three options:
 

  1. None
  2. Compression only
  3. Deduplication and compression

Note that changing this cluster-level setting requires a rolling evacuation of the data in each disk group. The process is automated but consumes resources while it runs.
 

Advantages

Compared to the deduplication and compression (DD&C) option, the "Compression only" option offers two advantages.
 

  • Reduced failure domain of a capacity device failure. In a cluster using "Compression only", the failure of a capacity device in a disk group impacts only that storage device, whereas the same failure using DD&C impacts the entire disk group. This smaller impact area also reduces the amount of data that vSAN may need to rebuild after a device failure.
     
    Figure 2. Comparing the failure domain of a capacity device failure in vSAN 7 U1
     
  • Increased destaging rates of data from the buffer tier to the capacity tier. As described in "vSAN Design Considerations – Deduplication and Compression," vSAN’s two-tier system ingests writes into a high-performance buffer tier and destages the data to the more value-based capacity tier at a later time. The space efficiency processes occur at the time of destaging and, as described in that post, can affect performance. Compared to DD&C, the "Compression only" feature improves destage rates in two ways: (1) it avoids the write amplification inherent in deduplication techniques, and (2) it uses multiple elevator processes to destage the data.
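The difference in failure domain described in the first advantage can be modeled in a few lines. This is only an illustrative sketch: the mode strings, the device names, and the function `devices_impacted` are hypothetical and do not correspond to any vSAN API.

```python
def devices_impacted(disk_group: list, failed_device: str, mode: str) -> list:
    """Return the capacity devices affected by one capacity device failure.

    disk_group lists the capacity devices in a single disk group.
    mode is "compression_only" or "dedup_and_compression".
    """
    if failed_device not in disk_group:
        raise ValueError("failed_device is not part of this disk group")
    if mode == "dedup_and_compression":
        # Deduplicated data spans the group, so the whole group fails.
        return list(disk_group)
    if mode == "compression_only":
        # Compression is applied per disk, so only that disk is affected.
        return [failed_device]
    raise ValueError("unknown mode: " + mode)


group = ["disk1", "disk2", "disk3", "disk4"]
ddc_impact = devices_impacted(group, "disk2", "dedup_and_compression")
co_impact = devices_impacted(group, "disk2", "compression_only")
```

In this model a single failed device takes out all four devices under DD&C but only one under "Compression only", which is why less data may need to be rebuilt.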
     
     

Capacity Savings

 
How much space savings can one expect using the "Compression only" feature? The answer depends on the workload and the type of data being stored. Both the DD&C and "Compression only" features are opportunistic, which means that space savings are not guaranteed. The capacity savings from compression can be viewed in the vCenter Server UI. Note that it may take some time before the savings ratio stabilizes.

By contrast, vSAN’s data placement techniques using erasure codes like RAID-5/6 are deterministic: They provide a guaranteed level of space efficiency for data stored in a resilient manner. RAID-5/6 erasure coding can be applied to VMs using storage policies and can be used with cluster-based space efficiency techniques.
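To make "deterministic" concrete, the sketch below computes the raw capacity needed for a given amount of data under different placement schemes. The multipliers are not stated in this article; they assume the commonly documented layouts of a two-copy mirror for RAID-1, 3 data + 1 parity for RAID-5, and 4 data + 2 parity for RAID-6. The names used are illustrative.

```python
# Raw capacity consumed per unit of logical data (assumed layouts, see above).
RAW_MULTIPLIER = {
    "RAID-1 (FTT=1)": 2.0,    # two full copies
    "RAID-5 (FTT=1)": 4 / 3,  # 3 data + 1 parity
    "RAID-6 (FTT=2)": 6 / 4,  # 4 data + 2 parity
}


def raw_capacity_needed_gb(logical_gb: float, scheme: str) -> float:
    """Raw capacity required to store logical_gb under the given scheme."""
    return logical_gb * RAW_MULTIPLIER[scheme]


mirror = raw_capacity_needed_gb(300, "RAID-1 (FTT=1)")
raid5 = raw_capacity_needed_gb(300, "RAID-5 (FTT=1)")
raid6 = raw_capacity_needed_gb(300, "RAID-6 (FTT=2)")
```

Unlike deduplication and compression, these figures do not depend on the data itself: 300 GB always needs the same raw capacity under a given scheme.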
 

Performance

 
What performance can you expect when using the "Compression only" feature? It will fall somewhere between the performance of your hosts running no space efficiency and the performance of your hosts running DD&C.

Performance using "Compression only" could be superior when compared to the same environment using DD&C. The improvement would be most visible for workloads with large working sets that issue large sequential writes and medium-sized random writes. In these cases, the absence of the deduplication engine and the improved parallelization of destaging allow the data to be destaged faster, making it less likely to hit the buffer fullness thresholds that begin to impact guest VM latency.
 

 
The performance capabilities of vSAN are still ultimately determined by the hardware used, the configuration of vSAN, the version of vSAN, the associated storage policies, and the characteristics of the application and workload. To better understand how hardware selection (including the type of flash devices) affects performance, see the posts "vSAN Design Considerations – Fast Storage Devices versus Fast Networking" and "Write buffer sizing in vSAN when using the very latest hardware."
 

Compression only, or Deduplication and compression? Which is right for you?

 
Workloads and data sets do not provide an easy way to know if they are ideally suited for some space efficiency techniques versus others. Therefore, the administrator should decide based on the requirements of the workloads and the constraints of the hardware powering the workloads. A comparison of design and operational considerations between the three options is provided below.
 

* Capacity savings not guaranteed
** Depends on workloads, working sets, and hardware configuration

 
For some environments, the minimal failure domain of a capacity device failure may be the only reason needed to justify the use of the "Compression only" feature versus the other options. Whatever the case, the configuration desired can be tailored on a per cluster basis.

VMware recommends the following settings for the best balance of capacity savings and performance impact. Workloads and environmental conditions vary, so these are generalized recommendations.
 

* If performance is of the highest priority, using no space efficiency would yield the highest sustained performance for the hardware configuration used.
