ESXi Performance Analysis

Below I describe the CPU parameters of the esxtop command. In a virtual environment, small or enterprise, virtual machines commonly run into performance issues, but it is easy to find the reason why once we understand exactly what the performance parameters show.

Several other parameters also need to be considered for performance analysis and troubleshooting, but the ones below are displayed by default and should be the first level of investigation.


%RUN: 
  • This value represents the percentage of absolute time the virtual machine was running on the system.
  • If the virtual machine is unresponsive, %RUN may indicate that the guest operating system is busy conducting an operation.
  • When %RUN is near zero and the virtual machine is unresponsive, it could mean that the virtual machine is idle, blocked on an operation, or is not scheduled due to resource contention. Look at other values (%WAIT, %RDY, and %CSTP) to identify resource contention.
  • When %RUN is near the number of vCPUs x 100%, it means that all vCPUs in the virtual machine are busy. This is an indicator that the guest operating system may be stuck in an operational loop. To investigate further, you may need to engage the appropriate operating system vendor for assistance in identifying why the guest operating system is using all of the CPU resources.
  • If you have engaged the guest operating system vendor, and they have determined that the issue is caused by VMware Tools or the virtual machine hardware, it may be pertinent to suspend the virtual machine to collect additional diagnostic information.
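
The %RUN reading only makes sense relative to the VM's vCPU count, since the ceiling is vCPUs x 100. A minimal Python sketch of that comparison follows. The function name and the two cut-offs (`near` and `idle`) are assumptions chosen for illustration, not values documented by VMware.

```python
def classify_run(run_pct: float, num_vcpus: int,
                 near: float = 0.9, idle: float = 5.0) -> str:
    """Classify a VM's %RUN reading relative to its vCPU count.

    `near` and `idle` are illustrative cut-offs (assumptions),
    not thresholds taken from VMware documentation.
    """
    if num_vcpus < 1:
        raise ValueError("num_vcpus must be at least 1")
    ceiling = num_vcpus * 100.0  # maximum possible %RUN for the VM
    if run_pct >= near * ceiling:
        return "saturated"   # all vCPUs busy; look inside the guest OS
    if run_pct <= idle:
        return "near-zero"   # idle, blocked, or not being scheduled
    return "normal"
```

For a 4-vCPU VM, `classify_run(390, 4)` returns "saturated", while a reading of 2 returns "near-zero" and should be followed by a look at %WAIT, %RDY, and %CSTP.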

%WAIT: 
  • This value represents the percentage of time the virtual machine was waiting for some VMkernel activity (such as I/O) to complete before it could continue.
  • If the virtual machine is unresponsive and the %WAIT value is proportionally higher than %RUN, %RDY, and %CSTP, then it could indicate that the world is waiting for a VMkernel operation to complete.
  • You may observe that the %SYS is proportionally higher than %RUN. %SYS represents the percentage of time spent by system services on behalf of the virtual machine.
  • A high %WAIT value can be the result of a poorly performing storage device on which the virtual machine resides. Storage latency and timeouts may trigger these symptoms across multiple virtual machines residing on the same LUN, volume, or array, depending on the scale of the storage performance issue.
  • A high %WAIT value can also be triggered by latency to any device in the virtual machine configuration. This can include, but is not limited to, serial pass-through devices, parallel pass-through devices, and USB devices. If the device suddenly stops functioning or responding, it could result in these symptoms. A common cause of a high %WAIT value is an ISO file that was accidentally left mounted in the virtual machine and has since been deleted or moved to another location.
  • If there does not appear to be any backing storage or networking infrastructure issue, it may be pertinent to crash the virtual machine to collect additional diagnostic information.

%RDY:
  • This value represents the percentage of time that the virtual machine is ready to execute commands, but has not yet been scheduled for CPU time due to contention with other virtual machines.
  • The VM is waiting for a physical CPU. This typically occurs when the ESXi host is over-provisioned.
  • Threshold: Up to 10 of ready time per vCPU is acceptable (e.g., for a VM with 4 vCPUs, 4 x 10 = 40 is acceptable).
  • Compare against the Max-Limited, %MLMTD value. This represents the amount of time that the virtual machine was ready to execute, but has not been scheduled for CPU time because the VMkernel deliberately constrained it.
  • If the virtual machine is unresponsive or very slow and %MLMTD is low it may indicate that the ESX host has limited CPU time to schedule for this virtual machine.
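
The per-vCPU threshold and the %MLMTD comparison above can be sketched in Python as follows. The function names are hypothetical. The rule that %MLMTD must account for at least half of %RDY before the limit setting is blamed is an assumption made for illustration, since the post only says "low".

```python
def ready_threshold(num_vcpus: int, per_vcpu_limit: float = 10.0) -> float:
    """Acceptable %RDY for a VM: 10 per vCPU, as quoted in this post."""
    return num_vcpus * per_vcpu_limit


def assess_ready(rdy_pct: float, mlmtd_pct: float, num_vcpus: int) -> str:
    """Classify ready time as acceptable, limit-driven, or contention-driven."""
    if rdy_pct <= ready_threshold(num_vcpus):
        return "ok"
    # Assumption: if %MLMTD makes up at least half of %RDY, treat the
    # configured CPU limit as the main cause; otherwise suspect the host.
    if mlmtd_pct >= rdy_pct / 2:
        return "cpu-limit"
    return "host-contention"
```

With 4 vCPUs the acceptable ceiling is 40, so `assess_ready(60, 5, 4)` points at host contention, while `assess_ready(60, 50, 4)` points at a CPU limit on the VM.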
      
%CSTP: 
  • This value represents the percentage of time the VM was held back while waiting for its vCPUs to be co-scheduled. It is typically triggered by over-provisioning (e.g., VMs with more vCPUs than they need). Over-provisioning negatively affects VM performance because a vCPU that is ready to run has to wait for the others.
  • Threshold: %CSTP should not exceed 3 for a VM.
  • If the virtual machine is unresponsive and %CSTP is proportionally high compared to %RUN, it may indicate that the ESX host has limited CPU resources to simultaneously co-schedule all vCPUs in this virtual machine.
  • Review the usage of virtual machines running with multiple vCPUs on this host. For example, a virtual machine with four vCPUs may need to schedule 4 pCPUs to do an operation. If there are multiple virtual machines configured in this way, it may lead to CPU contention and resource starvation.
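
As a rough illustration of that review, the Python sketch below lists the multi-vCPU VMs whose %CSTP exceeds the threshold of 3, worst first. The `VmSample` record and the function name are hypothetical. The readings are assumed to have been copied from esxtop by hand.

```python
from dataclasses import dataclass


@dataclass
class VmSample:
    """One esxtop reading for a VM (hypothetical record for illustration)."""
    name: str
    num_vcpus: int
    cstp_pct: float


def costop_suspects(samples, cstp_limit=3.0):
    """Return names of multi-vCPU VMs above the %CSTP limit, worst first."""
    flagged = [s for s in samples
               if s.num_vcpus > 1 and s.cstp_pct > cstp_limit]
    flagged.sort(key=lambda s: s.cstp_pct, reverse=True)
    return [s.name for s in flagged]
```

The VMs this returns are the first candidates for a vCPU count review, since reducing vCPUs on a VM that does not need them eases co-scheduling pressure on the host.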

%SYS: 
  • Percentage of time spent by VMkernel system services on behalf of the virtual machine. A high %SYS should be analyzed together with the %RDY value. It is most likely caused by VMs with high I/O.
  • Threshold: %SYS should not exceed 20.

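%MLMTD:
  • This is the percentage of time the VM was ready to run but was deliberately not scheduled by the VMkernel because of a CPU limit setting. Set CPU resources in line with the guest operating system's recommendations; a mismatch between what the guest needs and what the limit allows will hurt VM performance.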

%SWPWT:
  • This indicates that the VM is waiting for swapped memory pages to be read back from disk. Possible cause: memory over-commitment. Assigning too little or too much RAM to VMs can therefore cause this value to spike.
  • Threshold: %SWPWT should not exceed 5.
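
The thresholds quoted in this post can be gathered into one first-pass check. This is a minimal Python sketch, assuming the counter values have been collected into a plain dictionary keyed by counter name. The names `PER_VM_LIMITS`, `RDY_LIMIT_PER_VCPU`, and `exceeded` are made up for illustration.

```python
# Thresholds quoted in this post. %RDY scales with the vCPU count;
# the other limits apply to the VM as a whole.
PER_VM_LIMITS = {"%CSTP": 3.0, "%SYS": 20.0, "%SWPWT": 5.0}
RDY_LIMIT_PER_VCPU = 10.0


def exceeded(counters: dict, num_vcpus: int) -> list:
    """Return the counters above their threshold, sorted by name."""
    limits = dict(PER_VM_LIMITS)
    limits["%RDY"] = RDY_LIMIT_PER_VCPU * num_vcpus
    # Counters missing from the input are treated as 0 (not exceeded).
    return sorted(name for name, limit in limits.items()
                  if counters.get(name, 0.0) > limit)
```

For a 4-vCPU VM with `{"%RDY": 45, "%CSTP": 1, "%SYS": 25, "%SWPWT": 0}`, the result is `["%RDY", "%SYS"]`, which tells you which sections above to read first.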
