If your Prometheus setup exposes an eks_nodegroup_capacity metric, you can use it to read the maximum number of nodes configured for the node group.
The eks_nodegroup_capacity metric provides the number of nodes that a node group is capable of hosting. The max aggregation returns the highest value across all matching series at evaluation time; it does not look back over time (for that, use max_over_time with a range selector). Here's an example query to get the maximum capacity of a node group named mygroup:
max(eks_nodegroup_capacity{nodegroup_name="mygroup"})
You can combine this with your current query for the current number of nodes, and alert when that count approaches the maximum capacity.
For example, if you want to set an alert when the number of nodes exceeds 90% of the maximum capacity, you can use the following query:
sum(up{instance=~".*\\.myregion\\.compute\\.internal", eks_amazonaws_com_nodegroup="mygroup"}) > 0.9 * max(eks_nodegroup_capacity{nodegroup_name="mygroup"})
Without the bool modifier, a PromQL comparison acts as a filter rather than returning true or false: this query returns a result only while the number of nodes exceeds 90% of the maximum capacity, and returns nothing otherwise. You can use this query as the condition for your alert rule in Grafana.
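If you want to sanity-check the condition outside Grafana, a minimal Python sketch along the following lines may help. It assembles the same expression and interprets the response of the Prometheus HTTP API instant-query endpoint (/api/v1/query). The helper names build_expr, condition_met, and check are hypothetical, and the sketch assumes your Prometheus server is reachable without authentication and actually exports eks_nodegroup_capacity.

```python
import json
import urllib.parse
import urllib.request


def build_expr(nodegroup: str, instance_regex: str, ratio: float = 0.9) -> str:
    """Assemble the threshold condition (hypothetical helper)."""
    # Current node count, taken from the `up` series of the node group.
    current = (
        f'sum(up{{instance=~"{instance_regex}", '
        f'eks_amazonaws_com_nodegroup="{nodegroup}"}})'
    )
    # Assumed metric: only valid if your setup exports eks_nodegroup_capacity.
    capacity = f'max(eks_nodegroup_capacity{{nodegroup_name="{nodegroup}"}})'
    return f"{current} > {ratio} * {capacity}"


def condition_met(response: dict) -> bool:
    """A filtering comparison yields samples only while the condition holds."""
    return response.get("status") == "success" and bool(response["data"]["result"])


def check(base_url: str, expr: str) -> bool:
    """Run an instant query against the Prometheus HTTP API."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return condition_met(json.load(resp))
```

For example, check("http://localhost:9090", build_expr("mygroup", ".*.myregion.compute.internal")) would return True only while the node count is above 90% of capacity; the URL here is a placeholder for your own Prometheus address.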