Right now I am using these values:
# y = c * p / 100
# y: nagios value
# c: number of cores
# p: wanted load percent
# 4 cores
# time       5 minutes   10 minutes   15 minutes
# warning:   90%         70%          50%
# critical:  100%        80%          60%
command[check_load]=/usr/local/nagios/libexec/check_load -w 3.6,2.8,2.0 -c 4.0,3.2,2.4
But these values were picked almost at random.
Does anyone have some tested values?
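For reference, the formula in the comments above (y = c * p / 100) can be sketched in Python as follows. This is only an illustration of the arithmetic; the function names are hypothetical and not part of Nagios or check_load.

```python
def load_thresholds(cores, percents):
    """Apply y = c * p / 100 to each wanted load percentage."""
    return [cores * p / 100 for p in percents]


def format_check_load_args(cores, warning_percents, critical_percents):
    """Build the -w/-c argument string in the form used in the question."""
    warn = ",".join(f"{v:.1f}" for v in load_thresholds(cores, warning_percents))
    crit = ",".join(f"{v:.1f}" for v in load_thresholds(cores, critical_percents))
    return f"-w {warn} -c {crit}"


if __name__ == "__main__":
    # 4 cores, percentages taken from the comments in the question
    print(format_check_load_args(4, [90, 70, 50], [100, 80, 60]))
    # prints: -w 3.6,2.8,2.0 -c 4.0,3.2,2.4
```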
Answer
Linux load is actually simple. Each of the load average numbers is the sum of the average loads of all the cores, i.e.
1 min load avg = load_core_1 + load_core_2 + ... + load_core_n
5 min load avg = load_core_1 + load_core_2 + ... + load_core_n
15 min load avg = load_core_1 + load_core_2 + ... + load_core_n
where 0 <= avg load < infinity.
So if the load is 1 on a 4-core server, it means either that each core is 25% used or that one core is under 100% load. A load of 4 means all 4 cores are under 100% load. A load above 4 means there is more work than the cores can handle, so the server needs more cores.
check_load now has the option

-r, --percpu
Divide the load averages by the number of CPUs (when possible)

which means that, when it is used, you can think of your server as having just one core and write the percentage fractions directly, without thinking about the number of cores. With -r, the warning and critical intervals become 0 <= load avg <= 1, i.e. you don't have to modify your warning and critical values from server to server.
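The effect of dividing by the number of CPUs can be sketched in Python like this. It is a rough illustration, not the plugin's actual implementation: the function names are hypothetical, and the strict "greater than" comparison is an assumption about how thresholds are evaluated. The thresholds are the question's percentages written as fractions.

```python
import os


def per_cpu_load(load_averages, cpu_count):
    """Divide each load average by the number of CPUs."""
    return tuple(load / cpu_count for load in load_averages)


def check(load_averages, cpu_count, warning, critical):
    """Return 'CRITICAL', 'WARNING' or 'OK' for the per-CPU load averages."""
    scaled = per_cpu_load(load_averages, cpu_count)
    # Assumption: a threshold is breached only when the load exceeds it
    if any(load > limit for load, limit in zip(scaled, critical)):
        return "CRITICAL"
    if any(load > limit for load, limit in zip(scaled, warning)):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only)
    print(check(os.getloadavg(), cpus, (0.9, 0.7, 0.5), (1.0, 0.8, 0.6)))
```

Because the load is scaled before the comparison, the same fractional thresholds can be reused on a 4-core and a 32-core server.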
The OP has 5, 10, 15 for the intervals. That is wrong: the load averages cover 1, 5 and 15 minutes.