Mitigate a thundering herd by spreading agents out when too many check in at once.
Version and installation information
PE version: All supported versions
Solution
When too many agents check in at once, configure the pe-puppetserver service to return 503 Service Unavailable error responses with random Retry-After headers. Agents sleep for a random amount of time set in the Retry-After header and then check in, breaking up the herd.
Configure both of these settings using Hiera, either directly or through the console:
"puppet_enterprise::master::puppetserver::jruby_puppet_max_queued_requests": 48
"puppet_enterprise::master::puppetserver::jruby_puppet_max_retry_delay": 600
The jruby_puppet_max_queued_requests setting limits the number of waiting requests allowed before pe-puppetserver starts sending 503 responses to spread agents out. Change this setting based on the number of JRuby workers Puppet Server is running. Start with a limit of 12 queued requests per JRuby worker. The example above is based on the default JRuby worker pool of 4 instances (12 × 4 = 48). The maximum value for jruby_puppet_max_queued_requests is 150.
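The sizing rule above can be sketched as a small calculation. This is only an illustration of the arithmetic, not part of Puppet Enterprise: the function name suggested_max_queued_requests is hypothetical, and capping the result at 150 is an assumption about how to apply the documented maximum when the per-worker starting point would exceed it.

```python
# Starting point from the article: 12 queued requests per JRuby worker.
PER_JRUBY_QUEUE_LIMIT = 12
# Documented maximum for jruby_puppet_max_queued_requests.
MAX_QUEUED_REQUESTS = 150


def suggested_max_queued_requests(jruby_instances: int) -> int:
    """Return a starting value for jruby_puppet_max_queued_requests.

    Hypothetical helper: multiplies the worker count by 12 and caps
    the result at the documented maximum of 150.
    """
    if jruby_instances < 1:
        raise ValueError("jruby_instances must be at least 1")
    return min(jruby_instances * PER_JRUBY_QUEUE_LIMIT, MAX_QUEUED_REQUESTS)


if __name__ == "__main__":
    # Print a starting value for a few example pool sizes.
    for workers in (4, 8, 16):
        print(workers, suggested_max_queued_requests(workers))
```

Treat the result as a starting point only, and adjust it after observing how often pe-puppetserver actually returns 503 responses.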
The jruby_puppet_max_retry_delay setting limits the maximum amount of time, in seconds, that pe-puppetserver returns as a Retry-After header on 503 responses. This limit is multiplied by a random number, so each agent sleeps for a different amount of time, preventing a thundering herd. The example above uses a limit of 600 seconds (10 minutes).
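To see why a randomized Retry-After value breaks up the herd, the following rough simulation may help. It does not reproduce the server's actual implementation: it assumes the random multiplier is drawn uniformly from the range 0 to 1, and the names simulated_retry_after and simulate_herd are hypothetical.

```python
import random


def simulated_retry_after(max_retry_delay: int, rng: random.Random) -> int:
    """Return one simulated Retry-After value in whole seconds.

    Assumption: the limit is multiplied by a uniform random number
    in [0, 1); the real distribution may differ.
    """
    return int(max_retry_delay * rng.random())


def simulate_herd(agent_count: int, max_retry_delay: int, seed: int = 0) -> list:
    """Return the simulated sleep time for each rejected agent."""
    rng = random.Random(seed)
    return [simulated_retry_after(max_retry_delay, rng) for _ in range(agent_count)]


if __name__ == "__main__":
    # 1000 agents rejected at the same moment, with a 600-second limit.
    delays = simulate_herd(1000, 600)
    # Count how many agents retry in each one-minute window.
    buckets = [0] * 10
    for delay in delays:
        buckets[delay // 60] += 1
    print(buckets)
```

Under this assumption, agents that were rejected at the same moment retry at scattered times across the 10-minute window instead of all at once.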