Your agents are running normally, but they are not reporting on the console Status page and are listed as Unresponsive.
Version and installation information
PE version: All supported
Solution
There are a few reasons why nodes become unresponsive, but they are generally related to network issues (a queue in the network) or reports that are slow to process (a queue in PuppetDB).
If your compilers run PuppetDB, and your primary server and compilers are connected by high-latency links or congested network segments, a large queue of data can build up between PuppetDB and PostgreSQL, causing nodes to appear unresponsive in the console.
If the queue (queue_depth) is consistently growing into the thousands or tens of thousands, you likely have a bandwidth issue between the compiler and the PostgreSQL host. You can check queue_depth using one (or more) of the following methods, listed from easiest to hardest (example commands for the first two methods follow the list).
- On the compiler, count the number of files in /opt/puppetlabs/server/data/puppetdb/stockpile/cmd/q.
- Use the puppetlabs-puppet_metrics_collector module (included with all supported versions of PE) to check queue_depth.
- Use the PuppetDB performance dashboard to check command queue depth.
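For example, the first two methods can be run from a shell on the compiler. This is a minimal sketch: the metrics collector output directory shown below (/opt/puppetlabs/puppet-metrics-collector) is the module's default location and is assumed here, so adjust it if you have configured a different output directory.

    # Count the commands waiting in PuppetDB's stockpile queue
    # (each file is one queued command; a count that keeps growing indicates a backlog):
    find /opt/puppetlabs/server/data/puppetdb/stockpile/cmd/q -type f | wc -l

    # If puppet_metrics_collector is running, search its collected metrics for
    # queue_depth values (default output directory assumed; adjust as needed):
    grep -r queue_depth /opt/puppetlabs/puppet-metrics-collector/ | tail -n 20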
If this is the issue, improving network performance will improve responsiveness.
Load on PuppetDB can also slow command processing. For example, if you look in the PostgreSQL logs (located in the /var/log/puppetlabs/postgresql/<POSTGRESQL VERSION NUMBER>/ directory) and see a lot of deadlocks or sharelocks, too many processes are trying to access PostgreSQL at the same time, causing nodes to be unresponsive in the console. This kind of performance issue can be caused by system design or architecture issues, so if you have many deadlocks or sharelocks and you aren't experiencing a command queue issue, open a ticket and ask us for help.
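As a quick check before opening a ticket, you can count deadlock and sharelock messages from a shell on the PostgreSQL host. This is a rough sketch; substitute your installed version for the <POSTGRESQL VERSION NUMBER> placeholder and adjust the filename glob if your logs are named differently.

    # Count deadlock and ShareLock messages in the PostgreSQL logs
    # (replace the placeholder with your PostgreSQL version):
    grep -icE 'deadlock|sharelock' /var/log/puppetlabs/postgresql/<POSTGRESQL VERSION NUMBER>/*.log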