Get troubleshooting information to help you solve common PuppetDB performance issues including crashes, out of memory errors, unresponsive and unreachable databases,
500 errors, slow PuppetDB queries, slow Puppet runs, and databases that grow too large.
Run commands on the PuppetDB node unless noted otherwise. The PuppetDB node is on the primary server unless you have a separate PE-PostgreSQL node that runs PuppetDB.
Version and installation information
PE version: All supported versions
Get more troubleshooting information
To get more troubleshooting information and to see the queries that PuppetDB is sending to PostgreSQL (the backend database), increase the debugging level for PuppetDB.
Fix crashes and out of memory errors
If PuppetDB memory usage exceeds the Java heap size, it crashes, causing errors in
/var/log/puppetlabs/puppetdb-daemon.log. You can fix these errors by increasing the Java heap size.
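In PE, the heap is usually raised through the puppet_enterprise::profile::puppetdb::java_args class parameter (set in the console or Hiera) rather than by editing service defaults directly. A minimal Hiera sketch, assuming a 2GB heap suits your node:

```yaml
# Hiera: raise PuppetDB's maximum (Xmx) and initial (Xms) Java heap to 2GB.
# The 2g values are illustrative; size the heap for your workload and available RAM.
puppet_enterprise::profile::puppetdb::java_args:
  Xmx: '2g'
  Xms: '2g'
```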
If PuppetDB is not running or is unreachable, you get a 500 error during Puppet runs. For example:
# puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Failed to execute '/pdb/cmd/v1?checksum=7c2f9eabbba1bbfd8de78b0f0ac5a2d401104d58&version=5&certname=pe-201722-server.puppetdebug.vlan&command=replace_facts&producer-timestamp=1500568375' on at least 1 of the following 'server_urls': https://pe-201722-server.puppetdebug.vlan:8081
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
To troubleshoot, follow these steps.
Check that the pe-puppetdb service is running. Run
puppet resource service pe-puppetdb ensure=running
If the service isn’t running, check the PuppetDB logs in
/var/log/puppetlabs/puppetdb/ and address any issues that you find.
If the service is running, confirm that ports 8080 and 8081 are accessible.
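To confirm the ports, you can filter the listening sockets for PuppetDB's cleartext port (8080) and SSL port (8081). The sample `ss` output below is fabricated to illustrate the filter; on the real node, pipe `ss -tln` straight into the same grep:

```shell
# Illustrative 'ss -tln' output; on the PuppetDB node run:
#   ss -tln | grep -E ':(8080|8081)[[:space:]]'
cat > /tmp/ss-sample.txt <<'EOF'
LISTEN 0 128 127.0.0.1:8080 0.0.0.0:*
LISTEN 0 128 0.0.0.0:8081 0.0.0.0:*
LISTEN 0 128 0.0.0.0:22 0.0.0.0:*
EOF

# Keep only lines with a socket bound to port 8080 or 8081:
grep -E ':(8080|8081)[[:space:]]' /tmp/ss-sample.txt
```

If neither port appears, PuppetDB is not listening; if the ports are listening locally but agents still fail, check firewalls between the agents and the PuppetDB node.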
Troubleshoot performance issues
When there are performance issues with PuppetDB or slow Puppet runs, check the command queue depth metric. You can also check for issues with large facts, slow queries, and troubleshoot a database that’s getting too large.
Check the command queue depth
If the Puppet metrics collector value for queue_depth is persistently larger than 100, it causes performance issues. Query queue_depth and, if free CPUs are available, allocate them to additional command processing threads.
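One way to read queue_depth is through PuppetDB's metrics API on the local cleartext port. The mbean name and metrics v2 (Jolokia) path below are assumptions based on recent PuppetDB versions; check the metrics documentation for your version:

```shell
# Build the metrics v2 read URL for the command-queue depth mbean.
MBEAN='puppetlabs.puppetdb.mq:name=global.depth'
URL="http://localhost:8080/metrics/v2/read/${MBEAN}"
echo "$URL"

# On the PuppetDB node, fetch the metric (requires pe-puppetdb to be running):
# curl -s "$URL"
```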
If the issue persists after allocating additional command processing threads, you might need to refactor your code or data. Check for the following common issues:
Binary files stored in the database.
Large reports generated when many agent changes are applied frequently, for example, when a resource with 10,000 dependent resources is updated.
Check for issues with facts and reports, covered in following sections.
Check for large facts
A very large fact can cause performance issues such as slow Puppet runs and slow PuppetDB queries. For example, the ec2 structured fact can cause repeated connections to OpenStack metadata nodes, and Solaris systems automount each user's home directory, which can make the mountpoints fact (which stores all filesystems mounted on each system) very large.
To check for large facts, run this query:
su - pe-postgres -s '/bin/bash' -c "/opt/puppetlabs/server/bin/psql -d pe-puppetdb -c 'select k, j, pg_column_size(j) from ( select jsonb_each(stable) j, certname k from factsets union all select jsonb_each(volatile) j, certname k from factsets) a order by 3 desc limit 20;'" >/tmp/large_facts.txt
If you have facts that are larger than a few KB, you can improve performance by disabling them:
- Disable custom facts by removing them from your module or overriding their value.
- Disable the ec2 or filesystem built-in facts by blocking them in facter.conf.
- Disable other built-in facts by [overriding or disabling them](https://support.puppet.com/hc/en-us/articles/218916348).
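Blocking built-in facts uses the blocklist setting in facter.conf (available in Facter 3.9 and later); a sketch, assuming the "EC2" and "file system" fact groups are the ones causing trouble:

```
# /etc/puppetlabs/facter/facter.conf (HOCON)
facts : {
    blocklist : [ "EC2", "file system" ]
}
```

Run `puppet facts --debug` afterward to confirm the blocked groups no longer resolve.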
Check for fact or report insertions that take a long time
Slow queries degrade database performance. Queries and insertions, including fact and report insertions, should complete in less than five seconds; when they take longer, database performance suffers.
The log_min_duration_statement setting (5000ms, or five seconds, by default) controls which statements PostgreSQL logs, so adjusting it shows you how long queries take to run and which queries are causing issues.
To check whether insertions are an issue, increase the value of
log_min_duration_statement to longer than 10 minutes (600000ms).
Many queries that each take longer than average can also cause performance issues. To check for these, decrease the value of
log_min_duration_statement. For example, to find all SQL statements that take longer than 500ms to run, set the value to 500.
Changing the value might increase the size of PostgreSQL log files (located in
/var/log/puppetlabs/postgresql/<POSTGRESQL VERSION NUMBER>), so when you’re done troubleshooting, change the value back to the default.
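Once log_min_duration_statement is lowered, slow statements appear in the PostgreSQL logs as "duration:" lines. The sample log below is fabricated to show what to grep for; on the real node, run the same grep against the files under /var/log/puppetlabs/postgresql/:

```shell
# Fabricated sample of PostgreSQL slow-statement log lines.
cat > /tmp/postgresql-sample.log <<'EOF'
2023-01-10 12:00:01 UTC LOG:  duration: 7231.004 ms  statement: INSERT INTO reports (hash, certname) VALUES ($1, $2)
2023-01-10 12:00:02 UTC LOG:  connection received: host=[local]
2023-01-10 12:00:09 UTC LOG:  duration: 912.551 ms  statement: SELECT certname FROM factsets
EOF

# Pull out only the slow statements and their durations:
grep 'duration:' /tmp/postgresql-sample.log
```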
log_min_duration_statement in the console
To change the setting in the console:
In the console, click Node groups, click the + to the left of PE Infrastructure, and then click PE Master.
Click the Classes tab. In the Add new class field, enter database to find the puppet_enterprise::profile::database class. Click Add class.
Under puppet_enterprise::profile::database, select the parameter log_min_duration_statement. Set the value in milliseconds, between the double quotes, for example, "500".
To make the change, click Add, click Commit, and then click Run and select Puppet.
log_min_duration_statement in Hiera
To change the setting in Hiera, add the following to your per-node hierarchy in
nodes/<CERTNAME>.yaml (defined in [hiera.yaml](https://puppet.com/docs/puppet/7/hiera_quick.html#create_hiera_yaml_config)), and set the value in milliseconds, for example, 500.
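A minimal sketch of the per-node Hiera entry, assuming the setting is exposed as the puppet_enterprise::profile::database class parameter and using the illustrative 500ms threshold:

```yaml
# nodes/<CERTNAME>.yaml: log any statement that runs longer than 500ms.
puppet_enterprise::profile::database::log_min_duration_statement: "500"
```

Run Puppet on the PE-PostgreSQL node (or primary server) for the change to take effect.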
Troubleshoot and take action when your database grows too large
If the database grows too large, it can cause disk space exhaustion or performance issues.
When database size increases suddenly, check for a buildup of write-ahead log (WAL) data in the PostgreSQL data directory.
The following might increase the size of tables. Check for them using these queries.
Number of events generated per resource
su - pe-postgres -s /bin/bash -c "/opt/puppetlabs/server/bin/psql -d pe-puppetdb -c 'select certname, containing_class, file, count(*) from resource_events join certnames on certnames.id = resource_events.certname_id group by certname, containing_class, file order by count desc limit 20;'" >events_per_resource.txt
Longest resource titles
su - pe-postgres -s /bin/bash -c "/opt/puppetlabs/server/bin/psql -d pe-puppetdb -c 'select certname, containing_class, file, resource_title, length(resource_title) from resource_events join certnames on certnames.id = resource_events.certname_id group by certname, containing_class, file, resource_title order by length(resource_title) desc limit 20;'" >longest_resource_titles.txt