Performance/Error Debugging

You may occasionally run into errors on your site. This page is a guide to debugging the most common ones using the tools provided by Frappe Cloud and the Frappe Framework.

Site Slow: Daily Usage limit reached

This happens when you exceed the CPU hours allotted to your site. If you're unsure how you reached the limit, check the Analytics tab of your site for the past 24 hours, especially the Slowest Requests and Slowest Background Jobs graphs. These give you an idea of which endpoints on your site take the most time. Take the following graphs as an example.

Here, the red bars seem to take relatively long and should be looked into.

The list is sorted in descending order, so the first endpoints in it are usually the slowest.

500 Internal Server Error

More often than not, this error indicates an application-related issue. If your site is on a custom bench group, you can investigate using the logs or SSH access. It is possible that your custom app is throwing an error, which you can see in web.error.log. Refer to our docs for details.

If you occasionally get a pop-up with the same message, it is likely that a background job is failing. In such cases, checking your Scheduled Job Log, Error Log and worker.err.log file should help.
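When a log file is long, the most recent traceback is usually the quickest clue. The following is a minimal sketch, not part of Frappe, that pulls the last Python traceback out of log text. The helper name `last_traceback` is hypothetical, and it assumes single-line exception messages.

```python
def last_traceback(log_text: str) -> str:
    """Return the last Python traceback found in log_text, or "" if none."""
    marker = "Traceback (most recent call last):"
    start = log_text.rfind(marker)
    if start == -1:
        return ""
    lines = log_text[start:].splitlines()
    block = [lines[0]]
    for line in lines[1:]:
        block.append(line)
        # Indented lines are stack frames; the first unindented line is
        # assumed to be the exception message that ends the traceback.
        if line and not line[0].isspace():
            break
    return "\n".join(block)
```

You could call it on the contents of a log you have downloaded or opened over SSH, for example `last_traceback(open("web.error.log").read())`; the file location is an assumption and depends on where your logs live.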

502 Bad Gateway

This error usually happens when your web worker processes have completely stopped, which can happen for various reasons. To debug, check your logs the same way as above. You can also try to bring the processes back up with bench restart.

Site Slow: 504 Gateway timeout

This can happen when all the web workers on your site are busy with previous requests. It can even cause a bench to go down! It is caused by slow APIs; most of the time these are reports that take too long to run. You can confirm this from your analytics page by looking at the Slowest Requests chart as shown above.

Some common endpoints and their meanings are given below:

Endpoint	Meaning
`/api/method/frappe.desk.query_report.run`	Reports from Report doctype
`/api/method/frappe.desk.reportview.get`	Loading of the Report view or List view of a doctype. If many columns are fetched with filters on several other columns, it can get slow depending on indexes.
`/api/method/run_doc_method`	A whitelisted method in a Document controller is being called

You can also use Frappe's built-in Recorder on your site to figure out what's wrong. Remember to turn it off once you're done so that it doesn't slow down your site further.

If the endpoint is not something you can optimize, you can try converting the same into a background job.
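As a rough sketch of what that conversion can look like, the snippet below hands the slow work to `frappe.enqueue` instead of doing it inside the web request. The dotted path `custom_app.tasks.process_document`, the `docname` argument and the timeout value are hypothetical placeholders; substitute your own function and arguments.

```python
def enqueue_slow_action(docname, enqueue=None):
    """Queue the slow work instead of running it inside the web request."""
    if enqueue is None:
        # Imported lazily so the sketch can be read and tested outside a bench.
        import frappe
        enqueue = frappe.enqueue
    return enqueue(
        "custom_app.tasks.process_document",  # hypothetical path to your function
        queue="long",     # queue meant for long-running jobs
        timeout=1500,     # seconds; an assumed value, tune it for your job
        docname=docname,  # extra keyword arguments are passed on to the job
    )
```

The web request then returns as soon as the job is queued, and the heavy work runs on a background worker.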

If you own a dedicated server, you should also check your server analytics to see whether either of your servers (Application or Database) is reaching its CPU limits.

Slow reports

If you see /api/method/frappe.desk.query_report.run at the top of the list, it is a good indication that you should convert such reports into Prepared Reports so that they run in the background and leave your site free to use.

What does "Other" mean in the chart?

The "Other" bucket covers all requests that aren't among the top slowest requests. When "Other" is the most prominent, it usually means that a specific pattern of endpoints is slow. For example:


Here, the /custom_app/view/* endpoint is slow and should be investigated by the developer of the app.

If you don't see a pattern of sorts and "Other" is still the slowest endpoint, then it's likely that the server itself is slow and should be looked into.
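If you have your own list of request paths and durations (for instance, extracted from logs), grouping them by path prefix can make such a pattern visible. This is an illustrative sketch only: the `(path, seconds)` input format and the function name are assumptions, not something Frappe Cloud exports in this shape.

```python
from collections import defaultdict

def total_time_by_prefix(requests, depth=2):
    """Sum request durations per path prefix, e.g. /custom_app/view/*."""
    totals = defaultdict(float)
    for path, seconds in requests:
        parts = [part for part in path.split("/") if part]
        prefix = "/" + "/".join(parts[:depth])
        if len(parts) > depth:
            # Deeper segments vary per request, so collapse them.
            prefix += "/*"
        totals[prefix] += seconds
    # Slowest prefix first, mirroring the descending order of the chart.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

A prefix that dominates the result is a candidate for the "specific pattern" described above.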

What's causing the Request Timed Out error?

If a particular action on your site (not all of them), say the submission of a document, takes too long and eventually ends with a Request Timed Out popup, it's an application issue, assuming the server is functioning normally. In most cases we can't do much other than try increasing the HTTP timeout for web requests from its default of 2 minutes.

Here, the slowness could be in your Python application or be due to slow queries.

If the action you're performing is part of your custom app, we'd suggest optimizing the code so that it finishes faster. If you're pressed for time, you may also run the action from the bench console over SSH as a workaround.

If the action is guaranteed to take long, consider converting the same to a background job.

On the off chance that the action is not part of a custom app and all other activities on the site are running smoothly, please reach out to ERPNext Support for help.

Request Timeout: Server was too busy to process this request

This happens when a SQL query times out because it could not acquire a lock, which indicates a bug in the application. Some other job may be holding a lock on a related table, causing the issue. Review any recently added controller hook or scheduled job.

One easy way to debug this is to perform the action that triggers it and while it is happening, check the processlist of your site to see which queries are running.

Checking slow queries is also a good idea.
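When reading the processlist, the rows worth a closer look are usually the queries that have been running the longest. The sketch below filters such rows, assuming each row is a dict with the column names MariaDB returns for `SHOW FULL PROCESSLIST` (`Command`, `Time`, `State`, `Info`). The function name and the 5-second threshold are assumptions.

```python
def long_running_queries(processlist, min_seconds=5):
    """Return active queries running for at least min_seconds, longest first."""
    rows = [
        row for row in processlist
        if row.get("Command") == "Query"            # skip idle (Sleep) connections
        and (row.get("Time") or 0) >= min_seconds   # Time is in seconds
    ]
    return sorted(rows, key=lambda row: row.get("Time") or 0, reverse=True)
```

The `State` and `Info` values of the returned rows show what each query is waiting on and the SQL being run, which helps identify the job that is holding the lock.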

Work-horse terminated unexpectedly; Waitpid returned 9/15 (signal 9/15)

You may see this as the output of an RQ Job. This happens when a background worker gets killed, usually by the OOM Killer as a result of consuming too much memory. In such cases, consider optimizing your code to use less memory. If that is not possible, you'll have to upgrade your application server for more memory.
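One common way to lower peak memory in a background job is to process records in small batches instead of loading everything at once. The helper below is a generic sketch of that idea, not a Frappe API; the name `in_batches` is hypothetical.

```python
from itertools import islice

def in_batches(items, size):
    """Yield lists of at most `size` items without loading everything at once."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

A job can then loop over `in_batches(record_names, 500)` and handle one batch at a time, so only a single batch is held in memory. The batch size of 500 is an arbitrary example.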