Add a health check view.
Review Request #13252 — Created Aug. 30, 2023 and submitted
This change adds a view implementation that can be used to set up a
health check. It simply checks that the database and cache server are
available, and returns 200 if so and 500 if not (while logging the
error).
Manually set things up to have either the database or cache server fail
and saw that this view returned the correct results.
Summary | ID |
---|---|
cc55ec97fc0eb0e7a25b95378e672443b7dc0e65 |
Description | From | Last Updated |
---|---|---|
I was reading up on some best practices for health checks, and something that came up is that consuming services … |
chipx86 | |
We only use the default DB, but since this is in Djblets, maybe we should loop through all connections and … |
chipx86 | |
This might leak a cursor. We should with and pass this. |
chipx86 | |
Extra blank line here. |
chipx86 | |
line too long (81 > 79 characters) Column: 80 Error code: E501 |
reviewbot | |
Looks like the sentence ended too early, should say something like "are accessible." at the end. |
maubin | |
Add unused to these |
maubin | |
I think we want 503 Service Unavailable for these. |
chipx86 | |
Missing docs for these. |
chipx86 | |
Missing docs for these. |
chipx86 | |
Can we flesh this out a bit, document what this does/checks and what to expect from it? |
chipx86 | |
This needs to be typed as bool or it can annoyingly end up typed Final[True]. |
chipx86 |
- Change Summary:
-
- Use
get_or_set
instead oftouch
- Iterate through all connected databases and caches
- Use
- Commits:
-
Summary ID 7f0d7c67174b6f2420213741a7a831f984b85b36 efd67f56acc9493e824e56597a19b92b62036904 - Diff:
-
Revision 2 (+202 -8)
- Commits:
-
Summary ID efd67f56acc9493e824e56597a19b92b62036904 245b20191004f70a3666272a7ae19fe78b021fdf - Diff:
-
Revision 3 (+204 -8)
Checks run (2 succeeded)
- Commits:
-
Summary ID 245b20191004f70a3666272a7ae19fe78b021fdf efe09b9953e81acaea0cbcf1f33abee03ba210f9 - Diff:
-
Revision 4 (+206 -8)
Checks run (2 succeeded)
-
Been mulling over the health checks and doing some reading and looking at other services. There are some improvements I think we should make before we ship this, whether part of this change or another change on top.
-
I was reading up on some best practices for health checks, and something that came up is that consuming services can benefit from a JSON payload for success and failure results.
The payload would have a general status stating if it's OK or if something is down, and then a list/mapping of errors. This lets monitoring services communicate, graph, and alert on this. Maybe something like:
Success:
{ "status": "UP", "checks": { "database": { "default": "UP", }, "cache": { "default": "UP", } } } }
Error:
{ "status": "DOWN", "errors": { "cache": { "default": "Unable to connect to memcached at 127.0.0.1:11611" }, }, "checks": { "database": { "default": "UP", }, "cache": { "default": "DOWN", } } } }
Some food-for-thought:
- We should consider health information to be sensitive, since it's not great if third-parties are able to just watch the status of an arbitrary server. Might be worth then adding a
DJBLETS_HEALTHCHECK_IPS
setting, maybe defaulting that toINTERNAL_IPS
, and check against that. - Larger-in-scope project: We should build in a more general healthcheck registry that other things can populate. Review Board could one day check a message queue, or extensions could check whatever they need to check. A separate batch of work, but might influence the JSON payload.
Related reading I found interesting and bookmarked:
- https://testfully.io/blog/api-health-check-monitoring/
- https://medium.com/ether-labs/designing-a-health-check-api-a98784eb6a07
- https://learn.microsoft.com/en-us/azure/architecture/patterns/health-endpoint-monitoring
- https://emmer.dev/blog/writing-meaningful-health-check-endpoints/
- We should consider health information to be sensitive, since it's not great if third-parties are able to just watch the status of an arbitrary server. Might be worth then adding a
-
- Change Summary:
-
- Change status code when healthcheck fails.
- Add a payload with a JSON blob that includes all the checks and status of connected services.
- Limit requests to hosts in
DJBLETS_HEALTHCHECK_IPS
(defaulting toINTERNAL_IPS
) - query: should this just be calledHEALTHCHECK_IPS
?
- Commits:
-
Summary ID efe09b9953e81acaea0cbcf1f33abee03ba210f9 4d5d3a4a2c418745d22add7386cd5f0489f1d995 - Diff:
-
Revision 5 (+326 -8)
Checks run (2 succeeded)
- Commits:
-
Summary ID 4d5d3a4a2c418745d22add7386cd5f0489f1d995 cc55ec97fc0eb0e7a25b95378e672443b7dc0e65 - Diff:
-
Revision 6 (+374 -8)