Add a health check view.
Review Request #13252 — Created Aug. 30, 2023 and submitted
This change adds a view implementation that can be used to set up a
health check. It simply checks that the database and cache server are
available, and returns 200 if so and 500 if not (while logging the
error).
Manually set things up to have either the database or cache server fail
and saw that this view returned the correct results.
Summary | |
---|---|
Description | From | Last Updated |
---|---|---|
I was reading up on some best practices for health checks, and something that came up is that consuming services … |
|
|
We only use the default DB, but since this is in Djblets, maybe we should loop through all connections and … |
|
|
This might leak a cursor. We should with and pass this. |
|
|
Extra blank line here. |
|
|
line too long (81 > 79 characters) Column: 80 Error code: E501 |
![]() |
|
Looks like the sentence ended too early, should say something like "are accessible." at the end. |
![]() |
|
Add unused to these |
![]() |
|
I think we want 503 Service Unavailable for these. |
|
|
Missing docs for these. |
|
|
Missing docs for these. |
|
|
Can we flesh this out a bit, document what this does/checks and what to expect from it? |
|
|
This needs to be typed as bool or it can annoyingly end up typed Final[True]. |
|
-
-
djblets/util/views.py (Diff revision 1) We only use the default DB, but since this is in Djblets, maybe we should loop through all connections and perform the test?
-
-
Change Summary:
- Use
get_or_set
instead oftouch
- Iterate through all connected databases and caches
Commits: |
|
|||||||||
---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 2 (+202 -8) |
Checks run (1 failed, 1 succeeded)
flake8
-
djblets/util/views.py (Diff revision 2) line too long (81 > 79 characters) Column: 80 Error code: E501
Commits: |
|
|||||||||
---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 3 (+204 -8) |
Checks run (2 succeeded)

-
-
djblets/util/views.py (Diff revision 3) Looks like the sentence ended too early, should say something like "are accessible." at the end.
-
Commits: |
|
|||||||||
---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 4 (+206 -8) |
Checks run (2 succeeded)
-
Been mulling over the health checks and doing some reading and looking at other services. There are some improvements I think we should make before we ship this, whether part of this change or another change on top.
-
I was reading up on some best practices for health checks, and something that came up is that consuming services can benefit from a JSON payload for success and failure results.
The payload would have a general status stating if it's OK or if something is down, and then a list/mapping of errors. This lets monitoring services communicate, graph, and alert on this. Maybe something like:
Success:
{ "status": "UP", "checks": { "database": { "default": "UP", }, "cache": { "default": "UP", } } } }
Error:
{ "status": "DOWN", "errors": { "cache": { "default": "Unable to connect to memcached at 127.0.0.1:11611" }, }, "checks": { "database": { "default": "UP", }, "cache": { "default": "DOWN", } } } }
Some food-for-thought:
- We should consider health information to be sensitive, since it's not great if third-parties are able to just watch the status of an arbitrary server. Might be worth then adding a
DJBLETS_HEALTHCHECK_IPS
setting, maybe defaulting that toINTERNAL_IPS
, and check against that. - Larger-in-scope project: We should build in a more general healthcheck registry that other things can populate. Review Board could one day check a message queue, or extensions could check whatever they need to check. A separate batch of work, but might influence the JSON payload.
Related reading I found interesting and bookmarked:
- https://testfully.io/blog/api-health-check-monitoring/
- https://medium.com/ether-labs/designing-a-health-check-api-a98784eb6a07
- https://learn.microsoft.com/en-us/azure/architecture/patterns/health-endpoint-monitoring
- https://emmer.dev/blog/writing-meaningful-health-check-endpoints/
- We should consider health information to be sensitive, since it's not great if third-parties are able to just watch the status of an arbitrary server. Might be worth then adding a
-
Change Summary:
- Change status code when healthcheck fails.
- Add a payload with a JSON blob that includes all the checks and status of connected services.
- Limit requests to hosts in
DJBLETS_HEALTHCHECK_IPS
(defaulting toINTERNAL_IPS
) - query: should this just be calledHEALTHCHECK_IPS
?
Commits: |
|
|||||||||
---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 5 (+326 -8) |
Checks run (2 succeeded)
-
-
-
-
djblets/util/views.py (Diff revision 5) Can we flesh this out a bit, document what this does/checks and what to expect from it?
-
djblets/util/views.py (Diff revision 5) This needs to be typed as
bool
or it can annoyingly end up typedFinal[True]
.
Commits: |
|
|||||||||
---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 6 (+374 -8) |