Add a health check view.

Review Request #13252 — Created Aug. 30, 2023 and submitted

Information

Djblets
release-4.x

Reviewers

This change adds a view implementation that can be used to set up a
health check. It simply checks that the database and cache server are
available, and returns 200 if so and 500 if not (while logging the
error).

Manually set things up to have either the database or cache server fail
and saw that this view returned the correct results.

Summary
Add a health check view.
Description From Last Updated

I was reading up on some best practices for health checks, and something that came up is that consuming services …

chipx86chipx86

We only use the default DB, but since this is in Djblets, maybe we should loop through all connections and …

chipx86chipx86

This might leak a cursor. We should with and pass this.

chipx86chipx86

Extra blank line here.

chipx86chipx86

line too long (81 > 79 characters) Column: 80 Error code: E501

reviewbotreviewbot

Looks like the sentence ended too early, should say something like "are accessible." at the end.

maubinmaubin

Add unused to these

maubinmaubin

I think we want 503 Service Unavailable for these.

chipx86chipx86

Missing docs for these.

chipx86chipx86

Missing docs for these.

chipx86chipx86

Can we flesh this out a bit, document what this does/checks and what to expect from it?

chipx86chipx86

This needs to be typed as bool or it can annoyingly end up typed Final[True].

chipx86chipx86
chipx86
  1. 
      
  2. djblets/util/views.py (Diff revision 1)
     
     

    We only use the default DB, but since this is in Djblets, maybe we should loop through all connections and perform the test?

  3. djblets/util/views.py (Diff revision 1)
     
     

    This might leak a cursor. We should with and pass this.

  4. djblets/util/views.py (Diff revision 1)
     
     
     
     

    Extra blank line here.

  5. 
      
david
Review request changed

Change Summary:

  • Use get_or_set instead of touch
  • Iterate through all connected databases and caches

Commits:

Summary
-
Add a health check view.
+
Add a health check view.

Diff:

Revision 2 (+202 -8)

Show changes

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

david
maubin
  1. 
      
  2. djblets/util/views.py (Diff revision 3)
     
     

    Looks like the sentence ended too early, should say something like "are accessible." at the end.

  3. djblets/util/views.py (Diff revision 3)
     
     
     
     
     

    Add unused to these

  4. 
      
david
maubin
  1. Ship It!
  2. 
      
chipx86
  1. Been mulling over the health checks and doing some reading and looking at other services. There are some improvements I think we should make before we ship this, whether part of this change or another change on top.

  2. I was reading up on some best practices for health checks, and something that came up is that consuming services can benefit from a JSON payload for success and failure results.

    The payload would have a general status stating if it's OK or if something is down, and then a list/mapping of errors. This lets monitoring services communicate, graph, and alert on this. Maybe something like:

    Success:

    {
        "status": "UP",
        "checks": {
            "database": {
                "default": "UP",
            },
            "cache": {
                "default": "UP",
            }
          }
       }
    }
    

    Error:

    {
        "status": "DOWN",
        "errors":  {
            "cache": {
                "default": "Unable to connect to memcached at 127.0.0.1:11611"
             },
        },
        "checks": {
            "database": {
                "default": "UP",
            },
            "cache": {
                "default": "DOWN",
            }
          }
       }
    }
    

    Some food-for-thought:

    • We should consider health information to be sensitive, since it's not great if third-parties are able to just watch the status of an arbitrary server. Might be worth then adding a DJBLETS_HEALTHCHECK_IPS setting, maybe defaulting that to INTERNAL_IPS, and check against that.
    • Larger-in-scope project: We should build in a more general healthcheck registry that other things can populate. Review Board could one day check a message queue, or extensions could check whatever they need to check. A separate batch of work, but might influence the JSON payload.

    Related reading I found interesting and bookmarked:

    • https://testfully.io/blog/api-health-check-monitoring/
    • https://medium.com/ether-labs/designing-a-health-check-api-a98784eb6a07
    • https://learn.microsoft.com/en-us/azure/architecture/patterns/health-endpoint-monitoring
    • https://emmer.dev/blog/writing-meaningful-health-check-endpoints/
  3. djblets/util/views.py (Diff revision 4)
     
     

    I think we want 503 Service Unavailable for these.

  4. 
      
david
chipx86
  1. 
      
  2. djblets/util/views.py (Diff revision 5)
     
     
     
     
     
     
     

    Missing docs for these.

  3. djblets/util/views.py (Diff revision 5)
     
     
     
     

    Missing docs for these.

  4. djblets/util/views.py (Diff revision 5)
     
     
     

    Can we flesh this out a bit, document what this does/checks and what to expect from it?

  5. djblets/util/views.py (Diff revision 5)
     
     

    This needs to be typed as bool or it can annoyingly end up typed Final[True].

  6. 
      
david
chipx86
  1. Ship It!
  2. 
      
david
Review request changed

Status: Closed (submitted)

Change Summary:

Pushed to release-4.x (15672ff)
Loading...