Add a health check view.

Review Request #13252 — Created Aug. 30, 2023 and submitted

Information

Djblets
release-4.x

Reviewers

This change adds a view implementation that can be used to set up a
health check. It simply checks that the database and cache server are
available, and returns 200 if so and 500 if not (while logging the
error).

Manually set things up to have either the database or cache server fail
and saw that this view returned the correct results.

Summary ID
Add a health check view.
This change adds a view implementation that can be used to set up a health check. It simply checks that the database and cache server are available, and returns 200 if so and 500 if not (while logging the error). Testing Done: Manually set things up to have either the database or cache server fail and saw that this view returned the correct results.
cc55ec97fc0eb0e7a25b95378e672443b7dc0e65
Description From Last Updated

I was reading up on some best practices for health checks, and something that came up is that consuming services …

chipx86chipx86

We only use the default DB, but since this is in Djblets, maybe we should loop through all connections and …

chipx86chipx86

This might leak a cursor. We should with and pass this.

chipx86chipx86

Extra blank line here.

chipx86chipx86

line too long (81 > 79 characters) Column: 80 Error code: E501

reviewbotreviewbot

Looks like the sentence ended too early, should say something like "are accessible." at the end.

maubinmaubin

Add unused to these

maubinmaubin

I think we want 503 Service Unavailable for these.

chipx86chipx86

Missing docs for these.

chipx86chipx86

Missing docs for these.

chipx86chipx86

Can we flesh this out a bit, document what this does/checks and what to expect from it?

chipx86chipx86

This needs to be typed as bool or it can annoyingly end up typed Final[True].

chipx86chipx86
chipx86
  1. 
      
  2. djblets/util/views.py (Diff revision 1)
     
     
    Show all issues

    We only use the default DB, but since this is in Djblets, maybe we should loop through all connections and perform the test?

  3. djblets/util/views.py (Diff revision 1)
     
     
    Show all issues

    This might leak a cursor. We should with and pass this.

  4. djblets/util/views.py (Diff revision 1)
     
     
     
     
    Show all issues

    Extra blank line here.

  5. 
      
david
Review request changed
Change Summary:
  • Use get_or_set instead of touch
  • Iterate through all connected databases and caches
Commits:
Summary ID
Add a health check view.
This change adds a view implementation that can be used to set up a health check. It simply checks that the database and cache server are available, and returns 200 if so and 500 if not (while logging the error). Testing Done: Manually set things up to have either the database or cache server fail and saw that this view returned the correct results.
7f0d7c67174b6f2420213741a7a831f984b85b36
Add a health check view.
This change adds a view implementation that can be used to set up a health check. It simply checks that the database and cache server are available, and returns 200 if so and 500 if not (while logging the error). Testing Done: Manually set things up to have either the database or cache server fail and saw that this view returned the correct results.
efd67f56acc9493e824e56597a19b92b62036904

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

david
maubin
  1. 
      
  2. djblets/util/views.py (Diff revision 3)
     
     
    Show all issues

    Looks like the sentence ended too early, should say something like "are accessible." at the end.

  3. djblets/util/views.py (Diff revision 3)
     
     
     
     
     
    Show all issues

    Add unused to these

  4. 
      
david
maubin
  1. Ship It!
  2. 
      
chipx86
  1. Been mulling over the health checks and doing some reading and looking at other services. There are some improvements I think we should make before we ship this, whether part of this change or another change on top.

  2. Show all issues

    I was reading up on some best practices for health checks, and something that came up is that consuming services can benefit from a JSON payload for success and failure results.

    The payload would have a general status stating if it's OK or if something is down, and then a list/mapping of errors. This lets monitoring services communicate, graph, and alert on this. Maybe something like:

    Success:

    {
        "status": "UP",
        "checks": {
            "database": {
                "default": "UP",
            },
            "cache": {
                "default": "UP",
            }
          }
       }
    }
    

    Error:

    {
        "status": "DOWN",
        "errors":  {
            "cache": {
                "default": "Unable to connect to memcached at 127.0.0.1:11611"
             },
        },
        "checks": {
            "database": {
                "default": "UP",
            },
            "cache": {
                "default": "DOWN",
            }
          }
       }
    }
    

    Some food-for-thought:

    • We should consider health information to be sensitive, since it's not great if third-parties are able to just watch the status of an arbitrary server. Might be worth then adding a DJBLETS_HEALTHCHECK_IPS setting, maybe defaulting that to INTERNAL_IPS, and check against that.
    • Larger-in-scope project: We should build in a more general healthcheck registry that other things can populate. Review Board could one day check a message queue, or extensions could check whatever they need to check. A separate batch of work, but might influence the JSON payload.

    Related reading I found interesting and bookmarked:

    • https://testfully.io/blog/api-health-check-monitoring/
    • https://medium.com/ether-labs/designing-a-health-check-api-a98784eb6a07
    • https://learn.microsoft.com/en-us/azure/architecture/patterns/health-endpoint-monitoring
    • https://emmer.dev/blog/writing-meaningful-health-check-endpoints/
  3. djblets/util/views.py (Diff revision 4)
     
     
    Show all issues

    I think we want 503 Service Unavailable for these.

  4. 
      
david
chipx86
  1. 
      
  2. djblets/util/views.py (Diff revision 5)
     
     
     
     
     
     
     
    Show all issues

    Missing docs for these.

  3. djblets/util/views.py (Diff revision 5)
     
     
     
     
    Show all issues

    Missing docs for these.

  4. djblets/util/views.py (Diff revision 5)
     
     
     
    Show all issues

    Can we flesh this out a bit, document what this does/checks and what to expect from it?

  5. djblets/util/views.py (Diff revision 5)
     
     
    Show all issues

    This needs to be typed as bool or it can annoyingly end up typed Final[True].

  6. 
      
david
chipx86
  1. Ship It!
  2. 
      
david
Review request changed
Status:
Completed
Change Summary:
Pushed to release-4.x (15672ff)