    Gracefully handle errors communicating with the queue.

    Review Request #14374 — Created March 18, 2025 and updated — Latest diff uploaded

    Information

    ReviewBot
    release-4.x

    When the queue is down or non-responsive, we end up with long exception
    traces in the log files but nothing useful on the front-end. Tool
    updates and tool runs will spin and spin until they time out.

    We now have error handling code around all code communicating over
    Celery, logging errors and stack traces with a trace ID.
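As a rough illustration of the pattern described above, the sketch below wraps a queue call, logs the failure and stack trace with a generated trace ID, and re-raises an error carrying that ID. The names `call_queue` and `QueueCommunicationError` are hypothetical and are not taken from the Review Bot source; the actual change may be structured differently.

```python
import logging
import uuid

logger = logging.getLogger(__name__)


class QueueCommunicationError(Exception):
    """Hypothetical error raised when a queue operation fails."""

    def __init__(self, message, error_id):
        super().__init__(message)
        self.error_id = error_id


def call_queue(operation, description):
    """Run a queue operation, logging any failure with a trace ID.

    ``operation`` is any zero-argument callable that talks to the queue
    (an assumption made for this sketch).
    """
    try:
        return operation()
    except Exception as e:
        # The ID ties the short user-facing message to the full log entry.
        error_id = str(uuid.uuid4())

        # logger.exception() records the stack trace along with the message.
        logger.exception('Error %s (error ID %s): %s',
                         description, error_id, e)

        raise QueueCommunicationError(
            'Error %s (error ID %s)' % (description, error_id),
            error_id=error_id) from e
```

A caller could catch `QueueCommunicationError` and show `str(e)` to the user, while the matching ID in the log points to the full trace.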

    Tool runs (both manual and automatic) will now go straight to the error
    state with an "error running tool (error ID XYZ)" message, instead of
    waiting to time out.

    Worker status checks will once again display a suitable error message
    and include information in the logs. We had code that attempted to
    provide a good result, but it was checking for IOError, and nothing in
    the Celery communication code uses that anymore.
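To illustrate why a narrow `IOError` check can silently stop working, here is a minimal sketch. In Python 3, `IOError` is an alias for `OSError`, so an exception that does not derive from `OSError` passes straight through an `except IOError` clause. `FakeBrokerError`, `check_worker_status`, and the result dictionaries are stand-ins invented for this example, not the real Celery or Review Bot types.

```python
class FakeBrokerError(Exception):
    """Stand-in for a broker error that does not derive from OSError."""


def check_worker_status(ping_workers):
    """Return a status dictionary for the worker check.

    ``ping_workers`` is a hypothetical zero-argument callable returning
    a list of responding workers.
    """
    try:
        workers = ping_workers()
    except Exception as e:
        # Catching broadly here means a change in the exception types
        # raised by the queue layer still produces a readable error.
        return {
            'state': 'error',
            'error': 'Unable to reach the queue: %s' % e,
        }

    if not workers:
        return {'state': 'error', 'error': 'No workers responded.'}

    return {'state': 'success', 'workers': workers}
```

With `except IOError` in place of `except Exception`, a `FakeBrokerError` would propagate out of the status check instead of being turned into an error result.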

    Tool refreshes in the Tools database list now alert with an error and
    reset the state. The refresh UI has also been updated to be more
    accessible and to prioritize Ink styling if available.

    Shut down RabbitMQ and tested all the functionality: manual tool runs,
    automatic tool runs, worker status checks, and tool refreshes. Verified
    the log output and the user-facing output in all cases.

    Tested the newer tool refresh UI on Review Board 6 and 7.

    Unit tests pass.

