• 
      

    Redo RelationCounterField unsaved state tracking to avoid deadlocks.

    Review Request #9268 — Created Oct. 14, 2017 and submitted — Latest diff uploaded

    Information

    Djblets
    release-0.10.x
    6e0df58...

    Reviewers

    We were hitting a very peculiar deadlock bug that was reproduceable but
    depended on the operations being performed and the state of the garbage
    collector. The deadlock occurred due to Django's Signal.connect()
    triggering a RelationCounterField weak reference to drop while
    iterating through signal handlers, which resulted in a callback in
    RelationCounterField that then attempted to disconnect signals. Since
    Signal.connect() already grabbed a lock, Signal.disconnect() was
    blocked waiting on that lock, causing the process (whether unit tests or
    web server) to hang.

    In order to get around this, the state tracking has been changed a bit,
    in a way that actually cleans up the code and brings some consistency
    and order to how unsaved instances work. Previously, unsaved model
    instances (those with a primary key of None) required listening both for
    a save (to begin tracking state) and a weak reference drop (to clean up
    signal handlers), which required the use of non-weak signal callbacks,
    preventing Signal from doing its own cleanup.

    In the new design, InstanceState has been updated to be used for both
    saved and unsaved instances. It now owns the signal handlers, and
    connects to them using weak references, so that Signal can handle
    disconnecting (in a way that avoids deadlocks) when the state is
    no longer referenced. An InstanceState that represents an unsaved
    instance will listen for when the instance is saved, remove the existing
    references to itself, and register new InstanceStates for the saved
    reference.

    InstanceState also listens for when the weak reference is cleared for
    both saved and unsaved instances, so that the resets always happen when
    needed. It also no longer listens to pre_delete for unsaved instances,
    reducing the number of signal connections.

    This all avoids the problem of nested locks in Signal, creates
    consistency in how all instances are tracked, and further reduces the
    chances of state or signal connection leaks.

    Set up a reproduceable unit test run in Review Board that consistently
    eventually ended up in a deadlock (due to the garbage collector deciding
    to run at some point). After this change, I could not reproduce any
    deadlocks in the test runs.

    All unit tests for Djblets and Review Board pass.