Redo RelationCounterField unsaved state tracking to avoid deadlocks.

Review Request #9268 — Created Oct. 14, 2017 and submitted

Information

Djblets
release-0.10.x
6e0df58...

Reviewers

We were hitting a very peculiar deadlock bug that was reproduceable but
depended on the operations being performed and the state of the garbage
collector. The deadlock occurred due to Django's Signal.connect()
triggering a RelationCounterField weak reference to drop while
iterating through signal handlers, which resulted in a callback in
RelationCounterField that then attempted to disconnect signals. Since
Signal.connect() already grabbed a lock, Signal.disconnect() was
blocked waiting on that lock, causing the process (whether unit tests or
web server) to hang.

In order to get around this, the state tracking has been changed a bit,
in a way that actually cleans up the code and brings some consistency
and order to how unsaved instances work. Previously, unsaved model
instances (those with a primary key of None) required listening both for
a save (to begin tracking state) and a weak reference drop (to clean up
signal handlers), which required the use of non-weak signal callbacks,
preventing Signal from doing its own cleanup.

In the new design, InstanceState has been updated to be used for both
saved and unsaved instances. It now owns the signal handlers, and
connects to them using weak references, so that Signal can handle
disconnecting (in a way that avoids deadlocks) when the state is
no longer referenced. An InstanceState that represents an unsaved
instance will listen for when the instance is saved, remove the existing
references to itself, and register new InstanceStates for the saved
reference.

InstanceState also listens for when the weak reference is cleared for
both saved and unsaved instances, so that the resets always happen when
needed. It also no longer listens to pre_delete for unsaved instances,
reducing the number of signal connections.

This all avoids the problem of nested locks in Signal, creates
consistency in how all instances are tracked, and further reduces the
chances of state or signal connection leaks.

Set up a reproduceable unit test run in Review Board that consistently
eventually ended up in a deadlock (due to the garbage collector deciding
to run at some point). After this change, I could not reproduce any
deadlocks in the test runs.

All unit tests for Djblets and Review Board pass.

Description From Last Updated

typo: represented -> represent

daviddavid
david
  1. 
      
  2. djblets/db/fields.py (Diff revision 1)
     
     
    Show all issues

    typo: represented -> represent

  3. 
      
chipx86
david
  1. Ship It!
  2. 
      
chipx86
Review request changed

Status: Closed (submitted)

Change Summary:

Pushed to release-0.10.x (ac0a9d0)
Loading...