    Add distributed lock functionality, and locked cache updates.

    Review Request #14628 — Created Oct. 8, 2025 and submitted

    Information

    Djblets
    release-5.x

    Reviewers

    This introduces djblets.protect, a new module for service protection
    capabilities, and specifically djblets.protect.locks.CacheLock, which
    is a simple distributed lock utilizing the cache backend. This can help
    avoid cache stampede issues, and overall reduce the work required by a
    service.

    It's important to note that these locks should only be used in cases
    where the loss of a lock will not cause corruption or other bad
    behavior. As cache backends may expire keys prematurely and may lack
    atomic operations, a lock cannot be guaranteed. These can be thought of
    as soft, optimistic locks.

    Locks have an expiration, and consumers can block waiting on a lock to
    be available or return immediately, giving control over how to best
    utilize a lock.

    Locks are set by performing an atomic add() with a UUID4. If the value
    is added, the lock is acquired. If the key already exists, the caller
    can either block waiting for the lock or return immediately. Waiting
    supports a timeout and a time between retries.
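    The acquisition step above can be sketched roughly like this. Note
    that `FakeCache` and `try_acquire` are illustrative stand-ins for this
    sketch, not the actual Djblets implementation; a real backend's
    `add()` provides the atomicity:

```python
import uuid


class FakeCache:
    """Dict-backed stand-in for a cache backend's atomic add()."""

    def __init__(self):
        self._data = {}

    def add(self, key, value, timeout=None):
        # Return True only if the key wasn't already present,
        # mirroring the atomic add() used to acquire the lock.
        if key in self._data:
            return False

        self._data[key] = value
        return True


def try_acquire(cache, lock_key, expires_secs=30):
    """Attempt to acquire a lock; return a token on success, else None."""
    token = str(uuid.uuid4())

    if cache.add(lock_key, token, timeout=expires_secs):
        # We won the add(); the token identifies this holder.
        return token

    # Someone else holds the lock.
    return None


cache = FakeCache()
assert try_acquire(cache, 'lock:report') is not None
assert try_acquire(cache, 'lock:report') is None  # Already held.
```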

    When waiting, the lock will periodically check whether it can be
    acquired, using the provided retry time plus some random jitter to
    help avoid stampedes where too many consumers check and try to
    acquire the lock at the same time.
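    The jittered wait loop could look something like this sketch
    (`wait_for_lock` is an illustrative name, not the actual API):

```python
import random
import time


def wait_for_lock(try_acquire_fn, timeout_secs=5.0, retry_secs=0.5):
    """Poll until a lock is acquired or the timeout elapses.

    try_acquire_fn() returns a token on success, or None if the lock
    is currently held elsewhere.
    """
    deadline = time.monotonic() + timeout_secs

    while True:
        token = try_acquire_fn()

        if token is not None:
            return token

        if time.monotonic() >= deadline:
            return None  # Timed out waiting on the lock.

        # Random jitter spreads the retries of competing consumers
        # apart, so they don't all re-check at the same moment.
        time.sleep(retry_secs * random.uniform(0.5, 1.5))
```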

    Locks are released when they expire or (ideally) when release() is
    called. It's also possible they may fall out of cache, at which point
    the lock is no longer valid, and suitable logging will occur.

    Since there aren't atomic operations around deletes, this will try to
    release a lock as safely as possible. If the time spent with the lock is
    greater than the expected expiration, it will assume the lock has
    expired in cache and won't delete it (it may have been re-acquired
    elsewhere). Otherwise, it will attempt to bump the expiration to
    keep the key alive long enough to check it, with a worst-case scenario
    that the other acquirer may have a new expiration set (likely extending
    the lock). This is preferable over deleting another lock.
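    The release heuristic described above might be sketched like this
    (`FakeCache` and `safe_release` are illustrative stand-ins, not the
    actual Djblets implementation; a real backend would actually extend
    the key's expiration in touch()):

```python
import time


class FakeCache:
    """Dict-backed stand-in for the Django-style cache API."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def touch(self, key, timeout):
        # A real backend would extend the key's expiration here.
        return key in self._data

    def delete(self, key):
        self._data.pop(key, None)


def safe_release(cache, lock_key, token, acquired_at, expires_secs):
    """Best-effort release, erring on the side of not deleting."""
    if time.monotonic() - acquired_at >= expires_secs:
        # The lock has probably already expired in cache and may have
        # been re-acquired elsewhere; deleting now could destroy
        # another holder's lock, so leave it alone.
        return False

    # Bump the expiration so the key survives long enough to check
    # ownership. Worst case, we briefly extend another holder's lock,
    # which is safer than deleting it.
    if not cache.touch(lock_key, expires_secs):
        return False  # The key fell out of cache; nothing to release.

    if cache.get(lock_key) == token:
        cache.delete(lock_key)
        return True

    return False
```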

    When using a lock as a context manager, both acquiring and releasing
    the lock are handled automatically.
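    The context-manager behavior maps onto __enter__/__exit__ roughly
    like this (`SketchLock` and `FakeCache` are illustrative, not the
    Djblets CacheLock class itself):

```python
import uuid


class FakeCache:
    """Dict-backed stand-in for the cache backend."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        if key in self._data:
            return False

        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


class SketchLock:
    """Illustrative context-manager lock, not the Djblets class."""

    def __init__(self, cache, key):
        self.cache = cache
        self.key = key
        self.token = None

    def __enter__(self):
        token = str(uuid.uuid4())

        if not self.cache.add(self.key, token):
            raise RuntimeError('Lock %r is already held.' % self.key)

        self.token = token
        return self

    def __exit__(self, *exc_info):
        # Release on exit, even if the body raised, but only if we
        # still own the lock.
        if self.cache.get(self.key) == self.token:
            self.cache.delete(self.key)

        self.token = None
        return False


cache = FakeCache()

with SketchLock(cache, 'lock:report'):
    assert cache.get('lock:report') is not None  # Held inside the block.

assert cache.get('lock:report') is None  # Released on exit.
```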

    The interface is designed to be largely API-compatible with
    threading.Lock and similar lock interfaces, but with more flexibility
    useful for distributed lock behavior.

    A pattern I expect to be common will be to lock a cache key when
    calculating state to store and then writing it, which may be expensive
    (for instance, talking to a remote service and storing the result).

    For this, cache_memoize() and cache_memoize_iter() have been updated
    to work with locks. They now take a lock= argument, which accepts a
    CacheLock with the parameters controlling the lock behavior. If
    provided, the lock will be acquired if the initial fetch doesn't yield a
    value. A second fetch is then attempted (in case it had to wait for
    another process to finish), and if it still needs to compute data to
    cache, it will do so under the protection of the lock, releasing when
    complete.
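    The fetch/lock/refetch/compute flow described here can be sketched as
    follows. `locked_memoize` and `DictCache` are illustrative stand-ins,
    not the actual cache_memoize() implementation, and any object that
    supports the `with` statement works as the lock in this sketch:

```python
import threading


class DictCache:
    """Dict-backed stand-in for the cache API."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def locked_memoize(cache, lock, key, compute):
    """Fetch from cache, computing under the lock on a miss."""
    value = cache.get(key)

    if value is not None:
        return value  # Fast path: no lock needed.

    with lock:
        # Re-fetch, in case another process filled the cache while
        # we were waiting to acquire the lock.
        value = cache.get(key)

        if value is None:
            value = compute()  # Expensive work, done under the lock.
            cache.set(key, value)

    return value


cache = DictCache()
calls = []


def compute():
    calls.append(1)
    return 'expensive-result'


# threading.Lock stands in for a distributed lock; both support `with`.
lock = threading.Lock()

assert locked_memoize(cache, lock, 'key', compute) == 'expensive-result'
assert locked_memoize(cache, lock, 'key', compute) == 'expensive-result'
assert len(calls) == 1  # Computed only once.
```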

    Locks are entirely optional and not enabled by default for any current
    caching behavior, but are something we'll likely want to opt into any
    time we're working on caching something that's expensive to generate.

    Unit tests pass.

    Summary ID
    Add distributed lock functionality, and locked cache updates.
    7bb2e79db2a77ef2183385570d2fdbfffe87ea1d

      Can we add debug logging for when locks are acquired, extended, and released? This seems like potentially a cause of difficult bugs.

    3. djblets/cache/backend.py (Diff revision 3)
       
       

      Remove this blank line?

      1. Our docs with multi-line bullet points usually have a blank line so they don't all run together. Same as blank lines between paragraphs.

    4. djblets/cache/backend.py (Diff revision 3)
       
       

      typo: as -> was

    5. djblets/protect/locks.py (Diff revision 3)
       
       
       

      Do we want to add any validation to these (ex: no negative numbers, timeout should be longer than retry, etc)?

      1. -1 is valid for timeout_secs (follows the same signature as Python's locking APIs), and a lesser value will always time out so I'm not worried about that, but I'll toy with the smarts around retry.

    6. djblets/protect/locks.py (Diff revision 3)
       
       
       

      These attributes are actually named timeout_secs and retry_secs

      1. Ah good catch. Changed the names to fit Python locking APIs. Fixing.

    7. djblets/protect/locks.py (Diff revision 3)
       
       

      Can we add the lock key in this exception message?

      AssertionError also seems like maybe not the best type. Perhaps just a RuntimeError?

      1. Yeah. Maybe not the key since it's not key-bound but instance-bound, and that's an important distinction for debugging.

        RuntimeError is a better match for the Locking APIs.

    8. djblets/protect/locks.py (Diff revision 3)
       
       

      Comparing a float like this isn't reliable. While we could use something like math.isclose(), I think a much better option than a -1 magic value would be to define a sentinel value to use to indicate no timeout.

      1. -1 and floats are explicitly part of the Locking APIs, which I want to maintain compatibility with.

    9. djblets/protect/locks.py (Diff revision 3)
       
       

      Seems like we likely want >= instead of >

    10. djblets/protect/locks.py (Diff revision 3)
       
       
       
       
       

      The implementation here doesn't return anything

      1. Ah yep, older design.

    11. djblets/protect/locks.py (Diff revision 3)
       
       

      typo: released -> release

    12. djblets/protect/locks.py (Diff revision 3)
       
       
       
       

      It seems like there's a potential race here where in between the check and the delete, the existing lock could timeout and be acquired by another user. I don't suppose there are any atomic operations we could do for this?

      1. I looked into that when I wrote this, and sadly, no conditional deletes or anything. But I suppose I could attempt a new add() with the same token, and if the result is still this instance's token in either case, we can safely delete.

      2. So the reality here is that we can't make any guarantees. Memcached by default (and certainly through Django's layers) doesn't give us what's needed for this.

        Memcached really is not a safe place to build a reliable locking mechanism (beyond not having the atomic operations we need, memcached can always just randomly evict things), but for our purposes, we don't need a reliable lock here. The goal is to optimistically avoid concurrent operations for the same thing, like building a diff, where if one stomps on the other it's okay but we ideally avoid it in the average case.

        What I'm doing is implementing some basic protections and then documenting that this is really a sort of fuzzy lock, listing the limitations, and providing guidance on use. If a caller gets their lock stomped on, or two operations end up running concurrently, the caller needs to be okay with that.

    2. djblets/cache/backend.py (Diff revision 4)
       
       
       
       
       

      Should we catch RuntimeError here as well, in case this is called while the lock is already acquired?

      1. If that happens, it's an implementation error. The caller explicitly passed a lock that was already acquired. They should get the exception, and the docs document that an exception may be raised.

    3. djblets/protect/locks.py (Diff revision 4)
       
       

      Can we add a __del__ implementation that logs a warning if the object gets garbage collected while the lock is acquired? (indicating a missing release())

    2. djblets/protect/locks.py (Diff revisions 4 - 5)
       
       
       
       

      "while released" -> "without being released"

    chipx86
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-5.x (fc33a03)