Add distributed lock functionality, and locked cache updates.

Review Request #14628 — Created Oct. 8, 2025

Information

Repository: Djblets
Branch: release-5.x

This introduces djblets.protect, a new module for service protection
capabilities, and specifically djblets.protect.locks.CacheLock, which
is a simple distributed lock utilizing the cache backend. This can
help avoid cache stampede issues and reduce the overall work required
by a service.

It's important to note that these locks should only be used in cases
where the loss of a lock will not cause corruption or other bad
behavior. As cache backends may expire keys prematurely, and may lack
atomic operations, a lock cannot be guaranteed. These can be thought of
as soft, optimistic locks.

Locks have an expiration, and consumers can block waiting on a lock to
be available or return immediately, giving control over how to best
utilize a lock.

Locks are set by performing an atomic add() with a UUID4. If the value
is added, the lock is acquired. If it already exists, the lock has to
either block waiting or return a result. Waiting supports a timeout and
a time between retries.

When waiting, the lock will periodically check whether it can acquire a
new lock, using the configured retry time plus some random jitter to
help avoid stampedes where too many consumers check and try to acquire
a lock at the same time.
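
Conceptually, the acquire-and-wait loop looks something like the
following sketch against Django's cache API. The function and argument
names here are illustrative assumptions, not the actual `CacheLock`
internals:

```python
import random
import time
import uuid

from django.core.cache import cache


def acquire_lock(key, *, expires_secs=30, blocking=True,
                 timeout_secs=-1, retry_secs=0.25):
    """Sketch of soft-lock acquisition (hypothetical helper)."""
    # A timeout of -1 means "wait forever", matching Python's locking
    # APIs.
    end_time = (None if timeout_secs < 0
                else time.monotonic() + timeout_secs)

    # A fresh UUID4 token is generated per acquire() call, allowing
    # the same lock object to be reused across acquisitions.
    token = str(uuid.uuid4())

    while True:
        # add() only stores the value if the key doesn't already
        # exist, which is atomic on backends like memcached.
        if cache.add(key, token, expires_secs):
            return token  # Lock acquired.

        if not blocking or (end_time is not None and
                            time.monotonic() >= end_time):
            return None  # Held elsewhere; give up.

        # Sleep for the retry time plus random jitter, so concurrent
        # waiters don't all re-check at the same moment.
        time.sleep(retry_secs + random.uniform(0, retry_secs / 2))
```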

Locks are released when they expire or (ideally) when release() is
called. It's also possible they may fall out of cache, at which point
the lock is no longer valid, and suitable logging will occur.

Since there aren't atomic operations around deletes, this will try to
release a lock as safely as possible. If the time spent with the lock is
greater than the expected expiration, it will assume the lock has
expired in cache and won't delete it (it may have been re-acquired
elsewhere). Otherwise, it will attempt to bump the expiration to
keep the key alive long enough to check it, with a worst-case scenario
that the other acquirer may have a new expiration set (likely extending
the lock). This is preferable over deleting another lock.
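
The release path, as described, might look roughly like this sketch
(again using Django's cache API; the helper and its bookkeeping
arguments are hypothetical):

```python
import logging
import time

from django.core.cache import cache

logger = logging.getLogger(__name__)


def release_lock(key, token, acquired_at, expires_secs):
    """Sketch of best-effort soft-lock release (hypothetical helper).

    ``acquired_at`` is the time.monotonic() value recorded when the
    lock was acquired.
    """
    if time.monotonic() - acquired_at >= expires_secs:
        # We've likely outlived the expiration. The key may be gone or
        # re-acquired elsewhere, so don't risk deleting another
        # consumer's lock.
        logger.warning('Lock "%s" expired before release.', key)
        return

    # Bump the expiration so the key stays alive long enough to check
    # its stored token. Worst case, we briefly extend another
    # acquirer's expiration, which is safer than deleting their lock
    # outright.
    cache.touch(key, expires_secs)

    if cache.get(key) == token:
        # Note: this check-then-delete still isn't atomic, which is
        # why these are documented as soft, optimistic locks.
        cache.delete(key)
    else:
        logger.warning('Lock "%s" was lost or re-acquired elsewhere.',
                       key)
```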

When using a lock as a context manager, both acquiring and releasing
the lock are handled automatically.

The interface is designed to be largely API-compatible with
threading.Lock and similar lock interfaces, but with more flexibility
useful for distributed lock behavior.
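
As a rough usage example (the constructor and its argument names are
assumptions based on this description, and generate_report() is a
stand-in for expensive work):

```python
from djblets.protect.locks import CacheLock


def generate_report():
    # Stand-in for an expensive operation worth guarding.
    ...


lock = CacheLock('generate-report-123', expires_secs=60)

# threading.Lock-style explicit usage:
if lock.acquire(timeout_secs=10):
    try:
        generate_report()
    finally:
        lock.release()

# Or as a context manager, which acquires and releases automatically:
with CacheLock('generate-report-123', expires_secs=60):
    generate_report()
```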

A pattern I expect to be common will be to lock a cache key when
calculating state to store and then writing it, which may be expensive
(for instance, talking to a remote service and storing the result).

For this, cache_memoize() and cache_memoize_iter() have been updated
to work with locks. They now take a lock= argument, which accepts a
CacheLock with the parameters controlling the lock behavior. If
provided, the lock will be acquired if the initial fetch doesn't yield a
value. A second fetch is then attempted (in case it had to wait for
another process to finish), and if it still needs to compute data to
cache, it will do so under the protection of the lock, releasing when
complete.
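
For example, opting a memoized lookup into locking might look like
this (the cache key and lock parameters are illustrative):

```python
from djblets.cache.backend import cache_memoize
from djblets.protect.locks import CacheLock


def get_remote_state():
    # Stand-in for expensive work, such as querying a remote service.
    return {'status': 'ok'}


state = cache_memoize(
    'remote-service-state',
    get_remote_state,
    lock=CacheLock('remote-service-state',
                   expires_secs=60,
                   timeout_secs=30))
```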

Locks are entirely optional and not enabled by default for any current
caching behavior, but are something we'll likely want to opt into any
time we're working on caching something that's expensive to generate.

Unit tests pass.

Commits:
Add distributed lock functionality, and locked cache updates.
97a37605a541154ffe6a6e1abcf80fd48a1258a8
david
  2. Can we add debug logging for when locks are acquired, extended,
     and released? This seems like potentially a cause of difficult
     bugs.

  3. djblets/cache/backend.py (Diff revision 3)

     Remove this blank line?

     1. Our docs with multi-line bullet points usually have a blank
        line so they don't all run together. Same as blank lines
        between paragraphs.

  4. djblets/cache/backend.py (Diff revision 3)

     typo: as -> was

  5. djblets/protect/locks.py (Diff revision 3)

     Do we want to add any validation to these (ex: no negative
     numbers, timeout should be longer than retry, etc)?

     1. -1 is valid for timeout_secs (it follows the same signature as
        Python's locking APIs), and a lesser value will always time
        out, so I'm not worried about that, but I'll toy with the
        smarts around retry.

  6. djblets/protect/locks.py (Diff revision 3)

     These attributes are actually named timeout_secs and retry_secs

     1. Ah good catch. Changed the names to fit Python locking APIs.
        Fixing.

  7. djblets/protect/locks.py (Diff revision 3)

     Can we add the lock key in this exception message?

     AssertionError also seems like maybe not the best type. Perhaps
     just a RuntimeError?

     1. Yeah. Maybe not the key, though, since the error is
        instance-bound rather than key-bound, and that's an important
        distinction for debugging.

        RuntimeError is a better match for the locking APIs.

  8. djblets/protect/locks.py (Diff revision 3)

     Comparing a float like this isn't reliable. While we could use
     something like math.isclose(), I think a much better option than
     a -1 magic value would be to define a sentinel value to indicate
     no timeout.

     1. -1 and floats are explicitly part of the locking APIs, which I
        want to maintain compatibility with.

  9. djblets/protect/locks.py (Diff revision 3)

     Seems like we likely want >= instead of >

  10. djblets/protect/locks.py (Diff revision 3)

      The implementation here doesn't return anything

      1. Ah yep, older design.

  11. djblets/protect/locks.py (Diff revision 3)

      typo: released -> release

  12. djblets/protect/locks.py (Diff revision 3)

      It seems like there's a potential race here where, in between
      the check and the delete, the existing lock could time out and
      be acquired by another user. I don't suppose there are any
      atomic operations we could do for this?

      1. I looked into that when I wrote this, and sadly, no
         conditional deletes or anything. But I suppose I could
         attempt a new add() with the same token, and if the result is
         still this instance's token in either case, we can safely
         delete.

      2. So the reality here is that we can't make any guarantees.
         Memcached by default (and certainly through Django's layers)
         doesn't give us what's needed for this.

         Memcached really is not a safe place to build a reliable
         locking mechanism (beyond not having the atomic operations we
         need, memcached can always just randomly evict things), but
         for our purposes, we don't need a reliable lock here. The
         goal is to optimistically avoid concurrent operations for the
         same thing, like building a diff, where if one stomps on the
         other it's okay, but we ideally avoid it in the average case.

         What I'm doing is implementing some basic protections, then
         documenting that this is really a sort of fuzzy lock, listing
         the limitations, and providing guidance on use. If a caller
         gets their lock stomped on, or two operations end up running
         concurrently, the caller needs to be okay with that.
chipx86
Review request changed
Change Summary:
  • Added a big note to the docs that these locks can be lossy, listing the limitations and use cases.
  • Added safer handling of key loss (from cache purge or race conditions). Now when releasing a lock, we first check if we think we're over the expiration. If so, we don't touch the key. Otherwise we touch the key to extend expiration, and if that succeeds, we check the stored token and delete.
  • Added random jitter when retrying lock acquisition, which avoids processes continuously trying to acquire locks at the same time.
  • A new token is generated each time acquire() is called, allowing lock reuse.
  • Added debug and warning logging.
  • Fixed various issues in docs.
  • Added some checks and constraints for timeout and retry values.
  • Switched to RuntimeError for exceptions.
Description:
  (Updated; the revised description matches the text above.)
Commits:
Add distributed lock functionality, and locked cache updates.
c181dc802fc56790d92b3494a6790ec809bdfce8

Add distributed lock functionality, and locked cache updates.
97a37605a541154ffe6a6e1abcf80fd48a1258a8

Checks run (2 succeeded)

flake8 passed.
JSHint passed.
david
  2. djblets/cache/backend.py (Diff revision 4)

     Should we catch RuntimeError here as well, in case this is called
     while the lock is already acquired?

  3. djblets/protect/locks.py (Diff revision 4)

     Can we add a __del__ implementation that logs a warning if the
     object gets garbage collected while the lock is acquired?
     (indicating a missing release())