Update the BaseWebAPIToken model for API Tokens v2
Review Request #12341 — Created June 6, 2022 and submitted
This change is the first in a series of changes for improving our API Tokens
(which we'll refer to as API Tokens v2). Currently, Djblets generates tokens
using SHA1, which has the following disadvantages:
- Tokens aren't able to be validated or identified from other SHA1 tokens.
- Can't use secret scanning to detect leaked tokens
- Aren't as secure as some modern counterparts.In response, we are moving towards a new token format that contain a
vendor/category prefix and can be validated through a checksum. Additionally,
we want to be able to support token expiration and invalidation, which we
currently don't do. Since we are moving away from an old token format, we also
need support for token deprecation.In this change we modify the API tokens model to allow the possibility for
new token types, expiration, and invalidation. The following changes are made
to theBaseWebAPIToken
model:
- Addingexpires
field and anis_expired
method.
- Addingvalid
,invalid_date
andinvalid_reason
fields.
- Addingtoken_generator_id
field.
- Addinglast_used
field.
- Updating max length of token to 255 characters.The token authentication backends were also updated to block authentication
if an expired or invalid token was used. We also added help texts for all
of theBaseWebAPIToken
fields for better clarity.
- Ran unit tests in
djblets.webapi.tests
- Manually tested token authentication when making requests with
curl
andrbt api-get
Summary | ID |
---|---|
25b540b95771d6599931010a52a7db61ca93ac88 |
Description | From | Last Updated |
---|---|---|
I think we just want timezone.now(). This is the modern equivalent of our old get_tz_aware_utcnow(). |
chipx86 | |
By default, save() will save every single field in the model. If there are two threads dealing with an API … |
chipx86 | |
Same note about saving. |
chipx86 | |
Blank line before comments. |
chipx86 | |
We haven't been consistent with this, but we should aim to have localized help_text= set for every field. Since that's … |
chipx86 | |
255 should be fine. |
chipx86 | |
'django.utils.timezone' imported but unused Column: 1 Error code: F401 |
reviewbot | |
continuation line over-indented for hanging indent Column: 13 Error code: E126 |
reviewbot | |
This is being set after we save. I assume we'll want to set this before the parent save call above. |
chipx86 | |
Things can go wrong when comparing dates/times. 1 second seems fine, but what if we hit a leap second while … |
chipx86 | |
It looks like you've pulled on release-3.x but not rebased, which is why your diff is now undoing these changes. … |
david | |
This could use djblets.db.fields.ModificationTimestampField in order to avoid having to babysit it in save() |
david | |
Add a blank line in here (datetime is standard lib, django is third party). |
david | |
Can you move this to the third-party import section? |
david | |
'datetime.timedelta' imported but unused Column: 1 Error code: F401 |
reviewbot | |
I feel like this is something we should do per-test-method rather than for everything in the entire suite. I think … |
david | |
This is missing a "Version Added". |
chipx86 | |
Might be more clear as "... the token is invalid or expired, and any appropriate HTTP headers to send to … |
chipx86 | |
This should be "str" now. |
chipx86 | |
Since we're defining a new method, it might be a good opportunity to replace this with a dictionary, for better … |
chipx86 | |
Any % values must be specified outside of a _(...). Otherwise they'll modify the string that needs to be localized. |
chipx86 | |
We should specify which user this pertains to. |
chipx86 | |
Same comment about localized string formatting. |
chipx86 | |
We always either override everything in here, or return None. Rather than pre-define it, let's just return { ... } … |
chipx86 | |
Same comment about localized string formatting. |
chipx86 | |
We should specify which user this pertains to. |
chipx86 | |
For safety, this should ddo LOGIN_FAILED.headers.copy(). That avoids any state leakage (a problem we've had before in the API). |
chipx86 | |
For something like this, formatting works a bit better as: return_dict['error_message'] = ( _('...') % ... ) That gives both … |
chipx86 | |
Rather than format and set a default, let's move this to an else below. |
chipx86 | |
As above, this becomes more clear when the form is: return_dict['error_message'] = ( _('This token is invalid. It became invalid … |
chipx86 | |
When logging, we don't want to use %. Instead, pass the value as a positional parameter. |
chipx86 | |
Similar comments above apply here. |
chipx86 | |
This should contain a Version Added. |
chipx86 | |
We've been moving to a better format. This can be: result = ( super(...) .valdilate_credentials(...) ) Though, being Python 3, … |
chipx86 | |
This can be simplified to: return self.expires is not None and timezone.now() >= self.expires |
chipx86 | |
Plain imports should be listed before from imports. |
chipx86 | |
Let's switch this to an import kgb, and then access these as attributes from kgb. This is the pattern we're … |
chipx86 | |
For unit tests, we don't want to use _(...). |
chipx86 | |
Same note about import kgb. |
chipx86 | |
This should be assertTrue |
chipx86 | |
line too long (82 > 79 characters) Column: 80 Error code: E501 |
reviewbot | |
continuation line over-indented for visual indent Column: 36 Error code: E127 |
reviewbot | |
Dictionaries should (almost) always be in multi-line format, with keys alphabetized and on their own line (not on the { … |
chipx86 | |
Same here. |
chipx86 | |
Missing a trailing comma. |
chipx86 | |
Let's actually move these into the assertion, so it's not scattered: self.assertEqual( result, ( False, '...', { # Explicit headers … |
chipx86 | |
No need for the .copy() here, since we know it won't be modified in a test. Although, we want to … |
chipx86 | |
I'm torn on this. My gut is telling me we don't want to bake this into the API. I think, … |
chipx86 | |
We don't want to localize here. Test results should almost always compare against fixed state. Generating dynamic state can mask … |
chipx86 | |
'djblets.webapi.errors.LOGIN_FAILED' imported but unused Column: 1 Error code: F401 |
reviewbot | |
These should still hard-code the exact string we expect, rather than running it through any Django functions. Otherwise we're just … |
chipx86 |
-
-
I think we just want
timezone.now()
. This is the modern equivalent of our oldget_tz_aware_utcnow()
. -
By default,
save()
will save every single field in the model. If there are two threads dealing with an API token at once, one will overwrite the other's data.Best to always be explicit:
webapi_token.save(update_fields=('last_used',))
-
-
-
We haven't been consistent with this, but we should aim to have localized
help_text=
set for every field.Since that's long, the best formatting for fields are:
field = models.FooField( param, param, help_text=...)
-
- Change Summary:
-
- Added
help_text
to token fields and reformatted the fields - Removed updating the
last_used
field for OAuth2 tokens since that field doesnt exist for those tokens - Modified
BaseWebAPIToken
default test to check for the new fields too - Modified
WebAPITokenAuthBackend
's tests to check iflast_used
field gets updated upon successful authentication - Updates the
last_used
field inWebAPITokenAuthBackend
- Updates the
last_updated
field in whenever an API token object is saved and that save is from an update
- Added
- Description:
-
~ As part of our move to API Tokens v2, we need to open API tokens for new token
~ types, expiration, and invalidation. The following changes are made to the ~ BaseWebApiToken
model:~ This change is the first in a series of changes for improving our API Tokens
~ (which we'll refer to as API Tokens v2). Currently, Djblets generates tokens ~ using SHA1, which has the following disadvantages: + - Tokens aren't able to be validated or identified from other SHA1 tokens. + - Can't use secret scanning to detect leaked tokens + - Aren't as secure as some modern counterparts. ~ - Adding
expires
field
~ - Adding
valid
,invalid_date
andinvalid_reason
fields
~ - Adding
token_generator_id
field
~ - Adding
last_used
field
~ - Updating max length of token to 255 characters.
~ In response, we are moving towards a new token format that contain a
~ vendor/category prefix and can be validated through a checksum. Additionally, ~ we want to be able to support token expiration and invalidation, which we ~ currently don't do. Since we are moving away from an old token format, we also ~ need support for token deprecation. + + In this change we modify the API tokens model to allow the possibility for
+ new token types, expiration, and invalidation. The following changes are made + to the BaseWebAPIToken
model:+ - Adding expires
field+ - Adding valid
,invalid_date
andinvalid_reason
fields+ - Adding token_generator_id
field+ - Adding last_used
field+ - Updating max length of token to 255 characters. + + We also add
help_text
for all theBaseWebAPIToken
fields for better+ clarity. - Adding
- Commits:
-
Summary ID 512cab638671fa0635701720c7ba37830a68de10 62dcc5f4eeadab14fbf86d3dbaba35301b532e9d
- Change Summary:
-
flake8 fixes
- Commits:
-
Summary ID 62dcc5f4eeadab14fbf86d3dbaba35301b532e9d fc20f6a2211b0705261bf7fda2518e0db30c1a53
Checks run (2 succeeded)
-
-
-
Things can go wrong when comparing dates/times. 1 second seems fine, but what if we hit a leap second while this runs? Or something causes the process to stall? This sort of stuff unfortunately happens.
A better solution we've found is to hard-code when "now" is, using spies:
self.spy_on(timezone.now, op=kgb.SpyOpReturn(...))
Then we can safely compare the values with strict equal checks.
- Change Summary:
-
- Uses
spy_on()
to settimezone.now
- Added unit test for
save()
updating thelast_updated
field
- Uses
- Commits:
-
Summary ID fc20f6a2211b0705261bf7fda2518e0db30c1a53 466de5ccb36d41597020958be885eb72db5b13ef
Checks run (2 succeeded)
- Change Summary:
-
Rebased with release-3.x
- Commits:
-
Summary ID 466de5ccb36d41597020958be885eb72db5b13ef bbdcc6ad1c1cce018b116c95eac2de22b3b7c9ce
Checks run (2 succeeded)
- Change Summary:
-
- Spy on
timezone.now
in Token Auth authentication test to see iflast_used
gets updated - Fixed imports ordering issues
- Uses
ModificationTimeStampField
for token'slast_updated
field
- Spy on
- Commits:
-
Summary ID bbdcc6ad1c1cce018b116c95eac2de22b3b7c9ce ace466caf3cfa24acc2a8ef4204c9401037b468b
- Change Summary:
-
got rid of unused import
- Commits:
-
Summary ID ace466caf3cfa24acc2a8ef4204c9401037b468b 6ed5e27724f2aab17f69d44f3836c543907c52d3
Checks run (2 succeeded)
-
-
I feel like this is something we should do per-test-method rather than for everything in the entire suite.
I think I'd also be happier if we initialized it with some fake timestamp value. Right now in something like
test_last_updated
, if no time has elapsed between when the token is created and when it's saved, there's no reason to think the timestamp would change. Instead, to be sure everything is working independent of timing, we could do this:- create token
- spy on
timezone.now
with a fake timestamp - save
- check timestamp against fake value
- Change Summary:
-
Spies on
timezone.now
on a per-test-method basis rather than for the entire test suite - Commits:
-
Summary ID 6ed5e27724f2aab17f69d44f3836c543907c52d3 f1850a70e67f14fcadcadeb9ebf8c6e3ce222863
Checks run (2 succeeded)
- Change Summary:
-
- Added a method for checking if a token is expired.
- The token authentication process now checks for invalid or expired tokens.
- The
last_used
field of the token was being updated twice.
One area would update it even if authentication failed (from the user not being active).
Changed it so thatlast_used
is only updated on successful authentications.
- Description:
-
This change is the first in a series of changes for improving our API Tokens
(which we'll refer to as API Tokens v2). Currently, Djblets generates tokens using SHA1, which has the following disadvantages: - Tokens aren't able to be validated or identified from other SHA1 tokens. - Can't use secret scanning to detect leaked tokens - Aren't as secure as some modern counterparts. In response, we are moving towards a new token format that contain a
vendor/category prefix and can be validated through a checksum. Additionally, we want to be able to support token expiration and invalidation, which we currently don't do. Since we are moving away from an old token format, we also need support for token deprecation. In this change we modify the API tokens model to allow the possibility for
new token types, expiration, and invalidation. The following changes are made to the BaseWebAPIToken
model:~ - Adding expires
field~ - Adding valid
,invalid_date
andinvalid_reason
fields~ - Adding token_generator_id
field~ - Adding last_used
field~ - Adding expires
field and anis_expired
method.~ - Adding valid
,invalid_date
andinvalid_reason
fields.~ - Adding token_generator_id
field.~ - Adding last_used
field.- Updating max length of token to 255 characters. ~ We also add
help_text
for all theBaseWebAPIToken
fields for better~ clarity. ~ The token authentication backends were also updated to block authentication
~ if an expired or invalid token was used. We also added help texts for all + of the BaseWebAPIToken
fields for better clarity. - Commits:
-
Summary ID f1850a70e67f14fcadcadeb9ebf8c6e3ce222863 69cae8216db36df3004dddde3fd3b4a7d88b9e19
Checks run (2 succeeded)
-
-
-
-
Since we're defining a new method, it might be a good opportunity to replace this with a dictionary, for better readability and future expansion.
-
Any
%
values must be specified outside of a_(...)
. Otherwise they'll modify the string that needs to be localized. -
-
-
-
-
-
We've been moving to a better format. This can be:
result = ( super(...) .valdilate_credentials(...) )
Though, being Python 3, the
super
can just besuper()
, probably. -
-
-
Let's switch this to an
import kgb
, and then access these as attributes fromkgb
. This is the pattern we're adopting elsewhere. -
-
-
- Commits:
-
Summary ID 69cae8216db36df3004dddde3fd3b4a7d88b9e19 9963d9cb06b5a00d089a6964ec3b420d124de200
- Commits:
-
Summary ID 9963d9cb06b5a00d089a6964ec3b420d124de200 9ba15ff0ffce019565b83e0b8553aca19bf9b83a
Checks run (2 succeeded)
-
-
Might be more clear as "... the token is invalid or expired, and any appropriate HTTP headers to send to the client."
-
We always either override everything in here, or return
None
. Rather than pre-define it, let's justreturn { ... }
directly where we need it.For any conditional values, let's set a variable and then use that as the value in the
return { ... }
. -
For safety, this should ddo
LOGIN_FAILED.headers.copy()
. That avoids any state leakage (a problem we've had before in the API). -
For something like this, formatting works a bit better as:
return_dict['error_message'] = ( _('...') % ... )
That gives both the message and the formatting arguments breathing room. Changes to one don't impact the other, and it's obvious how to reformat for any new changes.
-
-
As above, this becomes more clear when the form is:
return_dict['error_message'] = ( _('This token is invalid. It became invalid %s.' 'It was made invalid because: %s') % (webapi_token.invalid_date, webapi_token.invalid_reason) )
So that being said, I'd like to change the message. Something more like:
'This API token became invalid on %s: %s'
This is just a bit less technical, flows a bit better, and makes the reason more part of the error message itself. An example then becomes:
"This API token became invalid on 2023-06-14 12:13:14.123456: We've reset tokens due to a potential compromise of our internal systems."
Similarly, the above error should be
This API token became invalid on %s.
.We should probably format the date here as well, rather than using the (above) default representation that has too much precision.
Now we're not at all consistent with this (we mostly do this right in templates via the
|date
filter), but Django does have some localization code. We can do:from django.utils.formats import localize '... %s ...' % localize(webapi_token.invalid_date)
This gives us a string like
July 28, 2022, 6:18 a.m.
(matching bits of our UI), which users can customize if they really want to viasettings.DATETIME_FORMAT
(not that we advertise that). -
-
- Commits:
-
Summary ID 9ba15ff0ffce019565b83e0b8553aca19bf9b83a 141b505d35d961d923f74bfe3d1ccd0b69eafb26
Checks run (2 succeeded)
-
-
Dictionaries should (almost) always be in multi-line format, with keys alphabetized and on their own line (not on the
{
or}
lines):return { 'error_message': None, 'headers': None, }
-
-
-
Let's actually move these into the assertion, so it's not scattered:
self.assertEqual( result, ( False, '...', { # Explicit headers here } ))
Same below.
-
No need for the
.copy()
here, since we know it won't be modified in a test.Although, we want to be clear on expectations, so we know what regresses from a change. So this should hard-code any headers to check against.
Same below.
- Commits:
-
Summary ID 141b505d35d961d923f74bfe3d1ccd0b69eafb26 1771899497ed3539da1f5561d83288174ba9b821
- Commits:
-
Summary ID 1771899497ed3539da1f5561d83288174ba9b821 123eb08f08c2f01e6843d361330c8c29f1b63a2f
Checks run (2 succeeded)
- Change Summary:
-
Does not emit the
webapi_token_updated
signal when a token is used for authentication (this would happen because thelast_used
field was being updated whenever a token was used) - Commits:
-
Summary ID 123eb08f08c2f01e6843d361330c8c29f1b63a2f a49a2dd9afd39fc6caad45e1d746f8faa5d7aa80
Checks run (2 succeeded)
-
-
I'm torn on this. My gut is telling me we don't want to bake this into the API.
I think, since this is a workaround for wanting to update a timestamp without triggering default behavior, we should work around it there.
We have two approaches we could take for that:
-
We could use
save_base()
instead ofsave()
. This can be used just likesave()
for our purposes. It just bypasses overriddensave
logic, and some deferred model and field sanity checks that won't apply in our case. -
Alternatively, we could do
api_token_model.objects.filter(pk=token.pk).update(...)
to update the database copy without triggering any functionality (including things like the standardpre_save
/post_save
signals).
Since we probably aren't interested in making this save entirely invisible — we just want to avoid
webapi_token_updated
— we should probably go with the first option. -
-
We don't want to localize here. Test results should almost always compare against fixed state. Generating dynamic state can mask bugs.
We can assume a default localization when running tests. If the assumption were to be wrong, then we'd have a test environment/setup issue, which we don't want to mask.
- Commits:
-
Summary ID a49a2dd9afd39fc6caad45e1d746f8faa5d7aa80 45f9622b58f2b84f071dc0269fdffd272a491c55
Checks run (2 succeeded)
-
-
These should still hard-code the exact string we expect, rather than running it through any Django functions. Otherwise we're just testing Django's functionality using Django's functionality. So let's make this whole thing a plain string, no function calls. If the values ever change, it highlights a test environment problem, or a change in Django that we need to be aware of.