Add SCMTool ID to Repository model.
Review Request #12332 — Created June 3, 2022 and submitted
As part of the effort towards creating a registry for SCMTools, we're
working towards deprecating the oldTool
model, which was what we had
initially created to support multiple SCM types. This model was
initially populated via a database fixture, and over the years we've
moved to entry points, and (soon) to a registry for storing all of the
available tools.If
Tool
is going away over time, we need a new way for Repository to
keep track of what SCMTool it's supposed to use. This change adds a new
field calledscmtool_id
, which will be the ID that can be looked up in
the registry. Populating this field is a little complex because the
existing Tool object only stores the SCMTool class name, not the ID. We
therefore can't initialize the field from an evolution, even using
custom SQL. The solution we've settled on is to wrap the upgrade process
with something that will store a mapping of the existing ID relations,
run any database evolutions, and then apply the mapping to populate the
field.This should allow us to move that information from the Tool model (and
associated SCMTool class) into the Repository model during the upgrade,
even in the case where the evolutions being run delete the Tool model
entirely. For now the code that builds that stored info uses the Tool
model, but there's a note about how we'll need to update that to use
hand-written SQL queries when the model instance gets deleted in the
future.
Set up a bunch of different repositories and ran
manage.py upgrade
.
Saw that the new field was populated correctly.
Summary | ID |
---|---|
0071dfe9dcd6c4a411816e8fae764296ea0740a4 |
Description | From | Last Updated |
---|---|---|
I think we should avoid this. hosting already provides the username@repository-type information. It's more public-facing than other database entries. |
chipx86 | |
Leaning toward saying we may want to increase this. Other such module/ID-referencing fields are at 255 (IntegrationConfig, StatusUpdate, upcoming APIToken … |
chipx86 | |
We should probably use .prefetch_related() here, bringing this down to 2 queries total. |
chipx86 | |
Going through this, we can optimize a fair amount, especially now that we're on Django 3.2. Mainly, we'll be able … |
chipx86 | |
You can pass flat=True to .values_list() to get a flat list of IDs, no indexing required. |
chipx86 |
-
-
I think we should avoid this.
hosting
already provides the username@repository-type information. It's more public-facing than other database entries. -
Leaning toward saying we may want to increase this. Other such module/ID-referencing fields are at 255 (
IntegrationConfig
,StatusUpdate
, upcomingAPIToken
work). -
-
Going through this, we can optimize a fair amount, especially now that we're on Django 3.2. Mainly, we'll be able to bulk-update, avoiding the overhead for customers with larger lists of repositories.
How about:
# This will just be 2 queries in total, optimized only for the fields # we need, and leveraging the database as best as possible: # # tools = ( Tool.objects .prefetch_related(Prefetch( 'repositories', queryset=Repository.objects.only('pk', 'tool_id'))) .order_by('pk') ) scmtool_id_data = { # This is required instead of values_list() due to the prefetch, but # will be optimized due to what we chose to prefetch. tool.scmtool_id: [ repository.pk repository in tool.repositories.all() ] for tool in tools } ... # This will let us bulk-update. for scmtool_id, repository_ids in scmtool_id_data.items(): repositories = Repository.objects.filter(pk__in=repository_ids) repositories.update(scmtool_id=scmtool_id)
Should be a big performance win.
-