Add SCMTool ID to Repository model.

Review Request #12332 — Created June 3, 2022 and submitted

Information

Review Board
release-5.0.x

Reviewers

As part of the effort towards creating a registry for SCMTools, we're
working towards deprecating the old Tool model, which was what we had
initially created to support multiple SCM types. This model was
initially populated via a database fixture, and over the years we've
moved to entry points, and (soon) to a registry for storing all of the
available tools.

If Tool is going away over time, we need a new way for Repository to
keep track of what SCMTool it's supposed to use. This change adds a new
field called scmtool_id, which will be the ID that can be looked up in
the registry. Populating this field is a little complex because the
existing Tool object only stores the SCMTool class name, not the ID. We
therefore can't initialize the field from an evolution, even using
custom SQL. The solution we've settled on is to wrap the upgrade process
with something that will store a mapping of the existing ID relations,
run any database evolutions, and then apply the mapping to populate the
field.

This should allow us to move that information from the Tool model (and
associated SCMTool class) into the Repository model during the upgrade,
even in the case where the evolutions being run delete the Tool model
entirely. For now the code that builds that stored info uses the Tool
model, but there's a note about how we'll need to update that to use
hand-written SQL queries when the model instance gets deleted in the
future.

Set up a bunch of different repositories and ran manage.py upgrade.
Saw that the new field was populated correctly.

Summary ID
Add SCMTool ID to Repository model.
As part of the effort towards creating a registry for SCMTools, we're working towards deprecating the old `Tool` model, which was what we had initially created to support multiple SCM types. This model was initially populated via a database fixture, and over the years we've moved to entry points, and (soon) to a registry for storing all of the available tools. If `Tool` is going away over time, we need a new way for Repository to keep track of what SCMTool it's supposed to use. This change adds a new field called `scmtool_id`, which will be the ID that can be looked up in the registry. Populating this field is a little complex because the existing Tool object only stores the SCMTool class name, not the ID. We therefore can't initialize the field from an evolution, even using custom SQL. The solution we've settled on is to wrap the upgrade process with something that will store a mapping of the existing ID relations, run any database evolutions, and then apply the mapping to populate the field. This should allow us to move that information from the Tool model (and associated SCMTool class) into the Repository model during the upgrade, even in the case where the evolutions being run delete the Tool model entirely. For now the code that builds that stored info uses the Tool model, but there's a note about how we'll need to update that to use hand-written SQL queries when the model instance gets deleted in the future. Testing Done: Set up a bunch of different repositories and ran `manage.py upgrade`. Saw that the new field was populated correctly.
0071dfe9dcd6c4a411816e8fae764296ea0740a4
Description From Last Updated

I think we should avoid this. hosting already provides the username@repository-type information. It's more public-facing than other database entries.

chipx86chipx86

Leaning toward saying we may want to increase this. Other such module/ID-referencing fields are at 255 (IntegrationConfig, StatusUpdate, upcoming APIToken …

chipx86chipx86

We should probably use .prefetch_related() here, bringing this down to 2 queries total.

chipx86chipx86

Going through this, we can optimize a fair amount, especially now that we're on Django 3.2. Mainly, we'll be able …

chipx86chipx86

You can pass flat=True to .values_list() to get a flat list of IDs, no indexing required.

chipx86chipx86
chipx86
  1. 
      
  2. reviewboard/scmtools/admin.py (Diff revision 1)
     
     
    Show all issues

    I think we should avoid this. hosting already provides the username@repository-type information. It's more public-facing than other database entries.

  3. reviewboard/scmtools/models.py (Diff revision 1)
     
     
    Show all issues

    Leaning toward saying we may want to increase this. Other such module/ID-referencing fields are at 255 (IntegrationConfig, StatusUpdate, upcoming APIToken work).

  4. reviewboard/upgrade.py (Diff revision 1)
     
     
    Show all issues

    We should probably use .prefetch_related() here, bringing this down to 2 queries total.

  5. reviewboard/upgrade.py (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
    Show all issues

    Going through this, we can optimize a fair amount, especially now that we're on Django 3.2. Mainly, we'll be able to bulk-update, avoiding the overhead for customers with larger lists of repositories.

    How about:

    # This will just be 2 queries in total, optimized only for the fields
    # we need, and leveraging the database as best as possible:
    #
    # 
    tools = (
        Tool.objects
        .prefetch_related(Prefetch(
            'repositories',
            queryset=Repository.objects.only('pk', 'tool_id')))
        .order_by('pk')
    )
    
    scmtool_id_data = {
        # This is required instead of values_list() due to the prefetch, but
        # will be optimized due to what we chose to prefetch.
        tool.scmtool_id: [
            repository.pk
            repository in tool.repositories.all()
        ]
        for tool in tools
    }
    
    ...
    
    # This will let us bulk-update.
    for scmtool_id, repository_ids in scmtool_id_data.items():
        repositories = Repository.objects.filter(pk__in=repository_ids)
        repositories.update(scmtool_id=scmtool_id)
    

    Should be a big performance win.

  6. reviewboard/upgrade.py (Diff revision 1)
     
     
     
    Show all issues

    You can pass flat=True to .values_list() to get a flat list of IDs, no indexing required.

  7. 
      
david
chipx86
  1. Ship It!
  2. 
      
david
Review request changed
Status:
Completed
Change Summary:
Pushed to release-5.0.x (2449b13)