• 
      

    Add SCMTool ID to Repository model.

    Review Request #12332 — Created June 3, 2022 and submitted

    Information

    Review Board
    release-5.0.x

    Reviewers

    As part of the effort towards creating a registry for SCMTools, we're
    working towards deprecating the old Tool model, which was what we had
    initially created to support multiple SCM types. This model was
    initially populated via a database fixture, and over the years we've
    moved to entry points, and (soon) to a registry for storing all of the
    available tools.

    If Tool is going away over time, we need a new way for Repository to
    keep track of what SCMTool it's supposed to use. This change adds a new
    field called scmtool_id, which will be the ID that can be looked up in
    the registry. Populating this field is a little complex because the
    existing Tool object only stores the SCMTool class name, not the ID. We
    therefore can't initialize the field from an evolution, even using
    custom SQL. The solution we've settled on is to wrap the upgrade process
    with something that will store a mapping of the existing ID relations,
    run any database evolutions, and then apply the mapping to populate the
    field.

    This should allow us to move that information from the Tool model (and
    associated SCMTool class) into the Repository model during the upgrade,
    even in the case where the evolutions being run delete the Tool model
    entirely. For now the code that builds that stored info uses the Tool
    model, but there's a note about how we'll need to update that to use
    hand-written SQL queries when the model instance gets deleted in the
    future.

    Set up a bunch of different repositories and ran manage.py upgrade.
    Saw that the new field was populated correctly.

    Summary ID
    Add SCMTool ID to Repository model.
    As part of the effort towards creating a registry for SCMTools, we're working towards deprecating the old `Tool` model, which was what we had initially created to support multiple SCM types. This model was initially populated via a database fixture, and over the years we've moved to entry points, and (soon) to a registry for storing all of the available tools. If `Tool` is going away over time, we need a new way for Repository to keep track of what SCMTool it's supposed to use. This change adds a new field called `scmtool_id`, which will be the ID that can be looked up in the registry. Populating this field is a little complex because the existing Tool object only stores the SCMTool class name, not the ID. We therefore can't initialize the field from an evolution, even using custom SQL. The solution we've settled on is to wrap the upgrade process with something that will store a mapping of the existing ID relations, run any database evolutions, and then apply the mapping to populate the field. This should allow us to move that information from the Tool model (and associated SCMTool class) into the Repository model during the upgrade, even in the case where the evolutions being run delete the Tool model entirely. For now the code that builds that stored info uses the Tool model, but there's a note about how we'll need to update that to use hand-written SQL queries when the model instance gets deleted in the future. Testing Done: Set up a bunch of different repositories and ran `manage.py upgrade`. Saw that the new field was populated correctly.
    0071dfe9dcd6c4a411816e8fae764296ea0740a4
    Description From Last Updated

    I think we should avoid this. hosting already provides the username@repository-type information. It's more public-facing than other database entries.

    chipx86chipx86

    Leaning toward saying we may want to increase this. Other such module/ID-referencing fields are at 255 (IntegrationConfig, StatusUpdate, upcoming APIToken …

    chipx86chipx86

    We should probably use .prefetch_related() here, bringing this down to 2 queries total.

    chipx86chipx86

    Going through this, we can optimize a fair amount, especially now that we're on Django 3.2. Mainly, we'll be able …

    chipx86chipx86

    You can pass flat=True to .values_list() to get a flat list of IDs, no indexing required.

    chipx86chipx86
    chipx86
    1. 
        
    2. reviewboard/scmtools/admin.py (Diff revision 1)
       
       
      Show all issues

      I think we should avoid this. hosting already provides the username@repository-type information. It's more public-facing than other database entries.

    3. reviewboard/scmtools/models.py (Diff revision 1)
       
       
      Show all issues

      Leaning toward saying we may want to increase this. Other such module/ID-referencing fields are at 255 (IntegrationConfig, StatusUpdate, upcoming APIToken work).

    4. reviewboard/upgrade.py (Diff revision 1)
       
       
      Show all issues

      We should probably use .prefetch_related() here, bringing this down to 2 queries total.

    5. reviewboard/upgrade.py (Diff revision 1)
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
      Show all issues

      Going through this, we can optimize a fair amount, especially now that we're on Django 3.2. Mainly, we'll be able to bulk-update, avoiding the overhead for customers with larger lists of repositories.

      How about:

      # This will just be 2 queries in total, optimized only for the fields
      # we need, and leveraging the database as best as possible:
      #
      # 
      tools = (
          Tool.objects
          .prefetch_related(Prefetch(
              'repositories',
              queryset=Repository.objects.only('pk', 'tool_id')))
          .order_by('pk')
      )
      
      scmtool_id_data = {
          # This is required instead of values_list() due to the prefetch, but
          # will be optimized due to what we chose to prefetch.
          tool.scmtool_id: [
              repository.pk
              repository in tool.repositories.all()
          ]
          for tool in tools
      }
      
      ...
      
      # This will let us bulk-update.
      for scmtool_id, repository_ids in scmtool_id_data.items():
          repositories = Repository.objects.filter(pk__in=repository_ids)
          repositories.update(scmtool_id=scmtool_id)
      

      Should be a big performance win.

    6. reviewboard/upgrade.py (Diff revision 1)
       
       
       
      Show all issues

      You can pass flat=True to .values_list() to get a flat list of IDs, no indexing required.

    7. 
        
    david
    chipx86
    1. Ship It!
    2. 
        
    david
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-5.0.x (2449b13)