• 
      

    Clean up and document the code for diffing signatures.

    Review Request #9569 — Created Feb. 1, 2018 and submitted

    Information

    Django Evolution
    master
    cc1ff40...

    Reviewers

    The Diff class is responsible for taking two project signatures and
    diffing them, showing the changes that the target signature has that the
    original one does not, and generating a hinted evolution to turn the
    original into the target. It's old code that was getting harder to
    reason about and maintain.

    This change fixes up that code a bit, adding and clarifying
    documentation and separating out information gathering from state storage
    for the diff (and making it far more efficient in the process). It's now
    easier to read through and see the various bits of logic and to modify
    it. This will be needed for the upcoming signature work.

    Unit tests pass for all versions of Django.

    Description From Last Updated

    Should be double backticks at the end.

    david david

    lines.extend( "The model %s.%s has been deleted" % (app_label, model_name) for model_name in app_changes.get('deleted') )

    david david

    Somewhat more efficient as: lines.extend( " Field '%s' has been added" % field_name for field_name in change.get('added', []) ) lines.extend( …

    david david

    Somewhat more efficient as: lines.extend( " Property '%s' has changed" % prop for prop in field_change )

    david david
    david
    1. 
        
    2. django_evolution/diff.py (Diff revision 1)
       
       
      Show all issues

      Should be double backticks at the end.

    3. django_evolution/diff.py (Diff revision 1)
       
       
       
       
       
      Show all issues

      lines.extend(
      "The model %s.%s has been deleted" % (app_label, model_name)
      for model_name in app_changes.get('deleted')
      )

    4. django_evolution/diff.py (Diff revision 1)
       
       
       
       
       
       
       
       
      Show all issues

      Somewhat more efficient as:

      lines.extend(
          "    Field '%s' has been added" % field_name
          for field_name in change.get('added', [])
      )
      lines.extend(
          "    Field '%s' has been deleted" % field_name
          for field_name in change.get('deleted', [])
      )
      
      1. .extend isn't more efficient than +=. It's actually a little slower.

        $ python -m timeit 'a = [123]; a.extend([456, 789])'
        1000000 loops, best of 3: 0.405 usec per loop
        $ python -m timeit 'a = [123]; a.extend([456, 789])'
        1000000 loops, best of 3: 0.408 usec per loop
        $ python -m timeit 'a = [123]; a.extend([456, 789])'
        1000000 loops, best of 3: 0.418 usec per loop
        

        vs.

        $ python -m timeit 'a = [123]; a += [456, 789]'
        1000000 loops, best of 3: 0.389 usec per loop
        $ python -m timeit 'a = [123]; a += [456, 789]'
        1000000 loops, best of 3: 0.38 usec per loop
        $ python -m timeit 'a = [123]; a += [456, 789]'
        1000000 loops, best of 3: 0.39 usec per loop
        

        I learned about this a long time ago, and refreshed my memory just now. Basically, under the hood in C (and I just verified this in the Python code), they're the same exact thing (except += sanity-checks the return type). The difference is in how they're invoked in Python.

        In Python's bytecode, += is an operator, which is fast to execute. You put some data on the stack, invoke the operator, and you're done. It looks like this:

        >>> a += [123, 456]
        
          3           6 LOAD_FAST                0 (a)
                      9 LOAD_CONST               1 (123)
                     12 LOAD_CONST               2 (456)
                     15 BUILD_LIST               2
                     18 INPLACE_ADD
        

        .extend() is a function call, which looks like this:

        >>> a.extend([123, 456])
        
          3           6 LOAD_FAST                0 (a)
                      9 LOAD_ATTR                0 (extend)
                     12 LOAD_CONST               1 (123)
                     15 LOAD_CONST               2 (456)
                     18 BUILD_LIST               2
                     21 CALL_FUNCTION            1
                     24 POP_TOP
        

        This, it turns out, is a lot more expensive. Function calls involve setting up a whole lot more state, looking up the function in the dictionary, checking the stack against the stored function signatures, building a list of args and kwargs based on the stack and the function's list of defaults, preparing the final stack, calling the function, then undoing a bunch of that.

        Operators don't do most of that. They're strict in their function signatures and their code paths in CPython are a lot tighter. Their registration is a slot in a struct, instead of a key in a dictionary. They don't have to manipulate the data going in or do the kinds of checking required by functions. All around, they're a lot faster. That's where the performance gains in the timeit loops above come from.

    5. django_evolution/diff.py (Diff revision 1)
       
       
       
       
       
      Show all issues

      Somewhat more efficient as:

      lines.extend(
          "        Property '%s' has changed" % prop
          for prop in field_change
      )
      
    6. 
        
    chipx86
    david
    1. Ship It!
    2. 
        
    chipx86
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to master (8c50ee7)