Clean up and document the code for diffing signatures.

Review Request #9569 — Created Feb. 1, 2018 and submitted

Information

Django Evolution
master
cc1ff40...

Reviewers

The Diff class is responsible for taking two project signatures and
diffing them, showing the changes that the target signature has that the
original one does not, and generating a hinted evolution to turn the
original into the target. It's old code that was getting harder to
reason about and maintain.

This change fixes up that code a bit, adding and clarifying
documentation and separating out information gathering from state storage
for the diff (and making it far more efficient in the process). It's now
easier to read through and see the various bits of logic and to modify
it. This will be needed for the upcoming signature work.

Unit tests pass for all versions of Django.

Description From Last Updated

Should be double backticks at the end.

daviddavid

lines.extend( "The model %s.%s has been deleted" % (app_label, model_name) for model_name in app_changes.get('deleted') )

daviddavid

Somewhat more efficient as: lines.extend( " Field '%s' has been added" % field_name for field_name in change.get('added', []) ) lines.extend( …

daviddavid

Somewhat more efficient as: lines.extend( " Property '%s' has changed" % prop for prop in field_change )

daviddavid
david
  1. 
      
  2. django_evolution/diff.py (Diff revision 1)
     
     

    Should be double backticks at the end.

  3. django_evolution/diff.py (Diff revision 1)
     
     
     
     
     

    lines.extend(
    "The model %s.%s has been deleted" % (app_label, model_name)
    for model_name in app_changes.get('deleted')
    )

  4. django_evolution/diff.py (Diff revision 1)
     
     
     
     
     
     
     
     

    Somewhat more efficient as:

    lines.extend(
        "    Field '%s' has been added" % field_name
        for field_name in change.get('added', [])
    )
    lines.extend(
        "    Field '%s' has been deleted" % field_name
        for field_name in change.get('deleted', [])
    )
    
    1. .extend isn't more efficient than +=. It's actually a little slower.

      $ python -m timeit 'a = [123]; a.extend([456, 789])'
      1000000 loops, best of 3: 0.405 usec per loop
      $ python -m timeit 'a = [123]; a.extend([456, 789])'
      1000000 loops, best of 3: 0.408 usec per loop
      $ python -m timeit 'a = [123]; a.extend([456, 789])'
      1000000 loops, best of 3: 0.418 usec per loop
      

      vs.

      $ python -m timeit 'a = [123]; a += [456, 789]'
      1000000 loops, best of 3: 0.389 usec per loop
      $ python -m timeit 'a = [123]; a += [456, 789]'
      1000000 loops, best of 3: 0.38 usec per loop
      $ python -m timeit 'a = [123]; a += [456, 789]'
      1000000 loops, best of 3: 0.39 usec per loop
      

      I learned about this a long time ago, and refreshed my memory just now. Basically, under the hood in C (and I just verified this in the Python code), they're the same exact thing (except += sanity-checks the return type). The difference is in how they're invoked in Python.

      In Python's bytecode, += is an operator, which is fast to execute. You put some data on the stack, invoke the operator, and you're done. It looks like this:

      >>> a += [123, 456]
      
        3           6 LOAD_FAST                0 (a)
                    9 LOAD_CONST               1 (123)
                   12 LOAD_CONST               2 (456)
                   15 BUILD_LIST               2
                   18 INPLACE_ADD
      

      .extend() is a function call, which looks like this:

      >>> a.extend([123, 456])
      
        3           6 LOAD_FAST                0 (a)
                    9 LOAD_ATTR                0 (extend)
                   12 LOAD_CONST               1 (123)
                   15 LOAD_CONST               2 (456)
                   18 BUILD_LIST               2
                   21 CALL_FUNCTION            1
                   24 POP_TOP
      

      This, it turns out, is a lot more expensive. Function calls involve setting up a whole lot more state, looking up the function in the dictionary, checking the stack against the stored function signatures, building a list of args and kwargs based on the stack and the function's list of defaults, preparing the final stack, calling the function, then undoing a bunch of that.

      Operators don't do most of that. They're strict in their function signatures and their code paths in CPython are a lot tighter. Their registration is a slot in a struct, instead of a key in a dictionary. They don't have to manipulate the data going in or do the kinds of checking required by functions. All around, they're a lot faster. That's where the performance gains in the timeit loops above come from.

  5. django_evolution/diff.py (Diff revision 1)
     
     
     
     
     

    Somewhat more efficient as:

    lines.extend(
        "        Property '%s' has changed" % prop
        for prop in field_change
    )
    
  6. 
      
chipx86
david
  1. Ship It!
  2. 
      
chipx86
Review request changed

Status: Closed (submitted)

Change Summary:

Pushed to master (8c50ee7)
Loading...