• 
      

    Fix the diff viewer to handle very large files.

    Review Request #3402 — Created Oct. 7, 2012 and submitted

    Information

    Review Board

    Reviewers

    Fix the diff viewer to handle very large files.
    
    Previously, very large files (10,000+ lines) would greatly slow down
    Review Board and take forever to render. This slowdown was entirely in
    the rendering step, rather than being in the diffing or patching steps.
    
    The reason for the slowdown is that we executed a fragment of a Django
    template (consisting of variable resolution, template tags, and filters)
    per line of the diff.
    
    This, it turns out, is very slow. Variable lookups (and possible
    escaping) is slow. Template tag and filter execution is slow. Copying of
    contexts is slow. To make matters worse, the context we supply for diffs
    contains data that can be quite large (chunk, file of chunks, etc.),
    which ends up getting copied a bunch for different tags. While we may
    want to audit this for other templates, fixing it for the diff lines
    is still slow due to how templates are processed.
    
    So what we're now doing is defining printf-style templates for the lines
    (and conditional parts of lines), processing some data in code, and then
    piecing them together for output.
    
    This creates a massive speedup. a 44,000 line change I tested with took
    around a minute to render before, bringing the server to its knees at
    the same time. Now, the total render time for the page (including all
    the rest of the diff viewer page) is now only a few seconds.
    Tested with my usual scattering of diffs. Verified that line numbers
    were correct, as were the coloring, expansion, and collapsing.
    
    Tested with a nearly 2MB patch containing over 44,000 lines of changes
    in the diff viewer. Saw it speed by (not that the browser was very
    happy about it).
    Description From Last Updated

    Tiny optimization: chunk_lines = chunk['lines'] chunk_index = chunk['index'] Multiple references.

    ME medanat

    The brackets are not required here, are you using them for clarity?

    ME medanat

    Is the intermediate list result needed? Can't you just concatenate to a string immediately instead of appending to the list …

    ME medanat

    Tiny optimization, maybe add here: moved_line = len(line) > 8 and line[8] and on line 291: if moved_line:

    ME medanat

    Checking for i == 0 twice. could move second if-statement body into first if-statement.

    ME medanat

    show_collapse = i == 0 and standalone

    ME medanat
    ME
    1. Possible style issues and very tiny optimizations.
    2. Show all issues
      Tiny optimization:
      
      chunk_lines = chunk['lines']
      chunk_index = chunk['index']
      
      Multiple references.
    3. Show all issues
      Tiny optimization, maybe add here:
      
      moved_line = len(line) > 8 and line[8]
      
      and on line 291:
      
      if moved_line:
      1. I don't see how that's an optimization. We still do the one check per line either way.
      2. Multiple references of line[8].
        
        moved_line = len(line) > 8 and line[8]
        
        moved_line is now the value of line[8], which will also have a True boolean value for the if-statement, or False. The 'len(line) > 8' check is to avoid index error.
    4. Show all issues
      show_collapse = i == 0 and standalone
    5. 
        
    ME
    1. Again, a bunch of possibly unnecessary optimizations that might make the code less readable. Worth mentioning though.
    2. reviewboard/diffviewer/templatetags/difftags.py (Diff revision 1)
       
       
       
       
       
      Show all issues
      The brackets are not required here, are you using them for clarity?
      1. Yes. I don't like Seeing foo = bar == bah.
    3. Show all issues
      Is the intermediate list result needed? Can't you just concatenate to a string immediately instead of appending to the list result then joining?
      
      So result = ''
      
      and on line 341
      
          result += line_fmt % context
      
      return result
      1. It's historically been faster to build an array and then concatenate.
    4. reviewboard/diffviewer/templatetags/difftags.py (Diff revision 1)
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
      Show all issues
      Checking for i == 0 twice.
      
      could move second if-statement body into first if-statement.
    5. 
        
    chipx86
    david
    1. Ship It!
    2. 
        
    chipx86
    Review request changed
    Status:
    Completed