Fix the diff viewer to handle very large files.

Review Request #3402 — Created Oct. 7, 2012 and submitted

Information

Review Board

Reviewers

Fix the diff viewer to handle very large files.

Previously, very large files (10,000+ lines) would greatly slow down
Review Board and take forever to render. This slowdown was entirely in
the rendering step, rather than being in the diffing or patching steps.

The reason for the slowdown is that we executed a fragment of a Django
template (consisting of variable resolution, template tags, and filters)
per line of the diff.

This, it turns out, is very slow. Variable lookups (and possible
escaping) is slow. Template tag and filter execution is slow. Copying of
contexts is slow. To make matters worse, the context we supply for diffs
contains data that can be quite large (chunk, file of chunks, etc.),
which ends up getting copied a bunch for different tags. While we may
want to audit this for other templates, fixing it for the diff lines
is still slow due to how templates are processed.

So what we're now doing is defining printf-style templates for the lines
(and conditional parts of lines), processing some data in code, and then
piecing them together for output.

This creates a massive speedup. a 44,000 line change I tested with took
around a minute to render before, bringing the server to its knees at
the same time. Now, the total render time for the page (including all
the rest of the diff viewer page) is now only a few seconds.
Tested with my usual scattering of diffs. Verified that line numbers
were correct, as were the coloring, expansion, and collapsing.

Tested with a nearly 2MB patch containing over 44,000 lines of changes
in the diff viewer. Saw it speed by (not that the browser was very
happy about it).
Description From Last Updated

Tiny optimization: chunk_lines = chunk['lines'] chunk_index = chunk['index'] Multiple references.

ME medanat

The brackets are not required here, are you using them for clarity?

ME medanat

Is the intermediate list result needed? Can't you just concatenate to a string immediately instead of appending to the list …

ME medanat

Tiny optimization, maybe add here: moved_line = len(line) > 8 and line[8] and on line 291: if moved_line:

ME medanat

Checking for i == 0 twice. could move second if-statement body into first if-statement.

ME medanat

show_collapse = i == 0 and standalone

ME medanat
ME
  1. Possible style issues and very tiny optimizations.
  2. Show all issues
    Tiny optimization:
    
    chunk_lines = chunk['lines']
    chunk_index = chunk['index']
    
    Multiple references.
  3. Show all issues
    Tiny optimization, maybe add here:
    
    moved_line = len(line) > 8 and line[8]
    
    and on line 291:
    
    if moved_line:
    1. I don't see how that's an optimization. We still do the one check per line either way.
    2. Multiple references of line[8].
      
      moved_line = len(line) > 8 and line[8]
      
      moved_line is now the value of line[8], which will also have a True boolean value for the if-statement, or False. The 'len(line) > 8' check is to avoid index error.
  4. Show all issues
    show_collapse = i == 0 and standalone
  5. 
      
ME
  1. Again, a bunch of possibly unnecessary optimizations that might make the code less readable. Worth mentioning though.
  2. reviewboard/diffviewer/templatetags/difftags.py (Diff revision 1)
     
     
     
     
     
    Show all issues
    The brackets are not required here, are you using them for clarity?
    1. Yes. I don't like Seeing foo = bar == bah.
  3. Show all issues
    Is the intermediate list result needed? Can't you just concatenate to a string immediately instead of appending to the list result then joining?
    
    So result = ''
    
    and on line 341
    
        result += line_fmt % context
    
    return result
    1. It's historically been faster to build an array and then concatenate.
  4. reviewboard/diffviewer/templatetags/difftags.py (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
    Show all issues
    Checking for i == 0 twice.
    
    could move second if-statement body into first if-statement.
  5. 
      
chipx86
david
  1. Ship It!
  2. 
      
chipx86
Review request changed
Status:
Completed