Optimize diff/post performance by caching 'svn info' results

Review Request #7199 — Created April 13, 2015 and submitted

Information

RBTools
master
2fcdf37...

Reviewers

rbt diff and post operations make numerous calls to 'svn info' with
paths to various files. For any given modified file, 'svn info'
is called 4 + N times. This breaks down as:

  • 3 calls from convert_to_absolute_paths() with the full filename
    to process the new file line, original file line, and Index line
  • 1 call from find_copyfrom() for the full filename
  • N calls from find_copyfrom() for each sub-path component in the
    filename

By introducing caching on results from 'svn info' the number of
calls per file can be decreased to 1 + M, where M is always <= N.
For files sharing a common directory hierarchy, M will be equal
to N for the first file processed and 0 for all subsequent files.

Consider a contrived example with two modified files at:
A/B/C/foo.txt
A/B/C/bar.txt

The original code make 14 total calls to 'svn info'. Measuring
timing on a CentOS 7 box with time -p reveals an average run
time of 0.742s. The updated code makes 5 total calls to
'svn info' and runtime is measured at 0.586s. This is a decrease
in runtime of 21%.

* Performed various 'rbt diff' executions with debug enabled to 
  verify expected decrease in calls to 'svn info'.
* Ran unit tests.
Description From Last Updated

How about we do something like: if path not in self._svn_info_cache: result = ... if result is None: self._svn_info_cache[path] = …

brenniebrennie

Remove this line.

brenniebrennie

Col: 26 E222 multiple spaces after operator

reviewbotreviewbot
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/clients/svn.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/svn.py
    
    
  2. 
      
brennie
  1. 
      
  2. rbtools/clients/svn.py (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     

    How about we do something like:

    if path not in self._svn_info_cache:
        result = ...
    
        if result is None:
            self._svn_info_cache[path] = None
        else:
            self._svn_info_cache[path] = {}
    
            for info in result:
                # ...
    
    return self._svn_info_cache[path]
    

    This leaves us with a single return path.
    This will have

    1. Ignore that last sentence fragment.

  3. 
      
gmyers
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/clients/svn.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/svn.py
    
    
  2. rbtools/clients/svn.py (Diff revision 2)
     
     
    Col: 26
     E222 multiple spaces after operator
    
  3. 
      
brennie
  1. 
      
  2. rbtools/clients/svn.py (Diff revision 2)
     
     

    Remove this line.

  3. 
      
gmyers
reviewbot
  1. Tool: Pyflakes
    Processed Files:
        rbtools/clients/svn.py
    
    
    
    Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/svn.py
    
    
  2. 
      
gmyers
reviewbot
  1. Tool: PEP8 Style Checker
    Processed Files:
        rbtools/clients/svn.py
    
    
    
    Tool: Pyflakes
    Processed Files:
        rbtools/clients/svn.py
    
    
  2. 
      
gmyers
Review request changed

Status: Closed (submitted)

Change Summary:

Pushed to master (2747e8f)
Loading...