Translating newlines to fix embedded carriage returns

Review Request #3208 — Created July 12, 2012 and discarded

Information

RBTools

Reviewers

A frequent, and especially troublesome to debug, cause for post-review issues
is embedded windows newlines. These are fine if they occur at the end of a
line, but when a \r isn't followed by a \n the patch command run on the
ReviewBoard host will usually choke due to misaligning line numbers in the
following chunks.

Pushed to the following branch...
https://github.com/atagar/rbtools/tree/newline_normalization
Tested by making a perforce change riddled with erroneous windows newlines,
then posting a review for it (ReviewBoard wept bitterly when rendering the
diff, of course).

We've been using this change against RBTools 0.3. I've adapted the change
for the current master but this copy of it hasn't been exercised.
AT
reviewbot
  1. This is a review from Review Bot.
      Tool: PEP8 Style Checker
      Processed Files:
        rbtools/clients/perforce.py
      Ignored Files:
        AUTHORS
    
    
  2. 
      
chipx86
  1. Hi Damian,
    
    I'm pretty hesitant to change any logic related to newlines, due to the subtle ways in which they can break.
    
    We've seen all the following cases before:
    
    * \r\n on every line
    * \n on every line
    * \r on every line
    * \r\r\n on every line
    * \r\n only on some lines
    * \n only on some lines
    * \r only on some lines
    * \r\r\n only on some lines
    
    We'd need to make sure we don't regress any of those. And yes, some simply fail today, as you've noticed, because these situations can't always be fixed.
    
    There are files out there that have embedded newlines as part of the data set and not necessarily as line terminators. It complicates things, and I'm not sure what the best course of action is for it.
    
    What I'd need to see are unit tests accompanying this that check each type of file and make sure that the results are correct.
    
    I'm curious if you know how your files get into these situations. Usually, this indicates issues with repository configuration, Perforce doing newline normalization and then syncing those files from one platform to another, or editor issues.
    1. > * \r on every line
      > * \r only on some lines
      
      Yup. Lone carriage returns are evil. With git for instance a file with \r line endings is reported by 'git diff' as being a single line (it doesn't respect them as newlines at all). The ReviewBoard newline conversion then changes them for the base file again resulting in a stacktrace. We actually have a patch for our GitClient to reject files that contain any lone carriage returns because it's doomed to fail. In the perforce case however it seems to be recoverable.
      
      > It complicates things, and I'm not sure what the best course of action is for it.
      
      The main thing is that we need to respect the server side's expectations, lest we fail with a stacktrace like we do now. That is where this change's NEWLINE_CONVERSION_RE comes from...
      
      https://github.com/reviewboard/reviewboard/blob/master/reviewboard/diffviewer/diffutils.py#L43
      
      > What I'd need to see are unit tests accompanying this that check each type of file and make sure that the results are correct.
      
      Can do, though I'm not sure if it's really appropriate. The problem here is different newline normalization differs beteen rbtools and reviewboard (if reviewboard changes the above regex we'll again be broken). Unit tests wouldn't really cover this - an integration test would be more appropriate but I don't believe we have those.
      
      > I'm curious if you know how your files get into these situations.
      
      I gave the repro above...
      
      % p4 edit README
      % echo "more\rstuff\r\n" >> README
      % post-review
      
      As for how this happens in practice I'm not really sure. The lone carriage returns usually reside in legacy code so the person filing the code review isn't aware of them. We just know they exist and are a pita to track down. ;)
      
  2. 
      
AT
Review request changed
Status:
Discarded