Summary

Fix URL encoding issues in pysvn when viewing diffs in the UI

Review Request #10671 — Created Aug. 30, 2019 and discarded May 14, 2020, 10:08 p.m.

Information

Owner

rpetti

Repository

Review Board

Branch

Bugs

Depends On

Reviewers

Groups

reviewboard

People

Description

Resolve URL encoding issue for pysvn that results in diffs not being displayable in the UI if the file's path contains a '#' (and possibly other special URL characters, such as '?')

The original code appeared to append the file path to the repo URL, then parse the result as a URL in order to URL-encode the file path. This is nonsensical, so most of the code has been removed and replaced with simply URL-encoding the path before appending it to the SVN repository URL.

Please see [here|https://groups.google.com/d/msg/reviewboard/GufvNZe9MUE/A4tAwtPMAwAJ] for more context on the issue.

Testing Done

Currently running in our production server. No known side-effects so far.

Issues

Description	From	Last Updated
Looks like this doesn't pass our unit tests: ERROR: Testing SVN (pysvn) get_file ---------------------------------------------------------------------- Traceback (most recent call last): File …	chipx86	Oct. 26, 2019, 11:14 a.m.
E501 line too long (81 > 79 characters)	reviewbot	Aug. 30, 2019, 9:36 a.m.
E501 line too long (82 > 79 characters)	reviewbot	Oct. 26, 2019, 11:14 a.m.
E501 line too long (83 > 79 characters)	reviewbot	Oct. 26, 2019, 11:14 a.m.
W291 trailing whitespace	reviewbot	Oct. 26, 2019, 11:15 a.m.
W291 trailing whitespace	reviewbot	Oct. 26, 2019, 11:15 a.m.
W291 trailing whitespace	reviewbot	Oct. 26, 2019, 11:15 a.m.
I think rewriting URL parsing and rebuilding logic isn't the way to go. We're now essentially doing the work that …	chipx86	Oct. 26, 2019, 2:27 p.m.

flake8 failed.

JSHint passed.

flake8

reviewboard/scmtools/svn/pysvn.py (Diff revision 1)
The issue has been resolved. Show all issues
```
E501 line too long (81 > 79 characters)
```

Diff:

Revision 2 (+2 -13)

Show changes

reviewboard/scmtools/svn/pysvn.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

I'm sorry we never got to this, Rob. It definitely got lost in the shuffle of this past few months, especially with all the student project and 4.0 beta 1 work. I really do appreciate the change, and understand your concerns. I promise that despite the lack of visibility lately, behind-the-scenes the project and our company backing it are in great shape.

I dug into the reasoning for the old code, and I agree, it's not the right approach. I see why it was implemented that way -- they were trying to avoid quoting the file:// and http:// bit -- but with years of small code changes it became an increasingly bad and broken approach. There's some subtle things it's doing that the new code is not doing, and I'll need to do some testing around this to make sure nothing appears to break, but we'll this ready to land in 3.0.16.

rpetti Oct. 22, 2019, 1:40 p.m.
```
Sounds good, thank you!
```

```
Ship It!
```

The issue has been resolved. Show all issues

Looks like this doesn't pass our unit tests:

ERROR: Testing SVN (pysvn) get_file
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/chipx86/src/rb/trees/reviewboard/3.0/reviewboard/scmtools/tests/test_svn.py", line 101, in test_get_file
    self.tool.get_file(self.repository.path + '/' + file, rev),
  File "/Users/chipx86/src/rb/trees/reviewboard/3.0/reviewboard/scmtools/svn/__init__.py", line 160, in get_file
    return self.client.get_file(path, revision)
  File "/Users/chipx86/src/rb/trees/reviewboard/3.0/reviewboard/scmtools/svn/pysvn.py", line 93, in get_file
    return self._do_on_path(self._get_file_data, path, revision)
  File "/Users/chipx86/src/rb/trees/reviewboard/3.0/reviewboard/scmtools/svn/pysvn.py", line 66, in _do_on_path
    raise FileNotFoundError(path, revision, detail=exc)
FileNotFoundError: The file "file:///Users/chipx86/src/rb/trees/reviewboard/3.0/reviewboard/scmtools/testdata/svn_repo/trunk/doc/misc-docs/Makefile" (revision 2) could not be found in the repository: '/file:/Users/chipx86/src/rb/trees/reviewboard/3.0/reviewboard/scmtools/testdata/svn_repo/trunk/doc/misc-docs/Makefile' path not found

rpetti Oct. 26, 2019, 9:23 a.m.

The file:// url being passed into pysvn looks correct to me, so why it pysvn not interpreting it correctly?

Change Summary:

Since the root cause of the issue was peforming a urlsplit on an unescaped path, we can instead use a different method for splitting out the path portion for escaping. This follows the original intent of the code more closely, while still avoiding the issues with special url characters in the path.

Diff:

Revision 3 (+3 -5)

Show changes

reviewboard/scmtools/svn/pysvn.py

Checks run (1 failed, 1 succeeded)

flake8 failed.

JSHint passed.

flake8

reviewboard/scmtools/svn/pysvn.py (Diff revision 3)
The issue has been resolved. Show all issues
```
E501 line too long (82 > 79 characters)
```
reviewboard/scmtools/svn/pysvn.py (Diff revision 3)
The issue has been resolved. Show all issues
```
E501 line too long (83 > 79 characters)
```

Diff:

Revision 4 (+8 -5)

Show changes

reviewboard/scmtools/svn/pysvn.py

Checks run (1 failed, 1 succeeded)

flake8 failed.

JSHint passed.

flake8

reviewboard/scmtools/svn/pysvn.py (Diff revision 4)
The issue has been resolved. Show all issues
```
W291 trailing whitespace
```
reviewboard/scmtools/svn/pysvn.py (Diff revision 4)
The issue has been resolved. Show all issues
```
W291 trailing whitespace
```
reviewboard/scmtools/svn/pysvn.py (Diff revision 4)
The issue has been resolved. Show all issues
```
W291 trailing whitespace
```

Diff:

Revision 5 (+8 -5)

Show changes

reviewboard/scmtools/svn/pysvn.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Fix it!

reviewboard/scmtools/svn/pysvn.py (Diff revision 5)

An issue was opened. Show all issues

I think rewriting URL parsing and rebuilding logic isn't the way to go. We're now essentially doing the work that urlsplit and urlunsplit were meant to do, except it won't handle a variety of cases (Windows file paths or network locations like file://\\Host\Path paths, URLs with ports, or URLs with IPv6 addresses). Allowed URL formats are just too complex for anything but a really complex regex (which I don't want us to have to maintain).

So keeping what we had was probably the closest to "correct" at this stage, given requirements of this function. But it's not ideal.

So let's re-examine what we're actually wanting to do and what we are currently doing here.

We're taking in a path that comes from a diff, which may be a path relative to a repository, or may be an absolute path (file-based or URL-based), depending on how the diff is constructed.
We're then currently normalizing that to ensure we have an absolute, URL-based path that we can ultimately feed to libsvn.
After we've normalized it, we're ensuring the path is quoted. Here's where things go wrong: we're losing anything after a ? or #, because urlsplit attempts to parse it out.
We rebuild the absolute URL and begin work with it.

We're doing too much.

At stage 2, we are normalizing a URL. We're making a relative URL absolute. That means we already have the file path. This is where we should be handling any quoting.

Your original approach tried to do that, but it did it by quoting the path going into normalize_path(). That prevented some code paths from matching correctly. The first check is whether the file path starts with the repo path, and since we were passing in a quoted scheme (file%3A//), we weren't going to get the result we wanted.

The proper approach is to ditch all this URL building code here and instead do the right thing at the right time in normalize_path().

That method looks like this:

if path.startswith(self.repopath):
    norm_path = path
elif path.startswith('//'):
    norm_path = self.repopath + path[1:]
elif path[0] == '/':
    norm_path = self.repopath + path
else:
    norm_path = '%s/%s' % (self.repopath, path)

return norm_path.rstrip('/')

In the first case, we can assume we have a fully normalized path, so we're good to go. We don't need to do anything here.

The remaining cases are building a path. Now, a root SVN URL may, like any other file, containg special characters, but we assume that a URL coming in is going to be a proper URL, and that means characters like # and ? are going to be escaped. It's also unlikely that we'll find them here, but still. I think we can focus on path, since that's the bit that's almost guaranteed not to be escaped.

So this would be the proper time to do a quote() operation, when we're adding path onto repopath.

I'd suggest doing that here. We won't have to muck around with URL parsing or rebuilding, we're just taking what's from the diff file and making sure we can build a suitable URL out of it.

The benefit to this is that we can share this fix with other functions that need to normalize paths but currently aren't doing so. They're also very likely to be broken as well.

It'll be important to have unit tests for normalize_path for the various inputs. Our process is to introduce unit tests when we find a regression or bug and have new code fixing it, so we can be sure it works (now and in the future, and across Python versions). One unit test per condition, demonstrating paths containing normal characters, ?, and #.

I understand if this is getting to the point where it's more of a time investment than you expected to put into it and you'd rather we take care of the rest.

Change Summary:

Here's a reimplementation, following your recommendations. I'm not able to create a test case since I'd need to add a file to the test SVN repo, which I don't believe I can do in the form of a diff review.

Diff:

Revision 6 (+10 -19)

Show changes

	reviewboard/scmtools/svn/base.py
	reviewboard/scmtools/svn/pysvn.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

gentle bump

chipx86 March 18, 2020, 3:08 p.m.

Oh man, I'm sorry Rob. Going to test it out. We'll have this in for 3.0.18.

chipx86 May 14, 2020, 2:50 a.m.

This is going into 3.0.18. I have a change that will be going up tomorrow that just supplements this with unit tests and updates the imports to work with Python 2 and 3 (for RB4). 3.0.18 is currently scheduled for next week.

chipx86 May 14, 2020, 10:07 p.m.

The supplemented change is up here: https://reviews.reviewboard.org/r/11020/

I found out some of the history behind the old split/rebuild logic, and how Subversion itself has changed in this regard, which is reflected in the change description.

Status:: Discarded
Change Summary:: Replaced by https://reviews.reviewboard.org/r/11020/