Summary

Refactor encode_multipart_formdata and tests to be easier to read and maintain

Review Request #9653 — Created Feb. 16, 2018 and discarded Nov. 7, 2019, 10:42 a.m.

Information

Owner

solarmist

Repository

RBTools

Branch

master

Bugs

Depends On

~~9652~~

Reviewers

Groups

rbtools

People

Description

Refactor encode_multipart_formdata and tests to be easier to read and maintain.

Testing Done

Will update with the remaining fixes.

Issues

Description	From	Last Updated
This should be part of the same import group.	chipx86	March 3, 2018, 12:30 a.m.
Blank line between statements/blocks and other blocks.	chipx86	March 3, 2018, 12:30 a.m.
Docstrings for functions must meet a standard format described at https://www.notion.so/reviewboard/Writing-Codebase-Documentation-e16312b5f061437cb73cbfa369ac3cb5	chipx86	Feb. 17, 2018, 12:36 a.m.
BytesIO() was the correct way to go. Concatenating strings in Python is slow and bad for garbage collection. You almost …	chipx86	March 3, 2018, 12:09 a.m.
Blank line between statements and comments.	chipx86	Feb. 17, 2018, 12:36 a.m.
Blank line between these. There's more in the file. Can you go through and fix those up? If there's others …	chipx86	Feb. 17, 2018, 12:36 a.m.
We always want to use explicit encodings in calls to encode() and decode().	chipx86	Feb. 17, 2018, 12:36 a.m.
We use % instead of .format().	chipx86	Feb. 17, 2018, 12:36 a.m.
This is not Python 2.7-compatible. All code must work on Python 2.7 as well, and we want to see testing …	chipx86	March 3, 2018, 12:42 a.m.
This is not efficient. This should be reverted.	chipx86	March 3, 2018, 12:40 a.m.

flake8 passed.

JSHint passed.

Description:

Refactor encode_multipart_formdata and tests to be easier to read and maintain.

~ This is the biggest functional code change.
~ This is the biggest functional code change.
+ https://reviews.reviewboard.org/r/9648/
+ https://reviews.reviewboard.org/r/9649/
+ https://reviews.reviewboard.org/r/9650/
+ https://reviews.reviewboard.org/r/9651/
+ https://reviews.reviewboard.org/r/9652/
+ https://reviews.reviewboard.org/r/9653/
+ https://reviews.reviewboard.org/r/9654/
+ https://reviews.reviewboard.org/r/9655/

Depends On:

~~9648 - Fix exception handling, and update imports~~
~~9649 - Make six use consistant.~~
~~9650 - Fix log messages~~
~~9651 - Make quote use consistent across repo~~
~~9652 - Fix various things for python 3 compatibility~~
~~9653 - Refactor encode_multipart_formdata and tests to be easier to read and maintain~~
~~9654 - Fix unit tests to not pollute the test directories of other tests.~~
~~9655 - Fix various PEP 8, style and PyFlakes issues~~

Fix it!

Going through this, I noticed that you're introducing code that's specifically for Python 3. We won't be switching exclusively to Python 3.

For every single change, we're going to want to see testing for Python 2.7, 3.4, 3.5, and 3.6, both manual and unit tests. I know the unit tests may not run on Python 3.x versions as of certain points in the change, but every single change must be able to land on its own, without the changes that follow it, and without breaking anything in RBTools for Python 2.7.

solarmist March 3, 2018, 12:30 a.m.

That was an issue in my env. Python2 was somehow incorrectly linked to python3.6.

rbtools/api/request.py (Diff revision 1)

The issue has been resolved.

This should be part of the same import group.

rbtools/api/request.py (Diff revision 1)

The issue has been resolved.

Blank line between statements/blocks and other blocks.

rbtools/api/request.py (Diff revision 1)

An issue was opened.

Docstrings for functions must meet a standard format described at https://www.notion.so/reviewboard/Writing-Codebase-Documentation-e16312b5f061437cb73cbfa369ac3cb5

solarmist March 3, 2018, 12:27 a.m.

Sure.

rbtools/api/request.py (Diff revision 1)

The issue has been dropped.

BytesIO() was the correct way to go. Concatenating strings in Python is slow and bad for garbage collection. You almost never want to concatenate strings if you're going to do more than a few. Instead, you always want to use something like BytesIO(), StringIO(), or add them to a list and join them back into a string.

solarmist March 3, 2018, 12:27 a.m.

I'm dropping this as a moot point. For a CLI tool that makes a network call, any string processing will be minuscule compared to a single network call.

chipx86 March 3, 2018, 12:41 a.m.

The problem arises when working with large amounts of data. File attachment uploads, for instance, will take a lot more memory with string concatenation, resulting in multiple copies of the file attachment data in memory at one time, just due to the way that Python works.

Say we upload two files that are 5MB and 10MB in size, respectively. During the loop in which we set the first file, we'll end up with a resulting string containing all the previous form-data content and the 5MB file. The next string concatenation will allocate another string containing the size of the original string + the new content, meaning another ~5MB string in memory, alongside the original. The reference then gets set to the new one, and the old one is flagged for garbage collection. We now have ~10MB in memory. The next bit of data that gets added will allocate another ~5MB string, giving us ~15MB in memory. And then ~20MB. When we start building the payload for the new file, that begins to grow further.

At some point during this, garbage collection is likely to kick in and start freeing up older strings, but given the sizes, this isn't exactly instantaneous, and it still means the process has allocated much more memory than needed.

Going the BytesIO route, we have an efficient way of storing this data without all the extra allocations. It's faster and more memory-efficient.

Because of that, the original mechanism for building these payloads must remain.

Also, while not as big a deal, it's probably not worth defining a new function inline. This has to be generated every time we call encode_multipart_formdata, and I don't think it's needed.

I'd really like to use the original code for this function, converted for BytesIO, instead of the new mechanism. The old method is well-tested.
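For illustration only, here is a minimal sketch of the buffer-based approach described above. It is not the RBTools implementation: `build_payload`, its `parts` argument, and the boundary value are hypothetical names chosen for this example.

```python
from io import BytesIO


def build_payload(parts, boundary=b'example-boundary'):
    """Join form-data parts into one body using a single BytesIO buffer.

    ``parts`` is a list of (name, content) byte-string pairs. The function
    name, its arguments, and the boundary are hypothetical.
    """
    buf = BytesIO()

    for name, content in parts:
        buf.write(b'--' + boundary + b'\r\n')
        buf.write(b'Content-Disposition: form-data; name="' + name +
                  b'"\r\n')
        buf.write(b'\r\n')

        # The (possibly large) content is written straight into the
        # buffer rather than being folded into a new, larger string on
        # every iteration.
        buf.write(content)
        buf.write(b'\r\n')

    buf.write(b'--' + boundary + b'--\r\n')

    return buf.getvalue()
```

Only the short header fragments are concatenated here. The accumulated body is never rebuilt with `+`, which is the pattern the comment above warns against.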

rbtools/api/request.py (Diff revision 1)

An issue was opened.

Blank line between statements and comments.

solarmist March 3, 2018, 12:27 a.m.

Will update.

rbtools/api/request.py (Diff revision 1)

An issue was opened.

Blank line between these.

There's more in the file. Can you go through and fix those up? If there's others like this in other files, those will also need to be fixed.

solarmist March 3, 2018, 12:27 a.m.

Will update.

rbtools/api/request.py (Diff revision 1)

An issue was opened.

We always want to use explicit encodings in calls to encode() and decode().

solarmist March 3, 2018, 12:27 a.m.

Sure.
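As a small illustration of the point about explicit encodings (a sketch with made-up data, not code from the diff): naming the codec makes the result independent of interpreter defaults, which differ between Python 2 (ASCII) and Python 3 (UTF-8).

```python
from __future__ import unicode_literals

# Example text containing a non-ASCII character (illustrative only).
text = 'caf\u00e9'

# Explicit codec: the same bytes are produced on Python 2.7 and 3.x.
data = text.encode('utf-8')

# Decode with the same explicit codec to get the original text back.
round_tripped = data.decode('utf-8')
```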

rbtools/api/request.py (Diff revision 1)

An issue was opened.

We use % instead of .format().

solarmist March 3, 2018, 12:27 a.m.

That's fine, but what's the reasoning behind that?

As noted in the docs (https://docs.python.org/3/library/stdtypes.html#string-formatting):

Note: The formatting operations described here exhibit a variety of quirks that lead to a number of common errors (such as failing to display tuples and dictionaries correctly). Using the newer formatted string literals or the str.format() interface helps avoid these errors. These alternatives also provide more powerful, flexible and extensible approaches to formatting text.

And from Python 3.6 on, .format() and f-strings are a much better choice for most applications.

chipx86 March 3, 2018, 12:41 a.m.

It's just what we do. Partially for historical reasons, partially because .format() is just more wordy. I know this is debatable, but it's how we format strings.

We can't rely on Python 3.6. It won't work with Python 2.7. We're likely to support Python 2.7 in RBTools long past the end-of-life date, as we're still trying to get companies off of Python 2.4 (enterprises tend to move very slowly, and we have support contracts with them mandating support).

solarmist March 3, 2018, 12:50 a.m.

Fair enough. I'll see what I can do, but I was specifically having an issue with printf-style formatting for the basic auth header and maintaining compatibility between Python 2 and 3.

chipx86 March 3, 2018, 12:52 a.m.

I can help with that. There's some gotchas, definitely. If you can show me what data's going into it and what you're getting out, I can help figure out what's going on. Also, what version of Python 3 are you testing with? (Ultimately we're probably going to start with 3.4 through 3.6 compatibility.)
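One possible way to keep %-style formatting for a Basic auth header on both Python 2.7 and 3.x is sketched below. This is an assumption about the kind of fix that could work, not the code under review: `basic_auth_header` is a hypothetical helper, and the idea is simply to decode the base64 result explicitly before formatting it.

```python
from __future__ import unicode_literals

import base64


def basic_auth_header(username, password):
    """Return a Basic auth header value as text (hypothetical helper)."""
    credentials = ('%s:%s' % (username, password)).encode('utf-8')

    # b64encode() returns a byte string. Decode it explicitly before
    # formatting: on Python 3, '%s' % b'abc' produces "b'abc'", which is
    # where a stray 'b' label can come from.
    token = base64.b64encode(credentials).decode('ascii')

    return 'Basic %s' % token
```

Keeping everything as text until the final value is built, and converting bytes with a named codec, avoids relying on how either Python version renders a byte string inside a format string.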

rbtools/api/request.py (Diff revision 1)

The issue has been resolved.

This is not Python 2.7-compatible.

All code must work on Python 2.7 as well, and we want to see testing that assures this.

solarmist March 3, 2018, 12:27 a.m.

A bug in my venv setup linked python2 to the wrong interpreter when I installed the Python 3 venv first.

It's fixed now.

rbtools/api/request.py (Diff revision 1)

An issue was opened.

This is not efficient. This should be reverted.

solarmist March 3, 2018, 12:27 a.m.

Also dropping this as a moot point. I could do this a million times and still have time to spare compared to a single network call.

Also, this was reused directly from more recent versions of the standard library, where it prevents Python 2/3 edge cases that cause a literal 'b' to be formatted into the string.

chipx86 March 3, 2018, 12:41 a.m.

That's true, but it's how we work with strings. We want to keep consistency, because it helps avoid errors with new changes, and we also don't want changes to regress logic without needing to. I'm going to ask to revert the string building to use the format string here.

solarmist March 3, 2018, 12:50 a.m.

As I mentioned in the previous comment, I'll see what I can do, but I adopted this code after trying several variations of printf formatting and getting string type labels: 'u' in Python 2 or 'b' in Python 3.

Description:

		Refactor encode_multipart_formdata and tests to be easier to read and maintain.
-
-		This is the biggest functional code change.
-		https://reviews.reviewboard.org/r/9648/
-		https://reviews.reviewboard.org/r/9649/
-		https://reviews.reviewboard.org/r/9650/
-		https://reviews.reviewboard.org/r/9651/
-		https://reviews.reviewboard.org/r/9652/
-		https://reviews.reviewboard.org/r/9653/
-		https://reviews.reviewboard.org/r/9654/
-		https://reviews.reviewboard.org/r/9655/

Testing Done:

~		Not sure what to put here since these intermediate commits don't pass tests or flake8.
	~	Will update with the remaining fixes.

Depends On:

~~9648 - Fix exception handling, and update imports~~

~~9649 - Make six use consistant.~~

~~9650 - Fix log messages~~

~~9651 - Make quote use consistent across repo~~

~~9653 - Refactor encode_multipart_formdata and tests to be easier to read and maintain~~

~~9654 - Fix unit tests to not pollute the test directories of other tests.~~

~~9655 - Fix various PEP 8, style and PyFlakes issues~~

Commit:

466454634336d7498af27955e6bd40eb584c90a3

1b420efd0cc224ddddf2ebf9884c18e5ce84b8b4

Diff:

Revision 2 (+89 -94)

	rbtools/api/request.py
	rbtools/api/tests.py

Checks run (2 succeeded)

flake8 passed.

JSHint passed.

Status: Discarded