• 
      

    Add improved type safety for diff parsing.

    Review Request #10496 — Created April 2, 2019 and submitted — Latest diff uploaded

    Information

    Review Board
    release-4.0.x

    Reviewers

    Diff parsing was intended to work with byte strings, and generally did a
    pretty good job of this. Still, we had no proper enforcement, and we had
    some places where byte strings were being compared to Unicode strings,
    or temporarily transformed into Unicode strings. Much, though not all,
    of the type inconsistency occurred in the parse_diff_revision()
    methods on SCMTool subclasses. The rest was somewhat harmless, setting
    empty strings or default revision identifiers in ParsedDiffFile
    attributes, which then just made their way ultimately into other
    function calls or into FileDiff attributes.

    Python 2's ability to interchange byte strings and Unicode strings meant
    that this all pretty well worked, even when wrong. That changes with
    Python 3. Byte strings and Unicode strings are now very different, and
    we need to ensure we're working only with byte strings for all parsing.
    We also need to make sure we have somewhat of a migration path for the
    parsing that's capable of informing when things are wrong.

    This change goes through our diff parsing and ensures that they're
    exclusively working with byte strings, updating string literals and
    using io.BytesIO instead of StringIO (which is Unicode on Python 3).
    All parse_diff_revision() methods now assert the types coming in, and
    the code that calls it now asserts the types on the results.
    DiffParser also now requires a byte string, rather than accepting a
    Unicode string, and asserts on this.

    The other big part of this change is the data going into and coming out
    of ParsedDiffFile. This class is very old, and we had old attributes
    with old-style names that accepted any value with the assumption that
    it'd all get normalized to bytes at some point. Now, we have newer
    attributes that replace the old ones, but enforce type safety. The old
    attribute names continue to work and will cast to the right type, but
    will also emit deprecation warnings, helping us or third-parties to
    catch any issues.

    The diff viewer code all uses the new attributes, but there's still a
    bunch of SCMTool code and tests that use the old attributes, causing
    warnings to appear. Those will be tackled separately.

    Unit tests pass on Python 2.7 and 3.7 (with other in-progress changes).

    Tested posting changes for review.

    Commits

    Files