• 
      

    Give FileDiffs control over handling of encodings for source files.

    Review Request #10775 — Created Nov. 1, 2019 and submitted

    Information

    Review Board
    release-4.0.x
    78b4aa4...

    Reviewers

    Historically, Review Board has depended on source files to either us a
    UTF-8 or ISO-8859-15 encoding, or for the repository to supply a list of
    compatible encodings. If a special encoding were ever used for a file,
    and the repository later had its encodings list changed to exclude it,
    file would no longer be able to be viewed in the diff viewer.

    This encoding list was computed up-front when generating a diff for
    file. We'd fetch the repository encoding list, pass it to calls like
    get_original_file(), which would then pass that on to
    convert_to_unicode(), which would try the built-in encodings and the
    provided list of encodings. The caller was always responsible for
    fetching or computing that encoding list up-front before calling any of
    these methods.

    This change rewrites all of this to make the process more dependent on
    the FileDiff we're working on, and to record a known working encoding
    for future use.

    FileDiff.extra_data can now contain an 'encoding' key (conveniently
    exposed as FileDiff.encoding), which specifies an explicit encoding to
    use. This could in theory be set during diff parsing (which requires
    some additional work still), but it will be set after we've successfully
    converted a source file to Unicode and back.

    If not set, we still respect the provided encoding list (though this is
    now deprecated in get_original_file()), and then look up from the
    repository. This will allow us to later work with FileDiffs backed by
    multiple repositories without placing unnecessary burdens on the caller.

    The machinery for diff chunk generation has been updated as well. Now,
    RawDiffChunkGenerator.generate_chunks() can be given an encoding list
    for the old and new content. This defaults to the encoding list normally
    provided to the chunk generator, but subclasses (DiffChunkGenerator)
    can pass in separate encoding lists, allowing for two FileDiffs on an
    interdiff to be in separate encodings without breaking chunk generation.

    Note that it's still possible for a patch to contain content in another
    encoding that doesn't match what's in FileDiff.encoding. The chunk
    generator will still try to do the right thing in this case, falling
    back to the repository encodings.

    Unit tests passed.

    Made sure my various test diffs and interdiffs were still working as
    expected.

    Description From Last Updated

    F811 redefinition of unused 'test_with_filediff_encoding_set' from line 3548

    reviewbotreviewbot

    Can we call this FALLBACK_ENCODING? Our real default is utf-8

    daviddavid
    Checks run (1 failed, 1 succeeded)
    flake8 failed.
    JSHint passed.

    flake8

    chipx86
    chipx86
    david
    1. 
        
    2. reviewboard/scmtools/models.py (Diff revision 3)
       
       
      Show all issues

      Can we call this FALLBACK_ENCODING? Our real default is utf-8

    3. 
        
    chipx86
    david
    1. Ship It!
    2. 
        
    chipx86
    Review request changed
    Status:
    Completed
    Change Summary:
    Pushed to release-4.0.x (4f98605)