Add a utility parser for hunks in a diff.

Review Request #11740: created July 21, 2021 and submitted.

Repository: DiffX
Branch: master

This introduces pydiffx.utils.unified_diffs, which contains a
get_unified_diff_hunks method. This method iterates through a list of
lines (byte strings) from a diff and returns information on each hunk
found, up until either the end of the input or the first occurrence of
something other than a hunk.
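Each hunk begins with a header line such as `@@ -12,5 +12,7 @@ def foo():`, and that header supplies the starting line, line count, and header context for each side. As a rough illustration only (not pydiffx's actual code), here is a Python 3 sketch that parses one header line into 0-based starts, line counts, and the header context. `parse_hunk_header`, `HunkHeader`, and its field names are hypothetical.

```python
import re
from collections import namedtuple

# Matches e.g. b"@@ -12,5 +12,7 @@ def foo():". An omitted ",count" means 1.
HUNK_HEADER_RE = re.compile(
    rb'^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@(.*)$')

# Hypothetical result type: starts are 0-based, counts are numbers of lines.
HunkHeader = namedtuple(
    'HunkHeader',
    ['orig_start', 'orig_num_lines', 'modified_start', 'modified_num_lines',
     'context'])


def _to_zero_based(start, count):
    # Unified diffs use 1-based starts. For an empty side (count 0), the
    # start names the line *before* the hunk, so the 0-based position the
    # hunk maps to is the start value itself.
    return start if count == 0 else start - 1


def parse_hunk_header(line):
    """Parse one hunk header (bytes). Returns a HunkHeader, or None."""
    m = HUNK_HEADER_RE.match(line.rstrip(b'\r\n'))

    if m is None:
        return None

    orig_start = int(m.group(1))
    orig_count = int(m.group(2) or b'1')
    modified_start = int(m.group(3))
    modified_count = int(m.group(4) or b'1')

    return HunkHeader(
        orig_start=_to_zero_based(orig_start, orig_count),
        orig_num_lines=orig_count,
        modified_start=_to_zero_based(modified_start, modified_count),
        modified_num_lines=modified_count,
        # Text after the closing "@@", usually a function or class name.
        context=m.group(5).strip())
```

For example, `parse_hunk_header(b'@@ -12,5 +12,7 @@ def foo():\n')` yields starts of 11 on both sides, counts of 5 and 7, and a context of `b'def foo():'`.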

The following general information is returned:

  • The number of lines from the provided list that were processed to
    produce the hunk data.
  • The total numbers of inserted and deleted lines across all hunks.

For each hunk:

  • The number of lines of context before/after the changed lines in the
    hunk.
  • The header context (usually a function/class after the @@ ... @@).

For each side (original/modified) of each hunk:

  • The 0-based line number in the file where the hunk starts.
  • The number of lines in the file represented by the hunk.
  • The 0-based line numbers in the file where the first and last changes
    in the hunk occur.
  • The total number of lines changed in the hunk.
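To make the per-hunk and per-side values concrete, here is a hypothetical Python 3 sketch, not the real implementation, that walks one hunk's body lines and derives the leading and trailing context counts, the 0-based first and last changed lines, and the number of changed lines for each side. The function name, its arguments, and the result keys are assumptions for illustration. A hunk with no changes reports all of its context as leading context.

```python
def summarize_hunk_body(body, orig_start, modified_start):
    """Summarize one hunk's body (a sketch; names are hypothetical).

    body: the hunk's lines as bytes, without the "@@" header.
    orig_start, modified_start: 0-based line numbers where the hunk starts.
    """
    orig_line = orig_start
    modified_line = modified_start
    orig_changes = []
    modified_changes = []
    context_pre = 0
    context_post = 0
    seen_change = False

    for line in body:
        prefix = line[:1]

        if prefix == b' ':
            # Context lines advance both sides.
            orig_line += 1
            modified_line += 1

            if seen_change:
                context_post += 1
            else:
                context_pre += 1
        elif prefix in (b'-', b'+'):
            seen_change = True
            # Context between two changes isn't trailing context.
            context_post = 0

            if prefix == b'-':
                orig_changes.append(orig_line)
                orig_line += 1
            else:
                modified_changes.append(modified_line)
                modified_line += 1
        elif prefix == b'\\':
            # "\ No newline at end of file" doesn't occupy a line.
            continue
        else:
            raise ValueError('Unexpected hunk line: %r' % line)

    def summarize_side(changes):
        return {
            'first_changed_line': changes[0] if changes else None,
            'last_changed_line': changes[-1] if changes else None,
            'num_lines_changed': len(changes),
        }

    return {
        'lines_of_context_pre': context_pre,
        'lines_of_context_post': context_post,
        'orig': summarize_side(orig_changes),
        'modified': summarize_side(modified_changes),
    }
```

Summing `num_lines_changed` across all hunks gives the total deletes (original side) and total inserts (modified side).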

The method can optionally ignore junk between hunk headers (such as file
headers), which is helpful for gathering stats across an entire diff
file.

It will raise a MalformedHunkError if it finds anything clearly malformed
(such as a hunk that ends prematurely, or garbage found within a hunk).
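The junk-skipping and error behavior can be sketched as a scanning loop. This hypothetical `scan_hunks` (Python 3) uses each header's line counts to know where a hunk ends, either skips or stops at non-hunk lines depending on an assumed `ignore_garbage` flag, and raises a stand-in `MalformedHunkError` on a premature end or unexpected content. The function, flag, and result keys are assumptions, not pydiffx's API.

```python
import re

# Only the line counts are needed here; an omitted ",count" means 1.
HUNK_HEADER_RE = re.compile(rb'^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@')


class MalformedHunkError(Exception):
    """Hypothetical stand-in for the error described above."""


def scan_hunks(lines, ignore_garbage=False):
    """Count hunks and changed lines in a list of bytes lines (a sketch)."""
    i = 0
    num_hunks = 0
    total_inserts = 0
    total_deletes = 0

    while i < len(lines):
        m = HUNK_HEADER_RE.match(lines[i])

        if m is None:
            if ignore_garbage:
                # Skip junk, such as file headers, between hunks.
                i += 1
                continue

            # Otherwise stop at the first line that isn't part of a hunk.
            break

        # Lines still expected on each side, per the header's counts.
        orig_left = int(m.group(1) or b'1')
        modified_left = int(m.group(2) or b'1')
        num_hunks += 1
        i += 1

        while orig_left > 0 or modified_left > 0:
            if i >= len(lines):
                raise MalformedHunkError(
                    'Hunk ended prematurely at line %d' % i)

            line = lines[i]
            prefix = line[:1]

            if prefix == b' ' and orig_left > 0 and modified_left > 0:
                orig_left -= 1
                modified_left -= 1
            elif prefix == b'-' and orig_left > 0:
                orig_left -= 1
                total_deletes += 1
            elif prefix == b'+' and modified_left > 0:
                modified_left -= 1
                total_inserts += 1
            elif prefix == b'\\':
                # "\ No newline at end of file" doesn't count as a line.
                pass
            else:
                raise MalformedHunkError(
                    'Unexpected content in hunk at line %d: %r' % (i, line))

            i += 1

        # The marker can also follow the hunk's final line.
        if i < len(lines) and lines[i].startswith(b'\\'):
            i += 1

    return {
        'num_processed_lines': i,
        'num_hunks': num_hunks,
        'total_insert_count': total_inserts,
        'total_delete_count': total_deletes,
    }
```

Relying on the header counts, rather than on line prefixes alone, is what lets the loop tell a deleted line beginning with `--` apart from the next file's `---` header.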

This method is based on a similar method we have in Review Board, but
with some improvements to parsing, strictness, and results. It will be
used by the DiffX DOM class in an upcoming change to calculate stats
for the generated DiffX file.

Unit tests pass on Python 2 and 3.

Built the docs and checked that they rendered and linked correctly.
Checked for spelling errors.
