Add a utility parser for hunks in a diff.

Review Request #11740 — Created July 21, 2021 and submitted

Information

Repository: DiffX
Branch: master


This introduces pydiffx.utils.unified_diffs, which contains a
get_unified_diff_hunks method. This method iterates through a byte
string, returning a list of information on each hunk found in the
string, up until either the end of the string or the first occurrence of
something other than a hunk.
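Below is a minimal sketch of this kind of walk. It assumes the input has already been split into a list of byte-string lines, and it is not the pydiffx implementation: `parse_hunk_header` and `iter_hunk_headers` are hypothetical names. It parses each `@@ -start,count +start,count @@ context` header, uses the counts to consume the hunk body, and stops at the first line that doesn't begin a new hunk.

```python
import re

# Hypothetical sketch, not pydiffx code. Matches a hunk header such as
# b"@@ -10,7 +10,8 @@ def foo():"; a missing ",count" means a count of 1.
HUNK_HEADER_RE = re.compile(
    br'^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$')


def parse_hunk_header(line):
    """Parse a hunk header line, returning None if it isn't one."""
    m = HUNK_HEADER_RE.match(line)

    if m is None:
        return None

    orig_start, orig_count, mod_start, mod_count, context = m.groups()

    return {
        'orig_start': int(orig_start),
        'orig_count': int(orig_count) if orig_count is not None else 1,
        'mod_start': int(mod_start),
        'mod_count': int(mod_count) if mod_count is not None else 1,
        'context': context or None,
    }


def iter_hunk_headers(lines):
    """Yield each parsed header, stopping at the first non-hunk content."""
    i = 0

    while i < len(lines):
        header = parse_hunk_header(lines[i])

        if header is None:
            break

        i += 1
        orig_left = header['orig_count']
        mod_left = header['mod_count']

        # Consume the hunk body using the line counts from the header.
        while (orig_left > 0 or mod_left > 0) and i < len(lines):
            prefix = lines[i][:1]

            if prefix == b' ':
                orig_left -= 1
                mod_left -= 1
            elif prefix == b'-':
                orig_left -= 1
            elif prefix == b'+':
                mod_left -= 1
            elif prefix != b'\\':
                # Unexpected content; a stricter parser would raise here.
                break

            i += 1

        yield header
```

Relying on the header counts, rather than on line prefixes alone, is what lets the walk tell a deleted line beginning with `--- ` apart from the next file's `--- a/...` header.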

The following general information is returned:

  • The number of lines from the provided list of lines that have been
    processed to return hunk data.
  • The total number of inserts and deletes found across all hunks.
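Both of these values can come out of the same pass over the lines. The sketch below is illustrative only (`count_hunk_stats` is a hypothetical helper, not part of pydiffx). It trusts the header counts to find where each hunk ends, tallies `+` and `-` lines, and returns how many lines it consumed along with the totals.

```python
import re

# Hypothetical sketch, not pydiffx code. Captures the optional line counts;
# a missing count means 1.
HUNK_HEADER_RE = re.compile(br'^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@')


def count_hunk_stats(lines):
    """Return (num_processed_lines, total_inserts, total_deletes)."""
    i = 0
    inserts = 0
    deletes = 0

    while i < len(lines):
        m = HUNK_HEADER_RE.match(lines[i])

        if m is None:
            # Stop at the first line that doesn't start a new hunk.
            break

        orig_left = int(m.group(1) or 1)
        mod_left = int(m.group(2) or 1)
        i += 1

        while (orig_left > 0 or mod_left > 0) and i < len(lines):
            prefix = lines[i][:1]

            if prefix == b' ':
                orig_left -= 1
                mod_left -= 1
            elif prefix == b'-':
                orig_left -= 1
                deletes += 1
            elif prefix == b'+':
                mod_left -= 1
                inserts += 1

            # Other lines (such as "\ No newline" markers) are consumed
            # without affecting the counts.
            i += 1

        # A trailing "\ No newline at end of file" marker belongs to the hunk.
        while i < len(lines) and lines[i][:1] == b'\\':
            i += 1

    return i, inserts, deletes
```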

For each hunk:

  • The number of lines of context before/after the changed lines in the
    hunk.
  • The header context (usually a function/class after the @@ ... @@).
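As a rough illustration, these per-hunk values could be derived from the header line and the hunk body as shown here. `hunk_context_info` is a hypothetical helper, not pydiffx's API. It treats everything after the closing `@@` as header context and skips `\ No newline at end of file` markers when counting context lines.

```python
# Hypothetical sketch, not pydiffx code.
def hunk_context_info(header_line, body_lines):
    """Return (context_before, context_after, header_context) for one hunk."""
    # Anything after the second "@@" is header context, e.g. a function name.
    parts = header_line.rstrip(b'\r\n').split(b'@@', 2)
    header_context = parts[2].strip() if len(parts) == 3 else b''

    # "\ No newline at end of file" markers aren't context or changes.
    prefixes = [line[:1] for line in body_lines if line[:1] != b'\\']
    changed = [i for i, prefix in enumerate(prefixes)
               if prefix in (b'-', b'+')]

    if changed:
        before = changed[0]
        after = len(prefixes) - changed[-1] - 1
    else:
        # No changes at all: count every line as leading context.
        before = len(prefixes)
        after = 0

    return before, after, header_context or None
```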

For each side (original/modified) of each hunk:

  • The 0-based line number in the file that the start of the hunk
    maps to.
  • The number of lines in the file represented by the hunk.
  • The 0-based line numbers in the file where the first and last changes
    in the hunk occur.
  • The total number of lines changed in the hunk.
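Here is a sketch of the per-side bookkeeping, under a few stated assumptions. `hunk_side_info` is a hypothetical name. The header's 1-based start lines are converted to 0-based by subtracting 1 (clamped to 0 for an empty side such as `-0,0`), and "lines changed" is taken to mean `-` lines on the original side and `+` lines on the modified side. The conventions pydiffx actually uses for these edge cases may differ.

```python
# Hypothetical sketch, not pydiffx code.
def hunk_side_info(orig_start, orig_count, mod_start, mod_count, body_lines):
    """Compute per-side stats for one hunk from its header values and body."""
    sides = {}

    for name, start, count in (('orig', orig_start, orig_count),
                               ('modified', mod_start, mod_count)):
        sides[name] = {
            # Header line numbers are 1-based; "-0,0" means an empty side.
            'start': max(start - 1, 0),
            'num_lines': count,
            'first_changed': None,
            'last_changed': None,
            'changed': 0,
        }

    # Current 0-based line number on each side as the body is walked.
    orig_line = sides['orig']['start']
    mod_line = sides['modified']['start']

    def record(side, line_num):
        info = sides[side]

        if info['first_changed'] is None:
            info['first_changed'] = line_num

        info['last_changed'] = line_num
        info['changed'] += 1

    for line in body_lines:
        prefix = line[:1]

        if prefix == b' ':
            # Context lines advance both sides.
            orig_line += 1
            mod_line += 1
        elif prefix == b'-':
            record('orig', orig_line)
            orig_line += 1
        elif prefix == b'+':
            record('modified', mod_line)
            mod_line += 1

    return sides
```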

This can be told to ignore junk between headers, which is helpful for
gathering stats across an entire diff file.
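The difference between strict and lenient scanning can be sketched as follows. `find_hunks` and its `ignore_garbage` flag are hypothetical names chosen for this illustration. In strict mode the scan stops at the first non-hunk line. In lenient mode it skips file headers such as `diff --git`, `---`, and `+++` until the next `@@` header. Hunk bodies are still consumed by count, so a deleted line that happens to start with `--- ` isn't mistaken for a file header.

```python
import re

# Hypothetical sketch, not pydiffx code.
HUNK_HEADER_RE = re.compile(br'^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@')


def find_hunks(lines, ignore_garbage=False):
    """Return (start, end) index pairs for each hunk in ``lines``."""
    hunks = []
    i = 0

    while i < len(lines):
        m = HUNK_HEADER_RE.match(lines[i])

        if m is None:
            if ignore_garbage:
                # Skip junk such as "diff --git" or "---"/"+++" lines.
                i += 1
                continue

            # Strict mode: stop at the first non-hunk line.
            break

        start = i
        orig_left = int(m.group(1) or 1)
        mod_left = int(m.group(2) or 1)
        i += 1

        # Consume exactly as many body lines as the header promises.
        while (orig_left > 0 or mod_left > 0) and i < len(lines):
            prefix = lines[i][:1]

            if prefix in (b' ', b'-'):
                orig_left -= 1

            if prefix in (b' ', b'+'):
                mod_left -= 1

            i += 1

        hunks.append((start, i))

    return hunks
```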

It will raise a MalformedHunkError if it finds anything really out of
the ordinary (such as a premature end of a hunk, or garbage found within
the hunk).
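As a sketch only, such checks could look like the following. `MalformedHunkError` here is a local stand-in class rather than an import of pydiffx's exception, whose module and constructor may differ, and `consume_hunk_body` is a hypothetical helper. It raises when the input ends before the header's line counts are satisfied, when a line has an unexpected prefix, or when a line would push either side past its declared count.

```python
# Hypothetical sketch, not pydiffx code.
class MalformedHunkError(Exception):
    """Local stand-in for pydiffx's exception of the same name."""


def consume_hunk_body(lines, i, orig_count, mod_count):
    """Consume one hunk body starting at lines[i], returning the next index.

    Raises MalformedHunkError on garbage or a premature end of the hunk.
    """
    orig_left = orig_count
    mod_left = mod_count

    while orig_left > 0 or mod_left > 0:
        if i >= len(lines):
            raise MalformedHunkError(
                'Hunk ended early: expected %d more original and %d more '
                'modified line(s)' % (orig_left, mod_left))

        line = lines[i]
        prefix = line[:1]

        if prefix == b' ':
            orig_left -= 1
            mod_left -= 1
        elif prefix == b'-':
            orig_left -= 1
        elif prefix == b'+':
            mod_left -= 1
        elif prefix != b'\\':
            raise MalformedHunkError(
                'Unexpected line %d in hunk: %r' % (i, line))

        if orig_left < 0 or mod_left < 0:
            raise MalformedHunkError(
                'Line %d overflows the hunk line counts' % i)

        i += 1

    return i
```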

This method is based on a similar method we have in Review Board, but
with some improvements to parsing, strictness, and results. It will be
used by the DiffX DOM class in an upcoming change to calculate stats
for the generated DiffX file.

Unit tests pass on Python 2 and 3.

Built the docs and checked that they rendered and linked correctly.
Checked for spelling errors.

Commits:

Add a utility parser for hunks in a diff. (e025a21b02008b0dedb8f929f6e3ec94ab43d76a)

E501 line too long (80 > 79 characters)

reviewbot
Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.


david
  1. Ship It!
chipx86
Review request changed
Status: Completed
Change Summary: Pushed to master (7c118b4)