Summary

Add a utility parser for hunks in a diff.

Review Request #11740 — Created July 21, 2021 and submitted July 23, 2021, 6:05 p.m.

Information

Owner

chipx86

Repository

DiffX

Branch

master

Reviewers

Groups

diffx

Description

This introduces pydiffx.utils.unified_diffs, which contains a
get_unified_diff_hunks method. This method iterates through a byte
string, returning a list of information on each hunk found in the
string, up until either the end of the string or the first occurrence of
something other than a hunk.

The following general information is returned:

* The number of lines from the provided list of lines that have been
  processed to return hunk data.
* The total number of inserts and deletes found across all hunks.

For each hunk:

* The number of lines of context before/after the changed lines in the
  hunk.
* The header context (usually a function/class after the @@ ... @@).

For each side (original/modified) of each hunk:

* The 0-based line number in the file where the start of the hunk
  should map to.
* The number of lines in the file represented by the hunk.
* The 0-based line numbers in the file where the first and last change
  in the hunk occur.
* The total number of lines changed in the hunk.
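
To make the per-hunk and per-side values above concrete, here is a minimal sketch that computes them for a single hunk. It is not the pydiffx implementation: `parse_hunk`, `HUNK_HEADER_RE`, and the result keys are hypothetical names, and the sketch assumes a well-formed hunk passed as a list of byte strings (header first). It converts the 1-based numbers in the `@@` header to 0-based indexes, following the unified diff convention that an empty side's number refers to the line before the hunk.

```python
import re

# Hypothetical pattern for "@@ -orig_start[,orig_len] +mod_start[,mod_len] @@ [context]".
# In the unified diff format, an omitted length means 1.
HUNK_HEADER_RE = re.compile(
    br'^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$')


def _new_side(start, length):
    """Build the per-side stats dict from the header's start/length."""
    num_lines = 1 if length is None else int(length)
    start = int(start)

    return {
        # Header numbers are 1-based. For an empty side (length 0), the
        # number refers to the line *before* the hunk, which is already the
        # 0-based index of where the hunk maps to.
        'start': start if num_lines == 0 else start - 1,
        'num_lines': num_lines,
        'first_changed': None,
        'last_changed': None,
        'changed': 0,
    }


def _record_change(side, pos):
    """Track the first/last changed 0-based line and the change count."""
    if side['first_changed'] is None:
        side['first_changed'] = pos

    side['last_changed'] = pos
    side['changed'] += 1


def parse_hunk(lines):
    """Compute stats for one well-formed hunk (header line first)."""
    m = HUNK_HEADER_RE.match(lines[0])

    if m is None:
        raise ValueError('Not a hunk header: %r' % (lines[0],))

    orig = _new_side(m.group(1), m.group(2))
    modified = _new_side(m.group(3), m.group(4))
    orig_pos = orig['start']
    mod_pos = modified['start']
    pre_context = 0
    post_context = 0

    for line in lines[1:]:
        prefix = line[:1]

        if prefix == b' ':
            # Context line: present on both sides.
            if orig['changed'] or modified['changed']:
                post_context += 1
            else:
                pre_context += 1

            orig_pos += 1
            mod_pos += 1
        elif prefix == b'-':
            _record_change(orig, orig_pos)
            post_context = 0
            orig_pos += 1
        elif prefix == b'+':
            _record_change(modified, mod_pos)
            post_context = 0
            mod_pos += 1
        # Anything else (e.g. "\ No newline at end of file") is skipped.

    return {
        'header_context': m.group(5),
        'pre_context': pre_context,
        'post_context': post_context,
        'orig': orig,
        'modified': modified,
    }
```

For the hunk `@@ -1,4 +1,5 @@ def foo():` with body lines ` a`, `-b`, `+B`, `+C`, ` c`, ` d`, this reports 1 line of leading context and 2 of trailing context, a single change at 0-based line 1 on the original side, and changes at 0-based lines 1 through 2 on the modified side. The 1 delete and 2 inserts are what would feed the diff-wide totals.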

This can be told to ignore junk between headers, which is helpful for
gathering stats across an entire diff file.
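
Ignoring junk between hunks could look roughly like the following sketch, which groups a list of byte-string lines into hunks using only the line counts from each `@@` header. `split_hunks`, its `ignore_junk` parameter, and the return shape are hypothetical, not pydiffx's signature; the sketch also assumes every hunk body is complete and does no validation.

```python
import re

# Captures only the lengths from "@@ -start[,len] +start[,len] @@".
# An omitted length means 1.
HUNK_COUNTS_RE = re.compile(br'^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@')


def split_hunks(lines, ignore_junk=False):
    """Group byte-string lines into hunks, optionally skipping junk.

    Returns ``(hunks, num_processed)``: a list of hunks (each a list of
    lines, header first) and the number of lines consumed up to the end
    of the last hunk. Assumes every hunk body is complete.
    """
    hunks = []
    num_processed = 0
    i = 0

    while i < len(lines):
        m = HUNK_COUNTS_RE.match(lines[i])

        if m is None:
            if not ignore_junk:
                # Stop at the first thing that isn't a hunk.
                break

            # Skip junk, such as file headers between hunks.
            i += 1
            continue

        orig_left = 1 if m.group(1) is None else int(m.group(1))
        mod_left = 1 if m.group(2) is None else int(m.group(2))
        hunk = [lines[i]]
        i += 1

        # Consume body lines until both sides' counts are used up, so a
        # following "--- a/..." file header isn't taken as a deletion.
        while i < len(lines) and (orig_left > 0 or mod_left > 0):
            prefix = lines[i][:1]

            if prefix in (b' ', b'-'):
                orig_left -= 1

            if prefix in (b' ', b'+'):
                mod_left -= 1

            hunk.append(lines[i])
            i += 1

        # A trailing "\ No newline at end of file" marker belongs to the hunk.
        while i < len(lines) and lines[i].startswith(b'\\'):
            hunk.append(lines[i])
            i += 1

        hunks.append(hunk)
        num_processed = i

    return hunks, num_processed
```

With `ignore_junk=False`, a diff that begins with `---`/`+++` file headers yields no hunks, since parsing stops at the first non-hunk line; with `ignore_junk=True`, the file headers are skipped and hunks from every file are collected. Counting lines against the header, rather than just scanning for the next `@@`, is what keeps a `--- a/...` file header from being mistaken for a deleted line.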

It will raise a MalformedHunkError if it finds anything really out of
the ordinary (such as a premature end of a hunk, or garbage found within
the hunk).
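
The strictness described above can be sketched as a validation pass over a hunk's body, checked against the line counts declared in its header. `check_hunk_body` is a hypothetical helper and `MalformedHunkError` here is a local stand-in class defined in the sketch; neither is meant to mirror pydiffx's actual code, and exactly which conditions count as "really out of the ordinary" is an assumption.

```python
class MalformedHunkError(Exception):
    """Local stand-in for the error described above."""


def check_hunk_body(body, orig_len, mod_len):
    """Validate hunk body lines against the header's line counts.

    ``body`` is the list of byte strings following a hunk header whose
    original/modified lengths are ``orig_len``/``mod_len``. Returns the
    number of body lines consumed. A trailing "No newline" marker after
    the counts are used up is left for the caller.
    """
    orig_left = orig_len
    mod_left = mod_len
    i = 0

    while orig_left > 0 or mod_left > 0:
        if i >= len(body):
            raise MalformedHunkError(
                'Hunk ended prematurely (%d original and %d modified '
                'lines still expected)' % (orig_left, mod_left))

        line = body[i]
        prefix = line[:1]

        if prefix == b' ':
            orig_left -= 1
            mod_left -= 1
        elif prefix == b'-':
            orig_left -= 1
        elif prefix == b'+':
            mod_left -= 1
        elif prefix != b'\\':
            # Anything else inside a hunk (including an empty line) is
            # treated as garbage in this sketch.
            raise MalformedHunkError(
                'Unexpected line %d in hunk: %r' % (i, line))

        if orig_left < 0 or mod_left < 0:
            raise MalformedHunkError(
                'Hunk has more lines than its header declares')

        i += 1

    return i
```

Because the loop is driven by the header's counts, running out of input while lines are still expected is reported as a premature end, rather than being mistaken for the normal end of a hunk.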

This method is based on a similar method we have in Review Board, but
with some improvements to parsing, strictness, and results. It will be
used by the DiffX DOM class in an upcoming change to calculate stats
for the generated DiffX file.

Testing Done

Unit tests pass on Python 2 and 3.

Built the docs and checked that they rendered and linked correctly.
Checked for spelling errors.

Commits

Summary	ID
Add a utility parser for hunks in a diff.	e025a21b02008b0dedb8f929f6e3ec94ab43d76a


Issues

Description	From	Last Updated
E501 line too long (80 > 79 characters)	reviewbot	July 23, 2021, 6:05 p.m.

flake8 failed.

JSHint passed.

flake8

python/pydiffx/utils/unified_diffs.py (Diff revision 1)
The issue has been resolved.
```
E501 line too long (80 > 79 characters)
```

Ship it!

Status: Completed
Change Summary: Pushed to master (7c118b4)