Add a streaming parser for DiffX files.

Review Request #11712 — Created July 7, 2021 and submitted

chipx86
DiffX
master
diffx

This introduces diffx.reader.DiffXReader, a streaming parser for the
DiffX file format. This is a low-level interface for DiffX files, which
is able to read from a file stream (such as a local file, HTTP response,
or memory-backed stream) and parse and return each section in the DiffX
file according to the specification.

The parser acts as a generator, and will provide dictionaries containing
information on each section of the DiffX file. This includes options,
metadata or text/diff content, the section ID, section type, and section
hierarchy level. Consumers can build upon this to work with the data as
it comes in, or to easily convert it into another representation.
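As a rough illustration of the generator-of-dictionaries approach described above, here is a minimal Python 3 sketch. It is not DiffXReader's actual code or API: `iter_sections`, `parse_options`, and the dict keys are hypothetical. It assumes the header shape `#..name: key=value, key=value` (one dot per nesting level), with content sections declaring their byte size in a `length` option. It works with any binary file-like object offering `readline()` and `read()`, such as an open file or `io.BytesIO`.

```python
import io
import re

# ASSUMPTION: header lines look like b'#..meta: format=json, length=15',
# with one dot per nesting level. The character classes are illustrative,
# not copied from the DiffX specification.
HEADER_RE = re.compile(
    rb'^#(?P<dots>\.*)(?P<name>[a-z]+):(?: (?P<options>.+))?$')
OPTION_RE = re.compile(
    rb'^(?P<key>[A-Za-z0-9_-]+)=(?P<value>[A-Za-z0-9._+/-]+)$')


def parse_options(raw):
    """Parse b'key=value, key=value' into a dict of str to str."""
    options = {}

    if raw:
        for part in raw.split(b', '):
            m = OPTION_RE.match(part)

            if m is None:
                raise ValueError('Invalid option: %r' % part)

            options[m.group('key').decode('ascii')] = \
                m.group('value').decode('ascii')

    return options


def iter_sections(stream):
    """Hypothetical streaming reader: yield one dict per section."""
    while True:
        line = stream.readline()

        if not line:
            break  # End of stream.

        line = line.rstrip(b'\r\n')

        if not line:
            continue  # Ignore blank lines between sections.

        m = HEADER_RE.match(line)

        if m is None:
            raise ValueError('Invalid section header: %r' % line)

        options = parse_options(m.group('options'))
        content = None

        if 'length' in options:
            # Content sections declare their size in bytes. Read exactly
            # that much so the next readline() starts after the content.
            length = int(options['length'])
            content = stream.read(length)

            if len(content) != length:
                raise ValueError('Unexpected end of stream')

        yield {
            'level': len(m.group('dots')),
            'type': m.group('name').decode('ascii'),
            'options': options,
            'content': content,
        }


if __name__ == '__main__':
    sample = (b'#diffx: encoding=utf-8, version=1.0\n'
              b'#.change:\n'
              b'#..meta: format=json, length=15\n'
              b'{"id": "abc1"}\n')

    for section in iter_sections(io.BytesIO(sample)):
        print(section['level'], section['type'], section['options'])
```

Because each section is yielded as soon as its header (and any declared content) has been read, a consumer can work through a large file without holding all of it in memory.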

As a streaming parser, this does not keep much state around between
sections. This makes it ideal for working with very large DiffX files,
or reading from a stream that may incrementally provide new content.
However, it also means that a parsing error may not manifest until later
in the stream, after the consumer has already handled some content. An
object-based implementation is in the works that will address this.
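To make the trade-off concrete, the following toy example uses hypothetical `iter_headers` and `parse_all_or_nothing` functions, which only check that each line looks like `#name:` and are far looser than the real reader. A consumer acts on two sections before a malformed third line raises an error, while a buffering wrapper rejects the whole input before anything is handled.

```python
import io


def iter_headers(stream):
    """Toy streaming parser (hypothetical): yield header lines one at a time."""
    for raw_line in stream:
        line = raw_line.rstrip(b'\r\n')

        if not line:
            continue

        if not line.startswith(b'#') or b':' not in line:
            # This problem is only discovered when the parser reaches it,
            # after earlier sections have already been yielded.
            raise ValueError('Invalid section header: %r' % line)

        yield line


def parse_all_or_nothing(stream):
    """Buffer every section before returning, trading memory for atomicity."""
    return list(iter_headers(stream))


if __name__ == '__main__':
    data = b'#diffx: version=1.0\n#.change:\nnot a header\n'

    handled = []

    try:
        for header in iter_headers(io.BytesIO(data)):
            handled.append(header)  # Consumer acts on each section at once.
    except ValueError:
        pass

    print('handled before error:', handled)

    try:
        parse_all_or_nothing(io.BytesIO(data))
    except ValueError as e:
        print('rejected up front:', e)
```

Buffering like this gives up the low memory footprint that makes streaming attractive, so all-or-nothing validation probably fits better in an object-based layer than in the streaming reader itself.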

As this is a reference implementation, the parser is very strict about
conforming to the specification when it comes to header structure,
characters in header options, lengths, and valid option values. It does,
however, accept different newline formats and ignores extra newlines
before and after sections (including at the beginning or end of the
file).
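The strict checks and newline tolerance described above might look something like this sketch. `strip_newline` and `parse_options_strict` are hypothetical helpers, and the allowed key and value characters, as well as the accepted values for `version` and `format`, are illustrative assumptions rather than a transcription of the DiffX specification.

```python
import re

# ASSUMPTION: illustrative character rules, not copied from the spec.
KEY_RE = re.compile(r'[a-z][a-z0-9_-]*\Z')
VALUE_RE = re.compile(r'[A-Za-z0-9.+/_-]+\Z')

# ASSUMPTION: hypothetical table of values this sketch accepts.
ALLOWED_VALUES = {
    'version': {'1.0'},
    'format': {'json'},
}


def strip_newline(line):
    """Remove one trailing LF or CRLF, leaving any other bytes untouched."""
    if line.endswith(b'\r\n'):
        return line[:-2]

    if line.endswith(b'\n'):
        return line[:-1]

    return line


def parse_options_strict(text):
    """Parse 'key=value, key=value', rejecting anything off-spec."""
    options = {}

    for part in text.split(', '):
        key, sep, value = part.partition('=')

        if not sep or not KEY_RE.match(key) or not VALUE_RE.match(value):
            raise ValueError('Malformed option: %r' % part)

        if key in options:
            raise ValueError('Duplicate option: %r' % key)

        if key == 'length' and not value.isdigit():
            raise ValueError('Length must be a non-negative integer: %r'
                             % value)

        allowed = ALLOWED_VALUES.get(key)

        if allowed is not None and value not in allowed:
            raise ValueError('Unsupported value for %s: %r' % (key, value))

        options[key] = value

    return options
```

Stripping exactly one line ending, rather than calling `rstrip()`, keeps the sketch from silently accepting stray carriage returns or whitespace that a strict reader should flag.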

At this stage, DiffXReader should be suitable for production use of the
DiffX format. An accompanying DiffXWriter, and implementations
representing DiffX through an object model, are in the works.

Unit tests pass on Python 2 and 3.

Put the reader through its paces by testing a range of valid and invalid
DiffX files, generated both by hand and by the upcoming DiffXWriter.

Issues:

E501 line too long (81 > 79 characters) (from reviewbot)

E302 expected 2 blank lines, found 1 (from reviewbot)

These attributes don't exist. Should these be Section.MAIN, etc? (from david)

It's just one element, but maybe do this before the list comprehension? Seems weird to process first and then throw ... (from david)

Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.


chipx86
Review request changed

Change Summary:

Removed an extra blank line.

Commits:

Summary
- Add a streaming parser for DiffX files.
+ Add a streaming parser for DiffX files.

Diff:

Revision 3 (+3892)


Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.


david
  1. python/diffx/reader.py (Diff revision 4)

    These attributes don't exist. Should these be Section.MAIN, etc?

    1. Yep. Got a fix in my tree. I meant to update the diff.

  2. python/diffx/utils/text.py (Diff revision 4)

    It's just one element, but maybe do this before the list comprehension? Seems weird to process first and then throw away data second.

    1. This function gets some big enough changes in /r/11714 (the current version has some faults to it), so I'll investigate doing it in that change.

david
  1. Ship It!
chipx86
Review request changed

Status: Closed (submitted)

Change Summary:

Pushed to master (13eaec0)