Add the initial documentation and spec for DiffX.

Review Request #8914 — Created April 22, 2017 and updated

Information

DiffX
master
40c1b90...

Reviewers

This goes over the rationale for why we need a new diff file format,
answers questions about the whys and hows, and includes a specification
for the format itself.

The spec is not complete. It's an iteration over what we had on
Hackpad/Notion, but additional changes will be coming to address things
like the metadata format and to include more examples.

Built the docs and read through them. This is still early stages, so a
lot needs to be verified during review and after.

Description From Last Updated

'sys' imported but unused

reviewbotreviewbot

'os' imported but unused

reviewbotreviewbot

'shlex' imported but unused

reviewbotreviewbot

zh_CN is a language code, not an encoding. Perhaps say "Shift-JIS" or "UTF-16LE" (which are both relatively common)?

daviddavid

I'm not sure what "the file is treated as 8-bit binary data" means here, since this seems to be explaining …

daviddavid

How do we parse this if the encoding is something like UTF-16? It seems like there's kind of a chicken …

daviddavid

What happens if a commit message has a line that looks like a diffx header? Perhaps far-fetched, but consider commits …

daviddavid

You are missing the author_date field!

brenniebrennie

Can we make this a comma-separated list? It's weird to have copy-modify and move-modify be their own things, and we …

daviddavid
Checks run (1 failed, 1 succeeded, 1 failed with error)
JSHint passed.
PEP8 Style Checker internal error.
Pyflakes failed.

Pyflakes

david
  1. I'll probably have a lot more comments as I digest this further, but here's a start.

  2. docs/problems-with-diffs.rst (Diff revision 1)
     
     
    Show all issues

    zh_CN is a language code, not an encoding. Perhaps say "Shift-JIS" or "UTF-16LE" (which are both relatively common)?

  3. docs/spec/index.rst (Diff revision 1)
     
     
     
     
    Show all issues

    I'm not sure what "the file is treated as 8-bit binary data" means here, since this seems to be explaining the encoding of the section itself. I'm also not sure what it means for parsing if a section defines its own encoding somewhere in the middle.

    1. I believe all section headings are interpreted as ASCII and this just refers to the body (i.e., the content of the sections) and will be used as a default encoding if any child section does not declare an encoding.

    2. Correct. It's not a valid diffx file if the headers and such are in something like UTF-16. These are all ASCII, and the encoding has to do with the section.

      Having a content length solves parsing problems here for sure.

    3. In that case, the "Section Format" part should specify that all diffx lines are ASCII-only.

  4. docs/spec/index.rst (Diff revision 1)
     
     
    Show all issues

    How do we parse this if the encoding is something like UTF-16? It seems like there's kind of a chicken and egg problem. I'd be way more comfortable if we specified the encoding for the main header explicitly, and maybe even for all sections.

  5. docs/spec/index.rst (Diff revision 1)
     
     
     
     
    Show all issues

    What happens if a commit message has a line that looks like a diffx header?

    Perhaps far-fetched, but consider commits to a diffx tool.

    1. I think it would be great to have content-length=512 for e.g. 512 byte content, so we know exactly how much to parse. That would avoid the issue altogether.

    2. To clarify, I mean:

      #..somesection: content-length=512
      /* 512 bytes */
      
  6. docs/spec/index.rst (Diff revision 1)
     
     
     
     
    Show all issues

    Can we make this a comma-separated list? It's weird to have copy-modify and move-modify be their own things, and we may want to add additional types of operations (permissions change, import from some external repository, etc), some of which might be able to be used in conjunction with others.

  7. 
      
brennie
  1. 
      
  2. docs/spec/index.rst (Diff revision 1)
     
     
    Show all issues

    You are missing the author_date field!

  3.