Implementation of GoDiffX up to Reader.go

Review Request #11872 — Created Nov. 10, 2021

Information

DiffX
master

Reviewers

This review request covers all of GoDiffX up to the Reader. A prior review was deprecated
because I changed the basic boilerplate when I started implementing Reader.go.

There are a few known issues with the implementation at the moment. The most significant
is that the reader cannot read files whose content is encoded as UTF-16 or UTF-32. The
reason is that I did not know bytes.Buffer would cast those values down into single bytes,
which causes the text to be encoded and decoded improperly. However, if I encode text into
UTF-16 or UTF-32 and store each value across multiple bytes, it works properly. This is
something I plan to look at next, and it will likely require a large overhaul of the code.
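
As a rough sketch of the multi-byte approach described above (not the code in this
review), the following round-trips text through UTF-16 using only the standard library.
The helper names encodeUTF16LE and decodeUTF16LE are hypothetical, and little-endian
byte order is an assumption made for the example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// encodeUTF16LE writes s into a bytes.Buffer as UTF-16 little-endian,
// two bytes per 16-bit code unit. Hypothetical helper, not part of GoDiffX.
func encodeUTF16LE(s string) []byte {
	var buf bytes.Buffer
	for _, unit := range utf16.Encode([]rune(s)) {
		var pair [2]byte
		binary.LittleEndian.PutUint16(pair[:], unit)
		buf.Write(pair[:])
	}
	return buf.Bytes()
}

// decodeUTF16LE reverses encodeUTF16LE. It returns an error when the
// input does not hold a whole number of 16-bit code units.
func decodeUTF16LE(b []byte) (string, error) {
	if len(b)%2 != 0 {
		return "", fmt.Errorf("odd byte count %d for UTF-16 data", len(b))
	}
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i < len(b); i += 2 {
		units = append(units, binary.LittleEndian.Uint16(b[i:i+2]))
	}
	return string(utf16.Decode(units)), nil
}

func main() {
	raw := encodeUTF16LE("héllo\n")
	text, err := decodeUTF16LE(raw)
	fmt.Println(len(raw), text == "héllo\n", err)
}
```

The point of the sketch is that the byte slice is treated as a sequence of fixed-width
code units rather than as one character per byte; UTF-32 would follow the same pattern
with four bytes per value.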

I do have a few questions about my implementation that I would appreciate some advice on, and
that I hope to have time to fix before the CANOSP pencils-down deadline. They concern
some commented-out tests for reader.go.

The first relates to a question I asked on Slack a few days ago. For
TestWithNewlinesFileLFContentCRLF, I still need help understanding why the Python version
says the length of one option should be 709, when my program finds 712 and that is also what
I get when I count by hand. I suspect I am accidentally miscounting something.
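
One way to narrow this down, offered only as a diagnostic sketch, is to compare the byte
length of the content as stored with its length after each CRLF is collapsed to LF. If
the two differ by exactly the size of the discrepancy, line endings are a likely place to
look; this is an unconfirmed guess, not a diagnosis. The helper lengthReport and the
sample data are made up for illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// lengthReport returns the byte length of content as stored, its length
// after every CRLF is collapsed to LF, and the number of CRLF pairs found.
// Hypothetical helper, not part of GoDiffX.
func lengthReport(content []byte) (raw, normalized, crlfCount int) {
	crlfCount = bytes.Count(content, []byte("\r\n"))
	collapsed := bytes.ReplaceAll(content, []byte("\r\n"), []byte("\n"))
	return len(content), len(collapsed), crlfCount
}

func main() {
	// Made-up sample, not the data from the real test.
	sample := []byte("--- a\r\n+++ b\r\n@@ -1 +1 @@\r\n")
	raw, normalized, crlf := lengthReport(sample)
	fmt.Printf("raw=%d normalized=%d crlf=%d\n", raw, normalized, crlf)
}
```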

The second is why the test with extra newlines passes when the very first lines
are newlines. By my understanding, this should cause the iterator to go into
readHeader, which should return some version of none, and the iterator will then break
out of the for loop, meaning there is no SectionDict in that iteration. Should I
build my program to just ignore newlines until I find content, or should it raise an
error?
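
For reference, the "ignore newlines until content" option could look roughly like the
sketch below. It is not taken from this review's code: nextNonBlankLine and
firstContentLine are hypothetical helpers, and whether skipping is the right behaviour
is exactly the open question above. Returning the skipped count would also let a caller
turn leading blank lines into an error instead.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// nextNonBlankLine returns the first line that is not empty once its line
// ending is trimmed, plus how many blank lines were skipped before it.
// It returns the read error (normally io.EOF) if no content is found.
func nextNonBlankLine(r *bufio.Reader) (line string, skipped int, err error) {
	for {
		raw, readErr := r.ReadString('\n')
		trimmed := strings.TrimRight(raw, "\r\n")
		if trimmed != "" {
			return trimmed, skipped, nil
		}
		if readErr != nil {
			return "", skipped, readErr
		}
		skipped++
	}
}

// firstContentLine is a convenience wrapper for string input.
func firstContentLine(s string) (string, int, error) {
	return nextNonBlankLine(bufio.NewReader(strings.NewReader(s)))
}

func main() {
	line, skipped, err := firstContentLine("\n\r\n#diffx: encoding=utf-8, version=1.0\n")
	fmt.Println(line, skipped, err)

	_, _, err = firstContentLine("\n\n")
	fmt.Println(err == io.EOF)
}
```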

The third is about the option 'xxx' from the test with large content. I had assumed
options were more standardized than metadata, given that they seemed to be about how to
process text. Because of this, I made a custom struct to hold the information so that I
could write some simple functions to ensure the data was valid. Am I safe to assume for the
Go implementation that options will usually be something like format or encoding, or should
I look to change this to a more generic interface in the future?
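
If a more generic representation turns out to be needed, one possible middle ground is a
string map that keeps every option, with typed accessors layered on top for the keys that
need validation. The sketch below is an assumption about how that could look, not the
struct in this review; the Options type, the set of typed keys, and the encoding
allow-list are all illustrative rather than taken from the DiffX specification.

```go
package main

import (
	"fmt"
	"sort"
)

// Options keeps every key=value pair from a section header, including
// keys this code has no special handling for (such as "xxx").
type Options map[string]string

// typedKeys and knownEncodings are illustrative, not the DiffX spec's lists.
var typedKeys = map[string]bool{"encoding": true, "format": true}
var knownEncodings = map[string]bool{"utf-8": true, "utf-16": true, "utf-32": true}

// Encoding returns the "encoding" option, validating it when present.
// A missing option yields an empty string and no error.
func (o Options) Encoding() (string, error) {
	enc, ok := o["encoding"]
	if !ok {
		return "", nil
	}
	if !knownEncodings[enc] {
		return "", fmt.Errorf("unsupported encoding %q", enc)
	}
	return enc, nil
}

// Unknown lists, in sorted order, the keys that have no typed handling.
func (o Options) Unknown() []string {
	var keys []string
	for k := range o {
		if !typedKeys[k] {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	opts := Options{"encoding": "utf-8", "xxx": "yyy"}
	enc, err := opts.Encoding()
	fmt.Println(enc, err, opts.Unknown())
}
```

With this shape, an unexpected option is preserved and can be reported or ignored by the
caller, while the validation functions written for the struct could carry over as methods.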

All functionality has been manually tested by importing this library on my machine.

I have also written tests for all of my functions. Note that some tests in
reader.go are currently commented out because they do not work. See the description
above for the questions behind them and the rationale for why they do not work.
