Implementation of GoDiffX up to Reader.go

Review Request #11872 — Created Nov. 10, 2021 and updated

jordanvandenbruel
DiffX
master
diffx

Review post for all of GoDiffX up to the Reader. A prior review was deprecated as
I made some changes to the basic boilerplate as I started implementing Reader.go.

There are a few known issues with the implementation at the moment. The most prevalent
is that the reader can not read files with encoded utf-16 or utf-32 encodings. The
reason for this is I did not know bytes.Buffer would cast those values down into binary
which then causes it to be encoded and decoded improperly. However, If I encode text into
utf-16 or utf-32 and store them in multiple bytes, then it will work properly. This is something
I plan to look at next and will likely require a large overhaul of the code.

I do have a few questions about my implementation that I would appreciate some advice on and that
I hope to have time to try and fix prior to the CANOSP pencil's down deadline. Both are in regards to
some commented out tests for reader.go.

The first is in regards to a slack question I had a few days ago,
but for TestWithNewlinesFileLFContentCRLF I still need help understanding why the python version
says the length of one option should be 709, as I find 712 with my program and that's what I count
too. I feel like I am accidentally miscounting the values of something.

The second is why the test with extra newlines passes when the very first lines
are newlines. By my understanding, this should cause the iterator to go into
readHeader and that should return some version of none and then it will break
out of the for loop, meaning there is no SectionDict in that iteration. Should I
build my program to just ignore newlines until I find content or should it cause an
error?

The third is about the option 'xxx' from the test with a large content. I had assumed
options were more standardized than metadata given they seemed to be about how to process
text. Because of this, I made a custom struct to hold the information so that I could write
some simple functions to ensure data was okay. Am I good to assume for the go implementation
that options will usually be something like format or encoding, or should I look to change
this to a more generic interface in the future?

All functionality has been manually tested by importing this library on my machine.

I have also written tests for all of my functions. One note is there are some tests in
reader.go that are currently commented out because they do not work. See the description
above for questions behind those or the rationale for them not working.

Summary Author
Create LICENSE and README for project
Jordan
Create basic support files for diffx.
Jordan
Write basic constants for options.go
Jordan
Wrote code for sections.go
Jordan
Update documentation for sections.go
Jordan
Write all custom errors for godiffx.
Jordan
Add documentation and fix line value for DiffXParseError
Jordan
Create text values and write SplitLines
Jordan
Fix Errors and Write StripBom Method.
Jordan
Created test files and wrote init tests for Errors.go
Jordan
Finish tests for errors.go
Jordan
Add testing for sections.go
Jordan
Wrote unit tests for finished functions in text.go
Jordan
Create new error for invalid encoding.
Jordan
Create encode method for text.go
Jordan
Add GetNewlineForType function in text.go
Jordan
Finish manually testing all functions for text.go
Jordan
Write tests for line ending methods.
Jordan
Wrote unit test for encoding function
Jordan
Test a few more edge cases in the code.
Jordan
Write initial documentation for unifiedDiffs.go.
Jordan
WIP update for unifiedDiffs.go
Jordan
Finished unified_diffs.go
Jordan
Write basics for unifiedDiffs testing and fix bugs encountered.
Jordan
Update unifiedDiff tests to ensure proper returns and tweaked bugs.
Jordan
Start progress on reader.go
Jordan
Decode function, ReadUntil function, and a few misc changes in Reader.go
Jordan
Finish readContent and readHeader for reader.go
Jordan
functioning reader with miscellaneous changes to better support it.
Jordan
Update tests after finished reader prototype.
Jordan
Wrote first test for reader.go.
Jordan
Write functional tests for reader.go with required bug fixes.
Jordan
Add errors and remove panics from reader.go.
Jordan
Update documetation and cleanup code.
Jordan
Description From Last Updated

Is this supposed to be used for future work, or did you forget to delete it?

akim.ruslanovakim.ruslanov

Perhaps consider writing the block comments like this: /* Everything here will be considered a block comment */ This can ...

akim.ruslanovakim.ruslanov

Wondering if there is a chance this could trigger index out of bounds?

akim.ruslanovakim.ruslanov

This function has a lot of nested if statements which can make it pretty hard to read. Perhaps refactoring it ...

akim.ruslanovakim.ruslanov

Accidentally deleted a comma?

akim.ruslanovakim.ruslanov
jordanvandenbruel
jordanvandenbruel
Review request changed

Summary:

-[WIP] Progress on reader.go
+Implementation of GoDiffX up to Reader.go

Description:

~  

This WIP contains all of the work for GoDiffX since there are some tweaks coming

~   to the utils as I actually put them into practice.

  ~

Review post for all of GoDiffX up to the Reader. A prior review was deprecated as

  ~ I made some changes to the basic boilerplate as I started implementing Reader.go.

   
~  

reader.go is almost finished, and I think it should be working properly. The

~   code is still a mess and partially documented, so I will need to go back and
~   clean that up.

  ~

There are a few known issues with the implementation at the moment. The most prevalent

  ~ is that the reader can not read files with encoded utf-16 or utf-32 encodings. The
  ~ reason for this is I did not know bytes.Buffer would cast those values down into binary
  + which then causes it to be encoded and decoded improperly. However, If I encode text into
  + utf-16 or utf-32 and store them in multiple bytes, then it will work properly. This is something
  + I plan to look at next and will likely require a large overhaul of the code.

   
~  

There have also been some small modifications to the base files I created as

~   I found better ways to implement them so that they are more useful for reader.go

  ~

I do have a few questions about my implementation that I would appreciate some advice on and that

  ~ I hope to have time to try and fix prior to the CANOSP pencil's down deadline. Both are in regards to
  + some commented out tests for reader.go.

  +
  +

The first is in regards to a slack question I had a few days ago,

  + but for TestWithNewlinesFileLFContentCRLF I still need help understanding why the python version
  + says the length of one option should be 709, as I find 712 with my program and that's what I count
  + too. I feel like I am accidentally miscounting the values of something.

  +
  +

The second is why the test with extra newlines passes when the very first lines

  + are newlines. By my understanding, this should cause the iterator to go into
  + readHeader and that should return some version of none and then it will break
  + out of the for loop, meaning there is no SectionDict in that iteration. Should I
  + build my program to just ignore newlines until I find content or should it cause an
  + error?

  +
  +

The third is about the option 'xxx' from the test with a large content. I had assumed

  + options were more standardized than metadata given they seemed to be about how to process
  + text. Because of this, I made a custom struct to hold the information so that I could write
  + some simple functions to ensure data was okay. Am I good to assume for the go implementation
  + that options will usually be something like format or encoding, or should I look to change
  + this to a more generic interface in the future?

Testing Done:

~  

Manually tested reader.go with the first python test case (test_with_simple_diff) and

  ~

All functionality has been manually tested by importing this library on my machine.

-   it works if you change the length from 692 to 696 (not sure why the python version is 692).
-   Insight into why my code isn't working and why the length should be 692 would be appreciated!

   
~  

Other updated files have not been tested.

  ~

I have also written tests for all of my functions. One note is there are some tests in

  + reader.go that are currently commented out because they do not work. See the description
  + above for questions behind those or the rationale for them not working.

Commits:

Summary Author
-
Create LICENSE and README for project
Jordan
-
Create basic support files for diffx.
Jordan
-
Write basic constants for options.go
Jordan
-
Wrote code for sections.go
Jordan
-
Update documentation for sections.go
Jordan
-
Write all custom errors for godiffx.
Jordan
-
Add documentation and fix line value for DiffXParseError
Jordan
-
Create text values and write SplitLines
Jordan
-
Fix Errors and Write StripBom Method.
Jordan
-
Created test files and wrote init tests for Errors.go
Jordan
-
Finish tests for errors.go
Jordan
-
Add testing for sections.go
Jordan
-
Wrote unit tests for finished functions in text.go
Jordan
-
Create new error for invalid encoding.
Jordan
-
Create encode method for text.go
Jordan
-
Add GetNewlineForType function in text.go
Jordan
-
Finish manually testing all functions for text.go
Jordan
-
Write tests for line ending methods.
Jordan
-
Wrote unit test for encoding function
Jordan
-
Test a few more edge cases in the code.
Jordan
-
Write initial documentation for unifiedDiffs.go.
Jordan
-
WIP update for unifiedDiffs.go
Jordan
-
Finished unified_diffs.go
Jordan
-
Write basics for unifiedDiffs testing and fix bugs encountered.
Jordan
-
Update unifiedDiff tests to ensure proper returns and tweaked bugs.
Jordan
-
Start progress on reader.go
Jordan
-
Decode function, ReadUntil function, and a few misc changes in Reader.go
Jordan
-
Finish readContent and readHeader for reader.go
Jordan
-
functioning reader with miscellaneous changes to better support it.
Jordan
+
Create LICENSE and README for project
Jordan
+
Create basic support files for diffx.
Jordan
+
Write basic constants for options.go
Jordan
+
Wrote code for sections.go
Jordan
+
Update documentation for sections.go
Jordan
+
Write all custom errors for godiffx.
Jordan
+
Add documentation and fix line value for DiffXParseError
Jordan
+
Create text values and write SplitLines
Jordan
+
Fix Errors and Write StripBom Method.
Jordan
+
Created test files and wrote init tests for Errors.go
Jordan
+
Finish tests for errors.go
Jordan
+
Add testing for sections.go
Jordan
+
Wrote unit tests for finished functions in text.go
Jordan
+
Create new error for invalid encoding.
Jordan
+
Create encode method for text.go
Jordan
+
Add GetNewlineForType function in text.go
Jordan
+
Finish manually testing all functions for text.go
Jordan
+
Write tests for line ending methods.
Jordan
+
Wrote unit test for encoding function
Jordan
+
Test a few more edge cases in the code.
Jordan
+
Write initial documentation for unifiedDiffs.go.
Jordan
+
WIP update for unifiedDiffs.go
Jordan
+
Finished unified_diffs.go
Jordan
+
Write basics for unifiedDiffs testing and fix bugs encountered.
Jordan
+
Update unifiedDiff tests to ensure proper returns and tweaked bugs.
Jordan
+
Start progress on reader.go
Jordan
+
Decode function, ReadUntil function, and a few misc changes in Reader.go
Jordan
+
Finish readContent and readHeader for reader.go
Jordan
+
functioning reader with miscellaneous changes to better support it.
Jordan
+
Update tests after finished reader prototype.
Jordan
+
Wrote first test for reader.go.
Jordan
+
Write functional tests for reader.go with required bug fixes.
Jordan
+
Add errors and remove panics from reader.go.
Jordan
+
Update documetation and cleanup code.
Jordan

Diff:

Revision 3 (+8839 -753)

Show changes

Checks run (2 succeeded)

flake8 passed.
JSHint passed.
akim.ruslanov
  1. 
      
  2. go/godiffx/options.go (Diff revision 3)
     
     
     
     

    Is this supposed to be used for future work, or did you forget to delete it?

    1. It was past work, thanks for the catch!

  3. go/godiffx/reader.go (Diff revision 3)
     
     
     
     
     
     
     
     
     
     
     
     

    Perhaps consider writing the block comments like this:
    /* Everything here will be considered a block comment */
    This can make it consistent witht the way comments are written in JS components

    1. This is the format that header comments are supposed to be written for Go functions so that when you use GoDocs, it automatically compiles them. In all the examples I saw, multiline comments were kept like this. I'm just basing that off of this though: https://go.dev/blog/godoc. It's something I'll try and look into to see if it can be changed to the other format!

  4. go/godiffx/sections_test.go (Diff revision 3)
     
     
     
     
     
     
     
     
     
     

    Wondering if there is a chance this could trigger index out of bounds?

    1. It should not assuming the test works properly. If it didn trigger an out-of-bound, it would mean my init isn't working properly anyways

  5. go/godiffx/unifedDiffs.go (Diff revision 3)
     
     

    This function has a lot of nested if statements which can make it pretty hard to read. Perhaps refactoring it with helper functions can help

    1. that's a good point, it might be a good idea to refactor it and pull out the functionality for different types of headers

  6. Accidentally deleted a comma?

    1. Good catch, not sure if I'll have to change this as it relates to a bugfix I did within the Python code and I'm not sure if that change was shipped yet

  7. 
      
Loading...