Summary

Implementation of GoDiffX up to Reader.go

Review Request #11872: created Nov. 10, 2021; updated 3 years, 7 months ago; latest diff uploaded 3 years, 7 months ago

Information

Owner

jordanvandenbruel

Repository

DiffX

Branch

master

Bugs

Depends On

Reviewers

Groups

diffx

People

Description

Review post for all of GoDiffX up to the reader. A prior review request was discarded because I changed some of the basic boilerplate when I started implementing reader.go.

There are a few known issues with the implementation at the moment. The most significant is that the reader cannot read files encoded in UTF-16 or UTF-32. As far as I can tell, this is because I did not realize that bytes.Buffer only holds raw bytes: text written into it from a Go string or rune is stored as UTF-8, so UTF-16 and UTF-32 content ends up being encoded and decoded improperly. However, if I encode text into UTF-16 or UTF-32 myself and store each code unit as multiple bytes, it works properly. This is something I plan to look at next, and it will likely require a large overhaul of the code.

I have a few questions about my implementation that I would appreciate some advice on, and that I hope to have time to fix before the CANOSP pencils-down deadline. All three concern tests for reader.go that are currently commented out.

The first follows up on a Slack question I asked a few days ago. For TestWithNewlinesFileLFContentCRLF, I still need help understanding why the Python version says one of the length options should be 709, when my program finds 712 and that is also what I count by hand. I suspect I am miscounting something.
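One way to narrow down a mismatch like 709 versus 712 is to print the raw byte length of the content alongside how many CRLF and bare LF line endings it contains; if the difference lines up with one of those counts, the two versions are probably counting line endings differently. The helper below is a hypothetical debugging aid, not part of GoDiffX or pydiffx, and the sample content is illustrative only.

```go
package main

import (
	"bytes"
	"fmt"
)

// lengthReport is a hypothetical debugging type (not part of GoDiffX).
type lengthReport struct {
	Bytes  int // total raw byte length, as a length option would count it
	CRLF   int // number of "\r\n" line endings
	BareLF int // number of "\n" line endings not preceded by "\r"
}

// reportLength summarizes a content block's length and line endings.
func reportLength(content []byte) lengthReport {
	crlf := bytes.Count(content, []byte("\r\n"))
	lf := bytes.Count(content, []byte("\n")) // includes the LF of each CRLF
	return lengthReport{Bytes: len(content), CRLF: crlf, BareLF: lf - crlf}
}

func main() {
	content := []byte("line one\r\nline two\r\nline three\n")
	fmt.Printf("%+v\n", reportLength(content))
}
```

Running this against the exact bytes of the section's content in both the Go and Python tests should show whether the extra bytes are line-ending characters or something else.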

The second is why the test with extra newlines (TestWithExtraNewlines) is expected to pass when the very first lines are newlines. By my understanding, this should cause the iterator to go into readHeader, which should return some form of empty result, and the iterator will then break out of the for loop, meaning there is no SectionDict for that iteration. Should my program ignore newlines until it finds content, or should they cause an error?

The third is about the option 'xxx' in the test with large content. I had assumed options were more standardized than metadata, given that they seem to describe how to process the text. Because of this, I made a custom struct to hold them so that I could write some simple functions to validate the data. Is it safe to assume for the Go implementation that options will usually be things like format or encoding, or should I change this to a more generic interface in the future?
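As an illustration of the more generic direction, one option is to keep every option in a map and add typed accessors only for the keys the reader cares about, so unknown keys like 'xxx' are preserved instead of rejected. This is a sketch under that assumption; the Options type and its methods are hypothetical and are not taken from pydiffx.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Options is a hypothetical, more generic alternative to a fixed options
// struct. Every key/value pair parsed from a header is kept, including
// unrecognized keys such as "xxx", and typed accessors cover known keys.
type Options map[string]string

// Encoding returns the "encoding" option and whether it was present.
func (o Options) Encoding() (string, bool) {
	v, ok := o["encoding"]
	return v, ok
}

// Length parses the "length" option as a non-negative integer.
func (o Options) Length() (int, error) {
	v, ok := o["length"]
	if !ok {
		return 0, errors.New("missing length option")
	}
	n, err := strconv.Atoi(v)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid length option %q", v)
	}
	return n, nil
}

func main() {
	// Illustrative values only; "xxx" survives even though nothing reads it.
	opts := Options{"encoding": "utf-8", "length": "712", "xxx": "some-value"}

	enc, _ := opts.Encoding()
	n, err := opts.Length()
	fmt.Println(enc, n, err, opts["xxx"])
}
```

The trade-off is that validation moves from the struct's shape into the accessors, but new or unexpected option keys no longer require changing the type.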

Testing Done

All functionality has been manually tested by importing this library on my machine.

I have also written tests for all of my functions. Note that some tests in reader_test.go are currently commented out because they do not work. See the description above for the questions behind them and the reasons they do not work.

Diff Revision 3 (Latest)

Commits

Summary	ID	Author
Create LICENSE and README for project. First commit for godiffx; just boilerplate based on pydiffx. Testing done: N/A	4552441df26f5bc1f71e0957b915eb5d294e2c88	Jordan
Create basic support files for diffx. Created errors.go, sections.go, options.go, and text.go. These will all be based on their Python equivalents from pydiffx. At the moment, there is no code within them besides the declaration of the package they belong to. No functionality was introduced, so no testing was done.	53615b11759d77a06b2a0d6aa8f387beeb1eced7	Jordan
Write basic constants for options.go. Added the basic constants to options.go based on what is in options.py for pydiffx. This was accomplished with iota-based enums in Go. The implementation may be tweaked over time once I actually need to start using them. They were tested in another Go file. This was also the first time I tried testing the godiffx module locally, and it worked!	ed55e923e78ead35565ee3f344a234516cfc3921	Jordan
Wrote code for sections.go. sections.go is based on sections.py, and the functionality was replicated as closely as possible. One hiccup was that Go does not support sets natively, but an easy workaround is to use maps and set all of the keys to true. The code was tested by creating the struct in another file that imported this module. The default parameter method properly adds the default values, and the getters all function as expected.	3659b78ffe8b8cc27d23a5d1f497f2bb0ab00a21	Jordan
Update documentation for sections.go. Basic documentation via comments was added to the code in sections.go. No new functionality was introduced, so there was no additional testing.	398e1cd37ede5380265ee716bb93f7217bfa9996	Jordan
Write all custom errors for godiffx. These are based on the errors that were created for pydiffx. Go does not use errors in the same way, but an error struct was created, and an error can be returned whenever other functions need one. One note is that at the moment there is no Unicode support, since I am not very familiar with it or with implementing it, but I assume it will be as simple as changing the type and possibly importing a library in the future. This was tested by importing the module locally and testing each of the different types of errors. The init function was validated to ensure it would not overwrite an existing error, and the Err function was tested to ensure it printed properly. I also tested to ensure the baseDiffXError was not accessible.	37da0122f047f79a5cb92e0aeba4d0b5711196ec	Jordan
Add documentation and fix line value for DiffXParseError. Basic documentation was added to all of the functions and errors. They are all pretty simple definitions based on what was used in the pydiffx version. The way line and column values are treated in DiffXParseError was also modified. Originally, the values were used exactly as passed in. However, pydiffx assumes these are 0-based, so the message increments them by 1. This change was tested by importing the module locally in another file.	b9ade0ac8f01e150291ac24e2bebc38b01276cbc	Jordan
Create text values and write SplitLines. Following the example from text.py, the maps have been created for godiffx as well. One thing to note is that the utils folder is now gone, because in a Go module each nested directory is a separate package. Another change is that instead of using a map for the objects like in Python, structs were created along with a way to initialize their default values. The reasoning is that BOMS uses tuples to store multiple values in pydiffx, but tuples don't exist in Go. The SplitLines method was also written. This took a fair bit of time while I experimented and learned about the best way to handle bytes. bytes.Buffer seems to be an efficient data structure that comes with a number of useful methods. Time will tell if this is the right way to go! All of the objects have been tested in a separate project that uses this module, and SplitLines behaves the way I believe it should.	ddb24d486e20a4f3365cb1186c647575d2ff3458	Jordan
Fix Errors and Write StripBom Method. I made a silly mistake when creating the errors by naming the error methods 'Err()' instead of 'Error()', so the types did not satisfy Go's error interface; that has been fixed. I also wrote the StripBom method, which removes a given BOM from a byte buffer if it exists. To do this, I made a method that retrieves the BOM for a given encoding name. I tested the methods again in main. To test StripBom, I created a buffer that held "\uFEFF is a string that starts with a Byte Order Mark" and compared its length before and after running the function. The buffer that the function returns is shorter, but the buffer passed in is not mutated.	87a9119601141817aa81cef3e8b163a7d20b4f7b	Jordan
Created test files and wrote init tests for errors.go. All of the test files were created and can be run with the command 'go test'. I have written all of the init tests for errors.go except for the base type, since I am not 100% sure how to test private methods. However, I don't think it is too important since all of the other test cases work.	0576404dacb4b9f6ac9f5eceb9c29ee97c022b2a	Jordan
Finish tests for errors.go. Finished writing the rest of the tests for errors.go, specifically all of the tests for the Error() methods. The only testing that was done was running the test suite and seeing all of the new tests pass.	40257068d62ca72958a447221529a04876ca908a	Jordan
Add testing for sections.go. The testing itself is pretty trivial, since it just ensures the init methods work properly and that the set functions return the proper sets. The ValidSectionStates function has a very long test since there are so many different pieces that need to be checked. I am not sure whether this can be simplified in the future.	cb86e91b7b51ac9be3ed52c35fd9aab20baefa90	Jordan
Wrote unit tests for finished functions in text.go. While there is still a bit of work to be done for text.go, the unit tests for the finished functions have now been added. These tests try to cover every edge case I can think of, but suggestions are welcome! I also do not know if there is a better way to organize these tests. Currently, each test is rather large and checks a number of small cases for one function. It seems like Go tests usually have one test per function instead of multiple tests for different edge cases of a function, but if I find something new then I will make adjustments. The only other strategy I saw is to make a struct with test data, but I feel it would bloat the code a bit due to the number of byte buffers needed.	6f73cf4159218d416aa97234e19481982e6c2c94	Jordan
Create new error for invalid encoding. Since it seemed relatively straightforward to write my own encoding and I didn't find any great Go modules with a quick search, I have started creating a function to encode text in a given UTF scheme. It makes sense to begin by creating an error that can be returned in case someone provides an invalid encoding string. A Go test has been created as well, in the same manner as all of the other error tests, to ensure InvalidEncodingError works properly.	13bebc89fe5764f1da87e795230cab2eaddd1cb4	Jordan
Create encode method for text.go. I created an encode function based on what Python does. I couldn't quickly find any libraries that seemed to cover the different UTF variants. I based the implementation on the return values for Unix and DOS newlines. I have not created tests for the function yet, but I manually tested it by using the function in another project and comparing the output to Python's.	b5ef19b54466ccd7e3b3a961188974203b0c15f4	Jordan
Add GetNewlineForType function in text.go. This function is the same as the version in pydiffx. One note is that it returns a byte slice instead of a buffer, which I thought made more sense since the result will only be a few bytes at most. No tests have been written for this function yet; they will be added later. It was manually tested with UTF-8 and UTF-16.	0951ce3c23a99b2d1b5e53128ebdd85feb28e413	Jordan
Finish manually testing all functions for text.go. All functions in text.go have now been written and manually tested. I thought I had things working on Saturday, but there was still an issue related to an EOF error that I had to sort out. Manual testing was done with a buffer containing both Unix and DOS newlines, as well as an empty buffer, for which Unix newlines are assumed. No tests have been written yet for guessing or getting newlines.	5678c06d947b2c40ff5e73814a2ae1af36812da4	Jordan
Write tests for line ending methods. This commit just adds a couple tests to ensure the GuessLineEnding and GetNewlineForType functions work properly.	3225e18cd1e33cf1f473cedfff7eb37fdb9df982	Jordan
Wrote unit test for encoding function. Just a quick unit test for the encoding function. It's very inclusive and covers every possible return value. I noticed that Go reports only ~90% coverage for my tests, so I will likely spend some time trying to identify the edge cases I missed.	1a455560529844e0eb504cb130b8136534fbf337	Jordan
Test a few more edge cases in the code. The Go test coverage is now at 98%, so almost everything is accounted for. However, there are a couple of pieces that I'm not sure if I can test. These are mainly catching an EOF error from ReadByte() functions because I use a function before that should ensure it never sees this issue. I don't know how I could test some of the pieces of code even with a mocker but I am very confident that those small areas should never have an issue. Their error is handled somehow earlier by another method, but it is still good style to double check that error in the code.	6e56198f6904b19e5d0de6b4bc1009313bc234bc	Jordan
Write initial documentation for unifiedDiffs.go. There isn't any functionality in this commit; it mainly gets all of the documentation down for this file. At the top, I did create the regex, but it is commented out since it is not being used yet and I am a bit unsure of what the expression is doing (I've asked a question on Slack). Because Go is not as flexible as Python, I cannot just create a dictionary holding whatever I would like, so I had to problem-solve a bit to figure out what I could return. The solution I am going with for now is to create a couple of structs that can hold all relevant information. This is the way I've seen Go handle nested JSON before, so I figure it is an appropriate approach. There is no testing since this commit does not add any functionality.	705e1aef8ebc6cb0211aa9be22940aef383e5dc1	Jordan
WIP update for unifiedDiffs.go. unifiedDiffs.go now has the start of its functionality; namely, it can properly handle a hunk header. There was a lot of experimentation to understand how regexp works in Go, along with how the implementation of unified_diffs.py works. I have a pretty good idea of how the whole Python program works now, so it is just a matter of figuring out how best to translate it. The only testing that has been done is to ensure the hunk is initially created properly (which it is). No tests have been written yet.	96622a38479aa2cf29a4823e54b2d4dca1ff81f4	Jordan
Finished unified_diffs.go. Finished writing all of the code for unified_diffs.go. You can pass it a list of byte buffers and it returns the proper result. So far, it has only been tested with the first test case from the Python version, but I will write tests next that include the other cases to ensure the functionality is good.	184fc152fcd8cff3ac006902960d403363c2a01d	Jordan
Write basics for unifiedDiffs testing and fix bugs encountered. All of the tests that need to be written have been created for unifiedDiffs.go based on pydiffx. Because it takes so long to write out all of the different test cases and because it does not seem as straightforward to compare my custom structs as I had hoped, the testing is currently very minimal. Most test cases ensure that the correct number of hunks are returned or the proper error. There are also a lot of other small tweaks to other files like unifiedDiffs.go and errors.go as I had small mistakes in my understanding of how the Python version was supposed to work.	27fdcf5139556f0750682ce728bf285f8b80c89a	Jordan
Update unifiedDiff tests to ensure proper returns and tweaked bugs. All of the testing is finally complete for unifiedDiffs.go. I had to actually validate all of the UnifiedDiffHunks that were returned to ensure the values were correct. In the process, I found a few minor bugs that I was able to fix up.	430153ba69177c03209a3bc7a382d3379e911e53	Jordan
Start progress on reader.go. This commit is not a finished product, but rather shows the progress so far on reader.go. It's been challenging to find an appropriate way to create a parser where there may be different items in each iteration. Unlike Python dictionaries, Go maps have a single value type, so a more robust struct must be created to hold all of the information. None of this has really been tested since there is no finished product, but I am working out the parts needed to get this to work via small manual tests.	76790c524e9eeb931d5ae470ef7d55323b822d8e	Jordan
Decode function, ReadUntil function, and a few misc changes in reader.go. This commit contains a number of scattered changes due to the way I am trying to follow the Python code's functionality in Go. I have completed the ReadUntil function and it seems to behave properly, but no tests have been written for it. I have also created a Decode function, similar to the Encode one, to replicate the process used in the Python version. There was a module that helped with this; it is set up so that a function creates a decoder object for the required encoding, uses it to decode the given data, and returns the result. It does technically rely on two modules because UTF-32 is not well supported, but the module itself seems good. I have only written a test for Decode, to ensure it works and to maintain test coverage for all previous parts of the code. However, the test relies on the encode function, which is not ideal. It is not a priority to fix this right away since there is still a lot of work to be done on the reader and writer.	076b3932b5f7231f88a52a5e65ba027ae43155f6	Jordan
Finish readContent and readHeader for reader.go. All of the code except for the iterator itself has been written. All the methods have been manually tested, but there may still be bugs since only a few small, simple checks were run. No automated tests have been written yet.	c839e4703409c1fa4721185a74b5d48b259ce9f5	Jordan
Functioning reader with miscellaneous changes to better support it. This commit has (hopefully) a working reader. It has been tested with the first test case from the Python implementation, and works if the length is changed from 692 to 696. Many of the supporting files were also modified to better support the reader. Manual testing has been done with the reader for that one test case, but none of the other changes have been tested (meaning their test cases haven't been updated either).	fa07825e8b16f277c030a835013fd0de2970fe80	Jordan
Update tests after finished reader prototype. This is just a quick commit to update some of the existing tests that became outdated due to small tweaks to the codebase. There will likely still be a few more tweaks as I implement all of the tests for reader.go.	bca2efee0dbf1167b492aa0238ab12fd89ea1087	Jordan
Wrote first test for reader.go. This commit just serves as the starting point for the testing. There are so many pieces and a lot of interesting issues that came up while trying to verify the contents of a SectionDict. A few minor tweaks were made to the reader code to better fit the testing requirements. The next commit should contain a lot more tests, and will likely contain all of the tests excluding errors assuming that no tests fail because of a major bug.	1aa4e61a37071af1391cfd3a0de5c81762db132f	Jordan
Write functional tests for reader.go with required bug fixes. This commit contains all of the tests for reader.go that do not test errors. However, many are commented out or not fully implemented. One is due to a length issue that I am not sure how to resolve yet. Two are the UTF encoding tests: while my encoding and decoding functions work when passed a byte object, I am not sure how to read in UTF-16 and UTF-32, since the data does not store properly in a byte slice or a bytes.Buffer. Another test I haven't implemented is TestWithExtraNewlines, since I don't fully understand why the test should work in the first place. The final test I still have to fully implement is TestWithHeaderLongLine, because I did not realize options should be an interface and not their own struct. Numerous bugs also came up while implementing nearly every test. I decided it was easier to get as much done as possible since I still had not heard back about my first question on Slack. The goal before pencils-down is to implement all the errors properly and resolve most of the issues.	87e048d880e2f57c4afe0516f3c5b9b8abbd9690	Jordan
Add errors and remove panics from reader.go. This commit just replaces all of my previous throwaway panic statements with the proper errors.	66523cd80ef09332e54dd7d95c090c388e75bfd5	Jordan
Update documentation and clean up code. This commit updates all documentation that had to be completed, removes some debugging print statements, and tidies some other comments. With this commit, a V0.1 of GoDiffX could be released under the alias '1.0'. While the reader is missing some functionality, such as reading UTF-16 and UTF-32 as Unicode characters, and has some other small issues that need debugging, it is relatively stable and can parse the majority of DiffX files.	158d83567749a7b1586e95b7bb6cdbb6c402d52e	Jordan

Files

	go/LICENSE
	go/README.md
	go/godiffx/errors.go
	go/godiffx/errors_test.go
	go/godiffx/go.mod
	go/godiffx/go.sum
	go/godiffx/options.go
	go/godiffx/reader.go
	go/godiffx/reader_test.go
	go/godiffx/sections.go
	go/godiffx/sections_test.go
	go/godiffx/text.go
	go/godiffx/text_test.go
	go/godiffx/unifedDiffs.go
	go/godiffx/unifiedDiffs_test.go
	python/pydiffx/tests/test_utils_unified_diffs.py