Add better MySQL compatibility, new arguments, and fixes to condensediffs.
Review Request #10909 — Created Feb. 21, 2020 and submitted
condensediffs
has been around since Review Board 1.7, and has changed
a bit over the years to accommodate changes to our diff storage, but has
largely still remained the same command. This is the first major change
to condensediffs and to the diff migration machinery since the
introduction of the compressed diff storage in Review Board 2.5.When we moved to the new storage mechanism and updated the migration
process, we uncovered a bug that affected MySQL users using InnoDB for
thediffviewer_filediffdata
table (in other words, most MySQL users)
where fetching a row count on that table could take a very long time.
This is due to the fact that InnoDB does not store accurate row counts,
requiring that the table is walked every time a count is needed
(database-side query caching helps to mitigate this), and this can cause
some serious issues with large databases.In order to address that, and to improve migration overall for large
databases, this change makes a number of improvements tocondensediffs
and to the overall diff migration process.
On MySQL, we now use an alternative method to fetch row counts. On
MyISAM, this will be accurate, but on InnoDB it will be an
approximation (which may be quite wrong). Still, it's fast. We warn
the user about this when it happens, and tell them that it may be
inaccurate but it's safe.Row counts are no longer a factor in the migration process. They are
now purely informative, used for progress reporting. That means that
an inaccurate row count on InnoDB won't break anything.
condensediffs
can now be told to perform a migration without even
fetching any counts by using--no-progress
. This provides a simple
"XX diffs migrated" status message, and is suitable for running in
scheduled migration operations.
condensediffs
can also now be told to migrate only a specified
number of diffs by using--max-diffs=<value>
. This is also helpful
for scheduled migrations, when you may want to process a certain
number per night while the server isn't being used as heavily.It can also now show the total number of unmigrated
FileDiff
and
LegacyFileDiffData
objects, allowing people to see what's left
without having to run a migration and Control-C it.Some legacy code around
FileDiff
migrations weren't showing
accurate diff savings (showing 0% or close to it). The calculations
made sense when converting to theLegacyFileDiffData
, but not
RawFileDiffData
. They've been updated to show the savings from that
conversion.Calculating the number of unmigrated
FileDiff
s is now faster. We're
no longer trying to compare diff content against empty strings, and
instead just comparing references to the diff data tables.Documentation has also been updated to better help understand the code
and processes.
Populated a database with thousands of
FileDiff
andLegacyFileDiffData
objects. Performed dozens and dozens of migrations against copies of this
database, using all the new command line arguments to test that the
functionality all worked as expected.Tested
--max-diffs
with 0, with values that align with the batch and
object fetch limit boundaries, with the total number ofFileDiff
s,
total number ofLegacyFileDiffData
s, the total number for both, values
that surpass that, and all sorts of numbers in-between (including funky,
non-round numbers). Verified that the requested number never broke
anything, and that that number of diffs was always migrated (unless it
was more than the number in the database).Tested progress reporting with a bad estimate (both under and over the
actual row count). Verified the finalized progress update showed 100%
and reflected the actual number of diffs processed.Tested the row calculation on MySQL, and that the warning was displayed
when using InnoDB.