Add better MySQL compatibility, new arguments, and fixes to condensediffs.

Review Request #10909 — Created Feb. 21, 2020 and submitted — Latest diff uploaded

Information

Review Board
release-3.0.x
0a0acb8...

Reviewers

condensediffs has been around since Review Board 1.7, and has changed
a bit over the years to accommodate changes to our diff storage, but has
largely still remained the same command. This is the first major change
to condensediffs and to the diff migration machinery since the
introduction of the compressed diff storage in Review Board 2.5.

When we moved to the new storage mechanism and updated the migration
process, we uncovered a bug that affected MySQL users using InnoDB for
the diffviewer_filediffdata table (in other words, most MySQL users)
where fetching a row count on that table could take a very long time.
This is due to the fact that InnoDB does not store accurate row counts,
requiring that the table is walked every time a count is needed
(database-side query caching helps to mitigate this), and this can cause
some serious issues with large databases.

In order to address that, and to improve migration overall for large
databases, this change makes a number of improvements to condensediffs
and to the overall diff migration process.

  1. On MySQL, we now use an alternative method to fetch row counts. On
    MyISAM, this will be accurate, but on InnoDB it will be an
    approximation (which may be quite wrong). Still, it's fast. We warn
    the user about this when it happens, and tell them that it may be
    inaccurate but it's safe.

  2. Row counts are no longer a factor in the migration process. They are
    now purely informative, used for progress reporting. That means that
    an inaccurate row count on InnoDB won't break anything.

  3. condensediffs can now be told to perform a migration without even
    fetching any counts by using --no-progress. This provides a simple
    "XX diffs migrated" status message, and is suitable for running in
    scheduled migration operations.

  4. condensediffs can also now be told to migrate only a specified
    number of diffs by using --max-diffs=<value>. This is also helpful
    for scheduled migrations, when you may want to process a certain
    number per night while the server isn't being used as heavily.

  5. It can also now show the total number of unmigrated FileDiff and
    LegacyFileDiffData objects, allowing people to see what's left
    without having to run a migration and Control-C it.

  6. Some legacy code around FileDiff migrations weren't showing
    accurate diff savings (showing 0% or close to it). The calculations
    made sense when converting to the LegacyFileDiffData, but not
    RawFileDiffData. They've been updated to show the savings from that
    conversion.

  7. Calculating the number of unmigrated FileDiffs is now faster. We're
    no longer trying to compare diff content against empty strings, and
    instead just comparing references to the diff data tables.

Documentation has also been updated to better help understand the code
and processes.

Populated a database with thousands of FileDiff and LegacyFileDiffData
objects. Performed dozens and dozens of migrations against copies of this
database, using all the new command line arguments to test that the
functionality all worked as expected.

Tested --max-diffs with 0, with values that align with the batch and
object fetch limit boundaries, with the total number of FileDiffs,
total number of LegacyFileDiffDatas, the total number for both, values
that surpass that, and all sorts of numbers in-between (including funky,
non-round numbers). Verified that the requested number never broke
anything, and that that number of diffs was always migrated (unless it
was more than the number in the database).

Tested progress reporting with a bad estimate (both under and over the
actual row count). Verified the finalized progress update showed 100%
and reflected the actual number of diffs processed.

Tested the row calculation on MySQL, and that the warning was displayed
when using InnoDB.