• 
      

    Add better MySQL compatibility, new arguments, and fixes to condensediffs.

    Review Request #10909 — Created Feb. 21, 2020 and submitted — Latest diff uploaded

    Information

    Review Board
    release-3.0.x
    0a0acb8...

    Reviewers

    condensediffs has been around since Review Board 1.7, and has changed
    a bit over the years to accommodate changes to our diff storage, but has
    largely still remained the same command. This is the first major change
    to condensediffs and to the diff migration machinery since the
    introduction of the compressed diff storage in Review Board 2.5.

    When we moved to the new storage mechanism and updated the migration
    process, we uncovered a bug that affected MySQL users using InnoDB for
    the diffviewer_filediffdata table (in other words, most MySQL users)
    where fetching a row count on that table could take a very long time.
    This is due to the fact that InnoDB does not store accurate row counts,
    requiring that the table is walked every time a count is needed
    (database-side query caching helps to mitigate this), and this can cause
    some serious issues with large databases.

    In order to address that, and to improve migration overall for large
    databases, this change makes a number of improvements to condensediffs
    and to the overall diff migration process.

    1. On MySQL, we now use an alternative method to fetch row counts. On
      MyISAM, this will be accurate, but on InnoDB it will be an
      approximation (which may be quite wrong). Still, it's fast. We warn
      the user about this when it happens, and tell them that it may be
      inaccurate but it's safe.

    2. Row counts are no longer a factor in the migration process. They are
      now purely informative, used for progress reporting. That means that
      an inaccurate row count on InnoDB won't break anything.

    3. condensediffs can now be told to perform a migration without even
      fetching any counts by using --no-progress. This provides a simple
      "XX diffs migrated" status message, and is suitable for running in
      scheduled migration operations.

    4. condensediffs can also now be told to migrate only a specified
      number of diffs by using --max-diffs=<value>. This is also helpful
      for scheduled migrations, when you may want to process a certain
      number per night while the server isn't being used as heavily.

    5. It can also now show the total number of unmigrated FileDiff and
      LegacyFileDiffData objects, allowing people to see what's left
      without having to run a migration and Control-C it.

    6. Some legacy code around FileDiff migrations weren't showing
      accurate diff savings (showing 0% or close to it). The calculations
      made sense when converting to the LegacyFileDiffData, but not
      RawFileDiffData. They've been updated to show the savings from that
      conversion.

    7. Calculating the number of unmigrated FileDiffs is now faster. We're
      no longer trying to compare diff content against empty strings, and
      instead just comparing references to the diff data tables.

    Documentation has also been updated to better help understand the code
    and processes.

    Populated a database with thousands of FileDiff and LegacyFileDiffData
    objects. Performed dozens and dozens of migrations against copies of this
    database, using all the new command line arguments to test that the
    functionality all worked as expected.

    Tested --max-diffs with 0, with values that align with the batch and
    object fetch limit boundaries, with the total number of FileDiffs,
    total number of LegacyFileDiffDatas, the total number for both, values
    that surpass that, and all sorts of numbers in-between (including funky,
    non-round numbers). Verified that the requested number never broke
    anything, and that that number of diffs was always migrated (unless it
    was more than the number in the database).

    Tested progress reporting with a bad estimate (both under and over the
    actual row count). Verified the finalized progress update showed 100%
    and reflected the actual number of diffs processed.

    Tested the row calculation on MySQL, and that the warning was displayed
    when using InnoDB.