Summary

[Performance] Improve parsing of huge diffs (hours -> minutes)

Review Request #8075 — Created March 25, 2016 and discarded Nov. 17, 2017, 11:27 a.m. — Latest diff uploaded March 29, 2016, 2:22 a.m.

Information

Owner

mizhka

Repository

Review Board

Reviewers

Groups

reviewboard

Description

This patch is simple but significantly reduces parsing time for huge diffs (size > 10 MB, more than 500K lines). The current implementation builds each file's diff by concatenating strings line by line, so it consumes O(N*S) memory, where N is the number of lines per file in the diff and S is the size of that file's diff. The proposed implementation collects the lines into a list and joins them into one string at the end, so it requires less memory: O(N).
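The difference between the two approaches described above can be sketched as follows. This is a minimal illustration, not the actual Review Board parser code: the function names `concat_incremental` and `concat_with_join` are hypothetical, and the input is assumed to be a list of lines without trailing newlines.

```python
def concat_incremental(lines):
    """Build the result by repeated concatenation (the pattern being replaced).

    Each += may allocate a new string and copy everything accumulated so far,
    so the total amount of copying grows with both line count and diff size.
    """
    data = ""
    for line in lines:
        data += line + "\n"
    return data


def concat_with_join(lines):
    """Collect the pieces in a list and join them once at the end.

    Appending to a list only stores a reference to each line; the final
    string is allocated a single time by str.join.
    """
    parts = []
    for line in lines:
        parts.append(line)
        parts.append("\n")
    return "".join(parts)
```

Both functions return the same string, so the change should not affect parsing results. CPython can sometimes optimize in-place string appends, but that behavior is an implementation detail and is not guaranteed, which is one reason the list-and-join form is usually preferred for large inputs.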

The issue was found on a huge diff file (500K lines, 30 MB). The original processing ran for around 30 minutes before the process was killed by the Linux OOM killer. With the fix, processing finished successfully in a few minutes (2-3).

For reference, cProfile and perf graphs are attached to this review request: the parse_diff_header method consumes 95% of the time, and most of that time is spent on string concatenation.
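A profile of the kind mentioned above could be collected with the standard library's cProfile and pstats modules. The sketch below is an assumption about how such a measurement might be made, not the procedure actually used for the attached graphs; `profile_call` and `slow_parse` are hypothetical names, and `slow_parse` is only a stand-in for the real parser.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Render the ten most expensive entries, ordered by cumulative time.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


def slow_parse(lines):
    """Hypothetical stand-in for a parser that concatenates line by line."""
    data = ""
    for line in lines:
        data += line + "\n"
    return data
```

Calling `profile_call(slow_parse, lines)` on a large list of lines and printing the returned report shows which functions dominate the run time.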

Testing Done

The fix has been live in a production environment for the last 2 weeks. No regressions have been found.