Enabling lexer guessing

Review Request #3156 — Created June 22, 2012 and discarded — Latest diff uploaded


Review Board


Prolog is not Perl, and Pygments is smart enough to tell the two apart if we
give it a chance. Pygments' *_lexer_for_filename() functions can guess the lexer
based on two things:
- the filename
- the contents of the file, especially the shebang line
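To make those two inputs concrete, here is a toy, stdlib-only model of filename-then-content guessing. It is not Pygments' code: `ToyLexer`, `guess_lexer`, `LEXERS`, and the shebang scoring are hypothetical stand-ins. In Pygments the real entry points are `get_lexer_for_filename()` (which takes an optional `code` argument) and `guess_lexer_for_filename()`, and each lexer class supplies its own `analyse_text()` heuristic. The sketch assumes ties are broken by candidate order, which is one way a zero-score tie on '.pl' files can end up picking Prolog.

```python
import re
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Callable, List


@dataclass
class ToyLexer:
    """Hypothetical stand-in for a lexer class: a name, filename globs,
    and a content heuristic returning a score in [0.0, 1.0]."""
    name: str
    filenames: List[str]
    analyse_text: Callable[[str], float]


def _shebang_score(interpreter: str) -> Callable[[str], float]:
    # Score 1.0 when the first line is a shebang naming the interpreter,
    # e.g. "#!/usr/bin/perl" or "#!/usr/bin/env perl"; otherwise 0.0.
    pattern = re.compile(r"^#!.*\b%s\b" % re.escape(interpreter))

    def score(text: str) -> float:
        first_line = text.split("\n", 1)[0]
        return 1.0 if pattern.match(first_line) else 0.0

    return score


# Hypothetical registry; Prolog is listed first on purpose to show the tie.
LEXERS = [
    ToyLexer("Prolog", ["*.pl", "*.pro"], _shebang_score("swipl")),
    ToyLexer("Perl", ["*.pl", "*.pm"], _shebang_score("perl")),
    ToyLexer("Java", ["*.java"], lambda text: 0.0),
]


def guess_lexer(filename: str, text: str) -> ToyLexer:
    # Step 1: narrow the candidates by filename pattern.
    candidates = [lx for lx in LEXERS
                  if any(fnmatch(filename, pat) for pat in lx.filenames)]
    if not candidates:
        raise LookupError("no lexer for %r" % filename)
    if len(candidates) == 1:
        # Unambiguous extension: the contents are never inspected.
        return candidates[0]
    # Step 2: rank the remaining candidates by their content heuristic.
    # max() keeps the first of several equal scores, so with no usable
    # contents every score is 0.0 and list order alone decides.
    return max(candidates, key=lambda lx: lx.analyse_text(text))
```

With a Perl shebang, `guess_lexer("hello.pl", "#!/usr/bin/perl\nprint 'hi';\n")` picks Perl; with empty contents the same filename falls through to list order and picks Prolog, which is the failure mode this request is about.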

The latter of the two was disabled in commit '36b761a', which leaves Pygments to
guess from the filename alone and makes its syntax highlighting pretty dumb.

For instance, if you ever see red rectangles around '$' characters in Perl code,
that's what's happening: the file is being highlighted as Prolog. Both Prolog
and Perl use the '.pl' file extension, and without the file contents Pygments
flips a coin and half the time guesses Prolog (it ranks the candidates by
heuristic scores, which are both zero when there is no content to analyze).

If there's only one possible match (for instance, a '.java' extension), this
adds no runtime cost. If there is ambiguity (such as Perl vs. Prolog), guessing
is an O(n) pass over the file contents. To keep this step (or the
pygments.highlight() call that follows it) from taking too long, we're imposing
a two-second timeout.
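A two-second timeout could be enforced in several ways, and the request doesn't show which one the patch uses. One minimal sketch, assuming lexer guessing plus highlighting are wrapped in a single callable, runs that callable on a worker thread and falls back to escaped plain text once the deadline passes. `highlight_with_timeout`, `_POOL`, and the `html.escape` fallback are hypothetical names and choices, not Review Board or Pygments APIs.

```python
import concurrent.futures
import html
import time
from typing import Callable

# Hypothetical shared pool. A job that overruns the deadline keeps running in
# the background (Python threads cannot be killed), but the caller returns.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def highlight_with_timeout(render: Callable[[str], str], text: str,
                           timeout: float = 2.0) -> str:
    """Return render(text), or HTML-escaped plain text if it overruns."""
    future = _POOL.submit(render, text)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only has an effect if the job has not started yet
        return html.escape(text)


if __name__ == "__main__":
    def slow_render(s: str) -> str:
        time.sleep(1)  # simulates a pathologically slow lexer
        return "<span>" + s + "</span>"

    # Prints the escaped fallback: $x &lt; 1
    print(highlight_with_timeout(slow_render, "$x < 1", timeout=0.1))
```

The trade-off with a thread-based timeout is that it bounds request latency, not CPU use: an overrunning job still occupies a worker until it finishes.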

We're exercising this change on a Review Board 1.5 instance, where it has
slightly increased the ReviewBoardDiffFragment latency (p50 rose from 0.1 to
0.2 seconds), but that's about it. IMHO this latency cost is well worth a far
more readable diff (I've had syntax highlighting turned off for years because
it's distracting to read code that is incorrectly marked as full of syntax
errors).

Note that this patch itself has not been tested against master; we're running
an identical change against RB 1.5.