Enabling lexer guessing
Review Request #3156 — Created June 22, 2012 and discarded
Prolog is not perl, and pygments is smart enough to differentiate the two if we give it a chance. Pygment's *_lexer_for_filename() functions can guess the lexer based on two things... - the filename - the contents of the file, especially the shebang line The later of the two was disabled in commit '36b761a', making pygments pretty dumb about applying syntax highlighting. For instance, if you ever see red rectangles around '$' characters in perl code then that's what's happening. Both prolog and perl have a '.pl' file extension and without file contents pygments flips a coin and half the time guesses the former (it sorts based on heuristics, which are then both zero). If there's only one possible option (for instance, a '.java' extension) then this doesn't have any extra runtime cost. If there is ambiguity (such as perl vs prolog) then it's an O(n) operation over the file contents. To avoid having this (or the following pygments.highlight() call) take too long we're imposing a two second timeout. We're exercising this change with a ReviewBoard 1.5 instance and has slightly increased the ReviewBoardDiffFragment latency (p50 raised from 0.1 to 0.2 seconds), but that's about it. Imho this latency cost is well worth having a far more readable diff (I've had syntax highlighting turned off for years because it's distracting to read code incorrectly marked as being full of syntax errors). Note that this patch itself has not been tested against master - we're running an identical change against RB 1.5.
Description | From | Last Updated |
---|---|---|
Alphabetical order: Should be before the "subprocess" import. |
ME medanat | |
Use same line for import. i.e.: from pygments.lexers import get_lexer_for_filename, guess_lexer_for_filename ref: http://www.python.org/dev/peps/pep-0008/#imports |
ME medanat | |
No need for this empty line. |
ME medanat | |
Oops, minor mistake - lexer may be undefined or None at this point. |
AT atagar |
-
The reason we disabled this was that some file types (I recall XML files, particularly larger ones, being a key problem) take a *very* long time to guess. It's those file types that really caused the key problems causing us to switch. I'd appreciate some testing with XML files (ones with a couple hundred lines would be good), and Java files.
ME
-
Found a few issues that may need some attention.
-
-
Use same line for import. i.e.: from pygments.lexers import get_lexer_for_filename, guess_lexer_for_filename ref: http://www.python.org/dev/peps/pep-0008/#imports
-
-
These two calls can raise pygments.util.ClassNotFound. I'm not sure if you want to handle that at this point.
-
Not your code I know, but this should probably use: if hasattr(lexer, "add_filter") Rather than try/except AttributeError.
AT
- Change Summary:
-
Several changes, functional ones including... * Missing logging module import. I forgot that the branch I'm testing this with had other changes which included it. * NameError if the guess_lexer_for_filename() call timed out since lexer would then be undefined. * Moving alarm(0) call into finally clause so it's still ran when the prior code raises an exception. Our caller has apply_pygments() within a try/catch block but without this I'm pretty sure the alarm will then raise an unexpected TimeoutException later. Moved changes to a feature branch... https://github.com/atagar/ReviewBoard/tree/lexer_guessing
- Diff:
-
Revision 2 (+33 -13)