Summary

Implement Thumbnail Rendering for Text-ish File Attachments (.rst and .md)

Review Request #3454 — Created Oct. 24, 2012 and submitted Nov. 25, 2012, 11:56 a.m.

Information

Owner

slchen

Repository

Review Board

Branch

master

Bugs

Depends On

Reviewers

Groups

reviewboard

People

Description

Implemented thumbnail rendering for ReStructuredText (.rst) 
and MarkDown (.md) file attachment types. The rendered 
thumbnails are displayed as HTML scaled down by css (instead
of rendered as image thumbs), and are mem cached. 

Also changed the way MimetypeHandlers are registered to match
existing infrastructure for registering Review UIs, replacing
the old way of doing it through __subclass__

Testing Done

1. Ran unit tests for reviewboard.attachments.tests:

- Updated tests now all pass after adding setUp() and tearDown()
  to register / unregister the test Mimetype Handlers.

2. Manual testing on localhost:

Tested by uploading to a new review-request:
- raw .txt file
- raw .rst file
- raw .md file
- raw .jpg file
- raw .rst file with javascript injection
- raw .md file with javascript injection

Visual inspection of the rendered thumbnails all pass: 
- Thumbnails appear immediately (without a refresh), reflecting
  latest feature updates to master.
- Scaling, cropping and rendering all appear to be correct.
- Javascript injection correctly escaped for malicious
  .rst and .md files

See updated screenshots (2012-11-24, in the files attachment section)

Screenshots

Thumbnails for .rst / .md

2012-11-01: updated thumbnails screenshot

2012-11-21

Screenshot of Thumbnail Rendering on localhost 2012-11-23

Files

Issues

Description	From	Last Updated
imports are a bit wrong. The froms should all be together and the imports should all be together. They also …	chipx86	Oct. 26, 2012, 5:49 p.m.
This is probably not the right place to do this. We should probably define a new map below MIMETYPE_ICON_ALIASES that …	chipx86	Oct. 26, 2012, 5:49 p.m.
No blank line.	chipx86	Oct. 26, 2012, 5:51 p.m.
I'm not sure we want to escape this here? A better name would be "data".	chipx86	Oct. 26, 2012, 5:51 p.m.
Should use format strings, and no camel casing. You can also combine with the below. return mark_safe('%s%s%s' % (rst_parts['title'], ......) …	chipx86	Oct. 26, 2012, 6:46 p.m.
Same comments from above apply.	chipx86	Oct. 26, 2012, 6:13 p.m.
Can you limit the number of lines actually shown? Maybe do splitlines()[:5] on the result.	chipx86	Oct. 26, 2012, 6:46 p.m.
Should be: h1, h2, body, p { font-size: 30%; }	chipx86	Oct. 26, 2012, 6:48 p.m.
Two blank lines.	chipx86	Nov. 4, 2012, 9:35 a.m.
This comment should be made more generic, since this is no longer just about .rst files.	chipx86	Nov. 4, 2012, 9:35 a.m.
Blank line between these.	chipx86	Nov. 4, 2012, 9:35 a.m.
You can just combine these two lines.	chipx86	Nov. 4, 2012, 9:35 a.m.
Combine these too.	chipx86	Nov. 4, 2012, 9:35 a.m.
Same comments as above.	chipx86	Nov. 4, 2012, 9:35 a.m.
Two blank lines.	chipx86	Nov. 4, 2012, 9:35 a.m.
No blank line here.	chipx86	Nov. 4, 2012, 9:35 a.m.
Blank line.	chipx86	Nov. 4, 2012, 9:35 a.m.
So the issue with this is that it'll affect units for margins, padding, etc. of anything contained within. I'm concerned …	chipx86	Nov. 4, 2012, 9:35 a.m.
Should be moved above the 'from'. The imports should be grouped together and from statements should be grouped together (as …	AA aam1r	Nov. 18, 2012, 7:39 a.m.
This is all done below. It should only be returned once. However.. "file is closed" is pretty useless to all …	chipx86	Nov. 20, 2012, 10:45 p.m.
That doesn't really describe what the function does. """Returns the HTML for a thumbnail preview for a ReST file.""" would …	chipx86	Nov. 20, 2012, 10:45 p.m.
Not really a useful comment. That's pretty clear from the code.	chipx86	Nov. 20, 2012, 10:45 p.m.
self.FILE_CROP_CHAR_LIMIT	chipx86	Nov. 20, 2012, 10:45 p.m.
This should all be returned only once. Same comment as above about logging and a default.	chipx86	Nov. 20, 2012, 10:45 p.m.
Same comments for this file.	chipx86	Nov. 20, 2012, 10:45 p.m.
You should import all these in one go, not with different "from ..." lines.	chipx86	Nov. 21, 2012, 8:06 p.m.
Two blank lines.	chipx86	Nov. 21, 2012, 8:06 p.m.
It'd be nice to keep this alphabetical. Is the MimetypeHandler registration needed?	chipx86	Nov. 21, 2012, 8:07 p.m.
Can you prefix with a _ ? That indicates it's not a public function. Same with all others introduced that …	chipx86	Nov. 21, 2012, 8:12 p.m.
This doesn't need to go into specifics on how it's built, and probably shouldn't. Just say what it does. Also, …	chipx86	Nov. 21, 2012, 8:12 p.m.
It's a bit.. verbose. How about just generate_preview_html?	chipx86	Nov. 21, 2012, 8:12 p.m.
Should probably be on the class itself.	chipx86	Nov. 21, 2012, 8:12 p.m.
Use splitlines.	chipx86	Nov. 21, 2012, 8:12 p.m.
One statement.	chipx86	Nov. 21, 2012, 8:13 p.m.
We always import from django.conf.settings. It should be: from django.conf import settings However, you don't want this. See my comment …	chipx86	Nov. 24, 2012, 9:48 a.m.
Can we call these TEXT_CROP_NUM_LINES and TEXT_CROP_LINE_LENGTH? I think it would be a little more clear.	david	Nov. 23, 2012, 5:51 p.m.
Whitespace	david	Nov. 23, 2012, 5:51 p.m.
This comment really doesn't add much.	david	Nov. 23, 2012, 5:51 p.m.
This doesn't really fit our other logs. Instead, do: "Failed to read from file attachment %s: %s' % (self.attachment.pk, e) …	chipx86	Nov. 24, 2012, 8:27 a.m.
You should be able to just provide self._generate_thumbnail as the function without a lambda.	chipx86	Nov. 23, 2012, 5:52 p.m.
Rather than call this yourself, I think you should probably use Django's restructuredtext filter. You can import it from django.contrib.markup.templatetags. …	chipx86	Nov. 24, 2012, 9:52 a.m.
You never need to supply \ when it's in parens. Also, you have some indentation problems.	chipx86	Nov. 23, 2012, 5:52 p.m.
Indentation problem.	chipx86	Nov. 24, 2012, 8:25 a.m.
This should absolutely use Django's markdown filter (importable from the same place as above). That does all sorts of very …	chipx86	Nov. 24, 2012, 9:49 a.m.
Can you say "file extensions"? I had a moment of confusion where I thought "reviewboard extensions"	david	Nov. 23, 2012, 5:51 p.m.
Is this really something that we expect to ever want to change/override? It seems like it would be better to …	david	Nov. 23, 2012, 5:51 p.m.
Should be: BLAHBLAH = { 'file_insertion_enabled': 0, ... } There should be a trailing comma on all entries. I also …	chipx86	Nov. 23, 2012, 5:55 p.m.
Given that we're going to call publish_parts ourselves, we should just inline these settings and force them, rather than have …	chipx86	Nov. 24, 2012, 7:42 p.m.
What we probably want to do is just take the very latest version of markdown that's out right now and …	chipx86	Nov. 24, 2012, 7:43 p.m.
The formatting here could use some improvement. I'd also like to pull the settings dict inline (it's only used in …	david	Nov. 24, 2012, 8:21 p.m.

reviewboard/attachments/mimetypes.py (Diff revision 1)

Can also write this as to make it a bit shorter:

if extension.lower() == ".rst"

SL

slchen Oct. 25, 2012, 8:41 p.m.
```
Fixed!
```

Change Summary:

Updated Review-Request: Fixed imports and streamlined rst ext case check

Summary:

-	Added basic thumbnail rendering for RST and MD
+	Fixed imports and streamlined rst ext case check

Description:

~		[WIP]
	~	Fixed imports and streamlined rst ext case check
	+
	+	Fixed import order to comply with standards (I hope I got it right)
	+	Use .lower to convert .rst file extension case to lower case for better code readability

		Added basic thumbnail rendering for RST and MD

		Added thumbnail rendering support for ReStructuredText (.rst, .RST) and MarkDown (.md)
		Since mimeparse interprets RST files as 'octet-stream', added separate check for the attachment file extension. If the file ends in .rst or .RST, we assume that it has ReStructuredText MimeType (In fact, we set it manually)
		Uses docutil's publisher's publish_parts() in conjunction with html_writer to render RST into HTML. The result is sectioned and ordered (by thumbnail relevance) into "title", "subtitle", and "body".
		HTML result is closed by a file-thumbnail div to enforce visual consistency with other thumbnails. (@todo: possibly define a file-thumbnail-clipped for text-ish thumbnails later)
		MarkDown infrastructure is in place. Currently it is rendered with the same approach as plaintext. MD-specific rendering to commence after sync with Aamir (Since he's has already started work with MarkDown for Pluggable UI)

Diff:

Revision 2 (+53 -1)

Show changes

reviewboard/attachments/mimetypes.py

Change Summary:

Added thumbnail rendering for MarkDown (.md) file attachments.

Tested locally and everything appears to work properly.

Next steps will be to optimize the rendering process for thumbnails (cropping raw text before rendering using docutil / markdown libs)

Summary:

-	Fixed imports and streamlined rst ext case check
+	Added thumbnail rendering for MarkDown attachments

Description:

	+	Added thumbnail rendering for MarkDown attachments
	+
	+	Added functionality for rendering MarkDown (.md) file attachments
	+	Currently markdown is used to render the entire .md file. The result html is cropped using div class=file-thumbnail;
	+	(Added to ToDo List:) should make a design decision to possibly crop these text-ish attachments before handing them off the libraries for html rendering for efficiency reasons (probably won't have to sacrifice much in terms of aesthetics for it)
	+	Fixed import ordering grouping (realized that the imports that start with 'import' do not need to be grouped at the very start of the file, logical grouping of python / 3rd party / rb-specific takes precedence)
	+
		Fixed imports and streamlined rst ext case check

		Fixed import order to comply with standards (I hope I got it right)
		Use .lower to convert .rst file extension case to lower case for better code readability

		Added basic thumbnail rendering for RST and MD

		Added thumbnail rendering support for ReStructuredText (.rst, .RST) and MarkDown (.md)
		Since mimeparse interprets RST files as 'octet-stream', added separate check for the attachment file extension. If the file ends in .rst or .RST, we assume that it has ReStructuredText MimeType (In fact, we set it manually)
		Uses docutil's publisher's publish_parts() in conjunction with html_writer to render RST into HTML. The result is sectioned and ordered (by thumbnail relevance) into "title", "subtitle", and "body".
		HTML result is closed by a file-thumbnail div to enforce visual consistency with other thumbnails. (@todo: possibly define a file-thumbnail-clipped for text-ish thumbnails later)
		MarkDown infrastructure is in place. Currently it is rendered with the same approach as plaintext. MD-specific rendering to commence after sync with Aamir (Since he's has already started work with MarkDown for Pluggable UI)

Diff:

Revision 3 (+52 -2)

Show changes

reviewboard/attachments/mimetypes.py

Change Summary:

Added "zoomed out" thumbnails for RST and MD. The thumbnails for these file attachment types now render at 30% of normal size in addition to being cropped.

Summary:

-	Added thumbnail rendering for MarkDown attachments
+	Added "zoomed out" thumbnails for RST and MD

Description:

	+	Added "zoomed out" thumbnails for RST and MD
	+
	+	Implemented functionality to support "zoomed out" ("scaled down") thumbnails for RST and MD in addition to the cropping mechanism described by the css in reviews.less
	+	This is done by creating a file-thumbnails-clipped css class in reviews.less and then changing the css class used by RST / MD mimetype handlers in mimetypes.py to div class=file-thumbnails-clipped
	+	Scaling is arbitrarily set to 30% atm to achieve a thumbnail look while still giving some context / visual data through the thumbnail (Harder to make out what the thumbnail is when it gets smaller than 30%)
	+
		Added thumbnail rendering for MarkDown attachments

		Added functionality for rendering MarkDown (.md) file attachments
		Currently markdown is used to render the entire .md file. The result html is cropped using div class=file-thumbnail;
		(Added to ToDo List:) should make a design decision to possibly crop these text-ish attachments before handing them off the libraries for html rendering for efficiency reasons (probably won't have to sacrifice much in terms of aesthetics for it)
		Fixed import ordering grouping (realized that the imports that start with 'import' do not need to be grouped at the very start of the file, logical grouping of python / 3rd party / rb-specific takes precedence)

		Fixed imports and streamlined rst ext case check

		Fixed import order to comply with standards (I hope I got it right)
		Use .lower to convert .rst file extension case to lower case for better code readability

		Added basic thumbnail rendering for RST and MD

		Added thumbnail rendering support for ReStructuredText (.rst, .RST) and MarkDown (.md)
		Since mimeparse interprets RST files as 'octet-stream', added separate check for the attachment file extension. If the file ends in .rst or .RST, we assume that it has ReStructuredText MimeType (In fact, we set it manually)
		Uses docutil's publisher's publish_parts() in conjunction with html_writer to render RST into HTML. The result is sectioned and ordered (by thumbnail relevance) into "title", "subtitle", and "body".
		HTML result is closed by a file-thumbnail div to enforce visual consistency with other thumbnails. (@todo: possibly define a file-thumbnail-clipped for text-ish thumbnails later)
		MarkDown infrastructure is in place. Currently it is rendered with the same approach as plaintext. MD-specific rendering to commence after sync with Aamir (Since he's has already started work with MarkDown for Pluggable UI)

Diff:

Revision 4 (+68 -2)

Show changes

	reviewboard/attachments/mimetypes.py
	reviewboard/static/rb/css/reviews.less

```
Can you add some screenshots?
```

reviewboard/attachments/mimetypes.py (Diff revision 4)

Show all issues

imports are a bit wrong.

The froms should all be together and the imports should all be together. They also need to be alphabetical.

djblets is a third-party dependency (as far as RB is concerned), os it should be with the others above. No blank line.

The docutils import should probably just do an import of docutils.core and docutils.io. Importing core and io will make code harder to read. The names are just too common.

reviewboard/attachments/mimetypes.py (Diff revision 4)

Show all issues

This is probably not the right place to do this.

We should probably define a new map below MIMETYPE_ICON_ALIASES that maps file extensions to mimetypes. Then we can do:

extension = os.path.splitext(attachment.filename)[1].lower()

if extension in MIMETYPE_EXTENSIONS:
    mimetype = MIMETYPE_EXTENSIONS[extension]


And that should be above the original parse_mime_type call.

reviewboard/attachments/mimetypes.py (Diff revision 4)
Show all issues
```
No blank line.
```

reviewboard/attachments/mimetypes.py (Diff revision 4)

Show all issues

I'm not sure we want to escape this here?

A better name would be "data".

reviewboard/attachments/mimetypes.py (Diff revision 4)

Show all issues

Should use format strings, and no camel casing. You can also combine with the below.

return mark_safe('<div class="file-thumbnail-clipped">%s%s%s</div>'
                 % (rst_parts['title'], ......)

What happens if we have a document without those bits? Can that ever happen?

slchen Oct. 26, 2012, 7:03 p.m.

Going with rst_parts['html_body'] which will always have that bit

reviewboard/attachments/mimetypes.py (Diff revision 4)

Show all issues

Same comments from above apply.

reviewboard/attachments/mimetypes.py (Diff revision 4)

Show all issues

Can you limit the number of lines actually shown? Maybe do splitlines()[:5] on the result.

slchen Oct. 26, 2012, 7:03 p.m.

Limiting to 20 on the rationale that:
a) 20 lines is enough to render sufficient content to fill the thumbnails if we need to change the scaling ratio in the future, and
b) 20 lines is few enough such that there's only negligible efficiency difference between say 5 lines and 20 lines

reviewboard/static/rb/css/reviews.less (Diff revision 4)

Show all issues

Should be:

h1, h2, body, p {
  font-size: 30%;
}

slchen Oct. 26, 2012, 7:03 p.m.

changed to * { font-size: 60% } to scale everything down in the thumbnail area.

Change Summary:

Polished thumbnail rendering for .rst and .md: Changes made as per reviews.

Added screenshot and the raw files used in the review-request locally (note that the html files were not used, but only included for comparison purposes)

Summary:

-	Added "zoomed out" thumbnails for RST and MD
+	Polished thumbnail rendering for RST and MD

Description:

	+	Polished thumbnail rendering for RST and MD
	+
	+	Fixed import grouping / ordering
	+	Added dictionary MIMETYPE_EXTENSIONS in mimetypes.py as a better infrastructure for checking extensions that are not supported by mimeparse
	+	Added .md to this dictionary since MarkDown files are sometimes interpreted as 'octet-stream' by mimeparse
	+	Changed variable names to be more informative / PEP8-compliant
	+	Switched to using format strings for PEP8-compliancy
	+	Switched to using 'html_body' from the rst_parts dictionary from previously using 'title', 'subtitle', and 'body'.
	+	Now cropping the raw .rst and .md files up to the first 20 lines to improve efficiency in case the thumbnail needs to handle large text-ish files
	+	Now using * instead of h1, h2, body, p to scale down font-size in file-thumbnail-clipped for css in reviews.less
	+
		Added "zoomed out" thumbnails for RST and MD

		Implemented functionality to support "zoomed out" ("scaled down") thumbnails for RST and MD in addition to the cropping mechanism described by the css in reviews.less
		This is done by creating a file-thumbnails-clipped css class in reviews.less and then changing the css class used by RST / MD mimetype handlers in mimetypes.py to div class=file-thumbnails-clipped
		Scaling is arbitrarily set to 30% atm to achieve a thumbnail look while still giving some context / visual data through the thumbnail (Harder to make out what the thumbnail is when it gets smaller than 30%)

		Added thumbnail rendering for MarkDown attachments

		Added functionality for rendering MarkDown (.md) file attachments
		Currently markdown is used to render the entire .md file. The result html is cropped using div class=file-thumbnail;
		(Added to ToDo List:) should make a design decision to possibly crop these text-ish attachments before handing them off the libraries for html rendering for efficiency reasons (probably won't have to sacrifice much in terms of aesthetics for it)
		Fixed import ordering grouping (realized that the imports that start with 'import' do not need to be grouped at the very start of the file, logical grouping of python / 3rd party / rb-specific takes precedence)

		Fixed imports and streamlined rst ext case check

		Fixed import order to comply with standards (I hope I got it right)
		Use .lower to convert .rst file extension case to lower case for better code readability

		Added basic thumbnail rendering for RST and MD

		Added thumbnail rendering support for ReStructuredText (.rst, .RST) and MarkDown (.md)
		Since mimeparse interprets RST files as 'octet-stream', added separate check for the attachment file extension. If the file ends in .rst or .RST, we assume that it has ReStructuredText MimeType (In fact, we set it manually)
		Uses docutil's publisher's publish_parts() in conjunction with html_writer to render RST into HTML. The result is sectioned and ordered (by thumbnail relevance) into "title", "subtitle", and "body".
		HTML result is closed by a file-thumbnail div to enforce visual consistency with other thumbnails. (@todo: possibly define a file-thumbnail-clipped for text-ish thumbnails later)
		MarkDown infrastructure is in place. Currently it is rendered with the same approach as plaintext. MD-specific rendering to commence after sync with Aamir (Since he's has already started work with MarkDown for Pluggable UI)

Diff:

Revision 5 (+86 -2)

Show changes

	reviewboard/attachments/mimetypes.py
	reviewboard/static/rb/css/reviews.less

Added Files:

Screenshots:

+	Thumbnails for .rst / .md

Think we're just about there! There's some minor style stuff (mostly where blank lines should go, and some statements that could be easily combined).

reviewboard/attachments/mimetypes.py (Diff revision 5)
Show all issues
```
Two blank lines.
```

reviewboard/attachments/mimetypes.py (Diff revision 5)

Show all issues

This comment should be made more generic, since this is no longer just about .rst files.

reviewboard/attachments/mimetypes.py (Diff revision 5)
Show all issues
```
Blank line between these.
```
reviewboard/attachments/mimetypes.py (Diff revision 5)
Show all issues
```
You can just combine these two lines.
```
reviewboard/attachments/mimetypes.py (Diff revision 5)
Show all issues
```
Combine these too.
```
reviewboard/attachments/mimetypes.py (Diff revision 5)
Show all issues
```
Same comments as above.
```
reviewboard/attachments/mimetypes.py (Diff revision 5)
Show all issues
```
Two blank lines.
```
reviewboard/static/rb/css/reviews.less (Diff revision 5)
Show all issues
```
No blank line here.
```
reviewboard/static/rb/css/reviews.less (Diff revision 5)
Show all issues
```
Blank line.
```

reviewboard/static/rb/css/reviews.less (Diff revision 5)

Show all issues

So the issue with this is that it'll affect units for margins, padding, etc. of anything contained within. I'm concerned about that. I'd much rather we be explicit about the types we care about for now.

Change Summary:

- Adjusted thumbnail rendering + fixed style issues
- Updated screenshot

Summary:

-	Polished thumbnail rendering for RST and MD
+	Adjusted thumbnail rendering + fixed style issues

Description:

	+	Adjusted thumbnail rendering + fixed style issues
	+
	+	Adjusted the css in reviews.less for how thumbnails for text-ish files would be rendered: now uses explicit tags instead of the wildcard (*) tag
	+	Resized thumbnail scaling from 60% to 40% based on above change
	+	Fixed style issues in the code as per Christian's last code review
	+	Tentatively increased the truncation bound for the raw text-ish files from 20 to 200 prior to passing them off to docutil / markdown APIs for processing. This is to avoid formatting issues created by truncation of the files (e.g. truncating a table before it ends) from making their way into the cropped area of the thumbnails `
	+
	+	Merge branch 'master' into slchen-thumbnails
	+
	+
	+
		Polished thumbnail rendering for RST and MD

		Fixed import grouping / ordering
		Added dictionary MIMETYPE_EXTENSIONS in mimetypes.py as a better infrastructure for checking extensions that are not supported by mimeparse
		Added .md to this dictionary since MarkDown files are sometimes interpreted as 'octet-stream' by mimeparse
		Changed variable names to be more informative / PEP8-compliant
		Switched to using format strings for PEP8-compliancy
		Switched to using 'html_body' from the rst_parts dictionary from previously using 'title', 'subtitle', and 'body'.
		Now cropping the raw .rst and .md files up to the first 20 lines to improve efficiency in case the thumbnail needs to handle large text-ish files
		Now using * instead of h1, h2, body, p to scale down font-size in file-thumbnail-clipped for css in reviews.less

		Added "zoomed out" thumbnails for RST and MD

		Implemented functionality to support "zoomed out" ("scaled down") thumbnails for RST and MD in addition to the cropping mechanism described by the css in reviews.less
		This is done by creating a file-thumbnails-clipped css class in reviews.less and then changing the css class used by RST / MD mimetype handlers in mimetypes.py to div class=file-thumbnails-clipped
		Scaling is arbitrarily set to 30% atm to achieve a thumbnail look while still giving some context / visual data through the thumbnail (Harder to make out what the thumbnail is when it gets smaller than 30%)

		Added thumbnail rendering for MarkDown attachments

		Added functionality for rendering MarkDown (.md) file attachments
		Currently markdown is used to render the entire .md file. The result html is cropped using div class=file-thumbnail;
		(Added to ToDo List:) should make a design decision to possibly crop these text-ish attachments before handing them off the libraries for html rendering for efficiency reasons (probably won't have to sacrifice much in terms of aesthetics for it)
		Fixed import ordering grouping (realized that the imports that start with 'import' do not need to be grouped at the very start of the file, logical grouping of python / 3rd party / rb-specific takes precedence)

		Fixed imports and streamlined rst ext case check

		Fixed import order to comply with standards (I hope I got it right)
		Use .lower to convert .rst file extension case to lower case for better code readability

		Added basic thumbnail rendering for RST and MD

		Added thumbnail rendering support for ReStructuredText (.rst, .RST) and MarkDown (.md)
		Since mimeparse interprets RST files as 'octet-stream', added separate check for the attachment file extension. If the file ends in .rst or .RST, we assume that it has ReStructuredText MimeType (In fact, we set it manually)
		Uses docutil's publisher's publish_parts() in conjunction with html_writer to render RST into HTML. The result is sectioned and ordered (by thumbnail relevance) into "title", "subtitle", and "body".
		HTML result is closed by a file-thumbnail div to enforce visual consistency with other thumbnails. (@todo: possibly define a file-thumbnail-clipped for text-ish thumbnails later)
		MarkDown infrastructure is in place. Currently it is rendered with the same approach as plaintext. MD-specific rendering to commence after sync with Aamir (Since he's has already started work with MarkDown for Pluggable UI)

Testing Done:

~		Manual testing done on localhost. Both RST and MD thumbnails render as expected; nothing broken.
	~	Manual testing done on localhost.
	+
	+	Tested by uploading a raw .rst file and a raw .md file (copies of both of which are attached to this review-request, as are the full, uncropped html versions generated directly through docutils / markdown cmdline APIs).
	+
	+	Visual inspection of the rendered text-ish thumbnails passes: scaling, cropping and rendering all appear to be correct.
	+
	+	See latest attached screenshot for more details.

Diff:

Revision 6 (+88 -1)

Show changes

	reviewboard/attachments/mimetypes.py
	reviewboard/static/rb/css/reviews.less

Screenshots:

+	2012-11-01: updated thumbnails screenshot

reviewboard/attachments/mimetypes.py (Diff revision 6)

I wonder, what's the trade off between capping the max_length while reading the file in and reading the whole file then parsing it?

slchen Nov. 7, 2012, 8:12 p.m.

There is no trade off as I can see: capping the max_length while reading the file would be strictly better. +1 for pointing that out =) open an issue here and I'll fix it in the next update.

reviewboard/attachments/mimetypes.py (Diff revision 6)

Just curious, do you know when will the efficiency difference be not negligible when running functions like this? What will happen if it loads a very large troll file with no line break at all (or one at the very end)?

slchen Nov. 7, 2012, 8:12 p.m.

I think that's a design decision based on the following 2 questions:
- What is the likelihood that someone uploads a legitimate file with few lines, but each line having thousands (or more) of character columns? (In this case, imposing a limit on the length of each line would be undesired behaviour)
- What is the likelihood that a user of Reviewboard uploads a very large troll file? (I think this has fairly low likelihood: I can't think of a scenario where a malicious agent is able to plant a large file on the local system of a Reviewboard user with the intent for it to be uploaded as a file attachment to a review-request. I suppose it might be done as a prank upon discovery of the processing logic here in RB. However, in the event that it does happen, this would have worst case performance of a normal large uncropped text-ish file with many lines: possibly several seconds of UI delay)

chipx86 Nov. 7, 2012, 10:37 p.m.

It's entirely possible that someone could attempt to take down a server by having a line that large. It's also entirely possible it'd be accidental due to some auto-generated file. This happens in diffs. We should attempt to crop.

reviewboard/attachments/mimetypes.py (Diff revision 6)

Do you think it would be better if 200 is a variable instead? (like max_length) Should it be the same length for all text files supported?

slchen Nov. 7, 2012, 8:12 p.m.

I was debating this, actually. Magic numbers are evil, but I pondered if implementing a variable for just these 2 places would be overkill and actually reduce code readability ("Hrm, what's max_length in this context?"). Thoughts, mentors?

chipx86 Nov. 7, 2012, 10:37 p.m.
```
I'd prefer a variable.
```

reviewboard/attachments/mimetypes.py (Diff revision 6)

Show all issues

Should be moved above the 'from'. The imports should be grouped together and from statements should be grouped together (as Christian pointed out earlier)

aam1r Nov. 11, 2012, 12:46 p.m.

Not an issue -- spoke to Sampson about it. It's ordered by precedence:

1st group: Python specific
2nd group: 3rd party (including django, djblets)
3rd group: RB specific

This issue can be ignored and closed.

Change Summary:

Fixed file att. cropping for thumbnails

- Fixed file attachment cropping for thumbnails
- Previously: read entire file attachment, then slice the first 200
  lines into a string to be parsed by docutil or markdown. This opens a
  potential vulnerability when reading files with large column counts.
- Now: read up to the first `max_chars` characters from the file.
  `max_chars` is currently set to 2000.
- Expanded supported mimetypes for rst / md handlers to include
  'text/rst` and 'text/markdown` for forward compatibility in the
  event they become official mimetypes in the future.

Summary:

-	Adjusted thumbnail rendering + fixed style issues
+	Thumbnails Rendering for Text-ish File Attachments

Description:

	+	Merge branch 'master' into slchen-thumbnails
	+
	+
	+
	+	Fixed file att. cropping for thumbnails
	+
	+	Fixed file attachment cropping for thumbnails
	+	Previously: read entire file attachment, then slice the first 200 lines into a string to be parsed by docutil or markdown. This opens a potential vulnerability when reading files with large column counts.
	+	Now: read up to the first `max_chars` characters from the file. `max_chars` is currently set to 2000.
	+	Expanded supported mimetypes for rst / md handlers to include 'text/rst`and 'text/markdown` for forward compatibility in the event they become official mimetypes in the future.
	+
	+	Merge branch 'master' into slchen-thumbnails
	+
	+
	+
		Adjusted thumbnail rendering + fixed style issues

		Adjusted the css in reviews.less for how thumbnails for text-ish files would be rendered: now uses explicit tags instead of the wildcard (*) tag
		Resized thumbnail scaling from 60% to 40% based on above change
		Fixed style issues in the code as per Christian's last code review
		Tentatively increased the truncation bound for the raw text-ish files from 20 to 200 prior to passing them off to docutil / markdown APIs for processing. This is to avoid formatting issues created by truncation of the files (e.g. truncating a table before it ends) from making their way into the cropped area of the thumbnails `

		Merge branch 'master' into slchen-thumbnails



		Polished thumbnail rendering for RST and MD

		Fixed import grouping / ordering
		Added dictionary MIMETYPE_EXTENSIONS in mimetypes.py as a better infrastructure for checking extensions that are not supported by mimeparse
		Added .md to this dictionary since MarkDown files are sometimes interpreted as 'octet-stream' by mimeparse
		Changed variable names to be more informative / PEP8-compliant
		Switched to using format strings for PEP8-compliancy
		Switched to using 'html_body' from the rst_parts dictionary from previously using 'title', 'subtitle', and 'body'.
		Now cropping the raw .rst and .md files up to the first 20 lines to improve efficiency in case the thumbnail needs to handle large text-ish files
		Now using * instead of h1, h2, body, p to scale down font-size in file-thumbnail-clipped for css in reviews.less

		Added "zoomed out" thumbnails for RST and MD

		Implemented functionality to support "zoomed out" ("scaled down") thumbnails for RST and MD in addition to the cropping mechanism described by the css in reviews.less
		This is done by creating a file-thumbnails-clipped css class in reviews.less and then changing the css class used by RST / MD mimetype handlers in mimetypes.py to div class=file-thumbnails-clipped
		Scaling is arbitrarily set to 30% atm to achieve a thumbnail look while still giving some context / visual data through the thumbnail (Harder to make out what the thumbnail is when it gets smaller than 30%)

		Added thumbnail rendering for MarkDown attachments

		Added functionality for rendering MarkDown (.md) file attachments
		Currently markdown is used to render the entire .md file. The result html is cropped using div class=file-thumbnail;
		(Added to ToDo List:) should make a design decision to possibly crop these text-ish attachments before handing them off the libraries for html rendering for efficiency reasons (probably won't have to sacrifice much in terms of aesthetics for it)
		Fixed import ordering grouping (realized that the imports that start with 'import' do not need to be grouped at the very start of the file, logical grouping of python / 3rd party / rb-specific takes precedence)

		Fixed imports and streamlined rst ext case check

		Fixed import order to comply with standards (I hope I got it right)
		Use .lower to convert .rst file extension case to lower case for better code readability

		Added basic thumbnail rendering for RST and MD

		Added thumbnail rendering support for ReStructuredText (.rst, .RST) and MarkDown (.md)
		Since mimeparse interprets RST files as 'octet-stream', added separate check for the attachment file extension. If the file ends in .rst or .RST, we assume that it has ReStructuredText MimeType (In fact, we set it manually)
		Uses docutil's publisher's publish_parts() in conjunction with html_writer to render RST into HTML. The result is sectioned and ordered (by thumbnail relevance) into "title", "subtitle", and "body".
		HTML result is closed by a file-thumbnail div to enforce visual consistency with other thumbnails. (@todo: possibly define a file-thumbnail-clipped for text-ish thumbnails later)
		MarkDown infrastructure is in place. Currently it is rendered with the same approach as plaintext. MD-specific rendering to commence after sync with Aamir (Since he's has already started work with MarkDown for Pluggable UI)

Testing Done:

		Manual testing done on localhost.

		Tested by uploading a raw .rst file and a raw .md file (copies of both of which are attached to this review-request, as are the full, uncropped html versions generated directly through docutils / markdown cmdline APIs).

		Visual inspection of the rendered text-ish thumbnails passes: scaling, cropping and rendering all appear to be correct.

~		See latest attached screenshot for more details.
	~	See latest attached screenshot for more details - The results reflect the latest changes.

Diff:

Revision 7 (+90 -1)

Show changes

	reviewboard/attachments/mimetypes.py
	reviewboard/static/rb/css/reviews.less

Change Summary:

Implemented mem caching for textish thumbnails

- Implemented mem caching for ReStructuredTextMimeType and for
  MarkDownMimeType with cache_memoize
- The processed html thumbnails are cached, so this saves on:
  1) not having to re-read the raw text file each time
  2) not having to re-generate the html thumbnail each time
- During the course of implementing a unique key construction for
  caching the file attachment thumbnails, a bug was discovered: when
  deleting a file attachment, get_thumbnail() is somehow invoked, and
  this leads to a 500 internal server error as TextMimetype tries to
  read a file that is closed (doesn't exist). An issue will be opened
  separately to address this. For now, try-except blocks are set up for
  TextMimetype, ReStructuredTextMimeType, MarkDownMimeType to bandaid
  the issue in order to prevent the thumbnails projects from blocking.
- Fixed some style issues with variable names, class-vs.-instance types,
  parameter alignments, etc.

Summary:

-	Thumbnails Rendering for Text-ish File Attachments
+	Implemented mem caching for textish thumbnails

Description:

	+	Implemented mem caching for textish thumbnails
	+
	+	Implemented mem caching for ReStructuredTextMimeType and for MarkDownMimeType with cache_memoize
	+	The processed html thumbnails are cached, so this saves on: 1) not having to re-read the raw text file each time 2) not having to re-generate the html thumbnail each time
	+	During the course of implementing a unique key construction for caching the file attachment thumbnails, a bug was discovered: when deleting a file attachment, get_thumbnail() is somehow invoked, and this leads to a 500 internal server error as TextMimetype tries to read a file that is closed (doesn't exist). An issue will be opened separately to address this. For now, try-except blocks are set up for TextMimetype, ReStructuredTextMimeType, MarkDownMimeType to bandaid the issue in order to prevent the thumbnails projects from blocking.
	+	Fixed some style issues with variable names, class-vs.-instance types, parameter alignments, etc.
	+
		Merge branch 'master' into slchen-thumbnails



		Fixed file att. cropping for thumbnails

		Fixed file attachment cropping for thumbnails
		Previously: read entire file attachment, then slice the first 200 lines into a string to be parsed by docutil or markdown. This opens a potential vulnerability when reading files with large column counts.
		Now: read up to the first `max_chars` characters from the file. `max_chars` is currently set to 2000.
		Expanded supported mimetypes for rst / md handlers to include 'text/rst`and 'text/markdown` for forward compatibility in the event they become official mimetypes in the future.

		Merge branch 'master' into slchen-thumbnails



		Adjusted thumbnail rendering + fixed style issues

		Adjusted the css in reviews.less for how thumbnails for text-ish files would be rendered: now uses explicit tags instead of the wildcard (*) tag
		Resized thumbnail scaling from 60% to 40% based on above change
		Fixed style issues in the code as per Christian's last code review
		Tentatively increased the truncation bound for the raw text-ish files from 20 to 200 prior to passing them off to docutil / markdown APIs for processing. This is to avoid formatting issues created by truncation of the files (e.g. truncating a table before it ends) from making their way into the cropped area of the thumbnails `

		Merge branch 'master' into slchen-thumbnails



		Polished thumbnail rendering for RST and MD

		Fixed import grouping / ordering
		Added dictionary MIMETYPE_EXTENSIONS in mimetypes.py as a better infrastructure for checking extensions that are not supported by mimeparse
		Added .md to this dictionary since MarkDown files are sometimes interpreted as 'octet-stream' by mimeparse
		Changed variable names to be more informative / PEP8-compliant
		Switched to using format strings for PEP8-compliancy
		Switched to using 'html_body' from the rst_parts dictionary from previously using 'title', 'subtitle', and 'body'.
		Now cropping the raw .rst and .md files up to the first 20 lines to improve efficiency in case the thumbnail needs to handle large text-ish files
		Now using * instead of h1, h2, body, p to scale down font-size in file-thumbnail-clipped for css in reviews.less

		Added "zoomed out" thumbnails for RST and MD

		Implemented functionality to support "zoomed out" ("scaled down") thumbnails for RST and MD in addition to the cropping mechanism described by the css in reviews.less
		This is done by creating a file-thumbnails-clipped css class in reviews.less and then changing the css class used by RST / MD mimetype handlers in mimetypes.py to div class=file-thumbnails-clipped
		Scaling is arbitrarily set to 30% atm to achieve a thumbnail look while still giving some context / visual data through the thumbnail (Harder to make out what the thumbnail is when it gets smaller than 30%)

		Added thumbnail rendering for MarkDown attachments

		Added functionality for rendering MarkDown (.md) file attachments
		Currently markdown is used to render the entire .md file. The result html is cropped using div class=file-thumbnail;
		(Added to ToDo List:) should make a design decision to possibly crop these text-ish attachments before handing them off the libraries for html rendering for efficiency reasons (probably won't have to sacrifice much in terms of aesthetics for it)
		Fixed import ordering grouping (realized that the imports that start with 'import' do not need to be grouped at the very start of the file, logical grouping of python / 3rd party / rb-specific takes precedence)

		Fixed imports and streamlined rst ext case check

		Fixed import order to comply with standards (I hope I got it right)
		Use .lower to convert .rst file extension case to lower case for better code readability

		Added basic thumbnail rendering for RST and MD

		Added thumbnail rendering support for ReStructuredText (.rst, .RST) and MarkDown (.md)
		Since mimeparse interprets RST files as 'octet-stream', added separate check for the attachment file extension. If the file ends in .rst or .RST, we assume that it has ReStructuredText MimeType (In fact, we set it manually)
		Uses docutil's publisher's publish_parts() in conjunction with html_writer to render RST into HTML. The result is sectioned and ordered (by thumbnail relevance) into "title", "subtitle", and "body".
		HTML result is closed by a file-thumbnail div to enforce visual consistency with other thumbnails. (@todo: possibly define a file-thumbnail-clipped for text-ish thumbnails later)
		MarkDown infrastructure is in place. Currently it is rendered with the same approach as plaintext. MD-specific rendering to commence after sync with Aamir (Since he's has already started work with MarkDown for Pluggable UI)

Testing Done:

		Manual testing done on localhost.

		Tested by uploading a raw .rst file and a raw .md file (copies of both of which are attached to this review-request, as are the full, uncropped html versions generated directly through docutils / markdown cmdline APIs).

		Visual inspection of the rendered text-ish thumbnails passes: scaling, cropping and rendering all appear to be correct.

~		See latest attached screenshot for more details - The results reflect the latest changes.
	~	See attached screenshot for more details - The results reflect the latest changes (Implemented mem caching for textish thumbnails)

Diff:

Revision 8 (+128 -4)

Show changes

	reviewboard/attachments/mimetypes.py
	reviewboard/static/rb/css/reviews.less

Change Summary:

- Fixed style issues.
- Corrected review description to reflect what the entire commit is about.

Summary:

-	Implemented mem caching for textish thumbnails
+	Implement Thumbnail Rendering for Text-ish File Attachments (.rst and .md)

Description:

~		Implemented mem caching for textish thumbnails
~
~		Implemented mem caching for ReStructuredTextMimeType and for MarkDownMimeType with cache_memoize
~		The processed html thumbnails are cached, so this saves on: 1) not having to re-read the raw text file each time 2) not having to re-generate the html thumbnail each time
~		During the course of implementing a unique key construction for caching the file attachment thumbnails, a bug was discovered: when deleting a file attachment, get_thumbnail() is somehow invoked, and this leads to a 500 internal server error as TextMimetype tries to read a file that is closed (doesn't exist). An issue will be opened separately to address this. For now, try-except blocks are set up for TextMimetype, ReStructuredTextMimeType, MarkDownMimeType to bandaid the issue in order to prevent the thumbnails projects from blocking.
~		Fixed some style issues with variable names, class-vs.-instance types, parameter alignments, etc.
~
~		Merge branch 'master' into slchen-thumbnails
~
~
	~	Overview:
	~	- Implemented thumbnail rendering for ReStructuredText (.rst) and MarkDown (.md) file attachment types
	~	- Reads up to the first `FILE_CROP_CHAR_LIMIT` characters from the raw file attachments when processing the text-ish file for the first time. `FILE_CROP_CHAR_LIMIT` is currently set to 2000.
	~	- Added dictionary MIMETYPE_EXTENSIONS in mimetypes.py as an infrastructure for checking extensions that are not supported by
	~	mimeparse
	~	- Implemented mem caching for ReStructuredTextMimeType and for MarkDownMimeType with cache_memoize The processed html thumbnails are cached, so this saves on:
	~	1) not having to re-read the raw text file each time
	~	2) not having to re-generate the html thumbnail each time
	~	- During the course of implementing a unique key construction for caching the file attachment thumbnails, a bug was discovered: when deleting a file attachment, get_thumbnail() is somehow invoked, and this leads to a 500 internal server error as TextMimetype tries to read a file that is closed (doesn't exist). An issue will be opened separately to address this. For now, try-except blocks are set up for TextMimetype, ReStructuredTextMimeType, MarkDownMimeType to bandaid-fix the issue in order to prevent the thumbnails projects from blocking.
	~	- Implemented 'file-thumbnail-clipped' to be used for .rst and .md thumbnails in reviews.less. Currently the scale factor is 40%
-
-		Fixed file att. cropping for thumbnails
-
-		Fixed file attachment cropping for thumbnails
-		Previously: read entire file attachment, then slice the first 200 lines into a string to be parsed by docutil or markdown. This opens a potential vulnerability when reading files with large column counts.
-		Now: read up to the first `max_chars` characters from the file. `max_chars` is currently set to 2000.
-		Expanded supported mimetypes for rst / md handlers to include 'text/rst`and 'text/markdown` for forward compatibility in the event they become official mimetypes in the future.
-
-		Merge branch 'master' into slchen-thumbnails
-
-
-
-		Adjusted thumbnail rendering + fixed style issues
-
-		Adjusted the css in reviews.less for how thumbnails for text-ish files would be rendered: now uses explicit tags instead of the wildcard (*) tag
-		Resized thumbnail scaling from 60% to 40% based on above change
-		Fixed style issues in the code as per Christian's last code review
-		Tentatively increased the truncation bound for the raw text-ish files from 20 to 200 prior to passing them off to docutil / markdown APIs for processing. This is to avoid formatting issues created by truncation of the files (e.g. truncating a table before it ends) from making their way into the cropped area of the thumbnails `
-
-		Merge branch 'master' into slchen-thumbnails
-
-
-
-		Polished thumbnail rendering for RST and MD
-
-		Fixed import grouping / ordering
-		Added dictionary MIMETYPE_EXTENSIONS in mimetypes.py as a better infrastructure for checking extensions that are not supported by mimeparse
-		Added .md to this dictionary since MarkDown files are sometimes interpreted as 'octet-stream' by mimeparse
-		Changed variable names to be more informative / PEP8-compliant
-		Switched to using format strings for PEP8-compliancy
-		Switched to using 'html_body' from the rst_parts dictionary from previously using 'title', 'subtitle', and 'body'.
-		Now cropping the raw .rst and .md files up to the first 20 lines to improve efficiency in case the thumbnail needs to handle large text-ish files
-		Now using * instead of h1, h2, body, p to scale down font-size in file-thumbnail-clipped for css in reviews.less
-
-		Added "zoomed out" thumbnails for RST and MD
-
-		Implemented functionality to support "zoomed out" ("scaled down") thumbnails for RST and MD in addition to the cropping mechanism described by the css in reviews.less
-		This is done by creating a file-thumbnails-clipped css class in reviews.less and then changing the css class used by RST / MD mimetype handlers in mimetypes.py to div class=file-thumbnails-clipped
-		Scaling is arbitrarily set to 30% atm to achieve a thumbnail look while still giving some context / visual data through the thumbnail (Harder to make out what the thumbnail is when it gets smaller than 30%)
-
-		Added thumbnail rendering for MarkDown attachments
-
-		Added functionality for rendering MarkDown (.md) file attachments
-		Currently markdown is used to render the entire .md file. The result html is cropped using div class=file-thumbnail;
-		(Added to ToDo List:) should make a design decision to possibly crop these text-ish attachments before handing them off the libraries for html rendering for efficiency reasons (probably won't have to sacrifice much in terms of aesthetics for it)
-		Fixed import ordering grouping (realized that the imports that start with 'import' do not need to be grouped at the very start of the file, logical grouping of python / 3rd party / rb-specific takes precedence)
-
-		Fixed imports and streamlined rst ext case check
-
-		Fixed import order to comply with standards (I hope I got it right)
-		Use .lower to convert .rst file extension case to lower case for better code readability
-
-		Added basic thumbnail rendering for RST and MD
-
-		Added thumbnail rendering support for ReStructuredText (.rst, .RST) and MarkDown (.md)
-		Since mimeparse interprets RST files as 'octet-stream', added separate check for the attachment file extension. If the file ends in .rst or .RST, we assume that it has ReStructuredText MimeType (In fact, we set it manually)
-		Uses docutil's publisher's publish_parts() in conjunction with html_writer to render RST into HTML. The result is sectioned and ordered (by thumbnail relevance) into "title", "subtitle", and "body".
-		HTML result is closed by a file-thumbnail div to enforce visual consistency with other thumbnails. (@todo: possibly define a file-thumbnail-clipped for text-ish thumbnails later)
-		MarkDown infrastructure is in place. Currently it is rendered with the same approach as plaintext. MD-specific rendering to commence after sync with Aamir (Since he's has already started work with MarkDown for Pluggable UI)

Diff:

Revision 9 (+128 -4)

Show changes

	reviewboard/attachments/mimetypes.py
	reviewboard/static/rb/css/reviews.less

So I'm noticing a lot of this code is just duplicated. We should really do something about that. Here's what I propose.

TextMimetype says "Here's how to handle a text mimetype." That should really be the basis.

The ReST and Markdown ones are text renderers that know specialized types of text files. So they should really subclass TextMimetype.

TextMimetype should have a function that, given a blob of text, will returned the appropriate HTML. By default, it'd just return the text, escaped. The other handlers could override to do their thing.

One thing that might trip you up. Right now, mimetype class finding is using __subclasses__ to get the mimetype, which we really don't want long-term, and fails here. So that will need to basically do what ReviewUIs do, with the register/unregister functions.

I'd recommend making a separate change before this goes in that does exactly that. Removes the __subclasses__ iteration and instead introduces a _registered_mimetype_handlers list that it iterates. Then (un)register_mimetype_handler functions that manipulate. And something just like reviews/ui/__init__.py in attachments/__init__.py to do the registration. You should be able to just do that separately (in a new branch off master, without these changes), test it with text mimetypes, and then post for review.

You can also move this branch on top of that one until it goes in, so you can consolidate these classes and use the new registration functions.

reviewboard/attachments/mimetypes.py (Diff revision 9)

Show all issues

This is all done below. It should only be returned once.

However.. "file is closed" is pretty useless to all users. They'll just go, "what?"

This is an error. It should be logged, with enough context for us to know what failed and where, and the default thumbnail should be used instead.

reviewboard/attachments/mimetypes.py (Diff revision 9)

Show all issues

That doesn't really describe what the function does.

"""Returns the HTML for a thumbnail preview for a ReST file."""

would be more what we'd want to see.

Imagine you were viewing API docs, and only had function names and summaries. Write what you'd need to be able to easily find the appropriate function for your needs.

reviewboard/attachments/mimetypes.py (Diff revision 9)

Show all issues

Not really a useful comment. That's pretty clear from the code.

reviewboard/attachments/mimetypes.py (Diff revision 9)
Show all issues
```
self.FILE_CROP_CHAR_LIMIT
```

reviewboard/attachments/mimetypes.py (Diff revision 9)

Show all issues

This should all be returned only once.

Same comment as above about logging and a default.

reviewboard/attachments/mimetypes.py (Diff revision 9)
Show all issues
```
Same comments for this file.
```

Change Summary:

Revised review-request description and testing information

Description:

~		Overview:
~		- Implemented thumbnail rendering for ReStructuredText (.rst) and MarkDown (.md) file attachment types
~		- Reads up to the first `FILE_CROP_CHAR_LIMIT` characters from the raw file attachments when processing the text-ish file for the first time. `FILE_CROP_CHAR_LIMIT` is currently set to 2000.
~		- Added dictionary MIMETYPE_EXTENSIONS in mimetypes.py as an infrastructure for checking extensions that are not supported by
	~	Implemented thumbnail rendering for ReStructuredText (.rst)
	~	and MarkDown (.md) file attachment types. The rendered
	~	thumbnails are displayed as HTML scaled down by css (instead
	~	of rendered as image thumbs), and are mem cached.
-		mimeparse
-		- Implemented mem caching for ReStructuredTextMimeType and for MarkDownMimeType with cache_memoize The processed html thumbnails are cached, so this saves on:
-		1) not having to re-read the raw text file each time
-		2) not having to re-generate the html thumbnail each time
-		- During the course of implementing a unique key construction for caching the file attachment thumbnails, a bug was discovered: when deleting a file attachment, get_thumbnail() is somehow invoked, and this leads to a 500 internal server error as TextMimetype tries to read a file that is closed (doesn't exist). An issue will be opened separately to address this. For now, try-except blocks are set up for TextMimetype, ReStructuredTextMimeType, MarkDownMimeType to bandaid-fix the issue in order to prevent the thumbnails projects from blocking.
-		- Implemented 'file-thumbnail-clipped' to be used for .rst and .md thumbnails in reviews.less. Currently the scale factor is 40%

Testing Done:

~		Manual testing done on localhost.
	~	Manual testing on localhost.

~		Tested by uploading a raw .rst file and a raw .md file (copies of both of which are attached to this review-request, as are the full, uncropped html versions generated directly through docutils / markdown cmdline APIs).
	~	Tested by uploading a raw .rst file and a raw .md file

		Visual inspection of the rendered text-ish thumbnails passes: scaling, cropping and rendering all appear to be correct.

~		See attached screenshot for more details - The results reflect the latest changes (Implemented mem caching for textish thumbnails)
	~	See attached screenshot for more details - results reflect the latest changes (mem caching)

Change Summary:

    Refactored duplicate code to class TextMimeType
    
    - Refactored a lot of the duplicate code between the
      ReStructuredTextMimeType and MarkDownMimeType into TextMimeType and
      had them inherit from TextMimeType instead
    - Encapsulated key generation in its own method
    - Encapsulated generation of the HTML thumbnail based on string input
      into a method (to be overridden by .rst / .md handlers)

    Added mt handler registering to replace __subclass
    
    - Implemented infrastructure for registering MimetypeHandlers in much
      the same manner as registering Review UIs
    - Replaced the old way of doing it through __subclass__

Description:

		Implemented thumbnail rendering for ReStructuredText (.rst)
		and MarkDown (.md) file attachment types. The rendered
		thumbnails are displayed as HTML scaled down by css (instead
~		of rendered as image thumbs), and are mem cached.
	~	of rendered as image thumbs), and are mem cached.
	+
	+	Also changed the way MimetypeHandlers are registered to match
	+	existing infrastructure for registering Review UIs, replacing
	+	the old way of doing it through subclass

Testing Done:

		Manual testing on localhost.

~		Tested by uploading a raw .rst file and a raw .md file
	~	Tested by uploading:
	+	- raw .txt file
	+	- raw .rst file
	+	- raw .md file
	+	- raw .jpg file

~		Visual inspection of the rendered text-ish thumbnails passes: scaling, cropping and rendering all appear to be correct.
	~	Visual inspection of the rendered thumbnails all pass: scaling, cropping and rendering all appear to be correct.

~		See attached screenshot for more details - results reflect the latest changes (mem caching)
	~	See updated screenshots (2012-11-21)

Diff:

Revision 10 (+177 -23)

Show changes

	reviewboard/attachments/__init__.py
	reviewboard/attachments/mimetypes.py
	reviewboard/static/rb/css/reviews.less

Screenshots:

+	2012-11-21

reviewboard/attachments/__init__.py (Diff revision 10)

Show all issues

You should import all these in one go, not with different "from ..." lines.

reviewboard/attachments/__init__.py (Diff revision 10)
Show all issues
```
Two blank lines.
```

reviewboard/attachments/__init__.py (Diff revision 10)

Show all issues

It'd be nice to keep this alphabetical.

Is the MimetypeHandler registration needed?

slchen Nov. 22, 2012, 6:36 a.m.

Regarding MimetypeHandler: from looking at the code itself, the class currently seems to act as a fail-safe default in case no good match can be found for the mimetype. Its description seems to corroborate: "This class also acts as a generic handler for mimetypes not matched explicitly by any handler. Note that this is not the same as '*/*'."

I think it's safer overall to leave it in; thoughts?

reviewboard/attachments/mimetypes.py (Diff revision 10)

Show all issues

Can you prefix with a _ ? That indicates it's not a public function.

Same with all others introduced that aren't part of the ReviewUI API or something that needs to be overridden in a subclass.

Also, this function should be more specifically named. As it is, I have no idea what it's a cache key for, and in theory, we might want multiple keys. As far as I can tell, it's only used in one place, so many just keep that logic there unless a subclass has to override this.

reviewboard/attachments/mimetypes.py (Diff revision 10)

Show all issues

This doesn't need to go into specifics on how it's built, and probably shouldn't. Just say what it does.

Also, "&" shouldn't be used, and it should terminate in a period.

reviewboard/attachments/mimetypes.py (Diff revision 10)
Show all issues
```
It's a bit.. verbose. How about just generate_preview_html?
```
reviewboard/attachments/mimetypes.py (Diff revision 10)
Show all issues
```
Should probably be on the class itself.
```
reviewboard/attachments/mimetypes.py (Diff revision 10)
Show all issues
```
Use splitlines.
```
reviewboard/attachments/mimetypes.py (Diff revision 10)
Show all issues
```
One statement.
```

Change Summary:

Fixed issues as per ChipX86's most recent review.

Diff:

Revision 11 (+175 -25)

Show changes

	reviewboard/attachments/__init__.py
	reviewboard/attachments/mimetypes.py
	reviewboard/static/rb/css/reviews.less

Change Summary:

Fixed minor style issues warned by pep8

Diff:

Revision 12 (+175 -25)

Show changes

	reviewboard/attachments/__init__.py
	reviewboard/attachments/mimetypes.py
	reviewboard/static/rb/css/reviews.less

Change Summary:

Added protection against javascript injection as per ChipX86's note in IRC.

Included screenshot from the latest changes (Thumbnail rendering of .rst and .md files with js injected code)

Diff:

Revision 13 (+186 -26)

Show changes

	reviewboard/settings.py
	reviewboard/attachments/__init__.py
	reviewboard/attachments/mimetypes.py
	reviewboard/static/rb/css/reviews.less

Added Files:

Screenshots:

+	Screenshot of Thumbnail Rendering on localhost 2012-11-23

Change Summary:

Updated testing information

Testing Done:

		Manual testing on localhost.

		Tested by uploading:
		- raw .txt file
		- raw .rst file
		- raw .md file
~		- raw .jpg file
	~	- raw .jpg file
	+	- raw .rst file with javascript injection
	+	- raw .md file with javascript injection

~		Visual inspection of the rendered thumbnails all pass: scaling, cropping and rendering all appear to be correct.
	~	Visual inspection of the rendered thumbnails all pass:
	+	- Scaling, cropping and rendering all appear to be correct.
	+	- Javascript injection correctly escaped .rst and .md files

~		See updated screenshots (2012-11-21)
	~	See updated screenshots (2012-11-23)

reviewboard/attachments/mimetypes.py (Diff revision 13)

Show all issues

Can we call these TEXT_CROP_NUM_LINES and TEXT_CROP_LINE_LENGTH? I think it would be a little more clear.

reviewboard/attachments/mimetypes.py (Diff revision 13)
Show all issues
```
Whitespace
```
reviewboard/attachments/mimetypes.py (Diff revision 13)
Show all issues
```
This comment really doesn't add much.
```

reviewboard/attachments/mimetypes.py (Diff revision 13)

Show all issues

Can you say "file extensions"? I had a moment of confusion where I thought "reviewboard extensions"

reviewboard/settings.py (Diff revision 13)

Show all issues

Is this really something that we expect to ever want to change/override? It seems like it would be better to just inline it in the code where it's used.

chipx86 Nov. 23, 2012, 12:37 a.m.

That's what Django looks for when using the restructuredtext filter, which does some security checks for us.

chipx86 Nov. 23, 2012, 12:46 a.m.

(That is, that'll be used if my recommendations are applied.)

reviewboard/attachments/mimetypes.py (Diff revision 13)

Show all issues

We always import from django.conf.settings. It should be:

from django.conf import settings

However, you don't want this. See my comment below.

reviewboard/attachments/mimetypes.py (Diff revision 13)

Show all issues

This doesn't really fit our other logs. Instead, do:

"Failed to read from file attachment %s: %s' % (self.attachment.pk, e)

Also note that you don't need to do str(...), since %s does that for you.

reviewboard/attachments/mimetypes.py (Diff revision 13)

Show all issues

You should be able to just provide self._generate_thumbnail as the function without a lambda.

reviewboard/attachments/mimetypes.py (Diff revision 13)

Show all issues

Rather than call this yourself, I think you should probably use Django's restructuredtext filter. You can import it from django.contrib.markup.templatetags.

That'll reference RESTRUCTUREDTEXT_FILTER_SETTINGS itself, and do all the other right things.

One thing though is that that will return parts["fragment"], not parts["html_body"]. I'm curious what that looks like and how it's different.

slchen Nov. 24, 2012, 9:53 a.m.

See issue below with markdown and the comments about discussion regarding this with David.

reviewboard/attachments/mimetypes.py (Diff revision 13)

Show all issues

You never need to supply \ when it's in parens.

Also, you have some indentation problems.

reviewboard/attachments/mimetypes.py (Diff revision 13)
Show all issues
```
Indentation problem.
```

reviewboard/attachments/mimetypes.py (Diff revision 13)

Show all issues

This should absolutely use Django's markdown filter (importable from the same place as above). That does all sorts of very important security checks that we do not. Certain versions of markdown have different vulnerabilities, and Django has a lot of work to check for those. We want this.

slchen Nov. 24, 2012, 9:52 a.m.

I wasn't sure exactly how to approach this so I talked about it with David in IRC. After discussion, I think the best solution we decided on was to simply re-implement the markdown version checks in mimetypes.py (See IRC log Nov. 23rd - 18:00:00 - 19:00:00 for additional details).

reviewboard/settings.py (Diff revision 13)

Show all issues

Should be:

BLAHBLAH = {
    'file_insertion_enabled': 0,
    ...
}

There should be a trailing comma on all entries.

I also think this should be up higher, out of the way. Maybe right under WEB_API_ENCODERS.

Change Summary:

Added version checks for markdown
- Added version checks for markdown to mimick the version checking for
  markdown done by the to-be deprecated contrib.markup module in Django
- Added similar checks for restructuredtext
- Fixed various style issues
- Uploaded screenshot to compare what rst thumbnails look like when "html_body" is specified in the docutil API vs. "fragments"

Testing Done:

		Manual testing on localhost.

		Tested by uploading:
		- raw .txt file
		- raw .rst file
		- raw .md file
		- raw .jpg file
		- raw .rst file with javascript injection
		- raw .md file with javascript injection

		Visual inspection of the rendered thumbnails all pass:
		- Scaling, cropping and rendering all appear to be correct.
		- Javascript injection correctly escaped .rst and .md files

~		See updated screenshots (2012-11-23)
	~	See updated screenshots (2012-11-24)

Diff:

Revision 14 (+211 -25)

Show changes

	reviewboard/settings.py
	reviewboard/attachments/__init__.py
	reviewboard/attachments/mimetypes.py
	reviewboard/static/rb/css/reviews.less

Added Files:

reviewboard/attachments/mimetypes.py (Diff revision 14)

Show all issues

Given that we're going to call publish_parts ourselves, we should just inline these settings and force them, rather than have them in the settings file.

reviewboard/attachments/mimetypes.py (Diff revision 14)

Show all issues

What we probably want to do is just take the very latest version of markdown that's out right now and just make that a required dependency in setup.py. Django has to be more flexible because it doesn't depend on setup.py, but we can ensure we have exactly what we're expecting. That'll let us get rid of all these version checks.

Change Summary:

Added markdown version requirement for RB:
- As per ChipX86's comments, added latest current markdown version to
  RB's list of required dependencies
- Removed RESTRUCTUREDTEXT_FILTER_SETTINGS from settings.py and inlined
  the settings as a class dictionary for ReStructuredTextMimetype
- Fixed minor style issue to prevent settings.py from showing
  up in git diff

Testing Done:

		Manual testing on localhost.

~		Tested by uploading:
	~	Tested by uploading to a new review-request:
		- raw .txt file
		- raw .rst file
		- raw .md file
		- raw .jpg file
		- raw .rst file with javascript injection
		- raw .md file with javascript injection

		Visual inspection of the rendered thumbnails all pass:
	+	- Thumbnails appear immediately, reflecting latest feature updates
	+	to master.
		- Scaling, cropping and rendering all appear to be correct.
~		- Javascript injection correctly escaped .rst and .md files
	~	- Javascript injection correctly escaped for malicious
	+	.rst and .md files

~		See updated screenshots (2012-11-24)
	~	See updated screenshots (2012-11-24, in the files attachment section)

Diff:

Revision 15 (+186 -25)

Show changes

	setup.py
	reviewboard/attachments/__init__.py
	reviewboard/attachments/mimetypes.py
	reviewboard/static/rb/css/reviews.less

reviewboard/attachments/mimetypes.py (Diff revision 15)

Show all issues

The formatting here could use some improvement. I'd also like to pull the settings dict inline (it's only used in one place):

Also, in the settings dict, should the 0s and 1s be False and True?

settings = {
    'file_insertion_enabled': 0,
    'raw_enabled': 0,
    '_disable_config': 1,
}
return docutils.core.publish_parts(
    source=smart_str(data_string),
    writer_name='html4css1',
    settings_overrides=settings)['html_body']

slchen Nov. 24, 2012, 8:22 p.m.

Fixed style issues; 

On the docutils docs page regarding the same input sanitation issue, they used 0's and 1's: http://docutils.sourceforge.net/docs/howto/security.html Should I leave it as it is?

david Nov. 24, 2012, 8:34 p.m.

It looks like in the code that uses things like file_insertion_enabled, it's checking truthiness, so I think we should use True/False (assuming it works).

slchen Nov. 24, 2012, 8:41 p.m.

Changed to True/False and did a quick test with both malicious and non-malicious .rst file attachment - it works =)

Change Summary:

Fixed style issues in mimetypes.py
- Moved RESTRUCTUREDTEXT_FILTER_SETTINGS inline and renamed it to
  settings
- Fixed return statement alignment issues in ReStructuredTextMimetype

Diff:

Revision 16 (+186 -25)

Show changes

	setup.py
	reviewboard/attachments/__init__.py
	reviewboard/attachments/mimetypes.py
	reviewboard/static/rb/css/reviews.less

Change Summary:

Converted 0/1's to True/False's for rst settings
- Minor change, see comments here for more information:
  http://reviews.reviewboard.org/r/3454/#comment7581

Diff:

Revision 17 (+186 -25)

Show changes

	setup.py
	reviewboard/attachments/__init__.py
	reviewboard/attachments/mimetypes.py
	reviewboard/static/rb/css/reviews.less

Ship it!

Ship It!

david Nov. 24, 2012, 9:07 p.m.

Actually, I was about to push this change, but it breaks a couple tests:

======================================================================
FAIL: Testing matching of factory method for mimetype handlers.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/david/Projects/reviewboard/reviewboard/attachments/tests.py", line 83, in test_handler_factory
    self.assertEqual(self._handler_for("test/abc"), TestAbcMimetype)
AssertionError: None != <class 'reviewboard.attachments.tests.TestAbcMimetype'>

======================================================================
FAIL: Testing precedence of factory method for mimetype handlers.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/david/Projects/reviewboard/reviewboard/attachments/tests.py", line 95, in test_handler_factory_precedence
    self.assertEqual(self._handler_for("test2/def"), StarDefMimetype)
AssertionError: None != <class 'reviewboard.attachments.tests.StarDefMimetype'>

----------------------------------------------------------------------
Ran 575 tests in 138.740s

FAILED (SKIP=35, failures=2)

slchen Nov. 25, 2012, 10:36 a.m.

Fixed!

...
255 static files copied.
Creating test database for alias 'default'...
Testing uploading a file attachment. ... ok
Testing matching of factory method for mimetype handlers. ... ok
Testing precedence of factory method for mimetype handlers. ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.251s

OK
Destroying test database for alias 'default'...

Change Summary:

Revised attachment tests for thumbnails project

- Since we changed how MimetypeHandler finds the the best fit handler
  for a particular mimetype (Old way: through __subclass__; New way:
  through explicitly registering a subclass of MimetypeHandler),
  attachment/tests.py needed up be updated to register the test handler
  used in the tests.
- Added the registering / unregistering of the test handlers in setUp()
  and tearDown() functions

Testing Done:

~		Manual testing on localhost.
	~	Ran unit tests for reviewboard.attachments.tests:
	+
	+	Updated tests now all pass after adding setUp() and tearDown() to register / unregister the test Mimetype Handlers.
	+	Manual testing on localhost:

		Tested by uploading to a new review-request:
		- raw .txt file
		- raw .rst file
		- raw .md file
		- raw .jpg file
		- raw .rst file with javascript injection
		- raw .md file with javascript injection

		Visual inspection of the rendered thumbnails all pass:
~		- Thumbnails appear immediately, reflecting latest feature updates
~		to master.
	~	- Thumbnails appear immediately (without a refresh), reflecting
	~	latest feature updates to master.
		- Scaling, cropping and rendering all appear to be correct.
		- Javascript injection correctly escaped for malicious
		.rst and .md files

		See updated screenshots (2012-11-24, in the files attachment section)

Diff:

Revision 18 (+216 -26)

Show changes

	setup.py
	reviewboard/attachments/__init__.py
	reviewboard/attachments/mimetypes.py
	reviewboard/attachments/tests.py
	reviewboard/static/rb/css/reviews.less

Ship it!

```
Ship It!
```

Status: Closed (submitted)

Change Summary:

Pushed to release-1.7.x (b898885). Thanks!

reviewboard/attachments/mimetypes.py (Diff revision 18)

This is probably too late since this is already merged in but "enable_attributes" default to False when safe_mode is enabled.

Docs: http://packages.python.org/Markdown/reference.html#enable_attributes

slchen Nov. 28, 2012, 9:02 p.m.

I actually looked into this exact issue at the time. The reference pointed to by ChipX86 here: http://security.stackexchange.com/questions/14664/how-do-i-use-markdown-securely says the recommended usage is to set enable_attributes=False explicitly because of Markdown versioning issues. While atm we make markdown>=2.1 a required dependency and it sets enabled_attributes=False by default when setting safe_mode=True, I thought it was safer to just set enable_attributes=False explicitly anyway in case that changes in future Markdown versions.