Implement Thumbnail Rendering for Text-ish File Attachments (.rst and .md)

Review Request #3454 — Created Oct. 24, 2012 and submitted — Latest diff uploaded

Information

Review Board
master

Reviewers

Implemented thumbnail rendering for reStructuredText (.rst)
and Markdown (.md) file attachment types. The rendered
thumbnails are displayed as HTML scaled down by CSS (instead
of being rendered as image thumbnails), and are cached through
cache_memoize.
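The read-then-cache flow described above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code under review: `cache_memoize_sketch` and the dict-backed `_thumbnail_cache` are hypothetical stand-ins for djblets' `cache_memoize` and the real cache backend, and `render` stands in for whichever reStructuredText or Markdown renderer is used. It is written for Python 3.

```python
import io

FILE_CROP_CHAR_LIMIT = 2000  # same bound the handlers in this change use

# Hypothetical in-process cache standing in for the real cache backend.
_thumbnail_cache = {}


def cache_memoize_sketch(key, compute):
    """Return the cached value for key, computing and storing it on a miss."""
    if key not in _thumbnail_cache:
        _thumbnail_cache[key] = compute()
    return _thumbnail_cache[key]


def generate_thumbnail(fileobj, render):
    """Read a bounded prefix of the file and wrap the rendered HTML."""
    try:
        data = fileobj.read(FILE_CROP_CHAR_LIMIT)
    finally:
        # Close the file whether or not the read succeeded.
        fileobj.close()
    return '<div class="file-thumbnail-clipped">%s</div>' % render(data)


def get_thumbnail(attachment_pk, open_file, render):
    """Build the thumbnail once per attachment, then serve it from cache."""
    key = 'file-attachment-thumbnail-%s' % attachment_pk
    return cache_memoize_sketch(
        key, lambda: generate_thumbnail(open_file(), render))
```

Keying the cache on the attachment's primary key means a page reload neither re-reads the file nor re-renders it; the trade-off is that the cached HTML goes stale if the rendering logic changes while the key stays the same.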

Also changed the way MimetypeHandlers are registered to match the
existing infrastructure for registering Review UIs, replacing
the old approach of discovering them through __subclasses__().

1. Ran unit tests for reviewboard.attachments.tests:

- Updated tests now all pass after adding setUp() and tearDown()
  to register / unregister the test Mimetype Handlers.

2. Manual testing on localhost:

Tested by uploading to a new review-request:
- raw .txt file
- raw .rst file
- raw .md file
- raw .jpg file
- raw .rst file with JavaScript injection
- raw .md file with JavaScript injection

The rendered thumbnails all pass visual inspection:
- Thumbnails appear immediately (without a refresh), reflecting
  the latest feature updates to master.
- Scaling, cropping, and rendering all appear to be correct.
- JavaScript injection is correctly escaped for malicious
  .rst and .md files.

See updated screenshots (2012-11-24, in the files attachment section)

The reStructuredText_ Cheat Sheet: Syntax Reminders

Info:See <http://docutils.sf.net/rst.html> for introductory docs.
Author: David Goodger <goodger@python.org>
Date: 2012-06-22
Revision: 7463
Description:This is a "docinfo block", or bibliographic field list

Section Structure

Section titles are underlined or overlined & underlined.

Body Elements

Grid table:

System Message: ERROR/3 (<string>, line 18)

Malformed table.

+--------------------------------+-----------------------------------+
| Paragraphs are flush-left,     | Literal block, preceded by "::":: |
| separated by blank lines.      |                                   |
|                                |     Indented                      |
|     Block quotes are indented. |                                   |
+--------------------------------+ or::                              |
| >>>

System Message: WARNING/2 (<string>, line 25)

Blank line required after table.

Docutils System Messages

System Message: ERROR/3 (<string>, line 3); backlink

Unknown target name: "restructuredtext".

sack

s(hortcut)-ack: a faster way to use ack (and grep)!

sack acts as a wrapper for ack to provide convenience for the repetitive menial tasks

What is ack?

ack is the replacement for grep!

Here is why you should use ack over grep: http://betterthangrep.com/

(Now that you are sold on ack) To install ack in a one-line script:

curl http://betterthangrep.com/ack-standalone > ~/bin/ack && chmod 0755 !#:3

How to Install:

Open a terminal and run the following:

git clone git@github.com:sampson-chen/sack.git && cd sack && chmod +x install_sack.sh && ./install_sack.sh

How to Use:

You can use sack in exactly the same way you currently use ack! Woot!

For why sack is faster (and more fun!) to use, read on about its main / side features...

Main Feature 1 - Shortcuts:

sack prefixes shortcut tags to ack's search results:

user@linux:~/src$ sack thumbnail

... (omitted results)
/home/user/src/reviewboard/reviewboard/attachments/mimetypes.py
[13] 6
"Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod 
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

Review Board

Review Board is a web-based review tool designed to help projects and companies
keep track of pending code changes and make reviews of code, graphics, and more
much less painful and time-consuming. It's generic enough to use in any
project, and works at companies and organizations of any size.

Information on usage and installation can be found on
http://www.reviewboard.org/docs/manual/dev/

General information on the project is available on
http://www.reviewboard.org/


Diff Revision 9

This is not the most recent revision of the diff. The latest diff is revision 18.

reviewboard/attachments/mimetypes.py
Revision efaf20f4b72376e8438396ba35eba9a49f2172ee New Change
@@ -1,12 +1,16 @@
-import mimeparse
+import os
 
 from django.contrib.staticfiles.templatetags.staticfiles import static
 from django.utils.html import escape
 from django.utils.safestring import mark_safe
+from djblets.util.misc import cache_memoize
 from djblets.util.templatetags.djblets_images import thumbnail
 from pipeline.storage import default_storage
+import docutils.core
+import markdown
+import mimeparse
 
 
 def score_match(pattern, mimetype):
     """Returns a score for how well the pattern matches the mimetype.
 
@@ -94,10 +98,18 @@ def get_best_handler(cls, mimetype):
 
     @classmethod
     def for_type(cls, attachment):
         """Returns the handler that is the best fit for provided mimetype."""
         mimetype = mimeparse.parse_mime_type(attachment.mimetype)
+
+        # Override the mimetype if mimeparse is known to misinterpret this
+        # type of file as `octet-stream`
+        extension = os.path.splitext(attachment.filename)[1]
+
+        if extension in MIMETYPE_EXTENSIONS:
+            mimetype = MIMETYPE_EXTENSIONS[extension]
+
         score, handler = cls.get_best_handler(mimetype)
         return handler(attachment, mimetype)
 
     def get_icon_url(self):
         mimetype_string = self.mimetype[0] + '/' + self.mimetype[1]
@@ -147,14 +159,21 @@ class TextMimetype(MimetypeHandler):
         """Returns the first few truncated lines of the file."""
         height = 4
         length = 50
 
         f = self.attachment.file.file
+
+        try:
-        preview = escape(f.readline()[:length])
-        for i in range(height - 1):
-            preview = preview + '<br />' + escape(f.readline()[:length])
+            preview = escape(f.readline()[:length])
+            for i in range(height - 1):
+                preview = preview + '<br />' + escape(f.readline()[:length])
+        except (ValueError, IOError), e:
+            f.close()
+            return mark_safe('<pre class="file-thumbnail">%s</pre>'
+                             % "(file is closed)")
+
         f.close()
 
         return mark_safe('<pre class="file-thumbnail">%s</pre>'
                          % preview)
 
 
@@ -160,0 +180,39 @@
+class ReStructuredTextMimeType(MimetypeHandler):
+    """Handles ReStructuredText (.rst) mimetypes."""
+    supported_mimetypes = ['text/x-rst', 'text/rst']
+    FILE_CROP_CHAR_LIMIT = 2000
+
+    def generate_thumbnail(self):
+        """Actual logic for generating the thumbnail from raw text"""
+        # Read up to 'FILE_CROP_CHAR_LIMIT' number of characters from
+        # the file attachment to prevent long reads caused by malicious
+        # or auto-generated files.
+        # (This is a more flexible and simpler approach than
+        # truncating past a certain number of lines + truncating past
+        # a certain number of chars for each line)
+        f = self.attachment.file.file
+
+        # Enclosure in try-except block in case the read fails
+        try:
+            # Read up to FILE_CROP_CHAR_LIMIT
+            data_string = f.read(ReStructuredTextMimeType.FILE_CROP_CHAR_LIMIT)
+            f.close()
+            rst_parts = docutils.core.publish_parts(data_string, writer_name='html')
+            return mark_safe('<div class="file-thumbnail-clipped">%s</div>'
+                             % rst_parts['html_body'])
+        except (ValueError, IOError), e:
+            f.close()
+            return mark_safe('<pre class="file-thumbnail-clipped">%s</pre>'
+                             % "(file is closed)")
+
+    def get_thumbnail(self):
+        """Returns clipped portions of the rendered .rst file as html"""
+        # Caches the generated thumbnail to eliminate the need on each page
+        # reload to:
+        # 1) re-read the file attachment
+        # 2) re-generate the html based on the data read
+        return cache_memoize('file-attachment-thumbnail-text-rst-html-%s'
+                             % self.attachment.pk,
+                             lambda: self.generate_thumbnail())
+
+
@@ -161,5 +219,43 @@
+class MarkDownMimeType(MimetypeHandler):
+    """Handles MarkDown (.md) mimetypes."""
+    supported_mimetypes = ['text/x-markdown', 'text/markdown']
+    FILE_CROP_CHAR_LIMIT = 2000
+
+    def generate_thumbnail(self):
+        """Actual logic for generating the thumbnail from raw text"""
+        # Read up to 'FILE_CROP_CHAR_LIMIT' number of characters from
+        # the file attachment to prevent long reads caused by malicious
+        # or auto-generated files.
+        # (This is a more flexible and simpler approach than
+        # truncating past a certain number of lines + truncating past
+        # a certain number of chars for each line)
+        f = self.attachment.file.file
+
+        # Enclosure in try-except block in case the read fails
+        try:
+            # Read up to FILE_CROP_CHAR_LIMIT
+            data_string = f.read(MarkDownMimeType.FILE_CROP_CHAR_LIMIT)
+            f.close()
+            return mark_safe('<div class="file-thumbnail-clipped">%s</div>'
+                             % markdown.markdown(data_string))
+        except (ValueError, IOError), e:
+            f.close()
+            return mark_safe('<pre class="file-thumbnail-clipped">%s</pre>'
+                             % "(file is closed)")
+
+    def get_thumbnail(self):
+        """Returns clipped portion of the start of rendered .md file as html"""
+        # Caches the generated thumbnail to eliminate the need on each page
+        # reload to:
+        # 1) re-read the file attachment
+        # 2) re-generate the html based on the data read
+        return cache_memoize('file-attachment-thumbnail-text-md-html-%s'
+                             % self.attachment.pk,
+                             lambda: self.generate_thumbnail())
+
+
 # A mapping of mimetypes to icon names.
 #
 # Normally, a mimetype will be normalized and looked up in our bundled
 # list of mimetype icons. However, if the mimetype is in this list, the
 # associated name is used instead.
@@ -258,6 +354,19 @@
     'text/x-python': 'text-x-script',
     'text/x-sh': 'text-x-script',
     'text/x-vcalendar': 'x-office-calendar',
     'text/x-vcard': 'x-office-address-book',
     'text/x-zsh': 'text-x-script',
+}
+
+
+# A mapping of extensions to mimetypes
+#
+# Normally mimetypes are determined by mimeparse, then matched with
+# one of the supported mimetypes classes through a best-match algorithm.
+# However, mimeparse isn't always able to catch the unofficial mimetypes
+# such as 'text/x-rst' or 'text/x-markdown', so we just go by the
+# extension name.
+MIMETYPE_EXTENSIONS = {
+    '.rst': (u'text', u'x-rst', {}),
+    '.md': (u'text', u'x-markdown', {}),
 }
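The extension override added to for_type() can be exercised on its own. The sketch below is written for Python 3 and reuses the MIMETYPE_EXTENSIONS mapping from the diff; `resolve_mimetype` is a hypothetical helper name, and the parsed tuple is passed in directly instead of calling mimeparse.

```python
import os

# Same shape as mimeparse's parsed result: (type, subtype, params).
MIMETYPE_EXTENSIONS = {
    '.rst': ('text', 'x-rst', {}),
    '.md': ('text', 'x-markdown', {}),
}


def resolve_mimetype(filename, parsed_mimetype):
    """Prefer the extension mapping; fall back to the parsed mimetype."""
    extension = os.path.splitext(filename)[1]
    return MIMETYPE_EXTENSIONS.get(extension, parsed_mimetype)


# An upload reported as a generic binary stream is reclassified by extension.
resolved = resolve_mimetype('README.md', ('application', 'octet-stream', {}))
```

The lookup is case-sensitive, so a file named `README.MD` would fall through to the parsed mimetype; lowercasing the extension before the lookup would be one way to handle that, if desired.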
reviewboard/static/rb/css/reviews.less