Fill-database

Review Request #2104 — Created Feb. 6, 2011 and submitted

sws587
Review Board
fill-database
reviewboard
Here is the fully functioning fill-database script.

Command and arguments take the form of:
./reviewboard/manage.py fill-database --users=N1 --review-requests=N2:N3 --diffs=N4:N5 --reviews=N6:N7 --diff-comments=N8:N9

Added: a password can now be set for all users created using --password=TEXT
fill-database --users=10 --password=custom1

For instance:
./reviewboard/manage.py fill-database --users=1 --review-requests=1 --diffs=1 --reviews=1 --diff-comments=1

or

./reviewboard/manage.py fill-database --users=1 --review-requests=1:10 --diffs=1:5 --reviews=1:5 --diff-comments=1:5
Testing for fill-database
March 20, 2011 tests added:
Tested missing diff directory, missing diff files, & missing repo directory
Tested causing error cases where try/except clauses were changed

March 13, 2011 Tests added:
Tested running exceptions, where path files could not be found
Tested adding the password parameter (--password=test5)
Tested adding random lorem ipsum wording to summary & description

In the cases below where the database is empty a superuser has been created

Test adding only one user on empty database
./reviewboard/manage.py fill-database --users=1

Test adding many users on an empty database
./reviewboard/manage.py fill-database --users=100
./reviewboard/manage.py fill-database --users=1000

Test adding large num of users
./reviewboard/manage.py fill-database --users=10000

Test adding users and arguments other than review-request 
(shouldn't add anything but users)
./reviewboard/manage.py fill-database --users=5 --diffs=1
./reviewboard/manage.py fill-database --users=5 --reviews=1
./reviewboard/manage.py fill-database --users=5 --diffs=1 --reviews=1
./reviewboard/manage.py fill-database --users=5 --diff-comments=1
./reviewboard/manage.py fill-database --users=5 --diffs=1 --reviews=1 --diff-comments=1

Test adding users and review-requests to emtpy database
./reviewboard/manage.py fill-database --users=1 --review-requests=1
./reviewboard/manage.py fill-database --users=1 --review-requests=1:10
./reviewboard/manage.py fill-database --users=100 --review-requests=1:10

Test adding arguments other than diffs
(shouldn't add anything but users & review-requests)
./reviewboard/manage.py fill-database --users=1 --review-requests=1 --reviews=100
./reviewboard/manage.py fill-database --users=1 --review-requests=1 --diff-comments=100
./reviewboard/manage.py fill-database --users=1 --review-requests=1 --reviews=100 --diff-comments=100

Test adding users, review-requests, and diffs to empty database
./reviewboard/manage.py fill-database --users=1 --review-requests=1 --diffs=1
./reviewboard/manage.py fill-database --users=100 --review-requests=1:10 --diffs=0:10

Test adding users, requests, diffs, and arguments other than reviews
(shouldn't add anything but users, requests & diffs)
./reviewboard/manage.py fill-database --users=5 --review-requests=1:5 --diffs=1:10 --diff-comments=10

Test adding users, requests, diffs, and reviews to empty database
./reviewboard/manage.py fill-database --users=1 --review-requests=1:5 --diffs=1 --reviews=1
./reviewboard/manage.py fill-database --users=100 --review-requests=0:5 --diffs=0:5 --reviews=0:5

Test adding users, requests, diffs, reviews, and comments to empty database
./reviewboard/manage.py fill-database --users=1 --review-requests=1 --diffs=1 --reviews=1 --diff-comments=1

Test adding additional information to existing database
./reviewboard/manage.py fill-database --users=5 --review-requests=1:5 --diffs=1:5 --reviews=1:10 --diff-comments=1:10

Test adding lots of information
./reviewboard/manage.py fill-database --users=1 --review-requests=1 --diffs=1 --reviews=100 --diff-comments=1
./reviewboard/manage.py fill-database --users=5 --review-requests=5 --diffs=10 --reviews=100 --diff-comments=1
./reviewboard/manage.py fill-database --users=100 --review-requests=10 --diffs=10 --reviews=10 --diff-comments=10
[unfortunately while this last test does work, I ran it for 24hour without it completing the task entirely]

Test some params at 0, shouldn't execute following parameters
./reviewboard/manage.py fill-database --users=5 --review-requests=0 --diffs=1:5 --reviews=1:5 --diff-comments=1:5
./reviewboard/manage.py fill-database --users=5 --review-requests=2 --diffs=0 --reviews=1:5 --diff-comments=1:5
./reviewboard/manage.py fill-database --users=5 --review-requests=2 --diffs=2 --reviews=0 --diff-comments=1:5
./reviewboard/manage.py fill-database --users=5 --review-requests=2 --diffs=2 --reviews=2 --diff-comments=0

Additional quantity-based tests:
./reviewboard/manage.py fill-database --users=5 --review-requests=1:5 --diffs=1:10 --reviews=1:10 --diff-comments=1:10
./reviewboard/manage.py fill-database --users=100 --review-requests=100
./reviewboard/manage.py fill-database --users=100 --review-requests=50 --diffs=20
./reviewboard/manage.py fill-database --users=100 --review-requests=30 --diffs=10
./reviewboard/manage.py fill-database --users=100 --review-requests=20 --diffs=5 --reviews=50
./reviewboard/manage.py fill-database --users=100 --review-requests=20 --diffs=5 --reviews=20
./reviewboard/manage.py fill-database --users=100 --review-requests=20 --diffs=5 --reviews=10
./reviewboard/manage.py fill-database --users=100 --review-requests=10 --diffs=5 --reviews=10 --diff-comments=10
SW
SW
SW
david
  1. 
      
  2. reviewboard/reviews/management/commands/fill-database.py (Diff revision 3)
     
     
     
     
     
     
     
     
    According to PEP-8 (the python style guide), imports should be separated into three groups, with a blank line in between each of those groups.
    
    Standard library (os, random, optparse)
    
    Other 3rd party modules (django)
    
    Package imports (reviewboard)
    
    
    
    Within each of these three sections, please alphabetize the import lines.
  3. Trailing space.
  4. reviewboard/reviews/management/commands/fill-database.py (Diff revision 3)
     
     
     
     
     
     
     
     
     
    The help text here is a little repetitive. I'd get rid of "Specifies the" from each, and provide examples of what the input should look like, especially for the range items.
    
    I think these would be more helpful:
    "The number of users to add"
    "The number of review requests per user [min:max]"
    "The number of diffs per user [min:max]"
    
    I also think that "diffs" should be per review request, and "diff-comments" should be per diff.
    
    Also your last help text (for diff-comments) is incorrect.
  5. reviewboard/reviews/management/commands/fill-database.py (Diff revision 3)
     
     
     
     
     
     
     
    It might be nice to let people either specify a single number or a range (you can parse that fairly easily). These might also be nicer to store as tuples instead of separate variables for min and max.
  6. I know that a lot of this kind of output is probably temporary and for debug purposes, but chances are pretty good that the user knows what they entered. The output should inform about the progress and communicate how long the user has to wait before long-running operations are done.
  7. Trailing space.
  8. This can be much shorter:
    
    return (int(item.strip()) for item in com_string.split(':'))
    
    Also note that we wrap lines to 80 characters.
  9. Since this is an error message, please print it to stderr.
    
    print >> sys.stderr, "..."
    
    Also wrap to 80 characters.
  10. 
      
SW
SW
SW
PI
  1. Ship It!
  2. 
      
mike_conley
  1. Steve:
    
    Good work on this.
    
    There are a few minor style things - a sweep with the pep8 script (http://pypi.python.org/pypi/pep8) will help you nail them all.
    
    Some other concerns:
    1)  Shouldn't we create the reviews/management/diffs folder, instead of asking users to create it on their own?
    2)  Shouldn't we include some .diff files, instead of asking users to create them on their own?
    3)  According to the Django documentation, @transaction.commit_on_success only rolls back if we throw an exception.  In some cases, you're exiting on a failure, which means that the database commits the transaction at that point - even though we've failed.  This can leave the database in a tricky state (I got this error after not having any .diff files, and it wouldn't go away until I reverted by DB manually:  http://www.pastie.org/1667934).  Better to throw exceptions, and rollback the DB on failure.
    
    So those are my comments.  Thanks for your work,
    
    -Mike
    
    1. Thanks for the review I appreciate it.
      
      1) yeah I have that folder ready to upload, I've just been avoiding adding it to the post-review bc of the size, but I'll add it next commit
      2) yeah I have some too, I'll add to the next post-review, this will include the additions to the test-repositories
      3) thanks, I'll fix this up for next commit too.  I forgot about those exceptions.
  2. Two blank lines here.
  3. Space between "#" and "Generate"
  4. Why is manage.py here?  I think you can just have "/../scmtools/testdata/git_repo" and get the same effect.
  5. Similarly, I don't think manage.py is required here.
  6. Hold up - so we need to add a new folder called "diffs" under "reviews/management/commands/diffs"?  That should probably be part of this review request too, then.
    
    Git generally ignores empty folders, but you can get it to pay attention by putting an empty .gitignore in it.
    1. I have a bunch of diffs in this folder for testing purposes.  So I'll add these & this folder to post-review I guess on the next commit, since they will be needed for testing, unless there is something better I should be doing.  
  7. At least two spaces before inline comment, and a space between "#" and "add"
  8. reviewboard/reviews/management/commands/fill-database.py (Diff revision 5)
     
     
     
     
     
     
     
     
     
     
     
     
    Hm...and we need .diff files too?
    
    Should we generate these ourselves, and include them?
    1. I have a bunch of git .diff files that I intended to include, which will also have an effect on the test_repo, since those files need to exist there as well.  It's just a lot of info I was trying to avoid posting on post-review
  9. Since you're using @transaction.commit_on_success, you'll probably want to throw exceptions to ensure that the db gets rolled back.
  10. Space on either side of +
  11. Two spaces before inline comment, and space between "#" and "avoids"
  12. Or should this be a fill-database optional parameter?
    1. Sure, that's fine with me, I can add it as a param if people like.
  13. Space on either side of "=" operator.
    
    There are a bunch of these, but I'll just mention it this one time.
  14. I think these should be different for each review request - it'll give a better sense of variety when looking at the dashboard, instead of seeing the same summary/descriptions all over the place.  Maybe some Lorem Ipsum would be appropriate here.
    1. ok, I'll look into something
  15. I think these comments are OK to be in lowercase.  ;)
  16. Line exceeds 80 chars
  17. So you just want the most recent one?
    
    file_diff = cur_diff.files.order_by('timestamp')[0]
    
    oughta do the trick, unless I misunderstand what you're trying to do.
    
    Also, two spaces before inline comment, space after "#".
    1. Yeah I guess I wasn't that clear in this comment.  On occasion files will contain the same diff twice, so I just tried to grab the first entry.  So I suppose which one it takes doesn't matter & if so, I can order_by any field as long as the field exists (since timestamp isn't an available field: (Choices are: binary, comments, dest_detail, dest_file, diff, diffset, id, interdiff_comments, parent_diff, source_file, source_revision, status))  
      
      I'll just use by id.
  18. Space after #, lowercase the comment.
  19. Hm.  We should guarantee that the diffs we provide (if we're providing diffs) have more than 220 lines.
    
    Or, like you say, we need to detect that per diff.  Hrm.  Is there no diffviewer util for this?
    1. Not that I'm aware of, Christian just said to hardcode it in, so we didn't have to look into parsing a diff.  I can take a look around though and see if there is something available.  220 lines was the max size of the smallest of the diffs that I have.
  20. space after , and space on either side of -
  21. space after ,
  22. Hm.  NameError is for when a local or global variable is not defined.  Both output and req_val *should* be defined here...not sure why you need to catch this one, nor pass it silently.
    1. req_val is not necessarily defined, if the user just creates a list of users.  So if: fill-database --users=10, then req_val will fail but I didn't want it to interrupt the output, so just passed it silently.  This output can be completely revamped however we want, I wasn't sure what to output since things like diffs will have multiple values for each review-request.  Ultimately I'd figure we'd just need the usernames, but having some sort of summary might be nice too.
    2. ah my bad on this one, I forgot I redefined req_val in this version, I'll ax this stuff
  23. Maybe provide an example here - like "Example:  --review-requests=2:5"
  24. Again, we'll want to throw an exception in order to rollback the db.
  25. reviewboard/reviews/management/commands/fill-database.py (Diff revision 5)
     
     
     
     
     
     
     
     
     
    More concise / neater option:
    
    return ''.join(random.choice(string.ascii_lowercase) for x in range(5, 8))
  26. reviewboard/reviews/management/commands/fill-database.py (Diff revision 5)
     
     
     
     
     
     
     
     
     
     
    I'd prefer failing early, like:
    
    if not value:
      return 0
    
    if len(value) == 1:
      return value[0]
    
    return random.randrange(value[0], value[1])
  27. 
      
SW
SW
chipx86
  1. 
      
  2. You can import both of these at once.
  3. forms, and then models (alphabetical order on the module level)
  4. reviewboard/reviews/management/commands/fill-database.py (Diff revision 6)
     
     
     
     
     
     
     
     
    You probably should do:
    
    def handle_noargs(self, users=None, review_requests=None, ....., **options):
  5. Most management commands will take a verbosity parameter. That should probably be used. That way things can be done from scripts.
  6. Parameters should be aligned.
  7. Use os.path.join:
    
    repo_dir = os.path.join(os.path.abspath(sys.argv[0]), "..", "scmtools", "testdata", "git_repo")
    
    This ensures it works across platforms.
  8. reviewboard/reviews/management/commands/fill-database.py (Diff revision 6)
     
     
     
     
     
     
     
     
    This is an odd way to do it. I don't really think oyu need to throw an exception and then catch it immediately. Just do the if statement and then print the error.
  9. This should actually be:
    
    raise CommandError('...')
    
    No trailing \n.
    
    Also, that lets you get rid of the exit() below. It means that other commands can wrap this and not terminate early.
  10. reviewboard/reviews/management/commands/fill-database.py (Diff revision 6)
     
     
     
     
     
     
     
    Probably should combine to one statement.
  11. Comments should be in sentence casing with trailing periods.
  12. Same thing about os.path.join
  13. Same about comments. Here and everywhere else.
  14. reviewboard/reviews/management/commands/fill-database.py (Diff revision 6)
     
     
     
     
     
     
     
     
    Same comments as above.
  15. Shouldn't need to? Especially if using os.path.join.
  16. Space after #
  17. This can be simplified as:
    
    files = [f for f in os.listdir(diff_dir)
             if f.endswith('.diff')]
  18. Same comments about comment styles.
  19. reviewboard/reviews/management/commands/fill-database.py (Diff revision 6)
     
     
     
     
     
     
     
     
    Same as above. Just going to assume you'll fix these for the others below.
  20. Parameters should align.
  21. One per line.
    
    We should have different names so we can distinguish when looking at pages.
  22. Should use @example.com, since @email.com might be legit (and @example.com is allowed by some RFC)
  23. Rather than this, we should always use set_password.
    
    The SHA1 isn't good enough, because it depends on the SECRET_KEY.
  24. One per line.
  25. You should be able to leave these out.
  26. One per line.
  27. Should use print, and one per actual line.
    
    Also, instead of casting to strings and concatenating, use Python format strings.
  28. No blank line.
  29. print "Request #%s:" % j
  30. Params should align.
    
    I think None can fit on the same line though?
  31. User.objects.get(pk=1)
    
    Actually, we're doing this in a loop, so let's only fetch this one time throughout the whole lifetime of this program. It'll speed things up a bit.
  32. Format strings.
  33. No blank line.
  34. Same comments about format strings and printing.
  35. 'filename' is usually a string. Use 'f'
  36. Should be able to fit on a line./
  37. No blank line.
  38. Same as above. I'm going to assume you'll go through and take care of the rest.
  39. Can these params not fit after .create( ?
  40. Should do this as infrequently as possible.
  41. Format strings.
  42. reviewboard/reviews/management/commands/fill-database.py (Diff revision 6)
     
     
     
     
     
     
     
     
     
     
     
     
    Kind of icky. Do we really need to do that so often? The inner loop isn't good enough?
  43. reviewboard/reviews/management/commands/fill-database.py (Diff revision 6)
     
     
     
     
     
     
     
     
     
    Should use print for each logical line. That'd be a total of two.
    
    Also, format strings.
  44. Descriptions of functions should be in python docstring format:
    
    def parseCommand(...)
        """One line summary"""
    
    or:
    
        """One line summary
    
        Optional description.
        """
  45. Params should align.
  46. No blank line.
  47. I don't like magic numbers. These should be constants near the top of the file.
  48. We're going to create this every single time this is called, which will be slow. Define it as a constant at the top of the file.
  49. Params should be aligned.
  50. 
      
SW
chipx86
  1. Make sure master is merged in and references were fetched from origin. I'm seeing stray changes in here.
    
    Also, best to keep this just "Fill-database script" or something in the summary. Things shouldn't have a version number until they're officially in, and changing the summary breaks the threading in my e-mail client :)
    
    Once this is updated with master, I'll do another review.
    1. Its really strange, I did just do a fetch from origin & merge just before I started to work on these changes.  But I did it again, there must have been some updates in between or else I updated wrong.
  2. 
      
SW
SW
SW
Review request changed
david
  1. The code looks good, but the testdata/git_repo isn't contained in your patch because it's binary. Can you send me a tar archive of that directory?
  2. 
      
david
  1. Pushed to master as 4e22a3d. Thanks!
  2. 
      
Loading...