Add credentials check tool

Review Request #10223 — Created Oct. 11, 2018 and updated

ammar
ReviewBot
master
4047bee...
reviewbot, students

Previously, ReviewBot did not check for credentials that may have been
accidentally included in the commit. A human reviewer would have to
look out for them, but we hoped to move more of this task's burden
to ReviewBot.

A new Credentials Check tool has been added to ReviewBot which
looks for various key files, other sensitive files and inline embedded
AWS credentials to make sure these are not pushed to the repository.

Manual tests (correctly finds and creates issues on lines with the
credentials, or marks the first line of a file type (including
file types specified from options tab) that should not have been
included e.g. .pem files).

  • 0
  • 0
  • 69
  • 4
  • 73
Description From Last Updated
Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.

flake8

ammar
brennie
  1. 
      
  2. README.rst (Diff revision 2)
     
     

    typo: "credntialscheck"

  3. bot/README.rst (Diff revision 2)
     
     

    typo: "credntialscheck"

  4. Missing module-level docstring

  5. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     
     
     

    Module imports should be formatted as:

    from __future__ import ...
    
    # Python STDLib imports
    
    # 3rd party imports
    
    # Imports from this package
    

    e.g.

    from __future__ import unicode_literals
    
    import re
    
    import chardet
    
    from reviewbot.tools import Tool
    
  6. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     
     
     
     
     

    Instead of having multiple credential regexes, we can make this into a single regular expression:

    compiled_credential_pattern = re.compile(
        '|'.join(
            '(%s)' % pattern
            for pattern in credential_patterns
        )
    )
    
  7. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     

    Single quotes here and throughout

  8. Missing trailing comma.

    This regex doesn't do what you want it to becuase of the leading [, which makes everything up to ] a character this regex will match.

    e.g.

    >>> import re
    >>> x = re.compile(r"[AWS_SECRET_KEY\s*=\s*[A-Za-z0-9/+=]{40}")
    >>> m = x.match('SSSSSSSSSSSS  = %s' % 'A' * 40)
    >>> bool(m)
    True
    
  9. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     

    Instead of doing this here, we should do it in CredentialsCheckTool.__init__ so that theyre not sitting here taking up memory if the tool isn't used. e.g.

    class CredentialsCheckTool(Tool):
        def __init__(self, *args, **kwargs):
            super(CredentialsCheckTool, self).__init__(*args, **kwargs)
            self.compiled_re = [
                re.compile(regex)
                for regex in credential_patterns
            ]
    
  10. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     

    Docstrings should be of the form:

    """Single line summary.
    
    Multi-line description.
    """
    
  11. We use the Oxford comma, so there should be a comma after private keys.

  12. How about just "Review a single file."?

    1. All tools use the "Perform a review of a single file" verbiage. Do we still want to change this?

    2. Nope thats fine!

  13. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     

    Blank line between these.

  14. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     
     

    This will not detect files named id_rsa becuase it is not an extension. You will need a separate set of file names (id_rsa, id_dsa, id_ecdsa) versus file extensions (p12, pem, ppk, key).

    1. >>> file_type = "id_rsa".lower().split(".")[-1]
      >>> file_type
      'id_rsa'
      >>> file_type = "my.key".lower().split(".")[-1]
      >>> file_type
      'key'
      

      file_type actually captures both

    2. Ok ignore this then :)

      BTW you can use os.path.splitext

  15. We should word this as "may be a security risk" because if its a public key --PEM files can be encoded public keys -- it totally isn't.

  16. Comments should be complete sentences: they should begin with a capital letter and end with a period.

  17. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     

    I would organize this as:

    try:
        encoding = chardet.detect(
            f.patched_file_contents)['encoding']
    
        if encoding:
            contents = f.patched_file_contents.decode(
                encoding, 'strict')
        else:
            # We can't do any more for this file.
            return
    except (TypeError, UnicodeError, ValueError):
        return
    
    lines = contents.split('\n')
    
    for line_number, line in enumerate(lines, 1):
        for pattern in compiled_credential_patterns:
            if pattern.match(line):
                f.comment(...)
    

    This lets us use explicit control flow (return) when we are done instead of keeping track of boolean state.

    1. Thanks! This is much cleaner.

  18. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     

    Does bytes.decode(encoding, 'strict') work in Python 3?

    1. Seems to work fine.

      Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
      [GCC 5.4.0 20160609] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> b'100000000'.decode("ascii", "strict")
      '100000000'
      
  19. What exceptions are we hoping to catch? We should be very specific about what we expect so that other exceptions that we aren't expecting aren't also captured.

    Looking at the source of chardet.detect, it looks like it only raises TypeError. bytes.decode(..., 'strict') seems to raise ValueError and UnicodeError.

  20. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     

    Blank line between these.

  21. Since pattern is a compiled regular expression, you can just do pattern.match(line)

  22. bot/setup.py (Diff revision 2)
     
     

    You're going to need to add a entrypoint for your tool.

  23. :file:`.pem`
    :file:`id_rsa`
    
  24. How about:

    This tool is built into ReviewBot. There is no separate installation step required.
    
  25. extension/README.rst (Diff revision 2)
     
     

    Typo: "credntialscheck"

  26. 
      
ammar
brennie
  1. 
      
  2. Can you wrap your description and testing done at 72 Chars?

  3. Is this a WIP? Your description has "WIP: Add options..."

    If this is WIP please put it in the summary

  4. One more blank line here.

  5. Can you add a docstring here?

  6. No blank line here.

  7. bot/setup.py (Diff revision 3)
     
     
     

    Can you wrap these in parens to make it clear that this is supposed to be multiple lines?

    e.g.

    python ('credentialscheck = ' '...'),

    This will help in not accidentally adding a comma there in the future.

  8. 
      
ammar
Review request changed

Description:

~  

Previously, ReviewBot did not check for credentials that may have been accidentally included in the commit. A human reviewer would have to look out for them, but we hoped to move more of this task's burden to ReviewBot.

  ~

Previously, ReviewBot did not check for credentials that may have been

  + accidentally included in the commit. A human reviewer would have to
  + look out for them, but we hoped to move more of this task's burden
  + to ReviewBot.

   
~  

A new Credentials Check tool has been added to ReviewBot which looks for various key files, other sensitive files and inline embedded AWS credentials to make sure these are not pushed to the repository.

~  
~  

WIP: Add options to let users enter their own files to ignore.

  ~

A new Credentials Check tool has been added to ReviewBot which

  ~ looks for various key files, other sensitive files and inline embedded
  ~ AWS credentials to make sure these are not pushed to the repository.

Testing Done:

~  

Manual tests (correctly finds and creates issues on lines with the credentials, or marks the first line of a file type that should not have been included e.g. .pem files)

  ~

Manual tests (correctly finds and creates issues on lines with the

  + credentials, or marks the first line of a file type (including
  + file types specified from options tab) that should not have been
  + included e.g. .pem files).

Commit:

-b988408dc3c6a168c18e64b00c89901d5c612fa7
+ac317d290bcc069e970f67365e879db2e3affba9

Diff:

Revision 4 (+120)

Show changes

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

ammar
david
  1. 
      
  2. bot/reviewbot/tools/credentials_check.py (Diff revision 5)
     
     
     

    It looks like these wouldn't catch cases where the value was enclosed in quotes?

  3. bot/reviewbot/tools/credentials_check.py (Diff revision 5)
     
     
     

    Let's put one per line and sort them all alphabetically.

  4. Should probably be "Including this file ..." (files -> file)

  5. bot/reviewbot/tools/credentials_check.py (Diff revision 5)
     
     
     
     
     
     
     
     
     
     

    We should be able to use regex matching against bytestrings, so we can skip the detection/decoding here. We just need to make sure that the patterns use br'...'

    1. What does br mean/achieve? Cannot seem to find any documentation on it.
      Also, if we skip decoding, how can we mitigate risk of running the regex against image files etc?

    2. An r prefix means "raw string", where a \ character is actually a \ (and doesn't need to be escaped with \\). That's generally useful for regexes that use a lot of \ characters. A b prefix means a bytestring (as opposed to unicode text).

      As far as binary files and such, we currently don't worry about them (because of the way we store diffs). Even when we add more significant binary file support, I don't think you will need to care here.

  6. Do we want to use .search() instead of .match()?

  7. I feel like we should be more verbose about what the problem might be (for example, "Potential disclosure of private Amazon AWS keys")

  8. A link to this needs to be added to docs/reviewbot/tools/index.rst

  9. docs/reviewbot/tools/credentialscheck.rst (Diff revision 5)
     
     
     

    This reads a little funky. How about "Improper credentials can include things such as AWS keys hardcoded in source or private key files."

  10. 
      
ammar
Review request changed

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

ammar
Review request changed

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

ammar
brennie
  1. 
      
  2. README.rst (Diff revision 8)
     
     

    Can you insert this in alphabetical order?

  3. bot/README.rst (Diff revision 8)
     
     

    Can you insert this in alphabetical order?

  4. bot/reviewbot/tools/credentials_check.py (Diff revision 8)
     
     
     
     

    These regexes won't work in a few cases:

    1. Shell scripts with AWS_SECRET_KEY=... i.e., without quotes.
    2. Single-quoted string values

    We can get around this with:

    {
        'AWS_KEY': br'''(?:AWS_KEY|AWS_ACCESS_KEY|AWS_ACCESS_KEY_ID)\s*=\s*(?P<quote>["']?)[A-Z0-9]{20}(?P=quote)'''
    }
    
  5. Single quotes.

  6. Single quotes around AWS_SECRET_KEY

  7. Trailing space, but it should also have a period.

    Also, comma after e.g..

    1. Also, comma after e.g..

      I am not sure what comma you mean by this.

    2. "e.g., pem, key, id_rsa"

  8. Missing args/kwargs

  9. https://docs.python.org/2/library/os.path.html#os.path.splitext

    1. I think current approach seems to end up working nicer.
      With spiltext, we need to maintain two sets instead of one (file names and file extensions) and then check for a given file if the file name is in file names set or file extension in file extensions set.

      credential_file_names = {
          '.aws_credentials'
          'id_dsa',
          'id_ecdsa',
          'id_rsa',
      }
      
      credential_file_types = {
          '.key',
          '.p12',
          '.pem',
          '.ppk',
      }
      

      So we would have to keep these sets and then check:

              if (f.file_type in unsafe_file_types or
                  f.filename in unsafe_file_types):
      

      where unsafe_file_types = credential_file_types + additional file types specified in options. Options just takes in one list currently A comma-separated list of file names and extensions, and from this list I don't see a way to split items into file types or extensions (both could start with a '.'). So we would have to create a second field in options.

      Is there a problem with keeping on using split?

    2. What you have is fine, but please add a comment explaining the reasoning.

  10. bot/reviewbot/tools/credentials_check.py (Diff revision 8)
     
     
     

    .iteritems() is Python2 only. You'll want to do:

    import six 
    
    # ...
    
    for name, pattern in six.iteritems(self.compiled_re):
    
  11. bot/setup.py (Diff revision 8)
     
     

    This should line up with the string above, e.g.

    ('... '
     '... '),
    
  12. bot/setup.py (Diff revision 8)
     
     

    RBTools depends on six but if we're using it directly, we should have our own dependency on it. (I mention using six in another comment).

  13. Trailing whitespace.

  14. docs/reviewbot/tools/index.rst (Diff revision 8)
     
     

    Can you insert this in alphabetical order?

  15. extension/README.rst (Diff revision 8)
     
     

    Can you insert this in alphabetical order?

  16. 
      
alextechcc
  1. 
      
  2. Single quotes here

  3. 
      
ammar
david
  1. 
      
  2. bot/reviewbot/tools/credentials_check.py (Diff revision 9)
     
     
     

    six is a third-party library, so it should go in it's own "section":

    import re
    
    import six
    
    from reviewbot.tools import Tool
    
  3. bot/reviewbot/tools/credentials_check.py (Diff revision 9)
     
     
     
     
     
     
     

    Formatting here could be a little nicer. If you put the parens on their own lines, then the strings will line up a bit nicer. We also don't need to use the triple quotes (better to just escape any inner ' characters), but we do need to have the br prefix on each line, even though they get concatenated.

    'AWS_KEY': (
        br'...'
        br'...'
    ),
    'AWS_SECRET_KEY': (
        br'...'
        br'...'
    ),
    
  4. bot/reviewbot/tools/credentials_check.py (Diff revision 9)
     
     
     
     
     

    This should use six.iteritems. Also, this can use a dict comprehension to do it all in one go:

    super(CredentialsCheckTool, self).__init__()
    
    self.compiled_re = {
        name: re.compile(pattern)
        for name, pattern in six.iteritems(credential_patterns)
    }
    
    1. Thanks! I did not know about dictionary comprehensions.

  5. bot/setup.py (Diff revision 9)
     
     
     

    In this case I think it's probably better to just ignore the line length warning.

  6. ReviewBot -> Review Bot

  7. 
      
ammar
Review request changed

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

ilaw
  1. 
      
  2. bot/reviewbot/tools/credentials_check.py (Diff revision 10)
     
     

    ReviewBot -> Review Bot

  3. bot/setup.py (Diff revision 10)
     
     

    Should be in alphabetical order?

  4. 
      
ammar
Review request changed

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

brennie
  1. 
      
  2. bot/reviewbot/tools/credentials_check.py (Diff revision 11)
     
     
     

    Will this fit as:

                for risk_name, pattern in six.iteritems(
                    self._compiled_re):
                    # ...
    
  3. 
      
ammar
Review request changed

Branch:

-release-1.0.x
+master

Commit:

-544b14a372d66e599399149480919555ce0004d0
+554f3e838d5a44738f97469fe144da2ea6d821ad

Diff:

Revision 12 (+122)

Show changes

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

ammar
Review request changed

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

brennie
  1. Ship It!
  2. 
      
Loading...