Add credentials check tool

Review Request #10223 — Created Oct. 11, 2018 and discarded

Information

ReviewBot
master

Reviewers

Previously, Review Bot did not check for credentials that may have been
accidentally included in the commit. A human reviewer would have to
look out for them, but we hoped to move more of this task's burden
to Review Bot.

A new Credentials Check tool has been added to Review Bot which
looks for various key files, other sensitive files and inline embedded
AWS credentials to make sure these are not pushed to the repository.

Manual tests (correctly finds and creates issues on lines with the
credentials, or marks the first line of a file type (including
file types specified from options tab) that should not have been
included e.g. .pem files).

Description From Last Updated

Can you wrap your description and testing done at 72 Chars?

brenniebrennie

Is this a WIP? Your description has "WIP: Add options..." If this is WIP please put it in the summary

brenniebrennie

In your change description: "ReviewBot" -> "Review Bot" (3x)

daviddavid

E501 line too long (83 > 79 characters)

reviewbotreviewbot

E501 line too long (84 > 79 characters)

reviewbotreviewbot

E501 line too long (85 > 79 characters)

reviewbotreviewbot

E501 line too long (80 > 79 characters)

reviewbotreviewbot

E501 line too long (84 > 79 characters)

reviewbotreviewbot

E501 line too long (89 > 79 characters)

reviewbotreviewbot

typo: "credntialscheck"

brenniebrennie

typo: "credntialscheck"

brenniebrennie

Missing module-level docstring

brenniebrennie

Module imports should be formatted as: from __future__ import ... # Python STDLib imports # 3rd party imports # Imports …

brenniebrennie

Instead of having multiple credential regexes, we can make this into a single regular expression: compiled_credential_pattern = re.compile( '|'.join( '(%s)' …

brenniebrennie

Single quotes here and throughout

brenniebrennie

Missing trailing comma. This regex doesn't do what you want it to becuase of the leading [, which makes everything …

brenniebrennie

Instead of doing this here, we should do it in CredentialsCheckTool.__init__ so that theyre not sitting here taking up memory …

brenniebrennie

Docstrings should be of the form: """Single line summary. Multi-line description. """

brenniebrennie

We use the Oxford comma, so there should be a comma after private keys.

brenniebrennie

How about just "Review a single file."?

brenniebrennie

Blank line between these.

brenniebrennie

This will not detect files named id_rsa becuase it is not an extension. You will need a separate set of …

brenniebrennie

We should word this as "may be a security risk" because if its a public key --PEM files can be …

brenniebrennie

Comments should be complete sentences: they should begin with a capital letter and end with a period.

brenniebrennie

What exceptions are we hoping to catch? We should be very specific about what we expect so that other exceptions …

brenniebrennie

Blank line between these.

brenniebrennie

Since pattern is a compiled regular expression, you can just do pattern.match(line)

brenniebrennie

You're going to need to add a entrypoint for your tool.

brenniebrennie

:file:`.pem` :file:`id_rsa`

brenniebrennie

How about: This tool is built into ReviewBot. There is no separate installation step required.

brenniebrennie

Typo: "credntialscheck"

brenniebrennie

One more blank line here.

brenniebrennie

Can you add a docstring here?

brenniebrennie

No blank line here.

brenniebrennie

Can you wrap these in parens to make it clear that this is supposed to be multiple lines? e.g. python …

brenniebrennie

E128 continuation line under-indented for visual indent

reviewbotreviewbot

It looks like these wouldn't catch cases where the value was enclosed in quotes?

daviddavid

Let's put one per line and sort them all alphabetically.

daviddavid

Should probably be "Including this file ..." (files -> file)

daviddavid

We should be able to use regex matching against bytestrings, so we can skip the detection/decoding here. We just need …

daviddavid

Do we want to use .search() instead of .match()?

daviddavid

I feel like we should be more verbose about what the problem might be (for example, "Potential disclosure of private …

daviddavid

A link to this needs to be added to docs/reviewbot/tools/index.rst

daviddavid

This reads a little funky. How about "Improper credentials can include things such as AWS keys hardcoded in source or …

daviddavid

E124 closing bracket does not match visual indentation

reviewbotreviewbot

E124 closing bracket does not match visual indentation

reviewbotreviewbot

Can you insert this in alphabetical order?

brenniebrennie

Can you insert this in alphabetical order?

brenniebrennie

These regexes won't work in a few cases: Shell scripts with AWS_SECRET_KEY=... i.e., without quotes. Single-quoted string values We can …

brenniebrennie

Single quotes.

brenniebrennie

Single quotes around AWS_SECRET_KEY

brenniebrennie

Single quotes here

alextechccalextechcc

Trailing space, but it should also have a period. Also, comma after e.g..

brenniebrennie

Missing args/kwargs

brenniebrennie

https://docs.python.org/2/library/os.path.html#os.path.splitext

brenniebrennie

.iteritems() is Python2 only. You'll want to do: import six # ... for name, pattern in six.iteritems(self.compiled_re):

brenniebrennie

This should line up with the string above, e.g. ('... ' '... '),

brenniebrennie

RBTools depends on six but if we're using it directly, we should have our own dependency on it. (I mention …

brenniebrennie

Trailing whitespace.

brenniebrennie

Can you insert this in alphabetical order?

brenniebrennie

Can you insert this in alphabetical order?

brenniebrennie

six is a third-party library, so it should go in it's own "section": import re import six from reviewbot.tools import …

daviddavid

Formatting here could be a little nicer. If you put the parens on their own lines, then the strings will …

daviddavid

This should use six.iteritems. Also, this can use a dict comprehension to do it all in one go: super(CredentialsCheckTool, self).__init__() …

daviddavid

In this case I think it's probably better to just ignore the line length warning.

daviddavid

ReviewBot -> Review Bot

daviddavid

ReviewBot -> Review Bot

ilawilaw

E501 line too long (88 > 79 characters)

reviewbotreviewbot

Should be in alphabetical order?

ilawilaw

Will this fit as: for risk_name, pattern in six.iteritems( self._compiled_re): # ...

brenniebrennie

E501 line too long (88 > 79 characters)

reviewbotreviewbot

F821 undefined name 'compiled_pattern'

reviewbotreviewbot

E501 line too long (88 > 79 characters)

reviewbotreviewbot

E501 line too long (88 > 79 characters)

reviewbotreviewbot

Add another blank line here.

daviddavid

E501 line too long (88 > 79 characters)

reviewbotreviewbot
Checks run (1 failed, 1 succeeded)
flake8 failed.
JSHint passed.

flake8

ammar
brennie
  1. 
      
  2. README.rst (Diff revision 2)
     
     
    Show all issues

    typo: "credntialscheck"

  3. bot/README.rst (Diff revision 2)
     
     
    Show all issues

    typo: "credntialscheck"

  4. Show all issues

    Missing module-level docstring

  5. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     
     
     
    Show all issues

    Module imports should be formatted as:

    from __future__ import ...
    
    # Python STDLib imports
    
    # 3rd party imports
    
    # Imports from this package
    

    e.g.

    from __future__ import unicode_literals
    
    import re
    
    import chardet
    
    from reviewbot.tools import Tool
    
  6. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     
     
     
     
     
    Show all issues

    Instead of having multiple credential regexes, we can make this into a single regular expression:

    compiled_credential_pattern = re.compile(
        '|'.join(
            '(%s)' % pattern
            for pattern in credential_patterns
        )
    )
    
  7. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     
    Show all issues

    Single quotes here and throughout

  8. Show all issues

    Missing trailing comma.

    This regex doesn't do what you want it to becuase of the leading [, which makes everything up to ] a character this regex will match.

    e.g.

    >>> import re
    >>> x = re.compile(r"[AWS_SECRET_KEY\s*=\s*[A-Za-z0-9/+=]{40}")
    >>> m = x.match('SSSSSSSSSSSS  = %s' % 'A' * 40)
    >>> bool(m)
    True
    
  9. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     
    Show all issues

    Instead of doing this here, we should do it in CredentialsCheckTool.__init__ so that theyre not sitting here taking up memory if the tool isn't used. e.g.

    class CredentialsCheckTool(Tool):
        def __init__(self, *args, **kwargs):
            super(CredentialsCheckTool, self).__init__(*args, **kwargs)
            self.compiled_re = [
                re.compile(regex)
                for regex in credential_patterns
            ]
    
  10. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     
    Show all issues

    Docstrings should be of the form:

    """Single line summary.
    
    Multi-line description.
    """
    
  11. Show all issues

    We use the Oxford comma, so there should be a comma after private keys.

  12. Show all issues

    How about just "Review a single file."?

    1. All tools use the "Perform a review of a single file" verbiage. Do we still want to change this?

    2. Nope thats fine!

  13. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     
    Show all issues

    Blank line between these.

  14. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     
     
    Show all issues

    This will not detect files named id_rsa becuase it is not an extension. You will need a separate set of file names (id_rsa, id_dsa, id_ecdsa) versus file extensions (p12, pem, ppk, key).

    1. >>> file_type = "id_rsa".lower().split(".")[-1]
      >>> file_type
      'id_rsa'
      >>> file_type = "my.key".lower().split(".")[-1]
      >>> file_type
      'key'
      

      file_type actually captures both

    2. Ok ignore this then :)

      BTW you can use os.path.splitext

  15. Show all issues

    We should word this as "may be a security risk" because if its a public key --PEM files can be encoded public keys -- it totally isn't.

  16. Show all issues

    Comments should be complete sentences: they should begin with a capital letter and end with a period.

  17. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     

    I would organize this as:

    try:
        encoding = chardet.detect(
            f.patched_file_contents)['encoding']
    
        if encoding:
            contents = f.patched_file_contents.decode(
                encoding, 'strict')
        else:
            # We can't do any more for this file.
            return
    except (TypeError, UnicodeError, ValueError):
        return
    
    lines = contents.split('\n')
    
    for line_number, line in enumerate(lines, 1):
        for pattern in compiled_credential_patterns:
            if pattern.match(line):
                f.comment(...)
    

    This lets us use explicit control flow (return) when we are done instead of keeping track of boolean state.

    1. Thanks! This is much cleaner.

  18. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     

    Does bytes.decode(encoding, 'strict') work in Python 3?

    1. Seems to work fine.

      Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
      [GCC 5.4.0 20160609] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> b'100000000'.decode("ascii", "strict")
      '100000000'
      
  19. Show all issues

    What exceptions are we hoping to catch? We should be very specific about what we expect so that other exceptions that we aren't expecting aren't also captured.

    Looking at the source of chardet.detect, it looks like it only raises TypeError. bytes.decode(..., 'strict') seems to raise ValueError and UnicodeError.

  20. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
     
     
     
    Show all issues

    Blank line between these.

  21. Show all issues

    Since pattern is a compiled regular expression, you can just do pattern.match(line)

  22. bot/setup.py (Diff revision 2)
     
     
    Show all issues

    You're going to need to add a entrypoint for your tool.

  23. Show all issues

    :file:`.pem`
    :file:`id_rsa`
    
  24. Show all issues

    How about:

    This tool is built into ReviewBot. There is no separate installation step required.
    
  25. extension/README.rst (Diff revision 2)
     
     
    Show all issues

    Typo: "credntialscheck"

  26. 
      
ammar
brennie
  1. 
      
  2. Show all issues

    Can you wrap your description and testing done at 72 Chars?

  3. Show all issues

    Is this a WIP? Your description has "WIP: Add options..."

    If this is WIP please put it in the summary

  4. Show all issues

    One more blank line here.

  5. Show all issues

    Can you add a docstring here?

  6. Show all issues

    No blank line here.

  7. bot/setup.py (Diff revision 3)
     
     
     
    Show all issues

    Can you wrap these in parens to make it clear that this is supposed to be multiple lines?

    e.g.

    python ('credentialscheck = ' '...'),

    This will help in not accidentally adding a comma there in the future.

  8. 
      
ammar
Review request changed
Description:
~  

Previously, ReviewBot did not check for credentials that may have been accidentally included in the commit. A human reviewer would have to look out for them, but we hoped to move more of this task's burden to ReviewBot.

  ~

Previously, ReviewBot did not check for credentials that may have been

  + accidentally included in the commit. A human reviewer would have to
  + look out for them, but we hoped to move more of this task's burden
  + to ReviewBot.

   
~  

A new Credentials Check tool has been added to ReviewBot which looks for various key files, other sensitive files and inline embedded AWS credentials to make sure these are not pushed to the repository.

~  
~  

WIP: Add options to let users enter their own files to ignore.

  ~

A new Credentials Check tool has been added to ReviewBot which

  ~ looks for various key files, other sensitive files and inline embedded
  ~ AWS credentials to make sure these are not pushed to the repository.

Testing Done:
~  

Manual tests (correctly finds and creates issues on lines with the credentials, or marks the first line of a file type that should not have been included e.g. .pem files)

  ~

Manual tests (correctly finds and creates issues on lines with the

  + credentials, or marks the first line of a file type (including
  + file types specified from options tab) that should not have been
  + included e.g. .pem files).

Commit:
b988408dc3c6a168c18e64b00c89901d5c612fa7
ac317d290bcc069e970f67365e879db2e3affba9

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

ammar
david
  1. 
      
  2. bot/reviewbot/tools/credentials_check.py (Diff revision 5)
     
     
     
    Show all issues

    It looks like these wouldn't catch cases where the value was enclosed in quotes?

  3. bot/reviewbot/tools/credentials_check.py (Diff revision 5)
     
     
     
    Show all issues

    Let's put one per line and sort them all alphabetically.

  4. Show all issues

    Should probably be "Including this file ..." (files -> file)

  5. bot/reviewbot/tools/credentials_check.py (Diff revision 5)
     
     
     
     
     
     
     
     
     
     
    Show all issues

    We should be able to use regex matching against bytestrings, so we can skip the detection/decoding here. We just need to make sure that the patterns use br'...'

    1. What does br mean/achieve? Cannot seem to find any documentation on it.
      Also, if we skip decoding, how can we mitigate risk of running the regex against image files etc?

    2. An r prefix means "raw string", where a \ character is actually a \ (and doesn't need to be escaped with \\). That's generally useful for regexes that use a lot of \ characters. A b prefix means a bytestring (as opposed to unicode text).

      As far as binary files and such, we currently don't worry about them (because of the way we store diffs). Even when we add more significant binary file support, I don't think you will need to care here.

  6. Show all issues

    Do we want to use .search() instead of .match()?

  7. Show all issues

    I feel like we should be more verbose about what the problem might be (for example, "Potential disclosure of private Amazon AWS keys")

  8. Show all issues

    A link to this needs to be added to docs/reviewbot/tools/index.rst

  9. docs/reviewbot/tools/credentialscheck.rst (Diff revision 5)
     
     
     
    Show all issues

    This reads a little funky. How about "Improper credentials can include things such as AWS keys hardcoded in source or private key files."

  10. 
      
ammar
Review request changed
Commit:
d5e3af9386608400281a3eb40045da461682af37
120e7993fdca837f1c8015d2c8f0cb51ccf9cc1d

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

ammar
Review request changed

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

ammar
brennie
  1. 
      
  2. README.rst (Diff revision 8)
     
     
    Show all issues

    Can you insert this in alphabetical order?

  3. bot/README.rst (Diff revision 8)
     
     
    Show all issues

    Can you insert this in alphabetical order?

  4. bot/reviewbot/tools/credentials_check.py (Diff revision 8)
     
     
     
     
    Show all issues

    These regexes won't work in a few cases:

    1. Shell scripts with AWS_SECRET_KEY=... i.e., without quotes.
    2. Single-quoted string values

    We can get around this with:

    {
        'AWS_KEY': br'''(?:AWS_KEY|AWS_ACCESS_KEY|AWS_ACCESS_KEY_ID)\s*=\s*(?P<quote>["']?)[A-Z0-9]{20}(?P=quote)'''
    }
    
  5. Show all issues

    Single quotes.

  6. Show all issues

    Single quotes around AWS_SECRET_KEY

  7. Show all issues

    Trailing space, but it should also have a period.

    Also, comma after e.g..

    1. Also, comma after e.g..

      I am not sure what comma you mean by this.

    2. "e.g., pem, key, id_rsa"

  8. Show all issues

    Missing args/kwargs

  9. Show all issues

    https://docs.python.org/2/library/os.path.html#os.path.splitext

    1. I think current approach seems to end up working nicer.
      With spiltext, we need to maintain two sets instead of one (file names and file extensions) and then check for a given file if the file name is in file names set or file extension in file extensions set.

      credential_file_names = {
          '.aws_credentials'
          'id_dsa',
          'id_ecdsa',
          'id_rsa',
      }
      
      credential_file_types = {
          '.key',
          '.p12',
          '.pem',
          '.ppk',
      }
      

      So we would have to keep these sets and then check:

              if (f.file_type in unsafe_file_types or
                  f.filename in unsafe_file_types):
      

      where unsafe_file_types = credential_file_types + additional file types specified in options. Options just takes in one list currently A comma-separated list of file names and extensions, and from this list I don't see a way to split items into file types or extensions (both could start with a '.'). So we would have to create a second field in options.

      Is there a problem with keeping on using split?

    2. What you have is fine, but please add a comment explaining the reasoning.

  10. bot/reviewbot/tools/credentials_check.py (Diff revision 8)
     
     
     
    Show all issues

    .iteritems() is Python2 only. You'll want to do:

    import six 
    
    # ...
    
    for name, pattern in six.iteritems(self.compiled_re):
    
  11. bot/setup.py (Diff revision 8)
     
     
    Show all issues

    This should line up with the string above, e.g.

    ('... '
     '... '),
    
  12. bot/setup.py (Diff revision 8)
     
     
    Show all issues

    RBTools depends on six but if we're using it directly, we should have our own dependency on it. (I mention using six in another comment).

  13. Show all issues

    Trailing whitespace.

  14. docs/reviewbot/tools/index.rst (Diff revision 8)
     
     
    Show all issues

    Can you insert this in alphabetical order?

  15. extension/README.rst (Diff revision 8)
     
     
    Show all issues

    Can you insert this in alphabetical order?

  16. 
      
alextechcc
  1. 
      
  2. Show all issues

    Single quotes here

  3. 
      
ammar
david
  1. 
      
  2. bot/reviewbot/tools/credentials_check.py (Diff revision 9)
     
     
     
    Show all issues

    six is a third-party library, so it should go in it's own "section":

    import re
    
    import six
    
    from reviewbot.tools import Tool
    
  3. bot/reviewbot/tools/credentials_check.py (Diff revision 9)
     
     
     
     
     
     
     
    Show all issues

    Formatting here could be a little nicer. If you put the parens on their own lines, then the strings will line up a bit nicer. We also don't need to use the triple quotes (better to just escape any inner ' characters), but we do need to have the br prefix on each line, even though they get concatenated.

    'AWS_KEY': (
        br'...'
        br'...'
    ),
    'AWS_SECRET_KEY': (
        br'...'
        br'...'
    ),
    
  4. bot/reviewbot/tools/credentials_check.py (Diff revision 9)
     
     
     
     
     
    Show all issues

    This should use six.iteritems. Also, this can use a dict comprehension to do it all in one go:

    super(CredentialsCheckTool, self).__init__()
    
    self.compiled_re = {
        name: re.compile(pattern)
        for name, pattern in six.iteritems(credential_patterns)
    }
    
    1. Thanks! I did not know about dictionary comprehensions.

  5. bot/setup.py (Diff revision 9)
     
     
     
    Show all issues

    In this case I think it's probably better to just ignore the line length warning.

  6. Show all issues

    ReviewBot -> Review Bot

  7. 
      
ammar
Review request changed
Commit:
9eb1b74f700678b4b76838bcc7a0fe2ff3a10727
7cebf8d3cfee0dc7b7a5aa8dea54c2bac753eda6

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

ilaw
  1. 
      
  2. bot/reviewbot/tools/credentials_check.py (Diff revision 10)
     
     
    Show all issues

    ReviewBot -> Review Bot

  3. bot/setup.py (Diff revision 10)
     
     
    Show all issues

    Should be in alphabetical order?

  4. 
      
ammar
Review request changed
Commit:
7cebf8d3cfee0dc7b7a5aa8dea54c2bac753eda6
544b14a372d66e599399149480919555ce0004d0

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

brennie
  1. 
      
  2. bot/reviewbot/tools/credentials_check.py (Diff revision 11)
     
     
     
    Show all issues

    Will this fit as:

                for risk_name, pattern in six.iteritems(
                    self._compiled_re):
                    # ...
    
  3. 
      
ammar
Review request changed
Branch:
release-1.0.x
master
Commit:
544b14a372d66e599399149480919555ce0004d0
554f3e838d5a44738f97469fe144da2ea6d821ad

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

ammar
Review request changed
Commit:
554f3e838d5a44738f97469fe144da2ea6d821ad
4047beed2fa56a129d6e9a9f39d6d687573577f4

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

brennie
  1. Ship It!
  2. 
      
david
  1. Two tiny nits:

  2. Show all issues

    In your change description: "ReviewBot" -> "Review Bot" (3x)

  3. Show all issues

    Add another blank line here.

  4. 
      
ammar
Review request changed
Description:
~  

Previously, ReviewBot did not check for credentials that may have been

  ~

Previously, Review Bot did not check for credentials that may have been

    accidentally included in the commit. A human reviewer would have to
    look out for them, but we hoped to move more of this task's burden
~   to ReviewBot.

  ~ to Review Bot.

   
~  

A new Credentials Check tool has been added to ReviewBot which

  ~

A new Credentials Check tool has been added to Review Bot which

    looks for various key files, other sensitive files and inline embedded
    AWS credentials to make sure these are not pushed to the repository.

Commit:
4047beed2fa56a129d6e9a9f39d6d687573577f4
d332e509b24207efeaa0b32781ce14c05acd4d08

Checks run (1 failed, 1 succeeded)

flake8 failed.
JSHint passed.

flake8

david
Review request changed
Status:
Discarded