• 
      

    Add credentials check tool

    Review Request #10223 — Created Oct. 11, 2018 and discarded

    Information

    ReviewBot
    master

    Reviewers

    Previously, Review Bot did not check for credentials that may have been
    accidentally included in the commit. A human reviewer would have to
    look out for them, but we hoped to move more of this task's burden
    to Review Bot.

    A new Credentials Check tool has been added to Review Bot which
    looks for various key files, other sensitive files and inline embedded
    AWS credentials to make sure these are not pushed to the repository.

    Manual tests (correctly finds and creates issues on lines with the
    credentials, or marks the first line of a file type (including
    file types specified from options tab) that should not have been
    included e.g. .pem files).

    Description From Last Updated

    Can you wrap your description and testing done at 72 Chars?

    brenniebrennie

    Is this a WIP? Your description has "WIP: Add options..." If this is WIP please put it in the summary

    brenniebrennie

    In your change description: "ReviewBot" -> "Review Bot" (3x)

    daviddavid

    E501 line too long (83 > 79 characters)

    reviewbotreviewbot

    E501 line too long (84 > 79 characters)

    reviewbotreviewbot

    E501 line too long (85 > 79 characters)

    reviewbotreviewbot

    E501 line too long (80 > 79 characters)

    reviewbotreviewbot

    E501 line too long (84 > 79 characters)

    reviewbotreviewbot

    E501 line too long (89 > 79 characters)

    reviewbotreviewbot

    typo: "credntialscheck"

    brenniebrennie

    typo: "credntialscheck"

    brenniebrennie

    Missing module-level docstring

    brenniebrennie

    Module imports should be formatted as: from __future__ import ... # Python STDLib imports # 3rd party imports # Imports …

    brenniebrennie

    Instead of having multiple credential regexes, we can make this into a single regular expression: compiled_credential_pattern = re.compile( '|'.join( '(%s)' …

    brenniebrennie

    Single quotes here and throughout

    brenniebrennie

    Missing trailing comma. This regex doesn't do what you want it to becuase of the leading [, which makes everything …

    brenniebrennie

    Instead of doing this here, we should do it in CredentialsCheckTool.__init__ so that theyre not sitting here taking up memory …

    brenniebrennie

    Docstrings should be of the form: """Single line summary. Multi-line description. """

    brenniebrennie

    We use the Oxford comma, so there should be a comma after private keys.

    brenniebrennie

    How about just "Review a single file."?

    brenniebrennie

    Blank line between these.

    brenniebrennie

    This will not detect files named id_rsa becuase it is not an extension. You will need a separate set of …

    brenniebrennie

    We should word this as "may be a security risk" because if its a public key --PEM files can be …

    brenniebrennie

    Comments should be complete sentences: they should begin with a capital letter and end with a period.

    brenniebrennie

    What exceptions are we hoping to catch? We should be very specific about what we expect so that other exceptions …

    brenniebrennie

    Blank line between these.

    brenniebrennie

    Since pattern is a compiled regular expression, you can just do pattern.match(line)

    brenniebrennie

    You're going to need to add a entrypoint for your tool.

    brenniebrennie

    :file:`.pem` :file:`id_rsa`

    brenniebrennie

    How about: This tool is built into ReviewBot. There is no separate installation step required.

    brenniebrennie

    Typo: "credntialscheck"

    brenniebrennie

    One more blank line here.

    brenniebrennie

    Can you add a docstring here?

    brenniebrennie

    No blank line here.

    brenniebrennie

    Can you wrap these in parens to make it clear that this is supposed to be multiple lines? e.g. python …

    brenniebrennie

    E128 continuation line under-indented for visual indent

    reviewbotreviewbot

    It looks like these wouldn't catch cases where the value was enclosed in quotes?

    daviddavid

    Let's put one per line and sort them all alphabetically.

    daviddavid

    Should probably be "Including this file ..." (files -> file)

    daviddavid

    We should be able to use regex matching against bytestrings, so we can skip the detection/decoding here. We just need …

    daviddavid

    Do we want to use .search() instead of .match()?

    daviddavid

    I feel like we should be more verbose about what the problem might be (for example, "Potential disclosure of private …

    daviddavid

    A link to this needs to be added to docs/reviewbot/tools/index.rst

    daviddavid

    This reads a little funky. How about "Improper credentials can include things such as AWS keys hardcoded in source or …

    daviddavid

    E124 closing bracket does not match visual indentation

    reviewbotreviewbot

    E124 closing bracket does not match visual indentation

    reviewbotreviewbot

    Can you insert this in alphabetical order?

    brenniebrennie

    Can you insert this in alphabetical order?

    brenniebrennie

    These regexes won't work in a few cases: Shell scripts with AWS_SECRET_KEY=... i.e., without quotes. Single-quoted string values We can …

    brenniebrennie

    Single quotes.

    brenniebrennie

    Single quotes around AWS_SECRET_KEY

    brenniebrennie

    Single quotes here

    alextechccalextechcc

    Trailing space, but it should also have a period. Also, comma after e.g..

    brenniebrennie

    Missing args/kwargs

    brenniebrennie

    https://docs.python.org/2/library/os.path.html#os.path.splitext

    brenniebrennie

    .iteritems() is Python2 only. You'll want to do: import six # ... for name, pattern in six.iteritems(self.compiled_re):

    brenniebrennie

    This should line up with the string above, e.g. ('... ' '... '),

    brenniebrennie

    RBTools depends on six but if we're using it directly, we should have our own dependency on it. (I mention …

    brenniebrennie

    Trailing whitespace.

    brenniebrennie

    Can you insert this in alphabetical order?

    brenniebrennie

    Can you insert this in alphabetical order?

    brenniebrennie

    six is a third-party library, so it should go in it's own "section": import re import six from reviewbot.tools import …

    daviddavid

    Formatting here could be a little nicer. If you put the parens on their own lines, then the strings will …

    daviddavid

    This should use six.iteritems. Also, this can use a dict comprehension to do it all in one go: super(CredentialsCheckTool, self).__init__() …

    daviddavid

    In this case I think it's probably better to just ignore the line length warning.

    daviddavid

    ReviewBot -> Review Bot

    daviddavid

    ReviewBot -> Review Bot

    ilawilaw

    E501 line too long (88 > 79 characters)

    reviewbotreviewbot

    Should be in alphabetical order?

    ilawilaw

    Will this fit as: for risk_name, pattern in six.iteritems( self._compiled_re): # ...

    brenniebrennie

    E501 line too long (88 > 79 characters)

    reviewbotreviewbot

    F821 undefined name 'compiled_pattern'

    reviewbotreviewbot

    E501 line too long (88 > 79 characters)

    reviewbotreviewbot

    E501 line too long (88 > 79 characters)

    reviewbotreviewbot

    Add another blank line here.

    daviddavid

    E501 line too long (88 > 79 characters)

    reviewbotreviewbot
    Checks run (1 failed, 1 succeeded)
    flake8 failed.
    JSHint passed.

    flake8

    ammar
    brennie
    1. 
        
    2. README.rst (Diff revision 2)
       
       
      Show all issues

      typo: "credntialscheck"

    3. bot/README.rst (Diff revision 2)
       
       
      Show all issues

      typo: "credntialscheck"

    4. Show all issues

      Missing module-level docstring

    5. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
       
       
       
       
       
      Show all issues

      Module imports should be formatted as:

      from __future__ import ...
      
      # Python STDLib imports
      
      # 3rd party imports
      
      # Imports from this package
      

      e.g.

      from __future__ import unicode_literals
      
      import re
      
      import chardet
      
      from reviewbot.tools import Tool
      
    6. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
       
       
       
       
       
       
       
      Show all issues

      Instead of having multiple credential regexes, we can make this into a single regular expression:

      compiled_credential_pattern = re.compile(
          '|'.join(
              '(%s)' % pattern
              for pattern in credential_patterns
          )
      )
      
    7. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
       
       
       
      Show all issues

      Single quotes here and throughout

    8. Show all issues

      Missing trailing comma.

      This regex doesn't do what you want it to becuase of the leading [, which makes everything up to ] a character this regex will match.

      e.g.

      >>> import re
      >>> x = re.compile(r"[AWS_SECRET_KEY\s*=\s*[A-Za-z0-9/+=]{40}")
      >>> m = x.match('SSSSSSSSSSSS  = %s' % 'A' * 40)
      >>> bool(m)
      True
      
    9. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
       
       
       
      Show all issues

      Instead of doing this here, we should do it in CredentialsCheckTool.__init__ so that theyre not sitting here taking up memory if the tool isn't used. e.g.

      class CredentialsCheckTool(Tool):
          def __init__(self, *args, **kwargs):
              super(CredentialsCheckTool, self).__init__(*args, **kwargs)
              self.compiled_re = [
                  re.compile(regex)
                  for regex in credential_patterns
              ]
      
    10. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
       
       
       
      Show all issues

      Docstrings should be of the form:

      """Single line summary.
      
      Multi-line description.
      """
      
    11. Show all issues

      We use the Oxford comma, so there should be a comma after private keys.

    12. Show all issues

      How about just "Review a single file."?

      1. All tools use the "Perform a review of a single file" verbiage. Do we still want to change this?

      2. Nope thats fine!

    13. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
       
       
       
      Show all issues

      Blank line between these.

    14. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
       
       
       
       
      Show all issues

      This will not detect files named id_rsa becuase it is not an extension. You will need a separate set of file names (id_rsa, id_dsa, id_ecdsa) versus file extensions (p12, pem, ppk, key).

      1. >>> file_type = "id_rsa".lower().split(".")[-1]
        >>> file_type
        'id_rsa'
        >>> file_type = "my.key".lower().split(".")[-1]
        >>> file_type
        'key'
        

        file_type actually captures both

      2. Ok ignore this then :)

        BTW you can use os.path.splitext

    15. Show all issues

      We should word this as "may be a security risk" because if its a public key --PEM files can be encoded public keys -- it totally isn't.

    16. Show all issues

      Comments should be complete sentences: they should begin with a capital letter and end with a period.

    17. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       
       

      I would organize this as:

      try:
          encoding = chardet.detect(
              f.patched_file_contents)['encoding']
      
          if encoding:
              contents = f.patched_file_contents.decode(
                  encoding, 'strict')
          else:
              # We can't do any more for this file.
              return
      except (TypeError, UnicodeError, ValueError):
          return
      
      lines = contents.split('\n')
      
      for line_number, line in enumerate(lines, 1):
          for pattern in compiled_credential_patterns:
              if pattern.match(line):
                  f.comment(...)
      

      This lets us use explicit control flow (return) when we are done instead of keeping track of boolean state.

      1. Thanks! This is much cleaner.

    18. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
       
       
       

      Does bytes.decode(encoding, 'strict') work in Python 3?

      1. Seems to work fine.

        Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
        [GCC 5.4.0 20160609] on linux
        Type "help", "copyright", "credits" or "license" for more information.
        >>> b'100000000'.decode("ascii", "strict")
        '100000000'
        
    19. Show all issues

      What exceptions are we hoping to catch? We should be very specific about what we expect so that other exceptions that we aren't expecting aren't also captured.

      Looking at the source of chardet.detect, it looks like it only raises TypeError. bytes.decode(..., 'strict') seems to raise ValueError and UnicodeError.

    20. bot/reviewbot/tools/credentials_check.py (Diff revision 2)
       
       
       
      Show all issues

      Blank line between these.

    21. Show all issues

      Since pattern is a compiled regular expression, you can just do pattern.match(line)

    22. bot/setup.py (Diff revision 2)
       
       
      Show all issues

      You're going to need to add a entrypoint for your tool.

    23. Show all issues

      :file:`.pem`
      :file:`id_rsa`
      
    24. Show all issues

      How about:

      This tool is built into ReviewBot. There is no separate installation step required.
      
    25. extension/README.rst (Diff revision 2)
       
       
      Show all issues

      Typo: "credntialscheck"

    26. 
        
    ammar
    brennie
    1. 
        
    2. Show all issues

      Can you wrap your description and testing done at 72 Chars?

    3. Show all issues

      Is this a WIP? Your description has "WIP: Add options..."

      If this is WIP please put it in the summary

    4. Show all issues

      One more blank line here.

    5. Show all issues

      Can you add a docstring here?

    6. Show all issues

      No blank line here.

    7. bot/setup.py (Diff revision 3)
       
       
       
      Show all issues

      Can you wrap these in parens to make it clear that this is supposed to be multiple lines?

      e.g.

      python ('credentialscheck = ' '...'),

      This will help in not accidentally adding a comma there in the future.

    8. 
        
    ammar
    Review request changed
    Description:
    ~  

    Previously, ReviewBot did not check for credentials that may have been accidentally included in the commit. A human reviewer would have to look out for them, but we hoped to move more of this task's burden to ReviewBot.

      ~

    Previously, ReviewBot did not check for credentials that may have been

      + accidentally included in the commit. A human reviewer would have to
      + look out for them, but we hoped to move more of this task's burden
      + to ReviewBot.

       
    ~  

    A new Credentials Check tool has been added to ReviewBot which looks for various key files, other sensitive files and inline embedded AWS credentials to make sure these are not pushed to the repository.

    ~  
    ~  

    WIP: Add options to let users enter their own files to ignore.

      ~

    A new Credentials Check tool has been added to ReviewBot which

      ~ looks for various key files, other sensitive files and inline embedded
      ~ AWS credentials to make sure these are not pushed to the repository.

    Testing Done:
    ~  

    Manual tests (correctly finds and creates issues on lines with the credentials, or marks the first line of a file type that should not have been included e.g. .pem files)

      ~

    Manual tests (correctly finds and creates issues on lines with the

      + credentials, or marks the first line of a file type (including
      + file types specified from options tab) that should not have been
      + included e.g. .pem files).

    Commit:
    b988408dc3c6a168c18e64b00c89901d5c612fa7
    ac317d290bcc069e970f67365e879db2e3affba9

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    flake8

    ammar
    david
    1. 
        
    2. bot/reviewbot/tools/credentials_check.py (Diff revision 5)
       
       
       
      Show all issues

      It looks like these wouldn't catch cases where the value was enclosed in quotes?

    3. bot/reviewbot/tools/credentials_check.py (Diff revision 5)
       
       
       
      Show all issues

      Let's put one per line and sort them all alphabetically.

    4. Show all issues

      Should probably be "Including this file ..." (files -> file)

    5. bot/reviewbot/tools/credentials_check.py (Diff revision 5)
       
       
       
       
       
       
       
       
       
       
      Show all issues

      We should be able to use regex matching against bytestrings, so we can skip the detection/decoding here. We just need to make sure that the patterns use br'...'

      1. What does br mean/achieve? Cannot seem to find any documentation on it.
        Also, if we skip decoding, how can we mitigate risk of running the regex against image files etc?

      2. An r prefix means "raw string", where a \ character is actually a \ (and doesn't need to be escaped with \\). That's generally useful for regexes that use a lot of \ characters. A b prefix means a bytestring (as opposed to unicode text).

        As far as binary files and such, we currently don't worry about them (because of the way we store diffs). Even when we add more significant binary file support, I don't think you will need to care here.

    6. Show all issues

      Do we want to use .search() instead of .match()?

    7. Show all issues

      I feel like we should be more verbose about what the problem might be (for example, "Potential disclosure of private Amazon AWS keys")

    8. Show all issues

      A link to this needs to be added to docs/reviewbot/tools/index.rst

    9. docs/reviewbot/tools/credentialscheck.rst (Diff revision 5)
       
       
       
      Show all issues

      This reads a little funky. How about "Improper credentials can include things such as AWS keys hardcoded in source or private key files."

    10. 
        
    ammar
    Review request changed
    Commit:
    d5e3af9386608400281a3eb40045da461682af37
    120e7993fdca837f1c8015d2c8f0cb51ccf9cc1d

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    flake8

    ammar
    Review request changed

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    flake8

    ammar
    brennie
    1. 
        
    2. README.rst (Diff revision 8)
       
       
      Show all issues

      Can you insert this in alphabetical order?

    3. bot/README.rst (Diff revision 8)
       
       
      Show all issues

      Can you insert this in alphabetical order?

    4. bot/reviewbot/tools/credentials_check.py (Diff revision 8)
       
       
       
       
      Show all issues

      These regexes won't work in a few cases:

      1. Shell scripts with AWS_SECRET_KEY=... i.e., without quotes.
      2. Single-quoted string values

      We can get around this with:

      {
          'AWS_KEY': br'''(?:AWS_KEY|AWS_ACCESS_KEY|AWS_ACCESS_KEY_ID)\s*=\s*(?P<quote>["']?)[A-Z0-9]{20}(?P=quote)'''
      }
      
    5. Show all issues

      Single quotes.

    6. Show all issues

      Single quotes around AWS_SECRET_KEY

    7. Show all issues

      Trailing space, but it should also have a period.

      Also, comma after e.g..

      1. Also, comma after e.g..

        I am not sure what comma you mean by this.

      2. "e.g., pem, key, id_rsa"

    8. Show all issues

      Missing args/kwargs

    9. Show all issues

      https://docs.python.org/2/library/os.path.html#os.path.splitext

      1. I think current approach seems to end up working nicer.
        With spiltext, we need to maintain two sets instead of one (file names and file extensions) and then check for a given file if the file name is in file names set or file extension in file extensions set.

        credential_file_names = {
            '.aws_credentials'
            'id_dsa',
            'id_ecdsa',
            'id_rsa',
        }
        
        credential_file_types = {
            '.key',
            '.p12',
            '.pem',
            '.ppk',
        }
        

        So we would have to keep these sets and then check:

                if (f.file_type in unsafe_file_types or
                    f.filename in unsafe_file_types):
        

        where unsafe_file_types = credential_file_types + additional file types specified in options. Options just takes in one list currently A comma-separated list of file names and extensions, and from this list I don't see a way to split items into file types or extensions (both could start with a '.'). So we would have to create a second field in options.

        Is there a problem with keeping on using split?

      2. What you have is fine, but please add a comment explaining the reasoning.

    10. bot/reviewbot/tools/credentials_check.py (Diff revision 8)
       
       
       
      Show all issues

      .iteritems() is Python2 only. You'll want to do:

      import six 
      
      # ...
      
      for name, pattern in six.iteritems(self.compiled_re):
      
    11. bot/setup.py (Diff revision 8)
       
       
      Show all issues

      This should line up with the string above, e.g.

      ('... '
       '... '),
      
    12. bot/setup.py (Diff revision 8)
       
       
      Show all issues

      RBTools depends on six but if we're using it directly, we should have our own dependency on it. (I mention using six in another comment).

    13. Show all issues

      Trailing whitespace.

    14. docs/reviewbot/tools/index.rst (Diff revision 8)
       
       
      Show all issues

      Can you insert this in alphabetical order?

    15. extension/README.rst (Diff revision 8)
       
       
      Show all issues

      Can you insert this in alphabetical order?

    16. 
        
    alextechcc
    1. 
        
    2. Show all issues

      Single quotes here

    3. 
        
    ammar
    david
    1. 
        
    2. bot/reviewbot/tools/credentials_check.py (Diff revision 9)
       
       
       
      Show all issues

      six is a third-party library, so it should go in it's own "section":

      import re
      
      import six
      
      from reviewbot.tools import Tool
      
    3. bot/reviewbot/tools/credentials_check.py (Diff revision 9)
       
       
       
       
       
       
       
      Show all issues

      Formatting here could be a little nicer. If you put the parens on their own lines, then the strings will line up a bit nicer. We also don't need to use the triple quotes (better to just escape any inner ' characters), but we do need to have the br prefix on each line, even though they get concatenated.

      'AWS_KEY': (
          br'...'
          br'...'
      ),
      'AWS_SECRET_KEY': (
          br'...'
          br'...'
      ),
      
    4. bot/reviewbot/tools/credentials_check.py (Diff revision 9)
       
       
       
       
       
      Show all issues

      This should use six.iteritems. Also, this can use a dict comprehension to do it all in one go:

      super(CredentialsCheckTool, self).__init__()
      
      self.compiled_re = {
          name: re.compile(pattern)
          for name, pattern in six.iteritems(credential_patterns)
      }
      
      1. Thanks! I did not know about dictionary comprehensions.

    5. bot/setup.py (Diff revision 9)
       
       
       
      Show all issues

      In this case I think it's probably better to just ignore the line length warning.

    6. Show all issues

      ReviewBot -> Review Bot

    7. 
        
    ammar
    Review request changed
    Commit:
    9eb1b74f700678b4b76838bcc7a0fe2ff3a10727
    7cebf8d3cfee0dc7b7a5aa8dea54c2bac753eda6

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    flake8

    ilaw
    1. 
        
    2. bot/reviewbot/tools/credentials_check.py (Diff revision 10)
       
       
      Show all issues

      ReviewBot -> Review Bot

    3. bot/setup.py (Diff revision 10)
       
       
      Show all issues

      Should be in alphabetical order?

    4. 
        
    ammar
    Review request changed
    Commit:
    7cebf8d3cfee0dc7b7a5aa8dea54c2bac753eda6
    544b14a372d66e599399149480919555ce0004d0

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    flake8

    brennie
    1. 
        
    2. bot/reviewbot/tools/credentials_check.py (Diff revision 11)
       
       
       
      Show all issues

      Will this fit as:

                  for risk_name, pattern in six.iteritems(
                      self._compiled_re):
                      # ...
      
    3. 
        
    ammar
    Review request changed
    Branch:
    release-1.0.x
    master
    Commit:
    544b14a372d66e599399149480919555ce0004d0
    554f3e838d5a44738f97469fe144da2ea6d821ad

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    flake8

    ammar
    Review request changed
    Commit:
    554f3e838d5a44738f97469fe144da2ea6d821ad
    4047beed2fa56a129d6e9a9f39d6d687573577f4

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    flake8

    brennie
    1. Ship It!
    2. 
        
    david
    1. Two tiny nits:

    2. Show all issues

      In your change description: "ReviewBot" -> "Review Bot" (3x)

    3. Show all issues

      Add another blank line here.

    4. 
        
    ammar
    Review request changed
    Description:
    ~  

    Previously, ReviewBot did not check for credentials that may have been

      ~

    Previously, Review Bot did not check for credentials that may have been

        accidentally included in the commit. A human reviewer would have to
        look out for them, but we hoped to move more of this task's burden
    ~   to ReviewBot.

      ~ to Review Bot.

       
    ~  

    A new Credentials Check tool has been added to ReviewBot which

      ~

    A new Credentials Check tool has been added to Review Bot which

        looks for various key files, other sensitive files and inline embedded
        AWS credentials to make sure these are not pushed to the repository.

    Commit:
    4047beed2fa56a129d6e9a9f39d6d687573577f4
    d332e509b24207efeaa0b32781ce14c05acd4d08

    Checks run (1 failed, 1 succeeded)

    flake8 failed.
    JSHint passed.

    flake8

    david
    Review request changed
    Status:
    Discarded