Add credentials check tool
Review Request #10223 — Created Oct. 11, 2018 and discarded
Previously, Review Bot did not check for credentials that may have been
accidentally included in the commit. A human reviewer would have to
look out for them, but we hoped to move more of this task's burden
to Review Bot.A new Credentials Check tool has been added to Review Bot which
looks for various key files, other sensitive files and inline embedded
AWS credentials to make sure these are not pushed to the repository.
Manual tests (correctly finds and creates issues on lines with the
credentials, or marks the first line of a file type (including
file types specified from options tab) that should not have been
included e.g. .pem files).
Description | From | Last Updated |
---|---|---|
Can you wrap your description and testing done at 72 Chars? |
brennie | |
Is this a WIP? Your description has "WIP: Add options..." If this is WIP please put it in the summary |
brennie | |
In your change description: "ReviewBot" -> "Review Bot" (3x) |
david | |
E501 line too long (83 > 79 characters) |
reviewbot | |
E501 line too long (84 > 79 characters) |
reviewbot | |
E501 line too long (85 > 79 characters) |
reviewbot | |
E501 line too long (80 > 79 characters) |
reviewbot | |
E501 line too long (84 > 79 characters) |
reviewbot | |
E501 line too long (89 > 79 characters) |
reviewbot | |
typo: "credntialscheck" |
brennie | |
typo: "credntialscheck" |
brennie | |
Missing module-level docstring |
brennie | |
Module imports should be formatted as: from __future__ import ... # Python STDLib imports # 3rd party imports # Imports … |
brennie | |
Instead of having multiple credential regexes, we can make this into a single regular expression: compiled_credential_pattern = re.compile( '|'.join( '(%s)' … |
brennie | |
Single quotes here and throughout |
brennie | |
Missing trailing comma. This regex doesn't do what you want it to becuase of the leading [, which makes everything … |
brennie | |
Instead of doing this here, we should do it in CredentialsCheckTool.__init__ so that theyre not sitting here taking up memory … |
brennie | |
Docstrings should be of the form: """Single line summary. Multi-line description. """ |
brennie | |
We use the Oxford comma, so there should be a comma after private keys. |
brennie | |
How about just "Review a single file."? |
brennie | |
Blank line between these. |
brennie | |
This will not detect files named id_rsa becuase it is not an extension. You will need a separate set of … |
brennie | |
We should word this as "may be a security risk" because if its a public key --PEM files can be … |
brennie | |
Comments should be complete sentences: they should begin with a capital letter and end with a period. |
brennie | |
What exceptions are we hoping to catch? We should be very specific about what we expect so that other exceptions … |
brennie | |
Blank line between these. |
brennie | |
Since pattern is a compiled regular expression, you can just do pattern.match(line) |
brennie | |
You're going to need to add a entrypoint for your tool. |
brennie | |
:file:`.pem` :file:`id_rsa` |
brennie | |
How about: This tool is built into ReviewBot. There is no separate installation step required. |
brennie | |
Typo: "credntialscheck" |
brennie | |
One more blank line here. |
brennie | |
Can you add a docstring here? |
brennie | |
No blank line here. |
brennie | |
Can you wrap these in parens to make it clear that this is supposed to be multiple lines? e.g. python … |
brennie | |
E128 continuation line under-indented for visual indent |
reviewbot | |
It looks like these wouldn't catch cases where the value was enclosed in quotes? |
david | |
Let's put one per line and sort them all alphabetically. |
david | |
Should probably be "Including this file ..." (files -> file) |
david | |
We should be able to use regex matching against bytestrings, so we can skip the detection/decoding here. We just need … |
david | |
Do we want to use .search() instead of .match()? |
david | |
I feel like we should be more verbose about what the problem might be (for example, "Potential disclosure of private … |
david | |
A link to this needs to be added to docs/reviewbot/tools/index.rst |
david | |
This reads a little funky. How about "Improper credentials can include things such as AWS keys hardcoded in source or … |
david | |
E124 closing bracket does not match visual indentation |
reviewbot | |
E124 closing bracket does not match visual indentation |
reviewbot | |
Can you insert this in alphabetical order? |
brennie | |
Can you insert this in alphabetical order? |
brennie | |
These regexes won't work in a few cases: Shell scripts with AWS_SECRET_KEY=... i.e., without quotes. Single-quoted string values We can … |
brennie | |
Single quotes. |
brennie | |
Single quotes around AWS_SECRET_KEY |
brennie | |
Single quotes here |
alextechcc | |
Trailing space, but it should also have a period. Also, comma after e.g.. |
brennie | |
Missing args/kwargs |
brennie | |
https://docs.python.org/2/library/os.path.html#os.path.splitext |
brennie | |
.iteritems() is Python2 only. You'll want to do: import six # ... for name, pattern in six.iteritems(self.compiled_re): |
brennie | |
This should line up with the string above, e.g. ('... ' '... '), |
brennie | |
RBTools depends on six but if we're using it directly, we should have our own dependency on it. (I mention … |
brennie | |
Trailing whitespace. |
brennie | |
Can you insert this in alphabetical order? |
brennie | |
Can you insert this in alphabetical order? |
brennie | |
six is a third-party library, so it should go in it's own "section": import re import six from reviewbot.tools import … |
david | |
Formatting here could be a little nicer. If you put the parens on their own lines, then the strings will … |
david | |
This should use six.iteritems. Also, this can use a dict comprehension to do it all in one go: super(CredentialsCheckTool, self).__init__() … |
david | |
In this case I think it's probably better to just ignore the line length warning. |
david | |
ReviewBot -> Review Bot |
david | |
ReviewBot -> Review Bot |
ilaw | |
E501 line too long (88 > 79 characters) |
reviewbot | |
Should be in alphabetical order? |
ilaw | |
Will this fit as: for risk_name, pattern in six.iteritems( self._compiled_re): # ... |
brennie | |
E501 line too long (88 > 79 characters) |
reviewbot | |
F821 undefined name 'compiled_pattern' |
reviewbot | |
E501 line too long (88 > 79 characters) |
reviewbot | |
E501 line too long (88 > 79 characters) |
reviewbot | |
Add another blank line here. |
david | |
E501 line too long (88 > 79 characters) |
reviewbot |
- Commit:
-
ac44d7ecf125fc206d0459a16ce2b3c81d7bf6a7b872485315ddf3a609b1cd467fc1e958a24586a4
Checks run (2 succeeded)
-
-
-
-
-
Module imports should be formatted as:
from __future__ import ... # Python STDLib imports # 3rd party imports # Imports from this package
e.g.
from __future__ import unicode_literals import re import chardet from reviewbot.tools import Tool
-
Instead of having multiple credential regexes, we can make this into a single regular expression:
compiled_credential_pattern = re.compile( '|'.join( '(%s)' % pattern for pattern in credential_patterns ) )
-
-
Missing trailing comma.
This regex doesn't do what you want it to becuase of the leading
[
, which makes everything up to]
a character this regex will match.e.g.
>>> import re >>> x = re.compile(r"[AWS_SECRET_KEY\s*=\s*[A-Za-z0-9/+=]{40}") >>> m = x.match('SSSSSSSSSSSS = %s' % 'A' * 40) >>> bool(m) True
-
Instead of doing this here, we should do it in
CredentialsCheckTool.__init__
so that theyre not sitting here taking up memory if the tool isn't used. e.g.class CredentialsCheckTool(Tool): def __init__(self, *args, **kwargs): super(CredentialsCheckTool, self).__init__(*args, **kwargs) self.compiled_re = [ re.compile(regex) for regex in credential_patterns ]
-
-
-
-
-
This will not detect files named
id_rsa
becuase it is not an extension. You will need a separate set of file names (id_rsa
,id_dsa
,id_ecdsa
) versus file extensions (p12
,pem
,ppk
,key
). -
We should word this as "may be a security risk" because if its a public key --PEM files can be encoded public keys -- it totally isn't.
-
Comments should be complete sentences: they should begin with a capital letter and end with a period.
-
I would organize this as:
try: encoding = chardet.detect( f.patched_file_contents)['encoding'] if encoding: contents = f.patched_file_contents.decode( encoding, 'strict') else: # We can't do any more for this file. return except (TypeError, UnicodeError, ValueError): return lines = contents.split('\n') for line_number, line in enumerate(lines, 1): for pattern in compiled_credential_patterns: if pattern.match(line): f.comment(...)
This lets us use explicit control flow (return) when we are done instead of keeping track of boolean state.
-
-
What exceptions are we hoping to catch? We should be very specific about what we expect so that other exceptions that we aren't expecting aren't also captured.
Looking at the source of
chardet.detect
, it looks like it only raisesTypeError
.bytes.decode(..., 'strict')
seems to raiseValueError
andUnicodeError
. -
-
-
-
-
-
- Commit:
-
b872485315ddf3a609b1cd467fc1e958a24586a4b988408dc3c6a168c18e64b00c89901d5c612fa7
Checks run (2 succeeded)
-
-
-
Is this a WIP? Your description has "WIP: Add options..."
If this is WIP please put it in the summary
-
-
-
-
Can you wrap these in parens to make it clear that this is supposed to be multiple lines?
e.g.
python ('credentialscheck = ' '...'),
This will help in not accidentally adding a comma there in the future.
- Description:
-
~ Previously, ReviewBot did not check for credentials that may have been accidentally included in the commit. A human reviewer would have to look out for them, but we hoped to move more of this task's burden to ReviewBot.
~ Previously, ReviewBot did not check for credentials that may have been
+ accidentally included in the commit. A human reviewer would have to + look out for them, but we hoped to move more of this task's burden + to ReviewBot. ~ A new Credentials Check tool has been added to ReviewBot which looks for various key files, other sensitive files and inline embedded AWS credentials to make sure these are not pushed to the repository.
~ ~ WIP: Add options to let users enter their own files to ignore.
~ A new Credentials Check tool has been added to ReviewBot which
~ looks for various key files, other sensitive files and inline embedded ~ AWS credentials to make sure these are not pushed to the repository. - Testing Done:
-
~ Manual tests (correctly finds and creates issues on lines with the credentials, or marks the first line of a file type that should not have been included e.g. .pem files)
~ Manual tests (correctly finds and creates issues on lines with the
+ credentials, or marks the first line of a file type (including + file types specified from options tab) that should not have been + included e.g. .pem files). - Commit:
-
b988408dc3c6a168c18e64b00c89901d5c612fa7ac317d290bcc069e970f67365e879db2e3affba9
- Commit:
-
ac317d290bcc069e970f67365e879db2e3affba9d5e3af9386608400281a3eb40045da461682af37
Checks run (2 succeeded)
-
-
-
-
-
We should be able to use regex matching against bytestrings, so we can skip the detection/decoding here. We just need to make sure that the patterns use
br'...'
-
-
I feel like we should be more verbose about what the problem might be (for example, "Potential disclosure of private Amazon AWS keys")
-
-
This reads a little funky. How about "Improper credentials can include things such as AWS keys hardcoded in source or private key files."
- Commit:
-
120e7993fdca837f1c8015d2c8f0cb51ccf9cc1dfec83230768108490c64c22505158cf99e7db9b6
Checks run (2 succeeded)
-
-
-
-
These regexes won't work in a few cases:
- Shell scripts with
AWS_SECRET_KEY=...
i.e., without quotes. - Single-quoted string values
We can get around this with:
{ 'AWS_KEY': br'''(?:AWS_KEY|AWS_ACCESS_KEY|AWS_ACCESS_KEY_ID)\s*=\s*(?P<quote>["']?)[A-Z0-9]{20}(?P=quote)''' }
- Shell scripts with
-
-
-
-
-
-
.iteritems()
is Python2 only. You'll want to do:import six # ... for name, pattern in six.iteritems(self.compiled_re):
-
-
RBTools depends on
six
but if we're using it directly, we should have our own dependency on it. (I mention usingsix
in another comment). -
-
-
- Commit:
-
fec83230768108490c64c22505158cf99e7db9b69eb1b74f700678b4b76838bcc7a0fe2ff3a10727
Checks run (2 succeeded)
-
-
six is a third-party library, so it should go in it's own "section":
import re import six from reviewbot.tools import Tool
-
Formatting here could be a little nicer. If you put the parens on their own lines, then the strings will line up a bit nicer. We also don't need to use the triple quotes (better to just escape any inner
'
characters), but we do need to have thebr
prefix on each line, even though they get concatenated.'AWS_KEY': ( br'...' br'...' ), 'AWS_SECRET_KEY': ( br'...' br'...' ),
-
This should use six.iteritems. Also, this can use a dict comprehension to do it all in one go:
super(CredentialsCheckTool, self).__init__() self.compiled_re = { name: re.compile(pattern) for name, pattern in six.iteritems(credential_patterns) }
-
-
- Branch:
-
release-1.0.xmaster
- Commit:
-
544b14a372d66e599399149480919555ce0004d0554f3e838d5a44738f97469fe144da2ea6d821ad
- Description:
-
~ Previously, ReviewBot did not check for credentials that may have been
~ Previously, Review Bot did not check for credentials that may have been
accidentally included in the commit. A human reviewer would have to look out for them, but we hoped to move more of this task's burden ~ to ReviewBot. ~ to Review Bot. ~ A new Credentials Check tool has been added to ReviewBot which
~ A new Credentials Check tool has been added to Review Bot which
looks for various key files, other sensitive files and inline embedded AWS credentials to make sure these are not pushed to the repository. - Commit:
-
4047beed2fa56a129d6e9a9f39d6d687573577f4d332e509b24207efeaa0b32781ce14c05acd4d08