Add a Credential Checker tool for Review Bot
Review Request #11206 — Created Sept. 27, 2020 and submitted
This change adds a tool for Review Bot to check for hard-coded security
credentials. I used regex to search on the patterns of the known
credential types. The result may include false positives.
Verified the functionality by installing the tool and manual
testing.
Summary | ID | Author |
---|---|---|
61f178b4b2b176e3b60e9e10c145e99929c2afe1 | ceciliaccwei |
Description | From | Last Updated |
---|---|---|
Hmm, so I know in the project description we called it "AWS Credential Checker" but I wonder if we shouldn't … |
david | |
In your description, you say it "adds an integration". The lingo we use in review bot is "adds a tool" |
david | |
Filenames should always be lowercase and consistent with files in the same directory. We don't have any others with capitalization … |
chipx86 | |
This docstring almost reads as if "AWS Credential Checker" is a third-party tool that can be run. Let's say something … |
david | |
For imports, we have a standard based on Python's style guide. Imports should be broken up into 4 possible sections … |
david | |
Same here as the module docstring. |
david | |
There's nothing in this that requires using the raw string (r prefix). This should also use single-quotes instead of double. |
david | |
We're compiling the regex once per file, but of course it will never change. Can we define an __init__ method … |
david | |
We can avoid writing the file out to disk and reading it back in. Let's do something like: data = … |
david | |
I don't think we need to list the actual contents of the keys in the comment message, since it will … |
david | |
It seems like we could probably just do this inside the loop that is doing all the matches. |
david | |
It'd be nice to include what services this checks for. |
chipx86 | |
This is missing a docstring. |
chipx86 | |
We always want to use super to call parent methods. |
chipx86 | |
I have some notes for improving this. 1. Raw strings Let's prefix each string with r, which is for a … |
chipx86 | |
Is there anything more specific and easy to match than this? It feels pretty generic. |
chipx86 | |
There shouldn't be ablank line after function/method-level docstrings. Just file/class. |
chipx86 | |
Here's an interesting thing. We cannot assume UTF-8 here. Plenty of files are in other encodings. So what we should … |
chipx86 | |
Every time we call self.pattern.search(line), we're re-fetching pattern from self. For a 5,000 line file, that means 5,000 attribute accesses. … |
chipx86 | |
There should always be a blank line between statements (like match = ...) and blocks (if ...). |
chipx86 | |
When formatting multi-line strings, we prefer using parens and having the space at the end of a line instead of … |
chipx86 | |
There should be a blank line between the statement and block. |
chipx86 | |
We should log if this fails. To do this, at the top, import logging and do: logger = logging.getLogger(__name__) This … |
chipx86 | |
The name doesn't reflect the conventions used elsewhere. We may also want to distinguish it from other tools by prefixing … |
chipx86 | |
E999 SyntaxError: invalid syntax |
reviewbot | |
E999 SyntaxError: invalid syntax |
reviewbot | |
Instead of "if a credential has been committed", perhaps "check for hard-coded security credentials"? Same in the docstrings throughout this. |
david | |
E999 SyntaxError: invalid syntax |
reviewbot | |
Hmm, I'm not sure about this comment text. Perhaps something like "This appears to be a hard-coded credential, which is … |
david | |
I don't think any of the other tools wrap f.comment in an exception handler. Were you encountering some problem? |
david | |
This text can wrap to a wider width (80 columns). Also same comment about the description text as in the … |
david | |
E999 SyntaxError: invalid syntax |
reviewbot | |
Exclamation points are rarely if ever appropriate for use in UI text, even for security notices. Let's just use a … |
david | |
Even though it's > 80 columns, let's put this all on one line. Wrapping like this is hard to read … |
david | |
Can we add a blank line between these two? |
david | |
E999 SyntaxError: invalid syntax |
reviewbot | |
Just a thought regarding the syntax error: maybe try using rb""" ... """ with triple double-quotes? |
david | |
We're only using the match once, so this can be combined: if pattern.search(line): f.comment(...) |
david | |
E501 line too long (94 > 79 characters) |
reviewbot | |
E999 SyntaxError: invalid syntax |
reviewbot | |
This is only used in one place, so we might as well skip assigning the local variable and just use … |
david | |
E501 line too long (94 > 79 characters) |
reviewbot | |
I noticed a similar situation in the ShellCheck Tool and figured that it would be worth highlighting here as well. … |
jblazusi | |
E999 SyntaxError: invalid syntax |
reviewbot | |
E501 line too long (94 > 79 characters) |
reviewbot | |
E999 SyntaxError: invalid syntax |
reviewbot | |
E501 line too long (94 > 79 characters) |
reviewbot |
-
-
Hmm, so I know in the project description we called it "AWS Credential Checker" but I wonder if we shouldn't make this more generic. It would be nice to be able to add on other common credential types (SSH keys, etc.)
-
In your description, you say it "adds an integration". The lingo we use in review bot is "adds a tool"
-
bot/reviewbot/tools/AWS_Credential_Checker.py (Diff revision 1) This docstring almost reads as if "AWS Credential Checker" is a third-party tool that can be run. Let's say something like "Review Bot tool to check for hardcoded credentials"
-
bot/reviewbot/tools/AWS_Credential_Checker.py (Diff revision 1) For imports, we have a standard based on Python's style guide. Imports should be broken up into 4 possible sections separated by newlines.
imports from __future__ standard library imports (that would be your re import) imports from third-party libraries not in the standard library. This can include imports from other packages that we maintain, such as rbtools. imports from "this package" (meaning anything in reviewbot)
Within that, we do any full-package imports first (
import ...
), followed by anyfrom
imports. All of that should be alphabetized. What you have is simple enough that most of those rules don't apply, but the sections need to be reorganized as such:from __future__ import unicode_literals import re from reviewbot.tools import Tool
-
-
bot/reviewbot/tools/AWS_Credential_Checker.py (Diff revision 1) There's nothing in this that requires using the raw string (r prefix). This should also use single-quotes instead of double.
-
bot/reviewbot/tools/AWS_Credential_Checker.py (Diff revision 1) We're compiling the regex once per file, but of course it will never change. Can we define an
__init__
method for the tool, and store the compiled regex as an attribute? -
bot/reviewbot/tools/AWS_Credential_Checker.py (Diff revision 1) We can avoid writing the file out to disk and reading it back in. Let's do something like:
data = f.patched_file_contents if not data: return for i, line in enumerate(data.splitlines()): ...
-
bot/reviewbot/tools/AWS_Credential_Checker.py (Diff revision 1) I don't think we need to list the actual contents of the keys in the comment message, since it will be associated with the line and show up with the context. That lets us simplify the message to something warning that they're potentially hard-coding secret credentials in their code.
In the future, one thing to be really careful about is the use of
str
in Python code. On Python 2,str
(akabytes
) is a bytestring andunicode
is a unicode string. On Python 3,str
is unicode andbytes
is bytes, which makes a lot more sense. Any time there's a potential mix of string types we need to be really careful, especially since Python 3 treats such mixing as an error but Python 2 will happily attempt to cast silently. -
bot/reviewbot/tools/AWS_Credential_Checker.py (Diff revision 1) If we make the above change to use
patched_file_contents
then this point is moot, but one cool thing about Python is you can use files as a context manager:with open(path) as f: ... do stuff with f...
When the code exits the context of that
with
block, it will automatically close the file. This happens even if there's an exception that occurs within the block, and helps prevent open file descriptor leaks. -
bot/reviewbot/tools/AWS_Credential_Checker.py (Diff revision 1) It seems like we could probably just do this inside the loop that is doing all the matches.
Change Summary:
Addressed feedback; Added support for more credential types
Description: |
|
|||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Commits: |
|
|||||||||||||||||||||
Diff: |
Revision 2 (+146) |
Checks run (2 succeeded)
Change Summary:
Added more credential types.
Commits: |
|
||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 3 (+156) |
Checks run (2 succeeded)
-
-
Filenames should always be lowercase and consistent with files in the same directory. We don't have any others with capitalization or underscores.
Let's call this one
rbcredentialcheck.py
. -
bot/reviewbot/tools/Credential_Checker.py (Diff revision 3) It'd be nice to include what services this checks for.
-
-
bot/reviewbot/tools/Credential_Checker.py (Diff revision 3) We always want to use
super
to call parent methods. -
bot/reviewbot/tools/Credential_Checker.py (Diff revision 3) I have some notes for improving this.
1. Raw strings
Let's prefix each string with
r
, which is for a raw string. This allows suff like\n
to literally match the text\n
, not a newline. It's often handy for regexes, and helps avoid mistakes in the future.2. Alphabetizing
Let's alphabetize the keys, so it's more clear where any new ones should go.
3. Performance
Currently, every time the tool is initialized, we reconstruct the pattern from the strings in a dictionary. I think there are two things we should do:
- Check for and store the pattern on the class during initialization (so we only construct it once per process) — this is probably best moved to
__init__
in the process. - Build this as one big regex.
The latter would look like:
pattern = re.compile( r'''( # AWS Access Key ((A3T[A-Z0-9]|AKIA|AGPA|AIDA|AROA|AIPA|ANPA|ANVA|ASIA)[A-Z0-9]{16}) | # AWS Secret Key ([aA][wW][sS].*[0-9a-zA-Z/+]{40}) | ..etc.. )''', re.X)
This is faster and allows expressions to be more free-form and have room to breathe.
4. Alphabetization
I'm tempted to say we shouldn't trust that the content is going to always be in the same casing. We don't want to miss something just because someone pasted a SSH key that had
bEGIN RSA PRIVATE KEY
.Ideally we'd use Python 3.7+'s ability to mark subsections of regexes as case-insensitive, but we can't. We could just require it for the whole string by using the
re.I
option onre.compile()
.Interested in hearing your thoughts on that.
- Check for and store the pattern on the class during initialization (so we only construct it once per process) — this is probably best moved to
-
bot/reviewbot/tools/Credential_Checker.py (Diff revision 3) Is there anything more specific and easy to match than this? It feels pretty generic.
-
bot/reviewbot/tools/Credential_Checker.py (Diff revision 3) There shouldn't be ablank line after function/method-level docstrings. Just file/class.
-
bot/reviewbot/tools/Credential_Checker.py (Diff revision 3) Here's an interesting thing. We cannot assume UTF-8 here. Plenty of files are in other encodings.
So what we should do is leave this as a byte string, and have the compiled regex also be a byte string. Then we don't have to think about encodings. This is also faster.
-
bot/reviewbot/tools/Credential_Checker.py (Diff revision 3) Every time we call
self.pattern.search(line)
, we're re-fetchingpattern
fromself
. For a 5,000 line file, that means 5,000 attribute accesses.Instead, let's pull
pattern
out ofself
before the loop, and access it directly as a local variable. -
bot/reviewbot/tools/Credential_Checker.py (Diff revision 3) There should always be a blank line between statements (like
match = ...
) and blocks (if ...
). -
bot/reviewbot/tools/Credential_Checker.py (Diff revision 3) When formatting multi-line strings, we prefer using parens and having the space at the end of a line instead of the beginning.
However, this can just be provided inline to
f.comment()
. -
bot/reviewbot/tools/Credential_Checker.py (Diff revision 3) There should be a blank line between the statement and block.
-
bot/reviewbot/tools/Credential_Checker.py (Diff revision 3) We should log if this fails. To do this, at the top,
import logging
and do:logger = logging.getLogger(__name__)
This gives us a logger that will namespace any messages, making it clear which tool this is coming from. You can then call
logger.exception(...)
with suitable information. -
bot/setup.py (Diff revision 3) The name doesn't reflect the conventions used elsewhere. We may also want to distinguish it from other tools by prefixing it. How about
rbcredentialcheck
?
Change Summary:
Addressed feedback.
Commits: |
|
||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 4 (+238) |
Checks run (1 failed, 1 succeeded)
flake8
Change Summary:
flake8
Commits: |
|
||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 5 (+240) |
Checks run (1 failed, 1 succeeded)
flake8
Change Summary:
Added tool documentation.
Commits: |
|
||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 6 (+306) |
Checks run (1 failed, 1 succeeded)
flake8
-
Looking pretty great!
-
bot/reviewbot/tools/rbcredentialchecker.py (Diff revision 6) Instead of "if a credential has been committed", perhaps "check for hard-coded security credentials"?
Same in the docstrings throughout this.
-
bot/reviewbot/tools/rbcredentialchecker.py (Diff revision 6) Hmm, I'm not sure about this comment text. Perhaps something like "This appears to be a hard-coded credential, which is a potential security risk" ?
-
bot/reviewbot/tools/rbcredentialchecker.py (Diff revision 6) I don't think any of the other tools wrap
f.comment
in an exception handler. Were you encountering some problem? -
docs/reviewbot/tools/rbcredentialchecker.rst (Diff revision 6) This text can wrap to a wider width (80 columns). Also same comment about the description text as in the credential checker source.
Change Summary:
Addressed feedback.
Description: |
|
||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Commits: |
|
||||||||||||||||||
Diff: |
Revision 7 (+290) |
||||||||||||||||||
Added Files: |
Checks run (1 failed, 1 succeeded)
flake8
-
-
bot/reviewbot/tools/rbcredentialchecker.py (Diff revision 7) Exclamation points are rarely if ever appropriate for use in UI text, even for security notices. Let's just use a period.
-
bot/setup.py (Diff revision 7) Even though it's > 80 columns, let's put this all on one line. Wrapping like this is hard to read and error-prone.
-
docs/reviewbot/tools/rbcredentialchecker.rst (Diff revision 7) Can we add a blank line between these two?
Change Summary:
Addressed feedback
Commits: |
|
||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 8 (+290) |
Checks run (1 failed, 1 succeeded)
flake8
-
-
bot/reviewbot/tools/rbcredentialchecker.py (Diff revision 8) Just a thought regarding the syntax error: maybe try using
rb""" ... """
with triple double-quotes? -
bot/reviewbot/tools/rbcredentialchecker.py (Diff revision 8) We're only using the match once, so this can be combined:
if pattern.search(line): f.comment(...)
Change Summary:
Addressed feedback
Commits: |
|
||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 9 (+286) |
Checks run (1 failed, 1 succeeded)
flake8
-
Looking very solid. I just see one nit left to pick.
-
bot/reviewbot/tools/rbcredentialchecker.py (Diff revision 9) This is only used in one place, so we might as well skip assigning the local variable and just use
self.pattern.search()
below.
Commits: |
|
||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 10 (+284) |
Checks run (1 failed, 1 succeeded)
flake8
-
-
bot/reviewbot/tools/rbcredentialchecker.py (Diff revision 10) I noticed a similar situation in the ShellCheck Tool and figured that it would be worth highlighting here as well.
In short, this will most likely have compatibility issues with python 2.7 and thus should probably be written as
super(CredentialCheckerTool, self).__init__()
.
Change Summary:
Make the super() method python2.7 compatible.
Commits: |
|
||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Diff: |
Revision 11 (+284) |