Add a Credential Checker tool for Review Bot
Review Request #11206 — Created Sept. 27, 2020 and submitted
This change adds a tool for Review Bot to check for hard-coded security
credentials. I used regex to search on the patterns of the known
credential types. The result may include false positives.
Verified the functionality by installing the tool and manual
testing.
Summary | ID | Author |
---|---|---|
61f178b4b2b176e3b60e9e10c145e99929c2afe1 | ceciliaccwei |
Description | From | Last Updated |
---|---|---|
Hmm, so I know in the project description we called it "AWS Credential Checker" but I wonder if we shouldn't … |
david | |
In your description, you say it "adds an integration". The lingo we use in review bot is "adds a tool" |
david | |
Filenames should always be lowercase and consistent with files in the same directory. We don't have any others with capitalization … |
chipx86 | |
This docstring almost reads as if "AWS Credential Checker" is a third-party tool that can be run. Let's say something … |
david | |
For imports, we have a standard based on Python's style guide. Imports should be broken up into 4 possible sections … |
david | |
Same here as the module docstring. |
david | |
There's nothing in this that requires using the raw string (r prefix). This should also use single-quotes instead of double. |
david | |
We're compiling the regex once per file, but of course it will never change. Can we define an __init__ method … |
david | |
We can avoid writing the file out to disk and reading it back in. Let's do something like: data = … |
david | |
I don't think we need to list the actual contents of the keys in the comment message, since it will … |
david | |
It seems like we could probably just do this inside the loop that is doing all the matches. |
david | |
It'd be nice to include what services this checks for. |
chipx86 | |
This is missing a docstring. |
chipx86 | |
We always want to use super to call parent methods. |
chipx86 | |
I have some notes for improving this. 1. Raw strings Let's prefix each string with r, which is for a … |
chipx86 | |
Is there anything more specific and easy to match than this? It feels pretty generic. |
chipx86 | |
There shouldn't be ablank line after function/method-level docstrings. Just file/class. |
chipx86 | |
Here's an interesting thing. We cannot assume UTF-8 here. Plenty of files are in other encodings. So what we should … |
chipx86 | |
Every time we call self.pattern.search(line), we're re-fetching pattern from self. For a 5,000 line file, that means 5,000 attribute accesses. … |
chipx86 | |
There should always be a blank line between statements (like match = ...) and blocks (if ...). |
chipx86 | |
When formatting multi-line strings, we prefer using parens and having the space at the end of a line instead of … |
chipx86 | |
There should be a blank line between the statement and block. |
chipx86 | |
We should log if this fails. To do this, at the top, import logging and do: logger = logging.getLogger(__name__) This … |
chipx86 | |
The name doesn't reflect the conventions used elsewhere. We may also want to distinguish it from other tools by prefixing … |
chipx86 | |
E999 SyntaxError: invalid syntax |
reviewbot | |
E999 SyntaxError: invalid syntax |
reviewbot | |
Instead of "if a credential has been committed", perhaps "check for hard-coded security credentials"? Same in the docstrings throughout this. |
david | |
E999 SyntaxError: invalid syntax |
reviewbot | |
Hmm, I'm not sure about this comment text. Perhaps something like "This appears to be a hard-coded credential, which is … |
david | |
I don't think any of the other tools wrap f.comment in an exception handler. Were you encountering some problem? |
david | |
This text can wrap to a wider width (80 columns). Also same comment about the description text as in the … |
david | |
E999 SyntaxError: invalid syntax |
reviewbot | |
Exclamation points are rarely if ever appropriate for use in UI text, even for security notices. Let's just use a … |
david | |
Even though it's > 80 columns, let's put this all on one line. Wrapping like this is hard to read … |
david | |
Can we add a blank line between these two? |
david | |
E999 SyntaxError: invalid syntax |
reviewbot | |
Just a thought regarding the syntax error: maybe try using rb""" ... """ with triple double-quotes? |
david | |
We're only using the match once, so this can be combined: if pattern.search(line): f.comment(...) |
david | |
E501 line too long (94 > 79 characters) |
reviewbot | |
E999 SyntaxError: invalid syntax |
reviewbot | |
This is only used in one place, so we might as well skip assigning the local variable and just use … |
david | |
E501 line too long (94 > 79 characters) |
reviewbot | |
I noticed a similar situation in the ShellCheck Tool and figured that it would be worth highlighting here as well. … |
jblazusi | |
E999 SyntaxError: invalid syntax |
reviewbot | |
E501 line too long (94 > 79 characters) |
reviewbot | |
E999 SyntaxError: invalid syntax |
reviewbot | |
E501 line too long (94 > 79 characters) |
reviewbot |
-
-
Hmm, so I know in the project description we called it "AWS Credential Checker" but I wonder if we shouldn't make this more generic. It would be nice to be able to add on other common credential types (SSH keys, etc.)
-
In your description, you say it "adds an integration". The lingo we use in review bot is "adds a tool"
-
This docstring almost reads as if "AWS Credential Checker" is a third-party tool that can be run. Let's say something like "Review Bot tool to check for hardcoded credentials"
-
For imports, we have a standard based on Python's style guide. Imports should be broken up into 4 possible sections separated by newlines.
imports from __future__ standard library imports (that would be your re import) imports from third-party libraries not in the standard library. This can include imports from other packages that we maintain, such as rbtools. imports from "this package" (meaning anything in reviewbot)
Within that, we do any full-package imports first (
import ...
), followed by anyfrom
imports. All of that should be alphabetized. What you have is simple enough that most of those rules don't apply, but the sections need to be reorganized as such:from __future__ import unicode_literals import re from reviewbot.tools import Tool
-
-
There's nothing in this that requires using the raw string (r prefix). This should also use single-quotes instead of double.
-
We're compiling the regex once per file, but of course it will never change. Can we define an
__init__
method for the tool, and store the compiled regex as an attribute? -
We can avoid writing the file out to disk and reading it back in. Let's do something like:
data = f.patched_file_contents if not data: return for i, line in enumerate(data.splitlines()): ...
-
I don't think we need to list the actual contents of the keys in the comment message, since it will be associated with the line and show up with the context. That lets us simplify the message to something warning that they're potentially hard-coding secret credentials in their code.
In the future, one thing to be really careful about is the use of
str
in Python code. On Python 2,str
(akabytes
) is a bytestring andunicode
is a unicode string. On Python 3,str
is unicode andbytes
is bytes, which makes a lot more sense. Any time there's a potential mix of string types we need to be really careful, especially since Python 3 treats such mixing as an error but Python 2 will happily attempt to cast silently. -
If we make the above change to use
patched_file_contents
then this point is moot, but one cool thing about Python is you can use files as a context manager:with open(path) as f: ... do stuff with f...
When the code exits the context of that
with
block, it will automatically close the file. This happens even if there's an exception that occurs within the block, and helps prevent open file descriptor leaks. -
- Change Summary:
-
Addressed feedback; Added support for more credential types
- Description:
-
~ This change adds an integration for Review Bot to detect if an AWS
~ Access Key or Secret Key is included in the commit. I used regex to ~ search on the pattern of the keys. The result may include false ~ This change adds a tool for Review Bot to detect if a secret/credential
~ is included in the commit. I used regex to search on the patterns of the ~ known credential types. The result may include false positives. - positive. - Commits:
-
Summary ID Author 0b17b93380d4313fd62a3bcfa6547c8ddb32184b ceciliaccwei d6ab94d053012b0f3908a3538340039cc9afc670 ceciliaccwei
Checks run (2 succeeded)
- Change Summary:
-
Added more credential types.
- Commits:
-
Summary ID Author d6ab94d053012b0f3908a3538340039cc9afc670 ceciliaccwei 49a03565f1bda720bd13cb7c13e348bb49e26b5d ceciliaccwei
Checks run (2 succeeded)
-
-
Filenames should always be lowercase and consistent with files in the same directory. We don't have any others with capitalization or underscores.
Let's call this one
rbcredentialcheck.py
. -
-
-
-
I have some notes for improving this.
1. Raw strings
Let's prefix each string with
r
, which is for a raw string. This allows suff like\n
to literally match the text\n
, not a newline. It's often handy for regexes, and helps avoid mistakes in the future.2. Alphabetizing
Let's alphabetize the keys, so it's more clear where any new ones should go.
3. Performance
Currently, every time the tool is initialized, we reconstruct the pattern from the strings in a dictionary. I think there are two things we should do:
- Check for and store the pattern on the class during initialization (so we only construct it once per process) — this is probably best moved to
__init__
in the process. - Build this as one big regex.
The latter would look like:
pattern = re.compile( r'''( # AWS Access Key ((A3T[A-Z0-9]|AKIA|AGPA|AIDA|AROA|AIPA|ANPA|ANVA|ASIA)[A-Z0-9]{16}) | # AWS Secret Key ([aA][wW][sS].*[0-9a-zA-Z/+]{40}) | ..etc.. )''', re.X)
This is faster and allows expressions to be more free-form and have room to breathe.
4. Alphabetization
I'm tempted to say we shouldn't trust that the content is going to always be in the same casing. We don't want to miss something just because someone pasted a SSH key that had
bEGIN RSA PRIVATE KEY
.Ideally we'd use Python 3.7+'s ability to mark subsections of regexes as case-insensitive, but we can't. We could just require it for the whole string by using the
re.I
option onre.compile()
.Interested in hearing your thoughts on that.
- Check for and store the pattern on the class during initialization (so we only construct it once per process) — this is probably best moved to
-
-
-
Here's an interesting thing. We cannot assume UTF-8 here. Plenty of files are in other encodings.
So what we should do is leave this as a byte string, and have the compiled regex also be a byte string. Then we don't have to think about encodings. This is also faster.
-
Every time we call
self.pattern.search(line)
, we're re-fetchingpattern
fromself
. For a 5,000 line file, that means 5,000 attribute accesses.Instead, let's pull
pattern
out ofself
before the loop, and access it directly as a local variable. -
-
When formatting multi-line strings, we prefer using parens and having the space at the end of a line instead of the beginning.
However, this can just be provided inline to
f.comment()
. -
-
We should log if this fails. To do this, at the top,
import logging
and do:logger = logging.getLogger(__name__)
This gives us a logger that will namespace any messages, making it clear which tool this is coming from. You can then call
logger.exception(...)
with suitable information. -
The name doesn't reflect the conventions used elsewhere. We may also want to distinguish it from other tools by prefixing it. How about
rbcredentialcheck
?
- Change Summary:
-
Addressed feedback.
- Commits:
-
Summary ID Author 49a03565f1bda720bd13cb7c13e348bb49e26b5d ceciliaccwei b970b67ac8164a4d56aa4773d503162d184dd5fe ceciliaccwei
- Change Summary:
-
flake8
- Commits:
-
Summary ID Author b970b67ac8164a4d56aa4773d503162d184dd5fe ceciliaccwei cd5abaa19b36743f07e76f5f563ba1135b3c6d39 ceciliaccwei
- Change Summary:
-
Added tool documentation.
- Commits:
-
Summary ID Author cd5abaa19b36743f07e76f5f563ba1135b3c6d39 ceciliaccwei 71a52ba995d527277c31268d5f142e11caa98e97 ceciliaccwei
-
Looking pretty great!
-
Instead of "if a credential has been committed", perhaps "check for hard-coded security credentials"?
Same in the docstrings throughout this.
-
Hmm, I'm not sure about this comment text. Perhaps something like "This appears to be a hard-coded credential, which is a potential security risk" ?
-
I don't think any of the other tools wrap
f.comment
in an exception handler. Were you encountering some problem? -
This text can wrap to a wider width (80 columns). Also same comment about the description text as in the credential checker source.
- Change Summary:
-
Addressed feedback.
- Description:
-
~ This change adds a tool for Review Bot to detect if a secret/credential
~ is included in the commit. I used regex to search on the patterns of the ~ known credential types. The result may include false positives. ~ This change adds a tool for Review Bot to check for hard-coded security
~ credentials. I used regex to search on the patterns of the known ~ credential types. The result may include false positives. - Commits:
-
Summary ID Author 71a52ba995d527277c31268d5f142e11caa98e97 ceciliaccwei 4b2a7b5a327ad3663b6cadb774d9152be2debf6f ceciliaccwei - Diff:
-
Revision 7 (+290)
- Added Files:
- Change Summary:
-
Addressed feedback
- Commits:
-
Summary ID Author 4b2a7b5a327ad3663b6cadb774d9152be2debf6f ceciliaccwei a9c1b08ef4a3db109c000f31e27d728411d999c9 ceciliaccwei
- Change Summary:
-
Addressed feedback
- Commits:
-
Summary ID Author a9c1b08ef4a3db109c000f31e27d728411d999c9 ceciliaccwei 68a4eea01e57407603e060708bbe06d7e57c033c ceciliaccwei
- Commits:
-
Summary ID Author 68a4eea01e57407603e060708bbe06d7e57c033c ceciliaccwei 49f89daeddf4937e24586eca73811b24f48e2833 ceciliaccwei