CODEpendence

Posted: July 7, 2021 at 11:16 AM

TL;dr: If you use submodules that point to a GitHub repo, make sure that the commit id matches an offical branch or tag, especially if upgraded via a PR or submitted patch.

This issue was disclosed to GitHub via the HackerOne Bug Bounty program and resolved by them in a timely manner. The writeup is available and is the same one that was provided to GitHub. It contains the complete steps in more detail than this blog post does.

Discovery¶

Earlier this year, I was dealing with a git repo that used submodules. I’ve never been a fan of them due to the extra work involved in using them. But then a thought hit me, last year, when GitHub took down youtube-dl, someone was a bit sneaky and inserted a copy of it into GitHub’s DMCA repo. They were able to do this because there is a feature/bug in GitHub’s backend, that all the commits to a forked repo are accessible in the parent repo, it’s just that the branches and tags are maintained separately. This makes sense to reduce storing duplicate data if a repo is large or has many forks.

Verification¶

The next question, would combining these into an attack even work? What would things look like? I created a few accounts to test them, creating a project to represent a code dependancy, depproj, that would be imported into another project by another user, proj. Then once those were created, have a malicious user create a fork of both the deprpoj and the proj.

Once the malicious forks were created, clone them locally. With the clones, malicious code can be inserted into the depproj repo. If you look at the repo, the previous commit was done as the maliciousrepo user, but while I was working on this, I remembered that w/ git, you can set the commit author to be anything (signing helps prevent that), so this commit appears to be done by the correct upstream123 user.

Once the malicious code has been inserted, the malicious user can now update the submodule of the project to the commit id of the malicious code. This is done simply by doing:

cd depproj
git fetch origin <commitid>
git checkout <commitid>

Even though the depproj still points to the upstream123 repo, because fork commits appear IN the depproj repo, the above works w/o any other changes. This is also what makes it dangerous, because the repo is not changed, it can be disguised as a simple version update.

A PR is then submitted to the project being attacked. I did not control the author of commits as well as I should have, but it still is effective. If you click into the proposed change, and then click on code.c, the file changed, it’ll bring you to the change compare view. For this demo, it was a small change, but if the project is large, it’s would be easy to bury a minor flaw in lots of changes. The other thing to note about this page is that the author displayed is NOT the author of the change, but it appears that it is a legitimate change by the author of the repo.

Conclusion¶

This is an interesting attack in that it leverages two features in a way that has surprising results. It demonstrates that software dependancies need to be reviewed, and vetted, and that if you’re using GitHub, that just because a PR says it’s updating a submodule to the new version, it doesn’t mean that it is safe to simply merge in the change.

Timeline¶

2021-03-31 — Reported to GitHub via HackerOne.
2021-03-31 — More info requested and provided.
2021-04-01 — Ack’d issue and started work on fix.
2021-05-04 — GitHub determined it was low risk, but did add warning when viewing commit.
2021-05-05 — Asked GitHub for disclosure timeline.
2021-06-04 — Pinged GitHub again.
2021-07-07 — Published blog post.