AI-assisted code reviews
Using AI to review your pull-requests
I recently came across the concept of using AI to review pull-requests. I thought this was a brilliant idea and decided to try it myself. It turns out there are already services tightly integrated with GitHub/Bitbucket that can handle this, as well as some GitHub Actions. However, there's still room for improvement, and some may prefer a custom solution tailored to their needs. Additionally, keeping code private and avoiding third-party services is a valid concern.
I decided to experiment with a custom solution using Google's Gemini model and a self-hosted GitHub runner to run workflows with hardware-in-the-loop (HIL). Note that this is a proof of concept rather than a production-ready solution. Toward the end, I'll discuss potential improvements to the workflow. For now, let's build something that works.
Requirements
- STM32 Nucleo-H723 running FreeRTOS
- Self-hosted GitHub runner
- Access to some kind of AI model API (Gemini, GPT-4, etc.)
- A repository secret named `NONYA_BUSINESS_API_KEY` whose value is your API key. It can be any name you want, but you will need to update the code accordingly.
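If you use the GitHub CLI, a quick way to create the secret (the value shown is a placeholder) is:

```bash
# Create the repository secret from the command line (value is a placeholder)
gh secret set NONYA_BUSINESS_API_KEY --body "your-api-key-here"
```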
Setting up the GitHub workflow
I won't detail how to set up a self-hosted GitHub runner, as it's straightforward; I'll assume you have one running.
The workflow needs to accomplish the following:
- Check out the code
- Build the code
- Flash the hardware
- Provide AI with a diff of the code (meaning the changes made in the pull-request)
- Push that diff to the AI model and get a response
- Post the response as a comment on the pull-request
Build and flash the device
For this proof of concept, flashing the device isn't strictly necessary since the focus is on AI code reviews. However, I'll include it in the workflow to meet the HIL requirement.
First order of business is to set up the workflow. The workflow YAML file should be placed in `.github/workflows/` and should look something like this:
```yaml
name: Flash Device

on:
  pull_request:
    branches:
      - main
  workflow_dispatch: # Allows manual triggering of the workflow
```
This snippet sets up a workflow that runs on pull requests to the main branch and supports manual triggering. Next, define the jobs.
The first job, `flash`, checks out the code, builds it, and flashes the device. For clarity, I'm breaking the YAML into sections, but it all belongs in a single file:
```yaml
jobs:
  flash:
    runs-on: self-hosted
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      - name: Create virtual environment
        run: |
          python -m venv .venv
          source .venv/bin/activate

      - name: Install dependencies
        run: |
          source .venv/bin/activate
          python -m pip install --upgrade pip
          pip install -r .github/workflows/helpers/requirements.txt

      - name: Build firmware
        run: |
          cmake --preset=Debug
          cmake --build --preset=Debug

      - name: Flash the device
        env:
          PROJECT_DIR: ${{ github.workspace }}
        run: |
          source .venv/bin/activate
          python .github/workflows/helpers/flash_device.py
```
The `Build firmware` step is specific to the STM32's CMake setup, but you can replace it with your preferred build system.
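As a sketch, a Make-based project might swap in something like this (the `all` target is hypothetical):

```yaml
# Hypothetical replacement for the CMake build step on a Make-based project
- name: Build firmware
  run: make -j"$(nproc)" all
```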
The flash job runs on a self-hosted runner, which is necessary for HIL workflows unless you're shipping hardware to GitHub (ha!). The steps in a job run in the order they are defined:
- `Checkout repository` checks out the code in the pull-request.
- `Set up Python` sets up a Python environment for us to use.
- `Create virtual environment` creates a virtual environment for us to use.
- `Install dependencies` installs the dependencies listed in the requirements.txt file.
- `Build firmware` builds the firmware using CMake.
- `Flash the device` flashes the device using a Python script that I will provide later.
It is important to note that Python is used here because I chose it for the flash script. Alternatively, you could use a Bash script and install ARM GNU binaries and OpenOCD on the self-hosted runner. Since the runner is an isolated environment, it lacks default tools. My script uses a direct path to STM32CubeProgrammer, but ideally, you’d install tools in the runner’s directory.
The `env` section in the `Flash the device` step sets environment variables that will be available to the Python script. Also note the sourcing of the virtual environment in every step: since every step can be thought of as a separate shell, we need to source the virtual environment in every step that needs it.
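To see why, here is a hypothetical two-step illustration (not part of the workflow above): the activation in the first step does not carry over to the second.

```yaml
# Hypothetical illustration: each `run` gets a fresh shell,
# so the venv activated in step one is gone by step two.
- name: Activate the venv
  run: source .venv/bin/activate
- name: Which python is this?
  run: which python  # resolves to the system python, not .venv/bin/python
```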
The Python script will look like this:
```python
import subprocess
import os


def flash_device():
    # Get the project directory from the environment variable
    project_dir = os.getenv("PROJECT_DIR", "")
    if not project_dir:
        raise ValueError("PROJECT_DIR environment variable is not set.")

    elf_file = os.path.join(project_dir, "build/Debug/CICD-HIL-AI.elf")
    if not os.path.isfile(elf_file):
        raise FileNotFoundError(f"ELF file not found: {elf_file}")

    # Define the command to flash the device
    command = [
        "/home/eddie/st/stm32cubeclt_1.17.0/STM32CubeProgrammer/bin/STM32_Programmer_CLI",
        "-c", "port=SWD",
        "-w", elf_file,
        "-v",
        "-rst",
        "-run",
    ]

    try:
        # Run the command and capture its output
        result = subprocess.run(command, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        print("Flashing successful:")
        print(result.stdout.decode())
    except subprocess.CalledProcessError as e:
        print("Error during flashing:")
        print(e.stderr.decode())
        raise


if __name__ == "__main__":
    flash_device()
```
Not too much going on here: first we do some error checking, making sure that both the directory and the ELF file exist, then we use the `subprocess` module to run the STM32CubeProgrammer CLI tool to flash the device.
The command args are as follows:

- `-c` specifies the connection type, in this case SWD.
- `-w` specifies the file to flash.
- `-v` verifies the flash.
- `-rst` resets the device after flashing.
- `-run` runs the program after flashing.
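If you want to sanity-check the script outside the workflow, you can run it by hand. `PROJECT_DIR` mirrors what the workflow sets from `github.workspace` (assuming you run from the repository root with a Debug build present):

```bash
# Run the flash script locally; the workflow normally sets PROJECT_DIR for us
PROJECT_DIR="$(pwd)" python .github/workflows/helpers/flash_device.py
```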
Get diff and AI code review
Now that we have the device flashed, we need to get the diff of the code and send it to the AI model. Thankfully, GitHub makes it pretty easy to get everything we need. The AI code review job looks like this:
```yaml
  ai_code_review:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request' && !contains(github.event.pull_request.title, '@NOAI')
    permissions:
      contents: read
      pull-requests: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      - name: Create virtual environment
        run: |
          python -m venv .venv
          source .venv/bin/activate

      - name: Install dependencies
        run: |
          source .venv/bin/activate
          python -m pip install --upgrade pip
          pip install -r .github/workflows/helpers/requirements.txt

      - name: Get AI Code Review
        env:
          GEMINI_API_KEY_SECRET: ${{ secrets.NONYA_BUSINESS_API_KEY }}
          PR_DIFF_URL: ${{ github.event.pull_request.diff_url }}
          GITHUB_TOKEN_SECRET: ${{ secrets.GITHUB_TOKEN }}
          GITHUB_REPOSITORY: ${{ github.repository }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: |
          source .venv/bin/activate
          python .github/workflows/helpers/ai_pr_reviewer.py
```
Let's break it down. This time we use `ubuntu-latest` as the runner, since we don't need hardware in the loop for this job. The job will only run if the event is a pull-request and the title does not contain `@NOAI`. This allows us to skip the AI review whenever we want by simply adding `@NOAI` to the title of the pull-request. Cool party trick for sure.
The permissions section is important: we need to give the job permission to read the contents of the repository and write to the pull-request. Most of the steps are similar to the previous job as far as checking out the code and setting up the Python environment. The only difference is the `Get AI Code Review` step, which runs a Python script that gets the diff of the code and sends it to the AI model.
In the `env` section we are setting up some environment variables that will be used in the Python script:

- `GEMINI_API_KEY_SECRET` is the API key for the AI model, in this case Gemini; I am using GitHub's secrets mechanism to retrieve it.
- `PR_DIFF_URL` is the URL for the diff of the pull-request, provided by GitHub and used to fetch the changes made in the pull-request.
- `GITHUB_TOKEN_SECRET` is the token for the GitHub API, also provided by GitHub and used to post comments on the pull-request.
- `GITHUB_REPOSITORY` is the name of the repository and `PR_NUMBER` is the number of the pull-request; both are provided by GitHub and used to post comments on the pull-request.
All of these variables are made available to the environment and can be read in the Python script, which looks like this:
```python
# .github/workflows/helpers/ai_pr_reviewer.py
import os
import requests
import google.generativeai as genai
import sys

# Configuration
# Max characters of the diff to send to Gemini. Adjust if needed based on token limits and typical PR size.
# gemini-pro has a 32k token limit (input). ~4 chars/token. 25000 chars ~ 6250 tokens.
MAX_DIFF_CHARS = 25000
GEMINI_MODEL = "gemini-1.5-flash-latest"  # Use flash for speed and cost-effectiveness for this task


def fetch_pr_diff(diff_url, github_token):
    """Fetches the diff content of a PR."""
    headers = {
        "Authorization": f"Bearer {github_token}",  # Use Bearer for GITHUB_TOKEN
        "Accept": "application/vnd.github.v3.diff",
    }
    try:
        response = requests.get(diff_url, headers=headers, timeout=30)
        response.raise_for_status()
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"::error::Failed to fetch PR diff from {diff_url}: {e}", file=sys.stderr)
        return None


def get_ai_review(api_key: str, diff_content: str) -> str:
    """Gets a code review from Google Gemini."""
    if not diff_content or not diff_content.strip():
        print("::info::No diff content to review.", file=sys.stderr)
        return "NO_REVIEW"

    if len(diff_content) > MAX_DIFF_CHARS:
        warning_msg = (
            f"Diff content is large ({len(diff_content)} chars). "
            f"Truncating to {MAX_DIFF_CHARS} chars for AI review. Full context may be lost."
        )
        print(f"::warning::{warning_msg}", file=sys.stderr)
        diff_content = (
            diff_content[:MAX_DIFF_CHARS] + "\n\n... (diff truncated due to length)"
        )

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(GEMINI_MODEL)

    prompt = (
        "You are an expert code reviewer for embedded systems.\n"
        "You are reviewing a Pull Request. The following is a unified diff of the changes.\n"
        "Your task is to:\n"
        "1. Identify potential bugs, logical errors, or anti-patterns.\n"
        "2. Check for violations of embedded C/C++ best practices (e.g., resource management, "
        "volatile correctness, interrupt safety if inferable).\n"
        "3. Look for areas where code could be optimized for performance or clarity.\n"
        "4. Provide constructive feedback and suggested improvements only if absolutely necessary, be concise\n"
        "5. If everything looks good, say so clearly.\n\n"
        "6. No one wants to read a novel, so keep it short and concise. Tokens cost money!!!\n\n"
        "Here is the diff:\n"
        f"{diff_content}"
    )

    try:
        response = model.generate_content(prompt)
        return response.text.strip()
    except Exception as e:
        print(f"::error::AI review failed: {e}", file=sys.stderr)
        return "AI_REVIEW_FAILED"


def post_pr_review(github_token: str, repo: str, pr_number: str, comment: str):
    """Posts a review comment to the specified PR using the GitHub API."""
    url = f"https://api.github.com/repos/{repo}/pulls/{pr_number}/reviews"
    headers = {
        "Authorization": f"Bearer {github_token}",
        "Accept": "application/vnd.github.v3+json",
    }
    data = {
        "body": comment,
        "event": "COMMENT",  # General comment without approving/rejecting
    }
    try:
        response = requests.post(url, json=data, headers=headers, timeout=30)
        response.raise_for_status()
        print(f"::info::Successfully posted review comment to PR #{pr_number}")
    except requests.exceptions.RequestException as e:
        print(
            f"::error::Failed to post review comment to PR #{pr_number}: {e}",
            file=sys.stderr,
        )
        sys.exit(1)


if __name__ == "__main__":
    # First, try environment variables (GitHub Actions mode)
    gemini_api_key = os.getenv("GEMINI_API_KEY_SECRET")
    pr_diff_url = os.getenv("PR_DIFF_URL")
    github_token = os.getenv("GITHUB_TOKEN_SECRET")
    repo = os.getenv("GITHUB_REPOSITORY")
    pr_number = os.getenv("PR_NUMBER")

    if gemini_api_key and pr_diff_url and github_token:
        diff_text = fetch_pr_diff(pr_diff_url, github_token)
        if diff_text is None:
            print("NO_REVIEW")
            sys.exit(0)
        if not diff_text.strip():
            print("::info::Diff is empty. No review needed.", file=sys.stderr)
            print("NO_REVIEW")
            sys.exit(0)

        review_comment = get_ai_review(gemini_api_key, diff_text)
        if review_comment == "NO_REVIEW":
            comment = "AI Code Review: No review generated due to empty diff content."
        elif review_comment == "AI_REVIEW_FAILED":
            comment = "AI Code Review: Failed to generate review due to an error."
        else:
            comment = f"AI Code Review:\n\n{review_comment}"

        # Post the review as a PR comment
        post_pr_review(github_token, repo, pr_number, comment)
        print(review_comment)  # Still print for logs
    else:
        # Local CLI mode: python ai_pr_reviewer.py <API_KEY> <diff_file>
        if len(sys.argv) != 3:
            print(
                "Usage: python ai_pr_reviewer.py <API_KEY> <diff_file>", file=sys.stderr
            )
            sys.exit(1)
        gemini_api_key = sys.argv[1]
        diff_file = sys.argv[2]
        try:
            with open(diff_file, "r", encoding="utf-8") as f:
                diff_text = f.read()
        except Exception as e:
            print(f"::error::Failed to read diff file: {e}", file=sys.stderr)
            sys.exit(1)
        review_comment = get_ai_review(gemini_api_key, diff_text)
        print("\n===== AI REVIEW =====\n")
        print(review_comment)
```
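Note the local CLI mode at the bottom: if the environment variables aren't set, the script expects an API key and a diff file as arguments, which is handy for testing the prompt without opening a pull-request. For example (branch names assumed):

```bash
# Review local changes without a pull-request: diff against main, then run the script
git diff main...HEAD > pr.diff
python .github/workflows/helpers/ai_pr_reviewer.py "$GEMINI_API_KEY" pr.diff
```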
I will not go into detail about the code since it is not terribly hard to understand. But the gist of it is that we read the environment variables we set in the workflow and use them to fetch the diff of the pull-request. We send that diff along with a prompt to the AI model and get a response, then post that response as a comment on the pull-request using the GitHub API. The AI model is configured to look for bugs, logical errors, anti-patterns in the code, etc. That prompt is very important, and you can tweak it to your liking. Left unchecked, it can get very verbose and go off into the weeds telling you how much you suck and that your dad doesn't like you because you can't code for the life of you. But you can always tone it down by telling it to be concise and not to write a novel.
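As a sketch, toning it down can be as simple as appending a constraint to the prompt before calling `generate_content` (the wording here is hypothetical):

```python
# Hypothetical tone/length constraint to append to the prompt in get_ai_review()
TONE_GUARD = (
    "\nKeep the tone professional and constructive. "
    "Limit the review to at most five bullet points."
)
# inside get_ai_review(), before model.generate_content(prompt):
#     prompt += TONE_GUARD
```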
Most AI model providers right now have a free tier that you can use to test this out, and if you go with the basic models you can get decent reviews for free.
Improvements
There are a lot of improvements that can be made to this workflow. For example:
- All steps should check for failure and exit the workflow if something goes wrong.
- Since this involves hardware in the loop, we should have a way to check if the hardware is connected and ready to be flashed.
- You should use lock files to prevent multiple pull-requests from trying to flash the hardware at the same time. The lock file should only be released by the process that made it.
- GitHub has a nice way to queue up jobs, especially for cases like this where you have multiple pull-requests that must run in order and wait their turn to avoid contention (see the sketch after this list).
- Use the serial ID of the debugger attached to the specific hardware that a given repo flashes. This way you can have multiple hardware setups that will not interfere with each other.
- Log your failed workflows and save artifacts for debugging.
- Consider giving the AI more context about the project beyond the simple diff and prompt, such as the file tree, project type, etc.
- (Update) I just added the ability for the AI reviewer to read its previous comments and use them as context for the review. This keeps the conversational feel and allows the AI to remember what it said in the past.
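For the queuing point above, a minimal sketch using a GitHub Actions concurrency group (the group name is hypothetical) would look like this. One caveat: GitHub keeps only one run pending per group and cancels older pending runs, so it is not a full queue.

```yaml
# Hypothetical concurrency group: one flash job per hardware rig at a time
concurrency:
  group: hil-nucleo-h723
  cancel-in-progress: false  # let the in-progress flash finish instead of cancelling it
```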
Conclusion
Ultimately, I don't think AI-assisted code reviews should be a blocker for merging pull-requests, but they can be a great tool to help you catch bugs and logical errors in your code. It is beneficial to both the developer and the reviewer.
You can find a link to the code on my GitHub and feel free to use it as a starting point for your own projects. I will be adding more features to this in the future, so stay tuned for updates.