Developing an AI Engineer from Scratch

A Step-by-Step Guide Based on DeepSeek’s RL Methodology

TL;DR: AI software engineers are on the horizon, with tools and LLMs already aiding in coding. They differ from standard LLMs by analyzing and modifying code across multiple files in a repository. One way to train them is Reinforcement Learning (RL) on Git pull requests, rewarding the model when its code changes match those actually merged for a given task. Curriculum learning could improve training by ordering PRs by difficulty, and a refined reward system that recognizes multiple valid solutions could provide a more accurate incentive structure for the LLM. Addressing the dataset-quality and reward-function challenges is critical for better performance. The open-source nature of these advancements promises wide accessibility for developers, making this an exciting field for future exploration.

The concept of an "AI software engineer" is becoming increasingly realistic.

Technologies are already simplifying software development tasks.

Last year, Devin was introduced as the first AI software engineer, while Cursor, an alternative to VS Code, has gained traction for integrating AI into project workflows.

Large language models (LLMs) like Claude and GPT-4 assist in coding and debugging, making the process more efficient.

As research on LLM reasoning advances, software engineering will only continue to get easier.


AI software engineer vs. regular LLM

The distinction between a regular large language model (LLM) and an AI software engineer is significant, despite both being capable of writing code.

An AI software engineer functions as an intelligent assistant that can analyze multiple code files within a Git repository. It determines which files need modification based on specific tasks. For instance, if you're addressing a bug in an AI project where the assistant fails to load the Mistral model, the AI software engineer can identify the relevant files to update.

In contrast, regular LLMs primarily focus on understanding and generating human-like text. While they can assist in coding tasks, their capabilities are generally limited to generating code snippets or fixing simple bugs based on textual prompts. They do not possess the contextual awareness or file management abilities that an AI software engineer has.

In summary, while both LLMs and AI software engineers can contribute to coding, the latter offers a more advanced, context-aware approach tailored for complex software engineering tasks.

As a software engineer, the first step in fixing a bug is identifying the correct code file to modify. For example, you would start by examining the file responsible for loading the model to see if the issue lies there. If the code references variables or functions imported from other files, you would need to investigate those files as well, as they could also be contributing to the problem.
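
To make this cross-file tracing concrete, here is a minimal sketch of how a tool might follow a file's local imports to surface other files worth inspecting. The `find_local_imports` helper and the package layout it assumes are hypothetical, purely for illustration.

```python
import ast
from pathlib import Path

def find_local_imports(file_path: str, project_root: str) -> list[Path]:
    """Return project-local files imported by the given file (a rough heuristic)."""
    tree = ast.parse(Path(file_path).read_text())
    root = Path(project_root)
    found = []
    for node in ast.walk(tree):
        # Collect dotted module names from both import styles.
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            # Map "models.loader" -> <root>/models/loader.py; keep it if it exists.
            candidate = root.joinpath(*name.split(".")).with_suffix(".py")
            if candidate.exists():
                found.append(candidate)
    return found
```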

A regular LLM falls short in this scenario because it cannot process all relevant files simultaneously when answering questions. You would need to locate the specific file yourself before asking the LLM to help with the bug fix.

So, how do we define an AI software engineer?

An AI software engineer operates similarly to a traditional software engineer by executing incremental pull requests (PRs) to address specific code issues. For example, if the goal is to enhance an AI assistant's reasoning capabilities, the first PR might focus on establishing the assistant itself, while subsequent PRs could involve setting up the training workflow and other related tasks. This structured approach allows for systematic improvements and debugging in software development.


An AI assistant would need to write code for these PRs sequentially, either on its own or prompted by the user.

How do you create an AI software engineer (SWE)?

The recent introduction of the DeepSeek model raises an interesting question: can large language models (LLMs) use Reinforcement Learning (RL) to enhance their software engineering capabilities? My recent blog post discusses the versatility of RL training techniques, which you might find insightful.

RL is particularly effective here because it lets LLMs discover solutions on their own, guided by rewards rather than explicit step-by-step instructions.

Defining the Task

The objective is to enable our AI assistant to automatically implement necessary changes in a repository based on its current state and the task at hand, similar to how a human software engineer would operate.

Training an AI to Be a Software Engineer

To achieve this, we can employ the RL training pipeline DeepSeek used to enhance reasoning skills. RL generalizes well: gains made in one domain often transfer to others, so the methods DeepSeek used for reasoning enhancement could also benefit our task.

However, can we further improve this process by incorporating specific software engineering examples during training?

Data Collection

To train the LLM effectively, we need data that illustrates what changes should be made based on:

  • The current state of the code

  • The specific function that requires modification

The ideal source for this information is Git pull requests (PRs). A Git PR represents a proposal to merge changes from one branch into another. When working on a project in GitHub, developers typically modify the repository, commit those changes with descriptive messages, and then merge them into the main codebase. This process provides valuable insights into how changes are made and can serve as a rich dataset for training our AI assistant.

A pull request contains essential information, including the previous state of the repository, the task to be accomplished, and the changes implemented to fulfill that task.

By gathering millions of pull requests from public repositories, we can compile a substantial dataset. This data will provide the foundation needed to train the LLM on how to modify code based on specified tasks and the prior state of the code. With this wealth of information, we can effectively teach the AI assistant to make informed changes, mimicking the decision-making process of a human software engineer.
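
As a sketch of what a single training example might look like, each PR could be flattened into a task description plus before/after snapshots of the touched files. The `PRExample` schema is my own invention, and `list_merged_prs` uses the public GitHub REST API; a real pipeline would still need to check out each PR's base commit to recover the "before" file contents.

```python
from dataclasses import dataclass

import requests

@dataclass
class PRExample:
    task: str                     # PR title and description, used as the task prompt
    files_before: dict[str, str]  # path -> file contents before the change
    files_after: dict[str, str]   # path -> file contents after the change (the target)

def list_merged_prs(owner: str, repo: str, token: str) -> list[dict]:
    """Fetch closed PRs for one repo via the GitHub REST API, keeping merged ones."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/pulls",
        params={"state": "closed", "per_page": 100},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    return [pr for pr in resp.json() if pr.get("merged_at")]
```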

Prompting the LLM

With our dataset in place, the next step is to determine how to effectively prompt the LLM during both the training and inference phases.

Since the entire codebase will not fit into the model's context window, we can focus on providing two types of files:

  1. Files that changed between the previous and current commit
    These files contain the specific modifications made to address the task.

  2. Other relevant files that remained unchanged
    These files may still be important for context, even if they were not directly modified.

To identify these relevant unchanged files, we can use an LLM to analyze file names and their relationships. It’s crucial for the model to understand not only which files need to be altered but also which ones should remain untouched. Incorporating this knowledge into the training process will enhance its ability to make informed decisions, ultimately improving its performance as an AI software engineer.
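
A minimal sketch of how such a prompt might be assembled; the template and its section markers are invented for illustration rather than a prescribed format.

```python
def build_prompt(task: str, changed_files: dict[str, str], context_files: dict[str, str]) -> str:
    """Assemble a prompt from the files expected to change plus read-only context."""
    sections = [f"Task:\n{task}", "Files to modify:"]
    for path, code in changed_files.items():
        sections.append(f"--- {path} ---\n{code}")
    sections.append("Context files (do not modify):")
    for path, code in context_files.items():
        sections.append(f"--- {path} ---\n{code}")
    sections.append("Output the full new contents of each file to modify.")
    return "\n\n".join(sections)
```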

Defining the reward

In the context of Reinforcement Learning (RL), defining an effective reward system is crucial for guiding the LLM's learning process.

We will establish the reward based on a straightforward principle: measure the similarity between the output code generated by the LLM and the actual new state of the file in the pull request (PR).

  • Reward Structure: The closer the LLM's output is to the actual modified code, the higher the reward it receives. This incentivizes the model to produce accurate and relevant code changes.

By quantifying this difference, we can effectively encourage the LLM to learn and improve its performance over time, aligning its outputs more closely with desired outcomes in software engineering tasks.
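
One simple way to score this similarity is Python's built-in `difflib.SequenceMatcher`, which returns a ratio in [0, 1]; this is an illustrative choice of sequence-similarity metric, not the only option.

```python
from difflib import SequenceMatcher

def reward(generated: str, target: str) -> float:
    """Similarity between the LLM's output and the file's actual post-PR state."""
    return SequenceMatcher(None, generated, target).ratio()

# A perfect reproduction scores 1.0; an unrelated output scores near 0.0.
```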

The training process

The training process for the LLM is straightforward and consists of the following steps:

  1. Input Data: Each data point from our dataset is passed into the LLM.

  2. Generate Output: The LLM produces an output based on the input data.

  3. Compare Outputs: The generated output is then compared to the expected output code from the pull request.

  4. Calculate Reward: A reward is assigned based on the similarity between the LLM's output and the actual code changes, using sequence similarity as a metric.

  5. Optimize Policy: The LLM adjusts its learning policy to maximize the overall reward based on the feedback received.

This iterative process allows the LLM to refine its understanding and improve its ability to make accurate code modifications, enhancing its performance as an AI software engineer over time.
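
Putting those five steps together, the loop might look like the sketch below. It reuses `PRExample`, `build_prompt`, and `reward` from the earlier sketches, and the `Policy` protocol stands in for whatever model and RL update rule (e.g., PPO or GRPO) you plug in; none of this is a specific DeepSeek implementation.

```python
from typing import Iterable, Protocol

class Policy(Protocol):
    def generate(self, prompt: str) -> str: ...
    def update(self, prompt: str, output: str, reward: float) -> None: ...

def train(model: Policy, dataset: Iterable[PRExample]) -> None:
    """One schematic pass over the dataset, following the five steps above."""
    for example in dataset:                                  # 1. input data
        prompt = build_prompt(example.task, example.files_before, context_files={})
        output = model.generate(prompt)                      # 2. generate output
        target = "\n\n".join(example.files_after.values())   # expected post-PR code
        r = reward(output, target)                           # 3-4. compare and score
        model.update(prompt, output, r)                      # 5. policy optimization
```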

How can we improve this further?

Curriculum learning

It's been shown that gradually increasing problem difficulty as the RL training loop progresses helps the LLM learn more effectively, much as it would a student.

So far, we have taken PRs from public GitHub repositories in random order. Instead, we could order them by commit size, i.e., the number of changes within each commit, on the assumption that smaller commits are easier to reason about.

Alternatively, we could devise another metric for the reasoning difficulty of each PR and order the data points by increasing difficulty.
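
For the commit-size heuristic, ordering the dataset is a short sort; `diff_size` below counts added or removed lines across a PR's files, one of several plausible difficulty proxies, and again reuses the hypothetical `PRExample` from earlier.

```python
import difflib

def diff_size(example: PRExample) -> int:
    """Rough difficulty proxy: lines added or removed across the PR."""
    before = "\n".join(example.files_before.values()).splitlines()
    after = "\n".join(example.files_after.values()).splitlines()
    changed = 0
    for line in difflib.unified_diff(before, after, lineterm=""):
        # Skip the "---"/"+++" file headers; count real added/removed lines.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---")):
            changed += 1
    return changed

def order_by_difficulty(dataset: list[PRExample]) -> list[PRExample]:
    """Easiest (smallest) PRs first, hardest last."""
    return sorted(dataset, key=diff_size)
```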

Cleaner dataset and reward function

As DeepSeek and other researchers have highlighted, the cleanliness of the dataset is a significant challenge in enhancing LLM performance during RL training.

  • The dataset: One major issue is that we cannot guarantee the quality of the data. Ideally, all pull requests (PRs) should exemplify "good software engineering skills," but this is unrealistic when sourcing PRs from public repositories on the internet. Many submissions may not adhere to best practices, which could negatively impact the training outcomes.

  • The reward function: Another concern arises with the reward function. If there are multiple valid solutions to a problem, the LLM might receive a negative reward if its approach differs from that in the repository, even if its method is superior. This misalignment can misguide the LLM's learning process.

Improving Dataset and Reward Function

To enhance both the dataset and reward function, we could consider implementing stricter data curation processes to filter out lower-quality PRs. Additionally, refining the reward function to recognize alternative valid solutions could provide a more accurate incentive structure for the LLM.
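
One hedged way to soften the exact-match penalty is to blend textual similarity with a functional signal, such as whether the repository's test suite passes with the generated code. The weighting and the `tests_pass` hook below are hypothetical choices, not a validated design.

```python
from difflib import SequenceMatcher

def blended_reward(generated: str, target: str, tests_pass: bool) -> float:
    """Blend textual similarity with a functional check so that a correct but
    differently written solution is not scored as a total miss."""
    textual = SequenceMatcher(None, generated, target).ratio()
    functional = 1.0 if tests_pass else 0.0
    # Weights are arbitrary illustrative values, not tuned.
    return 0.3 * textual + 0.7 * functional
```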

If you have suggestions for improving these aspects, I would love to hear your thoughts in the comments!

Conclusion

AI software engineers are rapidly approaching reality, and open-source implementations are likely to follow, making them widely accessible to software developers globally.

In this discussion, we explored one potential approach to creating an AI software engineer. There are numerous ways to enhance the efficiency of this process. Training LLMs using Reinforcement Learning (RL) remains an emerging field, filled with opportunities for research and development.

As I delve into this area myself this year, I encourage anyone interested to explore it as well. The potential for innovation in AI-driven software engineering is immense, and your contributions could help shape its future!

Connect with us on LinkedIn for more updates like this!