
How to build your own AI Codegen

Alex Phan, Growth Engineer
Jason Howmans, Dashboard Tech Lead
November 11, 2024
8 min read

Creating a Codegen Feature: A Generalized Guide

Code generation from natural language commands is now commonly used in modern development workflows. We'll build a tool that converts natural language instructions into functional Playwright test scripts that can interact with live browser sessions.

In this short guide we share a framework for building a natural language to Playwright code generator for any website. Our example uses a NextJS app with Browserbase, but the principles can be adapted for many different programming languages and frameworks.

Prerequisites

First, let’s create a new NextJS app using create-next-app and add the necessary dependencies.

pnpm dlx create-next-app@latest --ts browserbase-codegen

pnpm add -D shadcn

pnpm add react-spinners

pnpm exec shadcn init

pnpm exec shadcn add separator tabs textarea button label input toast

Along with the dependencies above, you will also need:

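  • Anthropic account & API key

  • Browserbase account & API key

  • The playwright and @anthropic-ai/sdk packages, which the server actions below import (for example, pnpm add playwright @anthropic-ai/sdk)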

You should add these values as variables in your .env.development.local file:

ANTHROPIC_API_KEY=

BROWSERBASE_API_KEY=
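Both keys are read from the environment at runtime, so it can help to fail fast with a clear message when one is missing. The following is a minimal sketch, not part of the original project: requireEnv is a hypothetical helper, and it takes the environment as a plain object so it can be exercised without touching process.env.

```typescript
// Hypothetical helper (an assumption, not from the original project):
// read a required environment variable and fail clearly when it is missing.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const raw = env[name];
  const value = raw ? raw.trim() : "";
  if (!value) {
    throw new Error(name + " is required; add it to .env.development.local");
  }
  return value;
}

// Example with an in-memory environment. In the app you would pass process.env.
const exampleEnv: Env = {
  ANTHROPIC_API_KEY: "sk-example",
  BROWSERBASE_API_KEY: "",
};
const anthropicKey = requireEnv(exampleEnv, "ANTHROPIC_API_KEY");
```

Calling requireEnv(exampleEnv, "BROWSERBASE_API_KEY") with the example above throws, because an empty value is treated the same as a missing one.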

1. User Interface

Let’s start by building out a simple UI with our installed components for inputting prompts and displaying generated code.

Here are some key features we should consider for the UI:

  • Support multi-line input for complex prompts

  • Provide clear visual feedback for different states

  • Make the code input user-editable so existing Playwright scripts can be extended

The component below is used to create the UI. Add this to your default NextJS page.tsx file. Don’t worry about the empty handleExecute and useEffect functions; we’ll get to these shortly.

// app/page.tsx
"use client";

import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Separator } from "@/components/ui/separator";
import { Textarea } from "@/components/ui/textarea";
import { useToast } from "@/components/ui/use-toast";
import { useEffect, useRef, useState } from "react";
import { MoonLoader } from "react-spinners";
import { codegenAction, getBrowserbasePage } from "./actions";

export default function Codegen() {
  const { toast } = useToast();

  // The user's input prompt
  const [prompt, setPrompt] = useState("");

  // The Playwright script to be generated or edited by the user
  const [script, setScript] = useState("");

  // The page we want to write the script for
  const [websiteUrl, setWebsiteUrl] = useState("");

  // The component state
  const [state, setState] = useState<"ready" | "connecting" | "generating">(
    "ready",
  );

  // A toast that shows the current loading state
  const staticToast = useRef<ReturnType<typeof toast> | null>(null);

  // Append newly generated lines to the current script
  const appendCode = (newLines: string) => {
    setScript((prev) => prev + newLines);
  };

  // Execute a prompt
  const handleExecute = async () => {
    // To be implemented...
  };

  // ...

We’re relying on a toast to communicate various loading and error states. For toasts to work, we’ll need to put a <Toaster /> provider in our app/layout.tsx file. Let’s do that to complete the UI:

// app/layout.tsx
import { Toaster } from "@/components/ui/toaster";
import type { Metadata } from "next";

export default function RootLayout({
  children,
}: Readonly<{
  children: React.ReactNode;
}>) {
  return (
    <html lang="en">
      <body>
        {children}
        {/* Renders the toasts triggered through useToast */}
        <Toaster />
      </body>
    </html>
  );
}

With the UI so far, we’ve created:

  1. An easy way to input code and prompts.

  2. A way to submit the prompt for AI codegen.

  3. A way to handle loading states using toasts.

In the next step we’ll connect to a running Browserbase session, generate valid Playwright JS code for the open page, and add some basic error handling and user feedback. Let’s continue.

2. Connect to a Browserbase page

Before we get to the exciting part of generating Playwright code, we need to pull some data from a running Browserbase session. To do this we’ll create a new file app/actions.ts where we’ll add a NextJS server action to connect to a new Browserbase session and visit the user-inputted URL stored in websiteUrl:

// app/actions.ts
"use server";

import playwright from "playwright";

const browserbaseApiKey = process.env.BROWSERBASE_API_KEY;

export async function getBrowserbasePage(targetUrl: string) {
  try {
    // Connect to a new Browserbase session over CDP
    const browser = await playwright.chromium.connectOverCDP(
      `wss://connect.browserbase.com?apiKey=${browserbaseApiKey}`,
    );
    const defaultContext = browser.contexts()[0];
    const page = defaultContext.pages()[0];

    await page.goto(targetUrl);

    // Read the current DOM; fall back to page.content() if evaluate returns nothing
    let pageHTML = await page.evaluate(() => {
      return document.documentElement.outerHTML;
    });
    if (!pageHTML) {
      pageHTML = await page.content();
    }

    return {
      pageHTML,
      pageUrl: page.url(),
    };
  } catch (err: unknown) {
    console.error(err);
    throw new Error("Failed to connect to session");
  }
}

Our client component will call the getBrowserbasePage action to connect to a new Browserbase session, then fetch the page URL and the current DOM state of the target page, ready to send to Claude for inference.
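The raw DOM of a real page can be very large, and much of it (scripts, styles, comments) is unlikely to help with choosing selectors. The sketch below shows one rough way to shrink the HTML before it goes into a prompt. It is an assumption rather than part of the original project: trimPageHTML and its default limit are hypothetical, and the regular expressions are a heuristic, not an HTML parser.

```typescript
// Hypothetical helper (an assumption, not from the original project):
// shrink page HTML before embedding it in a prompt.
function trimPageHTML(html: string, maxChars: number = 50000): string {
  const stripped = html
    // Drop script and style blocks
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, "")
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, "")
    // Drop HTML comments
    .replace(/<!--[\s\S]*?-->/g, "")
    // Collapse runs of whitespace into single spaces
    .replace(/\s+/g, " ")
    .trim();
  // Truncate to an arbitrary character budget
  return stripped.length > maxChars ? stripped.slice(0, maxChars) : stripped;
}

const sampleHTML =
  "<html><head><style>p { color: red; }</style></head>\n" +
  '<body><!-- nav --><button id="buy">Buy</button><script>track();</script></body></html>';
const trimmedHTML = trimPageHTML(sampleHTML);
```

Truncating by character count can cut an element in half, so a production version would likely want a smarter budget, such as keeping only interactive elements.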

The handleExecute function in page.tsx is looking a little empty. We should add some code to the function that calls our new server action.

// app/page.tsx

// ...existing code

const handleExecute = async () => {
  if (!websiteUrl) {
    toast({
      title: "Empty Browserbase URL",
      description: "Please enter a Browserbase URL",
    });
    return;
  }

  try {
    setState("connecting");

    const page = await getBrowserbasePage(websiteUrl);
    if (!page) {
      toast({
        title: "Failed to connect to Browserbase page",
        description:
          'Ensure you have added your "BROWSERBASE_API_KEY" to the .env file',
      });
      return;
    }

    // To implement: Codegen...
  } catch (err: unknown) {
    console.error(err);
    toast({
      title: "Error",
      description: "Failed to execute the action (see console)",
    });
  } finally {
    // Always return to the ready state, whether the request succeeded or failed
    setState("ready");
  }
};

While we’re in page.tsx, let’s add some code to the empty useEffect that provides user feedback based on the current state:

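// app/page.tsx

// Handle different loading states
useEffect(() => {
  let t;

  switch (state) {
    case "ready": {
      // Nothing is loading: dismiss any loading toast that is still showing
      if (staticToast.current) {
        staticToast.current.dismiss();
        staticToast.current = null;
      }
      break;
    }
    case "connecting": {
      t = {
        description: (
          <div>
            <MoonLoader color="#000000" size={16} /> Connecting to session...
          </div>
        ),
      };
      break;
    }
    case "generating": {
      t = {
        description: (
          <div>
            <MoonLoader color="#000000" size={16} /> Generating code...
          </div>
        ),
      };
      break;
    }
  }

  if (!t) {
    return;
  }

  // Minimal completion (assumed): replace any previous loading toast and
  // keep a handle to the new one so the "ready" state can dismiss it
  staticToast.current?.dismiss();
  staticToast.current = toast(t);
}, [state, toast]);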
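The mapping from component state to loading text in the effect above can also live in a small pure function, which keeps the effect short and makes the mapping easy to check on its own. This is a sketch under assumptions: CodegenState and loadingMessage are hypothetical names, and the strings mirror the ones used in the toasts.

```typescript
// The three component states used in page.tsx
type CodegenState = "ready" | "connecting" | "generating";

// Hypothetical lookup table: loading text per state, or null when no toast should show
const LOADING_MESSAGES: Record<CodegenState, string | null> = {
  ready: null,
  connecting: "Connecting to session...",
  generating: "Generating code...",
};

// Hypothetical helper: returns the loading text for a state
function loadingMessage(state: CodegenState): string | null {
  return LOADING_MESSAGES[state];
}
```

Inside the effect, a null result would mean "dismiss the current toast" and any other value would become the text shown next to the spinner.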

3. Code Generation

In the final step we’ll generate Playwright code based on the user’s input prompt and page context pulled from the running Browserbase session. To do this, we need to add a new function to app/actions.ts with the full prompt we send to Claude to generate code.

We must include all the relevant context (current code state, application state, etc.) in the prompt so the LLM has the information it needs. Finally, we should format the response so that the output displays correctly and respects the constraints we’ve put in place.
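One way to keep that context organized is to assemble the prompt from labeled sections. The sketch below is illustrative only: buildCodegenPrompt, its field names, and the instruction wording are assumptions, not the prompt used in this guide.

```typescript
// Hypothetical prompt builder (illustrative wording, not the original prompt)
interface CodegenContext {
  instruction: string; // the user's natural language command
  script: string; // the Playwright script so far
  pageHTML?: string; // DOM snapshot of the live page, if available
}

function buildCodegenPrompt(context: CodegenContext): string {
  // Three backticks, built from pieces so this listing stays easy to embed
  const fence = "`" + "`" + "`";
  const sections: string[] = [
    "You extend a Playwright JS script according to an instruction.",
    "Respond only with the new lines of code, without explanations.",
  ];
  // Only include the HTML section when a snapshot was captured
  if (context.pageHTML) {
    sections.push("**Page HTML:**\n" + fence + "html\n" + context.pageHTML + "\n" + fence);
  }
  sections.push(
    "**Playwright JS script to extend:**\n" + fence + "js\n" + context.script + "\n" + fence,
  );
  sections.push("**Instruction:**\n" + context.instruction);
  return sections.join("\n\n");
}

const examplePrompt = buildCodegenPrompt({
  instruction: "Click the buy button",
  script: "await page.goto('https://example.com');",
  pageHTML: '<button id="buy">Buy</button>',
});
```

Keeping each piece of context under its own heading makes it easier to see what the model was given when a generated snippet goes wrong.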

Let’s update the app/actions.ts file to look like this:

// app/actions.ts
"use server";

import Anthropic from "@anthropic-ai/sdk";
import playwright from "playwright";

// Automatically sets apiKey using process.env.ANTHROPIC_API_KEY
const anthropic = new Anthropic();

const browserbaseApiKey = process.env.BROWSERBASE_API_KEY;
if (!browserbaseApiKey) {
  throw new Error("BROWSERBASE_API_KEY is required");
}

export async function getBrowserbasePage(targetUrl: string) { ... }

export async function codegenAction(
  cmd: string,
  script: string,
  pageHTML: string,
) {
  // Only include the HTML section when page HTML is available
  let pageHTMLPrompt = "";
  if (pageHTML) {
    pageHTMLPrompt = `\n**Page HTML:**\n\`\`\`html\n${pageHTML}\n\`\`\``;
  }

  const prompt = `
You are an AI that produces snippets of Playwright JS code.

Your task is to extend a Playwright JS script according to an instruction. You will be provided with a Playwright script and the associated HTML page source. You must respond with new lines of Playwright JS code to be evaluated and appended to the script.
${pageHTMLPrompt}

**Playwright JS script to extend:**
\`\`\`js
${script}

The key details in the completed prompt are the current script, the current page HTML, and the input command. Note that we need to provide clear instructions and constraints for Claude to generate valid Playwright code.

You may have noticed that codegenAction returns a console statement instead of throwing an error. This is because codegenAction must always return valid JavaScript that can be interpreted later.
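To illustrate the idea of always handing back valid JavaScript, here is a minimal sketch of how a model reply might be post-processed. The helper names (extractCode, errorAsCode, toScriptLines) and the fallback message are assumptions made for this example, not code from the original project.

```typescript
// Hypothetical helpers (names are assumptions, not from the original project).

// Strip a surrounding Markdown code fence, if the model added one.
function extractCode(reply: string): string {
  const match = reply.match(/`{3}(?:js|javascript|ts|typescript)?\s*\n([\s\S]*?)`{3}/);
  return (match ? match[1] : reply).trim();
}

// Turn a failure into a JavaScript statement instead of throwing,
// so the caller always receives code it can append to the script.
function errorAsCode(message: string): string {
  return "console.error(" + JSON.stringify(message) + ");";
}

// Convert a raw model reply into lines that are safe to append.
function toScriptLines(reply: string | null): string {
  if (!reply || !reply.trim()) {
    return errorAsCode("Codegen returned an empty response");
  }
  return extractCode(reply) + "\n";
}

const fenceMarker = "`" + "`" + "`";
const exampleReply =
  "Here you go:\n" + fenceMarker + "js\nawait page.click('#buy');\n" + fenceMarker;
const exampleLines = toScriptLines(exampleReply);
```

Because the error path yields a console.error(...) statement, appending the result to the script never leaves it syntactically broken, even when generation fails.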

As a final step, let’s append some code to our handleExecute function in app/page.tsx. We need to call the server action we just created and append the generated Playwright code to the textarea. Your handleExecute function should now look like this:

// app/page.tsx

const handleExecute = async () => {
  if (!websiteUrl) {
    toast({
      title: "Empty Browserbase URL",
      description: "Please enter a Browserbase URL",
    });
    return;
  }

  if (!prompt) {
    toast({
      title: "Empty prompt",
      description: "Please write a prompt and hit execute",
    });
    return;
  }

  try {
    setState("connecting");

    const page = await getBrowserbasePage(websiteUrl);
    if (!page) {
      toast({
        title: "Failed to connect to Browserbase page",
        description:
          'Ensure you have added your "BROWSERBASE_API_KEY" to the .env file',
      });
      return;
    }

    // Make the codegen request
    setState("generating");

    const response = await codegenAction(prompt, script, page.pageHTML);
    if (!response) {
      toast({
        title: "Codegen error 😢",
        description: "Failed to generate code.",
      });
      return;
    }

    // Append the generated Playwright code to the script textarea
    appendCode(response);
  } catch (err: unknown) {
    console.error(err);
    toast({
      title: "Error",
      description: "Failed to execute the action (see console)",
    });
  } finally {
    setState("ready");
  }
};

Final Tips

This guide helps you create a codegen app that connects to a Browserbase session, pulls the DOM for a live page, and generates valid Playwright code to perform actions on the page. If you’re building a web agent, this is a great starting point.

This project can be extended to include the Browserbase live view to watch what’s happening on the page, a feedback loop that executes the generated code in the live session, and more tailored prompts for specific use cases. For more advanced use cases, you can explore our open-source AI SDK, Stagehand, which offers additional flexibility and functionality. If you’d like to see these features in action, check out our playground.
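As a starting point for the feedback loop mentioned above, the sketch below runs a generated snippet against a page-like object and reports whether it succeeded. Everything here is an assumption for illustration: runGenerated, PageLike, and the stand-in page are hypothetical, and evaluating model output this way should only be done in a sandboxed, disposable session.

```typescript
// Hypothetical sketch: run generated code against a page-like object.
// WARNING: this evaluates model output; only do this in a sandboxed session.
type PageLike = Record<string, unknown>;

interface RunResult {
  ok: boolean;
  error?: string;
}

async function runGenerated(code: string, page: PageLike): Promise<RunResult> {
  try {
    // Wrap the snippet in an async function so `await` works inside it
    const run = new Function("page", "return (async () => {\n" + code + "\n})();");
    await run(page);
    return { ok: true };
  } catch (err) {
    // Syntax errors and runtime errors both end up here
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Usage with a stand-in page object that records clicks
const clicks: string[] = [];
const fakePage: PageLike = {
  click: async (selector: string) => {
    clicks.push(selector);
  },
};
```

In a real loop, the page argument would be the Playwright page from the live session, and a failed result could be sent back to the model together with the next prompt.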