Nov 11, 2024

How to build your own AI Codegen

Alex Phan

alexdphan

Jason Howmans

Creating a Codegen Feature: A Generalized Guide

Code generation from natural language commands is now commonly used in modern development workflows. We'll build a tool that converts natural language instructions into functional Playwright test scripts that can interact with live browser sessions.

In this short guide we share a framework for building a natural language to Playwright code generator for any website. Our example uses a NextJS app with Browserbase, but the principles can be adapted for many different programming languages and frameworks.

Prerequisites

First, let’s create a new NextJS app using create-next-app and add the necessary dependencies.

pnpm dlx create-next-app@latest --ts browserbase-codegen 

pnpm add -D

Along with the dependencies above, you will also need:

  • Anthropic account & API key

  • Browserbase account & API key

You should add these values as variables in your .env.development.local file:

ANTHROPIC_API_KEY=
BROWSERBASE_API_KEY=
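
A missing key tends to surface later as a confusing connection error, so it can help to check for both values up front. Here is a minimal sketch, assuming a hypothetical requireEnv helper (not part of the original app) that takes the environment as a plain object so it is easy to test:

```typescript
// Hypothetical helper (an assumption, not from the original app):
// returns the variable's value or throws a descriptive error.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`${name} is required (add it to .env.development.local)`);
  }
  return value;
}
```

In a server-only file this could be called as requireEnv(process.env, "BROWSERBASE_API_KEY"), so a missing key fails early with a clear message.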

1. User Interface

Let’s start by building out a simple UI with our installed components for inputting prompts and displaying generated code.

Here are some key features we should consider for the UI:

  • Support multi-line input for complex prompts

  • Provide clear visual feedback for different states

  • User-editable code input to make it possible to extend existing Playwright scripts

The component below is used to create the UI. Add this to your default NextJS page.tsx file. Don’t worry about the empty handleExecute and useEffect functions; we’ll get to these shortly.

// app/page.tsx
"use client";

import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Separator } from "@/components/ui/separator";
import { Textarea } from "@/components/ui/textarea";
import { useToast } from "@/components/ui/use-toast";
import { useEffect, useRef, useState } from "react";
import { MoonLoader } from "react-spinners";
import { codegenAction, getBrowserbasePage } from "./actions";

export default function Codegen() {
  const { toast } = useToast();
  // The user's input prompt
  const [prompt, setPrompt] = useState("");
  // The Playwright script to be generated or edited by the user
  const [script, setScript] = useState("");
  // The page we want to write the script for
  const [websiteUrl, setWebsiteUrl] = useState("");
  // The component state
  const [state, setState] = useState<"ready" | "connecting" | "generating">(
    "ready",
  );
  // A toast that shows the current loading state
  const staticToast = useRef<ReturnType<typeof toast> | null>(null);

  const appendCode = (newLines: string) => {
    setScript((prev) => prev + newLines);
  };

  // Execute a prompt
  const handleExecute = async () => {
    // To be implemented...
  };

  // Handle different states
  useEffect(() => {
    // To be implemented...
  }, [state, toast]);

  return (
    <>
      <div className="hidden h-full flex-col md:flex">
        <div className="container flex flex-col items-start justify-between space-y-2 py-4 sm:flex-row sm:items-center sm:space-y-0 md:h-16">
          <h2 className="text-lg font-semibold">AI Codegen</h2>
        </div>
        <Separator />
        <div className="container h-full py-6">
          <div className="grid h-full items-stretch gap-6 md:grid-cols-[300px_1fr]">
            <div className="hidden flex-col space-y-4 sm:flex md:order-1">
              <div className="grid gap-2">
                <span className="text-sm font-medium leading-none peer-disabled:cursor-not-allowed peer-disabled:opacity-70">
                  Target website
                </span>
                <Input
                  placeholder="https://"
                  onChange={(e) => setWebsiteUrl(e.target.value)}
                  value={websiteUrl}
                  disabled={state !== "ready"}
                />
              </div>
              <div className="grid gap-2">
                <span className="text-sm font-medium leading-none peer-disabled:cursor-not-allowed peer-disabled:opacity-70">
                  Write a simple prompt
                </span>
                <Textarea
                  placeholder="Press the 'sign up' button"
                  onChange={(e) => setPrompt(e.target.value)}
                  value={prompt}
                  disabled={state !== "ready"}
                />
              </div>
              <footer>
                <Button onClick={handleExecute} disabled={state !== "ready"}>
                  Execute
                </Button>
              </footer>
            </div>
            <div className="md:order-2">
              <div className="flex h-full flex-col space-y-4">
                <Textarea
                  placeholder="// Playwright JS"
                  className="min-h-[400px] flex-1 p-4 md:min-h-[700px] lg:min-h-[700px]"
                  onChange={(e) => setScript(e.target.value)}
                  value={script}
                />
              </div>
            </div>
          </div>
        </div>
      </div>
    </>
  );
}

We’re relying on a toast to communicate various loading and error states. For toasts to work, we’ll need to put a <Toaster /> provider in our app/layout.tsx file. Let’s do that to complete the UI:

// app/layout.tsx
import { Toaster } from "@/components/ui/toaster";
import type { Metadata } from "next";

export default function RootLayout({
  children,
}: Readonly<{
  children: React.ReactNode;
}>) {
  return (
    <html lang="en">
      <body>
        {children}
        <Toaster />
      </body>
    </html>
  );
}

With the UI so far, we’ve created:

  1. An easy way to input code and prompts.

  2. A way to submit the prompt for AI codegen.

  3. A way to handle loading states using toasts.

In the next steps we’ll connect to a running Browserbase session, generate valid Playwright JS code for the open page, and add some basic error handling and user feedback. Let’s continue.

2. Connect to a Browserbase page

Before we get to the exciting part of generating Playwright code, we need to pull some data from a running Browserbase session. To do this we’ll create a new file app/actions.ts where we’ll add a NextJS server action to connect to a new Browserbase session and visit the user-inputted URL stored in websiteUrl:

// app/actions.ts
"use server";

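import playwright from "playwright";

// Read the key on the server; it is used in the connection URL below
const browserbaseApiKey = process.env.BROWSERBASE_API_KEY;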

export async function getBrowserbasePage(targetUrl: string) {
  try {
    const browser = await playwright.chromium.connectOverCDP(
      `wss://connect.browserbase.com?apiKey=${browserbaseApiKey}`,
    );
    const defaultContext = browser.contexts()[0];
    const page = defaultContext.pages()[0];
    await page.goto(targetUrl);
    let pageHTML = "";
    pageHTML = await page.evaluate(() => {
      return document.documentElement.outerHTML;
    });
    if (!pageHTML) {
      pageHTML = await page.content();
    }
    return {
      pageHTML,
      pageUrl: page.url(),
    };
  } catch (err: unknown) {
    console.error(err);
    throw new Error("Failed to connect to session");
  }
}

Our client component calls the getBrowserbasePage action to connect to a new Browserbase session and navigate to the target page. The action returns the page URL and the current DOM state, ready to send to Claude for inference.
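
Full page HTML can be large, and much of it (inline scripts, styles, comments) rarely helps with choosing locators. Below is a rough sketch of trimming it before it is sent for inference. It assumes a hypothetical trimPageHTML helper and an arbitrary character limit, neither of which is part of the original app, and a regex pass like this is a heuristic rather than a real HTML parser:

```typescript
// Hypothetical helper (an assumption, not from the original app).
// maxChars is an arbitrary illustrative limit, not a documented model constraint.
function trimPageHTML(html: string, maxChars = 100_000): string {
  const stripped = html
    // Drop script and style elements along with their contents
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Drop HTML comments
    .replace(/<!--[\s\S]*?-->/g, "")
    // Collapse runs of whitespace into single spaces
    .replace(/\s+/g, " ")
    .trim();
  // Hard cut-off; this may split a tag, which is acceptable for prompt context
  return stripped.length > maxChars ? stripped.slice(0, maxChars) : stripped;
}
```

If you adopt something like this, it could be applied to pageHTML just before getBrowserbasePage returns. Truncation can hide elements that appear late in the document, so the limit is a trade-off between prompt size and coverage.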

The handleExecute function in page.tsx is looking a little empty. We should add some code to the function that calls our new server action.

// app/page.tsx

// ...existing code

const handleExecute = async () => {
  if (!websiteUrl) {
    toast({
      title: "Empty Browserbase URL",
      description: "Please enter a Browserbase URL",
    });
    return;
  }
  try {
    setState("connecting");
    const page = await getBrowserbasePage(websiteUrl);
    if (!page) {
      toast({
        title: "Failed to connect to Browserbase page",
        description:
          'Ensure you have added your "BROWSERBASE_API_KEY" to the .env file',
      });
      return;
    }

    // To implement: Codegen...
  } catch (err: unknown) {
    console.error(err);
    toast({
      title: "Error",
      description: "Failed to execute the action (see console)",
    });
    setState("ready");
  } finally {
    setState("ready");
  }
}

// ...existing code


While we’re in page.tsx, let’s add some code to the empty useEffect that provides user feedback based on the current state:

// app/page.tsx
// Handle different loading states
useEffect(() => {
  let t;
  switch (state) {
    case "ready": {
      if (staticToast.current) {
        staticToast.current?.dismiss();
        staticToast.current = null;
      }
      break;
    }
    case "connecting": {
      t = {
        description: (
          <div className="flex gap-x-2">
            <MoonLoader color="#000000" size={16} /> Connecting to session...
          </div>
        ),
      };
      break;
    }
    case "generating": {
      t = {
        description: (
          <div className="flex gap-x-2">
            <MoonLoader color="#000000" size={16} /> Generating code...
          </div>
        ),
      };
      break;
    }
  }
  if (!t) {
    return;
  }
  if (staticToast.current) {
    staticToast.current.update({
      id: staticToast.current.id,
      duration: 30000,
      ...t,
    });
  } else {
    staticToast.current = toast({ duration: 30000, ...t });
  }
}, [state, toast]);
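
The switch above does two jobs at once: it decides what to say for each state and it updates the toast. One way to keep the first job easy to test is to pull it into a pure lookup. This is a small sketch, assuming hypothetical CodegenState and loadingMessage names that are not in the original code:

```typescript
// Hypothetical names (assumptions, not from the original app)
type CodegenState = "ready" | "connecting" | "generating";

// A null entry means "no loading toast for this state".
// Adding a new state to the union forces a new entry here at compile time.
const LOADING_MESSAGES: Record<CodegenState, string | null> = {
  ready: null,
  connecting: "Connecting to session...",
  generating: "Generating code...",
};

function loadingMessage(state: CodegenState): string | null {
  return LOADING_MESSAGES[state];
}
```

The effect could then build the toast description from loadingMessage(state) and dismiss the toast when the result is null.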

3. Code Generation

In the final step we’ll generate Playwright code based on the user’s input prompt and page context pulled from the running Browserbase session. To do this, we need to add a new function to app/actions.ts with the full prompt we send to Claude to generate code.

We must include all the relevant context (current code state, application state, etc.) in the prompt so that the LLM has the information it needs. We also constrain the format of the response so that the output is plain Playwright code that can be appended to the script and displayed as-is.

Let’s update the app/actions.ts file to look like this:

// app/actions.ts
"use server";

import Anthropic from "@anthropic-ai/sdk";
import playwright from "playwright";

// Automatically sets apiKey using process.env.ANTHROPIC_API_KEY
const anthropic = new Anthropic();

const browserbaseApiKey = process.env.BROWSERBASE_API_KEY;
if (!browserbaseApiKey) {
  throw new Error("BROWSERBASE_API_KEY is required");
}

export async function getBrowserbasePage(targetUrl: string) { ... }

export async function codegenAction(
  cmd: string,
  script: string,
  pageHTML: string,
) {
  let pageHTMLPrompt = "";

  if (pageHTML) {
    pageHTMLPrompt = `\n**Page HTML:**\n\`\`\`html\n${pageHTML}\n\`\`\``;
  }

  const prompt = `
  You are an AI that produces snippets of Playwright JS code.

  Your task is to extend a Playwright JS script according to an instruction. You will be provided with a Playwright script and the associated HTML page source. You must respond with new lines of Playwright JS code to be evaluated and appended to the script.
  ${pageHTMLPrompt}

  **Playwright JS script to extend:**
  \`\`\`js
  ${script}
  \`\`\`

  **IMPORTANT:**
  - The script above was already executed. Your code will be appended to the end of the script and executed automatically without review. Your response MUST NOT include any text that isn't Playwright JS code. DO NOT include an explanation or any text that isn't valid playwright JS code in your response.
  - If you write a comment, add it to the line above the line you're commenting on. DO NOT add comments to the same line as a JS command as it will cause a SyntaxError.
  - Prefer \`let\` variable assignments to \`const\`. Do not re-assign variables; always use new variables or re-use existing variables that have the required value.
  - If you aren't asked to open a new page, you must use an existing page from the script. You should infer the correct page index from the sequence of open pages. If you cannot infer a page variable, you should open a new page. Example:
    If the ACTIVE PAGE is '2. https://example2.com'. As it's index #2 in the list of open pages, it can be inferred that \`defaultContext.pages()[2]\` is the correct page to use in your response.

  **Playwright rules:**
  - Create new pages with the default context variable, not the browser variable. Example: \`const newPage = await defaultContext.newPage()\`
  - Follow Playwright best practices, such as:
    - Use waitForSelector or waitForLoadState when necessary to ensure elements are ready.
    - Prefer locator methods over deprecated elementHandle methods, example: \`page.locator()\`.
    - Avoid locator strict mode errors by adding \`.first()\` to a locator when accessing the first element. Example: \`page.locator('button').first()\`.
    - Use specific, robust locators: \`page.locator('button[data-testid="submit-btn"]')\` instead of \`page.locator('button')\`
    - Use page.evaluate for complex interactions that require JavaScript execution in the browser context.

  Instruction: ${cmd}

  Remember: Your code will be appended to the end of the script and executed automatically. Ensure it's complete, correct, and safe to run.

  If you lack sufficient information to complete the user instruction, respond with:
  \`console.error("Unable to complete request: [Brief, actionable explanation]")\`
  Example: \`console.error("Unable to complete request: Cannot find element. Please provide more specific selector or element text.")\`
  `;
  try {
    const params: Anthropic.MessageCreateParams = {
      messages: [{ role: "user", content: prompt }],
      model: "claude-3-5-sonnet-20240620",
      max_tokens: 2048,
    };
    const message = await anthropic.messages.create(params);
    const lastMessage = message.content[message.content.length - 1];
    const response = lastMessage.type === "text" ? lastMessage.text : "";
    return response;
  } catch (err: unknown) {
    return `console.error("Couldn't execute command")`;
  }
}

The key details in the completed prompt are the current script, the current page HTML, and the input command. Note that we need to provide clear instructions and constraints for Claude to generate valid Playwright code.

You may have noticed that codegenAction returns a console statement instead of throwing an error. This is because codegenAction must always return valid JavaScript that can be interpreted later.
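
Even with firm instructions, a model may occasionally wrap its answer in a Markdown code fence, which would not be valid JavaScript once appended. Here is a defensive sketch, assuming hypothetical stripCodeFences and errorStatement helpers (neither is in the original app). The first unwraps a response that consists entirely of one fenced block, and the second builds a fallback console.error line with JSON.stringify so that quotes in the message cannot break the statement:

```typescript
// Three backticks, built at runtime so this snippet can itself live in a fenced block
const FENCE = "`".repeat(3);

// Matches a response that is exactly one fenced block, with an optional language tag
const FENCED_BLOCK = new RegExp(
  "^" + FENCE + "[a-zA-Z]*\\r?\\n([\\s\\S]*?)\\r?\\n?" + FENCE + "$",
);

// Hypothetical helper (an assumption): returns the code inside the fence,
// or the trimmed response unchanged when no fence wraps it.
function stripCodeFences(response: string): string {
  const trimmed = response.trim();
  const match = trimmed.match(FENCED_BLOCK);
  return match?.[1] ?? trimmed;
}

// Hypothetical helper (an assumption): always yields a syntactically valid statement.
function errorStatement(message: string): string {
  return `console.error(${JSON.stringify(message)});`;
}
```

With helpers like these, codegenAction could return stripCodeFences(response) on success and errorStatement("Couldn't execute command") from its catch block.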

As a final step, let’s append some code to our handleExecute function in app/page.tsx. We need to call the server action we just created and append the generated Playwright code to the textarea. Your handleExecute function should now look like this:

// app/page.tsx
const handleExecute = async () => {
  if (!websiteUrl) {
    toast({
      title: "Empty Browserbase URL",
      description: "Please enter a Browserbase URL",
    });
    return;
  }
  if (!prompt) {
    toast({
      title: "Empty prompt",
      description: "Please write a prompt and hit execute",
    });
    return;
  }
  try {
    setState("connecting");
    const page = await getBrowserbasePage(websiteUrl);
    if (!page) {
      toast({
        title: "Failed to connect to Browserbase page",
        description:
          'Ensure you have added your "BROWSERBASE_API_KEY" to the .env file',
      });
      return;
    }

    // Make the codegen request
    setState("generating");
    const response = await codegenAction(prompt, script, page.pageHTML);

    if (!response) {
      toast({
        title: "Codegen error 😢",
        description: "Failed to generate code.",
      });
      return;
    }

    // Append code to the text editor
    appendCode(response);
  } catch (err: unknown) {
    console.error(err);
    toast({
      title: "Error",
      description: "Failed to execute the action (see console)",
    });
    setState("ready");
  } finally {
    setState("ready");
  }
};
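
Note that appendCode concatenates strings directly, so if the existing script does not end with a newline, the first generated statement lands on the same line as the last existing one. Below is a small sketch of a newline-aware join, assuming a hypothetical joinScript helper that could back appendCode:

```typescript
// Hypothetical helper (an assumption, not from the original app):
// appends generated code so that it always starts on its own line.
function joinScript(existing: string, addition: string): string {
  const next = addition.trim();
  // Nothing to add: leave the script untouched
  if (next === "") return existing;
  // Empty script: the addition becomes the whole script
  if (existing === "") return next + "\n";
  const separator = existing.endsWith("\n") ? "" : "\n";
  return existing + separator + next + "\n";
}
```

In the component, this could be used as setScript((prev) => joinScript(prev, newLines)).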

Final Tips

This guide helps you create a codegen app that connects to a Browserbase session, pulls the DOM for a live page, and generates valid Playwright code to perform actions on the page. If you’re building a web agent, this is a great starting point.

This project can be extended to include the Browserbase live view to watch what’s happening on the page, a feedback loop that executes the generated code in the live session, and more tailored prompts for specific use cases. For more advanced use cases, you can explore our open-source AI SDK, Stagehand, which offers additional flexibility and functionality. If you’d like to see these features in action, check out our playground.

What will you 🅱️uild?

© 2024 Browserbase. All rights reserved.