Coding visually
I spent two days building something that looks like a drag-and-drop table builder. But in reality, it kickstarted my exploration into a more fundamental question:
How might we instruct AI to code for us?
Once upon a time…
Before I entered the tech industry, I was an HR recruiter. I wanted to build an Applicant Tracking System tailored to the experience we wanted to create for our company’s applicants. Since I had searched for jobs myself, I knew what I wanted the experience to look like on the screen. But I had no idea how it would work in the background: how can we store an in-progress application? How do we take those applications and put them in a Google Sheet?
Most non-techies, when faced with the dilemma of whether to build their own tool, are not fazed by deciding how the tool should look, but by building its inner workings. They are opinionated about the face of the tool, but indifferent about how it works under the hood, as long as what surfaces on the screen matches their imagination.
That’s why many “Create your X” products (like Framer) focus on customizing the frontend while hiding away the backend. I believe in that assumption as well, but with some caveats:
Builders are opinionated about the frontend, but indifferent about how the backend is implemented, as long as it serves the frontend completely
How the builder creates the frontend reveals a lot about the business logic of the application, which is then realized by the backend
The end product should not be a closed-source application that lives on the building platform, but an open-source codebase that is interoperable with other applications, and can be extended through code
A (recent) spark of inspiration
Then, I was inspired by this Twitter post by Amelia talking about building a simple visualization of data:
Why can’t people build a dashboard like this with some simple point-and-click interaction? Most applications like this involve a few simple backend actions (create, read, update, delete), some data transformations (aggregate, look up, reference), and a few external data sources, either our own or from other parts of the internet.
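To give a sense of scale, the “read and aggregate” part of a dashboard like that is usually only a handful of lines. A minimal sketch follows; the Listing shape and the average-price-per-neighborhood grouping are hypothetical, just to illustrate the category of backend work involved.

```typescript
// Hypothetical sketch of the "read + aggregate" work behind a house-hunting
// dashboard. The Listing shape and the grouping are illustrative only.
type Listing = { neighborhood: string; price: number };

function averagePriceByNeighborhood(listings: Listing[]): Record<string, number> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const l of listings) {
    // Accumulate a running sum and count per neighborhood.
    if (!totals[l.neighborhood]) totals[l.neighborhood] = { sum: 0, count: 0 };
    totals[l.neighborhood].sum += l.price;
    totals[l.neighborhood].count += 1;
  }
  // Turn the running totals into averages, one number per neighborhood.
  const averages: Record<string, number> = {};
  for (const [name, t] of Object.entries(totals)) {
    averages[name] = t.sum / t.count;
  }
  return averages;
}

// e.g. averagePriceByNeighborhood([{ neighborhood: "Mission", price: 1200000 }])
```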
I started building a prototype that helps non-techies create applications like the one Amelia built for her house-hunting. But this experiment got me thinking about a key question:
How can we instruct AI to generate code for us?
Why AI (again)?
As I talked about in my “What I think about app building”, I believe that Generative AI creates a new opportunity for people to easily create tools without knowing any rigid, non-natural-language code. The LLM becomes the new compiling layer for machine instructions, with an intermediate, higher-level programming language layer sitting in between.
Many people have shown that Generative AI can do more than just generate code. It also understands visual concepts like positioning. With Code Interpreter, it is also able to recognize its mistakes and fix itself. This makes AI very powerful as a software engineer that we can all instruct in written language, and in the future, visually.
But then why not ask the AI to generate the entire application, front to back, for us? While AI is unbeatable in accuracy, I still believe that it lacks taste. It does not know, out of the countless variations of an application, which one is best suited for a purpose. Some people believe that, given enough samples, an AI can acquire taste, or even exceed humans’ ability to create delightful things. But I think Generative AI is still at the stage of mimicry rather than creation.
Generative AI is more mimicry than creation, just as good artists copy
With that in mind, builders are still needed to establish a baseline taste for the tool they are building, specifically the look and feel of the tool. But once the taste is established, AI can proceed to wire it all up, making it work dynamically.
So the next question is, how can we provide a baseline taste to the AI?
How do we tell AI our taste?
Traditionally, when we build applications, we have a plethora of artifacts that guide us on what to build. We brainstorm with sketches and post-its, we define with requirement documents, we ideate with designs, we build with code, and we iterate with working prototypes.
But right now, we instruct most AI with text. We are limited to a chat box. We have to translate all these complex thoughts and rich artifacts into words. Are we satisfied with describing a beautiful design with just words? In our current AI world, we tell, not show.
The most direct way to tell AI our taste is to show it what we want. I believe that non-techies, even if they don’t know the principles of design or front-end development, still have a concrete vision of what the screens should look like. So let’s start from there.
By letting the user drag and drop UI elements onto the canvas, the builder is telling the AI what the application needs to look like, and implicitly, what the application does. If there is a table, that means the application needs to fetch data from somewhere and display it. If there is a form, that means the application needs to send data somewhere permanent.
Why stop at building a mock screen, when the builder can incorporate live data and preview the application as they build it? The builder knows what data to display in the table; they just don’t know how to link the data and the UI together. This is where AI can assist: by generating the backend code on the fly (see “Why not use LLM as the backend for anything”) to fetch the data the table needs.
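To make that concrete, here is a minimal sketch of the kind of backend code the AI might generate behind a table, assuming an Express server. The /api/table/listings route, the listings.json file, and the column names are all hypothetical stand-ins for whatever source the builder actually connects.

```typescript
// A hypothetical, AI-generated endpoint that backs a table UI component.
// The data source (listings.json) and the column names are illustrative only.
import express from "express";
import { promises as fs } from "fs";

const app = express();

app.get("/api/table/listings", async (_req, res) => {
  // Read the raw records from whatever source the builder connected.
  const raw = JSON.parse(await fs.readFile("listings.json", "utf8"));

  // Keep only the properties the builder dragged into the table's columns.
  const rows = raw.map((item: any) => ({
    address: item.address,
    price: item.price,
    sqft: item.sqft,
  }));

  res.json({ columns: ["address", "price", "sqft"], rows });
});

app.listen(3000);
```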
Coding visually
What I am experimenting with here is not just a drag-and-drop table builder, but the question of how people can “code” visually.
Right now, we write in text what the program has to do every time it is run. But other than text, there must be many more ways we can “codify” the instructions for the program.
In the table case, instead of writing code to “codify” that a column needs to display a property of a bunch of things, I am dragging that property into that column, and checking if the program understands my intention by seeing it run live.
I am converting the act of writing a few lines of database-fetching and column-rendering code into a visual gesture that aligns with the builder’s mental model. I am previewing what the program will do and letting the builder decide whether to revise it.
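For contrast, this is roughly the hand-written code the gesture replaces: fetch a collection, then render one property per column. The House type, the /api/houses endpoint, and the price column are hypothetical examples, not part of the actual prototype.

```typescript
// A hypothetical hand-written version of "show each house's price in a column".
// The House type and the /api/houses endpoint are illustrative assumptions.
type House = { address: string; price: number };

async function fetchHouses(): Promise<House[]> {
  const res = await fetch("/api/houses");
  return res.json();
}

// Render one column: pick a single property off every record.
function renderPriceColumn(houses: House[]): string[] {
  return houses.map((h) => h.price.toLocaleString());
}

// Dragging the "price" property onto a column replaces exactly this wiring.
fetchHouses().then((houses) => console.log(renderPriceColumn(houses)));
```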
While it all culminates in a codebase at the end, how we “code” becomes less text-based and more visual, closer to what most people are already familiar with. The challenges that stem from this are:
Figure out the visual and interaction primitives for different UI components, such as what it would look like to wire up a form
Define a general model for visualizing any type of code
While AI still falls short of creating novel solutions, given rich formats of instruction and enough specificity, it can fill in the gaps to make applications work. I am very interested in figuring out more ways we can instruct AI to do things for us, and how we can open up the app-building process beyond text and code.