Vibe Coding an Alt Text Generator
I made an alt text generator using Aider with Claude Sonnet 4, SvelteKit, and Tailwind. It didn’t turn out to be very hard, since I straight up vibe coded it.
Why?
I made this for a few reasons.
- I wanted to see how Sonnet 4 compares to other LLMs for coding.
- I don’t like writing alt text very much.
- I can use it at work if they want to integrate it into the CMS.
- I never thought it would be hard.
- I haven’t really purely vibe coded anything.
The last point is really the whole reason why. I just wanted to vibe code something. I really didn’t have any plans for this other than I wanted to upload an image or supply a URL and have OpenAI’s o4 mini model return alt text describing the image. No preconceived design, components, architecture or guidelines for the LLM.
Before we continue I think it’s pretty important to inform you that where I work we are all in on AI assisted coding. I pretty much use Aider every day, all day. I don’t consider that vibe coding because I do plan out everything I do. I review the code and make sure it is acceptable. I don’t rely on it 100% for everything either. We spent a lot of time figuring out what system works for us and creating guidelines for the LLM to follow when coding.
What I did.
The first thing I did was set up a SvelteKit project with Tailwind. Then I ran Aider and started prompting away.
I let the LLM decide on the look and layout of it. I do feel it looks rather Bootstrapish. No big deal. If I really wanted to I could style it myself.
In the end I got what I wanted and more. It takes the image upload (which goes nowhere), converts it to base64, and sends it to Sonnet 3.5 for analysis. The output, which is the alt text, is returned into an editable area with a button that copies the text to the user’s clipboard. There is also a way to add additional context that gets sent along with the image to help the LLM describe it better. I was also able to add a history that lets you compare the output with a previous output; that is kept in the browser’s session storage. Also on the page there is a section that lists the WCAG criteria for alt text.
I did have to direct Aider on accessibility. Can’t make a tool for accessibility that isn’t accessible. Left on its own, it would either not do it or do it badly.
You may have noticed that I am using Sonnet 3.5 instead of o4 mini. The reason is something I think is pretty funny. I felt that ChatGPT would be the right tool to use to write the function that sends the image data to o4 mini. However it failed at it. After playing with it for about 20 minutes I decided, never mind this, let’s see if Sonnet can write its own function to upload to itself. It wrote one on the first try, and I decided to use Sonnet 3.5 instead of 4. That is not an issue; I can always change it. The reason I wanted o4 mini was because it’s cheap.
I did this in 2 sessions and it took a total of about 4 hours. It cost something like $4 USD to make.
What I did according to AI
Since the entire Aider log is available, I gave it to Claude and ChatGPT to give me a summary of what I did.
Both of them broke it down into several steps. ChatGPT counted 8 and Claude counted 7. Both of them were pretty much the same. The following list is paraphrasing their steps.
Step 1: Initial setup
I set up the SvelteKit project and Tailwind and created the initial AltText.svelte component. This was done without Aider. It was just typing the npm command in my terminal to create a Svelte project with Tailwind and whatever other options I selected. After that it was creating the AltText.svelte component, which I left empty.
After that is when I started to use Aider to start making the component. I had it make the switch which toggles between image upload and URL input, with base64 conversion for LLM processing. I didn’t do any back-end wiring.
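For anyone curious, the base64 conversion is just a few lines of browser JavaScript. This is my own rough sketch of the idea; the function and variable names are mine, not necessarily what Sonnet generated:

```js
// Hypothetical helper: read an uploaded File into a base64 string
// suitable for sending to a vision model.
function fileToBase64(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    // reader.result is a data URL like "data:image/png;base64,iVBOR..."
    // so we strip the prefix and keep only the base64 payload.
    reader.onload = () => resolve(reader.result.split(',')[1]);
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}

// Usage with an <input type="file"> change event:
// const base64 = await fileToBase64(event.target.files[0]);
```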
Step 2: UI/UX
This is where I added the file upload button and drop area, URL sanitization, and submit button. I also had it display some placeholder text in an editable textarea with copy-to-clipboard functionality. This gave me the first visible flow of how it would work before hooking up to an API. I let Sonnet decide on the look of everything. At most I told it to move things, for example to place the input for additional context above the submit button instead of after.
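The copy-to-clipboard part is standard browser API stuff; something along these lines (again, my own illustrative sketch, not the exact code Sonnet wrote):

```js
// Copy the generated alt text to the clipboard.
// navigator.clipboard requires a secure context (HTTPS or localhost).
async function copyAltText(text) {
  try {
    await navigator.clipboard.writeText(text);
    // e.g. flip a "Copied!" flag here for user feedback
  } catch (err) {
    console.error('Clipboard write failed:', err);
  }
}
```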
Step 3: Accessibility
In this step I reviewed the accessibility of what was made with Aider and Sonnet. It was missing some keyboard interactions. The uploader did not work at all with the keyboard. You would tab past it. The inputs were missing labels. I also took care of the error handling in this step and did a little bit of UX work.
I made the submit button look disabled instead of actually disabling it. From my understanding, when you add the disabled attribute to form elements, screen readers have no visibility of the element and skip over it. With my version you could still click the button or submit via keyboard, but you would get an error message saying you need to upload something.
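One common way to get that behaviour is to keep the button enabled but style it as disabled and flag it with aria-disabled, so it stays in the tab order and can still trigger the error message. A rough Svelte-flavoured sketch of the idea (not the project’s actual markup; hasImage and handleSubmit are made-up names):

```svelte
<!-- Looks disabled, but stays focusable and clickable so the
     error message can be triggered and announced. -->
<button
  type="submit"
  aria-disabled={!hasImage}
  class="px-4 py-2 rounded {hasImage ? 'bg-blue-600 text-white' : 'bg-gray-300 text-gray-500'}"
  onclick={handleSubmit}
>
  Generate alt text
</button>
```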
I also added an accordion that displays the WCAG guidelines for alt text. I think it’s a useful thing to have, but it shouldn’t be in the way of the UI, which is why I put it in an accordion.
Step 4: Polishing and more features
I tried to add some quality metrics here. I added a character counter and tried to make a quality meter. This is where the AI really seemed to struggle. It chose some funky way of measuring the character count with a Svelte $effect. I had to tell it to just do a normal character count on the returned text.
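In Svelte 5 terms, a plain derived value is all that’s needed here. Roughly (my own sketch, not the code in the project):

```js
// Inside the component's <script>: derive the count directly from
// the generated text instead of syncing it through an $effect.
let altText = $state('');
let charCount = $derived(altText.length);
```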
The quality meter didn’t really work. Sonnet implemented a series of general conditions that you could consider a good starting point for a quality meter. For example, one of the conditions checked whether the words “image” or “picture of” were used. The conditions were OK; however, the meter just stayed red all the time and never changed to green. The best way to do this would be to send the text back to an LLM for evaluation, but I considered that out of the scope of what I wanted, so I decided to eliminate the feature altogether.
I also added a reset feature to clear everything and a history section that uses session storage. You can compare the text to a previous version in the event that you regenerated the alt text. Sonnet made a nice little modal that shows you the two versions.
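Persisting that history is just JSON in sessionStorage. A minimal sketch of the idea (the key name and entry shape are assumptions, not what the project actually uses):

```js
const HISTORY_KEY = 'altTextHistory'; // assumed key name

// Append a generated result to the session-scoped history.
function saveToHistory(entry) {
  const history = JSON.parse(sessionStorage.getItem(HISTORY_KEY) ?? '[]');
  history.push({ ...entry, createdAt: Date.now() });
  sessionStorage.setItem(HISTORY_KEY, JSON.stringify(history));
}

// Load it back when the component mounts.
function loadHistory() {
  return JSON.parse(sessionStorage.getItem(HISTORY_KEY) ?? '[]');
}
```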
Step 5: Backend API
Originally I wanted to use OpenAI’s o4 mini. I thought ChatGPT would be the best person (AI) to write the code for that. I didn’t use Aider with the OpenAI API for this; I did it in the chat window. The results didn’t work. Even when I gave it some example code from its own documentation, it just didn’t work. After mucking around with it for a bit, I decided to switch to having Sonnet write code for itself to analyze the image. I did this in Aider because it was still connected to the Anthropic API. It worked on the first try.
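For reference, the shape of that call against Anthropic’s Messages API, written as a SvelteKit server endpoint, looks roughly like the sketch below. This is my own reconstruction, not the code Sonnet wrote for itself; the route path, model id, prompt wording, and error handling are assumptions:

```js
// src/routes/api/alt-text/+server.js (illustrative path)
import { env } from '$env/dynamic/private';
import { json } from '@sveltejs/kit';

export async function POST({ request }) {
  const { imageBase64, mediaType, context } = await request.json();

  const response = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'x-api-key': env.ANTHROPIC_API_KEY,
      'anthropic-version': '2023-06-01',
      'content-type': 'application/json'
    },
    body: JSON.stringify({
      model: 'claude-3-5-sonnet-latest', // assumed model id
      max_tokens: 300,
      messages: [
        {
          role: 'user',
          content: [
            {
              type: 'image',
              source: { type: 'base64', media_type: mediaType, data: imageBase64 }
            },
            {
              type: 'text',
              text: `Write concise alt text for this image. ${context ?? ''}`
            }
          ]
        }
      ]
    })
  });

  const data = await response.json();
  // The generated alt text comes back as the first content block.
  return json({ altText: data.content?.[0]?.text ?? '' });
}
```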
Step 6: Clean up and even more features.
The last feature I added was showing a thumbnail of the image in the history preview section. Then I had Sonnet clean up the code, add JSDoc, and write a README.md.
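The JSDoc pass amounts to comments along these lines (illustrative, using the hypothetical upload helper from earlier, not the project’s actual docblocks):

```js
/**
 * Convert an uploaded image file to a base64 string for the vision API.
 * @param {File} file - The image from the upload input or drop area.
 * @returns {Promise<string>} Base64-encoded image data without the data-URL prefix.
 */
function fileToBase64(file) { /* ... */ }
```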
Insights from ChatGPT and Claude.
In addition to summarizing, I also asked both ChatGPT and Claude to give me some key insights they got from the logs. I am just going to paste in exactly what each wrote out. Claude made a list and ChatGPT made a table. Overall I agree with their observations.
Claude
1. Iterative UX Design
Your process shows excellent iterative refinement - starting with basic functionality and continuously improving based on usage patterns. The progression from prominent WCAG guidelines to a subtle question mark icon demonstrates user-centered design thinking.
2. Accessibility as a Core Requirement
You spent significant time on accessibility improvements, treating it as essential rather than an afterthought. This included keyboard navigation, proper error messaging, and ARIA attributes - valuable for readers building inclusive applications.
3. Technology Pragmatism
- Svelte 5 over React: Your preference for Svelte 5 runes shows consideration for developer experience
- Provider Flexibility: Switching from OpenAI to Anthropic when encountering issues demonstrates adaptable architecture
- No TypeScript: Choosing JavaScript for rapid prototyping while leaving TypeScript as an option
4. Real-World Problem Solving
The log reveals common development challenges:
- Svelte 5 reactivity debugging (character count issues)
- Browser API limitations (contenteditable with two-way binding)
- Environment variable security considerations
5. User Experience Evolution
The progression from basic alt text generation to a full-featured tool with history, context, regeneration, and quality feedback shows how requirements naturally expand during development.
6. Development Workflow Insights
Your use of an AI coding assistant for rapid iteration demonstrates modern development practices - using AI for boilerplate while maintaining control over architecture and UX decisions.
ChatGPT
Insight | Why it matters / how readers can apply it |
---|---|
Micro-prompts ≈ micro-commits | You drove the assistant with very small, task-focused prompts (“add a switch”, “make it collapsible”). This mirrors good Git practice: atomic commits and fast feedback. Encourage readers to keep prompts laser-focused; it reduces hallucinations and makes diff-based review trivial. |
Design → a11y → code, not the other way around | You repeatedly paused to ask “Does this meet WCAG?” before finalizing UI pieces. Front-loading accessibility avoids painful retrofits. Call out the specific a11y issues you caught early (keyboard-only upload, aria roles, disabled-state pattern). |
Human-in-the-loop UX for alt text | Instead of a one-click generator, you implemented editable output, context injection, regeneration, and quality hints. This respects that alt text is descriptive writing, not raw inference. Highlight that the AI is a drafting partner, not an oracle. |
Local persistence for iteration | Storing histories in localStorage let you compare versions offline—useful when model calls are metered. Point out this lightweight alternative to databases for single-page tools. |
Vendor agility & fallback strategy | When OpenAI vision failed you toggled to Anthropic with minimal diff. Abstracting the fetch layer (endpoint file, env vars) future-proofs the project and is a live example of “LLM polyglot” design. |
Security basics still apply | Even rapid AI prototyping must respect .gitignore, .env, and example templates. Readers often overlook this when copying tutorial repos—reinforce that habit. |
Early realism about stretch goals | You removed the real-time quality bar when it proved flaky. Blogging about what you decided *not* to ship is valuable: it teaches scope management and honesty over feature-checkbox culture. |
Tooling friction points | Notable hurdles: Svelte two-way binding forbids dynamic contenteditable, event-handler syntax errors, SDK typings. Sharing those fixes saves readers time. |
Prompt patterns | “/ask” to explore options, then “do it” for execution. Calling out this explore → commit cycle shows how to “pair program” with an LLM effectively. |
Progressive disclosure of complexity | You started with a static mock, then added context, then regeneration, then history—each layer unlocked after the previous one felt solid. That incremental reveal keeps cognitive load low for both the AI and the human maintainer. Encourage readers to resist the urge to prompt everything at once. |
The overall experience
I’d say the overall experience of vibe coding something was pretty fun. It was a little frustrating when it would just do something stupid or not be able to do what you asked at all, but that is not much different from dealing with people, in my opinion.
It wasn’t a super complex or really hard project. If I had not used AI, it probably would have taken about 1-3 days. I know I would have gotten caught up in the UI parts, especially the styling. The longer I took on it, the more scope creep would happen. I know from my past personal projects that I need to watch out for adding unplanned features. It’s always “while I am doing this I can extend it to do that” or “why not one more crazy idea.”
Coding with AI without a fully structured plan can be pretty tricky and hard. I mentioned earlier that I use Aider at work, but there we have a plan, designs, and a product to make. If you let the vibe coding get out of hand, you will get a lot of weird code and waste money, because you will find yourself going two steps forward and one step back.