Vibing in 2025
And the quality-code-to-prompt ratio
Saying LLMs write bad/unsafe/broken/… code is like trying out Rust for five hours, then going to Reddit and bashing it for being overly hard and complex.
Coding with e.g. Cursor doesn’t mean you should let the model do all the planning, thinking and coding. Building apps with e.g. lovable.dev doesn’t mean you get infinitely extensible production grade software.
… and this is all fine, though some people do use these tools with the wrong expectations, and some base their opinions solely on what other people say. To quote the late Steve Jobs: “You’re holding it wrong”.
I can get behind statements like “90% of code will be written by LLMs”, but only in light of the size of the prompt. To generate that 90% of code, the prompt is likely about the same size as the code you’re not writing.
I.e. writing 20 lines of code might require 20 lines of prompt; give it six months and that might be 20:15, 20:10, etc. I think this ratio is what we should track to understand the ability of the models, and our own ability to instruct them, and it’s this ratio that I try to learn to optimise when working with Cursor. However, and this is important, we should not fool ourselves that “more code is better code”, and the ratio does of course vary with how verbose the language is. See it more as a mental model.
By figuring out how to ask for implementations, what information to include in the prompts and various other strategies, I can get this ratio up.
The problem with vibe software lies in this ratio. Asking an agent to build a complete system from a short prompt is like asking for 5000:2. Even with the best prompting strategies, I’ve yet to see quality code generated anywhere close to this ratio. Building the guard-rails around the model to enable this is indeed what e.g. lovable.dev has done rather successfully, but it likely also means constricting the liberties of the models drastically.
Asking an agent to implement a very small scope might be 1:5, i.e. you describe what you want in five lines, and it’s solved by a one-liner of code. The former is meh because of the resulting quality; the latter is meh because it’s not a massive efficiency boost (although if you don’t fully know the language, writing that prompt might be a lot faster than finding the documentation for that one-liner). I guess the ultimate metric isn’t simply code-to-prompt, but quality-code-to-prompt.
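As a contrived illustration of the 1:5 end, here is a hypothetical five-line prompt (my own wording) and the standard-library one-liner that answers it:

```python
import heapq

# Hypothetical five-line prompt: "Given a list of (name, score)
# tuples, return the names of the three highest-scoring users,
# best first. Ties may be broken arbitrarily. Use the standard
# library only. Return a plain list of strings."
def top_three(users):
    return [name for name, _ in heapq.nlargest(3, users, key=lambda u: u[1])]
```

```python
top_three([("a", 1), ("b", 5), ("c", 3), ("d", 4)])  # → ["b", "d", "c"]
```

If you already know `heapq.nlargest` exists, typing the one-liner is faster than typing the prompt; if you don’t, the prompt wins.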
With this, code written by AI can definitely be indistinguishable from its humanly crafted counterpart, and many times of higher quality. I agree that I don’t want my next chest X-ray or self-driving car to be running on vibe-coded 10000:1 spaghetti software, but I won’t mind at all if it’s well-reviewed, LLM-generated 20:10 code, created with smart strategies, good prompting and thorough review.
Some example strategies
My main point is concluded, but I figured I could share some strategies that work for me. First of all, learning what scope of changes to ask the model for, and how to phrase the question, is like learning to communicate with an alien life form. Sure, it understands your words, but it has a hard time putting them in context.
In your system prompt (Cursor user rules), at least tell the model to always run code linting and tests as part of its workflow. This way, it will catch many of its own bugs and issues, and fix them before considering itself done.
Second, add user rules to always implement tests, and be detailed about how you want those tests to work when prompting as well. You need to help the model here, because these tests are what make the code somewhat robust when you ask for a higher code-to-prompt ratio.
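As a rough sketch of these first two strategies, the rules might read something like this (the wording is mine, not canonical Cursor syntax; rules are just plain instructions):

```markdown
- After every change, run the project's linter and test suite, and
  fix any failures before considering the task done.
- Always implement tests alongside new or changed functionality.
  Prefer small, deterministic unit tests that cover edge cases, and
  ask me for clarification if the expected behaviour is ambiguous.
```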
Third, tell the model to write comments in the code so that it can later understand why methods and parameters are implemented the way they are. I even ask it to maintain a “context” folder with markdown documentation of the system. After it has implemented and tested new or changed functionality, it will create or modify these docs.
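A sketch of such a rule (the folder name and structure are my own convention, not anything Cursor prescribes):

```markdown
- Maintain a `context/` folder with one markdown file per subsystem.
  After implementing and testing a change, update the relevant file
  with: what the component does, why it is designed the way it is,
  and which invariants callers must respect.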
Fourth, tell it to instrument debug logging, and make sure to include those logs whenever you troubleshoot. With this, the model is multitudes better at understanding bugs.
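A sketch of the kind of instrumentation I mean, in Python (the function and discount code are illustrative, not from any real codebase):

```python
import logging

logger = logging.getLogger("orders")

def apply_discount(total, code):
    """Apply a discount code to an order total."""
    # Log inputs so a failing run can be replayed from the logs alone.
    logger.debug("apply_discount: total=%s code=%r", total, code)
    if code == "SAVE10":  # illustrative discount code
        discounted = round(total * 0.9, 2)
        logger.debug("apply_discount: matched SAVE10, result=%s", discounted)
        return discounted
    logger.debug("apply_discount: unknown code, returning total unchanged")
    return total
```

When something misbehaves, run with the log level set to `DEBUG` and paste the relevant log lines straight into the prompt; the model then sees the actual values that flowed through the code, not just your description of the bug.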
Fifth, strongly advise the model to be the expert. LLMs are infuriatingly agreeable and will not only agree with you (even when you’re wrong), but also commend your intellect. This is nice, but it’s also rather dangerous.
For prompting, and for larger changes, ask it to first write a todo list. Then you, as a developer, review and alter that list, and then you set the model free to implement the items one by one.
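A prompt along these lines might look as follows (the wording and the feature are made up for illustration):

```markdown
Before writing any code, produce a numbered todo list of the changes
needed to add CSV export to the reports page. Do not implement
anything yet. I will review and edit the list, and then ask you to
implement the items one at a time.
```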
Even with the context and motivations documented, make sure to include them in the context of your queries, and remind the model to stick to e.g. class interfaces or design patterns. Continuously remind the model which methods it can re-use, which methods it can probably deprecate, and to use proper encapsulation.
Also for prompting, don’t go back and forth in the same “chat session”. Ask for an implementation or bug fix; if it isn’t quite right, revert it and change the prompt. Otherwise you risk following the model down infinitely deep rabbit holes where a lot of unnecessary, and often breaking, changes happen along the way. If you do go down a rabbit hole like this, continue until you find the issue, then pretend you knew this issue all along, revert to the state of the original prompt, and give it this context.
