05/06/2026 // AI Research
Beyond Prompt-and-Pray
A practical baseline workflow for using LLMs and coding agents without losing control of the work: specs, examples, clarification, diff review, runtime QA, and earned commits.
This is not meant to be a universal rulebook.
LLM-assisted work is still new, and everyone is going to find a slightly different rhythm. Some people will use agents for large implementation passes. Some will mostly use them for review, debugging, documentation, brainstorming, UI changes, or getting unstuck. Some will want a lot of structure. Others will want just enough process to keep the work from turning into chaos.
That is fine.
The point is not that everyone should copy one workflow. The point is that everyone should start with some workflow, try it on real work, and adjust it based on what actually helps.
The worst version of LLM-assisted development is what people often call prompt-and-pray: ask for something vague, accept the first result, and hope the codebase survives.
That can feel productive because the model moves fast. You type a request, it changes files, and suddenly it looks like progress happened. Sometimes it did. Sometimes it only created code that appears correct until someone has to maintain it, extend it, debug it, or review it.
The better path is not to reject LLMs. The better path is to treat them like powerful collaborators that still need direction, context, examples, review, correction, and runtime QA.
The goal is not to become an "LLM person."
The goal is to become more capable at making things.
Start Somewhere
The best way to learn this is not by debating the perfect workflow forever.
Pick something. Build something. Use the tool on a real task.
Start with something you would use yourself:
- a small internal tool
- a game feature
- a script
- a dashboard
- a workflow helper
- a refactor you have been avoiding
- a UI improvement
- or a boring task that would be useful if it were automated
Personal usefulness matters because it gives you taste. You know when the thing is wrong. You know when it feels clunky. You know when it saves time. You are not building for an abstract user in your head. You are building for yourself first.
Then, once it works for you, ask the next question:
Could this be useful to someone else too?
That is a healthy progression.
Build something real enough that you care whether it works. Then widen the lens if the idea has legs.
Keep the Tool in Proportion
LLMs are useful, but they are not magic.
It is easy to get caught up in the feeling that the model can do anything if you just ask the right way. That mindset can get weird fast. The tool starts to feel smarter, more certain, or more important than it really is.
Keep it in proportion.
An LLM is not omnipotent. It is not a replacement for judgment. It is not a senior engineer living inside your IDE. It is closer to a very fast language and pattern engine that can help you think, draft, refactor, explain, and explore.
A word calculator, basically.
A very useful one, but still a tool.
Use it seriously, but do not get mystical about it. Check its work. Take breaks. Talk to real people. Build things in the real world. Let diffs, users, screenshots, runtime behavior, and working software pull you back to ground truth.
The goal is not to disappear into the machine.
The goal is to make better things with less friction.
A few practical reminders:
- If the model sounds certain, that does not mean it is right.
- If it flatters your idea, that does not mean the idea is good.
- If it generates a lot of code, that does not mean progress happened.
- If it explains something smoothly, that does not mean it understands your system.
- If you feel yourself chasing the perfect prompt forever, stop and build something small.
Use LLMs with responsibility, but keep it chill.
It is not a prophecy machine.
It is just another tool on the bench.
A Practical Baseline Workflow
One useful starting loop looks like this:
Write the spec
Correct the spec
Point to examples
Ask the agent to clarify ambiguity
Let the agent execute
Review the diff
Ask for a second-pass critique
Question unclear decisions
Refactor what smells wrong
QA the runtime behavior
Commit if it earns it
This is not the only way to work. It is just a good baseline because it gives the model structure without making the process too heavy.
It also keeps the human in the right role.
The agent can move fast.
The human adds judgment.
Write the Spec First
The first artifact should usually be a markdown spec, not code.
The spec is the handoff. It tells the agent what you want, what already exists, what to avoid, and what "done" means.
A vague request gives the model too much room to guess.
Bad:
Add an inventory system.
Better:
Add a persistent inventory system for player-owned items.
It should support:
- picking up items
- dropping items
- stackable consumables
- weapons as inventory items
- save/load through the existing persistence system
Use these existing patterns as references:
- Make the command executor follow the same shape as CreateOrderCommandExecutor.
- Make the inventory dialog behave like the existing EquipmentDialog.
- Use the same persistence approach already used by PlayerProgressService.
Do not:
- replace the existing item definitions
- introduce a new global state manager
- rewrite the HUD
- touch combat code unless required for weapon equip
Acceptance criteria:
- inventory persists after reload
- empty inventory state is handled
- no unrelated files are changed
The better version is not complicated. It is just clearer.
That clarity matters.
Without it, the agent will make assumptions. Sometimes those assumptions are fine. Sometimes they send the implementation in the wrong direction. A markdown spec reduces that risk before the code starts changing.
Before execution, it is worth reviewing the spec itself:
- Is the goal clear?
- Is the scope small enough?
- Are the non-goals obvious?
- Are the relevant files or systems named?
- Are there examples the agent can copy from?
- Would another developer understand the request?
- Is there anything here the agent might reasonably misinterpret?
The spec does not need to be perfect. It just needs to be better than a vague wish.
Garbage in, garbage out still applies. LLMs just make the garbage faster.
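Part of the spec review above can be done mechanically before a human reads it. The following is a minimal sketch, assuming specs are plain text or markdown and reuse the section labels from the example spec ("Do not:" and "Acceptance criteria:"). The function name `check_spec`, the vague-word list, and the CamelCase heuristic for spotting a named reference are all hypothetical choices, not an established tool.

```python
import re

# Hypothetical section labels, taken from the example spec above.
REQUIRED_SECTIONS = ("Do not:", "Acceptance criteria:")
# Words that often signal an unbounded request (an assumption, tune to taste).
VAGUE_WORDS = ("improve", "better", "clean up", "everything")


def check_spec(text: str) -> list[str]:
    """Return human-readable warnings for a spec; an empty list means no findings."""
    warnings = []
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    if len(lines) < 3:
        warnings.append("spec is very short; the agent will have to guess")

    for section in REQUIRED_SECTIONS:
        if not any(line.lower().startswith(section.lower()) for line in lines):
            warnings.append(f"missing section: {section}")

    # A CamelCase identifier is treated as a sign that an existing example is named.
    if not re.search(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b", text):
        warnings.append("no existing class or component is named as a reference")

    first = lines[0].lower() if lines else ""
    for word in VAGUE_WORDS:
        if word in first:
            warnings.append(f"goal line contains vague wording: {word!r}")

    return warnings


for warning in check_spec("Add an inventory system."):
    print(warning)
```

A check like this only catches missing structure. Whether the scope is right and whether another developer would understand the request still needs a human read.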
Examples Are Gold
Relative language helps a lot.
That is true for UI, but it is also true for code.
The model does better when you can point to something that already exists and say:
Make the new thing like that.
For code:
Create this service using the same pattern as CustomerImportService.
Keep the constructor style, logging approach, and error handling consistent with that file.

Make this command executor follow the same structure as SavePackerSaleCommandExecutor.
Do not invent a new flow unless the existing one cannot work.

Add this endpoint beside the existing order endpoints.
Use the same response shape, validation style, and naming conventions.

For UI:
Make this dialog look and behave like EditCustomerDialog.
Use the same footer layout, button hierarchy, spacing, and title style.

This table should match the existing SalesHistoryTable.
Same row height, same empty state pattern, same action menu style.

For architecture:
This should fit into the same application flow as the existing invoice export.
Use that as the reference implementation and only deviate where necessary.

Examples reduce invention.
That matters because invention is not always what you want. A lot of the time, the best code is boring code that fits the repo.
If the repo already has a way of doing something, show the agent that way.
Know the Inventory of Your Codebase
A big part of using LLMs well is knowing what your solution already has.
Not just files. Capabilities.
The logical inventory:
- existing services
- command executors
- helpers
- validators
- data access patterns
- domain flows
- import/export patterns
- logging patterns
- error handling conventions
- background jobs
- permission checks
- configuration patterns
The visual inventory:
- dialogs
- tables
- forms
- empty states
- buttons
- cards
- layout patterns
- loading states
- error messages
- navigation patterns
The more you know what already exists, the better you can direct the LLM.
Instead of asking:
Build a new customer import flow.
You can ask:
Build a new customer import flow using the supplier import flow as the reference.
Reuse the same file structure, validation pattern, progress UI, and error summary behavior.
Only create new patterns if the existing ones do not fit.
That is a much stronger request.
It gives the model context, but it also protects the codebase from random new shapes.
Good LLM usage is not only about asking well. It is about knowing what to point at.
Ask It to Clarify Before It Acts
For bigger asks, it is useful to tell the agent not to execute immediately.
Ask it to stop and clarify anything ambiguous first.
Example:
Before changing files, review this request and ask me any clarifying questions if something is ambiguous.
If the task is clear, summarize your intended approach first.
Do not execute until the scope is confirmed.
This catches a lot of problems early.
The model may assume:
- the wrong file
- the wrong pattern
- the wrong UI behavior
- the wrong data flow
- the wrong ownership boundary
- or the wrong definition of "done"
The larger the ask, the more useful this step becomes.
This is not about slowing everything down. It is about avoiding avoidable rework. A two-minute clarification can save you from a giant diff going in the wrong direction.
Let the Agent Execute in a Narrow Lane
Agents are much more useful when the task has boundaries.
A broad request like "improve the whole app" can turn into a messy diff that touches too many files and mixes unrelated decisions together.
A narrower request works better:
Refactor the save/load path for inventory items only.
Keep the public API unchanged.
Do not modify the combat system.
Handle missing item definitions and empty inventories safely.
Use PlayerProgressService as the reference for persistence style.
That kind of request gives the agent a lane to stay in.
This does not mean every task has to be tiny. It means the task should have edges. The agent should know what success looks like, what examples to follow, and what areas are off-limits.
Use the Agent to Scaffold, Then Fill the Logic
One useful workflow is to have the agent create the files and structure first, without asking it to fully solve everything at once.
For example, you can ask it to create:
- command executor files
- service classes
- UI components
- handlers
- DTOs
- view models
- route files
- editor panels
- or other pieces that follow your repo's normal shape
Then use comments to lay out the logic before asking it to implement.
Example:
public class CreateInventoryItemCommandExecutor
{
    public void Execute(CreateInventoryItemCommand command)
    {
        // 1. Validate the item definition exists.
        // 2. Check whether the player already has a stack of this item.
        // 3. If stackable, increase the quantity.
        // 4. If not stackable, create a new inventory entry.
        // 5. Persist the inventory change through the existing unit of work.
        // 6. Return a result the UI can use without leaking domain internals.
    }
}
This gives the agent rails.
Instead of asking it to invent the architecture and the logic at the same time, you provide the shape and let it help fill in the pieces.
That is often a better use of the tool.
The human decides the flow.
The agent helps with the implementation.
Build Templates for Your Best Practices
If your codebase has patterns you use often, turn them into templates.
For example:
- command executor template
- query handler template
- API endpoint template
- service template
- UI dialog template
- form component template
- table/list component template
- domain event template
- import/export workflow template
These templates can encode the best practices of the codebase:
How we name things
Where files go
How dependencies are passed
How errors are handled
How UI state is managed
How domain logic is separated
What should not be done
Then the agent is not starting from nothing. It can pull from the shape your codebase already wants.
This is one of the better long-term uses of LLMs. You are not just prompting harder. You are making the project easier for the model to work in.
It also helps humans. A good template is documentation that can actually be used.
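A template does not need special tooling. As a rough sketch, the snippet below keeps a command executor skeleton in a Python `string.Template` and fills in the command name. The template text, the step comments, and the helper `render_command_executor` are hypothetical and stand in for whatever shape a given repo actually uses.

```python
from string import Template

# Hypothetical template: the placeholder name and the step comments are
# illustrative, not taken from any real codebase.
COMMAND_EXECUTOR_TEMPLATE = Template('''\
public class ${name}CommandExecutor
{
    public void Execute(${name}Command command)
    {
        // 1. Validate input.
        // 2. Load existing state through the existing unit of work.
        // 3. Apply the change.
        // 4. Persist and return a result the UI can use.
    }
}
''')


def render_command_executor(name: str) -> str:
    """Fill the template for one command, e.g. name='CreateInventoryItem'."""
    if not name.isidentifier():
        raise ValueError(f"not a valid identifier: {name!r}")
    return COMMAND_EXECUTOR_TEMPLATE.substitute(name=name)


print(render_command_executor("CreateInventoryItem"))
```

The same file can be handed to an agent as a reference, which keeps the human-facing template and the agent-facing example from drifting apart.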
Treat the First Pass as a Draft
The first pass from an agent is not final code.
It is a candidate.
Most agents write the first version like someone trying to get the ticket closed. They often choose the shortest path to something that looks complete. That can be useful, but it can also produce shortcuts.
This is why the second pass matters.
After the agent writes code, ask it to review the work:
Double check this like a senior engineer reviewing a PR.
Look for:
- code smells
- unnecessary abstractions
- missing edge cases
- unrelated changes
- security concerns
- places where you took the fastest path instead of the cleanest one
- places where you invented a new pattern instead of following an existing one
This changes the frame.
The first prompt asks the model to produce.
The second asks it to critique.
That distinction is important. Traditional coding works the same way. You write something, then you come back and inspect it with a colder eye.
The review is not extra. It is part of the work.
Review the Diff
The git diff is where the real review happens.
Do not rely only on the agent's summary. The summary can sound clean even when the diff is not.
Read the changes like a PR.
Look for:
- unrelated files
- unnecessary services
- duplicated logic
- hidden state
- hardcoded values
- skipped edge cases
- new dependencies
- naming that does not fit the repo
- patterns that bypass existing architecture
- code that should have reused an existing capability
- and code that works but feels off
That "feels off" reaction is worth listening to. A lot of experienced review starts as a smell before it becomes a fully formed explanation.
When something feels wrong, stop and ask.
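The "unrelated files" check is one part of diff review that can be scripted. Here is a minimal sketch, assuming the task's scope can be expressed as a few path prefixes. `git diff --name-only` is a real git command; the function names and the sample paths are hypothetical.

```python
import subprocess


def changed_files(base: str = "HEAD") -> list[str]:
    """List tracked files that differ from `base`, using `git diff --name-only`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def out_of_scope(files: list[str], allowed_prefixes: list[str]) -> list[str]:
    """Return the files that do not sit under any allowed path prefix."""
    return [
        f for f in files
        if not any(f.startswith(prefix) for prefix in allowed_prefixes)
    ]


# Hypothetical file list for an inventory task; in a real repo this would
# come from changed_files() instead of being written by hand.
sample = ["src/Inventory/InventoryService.cs", "src/Combat/DamageCalculator.cs"]
print(out_of_scope(sample, ["src/Inventory/", "tests/Inventory/"]))
# prints ['src/Combat/DamageCalculator.cs']
```

Note that `git diff --name-only HEAD` does not list untracked files, so brand-new files the agent created need a separate look, for example with `git status`. A flagged file is a prompt to ask why it changed, not proof that the change is wrong.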
Ask Why
One of the most useful things you can do with an agent is question its decisions.
Not aggressively. Just directly.
Why did you create this service?
Why are we passing this value through three layers?
Why did you add this dependency?
What existing pattern did you follow here?
Was there an existing service or helper that already did this?
What breaks if this command fails halfway?
What assumption did you make here?
What would you change if this had to run in production?
This is where the human adds a lot of value.
The agent can generate the work, but the human can pull on the loose threads. Sometimes the answer is reasonable. Sometimes the explanation reveals that the model did not really have a good reason.
Either way, you learn something before the code lands.
Refactor What Does Not Pass the Sniff Test
Not every bad smell means the whole diff should be thrown away.
Sometimes the direction is right and the implementation just needs cleanup.
Useful follow-ups look like this:
The direction is right, but this service is unnecessary. Inline the logic into the existing flow and keep the public API unchanged.
This works, but it bypasses the domain layer. Refactor it to follow the existing pattern in these files.
This touched too many unrelated files. Reduce the diff to only the files required for the task.
This technically works, but the implementation feels brittle. Simplify the flow and reduce the moving parts.
This invented a new dialog pattern. Refactor it to match EditCustomerDialog unless there is a specific reason not to.
This duplicates behavior that already exists in ImportSummaryService. Reuse the existing capability instead of creating a parallel one.
This is where agentic coding becomes less like "asking for magic" and more like steering a fast collaborator.
You do not need to accept the shape of the first answer. You can push it toward the shape your codebase actually needs.
QA the Runtime Feature
As LLMs take on more implementation work, the human role shifts more toward QA.
Not QA as a separate department. QA as a mindset.
You become the person who asks:
Does this actually work when I use it?
That means running the feature, clicking through it, trying the weird path, changing the input, refreshing the page, opening the dialog twice, resizing the screen, creating the thing, deleting the thing, backing out halfway, and seeing what breaks.
The diff tells you what changed.
The runtime tells you what the user experiences.
Both matter.
This is especially important because an LLM can produce code that looks reasonable in a diff but feels wrong in the product. The behavior might technically exist, but the flow may be awkward. The loading state may be weird. The button may be in the wrong place. The error message may be useless. The happy path may work while the real user path falls apart.
Runtime QA catches what static review misses.
Useful QA prompts after you try the feature:
I tried the feature and this part feels awkward:
[describe what happened]
Suggest a simpler interaction flow before changing files.

Here is what happened at runtime:
[paste error/log/behavior]
Explain the likely cause and propose the smallest safe fix.

The feature works, but the UX feels clunky when:
[describe scenario]
Give me three possible improvements and recommend the least risky one.

The important thing is to stay grounded in the actual behavior.
Do not only ask:
Did the code change?
Ask:
Did the thing feel right when used?
That is where human taste becomes extremely valuable.
Front End Work Is Powerful, But Be Explicit
Front end work is one of the best uses of LLMs.
You can ask for UI changes, layout cleanup, component refactors, better empty states, improved dialogs, responsive behavior, cleaner forms, or more polished interactions, and the agent can often move very quickly.
But there is one catch:
The LLM does not have eyes unless you give it something to see.
If you ask:
Make this look better.
You might get anything.
Better:
Make the primary button take up about one third of the dialog width.
Keep it aligned to the right.
Make the cancel button visually secondary.
Do not change the dialog copy.
Or:
The save button is too tall right now.
Make it about half as tall as it currently appears.
Keep the same font size if possible.
Reduce vertical padding before changing anything else.
Or:
Make this dialog visually match EditCustomerDialog.
Same header style, same footer button layout, same spacing between fields.
Do not invent a new dialog layout.
Visual changes need relative language.
Use:
- one third of the width
- half as tall
- slightly more padding
- 25% less spacing
- align with the left edge of the title
- match the width of the input above it
- keep this section fixed while the list scrolls
- make the secondary action less visually dominant
- match the existing customer edit dialog
- follow the same empty state pattern as the order history screen
Screenshots help a lot.
If the UI is the thing being changed, give the agent a screenshot and describe what you want relative to what exists. Otherwise it is guessing from code alone.
Good front end prompting is less about saying "make it nicer" and more about describing the visual delta.
Relative Language Works for Code Too
Relative language is not only for UI.
It works for code structure, naming, architecture, and behavior.
Instead of:
Create a service for inventory exports.
Try:
Create an inventory export service that follows the same shape as SalesExportService.
Use the same constructor pattern, same result object style, same logging approach, and same error handling conventions.
Only deviate where inventory genuinely requires different behavior.
Instead of:
Add a dialog for assigning users.
Try:
Add an assign-user dialog that matches EditRoleDialog.
Same footer actions, same width, same loading behavior, same validation message placement.
The content is different, but the interaction pattern should feel identical.
Instead of:
Add a background job.
Try:
Add this background job using the same registration and execution pattern as NightlyInvoiceSyncJob.
Do not introduce a new scheduling pattern.
Use the existing logging and failure handling style.
This is how you get conforming code.
The model is much better when it can compare against a known target. You are not just describing what to build. You are describing where it belongs in the existing system.
That is the difference between new code that feels native to the repo and new code that feels pasted in from somewhere else.
Verify Behavior
A green checkmark from a test run or CI pipeline does not automatically mean the implementation is good.
The more important question is:
Does the behavior actually make sense?
Sometimes the fastest way to build confidence is simply:
- running the feature
- interacting with it directly
- checking edge cases manually
- profiling behavior
- reviewing logs
- looking at the UI
- or reading through the flow carefully
The goal is not to blindly trust output from either the model or the tooling around it.
The goal is to understand what the system is actually doing.
That understanding matters more than ceremony.
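When a manual check turns up behavior worth keeping, it can be pinned down as a small script so a later change does not quietly break it. The sketch below does that for two of the acceptance criteria from the earlier inventory spec: the inventory persists after reload, and the empty state is handled. The `Inventory` class here is a hypothetical in-memory stand-in with JSON persistence, not the real system the spec describes.

```python
import json
import tempfile
from pathlib import Path


class Inventory:
    """Hypothetical inventory: maps item id to quantity held."""

    def __init__(self) -> None:
        self.items: dict[str, int] = {}

    def pick_up(self, item_id: str, quantity: int = 1) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        # Stackable behavior: picking up an item already held adds to its stack.
        self.items[item_id] = self.items.get(item_id, 0) + quantity

    def drop(self, item_id: str, quantity: int = 1) -> None:
        held = self.items.get(item_id, 0)
        if quantity <= 0 or quantity > held:
            raise ValueError("cannot drop more than is held")
        if held == quantity:
            del self.items[item_id]
        else:
            self.items[item_id] = held - quantity

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(self.items))

    @classmethod
    def load(cls, path: Path) -> "Inventory":
        inventory = cls()
        # A missing save file is treated as an empty inventory.
        if path.exists():
            inventory.items = json.loads(path.read_text())
        return inventory


def run_checks() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        save_file = Path(tmp) / "inventory.json"

        # Empty state: loading before anything was saved must not fail.
        assert Inventory.load(save_file).items == {}

        # Stacking, dropping, and persistence across a reload.
        inv = Inventory()
        inv.pick_up("potion", 2)
        inv.pick_up("potion")
        inv.drop("potion")
        inv.save(save_file)
        assert Inventory.load(save_file).items == {"potion": 2}


run_checks()
print("inventory checks passed")
```

Checks like these confirm that specific behavior still holds. They do not replace running the feature and seeing how it feels.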
Commit Only When It Earns It
Once code is committed, it is your code.
Not "the LLM's code."
Yours.
So the commit should only happen after:
- the diff has been reviewed
- the behavior has been verified
- the scope is clean
- the approach makes sense
- the code fits the existing patterns
- and you understand the important decisions
That does not mean every line needs to be perfect. It means the work has passed the same basic standard you would expect from any code entering the repo.
LLMs do not remove that standard. They just change how quickly code arrives at the gate.
Respect the Next Reviewer
When you share code with someone else, you are asking for their attention.
That is true whether the code was handwritten, LLM-assisted, or mostly generated by an agent.
It is usually better not to hand off raw agent output as-is. A quick human pass shows respect for the next person's time and catches obvious issues early.
Before opening a PR, it is worth doing the first pass yourself:
- clean up weird changes
- remove unrelated files
- question strange decisions
- ask for a second-pass critique
- verify the behavior
- check whether the implementation follows existing patterns
- and leave notes where something deserves closer attention
This is not about shaming anyone's workflow. It is just good collaboration.
The next reviewer should receive something that has already had some care put into it.
Build Project Rules Over Time
Every repeated correction is a clue.
If you keep telling the agent the same thing, write it down.
Examples:
- Do not use underscore-prefixed private fields.
- Follow this transaction pattern.
- Do not introduce new services for one-off logic.
- Do not bypass the domain layer.
- Do not touch generated files.
- Prefer existing helpers over new utilities.
- Keep changes scoped to the task.
- Explain any new dependency before adding it.
- For UI changes, ask for screenshots when visual context matters.
- For large asks, clarify ambiguity before executing.
- Prefer existing templates over inventing new structure.
- When creating new files, point to the closest existing example.
- Reuse existing visual and logical patterns unless there is a reason not to.
These rules can live in files like:
AGENTS.md
.junie/guidelines.md
ARCHITECTURE.md
STYLEGUIDE.md
CONTRIBUTING.md
This gives the agent reusable context. It also helps the team clarify its own practices.
That is one of the underrated benefits of agentic workflows: they force vague team knowledge to become explicit.
If the agent keeps getting something wrong, the answer is not always a better prompt. Sometimes the answer is better project context.
Different People Will Use This Differently
Some developers will want a strict spec-first flow.
Some will use agents more casually while prototyping.
Some will use them mostly for review and debugging.
Some will use them heavily for front end work.
Some will use them to scaffold files and then fill in the logic manually.
Some will never let an agent touch production code directly but will still use it to explain unfamiliar code, find edge cases, or draft refactors.
That is all valid.
The useful question is not:
Is this the one correct way to use LLMs?
The useful question is:
Does this workflow help me build better things without losing control of the work?
If yes, keep it.
If no, adjust it.
The workflow should serve the builder, not the other way around.
Working Maxims
Prompting is not the whole system. The workflow around the prompt matters more.
The first pass writes. The second pass critiques.
For large asks, clarify before execution.
Examples are gold.
Relative language works for both UI and code.
Know the inventory of your solution. Point the agent at what already exists.
The agent can move fast, but the human adds taste, context, and care.
The diff shows what changed. Runtime QA shows what the user actually feels.
The LLM does not have eyes. Give it screenshots or describe the visual change clearly.
Agent-written code is still your code once you accept it.
Start with something you would actually use.
Build your own loop, then refine it.
Take what works. Leave what does not. Keep building.
Article FAQ
Article takeaways
- What is prompt-and-pray development?
  Prompt-and-pray means giving an LLM or coding agent a vague request, accepting the first result, and hoping the codebase survives without enough spec, review, or runtime QA.
- What is the baseline LLM-assisted workflow recommended here?
  Start with a clear spec, point to existing examples, ask the agent to clarify ambiguity, let it execute in a narrow lane, review the diff, request a second-pass critique, QA the runtime behavior, and commit only after the work earns it.
- Who is this workflow for?
  It is for builders, developers, founders, and teams using LLMs or coding agents on real software who want more speed without giving up judgment, maintainability, or product quality.