Professional Software Developers and AI Agent Use

Paper: https://arxiv.org/html/2512.14012v1

Summary

This research investigates how experienced software developers (3+ years professional experience) actually use AI agents for coding, contrasting real-world practice with "vibe coding" approaches that prioritize flow over code quality.

Methodology

Field Observations (N=13): Screen-recorded sessions where experienced developers tackled real work tasks using preferred agentic tools, followed by semi-structured interviews
Qualitative Survey (N=99): A 15-minute survey gathering broader developer perspectives

Key Findings

Developers want control, not autonomy - Professional developers exercise deliberate control over agents rather than trusting them blindly
Validation is essential - They validate all agent outputs before accepting
Planning before implementation - Developers plan extensively before letting agents work
Task suitability matters - Agents work well for straightforward, well-described tasks but struggle with complex problems
Control enables enjoyment - Developers enjoy agent use when they retain control

Notable Quote

"There is no way I'll EVER go back to coding by hand" (S28)

Yet this doesn't mean abandoning oversight—developers actively supervise agent behavior.

Relevance to llm-do

This paper provides empirical validation for llm-do's design philosophy.

Alignment with llm-do Architecture

Paper Finding	llm-do Implementation
Developers want control, not autonomy	Imperative control flow—Python owns the logic (`if`, `for`, `try/except`), not a declarative graph DSL
Validation of all agent outputs	Approval system—tool calls gated with syscall-like approvals; pattern-based rules
Plan extensively before implementation	Workers as focused units—decomposition into composable, well-scoped workers
Agents work well for straightforward tasks	Worker design—each worker has clear instructions, tools, and boundaries
Maintaining software quality standards	Progressive stabilizing—extract stable LLM patterns into deterministic, testable Python code

Core Insight Validation

The paper's central finding—"Professional developers exercise deliberate control over agents rather than trusting them blindly"—is exactly what llm-do's architecture enables:

Visible distribution boundaries - The system explicitly marks where you cross from deterministic (tools) to stochastic (LLM workers), so developers know when trust is required
Bidirectional refactoring - As patterns stabilize, stabilize them to code; as edge cases multiply, soften back to prompts. This matches the paper's observation that developers "leverage their expertise to ensure quality"
Bounded recursion - Depth limits prevent runaway delegation, supporting the observed developer need for predictable behavior

Implications

This paper supports positioning llm-do as aligned with professional practice:

The approval system directly addresses the "validation of outputs" finding
Progressive stabilizing is the systematic approach to what developers do informally (gradually trusting and extracting patterns)
The "Unix philosophy for agents" matches how experienced developers actually want to work—with control, oversight, and the ability to stabilize patterns over time

Relevant Notes:

three distinct mechanisms — the developer behaviors observed here (validate, extract patterns, stabilise) are empirical evidence of the crystallisation loop this note formalises as continuous learning

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search