Human writing structures transfer to LLMs because failure modes overlap

Type: note · Status: current

Human writing genres — Toulmin argumentation, scientific paper structure, legal brief format — evolved under selection pressure. They exist because they help humans avoid specific reasoning failures. The Toulmin scaffold, for instance, forces the writer to separate evidence from reasoning and to surface assumptions.

The naive transfer argument ("it helps humans, so it helps LLMs") is weak because LLMs are not humans. But it becomes stronger when we observe that LLMs exhibit surprisingly human-like failure modes: they get distracted by irrelevant context, they conflate evidence with opinion, they skip qualifications when the argument feels strong. These failures are "un-machine-like" — you wouldn't expect a computational system to get distracted — but they happen.

This overlap suggests a methodology: rather than assuming wholesale transfer, evaluate the specific arguments for why a structure helps humans, and check whether each argument applies to LLMs. For Toulmin, the argument is "separating evidence from warrant prevents conflation" — and LLMs do conflate evidence with warrant. For scientific paper structure, the argument is "methods sections enable reproducibility" — which may be less relevant for LLMs.

Empirical evidence supports this overlap. Lampinen et al. (2024) tested LLMs on three reasoning tasks (syllogisms, the Linda problem, the Wason selection task) and found that LLMs mirror human accuracy patterns — both perform better on familiar/believable content and worse on abstract or belief-violating content. LLM confidence even correlates with human response times on the same problems. The overlap is real but not universal: on the Wason selection task, LLMs show qualitatively different error patterns (antecedent-false errors rather than matching bias), marking a concrete boundary where human conventions may not transfer.

Chain-of-thought prompting reduces content bias by improving performance on abstract/unfamiliar conditions without degrading familiar ones — suggesting that structured prompting pushes toward content-independent reasoning. This is directly relevant: the structured templates in this KB (Toulmin sections, Evidence/Reasoning/Caveats) may work by a similar mechanism, forcing the model past content-biased defaults.

The methodology applies beyond Toulmin: any human writing convention proposed for LLM use should be evaluated by asking "what specific failure does this prevent, and does the LLM exhibit that failure?" The Lampinen results show this must be done per-convention — syllogisms and NLI show shared failure modes (convention transfers), while the Wason task shows divergence (convention may not transfer).


Relevant Notes:

Topics: