* Dr. Arne Babenhauserheide <[email protected]> [2026-03-28 13:49]:
> A difference is that LLMs can cope with other complexity in the code.
> 
> 
> For context: humans have a rough mental limit of handling 4 elements at
> the same time. Though we’re able to raise that by building long-lived
> models we can lean on and don’t have to think about consciously.
> 
> Those few elements are a limit of our brains.
> 
> We have to limit the number of function arguments, the sources of input,
> and so forth, to keep code manageable for humans.¹
> 
> But LLMs have different limits. While current LLMs copy the structures
> we use, those aren’t actually required for LLMs. An LLM doesn’t need
> files named by context. It doesn’t need to extract code snippets into
> library functions.
> 
> 
> We do.
> 
> 
> So if we accept code that the submitting human does not understand, we
> can reach a state that’s unmaintainable for humans.
> 
> This includes problems that aren’t inherently complex: LLMs don’t need
> to suffer from incidental complexity as we do. Their context may be much
> smaller than the context of experienced humans, but it’s all accessible
> at the same time.
> 
> 
> So tradeoffs are different there. Code I’ve seen includes unmaintainable
> pages and pages of test code -- with errors -- that won’t be
> maintainable by humans, because changes in requirements -- even bugfixes
> -- will require huge swaths of changes all over the place.
> 
> 
> And usual methods to conquer complexity like extracting solvers that
> apply rules to large amounts of data² or generating code from a concise
> actual source of truth don’t translate, because LLMs aren’t
> deterministic: we can’t know what exactly will be the result of a given
> prompt or whether it will have the same result after an update.

Arne, you make an excellent point about different cognitive limits — humans 
need abstraction and modularity in ways LLMs don't. That's a real maintenance 
risk if we're not careful.

One quick clarification on determinism, though: while LLMs can be 
non-deterministic, temperature 0 selects the most likely token at every step 
(greedy decoding) and gives repeatable outputs, at least for a fixed model on 
the same hardware and server configuration. I just tested this with a local 
model (Qwen3.5-9B): three runs each on math and even a haiku, identical 
results every time.

So we can have deterministic behavior when we need it. The bigger challenge 
you're highlighting — keeping code human-maintainable despite LLMs not needing 
that structure — is still the core issue.

The test details are below.

LLM Deterministic Output Test with Temperature 0
================================================

API Endpoint: http://192.168.1.68:8080/v1/chat/completions
Model: Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf
Temperature: 0

---

Test 1: True/False Question
Question: "True or false: With temperature 0, you can get deterministic outputs 
from LLMs. Explain briefly."

Answer:
**True.**

At a temperature of 0, the LLM's sampling process becomes deterministic. 
Instead of randomly selecting the next token from the probability distribution, 
the model will always select the token with the highest probability (the 
"greedy" choice). This means that for the same input prompt, the model will 
generate the exact same sequence of tokens every time, assuming the model 
weights and context are identical.

---

Test 2: Simple Math Question (Run 3 times)
Question: "What is 2+2?"

Run 1: 2 + 2 = 4.
Run 2: 2 + 2 = 4.
Run 3: 2 + 2 = 4.

Result: IDENTICAL output all 3 times ✓

---

Test 3: Creative Writing - Haiku (Run 3 times)
Question: "Write a short haiku about the ocean."

Run 1:
Deep blue, endless waves,
Whispering secrets to the shore,
Moonlight on the foam.

Run 2:
Deep blue, endless waves,
Whispering secrets to the shore,
Moonlight on the foam.

Run 3:
Deep blue, endless waves,
Whispering secrets to the shore,
Moonlight on the foam.

Result: IDENTICAL output all 3 times ✓

---

Test 4: Sentence Completion (Run 5 times)
Question: "Complete this sentence: The best way to predict the future is to"

Run 1:
The best way to predict the future is to **create it**.

This famous quote is often attributed to **Abraham Lincoln**, though it also 
appears in variations by other thinkers like **John F. Kennedy** ("The best way 
to predict the future is to create it") and **Peter Drucker**.

It emphasizes that rather than passively waiting for events to unfold, 
proactive action and innovation are the most reliable methods for shaping what 
comes next.

Run 2: IDENTICAL ✓
Run 3: IDENTICAL ✓
Run 4: IDENTICAL ✓
Run 5: IDENTICAL ✓

Result: IDENTICAL output all 5 times ✓

---

Conclusion:
With temperature 0, the LLM produces deterministic (repeatable) outputs.
The model uses greedy decoding, always selecting the highest-probability token.
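To illustrate the greedy-decoding point, here is a toy sketch (the logits are made up, not taken from any real model): at temperature 0 the argmax token is chosen on every run, while at temperature 1 two near-tied tokens get picked differently across runs.

```python
import math
import random

def softmax(logits, temperature):
    # Scale logits by temperature, then normalize. Smaller temperatures
    # sharpen the distribution; this branch assumes temperature > 0.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_token(logits, temperature):
    # Temperature 0 is greedy decoding: always the highest-scoring token.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    return random.choices(range(len(logits)), weights=probs)[0]

logits = [1.0, 3.5, 0.2, 3.4]  # made-up scores for four candidate tokens

# Greedy: the same token (index 1) is picked on every run.
greedy = {pick_token(logits, 0) for _ in range(100)}
print(greedy)  # {1}

# Temperature 1: tokens 1 and 3 are nearly tied, so picks vary across runs.
sampled = {pick_token(logits, 1.0) for _ in range(100)}
print(sampled)  # almost certainly contains more than one token
```

This is also why the "assuming the model weights and context are identical" caveat in the model's own answer matters: greedy decoding is only as deterministic as the scores feeding it.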


Jean Louis

Here are the commands I used. Note that the model was not really "gpt-4"; 
the server was actually running 
Qwen3.5-9B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf.

Test 1 (True/False question):
curl -s http://192.168.1.68:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "True or false: With temperature 0, you can get deterministic outputs from LLMs. Explain briefly."}],
    "temperature": 0
  }'

Test 2 (Math - 3 runs):
for i in 1 2 3; do
  echo "=== Run $i ==="
  curl -s http://192.168.1.68:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "What is 2+2?"}], "temperature": 0}' \
    | jq -r '.choices[0].message.content'
  echo
done

Test 3 (Haiku - 3 runs):
for i in 1 2 3; do
  echo "=== Run $i ==="
  curl -s http://192.168.1.68:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Write a short haiku about the ocean."}], "temperature": 0}' \
    | jq -r '.choices[0].message.content'
  echo
done

Test 4 (Sentence completion - 5 runs):
for i in 1 2 3 4 5; do
  echo "=== Run $i ==="
  curl -s http://192.168.1.68:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Complete this sentence: The best way to predict the future is to"}], "temperature": 0}' \
    | jq -r '.choices[0].message.content'
  echo
done
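The "IDENTICAL" results above were compared by eye; a small helper makes the comparison mechanical. `fake_completion` here is a hypothetical stand-in for the real request (the curl calls above, wrapped in subprocess or an HTTP client), so this sketch runs without the server:

```python
import hashlib

def all_runs_identical(run, n=3):
    # Call run() n times and compare SHA-256 digests of the outputs.
    digests = [hashlib.sha256(run().encode()).hexdigest() for _ in range(n)]
    return len(set(digests)) == 1

# Hypothetical stand-in for the curl request; a real version would POST
# the same prompt with temperature 0 and return
# .choices[0].message.content from the response.
def fake_completion():
    return "2 + 2 = 4."

same = all_runs_identical(fake_completion, n=3)
print(same)  # True
```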
