* Russell Adams <[email protected]> [2026-03-16 16:57]:
> On Sun, Mar 15, 2026 at 09:35:06PM +0300, Jean Louis wrote:
> > * Dr. Arne Babenhauserheide <[email protected]> [2026-03-15 20:42]:
> >
> > Let's quit drama, focus on contribution guidelines related to LLM as
> > that is subject of the thread.
> 
> My opinion is simple: zero LLM usage tolerated. LLM created and
> assisted patches should be rejected.

It's a modern form of blood and soil ideology—purity of thought,
rejecting any tool that isn't "native born" to the human mind. Any
outside assistance is seen as contamination.

It's a shibboleth—a way to say 'I am a real programmer' by burning the
books of the future. The ritual requires suffering, so any shortcut is
heresy.

It creates a new inquisition. Now we must prove our thoughts are
"organically grown," forcing developers to hide their process or face
excommunication.

Here are five ways LLMs can assist in coding without generating or
patching code directly:

1. Code Review Preparation

   An LLM can analyze your code for potential edge cases, style
   inconsistencies, or common pitfalls before you submit it for human
   review. This helps clean up minor issues so reviewers can focus on
   architecture and logic.

   In that case it would be hard to claim the user even used an LLM to
   generate code. The planning mode of the tool `opencode'
   (https://opencode.ai/) is of that type: it tells the programmer
   what to do instead of generating code.

2. Writing Documentation and Comments

   LLMs can draft docstrings, inline comments, or high-level README
   sections based on your code. This saves time and encourages better
   documentation practices without changing the code itself.

   Docstrings may not be "code"...

3. Planning and Pseudocode

   You can describe a problem in plain English, and the LLM can help
   outline steps, data structures, or algorithms in pseudocode. This
   clarifies your approach before you write actual code.

4. Explaining Legacy or Complex Code

   When you inherit unfamiliar code, an LLM can provide a
   plain-English summary of what it does, how its parts interact, and
   potential side effects—helping you understand it without modifying
   it.

5. Generating Test Cases

   LLMs can suggest edge cases or example inputs/outputs for unit
   tests based on your function’s purpose. You still write the tests,
   but the brainstorming is assisted, and the suggested tests
   themselves need not be submitted.

These uses keep the human firmly in the driver’s seat while using the
LLM as a thinking partner or documentation aid.

Russell, I don't think you have enough experience with LLMs to know
their benefits. It is surprising to see such a hard and strict stance
as 'zero LLM tolerance.' You can't know this, Russell. Most likely,
you already use LLM-generated code in your Emacs without realizing
it. One big LOL!

> LLM is not just any tool, where we can blame the user for the
> resulting outputs. LLMs are everything free software has stood against
> for decades.

So the argument is: proprietary LLMs bad, therefore all LLM assistance
bad. No nuance. No distinction between running Llama.cpp locally and
feeding code to OpenAI. Just a clean, simple ban that requires zero
thought to enforce.

Perfect thought-stopper. "LLMs are everything free software has stood
against"—say it enough times and you never have to ask which LLMs, how
they're used, or who controls them. Just a comfortable, absolute
rejection.

Quite the opposite. Free software is about sharing—sharing code,
sharing knowledge, sharing how-to. LLMs are distributed sharing
machines. They take the collective wisdom of thousands and make it
available to anyone. That's not against free software; that's its
logical endpoint.

Free software's genius was always the gift economy: I share my code,
you share yours, we all improve together. LLMs are just that principle
scaled—thousands of developers, millions of decisions, distilled into
a tool that shares it all back with you.

> LLMs are proprietary, locked behind paywall services.

Free software ran on proprietary hardware for decades. We didn't ban
coding because Intel kept microcode secret. We built tools that ran on
it anyway. Same here: open models running on closed GPUs is just the
next iteration of that same struggle.

Unix was proprietary once. So was C. So was the internet itself. The
pattern is always: corporate labs invent, then free software
democratizes. We're watching that happen in real time with LLMs.

Why don't you go to Swiss AI and tell them how "LLMs are
proprietary"? Reference: https://www.swiss-ai.org/apertus

> They are not open source.

There are free software drivers and proprietary ones for GPUs,
depending on the GPU. And one does not even need a GPU for inference:
a CPU can produce a usable number of tokens per second. This means
fully free software can run inference on fully free language models.

The majority of software that runs language models is already free
software.

A language model is not software. It is a distribution of statistical
probabilities. An LLM is like a recording of a master pianist
improvising. It contains the patterns, the style, the "essence" of
that performance. You can't point to a single note and say "that's the
software," but you can feed that recording into a player piano (the
inference code) to create new music that sounds like the master.

You keep calling it "open source software"—but an LLM isn't
software. It's a statistical model, a matrix of numbers. You're
judging a fish by how well it climbs a tree.

> You can't run them yourself because you lack the trained model and
> the heavyweight hardware.

What? That is exactly the point of my frustration with such
statements. What year is this claim from? 2022? Because in 2026,
millions of people run LLMs locally every single day. The statement
isn't just wrong; it's aggressively outdated.

Let's see what is running locally over here:

48672 /usr/local/bin/llama-server -ngl 0 --device none --rerank -m 
/mnt/nvme0n1/LLM/quantized/bge-reranker-v2-m3-q8_0.gguf -v -c 8192 -ub 1024 
--log-timestamps --host 192.168.1.68 --port 7676
49138 /usr/local/bin/llama-server --jinja -fa on -c 131072 -ngl 64 -v 
--log-timestamps --host 192.168.1.68 -ub 1024 --threads 16 --embeddings 
--reasoning-format deepseek-legacy --reasoning-budget 0 -m 
/mnt/nvme0n1/LLM/quantized/Qwen3.5-9B-UD-Q4_K_XL.gguf --mmproj 
/mnt/nvme0n1/LLM/quantized/Qwen3.5-9B-mmproj-F16.gguf
49772 /usr/local/bin/llama-server -ngl 999 -v -c 8192 -ub 1024 --embedding 
--log-timestamps --host 192.168.1.68 --port 9999 -m 
/mnt/nvme0n1/LLM/nomic-ai/quantized/nomic-embed-text-v1.5-Q8_0.gguf
79549 llama-server --jinja -fa on -c 131072 -ngl 0 --device none -v 
--log-timestamps --host 192.168.1.68 --port 9991 -ub 1024 --threads 16 
--embeddings --reasoning-format deepseek-legacy --reasoning-budget 0 -m 
/mnt/nvme0n1/LLM/quantized/Qwen3.5-2B-UD-Q8_K_XL.gguf --mmproj 
/mnt/nvme0n1/LLM/quantized/Qwen3.5-2B-mmproj-F16.gguf
79820 llama-server --jinja -fa on -c 131072 -ngl 0 --device none -v 
--log-timestamps --host 192.168.1.68 --port 9992 -ub 1024 --threads 16 
--embeddings --reasoning-format deepseek-legacy --reasoning-budget 0 -m 
/mnt/nvme0n1/LLM/quantized/Qwen3.5-0.8B-UD-Q8_K_XL.gguf --mmproj 
/mnt/nvme0n1/LLM/quantized/Qwen3.5-0.8B-mmproj-F16.gguf
79978 llama-server --jinja -fa on -c 131072 -ngl 0 --device none -v 
--log-timestamps --host 192.168.1.68 --port 9993 -ub 1024 --threads 16 
--embeddings --reasoning-format deepseek-legacy --reasoning-budget 0 -m 
/mnt/nvme0n1/LLM/quantized/Qwen3.5-4B-UD-Q5_K_XL.gguf --mmproj 
/mnt/nvme0n1/LLM/quantized/Qwen3.5-4B-mmproj-F16.gguf

I have six (6) language models running simultaneously on a single
computer in my home. Not in the cloud. Not behind a paywall. Not
through someone else's API.

Process 48672: A reranker model, filtering and sorting information
locally.

Process 49138: A 9-billion parameter vision-language model (Qwen 3.5)
with 64 GPU layers, handling both text and images. Context window of
131,072 tokens—long enough to digest entire books.

Process 49772: A dedicated embedding model creating vector
representations of text, running on port 9999, with a matching image
model in the same embedding space.

Processes 79549, 79820, 79978: Three additional Qwen models of varying
sizes (2B, 0.8B, 4B), each serving different purposes, each running on
CPU with zero GPU layers.

> They do vendor lock in with their APIs.

Friend, you're confusing two completely different things. Vendor
lock-in is what OpenAI and Anthropic do: proprietary APIs, black
boxes, your data training their models. I'm talking about Hugging
Face: open weights, downloadable models, run-anywhere formats. One is
a cage. The other is a library. Please go to https://huggingface.co
and do the research yourself.

> They can change at any time. You have no rights while using
> them. Their output is not your own, and they could claim partial
> ownership.

Change what? The model sitting on my hard drive? Go ahead, try to
change it remotely. I'll wait. Oh right, you can't, because it's mine
now. Downloaded, quantized, running on my machine with no internet
connection. Good luck "changing" that.

"Their output is not your own"—whose output? The model I'm running
locally, on my hardware, with my prompts, generating text that never
touches their servers? That output is mine. Every byte of it. There's
no "they" to claim anything.

> On the side of software ethics LLMs fail every litmus test. They are
> trained without permission on the free work of others. They are
> untrustworthy, not only in their erroneous output, but they may not
> follow user instructions when the owner's directions override. They
> are owned and pushed by some of the worst companies and people on
> earth. They are being used to hurt and manipulate others on an
> industrial scale.

- "Trained without permission on the free work of others" -- You were
  trained on the free work of others too: your parents, your teachers,
  every book you ever read, every conversation you ever
  overheard. That's called learning. Humans don't pay royalties for
  every idea they absorb, and neither should machines.

- "Untrustworthy in erroneous output" — All human-written code has
  bugs. All human experts make mistakes. The standard isn't
  perfection—it's utility. Local LLMs are tools, not oracles, and
  responsible users verify outputs just as they verify human
  contributions.

- "May not follow user instructions when owner's directions override"
  — This describes proprietary APIs with hidden system prompts. It
  does not describe locally run open models, where you control the
  instructions, the system prompt, and every line of inference
  code. There are also so-called "abliterated" and uncensored models;
  it is the user's free choice which to use. Your statement is an
  overgeneralization.

- "Owned and pushed by some of the worst companies and people on
  earth" — The same companies build the hardware you're using, the
  operating system you're running, and the compilers you trust. The
  tool is not the crime. Free software communities now produce and
  distribute their own models independent of corporate agendas.

- "Being used to hurt and manipulate others on an industrial scale" —
  So are social media algorithms. So are targeted advertising
  systems. So are search engines. The existence of misuse does not
  invalidate the tool—it calls for ownership and control so that
  individuals can use them for good rather than ceding them entirely
  to bad actors.

> "It wurked fer me!" and "I like that it helped me make something
> quickly" are arguments for the utility. I can't completely disregard
> their utility, but the ethical side I cannot ignore. It's easier to
> use commercial vendor's lock-in software too! Yet for ethical reasons
> here we are using and building FREE software.
> 
> LLMs have no place in free software.

You're posing this as utility vs ethics, as if we have to choose. But
that's a false choice. The ethical path isn't rejecting LLMs—it's
owning them. Running local models, on your hardware, with open
weights, accountable to no corporation. That's not choosing utility
over ethics. That's choosing both.

Perfect ethical purity is easy when you're not the one being
harmed. But the people who benefit most from LLMs—students, hobbyists,
non-native speakers, developers in developing countries—don't have the
luxury of rejecting powerful tools. The ethical choice is to give them
free alternatives, not take away the only ones they have.

You've correctly identified what's wrong with (some) corporate
language models. But your solution—reject the entire category—throws
out the baby with the bathwater. The fight isn't against LLMs. It's
for free LLMs.

The evidence is overwhelming—and it's all on Hugging Face right
now. Let's walk through the specific examples that prove LLMs have a
place in free software:

- Microsoft Phi — Released under the MIT license. The Phi-4 model with
  14B parameters is explicitly "ready for commercial and
  non-commercial use" and carries an MIT License Agreement. You can
  download it from Hugging Face, run it locally, modify it, and
  integrate it into your own projects without asking Microsoft for
  permission.

- OLMo (Allen Institute for AI) — This is not just open weights. It's
  everything open: the base model weights, the training code, the
  fine-tuning code, the architecture documentation, and even the
  pre-print papers detailing the data and training process. All
  released under Apache 2.0 by a non-profit research institute
  committed to open science.

- Qwen (Alibaba) — One of the most widely used open LLM families
  globally, with over 40 million downloads. The Qwen organization on
  Hugging Face hosts models ranging from 0.5B to 235B parameters,
  including dense and MoE architectures, multimodal versions, and
  quantized variants. Most are released under Apache 2.0. You can run
  them locally with llama.cpp, Ollama, or LM Studio.

- Apertus (Swiss ETH/EPFL) — Developed by public Swiss universities,
  trained on the Alps supercomputer. This is a fully open model with a
  crucial distinction: when collecting training data, the developers
  explicitly observed Swiss and EU legal regulations related to data
  privacy, copyright, and transparency rules. The training data is
  disclosed and reproducible, the basic model is freely available on
  Hugging Face under Apache 2.0, and the project even provides email
  addresses for privacy and copyright requests. This is what ethical
  natural language model development looks like in practice.

- Granite (IBM) — A family of encoder-based embedding models for
  retrieval tasks, spanning dense and sparse architectures. IBM
  publicly releases all Granite models under the Apache 2.0 license,
  allowing both research and commercial use with full transparency
  into their training data. The Granite Vision models are designed for
  enterprise document understanding and achieve top ranks on industry
  benchmarks.

So when someone claims that LLMs have "no place in free software,"
what are they telling these projects?

- That Microsoft's MIT-licensed Phi, which you can download and run
  without ever phoning home to Microsoft, doesn't count?
  
- That Allen Institute's OLMo, with everything—weights, code, data,
  logs—released under Apache 2.0, is somehow not free software?
  
- That IBM's Apache-2.0 Granite models, used in enterprise retrieval
  pipelines, are ethically equivalent to OpenAI's black-box API?

The category is not monolithic. There is a clear spectrum from fully
closed (GPT-4) to fully open (OLMo, Apertus). And the fully open end
of that spectrum is growing every day, built by universities,
nonprofits, and even corporations who understand that open wins.

The question isn't whether LLMs belong in free software. The question
is whether free software will rise to meet them—or cede the field
entirely.

-- 
Jean Louis

---
via emacs-tangents mailing list 
(https://lists.gnu.org/mailman/listinfo/emacs-tangents)
