On Mon, 28 Apr 2025 at 17:46, Stephan Verbücheln <verbuech...@posteo.de> wrote:
> > Is the change technical or legal/philosophical? You could call this
> > a Turing test for copyright.
>
> This is not a new issue at all. I remember that back in the day in
> order to legally reverse engineer a computer program, companies had to
> set up two separate teams of developers.
>
> One team reads the code and writes documentation. The second team reads
> the documentation and writes the new code. It was crucial that no
> member of the second team sees the original code in order to rule out
> any copyright issues.

But does it? If we consider the product of trained knowledge to be a
derivative work of the training input, then the documentation produced
by the first team would also be tainted by the copyright of the
original code. Such an interpretation therefore defeats the whole
two-team process as well.

Moreover, many modern LLMs are trained in stages: a very large model
is trained on the source data, and then compact models are trained by
that first model. This is called model distillation.

There are also methods of getting new information into already-trained
models at runtime, such as the RAG technique. With RAG, an LLM may
contain only fundamental information and then reach out to load
additional data sources relevant to the specific query - like an
expert going online to check prices and availability of various
products before advising you what to choose for your planned build. At
that point, the LLM+RAG system is just a smart web browser.

(Sadly, I am *not* an expert on modern AI technologies.)

-- 
Best regards,
Aigars Mahinovs