> P.S. "Does don't-use-our-output-to-train-a-competitor language disqualify a > model/vendor" also seems to me to be plainly a question for Legal. Yeah; I was noodling on this meta question last week. If there are restrictions on output, but those restrictions don't apply in the circumstances you're using it, is that output actually restricted *in that context*? In the genai terms of use and output cases I read such restrictions as being non-transitive (i.e. non copyleft and non sticky) so it is a somewhat isolated question I think.
There are also code generation platforms with restrictions such as "You agree you won't use this to break the law". I could see some interpretations of https://opensource.org/osd arguing against accepting contributions with even that phrasing. If it all boils down to "Does accepting this contribution expose the foundation or project to legal risk or license restriction due to restrictions on terms of use of the output?" then ISTM being in clear compliance with terms at time of generation w/a non-transitive license should be a non-issue.

Anyway - I'm not a lawyer. Defer to actual lawyers and all that, just found the angle of reasoning interesting.

On Fri, Aug 15, 2025, at 9:13 AM, Jonathan Ellis wrote:
> To the degree that we can draw conclusions from this three month old thread, I submit that the top one is: ASF policy is not optimally clear.
>
> (Personally I think the *spirit* of the policy is clear, as embodied in the TLDR at the bottom, but the text itself is not, and since it's a legal document that's a problem.)
>
> I have a lot of sympathy for the viewpoint that: we (contributors) are responsible for what we submit and we (committers and PMC members) are responsible for reviewing it. It shouldn't matter if copyrighted code was submitted by someone pasting from a GPLed repo or decompiling a competitor's product, or by someone channeling GPT-5 or Sonnet 4.
>
> I think the reason this (reasonably!) makes some people uncomfortable is that there is no way to 100% guarantee that your AI assistant didn't pull out a GPL section from its internal training data, nor is there a way for a developer to reasonably check for such a thing. So unlike in the manual labor scenario you could, in theory, end up with inadvertent infringement.
>
> The problem is that there isn't a solution for this, not really, not even with an SBOM, which would end up as the software equivalent of "security theater."
>
> So the options I see are:
> 1. The developer + reviewer are responsible, we accept that mistakes may happen, we will fix them if and when they do.
> 2. We publish a list of approved models and accept that it will probably be quietly ignored by a lot of people since it will be out of date within weeks, but hey, at least we have legal cover.
>
> Either way, I think we need to go back to ASF Legal to clarify policy.
>
> P.S. "Does don't-use-our-output-to-train-a-competitor language disqualify a model/vendor" also seems to me to be plainly a question for Legal.
>
> On Thu, Aug 14, 2025 at 12:26 PM Ariel Weisberg <ar...@weisberg.ws> wrote:
>> Hi,
>>
>> I want to dig a little deeper into the actual ToS and make a distinction between the terms placing a burden on the output of the model and placing a burden on access/usage.
>>
>> Here are the Claude consumer ToS that seem relevant:
>> ```
>> You may not access or use, or help another person to access or use, our Services in the following ways:
>> 1. To develop any products or services that compete with our Services, including to develop or train any artificial intelligence or machine learning algorithms or models or resell the Services.
>> ```
>>
>> And the commercial ToS:
>> ```
>> 1. *Use Restrictions.* Customer may not and must not attempt to (a) access the Services to build a competing product or service, including to train competing AI models or resell the Services except as expressly approved by Anthropic; (b) reverse engineer or duplicate the Services; or (c) support any third party’s attempt at any of the conduct restricted in this sentence.
>> ```
>>
>> One way to interpret this is that the burden is on the access/usage, and if what you are doing when you access/use is acceptable then the output is unencumbered. So for example if you are developing code for Apache Cassandra and you generate something for that purpose, then your access did not fall under (a) or (b), and it would be a very large stretch to say that contributing that code to ASF contributes to (c).
>>
>> So unless I hear legal say otherwise I would say those ToS are acceptable.
>>
>> Now let's look at OpenAI's terms, which state:
>> ```
>> • Use Output to develop models that compete with OpenAI.
>> ```
>> This is more concerning because it's a restriction on the output, not on access.
>>
>> Gemini has restrictions on "generating or distributing content that facilitates:... Spam, phishing, or malware", and that is a little concerning because it sounds like it encumbers the output of the model, not the access.
>>
>> It really really sucks to be in the position of trying to be a lawyer for every single service's ToS.
>>
>> Ariel
>>
>> On Thu, Aug 14, 2025, at 12:36 PM, Ariel Weisberg wrote:
>>> Hi,
>>>
>>> It's not up to us to interpret, right? It's been interpreted by Apache Legal, and if we are confused we can check, but this is one instance where they aren't being ambiguous or delegating to us to make a decision.
>>>
>>> I can't see how we can follow legal's guidance and accept output from models or services running models with these issues.
>>>
>>> This isn't even a change of what we settled on, right? We seemed to broadly agree that we wouldn't accept output from models that aren't license compatible. What has changed is we have realized that it applies to more models.
>>>
>>> At this point I don't think we should try to maintain a list. We should provide brief guidance that we don't accept code from models/services that are not license compatible (and highlight that this includes most popular services) and encourage people to watch out for models/services that might reproduce license-incompatible training data.
>>>
>>> Ariel
>>>
>>> On Fri, Aug 1, 2025, at 1:13 PM, Josh McKenzie wrote:
>>>> So I'll go ahead and preface this email - I'm not trying to open Pandora's Box or re-litigate settled things from the thread. *But...*
>>>>
>>>>> • The terms and conditions of the generative AI tool do not place any restrictions on use of the output that would be inconsistent with the Open Source Definition. https://opensource.org/osd/
>>>>
>>>> By that logic, Anthropic's terms would also run afoul of that, right? https://www.anthropic.com/legal/consumer-terms
>>>>
>>>>> You may not access or use, or help another person to access or use, our Services in the following ways:
>>>>> ...
>>>>> 2. To develop any products or services that compete with our Services, including to develop or train any artificial intelligence or machine learning algorithms or models or resell the Services.
>>>>> ...
>>>>
>>>> Strictly speaking, that collides with the open source definition: https://opensource.org/osd
>>>>> 6. No Discrimination Against Fields of Endeavor
>>>>>
>>>>> The license must not restrict anyone from making use of the program in a specific field of endeavor. For example, it may not restrict the program from being used in a business, or from being used for genetic research.
>>>>
>>>> Which is going to hold true for basically all AI platforms. At least right now, they all have some form of restriction and verbiage discouraging using their services to build competing services.
>>>>
>>>> Gemini, similar terms <https://ai.google.dev/gemini-api/terms>:
>>>>> You may not use the Services to develop models that compete with the Services (e.g., Gemini API or Google AI Studio). You also may not attempt to reverse engineer, extract or replicate any component of the Services, including the underlying data or models (e.g., parameter weights).
>>>> Plus a prohibited use clause.
>>>>
>>>> So ISTM we should either be ok with all of them (i.e. cassandra doesn't compete with any of them and it matches the definition of open-source in the context of our project's usage) or ok with none of them. And I'm heavily in favor of the former interpretation.