FWIW, as Loïc already mentioned, we had the same discussions on the
scikit-learn side.
Every now and then we noticed a few issues / PRs coming in which were
clearly AI generated, and in almost all those cases the account posting
them didn't look like a human or didn't have a history on GH.
In the end, practically speaking, we know people use AI generated code (a
bunch of us use Copilot ourselves). So the issue wasn't that we didn't want
any AI generated code; people use AI and there's no way around it. However,
we wanted to be able to tell people that completely AI generated code /
issues are not acceptable, and came up with this text:
> Please refrain from submitting issues or pull requests generated by
> fully-automated tools. Maintainers reserve the right, at their sole
> discretion, to close such submissions and to block any account
> responsible for them.
>
> Ideally, contributions should follow from a human-to-human discussion in
> the form of an issue.
Another point, not included here but one we like, is to say that "if you
use AI, you should only submit code which you really understand".
Cheers,
Adrin
On Thu, Jul 4, 2024 at 9:40 PM Ralf Gommers wrote:
>
>
> On Thu, Jul 4, 2024 at 8:42 PM Matthew Brett wrote:
>
>> Hi,
>>
>> On Thu, Jul 4, 2024 at 6:44 PM Ralf Gommers wrote:
>> >
>> >
>> >
>> > On Thu, Jul 4, 2024 at 5:08 PM Matthew Brett wrote:
>> >>
>> >> Hi,
>> >>
>> >> On Thu, Jul 4, 2024 at 3:41 PM Ralf Gommers wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Jul 4, 2024 at 1:34 PM Matthew Brett <matthew.br...@gmail.com> wrote:
>> >> >>
>> >> >> Hi,
>> >> >>
>> >> >> On Thu, Jul 4, 2024 at 12:20 PM Ralf Gommers <ralf.gomm...@gmail.com> wrote:
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > On Thu, Jul 4, 2024 at 12:55 PM Matthew Brett <matthew.br...@gmail.com> wrote:
>> >> >> >>
>> >> >> >> Sorry - reposting from my subscribed address:
>> >> >> >>
>> >> >> >> Hi,
>> >> >> >>
>> >> >> >> Sorry to top-post! But - I wanted to bring the discussion back to
>> >> >> >> licensing. I have great sympathy for the ecological and code-quality
>> >> >> >> concerns, but licensing is a separate question, and, it seems to me,
>> >> >> >> an urgent question.
>> >> >> >>
>> >> >> >> Imagine I asked some AI to give me code to replicate a particular
>> >> >> >> algorithm A.
>> >> >> >>
>> >> >> >> It is perfectly possible that the AI will largely or completely
>> >> >> >> reproduce some existing GPL code for A, from its training data. There
>> >> >> >> is no way that I could know that the AI has done that without some
>> >> >> >> substantial research. Surely, this is a license violation of the GPL
>> >> >> >> code? Let's say we accept that code. Others pick up the code and
>> >> >> >> modify it for other algorithms. The code-base gets infected with GPL
>> >> >> >> code, in a way that will make it very difficult to disentangle.
>> >> >> >
>> >> >> >
>> >> >> > This is a question that's topical for all of open source, and usages
>> >> >> > of CoPilot & co. We're not going to come to any insightful answer
>> >> >> > here that is specific to NumPy. There's a ton of discussion in a lot
>> >> >> > of places; someone needs to research/summarize that to move this
>> >> >> > forward. Debating it from scratch here is unlikely to yield new
>> >> >> > arguments imho.
>> >> >>
>> >> >> Right - I wasn't expecting a detailed discussion on the merits - only
>> >> >> some thoughts on policy for now.
>> >> >>
>> >> >> > I agree with Rohit's: "it is probably hopeless to enforce a ban on
>> >> >> > AI generated content". There are good ways to use AI code assistant
>> >> >> > tools and bad ones; we in general cannot know whether AI tools were
>> >> >> > used at all by a contributor (just like we can't know whether
>> >> >> > something was copied from Stack Overflow), nor whether when it's done
>> >> >> > the content is derived enough to fall under some other license. The
>> >> >> > best we can do here is add a warning to the contributing docs and PR
>> >> >> > template about this, saying the contributor needs to be the author so
>> >> >> > copied or AI-generated content needs to not contain things that are
>> >> >> > complex enough to be copyrightable (none of the linked PRs come close
>> >> >> > to this threshold).
>> >> >>
>> >> >> Yes, these PRs are not the concern - but I believe we do need to plan
>> >> >> now for the future.
>> >> >>
>> >> >> I agree it is hard to enforce, but it seems to me it would be a
>> >> >> reasonable defensive move to say - for now - that authors will need to
>> >> >> take full responsibility for copyright, and that, as of now,
>> >> >> AI-generated code cannot meet that standard, so we require authors to
>> >> >> turn off AI-generation when writing code for Numpy.
>> >> >
>> >> >
>> >> > I don't think that that is any more reasonable than asking
>> >> > contributors to not look at Stack Overflow at all, or to not look at
>> >> > any other code base for any reason. I bet many contributors may not
>> >> > even know whether the auto-complete functionality in their IDE comes
>> >> > from a r