Isn't plagiarism detection based on overlaps in sentence structure?
That way, it would still catch plagiarism if someone simply did a
find-and-replace. But it would also flag regression output that shares
the same format.
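
As a rough illustration of that guess, here is a minimal Python sketch. It
is not Urkund's actual algorithm, which I don't know: the number masking,
the three-word shingles, and the Jaccard score are all my own assumptions,
and the second summary ("other") is made-up data. It compares two lm()
summaries before and after replacing every number with a placeholder:

```python
def tokens(text, mask_numbers=False):
    """Split on whitespace; optionally replace numeric tokens with 'NUM'."""
    out = []
    for tok in text.split():
        if mask_numbers:
            try:
                float(tok)        # parses "-0.925", "304", "1.62e-05"
                tok = "NUM"
            except ValueError:
                pass              # not a number: keep it ("1Q", "Fed", ...)
        out.append(tok)
    return out


def shingles(toks, n=3):
    """Set of overlapping n-token windows."""
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def similarity(a, b, mask_numbers=False):
    """Jaccard overlap of the two texts' shingle sets (0 = disjoint, 1 = same)."""
    sa = shingles(tokens(a, mask_numbers))
    sb = shingles(tokens(b, mask_numbers))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Trimmed from the output quoted further down in this thread.
thesis = """
Residuals:
     Min        1Q    Median        3Q       Max
-0.154587 -0.015961  0.001429  0.017196  0.110907
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.001630   0.001763  -0.925   0.3559
Fed         -0.121595   0.165359  -0.735   0.4627
Residual standard error: 0.0293 on 304 degrees of freedom
"""

# Hypothetical, unrelated regression: invented variable name and numbers.
other = """
Residuals:
    Min      1Q  Median      3Q     Max
-2.1043 -0.5120  0.0311  0.4876  1.9902
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.2045     0.3310   3.639   0.0004
Rate          0.5512     0.1207   4.567   0.0001
Residual standard error: 0.812 on 98 degrees of freedom
"""

raw = similarity(thesis, other)
masked = similarity(thesis, other, mask_numbers=True)
print(f"raw: {raw:.2f}  numbers masked: {masked:.2f}")
```

On these two snippets the masked score comes out far higher than the raw
one, even though the variables and the numbers are different. A detector
that ignores the numbers would see mostly the shared layout.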

How long was the original thesis?  If 25% of it was regression
output, that sounds like a lot of regressions.



On Tue, Sep 22, 2015 at 4:06 PM, peter dalgaard <pda...@gmail.com> wrote:
> Marc,
>
> I don't think Copyright/Intellectual property issues factor into this. Urkund 
> and similar tools are to my knowledge entirely about plagiarism. So the issue 
> would seem to be that the R output is considered identical or nearly 
> identical to R output in other published or otherwise submitted material.
>
> What puzzles me (except for how a document can be deemed 32% plagiarized in 
> 25% of the text) is whether this includes the numbers and variable names. If 
> those are somehow factored out, then any R regression could be pretty much 
> identical to any other R regression. However, two analyses with similar 
> variable names could happen if they are based on the same cookbook recipe and 
> analyses with similar numerical output come from analyzing the same standard 
> data. Such situations would not necessarily be considered plagiarism (I mean: 
> If you claim that you are analyzing data from experiments that you yourself 
> have performed, and your numbers are exactly identical to something that has 
> been previously published, then it would be suspect. If you analyze something 
> from public sources, someone else might well have done the same thing.).
>
> Similarly to John Kane, I think it is necessary to know exactly what sources 
> the text is claimed to be plagiarized from and/or which parts of the text 
> are being matched by Urkund. If it turns out that Urkund is generating false 
> positives, then this needs to be pointed out to them and to the people basing 
> decisions on it.
>
> -pd
>
>> On 22 Sep 2015, at 18:24 , Marc Schwartz <marc_schwa...@me.com> wrote:
>>
>> Hi,
>>
>> With the usual caveat that I Am Not A Lawyer....and that I am not speaking 
>> on behalf of any organization...
>>
>> My guess is that they are claiming that the output of R, simply being copied 
>> and pasted verbatim into your thesis constitutes the use of copyrighted 
>> output from the software.
>>
>> It is not clear to me that R's output is copyrighted by the R Foundation (or 
>> by other parties for CRAN packages), although the source code underlying R 
>> is, along with that of other copyright owners as appropriate. There is some 
>> case law to 
>> support the notion that the output alone is not protected in a similar 
>> manner, but that may be country specific.
>>
>> Did you provide any credit to R (see the output of citation() ) in your 
>> thesis and indicate that your analyses were performed using R?
>>
>> If R is uncredited, I could see them raising the issue.
>>
>> You might check with your institution's legal/policy folks to see if there 
>> is any guidance provided for students regarding the crediting of software 
>> used in this manner, especially if that guidance is at no cost to you.
>>
>> Regards,
>>
>> Marc Schwartz
>>
>>
>>> On Sep 22, 2015, at 11:01 AM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>
>>> 1. It is highly unlikely that we could be of help (unless someone else
>>> has experienced this and knows what happened). You will have to
>>> contact the Urkund people and ask them why their algorithms raised the
>>> flags.
>>>
>>> 2. But of course, the regression methodology is not "your own" -- it's
>>> just a standard tool that you used in your work, which is entirely
>>> legitimate of course.
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>> Bert Gunter
>>>
>>> "Data is not information. Information is not knowledge. And knowledge
>>> is certainly not wisdom."
>>>  -- Clifford Stoll
>>>
>>>
>>> On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
>>> <oliver.barr...@skema.edu> wrote:
>>>>
>>>> Dear 'R' community support,
>>>>
>>>>
>>>> I am a student at Skema business school and I have recently submitted my 
>>>> MSc thesis/dissertation. This has been passed on to an external plagiarism 
>>>> service provider, Urkund, who have scanned my document and returned a 
>>>> plagiarism report to my professor having detected 32% plagiarism.
>>>>
>>>>
>>>> I have contacted Urkund regarding this issue having committed no such 
>>>> plagiarism and they have told me that all the plagiarism detected in my 
>>>> document comes from the last 25% which consists only of 'R' regressions 
>>>> like the one I have pasted below:
>>>>
>>>> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
>>>>   Fed.t.4., data = OLS_CAR, x = TRUE)
>>>>
>>>> Residuals:
>>>>     Min        1Q    Median        3Q       Max
>>>> -0.154587 -0.015961  0.001429  0.017196  0.110907
>>>>
>>>> Coefficients:
>>>>            Estimate Std. Error t value Pr(>|t|)
>>>> (Intercept) -0.001630   0.001763  -0.925   0.3559
>>>> Fed         -0.121595   0.165359  -0.735   0.4627
>>>> Fed.t.1.     0.344014   0.140979   2.440   0.0153 *
>>>> Fed.t.2.     0.026529   0.143648   0.185   0.8536
>>>> Fed.t.3.     0.622357   0.142021   4.382 1.62e-05 ***
>>>> Fed.t.4.     0.291985   0.158914   1.837   0.0671 .
>>>> ---
>>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>>
>>>> Residual standard error: 0.0293 on 304 degrees of freedom
>>>> (20 observations deleted due to missingness)
>>>> Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
>>>> F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05
>>>>
>>>> I have produced all of these regressions myself and pasted them directly 
>>>> from the 'R' software package. My regression methodology is entirely my 
>>>> own along with the sourcing and preparation of the data used to produce 
>>>> these statistics.
>>>>
>>>> I would be very grateful if you could provide me with some clarity as to 
>>>> why this output from 'R' is reading as plagiarism.
>>>>
>>>> I would like to thank you in advance,
>>>>
>>>> Kind regards,
>>>>
>>>> Oliver Barrett
>>>> (+44) 7341 834 217
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide 
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd....@cbs.dk  Priv: pda...@gmail.com