Isn't plagiarism detection based on overlaps with sentence structure? That way, it would catch plagiarism if someone simply did a find-and-replace. But that would also catch regressions with the same output format.
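To make the find-and-replace point concrete, here is a rough sketch in R of one generic way a checker could compare two texts. It splits each text into overlapping three-word sequences ("shingles"), optionally masks every number first, and takes the Jaccard overlap. This is an assumption for illustration only, not Urkund's algorithm, which nobody in this thread has described. The helper names (summary_text, mask_numbers, shingles, jaccard) are made up, and the two models are simply fitted to R's built-in mtcars and airquality data.

```r
# Illustrative sketch only: a generic word-shingle overlap measure.
# This is an assumption, NOT Urkund's actual algorithm. All helper names
# (summary_text, mask_numbers, shingles, jaccard) are hypothetical.

# Capture the printed summary() of a fitted model as one string.
summary_text <- function(fit) {
  paste(capture.output(summary(fit)), collapse = " ")
}

# Replace numeric literals (e.g. -0.001630, 1.62e-05, 304) with "NUM",
# mimicking a checker that ignores the numbers.
mask_numbers <- function(text) {
  gsub("[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?", "NUM", text)
}

# Distinct overlapping n-word sequences ("shingles"), lowercased.
shingles <- function(text, n = 3) {
  words <- strsplit(tolower(text), "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  unique(vapply(seq_len(length(words) - n + 1),
                function(i) paste(words[i:(i + n - 1)], collapse = " "),
                character(1)))
}

# Jaccard similarity: shared shingles / all distinct shingles.
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# Two unrelated regressions on R's built-in datasets.
text_a <- summary_text(lm(mpg ~ wt + hp, data = mtcars))
text_b <- summary_text(lm(Ozone ~ Wind + Temp, data = airquality))

raw_overlap    <- jaccard(shingles(text_a), shingles(text_b))
masked_overlap <- jaccard(shingles(mask_numbers(text_a)),
                          shingles(mask_numbers(text_b)))

cat(sprintf("overlap, raw text:       %.2f\n", raw_overlap))
cat(sprintf("overlap, numbers masked: %.2f\n", masked_overlap))
```

Even on the raw text the fixed labels ("Residual standard error", "degrees of freedom", "Signif. codes") are shared, and once the numbers are masked the overlap should come out higher. Variable names are left unmasked here; masking them as well would leave even less to tell two regressions apart.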
How long was the original thesis? If 25% of it was all regression output, that sounds like a lot of regressions.

On Tue, Sep 22, 2015 at 4:06 PM, peter dalgaard <pda...@gmail.com> wrote:
> Marc,
>
> I don't think Copyright/Intellectual property issues factor into this. Urkund
> and similar tools are to my knowledge entirely about plagiarism. So the issue
> would seem to be that the R output is considered identical or nearly
> identical to R output in other published or otherwise submitted material.
>
> What puzzles me (except for how a document can be deemed 32% plagiarized in
> 25% of the text) is whether this includes the numbers and variable names. If
> those are somehow factored out, then any R regression could be pretty much
> identical to any other R regression. However, two analyses with similar
> variable names could happen if they are based on the same cookbook recipe and
> analyses with similar numerical output come from analyzing the same standard
> data. Such situations would not necessarily be considered plagiarism (I mean:
> If you claim that you are analyzing data from experiments that you yourself
> have performed, and your numbers are exactly identical to something that has
> been previously published, then it would be suspect. If you analyze something
> from public sources, someone else might well have done the same thing.).
>
> Similarly to John Kane, I think it is necessary to know exactly what sources
> the text is claimed to be plagiarized from and/or what parts of the text are
> being matched by Urkund. If it turns out that Urkund is generating false
> positives, then this needs to be pointed out to them and to the people basing
> decisions on it.
>
> -pd
>
>> On 22 Sep 2015, at 18:24 , Marc Schwartz <marc_schwa...@me.com> wrote:
>>
>> Hi,
>>
>> With the usual caveat that I Am Not A Lawyer....and that I am not speaking
>> on behalf of any organization...
>>
>> My guess is that they are claiming that the output of R, simply being copied
>> and pasted verbatim into your thesis, constitutes the use of copyrighted
>> output from the software.
>>
>> It is not clear to me that R's output is copyrighted by the R Foundation (or
>> by other parties for CRAN packages), albeit the source code underlying R
>> is, along with other copyright owners' as apropos. There is some case law to
>> support the notion that the output alone is not protected in a similar
>> manner, but that may be country-specific.
>>
>> Did you provide any credit to R (see the output of citation()) in your
>> thesis and indicate that your analyses were performed using R?
>>
>> If R is uncredited, I could see them raising the issue.
>>
>> You might check with your institution's legal/policy folks to see if there
>> is any guidance provided for students regarding the crediting of software
>> used in this manner, especially if that guidance is at no cost to you.
>>
>> Regards,
>>
>> Marc Schwartz
>>
>>
>>> On Sep 22, 2015, at 11:01 AM, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>
>>> 1. It is highly unlikely that we could be of help (unless someone else
>>> has experienced this and knows what happened). You will have to
>>> contact the Urkund people and ask them why their algorithms raised the
>>> flags.
>>>
>>> 2. But of course, the regression methodology is not "your own" -- it's
>>> just a standard tool that you used in your work, which is entirely
>>> legitimate of course.
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>> Bert Gunter
>>>
>>> "Data is not information. Information is not knowledge. And knowledge
>>> is certainly not wisdom."
>>> -- Clifford Stoll
>>>
>>>
>>> On Tue, Sep 22, 2015 at 7:27 AM, BARRETT, Oliver
>>> <oliver.barr...@skema.edu> wrote:
>>>>
>>>> Dear 'R' community support,
>>>>
>>>>
>>>> I am a student at Skema business school and I have recently submitted my
>>>> MSc thesis/dissertation.
>>>> This has been passed on to an external plagiarism
>>>> service provider, Urkund, who have scanned my document and returned a
>>>> plagiarism report to my professor having detected 32% plagiarism.
>>>>
>>>>
>>>> I have contacted Urkund regarding this issue, having committed no such
>>>> plagiarism, and they have told me that all the plagiarism detected in my
>>>> document comes from the last 25%, which consists only of 'R' regressions
>>>> like the one I have pasted below:
>>>>
>>>> lm(formula = Prague50 ~ Fed + Fed.t.1. + Fed.t.2. + Fed.t.3. +
>>>>     Fed.t.4., data = OLS_CAR, x = TRUE)
>>>>
>>>> Residuals:
>>>>       Min        1Q    Median        3Q       Max
>>>> -0.154587 -0.015961  0.001429  0.017196  0.110907
>>>>
>>>> Coefficients:
>>>>              Estimate Std. Error t value Pr(>|t|)
>>>> (Intercept) -0.001630   0.001763  -0.925   0.3559
>>>> Fed         -0.121595   0.165359  -0.735   0.4627
>>>> Fed.t.1.     0.344014   0.140979   2.440   0.0153 *
>>>> Fed.t.2.     0.026529   0.143648   0.185   0.8536
>>>> Fed.t.3.     0.622357   0.142021   4.382 1.62e-05 ***
>>>> Fed.t.4.     0.291985   0.158914   1.837   0.0671 .
>>>> ---
>>>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>>
>>>> Residual standard error: 0.0293 on 304 degrees of freedom
>>>>   (20 observations deleted due to missingness)
>>>> Multiple R-squared:  0.08629,  Adjusted R-squared:  0.07126
>>>> F-statistic: 5.742 on 5 and 304 DF,  p-value: 4.422e-05
>>>>
>>>> I have produced all of these regressions myself and pasted them directly
>>>> from the 'R' software package. My regression methodology is entirely my
>>>> own along with the sourcing and preparation of the data used to produce
>>>> these statistics.
>>>>
>>>> I would be very grateful if you could provide me with some clarity as to
>>>> why this output from 'R' is reading as plagiarism.
>>>>
>>>> I would like to thank you in advance,
>>>>
>>>> Kind regards,
>>>>
>>>> Oliver Barrett
>>>> (+44) 7341 834 217
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd....@cbs.dk  Priv: pda...@gmail.com
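On Marc's question about crediting R: the citation he mentions comes straight out of R itself, and for a BibTeX-based thesis it can also be exported in that format. A minimal sketch using only base R; for a contributed CRAN package the same function takes the package name, e.g. citation("pkgname"), where "pkgname" is a placeholder rather than a real package.

```r
# Recommended citation for R itself, formatted as plain text.
print(citation(), style = "text")

# The same entry as BibTeX, for theses managed with BibTeX/biblatex.
print(toBibtex(citation()))

# The exact R version used; worth stating alongside the citation.
print(R.version.string)
```

Stating the version next to the citation also makes it clear which software produced the pasted output.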