I get a different set of errors than you do (what version of R are you using?).

Patrizio showed one way to do what you want.  But, what is it that you are 
really trying to accomplish?  What do you think the result of 20,000 normality 
tests (each of which may not be answering the real question on its own) will 
tell you?

Your code below mixes several concepts; it would benefit you to learn more 
about each one and when to use it.  If y is fully numeric, then it is more 
efficient to use a matrix than a data frame.  
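
A minimal sketch of the matrix conversion, assuming a made-up, all-numeric y 
(the simulated data and the name ym are only for illustration, not your real 
data):

```r
# Hypothetical stand-in for the poster's data: 20,000 rows, 20 numeric columns.
set.seed(1)
y <- as.data.frame(matrix(rnorm(20000 * 20), nrow = 20000))

# as.matrix() turns an all-numeric data frame into a numeric matrix.
ym <- as.matrix(y)

is.numeric(ym)          # TRUE: every cell is stored as a number
dim(ym)                 # same shape as the data frame

# Indexing one row of a matrix gives a plain numeric vector,
# while indexing one row of a data frame gives a 1-row data frame.
is.numeric(ym[1, ])
is.data.frame(y[1, ])
```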

In your loop you assign y.temp to be a list containing 1 row from y; this 
results in a list with 1 element, which is a 1-row data frame.  Why make it a 
list?  Do you really want it to stay a data frame or be a vector?
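
To see what that assignment actually produces, here is a small sketch on toy 
data (the data frame and the name row1 are invented for illustration):

```r
# Toy data frame standing in for y (hypothetical values).
y <- data.frame(a = c(1.2, 3.4), b = c(5.6, 7.8), c = c(9.1, 2.3))

# What the loop builds: a list holding a single 1-row data frame.
y.temp <- list(y[1, ])
length(y.temp)          # the list has exactly one element
class(y.temp[[1]])      # and that element is still a data frame

# If a vector is what you want, unlist() the row instead.
row1 <- unlist(y[1, ])
is.numeric(row1)        # a (named) numeric vector
```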

You then run lapply on a list with only one element.  That works, but it is a 
bit wasteful and accomplishes nothing more than running the function on the 
single element directly.

Then the shapiro.test function is passed a data frame when it is expecting a 
vector (this gives an error on my install, you may have something different 
going on).
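
A hedged illustration of the difference, using simulated data in place of 
yours.  The exact error message depends on the R version, so the sketch only 
checks whether an error occurred:

```r
# Simulated stand-in for y: 3 rows, 20 numeric columns (hypothetical).
set.seed(1)
y <- as.data.frame(matrix(rnorm(3 * 20), nrow = 3))

# Passing a 1-row data frame: expected to fail, so catch the error.
res_df <- tryCatch(shapiro.test(y[1, ]), error = function(e) e)
inherits(res_df, "error")

# Passing the same row as a numeric vector works and returns an "htest" object.
res_vec <- shapiro.test(unlist(y[1, ]))
class(res_vec)
```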

Then testtable is being overwritten each time through the loop, so you are 
throwing away most of your work without ever doing anything with it.
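
If you do want to keep the loop, one common pattern is to create the results 
table once, before the loop, and fill in one row per pass.  A sketch, assuming 
y is a numeric matrix (simulated here) and reusing your names nr and testtable:

```r
# Simulated numeric matrix standing in for y (hypothetical data).
set.seed(42)
y <- matrix(rnorm(5 * 20), nrow = 5)
nr <- nrow(y)

# Allocate the full results table up front so nothing gets overwritten.
testtable <- matrix(NA_real_, nrow = nr, ncol = 2,
                    dimnames = list(NULL, c("W", "p.value")))

for (j in seq_len(nr)) {
  sw <- shapiro.test(y[j, ])                      # y[j, ] is a numeric vector
  testtable[j, ] <- c(sw$statistic, sw$p.value)   # store row j, keep earlier rows
}

testtable
```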

Why use both the loop and the apply functions?

Why not just try something like apply(y, 1, shapiro.test) ?
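
A sketch of how that could look, including one possible way to pull the 
statistic and p-value into a table.  The simulated y and the names tests and 
results are assumptions for illustration:

```r
# Simulated numeric matrix standing in for y (hypothetical data).
set.seed(42)
y <- matrix(rnorm(100 * 20), nrow = 100)

# One Shapiro-Wilk test per row; the result is a list of "htest" objects.
tests <- apply(y, 1, shapiro.test)

# Collect W and the p-value from each test into a 2-column matrix.
results <- t(sapply(tests, function(x) {
  c(W = unname(x$statistic), p.value = x$p.value)
}))

head(results)
```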

And overall, what are you trying to accomplish?  As written, the result is 
likely to be less useful than just generating random numbers.
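
To make the concern concrete, here is a rough simulation sketch (made-up data, 
2,000 rows rather than 20,000 to keep it quick) in which every row really is 
normal.  A fraction of the rows still gets flagged at the 0.05 level purely by 
chance, and that fraction should come out somewhere near 5%:

```r
# Every row is drawn from a normal distribution, so any "non-normal" call is a
# false positive.  n_rows and n_cols are arbitrary choices for this sketch.
set.seed(123)
n_rows <- 2000
n_cols <- 20
y <- matrix(rnorm(n_rows * n_cols), nrow = n_rows)

# p-value of the Shapiro-Wilk test for each row.
p <- apply(y, 1, function(r) shapiro.test(r)$p.value)

# Proportion of truly normal rows flagged as non-normal at the 0.05 level.
mean(p < 0.05)
```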

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-project.org]
> On Behalf Of DB1984
> Sent: Thursday, February 03, 2011 8:52 PM
> To: r-help@r-project.org
> Subject: [R] Finding non-normal distributions per row of data frame?
> 
> 
> This is my first attempt at this, so hopefully a few kind pointers can get
> me going in the right direction...
> 
> I have a large data frame of 20+ columns and 20,000 rows. I'd like to
> evaluate the distribution of values in each row, to determine whether they
> meet the criteria of a normal distribution. I'd loop this over all the rows
> in the data frame, and output the summary results to a new data frame.
> 
> I have a loop that should run a Shapiro-Wilk test over each row,
> 
> y= data frame
> 
> for (j in 1:nr) {
> y.temp<-list(y[j,])
> testsw <- lapply(y.temp, shapiro.test)
> testtable <- t(sapply(testsw, function(x) c(x$statistic, x$p.value)))
>  colnames(testtable) <- c("W", "p.value")
> }
> 
> 
> but it is currently throwing out an error:
>  "Error in `rownames<-`(`*tmp*`, value = "1") :
>   attempt to set rownames on object with no dimensions"
> 
> ...which I guess is unrelated to the evaluation of normality, and more
> likely a faulty loop?
> 
> Any suggestions either for this test, or a better way to evaluate the
> normal distribution (e.g. qq-plot residuals for each row) would be greatly
> received. Thanks.
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Finding-non-normal-distributions-per-row-of-data-frame-tp3259439p3259439.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
