Dear Andrew Halford,

Here is just a suggestion from an R newbie without a sufficient statistical background:
MCMCpoisson from package MCMCpack.

Could you please post the answer to your very interesting question later, once you find it?

With best regards,
Denis

On Tue, 08/02/2011 at 14:38 +1000, Andrew Halford wrote:
> Hi R-Users,
>
> I have a student doing work with lionfish, and she has been trying to analyse
> a multivariate dataset to see which variables/factors influence the
> behaviour of lionfish. We have attempted a number of analyses, including
> rpart, relaimpo and standard linear regression, but we are not having much
> luck getting good-quality output. The data are very non-normal, and we would
> appreciate some advice on the best way to go about analysing them.
>
> Kathy has provided a synopsis below, along with part of the dataset.
>
> Any help/advice appreciated.
>
> I am stuck on a problem with a dataset from a behavior study on Indo-Pacific
> lionfish *Pterois volitans*. The idea is to find out whether lionfish behave
> differently at different locations and times of day, and whether these
> differences can be accounted for by any of the explanatory variables
> measured.
> My response variable is a series of behavior categories: (1) rest, (2)
> passive hunting and (3) active hunting. I have chosen to treat them
> individually because each one has a different biological importance, so
> basically I am trying to come up with an answer for 3 response variables.
> Each behavior category is measured as the proportion of time (in a 10-minute
> observation) spent on the activity, so values range from 0 to 1.
> The explanatory variables are a mix of categorical and continuous variables,
> six in total: Region (Guam and Philippines), Hours after Sunrise, Habitat
> (5 categories), Weather (3 categories), Current (3 categories) and Lionfish
> Size (cm).
>
> The following is an example of the dataset for the response variable Rest (R)
> (REG = region, HAS = hours after sunrise, HAB = habitat, WE = weather,
> CU = current, SI = lionfish size in cm):
>
> R     REG  HAS         HAB                WE  CU  SI
> 0.05  0    11.0166667  Artificial         2   0   10
> 0.05  0    0.56666667  Rock_boulder_cave  1   1   11
> 0.05  0    9.13333333  Artificial         1   1   18
> 0.1   0    4.2         Sand_rubble        1   2   20
> 0.1   0    9.13333333  Rock_boulder_cave  1   2   10
> 0.1   0    9.6         Sand_rubble        0   0   7
> 0.1   0    0.78333333  Rock_boulder_cave  1   0   31
> 0.1   0    1.28333333  Artificial         1   0   20
> 0.1   0    10.8666667  Coral              1   0   22
> 0.15  0    10.4166667  Coral              0   1   27
> 0.2   0    3.46666667  Rock_boulder_cave  0   0   8
> 0.2   0    1.23333333  Rock_boulder_cave  1   0   25
> 0.45  1    11.6833333  Coral              2   0   15
> 0.5   1    11.0166667  Artificial         1   2   14
> 0.5   1    11.9166667  Artificial         0   0   14
> 0.5   1    9.53333333  Artificial         1   0   24
> 0.5   1    9.83333333  Artificial         1   0   15
> 0.5   1    11.5833333  Rock_boulder_cave  1   1   29
> 0.53  1    5.91666667  Coral              1   1   15
> 0.6   1    11.0166667  Artificial         1   2   17
> 0.6   1    9.78333333  Rock_boulder_cave  0   0   12
> 0.6   1    4.68333333  Sand_rubble        2   0   14
> 0.6   1    5.01666667  Rock_boulder_cave  2   0   16
> 0.6   1    3.18333333  Artificial         2   1   19
> 0.65  1    5.25        Coral              2   0   15
> 0.65  1    9.63333333  Sand_rubble        1   1   17
>
> As you can see, I have converted the categorical variables region, current
> and weather to numerical ones: region because it can be expressed in binary
> form, and the other two because they represent a
> quantity. For habitat, I have created a dummy variable based on deviation
> coding and introduced it as a variable in my model.
> The total sample size is 357, where each sample is an observation at a
> particular time of day. The histogram of my response variable is not normal:
> it is somewhat U-shaped, with lots of 0s and 1s, meaning that the animal was
> either completely engaged in that activity during the 10-minute observation
> or did not show it at all. I have tried a series of transformations to
> normalize it, without success (log, log(x+1), ln, sqrt, fourth root).
> What types of analyses have I tried?
> (1) Regression trees.
> I used the categorical variables as categorical, without changing them into
> numerical ones. This was coded with package rpart and is the preferred
> analysis because it is easy to interpret. The response variable was
> untransformed and the distribution chosen was Poisson. The result was a tree
> whose error (in the cp table) increased immediately, so the tree with 0
> splits was picked as the best tree.
>
> (2) Multiple regression
> I tried package relaimpo to rank the explanatory variables by importance.
> I used different transformations and analysed the residuals, and in all
> cases obtained a weird-looking set of residuals, with one portion normally
> distributed and another portion clustered to the side, giving the whole
> graph a clear trend (my guess is that these are all the 1s and 0s in the
> data).
> I also tried generalized linear models (glm) with package pscl (Poisson,
> negative binomial and zero-inflated negative binomial). In all cases the fit
> seemed adequate, but the variance explained was very small and the
> coefficients estimated for my explanatory variables (EVs) were very low.
>
> Any ideas? Lastly, I have used Primer to analyse the response variable in
> relation to each EV individually. That works well, but it limits my
> conclusions and does not allow me to account for variation in one EV
> affecting the others.
> I appreciate any help I can get.
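
For anyone who wants to try the suggestions above, the 26-row excerpt of the Rest data quoted in the message could be entered in R as a small reproducible example. This is a hypothetical sketch using base R only: the object names rest and X are illustrative assumptions, REG, WE and CU stay numeric as described in the post, and Habitat gets deviation (sum-to-zero) coding through contr.sum instead of a hand-built dummy variable. That is one way to do it, not necessarily the way Kathy coded it.

```r
# Hypothetical reproducible example: the 26-row excerpt of the Rest data quoted
# above (the full dataset has 357 rows). REG, WE and CU are kept numeric, as
# described in the post; only HAB is treated as a factor.
rest <- data.frame(
  R   = c(0.05, 0.05, 0.05, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.15, 0.2, 0.2, 0.45,
          0.5, 0.5, 0.5, 0.5, 0.5, 0.53, 0.6, 0.6, 0.6, 0.6, 0.6, 0.65, 0.65),
  REG = c(rep(0, 12), rep(1, 14)),
  HAS = c(11.0166667, 0.56666667, 9.13333333, 4.2, 9.13333333, 9.6, 0.78333333,
          1.28333333, 10.8666667, 10.4166667, 3.46666667, 1.23333333, 11.6833333,
          11.0166667, 11.9166667, 9.53333333, 9.83333333, 11.5833333, 5.91666667,
          11.0166667, 9.78333333, 4.68333333, 5.01666667, 3.18333333, 5.25,
          9.63333333),
  HAB = c("Artificial", "Rock_boulder_cave", "Artificial", "Sand_rubble",
          "Rock_boulder_cave", "Sand_rubble", "Rock_boulder_cave", "Artificial",
          "Coral", "Coral", "Rock_boulder_cave", "Rock_boulder_cave", "Coral",
          "Artificial", "Artificial", "Artificial", "Artificial",
          "Rock_boulder_cave", "Coral", "Artificial", "Rock_boulder_cave",
          "Sand_rubble", "Rock_boulder_cave", "Artificial", "Coral",
          "Sand_rubble"),
  WE  = c(2, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 2, 1, 0, 1, 1, 1, 1, 1, 0, 2, 2, 2,
          2, 1),
  CU  = c(0, 1, 1, 2, 2, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 1, 1, 2, 0, 0, 0, 1,
          0, 1),
  SI  = c(10, 11, 18, 20, 10, 7, 31, 20, 22, 27, 8, 25, 15, 14, 14, 24, 15, 29,
          15, 17, 12, 14, 16, 19, 15, 17)
)

# Deviation (sum-to-zero) coding for Habitat: with k levels, contr.sum(k) gives
# k - 1 columns; each level except the last gets a 1 in its own column, and the
# last level is coded -1 in every column. Only 4 of the 5 habitats occur here.
rest$HAB <- factor(rest$HAB)
contrasts(rest$HAB) <- contr.sum(nlevels(rest$HAB))

# The design matrix shows the deviation-coded habitat columns R builds itself.
X <- model.matrix(~ REG + HAS + HAB + WE + CU + SI, data = rest)
print(head(X))
```

With the contrasts set this way, a model such as lm(R ~ REG + HAS + HAB + WE + CU + SI, data = rest) reports the habitat coefficients as deviations from the unweighted average over habitats rather than as contrasts with a reference habitat.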

