Hi,

I would like to ask for best-practice advice on designing a data structure and the analysis techniques that go with it.
In my particular case, I have measurements of several variables at several heights, some of which coincide. Following Hadley Wickham's tidy data approach, I want to put all the data in one data frame. In principle, the height variable acts like a category: for example, I want to average over time for every height. With dplyr this works very well when my height variable is a factor. If it is not a factor, however, the grouping sometimes fails, probably because of numerical issues:

http://stackoverflow.com/questions/24555010/dplyr-and-group-by-factor-vs-no-factor
https://github.com/hadley/dplyr/issues/482

Even if the behaviour described in the links above is a bug, one can easily create other numerical issues in R:

    > (0.1+0.2) == 0.3
    [1] FALSE

Thus, it seems one should avoid grouping by floating-point values and, in my case, use factors. However, from time to time I need the heights as numbers: to compare heights, find the maximum height, etc. Here, an ordered factor might help. However, I quite often have to combine different data sets (via rbind or merge), so keeping the levels of the different ordered factors in a consistent order also seems difficult.

Is there a general approach that reduces the work, or do I have to switch between approaches as needed?

Thanks a lot for any input,
Sebastian

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
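
P.S. To make the problem concrete, here is a minimal sketch in base R. The data frame, its column names (height, temp) and all values are made up for illustration and are not my real measurements. It only shows the behaviour described above: the raw doubles split one height into two groups, a factor merges them, and getting the numbers back from the factor needs an explicit conversion.

```r
# Hypothetical toy data; column names and values are made up for illustration.
d <- data.frame(
  height = c(0.3, 0.1 + 0.2, 2, 2),  # 0.1 + 0.2 is not exactly 0.3
  temp   = c(10, 12, 20, 22)
)

# Grouping on the raw doubles sees three distinct heights instead of two.
n_raw <- length(unique(d$height))

# factor() builds its levels from as.character(), which collapses the two
# floating-point representations of 0.3 into a single level.
d$height_f <- factor(d$height)
n_fac <- nlevels(d$height_f)

# Averaging per height now gives one value per level.
mean_by_height <- tapply(d$temp, d$height_f, mean)

# To use the heights as numbers again, convert the level labels back.
# as.numeric(d$height_f) alone would return the integer codes 1 and 2.
height_num <- as.numeric(as.character(d$height_f))
max_height <- max(height_num)
```

The conversion in the last step is the part I would like to avoid repeating every time I need the numeric heights.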