Hi Hadley,

actually, I started with floating point numbers and ensured that the respective numbers are equal in R, but I still got strange behaviour with dplyr's group_by:
https://github.com/hadley/dplyr/issues/482

If I had to guess, I would suppose the source of this error is somewhere in the C++ part of dplyr. This happened on only one of the machines I have available. Whether this is a bug in dplyr, or in the older machine's libraries, or not a bug at all, I cannot say. Nonetheless, this confirmed my feelings about avoiding floating point numbers in this context and led me to ask for advice here...

Sebastian

On 04.07.2014 17:33, Hadley Wickham wrote:
> Why not just round the floating point numbers to ensure they're equal
> with zapsmall, round or signif?
>
> Hadley
>
> On Fri, Jul 4, 2014 at 4:04 AM, Sebastian Schubert
> <schubert....@gmail.com> wrote:
>> Hi,
>>
>> I would like to ask for best practice advice on the design of data
>> structure and the connected analysis techniques.
>>
>> In my particular case, I have measurements of several variables at
>> several, sometimes equal, heights. Following the tidy data approach of
>> Hadley Wickham, I want to put all data in one data frame. In principle,
>> the height variable is something like a category. For example, I want to
>> average over time for every height. Using dplyr this works very well
>> when my height variable is a factor. However, if it is not a factor the
>> grouping sometimes will not work, probably due to numerical issues:
>>
>> http://stackoverflow.com/questions/24555010/dplyr-and-group-by-factor-vs-no-factor
>> https://github.com/hadley/dplyr/issues/482
>>
>> Even if the behaviour described in the links above is a bug, one can
>> easily create other numerical issues in R:
>> > (0.1+0.2) == 0.3
>> [1] FALSE
>>
>> Thus, it seems one should avoid grouping by float values and, in my
>> case, use factors. However, from time to time, I need the numerical
>> character of the heights: compare heights, find the maximum height, etc.
>> Here, the ordered factor approach might help. However, I have to combine
>> (via rbind or merge) different data sets quite often, so keeping the
>> order of the different ordered factor heights also seems to be difficult.
>>
>> Is there any general approach which reduces the work or do I have to
>> switch between approaches as needed?
>>
>> Thanks a lot for any input,
>> Sebastian
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
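
For illustration, the rounding suggestion quoted above could look roughly like the sketch below. It uses base R's aggregate rather than dplyr, and the data frame, the column names (height, value, height_key) and the rounding precision of 6 digits are all assumptions made up for this example, not taken from the original data. The idea is to group on a rounded numeric key, so the heights stay numeric for comparisons and for rbind, and no factor levels have to be kept in sync.

```r
# Hypothetical data: the second height is computed and carries rounding error
df <- data.frame(
  height = c(0.3, 0.1 + 0.2, 2.0, 2.0),
  value  = c(1, 3, 10, 20)
)

# The two "0.3" heights are not equal as doubles
df$height[1] == df$height[2]   # FALSE, as in the (0.1+0.2) == 0.3 example

# Round to a precision well below the measurement resolution
# (6 digits is an assumption; choose what fits the data)
df$height_key <- round(df$height, digits = 6)

# Average the values for every rounded height
means <- aggregate(value ~ height_key, data = df, FUN = mean)

# The key is still numeric, so numerical operations keep working
max(means$height_key)
```

Whether round, signif or zapsmall is the better fit depends on the scale of the heights; round with a fixed number of digits is simply the easiest to reason about for values of similar magnitude.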