On Sat, Sep 25, 2010 at 6:24 PM, Lorenzo Isella <lorenzo.ise...@gmail.com> wrote:
>> ld represent the distance as the proportion of maximum possible
>> distance, i.e. scaling it to be between 0 and 1.
>>
>> An example:
>> A and B have the same length (x), and you calculate the emd(A, B), which
>> is d.
>> Now you have to determine the maximum distance between these two:
>> remembering the analogy of moving earth, the biggest distance between
>> the two distributions would be if in A, all elements would be in A(1)
>> and all others would be zero, and in B all elements would be zero, except
>> for B(x). Now you can calculate the difference between these two, and you
>> get dmax.
>> The last step is to divide d/dmax, i.e. scaling to a value between 0 and
>> 1.
>>
>> This value can then be compared with the same ratio obtained from C and
>> D with length y.
>>
>> One important point to keep in mind when using the emd: if the sum(A) is
>> not the same as sum(B), emd(A,B) is NOT EQUAL to emd(B,A). If this
>> applies to your case, you have to decide what to do, but one option is
>> to standardise A and B so that their sum is the same (effectively
>> comparing the SHAPES and not the actual values).
>>
>
> OK, I see. The standardization part is not a terrible problem, I guess.
> The other bit is less clear (to me). What are A(1) and B(x)? Am I piling up
> all the elements in A and B in a single bin?
> Cheers
>

OK. Some code:

> set.seed(13)
> B <- sample(1:10, 10)
> B
 [1]  8  3  4  1  6  7  9 10  2  5
> set.seed(13)
> A <- sample(1:10, 10)
> B <- sample(1:10, 10)
> A
 [1]  8  3  4  1  6  7  9 10  2  5
> B
 [1]  7  8  9  4 10  2  5  6  3  1
> A[1] <- sum(A)
> A[-1] <- 0
> B[length(B)] <- sum(B)
> B[-length(B)] <- 0
> A
 [1] 55  0  0  0  0  0  0  0  0  0
> B
 [1]  0  0  0  0  0  0  0  0  0 55

And now you can calculate the emd(A, B), which is then the maximum
distance between A and B.

Imagine: the distance is the work you have to do to convert A into B.
Work equals distance times the mass you have to move. Therefore you have
to maximise both the distance over which you carry the earth and the
amount you carry.
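To make the "work = mass times distance" picture concrete, here is a
minimal sketch of the d/dmax scaling. It does not use the package's emd();
instead it assumes 1-D histograms on the same unit-spaced bins with equal
sums, in which case the total work reduces to the sum of absolute
cumulative differences. The helper names emd1d() and emd1d_max() are
hypothetical, and the values of A and B are typed in from the session
above rather than regenerated with sample(), since the result of
set.seed(13) can differ between R versions.

```r
# Hypothetical helper: total work needed to turn histogram a into b,
# assuming unit-spaced bins and sum(a) == sum(b).
emd1d <- function(a, b) {
  stopifnot(length(a) == length(b),
            isTRUE(all.equal(sum(a), sum(b))))
  # The mass that must cross the boundary after bin i is the running
  # surplus cumsum(a - b)[i]; each crossing costs one unit of distance.
  sum(abs(cumsum(a - b)))
}

# Hypothetical helper: the extreme case described in the text, with all
# of A piled into the first bin and all of B piled into the last bin.
emd1d_max <- function(a, b) {
  n <- length(a)
  a_ext <- c(sum(a), rep(0, n - 1))
  b_ext <- c(rep(0, n - 1), sum(b))
  emd1d(a_ext, b_ext)
}

# Values copied from the console session above.
A <- c(8, 3, 4, 1, 6, 7, 9, 10, 2, 5)
B <- c(7, 8, 9, 4, 10, 2, 5, 6, 3, 1)

d    <- emd1d(A, B)
dmax <- emd1d_max(A, B)  # 55 units of mass carried over 9 bins: 55 * 9 = 495
ratio <- d / dmax        # scaled distance between 0 and 1
print(c(d = d, dmax = dmax, ratio = ratio))
```

Note that this sketch measures total work without dividing by the moved
mass, so the absolute numbers may differ from what the package's emd()
returns if it normalises differently; the ratio d/dmax is the quantity
that is meant to be comparable across lengths.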
So, piling everything up in the first element of A and in the last
element of B gives you the most work you have to do, which equals the
largest distance.

Even though it is rather straightforward, I should probably add a
function to the package which gives you the largest distance between two
distributions. I'll think about it.

Hope this helps,

Cheers,

Rainer

>
> Lorenzo
>

-- 
Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation
Biology, UCT), Dipl. Phys. (Germany)

Centre of Excellence for Invasion Biology
Natural Sciences Building
Office Suite 2039
Stellenbosch University
Main Campus, Merriman Avenue
Stellenbosch
South Africa