Re: [Rd] Canberra distance

Christophe Genolini Sat, 06 Feb 2010 08:32:06 -0800

The definition I use is the one found in the book "Cluster Analysis" by Brian Everitt, Sabine Landau and Morven Leese. As the defining paper for the Canberra distance, they cite an article by Lance and Williams, "Computer programs for hierarchical polythetic classification", Computer Journal, 1966. I do not have access to it, but here is the link:
http://comjnl.oxfordjournals.org/cgi/content/abstract/9/1/60

Hope this helps.

Christophe

On 06/02/2010 10:39 AM, Christophe Genolini wrote:
Hi list,
According to what I know, the Canberra distance between X and Y is:
    sum[ (|x_i - y_i|) / (|x_i| + |y_i|) ]
(with | | denoting the absolute value function). In the source code of the Canberra distance, in the file distance.c, we find:
    sum = fabs(x[i1] + x[i2]);
    diff = fabs(x[i1] - x[i2]);
    dev = diff/sum;

which corresponds to the formula:
    sum[ (|x_i - y_i|) / (|x_i + y_i|) ]
(Note that this does not define a distance. It is correct when x_i and y_i are both positive, but not when a value is negative.)
Is it on purpose or is it a bug?
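The remark that the |x_i + y_i| form does not define a distance can be illustrated with a small numeric check. What follows is only a sketch, not the code in distance.c: the function names are hypothetical, and zero denominators are deliberately left unhandled. For a single coordinate, take the points 1, 0 and -0.5. With |a + b| in the denominator the direct pair (1, -0.5) gives 3, while going through 0 gives 1 + 1 = 2, so the triangle inequality fails. With |a| + |b| in the denominator the same pair gives 1.

```c
#include <math.h>

/* One coordinate's term with |a + b| in the denominator, as in the
   quoted excerpt. The caller must ensure a + b != 0 (not handled here). */
double term_abs_of_sum(double a, double b)
{
    return fabs(a - b) / fabs(a + b);
}

/* One coordinate's term with |a| + |b| in the denominator.
   The caller must ensure a and b are not both zero. */
double term_sum_of_abs(double a, double b)
{
    return fabs(a - b) / (fabs(a) + fabs(b));
}

/* Returns 1 if d(a, c) <= d(a, b) + d(b, c) for the given term function,
   and 0 otherwise. */
int triangle_holds(double (*d)(double, double), double a, double b, double c)
{
    return d(a, c) <= d(a, b) + d(b, c);
}
```

For the three points above, triangle_holds(term_abs_of_sum, 1.0, 0.0, -0.5) returns 0, while triangle_holds(term_sum_of_abs, 1.0, 0.0, -0.5) returns 1.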
It matches the documentation in ?dist, so it's not just a coding error. It will give the same value as your definition if the two items have the same sign (not only both positive), but different values if the signs differ.
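The point about signs can be checked with a short C sketch of both formulas. This is not the code from distance.c: the function names are hypothetical, and zero denominators and missing values are not handled.

```c
#include <math.h>
#include <stddef.h>

/* Sum of |x_i - y_i| / |x_i + y_i|, the form in the quoted excerpt.
   Assumes no coordinate pair has x_i + y_i == 0. */
double canberra_abs_of_sum(const double *x, const double *y, size_t n)
{
    double dist = 0.0;
    for (size_t i = 0; i < n; i++)
        dist += fabs(x[i] - y[i]) / fabs(x[i] + y[i]);
    return dist;
}

/* Sum of |x_i - y_i| / (|x_i| + |y_i|).
   Assumes no coordinate pair is (0, 0). */
double canberra_sum_of_abs(const double *x, const double *y, size_t n)
{
    double dist = 0.0;
    for (size_t i = 0; i < n; i++)
        dist += fabs(x[i] - y[i]) / (fabs(x[i]) + fabs(y[i]));
    return dist;
}
```

With x = (1, -2) and y = (3, -6), where each coordinate pair shares a sign, both functions return 1. With y = (-3, 6), where the signs differ in each coordinate, the first returns 4 and the second returns 2.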
The first three links I found searching Google Scholar for "Canberra distance" all define it only for non-negative data. One of them gave exactly the R formula (even though the absolute value in the denominator is redundant), the others just put x_i + y_i in the denominator.
None of the 3 papers cited the origin of the definition, so I can't tell you who is wrong.
Duncan Murdoch


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
