Re: TextProfileSignature using deduplication

Andrzej Bialecki Thu, 20 Nov 2008 06:47:00 -0800

Mark Miller wrote:

Thanks for sharing, Marc, that's very nice to know. I'll take your experience as a starting point for some wiki recommendations.
Sounds like we should add a switch to order alpha as well.

On the general note of near-duplicate detection ... I found this paper in the proceedings of SIGIR-08, which presents an interesting and relatively simple algorithm that yields excellent results. Who has some spare CPU cycles to implement this? ;)


http://ilpubs.stanford.edu:8090/860/

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
