It seems that the tokens are sorted by frequency:

    ...
    Collections.sort(profile, new TokenComparator());
    ...

and

    private static class TokenComparator implements Comparator<Token> {
        public int compare(Token t1, Token t2) {
            return t2.cnt - t1.cnt;
        }
    }

where cnt is the token count, so the profile is ordered by descending count.

Ludovic.

2011/4/7 Frederico Azeiteiro [via Lucene] <ml-node+2790579-1141723501-383...@n3.nabble.com> wrote:

> Well, at this point I'm more focused on the deduplication issue.
>
> Using a Min_token_len of 4 I'm getting nice comparison results. MLT returns
> a lot of docs as similar that I don't consider similar, even after tuning
> the parameters.
>
> While finishing this issue, I found out that the signature also contains
> the field name, meaning that if you want to sign both the title and text
> fields, your signature will be a hash of ("text" + "text value" + "title" +
> "title value").
>
> In any case, I found that the HashMap used in the hash algorithm orders the
> tokens by some internal HashMap mechanism that I can't understand :), so it
> is impossible to copy into a C# implementation.
>
> Thank you for all your help,
> Frederico

-----
Jouve
France.

--
View this message in context: http://lucene.472066.n3.nabble.com/Using-MLT-feature-tp2774454p2794585.html
Sent from the Solr - User mailing list archive at Nabble.com.
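For anyone porting this (the C# case in the thread), here is a minimal Java sketch of the idea, under stated assumptions. Token counts are collected in a HashMap, then the list is sorted by descending count, so the HashMap's internal order can only matter among tokens with equal counts. Collections.sort is documented as stable, so with a count-only comparator like the one quoted above, ties keep the HashMap iteration order. The names ProfileSketch, profile and minTokenLen, the tokenizing regex, the "longer than minTokenLen" filter and the alphabetical tie-breaker are all hypothetical. The sketch also skips any quantization of counts, so it is not Solr's actual signature code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ProfileSketch {

    // Hypothetical token holder: the token text plus its count (cnt, as in the thread).
    static final class Token {
        final String val;
        int cnt;

        Token(String val) {
            this.val = val;
        }

        @Override
        public String toString() {
            return val + " " + cnt;
        }
    }

    // Builds a frequency-sorted profile of lower-cased letter/digit runs.
    // Assumptions: a token is kept only if it is longer than minTokenLen; no quantization.
    static List<String> profile(String content, int minTokenLen) {
        Map<String, Token> tokens = new HashMap<>();
        for (String raw : content.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (raw.length() > minTokenLen) {
                tokens.computeIfAbsent(raw, Token::new).cnt++;
            }
        }
        List<Token> profile = new ArrayList<>(tokens.values());
        // Descending count first (what TokenComparator does), then the token text as a
        // hypothetical tie-breaker, so equal counts no longer depend on HashMap order.
        profile.sort(Comparator.comparingInt((Token t) -> t.cnt).reversed()
                .thenComparing((Token t) -> t.val));
        List<String> out = new ArrayList<>();
        for (Token t : profile) {
            out.add(t.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(profile("the cat and the hat and the bat", 2));
    }
}
```

Running main prints [the 3, and 2, bat 1, cat 1, hat 1]. Dropping the thenComparing step gives the count-only ordering of the quoted comparator, where "bat", "cat" and "hat" would come out in whatever order the HashMap iterated them. A port that must match Solr's hashes exactly would need to reproduce that order rather than impose its own.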