Hi. Yes, I managed to create a stable comparator in C# for the profile. The problem comes before that, at:

    ...
    tokens.put(s, tok);
    ...

Imagine you have 2 tokens with the same frequency. With a stable sort comparator for the profile, they will keep their original order. The problem is that the original order comes from the way they are stored in the HashMap 'tokens', not from the order in which the tokens appear in the original text.

Frederico

-----Original Message-----
From: lboutros [mailto:boutr...@gmail.com]
Sent: Friday, 8 April 2011 09:49
To: solr-user@lucene.apache.org
Subject: Re: Using MLT feature

It seems that tokens are sorted by frequency:

    ...
    Collections.sort(profile, new TokenComparator());
    ...

and

    private static class TokenComparator implements Comparator<Token> {
        public int compare(Token t1, Token t2) {
            return t2.cnt - t1.cnt;
        }
    }

and cnt is the token count.

Ludovic.

2011/4/7 Frederico Azeiteiro [via Lucene] <
ml-node+2790579-1141723501-383...@n3.nabble.com>

> Well, at this point I'm more dedicated to the Deduplicate issue.
>
> Using a Min_token_len of 4 I'm getting nice comparison results. MLT returns
> a lot of similar docs that I don't consider similar - even tuning the
> parameters.
>
> Finishing this issue, I found out that the signature also contains the
> field name, meaning that if you wish to sign both title and text fields,
> your signature will be a hash of ("text"+"text value"+"title"+"title
> value").
>
> In any case, I found that the HashMap used in the hash algorithm orders
> the tokens by some internal HashMap method that I can't understand :),
> and so it is impossible to copy to the C# implementation.
>
> Thank you for all your help,
> Frederico
>

-----
Jouve
France.
--
View this message in context: http://lucene.472066.n3.nabble.com/Using-MLT-feature-tp2774454p2794585.html
Sent from the Solr - User mailing list archive at Nabble.com.
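
To illustrate the tie-ordering issue raised at the top of this message, here is a minimal Java sketch. It is not the Solr code: the class name TokenOrderSketch, the profile() helper and the whitespace tokenization are assumptions made for the example, and Token is a hypothetical stand-in for the class quoted in the thread. It shows that if the tokens were kept in a LinkedHashMap (which iterates in insertion order) instead of a HashMap, a stable sort by count would leave equal-frequency tokens in the order they first appear in the text. Note that this changes the Java side; it does not reproduce HashMap's own iteration order in another language.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TokenOrderSketch {

    // Hypothetical stand-in for the Token class quoted in the thread.
    static final class Token {
        final String text;
        int cnt;

        Token(String text) {
            this.text = text;
        }
    }

    // Counts whitespace-separated tokens and returns them by descending count.
    // Tokens with equal counts stay in order of first appearance in the text.
    static List<String> profile(String text) {
        // LinkedHashMap iterates in insertion order; a plain HashMap gives no such guarantee.
        Map<String, Token> tokens = new LinkedHashMap<>();
        for (String s : text.trim().split("\\s+")) {
            if (s.isEmpty()) {
                continue; // empty input yields a single empty string from split
            }
            tokens.computeIfAbsent(s, Token::new).cnt++;
        }

        List<Token> profile = new ArrayList<>(tokens.values());
        // List.sort is documented as stable, so ties keep their relative order.
        // Integer.compare avoids the overflow risk of subtracting the counts.
        profile.sort((t1, t2) -> Integer.compare(t2.cnt, t1.cnt));

        List<String> result = new ArrayList<>();
        for (Token t : profile) {
            result.add(t.text);
        }
        return result;
    }

    public static void main(String[] args) {
        // "b" and "a" both occur twice; "b" appears first in the text.
        System.out.println(profile("b a b a c")); // prints [b, a, c]
    }
}
```

With an insertion-ordered map on the Java side, a port to another language would only need an insertion-ordered dictionary and a stable sort to produce the same profile order.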