Couldn't you extend TextProfileSignature and modify the TokenComparator class to use lexical order when tokens have the same frequency?
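
Roughly something like this. It is only a sketch, not the actual Solr code: the Token class below is a stand-in I wrote for illustration, assuming a `cnt` field for the count and a `val` field for the token text, and LexicalTieBreak and sortedValues are made-up names. Since TokenComparator is a private nested class, you may well have to copy the signature class rather than simply subclass it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class LexicalTieBreak {

    // Hypothetical stand-in for the Token used by the signature code:
    // 'val' is the token text, 'cnt' is the token count (assumed field names).
    static class Token {
        final String val;
        final int cnt;

        Token(String val, int cnt) {
            this.val = val;
            this.cnt = cnt;
        }
    }

    // Highest count first; tokens with equal counts are ordered by their text,
    // so the result no longer depends on the order in which they were inserted.
    static class TokenComparator implements Comparator<Token> {
        @Override
        public int compare(Token t1, Token t2) {
            int byCount = Integer.compare(t2.cnt, t1.cnt);
            return byCount != 0 ? byCount : t1.val.compareTo(t2.val);
        }
    }

    // Sorts a copy of the tokens and returns only their text, for easy checking.
    static List<String> sortedValues(List<Token> tokens) {
        List<Token> profile = new ArrayList<>(tokens);
        Collections.sort(profile, new TokenComparator());
        List<String> out = new ArrayList<>();
        for (Token t : profile) {
            out.add(t.val);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
                new Token("solr", 2), new Token("lucene", 2), new Token("index", 5));
        System.out.println(sortedValues(tokens)); // prints [index, lucene, solr]
    }
}
```

With the tie-break on the token text, the same set of tokens gives the same profile whatever order the map hands them out in, which should make it reproducible in C# as long as the string comparison there is ordinal as well.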

Ludovic.

2011/4/8 Frederico Azeiteiro [via Lucene] <ml-node+2794604-1683988626-383...@n3.nabble.com>

> Hi.
>
> Yes, I managed to create a stable comparator in C# for profile.
> The problem comes before that, at:
>
> ...
> tokens.put(s, tok);
> ...
>
> Imagine you have 2 tokens with the same frequency: the stable sort
> comparator for profile will maintain their original order.
> The problem is that the original order comes from the way they are
> inserted in the hashmap 'tokens', not from the order in which the tokens
> appear in the original text.
>
> Frederico
>
> -----Original Message-----
> From: lboutros [mailto:[hidden email]]
> Sent: Friday, 8 April 2011 09:49
> To: [hidden email]
> Subject: Re: Using MLT feature
>
> It seems that tokens are sorted by frequencies:
>
> ...
> Collections.sort(profile, new TokenComparator());
> ...
>
> and
>
> private static class TokenComparator implements Comparator<Token> {
>   public int compare(Token t1, Token t2) {
>     return t2.cnt - t1.cnt;
>   }
> }
>
> and cnt is the token count.
>
> Ludovic.
>
> 2011/4/7 Frederico Azeiteiro [via Lucene] <[hidden email]>
>
> > Well at this point I'm more dedicated to the Deduplicate issue.
> >
> > Using a Min_token_len of 4 I'm getting nice comparison results. MLT returns
> > a lot of similar docs that I don't consider similar - even tuning the
> > parameters.
> >
> > Finishing this issue, I found out that the signature also contains the
> > field name, meaning that if you wish to sign both title and text fields,
> > your signature will be a hash of ("text"+"text value"+"title"+"title value").
> >
> > In any case, I found that the Hashmap used in the hash algorithm inserts
> > the tokens by some hashmap internal sort method that I can't understand :),
> > and so, impossible to copy to the C# implementation.
> >
> > Thank you for all your help,
> > Frederico
>
> -----
> Jouve
> France.
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Using-MLT-feature-tp2774454p2794585.html
> Sent from the Solr - User mailing list archive at Nabble.com.

-----
Jouve
France.
--
View this message in context: http://lucene.472066.n3.nabble.com/Using-MLT-feature-tp2774454p2794622.html
Sent from the Solr - User mailing list archive at Nabble.com.