It seems that tokens are sorted by frequency (most frequent first):

...
Collections.sort(profile, new TokenComparator());
...


and

private static class TokenComparator implements Comparator<Token> {
  // Higher counts sort first: a positive result places t1 after t2.
  public int compare(Token t1, Token t2) {
    return t2.cnt - t1.cnt;
  }
}

where cnt is the token count.
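A small standalone sketch of that sort step may help with the C# port. The Token class, the BY_COUNT_THEN_TEXT tie-breaker and the sample tokens below are hypothetical, not Solr code, and Integer.compare stands in for the subtraction (same ordering for non-negative counts). One detail worth checking: Collections.sort is documented as stable, so tokens with equal counts keep the order in which they were added to the list. Assuming, as Frederico's message suggests, that the list is filled by iterating a HashMap, ties would then follow the HashMap's iteration order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class TokenSortDemo {

    // Hypothetical stand-in for the Token class: a term and its count.
    static final class Token {
        final String val;
        final int cnt;

        Token(String val, int cnt) {
            this.val = val;
            this.cnt = cnt;
        }

        @Override
        public String toString() {
            return val + " " + cnt;
        }
    }

    // Same ordering as the quoted comparator: higher counts first.
    static final Comparator<Token> BY_COUNT_DESC = (t1, t2) -> Integer.compare(t2.cnt, t1.cnt);

    // Hypothetical tie-breaker on the token text, so ties no longer depend on input order.
    static final Comparator<Token> BY_COUNT_THEN_TEXT = BY_COUNT_DESC.thenComparing(t -> t.val);

    // Sorts a copy of the tokens and renders each one as "term count".
    static List<String> sorted(List<Token> tokens, Comparator<Token> order) {
        List<Token> copy = new ArrayList<>(tokens);
        Collections.sort(copy, order); // stable: equal elements keep their relative order
        List<String> out = new ArrayList<>();
        for (Token t : copy) {
            out.add(t.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // "lucene" and "index" tie at 2; only the input order differs between a and b.
        List<Token> a = List.of(new Token("lucene", 2), new Token("solr", 5), new Token("index", 2));
        List<Token> b = List.of(new Token("index", 2), new Token("solr", 5), new Token("lucene", 2));
        System.out.println(sorted(a, BY_COUNT_DESC));      // prints [solr 5, lucene 2, index 2]
        System.out.println(sorted(b, BY_COUNT_DESC));      // prints [solr 5, index 2, lucene 2]
        System.out.println(sorted(a, BY_COUNT_THEN_TEXT)); // prints [solr 5, index 2, lucene 2]
        System.out.println(sorted(b, BY_COUNT_THEN_TEXT)); // prints [solr 5, index 2, lucene 2]
    }
}
```

A tie-breaker like BY_COUNT_THEN_TEXT would make the order easy to reproduce in C#, but it would only match on both sides if the Java side used it too; the stock comparator would give different signatures whenever counts tie.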

Ludovic.

2011/4/7 Frederico Azeiteiro [via Lucene] <ml-node+2790579-1141723501-383...@n3.nabble.com>

> Well, at this point I'm more focused on the deduplication issue.
>
> Using a Min_token_len of 4, I'm getting good comparison results. MLT
> returns a lot of docs as similar that I don't consider similar, even after
> tuning the parameters.
>
> To wrap up this issue: I found that the signature also includes the field
> name, which means that if you want to sign both the title and text fields,
> your signature will be a hash of
> ("text"+"text value"+"title"+"title value").
>
> In any case, I found that the HashMap used in the hash algorithm orders
> the tokens by some internal HashMap ordering that I can't understand :),
> so it is impossible to reproduce in a C# implementation.
>
> Thank you for all your help,
> Frederico
>
>
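
The field-name behaviour Frederico describes can be illustrated with a simplified sketch. It assumes the signature is plain MD5 over UTF-8 bytes and that fields are visited in sorted name order ("text" before "title"); it skips the token-profile step that the frequency-sorting signature applies to each string, and signature and md5Of are hypothetical names, not Solr's API. Because MessageDigest.update is incremental, feeding name, value, name, value is the same as hashing their concatenation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

public class FieldSignatureDemo {

    // Hypothetical helper: feeds each field name, then its value, into one MD5 digest.
    // Fields are visited in sorted name order (TreeMap), an assumption of this sketch.
    static String signature(Map<String, String> fields) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
            md5.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            md5.update(e.getValue().getBytes(StandardCharsets.UTF_8));
        }
        return toHex(md5.digest());
    }

    // Plain MD5 of a single string, used for comparison.
    static String md5Of(String s) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        return toHex(md5.digest(s.getBytes(StandardCharsets.UTF_8)));
    }

    // Lower-case hex encoding, two characters per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        Map<String, String> doc = Map.of("title", "title value", "text", "text value");
        String sig = signature(doc);
        // Incremental updates give the same digest as hashing the concatenated string.
        System.out.println(sig.equals(md5Of("text" + "text value" + "title" + "title value"))); // prints true
        System.out.println(sig);
    }
}
```

Under this sketch, renaming a field changes every signature even when the values are identical, since the names are part of the hashed input.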


-----
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-MLT-feature-tp2774454p2794585.html
Sent from the Solr - User mailing list archive at Nabble.com.
