Hi.

Yes, I managed to create a stable comparator in C# for profile.
The problem occurs before that, at:

...
tokens.put(s, tok);
...

Imagine you have 2 tokens with the same frequency: the stable sort
comparator for profile will keep them in their original order.
The problem is that this original order comes from the way they are
stored in the hashmap 'tokens', not from the order in which the tokens
appear in the original text.
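
To illustrate the point, here is a minimal Java sketch (not the Solr
code). The class TokenOrderDemo, the Tok type and its text field are
made up for the example; only the count field cnt and the map named
tokens come from this thread. It counts words into whichever map is
passed in and then applies a stable sort by count, so tied tokens come
out in that map's iteration order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TokenOrderDemo {

    // Hypothetical stand-in for the token type; only cnt comes from the thread.
    static final class Tok {
        final String text;
        int cnt;
        Tok(String text) { this.text = text; }
    }

    // Counts each word into the supplied map, then sorts by count, highest first.
    // List.sort is stable, so tokens with equal counts keep the order in which
    // the map's values() iterator returned them.
    static List<String> profile(String[] words, Map<String, Tok> tokens) {
        for (String w : words) {
            tokens.computeIfAbsent(w, Tok::new).cnt++;
        }
        List<Tok> profile = new ArrayList<>(tokens.values());
        profile.sort((a, b) -> Integer.compare(b.cnt, a.cnt));

        List<String> out = new ArrayList<>();
        for (Tok t : profile) {
            out.add(t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] words = {"zebra", "apple", "mango", "apple"};
        // HashMap: tie order follows hash buckets, not the text.
        System.out.println("HashMap:       " + profile(words, new HashMap<>()));
        // LinkedHashMap: tie order follows first appearance in the text.
        System.out.println("LinkedHashMap: " + profile(words, new LinkedHashMap<>()));
    }
}
```

With the LinkedHashMap the result is [apple, zebra, mango], because
zebra was seen before mango. With the HashMap the relative order of
zebra and mango depends on the hash codes and the table size, which is
what makes it hard to reproduce in another language.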

Frederico

-----Original Message-----
From: lboutros [mailto:boutr...@gmail.com] 
Sent: Friday, 8 April 2011 09:49
To: solr-user@lucene.apache.org
Subject: Re: Using MLT feature

It seems that tokens are sorted by frequency:

...
Collections.sort(profile, new TokenComparator());
...


and

private static class TokenComparator implements Comparator<Token> {
    // Descending by count: the token with the higher cnt sorts first.
    public int compare(Token t1, Token t2) {
      return t2.cnt - t1.cnt;
    }
}

and cnt is the token count.
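
Because this comparator returns 0 for equal counts, the order of tied
tokens is left to whatever order they had before sorting. One possible
way to make that order reproducible across languages is an explicit
tie-break, sketched below. This is an assumption for illustration, not
what the quoted code does: the TieBreakDemo class, the Tok type and its
text field are hypothetical, and signatures computed this way would no
longer match ones produced with the original comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TieBreakDemo {

    // Hypothetical token type: cnt mirrors the quoted code, text is assumed.
    static final class Tok {
        final String text;
        final int cnt;
        Tok(String text, int cnt) { this.text = text; this.cnt = cnt; }
    }

    // Highest count first; equal counts fall back to String.compareTo,
    // which compares UTF-16 code units and does not depend on the locale.
    static final Comparator<Tok> BY_COUNT_THEN_TEXT = (a, b) -> {
        int byCount = Integer.compare(b.cnt, a.cnt);
        return byCount != 0 ? byCount : a.text.compareTo(b.text);
    };

    // Returns the token texts in sorted order without modifying the input.
    static List<String> sortedTexts(List<Tok> input) {
        List<Tok> copy = new ArrayList<>(input);
        copy.sort(BY_COUNT_THEN_TEXT);
        List<String> out = new ArrayList<>();
        for (Tok t : copy) {
            out.add(t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok("b", 2));
        tokens.add(new Tok("a", 2));
        tokens.add(new Tok("c", 5));
        System.out.println(sortedTexts(tokens));
    }
}
```

With a total order like this the result no longer depends on the input
order, so a port only needs an equivalent ordinal string comparison.
Integer.compare is used instead of subtraction only to avoid overflow
on extreme values; for ordinary counts it orders the same way.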

Ludovic.

2011/4/7 Frederico Azeiteiro [via Lucene] <
ml-node+2790579-1141723501-383...@n3.nabble.com>

> Well at this point I'm more dedicated to the Deduplicate issue.
>
> Using a Min_token_len of 4 I'm getting nice comparison results. MLT returns
> a lot of similar docs that I don't consider similar - even tuning the
> parameters.
>
> Finishing this issue, I found out that the signature also contains the
> field name meaning that if you wish to signature both title and text fields,
> your signature will be a hash of ("text"+"text value"+"title"+"title
> value").
>
> In any case, I found that the Hashmap used on the hash algorithm inserts
> the tokens by some hashmap internal sort method that I can't understand :),
> and so, impossible to copy to C# implementation.
>
> Thank you for all your help,
> Frederico
>
>


-----
Jouve
France.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Using-MLT-feature-tp2774454p2794585.html
Sent from the Solr - User mailing list archive at Nabble.com.
