Couldn't you extend TextProfileSignature and modify the TokenComparator
class to use lexical order when tokens have the same frequency?
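
Something along these lines might work. This is only a minimal sketch, not the actual Solr code: the Token field name `val`, the wrapper class and the `profile` helper are assumptions made up for illustration. It sorts by descending count and falls back to the token text when counts are equal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: names other than TokenComparator and cnt are assumed.
class TieBreakSketch {

    static class Token {
        final String val; // token text (assumed field name)
        int cnt;          // number of occurrences
        Token(String val) { this.val = val; }
    }

    // Higher count first; on equal counts, use lexical order of the text.
    static class TokenComparator implements Comparator<Token> {
        @Override
        public int compare(Token t1, Token t2) {
            int byCount = Integer.compare(t2.cnt, t1.cnt);
            return byCount != 0 ? byCount : t1.val.compareTo(t2.val);
        }
    }

    // Count the tokens in a HashMap, then sort them into a profile.
    static List<String> profile(String... words) {
        Map<String, Token> tokens = new HashMap<>();
        for (String s : words) {
            Token tok = tokens.get(s);
            if (tok == null) {
                tok = new Token(s);
                tokens.put(s, tok);
            }
            tok.cnt++;
        }
        List<Token> sorted = new ArrayList<>(tokens.values());
        Collections.sort(sorted, new TokenComparator());
        List<String> out = new ArrayList<>();
        for (Token t : sorted) {
            out.add(t.val + ":" + t.cnt);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [lucene:2, solr:2, hash:1, mlt:1]
        System.out.println(profile("solr", "lucene", "mlt", "solr", "lucene", "hash"));
    }
}
```

Because the comparator now defines a total order, the sorted profile should no longer depend on the HashMap iteration order. A C# port would presumably need an ordinal string comparison to match Java's String.compareTo, which compares UTF-16 code units.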

Ludovic.

2011/4/8 Frederico Azeiteiro [via Lucene] <
ml-node+2794604-1683988626-383...@n3.nabble.com>

> Hi.
>
> Yes, I managed to create a stable comparator in C# for the profile.
> The problem occurs before that, at:
>
> ...
> tokens.put(s, tok);
> ...
>
> Imagine you have two tokens with the same frequency. A stable sort
> comparator for the profile will maintain their original order.
> The problem is that the original order comes from how the HashMap
> 'tokens' stores them internally, not from the order in which the tokens
> appear in the original text.
>
> Frederico
>
> -----Original Message-----
> From: lboutros [mailto:[hidden email]]
> Sent: Friday, 8 April 2011 09:49
> To: [hidden email]
> Subject: Re: Using MLT feature
>
> It seems that tokens are sorted by frequency:
>
> ...
> Collections.sort(profile, new TokenComparator());
> ...
>
>
> and
>
> private static class TokenComparator implements Comparator<Token> {
>     // Sort by descending token count.
>     public int compare(Token t1, Token t2) {
>       return t2.cnt - t1.cnt;
>     }
> }
>
> and cnt is the token count.
>
> Ludovic.
>
> 2011/4/7 Frederico Azeiteiro [via Lucene] <[hidden email]>
>
>
> > Well, at this point I'm more focused on the deduplication issue.
> >
> > Using a Min_token_len of 4 I'm getting good comparison results. MLT
> > returns a lot of docs that I don't consider similar, even after tuning
> > the parameters.
> >
> > While finishing this issue, I found out that the signature also includes
> > the field name, meaning that if you wish to sign both the title and text
> > fields, your signature will be a hash of
> > ("text"+"text value"+"title"+"title value").
> >
> > In any case, I found that the HashMap used in the hash algorithm orders
> > the tokens by some internal HashMap method that I can't understand :),
> > and so it is impossible to copy to the C# implementation.
> >
> > Thank you for all your help,
> > Frederico
> >
> >
>
>
> -----
> Jouve
> France.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Using-MLT-feature-tp2774454p2794585.html
> Sent from the Solr - User mailing list archive at Nabble.com.


-----
Jouve
France.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-MLT-feature-tp2774454p2794622.html
Sent from the Solr - User mailing list archive at Nabble.com.
