Re: Getting unique key of a document inside of a Similarity class.

Chris Hostetter Thu, 19 Feb 2015 22:45:51 -0800

: 1. name:DocumentOne^7 => doc1(score=7)
: 2. name:DocumentOne^7 AND place:notExist^3 => doc1(score=7)
: 3. place:(34\ High\ Street)^3 => doc1(score=3), doc2(score=3)
: 4. name:DocumentOne^7 OR place:(34\ High\ Street)^3 => doc1(score=10),
: doc2(score=3)
        ...
: > it's not clear why you need any sort of unique document identification for
: > your scoring algorithm .. from what you described, matches on fieldA should
: > get score "A" matches on fieldB should get score "B" ... why does it matter
: > which doc is which?
: 
: For case #3, for example, method SimScorer.score is called 3 times for each of
: these documents, total 6 times for both. I have added a
: ThreadLocal<HashSet<String>> to my custom similarity, which is cleared every
: time before new scoring session (after each query execution). This HashSet
: stores strings consisting of fieldName + docID. Every time score() is called,


Ah HA! ... this is why it's an XY problem... you've decided that you need 
a unique identifier for each doc so you can maintain a HashSet of all the 
times a doc matches a term in the query so you can count them ... you 
don't need to do any of that.

from all the examples of what you've described, i'm fairly certain all you 
really need is a TFIDF based Similarity where coord(), idf(), tf() and 
queryNorm() always return 1, and you omitNorms from all fields.

that's it ... that should literally be everything you need to do.
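
To see why flattening those factors is enough, here is a minimal, self-contained sketch of the arithmetic. It is not Lucene code: it only models the classic TF-IDF scoring formula, score = coord * queryNorm * sum(tf * idf^2 * boost * norm), and the names FlatScoreSketch, Clause, flat() and score() are hypothetical, made up for illustration. It also assumes each boosted field clause is scored as a single unit. With every factor pinned to 1 (and norms omitted, so norm is 1), each matching clause contributes exactly its boost.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model only: none of these names exist in Lucene or Solr.
public class FlatScoreSketch {

    /** One matching clause: its query-time boost plus the per-term factors. */
    static final class Clause {
        final float boost, tf, idf, norm;

        Clause(float boost, float tf, float idf, float norm) {
            this.boost = boost;
            this.tf = tf;
            this.idf = idf;
            this.norm = norm;
        }

        /** A clause as a "flat" similarity would see it: tf = idf = norm = 1. */
        static Clause flat(float boost) {
            return new Clause(boost, 1f, 1f, 1f);
        }
    }

    /** coord * queryNorm * sum(tf * idf^2 * boost * norm) over the matching clauses. */
    static float score(float coord, float queryNorm, List<Clause> matching) {
        float sum = 0f;
        for (Clause c : matching) {
            sum += c.tf * c.idf * c.idf * c.boost * c.norm;
        }
        return coord * queryNorm * sum;
    }

    public static void main(String[] args) {
        // Query #4 from the quoted examples: name:DocumentOne^7 OR place:(...)^3
        // doc1 matches both clauses, doc2 matches only the place clause.
        float doc1 = score(1f, 1f, Arrays.asList(Clause.flat(7f), Clause.flat(3f)));
        float doc2 = score(1f, 1f, Arrays.asList(Clause.flat(3f)));
        System.out.println("doc1=" + doc1 + " doc2=" + doc2);
    }
}
```

Under those assumptions the sum of boosts comes out as 10 for doc1 and 3 for doc2, matching example #4, with no per-document bookkeeping inside the Similarity.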

(You didn't give any examples of what you expect to happen with exclusion 
clauses in your BooleanQueries, but the approach you were describing 
wouldn't give you any added advantages towards interesting MUST_NOT clauses 
either ... it would in fact only increase the scores for those docs in a 
way that is almost certainly not what you want)


-Hoss
http://www.lucidworks.com/
