: 1. name:DocumentOne^7 => doc1(score=7)
: 2. name:DocumentOne^7 AND place:notExist^3 => doc1(score=7)
: 3. place:(34\ High\ Street)^3 => doc1(score=3), doc2(score=3)
: 4. name:DocumentOne^7 OR place:(34\ High\ Street)^3 => doc1(score=10), doc2(score=3)

	...

: > it's not clear why you need any sort of unique document identification for
: > you scoring algorithm .. from what you described, matches on fieldA should
: > get score "A" matches on fieldB should get score "B" ... why does it mater
: > which doc is which?
:
: For case #3, for example, method SimScorer.score is called 3 times for each of
: these documents, total 6 times for both. I have added a
: ThreadLocal<HashSet<String>> to my custom similarity, which is cleared every
: time before new scoring session (after each query execution). This HashSet
: stores strings consisting of fieldName + docID. Every time score() is called,

Ah HA! ... this is why it's an XY problem. You've decided that you need a
unique identifier for each doc so you can maintain a HashSet of all the
times a doc matches a term in the query so you can count them ... you don't
need to do any of that.

From all the examples of what you've described, I'm fairly certain all you
really need is a TFIDF based Similarity where coord(), idf(), tf() and
queryNorm() always return 1, and you omitNorms from all fields.

That's it ... that should literally be everything you need to do.

(You didn't give any examples of what you expect to happen with exclusion
clauses in your BooleanQueries, but the approach you were describing
wouldn't give you any added advantage with MUST_NOT clauses either ... it
would in fact only increase the scores for those docs, in a way that is
almost certainly not what you want.)

-Hoss
http://www.lucidworks.com/
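
To illustrate why flattening those factors to 1 is enough, here is a rough, self-contained sketch of the arithmetic. It does not use Lucene at all: the class FlatScoreSketch, its Clause type, and the contents of the two sample docs (including doc2's name) are made up for illustration, and the formula is only a simplified stand-in for the classic TF-IDF sum over matching optional clauses. In a real setup the equivalent would presumably be overriding the corresponding methods on a TFIDFSimilarity subclass, whose exact signatures depend on the Lucene version.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a TF-IDF similarity whose factors are all 1.
// None of these names are Lucene API; they only mirror the factor names.
public class FlatScoreSketch {

    /** One boosted term clause, e.g. name:DocumentOne^7. */
    public static final class Clause {
        public final String field;
        public final String value;
        public final float boost;

        public Clause(String field, String value, float boost) {
            this.field = field;
            this.value = value;
            this.boost = boost;
        }
    }

    // Each factor ignores its input and returns 1, as suggested above.
    static float tf(float freq) { return 1f; }
    static float idf(long docFreq, long numDocs) { return 1f; }
    static float coord(int overlap, int maxOverlap) { return 1f; }
    static float queryNorm(float sumOfSquaredWeights) { return 1f; }
    static float norm() { return 1f; } // what omitNorms amounts to here

    // Assumed sample data: doc2's name is invented for this example.
    public static final Map<String, String> DOC1 =
            Map.of("name", "DocumentOne", "place", "34 High Street");
    public static final Map<String, String> DOC2 =
            Map.of("name", "DocumentTwo", "place", "34 High Street");

    /** Sums tf * idf^2 * boost * norm over the optional clauses that match. */
    public static float score(Map<String, String> doc, List<Clause> clauses) {
        float sum = 0f;
        int overlap = 0;
        for (Clause c : clauses) {
            if (c.value.equals(doc.get(c.field))) {
                float idf = idf(1, 2);
                sum += tf(1f) * idf * idf * c.boost * norm();
                overlap++;
            }
        }
        return coord(overlap, clauses.size()) * queryNorm(1f) * sum;
    }

    public static void main(String[] args) {
        // Case 4 from the quoted list: name:DocumentOne^7 OR place:(34\ High\ Street)^3
        List<Clause> query = List.of(
                new Clause("name", "DocumentOne", 7f),
                new Clause("place", "34 High Street", 3f));
        System.out.println("doc1 = " + score(DOC1, query)); // doc1 = 10.0
        System.out.println("doc2 = " + score(DOC2, query)); // doc2 = 3.0
    }
}
```

With every factor pinned to 1, each matching clause contributes exactly its boost, no matter how many times the term occurs in the field, so there is nothing to de-duplicate per document.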