: I need to uniquely identify a document inside of a Similarity class during
: scoring. Is it possible to get value of unique key of a document at this
: point?
Can you tell us a bit more about your use case? Your problem description is a
bit vague, and it sounds like it may be an "XY Problem"...

https://people.apache.org/~hossman/#xyproblem
Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue. Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341

: 1. Is docIds behavior described above a bug or a feature? Obviously, if it's a
: bug and I can use docID to uniquely identify a document, then my question is
: answered after this bug is fixed.
: 2. If docIds behavior described above is normal, then what is an alternative
: way of uniquely identify a document inside of a Similarity class during
: scoring? Can I get unique key of a scoring document in Similarity?

Assuming the method you are referring to (you didn't give a specific
class/interface name) is SimScorer.score(doc, freq), then the javadocs say...

  doc - document id within the inverted index segment
  freq - sloppy term frequency

...so for #1, yes: this is definitely the per-segment docId.

For #2: the way to provide a SimScorer to Lucene is by implementing
Similarity.simScorer(...). That method gets an AtomicReaderContext as an
argument, which not only has an AtomicReader for the individual segment, but
also details about that segment's role in the larger index.

As far as getting the Solr uniqueKey ... it's non-trivial, and there are
different things you could do depending on what your ultimate goal is (ie: see
my earlier question about the XY problem) ...
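A small model may make the per-segment docId answer to #1 more concrete. This is plain Java, not the Lucene API, and it assumes a hypothetical index of three segments holding 3, 2 and 4 documents. Each segment numbers its own documents from 0, so the segment-local `doc` handed to `SimScorer.score` repeats across segments. Adding the segment's base (in Lucene 4.x, `AtomicReaderContext` exposes this as its `docBase` field) yields an id that is unique across the whole index at that moment. The names `SegmentModel`, `docBases` and `toTopLevel` are made up for illustration.

```java
/**
 * Toy model (NOT the Lucene API) of how a multi-segment index numbers documents.
 * Each segment numbers its docs from 0; a segment's docBase is the total number
 * of docs in all earlier segments.
 */
class SegmentModel {

    /** Computes each segment's docBase from the per-segment document counts. */
    static int[] docBases(int[] segmentSizes) {
        int[] bases = new int[segmentSizes.length];
        int running = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            bases[i] = running;          // first top-level id in segment i
            running += segmentSizes[i];  // skip past segment i's documents
        }
        return bases;
    }

    /** Maps a segment-local docId to a "top level" docId. */
    static int toTopLevel(int docBase, int segmentDocId) {
        return docBase + segmentDocId;
    }

    public static void main(String[] args) {
        int[] sizes = {3, 2, 4};        // hypothetical segment sizes
        int[] bases = docBases(sizes);  // {0, 3, 5}
        // Local doc 0 exists in every segment, so it is not unique by itself;
        // shifting by docBase gives a distinct top-level id per segment.
        for (int seg = 0; seg < sizes.length; seg++) {
            System.out.println("segment " + seg + ", local doc 0 -> top-level "
                    + toTopLevel(bases[seg], 0));
        }
    }
}
```

Even the top-level id is still only an internal id: it can change when segments are merged, which is why the uniqueKey is the thing to look up if you need a stable identity.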
My guess is that, this far down in the code, you want to use DocValues (aka
FieldCache in older versions of Lucene) on your uniqueKey field, and then ask
it for the field value of each internal docId that gets passed to your method.
You can do that either by using the per-segment DocValues, or by using the
AtomicReaderContext's base information to determine the "top level" internal
docId and then using the "top level" DocValues/FieldCache.

(The per-segment vs "top level" DocValues and internal id stuff can be kind of
confusing. Start with whichever seems simpler based on your understanding of
the internal Lucene/Solr APIs, and worry about switching to the other approach
later, once you have something working and can see whether it helps or hinders
performance for your use cases.)

-Hoss
http://www.lucidworks.com/
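The two lookup routes (per-segment values vs. "top level" values plus the segment's base) can be modeled the same way. Again this is a plain-Java sketch, not the Lucene/Solr API: `UniqueKeyLookup` is hypothetical, its per-segment `String[]` arrays stand in for per-segment DocValues on the uniqueKey field, and the concatenated array stands in for a "top level" DocValues/FieldCache view. In real code the per-segment values would come from the segment's `AtomicReader`, reachable from the `AtomicReaderContext` passed to `simScorer`. The point it shows is that both routes should return the same key for the same document.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (NOT the Lucene/Solr API). perSegment[seg][localDoc] stands in for
 * per-segment DocValues on the uniqueKey field; topLevel[docBase + localDoc]
 * stands in for a "top level" DocValues/FieldCache view.
 */
class UniqueKeyLookup {
    private final String[][] perSegment; // uniqueKey per segment-local docId
    private final int[] docBases;        // first top-level docId of each segment
    private final String[] topLevel;     // uniqueKey per top-level docId

    UniqueKeyLookup(String[][] perSegment) {
        this.perSegment = perSegment;
        this.docBases = new int[perSegment.length];
        List<String> all = new ArrayList<>();
        for (int seg = 0; seg < perSegment.length; seg++) {
            docBases[seg] = all.size();  // docs already seen in earlier segments
            for (String key : perSegment[seg]) {
                all.add(key);
            }
        }
        this.topLevel = all.toArray(new String[0]);
    }

    /** Route 1: ask the segment's own values, using the local docId directly. */
    String keyViaSegment(int seg, int localDoc) {
        return perSegment[seg][localDoc];
    }

    /** Route 2: shift the local docId by docBase, then ask the top-level values. */
    String keyViaTopLevel(int seg, int localDoc) {
        return topLevel[docBases[seg] + localDoc];
    }

    public static void main(String[] args) {
        UniqueKeyLookup lookup = new UniqueKeyLookup(new String[][] {
            {"doc-a", "doc-b"},          // segment 0 (hypothetical keys)
            {"doc-c", "doc-d", "doc-e"}  // segment 1
        });
        // Local doc 1 in segment 1 is top-level doc 3; both routes find the same key.
        System.out.println(lookup.keyViaSegment(1, 1) + " " + lookup.keyViaTopLevel(1, 1));
    }
}
```

The per-segment route needs no index-wide structure, while the top-level route relies on the docBase shift; which one performs better for a given use case is something to measure rather than assume.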