Re: Getting unique key of a document inside of a Similarity class.

J-Pro Thu, 19 Feb 2015 14:56:19 -0800

Thank you for your answer, Chris. I will reply with inline comments as well. Please see below.

: I need to uniquely identify a document inside of a Similarity class during
: scoring. Is it possible to get value of unique key of a document at this
: point?


Can you tell us a bit more about your use case ... your problem description
is a bit vague, and sounds like it may be an "XY Problem"...

Sure, sorry I did not do it before; I just wanted to take up as little of your valuable time as possible. In my custom Similarity class I am trying to implement logic where the score is based only on the field weight and a field match, nothing else. In other words, if a field matches the query, I want the "score" method to return this field's weight only, regardless of factors like: norms; coord; document frequencies; the fact that the field was multivalued and more than one value matched; the fact that the field was tokenized into multiple tokens and more than one token matched, etc. As far as I know, there is no such similarity among the existing ones.

To implement this, I am trying to score only once per combination of a specific field + document unique identifier. I don't care what this unique document identifier is: it can be the unique key or the internal doc ID.

I had my implementation working, but as I understood from your answer, I had it working only for one segment. So now I need to add the segment ID or something similar to my combination.

Assuming the method you are referring to (you didn't give a specific
class/interface name) is SimScorer.score(doc,freq) then the javadocs say...

     doc - document id within the inverted index segment
     freq - sloppy term frequency

...so for #1, yes this is definitely the per-segment docId.

Yes, it's ExactSimScorer.score(int doc, int freq). Ah! Per segment! Here we go, now I understand why doc is 0 again after every new commit: the Solr documentation says new docs are written to a new segment. So question #1 is clear to me. Thanks, Chris!
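To make the per-segment numbering concrete, here is a minimal, Lucene-free sketch. It assumes, as AtomicReaderContext.docBase describes, that each segment's docBase is the sum of maxDoc over all earlier segments of the top-level reader, so the top-level ID is docBase + doc. The class and method names are hypothetical illustrations, not Lucene API.

```java
/**
 * Toy model of an index made of several segments (hypothetical class).
 * Each segment numbers its documents from 0, so a per-segment doc ID alone
 * is ambiguous. Adding the segment's docBase yields a top-level doc ID.
 */
public class SegmentDocIds {

    /** Computes each segment's docBase from the segments' maxDoc values. */
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int base = 0;
        for (int ord = 0; ord < maxDocs.length; ord++) {
            bases[ord] = base;     // first top-level ID owned by this segment
            base += maxDocs[ord];  // the next segment starts right after this one
        }
        return bases;
    }

    /** Maps (segment ord, per-segment doc) to a top-level doc ID. */
    static int topLevelDoc(int[] bases, int ord, int doc) {
        return bases[ord] + doc;
    }

    public static void main(String[] args) {
        // Three segments (e.g. one per commit) holding 3, 2 and 1 documents.
        int[] maxDocs = {3, 2, 1};
        int[] bases = docBases(maxDocs);
        for (int ord = 0; ord < maxDocs.length; ord++) {
            for (int doc = 0; doc < maxDocs[ord]; doc++) {
                System.out.println("segment " + ord + ", doc " + doc
                        + " -> top-level doc " + topLevelDoc(bases, ord, doc));
            }
        }
    }
}
```

Per-segment doc 0 occurs in every segment but maps to top-level IDs 0, 3 and 5. Keep in mind that such top-level IDs are only meaningful for one point-in-time reader: after a reopen or a segment merge, the same document can end up with a different ID.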

for #2: the method for providing a SimScorer to lucene is by implementing
Similarity.simScorer(...) -- that method gets as an argument an
AtomicReaderContext context, which not only has an AtomicReader for the
individual segment, but also details about that segment's role in the
larger index.

Interesting details; that may be exactly what I need. If I can somehow uniquely identify a document using its internal doc ID + data from the context (like a segment ID or something), that would be awesome. I have checked AtomicReaderContext: it has 'ord' (The readers ord in the top-level's leaves array) and 'docBase' (The readers absolute doc base), which are probably what I need. Do you have any more information (maybe links to wikis) about AtomicReaderContext, DocValues, and the "low" and "top" levels (other than the javadoc in the source code)? I have a high-level understanding, but it's obviously not enough for the problem I am solving. I would be more than happy to understand it better.
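A minimal sketch of the "score only once per field + document" idea, assuming the key combines the field name with docBase + doc (the per-segment doc passed to the scorer plus AtomicReaderContext.docBase). ScoreOnceTracker and scoreOnce are hypothetical names; the class does not touch Lucene, and wiring it into an actual ExactSimScorer is left out.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper: remembers which (field, top-level doc) pairs have
 * already been scored, so a field's weight is counted at most once per
 * document. Not thread-safe; intended as one instance per query execution.
 */
public class ScoreOnceTracker {

    private final Set<String> seen = new HashSet<>();

    /** Returns fieldWeight the first time (field, docBase + doc) is seen, else 0. */
    public float scoreOnce(String field, int docBase, int doc, float fieldWeight) {
        int topLevelDoc = docBase + doc;          // unique within one top-level reader
        String key = field + '#' + topLevelDoc;   // e.g. "title#3"
        return seen.add(key) ? fieldWeight : 0f;  // Set.add returns false for a repeat
    }

    public static void main(String[] args) {
        ScoreOnceTracker tracker = new ScoreOnceTracker();
        // Segment with docBase 0: "title" matches its doc 0 via two tokens.
        System.out.println(tracker.scoreOnce("title", 0, 0, 2.5f)); // 2.5
        System.out.println(tracker.scoreOnce("title", 0, 0, 2.5f)); // 0.0
        // Segment with docBase 3: its doc 0 is a different document (top-level 3).
        System.out.println(tracker.scoreOnce("title", 3, 0, 2.5f)); // 2.5
    }
}
```

Keying on the pair (ord, doc) would be equally unique within one top-level reader. Either way, the state must not be reused across reader reopens, and because IndexSearcher can be configured to search segments in parallel, a shared tracker would need synchronization.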

Thank you very much for your time, Chris, and everyone else who spends time reading and answering this thread!
