Problem retrieving payloads from a specific term in a boosting function

Félix Sanjuán Wed, 18 May 2016 05:31:56 -0700

Hi all,

I have added a new field to my schema that is of the following type:


    <fieldtype name="payloads" stored="false" indexed="true" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!--
        The DelimitedPayloadTokenFilter can put payloads on tokens... for example,
        a token of "foo|1.4" would be indexed as "foo" with a payload of 1.4f
        Attributes of the DelimitedPayloadTokenFilterFactory :
         "delimiter" - a one character delimiter. Default is | (pipe)
         "encoder" - how to encode the following value into a payload
            float -> org.apache.lucene.analysis.payloads.FloatEncoder,
            integer -> o.a.l.a.p.IntegerEncoder
            identity -> o.a.l.a.p.IdentityEncoder
            Fully Qualified class name implementing PayloadEncoder, Encoder must have a no arg constructor.
         -->
        <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="integer"/>
      </analyzer>
      <!-- Similarity class for payload scoring -->
      <similarity class="com.ibm.connections.search.solr.plugin.similarity.PayloadSimilarityFactory"/>
    </fieldtype>

    <field name="people" type="payloads" indexed="true" stored="true"/>

As a quick example, this field would hold values like the following:

people: "userid1|1 userid2|56"

Basically, a user id and an integer payload.
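
To make the expected index contents concrete, here is a minimal,
self-contained sketch in plain Java (no Lucene dependency) of what the
analyzer above should produce for that value: split on whitespace, split
each token on the | delimiter, and encode the integer as four big-endian
bytes. The byte layout is my understanding of what the integer encoder
does, so treat it as an assumption. The class and method names are
hypothetical and only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the analysis chain; this is not Lucene code.
final class PayloadSketch {

    // Encode an int as 4 big-endian bytes (assumed layout of the integer encoder).
    static byte[] encodeInt(int value) {
        return new byte[] {
            (byte) (value >> 24), (byte) (value >> 16), (byte) (value >> 8), (byte) value
        };
    }

    // Decode 4 big-endian bytes starting at offset back into an int.
    static int decodeInt(byte[] bytes, int offset) {
        return ((bytes[offset] & 0xFF) << 24)
             | ((bytes[offset + 1] & 0xFF) << 16)
             | ((bytes[offset + 2] & 0xFF) << 8)
             |  (bytes[offset + 3] & 0xFF);
    }

    // Split on whitespace, then on the first '|': term -> encoded payload.
    static Map<String, byte[]> analyze(String fieldValue) {
        Map<String, byte[]> payloads = new LinkedHashMap<>();
        for (String token : fieldValue.trim().split("\\s+")) {
            int bar = token.indexOf('|');
            if (bar < 0) {
                continue; // token without a delimiter carries no payload in this sketch
            }
            String term = token.substring(0, bar);
            int value = Integer.parseInt(token.substring(bar + 1));
            payloads.put(term, encodeInt(value));
        }
        return payloads;
    }

    public static void main(String[] args) {
        Map<String, byte[]> payloads = analyze("userid1|1 userid2|56");
        System.out.println(decodeInt(payloads.get("userid2"), 0)); // prints 56
    }
}
```

So for the term userid2 I expect a 4-byte payload that decodes to 56, and
for userid1 one that decodes to 1.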

I am trying to retrieve the payload from a custom boosting function I'm
developing. In this function I only want the payload for one particular
user. In the example above, I would only want the payload for userid2,
which is 56. To do this, I build a Term for that user and use a
DocsAndPositionsEnum to read the payload:

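// Load the stored fields of the current document.
Document document = atomicReader.document(doc);
// Request positions (and their payloads) for the single term people:userid2.
Term term = new Term("people", "userid2");
DocsAndPositionsEnum dpe = atomicReader.termPositionsEnum(term);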

The problem is that, when iterating over the positions, I get payloads
belonging to terms other than userid2.

All the documents have been sent using SolrJ 4.7.2, which is the Solr
version I am using. Moreover, the luceneMatchVersion is 4.7 too.

As a quick test, I reindexed my documents through the Solr web UI: I took
the same documents as JSON and submitted them there. After that, the code
above worked and I retrieved only the payload for the requested user, as
expected.

Therefore, the problem seems to be related to indexing. I've checked both
indexes using Luke and there seems to be no difference between them.

Does anybody know what could be causing this, or what the Web UI does when
indexing that I might be missing when using SolrJ?

Thanks for your help!!

Regards,

Felix
