Hello:

I'm trying to figure out whether there is some limitation to a cross-core join,
or whether I'm just misunderstanding something.  This has been working fine with
a small number of documents in the from index, but I'm no longer getting the
expected results in an example where 41K from-index documents are used to
filter the results of the main index.  On the other hand, I do have a case
where things work with 80K docs in the from index that match the criteria...

My scenario is that we have canonical content, to which tenants map their 
product information. The canonical content is in one core, while each tenant 
has their own core for defining their mappings and other stuff.  In the tenant 
index, a product ID is mapped to a canonical node, whose ID is the document ID.
For example, a product mapping is defined as:

<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
            <str name="rows">1000</str>
            <str name="fl">id,conceptId,productId,parentProductId</str>
            <str name="q">parentProductId:A10E5AC6-306C-4B71-BE03-62ACA0C4D34D</str>
            <str name="fq">conceptId:ING\:3uly</str>
        </lst>
    </lst>
    <result name="response" numFound="1" start="0">
        <doc>
            <str name="id">ING:3uly|285676_A10E5AC6-306C-4B71-BE03-62ACA0C4D34D|ING:3uly</str>
            <str name="conceptId">ING:3uly</str>
            <str name="parentProductId">A10E5AC6-306C-4B71-BE03-62ACA0C4D34D</str>
            <str name="productId">285676_A10E5AC6-306C-4B71-BE03-62ACA0C4D34D</str>
        </doc>
    </result>
</response>

Some schema.xml for this product index:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="conceptId" type="string" indexed="true" stored="true" required="true"/>
<field name="productId" type="string" indexed="true" stored="true" required="true"/>
<field name="parentProductId" type="string" indexed="true" stored="true"/>

In the canonical core, the document ID is defined the same way:

<field name="id" type="string" indexed="true" stored="true" required="true" />

I concatenate the node ID (ING:3uly, which is the document ID in the canonical
index) and the product ID (285676_A10E5AC6-306C-4B71-BE03-62ACA0C4D34D) to
create a unique document ID in the product index. However, there are
hierarchies defined in the canonical (biological) content, so if the canonical
node is a member of another node (group or complex), then I create additional
documents to accommodate this.  Thus, the ID also includes the ID of the
hierarchical node, if any; otherwise the same origin node is used.
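
As a rough illustration of that ID scheme, here is a minimal Python sketch. The
function name make_mapping_id and the component order (node, product,
hierarchical node) are assumptions inferred from the example ID above, not the
actual indexing code:

```python
from typing import Optional


def make_mapping_id(node_id: str, product_id: str,
                    hierarchy_node_id: Optional[str] = None) -> str:
    """Build a product-index document ID as node|product|hierarchy-node."""
    # Assumption: the third component is the hierarchical node (group or
    # complex); when there is none, the origin node ID is repeated.
    return "|".join([node_id, product_id, hierarchy_node_id or node_id])


# Reproduces the ID of the example mapping document shown earlier.
print(make_mapping_id("ING:3uly",
                      "285676_A10E5AC6-306C-4B71-BE03-62ACA0C4D34D"))
```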

Products can have a parent product ID to group the products as being related to 
one another. Only one level is supported, and the parent product ID is optional.

Okay, back to the join query issue. :)  The goal is to search the canonical
index and return only those documents to which one or more products with a
designated parent product ID are mapped. Given the response above, you can see
that the field conceptId refers to the ID of the document in the canonical
index, and that parentProductId is defined.
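
To spell out the result I expect from the join, here is a small in-memory model
in Python. It is only a sketch of the intended semantics, not of how Solr
implements joins; the function name expected_join and the second canonical
document (ING:other) are made up for illustration:

```python
def expected_join(main_docs, from_docs, from_field, to_field, from_filter):
    """Model of the intended cross-core join, using plain lists of dicts."""
    # Step 1: run the filter against the "from" (product) index and collect
    # the distinct values of the from field (here: conceptId).
    keys = {d[from_field] for d in from_docs
            if from_field in d and from_filter(d)}
    # Step 2: keep only main-index documents whose "to" field (here: id)
    # is one of the collected values.
    return [d for d in main_docs if d.get(to_field) in keys]


products = [{
    "id": "ING:3uly|285676_A10E5AC6-306C-4B71-BE03-62ACA0C4D34D|ING:3uly",
    "conceptId": "ING:3uly",
    "productId": "285676_A10E5AC6-306C-4B71-BE03-62ACA0C4D34D",
    "parentProductId": "A10E5AC6-306C-4B71-BE03-62ACA0C4D34D",
}]
# "ING:other" is a hypothetical canonical document with no product mapping.
canonical = [{"id": "ING:3uly"}, {"id": "ING:other"}]

parent = "A10E5AC6-306C-4B71-BE03-62ACA0C4D34D"
hits = expected_join(canonical, products, "conceptId", "id",
                     lambda d: d.get("parentProductId") == parent)
print(hits)
```

With this data, only the canonical document ING:3uly should survive the filter.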

Now, I can search the canonical index for a specific term, and I get results:

curl 'http://localhost:8983/solr/IngenuityContent.SearchMain/select/?qt=partner-tmo&fl=id&q=znf454&rows=1'
<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
        <lst name="params">
            <str name="rows">1</str>
            <str name="fl">id</str>
            <str name="q">znf454</str>
            <str name="qt">partner-tmo</str>
        </lst>
    </lst>
    <result name="response" numFound="98" start="0">
        <doc>
            <str name="id">ING:3uly</str>
        </doc>
    </result>
</response>

Note that the first result document ID is the same one as the conceptId defined
for the product mapping earlier.  So, when I do the join query:

curl "http://localhost:8983/solr/IngenuityContent.SearchMain/select/?qt=partner-tmo&fl=id,n_name&q=znf454&fq=%7b%21join+from=conceptId+to=id+fromIndex=PartnerContent.SearchProducts%7dparentProductId:A10E5AC6-306C-4B71-BE03-62ACA0C4D34D"
<response>
    <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
            <str name="fl">id,n_name</str>
            <str name="q">znf454</str>
            <str name="qt">partner-tmo</str>
            <str name="fq">{!join from=conceptId to=id fromIndex=PartnerContent.SearchProducts}parentProductId:A10E5AC6-306C-4B71-BE03-62ACA0C4D34D</str>
        </lst>
    </lst>
    <result name="response" numFound="0" start="0"/>
</response>

Should I not get the canonical content document with ID ING:3uly as a result 
rather than zero documents?  In other cases, this works as expected.  Note the 
partner-tmo query type is edismax.
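
To rule out a hand-encoding mistake in the fq parameter, the same request URL
can be assembled programmatically. This is a minimal sketch using only the
Python standard library; the helper name join_filter is made up, and
urlencode escapes more characters than the curl command above (for example
"=" and ":"), which decodes to the same parameter values:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def join_filter(from_field, to_field, from_index, query):
    """Format a Solr {!join} local-params filter query string."""
    return "{!join from=%s to=%s fromIndex=%s}%s" % (
        from_field, to_field, from_index, query)


fq = join_filter("conceptId", "id", "PartnerContent.SearchProducts",
                 "parentProductId:A10E5AC6-306C-4B71-BE03-62ACA0C4D34D")
params = {"qt": "partner-tmo", "fl": "id,n_name", "q": "znf454", "fq": fq}
url = ("http://localhost:8983/solr/IngenuityContent.SearchMain/select/?"
       + urlencode(params))
print(url)

# Round trip: decoding the query string should give back the raw fq value,
# which is what Solr echoes in the params section of the response header.
decoded_fq = parse_qs(urlsplit(url).query)["fq"][0]
print(decoded_fq)
```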

Anyway, this email is long already so I don't want to go adding misc 
configuration information. With debugQuery=true, I can see:

        <arr name="filter_queries">
            <str>{!join from=conceptId to=id fromIndex=PartnerContent.SearchProducts}parentProductId:A10E5AC6-306C-4B71-BE03-62ACA0C4D34D</str>
        </arr>
        <arr name="parsed_filter_queries">
            <str>JoinQuery({!join from=conceptId to=id fromIndex=PartnerContent.SearchProducts}parentProductId:A10E5AC6-306C-4B71-BE03-62ACA0C4D34D)</str>
        </arr>

That looks normal to me, but maybe it's not...

Thanks!

Jeff
--
Jeff Schmidt
535 Consulting
j...@535consulting.com
http://www.535consulting.com
(650) 423-1068