Hi Mikhail,

Since the Ref Guide is not clear on this, I'm trying to understand all the
requirements for block join.
I have noticed that if I index the blocks in one ADD request and afterwards
index the "other" single documents in a separate request, the results look OK.

But is a separate ADD request enough to keep the two apart, or do we actually
need a COMMIT in between?

I'm also worried that the docs may get mixed up again after some index merges.
Is there some structure in the segments that prevents that from happening?
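To reason about the ordering and merge questions, here is a simplified model (a sketch under stated assumptions, not Solr's actual code) of how a block-join child lookup such as [child parentFilter=...] finds children. It assumes that, within one segment, the children of a parent are all documents between the previous document matching the parent filter and the parent itself. The names children_of, parent_filter and top_level_filter are made up for illustration, and the model covers a single segment and ignores deletions. It reproduces what the quoted example below shows: "friend" is attributed to "mother" only when it sits before her block.

```python
from typing import Callable, Dict, List

Doc = Dict[str, str]


def children_of(segment: List[Doc], parent_id: str,
                is_parent: Callable[[Doc], bool]) -> List[str]:
    """Ids a block-join child lookup would attribute to parent_id.

    Simplified model (assumption): within one segment, the children of a
    parent are all docs strictly between the previous doc matching the
    parent filter and the parent itself. Request boundaries and _root_
    are never consulted.
    """
    pos = next(i for i, d in enumerate(segment) if d["id"] == parent_id)
    prev_parent = -1  # no earlier parent: children start at the segment start
    for i in range(pos - 1, -1, -1):
        if is_parent(segment[i]):
            prev_parent = i
            break
    return [d["id"] for d in segment[prev_parent + 1:pos]]


def parent_filter(d: Doc) -> bool:
    # Mirrors parentFilter=type:parent from the query in this thread.
    return d["type"] == "parent"


def top_level_filter(d: Doc) -> bool:
    # Hypothetical alternative: treat every non-child doc as a parent.
    return d["type"] != "child"


friend = {"id": "friend", "type": "other"}
mother = {"id": "mother", "type": "parent"}
daughter = {"id": "daughter", "type": "child"}

# A block is written children first, parent last: [daughter, mother].
friend_first = [friend, daughter, mother]
friend_last = [daughter, mother, friend]

print(children_of(friend_first, "mother", parent_filter))     # ['friend', 'daughter']
print(children_of(friend_last, "mother", parent_filter))      # ['daughter']
print(children_of(friend_first, "mother", top_level_filter))  # ['daughter']
```

In this model, request boundaries play no role: only document order within the segment and the parent filter matter, so a plain doc indexed after one block would be picked up by the next block written to the same segment. The third case assumes a parent filter that also matches standalone documents, which in the model keeps them from leaking into a neighbouring block.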

Finally, when we someday need to run SPLITSHARD, is the SPLITSHARD API aware of
blocks, so that it never splits in the middle of a block?

I hope to update the Ref Guide documentation to clarify all of these
constraints and pitfalls.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 5 Feb 2018, at 20:49, Mikhail Khludnev <m...@apache.org> wrote:
> 
> Jan, mixing plain docs and blocks is not supported.
> 
> On Thu, Jan 11, 2018 at 2:42 AM, Jan Høydahl <jan....@cominvent.com> wrote:
> 
>> Hi,
>> 
>> We index several large nested documents. We found that querying the data
>> behaves differently depending on how the documents are indexed.
>> 
>> To reproduce:
>> 
>> solr start
>> solr create -c nested
>> # Index one plain document, "friend", and a nested one, "mother" and
>> # "daughter", in the same request:
>> curl "localhost:8983/solr/nested/update?commit=true" -H "Content-Type: text/xml" -d '
>> <add>
>>   <doc>
>>     <field name="id">friend</field>
>>     <field name="type">other</field>
>>   </doc>
>>   <doc>
>>     <field name="id">mother</field>
>>     <field name="type">parent</field>
>>     <doc>
>>       <field name="id">daughter</field>
>>       <field name="type">child</field>
>>     </doc>
>>   </doc>
>> </add>'
>> 
>> # Query for mother's children using either the child transformer or the
>> # child query parser:
>> curl "localhost:8983/solr/nested/query?q=id:mother&fl=%2A%2C%5Bchild%20parentFilter%3Dtype%3Aparent%5D"
>> {
>>  "responseHeader":{
>>    "zkConnected":true,
>>    "status":0,
>>    "QTime":4,
>>    "params":{
>>      "q":"id:mother",
>>      "fl":"*,[child parentFilter=type:parent]"}},
>>  "response":{"numFound":1,"start":0,"docs":[
>>      {
>>        "id":"mother",
>>        "type":["parent"],
>>        "_version_":1589249812802306048,
>>        "type_str":["parent"],
>>        "_childDocuments_":[
>>        {
>>          "id":"friend",
>>          "type":["other"],
>>          "_version_":1589249812729954304,
>>          "type_str":["other"]},
>>        {
>>          "id":"daughter",
>>          "type":["child"],
>>          "_version_":1589249812802306048,
>>          "type_str":["child"]}]}]
>>  }}
>> 
>> As you can see, “friend” got included as a child of “mother”.
>> If you index the exact same request with “friend” placed after “mother” in
>> the XML, the query works as expected.
>> 
>> When inspecting the index, everything looks correct: only “daughter” and
>> “mother” have _root_=mother.
>> Is there a rule that you should start a new update request for each type of
>> parent/child relationship that you need to index, and not mix them in the
>> same request?
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>> 
> 
> 
> -- 
> Sincerely yours
> Mikhail Khludnev
