Hey Solr Community,
Does anyone know whether it's possible to manage the parent/child relationship
for nested documents manually, i.e. manage the "_root_" relationship outside of
Solr and still take advantage of block join functionality?
Typically, nested documents are defined as shown below. In this case the
"_root_" relationship is managed by Solr, and indexing results in 4 documents
(the parent plus 3 children):
{
  "id": "parent_id",
  "content_type": "parent",
  "Text_Blob": ["Some Large Text Field Here"],
  "_childDocuments_": [{
    "id": "child_id_1",
    "content_type": "child",
    ...more child fields
  },{
    "id": "child_id_2",
    "content_type": "child",
    ...more child fields
  },{
    "id": "child_id_3",
    "content_type": "child",
    ...more child fields
  }]
}
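For anyone who wants to reproduce this, here is a minimal Python sketch that
builds the nested payload above and serializes it as a JSON array, the form
Solr's JSON update handler accepts. The helper names (build_nested_doc,
count_indexed_docs) are hypothetical, and the extra child fields are left out.

```python
import json


def build_nested_doc(parent_id, text_blob, child_ids):
    """Build one parent document that carries its children in _childDocuments_.

    Hypothetical helper: only the fields from the example above are set.
    """
    return {
        "id": parent_id,
        "content_type": "parent",
        "Text_Blob": [text_blob],
        "_childDocuments_": [
            {"id": child_id, "content_type": "child"}  # more child fields would go here
            for child_id in child_ids
        ],
    }


def count_indexed_docs(doc):
    """Count the parent plus every (possibly nested) child it carries."""
    return 1 + sum(count_indexed_docs(child) for child in doc.get("_childDocuments_", []))


doc = build_nested_doc(
    "parent_id",
    "Some Large Text Field Here",
    ["child_id_1", "child_id_2", "child_id_3"],
)
# The whole block (parent and all children) goes out as a single request body.
body = json.dumps([doc])
print(count_indexed_docs(doc))  # prints 4
```

Every child has to travel inside its parent's document, which is why the
payload grows with the number of children.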
The reason for this use case is that we have a large text field we don't want
to repeat at the child level. In some circumstances, however, we have many child
records, so our "_childDocuments_" array can get very big, forcing us to build
large individual Solr documents when performing an update. We've found it much
more performant to send smaller documents to Solr, with the relationship managed
outside of Solr.
Using the example above, that means sending four separate documents.
Document 1 (parent):
{
  "id": "parent_id",
  "_root_": "parent_id",
  "content_type": "parent",
  "Text_Blob": ["Some Large Text Field Here"]
}
Documents 2, 3 and 4 (three children, all related to the parent by their
"_root_" value, which we manage outside of Solr):
{
  "id": "child_id_1",
  "_root_": "parent_id",
  "content_type": "child",
  ...more child fields
}
{
  "id": "child_id_2",
  "_root_": "parent_id",
  "content_type": "child",
  ...more child fields
}
{
  "id": "child_id_3",
  "_root_": "parent_id",
  "content_type": "child",
  ...more child fields
}
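To make the flattened layout concrete, here is a rough Python sketch that turns
the nested document into the four flat documents listed above, stamping the
parent's id into "_root_" on each one. flatten_with_root is a hypothetical
helper, not a Solr API; it only reshapes the data on the client side.

```python
def flatten_with_root(nested):
    """Turn a nested document into flat documents that share the root id in _root_.

    Hypothetical helper mirroring the approach described above: the
    _childDocuments_ array is dropped and the relationship is carried only
    by the explicit "_root_" value on each document.
    """
    root_id = nested["id"]
    flat = []

    def visit(doc):
        # Copy every field except the nested array, then stamp the shared root id.
        own = {key: value for key, value in doc.items() if key != "_childDocuments_"}
        own["_root_"] = root_id
        flat.append(own)
        for child in doc.get("_childDocuments_", []):
            visit(child)

    visit(nested)
    return flat


nested = {
    "id": "parent_id",
    "content_type": "parent",
    "Text_Blob": ["Some Large Text Field Here"],
    "_childDocuments_": [
        {"id": f"child_id_{n}", "content_type": "child"} for n in (1, 2, 3)
    ],
}

for d in flatten_with_root(nested):
    print(d["id"], d["_root_"])
```

Only the parent keeps Text_Blob, so the large field is not repeated on the
children.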
All of the above records could be batched as individual documents in a JSON
array and pushed to Solr. The problem with this approach is that when the
"_root_" field is specified on an update, Solr treats each document as a
replacement for the previous one with the same "_root_" value, so only the last
document remains in the index.
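For completeness, a sketch of how the flat documents might be batched into JSON
array POST requests using only the Python standard library. The collection URL
and batch size are assumptions, and the code only builds the requests; each one
could be sent with urllib.request.urlopen(). Packaging the documents this way
does not avoid the overwrite behaviour described above.

```python
import json
import urllib.request

# Hypothetical collection URL; adjust host, port and collection name.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"


def build_update_requests(docs, batch_size, url=SOLR_UPDATE_URL):
    """Split flat documents into JSON array POST requests for the /update handler.

    Nothing is sent here; callers pass each request to urllib.request.urlopen().
    """
    reqs = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        reqs.append(
            urllib.request.Request(
                url,
                data=json.dumps(batch).encode("utf-8"),
                headers={"Content-Type": "application/json"},
                method="POST",
            )
        )
    return reqs


flat_docs = [
    {"id": "parent_id", "_root_": "parent_id", "content_type": "parent",
     "Text_Blob": ["Some Large Text Field Here"]},
] + [
    {"id": f"child_id_{n}", "_root_": "parent_id", "content_type": "child"}
    for n in (1, 2, 3)
]

reqs = build_update_requests(flat_docs, batch_size=3)
print(len(reqs))  # prints 2
```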
Is there a way to manage the parent/child relationship outside of Solr without
having to build a large "_childDocuments_" array?
Any advice would be appreciated.
Thanks,
Dwane