Re: Index Deeply Nested documents and retrieve a full nested document in solr

Alexandre Rafalovitch Thu, 24 Sep 2020 07:49:41 -0700

The answer to both questions is yes, but I am not sure the two features
play well together, for historical reasons.


For storing/parsing original JSON in any (custom) format:
https://lucene.apache.org/solr/guide/8_6/transforming-and-indexing-custom-json.html
(srcField parameter)
For indexing nested children (with named collections of subdocuments)
but in Solr's own JSON format:
https://lucene.apache.org/solr/guide/8_6/indexing-nested-documents.html
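As a rough sketch of the first option, the request URL could be built
like this in Python. The host, the collection name "trials" and the
choice of "_src_" as the target field are assumptions for illustration.
The split and srcField parameters are the ones described on the first
page above, where srcField is documented to work only when the whole
input maps to a single Solr document (split=/).

```python
from urllib.parse import urlencode


def build_src_field_update_url(base_url, collection, src_field="_src_"):
    """Build a /update/json/docs URL that keeps the raw JSON in src_field."""
    params = [
        ("split", "/"),            # index the whole input as one Solr document
        ("srcField", src_field),   # field that receives the original JSON text
        ("commit", "true"),
    ]
    return f"{base_url}/{collection}/update/json/docs?{urlencode(params)}"


# Hypothetical host and collection name.
url = build_src_field_update_url("http://localhost:8983/solr", "trials")
print(url)
```

The JSON document itself would be sent as the POST body to that URL.
With split=/ the nested values end up flattened into fields of one
document, so the hierarchy survives only in the stored source field.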

I am not sure whether defining the additional fields described in the
second document, while indexing the first way, will work. Feedback on
that would be useful.
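For the second option on its own, here is a minimal sketch in Python of
preparing a document in Solr's own nested JSON format. It assumes the
schema has the _root_ and _nest_path_ fields described on the nested
documents page, and it assumes every child needs its own unique "id",
which the sample children lack. The CHILD_ID_KEYS mapping and the
id-building scheme are hypothetical choices, not something Solr
prescribes.

```python
import json

# Assumption: each named child collection has one key that identifies a child
# within its parent (names taken from the sample document in this thread).
CHILD_ID_KEYS = {
    "therapeuticareas": "taid",
    "pubmeds": "pmbid",
    "sites": "siteid",
}


def with_unique_ids(doc, doc_id):
    """Return a copy of doc in which it and every nested child carry a unique 'id'."""
    out = {"id": doc_id}
    for key, value in doc.items():
        if key == "id":
            continue  # already set from doc_id
        if key in CHILD_ID_KEYS and isinstance(value, list):
            id_key = CHILD_ID_KEYS[key]
            # Derive the child id from the parent id, the collection name
            # and the child's local key.
            out[key] = [
                with_unique_ids(child, f"{doc_id}/{key}/{child[id_key]}")
                for child in value
            ]
        else:
            out[key] = value
    return out


# Trimmed version of the sample document.
trial = {
    "id": "NCT00000102",
    "status": "Completed",
    "therapeuticareas": [
        {"taid": "ta1", "ta": "Lung Cancer", "pubmeds": [{"pmbid": "pm1"}]},
    ],
    "sites": [{"siteid": "site1", "city": "Dallas"}],
}

# Body for a POST to /solr/<collection>/update with Content-Type: application/json.
payload = json.dumps([with_unique_ids(trial, trial["id"])], indent=2)

# Query parameters asking for the parent together with its nested children.
# [child] returns at most 10 children per parent unless a limit is given.
query_params = {"q": "id:NCT00000102", "fl": "*,[child limit=100]"}

print(payload)
```

The named keys ("therapeuticareas", "sites") stay in place as the
labels of the child collections, and the [child] transformer in fl is
what asks Solr to return them nested under the parent.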

Please also note that Solr is not intended to be the primary storage
(like a database). If you do atomic updates, the stored JSON will
get out of sync, because it is not regenerated. Also, for advanced
searches, you may want to normalize your data differently from the
way your original data structure is laid out. So, you may want to
consider an architecture where that JSON is stored separately, or is
retrieved from the original database, and Solr focuses on good search
and returns just the record ID. That would actually allow you to
store a lot less in Solr (like just IDs) and focus on indexing in the
best way. I am not saying it is the right way for your needs, just that
it is a non-obvious architecture choice you may want to keep in mind as
you add Solr to your existing stack.
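To make that architecture concrete, here is a small sketch in Python.
The dictionary standing in for the original database and the function
names are hypothetical; only the overall shape of Solr's JSON response
(response -> docs) is taken from Solr itself.

```python
def search_ids(solr_response):
    """Extract record IDs from a Solr JSON response requested with fl=id."""
    return [doc["id"] for doc in solr_response["response"]["docs"]]


def load_records(ids, primary_store):
    """Fetch the full nested records from the system of record, keeping Solr's order."""
    return [primary_store[i] for i in ids if i in primary_store]


# Stand-in for the original database: record ID -> full nested JSON.
primary_store = {
    "NCT00000102": {"id": "NCT00000102", "sites": [{"siteid": "site1"}]},
}

# Shape of a Solr JSON response for a query sent with fl=id.
solr_response = {
    "response": {"numFound": 1, "start": 0, "docs": [{"id": "NCT00000102"}]},
}

records = load_records(search_ids(solr_response), primary_store)
print(records)
```

Solr decides which records match and in what order, and the full
nested structure always comes from the source of truth, so it cannot
drift out of sync with what Solr has stored.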

Regards,
   Alex.

On Thu, 24 Sep 2020 at 10:23, Abhay Kumar <abhay.ku...@anjusoftware.com> wrote:
>
> Hello Team,
>
> Can someone please help me index the sample JSON document below into Solr?
>
> I have the following questions about indexing multi-level child documents.
>
>
>   1.  Can we assign names to levels of the document hierarchy, such as "therapeuticareas" or "sites", while indexing?
>   2.  How can we index a document with a multi-level hierarchy?
>
> I have the following question about retrieving the result.
>
>
>   1.  How can I retrieve the result with its full nested structure?
>
> [{
>                "id": "NCT00000102",
>                "title": "Congenital Adrenal Hyperplasia: Calcium Channels as Therapeutic Targets",
>                "phase": "Phase 1/Phase 2",
>                "status": "Completed",
>                "studytype": "Interventional",
>                "enrollmenttype": "",
>                "sponsorname": ["National Center for Research Resources (NCRR)"],
>                "sponsorrole": ["lead"],
>                "score": [0],
>                "source": "National Center for Research Resources (NCRR)",
>                "therapeuticareas": [{
>                                              "taid": "ta1",
>                                              "ta": "Lung Cancer",
>                                              "diseaseAreas": ["Oncology, Respiratory tract diseases"],
>                                              "pubmeds": [{
>                                                             "pmbid": "pm1",
>                                                             "articleTitle": "Consensus minimum data set for lung cancer multidisciplinary teams Results of a Delphi process",
>                                                             "revisedDate": "2018-12-11T18:30:00Z"
>                                              }],
>                                              "conferences": [{
>                                                             "confid": "conf1",
>                                                             "conferencename": "American Academy of Neurology Annual Meeting",
>                                                             "conferencetopic": "Avances en el manejo de los trastornos del movimiento hipercineticos",
>                                                             "conferencedate": "2019-05-08T18:30:00Z"
>                                              }]
>                               },
>                               {
>                                              "taid": "ta2",
>                                              "ta": "Breast Cancer",
>                                              "diseaseAreas": ["Oncology"],
>                                              "pubmeds": [],
>                                              "conferences": []
>                               }
>                ],
>
>                "sites": [{
>                               "siteid": "site1",
>                               "type": "Hospital",
>                               "institutionname": "Methodist Health System",
>                               "country": "United States",
>                               "state": "Texas",
>                               "city": "Dallas",
>                               "zip": ""
>                }],
>
>                "investigators": [{
>                               "invid": "inv1",
>                               "investigatorname": "Bryan A Faller",
>                               "role": "Principal Investigator",
>                               "location": "",
>                               "score": ""
>                }],
>
>                "Drugs": [{
>                               "id": "11",
>                               "drugname": "Methotrexate",
>                               "activeIngredient": "Methotrexate Sodium"
>                }]
> }]
>
> Thanks.
> Abhay
>
