The answer is yes to both questions, but I am not sure the two features play well together, for historical reasons.

For storing/parsing the original JSON in any (custom) format:
https://lucene.apache.org/solr/guide/8_6/transforming-and-indexing-custom-json.html
(see the srcField parameter)

For indexing nested children (with named collections of subdocuments), but in Solr's own JSON format:
https://lucene.apache.org/solr/guide/8_6/indexing-nested-documents.html

I am not sure whether defining the additional fields described in the second document while indexing the first way will work. Feedback on that would be useful.

Please also note that Solr is not intended to be the primary storage (like a database). If you do atomic updates, the stored JSON will get out of sync, as it is not regenerated. Also, for advanced searches, you may want to normalize your data differently from the way your original data structure is organized.

So, you may want to consider an architecture where that JSON is stored separately, or is retrieved from the original database, and Solr is focused on good search and returns just the record ID. That would actually allow you to store a lot less in Solr (like just IDs) and focus on indexing in the best way.

I am not saying it is the right way for your needs, just that it is a non-obvious architecture choice you may want to keep in mind as you add Solr to your existing stack.

Regards,
   Alex.

On Thu, 24 Sep 2020 at 10:23, Abhay Kumar <abhay.ku...@anjusoftware.com> wrote:
>
> Hello Team,
>
> Can someone please help to index the below sample json document into Solr.
>
> I have following queries on indexing multi level child document.
>
> 1. Can we specify names to documents hierarchy such as "therapeuticareas"
>    or "sites" while indexing.
> 2. How can we index document at multi-level hierarchy.
>
> I have following queries on retrieving the result.
>
> 1. How can I retrieve result with full nested structure.
>
> [{
>     "id": "NCT00000102",
>     "title": "Congenital Adrenal Hyperplasia: Calcium Channels as Therapeutic Targets",
>     "phase": "Phase 1/Phase 2",
>     "status": "Completed",
>     "studytype": "Interventional",
>     "enrollmenttype": "",
>     "sponsorname": ["National Center for Research Resources (NCRR)"],
>     "sponsorrole": ["lead"],
>     "score": [0],
>     "source": "National Center for Research Resources (NCRR)",
>     "therapeuticareas": [{
>             "taid": "ta1",
>             "ta": "Lung Cancer",
>             "diseaseAreas": ["Oncology, Respiratory tract diseases"],
>             "pubmeds": [{
>                 "pmbid": "pm1",
>                 "articleTitle": "Consensus minimum data set for lung cancer multidisciplinary teams Results of a Delphi process",
>                 "revisedDate": "2018-12-11T18:30:00Z"
>             }],
>             "conferences": [{
>                 "confid": "conf1",
>                 "conferencename": "American Academy of Neurology Annual Meeting",
>                 "conferencetopic": "Avances en el manejo de los trastornos del movimiento hipercineticos",
>                 "conferencedate": "2019-05-08T18:30:00Z"
>             }]
>         },
>         {
>             "taid": "ta2",
>             "ta": "Breast Cancer",
>             "diseaseAreas": ["Oncology"],
>             "pubmeds": [],
>             "conferences": []
>         }
>     ],
>
>     "sites": [{
>         "siteid": "site1",
>         "type": "Hospital",
>         "institutionname": "Methodist Health System",
>         "country": "United States",
>         "state": "Texas",
>         "city": "Dallas",
>         "zip": ""
>     }],
>
>     "investigators": [{
>         "invid": "inv1",
>         "investigatorname": "Bryan A Faller",
>         "role": "Principal Investigator",
>         "location": "",
>         "score": ""
>     }],
>
>     "Drugs": [{
>         "id": "11",
>         "drugname": "Methotrexate",
>         "activeIngredient": "Methotrexate Sodium"
>     }]
> }]
>
> Thanks.
> Abhay
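
As a rough illustration of the second route in the reply (Solr's own nested JSON format), here is a minimal Python sketch. It rests on several assumptions that are not in the thread: that each child document needs its own unique "id" (the sample's children use keys such as "taid" or "siteid" instead), that the schema defines the _root_ and _nest_path_ fields described in the nested-documents guide, and that the id scheme shown here is acceptable. The helper name with_child_ids is hypothetical. The sketch only prepares the request body and a query string; it does not contact Solr.

```python
import json
from urllib.parse import urlencode


def with_child_ids(doc):
    """Return a copy of `doc` in which every nested object has an "id".

    A non-empty list of dicts is treated as a named child collection
    (for example "sites"). Each child's id is derived from the parent's
    id, the field name, and the child's position. This id scheme is an
    assumption for illustration; note that it overwrites any "id" the
    child already carries (as in the sample's "Drugs" entries).
    """
    out = {}
    for key, value in doc.items():
        if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            out[key] = [
                with_child_ids({**child, "id": f"{doc['id']}/{key}#{pos}"})
                for pos, child in enumerate(value)
            ]
        else:
            # Scalars, lists of scalars, and empty lists are copied as-is.
            out[key] = value
    return out


# Trimmed version of the sample document from the quoted message.
sample = {
    "id": "NCT00000102",
    "status": "Completed",
    "therapeuticareas": [
        {
            "taid": "ta1",
            "ta": "Lung Cancer",
            "pubmeds": [{"pmbid": "pm1", "revisedDate": "2018-12-11T18:30:00Z"}],
            "conferences": [],
        },
        {"taid": "ta2", "ta": "Breast Cancer", "pubmeds": [], "conferences": []},
    ],
    "sites": [{"siteid": "site1", "type": "Hospital", "city": "Dallas"}],
}

nested = with_child_ids(sample)

# JSON body intended for the core's /update handler (Content-Type: application/json).
payload = json.dumps([nested], indent=2)

# Query string requesting the parent together with its descendants,
# using the [child] document transformer.
query = urlencode({"q": "id:NCT00000102", "fl": "*,[child]"})

if __name__ == "__main__":
    print(payload)
    print("/select?" + query)
```

The payload would be posted to something like http://localhost:8983/solr/<core>/update?commit=true, where the host, port, and core name are placeholders. Whether the hierarchy comes back fully nested depends on the schema setup, so treat this as a starting point to check against the guide rather than a verified recipe.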