Can you use Tika? https://tika.apache.org/0.9/formats.html
On Wed, 2016-06-08 at 10:06 -0400, Aniruddh Sharma wrote:
> Hi
>
> I am new to use Solr.
>
> I am running Solr 4.10.3 on CDH 5.5.
>
> My use case is, I have real time data ingestion in Hadoop on which I want
> to implement search.
>
> My input data format is XML and it has nested child nodes. So my question
> is about schema creation for solr.
>
> Technically I notice in JSON format, it is possible to handle nested data.
>
> a) Although technically JSON can handle nested child data. Is it also
> doable in XML format. If no, then are there any guidelines to change XML
> data to JSON or what is best way around to deal with this.
>
> b) Even though if could be technically done, from a functional point of
> view when does it make sense to store data in Solr as nested vs flattened.
> What is functional use case which drives this.
>
>
> Thanks and Regards
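
On (a) and (b): if the nesting is only there to group values, one option is to flatten each record before indexing. Below is a rough sketch of that idea, not a tested pipeline. The sample XML, the dotted field names, and the flatten() helper are all made up for illustration; your real schema would need matching (probably multiValued) fields. As far as I know, Solr's XML update format also accepts <doc> elements nested inside a <doc> as child documents, so flattening is only worth it when you do not need to query children as separate units.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def flatten(elem, prefix=""):
    """Flatten a nested XML element into {dotted.path: [values]}.

    Hypothetical helper: repeated child elements end up as
    multiple values under the same dotted field name.
    """
    fields = defaultdict(list)
    for child in elem:
        path = f"{prefix}{child.tag}"
        if len(child):
            # The child has children of its own, so recurse one level down.
            for key, values in flatten(child, path + ".").items():
                fields[key].extend(values)
        elif child.text and child.text.strip():
            # Leaf node with text: record it under its dotted path.
            fields[path].append(child.text.strip())
    return dict(fields)


# Made-up sample record with one nested node and one repeated node.
SAMPLE = """
<order>
  <id>42</id>
  <customer><name>Ann</name></customer>
  <item><sku>A1</sku></item>
  <item><sku>B2</sku></item>
</order>
"""

doc = flatten(ET.fromstring(SAMPLE))
print(doc)
```

The catch with flattening is that values from sibling nodes lose their grouping: once two items are merged into item.sku and, say, item.price, you can no longer tell which price belonged to which sku. That is roughly the functional case for keeping the data nested instead.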