We have indexed heterogeneous sources, including a variety of NoSQL stores, RDBMSs, and rich documents (PDF, Word, etc.), using SolrJ. The only prerequisite for using SolrJ is that you have an API to fetch data from your data source (say, JDBC for an RDBMS, or Tika for extracting text content from rich documents); then SolrJ is so damn great and simple. It's as simple as downloading the jar and writing a few lines of code to send data to your Solr server after pre-processing it. More details here:
http://lucidworks.com/blog/indexing-with-solrj/
https://wiki.apache.org/solr/Solrj
http://www.solrtutorial.com/solrj-tutorial.html

Cheers,
Yavar

On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com <sangeetha.subraman...@gtnexus.com> wrote:

> Hi,
>
> I am a newbie to SOLR and basically from database background. We have a
> requirement of indexing files of different formats (x12, edifact, csv, xml).
> The files which are inputted can be of any format and we need to do a
> content based search on it.
>
> From the web I understand we can use TIKA processor to extract the content
> and store it in SOLR. What I want to know is, is there any better approach
> for indexing files in SOLR? Can we index the document through streaming
> directly from the Application? If so what is the disadvantage of using it
> (against DIH which fetches from the database)? Could someone share me some
> insight on this? Is there any web links which I can refer to get some idea
> on it? Please do help.
>
> Thanks
> Sangeetha