Re: What is the best way of Indexing different formats of documents?

Yavar Husain Tue, 07 Apr 2015 06:16:22 -0700

Well, I have indexed heterogeneous sources, including a variety of NoSQL
stores, RDBMSs and rich documents (PDF, Word, etc.), using SolrJ. The only
prerequisite for using SolrJ is that you have an API to fetch data from your
data source (say, JDBC for an RDBMS, or Tika for extracting text content from
rich documents). Then SolrJ is so damn great and simple. It's as simple as
downloading the jar and writing a few lines of code to send data to your Solr
server after pre-processing your data. More details here:


http://lucidworks.com/blog/indexing-with-solrj/

https://wiki.apache.org/solr/Solrj

http://www.solrtutorial.com/solrj-tutorial.html
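
To make the "pre-processing" step a bit more concrete, here is a rough sketch
for one of the formats mentioned in the question (CSV). It uses only the JDK,
so it runs without the SolrJ jar or a Solr server. The class name
CsvToSolrFields, the method toDocuments, the sample column names and the
"source name plus row number" id scheme are all made up for illustration, and
the parser assumes a header row with no quoted or escaped commas. The comment
in main notes where the SolrJ calls would go.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: turns CSV text into one field map per row.
class CsvToSolrFields {

    // Assumes a header row and plain comma-separated values (no quoting or escaping).
    static List<Map<String, Object>> toDocuments(String csv, String sourceName) {
        String[] lines = csv.split("\\R");
        String[] header = lines[0].split(",", -1);
        List<Map<String, Object>> docs = new ArrayList<>();
        for (int row = 1; row < lines.length; row++) {
            if (lines[row].trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = lines[row].split(",", -1);
            Map<String, Object> fields = new LinkedHashMap<>();
            // Solr needs a unique key; here it is derived from the source name and row number.
            fields.put("id", sourceName + "-" + row);
            for (int col = 0; col < header.length && col < cells.length; col++) {
                fields.put(header[col].trim(), cells[col].trim());
            }
            docs.add(fields);
        }
        return docs;
    }

    public static void main(String[] args) {
        String csv = "order_no,carrier\n1001,ACME\n1002,Globex\n";
        for (Map<String, Object> fields : toDocuments(csv, "orders.csv")) {
            // With SolrJ on the classpath, each entry would go into
            // SolrInputDocument.addField(name, value), followed by
            // SolrClient.add(doc) and a final SolrClient.commit().
            System.out.println(fields);
        }
    }
}
```

The field names have to match what your Solr schema expects (or dynamic
fields), so treat the header-to-field mapping above as a placeholder.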

Cheers,
Yavar



On Tue, Apr 7, 2015 at 4:18 PM, sangeetha.subraman...@gtnexus.com <
sangeetha.subraman...@gtnexus.com> wrote:

> Hi,
>
> I am a newbie to Solr and come from a database background. We have a
> requirement to index files of different formats (X12, EDIFACT, CSV, XML).
> The input files can be of any format, and we need to do a content-based
> search on them.
>
> From the web I understand we can use the Tika processor to extract the
> content and store it in Solr. What I want to know is: is there any better
> approach for indexing files in Solr? Can we index the documents by streaming
> them directly from the application? If so, what is the disadvantage of doing
> that (compared with DIH, which fetches from the database)? Could someone
> share some insight on this? Are there any web links I can refer to for some
> ideas? Please do help.
>
> Thanks
> Sangeetha
>
>
