Thanks a lot, Lance.

So, are these part of the Solr 1.4 release?

-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Thursday, April 15, 2010 9:53 AM
To: solr-user@lucene.apache.org
Subject: Re: DIH

FileListEntityProcessor -> BinFileDataSource -> TikaEntityProcessor (I think)

FileListEntityProcessor (FLEP) walks the directory and supplies a separate
record per file. BinFileDataSource (BFDS) reads each file and supplies it to
TikaEntityProcessor.

BinFileDataSource is not documented, but you need it for binary data
streams like PDF & Word. For text files, use FileDataSource.
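
A minimal, untested sketch of how that chain might look in a DIH
data-config.xml. The baseDir path, the fileName pattern, the entity names
("files", "tika"), the data source name ("bin") and the target field name
are all placeholders, not taken from a working setup:

```xml
<dataConfig>
  <!-- Binary data source for PDF/Word streams; the name "bin" is arbitrary -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Outer entity: walks the directory and emits one row per matching file.
         baseDir and fileName are placeholder values. -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/content" fileName=".*\.(pdf|doc)"
            recursive="true" rootEntity="false" dataSource="null">
      <!-- Inner entity: reads each file through the binary data source
           and passes it to Tika for text extraction. -->
      <entity name="tika" processor="TikaEntityProcessor"
              dataSource="bin" url="${files.fileAbsolutePath}" format="text">
        <!-- Assumes the schema has a field called "text" -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Setting rootEntity="false" on the outer entity means one Solr document is
created per inner (Tika) row instead of per directory entry.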

On 4/14/10, Sandhya Agarwal <sagar...@opentext.com> wrote:
> Hello,
>
> We want to design a solution where we have one polling directory (data
> source directory) containing the XML files for all data that must be
> indexed. These XML files contain a reference to the content file, so we
> need a second data source for the content files. Could somebody please
> tell me the best way to get this working using DIH and the Tika
> processor?
>
> Thanks,
> Sandhya
>
>
>


-- 
Lance Norskog
goks...@gmail.com