The chain is FileListEntityProcessor -> BinFileDataSource -> TikaEntityProcessor (I think).
FileListEntityProcessor (FLEP) walks the directory and emits one record per file.
BinFileDataSource (BFDS) opens each file as a binary stream and hands it to TikaEntityProcessor.

BinFileDataSource is not documented, but you need it for binary formats
such as PDF and Word. For plain text files, use FileDataSource.
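
A rough data-config.xml sketch of that chain might look like the following. This is untested and only illustrates the wiring: the baseDir path, the fileName pattern, the entity and data source names, and the "id" and "text" field names are all placeholders I made up, so adjust them to your schema and directory layout.

```xml
<dataConfig>
  <!-- Binary stream source; the name "bin" is a placeholder -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Outer entity: lists files, one row per file.
         baseDir and fileName below are hypothetical values. -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/content" fileName=".*\.(pdf|doc)"
            recursive="true" rootEntity="false" dataSource="null">
      <field column="fileAbsolutePath" name="id"/>
      <!-- Inner entity: Tika reads each file through BinFileDataSource -->
      <entity name="tika" processor="TikaEntityProcessor"
              dataSource="bin" url="${files.fileAbsolutePath}"
              format="text">
        <!-- "text" is the extracted body; the target field name is assumed -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Setting rootEntity="false" on the file-listing entity means each Tika row, not each directory listing row, becomes a Solr document.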

On 4/14/10, Sandhya Agarwal <sagar...@opentext.com> wrote:
> Hello,
>
> We want to design a solution where we have one polling directory (data
> source directory) containing the xml files, of all data that must be
> indexed. These XML files contain a reference to the content file. So, we
> need another datasource that must be created for the content files. Could
> somebody please tell me what is the best way to get this working using the
> DIH / tika processor.
>
> Thanks,
> Sandhya
>
>
>


-- 
Lance Norskog
goks...@gmail.com