FileListEntityProcessor -> BinFileDataSource -> TikaEntityProcessor (I think)
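A rough sketch of how that chain might look in a DIH data-config file. This is untested and only illustrates the wiring; the directory path, the file-name pattern, the entity names ("files", "tika"), and the target field name "text" are placeholders I made up, so adjust them to your schema.

```xml
<dataConfig>
  <!-- Binary stream source, needed for PDF/Word; "bin" is an arbitrary name -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Outer entity: walks the directory and emits one row per file.
         It reads no data source itself, hence dataSource="null".
         rootEntity="false" makes the inner entity produce the Solr documents. -->
    <entity name="files"
            processor="FileListEntityProcessor"
            baseDir="/path/to/content"
            fileName=".*\.(pdf|doc)"
            recursive="true"
            rootEntity="false"
            dataSource="null">
      <!-- Inner entity: opens each file through the binary source and
           hands the stream to Tika for text extraction -->
      <entity name="tika"
              processor="TikaEntityProcessor"
              dataSource="bin"
              url="${files.fileAbsolutePath}"
              format="text">
        <!-- Tika exposes the extracted body in the "text" column -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

For plain text files, the same layout should work with FileDataSource in place of BinFileDataSource and a text-oriented processor in place of TikaEntityProcessor.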
FLEP (FileListEntityProcessor) walks the directory and supplies a separate
record per file. BFDS (BinFileDataSource) pulls the file and supplies it to
TikaEntityProcessor. BinFileDataSource is not documented, but you need it for
binary data streams like PDF & Word. For text files, use FileDataSource.

On 4/14/10, Sandhya Agarwal <sagar...@opentext.com> wrote:
> Hello,
>
> We want to design a solution where we have one polling directory (data
> source directory) containing the xml files, of all data that must be
> indexed. These XML files contain a reference to the content file. So, we
> need another datasource that must be created for the content files. Could
> somebody please tell me what is the best way to get this working using the
> DIH / tika processor.
>
> Thanks,
> Sandhya

-- 
Lance Norskog
goks...@gmail.com