Isn't it possible to do this by having two data sources (one JDBC and another File) and two entities? The outer entity can read from the DB and the inner entity can read from a file.
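
A rough, untested sketch of what such a data-config.xml might look like. The driver, JDBC URL, credentials and the /data/texts base directory are placeholders I made up, not values from your setup. I am also assuming a DIH version that ships PlainTextEntityProcessor and resolves variables in baseDir, and that your database returns the column names in upper case (otherwise adjust ${record.ID}). Because the file list needs to be walked before each file can be read, the sketch ends up with three nested entities rather than two.

```xml
<dataConfig>
  <!-- Placeholder connection settings; replace with your own. -->
  <dataSource name="db" type="JdbcDataSource"
              driver="org.example.Driver"
              url="jdbc:example://localhost/mydb"
              user="user" password="secret"/>
  <!-- Reads file content for the innermost entity. -->
  <dataSource name="fs" type="FileDataSource" encoding="UTF-8"/>

  <document>
    <!-- Outer entity: one Solr document per row of table T. -->
    <entity name="record" dataSource="db" query="SELECT ID, TITLE FROM T">
      <field column="ID" name="id"/>
      <field column="TITLE" name="title"/>

      <!-- Lists the text files in the directory named after the row's ID.
           /data/texts is a hypothetical base path. -->
      <entity name="files" processor="FileListEntityProcessor"
              dataSource="null"
              baseDir="/data/texts/${record.ID}"
              fileName=".*\.txt" recursive="false">

        <!-- Reads each listed file as a single string into "text". -->
        <entity name="content" processor="PlainTextEntityProcessor"
                dataSource="fs" url="${files.fileAbsolutePath}">
          <field column="plainText" name="text"/>
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Since one record can have several text files, the "text" field would need to be multiValued in schema.xml for this to work.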

On Tue, Aug 11, 2009 at 8:05 PM, Sascha Szott<sz...@zib.de> wrote:
> Hello,
>
> is it possible (and if it is, how can I accomplish it) to configure DIH to
> build up index documents by using content that resides in different data
> sources?
>
> Here is an example scenario:
> Let's assume we have a table T with two columns, ID (which is the primary
> key of T) and TITLE. Furthermore, each record in T is assigned a directory
> containing text files that were generated out of pdf documents by using
> Tika. A directory name is built by using the ID of the record in T
> associated to that directory, e.g. all text files associated to a record
> with id = 101 are stored in directory 101.
>
> Is there a way to configure DIH such that it uses ID, TITLE and the content
> of all related text files when building a document (the documents should
> have three fields: id, title, and text)?
>
> Furthermore, as you may have noticed, a second question arises naturally:
> Will there be any integration of Solr Cell and DIH in an upcoming release,
> so that it would be possible to directly use the pdf documents instead of
> the extracted text files that were generated outside of Solr?

This is something I wish to see, but there has been no user request yet. You can raise an issue and it can be looked into.

>
> Best,
> Sascha
>

--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com