There really isn't a _Tika_ database connector; Tika parses structured files, it doesn't talk to databases. A typical JDBC connector can connect to a DB. You might be thinking of the Data Import Handler (DIH).
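To make the idea concrete, here is a minimal sketch of the lookup step on its own: given a document that is about to be indexed, look up its file number by file name and add it as an extra field. This is only an illustration under assumptions, not anything Tika, ManifoldCF or Solr provides. The table name `file_numbers`, its columns, the Solr field name `file_number`, and matching on the base name of the path are all hypothetical, and an in-memory SQLite database stands in for your MSSQL server so the snippet runs as-is.

```python
import os
import sqlite3


def lookup_file_number(conn, file_name):
    """Return the file number stored for file_name, or None if there is no row."""
    row = conn.execute(
        # Hypothetical table and column names; adjust to the real schema.
        "SELECT file_number FROM file_numbers WHERE file_name = ?",
        (file_name,),
    ).fetchone()
    return row[0] if row else None


def enrich_document(doc, conn):
    """Return a copy of doc with a 'file_number' field when the DB has a match."""
    enriched = dict(doc)
    # Assumption: the database is keyed on the base name of the file path.
    number = lookup_file_number(conn, os.path.basename(doc["id"]))
    if number is not None:
        enriched["file_number"] = number  # hypothetical Solr field name
    return enriched


def demo_connection():
    """Build an in-memory SQLite stand-in for the real MSSQL database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_numbers (file_name TEXT PRIMARY KEY, file_number TEXT)"
    )
    conn.execute(
        "INSERT INTO file_numbers VALUES (?, ?)",
        ("1003234234.docx", "SUI-G-25-A"),
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    doc = {"id": "/data/1003234234.docx", "content": "This is the content."}
    print(enrich_document(doc, conn))
```

In a real setup the enriched document would then be sent to Solr by whatever does the indexing, and the SQLite connection would be replaced by a driver for your SQL Server.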
Here's a program that both uses Tika and connects to a DB that might give you a hint. It uses an older version of Solr, but should be fairly easily modifiable.

https://lucidworks.com/2012/02/14/indexing-with-solrj/

Best,
Erick

On Wed, Feb 22, 2017 at 4:14 AM, Wilhelm Eger <wilhelm.e...@gmail.com> wrote:
> Hi!
>
> I am using a setup of datafari (www.datafari.com), which more or less
> combines a ManifoldCF file index with SolR as a search engine.
>
> My setup consists of ~350000 files, which are composed mainly of doc(x),
> xls(x), msg and pdf files. pdf files are ocr'd externally before they are
> added to the ManifoldCF index. Only remaining image files (png, jpg) are
> ocr'd on-the-fly, when being imported.
>
> The files are actually part of an external file management system (files in
> the literal meaning of files, not files in the meaning of entities saved on
> the hard disk), which is not related to ManifoldCF/SolR at all. This system
> unfortunately does not provide a proper full text search, hence I
> implemented it as outlined above.
>
> However, the users are used to certain file numbers provided by this file
> management system. These file numbers are stored in a MSSQL database, which
> is accessible from the host my setup is running on. I can easily get the
> file number by sending a respective SQL statement based on the file name (of
> the entity saved on the hard disk) to the SQL Server. Hence, for each file
> name, there is a file number stored in the database. I would like to have
> these file numbers to be stored in a specific field of the solr index to be
> shown by the (tomcat) output, e.g:
>
> File name: /data/1003234234.docx
> Content: "This is the content. You searched for _text_."
> File name belongs to file number: SUI-G-25-A
>
> Is there any possibility to achieve that? Did I understand it correctly that
> this could happen either in ManifoldCF during indexing or in SolR during
> importing?
>
> I know that there is a tika plugin to talk to databases, which could be fed
> with a SQL statement. But how to connect it with the data retrieved from the
> files crawler?
>
> Alternatively, I could also call an external script (bash, python) to
> retrieve the respective data from the database using bsqldb.
>
> Any hint in the right direction is very much appreciated.
>
> Thanks in advance,
>
> Wilhelm