For a lot of reasons, I greatly prefer to do this work in a client program rather than have Solr do the extraction itself. Here's a place to get started; it connects to a DB and also scans a local file directory for docs to push through (local) Tika and index. So you should be able to modify it relatively easily to get the data from SqlBase, read the associated PDF, combine the two, and send the result to Solr.

https://lucidworks.com/2012/02/14/indexing-with-solrj/

The code itself is a bit old, but illustrates the process.

Best,
Erick

> On Apr 2, 2019, at 11:46 PM, Arunas Spurga <arunas2...@gmail.com> wrote:
>
> Hello,
>
> I got a task to index, in Solr 7.71, PDF files which are stored in a SqlBase
> database. I did half the job: I can index all the table fields and I can
> search in these fields, except the field in which the PDF file content is
> stored. As I am totally new to Solr, I spent a lot of time unsuccessfully
> trying to understand how to force Solr to extract and index the field with
> the PDF content. I need help.
>
> Regards,
>
> Aruna
>
> In solrconfig.xml I have:
>
> <lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib" regex=".*\.jar" />
> <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
> <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
> <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />
>
> <requestHandler name="/update/extract"
>                 startup="lazy"
>                 class="solr.extraction.ExtractingRequestHandler">
>   <lst name="defaults">
>     <str name="lowernames">true</str>
>     <str name="fmap.meta">ignored_</str>
>     <str name="fmap.content">_text_</str>
>   </lst>
> </requestHandler>
>
> <requestHandler name="/dataimport"
>                 class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">db-data-config.xml</str>
>   </lst>
> </requestHandler>
>
> ----------------------------------------
> db-data-config.xml
>
> <dataConfig>
>   <dataSource type="JdbcDataSource"
>               driver="jdbc.unify.sqlbase.SqlbaseDriver"
>               url="jdbc:sqlbase://localhost:2155/PDFDOCS"
>               user="sysadm" password="sysadm" />
>   <document>
>     <entity name="PDFDOCUMENTS" query="select ID, PDOCUMENT, UNIT from SYSADM.DOCS">
>       <field column="ID" name="idx" />
>       <field column="PDOCUMENT" name="PDF" />
>       <field column="UNIT" name="division" />
>     </entity>
>   </document>
> </dataConfig>
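
As a rough illustration of the client-side flow described at the top of this reply (read a row, extract the PDF text, combine the two, send to Solr), here is a minimal Java sketch that uses only the JDK. It is not the code from the linked article and it does not use SolrJ or Tika. The class and method names are made up for illustration, the `extractor` argument is a placeholder for wherever Tika would turn the PDOCUMENT bytes into text, and the update URL passed to `send` is an assumption. The field names idx, division and PDF come from the db-data-config.xml quoted above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: combines one DB row with its extracted PDF text and sends it to Solr. */
final class PdfRowIndexer {

    /** Merges the plain columns and the extracted PDF text into one document. */
    static Map<String, String> combine(String id, String unit, byte[] pdfBytes,
                                       Function<byte[], String> extractor) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("idx", id);                         // ID column
        doc.put("division", unit);                  // UNIT column
        doc.put("PDF", extractor.apply(pdfBytes));  // text extracted from PDOCUMENT (placeholder for Tika)
        return doc;
    }

    /** Serializes one document as a JSON array of documents for Solr's JSON update handler. */
    static String toJson(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder("[{");
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (e.getValue() == null) continue;     // skip columns that were NULL
            if (sb.length() > 2) sb.append(',');    // separator between fields
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append("}]").toString();
    }

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Posts the JSON to the given update URL with commit=true and returns the HTTP status code. */
    static int send(String updateUrl, String json) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(updateUrl + "?commit=true").toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }
}
```

In a real client the rows would come from a JDBC query like the one in db-data-config.xml, and `send` would be called with something like `http://localhost:8983/solr/<core>/update`, where `<core>` stands for whatever core you created. With SolrJ, as in the linked article, the hand-built JSON and HTTP call would be replaced by the SolrJ client.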