Hello,
I have been given the task of indexing PDF files in Solr 7.7.1. The files
are stored in a SqlBase database. I have done half the job: I can index all
the table fields and search in them, except for the field that stores the
PDF file content. As I am totally new to Solr, I have spent a lot of time
trying, without success, to understand how to make Solr extract and index
the field with the PDF content. I need some help.
Regards,
Aruna
In solrconfig.xml I have:
<lib dir="${solr.install.dir:../../../..}/contrib/dataimporthandler/lib"
     regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/"
     regex="solr-dataimporthandler-.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib"
     regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/"
     regex="solr-cell-\d.*\.jar" />
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

db-data-config.xml:
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="jdbc.unify.sqlbase.SqlbaseDriver"
              url="jdbc:sqlbase://localhost:2155/PDFDOCS"
              user="sysadm" password="sysadm" />
  <document>
    <entity name="PDFDOCUMENTS"
            query="select ID, PDOCUMENT, UNIT from SYSADM.DOCS">
      <field column="ID" name="idx" />
      <field column="PDOCUMENT" name="PDF" />
      <field column="UNIT" name="division" />
    </entity>
  </document>
</dataConfig>