Hi,
I would first check whether the external libraries are present and loaded. How do you start Solr? Try explicitly setting solr.install.dir, or use an absolute path to the libs, and check the logs to see whether they are loaded.

<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
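
As a sketch of the absolute-path option, assuming Solr is unpacked under /opt/solr (a hypothetical location used only for illustration; substitute your actual install directory), the lib directives could look like this:

```xml
<!-- /opt/solr is an assumed install path, not taken from the original setup -->
<lib dir="/opt/solr/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="/opt/solr/dist/" regex="solr-cell-\d.*\.jar" />
```

With absolute paths the result no longer depends on how solr.install.dir is resolved at startup, so after a restart the log should make it easier to tell whether the jars in those directories are picked up.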


Thanks,
Emir

On 25.01.2016 15:16, kostali hassan wrote:
<http://stackoverflow.com/questions/34962280/solr-indexing-pdf-attachments-not-working-in-ubuntu#>

I have a problem integrating Solr on an Ubuntu server. Before using Solr
on the Ubuntu server I tested it on my Mac, where it worked perfectly for
both the DIH request handler and /update/extract: it indexed my PDF, DOC,
and DOCX documents. After installing Solr on the Ubuntu server with the
same configuration files and libraries, I found that Solr does not index
PDF documents, and there are no errors or exceptions in the Solr log. I
can still search over .doc and .docx documents.

Here are some parts of my solrconfig.xml:

<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />

<requestHandler name="/update/extract"
                   startup="lazy"
                   class="solr.extraction.ExtractingRequestHandler" >
     <lst name="defaults">
       <str name="lowernames">true</str>
       <str name="fmap.meta">ignored_</str>
       <str name="fmap.content">_text_</str>
     </lst>
   </requestHandler>

DIH config:

<requestHandler name="/dataimport"
class="org.apache.solr.handler.dataimport.DataImportHandler">
<lst name="defaults">
<str name="config">tika.config.xml</str>
</lst>
</requestHandler>

tika.config.xml

<dataConfig>
    <dataSource type="BinFileDataSource" />
    <document>
        <entity name="files" processor="FileListEntityProcessor"
                dataSource="null" rootEntity="false"
                baseDir="D:\Lucene\document"
                fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
                onError="skip"
                recursive="true">
            <field column="fileAbsolutePath" name="id" />
            <field column="fileSize" name="size" />
            <field column="fileLastModified" name="lastModified" />
            <field column="file" name="title" />
            <entity name="documentImport"
                    dataSource="files"
                    processor="TikaEntityProcessor"
                    url="${files.fileAbsolutePath}"
                    format="text">
                <field column="Author" name="author" meta="true"/>
                <field column="title" name="title" meta="true"/>
                <field column="text" name="text"/>
                <field column="text" name="content"/>
                <field column="LastModifiedBy" name="LastModifiedBy" meta="true"/>
            </entity>
        </entity>
    </document>
</dataConfig>


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
