They are loaded: Solr indexes the .doc and .docx (MS Word) files, and indexing fails only for the PDF files.
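
Separately, the fileName pattern in the quoted tika.config.xml may be worth tightening. As written, the alternation splits the expression into ".*\.(DOC)", "(PDF)", "(pdf)" and so on, so every alternative after the first matches those letters anywhere in a file name, not only as an extension. Below is a minimal sketch of a grouped, case-insensitive version. It assumes the attribute is evaluated as a Java regular expression (so the inline (?i) flag is available). I would not expect this to explain the PDF failure by itself, since the current pattern already matches .pdf names. Only the fileName attribute differs from the quoted config.

```xml
<!-- Sketch only: same entity as in the quoted tika.config.xml, with the
     extensions grouped so that "\." and "$" apply to every alternative.
     (?i) makes the match case-insensitive (assumes Java regex syntax). -->
<entity name="files" processor="FileListEntityProcessor"
        dataSource="null" rootEntity="false"
        baseDir="D:\Lucene\document"
        fileName="(?i).*\.(doc|docx|pdf|ppt)$"
        onError="skip"
        recursive="true">
  <!-- field mappings and the nested TikaEntityProcessor entity unchanged -->
</entity>
```

With the grouped form, a name such as "pdf_notes.txt" would no longer be picked up, while "report.PDF" and "report.pdf" still would.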

2016-01-26 12:49 GMT+00:00 Emir Arnautovic <emir.arnauto...@sematext.com>:

> Hi,
> I would first check if external libraries are present and loaded. How do
> you start Solr? Try explicitly setting solr.install.dir or set absolute
> path to libs and see in logs if they are loaded.
>
> <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib"
>      regex=".*\.jar" />
>
>
> Thanks,
> Emir
>
> On 25.01.2016 15:16, kostali hassan wrote:
>
>> <http://stackoverflow.com/questions/34962280/solr-indexing-pdf-attachments-not-working-in-ubuntu#>
>>
>> I have a problem integrating Solr on an Ubuntu server. Before using Solr
>> on the Ubuntu server I tested it on my Mac, where it worked perfectly for
>> both the DIH request handler and update/extract: it indexed my PDF, .doc
>> and .docx documents. After installing Solr on the Ubuntu server with the
>> same configuration files and libraries, I found that Solr does not index
>> PDF documents, and there are no errors or exceptions in the Solr log.
>> I can still search over the .doc and .docx documents.
>>
>> Here are some parts of my solrconfig.xml:
>>
>> <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib"
>>      regex=".*\.jar" />
>> <lib dir="${solr.install.dir:../../../..}/dist/"
>>      regex="solr-cell-\d.*\.jar" />
>>
>> <requestHandler name="/update/extract"
>>                 startup="lazy"
>>                 class="solr.extraction.ExtractingRequestHandler" >
>>   <lst name="defaults">
>>     <str name="lowernames">true</str>
>>     <str name="fmap.meta">ignored_</str>
>>     <str name="fmap.content">_text_</str>
>>   </lst>
>> </requestHandler>
>>
>> DIH config:
>>
>> <requestHandler name="/dataimport"
>>                 class="org.apache.solr.handler.dataimport.DataImportHandler">
>>   <lst name="defaults">
>>     <str name="config">tika.config.xml</str>
>>   </lst>
>> </requestHandler>
>>
>> tika.config.xml:
>>
>> <dataConfig>
>>   <dataSource type="BinFileDataSource" />
>>   <document>
>>     <entity name="files" processor="FileListEntityProcessor"
>>             dataSource="null" rootEntity="false"
>>             baseDir="D:\Lucene\document"
>>             fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
>>             onError="skip"
>>             recursive="true">
>>       <field column="fileAbsolutePath" name="id" />
>>       <field column="fileSize" name="size" />
>>       <field column="fileLastModified" name="lastModified" />
>>       <field column="file" name="title" />
>>       <entity name="documentImport"
>>               dataSource="files"
>>               processor="TikaEntityProcessor"
>>               url="${files.fileAbsolutePath}"
>>               format="text">
>>         <field column="Author" name="author" meta="true"/>
>>         <field column="title" name="title" meta="true"/>
>>         <field column="text" name="text"/>
>>         <field column="text" name="content"/>
>>         <field column="LastModifiedBy" name="LastModifiedBy" meta="true"/>
>>       </entity>
>>     </entity>
>>   </document>
>> </dataConfig>
>>
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/