Thank you. That's why I chose to set the exact values using the Solarium PHP client, but indexing stops with a timeout after 30 seconds:

$dir = new Folder($dossier);
$files = $dir->find('.*\.*');
foreach ($files as $file) {
    $file = new File($dir->pwd() . DS . $file);

    // Build an extract (Tika) request for this file
    $query = $client->createExtract();
    $query->setFile($file->pwd());
    $query->setCommit(true);
    $query->setOmitHeader(false);

    // Set literal field values alongside the extracted content
    $doc = $query->createDocument();
    $doc->id = $file->pwd();
    $doc->name = $file->name;
    $doc->title = $file->name();
    $query->setDocument($doc);

    // Send the extract request to Solr
    $result = $client->extract($query);
}

2015-12-04 16:50 GMT+00:00 Erik Hatcher <erik.hatc...@gmail.com>:

> Kostali -
>
> See if the "Introspect rich document parsing and extraction" section of
> http://lucidworks.com/blog/2015/08/04/solr-5-new-binpost-utility/
> helps*. You’ll be able to see the output of /update/extract (aka Tika) and
> adjust your mappings and configurations accordingly.
>
> * And apologies that bin/post isn’t Windows savvy at this point, but
> you’ve got the hang of the Windows-compatible command-line it looks like.
>
> --
> Erik Hatcher, Senior Solutions Architect
> http://www.lucidworks.com
>
> > On Dec 4, 2015, at 11:44 AM, kostali hassan <med.has.kost...@gmail.com> wrote:
> >
> > Thank you Erick, I followed your advice and took a look at the Apache
> > Tika configuration. I have modified my request handler /update/extract:
> >
> > <requestHandler name="/update/extract"
> >                 startup="lazy"
> >                 class="solr.extraction.ExtractingRequestHandler" >
> >   <lst name="defaults">
> >     <str name="fmap.Last-Modified">last_modified</str>
> >     <str name="uprefix">ignored_</str>
> >
> >     <!-- capture link hrefs but ignore div attributes -->
> >     <str name="captureAttr">true</str>
> >     <str name="fmap.a">links</str>
> >     <str name="fmap.div">ignored_</str>
> >   </lst>
> >   <str name="tika.config">D:\solr\solr-5.3.1\server\solr\tika-data-config.xml</str>
> > </requestHandler>
> >
> > and the Tika config:
> >
> > <dataConfig>
> >   <dataSource type="BinFileDataSource" />
> >   <document>
> >     <entity name="files" processor="FileListEntityProcessor"
> >             dataSource="null" rootEntity="false"
> >             baseDir="D:\Lucene\document"
> >             fileName=".*.(doc)|(pdf)|(docx)"
> >             onError="skip"
> >             recursive="true">
> >       <field column="fileAbsolutePath" name="lux_uri" />
> >       <field column="fileSize" name="size" />
> >       <field column="fileLastModified" name="lastModified" />
> >
> >       <entity name="documentImport"
> >               processor="TikaEntityProcessor"
> >               url="${files.fileAbsolutePath}"
> >               format="text">
> >         <field column="file" name="fileName" meta="true"/>
> >         <field column="Author" name="author" meta="true"/>
> >         <field column="name" name="name" meta="true"/>
> >         <field column="title" name="title" meta="true"/>
> >         <field column="text" name="text"/>
> >         <field column="custom:Testmeta" name="Testmeta" meta="true"/>
> >         <field column="LastModifiedBy" name="LastModifiedBy" meta="true"/>
> >       </entity>
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > and schema.xml:
> >
> > <field name="Testmeta" type="text" indexed="true" stored="true" />
> >
> > but the problem is the same: the title of indexed files is wrong for
> > MS Word documents.