Could you please show me how to get those values inside TikaEntityProcessor?

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: March 27, 2012 22:43
To: solr-user@lucene.apache.org
Subject: Re: how to store file path in Solr when using TikaEntityProcessor


> I am using DIH to index the local file system, but the file path, size and
> lastmodified fields were not stored. In schema.xml I defined:
> 
>  <fields>
>    <field name="title" type="string" indexed="true" stored="true"/>
>    <field name="author" type="string" indexed="true" stored="true" />
>    <!--<field name="text" type="text" indexed="true" stored="true" />
>     liang added-->
>    <field name="path" type="string" indexed="true" stored="true" />
>    <field name="size" type="long" indexed="true" stored="true" />
>    <field name="lastmodified" type="date" indexed="true" stored="true" />
>  </fields>
> 
> 
> And also defined tika-data-config.xml:
> 
> <dataConfig>
>     <dataSource name="bin" type="BinFileDataSource" />
>     <document>
>         <entity name="f" dataSource="null" rootEntity="false"
>                 processor="FileListEntityProcessor"
>                 baseDir="E:/my_project/ecmkit/infotouch"
>                 fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
>                 onError="skip"
>                 recursive="true">
>             <entity name="tika-test" dataSource="bin"
>                     processor="TikaEntityProcessor"
>                     url="${f.fileAbsolutePath}" format="text"
>                     onError="skip">
>                 <field column="Author" name="author" meta="true"/>
>                 <field column="title" name="title" meta="true"/>
>                 <!-- <field column="text" name="text"/> -->
>                 <field column="fileAbsolutePath" name="path" />
>                 <field column="fileSize" name="size" />
>                 <field column="fileLastModified" name="lastmodified" />
>             </entity>
>         </entity>
>     </document>
> </dataConfig>
> 
> 
> The Solr version is 3.5. Any idea?

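The implicit fields fileDir, file, fileAbsolutePath, fileSize, and
fileLastModified are generated by the FileListEntityProcessor, not by the
TikaEntityProcessor. Their <field> mappings should be declared in the
FileListEntityProcessor entity, above the nested TikaEntityProcessor entity.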
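
A minimal sketch of how the quoted tika-data-config.xml might look with that
change, assuming the advice means moving the three file-related <field>
mappings out of the "tika-test" entity and into the outer "f" entity. All
names, paths, and attributes are copied unchanged from the quoted config; only
the position of those three mappings differs. This has not been verified
against Solr 3.5.

```xml
<dataConfig>
    <dataSource name="bin" type="BinFileDataSource" />
    <document>
        <entity name="f" dataSource="null" rootEntity="false"
                processor="FileListEntityProcessor"
                baseDir="E:/my_project/ecmkit/infotouch"
                fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
                onError="skip"
                recursive="true">
            <!-- Implicit columns produced by FileListEntityProcessor,
                 mapped here in the entity that generates them -->
            <field column="fileAbsolutePath" name="path" />
            <field column="fileSize" name="size" />
            <field column="fileLastModified" name="lastmodified" />

            <entity name="tika-test" dataSource="bin"
                    processor="TikaEntityProcessor"
                    url="${f.fileAbsolutePath}" format="text"
                    onError="skip">
                <!-- Only the Tika metadata columns remain in the inner entity -->
                <field column="Author" name="author" meta="true"/>
                <field column="title" name="title" meta="true"/>
            </entity>
        </entity>
    </document>
</dataConfig>
```

The schema.xml definitions for path, size, and lastmodified stay as they are;
only the place where DIH maps the columns to those fields changes.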
