Hi, that did the trick! Thanks a lot! One note: the transformer attribute (TemplateTransformer) was added to the entity, but nothing references it, so I suppose it is not needed.
Thanks again!

-----Original Message-----
From: Luca Cavanna [mailto:cavannal...@gmail.com]
Sent: March 28, 2012 23:16
To: solr-user@lucene.apache.org
Cc: Ahmet Arslan
Subject: Re: how to store file path in Solr when using TikaEntityProcessor

Hi,
you should change your data-config, moving the fields that come from FileListEntityProcessor to its own entity, one level up. Try this configuration:

<dataConfig>
  <dataSource name="bin" type="BinFileDataSource" />
  <document>
    <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            transformer="TemplateTransformer"
            baseDir="/home/luca/Documents"
            fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
            onError="skip" recursive="true">
      <field column="fileAbsolutePath" name="path" />
      <field column="fileSize" name="size" />
      <field column="fileLastModified" name="lastmodified" />
      <entity name="tika-test" dataSource="bin"
              processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" onError="skip">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <!--<field column="text" />-->
      </entity>
    </entity>
  </document>
</dataConfig>

On Wed, Mar 28, 2012 at 3:50 AM, ZHANG Liang F <liang.f.zh...@alcatel-sbell.com.cn> wrote:

> Could you please show me how to get those values inside
> TikaEntityProcessor?
>
> -----Original Message-----
> From: Ahmet Arslan [mailto:iori...@yahoo.com]
> Sent: March 27, 2012 22:43
> To: solr-user@lucene.apache.org
> Subject: Re: how to store file path in Solr when using
> TikaEntityProcessor
>
> > I am using DIH to index the local file system. But the file path, size
> > and lastmodified fields were not stored.
> > In the schema.xml I defined:
> >
> > <fields>
> >   <field name="title" type="string" indexed="true" stored="true"/>
> >   <field name="author" type="string" indexed="true" stored="true" />
> >   <!--<field name="text" type="text" indexed="true" stored="true" />
> >   liang added-->
> >   <field name="path" type="string" indexed="true" stored="true" />
> >   <field name="size" type="long" indexed="true" stored="true" />
> >   <field name="lastmodified" type="date" indexed="true" stored="true" />
> > </fields>
> >
> > And also defined tika-data-config.xml:
> >
> > <dataConfig>
> >   <dataSource name="bin" type="BinFileDataSource" />
> >   <document>
> >     <entity name="f" dataSource="null" rootEntity="false"
> >             processor="FileListEntityProcessor"
> >             baseDir="E:/my_project/ecmkit/infotouch"
> >             fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
> >             onError="skip"
> >             recursive="true">
> >       <entity name="tika-test" dataSource="bin"
> >               processor="TikaEntityProcessor"
> >               url="${f.fileAbsolutePath}" format="text"
> >               onError="skip">
> >         <field column="Author" name="author" meta="true"/>
> >         <field column="title" name="title" meta="true"/>
> >         <!-- <field column="text" name="text"/> -->
> >         <field column="fileAbsolutePath" name="path" />
> >         <field column="fileSize" name="size" />
> >         <field column="fileLastModified" name="lastmodified" />
> >       </entity>
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > The Solr version is 3.5. Any idea?
>
> The implicit fields fileDir, file, fileAbsolutePath, fileSize,
> fileLastModified are generated by the FileListEntityProcessor. They
> should be defined above the TikaEntityProcessor.
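
Putting the thread together, here is a minimal sketch of the data-config with the FileListEntityProcessor fields declared on the outer entity and the unreferenced transformer attribute dropped. It assumes that no field uses a template attribute (TemplateTransformer only acts on such fields), so removing it should not change the result. The baseDir value is a placeholder and not from the thread; everything else is copied from the configurations quoted above.

```xml
<dataConfig>
  <dataSource name="bin" type="BinFileDataSource" />
  <document>
    <!-- Outer entity: lists the files and owns the implicit file* columns.
         baseDir is a placeholder path; replace it with your own directory. -->
    <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="/path/to/documents"
            fileName=".*\.(DOC)|(PDF)|(pdf)|(doc)|(docx)|(ppt)"
            onError="skip" recursive="true">
      <!-- Columns generated by FileListEntityProcessor, mapped to schema fields -->
      <field column="fileAbsolutePath" name="path" />
      <field column="fileSize" name="size" />
      <field column="fileLastModified" name="lastmodified" />
      <!-- Inner entity: Tika reads each listed file and supplies the metadata -->
      <entity name="tika-test" dataSource="bin"
              processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" onError="skip">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

After a full-import, path, size and lastmodified should show up as stored fields next to author and title, provided schema.xml declares them as in the original question.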