Hi All,
         I need to index the documents presents in my file system at various
locations (e.g. C:\docs , d:\docs ).
    Is there any way through which i can specify this in my DIH
Configuration.
    Here is my configuration:-

<document>
      <entity name="sd"
        processor="FileListEntityProcessor"
        fileName="docx$|doc$|pdf$|xls$|xlsx|html$|rtf$|txt$|zip$"
*baseDir="G:\\Desktop\\"*
        recursive="false"
        rootEntity="true"
        transformer="DateFormatTransformer"
onerror="continue">
        <entity name="tikatest"
processor="org.apache.solr.handler.dataimport.TikaEntityProcessor"
url="${sd.fileAbsolutePath}" format="text" dataSource="bin">
          <field column="Author" name="author" meta="true"/>
          <field column="Content-Type" name="title" meta="true"/>
          <!-- field column="title" name="title" meta="true"/ -->
          <field column="text" name="all_text"/>
        </entity>

        <!-- field column="fileLastModified" name="date"
dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" / -->
        <field column="fileSize" name="size"/>
        <field column="file" name="filename"/>
    </entity>
<!--baseDir="../site"-->
  </document>

/ Pankaj Bhatt.

Reply via email to