Hi all,
I would like to index in Solr the PDF files contained in my directory c:\myfile\

So I added a data-config.xml file to my solr/conf directory, as
follows:


<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="c:\myfile\" fileName="*.pdf"
            recursive="true">
      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text">
        <field column="author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="content_type" name="content_type" meta="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>
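
One thing that may be worth checking in the config above: as far as I understand the DataImportHandler documentation, FileListEntityProcessor treats fileName as a Java regular expression rather than a shell glob, and "*.pdf" is not a valid regex. The variant below is only a sketch under that assumption; the extra "text" field is also an assumption (it presumes the schema defines a field named "text" to hold the extracted body), and I have not confirmed that this resolves the failure.

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" />
  <document>
    <!-- fileName written as a regex: any name ending in ".pdf" -->
    <entity name="f" dataSource="null" rootEntity="false"
            processor="FileListEntityProcessor"
            baseDir="c:\myfile\" fileName=".*\.pdf"
            recursive="true">
      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text">
        <field column="author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="content_type" name="content_type" meta="true"/>
        <!-- Assumption: the schema has a "text" field for the body Tika extracts -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```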

Before that, I added this part to solrconfig.xml:


<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">c:\solr\conf\data-config.xml</str>
  </lst>
</requestHandler>
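
For comparison, the Solr example configurations usually give the config file by name only, resolved against the core's conf directory. A minimal sketch of that form, assuming data-config.xml sits next to solrconfig.xml; whether the absolute path above is part of the problem is not something I have verified.

```xml
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <!-- Assumption: data-config.xml is in the same conf directory as solrconfig.xml -->
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```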


But this is the result of requesting
http://pc-alessio:8983/solr/select?clean=false&commit=true&qt=%2Fdataimport&command=delta-import

....
  <str name="command">delta-import</str>
  <str name="status">idle</str>
  <str name="importResponse" />
  <lst name="statusMessages">
    <str name="Time Elapsed">0:0:2.512</str>
    <str name="Total Requests made to DataSource">0</str>
    <str name="Total Rows Fetched">0</str>
    <str name="Total Documents Processed">0</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2012-02-09 23:37:07</str>
    <str name="">Indexing failed. Rolled back all changes.</str>
    <str name="Rolledback">2012-02-09 23:37:07</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

Any suggestions?
Thanks,
Alessio
