Re: Solr configuration with Text files

Erik Hatcher Wed, 11 Mar 2009 02:35:04 -0700

Using Solr Cell (ExtractingRequestHandler), which is now built into trunk and thus will be in the eventual Solr 1.4 release, indexing a directory of text (or even Word, PDF, etc.) files is mostly 'out of the box'.

It still requires scripting an iteration over all the files and sending them to Solr. Here's an example of that scripting using Ant and the ant-contrib <for> and <post> tasks:


  <target name="index-docs" description="Index documents">
    <for param="filename">
      <fileset dir="${docs.dir}"/>
      <sequential>
        <echo>Processing @{filename}</echo>

        <post to="${solr.url}/update/extract" verbose="false" failonerror="true">
          <prop name="stream.file" value="@{filename}"/>
          <prop name="ext.resource.name" value="@{filename}"/>
          <prop name="ext.idx.attr" value="false"/>
          <prop name="ext.ignore.und.fl" value="true"/>

          <prop name="ext.literal.id" value="@{filename}"/>
          <prop name="ext.def.fl" value="text"/>
          <prop name="ext.map.title" value="title"/>
          <prop name="wt" value="ruby"/>
        </post>
      </sequential>
    </for>
  </target>

It should also be possible, and perhaps slightly easier and more built-in, to do the entire iteration using DataImportHandler's ability to iterate over a list of files and read their contents into a field. [An example of this on the wiki would be handy, or a pointer to it if it doesn't already exist.]
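
As a rough, untested sketch of what such a DataImportHandler configuration might look like: an outer FileListEntityProcessor entity lists the files, and an inner PlainTextEntityProcessor entity reads each one into a field. This assumes a build that includes PlainTextEntityProcessor, and the base directory, the file name pattern, the entity names, and the "id" and "text" schema fields are all placeholders rather than anything from a real setup:

```xml
<!-- Hypothetical data-config.xml: paths, entity names, and field names are assumptions -->
<dataConfig>
  <!-- FileDataSource hands the inner entity a Reader for each file -->
  <dataSource name="files" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Outer entity only lists files; rootEntity="false" means it creates no documents itself -->
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="/path/to/docs" fileName=".*\.txt" recursive="true"
            rootEntity="false" dataSource="null">
      <!-- Inner entity produces one Solr document per file found above -->
      <entity name="doc" processor="PlainTextEntityProcessor"
              url="${f.fileAbsolutePath}" dataSource="files"
              transformer="TemplateTransformer">
        <!-- Use the absolute path as the unique key (assumes the schema key is "id") -->
        <field column="id" template="${f.fileAbsolutePath}"/>
        <!-- PlainTextEntityProcessor exposes the whole file body as the "plainText" column -->
        <field column="plainText" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Note that this only covers plain text files; for Word, PDF, and the like, the Solr Cell route above is still the one to use.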


        Erik


On Mar 10, 2009, at 2:01 PM, KennyN wrote:

This functionality is possible 'out of the box', right? Or am I going to need to code up something that reads in the id named files and generates the xml file?
--
View this message in context: 
http://www.nabble.com/Solr-configuration-with-Text-files-tp22438201p22440095.html
Sent from the Solr - User mailing list archive at Nabble.com.
