Solr Cell (ExtractingRequestHandler) is now built into trunk, and so
will be part of the eventual Solr 1.4 release. With it, indexing a
directory of text (or even Word, PDF, etc.) files is mostly 'out of
the box'. You still have to script the iteration over the files and
send each one to Solr. Here's an example of that scripting using Ant
with the ant-contrib <for> and <post> tasks:
<target name="index-docs" description="Index documents">
  <for param="filename">
    <fileset dir="${docs.dir}"/>
    <sequential>
      <echo>Processing @{filename}</echo>
      <post to="${solr.url}/update/extract" verbose="false"
            failonerror="true">
        <prop name="stream.file" value="@{filename}"/>
        <prop name="ext.resource.name" value="@{filename}"/>
        <prop name="ext.idx.attr" value="false"/>
        <prop name="ext.ignore.und.fl" value="true"/>
        <prop name="ext.literal.id" value="@{filename}"/>
        <prop name="ext.def.fl" value="text"/>
        <prop name="ext.map.title" value="title"/>
        <prop name="wt" value="ruby"/>
      </post>
    </sequential>
  </for>
</target>
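The target above relies on the ant-contrib tasks being defined and on
the docs.dir and solr.url properties being set. A minimal sketch of
that setup follows; the jar location, the docs directory, and the Solr
URL are assumptions for illustration, so substitute your own:

```xml
<!-- Load the ant-contrib task definitions (including <for> and <post>).
     The jar path below is an assumed location, not a required one. -->
<taskdef resource="net/sf/antcontrib/antlib.xml">
  <classpath>
    <pathelement location="lib/ant-contrib.jar"/>
  </classpath>
</taskdef>

<!-- Assumed values for the properties the index-docs target reads. -->
<property name="docs.dir" value="docs"/>
<property name="solr.url" value="http://localhost:8983/solr"/>
```

With those in the same build file, running "ant index-docs" should
walk the directory and post each file in turn.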
It should also be possible, and perhaps slightly easier and more
built-in, to do the entire iteration with DataImportHandler, which can
iterate over a list of files and read their contents into a field.
[An example of this on the wiki would be handy, or a pointer to it if
one already exists.]
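For what it's worth, here is a rough, untested sketch of what such a
DataImportHandler config might look like, using
FileListEntityProcessor to walk a directory and
PlainTextEntityProcessor to read each file into a field. The baseDir
path, the fileName pattern, the entity and data source names, and the
target field names (id, text) are all assumptions; adjust them to your
own schema:

```xml
<dataConfig>
  <!-- Reads file contents for the inner entity. -->
  <dataSource type="FileDataSource" name="files"/>
  <document>
    <!-- Outer entity: lists files under baseDir (assumed path) whose
         names match the fileName regex. rootEntity="false" means one
         Solr document is created per inner-entity row instead. -->
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="/path/to/docs" fileName=".*\.txt"
            recursive="true" rootEntity="false" dataSource="null">
      <!-- Use the file's absolute path as the unique key (assumed). -->
      <field column="fileAbsolutePath" name="id"/>
      <!-- Inner entity: reads the whole file, unparsed, into the
           implicit plainText column and maps it to "text". -->
      <entity name="doc" processor="PlainTextEntityProcessor"
              url="${f.fileAbsolutePath}" dataSource="files">
        <field column="plainText" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Note that PlainTextEntityProcessor does no parsing, so this only suits
plain text files; Word or PDF content would still need to go through
something like Solr Cell.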
Erik
On Mar 10, 2009, at 2:01 PM, KennyN wrote:
This functionality is possible 'out of the box', right? Or am I going
to need to code up something that reads in the id named files and
generates the xml file?
--
View this message in context:
http://www.nabble.com/Solr-configuration-with-Text-files-tp22438201p22440095.html
Sent from the Solr - User mailing list archive at Nabble.com.