Hi Colin,

Solr's DataImportHandler sounds like what you want:

        http://wiki.apache.org/solr/DataImportHandler
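
If you haven't used DIH before: it ships as a separate jar (apache-solr-dataimporthandler-*.jar, under dist/ in the Solr distribution), and you wire it up as a request handler in solrconfig.xml, along these lines (the handler name and config file name below are just the conventional ones):

        <requestHandler name="/dataimport"
                        class="org.apache.solr.handler.dataimport.DataImportHandler">
          <lst name="defaults">
            <str name="config">data-config.xml</str>
          </lst>
        </requestHandler>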

In particular, take a look at FileListEntityProcessor:

        http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor
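
A data-config.xml for your case might look roughly like the sketch below (untested; baseDir, the fileName regex, and the Solr field names are placeholders, and mapping fileAbsolutePath to "id" assumes your schema's uniqueKey is a string field). FileListEntityProcessor walks the directory tree recursively and hands each matching file to a nested PlainTextEntityProcessor, which reads the whole file into an implicit "plainText" column:

        <dataConfig>
          <dataSource type="FileDataSource" encoding="UTF-8" />
          <document>
            <!-- recursively list every *.txt file under baseDir -->
            <entity name="files"
                    processor="FileListEntityProcessor"
                    baseDir="/path/to/your/corpus"
                    fileName=".*\.txt"
                    recursive="true"
                    rootEntity="false"
                    dataSource="null">
              <field column="fileAbsolutePath" name="id" />
              <!-- read each file's text into the implicit 'plainText' column -->
              <entity name="file"
                      processor="PlainTextEntityProcessor"
                      url="${files.fileAbsolutePath}">
                <field column="plainText" name="content" />
              </entity>
            </entity>
          </document>
        </dataConfig>

Once that's in place, you trigger a build with something like
http://localhost:8983/solr/dataimport?command=full-import (host/port/core
depending on your setup), and DIH crawls the whole tree in one pass; no
shell script or per-file cURL calls needed.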

Steve

> -----Original Message-----
> From: csm [mailto:cmcswig...@gmail.com]
> Sent: Friday, March 04, 2011 5:50 PM
> To: solr-user@lucene.apache.org
> Subject: Help please - recursively indexing lots and lots of text files
> 
> Hi,
> 
> I'm new to Lucene/Solr and I'm trying to build an index of a large body of
> plaintext files for some corpus research that I'm doing.  There are about
> 37,000 files of typically 50-100 lines each, and they're scattered
> throughout a huge nested directory structure.  I've worked through the basic
> Solr tutorial and the text/html indexing tutorial at
> http://www.slideshare.net/LucidImagination/indexing-text-and-html-files-with-solr-4063407,
> but after some looking around, I haven't been able to find any resources
> for indexing a large number of text files that aren't all sitting in the
> same directory.
> 
> Is this simply a case of having to write a shell script to crawl through
> the whole directory tree and call cURL for every single file, or is there
> a library or utility that can do this, or just an easier way?  Any help
> would be greatly appreciated!  Alternatively, if this is a solved problem
> and I just need to RTFM, it'd be great if someone could point me in the
> right direction.
> 
> Thanks a lot,
> Colin
