Hi Colin, Solr's DataImportHandler sounds like what you want:
http://wiki.apache.org/solr/DataImportHandler

In particular, take a look at FileListEntityProcessor:
http://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor

Steve

> -----Original Message-----
> From: csm [mailto:cmcswig...@gmail.com]
> Sent: Friday, March 04, 2011 5:50 PM
> To: solr-user@lucene.apache.org
> Subject: Help please - recursively indexing lots and lots of text files
>
> Hi,
>
> I'm new to Lucene/Solr and I'm trying to build an index of a large body
> of plaintext files for some corpus research that I'm doing. There are
> about 37,000 files of typically 50-100 lines each, and they're scattered
> throughout a huge nested directory structure. I've worked through the
> basic Solr tutorial and the text/html indexing tutorial at
> http://www.slideshare.net/LucidImagination/indexing-text-and-html-files-with-solr-4063407
> but after some looking around, I haven't been able to find any resources
> for indexing a large number of text files that aren't all sitting in the
> same directory.
>
> Is this simply a case of having to write a shell script to crawl through
> the whole directory tree and call cURL for every single file, or is
> there a library or utility that can do this, or just an easier way? Any
> help would be greatly appreciated! Alternatively, if this is a solved
> problem and I just need to RTFM, it'd be great if someone could point me
> in the right direction.
>
> Thanks a lot,
> Colin
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Help-please-recursively-indexing-lots-and-lots-of-text-files-tp2635884p2635884.html
> Sent from the Solr - User mailing list archive at Nabble.com.
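As a rough sketch of what that setup looks like, here is a minimal data-config.xml that walks a directory tree recursively with FileListEntityProcessor and reads each plain-text file with PlainTextEntityProcessor. The baseDir path and the target schema fields (`id`, `text`) are placeholders for your own corpus and schema, not anything from your message:

```xml
<dataConfig>
  <!-- FileDataSource supplies a Reader over each file found below. -->
  <dataSource type="FileDataSource" name="fds" encoding="UTF-8"/>
  <document>
    <!-- Outer entity: recursively lists *.txt files under baseDir.
         dataSource="null" because listing files needs no data source;
         rootEntity="false" so each *file* (not each listing row)
         becomes a Solr document. -->
    <entity name="files"
            processor="FileListEntityProcessor"
            baseDir="/path/to/corpus"
            fileName=".*\.txt"
            recursive="true"
            rootEntity="false"
            dataSource="null">
      <field column="fileAbsolutePath" name="id"/>
      <!-- Inner entity: reads the file's contents into the
           implicit "plainText" column. -->
      <entity name="fileContent"
              processor="PlainTextEntityProcessor"
              dataSource="fds"
              url="${files.fileAbsolutePath}">
        <field column="plainText" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

With the DIH request handler registered in solrconfig.xml, you would then kick off the import with a single request, e.g. http://localhost:8983/solr/dataimport?command=full-import — no shell script or per-file cURL calls needed.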