Terry -

You’re speaking of bin/post, looks like.   bin/post is _just_ a simple tool to 
provide some basic utility.   The fact that it can recurse a directory 
structure at all is an extra bonus that really isn’t about “Solr” per se, but 
about posting content into it.   

Frankly (even as the author of bin/post), I don’t think bin/post for file 
system crawling is the right way to go.   Having Solr itself parse content 
(which is what happens when bin/post sends files into Solr’s /update/extract 
handler) is not recommended for production/scale.

All caveats aside, along with the recommendation to upsize your file 
crawler... it’s just a bin/post shell script and a Java class called 
SimplePostTool.   I’d encourage you to adapt what it does to your requirements 
so that it will send over .eml files, which apparently work when you add them 
manually (how did you test that?  curious on the details), and handle multiple 
directories.   It wasn’t designed to handle robust file crawls, but it is 
certainly there for the taking to adjust to your needs if it is close enough.   
And of course, if you want to generalize the handling and submit that back, 
then bin/post can improve!
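
To give a feel for the kind of adaptation I mean, here is a rough sketch of 
the crawling half only: several source directories, a configurable set of 
extensions (so .eml can be included), and excluded subdirectories.   This is 
not what SimplePostTool does today; the class name CrawlSketch and the method 
collect() are made up for illustration, and actually sending each file to Solr 
is left out.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Solr or SimplePostTool.
public class CrawlSketch {

    // Walks every root recursively and returns the files whose extension
    // (compared case-insensitively, without the dot) is in `extensions`.
    // Any directory listed in `excludedDirs` is skipped with its whole subtree.
    static List<Path> collect(List<Path> roots, Set<String> extensions,
                              Set<Path> excludedDirs) throws IOException {
        Set<Path> excluded = new HashSet<>();
        for (Path p : excludedDirs) {
            excluded.add(p.toAbsolutePath().normalize());
        }
        List<Path> found = new ArrayList<>();
        for (Path root : roots) {
            Files.walkFileTree(root.toAbsolutePath().normalize(),
                    new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                    return excluded.contains(dir)
                            ? FileVisitResult.SKIP_SUBTREE
                            : FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    String name = file.getFileName().toString();
                    int dot = name.lastIndexOf('.');
                    String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
                    if (extensions.contains(ext)) {
                        found.add(file);
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        }
        return found;
    }

    // Usage: java CrawlSketch <dir> [<dir> ...]
    // Lists the .eml and .txt files under all given directories; a real tool
    // would hand each path to whatever code posts it to Solr.
    public static void main(String[] args) throws IOException {
        List<Path> roots = new ArrayList<>();
        for (String arg : args) {
            roots.add(Paths.get(arg));
        }
        Set<String> extensions = new HashSet<>(Arrays.asList("eml", "txt"));
        for (Path p : collect(roots, extensions, new HashSet<>())) {
            System.out.println(p);
        }
    }
}
```

Skipping a directory in preVisitDirectory means the excluded subtree is never 
read at all, which matters for a very large directory structure.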

In short: no, bin/post can’t do the things you’re asking of it, but there’s no 
reason it couldn’t be evolved to handle those things.

        Erik


> 
> I note this message that's displayed when I begin indexing: "Entering
> auto mode. File endings considered are
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> 
> Is there a way to get it to recurse through files with different
> extensions, for example, like .eml?  When I manually add all the
> subdirectory content, solr seems to parse the content very well,
> recognizing all the standard email metadata.  I just can't get it to do
> the indexing recursively.
> 
> Second question: if I want to index files from many different source
> directories, is there a way to specify these different sources in one
> command? (Right now I have to issue a separate indexing command for each
> directory - which means I have to sit around and wait till each is
> finished.)
> 
> Third question: I have a very large directory structure that includes a
> couple of subdirectories I'd like to exclude from indexing.  Is there a
> way to index recursively, but exclude specified directories?
> 
