Terry - You’re speaking of bin/post, looks like. bin/post is _just_ a simple tool to provide some basic utility. The fact that it can recurse a directory structure at all is an extra bonus that really isn’t about “Solr” per se, but about posting content into it.
Frankly, (even as the author of bin/post) I don’t think bin/post for file system crawling is the rightest way to go. Having Solr parse content itself (which is what happens when bin/post sends files into Solr’s /update/extract handler) is not recommended for production/scale.

All caveats aside, and recommendations to upsize your file crawler... it’s just a bin/post shell script and a Java class called SimplePostTool - I’d encourage you to adapt what it does to your requirements so that it will send over .eml files, like the ones that apparently work when you add them manually (how did you test that? curious on the details), and handle multiple directories. It wasn’t designed to handle robust file crawls, but it certainly is there for your taking to adjust to your needs if it is close enough. And of course, if you want to generalize the handling and submit that back, then bin/post can improve!

In short: no, bin/post can’t do the things you’re asking of it, but there’s no reason it couldn’t be evolved to handle those things.

Erik

> I note this message that's displayed when I begin indexing: "Entering
> auto mode. File endings considered are
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log"
>
> Is there a way to get it to recurse through files with different
> extensions, for example, like .eml? When I manually add all the
> subdirectory content, solr seems to parse the content very well,
> recognizing all the standard email metadata. I just can't get it to do
> the indexing recursively.
>
> Second question: if I want to index files from many different source
> directories, is there a way to specify these different sources in one
> command? (Right now I have to issue a separate indexing command for each
> directory - which means I have to sit around and wait till each is
> finished.)
>
> Third question: I have a very large directory structure that includes a
> couple of subdirectories I'd like to exclude from indexing. Is there a
> way to index recursively, but exclude specified directories?
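
To make the "adapt it to your requirements" suggestion a bit more concrete, here is a minimal sketch of a standalone crawler covering the three questions above: extra extensions such as .eml, several source directories in one run, and excluded subdirectories. It is not how bin/post or SimplePostTool works internally. The function names (find_files, post_file), the Solr URL, the collection name and the directory paths are all hypothetical; the only Solr pieces assumed are the /update/extract handler and its literal.id parameter.

```python
import os
import urllib.parse
import urllib.request


def find_files(roots, extensions, exclude_dirs=()):
    """Yield files under each root whose extension is in `extensions`,
    never descending into a directory whose name is in `exclude_dirs`."""
    wanted = {e.lower().lstrip(".") for e in extensions}
    skip = set(exclude_dirs)
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune in place so os.walk skips excluded directories entirely.
            dirnames[:] = sorted(d for d in dirnames if d not in skip)
            for name in sorted(filenames):
                ext = os.path.splitext(name)[1].lower().lstrip(".")
                if ext in wanted:
                    yield os.path.join(dirpath, name)


def post_file(path, solr_url):
    """POST one file to Solr's extract handler, using its path as the id.

    `solr_url` is assumed to look like http://localhost:8983/solr/mycollection
    (a hypothetical collection name).
    """
    params = urllib.parse.urlencode({"literal.id": path})
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        f"{solr_url}/update/extract?{params}",
        data=body,
        # Generic type; content detection is left to the extract handler.
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Hypothetical usage: two source trees, .eml only, two excluded directories.
#
# for path in find_files(["/data/mail-2014", "/data/mail-2015"],
#                        ["eml"], exclude_dirs=["spam", "drafts"]):
#     post_file(path, "http://localhost:8983/solr/mycollection")
```

Nothing here issues a commit, so documents become searchable only once a commit happens by whatever means your setup uses. This still has Solr doing the parsing through /update/extract, so the production/scale caveat above applies just the same; it only replaces the crawling part.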