On 3/29/2018 3:59 PM, Terry Steichen wrote:
> First question: When indexing content in a directory, Solr's normal
> behavior is to recursively index all the files found in that directory
> and its subdirectories. However, turns out that when the files are of
> the form *.eml (email), solr won't do that. I can use a wildcard to get
> it to index the current directory, but it won't recurse.
At first I had no idea what program you were using. I may have figured it out; see below.

> I note this message that's displayed when I begin indexing: "Entering
> auto mode. File endings considered are
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log

That looks like the simple post tool included with Solr. If it is, type "bin/post -help" and you will see that there is a -filetypes option that lets you change the list of file extensions that are considered valid. Because eml is not in that default list, the tool skips .eml files when it walks through subdirectories, which would explain what you're seeing. Adding eml to the list with -filetypes should let it pick them up.

Note that the post tool included with Solr is a SIMPLE post tool. It's designed as a way to get your feet wet, not for heavy production usage, and its capabilities are limited. We strongly recommend that you graduate to a better indexing program. Usually that means writing one yourself, so you can be sure that it does everything YOU want it to do. The one included with Solr probably can't do some of the things you want.

Also, indexing rich documents (PDF, Office files, email, and so on) with the post tool runs Tika extraction inside Solr. Tika is a separate Apache project, and Solr includes a subset of Tika's capability that can run inside Solr. Tika is known to sometimes behave explosively when it processes certain documents. If an explosion happens in Tika while it's running inside Solr, Solr itself might crash.

Running Tika outside Solr, usually in a program that you write yourself, is highly recommended. Doing this also gives you access to the full range of Tika's capabilities.

Here's an example of a program that uses both JDBC and Tika to index to Solr:

https://lucidworks.com/2012/02/14/indexing-with-solrj/

If you search Google for "tika index solr" (without the quotes), you'll find some other examples of custom programs that use Tika to index to Solr. There may be better searches you can do as well.

Thanks,
Shawn
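For the write-it-yourself route, here is a minimal sketch of just the file-discovery step, in Java because SolrJ and Tika are Java libraries. It walks a directory tree recursively and keeps only files whose extension is in an allow-list, roughly the kind of filtering that the "File endings considered" message describes. This is not the post tool's actual code; the names FileCollector, collect, and extensionOf are hypothetical, and the Tika extraction and SolrJ submission steps are deliberately left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: recursively collects files whose extension is in an allow-list. */
final class FileCollector {

    /** Returns the lowercase extension of a file name, or "" if it has none. */
    static String extensionOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Walks root and all of its subdirectories, keeping regular files with an allowed extension. */
    static List<Path> collect(Path root, Set<String> allowedExtensions) throws IOException {
        // Files.walk is recursive; try-with-resources closes the underlying directory handles.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> allowedExtensions.contains(extensionOf(p)))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        // "eml" is in this assumed allow-list; drop it and .eml files are silently skipped.
        Set<String> allowed = Set.of("eml", "txt", "pdf");
        for (Path p : collect(root, allowed)) {
            System.out.println(p);
        }
    }
}
```

Run it with a directory as the first argument. Each path it prints is where a real indexer would hand the file to Tika and send the result to Solr. Removing "eml" from the allow-list reproduces the symptom described above: the .eml files simply never show up.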
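The argument for running Tika outside Solr is about containing failures, and that idea can be sketched in plain Java. The hypothetical GuardedExtraction class below runs each extraction on a worker thread with a timeout, and treats an exception or a hang as "skip this document" rather than a fatal error. Extractor is an assumed interface standing in for a real Tika call; it is not part of Tika's API. Threads only contain exceptions and hangs: an OutOfMemoryError or a JVM crash can still take the whole program down, so running extraction in a separate process gives stronger isolation.

```java
import java.nio.file.Path;
import java.util.Optional;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical wrapper that keeps one misbehaving document from stopping the indexer. */
final class GuardedExtraction implements AutoCloseable {

    /** Assumed stand-in for a Tika extraction call (not a Tika API). */
    interface Extractor {
        String extract(Path file) throws Exception;
    }

    private final Extractor extractor;
    private final long timeoutMillis;
    private ExecutorService pool = newPool();

    GuardedExtraction(Extractor extractor, long timeoutMillis) {
        this.extractor = extractor;
        this.timeoutMillis = timeoutMillis;
    }

    /** Single worker thread, marked daemon so a stuck parser cannot keep the JVM alive. */
    private static ExecutorService newPool() {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "extractor");
            t.setDaemon(true);
            return t;
        });
    }

    /** Returns the extracted text, or empty if extraction threw or exceeded the timeout. */
    Optional<String> tryExtract(Path file) {
        Future<String> future = pool.submit(() -> extractor.extract(file));
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // Interrupt the worker and start a fresh pool, so a parser that ignores
            // interrupts cannot block the documents that come after it.
            future.cancel(true);
            pool.shutdownNow();
            pool = newPool();
            return Optional.empty();
        } catch (ExecutionException e) {
            // The extractor threw (this also covers Errors such as StackOverflowError).
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

In an indexing loop, an empty result would be logged and the file skipped, and the loop moves on to the next document instead of dying along with it.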