On Apr 23, 2012, at 9:27 AM, Erick Erickson wrote:

> 50 hours is a really long time for 2M docs though, so something
> doesn't seem right unless the docs are really unusual.

Don't forget he's n-gramming ;-) There's not much you could ask of text analysis that's more demanding, except for throwing shingling in there too for good measure[*].

Neosky, you should consider using Solr trunk, which has dramatic multithreaded indexing performance improvements if your hardware is capable. If you try trunk, use a large ramBufferSizeMB (say 2GB worth), but if you stick with Solr 3.x, use 1GB.

And finally, increasing your mergeFactor will increase indexing performance at the expense of search speed. You could throw in an optimize at the very end with maxSegments=10 or something to compensate.

~ David Smiley

[*] That was a joke.
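
P.S. For the optimize at the very end, here is a minimal sketch in Python of what the request could look like. It assumes a single-core Solr at the hypothetical address http://localhost:8983/solr with the stock /update handler, which accepts optimize and maxSegments as request parameters. The helper names optimize_url and run_optimize are made up for illustration; adjust the URL for your own host and core.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: hypothetical single-core Solr instance; change to match your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def optimize_url(base_url=SOLR_UPDATE_URL, max_segments=10):
    """Build the update-handler URL for an optimize down to at most max_segments segments."""
    params = {"optimize": "true", "maxSegments": max_segments}
    return base_url + "?" + urlencode(params)


def run_optimize(base_url=SOLR_UPDATE_URL, max_segments=10):
    """Send the optimize request and return the raw response body.

    Only call this once bulk indexing has finished; merging can take a while.
    """
    with urlopen(optimize_url(base_url, max_segments)) as response:
        return response.read()


if __name__ == "__main__":
    # Print the URL only; call run_optimize() yourself against a live Solr.
    print(optimize_url())
```

Stopping at 10 segments rather than merging all the way down to 1 keeps the final merge cheaper while still recovering most of the search speed given up to the high mergeFactor.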