I have a scenario in which I need to post 500,000 documents to Solr as a
test. The documents are already in XML files, formatted in Solr's XML
format.

Posting them to Solr with a single post.jar instance takes 1m55s. With a
bit of bash jiggery-pokery, I was able to get this down to 1m08s by
running four concurrent post.jar instances, which strikes me as a
significant improvement (roughly 40% less wall-clock time).
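
For anyone wanting to try the same thing, here is a minimal sketch of
one way to run several post.jar instances at once. It is not the exact
script I used. The function name post_parallel, the batch size, and the
POST_CMD override (there only so the function can be dry-run without
Solr) are all made up for illustration, and it relies on the -0 and -P
options of xargs, which GNU and BSD xargs have but POSIX does not
require.

```shell
# Sketch only: fan XML files out to several concurrent post.jar runs.
# post_parallel, the batch size and POST_CMD are illustrative names,
# not part of Solr or of post.jar.
post_parallel() {
  local dir="$1"           # directory containing the *.xml files
  local jobs="${2:-4}"     # how many post.jar processes to run at once
  local batch="${3:-1000}" # how many files to hand to each invocation

  # find emits NUL-separated paths; xargs groups them into batches and
  # keeps up to $jobs commands running in parallel. POST_CMD is left
  # unquoted on purpose so the default splits into "java -jar post.jar".
  find "$dir" -name '*.xml' -print0 |
    xargs -0 -n "$batch" -P "$jobs" ${POST_CMD:-java -jar post.jar}
}
```

Something like "post_parallel ./docs 4" would then approximate the four
concurrent instances described above, and "POST_CMD=echo post_parallel
./docs 4" just prints the batches without posting anything.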

I'm considering adding multithreaded capabilities to post.jar. Given
that the SimplePostTool is becoming far from simple, I wanted to see
whether anyone else would find the feature useful, and whether it is
likely to be accepted, before I put in the effort. I would also need to
decide which parts of the tool should get it. Currently I only want it
for posting XML docs, but the tool also has crawling capabilities.

Thoughts?

Upayavira
