Wiki page exists already: http://wiki.apache.org/solr/post.jar
I'm happy to consider a refactoring, especially if it makes the tool
SIMPLER to read and interact with and doesn't add a ton of mandatory
dependencies. It should probably still be possible to say something like

    javac org/apache/solr/util/SimplePostTool.java
    java -cp . org.apache.solr.util.SimplePostTool -h

That's just how I've been thinking so far, though. If other committers are
happy with abandoning the simple-ness and instead creating a
best-practices-based, feature-rich tool with dependencies, then I'll not
object.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 5 Feb 2013, at 05:22, Upayavira <u...@odoko.co.uk> wrote:

> Thx Jan,
>
> All I know is I've got a data set of 500k documents, Solr formatted, and
> I want it to be as easy as possible to get them into Solr. I also want
> to be able to show the benefit of multithreading. The outcome would
> really be "make sure your code uses multiple threads to push to Solr"
> rather than "use post.jar in production". I see post.jar as a
> demonstration tool, rather than anything else, and am considering adding
> another feature to enhance that.
>
> However, I did stall once I started looking at the SimplePostTool
> class, because it is losing its connection with the term 'Simple'.
> Adding multithreading, however useful or correct, would
> completely push it over the edge. Thus, I think the proper approach is
> to refactor the tool into a number of classes, and only then think about
> adding multithreading as a completely separate affair. I'm more than
> happy to have a go at that refactoring, especially if you're prepared to
> review it.
>
> I guess the other thing that is much needed is a wiki page that details
> the features of the tool, and also explains that its role is
> educational, rather than anything else.
>
> Upayavira
>
> On Mon, Feb 4, 2013, at 09:10 PM, Jan Høydahl wrote:
>> Hi,
>>
>> Hmm, the tool is getting bloated for a one-class no-deps tool already :)
>> I guess real-life code examples using SolrJ and other libs (such as a
>> robots.txt lib, commons-cli etc.) would be useful too, but whether
>> that should be an extension of SimplePostTool or a totally new tool from
>> scratch is something to discuss. Please bring on your ideas of how you
>> plan to extend it, perhaps even simplifying the code in the process?
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>>
>> On 3 Feb 2013, at 17:19, Upayavira <u...@odoko.co.uk> wrote:
>>
>>> I have a scenario in which I need to post 500,000 documents to Solr as a
>>> test. I have these documents in XML files already formatted in Solr's
>>> XML format.
>>>
>>> Posting to Solr using post.jar takes 1m55s. With a bit of bash
>>> jiggery-pokery, I was able to get this down to 1m08s by running four
>>> concurrent post.jar instances, which strikes me as a significant
>>> improvement.
>>>
>>> I'm considering adding multithreaded capabilities to post.jar, but
>>> before I go to that effort, I wanted to see if anyone else would
>>> consider it a useful feature. Given that the SimplePostTool is becoming
>>> far from simple, I wanted to see whether the feature is likely to be
>>> accepted before I put in the effort. Also, I would need to consider
>>> which parts of the tool to add that to. Currently I only want it for
>>> posting XML docs, but there are also crawling capabilities in it.
>>>
>>> Thoughts?
>>>
>>> Upayavira
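
For readers who want to see what "use multiple threads to push to Solr" might look like outside post.jar, here is a minimal sketch. It is not code from SimplePostTool and not a proposed patch. The class name ParallelPostSketch, the methods postAll and postFile, the thread count of four, and the update URL http://localhost:8983/solr/update are all assumptions made for illustration. It needs Java 11 or later for java.net.http, sends no commit, and counts only HTTP 200 responses as successes.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntFunction;

// Hypothetical helper, not part of Solr: posts files on a fixed thread pool.
class ParallelPostSketch {

    /** Runs `poster` for every file on `threads` workers; returns how many calls returned 200. */
    static int postAll(List<Path> files, int threads, ToIntFunction<Path> poster)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (Path file : files) {
                Callable<Integer> task = () -> poster.applyAsInt(file);
                results.add(pool.submit(task));
            }
            int ok = 0;
            for (Future<Integer> result : results) {
                try {
                    if (result.get() == 200) {
                        ok++;
                    }
                } catch (ExecutionException e) {
                    // The poster threw; treat that file as a failed post.
                }
            }
            return ok;
        } finally {
            pool.shutdown();
        }
    }

    /** Sends one XML file as the request body; returns the HTTP status, or -1 on I/O failure. */
    static int postFile(HttpClient client, URI updateUrl, Path xmlFile) {
        try {
            HttpRequest request = HttpRequest.newBuilder(updateUrl)
                    .header("Content-Type", "application/xml")
                    .POST(HttpRequest.BodyPublishers.ofFile(xmlFile))
                    .build();
            return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
        } catch (IOException e) {
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed update URL; adjust to the actual Solr instance.
        URI updateUrl = URI.create("http://localhost:8983/solr/update");
        HttpClient client = HttpClient.newHttpClient();
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Path.of(arg));
        }
        int ok = postAll(files, 4, file -> postFile(client, updateUrl, file));
        System.out.println(ok + " of " + files.size() + " files posted");
    }
}
```

Keeping the pool logic (postAll) separate from the HTTP call (postFile) is one way to add concurrency without growing a single class further, which is the concern raised in the thread. Whether this would reproduce the 1m55s to 1m08s improvement reported above is untested.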