Btw, wouldn't this be a chance to create a Solr CLI tool, much like es2unix? Maybe with a shell? I'm offline now, but I recently came across a Java lib that makes this easy... jclam jsomething ...
Otis
Solr & ElasticSearch Support
http://sematext.com/

On Feb 6, 2013 8:48 AM, "Jan Høydahl" <jan....@cominvent.com> wrote:

> With dependencies I meant external jar dependencies. Perhaps extensions
> could have deps while leaving the "core" compilable without?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 5 Feb 2013, at 17:10, Upayavira <u...@odoko.co.uk> wrote:
>
> > By dependencies, do you mean other java classes? I was thinking of
> > splitting it out into a few classes, each of which is clearer in its
> > purpose.
> >
> > Upayavira
> >
> > On Tue, Feb 5, 2013, at 02:26 PM, Jan Høydahl wrote:
> >> Wiki page exists already: http://wiki.apache.org/solr/post.jar
> >>
> >> I'm happy to consider a refactoring, especially if it makes it SIMPLER to
> >> read and interact with and doesn't add a ton of mandatory dependencies.
> >> It should probably still be possible to say something like
> >>
> >> javac org/apache/solr/util/SimplePostTool.java
> >> java -cp . org.apache.solr.util.SimplePostTool -h
> >>
> >> That's just how I've been thinking so far though. If other committers are
> >> happy with abandoning the simple-ness and instead creating a
> >> best-practices based, feature-rich tool with dependencies, then I'll not
> >> object.
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >> Solr Training - www.solrtraining.com
> >>
> >> On 5 Feb 2013, at 05:22, Upayavira <u...@odoko.co.uk> wrote:
> >>
> >>> Thx Jan,
> >>>
> >>> All I know is I've got a data set of 500k documents, Solr formatted, and
> >>> I want it to be as easy as possible to get them into Solr. I also want
> >>> to be able to show the benefit of multithreading. The outcome would
> >>> really be "make sure your code uses multiple threads to push to Solr"
> >>> rather than "use post.jar in production".
> >>> I see post.jar as a demonstration tool, rather than anything else, and
> >>> am considering adding another feature to enhance that.
> >>>
> >>> However, I did stall once I started looking at the SimplePostTool
> >>> class, because it is losing its connection with the term 'Simple'.
> >>> Adding multithreading, however useful, correct, whatever, would
> >>> completely push it over the edge. Thus, I think the proper approach is
> >>> to refactor the tool into a number of classes, and only then think about
> >>> adding multithreading as a completely separate affair. I'm more than
> >>> happy to have a go at that refactoring, especially if you're prepared to
> >>> review it.
> >>>
> >>> I guess the other thing that is much needed is a wiki page that details
> >>> the features of the tool, and also explains that its role is
> >>> educational, rather than anything else.
> >>>
> >>> Upayavira
> >>>
> >>> On Mon, Feb 4, 2013, at 09:10 PM, Jan Høydahl wrote:
> >>>> Hi,
> >>>>
> >>>> Hmm, the tool is getting bloated for a one-class no-deps tool already :)
> >>>> Guess it would also be useful to have real-life code examples using
> >>>> SolrJ and other libs (such as a robots.txt lib, commons-cli etc.), but
> >>>> whether that should be an extension of SimplePostTool or a totally new
> >>>> tool from scratch is something to discuss. Please bring on your ideas of
> >>>> how you plan to extend it, perhaps even simplifying the code in the
> >>>> process?
> >>>>
> >>>> --
> >>>> Jan Høydahl, search solution architect
> >>>> Cominvent AS - www.cominvent.com
> >>>> Solr Training - www.solrtraining.com
> >>>>
> >>>> On 3 Feb 2013, at 17:19, Upayavira <u...@odoko.co.uk> wrote:
> >>>>
> >>>>> I have a scenario in which I need to post 500,000 documents to Solr as
> >>>>> a test. I have these documents in XML files already formatted in
> >>>>> Solr's XML format.
> >>>>>
> >>>>> Posting to Solr using post.jar takes 1m55s.
> >>>>> With a bit of bash jiggery-pokery, I was able to get this down to
> >>>>> 1m08s by running four concurrent post.jar instances, which strikes me
> >>>>> as a significant improvement.
> >>>>>
> >>>>> I'm considering adding multithreaded capabilities to post.jar, but
> >>>>> before I go to that effort, I wanted to see if anyone else would
> >>>>> consider it a useful feature. Given that the SimplePostTool is becoming
> >>>>> far from simple, I wanted to see whether the feature is likely to be
> >>>>> accepted before I put in the effort. Also, I would need to consider
> >>>>> which parts of the tool to add that to. Currently I only want it for
> >>>>> posting XML docs, but there are also crawling capabilities in it.
> >>>>>
> >>>>> Thoughts?
> >>>>>
> >>>>> Upayavira
> >>>>
> >>
> >
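The multithreading idea in this thread (one client pushing to Solr from several threads, instead of several post.jar processes launched from bash) can be sketched as a single JDK-only class, in keeping with the "still compilable with plain javac, no external jars" constraint discussed above. This is a hypothetical illustration, not SimplePostTool or post.jar code: the class name ParallelPost, its Poster interface and the example update URL are assumptions. It assumes each file already contains a Solr XML update message (for example an add element wrapping doc elements) and that the update handler accepts a bare <commit/> body.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: post Solr-formatted XML files from a fixed thread pool, then commit once. */
public class ParallelPost {

    /** Posts one file somewhere and returns an HTTP-style status code. */
    public interface Poster {
        int post(Path file) throws IOException;
    }

    /** POSTs a raw body to the update URL as application/xml; returns the HTTP status. */
    static int send(URL updateUrl, byte[] body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) updateUrl.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        int status = conn.getResponseCode();
        // Read and discard the response body so the JDK can reuse the connection.
        InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream();
        if (in != null) {
            try (InputStream r = in) {
                byte[] buf = new byte[8192];
                while (r.read(buf) != -1) {
                    // discard
                }
            }
        }
        return status;
    }

    /** A Poster that sends each file's bytes, unchanged, to the given update URL. */
    public static Poster http(URL updateUrl) {
        return file -> send(updateUrl, Files.readAllBytes(file));
    }

    /** Posts every file using `threads` workers; returns how many posts failed. */
    public static int postAll(List<Path> files, int threads, Poster poster)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (Path f : files) {
                results.add(pool.submit(() -> poster.post(f)));
            }
            int failures = 0;
            for (Future<Integer> r : results) {
                try {
                    int status = r.get();
                    if (status < 200 || status >= 300) failures++;  // non-2xx response
                } catch (ExecutionException e) {
                    failures++;  // the poster threw, e.g. an IOException
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }

    /** Usage: java ParallelPost <updateUrl> <threads> file1.xml [file2.xml ...] */
    public static void main(String[] args) throws Exception {
        URL url = URI.create(args[0]).toURL();
        int threads = Integer.parseInt(args[1]);
        List<Path> files = new ArrayList<>();
        for (int i = 2; i < args.length; i++) {
            files.add(Paths.get(args[i]));
        }
        int failures = postAll(files, threads, http(url));
        // A single commit once all worker threads have finished.
        int commitStatus = send(url, "<commit/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(files.size() + " files, " + failures + " failed, commit HTTP " + commitStatus);
    }
}
```

For example: javac ParallelPost.java, then java ParallelPost http://localhost:8983/solr/update 4 docs/*.xml, where the URL is an assumption to adjust for your Solr version and core, and 4 mirrors the four concurrent post.jar instances from the bash experiment. The design choice is one shared thread pool in one JVM plus a single commit after every post has finished; no timings are claimed for it.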