Wiki page exists already: http://wiki.apache.org/solr/post.jar

I'm happy to consider a refactoring, especially if it makes the tool SIMPLER
to read and interact with and doesn't add a ton of mandatory dependencies. It
should probably still be possible to say something like

  javac org/apache/solr/util/SimplePostTool.java
  java -cp . org.apache.solr.util.SimplePostTool -h

That's just how I've been thinking so far, though. If other committers are happy 
to abandon the simplicity and instead create a best-practices-based, 
feature-rich tool with dependencies, then I won't object.
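
As a rough illustration of the multithreading idea discussed below, here is a
minimal sketch that stays within the one-class, no-dependency constraint. It is
not SimplePostTool code: the class name ParallelPostSketch, the methods postAll
and postFile, the argument order, and the application/xml content type are all
assumptions made for this example, and it never sends a commit.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical sketch, JDK classes only; not part of Solr.
class ParallelPostSketch {

    /** Runs poster on every item in a fixed-size pool; returns the number of successes. */
    static <T> int postAll(List<T> items, int threads, Predicate<T> poster) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (T item : items) {
                Callable<Boolean> task = () -> poster.test(item);
                results.add(pool.submit(task));
            }
            int ok = 0;
            for (Future<Boolean> result : results) {
                try {
                    if (result.get()) {
                        ok++;
                    }
                } catch (ExecutionException e) {
                    // The poster threw an exception: count the item as failed.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return ok;
        } finally {
            pool.shutdown();
        }
    }

    /** POSTs one file to the given URL; true on HTTP 200. Content type is an assumption. */
    static boolean postFile(Path file, String updateUrl) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(updateUrl).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/xml");
            try (OutputStream out = conn.getOutputStream()) {
                Files.copy(file, out);
            }
            return conn.getResponseCode() == 200;
        } catch (IOException e) {
            return false;
        }
    }

    /** Usage (assumed argument order): <update-url> <threads> <file>... */
    public static void main(String[] args) {
        String url = args[0];
        int threads = Integer.parseInt(args[1]);
        List<Path> files = new ArrayList<>();
        for (int i = 2; i < args.length; i++) {
            files.add(Paths.get(args[i]));
        }
        int ok = postAll(files, threads, f -> postFile(f, url));
        System.out.println(ok + " of " + files.size() + " files posted");
    }
}
```

It could be compiled and run in the same spirit as above; the update URL here
is only an example and depends on the local setup:

  javac ParallelPostSketch.java
  java -cp . ParallelPostSketch http://localhost:8983/solr/update 4 docs/*.xml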

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 5 Feb 2013, at 05:22, Upayavira <u...@odoko.co.uk> wrote:

> Thx Jan,
> 
> All I know is I've got a data set of 500k documents, Solr formatted, and
> I want it to be as easy as possible to get them into Solr. I also want
> to be able to show the benefit of multithreading. The outcome would
> really be "make sure your code uses multiple threads to push to Solr"
> rather than "use post.jar in production". I see post.jar as a
> demonstration tool, rather than anything else, and am considering adding
> another feature to enhance that.
> 
> However, I did stall once I started looking at the SimplePostTool
> class, because it is losing its connection with the term 'Simple'.
> Adding multithreading, however useful, correct, whatever, would
> completely push it over the edge. Thus, I think the proper approach is
> to refactor the tool into a number of classes, and only then think about
> adding multithreading as a completely separate affair. I'm more than
> happy to have a go at that refactoring, especially if you're prepared to
> review it.
> 
> I guess the other thing that is much needed is a wiki page that details
> the features of the tool, and also explains that its role is
> educational, rather than anything else.
> 
> Upayavira
> 
> On Mon, Feb 4, 2013, at 09:10 PM, Jan Høydahl wrote:
>> Hi,
>> 
>> Hmm, the tool is getting bloated for a one-class no-deps tool already :)
>> I guess real-life code examples using SolrJ and other libs (such as a
>> robots.txt lib, commons-cli, etc.) would be useful too, but whether that
>> should be an extension of SimplePostTool or a totally new tool from
>> scratch is something to discuss. Please share your ideas for how you
>> plan to extend it, perhaps even simplifying the code in the process.
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>> 
>> On 3 Feb 2013, at 17:19, Upayavira <u...@odoko.co.uk> wrote:
>> 
>>> I have a scenario in which I need to post 500,000 documents to Solr as a
>>> test. I have these documents in XML files already formatted in Solr's
>>> XML format.
>>> 
>>> Posting them to Solr using post.jar takes 1m55s. With a bit of bash
>>> jiggery-pokery, I was able to get this down to 1m08s by running four
>>> concurrent post.jar instances, which strikes me as a significant
>>> improvement.
>>> 
>>> I'm considering adding multithreaded capabilities to post.jar, but
>>> before I go to that effort, I wanted to see if anyone else would
>>> consider it a useful feature. Given that the SimplePostTool is becoming
>>> far from simple, I wanted to see whether the feature is likely to be
>>> accepted before I put in the effort. Also, I would need to consider
>>> which parts of the tool to add that to. Currently I only want it for
>>> posting XML docs, but there are also crawling capabilities in it.
>>> 
>>> Thoughts?
>>> 
>>> Upayavira
>> 
