Re: Best approach to handle large volume of documents with constantly high incoming rate?

2014-03-22 Thread Jack Krupansky
I defer to Erick on this level of detail and experience. Let's continue the discussion - some of it will be a matter of how to configure and tune Solr, how to select, configure, and tune hardware, the need for further Lucene/Solr improvements, and how much further we have to go to get to th

Re: Best approach to handle large volume of documents with constantly high incoming rate?

2014-03-22 Thread shushuai zhu
Erick, Thanks a lot for the detailed answers. They are very helpful and I did get some ideas from them. As for our searches, we will mainly do term and field (AND/OR) searches, histograms, and faceting. Generally the queries are bound by time (e.g., last hour, last day, last week, or even last mon

Re: Best approach to handle large volume of documents with constantly high incoming rate?

2014-03-22 Thread Erick Erickson
Well, the "commonsense limits" Jack is referring to in that post are more (IMO) scales at which you should count on having to do some _serious_ prototyping/configuring/etc. As you scale out, you'll run into edge cases that aren't the common variety, aren't reliably tested every night, etc. I mean how would

Re: Best approach to handle large volume of documents with constantly high incoming rate?

2014-03-22 Thread shushuai zhu
Jack, thanks for your reply. Sorry for the confusion about 4 nodes. What I meant was to use 4 nodes to do some POC, mainly focusing on handling the high incoming rate over a few days instead of storing data for over one year. You estimated the required nodes (6,308) and storage (322TB) based on the

Re: Rounding errors with SOLR score

2014-03-22 Thread William Bell
I will send the debugQuery. They are exactly the same. On Fri, Mar 21, 2014 at 2:59 AM, Raymond Wiker wrote: > Are you sure that SOLR is rounding incorrectly, and not simply differently > from what you expect? I was surprised myself at some of the rounding > behaviour I saw with SOLR, but acco

Re: Solr4.7 No live SolrServers available to handle this request

2014-03-22 Thread Michael Sokolov
Excellent, thanks Shalin! On 3/22/2014 3:32 PM, Shalin Shekhar Mangar wrote: Thanks Michael! I just committed your fix. It will be released with 4.7.1 On Fri, Mar 21, 2014 at 8:30 PM, Michael Sokolov wrote: I just managed to track this down -- as you said the disconnect was a red herring. Ul

Re: Solr Cloud collection keep going down?

2014-03-22 Thread Shawn Heisey
On 3/22/2014 1:23 PM, Software Dev wrote: > We have 2 collections with 1 shard each replicated over 5 servers in the > cluster. We see a lot of flapping (down or recovering) on one of the > collections. When this happens the other collection hosted on the same > machine is still marked as active. W

Re: using SolrJ with SolrCloud, searching multiple indexes.

2014-03-22 Thread Shawn Heisey
On 3/22/2014 7:34 AM, Russell Taylor wrote: > Yeah sorry didn't explain myself there, one of the three zookeepers will > return me one of the solrcloud machines for me to access the index. I either > need to know which machine it returned(is this feasible I can't seem to find > a way to access i

Re: Limit on # of collections -SolrCloud

2014-03-22 Thread Chris W
I figured out that most of the startup time seems to be spent waiting for replicas to recover. It waits from 6 seconds all the way up to 600 seconds for replicas to recover before trying again; sometimes it succeeds, and otherwise it marks the core as down. Is there a way to reduce the timeout whi

Re: Solr4.7 No live SolrServers available to handle this request

2014-03-22 Thread Shalin Shekhar Mangar
Thanks Michael! I just committed your fix. It will be released with 4.7.1 On Fri, Mar 21, 2014 at 8:30 PM, Michael Sokolov wrote: > I just managed to track this down -- as you said the disconnect was a red > herring. > > Ultimately the problem was caused by a custom analysis component we wrote >

Re: Solr Cloud collection keep going down?

2014-03-22 Thread Software Dev
Some logs; the core in question is "items". - WARN - 2014-03-22 02:37:13.344; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for zkNodeName=10.0.14.101:8983_solr_itemscore=items WARN

Solr Cloud collection keep going down?

2014-03-22 Thread Software Dev
We have 2 collections with 1 shard each replicated over 5 servers in the cluster. We see a lot of flapping (down or recovering) on one of the collections. When this happens the other collection hosted on the same machine is still marked as active. When this happens it takes a fairly long time (~30

Re: Best approach to handle large volume of documents with constantly high incoming rate?

2014-03-22 Thread Jack Krupansky
20K docs/sec = 20,000 * 60 * 60 * 24 = 1,728,000,000 = 1.7 billion docs/day * 365 = 630,720,000,000 = 631 billion docs/yr At 100 million docs/node = 6,308 nodes! And you think you can do it with 4 nodes? Oh, and that's before replication! 0.5K/doc * 631 billion docs = 322 TB. -- Jack Krupans
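The arithmetic in the estimate above can be reproduced with a short sketch. The ingest rate, the 100 million docs/node figure, and the 0.5K document size come from the post; reading "0.5K" as 512 bytes and "TB" as 10^12 bytes are assumptions, chosen because they appear to match the quoted 322 TB figure. The variable names are illustrative only.

```python
# Back-of-the-envelope sizing, following the figures quoted in the post.
DOCS_PER_SEC = 20_000            # incoming rate from the post
SECONDS_PER_DAY = 60 * 60 * 24
DAYS_PER_YEAR = 365
DOCS_PER_NODE = 100_000_000      # per-node capacity figure used in the post
BYTES_PER_DOC = 512              # assumption: "0.5K" read as 512 bytes
BYTES_PER_TB = 10**12            # assumption: decimal terabytes

docs_per_day = DOCS_PER_SEC * SECONDS_PER_DAY
docs_per_year = docs_per_day * DAYS_PER_YEAR

# Integer ceiling division: a partially filled node still counts as a node.
nodes = -(-docs_per_year // DOCS_PER_NODE)

# Raw document storage only, before replication and index overhead.
storage_tb = docs_per_year * BYTES_PER_DOC / BYTES_PER_TB

print(f"docs/day:  {docs_per_day:,}")
print(f"docs/year: {docs_per_year:,}")
print(f"nodes:     {nodes:,}")
print(f"storage:   {storage_tb:.1f} TB")
```

With a replication factor of r, both the node count and the storage figure would scale by roughly r, which is the point of the "before replication" remark.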

Re: Best approach to handle large volume of documents with constantly high incoming rate?

2014-03-22 Thread shushuai zhu
Any thoughts? Can Solr Cloud support such a use case with acceptable performance? On Thursday, March 20, 2014 7:51 PM, shushuai zhu wrote: Hi, I am looking for some advice on handling a large volume of documents with a very high incoming rate. The size of each document is about 0.5 KB and the i

RE: using SolrJ with SolrCloud, searching multiple indexes.

2014-03-22 Thread Russell Taylor
Yeah sorry didn't explain myself there, one of the three zookeepers will return me one of the solrcloud machines for me to access the index. I either need to know which machine it returned(is this feasible I can't seem to find a way to access information in SolrCloudServer) and then add the ext

setting up solr on tomcat

2014-03-22 Thread anupamk
Hi, Is the SolrTomcat wiki article valid for solr-4.7.0? http://wiki.apache.org/solr/SolrTomcat I am not able to deploy solr after following the instructions there. When I try to access the solr admin page I get a 404. I followed every step exactly as mentioned in the wiki, still no dice. A