FYI: I've committed the rest of the work I was doing on trunk in this area.
On Aug 2, 2012, at 4:42 PM, Timothy Potter <thelabd...@gmail.com> wrote:

> Yes, I can but won't get to it today unfortunately. I had my eval
> environment running on some very expensive EC2 instances and shut it
> down for the time being until I can focus on it again. Will try to get
> back to this either tomorrow or over the weekend. Sorry for the delay.
>
> Tim
>
> On Thu, Aug 2, 2012 at 1:35 PM, Mark Miller <markrmil...@gmail.com> wrote:
>> Can you do me a favor and try not using the batch add for a run?
>>
>> Just do the add one doc at a time. (solrServer.add(doc) rather than
>> solrServer.add(collection))
>>
>> I just fixed one issue with it this morning on trunk - it may be the cause
>> of this oddity.
>>
>> I'm also working on some performance issues around that method too (good
>> performance without starting thousands of threads).
>>
>> Until I get all that straightened out (hopefully very soon), I think you
>> will have better luck not using the bulk, collection add method.
>>
>> On Aug 2, 2012, at 2:16 PM, Timothy Potter <thelabd...@gmail.com> wrote:
>>
>>> Thanks Mark.
>>>
>>> I'm actually using SolrJ 3.4.0, so using CommonsHttpSolrServer:
>>>
>>>     Collection<SolrInputDocument> batch = ...
>>>     ... build up batch ...
>>>     solrServer.add( batch );
>>>
>>> Basically, I have a custom Pig StoreFunc that sends docs to Solr from
>>> our Hadoop analytics nodes. The reason I'm not using SolrJ 4.0.0-ALPHA
>>> is that I couldn't get it to run in my Hadoop environment. There's
>>> some classpath conflict with the Apache HttpClient. SolrJ 4 depends on
>>> 4.1.3 but when I run it in my env, I get the following:
>>>
>>> Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager: method <init>()V not found
>>>     at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:94)
>>>     at org.apache.solr.client.solrj.impl.CloudSolrServer.<init>(CloudSolrServer.java:70)
>>>     ... 16 more
>>>
>>> I spent hours trying to resolve the classpath issue and finally had to
>>> bail and just used the 3.4 SolrJ client as I'm just at the evaluation
>>> stage at this point. So it sounds like this could be the cause of my
>>> problems.
>>>
>>> One other thing ... I do have the _version_ field defined in my
>>> schema.xml but am not setting it on the client side when indexing.
>>> Should I be doing that?
>>>
>>> Cheers,
>>> Tim
>>>
>>> On Thu, Aug 2, 2012 at 11:27 AM, Mark Miller <markrmil...@gmail.com> wrote:
>>>>
>>>> On Aug 2, 2012, at 11:08 AM, Timothy Potter <thelabd...@gmail.com> wrote:
>>>>
>>>>> Just starting to get into SolrCloud using 4.0.0-ALPHA and am very
>>>>> impressed so far ...
>>>>>
>>>>> I have a 12-shard index with ~104M docs with each shard having
>>>>> 1 replica (so 24 Solr servers running)
>>>>>
>>>>> Using the Query form on the Admin panel, I issue the MatchAllDocsQuery
>>>>> (*:*) and each time I send the request the value for numFound in the
>>>>> result is different. It's always close but not exactly the same as I
>>>>> would expect? Can anyone shed some light on this issue? I also tried a
>>>>> real query, such as "#olympics lochte" and same thing - different
>>>>> numFound each time. The first page of actual docs returned is the same
>>>>> so maybe I should just ignore the numFound issue?
>>>>>
>>>>> Note that while experiencing this behavior, I am not adding any docs
>>>>> to the index and all docs have been committed with waitFlush=true and
>>>>> waitSearcher=true on the commit. Also, not doing soft commits at this
>>>>> point. In addition, after having committed all 104M docs, I hit the
>>>>> optimize button on the panel so I have only 1 segment. In other words,
>>>>> the index is not being updated and has been optimized at this point.
>>>>
>>>> How are you adding docs? Eg what client and what method in particular
>>>> (what is your line of code that actually adds the doc).
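Tim's `NoSuchMethodError` says that the `ThreadSafeClientConnManager` class actually loaded at runtime has no no-argument constructor (`<init>()V`), which suggests a different HttpClient version than the 4.1.3 SolrJ 4 was built against is winning on the Hadoop classpath. A small check like the one below, run with the same classpath as the Pig job, can show which jar the class came from. It is a sketch using only JDK reflection; the class name `ClassOrigin` and its methods are hypothetical and not part of SolrJ, HttpClient, or Hadoop. Note that it tests for a *public* no-arg constructor, which is what SolrJ's call needs.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports where a class was loaded from and
// whether it exposes the public no-arg constructor a caller expects.
final class ClassOrigin {

    // Returns the jar/directory URL the class was loaded from, or a marker
    // string if it is missing or came from the JDK's bootstrap loader.
    static String origin(String className) {
        try {
            // initialize=false: just load the class, don't run static initializers
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "<bootstrap/JDK>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not on classpath>";
        }
    }

    // True if the class is loadable and declares a public no-arg constructor,
    // i.e. the "<init>()V" the NoSuchMethodError reported as missing.
    static boolean hasPublicNoArgConstructor(String className) {
        try {
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            c.getConstructor(); // only finds public constructors
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
            ? args[0]
            : "org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager";
        System.out.println(name + " loaded from: " + origin(name));
        System.out.println("public no-arg constructor: " + hasPublicNoArgConstructor(name));
    }
}
```

If the printed location is a Hadoop-bundled HttpClient jar rather than the 4.1.3 jar SolrJ 4 ships with, that jar is the one to move behind (or shade away from) SolrJ's dependencies.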
>>>>
>>>> You can find the numFound result for each node by passing the param
>>>> distrib=false. What does this tell you? Are your replicas in sync with the
>>>> leader? What does the count for each shard add up to?
>>>>
>>>> I would not ignore the issue - something must be off. It may somehow be
>>>> user error, it may be a bug that has been fixed since the alpha, or it may
>>>> be something new.
>>>>
>>>> Are you sure every shard you are issuing the query *from* is active and
>>>> live according to ZooKeeper? Eg when you look at the cloud admin view and
>>>> look at the cluster visualization, are all the nodes green?
>>>>
>>>> - Mark Miller
>>>> lucidimagination.com
>>
>> - Mark Miller
>> lucidimagination.com

- Mark Miller
lucidimagination.com
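Mark's distrib=false check can be scripted across all 24 cores. The sketch below sends a count-only query (`q=*:*&rows=0&distrib=false&wt=json`) to each core, sums the leader counts, and lists any replica whose count differs from its leader. Assumptions: Java 11+ for `java.net.http`; the caller supplies the shard-to-core-URL map with the leader listed first (for example, read off the cloud admin view); `ShardCounts` and its methods are hypothetical names; and `numFound` is pulled out with a regex instead of a JSON parser only to stay dependency-free.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ShardCounts {
    // Matches "numFound":123 in a wt=json response, tolerating whitespace.
    private static final Pattern NUM_FOUND = Pattern.compile("\"numFound\"\\s*:\\s*(\\d+)");
    private static final HttpClient HTTP = HttpClient.newHttpClient();

    // Count-only query against one core; distrib=false stops that node from
    // fanning the request out to the rest of the cluster.
    static String countUrl(String coreUrl) {
        return coreUrl + "/select?q=*:*&rows=0&distrib=false&wt=json";
    }

    static long numFound(String json) {
        Matcher m = NUM_FOUND.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no numFound in: " + json);
        }
        return Long.parseLong(m.group(1));
    }

    // shards: shard name -> core URLs, leader first, then its replicas.
    // fetch:  returns the response body for a URL (HTTP GET in main()).
    // Returns the sum of leader counts; replicas that disagree with their
    // leader are appended to 'mismatches'.
    static long leaderTotal(Map<String, List<String>> shards,
                            Function<String, String> fetch,
                            List<String> mismatches) {
        long total = 0;
        for (Map.Entry<String, List<String>> shard : shards.entrySet()) {
            List<String> cores = shard.getValue();
            long leaderCount = numFound(fetch.apply(countUrl(cores.get(0))));
            total += leaderCount;
            for (String replica : cores.subList(1, cores.size())) {
                long replicaCount = numFound(fetch.apply(countUrl(replica)));
                if (replicaCount != leaderCount) {
                    mismatches.add(shard.getKey() + ": leader=" + leaderCount
                        + " " + replica + "=" + replicaCount);
                }
            }
        }
        return total;
    }

    static String httpGet(String url) {
        try {
            HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
            return HTTP.send(req, HttpResponse.BodyHandlers.ofString()).body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Args: one per shard, e.g. shard1=LEADER_URL,REPLICA_URL
    public static void main(String[] args) {
        Map<String, List<String>> shards = new LinkedHashMap<>();
        for (String arg : args) {
            String[] kv = arg.split("=", 2);
            shards.put(kv[0], List.of(kv[1].split(",")));
        }
        List<String> mismatches = new ArrayList<>();
        long total = leaderTotal(shards, ShardCounts::httpGet, mismatches);
        System.out.println("sum of leader counts: " + total);
        mismatches.forEach(System.out::println);
    }
}
```

For example, `java ShardCounts shard1=http://host1:8983/solr/collection1,http://host2:8983/solr/collection1 ...` with one argument per shard (host and core names are placeholders). The printed sum answers "what does the count for each shard add up to?", and any mismatch line answers whether that shard's replica is in sync with its leader.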