Yes, I can, but unfortunately I won't get to it today. My eval environment was running on some very expensive EC2 instances, so I shut it down until I can focus on it again. I'll try to get back to this either tomorrow or over the weekend. Sorry for the delay.

Tim

On Thu, Aug 2, 2012 at 1:35 PM, Mark Miller <markrmil...@gmail.com> wrote:
> Can you do me a favor and try not using the batch add for a run?
>
> Just do the add one doc at a time. (solrServer.add(doc) rather than
> solrServer.add(collection))
>
> I just fixed one issue with it this morning on trunk - it may be the cause of
> this oddity.
>
> I'm also working on some performance issues around that method too (good
> performance without starting thousands of threads).
>
> Until I get all that straightened out (hopefully very soon), I think you will
> have better luck not using the bulk, collection add method.
>
> On Aug 2, 2012, at 2:16 PM, Timothy Potter <thelabd...@gmail.com> wrote:
>
>> Thanks Mark.
>>
>> I'm actually using SolrJ 3.4.0, so using CommonsHttpSolrServer:
>>
>> Collection<SolrInputDocument> batch = ...
>> ... build up batch ...
>> solrServer.add( batch );
>>
>> Basically, I have a custom Pig StoreFunc that sends docs to Solr from
>> our Hadoop analytics nodes. The reason I'm not using SolrJ 4.0.0-ALPHA
>> is that I couldn't get it to run in my Hadoop environment. There's
>> some classpath conflict with the Apache HttpClient. SolrJ 4 depends on
>> 4.1.3 but when I run it in my env, I get the following:
>>
>> Caused by: java.lang.NoSuchMethodError: org.apache.http.impl.conn.tsccm.ThreadSafeClientConnManager: method <init>()V not found
>>     at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:94)
>>     at org.apache.solr.client.solrj.impl.CloudSolrServer.<init>(CloudSolrServer.java:70)
>>     ... 16 more
>>
>> I spent hours trying to resolve the classpath issue and finally had to
>> bail and just used the 3.4 SolrJ client as I'm just at the evaluation
>> stage at this point. So it sounds like this could be the cause of my
>> problems.
>>
>> One other thing ... I do have the _version_ field defined in my
>> schema.xml but am not setting it on the client side when indexing.
>> Should I be doing that?
>>
>> Cheers,
>> Tim
>>
>>
>> On Thu, Aug 2, 2012 at 11:27 AM, Mark Miller <markrmil...@gmail.com> wrote:
>>>
>>> On Aug 2, 2012, at 11:08 AM, Timothy Potter <thelabd...@gmail.com> wrote:
>>>
>>>> Just starting to get into SolrCloud using 4.0.0-ALPHA and am very
>>>> impressed so far ...
>>>>
>>>> I have a 12-shard index with ~104M docs with each shard having
>>>> 1-replica (so 24 Solr servers running)
>>>>
>>>> Using the Query form on the Admin panel, I issue the MatchAllDocsQuery
>>>> (*:*) and each time I send the request the value for numFound in the
>>>> result is different. It's always close but not exactly the same as I
>>>> would expect? Can anyone shed some light on this issue? I also tried a
>>>> real query, such as "#olympics lochte" and same thing - different
>>>> numFound each time. The first page of actual docs returned is the same
>>>> so maybe I should just ignore the numFound issue?
>>>>
>>>> Note that while experiencing this behavior, I am not adding any docs
>>>> to the index and all docs have been committed with waitFlush=true and
>>>> waitSearcher=true on the commit. Also, not doing soft commits at this
>>>> point. In addition, after having committed all 104M docs, I hit the
>>>> optimize button on the panel so I have only 1 segment. In other words,
>>>> the index is not being updated and has been optimized at this point.
>>>
>>>
>>> How are you adding docs? E.g. what client and what method in particular (what
>>> is your line of code that actually adds the doc).
>>>
>>> You can find the numFound result for each node by passing the param
>>> distrib=false. What does this tell you? Are your replicas in sync with the
>>> leader? What does the count for each shard add up to?
>>>
>>> I would not ignore the issue - something must be off. It may somehow be
>>> user error, it may be a bug that has been fixed since the alpha, or it may
>>> be something new.
>>>
>>> Are you sure every shard you are issuing the query *from* is active and
>>> live according to ZooKeeper? E.g. when you look at the cloud admin view and
>>> look at the cluster visualization, are all the nodes green?
>>>
>>> - Mark Miller
>>> lucidimagination.com
>
> - Mark Miller
> lucidimagination.com
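
For reference, the per-node check suggested in the thread (query each core with distrib=false, compare replicas against their leader, and add up the per-shard counts) could be sketched roughly as below. This is only an illustration under assumptions, not code from the thread: the class and method names, the host and core URL, and the sample counts are all hypothetical, and the sketch assumes the standard /select handler with rows=0 so that only numFound comes back. Fetching and parsing each response is left out; the counts are supplied by hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardCountCheck {

    // Builds a match-all query URL that asks one core for its own count only.
    // distrib=false keeps the request from fanning out to other shards;
    // rows=0 skips the documents because only numFound is of interest.
    public static String localCountUrl(String coreBaseUrl) {
        return coreBaseUrl + "/select?q=*:*&rows=0&distrib=false";
    }

    // Returns the shards whose cores report different numFound values.
    // Each array holds the counts for one shard: leader first, then replicas.
    public static List<String> outOfSyncShards(Map<String, long[]> countsByShard) {
        List<String> mismatched = new ArrayList<>();
        for (Map.Entry<String, long[]> entry : countsByShard.entrySet()) {
            long[] counts = entry.getValue();
            for (long count : counts) {
                if (count != counts[0]) {
                    mismatched.add(entry.getKey());
                    break;
                }
            }
        }
        return mismatched;
    }

    // Sums the leader count (first entry) of every shard. If leaders and
    // replicas agree, a distributed *:* query would be expected to report
    // this total on every request.
    public static long leaderTotal(Map<String, long[]> countsByShard) {
        long total = 0;
        for (long[] counts : countsByShard.values()) {
            if (counts.length > 0) {
                total += counts[0];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical counts, as if read from each core's numFound.
        Map<String, long[]> counts = new LinkedHashMap<>();
        counts.put("shard1", new long[] {8650000L, 8650000L});
        counts.put("shard2", new long[] {8671234L, 8671198L});

        // Hypothetical host and core name.
        System.out.println(localCountUrl("http://host1:8983/solr/collection1"));
        System.out.println("Out of sync: " + outOfSyncShards(counts));
        System.out.println("Leader total: " + leaderTotal(counts));
    }
}
```

If a shard's leader and replica disagree, a distributed query may be served by either one from request to request, which would be one possible explanation for a numFound that changes between identical queries.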