fq caching question

2013-10-14 Thread Tim Vaillancourt

Hey guys,

Sorry for such a simple question, but I am curious as to the differences 
in caching between a "combined" filter query, and many separate filter 
queries.


Here are 2 example queries, one with combined fq, one separate:

1) "/select?q=*:*&fq=type:bid&fq=user_id:3"
2) "/select?q=*:*&fq=(type:bid%20AND%20user_id:3)"

For query #1: am I correct that the first query will keep 2 independent 
entries in the filterCache, one for type:bid and one for user_id:3?
For query #2: is it correct that the 2nd query will keep 1 entry in the 
filterCache that satisfies all conditions?


Lastly, is it a fair statement that under general query patterns, many 
separate filter queries are more cacheable than 1 combined one? E.g., if I 
performed query #2 (already in the filterCache) and then changed the user_id, 
nothing about my new query is cacheable, correct (but if I used 2 
separate filter queries, then 1 of the 2 is still cached)?
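
To illustrate my thinking (hypothetical follow-up queries with a new user_id, 
continuing the examples above):

3) "/select?q=*:*&fq=type:bid&fq=user_id:4"
4) "/select?q=*:*&fq=(type:bid%20AND%20user_id:4)"

If my assumptions are right, #3 re-uses the cached type:bid entry and only 
inserts a new entry for user_id:4, while #4 misses the cache entirely and 
inserts a whole new combined entry.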


Cheers,

Tim Vaillancourt


Re: fq caching question

2013-10-14 Thread Tim Vaillancourt

Thanks Koji!

Cheers,

Tim

On 14/10/13 03:56 PM, Koji Sekiguchi wrote:

Hi Tim,

(13/10/15 5:22), Tim Vaillancourt wrote:

Hey guys,

Sorry for such a simple question, but I am curious as to the differences in 
caching between a "combined" filter query, and many separate filter queries.

Here are 2 example queries, one with combined fq, one separate:

1) "/select?q=*:*&fq=type:bid&fq=user_id:3"
2) "/select?q=*:*&fq=(type:bid%20AND%20user_id:3)"

For query #1: am I correct that the first query will keep 2 independent 
entries in the filterCache for type:bid and user_id:3?


Correct.

For query #2: is it correct that the 2nd query will keep 1 entry in 
the filterCache that satisfies all conditions?


Correct.

Lastly, is it a fair statement that under general query patterns, 
many separate filter queries are more cacheable than 1 combined one? E.g., if 
I performed query #2 (already in the filterCache) and then changed the 
user_id, nothing about my new query is cacheable, correct (but if I used 2 
separate filter queries, then 1 of the 2 is still cached)?


Yes, it is.

koji


Skipping caches on a /select

2013-10-16 Thread Tim Vaillancourt
Hey guys,

I am debugging some /select queries on my Solr tier and would like to see
if there is a way to tell Solr to skip the caches on a given /select query
if it happens to ALREADY be in the cache. Live queries are being inserted
and read from the caches, but I want my debug queries to bypass the cache
entirely.

I do know about the "cache=false" param (which causes the results of a
select to not be INSERTED into the cache), but what I am looking for
instead is a way to tell Solr to not read the cache at all, even if there
actually is a cached result for my query.

Is there a way to do this (without disabling my caches in solrconfig.xml),
or is this a feature request?

Thanks!

Tim Vaillancourt


Re: SolrCloud on SSL

2013-10-16 Thread Tim Vaillancourt
Not important, but I'm also curious why you would want SSL on Solr (it adds
overhead and complexity, makes things harder to troubleshoot, etc.)?

To avoid the overhead, could you put Solr on a separate VLAN (with ACLs to
client servers)?

Cheers,

Tim


On 12 October 2013 17:30, Shawn Heisey  wrote:

> On 10/11/2013 9:38 AM, Christopher Gross wrote:
> > On Fri, Oct 11, 2013 at 11:08 AM, Shawn Heisey 
> wrote:
> >
> >> On 10/11/2013 8:17 AM, Christopher Gross wrote: 
> >>> Is there a spot in a Solr configuration that I can set this up to use
> >> HTTPS?
> >>
> >> From what I can tell, not yet.
> >>
> >> https://issues.apache.org/jira/browse/SOLR-3854
> >> https://issues.apache.org/jira/browse/SOLR-4407
> >> https://issues.apache.org/jira/browse/SOLR-4470
> >>
> >>
> > Dang.
>
> Christopher,
>
> I was just looking through Solr source code for a completely different
> issue, and it seems that there *IS* a way to do this in your configuration.
>
> If you were to use "https://hostname" or "https://ipaddress" as the
> "host" parameter in your solr.xml file on each machine, it should do
> what you want.  The parameter is described here, but not the behavior
> that I have discovered:
>
> http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params
>
> Boring details: In the org.apache.solr.cloud package, there is a
> ZkController class.  The getHostAddress method is where I discovered
> that you can do this.
>
> If you could try this out and confirm that it works, I will get the wiki
> page updated and look into the Solr reference guide as well.
>
> Thanks,
> Shawn
>
>


Re: Skipping caches on a /select

2013-10-17 Thread Tim Vaillancourt

Thanks Yonik,

Does "cache=false" apply to all caches? The docs make it sound like it 
is for filterCache only, but I could be misunderstanding.


When I force a commit and perform a /select query many times with 
"cache=false", I notice my query still gets cached - my guess is in the 
queryResultCache. At first the query takes 500ms+, then all subsequent 
requests take 0-1ms. I'll confirm this queryResultCache assumption today.


Cheers,

Tim

On 16/10/13 06:33 PM, Yonik Seeley wrote:

On Wed, Oct 16, 2013 at 6:18 PM, Tim Vaillancourt  wrote:

I am debugging some /select queries on my Solr tier and would like to see
if there is a way to tell Solr to skip the caches on a given /select query
if it happens to ALREADY be in the cache. Live queries are being inserted
and read from the caches, but I want my debug queries to bypass the cache
entirely.

I do know about the "cache=false" param (that causes the results of a
select to not be INSERTED in to the cache), but what I am looking for
instead is a way to tell Solr to not read the cache at all, even if there
actually is a cached result for my query.

Yeah, cache=false for "q" or "fq" should already not use the cache at
all (read or write).

-Yonik


Re: Skipping caches on a /select

2013-10-17 Thread Tim Vaillancourt


  
  
Awesome, this makes a lot of sense now. Thanks a lot, guys.

Currently the only mention of this setting in the docs is under
filterQuery on the "SolrCaching" page as:

" Solr3.4 Adding the
localParam flag of {!cache=false} to a query will prevent
the filterCache from being consulted for that query. "

I will update the docs sometime soon to reflect that this can apply
to any query (q or fq).
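
For my original debug use case, I'm assuming that means a query roughly like
this (reusing the example fq from my earlier thread) would bypass the caches
for both the main query and the filter:

/select?q={!cache=false}*:*&fq={!cache=false}type:bid&debugQuery=true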

Cheers,

Tim

On 17/10/13 01:44 PM, Chris Hostetter wrote:

  

: Does "cache=false" apply to all caches? The docs make it sound like it is for
: filterCache only, but I could be misunderstanding.

it's per *query* -- not per cache, or per request...

 /select?q={!cache=true}foo&fq={!cache=false}bar&fq={!cache=true}baz

...should cause 1 lookup/insert in the filterCache (baz) and 1 
lookup/insert into the queryResultCache (for the main query with its 
associated filters & pagination)



-Hoss


  



Re: difference between apache tomcat vs Jetty

2013-10-24 Thread Tim Vaillancourt
I agree with Jonathan (and Shawn on the Jetty explanation); I think the
docs should make this a bit more clear - I notice many people choosing
Tomcat and then learning these details afterward, possibly regretting it.

I'd be glad to modify the docs, but I want to be careful how it is worded.
Is it fair to go as far as saying Jetty is 100% THE "recommended" container
for Solr, or should a recommendation be avoided in favour of just a list of
pros/cons?

Cheers,

Tim


Re: difference between apache tomcat vs Jetty

2013-10-24 Thread Tim Vaillancourt
Hmm, that's an interesting move. I'm on the fence on that one, but it surely
simplifies some things. Good info, thanks!

Tim


On 24 October 2013 16:46, Anshum Gupta  wrote:

> Thought you may want to have a look at this:
>
> https://issues.apache.org/jira/browse/SOLR-4792
>
> P.S: There are no timelines for 5.0 for now, but it's the future
> nevertheless.
>
>
>
> On Fri, Oct 25, 2013 at 3:39 AM, Tim Vaillancourt  >wrote:
>
> > I agree with Jonathan (and Shawn on the Jetty explanation), I think the
> > docs should make this a bit more clear - I notice many people choosing
> > Tomcat and then learning these details after, possibly regretting it.
> >
> > I'd be glad to modify the docs but I want to be careful how it is worded.
> > Is it fair to go as far as saying Jetty is 100% THE "recommended"
> container
> > for Solr, or should a recommendation be avoided, and maybe just a list of
> > pros/cons?
> >
> > Cheers,
> >
> > Tim
> >
>
>
>
> --
>
> Anshum Gupta
> http://www.anshumgupta.net
>


Re: difference between apache tomcat vs Jetty

2013-10-25 Thread Tim Vaillancourt
I (jokingly) propose we take it a step further and drop Java :)! I'm 
getting tired of trying to scale GC'ing JVMs!


Tim

On 25/10/13 09:02 AM, Mark Miller wrote:

Just to add to the “use jetty for Solr” argument - Solr 5.0 will no longer 
consider itself a webapp and will consider the fact that Jetty is used an 
implementation detail.

We won’t necessarily make it impossible to use a different container, but the 
project won’t condone it or support it and may do some things that assume 
Jetty. Solr is taking over this layer in 5.0.

- Mark

On Oct 25, 2013, at 11:18 AM, Cassandra Targett  wrote:


In terms of adding or fixing documentation, the "Installing Solr" page
(https://cwiki.apache.org/confluence/display/solr/Installing+Solr)
includes a yellow box that says:

"Solr ships with a working Jetty server, with optimized settings for
Solr, inside the example directory. It is recommended that you use the
provided Jetty server for optimal performance. If you absolutely must
use a different servlet container then continue to the next section on
how to install Solr."

So, it's stated, but maybe not in a way that makes it clear to most
users. And maybe it needs to be repeated in another section.
Suggestions?

I did find this page,
https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+Jetty,
which pretty much contradicts the previous text. I'll fix that now.

Other recommendations for where doc could be more clear are welcome.

On Thu, Oct 24, 2013 at 7:14 PM, Tim Vaillancourt  wrote:

Hmm, thats an interesting move. I'm on the fence on that one but it surely
simplifies some things. Good info, thanks!

Tim


On 24 October 2013 16:46, Anshum Gupta  wrote:


Thought you may want to have a look at this:

https://issues.apache.org/jira/browse/SOLR-4792

P.S: There are no timelines for 5.0 for now, but it's the future
nevertheless.



On Fri, Oct 25, 2013 at 3:39 AM, Tim Vaillancourt
wrote:
I agree with Jonathan (and Shawn on the Jetty explanation), I think the
docs should make this a bit more clear - I notice many people choosing
Tomcat and then learning these details after, possibly regretting it.

I'd be glad to modify the docs but I want to be careful how it is worded.
Is it fair to go as far as saying Jetty is 100% THE "recommended"

container

for Solr, or should a recommendation be avoided, and maybe just a list of
pros/cons?

Cheers,

Tim




--

Anshum Gupta
http://www.anshumgupta.net



Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with 
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).


Currently we are noticing inconsistent results from the SolrCloud when 
performing the same simple /select query many times against our collection. 
On almost every other query, the numFound count (and the returned data) 
jumps between two very different values.


Initially I suspected a replica in a shard of the collection was 
inconsistent (and that every other request hit that node), so I started 
performing the same /select query directly against the individual cores of the 
SolrCloud collection on each instance, only to notice the same problem - 
the count jumps between two very different values!


I may be incorrect here, but I assumed when querying a single core of a 
SolrCloud collection, the SolrCloud routing is bypassed and I am talking 
directly to a plain/non-SolrCloud core.


As you can see here, the count for 1 core of my SolrCloud collection 
fluctuates wildly, even though the core only receives updates - there are no 
deletes that would explain the jumps:


"solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]"


Could anyone help me understand why the same /select query directed at a 
single core would return inconsistent, flapping results if there are no 
deletes issued by my app to cause such jumps? Am I incorrect in my 
assumption that I am querying the core "directly"?


An interesting observation: when I do an /admin/cores call to see the 
docCount of the core's index, it does not fluctuate - only the query result does.


That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

To add two more pieces of data:

1) This occurs with real, conditional queries as well (eg: 
"q=key:timvaillancourt"), not just the "q=*:*" I provided in my email.
2) I've noticed that when I bring a node of the SolrCloud down, it remains 
"state: active" in my /clusterstate.json - something is really wrong with 
this cloud! Would a Zookeeper issue explain my varied results when querying 
a core directly?


Thanks again!

Tim

On 04/12/13 02:17 PM, Tim Vaillancourt wrote:

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with 
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).


Currently we are noticing inconsistent results from the SolrCloud when 
performing the same simple /select query many times to our collection. 
Almost every other query the numFound count (and the returned data) 
jumps between two very different values.


Initially I suspected a replica in a shard of the collection was 
inconsistent (and every other request hit that node) and started 
performing the same /select query direct to the individual cores of 
the SolrCloud collection on each instance, only to notice the same 
problem - the count jumps between two very different values!


I may be incorrect here, but I assumed when querying a single core of 
a SolrCloud collection, the SolrCloud routing is bypassed and I am 
talking directly to a plain/non-SolrCloud core.


As you can see here, the count for 1 core of my SolrCloud collection 
fluctuates wildly, and is only receiving updates and no deletes to 
explain the jumps:


"solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep 
numFound

  "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]"


Could anyone help me understand why the same /select query direct to a 
single core would return inconsistent, flapping results if there are 
no deletes issued in my app to cause such jumps? Am I incorrect in my 
assumption that I am querying the core "directly"?


An interesting observation is when I do an /admin/cores call to see 
the docCount of the core's index, it does not fluctuate, only the 
query result.


That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Thanks Markus,

I'm not sure if I'm encountering the same issue. This JIRA mentions 
differences of tens of docs; I'm seeing differences in the multi-millions of 
docs, and even more strangely it very predictably flaps between a 123M 
value and an 87M value, a 30M+ doc difference.


Secondly, I'm not comparing values from 2 instances (Leader to Replica); 
I'm performing the same curl call to the same core directly and am seeing 
flapping results each time I perform the query, so this is currently 
happening within a single instance/core, unless I am misunderstanding how 
to directly query a core.


Cheers,

Tim

On 04/12/13 02:46 PM, Markus Jelsma wrote:

https://issues.apache.org/jira/browse/SOLR-4260

Join the club Tim! Can you upgrade to trunk or incorporate the latest patches 
of related issues? You can fix it by trashing the bad node's data, although 
without multiple clusters it may be difficult to decide which node is bad.

We use the latest commits now (since tuesday) and are still waiting for it to 
happen again.

-Original message-

From:Tim Vaillancourt
Sent: Wednesday 4th December 2013 23:38
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent numFound in SC when querying core directly

To add two more pieces of data:

1) This occurs with real, conditional queries as well (eg:
"q=key:timvaillancourt"), not just the "q=*:*" I provided in my email.
2) I've noticed when I bring a node of the SolrCloud down it is
remaining "state: active" in my /clusterstate.json - something is really
wrong with this cloud! Would a Zookeeper issue explain my varied results
when querying a core directly?

Thanks again!

Tim

On 04/12/13 02:17 PM, Tim Vaillancourt wrote:

Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with
3-node external Zookeeper and 1 collection (2 shards, 2 replicas).

Currently we are noticing inconsistent results from the SolrCloud when
performing the same simple /select query many times to our collection.
Almost every other query the numFound count (and the returned data)
jumps between two very different values.

Initially I suspected a replica in a shard of the collection was
inconsistent (and every other request hit that node) and started
performing the same /select query direct to the individual cores of
the SolrCloud collection on each instance, only to notice the same
problem - the count jumps between two very different values!

I may be incorrect here, but I assumed when querying a single core of
a SolrCloud collection, the SolrCloud routing is bypassed and I am
talking directly to a plain/non-SolrCloud core.

As you can see here, the count for 1 core of my SolrCloud collection
fluctuates wildly, and is only receiving updates and no deletes to
explain the jumps:

"solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
numFound
   "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
numFound
   "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
numFound
   "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep
numFound
   "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]"


Could anyone help me understand why the same /select query direct to a
single core would return inconsistent, flapping results if there are
no deletes issued in my app to cause such jumps? Am I incorrect in my
assumption that I am querying the core "directly"?

An interesting observation is when I do an /admin/cores call to see
the docCount of the core's index, it does not fluctuate, only the
query result.

That was hard to explain, hopefully someone has some insight! :)

Thanks!

Tim


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt
Chris, this is extremely helpful and it's silly I didn't think of this 
sooner! Thanks a lot - the situation makes much more sense now.


I will gather some proper data with your suggestion and get back to the 
thread shortly.


Thanks!!

Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:

:
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by solr is handled by a single
core -- if that core knows it's part of a SolrCloud collection then it
will do a distributed search across a random replica from each shard in
that collection.

If you want to bypass the distributed search logic, you have to say so
explicitly...

To ask an arbitrary replica to only search itself, add "distrib=false" to
the request.

Alternatively: you can ask that only certain shard names (or certain
explicit replicas) be included in a distributed request..

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests



-Hoss
http://www.lucidworks.com/


Re: Inconsistent numFound in SC when querying core directly

2013-12-04 Thread Tim Vaillancourt

Hey all,

Now that I am getting correct results with "distrib=false", I've 
identified that 1 of my nodes has just 1/3rd of the total data set, which 
totally explains the flapping in results. The fix for this is obvious 
(rebuild the replica) but the cause is less obvious.
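
For reference, the direct-core check I'm running now is just my earlier curl 
with distrib=false appended:

curl -s 
'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true&distrib=false'|grep 
numFound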


There is definitely more than one issue going on with this SolrCloud 
(but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that 
/clusterstate.json doesn't seem to get updated when nodes are brought 
down/up is the reason why this replica remained in the distributed 
request chain without recovering/re-replicating from the leader.


I imagine my Zookeeper ensemble is having some problems unrelated to 
Solr that are the real root cause.


Thanks!

Tim

On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
Chris, this is extremely helpful and it's silly I didn't think of this 
sooner! Thanks a lot, this makes the situation make much more sense.


I will gather some proper data with your suggestion and get back to 
the thread shortly.


Thanks!!

Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:

:
: I may be incorrect here, but I assumed when querying a single core 
of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am 
talking

: directly to a plain/non-SolrCloud core.

No ... every query received from a client by solr is handled by a single
core -- if that core knows it's part of a SolrCloud collection then it
will do a distributed search across a random replica from each shard in
that collection.

If you want to bypass the distribute search logic, you have to say so
explicitly...

To ask an arbitrary replica to only search itself add "distrib=false" to
the request.

Alternatively: you can ask that only certain shard names (or certain
explicit replicas) be included in a distribute request..

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests



-Hoss
http://www.lucidworks.com/


Re: Inconsistent numFound in SC when querying core directly

2013-12-05 Thread Tim Vaillancourt
Very good point. I've seen this issue occur once before when I was playing
with 4.3.1 and don't  remember it happening since 4.5.0+, so that is good
news - we are just behind.

For anyone who is curious, regarding my earlier mention that
Zookeeper/clusterstate.json was not taking updates: this was NOT correct.
Zookeeper has no issues taking sets/creates on clusterstate.json (or any
znode); just this one node seemed to stay stuck as "state: active" while it
was very inconsistent, for reasons unknown - potentially just bugs.

The good news is this will be resolved today with a create/destroy of the
bad replica.

Thanks all!

Tim


On 4 December 2013 16:50, Mark Miller  wrote:

> Keep in mind, there have been a *lot* of bug fixes since 4.3.1.
>
> - Mark
>
> On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt  wrote:
>
> > Hey all,
> >
> > Now that I am getting correct results with "distrib=false", I've
> identified that 1 of my nodes has just 1/3rd of the total data set and
> totally explains the flapping in results. The fix for this is obvious
> (rebuild replica) but the cause is less obvious.
> >
> > There is definately more than one issue going on with this SolrCloud
> (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that
> /clusterstate.json doesn't seem to get updated when nodes are brought
> down/up is the reason why this replica remained in the distributed request
> chain without recovering/re-replicating from leader.
> >
> > I imagine my Zookeeper ensemble is having some problems unrelated to
> Solr that is the real root cause.
> >
> > Thanks!
> >
> > Tim
> >
> > On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
> >> Chris, this is extremely helpful and it's silly I didn't think of this
> sooner! Thanks a lot, this makes the situation make much more sense.
> >>
> >> I will gather some proper data with your suggestion and get back to the
> thread shortly.
> >>
> >> Thanks!!
> >>
> >> Tim
> >>
> >> On 04/12/13 02:57 PM, Chris Hostetter wrote:
> >>> :
> >>> : I may be incorrect here, but I assumed when querying a single core
> of a
> >>> : SolrCloud collection, the SolrCloud routing is bypassed and I am
> talking
> >>> : directly to a plain/non-SolrCloud core.
> >>>
> >>> No ... every query received from a client by solr is handled by a
> single
> >>> core -- if that core knows it's part of a SolrCloud collection then it
> >>> will do a distributed search across a random replica from each shard in
> >>> that collection.
> >>>
> >>> If you want to bypass the distribute search logic, you have to say so
> >>> explicitly...
> >>>
> >>> To ask an arbitrary replica to only search itself add "distrib=false"
> to
> >>> the request.
> >>>
> >>> Alternatively: you can ask that only certain shard names (or certain
> >>> explicit replicas) be included in a distribute request..
> >>>
> >>> https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
> >>>
> >>>
> >>>
> >>> -Hoss
> >>> http://www.lucidworks.com/
>
>


Re: Inconsistent numFound in SC when querying core directly

2013-12-05 Thread Tim Vaillancourt
I spoke too soon; my plan for fixing this didn't quite work.

I've moved this issue into a new thread/topic: "No /clusterstate.json
updates on Solrcloud 4.3.1 Cores API UNLOAD/CREATE".

Thanks all for the help on this one!

Tim


On 5 December 2013 11:37, Tim Vaillancourt  wrote:

> Very good point. I've seen this issue occur once before when I was playing
> with 4.3.1 and don't  remember it happening since 4.5.0+, so that is good
> news - we are just behind.
>
> For anyone that is curious, on my earlier mention that
> Zookeeper/clusterstate.json was not taking updates: this was NOT correct.
> Zookeeper has no issues taking set/creates to clusterstate.json (or any
> znode), just this one node seemed to stay stuck as "state: active" while it
> was very inconsistent for reasons unknown, potentially just bugs.
>
> The good news is this will be resolved today with a create/destroy of the
> bad replica.
>
> Thanks all!
>
> Tim
>
>
> On 4 December 2013 16:50, Mark Miller  wrote:
>
>> Keep in mind, there have been a *lot* of bug fixes since 4.3.1.
>>
>> - Mark
>>
>> On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt 
>> wrote:
>>
>> > Hey all,
>> >
>> > Now that I am getting correct results with "distrib=false", I've
>> identified that 1 of my nodes has just 1/3rd of the total data set and
>> totally explains the flapping in results. The fix for this is obvious
>> (rebuild replica) but the cause is less obvious.
>> >
>> > There is definately more than one issue going on with this SolrCloud
>> (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that
>> /clusterstate.json doesn't seem to get updated when nodes are brought
>> down/up is the reason why this replica remained in the distributed request
>> chain without recovering/re-replicating from leader.
>> >
>> > I imagine my Zookeeper ensemble is having some problems unrelated to
>> Solr that is the real root cause.
>> >
>> > Thanks!
>> >
>> > Tim
>> >
>> > On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
>> >> Chris, this is extremely helpful and it's silly I didn't think of this
>> sooner! Thanks a lot, this makes the situation make much more sense.
>> >>
>> >> I will gather some proper data with your suggestion and get back to
>> the thread shortly.
>> >>
>> >> Thanks!!
>> >>
>> >> Tim
>> >>
>> >> On 04/12/13 02:57 PM, Chris Hostetter wrote:
>> >>> :
>> >>> : I may be incorrect here, but I assumed when querying a single core
>> of a
>> >>> : SolrCloud collection, the SolrCloud routing is bypassed and I am
>> talking
>> >>> : directly to a plain/non-SolrCloud core.
>> >>>
>> >>> No ... every query received from a client by solr is handled by a
>> single
>> >>> core -- if that core knows it's part of a SolrCloud collection then it
>> >>> will do a distributed search across a random replica from each shard
>> in
>> >>> that collection.
>> >>>
>> >>> If you want to bypass the distribute search logic, you have to say so
>> >>> explicitly...
>> >>>
>> >>> To ask an arbitrary replica to only search itself add "distrib=false"
>> to
>> >>> the request.
>> >>>
>> >>> Alternatively: you can ask that only certain shard names (or certain
>> >>> explicit replicas) be included in a distribute request..
>> >>>
>> >>> https://cwiki.apache.org/confluence/display/solr/Distributed+Requests
>> >>>
>> >>>
>> >>>
>> >>> -Hoss
>> >>> http://www.lucidworks.com/
>>
>>
>


No /clusterstate.json updates on Solrcloud 4.3.1 Cores API UNLOAD/CREATE

2013-12-05 Thread Tim Vaillancourt
Hey guys,

I've been having an issue where 1 of my 4 replicas is inconsistent, and I have
been trying to fix it. At the core of this issue, I've noticed
/clusterstate.json doesn't seem to be receiving updates when cores
get unhealthy, or even when they are added/removed.

Today I decided I would remove the "bad" replica from the SolrCloud and
force a sync of a new clean replica, so I ran a
'/admin/cores?command=UNLOAD&name=name' to drop it. After this, on the
instance with the "bad" replica, the core was removed from solr.xml but
strangely NOT the /clusterstate.json in Zookeeper - it remained in
Zookeeper unchanged, still with "state: active" :(.

So, I then manually edited the clusterstate.json with a Perl script,
removing the json data for the "bad" replica. I checked all nodes saw the
change themselves, things looked good. Then I brought the node up/down to
check that it was properly adding/removing itself from /live_nodes znode in
Zookeeper. That all worked perfectly, too.

Here is the really odd part: when I created a new replica on this node (to
replace the "bad" replica), the core was created on the node, and NO update
was made to /clusterstate.json. At this point this node had no cores, no
cores with state in /clusterstate.json, and all data dirs deleted, so this
is quite confusing.
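
For context, the create was a CoreAdmin call along these lines (the
core/collection/shard names here are illustrative, not my exact ones):

curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=app_shard2_replica2&collection=app&shard=shard2'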

Upon checking ACLs on /clusterstate.json, it is world/anyone accessible:

"[zk: localhost:2181(CONNECTED) 18] getAcl /clusterstate.json
'world,'anyone
: cdrwa"

Also, keep in mind my external Perl script had no issue updating
/clusterstate.json. Can anyone suggest why /clusterstate.json
isn't getting updated when I create this new core?

One other thing I checked was the health of the Zookeeper ensemble: all
3 Zookeepers have the same mZxid, ctime, mtime, etc. for /clusterstate.json
and receive updates without problems; it's just this one node that isn't
updating Zookeeper somehow.
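
For reference, I compared those stats with zkCli on each Zookeeper host,
roughly:

"[zk: localhost:2181(CONNECTED) 19] stat /clusterstate.json"

and the mZxid/mtime values matched on all 3 ensemble members.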

Any thoughts are much appreciated!

Thanks!

Tim


Re: Redis as Solr Cache

2014-01-02 Thread Tim Vaillancourt
This is a neat idea, but could be too close to lucene/etc.

You could jump up one level in the stack and use Redis/memcache as a
distributed HTTP cache in conjunction with Solr's HTTP caching and a proxy.
I tried doing this myself with Nginx, but I forget what issue I hit - I
think "misses" needed logic outside of Nginx, and I didn't spend too much
time on it.
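
To sketch what I mean on the Solr side (the max-age value is only an example),
the HTTP caching piece lives in the requestDispatcher section of
solrconfig.xml, roughly:

<requestDispatcher>
  <httpCaching never304="false">
    <cacheControl>max-age=30, public</cacheControl>
  </httpCaching>
</requestDispatcher>

The proxy in front of /select (Nginx, or Redis/memcache behind it) would then
honour those Cache-Control headers.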

Tim


On 2 January 2014 07:51, Alexander Ramos Jardim <
alexander.ramos.jar...@gmail.com> wrote:

> You touched an interesting point. I am really assuming if a quick win
> scenario is even possible. But what would be the advantage of using Redis
> to keep Solr Cache if each node would keep it's own Redis cache?
>
>
> 2013/12/29 Upayavira 
>
> > On Sun, Dec 29, 2013, at 02:35 PM, Alexander Ramos Jardim wrote:
> > > While researching for Solr Caching options and interesting cases, I
> > > bumped
> > > on this https://github.com/dfdeshom/solr-redis-cache. Does anyone has
> > any
> > > experience with this setup? Using Redis as Solr Cache.
> > >
> > > I see a lot of advantage in having a distributed cache for solr. One
> solr
> > > node benefiting from the cache generated on another one would be
> > > beautiful.
> > >
> > > I see problems too. Performance wise, I don't know if it would be
> viable
> > > for Solr to write it's cache through the network on Redis Master node.
> > >
> > > And what about if I have Solr nodes with different index version
> looking
> > > at
> > > the same cache?
> > >
> > > IMO as long as Redis is useful, if it isn't to have a distributed
> cache,
> > > I
> > > think it's not possible to get better performance using it.
> >
> > This idea makes assumptions about how a Solr/Lucene index operates.
> > Certainly, in a SolrCloud setup, each node is responsible for its own
> > committing, and its caches exist for the timespan between commits. Thus,
> > the cache one node will need will not necessarily be the same as the one
> > that is needed by another node, which might have a commit interval
> > slightly out of sync with the first.
> >
> > So, whilst this may be possible, and may give some benefits, I'd reckon
> > that it would be a rather substantial engineering exercise, rather than
> > the quick win you seem to be assuming it might be.
> >
> > Upayavira
> >
>
>
>
> --
> Alexander Ramos Jardim
>


Re: Perl Client for SolrCloud

2014-01-10 Thread Tim Vaillancourt
I'm pretty interested in taking a stab at a Perl CPAN module for SolrCloud 
that is Zookeeper-aware; it's the least I can do for Solr as a non-Java 
developer. :)


A quick question though: how would I write the shard logic to behave 
similarly to Java's Zookeeper-aware client? I'm able to get the hash ranges 
(in hex) for each shard from clusterstate.json, but how do I know which 
field to hash on?


I'm guessing I also need to read the collection's schema.xml from 
Zookeeper to get the uniqueKey, and then use that for sharding - or does the 
Java client take the sharding field as input? Looking for ideas here.


Thanks!

Tim

On 08/01/14 09:35 AM, Chris Hostetter wrote:

:>  I couldn't find anyone which can connect to SolrCloud similar to SolrJ's
:>  CloudSolrServer.
:
: Since I have a load balancer in front of 8 nodes, WebService::Solr[1] still
: works fine.

Right -- just because SolrJ is ZooKeeper aware doesn't mean you can *only*
talk to SolrCloud with SolrJ -- you can still use any HTTP client of your
choice to connect to your Solr nodes in a round robin fashion (or via a
load blancer) if you wish -- just like with a non SolrCloud deployment
using something like master/slave.

What you might want to consider, is taking a look at something like
Net::ZooKeeper to have a ZK aware perl client layer that could wrap
WebService::Solr.


-Hoss
http://www.lucidworks.com/


4.3.1 SC - IndexWriter issues causing replication + failures

2014-02-05 Thread Tim Vaillancourt
Hey guys,

I am troubleshooting an issue on a 4.3.1 SolrCloud: 1 collection with 2
shards (2 replicas each) over 4 Solr instances, which results in 1 core per
Solr instance.

After some time in production without issues, we are seeing errors related
to the IndexWriter all over our logs and an infinite loop of failed
replications from the leader on our 2 replicas.

We see a flood of: "org.apache.lucene.store.AlreadyClosedException: this
IndexWriter is closed" stacktraces, then the Solr replica tries to
replicate/recover, then fails replication and then the following 2 errors
show up:

1) "SolrIndexWriter was not closed prior to finalize(), indicates a bug --
POSSIBLE RESOURCE LEAK!!!"
2) "Error closing IndexWriter, trying rollback" (which results in a
null-pointer exception).

I'm guessing the best way forward would be to upgrade to latest, but that
is an undertaking that will take significant time/testing. In the meantime,
is there anything I can do to mitigate or understand the issue more?

Does anyone know what the IndexWriter errors refer to?

Below is a URL to a .txt file with summarized portions of my solr.log. Any
help is really appreciated as always!!

http://timvaillancourt.com.s3.amazonaws.com/tmp/solr.log-summarized.txt

Thanks all,

Tim


Re: 4.3.1 SC - IndexWriter issues causing replication + failures

2014-02-06 Thread Tim Vaillancourt
Some more info to provide:

-Replication almost never completes following the "this IndexWriter is
closed" stacktraces.
-When replication begins after the "this IndexWriter is closed" error, over
a few hours the replica eventually fills the disk to 100% with index files
under data/. There are so many files in the data directory that it can't be
listed, and it takes a very long time to delete. It seems the frequent
replications are filling the disk with new files whose total size is roughly 3
times larger than the real index. Is it leaking file handles, or forgetting
it has already downloaded something?

Is this a better question for the lucene list? It seems (see below) that
this stacktrace is occurring in the Lucene layer rather than Solr, but maybe
someone could confirm?

"ERROR [2014-01-27 18:28:49.368] [org.apache.solr.common.SolrException]
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
at
org.apache.lucene.index.DocumentsWriter.ensureOpen(DocumentsWriter.java:199)
at
org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:338)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:419)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1508)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:519)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:655)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:398)
at
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
... "

Thanks!

Tim


On 5 February 2014 13:04, Tim Vaillancourt  wrote:

> Hey guys,
>
> I am troubleshooting an issue on a 4.3.1 SolrCloud: 1 collection and 2
> shards over 4 Solr instances, (which results in 1 core per Solr instance).
>
> After some time in Production without issues, we are seeing errors related
> to the IndexWriter all over our logs and an infinite loop of failing
> replication from Leader on our 2 replicas.
>
> We see a flood of: "org.apache.lucene.store.AlreadyClosedException: this
> IndexWriter is closed" stacktraces, then the Solr replica tries to
> replicate/recover, then fails replication and then the following 2 errors
> show up:
>
> 1) "SolrIndexWriter was not closed prior to finalize(), indicates a bug --
> POSSIBLE RESOURCE LEAK!!!"
> 2) "Error closing IndexWriter, trying rollback" (which results in a
> null-pointer exception).
>
> I'm guessing the best way forward would be to upgrade to latest, but that
> is an undertaking that will take significant time/testing. In the meantime,
> is there anything I can do to mitigate or understand the issue more?
>
> Does anyone know what the IndexWriter errors refer to?
>
> Below is a URL to a .txt file with summarized portions of my solr.log. Any
> help is really appreciated as always!!
>
> http://timvaillancourt.com.s3.amazonaws.com/tmp/solr.log-summarized.txt
>
> Thanks all,
>
> Tim
>


documentCache not used in 4.3.1?

2013-06-28 Thread Tim Vaillancourt
Hey guys,

This has to be a stupid question/I must be doing something wrong, but after
frequent load testing with documentCache enabled under Solr 4.3.1 with
autoWarmCount=150, I'm noticing that my documentCache metrics are always
zero for non-cumulative.
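
For reference, the cache is configured in solrconfig.xml roughly like this
(sizes match the admin stats output below):

<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="150"/>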

At first I thought my commit rate was fast enough that I just never see the
non-cumulative result, but after 100s of samples I still always get zero
values.

Here is the current output of my documentCache from Solr's admin for 1 core:

"

   - 
documentCache
  - class:org.apache.solr.search.LRUCache
  - version:1.0
  - description:LRU Cache(maxSize=512, initialSize=512,
  autowarmCount=150, regenerator=null)
   - src:$URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java $
  - stats:
 - lookups:0
 - hits:0
 - hitratio:0.00
 - inserts:0
 - evictions:0
 - size:0
 - warmupTime:0
 - cumulative_lookups:65198986
 - cumulative_hits:63075669
 - cumulative_hitratio:0.96
 - cumulative_inserts:2123317
 - cumulative_evictions:1010262
  "

The cumulative values seem to rise, suggesting the doc cache is working, but at
the same time it seems I never see non-cumulative metrics, most importantly
warmupTime.

Am I doing something wrong, is this normal/by-design, or is there an issue
here?

Thanks for helping with my silly question! Have a good weekend,

Tim


Re: documentCache not used in 4.3.1?

2013-06-28 Thread Tim Vaillancourt
To answer some of my own question, Shawn H's great reply on this thread
explains why I see no autoWarming on doc cache:

http://www.marshut.com/iznwr/soft-commit-and-document-cache.html

It is still unclear to me why I see no other metrics, however.

Thanks Shawn,

Tim


On 28 June 2013 16:14, Tim Vaillancourt  wrote:

> Hey guys,
>
> This has to be a stupid question/I must be doing something wrong, but
> after frequent load testing with documentCache enabled under Solr 4.3.1
> with autoWarmCount=150, I'm noticing that my documentCache metrics are
> always zero for non-cumlative.
>
> At first I thought my commit rate is fast enough I just never see the
> non-cumlative result, but after 100s of samples I still always get zero
> values.
>
> Here is the current output of my documentCache from Solr's admin for 1
> core:
>
> "
>
>- 
> documentCache<http://localhost:8983/solr/#/channels_shard1_replica2/plugins/cache?entry=documentCache>
>   - class:org.apache.solr.search.LRUCache
>   - version:1.0
>   - description:LRU Cache(maxSize=512, initialSize=512,
>   autowarmCount=150, regenerator=null)
>   - src:$URL: https:/
>   /svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/
>   
> solr/core/src/java/org/apache/solr/search/LRUCache.java<https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java>$
>   - stats:
>  - lookups:0
>  - hits:0
>  - hitratio:0.00
>  - inserts: 0
>  - evictions:0
>  - size:0
>  - warmupTime:0
>  - cumulative_lookups: 65198986
>  - cumulative_hits:63075669
>  - cumulative_hitratio:0.96
>  - cumulative_inserts: 2123317
>  - cumulative_evictions:1010262
>   "
>
> The cumulative values seem to rise, suggesting doc cache is working, but
> at the same time it seems I never see non-cumlative metrics, most
> importantly warmupTime.
>
> Am I doing something wrong, is this normal/by-design, or is there an issue
> here?
>
> Thanks for helping with my silly question! Have a good weekend,
>
> Tim
>
>
>
>


Re: documentCache not used in 4.3.1?

2013-06-28 Thread Tim Vaillancourt
Thanks Otis,

Yeah, I realized after sending my e-mail that the doc cache does not warm;
however, I'm still lost on why there are no other metrics.

Thanks!

Tim


On 28 June 2013 16:22, Otis Gospodnetic  wrote:

> Hi Tim,
>
> Not sure about the zeros in 4.3.1, but in SPM we see all these numbers
> are non-0, though I haven't had the chance to confirm with Solr 4.3.1.
>
> Note that you can't really autowarm document cache...
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm
>
>
>
> On Fri, Jun 28, 2013 at 7:14 PM, Tim Vaillancourt 
> wrote:
> > Hey guys,
> >
> > This has to be a stupid question/I must be doing something wrong, but
> after
> > frequent load testing with documentCache enabled under Solr 4.3.1 with
> > autoWarmCount=150, I'm noticing that my documentCache metrics are always
> > zero for non-cumlative.
> >
> > At first I thought my commit rate is fast enough I just never see the
> > non-cumlative result, but after 100s of samples I still always get zero
> > values.
> >
> > Here is the current output of my documentCache from Solr's admin for 1
> core:
> >
> > "
> >
> >- documentCache<
> http://localhost:8983/solr/#/channels_shard1_replica2/plugins/cache?entry=documentCache
> >
> >   - class:org.apache.solr.search.LRUCache
> >   - version:1.0
> >   - description:LRU Cache(maxSize=512, initialSize=512,
> >   autowarmCount=150, regenerator=null)
> >   - src:$URL: https:/
> >   /svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/
> >   solr/core/src/java/org/apache/solr/search/LRUCache.java<
> https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java
> >$
> >   - stats:
> >  - lookups:0
> >  - hits:0
> >  - hitratio:0.00
> >  - inserts:0
> >  - evictions:0
> >  - size:0
> >  - warmupTime:0
> >  - cumulative_lookups:65198986
> >  - cumulative_hits:63075669
> >  - cumulative_hitratio:0.96
> >  - cumulative_inserts:2123317
> >  - cumulative_evictions:1010262
> >   "
> >
> > The cumulative values seem to rise, suggesting doc cache is working, but
> at
> > the same time it seems I never see non-cumlative metrics, most
> importantly
> > warmupTime.
> >
> > Am I doing something wrong, is this normal/by-design, or is there an
> issue
> > here?
> >
> > Thanks for helping with my silly question! Have a good weekend,
> >
> > Tim
>


Re: documentCache not used in 4.3.1?

2013-06-29 Thread Tim Vaillancourt

That's a good idea, I'll try that next week.

Thanks!

Tim

On 29/06/13 12:39 PM, Erick Erickson wrote:

Tim:

Yeah, this doesn't make much sense to me either since,
as you say, you should be seeing some metrics upon
occasion. But do note that the underlying cache only gets
filled when getting documents to return in query results,
since there's no autowarming going on it may come and
go.

But you can test this pretty quickly by lengthening your
autocommit interval or just not indexing anything
for a while, then run a bunch of queries and look at your
cache stats. That'll at least tell you whether it works at all.
You'll have to have hard commits turned off (or openSearcher
set to 'false') for that check too.

Best
Erick


On Sat, Jun 29, 2013 at 2:48 PM, Vaillancourt, Tim wrote:


Yes, we are softCommit'ing every 1000ms, but that should be enough time to
see metrics, right? For example, I still get non-cumulative metrics
from the other caches (which are also thrown away). I've also curl/sampled
enough that I probably should have seen a value by now.

If anyone else can reproduce this on 4.3.1 I will feel less crazy :).

Cheers,

Tim

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, June 29, 2013 10:13 AM
To: solr-user@lucene.apache.org
Subject: Re: documentCache not used in 4.3.1?

It's especially weird that the hit ratio is so high and you're not seeing
anything in the cache. Are you perhaps soft committing frequently? Soft
commits throw away all the top-level caches including documentCache I
think....

Erick


On Fri, Jun 28, 2013 at 7:23 PM, Tim Vaillancourt
wrote:
Thanks Otis,

Yeah I realized after sending my e-mail that doc cache does not warm,
however I'm still lost on why there are no other metrics.

Thanks!

Tim


On 28 June 2013 16:22, Otis Gospodnetic
wrote:


Hi Tim,

Not sure about the zeros in 4.3.1, but in SPM we see all these
numbers are non-0, though I haven't had the chance to confirm with

Solr 4.3.1.

Note that you can't really autowarm document cache...

Otis
--
Solr&  ElasticSearch Support -- http://sematext.com/ Performance
Monitoring -- http://sematext.com/spm



On Fri, Jun 28, 2013 at 7:14 PM, Tim Vaillancourt

wrote:

Hey guys,

This has to be a stupid question/I must be doing something wrong,
but

after

frequent load testing with documentCache enabled under Solr 4.3.1
with autoWarmCount=150, I'm noticing that my documentCache metrics
are

always

zero for non-cumlative.

At first I thought my commit rate is fast enough I just never see
the non-cumlative result, but after 100s of samples I still always
get zero values.

Here is the current output of my documentCache from Solr's admin
for 1

core:

"

- documentCache<

http://localhost:8983/solr/#/channels_shard1_replica2/plugins/cache?en
try=documentCache

   - class:org.apache.solr.search.LRUCache
   - version:1.0
   - description:LRU Cache(maxSize=512, initialSize=512,
   autowarmCount=150, regenerator=null)
   - src:$URL: https:/
   /svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/
   solr/core/src/java/org/apache/solr/search/LRUCache.java<

https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/s
olr/core/src/java/org/apache/solr/search/LRUCache.java

$
   - stats:
  - lookups:0
  - hits:0
  - hitratio:0.00
  - inserts:0
  - evictions:0
  - size:0
  - warmupTime:0
  - cumulative_lookups:65198986
  - cumulative_hits:63075669
  - cumulative_hitratio:0.96
  - cumulative_inserts:2123317
  - cumulative_evictions:1010262
   "

The cumulative values seem to rise, suggesting doc cache is
working,

but

at

the same time it seems I never see non-cumlative metrics, most

importantly

warmupTime.

Am I doing something wrong, is this normal/by-design, or is there
an

issue

here?

Thanks for helping with my silly question! Have a good weekend,

Tim


Re: preferred container for running SolrCloud

2013-07-13 Thread Tim Vaillancourt

We run Jetty 8 and 9 with Solr. No issues I can think of.

We use Jetty internally anyway, and it seemed to be the most common 
container out there for Solr (from reading this mailing list, articles, 
etc.), so that made me feel a bit better about getting advice or help from 
the community if I needed it - not to say there isn't a lot of Tomcat + Solr 
knowledge on the list.


Performance-wise, years back I heard Jetty was the faster, lighter-on-RAM 
container compared to Tomcat, but recent benchmarks I've seen out 
there seem to indicate Tomcat is on par or possibly faster now, although 
I believe it uses more RAM. Don't quote me here. I'd love it if someone 
could do a Solr-specific benchmark.


Another neat, if sort of unimportant, tidbit is that Google App Engine went 
with Jetty, which to me indicates the Jetty project isn't going away 
anytime soon. Who knows - Google may even contribute valuable 
improvements back to the project. Live in hope!


Tim

On 11/07/13 08:14 PM, Saikat Kanjilal wrote:

One last thing, no issues with jetty.  The issues we did have was actually 
running separate zookeeper clusters.


From: sxk1...@hotmail.com
To: solr-user@lucene.apache.org
Subject: RE: preferred container for running SolrCloud
Date: Thu, 11 Jul 2013 20:13:27 -0700

Separate Zookeeper.


Date: Thu, 11 Jul 2013 19:27:18 -0700
Subject: Re: preferred container for running SolrCloud
From: docbook@gmail.com
To: solr-user@lucene.apache.org

With the embedded Zookeeper or separate Zookeeper? Also have run into any
issues with running SolrCloud on jetty?


On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilal wrote:


We're running under jetty.

Sent from my iPhone

On Jul 11, 2013, at 6:06 PM, "Ali, Saqib"  wrote:


1) Jboss
2) Jetty
3) Tomcat
4) Other..

?






Re: preferred container for running SolrCloud

2013-07-13 Thread Tim Vaillancourt

Very good point, Furkan.

The unit tests being run against Jetty are another very good reason to 
feel safer on Jetty, IMHO. I'm assuming the SolrCloud ChaosMonkey tests 
are run against Jetty as well?


Tim

On 13/07/13 02:46 PM, Furkan KAMACI wrote:

Of course you may have some reasons to use Tomcat or anything else (i.e.
your staff may have more experience with Tomcat, etc.). However, developers
generally run Jetty because it is the default for Solr, and I should point out
that the Solr unit tests run against Jetty (in fact, a specific version of
Jetty) and are well tested (if you search the mailing list you can find some
conversations about it). If you follow the Solr developer list you may see the
value of using a well-tested container. For example:
https://issues.apache.org/jira/browse/SOLR-4716 and
https://issues.apache.org/jira/browse/SOLR-4584?focusedCommentId=13625276&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13625276
can show that there may be some bugs for non-Jetty containers, and if you
choose any container other than Jetty you can hit one of them.

If you want to look at a comparison of Jetty vs. Tomcat, I suggest you
look here:

http://www.openlogic.com/wazi/bid/257366/Power-Java-based-web-apps-with-Jetty-application-server

and here:

http://www.infoq.com/news/2009/08/google-chose-jetty



2013/7/13 Tim Vaillancourt


We run Jetty 8 and 9 with Solr. No issues I can think of.

We use Jetty interally anyways, and it seemed to be the most common
container out there for Solr (from reading this mailinglist, articles,
etc), so that made me feel a bit better if I needed advice or help from the
community - not to say there isn't a lot of Tomcat + Solr knowledge on the
list.

Performance-wise, years back I heard Jetty was the faster/lighter-on-RAM
container in regards to Tomcat, but recent benchmarks I've seen out there
seem to indicate Tomcat is on par or possibly faster now, although I
believe while using more RAM. Don't quote me here. I'd love if someone
could do a Solr-specific benchmark.

Another neat, but sort of unimportant tidbit is Google App Engine went
with Jetty, which to me indicates the Jetty project isn't going away
anytime soon. Who knows, Google may even submit back valuable improvements
to the project. Live in hope!

Tim


On 11/07/13 08:14 PM, Saikat Kanjilal wrote:


One last thing, no issues with jetty.  The issues we did have was
actually running separate zookeeper clusters.

  From: sxk1...@hotmail.com

To: solr-user@lucene.apache.org
Subject: RE: preferred container for running SolrCloud
Date: Thu, 11 Jul 2013 20:13:27 -0700

Separate Zookeeper.

  Date: Thu, 11 Jul 2013 19:27:18 -0700

Subject: Re: preferred container for running SolrCloud
From: docbook@gmail.com
To: solr-user@lucene.apache.org

With the embedded Zookeeper or separate Zookeeper? Also have run into
any
issues with running SolrCloud on jetty?


On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilal
wrote:

  We're running under jetty.

Sent from my iPhone

On Jul 11, 2013, at 6:06 PM, "Ali, Saqib"
  wrote:

  1) Jboss

2) Jetty
3) Tomcat
4) Other..

?





SolrCloud 4.3.1 - "Failure to open existing log file (non fatal)" errors under high load

2013-07-25 Thread Tim Vaillancourt
Hey guys,

I am reaching out to the Solr list with a very vague issue: under high load
against a SolrCloud 4.3.1 cluster of 3 instances, 3 shards, 2 replicas (2
cores per instance), I eventually see failure messages related to
transaction logs, and shortly after these stacktraces occur the cluster
starts to fall apart.

To explain my setup:
- SolrCloud 4.3.1.
- Jetty 9.x.
- Oracle/Sun JDK 1.7.25 w/CMS.
- RHEL 6.x 64-bit.
- 3 instances, 1 per server.
- 3 shards.
- 2 replicas per shard.

The transaction log error I receive after about 10-30 minutes of load
testing is:

"ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
Failure to open existing log file (non fatal)
/opt/easw/easw_apps/easo_solr_cloud/solr/xmshd_shard3_replica2/data/tlog/tlog.078:org.apache.solr.common.SolrException:
java.io.EOFException
at
org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:182)
at org.apache.solr.update.UpdateLog.init(UpdateLog.java:233)
at
org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:83)
at
org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:138)
at
org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:125)
at
org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:95)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:525)
at
org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:596)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:805)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
at
org.apache.solr.core.CoreContainer.createFromZk(CoreContainer.java:894)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:982)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.EOFException
at
org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:73)
at
org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216)
at
org.apache.solr.update.TransactionLog.readHeader(TransactionLog.java:266)
at
org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:160)
... 25 more
"

Eventually after a few of these stack traces, the cluster starts to lose
shards and replicas fail. Jetty then creates hung threads until hitting
OutOfMemory on native threads due to the maximum process ulimit.

I know this is quite a vague issue, so I'm not expecting a silver-bullet
answer, but I was wondering if anyone has suggestions on where to look
next? Does this sound Solr-related at all, or possibly system? Has anyone
seen this issue before, or has any hypothesis how to find out more?

I will reply shortly with a thread dump, taken from 1 locked-up node.

Thanks for any suggestions!

Tim


Re: SolrCloud 4.3.1 - "Failure to open existing log file (non fatal)" errors under high load

2013-07-25 Thread Tim Vaillancourt
Stack trace:

http://timvaillancourt.com.s3.amazonaws.com/tmp/solrcloud.nodeC.2013-07-25-16.jstack.gz

Cheers!

Tim


On 25 July 2013 16:44, Tim Vaillancourt  wrote:

> Hey guys,
>
> I am reaching out to the Solr list with a very vague issue: under high
> load against a SolrCloud 4.3.1 cluster of 3 instances, 3 shards, 2 replicas
> (2 cores per instance), I eventually see failure messages related to
> transaction logs, and shortly after these stacktraces occur the cluster
> starts to fall apart.
>
> To explain my setup:
> - SolrCloud 4.3.1.
> - Jetty 9.x.
> - Oracle/Sun JDK 1.7.25 w/CMS.
> - RHEL 6.x 64-bit.
> - 3 instances, 1 per server.
> - 3 shards.
> - 2 replicas per shard.
>
> The transaction log error I receive after about 10-30 minutes of load
> testing is:
>
> "ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
> Failure to open existing log file (non fatal)
> /opt/easw/easw_apps/easo_solr_cloud/solr/xmshd_shard3_replica2/data/tlog/tlog.078:org.apache.solr.common.SolrException:
> java.io.EOFException
> at
> org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:182)
> at org.apache.solr.update.UpdateLog.init(UpdateLog.java:233)
> at
> org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:83)
> at
> org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:138)
> at
> org.apache.solr.update.UpdateHandler.<init>(UpdateHandler.java:125)
> at
> org.apache.solr.update.DirectUpdateHandler2.<init>(DirectUpdateHandler2.java:95)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
> at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:525)
> at
> org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:596)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:805)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
> at
> org.apache.solr.core.CoreContainer.createFromZk(CoreContainer.java:894)
> at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:982)
> at
> org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
> at
> org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.io.EOFException
> at
> org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:73)
> at
> org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216)
> at
> org.apache.solr.update.TransactionLog.readHeader(TransactionLog.java:266)
> at
> org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:160)
> ... 25 more
> "
>
> Eventually after a few of these stack traces, the cluster starts to lose
> shards and replicas fail. Jetty then creates hung threads until hitting
> OutOfMemory on native threads due to the maximum process ulimit.
>
> I know this is quite a vague issue, so I'm not expecting a silver-bullet
> answer, but I was wondering if anyone has suggestions on where to look
> next? Does this sound Solr-related at all, or possibly system? Has anyone
> seen this issue before, or has any hypothesis how to find out more?
>
> I will reply shortly with a thread dump, taken from 1 locked-up node.
>
> Thanks for any suggestions!
>
> Tim
>


Re: SolrCloud 4.3.1 - "Failure to open existing log file (non fatal)" errors under high load

2013-07-25 Thread Tim Vaillancourt
Thanks for the reply Shawn, I can always count on you :).

To answer the JVM question: we are using 10GB heaps and have over 100GB of
OS cache free; Young gen is about 50% of the heap, all CMS. Our max number of
processes for the JVM user is 10k, which is where Solr dies when it blows
up with 'cannot create native thread'.
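
For reference, this is roughly how I am confirming the limits the running
Solr JVM actually has (the PID lookup is just an example):

SOLR_PID=$(pgrep -f start.jar | head -1)
grep -E 'processes|open files' /proc/${SOLR_PID}/limits
ulimit -u    # as the Solr user: max processes (JVM threads count against this)
ulimit -n    # as the Solr user: max open files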

I also want to say this is system related, but I am seeing this occur on
all 3 servers, which are brand-new Dell R720s. I'm not saying this is
impossible, but I don't see much to suggest that, and it would need to be
one hell of a coincidence.

To add more confusion to the mix, we actually run a 2nd SolrCloud cluster
on the same Solr, Jetty and JVM versions that does not exhibit this issue,
although it uses a completely different schema, servers and access-patterns
(it is also high-TPS). That is some evidence that the current software stack
is OK, or maybe this only occurs under an extreme load that the 2nd cluster
does not see, or lastly only with a certain schema.

Lastly, to add a bit more detail to my original description, so far I have
tried:

- Entirely rebuilding my cluster from scratch, reinstalling all deps,
configs, reindexing the data (in case I screwed up somewhere). The EXACT
same issue occurs under load about 20-45 minutes in.
- Moving to Java 1.7.0_21 from _25 due to some known bugs. Same issue
occurs after some load.
- Restarting SolrCloud / forcing rebuilds or cores. Same issue occurs after
some load.

Cheers,

Tim


On 25 July 2013 17:13, Shawn Heisey  wrote:

> On 7/25/2013 5:44 PM, Tim Vaillancourt wrote:
>
>> The transaction log error I receive after about 10-30 minutes of load
>> testing is:
>>
>> "ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.**SolrException]
>> Failure to open existing log file (non fatal)
>> /opt/easw/easw_apps/easo_solr_**cloud/solr/xmshd_shard3_**
>> replica2/data/tlog/tlog.**078:org.**apache.solr.common.**
>> SolrException:
>> java.io.EOFException
>>
>
> 
>
>
>  Caused by: java.io.EOFException
>>  at
>> org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:73)
>>  at
>> org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216)
>>  at
>> org.apache.solr.update.TransactionLog.readHeader(TransactionLog.java:266)
>>  at
>> org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:160)
>>  ... 25 more
>> "
>>
>
> This looks to me like a system problem.  RHEL should be pretty solid, I
> use CentOS without any trouble.  My initial guesses are a corrupt
> filesystem, failing hardware, or possibly a kernel problem with your
> specific hardware.
>
> I'm running Jetty 8, which is the version that the example uses.  Could
> Jetty 9 be a problem here?  I couldn't really say, though my initial guess
> is that it's not a problem.
>
> I'm running Oracle Java 1.7.0_13.  Normally later releases are better, but
> Java bugs do exist and do get introduced in later releases.  Because you're
> on the absolute latest, I'm guessing that you had the problem with an
> earlier release and upgraded to see if it went away.  If that's what
> happened, it is less likely that it's Java.
>
> My first instinct would be to do a 'yum distro-sync' followed by 'touch
> /forcefsck' and reboot with console access to the server, so that you can
> deal with any fsck problems.  Perhaps you've already tried that. I'm aware
> that this could be very very hard to get pushed through strict change
> management procedures.
>
> I did some searching.  SOLR-4519 is a different problem, but it looks like
> it has a similar underlying exception, with no resolution.  It was filed
> When Solr 4.1.0 was current.
>
> Could there be a resource problem - heap too small, not enough OS disk
> cache, etc?
>
> Thanks,
> Shawn
>
>


Re: SolrCloud 4.3.1 - "Failure to open existing log file (non fatal)" errors under high load

2013-07-25 Thread Tim Vaillancourt

Thanks Shawn and Yonik!

Yonik: I noticed this error appears to be fairly trivial, but it is not 
appearing after a previous crash. Every time I run this high-volume test 
that produced my stack trace, I zero out the logs, Solr data and 
Zookeeper data and start over from scratch with a brand new collection 
and zero'd out logs.


The test is mostly high volume (2000-4000 updates/sec) and at the start 
the SolrCloud runs decently for a good 20-60~ minutes, no errors in the 
logs at all. Then that stack trace occurs on all 3 nodes (staggered), I 
immediately get some replica down messages and then some "cannot 
connect" errors to all other cluster nodes, who have all crashed the 
same way. The tlog error could be a symptom of the problem of running 
out of threads perhaps.


Shawn: thanks so much for sharing those details! Yes, they seem to be 
nice servers, for sure - I don't get to touch/see them but they're fast! 
I'll look into firmwares for sure and will try again after updating 
them. These Solr instances are not-bare metal and are actually KVM VMs 
so that's another layer to look into, although it is consistent between 
the two clusters.


I am not currently increasing the 'nofiles' ulimit to above default like 
you are, but does Solr use 10,000+ file handles? It won't hurt to try it 
I guess :). To rule out Java 7, I'll probably also try Jetty 8 and Java 
1.6 as an experiment as well.


Thanks!

Tim

On 25/07/13 05:55 PM, Yonik Seeley wrote:

On Thu, Jul 25, 2013 at 7:44 PM, Tim Vaillancourt  wrote:

"ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
Failure to open existing log file (non fatal)


That itself isn't necessarily a problem (and why it says "non fatal")
- it just means that most likely a transaction log file was
truncated from a previous crash.  It may be unrelated to the other
issues you are seeing.

-Yonik
http://lucidworks.com


Re: SolrCloud 4.3.1 - "Failure to open existing log file (non fatal)" errors under high load

2013-07-27 Thread Tim Vaillancourt

Thanks for the reply Erick,

Hard Commit - 15000ms, openSearcher=false
Soft Commit - 1000ms, openSearcher=true

15sec hard commit was sort of a guess, I could try a smaller number. 
When you say "getting too large" what limit do you think it would be 
hitting: a ulimit (nofiles), disk space, number of changes, a limit in 
Solr itself?


By my math there would be 15 tlogs max per core, but I don't really know 
how it all works; it would be great if someone could fill me in/point me somewhere.
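
In the meantime I will at least watch what is actually on disk, something
like this (the tlog path is an example for one core):

TLOG_DIR=/opt/solr/collection1_shard1_replica1/data/tlog    # example path
watch -n 5 "ls ${TLOG_DIR} | wc -l; du -sh ${TLOG_DIR}"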


Cheers,

Tim

On 27/07/13 07:57 AM, Erick Erickson wrote:

What is your autocommit limit? Is it possible that your transaction
logs are simply getting too large? tlogs are truncated whenever
you do a hard commit (autocommit) with openSearcher either
true for false it doesn't matter.

FWIW,
Erick

On Fri, Jul 26, 2013 at 12:56 AM, Tim Vaillancourt  
wrote:

Thanks Shawn and Yonik!

Yonik: I noticed this error appears to be fairly trivial, but it is not
appearing after a previous crash. Every time I run this high-volume test
that produced my stack trace, I zero out the logs, Solr data and Zookeeper
data and start over from scratch with a brand new collection and zero'd out
logs.

The test is mostly high volume (2000-4000 updates/sec) and at the start the
SolrCloud runs decently for a good 20-60~ minutes, no errors in the logs at
all. Then that stack trace occurs on all 3 nodes (staggered), I immediately
get some replica down messages and then some "cannot connect" errors to all
other cluster nodes, who have all crashed the same way. The tlog error could
be a symptom of the problem of running out of threads perhaps.

Shawn: thanks so much for sharing those details! Yes, they seem to be nice
servers, for sure - I don't get to touch/see them but they're fast! I'll
look into firmwares for sure and will try again after updating them. These
Solr instances are not-bare metal and are actually KVM VMs so that's another
layer to look into, although it is consistent between the two clusters.

I am not currently increasing the 'nofiles' ulimit to above default like you
are, but does Solr use 10,000+ file handles? It won't hurt to try it I guess
:). To rule out Java 7, I'll probably also try Jetty 8 and Java 1.6 as an
experiment as well.

Thanks!

Tim


On 25/07/13 05:55 PM, Yonik Seeley wrote:

On Thu, Jul 25, 2013 at 7:44 PM, Tim Vaillancourt
wrote:

"ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException]
Failure to open existing log file (non fatal)


That itself isn't necessarily a problem (and why it says "non fatal")
- it just means that most likely a transaction log file was
truncated from a previous crash.  It may be unrelated to the other
issues you are seeing.

-Yonik
http://lucidworks.com


Re: SolrCloud 4.3.1 - "Failure to open existing log file (non fatal)" errors under high load

2013-07-27 Thread Tim Vaillancourt

Thanks Jack/Erick,

I don't know if this is true or not, but I've read there is a tlog per 
soft commit, which is then truncated by the hard commit. If this were 
true, a 15sec hard-commit with a 1sec soft-commit could generate around 
15~ tlogs, but I've never checked. I like Erick's scenario more if it is 
1 tlog/core though. I'll try to find out some more.



Another two test/things I really should try for sanity are:
- Java 1.6 and Jetty 8: just to rule things out (wouldn't actually 
launch this way).

- ulimit for 'nofiles': the default is pretty high but why not?
- Monitor size + # of tlogs.


I'll be sure to share findings and really appreciate the help guys!


PS: This is asking a lot, but if anyone can take a look at that thread 
dump, or give me some pointers on what to look for in a 
stall/thread-pile up thread dump like this, I would really appreciate 
it. I'm quite weak at deciphering those (I use Thread Dump Analyzer) but 
I'm sure it would tell a lot.
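
A rough way to slice a dump like this is just counting things, e.g. (assuming
the dump is saved as solrcloud.jstack):

grep 'java.lang.Thread.State' solrcloud.jstack | sort | uniq -c | sort -rn
grep 'parking to wait for' solrcloud.jstack | sort | uniq -c | sort -rn | head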



Cheers,


Tim


On 27/07/13 02:24 PM, Erick Erickson wrote:

Tim:

15 seconds isn't unreasonable, I was mostly wondering if it was hours.

Take a look at the size of the tlogs as you're indexing, you should see them
truncate every 15 seconds or so. There'll be a varying number of tlogs kept
around, although under heavy indexing I'd only expect 1 or 2 inactive ones,
the internal number is that there'll be enough tlogs kept around to
hold 100 docs.

There should only be 1 open tlog/core as I understand it. When a commit
happens (hard, openSearcher = true or false doesn't matter) the current
tlog is closed and a new one opened. Then some cleanup happens so there
are only enough tlogs kept around to hold 100 docs.

Strange, I'm kind of out of ideas.
Erick

On Sat, Jul 27, 2013 at 4:41 PM, Jack Krupansky  wrote:

No hard numbers, but the general guidance is that you should set your hard
commit interval to match your expectations for how quickly nodes should come
up if they need to be restarted. Specifically, a hard commit assures that
all changes have been committed to disk and are ready for immediate access
on restart, but any and all soft commit changes since the last hard commit
must be "replayed" (reexecuted) on restart of a node.

How long does it take to replay the changes in the update log? No firm
numbers, but treat it as if all of those uncommitted updates had to be
resent and reprocessed by Solr. It's probably faster than that, but you get
the picture.

I would suggest thinking in terms of minutes rather than seconds for hard
commits 5 minutes, 10, 15, 20, 30 minutes.

Hard commits may result in kicking off segment merges, so too rapid a rate
of segment creation might cause problems or at least be counterproductive.

So, instead of 15 seconds, try 15 minutes.

OTOH, if you really need to handle 4,000 updates a second... you are clearly
in "uncharted territory" and need to expect to need to do some heavy duty
trial and error tuning on your own.

-- Jack Krupansky

-Original Message- From: Tim Vaillancourt
Sent: Saturday, July 27, 2013 4:21 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud 4.3.1 - "Failure to open existing log file (non
fatal)" errors under high load


Thanks for the reply Erick,

Hard Commit - 15000ms, openSearcher=false
Soft Commit - 1000ms, openSearcher=true

15sec hard commit was sort of a guess, I could try a smaller number.
When you say "getting too large" what limit do you think it would be
hitting: a ulimit (nofiles), disk space, number of changes, a limit in
Solr itself?

By my math there would be 15 tlogs max per core, but I don't really know
how it all works if someone could fill me in/point me somewhere.

Cheers,

Tim

On 27/07/13 07:57 AM, Erick Erickson wrote:

What is your autocommit limit? Is it possible that your transaction
logs are simply getting too large? tlogs are truncated whenever
you do a hard commit (autocommit) with openSearcher either
true for false it doesn't matter.

FWIW,
Erick

On Fri, Jul 26, 2013 at 12:56 AM, Tim Vaillancourt
wrote:

Thanks Shawn and Yonik!

Yonik: I noticed this error appears to be fairly trivial, but it is not
appearing after a previous crash. Every time I run this high-volume test
that produced my stack trace, I zero out the logs, Solr data and
Zookeeper
data and start over from scratch with a brand new collection and zero'd
out
logs.

The test is mostly high volume (2000-4000 updates/sec) and at the start
the
SolrCloud runs decently for a good 20-60~ minutes, no errors in the logs
at
all. Then that stack trace occurs on all 3 nodes (staggered), I
immediately
get some replica down messages and then some "cannot connect" errors to
all
other cluster nodes, who have all crashed the same way. The tlog error
could
be a symptom of the problem of running out of threads perhaps.

Shawn: thanks so much

Re: debian package for solr with jetty

2013-08-02 Thread Tim Vaillancourt

Hey guys,

It is by no means perfect or pretty, but I use this script below to 
build Solr into a .deb package that installs Solr to /opt/solr-VERSION 
with 'example' and 'docs' removed, and a symlink to /opt/solr. When 
building, the script wget's the tgz, builds it in a tmpdir within the 
cwd and makes a .deb.


There is no container included or anything, so this essentially builds a 
library-style package of Solr to be included by other packages, so it's 
probably not entirely what people are looking for here, but here goes:


solr-dpkg.sh:
"#!/bin/bash

set -e

VERSION=$1
if test -z ${VERSION};
then
  echo "Usage: $0 [SOLR VERSION]"
  exit 1
fi

NAME=solr
MIRROR_BASE="http://apache.mirror.iweb.ca";
PREFIX=/opt
PNAME=solr_${VERSION}
BUILD_BASE=$$
BUILD_DIR=${BUILD_BASE}/${PNAME}
START_DIR=${PWD}

# Clean build dir:
if test -e ${BUILD_DIR};
then
  rm -rf ${BUILD_DIR}
fi

# Wget solr:
SOLR_TAR=solr-${VERSION}.tgz
if test ! -e ${SOLR_TAR};
then
  wget -N ${MIRROR_BASE}/lucene/solr/${VERSION}/${SOLR_TAR}
fi

# Debian metadata:
mkdir -p ${BUILD_DIR} ${BUILD_DIR}/DEBIAN
cat <<EOF >>${BUILD_DIR}/DEBIAN/control
Package: solr
Priority: extra
Maintainer: Tim Vaillancourt 
Section: libs
Homepage: http://lucene.apache.org/solr/
Version: ${VERSION}
Description: Apache Solr ${VERSION}
Architecture: all
EOF

# Unpack solr in correct location:
mkdir -p ${BUILD_DIR}${PREFIX}
tar xfz ${SOLR_TAR} -C ${BUILD_DIR}${PREFIX}
rm -rf ${BUILD_DIR}${PREFIX}/solr-${VERSION}/{docs,example}
ln -s ${PREFIX}/solr-${VERSION} ${BUILD_DIR}${PREFIX}/solr

# Package and cleanup after:
cd ${BUILD_BASE}
dpkg-deb -b ${PNAME} && \
  mv ${PNAME}.deb ${START_DIR}/${PNAME}.deb
cd ${START_DIR}
rm -rf ${BUILD_BASE}

exit 0
"

Usage example: "./solr-dpkg.sh 4.4.0"
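
To sanity-check the result before depending on it elsewhere:

dpkg -c solr_4.4.0.deb | head    # list the packaged files
dpkg -I solr_4.4.0.deb           # show the control metadata
sudo dpkg -i solr_4.4.0.deb      # installs to /opt/solr-4.4.0 plus the /opt/solr symlink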

In my setup I have other packages pointing to this package's path as a 
library with solr, jetty and the 'instance-package' separated. These 
packages depend on the version of the solr 'library package' built by 
this script.


Enjoy!

Tim

On 01/08/13 08:14 PM, Yago Riveiro wrote:

Some time ago a found this 
https://github.com/LucidWorks/solr-fabric/blob/master/solr-fabric-guide.md , 
Instead of puppet or chef (I don't know if it is a requirement) it is developed 
with fabric.

--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Friday, August 2, 2013 at 3:32 AM, Alexandre Rafalovitch wrote:


Well, it is one of the requests with a couple of votes on the Solr Usability
Contest:
https://solrstart.uservoice.com/forums/216001-usability-contest/suggestions/4249809-puppet-chef-configuration-to-automatically-setup-s


So, if somebody with the knowledge of those tools could review the space
and figure out what the state of the art for this is, it would be great. If
somebody could identify the gap and fill in, it would be awesome. :-)

Regards,
Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)


On Thu, Aug 1, 2013 at 10:25 PM, Michael Della Bitta <michael.della.bi...@appinions.com>
wrote:


There should be at least a good Chef recipe, since Chef uses Solr
internally. I'm not using anything of theirs, since we've thus far been a
Tomcat shop. If nothing exists, I should whip something up.
On Aug 1, 2013 3:06 PM, "Alexandre Rafalovitch" <arafa...@gmail.com>
wrote:


And are there good chef/puppet/etc rules for the public use? I could not
find when I looked.

Regards,
Alex

On 1 Aug 2013 11:32, "Michael Della Bitta" <michael.della.bi...@appinions.com>
wrote:


Hi Manasi,

We use Chef for this type of thing here at my current job. Have you
considered something like it?

Other ones to look at are Puppet, CFEngine, Salt, and Ansible.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062 | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions <https://twitter.com/Appinions> | g+:
https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts

w: appinions.com <http://www.appinions.com/>


On Wed, Jul 31, 2013 at 8:10 PM, smanad <sma...@gmail.com> wrote:


Hi,

I am trying to create a debian package for solr 4.3 (default installation
with jetty).
Is there anything already available?

Also, I need 3 different cores so plan to create corresponding packages
for each of them to create solr core using admin/cores or collections api.

I also want to use, solrcloud setup with external zookeeper ensemble, whats
the be

Re: Adding Postgres and Mysql JDBC drivers to Solr

2013-08-11 Thread Tim Vaillancourt
Another option is defining the location of these jars in your 
solrconfig.xml and storing the libraries external to jetty, which has 
some advantages.


Eg: MySQL connector is located at '/opt/mysql_connector' and adding this 
to your solrconfig.xml alongside the other lib entities:

   <lib dir="/opt/mysql_connector/" regex=".*\.jar" />

Cheers,

Tim

On 06/08/13 08:02 AM, Spadez wrote:

Thank you very much



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-Postgres-and-Mysql-JDBC-drivers-to-Solr-tp4082806p4082832.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Internal shard communication - performance?

2013-08-11 Thread Tim Vaillancourt
For me the biggest deal with increased chatter between SolrCloud nodes is 
object creation and GCs.


The resulting CPU load from the increase GCing seems to affect 
performance for me in some load tests, but I'm still trying to gather 
hard numbers on it.


Cheers,

Tim

On 07/08/13 04:05 PM, Shawn Heisey wrote:

On 8/7/2013 2:45 PM, Torsten Albrecht wrote:

I would like to run zookeeper external at my old master server.

So I have two zookeeper to control my cloud. The third and fourth 
zookeeper will be a virtual machine.


For true HA with zookepeer, you need at least three instances on 
separate physical hardware.  If you want to use VMs, that would be 
fine, but you must ensure that you aren't running more than one 
instance on the same physical server.


For best results, use an odd number of ZK instances.  With three ZK 
instances, one can go down and everything still works.  With five, two 
can go down and everything still works.


If you've got a fully switched network that's at least gigabit speed, 
then the network latency involved in internal communication shouldn't 
really matter.


Thanks,
Shawn



Re: SolrCloud Load Balancer "weight"

2013-08-15 Thread Tim Vaillancourt

Soon ended up being a while :), feel free to add any thoughts.

https://issues.apache.org/jira/browse/SOLR-5166

Tim

On 07/06/13 03:07 PM, Vaillancourt, Tim wrote:

Cool!

Having those values influenced by stats is a neat idea too. I'll get on that 
soon.

Tim

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Monday, June 03, 2013 5:07 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud Load Balancer "weight"


On Jun 3, 2013, at 3:33 PM, Tim Vaillancourt  wrote:


Should I JIRA this? Thoughts?

Yeah - it's always been in the back of my mind - it's come up a few times - 
eventually we would like nodes to report some stats to zk to influence load 
balancing.

- mark


Re: Problems installing Solr4 in Jetty9

2013-08-17 Thread Tim Vaillancourt

Try adding 'ext' to your OPTIONS= line for Jetty.
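
For example (from memory; the exact module list varies with your Jetty
version and start.ini, so treat this as a sketch):

grep ^OPTIONS /opt/jetty/start.ini
# e.g. OPTIONS=Server,jsp,jmx,resources,websocket,ext,plus,annotations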

Tim

On 16/08/13 05:04 AM, Dmitry Kan wrote:

Hi,

I have the following jar in jetty/lib/ext:

log4j-1.2.16.jar
slf4j-api-1.6.6.jar
slf4j-log4j12-1.6.6.jar
jcl-over-slf4j-1.6.6.jar
jul-to-slf4j-1.6.6.jar

do you?

Dmitry


On Thu, Aug 8, 2013 at 12:49 PM, Spadez  wrote:


Apparently this is the error:

2013-08-08 09:35:19.994:WARN:oejw.WebAppContext:main: Failed startup of
context
o.e.j.w.WebAppContext@64a20878
{/solr,file:/tmp/jetty-0.0.0.0-8080-solr.war-_solr-any-/webapp/,STARTING}{/solr.war}
org.apache.solr.common.SolrException: Could not find necessary SLF4j
logging
jars. If using Jetty, the SLF4j logging jars need to go in the jetty
lib/ext
directory. For other containers, the corresponding directory should be
used.
For more information, see: http://wiki.apache.org/solr/SolrLogging



--
View this message in context:
http://lucene.472066.n3.nabble.com/Problems-installing-Solr4-in-Jetty9-tp4083209p4083224.html
Sent from the Solr - User mailing list archive at Nabble.com.



Sharing SolrCloud collection configs w/overrides

2013-08-20 Thread Tim Vaillancourt
Hey guys,

I have a situation where I have a lot of collections that share the same
core config in Zookeeper. For each of my SolrCloud collections, 99.9% of
the config (schema.xml, solrcloud.xml) are the same, only the
DataImportHandler parameters are different for different database
names/credentials, per collection.

To provide the different DIH credentials per collection, I currently upload
many copies of the exact-same Solr config dir with 1 Xincluded file with
the 4-5 database parameters that are different alongside the schema.xml and
solrconfig.xml.

I don't feel this ideal and is wasting space in Zookeeper considering most
of my configs are duplicated.

At a high level, is there a way for me to share one config in Zookeeper
while having minor overrides to the variables?

Is there a way for me to XInclude a file outside of my Zookeeper config
dir, ie: could I XInclude arbitrary locations in Zookeeper so that I can
have the same config dir for all collections and a file in Zookeeper that
is external to the common config dir to apply the collection-specific
overrides?

To extend my question for Solr 4.4 core.properties files: am I stuck in the
same boat under Solr 4.4 if I have say 10 collections sharing one config,
but I want each to have a unique core.properties?

Cheers!

Tim


Re: Sharing SolrCloud collection configs w/overrides

2013-08-21 Thread Tim Vaillancourt
Well, the mention of DIH is a bit off-topic. I'll simplify and say all I
need is the ability to set ANY variables in solrconfig.xml without having
to make N number of copies of the same configuration to achieve that.
Essentially I need 10+ collections to use the exact same config dir in
Zookeeper with minor/trivial differences set in variables.
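
To be clear, sharing the directory itself already works with the stock
zkcli.sh; it's only the per-collection overrides that are missing. Something
like this (hosts/names/paths are examples):

./zkcli.sh -zkhost zk1:2181 -cmd upconfig -confdir /opt/solr/sharedconfig -confname sharedconfig
./zkcli.sh -zkhost zk1:2181 -cmd linkconfig -collection collection1 -confname sharedconfig
./zkcli.sh -zkhost zk1:2181 -cmd linkconfig -collection collection2 -confname sharedconfig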

Your proposal of taking in values at core creation-time is a neat one and
would be a very flexible solution for a lot of use cases. My only concern
for my really-specific use cae is that I'd be setting DB user/passwords via
plain-text HTTP calls, but having this feature is better than not.

In a perfect world I'd like to be able to include files in Zookeeper (like
XInclude) that are outside the common config dir (eg:
'/configs/sharedconfig') all the collections would be sharing. On the other
hand, that sort of solution would open up the Zookeeper layout to arbitrary
files and could end up in a nightmare if not done carefully, however.

Would it be possible for Solr to support specifying multiple configs at
collection creation, that are merged or concatenated? This idea sounds
terrible to me even at this moment, but I wonder if there is something in
there...

Tim


Re: Sharing SolrCloud collection configs w/overrides

2013-09-01 Thread Tim Vaillancourt

Here you go Erick, feel free to update this.

I am unable to assign to you, but asked for someone to do so:

https://issues.apache.org/jira/browse/SOLR-5208

Cheers,

Tim

On 21/08/13 10:40 AM, Tim Vaillancourt wrote:
Well, the mention of DIH is a bit off-topic. I'll simplify and say all 
I need is the ability to set ANY variables in solrconfig.xml without 
having to make N number of copies of the same configuration to achieve 
that. Essentially I need 10+ collections to use the exact same config 
dir in Zookeeper with minor/trivial differences set in variables.


Your proposal of taking in values at core creation-time is a neat one 
and would be a very flexible solution for a lot of use cases. My only 
concern for my really-specific use case is that I'd be setting DB 
user/passwords via plain-text HTTP calls, but having this feature is 
better than not.


In a perfect world I'd like to be able to include files in Zookeeper 
(like XInclude) that are outside the common config dir (eg: 
'/configs/sharedconfig') all the collections would be sharing. On the 
other hand, that sort of solution would open up the Zookeeper layout 
to arbitrary files and could end up in a nightmare if not done 
carefully, however.


Would it be possible for Solr to support specifying multiple configs 
at collection creation, that are merged or concatenated. This idea 
sounds terrible to me even at this moment, but I wonder if there is 
something in there..


Tim


SolrCloud 4.x hangs under high update volume

2013-09-03 Thread Tim Vaillancourt
r.handler.HandlerCollection.handle(HandlerCollection.java:109)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:445)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:268)
at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:229)
at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
at java.lang.Thread.run(Thread.java:724)"

Some questions I had were:
1) What exclusive locks does SolrCloud "make" when performing an update?
2) Keeping in mind I do not read or write java (sorry :D), could someone
help me understand "what" solr is locking in this case at
"org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)"
when performing an update? That will help me understand where to look next.
3) It seems all threads in this state are waiting for "0x0007216e68d8",
is there a way to tell what "0x0007216e68d8" is?
4) Is there a limit to how many updates you can do in SolrCloud?
5) Wild-ass-theory: would more shards provide more locks (whatever they
are) on update, and thus more update throughput?

To those interested, I've provided a stacktrace of 1 of 3 nodes at this URL
in gzipped form:
https://s3.amazonaws.com/timvaillancourt.com/tmp/solr-jstack-2013-08-23.gz

Any help/suggestions/ideas on this issue, big or small, would be much
appreciated.

Thanks so much all!

Tim Vaillancourt


Re: DIH + Solr Cloud

2013-09-04 Thread Tim Vaillancourt

Hey Alejandro,

I guess it depends on what you call "more than one instance".

The request handlers are at the core-level, and not the Solr 
instance/global level, and within each of those cores you could have one 
or more data import handlers.


Most setups have 1 DIH per core at the handler location "/dataimport", 
but I believe you could have several, ie: "/dataimport2", "/dataimport3" 
if you had different DIH configs for each handler.


Within a single data import handler, you can have several "entities", 
which are what explain to the DIH processes how to get/index the data. 
What you can do here is have several entities that construct your index, 
and execute those entities with several separate HTTP calls to the DIH, 
thus creating more than one instance of the DIH process within 1 core 
and 1 DIH handler.


ie:

curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=suppliers" &
curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=parts" &
curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=companies" &
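
Each of those background imports can then be polled separately, e.g.:

curl "http://localhost:8983/solr/core1/dataimport?command=status"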


http://wiki.apache.org/solr/DataImportHandler#Commands

Cheers,

Tim

On 03/09/13 09:25 AM, Alejandro Calbazana wrote:

Hi,

Quick question about data import handlers in Solr cloud.  Does anyone use
more than one instance to support the DIH process?  Or is the typical setup
to have one box setup as only the DIH and keep this responsibility outside
of the Solr cloud environment?  I'm just trying to get a picture of how this
is typically deployed.

Thanks!

Alejandro



Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Tim Vaillancourt
Thanks guys! :)

Mark: this patch is much appreciated, I will try to test this shortly,
hopefully today.

For my curiosity/understanding, could someone explain to me quickly what
locks SolrCloud takes on updates? Was I on to something that more shards
decrease the chance for locking?

Secondly, I was wondering if someone could summarize what this patch
'fixes'? I'm not too familiar with Java and the solr codebase (working on
that though :D).

Cheers,

Tim



On 4 September 2013 09:52, Mark Miller  wrote:

> There is an issue if I remember right, but I can't find it right now.
>
> If anyone that has the problem could try this patch, that would be very
> helpful: http://pastebin.com/raw.php?i=aaRWwSGP
>
> - Mark
>
>
> On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma  >wrote:
>
> > Hi Mark,
> >
> > Got an issue to watch?
> >
> > Thanks,
> > Markus
> >
> > -Original message-
> > > From:Mark Miller 
> > > Sent: Wednesday 4th September 2013 16:55
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: SolrCloud 4.x hangs under high update volume
> > >
> > > I'm going to try and fix the root cause for 4.5 - I've suspected what
> it
> > is since early this year, but it's never personally been an issue, so
> it's
> > rolled along for a long time.
> > >
> > > Mark
> > >
> > > Sent from my iPhone
> > >
> > > On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt 
> > wrote:
> > >
> > > > Hey guys,
> > > >
> > > > I am looking into an issue we've been having with SolrCloud since the
> > > > beginning of our testing, all the way from 4.1 to 4.3 (haven't tested
> > 4.4.0
> > > > yet). I've noticed other users with this same issue, so I'd really
> > like to
> > > > get to the bottom of it.
> > > >
> > > > Under a very, very high rate of updates (2000+/sec), after 1-12 hours
> > we
> > > > see stalled transactions that snowball to consume all Jetty threads
> in
> > the
> > > > JVM. This eventually causes the JVM to hang with most threads waiting
> > on
> > > > the condition/stack provided at the bottom of this message. At this
> > point
> > > > SolrCloud instances then start to see their neighbors (who also have
> > all
> > > > threads hung) as down w/"Connection Refused", and the shards become
> > "down"
> > > > in state. Sometimes a node or two survives and just returns 503s "no
> > server
> > > > hosting shard" errors.
> > > >
> > > > As a workaround/experiment, we have tuned the number of threads
> sending
> > > > updates to Solr, as well as the batch size (we batch updates from
> > client ->
> > > > solr), and the Soft/Hard autoCommits, all to no avail. Turning off
> > > > Client-to-Solr batching (1 update = 1 call to Solr), which also did
> not
> > > > help. Certain combinations of update threads and batch sizes seem to
> > > > mask/help the problem, but not resolve it entirely.
> > > >
> > > > Our current environment is the following:
> > > > - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
> > > > - 3 x Zookeeper instances, external Java 7 JVM.
> > > > - 1 collection, 3 shards, 2 replicas (each node is a leader of 1
> shard
> > and
> > > > a replica of 1 shard).
> > > > - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a
> > good
> > > > day.
> > > > - 5000 max jetty threads (well above what we use when we are
> healthy),
> > > > Linux-user threads ulimit is 6000.
> > > > - Occurs under Jetty 8 or 9 (many versions).
> > > > - Occurs under Java 1.6 or 1.7 (several minor versions).
> > > > - Occurs under several JVM tunings.
> > > > - Everything seems to point to Solr itself, and not a Jetty or Java
> > version
> > > > (I hope I'm wrong).
> > > >
> > > > The stack trace that is holding up all my Jetty QTP threads is the
> > > > following, which seems to be waiting on a lock that I would very much
> > like
> > > > to understand further:
> > > >
> > > > "java.lang.Thread.State: WAITING (parking)
> > > >at sun.misc.Unsafe.park(Native Method)
> > > >- parking to wait for  <0x0007216e68d8> (a
>

Re: SolrCloud 4.x hangs under high update volume

2013-09-04 Thread Tim Vaillancourt
Thanks so much for the explanation Mark, I owe you one (many)!

We have this on our high TPS cluster and will run it through it's paces
tomorrow. I'll provide any feedback I can, more soon! :D

Cheers,

Tim


Re: SolrCloud 4.x hangs under high update volume

2013-09-05 Thread Tim Vaillancourt
Update: It is a bit too soon to tell, but about 6 hours into testing there
are no crashes with this patch. :)

We are pushing 500 batches of 10 updates per second to a 3 node, 3 shard
cluster I mentioned above. 5000 updates per second total.

More tomorrow after a 24 hr soak!

Tim

On Wednesday, 4 September 2013, Tim Vaillancourt wrote:

> Thanks so much for the explanation Mark, I owe you one (many)!
>
> We have this on our high TPS cluster and will run it through it's paces
> tomorrow. I'll provide any feedback I can, more soon! :D
>
> Cheers,
>
> Tim
>


Re: SolrCloud 4.x hangs under high update volume

2013-09-06 Thread Tim Vaillancourt
Hey guys,

(copy of my post to SOLR-5216)

We tested this patch and unfortunately encountered some serious issues after a
few hours of 500 update-batches/sec. Our update batch is 10 docs, so we are
writing about 5000 docs/sec total, using autoCommit to commit the updates
(no explicit commits).

Our environment:

Solr 4.3.1 w/SOLR-5216 patch.
Jetty 9, Java 1.7.
3 solr instances, 1 per physical server.
1 collection.
3 shards.
2 replicas (each instance is a leader and a replica).
Soft autoCommit is 1000ms.
Hard autoCommit is 15000ms.

After about 6 hours of stress-testing this patch, we see many of these
stalled transactions (below), and the Solr instances start to see each
other as down, flooding our Solr logs with "Connection Refused" exceptions,
and otherwise no obviously-useful logs that I could see.

I did notice some stalled transactions on both /select and /update,
however. This never occurred without this patch.

Stack /select seems stalled on: http://pastebin.com/Y1NCrXGC
Stack /update seems stalled on: http://pastebin.com/cFLbC8Y9

Lastly, I have a summary of the ERROR-severity logs from this 24-hour soak.
My script "normalizes" the ERROR-severity stack traces and returns them in
order of occurrence.

Summary of my solr.log: http://pastebin.com/pBdMAWeb
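
The "normalization" is nothing fancy; the idea is roughly to strip numbers
out of each ERROR line and count what's left. A simplified sketch, not the
actual script:

grep ERROR solr.log | sed 's/[0-9][0-9]*/N/g' | sort | uniq -c | sort -rn | head -20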

Thanks!

Tim Vaillancourt


On 6 September 2013 07:27, Markus Jelsma  wrote:

> Thanks!
>
> -Original message-
> > From:Erick Erickson 
> > Sent: Friday 6th September 2013 16:20
> > To: solr-user@lucene.apache.org
> > Subject: Re: SolrCloud 4.x hangs under high update volume
> >
> > Markus:
> >
> > See: https://issues.apache.org/jira/browse/SOLR-5216
> >
> >
> > On Wed, Sep 4, 2013 at 11:04 AM, Markus Jelsma
> > wrote:
> >
> > > Hi Mark,
> > >
> > > Got an issue to watch?
> > >
> > > Thanks,
> > > Markus
> > >
> > > -Original message-
> > > > From:Mark Miller 
> > > > Sent: Wednesday 4th September 2013 16:55
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: SolrCloud 4.x hangs under high update volume
> > > >
> > > > I'm going to try and fix the root cause for 4.5 - I've suspected
> what it
> > > is since early this year, but it's never personally been an issue, so
> it's
> > > rolled along for a long time.
> > > >
> > > > Mark
> > > >
> > > > Sent from my iPhone
> > > >
> > > > On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt 
> > > wrote:
> > > >
> > > > > Hey guys,
> > > > >
> > > > > I am looking into an issue we've been having with SolrCloud since
> the
> > > > > beginning of our testing, all the way from 4.1 to 4.3 (haven't
> tested
> > > 4.4.0
> > > > > yet). I've noticed other users with this same issue, so I'd really
> > > like to
> > > > > get to the bottom of it.
> > > > >
> > > > > Under a very, very high rate of updates (2000+/sec), after 1-12
> hours
> > > we
> > > > > see stalled transactions that snowball to consume all Jetty
> threads in
> > > the
> > > > > JVM. This eventually causes the JVM to hang with most threads
> waiting
> > > on
> > > > > the condition/stack provided at the bottom of this message. At this
> > > point
> > > > > SolrCloud instances then start to see their neighbors (who also
> have
> > > all
> > > > > threads hung) as down w/"Connection Refused", and the shards become
> > > "down"
> > > > > in state. Sometimes a node or two survives and just returns 503s
> "no
> > > server
> > > > > hosting shard" errors.
> > > > >
> > > > > As a workaround/experiment, we have tuned the number of threads
> sending
> > > > > updates to Solr, as well as the batch size (we batch updates from
> > > client ->
> > > > > solr), and the Soft/Hard autoCommits, all to no avail. Turning off
> > > > > Client-to-Solr batching (1 update = 1 call to Solr), which also
> did not
> > > > > help. Certain combinations of update threads and batch sizes seem
> to
> > > > > mask/help the problem, but not resolve it entirely.
> > > > >
> > > > > Our current environment is the following:
> > > > > - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7.
> > > > > - 3 x Zookeepe

Re: SolrCloud 4.x hangs under high update volume

2013-09-06 Thread Tim Vaillancourt
Hey Mark,

The farthest we've made it at the same batch size/volume was 12 hours
without this patch, but that isn't consistent. Sometimes we would only get
to 6 hours or less.

During the crash I can see an amazing spike in threads to 10k which is
essentially our ulimit for the JVM, but I strangely see no "OutOfMemory:
cannot open native thread errors" that always follow this. Weird!

We also notice a spike in CPU around the crash. The instability caused some
shard recovery/replication though, so that CPU may be a symptom of the
replication, or is possibly the root cause. The CPU spikes from about
20-30% utilization (system + user) to 60% fairly sharply, so the CPU, while
spiking isn't quite "pinned" (very beefy Dell R720s - 16 core Xeons, whole
index is in 128GB RAM, 6xRAID10 15k).

More on resources: our disk I/O seemed to spike about 2x during the crash
(about 1300kbps written to 3500kbps), but this may have been the
replication, or ERROR logging (we generally log nothing due to
WARN-severity unless something breaks).

Lastly, I found this stack trace occurring frequently, and have no idea
what it is (may be useful or not):

"java.lang.IllegalStateException :
  at org.eclipse.jetty.server.Response.resetBuffer(Response.java:964)
  at org.eclipse.jetty.server.Response.sendError(Response.java:325)
  at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:692)
  at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
  at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
  at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)
  at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)
  at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
  at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
  at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
  at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083)
  at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379)
  at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
  at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017)
  at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
  at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258)
  at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
  at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
  at org.eclipse.jetty.server.Server.handle(Server.java:445)
  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:260)
  at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:225)
  at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
  at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:596)
  at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:527)
  at java.lang.Thread.run(Thread.java:724)"

On your live_nodes question, I don't have historical data on this from when
the crash occurred, which I guess is what you're looking for. I could add
this to our monitoring for future tests, however. I'd be glad to continue
further testing, but I think first more monitoring is needed to understand
this further. Could we come up with a list of metrics that would be useful
to see following another test and successful crash?

Metrics needed:

1) # of live_nodes.
2) Full stack traces.
3) CPU used by Solr's JVM specifically (instead of system-wide).
4) Solr's JVM thread count (already done)
5) ?
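
If it helps, I can grab those with a crude capture loop during the next test
run, something like this (PID lookup, ZK host and paths are examples):

SOLR_PID=$(pgrep -f start.jar | head -1)
while true; do
  ts=$(date +%s)
  jstack ${SOLR_PID} > jstack.${ts}.txt                     # 2) full stack traces
  ps -o nlwp= -o %cpu= -p ${SOLR_PID} >> solr-metrics.log   # 4) thread count, 3) JVM CPU
  echo "ls /live_nodes" | zkCli.sh -server zk1:2181 2>/dev/null | tail -1 >> solr-metrics.log   # 1) live_nodes
  sleep 60
done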

Cheers,

Tim Vaillancourt


On 6 September 2013 13:11, Mark Miller  wrote:

> Did you ever get to index that long before without hitting the deadlock?
>
> There really isn't anything negative the patch could be introducing, other
> than allowing for some more threads to possibly run at once. If I had to
> guess, I would say its likely this patch fixes the deadlock issue and your
> seeing another issue - which looks like the system cannot keep up with the
> requests or something for some reason - perhaps due to some OS networking
> settings or something (more guessing). Connection refused happens generally
> when there is nothing listening on the port.
>
> Do you see anything interesting change with the rest of the system? CPU
> usage spikes or something like that?
>
> Clamping down further on the overall number of threads night help (which
> would require making something configurable). How many nodes are listed in
> zk under live_nodes?

Re: SolrCloud 4.x hangs under high update volume

2013-09-06 Thread Tim Vaillancourt
Enjoy your trip, Mark! Thanks again for the help!

Tim

On 6 September 2013 14:18, Mark Miller  wrote:

> Okay, thanks, useful info. Getting on a plane, but ill look more at this
> soon. That 10k thread spike is good to know - that's no good and could
> easily be part of the problem. We want to keep that from happening.
>
> Mark
>
> Sent from my iPhone
>
> On Sep 6, 2013, at 2:05 PM, Tim Vaillancourt  wrote:
>
> > Hey Mark,
> >
> > The farthest we've made it at the same batch size/volume was 12 hours
> > without this patch, but that isn't consistent. Sometimes we would only
> get
> > to 6 hours or less.
> >
> > During the crash I can see an amazing spike in threads to 10k which is
> > essentially our ulimit for the JVM, but I strangely see no "OutOfMemory:
> > cannot open native thread errors" that always follow this. Weird!
> >
> > We also notice a spike in CPU around the crash. The instability caused
> some
> > shard recovery/replication though, so that CPU may be a symptom of the
> > replication, or is possibly the root cause. The CPU spikes from about
> > 20-30% utilization (system + user) to 60% fairly sharply, so the CPU,
> while
> > spiking isn't quite "pinned" (very beefy Dell R720s - 16 core Xeons,
> whole
> > index is in 128GB RAM, 6xRAID10 15k).
> >
> > More on resources: our disk I/O seemed to spike about 2x during the crash
> > (about 1300kbps written to 3500kbps), but this may have been the
> > replication, or ERROR logging (we generally log nothing due to
> > WARN-severity unless something breaks).
> >
> > Lastly, I found this stack trace occurring frequently, and have no idea
> > what it is (may be useful or not):
> >
> > "java.lang.IllegalStateException :
> >  at org.eclipse.jetty.server.Response.resetBuffer(Response.java:964)
> >  at org.eclipse.jetty.server.Response.sendError(Response.java:325)
> >  at
> >
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:692)
> >  at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
> >  at
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
> >  at
> >
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)
> >  at
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)
> >  at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
> >  at
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
> >  at
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
> >  at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083)
> >  at
> > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379)
> >  at
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
> >  at
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017)
> >  at
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
> >  at
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258)
> >  at
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
> >  at
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
> >  at org.eclipse.jetty.server.Server.handle(Server.java:445)
> >  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:260)
> >  at
> >
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:225)
> >  at
> >
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
> >  at
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:596)
> >  at
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:527)
> >  at java.lang.Thread.run(Thread.java:724)"
> >
> > On your live_nodes question, I don't have historical data on this from
> when
> > the crash occurred, which I guess is what you're looking for. I could add
> > this to our monitoring for future tests, however. I'd be glad to continue
> > further testing, but I think first more monitoring is needed to
> understand
> >

Re: solrcloud shards backup/restoration

2013-09-06 Thread Tim Vaillancourt
I wouldn't say I love this idea, but wouldn't it be safe to LVM snapshot
the Solr index? I think this may even work on a live server, depending on
some file I/O details. Has anyone tried this?

An in-Solr solution sounds more elegant, but considering the tlog concern
Shalin mentioned, I think this may work as an interim solution.
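
Roughly what I have in mind (volume names and paths are made up, and per
Mark's point you would want a hard commit right before the snapshot):

curl "http://localhost:8983/solr/collection1/update?commit=true"
lvcreate --snapshot --name solr-snap --size 10G /dev/vg0/solr-data
mkdir -p /mnt/solr-snap
mount -o ro /dev/vg0/solr-snap /mnt/solr-snap
tar czf /backups/solr-data-$(date +%Y%m%d).tar.gz -C /mnt/solr-snap .
umount /mnt/solr-snap && lvremove -f /dev/vg0/solr-snap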

Cheers!

Tim


On 6 September 2013 15:41, Aditya Sakhuja  wrote:

> Thanks Shalin and Mark for your responses. I am on the same page about the
> conventions for taking the backup. However, I am less sure about the
> restoration of the index. Lets say we have 3 shards across 3 solrcloud
> servers.
>
> 1.> I am assuming we should take a backup from each of the shard leaders to
> get a complete collection. do you think that will get the complete index (
> not worrying about what is not hard committed at the time of backup ). ?
>
> 2.> How do we go about restoring the index in a fresh solrcloud cluster ?
> From the structure of the snapshot I took, I did not see any
> replication.properties or index.properties  which I see normally on a
> healthy solrcloud cluster nodes.
> if I have the snapshot named snapshot.20130905 does the snapshot.20130905/*
> go into data/index ?
>
> Thanks
> Aditya
>
>
>
> On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller  wrote:
>
> > Phone typing. The end should not say "don't hard commit" - it should say
> > "do a hard commit and take a snapshot".
> >
> > Mark
> >
> > Sent from my iPhone
> >
> > On Sep 6, 2013, at 7:26 AM, Mark Miller  wrote:
> >
> > > I don't know that it's too bad though - its always been the case that
> if
> > you do a backup while indexing, it's just going to get up to the last
> hard
> > commit. With SolrCloud that will still be the case. So just make sure you
> > do a hard commit right before taking the backup - yes, it might miss a
> few
> > docs in the tran log, but if you are taking a back up while indexing, you
> > don't have great precision in any case - you will roughly get a snapshot
> > for around that time - even without SolrCloud, if you are worried about
> > precision and getting every update into that backup, you want to stop
> > indexing and commit first. But if you just want a rough snapshot for
> around
> > that time, in both cases you can still just don't hard commit and take a
> > snapshot.
> > >
> > > Mark
> > >
> > > Sent from my iPhone
> > >
> > > On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> > >
> > >> The replication handler's backup command was built for pre-SolrCloud.
> > >> It takes a snapshot of the index but it is unaware of the transaction
> > >> log which is a key component in SolrCloud. Hence unless you stop
> > >> updates, commit your changes and then take a backup, you will likely
> > >> miss some updates.
> > >>
> > >> That being said, I'm curious to see how peer sync behaves when you try
> > >> to restore from a snapshot. When you say that you haven't been
> > >> successful in restoring, what exactly is the behaviour you observed?
> > >>
> > >> On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja <
> > aditya.sakh...@gmail.com> wrote:
> > >>> Hello,
> > >>>
> > >>> I was looking for a good backup / recovery solution for the solrcloud
> > >>> indexes. I am more looking for restoring the indexes from the index
> > >>> snapshot, which can be taken using the replicationHandler's backup
> > command.
> > >>>
> > >>> I am looking for something that works with solrcloud 4.3 eventually,
> > but
> > >>> still relevant if you tested with a previous version.
> > >>>
> > >>> I haven't been successful in have the restored index replicate across
> > the
> > >>> new replicas, after I restart all the nodes, with one node having the
> > >>> restored index.
> > >>>
> > >>> Is restoring the indexes on all the nodes the best way to do it ?
> > >>> --
> > >>> Regards,
> > >>> -Aditya Sakhuja
> > >>
> > >>
> > >>
> > >> --
> > >> Regards,
> > >> Shalin Shekhar Mangar.
> >
>
>
>
> --
> Regards,
> -Aditya Sakhuja
>


Re: SolrCloud 4.x hangs under high update volume

2013-09-10 Thread Tim Vaillancourt
Hey guys,

Based on my understanding of the problem we are encountering, I feel we've
been able to reduce the likelihood of this issue by making the following
changes to our app's usage of SolrCloud:

1) We increased our document batch size to 200 from 10 - our app batches
updates to reduce HTTP requests/overhead. The theory is increasing the
batch size reduces the likelihood of this issue happening.
2) We reduced to 1 application node sending updates to SolrCloud - we write
Solr updates to Redis, and have previously had 4 application nodes pushing
the updates to Solr (popping off the Redis queue). Reducing the number of
nodes pushing to Solr reduces the concurrency on SolrCloud.
3) Less threads pushing to SolrCloud - due to the increase in batch size,
we were able to go down to 5 update threads on the update-pushing-app (from
10 threads).
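
For reference, the batching in 1) is nothing fancy, just one POST carrying many
documents, roughly like this (host, collection and fields are placeholders):

curl -s 'http://localhost:8983/solr/collection1/update?commit=false' \
  -H 'Content-Type: application/json' \
  --data-binary '[
    {"id": "doc-1", "title": "first doc in the batch"},
    {"id": "doc-2", "title": "second doc in the batch"}
  ]'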

To be clear the above only reduces the likelihood of the issue happening,
and DOES NOT actually resolve the issue at hand.

If we happen to encounter issues with the above 3 changes, the next steps
(I could use some advice on) are:

1) Increase the number of shards (2x) - the theory here is this reduces the
locking on shards because there are more shards. Am I onto something here,
or will this not help at all?
2) Use CloudSolrServer - currently we have a plain-old least-connection
HTTP VIP. If we go "direct" to what we need to update, this will reduce
concurrency in SolrCloud a bit. Thoughts?

Thanks all!

Cheers,

Tim


On 6 September 2013 14:47, Tim Vaillancourt  wrote:

> Enjoy your trip, Mark! Thanks again for the help!
>
> Tim
>
>
> On 6 September 2013 14:18, Mark Miller  wrote:
>
>> Okay, thanks, useful info. Getting on a plane, but ill look more at this
>> soon. That 10k thread spike is good to know - that's no good and could
>> easily be part of the problem. We want to keep that from happening.
>>
>> Mark
>>
>> Sent from my iPhone
>>
>> On Sep 6, 2013, at 2:05 PM, Tim Vaillancourt 
>> wrote:
>>
>> > Hey Mark,
>> >
>> > The farthest we've made it at the same batch size/volume was 12 hours
>> > without this patch, but that isn't consistent. Sometimes we would only
>> get
>> > to 6 hours or less.
>> >
>> > During the crash I can see an amazing spike in threads to 10k which is
>> > essentially our ulimit for the JVM, but I strangely see no "OutOfMemory:
>> > cannot open native thread errors" that always follow this. Weird!
>> >
>> > We also notice a spike in CPU around the crash. The instability caused
>> some
>> > shard recovery/replication though, so that CPU may be a symptom of the
>> > replication, or is possibly the root cause. The CPU spikes from about
>> > 20-30% utilization (system + user) to 60% fairly sharply, so the CPU,
>> while
>> > spiking isn't quite "pinned" (very beefy Dell R720s - 16 core Xeons,
>> whole
>> > index is in 128GB RAM, 6xRAID10 15k).
>> >
>> > More on resources: our disk I/O seemed to spike about 2x during the
>> crash
>> > (about 1300kbps written to 3500kbps), but this may have been the
>> > replication, or ERROR logging (we generally log nothing due to
>> > WARN-severity unless something breaks).
>> >
>> > Lastly, I found this stack trace occurring frequently, and have no idea
>> > what it is (may be useful or not):
>> >
>> > "java.lang.IllegalStateException :
>> >  at org.eclipse.jetty.server.Response.resetBuffer(Response.java:964)
>> >  at org.eclipse.jetty.server.Response.sendError(Response.java:325)
>> >  at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:692)
>> >  at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
>> >  at
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
>> >  at
>> >
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)
>> >  at
>> >
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)
>> >  at
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
>> >  at
>> >
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
>> >  at
>> >
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
>> >  at
>> >
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(C

Re: SolrCloud 4.x hangs under high update volume

2013-09-11 Thread Tim Vaillancourt

Thanks Erick!

Yeah, I think the next step will be CloudSolrServer with the SOLR-4816 
patch. I think that is a very, very useful patch by the way. SOLR-5232 
seems promising as well.


I see your point on the more-shards idea, this is obviously a 
global/instance-level lock. If I really had to, I suppose I could run 
more Solr instances to reduce locking then? Currently I have 2 cores per 
instance and I could go 1-to-1 to simplify things.


The good news is we seem to be more stable since changing to a bigger 
client->solr batch-size and fewer client threads updating.


Cheers,

Tim

On 11/09/13 04:19 AM, Erick Erickson wrote:

If you use CloudSolrServer, you need to apply SOLR-4816 or use a recent
copy of the 4x branch. By "recent", I mean like today, it looks like Mark
applied this early this morning. But several reports indicate that this will
solve your problem.

I would expect that increasing the number of shards would make the problem
worse, not
better.

There's also SOLR-5232...

Best
Erick


On Tue, Sep 10, 2013 at 5:20 PM, Tim Vaillancourt wrote:


Hey guys,

Based on my understanding of the problem we are encountering, I feel we've
been able to reduce the likelihood of this issue by making the following
changes to our app's usage of SolrCloud:

1) We increased our document batch size to 200 from 10 - our app batches
updates to reduce HTTP requests/overhead. The theory is increasing the
batch size reduces the likelihood of this issue happening.
2) We reduced to 1 application node sending updates to SolrCloud - we write
Solr updates to Redis, and have previously had 4 application nodes pushing
the updates to Solr (popping off the Redis queue). Reducing the number of
nodes pushing to Solr reduces the concurrency on SolrCloud.
3) Less threads pushing to SolrCloud - due to the increase in batch size,
we were able to go down to 5 update threads on the update-pushing-app (from
10 threads).

To be clear the above only reduces the likelihood of the issue happening,
and DOES NOT actually resolve the issue at hand.

If we happen to encounter issues with the above 3 changes, the next steps
(I could use some advice on) are:

1) Increase the number of shards (2x) - the theory here is this reduces the
locking on shards because there are more shards. Am I onto something here,
or will this not help at all?
2) Use CloudSolrServer - currently we have a plain-old least-connection
HTTP VIP. If we go "direct" to what we need to update, this will reduce
concurrency in SolrCloud a bit. Thoughts?

Thanks all!

Cheers,

Tim


On 6 September 2013 14:47, Tim Vaillancourt  wrote:


Enjoy your trip, Mark! Thanks again for the help!

Tim


On 6 September 2013 14:18, Mark Miller  wrote:


Okay, thanks, useful info. Getting on a plane, but ill look more at this
soon. That 10k thread spike is good to know - that's no good and could
easily be part of the problem. We want to keep that from happening.

Mark

Sent from my iPhone

On Sep 6, 2013, at 2:05 PM, Tim Vaillancourt
wrote:


Hey Mark,

The farthest we've made it at the same batch size/volume was 12 hours
without this patch, but that isn't consistent. Sometimes we would only

get

to 6 hours or less.

During the crash I can see an amazing spike in threads to 10k which is
essentially our ulimit for the JVM, but I strangely see no

"OutOfMemory:

cannot open native thread errors" that always follow this. Weird!

We also notice a spike in CPU around the crash. The instability caused

some

shard recovery/replication though, so that CPU may be a symptom of the
replication, or is possibly the root cause. The CPU spikes from about
20-30% utilization (system + user) to 60% fairly sharply, so the CPU,

while

spiking isn't quite "pinned" (very beefy Dell R720s - 16 core Xeons,

whole

index is in 128GB RAM, 6xRAID10 15k).

More on resources: our disk I/O seemed to spike about 2x during the

crash

(about 1300kbps written to 3500kbps), but this may have been the
replication, or ERROR logging (we generally log nothing due to
WARN-severity unless something breaks).

Lastly, I found this stack trace occurring frequently, and have no

idea

what it is (may be useful or not):

"java.lang.IllegalStateException :
  at

org.eclipse.jetty.server.Response.resetBuffer(Response.java:964)

  at org.eclipse.jetty.server.Response.sendError(Response.java:325)
  at


org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:692)

  at


org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)

  at


org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)

  at


org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)

  at


org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)

  at


org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)

  

Re: SolrCloud 4.x hangs under high update volume

2013-09-12 Thread Tim Vaillancourt
Lol, at breaking during a demo - always the way it is! :) I agree, we are
just tip-toeing around the issue, but waiting for 4.5 is definitely an
option if we "get-by" for now in testing; patched Solr versions seem to
make people uneasy sometimes :).

Seeing there seems to be some danger to SOLR-5216 (in some ways it blows up
worse due to fewer limitations on threads), I'm guessing only SOLR-5232 and
SOLR-4816 are making it into 4.5? I feel those 2 in combination will make a
world of difference!

Thanks so much again guys!

Tim



On 12 September 2013 03:43, Erick Erickson  wrote:

> Fewer client threads updating makes sense, and going to 1 core also seems
> like it might help. But it's all a crap-shoot unless the underlying cause
> gets fixed up. Both would improve things, but you'll still hit the problem
> sometime, probably when doing a demo for your boss ;).
>
> Adrien has branched the code for SOLR 4.5 in preparation for a release
> candidate tentatively scheduled for next week. You might just start working
> with that branch if you can rather than apply individual patches...
>
> I suspect there'll be a couple more changes to this code (looks like
> Shikhar already raised an issue for instance) before 4.5 is finally cut...
>
> FWIW,
> Erick
>
>
>
> On Thu, Sep 12, 2013 at 2:13 AM, Tim Vaillancourt  >wrote:
>
> > Thanks Erick!
> >
> > Yeah, I think the next step will be CloudSolrServer with the SOLR-4816
> > patch. I think that is a very, very useful patch by the way. SOLR-5232
> > seems promising as well.
> >
> > I see your point on the more-shards idea, this is obviously a
> > global/instance-level lock. If I really had to, I suppose I could run
> more
> > Solr instances to reduce locking then? Currently I have 2 cores per
> > instance and I could go 1-to-1 to simplify things.
> >
> > The good news is we seem to be more stable since changing to a bigger
> > client->solr batch-size and fewer client threads updating.
> >
> > Cheers,
> >
> > Tim
> >
> > On 11/09/13 04:19 AM, Erick Erickson wrote:
> >
> >> If you use CloudSolrServer, you need to apply SOLR-4816 or use a recent
> >> copy of the 4x branch. By "recent", I mean like today, it looks like
> Mark
> >> applied this early this morning. But several reports indicate that this
> >> will
> >> solve your problem.
> >>
> >> I would expect that increasing the number of shards would make the
> problem
> >> worse, not
> >> better.
> >>
> >> There's also SOLR-5232...
> >>
> >> Best
> >> Erick
> >>
> >>
> >> On Tue, Sep 10, 2013 at 5:20 PM, Tim Vaillancourt wrote:
> >>
> >>  Hey guys,
> >>>
> >>> Based on my understanding of the problem we are encountering, I feel
> >>> we've
> >>> been able to reduce the likelihood of this issue by making the
> following
> >>> changes to our app's usage of SolrCloud:
> >>>
> >>> 1) We increased our document batch size to 200 from 10 - our app
> batches
> >>> updates to reduce HTTP requests/overhead. The theory is increasing the
> >>> batch size reduces the likelihood of this issue happening.
> >>> 2) We reduced to 1 application node sending updates to SolrCloud - we
> >>> write
> >>> Solr updates to Redis, and have previously had 4 application nodes
> >>> pushing
> >>> the updates to Solr (popping off the Redis queue). Reducing the number
> of
> >>> nodes pushing to Solr reduces the concurrency on SolrCloud.
> >>> 3) Less threads pushing to SolrCloud - due to the increase in batch
> size,
> >>> we were able to go down to 5 update threads on the update-pushing-app
> >>> (from
> >>> 10 threads).
> >>>
> >>> To be clear the above only reduces the likelihood of the issue
> happening,
> >>> and DOES NOT actually resolve the issue at hand.
> >>>
> >>> If we happen to encounter issues with the above 3 changes, the next
> steps
> >>> (I could use some advice on) are:
> >>>
> >>> 1) Increase the number of shards (2x) - the theory here is this reduces
> >>> the
> >>> locking on shards because there are more shards. Am I onto something
> >>> here,
> >>> or will this not help at all?
> >>> 2) Use CloudSolrServer - currently we have a plain-old least-connection
> >>> HTTP VIP. If we go "dire

Re: SolrCloud 4.x hangs under high update volume

2013-09-12 Thread Tim Vaillancourt
That makes sense, thanks Erick and Mark for you help! :)

I'll see if I can find a place to assist with the testing of SOLR-5232.

Cheers,

Tim



On 12 September 2013 11:16, Mark Miller  wrote:

> Right, I don't see SOLR-5232 making 4.5 unfortunately. It could perhaps
> make a 4.5.1 - it does resolve a critical issue - but 4.5 is in motion and
> SOLR-5232 is not quite ready - we need some testing.
>
> - Mark
>
> On Sep 12, 2013, at 2:12 PM, Erick Erickson 
> wrote:
>
> > My take on it is this, assuming I'm reading this right:
> > 1> SOLR-5216 - probably not going anywhere, 5232 will take care of it.
> > 2> SOLR-5232 - expected to fix the underlying issue no matter whether
> > you're using CloudSolrServer from SolrJ or sending lots of updates from
> > lots of clients.
> > 3> SOLR-4816 - use this patch and CloudSolrServer from SolrJ in the
> > meantime.
> >
> > I don't quite know whether SOLR-5232 will make it in to 4.5 or not, it
> > hasn't been committed anywhere yet. The Solr 4.5 release is imminent, RC0
> > is looking like it'll be ready to cut next week so it might not be
> included.
> >
> > Best,
> > Erick
> >
> >
> > On Thu, Sep 12, 2013 at 1:42 PM, Tim Vaillancourt  >wrote:
> >
> >> Lol, at breaking during a demo - always the way it is! :) I agree, we
> are
> >> just tip-toeing around the issue, but waiting for 4.5 is definitely an
> >> option if we "get-by" for now in testing; patched Solr versions seem to
> >> make people uneasy sometimes :).
> >>
> >> Seeing there seems to be some danger to SOLR-5216 (in some ways it
> blows up
> >> worse due to less limitations on thread), I'm guessing only SOLR-5232
> and
> >> SOLR-4816 are making it into 4.5? I feel those 2 in combination will
> make a
> >> world of difference!
> >>
> >> Thanks so much again guys!
> >>
> >> Tim
> >>
> >>
> >>
> >> On 12 September 2013 03:43, Erick Erickson 
> >> wrote:
> >>
> >>> Fewer client threads updating makes sense, and going to 1 core also
> seems
> >>> like it might help. But it's all a crap-shoot unless the underlying
> cause
> >>> gets fixed up. Both would improve things, but you'll still hit the
> >> problem
> >>> sometime, probably when doing a demo for your boss ;).
> >>>
> >>> Adrien has branched the code for SOLR 4.5 in preparation for a release
> >>> candidate tentatively scheduled for next week. You might just start
> >> working
> >>> with that branch if you can rather than apply individual patches...
> >>>
> >>> I suspect there'll be a couple more changes to this code (looks like
> >>> Shikhar already raised an issue for instance) before 4.5 is finally
> >> cut...
> >>>
> >>> FWIW,
> >>> Erick
> >>>
> >>>
> >>>
> >>> On Thu, Sep 12, 2013 at 2:13 AM, Tim Vaillancourt <
> t...@elementspace.com
> >>>> wrote:
> >>>
> >>>> Thanks Erick!
> >>>>
> >>>> Yeah, I think the next step will be CloudSolrServer with the SOLR-4816
> >>>> patch. I think that is a very, very useful patch by the way. SOLR-5232
> >>>> seems promising as well.
> >>>>
> >>>> I see your point on the more-shards idea, this is obviously a
> >>>> global/instance-level lock. If I really had to, I suppose I could run
> >>> more
> >>>> Solr instances to reduce locking then? Currently I have 2 cores per
> >>>> instance and I could go 1-to-1 to simplify things.
> >>>>
> >>>> The good news is we seem to be more stable since changing to a bigger
> >>>> client->solr batch-size and fewer client threads updating.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> Tim
> >>>>
> >>>> On 11/09/13 04:19 AM, Erick Erickson wrote:
> >>>>
> >>>>> If you use CloudSolrServer, you need to apply SOLR-4816 or use a
> >> recent
> >>>>> copy of the 4x branch. By "recent", I mean like today, it looks like
> >>> Mark
> >>>>> applied this early this morning. But several reports indicate that
> >> this
> >>>>> will
> >>>>> solve your problem.
> >>>>>
> >>&

Re: App server?

2013-10-02 Thread Tim Vaillancourt
Jetty should be sufficient, and is the more-common container for Solr.
Also, Solr tests are written for Jetty.

Lastly, I'd argue Jetty is just as "enterprise" as Tomcat. Google App
Engine (running lots of enterprise), is Jetty-based, for example.

Cheers,

Tim


On 2 October 2013 15:44, Mark  wrote:

> Is Jetty sufficient for running Solr or should I go with something a
> little more enterprise like tomcat?
>
> Any others?


Re: {soft}Commit and cache flushing

2013-10-07 Thread Tim Vaillancourt
Is there a way to make autoCommit only commit if there are pending changes,
ie: if there are 0 adds pending commit, don't autoCommit (open-a-searcher
and wipe the caches)?

Cheers,

Tim


On 2 October 2013 00:52, Dmitry Kan  wrote:

> right. We've got the autoHard commit configured only atm. The soft-commits
> are controlled on the client. It was just easier to implement the first
> version of our internal commit policy that will commit to all solr
> instances at once. This is where we have noticed the reported behavior.
>
>
> On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam  wrote:
>
> > if there are no modifications to an index and a softCommit or hardCommit
>  issued, then solr flushes the cache.
> 
> >>>
> > Indeed. The easiest way to work around this is by disabling auto commits
> > and only commit when you have to.
> >
>


Re: solr cpu usage

2013-10-07 Thread Tim Vaillancourt
Fantastic article!

Tim


On 5 October 2013 18:14, Erick Erickson  wrote:

> From my perspective, your question is almost impossible to
> answer, there are too many variables. See:
>
> http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> Best,
> Erick
>
> On Thu, Oct 3, 2013 at 9:38 PM, Otis Gospodnetic
>  wrote:
> > Hi,
> >
> > More CPU cores means more concurrency.  This is good if you need to
> handle
> > high query rates.
> >
> > Faster cores mean lower query latency, assuming you are not bottlenecked
> by
> > memory or disk IO or network IO.
> >
> > So what is ideal for you depends on your concurrency and latency needs.
> >
> > Otis
> > Solr & ElasticSearch Support
> > http://sematext.com/
> > On Oct 1, 2013 9:33 AM, "adfel70"  wrote:
> >
> >> hi
> >> We're building a spec for a machine to purchase.
> >> We're going to buy 10 machines.
> >> we aren't sure yet how many proccesses we will run per machine.
> >> the question is  -should we buy faster cpu with less cores or slower cpu
> >> with more cores?
> >> in any case we will have 2 cpus in each machine.
> >> should we buy 2.6Ghz cpu with 8 cores or 3.5Ghz cpu with 4 cores?
> >>
> >> what will we gain by having many cores?
> >>
> >> what kinds of usages would make cpu be the bottleneck?
> >>
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/solr-cpu-usage-tp4092938.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
>


Re: solr cpu usage

2013-10-08 Thread Tim Vaillancourt
Yes, you've saved us all lots of time with this article. I'm about to do
the same for the old "Jetty or Tomcat?" container question ;).

Tim


On 7 October 2013 18:55, Erick Erickson  wrote:

> Tim:
>
> Thanks! Mostly I wrote it to have something official looking to hide
> behind when I didn't have a good answer to the hardware sizing question
> :).
>
> On Mon, Oct 7, 2013 at 2:48 PM, Tim Vaillancourt 
> wrote:
> > Fantastic article!
> >
> > Tim
> >
> >
> > On 5 October 2013 18:14, Erick Erickson  wrote:
> >
> >> From my perspective, your question is almost impossible to
> >> answer, there are too many variables. See:
> >>
> >>
> http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Oct 3, 2013 at 9:38 PM, Otis Gospodnetic
> >>  wrote:
> >> > Hi,
> >> >
> >> > More CPU cores means more concurrency.  This is good if you need to
> >> handle
> >> > high query rates.
> >> >
> >> > Faster cores mean lower query latency, assuming you are not
> bottlenecked
> >> by
> >> > memory or disk IO or network IO.
> >> >
> >> > So what is ideal for you depends on your concurrency and latency
> needs.
> >> >
> >> > Otis
> >> > Solr & ElasticSearch Support
> >> > http://sematext.com/
> >> > On Oct 1, 2013 9:33 AM, "adfel70"  wrote:
> >> >
> >> >> hi
> >> >> We're building a spec for a machine to purchase.
> >> >> We're going to buy 10 machines.
> >> >> we aren't sure yet how many proccesses we will run per machine.
> >> >> the question is  -should we buy faster cpu with less cores or slower
> cpu
> >> >> with more cores?
> >> >> in any case we will have 2 cpus in each machine.
> >> >> should we buy 2.6Ghz cpu with 8 cores or 3.5Ghz cpu with 4 cores?
> >> >>
> >> >> what will we gain by having many cores?
> >> >>
> >> >> what kinds of usages would make cpu be the bottleneck?
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >> http://lucene.472066.n3.nabble.com/solr-cpu-usage-tp4092938.html
> >> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >> >>
> >>
>


Re: {soft}Commit and cache flushing

2013-10-08 Thread Tim Vaillancourt
I have a genuine question with substance here. If anything this
nonconstructive, rude response was "to get noticed". Thanks for
contributing to the discussion.

Tim


On 8 October 2013 05:31, Dmitry Kan  wrote:

> Tim,
> I suggest you open a new thread and not reply to this one to get noticed.
> Dmitry
>
>
> On Mon, Oct 7, 2013 at 9:44 PM, Tim Vaillancourt  >wrote:
>
> > Is there a way to make autoCommit only commit if there are pending
> changes,
> > ie: if there are 0 adds pending commit, don't autoCommit (open-a-searcher
> > and wipe the caches)?
> >
> > Cheers,
> >
> > Tim
> >
> >
> > On 2 October 2013 00:52, Dmitry Kan  wrote:
> >
> > > right. We've got the autoHard commit configured only atm. The
> > soft-commits
> > > are controlled on the client. It was just easier to implement the first
> > > version of our internal commit policy that will commit to all solr
> > > instances at once. This is where we have noticed the reported behavior.
> > >
> > >
> > > On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam 
> > wrote:
> > >
> > > > if there are no modifications to an index and a softCommit or
> > hardCommit
> > > >>>> issued, then solr flushes the cache.
> > > >>>>
> > > >>>
> > > > Indeed. The easiest way to work around this is by disabling auto
> > commits
> > > > and only commit when you have to.
> > > >
> > >
> >
>


Re: {soft}Commit and cache flushing

2013-10-09 Thread Tim Vaillancourt
Apologies all. I think the suggestion that I was replying "to get noticed"
is what irked me, otherwise I would have moved on. I'll follow this advice.

Cheers,

Tim


On 9 October 2013 05:20, Erick Erickson  wrote:

> Tim:
>
> I think you're mis-interpreting. By replying to a post with the subject:
>
> {soft}Commit and cache flushing
>
> but going in a different direction, it's easy for people to think "I'm
> not interested in that
> thread, I'll ignore it", thereby missing the fact that you're asking a
> somewhat different
> question that they might have information about. It's not about whether
> you're
> doing anything particularly wrong with the question. It's about making
> it easy for
> people to help.
>
> See http://people.apache.org/~hossman/#threadhijack
>
> Best,
> Erick
>
> On Tue, Oct 8, 2013 at 6:23 PM, Tim Vaillancourt 
> wrote:
> > I have a genuine question with substance here. If anything this
> > nonconstructive, rude response was "to get noticed". Thanks for
> > contributing to the discussion.
> >
> > Tim
> >
> >
> > On 8 October 2013 05:31, Dmitry Kan  wrote:
> >
> >> Tim,
> >> I suggest you open a new thread and not reply to this one to get
> noticed.
> >> Dmitry
> >>
> >>
> >> On Mon, Oct 7, 2013 at 9:44 PM, Tim Vaillancourt  >> >wrote:
> >>
> >> > Is there a way to make autoCommit only commit if there are pending
> >> changes,
> >> > ie: if there are 0 adds pending commit, don't autoCommit
> (open-a-searcher
> >> > and wipe the caches)?
> >> >
> >> > Cheers,
> >> >
> >> > Tim
> >> >
> >> >
> >> > On 2 October 2013 00:52, Dmitry Kan  wrote:
> >> >
> >> > > right. We've got the autoHard commit configured only atm. The
> >> > soft-commits
> >> > > are controlled on the client. It was just easier to implement the
> first
> >> > > version of our internal commit policy that will commit to all solr
> >> > > instances at once. This is where we have noticed the reported
> behavior.
> >> > >
> >> > >
> >> > > On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam 
> >> > wrote:
> >> > >
> >> > > > if there are no modifications to an index and a softCommit or
> >> > hardCommit
> >> > > >>>> issued, then solr flushes the cache.
> >> > > >>>>
> >> > > >>>
> >> > > > Indeed. The easiest way to work around this is by disabling auto
> >> > commits
> >> > > > and only commit when you have to.
> >> > > >
> >> > >
> >> >
> >>
>


Re: Zookeeper dataimport.properties node

2013-04-04 Thread Tim Vaillancourt
If it's in your SolrCloud-based collection's config, it won't be on disk;
it lives only in Zookeeper.


What I did was use the XInclude feature to include a file with my 
dataimport handler properties, so I'm assuming you're doing the same. 
Use a relative path to the config dir in Zookeeper, ie: no path and just 
'dataimport.properties', unless it is in a subdir of your config, in which 
case include the subdir, ie: 'yoursubdir/dataimport.properties'.


I have a deployment system template the properties file before it is 
inserted into Zookeeper.
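
In case it helps anyone, the push step is roughly this (zkhost, paths and
names are made up):

cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd upconfig \
  -confdir /deploy/solr/collection1/conf -confname collection1
curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1'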


Tim

On 03/04/13 08:48 PM, Nathan Findley wrote:

 - Is dataimport.properties ever written to the filesystem? (Trying to
 determine if I have a permissions error because I don't see it
 anywhere on disk).
 - How do you manually edit dataimport.properties? My system is
 periodically pulling in new data. If that process has issues, I want
 to be able to reset to an earlier known good timestamp value.

 Regards, Nate





Re: Does solr cloud support rename or swap function for collection?

2013-04-07 Thread Tim Vaillancourt
I aim to use this feature more in testing soon. I'll be sure to doc 
what I can.


Cheers,

Tim

On 07/04/13 12:28 PM, Mark Miller wrote:

On Apr 7, 2013, at 9:44 AM, bradhill99  wrote:


Thanks Mark for this great feature but I suggest you can update the wiki
too.


Yeah, I've stopped updating the wiki for a while now looking back - paralysis 
on how to handle versions (I didn't want to do the std 'this applies to 4.1', 
'this applied to 4.0' all over the page) and the current likely move to a new 
Confluence wiki with Docs based on documentation LucidWorks recently donated to 
the project.

That's all a lot of work away still I guess.

I'll try and add some basic doc for this to the SolrCloud wiki page soon.

- Mark


Re: Solr 4.2.1 Branch

2013-04-08 Thread Tim Vaillancourt
There is also this path for the SVN guys out there: 
https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1


Cheers,

Tim

On 05/04/13 05:53 PM, Jagdish Nomula wrote:

That works out. Thanks for shooting the link.

On Fri, Apr 5, 2013 at 5:51 PM, Jack Krupansky wrote:


You want the "tagged" branch:

https://github.com/apache/lucene-solr/tree/lucene_solr_4_2_1


-- Jack Krupansky

-Original Message- From: Jagdish Nomula Sent: Friday, April 05,
2013 8:36 PM To: solr-user@lucene.apache.org Subject: Solr 4.2.1 Branch
Hello,

I was trying to get hold of solr 4.2.1 branch on github. I see
https://github.com/apache/lucene-solr/tree/lucene_solr_4_2.
  I don't see
any branch for 4.2.1. Am i missing anything ?.

Thanks in advance for your help.

--
***Jagdish Nomula*

Sr. Manager Search
Simply Hired, Inc.
370 San Aleso Ave., Ste 200
Sunnyvale, CA 94085

office - 408.400.4700
cell - 408.431.2916
email - jagd...@simplyhired.com

www.simplyhired.com






/admin/stats.jsp in SolrCloud

2013-04-10 Thread Tim Vaillancourt
Hey guys,

This feels like a silly question already, here goes:

In SolrCloud it doesn't seem obvious to me where one can grab stats
regarding caches for a given core using an http call (JSON/XML). Those
values are available in the web-based app, but I am looking for a http call
that would return this same data.

In 3.x this was located at /admin/stats.jsp, and I used a script to grab
the data, but in SolrCloud I am unclear and would like to add that to the
docs below:

http://wiki.apache.org/solr/SolrCaching#Overview
http://wiki.apache.org/solr/SolrAdminStats

Thanks!

Tim


Re: /admin/stats.jsp in SolrCloud

2013-04-10 Thread Tim Vaillancourt
There we go, Thanks Stefan!

You're right, 3.x has this as well, I guess I missed it. I'll add this to
the docs for SolrCaching.
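
For anyone searching later, the call looks something like this (host/port/core
are placeholders; I believe cat=CACHE narrows it down to just the caches):

curl 'http://localhost:8983/solr/collection1/admin/mbeans?stats=true&cat=CACHE&wt=json'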

Cheers!

Tim



On 10 April 2013 13:19, Stefan Matheis  wrote:

> Hey Tim
>
> SolrCloud-Mode or not does not really matter for this fact .. in 4.x (and
> afaik as well in 3.x) you can find the stats here: 
> http://host:port/solr/admin/mbeans?stats=true
> in xml or json (setting the responsewriter with wt=json) - as you like
>
> HTH
> Stefan
>
>
>
> On Wednesday, April 10, 2013 at 9:53 PM, Tim Vaillancourt wrote:
>
> > Hey guys,
> >
> > This feels like a silly question already, here goes:
> >
> > In SolrCloud it doesn't seem obvious to me where one can grab stats
> > regarding caches for a given core using an http call (JSON/XML). Those
> > values are available in the web-based app, but I am looking for a http
> call
> > that would return this same data.
> >
> > In 3.x this was located at /admin/stats.jsp, and I used a script to grab
> > the data, but in SolrCloud I am unclear and would like to add that to the
> > docs below:
> >
> > http://wiki.apache.org/solr/SolrCaching#Overview
> > http://wiki.apache.org/solr/SolrAdminStats
> >
> > Thanks!
> >
> > Tim
>
>


CSS appearing in Solr 4.2.1 logs

2013-04-12 Thread Tim Vaillancourt
Hey guys,

This sounds crazy, but does anyone see strange CSS/HTML in their Solr 4.2.x
logs?

Often I am finding entire CSS documents (likely from Solr's Admin) in my
jetty's stderrout log.

Example:

"2013-04-12 00:23:20.363:WARN:oejh.HttpGenerator:Ignoring extra content /**
 * @license RequireJS order 1.0.5 Copyright (c) 2010-2011, The Dojo
Foundation All Rights Reserved.
 * Available via the MIT or new BSD license.
 * see: http://github.com/jrburke/requirejs for details
 */
/*jslint nomen: false, plusplus: false, strict: false */
/*global require: false, define: false, window: false, document: false,
  setTimeout: false */

//Specify that requirejs optimizer should wrap this code in a closure that
//maps the namespaced requirejs API to non-namespaced local variables.
/*requirejs namespace: true */

(function () {

//Sadly necessary browser inference due to differences in the way
//that browsers load and execute dynamically inserted javascript
//and whether the script/cache method works when ordered execution is
//desired. Currently, Gecko and Opera do not load/fire onload for
scripts with
//type="script/cache" but they execute injected scripts in order
//unless the 'async' flag is present.
//However, this is all changing in latest browsers implementing HTML5
//spec. With compliant browsers .async true by default, and
//if false, then it will execute in order. Favor that test first for
forward
//compatibility.
var testScript = typeof document !== "undefined" &&
 typeof window !== "undefined" &&
 document.createElement("script"),

supportsInOrderExecution = testScript && (testScript.async ||
   ((window.opera &&

Object.prototype.toString.call(window.opera) === "[object Opera]") ||
   //If Firefox 2 does not have to be
supported, then
   //a better check may be:
   //('mozIsLocallyAvailable' in
window.navigator)
   ("MozAppearance" in
document.documentElement.style))),

"

Due to this, my logs are getting really huge, and sometimes it breaks my tail
-F commands on the logs, printing what looks like binary, so there is
possibly some other junk in my logs aside from CSS.

I am running Jetty 8.1.10 and Solr 4.2.1 (stable build).

Cheers!

Tim Vaillancourt


Re: Basic auth on SolrCloud /admin/* calls

2013-04-13 Thread Tim Vaillancourt

This JIRA covers a lot of what you're asking:

https://issues.apache.org/jira/browse/SOLR-4470

I am also trying to get this sort of solution in place, but it seems to 
be dying off a bit. Hopefully we can get some interest on this again, 
this question comes up every few weeks, it seems.


I can confirm the latest patch from this JIRA works as expected, 
although my primary concern is the credentials appear in the JVM 
command, and I'd like to move that to a file.


Cheers,

Tim

On 11/04/13 10:41 AM, Michael Della Bitta wrote:

It's fairly easy to lock down Solr behind basic auth using just the
servlet container it's running in, but the problem becomes letting
services that *should* be able to access Solr in. I've rolled with
basic auth in some setups, but certain deployments such as Solr Cloud
or sharded setups don't play well with auth because there's no good
way to configure them to use it.

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Apr 11, 2013 at 1:19 PM, Raymond Wiker  wrote:

On Apr 11, 2013, at 17:12 , adfel70  wrote:

Hi
I need to implement security in solr as follows:
1. prevent unauthorized users from accessing to solr admin pages.
2. prevent unauthorized users from performing solr operations - both /admin
and /update.


Is the conclusion of this thread is that this is not possible at the moment?


The "obvious" solution (to me, at least) would be to (1) restrict access to solr to 
localhost, and (2) use a reverse proxy (e.g, apache) on the same node to provide 
authenticated&  restricted access to solr. I think I've seen recipes for (1), somewhere, 
and I've used (2) fairly extensively for similar purposes.


Re: Basic auth on SolrCloud /admin/* calls

2013-04-14 Thread Tim Vaillancourt
I've thought about this too, and have heard of some people running a 
lightweight http proxy upstream of Solr.


With the right network restrictions (the only way for a client to reach Solr
is via the proxy, while the nodes can still talk to each other), you could
achieve the same thing SOLR-4470 is doing, with the drawback of 
additional proxy and firewall components to maintain, plus added 
overhead on HTTP calls.
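
For the network-restriction piece, a rough iptables sketch (addresses are made
up) would be something like:

# let the proxy and the other Solr nodes reach Solr, drop everyone else
iptables -A INPUT -p tcp --dport 8983 -s 10.0.1.10 -j ACCEPT     # proxy
iptables -A INPUT -p tcp --dport 8983 -s 10.0.1.0/28 -j ACCEPT   # solr nodes
iptables -A INPUT -p tcp --dport 8983 -j DROP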


A benefit though is a lightweight proxy ahead of Solr could implement 
HTTP caching, taking some load off of Solr.


In a perfect world, I'd say rolling out SOLR-4470 is the best solution, 
but again, it seems to be losing momentum (please Vote/support the 
discussion!). While proxies can achieve this, I think enough people have 
pondered about this to implement this as a feature in Solr.


Tim

On 14/04/13 12:32 AM, adfel70 wrote:

Did anyone try blocking access to the ports in the firewall level, and
allowing all the solr servers in the cluster+given control-machines?
Assuming that search request to solr run though a proxy..





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Basic-auth-on-SolrCloud-admin-calls-tp4052266p4055868.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Does solr cloud support rename or swap function for collection?

2013-04-14 Thread Tim Vaillancourt

I added a brief description on CREATEALIAS here, feel free to tweak:

http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
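
For the rename/swap use case specifically, repointing an alias at a freshly
built collection should just be two calls along these lines (alias and
collection names are made up):

curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_v2'
curl 'http://localhost:8983/solr/admin/collections?action=DELETE&name=products_v1'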

Tim

On 07/04/13 05:29 PM, Mark Miller wrote:

It's pretty simple - just as Brad said, it's just

http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=alias&collections=collection1,collection2,…

You also have action=DELETEALIAS

CREATEALIAS will create and update.

For update requests, you only want a 1to1 alias. For read requests, you can map 
1to1 or 1toN.

I've also started work on shard level aliases, but I've yet to get back to 
finishing it.

- Mark

On Apr 7, 2013, at 5:10 PM, Tim Vaillancourt  wrote:


I aim to use this feature more in testing soon. I'll be sure to doc what I 
can.

Cheers,

Tim

On 07/04/13 12:28 PM, Mark Miller wrote:

On Apr 7, 2013, at 9:44 AM, bradhill99   wrote:


Thanks Mark for this great feature but I suggest you can update the wiki
too.

Yeah, I've stopped updating the wiki for a while now looking back - paralysis 
on how to handle versions (I didn't want to do the std 'this applies to 4.1', 
'this applied to 4.0' all over the page) and the current likely move to a new 
Confluence wiki with Docs based on documentation LucidWorks recently donated to 
the project.

That's all a lot of work away still I guess.

I'll try and add some basic doc for this to the SolrCloud wiki page soon.

- Mark


Re: Storing Solr Index on NFS

2013-04-15 Thread Tim Vaillancourt
If centralization of storage is your goal by choosing NFS, iSCSI works 
reasonably well with SOLR indexes, although good local-storage will 
always be the overall winner.


I noticed a near 5% degradation in overall search performance (casual 
testing, nothing scientific) when moving a 40-50GB indexes to iSCSI 
(10GBe network) from a 4x7200rpm RAID 10 local SATA disk setup.


Tim

On 15/04/13 09:59 AM, Walter Underwood wrote:

Solr 4.2 does have field compression which makes smaller indexes. That will 
reduce the amount of network traffic. That probably does not help much, because 
I think the latency of NFS is what causes problems.

wunder

On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote:


Hello Walter,

Thanks for the response. That has been my experience in the past as well.
But I was wondering if there new are things in Solr 4 and NFS 4.1 that make
the storing of indexes on a NFS mount feasible.

Thanks,
Saqib


On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wrote:


On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote:


Greetings,

Are there any issues with storing Solr Indexes on a NFS share? Also any
recommendations for using NFS for Solr indexes?

I recommend that you do not put Solr indexes on NFS.

It can be very slow, I measured indexing as 100X slower on NFS a few years
ago.

It is not safe to share Solr index files between two Solr servers, so
there is no benefit to NFS.

wunder
--
Walter Underwood
wun...@wunderwood.org





--
Walter Underwood
wun...@wunderwood.org






Re: protect solr pages

2013-05-17 Thread Tim Vaillancourt
A lot of people (including me) are asking for this type of support in this
JIRA:

https://issues.apache.org/jira/browse/SOLR-4470

Although brought up frequently on the list, the effort doesn't seem to be
moving too much. I can confirm that the most recent patch on this JIRA will
work with the specific revision of 4.2.x though.

Cheers,

Tim


On 17 May 2013 13:11, gpssolr2020  wrote:

> Hi,
>
> i want implement security through jetty realm in solr4. So i configured
> related stuffs in realm.properties ,jetty.xml, webdefault.xml under
> /solrhome/example/etc. But still it is not working. Please advise.
>
>
>
> Thanks.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/protect-solr-pages-tp4064274.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: seeing lots of "autowarming" messages in log during DIH indexing

2013-05-25 Thread Tim Vaillancourt

Interesting.

In your scenario would you use commit=true, or commit=false, and do you use 
auto soft/hard commits?

Secondly, if you did use auto-soft/hard commits, how would they affect this 
scenario? I'm guessing even with commit=false, the autoCommits would be 
triggered either by time or max docs, which opens a searcher anyways. A total 
guess though.

I'm interested in doing full-imports without committing/opening new searchers 
until it is complete.
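
Something like this is what I'm after (host/core are placeholders, and leaving
the autoCommit question aside):

# import without committing/opening a searcher
curl 'http://localhost:8983/solr/collection1/dataimport?command=full-import&commit=false'
# poll until the import reports idle
curl 'http://localhost:8983/solr/collection1/dataimport?command=status&wt=json'
# then open a searcher with a single explicit commit
curl 'http://localhost:8983/solr/collection1/update?commit=true'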

Cheers!

Tim

On 20/05/13 03:59 PM, shreejay wrote:

geeky2 wrote

you mean i would add this switch to my script that kicks of the
dataimport?

exmaple:


OUTPUT=$(curl -v
http://${SERVER}.intra.searshc.com:${PORT}/solrpartscat/${CORE}/dataimport
-F command=full-import -F clean=${CLEAN} -F commit=${COMMIT} -F
optimize=${OPTIMIZE} -F openSearcher=false)

Yes. Thats correct



geeky2 wrote

what needs to be done _AFTER_ the DIH finishes (if anything)?

eg, does this need to be turned back on after the DIH has finished?

Yes. You need to open the searcher to be able to search. Just run another
commit with openSearcher = true , once your indexing process finishes.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/seeing-lots-of-autowarming-messages-in-log-during-DIH-indexing-tp4064649p4064768.html
Sent from the Solr - User mailing list archive at Nabble.com.


SolrCloud Load Balancer "weight"

2013-06-03 Thread Tim Vaillancourt
Hey guys,

I have recently looked into an issue with my SolrCloud related to very high
load when performing a full-import with DIH.

While some work could be done to improve my queries, etc. in DIH, this led
me to a new feature idea in Solr: weighted internal load balancing.

Basically, I can think of two use cases where a weight on load
balancing could help:

1) My situation from above - I'm doing a huge import and want SolrCloud to
direct fewer queries to the node handling the DIH full-import, say weight
10/100 (10%) instead of 100/100.
2) Mixed hardware - Although I wouldn't recommend doing this, some people
may have mixed hardware, some capable of handling more or less traffic.

These weights wouldn't be expected to be exact, just a best-effort way to
generally influence load on nodes inside the cluster. They of course
would only matter on reads (/get, /select, etc).

A full blown approach would have weight awareness in the Zookeeper-aware
client implementation, and on inter-node replica requests.

Should I JIRA this? Thoughts?

Tim


Lucene/Solr Filesystem tunings

2013-06-04 Thread Tim Vaillancourt

Hey all,

Does anyone have any advice or special filesystem tuning to share for 
Lucene/Solr, and which file systems they like more?


Also, does Lucene/Solr care about access times if I turn them off (I 
think it doesn't care)?


A bit unrelated: What are people's opinions on reducing some consistency 
things like filesystem journaling, etc (ext2?) due to SolrCloud's 
additional HA with replicas? How about RAID 0 x 3 replicas or so?


Thanks!

Tim Vaillancourt


Re: Lucene/Solr Filesystem tunings

2013-06-07 Thread Tim Vaillancourt
I figured as much for atime, thanks Otis!
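
For the record, the mount change itself is trivial, something like this
(device and mount point are made up):

mount -o remount,noatime,nodiratime /srv/solr
# or persistently in /etc/fstab:
# /dev/sdb1  /srv/solr  ext4  defaults,noatime,nodiratime  0 0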

I haven't run benchmarks just yet, but I'll be sure to share whatever I
find. I plan to try ext4 vs xfs.

I am also curious what effect disabling journaling (ext2) would have,
relying on SolrCloud to manage 'consistency' over many instances vs FS
journaling. Anyone have opinions there? If I test I'll share the results.

Cheers,

Tim


On 4 June 2013 16:11, Otis Gospodnetic  wrote:

> Hi,
>
> You can use noatime, nodiratime, nothing in Solr depends on that as
> far as I know.  We tend to use ext4.  Some people love xfs.  Want to
> run some benchmarks and publish the results? :)
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Tue, Jun 4, 2013 at 6:48 PM, Tim Vaillancourt 
> wrote:
> > Hey all,
> >
> > Does anyone have any advice or special filesytem tuning to share for
> > Lucene/Solr, and which file systems they like more?
> >
> > Also, does Lucene/Solr care about access times if I turn them off (I
> think I
> > doesn't care)?
> >
> > A bit unrelated: What are people's opinions on reducing some consistency
> > things like filesystem journaling, etc (ext2?) due to SolrCloud's
> additional
> > HA with replicas? How about RAID 0 x 3 replicas or so?
> >
> > Thanks!
> >
> > Tim Vaillancourt
>


Re: Two instances of solr - the same datadir?

2013-06-07 Thread Tim Vaillancourt
If it makes you feel better, I also considered this approach when I was in
the same situation with a separate indexer and searcher on one physical
Linux machine.

My main concern was "re-using" the FS cache between both instances; if I
replicated to myself there would be two independent copies of the index,
FS-cached separately.

I like the suggestion of using autoCommit to reload the index. If I'm
reading that right, you'd set an autoCommit on 'zero docs changing', or
just 'every N seconds'? Did that work?
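
For what it's worth, I gather the 'empty commit' on the read-only instance is
just a call like this (the second instance's port is made up):

curl 'http://localhost:8984/solr/collection1/update?commit=true'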

Best of luck!

Tim


On 5 June 2013 10:19, Roman Chyla  wrote:

> So here it is for a record how I am solving it right now:
>
> Write-master is started with: -Dmontysolr.warming.enabled=false
> -Dmontysolr.write.master=true -Dmontysolr.read.master=
> http://localhost:5005
> Read-master is started with: -Dmontysolr.warming.enabled=true
> -Dmontysolr.write.master=false
>
>
> solrconfig.xml changes:
>
> 1. all index changing components have this bit,
> enable="${montysolr.master:true}" - ie.
>
>   enable="${montysolr.master:true}">
>
> 2. for cache warming de/activation
>
>class="solr.QuerySenderListener"
>   enable="${montysolr.enable.warming:true}">...
>
> 3. to trigger refresh of the read-only-master (from write-master):
>
>class="solr.RunExecutableListener"
>   enable="${montysolr.master:true}">
>   curl
>   .
>   false
>${montysolr.read.master:http://localhost
>
> }/solr/admin/cores?wt=json&action=RELOAD&core=collection1
> 
>
> This works, I still don't like the reload of the whole core, but it seems
> like the easiest thing to do now.
>
> -- roman
>
>
> On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla 
> wrote:
>
> > Hi Peter,
> >
> > Thank you, I am glad to read that this usecase is not alien.
> >
> > I'd like to make the second instance (searcher) completely read-only, so
> I
> > have disabled all the components that can write.
> >
> > (being lazy ;)) I'll probably use
> > http://wiki.apache.org/solr/CollectionDistribution to call the curl
> after
> > commit, or write some IndexReaderFactory that checks for changes
> >
> > The problem with calling the 'core reload' - is that it seems lots of
> work
> > for just opening a new searcher, eeekkk...somewhere I read that it is
> cheap
> > to reload a core, but re-opening the index searches must be definitely
> > cheaper...
> >
> > roman
> >
> >
> > On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge  >wrote:
> >
> >> Hi,
> >> We use this very same scenario to great effect - 2 instances using the
> >> same
> >> dataDir with many cores - 1 is a writer (no caching), the other is a
> >> searcher (lots of caching).
> >> To get the searcher to see the index changes from the writer, you need
> the
> >> searcher to do an empty commit - i.e. you invoke a commit with 0
> >> documents.
> >> This will refresh the caches (including autowarming), [re]build the
> >> relevant searchers etc. and make any index changes visible to the RO
> >> instance.
> >> Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
> >> ensure the two instances don't try to commit at the same time.
> >> There are several ways to trigger a commit:
> >> Call commit() periodically within your own code.
> >> Use autoCommit in solrconfig.xml.
> >> Use an RPC/IPC mechanism between the 2 instance processes to tell the
> >> searcher the index has changed, then call commit when called (more
> complex
> >> coding, but good if the index changes on an ad-hoc basis).
> >> Note, doing things this way isn't really suitable for an NRT
> environment.
> >>
> >> HTH,
> >> Peter
> >>
> >>
> >>
> >> On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla 
> >> wrote:
> >>
> >> > Replication is fine, I am going to use it, but I wanted it for
> instances
> >> > *distributed* across several (physical) machines - but here I have one
> >> > physical machine, it has many cores. I want to run 2 instances of solr
> >> > because I think it has these benefits:
> >> >
> >> > 1) I can give less RAM to the writer (4GB), and use more RAM for the
> >> > searcher (28GB)
> >> > 2) I can deactivate warming for the writer and keep it for the
> searcher
> >> > (this considerably speeds up indexing - each time we commit, the
> server
> >> is
> >> > rebuilding a citation network of 80M edges)
> >> > 3) saving disk space and better OS caching (OS should be able to use
> >> more
> >> > RAM for the caching, which should result in faster operations - the
> two
> >> > processes are accessing the same index)
> >> >
> >> > Maybe I should just forget it and go with the replication, but it
> >> doesn't
> >> > 'feel right' IFF it is on the same physical machine. And Lucene
> >> > specifically has a method for discovering changes and re-opening the
> >> index
> >> > (DirectoryReader.openIfChanged)
> >> >
> >> > Am I not seeing something?
> >> >
> >> > roman
> >> >
> >> >
> >> >
> >> > On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman <
> >> > jhell...@innoventsolutions.com> wrote:
> >> >
> >> > > Roman,
> >

Re: Dataless nodes in SolrCloud?

2013-06-10 Thread Tim Vaillancourt
To answer Otis' question of whether or not this would be useful, the
trouble is, I don't know! :) It very well could be useful for my use case.

Is there any way to determine the impact of result merging (time spent?
Etc?) aside from just 'trying it'?

Cheers,

Tim


On 10 June 2013 14:48, Otis Gospodnetic  wrote:

> I think it would be useful.  I know people using ElasticSearch use it
> relatively often.
>
> >  Is aggregation expensive enough to warrant a separate box?
>
> I think it can get expensive if X in rows=X is highish.  We've seen
> this reported here on the Solr ML before
> So to make sorting/merging of N result set from N "data nodes" on this
> "aggregator node" you may want to get all the CPU you can get and not
> have the CPU simultaneously also try to handle incoming queries.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Mon, Jun 10, 2013 at 5:32 AM, Shalin Shekhar Mangar
>  wrote:
> > No, there's no such notion in SolrCloud. Each node that is part of a
> > collection/shard is a replica and will handle indexing/querying. Even
> > though you can send a request to a node containing a different
> collection,
> > the request would just be forwarded to the right node and will be
> executed
> > there.
> >
> > That being said, do people find such a feature useful? Is aggregation
> > expensive enough to warrant a separate box? In a distributed search, the
> > local index is used. One'd would just be adding a couple of extra network
> > requests if you don't have a local index.
> >
> >
> > On Sun, Jun 9, 2013 at 11:18 AM, Otis Gospodnetic <
> > otis.gospodne...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> Is there a notion of a data-node vs. non-data node in SolrCloud?
> >> Something a la
> http://www.elasticsearch.org/guide/reference/modules/node/
> >>
> >>
> >> Thanks,
> >> Otis
> >> Solr & ElasticSearch Support
> >> http://sematext.com/
> >>
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
>