Re: commit in solr4 takes a longer time

2013-05-03 Thread Sandeep Mestry
That's not ideal.
Can you post solrconfig.xml?
On 3 May 2013 07:41, "vicky desai"  wrote:

> Hi sandeep,
>
> I made the changes you mentioned and tested again for the same set of
> docs, but unfortunately the commit time increased.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/commit-in-solr4-takes-a-longer-time-tp4060396p4060622.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: commit in solr4 takes a longer time

2013-05-03 Thread vicky desai
My solrconfig.xml is as follows (element names partly reconstructed; the
list archive stripped the XML tags and kept only the values):

<config>
  <luceneMatchVersion>LUCENE_40</luceneMatchVersion>

  <indexConfig>
    <maxFieldLength>2147483647</maxFieldLength>
    <lockType>simple</lockType>
    <unlockOnStartup>true</unlockOnStartup>
  </indexConfig>

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>500</maxDocs>
      <maxTime>1000</maxTime>
    </autoCommit>
  </updateHandler>

  <!-- remaining values whose enclosing tags were lost: 5, 30, false
       (likely merge/index settings) and true, *:* (likely a searcher
       warming query in the query section) -->
</config>


--
View this message in context: 
http://lucene.472066.n3.nabble.com/commit-in-solr4-takes-a-longer-time-tp4060396p4060628.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: What Happens to Consistency if I kill a Leader and Startup it again?

2013-05-03 Thread Furkan KAMACI
Shawn, thanks for the detailed answer; it explains everything. I think that
there is no problem. I will use 4.3 when it is available, and if I see a
situation like that I will report it.

2013/5/3 Shawn Heisey 

> On 5/2/2013 2:19 PM, Furkan KAMACI wrote:
> > I see that at my admin page:
> >
> > Replication (Slave)  Version  GenSize
> > Master:  1367307652512   82  778.04 MB
> > Slave:   1367307658862   82  781.05 MB
> >
> > and I started to figure about it so that's why I asked this question.
>
> As we've been trying to tell you, the sizes can (and will) be different
> between replicas on SolrCloud.  Also, if you're not running a recent
> release candidate of 4.3, then the version numbers on the replication
> screen are misleading.  See SOLR-4661 for more details.
>
> Your example of version numbers like 100, 90, and 95 wouldn't actually
> happen, because the version number is based on the current time in
> milliseconds since 1970-01-01 00:00:00 UTC.  If you index after killing
> the leader, the new leader's version number will be higher than the
> offline replica.
>
> If you can find actual proof of a problem with index updates related to
> killing the leader, then we can take the bug report and work on fixing
> it.  Here's how you would go about finding proof.  It would be easiest
> to have one shard, but if you want to make sure it's OK with multiple
> shards, you would have to kill all the leaders.
>
> * Start with a functional collection with two replicas.
> * Index a document with a recognizable ID like "A".
> * Make sure you can find document A.
> * Kill the leader replica, let's say it was replica1.
> * Make sure replica2 becomes leader.
> * Make sure you can find document A.
> * Index document B.
> * Start replica1, wait for it to turn green.
> * Make sure you can still find document B.
> * Kill the leader again, this time it's replica2.
> * Make sure you can still find document B.
>
> To my knowledge, nobody has reported a real problem with proof.  I would
> imagine that more than one person has done testing like this to make
> sure that SolrCloud is reliable.
>
> Thanks,
> Shawn
>
>
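
A footnote on the steps above: they can be scripted with SolrJ. A minimal
sketch, assuming a 4.x CloudSolrServer, a collection named "collection1",
and a reachable ZooKeeper ensemble (the kill/restart steps stay manual):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class LeaderKillCheck {
  public static void main(String[] args) throws Exception {
    // CloudSolrServer watches cluster state in ZooKeeper, so queries and
    // updates keep working while individual replicas are killed/restarted.
    CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.setDefaultCollection("collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "A");
    server.add(doc);
    server.commit();

    // Re-run this check after each kill/restart step in the procedure.
    long found = server.query(new SolrQuery("id:A")).getResults().getNumFound();
    System.out.println("found A: " + found);
    server.shutdown();
  }
}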


Re: Does Near Real Time get not supported at SolrCloud?

2013-05-03 Thread Furkan KAMACI
Do soft commits get distributed to the nodes of a SolrCloud cluster?

2013/5/3 Otis Gospodnetic 

> NRT works with SolrCloud.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
>
> On May 2, 2013 5:34 AM, "Furkan KAMACI"  wrote:
> >
> > Is Near Real Time not supported in SolrCloud?
> >
> > I mean, when a soft commit occurs on a leader, I think it isn't
> > distributed to the replicas (since it is not on disk; are the in-RAM
> > index changes distributed to the replicas too?), so what happens when a
> > search query comes in?
>


Re: Rearranging Search Results of a Search?

2013-05-03 Thread Furkan KAMACI
I think this looks like what I'm searching for:
https://issues.apache.org/jira/browse/SOLR-4465

How about a Lucene post filter? Can it help for my purpose?

2013/5/3 Otis Gospodnetic 

> Hi,
>
> You should use search more often :)
>
> http://search-lucene.com/?q=scriptable+collector&sort=newestOnTop&fc_project=Solr&fc_type=issue
>
> Coincidentally, what you see there happens to be a good example of a
> Solr component that does something behind the scenes to deliver those
> search results even though my original query was bad.  Kind of
> similar to what you are after.
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Thu, May 2, 2013 at 4:47 PM, Furkan KAMACI 
> wrote:
> > I know that I can use query-time boosting for a field or a search term,
> > settings in solrconfig.xml, and the query elevation component to arrange
> > the results of a search. However, after I get the top documents, how can I
> > change the order of the results? Is Lucene's post filter meant for that?
>


Re: Delete from Solr Cloud 4.0 index..

2013-05-03 Thread Annette Newton
Thanks Shawn.

I have played around with soft commits before and didn't see any
improvement, but with the current load testing I am doing I will give it
another go.

I have researched docValues and came across the fact that it would increase
the index size.  With the upgrade to 4.2.1 the index size has reduced by
approx 33% which is pleasing and I don't really want to lose that saving.

We do use the facet.method=enum approach, which works really well, but I
will verify that we are using it in every instance; we have numerous
developers working on the product and maybe one or two have slipped
through.

Right from the start I upped zkClientTimeout to 30 seconds, as I wanted to
give extra time for any network blips that we experience on AWS.  We only
seem to drop communication on a full garbage collection though.

I am coming to the conclusion that we need to have more shards to cope with
the writes, so I will play around with adding more shards and see how I go.


I appreciate you having a look over our setup and the advice.

Thanks again.

Netty.


On 2 May 2013 23:17, Shawn Heisey  wrote:

> On 5/2/2013 4:24 AM, Annette Newton wrote:
> > Hi Shawn,
> >
> > Thanks so much for your response.  We basically are very write intensive
> > and write throughput is pretty essential to our product.  Reads are
> > sporadic and actually is functioning really well.
> >
> > We write on average (at the moment) 8-12 batches of 35 documents per
> > minute.  But we really will be looking to write more in the future, so
> need
> > to work out scaling of solr and how to cope with more volume.
> >
> > Schema (I have changed the names) :
> >
> > http://pastebin.com/x1ry7ieW
> >
> > Config:
> >
> > http://pastebin.com/pqjTCa7L
>
> This is very clean.  There's probably more you could remove/comment, but
> generally speaking I couldn't find any glaring issues.  In particular,
> you have disabled autowarming, which is a major contributor to commit
> speed problems.
>
> The first thing I think I'd try is increasing zkClientTimeout to 30 or
> 60 seconds.  You can use the startup commandline or solr.xml, I would
> probably use the latter.  Here's a solr.xml fragment that uses a system
> property or a 15 second default:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <solr persistent="true">
>   <cores adminPath="/admin/cores"
>    zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}"
>    hostContext="solr">
>
> General thoughts, these changes might not help this particular issue:
> You've got autoCommit with openSearcher=true.  This is a hard commit.
> If it were me, I would set that up with openSearcher=false and either do
> explicit soft commits from my application or set up autoSoftCommit with
> a shorter timeframe than autoCommit.
>
> This might simply be a scaling issue, where you'll need to spread the
> load wider than four shards.  I know that there are financial
> considerations with that, and they might not be small, so let's leave
> that alone for now.
>
> The memory problems might be a symptom/cause of the scaling issue I just
> mentioned.  You said you're using facets, which can be a real memory hog
> even with only a few of them.  Have you tried facet.method=enum to see
> how it performs?  You'd need to switch to it exclusively, never go with
> the default of fc.  You could put that in the defaults or invariants
> section of your request handler(s).
>
> Another way to reduce memory usage for facets is to use disk-based
> docValues on version 4.2 or later for the facet fields, but this will
> increase your index size, and your index is already quite large.
> Depending on your index contents, the increase may be small or large.
>
> Something to just mention: It looks like your solrconfig.xml has
> hard-coded absolute paths for dataDir and updateLog.  This is fine if
> you'll only ever have one core/collection on each server, but it'll be a
> disaster if you have multiples.  I could be wrong about how these get
> interpreted in SolrCloud -- they might actually be relative despite
> starting with a slash.
>
> Thanks,
> Shawn
>
>
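
For what it's worth, a minimal SolrJ sketch of the pattern Shawn suggests,
with hard commits left to autoCommit (openSearcher=false) in solrconfig.xml
and visibility handled by explicit soft commits from the client; the URL
and core name are assumptions:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SoftCommitSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    server.add(doc);
    // commit(waitFlush, waitSearcher, softCommit): an explicit soft commit
    // makes changes visible; durable hard commits happen via autoCommit.
    server.commit(true, true, true);
    server.shutdown();
  }
}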


-- 

Annette Newton

Database Administrator

ServiceTick Ltd



T:+44(0)1603 618326



Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ

www.servicetick.com

*www.sessioncam.com*

-- 
*This message is confidential and is intended to be read solely by the 
addressee. The contents should not be disclosed to any other person or 
copies taken unless authorised to do so. If you are not the intended 
recipient, please notify the sender and permanently delete this message. As 
Internet communications are not secure ServiceTick accepts neither legal 
responsibility for the contents of this message nor responsibility for any 
change made to this message after it was forwarded by the original author.*


Re: Delete from Solr Cloud 4.0 index..

2013-05-03 Thread Annette Newton
One question Shawn - did you ever get any costings around Zing? Did you
trial it?

Thanks.


On 3 May 2013 10:03, Annette Newton  wrote:

> Thanks Shawn.
>
> I have played around with soft commits before and didn't see any
> improvement, but with the current load testing I am doing I will give it
> another go.
>
> I have researched docValues and came across the fact that it would
> increase the index size.  With the upgrade to 4.2.1 the index size has
> reduced by approx 33% which is pleasing and I don't really want to lose
> that saving.
>
> We do use the facet.method=enum approach, which works really well, but I
> will verify that we are using it in every instance; we have numerous
> developers working on the product and maybe one or two have slipped
> through.
>
> Right from the start I upped zkClientTimeout to 30 seconds, as I wanted to
> give extra time for any network blips that we experience on AWS.  We only
> seem to drop communication on a full garbage collection though.
>
> I am coming to the conclusion that we need to have more shards to cope
> with the writes, so I will play around with adding more shards and see how
> I go.
>
> I appreciate you having a look over our setup and the advice.
>
> Thanks again.
>
> Netty.
>
>
> On 2 May 2013 23:17, Shawn Heisey  wrote:
>
>> On 5/2/2013 4:24 AM, Annette Newton wrote:
>> > Hi Shawn,
>> >
>> > Thanks so much for your response.  We basically are very write intensive
>> > and write throughput is pretty essential to our product.  Reads are
>> > sporadic and actually is functioning really well.
>> >
>> > We write on average (at the moment) 8-12 batches of 35 documents per
>> > minute.  But we really will be looking to write more in the future, so
>> need
>> > to work out scaling of solr and how to cope with more volume.
>> >
>> > Schema (I have changed the names) :
>> >
>> > http://pastebin.com/x1ry7ieW
>> >
>> > Config:
>> >
>> > http://pastebin.com/pqjTCa7L
>>
>> This is very clean.  There's probably more you could remove/comment, but
>> generally speaking I couldn't find any glaring issues.  In particular,
>> you have disabled autowarming, which is a major contributor to commit
>> speed problems.
>>
>> The first thing I think I'd try is increasing zkClientTimeout to 30 or
>> 60 seconds.  You can use the startup commandline or solr.xml, I would
>> probably use the latter.  Here's a solr.xml fragment that uses a system
>> property or a 15 second default:
>>
>> <?xml version="1.0" encoding="UTF-8" ?>
>> <solr persistent="true">
>>   <cores adminPath="/admin/cores"
>>    zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}"
>>    hostContext="solr">
>>
>> General thoughts, these changes might not help this particular issue:
>> You've got autoCommit with openSearcher=true.  This is a hard commit.
>> If it were me, I would set that up with openSearcher=false and either do
>> explicit soft commits from my application or set up autoSoftCommit with
>> a shorter timeframe than autoCommit.
>>
>> This might simply be a scaling issue, where you'll need to spread the
>> load wider than four shards.  I know that there are financial
>> considerations with that, and they might not be small, so let's leave
>> that alone for now.
>>
>> The memory problems might be a symptom/cause of the scaling issue I just
>> mentioned.  You said you're using facets, which can be a real memory hog
>> even with only a few of them.  Have you tried facet.method=enum to see
>> how it performs?  You'd need to switch to it exclusively, never go with
>> the default of fc.  You could put that in the defaults or invariants
>> section of your request handler(s).
>>
>> Another way to reduce memory usage for facets is to use disk-based
>> docValues on version 4.2 or later for the facet fields, but this will
>> increase your index size, and your index is already quite large.
>> Depending on your index contents, the increase may be small or large.
>>
>> Something to just mention: It looks like your solrconfig.xml has
>> hard-coded absolute paths for dataDir and updateLog.  This is fine if
>> you'll only ever have one core/collection on each server, but it'll be a
>> disaster if you have multiples.  I could be wrong about how these get
>> interpreted in SolrCloud -- they might actually be relative despite
>> starting with a slash.
>>
>> Thanks,
>> Shawn
>>
>>
>
>
> --
>
> Annette Newton
>
> Database Administrator
>
> ServiceTick Ltd
>
>
>
> T:+44(0)1603 618326
>
>
>
> Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ
>
> www.servicetick.com
>
> *www.sessioncam.com*
>



-- 

Annette Newton

Database Administrator

ServiceTick Ltd



T:+44(0)1603 618326



Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ

www.servicetick.com

*www.sessioncam.com*


Performance considerations when using distributed indexing + loadbalancing with Solr cloud

2013-05-03 Thread Edd Grant
Hi all,

I have been playing with Solr Cloud recently and am enjoying the
distributed indexing capability.

At the moment my SolrCloud consists of 2 leaders and 2 replicas which are
fronted by an HAProxy instance. I want to maximise performance for indexing
and it occurred to me that the model I use for loadbalancing my indexing
requests may impact performance. i.e. am I likely to see better indexing
performance if I stick certain groups of requests to certain nodes vs
simply using a round robin approach?

I'll be doing some empirical testing to try and figure this out, but was
wondering if there's any general guidance here? Or if anyone has any
experience of particularly good/bad configurations?

Many thanks,

Edd

-- 
Web: http://www.eddgrant.com
Email: e...@eddgrant.com
Mobile: +44 (0) 7861 394 543


Good Desktop Search?

2013-05-03 Thread Savia Beson
Hi everybody,
just a simple question: is there any Solr/Lucene-based desktop search
project around that someone might recommend?
I am looking for something for personal use that is kind of mature, at
least stable, runs on Java, and does not require admin rights to install.
Nothing too fancy.

Thanks/S.





Re: Performance considerations when using distributed indexing + loadbalancing with Solr cloud

2013-05-03 Thread Furkan KAMACI
Do you use CloudSolrServer when you push documents into SolrCloud to be
indexed?

2013/5/3 Edd Grant 

> Hi all,
>
> I have been playing with Solr Cloud recently and am enjoying the
> distributed indexing capability.
>
> At the moment my SolrCloud consists of 2 leaders and 2 replicas which are
> fronted by an HAProxy instance. I want to maximise performance for indexing
> and it occurred to me that the model I use for loadbalancing my indexing
> requests may impact performance. i.e. am I likely to see better indexing
> performance if I stick certain groups of requests to certain nodes vs
> simply using a round robin approach?
>
> I'll be doing some empirical testing to try and figure this out, but was
> wondering if there's any general guidance here? Or if anyone has any
> experience of particularly good/bad configurations?
>
> Many thanks,
>
> Edd
>
> --
> Web: http://www.eddgrant.com
> Email: e...@eddgrant.com
> Mobile: +44 (0) 7861 394 543
>


Re: Good Desktop Search?

2013-05-03 Thread Paul Libbrecht
Savia,

maybe not very mature yet, but someone on java-us...@lucene.apache.org 
announced such a tool the other day.
I'm copying it below.
I do not know of many otherwise.

paul

> Hi everybody,
> just a simple question: is there any Solr/Lucene-based desktop search
> project around that someone might recommend?
> I am looking for something for personal use that is kind of mature, at
> least stable, runs on Java, and does not require admin rights to install.
> Nothing too fancy.



Begin forwarded message:

> From: Mirko Sertic 
> Date: 29 avril 2013 21:20:19 HAEC
> To: java-u...@lucene.apache.org
> Subject: Lucene Desktop Search Engine with JavaFX/Tika/Filesystem 
> Crawler/HTML5
> Reply-To: java-u...@lucene.apache.org
> 
> Hi@all
> 
> Lucene rocks, and based on some JavaFX/HTML5 hybrids I built a small Java 
> search engine for your desktop!
> 
> The prototype and the result can be seen here:
> 
> http://www.mirkosertic.de/doku.php/javastuff/fxdesktopsearch
> 
> I am using a multithreaded pipes and filters architecture with Tika as the 
> content extraction framework and of course Lucene as the fulltext engine. It 
> really rocks, I can search thousands of documents with syntax highlighting 
> within a few milliseconds. It also supports MoreLikeThis queries showing 
> document similarities.
> 
> Thanks @all working on Lucene!
> 
> I am planning future releases of the desktop search engine with faceted 
> search based on Tika-extracted document metadata. Also NLP with named entity 
> extraction might be a use case, so everyone who is willing to contribute is 
> very welcome. Sourcecode is OSS and hosted on Google Code here:
> 
> http://code.google.com/p/freedesktopsearch/
> 
> Regards
> Mirko
> 
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
> 


Duplicated Documents Across shards

2013-05-03 Thread Iker Mtnz. Apellaniz
Hi,
  We have currently a solrCloud implementation running 5 shards in 3
physical machines, so the first machine will have the shard number 1, the
second machine shards 2 & 4, and the third shards 3 & 5. We noticed that,
while querying, numFound decreased when we increased the start param.
  After some investigation we found that the documents in shards 2 to 5
were being counted twice. Querying to shard 2 will give you back the
results for shard 2 & 4, and the same thing for shards 3 & 5. Our guess is
that the physical index for both shard 2&4 is shared, so the shards don't
know which part of it is for each one.
  The uniqueKey is correctly defined, and we have tried using shard prefix
(shard1!docID).

  Is there any way to solve this problem when a unique physical machine
shares shards?
  Is it a "real" problem os it just affects facet & numResults?

Thanks
   Iker

-- 
/** @author imartinez*/
Person me = *new* Developer();
me.setName(*"Iker Mtz de Apellaniz Anzuola"*);
me.setTwit("@mitxino77 ");
me.setLocations({"St Cugat, Barcelona", "Kanpezu, Euskadi", "*, World"]});
me.setSkills({*SoftwareDeveloper, Curious, AmateurCook*});
me.setWebs({*urbasaabentura.com, ikertxef.com*});
*return* me;


Re: Good Desktop Search?

2013-05-03 Thread Savia Beson
Thanks Paul,  I missed that one. 


On May 3, 2013, at 2:27 PM, Paul Libbrecht  wrote:

> Savia,
> 
> maybe not very mature yet, but someone on java-us...@lucene.apache.org 
> announced such a tool the other day.
> I'm copying it below.
> I do not know of many otherwise.
> 
> paul
> 
>> Hi everybody,
>> just a simple question: is there any Solr/Lucene-based desktop search
>> project around that someone might recommend?
>> I am looking for something for personal use that is kind of mature, at
>> least stable, runs on Java, and does not require admin rights to install.
>> Nothing too fancy.
> 
> 
> 
> Begin forwarded message:
> 
>> From: Mirko Sertic 
>> Date: 29 avril 2013 21:20:19 HAEC
>> To: java-u...@lucene.apache.org
>> Subject: Lucene Desktop Search Engine with JavaFX/Tika/Filesystem 
>> Crawler/HTML5
>> Reply-To: java-u...@lucene.apache.org
>> 
>> Hi@all
>> 
>> Lucene rocks, and based on some JavaFX/HTML5 hybrids I built a small Java 
>> search engine for your desktop!
>> 
>> The prototype and the result can be seen here:
>> 
>> http://www.mirkosertic.de/doku.php/javastuff/fxdesktopsearch
>> 
>> I am using a multithreaded pipes and filters architecture with Tika as the 
>> content extraction framework and of course Lucene as the fulltext engine. It 
>> really rocks, I can search thousands of documents with syntax highlighting 
>> within a few milliseconds. It also supports MoreLikeThis queries showing 
>> document similarities.
>> 
>> Thanks @all working on Lucene!
>> 
>> I am planning future releases of the desktop search engine with faceted 
>> search based on Tika-extracted document metadata. Also NLP with named entity 
>> extraction might be a use case, so everyone who is willing to contribute is 
>> very welcome. Sourcecode is OSS and hosted on Google Code here:
>> 
>> http://code.google.com/p/freedesktopsearch/
>> 
>> Regards
>> Mirko
>> 
>> -
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>> 



Solr 4 reload failed core

2013-05-03 Thread Peter Kirk
Hi

I have a multi-core installation, with 2 cores. Sometimes, when Solr starts up, 
one of the cores fails (due to an extension to Solr I have, which is waiting on 
an external service which has yet to initialise).

In previous versions of Solr, I could subsequently issue a RELOAD to this core, 
even though it was in a "fail" state, and it would reload and start up.
Now it seems with Solr 4, I cannot issue a RELOAD to a core which has failed.

Is this the case?

How can I get Solr to start a core which failed on initial start up?

Thanks,
Peter






Re: Performance considerations when using distributed indexing + loadbalancing with Solr cloud

2013-05-03 Thread Edd Grant
Hi,

No we're actually POSTing them over plain old http. Our "feeder" process
simply points at the HAProxy box and posts merrily away.

Cheers,

Edd


On 3 May 2013 13:17, Furkan KAMACI  wrote:

> Do you use CloudSolrServer when you push documents into SolrCloud to be
> indexed?
>
> 2013/5/3 Edd Grant 
>
> > Hi all,
> >
> > I have been playing with Solr Cloud recently and am enjoying the
> > distributed indexing capability.
> >
> > At the moment my SolrCloud consists of 2 leaders and 2 replicas which are
> > fronted by an HAProxy instance. I want to maximise performance for
> indexing
> > and it occurred to me that the model I use for loadbalancing my indexing
> > requests may impact performance. i.e. am I likely to see better indexing
> > performance if I stick certain groups of requests to certain nodes vs
> > simply using a round robin approach?
> >
> > I'll be doing some empirical testing to try and figure this out, but was
> > wondering if there's any general guidance here? Or if anyone has any
> > experience of particularly good/bad configurations?
> >
> > Many thanks,
> >
> > Edd
> >
> > --
> > Web: http://www.eddgrant.com
> > Email: e...@eddgrant.com
> > Mobile: +44 (0) 7861 394 543
> >
>



-- 
Web: http://www.eddgrant.com
Email: e...@eddgrant.com
Mobile: +44 (0) 7861 394 543


Re: Performance considerations when using distributed indexing + loadbalancing with Solr cloud

2013-05-03 Thread Furkan KAMACI
If you index with CloudSolrServer, your client will learn from ZooKeeper
where the data should go and send it straight to the right shard leader.
However, if you use some other process, the data will go to an arbitrary
node and then be routed to the right place within the cluster. This extra
routing hop within the cluster may cause unnecessary network traffic and
add latency to indexing as well.

2013/5/3 Edd Grant 

> Hi,
>
> No we're actually POSTing them over plain old http. Our "feeder" process
> simply points at the HAProxy box and posts merrily away.
>
> Cheers,
>
> Edd
>
>
> On 3 May 2013 13:17, Furkan KAMACI  wrote:
>
> > Do you use CloudSolrServer when you push documents into SolrCloud to be
> > indexed?
> >
> > 2013/5/3 Edd Grant 
> >
> > > Hi all,
> > >
> > > I have been playing with Solr Cloud recently and am enjoying the
> > > distributed indexing capability.
> > >
> > > At the moment my SolrCloud consists of 2 leaders and 2 replicas which
> are
> > > fronted by an HAProxy instance. I want to maximise performance for
> > indexing
> > > and it occurred to me that the model I use for loadbalancing my
> indexing
> > > requests may impact performance. i.e. am I likely to see better
> indexing
> > > performance if I stick certain groups of requests to certain nodes vs
> > > simply using a round robin approach?
> > >
> > > I'll be doing some empirical testing to try and figure this out, but was
> > > wondering if there's any general guidance here? Or if anyone has any
> > > experience of particularly good/bad configurations?
> > >
> > > Many thanks,
> > >
> > > Edd
> > >
> > > --
> > > Web: http://www.eddgrant.com
> > > Email: e...@eddgrant.com
> > > Mobile: +44 (0) 7861 394 543
> > >
> >
>
>
>
> --
> Web: http://www.eddgrant.com
> Email: e...@eddgrant.com
> Mobile: +44 (0) 7861 394 543
>


Re: commit in solr4 takes a longer time

2013-05-03 Thread vicky desai
Hi All,

Setting the openSearcher flag to false worked and gave me a visible
improvement in commit time. One thing to note is that when using the
SolrJ client we have to call server.commit(false, false), which I was
doing incorrectly and hence was not able to see the improvement earlier.
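
For reference, the exact call (SolrJ 4.x), where server is an HttpSolrServer
or CloudSolrServer:

// commit(waitFlush, waitSearcher): with both set to false, the call
// returns immediately instead of blocking until a new searcher is
// registered.
server.commit(false, false);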

Thanks everyone



--
View this message in context: 
http://lucene.472066.n3.nabble.com/commit-in-solr4-takes-a-longer-time-tp4060396p4060688.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Delete from Solr Cloud 4.0 index..

2013-05-03 Thread Shawn Heisey
On 5/3/2013 3:22 AM, Annette Newton wrote:
> One question Shawn - did you ever get any costings around Zing? Did you
> trial it?

I never did do a trial.  I asked them for a cost and they didn't have an
immediate answer, wanted to do a phone call and get a lot of information
about my setup.  The price apparently has a lot of variance based on the
specific environment, so I didn't pursue it, figuring that the cost
would be higher than my superiors are willing to pay.

The only information I could find about the cost of Zing was a very
recent Register article that had this to say:

"Azul is similarly cagey about what a supported version of the Zing JVM
costs, and only says that Zing costs around what a supported version of
an Oracle, IBM, or Red Hat JVM will run enterprises and that it has an
annual subscription model for Zing pricing. You can't easily get pricing
for Oracle, IBM, or Red Hat JVMs, of course, so the comparison is
accurate but perfectly useless."

http://www.theregister.co.uk/2013/04/08/azul_systems_zing_lmax_exchange/

Thanks,
Shawn



Re: Performance considerations when using distributed indexing + loadbalancing with Solr cloud

2013-05-03 Thread Edd Grant
Thanks, that's exactly what I was worried about. If I take your suggested
approach of using CloudSolrServer and the feeder learns which shard leader
to target, then if the shard leader goes down midway through indexing then
I've lost my ability to index. Whereas if I take the route of making all
updates via the HAProxy instance then I've got HA but at the cost of
performance.

This has me wondering if it might be feasible to address each shard with a
VIP? Then if the leader of the shard goes down and a replica is elected as
the leader it could also take the VIP, so in essence we'd always be sending
messages to the leader. Anyone tried anything like this?

Cheers,

Edd


On 3 May 2013 15:22, Furkan KAMACI  wrote:

> If you index with CloudSolrServer, your client will learn from ZooKeeper
> where the data should go and send it straight to the right shard leader.
> However, if you use some other process, the data will go to an arbitrary
> node and then be routed to the right place within the cluster. This extra
> routing hop within the cluster may cause unnecessary network traffic and
> add latency to indexing as well.
>
> 2013/5/3 Edd Grant 
>
> > Hi,
> >
> > No we're actually POSTing them over plain old http. Our "feeder" process
> > simply points at the HAProxy box and posts merrily away.
> >
> > Cheers,
> >
> > Edd
> >
> >
> > On 3 May 2013 13:17, Furkan KAMACI  wrote:
> >
> > > Do you use CloudSolrServer when you push documents into SolrCloud to be
> > > indexed?
> > >
> > > 2013/5/3 Edd Grant 
> > >
> > > > Hi all,
> > > >
> > > > I have been playing with Solr Cloud recently and am enjoying the
> > > > distributed indexing capability.
> > > >
> > > > At the moment my SolrCloud consists of 2 leaders and 2 replicas which
> > are
> > > > fronted by an HAProxy instance. I want to maximise performance for
> > > indexing
> > > > and it occurred to me that the model I use for loadbalancing my
> > indexing
> > > > requests may impact performance. i.e. am I likely to see better
> > indexing
> > > > performance if I stick certain groups of requests to certain nodes vs
> > > > simply using a round robin approach?
> > > >
> > > > I'll be doing some empirical testing to try and figure this out but
> > > > was wondering if there's any general guidance here? Or if anyone has
> > > > any experience of particularly good/bad configurations?
> > > >
> > > > Many thanks,
> > > >
> > > > Edd
> > > >
> > > > --
> > > > Web: http://www.eddgrant.com
> > > > Email: e...@eddgrant.com
> > > > Mobile: +44 (0) 7861 394 543
> > > >
> > >
> >
> >
> >
> > --
> > Web: http://www.eddgrant.com
> > Email: e...@eddgrant.com
> > Mobile: +44 (0) 7861 394 543
> >
>



-- 
Web: http://www.eddgrant.com
Email: e...@eddgrant.com
Mobile: +44 (0) 7861 394 543


Re: Performance considerations when using distributed indexing + loadbalancing with Solr cloud

2013-05-03 Thread Shawn Heisey
On 5/3/2013 8:35 AM, Edd Grant wrote:
> Thanks, that's exactly what I was worried about. If I take your suggested
> approach of using CloudSolrServer and the feeder learns which shard leader
> to target, then if the shard leader goes down midway through indexing then
> I've lost my ability to index. Whereas if I take the route of making all
> updates via the HAProxy instance then I've got HA but at the cost of
> performance.
> 
> This has me wondering if it might be feasible to address each shard with a
> VIP? Then if the leader of the shard goes down and a replica is elected as
> the leader it could also take the VIP, so in essence we'd always be sending
> messages to the leader. Anyone tried anything like this?

CloudSolrServer is part of the SolrJ (Java) API.  It incorporates a
zookeeper client.  To initialize it, you don't tell it about your Solr
servers, you give it the same zookeeper host information that you give
to Solr when starting in cloud mode.  It always knows the current state
of the cluster, so if you have a failure, it adjusts so that your
queries and updates don't fail.  That also means that it will know when
servers are added to or removed from the cloud.

http://lucene.apache.org/solr/4_2_1/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.html

Thanks,
Shawn



Re: Performance considerations when using distributed indexing + loadbalancing with Solr cloud

2013-05-03 Thread Edd Grant
Aah I see - very useful. Thanks!


On 3 May 2013 15:49, Shawn Heisey  wrote:

> On 5/3/2013 8:35 AM, Edd Grant wrote:
> > Thanks, that's exactly what I was worried about. If I take your suggested
> > approach of using CloudSolrServer and the feeder learns which shard
> leader
> > to target, then if the shard leader goes down midway through indexing
> then
> > I've lost my ability to index. Whereas if I take the route of making all
> > updates via the HAProxy instance then I've got HA but at the cost of
> > performance.
> >
> > This has me wondering if it might be feasible to address each shard with
> a
> > VIP? Then if the leader of the shard goes down and a replica is elected
> as
> > the leader it could also take the VIP, so in essence we'd always be
> sending
> > messages to the leader. Anyone tried anything like this?
>
> CloudSolrServer is part of the SolrJ (Java) API.  It incorporates a
> zookeeper client.  To initialize it, you don't tell it about your Solr
> servers, you give it the same zookeeper host information that you give
> to Solr when starting in cloud mode.  It always knows the current state
> of the cluster, so if you have a failure, it adjusts so that your
> queries and updates don't fail.  That also means that it will know when
> servers are added to or removed from the cloud.
>
>
> http://lucene.apache.org/solr/4_2_1/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrServer.html
>
> Thanks,
> Shawn
>
>


-- 
Web: http://www.eddgrant.com
Email: e...@eddgrant.com
Mobile: +44 (0) 7861 394 543


Re: commit in solr4 takes a longer time

2013-05-03 Thread vicky desai
Hi,

After using the following config (tags reconstructed; the archive stripped
the XML markup):

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>500</maxDocs>
    <maxTime>1000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>

When a commit operation is fired I am getting the following log:

INFO: start
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

Even though openSearcher is false, waitSearcher is true. Can that be set
to false too? Will that give a performance improvement, and what is the
config for that?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/commit-in-solr4-takes-a-longer-time-tp4060396p4060706.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Does Near Real Time get not supported at SolrCloud?

2013-05-03 Thread Timothy Potter
Yes, absolutely. NRT was a big driver for the leader-to-replica
distribution approach in SolrCloud.

On Fri, May 3, 2013 at 1:14 AM, Furkan KAMACI  wrote:
> Do soft commits get distributed to the nodes of a SolrCloud cluster?
>
> 2013/5/3 Otis Gospodnetic 
>
>> NRT works with SolrCloud.
>>
>> Otis
>> Solr & ElasticSearch Support
>> http://sematext.com/
>>
>> On May 2, 2013 5:34 AM, "Furkan KAMACI"  wrote:
>> >
>> > Is Near Real Time not supported in SolrCloud?
>> >
>> > I mean, when a soft commit occurs on a leader, I think it isn't
>> > distributed to the replicas (since it is not on disk; are the in-RAM
>> > index changes distributed to the replicas too?), so what happens when a
>> > search query comes in?
>>


Re: Solr metrics in Codahale metrics and Graphite?

2013-05-03 Thread Furkan KAMACI
Has anybody tested Ganglia with JMXTrans in a production environment for
SolrCloud?

2013/4/26 Dmitry Kan 

> Alan, Shawn,
>
> If backporting to 3.x is hard, no worries, we don't necessarily require the
> patch as we are heading to 4.x eventually. It is just much easier within
> our organization to test on the existing solr 3.4 as there are a few of
> internal dependencies and custom code on top of Solr. Also, Solr upgrades
> on production systems usually lag a month or so behind the start of the
> upgrade on development systems (it requires lots of testing and
> verification).
>
> Nevertheless, it is a good effort to make #solr #graphite friendly, so keep
> it up! :)
>
> Dmitry
>
>
>
>
> On Thu, Apr 25, 2013 at 9:29 PM, Shawn Heisey  wrote:
>
> > On 4/25/2013 6:30 AM, Dmitry Kan wrote:
> > > We are very much interested in 3.4.
> > >
> > > On Thu, Apr 25, 2013 at 12:55 PM, Alan Woodward 
> wrote:
> > >> This is on top of trunk at the moment, but would be back ported to 4.4
> > if
> > >> there was interest.
> >
> > This will be bad news, I'm sorry:
> >
> > All remaining work on 3.x versions happens in the 3.6 branch. This
> > branch is in maintenance mode.  It will only get fixes for serious bugs
> > with no workaround.  Improvements and new features won't be considered
> > at all.
> >
> > You're welcome to try backporting patches from newer issues.  Due to the
> > major differences in the 3x and 4x codebases, the best case scenario is
> > that you'll be facing a very manual task.  Some changes can't be
> > backported because they rely on other features only found in 4.x code.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: commit in solr4 takes a longer time

2013-05-03 Thread Gopal Patwa
Since you have defined autoCommit options for both hard and soft commits,
you don't have to explicitly call commit from the SolrJ client. And
openSearcher=false will make hard commits faster, since a hard commit then
only makes sure that recent changes are flushed to disk (for durability)
and does not open any searcher.

Can you post your log from when a soft commit and a hard commit happen?

You can read about waitFlush=false and waitSearcher=false, which both
default to true; see below from the JavaDoc:

waitFlush: block until index changes are flushed to disk
waitSearcher: block until a new searcher is opened and registered as the
main query searcher, making the changes visible


On Fri, May 3, 2013 at 7:19 AM, vicky desai wrote:

> Hi All,
>
> Setting the openSearcher flag to false worked and gave me a visible
> improvement in commit time. One thing to note is that when using the
> SolrJ client we have to call server.commit(false, false), which I was
> doing incorrectly and hence was not able to see the improvement earlier.
>
> Thanks everyone
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/commit-in-solr4-takes-a-longer-time-tp4060396p4060688.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: commit in solr4 takes a longer time

2013-05-03 Thread vicky desai
Hi,

When an auto commit operation is fired I am getting the following log:

INFO: start
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

Setting openSearcher to false definitely gave me a lot of performance
improvement, but I was wondering if waitSearcher can also be set to false,
and will that give me a performance gain too.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/commit-in-solr4-takes-a-longer-time-tp4060396p4060715.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: any plans to remove int32 limitation on the number of the documents in the index?

2013-05-03 Thread Erick Erickson
My off the cuff thought is that there are significant costs trying to
do this that would be paid by 99.999% of setups out there. Also,
usually you'll run into other issues (RAM etc) long before you come
anywhere close to 2^31 docs.

Lucene/Solr often allocates int[maxDoc] for various operations. When
maxDoc approaches 2^31, well, memory goes through the roof. Now
consider allocating longs instead...
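
To put a number on it: one int[2^31] array alone is 2^31 * 4 bytes, roughly
8.6 GB, and a long[] of the same length doubles that to about 17 GB, per
structure, before any other heap use.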

which is a long way of saying that I don't really think anyone's going
to be working on this any time soon, especially when SolrCloud removes
a LOT of the pain/complexity (from a user perspective anyway) from
going to a sharded setup...

FWIW,
Erick

On Thu, May 2, 2013 at 1:17 PM, Valery Giner  wrote:
> Otis,
>
> The documents themselves are relatively small, tens of fields, only a few of
> them could be up to a hundred bytes.
> Linux servers with relatively large RAM (256),
> Minutes on the searches are fine for our purposes,  adding a few tens of
> millions of records in tens of minutes is also fine.
> We had to do some simple tricks for keeping indexing up to speed but nothing
> too fancy.
> Moving to sharding adds a layer of complexity which we don't really need
> because of the above, ... and adding complexity may result in lower
> reliability :)
>
> Thanks,
> Val
>
>
> On 05/02/2013 03:41 PM, Otis Gospodnetic wrote:
>>
>> Val,
>>
>> Haven't seen this mentioned in a while...
>>
>> I'm curious...what sort of index, queries, hardware, and latency
>> requirements do you have?
>>
>> Otis
>> Solr & ElasticSearch Support
>> http://sematext.com/
>> On May 1, 2013 4:36 PM, "Valery Giner"  wrote:
>>
>>> Dear Solr Developers,
>>>
>>> I've been unable to find an answer to the question in the subject line of
>>> this e-mail, except of a vague one.
>>>
>>> We need to be able to index over 2bln+ documents.   We were doing well
>>> without sharding until the number of docs hit the limit ( 2bln+).   The
>>> performance was satisfactory for the queries, updates and indexing of new
>>> documents.
>>>
>>> That is, except for the need to get around the int32 limit, we don't
>>> really
>>> have a need for setting up distributed solr.
>>>
>>> I wonder whether someone on the Solr team could tell us when, or in what
>>> version of Solr, we could expect the limit to be removed.
>>>
>>> I hope this question may be of interest to someone else :)
>>>
>>> --
>>> Thanks,
>>> Val
>>>
>>>
>


Re: transientCacheSize doesn't seem to have any effect, except on startup

2013-05-03 Thread Erick Erickson
The cores aren't loaded (or at least shouldn't be) for getting the status.
The _names_ of the cores should be returned, but those are (supposed) to be
retrieved from a list rather than loaded cores. So are you sure that's not what
you are seeing? How are you determining whether the cores are actually loaded
or not?

That said, it's perfectly possible that the status command is doing something we
didn't anticipate, but I took a quick look at the code (got to rush to a plane)
and CoreAdminHandler _appears_ to be just returning whatever info it can
about an unloaded core for status. I _think_ you'll get more info if the
core has ever been loaded though, even if it's been removed from
the transient cache. Ditto for the create action.

So let's figure out whether you're really seeing loaded cores or not, and then
raise a JIRA if so...

Thanks for reporting!
Erick

On Thu, May 2, 2013 at 1:27 PM, didier deshommes  wrote:
> Hi,
> I've been very interested in the transient core feature of solr to manage a
> large number of cores. I'm especially interested in this use case, that the
> wiki lists at http://wiki.apache.org/solr/LotsOfCores (looks to be down
> now):
>
>>loadOnStartup=false transient=true: This is really the use-case. There are
> a large number of cores in your system that are short-duration use. You
> want Solr to load them as necessary, but unload them when the cache gets
> full on an LRU basis.
>
> I'm creating 10 transient cores via the core admin like so
>
> $ curl "
> http://localhost:8983/solr/admin/cores?wt=json&action=CREATE&name=new_core2&instanceDir=collection1/&dataDir=new_core2&transient=true&loadOnStartup=false
> "
>
> and have "transientCacheSize=2" in my solr.xml file, which I take means I
> should have at most 2 transient cores loaded at any time. The problem is
> that these cores are still loaded when when I ask solr to list cores:
>
> $ curl "http://localhost:8983/solr/admin/cores?wt=json&action=status";
>
> From the explanation in the wiki, it looks like solr would manage loading
> and unloading transient cores for me without having to worry about them,
> but this is not what's happening.
>
> The situation is different when I restart solr; it does the "right thing"
> by loading the maximum cores set by transientCacheSize. When I add more
> cores, the old behavior happens again, where all created transient cores
> are loaded in solr.
>
> I'm using the development branch lucene_solr_4_3 to run my example. I can
> open a jira if need be.


Re: socket write error

2013-05-03 Thread Dmitry Kan
After some more debugging I have found out that one of the requests had a
size of 4.4MB. The default maxPostSize in Tomcat 6 is 2MB (
http://tomcat.apache.org/tomcat-6.0-doc/config/ajp.html).

Changing that to 10MB has greatly improved the situation on the Solr side.
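
For reference, the attribute lives on the connector element in Tomcat's
server.xml, e.g. <Connector port="8009" protocol="AJP/1.3"
maxPostSize="10485760" /> (the port shown is just the Tomcat AJP default;
the value is in bytes).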

Dmitry


On Fri, May 3, 2013 at 9:55 AM, Dmitry Kan  wrote:

> Digging in further, found this in HttpCommComponent class:
>
> [code]
>   static {
> MultiThreadedHttpConnectionManager mgr = new
> MultiThreadedHttpConnectionManager();
> mgr.getParams().setDefaultMaxConnectionsPerHost(20);
> mgr.getParams().setMaxTotalConnections(1);
> mgr.getParams().setConnectionTimeout(SearchHandler.connectionTimeout);
> mgr.getParams().setSoTimeout(SearchHandler.soTimeout);
> // mgr.getParams().setStaleCheckingEnabled(false);
> client = new HttpClient(mgr);
>   }
> [/code]
>
> Could the value set by setDefaultMaxConnectionsPerHost(20) be too small for
> 80+ shards returning results to the router?
>
> Dmitry
>
>
>
> On Fri, May 3, 2013 at 6:50 AM, Dmitry Kan  wrote:
>
>> Hi, thanks.
>>
>> Solr 3.4.
>> There are POST requests everywhere: between client and router, and between
>> router and shards.
>>
>> Do you do faceting across all shards? How many documents, approximately, do you have?
>> On 2 May 2013 22:02, "Patanachai Tangchaisin" <
>> patanachai.tangchai...@wizecommerce.com> wrote:
>>
>>> Hi,
>>>
>>> First, which version of Solr are you using?
>>>
>>> I also have 60+ shards on Solr 4.2.1 and it doesn't seem to be a problem
>>> for me.
>>>
>>> - Make sure you use POST to send a query to Solr.
>>> - 'connection reset by peer' from client can indicate that there is
>>> something wrong with server e.g. server closes a connection etc.
>>>
>>> --
>>> Patanachai
>>>
>>> On 05/02/2013 05:05 AM, Dmitry Kan wrote:
>>>
 After some searching around, I see this:

 http://search-lucene.com/m/ErEZUl7P5f2/%2522socket+write+error%2522&subj=Long+list+of+shards+breaks+solrj+query

 Seems like this has happened in the past with large amount of shards.

 To make it clear: the distributed search works with 20 shards.


 On Thu, May 2, 2013 at 1:57 PM, Dmitry Kan 
 wrote:

  Hi guys!
>
> We have solr router and shards. I see this in jetty log on the router:
>
> May 02, 2013 1:30:22 PM
> org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> INFO: I/O exception (java.net.SocketException) caught when processing
> request: Connection reset by peer: socket write error
>
> and then:
>
> May 02, 2013 1:30:22 PM
> org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
> INFO: Retrying request
>
> followed by exception about Internal Server Error
>
> any ideas why this happens?
>
> We run 80+ shards distributed across several servers. Router runs on
> its
> own node.
>
> Is there anything in particular I should be looking into wrt ubuntu
> socket
> settings? Is this a known issue for solr's distributed search from the
> past?
>
> Thanks,
> Dmitry
>
>
>>>
>>> CONFIDENTIALITY NOTICE
>>> ==
>>> This email message and any attachments are for the exclusive use of the
>>> intended recipient(s) and may contain confidential and privileged
>>> information. Any unauthorized review, use, disclosure or distribution is
>>> prohibited. If you are not the intended recipient, please contact the
>>> sender by reply email and destroy all copies of the original message along
>>> with any attachments, from your computer system. If you are the intended
>>> recipient, please be advised that the content of this message is subject to
>>> access, review and disclosure by the sender's Email System Administrator.
>>>
>>>
>


Re: commit in solr4 takes a longer time

2013-05-03 Thread Shawn Heisey

On 5/3/2013 9:28 AM, vicky desai wrote:

Hi,

When an auto commit operation is fired I am getting the following log:
INFO: start
commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

Setting openSearcher to false definitely gave me a lot of performance
improvement, but I was wondering if waitSearcher can also be set to false,
and will that give me a performance gain too.


The openSearcher parameter changes what actually happens when you do a 
hard commit, so using it can change your performance.


The "wait" parameters are for client software that does commits.  The 
idea is that if you don't want your client to wait for the commit to 
finish, you use these options so that the commit API call will return 
quickly and the server will finish the commit in the background.  It 
doesn't change what the commit does, it just allows the client to start 
doing other things.


With auto commits, the client and the server are both Solr, and 
everything is multi-threaded.  The wait parameters have no meaning, 
because there's no user software that has to wait.  There would be no 
performance gain from turning them off.


Side note: The waitFlush parameter was completely removed in Solr 4.0.

Thanks,
Shawn



Re: The HttpSolrServer "add(Collection docs)" method is not atomic.

2013-05-03 Thread Erick Erickson
bq:  Is there a way to commit multiple documents/beans in a
transaction/together in a way that it succeeds completely or fails
completely?

Not that I know of. I've seen various "divide and conquer" strategies
to identify _which_ document failed, but the general process
is usually to re-index the docs in smaller chunks until you
isolate the offending one and trust that re-indexing documents will
be OK since it overwrites the earlier copy.
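
A minimal sketch of that chunking idea in SolrJ 4.x, assuming server is an
HttpSolrServer and docs is the batch that failed:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.common.SolrInputDocument;

List<SolrInputDocument> failed = new ArrayList<SolrInputDocument>();
for (SolrInputDocument doc : docs) {
  try {
    server.add(doc);  // re-adding a doc with the same uniqueKey overwrites it
  } catch (Exception e) {
    failed.add(doc);  // inspect and fix these separately
  }
}
server.commit();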

Best
Erick

On Thu, May 2, 2013 at 7:53 PM, mark12345  wrote:
> One thing I noticed is that while the HttpSolrServer "add(SolrInputDocument
> doc)" method is atomic (Either a bean is added or an exception is thrown),
> the HttpSolrServer "add(Collection docs)" method is not
> atomic.
>
> Question:  Is there a way to commit multiple documents/beans in a
> transaction/together in a way that it succeeds completely or fails
> completely?
>
>
> Quick outline of what I did to highlight that a call to the HttpSolrServer
> "add(Collection docs)" method is not atomic:
> 1.  Create 5 documents, comprising 4 valid documents (documents 1, 2, 4, 5)
> and 1 document with an issue, document 3.
> 2.  Call to HttpSolrServer "add(Collection docs)" which
> threw a SolrException.
> 3.  Call to HttpSolrServer "commit()".
> 4.  Discovered that 2 out of 5 documents (documents 1 and 2) were still
> committed.
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrJ-Solr-Two-Phase-Commit-tp4060399p4060590.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query across multiple shards - key fields have different names

2013-05-03 Thread Erick Erickson
I don't think you can. The problem is that the "pseudo join" capability
can work "cross core", which means with two separate cores, but last I
knew distributed joins aren't supported, which is what you're asking
for.

Really think about flattening your data if at all possible.
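
For completeness: on Solr 4.x a single-node cross-core join would look
something like q={!join fromIndex=usage from=search.resourceid to=id}*:*
(the "usage" core name is an assumption; the field names come from this
thread), but both cores must live in the same Solr instance, and it still
isn't distributed.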

Best
Erick

On Thu, May 2, 2013 at 11:03 PM, Benjamin Ryan
 wrote:
> Hi,
>   Sorry for the basic question - I can't get to the WiKi to find the answer.
>   Version Solr 3.3.0
>   I have two separate indexes (currently in two cores but can be moved to 
> shards)
>   One core holds metadata about educational resources, the other usage 
> statistics
>   They have a common value  named "id" in one core and "search.resourceid" in 
> the other core.
>   How can I construct a shard query (once I have moved one of the cores to a 
> different node) so that I can effectively get the statistics for each 
> educational resource grouped by each resource?
>   This is an offline reporting job that needs to list the usage events for 
> educational resources over a time period (the usage events have a date/time
> field).
>
> Regards,
>Ben
>
> --
> Dr Ben Ryan
> Jorum Technical Manager
>
> 5.12 Roscoe Building
> The University of Manchester
> Oxford Road
> Manchester
> M13 9PL
> Tel: 0160 275 6039
> E-mail: 
> benjamin.r...@manchester.ac.uk
> --
>


Re: Duplicated Documents Across shards

2013-05-03 Thread Erick Erickson
What version of Solr? The custom routing stuff is quite new so
I'm guessing 4x?

But this shouldn't be happening. The actual index data for the
shards should be in separate directories, they just happen to
be on the same physical machine.

Try querying each one with &distrib=false to see the counts
from single shards; that may shed some light on this. It vaguely
sounds like you have indexed the same document to both shards
somehow...
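
For example (host and core names assumed):
http://machine2:8983/solr/shard2/select?q=*:*&rows=0&distrib=false
With distrib=false, the numFound in the response reflects only the core
you hit.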

Best
Erick

On Fri, May 3, 2013 at 5:28 AM, Iker Mtnz. Apellaniz
 wrote:
> Hi,
>   We have currently a solrCloud implementation running 5 shards in 3
> physical machines, so the first machine will have the shard number 1, the
> second machine shards 2 & 4, and the third shards 3 & 5. We noticed that,
> while querying, numFound decreased when we increased the start param.
>   After some investigation we found that the documents in shards 2 to 5
> were being counted twice. Querying to shard 2 will give you back the
> results for shard 2 & 4, and the same thing for shards 3 & 5. Our guess is
> that the physical index for both shard 2&4 is shared, so the shards don't
> know which part of it is for each one.
>   The uniqueKey is correctly defined, and we have tried using shard prefix
> (shard1!docID).
>
>   Is there any way to solve this problem when a unique physical machine
> shares shards?
>   Is it a "real" problem os it just affects facet & numResults?
>
> Thanks
>Iker
>
> --
> /** @author imartinez*/
> Person me = *new* Developer();
> me.setName(*"Iker Mtz de Apellaniz Anzuola"*);
> me.setTwit("@mitxino77 ");
> me.setLocations({"St Cugat, Barcelona", "Kanpezu, Euskadi", "*, World"]});
> me.setSkills({*SoftwareDeveloper, Curious, AmateurCook*});
> me.setWebs({*urbasaabentura.com, ikertxef.com*});
> *return* me;


Re: Solr 4 reload failed core

2013-05-03 Thread Erick Erickson
It seems odd, but consider "create" rather than "reload". Create
will load up an existing core; think of it as "create in memory"
rather than "create on disk" for the case where there's already
an index.
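
For example (core name and instanceDir assumed):
http://localhost:8983/solr/admin/cores?action=CREATE&name=failedcore&instanceDir=failedcore
Pointing CREATE at the existing instanceDir picks up the index that is
already on disk.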

Best
Erick

On Fri, May 3, 2013 at 6:27 AM, Peter Kirk  wrote:
> Hi
>
> I have a multi-core installation, with 2 cores. Sometimes, when Solr starts 
> up, one of the cores fails (due to an extension to Solr I have, which is 
> waiting on an external service which has yet to initialise).
>
> In previous versions of Solr, I could subsequently issue a RELOAD to this 
> core, even though it was in a "fail" state, and it would reload and start up.
> Now it seems with Solr 4, I cannot issue a RELOAD to a core which has failed.
>
> Is this the case?
>
> How can I get Solr to start a core which failed on initial start up?
>
> Thanks,
> Peter
>
>
>
>


Re: Delete from Solr Cloud 4.0 index..

2013-05-03 Thread Erick Erickson
Annette:

Be a little careful with the index size savings; they really don't
mean much for _searching_. The stored field compression
significantly reduces the size on disk, but only for the stored
data which is only accessed when returning the top N docs. In
terms of how many docs you can fit on your hardware, it's pretty
irrelevant.

The *.fdt and *.fdx files in your index directory contain the stored
data, so when looking at the effects of various options (including
compression), you can pretty much ignore these files.

FWIW,
Erick

On Fri, May 3, 2013 at 2:03 AM, Annette Newton
 wrote:
> Thanks Shawn.
>
> I have played around with soft commits before and didn't see any
> improvement, but with the current load testing I am doing I will give it
> another go.
>
> I have researched docValues and came across the fact that it would increase
> the index size.  With the upgrade to 4.2.1 the index size has reduced by
> approx 33% which is pleasing and I don't really want to lose that saving.
>
> We do use the facet.method=enum approach, which works really well, but I
> will verify that we are using it in every instance; we have numerous
> developers working on the product and maybe one or two have slipped
> through.
>
> Right from the start I upped zkClientTimeout to 30 seconds, as I wanted to
> give extra time for any network blips that we experience on AWS.  We only
> seem to drop communication on a full garbage collection though.
>
> I am coming to the conclusion that we need to have more shards to cope with
> the writes, so I will play around with adding more shards and see how I go.
>
>
> I appreciate you having a look over our setup and the advice.
>
> Thanks again.
>
> Netty.
>
>
> On 2 May 2013 23:17, Shawn Heisey  wrote:
>
>> On 5/2/2013 4:24 AM, Annette Newton wrote:
>> > Hi Shawn,
>> >
>> > Thanks so much for your response.  We basically are very write intensive
>> > and write throughput is pretty essential to our product.  Reads are
>> > sporadic and actually functioning really well.
>> >
>> > We write on average (at the moment) 8-12 batches of 35 documents per
>> > minute.  But we really will be looking to write more in the future, so we
>> > need to work out scaling of Solr and how to cope with more volume.
>> >
>> > Schema (I have changed the names) :
>> >
>> > http://pastebin.com/x1ry7ieW
>> >
>> > Config:
>> >
>> > http://pastebin.com/pqjTCa7L
>>
>> This is very clean.  There's probably more you could remove/comment, but
>> generally speaking I couldn't find any glaring issues.  In particular,
>> you have disabled autowarming, which is a major contributor to commit
>> speed problems.
>>
>> The first thing I think I'd try is increasing zkClientTimeout to 30 or
>> 60 seconds.  You can use the startup command line or solr.xml; I would
>> probably use the latter.  Here's a solr.xml fragment that uses a system
>> property or a 15-second default:
>>
>> <solr persistent="true">
>>   <cores adminPath="/admin/cores"
>>    zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}"
>>    hostContext="solr">
>>
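>> With that fragment in place, a 30-second timeout could then be set at
>> startup via the system property, along the lines of:
>>
>> java -DzkClientTimeout=30000 -jar start.jar
>>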
>> General thoughts, these changes might not help this particular issue:
>> You've got autoCommit with openSearcher=true.  This is a hard commit.
>> If it were me, I would set that up with openSearcher=false and either do
>> explicit soft commits from my application or set up autoSoftCommit with
>> a shorter timeframe than autoCommit.
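>>
>> For example (times illustrative):
>>
>> <autoCommit>
>>   <maxTime>60000</maxTime>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>> <autoSoftCommit>
>>   <maxTime>5000</maxTime>
>> </autoSoftCommit>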
>>
>> This might simply be a scaling issue, where you'll need to spread the
>> load wider than four shards.  I know that there are financial
>> considerations with that, and they might not be small, so let's leave
>> that alone for now.
>>
>> The memory problems might be a symptom/cause of the scaling issue I just
>> mentioned.  You said you're using facets, which can be a real memory hog
>> even with only a few of them.  Have you tried facet.method=enum to see
>> how it performs?  You'd need to switch to it exclusively, never go with
>> the default of fc.  You could put that in the defaults or invariants
>> section of your request handler(s).
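>>
>> For instance, in the handler definition (handler name illustrative):
>>
>> <requestHandler name="/select" class="solr.SearchHandler">
>>   <lst name="invariants">
>>     <str name="facet.method">enum</str>
>>   </lst>
>> </requestHandler>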
>>
>> Another way to reduce memory usage for facets is to use disk-based
>> docValues on version 4.2 or later for the facet fields, but this will
>> increase your index size, and your index is already quite large.
>> Depending on your index contents, the increase may be small or large.
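>>
>> In schema.xml that's a per-field attribute, e.g. (field name illustrative):
>>
>> <field name="facet_field" type="string" indexed="true" stored="false"
>>        docValues="true"/>
>>
>> If I remember right, keeping them disk-resident also means setting
>> docValuesFormat="Disk" on the corresponding fieldType.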
>>
>> Something to just mention: It looks like your solrconfig.xml has
>> hard-coded absolute paths for dataDir and updateLog.  This is fine if
>> you'll only ever have one core/collection on each server, but it'll be a
>> disaster if you have multiples.  I could be wrong about how these get
>> interpreted in SolrCloud -- they might actually be relative despite
>> starting with a slash.
>>
>> Thanks,
>> Shawn
>>
>>
>
>
> --
>
> Annette Newton
>
> Database Administrator
>
> ServiceTick Ltd
>
>
>
> T:+44(0)1603 618326
>
>
>
> Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ
>
> www.servicetick.com
>
> *www.sessioncam.com*
>

Re: Configure Shingle Filter to ignore ngrams made of tokens with same start and end

2013-05-03 Thread Jack Krupansky
In short, no. I don't think you want to use the shingle filter on a token
stream that has multiple tokens at the same position; otherwise you will
get confused "suggestions", as you've encountered.


-- Jack Krupansky

-Original Message- 
From: Rounak Jain

Sent: Friday, May 03, 2013 7:34 AM
To: solr-user@lucene.apache.org
Subject: Configure Shingle Filter to ignore ngrams made of tokens with same 
start and end


Hello,

I was using the Shingle Filter with Suggester to implement an autosuggest
dropdown. The field I'm using with the shingle filter has a WordDelimiter
filter with preserveOriginal=1 to tokenize "women's" as "women's" and "womens."

Because of this, when the shingle filter is generating word ngrams, apart from
the expected tokens, there's also a "women's womens" token. I wanted to
know if there's any way to configure ShingleFilter so that it ignores
tokens with same start and end values.

Thanks,
Rounak 



SV: Solr 4 reload failed core

2013-05-03 Thread Peter Kirk
Thanks - I had just found the CREATE command, and I think that's the easiest
path for us to take. It will basically function the way our "reload"
workaround works now.



Fra: Erick Erickson [erickerick...@gmail.com]
Sendt: 3. maj 2013 19:22
Til: solr-user@lucene.apache.org
Emne: Re: Solr 4 reload failed core

It seems odd, but consider "create" rather than "reload". Create
will load up an existing core; think of it as "create in memory"
rather than "create on disk" for the case where there's already
an index.

Best
Erick

On Fri, May 3, 2013 at 6:27 AM, Peter Kirk  wrote:
> Hi
>
> I have a multi-core installation, with 2 cores. Sometimes, when Solr starts 
> up, one of the cores fails (due to an extension to Solr I have, which is 
> waiting on an external service which has yet to initialise).
>
> In previous versions of Solr, I could subsequently issue a RELOAD to this 
> core, even though it was in a "fail" state, and it would reload and start up.
> Now it seems with Solr 4, I cannot issue a RELOAD to a core which has failed.
>
> Is this the case?
>
> How can I get Solr to start a core which failed on initial start up?
>
> Thanks,
> Peter
>
>
>
>


Re: Configure Shingle Filter to ignore ngrams made of tokens with same start and end

2013-05-03 Thread Walter Underwood
The shingle filter should respect positions. If it doesn't, that is worth 
filing a bug so we know about it.

wunder

On May 3, 2013, at 10:50 AM, Jack Krupansky wrote:

> In short, no. I don't think you want to use the shingle filter on a token
> stream that has multiple tokens at the same position; otherwise you will get
> confused "suggestions", as you've encountered.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Rounak Jain
> Sent: Friday, May 03, 2013 7:34 AM
> To: solr-user@lucene.apache.org
> Subject: Configure Shingle Filter to ignore ngrams made of tokens with same 
> start and end
> 
> Hello,
> 
> I was using the Shingle Filter with Suggester to implement an autosuggest
> dropdown. The field I'm using with the shingle filter has a WordDelimiter
> filter with preserveOriginal=1 to tokenize "women's" as "women's" and "womens."
>
> Because of this, when the shingle filter is generating word ngrams, apart from
> the expected tokens, there's also a "women's womens" token. I wanted to
> know if there's any way to configure ShingleFilter so that it ignores
> tokens with same start and end values.
> 
> Thanks,
> Rounak 






custom tokenizer error

2013-05-03 Thread Sarita Nair
I am using a custom Tokenizer, as part of the analysis chain, for a Solr (4.2.1)
field. On trying to index, Solr throws a NullPointerException.
The unit tests for the custom tokenizer work fine. Any ideas as to what I am
missing/doing incorrectly will be appreciated.

Here is the relevant schema.xml excerpt (the fieldType and factory names here
are approximate; the filter chain matches the stack trace below):

    <fieldType name="text_custom" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="some.other.solr.analysis.EmbeddedPunctuationTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.EnglishMinimalStemFilterFactory"/>
      </analyzer>
    </fieldType>

Here are the relevant pieces of the Tokenizer:

/**
 * Intercepts each token produced by {@link StandardTokenizer#incrementToken()}
 * and checks for the presence of a colon or period. If found, splits the token
 * on the punctuation mark and adjusts the term and offset attributes of the
 * underlying {@link TokenStream} to create additional tokens.
 */
public class EmbeddedPunctuationTokenizer extends Tokenizer {

    private static final Pattern PUNCTUATION_SYMBOLS = Pattern.compile("[:.]");

    private StandardTokenizer baseTokenizer;

    private CharTermAttribute termAttr;

    private OffsetAttribute offsetAttr;

    private /*@Nullable*/ String tokenAfterPunctuation = null;

    private int currentOffset = 0;

    public EmbeddedPunctuationTokenizer(final Reader reader) {
        super(reader);
        baseTokenizer = new StandardTokenizer(Version.MINIMUM_LUCENE_VERSION, reader);
        // Two TokenStreams are in play here: the one underlying the current
        // instance and the one underlying the StandardTokenizer. The attribute
        // instances must be associated with both.
        termAttr = baseTokenizer.addAttribute(CharTermAttribute.class);
        offsetAttr = baseTokenizer.addAttribute(OffsetAttribute.class);
        this.addAttributeImpl((CharTermAttributeImpl) termAttr);
        this.addAttributeImpl((OffsetAttributeImpl) offsetAttr);
    }

    @Override
    public void end() throws IOException {
        baseTokenizer.end();
        super.end();
    }

    @Override
    public void close() throws IOException {
        baseTokenizer.close();
        super.close();
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        baseTokenizer.reset();
        currentOffset = 0;
        tokenAfterPunctuation = null;
    }

    @Override
    public final boolean incrementToken() throws IOException {
        clearAttributes();
        if (tokenAfterPunctuation != null) {
            // Do not advance the underlying TokenStream if the previous call
            // found an embedded punctuation mark and set aside the substring
            // that follows it. Set the attributes instead from the substring,
            // bearing in mind that the substring could contain more embedded
            // punctuation marks.
            adjustAttributes(tokenAfterPunctuation);
        } else if (baseTokenizer.incrementToken()) {
            // No remaining substring from a token with embedded punctuation:
            // save the starting offset reported by the base tokenizer as the
            // current offset, then proceed with the analysis of the token it
            // returned.
            currentOffset = offsetAttr.startOffset();
            adjustAttributes(termAttr.toString());
        } else {
            // No more tokens in the underlying token stream: return false.
            return false;
        }
        return true;
    }

    private void adjustAttributes(final String token) {
        Matcher m = PUNCTUATION_SYMBOLS.matcher(token);
        if (m.find()) {
            int index = m.start();
            offsetAttr.setOffset(currentOffset, currentOffset + index);
            termAttr.copyBuffer(token.toCharArray(), 0, index);
            tokenAfterPunctuation = token.substring(index + 1);
            // Given that the incoming token had an embedded punctuation mark,
            // the starting offset for the substring following the punctuation
            // mark will be 1 beyond the end of the current token, which is the
            // substring preceding the embedded punctuation mark.
            currentOffset = offsetAttr.endOffset() + 1;
        } else if (tokenAfterPunctuation != null) {
            // Last remaining substring following a previously detected embedded
            // punctuation mark: adjust attributes based on its values.
            int length = tokenAfterPunctuation.length();
            termAttr.copyBuffer(tokenAfterPunctuation.toCharArray(), 0, length);
            offsetAttr.setOffset(currentOffset, currentOffset + length);
            tokenAfterPunctuation = null;
        }
        // Implied else: neither condition is true, so the attributes from the
        // base tokenizer need no adjustments.
    }
}

Solr throws the following error in the 'else if' block of #incrementToken:

    2013-04-29 14:19:48,920 [http-thread-pool-8080(3)] ERROR 
org.apache.solr.core.SolrCore - java.lang.NullPointerException
    at 
org.apache.lucene.analysis.standard.StandardTokenizerImpl.zzRefill(StandardTokenizerImpl.java:923)
    at 
org.apache.lucene.analysis.standard.StandardTokenizerImpl.getNextToken(StandardTokenizerImpl.java:1133)
    at 
org.apache.lucene.analysis.standard.StandardTokenizer.incrementToken(StandardTokenizer.java:180)
    at 
some.other.solr.analysis.EmbeddedPunctuationTokenizer.incrementToken(EmbeddedPunctuationTokenizer.java:83)
    at 
org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
    at 
org.apache.lucene.analysis.en.EnglishPossessiveFilter.incrementToken(EnglishPossessiveFilter.java:57)
    at 
org.apache.lucene.analysis.en.EnglishMinimalStemFilter.incrementToken(EnglishMinimalStemFilter.java:48)
    at 
org.apache.lucene.index.DocInverterP

Re: transientCacheSize doesn't seem to have any effect, except on startup

2013-05-03 Thread didier deshommes
On Fri, May 3, 2013 at 11:18 AM, Erick Erickson wrote:

> The cores aren't loaded (or at least shouldn't be) for getting the status.
> The _names_ of the cores should be returned, but those are (supposedly)
> retrieved from a list rather than from loaded cores. So are you sure that's
> not what you are seeing? How are you determining whether the cores are
> actually loaded or not?
>
>
I'm looking at the output of:

$ curl "http://localhost:8983/solr/admin/cores?wt=json&action=status";

Cores that are loaded have a "startTime" and "upTime" value. Cores that are
unloaded don't appear in the output at all. For example, I created 3
transient cores with "transientCacheSize=2". When I asked for a list of
all cores, all 3 cores were returned. I explicitly unloaded 1 core and got
back 2 cores when I asked for the list again.

It would be nice if cores had an "isTransient" and an "isCurrentlyLoaded"
value so that one could see exactly which cores are loaded.




> That said, it's perfectly possible that the status command is doing
> something we
> didn't anticipate, but I took a quick look at the code (got to rush to a
> plane)
> and CoreAdminHandler _appears_ to be just returning whatever info it can
> about an unloaded core for status. I _think_ you'll get more info if the
> core has ever been loaded though, even if it's been removed from
> the transient cache. Ditto for the create action.
>
> So let's figure out whether you're really seeing loaded cores or not, and
> then
> raise a JIRA if so...
>
> Thanks for reporting!
> Erick
>
> On Thu, May 2, 2013 at 1:27 PM, didier deshommes 
> wrote:
> > Hi,
> > I've been very interested in the transient core feature of solr to
> manage a
> > large number of cores. I'm especially interested in this use case, that
> the
> > wiki lists at http://wiki.apache.org/solr/LotsOfCores (looks to be down
> > now):
> >
> >>loadOnStartup=false transient=true: This is really the use-case. There
> are
> > a large number of cores in your system that are short-duration use. You
> > want Solr to load them as necessary, but unload them when the cache gets
> > full on an LRU basis.
> >
> > I'm creating 10 transient core via core admin like so
> >
> > $ curl "
> >
> http://localhost:8983/solr/admin/cores?wt=json&action=CREATE&name=new_core2&instanceDir=collection1/&dataDir=new_core2&transient=true&loadOnStartup=false
> > "
> >
> > and have "transientCacheSize=2" in my solr.xml file, which I take to mean I
> > should have at most 2 transient cores loaded at any time. The problem is
> > that these cores are still loaded when when I ask solr to list cores:
> >
> > $ curl "http://localhost:8983/solr/admin/cores?wt=json&action=status";
> >
> > From the explanation in the wiki, it looks like solr would manage loading
> > and unloading transient cores for me without having to worry about them,
> > but this is not what's happening.
> >
> > The situation is different when I restart solr; it does the "right thing"
> > by loading at most the number of cores set by transientCacheSize. When I add more
> > cores, the old behavior happens again, where all created transient cores
> > are loaded in solr.
> >
> > I'm using the development branch lucene_solr_4_3 to run my example. I can
> > open a jira if need be.
>


disaster recovery scenarios for solr cloud and zookeeper

2013-05-03 Thread Dennis Haller
Hi,

Solr 4.x is architected with a dependency on Zookeeper, and Zookeeper is
expected to have very high (perfect?) availability. With 3 or 5 zookeeper
nodes, it is possible to manage zookeeper maintenance and keep online
availability close to 100%. But what is the worst case for Solr if
for some unanticipated reason all Zookeeper nodes go offline?

Could someone comment on a couple of possible scenarios in which all ZK
nodes are offline? What would happen to Solr and what would be needed to
recover in each case?
1) brief interruption, say <2 minutes,
2) longer downtime, say 60 min

Thanks
Dennis


Re: Duplicated Documents Across shards

2013-05-03 Thread Iker Mtnz. Apellaniz
We are currently using version 4.2.
We have tested with a single document and it gives us a count of 2
documents. But if we force it onto the first machine, the one with a
single shard, the count gives us 1 document.
I've tried using the distrib=false parameter; it gives us no duplicate
documents, but the same document appears to be in two different shards.
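
(I checked each core individually along these lines - core name and doc id
illustrative:

$ curl "http://localhost:8983/solr/shard2/select?q=id:DOC_ID&distrib=false"

and the same id comes back from two different cores.)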

Finally, about the separate directories: we have only one data directory per
physical machine and collection, and I don't see any subfolders for the
different shards.

Is it possible that we have something wrong with the dataDir configuration
for using multiple shards on one machine?

<dataDir>${solr.data.dir:}</dataDir>
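
If that property resolves to the same directory for two cores on one
machine, they would literally share one physical index. A sketch of
distinct per-core dataDir settings in solr.xml (names and paths
illustrative):

<core name="shard2" instanceDir="shard2/" dataDir="/var/solr/data/shard2"/>
<core name="shard4" instanceDir="shard4/" dataDir="/var/solr/data/shard4"/>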




2013/5/3 Erick Erickson 

> What version of Solr? The custom routing stuff is quite new so
> I'm guessing 4x?
>
> But this shouldn't be happening. The actual index data for the
> shards should be in separate directories, they just happen to
> be on the same physical machine.
>
> Try querying each one with &distrib=false to see the counts
> from single shards, that may shed some light on this. It vaguely
> sounds like you have indexed the same document to both shards
> somehow...
>
> Best
> Erick
>
> On Fri, May 3, 2013 at 5:28 AM, Iker Mtnz. Apellaniz
>  wrote:
> > Hi,
> >   We currently have a SolrCloud implementation running 5 shards on 3
> > physical machines, so the first machine has shard number 1, the
> > second machine shards 2 & 4, and the third shards 3 & 5. We noticed that
> > while querying, numFoundDocs decreased when we increased the start param.
> >   After some investigation we found that the documents in shards 2 to 5
> > were being counted twice. Querying shard 2 will give you back the
> > results for shards 2 & 4, and the same for shards 3 & 5. Our guess is
> > that the physical index for shards 2 & 4 is shared, so the shards don't
> > know which part of it belongs to each one.
> >   The uniqueKey is correctly defined, and we have tried using a shard
> > prefix (shard1!docID).
> >
> >   Is there any way to solve this problem when a single physical machine
> > hosts several shards?
> >   Is it a "real" problem or does it just affect facets & numResults?
> >
> > Thanks
> >Iker
> >
> > --
> > /** @author imartinez*/
> > Person me = *new* Developer();
> > me.setName(*"Iker Mtz de Apellaniz Anzuola"*);
> > me.setTwit("@mitxino77 ");
> > me.setLocations({"St Cugat, Barcelona", "Kanpezu, Euskadi", "*,
> World"]});
> > me.setSkills({*SoftwareDeveloper, Curious, AmateurCook*});
> > me.setWebs({*urbasaabentura.com, ikertxef.com*});
> > *return* me;
>



-- 
/** @author imartinez*/
Person me = *new* Developer();
me.setName(*"Iker Mtz de Apellaniz Anzuola"*);
me.setTwit("@mitxino77 ");
me.setLocations({"St Cugat, Barcelona", "Kanpezu, Euskadi", "*, World"]});
me.setSkills({*SoftwareDeveloper, Curious, AmateurCook*});
*return* me;


Re: Configure Shingle Filter to ignore ngrams made of tokens with same start and end

2013-05-03 Thread Steve Rowe
An issue exists for this problem: 
https://issues.apache.org/jira/browse/LUCENE-3475

On May 3, 2013, at 11:00 AM, Walter Underwood  wrote:

> The shingle filter should respect positions. If it doesn't, that is worth 
> filing a bug so we know about it.
> 
> wunder
> 
> On May 3, 2013, at 10:50 AM, Jack Krupansky wrote:
> 
>> In short, no. I don't think you want to use the shingle filter on a token
>> stream that has multiple tokens at the same position; otherwise you will
>> get confused "suggestions", as you've encountered.
>> 
>> -- Jack Krupansky
>> 
>> -Original Message- From: Rounak Jain
>> Sent: Friday, May 03, 2013 7:34 AM
>> To: solr-user@lucene.apache.org
>> Subject: Configure Shingle Filter to ignore ngrams made of tokens with same 
>> start and end
>> 
>> Hello,
>> 
>> I was using the Shingle Filter with Suggester to implement an autosuggest
>> dropdown. The field I'm using with the shingle filter has a WordDelimiter
>> filter with preserveOriginal=1 to tokenize "women's" as "women's" and "womens."
>>
>> Because of this, when the shingle filter is generating word ngrams, apart from
>> the expected tokens, there's also a "women's womens" token. I wanted to
>> know if there's any way to configure ShingleFilter so that it ignores
>> tokens with same start and end values.
>> 
>> Thanks,
>> Rounak 
> 
> 
> 
> 



Re: disaster recovery scenarios for solr cloud and zookeeper

2013-05-03 Thread Otis Gospodnetic
I *think* at this point SolrCloud without ZooKeeper is like a
body without a head?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Fri, May 3, 2013 at 3:21 PM, Dennis Haller  wrote:
> Hi,
>
> Solr 4.x is architected with a dependency on Zookeeper, and Zookeeper is
> expected to have very high (perfect?) availability. With 3 or 5 zookeeper
> nodes, it is possible to manage zookeeper maintenance and keep online
> availability close to 100%. But what is the worst case for Solr if
> for some unanticipated reason all Zookeeper nodes go offline?
>
> Could someone comment on a couple of possible scenarios for which all ZK
> nodes are offline. What would happen to Solr and what would be needed to
> recover in each case?
> 1) brief interruption, say <2 minutes,
> 2) longer downtime, say 60 min
>
> Thanks
> Dennis


Re: disaster recovery scenarios for solr cloud and zookeeper

2013-05-03 Thread Walter Underwood
Ideally, the Solr nodes should be able to continue as long as no node fails.
Failure of a leader would be bad; failure of non-leader replicas might cause
some timeouts, but could be survivable.

Of course, nodes could not be added.

wunder

On May 3, 2013, at 5:05 PM, Otis Gospodnetic wrote:

> I *think* at this point SolrCloud without ZooKeeper is like a
> body without a head?
> 
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
> 
> 
> 
> 
> 
> On Fri, May 3, 2013 at 3:21 PM, Dennis Haller  wrote:
>> Hi,
>> 
>> Solr 4.x is architected with a dependency on Zookeeper, and Zookeeper is
>> expected to have very high (perfect?) availability. With 3 or 5 zookeeper
>> nodes, it is possible to manage zookeeper maintenance and keep online
>> availability close to 100%. But what is the worst case for Solr if
>> for some unanticipated reason all Zookeeper nodes go offline?
>> 
>> Could someone comment on a couple of possible scenarios for which all ZK
>> nodes are offline. What would happen to Solr and what would be needed to
>> recover in each case?
>> 1) brief interruption, say <2 minutes,
>> 2) longer downtime, say 60 min
>> 
>> Thanks
>> Dennis

--
Walter Underwood
wun...@wunderwood.org





Re: disaster recovery scenarios for solr cloud and zookeeper

2013-05-03 Thread Shawn Heisey
On 5/3/2013 6:07 PM, Walter Underwood wrote:
> Ideally, the Solr nodes should be able to continue as long as no node fails. 
> Failure of a leader would be bad, failure of non-leader replicas might cause 
> some timeouts, but could be survivable.
> 
> Of course, nodes could not be added.

I have read a few things that say things go read-only when the zookeeper
ensemble loses quorum.  I'm not sure whether that means that Solr goes
read-only or zookeeper goes read-only.  I would be interested in knowing
exactly what happens when zookeeper loses quorum, as well as what happens
if all three (or more) zookeeper nodes in the ensemble go away entirely.
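
(Each node's quorum state can at least be checked with ZooKeeper's
four-letter commands - host and port illustrative:

$ echo stat | nc zkhost 2181

which reports the node's Mode: leader, follower, or standalone.)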

I have a SolrCloud I can experiment with, but I need to find a
maintenance window for testing, so I can't check right now.

Thanks,
Shawn



Re: disaster recovery scenarios for solr cloud and zookeeper

2013-05-03 Thread Anshum Gupta
In case all your ZK nodes go down, querying will continue to work fine (as
long as no Solr nodes fail), but you'd not be able to add docs.

Sent from my iPhone

On 03-May-2013, at 17:52, Shawn Heisey  wrote:

> On 5/3/2013 6:07 PM, Walter Underwood wrote:
>> Ideally, the Solr nodes should be able to continue as long as no node fails. 
>> Failure of a leader would be bad, failure of non-leader replicas might cause 
>> some timeouts, but could be survivable.
>> 
>> Of course, nodes could not be added.
> 
> I have read a few things that say things go read only when the zookeeper
> ensemble loses quorum.  I'm not sure whether that means that Solr goes
> read only or zookeeper goes read only.  I would be interested in knowing
> exactly what happens when zookeeper loses quorum as well as what happens
> if all three (or more) zookeeper nodes in the ensemble go away entirely.
> 
> I have a SolrCloud I can experiment with, but I need to find a
> maintenance window for testing, so I can't check right now.
> 
> Thanks,
> Shawn
> 


Re: disaster recovery scenarios for solr cloud and zookeeper

2013-05-03 Thread Gopal Patwa
Agree with Anshum. Netflix has a very nice supervisor system for ZooKeeper;
if the nodes go down, it will restart them automatically:

http://techblog.netflix.com/2012/04/introducing-exhibitor-supervisor-system.html
https://github.com/Netflix/exhibitor




On Fri, May 3, 2013 at 6:53 PM, Anshum Gupta  wrote:

> In case all your ZK nodes go down, querying will continue to work
> fine (as long as no Solr nodes fail), but you'd not be able to add docs.
>
> Sent from my iPhone
>
> On 03-May-2013, at 17:52, Shawn Heisey  wrote:
>
> > On 5/3/2013 6:07 PM, Walter Underwood wrote:
> >> Ideally, the Solr nodes should be able to continue as long as no node
> fails. Failure of a leader would be bad, failure of non-leader replicas
> might cause some timeouts, but could be survivable.
> >>
> >> Of course, nodes could not be added.
> >
> > I have read a few things that say things go read only when the zookeeper
> > ensemble loses quorum.  I'm not sure whether that means that Solr goes
> > read only or zookeeper goes read only.  I would be interested in knowing
> > exactly what happens when zookeeper loses quorum as well as what happens
> > if all three (or more) zookeeper nodes in the ensemble go away entirely.
> >
> > I have a SolrCloud I can experiment with, but I need to find a
> > maintenance window for testing, so I can't check right now.
> >
> > Thanks,
> > Shawn
> >
>


Re: disaster recovery scenarios for solr cloud and zookeeper

2013-05-03 Thread Jason Hellman
I have to imagine I'm quibbling with the original assertion that "Solr 4.x is 
architected with a dependency on Zookeeper" when I say the following:

Solr 4.x is not architected with a dependency on Zookeeper.  SolrCloud,
however, is.  As such, if a line of reasoning drives greater concern about
Zookeeper than (necessarily) about Solr's resiliency, one can clearly opt to
use Solr 4.x without Zookeeper.

I have to further imagine that isn't really the point of the original message.  
Unfortunately for me somehow I'm obsessing on saying it :)

On May 3, 2013, at 12:21 PM, Dennis Haller  wrote:

> Hi,
> 
> Solr 4.x is architected with a dependency on Zookeeper, and Zookeeper is
> expected to have very high (perfect?) availability. With 3 or 5 zookeeper
> nodes, it is possible to manage zookeeper maintenance and keep online
> availability close to 100%. But what is the worst case for Solr if
> for some unanticipated reason all Zookeeper nodes go offline?
> 
> Could someone comment on a couple of possible scenarios for which all ZK
> nodes are offline. What would happen to Solr and what would be needed to
> recover in each case?
> 1) brief interruption, say <2 minutes,
> 2) longer downtime, say 60 min
> 
> Thanks
> Dennis



How to get solr synonyms in result set.

2013-05-03 Thread Suneel Pandey
Hi, 

I want to get a list of specific Solr synonym terms at query time, in the
result set, based on filter criteria.
I have implemented synonyms in a .txt file.

Thanks 








-
Regards,

Suneel Pandey
Sr. Software Developer
--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-get-solr-synonyms-in-result-set-tp4060796.html
Sent from the Solr - User mailing list archive at Nabble.com.