Re: Solr Cloud index refreshes after restart

2013-01-04 Thread Sai Gadde
Hi Erick, The issue was with zookeeper: when we tried to force a full replication by cleaning the datadir in zookeeper, that caused the index removal. Our index was always fully replicated, even after a short outage or restart. I think "too far out of date" could be the reason. We felt zookeeper was to blame here.

Re: SolrCloud and Join Queries

2013-01-04 Thread Otis Gospodnetic
Hi, I think things will work for Hassan as he described them. The key is not to shard in his case, that's all. Hassan, yes, 1-2M docs is small. But beware of creating a crazy number (e.g. thousands) of collections per server, as each collection has some cost. Otis -- Solr & ElasticSearch Suppor

Re: Removing terms from a search query with no results

2013-01-04 Thread Otis Gospodnetic
Hi Varun, I don't think this exists in Solr... But have a look at http://sematext.com/products/dym-researcher/index.html . Look at the screenshot and you will spot something labeled as "Relaxer" in the blue area. This (Query) Relaxer is DYM ReSearcher's cousin and can be seen in action on http:

Re: Removing terms from a search query with no results

2013-01-04 Thread Jack Krupansky
Not at this time. That is something you would do at your app level - re-query with a looser query if the original query returns zero results. -- Jack Krupansky -Original Message- From: Varun Thacker Sent: Friday, January 04, 2013 7:50 AM To: solr-user@lucene.apache.org Subject: Removing t
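A minimal sketch of the app-level re-query Jack describes, in Java (the language of SolrJ). It assumes every query term is required (e.g. `q.op=AND` or `mm=100%`), so dropping a term loosens the query. `QueryRelaxer` and its `hitCount` callback are hypothetical; in a real application the callback would send the query to Solr and return `numFound`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

/** Hypothetical helper: retry with fewer terms until a query returns hits. */
final class QueryRelaxer {
    private QueryRelaxer() {}

    /**
     * Tries the full query first, then drops the last term and re-queries,
     * returning the first variant with at least one hit, or null if none match.
     * hitCount is a stand-in for a real Solr call that returns numFound.
     */
    static String relax(String userQuery, ToLongFunction<String> hitCount) {
        if (userQuery == null || userQuery.trim().isEmpty()) {
            return null;
        }
        List<String> terms = new ArrayList<>(Arrays.asList(userQuery.trim().split("\\s+")));
        while (!terms.isEmpty()) {
            String candidate = String.join(" ", terms);
            if (hitCount.applyAsLong(candidate) > 0) {
                return candidate;
            }
            // Drop the last term; other strategies (e.g. dropping the rarest term) are possible.
            terms.remove(terms.size() - 1);
        }
        return null;
    }
}
```

Dropping from the end is only the simplest policy; an application that knows which terms matter most would choose the term to drop differently.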

Re: StatsComponent and query times while indexing

2013-01-04 Thread Otis Gospodnetic
If you index from the outside (i.e. not using DIH), you have more control over:
* how many threads you use
* how you batch documents
* how long you wait between indexing batches
... Otis -- Solr & ElasticSearch Support http://sematext.com/ On Fri, Jan 4, 2013 at 6:25 PM, Marcin Rzewucki wrote: >
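A sketch of those three knobs (threads, batch size, pause between batches) for a client-side indexer. `ThrottledIndexer` and `sendBatch` are hypothetical names; `sendBatch` stands in for whatever actually posts a batch to Solr (for example a SolrJ add call).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical client-side indexer with explicit threads, batch size and pause. */
final class ThrottledIndexer {
    private ThrottledIndexer() {}

    /** Splits docs into consecutive batches of at most batchSize documents. */
    static <T> List<List<T>> batches(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            out.add(new ArrayList<>(docs.subList(i, Math.min(i + batchSize, docs.size()))));
        }
        return out;
    }

    /**
     * Sends batches in rounds of `threads` concurrent batches, waits for each round,
     * then sleeps pauseMillis before the next round. Returns the number of batches sent.
     */
    static <T> int index(List<T> docs, int batchSize, int threads, long pauseMillis,
                         Consumer<List<T>> sendBatch) {
        List<List<T>> all = batches(docs, batchSize);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int start = 0; start < all.size(); start += threads) {
                List<Future<?>> round = new ArrayList<>();
                for (List<T> batch : all.subList(start, Math.min(start + threads, all.size()))) {
                    round.add(pool.submit(() -> sendBatch.accept(batch)));
                }
                for (Future<?> f : round) {
                    f.get(); // wait for the round; a failed batch surfaces as ExecutionException
                }
                if (pauseMillis > 0 && start + threads < all.size()) {
                    Thread.sleep(pauseMillis); // give concurrent searches some breathing room
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return all.size();
    }
}
```

Fewer threads, smaller batches and longer pauses trade indexing throughput for lower query latency while indexing runs.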

Re: Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Jack Krupansky
That's probably as official as anything ever gets around here. -- Jack Krupansky -Original Message- From: Mark Miller Sent: Friday, January 04, 2013 11:47 AM To: solr-user@lucene.apache.org Subject: Re: Solr 4 (CloudSolrServer and LBHttpSolrServer question) I'm going to push *hard* fo

Re: Solr 4 exceptions on trying to create a collection

2013-01-04 Thread Alexandre Rafalovitch
Have you tried Wireshark yet to see which host/port it is trying to connect to and why it fails? It is a complex tool, but well worth learning. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps ev

Re: StatsComponent and query times while indexing

2013-01-04 Thread Upayavira
DIH won't make any real difference, I'd say. The work to write terms to your index still happens in either case. Upayavira On Fri, Jan 4, 2013, at 11:25 PM, Marcin Rzewucki wrote: > Thanks. I guess you're right - it's normal behaviour. Are there some > guidelines how to use ramBufferSizeMB or onl

RE: Solr 4 exceptions on trying to create a collection

2013-01-04 Thread Jay Parashar
Thanks! I had a different version of httpclient on the classpath. So the second exception is gone, but now I am back to the first one: "org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request" -Original Message- From: Alexandre Rafalovitch [m

Re: StatsComponent and query times while indexing

2013-01-04 Thread Marcin Rzewucki
Thanks. I guess you're right - it's normal behaviour. Are there any guidelines on how to set ramBufferSizeMB, or is it only by testing? Do you know if DIH is "gentler" than indexing via REST or the SolrJ API? Kind regards. On 4 January 2013 23:14, Otis Gospodnetic wrote: > Hi, > > I think what you are seein

Re: Solr 4 exceptions on trying to create a collection

2013-01-04 Thread Alexandre Rafalovitch
For the second one: the wrong version of a library on the classpath, or multiple versions of a library on the classpath, which causes the wrong classes (with missing fields/variables) to be used? Or the library interface was baked in at compile time and the implementation is newer. Some sort of mismatch, basically. Most probably in Apache http librar
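One way to confirm such a mismatch from inside the running JVM is to ask where a class was actually loaded from and how many copies of it are visible on the classpath. A minimal sketch using only JDK APIs; `ClasspathCheck` is a hypothetical helper. For the httpclient case in this thread, `ClasspathCheck.copiesOf("org.apache.http.impl.client.DefaultHttpClient")` returning more than one URL would point at duplicate jars. Note that it only searches the class loader that loaded `ClasspathCheck`; inside a servlet container the webapp's loader may see a different set of jars.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for spotting mismatched or duplicated jars at runtime. */
final class ClasspathCheck {
    private ClasspathCheck() {}

    /** Returns the jar or directory a class was loaded from, or "bootstrap/JDK" for core classes. */
    static String whereLoaded(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "bootstrap/JDK";
        }
        return src.getLocation().toString();
    }

    /**
     * Lists every location, visible to this class's loader, that contains the named class file.
     * More than one entry usually means two versions of the same library are on the classpath.
     */
    static List<URL> copiesOf(String className) {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```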

Re: StatsComponent and query times while indexing

2013-01-04 Thread Otis Gospodnetic
Hi, I think what you are seeing is a general thing. Regular search is slower while there is indexing, too, of course. So maybe it's best to mentally decouple the indexing part here and simply make your calls as fast as possible without indexing. Then you can add indexing and play with things like ra

Solr 4 exceptions on trying to create a collection

2013-01-04 Thread Jay Parashar
Hi All, I am getting exceptions when trying to create a collection. Any help is appreciated. While trying to create a collection, I got this error: Caused by: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request at org.apache.solr.client.sol

Search Engineers at SimplyHired.com

2013-01-04 Thread Jagdish Nomula
Hello Solr-Users, I thought you, or someone you know, might be interested in a very important role here at Simply Hired. The Staff Search Engineer will own the responsibility of writing the search engine of SimplyHired. You will work on cutting edge machine learning, search and big data tools

Re: search features Endeca vs Solr

2013-01-04 Thread Mark Miller
On Jan 4, 2013, at 3:41 PM, "Dyer, James" wrote: > 4. Dynamic Business Rules. There is an open JIRA issue around biz rules and Drools integration. Not sure if any work has been done there, but there were at least some notes about it last I looked. - Mark

RE: search features Endeca vs Solr

2013-01-04 Thread Dyer, James
Sachin, You might get more responses on this list if you can describe in a little more detail what your application needs to do. A lot of us haven't used Endeca and won't understand exactly what you mean here. With that said, I migrated a few apps from Endeca to Solr a few years back and will try to he

Re: distributed / federated search Solr

2013-01-04 Thread Alexandre Rafalovitch
I think the problem is that you have to interpret the user query (Solr has one syntax, other sources have a different one) and then combine results (how?). All of those are non-trivial. Have you looked at something like http://www.comcepta.com/en/enterprise-metasearch.html which builds on top of C
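If results from several sources do have to be combined, one naive option (an assumption here, not something Solr provides) is round-robin interleaving of each source's ranked list, which sidesteps the fact that relevance scores from different engines are not comparable. A sketch; `ResultInterleaver` is hypothetical, and it assumes document ids are unique across sources so that duplicates can simply be dropped.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical round-robin merger for ranked results from several search sources. */
final class ResultInterleaver {
    private ResultInterleaver() {}

    /**
     * Takes the next result from each source in turn until `limit` unique ids are
     * collected or every source is exhausted. Raw scores are never compared.
     */
    static List<String> interleave(List<List<String>> rankedLists, int limit) {
        List<Iterator<String>> sources = new ArrayList<>();
        for (List<String> ranked : rankedLists) {
            sources.add(ranked.iterator());
        }
        Set<String> merged = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        boolean progressed = true;
        while (merged.size() < limit && progressed) {
            progressed = false;
            for (Iterator<String> it : sources) {
                if (merged.size() >= limit) {
                    break;
                }
                if (it.hasNext()) {
                    merged.add(it.next());
                    progressed = true;
                }
            }
        }
        return new ArrayList<>(merged);
    }
}
```

Query translation, the other half of the problem Alexandre mentions, is not addressed by this sketch.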

Re: background merge hit exception AND read past EOF: NIOFSIndexInput

2013-01-04 Thread Otis Gospodnetic
Sounds like you may have a corrupt index. Try running the CheckIndex tool. Otis Solr & ElasticSearch Support http://sematext.com/ On Jan 3, 2013 8:59 AM, "Karan jindal" wrote: > Hi everyone, > > I have a solr index which is built using solr 3.2. > > I am facing two problem with that solr index.
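CheckIndex is run from the command line against the index directory. A small Java sketch that assembles that command for `ProcessBuilder`; the jar path and index path are placeholders (ideally a lucene-core jar from the version that built the index, 3.2 here), and `CheckIndexCommand` is a hypothetical helper. The `-fix` flag permanently removes segments it cannot read, together with their documents, so back up the index and stop writes before using it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles the Lucene CheckIndex command line. */
final class CheckIndexCommand {
    private CheckIndexCommand() {}

    /**
     * luceneCoreJar: path to a lucene-core jar (placeholder).
     * indexDir: path to the index directory (placeholder).
     * fix: append -fix, which drops unreadable segments and the documents in them.
     */
    static List<String> build(String luceneCoreJar, String indexDir, boolean fix) {
        List<String> cmd = new ArrayList<>(List.of(
                "java", "-cp", luceneCoreJar, "org.apache.lucene.index.CheckIndex", indexDir));
        if (fix) {
            cmd.add("-fix");
        }
        return cmd;
    }
}
```

To run the check read-only: `new ProcessBuilder(CheckIndexCommand.build(jar, dir, false)).inheritIO().start().waitFor();`.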

Re: Solr 3.6.2 or 4.0

2013-01-04 Thread Upayavira
I agree with the 'more mature' analysis, but surely you can use 4.0 in a 3.x style without greater difficulty, no? Upayavira On Fri, Jan 4, 2013, at 07:35 PM, Otis Gospodnetic wrote: > Hi, > > If you don't need to shard your index and don't need NRT search Solr 3.x > is > much simpler to operate

Re: distributed / federated search Solr

2013-01-04 Thread Oleg Ruchovets
Yes, it would be great to start a discussion of this topic. I am looking for some kick-start information to begin a more detailed investigation. And of course maybe someone has already faced this problem, so please share your ideas and experience. Thanks Oleg. On Fri, Jan 4, 2013 at 2:15 PM,

Re: Solr 3.6.2 or 4.0

2013-01-04 Thread Otis Gospodnetic
Hi, If you don't need to shard your index and don't need NRT search, Solr 3.x is much simpler to operate and is more mature. Otis Solr & ElasticSearch Support http://sematext.com/ On Jan 4, 2013 7:08 AM, "Dikchant Sahi" wrote: > As someone in the forum correctly said, if all Solr releases were >

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Mark Miller
On Jan 4, 2013, at 2:14 PM, Per Steffensen wrote: >> I'm not sure what the node tells Zookeeper and who does shard assignment. I >> mean, does a node explicitly say what shard it wants to be, or is that >> assigned by Zookeeper, or is that a node's choice/option? It's basically both. If you

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Alexandre Rafalovitch
Would this be a reasonable (if very rough) attempt at a cake diagram? https://docs.google.com/drawings/d/1XxLjds0OOm44zOVCMR-cwCJXnTs3C2x257KpCTxI1Ec/edit Not sure if I managed to make the logical/physical separation clear enough, but it could be a start. Regards, Alex. Personal blog: http://blog

Re: distributed / federated search Solr

2013-01-04 Thread Upayavira
We're not gonna have documentation to explain it. I guess it is more a question of starting a discussion here about how to do it. My thought would be to write an adapter in front of your APIs to make it look like a Solr instance, and fake distributed search. But, to get that to work, you'd need to

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Per Steffensen
It was a very good explanation, Jack! I believe I have heard most of it before, so it is not really new to me. I DO understand that the names "replica" and "replication-factor" CAN be justified, but it requires a long and thorough explanation. And that's the point. A good name for a concept mea

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Darren Govoni
Yes. In that case, core should best be described as a logical solr entity with various "managed" attributes and qualities above the physical layer (sorry, not trying to perpetuate this thread so much). On 01/04/2013 01:55 PM, Mark Miller wrote: Currently a SolrCore is 1:1 with a low level Luce

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Mark Miller
Currently a SolrCore is 1:1 with a low-level Lucene index. There is no reason that needs to always be that way. It's possible that we may at some point add built-in micro sharding support, meaning a SolrCore could have multiple underlying Lucene indexes. Or we may not. - Mark On Jan 4, 2013,

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread darren
Good point. Agree. Sent from my Verizon Wireless 4G LTE Smartphone Original message From: Upayavira Date: To: solr-user@lucene.apache.org Subject: Re: Terminology question: Core vs. Collection vs... Using your terminology, I'd say core is a physical solr term, and index

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Upayavira
Using your terminology, I'd say core is a physical solr term, and index is a physical lucene term. A collection or a shard is a logical solr term. Upayavira On Fri, Jan 4, 2013, at 06:28 PM, darren wrote: > My understanding is core is a logical solr term. Index is a physical > lucene term. A solr

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread darren
I agree. In my opinion, index is a low-level lucene thing. I never say a collection has an index directly. That mixes levels and creates confusion, to me at least. I think the terminology discussed is good. Just some lingering usage inconsistencies.

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Yonik Seeley
On Fri, Jan 4, 2013 at 1:35 PM, Alexandre Rafalovitch wrote: > Hmm. Doesn't that make (logical) index=collection? And (physical) > index=core? Which creates duplication of terminology and at the same time > can cause confusion between highest logical and lowest physical level. That's why I've avo

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Alexandre Rafalovitch
Hmm. Doesn't that make (logical) index=collection? And (physical) index=core? Which creates duplication of terminology and at the same time can cause confusion between highest logical and lowest physical level. Regards, Alex. P.s. Hoping not to start a new terminology war. Personal blog: http:

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread darren
My understanding is core is a logical solr term. Index is a physical lucene term. A solr core is backed by a physical lucene index. One index per core. Solr team can correct me if it's not accurate. :) Original message From: Alex

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Jack Krupansky
The entire collection does have an index - a distributed index - which consists of a Lucene index on each core/replica for the subset of the data in that shard. -- Jack Krupansky -Original Message- From: Alexandre Rafalovitch Sent: Friday, January 04, 2013 1:12 PM To: solr-user@lucen

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Alexandre Rafalovitch
Can I just start by saying that this was AMAZING. :-) When I asked the question, I certainly did not expect this level of detail. And I vote for the cake diagram on the WIKI as well. Perhaps two, with the first one showing the trivial collapsed state of single collection/shard/replica/core. The trivi

RE: Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Markus Jelsma
Well, I hope this won't spoil everything then: https://issues.apache.org/jira/browse/SOLR-4260 I'll continue tests on Monday -Original message- > From:Mark Miller > Sent: Fri 04-Jan-2013 17:54 > To: solr-user@lucene.apache.org > Subject: Re: Solr 4 (CloudSolrServer and LBHttpSolrServer que

Re: indexing cpu utilization

2013-01-04 Thread Uwe Reh
Hi Mark, SOLR-3929 rocks! A nightly build of 4.1 with maxIndexingThreads configured to 24 takes 80% to 100% of the CPU resources :-) Thank you, Otis and Gora "mpstat 10" CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 00 0 13 607 241 234 78 100

Re: Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Mark Miller
I'm going to push *hard* for a Jan release. Woe to those that get in my way :) - Mark On Jan 4, 2013, at 11:37 AM, Shawn Heisey wrote: > On 1/4/2013 8:54 AM, Luis Cappa Banda wrote: >> Any release date estimate, Mark? I heard something about January. I was >> considering using 4.0 for producti

RE: Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Jay Parashar
Thanks Mark. -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Friday, January 04, 2013 9:51 AM To: solr-user@lucene.apache.org Subject: Re: Solr 4 (CloudSolrServer and LBHttpSolrServer question) CloudSolrServer can be used for indexing and is smart about indexing

Re: Solr 4.0 SolrCloud with AWS Auto Scaling

2013-01-04 Thread Bill Au
Thanks for pointing me to Solr's Zookeeper servlet. I will look at the source to see how I can use it to fulfill my needs. Bill On Thu, Jan 3, 2013 at 6:43 PM, Mark Miller wrote: > Technically, you want to make sure zookeeper reports the node as live and > active. > > You could use the same api

Re: Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Shawn Heisey
On 1/4/2013 8:54 AM, Luis Cappa Banda wrote: Any release date estimate, Mark? I heard something about January. I was considering using 4.0 for production, but if the 4.1 release is imminent I could wait a little more. I'm not a committer, but I contribute the occasional patch and keep an eye on t

Re: distributed / federated search Solr

2013-01-04 Thread Oleg Ruchovets
OK, thank you for the answer. Maybe you can point me to documentation or any other source where I can get an idea of how to develop such an extension. Thanks Oleg. On Fri, Jan 4, 2013 at 2:47 PM, Upayavira wrote: > Solr does not support federated search in the form you describe - that > is, to

Re: Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Luis Cappa Banda
Any release date estimate, Mark? I heard something about January. I was considering using 4.0 for production, but if the 4.1 release is imminent I could wait a little more. 2013/1/4 Mark Miller > CloudSolrServer can be used for indexing and is smart about indexing since > it knows the current clus

Re: Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Mark Miller
CloudSolrServer can be used for indexing and is smart about indexing since it knows the current cluster state. For 4.0, I'd use one per collection, because there is a bug around using one for more than one collection that is fixed in the upcoming 4.1. In fact, if you are moving to 4, it's a good i
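A sketch of the one-client-per-collection pattern, so multithreaded code shares a single client for each collection. The client type is a generic placeholder `C`, and `PerCollectionClients` is hypothetical; with SolrJ 4.0 the factory would create a `CloudSolrServer` for the ZooKeeper ensemble and call `setDefaultCollection` with the collection name (the constructor may declare a checked exception, so the lambda would need to wrap it).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical registry keeping exactly one client per collection, created lazily
 * and shared across threads. C is a placeholder for the real client type.
 */
final class PerCollectionClients<C> {
    private final Map<String, C> clients = new ConcurrentHashMap<>();
    private final Function<String, C> factory;

    PerCollectionClients(Function<String, C> factory) {
        this.factory = factory;
    }

    /** Returns the client for this collection, creating it on first use. */
    C forCollection(String collection) {
        return clients.computeIfAbsent(collection, factory);
    }

    /** Number of collections that currently have a client. */
    int size() {
        return clients.size();
    }
}
```

`ConcurrentHashMap.computeIfAbsent` applies the factory at most once per key, even under concurrent access, so two threads asking for the same collection get the same client. The clients still need to be shut down when the application stops.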

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread darren
This is the containment hierarchy as I understand it, but it includes both physical and logical levels. Original message From: darren Date: To: dar...@ontrenet.com,yo...@lucidworks.com,solr-user@lucene.apache.org Subject: Re: Terminology qu

Solr 4 (CloudSolrServer and LBHttpSolrServer question)

2013-01-04 Thread Jay Parashar
Hi, I am trying to migrate to Solr 4 (from 3.6) for a multithreaded/multicollection environment using the Solrj java client. I need some clarification of when to use the Cloud Solr Server vs LBHttpSolrServer. Any help is appreciated. Which one do I use? The CloudSolrServer uses the LB server

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread darren
Actually. Node/collection/shard/replica/core/index Original message From: darren Date: To: yo...@lucidworks.com,solr-user@lucene.apache.org Subject: Re: Terminology question: Core vs. Collection vs... Agreed. But for compl

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread darren
Agreed. But for completeness, can it be node/collection/shard/replica/core? Original message From: Yonik Seeley Date: To: solr-user@lucene.apache.org Subject: Re: Terminology question: Core vs. Collection vs... On Fri, Jan

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Yonik Seeley
On Fri, Jan 4, 2013 at 2:26 AM, Per Steffensen wrote: > Our biggest problem is that we really havent decided once and for all and > made sure to reflect the decision consistently across code and > documentation. As long as we havnt I believe it is still ok to change our > minds. IMO, I *think* it

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread darren
Yes. That's it. It's clear if we separate logical terms from physical terms. A simple cake diagram on the wiki, along with perhaps a UML diagram, will solidify these concepts. Original message From: Jack Krupansky Date: To: solr-user@lu

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Jack Krupansky
I thought about adding Solr core, but it only muddies the water. Yes, it needs to be added, but carefully. In the context of SolrCloud, a Solr core is the underlying representation of a replica. Alternatively, a replica of a shard of a collection is implemented as a Solr core. [Need to factor
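The terms from this thread can be pinned down as a tiny data model. A hypothetical sketch, assuming the 4.x relationships described here: a collection has shards, a shard has replicas, each replica is implemented as a core on some node, and each core is 1:1 with a Lucene index (Mark notes this may not always hold). The record names, including `CloudCollection`, are illustrative and are not Solr classes.

```java
import java.util.List;

// Hypothetical model of the SolrCloud terms from this thread (not Solr classes).

/** A core lives on one node and (in Solr 4.x) owns exactly one Lucene index. */
record Core(String name, String node, String indexDir) {}

/** A replica of a shard is implemented as a core. */
record Replica(String name, Core core) {}

/** A shard is a logical subset of a collection's documents, held by one or more replicas. */
record Shard(String name, List<Replica> replicas) {}

/** A collection is the logical, distributed index: the union of its shards. */
record CloudCollection(String name, List<Shard> shards) {
    /** Number of cores (and therefore Lucene indexes) backing this collection. */
    long coreCount() {
        return shards.stream().mapToLong(s -> s.replicas().size()).sum();
    }
}
```

Logical terms (collection, shard) sit at the top; physical ones (core, index directory on a node) sit at the bottom, which is the separation the thread asks the wiki to make explicit.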

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread darren
This is a good explanation and makes sense. The one inconsistency is referring to a replica of a shard that has no replication. But it's not that big of a problem. If you wove the term 'core' into your writeup below, it would be complete and should be posted on the wiki.

Re: Terminology question: Core vs. Collection vs...

2013-01-04 Thread Jack Krupansky
Replication makes perfect sense even if our explanations so far do not. A shard is an abstraction of a subset of the data for a collection. A replica is an instance of the data of the shard and instances of Solr servers that have indicated a readiness to service queries and updates for the dat

Re: Solr Cloud index refreshes after restart

2013-01-04 Thread Erick Erickson
That is very odd. Have there been any hard commits performed at all? Even if not, there should still be an index directory. Solr will do a full replication if the replica is too far out of date, but that shouldn't create (I don't think) a new index directory unless it's a misleading message. Is th

Re: distributed / federated search Solr

2013-01-04 Thread Upayavira
Solr does not support federated search in the form you describe - that is, to make a query to Solr which solr defers to another search system. There may be ways you could achieve it (Solr is pretty extensible) and such a feature would be a very useful one, but it would take some, likely significan

Re: Solr 3.6.2 or 4.0

2013-01-04 Thread Upayavira
3.6.2 is a maintenance release with bug fixes for existing 3.x users for whom an upgrade to 4.0 is too big a leap at present. 4.0 is the release that will see active development from here on in. If you are starting with a new project, 4.0 seems a reasonable place to start. I'd expect 4.1 to be out

Re: What can we do if one shard's index crash

2013-01-04 Thread Erick Erickson
First, I'm assuming SolrCloud with Zookeeper etc.
1> Don't do anything. If Node A is the leader, the replica for that shard will become the leader.
2> This is a little unclear. There are two cases: a> the leader crashed or b> the replica crashed.
a> no problem, distributed in

carrot2 vs Apache UIMA

2013-01-04 Thread puneet139
Hi Friends, I need some help. I want to implement clustering in my Solr. I have studied both the Carrot2 and Apache UIMA frameworks; can anyone suggest which is better to use, with reasons? Thanks in advance, Puneet Chaturvedi

Re: Solr 3.6.2 or 4.0

2013-01-04 Thread Dikchant Sahi
As someone in the forum correctly said, if all Solr releases were evolutionary, Solr 4.0 is revolutionary. It has lots of improvements over previous releases, like NoSQL features, atomic updates, cloud features and a lot more. I believe Solr 4.0 would be the right migration. Can someone in the for

Solr 3.6.2 or 4.0

2013-01-04 Thread vijeshnair
We are starting a new e-com application this month, for which I am trying to identify the right SOLR release. We were using 3.4 in our previous project, but I have read in multiple blogs and forums about the improvements that SOLR 4 has in terms of efficient memory management, fewer OOMs

Re: SolrCloud and Join Queries

2013-01-04 Thread Per Steffensen
On 1/4/13 9:21 AM, Hassan wrote: Hi, I am considering SolrCloud for our applications but I have run into the limitation of not being able to use Join Queries in distributed searches. Our requirements are the following: - SolrCloud will serve many applications where each application "index" i

RE: MoreLikeThis supporting multiple document IDs as input?

2013-01-04 Thread David Parks
Aha! &mlt=true was the key I hadn't worked out before (I thought it was &qt=mlt that achieved that). Things are looking rosy now, and these results are a perfect fit for my needs. Thanks very much for taking the time to help explain this!! David -Original Message- From: Jack Krupansky [mai
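The MoreLikeThis component is switched on for an ordinary search request with `mlt=true`, and the response then carries a moreLikeThis section with similar documents for each matched document. A sketch that builds such a query string for several input ids; it assumes the uniqueKey field is `id` and that the ids need no query escaping. `MltRequest` is a hypothetical helper, and the `/select` URL the string would be appended to is up to the application.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical builder for a search request that also asks for MoreLikeThis results. */
final class MltRequest {
    private MltRequest() {}

    /** Matches the given ids and requests `count` similar docs per match, based on `similarityFields`. */
    static String queryString(List<String> ids, String similarityFields, int count) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", "id:(" + String.join(" OR ", ids) + ")");
        params.put("mlt", "true");                        // enables the MoreLikeThis component
        params.put("mlt.fl", similarityFields);           // fields used to judge similarity
        params.put("mlt.count", Integer.toString(count)); // similar docs per matched doc
        StringJoiner qs = new StringJoiner("&");
        params.forEach((k, v) -> qs.add(k + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return qs.toString();
    }
}
```

For example, `MltRequest.queryString(List.of("1", "2"), "title,body", 5)` yields `q=id%3A%281+OR+2%29&mlt=true&mlt.fl=title%2Cbody&mlt.count=5`.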

SolrCloud and Join Queries

2013-01-04 Thread Hassan
Hi, I am considering SolrCloud for our applications, but I have run into the limitation of not being able to use Join Queries in distributed searches. Our requirements are the following: - SolrCloud will serve many applications where each application "index" is separate from other applications.