Re: Rerank queries and grouping

2015-07-16 Thread Diego Ceccarelli
Hi Joel,

Thanks for your reply,
Yes, I considered Collapse and Expand [1]. The problem is that I'll deploy it
on a multishard instance and I want to retrieve the top N groups.
I think that Collapse and Expand could have two downsides:

i) It won't guarantee the retrieval of N groups. I could mitigate this by
retrieving a larger number of documents, but I would prefer to avoid that.
ii) It won't guarantee the best document per group: a shard A could
have high-scoring documents in a group G1, and also hold the top-scoring
document D for the group G2. Since each shard returns only its top
documents, I could lose D as head of the group G2 if another
shard returns documents in G2 with a lower score.
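
The concern in point ii) can be sketched with a toy model. This is only an illustration under a simplifying assumption (each shard returns its top k documents by score and the coordinator picks group heads from whatever it receives); it is not Solr's actual merge logic, and the document ids and scores are made up.

```python
from collections import namedtuple

# Hypothetical documents: id, group key and score are invented for illustration.
Doc = namedtuple("Doc", ["doc_id", "group", "score"])

def top_k(docs, k):
    """Return the k highest-scoring documents of one shard."""
    return sorted(docs, key=lambda d: d.score, reverse=True)[:k]

def group_heads(docs):
    """Map each group to its highest-scoring document."""
    heads = {}
    for d in docs:
        if d.group not in heads or d.score > heads[d.group].score:
            heads[d.group] = d
    return heads

# Shard A holds two strong G1 documents plus D, the best document of G2.
shard_a = [Doc("a1", "G1", 9.0), Doc("a2", "G1", 8.0), Doc("D", "G2", 7.0)]
# Shard B holds a weaker G2 document and a G3 document.
shard_b = [Doc("b1", "G2", 5.0), Doc("b2", "G3", 4.0)]

k = 2
merged = top_k(shard_a, k) + top_k(shard_b, k)
heads_seen = group_heads(merged)             # what the coordinator can see
heads_true = group_heads(shard_a + shard_b)  # what a global view would give

print(heads_seen["G2"].doc_id)  # b1
print(heads_true["G2"].doc_id)  # D
```

With k = 2, shard A never sends D, so the merged result reports b1 as the head of G2 even though D scores higher.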

Cheers,
Diego

[1]
https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results

On Thu, Jul 16, 2015 at 2:01 AM, Joel Bernstein  wrote:

> As you've seen RankQueries won't currently have any effect on Grouping
> queries.
>
> A RankQuery can be combined with Collapse and Expand though. You may want
> to review Collapse and Expand and see if it meets your use case.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Jul 15, 2015 at 2:36 PM, Diego Ceccarelli <
> diego.ceccare...@gmail.com> wrote:
>
> > Hi Everyone,
> >
> > I need to use a RankQuery within a grouping [1].
> > I did some experiments with RerankQuery [2]  and solr 4.10.2 and it seems
> > that
> > if you group on a field, the reranking query is completely ignored
> > (on the cloud, and on a single instance).
> > I would expect to see the results in each group reranked using the
> > RerankQuery.
> >
> > I had a look at the grouping code and documentation and,
> > if I correctly understood, the grouping works in two steps:
> >
> > 1) first the top groups are retrieved
> > 2) top documents for each group in the top groups are retrieved.
> >
> > I thought that the collector generated by a RankQuery could be injected
> > in 2), i.e., for each group set a rerank collector... but I'm not sure if
> > this solution is feasible, since the collectors are set in Lucene
> > (AbstractSecondPassGroupingCollector) and a RankQuery is defined in Solr...
> >
> > Any suggestion?
> >
> > Thanks,
> > Diego
> >
> > [1] https://cwiki.apache.org/confluence/display/solr/Result+Grouping
> > [2] https://cwiki.apache.org/confluence/display/solr/Query+Re-Ranking
> >
>


Issue with using createNodeSet in Solr Cloud

2015-07-16 Thread Savvas Andreas Moysidis
Hello There,

I am trying to use the createNodeSet parameter when creating a new
collection but I'm getting an error when doing so.

More specifically, I have four Solr instances running locally in separate
JVMs (127.0.0.1:8983, 127.0.0.1:8984, 127.0.0.1:8985, 127.0.0.1:8986) and a
standalone Zookeeper instance which all Solr instances point to. The four
Solr instances have no collections added to them and are all up and running
(I can access the admin page in all of them).

Now, I want to create a collection on only two of these four instances (
127.0.0.1:8983, 127.0.0.1:8984), but when I hit one instance with the
following URL:

http://localhost:8983/solr/admin/collections?action=CREATE&name=collection_A&numShards=1&replicationFactor=2&maxShardsPerNode=1&createNodeSet=127.0.0.1:8983_solr,127.0.0.1:8984_solr&collection.configName=collection_A

I am getting the following response:



400
3503

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Cannot create collection collection_A. No live Solr-instances among
Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,127.0.0.1:8984_solr

Cannot create collection collection_A. No live Solr-instances among
Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,127.0.0.1:8984_solr
400

Cannot create collection collection_A. No live Solr-instances among
Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,127.0.0.1:8984_solr
400




The instances are definitely up and running (at least the admin console can
be accessed as mentioned) and if I remove the createNodeSet parameter the
collection is created as expected.
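
Since the error says that none of the names in createNodeSet matched a live node, one check that may be worth doing is comparing the requested names, character for character, against the node names the cluster itself reports. A minimal sketch follows; the function name is hypothetical and the live-node list is invented purely for illustration (it is not taken from this cluster).

```python
def unmatched_nodes(create_node_set, live_nodes):
    """Return the createNodeSet entries that are not in the live-node list."""
    requested = [n.strip() for n in create_node_set.split(",") if n.strip()]
    live = set(live_nodes)
    return [n for n in requested if n not in live]

requested = "127.0.0.1:8983_solr,127.0.0.1:8984_solr"

# Invented example: nodes registered under a different host name than requested.
example_live_nodes = ["192.168.1.10:8983_solr", "192.168.1.10:8984_solr"]

print(unmatched_nodes(requested, example_live_nodes))
# ['127.0.0.1:8983_solr', '127.0.0.1:8984_solr']
```

If every requested name comes back as unmatched, the names passed in createNodeSet differ from the registered ones, which would be consistent with the error above.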

Am I missing something obvious or is this a bug?

The exact Solr version I'm using is 4.9.1.

Any pointers would be much appreciated.

Thanks,
Savvas


Using Facet API to get histograms for two keywords?

2015-07-16 Thread Kim Kangmo
Hi lovely Solr masters!

I am using Facet API to get histograms for two keywords.

For each keyword, the histogram calculates the number of documents with the
keyword every hour.


An example of list of documents :

{ q_s : "keyword1", when_dt : "2015-05-27T15:13:00.000Z" }

{ q_s : "keyword2", when_dt : "2015-05-27T16:17:00.000Z" }

{ q_s : "keyword2", when_dt : "2015-05-27T16:18:00.000Z" }

{ q_s : "keyword1", when_dt : "2015-05-27T16:20:00.000Z" }


An example of output histogram using facet:

"keyword1" : {

"2015-05-27T15:00:00.000Z" : "1",

"2015-05-27T16:00:00.000Z" : "1"

}

"keyword2" : {

"2015-05-27T15:00:00.000Z" : "0",

"2015-05-27T16:00:00.000Z" : "2"

}
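
For reference, the expected histogram can be reproduced client-side from the four example documents. This is only a sketch of the bucketing logic in plain Python, not of how Solr computes facets; counts are integers here rather than the strings shown above.

```python
from collections import Counter, defaultdict
from datetime import datetime

docs = [
    {"q_s": "keyword1", "when_dt": "2015-05-27T15:13:00.000Z"},
    {"q_s": "keyword2", "when_dt": "2015-05-27T16:17:00.000Z"},
    {"q_s": "keyword2", "when_dt": "2015-05-27T16:18:00.000Z"},
    {"q_s": "keyword1", "when_dt": "2015-05-27T16:20:00.000Z"},
]

def hourly_histogram(documents):
    """Count documents per keyword and per hour, filling empty buckets with 0."""
    counts = defaultdict(Counter)
    buckets = set()
    for d in documents:
        ts = datetime.strptime(d["when_dt"], "%Y-%m-%dT%H:%M:%S.%fZ")
        bucket = ts.strftime("%Y-%m-%dT%H:00:00.000Z")  # truncate to the hour
        buckets.add(bucket)
        counts[d["q_s"]][bucket] += 1
    # A Counter returns 0 for missing keys, so every keyword gets every bucket.
    return {kw: {b: c[b] for b in sorted(buckets)} for kw, c in counts.items()}

histogram = hourly_histogram(docs)
print(histogram["keyword1"])
print(histogram["keyword2"])
```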


My question is: what is the best practice for running the query once for N
keywords using the Facet JSON API?


The following command, which uses the JSON Request API, successfully
responds with the expected results above, but the facet.date part of the
JSON request is hard for programmers to read.

curl -d @- http://localhost:8983/solr/queries/select
{
  "params": {
    "wt": "json",
    "indent": true,
    "_": 1436772757584,
    "q": "*:*",
    "rows": 0,
    "fq": [
      "{!tag=fq0}q_s:keyword1",
      "{!tag=fq1}q_s:keyword2",
      "when_dt:[2015-05-27T15:00:00.000Z TO 2015-05-28T10:19:04.000Z]"
    ],
    "facet": true,
    "facet.date": [
      "{!ex=fq1 key=keyword1 facet.date.start=2015-05-27T15:00:00.000Z facet.date.end=2015-05-28T10:19:04.000Z facet.date.gap=+1HOURS facet.date.sort=when_dt}when_dt",
      "{!ex=fq0 key=keyword2 facet.date.start=2015-05-27T15:00:00.000Z facet.date.end=2015-05-28T10:19:04.000Z facet.date.gap=+1HOURS facet.date.sort=when_dt}when_dt"
    ]
  }
}
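
One way to keep the facet.date strings manageable for N keywords is to generate them instead of writing them by hand. The sketch below builds the same request body as above for any list of keywords; the function and argument names are hypothetical, and it assumes that each facet excludes the tags of all the other keywords (a comma-separated ex list when there are more than two).

```python
import json

def build_request(keywords, keyword_field, date_field, start, end, gap):
    """Build a JSON Request API body with one tagged fq and one facet.date per keyword."""
    fqs = ["{!tag=fq%d}%s:%s" % (i, keyword_field, kw) for i, kw in enumerate(keywords)]
    fqs.append("%s:[%s TO %s]" % (date_field, start, end))
    facets = []
    for i, kw in enumerate(keywords):
        # Exclude the filters of every other keyword so only this keyword's fq applies.
        excluded = ",".join("fq%d" % j for j in range(len(keywords)) if j != i)
        facets.append(
            "{!ex=%s key=%s facet.date.start=%s facet.date.end=%s "
            "facet.date.gap=%s facet.date.sort=%s}%s"
            % (excluded, kw, start, end, gap, date_field, date_field)
        )
    return {"params": {"wt": "json", "q": "*:*", "rows": 0,
                       "fq": fqs, "facet": True, "facet.date": facets}}

body = build_request(["keyword1", "keyword2"], "q_s", "when_dt",
                     "2015-05-27T15:00:00.000Z", "2015-05-28T10:19:04.000Z", "+1HOURS")
print(json.dumps(body, indent=2))
```

The printed JSON can be piped to the same curl command; only the keyword list changes between requests.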

 To make the complicated part simpler, I decided to use Facet API as shown
below, but it does not return any facet results.

curl -d @- http://localhost:8983/solr/queries/select

{
  "params": {
    "wt": "json",
    "indent": true,
    "_": 1436772757584,
    "q": "*:*",
    "rows": 0,
    "fq": [
      "{!tag=fq0}q_s:keyword1",
      "{!tag=fq1}q_s:keyword2",
      "when_dt:[2015-05-27T15:00:00.000Z TO 2015-05-28T10:19:04.000Z]"
    ]
  },
  "facet": {
    "keyword1": {
      "range": {
        "excludeTags": ["fq1"],
        "field": "when_dt",
        "start": "2015-05-27T15:00:00.000Z",
        "end": "2015-05-28T10:19:04.000Z",
        "gap": "+1HOURS",
        "sort": "when_dt"
      }
    },
    "keyword2": {
      "range": {
        "excludeTags": ["fq0"],
        "field": "when_dt",
        "start": "2015-05-27T15:00:00.000Z",
        "end": "2015-05-28T10:19:04.000Z",
        "gap": "+1HOURS",
        "sort": "when_dt"
      }
    }
  }
}

 Response :

> {
>  "responseHeader":{
>"status":0,
>"QTime":2,
>"params":{
>  "json":"{  \"params\": {\"wt\": \"json\",\"indent\": true,
> \"_\": 1436772757584,\"q\": \"*:*\",\"rows\": 0,\"fq\": [
> \"{!tag=fq0}q_s:keyworkd1\",   \"{!tag=fq1}q_s:keyword2\",
> \"when_dt:[2015-05-27T15:00:00.000Z TO 2015-05-28T10:19:04.000Z]\" ]
> },  \"facet\": {\"keyword1\": {   \"range\": {
> \"excludeTags\": [\"fq1\"],\"field\": \"when_dt\",
> \"start\": \"2015-05-27T15:00:00.000Z\", \"end\":
> \"2015-05-28T10:19:04.000Z\", \"gap\": \"+1HOURS\",
> \"sort\": \"when_dt\"  } },\"keyword2\": {   \"range\": {
>   \"excludeTags\": [\"fq0\"],\"field\": \"when_dt\",
> \"start\": \"2015-05-27T15:00:00.000Z\", \"end\":
> \"2015-05-28T10:19:04.000Z\", \"gap\": \"+1HOURS\",
> \"sort\": \"when_dt\"  } }  }}"}},
>  "response":{"numFound":0,"start":0,"docs":[]
>  },
>  "facets":{
>"count":0}}


Any idea about this?

Thanks in advance. Solr rocks!

- Kangmo


Programmatically find out if node is overseer

2015-07-16 Thread Markus Jelsma
Hello - I need to run a thread on a single instance of a cloud, so I need to find
out whether the current node is the overseer. I know we can already programmatically
find out if this replica is the leader of a shard via isLeader(). I have looked
everywhere but I cannot find an isOverseer. I did find the election stuff but I
am unsure whether that is what I need to use.

Any thoughts?

Thanks!
Markus


Setup cloud collection

2015-07-16 Thread SolrUser2015
Hi, I'm new to solr!

So I downloaded version 5.2 and modified the solr file so that it allows me to
create a 5 node cluster:

> 5 shards and replication factor 3 < 

Now I see that one node is marked as leader for 3 shards.

So my question is, how can 1 node serve requests for 3 shards, wouldn't that be 
uneven distribution of load?  

Regards

Multiple boost queries on a specific field

2015-07-16 Thread bengates
Hello,

I'm trying to use the boost queries for the 1st time and I need some help.
Let's assume my documents have a provider field, which is populated by a
string, e.g. A, B, C, D, E.

I'd like to assign weights to providers: A is ^2.0, B is ^1.5 and the
others are 1.0.

So, if I run the following query:
?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:A^2.0
my first results have provider A.

Let's try another one:

?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:B^1.5
My first results have provider B. Good!

But that's not exactly what I'm looking for (otherwise I'd just have used a
filter query).
What I want is a multiple boost query. So I tried:
?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:(A^2.0 B^1.5)
Then my first results have provider B, which doesn't seem logical.

I tried another syntax:
?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:A^2.0&bq=provider:B^1.5
But nothing changes.
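
Because the boosts depend on user navigation and have to be assembled at query time, it may help to build the query string programmatically so that characters such as ^ and : are URL-encoded consistently. A minimal sketch, assuming one bq parameter per provider; the function name is hypothetical and the field name is taken from the message above.

```python
from urllib.parse import urlencode

def dismax_query(provider_weights):
    """Build a dismax query string with one bq parameter per (provider, weight) pair."""
    params = [("q", ""), ("wt", "json"), ("defType", "dismax"), ("q.alt", "*:*")]
    for provider, weight in provider_weights:
        params.append(("bq", "provider:%s^%s" % (provider, weight)))
    # urlencode keeps repeated keys and percent-encodes reserved characters.
    return "?" + urlencode(params)

query_string = dismax_query([("A", 2.0), ("B", 1.5)])
print(query_string)
# ?q=&wt=json&defType=dismax&q.alt=%2A%3A%2A&bq=provider%3AA%5E2.0&bq=provider%3AB%5E1.5
```

This only constructs the request; it does not change how the boosts are scored.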

Can you help me ?

My 2nd problem is that I would like to give some weight to the newer
documents, but in a range of a month.
That means, if a document named B1 with provider B (weight 1.5) is newer
than a document named A1 with provider A (weight 2.0), I want B1 to rank
above A1 only if the difference between their dates is at least 1
month. Do you know how to do this?

Since my boost logic depends on the user navigation, I have to do this
only at query time.

Thanks for your help,
Ben





Re: Setup cloud collection

2015-07-16 Thread Esther-Melaine Quansah
If you’ve set numShards to 5, then your indexes are split evenly across all 5 
shards and they should all be considered leaders in charge of updating the 
replicas with new information. Could it be the case that 1 of your shards has 3 
replicas and is the leader for that specific shard? What specifically is 
indicating that one node is marked as leader for 3 shards?

Thanks,

Esther Quansah

> On Jul 16, 2015, at 7:51 AM, SolrUser2015  wrote:
> 
> Hi, I'm new to solr!
> 
> So downloaded version 5.2 and modified the solr file so it allows me to 
> create a 5 node cluster:
> 
>> 5 shards and replication factor 3 < 
> 
> Now I see that one node is marked as leader for 3 shards.
> 
> So my question is, how can 1 node serve requests for 3 shards, wouldn't that 
> be uneven distribution of load?  
> 
> Regards



Re: Setup cloud collection

2015-07-16 Thread SolrUser2015
I'm looking at the cloud graph in the admin UI.

The black dots with green indicates same node as leader for three shards out of 
five.

Regards

> On 16 jul 2015, at 14:31, Esther-Melaine Quansah 
>  wrote:
> 
> If you’ve set numShards to 5, then your indexes are split evenly across all 5 
> shards and they should all be considered leaders in charge of updating the 
> replicas with new information. Could it be the case that 1 of your shards has 
> 3 replicas and is the leader for that specific shard? What specifically is 
> indicating that one node is marked as leader for 3 shards?
> 
> Thanks,
> 
> Esther Quansah
> 
>> On Jul 16, 2015, at 7:51 AM, SolrUser2015  wrote:
>> 
>> Hi, I'm new to solr!
>> 
>> So downloaded version 5.2 and modified the solr file so it allows me to 
>> create a 5 node cluster:
>> 
>>> 5 shards and replication factor 3 <
>> 
>> Now I see that one node is marked as leader for 3 shards.
>> 
>> So my question is, how can 1 node serve requests for 3 shards, wouldn't that 
>> be uneven distribution of load?  
>> 
>> Regards
> 


Re: Setup cloud collection

2015-07-16 Thread Shawn Heisey
On 7/16/2015 5:51 AM, SolrUser2015 wrote:
> Hi, I'm new to solr!
> 
> So downloaded version 5.2 and modified the solr file so it allows me to 
> create a 5 node cluster:
> 
>> 5 shards and replication factor 3 < 
> 
> Now I see that one node is marked as leader for 3 shards.
> 
> So my question is, how can 1 node serve requests for 3 shards, wouldn't that 
> be uneven distribution of load?  

SolrCloud will distribute individual queries to different replicas, so
over time the entire cloud will be used.  The leader role shouldn't
affect queries, that role is mostly there for indexing and fault handling.

If you are really concerned about this, you can assign preferred leaders
and then ask Solr to reshuffle them.  I have never used this
functionality.  Here's the documentation on it:

https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders

Thanks,
Shawn



SolrCloud 5.2.1 - collection creation error

2015-07-16 Thread Aaron Gibbons
I'm installing SolrCloud 5.2.1 on 4 Ubuntu 14.04 machines with 3 external
zookeepers.  I've installed the solr machines using Ansible following the
"Taking Solr to Production" steps.

   1. Download 5.2.1
   2. Extract installation script
   3. Run installation script

Then I stop solr and make my configuration changes to the solr.in.sh file
(adding zookeepers) and log4j.properties (recommended changes).  Restart
solr and everything looks good.

The problem I have is that I can't create a collection.  I create the
collection folder in /var/solr/data and tried both the bin script and API
but get the error below. I've tried 5.2.0 also and both Java 7 and 8 with
the same result.

500
47
java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse; local class incompatible: stream classdesc serialVersionUID = 3123208377723774018, local class serialVersionUID = 3945300637328478755
org.apache.solr.common.SolrException: java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse; local class incompatible: stream classdesc serialVersionUID = 3123208377723774018, local class serialVersionUID = 3945300637328478755
    at org.apache.solr.client.solrj.SolrResponse.deserialize(SolrResponse.java:62)
    at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:228)
    at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:168)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:646)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:417)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.InvalidClassException: org.apache.solr.client.solrj.SolrResponse; local class incompatible: stream classdesc serialVersionUID = 3123208377723774018, local class serialVersionUID = 3945300637328478755
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.solr.client.solrj.SolrResponse.deserialize(SolrResponse.java:60)
    ... 27 more
500


Re: Setup cloud collection

2015-07-16 Thread solr . user . 1507
Thanks Shawn, but I don't want to build something in front of SolrCloud to help
Solr assign the leader role in order to distribute the indexing load.

Instead of doing this manual step (rebalance leaders), maybe one host should not
take the leader role of multiple shards for the same collection if the number of
live nodes is equal to the number of shards.

But assuming that when you say it will happen "over time", maybe I'll continue
indexing and see whether the leaders get rebalanced soon.

Regards

> On 16 Jul 2015, at 14:57, Shawn Heisey  wrote:
> 
>> On 7/16/2015 5:51 AM, SolrUser2015 wrote:
>> Hi, I'm new to solr!
>> 
>> So downloaded version 5.2 and modified the solr file so it allows me to 
>> create a 5 node cluster:
>> 
>>> 5 shards and replication factor 3 <
>> 
>> Now I see that one node is marked as leader for 3 shards.
>> 
>> So my question is, how can 1 node serve requests for 3 shards, wouldn't that 
>> be uneven distribution of load?  
> 
> SolrCloud will distribute individual queries to different replicas, so
> over time the entire cloud will be used.  The leader role shouldn't
> affect queries, that role is mostly there for indexing and fault handling.
> 
> If you are really concerned about this, you can assign preferred leaders
> and then ask Solr to reshuffle them.  I have never used this
> functionality.  Here's the documentation on it:
> 
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders
> 
> Thanks,
> Shawn
> 


Re: Setup cloud collection

2015-07-16 Thread Shawn Heisey
On 7/16/2015 7:47 AM, solr.user.1...@gmail.com wrote:
> Thanks Shawn, but don't want to build something in front of Solr cloud to 
> help Solr assign leader role to distribute load of indexing.
>
> Instead of doing this manual step (rebalance leaders) maybe one host should 
> not take the leader role of multiple shards for same collection if the number 
> of live nodes are equal to number of shards.
>
> But assuming that when you say it will happen "over time", Maybe I'll 
> continue indexing and see that leaders will be rebalanced soon.

Unless you have a fairly major event (like Solr restarting or an
operation taking longer than zkClientTimeout) your leaders will never
change.  It's a semi-permanent role.  When a qualifying event happens,
SolrCloud does an election process to determine the leader, but
elections do not happen unless you force them with a REBALANCELEADERS
action or one of several errors occurs.

You don't have to build anything in front of Solr.  You simply have to
assign a preferred leader for each shard, an action that can be done
with an HTTP call in a browser.  I don't think we have anything in the
admin UI to assign preferred leaders ... I will look into it and open an
issue if necessary.
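
As a rough illustration of the HTTP calls mentioned above, the sketch below only builds the two Collections API URLs: one ADDREPLICAPROP call that sets the preferredLeader property on a replica, and one REBALANCELEADERS call. The host, collection, shard and replica names are placeholders, not values from this thread.

```python
from urllib.parse import urlencode

# Placeholder base URL; adjust host and port for a real cluster.
BASE = "http://localhost:8983/solr/admin/collections"

def preferred_leader_url(collection, shard, replica):
    """URL that marks one replica as the preferred leader of its shard."""
    return BASE + "?" + urlencode([
        ("action", "ADDREPLICAPROP"),
        ("collection", collection),
        ("shard", shard),
        ("replica", replica),
        ("property", "preferredLeader"),
        ("property.value", "true"),
    ])

def rebalance_leaders_url(collection):
    """URL that asks Solr to move leadership to the preferred replicas."""
    return BASE + "?" + urlencode([("action", "REBALANCELEADERS"),
                                   ("collection", collection)])

# Hypothetical names for illustration only.
print(preferred_leader_url("mycollection", "shard1", "core_node1"))
print(rebalance_leaders_url("mycollection"))
```

Either URL can be pasted into a browser or fetched with any HTTP client; the first one would be repeated once per shard.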

The thing that I'm saying will happen over time is that all replicas
will be used for queries.  If you send a thousand queries, you'll find
that they will be divided fairly evenly among all replicas.  The fact
that you have one node as leader for three of your shards is not very
much of a big deal, but if you really want to change it, you can do so
with the preferred leader feature.

Thanks,
Shawn



Re: Setup cloud collection

2015-07-16 Thread solr . user . 1507
Thank you, very good explanation.

Regards

> On 16 Jul 2015, at 17:12, Shawn Heisey  wrote:
> 
>> On 7/16/2015 7:47 AM, solr.user.1...@gmail.com wrote:
>> Thanks Shawn, but don't want to build something in front of Solr cloud to 
>> help Solr assign leader role to distribute load of indexing.
>> 
>> Instead of doing this manual step (rebalance leaders) maybe one host should 
>> not take the leader role of multiple shards for same collection if the 
>> number of live nodes are equal to number of shards.
>> 
>> But assuming that when you say it will happen "over time", Maybe I'll 
>> continue indexing and see that leaders will be rebalanced soon.
> 
> Unless you have a fairly major event (like Solr restarting or an
> operation taking longer than zkClientTimeout) your leaders will never
> change.  It's a semi-permanent role.  When a qualifying event happens,
> SolrCloud does an election process to determine the leader, but
> elections do not happen unless you force them with a REBALANCELEADERS
> action or one of several errors occurs.
> 
> You don't have to build anything in front of Solr.  You simply have to
> assign a preferred leader for each shard, an action that can be done
> with an HTTP call in a browser.  I don't think we have anything in the
> admin UI to assign preferred leaders ... I will look into it and open an
> issue if necessary.
> 
> The thing that I'm saying will happen over time is that all replicas
> will be used for queries.  If you send a thousand queries, you'll find
> that they will be divided fairly evenly among all replicas.  The fact
> that you have one node as leader for three of your shards is not very
> much of a big deal, but if you really want to change it, you can do so
> with the preferred leader feature.
> 
> Thanks,
> Shawn
> 


What does replicationFactor really do?

2015-07-16 Thread Jim . Musil
Hi,

In 5.1, we are creating a collection using the Collections API with an initial 
replicationFactor of X. This value is then stored in the state.json file for 
that collection.

If I try to issue ADDREPLICA on this cluster, it throws an error saying that 
there are no live nodes for additional replicas.

If I connect a new solr node to zookeeper and issue an ADDREPLICA call, the 
replica is created and no errors are thrown, but replicationFactor remains at X 
in the state.json file.

Why? What does replicationFactor really mean? It seems like it's being honored 
in some cases and ignored in others.

Thanks for any help you can provide.

Cheers,
Jim




Re: What does replicationFactor really do?

2015-07-16 Thread Shawn Heisey
On 7/16/2015 10:46 AM, Jim.Musil wrote:
> In 5.1, we are creating a collection using the Collections API with an 
> initial replicationFactor of X. This value is then stored in the state.json 
> file for that collection.
>
> If I try to issue ADDREPLICA on this cluster, it throws an error saying that 
> there are no live nodes for additional replicas.
>
> If I connect a new solr node to zookeeper and issue an ADDREPLICA call, the 
> replica is created and no errors are thrown, but replicationFactor remains at 
> X in the state.json file.
>
> Why? What does replicationFactor really mean? It seems like it's being 
> honored in some cases and ignored in others.

I believe what I'm saying below is correct.  Hopefully someone with more
knowledge will speak up if I'm wrong.

If you're not using a shared filesystem (which I think right now is only
HDFS), then the only time replicationFactor is used is at collection
creation time.  It won't affect anything that happens later.

If you ARE using HDFS, then there is a feature called autoAddReplicas
which will detect when your replica count is below replicationFactor and
automatically add more replicas until you're back in compliance.  I know
almost nothing about this feature.  Here is the issue where it was added
and the page in the reference guide where the feature is mentioned:

https://issues.apache.org/jira/browse/SOLR-5656
https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS

Thanks,
Shawn



serious JSON Facet bug

2015-07-16 Thread Yonik Seeley
To anyone using the JSON Facet API in released Solr versions:
I discovered a serious memory leak while doing performance benchmarks
(see http://yonik.com/facet_performance/ for some of the early results).

Assuming you're in the evaluation / development phase of your project,
I'd recommend using a recent developer snapshot for now:
https://builds.apache.org/job/Solr-Artifacts-5.x/lastSuccessfulBuild/artifact/solr/package/

The fix (and performance improvements) will also be in the next Solr
release (5.3) of course.

-Yonik


Re: Setup cloud collection

2015-07-16 Thread Erick Erickson
Piling on to Shawn's comments. Leadership is a very misunderstood
role when people start using SolrCloud, and it often gets conflated
with the old "master" role in master/slave.

There is, indeed, a small additional bit of processing that goes on on
the leader node that's not done on replicas. But the REBALANCELEADERS
code was put in place to handle situations where 100+ leaders happened
to be on the _same_ node. It took many tens of leaders being on a node
for the additional work imposed by being a leader to be noticed in a very
demanding environment.

Indexing is done _both_ on the leader and the replicas, so the workload
for indexing isn't substantially different. And, as Shawn says querying is
done on all replicas by a software load balancer, although you can reasonably
put a HW load balancer in front of the whole thing too.

So by and large you can completely ignore leaders that aren't evenly
distributed. The additional load isn't worth the headache of trying to control
it. And it will change as you bounce Solr servers: leadership is assigned
to the node that contains the first replica of a shard to come up.

Best,
Erick

On Thu, Jul 16, 2015 at 8:23 AM,   wrote:
> Thank you, very good explanation.
>
> Regards
>
>> On 16 Jul 2015, at 17:12, Shawn Heisey  wrote:
>>
>>> On 7/16/2015 7:47 AM, solr.user.1...@gmail.com wrote:
>>> Thanks Shawn, but don't want to build something in front of Solr cloud to 
>>> help Solr assign leader role to distribute load of indexing.
>>>
>>> Instead of doing this manual step (rebalance leaders) maybe one host should 
>>> not take the leader role of multiple shards for same collection if the 
>>> number of live nodes are equal to number of shards.
>>>
>>> But assuming that when you say it will happen "over time", Maybe I'll 
>>> continue indexing and see that leaders will be rebalanced soon.
>>
>> Unless you have a fairly major event (like Solr restarting or an
>> operation taking longer than zkClientTimeout) your leaders will never
>> change.  It's a semi-permanent role.  When a qualifying event happens,
>> SolrCloud does an election process to determine the leader, but
>> elections do not happen unless you force them with a REBALANCELEADERS
>> action or one of several errors occurs.
>>
>> You don't have to build anything in front of Solr.  You simply have to
>> assign a preferred leader for each shard, an action that can be done
>> with an HTTP call in a browser.  I don't think we have anything in the
>> admin UI to assign preferred leaders ... I will look into it and open an
>> issue if necessary.
>>
>> The thing that I'm saying will happen over time is that all replicas
>> will be used for queries.  If you send a thousand queries, you'll find
>> that they will be divided fairly evenly among all replicas.  The fact
>> that you have one node as leader for three of your shards is not very
>> much of a big deal, but if you really want to change it, you can do so
>> with the preferred leader feature.
>>
>> Thanks,
>> Shawn
>>


Re: SolrCloud 5.2.1 - collection creation error

2015-07-16 Thread Erick Erickson
It looks at a glance like you're in "Jar hell" and have one or more jar
files from "somewhere else" in your classpath, possibly a jar file from
an older Solr or one of the libraries.

Best,
Erick
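
One way to look for the kind of conflict described above is to list the jar file names on the classpath and flag any artifact that appears in more than one version. The sketch below works on a plain list of file names; the names in the example are invented, and the name-version.jar pattern is an assumption that will not fit every jar.

```python
import re
from collections import defaultdict

# Assumes jars are named "<artifact>-<version>.jar" with a version starting with a digit.
JAR_RE = re.compile(r"^(.+?)-(\d[\w.\-]*)\.jar$")

def conflicting_jars(jar_names):
    """Return artifacts that appear with more than one version."""
    versions = defaultdict(set)
    for name in jar_names:
        match = JAR_RE.match(name)
        if match:
            versions[match.group(1)].add(match.group(2))
    return {artifact: sorted(v) for artifact, v in versions.items() if len(v) > 1}

# Invented listing for illustration only.
jars = ["solr-solrj-5.2.1.jar", "solr-solrj-4.10.2.jar",
        "solr-core-5.2.1.jar", "log4j-1.2.17.jar"]
conflicts = conflicting_jars(jars)
print(conflicts)  # {'solr-solrj': ['4.10.2', '5.2.1']}
```

On a real machine the list could come from os.listdir over each classpath directory.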



Re: Programmatically find out if node is overseer

2015-07-16 Thread Erick Erickson
Look at the overseer election ephemeral nodes in ZK; the first one in
line is the current overseer.
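
A minimal sketch of the "first one in line" check, in Python. It assumes
the election children live under /overseer_elect/election and are named
"<session-id>-<node-name>-n_<sequence>"; that layout and the sample names
in the usage note are assumptions, not something confirmed in this thread.
Fetching the children from ZooKeeper is left out, so the functions only
take the list of child names.

```python
import re

# Assumed child-name layout: "<session-id>-<node-name>-n_<sequence>".
_SEQUENCE_SUFFIX = re.compile(r"-n_(\d+)$")


def election_sequence(child_name):
    """Return the numeric election sequence at the end of a child name."""
    match = _SEQUENCE_SUFFIX.search(child_name)
    if match is None:
        raise ValueError("unexpected election node name: %r" % child_name)
    return int(match.group(1))


def first_in_line(children):
    """Return the election child with the lowest sequence number."""
    return min(children, key=election_sequence)


def node_name_of(child_name):
    """Strip the session id prefix and the sequence suffix."""
    without_sequence = _SEQUENCE_SUFFIX.sub("", child_name)
    return without_sequence.split("-", 1)[1]
```

With a client such as kazoo, the list would come from something like
get_children("/overseer_elect/election"); comparing
node_name_of(first_in_line(children)) with the local node name then tells
you whether this node is first in line.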

Best,
Erick

On Thu, Jul 16, 2015 at 3:42 AM, Markus Jelsma
 wrote:
> Hello - I need to run a thread on a single instance of a cloud, so I need to
> find out if the current node is the overseer. I know we can already
> programmatically find out if this replica is the leader of a shard via
> isLeader(). I have looked everywhere but I cannot find an isOverseer. I did
> find the election stuff but I am unsure if that is what I need to use.
>
> Any thoughts?
>
> Thanks!
> Markus


Re: Multiple boost queries on a specific field

2015-07-16 Thread Erick Erickson
Why are you using q.alt? That uses much different query parsing
logic that I believe bypasses the dismax stuff. Just use q=*:*.

*:* also short-circuits most of the scoring since there's nothing to score
there; try again with real terms in q.

As to your second query, see
https://wiki.apache.org/solr/FunctionQuery#Date_Boosting
for a way to make more recent documents bubble to the top. It doesn't quite do
what you're asking, but it might be "close enough".
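
A rough sketch of how such a request could be assembled, in Python. It
keeps defType=dismax from the original mail, puts real terms in q, sends
one bq parameter per provider, and optionally adds the recip() date
function from the wiki page above as a bf parameter. The search term
"laptop" and the field name published_date in the usage note are made up
for illustration, and qf and the other dismax parameters are left to
whatever the request handler already defines.

```python
from urllib.parse import urlencode


def build_boost_query(terms, provider_boosts, date_field=None):
    """Build a dismax query string with one bq parameter per provider."""
    params = [("q", terms), ("defType", "dismax"), ("wt", "json")]
    for provider, weight in provider_boosts.items():
        # Each boost is sent as its own bq parameter.
        params.append(("bq", "provider:%s^%s" % (provider, weight)))
    if date_field is not None:
        # Recency function as given on the FunctionQuery wiki page;
        # 3.16e-11 is roughly 1 / (milliseconds in a year).
        params.append(("bf", "recip(ms(NOW,%s),3.16e-11,1,1)" % date_field))
    return urlencode(params)
```

For example, build_boost_query("laptop", {"A": 2.0, "B": 1.5},
date_field="published_date") yields a query string with two bq parameters
and one bf parameter. Boosts are only added into the score, so a strong
match on the main query can still outrank a boosted provider.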

Best,
Erick

On Thu, Jul 16, 2015 at 4:55 AM, bengates  wrote:
> Hello,
>
> I'm trying to use the boost queries for the 1st time and I need some help.
> Let's assume my documents have a /provider/ field, which is populated by a
> string, e.g. A, B, C, D, E.
>
> I'd like to assign weight to providers. A is /^2.0/, B is /^1.5/ and the
> others are 1.0.
>
> So, if I run the following query :
> /?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:A^2.0/
> My first results have provider A.
>
> Let's try another one :
>
> ?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:B^1.5
> My first results have provider B. Good!
>
> But that's not exactly what I'm looking for (otherwise I'd have just used a
> filter query).
> What I want is a /multiple/ boost query. So I tried :
> /?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:(A^2.0 B^1.5)/
> Then my first results have provider B. It's not logical.
>
> I tried another syntax :
> /?q=&wt=json&defType=dismax&q.alt=*:*&bq=provider:A^2.0&bq=provider:B^1.5/
> But nothing changes.
>
> Can you help me ?
>
> My 2nd problem is that I would like to give some weight to the newer
> documents, but in a range of a month.
> That means, if a document named B1 with provider B (weighs 1.5) is newer
> than a document named A1 with provider A (weighs 2.0), I want that B1 gets
> on top of A1 /only/ if the difference between their dates is /at least 1
> month/. Do you know how to do this ?
>
> Since my boost logic depends on the user navigation, I have to do this
> at query time only.
>
> Thanks for your help,
> Ben


Re: Issue with using createNodeSet in Solr Cloud

2015-07-16 Thread Erick Erickson
There were a couple of cases where the "no live servers" was being
returned when the error was something completely different. Does the
Solr log show something more useful? And are you sure you have a
configset named collection_A?

'cause this works (admittedly on 5.x) fine for me, and I'm quite sure
there are bunches of automated tests that would be failing so I
suspect it's just a misleading error being returned.
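
If the log shows nothing more useful, one thing that could be ruled out is
a mismatch between the names passed in createNodeSet and the names the
nodes registered under /live_nodes in ZooKeeper. A minimal Python sketch of
that comparison follows; whether this is the cause here is only a guess,
and the live node names in the usage note are hypothetical.

```python
def missing_from_live_nodes(create_node_set, live_nodes):
    """Return the createNodeSet entries that are not registered as live.

    create_node_set is the comma-separated parameter value; live_nodes is
    the list of child names read from /live_nodes.
    """
    requested = [name.strip() for name in create_node_set.split(",")
                 if name.strip()]
    live = set(live_nodes)
    return [name for name in requested if name not in live]
```

For instance, if the nodes had registered as "192.168.1.10:8983_solr" and
"192.168.1.10:8984_solr", then the value
"127.0.0.1:8983_solr,127.0.0.1:8984_solr" would come back entirely as
missing, which would be consistent with the error text.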

Best,
Erick

On Thu, Jul 16, 2015 at 2:22 AM, Savvas Andreas Moysidis
 wrote:
> Hello There,
>
> I am trying to use the createNodeSet parameter when creating a new
> collection but I'm getting an error when doing so.
>
> More specifically, I have four Solr instances running locally in separate
> JVMs (127.0.0.1:8983, 127.0.0.1:8984, 127.0.0.1:8985, 127.0.0.1:8986) and a
> standalone Zookeeper instance which all Solr instances point to. The four
> Solr instances have no collections added to them and are all up and running
> (I can access the admin page in all of them).
>
> Now, I want to create a collection in only two of these four instances (
> 127.0.0.1:8983, 127.0.0.1:8984) but when I hit one instance with the
> following URL:
>
> http://localhost:8983/solr/admin/collections?action=CREATE&name=collection_A&numShards=1&replicationFactor=2&maxShardsPerNode=1&createNodeSet=127.0.0.1:8983_solr,127.0.0.1:8984_solr&collection.configName=collection_A
>
> I am getting the following response:
>
> 
> 
> 400
> 3503
> 
> 
> org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
> Cannot create collection collection_A. No live Solr-instances among
> Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,127.0.0.1:8984
> _solr
> 
> 
> 
> Cannot create collection collection_A. No live Solr-instances among
> Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,127.0.0.1:8984
> _solr
> 
> 400
> 
> 
> 
> Cannot create collection collection_A. No live Solr-instances among
> Solr-instances specified in createNodeSet:127.0.0.1:8983_solr,127.0.0.1:8984
> _solr
> 
> 400
> 
> 
>
>
> The instances are definitely up and running (at least the admin console can
> be accessed as mentioned) and if I remove the createNodeSet parameter the
> collection is created as expected.
>
> Am I missing something obvious or is this a bug?
>
> The exact Solr version I'm using is 4.9.1.
>
> Any pointers would be much appreciated.
>
> Thanks,
> Savvas


Re: Programmatically find out if node is overseer

2015-07-16 Thread Shai Erera
An easier way (IMO) and more 'official' is to use the CLUSTERSTATUS (
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api18)
or OVERSEERSTATUS (
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api17)
API.

The OVERSEERSTATUS returns a 'leader' item which says who is the overseer,
at least as far as I understand. Not sure what is returned in case there
are multiple nodes with the overseer role.

The CLUSTERSTATUS returns an 'overseer' item with all nodes that have the
overseer role assigned. I'm usually using that API to query for the status
of my Solr cluster.
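
A small Python sketch of how the two responses could be checked once they
are parsed from JSON. The 'leader' key is the item mentioned above; the
assumption that CLUSTERSTATUS nests the overseer list under
cluster -> roles -> overseer comes from my reading of the documented
example output and should be verified against a real response. The base
URL and node names in the usage note are placeholders.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action):
    """Build a Collections API URL that asks for a JSON response."""
    query = urlencode({"action": action, "wt": "json"})
    return "%s/admin/collections?%s" % (base_url, query)


def is_overseer(overseer_status, node_name):
    """True if the parsed OVERSEERSTATUS response names node_name as leader."""
    return overseer_status.get("leader") == node_name


def overseer_role_nodes(cluster_status):
    """Nodes with the overseer role in a parsed CLUSTERSTATUS response.

    Assumes the list sits under cluster -> roles -> overseer.
    """
    roles = cluster_status.get("cluster", {}).get("roles", {})
    return roles.get("overseer", [])
```

Usage would be to fetch
collections_api_url("http://localhost:8983/solr", "OVERSEERSTATUS"), parse
the body as JSON, and pass the result to is_overseer() together with the
local node name, e.g. "127.0.0.1:8983_solr".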

Shai

On Fri, Jul 17, 2015 at 3:55 AM, Erick Erickson 
wrote:

> look at the overseer election ephemeral node in ZK, the first one in
> line is the current overseer.
>
> Best,
> Erick
>
> On Thu, Jul 16, 2015 at 3:42 AM, Markus Jelsma
>  wrote:
> > Hello - I need to run a thread on a single instance of a cloud, so I need
> to find out if the current node is the overseer. I know we can already
> programmatically find out if this replica is the leader of a shard via
> isLeader(). I have looked everywhere but I cannot find an isOverseer. I did
> find the election stuff but I am unsure if that is what I need to use.
> >
> > Any thoughts?
> >
> > Thanks!
> > Markus
>