Re: fuzzy search issue with PatternTokenizer Factory

2013-04-22 Thread meghana
Jack, 

the regex will split tokens on anything except letters, numbers, '&', '-',
apostrophes, and ns: (where n is a number from 0 to 9999, e.g. 4323s:)

Let's say, for example, my text is like below.

*this is nice* day & sun 53s: is risen. *

Then the pattern tokenizer should create tokens as

*this is nice day & sun is risen*
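
A quick way to sanity-check the pattern outside Solr is plain java.util.regex;
this is only a sketch, not the tokenizer itself, but PatternTokenizerFactory's
default (group="-1") mode splits on the pattern the same way and discards
empty tokens:

import java.util.ArrayList;
import java.util.List;

public class PatternCheck {
    public static void main(String[] args) {
        String text = "this is nice* day & sun 53s: is risen. *";
        List<String> tokens = new ArrayList<String>();
        for (String t : text.split("[^a-zA-Z0-9&\\-']|\\d{0,4}s:")) {
            if (!t.isEmpty()) {   // PatternTokenizer drops zero-length tokens
                tokens.add(t);
            }
        }
        System.out.println(tokens);
        // prints: [this, is, nice, day, &, sun, is, risen]
    }
}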

The pattern seems to be working fine with different texts.

Also, for fuzzy search *worde~1*, I have checked the results returned with
the PatternTokenizerFactory; they include terms with punctuation marks
attached, like '*WORDS,*', *WORDED*, etc...

One more weird thing is that all the results are in uppercase letters; no
lowercase results come back, although it does not return all of the
uppercase results either.

But I am not sure why, after changing to this tokenizer, fuzzy search is
not working properly.


Jack Krupansky-2 wrote
> Give us some examples of tokens that you are expecting that pattern to 
> tokenize. And express the pattern in simple English as well. Show some 
> actual input data.
> 
> I suspect that Solr is working fine - but you may not have precisely 
> specified your pattern. But we don't know what your pattern is supposed to 
> recognize.
> 
> Maybe some of your previous hits had punctuation adjacent to the terms 
> that your pattern doesn't recognize.
> 
> And use the Solr Admin UI Analysis page to see how your sample input data
> is 
> analyzed.
>
> One other thing... without a "group", the pattern specifies what delimiter 
> sequence will "split" the rest of the input into tokens. I suspect you 
> didn't mean this.
> 
> -- Jack Krupansky
> 
> -Original Message- 
> From: meghana
> Sent: Friday, April 19, 2013 9:01 AM
> To: solr-user@.apache
> Subject: fuzzy search issue with PatternTokenizer Factory
> 
> I'm using Solr 4.2. I have changed my text field definition to use
> solr.PatternTokenizerFactory instead of solr.StandardTokenizerFactory, and
> changed my schema definition as below:
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.PatternTokenizerFactory"
>        pattern="[^a-zA-Z0-9&\-']|\d{0,4}s:" />
>     <filter class="solr.StopFilterFactory"
>        words="stopwords.txt" enablePositionIncrements="false" />
>     ...
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.PatternTokenizerFactory"
>        pattern="[^a-zA-Z0-9&\-']|\d{0,4}s:" />
>     <filter class="solr.StopFilterFactory"
>        words="stopwords_extra_query.txt" enablePositionIncrements="false" />
>     <filter class="solr.SynonymFilterFactory" synonyms="..."
>        ignoreCase="true" expand="true"/>
>   </analyzer>
> </fieldType>
> 
> 
> After doing so, fuzzy search does not seem to work properly, the way it was
> working before.
> 
> I'm searching with the search term: worde~1
> 
> Before, the search was returning around 300 records, but now it returns
> only 5 records. I'm not sure what the issue can be.
> 
> Can anybody help me make it work!
> 
> 
> 
> 
> 
> 
> 







Re: ComplexPhraseQParserPlugin not working with solr 4.2

2013-04-22 Thread Ahmet Arslan

Hi ilay,

You cannot load this plugin via lib directives; you need to embed the
jar into the solr.war file (by unzipping and re-zipping it).
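
Roughly something like this (a sketch only; paths and the jar name are
examples, and jar's update mode does the unzip/zip dance in one step):

cd /path/containing/solr.war
mkdir -p WEB-INF/lib
cp /path/to/ComplexPhrase-4.2.1.jar WEB-INF/lib/
jar uf solr.war WEB-INF/lib/ComplexPhrase-4.2.1.jar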

There should be a ReadMe file inside the latest attachment in Jira.



-- On Sat, 4/20/13, ilay raja  wrote:

> From: ilay raja 
> Subject: ComplexPhraseQParserPlugin not working with solr 4.2
> To: solr-user@lucene.apache.org, solr-...@lucene.apache.org
> Date: Saturday, April 20, 2013, 8:00 PM
> Hi
> 
>   I followed the steps given in
> https://issues.apache.org/jira/browse/SOLR-1604 for
> integrating the plugin.
> But it is not picking up the classpath correctly, though I added
> the following
> lines to solrconfig.xml:
> <lib dir="..." regex="ComplexPhrase-\d.*\.jar" />
> <queryParser name="complexphrase"
>   class="org.apache.solr.search.ComplexPhraseQParserPlugin" />
> 
> I have the compiled jar in solr-home/dist/
> 
> The exception is as below - unable to create core
> mainindex.
> Is it an issue with using this plugin in 4.2? Does it work
> with 4.0 ?
> 
> 
>      at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
> java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at
> java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
>         at
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
>         at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
>         at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
>         at
> org.apache.solr.core.SolrCore.<init>(SolrCore.java:619)
>         at
> org.apache.solr.core.SolrCore.<init>(SolrCore.java:806)
> org.apache.solr.common.SolrException: Error loading class
> 'org.apache.solr.search.ComplexPhraseQParserPlugin'
> SEVERE: Unable to create core: mainindex
>


ranking score by fields

2013-04-22 Thread Каскевич Александр
Hi.
I want to do what the subject says, but I don't know exactly how to do it.
Example.
I have an index with field1, field2, field3.
I make a query like:
(field1:apache solr) OR (field2:apache solr) OR (field3:apache solr)
And I want to know: was this doc found by field1, by field2, or by field3?

I tried to make it like this: (field1:apache solr)^100 OR (field2:apache solr)^10 OR
(field3:apache solr)^1
But the problem is that I don't know the range, minimum and maximum value of the score
for each field.
With other types of similarities (BM25 or others) it is the same situation.
I can't find information about this in the manual.

Also, I tried to use Relevance Functions, e.g. "termfreq", but it works only with
terms, not with phrases like "apache solr".

Maybe I am missing something, or do you have another idea how to do this?
And also, I am not a Java programmer, so the best way for me is not to write any
plugins for Solr.

Thanks.
Alex.


RE: external values source

2013-04-22 Thread Maciej Liżewski
Hi Timothy,

Thank you for your answer - it is really helpful. Just to clarify - when using a
ValueSource, the flow is something like this:
- user sends query
- Solr calls the ValueSource to prepare values for every document (this part is
cached in the ExternalFileField implementation, I guess)
- Solr runs the query

And is the above flow valid in every use case of ValueSource? There are no
pre-calculated values, etc. (just asking to make it clear)? What caching
scenario is recommended here to make sure you won't end up with a different
cached entry for every query (I think I would follow the example of
ExternalFileField)?

Another thing is that in most cases the array of values created in this process is
rather sparse... so I was wondering whether there are other solutions to store them
in association with the documents index...

--
Maciej Liżewski


-Original Message-
From: Timothy Potter [mailto:thelabd...@gmail.com] 
Sent: Saturday, April 20, 2013 2:02 AM
To: solr-user@lucene.apache.org
Subject: Re: external values source

Hi Maciek,

I think a custom ValueSource is definitely what you want because you need to 
compute some derived value based on an indexed field and some external value.

The trick is figuring out how to make the lookup to the external data very, very
fast. Here's a rough sketch of what we do:

We have a table in a database that contains a numeric value for a user and an
organization, queried with something like:

select num from table where userId='bob' and orgId=123 (similar to what you
stated in question #4)

On the Solr side, documents are indexed with a user_id_s field, which is half of
what I need to do my lookup. The orgId is determined by the Solr client at
query construction time, so it is passed to my custom ValueSource (aka function)
in the query. In our app, users can be associated with many different orgIds,
and the associations change frequently, so we can't index the association.

To do the lookup to the database, we have a custom ValueSource, something like: 
dbLookup(user_id_s, 123)

(note: user_id_s is the name of the field holding my userID values in the index 
and 123 is the orgId)

Behind the scenes, the ValueSource will have access to the user_id_s field 
values using FieldCache, something like:

final BinaryDocValues dv = FieldCache.DEFAULT.getTerms(reader.reader(), "user_id_s");

This gives us fast access to the user_id_s value for any given doc (question #1
above). So now we can return an IntDocValues instance by doing:

@Override
public FunctionValues getValues(Map context, AtomicReaderContext reader) throws IOException {
    final BytesRef br = new BytesRef();
    final BinaryDocValues dv = FieldCache.DEFAULT.getTerms(reader.reader(), fieldName);
    return new IntDocValues(this) {
        @Override
        public int intVal(int doc) {
            dv.get(doc, br);
            if (br.length == 0)
                return 0;

            final String user_id_s = br.utf8ToString(); // the indexed userID for doc
            int val = 0;
            // todo: do custom lookup with orgID and user_id_s to compute int value for doc
            return val;
        }
    };
}

In this code, fieldName is set in the constructor (not shown) by parsing it out 
of the parameters, something like:

this.fieldName = ((org.apache.solr.schema.StrFieldSource) source).getField();

The user_id_s field comes into your ValueSource as a StrFieldSource (or 
whatever type you use) ... here is how the ValueSource gets constructed at 
query time:

public class MyValueSourceParser extends ValueSourceParser {
    public void init(NamedList namedList) {}

    public ValueSource parse(FunctionQParser fqp) throws SyntaxError {
        return new MyValueSource(fqp.parseValueSource(), fqp.parseArg());
    }
}

There is one instance of your ValueSourceParser created per core. The parse 
method gets called for every query that uses the ValueSource.
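
To round this out, here is a minimal skeleton of what MyValueSource itself
might look like. Tim doesn't show the class, so its shape, the placeholder
getValues body, and the equals/hashCode/description choices below are
assumptions, not his code:

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.IntDocValues;
import org.apache.solr.schema.StrFieldSource;

public class MyValueSource extends ValueSource {
    private final String fieldName;
    private final String orgId;

    public MyValueSource(ValueSource source, String orgId) {
        // the first argument arrives as a StrFieldSource, as described above
        this.fieldName = ((StrFieldSource) source).getField();
        this.orgId = orgId;
    }

    @Override
    public FunctionValues getValues(Map context, AtomicReaderContext reader) throws IOException {
        // real body: the FieldCache/IntDocValues snippet shown earlier in this mail
        return new IntDocValues(this) {
            @Override
            public int intVal(int doc) {
                return 0; // placeholder
            }
        };
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MyValueSource)) return false;
        MyValueSource other = (MyValueSource) o;
        return fieldName.equals(other.fieldName) && orgId.equals(other.orgId);
    }

    @Override
    public int hashCode() {
        return 31 * fieldName.hashCode() + orgId.hashCode();
    }

    @Override
    public String description() {
        return "dbLookup(" + fieldName + "," + orgId + ")";
    }
}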

At query time, I might use the ValueSource to return this computed value in my 
fl list, such as:

fl=id,looked_up:dbLookup(user_id_s,123),...

Or to sort by:

sort=dbLookup(user_id_s,123) desc

The data in our table doesn't change that frequently, so we export it to a flat 
file in S3 and our custom ValueSource downloads from S3, transforms it into an 
in-memory HashMap for fast lookups. We thought about just issuing a query to 
load the data from the db directly but we have many nodes and the query is 
expensive and result set is large so we didn't want to hammer our database with 
N Solr nodes querying for the same data at roughly the same time. So we do it 
once and post the compressed results to a shared location. The data in the 
table is "sparse" as compared to the number of documents and userIds we have.

We simply poll S3 for changes every few minutes, which is good enough for us. 
This happens from many nodes in a large Solr Cloud cluster running in EC2 so S3 
works well for us as a distribution mechanism.

Re: Max http connections in CloudSolrServer

2013-04-22 Thread J Mohamed Zahoor

On 18-Apr-2013, at 9:43 PM, Shawn Heisey  wrote:

> Are you using the Jetty included with Solr, or a Jetty installed separately?  


I am using the Jetty that comes with Solr.


> The Jetty included with Solr has a maxThreads value of 10000 in its config.  
> The default would be closer to 200, and a single request from a Cloud client 
> likely uses multiple Jetty threads.

The configured maxThreads is 10000 and minThreads is 10.
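
(For reference, that corresponds to lines in the example's etc/jetty.xml
thread pool configuration roughly like the following; the exact layout may
differ between releases:

<Set name="minThreads">10</Set>
<Set name="maxThreads">10000</Set>
)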


./zahoor

Re: Pros and cons of using RAID or different RAIDS?

2013-04-22 Thread Toke Eskildsen
On Mon, 2013-04-22 at 02:04 +0200, Shawn Heisey wrote:
> Aside from cost, the main reason that I have not seriously investigated
> SSD drives is because I have not come across a solution for any level of
> RAID (even RAID1) with SSDs that exposes TRIM to the operating system.
> Without reliable TRIM support, an SSD solution is not viable for a
> long-term setup.

Why not? Enterprise-oriented benchmarks start by hammering the drives
until they are "fragmented" enough that performance does not suffer any
more from subsequent writes. Even in that state they have vastly
superior latency, compared to spinning drives.

- Toke Eskildsen, State and University Library, Denmark



Re: ComplexPhraseQParserPlugin not working with solr 4.2

2013-04-22 Thread ilay raja
I was able to solve the previous problem of not loading
ComplexPhraseQParserPlugin. Still I am unable to run this with
defType=complexphrase; I get:
java.lang.NoSuchMethodError:
org.apache.solr.search.QueryParsing.getQueryParserDefaultOperator(Lorg/apache/solr/schema/IndexSchema;Ljava/lang/String;)Lorg/apache/lucene/queryparser/classic/QueryParser$Operator;

Is there an issue with running ComplexPhraseQParserPlugin (the 4.0 jar)
against Solr 4.2?

On Sat, Apr 20, 2013 at 10:30 PM, ilay raja  wrote:

> Hi
>
>   I followed the steps given in
> https://issues.apache.org/jira/browse/SOLR-1604 for integrating the
> plugin.
> But it is not picking up the classpath correctly, though I added the following
> lines to solrconfig.xml:
> <lib dir="..." regex="ComplexPhrase-\d.*\.jar" />
> <queryParser name="complexphrase"
>   class="org.apache.solr.search.ComplexPhraseQParserPlugin" />
>
> I have the compiled jar in solr-home/dist/
>
> The exception is as below - unable to create core mainindex.
> Is it an issue with using this plugin in 4.2? Does it work with 4.0 ?
>
>
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> at
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
> at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
> at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:619)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:806)
> org.apache.solr.common.SolrException: Error loading class
> 'org.apache.solr.search.ComplexPhraseQParserPlugin'
> SEVERE: Unable to create core: mainindex
>
>


Error creating collection

2013-04-22 Thread yriveiro
I get this exception when I try to create a new collection. Does anyone have
any idea what's going on?

org.apache.solr.common.SolrException: Error CREATEing SolrCore 'RPS_12':
Could not get shard_id for core: RPS_12
coreNodeName:192.168.20.48:8983_solr_RPS_12



-
Best regards


Severe errors in log

2013-04-22 Thread yriveiro
I have got this in my logs. What does it mean?

ConcurrentLRUCache was not destroyed prior to finalize(), indicates a bug
-- POSSIBLE RESOURCE LEAK!!!



-
Best regards


The overseer is stucks

2013-04-22 Thread yriveiro
Hi,

My overseer has enqueued more than 1 task and apparently is stuck.
Is there any way to force it to process the enqueued tasks? A screenshot of
the overseer queue is here.



-
Best regards

Stats facet on int/tint fields

2013-04-22 Thread vinothkumar raman
I have a schema like this






I wanted to find the average price faceted on cat, so I was using the
stats facet to get the average on the field like this:
http://solr-serv/solr/latest/select?q=*%3A*&wt=xml&indent=true&stats=true&rows=0&stats.field=price&stats.facet=cat

Which throws an exception like this
org.apache.solr.common.SolrException: Server at
http://solr-serv/solr/latest returned non ok status:500,
message:Server Error at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:169)
at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:135)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166) at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

When I look at the logs, this is all I get:

null:java.lang.NumberFormatException: For input string: "`=?"


But when I try with stats.facet=cat_name it works perfectly fine. It
doesn't work with any other int/tint field.

I am not sure what's really wrong with my query.

(Dropping it to the dev list too, in case it's a bug.)


PS: Not sure if it's a bug so sending it to the dev mailing list too.


Re: Where to use replicationFactor and maxShardsPerNode at SolrCloud?

2013-04-22 Thread Erick Erickson
1) Imagine you have lots and lots and lots of different Solr indexes
and a 50 node cluster. Further imagine that one of those indexes has 2
shards, and a leader + one replica per shard is adequate to handle the load. You need
some way to limit the number of nodes your index gets distributed to,
that's what replicationFactor is for. So in this case
replicationFactor=2 will stop assigning nodes to that particular
collection after there's a leader + 1 replica

2> In the system you described, there won't be more than one
shard/node. But one strategy for growth is to "overshard". That is, in
the early days you put (numbers from thin air) 10 shards/node and they
are all quite small. As your index grows, you move to two nodes with 5
shards each. And later to 5 nodes with 2 shards and so on. There are
cases where you want some way to make the most of your hardware yet
plan for expansion.
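
For concreteness, both knobs are given when the collection is created
through the Collections API, e.g. (hypothetical host and collection name):

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&maxShardsPerNode=1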

Best
Erick

On Sun, Apr 21, 2013 at 3:51 PM, Furkan KAMACI  wrote:
> I know that: when using SolrCloud we define the number of shards into the
> system. When we start up new Solr instances each one will be a leader for
> a shard, and if I continue to start up new Solr instances (that has
> exceeded the number of shards) each one will be a replica for each
> leader as a round robin process.
>
> However when I read the wiki there are two parameters: *replicationFactor*
> and *maxShardsPerNode*.
>
> 1) Can you give details about what they are? If all newly added Solr
> instances become replicas, what is that replication factor for?
> 2) If what I wrote is true about that round robin process, what is
> *maxShardsPerNode*? How can there be more than one shard in the system I described?


Re: Solr cloud and batched updates

2013-04-22 Thread Erick Erickson
Thanks Yonik! You see how behind the times I get

On Sun, Apr 21, 2013 at 5:07 PM, Timothy Potter  wrote:
> That's awesome! Thanks Yonik.
>
> Tim
>
> On Sun, Apr 21, 2013 at 1:30 PM, Yonik Seeley  wrote:
>> On Sun, Apr 21, 2013 at 11:57 AM, Timothy Potter  
>> wrote:
>>> There's no problem here, but I'm curious about how batches of updates
>>> are handled on the Solr server side in Solr cloud?
>>>
>>> Going over the code for DistributedUpdateProcessor and
>>> SolrCmdDistributor, it appears that the batch is broken down and docs
>>> are processed one-by-one. By processed, I mean that each doc in the
>>> batch from the client is sent to replicas individually.
>>>
>>> This makes sense but I wonder if the forwarding on to replicas could
>>> be done in sub-batches?
>>
>> Good news... they already are sent in batches!  The docs are processed
>> one-by-one, but then buffered (into batches) for forwarding to
>> replicas.
>>
>> -Yonik
>> http://lucidworks.com


Re: is phrase search possible in solr

2013-04-22 Thread Erick Erickson
bq: wherein if I have a query in double quotes it simply ignores all the
tokenizers and analyzers.

Nope. In general you're quite right, you need to re-index whenever you
change your schema... You could define the query part of your field
to just use KeywordTokenizerFactory, but that would affect _all_ queries,
which doesn't work for your case...

You might be able to spoof things with, say, the "raw" query parser, see:
http://wiki.apache.org/solr/SolrQuerySyntax
or perhaps the "term" query parser, but I think you'll have some issues here if you
need to have more than one term next to each other (i.e. phrases). And
you'll have to handle all the upstream bits yourself, e.g. making sure
casing matches: DelhiDareDevil is indexed as delhidaredevil, for instance.
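
For example, something like this (hypothetical field name; the raw parser
does no analysis, so the value has to match the indexed term exactly):

q={!raw f=title}delhidaredevil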

You could write your own query parser that handled this as a special
case, but that would involve quite a lot of work.

Best
Erick

On Mon, Apr 22, 2013 at 1:02 AM, vicky desai  wrote:
> Hi Jack,
>
> Making a changes in the schema either keyword tokenizer or copy field option
> which u suggested would require reindexing of entire data. Is there an
> option wherein if I have a query in double quotes it simply ignores all the
> tokenizers and analyzers.
>
>
>


RE: Stats facet on int/tint fields

2013-04-22 Thread Michael Ryan
Sounds like this could be https://issues.apache.org/jira/browse/SOLR-2976.

-Michael

-Original Message-
From: vinothkumar raman [mailto:vinothkr.k...@gmail.com] 
Sent: Monday, April 22, 2013 5:54 AM
To: solr-user@lucene.apache.org; solr-...@lucene.apache.org
Subject: Stats facet on int/tint fields

I have a schema like this






I wanted to find the average price faceted on cat, so I was using the stats facet
to get the average on the field like this:
http://solr-serv/solr/latest/select?q=*%3A*&wt=xml&indent=true&stats=true&rows=0&stats.field=price&stats.facet=cat

Which throws an exception like this
org.apache.solr.common.SolrException: Server at http://solr-serv/solr/latest 
returned non ok status:500, message:Server Error at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:169)
at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:135)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166) at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

When I look at the logs, this is all I get:

null:java.lang.NumberFormatException: For input string: "`=?"


But when I try with stats.facet=cat_name it works perfectly fine. It doesn't
work with any other int/tint field.

I am not sure what's really wrong with my query.

(Dropping it to the dev list too, in case it's a bug.)


PS: Not sure if it's a bug so sending it to the dev mailing list too.


Re: Where to use replicationFactor and maxShardsPerNode at SolrCloud?

2013-04-22 Thread Furkan KAMACI
Sorry, but if I have 10 shards and a collection with a replication factor of 1,
and if I start up 30 nodes, what happens to the last 10 nodes? I mean:

10 nodes as leaders
10 nodes as replicas

If I don't specify a replication factor, there was going to be a round-robin
system that assigns the other 10 machines as:
+ 10 nodes as replicas

However, what will happen to those 10 nodes when I specify a replication factor?


2013/4/22 Erick Erickson 

> 1) Imagine you have lots and lots and lots of different Solr indexes
> and a 50 node cluster. Further imagine that one of those indexes has 2
> shards, and a leader + one replica per shard is adequate to handle the load. You need
> some way to limit the number of nodes your index gets distributed to,
> that's what replicationFactor is for. So in this case
> replicationFactor=2 will stop assigning nodes to that particular
> collection after there's a leader + 1 replica
>
> 2> In the system you described, there won't be more than one
> shard/node. But one strategy for growth is to "overshard". That is, in
> the early days you put (numbers from thin air) 10 shards/node and they
> are all quite small. As your index grows, you move to two nodes with 5
> shards each. And later to 5 nodes with 2 shards and so on. There are
> cases where you want some way to make the most of your hardware yet
> plan for expansion.
>
> Best
> Erick
>
> On Sun, Apr 21, 2013 at 3:51 PM, Furkan KAMACI 
> wrote:
> > I know that: when using SolrCloud we define the number of shards into the
> > system. When we start up new Solr instances each one will be a leader for
> > a shard, and if I continue to start up new Solr instances (that has
> > exceeded the number of shards) each one will be a replica for each
> > leader as a round robin process.
> >
> > However when I read the wiki there are two parameters: *replicationFactor*
> > and *maxShardsPerNode*.
> >
> > 1) Can you give details about what they are? If all newly added Solr
> > instances become replicas, what is that replication factor for?
> > 2) If what I wrote is true about that round robin process, what is
> > *maxShardsPerNode*? How can there be more than one shard in the system I
> > described?
>


Re: Dynamically loading Elevation Info

2013-04-22 Thread Erick Erickson
I believe (but don't know for sure) that the QEV file is re-read on
core reload, which the same app that modifies the elevate.xml file
could trigger with an http request; see:

http://wiki.apache.org/solr/CoreAdmin#RELOAD
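
For example (host and core name hypothetical):

http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0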

At least that's what I would try first.

Best
Erick

On Mon, Apr 22, 2013 at 2:48 AM, Saroj C  wrote:
> Hi,
>  The business user wants to configure the elevation text and the IDs, and they
> want to have a UI to do the same. As soon as they configure it, it should be
> reflected in SOLR (without restarting).
>
> My understanding is that, right now, the QueryElevationComponent reads the
> elevate.xml (configurable) and loads the information into the ElevationCache
> during startup, and uses the information while responding to queries. Is
> there any way the content in the ElevationCache can be modified by
> some other external process, or is there any easy way of achieving this
> requirement?
>
> Thanks and Regards,
> Saroj Kumar Choudhury


Re: Where to use replicationFactor and maxShardsPerNode at SolrCloud?

2013-04-22 Thread Jan Høydahl
2) Does this mean that if you have one physical server with one Solr instance,
   and you try to create a collection with numShards=2&maxShardsPerNode=2,
   then it will succeed, putting both shards on the same node?

If you then add another node, you still need to move one shard over to the
new node manually, don't you? Is there a JIRA to auto-balance shards?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

22. apr. 2013 kl. 13:04 skrev Erick Erickson :

> 1) Imagine you have lots and lots and lots of different Solr indexes
> and a 50 node cluster. Further imagine that one of those indexes has 2
> shards, and a leader + one replica per shard is adequate to handle the load. You need
> some way to limit the number of nodes your index gets distributed to,
> that's what replicationFactor is for. So in this case
> replicationFactor=2 will stop assigning nodes to that particular
> collection after there's a leader + 1 replica
> 
> 2> In the system you described, there won't be more than one
> shard/node. But one strategy for growth is to "overshard". That is, in
> the early days you put (numbers from thin air) 10 shards/node and they
> are all quite small. As your index grows, you move to two nodes with 5
> shards each. And later to 5 nodes with 2 shards and so on. There are
> cases where you want some way to make the most of your hardware yet
> plan for expansion.
> 
> Best
> Erick
> 
> On Sun, Apr 21, 2013 at 3:51 PM, Furkan KAMACI  wrote:
>> I know that: when using SolrCloud we define the number of shards into the
>> system. When we start up new Solr instances each one will be a leader for
>> a shard, and if I continue to start up new Solr instances (that has
>> exceeded the number of shards) each one will be a replica for each
>> leader as a round robin process.
>> 
>> However when I read the wiki there are two parameters: *replicationFactor*
>> and *maxShardsPerNode*.
>> 
>> 1) Can you give details about what they are? If all newly added Solr
>> instances become replicas, what is that replication factor for?
>> 2) If what I wrote is true about that round robin process, what is
>> *maxShardsPerNode*? How can there be more than one shard in the system I described?



Re: is phrase search possible in solr

2013-04-22 Thread Jack Krupansky

"I want queries within double quotes to be ..."

Just to be clear (as already stated), you do not get to set the semantics of 
quotes, which are set by the query parser and the analyzer for the field - 
if you want different semantics, copy the data to another field and use 
those different semantics in the new field's analyzer.


But also to be clear, in case anybody is simply reading the message subject 
line literally, yes, phrase search is possible in Solr.


-- Jack Krupansky

-Original Message- 
From: vicky desai

Sent: Monday, April 22, 2013 1:50 AM
To: solr-user@lucene.apache.org
Subject: Re: is phrase search possible in solr

Hi,

If I use the ShingleFilter then all types of queries will be impacted. I want
queries within double quotes to be an exact search, but for queries without
double quotes all analyzers and tokenizers should be applied. Is there a
setting or a configuration in schema.xml which can cater to this requirement?






Re: Where to use replicationFactor and maxShardsPerNode at SolrCloud?

2013-04-22 Thread Jack Krupansky
"replicationFactor=2 will stop assigning nodes to that particular collection 
after there's a leader + 1 replica"


They are both replicas, right?

I mean, at any given moment one of the replicas will also have a role of 
"leader", but it's still a replica - in SolrCloud, that is, as opposed to 
old master/slave/replica Solr.


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Monday, April 22, 2013 7:04 AM
To: solr-user@lucene.apache.org
Subject: Re: Where to use replicationFactor and maxShardsPerNode at 
SolrCloud?


1) Imagine you have lots and lots and lots of different Solr indexes
and a 50 node cluster. Further imagine that one of those indexes has 2
shards, and a leader + one replica per shard is adequate to handle the load. You need
some way to limit the number of nodes your index gets distributed to,
that's what replicationFactor is for. So in this case
replicationFactor=2 will stop assigning nodes to that particular
collection after there's a leader + 1 replica

2> In the system you described, there won't be more than one
shard/node. But one strategy for growth is to "overshard". That is, in
the early days you put (numbers from thin air) 10 shards/node and they
are all quite small. As your index grows, you move to two nodes with 5
shards each. And later to 5 nodes with 2 shards and so on. There are
cases where you want some way to make the most of your hardware yet
plan for expansion.

Best
Erick

On Sun, Apr 21, 2013 at 3:51 PM, Furkan KAMACI  
wrote:

I know that: when using SolrCloud we define the number of shards into the
system. When we start up new Solr instances each one will be a leader for
a shard, and if I continue to start up new Solr instances (that has
exceeded the number of shards) each one will be a replica for each
leader as a round robin process.

However when I read the wiki there are two parameters: *replicationFactor*
and *maxShardsPerNode*.

1) Can you give details about what they are? If all newly added Solr
instances become replicas, what is that replication factor for?
2) If what I wrote is true about that round robin process, what is
*maxShardsPerNode*? How can there be more than one shard in the system I
described?




Re: Bug? JSON output changes when switching to solr cloud

2013-04-22 Thread Yonik Seeley
Thanks David,

I've confirmed this is still a problem in trunk and opened
https://issues.apache.org/jira/browse/SOLR-4746

-Yonik
http://lucidworks.com


On Sun, Apr 21, 2013 at 11:16 PM, David Parks  wrote:
> We just took an installation of 4.1 which was working fine and changed it to
> run as solr cloud. We encountered the most incredibly bizarre apparent bug:
>
> In the JSON output, a colon ':' changed to a comma ',', which of course
> broke the JSON parser.  I'm guessing I should file this as a bug, but it was
> so odd I thought I'd post here before doing so. Demo below:
>
> Here is a query on our previous single-server instance:
>
> Query:
> --
> http://10.1.3.28:8081/solr/select?q=book&fl=score%2Cid%2Cunique_catalog_name&start=0&rows=50&wt=json&group=true&group.field=unique_catalog_name&group.limit=50
>
> Response:
> -
> {"responseHeader":{"status":0,"QTime":15714,"params":{"fl":"score,id,unique_
> catalog_name","start":"0","q":"book","group.limit":"50","group.field":"uniqu
> e_catalog_name","group":"true","wt":"json","rows":"50"}},"grouped":{"unique_
> catalog_name":{"matches":106711214,"groups":[{"groupValue":"ls:2653","doclis
> t":{"numFound":103981882,"start":0,"maxScore":4.7039795,"docs":[{"id":"10055
> 02088784","score":4.7039795},{"id":"1005500291075","score":4.7039795},{"id":
> "1000810546074","score":4.7039795},{"id":"1000611003270","score":4.7039795},
>
> Note this part:
> --
>   {"unique_catalog_name":{"matches":
>
>
>
> Now we run that same query on a server that was derived from the same build,
> just configuration changes to run it in distributed "solr cloud" mode.
>
> Query:
> -
> http://10.1.3.18:8081/solr/select?q=book&fl=score%2Cid%2Cunique_catalog_name&start=0&rows=50&wt=json&group=true&group.field=unique_catalog_name&group.limit=50
>
> Response:
> -{"responseHeader":{"status":0,"QTime":8855,"params":{"fl":"scor
> e,id,unique_catalog_name","start":"0","q":"book","group.limit":"50","group.f
> ield":"unique_catalog_name","group":"true","wt":"json","rows":"50"}},"groupe
> d":["unique_catalog_name",{"matches":106711214,"groups":[{"groupValue":"ls:2
> 653","doclist":{"numFound":103981882,"start":0,"maxScore":4.7042913,"docs":[
> {"id":"1005502088784","score":4.7042913},{"id":"1000611003270","score":4.704
> 2913},{"id":"1005500291075","score":4.703668},{"id":"1000810546074","score":
> 4.703668},
>
> Note how it's changed:
> 
>   "unique_catalog_name",{"matches":
>
>
>
>


Re: fuzzy search issue with PatternTokenizer Factory

2013-04-22 Thread Jack Krupansky
Once again, fuzzy search is completely independent of your analyzer or 
pattern tokenizer. Please use the Solr Admin UI Analysis page to debug 
whether the terms are what you expect. And realize that fuzzy search has a 
maximum editing distance of 2 and that includes case changes.
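
A quick, hedged illustration of the edit-distance point (plain Levenshtein;
Lucene's fuzzy matching is automaton-based rather than a pairwise distance
computation, but it counts edits over the raw indexed terms the same way,
so case differences count as edits):

public class FuzzyDistanceDemo {
    // classic dynamic-programming Levenshtein distance
    static int dist(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(dist("worde", "worded")); // 1 -> within worde~1
        System.out.println(dist("worde", "WORDED")); // 6 -> far beyond the max of 2
    }
}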


-- Jack Krupansky

-Original Message- 
From: meghana

Sent: Monday, April 22, 2013 3:25 AM
To: solr-user@lucene.apache.org
Subject: Re: fuzzy search issue with PatternTokenizer Factory

Jack,

the regex will split tokens on anything except letters, numbers, '&', '-',
apostrophes, and ns: (where n is a number from 0 to 9999, e.g. 4323s:)

Let's say, for example, my text is like below.

*this is nice* day & sun 53s: is risen. *

Then the pattern tokenizer should create tokens as

*this is nice day & sun is risen*

The pattern seems to be working fine with different texts.

Also, for fuzzy search *worde~1*, I have checked the results returned with
the PatternTokenizerFactory; they include terms with punctuation marks
attached, like '*WORDS,*', *WORDED*, etc...

One more weird thing is that all the results are in uppercase letters; no
lowercase results come back, although it does not return all of the
uppercase results either.

But I am not sure why, after changing to this tokenizer, fuzzy search is
not working properly.


Jack Krupansky-2 wrote

Give us some examples of tokens that you are expecting that pattern to
tokenize. And express the pattern in simple English as well. Show some
actual input data.

I suspect that Solr is working fine - but you may not have precisely
specified your pattern. But we don't know what your pattern is supposed to
recognize.

Maybe some of your previous hits had punctuation adjacent to the terms
that your pattern doesn't recognize.

And use the Solr Admin UI Analysis page to see how your sample input data
is
analyzed.
One other thing... without a "group", the pattern specifies what delimiter
sequence will "split" the rest of the input into tokens. I suspect you
didn't mean this.

-- Jack Krupansky

-Original Message- 
From: meghana

Sent: Friday, April 19, 2013 9:01 AM
To: solr-user@.apache



Subject: fuzzy search issue with PatternTokenizer Factory

I'm using Solr 4.2. I have changed my text field definition to use
solr.PatternTokenizerFactory instead of solr.StandardTokenizerFactory, and
changed my schema definition as below:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory"
       pattern="[^a-zA-Z0-9&\-']|\d{0,4}s:" />
    <filter class="solr.StopFilterFactory"
       words="stopwords.txt" enablePositionIncrements="false" />
    ...
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory"
       pattern="[^a-zA-Z0-9&\-']|\d{0,4}s:" />
    <filter class="solr.StopFilterFactory"
       words="stopwords_extra_query.txt" enablePositionIncrements="false" />
    <filter class="solr.SynonymFilterFactory" synonyms="..."
       ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
After doing so, fuzzy search does not seem to work properly, the way it was
working before.

I'm searching with the search term: worde~1

Before, the search was returning around 300 records, but now it returns
only 5 records. I'm not sure what the issue can be.

Can anybody help me make it work!
















Re: ComplexPhraseQParserPlugin not working with solr 4.2

2013-04-22 Thread Ahmet Arslan

Hi ilay,

Can you try ComplexPhrase-4.2.1.zip? It is supposed to work with 4.2.


--- On Mon, 4/22/13, ilay raja  wrote:

> From: ilay raja 
> Subject: Re: ComplexPhraseQParserPlugin not working with solr 4.2
> To: solr-user@lucene.apache.org, solr-...@lucene.apache.org
> Date: Monday, April 22, 2013, 12:30 PM
> I was able to solve the previous
> problem of not loading
> ComplexPhraseQParserPlugin. Still I am unable to run this
> with
> defType=complexphrase; I get:
> java.lang.NoSuchMethodError:
> org.apache.solr.search.QueryParsing.getQueryParserDefaultOperator(Lorg/apache/solr/schema/IndexSchema;Ljava/lang/String;)Lorg/apache/lucene/queryparser/classic/QueryParser$Operator;
> 
> Is there an issue with running ComplexPhraseQParserPlugin
> (the 4.0 jar)
> against Solr 4.2?
> 
> On Sat, Apr 20, 2013 at 10:30 PM, ilay raja 
> wrote:
> 
> > Hi
> >
> >   I followed the steps given in
> > https://issues.apache.org/jira/browse/SOLR-1604 for
> integrating the
> > plugin.
> > But it is not picking up the classpath correctly, though I
> added the following
> > lines to solrconfig.xml:
> > <lib dir="..." regex="ComplexPhrase-\d.*\.jar" />
> > <queryParser name="complexphrase"
> class="org.apache.solr.search.ComplexPhraseQParserPlugin"
> />
> >
> > I have the compiled jar in solr-home/dist/
> >
> > The exception is as below - unable to create core
> mainindex.
> > Is it an issue with using this plugin in 4.2? Does it
> work with 4.0 ?
> >
> >
> >      at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at
> java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >         at
> >
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >         at
> >
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> >         at
> java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >         at
> >
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >         at
> >
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> >         at
> >
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
> >         at
> >
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
> >         at
> >
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
> >         at
> org.apache.solr.core.SolrCore.<init>(SolrCore.java:619)
> >         at
> org.apache.solr.core.SolrCore.<init>(SolrCore.java:806)
> > org.apache.solr.common.SolrException: Error loading
> class
> > 'org.apache.solr.search.ComplexPhraseQParserPlugin'
> > SEVERE: Unable to create core: mainindex
> >
> >
>


Re: SolrCloud Leaders

2013-04-22 Thread Furkan KAMACI
Hi Jack;

You said: "An hour from now some other replica may be the leader"

What are the criteria for changing the leader of a shard?

2013/4/15 Jack Krupansky 

> All nodes are replicas in SolrCloud since there are no masters. It's a
> fully distributed model. A leader is also a replica. A leader is simply a
> replica which was elected to be a leader, for now. An hour from now some
> other replica may be the leader.
>
> It is indeed misleading and inaccurate to suggest that "leader" and
> "replicas" are disjoint.
>
> Once again, I think you are confusing SolrCloud with the older Solr
> master/slave/replication.
>
> Every node in SolrCloud can do indexing. That's the same as saying that
> every replica in SolrCloud can do indexing.
>
> Although we do need to be clear that a given replica will only index
> documents for the shard(s) to which it belongs.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Furkan KAMACI
> Sent: Monday, April 15, 2013 9:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud Leaders
>
> Here writes something:
>
> https://support.lucidworks.com/entries/22180608-Solr-HA-DR-overview-3-x-and-4-0-SolrCloud-and
>
> says:
>
> Both leaders and replicas index items and perform searches.
>
> How do replicas index items?
>
>
> 2013/4/15 Furkan KAMACI 
>
>  Do leaders respond to search requests (I mean, do they store indexes)
>> both when I first run SolrCloud and after some time later?
>>
>>
>> 2013/4/15 Jack Krupansky 
>>
>>  When the cluster is fully operational, yes. But if part of the cluster is
>>> down or split and unable to communicate, or leader election is in
>>> progress,
>>> the actual count of leaders will not be indicative of the number of
>>> shards.
>>>
>>> Leaders and shards are apples and oranges. If you take down a cluster, by
>>> definition it would have no leaders (because leaders are running code),
>>> but
>>> shards are the files in the index on disk that continue to exist even if
>>> the code is not running. So, in the extreme, the number of leaders can be
>>> zero while the number of shards is non-zero on disk.
>>>
>>> -- Jack Krupansky
>>>
>>> -Original Message- From: Furkan KAMACI
>>> Sent: Monday, April 15, 2013 8:21 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: SolrCloud Leaders
>>>
>>>
>>> Is the number of leaders in a SolrCloud equal to the number of shards?
>>>
>>>
>>
>>
>


Re: SolrCloud Leaders

2013-04-22 Thread Otis Gospodnetic
If the current leader dies, somebody's got to take over.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, Apr 22, 2013 at 9:41 AM, Furkan KAMACI  wrote:
> Hi Jack;
>
> You said: "An hour from now some other replica may be the leader"
>
> What are the criteria for changing the leader of a shard?
>
> 2013/4/15 Jack Krupansky 
>
>> All nodes are replicas in SolrCloud since there are no masters. It's a
>> fully distributed model. A leader is also a replica. A leader is simply a
>> replica which was elected to be a leader, for now. An hour from now some
>> other replica may be the leader.
>>
>> It is indeed misleading and inaccurate to suggest that "leader" and
>> "replicas" are disjoint.
>>
>> Once again, I think you are confusing SolrCloud with the older Solr
>> master/slave/replication.
>>
>> Every node in SolrCloud can do indexing. That's the same as saying that
>> every replica in SolrCloud can do indexing.
>>
>> Although we do need to be clear that a given replica will only index
>> documents for the shard(s) to which it belongs.
>>
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Furkan KAMACI
>> Sent: Monday, April 15, 2013 9:38 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SolrCloud Leaders
>>
>> Here writes something:
>>
>> https://support.lucidworks.com/entries/22180608-Solr-HA-DR-overview-3-x-and-4-0-SolrCloud-and
>>
>> says:
>>
>> Both leaders and replicas index items and perform searches.
>>
>> How do replicas index items?
>>
>>
>> 2013/4/15 Furkan KAMACI 
>>
>>  Do leaders respond to search requests (I mean, do they store indexes)
>>> both when I first run SolrCloud and after some time later?
>>>
>>>
>>> 2013/4/15 Jack Krupansky 
>>>
>>>  When the cluster is fully operational, yes. But if part of the cluster is
 down or split and unable to communicate, or leader election is in
 progress,
 the actual count of leaders will not be indicative of the number of
 shards.

 Leaders and shards are apples and oranges. If you take down a cluster, by
 definition it would have no leaders (because leaders are running code),
 but
 shards are the files in the index on disk that continue to exist even if
 the code is not running. So, in the extreme, the number of leaders can be
 zero while the number of shards is non-zero on disk.

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI
 Sent: Monday, April 15, 2013 8:21 AM
 To: solr-user@lucene.apache.org
 Subject: SolrCloud Leaders


Is the number of leaders in a SolrCloud equal to the number of shards?


>>>
>>>
>>


Re: SolrCloud Leaders

2013-04-22 Thread Jack Krupansky
Leader election will result from nodes coming up and going down as well as 
changes in network connectivity and even simply responsiveness between the 
nodes. A "quorum" is always needed.


There may be other reasons as well that I don't know about.

The point was simply that it is not a "leader" vs. "replica" issue - all of 
the nodes are replicas and one replica just "happens" to be playing the 
role of leader at a given moment.


-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Monday, April 22, 2013 9:41 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud Leaders

Hi Jack;

You said: "An hour from now some other replica may be the leader"

What are the criteria for changing the leader of a shard?

2013/4/15 Jack Krupansky 


All nodes are replicas in SolrCloud since there are no masters. It's a
fully distributed model. A leader is also a replica. A leader is simply a
replica which was elected to be a leader, for now. An hour from now some
other replica may be the leader.

It is indeed misleading and inaccurate to suggest that "leader" and
"replicas" are disjoint.

Once again, I think you are confusing SolrCloud with the older Solr
master/slave/replication.

Every node in SolrCloud can do indexing. That's the same as saying that
every replica in SolrCloud can do indexing.

Although we do need to be clear that a given replica will only index
documents for the shard(s) to which it belongs.


-- Jack Krupansky

-Original Message- From: Furkan KAMACI
Sent: Monday, April 15, 2013 9:38 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud Leaders

Here writes something:

https://support.lucidworks.com/entries/22180608-Solr-HA-DR-overview-3-x-and-4-0-SolrCloud-and

says:

Both leaders and replicas index items and perform searches.

How do replicas index items?


2013/4/15 Furkan KAMACI 

Do leaders respond to search requests (I mean, do they store indexes)

both when I first run SolrCloud and after some time later?


2013/4/15 Jack Krupansky 

 When the cluster is fully operational, yes. But if part of the cluster 
is

down or split and unable to communicate, or leader election is in
progress,
the actual count of leaders will not be indicative of the number of
shards.

Leaders and shards are apples and oranges. If you take down a cluster, 
by

definition it would have no leaders (because leaders are running code),
but
shards are the files in the index on disk that continue to exist even if
the code is not running. So, in the extreme, the number of leaders can 
be

zero while the number of shards is non-zero on disk.

-- Jack Krupansky

-Original Message- From: Furkan KAMACI
Sent: Monday, April 15, 2013 8:21 AM
To: solr-user@lucene.apache.org
Subject: SolrCloud Leaders


Is the number of leaders in a SolrCloud equal to the number of shards?











spellcheck: change in behavior and QTime

2013-04-22 Thread SandeepM
I am using the same setup (solrconfig.xml and schema.xml) as stated in my
prior message:
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tt4057176.html#a4057389
I am using SOLR 4.2.1. I just wanted to report something weird that I am
seeing and would like to find out if anyone else is seeing this behavior.
Since I don't understand the details of what is happening, I'd like to know
why the behavior changes and whether we can do anything to get the better
QTime upfront.

I see a change in behavior when running queries against the server due to
which the QTime also changes.

QUERY:
?spellcheck=true
&spellcheck.q=cucoo's+nest
&df=spell
&fq=... (it's the same every time and I believe moot)

Here is what I have to do:
1.  Run the query.
2.  Run the same query with spellcheck=false
3.  Run the original query (spellcheck=true)

QTime from each of the above stages:
1.  40ms (multiple runs with spellcheck=true.)
2.  10ms (spellcheck = false is run just once)
3.  20ms (after changing back to spellcheck=true again and running multiple
times.)

Cache details at each of the above times:
1.  filterCache

class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1024, initialSize=512, minSize=921,
acceptableSize=972, cleanupThread=false, autowarmCount=128,
regenerator=org.apache.solr.search.SolrIndexSearcher$2@7ce3d64e)
src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_2/solr/core/src/java/org/apache/solr/search/FastLRUCache.java $

stats:
lookups: Was: 30, Now: 35, Delta: 5
hits: Was: 25, Now: 30, Delta: 5
hitratio: Was: 0.83, Now: 0.85
inserts: 5
evictions: 0
size: 5
warmupTime: 0
cumulative_lookups: Was: 30, Now: 35, Delta: 5
cumulative_hits: Was: 25, Now: 30, Delta: 5
cumulative_hitratio: Was: 0.83, Now: 0.85
cumulative_inserts: 5
cumulative_evictions: 0

queryResultCache

class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=40960, initialSize=10240,
minSize=36864, acceptableSize=38912, cleanupThread=false,
autowarmCount=2560,
regenerator=org.apache.solr.search.SolrIndexSearcher$3@520adaf0)
src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_2/solr/core/src/java/org/apache/solr/search/FastLRUCache.java $

stats:
lookups: Was: 8, Now: 10, Delta: 2
hits: Was: 3, Now: 4, Delta: 1
hitratio: Was: 0.37, Now: 0.40
inserts: Was: 5, Now: 6, Delta: 1
evictions: 0
size: Was: 6, Now: 7, Delta: 1
warmupTime: 0
cumulative_lookups: Was: 8, Now: 10, Delta: 2
cumulative_hits: Was: 3, Now: 4, Delta: 1
cumulative_hitratio: Was: 0.37, Now: 0.40
cumulative_inserts: Was: 5, Now: 6, Delta: 1
cumulative_evictions: 0


2.  filterCache

class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1024, initialSize=512, minSize=921,
acceptableSize=972, cleanupThread=false, autowarmCount=128,
regenerator=org.apache.solr.search.SolrIndexSearcher$2@7ce3d64e)
src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_2/solr/core/src/java/org/apache/solr/search/FastLRUCache.java $

stats:
lookups: Was: 35, Now: 40, Delta: 5
hits: Was: 30, Now: 35, Delta: 5
hitratio: Was: 0.85, Now: 0.87
inserts: 5
evictions: 0
size: 5
warmupTime: 0
cumulative_lookups: Was: 35, Now: 40, Delta: 5
cumulative_hits: Was: 30, Now: 35, Delta: 5
cumulative_hitratio: Was: 0.85, Now: 0.87
cumulative_inserts: 5
cumulative_evictions: 0

queryResultCache

class:

Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Timothy Potter
Have a little more info about this ... the numDocs for *:* fluctuates
between two values (difference of 324 docs) depending on which nodes I
hit (distrib=true)

589,674,416
589,674,092

Using distrib=false, I found 1 shard with a mis-match:

shard15: {
  leader = 32,765,254
  replica = 32,764,930 diff:324
}
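
(For reference, per-core counts like these come from a non-distributed
match-all query against each core, e.g., with hypothetical host and core
names: http://host:8983/solr/shard15_replica1/select?q=*:*&rows=0&distrib=false)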

Interesting that the replica has more docs than the leader.

Unfortunately, due to some bad log management scripting on my part,
the logs were lost when these instances got re-started, which really
bums me out :-(

For now, I'm going to assume the replica with more docs is the one I
want to keep and will replicate the full index over to the other one.
Sorry about losing the logs :-(

Tim




On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter  wrote:
> Thanks for responding Mark. I'll collect the information you asked
> about and open a JIRA once I have a little more understanding of what
> happened. Hopefully I can piece together some story after going over
> the logs.
>
> As for replica / leader, I suspect some leaders went down but
> fail-over to new leaders seemed to work fine. We lost about 9 nodes at
> once and continued to serve queries, which is awesome.
>
> On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller  wrote:
>> Yeah, thats no good.
>>
>> You might hit each node with distrib=false to get the doc counts.
>>
>> Which ones have what you think are the right counts and which the wrong - eg 
>> is it all replicas that are off, or leaders as well?
>>
>> You say several replicas - do you mean no leaders went down?
>>
>> You might look closer at the logs for a node that has it's count off.
>>
>> Finally, I guess I'd try and track it in a JIRA issue.
>>
>> - Mark
>>
>> On Apr 19, 2013, at 6:37 PM, Timothy Potter  wrote:
>>
>>> We had a rogue query take out several replicas in a large 4.2.0 cluster
>>> today, due to OOM's (we use the JVM args to kill the process on OOM).
>>>
>>> After recovering, when I execute the match all docs query (*:*), I get a
>>> different count each time.
>>>
>>> In other words, if I execute q=*:* several times in a row, then I get a
>>> different count back for numDocs.
>>>
>>> This was not the case prior to the failure as that is one thing we monitor
>>> for.
>>>
>>> I think I should be worried ... any ideas on how to troubleshoot this? One
>>> thing to mention is that several of my replicas had to do full recoveries
>>> from the leader when they came back online. Indexing was happening when the
>>> replicas failed.
>>>
>>> Thanks.
>>> Tim
>>


Re: Dynamically loading Elevation Info

2013-04-22 Thread Ravi Solr
If you place the elevate.xml in the data directory of your index it will be
loaded every time a commit happens.

Thanks

Ravi Kiran Bhaskar


On Mon, Apr 22, 2013 at 7:38 AM, Erick Erickson wrote:

> I believe (but don't know for sure) that the QEV file is re-read on
> core reload, which the same app that modifies the elevator.xml file
> could trigger with an http request, see:
>
> http://wiki.apache.org/solr/CoreAdmin#RELOAD
>
> At least that's what I would try first.
>
> Best
> Erick
>
> On Mon, Apr 22, 2013 at 2:48 AM, Saroj C  wrote:
> > Hi,
> > Business users want to configure the elevation text and the IDs, and they
> > want to have a UI to do the same. As soon as they configure it, it should
> > be reflected in Solr (without restarting).
> >
> > My understanding is that the QueryElevationComponent currently reads the
> > elevate.xml (configurable) and loads the information into the ElevationCache
> > during startup, and uses the information while responding to queries. Is
> > there any way the content of the ElevationCache can be modified by
> > some other external process, or is there any easy way of achieving this
> > requirement?
> >
> > Thanks and Regards,
> > Saroj Kumar Choudhury
> >
> >
>


Re: updating documents unintentionally adds extra values to certain fields

2013-04-22 Thread Chris Hostetter

: I am using solr 4.2, and have set up spatial search config as below
: 
: http://wiki.apache.org/solr/SpatialSearch#Schema_Configuration
: 
: But every time I make an update to a document,
: http://wiki.apache.org/solr/UpdateJSON#Updating_a_Solr_Index_with_JSON
: 
: more values of the *_coordinates fields gets inserted, even though it was
: not set to multivalue & this behavior doesn't happen to any of the other
: fields.

can you elaborate on what exactly you mean by "more values of the 
*_coordinates fields gets inserted"?

FYI...

atomic updates work by leveraging the existing stored values of fields;
independently, the LatLonType field works by creating on-the-fly sub-fields 
representing internal state.

My hunch is that you don't actually have the LatLonType setup exactly as 
described in the wiki you linked to, where "*_coordinate" is configured 
with 'stored="false"' ... my hunch is that you have the *_coordinate 
dynamicField configured with stored="true", and so when you do an atomic 
update the old (stored) sub-field values are copied over and the (new) 
sub-field values are generated again by LatLonType.
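
For reference, the configuration on that wiki page keeps the sub-fields 
unstored, roughly:

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

If your *_coordinate dynamicField says stored="true", that would explain the 
extra values you are seeing.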


-Hoss


Re: Dynamically loading Elevation Info

2013-04-22 Thread Chris Hostetter

: In-Reply-To: <1366609851170-4057812.p...@n3.nabble.com>
: References: <1366383543826-4057312.p...@n3.nabble.com>
:  
:  <1366609851170-4057812.p...@n3.nabble.com>
: Subject: Dynamically loading Elevation Info

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss


Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Timothy Potter
nm - can't read my own output - the leader had more docs than the replica ;-)

On Mon, Apr 22, 2013 at 11:42 AM, Timothy Potter  wrote:
> Have a little more info about this ... the numDocs for *:* fluctuates
> between two values (difference of 324 docs) depending on which nodes I
> hit (distrib=true)
>
> 589,674,416
> 589,674,092
>
> Using distrib=false, I found 1 shard with a mis-match:
>
> shard15: {
>   leader = 32,765,254
>   replica = 32,764,930 diff:324
> }
>
> Interesting that the replica has more docs than the leader.
>
> Unfortunately, due to some bad log management scripting on my part,
> the logs were lost when these instances got re-started, which really
> bums me out :-(
>
> For now, I'm going to assume the replica with more docs is the one I
> want to keep and will replicate the full index over to the other one.
> Sorry about losing the logs :-(
>
> Tim
>
>
>
>
> On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter  wrote:
>> Thanks for responding Mark. I'll collect the information you asked
>> about and open a JIRA once I have a little more understanding of what
>> happened. Hopefully I can piece together some story after going over
>> the logs.
>>
>> As for replica / leader, I suspect some leaders went down but
>> fail-over to new leaders seemed to work fine. We lost about 9 nodes at
>> once and continued to serve queries, which is awesome.
>>
>> On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller  wrote:
>>> Yeah, thats no good.
>>>
>>> You might hit each node with distrib=false to get the doc counts.
>>>
>>> Which ones have what you think are the right counts and which the wrong - 
>>> eg is it all replicas that are off, or leaders as well?
>>>
>>> You say several replicas - do you mean no leaders went down?
>>>
>>> You might look closer at the logs for a node that has it's count off.
>>>
>>> Finally, I guess I'd try and track it in a JIRA issue.
>>>
>>> - Mark
>>>
>>> On Apr 19, 2013, at 6:37 PM, Timothy Potter  wrote:
>>>
 We had a rogue query take out several replicas in a large 4.2.0 cluster
 today, due to OOM's (we use the JVM args to kill the process on OOM).

 After recovering, when I execute the match all docs query (*:*), I get a
 different count each time.

 In other words, if I execute q=*:* several times in a row, then I get a
 different count back for numDocs.

 This was not the case prior to the failure as that is one thing we monitor
 for.

 I think I should be worried ... any ideas on how to troubleshoot this? One
 thing to mention is that several of my replicas had to do full recoveries
 from the leader when they came back online. Indexing was happening when the
 replicas failed.

 Thanks.
 Tim
>>>


Support of field variants in solr

2013-04-22 Thread Timo Schmidt
Hi together,

I am Timo and work for a Solr implementation company. During recent projects 
we found that we need to be able to generate different variants of a 
document.
 
Example 1 (Language):
 
To handle all documents in one solr core, we need a field variant for each 
language.
 

content for spanish content:

<field name="content" variant=“es“ />

content for german content:

<field name="content" variant=“de“ />
 
 
Each of these fields can be configured in the Solr schema to act optimally for 
the specific target language.
 
Example 2 (Stores):
 
We have customers who want to sell the same product in different stores for 
different prices.
 

price in frankfurt:

<field name="price" variant=“frankfurt“ />

price in paris:

<field name="price" variant=“paris“ />
 
To solve this in an optimal way, it would be nice if this worked completely 
transparently inside Solr by defining a „variantQuery“
 
A select query could look like this:

select?variantQuery=fr&qf=price,content
 
Additionally, the following should be possible: if no variant is present, the 
behaviour should be as before, so the field should be relevant for all queries.

The setting variant=“*“ would mean: there can be several wildcard variants 
defined in a committed document. This makes sense when the data type is 
the same for all variants and you have many variants (like in the price 
example).

The same thing that is possible at query time should also be possible at indexing time.

I know we can do something like this with dynamic fields (see the sketch 
below), but then we need to resolve the concrete fields at index and query 
time on the application level. That is possible, but it would be nicer to have 
a concept like this in Solr; working with facets is also easier with this 
approach, since the concrete field name does not need to be known by the 
application.
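
A rough sketch of the dynamic-field workaround I mean (field and type names 
are just examples from the default schema):

<dynamicField name="content_*" type="text_general" indexed="true" stored="true" />
<dynamicField name="price_*" type="tfloat" indexed="true" stored="true" />

The application then has to query the concrete fields, e.g. content_de or 
price_frankfurt, itself.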
 
So my questions are:

What do you think about this approach?
Is it better to work with dynamic fields? Is it reasonable when you have 200 
variants or more of a document?
What needs to be done in solr to have something like this variant attribute for 
fields?
Do you have other approaches?


RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-22 Thread SandeepM
James, Thanks.  That was very helpful. That helped me understand count and
alternativeTermCount a bit more.

I also have the following case as pointed out earlier...
My query: 

http://host/solr/select?q=&spellcheck.q=chocolat%20factry&spellcheck=true&df=spell&fl=&indent=on&wt=xml&rows=10&version=2.2&echoParams=explicit

In this case, the intent is to correct "chocolat factry" with "chocolate
factory" which exists in my spell field index. I see a QTime from the above
query as somewhere between 350-400ms 

I run a similar query, replacing the spellcheck terms with "pursut hapyness",
whereas "pursuit happyness" actually exists in my spell field, and I see a
QTime of 15-17ms.

Both queries produce collations correctly, and picking the first suggestions
and applying them as the collation finds what I am looking for, but there is an
order of magnitude difference in QTime.  There is one edit per term in both cases,
or 2 edits in each query. The lengths of the words in both these queries seem
identical. I'd like to understand why there is this vast difference in
QTime.  Also, "Chocolate factory" and "Pursuit happyness" are both indexed
in the spell field as-is.

I would appreciate any help with this since I am not sure how I can get any
meaningful performance numbers and attribute the slowness to anything in
particular. 

Thanks.
Regards,
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058048.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Cloud 4.2 - Distributed Requests failing with NPE

2013-04-22 Thread Sudhakar Maddineni
Hi,
  We recently upgraded our Solr version from 4.1 to 4.2 and started seeing the
below exceptions when running distributed queries.
Any idea what we are missing here -

http://
/solr/core1/select?q=*%3A*&wt=json&indent=true&shards=/solr/core1
http://
/solr/core1/select?q=*%3A*&wt=json&indent=true&shards=/solr/core1
http://
/solr/core1/select?q=*%3A*&wt=json&indent=true&shards=/solr/core1

  "error":{
"trace":"java.lang.NullPointerException\n\tat
org.apache.solr.handler.component.HttpShardHandler.checkDistributed(HttpShardHandler.java:340)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:182)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)\n\tat
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)\n\tat
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)\n\tat
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)\n\tat
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)\n\tat
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:470)\n\tat
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)\n\tat
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)\n\tat
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)\n\tat
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)\n\tat
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)\n\tat
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)\n\tat
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)\n\tat
java.lang.Thread.run(Unknown Source)\n",
"code":500}}


Thanks,Sudhakar.


RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-22 Thread Dyer, James
On both queries, set "spellcheck.extendedResults=true" and also 
"spellcheck.collateExtendedResults=true", then post the full spelling response. 
 Also, how long does each query take on average with spellcheck turned off?
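
For example, something like this based on your earlier URL, with the two 
parameters added:

http://host/solr/select?q=&spellcheck.q=chocolat%20factry&spellcheck=true&df=spell&wt=xml&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true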

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: SandeepM [mailto:skmi...@hotmail.com] 
Sent: Monday, April 22, 2013 2:02 PM
To: solr-user@lucene.apache.org
Subject: RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

James, Thanks.  That was very helpful. That helped me understand count and
alternativeTermCount a bit more.

I also have the following case as pointed out earlier...
My query: 

http://host/solr/select?q=&spellcheck.q=chocolat%20factry&spellcheck=true&df=spell&fl=&indent=on&wt=xml&rows=10&version=2.2&echoParams=explicit

In this case, the intent is to correct "chocolat factry" with "chocolate
factory" which exists in my spell field index. I see a QTime from the above
query as somewhere between 350-400ms 

I run a similar query, replacing the spellcheck terms with "pursut hapyness",
whereas "pursuit happyness" actually exists in my spell field, and I see a
QTime of 15-17ms.

Both queries produce collations correctly, and picking the first suggestions
and applying them as the collation finds what I am looking for, but there is an
order of magnitude difference in QTime.  There is one edit per term in both cases,
or 2 edits in each query. The lengths of the words in both these queries seem
identical. I'd like to understand why there is this vast difference in
QTime.  Also, "Chocolate factory" and "Pursuit happyness" are both indexed
in the spell field as-is.

I would appreciate any help with this since I am not sure how I can get any
meaningful performance numbers and attribute the slowness to anything in
particular. 

Thanks.
Regards,
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058048.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Mark Miller
Bummer on the log loss :(

Good info though. Somehow that replica became active without actually syncing? 
This is heavily tested (though not with OOM's I suppose), so I'm a little 
surprised, but it's hard to speculate how it happened without the logs. 
Specifically, the logs from the node that is off would be great - we would see 
what it did when it recovered and why it might think it was in sync :(

- Mark

On Apr 22, 2013, at 2:19 PM, Timothy Potter  wrote:

> nm - can't read my own output - the leader had more docs than the replica ;-)
> 
> On Mon, Apr 22, 2013 at 11:42 AM, Timothy Potter  wrote:
>> Have a little more info about this ... the numDocs for *:* fluctuates
>> between two values (difference of 324 docs) depending on which nodes I
>> hit (distrib=true)
>> 
>> 589,674,416
>> 589,674,092
>> 
>> Using distrib=false, I found 1 shard with a mis-match:
>> 
>> shard15: {
>>  leader = 32,765,254
>>  replica = 32,764,930 diff:324
>> }
>> 
>> Interesting that the replica has more docs than the leader.
>> 
>> Unfortunately, due to some bad log management scripting on my part,
>> the logs were lost when these instances got re-started, which really
>> bums me out :-(
>> 
>> For now, I'm going to assume the replica with more docs is the one I
>> want to keep and will replicate the full index over to the other one.
>> Sorry about losing the logs :-(
>> 
>> Tim
>> 
>> 
>> 
>> 
>> On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter  
>> wrote:
>>> Thanks for responding Mark. I'll collect the information you asked
>>> about and open a JIRA once I have a little more understanding of what
>>> happened. Hopefully I can piece together some story after going over
>>> the logs.
>>> 
>>> As for replica / leader, I suspect some leaders went down but
>>> fail-over to new leaders seemed to work fine. We lost about 9 nodes at
>>> once and continued to serve queries, which is awesome.
>>> 
>>> On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller  wrote:
 Yeah, thats no good.
 
 You might hit each node with distrib=false to get the doc counts.
 
 Which ones have what you think are the right counts and which the wrong - 
 eg is it all replicas that are off, or leaders as well?
 
 You say several replicas - do you mean no leaders went down?
 
 You might look closer at the logs for a node that has it's count off.
 
 Finally, I guess I'd try and track it in a JIRA issue.
 
 - Mark
 
 On Apr 19, 2013, at 6:37 PM, Timothy Potter  wrote:
 
> We had a rogue query take out several replicas in a large 4.2.0 cluster
> today, due to OOM's (we use the JVM args to kill the process on OOM).
> 
> After recovering, when I execute the match all docs query (*:*), I get a
> different count each time.
> 
> In other words, if I execute q=*:* several times in a row, then I get a
> different count back for numDocs.
> 
> This was not the case prior to the failure as that is one thing we monitor
> for.
> 
> I think I should be worried ... any ideas on how to troubleshoot this? One
> thing to mention is that several of my replicas had to do full recoveries
> from the leader when they came back online. Indexing was happening when 
> the
> replicas failed.
> 
> Thanks.
> Tim
 



Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Mark Miller
What do you know about the # of docs you *should* have? Do you have that count 
when taking the bad replica out of the equation?

- Mark

On Apr 22, 2013, at 4:33 PM, Mark Miller  wrote:

> Bummer on the log loss :(
> 
> Good info though. Somehow that replica became active without actually 
> syncing? This is heavily tested (though not with OOM's I suppose), so I'm a 
> little surprised, but it's hard to speculate how it happened without the 
> logs. Specifically, the logs from the node that is off would be great - we would 
> see what it did when it recovered and why it might think it was in sync :(
> 
> - Mark
> 
> On Apr 22, 2013, at 2:19 PM, Timothy Potter  wrote:
> 
>> nm - can't read my own output - the leader had more docs than the replica ;-)
>> 
>> On Mon, Apr 22, 2013 at 11:42 AM, Timothy Potter  
>> wrote:
>>> Have a little more info about this ... the numDocs for *:* fluctuates
>>> between two values (difference of 324 docs) depending on which nodes I
>>> hit (distrib=true)
>>> 
>>> 589,674,416
>>> 589,674,092
>>> 
>>> Using distrib=false, I found 1 shard with a mis-match:
>>> 
>>> shard15: {
>>> leader = 32,765,254
>>> replica = 32,764,930 diff:324
>>> }
>>> 
>>> Interesting that the replica has more docs than the leader.
>>> 
>>> Unfortunately, due to some bad log management scripting on my part,
>>> the logs were lost when these instances got re-started, which really
>>> bums me out :-(
>>> 
>>> For now, I'm going to assume the replica with more docs is the one I
>>> want to keep and will replicate the full index over to the other one.
>>> Sorry about losing the logs :-(
>>> 
>>> Tim
>>> 
>>> 
>>> 
>>> 
>>> On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter  
>>> wrote:
 Thanks for responding Mark. I'll collect the information you asked
 about and open a JIRA once I have a little more understanding of what
 happened. Hopefully I can piece together some story after going over
 the logs.
 
 As for replica / leader, I suspect some leaders went down but
 fail-over to new leaders seemed to work fine. We lost about 9 nodes at
 once and continued to serve queries, which is awesome.
 
 On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller  
 wrote:
> Yeah, thats no good.
> 
> You might hit each node with distrib=false to get the doc counts.
> 
> Which ones have what you think are the right counts and which the wrong - 
> eg is it all replicas that are off, or leaders as well?
> 
> You say several replicas - do you mean no leaders went down?
> 
> You might look closer at the logs for a node that has it's count off.
> 
> Finally, I guess I'd try and track it in a JIRA issue.
> 
> - Mark
> 
> On Apr 19, 2013, at 6:37 PM, Timothy Potter  wrote:
> 
>> We had a rogue query take out several replicas in a large 4.2.0 cluster
>> today, due to OOM's (we use the JVM args to kill the process on OOM).
>> 
>> After recovering, when I execute the match all docs query (*:*), I get a
>> different count each time.
>> 
>> In other words, if I execute q=*:* several times in a row, then I get a
>> different count back for numDocs.
>> 
>> This was not the case prior to the failure as that is one thing we 
>> monitor
>> for.
>> 
>> I think I should be worried ... any ideas on how to troubleshoot this? 
>> One
>> thing to mention is that several of my replicas had to do full recoveries
>> from the leader when they came back online. Indexing was happening when 
>> the
>> replicas failed.
>> 
>> Thanks.
>> Tim
> 
> 



Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Timothy Potter
I ended up just nuking the index on the replica with less docs and
restarting it - which triggered the snap pull from the leader. So now
I'm in sync and have better processes in place to capture the
information if it happens again, which given some of the queries my UI
team develops, is highly likely ;-)

Also, all our input data to Solr lives in Hive so I'm doing some id
-to- id comparisons of what is in Solr vs. what is in Hive to find any
discrepancies.

Again, sorry about the loss of the logs. This is a tough scenario to
try to re-create as it was a perfect storm of high indexing throughput
and a rogue query.

Tim

On Mon, Apr 22, 2013 at 2:41 PM, Mark Miller  wrote:
> What do you know about the # of docs you *should* have? Do you have that count 
> when taking the bad replica out of the equation?
>
> - Mark
>
> On Apr 22, 2013, at 4:33 PM, Mark Miller  wrote:
>
>> Bummer on the log loss :(
>>
>> Good info though. Somehow that replica became active without actually 
>> syncing? This is heavily tested (though not with OOM's I suppose), so I'm a 
>> little surprised, but it's hard to speculate how it happened without the 
>> logs. Specifically, the logs from the node that is off would be great - we 
>> would see what it did when it recovered and why it might think it was in 
>> sync :(
>>
>> - Mark
>>
>> On Apr 22, 2013, at 2:19 PM, Timothy Potter  wrote:
>>
>>> nm - can't read my own output - the leader had more docs than the replica 
>>> ;-)
>>>
>>> On Mon, Apr 22, 2013 at 11:42 AM, Timothy Potter  
>>> wrote:
 Have a little more info about this ... the numDocs for *:* fluctuates
 between two values (difference of 324 docs) depending on which nodes I
 hit (distrib=true)

 589,674,416
 589,674,092

 Using distrib=false, I found 1 shard with a mis-match:

 shard15: {
 leader = 32,765,254
 replica = 32,764,930 diff:324
 }

 Interesting that the replica has more docs than the leader.

 Unfortunately, due to some bad log management scripting on my part,
 the logs were lost when these instances got re-started, which really
 bums me out :-(

 For now, I'm going to assume the replica with more docs is the one I
 want to keep and will replicate the full index over to the other one.
 Sorry about losing the logs :-(

 Tim




 On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter  
 wrote:
> Thanks for responding Mark. I'll collect the information you asked
> about and open a JIRA once I have a little more understanding of what
> happened. Hopefully I can piece together some story after going over
> the logs.
>
> As for replica / leader, I suspect some leaders went down but
> fail-over to new leaders seemed to work fine. We lost about 9 nodes at
> once and continued to serve queries, which is awesome.
>
> On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller  
> wrote:
>> Yeah, thats no good.
>>
>> You might hit each node with distrib=false to get the doc counts.
>>
>> Which ones have what you think are the right counts and which the wrong 
>> - eg is it all replicas that are off, or leaders as well?
>>
>> You say several replicas - do you mean no leaders went down?
>>
>> You might look closer at the logs for a node that has it's count off.
>>
>> Finally, I guess I'd try and track it in a JIRA issue.
>>
>> - Mark
>>
>> On Apr 19, 2013, at 6:37 PM, Timothy Potter  wrote:
>>
>>> We had a rogue query take out several replicas in a large 4.2.0 cluster
>>> today, due to OOM's (we use the JVM args to kill the process on OOM).
>>>
>>> After recovering, when I execute the match all docs query (*:*), I get a
>>> different count each time.
>>>
>>> In other words, if I execute q=*:* several times in a row, then I get a
>>> different count back for numDocs.
>>>
>>> This was not the case prior to the failure as that is one thing we 
>>> monitor
>>> for.
>>>
>>> I think I should be worried ... any ideas on how to troubleshoot this? 
>>> One
>>> thing to mention is that several of my replicas had to do full 
>>> recoveries
>>> from the leader when they came back online. Indexing was happening when 
>>> the
>>> replicas failed.
>>>
>>> Thanks.
>>> Tim
>>
>>
>


RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-22 Thread SandeepM
Chocolat Factry:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">77</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="chocolat">
        <int name="numFound">1</int>
        <int name="startOffset">0</int>
        <int name="endOffset">8</int>
        <int name="origFreq">615</int>
        <arr name="suggestion">
          <lst><str name="word">chocolate</str><int name="freq">6544</int></lst>
        </arr>
      </lst>
      <lst name="factry">
        <int name="numFound">5</int>
        <int name="startOffset">9</int>
        <int name="endOffset">15</int>
        <int name="origFreq">6</int>
        <arr name="suggestion">
          <lst><str name="word">factory</str><int name="freq">23614</int></lst>
          <lst><str name="word">factor</str><int name="freq">5128</int></lst>
          <lst><str name="word">factus</str><int name="freq">290</int></lst>
          <lst><str name="word">factum</str><int name="freq">178</int></lst>
          <lst><str name="word">factae</str><int name="freq">102</int></lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
      <lst name="collation">
        <str name="collationQuery">chocolate factory</str>
        <int name="hits">85</int>
        <lst name="misspellingsAndCorrections">
          <str name="chocolat">chocolate</str>
          <str name="factry">factory</str>
        </lst>
      </lst>
    </lst>
  </lst>
</response>

Pursut Hapyness:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="pursut">
        <int name="numFound">5</int>
        <int name="startOffset">0</int>
        <int name="endOffset">6</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst><str name="word">pursuit</str><int name="freq">1209</int></lst>
          <lst><str name="word">pursue</str><int name="freq">108</int></lst>
          <lst><str name="word">pursit</str><int name="freq">1</int></lst>
          <lst><str name="word">perdut</str><int name="freq">94</int></lst>
          <lst><str name="word">purdue</str><int name="freq">70</int></lst>
        </arr>
      </lst>
      <lst name="hapyness">
        <int name="numFound">5</int>
        <int name="startOffset">7</int>
        <int name="endOffset">15</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst><str name="word">happyness</str><int name="freq">175</int></lst>
          <lst><str name="word">hapiness</str><int name="freq">62</int></lst>
          <lst><str name="word">hayness</str><int name="freq">1</int></lst>
          <lst><str name="word">happiness</str><int name="freq">7788</int></lst>
          <lst><str name="word">harkness</str><int name="freq">324</int></lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
      <lst name="collation">
        <str name="collationQuery">pursuit happyness</str>
        <int name="hits">10</int>
        <lst name="misspellingsAndCorrections">
          <str name="pursut">pursuit</str>
          <str name="hapyness">happyness</str>
        </lst>
      </lst>
    </lst>
  </lst>
</response>


Spellcheck is used separately and we are not using any q along with
spellcheck.

Our search query also queries other fields, not just spellcheck, and
therefore does not give a good representation of QTime.  We use grouping
in the search query.
For Chocolate Factory, I get a search QTime of 198ms
For Pursuit Happyness, I get a search QTime of 318ms

Would appreciate your insights.
Thanks.
-- Sandeep




--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058086.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr dynamic fields scalability

2013-04-22 Thread jhuffaker
Hi All,

I was curious how Lucene/Solr scales as the total number of non-stored fields
grows.  So, for example, if my average document has 50 fields on it, but the
total number of fields in the system is upwards of 100k and I query on one
of those fields: Will I see runtime that is proportional to the total number
of fields in the system?  Or will it be solely proportional to the corpus
size?

I've tried searching and running my own benchmarks, but all of my answers
have been unsatisfactory thus far.

Let me know if there are any parts of the question I can clarify.

Regards,
John



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-dynamic-fields-scalability-tp4058090.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-22 Thread Dyer, James
This doesn't make a lot of sense to me, as in both cases the very first 
collation it tries is the one it is returning.  So you're getting a very 
optimized spellcheck in both cases.  But it does have to issue both queries 2 
times:  the first time, it tries the user's main query and, finding there are not 
enough hits, it then tries the collation query to see how many hits that will 
return.  Could it be that these two queries are just less/more expensive and 
that difference gets magnified by running each twice?

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: SandeepM [mailto:skmi...@hotmail.com] 
Sent: Monday, April 22, 2013 4:04 PM
To: solr-user@lucene.apache.org
Subject: RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

Chocolat Factry:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">77</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="chocolat">
        <int name="numFound">1</int>
        <int name="startOffset">0</int>
        <int name="endOffset">8</int>
        <int name="origFreq">615</int>
        <arr name="suggestion">
          <lst><str name="word">chocolate</str><int name="freq">6544</int></lst>
        </arr>
      </lst>
      <lst name="factry">
        <int name="numFound">5</int>
        <int name="startOffset">9</int>
        <int name="endOffset">15</int>
        <int name="origFreq">6</int>
        <arr name="suggestion">
          <lst><str name="word">factory</str><int name="freq">23614</int></lst>
          <lst><str name="word">factor</str><int name="freq">5128</int></lst>
          <lst><str name="word">factus</str><int name="freq">290</int></lst>
          <lst><str name="word">factum</str><int name="freq">178</int></lst>
          <lst><str name="word">factae</str><int name="freq">102</int></lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
      <lst name="collation">
        <str name="collationQuery">chocolate factory</str>
        <int name="hits">85</int>
        <lst name="misspellingsAndCorrections">
          <str name="chocolat">chocolate</str>
          <str name="factry">factory</str>
        </lst>
      </lst>
    </lst>
  </lst>
</response>

Pursut Hapyness:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="pursut">
        <int name="numFound">5</int>
        <int name="startOffset">0</int>
        <int name="endOffset">6</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst><str name="word">pursuit</str><int name="freq">1209</int></lst>
          <lst><str name="word">pursue</str><int name="freq">108</int></lst>
          <lst><str name="word">pursit</str><int name="freq">1</int></lst>
          <lst><str name="word">perdut</str><int name="freq">94</int></lst>
          <lst><str name="word">purdue</str><int name="freq">70</int></lst>
        </arr>
      </lst>
      <lst name="hapyness">
        <int name="numFound">5</int>
        <int name="startOffset">7</int>
        <int name="endOffset">15</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst><str name="word">happyness</str><int name="freq">175</int></lst>
          <lst><str name="word">hapiness</str><int name="freq">62</int></lst>
          <lst><str name="word">hayness</str><int name="freq">1</int></lst>
          <lst><str name="word">happiness</str><int name="freq">7788</int></lst>
          <lst><str name="word">harkness</str><int name="freq">324</int></lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
      <lst name="collation">
        <str name="collationQuery">pursuit happyness</str>
        <int name="hits">10</int>
        <lst name="misspellingsAndCorrections">
          <str name="pursut">pursuit</str>
          <str name="hapyness">happyness</str>
        </lst>
      </lst>
    </lst>
  </lst>
</response>



Spellcheck is used separately and we are not using any q along with
spellcheck.

Our search query also queries other fields, not just spellcheck, and
therefore does not give a good representation of QTime.  We use grouping
in the search query.
For Chocolate Factory, I get a search QTime of 198ms
For Pursuit Happyness, I get a search QTime of 318ms

Would appreciate your insights.
Thanks.
-- Sandeep




--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058086.html
Sent from the Solr - User mailing list archive at Nabble.com.




Soft Commit and Document Cache

2013-04-22 Thread Niran Fajemisin
Hi all,

A quick (and hopefully simple) question: Does the document cache (or any of the 
other caches for that matter) get invalidated after a soft commit has been 
performed?

Thanks,
Niran

Re: Soft Commit and Document Cache

2013-04-22 Thread Mark Miller
Yup - all of the top level caches are. It's a trade off - don't NRT more than 
you need to.

- Mark

On Apr 22, 2013, at 6:16 PM, Niran Fajemisin  wrote:

> Hi all,
> 
> A quick (and hopefully simple) question: Does the document cache (or any of 
> the other caches for that matter) get invalidated after a soft commit has 
> been performed?
> 
> Thanks,
> Niran



Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Mark Miller
No worries, thanks for the info. Let me know if you gain any more insight! I'd 
love to figure out what happened here and address it. And I'm especially 
interested in knowing if you lost any updates if you are able to determine that.

- Mark

On Apr 22, 2013, at 5:02 PM, Timothy Potter  wrote:

> I ended up just nuking the index on the replica with less docs and
> restarting it - which triggered the snap pull from the leader. So now
> I'm in sync and have better processes in place to capture the
> information if it happens again, which given some of the queries my UI
> team develops, is highly likely ;-)
> 
> Also, all our input data to Solr lives in Hive so I'm doing some id
> -to- id comparisons of what is in Solr vs. what is in Hive to find any
> discrepancies.
> 
> Again, sorry about the loss of the logs. This is a tough scenario to
> try to re-create as it was a perfect storm of high indexing throughput
> and a rogue query.
> 
> Tim
> 
> On Mon, Apr 22, 2013 at 2:41 PM, Mark Miller  wrote:
>> What do you know about the # of docs you *should* have? Do you have that count 
>> when taking the bad replica out of the equation?
>> 
>> - Mark
>> 
>> On Apr 22, 2013, at 4:33 PM, Mark Miller  wrote:
>> 
>>> Bummer on the log loss :(
>>> 
>>> Good info though. Somehow that replica became active without actually 
>>> syncing? This is heavily tested (though not with OOM's I suppose), so I'm a 
>>> little surprised, but it's hard to speculate how it happened without the 
>>> logs. Specifically, the logs from the node that is off would be great - we 
>>> would see what it did when it recovered and why it might think it was in 
>>> sync :(
>>> 
>>> - Mark
>>> 
>>> On Apr 22, 2013, at 2:19 PM, Timothy Potter  wrote:
>>> 
 nm - can't read my own output - the leader had more docs than the replica 
 ;-)
 
 On Mon, Apr 22, 2013 at 11:42 AM, Timothy Potter  
 wrote:
> Have a little more info about this ... the numDocs for *:* fluctuates
> between two values (difference of 324 docs) depending on which nodes I
> hit (distrib=true)
> 
> 589,674,416
> 589,674,092
> 
> Using distrib=false, I found 1 shard with a mis-match:
> 
> shard15: {
> leader = 32,765,254
> replica = 32,764,930 diff:324
> }
> 
> Interesting that the replica has more docs than the leader.
> 
> Unfortunately, due to some bad log management scripting on my part,
> the logs were lost when these instances got re-started, which really
> bums me out :-(
> 
> For now, I'm going to assume the replica with more docs is the one I
> want to keep and will replicate the full index over to the other one.
> Sorry about losing the logs :-(
> 
> Tim
> 
> 
> 
> 
> On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter  
> wrote:
>> Thanks for responding Mark. I'll collect the information you asked
>> about and open a JIRA once I have a little more understanding of what
>> happened. Hopefully I can piece together some story after going over
>> the logs.
>> 
>> As for replica / leader, I suspect some leaders went down but
>> fail-over to new leaders seemed to work fine. We lost about 9 nodes at
>> once and continued to serve queries, which is awesome.
>> 
>> On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller  
>> wrote:
>>> Yeah, thats no good.
>>> 
>>> You might hit each node with distrib=false to get the doc counts.
>>> 
>>> Which ones have what you think are the right counts and which the wrong 
>>> - eg is it all replicas that are off, or leaders as well?
>>> 
>>> You say several replicas - do you mean no leaders went down?
>>> 
>>> You might look closer at the logs for a node that has it's count off.
>>> 
>>> Finally, I guess I'd try and track it in a JIRA issue.
>>> 
>>> - Mark
>>> 
>>> On Apr 19, 2013, at 6:37 PM, Timothy Potter  
>>> wrote:
>>> 
 We had a rogue query take out several replicas in a large 4.2.0 cluster
 today, due to OOM's (we use the JVM args to kill the process on OOM).
 
 After recovering, when I execute the match all docs query (*:*), I get 
 a
 different count each time.
 
 In other words, if I execute q=*:* several times in a row, then I get a
 different count back for numDocs.
 
 This was not the case prior to the failure as that is one thing we 
 monitor
 for.
 
 I think I should be worried ... any ideas on how to troubleshoot this? 
 One
 thing to mention is that several of my replicas had to do full 
 recoveries
 from the leader when they came back online. Indexing was happening 
 when the
 replicas failed.
 
 Thanks.
 Tim
>>> 
>>> 
>> 



Re: Soft Commit and Document Cache

2013-04-22 Thread Shawn Heisey

On 4/22/2013 4:16 PM, Niran Fajemisin wrote:

A quick (and hopefully simple) question: Does the document cache (or any of the 
other caches for that matter) get invalidated after a soft commit has been 
performed?


All Solr caches are invalidated when you issue a commit with 
openSearcher set to true.  There would be no reason to do a soft commit 
with openSearcher set to false.  That setting only makes sense with hard 
commits.


If you have queries defined for the newSearcher event, then they will be 
run, which can pre-populate caches.
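
For reference, a minimal newSearcher listener in solrconfig.xml looks 
something like this (the query itself is a placeholder - use something typical 
for your application):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">some common query</str></lst>
  </arr>
</listener>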


The filterCache and queryResultCache can be autowarmed on commit - the 
most relevant autowarmCount queries in the cache from the old searcher 
are re-run against the new searcher.  The queryResultWindowSize 
parameter helps control exactly what gets cached with the queryResultCache.


The documentCache cannot be autowarmed, although I *think* that when 
entries from the queryResultCache are run, it will also populate the 
documentCache, though I could be wrong about that.


I do not know whether autowarming is done before or after newSearcher 
queries.


http://wiki.apache.org/solr/SolrCaching

Thanks,
Shawn



Re: Soft Commit and Document Cache

2013-04-22 Thread Niran Fajemisin
Thanks Shawn and Mark! That was very helpful.

-Niran



>
> From: Shawn Heisey 
>To: solr-user@lucene.apache.org 
>Sent: Monday, April 22, 2013 5:30 PM
>Subject: Re: Soft Commit and Document Cache
> 
>
>On 4/22/2013 4:16 PM, Niran Fajemisin wrote:
>> A quick (and hopefully simple) question: Does the document cache (or any of 
>> the other caches for that matter) get invalidated after a soft commit has 
>> been performed?
>
>All Solr caches are invalidated when you issue a commit with 
>openSearcher set to true.  There would be no reason to do a soft commit 
>with openSearcher set to false.  That setting only makes sense with hard 
>commits.
>
>If you have queries defined for the newSearcher event, then they will be 
>run, which can pre-populate caches.
>
>The filterCache and queryResultCache can be autowarmed on commit - the 
>most relevant autowarmCount queries in the cache from the old searcher 
>are re-run against the new searcher.  The queryResultWindowSize 
>parameter helps control exactly what gets cached with the queryResultCache.
>
>The documentCache cannot be autowarmed, although I *think* that when 
>entries from the queryResultCache are run, it will also populate the 
>documentCache, though I could be wrong about that.
>
>I do not know whether autowarming is done before or after newSearcher 
>queries.
>
>http://wiki.apache.org/solr/SolrCaching
>
>Thanks,
>Shawn
>
>
>
>

Re: Error creating collection

2013-04-22 Thread Erick Erickson
What version of Solr? More context for the stack trace?

You might want to review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Mon, Apr 22, 2013 at 5:33 AM, yriveiro  wrote:
> I get this exception when I try to create a new collection. Does someone have
> any idea what's going on?
>
> org.apache.solr.common.SolrException: Error CREATEing SolrCore 'RPS_12':
> Could not get shard_id for core: RPS_12
> coreNodeName:192.168.20.48:8983_solr_RPS_12
>
>
>
> -
> Best regards
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Error-creating-collection-tp4057859.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Where to use replicationFactor and maxShardsPerNode at SolrCloud?

2013-04-22 Thread Erick Erickson
bq: However, what will happen to those 10 nodes when I specify the replication factor?


I think they just sit around doing nothing.
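
For reference, both parameters are set when creating the collection via the 
Collections API, something like this (host and names are placeholders):

http://host:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=10&replicationFactor=2&maxShardsPerNode=1

With replicationFactor=2 (a leader plus one replica per shard), 20 of your 30 
nodes get assigned and the remaining 10 are left alone.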

Best
Erick

On Mon, Apr 22, 2013 at 7:24 AM, Furkan KAMACI  wrote:
> Sorry, but if I have 10 shards and a collection with a replication factor of 1,
> and if I start up 30 nodes, what happens to those last 10 nodes? I mean:
>
> 10 nodes as leader
> 10 nodes as replica
>
> if I don't specify the replication factor, a round-robin system would assign
> the other 10 machines as:
> + 10 nodes as replica
>
> However, what will happen to those 10 nodes when I specify the replication factor?
>
>
> 2013/4/22 Erick Erickson 
>
>> 1) Imagine you have lots and lots and lots of different Solr indexes
>> and a 50 node cluster. Further imagine that one of those indexes has 2
>> shards, and a leader + shard is adequate to handle the load. You need
>> some way to limit the number of nodes your index gets distributed to,
>> that's what replicationFactor is for. So in this case
>> replicationFactor=2 will stop assigning nodes to that particular
>> collection after there's a leader + 1 replica
>>
>> 2> In the system you described, there won't be more than one
>> shard/node. But one strategy for growth is to "overshard". That is, in
>> the early days you put (numbers from thin air) 10 shards/node and they
>> are all quite small. As your index grows, you move to two nodes with 5
>> shards each. And later to 5 nodes with 2 shards and so on. There are
>> cases where you want some way to make the most of your hardware yet
>> plan for expansion.
>>
>> Best
>> Erick
>>
>> On Sun, Apr 21, 2013 at 3:51 PM, Furkan KAMACI 
>> wrote:
>> > I know that: when using SolrCloud we define the number of shards into the
>> > system. When we start up new Solr instances each one will be a leader
>> for
>> > a shard, and if I continue to start up new Solr instances (that has
>> > exceeded the number number of shards) each one will be a replica for each
>> > leader as a round robin process.
>> >
>> > However when I read wiki there are two parameters: *replicationFactor
>> *and *
>> > maxShardsPerNode.
>> >
>> > *1) Can you give details about what they are. If every newly added Solr
>> > instance becomes a replica, what is the replication factor for?
>> > 2) If what I wrote is true about that round robin process what is that *
>> > maxShardsPerNode*? How can be more than one shard at the system I
>> described?
>>


Re: ranking score by fields

2013-04-22 Thread Erick Erickson
You can sometimes use the highlighter component to do this, but it's a
little tricky...

But note your syntax isn't doing what you expect.
(field1:apache solr) parses as field1:apache defaultfield:solr. You want
field1:(apache solr)

&debug=all is your friend for these kinds of things, especially the parsed query
section
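
For example, something like this (URL-encoding left out for readability; host 
is a placeholder):

http://localhost:8983/solr/select?q=field1:(apache solr) OR field2:(apache solr) OR field3:(apache solr)&debug=all

Then look at the "explain" section of the debug output to see which clauses 
matched and how they scored.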

Best
Erick

On Mon, Apr 22, 2013 at 4:44 AM, Каскевич Александр
 wrote:
> Hi.
> I want to do what the subject says, but don't know exactly how I can do it.
> Example.
> I have index with field1, field2, field3.
> I make a query like:
> (field1:apache solr) OR (field2:apache solr) OR (field3:apache solr)
> And I want to know: did it find this doc via field1, field2, or field3?
>
> I tried to make it like this: (field1:apache solr)^100 OR (field2:apache solr)^10 
> OR (field3:apache solr)^1
> But the problem is that I don't know the range, minimum and maximum value of 
> the score for each field.
> With other types of similarities (BM25 or others) it is the same situation.
> I can't find information about this in the manual.
>
> Also, I tried to use relevance functions, e.g. "termfreq", but it works only with 
> terms, not with phrases like "apache solr".
>
> Maybe I am missing something, or do you have another idea how to do this?
> Also, I am not a Java programmer, so the best option for me is to not write any 
> plugins for Solr.
>
> Thanks.
> Alex.


Export Index and Re-Index XML

2013-04-22 Thread Kalyan Kuram
Hi All, I am new to Solr and I wanted to know if I can export the index as XML 
and then re-index it back into Solr. The reason I need to do this is that I 
misconfigured a fieldtype, and to make it work I need to re-index the content.
Kalyan

Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Sudhakar Maddineni
We had encountered a similar issue a few days back with the 4.0-Beta version.
We have a 6-node, 3-shard cluster setup, and one of our replica
servers [Tomcat] was not responding to any requests because it reached the
max number of threads [200 default]. To temporarily fix the issue, we had
to restart the server. After restarting, we realized that there were 2
Tomcat processes running [old one + new one], so we manually killed the two
Tomcat processes and had a clean start. And we observed that the numDocs of the
replica server did not match the count on the leader.
So, is this discrepancy because we manually killed the process, which
interrupted the sync process?

Thx, Sudhakar.




On Mon, Apr 22, 2013 at 3:28 PM, Mark Miller  wrote:

> No worries, thanks for the info. Let me know if you gain any more insight!
> I'd love to figure out what happened here and address it. And I'm
> especially interested in knowing if you lost any updates if you are able to
> determine that.
>
> - Mark
>
> On Apr 22, 2013, at 5:02 PM, Timothy Potter  wrote:
>
> > I ended up just nuking the index on the replica with less docs and
> > restarting it - which triggered the snap pull from the leader. So now
> > I'm in sync and have better processes in place to capture the
> > information if it happens again, which given some of the queries my UI
> > team develops, is highly likely ;-)
> >
> > Also, all our input data to Solr lives in Hive so I'm doing some id
> > -to- id comparisons of what is in Solr vs. what is in Hive to find any
> > discrepancies.
> >
> > Again, sorry about the loss of the logs. This is a tough scenario to
> > try to re-create as it was a perfect storm of high indexing throughput
> > and a rogue query.
> >
> > Tim
> >
> > On Mon, Apr 22, 2013 at 2:41 PM, Mark Miller 
> wrote:
> >> What do you know about the # of docs you *should* have? Do you have that
> count when taking the bad replica out of the equation?
> >>
> >> - Mark
> >>
> >> On Apr 22, 2013, at 4:33 PM, Mark Miller  wrote:
> >>
> >>> Bummer on the log loss :(
> >>>
> >>> Good info though. Somehow that replica became active without actually
> syncing? This is heavily tested (though not with OOM's I suppose), so I'm a
> little surprised, but it's hard to speculate how it happened without the
> logs. Specifically, the logs from the node that is off would be great - we
> would see what it did when it recovered and why it might think it was in
> sync :(
> >>>
> >>> - Mark
> >>>
> >>> On Apr 22, 2013, at 2:19 PM, Timothy Potter 
> wrote:
> >>>
>  nm - can't read my own output - the leader had more docs than the
> replica ;-)
> 
>  On Mon, Apr 22, 2013 at 11:42 AM, Timothy Potter <
> thelabd...@gmail.com> wrote:
> > Have a little more info about this ... the numDocs for *:* fluctuates
> > between two values (difference of 324 docs) depending on which nodes
> I
> > hit (distrib=true)
> >
> > 589,674,416
> > 589,674,092
> >
> > Using distrib=false, I found 1 shard with a mis-match:
> >
> > shard15: {
> > leader = 32,765,254
> > replica = 32,764,930 diff:324
> > }
> >
> > Interesting that the replica has more docs than the leader.
> >
> > Unfortunately, due to some bad log management scripting on my part,
> > the logs were lost when these instances got re-started, which really
> > bums me out :-(
> >
> > For now, I'm going to assume the replica with more docs is the one I
> > want to keep and will replicate the full index over to the other one.
> > Sorry about losing the logs :-(
> >
> > Tim
> >
> >
> >
> >
> > On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter <
> thelabd...@gmail.com> wrote:
> >> Thanks for responding Mark. I'll collect the information you asked
> >> about and open a JIRA once I have a little more understanding of
> what
> >> happened. Hopefully I can piece together some story after going over
> >> the logs.
> >>
> >> As for replica / leader, I suspect some leaders went down but
> >> fail-over to new leaders seemed to work fine. We lost about 9 nodes
> at
> >> once and continued to serve queries, which is awesome.
> >>
> >> On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller <
> markrmil...@gmail.com> wrote:
> >>> Yeah, thats no good.
> >>>
> >>> You might hit each node with distrib=false to get the doc counts.
> >>>
> >>> Which ones have what you think are the right counts and which the
> wrong - eg is it all replicas that are off, or leaders as well?
> >>>
> >>> You say several replicas - do you mean no leaders went down?
> >>>
> >>> You might look closer at the logs for a node that has it's count
> off.
> >>>
> >>> Finally, I guess I'd try and track it in a JIRA issue.
> >>>
> >>> - Mark
> >>>
> >>> On Apr 19, 2013, at 6:37 PM, Timothy Potter 
> wrote:
> >>>
>  We had a rogue query take out several replicas in a large 4.2.0
> cluster
> 

Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Timothy Potter
Hi Sudhakar,

Unfortunately, we don't know the underlying cause and I lost the logs
that could have helped diagnose further. FWIW, I think this is an
extreme case as I've lost nodes before and haven't had any
discrepancies after recovering. In my case, it was a perfect storm of
high throughput indexing 8-10K docs/sec and an very nasty query that
OOM'd on about half the nodes in my cluster. Of course we want to get
to the bottom of this but it's going to be hard to reproduce. The good
news is I'm recovered and the cluster is consistent again.

Cheers,
Tim

On Mon, Apr 22, 2013 at 5:18 PM, Sudhakar Maddineni
 wrote:
> We had encountered a similar issue a few days back with the 4.0-Beta version.
> We have a 6-node, 3-shard cluster setup, and one of our replica
> servers [Tomcat] was not responding to any requests because it reached the
> max number of threads [200 default]. To temporarily fix the issue, we had
> to restart the server. After restarting, we realized that there were 2
> Tomcat processes running [old one + new one], so we manually killed the two
> Tomcat processes and had a clean start. And we observed that the numDocs of the
> replica server did not match the count on the leader.
> So, is this discrepancy because we manually killed the process, which
> interrupted the sync process?
>
> Thx, Sudhakar.
>
>
>
>
> On Mon, Apr 22, 2013 at 3:28 PM, Mark Miller  wrote:
>
>> No worries, thanks for the info. Let me know if you gain any more insight!
>> I'd love to figure out what happened here and address it. And I'm
>> especially interested in knowing if you lost any updates if you are able to
>> determine that.
>>
>> - Mark
>>
>> On Apr 22, 2013, at 5:02 PM, Timothy Potter  wrote:
>>
>> > I ended up just nuking the index on the replica with less docs and
>> > restarting it - which triggered the snap pull from the leader. So now
>> > I'm in sync and have better processes in place to capture the
>> > information if it happens again, which given some of the queries my UI
>> > team develops, is highly likely ;-)
>> >
>> > Also, all our input data to Solr lives in Hive so I'm doing some id
>> > -to- id comparisons of what is in Solr vs. what is in Hive to find any
>> > discrepancies.
>> >
>> > Again, sorry about the loss of the logs. This is a tough scenario to
>> > try to re-create as it was a perfect storm of high indexing throughput
>> > and a rogue query.
>> >
>> > Tim
>> >
>> > On Mon, Apr 22, 2013 at 2:41 PM, Mark Miller 
>> wrote:
>> >> What do you know about the # of docs you *should* have? Do you have that
>> count when taking the bad replica out of the equation?
>> >>
>> >> - Mark
>> >>
>> >> On Apr 22, 2013, at 4:33 PM, Mark Miller  wrote:
>> >>
>> >>> Bummer on the log loss :(
>> >>>
>> >>> Good info though. Somehow that replica became active without actually
>> syncing? This is heavily tested (though not with OOM's I suppose), so I'm a
>> little surprised, but it's hard to speculate how it happened without the
>> logs. Specifically, the logs from the node that is off would be great - we
>> would see what it did when it recovered and why it might think it was in
>> sync :(
>> >>>
>> >>> - Mark
>> >>>
>> >>> On Apr 22, 2013, at 2:19 PM, Timothy Potter 
>> wrote:
>> >>>
>>  nm - can't read my own output - the leader had more docs than the
>> replica ;-)
>> 
>>  On Mon, Apr 22, 2013 at 11:42 AM, Timothy Potter <
>> thelabd...@gmail.com> wrote:
>> > Have a little more info about this ... the numDocs for *:* fluctuates
>> > between two values (difference of 324 docs) depending on which nodes
>> I
>> > hit (distrib=true)
>> >
>> > 589,674,416
>> > 589,674,092
>> >
>> > Using distrib=false, I found 1 shard with a mis-match:
>> >
>> > shard15: {
>> > leader = 32,765,254
>> > replica = 32,764,930 diff:324
>> > }
>> >
>> > Interesting that the replica has more docs than the leader.
>> >
>> > Unfortunately, due to some bad log management scripting on my part,
>> > the logs were lost when these instances got re-started, which really
>> > bums me out :-(
>> >
>> > For now, I'm going to assume the replica with more docs is the one I
>> > want to keep and will replicate the full index over to the other one.
>> > Sorry about losing the logs :-(
>> >
>> > Tim
>> >
>> >
>> >
>> >
>> > On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter <
>> thelabd...@gmail.com> wrote:
>> >> Thanks for responding Mark. I'll collect the information you asked
>> >> about and open a JIRA once I have a little more understanding of
>> what
>> >> happened. Hopefully I can piece together some story after going over
>> >> the logs.
>> >>
>> >> As for replica / leader, I suspect some leaders went down but
>> >> fail-over to new leaders seemed to work fine. We lost about 9 nodes
>> at
>> >> once and continued to serve queries, which is awesome.
>> >>
>> >> On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller <
>>

Re: Export Index and Re-Index XML

2013-04-22 Thread Shawn Heisey

On 4/22/2013 5:07 PM, Kalyan Kuram wrote:

Hi All, I am new to Solr and I wanted to know if I can export the index as XML 
and then re-index it back into Solr. The reason I need to do this is that I 
misconfigured a fieldtype, and to make it work I need to re-index the content.


The best option is to do the indexing again from whatever source you did 
the index from the first time.  Because your requirements may change at 
any time, this is something that you should be prepared to do quite often.


If you did not set all fields to stored="true" in your schema, then you 
will not be able to export all your documents from your current index to 
a new one.  There is no way around this, you will have to wipe your 
index, go back to your original data source, and do the indexing again.


If you DID store all your fields, then you have two choices.

1) Use the dataimport handler with SolrEntityProcessor.  You can use 
this to import from one core into another core on the same server with a 
different config/schema, or from one server to another.


http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor
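
A minimal data-config for that might look like this (the url and core name are 
placeholders):

<dataConfig>
  <document>
    <entity name="sep" processor="SolrEntityProcessor"
            url="http://oldhost:8983/solr/oldcore"
            query="*:*" rows="500" fl="*"/>
  </document>
</dataConfig>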

2) I don't recommend this option, but it might work.  You can query Solr 
for your docs, one page at a time (use the rows and start parameters), 
with wt=xml or wt=json, and save that output.  With a little bit of 
modification, you can then use what you save as input for indexing. 
Here's a website describing the process and PHP script to make it 
easier.  I have not checked to see whether the script actually works, 
and I won't be able to help you with it:


http://www.jason-palmer.com/2011/05/how-to-reindex-a-solr-database/
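
The paging part of option 2 would be something like this, incrementing start 
by rows on each request (host is a placeholder):

curl "http://localhost:8983/solr/select?q=*:*&wt=xml&fl=*&start=0&rows=500"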

Thanks,
Shawn



Re: Export Index and Re-Index XML

2013-04-22 Thread Jack Krupansky
Any fields which have stored values can be read and output, but 
indexed-only, non-stored fields cannot be read or exported. Even if they 
could be, their values are post-analysis, which means that there is a good 
chance that they cannot be run through term analysis again.


It is always best to keep a copy of your raw source data separate from the 
data you add to Solr. Or, at least make sure any important data is "stored".


In short, you need to model your data for "reindexing", which is a fact of 
life in Solr land.


-- Jack Krupansky

-Original Message- 
From: Kalyan Kuram

Sent: Monday, April 22, 2013 7:07 PM
To: solr-user@lucene.apache.org
Subject: Export Index and Re-Index XML

Hi All, I am new to Solr and I wanted to know if I can export the index as XML 
and then re-index it back into Solr. The reason I need to do this is that I 
misconfigured a fieldtype, and to make it work I need to re-index the content.
Kalyan 



Too many close, count -1

2013-04-22 Thread yriveiro
Hi,

Reviewing the Solr log I found these messages.

The Solr version is 4.2.1, running in Tomcat 7.

4973652:SEVERE: Too many close [count:-1] on
org.apache.solr.core.SolrCore@5795a627. Please report this exception to
solr-user@lucene.apache.org
5003386:SEVERE: REFCOUNT ERROR: unreferenced
org.apache.solr.core.SolrCore@5795a627 () has a reference count of -1

2965529:SEVERE: Too many close [count:-1] on
org.apache.solr.core.SolrCore@7722b49b. Please report this exception to
solr-user@lucene.apache.org
52965531:SEVERE: Too many close [count:-1] on
org.apache.solr.core.SolrCore@32530662. Please report this exception to
solr-user@lucene.apache.org
52965533:SEVERE: Too many close [count:-1] on
org.apache.solr.core.SolrCore@144e2972. Please report this exception to
solr-user@lucene.apache.org
52971283:SEVERE: Too many close [count:-1] on
org.apache.solr.core.SolrCore@1705c88e. Please report this exception to
solr-user@lucene.apache.org
52978567:SEVERE: Too many close [count:-1] on
org.apache.solr.core.SolrCore@c200c62. Please report this exception to
solr-user@lucene.apache.org



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-close-count-1-tp4058129.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Export Index and Re-Index XML

2013-04-22 Thread Kalyan Kuram
Thank you all very much for your help. I do have fields configured as stored and 
indexed, and I did read the FAQ on the wiki. I think SolrEntityProcessor is what 
is needed. I am trying to index the data from Adobe CQ; it is push-based 
indexing, and it is a pain to index data from a very large repository. I think I 
can manage with SolrEntityProcessor for now and will think about modelling data 
for re-indexing purposes.
Kalyan

> From: j...@basetechnology.com
> To: solr-user@lucene.apache.org
> Subject: Re: Export Index and Re-Index XML
> Date: Mon, 22 Apr 2013 19:54:26 -0400
> 
> Any fields which have stored values can be read and output, but 
> indexed-only, non-stored fields cannot be read or exported. Even if they 
> could be, their values are post-analysis, which means that there is a good 
> chance that they cannot be run through term analysis again.
> 
> It is always best to keep a copy of your raw source data separate from the 
> data you add to Solr. Or, at least make sure any important data is "stored".
> 
> In short, you need to model your data for "reindexing", which is a fact of 
> life in Solr land.
> 
> -- Jack Krupansky
> 
> -Original Message- 
> From: Kalyan Kuram
> Sent: Monday, April 22, 2013 7:07 PM
> To: solr-user@lucene.apache.org
> Subject: Export Index and Re-Index XML
> 
> Hi All,
> I am new to Solr and I wanted to know if I can export the index as XML and 
> then re-index it back into Solr. The reason I need to do this is that I 
> misconfigured a field type, and to make it work I need to re-index the content.
> Kalyan 
> 
  

Re: Support of field variants in solr

2013-04-22 Thread Alexandre Rafalovitch
To route different languages, you could use different request handlers
and do different alias mapping. There are two alias mapping:
On the way in for eDisMax:
https://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming
On the way out: https://wiki.apache.org/solr/CommonQueryParameters#Field_alias

Between the two, you can make sure that all searches to /searchES map
'content' field to 'content_es' and for /searchDE map 'content' to
'content_de'.
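
For example, a sketch of what the /searchES handler could look like in 
solrconfig.xml (the handler definition itself is an assumption; f.content.qf 
is the eDisMax alias parameter, and fl=content:content_es renames the field 
on the way out):

  <requestHandler name="/searchES" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="qf">content</str>
      <str name="f.content.qf">content_es</str>
      <str name="fl">id,score,content:content_es</str>
    </lst>
  </requestHandler>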

Hope this helps,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Mon, Apr 22, 2013 at 2:31 PM, Timo Schmidt  wrote:
> Hi together,
>
> I am Timo and I work for a Solr implementation company. During our last 
> projects we realized that we need to be able to generate different 
> variants of a document.
>
> Example 1 (Language):
>
> To handle all documents in one solr core, we need a field variant for each 
> language.
>
>
> <field name="content" variant="es">content for spanish content</field>
>
> <field name="content" variant="de">content for german content</field>
>
>
> Each of these fields can be configured in the Solr schema to behave 
> optimally for the specific target language.
>
> Example 2 (Stores):
>
> We have customers who want to sell the same product in different stores for 
> different prices.
>
>
> <field name="price" variant="frankfurt">price in frankfurt</field>
>
> <field name="price" variant="paris">price in paris</field>
>
> To solve this in an optimal way, it would be nice if this worked completely 
> transparently inside Solr by defining a „variantQuery“.
>
> A select query could look like this:
>
> select?variantQuery=fr&qf=price,content
>
> Additionally, the following is possible: when no variant is present, the 
> behaviour should be as before, so the field is relevant for all queries.
>
> The setting variant=“*“ would mean that several wildcard variants can be 
> defined in a committed document. This makes sense when the data type is the 
> same for all variants and you have many variants (like in the price 
> example).
>
> The same should be possible at indexing time as at query time.
>
> I know that we can do something like this with dynamic fields, but then we 
> need to resolve the concrete fields at index and query time at the 
> application level (see the sketch after my questions below). That is 
> possible, but it would be nicer to have a concept like this in Solr. 
> Working with facets is also easier with this approach, since the concrete 
> field name does not need to be known by the application.
>
> So my questions are:
>
> What do you think about this approach?
> Is it better to work with dynamic fields? Is that reasonable when you have 
> 200 or more variants of a document?
> What needs to be done in solr to have something like this variant attribute 
> for fields?
> Do you have other approaches?
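>
> For illustration, the dynamic field alternative mentioned above would look 
> roughly like this in the schema (the names are examples):
>
>   <dynamicField name="content_*" type="text_general" indexed="true" stored="true" />
>   <dynamicField name="price_*" type="float" indexed="true" stored="true" />
>
> with the application resolving the concrete name, e.g. q=content_es:term.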


SSLInitializationException on startup

2013-04-22 Thread Van Tassell, Kristian
I'm configuring a number of servers to support Solr 4.2 and have come across 
one that will not start. This is a pre-existing application server (running 
Tomcat) and I'm not quite sure what to look for. Has anyone seen this before 
and solved it?

Thanks in advance!

INFO: Creating new http client, 
config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false
Apr 22, 2013 9:00:01 PM org.apache.solr.servlet.SolrDispatchFilter init
SEVERE: Could not start Solr. Check solr/home property and the logs
Apr 22, 2013 9:00:01 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.http.conn.ssl.SSLInitializationException: Failure 
initializing default system SSL context
   at 
org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:368)
   at 
org.apache.http.conn.ssl.SSLSocketFactory.getSystemSocketFactory(SSLSocketFactory.java:204)
   at 
org.apache.http.impl.conn.SchemeRegistryFactory.createSystemDefault(SchemeRegistryFactory.java:82)
   at 
org.apache.http.impl.client.SystemDefaultHttpClient.createClientConnectionManager(SystemDefaultHttpClient.java:118)
   at 
org.apache.http.impl.client.AbstractHttpClient.getConnectionManager(AbstractHttpClient.java:466)
   at 
org.apache.solr.client.solrj.impl.HttpClientUtil.setMaxConnections(HttpClientUtil.java:179)
   at 
org.apache.solr.client.solrj.impl.HttpClientConfigurer.configure(HttpClientConfigurer.java:33)
   at 
org.apache.solr.client.solrj.impl.HttpClientUtil.configureClient(HttpClientUtil.java:115)
   at 
org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:105)
   at 
org.apache.solr.handler.component.HttpShardHandlerFactory.init(HttpShardHandlerFactory.java:134)
   at 
org.apache.solr.core.CoreContainer.initShardHandler(CoreContainer.java:709)
   at 
org.apache.solr.core.CoreContainer.load(CoreContainer.java:438)
   at 
org.apache.solr.core.CoreContainer.load(CoreContainer.java:405)
   at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:337)
   at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:110)
   at 
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
   at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
   at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
   at 
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
   at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4624)
   at 
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5281)
   at 
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
   at 
org.apache.catalina.core.StandardContext.reload(StandardContext.java:3894)
   at 
org.apache.catalina.manager.ManagerServlet.reload(ManagerServlet.java:949)
   at 
org.apache.catalina.manager.HTMLManagerServlet.reload(HTMLManagerServlet.java:688)
   at 
org.apache.catalina.manager.HTMLManagerServlet.doPost(HTMLManagerServlet.java:216)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:641)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
   at 
org.apache.catalina.filters.CsrfPreventionFilter.doFilter(CsrfPreventionFilter.java:187)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
   at 
org.apache.catalina.filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:108)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
   at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
   at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
   at 
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:581)
   at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
   at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReport

RE: Bug? JSON output changes when switching to solr cloud

2013-04-22 Thread David Parks
Thanks Yonik! That was fast!
We switched over to XML for the moment and will switch back to JSON when 4.3
comes out.
Dave


-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Monday, April 22, 2013 8:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Bug? JSON output changes when switching to solr cloud

Thanks David,

I've confirmed this is still a problem in trunk and opened
https://issues.apache.org/jira/browse/SOLR-4746

-Yonik
http://lucidworks.com


On Sun, Apr 21, 2013 at 11:16 PM, David Parks 
wrote:
> We just took an installation of 4.1 which was working fine and changed 
> it to run as solr cloud. We encountered the most incredibly bizarre
apparent bug:
>
> In the JSON output, a colon ':' changed to a comma ',', which of 
> course broke the JSON parser.  I'm guessing I should file this as a 
> bug, but it was so odd I thought I'd post here before doing so. Demo
below:
>
> Here is a query on our previous single-server instance:
>
> Query:
> --
> http://10.1.3.28:8081/solr/select?q=book&fl=score%2Cid%2Cunique_catalog_name&start=0&rows=50&wt=json&group=true&group.field=unique_catalog_name&group.limit=50
>
> Response:
> -
> {"responseHeader":{"status":0,"QTime":15714,"params":{"fl":"score,id,u
> nique_ 
> catalog_name","start":"0","q":"book","group.limit":"50","group.field":
> "uniqu 
> e_catalog_name","group":"true","wt":"json","rows":"50"}},"grouped":{"u
> nique_ 
> catalog_name":{"matches":106711214,"groups":[{"groupValue":"ls:2653","
> doclis
> t":{"numFound":103981882,"start":0,"maxScore":4.7039795,"docs":[{"id":
> "10055
>
02088784","score":4.7039795},{"id":"1005500291075","score":4.7039795},{"id":
> "1000810546074","score":4.7039795},{"id":"1000611003270","score":4.703
> 9795},
>
> Note this part:
> --
>   {"unique_catalog_name":{"matches":
>
>
>
> Now we run that same query on a server that was derived from the same 
> build, just configuration changes to run it in distributed "solr cloud"
mode.
>
> Query:
> -
> http://10.1.3.18:8081/solr/select?q=book&fl=score%2Cid%2Cunique_catalog_name&start=0&rows=50&wt=json&group=true&group.field=unique_catalog_name&group.limit=50
>
> Response:
> -{"responseHeader":{"status":0,"QTime":8855,"params":{"fl"
> :"scor 
> e,id,unique_catalog_name","start":"0","q":"book","group.limit":"50","g
> roup.f 
> ield":"unique_catalog_name","group":"true","wt":"json","rows":"50"}},"
> groupe
> d":["unique_catalog_name",{"matches":106711214,"groups":[{"groupValue"
> :"ls:2 
> 653","doclist":{"numFound":103981882,"start":0,"maxScore":4.7042913,"d
> ocs":[
> {"id":"1005502088784","score":4.7042913},{"id":"1000611003270","score"
> :4.704
>
2913},{"id":"1005500291075","score":4.703668},{"id":"1000810546074","score":
> 4.703668},
>
> Note how it's changed:
> 
>   "unique_catalog_name",{"matches":
>
>
>
>



Re: Too many close, count -1

2013-04-22 Thread Yonik Seeley
Can you tell what operations cause this to happen?

I've added a comment to https://issues.apache.org/jira/browse/SOLR-4749
where we're looking at some related issues around CoreContainer, but
perhaps it should get its own issue.

-Yonik
http://lucidworks.com


On Mon, Apr 22, 2013 at 7:57 PM, yriveiro  wrote:
> Hi,
>
> Reviewing the solr's log I found this message.
>
> The solr version is 4.2.1, running in a tomcat 7
>
> 4973652:SEVERE: Too many close [count:-1] on
> org.apache.solr.core.SolrCore@5795a627. Please report this exception to
> solr-user@lucene.apache.org
> 5003386:SEVERE: REFCOUNT ERROR: unreferenced
> org.apache.solr.core.SolrCore@5795a627 () has a reference count of -1
>
> 2965529:SEVERE: Too many close [count:-1] on
> org.apache.solr.core.SolrCore@7722b49b. Please report this exception to
> solr-user@lucene.apache.org
> 52965531:SEVERE: Too many close [count:-1] on
> org.apache.solr.core.SolrCore@32530662. Please report this exception to
> solr-user@lucene.apache.org
> 52965533:SEVERE: Too many close [count:-1] on
> org.apache.solr.core.SolrCore@144e2972. Please report this exception to
> solr-user@lucene.apache.org
> 52971283:SEVERE: Too many close [count:-1] on
> org.apache.solr.core.SolrCore@1705c88e. Please report this exception to
> solr-user@lucene.apache.org
> 52978567:SEVERE: Too many close [count:-1] on
> org.apache.solr.core.SolrCore@c200c62. Please report this exception to
> solr-user@lucene.apache.org
>
>
>
> -
> Best regards
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Too-many-close-count-1-tp4058129.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Too many close, count -1

2013-04-22 Thread Chris Hostetter

: Can you tell what operations cause this to happen?

ie: what does your configuration look like? are you using any custom 
plugins? what types of features of solr do you use (faceting, grouping, 
highlighting, clustering, dih, etc...) ?  


-Hoss


Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-22 Thread Umesh Prasad
Sorry for the late reply. I was trying to change our indexing pipeline and do
explicit intermediate commits for each core. That turned out to be a bit
more work than I have time for.

So I do want to explore hard commits. I tried
http://<host>:<port>/solr/<core>/update?commit=true, but there is no
impact on the transaction log size, so I feel it must be getting ignored.

So can someone tell me how to do hard commits?

@Shawn: openSearcher=false is not an option. On each commit, the index will
be replicated to the slaves, which will open a searcher on it immediately and
can expose an intermediate state. The longer-term and better solution is
changing the indexing pipeline and doing explicit commits, but I can't
implement that right now.
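
For context, these are the two ways I know to request a hard commit (host,
core name, and the autoCommit values are placeholders):

  curl "http://localhost:8983/solr/core0/update?commit=true"

or in solrconfig.xml:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>60000</maxTime>   <!-- hard commit at most every 60 seconds -->
      <maxDocs>10000</maxDocs>   <!-- or after 10000 added documents -->
    </autoCommit>
  </updateHandler>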




On 18 Apr 2013 00:35, "Shawn Heisey"  wrote:

> On 4/17/2013 11:56 AM, Mark Miller wrote:
>
>> There is one additional caveat - when you disable the updateLog, you have
>>> to switch to MMapDirectoryFactory instead of NRTCachingDirectoryFactory.
>>>  The NRT directory implementation will cache a portion of a commit
>>> (including hard commits) into RAM instead of onto disk.  On the next
>>> commit, the previous one is persisted completely to disk.  Without a
>>> transaction log, you can lose data.
>>>
>>
>> I don't think this is true? NRTCachingDirectoryFactory should not cache
>> hard commits and should be as safe as MMapDirectoryFactory is - neither of
>> which is as safe as using a tran log.
>>
>
> This is based on observations of what happens with my segment files when I
> do a full-import, using autoCommit with openSearcher disabled.  I see that
> each autoCommit results in a full segment being written, plus part of
> another segment.  On the next autoCommit, the rest of the files for the
> last segment are written, another full segment is written, and I get another
> partial segment.  I asked about this on the list some time ago, and what I
> just told Umesh is a rehash of what I understood from Yonik's response.
>
> If I'm wrong, I hope someone who knows for sure can correct me.
>
> Thanks,
> Shawn
>
>


Re: Dynamically loading Elevation Info

2013-04-22 Thread Saroj C
Thanks Ravi and Eric. Will try these options.


Thanks and Regards,
Saroj Kumar Choudhury






From:
Ravi Solr 
To:
"solr-user@lucene.apache.org" 
Date:
22-04-2013 23:27
Subject:
Re: Dynamically loading Elevation Info



If you place the elevate.xml in the data directory of your index it will 
be
loaded every time a commit happens.
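
For reference, a minimal elevate.xml entry looks roughly like this (the query 
text and document ID are placeholders):

  <elevate>
    <query text="ipod">
      <doc id="MA147LL/A" />
    </query>
  </elevate>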

Thanks

Ravi Kiran Bhaskar


On Mon, Apr 22, 2013 at 7:38 AM, Erick Erickson 
wrote:

> I believe (but don't know for sure) that the QEV file is re-read on
> core reload, which the same app that modifies the elevator.xml file
> could trigger with an http request, see:
>
> http://wiki.apache.org/solr/CoreAdmin#RELOAD
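>
> For reference, the reload itself is just an HTTP request like this (host
> and core name are placeholders):
>
>   http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1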
>
> At least that's what I would try first.
>
> Best
> Erick
>
> On Mon, Apr 22, 2013 at 2:48 AM, Saroj C  wrote:
> > Hi,
> >  Business User wants to configure the elevation text and the IDs and 
they
> > want to have an UI to do the same. As soon as they configure, it 
should
> be
> > reflected  in SOLR,(without restarting).
> >
> > My understanding is, Now, the QueryElevationComponent reads the
> > Elevator.xml(Configurable) and loads the information into 
ElevationCache
> > during startup and uses the information while responding to queries. 
Is
> > there any way, the content in the ElevationCache can be modifiable  by
> > some other external process / is there any easy way of achieving this
> > requirement ?
> >
> > Thanks and Regards,
> > Saroj Kumar Choudhury
> >
> >
>




Re: Test harness can not load existing index data in Solr 4.2

2013-04-22 Thread zhu kane
I think the problem is that EmbeddedSolrServer can't load existing index
data.
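
For reference, my test setup is roughly the following sketch (the solr home
path is a placeholder):

  import org.apache.solr.util.AbstractSolrTestCase;
  import org.junit.BeforeClass;

  public class ExistingIndexTest extends AbstractSolrTestCase {
    @BeforeClass
    public static void beforeClass() throws Exception {
      // the third argument points at a solr home containing conf/ and data/
      initCore("solrconfig.xml", "schema.xml", "/path/to/solrhome");
    }
  }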

Can any committer help confirm whether this is a bug?

Thank you.


Kane


On Mon, Apr 15, 2013 at 7:28 PM, zhu kane  wrote:

>  I'm extending Solr's *AbstractSolrTestCase* for unit testing.
>
> I have existing 'schema.xml', 'solrconfig.xml' and index data. I want to
> start an embedded Solr server to load the existing collection and its data,
> then test searching documents in Solr.
>
> This worked well in Solr 3.6. However, it does not work any more after
> upgrading to Solr 4.2.1. After some investigation, it looks like the index
> data is not loaded by the SolrCore created by the test harness.
>
> This can also be reproduced using the index of Solr's example docs; I
> posted the detailed test class in my Stack Overflow question [1].
>
> Is it a bug of test harness? Or is there better way to load existing index
> data in unit test?
>
> Thanks.
> [1]
> http://stackoverflow.com/questions/15947116/solr-4-2-test-harness-can-not-load-existing-index-data
>
> Mengxin Zhu
>


Re: Solr metrics in Codahale metrics and Graphite?

2013-04-22 Thread Dmitry Kan
Hello Walter,

Have you had a chance to get something working with graphite, codahale and
solr?

Has anyone else tried these tools with Solr 3.x family? How much work is it
to set things up?

We have tried Zabbix in the past. Even though it required a lot of up-front
investment in configuration, it looks like a compelling option.
In the meantime, we are looking for something more Solr-tailored yet simple,
even without metrics persistence. We have tried jconsole and viewing stats
via JMX. The main point for us now is to gather RAM usage.
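
For the JMX route, we simply enable the <jmx/> element in solrconfig.xml and
start Tomcat with the usual remote-JMX flags (the port and auth settings
below are placeholders):

  <!-- solrconfig.xml -->
  <jmx />

  # JVM flags for remote jconsole access
  -Dcom.sun.management.jmxremote
  -Dcom.sun.management.jmxremote.port=18983
  -Dcom.sun.management.jmxremote.authenticate=false
  -Dcom.sun.management.jmxremote.ssl=false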

Dmitry


On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood wrote:

> If it isn't obvious, I'm glad to help test a patch for this. We can run a
> simulated production load in dev and report to our metrics server.
>
> wunder
>
> On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote:
>
> > That approach sounds great. --wunder
> >
> > On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote:
> >
> >> I've been thinking about how to improve this reporting, especially now
> that metrics-3 (which removes all of the funky thread issues we ran into
> last time I tried to add it to Solr) is close to release.  I think we could
> go about it as follows:
> >>
> >> * refactor the existing JMX reporting to use metrics-3.  This would
> mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and
> adding a JmxReporter, keeping the existing config logic to determine which
> JMX server to use.  PluginInfoHandler and SolrMBeanInfoHandler translate
> the metrics-3 data back into SolrMBean format to keep the reporting
> backwards-compatible.  This seems like a lot of work for no visible
> benefit, but…
> >> * we can then add the ability to define other metrics reporters in
> solrconfig.xml.  There are already reporters for Ganglia and Graphite - you
> just add them to the Solr lib/ directory, configure them in solrconfig, and
> voila - Solr can be monitored using the same devops tools you use to
> monitor everything else.
> >>
> >> Does this sound sane?
> >>
> >> Alan Woodward
> >> www.flax.co.uk
> >>
> >>
> >> On 6 Apr 2013, at 20:49, Walter Underwood wrote:
> >>
> >>> Wow, that really doesn't help at all, since these seem to only be
> reported in the stats page.
> >>>
> >>> I don't need another non-standard app-specific set of metrics,
> especially one that needs polling. I need metrics delivered to the common
> system that we use for all our servers.
> >>>
> >>> This is also why SPM is not useful for us, sorry Otis.
> >>>
> >>> Also, there is no time period on these stats. How do you graph the
> 95th percentile? I know there was a lot of work on these, but they seem
> really useless to me. I'm picky about metrics, working at Netflix does that
> to you.
> >>>
> >>> wunder
> >>>
> >>> On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote:
> >>>
>  In the Jira, but not in the docs.
> 
>  It would be nice to have VM stats like GC, too, so we can have common
> monitoring and alerting on all our services.
> 
>  wunder
> 
>  On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote:
> 
> > It's there! :)
> > http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue
> >
> > Otis
> > --
> > Solr & ElasticSearch Support
> > http://sematext.com/
> >
> > On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood <
> wun...@wunderwood.org> wrote:
> >> That sounds great. I'll check out the bug, I didn't see anything in
> the docs about this. And if I can't find it with a search engine, it
> probably isn't there.  --wunder
> >>
> >> On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote:
> >>
> >>> On 3/29/2013 12:07 PM, Walter Underwood wrote:
>  What are folks using for this?
> >>>
> >>> I don't know that this really answers your question, but Solr 4.1
> and
> >>> later includes a big chunk of codahale metrics internally for
> request
> >>> handler statistics - see SOLR-1972.  First we tried including the
> jar
> >>> and using the API, but that created thread leak problems, so the
> source
> >>> code was added.
> >>>
> >>> Thanks,
> >>> Shawn
> >>>
> >>>
> >>>
> >>>
> >>
> >
> > --
> > Walter Underwood
> > wun...@wunderwood.org
> >
> >
> >
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>