Re: fuzzy search issue with PatternTokenizer Factory

2013-04-22 Thread meghana
Jack, 

the regex will split tokens on anything except letters, numbers, '&', '-',
apostrophes, and ns: (where n is a number from 0 to 9999, e.g. 4323s:)

Let's say, for example, my text is like below.

*this is nice* day & sun 53s: is risen. *

Then the pattern tokenizer should create tokens as

*this is nice day & sun is risen*
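
A quick way to sanity-check the pattern outside Solr is plain java.util.regex;
this is only a sketch, not the tokenizer itself, but PatternTokenizerFactory's
default (group="-1") mode splits on the pattern the same way and discards
empty tokens:

import java.util.ArrayList;
import java.util.List;

public class PatternCheck {
    public static void main(String[] args) {
        String text = "this is nice* day & sun 53s: is risen. *";
        List<String> tokens = new ArrayList<String>();
        for (String t : text.split("[^a-zA-Z0-9&\\-']|\\d{0,4}s:")) {
            if (!t.isEmpty()) {   // PatternTokenizer drops zero-length tokens
                tokens.add(t);
            }
        }
        System.out.println(tokens);
        // prints: [this, is, nice, day, &, sun, is, risen]
    }
}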

The pattern seems to be working fine with different texts.

Also, for fuzzy search *worde~1*, I have checked the results returned with
the PatternTokenizerFactory; they include terms with punctuation marks
attached, like '*WORDS,*', *WORDED*, etc...

One more weird thing is that all the results are in uppercase letters; no
lowercase results come back, although it does not return all of the
uppercase results either.

But I am not sure why, after changing to this tokenizer, fuzzy search is
not working properly.


Jack Krupansky-2 wrote
> Give us some examples of tokens that you are expecting that pattern to 
> tokenize. And express the pattern in simple English as well. Show some 
> actual input data.
> 
> I suspect that Solr is working fine - but you may not have precisely 
> specified your pattern. But we don't know what your pattern is supposed to 
> recognize.
> 
> Maybe some of your previous hits had punctuation adjacent to the terms 
> that your pattern doesn't recognize.
> 
> And use the Solr Admin UI Analysis page to see how your sample input data
> is 
> analyzed.
>
> One other thing... without a "group", the pattern specifies what delimiter 
> sequence will "split" the rest of the input into tokens. I suspect you 
> didn't mean this.
> 
> -- Jack Krupansky
> 
> -Original Message- 
> From: meghana
> Sent: Friday, April 19, 2013 9:01 AM
> To: solr-user@.apache
> Subject: fuzzy search issue with PatternTokenizer Factory
> 
> I'm using Solr 4.2. I have changed my text field definition to use
> solr.PatternTokenizerFactory instead of solr.StandardTokenizerFactory, and
> changed my schema definition as below:
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.PatternTokenizerFactory"
>        pattern="[^a-zA-Z0-9&\-']|\d{0,4}s:" />
>     <filter class="solr.StopFilterFactory"
>        words="stopwords.txt" enablePositionIncrements="false" />
>     ...
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.PatternTokenizerFactory"
>        pattern="[^a-zA-Z0-9&\-']|\d{0,4}s:" />
>     <filter class="solr.StopFilterFactory"
>        words="stopwords_extra_query.txt" enablePositionIncrements="false" />
>     <filter class="solr.SynonymFilterFactory" synonyms="..."
>        ignoreCase="true" expand="true"/>
>   </analyzer>
> </fieldType>
> 
> 
> After doing so, fuzzy search does not seem to work properly, the way it was
> working before.
> 
> I'm searching with the search term: worde~1
> 
> Before, the search was returning around 300 records, but now it returns
> only 5 records. I'm not sure what the issue can be.
> 
> Can anybody help me make it work!
> 
> 
> 
> 
> 
> 
> 







Re: ComplexPhraseQParserPlugin not working with solr 4.2

2013-04-22 Thread Ahmet Arslan

Hi ilay,

You cannot load this plugin via lib directives; you need to embed the
jar into the solr.war file (by unzipping and re-zipping it).
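
Roughly something like this (a sketch only; paths and the jar name are
examples, and jar's update mode does the unzip/zip dance in one step):

cd /path/containing/solr.war
mkdir -p WEB-INF/lib
cp /path/to/ComplexPhrase-4.2.1.jar WEB-INF/lib/
jar uf solr.war WEB-INF/lib/ComplexPhrase-4.2.1.jar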

There should be a ReadMe file inside the latest attachment in Jira.



-- On Sat, 4/20/13, ilay raja  wrote:

> From: ilay raja 
> Subject: ComplexPhraseQParserPlugin not working with solr 4.2
> To: solr-user@lucene.apache.org, solr-...@lucene.apache.org
> Date: Saturday, April 20, 2013, 8:00 PM
> Hi
> 
>   I followed the steps given in
> https://issues.apache.org/jira/browse/SOLR-1604 for
> integrating the plugin.
> But it is not picking up the classpath correctly, though I added
> the following
> lines to solrconfig.xml:
> <lib dir="..." regex="ComplexPhrase-\d.*\.jar" />
> <queryParser name="complexphrase"
>   class="org.apache.solr.search.ComplexPhraseQParserPlugin" />
> 
> I have the compiled jar in solr-home/dist/
> 
> The exception is as below - unable to create core
> mainindex.
> Is it an issue with using this plugin in 4.2? Does it work
> with 4.0 ?
> 
> 
>      at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
> java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at
> java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
>         at
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
>         at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
>         at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
>         at
> org.apache.solr.core.SolrCore.<init>(SolrCore.java:619)
>         at
> org.apache.solr.core.SolrCore.<init>(SolrCore.java:806)
> org.apache.solr.common.SolrException: Error loading class
> 'org.apache.solr.search.ComplexPhraseQParserPlugin'
> SEVERE: Unable to create core: mainindex
>


ranking score by fields

2013-04-22 Thread Каскевич Александр
Hi.
I want to do what the subject says, but I don't know exactly how to do it.
Example.
I have an index with field1, field2, field3.
I make a query like:
(field1:apache solr) OR (field2:apache solr) OR (field3:apache solr)
And I want to know: was this doc found by field1, by field2, or by field3?

I tried to make it like this: (field1:apache solr)^100 OR (field2:apache solr)^10 OR
(field3:apache solr)^1
But the problem is that I don't know the range, minimum and maximum value of the score
for each field.
With other types of similarities (BM25 or others) it is the same situation.
I can't find information about this in the manual.

Also, I tried to use Relevance Functions, e.g. "termfreq", but it works only with
terms, not with phrases like "apache solr".

Maybe I am missing something, or do you have another idea how to do this?
And also, I am not a Java programmer, so the best way for me is not to write any
plugins for Solr.

Thanks.
Alex.


RE: external values source

2013-04-22 Thread Maciej Liżewski
Hi Timothy,

Thank you for your answer - it is really helpful. Just to clarify - when using a
ValueSource, the flow is something like this:
- user sends query
- Solr calls the ValueSource to prepare values for every document (this part is
cached in the ExternalFileField implementation, I guess)
- Solr runs the query

And is the above flow valid in every use case of ValueSource? There are no
pre-calculated values, etc. (just asking to make it clear)? What caching
scenario is recommended here to make sure you won't end up with a different
cached entry for every query (I think I would follow the example of
ExternalFileField)?

Another thing is that in most cases the array of values created in this process is
rather sparse... so I was wondering whether there are other solutions to store them
in association with the documents index...

--
Maciej Liżewski


-Original Message-
From: Timothy Potter [mailto:thelabd...@gmail.com] 
Sent: Saturday, April 20, 2013 2:02 AM
To: solr-user@lucene.apache.org
Subject: Re: external values source

Hi Maciek,

I think a custom ValueSource is definitely what you want because you need to 
compute some derived value based on an indexed field and some external value.

The trick is figuring out how to make the lookup to the external data very, very
fast. Here's a rough sketch of what we do:

We have a table in a database that contains a numeric value for a user and an
organization, queried with something like:

select num from table where userId='bob' and orgId=123 (similar to what you
stated in question #4)

On the Solr side, documents are indexed with a user_id_s field, which is half of
what I need to do my lookup. The orgId is determined by the Solr client at
query construction time, so it is passed to my custom ValueSource (aka function)
in the query. In our app, users can be associated with many different orgIds,
and the associations change frequently, so we can't index the association.

To do the lookup to the database, we have a custom ValueSource, something like: 
dbLookup(user_id_s, 123)

(note: user_id_s is the name of the field holding my userID values in the index 
and 123 is the orgId)

Behind the scenes, the ValueSource will have access to the user_id_s field 
values using FieldCache, something like:

final BinaryDocValues dv = FieldCache.DEFAULT.getTerms(reader.reader(), "user_id_s");

This gives us fast access to the user_id_s value for any given doc (question #1
above). So now we can return an IntDocValues instance by doing:

@Override
public FunctionValues getValues(Map context, AtomicReaderContext reader) throws IOException {
    final BytesRef br = new BytesRef();
    final BinaryDocValues dv = FieldCache.DEFAULT.getTerms(reader.reader(), fieldName);
    return new IntDocValues(this) {
        @Override
        public int intVal(int doc) {
            dv.get(doc, br);
            if (br.length == 0)
                return 0;

            final String user_id_s = br.utf8ToString(); // the indexed userID for doc
            int val = 0;
            // todo: do custom lookup with orgID and user_id_s to compute int value for doc
            return val;
        }
    };
}

In this code, fieldName is set in the constructor (not shown) by parsing it out 
of the parameters, something like:

this.fieldName = ((org.apache.solr.schema.StrFieldSource) source).getField();

The user_id_s field comes into your ValueSource as a StrFieldSource (or 
whatever type you use) ... here is how the ValueSource gets constructed at 
query time:

public class MyValueSourceParser extends ValueSourceParser {
    public void init(NamedList namedList) {}

    public ValueSource parse(FunctionQParser fqp) throws SyntaxError {
        return new MyValueSource(fqp.parseValueSource(), fqp.parseArg());
    }
}

There is one instance of your ValueSourceParser created per core. The parse 
method gets called for every query that uses the ValueSource.
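
To round this out, here is a minimal skeleton of what MyValueSource itself
might look like. Tim doesn't show the class, so its shape, the placeholder
getValues body, and the equals/hashCode/description choices below are
assumptions, not his code:

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.IntDocValues;
import org.apache.solr.schema.StrFieldSource;

public class MyValueSource extends ValueSource {
    private final String fieldName;
    private final String orgId;

    public MyValueSource(ValueSource source, String orgId) {
        // the first argument arrives as a StrFieldSource, as described above
        this.fieldName = ((StrFieldSource) source).getField();
        this.orgId = orgId;
    }

    @Override
    public FunctionValues getValues(Map context, AtomicReaderContext reader) throws IOException {
        // real body: the FieldCache/IntDocValues snippet shown earlier in this mail
        return new IntDocValues(this) {
            @Override
            public int intVal(int doc) {
                return 0; // placeholder
            }
        };
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MyValueSource)) return false;
        MyValueSource other = (MyValueSource) o;
        return fieldName.equals(other.fieldName) && orgId.equals(other.orgId);
    }

    @Override
    public int hashCode() {
        return 31 * fieldName.hashCode() + orgId.hashCode();
    }

    @Override
    public String description() {
        return "dbLookup(" + fieldName + "," + orgId + ")";
    }
}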

At query time, I might use the ValueSource to return this computed value in my 
fl list, such as:

fl=id,looked_up:dbLookup(user_id_s,123),...

Or to sort by:

sort=dbLookup(user_id_s,123) desc

The data in our table doesn't change that frequently, so we export it to a flat 
file in S3 and our custom ValueSource downloads from S3, transforms it into an 
in-memory HashMap for fast lookups. We thought about just issuing a query to 
load the data from the db directly but we have many nodes and the query is 
expensive and result set is large so we didn't want to hammer our database with 
N Solr nodes querying for the same data at roughly the same time. So we do it 
once and post the compressed results to a shared location. The data in the 
table is "sparse" as compared to the number of documents and userIds we have.

We simply poll S3 for changes every few minutes, which is good enough for us. 
This happens from many nodes in a large Solr Cloud cluster running in EC2 so S3 
works well for us as a distribution mechanism.

Re: Max http connections in CloudSolrServer

2013-04-22 Thread J Mohamed Zahoor

On 18-Apr-2013, at 9:43 PM, Shawn Heisey  wrote:

> Are you using the Jetty included with Solr, or a Jetty installed separately?  


I am using the Jetty that comes with Solr.


> The Jetty included with Solr has a maxThreads value of 10000 in its config.  
> The default would be closer to 200, and a single request from a Cloud client 
> likely uses multiple Jetty threads.

The configured maxThreads is 10000 and minThreads is 10.
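
(For reference, that corresponds to lines in the example's etc/jetty.xml
thread pool configuration roughly like the following; the exact layout may
differ between releases:

<Set name="minThreads">10</Set>
<Set name="maxThreads">10000</Set>
)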


./zahoor

Re: Pros and cons of using RAID or different RAIDS?

2013-04-22 Thread Toke Eskildsen
On Mon, 2013-04-22 at 02:04 +0200, Shawn Heisey wrote:
> Aside from cost, the main reason that I have not seriously investigated
> SSD drives is because I have not come across a solution for any level of
> RAID (even RAID1) with SSDs that exposes TRIM to the operating system.
> Without reliable TRIM support, an SSD solution is not viable for a
> long-term setup.

Why not? Enterprise-oriented benchmarks start by hammering the drives
until they are "fragmented" enough that performance does not suffer any
more from subsequent writes. Even in that state they have vastly
superior latency, compared to spinning drives.

- Toke Eskildsen, State and University Library, Denmark



Re: ComplexPhraseQParserPlugin not working with solr 4.2

2013-04-22 Thread ilay raja
I was able to solve the previous problem of not loading
ComplexPhraseQParserPlugin. Still I am unable to run this with
defType=complexphrase; I get:
java.lang.NoSuchMethodError:
org.apache.solr.search.QueryParsing.getQueryParserDefaultOperator(Lorg/apache/solr/schema/IndexSchema;Ljava/lang/String;)Lorg/apache/lucene/queryparser/classic/QueryParser$Operator;

Is there an issue with running ComplexPhraseQParserPlugin (the 4.0 jar)
against Solr 4.2?

On Sat, Apr 20, 2013 at 10:30 PM, ilay raja  wrote:

> Hi
>
>   I followed the steps given in
> https://issues.apache.org/jira/browse/SOLR-1604 for integrating the
> plugin.
> But it is not picking up the classpath correctly, though I added the following
> lines to solrconfig.xml:
> <lib dir="..." regex="ComplexPhrase-\d.*\.jar" />
> <queryParser name="complexphrase"
>   class="org.apache.solr.search.ComplexPhraseQParserPlugin" />
>
> I have the compiled jar in solr-home/dist/
>
> The exception is as below - unable to create core mainindex.
> Is it an issue with using this plugin in 4.2? Does it work with 4.0 ?
>
>
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> at
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
> at
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
> at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:619)
> at org.apache.solr.core.SolrCore.<init>(SolrCore.java:806)
> org.apache.solr.common.SolrException: Error loading class
> 'org.apache.solr.search.ComplexPhraseQParserPlugin'
> SEVERE: Unable to create core: mainindex
>
>


Error creating collection

2013-04-22 Thread yriveiro
I get this exception when I try to create a new collection. Does anyone have
any idea what's going on?

org.apache.solr.common.SolrException: Error CREATEing SolrCore 'RPS_12':
Could not get shard_id for core: RPS_12
coreNodeName:192.168.20.48:8983_solr_RPS_12



-
Best regards


Severe errors in log

2013-04-22 Thread yriveiro
I have got this in my logs. What does it mean?

ConcurrentLRUCache was not destroyed prior to finalize(), indicates a bug
-- POSSIBLE RESOURCE LEAK!!!



-
Best regards


The overseer is stucks

2013-04-22 Thread yriveiro
Hi,

My overseer has enqueued more than 1 task and apparently is stuck.
Is there any way to force it to process the enqueued tasks? A screenshot of
the overseer queue is here.



-
Best regards

Stats facet on int/tint fields

2013-04-22 Thread vinothkumar raman
I have a schema like this






I wanted to find the average price faceted on cat, so I was using the
stats facet to get the average on the field like this:
http://solr-serv/solr/latest/select?q=*%3A*&wt=xml&indent=true&stats=true&rows=0&stats.field=price&stats.facet=cat

Which throws an exception like this
org.apache.solr.common.SolrException: Server at
http://solr-serv/solr/latest returned non ok status:500,
message:Server Error at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:169)
at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:135)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166) at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

When I look at the logs, this is all I get:

null:java.lang.NumberFormatException: For input string: "`=?"


But when I try with stats.facet=cat_name it works perfectly fine. It
doesn't work with any other int/tint field.

I am not sure what's really wrong with my query.

(Dropping it to the dev list too, in case it's a bug.)


PS: Not sure if it's a bug so sending it to the dev mailing list too.


Re: Where to use replicationFactor and maxShardsPerNode at SolrCloud?

2013-04-22 Thread Erick Erickson
1) Imagine you have lots and lots and lots of different Solr indexes
and a 50 node cluster. Further imagine that one of those indexes has 2
shards, and a leader + one replica per shard is adequate to handle the load. You need
some way to limit the number of nodes your index gets distributed to,
that's what replicationFactor is for. So in this case
replicationFactor=2 will stop assigning nodes to that particular
collection after there's a leader + 1 replica

2> In the system you described, there won't be more than one
shard/node. But one strategy for growth is to "overshard". That is, in
the early days you put (numbers from thin air) 10 shards/node and they
are all quite small. As your index grows, you move to two nodes with 5
shards each. And later to 5 nodes with 2 shards and so on. There are
cases where you want some way to make the most of your hardware yet
plan for expansion.
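
For concreteness, both knobs are given when the collection is created
through the Collections API, e.g. (hypothetical host and collection name):

http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&maxShardsPerNode=1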

Best
Erick

On Sun, Apr 21, 2013 at 3:51 PM, Furkan KAMACI  wrote:
> I know that: when using SolrCloud we define the number of shards into the
> system. When we start up new Solr instances each one will be a leader for
> a shard, and if I continue to start up new Solr instances (that has
> exceeded the number of shards) each one will be a replica for each
> leader as a round robin process.
>
> However when I read the wiki there are two parameters: *replicationFactor*
> and *maxShardsPerNode*.
>
> 1) Can you give details about what they are? If all newly added Solr
> instances become replicas, what is that replication factor for?
> 2) If what I wrote is true about that round robin process, what is
> *maxShardsPerNode*? How can there be more than one shard in the system I described?


Re: Solr cloud and batched updates

2013-04-22 Thread Erick Erickson
Thanks Yonik! You see how behind the times I get

On Sun, Apr 21, 2013 at 5:07 PM, Timothy Potter  wrote:
> That's awesome! Thanks Yonik.
>
> Tim
>
> On Sun, Apr 21, 2013 at 1:30 PM, Yonik Seeley  wrote:
>> On Sun, Apr 21, 2013 at 11:57 AM, Timothy Potter  
>> wrote:
>>> There's no problem here, but I'm curious about how batches of updates
>>> are handled on the Solr server side in Solr cloud?
>>>
>>> Going over the code for DistributedUpdateProcessor and
>>> SolrCmdDistributor, it appears that the batch is broken down and docs
>>> are processed one-by-one. By processed, I mean that each doc in the
>>> batch from the client is sent to replicas individually.
>>>
>>> This makes sense but I wonder if the forwarding on to replicas could
>>> be done in sub-batches?
>>
>> Good news... they already are sent in batches!  The docs are processed
>> one-by-one, but then buffered (into batches) for forwarding to
>> replicas.
>>
>> -Yonik
>> http://lucidworks.com


Re: is phrase search possible in solr

2013-04-22 Thread Erick Erickson
bq: wherein if I have a query in double quotes it simply ignores all the
tokenizers and analyzers.

Nope. In general you're quite right, you need to re-index whenever you
change your schema... You could define the query part of your field
to just use KeywordTokenizerFactory, but that would affect _all_ queries,
which doesn't work for your case...

You might be able to spoof things with, say, the "raw" query parser, see:
http://wiki.apache.org/solr/SolrQuerySyntax
or perhaps the "term" query parser, but I think you'll have some issues here if you
need to have more than one term next to each other (i.e. phrases). And
you'll have to handle all the upstream bits yourself, e.g. making sure
casing matches: DelhiDareDevil is indexed as delhidaredevil, for instance.
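
For example, something like this (hypothetical field name; the raw parser
does no analysis, so the value has to match the indexed term exactly):

q={!raw f=title}delhidaredevil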

You could write your own query parser that handled this as a special
case, but that would involve quite a lot of work.

Best
Erick

On Mon, Apr 22, 2013 at 1:02 AM, vicky desai  wrote:
> Hi Jack,
>
> Making a changes in the schema either keyword tokenizer or copy field option
> which u suggested would require reindexing of entire data. Is there an
> option wherein if I have a query in double quotes it simply ignores all the
> tokenizers and analyzers.
>
>
>


RE: Stats facet on int/tint fields

2013-04-22 Thread Michael Ryan
Sounds like this could be https://issues.apache.org/jira/browse/SOLR-2976.

-Michael

-Original Message-
From: vinothkumar raman [mailto:vinothkr.k...@gmail.com] 
Sent: Monday, April 22, 2013 5:54 AM
To: solr-user@lucene.apache.org; solr-...@lucene.apache.org
Subject: Stats facet on int/tint fields

I have a schema like this






I wanted to find the average price faceted on cat, so I was using the stats facet
to get the average on the field like this:
http://solr-serv/solr/latest/select?q=*%3A*&wt=xml&indent=true&stats=true&rows=0&stats.field=price&stats.facet=cat

Which throws an exception like this
org.apache.solr.common.SolrException: Server at http://solr-serv/solr/latest 
returned non ok status:500, message:Server Error at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:169)
at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:135)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166) at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166) at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

When I look at the logs, this is all I get:

null:java.lang.NumberFormatException: For input string: "`=?"


But when I try with stats.facet=cat_name it works perfectly fine. It doesn't
work with any other int/tint field.

I am not sure what's really wrong with my query.

(Dropping it to the dev list too, in case it's a bug.)


PS: Not sure if it's a bug so sending it to the dev mailing list too.


Re: Where to use replicationFactor and maxShardsPerNode at SolrCloud?

2013-04-22 Thread Furkan KAMACI
Sorry, but if I have 10 shards and a collection with a replication factor of 1,
and if I start up 30 nodes, what happens to the last 10 nodes? I mean:

10 nodes as leaders
10 nodes as replicas

If I don't specify a replication factor, there was going to be a round-robin
system that assigns the other 10 machines as:
+ 10 nodes as replicas

However, what will happen to those 10 nodes when I specify a replication factor?


2013/4/22 Erick Erickson 

> 1) Imagine you have lots and lots and lots of different Solr indexes
> and a 50 node cluster. Further imagine that one of those indexes has 2
> shards, and a leader + one replica per shard is adequate to handle the load. You need
> some way to limit the number of nodes your index gets distributed to,
> that's what replicationFactor is for. So in this case
> replicationFactor=2 will stop assigning nodes to that particular
> collection after there's a leader + 1 replica
>
> 2> In the system you described, there won't be more than one
> shard/node. But one strategy for growth is to "overshard". That is, in
> the early days you put (numbers from thin air) 10 shards/node and they
> are all quite small. As your index grows, you move to two nodes with 5
> shards each. And later to 5 nodes with 2 shards and so on. There are
> cases where you want some way to make the most of your hardware yet
> plan for expansion.
>
> Best
> Erick
>
> On Sun, Apr 21, 2013 at 3:51 PM, Furkan KAMACI 
> wrote:
> > I know that: when using SolrCloud we define the number of shards into the
> > system. When we start up new Solr instances each one will be a leader for
> > a shard, and if I continue to start up new Solr instances (that has
> > exceeded the number of shards) each one will be a replica for each
> > leader as a round robin process.
> >
> > However when I read the wiki there are two parameters: *replicationFactor*
> > and *maxShardsPerNode*.
> >
> > 1) Can you give details about what they are? If all newly added Solr
> > instances become replicas, what is that replication factor for?
> > 2) If what I wrote is true about that round robin process, what is
> > *maxShardsPerNode*? How can there be more than one shard in the system I
> > described?
>


Re: Dynamically loading Elevation Info

2013-04-22 Thread Erick Erickson
I believe (but don't know for sure) that the QEV file is re-read on
core reload, which the same app that modifies the elevate.xml file
could trigger with an http request; see:

http://wiki.apache.org/solr/CoreAdmin#RELOAD
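
For example (host and core name hypothetical):

http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0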

At least that's what I would try first.

Best
Erick

On Mon, Apr 22, 2013 at 2:48 AM, Saroj C  wrote:
> Hi,
>  The business user wants to configure the elevation text and the IDs, and they
> want to have a UI to do the same. As soon as they configure it, it should be
> reflected in SOLR (without restarting).
>
> My understanding is that, right now, the QueryElevationComponent reads the
> elevate.xml (configurable) and loads the information into the ElevationCache
> during startup, and uses the information while responding to queries. Is
> there any way the content in the ElevationCache can be modified by
> some other external process, or is there any easy way of achieving this
> requirement?
>
> Thanks and Regards,
> Saroj Kumar Choudhury


Re: Where to use replicationFactor and maxShardsPerNode at SolrCloud?

2013-04-22 Thread Jan Høydahl
2) Does this mean that if you have one physical server with one Solr instance,
   and you try to create a collection with numShards=2&maxShardsPerNode=2,
   then it will succeed, putting both shards on the same node?

If you then add another node, you still need to move one shard over to the
new node manually, don't you? Is there a JIRA to auto-balance shards?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

22. apr. 2013 kl. 13:04 skrev Erick Erickson :

> 1) Imagine you have lots and lots and lots of different Solr indexes
> and a 50 node cluster. Further imagine that one of those indexes has 2
> shards, and a leader + one replica per shard is adequate to handle the load. You need
> some way to limit the number of nodes your index gets distributed to,
> that's what replicationFactor is for. So in this case
> replicationFactor=2 will stop assigning nodes to that particular
> collection after there's a leader + 1 replica
> 
> 2> In the system you described, there won't be more than one
> shard/node. But one strategy for growth is to "overshard". That is, in
> the early days you put (numbers from thin air) 10 shards/node and they
> are all quite small. As your index grows, you move to two nodes with 5
> shards each. And later to 5 nodes with 2 shards and so on. There are
> cases where you want some way to make the most of your hardware yet
> plan for expansion.
> 
> Best
> Erick
> 
> On Sun, Apr 21, 2013 at 3:51 PM, Furkan KAMACI  wrote:
>> I know that: when using SolrCloud we define the number of shards into the
>> system. When we start up new Solr instances each one will be a leader for
>> a shard, and if I continue to start up new Solr instances (that has
>> exceeded the number of shards) each one will be a replica for each
>> leader as a round robin process.
>> 
>> However when I read the wiki there are two parameters: *replicationFactor*
>> and *maxShardsPerNode*.
>> 
>> 1) Can you give details about what they are? If all newly added Solr
>> instances become replicas, what is that replication factor for?
>> 2) If what I wrote is true about that round robin process, what is
>> *maxShardsPerNode*? How can there be more than one shard in the system I described?



Re: is phrase search possible in solr

2013-04-22 Thread Jack Krupansky

"I want queries within double quotes to be ..."

Just to be clear (as already stated), you do not get to set the semantics of 
quotes, which are set by the query parser and the analyzer for the field - 
if you want different semantics, copy the data to another field and use 
those different semantics in the new field's analyzer.


But also to be clear, in case anybody is simply reading the message subject 
line literally, yes, phrase search is possible in Solr.


-- Jack Krupansky

-Original Message- 
From: vicky desai

Sent: Monday, April 22, 2013 1:50 AM
To: solr-user@lucene.apache.org
Subject: Re: is phrase search possible in solr

Hi,

If I use the ShingleFilter then all types of queries will be impacted. I want
queries within double quotes to be an exact search, but for queries without
double quotes all analyzers and tokenizers should be applied. Is there a
setting or a configuration in schema.xml which can cater to this requirement?






Re: Where to use replicationFactor and maxShardsPerNode at SolrCloud?

2013-04-22 Thread Jack Krupansky
"replicationFactor=2 will stop assigning nodes to that particular collection 
after there's a leader + 1 replica"


They are both replicas, right?

I mean, at any given moment one of the replicas will also have a role of 
"leader", but it's still a replica - in SolrCloud, that is, as opposed to 
old master/slave/replica Solr.


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Monday, April 22, 2013 7:04 AM
To: solr-user@lucene.apache.org
Subject: Re: Where to use replicationFactor and maxShardsPerNode at 
SolrCloud?


1) Imagine you have lots and lots and lots of different Solr indexes
and a 50 node cluster. Further imagine that one of those indexes has 2
shards, and a leader + one replica per shard is adequate to handle the load. You need
some way to limit the number of nodes your index gets distributed to,
that's what replicationFactor is for. So in this case
replicationFactor=2 will stop assigning nodes to that particular
collection after there's a leader + 1 replica

2> In the system you described, there won't be more than one
shard/node. But one strategy for growth is to "overshard". That is, in
the early days you put (numbers from thin air) 10 shards/node and they
are all quite small. As your index grows, you move to two nodes with 5
shards each. And later to 5 nodes with 2 shards and so on. There are
cases where you want some way to make the most of your hardware yet
plan for expansion.

Best
Erick

On Sun, Apr 21, 2013 at 3:51 PM, Furkan KAMACI  
wrote:

I know that: when using SolrCloud we define the number of shards into the
system. When we start up new Solr instances each one will be a leader for
a shard, and if I continue to start up new Solr instances (that has
exceeded the number of shards) each one will be a replica for each
leader as a round robin process.

However when I read the wiki there are two parameters: *replicationFactor*
and *maxShardsPerNode*.

1) Can you give details about what they are? If all newly added Solr
instances become replicas, what is that replication factor for?
2) If what I wrote is true about that round robin process, what is
*maxShardsPerNode*? How can there be more than one shard in the system I
described?




Re: Bug? JSON output changes when switching to solr cloud

2013-04-22 Thread Yonik Seeley
Thanks David,

I've confirmed this is still a problem in trunk and opened
https://issues.apache.org/jira/browse/SOLR-4746

-Yonik
http://lucidworks.com


On Sun, Apr 21, 2013 at 11:16 PM, David Parks  wrote:
> We just took an installation of 4.1 which was working fine and changed it to
> run as solr cloud. We encountered the most incredibly bizarre apparent bug:
>
> In the JSON output, a colon ':' changed to a comma ',', which of course
> broke the JSON parser.  I'm guessing I should file this as a bug, but it was
> so odd I thought I'd post here before doing so. Demo below:
>
> Here is a query on our previous single-server instance:
>
> Query:
> --
> http://10.1.3.28:8081/solr/select?q=book&fl=score%2Cid%2Cunique_catalog_name&start=0&rows=50&wt=json&group=true&group.field=unique_catalog_name&group.limit=50
>
> Response:
> -
> {"responseHeader":{"status":0,"QTime":15714,"params":{"fl":"score,id,unique_
> catalog_name","start":"0","q":"book","group.limit":"50","group.field":"uniqu
> e_catalog_name","group":"true","wt":"json","rows":"50"}},"grouped":{"unique_
> catalog_name":{"matches":106711214,"groups":[{"groupValue":"ls:2653","doclis
> t":{"numFound":103981882,"start":0,"maxScore":4.7039795,"docs":[{"id":"10055
> 02088784","score":4.7039795},{"id":"1005500291075","score":4.7039795},{"id":
> "1000810546074","score":4.7039795},{"id":"1000611003270","score":4.7039795},
>
> Note this part:
> --
>   {"unique_catalog_name":{"matches":
>
>
>
> Now we run that same query on a server that was derived from the same build,
> just configuration changes to run it in distributed "solr cloud" mode.
>
> Query:
> -
> http://10.1.3.18:8081/solr/select?q=book&fl=score%2Cid%2Cunique_catalog_name&start=0&rows=50&wt=json&group=true&group.field=unique_catalog_name&group.limit=50
>
> Response:
> -{"responseHeader":{"status":0,"QTime":8855,"params":{"fl":"scor
> e,id,unique_catalog_name","start":"0","q":"book","group.limit":"50","group.f
> ield":"unique_catalog_name","group":"true","wt":"json","rows":"50"}},"groupe
> d":["unique_catalog_name",{"matches":106711214,"groups":[{"groupValue":"ls:2
> 653","doclist":{"numFound":103981882,"start":0,"maxScore":4.7042913,"docs":[
> {"id":"1005502088784","score":4.7042913},{"id":"1000611003270","score":4.704
> 2913},{"id":"1005500291075","score":4.703668},{"id":"1000810546074","score":
> 4.703668},
>
> Note how it's changed:
> 
>   "unique_catalog_name",{"matches":
>
>
>
>


Re: fuzzy search issue with PatternTokenizer Factory

2013-04-22 Thread Jack Krupansky
Once again, fuzzy search is completely independent of your analyzer or 
pattern tokenizer. Please use the Solr Admin UI Analysis page to debug 
whether the terms are what you expect. And realize that fuzzy search has a 
maximum editing distance of 2 and that includes case changes.
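
A quick, hedged illustration of the edit-distance point (plain Levenshtein;
Lucene's fuzzy matching is automaton-based rather than a pairwise distance
computation, but it counts edits over the raw indexed terms the same way,
so case differences count as edits):

public class FuzzyDistanceDemo {
    // classic dynamic-programming Levenshtein distance
    static int dist(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(dist("worde", "worded")); // 1 -> within worde~1
        System.out.println(dist("worde", "WORDED")); // 6 -> far beyond the max of 2
    }
}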


-- Jack Krupansky

-Original Message- 
From: meghana

Sent: Monday, April 22, 2013 3:25 AM
To: solr-user@lucene.apache.org
Subject: Re: fuzzy search issue with PatternTokenizer Factory

Jack,

the regex will split tokens on anything except letters, numbers, '&', '-',
apostrophes, and ns: (where n is a number from 0 to 9999, e.g. 4323s:)

Let's say, for example, my text is like below.

*this is nice* day & sun 53s: is risen. *

Then the pattern tokenizer should create tokens as

*this is nice day & sun is risen*

The pattern seems to be working fine with different texts.

Also, for fuzzy search *worde~1*, I have checked the results returned with
the PatternTokenizerFactory; they include terms with punctuation marks
attached, like '*WORDS,*', *WORDED*, etc...

One more weird thing is that all the results are in uppercase letters; no
lowercase results come back, although it does not return all of the
uppercase results either.

But I am not sure why, after changing to this tokenizer, fuzzy search is
not working properly.


Jack Krupansky-2 wrote

Give us some examples of tokens that you are expecting that pattern to
tokenize. And express the pattern in simple English as well. Show some
actual input data.

I suspect that Solr is working fine - but you may not have precisely
specified your pattern. But we don't know what your pattern is supposed to
recognize.

Maybe some of your previous hits had punctuation adjacent to the terms
that your pattern doesn't recognize.

And use the Solr Admin UI Analysis page to see how your sample input data
is
analyzed.
One other thing... without a "group", the pattern specifies what delimiter
sequence will "split" the rest of the input into tokens. I suspect you
didn't mean this.

-- Jack Krupansky

-Original Message- 
From: meghana

Sent: Friday, April 19, 2013 9:01 AM
To: solr-user@.apache



Subject: fuzzy search issue with PatternTokenizer Factory

I'm using Solr 4.2. I have changed my text field definition to use
solr.PatternTokenizerFactory instead of solr.StandardTokenizerFactory, and
changed my schema definition as below:

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory"
       pattern="[^a-zA-Z0-9&\-']|\d{0,4}s:" />
    <filter class="solr.StopFilterFactory"
       words="stopwords.txt" enablePositionIncrements="false" />
    ...
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory"
       pattern="[^a-zA-Z0-9&\-']|\d{0,4}s:" />
    <filter class="solr.StopFilterFactory"
       words="stopwords_extra_query.txt" enablePositionIncrements="false" />
    <filter class="solr.SynonymFilterFactory" synonyms="..."
       ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
After doing so, fuzzy search does not seem to work properly, the way it was
working before.

I'm searching with the search term: worde~1

Before, the search was returning around 300 records, but now it returns
only 5 records. I'm not sure what the issue can be.

Can anybody help me make it work!
















Re: ComplexPhraseQParserPlugin not working with solr 4.2

2013-04-22 Thread Ahmet Arslan

Hi ilay,

Can you try ComplexPhrase-4.2.1.zip? It is supposed to work with 4.2.


--- On Mon, 4/22/13, ilay raja  wrote:

> From: ilay raja 
> Subject: Re: ComplexPhraseQParserPlugin not working with solr 4.2
> To: solr-user@lucene.apache.org, solr-...@lucene.apache.org
> Date: Monday, April 22, 2013, 12:30 PM
> I was able to solve the previous
> problem of not loading
> ComplexPhraseQParserPlugin. Still I am unable to run this
> with
> defType=complexphrase; I get:
> java.lang.NoSuchMethodError:
> org.apache.solr.search.QueryParsing.getQueryParserDefaultOperator(Lorg/apache/solr/schema/IndexSchema;Ljava/lang/String;)Lorg/apache/lucene/queryparser/classic/QueryParser$Operator;
> 
> Is there an issue with running ComplexPhraseQParserPlugin
> (the 4.0 jar)
> against Solr 4.2?
> 
> On Sat, Apr 20, 2013 at 10:30 PM, ilay raja 
> wrote:
> 
> > Hi
> >
> >   I followed the steps given in
> > https://issues.apache.org/jira/browse/SOLR-1604 for
> integrating the
> > plugin.
> > But it is not picking up the classpath correctly, though I
> added the following
> > lines to solrconfig.xml:
> > <lib dir="..." regex="ComplexPhrase-\d.*\.jar" />
> > <queryParser name="complexphrase"
> class="org.apache.solr.search.ComplexPhraseQParserPlugin"
> />
> >
> > I have the compiled jar in solr-home/dist/
> >
> > The exception is as below - unable to create core
> mainindex.
> > Is it an issue with using this plugin in 4.2? Does it
> work with 4.0 ?
> >
> >
> >      at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at
> java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >         at
> >
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >         at
> >
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> >         at
> java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >         at
> >
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >         at
> >
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
> >         at
> >
> org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
> >         at
> >
> org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
> >         at
> >
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
> >         at
> org.apache.solr.core.SolrCore.<init>(SolrCore.java:619)
> >         at
> org.apache.solr.core.SolrCore.<init>(SolrCore.java:806)
> > org.apache.solr.common.SolrException: Error loading
> class
> > 'org.apache.solr.search.ComplexPhraseQParserPlugin'
> > SEVERE: Unable to create core: mainindex
> >
> >
>


Re: SolrCloud Leaders

2013-04-22 Thread Furkan KAMACI
Hi Jack;

You said: "An hour from now some other replica may be the leader"

What are the criteria for changing the leader of a shard?

2013/4/15 Jack Krupansky 

> All nodes are replicas in SolrCloud since there are no masters. It's a
> fully distributed model. A leader is also a replica. A leader is simply a
> replica which was elected to be a leader, for now. An hour from now some
> other replica may be the leader.
>
> It is indeed misleading and inaccurate to suggest that "leader" and
> "replicas" are disjoint.
>
> Once again, I think you are confusing SolrCloud with the older Solr
> master/slave/replication.
>
> Every node in SolrCloud can do indexing. That's the same as saying that
> every replica in SolrCloud can do indexing.
>
> Although we do need to be clear that a given replica will only index
> documents for the shard(s) to which it belongs.
>
>
> -- Jack Krupansky
>
> -Original Message- From: Furkan KAMACI
> Sent: Monday, April 15, 2013 9:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud Leaders
>
> Here writes something:
>
> https://support.lucidworks.com/entries/22180608-Solr-HA-DR-overview-3-x-and-4-0-SolrCloud-and
>
> says:
>
> Both leaders and replicas index items and perform searches.
>
> How do replicas index items?
>
>
> 2013/4/15 Furkan KAMACI 
>
>  Do leaders respond to search requests (I mean, do they store indexes)
>> both when I first run SolrCloud and after some time later?
>>
>>
>> 2013/4/15 Jack Krupansky 
>>
>>  When the cluster is fully operational, yes. But if part of the cluster is
>>> down or split and unable to communicate, or leader election is in
>>> progress,
>>> the actual count of leaders will not be indicative of the number of
>>> shards.
>>>
>>> Leaders and shards are apples and oranges. If you take down a cluster, by
>>> definition it would have no leaders (because leaders are running code),
>>> but
>>> shards are the files in the index on disk that continue to exist even if
>>> the code is not running. So, in the extreme, the number of leaders can be
>>> zero while the number of shards is non-zero on disk.
>>>
>>> -- Jack Krupansky
>>>
>>> -Original Message- From: Furkan KAMACI
>>> Sent: Monday, April 15, 2013 8:21 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: SolrCloud Leaders
>>>
>>>
>>> Is the number of leaders in a SolrCloud equal to the number of shards?
>>>
>>>
>>
>>
>


Re: SolrCloud Leaders

2013-04-22 Thread Otis Gospodnetic
If the current leader dies, somebody's got to take over.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Mon, Apr 22, 2013 at 9:41 AM, Furkan KAMACI  wrote:
> Hi Jack;
>
> You said: "An hour from now some other replica may be the leader"
>
> What are the criteria for changing the leader of a shard?
>
> 2013/4/15 Jack Krupansky 
>
>> All nodes are replicas in SolrCloud since there are no masters. It's a
>> fully distributed model. A leader is also a replica. A leader is simply a
>> replica which was elected to be a leader, for now. An hour from now some
>> other replica may be the leader.
>>
>> It is indeed misleading and inaccurate to suggest that "leader" and
>> "replicas" are disjoint.
>>
>> Once again, I think you are confusing SolrCloud with the older Solr
>> master/slave/replication.
>>
>> Every node in SolrCloud can do indexing. That's the same as saying that
>> every replica in SolrCloud can do indexing.
>>
>> Although we do need to be clear that a given replica will only index
>> documents for the shard(s) to which it belongs.
>>
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Furkan KAMACI
>> Sent: Monday, April 15, 2013 9:38 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: SolrCloud Leaders
>>
>> Here writes something:
>>
>> https://support.lucidworks.com/entries/22180608-Solr-HA-DR-overview-3-x-and-4-0-SolrCloud-and
>>
>> says:
>>
>> Both leaders and replicas index items and perform searches.
>>
>> How do replicas index items?
>>
>>
>> 2013/4/15 Furkan KAMACI 
>>
>>  Do leaders respond to search requests (I mean, do they store indexes)
>>> both when I first run SolrCloud and after some time later?
>>>
>>>
>>> 2013/4/15 Jack Krupansky 
>>>
>>>  When the cluster is fully operational, yes. But if part of the cluster is
 down or split and unable to communicate, or leader election is in
 progress,
 the actual count of leaders will not be indicative of the number of
 shards.

 Leaders and shards are apples and oranges. If you take down a cluster, by
 definition it would have no leaders (because leaders are running code),
 but
 shards are the files in the index on disk that continue to exist even if
 the code is not running. So, in the extreme, the number of leaders can be
 zero while the number of shards is non-zero on disk.

 -- Jack Krupansky

 -Original Message- From: Furkan KAMACI
 Sent: Monday, April 15, 2013 8:21 AM
 To: solr-user@lucene.apache.org
 Subject: SolrCloud Leaders


Is the number of leaders in a SolrCloud equal to the number of shards?


>>>
>>>
>>


Re: SolrCloud Leaders

2013-04-22 Thread Jack Krupansky
Leader election will result from nodes coming up and going down as well as 
changes in network connectivity and even simply responsiveness between the 
nodes. A "quorum" is always needed.


There may be other reasons as well that I don't know about.

The point was simply that it is not a "leader" vs. "replica" issue - all of 
the nodes are replicas and one replica just "happens" to be playing the 
role of leader at a given moment.


-- Jack Krupansky

-Original Message- 
From: Furkan KAMACI

Sent: Monday, April 22, 2013 9:41 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud Leaders

Hi Jack;

You said: "An hour from now some other replica may be the leader"

What are the criteria for changing the leader of a shard?

2013/4/15 Jack Krupansky 


All nodes are replicas in SolrCloud since there are no masters. It's a
fully distributed model. A leader is also a replica. A leader is simply a
replica which was elected to be a leader, for now. An hour from now some
other replica may be the leader.

It is indeed misleading and inaccurate to suggest that "leader" and
"replicas" are disjoint.

Once again, I think you are confusing SolrCloud with the older Solr
master/slave/replication.

Every node in SolrCloud can do indexing. That's the same as saying that
every replica in SolrCloud can do indexing.

Although we do need to be clear that a given replica will only index
documents for the shard(s) to which it belongs.


-- Jack Krupansky

-Original Message- From: Furkan KAMACI
Sent: Monday, April 15, 2013 9:38 AM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud Leaders

Here writes something:

https://support.lucidworks.com/entries/22180608-Solr-HA-DR-overview-3-x-and-4-0-SolrCloud-and

says:

Both leaders and replicas index items and perform searches.

How do replicas index items?


2013/4/15 Furkan KAMACI 

Do leaders respond to search requests (I mean, do they store indexes)

both when I first run SolrCloud and after some time later?


2013/4/15 Jack Krupansky 

 When the cluster is fully operational, yes. But if part of the cluster 
is

down or split and unable to communicate, or leader election is in
progress,
the actual count of leaders will not be indicative of the number of
shards.

Leaders and shards are apples and oranges. If you take down a cluster, 
by

definition it would have no leaders (because leaders are running code),
but
shards are the files in the index on disk that continue to exist even if
the code is not running. So, in the extreme, the number of leaders can 
be

zero while the number of shards is non-zero on disk.

-- Jack Krupansky

-Original Message- From: Furkan KAMACI
Sent: Monday, April 15, 2013 8:21 AM
To: solr-user@lucene.apache.org
Subject: SolrCloud Leaders


Is the number of leaders in a SolrCloud equal to the number of shards?











spellcheck: change in behavior and QTime

2013-04-22 Thread SandeepM
I am using the same setup (solrconfig.xml and schema.xml) as stated in my
prior message:
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tt4057176.html#a4057389
I am using SOLR 4.2.1. I just wanted to report something weird that I am
seeing and would like to find out if anyone else is seeing this behavior.
Since I don't understand the details of what is happening, I'd like to know
why the behavior changes and whether we can do anything to get the better
QTime upfront.

I see a change in behavior when running queries against the server due to
which the QTime also changes.

QUERY:
?spellcheck=true
&spellcheck.q=cucoo's+nest
&df=spell
&fq=... (it's the same every time and I believe moot)

Here is what I have to do:
1.  Run the query.
2.  Run the same query with spellcheck=false
3.  Run the original query (spellcheck=true)

QTime from each of the above stages:
1.  40ms (multiple runs with spellcheck=true.)
2.  10ms (spellcheck = false is run just once)
3.  20ms (after changing back to spellcheck=true again and running multiple
times.)

Cache details at each of the above times:
1.  filterCache

class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1024, initialSize=512, minSize=921,
acceptableSize=972, cleanupThread=false, autowarmCount=128,
regenerator=org.apache.solr.search.SolrIndexSearcher$2@7ce3d64e)
src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_2/solr/core/src/java/org/apache/solr/search/FastLRUCache.java $

stats:
lookups: Was: 30, Now: 35, Delta: 5
hits: Was: 25, Now: 30, Delta: 5
hitratio: Was: 0.83, Now: 0.85
inserts: 5
evictions: 0
size: 5
warmupTime: 0
cumulative_lookups: Was: 30, Now: 35, Delta: 5
cumulative_hits: Was: 25, Now: 30, Delta: 5
cumulative_hitratio: Was: 0.83, Now: 0.85
cumulative_inserts: 5
cumulative_evictions: 0

queryResultCache

class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=40960, initialSize=10240,
minSize=36864, acceptableSize=38912, cleanupThread=false,
autowarmCount=2560,
regenerator=org.apache.solr.search.SolrIndexSearcher$3@520adaf0)
src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_2/solr/core/src/java/org/apache/solr/search/FastLRUCache.java $

stats:
lookups: Was: 8, Now: 10, Delta: 2
hits: Was: 3, Now: 4, Delta: 1
hitratio: Was: 0.37, Now: 0.40
inserts: Was: 5, Now: 6, Delta: 1
evictions: 0
size: Was: 6, Now: 7, Delta: 1
warmupTime: 0
cumulative_lookups: Was: 8, Now: 10, Delta: 2
cumulative_hits: Was: 3, Now: 4, Delta: 1
cumulative_hitratio: Was: 0.37, Now: 0.40
cumulative_inserts: Was: 5, Now: 6, Delta: 1
cumulative_evictions: 0


2.  filterCache

class: org.apache.solr.search.FastLRUCache
version: 1.0
description: Concurrent LRU Cache(maxSize=1024, initialSize=512, minSize=921,
acceptableSize=972, cleanupThread=false, autowarmCount=128,
regenerator=org.apache.solr.search.SolrIndexSearcher$2@7ce3d64e)
src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_2/solr/core/src/java/org/apache/solr/search/FastLRUCache.java $

stats:
lookups: Was: 35, Now: 40, Delta: 5
hits: Was: 30, Now: 35, Delta: 5
hitratio: Was: 0.85, Now: 0.87
inserts: 5
evictions: 0
size: 5
warmupTime: 0
cumulative_lookups: Was: 35, Now: 40, Delta: 5
cumulative_hits: Was: 30, Now: 35, Delta: 5
cumulative_hitratio: Was: 0.85, Now: 0.87
cumulative_inserts: 5
cumulative_evictions: 0

queryResultCache

class:

Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Timothy Potter
Have a little more info about this ... the numDocs for *:* fluctuates
between two values (difference of 324 docs) depending on which nodes I
hit (distrib=true)

589,674,416
589,674,092

Using distrib=false, I found 1 shard with a mis-match:

shard15: {
  leader = 32,765,254
  replica = 32,764,930 diff:324
}
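
(For reference, per-core counts like these come from a non-distributed
match-all query against each core, e.g., with hypothetical host and core
names: http://host:8983/solr/shard15_replica1/select?q=*:*&rows=0&distrib=false)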

Interesting that the replica has more docs than the leader.

Unfortunately, due to some bad log management scripting on my part,
the logs were lost when these instances got re-started, which really
bums me out :-(

For now, I'm going to assume the replica with more docs is the one I
want to keep and will replicate the full index over to the other one.
Sorry about losing the logs :-(

Tim




On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter  wrote:
> Thanks for responding Mark. I'll collect the information you asked
> about and open a JIRA once I have a little more understanding of what
> happened. Hopefully I can piece together some story after going over
> the logs.
>
> As for replica / leader, I suspect some leaders went down but
> fail-over to new leaders seemed to work fine. We lost about 9 nodes at
> once and continued to serve queries, which is awesome.
>
> On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller  wrote:
>> Yeah, thats no good.
>>
>> You might hit each node with distrib=false to get the doc counts.
>>
>> Which ones have what you think are the right counts and which the wrong - eg 
>> is it all replicas that are off, or leaders as well?
>>
>> You say several replicas - do you mean no leaders went down?
>>
>> You might look closer at the logs for a node that has it's count off.
>>
>> Finally, I guess I'd try and track it in a JIRA issue.
>>
>> - Mark
>>
>> On Apr 19, 2013, at 6:37 PM, Timothy Potter  wrote:
>>
>>> We had a rogue query take out several replicas in a large 4.2.0 cluster
>>> today, due to OOM's (we use the JVM args to kill the process on OOM).
>>>
>>> After recovering, when I execute the match all docs query (*:*), I get a
>>> different count each time.
>>>
>>> In other words, if I execute q=*:* several times in a row, then I get a
>>> different count back for numDocs.
>>>
>>> This was not the case prior to the failure as that is one thing we monitor
>>> for.
>>>
>>> I think I should be worried ... any ideas on how to troubleshoot this? One
>>> thing to mention is that several of my replicas had to do full recoveries
>>> from the leader when they came back online. Indexing was happening when the
>>> replicas failed.
>>>
>>> Thanks.
>>> Tim
>>


Re: Dynamically loading Elevation Info

2013-04-22 Thread Ravi Solr
If you place the elevate.xml in the data directory of your index it will be
loaded every time a commit happens.

Thanks

Ravi Kiran Bhaskar


On Mon, Apr 22, 2013 at 7:38 AM, Erick Erickson wrote:

> I believe (but don't know for sure) that the QEV file is re-read on
> core reload, which the same app that modifies the elevator.xml file
> could trigger with an http request, see:
>
> http://wiki.apache.org/solr/CoreAdmin#RELOAD
>
> At least that's what I would try first.
>
> Best
> Erick
>
> On Mon, Apr 22, 2013 at 2:48 AM, Saroj C  wrote:
> > Hi,
> > Business users want to configure the elevation text and the IDs, and they
> > want to have a UI to do the same. As soon as they configure it, it should
> > be reflected in Solr (without restarting).
> >
> > My understanding is that the QueryElevationComponent currently reads the
> > elevate.xml (configurable) and loads the information into the ElevationCache
> > during startup, and uses the information while responding to queries. Is
> > there any way the content of the ElevationCache can be modified by
> > some other external process, or is there any easy way of achieving this
> > requirement?
> >
> > Thanks and Regards,
> > Saroj Kumar Choudhury
> >
> >
>


Re: updating documents unintentionally adds extra values to certain fields

2013-04-22 Thread Chris Hostetter

: I am using solr 4.2, and have set up spatial search config as below
: 
: http://wiki.apache.org/solr/SpatialSearch#Schema_Configuration
: 
: But every time I make an update to a document,
: http://wiki.apache.org/solr/UpdateJSON#Updating_a_Solr_Index_with_JSON
: 
: more values of the *_coordinates fields gets inserted, even though it was
: not set to multivalue & this behavior doesn't happen to any of the other
: fields.

can you elaborate on what exactly you mean by "more values of the 
*_coordinates fields gets inserted"?

FYI...

atomic updates work by leveraging the existing stored values of fields;
independently, the LatLonType field works by creating on-the-fly sub-fields 
representing internal state.

My hunch is that you don't actually have the LatLonType setup exactly as 
described in the wiki you linked to, where "*_coordinate" is configured 
with 'stored="false"' ... my hunch is that you have the *_coordinate 
dynamicField configured with stored="true", and so when you do an atomic 
update the old (stored) sub-field values are copied over and the (new) 
sub-field values are generated again by LatLonType.
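
For reference, the configuration on that wiki page keeps the sub-fields 
unstored, roughly:

<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

If your *_coordinate dynamicField says stored="true", that would explain the 
extra values you are seeing.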


-Hoss


Re: Dynamically loading Elevation Info

2013-04-22 Thread Chris Hostetter

: In-Reply-To: <1366609851170-4057812.p...@n3.nabble.com>
: References: <1366383543826-4057312.p...@n3.nabble.com>
:  
:  <1366609851170-4057812.p...@n3.nabble.com>
: Subject: Dynamically loading Elevation Info

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.



-Hoss


Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Timothy Potter
nm - can't read my own output - the leader had more docs than the replica ;-)

On Mon, Apr 22, 2013 at 11:42 AM, Timothy Potter  wrote:
> Have a little more info about this ... the numDocs for *:* fluctuates
> between two values (difference of 324 docs) depending on which nodes I
> hit (distrib=true)
>
> 589,674,416
> 589,674,092
>
> Using distrib=false, I found 1 shard with a mis-match:
>
> shard15: {
>   leader = 32,765,254
>   replica = 32,764,930 diff:324
> }
>
> Interesting that the replica has more docs than the leader.
>
> Unfortunately, due to some bad log management scripting on my part,
> the logs were lost when these instances got re-started, which really
> bums me out :-(
>
> For now, I'm going to assume the replica with more docs is the one I
> want to keep and will replicate the full index over to the other one.
> Sorry about losing the logs :-(
>
> Tim
>
>
>
>
> On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter  wrote:
>> Thanks for responding Mark. I'll collect the information you asked
>> about and open a JIRA once I have a little more understanding of what
>> happened. Hopefully I can piece together some story after going over
>> the logs.
>>
>> As for replica / leader, I suspect some leaders went down but
>> fail-over to new leaders seemed to work fine. We lost about 9 nodes at
>> once and continued to serve queries, which is awesome.
>>
>> On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller  wrote:
>>> Yeah, thats no good.
>>>
>>> You might hit each node with distrib=false to get the doc counts.
>>>
>>> Which ones have what you think are the right counts and which the wrong - 
>>> eg is it all replicas that are off, or leaders as well?
>>>
>>> You say several replicas - do you mean no leaders went down?
>>>
>>> You might look closer at the logs for a node that has it's count off.
>>>
>>> Finally, I guess I'd try and track it in a JIRA issue.
>>>
>>> - Mark
>>>
>>> On Apr 19, 2013, at 6:37 PM, Timothy Potter  wrote:
>>>
 We had a rogue query take out several replicas in a large 4.2.0 cluster
 today, due to OOM's (we use the JVM args to kill the process on OOM).

 After recovering, when I execute the match all docs query (*:*), I get a
 different count each time.

 In other words, if I execute q=*:* several times in a row, then I get a
 different count back for numDocs.

 This was not the case prior to the failure as that is one thing we monitor
 for.

 I think I should be worried ... any ideas on how to troubleshoot this? One
 thing to mention is that several of my replicas had to do full recoveries
 from the leader when they came back online. Indexing was happening when the
 replicas failed.

 Thanks.
 Tim
>>>


Support of field variants in solr

2013-04-22 Thread Timo Schmidt
Hi together,

I am Timo and work for a Solr implementation company. During recent projects 
we found that we need to be able to generate different variants of a 
document.
 
Example 1 (Language):
 
To handle all documents in one solr core, we need a field variant for each 
language.
 

content for spanish content:

<field name="content" variant=“es“ />

content for german content:

<field name="content" variant=“de“ />
 
 
Each of these fields can be configured in the Solr schema to act optimally for 
the specific target language.
 
Example 2 (Stores):
 
We have customers who want to sell the same product in different stores for 
different prices.
 

price in frankfurt:

<field name="price" variant=“frankfurt“ />

price in paris:

<field name="price" variant=“paris“ />
 
To solve this in an optimal way, it would be nice if this worked completely 
transparently inside Solr by defining a „variantQuery“
 
A select query could look like this:

select?variantQuery=fr&qf=price,content
 
Additionally, the following should be possible: if no variant is present, the 
behaviour should be as before, so the field should be relevant for all queries.

The setting variant=“*“ would mean: there can be several wildcard variants 
defined in a committed document. This makes sense when the data type is 
the same for all variants and you have many variants (like in the price 
example).

The same thing that is possible at query time should also be possible at indexing time.

I know we can do something like this with dynamic fields (see the sketch 
below), but then we need to resolve the concrete fields at index and query 
time on the application level. That is possible, but it would be nicer to have 
a concept like this in Solr; working with facets is also easier with this 
approach, since the concrete field name does not need to be known by the 
application.
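
A rough sketch of the dynamic-field workaround I mean (field and type names 
are just examples from the default schema):

<dynamicField name="content_*" type="text_general" indexed="true" stored="true" />
<dynamicField name="price_*" type="tfloat" indexed="true" stored="true" />

The application then has to query the concrete fields, e.g. content_de or 
price_frankfurt, itself.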
 
So my questions are:

What do you think about this approach?
Is it better to work with dynamic fields? Is it reasonable when you have 200 
variants or more of a document?
What needs to be done in solr to have something like this variant attribute for 
fields?
Do you have other approaches?


RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-22 Thread SandeepM
James, Thanks.  That was very helpful. That helped me understand count and
alternativeTermCount a bit more.

I also have the following case as pointed out earlier...
My query: 

http://host/solr/select?q=&spellcheck.q=chocolat%20factry&spellcheck=true&df=spell&fl=&indent=on&wt=xml&rows=10&version=2.2&echoParams=explicit

In this case, the intent is to correct "chocolat factry" with "chocolate
factory" which exists in my spell field index. I see a QTime from the above
query as somewhere between 350-400ms 

I run a similar query, replacing the spellcheck terms with "pursut hapyness",
whereas "pursuit happyness" actually exists in my spell field, and I see a
QTime of 15-17ms.

Both queries produce collations correctly, and picking the first suggestions
and applying them as the collation finds what I am looking for, but there is an
order of magnitude difference in QTime.  There is one edit per term in both cases,
or 2 edits in each query. The lengths of the words in both these queries seem
identical. I'd like to understand why there is this vast difference in
QTime.  Also, "Chocolate factory" and "Pursuit happyness" are both indexed
in the spell field as-is.

I would appreciate any help with this since I am not sure how I can get any
meaningful performance numbers and attribute the slowness to anything in
particular. 

Thanks.
Regards,
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058048.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr Cloud 4.2 - Distributed Requests failing with NPE

2013-04-22 Thread Sudhakar Maddineni
Hi,
  We recently upgraded our Solr version from 4.1 to 4.2 and started seeing the
below exceptions when running distributed queries.
Any idea what we are missing here -

http://
/solr/core1/select?q=*%3A*&wt=json&indent=true&shards=/solr/core1
http://
/solr/core1/select?q=*%3A*&wt=json&indent=true&shards=/solr/core1
http://
/solr/core1/select?q=*%3A*&wt=json&indent=true&shards=/solr/core1

  "error":{
"trace":"java.lang.NullPointerException\n\tat
org.apache.solr.handler.component.HttpShardHandler.checkDistributed(HttpShardHandler.java:340)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:182)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)\n\tat
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)\n\tat
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)\n\tat
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)\n\tat
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)\n\tat
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:470)\n\tat
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)\n\tat
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)\n\tat
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)\n\tat
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)\n\tat
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)\n\tat
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)\n\tat
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)\n\tat
java.lang.Thread.run(Unknown Source)\n",
"code":500}}


Thanks,Sudhakar.


RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-22 Thread Dyer, James
On both queries, set "spellcheck.extendedResults=true" and also 
"spellcheck.collateExtendedResults=true", then post the full spelling response. 
 Also, how long does each query take on average with spellcheck turned off?
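
For example, something like this based on your earlier URL, with the two 
parameters added:

http://host/solr/select?q=&spellcheck.q=chocolat%20factry&spellcheck=true&df=spell&wt=xml&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true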

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: SandeepM [mailto:skmi...@hotmail.com] 
Sent: Monday, April 22, 2013 2:02 PM
To: solr-user@lucene.apache.org
Subject: RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

James, Thanks.  That was very helpful. That helped me understand count and
alternativeTermCount a bit more.

I also have the following case as pointed out earlier...
My query: 

http://host/solr/select?q=&spellcheck.q=chocolat%20factry&spellcheck=true&df=spell&fl=&indent=on&wt=xml&rows=10&version=2.2&echoParams=explicit

In this case, the intent is to correct "chocolat factry" with "chocolate
factory" which exists in my spell field index. I see a QTime from the above
query as somewhere between 350-400ms 

I run a similar query, replacing the spellcheck terms with "pursut hapyness",
whereas "pursuit happyness" actually exists in my spell field, and I see a
QTime of 15-17ms.

Both queries produce collations correctly, and picking the first suggestions
and applying them as the collation finds what I am looking for, but there is an
order of magnitude difference in QTime.  There is one edit per term in both cases,
or 2 edits in each query. The lengths of the words in both these queries seem
identical. I'd like to understand why there is this vast difference in
QTime.  Also, "Chocolate factory" and "Pursuit happyness" are both indexed
in the spell field as-is.

I would appreciate any help with this since I am not sure how I can get any
meaningful performance numbers and attribute the slowness to anything in
particular. 

Thanks.
Regards,
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058048.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Mark Miller
Bummer on the log loss :(

Good info though. Somehow that replica became active without actually syncing? 
This is heavily tested (though not with OOM's I suppose), so I'm a little 
surprised, but it's hard to speculate how it happened without the logs. 
Specifically, the logs from the node that is off would be great - we would see 
what it did when it recovered and why it might think it was in sync :(

- Mark

On Apr 22, 2013, at 2:19 PM, Timothy Potter  wrote:

> nm - can't read my own output - the leader had more docs than the replica ;-)
> 
> On Mon, Apr 22, 2013 at 11:42 AM, Timothy Potter  wrote:
>> Have a little more info about this ... the numDocs for *:* fluctuates
>> between two values (difference of 324 docs) depending on which nodes I
>> hit (distrib=true)
>> 
>> 589,674,416
>> 589,674,092
>> 
>> Using distrib=false, I found 1 shard with a mis-match:
>> 
>> shard15: {
>>  leader = 32,765,254
>>  replica = 32,764,930 diff:324
>> }
>> 
>> Interesting that the replica has more docs than the leader.
>> 
>> Unfortunately, due to some bad log management scripting on my part,
>> the logs were lost when these instances got re-started, which really
>> bums me out :-(
>> 
>> For now, I'm going to assume the replica with more docs is the one I
>> want to keep and will replicate the full index over to the other one.
>> Sorry about losing the logs :-(
>> 
>> Tim
>> 
>> 
>> 
>> 
>> On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter  
>> wrote:
>>> Thanks for responding Mark. I'll collect the information you asked
>>> about and open a JIRA once I have a little more understanding of what
>>> happened. Hopefully I can piece together some story after going over
>>> the logs.
>>> 
>>> As for replica / leader, I suspect some leaders went down but
>>> fail-over to new leaders seemed to work fine. We lost about 9 nodes at
>>> once and continued to serve queries, which is awesome.
>>> 
>>> On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller  wrote:
 Yeah, thats no good.
 
 You might hit each node with distrib=false to get the doc counts.
 
 Which ones have what you think are the right counts and which the wrong - 
 eg is it all replicas that are off, or leaders as well?
 
 You say several replicas - do you mean no leaders went down?
 
 You might look closer at the logs for a node that has it's count off.
 
 Finally, I guess I'd try and track it in a JIRA issue.
 
 - Mark
 
 On Apr 19, 2013, at 6:37 PM, Timothy Potter  wrote:
 
> We had a rogue query take out several replicas in a large 4.2.0 cluster
> today, due to OOM's (we use the JVM args to kill the process on OOM).
> 
> After recovering, when I execute the match all docs query (*:*), I get a
> different count each time.
> 
> In other words, if I execute q=*:* several times in a row, then I get a
> different count back for numDocs.
> 
> This was not the case prior to the failure as that is one thing we monitor
> for.
> 
> I think I should be worried ... any ideas on how to troubleshoot this? One
> thing to mention is that several of my replicas had to do full recoveries
> from the leader when they came back online. Indexing was happening when 
> the
> replicas failed.
> 
> Thanks.
> Tim
 



Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Mark Miller
What do you know about the # of docs you *should* have? Do you have that count 
when taking the bad replica out of the equation?

- Mark

On Apr 22, 2013, at 4:33 PM, Mark Miller  wrote:

> Bummer on the log loss :(
> 
> Good info though. Somehow that replica became active without actually 
> syncing? This is heavily tested (though not with OOM's I suppose), so I'm a 
> little surprised, but it's hard to speculate how it happened without the 
> logs. Specifically, the logs from the node that is off would be great - we would 
> see what it did when it recovered and why it might think it was in sync :(
> 
> - Mark
> 
> On Apr 22, 2013, at 2:19 PM, Timothy Potter  wrote:
> 
>> nm - can't read my own output - the leader had more docs than the replica ;-)
>> 
>> On Mon, Apr 22, 2013 at 11:42 AM, Timothy Potter  
>> wrote:
>>> Have a little more info about this ... the numDocs for *:* fluctuates
>>> between two values (difference of 324 docs) depending on which nodes I
>>> hit (distrib=true)
>>> 
>>> 589,674,416
>>> 589,674,092
>>> 
>>> Using distrib=false, I found 1 shard with a mis-match:
>>> 
>>> shard15: {
>>> leader = 32,765,254
>>> replica = 32,764,930 diff:324
>>> }
>>> 
>>> Interesting that the replica has more docs than the leader.
>>> 
>>> Unfortunately, due to some bad log management scripting on my part,
>>> the logs were lost when these instances got re-started, which really
>>> bums me out :-(
>>> 
>>> For now, I'm going to assume the replica with more docs is the one I
>>> want to keep and will replicate the full index over to the other one.
>>> Sorry about losing the logs :-(
>>> 
>>> Tim
>>> 
>>> 
>>> 
>>> 
>>> On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter  
>>> wrote:
 Thanks for responding Mark. I'll collect the information you asked
 about and open a JIRA once I have a little more understanding of what
 happened. Hopefully I can piece together some story after going over
 the logs.
 
 As for replica / leader, I suspect some leaders went down but
 fail-over to new leaders seemed to work fine. We lost about 9 nodes at
 once and continued to serve queries, which is awesome.
 
 On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller  
 wrote:
> Yeah, thats no good.
> 
> You might hit each node with distrib=false to get the doc counts.
> 
> Which ones have what you think are the right counts and which the wrong - 
> eg is it all replicas that are off, or leaders as well?
> 
> You say several replicas - do you mean no leaders went down?
> 
> You might look closer at the logs for a node that has it's count off.
> 
> Finally, I guess I'd try and track it in a JIRA issue.
> 
> - Mark
> 
> On Apr 19, 2013, at 6:37 PM, Timothy Potter  wrote:
> 
>> We had a rogue query take out several replicas in a large 4.2.0 cluster
>> today, due to OOM's (we use the JVM args to kill the process on OOM).
>> 
>> After recovering, when I execute the match all docs query (*:*), I get a
>> different count each time.
>> 
>> In other words, if I execute q=*:* several times in a row, then I get a
>> different count back for numDocs.
>> 
>> This was not the case prior to the failure as that is one thing we 
>> monitor
>> for.
>> 
>> I think I should be worried ... any ideas on how to troubleshoot this? 
>> One
>> thing to mention is that several of my replicas had to do full recoveries
>> from the leader when they came back online. Indexing was happening when 
>> the
>> replicas failed.
>> 
>> Thanks.
>> Tim
> 
> 



Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Timothy Potter
I ended up just nuking the index on the replica with less docs and
restarting it - which triggered the snap pull from the leader. So now
I'm in sync and have better processes in place to capture the
information if it happens again, which given some of the queries my UI
team develops, is highly likely ;-)

Also, all our input data to Solr lives in Hive so I'm doing some id
-to- id comparisons of what is in Solr vs. what is in Hive to find any
discrepancies.

Again, sorry about the loss of the logs. This is a tough scenario to
try to re-create as it was a perfect storm of high indexing throughput
and a rogue query.

Tim

On Mon, Apr 22, 2013 at 2:41 PM, Mark Miller  wrote:
> What do you know about the # of docs you *should* have? Do you have that count 
> when taking the bad replica out of the equation?
>
> - Mark
>
> On Apr 22, 2013, at 4:33 PM, Mark Miller  wrote:
>
>> Bummer on the log loss :(
>>
>> Good info though. Somehow that replica became active without actually 
>> syncing? This is heavily tested (though not with OOM's I suppose), so I'm a 
>> little surprised, but it's hard to speculate how it happened without the 
>> logs. Specifically, the logs from the node that is off would be great - we 
>> would see what it did when it recovered and why it might think it was in 
>> sync :(
>>
>> - Mark
>>
>> On Apr 22, 2013, at 2:19 PM, Timothy Potter  wrote:
>>
>>> nm - can't read my own output - the leader had more docs than the replica 
>>> ;-)
>>>
>>> On Mon, Apr 22, 2013 at 11:42 AM, Timothy Potter  
>>> wrote:
 Have a little more info about this ... the numDocs for *:* fluctuates
 between two values (difference of 324 docs) depending on which nodes I
 hit (distrib=true)

 589,674,416
 589,674,092

 Using distrib=false, I found 1 shard with a mis-match:

 shard15: {
 leader = 32,765,254
 replica = 32,764,930 diff:324
 }

 Interesting that the replica has more docs than the leader.

 Unfortunately, due to some bad log management scripting on my part,
 the logs were lost when these instances got re-started, which really
 bums me out :-(

 For now, I'm going to assume the replica with more docs is the one I
 want to keep and will replicate the full index over to the other one.
 Sorry about losing the logs :-(

 Tim




 On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter  
 wrote:
> Thanks for responding Mark. I'll collect the information you asked
> about and open a JIRA once I have a little more understanding of what
> happened. Hopefully I can piece together some story after going over
> the logs.
>
> As for replica / leader, I suspect some leaders went down but
> fail-over to new leaders seemed to work fine. We lost about 9 nodes at
> once and continued to serve queries, which is awesome.
>
> On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller  
> wrote:
>> Yeah, thats no good.
>>
>> You might hit each node with distrib=false to get the doc counts.
>>
>> Which ones have what you think are the right counts and which the wrong 
>> - eg is it all replicas that are off, or leaders as well?
>>
>> You say several replicas - do you mean no leaders went down?
>>
>> You might look closer at the logs for a node that has it's count off.
>>
>> Finally, I guess I'd try and track it in a JIRA issue.
>>
>> - Mark
>>
>> On Apr 19, 2013, at 6:37 PM, Timothy Potter  wrote:
>>
>>> We had a rogue query take out several replicas in a large 4.2.0 cluster
>>> today, due to OOM's (we use the JVM args to kill the process on OOM).
>>>
>>> After recovering, when I execute the match all docs query (*:*), I get a
>>> different count each time.
>>>
>>> In other words, if I execute q=*:* several times in a row, then I get a
>>> different count back for numDocs.
>>>
>>> This was not the case prior to the failure as that is one thing we 
>>> monitor
>>> for.
>>>
>>> I think I should be worried ... any ideas on how to troubleshoot this? 
>>> One
>>> thing to mention is that several of my replicas had to do full 
>>> recoveries
>>> from the leader when they came back online. Indexing was happening when 
>>> the
>>> replicas failed.
>>>
>>> Thanks.
>>> Tim
>>
>>
>


RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-22 Thread SandeepM
Chocolat Factry:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">77</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="chocolat">
        <int name="numFound">1</int>
        <int name="startOffset">0</int>
        <int name="endOffset">8</int>
        <int name="origFreq">615</int>
        <arr name="suggestion">
          <lst><str name="word">chocolate</str><int name="freq">6544</int></lst>
        </arr>
      </lst>
      <lst name="factry">
        <int name="numFound">5</int>
        <int name="startOffset">9</int>
        <int name="endOffset">15</int>
        <int name="origFreq">6</int>
        <arr name="suggestion">
          <lst><str name="word">factory</str><int name="freq">23614</int></lst>
          <lst><str name="word">factor</str><int name="freq">5128</int></lst>
          <lst><str name="word">factus</str><int name="freq">290</int></lst>
          <lst><str name="word">factum</str><int name="freq">178</int></lst>
          <lst><str name="word">factae</str><int name="freq">102</int></lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
      <lst name="collation">
        <str name="collationQuery">chocolate factory</str>
        <int name="hits">85</int>
        <lst name="misspellingsAndCorrections">
          <str name="chocolat">chocolate</str>
          <str name="factry">factory</str>
        </lst>
      </lst>
    </lst>
  </lst>
</response>

Pursut Hapyness:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="pursut">
        <int name="numFound">5</int>
        <int name="startOffset">0</int>
        <int name="endOffset">6</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst><str name="word">pursuit</str><int name="freq">1209</int></lst>
          <lst><str name="word">pursue</str><int name="freq">108</int></lst>
          <lst><str name="word">pursit</str><int name="freq">1</int></lst>
          <lst><str name="word">perdut</str><int name="freq">94</int></lst>
          <lst><str name="word">purdue</str><int name="freq">70</int></lst>
        </arr>
      </lst>
      <lst name="hapyness">
        <int name="numFound">5</int>
        <int name="startOffset">7</int>
        <int name="endOffset">15</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst><str name="word">happyness</str><int name="freq">175</int></lst>
          <lst><str name="word">hapiness</str><int name="freq">62</int></lst>
          <lst><str name="word">hayness</str><int name="freq">1</int></lst>
          <lst><str name="word">happiness</str><int name="freq">7788</int></lst>
          <lst><str name="word">harkness</str><int name="freq">324</int></lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
      <lst name="collation">
        <str name="collationQuery">pursuit happyness</str>
        <int name="hits">10</int>
        <lst name="misspellingsAndCorrections">
          <str name="pursut">pursuit</str>
          <str name="hapyness">happyness</str>
        </lst>
      </lst>
    </lst>
  </lst>
</response>


Spellcheck is used separately and we are not using any q along with
spellcheck.

Our search query also queries other fields, not just spellcheck, and
therefore does not give a good representation of QTime.  We use grouping
in the search query.
For Chocolate Factory, I get a search QTime of 198ms
For Pursuit Happyness, I get a search QTime of 318ms

Would appreciate your insights.
Thanks.
-- Sandeep




--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058086.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr dynamic fields scalability

2013-04-22 Thread jhuffaker
Hi All,

I was curious how Lucene/Solr scales as the total number of non-stored fields
grows.  So, for example, if my average document has 50 fields on it, but the
total number of fields in the system is upwards of 100k and I query on one
of those fields: Will I see runtime that is proportional to the total number
of fields in the system?  Or will it be solely proportional to the corpus
size?

I've tried searching and running my own benchmarks, but all of my answers
have been unsatisfactory thus far.

Let me know if there are any parts of the question I can clarify.

Regards,
John



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-dynamic-fields-scalability-tp4058090.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-22 Thread Dyer, James
This doesn't make a lot of sense to me, as in both cases the very first 
collation it tries is the one it is returning.  So you're getting a very 
optimized spellcheck in both cases.  But it does have to issue both queries 2 
times:  the first time, it tries the user's main query and, finding there are not 
enough hits, it then tries the collation query to see how many hits that will 
return.  Could it be that these two queries are just less/more expensive and 
that difference gets magnified by running each twice?

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: SandeepM [mailto:skmi...@hotmail.com] 
Sent: Monday, April 22, 2013 4:04 PM
To: solr-user@lucene.apache.org
Subject: RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

Chocolat Factry:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">77</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="chocolat">
        <int name="numFound">1</int>
        <int name="startOffset">0</int>
        <int name="endOffset">8</int>
        <int name="origFreq">615</int>
        <arr name="suggestion">
          <lst><str name="word">chocolate</str><int name="freq">6544</int></lst>
        </arr>
      </lst>
      <lst name="factry">
        <int name="numFound">5</int>
        <int name="startOffset">9</int>
        <int name="endOffset">15</int>
        <int name="origFreq">6</int>
        <arr name="suggestion">
          <lst><str name="word">factory</str><int name="freq">23614</int></lst>
          <lst><str name="word">factor</str><int name="freq">5128</int></lst>
          <lst><str name="word">factus</str><int name="freq">290</int></lst>
          <lst><str name="word">factum</str><int name="freq">178</int></lst>
          <lst><str name="word">factae</str><int name="freq">102</int></lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
      <lst name="collation">
        <str name="collationQuery">chocolate factory</str>
        <int name="hits">85</int>
        <lst name="misspellingsAndCorrections">
          <str name="chocolat">chocolate</str>
          <str name="factry">factory</str>
        </lst>
      </lst>
    </lst>
  </lst>
</response>

Pursut Hapyness:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="pursut">
        <int name="numFound">5</int>
        <int name="startOffset">0</int>
        <int name="endOffset">6</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst><str name="word">pursuit</str><int name="freq">1209</int></lst>
          <lst><str name="word">pursue</str><int name="freq">108</int></lst>
          <lst><str name="word">pursit</str><int name="freq">1</int></lst>
          <lst><str name="word">perdut</str><int name="freq">94</int></lst>
          <lst><str name="word">purdue</str><int name="freq">70</int></lst>
        </arr>
      </lst>
      <lst name="hapyness">
        <int name="numFound">5</int>
        <int name="startOffset">7</int>
        <int name="endOffset">15</int>
        <int name="origFreq">0</int>
        <arr name="suggestion">
          <lst><str name="word">happyness</str><int name="freq">175</int></lst>
          <lst><str name="word">hapiness</str><int name="freq">62</int></lst>
          <lst><str name="word">hayness</str><int name="freq">1</int></lst>
          <lst><str name="word">happiness</str><int name="freq">7788</int></lst>
          <lst><str name="word">harkness</str><int name="freq">324</int></lst>
        </arr>
      </lst>
      <bool name="correctlySpelled">false</bool>
      <lst name="collation">
        <str name="collationQuery">pursuit happyness</str>
        <int name="hits">10</int>
        <lst name="misspellingsAndCorrections">
          <str name="pursut">pursuit</str>
          <str name="hapyness">happyness</str>
        </lst>
      </lst>
    </lst>
  </lst>
</response>



Spellcheck is used separately and we are not using any q along with
spellcheck.

Our search query also queries other fields, not just spellcheck, and
therefore does not give a good representation of QTime.  We use grouping
in the search query.
For Chocolate Factory, I get a search QTime of 198ms
For Pursuit Happyness, I get a search QTime of 318ms

Would appreciate your insights.
Thanks.
-- Sandeep




--
View this message in context: 
http://lucene.472066.n3.nabble.com/DirectSolrSpellChecker-vastly-varying-spellcheck-QTime-times-tp4057176p4058086.html
Sent from the Solr - User mailing list archive at Nabble.com.




Soft Commit and Document Cache

2013-04-22 Thread Niran Fajemisin
Hi all,

A quick (and hopefully simple) question: Does the document cache (or any of the 
other caches for that matter) get invalidated after a soft commit has been 
performed?

Thanks,
Niran

Re: Soft Commit and Document Cache

2013-04-22 Thread Mark Miller
Yup - all of the top level caches are. It's a trade off - don't NRT more than 
you need to.

- Mark

On Apr 22, 2013, at 6:16 PM, Niran Fajemisin  wrote:

> Hi all,
> 
> A quick (and hopefully simple) question: Does the document cache (or any of 
> the other caches for that matter) get invalidated after a soft commit has 
> been performed?
> 
> Thanks,
> Niran



Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Mark Miller
No worries, thanks for the info. Let me know if you gain any more insight! I'd 
love to figure out what happened here and address it. And I'm especially 
interested in knowing if you lost any updates if you are able to determine that.

- Mark

On Apr 22, 2013, at 5:02 PM, Timothy Potter  wrote:

> I ended up just nuking the index on the replica with less docs and
> restarting it - which triggered the snap pull from the leader. So now
> I'm in sync and have better processes in place to capture the
> information if it happens again, which given some of the queries my UI
> team develops, is highly likely ;-)
> 
> Also, all our input data to Solr lives in Hive so I'm doing some id
> -to- id comparisons of what is in Solr vs. what is in Hive to find any
> discrepancies.
> 
> Again, sorry about the loss of the logs. This is a tough scenario to
> try to re-create as it was a perfect storm of high indexing throughput
> and a rogue query.
> 
> Tim
> 
> On Mon, Apr 22, 2013 at 2:41 PM, Mark Miller  wrote:
>> What do you know about the # of docs you *should* have? Do you have that count 
>> when taking the bad replica out of the equation?
>> 
>> - Mark
>> 
>> On Apr 22, 2013, at 4:33 PM, Mark Miller  wrote:
>> 
>>> Bummer on the log loss :(
>>> 
>>> Good info though. Somehow that replica became active without actually 
>>> syncing? This is heavily tested (though not with OOM's I suppose), so I'm a 
>>> little surprised, but it's hard to speculate how it happened without the 
>>> logs. Specifically, the logs from the node that is off would be great - we 
>>> would see what it did when it recovered and why it might think it was in 
>>> sync :(
>>> 
>>> - Mark
>>> 
>>> On Apr 22, 2013, at 2:19 PM, Timothy Potter  wrote:
>>> 
 nm - can't read my own output - the leader had more docs than the replica 
 ;-)
 
 On Mon, Apr 22, 2013 at 11:42 AM, Timothy Potter  
 wrote:
> Have a little more info about this ... the numDocs for *:* fluctuates
> between two values (difference of 324 docs) depending on which nodes I
> hit (distrib=true)
> 
> 589,674,416
> 589,674,092
> 
> Using distrib=false, I found 1 shard with a mis-match:
> 
> shard15: {
> leader = 32,765,254
> replica = 32,764,930 diff:324
> }
> 
> Interesting that the replica has more docs than the leader.
> 
> Unfortunately, due to some bad log management scripting on my part,
> the logs were lost when these instances got re-started, which really
> bums me out :-(
> 
> For now, I'm going to assume the replica with more docs is the one I
> want to keep and will replicate the full index over to the other one.
> Sorry about losing the logs :-(
> 
> Tim
> 
> 
> 
> 
> On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter  
> wrote:
>> Thanks for responding Mark. I'll collect the information you asked
>> about and open a JIRA once I have a little more understanding of what
>> happened. Hopefully I can piece together some story after going over
>> the logs.
>> 
>> As for replica / leader, I suspect some leaders went down but
>> fail-over to new leaders seemed to work fine. We lost about 9 nodes at
>> once and continued to serve queries, which is awesome.
>> 
>> On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller  
>> wrote:
>>> Yeah, thats no good.
>>> 
>>> You might hit each node with distrib=false to get the doc counts.
>>> 
>>> Which ones have what you think are the right counts and which the wrong 
>>> - eg is it all replicas that are off, or leaders as well?
>>> 
>>> You say several replicas - do you mean no leaders went down?
>>> 
>>> You might look closer at the logs for a node that has it's count off.
>>> 
>>> Finally, I guess I'd try and track it in a JIRA issue.
>>> 
>>> - Mark
>>> 
>>> On Apr 19, 2013, at 6:37 PM, Timothy Potter  
>>> wrote:
>>> 
 We had a rogue query take out several replicas in a large 4.2.0 cluster
 today, due to OOM's (we use the JVM args to kill the process on OOM).
 
 After recovering, when I execute the match all docs query (*:*), I get 
 a
 different count each time.
 
 In other words, if I execute q=*:* several times in a row, then I get a
 different count back for numDocs.
 
 This was not the case prior to the failure as that is one thing we 
 monitor
 for.
 
 I think I should be worried ... any ideas on how to troubleshoot this? 
 One
 thing to mention is that several of my replicas had to do full 
 recoveries
 from the leader when they came back online. Indexing was happening 
 when the
 replicas failed.
 
 Thanks.
 Tim
>>> 
>>> 
>> 



Re: Soft Commit and Document Cache

2013-04-22 Thread Shawn Heisey

On 4/22/2013 4:16 PM, Niran Fajemisin wrote:

A quick (and hopefully simple) question: Does the document cache (or any of the 
other caches for that matter) get invalidated after a soft commit has been 
performed?


All Solr caches are invalidated when you issue a commit with 
openSearcher set to true.  There would be no reason to do a soft commit 
with openSearcher set to false.  That setting only makes sense with hard 
commits.


If you have queries defined for the newSearcher event, then they will be 
run, which can pre-populate caches.
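
For reference, a minimal newSearcher listener in solrconfig.xml looks 
something like this (the query itself is a placeholder - use something typical 
for your application):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">some common query</str></lst>
  </arr>
</listener>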


The filterCache and queryResultCache can be autowarmed on commit - the 
most relevant autowarmCount queries in the cache from the old searcher 
are re-run against the new searcher.  The queryResultWindowSize 
parameter helps control exactly what gets cached with the queryResultCache.


The documentCache cannot be autowarmed, although I *think* that when 
entries from the queryResultCache are run, it will also populate the 
documentCache, though I could be wrong about that.


I do not know whether autowarming is done before or after newSearcher 
queries.


http://wiki.apache.org/solr/SolrCaching

Thanks,
Shawn



Re: Soft Commit and Document Cache

2013-04-22 Thread Niran Fajemisin
Thanks Shawn and Mark! That was very helpful.

-Niran



>
> From: Shawn Heisey 
>To: solr-user@lucene.apache.org 
>Sent: Monday, April 22, 2013 5:30 PM
>Subject: Re: Soft Commit and Document Cache
> 
>
>On 4/22/2013 4:16 PM, Niran Fajemisin wrote:
>> A quick (and hopefully simple) question: Does the document cache (or any of 
>> the other caches for that matter) get invalidated after a soft commit has 
>> been performed?
>
>All Solr caches are invalidated when you issue a commit with 
>openSearcher set to true.  There would be no reason to do a soft commit 
>with openSearcher set to false.  That setting only makes sense with hard 
>commits.
>
>If you have queries defined for the newSearcher event, then they will be 
>run, which can pre-populate caches.
>
>The filterCache and queryResultCache can be autowarmed on commit - the 
>most relevant autowarmCount queries in the cache from the old searcher 
>are re-run against the new searcher.  The queryResultWindowSize 
>parameter helps control exactly what gets cached with the queryResultCache.
>
>The documentCache cannot be autowarmed, although I *think* that when 
>entries from the queryResultCache are run, it will also populate the 
>documentCache, though I could be wrong about that.
>
>I do not know whether autowarming is done before or after newSearcher 
>queries.
>
>http://wiki.apache.org/solr/SolrCaching
>
>Thanks,
>Shawn
>
>
>
>

Re: Error creating collection

2013-04-22 Thread Erick Erickson
What version of Solr? More context for the stack trace?

You might want to review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Mon, Apr 22, 2013 at 5:33 AM, yriveiro  wrote:
> I get this exception when I try to create a new collection. Does someone have
> any idea what's going on?
>
> org.apache.solr.common.SolrException: Error CREATEing SolrCore 'RPS_12':
> Could not get shard_id for core: RPS_12
> coreNodeName:192.168.20.48:8983_solr_RPS_12
>
>
>
> -
> Best regards
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Error-creating-collection-tp4057859.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Where to use replicationFactor and maxShardsPerNode at SolrCloud?

2013-04-22 Thread Erick Erickson
bq: However, what will happen to those 10 nodes when I specify the replication factor?


I think they just sit around doing nothing.
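
For reference, both parameters are set when creating the collection via the 
Collections API, something like this (host and names are placeholders):

http://host:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=10&replicationFactor=2&maxShardsPerNode=1

With replicationFactor=2 (a leader plus one replica per shard), 20 of your 30 
nodes get assigned and the remaining 10 are left alone.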

Best
Erick

On Mon, Apr 22, 2013 at 7:24 AM, Furkan KAMACI  wrote:
> Sorry, but if I have 10 shards and a collection with a replication factor of 1,
> and if I start up 30 nodes, what happens to those last 10 nodes? I mean:
>
> 10 nodes as leader
> 10 nodes as replica
>
> if I don't specify the replication factor, a round-robin system would assign
> the other 10 machines as:
> + 10 nodes as replica
>
> However, what will happen to those 10 nodes when I specify the replication factor?
>
>
> 2013/4/22 Erick Erickson 
>
>> 1) Imagine you have lots and lots and lots of different Solr indexes
>> and a 50 node cluster. Further imagine that one of those indexes has 2
>> shards, and a leader + shard is adequate to handle the load. You need
>> some way to limit the number of nodes your index gets distributed to,
>> that's what replicationFactor is for. So in this case
>> replicationFactor=2 will stop assigning nodes to that particular
>> collection after there's a leader + 1 replica
>>
>> 2> In the system you described, there won't be more than one
>> shard/node. But one strategy for growth is to "overshard". That is, in
>> the early days you put (numbers from thin air) 10 shards/node and they
>> are all quite small. As your index grows, you move to two nodes with 5
>> shards each. And later to 5 nodes with 2 shards and so on. There are
>> cases where you want some way to make the most of your hardware yet
>> plan for expansion.
>>
>> Best
>> Erick
>>
>> On Sun, Apr 21, 2013 at 3:51 PM, Furkan KAMACI 
>> wrote:
>> > I know that: when using SolrCloud we define the number of shards into the
>> > system. When we start up new Solr instances each one will be a leader
>> for
>> > a shard, and if I continue to start up new Solr instances (that has
>> > exceeded the number number of shards) each one will be a replica for each
>> > leader as a round robin process.
>> >
>> > However when I read wiki there are two parameters: *replicationFactor
>> *and *
>> > maxShardsPerNode.
>> >
>> > *1) Can you give details about what they are. If every newly added Solr
>> > instance becomes a replica, what is the replication factor for?
>> > 2) If what I wrote is true about that round robin process what is that *
>> > maxShardsPerNode*? How can be more than one shard at the system I
>> described?
>>


Re: ranking score by fields

2013-04-22 Thread Erick Erickson
You can sometimes use the highlighter component to do this, but it's a
little tricky...

But note your syntax isn't doing what you expect.
(field1:apache solr) parses as field1:apache defaultfield:solr. You want
field1:(apache solr)

&debug=all is your friend for these kinds of things, especially the parsed query
section
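
For example, something like this (URL-encoding left out for readability; host 
is a placeholder):

http://localhost:8983/solr/select?q=field1:(apache solr) OR field2:(apache solr) OR field3:(apache solr)&debug=all

Then look at the "explain" section of the debug output to see which clauses 
matched and how they scored.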

Best
Erick

On Mon, Apr 22, 2013 at 4:44 AM, Каскевич Александр
 wrote:
> Hi.
> I want to do what the subject says, but don't know exactly how I can do it.
> Example.
> I have index with field1, field2, field3.
> I make a query like:
> (field1:apache solr) OR (field2:apache solr) OR (field3:apache solr)
> And I want to know: did it find this doc via field1, field2, or field3?
>
> I tried to make it like this: (field1:apache solr)^100 OR (field2:apache solr)^10 
> OR (field3:apache solr)^1
> But the problem is that I don't know the range, minimum and maximum value of 
> the score for each field.
> With other types of similarities (BM25 or others) it is the same situation.
> I can't find information about this in the manual.
>
> Also, I tried to use relevance functions, e.g. "termfreq", but it works only with 
> terms, not with phrases like "apache solr".
>
> Maybe I am missing something, or do you have another idea how to do this?
> Also, I am not a Java programmer, so the best option for me is to not write any 
> plugins for Solr.
>
> Thanks.
> Alex.


Export Index and Re-Index XML

2013-04-22 Thread Kalyan Kuram
Hi All, I am new to Solr and I wanted to know if I can export the index as XML 
and then re-index it back into Solr. The reason I need to do this is that I 
misconfigured a fieldtype, and to make it work I need to re-index the content.
Kalyan

Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Sudhakar Maddineni
We had encountered a similar issue a few days back with the 4.0-Beta version.
We have a 6-node, 3-shard cluster setup, and one of our replica
servers [Tomcat] was not responding to any requests because it reached the
max number of threads [200 default]. To temporarily fix the issue, we had
to restart the server. After restarting, we realized that there were 2
Tomcat processes running [old one + new one], so we manually killed the two
Tomcat processes and had a clean start. And we observed that the numDocs of the
replica server did not match the count on the leader.
So, is this discrepancy because we manually killed the process, which
interrupted the sync process?

Thx, Sudhakar.




On Mon, Apr 22, 2013 at 3:28 PM, Mark Miller  wrote:

> No worries, thanks for the info. Let me know if you gain any more insight!
> I'd love to figure out what happened here and address it. And I'm
> especially interested in knowing if you lost any updates if you are able to
> determine that.
>
> - Mark
>
> On Apr 22, 2013, at 5:02 PM, Timothy Potter  wrote:
>
> > I ended up just nuking the index on the replica with less docs and
> > restarting it - which triggered the snap pull from the leader. So now
> > I'm in sync and have better processes in place to capture the
> > information if it happens again, which given some of the queries my UI
> > team develops, is highly likely ;-)
> >
> > Also, all our input data to Solr lives in Hive so I'm doing some id
> > -to- id comparisons of what is in Solr vs. what is in Hive to find any
> > discrepancies.
> >
> > Again, sorry about the loss of the logs. This is a tough scenario to
> > try to re-create as it was a perfect storm of high indexing throughput
> > and a rogue query.
> >
> > Tim
> >
> > On Mon, Apr 22, 2013 at 2:41 PM, Mark Miller 
> wrote:
> >> What do you know about the # of docs you *should* have? Do you have that
> count when taking the bad replica out of the equation?
> >>
> >> - Mark
> >>
> >> On Apr 22, 2013, at 4:33 PM, Mark Miller  wrote:
> >>
> >>> Bummer on the log loss :(
> >>>
> >>> Good info though. Somehow that replica became active without actually
> syncing? This is heavily tested (though not with OOM's I suppose), so I'm a
> little surprised, but it's hard to speculate how it happened without the
> logs. Specifically, the logs from the node that is off would be great - we
> would see what it did when it recovered and why it might think it was in
> sync :(
> >>>
> >>> - Mark
> >>>
> >>> On Apr 22, 2013, at 2:19 PM, Timothy Potter 
> wrote:
> >>>
>  nm - can't read my own output - the leader had more docs than the
> replica ;-)
> 
>  On Mon, Apr 22, 2013 at 11:42 AM, Timothy Potter <
> thelabd...@gmail.com> wrote:
> > Have a little more info about this ... the numDocs for *:* fluctuates
> > between two values (difference of 324 docs) depending on which nodes
> I
> > hit (distrib=true)
> >
> > 589,674,416
> > 589,674,092
> >
> > Using distrib=false, I found 1 shard with a mis-match:
> >
> > shard15: {
> > leader = 32,765,254
> > replica = 32,764,930 diff:324
> > }
> >
> > Interesting that the replica has more docs than the leader.
> >
> > Unfortunately, due to some bad log management scripting on my part,
> > the logs were lost when these instances got re-started, which really
> > bums me out :-(
> >
> > For now, I'm going to assume the replica with more docs is the one I
> > want to keep and will replicate the full index over to the other one.
> > Sorry about losing the logs :-(
> >
> > Tim
> >
> >
> >
> >
> > On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter <
> thelabd...@gmail.com> wrote:
> >> Thanks for responding Mark. I'll collect the information you asked
> >> about and open a JIRA once I have a little more understanding of
> what
> >> happened. Hopefully I can piece together some story after going over
> >> the logs.
> >>
> >> As for replica / leader, I suspect some leaders went down but
> >> fail-over to new leaders seemed to work fine. We lost about 9 nodes
> at
> >> once and continued to serve queries, which is awesome.
> >>
> >> On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller <
> markrmil...@gmail.com> wrote:
> >>> Yeah, thats no good.
> >>>
> >>> You might hit each node with distrib=false to get the doc counts.
> >>>
> >>> Which ones have what you think are the right counts and which the
> wrong - eg is it all replicas that are off, or leaders as well?
> >>>
> >>> You say several replicas - do you mean no leaders went down?
> >>>
> >>> You might look closer at the logs for a node that has it's count
> off.
> >>>
> >>> Finally, I guess I'd try and track it in a JIRA issue.
> >>>
> >>> - Mark
> >>>
> >>> On Apr 19, 2013, at 6:37 PM, Timothy Potter 
> wrote:
> >>>
>  We had a rogue query take out several replicas in a large 4.2.0
> cluster
> 

Re: Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-22 Thread Timothy Potter
Hi Sudhakar,

Unfortunately, we don't know the underlying cause and I lost the logs
that could have helped diagnose further. FWIW, I think this is an
extreme case as I've lost nodes before and haven't had any
discrepancies after recovering. In my case, it was a perfect storm of
high throughput indexing 8-10K docs/sec and an very nasty query that
OOM'd on about half the nodes in my cluster. Of course we want to get
to the bottom of this but it's going to be hard to reproduce. The good
news is I'm recovered and the cluster is consistent again.

Cheers,
Tim

On Mon, Apr 22, 2013 at 5:18 PM, Sudhakar Maddineni
 wrote:
> We had encountered a similar issue a few days back with the 4.0-Beta version.
> We have a 6-node, 3-shard cluster setup, and one of our replica
> servers [Tomcat] was not responding to any requests because it reached the
> max number of threads [200 default]. To temporarily fix the issue, we had
> to restart the server. After restarting, we realized that there were 2
> Tomcat processes running [old one + new one], so we manually killed the two
> Tomcat processes and had a clean start. And we observed that the numDocs of the
> replica server did not match the count on the leader.
> So, is this discrepancy because we manually killed the process, which
> interrupted the sync process?
>
> Thx, Sudhakar.
>
>
>
>
> On Mon, Apr 22, 2013 at 3:28 PM, Mark Miller  wrote:
>
>> No worries, thanks for the info. Let me know if you gain any more insight!
>> I'd love to figure out what happened here and address it. And I'm
>> especially interested in knowing if you lost any updates if you are able to
>> determine that.
>>
>> - Mark
>>
>> On Apr 22, 2013, at 5:02 PM, Timothy Potter  wrote:
>>
>> > I ended up just nuking the index on the replica with less docs and
>> > restarting it - which triggered the snap pull from the leader. So now
>> > I'm in sync and have better processes in place to capture the
>> > information if it happens again, which given some of the queries my UI
>> > team develops, is highly likely ;-)
>> >
>> > Also, all our input data to Solr lives in Hive so I'm doing some id
>> > -to- id comparisons of what is in Solr vs. what is in Hive to find any
>> > discrepancies.
>> >
>> > Again, sorry about the loss of the logs. This is a tough scenario to
>> > try to re-create as it was a perfect storm of high indexing throughput
>> > and a rogue query.
>> >
>> > Tim
>> >
>> > On Mon, Apr 22, 2013 at 2:41 PM, Mark Miller 
>> wrote:
>> >> What do you know about the # of docs you *should* have? Do you have that
>> count when taking the bad replica out of the equation?
>> >>
>> >> - Mark
>> >>
>> >> On Apr 22, 2013, at 4:33 PM, Mark Miller  wrote:
>> >>
>> >>> Bummer on the log loss :(
>> >>>
>> >>> Good info though. Somehow that replica became active without actually
>> syncing? This is heavily tested (though not with OOM's I suppose), so I'm a
>> little surprised, but it's hard to speculate how it happened without the
>> logs. Specifically, the logs from the node that is off would be great - we
>> would see what it did when it recovered and why it might think it was in
>> sync :(
>> >>>
>> >>> - Mark
>> >>>
>> >>> On Apr 22, 2013, at 2:19 PM, Timothy Potter 
>> wrote:
>> >>>
>>  nm - can't read my own output - the leader had more docs than the
>> replica ;-)
>> 
>>  On Mon, Apr 22, 2013 at 11:42 AM, Timothy Potter <
>> thelabd...@gmail.com> wrote:
>> > Have a little more info about this ... the numDocs for *:* fluctuates
>> > between two values (difference of 324 docs) depending on which nodes
>> I
>> > hit (distrib=true)
>> >
>> > 589,674,416
>> > 589,674,092
>> >
>> > Using distrib=false, I found 1 shard with a mis-match:
>> >
>> > shard15: {
>> > leader = 32,765,254
>> > replica = 32,764,930 diff:324
>> > }
>> >
>> > Interesting that the replica has more docs than the leader.
>> >
>> > Unfortunately, due to some bad log management scripting on my part,
>> > the logs were lost when these instances got re-started, which really
>> > bums me out :-(
>> >
>> > For now, I'm going to assume the replica with more docs is the one I
>> > want to keep and will replicate the full index over to the other one.
>> > Sorry about losing the logs :-(
>> >
>> > Tim
>> >
>> >
>> >
>> >
>> > On Sat, Apr 20, 2013 at 10:23 AM, Timothy Potter <
>> thelabd...@gmail.com> wrote:
>> >> Thanks for responding Mark. I'll collect the information you asked
>> >> about and open a JIRA once I have a little more understanding of
>> what
>> >> happened. Hopefully I can piece together some story after going over
>> >> the logs.
>> >>
>> >> As for replica / leader, I suspect some leaders went down but
>> >> fail-over to new leaders seemed to work fine. We lost about 9 nodes
>> at
>> >> once and continued to serve queries, which is awesome.
>> >>
>> >> On Sat, Apr 20, 2013 at 10:11 AM, Mark Miller <
>>

Re: Export Index and Re-Index XML

2013-04-22 Thread Shawn Heisey

On 4/22/2013 5:07 PM, Kalyan Kuram wrote:

Hi All, I am new to Solr and I wanted to know if I can export the index as XML 
and then re-index it back into Solr. The reason I need to do this is that I 
misconfigured a fieldtype, and to make it work I need to re-index the content.


The best option is to do the indexing again from whatever source you did 
the index from the first time.  Because your requirements may change at 
any time, this is something that you should be prepared to do quite often.


If you did not set all fields to stored="true" in your schema, then you 
will not be able to export all your documents from your current index to 
a new one.  There is no way around this, you will have to wipe your 
index, go back to your original data source, and do the indexing again.


If you DID store all your fields, then you have two choices.

1) Use the dataimport handler with SolrEntityProcessor.  You can use 
this to import from one core into another core on the same server with a 
different config/schema, or from one server to another.


http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor
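
A minimal data-config for that might look like this (the url and core name are 
placeholders):

<dataConfig>
  <document>
    <entity name="sep" processor="SolrEntityProcessor"
            url="http://oldhost:8983/solr/oldcore"
            query="*:*" rows="500" fl="*"/>
  </document>
</dataConfig>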

2) I don't recommend this option, but it might work.  You can query Solr 
for your docs, one page at a time (use the rows and start parameters), 
with wt=xml or wt=json, and save that output.  With a little bit of 
modification, you can then use what you save as input for indexing. 
Here's a website describing the process and PHP script to make it 
easier.  I have not checked to see whether the script actually works, 
and I won't be able to help you with it:


http://www.jason-palmer.com/2011/05/how-to-reindex-a-solr-database/
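
The paging part of option 2 would be something like this, incrementing start 
by rows on each request (host is a placeholder):

curl "http://localhost:8983/solr/select?q=*:*&wt=xml&fl=*&start=0&rows=500"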

Thanks,
Shawn



Re: Export Index and Re-Index XML

2013-04-22 Thread Jack Krupansky
Any fields which have stored values can be read and output, but 
indexed-only, non-stored fields cannot be read or exported. Even if they 
could be, their values are post-analysis, which means that there is a good 
chance that they cannot be run through term analysis again.


It is always best to keep a copy of your raw source data separate from the 
data you add to Solr. Or, at least make sure any important data is "stored".


In short, you need to model your data for "reindexing", which is a fact of 
life in Solr land.


-- Jack Krupansky

-Original Message- 
From: Kalyan Kuram

Sent: Monday, April 22, 2013 7:07 PM
To: solr-user@lucene.apache.org
Subject: Export Index and Re-Index XML

Hi All, I am new to Solr and I wanted to know if I can export the index as XML 
and then re-index it back into Solr. The reason I need to do this is that I 
misconfigured a fieldtype, and to make it work I need to re-index the content.
Kalyan 



Too many close, count -1

2013-04-22 Thread yriveiro
Hi,

Reviewing the Solr log I found these messages.

The Solr version is 4.2.1, running in Tomcat 7.

4973652:SEVERE: Too many close [count:-1] on
org.apache.solr.core.SolrCore@5795a627. Please report this exception to
solr-user@lucene.apache.org
5003386:SEVERE: REFCOUNT ERROR: unreferenced
org.apache.solr.core.SolrCore@5795a627 () has a reference count of -1

2965529:SEVERE: Too many close [count:-1] on
org.apache.solr.core.SolrCore@7722b49b. Please report this exception to
solr-user@lucene.apache.org
52965531:SEVERE: Too many close [count:-1] on
org.apache.solr.core.SolrCore@32530662. Please report this exception to
solr-user@lucene.apache.org
52965533:SEVERE: Too many close [count:-1] on
org.apache.solr.core.SolrCore@144e2972. Please report this exception to
solr-user@lucene.apache.org
52971283:SEVERE: Too many close [count:-1] on
org.apache.solr.core.SolrCore@1705c88e. Please report this exception to
solr-user@lucene.apache.org
52978567:SEVERE: Too many close [count:-1] on
org.apache.solr.core.SolrCore@c200c62. Please report this exception to
solr-user@lucene.apache.org



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-close-count-1-tp4058129.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Export Index and Re-Index XML

2013-04-22 Thread Kalyan Kuram
Thank you all very much for your help. I do have fields configured as stored and 
indexed, and I did read the FAQ on the wiki. I think SolrEntityProcessor is what 
is needed. I am trying to index the data from Adobe CQ; it is push-based 
indexing, and it is a pain to index data from a very large repository. I think I 
can manage with SolrEntityProcessor for now and will think about modelling data 
for re-indexing purposes.
Kalyan

> From: j...@basetechnology.com
> To: solr-user@lucene.apache.org
> Subject: Re: Export Index and Re-Index XML
> Date: Mon, 22 Apr 2013 19:54:26 -0400
> 
> Any fields which have stored values can be read and output, but 
> indexed-only, non-stored fields cannot be read or exported. Even if they 
> could be, their values are post-analysis, which means that there is a good 
> chance that they cannot be run through term analysis again.
> 
> It is always best to keep a copy of your raw source data separate from the 
> data you add to Solr. Or, at least make sure any important data is "stored".
> 
> In short, you need to model your data for "reindexing", which is a fact of 
> life in Solr land.
> 
> -- Jack Krupansky
> 
> -Original Message- 
> From: Kalyan Kuram
> Sent: Monday, April 22, 2013 7:07 PM
> To: solr-user@lucene.apache.org
> Subject: Export Index and Re-Index XML
> 
> Hi All,
> I am new to Solr and I wanted to know if I can export the index as XML and 
> then re-index it back into Solr. The reason I need to do this is that I 
> misconfigured a field type, and to make it work I need to re-index the content.
> Kalyan 
> 
  

Re: Support of field variants in solr

2013-04-22 Thread Alexandre Rafalovitch
To route different languages, you could use different request handlers
and do different alias mapping. There are two alias mapping:
On the way in for eDisMax:
https://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2BAC8_renaming
On the way out: https://wiki.apache.org/solr/CommonQueryParameters#Field_alias

Between the two, you can make sure that all searches to /searchES map
'content' field to 'content_es' and for /searchDE map 'content' to
'content_de'.
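
For example, a sketch of what the /searchES handler could look like in 
solrconfig.xml (the handler definition itself is an assumption; f.content.qf 
is the eDisMax alias parameter, and fl=content:content_es renames the field 
on the way out):

  <requestHandler name="/searchES" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="qf">content</str>
      <str name="f.content.qf">content_es</str>
      <str name="fl">id,score,content:content_es</str>
    </lst>
  </requestHandler>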

Hope this helps,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Mon, Apr 22, 2013 at 2:31 PM, Timo Schmidt  wrote:
> Hi together,
>
> I am Timo and I work for a Solr implementation company. During our last 
> projects we realized that we need to be able to generate different 
> variants of a document.
>
> Example 1 (Language):
>
> To handle all documents in one solr core, we need a field variant for each 
> language.
>
>
> <field name="content" variant="es">content for spanish content</field>
>
> <field name="content" variant="de">content for german content</field>
>
>
> Each of these fields can be configured in the Solr schema to behave 
> optimally for the specific target language.
>
> Example 2 (Stores):
>
> We have customers who want to sell the same product in different stores for 
> different prices.
>
>
> <field name="price" variant="frankfurt">price in frankfurt</field>
>
> <field name="price" variant="paris">price in paris</field>
>
> To solve this in an optimal way, it would be nice if this worked completely 
> transparently inside Solr by defining a „variantQuery“.
>
> A select query could look like this:
>
> select?variantQuery=fr&qf=price,content
>
> Additionally, the following is possible: when no variant is present, the 
> behaviour should be as before, so the field is relevant for all queries.
>
> The setting variant=“*“ would mean that several wildcard variants can be 
> defined in a committed document. This makes sense when the data type is the 
> same for all variants and you have many variants (like in the price 
> example).
>
> The same should be possible at indexing time as at query time.
>
> I know that we can do something like this with dynamic fields, but then we 
> need to resolve the concrete fields at index and query time at the 
> application level (see the sketch after my questions below). That is 
> possible, but it would be nicer to have a concept like this in Solr. 
> Working with facets is also easier with this approach, since the concrete 
> field name does not need to be known by the application.
>
> So my questions are:
>
> What do you think about this approach?
> Is it better to work with dynamic fields? Is that reasonable when you have 
> 200 or more variants of a document?
> What needs to be done in solr to have something like this variant attribute 
> for fields?
> Do you have other approaches?
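>
> For illustration, the dynamic field alternative mentioned above would look 
> roughly like this in the schema (the names are examples):
>
>   <dynamicField name="content_*" type="text_general" indexed="true" stored="true" />
>   <dynamicField name="price_*" type="float" indexed="true" stored="true" />
>
> with the application resolving the concrete name, e.g. q=content_es:term.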


SSLInitializationException on startup

2013-04-22 Thread Van Tassell, Kristian
I'm configuring a number of servers to support Solr 4.2 and have come across 
one that will not start. This is a pre-existing application server (running 
Tomcat) and I'm not quite sure what to look for. Has anyone seen this before 
and solved it?

Thanks in advance!

INFO: Creating new http client, 
config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false
Apr 22, 2013 9:00:01 PM org.apache.solr.servlet.SolrDispatchFilter init
SEVERE: Could not start Solr. Check solr/home property and the logs
Apr 22, 2013 9:00:01 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.http.conn.ssl.SSLInitializationException: Failure 
initializing default system SSL context
   at 
org.apache.http.conn.ssl.SSLSocketFactory.createSystemSSLContext(SSLSocketFactory.java:368)
   at 
org.apache.http.conn.ssl.SSLSocketFactory.getSystemSocketFactory(SSLSocketFactory.java:204)
   at 
org.apache.http.impl.conn.SchemeRegistryFactory.createSystemDefault(SchemeRegistryFactory.java:82)
   at 
org.apache.http.impl.client.SystemDefaultHttpClient.createClientConnectionManager(SystemDefaultHttpClient.java:118)
   at 
org.apache.http.impl.client.AbstractHttpClient.getConnectionManager(AbstractHttpClient.java:466)
   at 
org.apache.solr.client.solrj.impl.HttpClientUtil.setMaxConnections(HttpClientUtil.java:179)
   at 
org.apache.solr.client.solrj.impl.HttpClientConfigurer.configure(HttpClientConfigurer.java:33)
   at 
org.apache.solr.client.solrj.impl.HttpClientUtil.configureClient(HttpClientUtil.java:115)
   at 
org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:105)
   at 
org.apache.solr.handler.component.HttpShardHandlerFactory.init(HttpShardHandlerFactory.java:134)
   at 
org.apache.solr.core.CoreContainer.initShardHandler(CoreContainer.java:709)
   at 
org.apache.solr.core.CoreContainer.load(CoreContainer.java:438)
   at 
org.apache.solr.core.CoreContainer.load(CoreContainer.java:405)
   at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:337)
   at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:110)
   at 
org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:277)
   at 
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:258)
   at 
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:382)
   at 
org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:103)
   at 
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4624)
   at 
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5281)
   at 
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
   at 
org.apache.catalina.core.StandardContext.reload(StandardContext.java:3894)
   at 
org.apache.catalina.manager.ManagerServlet.reload(ManagerServlet.java:949)
   at 
org.apache.catalina.manager.HTMLManagerServlet.reload(HTMLManagerServlet.java:688)
   at 
org.apache.catalina.manager.HTMLManagerServlet.doPost(HTMLManagerServlet.java:216)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:641)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
   at 
org.apache.catalina.filters.CsrfPreventionFilter.doFilter(CsrfPreventionFilter.java:187)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
   at 
org.apache.catalina.filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:108)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
   at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
   at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
   at 
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:581)
   at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
   at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReport

RE: Bug? JSON output changes when switching to solr cloud

2013-04-22 Thread David Parks
Thanks Yonik! That was fast!
We switched over to XML for the moment and will switch back to JSON when 4.3
comes out.
Dave


-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Monday, April 22, 2013 8:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Bug? JSON output changes when switching to solr cloud

Thanks David,

I've confirmed this is still a problem in trunk and opened
https://issues.apache.org/jira/browse/SOLR-4746

-Yonik
http://lucidworks.com


On Sun, Apr 21, 2013 at 11:16 PM, David Parks 
wrote:
> We just took an installation of 4.1 which was working fine and changed 
> it to run as solr cloud. We encountered the most incredibly bizarre
apparent bug:
>
> In the JSON output, a colon ':' changed to a comma ',', which of 
> course broke the JSON parser.  I'm guessing I should file this as a 
> bug, but it was so odd I thought I'd post here before doing so. Demo
below:
>
> Here is a query on our previous single-server instance:
>
> Query:
> --
> http://10.1.3.28:8081/solr/select?q=book&fl=score%2Cid%2Cunique_catalog_name&start=0&rows=50&wt=json&group=true&group.field=unique_catalog_name&group.limit=50
>
> Response:
> -
> {"responseHeader":{"status":0,"QTime":15714,"params":{"fl":"score,id,u
> nique_ 
> catalog_name","start":"0","q":"book","group.limit":"50","group.field":
> "uniqu 
> e_catalog_name","group":"true","wt":"json","rows":"50"}},"grouped":{"u
> nique_ 
> catalog_name":{"matches":106711214,"groups":[{"groupValue":"ls:2653","
> doclis
> t":{"numFound":103981882,"start":0,"maxScore":4.7039795,"docs":[{"id":
> "10055
>
02088784","score":4.7039795},{"id":"1005500291075","score":4.7039795},{"id":
> "1000810546074","score":4.7039795},{"id":"1000611003270","score":4.703
> 9795},
>
> Note this part:
> --
>   {"unique_catalog_name":{"matches":
>
>
>
> Now we run that same query on a server that was derived from the same 
> build, just configuration changes to run it in distributed "solr cloud"
mode.
>
> Query:
> -
> http://10.1.3.18:8081/solr/select?q=book&fl=score%2Cid%2Cunique_catalog_name&start=0&rows=50&wt=json&group=true&group.field=unique_catalog_name&group.limit=50
>
> Response:
> -{"responseHeader":{"status":0,"QTime":8855,"params":{"fl"
> :"scor 
> e,id,unique_catalog_name","start":"0","q":"book","group.limit":"50","g
> roup.f 
> ield":"unique_catalog_name","group":"true","wt":"json","rows":"50"}},"
> groupe
> d":["unique_catalog_name",{"matches":106711214,"groups":[{"groupValue"
> :"ls:2 
> 653","doclist":{"numFound":103981882,"start":0,"maxScore":4.7042913,"d
> ocs":[
> {"id":"1005502088784","score":4.7042913},{"id":"1000611003270","score"
> :4.704
>
2913},{"id":"1005500291075","score":4.703668},{"id":"1000810546074","score":
> 4.703668},
>
> Note how it's changed:
> 
>   "unique_catalog_name",{"matches":
>
>
>
>



Re: Too many close, count -1

2013-04-22 Thread Yonik Seeley
Can you tell what operations cause this to happen?

I've added a comment to https://issues.apache.org/jira/browse/SOLR-4749
where we're looking at some related issues around CoreContainer, but
perhaps it should get its own issue.

-Yonik
http://lucidworks.com


On Mon, Apr 22, 2013 at 7:57 PM, yriveiro  wrote:
> Hi,
>
> Reviewing the solr's log I found this message.
>
> The solr version is 4.2.1, running in a tomcat 7
>
> 4973652:SEVERE: Too many close [count:-1] on
> org.apache.solr.core.SolrCore@5795a627. Please report this exception to
> solr-user@lucene.apache.org
> 5003386:SEVERE: REFCOUNT ERROR: unreferenced
> org.apache.solr.core.SolrCore@5795a627 () has a reference count of -1
>
> 2965529:SEVERE: Too many close [count:-1] on
> org.apache.solr.core.SolrCore@7722b49b. Please report this exception to
> solr-user@lucene.apache.org
> 52965531:SEVERE: Too many close [count:-1] on
> org.apache.solr.core.SolrCore@32530662. Please report this exception to
> solr-user@lucene.apache.org
> 52965533:SEVERE: Too many close [count:-1] on
> org.apache.solr.core.SolrCore@144e2972. Please report this exception to
> solr-user@lucene.apache.org
> 52971283:SEVERE: Too many close [count:-1] on
> org.apache.solr.core.SolrCore@1705c88e. Please report this exception to
> solr-user@lucene.apache.org
> 52978567:SEVERE: Too many close [count:-1] on
> org.apache.solr.core.SolrCore@c200c62. Please report this exception to
> solr-user@lucene.apache.org
>
>
>
> -
> Best regards
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Too-many-close-count-1-tp4058129.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Too many close, count -1

2013-04-22 Thread Chris Hostetter

: Can you tell what operations cause this to happen?

ie: what does your configuration look like? are you using any custom 
plugins? what types of features of solr do you use (faceting, grouping, 
highlighting, clustering, dih, etc...) ?  


-Hoss


Re: Solr 4.2 Startup Detects Corrupt Log And is Really Slow to Start

2013-04-22 Thread Umesh Prasad
Sorry for the late reply. I was trying to change our indexing pipeline and do
explicit intermediate commits for each core. That turned out to be a bit
more work than I have time for.

So I do want to explore hard commits. I tried
http://<host>:<port>/solr/<core>/update?commit=true, but there is no
impact on the transaction log size, so I feel it must be getting ignored.

So can someone tell me how to do hard commits?

@Shawn: openSearcher=false is not an option. On each commit, the index will
be replicated to the slaves, which will open a searcher on it immediately and
can expose an intermediate state. The longer-term and better solution is
changing the indexing pipeline and doing explicit commits, but I can't
implement that right now.
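
For context, these are the two ways I know to request a hard commit (host,
core name, and the autoCommit values are placeholders):

  curl "http://localhost:8983/solr/core0/update?commit=true"

or in solrconfig.xml:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxTime>60000</maxTime>   <!-- hard commit at most every 60 seconds -->
      <maxDocs>10000</maxDocs>   <!-- or after 10000 added documents -->
    </autoCommit>
  </updateHandler>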




On 18 Apr 2013 00:35, "Shawn Heisey"  wrote:

> On 4/17/2013 11:56 AM, Mark Miller wrote:
>
>> There is one additional caveat - when you disable the updateLog, you have
>>> to switch to MMapDirectoryFactory instead of NRTCachingDirectoryFactory.
>>>  The NRT directory implementation will cache a portion of a commit
>>> (including hard commits) into RAM instead of onto disk.  On the next
>>> commit, the previous one is persisted completely to disk.  Without a
>>> transaction log, you can lose data.
>>>
>>
>> I don't think this is true? NRTCachingDirectoryFactory should not cache
>> hard commits and should be as safe as MMapDirectoryFactory is - neither of
>> which is as safe as using a tran log.
>>
>
> This is based on observations of what happens with my segment files when I
> do a full-import, using autoCommit with openSearcher disabled.  I see that
> each autoCommit results in a full segment being written, plus part of
> another segment.  On the next autoCommit, the rest of the files for the
> last segment are written, another full segment is written, and I get another
> partial segment.  I asked about this on the list some time ago, and what I
> just told Umesh is a rehash of what I understood from Yonik's response.
>
> If I'm wrong, I hope someone who knows for sure can correct me.
>
> Thanks,
> Shawn
>
>


Re: Dynamically loading Elevation Info

2013-04-22 Thread Saroj C
Thanks Ravi and Eric. Will try these options.


Thanks and Regards,
Saroj Kumar Choudhury






From:
Ravi Solr 
To:
"solr-user@lucene.apache.org" 
Date:
22-04-2013 23:27
Subject:
Re: Dynamically loading Elevation Info



If you place the elevate.xml in the data directory of your index it will 
be
loaded every time a commit happens.
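
For reference, a minimal elevate.xml entry looks roughly like this (the query 
text and document ID are placeholders):

  <elevate>
    <query text="ipod">
      <doc id="MA147LL/A" />
    </query>
  </elevate>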

Thanks

Ravi Kiran Bhaskar


On Mon, Apr 22, 2013 at 7:38 AM, Erick Erickson 
wrote:

> I believe (but don't know for sure) that the QEV file is re-read on
> core reload, which the same app that modifies the elevator.xml file
> could trigger with an http request, see:
>
> http://wiki.apache.org/solr/CoreAdmin#RELOAD
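>
> For reference, the reload itself is just an HTTP request like this (host
> and core name are placeholders):
>
>   http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1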
>
> At least that's what I would try first.
>
> Best
> Erick
>
> On Mon, Apr 22, 2013 at 2:48 AM, Saroj C  wrote:
> > Hi,
> >  Business User wants to configure the elevation text and the IDs and 
they
> > want to have an UI to do the same. As soon as they configure, it 
should
> be
> > reflected  in SOLR,(without restarting).
> >
> > My understanding is, Now, the QueryElevationComponent reads the
> > Elevator.xml(Configurable) and loads the information into 
ElevationCache
> > during startup and uses the information while responding to queries. 
Is
> > there any way, the content in the ElevationCache can be modifiable  by
> > some other external process / is there any easy way of achieving this
> > requirement ?
> >
> > Thanks and Regards,
> > Saroj Kumar Choudhury
> >
> >
>




Re: Test harness can not load existing index data in Solr 4.2

2013-04-22 Thread zhu kane
I think the problem is that EmbeddedSolrServer can't load existing index
data.
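
For reference, my test setup is roughly the following sketch (the solr home
path is a placeholder):

  import org.apache.solr.util.AbstractSolrTestCase;
  import org.junit.BeforeClass;

  public class ExistingIndexTest extends AbstractSolrTestCase {
    @BeforeClass
    public static void beforeClass() throws Exception {
      // the third argument points at a solr home containing conf/ and data/
      initCore("solrconfig.xml", "schema.xml", "/path/to/solrhome");
    }
  }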

Can any committer help confirm whether this is a bug?

Thank you.


Kane


On Mon, Apr 15, 2013 at 7:28 PM, zhu kane  wrote:

>  I'm extending Solr's *AbstractSolrTestCase* for unit testing.
>
> I have existing 'schema.xml', 'solrconfig.xml' and index data. I want to
> start an embedded Solr server to load the existing collection and its data,
> then test searching documents in Solr.
>
> This worked well in Solr 3.6. However, it does not work any more after
> upgrading to Solr 4.2.1. After some investigation, it looks like the index
> data is not loaded by the SolrCore created by the test harness.
>
> This can also be reproduced using the index of Solr's example docs; I
> posted the detailed test class in my Stack Overflow question [1].
>
> Is it a bug of test harness? Or is there better way to load existing index
> data in unit test?
>
> Thanks.
> [1]
> http://stackoverflow.com/questions/15947116/solr-4-2-test-harness-can-not-load-existing-index-data
>
> Mengxin Zhu
>


Re: Solr metrics in Codahale metrics and Graphite?

2013-04-22 Thread Dmitry Kan
Hello Walter,

Have you had a chance to get something working with graphite, codahale and
solr?

Has anyone else tried these tools with Solr 3.x family? How much work is it
to set things up?

We have tried Zabbix in the past. Even though it required a lot of up-front
investment in configuration, it looks like a compelling option.
In the meantime, we are looking for something more Solr-tailored yet simple,
even without metrics persistence. We have tried jconsole and viewing stats
via JMX. The main point for us now is to gather RAM usage.
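
For the JMX route, we simply enable the <jmx/> element in solrconfig.xml and
start Tomcat with the usual remote-JMX flags (the port and auth settings
below are placeholders):

  <!-- solrconfig.xml -->
  <jmx />

  # JVM flags for remote jconsole access
  -Dcom.sun.management.jmxremote
  -Dcom.sun.management.jmxremote.port=18983
  -Dcom.sun.management.jmxremote.authenticate=false
  -Dcom.sun.management.jmxremote.ssl=false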

Dmitry


On Tue, Apr 9, 2013 at 9:43 PM, Walter Underwood wrote:

> If it isn't obvious, I'm glad to help test a patch for this. We can run a
> simulated production load in dev and report to our metrics server.
>
> wunder
>
> On Apr 8, 2013, at 1:07 PM, Walter Underwood wrote:
>
> > That approach sounds great. --wunder
> >
> > On Apr 7, 2013, at 9:40 AM, Alan Woodward wrote:
> >
> >> I've been thinking about how to improve this reporting, especially now
> that metrics-3 (which removes all of the funky thread issues we ran into
> last time I tried to add it to Solr) is close to release.  I think we could
> go about it as follows:
> >>
> >> * refactor the existing JMX reporting to use metrics-3.  This would
> mean replacing the SolrCore.infoRegistry map with a MetricsRegistry, and
> adding a JmxReporter, keeping the existing config logic to determine which
> JMX server to use.  PluginInfoHandler and SolrMBeanInfoHandler translate
> the metrics-3 data back into SolrMBean format to keep the reporting
> backwards-compatible.  This seems like a lot of work for no visible
> benefit, but…
> >> * we can then add the ability to define other metrics reporters in
> solrconfig.xml.  There are already reporters for Ganglia and Graphite - you
> just add them to the Solr lib/ directory, configure them in solrconfig, and
> voila - Solr can be monitored using the same devops tools you use to
> monitor everything else.
> >>
> >> Does this sound sane?
> >>
> >> Alan Woodward
> >> www.flax.co.uk
> >>
> >>
> >> On 6 Apr 2013, at 20:49, Walter Underwood wrote:
> >>
> >>> Wow, that really doesn't help at all, since these seem to only be
> reported in the stats page.
> >>>
> >>> I don't need another non-standard app-specific set of metrics,
> especially one that needs polling. I need metrics delivered to the common
> system that we use for all our servers.
> >>>
> >>> This is also why SPM is not useful for us, sorry Otis.
> >>>
> >>> Also, there is no time period on these stats. How do you graph the
> 95th percentile? I know there was a lot of work on these, but they seem
> really useless to me. I'm picky about metrics, working at Netflix does that
> to you.
> >>>
> >>> wunder
> >>>
> >>> On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote:
> >>>
>  In the Jira, but not in the docs.
> 
>  It would be nice to have VM stats like GC, too, so we can have common
> monitoring and alerting on all our services.
> 
>  wunder
> 
>  On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote:
> 
> > It's there! :)
> > http://search-lucene.com/?q=percentile&fc_project=Solr&fc_type=issue
> >
> > Otis
> > --
> > Solr & ElasticSearch Support
> > http://sematext.com/
> >
> > On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood <
> wun...@wunderwood.org> wrote:
> >> That sounds great. I'll check out the bug, I didn't see anything in
> the docs about this. And if I can't find it with a search engine, it
> probably isn't there.  --wunder
> >>
> >> On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote:
> >>
> >>> On 3/29/2013 12:07 PM, Walter Underwood wrote:
>  What are folks using for this?
> >>>
> >>> I don't know that this really answers your question, but Solr 4.1
> and
> >>> later includes a big chunk of codahale metrics internally for
> request
> >>> handler statistics - see SOLR-1972.  First we tried including the
> jar
> >>> and using the API, but that created thread leak problems, so the
> source
> >>> code was added.
> >>>
> >>> Thanks,
> >>> Shawn
> >>>
> >>>
> >>>
> >>>
> >>
> >
> > --
> > Walter Underwood
> > wun...@wunderwood.org
> >
> >
> >
>
> --
> Walter Underwood
> wun...@wunderwood.org
>
>
>
>