import solr source to eclipse

2014-10-12 Thread Ali Nazemian
Hi,
I am going to import the Solr source code into Eclipse for development
purposes. Unfortunately, every tutorial I have found for this is
outdated and did not work. Could you please give me a hint about how
to import the Solr source code into Eclipse?
Thank you very much.

-- 
A.Nazemian


Re: import solr source to eclipse

2014-10-12 Thread Anurag Sharma
I recently tried to run Eclipse in debug mode by following
http://wiki.apache.org/solr/HowToConfigureEclipse, and it worked for me.




Mismatch in numFound in q=*:* query

2014-10-12 Thread vidit.asthana
Dear Experts, 

I have a strange problem where select q=*:* returns a different number of
documents from one request to the next. Sometimes it returns numFound = 5866712
and sometimes numFound = 5852274. *numFound is always one of these 2 values.*

Here is the query:

http://localhost:5011/solr/mycollection/select?q=*:*&rows=0


I am running Solr in cloud mode, and this problem occurs with both
solr-4.5.1 and solr-4.10.0. I have exactly the same data indexed in both
versions. 4.5.1 is running on an 8-node cluster (4x2 shards) and solr-4.10.0
is running on a 4-node cluster (2x2 shards).

The whole collection is ~4GB. No indexing is going on at the same time; the
last indexing happened 20 days ago. *The collection is optimized.*

What might be the problem here?

I tried reloading the collection, restarting the whole cluster, and changing
the *rows* parameter in the query, but I still have this problem.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Mismatch-in-numFound-in-q-query-tp4163911.html
Sent from the Solr - User mailing list archive at Nabble.com.


Shard not accessible after restarting

2014-10-12 Thread nabil Kouici
Hi All,
I'm evaluating Solr performance. I've created a collection with implicit routing
and 2 shards on different servers. The first shard contains 100 million documents
(30GB); the second contains one million documents. When I restart the second Solr
instance, its shard becomes available immediately. However, when I restart the
first Solr instance, the shard with 100 million docs takes a very long time to
become available for search. Is this normal?
In the Cloud interface, the shard is green (Active). My servers have 28GB RAM.
I'm using the default solrconfig.xml.
Any help?
Regards,
Nabil.

Re: Mismatch in numFound in q=*:* query

2014-10-12 Thread Shawn Heisey
On 10/12/2014 12:26 PM, vidit.asthana wrote:
> I have a strange problem where select q=*:* is returning different number of
> documents. Sometime its returning numFound = 5866712 and sometimes it
> returns numFound = 5852274.  *numFound is always one of these 2 values.*
> 
> Here is the query:
> 
> *http://localhost:5011/solr/mycollection/select?q=*:*&rows=0*
> 
> 
> I am running Solr in cloud mode and this problem is occurring with both
> solr-4.5.1 and solr-4.10.0. I have exactly same data indexed in both
> versions. 4.5.1 is running on a 8 nodes cluster (4x2 shards) and solr-4.10.0
> is running on a 4 node (2x2 shards)cluster.

I really need to make a wiki page for this.  It would save so much
typing!  I also need to boil it down to a small-scale real-world example
and show how the numbers get calculated and what goes wrong, which means
I need to have a complete understanding of the problem, and at this
moment, I don't have that.

This is a problem that's unique to distributed indexes.  What causes it
is having documents with the same value in the uniqueKey field indexed
in more than one shard.

It is not a bug; it's a result of the way results from multiple
shards are combined into one result. The only way to "fix" this problem
would involve so much additional processing that it would make all
queries extremely slow.

If you're using automatic document routing, then your routing algorithm
may have changed at some point, and you didn't re-index.  If you're
using manual document routing, then some documents were indexed on the
wrong shard, and later indexed on another shard as well.

Preventing the problem is easy -- always index documents onto the
correct shard.  Fixing the problem at this point might involve clearing
your index and re-indexing from scratch, unless you can figure out which
documents have been indexed on more than one shard and you can delete
them from the incorrect shard(s).

Thanks,
Shawn
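For readers who want to try the "figure out which documents have been indexed on more than one shard" route, here is a minimal Python sketch. It assumes the uniqueKey field is named id, that you fetch ids from exactly one replica of each shard, and that the core URL in the example is hypothetical. It relies on Solr's distrib=false parameter, which keeps a request on the core it was sent to. For millions of documents you would page through the ids (for example with cursorMark, available from Solr 4.7) rather than request them all at once; that paging loop is left out.

```python
from urllib.parse import urlencode


def per_shard_id_query_url(core_url, unique_key="id", rows=1000):
    """Build a select URL that asks ONE core for its own documents only.

    distrib=false stops the request from fanning out to the rest of the
    collection, so numFound and the returned ids come from this core alone.
    """
    params = {"q": "*:*", "fl": unique_key, "rows": rows,
              "distrib": "false", "wt": "json"}
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def find_cross_shard_duplicates(ids_by_shard):
    """Return {doc_id: [shards]} for every id found on more than one shard.

    ids_by_shard maps a shard name to the ids fetched from ONE replica of
    that shard; replicas of the same shard hold the same documents, so
    listing two replicas of one shard would flag every document.
    """
    shards_for_id = {}
    for shard, ids in ids_by_shard.items():
        for doc_id in set(ids):  # ignore repeats within a single shard
            shards_for_id.setdefault(doc_id, []).append(shard)
    return {doc_id: sorted(shards)
            for doc_id, shards in shards_for_id.items()
            if len(shards) > 1}
```

The result only tells you which shards hold each duplicated id; deciding which copy sits on the wrong shard still depends on how your documents are routed.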



Re: Shard not accessible after restarting

2014-10-12 Thread Shawn Heisey
On 10/12/2014 12:46 PM, nabil Kouici wrote:
> I'm evaluating solr performance. I've created implicit collection with 2 
> shards in different server. first shard contains 100 million documents 
> (30GB), second contain one million document.When I restart the second solr 
> instance, shard become immediately available. However, when I restart the 
> first solr, shard with 100 million doc take a huge time to be available for 
> search. Is it normal? In Cloud interface, shard is green (Active). My servers 
> have 28GB RAM.
> I'm using default solrconfig.xml. 

Does the following URL describe the problem you're running into?

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup

Thanks,
Shawn



Re: Mismatch in numFound in q=*:* query

2014-10-12 Thread Jan Høydahl
If you don't want "downtime", you could add a field that records when each
document was indexed (indextime in the query below) to your schema, reload,
do a full re-index on top of your existing index, and then delete all
documents that were not updated via a deleteByQuery, e.g.: indextime:[* TO NOW-1DAY]

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
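The final delete step can be expressed as a Solr JSON update request. This is a minimal Python sketch that only builds the request: the collection URL is hypothetical, indextime is the field name from the example above, and commit=true is added so the deletes become visible. Two caveats are worth checking before sending it. The cutoff has to be earlier than the moment the re-index started, otherwise freshly re-indexed documents are deleted too, so an absolute timestamp can be safer than NOW-1DAY. And documents indexed before the field was added to the schema have no value in it at all, so a range query does not match them, whereas a query such as `*:* -indextime:[* TO *]` would.

```python
import json
from urllib.request import Request


def delete_by_query_request(collection_url, query):
    """Build (but do not send) a JSON deleteByQuery request with a commit."""
    body = {"delete": {"query": query}}
    return Request(
        collection_url.rstrip("/") + "/update?commit=true",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical collection URL; send with urllib.request.urlopen(req) only after
# the re-index has finished and the query has been checked with a plain select.
req = delete_by_query_request("http://localhost:8983/solr/mycollection",
                              "indextime:[* TO NOW-1DAY]")
```

Running the same query through /select with rows=0 first shows how many documents the delete would remove.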




What happens if you don't set positionIncrementGap

2014-10-12 Thread Alexandre Rafalovitch
Hello,

I am working on yet another minimal schema, which involves removing
settings that match the defaults (or are harmless if the defaults are
used). The one I am trying to figure out now is positionIncrementGap.

We set it to 100 in all text field definitions. Does that mean the
default is NOT a reasonable number?

I tried to trace it, and all I can find is a default value in
SolrAnalyzer, which is 0.

But if it is 0 (zero), then why do we explicitly define it as 0 in all
non-text fields? That seems redundant and, frankly, confusing.

Regards,
Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


Re: import solr source to eclipse

2014-10-12 Thread Tomás Fernández Löbbe
The way I do this, from a terminal:

svn checkout https://svn.apache.org/repos/asf/lucene/dev/trunk/ lucene-solr-trunk
cd lucene-solr-trunk
ant eclipse

Then, in Eclipse, use "import existing java project" and select
the directory where you placed lucene-solr-trunk.



Re: What happens if you don't set positionIncrementGap

2014-10-12 Thread Jack Krupansky
Read the Lucene analysis package summary section entitled "Field Section 
Boundaries":

http://lucene.apache.org/core/4_10_0/core/org/apache/lucene/analysis/package-summary.html

TL;DR: if you leave it at the default, then a word at the end of one
section (i.e., one value of a multivalued field) and a word at the start of
the next section would match as an exact phrase. You might ask why Lucene
chose that default; I don't know, but Solr "best practice" is the opposite.
I suspect that Solr chose a large number like 100 so that a phrase query
could use a significant slop like 10 and still not match across sections.


In my e-book I have a section entitled "Position Increment Gap" in Chapter 2 
"Analyzers Overview" that details the reasoning as well. There is also 
another section with the same title in the Term Vector Component chapter 
that runs through an example in more detail.


See:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky
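To make the TL;DR concrete, here is a toy Python model of the position arithmetic; it is not Lucene's code. It assumes whitespace tokenization, every token advancing the position by 1, and the gap added between consecutive values of a multivalued field. The phrase check only handles two terms in their original order (Lucene's sloppy phrase matching also tolerates reordering at extra cost), so treat it as an illustration only.

```python
def index_positions(values, position_increment_gap):
    """Assign token positions to the values of one multivalued field.

    Simplified model: whitespace tokenization, each token advances the
    position by 1, and the gap is added between consecutive values.
    """
    positions = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += position_increment_gap  # jump between sections
        for token in value.split():
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions


def two_term_phrase_matches(positions, first, second, slop=0):
    """In-order check: `second` must follow `first` with at most `slop` extra positions."""
    return any(
        0 <= p2 - p1 - 1 <= slop
        for p1 in positions.get(first, [])
        for p2 in positions.get(second, [])
    )


values = ["big red", "dog runs"]
with_gap_0 = index_positions(values, 0)      # red=1, dog=2
with_gap_100 = index_positions(values, 100)  # red=1, dog=102
```

With a gap of 0, "red dog" matches as an exact phrase even though the words come from different values; with a gap of 100, a slop of 10 is nowhere near enough to bridge the 100 extra positions between them.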




Re: What happens if you don't set positionIncrementGap

2014-10-12 Thread Alexandre Rafalovitch
Thanks Jack, this makes sense for text.

But what about ints, dates, and floats? The package web page does not
seem to say anything about them, and your book (at least in my release
7) also only talks about text fields. Yet we seem to need to define the
value (still as zero) in schema.xml!

Regards,
   Alex.


Get cache statistics via rest

2014-10-12 Thread SolrUser1543
I want to monitor my Solr cache efficiency:
filterCache, queryResultCache, and fieldValueCache.

This information is available on the Plugins / Stats page for a specific core.

How can I get this information via REST?
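One way to get these numbers over HTTP is the per-core /admin/mbeans handler with stats=true and cat=CACHE. The Python sketch below assumes that handler is registered in your solrconfig.xml, that the core URL is hypothetical, and that the JSON response uses the default flat encoding, where "solr-mbeans" is a list alternating category names and objects keyed by cache name. It computes hits/lookups itself from the lookups and hits statistics instead of relying on a preformatted hit-ratio value.

```python
from urllib.parse import urlencode


def cache_stats_url(core_url):
    """URL for the cache section of the per-core mbeans statistics, as JSON."""
    params = {"stats": "true", "cat": "CACHE", "wt": "json"}
    return core_url.rstrip("/") + "/admin/mbeans?" + urlencode(params)


def hit_ratios(response):
    """Return {cache_name: hits / lookups} from a parsed mbeans JSON response.

    Assumes "solr-mbeans" is a flat list like ["CACHE", {name: {"stats": {...}}}].
    Values are passed through float() in case they arrive as strings.
    """
    flat = response.get("solr-mbeans", [])
    categories = dict(zip(flat[0::2], flat[1::2]))
    ratios = {}
    for name, bean in categories.get("CACHE", {}).items():
        stats = bean.get("stats", {})
        lookups = float(stats.get("lookups", 0))
        hits = float(stats.get("hits", 0))
        ratios[name] = hits / lookups if lookups else 0.0
    return ratios
```

Fetching cache_stats_url(...) with urllib.request.urlopen and passing json.load of the body to hit_ratios on a schedule would give a simple efficiency monitor; adding key=filterCache to the query string may narrow the output to a single cache.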



