Re: stopwords issue with edismax

2014-02-28 Thread Ahmet Arslan
Hi Suresh,

Can you give us the full set of parameters you use with edismax (qf, mm, etc.)?
And the content of your stopwords.txt. Is 'a' listed there too?

Ahmet



On Friday, February 28, 2014 8:54 AM, sureshrk19  wrote:
Hi All,

I'm having a problem while searching for some string with a word defined in
stopwords.txt.

e.g.: I have 'of' defined in stopwords.txt

My schema analyzer is defined as follows:

[fieldType definition with index and query analyzers -- the XML was stripped
by the list archive]

I have defined the field as 'all_text' in schema.xml.

When I try to search for 
Case 1: a of b --> I don't get any response
Case 2: "a of b" --> There is some response
Case 3: a \of\ b --> There is some response.

I'm using the stop filter in both the 'index' and 'query' analyzers, so this
should be working for case 1 too, right?

Did I miss anything?

Thanks,
Suresh






I still can create an index when the write.lock exists

2014-02-28 Thread Chen Lion
Dear all,
I have a problem I can't understand.

I use Solr 4.6.1 with 2 nodes, one leader and one follower; both have the
write.lock file.

I did not think I could create an index while the write.lock file exists,
right?

But I could. Why?

Jiahui Chen


Re: Filter query exclusion with SolrJ

2014-02-28 Thread idioma
Ahmet,
thank you for your reply, much appreciated. Let me answer your
question(s):

1) "Your example response () looks like customized."

It is not, but I have not included the code that generates it. In a
nutshell, I have two XSLT files, one that transforms the Solr query into
something that my Web application can understand (aka the query XSLT) and one
that does a similar thing with the Solr response. 

With regard to that element, the response XSLT includes the following (which
I had omitted before for brevity, but it turned out to be relevant):

[XSLT template mapping the Solr facet field to the web-application facet
code -- markup stripped by the list archive]

The above code allows mapping between the Solr facet field (in this case
author) and the facet code in the web application (creator). The Java class
also includes the following:

private QueryResponse execQuery(SolrQuery query) throws SolrServerException {
    QueryResponse rsp = solrServer.query(query);
    return rsp;
}

// ... and, in the method that builds the JDOM response document:
Element elfacets = new Element("facets");
List<FacetField> facets = rsp.getFacetFields();
if (facets != null) {
    for (FacetField facet : facets) {
        Element sfacet = new Element("facet");
        sfacet.setAttribute("name", facet.getName());

        List<FacetField.Count> facetEntries = facet.getValues();

        for (FacetField.Count fcount : facetEntries) {
            Element facetEntry = new Element("facetEntry");
            facetEntry.setText(fcount.getName());
            facetEntry.setAttribute("count", String.valueOf(fcount.getCount()));
            sfacet.addContent(facetEntry);
        }
        elfacets.addContent(sfacet);
    }
    root.addContent(elfacets);
}

doc.addContent(root);

return doc;
}

This actually relates to your second question:

"Your SolrJ program also has Element class that I have never seen"

Element is from org.jdom; it defines the behavior of an XML element. In my
case, it is used to build the Solr query response.

The web application has a Refine check box where each box holds a facet
value that can be excluded, depending on the user's request. It is still
unclear to me where and how I should inject the exclusion variant in the
Java code. At this stage, in fact, the Exclude option works exactly like the
Include option, whereas it should return the total hits minus the subset
that falls under a specific facet value. Can you please provide examples?

Thanks,

I.





Re: Facets, termvectors, relevancy and Multi word tokenizing

2014-02-28 Thread epnRui
Hi Ahmet!!

I went ahead and did something I thought was not a clean solution, and then
when I read your post I found we had thought of the same solution, including
European_Parliament with the _ :)

So I guess there is no cleaner way to do this, short of implementing my own
Tokenizer and Filters, but I honestly couldn't find a tutorial for
implementing a customized Solr Tokenizer. If I end up needing to do it, I
will write a tutorial.

So for now I'm using PatternReplaceCharFilterFactory to replace "European
Parliament" with European_Parliament (initially I didn't use the
md5hash European_Parliament).

Then I replace it back to "European Parliament" after
StandardTokenizerFactory has run. Well, I guess I just found a way to make a
two-word token :)

I had seen the ShingleFilterFactory, but the problem is I don't need the
whole phrase split into two-word tokens, and I understood that's what it
does. Of course, I would need some filter that reads a .txt with the tokens
to merge, like "European" and "Parliament".

I'm still having another problem, but maybe I'll find a solution after I
read the page you attached, which seems great. Solr is treating #European as
both #European and European, meaning it creates 2 facets for one token. I
want it to consider it only as #European. I ran the analysis debugger in my
Solr admin console and I don't see how it can be doing that.
Would you know of a reason for this?

Thanks for your reply; the page you attached seems excellent and I'll read
it through.





Re: Facets, termvectors, relevancy and Multi word tokenizing

2014-02-28 Thread David Santamauro


Have you tried just using a copyField? For example, I had a similar use
case where I needed to have a particular field (f1) tokenized but also
needed to facet on the complete contents.


For that, I created a copyField:

  <copyField source="f1" dest="f2"/>

f1 used tokenizers and filters, but f2 was just a plain string. You then
facet on f2.


... just an idea



On 02/28/2014 04:54 AM, epnRui wrote:

Hi Ahmet!!

I went ahead and did something I thought was not a clean solution, and then
when I read your post I found we had thought of the same solution, including
European_Parliament with the _ :)

So I guess there is no cleaner way to do this, short of implementing my own
Tokenizer and Filters, but I honestly couldn't find a tutorial for
implementing a customized Solr Tokenizer. If I end up needing to do it, I
will write a tutorial.

So for now I'm using PatternReplaceCharFilterFactory to replace "European
Parliament" with European_Parliament (initially I didn't use the
md5hash European_Parliament).

Then I replace it back to "European Parliament" after
StandardTokenizerFactory has run. Well, I guess I just found a way to make a
two-word token :)

I had seen the ShingleFilterFactory, but the problem is I don't need the
whole phrase split into two-word tokens, and I understood that's what it
does. Of course, I would need some filter that reads a .txt with the tokens
to merge, like "European" and "Parliament".

I'm still having another problem, but maybe I'll find a solution after I
read the page you attached, which seems great. Solr is treating #European as
both #European and European, meaning it creates 2 facets for one token. I
want it to consider it only as #European. I ran the analysis debugger in my
Solr admin console and I don't see how it can be doing that.
Would you know of a reason for this?

Thanks for your reply; the page you attached seems excellent and I'll read
it through.








Re: Filter query exclusion with SolrJ

2014-02-28 Thread Ahmet Arslan
Hi,

This should do the trick :  solrQuery.add(CommonParams.FQ, "fq=-{!term 
f=author}Dickens, Janet");

Ahmet


On Friday, February 28, 2014 11:21 AM, idioma  wrote:
Ahmet,
thank you for your reply, much appreciated. Let me answer your
question(s):

1) "Your example response () looks like customized."

It is not, but I have not included the code that generates it. In a
nutshell, I have two XSLT files, one that transforms the Solr query into
something that my Web application can understand (aka the query XSLT) and one
that does a similar thing with the Solr response. 

With regard to that element, the response XSLT includes the following (which
I had omitted before for brevity, but it turned out to be relevant):

[XSLT template mapping the Solr facet field to the web-application facet
code -- markup stripped by the list archive]

The above code allows mapping between the Solr facet field (in this case
author) and the facet code in the web application (creator). The Java class
also includes the following:

private QueryResponse execQuery(SolrQuery query) throws SolrServerException {
    QueryResponse rsp = solrServer.query(query);
    return rsp;
}

// ... and, in the method that builds the JDOM response document:
Element elfacets = new Element("facets");
List<FacetField> facets = rsp.getFacetFields();
if (facets != null) {
    for (FacetField facet : facets) {
        Element sfacet = new Element("facet");
        sfacet.setAttribute("name", facet.getName());

        List<FacetField.Count> facetEntries = facet.getValues();

        for (FacetField.Count fcount : facetEntries) {
            Element facetEntry = new Element("facetEntry");
            facetEntry.setText(fcount.getName());
            facetEntry.setAttribute("count", String.valueOf(fcount.getCount()));
            sfacet.addContent(facetEntry);
        }
        elfacets.addContent(sfacet);
    }
    root.addContent(elfacets);
}

doc.addContent(root);

return doc;
}

This actually relates to your second question:

"Your SolrJ program also has Element class that I have never seen"

Element is from org.jdom; it defines the behavior of an XML element. In my
case, it is used to build the Solr query response.

The web application has a Refine check box where each box holds a facet
value that can be excluded, depending on the user's request. It is still
unclear to me where and how I should inject the exclusion variant in the
Java code. At this stage, in fact, the Exclude option works exactly like the
Include option, whereas it should return the total hits minus the subset
that falls under a specific facet value. Can you please provide examples?

Thanks,

I.






Re: Filter query exclusion with SolrJ

2014-02-28 Thread Ahmet Arslan
Oops, I sent it prematurely. Here is the correct one:
solrQuery.add(CommonParams.FQ, "-{!term f=author}Dickens, Janet");
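
For reference, a minimal self-contained SolrJ sketch of the same idea. The
field and value come from the example above; in a real application they
would come from whatever the user ticked in the Refine box, and the class
and variable names here are made up for illustration:

import org.apache.solr.client.solrj.SolrQuery;

public class ExcludeFacetSketch {
    // The {!term} parser matches the raw indexed value, so values with
    // commas or spaces ("Dickens, Janet") need no query-syntax escaping.
    static String excludeFilter(String field, String value) {
        return "-{!term f=" + field + "}" + value;
    }

    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("Dickens");
        q.setFacet(true);
        q.addFacetField("author");
        // One negated filter query per facet value the user excluded.
        q.addFilterQuery(excludeFilter("author", "Dickens, Janet"));
        System.out.println(q);
    }
}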



On Friday, February 28, 2014 12:42 PM, Ahmet Arslan  wrote:
Hi,

This should do the trick :  solrQuery.add(CommonParams.FQ, "fq=-{!term 
f=author}Dickens, Janet");

Ahmet



On Friday, February 28, 2014 11:21 AM, idioma  wrote:
Ahmet,
thank you for your reply, much appreciated. Let me answer your
question(s):

1) "Your example response () looks like customized."

It is not, but I have not included the code that generates it. In a
nutshell, I have two XSLT files, one that transforms the Solr query into
something that my Web application can understand (aka the query XSLT) and one
that does a similar thing with the Solr response. 

With regard to that element, the response XSLT includes the following (which
I had omitted before for brevity, but it turned out to be relevant):

[XSLT template mapping the Solr facet field to the web-application facet
code -- markup stripped by the list archive]

The above code allows mapping between the Solr facet field (in this case
author) and the facet code in the web application (creator). The Java class
also includes the following:

private QueryResponse execQuery(SolrQuery query) throws SolrServerException {
    QueryResponse rsp = solrServer.query(query);
    return rsp;
}

// ... and, in the method that builds the JDOM response document:
Element elfacets = new Element("facets");
List<FacetField> facets = rsp.getFacetFields();
if (facets != null) {
    for (FacetField facet : facets) {
        Element sfacet = new Element("facet");
        sfacet.setAttribute("name", facet.getName());

        List<FacetField.Count> facetEntries = facet.getValues();

        for (FacetField.Count fcount : facetEntries) {
            Element facetEntry = new Element("facetEntry");
            facetEntry.setText(fcount.getName());
            facetEntry.setAttribute("count", String.valueOf(fcount.getCount()));
            sfacet.addContent(facetEntry);
        }
        elfacets.addContent(sfacet);
    }
    root.addContent(elfacets);
}

doc.addContent(root);

return doc;
}

This actually relates to your second question:

"Your SolrJ program also has Element class that I have never seen"

Element is from org.jdom; it defines the behavior of an XML element. In my
case, it is used to build the Solr query response.

The web application has a Refine check box where each box holds a facet
value that can be excluded, depending on the user's request. It is still
unclear to me where and how I should inject the exclusion variant in the
Java code. At this stage, in fact, the Exclude option works exactly like the
Include option, whereas it should return the total hits minus the subset
that falls under a specific facet value. Can you please provide examples?

Thanks,

I.






Re: Facets, termvectors, relevancy and Multi word tokenizing

2014-02-28 Thread Ahmet Arslan
Hi,

Let's say you have accomplished what you want. You have a .txt with the tokens
to merge, like "European" and "Parliament". What is your use case then? What is
your high-level goal?

The MappingCharFilter approach is closer (to your .txt approach) than the
PatternReplaceCharFilterFactory approach.

By the way, it could also be simulated with ShingleFilterFactory + 
KeepWordFilterFactory + TypeTokenFilterFactory

Maybe it can be done by firing phrase queries at query time (without
interfering with the index) on the client side? e.g. q="European Parliament"~0
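
A minimal SolrJ sketch of that client-side idea (the field list is just an
example):

import org.apache.solr.client.solrj.SolrQuery;

public class PhraseQuerySketch {
    public static void main(String[] args) {
        // Zero-slop phrase query: "European" and "Parliament" must be
        // adjacent, so nothing has to be merged at index time.
        SolrQuery q = new SolrQuery("\"European Parliament\"~0");
        q.setFields("id");
        System.out.println(q);
    }
}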




On Friday, February 28, 2014 11:55 AM, epnRui  wrote:
Hi Ahmet!!

I went ahead and did something I thought was not a clean solution, and then
when I read your post I found we had thought of the same solution, including
European_Parliament with the _ :)

So I guess there is no cleaner way to do this, short of implementing my own
Tokenizer and Filters, but I honestly couldn't find a tutorial for
implementing a customized Solr Tokenizer. If I end up needing to do it, I
will write a tutorial.

So for now I'm using PatternReplaceCharFilterFactory to replace "European
Parliament" with European_Parliament (initially I didn't use the
md5hash European_Parliament).

Then I replace it back to "European Parliament" after
StandardTokenizerFactory has run. Well, I guess I just found a way to make a
two-word token :)

I had seen the ShingleFilterFactory, but the problem is I don't need the
whole phrase split into two-word tokens, and I understood that's what it
does. Of course, I would need some filter that reads a .txt with the tokens
to merge, like "European" and "Parliament".

I'm still having another problem, but maybe I'll find a solution after I
read the page you attached, which seems great. Solr is treating #European as
both #European and European, meaning it creates 2 facets for one token. I
want it to consider it only as #European. I ran the analysis debugger in my
Solr admin console and I don't see how it can be doing that.
Would you know of a reason for this?

Thanks for your reply; the page you attached seems excellent and I'll read
it through.






Re: Solr cloud: Faceting issue on text field

2014-02-28 Thread David Miller
Hi Chris,

Thanks for the info. I had looked into the "docValues" option earlier. But
docValues doesn't support textField, and we require textField to enable
various tokenizers and analyzers (shingle, pattern filter, etc.). We
require the faceting to be on terms within the text field, not on the value
as a whole (which string would give us). A use case is to generate tag
clouds from social conversations.

The enum option is interesting. From its description it seemed unsuitable
for this purpose. I will try it out and see.

Regards,
Dave







On Thu, Feb 27, 2014 at 8:24 PM, Chris Hostetter wrote:

>
> : Yes, the memory and cpu spiked for that machine. Another issue I found in
> : the log was "SolrException: Too many values for UnInvertedField faceting
> on
> : field".
> : I was using the fc method. Will changing the method/params help?
>
> the fc/fcs faceting methods really aren't going to work well with
> something like an indexed full text field where it has to build an
> UnInvertedField with a huge volume of unique terms.
>
> : One thing I don't understand is that, the query was returning only a
> single
> : document, but the facet still seems to be having the issue.
>
> the data structures for faceting (which are the same for sorting in the
> single valued case) are optimized for re-use -- regardless of the number of
> documents that match, the FieldCache & UnInvertedField structures are
> built up for the entire index.  You pay up front with Heap space to get
> faster speed for your overall requests in return.
>
> For your situation, there are two possible solutions to try...
>
> 1) facet.method=enum
>
> this is the classic alternative for faceting; it's typically much slower
> than the fc & fcs methods, but that's because it lets you trade speed for
> RAM.  One specific thing you have to watch out for is that this will
> usually use the filterCache, and since you are almost certainly going to
> have more terms in this facet field than any workable size of your
> filterCache, there's going to be a lot of wasted time constantly evicting
> things from that cache -- playing with facet.enum.cache.minDf should help.
>
>
> https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.enum.cache.minDfParameter
>
> 2) use docValues="true" on your field (with facet.method=fc or fcs)
>
> I haven't done much experimenting with this, particularly in our "facet
> on full text" type situation, but when you use docValues, in theory,
> in-memory fieldCache and UnInvertedField structures aren't needed --
> instead much smaller structures are kept in the heap that refer down
> directly to the DocValue structures memory mapped from disk (which are
> created when you add/commit to your index -- they don't need "un-inverted"
> at query time)
>
> I, for one, would definitely be interested to know if reindexing your full
> text field with docValues makes the faceting feasible...
>
> https://cwiki.apache.org/confluence/display/solr/DocValues
>
> -Hoss
> http://www.lucidworks.com/
>


SOLR cloud disaster recovery

2014-02-28 Thread Jan Van Besien
Hi,

I am a bit confused about how SolrCloud disaster recovery is supposed
to work exactly in the case of losing a single node completely.

Say I have a SolrCloud cluster with 3 nodes. My collection is created
with numShards=3&replicationFactor=3&maxShardsPerNode=3, so there is
no data loss when I lose a node.

However, how do I configure a new node to take the place of the dead
node? I bring up a new node (same hostname and IP as the dead node)
which is completely empty (empty data dir, empty solr.xml), install
Solr, and connect it to zookeeper.

Is it supposed to work automatically from there? In my tests, the
server has no cores and the solr-cloud graph overview simply shows all
the shards/replicas on this node as down. Do I need to recreate the
cores first? Note that these cores were initially created indirectly
by creating the collection.

Thanks,
Jan


Re: stopwords issue with edismax

2014-02-28 Thread sureshrk19
Ahmet,

Thanks for the reply..

Here is the query:

http://localhost:8080/solr/collection1/select?q=a+of+b&fq=type%3AEntity&wt=json&indent=true

And here is my stopwords_en.txt content

a
an
and
are
as
at
be
but
by
for
if
in
into
is
it
no
not
of
on
or







RE: Solr4 performance

2014-02-28 Thread Joshi, Shital
Thanks. 

We find little evidence that page/disk cache is causing this issue. We use sar
to collect statistics. Here are the statistics on the node where the query took
the longest (out of 5 shards, the one with the most data takes a long time).
Meanwhile, we're reducing the heap size and testing in QA.

          CPU     %user     %nice   %system   %iowait    %steal     %idle
17:00:01  all      2.11      0.00      0.04      0.00      0.00     97.85
Average:  all      7.52      0.00      0.16      0.02      0.00     92.31

               tps      rtps      wtps   bread/s   bwrtn/s
17:00:01     10.63      0.00     10.63      0.00    140.56
Average:     73.90      2.65     71.24    314.24   1507.93

          pgpgin/s pgpgout/s   fault/s  majflt/s
17:00:01      0.00     23.42    367.95      0.00
Average:     52.37    251.32    586.79      0.82

Our current JVM heap is 30G and usage is ~26G. If we reduce the heap to 25G,
we're afraid of hitting an OOM error in Java.


-Original Message-
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com] 
Sent: Thursday, February 27, 2014 3:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr4 performance

You would get more room for disk cache by reducing your large heap.
Otherwise, you'd have to add more RAM to your systems or shard your index
to more nodes to gain more RAM that way.

The Linux VM subsystem actually has a number of tuning parameters (like
vm.bdflush, vm.swappiness and vm.pagecache), but I don't know if there's
any definitive information about how to set them appropriately for Solr.

Michael Della Bitta

Applications Developer

o: +1 646 532 3062

appinions inc.

"The Science of Influence Marketing"

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Thu, Feb 27, 2014 at 3:09 PM, Joshi, Shital  wrote:

> Hi Michael,
>
> If page cache is the issue, what is the solution?
>
> Thanks!
>
> -Original Message-
> From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
> Sent: Monday, February 24, 2014 9:54 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr4 performance
>
> I'm not sure how you're measuring free RAM. Maybe this will help:
>
> http://www.linuxatemyram.com/play.html
>
> Michael Della Bitta
>
> Applications Developer
>
> o: +1 646 532 3062
>
> appinions inc.
>
> "The Science of Influence Marketing"
>
> 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions  | g+:
> plus.google.com/appinions<
> https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
> >
> w: appinions.com 
>
>
> On Mon, Feb 24, 2014 at 5:35 PM, Joshi, Shital 
> wrote:
>
> > Thanks.
> >
> > We found some evidence that this could be the issue. We're monitoring
> > closely to confirm this.
> >
> > One question though: none of our nodes show more that 50% of physical
> > memory used. So there is enough memory available for memory mapped files.
> > Can this kind of pause still happen?
> >
> >
> > -Original Message-
> > From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
> > Sent: Friday, February 21, 2014 5:28 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr4 performance
> >
> > It could be that your query is churning the page cache on that node
> > sometimes, so Solr pauses so the OS can drag those pages off of disk.
> Have
> > you tried profiling your iowait in top or iostat during these pauses?
> > (assuming you're using linux).
> >
> > Michael Della Bitta
> >
> > Applications Developer
> >
> > o: +1 646 532 3062
> >
> > appinions inc.
> >
> > "The Science of Influence Marketing"
> >
> > 18 East 41st Street
> >
> > New York, NY 10017
> >
> > t: @appinions  | g+:
> > plus.google.com/appinions<
> >
> https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts
> > >
> > w: appinions.com 
> >
> >
> > On Fri, Feb 21, 2014 at 5:20 PM, Joshi, Shital 
> > wrote:
> >
> > > Thanks for your answer.
> > >
> > > We confirmed that it is not GC issue.
> > >
> > > The auto warming query looks good too and queries before and after the
> > > long running query comes back really quick. The only thing stands out
> is
> > > shard on which query takes long time has couple million more documents
> > than
> > > other shards.
> > >
> > > -Original Message-
> > > From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
> > > Sent: Thursday, February 20, 2014 5:26 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: RE: Solr4 performance
> > >
> > > Hi,
> > >
> > > As for your first question, setting openSearcher to true means you

Re: Filter query exclusion with SolrJ

2014-02-28 Thread idioma
Ahmet,
thanks for this, but I do not think this actually meets my requirements. My
intent is not that of harcoding the facet field and value I want to exclude,
but to be able to apply the exclusion variant regardless (I currently have 3
facet field and ~ 5 million of records). Before posting my question, I had
already tried something similar:

solrQuery.set(CommonParams.FQ, “-author:Dickens,Janet”).

With what you suggested, I construct the following Solr query:

fq=-%7B%21term+f%3Dauthor%7Dickens%3Janet&start=0&rows=10&q=%28Dickens%29&facet=true&facet.mincount=1&facet.limit=20&fl=*%2Cscore&facet.field=author&facet.field=domain&facet.field=content_type&sort=score+desc

In this way, the UI will immediately "suppress" the facet value 'Dickens,
Janet' even before I click the Refine further option to exclude another
value.

Thanks for your help,


I.







Re: I still can create an index when the write.lock exists

2014-02-28 Thread Mark Miller
I’m pretty sure the default config will unlock on startup.

- Mark

http://about.me/markrmiller

On Feb 28, 2014, at 3:50 AM, Chen Lion  wrote:

> Dear all,
> I have a problem I can't understand.
> 
> I use Solr 4.6.1 with 2 nodes, one leader and one follower; both have the
> write.lock file.
> 
> I did not think I could create an index while the write.lock file exists,
> right?
> 
> But I could. Why?
> 
> Jiahui Chen



Re: SOLR cloud disaster recovery

2014-02-28 Thread Lajos

Hi Jan,

There are a few ways to do that, but no, nothing is automatic.

1) If your node is alive, you can create new replicas on the new node, 
let them replicate, verify they are ok, then delete the replicas on the 
old node and shut it down.


2) If your node is dead, create new replicas on the new node (see the
CoreAdmin sketch after this list) and let them replicate. You'll have to
hand-edit clusterstate.json, however, to fix the entries for the shards.


3) If you have a fully up-to-date backup of your dead node, just use the 
same hostname for your new node and restore the backups there. It should 
be fine. Just verify that the replicas for that node, as listed in 
clusterstate.json, are present and accounted for.
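
In Solr 4.x, creating such a replica amounts to a CoreAdmin CREATE call that
names the collection and shard. A rough SolrJ sketch (host, core, and
collection names are made up):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class AddReplicaSketch {
    public static void main(String[] args) throws Exception {
        // Point at the *new* node; the core name conventionally follows
        // the <collection>_<shard>_replica<n> pattern.
        HttpSolrServer newNode = new HttpSolrServer("http://newhost:8983/solr");
        CoreAdminRequest.Create create = new CoreAdminRequest.Create();
        create.setCoreName("mycoll_shard1_replica3");
        create.setCollection("mycoll");
        create.setShardId("shard1");
        create.process(newNode); // registers the core; it then recovers from the leader
    }
}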


HTH,

Lajos


On 28/02/2014 16:17, Jan Van Besien wrote:

Hi,

I am a bit confused about how SolrCloud disaster recovery is supposed
to work exactly in the case of losing a single node completely.

Say I have a SolrCloud cluster with 3 nodes. My collection is created
with numShards=3&replicationFactor=3&maxShardsPerNode=3, so there is
no data loss when I lose a node.

However, how do I configure a new node to take the place of the dead
node? I bring up a new node (same hostname and IP as the dead node)
which is completely empty (empty data dir, empty solr.xml), install
Solr, and connect it to zookeeper.

Is it supposed to work automatically from there? In my tests, the
server has no cores and the solr-cloud graph overview simply shows all
the shards/replicas on this node as down. Do I need to recreate the
cores first? Note that these cores were initially created indirectly
by creating the collection.

Thanks,
Jan






Re: stopwords issue with edismax

2014-02-28 Thread Ahmet Arslan
Can you give the parameters defined in the defaults section of your request
handler in solrconfig.xml?

By the way, echoParams=all will list all parameters.



On Friday, February 28, 2014 5:18 PM, sureshrk19  wrote:
Ahmet,

Thanks for the reply..

Here is the query:

http://localhost:8080/solr/collection1/select?q=a+of+b&fq=type%3AEntity&wt=json&indent=true

And here is my stopwords_en.txt content

a
an
and
are
as
at
be
but
by
for
if
in
into
is
it
no
not
of
on
or








Date query not returning results only some time

2014-02-28 Thread Arun Rangarajan
Solr server version 4.2.1

I am facing a strange issue with a date query like this:

q=first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS] AND
-tag_id:268702&fq=(burial_score:[* TO 0.49] AND
-tag_id:286006)&rows=1&sort=random_906313237 asc&fl=id

The only process by which we add documents to the core on which this query
executes is via data import handler full import. We do indexing on master
and queries are executed against a slave.

This query returns results till the time full import starts (1 AM PST
daily). But the moment full import starts, it does not return any results.
Other queries return results.

Our auto commit settings in solrconfig have openSearcher set to false as
shown below:

<autoCommit>
  <maxDocs>25000</maxDocs>
  <maxTime>60</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<updateLog>
  <str name="dir">${solr.updatelog.dir:}</str>
</updateLog>

It starts returning results after the full import finishes and issues a
commit, which takes about 1.5 hrs. The pollInterval for the slave is set to
every hour:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://${master.ip}:${master.port}/solr/${solr.core.name}/replication</str>
    <str name="pollInterval">01:00:00</str>
  </lst>
</requestHandler>

What am I doing wrong? Please let me know if you need any more details to
help me debug this.


StackOverflow ... the errors, not the site

2014-02-28 Thread Lajos

All,

Just playing around with the SuggestComponent, trying to compare results 
with the old-style spell-check-based suggester. Tried this config 
against a string field:

<requestHandler name="/suggest2" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">json</str>
    <str name="indent">true</str>
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">default</str>
  </lst>
  <arr name="components">
    <str>suggest2</str>
  </arr>
</requestHandler>

<searchComponent name="suggest2" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">default</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
</searchComponent>

I hit this URL:

/suggest2?q=ab&suggest.build=true

and that works, but because "title" was a StrField, it wasn't quite
what I wanted.


So I tried a TextField, "description". And I get this, with the same URL:

ERROR - 2014-02-28 17:29:49.618; org.apache.solr.common.SolrException;
null:java.lang.RuntimeException: java.lang.StackOverflowError
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:796)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:448)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    ...
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.StackOverflowError
    at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:244)
    at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:259)
    at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:259)
    at org.apache.lucene.util.automaton.SpecialOperations.getFiniteStrings(SpecialOperations.java:259)


etc etc


Any ideas??

Thanks,

Lajos




Re: Date query not returning results only some time

2014-02-28 Thread Jack Krupansky

How is first_publish_date defined?

After queries start failing, do an explicit query of some of the document 
IDs that you think should be present and see what the first_publish_date 
field contains.


Also, Solr and Lucene queries are not strict Boolean, so ANDing of a purely 
negative term requires explicitly referring to all documents before applying 
the negation.


So,

AND -tag_id:268702

should be:

AND (*:* -tag_id:268702)

Or, maybe you actually wanted this:

first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS] -tag_id:268702
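
In SolrJ terms, the corrected query might look like this sketch (parameter
values copied from the original mail; not tested):

import org.apache.solr.client.solrj.SolrQuery;

public class DateRangeSketch {
    public static void main(String[] args) {
        // The negative clause rides along with a positive clause here, so
        // no explicit *:* is needed.
        SolrQuery q = new SolrQuery(
            "first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS] -tag_id:268702");
        q.addFilterQuery("burial_score:[* TO 0.49] -tag_id:286006");
        q.setRows(1);
        q.setFields("id");
        System.out.println(q);
    }
}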

-- Jack Krupansky

-Original Message- 
From: Arun Rangarajan

Sent: Friday, February 28, 2014 11:15 AM
To: solr-user@lucene.apache.org
Subject: Date query not returning results only some time

Solr server version 4.2.1

I am facing a strange issue with a date query like this:

q=first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS] AND
-tag_id:268702&fq=(burial_score:[* TO 0.49] AND
-tag_id:286006)&rows=1&sort=random_906313237 asc&fl=id

The only process by which we add documents to the core on which this query
executes is via data import handler full import. We do indexing on master
and queries are executed against a slave.

This query returns results till the time full import starts (1 AM PST
daily). But the moment full import starts, it does not return any results.
Other queries return results.

Our auto commit settings in solrconfig have openSearcher set to false as
shown below:

<autoCommit>
  <maxDocs>25000</maxDocs>
  <maxTime>60</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<updateLog>
  <str name="dir">${solr.updatelog.dir:}</str>
</updateLog>


It starts returning results after the full import finishes and issues a
commit, which takes about 1.5 hrs. The pollInterval for the slave is set to
every hour:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://${master.ip}:${master.port}/solr/${solr.core.name}/replication</str>
    <str name="pollInterval">01:00:00</str>
  </lst>
</requestHandler>



What am I doing wrong? Please let me know if you need any more details to
help me debug this. 



Re: stopwords issue with edismax

2014-02-28 Thread sureshrk19
<str name="echoParams">explicit</str>

For all handlers I have the same setting.

Another observation I have:

I'm getting results when I use 'q.op=OR'; the default operator set in
solrconfig.xml is 'AND'.

The query working fine is:
http://localhost:8080/solr/collection1/select?q=bank+america&wt=json&indent=true&q.op=OR








Re: stopwords issue with edismax

2014-02-28 Thread Ahmet Arslan
Hi,

From the URLs you provided, it is not clear that you use the edismax query
parser at all. That's why I asked for the complete list of parameters. Can you
paste the request handler definition from solrconfig.xml?

And tell us what you expect and what is not working for you.





On Friday, February 28, 2014 7:30 PM, sureshrk19  wrote:
<str name="echoParams">explicit</str>

For all handlers I have the same setting.

Another observation I have:

I'm getting results when I use 'q.op=OR'; the default operator set in
solrconfig.xml is 'AND'.

The query working fine is:
http://localhost:8080/solr/collection1/select?q=bank+america&wt=json&indent=true&q.op=OR









Re: stopwords issue with edismax

2014-02-28 Thread sureshrk19
Thanks for taking time on this...

Here is my request handler definition:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="qf">all_text number party name all_code ent_name</str>
    <str name="pf">all_text number^3 name^5 party^3 all_code^2 ent_name^7</str>
    <str name="fl">id description</str>
    <str name="q.op">AND</str>
  </lst>
</requestHandler>

The name which is indexed is: a of b
When I try to search for 'a of b', I don't see any results.

If I change q.op to OR, then I see results for this search.

I'm not sure why the same results are not returned when I search with the
AND operator.







Re: stopwords issue with edismax

2014-02-28 Thread Jack Krupansky

Look at the parsed_query by setting the debugQuery=true parameter.

I think what is happening is that the query parser generates a separate
dismax query for each term, and each dismax query requires at least one of
its fields to contain the term. I suspect that some of your qf fields do not
ignore stopwords, so the dismax for "of" will not be empty (although the
clauses for the fields whose stop filter eliminates the term will be
absent). That dismax then fails to match anything, and since q.op=AND, the
whole query matches nothing.
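
A quick way to confirm this is to inspect the parsed query; a SolrJ sketch
(the qf value here is shortened for illustration):

import org.apache.solr.client.solrj.SolrQuery;

public class DebugQuerySketch {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("a of b");
        q.set("defType", "edismax");
        q.set("qf", "all_text name");
        q.set("debugQuery", "true"); // the response will include parsedquery
        System.out.println(q);
    }
}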


-- Jack Krupansky

-Original Message- 
From: sureshrk19

Sent: Friday, February 28, 2014 1:12 PM
To: solr-user@lucene.apache.org
Subject: Re: stopwords issue with edismax

Thanks for taking time on this...

Here is my request handler definition:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="qf">all_text number party name all_code ent_name</str>
    <str name="pf">all_text number^3 name^5 party^3 all_code^2 ent_name^7</str>
    <str name="fl">id description</str>
    <str name="q.op">AND</str>
  </lst>
</requestHandler>

The name which is indexed is: a of b
When I try to search for 'a of b', I don't see any results.

If I change q.op to OR, then I see results for this search.

I'm not sure why the same results are not returned when I search with the
AND operator.








Re: SOLR cloud disaster recovery

2014-02-28 Thread Per Steffensen
We have created some scripts that can do this for you - basically
reconstruct (by looking at information in ZK) solr.xml, core.properties,
etc. on the new machine as they were on the machine that crashed. Our
procedure when a machine crashes is:

* Remove it from the rack and replace it by a similar machine with the same
hostname/IP
* Run the scripts, pointing out the IP of the machine that needs to have
solr.xml and core.properties written
* Start Solr on this machine - it now runs the same set of replicas that
the crashed machine did. I guess they will sync automatically with their
sister-replicas, but I do not know, because we do not use replication.


I might be able to find something for you. Which version are you using - 
I have some scripts that work on 4.0 and some other scripts that work 
for 4.4 (and maybe later).


Regards, Per Steffensen

On 28/02/14 16:17, Jan Van Besien wrote:

Hi,

I am a bit confused about how SolrCloud disaster recovery is supposed
to work exactly in the case of losing a single node completely.

Say I have a SolrCloud cluster with 3 nodes. My collection is created
with numShards=3&replicationFactor=3&maxShardsPerNode=3, so there is
no data loss when I lose a node.

However, how do I configure a new node to take the place of the dead
node? I bring up a new node (same hostname and IP as the dead node)
which is completely empty (empty data dir, empty solr.xml), install
Solr, and connect it to zookeeper.

Is it supposed to work automatically from there? In my tests, the
server has no cores and the solr-cloud graph overview simply shows all
the shards/replicas on this node as down. Do I need to recreate the
cores first? Note that these cores were initially created indirectly
by creating the collection.

Thanks,
Jan





Solr 4.5.0 replication numDocs larger in slave

2014-02-28 Thread Geary, Frank
Hi,

I'm using Solr 4.5.0, I have a single master replicating to a single slave.  
Only the master is being indexed to - never the slave.  The master is committed 
once each night.  After the first commit and replication the numDoc counts are 
identical.  After the next nightly commit and after the second replication a 
few minutes later, the numDocs has increased in both the master and the slave 
as expected, but numDocs is not the same in the master as it is in the slave.  
The slave has about 33 more documents and one fewer segments (according to
Overview in solr admin).

I suspect the numDocs may be in sync again after tonight, but can anyone 
explain what is going on here?   Is it possible a few deletions got committed 
to the master but not replicated to the slave?

Thanks

Frank




Perm Gen issues in SolrCloud

2014-02-28 Thread KNitin
Hi

 I am seeing the PermGen usage increase as I keep adding more collections.
What kind of strings get interned in Solr? (Only schema, fields, and
collection metadata, or the data itself?)

Will PermGen space (at least interned strings) increase in proportion to the
size of the data in the collections or with the # of collections themselves?


I have temporarily increased the size of PermGen to deal with this, but I
would love to understand what goes on behind the scenes.

Thanks
Nitin


Re: Solr cloud: Faceting issue on text field

2014-02-28 Thread David Miller
Hi Chris,

The enum option is working for us, with suitable minDf settings. We are
able to do faceting with decent speed using this.
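
In SolrJ terms, the working combination is roughly this sketch (the field
name and minDf value are just examples):

import org.apache.solr.client.solrj.SolrQuery;

public class EnumFacetSketch {
    public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("text_terms");
        q.set("facet.method", "enum");
        // Terms with a docFreq below this threshold bypass the filterCache.
        q.set("facet.enum.cache.minDf", 25);
        System.out.println(q);
    }
}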

Thanks a lot,
Dave


On Fri, Feb 28, 2014 at 9:09 AM, David Miller wrote:

> Hi Chris,
>
> Thanks for the info. I had looked into the "docValues" option earlier.
> But docValues doesn't support textField, and we require textField to enable
> various tokenizers and analyzers (shingle, pattern filter, etc.). We
> require the faceting to be on terms within the text field, not on the value
> as a whole (which string would give us). A use case is to generate tag
> clouds from social conversations.
>
> The enum option is interesting. From its description it seemed unsuitable
> for this purpose. I will try it out and see.
>
> Regards,
> Dave
>
>
>
>
>
>
>
> On Thu, Feb 27, 2014 at 8:24 PM, Chris Hostetter wrote:
>
>>
>> : Yes, the memory and cpu spiked for that machine. Another issue I found
>> in
>> : the log was "SolrException: Too many values for UnInvertedField
>> faceting on
>> : field".
>> : I was using the fc method. Will changing the method/params help?
>>
>> the fc/fcs faceting methods really aren't going to work well with
>> something like an indexed full text field where it has to build an
>> UnInvertedField with a huge volume of unique terms.
>>
>> : One thing I don't understand is that, the query was returning only a
>> single
>> : document, but the facet still seems to be having the issue.
>>
>> the data structures for faceting (which are the same for sorting in the
>> single valued case) are optimized for re-use -- regardless of the number of
>> documents that match, the FieldCache & UnInvertedField structures are
>> built up for the entire index.  You pay up front with Heap space to get
>> faster speed for your overall requests in return.
>>
>> For your situation, there are two possible solutions to try...
>>
>> 1) facet.method=enum
>>
>> this is the classic alternative for faceting; it's typically much slower
>> than the fc & fcs methods, but that's because it lets you trade speed for
>> RAM.  One specific thing you have to watch out for is that this will
>> usually use the filterCache, and since you are almost certainly going to
>> have more terms in this facet field than any workable size of your
>> filterCache, there's going to be a lot of wasted time constantly evicting
>> things from that cache -- playing with facet.enum.cache.minDf should help.
>>
>>
>> https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.enum.cache.minDfParameter
>>
>> 2) use docValues="true" on your field (with facet.method=fc or fcs)
>>
>> I haven't done much experimenting with this, particularly in our "facet
>> on full text" type situation, but when you use docValues, in theory,
>> in-memory fieldCache and UnInvertedField structures aren't needed --
>> instead much smaller structures are kept in the heap that refer down
>> directly to the DocValue structures memory mapped from disk (which are
>> created when you add/commit to your index -- they don't need "un-inverted"
>> at query time)
>>
>> I, for one, would definitely be interested to know if reindexing your full
>> text field with docValues makes the faceting feasible...
>>
>> https://cwiki.apache.org/confluence/display/solr/DocValues
>>
>> -Hoss
>> http://www.lucidworks.com/
>>
>
>


Solr Cloud: Explain Plan not working

2014-02-28 Thread Divya Mehta
Hello,

We have recently moved to SolrCloud in our application, but we still have a
single Solr instance which we use for testing purposes.
We already had the explain plan working on the single instance; after moving
to SolrCloud, it does not show any explanation field in its response.

This is how we ask for explain output in our SolrQuery:

SolrQuery sq = new SolrQuery();
...

if (args.getExplain()) {
    sq.setParam(CommonParams.DEBUG_QUERY, true);
    sq.addField("explanation:[explain style=text]");
}

As per my understanding, there should not be any difference in the way Solr
handles a query in cloud mode or on a single instance.

Can anybody help me if there is some other configuration that I need to add
in?

Thanks,
Divya





network slows when solr is running - help

2014-02-28 Thread Petersen, Robert
Hi guys,

Got an odd thing going on right now.  Indexing into my master server (solr
3.6.1) has slowed, and it seems to be because when Solr runs, ping shows
latency.  When I stop Solr, though, ping returns to normal.  This has been
happening occasionally, and rebooting didn't help.  This is the first time I
noticed that stopping Solr returns ping speeds to normal.  I was thinking it
was something with our network.  Solr is not consuming all resources on the
box or anything like that, and normally everything works fine.  Has anyone
seen this type of thing before?  Let me know if more info of any kind is
needed.

Solr process is at 8% memory utilization and 35% cpu utilization in 'top' 
command.

Note: solr is the only thing running on the box.

C:\Users\robertpe>ping 10.12.132.101  <-- Indexing

Pinging 10.12.132.101 with 32 bytes of data:
Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
Reply from 10.12.132.101: bytes=32 time<1ms TTL=64

Ping statistics for 10.12.132.101:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms

C:\Users\robertpe>ping 10.12.132.101  <-- Solr stopped

Pinging 10.12.132.101 with 32 bytes of data:
Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
Reply from 10.12.132.101: bytes=32 time<1ms TTL=64

Ping statistics for 10.12.132.101:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms

C:\Users\robertpe>ping 10.12.132.101  <-- Solr started but no indexing activity

Pinging 10.12.132.101 with 32 bytes of data:
Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
Reply from 10.12.132.101: bytes=32 time<1ms TTL=64

Ping statistics for 10.12.132.101:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms

C:\Users\robertpe>ping 10.12.132.101  <-- Solr started and indexing started

Pinging 10.12.132.101 with 32 bytes of data:
Reply from 10.12.132.101: bytes=32 time=53ms TTL=64
Reply from 10.12.132.101: bytes=32 time=51ms TTL=64
Reply from 10.12.132.101: bytes=32 time=48ms TTL=64
Reply from 10.12.132.101: bytes=32 time=51ms TTL=64

Ping statistics for 10.12.132.101:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 48ms, Maximum = 53ms, Average = 50ms

Robert (Robi) Petersen
Senior Software Engineer
Search Department





Re: network slows when solr is running - help

2014-02-28 Thread Josh
Is it indexing data from over the network? (High data throughput would
increase latency.) Is it a virtual machine? (Other machines could be causing
slowdowns.) Another possibility is that the network card is offloading
processing onto the CPU, which introduces latency when the CPU is under load.


On Fri, Feb 28, 2014 at 4:11 PM, Petersen, Robert <
robert.peter...@mail.rakuten.com> wrote:

> Hi guys,
>
> Got an odd thing going on right now.  Indexing into my master server (solr
> 3.6.1) has slowed and it is because when solr runs ping shows latency.
>  When I stop solr though, ping returns to normal.  This has been happening
> occasionally, rebooting didn't help.  This is the first time I noticed that
> stopping solr returns ping speeds to normal.  I was thinking it was
> something with our network.   Solr is not consuming all resources on the
> box or anything like that, and normally everything works fine.  Has anyone
> seen this type of thing before?  Let me know if more info of any kind is
> needed.
>
> Solr process is at 8% memory utilization and 35% cpu utilization in 'top'
> command.
>
> Note: solr is the only thing running on the box.
>
> C:\Users\robertpe>ping 10.12.132.101  <-- Indexing
>
> Pinging 10.12.132.101 with 32 bytes of data:
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
>
> Ping statistics for 10.12.132.101:
> Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
> Approximate round trip times in milli-seconds:
> Minimum = 0ms, Maximum = 0ms, Average = 0ms
>
> C:\Users\robertpe>ping 10.12.132.101  <-- Solr stopped
>
> Pinging 10.12.132.101 with 32 bytes of data:
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
>
> Ping statistics for 10.12.132.101:
> Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
> Approximate round trip times in milli-seconds:
> Minimum = 0ms, Maximum = 0ms, Average = 0ms
>
> C:\Users\robertpe>ping 10.12.132.101  <-- Solr started but no indexing
> activity
>
> Pinging 10.12.132.101 with 32 bytes of data:
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
>
> Ping statistics for 10.12.132.101:
> Packets: Sent = 4, Received = 4, Lost = 0 (0% lo
> Approximate round trip times in milli-seconds:
> Minimum = 0ms, Maximum = 0ms, Average = 0ms
>
> C:\Users\robertpe>ping 10.12.132.101  <-- Solr started and indexing started
>
> Pinging 10.12.132.101 with 32 bytes of data:
> Reply from 10.12.132.101: bytes=32 time=53ms TTL=64
> Reply from 10.12.132.101: bytes=32 time=51ms TTL=64
> Reply from 10.12.132.101: bytes=32 time=48ms TTL=64
> Reply from 10.12.132.101: bytes=32 time=51ms TTL=64
>
> Ping statistics for 10.12.132.101:
> Packets: Sent = 4, Received = 4, Lost = 0 (0% lo
> Approximate round trip times in milli-seconds:
> Minimum = 48ms, Maximum = 53ms, Average = 50ms
>
> Robert (Robi) Petersen
> Senior Software Engineer
> Search Department
>
>
>
>


RE: network slows when solr is running - help

2014-02-28 Thread Petersen, Robert
Yes, my indexer runs as a service on a different box; it has 24 threads
pushing docs to Solr atomically.  No, the Solr master is not virtual; it has
64 GB main memory and dual quad-core Xeon CPUs.  The CPU utilization is not
maxed out from what I can see in 'top'; right now it says 38%.  The other
thing is that this only happens intermittently.  I'm going to have IT update
the firmware on the NIC, and then we'll open a ticket with HP for lack of
anything else.

Here is some other information:

OS Name: Linux
OS Version: 2.6.18-128.el5
Total RAM: 62.92 GB
Free RAM: 44.20 GB
Committed JVM memory: 36.03 GB
Total swap: 20.00 GB
Free swap: 20.00 GB

NUMBER OF REQUESTS EACH INTERVAL: request count: 232678   error count: 5
PROCESSING TIME (MS) IN EACH INTERVAL: processing time: 15355740   max time: 79408
TRAFFIC VOLUME (BYTES) IN EACH INTERVAL: sent: 702 GB   received: 956 MB

-Original Message-
From: Josh [mailto:jwda...@gmail.com] 
Sent: Friday, February 28, 2014 1:27 PM
To: solr-user@lucene.apache.org
Subject: Re: network slows when solr is running - help

Is it indexing data from over the network? (High data throughput would
increase latency.) Is it a virtual machine? (Other machines could be causing
slowdowns.) Another possibility is that the network card is offloading
processing onto the CPU, which introduces latency when the CPU is under load.


On Fri, Feb 28, 2014 at 4:11 PM, Petersen, Robert < 
robert.peter...@mail.rakuten.com> wrote:

> Hi guys,
>
> Got an odd thing going on right now.  Indexing into my master server 
> (solr
> 3.6.1) has slowed and it is because when solr runs ping shows latency.
>  When I stop solr though, ping returns to normal.  This has been 
> happening occasionally, rebooting didn't help.  This is the first time 
> I noticed that stopping solr returns ping speeds to normal.  I was thinking 
> it was
> something with our network.   Solr is not consuming all resources on the
> box or anything like that, and normally everything works fine.  Has 
> anyone seen this type of thing before?  Let me know if more info of 
> any kind is needed.
>
> Solr process is at 8% memory utilization and 35% cpu utilization in 'top'
> command.
>
> Note: solr is the only thing running on the box.
>
> C:\Users\robertpe>ping 10.12.132.101  <-- Indexing
>
> Pinging 10.12.132.101 with 32 bytes of data:
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64 Reply from 
> 10.12.132.101: bytes=32 time<1ms TTL=64 Reply from 10.12.132.101: 
> bytes=32 time<1ms TTL=64 Reply from 10.12.132.101: bytes=32 time<1ms 
> TTL=64
>
> Ping statistics for 10.12.132.101:
> Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
> Approximate round trip times in milli-seconds:
> Minimum = 0ms, Maximum = 0ms, Average = 0ms
>
> C:\Users\robertpe>ping 10.12.132.101  <-- Solr stopped
>
> Pinging 10.12.132.101 with 32 bytes of data:
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
>
> Ping statistics for 10.12.132.101:
> Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
> Approximate round trip times in milli-seconds:
> Minimum = 0ms, Maximum = 0ms, Average = 0ms
>
> C:\Users\robertpe>ping 10.12.132.101  <-- Solr started but no indexing 
> activity
>
> Pinging 10.12.132.101 with 32 bytes of data:
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
> Reply from 10.12.132.101: bytes=32 time<1ms TTL=64
>
> Ping statistics for 10.12.132.101:
> Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
> Approximate round trip times in milli-seconds:
> Minimum = 0ms, Maximum = 0ms, Average = 0ms
>
> C:\Users\robertpe>ping 10.12.132.101  <-- Solr started and indexing 
> started
>
> Pinging 10.12.132.101 with 32 bytes of data:
> Reply from 10.12.132.101: bytes=32 time=53ms TTL=64
> Reply from 10.12.132.101: bytes=32 time=51ms TTL=64
> Reply from 10.12.132.101: bytes=32 time=48ms TTL=64
> Reply from 10.12.132.101: bytes=32 time=51ms TTL=64
>
> Ping statistics for 10.12.132.101:
> Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
> Approximate round trip times in milli-seconds:
> Minimum = 48ms, Maximum = 53ms, Average = 50ms
>
> Robert (Robi) Petersen
> Senior Software Engineer
> Search Department
>
>
>
>


Re: Perm Gen issues in SolrCloud

2014-02-28 Thread Furkan KAMACI
Hi;

Jack has an answer for a PermGen usages:

"PermGen memory has to do with number of classes loaded, rather than
documents.

Here are a couple of pages that help explain Java PermGen issues. The
bottom
line is that you can increase the PermGen space, or enable unloading of
classes, or at least trace class loading to see why the problem occurs.

http://stackoverflow.com/questions/88235/how-to-deal-with-java-lang-outofmemoryerror-
permgen-space-error

http://www.brokenbuild.com/blog/2006/08/04/java-jvm-gc-permgen
-and-memory-options/
"

You can see the conversation from here:
http://search-lucene.com/m/iMaR11lgj3Q1/permgen&subj=PermGen+OOM+Error

Thanks;
Furkan KAMACI


2014-02-28 21:37 GMT+02:00 KNitin :

> Hi
>
>  I am seeing the Perm Gen usage increase as I keep adding more collections.
> What kind of strings get interned in Solr? (Only schema, fields,
> collection metadata, or the data itself?)
>
> Will PermGen space (at least interned strings) increase in proportion to the
> size of the data in the collections, or with the # of collections
> themselves?
>
>
> I have temporarily increased the size of PermGen to deal with this but
> would love to understand what goes on behind the scenes
>
> Thanks
> Nitin
>


Re: Solr Permgen Exceptions when creating/removing cores

2014-02-28 Thread Furkan KAMACI
Hi;

You can also check here:
http://stackoverflow.com/questions/3717937/cmspermgensweepingenabled-vs-cmsclassunloadingenabled

Thanks;
Furkan KAMACI


2014-02-26 22:35 GMT+02:00 Josh :

> Thanks Timothy,
>
> I gave these a try, and -XX:+CMSPermGenSweepingEnabled seemed to cause the
> error to happen more quickly. With this option on it didn't seem to do
> the intermittent garbage collecting that delayed the issue with it off.
> I was already using a max of 512MB, and I can reproduce it with it set this
> high or even higher. Right now, because of how we have this implemented,
> increasing it to something high just delays the problem :/
>
> Anything else you could suggest I would really appreciate.
>
>
> On Wed, Feb 26, 2014 at 3:19 PM, Tim Potter  >wrote:
>
> > Hi Josh,
> >
> > Try adding: -XX:+CMSPermGenSweepingEnabled as I think for some VM
> > versions, permgen collection was disabled by default.
> >
> > Also, I use: -XX:MaxPermSize=512m -XX:PermSize=256m with Solr, so 64M may
> > be too small.
> >
> >
> > Timothy Potter
> > Sr. Software Engineer, LucidWorks
> > www.lucidworks.com
> >
> > 
> > From: Josh 
> > Sent: Wednesday, February 26, 2014 12:27 PM
> > To: solr-user@lucene.apache.org
> > Subject: Solr Permgen Exceptions when creating/removing cores
> >
> > We are using the Bitnami version of Solr 4.6.0-1 on a 64bit windows
> > installation with 64bit Java 1.7U51 and we are seeing consistent issues
> > with PermGen exceptions. We have the permgen configured to be 512MB.
> > Bitnami ships with a 32bit version of Java for windows and we are
> replacing
> > it with a 64bit version.
> >
> > Passed in Java Options:
> >
> > -XX:MaxPermSize=64M
> > -Xms3072M
> > -Xmx6144M
> > -XX:+UseParNewGC
> > -XX:+UseConcMarkSweepGC
> > -XX:CMSInitiatingOccupancyFraction=75
> > -XX:+CMSClassUnloadingEnabled
> > -XX:NewRatio=3
> >
> > -XX:MaxTenuringThreshold=8
> >
> > This is our use case:
> >
> > We have what we call a database core which remains fairly static and
> > contains the imported contents of a table from SQL server. We then have
> > user cores which contain the record ids of results from a text search
> > outside of Solr. We then query for the data we want from the database
> core
> > and limit the results to the content of the user core. This allows us to
> > combine facet data from Solr with the search results from another engine.
> > We are creating the user cores on demand and removing them when the user
> > logs out.
> >
> > Our issue is that the constant creation and removal of user cores, combined
> > with the constant importing, seems to push us over our PermGen limit. The user
> > cores are removed at the end of every session, and as a test I made an
> > application that would loop: create the user core, import a set of data
> > to it, query the database core using it as a limiter, and then remove the
> > user core. My expectation in this scenario was that all the permgen associated
> > with that user core would be freed upon its unload, allowing permgen to
> > reclaim that memory during a garbage collection. This was not the case; it
> > would constantly go up until the application exhausted the memory.
> >
> > I also investigated whether there was a connection between the two
> > cores left behind because I was joining them together in a query, but even
> > unloading the database core after unloading all the user cores won't
> > prevent the limit from being hit or any memory from being garbage collected
> > from Solr.
> >
> > Is this a known issue with creating and unloading a large number of cores?
> > Could it be configuration-based for the core? Is there something other than
> > unloading that needs to happen to free the references?
> >
> > Thanks
> >
> > Notes: I've tried using tools such as Plumbr to determine if it's a leak
> > within Solr, and my activities turned up nothing.
> >
>
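
A minimal SolrJ 4.x sketch of the create/import/query/unload cycle described
above (the base URL, core name, instanceDir and field are all hypothetical;
CoreAdminRequest.createCore and unloadCore are the stock SolrJ admin calls;
this only illustrates the cycle, not a fix for the PermGen growth):

import java.util.Collections;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.common.SolrInputDocument;

public class UserCoreCycle {
  public static void main(String[] args) throws Exception {
    HttpSolrServer admin = new HttpSolrServer("http://localhost:8983/solr");

    // create the per-session core (name and instanceDir are hypothetical)
    CoreAdminRequest.createCore("user_core_42", "user_core_42", admin);

    HttpSolrServer userCore =
        new HttpSolrServer("http://localhost:8983/solr/user_core_42");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");  // a record id from the external text search
    userCore.add(Collections.singletonList(doc));
    userCore.commit();

    // ... query the database core here, limited by the ids in user_core_42 ...

    // unload on logout; in 4.x the index files stay on disk unless asked otherwise
    CoreAdminRequest.unloadCore("user_core_42", admin);
    userCore.shutdown();
    admin.shutdown();
  }
}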


How to best handle search like Dave & David

2014-02-28 Thread Susheel Kumar
Hi,

We have name searches on Solr for millions of documents. One user may search for 
"Morrison Dave" while another may search for "Morrison David".  What's the best way 
to handle this so that both bring similar results? Adding synonyms is the option we 
are using right now.

But we may need to add around 50,000+ such synonyms for different names; for each 
specific name there can be a couple of synonyms, e.g. for Richard it can be Rich, 
Rick, Richie, etc.

Any experience adding so many synonyms, or any other thoughts? Stemming may help 
in a few situations, but not for names like Dave and David.
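
For reference, the setup we are trying is along these lines (file and field
names simplified). In synonyms_names.txt, one comma-separated group per line:

dave,david
rich,rick,richie,richard

and in the name field's query analyzer:

<filter class="solr.SynonymFilterFactory" synonyms="synonyms_names.txt"
        ignoreCase="true" expand="true"/>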

Thanks,
Susheel


Boost query syntax error

2014-02-28 Thread Arun Rangarajan
The Solr function query documentation (
https://wiki.apache.org/solr/FunctionQuery#exists) says:

exists(query({!v='year:2012'})) will return true for docs with year=2012

I have a document like:

{
  id: 1,
  user_type: ADMIN,
  like_score: 1
}
id, user_type and like_score are all indexed and stored fields, with id
being int, user_type being string and like_score being int.

I issue a query like this:

q={!boost b=if(true,10,1)}id:1&rows=1&fl=*,score
which works.

But this query does not work:

q={!boost
b=if(exists(query({!v='user_type:ADMIN'})),10,1)}id:1&rows=1&fl=*,score
It gives an error like this:

"error":{
  "msg":"org.apache.solr.search.SyntaxError: Cannot parse ')),5,10)}id:1':
Encountered \" \")\" \") \"\" at line 1, column 0.\nWas expecting one of:\n
...\n\"+\" ...\n\"-\" ...\n ...\n\"(\"
...\n\"*\" ...\n ...\n ...\n
...\n ...\n ...\n\"[\" ...\n\"{\"
...\n ...\n ...\n ...\n\"*\" ...\n
 ",
  "code":400
}
How do I fix the query?

This syntax works:

q={!func}if(exists(query({!v='user_type:ADMIN'})),5,10)&rows=1&fl=*,score
but it doesn't give the multiplicative score I want.
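
One workaround I am considering is parameter dereferencing, so that the nested
local-params braces never appear inside the outer {!boost ...}. I have not
confirmed this against our index, and the parameter names are arbitrary:

q={!boost b=$boostFunc v=$mainQuery}
&boostFunc=if(exists(query($adminQuery)),10,1)
&adminQuery=user_type:ADMIN
&mainQuery=id:1
&rows=1&fl=*,score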


Re: Date query not returning results only some time

2014-02-28 Thread Arun Rangarajan
Thanks, Jack.

>
How is first_publish_date defined?



with "date" being




Yes, we need to fix the Boolean operators AND, OR and NOT as mentioned in
http://searchhub.org/2011/12/28/why-not-and-or-and-not/ but I believe that
is not an issue here, because the same query returned results a few mins
before the full import started.



On Fri, Feb 28, 2014 at 8:39 AM, Jack Krupansky wrote:

> How is first_publish_date defined?
>
> After queries start failing, do an explicit query of some of the document
> IDs that you think should be present and see what the first_publish_date
> field contains.
>
> Also, Solr and Lucene queries are not strict Boolean, so ANDing of a
> purely negative term requires explicitly referring to all documents before
> applying the negation.
>
> So,
>
> AND -tag_id:268702
>
> should be:
>
> AND (*:* -tag_id:268702)
>
> Or, maybe you actually wanted this:
>
> first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS] -tag_id:268702
>
> -- Jack Krupansky
>
> -Original Message- From: Arun Rangarajan
> Sent: Friday, February 28, 2014 11:15 AM
> To: solr-user@lucene.apache.org
> Subject: Date query not returning results only some time
>
>
> Solr server version 4.2.1
>
> I am facing a strange issue with a date query like this:
>
> q=first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS] AND
> -tag_id:268702&fq=(burial_score:[* TO 0.49] AND
> -tag_id:286006)&rows=1&sort=random_906313237 asc&fl=id
>
> The only process by which we add documents to the core on which this query
> executes is via data import handler full import. We do indexing on master
> and queries are executed against a slave.
>
> This query returns results till the time full import starts (1 AM PST
> daily). But the moment full import starts, it does not return any results.
> Other queries return results.
>
> Our auto commit settings in solrconfig have openSearcher set to false as
> shown below:
> 
> 
> 25000
> 60 
> false
> 
>
>
>  ${solr.updatelog.dir:}
>
> 
>
> It starts returning results after the full import finishes and issues a
> commit, which takes about 1.5 hrs. The pollInterval for slave is set for
> every hour:
>
> <requestHandler name="/replication" class="solr.ReplicationHandler">
> <lst name="master">
> <str name="enable">${enable.master:false}</str>
> <str name="replicateAfter">startup</str>
> <str name="replicateAfter">commit</str>
> <str name="replicateAfter">optimize</str>
> <str name="confFiles">solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml</str>
> </lst>
> <lst name="slave">
> <str name="enable">${enable.slave:false}</str>
> <str name="masterUrl">http://${master.ip}:${master.port}/solr/${solr.core.name}/replication</str>
> <str name="pollInterval">01:00:00</str>
> </lst>
> </requestHandler>
>
> What am I doing wrong? Please let me know if you need any more details to
> help me debug this.
>


Re: Perm Gen issues in SolrCloud

2014-02-28 Thread KNitin
Hi Furkan

 I have read that before, but I haven't added any new classes or changed
anything with my setup. I just created more collections in Solr. How will
that increase PermGen space? Doesn't Solr intern strings at all?
Interned strings also go to the PermGen space, right?

- Nitin


On Fri, Feb 28, 2014 at 3:11 PM, Furkan KAMACI wrote:

> Hi;
>
> Jack has an answer for a PermGen usages:
>
> "PermGen memory has to do with number of classes loaded, rather than
> documents.
>
> Here are a couple of pages that help explain Java PermGen issues. The
> bottom
> line is that you can increase the PermGen space, or enable unloading of
> classes, or at least trace class loading to see why the problem occurs.
>
>
> http://stackoverflow.com/questions/88235/how-to-deal-with-java-lang-outofmemoryerror-
> permgen-space-error
>
> http://www.brokenbuild.com/blog/2006/08/04/java-jvm-gc-permgen
> -and-memory-options/
> "
>
> You can see the conversation from here:
> http://search-lucene.com/m/iMaR11lgj3Q1/permgen&subj=PermGen+OOM+Error
>
> Thanks;
> Furkan KAMACI
>
>
> 2014-02-28 21:37 GMT+02:00 KNitin :
>
> > Hi
> >
> >  I am seeing the Perm Gen usage increase as I keep adding more
> > collections. What kind of strings get interned in Solr? (Only schema,
> > fields, collection metadata, or the data itself?)
> >
> > Will PermGen space (at least interned strings) increase in proportion
> > to the size of the data in the collections, or with the # of collections
> > themselves?
> >
> >
> > I have temporarily increased the size of PermGen to deal with this but
> > would love to understand what goes on behind the scenes
> >
> > Thanks
> > Nitin
> >
>


Re: Date field indexing in Solr

2014-02-28 Thread Erick Erickson
Yep. One alternative is something I just found out about:

ParseDateFieldUpdateProcessorFactory
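
A sketch of how it could be wired into solrconfig.xml (the chain name and the
format string are assumptions; the class name is as shipped):

<updateRequestProcessorChain name="parse-date">
  <processor class="solr.ParseDateFieldUpdateProcessorFactory">
    <arr name="format">
      <str>yyyy-MM-dd HH:mm:ss</str>
    </arr>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

Note that Solr still stores dates in UTC internally; this only controls how
incoming date strings get parsed.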


Best,

Erick


On Thu, Feb 27, 2014 at 3:12 PM, solr2020  wrote:

> Hi,
>
> We are using the 'solr.TrieDateField' type for indexing a date column in Solr.
> By default TrieDateField will index date columns in UTC format, but we need the
> date as it is in the source (DB table), with the time associated with that date.
> Do we need to use DateFormatTransformer to get the right date format?
>
> Thanks.
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Date-field-indexing-in-Solr-tp4120281.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr 4.5.0 replication numDocs larger in slave

2014-02-28 Thread Erick Erickson
That really shouldn't be happening IF indexing is shut off. Otherwise
the slave is taking a snapshot of the master index and syncing.

bq: The slave has about 33 more documents and one fewer
segments (according to Overview in solr admin)

Sounds like the master is still indexing and you've deleted documents
on the master.
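
One way to check is the replication handler's details command on both sides
(host and core name assumed):

http://master:8983/solr/yourcore/replication?command=details
http://slave:8983/solr/yourcore/replication?command=details

It reports the index version and generation each side is on, so you can see
whether the slave picked up a newer commit point than you expected.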

Best,
Erick


On Fri, Feb 28, 2014 at 11:08 AM, Geary, Frank wrote:

> Hi,
>
> I'm using Solr 4.5.0, I have a single master replicating to a single
> slave.  Only the master is being indexed to - never the slave.  The master
> is committed once each night.  After the first commit and replication the
> numDoc counts are identical.  After the next nightly commit and after the
> second replication a few minutes later, the numDocs has increased in both
> the master and the slave as expected, but numDocs is not the same in the
> master as it is in the slave.  The slave has about 33 more documents and
> one fewer segments (according to Overview in solr admin).
>
> I suspect the numDocs may be in sync again after tonight, but can anyone
> explain what is going on here?   Is it possible a few deletions got
> committed to the master but not replicated to the slave?
>
> Thanks
>
> Frank
>
>
>


Re: Date query not returning results only some time

2014-02-28 Thread Erick Erickson
This is odd. The full import, I think, deletes the
docs in the index when it starts.

If you check your index directory on the slave, is it empty
after the full import starts? If so, check your Solr log
on the slave... does it show a replication?

Shooting in the dark...

Erick


On Fri, Feb 28, 2014 at 3:57 PM, Arun Rangarajan
wrote:

> Thanks, Jack.
>
> >
> How is first_publish_date defined?
>
>  />
>
> with "date" being
>
>  positionIncrementGap="0" />
>
>
> Yes, we need to fix the Boolean operators AND, OR and NOT as mentioned in
> http://searchhub.org/2011/12/28/why-not-and-or-and-not/ but I believe that
> is not an issue here, because the same query returned results a few mins
> before the full import started.
>
>
>
> On Fri, Feb 28, 2014 at 8:39 AM, Jack Krupansky  >wrote:
>
> > How is first_publish_date defined?
> >
> > After queries start failing, do an explicit query of some of the document
> > IDs that you think should be present and see what the first_publish_date
> > field contains.
> >
> > Also, Solr and Lucene queries are not strict Boolean, so ANDing of a
> > purely negative term requires explicitly referring to all documents
> before
> > applying the negation.
> >
> > So,
> >
> > AND -tag_id:268702
> >
> > should be:
> >
> > AND (*:* -tag_id:268702)
> >
> > Or, maybe you actually wanted this:
> >
> > first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS] -tag_id:268702
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Arun Rangarajan
> > Sent: Friday, February 28, 2014 11:15 AM
> > To: solr-user@lucene.apache.org
> > Subject: Date query not returning results only some time
> >
> >
> > Solr server version 4.2.1
> >
> > I am facing a strange issue with a date query like this:
> >
> > q=first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS] AND
> > -tag_id:268702&fq=(burial_score:[* TO 0.49] AND
> > -tag_id:286006)&rows=1&sort=random_906313237 asc&fl=id
> >
> > The only process by which we add documents to the core on which this
> query
> > executes is via data import handler full import. We do indexing on master
> > and queries are executed against a slave.
> >
> > This query returns results till the time full import starts (1 AM PST
> > daily). But the moment full import starts, it does not return any
> results.
> > Other queries return results.
> >
> > Our auto commit settings in solrconfig have openSearcher set to false as
> > shown below:
> > 
> > 
> > 25000
> > 60 
> > false
> > 
> >
> >
> >  ${solr.updatelog.dir:}
> >
> > 
> >
> > It starts returning results after the full import finishes and issues a
> > commit, which takes about 1.5 hrs. The pollInterval for slave is set for
> > every hour:
> >
> > <requestHandler name="/replication" class="solr.ReplicationHandler">
> > <lst name="master">
> > <str name="enable">${enable.master:false}</str>
> > <str name="replicateAfter">startup</str>
> > <str name="replicateAfter">commit</str>
> > <str name="replicateAfter">optimize</str>
> > <str name="confFiles">solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml</str>
> > </lst>
> > <lst name="slave">
> > <str name="enable">${enable.slave:false}</str>
> > <str name="masterUrl">http://${master.ip}:${master.port}/solr/${solr.core.name}/replication</str>
> > <str name="pollInterval">01:00:00</str>
> > </lst>
> > </requestHandler>
> >
> > What am I doing wrong? Please let me know if you need any more details to
> > help me debug this.
> >
>


Re: Group query not cached in SOLR

2014-02-28 Thread soodyogesh
Any pointer on this would be helpful. Is there a way to avoid using group-by
queries and achieve similar results, or a way to enable caching for group-by
queries?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Group-query-not-cached-in-SOLR-tp4120159p4120547.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Date query not returning results only some time

2014-02-28 Thread Chris Hostetter

: This is odd. The full import, I think, deletes the
: docs in the index when it starts.

Yeah, if you are doing a full-import every day, and you don't want it to 
delete all docs when it starts, you need to specify "clean=false"

https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-Parametersforthefull-importCommand
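
e.g., on the DIH request itself (host and core name assumed):

http://master:8983/solr/yourcore/dataimport?command=full-import&clean=false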



-Hoss
http://www.lucidworks.com/


Re: Date query not returning results only some time

2014-02-28 Thread Arun Rangarajan
Thx, Erick and Chris.

This is indeed very strange. Other queries which do not restrict by the
date field are returning results, so the index is definitely not empty. Has
it got something to do with the date query part, with NOW/DAY or something
in here?
first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS]

For now, I have set up a script to just log the number of docs on the slave
every minute. Will monitor and report the findings.


On Fri, Feb 28, 2014 at 6:49 PM, Chris Hostetter
wrote:

>
> : This is odd. The full import, I think, deletes the
> : docs in the index when it starts.
>
> Yeah, if you are doing a full-import every day, and you don't want it to
> delete all docs when it starts, you need to specify "clean=false"
>
>
> https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-Parametersforthefull-importCommand
>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Solr is NoSQL database or not?

2014-02-28 Thread nutchsolruser
You may think this is a silly question, but let me ask it because I am
confused:
http://www.lucidworks.com/webinar-solr-4-the-nosql-search-server/  this says
Solr is NoSQL, but many other links don't have Solr in their list of NoSQL
databases.

http://en.wikipedia.org/wiki/NoSQL
http://en.wikipedia.org/wiki/Document-oriented_database

It's really confusing: what is the real meaning of "NoSQL database"?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-is-NoSQL-database-or-not-tp4120554.html
Sent from the Solr - User mailing list archive at Nabble.com.


Adding filter query slows down avg response time

2014-02-28 Thread nutchsolruser
I am finding users with the same nickname in a certain area. When I send both
queries in the q parameter to Solr, it works really fast, but if I send the
location query in fq, it slows down too much. Why is that? Why does adding fq
to the query degrade my performance?

nickname:"nick name"
{!geofilt pt=20.2284,80.2284 sfield=ps_lat_long d=4}



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-filter-query-slows-down-avg-response-time-tp4120555.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Date query not returning results only some time

2014-02-28 Thread Erick Erickson
Well, I'd certainly try removing parts of the query to see
what was actually in the index.

I don't see anything obvious though...

Erick


On Fri, Feb 28, 2014 at 8:06 PM, Arun Rangarajan
wrote:

> Thx, Erick and Chris.
>
> This is indeed very strange. Other queries which do not restrict by the
> date field are returning results, so the index is definitely not empty. Has
> it got something to do with the date query part, with NOW/DAY or something
> in here?
> first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS]
>
> For now, I have set up a script to just log the number of docs on the slave
> every minute. Will monitor and report the findings.
>
>
> On Fri, Feb 28, 2014 at 6:49 PM, Chris Hostetter
> wrote:
>
> >
> > : This is odd. The full import, I think, deletes the
> > : docs in the index when it starts.
> >
> > Yeah, if you are doing a full-import every day, and you don't want it to
> > delete all docs when it starts, you need to specify "clean=false"
> >
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-Parametersforthefull-importCommand
> >
> >
> >
> > -Hoss
> > http://www.lucidworks.com/
> >
>


Re: Solr is NoSQL database or not?

2014-02-28 Thread Gora Mohanty
On 1 March 2014 09:39, nutchsolruser  wrote:
> You may think this is a silly question, but let me ask it because I am
> confused:
> http://www.lucidworks.com/webinar-solr-4-the-nosql-search-server/  this says
> Solr is NoSQL, but many other links don't have Solr in their list of NoSQL
> databases.
>
> http://en.wikipedia.org/wiki/NoSQL
> http://en.wikipedia.org/wiki/Document-oriented_database
>
>  It's really confusing: what is the real meaning of "NoSQL database"?

Rather than looking for buzzword compliance, maybe you should
ask what features you need out of Solr. We have used Solr as
a NoSQL data store, but for something like that, plus search, Solr
+ Cassandra look like a good bet.

Regards,
Gora


Re: Adding filter query slows down avg response time

2014-02-28 Thread nutchsolruser
Found an answer here; maybe it's because my filter query is changing for each
new user. Better if I keep it in the main query:
http://lucene.472066.n3.nabble.com/fq-vs-q-td495570.html
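
Another option: local params let it stay a filter but skip the filter cache
(same values as in my earlier post; I have not benchmarked this):

fq={!geofilt cache=false pt=20.2284,80.2284 sfield=ps_lat_long d=4}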



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-filter-query-slows-down-avg-response-time-tp4120555p4120559.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Date query not returning results only some time

2014-02-28 Thread Arun Rangarajan
I believe I figured out what the issue is. Even though we do not open a new
searcher on the master during full import, the slave replicates the index
after auto commits anyway! (Is this desired behavior?) Since clean=true,
this meant all the docs were deleted on the slave and a partial index got
replicated! The reason only the date query did not return any results is
that recently created docs have higher doc IDs and we index in ascending
order of IDs!

I believe I have two options:
- As Chris suggested, I have to use "clean=false" so the existing docs are
not deleted first on the slave. Since we have primary keys, newly added
docs will overwrite old docs as they get added.
- Disable replication after commits; replicate only after optimize, as
sketched below.
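
For the second option, the master section of the replication handler would
keep only the optimize trigger, something like this (a sketch based on our
existing config):

<lst name="master">
  <str name="enable">${enable.master:false}</str>
  <str name="replicateAfter">optimize</str>
  <str name="confFiles">solrconfig.xml,data-config.xml,schema.xml,stopwords.txt,synonyms.txt,elevate.xml</str>
</lst>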

Thx all for your help.





On Fri, Feb 28, 2014 at 8:06 PM, Arun Rangarajan
wrote:

> Thx, Erick and Chris.
>
> This is indeed very strange. Other queries which do not restrict by the
> date field are returning results, so the index is definitely not empty. Has
> it got something to do with the date query part, with NOW/DAY or something
> in here?
> first_publish_date:[NOW/DAY-33DAYS TO NOW/DAY-3DAYS]
>
> For now, I have set up a script to just log the number of docs on the
> slave every minute. Will monitor and report the findings.
>
>
> On Fri, Feb 28, 2014 at 6:49 PM, Chris Hostetter  > wrote:
>
>>
>> : This is odd. The full import, I think, deletes the
>> : docs in the index when it starts.
>>
>> Yeah, if you are doing a full-import every day, and you don't want it to
>> delete all docs when it starts, you need to specify "clean=false"
>>
>>
>> https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-Parametersforthefull-importCommand
>>
>>
>>
>> -Hoss
>> http://www.lucidworks.com/
>>
>
>