Re: Solr 4.1 over Websphere errors

2013-06-06 Thread Anria
Thank you 

This sure is a lot to chew on



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-1-over-Websphere-errors-tp4068715p4068740.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can't find solr.xml

2013-06-06 Thread Anria
Nabeel,

I just want to say that, though this post is very old, of all the advice on
the entire internet about this error, your suggestion of moving out of
/home//solr into /opt/solr was the one that worked for me too.

Thank you! 
Anria



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-t-find-solr-xml-tp3992267p4068768.html
Sent from the Solr - User mailing list archive at Nabble.com.


&fq degrades qtime in a 20million doc collection

2016-01-13 Thread Anria B.
hi all, 

I have a really fun question to ask.  I'm sitting here looking at what is by
far the beefiest box I've ever seen in my life: 256GB of RAM, terabytes of
disk space, the works, on a properly partitioned Linux server.

Yet what we are seeing goes against all the intuition I've built up in the
Solr world:

1.  Collection has 20-30 million docs.
2.  q=*&fq=someField:SomeVal   -->  takes 2.5 seconds
3.  q=someField:SomeVal        -->  300ms
4.  As numFound -> infinity, qtime -> infinity.

Have any of you encountered such a thing, where an fq degrades query time by
so much?

It's pure Solr 5.3.1: ZooKeeper + Tomcat 8 + 1 shard in Solr, on JDK 8u60,
all running on this same box.
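For reference, the two query shapes being compared can be generated with a short sketch like this (the host, collection name, and field/value are placeholders, not from our setup):

```python
# Build the two query URLs being compared: the restriction as a filter query
# versus the same restriction in the main query. Host, collection, and field
# names are illustrative placeholders.
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/mycollection/select"

def fq_query(field, value):
    """Match-all q with the restriction pushed into an fq."""
    return BASE + "?" + urlencode({"q": "*:*", "fq": f"{field}:{value}", "rows": 0})

def q_query(field, value):
    """The same restriction expressed directly in q."""
    return BASE + "?" + urlencode({"q": f"{field}:{value}", "rows": 0})

print(fq_query("someField", "SomeVal"))
print(q_query("someField", "SomeVal"))
```

Timing each form with &debug=timing is what exposed the gap between them.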

We have already tested different autoCommit strategies and different values
for heap size, starting at 16GB, then 32GB, 64GB, 128GB...  The only place
we saw a 100ms improvement was between -Xmx=32GB and -Xmx=64GB.

Thanks 
Anria 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/fq-degrades-qtime-in-a-20million-doc-collection-tp4250567.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: &fq degrades qtime in a 20million doc collection

2016-01-13 Thread Anria B.
hi Shawn

Thanks for the quick answer.  As for the q=*, we also saw similar results in
our testing when doing things like

q=somefield:qval
&fq=otherfield:fqval

which is a pure Lucene query.  I simplified things somewhat, since our
results were always that as numFound got large, the query time degraded as
soon as we added any &fq to the mix.

We also saw similar results for queries like 

q=query stuff
&defType=edismax
&df=afield
&qf=afield bfield cfield


So the query structure was not what created the 3-7 second query times;
there was always a correlation between having an &fq in the query and the
size of numFound.  We've run numerous load tests that bring in a good query
with fq values via the "newSearcher", with caches on and caches off, and
this same phenomenon persisted.

As for Tomcat, it's easy enough to run the same test under Jetty, and we
will certainly try that!  For GC we've had both the default and G1 setups.

Thanks for giving us something to think about

Anria 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/fq-degrades-qtime-in-a-20million-doc-collection-tp4250567p4250600.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: &fq degrades qtime in a 20million doc collection

2016-01-14 Thread Anria B.
hi all, 

We did try q=queryA AND queryB vs. q=queryA&fq=queryB.  For all tests we
commented out caching and reloaded the core between queries, to be ultra
sure we were getting good time comparisons.

We have so many unique fq values and such frequent commits that caches are
always invalidated, so our tests were for the most part run with the caches
commented out.  Further, we have seen some gains from an autoCommit strategy
of 10 or 15 seconds, but the first queries are still horrible.
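For context, a 10-15 second autoCommit of the kind described corresponds to a solrconfig.xml fragment roughly like this (the exact values and the soft/hard split are illustrative, not our production settings):

```xml
<!-- solrconfig.xml sketch: hard commit every 15s without opening a new
     searcher, soft commit for visibility; values are illustrative only -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>15000</maxTime>
</autoSoftCommit>
```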

We also tried setting some of these &fq in the "newSearcher" so that at
least the OS cache is warmed once before the searcher is registered as
available.
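A newSearcher warming entry of that kind would look roughly like this in solrconfig.xml (the fq field and value are placeholders):

```xml
<!-- solrconfig.xml sketch: run a common fq when a new searcher opens, so
     its first real query doesn't pay the warm-up cost; field and value
     are placeholders -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">someField:someCommonValue</str>
      <str name="rows">0</str>
    </lst>
  </arr>
</listener>
```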

The index size is around 121GB, so it's just outside of modest size but not
yet in the unacceptable range.  The docs are all modest in content: small
pieces, mostly strings.  Think metadata of PDF files for the most part, not
even their OCR content, just good, well-defined metadata.

[quote]
 How much memory are you giving the JVM? Are you autowarming? Are you
indexing while this is going on, and if what are your commit parameters? If
you add &debug=true to your query, one of 
the returned sections 
[/quote]

We tried with several sizes of heap; the gains were minimal, and above that
there was no gain at all.
If we use autowarming in either the filterCache or a newSearcher query, the
warming takes too long, several newSearcher instances get created, and we
start seeing "exceeded limit of maxWarmingSearchers" errors.

It's by using &debug=true and &debug=timing that we isolated this: the query
time took the longest, and sometimes prepare takes a little time too.
Forget it if we add a facet; that adds another 500+ ms at the low end...

Very perplexing and fun challenge.  Thanks Toke for the pointers on heap
size; we will dial the heap size down.

Anria 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/fq-degrades-qtime-in-a-20million-doc-collection-tp4250567p4250798.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can we create multiple cluster in single Zookeeper instance

2016-01-14 Thread Anria B.
hi Mugeesh

It's best to use ZooKeeper as it was intended: install and run 3 of them,
independent of any Solr, then point Solr at the ZooKeeper ensemble.

You can have just 1, but then, if anything happens to that single ZooKeeper
node, all of your Solr will be dead until you can properly revive it from a
back-up, and if it's terribly corrupted, perhaps not even then.
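As a sketch, pointing a Solr 5.x node at an external three-node ensemble looks like this (hostnames and ports are placeholders):

```shell
# Start Solr in cloud mode against an external 3-node ZooKeeper ensemble;
# hostnames are placeholders
bin/solr start -cloud -z zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
```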

Hope this helps
Anria



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-we-create-multiple-cluster-in-single-Zookeeper-instance-tp4250791p4250810.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: &fq degrades qtime in a 20million doc collection

2016-01-14 Thread Anria B.
Here are some actual examples, if it helps:

wt=json&q=*:*&indent=on&fq=SolrDocumentType:"invalidValue"&fl=timestamp&rows=0&start=0&debug=timing

{
  "responseHeader": {
    "status": 0,
    "QTime": 590,
    "params": {
      "q": "*:*",
      "debug": "timing",
      "indent": "on",
      "fl": "timestamp",
      "start": "0",
      "fq": "SolrDocumentType:\"invalidValue\"",
      "rows": "0",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 22916435,
    "start": 0,
    "docs": []
  },
  "debug": {
    "timing": {
      "time": 590,
      "prepare": {
        "time": 0,
        "query": { "time": 0 },
        "facet": { "time": 0 },
        "mlt": { "time": 0 },
        "highlight": { "time": 0 },
        "stats": { "time": 0 },
        "debug": { "time": 0 }
      },
      "process": {
        "time": 590,
        "query": { "time": 590 },
        "facet": { "time": 0 },
        "mlt": { "time": 0 },
        "highlight": { "time": 0 },
        "stats": { "time": 0 },
        "debug": { "time": 0 }
      }
    }
  }
}

Now we wipe out all caches and put the filter in q:

wt=json&q=SolrDocumentType:"invalidValue"&indent=on&fl=timestamp&rows=0&start=0&debug=timing
{
  "responseHeader": {
    "status": 0,
    "QTime": 266,
    "params": {
      "q": "SolrDocumentType:\"invalidValue\"",
      "debug": "timing",
      "indent": "on",
      "fl": "timestamp",
      "start": "0",
      "rows": "0",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 22916435,
    "start": 0,
    "docs": []
  },
  "debug": {
    "timing": {
      "time": 266,
      "prepare": {
        "time": 0,
        "query": { "time": 0 },
        "facet": { "time": 0 },
        "mlt": { "time": 0 },
        "highlight": { "time": 0 },
        "stats": { "time": 0 },
        "debug": { "time": 0 }
      },
      "process": {
        "time": 266,
        "query": { "time": 266 },
        "facet": { "time": 0 },
        "mlt": { "time": 0 },
        "highlight": { "time": 0 },
        "stats": { "time": 0 },
        "debug": { "time": 0 }
      }
    }
  }
}



--
View this message in context: 
http://lucene.472066.n3.nabble.com/fq-degrades-qtime-in-a-20million-doc-collection-tp4250567p4250823.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: &fq degrades qtime in a 20million doc collection

2016-01-14 Thread Anria B.
Here is a stack trace from when we put an &fq in the autowarming, or in the
"newSearcher", to warm up the collection after a commit.



2016-01-12 19:00:13,216 [http-nio-19082-exec-25
vaultThreadId:http-STAGE-30518-14 vaultSessionId:1E53A095AD22704
vaultNodeId:nodeId:node-2 vaultInstanceId:2228 vaultUserId:9802] INFO 
org.apache.solr.update.UpdateHandler - start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
2016-01-12 19:00:13,217 [http-nio-19082-exec-25
vaultThreadId:http-STAGE-30518-14 vaultSessionId:1E53A095AD22704
vaultNodeId:nodeId:node-2 vaultInstanceId:2228 vaultUserId:9802] WARN 
org.apache.solr.core.SolrCore - [instance_2228] Error opening new searcher.
exceeded limit of maxWarmingSearchers=10, try again later.
2016-01-12 19:00:13,217 [http-nio-19082-exec-25
vaultThreadId:http-STAGE-30518-14 vaultSessionId:1E53A095AD22704
vaultNodeId:nodeId:node-2 vaultInstanceId:2228 vaultUserId:9802] INFO 
org.apache.solr.update.processor.LogUpdateProcessor - [instance_2228]
webapp=/solr path=/update
params={waitSearcher=true&commit=true&softCommit=true&wt=javabin&version=2}
{} 0 0
2016-01-12 19:00:13,217 [http-nio-19082-exec-25
vaultThreadId:http-STAGE-30518-14 vaultSessionId:1E53A095AD22704
vaultNodeId:nodeId:node-2 vaultInstanceId:2228 vaultUserId:9802] ERROR
org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: Error
opening new searcher. exceeded limit of maxWarmingSearchers=10, try again
later.
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1759)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:609)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:95)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:64)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1635)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1612)
    at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:161)
    at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
    at veeva.ecm.common.interfaces.web.SolrDispatchOverride.doFilter(SolrDispatchOverride.java:44)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1096)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1500)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1456)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/fq-degrades-qtime-in-a-20million-doc-collection-tp4250567p4250836.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: &fq degrades qtime in a 20million doc collection

2016-01-14 Thread Anria B.
hi Shawn

Thanks for your comprehensive answers; I really appreciate it.  Just for
clarity, the numbers I posted here were from tests in which we isolated only
a single fq and a q.  Those have decent times, even though it's almost
600ms.  Once we are in application mode, with other fq's and facets etc.
added, query times get as bad as 7 seconds (which I personally observed).

But you did give us a lot to work with, especially in the arena of commit
strategies and cache usage.  We'll do some more tests with different
strategies in this area.

Thanks
Anria



--
View this message in context: 
http://lucene.472066.n3.nabble.com/fq-degrades-qtime-in-a-20million-doc-collection-tp4250567p4250855.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: &fq degrades qtime in a 20million doc collection

2016-01-15 Thread Anria B.
Thanks Toke for this.  It gave us a ton to think about, and it really
supports the notion of several smaller indexes over one very large one: we
would rather distribute a few JVM processes with a smaller heap each than
run one massive process that is, according to this, less efficient.



Toke Eskildsen wrote
> I would guess the 100 ms improvement was due to a factor not related to
> heap size. With the exception of a situation where the heap is nearly
> full, increasing Xmx will not improve Solr performance significantly.
> 
> Quick note: Never set Xmx in the range 32GB-40GB (40GB is approximate):
> At the 32GB point, the JVM switches to larger pointers, which means that
> effective heap space is _smaller_ for Xmx=33GB than it is for Xmx=31GB:
> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
> 
> - Toke Eskildsen, State and University Library, Denmark
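Whether compressed object pointers are still in effect at a given -Xmx can be checked directly; this is a diagnostic sketch, and the exact flag output varies by JVM version:

```shell
# Below roughly 32GB this should report UseCompressedOops = true;
# above the threshold it flips to false and each pointer doubles in size
java -Xmx31g -XX:+PrintFlagsFinal -version | grep UseCompressedOops
```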





--
View this message in context: 
http://lucene.472066.n3.nabble.com/fq-degrades-qtime-in-a-20million-doc-collection-tp4250567p4251176.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: &fq degrades qtime in a 20million doc collection

2016-01-15 Thread Anria B.
hi Yonik

We definitely didn't overlook that q=* is a wildcard scan; we just had so
many systemic problems to focus on that I neglected to thank Shawn for that
particular piece of useful information.

I must admit I seriously never knew this.  Ever since q=* was allowed I was
so happy that it never occurred to me to investigate its details.  Now I
know :)

Combining all the information from everybody here really brought home where
our shortcomings were:

1. Yes, q=* was quickly replaced by q=*:* everywhere - a quick win.
2. Caching strategies are being reformed.
3. We're looking into making smaller shards/cores.  Since we require super
frequent commits, commit times on the smaller bitsets should be far lower,
and we can use smaller heap sizes to stay optimized in that realm.

One last question though, please:

Schema investigations: the &fq are frequently on multivalued string fields,
and we believe that may be slowing the &fq down even more, but we were
wondering why.  When we run an &fq on single-valued fields it is faster than
on multivalued fields, even when the multivalued fields frequently hold only
a single value.
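Where the data genuinely holds one value per document, declaring the field single-valued lets Solr use the cheaper single-valued code paths; a schema.xml sketch (the field name is a placeholder):

```xml
<!-- schema.xml sketch: single-valued string field with docValues;
     field name is a placeholder -->
<field name="someField" type="string" indexed="true" stored="true"
       multiValued="false" docValues="true"/>
```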

Thanks again for everybody's help and pointers and hints, you kept us busy
with changing our mindset on a lot of things here.

Regards
Anria



--
View this message in context: 
http://lucene.472066.n3.nabble.com/fq-degrades-qtime-in-a-20million-doc-collection-tp4250567p4251212.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: custom field tutorial

2013-06-07 Thread Anria Billavara

You seem to know what you want the words to map to, so index the map.  Have
one field for the word and one field for the mapped value, and at query time
search the words and return the mapped field.  If it is comma separated, so
be it: split it up in your code after the search.
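The "index the map" idea can be sketched with an in-memory stand-in for the index (field names are invented, and the boo/baz/bar codes come from the quoted question):

```python
# "Index the map": each document carries both the word and its mapped code,
# so a search on the word field returns the code with no custom field type.
# The list-based "index" is an illustrative stand-in for Solr documents.
docs = [
    {"id": "1", "word": "boo", "code": 1},
    {"id": "2", "word": "baz", "code": 2},
    {"id": "3", "word": "bar", "code": 3},
]

def search(word):
    """Return the mapped codes of documents whose word field matches."""
    return [d["code"] for d in docs if d["word"] == word]

print(search("baz"))  # [2]
```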
Otherwise, same as Wunder: in my many years in search, this is an odd request.
Anria

Sent from my Samsung smartphone on AT&T

 Original message 
Subject: Re: custom field tutorial 
From: Walter Underwood  
To: solr-user@lucene.apache.org 
CC:  

What are you trying to do? This seems really odd. I've been working in search 
for fifteen years and I've never heard this request.

You could always return all the fields to the client and ignore the ones you 
don't want.

wunder

On Jun 7, 2013, at 8:24 PM, geeky2 wrote:

> can someone point me to a "custom field" tutorial.
> 
> i checked the wiki and this list - but still a little hazy on how i would do
> this.
> 
> essentially - when the user issues a query, i want my class to interrogate a
> string field (containing several codes - example boo, baz, bar) 
> 
> and return a single integer field that maps to the string field (containing
> the code).
> 
> example: 
> 
> boo=1
> baz=2
> bar=3
> 
> thx
> mark
>