Is DataImportHandler Thread-Safe?

2009-12-19 Thread gurudev

Hi,
Just wanted to know: is the DataImportHandler available in Solr 1.3
thread-safe? I would like to run multiple instances of the DataImportHandler
concurrently, each posting a different set of data from the DB to the index.

Can I do this by registering the DIH multiple times under different names in
solrconfig.xml and then invoking them all concurrently to achieve maximum
throughput? Would I need to define a separate data-config.xml and
dataimport.properties for each DIH?

Also, would it be possible to specify the query in data-config.xml so that
some SQL clause restricts one DIH from overlapping the data set fetched by
another?
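
[For illustration, a minimal sketch of what such a setup could look like.
The handler class is the stock DIH class from Solr 1.3's contrib; the config
file names and the modulo-based WHERE clauses are hypothetical, just one way
(SQL dialect permitting) to keep the two handlers' data sets from
overlapping:

   <!-- solrconfig.xml: two independently named DIH instances -->
   <requestHandler name="/dataimport1"
       class="org.apache.solr.handler.dataimport.DataImportHandler">
     <lst name="defaults">
       <str name="config">data-config-1.xml</str>
     </lst>
   </requestHandler>
   <requestHandler name="/dataimport2"
       class="org.apache.solr.handler.dataimport.DataImportHandler">
     <lst name="defaults">
       <str name="config">data-config-2.xml</str>
     </lst>
   </requestHandler>

   <!-- data-config-1.xml: fetch only the even rows -->
   <entity name="item" query="SELECT * FROM item WHERE MOD(id, 2) = 0"/>

   <!-- data-config-2.xml: fetch only the odd rows -->
   <entity name="item" query="SELECT * FROM item WHERE MOD(id, 2) = 1"/>

Each instance would then be kicked off separately, e.g.
http://host:8983/solr/dataimport1?command=full-import and
http://host:8983/solr/dataimport2?command=full-import.]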




Re: Optimize not having any effect on my index

2009-12-20 Thread gurudev

Hi,

Are you using the compound file format? If so, have you set it properly in
solrconfig.xml? If not, change it to:

   <useCompoundFile>true</useCompoundFile>

(this is 'false' by default) under both the <indexDefaults>...</indexDefaults>
and <mainIndex>...</mainIndex> sections.

Aleksander Stensby wrote:
> 
> Hey guys,
> I'm getting some strange behavior here, and I'm wondering if I'm doing
> anything wrong.
> 
> I've got an unoptimized index, and I'm trying to run the following
> command:
> http://server:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false
> I tried it first directly in the browser; it obviously took quite a bit
> of time, but once it finished I see no difference in my index. Same
> number of files, same size, etc.
> So I tried with curl:
> curl http://server:8983/solr/update --data-binary '<optimize/>' -H
> 'Content-type:text/xml; charset=utf-8'
> 
> No difference here either... Am I doing anything wrong? Do I need to
> issue a commit after the optimize?
> 
> Any pointers would be greatly appreciated.
> 
> Cheers,
>  Aleks
> 
> 




Search in SOLR multi cores in a single request

2008-11-05 Thread gurudev

I have been reading the Solr 1.3 wiki, which says that to fetch documents
from each core in a multi-core setup we need to request each core
independently.

I was under the impression that Solr's multi-core feature might use
Lucene's MultiSearcher to search across multiple cores.

Anyone with a clarification? :)
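
[For reference, this is what the wiki's "request each core independently"
looks like with two hypothetical cores named core0 and core1; the results
would then have to be merged on the client side:

   http://localhost:8983/solr/core0/select?q=ipod
   http://localhost:8983/solr/core1/select?q=ipod]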



Re: Field collapsing (SOLR-236) and Solr 1.3.0 release version

2008-11-20 Thread gurudev

We are about to release field collapsing on our production site, though our
index is not as big as yours. Collapsing is definitely an added overhead.
Since SOLR-236 is currently available only as a patch, do some load testing
and benchmark it on a dataset like the one you expect in production.

Secondly, provide a mechanism to switch collapsing off/on depending on the
condition of the production servers.

One thing you can go with is "adjacent" field collapsing rather than simple
collapsing: for simple collapsing, Solr internally first sorts the results
on the collapse field, which is not the case with "adjacent" collapsing.
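
[For reference, a collapsing request with the SOLR-236 patch looks roughly
like this; the parameter names are from the patch and may vary between patch
versions, and the field name is hypothetical:

   http://server:8983/solr/select?q=jacket&collapse.field=brand&collapse.type=adjacent

With collapse.type=adjacent, only consecutive documents sharing the same
collapse-field value are collapsed, which avoids the internal sort that
simple collapsing performs.]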



Stephen Weiss-2 wrote:
> 
> Hi,
> 
> A requirement has come up in a project where we're going to need to  
> group by a field in the result set.  I looked into the SOLR-236 patch  
> and it seems there are a couple versions out now that are supposed to  
> work against the Solr 1.3.0 release.
> 
> This is a production site, it really can't be running anything that's  
> going to crash or take up too many resources.  I wanted to check with  
> the list and see if anyone is using this patch with the Solr 1.3.0  
> release and if it is stable enough / performs well enough for serious  
> usage.  We have an index of 3M+ documents and a grouped result set  
> would be about 50-75% the total size of the ungrouped results.
> 
> Thanks for any information or pointers.
> 
> --
> Steve Weiss
> Stylesight
> 
> 




Re: Multi word Synonym

2008-11-20 Thread gurudev

Just use the query analysis page with appropriate values. It will show how
each tokenizer and filter factory breaks up the terms at the various levels
of analysis. In particular, check the EnglishPorterFilterFactory analysis.
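
[In Solr 1.x the analysis page lives in the admin UI, typically at a URL
like the following; host and port are assumptions:

   http://localhost:8983/solr/admin/analysis.jsp

Enter the field name, an index value such as "The North Face", and a query
value such as "morth fac", and enable verbose output to see the token stream
emitted after each tokenizer and filter stage.]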




Jeff Newburn wrote:
> 
> I am trying to figure out how the synonym filter processes multi-word
> inputs.  I have checked the analyzer in the GUI with some confusing
> results.
> The indexed field has "The North Face" as a value. The synonym file has:
> 
> morthface, morth face, noethface, noeth face, norhtface, norht face,
> nortface, nort face, northfac, north fac, northfac3e, north fac3e,
> northface, north face, northfae, north fae, northfaqce, north faqce,
> northfave, north fave, northhace, north hace, nothface, noth face,
> thenorhface, the norh face, thenorth, the north, thenorthandface, the
> north and face, thenortheface, the northe face, thenorthfac, the north
> fac, thenorthface, thenorthfacee, the north facee, thenothface, the noth
> face, thenotrhface, the notrh face, thenrothface, the nroth face, tnf =>
> The North Face
> 
> I have the field type using the WhitespaceTokenizer before the synonyms
> are run.  My confusion on this is that when the term "morth fac" is run,
> somehow the system knows to map it to the correct term even though the
> term is not present in the file.
> 
> How is this happening?  Is the synonym process tokenizing as well?
> 
> The datatype schema is as follows:
> 
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.EnglishPorterFilterFactory"
>             protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
> 
> -Jeff
> 
> 
> 




SOLR OOM (out of memory) problem

2008-05-21 Thread gurudev

Hi,

We currently host an index of approx. 12 GB on 5 Solr slave machines, which
are load-balanced in a cluster. At some point, usually after 8-10 hours, a
Solr slave throws an out-of-memory error, after which it simply stops
responding; it then requires a restart, and after the restart it works
perfectly. Sometimes we notice long query-processing times on specific Solr
slaves, after which we restart them just to avoid any forthcoming problem.
Can anyone suggest how to avoid the OOM problem? Our Solr slaves are
read-only, and we do the incremental updates only at night. Below is a
snapshot of the OOM error we get:


SEVERE: java.lang.OutOfMemoryError: Java heap space
        at org.apache.solr.util.OpenBitSet.<init>(OpenBitSet.java:87)
        at org.apache.solr.search.DocSetHitCollector.collect(DocSetHitCollector.java:61)
        at org.apache.solr.search.SolrIndexSearcher$9.collect(SolrIndexSearcher.java:1064)
        at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:292)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:133)
        at org.apache.lucene.search.Searcher.search(Searcher.java:117)
        at org.apache.lucene.search.Searcher.search(Searcher.java:96)
        at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1061)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:801)
        at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1237)
        at org.apache.solr.request.DisMaxRequestHandler.handleRequestBody(DisMaxRequestHandler.java:315)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
        at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
        at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
        at org.apache.tomcat.util.net.MasterSlaveWorkerThread.run(MasterSlaveWorkerThread.java:112)






Re: SOLR OOM (out of memory) problem

2008-05-21 Thread gurudev

Just to add more:

The JVM heap allocated is 6 GB, with an initial heap size of 2 GB. We use
quadro (8-CPU) Linux servers for the Solr slaves. We use facet searches and
sorting.
The document cache is set to 7 million (which is the total number of
documents in the index); the filter cache to 1.




gurudev wrote:
> 
> Hi
> 
> We currently host an index of approx. 12 GB on 5 Solr slave machines,
> which are load-balanced in a cluster. At some point, usually after 8-10
> hours, a Solr slave throws an out-of-memory error, after which it simply
> stops responding; it then requires a restart, and after the restart it
> works perfectly. [...]
> 




Re: What are stopwords and protwords?

2008-05-21 Thread gurudev

Hi Akeel,

- Stopwords are common words of a language which, as such, do not carry much
meaning in searches: a, an, the, where, who, am, etc. The analyzer in Lucene
ignores such words and does not index them. You can also specify your own
stopwords in SOLR's stopwords.txt.

- Protwords are words that you do not want to be stemmed. (With stemming,
manager/managing/managed/manageable are all indexed as ---> manag; the same
happens at search time.) If you do not want a particular word to be stemmed
at index/search time, just put it in SOLR's protwords.txt.
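
[A minimal sketch of how these two files are wired into a field type in
schema.xml; these are the stock Solr 1.x filter factories that consume them:

   <filter class="solr.StopFilterFactory" ignoreCase="true"
           words="stopwords.txt"/>
   <filter class="solr.EnglishPorterFilterFactory"
           protected="protwords.txt"/>

Both files are plain text with one entry per line, e.g.:

   # stopwords.txt
   a
   an
   the

   # protwords.txt (never stemmed)
   manageable]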




Akeel wrote:
> 
> Hi,
> 
> I am a beginner to Solr, I have successfully indexed my db in solr. I want
> to know that what are the stopwords and protwords ??? and how much they
> have
> effect on my search results ?
> 
>  
> 
> Thanks in advance.
> 
>  
> 
> --
> 
> Akeel
> 
> 
> 




RE: SOLR OOM (out of memory) problem

2008-05-22 Thread gurudev

Hi Rong,

My cache hit ratios are:

filterCache: 0.96
documentCache: 0.51
queryResultCache: 0.58
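
[For reference, these ratios come from Solr's admin stats page, e.g.
http://server:8983/solr/admin/stats.jsp — the same page Yongjun mentions
below.]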

Thanx
Pravesh


Yongjun Rong-2 wrote:
> 
> I had the same problem some weeks ago. You can try these:
> 1. Check the hit ratio for the caches via solr/admin/stats.jsp. If a
> hit ratio is very low, just disable that cache; it will save you some
> memory.
> 2. Setting -Xms and -Xmx to the same size will help improve GC
> performance.
> 3. Check which GC you use. The default is the parallel collector; you can
> try the concurrent GC, which helps a lot.
> 4. These are my Sun HotSpot JVM startup options: -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=50 -XX:-UseGCOverheadLimit
> None of this solves OOM forever, but it helps a lot.
> Hope this can help.
> 
> -Original Message-
> From: Mike Klaas [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, May 21, 2008 2:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR OOM (out of memory) problem
> 
> 
> On 21-May-08, at 4:46 AM, gurudev wrote:
> 
>>
>> Just to add more:
>>
>> The JVM heap allocated is 6GB with initial heap size as 2GB. We use
>> quadro (8-CPU) Linux servers for the Solr slaves.
>> We use facet searches and sorting.
>> The document cache is set to 7 million (the total number of documents
>> in the index); the filter cache to 1.
> 
> You definitely don't have enough memory to keep 7 million documents,
> fully realized in java-object form, in memory.
> 
> Nor would you want to.  The document cache should aim to keep the most
> frequently-occurring documents in memory (in the thousands, perhaps tens
> of thousands).  By devoting more memory to the OS disk cache, more of
> the 12GB index can be cached by the OS, speeding up all document
> retrieval.
> 
> -Mike
> 
> 




Re: SOLR OOM (out of memory) problem

2008-05-23 Thread gurudev

One correction:

I have set the documentCache as:

initialSize=512
size=710
autowarmCount=512

The total number of insertions into the documentCache goes up to at most
45 in a day, with 0 evictions, which means it never grows to 710.
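
[In solrconfig.xml terms that corresponds roughly to the following entry (a
sketch; solr.LRUCache is the stock cache implementation):

   <documentCache class="solr.LRUCache"
                  size="710"
                  initialSize="512"
                  autowarmCount="512"/>

Note that Solr does not actually autowarm the document cache across
searchers, since internal Lucene document IDs change, so the autowarmCount
here has no effect.]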

Thanx


Mike Klaas wrote:
> 
> 
> On 22-May-08, at 4:27 AM, gurudev wrote:
> 
>>
>> Hi Rong,
>>
>> My cache hit ratio are:
>>
>> filtercache: 0.96
>> documentcache:0.51
>> queryresultcache:0.58
> 
> Note that you may be able to reduce the _size_ of the document cache  
> without materially affecting the hit rate, since typically some  
> documents are much more frequently accessed than others.
> 
> I'd suggest starting with 700k, which I would still consider a large  
> cache.
> 
> -Mike
> 
> 
> 
