Is DataImportHandler thread-safe?
Hi,

Just wanted to know: is the DataImportHandler that ships with Solr 1.3 thread-safe? I would like to run multiple DataImportHandler instances concurrently, each posting a different set of data from the DB to the index. Can I do this by registering the DIH multiple times under different names in solrconfig.xml and then invoking them all concurrently to achieve maximum throughput? Would I need to define a separate data-config.xml and dataimport.properties for each DIH? And would it be possible to restrict, through SQL clauses in each data-config.xml, the data set one DIH fetches so that it does not overlap another's?
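What I have in mind in solrconfig.xml is something like the sketch below (the handler names and config file names are made up):

    <requestHandler name="/dataimport-a"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <!-- each handler reads its own config, and so its own queries -->
        <str name="config">data-config-a.xml</str>
      </lst>
    </requestHandler>

    <requestHandler name="/dataimport-b"
                    class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config-b.xml</str>
      </lst>
    </requestHandler>

Keeping the fetched sets disjoint would then be up to the SQL in each data-config.xml, e.g. one entity selecting WHERE id % 2 = 0 and the other WHERE id % 2 = 1 (a made-up partitioning).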
Re: Optimize not having any effect on my index
Hi,

Are you using the compound file format? If so, have you set it properly in solrconfig.xml? If not, change <useCompoundFile> to true (it is 'false' by default) under both the <indexDefaults> ... </indexDefaults> and <mainIndex> ... </mainIndex> sections.

Aleksander Stensby wrote:
>
> Hey guys,
> I'm getting some strange behavior here, and I'm wondering if I'm doing
> anything wrong.
>
> I've got an unoptimized index, and I'm trying to run the following
> command:
> http://server:8983/solr/update?optimize=true&maxSegments=10&waitFlush=false
> I tried it first directly in the browser; it obviously took quite a bit of
> time, but once it was finished I see no difference in my index. Same
> number of files, same size, etc.
> So I tried with curl:
> curl http://server:8983/solr/update --data-binary '<optimize/>' -H
> 'Content-type:text/xml; charset=utf-8'
>
> No difference here either... Am I doing anything wrong? Do I need to issue
> a commit after the optimize?
>
> Any pointers would be greatly appreciated.
>
> Cheers,
> Aleks
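For reference, in the stock Solr 1.3 solrconfig.xml layout that setting sits in both index sections:

    <indexDefaults>
      ...
      <useCompoundFile>true</useCompoundFile>
      ...
    </indexDefaults>

    <mainIndex>
      <useCompoundFile>true</useCompoundFile>
      ...
    </mainIndex>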
Searching multiple SOLR cores in a single request
I have been reading the SOLR 1.3 wiki, which says that to fetch documents from each core in a multi-core setup we need to request each core independently. I was under the impression that SOLR's multi-core feature might use Lucene's MultiSearcher to search across multiple cores. Anyone with a clarification? :)
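The closest thing I have found is Solr 1.3's distributed search, where the shards parameter merges results from several cores in one request (hosts and core names below are made up; it also seems to require the cores to share a schema with a uniqueKey field):

    http://localhost:8983/solr/core0/select
        ?shards=localhost:8983/solr/core0,localhost:8983/solr/core1
        &q=title:solr

But that is plain HTTP sharding, not a MultiSearcher under the hood, as far as I can tell.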
Re: Field collapsing (SOLR-236) and Solr 1.3.0 release version
We are about to release field collapsing on our production site, though our index is not as big as yours. Collapsing is definitely added overhead. Since SOLR-236 is currently available only as a patch, do some load testing and benchmarking on a dataset like the one you expect in production. Secondly, provide a mechanism to switch it on and off depending on the condition of the production servers. One thing you can do is use "adjacent" field collapsing rather than normal collapsing: for normal collapsing, SOLR internally has to sort on the collapse field first, which is not the case with "adjacent" collapsing.

Stephen Weiss-2 wrote:
>
> Hi,
>
> A requirement has come up in a project where we're going to need to
> group by a field in the result set. I looked into the SOLR-236 patch,
> and it seems there are a couple of versions out now that are supposed to
> work against the Solr 1.3.0 release.
>
> This is a production site; it really can't be running anything that's
> going to crash or take up too many resources. I wanted to check with
> the list and see if anyone is using this patch with the Solr 1.3.0
> release, and whether it is stable enough / performs well enough for
> serious usage. We have an index of 3M+ documents, and a grouped result
> set would be about 50-75% of the total size of the ungrouped results.
>
> Thanks for any information or pointers.
>
> --
> Steve Weiss
> Stylesight
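The exact parameter names have varied between versions of the SOLR-236 patch, but a collapsing query has generally looked something like this (the field and values are made up; check the names against the patch version you apply):

    http://localhost:8983/solr/select
        ?q=jacket
        &collapse.field=brand
        &collapse.max=1
        &collapse.type=adjacent

Leaving collapse.type at its normal setting gives the sort-based behavior described above.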
Re: Multi word Synonym
Just use the query analysis page (analysis.jsp) with appropriate values. It will show how each tokenizer and filter factory breaks up the terms at the various analysis stages. Especially check the EnglishPorterFilterFactory stage. (And to your last question: yes, the synonym filter tokenizes the entries in the synonyms file itself, which is how multi-word entries get matched as token sequences.)

Jeff Newburn wrote:
>
> I am trying to figure out how the synonym filter processes multi-word
> inputs. I have checked the analyzer in the GUI, with some confusing
> results. The indexed field has "The North Face" as a value. The synonym
> file has:
>
> morthface, morth face, noethface, noeth face, norhtface, norht face,
> nortface, nort face, northfac, north fac, northfac3e, north fac3e,
> northface, north face, northfae, north fae, northfaqce, north faqce,
> northfave, north fave, northhace, north hace, nothface, noth face,
> thenorhface, the norh face, thenorth, the north, thenorthandface, the
> north and face, thenortheface, the northe face, thenorthfac, the north
> fac, thenorthface, thenorthfacee, the north facee, thenothface, the noth
> face, thenotrhface, the notrh face, thenrothface, the nroth face, tnf =>
> The North Face
>
> The field type runs the WhitespaceTokenizer before the synonyms. My
> confusion is that when the term "morth fac" is run, the system somehow
> knows to map it to the correct term even though that term is not present
> in the file.
>
> How is this happening? Is the synonym process tokenizing as well?
>
> The field type in the schema is as follows:
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.EnglishPorterFilterFactory"
>             protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
> -Jeff
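As a side note on the file format: in an explicit mapping line, every comma-separated input on the left is replaced by the sequence on the right, and multi-word inputs are matched as consecutive tokens. A trimmed sketch using a couple of the entries above:

    # synonyms.txt -- explicit mapping; a multi-word input like "morth face"
    # is matched as the two consecutive tokens "morth", "face"
    morthface, morth face, tnf => The North Face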
SOLR OOM (out of memory) problem
Hi,

We currently host an index of approx. 12GB on 5 SOLR slave machines, which are load-balanced in a cluster. At some point, usually after 8-10 hours, a SOLR slave gives an out-of-memory error, after which it just stops responding; it then requires a restart, and after the restart it works perfectly. Sometimes we notice long query processing times on specific SOLR slaves, after which we restart them just to avoid any forthcoming problem. Can anyone suggest how to avoid the OOM problem? Our slave SOLRs are read-only, and we do the incremental updates during the night only. Below is a snapshot of the OOM error we get:

SEVERE: java.lang.OutOfMemoryError: Java heap space
        at org.apache.solr.util.OpenBitSet.<init>(OpenBitSet.java:87)
        at org.apache.solr.search.DocSetHitCollector.collect(DocSetHitCollector.java:61)
        at org.apache.solr.search.SolrIndexSearcher$9.collect(SolrIndexSearcher.java:1064)
        at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:292)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:133)
        at org.apache.lucene.search.Searcher.search(Searcher.java:117)
        at org.apache.lucene.search.Searcher.search(Searcher.java:96)
        at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1061)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:801)
        at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1237)
        at org.apache.solr.request.DisMaxRequestHandler.handleRequestBody(DisMaxRequestHandler.java:315)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
        at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
        at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
        at org.apache.tomcat.util.net.MasterSlaveWorkerThread.run(MasterSlaveWorkerThread.java:112)
Re: SOLR OOM (out of memory) problem
Just to add more: the JVM heap allocated is 6GB, with an initial heap size of 2GB. We use quad Linux servers (8 CPUs) for the SOLR slaves. We use facet searches and sorting. The document cache is set to 7 million (which is the total number of documents in the index), the filtercache to 1.

gurudev wrote:
>
> Hi
>
> We currently host an index of approx. 12GB on 5 SOLR slave machines,
> which are load-balanced in a cluster. At some point, usually after 8-10
> hours, a SOLR slave gives an out-of-memory error, after which it just
> stops responding; it then requires a restart, and after the restart it
> works perfectly. Sometimes we notice long query processing times on
> specific SOLR slaves, after which we restart them just to avoid any
> forthcoming problem. Can anyone suggest how to avoid the OOM problem?
> Our slave SOLRs are read-only, and we do the incremental updates during
> the night only. Below is a snapshot of the OOM error we get:
>
> SEVERE: java.lang.OutOfMemoryError: Java heap space
> [stack trace snipped -- same as in the original message above]
Re: What are stopwords and protwords?
Hi Akeel,

- Stopwords are common words of a language which, as such, do not carry much meaning in searches, e.g. a, an, the, where, who, am. The analyzer in Lucene ignores such words and does not index them. You can also specify your own stopwords in SOLR's stopwords.txt.

- Protwords are words which you do not want to be stemmed. (With stemming, manager/managing/managed/manageable are all indexed as "manag"; the same applies when searching.) If you do not want a particular word to be stemmed at index/search time, just put it in SOLR's protwords.txt.

Akeel wrote:
>
> Hi,
>
> I am a beginner with Solr, and I have successfully indexed my DB in Solr.
> I want to know what stopwords and protwords are, and how much effect they
> have on my search results.
>
> Thanks in advance.
>
> --
> Akeel
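A sketch of how the two files plug into a field type's analyzer chain, using the stock Solr factories (the file contents here are just illustrative):

    # stopwords.txt -- one word per line; these are dropped at analysis time
    a
    an
    the

    # protwords.txt -- these words bypass the stemmer
    manageable

    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>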
RE: SOLR OOM (out of memory) problem
Hi Rong,

My cache hit ratios are:

filtercache: 0.96
documentcache: 0.51
queryresultcache: 0.58

Thanks
Pravesh

Yongjun Rong-2 wrote:
>
> I had the same problem some weeks before. You can try these:
> 1. Check the hit ratio for each cache via solr/admin/stats.jsp. If a
> cache's hit ratio is very low, just disable that cache. It will save you
> some memory.
> 2. Set -Xms and -Xmx to the same size; this helps improve GC performance.
> 3. Check which GC you use. The default is the parallel collector; you can
> try the concurrent GC, which helps a lot.
> 4. These are my Sun HotSpot JVM startup options: -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=50 -XX:-UseGCOverheadLimit
> The above cannot solve the OOM forever, but they help a lot.
> Wish this can help.
>
> -----Original Message-----
> From: Mike Klaas [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, May 21, 2008 2:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR OOM (out of memory) problem
>
> On 21-May-08, at 4:46 AM, gurudev wrote:
>
>> Just to add more:
>>
>> The JVM heap allocated is 6GB with an initial heap size of 2GB. We use
>> quad Linux servers (8 CPUs) for the SOLR slaves.
>> We use facet searches and sorting.
>> The document cache is set to 7 million (which is the total number of
>> documents in the index), the filtercache to 1.
>
> You definitely don't have enough memory to keep 7 million documents,
> fully realized in java-object form, in memory.
>
> Nor would you want to. The document cache should aim to keep the most
> frequently accessed documents in memory (in the thousands, perhaps tens
> of thousands). By devoting more memory to the OS disk cache, more of
> the 12GB index can be cached by the OS, and thus speed up all document
> retrieval.
>
> -Mike
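For anyone else following the thread, points 2-4 combined would look something like this wherever the container picks up JAVA_OPTS (the script location is a guess; the heap size is the 6GB from this thread, set equal per point 2):

    # e.g. in Tomcat's setenv.sh or JBoss's run.conf (hypothetical location)
    # CMS collector and options per points 3-4
    JAVA_OPTS="$JAVA_OPTS -Xms6g -Xmx6g \
               -XX:+UseConcMarkSweepGC \
               -XX:CMSInitiatingOccupancyFraction=50 \
               -XX:-UseGCOverheadLimit"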
Re: SOLR OOM (out of memory) problem
One correction: I have set the documentcache as:

initialSize=512
size=710
autowarmCount=512

The total number of insertions into the documentcache goes up to at most 45 in a day, with 0 evictions, which means it never grows to 710.

Thanks

Mike Klaas wrote:
>
> On 22-May-08, at 4:27 AM, gurudev wrote:
>
>> Hi Rong,
>>
>> My cache hit ratios are:
>>
>> filtercache: 0.96
>> documentcache: 0.51
>> queryresultcache: 0.58
>
> Note that you may be able to reduce the _size_ of the document cache
> without materially affecting the hit rate, since typically some
> documents are much more frequently accessed than others.
>
> I'd suggest starting with 700k, which I would still consider a large
> cache.
>
> -Mike
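For reference, those settings in the standard solrconfig.xml cache syntax (the numbers are the ones above; with 0 evictions the size could likely be cut further):

    <documentCache class="solr.LRUCache"
                   size="710"
                   initialSize="512"
                   autowarmCount="512"/>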