Scaling Solr - Suggestions !!
Hello,

*Background*: For each of our customers, we create 3 Solr webapps with different search schemas, serving different search requirements. We have about 70 customers, so we have about 210 webapps currently.

*Hardware*: Single server, one JVM, 19 GB heap, 32 GB total RAM, permgen initially 1 GB, now increased to 2 GB.

*Solr Indexes*: Most are on the order of a few MB; there are about 2 big indexes of about 3 GB each.

*Scaling Step 1*: We saw the permgen usage go up to nearly 850 MB when we created so many webapps, hence we are now moving to Solr cores. We are going to have about 50 cores per webapp, bringing the number of webapps down to about 5. We want to distribute the cores across multiple webapps to avoid a single point of failure.

*Requirement*:

- We need to horizontally scale only the cores whose index sizes are big.
- We also require permission-based search for each webapp. Would Solr NRT fit our needs, where we index the permissions into the documents? This would mean frequent addition and deletion of permissions on documents across cores.
- We also require automatic failover.

What technology would be an ideal fit, given SolrCloud, Katta, Solandra, Lily, ElasticSearch, etc.? [Preferably open source] [We would be required to maintain many webapps with multicores] And what about the commercial offerings, given our use case?

Thanks.

Regards,
Sujatha
Re: commit fail
Hi,

This is what the thread dump looks like. Any ideas?

Mav

Java HotSpot(TM) 64-Bit Server VM 20.1-b02
Thread Count: current=19, peak=20, daemon=6

'DestroyJavaVM' Id=26, RUNNABLE, total cpu time=198450 ms, user time=196890 ms

'Timer-2' Id=25, TIMED_WAITING on lock=java.util.TaskQueue@33799a1e, total cpu time=0 ms, user time=0 ms
  at java.lang.Object.wait(Native Method)
  at java.util.TimerThread.mainLoop(Timer.java:509)
  at java.util.TimerThread.run(Timer.java:462)

'pool-3-thread-1' Id=24, WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@747541f8, total cpu time=0 ms, user time=0 ms
  at sun.misc.Unsafe.park(Native Method)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
  at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
  at java.lang.Thread.run(Thread.java:662)

'pool-1-thread-1' Id=23, WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3e3e3c83, total cpu time=480 ms, user time=460 ms
  at sun.misc.Unsafe.park(Native Method)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
  at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
  at java.lang.Thread.run(Thread.java:662)

'Timer-1' Id=21, TIMED_WAITING on lock=java.util.TaskQueue@67f6dc61, total cpu time=180 ms, user time=120 ms
  at java.lang.Object.wait(Native Method)
  at java.util.TimerThread.mainLoop(Timer.java:509)
  at java.util.TimerThread.run(Timer.java:462)

'2021372560@qtp-1535043768-9 - Acceptor0 SocketConnector@0.0.0.0:8983' Id=20, RUNNABLE, total cpu time=60 ms, user time=60 ms
  at java.net.PlainSocketImpl.socketAccept(Native Method)
  at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
  at java.net.ServerSocket.implAccept(ServerSocket.java:462)
  at java.net.ServerSocket.accept(ServerSocket.java:430)
  at org.mortbay.jetty.bio.SocketConnector.accept(SocketConnector.java:99)
  at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
  at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

'1384828782@qtp-1535043768-8' Id=19, TIMED_WAITING on lock=org.mortbay.thread.QueuedThreadPool$PoolThread@528acf6e, total cpu time=274160 ms, user time=273060 ms
  at java.lang.Object.wait(Native Method)
  at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:626)

'1715374531@qtp-1535043768-7' Id=18, RUNNABLE, total cpu time=15725890 ms, user time=15723380 ms
  at sun.management.ThreadImpl.getThreadInfo1(Native Method)
  at sun.management.ThreadImpl.getThreadInfo(ThreadImpl.java:154)
  at org.apache.jsp.admin.threaddump_jsp._jspService(org.apache.jsp.admin.threaddump_jsp:264)
  at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:109)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
  at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:389)
  at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:486)
  at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:380)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:327)
  at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
  at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
  at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
  at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
  at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
  at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
  at org.mortbay.jetty.handler.ContextHandlerCollection.hand
question about NRT (soft commit) and Transaction Log in trunk
Hi,

I checked out trunk and played with its new soft commit feature. It's cool. But I've got a few questions about it. From reading some introductory articles and the wiki, and a hasty reading of the code, my understanding of the implementation is:

For a normal commit (hard commit), we flush everything to disk and commit it. The flush is not very time consuming because of OS-level caching; the most time-consuming part is the sync in the commit process. A soft commit just flushes postings and pending deletions to disk and generates new segments. Solr can then use a new searcher to read the latest index, warm it up, and register itself. If there is no hard commit and the JVM crashes, the new data may be lost.

If my understanding is correct, why do we need the transaction log? I found that in DirectUpdateHandler2, every time a command is executed, TransactionLog records a line in the log. But the default sync level in RunUpdateProcessorFactory is flush, which means it will not sync the log file. Does this make sense? In database implementations, we usually write the log and modify the data in memory, because the log is smaller than the real data; if the system crashes, we can redo the unfinished log entries and make the data correct. Will Solr leverage this log like that? If so, why isn't it synced?
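For reference, here is how I've been triggering the two commit flavors from SolrJ. This is a minimal sketch, assuming a trunk build where HttpSolrServer and the three-argument commit(waitFlush, waitSearcher, softCommit) are available; the URL and field names are just illustrative:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class SoftCommitDemo {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            doc.addField("title", "soft commit test");
            server.add(doc);

            // Soft commit: opens a new searcher so the doc becomes visible,
            // but does not fsync the segments -- durability still depends on
            // a later hard commit (or on tlog replay, where that applies).
            server.commit(true /* waitFlush */, true /* waitSearcher */, true /* softCommit */);

            // Hard commit: fsyncs the index files to disk for durability.
            server.commit(true, true, false);
        }
    }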
Re: Scaling Solr - Suggestions !!
Just my opinion, but I'm not sure I see the value in deploying the cores to different webapps in a single container on a single machine to avoid a single point of failure... You still have a single point of failure at everything from the process level down to the hardware, which, when you think about it, is mostly everything. But perhaps you're at least using more than one container.

It sounds to me that the easiest route to scalability for you would be to add more machines. Unless your cores are particularly complex or your traffic is heavy, a 3GB core should be no match for a single machine. And the traffic problem can be solved by replication and load balancing.

Michael

On Sat, 2012-04-28 at 13:24 +0530, Sujatha Arun wrote:
> <snip>
Re: commit fail
On Sat, Apr 28, 2012 at 7:02 AM, mav.p...@holidaylettings.co.uk wrote:
> Hi,
>
> This is what the thread dump looks like.
>
> Any ideas?

Looks like the thread taking up CPU is in LukeRequestHandler:

> '1062730578@qtp-1535043768-5' Id=16, RUNNABLE, total cpu time=16156160 ms, user time=16153110 ms
>   at org.apache.solr.handler.admin.LukeRequestHandler.getIndexedFieldsInfo(LukeRequestHandler.java:320)

That probably accounts for the 1 CPU doing things... but it's not clear at all why commits are failing. Perhaps the commit is succeeding, but the client is just not waiting long enough for it to complete?

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
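P.S. If it is the client giving up, raising the client-side socket timeout is worth a try. A minimal SolrJ sketch (assuming HttpSolrServer; the URL and timeout values are illustrative):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class CommitTimeout {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            // Give slow operations (a long commit plus searcher warming) time
            // to finish before the client aborts and reports a failure.
            server.setConnectionTimeout(5000);    // ms to open the connection
            server.setSoTimeout(10 * 60 * 1000);  // ms to wait for the response
            server.commit(true, true);            // waitFlush, waitSearcher
        }
    }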
Re: change index/store at indexing time
I can call a script for the logic part, but what I want to figure out is how to save the same field sometimes as stored and indexed, sometimes as stored but not indexed, etc. From a transformer or a script I didn't see anything that lets me modify that at indexing time.

Thanks a lot,
Maria

On Apr 27, 2012, at 18:38, "Bill Bell" wrote:

> Yes you can. Just use a script that is called for each row.
>
> Bill Bell
> Sent from mobile
>
> On Apr 27, 2012, at 6:38 PM, "Vazquez, Maria (STM)" wrote:
>
>> Hi,
>> I'm migrating a project from Lucene 2.9 to Solr 3.4.
>> There is a special case in the code that indexes the same field in two
>> different ways, which is completely legal in Lucene directly, but I don't
>> know how to duplicate this same behavior in Solr:
>>
>> if (isFirstGeo) {
>>     document.add(new Field("geoids", geoId, Field.Store.YES,
>>         Field.Index.NOT_ANALYZED_NO_NORMS));
>>     isFirstGeo = false;
>> } else {
>>     if (countProducts < 100)
>>         document.add(new Field("geoids", geoId, Field.Store.NO,
>>             Field.Index.NOT_ANALYZED_NO_NORMS));
>>     else
>>         document.add(new Field("geoids", geoId, Field.Store.YES,
>>             Field.Index.NO));
>> }
>>
>> Is there any way to do this in Solr in a Transformer? I'm using the DIH to
>> index, and I can't see a way to do this other than having three fields in the
>> schema like geoids_store_index, geoids_nostore_index, and
>> geoids_store_noindex.
>>
>> Thanks a lot in advance.
>> Maria
SolrJ core admin - can it share server objects with queries?
I have a SolrJ application that uses the core admin as well as doing queries against each core. I have an object of my own design for each core that uses SolrJ directly. Two of my core objects (one for build and one for live) are used in an object that represents a shard, and multiple shard objects are used in an object that represents an entire index chain.

Within the core object, I am currently creating two Solr server objects. One has the URL ending in "/solr" and is used for CoreAdminRequest. The other includes the core name and is used for updates/queries. I have taken steps to share the first server object between core objects when the host and port are the same.

My question: could I use one server object for both of these, or am I doing things correctly? I guess it comes down to whether or not there is any way to specify a core when doing queries or updates. I have not been able to see a way to do it. If there is a way, I could reduce the number of server objects that my program uses.

Currently there are two index chains, each of which has seven shards. With two cores per shard, the program builds 28 server objects for queries. Since I have four servers, I also end up with four shared server objects for CoreAdminRequest. If there's a way to specify the core for queries, I would only need those four shared objects.

If such a capability doesn't already exist, should I file a Jira issue?

Thanks,
Shawn
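For reference, this is the pattern I'm using now, as a sketch. Host and core names are hypothetical, and I'm assuming HttpSolrServer (CommonsHttpSolrServer on older SolrJ would look the same):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;
    import org.apache.solr.client.solrj.response.CoreAdminResponse;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class CoreServers {
        public static void main(String[] args) throws Exception {
            // One server object per Solr instance, for admin operations.
            HttpSolrServer adminServer = new HttpSolrServer("http://host1:8983/solr");
            CoreAdminResponse status = CoreAdminRequest.getStatus("live", adminServer);

            // One server object per core, for queries and updates -- the core
            // is part of the URL path, so a separate object per core is needed.
            HttpSolrServer coreServer = new HttpSolrServer("http://host1:8983/solr/live");
            QueryResponse rsp = coreServer.query(new SolrQuery("*:*"));

            System.out.println(status.getStartTime("live")
                    + " / " + rsp.getResults().getNumFound() + " docs");
        }
    }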
Re: change index/store at indexing time
Maria,

For your need, please define a unique pattern using a dynamic field in schema.xml. Please have a look: http://wiki.apache.org/solr/SchemaXml#Dynamic_fields

Hope that helps!

-Jeevanandam

Technology keeps you connected!

On Apr 28, 2012, at 10:33 PM, "Vazquez, Maria (STM)" wrote:
> <snip>
Re: change index/store at indexing time
Thanks Jeevanandam.

That still doesn't have the same behavior as Lucene, since multiple fields with different names have to be created. What I want is exactly this (a multi-valued field):

document.add(new Field("geoids", geoId, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
document.add(new Field("geoids", geoId, Field.Store.NO, Field.Index.NOT_ANALYZED_NO_NORMS));

In Lucene I can save geoids first as stored and on the next line as not stored, and it will do exactly that. I want to duplicate this behavior in Solr, but I can't do it with only one field in the schema called geoids that I can manipulate at index time as to whether to store or not, depending on a condition.

Thanks again for the help. I hope this explanation makes it clearer what I'm trying to do.

Maria

On Apr 28, 2012, at 11:49 AM, "Jeevanandam" <je...@myjeeva.com> wrote:
> <snip>
Re: Weird query results with edismax and boolean operator +
Hi, What is your "qf" parameter? Can you run the three queries with debugQuery=true&echoParams=all and attach parsed query and all params? It will probably explain what is happening. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 27. apr. 2012, at 11:21, Vadim Kisselmann wrote: > Hi folks, > > i use solr 4.0 from trunk, and edismax as standard query handler. > In my schema i defined this: > > I have this simple problem: > > nascar +author:serg* (3500 matches) > > +nascar +author:serg* (1 match) > > nascar author:serg* (5200 matches) > > nascar AND author:serg* (1 match) > > I think i understand the query syntax, but this behavior confused me. > Why this match-differences? > > By the way, i get in all matches at least one of my terms. > But not always both. > > Best regards > Vadim
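P.S. From SolrJ, the debug run could look like this sketch (the URL is illustrative, and I'm assuming QueryResponse.getDebugMap() from a recent SolrJ; the "parsedquery" entry in the debug section shows how edismax rewrote the +/AND operators):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class EdismaxDebug {
        public static void main(String[] args) throws Exception {
            HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            for (String q : new String[] {
                    "nascar +author:serg*",
                    "+nascar +author:serg*",
                    "nascar author:serg*",
                    "nascar AND author:serg*" }) {
                SolrQuery query = new SolrQuery(q);
                query.set("defType", "edismax");
                query.set("debugQuery", "true");
                query.set("echoParams", "all");
                QueryResponse rsp = server.query(query);
                // Print the rewritten query next to the hit count.
                System.out.println(q + " => " + rsp.getDebugMap().get("parsedquery")
                        + " (" + rsp.getResults().getNumFound() + " matches)");
            }
        }
    }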
Re: CJKBigram filter questions: single character queries, bigrams created across script/character types
This does not address the question. A single-ideogram query will not find ideograms in the middle of phrases. I have also found that phrase slop does not work with bigrams. At all.

I created a separate field type with unigrams. The CJK fields use the StandardAnalyzer. I made a stack with just the SA, which gives raw European text and single terms for CJK ideograms. This worked well for direct phrase and phrase-slop queries. You should use both kinds of fields: the bigram search helps boost similar phrases. You should also try the SmartChineseAnalyzer and the new Japanese analyzer suite.

I've discovered that CJK search is a very tricky thing, and different use cases like different strategies.

On Fri, Apr 27, 2012 at 10:57 AM, Walter Underwood wrote:
> Bigrams across character types seem like a useful thing, especially for
> indexing adjective and verb endings.
>
> An n-gram approach is always going to generate a lot of junk along with the
> gold. Tighten the rules and good stuff is missed, guaranteed. The only way to
> sort it out is to use a tokenizer with some linguistic rules.
>
> wunder
>
> On Apr 27, 2012, at 10:43 AM, Burton-West, Tom wrote:
>
>> I have a few questions about the CJKBigram filter.
>>
>> About 10% of our queries that contain Han characters are single-character
>> queries. It looks like the CJKBigram filter only outputs single characters
>> when there are no adjacent bigrammable characters in the input. This means
>> we would have to create a separate field to index Han unigrams in order to
>> address single-character queries. Is this correct?
>>
>> For Japanese, the default settings form bigrams across character types. So
>> for a string containing Hiragana and Han characters, bigrams containing a
>> mixture of Hiragana and Han characters are formed:
>> いろは革命歌 => "いろ" "ろは" "は革" "革命" "命歌"
>>
>> Is there a way to specify that you don't want bigrams across character types?
>>
>> Tom
>>
>> Tom Burton-West
>> Digital Library Production Service
>> University of Michigan Library
>> http://www.hathitrust.org/blogs/large-scale-search

--
Lance Norskog
goks...@gmail.com
Re: change index/store at indexing time
Maria,

Thanks for the detailed explanation.

As per schema.xml, stored and indexed have to be defined at design time; per my understanding, defining them at runtime is not feasible. BTW, you can have the multiValued="true" attribute on dynamic fields too.

- Jeevanandam

On 29-04-2012 2:06 am, Vazquez, Maria (STM) wrote:
> <snip>
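P.S. A sketch of the design-time workaround with separate fields, along the lines Maria described. The field names here are hypothetical and would have to be defined in schema.xml (for example as dynamic fields); SolrJ is shown instead of DIH just for brevity:

    import org.apache.solr.common.SolrInputDocument;

    public class GeoIdFields {
        // Mirrors the old Lucene logic at design time: route each geoId to a
        // schema field whose stored/indexed settings match what the Lucene
        // code chose per document. Hypothetical fields: geoids_si (stored +
        // indexed), geoids_i (indexed only), geoids_s (stored only).
        static void addGeoId(SolrInputDocument doc, String geoId,
                             boolean isFirstGeo, int countProducts) {
            if (isFirstGeo) {
                doc.addField("geoids_si", geoId);  // stored + indexed
            } else if (countProducts < 100) {
                doc.addField("geoids_i", geoId);   // indexed only
            } else {
                doc.addField("geoids_s", geoId);   // stored only
            }
        }

        public static void main(String[] args) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "p1");
            addGeoId(doc, "geo42", true, 0);    // first geo: stored + indexed
            addGeoId(doc, "geo43", false, 50);  // indexed only
            addGeoId(doc, "geo44", false, 150); // stored only
            System.out.println(doc);
        }
    }

A query-side copyField or a search across both geoids_si and geoids_i would then give the combined multi-valued view.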