Solr Replication
Hello, We are running multiple slices in our environment. I have enabled JMX and I am inspecting the replication handler MBean to obtain some information about the master/slave configuration for replication. Is the replication handler MBean a singleton? I only see one MBean for the entire server, and it picks an arbitrary slice to report on. Does every slice get its own replication handler MBean? This is important because otherwise I have no way of getting any information about the other slices on this server, in particular whether each of them is a master or a slave. Reading through the Solr 1.4 replication strategy, I saw that a slice can be configured to be both a master and a slave, i.e. a repeater. I'm wondering how repeaters work: say I have a slice named 'A' whose master is on server 1 and whose slave is on server 2; how do these two servers communicate to replicate? In the JMX information for my repeater, both isSlave and isMaster are set to true, so how does this Solr slice know whether it's the master or the slave? I'm a bit confused. Thanks.
RE: Solr Replication
Thanks for the response. It's interesting because when I run JConsole, all I can see is one ReplicationHandler JMX MBean. It looks like it defaults to the first slice it finds on its path. Is there any way to have multiple replication handlers, or at least to obtain replication information per "slice"/"instance" via JMX, the way you can see the attributes of each "slice"/"instance" on its replication admin JSP page? Thanks again.

> From: noble.p...@corp.aol.com
> Date: Wed, 26 Aug 2009 11:05:34 +0530
> Subject: Re: Solr Replication
> To: solr-user@lucene.apache.org
>
> The ReplicationHandler is not enforced as a singleton, but for all practical purposes it is a singleton for one core.
>
> If an instance (a slice, as you say) is set up as a repeater, it can act as both a master and a slave.
>
> In the repeater setup, the configuration should be as follows:
>
> MASTER
> |_SLAVE (I am a slave of MASTER)
> |
> REPEATER (I am a slave of MASTER and master to my slaves)
> |
> |
> REPEATER_SLAVE (of REPEATER)
>
> The point is that REPEATER will have a slave section with a masterUrl which points to MASTER, and REPEATER_SLAVE will have a slave section with a masterUrl pointing to REPEATER.
>
> On Wed, Aug 26, 2009 at 12:40 AM, J G wrote:
> > [...]
>
> --
> Noble Paul | Principal Engineer | AOL | http://aol.com
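To make Noble's answer to the "master or slave?" question concrete, here is a toy Java model. It is not Solr code, and the class names, host names and URLs are hypothetical. It assumes, as I understand the 1.4 ReplicationHandler, that isMaster simply reflects the presence of a master section and isSlave the presence of a slave section with a masterUrl. A repeater therefore legitimately reports both as true; which way the index flows is decided by the masterUrl links, not by either flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a MASTER -> REPEATER -> REPEATER_SLAVE chain. Not Solr code; all names are hypothetical. */
public class RepeaterModel {

    static final class Node {
        final String name;
        final boolean hasMasterSection; // stands in for a master section in solrconfig.xml
        final String masterUrl;         // masterUrl from the slave section, or null if there is none

        Node(String name, boolean hasMasterSection, String masterUrl) {
            this.name = name;
            this.hasMasterSection = hasMasterSection;
            this.masterUrl = masterUrl;
        }

        // The two roles are independent flags, so a repeater reports both as true.
        boolean isMaster() { return hasMasterSection; }
        boolean isSlave()  { return masterUrl != null; }

        /** Hypothetical URL at which this node's own replication handler is reachable. */
        String replicationUrl() { return "http://" + name + ":8080/solr/replication"; }
    }

    /** Follows masterUrl links upstream from start; returns node names in the order the index is pulled through. */
    static List<String> upstreamChain(Node start, List<Node> cluster) {
        List<String> chain = new ArrayList<>();
        Node current = start;
        while (current != null && !chain.contains(current.name)) { // contains() guards against cycles
            chain.add(current.name);
            Node next = null;
            for (Node candidate : cluster) {
                if (candidate.replicationUrl().equals(current.masterUrl)) {
                    next = candidate;
                    break;
                }
            }
            current = next;
        }
        return chain;
    }

    public static void main(String[] args) {
        Node master = new Node("master", true, null);
        Node repeater = new Node("repeater", true, master.replicationUrl());
        Node repeaterSlave = new Node("repeaterslave", false, repeater.replicationUrl());
        List<Node> cluster = Arrays.asList(master, repeater, repeaterSlave);

        System.out.println("repeater isMaster=" + repeater.isMaster() + " isSlave=" + repeater.isSlave());
        System.out.println(upstreamChain(repeaterSlave, cluster));
    }
}
```

In this model the repeater never has to "decide" its role: it polls the URL in its own slave section and, independently, serves whoever points their masterUrl at it.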
master/slave replication issue
Hello, I'm having an issue getting the master to replicate its index to the slave. Below you will find my configuration settings. Here is what is happening: I can access the replication dashboard for both the slave and the master, and I can successfully execute HTTP commands against both of these URLs through my browser. My slave is configured to use the same URL as the one I use in my browser when I query the master, yet when I do a tail -f /logs/catalina.out on the slave server, all I see is:

Master - server1.xyz.com
Aug 27, 2009 12:13:29 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=8
Aug 27, 2009 12:13:32 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=8
Aug 27, 2009 12:13:34 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=4
Aug 27, 2009 12:13:36 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=4
Aug 27, 2009 12:13:39 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=4
Aug 27, 2009 12:13:42 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=8
Aug 27, 2009 12:13:44 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=

For some reason, the webapp and the path are being set to null, and I "think" this is affecting the replication?!? I am running Solr from the WAR file, and it's a 1.4 build from a few weeks ago.

optimize optimize

Notice that I commented out the replication of the configuration files. I didn't think this was important for getting replication working. However, is it a good idea to have these files replicated?
Slave - server2.xyz.com
http://server1.xyz.com:8080/jdoe/replication 00:00:20 internal 5000 1 username password

Thanks for your help!
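One way to separate network problems from configuration problems is to call the master's replication handler directly from the slave machine. The sketch below is an illustration, not part of Solr: it assumes the ReplicationHandler's indexversion command (served by the master) and reuses the masterUrl and the username/password placeholders from the slave configuration above; the basic-auth header only matters if the master is protected. If the master's replicateAfter is set to optimize only (which the "optimize optimize" values above may indicate), I believe it offers nothing to replicate until an optimize has actually run, so a version of 0 here would point at that rather than at the slave.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ReplicationCheck {

    /** Appends a ReplicationHandler command to a handler URL such as .../replication. */
    static String commandUrl(String handlerUrl, String command) {
        String separator = handlerUrl.contains("?") ? "&" : "?";
        return handlerUrl + separator + "command=" + command;
    }

    /** Performs an HTTP GET, optionally with basic auth, and returns the response body. */
    static String get(String url, String user, String password) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(5000);  // arbitrary timeouts for this check
        conn.setReadTimeout(10000);
        if (user != null) {
            String token = Base64.getEncoder()
                    .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
            conn.setRequestProperty("Authorization", "Basic " + token);
        }
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // masterUrl and credentials copied from the slave configuration above (placeholders).
        String masterUrl = "http://server1.xyz.com:8080/jdoe/replication";
        System.out.println(get(commandUrl(masterUrl, "indexversion"), "username", "password"));
    }
}
```

If this succeeds from the slave host, sending command=fetchindex to the slave's own /replication URL should force a pull and make any error show up in the slave's log.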
RE: Solr Replication
We have multiple Solr webapps, all running from the same WAR file. Each webapp runs under the same Tomcat container, and I consider each webapp the same thing as a "slice" (or "instance"). I've configured the Tomcat container to enable JMX, and when I connect using JConsole I only see the replication handler for one of the webapps in the server. I was under the impression that each webapp gets its own replication handler. Is this not true? It would be nice to have a JMX MBean for each replication handler in the container, so that we could get the same replication information through JMX as through each webapp's replication admin page. Thanks.

> From: noble.p...@corp.aol.com
> Date: Thu, 27 Aug 2009 13:04:38 +0530
> Subject: Re: Solr Replication
> To: solr-user@lucene.apache.org
>
> When you say a slice, do you mean one instance of Solr? So your JMX console is connecting to only one Solr?
>
> On Thu, Aug 27, 2009 at 3:19 AM, J G wrote:
> > [...]
>
> --
> Noble Paul | Principal Engineer | AOL | http://aol.com
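One possible explanation, which I have not verified: if every webapp is a single-core Solr with the same (default) core name, they may all register their MBeans under identical JMX names, so one webapp's ReplicationHandler MBean replaces another's and JConsole shows only one. Giving each core a distinct name would then, I believe, give each its own domain such as solr/corename. The sketch below assumes that naming (domains starting with solr, and a type key such as /replication); it groups replication handler MBeans by domain and prints the isMaster/isSlave attributes mentioned earlier in this thread for any MBeanServerConnection, for example a remote JMX connection to Tomcat.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class ReplicationMBeans {

    /** Assumption: a replication handler MBean has a "type" key mentioning "replication", e.g. type=/replication. */
    static boolean isReplicationHandler(ObjectName name) {
        String type = name.getKeyProperty("type");
        return type != null && type.toLowerCase(Locale.ROOT).contains("replication");
    }

    /** Groups replication handler MBeans by JMX domain, assumed to be distinct per core/webapp. */
    static Map<String, List<ObjectName>> groupByDomain(Collection<ObjectName> names) {
        Map<String, List<ObjectName>> byDomain = new TreeMap<>();
        for (ObjectName name : names) {
            if (isReplicationHandler(name)) {
                byDomain.computeIfAbsent(name.getDomain(), d -> new ArrayList<>()).add(name);
            }
        }
        return byDomain;
    }

    /** Prints isMaster/isSlave for every replication handler visible through the connection. */
    static void report(MBeanServerConnection server) throws Exception {
        // Assumption: Solr's MBean domains start with "solr".
        Map<String, List<ObjectName>> groups = groupByDomain(server.queryNames(new ObjectName("solr*:*"), null));
        for (Map.Entry<String, List<ObjectName>> entry : groups.entrySet()) {
            for (ObjectName name : entry.getValue()) {
                System.out.println(entry.getKey()
                        + ": isMaster=" + server.getAttribute(name, "isMaster")
                        + " isSlave=" + server.getAttribute(name, "isSlave"));
            }
        }
    }
}
```

If report prints a single domain even though several webapps are deployed, that would support the name-collision explanation.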
solr jmx connection
Hello, I have a SOLR JMX connection issue. I am running my JMX MBeanServer through Tomcat, meaning I am using Tomcat's MBeanServer rather than any other MBeanServer implementation. I am having a hard time figuring out the correct JMX service URL on my localhost for accessing the SOLR MBeans. My current configuration is: JMX service URL = localhost:9000/jmxrmi. So I have configured JMX to run on port 9000 in Tomcat on my localhost, and using the above service URL I can access Tomcat's JMX MBeanServer and get related JVM information (e.g. I can access the MemoryMXBean). However, I am having a harder time accessing the SOLR MBeans. First, I could have the wrong service URL. Second, I'm confused as to which MBeans SOLR provides. You might be asking why I am creating my own client rather than using JConsole: JConsole doesn't provide the features I need. Anyone with any knowledge or code snippets would be a huge help! Thank you for your time! Regards
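For a custom client, the JMX remote API expects a full service URL rather than the host:port/jmxrmi shorthand. For the JDK's built-in agent started with -Dcom.sun.management.jmxremote.port=9000, that URL is service:jmx:rmi:///jndi/rmi://localhost:9000/jmxrmi. The sketch below connects with it and lists every MBean whose domain starts with solr. That domain pattern is an assumption about how Solr names its MBeans, and Solr only registers them when JMX is enabled with <jmx /> in solrconfig.xml; if the pattern finds nothing, passing null instead of the pattern lists everything on the server.

```java
import java.net.MalformedURLException;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SolrJmxClient {

    /** Standard RMI service URL for a JVM started with -Dcom.sun.management.jmxremote.port=<port>. */
    static JMXServiceURL serviceUrl(String host, int port) throws MalformedURLException {
        return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    }

    public static void main(String[] args) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl("localhost", 9000))) {
            MBeanServerConnection server = connector.getMBeanServerConnection();
            // Assumption: Solr's MBeans live in domains whose names start with "solr".
            Set<ObjectName> names = new TreeSet<>(server.queryNames(new ObjectName("solr*:*"), null));
            for (ObjectName name : names) {
                System.out.println(name);
            }
        }
    }
}
```

Once the names are listed, server.getAttribute(name, attributeName) reads individual values, just as for the MemoryMXBean.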
JMX monitoring for multiple SOLR instances
Hi, If I want to run multiple SOLR WAR files in Tomcat, is it possible to monitor each of the SOLR instances individually through JMX? Has anyone attempted this before? Also, what are the implications (e.g. performance) of running multiple SOLR instances in the same Tomcat server? Thanks.
Obtaining SOLR index size on disk
Hello, Is it possible to obtain the SOLR index size on disk through the SOLR API? I've read through the docs and mailing list questions but can't seem to find the answer. Any help is appreciated. Thanks.
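One option, assuming the core has the ReplicationHandler registered at /replication: as far as I know, its details command reports the index size as a human-readable string under details/indexSize in the response. The sketch below fetches that command from a hypothetical URL (http://localhost:8983/solr/replication) and extracts the value with XPath; the element path it looks for is an assumption about the response layout. If the handler isn't configured, summing the file sizes in the core's data/index directory is a fallback.

```java
import java.io.InputStream;
import java.net.URI;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SolrIndexSize {

    /** Returns the indexSize value from a replication "details" XML response, or null if absent. */
    static String extractIndexSize(InputSource response) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(response);
            // Assumed layout: <lst name="details"><str name="indexSize">...</str>...</lst>
            String value = XPathFactory.newInstance().newXPath()
                    .evaluate("//lst[@name='details']/str[@name='indexSize']", doc);
            return value.isEmpty() ? null : value;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse replication details response", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical core URL; XML is Solr's default response format.
        String url = "http://localhost:8983/solr/replication?command=details";
        try (InputStream in = URI.create(url).toURL().openStream()) {
            System.out.println("indexSize = " + extractIndexSize(new InputSource(in)));
        }
    }
}
```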
Solr Spellcheck on Large index size
I am trying to create a spell checker for my company's website. Currently there are approximately 29 million documents in the index. When I try to create the spelling index, it just seems to skip over the command. My fields in schema.xml look like the following: And copying fields as such: My spell checker config looks like the following:

default spell true true C:\Users\kyleg\apache-solr-1.4.0\productGroups\solr\data\spellchecker solr.FileBasedSpellChecker file spellings.txt UTF-8 ./spellcheckerFile

The command that I am sending to try to build it looks like the following:

http://localhost:8983/solr/spell/?q=ACORA&version=2.2&start=0&rows=10&indent=on&spellcheck=true&spellcheck.dictionary=default&spellcheck.build=true&spellcheck.collate=true&spellcheck.limit=5

I have also tried reducing the size of the index to around 10,000 documents, and still no luck. Any help would be appreciated. Thank you, Kyle -- Sent from the Solr - User mailing list archive at Nabble.com.
Filter multivalue fields from search result
Hi, Is it possible to remove from the search results the values of multi-valued fields that don't match the search criteria? My schema is defined as: And example docs are:

+----+----------------------+------------+------------+
| id | name                 | town       | date       |
+----+----------------------+------------+------------+
| 1  | Microsoft Excel      | London     | 2010-08-20 |
|    |                      | Glasgow    | 2010-08-24 |
|    |                      | Leeds      | 2010-08-28 |
| 2  | Microsoft Word       | Aberdeen   | 2010-08-21 |
|    |                      | Reading    | 2010-08-25 |
|    |                      | London     | 2010-08-29 |
| 3  | Microsoft Powerpoint | Birmingham | 2010-08-22 |
|    |                      | Leeds      | 2010-08-26 |
+----+----------------------+------------+------------+

So the query q=name:Microsoft town:Leeds returns docs 1 & 3. How would I remove London/Glasgow from doc 1 and Birmingham from doc 3? Or should I create a separate doc for each name-event pair? Thanks, Alex
Re: Filter multivalue fields from search result
Hi, So if those are separate documents, how should I handle paging? Two separate queries? The first to return all matching course-event pairs, and a second one to get the courses for a given page? Is this common design described in detail somewhere? Thanks, Alex

On 2010-07-09 01:50, Lance Norskog wrote:
> Yes, denormalizing the index into separate (name,town) pairs is the common design for this problem.
>
> 2010/7/8 "Alex J. G. Burzyński":
> > [...]
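A minimal sketch of the denormalization Lance suggests, with hypothetical class and field names: each multi-valued course document becomes one document per (course, event) pair, with ids like 1-1, 1-2 (the scheme Alex uses in his next message) and the course id kept in its own field so hits can still be tied back to their course. It assumes the i-th town and the i-th date of a course belong to the same event.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Denormalize {

    /** One multi-valued document as in the original schema (hypothetical field names). */
    static final class Course {
        final String id;
        final String name;
        final List<String> towns;
        final List<String> dates;

        Course(String id, String name, List<String> towns, List<String> dates) {
            this.id = id;
            this.name = name;
            this.towns = towns;
            this.dates = dates;
        }
    }

    /** One flattened document per (course, event) pair. */
    static final class CourseEvent {
        final String id;       // unique key, e.g. "1-3"
        final String courseId; // lets hits be grouped back by course
        final String name;
        final String town;
        final String date;

        CourseEvent(String id, String courseId, String name, String town, String date) {
            this.id = id;
            this.courseId = courseId;
            this.name = name;
            this.town = town;
            this.date = date;
        }

        @Override
        public String toString() {
            return id + " " + name + " " + town + " " + date;
        }
    }

    /** Splits a course into one document per event; towns and dates must be parallel lists. */
    static List<CourseEvent> flatten(Course course) {
        if (course.towns.size() != course.dates.size()) {
            throw new IllegalArgumentException("towns and dates differ in length for course " + course.id);
        }
        List<CourseEvent> events = new ArrayList<>();
        for (int i = 0; i < course.towns.size(); i++) {
            events.add(new CourseEvent(course.id + "-" + (i + 1), course.id, course.name,
                    course.towns.get(i), course.dates.get(i)));
        }
        return events;
    }

    public static void main(String[] args) {
        Course excel = new Course("1", "Microsoft Excel",
                Arrays.asList("London", "Glasgow", "Leeds"),
                Arrays.asList("2010-08-20", "2010-08-24", "2010-08-28"));
        for (CourseEvent event : flatten(excel)) {
            System.out.println(event);
        }
    }
}
```

With one document per event, a query requiring both name:Microsoft and town:Leeds can only match the Leeds events, so there is nothing left to strip from the results.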
Re: Filter multivalue fields from search result
Hi Chantal, The paging problem I've asked about is that with course-event pairs, specifying rows limits the number of pairs returned, not the number of courses:

+-------+----------------------+------------+------------+
| id-id | name                 | town       | date       |
+-------+----------------------+------------+------------+
| 1-1   | Microsoft Excel      | London     | 2010-08-20 |
| 1-2   | Microsoft Excel      | Glasgow    | 2010-08-24 |
| 1-3   | Microsoft Excel      | Leeds      | 2010-08-28 |
| 2-1   | Microsoft Word       | Aberdeen   | 2010-08-21 |
| 2-2   | Microsoft Word       | Reading    | 2010-08-25 |
| 2-3   | Microsoft Word       | London     | 2010-08-29 |
| 3-1   | Microsoft Powerpoint | Birmingham | 2010-08-22 |
| 3-2   | Microsoft Powerpoint | Leeds      | 2010-08-26 |
| 3-3   | Microsoft Powerpoint | Leeds      | 2010-08-30 |
+-------+----------------------+------------+------------+

And from the UI point of view, I'm returning fewer courses than events; that's why I asked about paging. The search for q=name:Microsoft town:Leeds with rows=2 should return 1-3, 3-2 and 3-3, but 3-3 will obviously be on page 2. I hope that makes my question clearer. Thanks, Alex

On 2010-07-12 10:26, Chantal Ackermann wrote:
> Hi Alex,
>
> I think you have to explain the complete use case. Paging is done by specifying the parameter "start" (and "rows" if you want more or fewer than 10 hits per page). For each page you of course need a new query, but the queries differ only in the value of the "start" parameter (first page start=0, second page start=10, etc. if rows=10). The other parameters remain the same.
>
> You should also have a look at facets. They might help you get a list of the values of your multi-valued fields that you can display in the UI, allowing the user to drill down into the results further.
>
> Chantal
>
> On Mon, 2010-07-12 at 10:26 +0200, "Alex J. G. Burzyński" wrote:
> > [...]
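A client-side sketch of paging by course rather than by pair, with hypothetical names: fetch the matching course-event pairs in rank order (enough of them to cover the pages you need), group them by the course part of the id, and apply start/rows to the distinct courses instead of to the pairs. Solr 1.4 has no built-in way to do this; if I recall correctly, result grouping (group=true with group.field) arrived in Solr 3.3 and does the equivalent on the server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CoursePaging {

    /** Course id is the part of a pair id before the dash, e.g. "3" for "3-2" (hypothetical id scheme). */
    static String courseIdOf(String pairId) {
        int dash = pairId.indexOf('-');
        return dash < 0 ? pairId : pairId.substring(0, dash);
    }

    /**
     * Groups ranked pair ids by course, keeping the order of each course's first hit,
     * and returns the courses for one page, each with all of its matching pairs.
     */
    static Map<String, List<String>> pageOfCourses(List<String> rankedPairIds, int start, int rows) {
        Map<String, List<String>> byCourse = new LinkedHashMap<>();
        for (String pairId : rankedPairIds) {
            byCourse.computeIfAbsent(courseIdOf(pairId), k -> new ArrayList<>()).add(pairId);
        }
        Map<String, List<String>> page = new LinkedHashMap<>();
        int index = 0;
        for (Map.Entry<String, List<String>> entry : byCourse.entrySet()) {
            if (index >= start && index < start + rows) {
                page.put(entry.getKey(), entry.getValue());
            }
            index++;
        }
        return page;
    }

    public static void main(String[] args) {
        // Pairs matching name:Microsoft and town:Leeds in the table above, in rank order.
        List<String> hits = Arrays.asList("1-3", "3-2", "3-3");
        System.out.println(pageOfCourses(hits, 0, 2)); // {1=[1-3], 3=[3-2, 3-3]}
    }
}
```

The trade-off is that the client must over-fetch pairs, because it cannot know in advance how many pairs the courses on a given page will need.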