To cluster, or not to cluster...
Hi, Is it/will it be possible to cluster solr? We have a distributed system and it would be nice if we could replicate the index to improve performance. Rob.
Re: To cluster, or not to cluster...
On 3/24/06, Robert Haycock <[EMAIL PROTECTED]> wrote:
> Is it/will it be possible to cluster solr?
>
> We have a distributed system and it would be nice if we could replicate
> the index to improve performance.

Solr does not have replication, but it does have a very nice index distribution system.

Solr can be run in a master/slave setup. The master receives all the changes. After each commit, snapshooter can take a snapshot of the index. The slaves can run snappuller at whatever polling frequency they like. Each snapshot is then snapinstalled on the slave, which can warm its caches while still serving queries from the older index.

Slaves can bring new indexes online at different times, so they can be out of sync with each other. But if your slave hardware is the same, your pulling and shooting are well understood, and you make warming time-based, it probably will not be a problem.

Each slave notes on the master that it has received the distribution. That's as tied together as they get (not much). So, if you have a requirement that they must all be in index-version sync, you could extend Solr to tie them closer together.

--cw
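The pull side of the setup described above can be sketched roughly as follows. This is an illustration only, not the real snappuller script: the `snapshot.yyyymmddHHMMSS` naming and the function names `latest_snapshot` and `needs_install` are assumptions made for the example.

```python
import re

# Assumed naming convention for snapshot directories: snapshot.yyyymmddHHMMSS
SNAPSHOT_RE = re.compile(r"^snapshot\.(\d{14})$")


def latest_snapshot(names):
    """Return the newest snapshot name in `names`, or None if there is none."""
    snapshots = [n for n in names if SNAPSHOT_RE.match(n)]
    # The timestamp is fixed-width, so string order equals time order.
    return max(snapshots, default=None)


def needs_install(master_names, installed):
    """Return the snapshot a slave should pull, or None if it is up to date."""
    newest = latest_snapshot(master_names)
    if newest is None or newest == installed:
        return None
    if installed is not None and newest < installed:
        return None  # never go backwards to an older index
    return newest


if __name__ == "__main__":
    on_master = ["snapshot.20060324165500", "snapshot.20060324170000", "tmp"]
    print(needs_install(on_master, "snapshot.20060324165500"))
```

A slave would call something like `needs_install` on each poll and only pull, install, and warm when it returns a name, which is why slaves polling at different moments can briefly serve different index versions.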
RE: To cluster, or not to cluster...
That's great, cheers.

Rob.

-Original Message-
From: Clay Webster [mailto:[EMAIL PROTECTED]
Sent: 24 March 2006 16:55
To: solr-user@lucene.apache.org
Subject: Re: To cluster, or not to cluster...
Multiple updates possible?
Hello again, We're looking at having multiple instances of Solr looking at a single lucene index. Would there be a problem if all instances updated the index at the same time? Rob.
Re: Multiple updates possible?
: We're looking at having multiple instances of Solr looking at a single
: lucene index. Would there be a problem if all instances updated the
: index at the same time?

I'm 99% sure it won't be possible to have multiple server instances using the same index directory and making modifications -- I think the UpdateHandler maintains a persistent IndexWriter between commits (so that it doesn't pay the cost of opening the IndexWriter every time you add/update a document).

However, you might be able to have multiple servers *reading* from the same index, which only one server updates -- if you configure the postCommit hook appropriately on your "master" (the server doing updates), it can even execute a shell script to notify all of the other servers that it has changed the index and that they should re-open their IndexReaders.

Can I ask what your use case is for wanting multiple servers updating the same physical index directory?

-Hoss
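The one-writer, many-readers arrangement described above can be pictured with a lock file. The sketch below only illustrates that idea in Python; it is not Lucene's or Solr's locking code, and the class name `IndexWriteLock` and the `write.lock` file name are assumptions.

```python
import os
import tempfile


class IndexWriteLock:
    """Exclusive lock held by the one process allowed to modify the index."""

    def __init__(self, index_dir, name="write.lock"):
        self.path = os.path.join(index_dir, name)
        self.fd = None

    def acquire(self):
        """Try to take the lock; return True on success, False if it is held."""
        try:
            # O_EXCL makes creation fail if the file already exists.
            self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            return True
        except FileExistsError:
            return False

    def release(self):
        """Give up the lock so that another process may become the writer."""
        if self.fd is not None:
            os.close(self.fd)
            os.remove(self.path)
            self.fd = None


if __name__ == "__main__":
    index_dir = tempfile.mkdtemp()
    first, second = IndexWriteLock(index_dir), IndexWriteLock(index_dir)
    print(first.acquire())   # first server becomes the writer
    print(second.acquire())  # second server is refused while the lock is held
    first.release()
```

A server that keeps its writer open between commits would hold such a lock for long stretches, which is why a second updating instance on the same directory is unlikely to work.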
RE: Multiple updates possible?
In this case we are looking at having multiple tomcats to provide us with load balancing and failover. We are not looking at a master/slave index solution. We'll also be working on windows.

Rob.

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: 24 March 2006 18:22
To: solr-user@lucene.apache.org
Subject: Re: Multiple updates possible?
Re: To cluster, or not to cluster...
It should be possible to do clustering if you divide your master index over multiple master servers. Then write a wrapper around the SolrClient API using something like MultiSearcher. From what I know this would work, could be wrong.

- Original Message
From: Clay Webster <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, March 24, 2006 8:54:45 AM
Subject: Re: To cluster, or not to cluster...
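A wrapper of the kind suggested above would have to merge the ranked hits coming back from each server. The sketch below shows only that merge step; it is not the SolrClient or MultiSearcher API, the name `merge_results` is made up, and it assumes scores from different servers are directly comparable, which a real implementation would have to ensure.

```python
import heapq
from itertools import islice


def merge_results(per_server_hits, limit=10):
    """Merge ranked hit lists from several servers into one ranked list.

    Each hit is a (score, doc_id) tuple and each list is already sorted by
    descending score, as a search server would return it.
    """
    merged = heapq.merge(*per_server_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))


if __name__ == "__main__":
    server_a = [(0.9, "a1"), (0.4, "a2")]
    server_b = [(0.7, "b1"), (0.1, "b2")]
    print(merge_results([server_a, server_b], limit=3))
```

Because every input list is already sorted, the merge is lazy and stops as soon as `limit` hits have been collected.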
RE: To cluster, or not to cluster...
Hi Jason,

Would that not mean if one of the master indexes went down then a subset of data would be offline?

Rob.

-Original Message-
From: jason rutherglen [mailto:[EMAIL PROTECTED]
Sent: 24 March 2006 18:32
To: solr-user@lucene.apache.org
Subject: Re: To cluster, or not to cluster...
Re: To cluster, or not to cluster...
No, because the data would be on the slave servers, which would continue to serve data. You could easily have mirrored master machines if you were worried about losing updates. Updates of a specific division or stripe would occur to both mirrored servers or not at all. Or fancier configurations could be done, such as: if a master fails, take it out and recopy the entire index from the good master.

- Original Message
From: Robert Haycock <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, March 24, 2006 11:26:58 AM
Subject: RE: To cluster, or not to cluster...
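The "both mirrored servers or not at all" rule above can be sketched as an apply-then-roll-back loop. This is a toy model, not anything Solr provides: plain dicts stand in for the mirrored master indexes, and `update_mirrors`, `MirrorUpdateError`, and `BrokenMirror` are hypothetical names.

```python
class MirrorUpdateError(Exception):
    """Raised when an update could not be applied to every mirror."""


class BrokenMirror(dict):
    """Stand-in for a master that is down: every write fails."""

    def __setitem__(self, key, value):
        raise IOError("mirror unavailable")


def update_mirrors(mirrors, doc_id, doc):
    """Apply an update to every mirror, or to none of them."""
    applied = []  # (mirror, had_previous, previous_value), kept for rollback
    try:
        for mirror in mirrors:
            had_previous = doc_id in mirror
            previous = mirror.get(doc_id)
            mirror[doc_id] = doc
            applied.append((mirror, had_previous, previous))
    except Exception as exc:
        # Undo the writes that did succeed, newest first.
        for mirror, had_previous, previous in reversed(applied):
            if had_previous:
                mirror[doc_id] = previous
            else:
                del mirror[doc_id]
        raise MirrorUpdateError(f"update {doc_id!r} rolled back") from exc
```

With real servers the rollback itself can fail, so this shows the intent of the rule rather than a complete protocol.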
What is proper way to re-init index?
I've been working with Solr for just a few days. Initially I ran the exampledocs and things worked fine. I've now redefined the layout of the index to be more of what I'd like to see, generated my own xml files to index, blew away the old index/directories, and restarted with the new schema file, but it only creates the index and segment directories, not the complete index. Luke tells me I have a corrupted index. What is the proper way to create the index? I can go back to the pre-expansion solr.war file but that seems a bit drastic.
Re: What is proper way to re-init index?
If you don't care about saving any data, you can just remove the index directory. Solr will create a new one if it does not already exist. You will need to repopulate your data.

Bill

On 3/24/06, John Mohr <[EMAIL PROTECTED]> wrote:
> What is the proper way to create the index?
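The removal step above can be sketched as a small helper. The directory layout (`data_dir` containing a sub-directory named `index`) and the name `reset_index` are assumptions, so check where your installation actually keeps its index. The sketch also assumes the server is stopped while the directory is deleted.

```python
import os
import shutil
import tempfile


def reset_index(data_dir, index_name="index"):
    """Delete the index directory under `data_dir` if it exists.

    Returns True if a directory was removed, False if there was nothing to do.
    Intended to be run only while the server is stopped.
    """
    index_dir = os.path.join(data_dir, index_name)
    if not os.path.isdir(index_dir):
        return False
    shutil.rmtree(index_dir)
    return True


if __name__ == "__main__":
    data_dir = tempfile.mkdtemp()          # throwaway directory for the demo
    os.makedirs(os.path.join(data_dir, "index"))
    print(reset_index(data_dir))           # removes the demo index directory
```

After the restart the new index is empty, so the documents still have to be posted again.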
Re: What is proper way to re-init index?
: index to be more of what I'd like to see, generated my own xml files to
: index, blew away the old index/directories, restarted with the new schema
: file and it only creates the index and segment directories but not the
: complete index. Luke tells me I have a corrupted index. What is the proper

I'm not sure about Luke saying the index is corrupted -- that may just be because it's empty.

It sounds like you never re-indexed your data after blowing away the old index. Did you re-index the XML documents you made after you deleted the index directory? ... either using the post.sh script provided by the example, or by using some other client to POST the documents?

-Hoss
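For the "some other client" option above, a minimal sketch of building the POST requests might look like this. The update URL is an assumption (adjust host, port, and path to your installation), and `add_message` and `build_request` are hypothetical helper names. The script only builds the requests and leaves the actual send commented out.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# Assumed update URL; adjust host, port, and path to your installation.
UPDATE_URL = "http://localhost:8983/solr/update"


def add_message(doc):
    """Build an <add> message for one document given as {field: value}."""
    fields = "".join(
        f"<field name={quoteattr(name)}>{escape(str(value))}</field>"
        for name, value in doc.items()
    )
    return f"<add><doc>{fields}</doc></add>"


def build_request(xml_body, url=UPDATE_URL):
    """Wrap an XML message in a POST request without sending it."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    for body in (add_message({"id": "1", "name": "R&D notes"}), "<commit/>"):
        request = build_request(body)
        print(request.get_method(), request.full_url, body)
        # urllib.request.urlopen(request)  # uncomment to send to a running server
```

The trailing `<commit/>` matters: documents that were added but never committed will not show up in the index.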
RE: Multiple updates possible?
: In this case we are looking at having multiple tomcats to provide us
: with load balancing and failover. We are not looking at a master/slave
: index solution. We'll also be working on windows.

I'm not very familiar with Windows, but if your goal is to have load-balanced servers for failover, then what is the advantage of running those multiple servers on the same box (pointed at the same index directory)? ... If the box goes down, you're up a creek.

What we do is have one master port that receives all of the updates and has a postCommit hook which makes snapshots. Then we have many slave ports (running on other machines) which pull the snapshots at regular intervals, and are all accessible behind a load balancer. If one slave goes down -- no big deal, the load balancer stops using it. If the master goes down, the slaves happily keep serving queries, but new updates can't be published until we install a "master" configuration (with the postCommit hook) on one of the slaves, and change the DNS record for the master to point at that slave -- at which point it becomes the new master.

I know the existing snapshooter/snappuller scripts in Subversion don't work on Windows, but one of the items on the task list is to try and come up with equivalent methods that can -- if you have any ideas on how that can be achieved, that would be great!

-Hoss
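The load-balancer behaviour described above (stop using a slave that has gone down) can be sketched as a round-robin pool. This is a toy model rather than any particular load balancer; the class `SlavePool` and its method names are made up for the example.

```python
from itertools import cycle


class SlavePool:
    """Round-robin over slaves, skipping any that are marked down."""

    def __init__(self, slaves):
        self.slaves = list(slaves)
        self.down = set()
        self._ring = cycle(self.slaves)

    def mark_down(self, slave):
        """Stop sending queries to `slave`."""
        self.down.add(slave)

    def mark_up(self, slave):
        """Resume sending queries to `slave`."""
        self.down.discard(slave)

    def next_slave(self):
        """Return the next healthy slave, or raise if none is available."""
        for _ in range(len(self.slaves)):
            candidate = next(self._ring)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no healthy slaves available")


if __name__ == "__main__":
    pool = SlavePool(["slave1", "slave2", "slave3"])
    pool.mark_down("slave2")
    print([pool.next_slave() for _ in range(4)])
```

Losing the master is a different case: queries keep flowing through the pool, and only updates wait until one slave has been reconfigured as the new master.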
RE: What is proper way to re-init index?
After some fiddling around, the base problem is that it takes my new schema, implies that the update went fine, but it didn't. Of no great surprise, the problem is that the schema for some reason doesn't match my data. It doesn't write out any data. Reconfiguring with the example (old) data (and old schema), it writes it out just fine. Hmmm. It seems that further investigation is warranted.

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Friday, March 24, 2006 3:33 PM
To: solr-user@lucene.apache.org
Subject: Re: What is proper way to re-init index?
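One quick way to look for the schema/data mismatch described above is to compare the field names used in an add message with the names the schema declares. The sketch below assumes the schema declares fields with `field` and `dynamicField` elements carrying a `name` attribute, and that dynamic names are simple `*` globs. The function name `undeclared_fields` is made up.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase


def undeclared_fields(schema_xml, add_xml):
    """Return field names used in an <add> message but absent from the schema."""
    schema = ET.fromstring(schema_xml)
    declared = {f.get("name") for f in schema.iter("field")}
    # Dynamic fields are treated as glob patterns such as "*_s".
    patterns = [f.get("name") for f in schema.iter("dynamicField") if f.get("name")]
    used = {
        f.get("name")
        for f in ET.fromstring(add_xml).iter("field")
        if f.get("name")
    }
    return sorted(
        name for name in used
        if name not in declared
        and not any(fnmatchcase(name, pattern) for pattern in patterns)
    )


if __name__ == "__main__":
    schema = '<schema><fields><field name="id" type="string"/></fields></schema>'
    doc = '<add><doc><field name="id">1</field><field name="title">x</field></doc></add>'
    print(undeclared_fields(schema, doc))
```

An empty result does not prove the schema is right (types and required fields are not checked), but a non-empty one points straight at names to fix.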
RE: What is proper way to re-init index?
: After some fiddling around the base problem is that it takes my new schema,
: implies that the update went fine, but it didn't. Of no great surprise, the
: problem is that the schema for some reason doesn't match my data. It doesn't
: write out any data. Reconfiging with the example (old) data (and old schema)
: it writes it out just fine. Hmmm. It seems that further investigation is

Can you send a copy of your schema and an example of one "..." file? ... we might be able to help spot the problem.

-Hoss