To cluster, or not to cluster...

2006-03-24 Thread Robert Haycock
Hi,

Is it/will it be possible to cluster Solr?

We have a distributed system and it would be nice if we could replicate
the index to improve performance.

Rob.



Re: To cluster, or not to cluster...

2006-03-24 Thread Clay Webster
On 3/24/06, Robert Haycock <[EMAIL PROTECTED]> wrote:
>
> Is it/will it be possible to cluster solr?
>
> We have a distributed system and it would be nice if we could replicate
> the index to improve performance.
>
>
Solr does not have replication.  But it does have a very nice index
distribution system.

Solr can be run in a master/slave setup.  The master receives all the
changes.  After each commit, snapshooter can take a snapshot of the index.
The slaves can run snappuller at whatever polling frequency they like.  Each
snapshot is then installed on the slave (snapinstaller), and the new index
can have its caches warmed (while queries are still served from the older
index).

Slaves can come online with new indexes out of sync with each other.  But if
your slave hardware is the same, your pulling and shooting are well
understood, and you make warming time-based, it probably will not be a
problem.  Each slave notes this distribution on the master.  That's as tied
together as they get (not much).  So, if you have a requirement that they
must all be in index-version-sync, you could tie them closer by extending
Solr.
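
The snapshot, pull and install cycle described above can be modelled in a few lines. This is only a sketch under stated assumptions: the function names, the `snapshot.NNNNNN` naming and the `installed_version` marker file are hypothetical, plain directory copies stand in for whatever transfer mechanism the real scripts use, and cache warming is left out.

```python
import shutil
from pathlib import Path
from typing import Optional


def take_snapshot(index_dir: Path, snapshot_root: Path, version: int) -> Path:
    """Master side: copy the committed index into a versioned snapshot directory."""
    target = snapshot_root / f"snapshot.{version:06d}"
    shutil.copytree(index_dir, target)
    return target


def latest_snapshot(snapshot_root: Path) -> Optional[Path]:
    """Return the newest snapshot (zero-padded names sort by version), or None."""
    snapshots = sorted(
        p for p in snapshot_root.iterdir() if p.name.startswith("snapshot.")
    )
    return snapshots[-1] if snapshots else None


def pull_and_install(master_snapshots: Path, slave_dir: Path) -> bool:
    """Slave side: fetch the newest snapshot unless it is already installed."""
    newest = latest_snapshot(master_snapshots)
    if newest is None:
        return False
    marker = slave_dir / "installed_version"
    if marker.exists() and marker.read_text() == newest.name:
        return False  # already up to date, nothing to install

    live = slave_dir / "index"
    staged = slave_dir / "index.new"
    if staged.exists():
        shutil.rmtree(staged)        # clear leftovers from an interrupted pull
    shutil.copytree(newest, staged)  # "pull": the old index is untouched so far
    if live.exists():
        shutil.rmtree(live)
    staged.rename(live)              # "install": swap the new index into place
    marker.write_text(newest.name)
    return True
```

Each slave would call `pull_and_install` on its own schedule, which is why two slaves can briefly serve different index versions.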

--cw


RE: To cluster, or not to cluster...

2006-03-24 Thread Robert Haycock
That's great, cheers.

Rob.



Multiple updates possible?

2006-03-24 Thread Robert Haycock
Hello again,

We're looking at having multiple instances of Solr looking at a single
Lucene index.  Would there be a problem if all instances updated the
index at the same time?

Rob.



Re: Multiple updates possible?

2006-03-24 Thread Chris Hostetter

: We're looking at having multiple instances of Solr looking at a single
: lucene index.  Would there be a problem if all instances updated the
: index at the same time?

I'm 99% sure it won't be possible to have multiple server instances using
the same index directory and making modifications -- I think the
UpdateHandler maintains a persistent IndexWriter between commits (so that
it doesn't pay the cost of opening the IndexWriter every time you
add/update a document).

However, you might be able to have multiple servers *reading* from the
same index, which only one server updates -- if you configure the
postCommit hook appropriately on your "master" (the server doing updates)
it can even execute a shell script to notify all of the other servers that
it has changed the index and they should re-open their IndexReaders.
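
Why a second writer cannot join in can be illustrated with a toy exclusive lock. Lucene guards an index with a write lock; the class below is not Lucene's implementation, only a sketch of the same idea, and the class name and lock file location are assumptions made for the example.

```python
import os


class IndexWriteLock:
    """Toy exclusive lock: at most one holder per index directory."""

    def __init__(self, index_dir: str) -> None:
        # Hypothetical location; chosen only for this illustration.
        self.path = os.path.join(index_dir, "write.lock")
        self.held = False

    def acquire(self) -> bool:
        """Try to take the lock; return False if another writer holds it."""
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.close(fd)
        self.held = True
        return True

    def release(self) -> None:
        """Give the lock back so another writer may take it."""
        if self.held:
            os.remove(self.path)
            self.held = False
```

A server that keeps its writer open between commits holds such a lock the whole time, so a second server pointed at the same directory never gets a turn. Readers do not need the lock, which is what makes the one-writer, many-readers layout workable.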


Can I ask what your use case is for wanting multiple servers updating the
same physical index directory?


-Hoss



RE: Multiple updates possible?

2006-03-24 Thread Robert Haycock
In this case we are looking at having multiple Tomcat instances to provide
us with load balancing and failover.  We are not looking at a master/slave
index solution.  We'll also be working on Windows.

Rob.




Re: To cluster, or not to cluster...

2006-03-24 Thread jason rutherglen
It should be possible to do clustering if you divide your master index over
multiple master servers.  Then write a wrapper around the SolrClient API using
something like MultiSearcher.  From what I know this would work, but I could
be wrong.
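
The wrapper idea amounts to asking every partition for its best hits and merging them. A minimal sketch, assuming each partition can be called as a function returning `(score, doc_id)` pairs; `make_searcher` is a hypothetical in-memory stand-in, not a Solr or Lucene API.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Hit = Tuple[float, str]                     # (score, document id)
Searcher = Callable[[str, int], List[Hit]]  # (query term, k) -> top hits


def make_searcher(docs: Dict[str, str]) -> Searcher:
    """Hypothetical partition: score is how often the term occurs in the text."""

    def search(term: str, k: int) -> List[Hit]:
        scored = [(float(text.split().count(term)), doc_id)
                  for doc_id, text in docs.items()]
        matches = [hit for hit in scored if hit[0] > 0]
        return heapq.nlargest(k, matches, key=lambda hit: hit[0])

    return search


def search_all(searchers: List[Searcher], term: str, k: int) -> List[Hit]:
    """Ask every partition for its top k hits and keep the k best overall."""
    hits: List[Hit] = []
    for search in searchers:
        hits.extend(search(term, k))
    return heapq.nlargest(k, hits, key=lambda hit: hit[0])
```

One caveat with this design: scores computed against separate indexes are only comparable if every partition scores the same way, which the toy scorer here guarantees but independent indexes may not.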



RE: To cluster, or not to cluster...

2006-03-24 Thread Robert Haycock
Hi Jason,

Would that not mean if one of the master indexes went down then a subset
of data would be offline?

Rob.



Re: To cluster, or not to cluster...

2006-03-24 Thread jason rutherglen
No, because the data would be on the slave servers, which would continue to
serve data.  You could easily have mirrored master machines if you were
worried about losing updates.  Updates of a specific division or stripe would
occur on both mirrored servers or not at all.  Fancier configurations are
also possible: for example, if a master fails, take it out and recopy the
entire index from the good master.
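
"Both mirrored servers or not at all" can be approximated by undoing the update wherever it already succeeded. This is a best-effort sketch rather than a real two-phase commit, and `InMemoryMaster`, `update_mirrors` and `MirrorUpdateError` are hypothetical names used only for illustration.

```python
from typing import Dict, List, Tuple

_MISSING = object()  # sentinel: the document did not exist before the update


class MirrorUpdateError(Exception):
    """Raised when an update could not be applied to every mirror."""


class InMemoryMaster:
    """Hypothetical stand-in for one mirrored master."""

    def __init__(self, name: str, available: bool = True) -> None:
        self.name = name
        self.available = available
        self.docs: Dict[str, str] = {}

    def put(self, doc_id: str, doc: str) -> None:
        if not self.available:
            raise ConnectionError(f"{self.name} is down")
        self.docs[doc_id] = doc

    def delete(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)


def update_mirrors(mirrors: List[InMemoryMaster], doc_id: str, doc: str) -> None:
    """Apply the update to every mirror; roll back everywhere if one fails."""
    applied: List[Tuple[InMemoryMaster, object]] = []
    try:
        for mirror in mirrors:
            previous = mirror.docs.get(doc_id, _MISSING)
            mirror.put(doc_id, doc)
            applied.append((mirror, previous))
    except Exception as exc:
        # Restore the earlier state on the mirrors that had already accepted it.
        for mirror, previous in reversed(applied):
            if previous is _MISSING:
                mirror.delete(doc_id)
            else:
                mirror.put(doc_id, previous)
        raise MirrorUpdateError(f"update of {doc_id!r} rolled back") from exc
```

A caller that sees `MirrorUpdateError` knows neither mirror changed and can retry later, or take the failed master out and recopy the index as suggested above.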



What is proper way to re-init index?

2006-03-24 Thread John Mohr
I've been working with Solr for just a few days. Initially I ran the
exampledocs and things worked fine. I've now redefined the layout of the
index to be more of what I'd like to see, generated my own xml files to
index, blew away the old index/directories, restarted with the new schema
file and it only creates the index and segment directories but not the
complete index. Luke tells me I have a corrupted index. What is the proper
way to create the index? I can go back to the pre-expansion solr.war file
but that seems a bit drastic.


Re: What is proper way to re-init index?

2006-03-24 Thread Bill Au
If you don't care about saving any data, you can just remove the index
directory.
Solr will create a new one if it does not already exist.
You will need to repopulate your data.

Bill


Re: What is proper way to re-init index?

2006-03-24 Thread Chris Hostetter
: index to be more of what I'd like to see, generated my own xml files to
: index, blew away the old index/directories, restarted with the new schema
: file and it only creates the index and segment directories but not the
: complete index. Luke tells me I have a corrupted index. What is the proper

I'm not sure about Luke saying the index is corrupted -- that may just be
because it's empty.  It sounds like you never re-indexed your data after
blowing away the old index.

did you re-index the XML documents you made after you deleted the index
directory? ... either using the post.sh script provided by the example, or
by using some other client to POST the documents?
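
For the "some other client" route, a minimal sketch of building an `<add>` message and POSTing it. The update URL in the usage note is an assumption about a local example setup, and `build_add_xml` and `post_xml` are hypothetical helper names.

```python
import urllib.request
from typing import Dict, List
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(docs: List[Dict[str, str]]) -> str:
    """Build an <add> message with one <doc> per dictionary of field values."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # quoteattr adds the surrounding quotes; escape handles &, < and >.
            parts.append(f"<field name={quoteattr(name)}>{escape(str(value))}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_xml(update_url: str, xml: str) -> str:
    """POST one XML message to the update handler and return the response body."""
    request = urllib.request.Request(
        update_url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Usage would be along the lines of `post_xml(url, build_add_xml(docs))` followed by `post_xml(url, "<commit/>")`, with `url` something like `http://localhost:8983/solr/update` (assumed). Added documents do not become searchable until the commit is sent.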



-Hoss



RE: Multiple updates possible?

2006-03-24 Thread Chris Hostetter

: In this case we are looking at having multiple tomcats to provide us
: with load balancing and failover.  We are not looking at a master/slave
: index solution.  We'll also be working on windows.

I'm not very familiar with Windows, but if your goal is to have load
balanced servers for failover, then what is the advantage of running those
multiple servers on the same box (pointed at the same index directory)?
... if the box goes down, you're up a creek.

What we do is have one master port that receives all of the updates and
has a postCommit hook which makes snapshots.  Then we have many slave
ports (running on other machines) which pull the snapshots at regular
intervals, and are all accessible behind a load balancer.

If one slave goes down -- no big deal, the load balancer stops using it.

If the master goes down, the slaves happily keep serving queries, but new
updates can't be published until we install a "master" configuration
(with the postCommit hook) on one of the slaves, and change the DNS record
for the master to point at that slave -- at which point it becomes the
new master.
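
The query side of this setup, a load balancer that simply stops using a slave that is down, can be sketched as a round-robin picker. The class and method names are hypothetical, and a real load balancer would run its own health checks rather than being told which slaves are down.

```python
from typing import Iterable, List, Set


class RoundRobinBalancer:
    """Hand out slaves in turn, skipping any that are marked down."""

    def __init__(self, slaves: Iterable[str]) -> None:
        self.slaves: List[str] = list(slaves)
        self.healthy: Set[str] = set(self.slaves)
        self._next = 0

    def mark_down(self, slave: str) -> None:
        self.healthy.discard(slave)

    def mark_up(self, slave: str) -> None:
        if slave in self.slaves:
            self.healthy.add(slave)

    def pick(self) -> str:
        """Return the next healthy slave, or raise if none are left."""
        for _ in range(len(self.slaves)):
            candidate = self.slaves[self._next % len(self.slaves)]
            self._next += 1
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy slaves available")
```

Because queries only ever go through `pick`, losing one slave reduces capacity but not availability. Losing the master affects updates only, as described above.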

I know the existing snapshooter/snappuller scripts in subversion don't
work on Windows, but one of the items on the task list is to try and come
up with equivalent methods that can -- if you have any ideas on how that
can be achieved that would be great!




-Hoss



RE: What is proper way to re-init index?

2006-03-24 Thread John Mohr
After some fiddling around, the base problem is that Solr accepts my new
schema and implies that the update went fine, but it didn't: no data is
written out. Not surprisingly, the cause seems to be that the schema for
some reason doesn't match my data. Reconfiguring with the example (old)
data (and old schema), it writes the data out just fine. Hmmm. It seems that
further investigation is warranted.
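
One quick check for a schema/data mismatch is to compare the field names used in the documents with the `<field>` names declared in the schema. A rough sketch: the function names are hypothetical, and it deliberately ignores dynamic fields and field types, so it can only flag names the schema never mentions.

```python
import xml.etree.ElementTree as ET
from typing import List, Set


def schema_field_names(schema_xml: str) -> Set[str]:
    """Collect the name attribute of every <field> element in the schema."""
    root = ET.fromstring(schema_xml)
    return {f.get("name") for f in root.iter("field") if f.get("name")}


def unknown_fields(schema_xml: str, add_xml: str) -> List[str]:
    """Return document field names that the schema does not declare."""
    known = schema_field_names(schema_xml)
    used = {
        f.get("name")
        for f in ET.fromstring(add_xml).iter("field")
        if f.get("name")
    }
    return sorted(name for name in used if name not in known)
```

An empty result does not prove the update will work, but a non-empty one points straight at the fields worth looking at first.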



RE: What is proper way to re-init index?

2006-03-24 Thread Chris Hostetter

: After some fiddling around the base problem is that it takes my new schema,
: implies that the update went fine, but it didn't. Of no great surprise, the
: problem is that the schema for some reason doesn't match my data. It doesn't
: write out any data. Reconfiguring with the example (old) data (and old schema)
: it writes it out just fine. Hmmm. It seems that further investigation is

Can you send a copy of your schema and an example of one "..."
file? ... we might be able to help spot the problem.



-Hoss