Embedded Solr updates not showing until restart

2009-01-19 Thread edre...@ha

Hi,

We're evaluating Solr for use in a web application.  I've got the web
application configured to use an embedded Solr instance for queries (set up
as a slave), and a remote instance for writes (set up as a master).

The replication scripts are running fine and the embedded slave does appear
to be receiving the updates, but the updates don't show up in queries
against the embedded slave until I restart the web application.  We're
using SolrJ as our interface to Solr.

Can anyone provide any insight into why updates don't show up until after a
webapp restart?

Thanks,
Erik
-- 
View this message in context: 
http://www.nabble.com/Embedded-Solr-updates-not-showing-until-restart-tp21546235p21546235.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Embedded Solr updates not showing until restart

2009-01-20 Thread edre...@ha




Grant Ingersoll-6 wrote:
> 
> Do they show up if you use non-embedded?  That is, if you hit that  
> slave over HTTP from your browser, are the changes showing up?
> 

Yes.  Changing the config to access the server over HTTP works fine.
Looking at our console logs for the Solr server, I can see no discernible
difference between the embedded and HTTP approaches.  The snapinstaller
appears to be working in both cases, but changes to the index don't show up
in queries when the slave is configured as embedded.
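
One plausible explanation (an assumption on my part, not confirmed against the Solr source): Solr only reopens its index searcher on a commit, and the snapinstaller issues its commit over HTTP to the standalone server it knows about.  An embedded instance never receives that commit, so its cached searcher keeps serving the old snapshot.  A toy model of that interaction:

```java
// Toy model (not Solr code) of a searcher that is only reopened on commit.
// snapinstall() represents a new snapshot landing on disk; commit() represents
// the commit that snapinstaller sends over HTTP -- which never reaches an
// embedded instance, leaving its cached searcher stale.
public class StaleSearcherModel {
    private int versionOnDisk = 1;   // index version installed by snapinstaller
    private int searcherVersion = 1; // version the cached searcher was opened on

    public void snapinstall() { versionOnDisk++; }                 // new snapshot on disk
    public void commit()      { searcherVersion = versionOnDisk; } // searcher reopened
    public boolean isStale()  { return searcherVersion < versionOnDisk; }

    public static void main(String[] args) {
        StaleSearcherModel embedded = new StaleSearcherModel();
        embedded.snapinstall();                 // replication installs a snapshot
        System.out.println(embedded.isStale()); // true: no commit reached us
        embedded.commit();                      // an explicit local commit would fix it
        System.out.println(embedded.isStale()); // false
    }
}
```

If this model is right, issuing an explicit commit on the embedded server after each snapinstall would make the updates visible without a restart.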

I'm moving forward with the HTTP approach, but the embedded approach is
desirable for two (obvious) reasons: 1) performance improvement, 2) simpler
deployment.

Thanks.



Master failover - seeking comments

2009-01-22 Thread edre...@ha

Hi,

We're looking forward to using Solr in a project.  We're using a typical
setup with one Master and a handful of Slaves.  We're using the Master for
writes and the Slaves for reads.  Standard stuff.

Our concern is with downtime of the Master server.  I read a few posts that
touched on this topic but didn't find anything substantive.  I've got a test
setup in place that appears to work, but I'd like to get some feedback.

Essentially, the plan is to add another Master server, so now we have M1 and
M2.  M1 and M2 are also configured as slaves of each other.  The plan is to
put a load balancer between the Slaves and the Master servers.  This way, if
M1 goes down, traffic is routed to M2 automatically.  Once M1 comes back
online, we'll route traffic back to it.  Because M1 and M2 replicate from
each other, all updates are captured.
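
The routing rule we want from the load balancer boils down to: prefer M1, fall back to M2 while M1's health check fails, and move back once it recovers.  A minimal sketch of that rule (hostnames and the health check are placeholders; in practice the load balancer does this for you):

```java
import java.util.function.Predicate;

// Sketch of the master-failover rule described above: route writes to the
// primary master unless its health check fails, otherwise fall back to the
// secondary. URLs are hypothetical; a real deployment delegates this to the LB.
public class MasterFailover {
    static String pickMaster(Predicate<String> isHealthy) {
        String m1 = "http://m1.example.com:8983/solr";
        String m2 = "http://m2.example.com:8983/solr";
        return isHealthy.test(m1) ? m1 : m2; // prefer M1, fail over to M2
    }

    public static void main(String[] args) {
        // Simulate M1 being down: its health check fails.
        System.out.println(pickMaster(url -> !url.contains("m1"))); // falls back to M2
        System.out.println(pickMaster(url -> true));                // M1 healthy again
    }
}
```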

To test this, I ran the following scenario.

1) Slave 1 (S1) is configured to use M2 as its master.
2) We push an update to M2.
3) We restart S1, now pointing to M1.
4) We wait for M1 to sync from M2
5) We then sync S1 to M1.  
6) Success!

However...

M1 and M2 generate snapshots every time they sync with each other, even if
no new data has been pushed to them.  We're concerned about this.

Is this even a problem?  
Are we stuck in some infinite sync loop between the two Master machines?  
Will this degrade performance of the Master machines over time?  
Is there anything else I should know about this setup?

Any insights, or alternative suggestions to this setup are quite welcome.

Thanks,
Erik
 



Re: Embedded Solr updates not showing until restart

2009-01-22 Thread edre...@ha



Grant Ingersoll-6 wrote:
> 
> Can you share your code?  Or reduce it down to a repeatable test?
> 

I'll try to do this.  For now I'm proceeding with the HTTP route.  We're
going to want to revisit this and I'll likely do it at that time.

Thanks,
Erik



Re: Master failover - seeking comments

2009-01-23 Thread edre...@ha

Thanks for the response. Let me clarify things a bit.

Regarding the Slaves:
Our project is a web application.  It is our desire to embed Solr into the
web application.  Each web application instance runs a local embedded Solr
instance configured as a slave, pointing at a remote Solr instance
configured as the master.

We have a requirement for real-time updates to the Solr indexes.  Our
strategy is to use the local embedded Solr instance as a read-only
repository.  Any time a write is made, we send it to the remote Master.
Once a user pushes a write operation to the remote Master, all subsequent
read operations for that user are made against the Master for the duration
of the session.  This approximates "real-time" updates and seems to work
for our purposes.  Writes to our system are a small percentage of read
operations.
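
The session-scoped routing above can be sketched as a small router: reads go to the local embedded slave until the session performs a write, after which that session reads from the master until replication catches up.  `SearchBackend` here is a stand-in for whatever SolrJ server objects you actually use:

```java
// Sketch of the session-scoped read routing described above. SearchBackend
// stands in for the real SolrJ server objects (embedded slave / HTTP master).
public class SessionRouter {
    public interface SearchBackend { String name(); }

    private final SearchBackend embeddedSlave;
    private final SearchBackend remoteMaster;
    private boolean sessionHasWritten = false;

    public SessionRouter(SearchBackend slave, SearchBackend master) {
        this.embeddedSlave = slave;
        this.remoteMaster = master;
    }

    // Reads stay local until this session writes; afterwards they follow the
    // master so the user sees their own update before replication delivers it.
    public SearchBackend forRead() {
        return sessionHasWritten ? remoteMaster : embeddedSlave;
    }

    // All writes go to the master; remember that this session has written.
    public SearchBackend forWrite() {
        sessionHasWritten = true;
        return remoteMaster;
    }
}
```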

Now, back to the original question.  We're simply looking for a failover
solution if the Master server goes down.  Oh, and we are using the
replication scripts to sync the servers.



> It seems like you are trying to write to Solr directly from your front end
> application. This is why you are thinking of multiple masters. I'll let
> others comment on how easy/hard/correct the solution would be. 
> 

Well, yes.  We have business requirements calling for updates to Solr to be
real-time, or as close to that as possible, so when a user changes
something, our strategy is to save it to the DB and push it to the Solr
Master as well.  We will also have a background application that helps
ensure Solr stays in sync with the DB for times when Solr is down and the
DB is not.
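
That background catch-up amounts to a reconciliation pass: compare last-modified timestamps in the DB against what Solr has indexed, and re-push anything Solr missed while it was down.  A sketch under that assumption (the two maps stand in for the database and the Solr index):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the DB->Solr catch-up pass described above. The maps stand in for
// the database (source of truth) and the Solr index; values are last-modified
// timestamps. Anything newer in the DB than in Solr gets re-pushed.
public class SolrReconciler {
    static Map<String, Long> findMissedUpdates(Map<String, Long> db,
                                               Map<String, Long> solr) {
        Map<String, Long> toPush = new HashMap<>();
        for (Map.Entry<String, Long> row : db.entrySet()) {
            Long indexed = solr.get(row.getKey());
            if (indexed == null || indexed < row.getValue()) {
                toPush.put(row.getKey(), row.getValue()); // Solr is behind here
            }
        }
        return toPush;
    }
}
```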



> But, do you really need to have live writes? Can they be channeled through
> a
> background process? Since you anyway cannot do a commit per-write, the
> advantage of live writes is minimal. Moreover you would need to invest a
> lot
> of time in handling availability concerns to avoid losing updates. If you
> log/record the write requests to an intermediate store (or queue), you can
> do with one master (with another host on standby acting as a slave).
> 

We do need to have live writes, as I mentioned above.  The concern you
mention about losing live writes is exactly why we are looking at a Master
Solr server failover strategy.  We thought about having a backup Solr server
that is a Slave to the Master and could be easily reconfigured as a new
Master in a pinch.  Our operations team has pushed us to come up with a
solution that would be more seamless.  This is why we came up with a
Master/Master solution where both Masters are also slaves to each other.



>>
>> To test this, I ran the following scenario.
>>
>> 1) Slave 1 (S1) is configured to use M2 as its master.
>> 2) We push an update to M2.
>> 3) We restart S1, now pointing to M1.
>> 4) We wait for M1 to sync from M2
>> 5) We then sync S1 to M1.
>> 6) Success!
>>
> 
> How do you co-ordinate all this?
> 

This was just a test scenario I ran manually to see if the setup I described
above would even work.  

Is there a Wiki page that outlines typical web application Solr deployment
strategies?  There are a lot of questions on the forum about this type of
thing (including this one).  For those who have expertise in this area, I'm
sure there are many who could benefit from this (hint hint).

As before, any comments or suggestions on the above would be much
appreciated.

Thanks,
Erik