Solr Replication

2009-08-25 Thread J G

Hello,

We are running multiple slices in our environment. I have enabled JMX and I am
inspecting the replication handler MBean to obtain information about the
master/slave configuration for replication. Is the replication handler MBean a
singleton? I only see one MBean for the entire server, and it seems to pick an
arbitrary slice to report on. Does every slice get its own replication handler
MBean? This matters because, on this server, I currently have no way of
getting any information about the other slices, in particular their
master/slave values.

Reading through the Solr 1.4 replication strategy, I saw that a slice can be
configured to be both a master and a slave, i.e. a repeater. I'm wondering how
repeaters work. Say I have a slice named 'A' whose master is on server 1 and
whose slave is on server 2: how do these two servers communicate to replicate?
In the JMX information for my repeater, both isSlave and isMaster are set to
true in the MBean, so how does this Solr slice know whether it's the master or
the slave? I'm a bit confused.

Thanks.





RE: Solr Replication

2009-08-26 Thread J G

Thanks for the response.

It's interesting because when I run jconsole all I can see is one
ReplicationHandler JMX MBean. It looks like it is defaulting to the first slice
it finds on its path. Is there any way to have multiple replication handlers, or
at least to obtain replication information per "slice"/"instance" via JMX, the
way you can see attributes for each "slice"/"instance" on its replication admin
JSP page?

Thanks again.

> From: noble.p...@corp.aol.com
> Date: Wed, 26 Aug 2009 11:05:34 +0530
> Subject: Re: Solr Replication
> To: solr-user@lucene.apache.org
> 
> The ReplicationHandler is not enforced as a singleton, but for all
> practical purposes it is a singleton for one core.
>
> If an instance (a slice, as you say) is set up as a repeater, it can
> act as both a master and a slave.
>
> For a repeater, the configuration should be as follows:
> 
> MASTER
>   |_SLAVE (I am a slave of MASTER)
>   |
> REPEATER (I am a slave of MASTER and master to my slaves )
>  |
>  |
> REPEATER_SLAVE( of REPEATER)
> 
> 
> The point is that REPEATER will have a slave section with a masterUrl
> that points to MASTER, and REPEATER_SLAVE will have a slave section
> with a masterUrl pointing to REPEATER.
>
> -- 
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
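
The chain described in the quoted reply can be sketched in a few lines of
Python. This is only an illustration of who pulls from whom: the host names and
the dictionary layout are hypothetical, and only the MASTER, SLAVE, REPEATER,
and REPEATER_SLAVE roles come from the reply.

```python
# Hypothetical hosts; each entry mirrors which replication sections a core declares.
# "master": True  -> the core has a master section (others may pull from it)
# "masterUrl"     -> set when the core has a slave section (it pulls from that URL)
topology = {
    "MASTER": {"master": True, "masterUrl": None},
    "SLAVE": {"master": False,
              "masterUrl": "http://master.example.com:8080/solr/replication"},
    "REPEATER": {"master": True,
                 "masterUrl": "http://master.example.com:8080/solr/replication"},
    "REPEATER_SLAVE": {"master": False,
                       "masterUrl": "http://repeater.example.com:8080/solr/replication"},
}


def role(config):
    """Classify a node from the replication sections it declares."""
    is_master = config["master"]
    is_slave = config["masterUrl"] is not None
    if is_master and is_slave:
        return "repeater"
    if is_master:
        return "master"
    if is_slave:
        return "slave"
    return "standalone"


for name, config in topology.items():
    print(name, "is a", role(config), "and pulls from", config["masterUrl"])
```

Read this way, a repeater does not have to choose a role: it pulls from its
own masterUrl and serves whichever slaves point their masterUrl at it, which
would explain why both isMaster and isSlave show as true.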


master/slave replication issue

2009-08-26 Thread J G







Hello,

I'm having an issue getting the master to replicate its index to the slave.
Below you will find my configuration settings. Here is what is happening: I can
access the replication dashboard for both the slave and the master, and I can
successfully execute HTTP commands against both of these URLs through my
browser. My slave is configured to use the same URL as the one I use in my
browser when I query the master, yet when I run tail -f /logs/catalina.out on
the slave server, all I see is:


Master - server1.xyz.com
Aug 27, 2009 12:13:29 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=8
Aug 27, 2009 12:13:32 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=8
Aug 27, 2009 12:13:34 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=4
Aug 27, 2009 12:13:36 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=4
Aug 27, 2009 12:13:39 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=4
Aug 27, 2009 12:13:42 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=8
Aug 27, 2009 12:13:44 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=null path=null params={command=details} status=0 QTime=


For some reason, the webapp and the path are being set to null, and I think
this is affecting the replication. I am running Solr from the WAR file, and it's
a 1.4 build from a few weeks ago.






optimize


optimize





Notice that I commented out the replication of the configuration files. I
didn't think this was important for getting replication working. However, is
it good practice to have these files replicated?


Slave - server2.xyz.com





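<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://server1.xyz.com:8080/jdoe/replication</str>
    <str name="pollInterval">00:00:20</str>
    <str name="compression">internal</str>
    <str name="httpConnTimeout">5000</str>
    <str name="httpReadTimeout">1</str>
    <str name="httpBasicAuthUser">username</str>
    <str name="httpBasicAuthPassword">password</str>
  </lst>
</requestHandler>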

 




Thanks for your help!
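
The browser checks described in this message can also be scripted. The sketch
below only builds the URLs and does not contact any server. It assumes the
slave exposes its handler at the same port and path as the master, which the
post does not state. The details, indexversion, and fetchindex commands are
HTTP commands of the Solr 1.4 ReplicationHandler.

```python
from urllib.parse import urlencode


def replication_url(base_url, command, **params):
    """Build a ReplicationHandler URL such as .../replication?command=details."""
    query = urlencode({"command": command, **params})
    return f"{base_url}?{query}"


# masterUrl taken from the post
MASTER = "http://server1.xyz.com:8080/jdoe/replication"
# Assumption: the slave uses the same port and webapp path as the master
SLAVE = "http://server2.xyz.com:8080/jdoe/replication"

print(replication_url(MASTER, "indexversion"))  # which index version the master offers
print(replication_url(SLAVE, "details"))        # what the slave currently reports
print(replication_url(SLAVE, "fetchindex"))     # ask the slave to pull right away
```

Comparing the master's indexversion with the slave's details, and then
requesting fetchindex on the slave, may help separate a polling problem from a
connectivity or authentication problem.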





RE: Solr Replication

2009-08-27 Thread J G

We have multiple Solr webapps all running from the same WAR file. Each webapp
runs under the same Tomcat container, and I consider each webapp the same
thing as a "slice" (or "instance"). I've configured the Tomcat container to
enable JMX, and when I connect using JConsole I only see the replication handler
for one of the webapps in the server. I was under the impression that each
webapp gets its own replication handler. Is this not true?

It would be nice to have a JMX MBean for each replication handler in the
container so we can get the same replication information through JMX as we get
from the replication admin page of each webapp.

Thanks.





> From: noble.p...@corp.aol.com
> Date: Thu, 27 Aug 2009 13:04:38 +0530
> Subject: Re: Solr Replication
> To: solr-user@lucene.apache.org
> 
> When you say a slice, do you mean one instance of Solr? So your JMX
> console is connecting to only one Solr?
> 

solr jmx connection

2009-07-10 Thread J G

Hello,

I have a SOLR JMX connection issue. I am running my JMX MBeanServer through
Tomcat, meaning I am using Tomcat's MBeanServer rather than any other
MBeanServer implementation.
I am having a hard time figuring out the correct JMX service URL on my
localhost for accessing the SOLR MBeans. My current configuration consists
of the following:

JMX Service url = localhost:9000/jmxrmi

So I have configured JMX to run on port 9000 on Tomcat on my localhost, and
using the above service URL I can access the Tomcat JMX MBeanServer and get
related JVM object information (e.g. I can access the MemoryMXBean object).

However, I am having a harder time accessing the SOLR MBeans. First, I
could have the wrong service URL. Second, I'm confused as to which MBeans SOLR
provides.

You might be asking why I am creating my own client rather than using JConsole;
the reason is that JConsole doesn't provide the features I need.

Any knowledge or code snippets would be a huge help!

Thank you for your time!

Regards




JMX monitoring for multiple SOLR instances

2009-07-14 Thread J G

Hi,

If I want to run multiple SOLR WAR files in Tomcat, is it possible to monitor
each of the SOLR instances individually through JMX? Has anyone attempted this
before? Also, what are the implications (e.g. for performance) of running
multiple SOLR instances in the same Tomcat server?

Thanks.





Obtaining SOLR index size on disk

2009-07-17 Thread J G

Hello,

Is it possible to obtain the SOLR index size on disk through the SOLR API? I've 
read through the docs and mailing list questions but can't seem to find the 
answer.

Any help is appreciated.

Thanks.




Solr Spellcheck on Large index size

2010-04-27 Thread Kyle J G

I am trying to create a spell checker for my company's website.

Currently there are approximately 29 million documents in the index.

When I try to create the spelling index, Solr just seems to skip over the
command.

My fields in schema.xml look like the following:

 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

And copying fields as such: 
   
   
   
   
   


My spell checker config looks like the following: 






  default
  spell
  true
  true
  C:\Users\kyleg\apache-solr-1.4.0\productGroups\solr\data\spellchecker






  solr.FileBasedSpellChecker
  file
  spellings.txt
  UTF-8
  ./spellcheckerFile

  


The command that I am sending to try to build looks like the following:
http://localhost:8983/solr/spell/?q=ACORA&version=2.2&start=0&rows=10&indent=on&spellcheck=true&spellcheck.dictionary=default&spellcheck.build=true&spellcheck.collate=true&spellcheck.limit=5


I have also tried to reduce the size of the index to around 10,000 documents
and still no luck.

Any help would be appreciated.

Thank you,
Kyle


Filter multivalue fields from search result

2010-07-08 Thread Alex J. G. Burzyński
Hi,

Is it possible to remove from search results the multivalued fields that
don't pass the search criteria?

My schema is defined as:










And example docs are:

+----+----------------------+------------+------------+
| id | name                 | town       | date       |
+----+----------------------+------------+------------+
| 1  | Microsoft Excel      | London     | 2010-08-20 |
|    |                      | Glasgow    | 2010-08-24 |
|    |                      | Leeds      | 2010-08-28 |
| 2  | Microsoft Word       | Aberdeen   | 2010-08-21 |
|    |                      | Reading    | 2010-08-25 |
|    |                      | London     | 2010-08-29 |
| 3  | Microsoft Powerpoint | Birmingham | 2010-08-22 |
|    |                      | Leeds      | 2010-08-26 |
+----+----------------------+------------+------------+

So the query q=name:Microsoft town:Leeds returns docs 1 & 3.

How would I remove London/Glasgow from doc 1 and Birmingham from doc 3?

Or should I create a separate doc for each name-event pair?

Thanks,
Alex


Re: Filter multivalue fields from search result

2010-07-12 Thread Alex J. G. Burzyński

Hi,

So if those are separate documents, how should I handle paging? With two
separate queries?
The first to return all matching course-event pairs, and the second to get the
courses for a given page?


Is this common design described in detail somewhere?

Thanks,
Alex

On 2010-07-09 01:50, Lance Norskog wrote:

Yes, denormalizing the index into separate (name,town) pairs is the
common design for this problem.
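
A minimal sketch of that denormalization, assuming each course carries
parallel multivalued town and date lists. The field names come from the
example table; the "course-event" id scheme (1-1, 1-2, ...) and the helper
name are illustrative assumptions, not something Solr provides.

```python
def denormalize(doc):
    """Split one course with parallel town/date lists into one doc per event."""
    return [
        {"id": f"{doc['id']}-{n}", "name": doc["name"], "town": town, "date": date}
        for n, (town, date) in enumerate(zip(doc["town"], doc["date"]), start=1)
    ]


# Doc 1 from the example table
excel = {
    "id": 1,
    "name": "Microsoft Excel",
    "town": ["London", "Glasgow", "Leeds"],
    "date": ["2010-08-20", "2010-08-24", "2010-08-28"],
}

pairs = denormalize(excel)
for pair in pairs:
    print(pair)
```

Each resulting document holds exactly one town and one date, so a match on
town:Leeds returns only the Leeds event rather than the whole course.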



Re: Filter multivalue fields from search result

2010-07-12 Thread Alex J. G. Burzyński

Hi Chantal,

The paging problem I asked about is that, with course-event pairs,
specifying rows limits the number of pairs returned, not the number of courses.


+-------+----------------------+------------+------------+
| id-id | name                 | town       | date       |
+-------+----------------------+------------+------------+
| 1-1   | Microsoft Excel      | London     | 2010-08-20 |
| 1-2   | Microsoft Excel      | Glasgow    | 2010-08-24 |
| 1-3   | Microsoft Excel      | Leeds      | 2010-08-28 |
| 2-1   | Microsoft Word       | Aberdeen   | 2010-08-21 |
| 2-2   | Microsoft Word       | Reading    | 2010-08-25 |
| 2-3   | Microsoft Word       | London     | 2010-08-29 |
| 3-1   | Microsoft Powerpoint | Birmingham | 2010-08-22 |
| 3-2   | Microsoft Powerpoint | Leeds      | 2010-08-26 |
| 3-3   | Microsoft Powerpoint | Leeds      | 2010-08-30 |
+-------+----------------------+------------+------------+


From the UI point of view I'm returning fewer courses than events, which is
why I asked about paging.


The search for q=name:Microsoft town:Leeds with rows=2 should return:
1-3 & 3-2 & 3-3

But 3-3 will obviously be on page 2.

I hope that it makes my questions more clear.
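
One way to picture the two-step approach is to group the matching pairs by
course on the client and page over courses instead of pairs. The sketch below
assumes all matching pairs have already been fetched and that the course id is
the part of the pair id before the hyphen; the function name is hypothetical
and this is not a built-in Solr feature.

```python
# The pairs this message expects for q=name:Microsoft town:Leeds
matches = [
    {"id": "1-3", "name": "Microsoft Excel", "town": "Leeds", "date": "2010-08-28"},
    {"id": "3-2", "name": "Microsoft Powerpoint", "town": "Leeds", "date": "2010-08-26"},
    {"id": "3-3", "name": "Microsoft Powerpoint", "town": "Leeds", "date": "2010-08-30"},
]


def page_by_course(pairs, page, courses_per_page):
    """Group pairs by course id (text before '-') and return one page of courses."""
    grouped = {}
    for pair in pairs:
        course_id = pair["id"].split("-")[0]
        grouped.setdefault(course_id, []).append(pair)
    course_ids = list(grouped)  # dicts keep insertion order, so result order is kept
    start = page * courses_per_page
    return [grouped[c] for c in course_ids[start:start + courses_per_page]]


first_page = page_by_course(matches, page=0, courses_per_page=2)
for events in first_page:
    print(events[0]["name"], [event["id"] for event in events])
```

With two courses per page, both Leeds events of the Powerpoint course stay on
the first page together, at the cost of retrieving every matching pair up front.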

Thanks,
Alex


On 2010-07-12 10:26, Chantal Ackermann wrote:

Hi Alex,

I think you have to explain the complete use case. Paging is done by
specifying the parameter "start" (and "rows" if you want to have more or
less than 10 hits per page). For each page you need of course a new
query, but the queries differ only in the parameter value "start" (first
page start=0, second page start=10 etc. if rows=10). The other
parameters remain the same.
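
As a small illustration of the paragraph above, the helper below builds the
query URL for a given page by changing only the start value. The base URL and
the function name are assumptions made for this sketch; start and rows are the
parameters named in the text.

```python
from urllib.parse import urlencode

# Assumed select URL of a local Solr instance
BASE = "http://localhost:8983/solr/select"


def page_query(base_url, q, page, rows=10):
    """Return the query URL for a zero-based page number."""
    params = {"q": q, "start": page * rows, "rows": rows}
    return f"{base_url}?{urlencode(params)}"


query = "name:Microsoft town:Leeds"
print(page_query(BASE, query, page=0))  # first page: start=0
print(page_query(BASE, query, page=1))  # second page: start=10
```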

You should also have a look at facets. They might help you to get a list
of the values of your multi valued fields that you can display in the
UI, allowing the user to drill down the results further.

Chantal
