master/slave issue on solr
Hello All,

I have taken the following steps to configure master and slave servers. However, the slave doesn't seem to sync with the master. Please let me know what I have done wrong.

Both are the 2008-07-07 nightly version, on Ubuntu machines with Java 1.6.

On the master machine:

1) the scripts.conf file:

user=
solr_hostname=localhost
solr_port=8983
rsyncd_port=18983
data_dir=
webapp_name=solr
master_host=
master_data_dir=
master_status_dir=

2) Indexed some docs.

3) Then I issued the following commands:

./rsyncd-enable; rsyncd-start
./snapshooter

On the slave machine:

1) the scripts.conf file:

user=
solr_hostname=mastereserver.companyname.com
solr_port=8080
rsyncd_port=18983
data_dir=
webapp_name=solr
master_host=localhost
master_data_dir=/root/masterSolr/apache-solr-nightly/example/solr/data/
master_status_dir=/root/masterSolr/apache-solr-nightly/example/solr/logs/clients/

2) Then the following commands are issued:

./snappuller -P 18983
./snapinstaller
./commit

3) However, on stats.jsp it says numDocs=0 (on the slave machine).

Thanks for your time and suggestions,
ak
master/slave configuration
Hi All!

I am new to Solr and wanted to understand the master/slave setup in detail. I have configured master and slave servers but am still not clear about how it works. I have set up only one slave.

My understanding was that when a query is fired at the master server, the master passes it on to the slave server, which processes the query; but in my case the master itself is serving the query.

I set up the master and slave by manually running all the scripts. Data is correctly indexed on both the master and the slave.

Is any change required in solrconfig.xml to actually make them update servers and query servers?

Thanks in advance!!
Re: master/slave configuration
Hi Pragati,

A query fired on the master will only run on the master; you need to query the master and slaves separately. Usually people put a load balancer in front of the slaves to distribute queries, the master is used only for indexing, and the replication scripts automatically sync the slaves with the new data once a commit/optimize is done.

Take a look at http://wiki.apache.org/solr/CollectionDistribution for more details.

On Thu, Sep 4, 2008 at 4:19 PM, Pragati Jain <[EMAIL PROTECTED]> wrote:
> I am new to Solr. I wanted to know in detail about master/slave setup.
> [remainder of original message quoted above]

-- Regards, Shalin Shekhar Mangar.
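A minimal sketch of the load-balancer idea Shalin describes, assuming HAProxy and two slaves; the host names and the ping handler path are illustrative, not taken from the thread:

```conf
# haproxy.cfg fragment: round-robin search queries across two Solr slaves.
# Host names and ports are hypothetical examples.
frontend solr_search
    bind *:8983
    default_backend solr_slaves

backend solr_slaves
    balance roundrobin
    # Health-check each slave via Solr's ping handler
    option httpchk GET /solr/admin/ping
    server slave1 slave1.example.com:8983 check
    server slave2 slave2.example.com:8983 check
```

Indexing requests would still go directly to the master, bypassing the balancer.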
RE: master/slave configuration
Thanks :)

-Original Message-
From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 04, 2008 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: master/slave configuration
RE: Errors compiling latest solr 1.3 update
OK, that was the problem :-) I forgot to update the WebContent libs.

Thanks all

-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 03, 2008 21:04
To: solr-user@lucene.apache.org
Subject: Re: Errors compiling latest solr 1.3 update

Did you run clean first? Can you share the errors? Note, it compiles for me.

On Sep 3, 2008, at 2:15 PM, <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> wrote:
> Hi Shalin,
> I too think it is a problem with the jar files, but I downloaded the lib directory again and it still doesn't work.
> This is my SVN link: http://svn.apache.org/repos/asf/lucene/solr/trunk and I also tried http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.3/
>
> Is that correct?
>
> -Original Message-
> From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 03, 2008 18:56
> To: solr-user@lucene.apache.org
> Subject: Re: Errors compiling latest solr 1.3 update
>
> I can compile it successfully. The Lucene jars have been updated, so make sure you update the lib directory too.
>
> On Wed, Sep 3, 2008 at 9:30 PM, <[EMAIL PROTECTED]> wrote:
>> Hi all,
>>
>> First of all, sorry for my English.
>>
>> I'm not sure that it's a problem, but after the last update from SVN (Solr 1.3 dev) I can't compile the Solr source. I think it may be a problem with my workspace, but I'd like to be sure whether anyone else has the same problem.
>> The classes that have the problem are SnowballPorterFilterFactory and SolrCore.
>>
>> Thanks
>>
>> Raul
>
> --
> Regards,
> Shalin Shekhar Mangar.
feeding data
Hello,

Is there no other way than making XML files and feeding those to Solr? I just want to feed Solr programmatically, without XML.

Best.
Re: feeding data
On Sep 4, 2008, at 8:27 AM, Cam Bazz wrote:
> Is there no other way than making XML files and feeding those to Solr? I just want to feed Solr programmatically, without XML.

There are several options. You can feed Solr XML or CSV, or use any of the Solr client APIs (those use XML under the covers for indexing documents, but transparently). A more advanced option is to use Solr in embedded mode, where you use its Java API directly with no intermediate representation needed.

Erik
Re: feeding data
Cam Bazz wrote:
> Is there no other way than making XML files and feeding those to Solr? I just want to feed Solr programmatically, without XML.

Check out the Solrj page: http://wiki.apache.org/solr/Solrj
RE: feeding data
Hi Cam,

You can also feed data through CSV files or directly from a database. Please have a look at http://wiki.apache.org/solr/#head-98c3ee61c5fc837b09e3dfe3fb420491c9071be3

-Original Message-
From: Cam Bazz [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 04, 2008 5:58 PM
To: solr-user@lucene.apache.org
Subject: feeding data
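To make the SolrJ route concrete, here is a minimal indexing sketch against the 1.3-era client API. The Solr URL and field names are illustrative, and the solrj jars (plus their dependencies) must be on the classpath:

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FeedSolr {
    public static void main(String[] args) throws Exception {
        // URL of a running Solr instance; adjust for your setup.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");           // field names assume the example schema
        doc.addField("title", "Hello Solr");

        server.add(doc);     // SolrJ builds the update request for you
        server.commit();     // make the document searchable
    }
}
```

No hand-written XML is needed; SolrJ serializes the document internally.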
Re: master/slave issue on solr
On your slave, solr_hostname should be localhost, and master_host should be the hostname of your master server.

Check out the following wiki page for a full description of the variables in scripts.conf: http://wiki.apache.org/solr/SolrCollectionDistributionScripts

Bill

On Thu, Sep 4, 2008 at 4:46 AM, dudes dudes <[EMAIL PROTECTED]> wrote:
> I have taken the following steps to configure master and slave servers. However, the slave doesn't seem to sync with the master...
> [remainder of original message quoted above]
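Applying Bill's advice to the slave's scripts.conf from the original post would give something like the following (the master hostname is taken from the poster's own config and may of course differ in reality):

```conf
# scripts.conf on the slave: solr_hostname points at the slave itself,
# master_host points at the master server.
user=
solr_hostname=localhost
solr_port=8080
rsyncd_port=18983
data_dir=
webapp_name=solr
master_host=mastereserver.companyname.com
master_data_dir=/root/masterSolr/apache-solr-nightly/example/solr/data/
master_status_dir=/root/masterSolr/apache-solr-nightly/example/solr/logs/clients/
```

With that in place, snappuller on the slave pulls snapshots from master_host rather than from itself.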
Luke handler questions
Hi,

I'm looking at an index with the Luke handler and see something that makes no sense to me. For the "itemid" field it reports, roughly:

  type: string
  docs: 1138826
  distinct: 1138826
  topTerms: INBMA00134320080901 (2) ...

Note how docs # == distinct #. That looks good and makes sense: each document has a unique "itemid". But then look at topTerms. What does the number "2" represent there? I thought it was the term frequency. If so, then the above says there are 2 documents with itemid=INBMA00134320080901, and that conflicts with docs # == distinct #.

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Re: Luke handler questions
On Thu, Sep 4, 2008 at 1:26 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Note how docs # == distinct #. That looks good and makes sense - each > document has a unique "itemid". But then look at topTerms. What does number > "2" represent there? I thought it was the term frequency. If so, then the > above says there are 2 documents with itemid=INBMA00134320080901 and that > conflicts with docs # == distinct #. Remember that the Lucene term frequency does not take into account deleted documents. So in this case, INBMA00134320080901 was probably overwritten. -Yonik
Solr Slaves Sync
Hello,

We have a 3-server Solr replication setup: one master and 2 slaves. Commits are done every 5 minutes on the master, an optimize is done once a day at midnight, and snapshots are copied to the slaves via rsync every 10 minutes. We are facing serious problems doing the sync after the optimize while keeping the slaves serving queries as usual: active connections to the slaves increase sharply while the post-optimize snapshot syncs. Is there any way we can tune this process?

We try this process:

1. stop the sync process on one slave
2. take the other slave out of the LB pool
3. do the sync on this offline slave
4. after the sync is over, add the synced slave back to the LB pool
5. take the other slave out of the LB pool
6. start the sync process on this now-offline slave
7. add the synced slave back to the LB pool

Following these steps, we sometimes still see high active connections when moving slaves back into the LB pool. Has anybody faced this situation in production environments?

Thanks,
Pablo
Re: Solr Slaves Sync
As far as I can tell, there is no need to remove a slave from the pool while performing the sync. It's all done in the background and doesn't change anything until the final commit is run to open a new searcher.

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Sep 4, 2008, at 10:46 AM, OLX - Pablo Garrido wrote:
> We have a 3 Solr Servers replication schema, one Master and 2 Slaves...
> [remainder of original message quoted above]
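If the post-optimize transfer itself is what starves query traffic (the optimized snapshot is a near-full copy of the index), one knob worth trying — an assumption on my part, not something from this thread — is rsync's bandwidth cap on the snapshot copy, whether snappuller runs it or you run it by hand:

```shell
# Throttle the snapshot copy so the large post-optimize transfer
# doesn't starve query traffic on the slave.
# Module name, snapshot name, and the 5000 KB/s cap are illustrative.
rsync -avW --delete --bwlimit=5000 \
    rsync://master.example.com:18983/solr/snapshot.20080904103000 \
    /var/solr/data/
```

The trade-off is a longer sync window in exchange for steadier query latency during it.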
Question about Data Types
Hi,

I have a use case where I need to define my own data type (Money). Will something like this work? Are there any issues with this approach?

Schema.xml

Thanks,
Raghu

PS: We are using the trunk version of Solr.
Solr 1.3 RC 2
A Solr 1.3 release candidate is available at http://people.apache.org/~gsingers/solr/1.3-RC2/ Note, this is NOT an official release, but is pretty close. Thus, if you have the time and inclination, please download and provide feedback, preferably on solr-dev as to any issues you have. You may find CHANGES.txt (https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.3/CHANGES.txt ) helpful in understanding what's different in 1.3. Cheers, Grant
Questions on compound file format
Hi,

What are the benefits/drawbacks of using the compound file format (useCompoundFile=true)? From searching through the Solr and Lucene wiki pages:

1. Using the compound file format reduces the number of file descriptors needed. Any other benefits?
2. Indexing may be slower. What about query performance?
3. Since Lucene 1.4, the compound file format has been the default; however, Solr's default is not to use it. Why this inconsistency?

-- Regards, Shalin Shekhar Mangar.
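For reference, the setting in question lives in the index sections of solrconfig.xml; flipping it to true makes Lucene write combined .cfs segment files instead of separate per-extension files, which is where the file-descriptor savings come from:

```xml
<!-- solrconfig.xml fragment: enable Lucene's compound file format -->
<mainIndex>
  <useCompoundFile>true</useCompoundFile>
  <!-- other index settings (mergeFactor, maxBufferedDocs, ...) unchanged -->
</mainIndex>
```

The same flag also appears under indexDefaults in the example config.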
Re: solr is highlighting wrong words
Researching more, it was already an issue. Sorry for the inconvenience. http://issues.apache.org/jira/browse/SOLR-42

Pako

Francisco Sanmartin wrote:

Highlighting in Solr shows strange behavior on some items. I attach an example to see if anyone can shed some light on it. Basically Solr is highlighting the wrong words. I'm looking for the word "car" and I tell Solr to highlight it with <strong> and </strong> tags. The response is OK in most cases, but some items come back with the wrong words highlighted. I attach an example at the bottom. The problem in this example is that it highlights the word "his", but the search word is "car".

This is the scenario:

Solr 1.2

The url: http://solr-server:8983/solr/select/?q=id:11439968%20AND%20description%3Acar&hl=on&hl.fl=description&hl.simple.pre=%3Cstrong%3E&hl.simple.post=%20%3C%2Fstrong%3E

The query, in plain form: id:11439968 AND description:car, highlighting on the description field (I query with the id to obtain the item that is failing in highlighting, so everything is clearer).

The response:

...
11439968
...
This is a one of a kind all custom '95 Integra LS with 2005 TSX headlight and tailight conversion. It has GSR all black interior, 18 inch rims, strut bars, cd changer, coil overs, HID headlights, catback exhaust, intake, new clutch and brakes. Motor has 130,000 miles. No smoke or leaks. Runs great. This car is completly shaved. Paint is a two toned black/white with white ice flake. It is flawless and ready to show. This car has not even seen winter after being built! It is stored in a garage all year. Serious inquires only (203)994-0085. OR Email [EMAIL PROTECTED] $8,500 OR BEST OFFER!
...

The highlighting snippet returned:

back exhaust, intake, new clutch and brakes. Motor has 130,000 miles. No smoke or leaks. Runs great. This

The schema (relevant parts):
<!-- analyzer chain reconstructed from the surviving attributes; the XML tags
     were stripped by the mail archive -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Thanks in advance.

Pako
handling multiple resources with a single requestHandler
Hi,

Any ideas on how we could register a single request handler to handle multiple (wildcarded) contexts/resource URIs? (something like):

The current logic in SolrDispatchFilter / RequestHandlers registers a single (context <-> handler) mapping and obviously doesn't allow wildcarding. However, such a feature could be quite useful in situations where we have a single app/handler handling multiple contexts. (If there are only a few, the ugly way would be to just register multiple entries pointing to the single handler; but in some situations - like when there is a numeric mid-argument, e.g. "/app/3/query" - even that is impossible.)

The only way I can do it right now is by modifying SolrDispatchFilter, manually adding request-context trimming there (reducing the requested context to "/app/"), and registering a handler for that context (which would later resolve the other parts of it). But if there is another way to do this without changing the code, I would be more than happy to learn about it :)

(Actually there is a pathPrefix property used in that part of the code - but it does the complete opposite of what is needed in this case :( )

Thanks,
.Alek
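For context, a hypothetical sketch of what the asker seems to want in solrconfig.xml; note that stock Solr (as of 1.3) does not support this, and both the wildcard syntax and the handler class below are invented for illustration:

```xml
<!-- NOT valid in stock Solr: wildcard handler paths are hypothetical -->
<requestHandler name="/app/*/query" class="my.pkg.AppRequestHandler"/>
```

Today each requestHandler name must match the request path exactly, which is why the poster resorted to patching SolrDispatchFilter.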
Re: distributed search mechanism
2008/8/31 Grégoire Neuville <[EMAIL PROTECTED]>

> Hi all,
>
> I've recently been working with the distributed search capabilities of Solr to build a web portal; all is working fine, but it is now time for me to describe my work from a "theoretical" point of view.
>
> I've been trying to figure out the distributed search mechanism, first by browsing the code, but it's too complex for me; then by reading the JIRA comments accompanying the commits, where I found this:
>
> ***
> The search request processing on the set of shards is performed as follows:
>
> STEP 1: The query is built, terms are extracted. Global numDocs and docFreqs are calculated by requesting all the shards and adding up numDocs and docFreqs from each shard.
>
> STEP 2: (FirstQueryPhase) All shards are queried. Global numDocs and docFreqs are passed as request parameters. All document fields are NOT requested, only document uniqFields and sort fields are requested. MoreLikeThis and Highlighting information are NOT requested.
>
> Etc...
> ***
>
> This is typically the kind of description I need, but I wonder if the one cited above is still valid (since it was apparently written quite a while before the final commit).

The main steps remain the same, but the details have changed a lot; global TF/IDF is not supported yet.

> Assuming it is, what's then the difference between the STEPS mentioned and the STAGES later introduced (STAGE_START, STAGE_PARSE_QUERY, etc.)?
>
> How is the ranking of the documents in the merged set of responses calculated (especially when sorting on a field)?

Generally speaking: in the first phase, only the documents' uniqFields and sort fields are requested, so the documents can be merged according to the sort fields; they are then refetched (getting all the fields needed) by uniqField.

> Finally, is the order of the parameters in the query significant in a distributed search case?
> (i.e., is there a difference between:
> - http://server1:port1/solr1/?q=title:blah&shards=server1:port1/solr1,server1:port1/solr2
> and
> - http://server1:port1/solr1/?shards=server1:port1/solr1,server1:port1/solr2&q=title:blah
> ?
> (This last question is more related to the distributed deadlock topic on the wiki: my understanding is that in my first example the "title:blah" query is sent as a top-level query to solr1 and as a "shard query" to both solr1 and solr2 (deadlock risk); while in the second example, "title:blah" is not sent to solr1 as a top-level query. Am I right?))

There is no difference between the two queries above, since all parameters are put into a map. The search is not executed at the top level itself; it is only done on the shards in the list. The query sent to each shard has an isShard flag added, so the shards just do the search without forwarding the query to further shards.

> That's a lot of questions and maybe too long a post: sorry.
>
> Thanks a lot if you feel the courage to answer,

The answer above is just my understanding, not official :)

> --
> Grégoire Neuville
Re: Building a multilevel query.
: I want to do a query that first queries on one specific field and
: for all those that match the input do a second query.
:
: For example if we have a "type" field where one of the options
: is "user" and a "title" field includes the names of the users.
:
: So I want to find all data with "type" field = user where the name
: Erik is in the title field.

I suspect I must be misunderstanding your question, because it sounds like you just want...

...?q=title:Erik&fq=type:user

-Hoss
Re: Question about Data Types
: I have a use case where I need to define my own datatype (Money).
: Will something like this work? Are there any issues with this approach?

Assuming you have implemented a Java class named "Money" in the package "xyz" and you are subclassing the FieldType class -- then yes, you can implement any sort of FieldType you want. You can even subclass something like SortableFloatType to reuse a lot of the existing code if that's useful to you. But I'm not sure if that's what you are asking.

You could also do something like this... now you've got a fieldType called "money" that you can refer to in other fields in your schema, and you don't have to write any Java code -- assuming all you really care about is storing a floating point value.

It really depends on what it is you mean by saying you want your own datatype.

-Hoss
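The schema snippet Hoss refers to with "something like this..." did not survive the mail archive; given the surrounding text, it was presumably along these lines (a sketch, not the original):

```xml
<!-- Reconstructed sketch: a "money" fieldType backed by the built-in
     sortable float type, requiring no custom Java code. -->
<fieldType name="money" class="solr.SortableFloatField" sortMissingLast="true"/>

<!-- Example field using it; the field name is illustrative -->
<field name="price" type="money" indexed="true" stored="true"/>
```

This trades a real Money abstraction (currency, precision) for the simplicity of a plain sortable float.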
Re: scoring individual values in a multivalued field
Hi,

I ran into the same problem some time ago; I couldn't find any relation between the boost values on the multivalued field and the search results. Does anybody have an idea how to handle this?

Thanks, Jaco.

2008/8/29 Sébastien Rainville <[EMAIL PROTECTED]>

> Hi,
>
> I have a multivalued field that I want to score individually for each value. Is there an easy way to do that?
>
> Here's a concrete example of what I'm trying to achieve:
>
> Let's say that I have 3 documents with a field "name_t" and a multivalued field "caracteristic_t_mv":
>
> Dog
>   Cool
>   Big
>   Dirty
>
> Cat
>   Small
>   Dirty
>
> Fish
>   Smells
>   Dirty
>
> If I query only the field caracteristic_t_mv for the value "Dirty", I would like the documents to be sorted accordingly => get 1-3-2.
>
> It's possible to set the boost of a field when indexing, but there are 2 problems with that:
> 1) the boost of the field is actually the multiplication of the boost values of the different fields with the same name;
> 2) the field norm is persisted as a byte in the index and the precision loss hurts.
>
> Thanks in advance,
> Sebastien