master/slave issue on solr

2008-09-04 Thread dudes dudes

Hello All, 

I have taken the following steps to configure master and slave servers.
However, the slave doesn't seem to sync with the master.
Please let me know what I have done wrong.

Both are the 2008-07-07 nightly build on Ubuntu machines with Java 1.6.

On the master machine:

1) the scripts.conf file .

user=
solr_hostname= localhost
solr_port= 8983
rsyncd_port= 18983
data_dir=
webapp_name= solr
master_host=
master_data_dir=
master_status_dir=

2) Indexed some docs

3) Then I issued the following commands..

./rsyncd-enable; ./rsyncd-start
./snapshooter

On the slave machine:

1) the scripts.conf file

user=
solr_hostname=mastereserver.companyname.com
solr_port=8080 
rsyncd_port=18983
data_dir=
webapp_name=solr
master_host=localhost
master_data_dir=/root/masterSolr/apache-solr-nightly/example/solr/data/
master_status_dir=/root/masterSolr/apache-solr-nightly/example/solr/logs/clients/

2) Then the following commands are issued:

./snappuller -P 18983
./snapinstaller
./commit 



3) However, on stats.jsp it says numDocs=0 (on the slave machine).

thanks for your time and suggestions 
ak






master/slave configuration

2008-09-04 Thread Pragati Jain
Hi All!

 

I am new to Solr. I wanted to know in detail about master/slave setup.

 

I have configured master and slave servers but am still not clear about how it
works. I have set up only one slave.

What I have understood is that when a query is fired at the master server,
the master will pass it to the slave server, which will process the query;
but in my case the master itself is serving the query.

 

I have set up master and slave by manually running all the scripts.
Data is correctly indexed in both master and slave servers.

 

Is there any change required in solrconfig.xml as well to actually make them
update servers and query servers?

 

Thanks in advance!! 



Re: master/slave configuration

2008-09-04 Thread Shalin Shekhar Mangar
Hi Pragati,

Query fired on master will only run on master. You need to query
master/slave separately. Usually, people use a load balancer in front of the
slaves to distribute queries and master is (usually) used only for indexing
and the replication scripts automatically sync the slave with the new data
once commit/optimize is done.

Take a look at http://wiki.apache.org/solr/CollectionDistribution for more
details.
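The arrangement Shalin describes, master for indexing and a load balancer rotating queries over the slaves, can be sketched as a toy round-robin picker. The host names are hypothetical and a real setup would use an actual hardware or software LB; this is only an illustration of the idea:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SlavePool {
    // Round-robin over the query slaves; the master never appears in this
    // pool -- it only receives updates and runs the replication scripts.
    private final List<String> slaves;
    private final AtomicInteger next = new AtomicInteger();

    public SlavePool(List<String> slaves) { this.slaves = slaves; }

    public String pick() {
        return slaves.get(Math.floorMod(next.getAndIncrement(), slaves.size()));
    }

    public static void main(String[] args) {
        SlavePool pool = new SlavePool(List.of("slave1.example.com", "slave2.example.com"));
        System.out.println(pool.pick()); // slave1.example.com
        System.out.println(pool.pick()); // slave2.example.com
    }
}
```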



-- 
Regards,
Shalin Shekhar Mangar.


RE: master/slave configuration

2008-09-04 Thread Pragati Jain
Thanks :) 




RE: Errors compiling latest solr 1.3 update

2008-09-04 Thread r.prieto
OK, that's the problem :-) I forgot to update the WebContent libs.

Thanks all

-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 3, 2008 21:04
To: solr-user@lucene.apache.org
Subject: Re: Errors compiling latest solr 1.3 update

Did you run clean first?

Can you share the errors?  Note, it compiles for me.

On Sep 3, 2008, at 2:15 PM, <[EMAIL PROTECTED]> wrote:

> Hi Shalin,
> I too think that it is a problem of jar files, but I downloaded the lib
> directory again and it doesn't work.
> This is my SVN link http://svn.apache.org/repos/asf/lucene/solr/trunk
> and I also tried
> http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.3/
>
>
> Is it correct?
>
> -Original Message-
> From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, September 3, 2008 18:56
> To: solr-user@lucene.apache.org
> Subject: Re: Errors compiling latest solr 1.3 update
>
> I can compile it successfully. The lucene jars have been updated, so  
> make
> sure you update the lib directory too.
>
> On Wed, Sep 3, 2008 at 9:30 PM, <[EMAIL PROTECTED]> wrote:
>
>> Hi all,
>>
>>
>>
>> First of all, sorry for my English.
>>
>>
>>
>> I'm not sure that it's a problem, but after the last update from
>> SVN (solr 1.3 dev) I can't compile the Solr source. I think it is a
>> problem with my workspace, but I'd like to be sure whether anyone else
>> has the same problem.
>>
>> The classes that have the problem are SnowballPorterFilterFactory and
>> SolrCore
>>
>> Thanks
>>
>>
>>
>> Raul
>>
>>
>
>
> -- 
> Regards,
> Shalin Shekhar Mangar.
>



feeding data

2008-09-04 Thread Cam Bazz
hello,
is there no other way than making XML files and feeding those to Solr?

I just want to feed Solr programmatically, without XML.

Best.


Re: feeding data

2008-09-04 Thread Erik Hatcher



There are several options. You can feed Solr XML or CSV, or use any
of the Solr client APIs (though those use XML under the covers for
indexing documents, transparently). A more advanced option is to use
Solr in embedded mode, where you use its Java API directly with no
intermediate representation needed.
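As a rough illustration of the programmatic route (this is not the SolrJ API, and the field values are made up), the update XML can be generated in code rather than hand-written in files:

```java
// Sketch: build a Solr <add> document programmatically instead of writing
// XML files by hand. Field names/values here are hypothetical examples.
import java.util.LinkedHashMap;
import java.util.Map;

public class AddDocBuilder {
    // Minimal XML escaping for field values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Render a map of field name -> value as a Solr <add><doc> message.
    static String toAddXml(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(e.getKey()).append("\">")
              .append(escape(e.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "11439968");
        doc.put("description", "catback exhaust & intake");
        System.out.println(toAddXml(doc));
    }
}
```

In practice a client library such as SolrJ does this marshalling (and the HTTP POST to the update handler) for you.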


Erik



Re: feeding data

2008-09-04 Thread Mark Miller


Check out the solrj page: http://wiki.apache.org/solr/Solrj


RE: feeding data

2008-09-04 Thread Pragati Jain
Hi Cam

You can also feed data through CSV files or directly from a database.

Please have a look:
http://wiki.apache.org/solr/#head-98c3ee61c5fc837b09e3dfe3fb420491c9071be3




Re: master/slave issue on solr

2008-09-04 Thread Bill Au
On your slave,

solr_hostname should be localhost
and
master_host should be the hostname of your master server

Check out the following Wiki for a full description of the variables in
scripts.conf:

http://wiki.apache.org/solr/SolrCollectionDistributionScripts

Bill



Luke handler questions

2008-09-04 Thread Otis Gospodnetic
Hi,

I'm looking at an index with the Luke handler and see something that makes no 
sense to me:


itemid:
  type: string
  docs: 1138826
  distinct: 1138826
  topTerms: INBMA00134320080901 (2)
Note how docs # == distinct #.  That looks good and makes sense - each document 
has a unique "itemid".  But then look at topTerms.  What does number "2" 
represent there?  I thought it was the term frequency.  If so, then the above 
says there are 2 documents with itemid=INBMA00134320080901 and that conflicts 
with docs # == distinct #.

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



Re: Luke handler questions

2008-09-04 Thread Yonik Seeley

Remember that the Lucene term frequency does not take into account
deleted documents.  So in this case, INBMA00134320080901 was probably
overwritten.

-Yonik


Solr Slaves Sync

2008-09-04 Thread OLX - Pablo Garrido
Hello

We have a replication schema with 3 Solr servers, one Master and 2 Slaves.
Commits are done every 5 minutes on the Master and an optimize is done
once a day at midnight; snapshots are copied via rsync to the Slaves
every 10 minutes. We are facing serious problems with doing the sync
after the optimize while keeping the Slaves serving queries as usual:
active connections to the Slaves increase sharply during the
optimize-snapshot sync. Is there any way we can tune this? We tried this process:

1. stopping Sync process on one Slave
2. taking the other one out of the LB pool
3. do the sync on this offline Slave
4. after sync is over add back to LB Pool synced Slave
5. take other Slave out from LB Pool
6. start sync process on the offline Slave
7. add back synced Slave to LB Pool

Following these steps, we sometimes face high active connections when
moving Slaves back into the LB Pool. Has anybody faced this situation in
production environments? Thanks

Pablo 



Re: Solr Slaves Sync

2008-09-04 Thread Matthew Runo
As far as I can tell, there is no need to remove a slave from the pool
while performing the sync. It's all done in the background and doesn't
change anything until the final <commit/> is run to open a new searcher.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833






Question about Data Types

2008-09-04 Thread Kashyap, Raghu
Hi,

 

  I have a use case where I need to define my own datatype (Money).
Will something like this work? Are there any issues with this approach?

 

Schema.xml:

<fieldtype name="money" class="xyz.Money"/>

Thanks,

Raghu

 

Ps: We are using the trunk version of solr



Solr 1.3 RC 2

2008-09-04 Thread Grant Ingersoll

A Solr 1.3 release candidate is available at 
http://people.apache.org/~gsingers/solr/1.3-RC2/

Note, this is NOT an official release, but is pretty close. Thus, if
you have the time and inclination, please download and provide
feedback, preferably on solr-dev, as to any issues you have. You may
find CHANGES.txt
(https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.3/CHANGES.txt)
helpful in understanding what's different in 1.3.


Cheers,
Grant


Questions on compound file format

2008-09-04 Thread Shalin Shekhar Mangar
Hi,

What are the benefits/drawbacks of using the compound file format
(<useCompoundFile>true</useCompoundFile> in solrconfig.xml)? From
searching through Solr and Lucene wiki pages:

1. Using the compound file format drops the number of file descriptors
needed. Any other benefits?
2. Indexing may be slower. What about query performance?
3. Since Lucene 1.4, the compound file format became the default, however
Solr default is not to use compound file format. Why this inconsistency?

-- 
Regards,
Shalin Shekhar Mangar.


Re: solr is highlighting wrong words

2008-09-04 Thread Francisco Sanmartin

Researching more, it was already an issue. Sorry for the inconvenience.

http://issues.apache.org/jira/browse/SOLR-42

Pako


Francisco Sanmartin wrote:
Highlighting in Solr has a strange behavior on some items. I attach an
example to see if anyone can throw some light on it. Basically Solr is
highlighting the wrong words. I'm searching for the word "car" and I
tell Solr to highlight it with the tags <strong> and </strong>. The
response is OK in most cases, but there are some items that come back
with the wrong words highlighted. I attach an example at the bottom.



The problem in this example is that it is highlighting the word "his",
but the search word is "car".

This is the scenario:

Solr 1.2
The url:
http://solr-server:8983/solr/select/?q=id:11439968%20AND%20description%3Acar&hl=on&hl.fl=description&hl.simple.pre=%3Cstrong%3E&hl.simple.post=%20%3C%2Fstrong%3E 



The query, fancy style:

<lst name="params">
  <str name="hl.fl">description</str>
  <str name="hl">on</str>
  <str name="q">id:11439968 AND description:car</str>
</lst>

(I query with the id to obtain the item that is failing in
highlighting, so everything is clearer).


The response:

<response>
  ...
  <str name="id">11439968</str>
  ...
  <str name="description">This is a one of a kind all custom '95 Integra LS with 2005
TSX headlight and tailight conversion. It has GSR all black interior,
18 inch rims, strut bars, cd changer, coil overs, HID headlights,
catback exhaust, intake, new clutch and brakes. Motor has 130,000
miles. No smoke or leaks. Runs great. This car is completly
shaved. Paint is a two toned black/white with white ice flake. It is
flawless and ready to show. This car has not even seen winter
after being built! It is stored in a garage all year. Serious inquires
only (203)994-0085. OR Email [EMAIL PROTECTED] $8,500 OR BEST
OFFER!</str>
  ...
</response>

<lst name="highlighting">
  <lst name="11439968">
    <arr name="description">
      <str>back exhaust, intake, new clutch and brakes. Motor has
130,000 miles. No smoke or leaks. Runs great. T<strong>his </strong></str>
    </arr>
  </lst>
</lst>



The schema (relevant parts):

<field name="description" type="text" indexed="true" stored="true"/>

...

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldtype>


Thanks in advance.

Pako








handling multiple multiple resources with single requestHandler

2008-09-04 Thread Aleksandar Bradic

Hi,

Any ideas on how we could register a single request handler for handling
multiple (wildcarded) contexts/resource URIs?


(something like) :




Current logic in SolrDispatchFilter / RequestHandlers registers a  
single (context <-> handler) mapping and obviously doesn't allow  
wildcarding.


However, such a feature could be quite useful in situations where we
have a single app/handler handling multiple contexts.
(If there are few, the ugly way would be to just register multiple
entries pointing to the single handler, but in some situations, like
when there is a numeric mid-argument (example: "/app/3/query"), it's
impossible to do even that.)


The only way I can do it right now is by modifying SolrDispatchFilter
and manually adding request-context trimming there (reducing the
requested context to "/app/"), then registering a handler for that
context (which would later resolve the other parts of it). But if there
is another way to do this, without changing the code, I would be
more than happy to learn about it :)
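For what it's worth, the trimming workaround described above amounts to a small prefix-match dispatch. A toy sketch, with hypothetical handler names and paths (this is not the SolrDispatchFilter code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PrefixDispatch {
    // Registered (prefix -> handler name) mappings, checked in insertion order.
    static final Map<String, String> handlers = new LinkedHashMap<>();

    // Reduce a request like "/app/3/query" to its registered prefix "/app/"
    // and let the single handler resolve the remainder of the path itself.
    static String resolve(String path) {
        for (Map.Entry<String, String> e : handlers.entrySet())
            if (path.startsWith(e.getKey())) return e.getValue();
        return null; // no handler claims this path
    }

    public static void main(String[] args) {
        handlers.put("/app/", "appHandler");
        System.out.println(resolve("/app/3/query")); // appHandler
    }
}
```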


(There is actually a pathPrefix property used in that part of the
code, but it does the complete opposite of what is needed in this
case :( )


Thanks,

.Alek




Re: distributed search mechanism

2008-09-04 Thread Eason . Lee
2008/8/31 Grégoire Neuville <[EMAIL PROTECTED]>

> Hi all,
>
> I've recently been working with the distributed search capabilities of solr
> to build a web portal ; all is working fine, but it is now time for me to
> describe my work on a "theoretical" point of view.
>
> I've been trying to approximately figure out the distributed search
> mechanism, first by browsing the code, but it's too complex for me; then by
> reading the JIRA comments accompanying the commits, where I found this:
>
> ***
> The search request processing on the set of shards is performed as follows:
>
> STEP 1: The query is built, terms are extracted. Global numDocs and
> docFreqs
> are calculated by requesting all the shards and adding up numDocs and
> docFreqs from each shard.
>
> STEP 2: (FirstQueryPhase) All shards are queried. Global numDocs and
> docFreqs are passed as request parameters. All document fields are NOT
> requested, only document uniqFields and sort fields are requested.
> MoreLikeThis and Highlighting information are NOT requested.
>
> Etc...
> ***
>
> This is typically the kind of description I need, but I wonder if the one
> cited above is still valid (since it was apparently written quite a time
> before final commit).


The main steps remain the same, but the details have changed a lot;
global TF/IDF is not supported yet.


>
> Assuming it is, what's then the difference between the STEPS mentioned and
> the STAGES later introduced (STAGE_START, STAGE_PARSE_QUERY, etc...) ?
>
> How the ranking of the documents in the merged set of responses is
> calculated (especially when sorting on a field) ?


Generally speaking: in the first phase, only the document uniqFields
and sort fields are requested, so documents can be merged according to
the sort fields and then refetched (getting all the fields needed) by
uniqField.
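That merge step can be sketched as follows (a toy version sorting on score; the field names and types are assumptions for illustration, not Solr's actual implementation):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class ShardMerge {
    // One (uniqueKey, sort value) pair per hit -- all a shard returns in phase one.
    record Row(String id, float score) {}

    // Merge per-shard results on the sort value and keep the top `rows` ids;
    // a second phase would then fetch the full documents by unique key.
    static List<String> mergeTopIds(List<List<Row>> shardResults, int rows) {
        PriorityQueue<Row> pq =
            new PriorityQueue<>((a, b) -> Float.compare(b.score(), a.score()));
        for (List<Row> shard : shardResults) pq.addAll(shard);
        List<String> ids = new ArrayList<>();
        while (ids.size() < rows && !pq.isEmpty()) ids.add(pq.poll().id());
        return ids;
    }

    public static void main(String[] args) {
        List<Row> shard1 = List.of(new Row("a", 1.0f), new Row("b", 3.0f));
        List<Row> shard2 = List.of(new Row("c", 2.0f));
        System.out.println(mergeTopIds(List.of(shard1, shard2), 2)); // [b, c]
    }
}
```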


>
> Finally, is the order of the parameters in the query significant in a
> distributed search case? (i.e., is there a difference between:
>   - http://server1:port1
> /solr1/?q=title:blah&shards=server1:port1/solr1,server1:port1/solr2
> and
>   - http://server1:port1
> /solr1/?shards=server1:port1/solr1,server1:port1/solr2&q=title:blah
> ?
> (this last question is more related to the distributed deadlock topic on
> the wiki: my understanding is that in my first example the "title:blah"
> query is sent as a top-level query to solr1 and as a "shard query" to both
> solr1 and solr2 (deadlock risk), while in the second example, "title:blah"
> is not sent to solr1 as a top-level query. Am I right?))


There is no difference between the two queries above, since all
parameters are put into a map.
The search itself is not executed at the top level, just on the
shards list.
The query sent to each shard adds an isShard option, so shards will
just do the search without sending the query on to further shards.


>
> That's a lot of questions and maybe too long a post: sorry.
>
> Thanks a lot if you feel the courage to answer,
>

The answers above are just my understanding, not official :)


>
> --
> Grégoire Neuville
>


Re: Building a multilevel query.

2008-09-04 Thread Chris Hostetter

: I want to do a query that first queries on one specific field and
: for all those that match the input do a second query.
: 
: For example if we have a "type" field where one of the options
: is "user" and a "title" fields includes the names of the users.
: 
: So I want to find all data with "type" field = user where the name
: Erik is in the title field.

I suspect I must be misunderstanding your question, because it sounds like
you just want...

...?q=title:Erik&fq=type:user
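Spelled out as a URL builder (the base URL is a placeholder for your server), the q/fq split keeps the restriction out of relevance scoring:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FilterQueryUrl {
    // Keep the restriction in fq and the relevance query in q, as suggested;
    // fq restricts the result set without influencing the score.
    static String selectUrl(String base, String q, String fq) throws UnsupportedEncodingException {
        return base + "/select?q=" + URLEncoder.encode(q, "UTF-8")
                + "&fq=" + URLEncoder.encode(fq, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // http://localhost:8983/solr stands in for your server.
        System.out.println(selectUrl("http://localhost:8983/solr", "title:Erik", "type:user"));
        // -> http://localhost:8983/solr/select?q=title%3AErik&fq=type%3Auser
    }
}
```

As a side benefit, filter queries like `type:user` are cached independently of the main query.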


-Hoss



Re: Question about Data Types

2008-09-04 Thread Chris Hostetter

:   I have a use case where I need to define my own datatype (Money).
: Will something like this work? Are there any issues with this approach?

: <fieldtype name="money" class="xyz.Money"/>

Assuming you have implemented a Java class named "Money" in the package
"xyz" and you are subclassing the FieldType class -- then yes, you can
implement any sort of FieldType you want. You can even subclass something
like the SortableFloatType to reuse a lot of the existing code if it's
useful to you.

But i'm not sure if that's what you are asking.

You could also do something like this...

  <fieldtype name="money" class="solr.SortableFloatField"/>

Now you've got a fieldType called "money" that you can refer to from
other fields in your schema, and you don't have to write any Java code --
assuming all you really care about is storing a floating point value.

It really depends on what it is you mean by saying you want your own 
datatype.


-Hoss



Re: scoring individual values in a multivalued field

2008-09-04 Thread Jaco
Hi,

I ran into the same problem some time ago; I couldn't find any relation
between the boost values on the multivalued field and the search results.
Does anybody have an idea how to handle this?

Thanks,

Jaco.

2008/8/29 Sébastien Rainville <[EMAIL PROTECTED]>

> Hi,
>
> I have a multivalued field that I would want to score individually for each
> value. Is there an easy way to do that?
>
> Here's a concrete example of what I'm trying to achieve:
>
> Let's say that I have 3 documents with a field "name_t" and a multivalued
> field "caracteristic_t_mv":
>
> <doc>
>   <field name="name_t">Dog</field>
>   <field name="caracteristic_t_mv">Cool</field>
>   <field name="caracteristic_t_mv">Big</field>
>   <field name="caracteristic_t_mv">Dirty</field>
> </doc>
>
> <doc>
>   <field name="name_t">Cat</field>
>   <field name="caracteristic_t_mv">Small</field>
>   <field name="caracteristic_t_mv">Dirty</field>
> </doc>
>
> <doc>
>   <field name="name_t">Fish</field>
>   <field name="caracteristic_t_mv">Smells</field>
>   <field name="caracteristic_t_mv">Dirty</field>
> </doc>
>
> If I query only the field caracteristic_t_mv for the value "Dirty" I would
> like the documents to be sorted accordingly => get 1-3-2.
>
> It's possible to set the scoring of a field when indexing but there are 2
> problems with that:
> 1) the value of the field boost is actually the multiplication of the value
> for the different boost values of the fields with the same name;
> 2) the value of normField is persisted as a byte in the index and the
> precision loss hurts.
>
> Thanks in advance,
> Sebastien
>
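The byte-encoded norm issue in point 2 can be illustrated with a toy 8-bit quantizer. This is not Lucene's actual SmallFloat encoding, just an assumed stand-in showing how little precision survives an 8-bit round trip:

```java
public class NormPrecision {
    // Toy quantization: encode a boost in [0, 4) into one of 256 buckets
    // (steps of 1/64), then decode it again.
    static float roundTrip(float boost) {
        int b = Math.min(255, Math.max(0, (int) (boost * 64f))); // "store" as a byte
        return b / 64f;                                          // "read" it back
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(1.00f)); // 1.0 survives the round trip
        System.out.println(roundTrip(1.27f)); // 1.265625 -- snapped to a 1/64 step
    }
}
```

Lucene's real norm encoding has an even coarser, non-uniform step, which is why per-value boosts on a multivalued field tend to wash out.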