a question about solr queryparser

2008-09-21 Thread finy finy
I use Solr 1.2 in my project, and I use my own Analyzer.
For example:
   "oneworld onedream" is segmented into "one", "world", "one",
"dream".
That is my analyzer's output, and I use it in Solr.
But when I run the query "title:oneworld onedream", Solr parses it
like this:

title:oneworld title:onedream

Why?

I think Solr should analyze it and generate the query "title:one
title:world title:one title:dream".

Please help me~


RE: Hardware config for SOLR

2008-09-21 Thread Andrey Shulinskiy
Grant,

Thanks a lot for the answers. Please see my replies below.

> > 1) Should we do sharding or not?
> > If we start without sharding, how hard will it be to enable it?
> > Is it just some config changes + the index rebuild or is it more?
> 
> There will be operations setup, etc.  And you'll have to add in the
> appropriate query stuff.
> 
> Your install and requirements aren't that large, so I doubt you'll
> need sharding, but it always depends on your exact configuration.
> I've seen indexes as big as 80 million docs on a single machine, but
> the docs were smaller in size.
> 
> > My personal opinion is to go without sharding at first and enable it
> > later if we do get a lot of documents.
> 
> Sounds reasonable.
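
For reference, the "query stuff" here apparently boils down to a shards
request parameter in Solr 1.3 listing the cores the query is fanned out to; a
minimal sketch, with hostnames purely as placeholders:

   http://host1:8983/solr/select?shards=host1:8983/solr,host2:8983/solr&q=title:dream

Each shard is assumed to hold a distinct slice of the documents, and the node
that receives the request merges the per-shard results before responding.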

One more question - is it worth it to try to keep the whole index in
memory and shard when it doesn't fit anymore? To me it seems like a bit
of overhead, but I may be very wrong here.
What's a recommended ratio of the parts to keep in RAM and on the HDDs?

> > 2) How should we organize our clusters to ensure redundancy?
> >
> > Should we have 2 or more identical Masters (means that all the
> > updates/optimisations/etc. are done for every one of them)?
> >
> > An alternative, afaik, is to reconfigure one slave to become the new
> > Master, how hard is that?
> 
> I don't have a good answer here, maybe someone else can chime in.  I
> know master failover is a concern, but I'm not sure how others handle
> it right now.  Would be good to have people share their approach.
> That being said, it seems reasonable to me to have identical masters.

I found this thread related to this issue:
http://www.nabble.com/High-Availability-deployment-to13094489.html#a13098729

I guess it depends on how easily we can fill the gap between the last
commit and the time the Master goes down. Most likely, we'll have to
have 2 Masters.


> > 3) Basically, we can get servers of two kinds:
> > * Single Processor, Dual Core Opteron 2214HE
> > * 2 GB DDR2 SDRAM
> > * 1 x 250 GB (7200 RPM) SATA Drive(s)
> >
> > * Dual Processor, Quad Core 5335
> > * 16 GB Memory (Fully Buffered)
> > * 2 x 73 GB (10k RPM) 2.5" SAS Drive(s), RAID 1
> >
> > The second - more powerful - one is more expensive, of course.
> 
> Get as much RAM as you can afford.  Surely there is an in between
> machine as well that might balance cost and capabilities.  The first
> machine seems a bit light, especially in memory.

Fair enough.

> > How can we take advantage of the multiprocessor/multicore servers?
> >
> > Is there some special setup required to make, say, 2 instances of
> > SOLR run on the same server using different processors/cores?
> 
> See the Core Admin stuff http://wiki.apache.org/solr/CoreAdmin.  Solr
> is thread-safe by design (so it's a bug, if you hit issues).  You can
> send it documents on multiple threads and it will be fine.

Hmmm, it seems that several cores are supposed to handle different
indexes:
http://wiki.apache.org/solr/MultipleIndexes#head-e517417ef9b96e32168b2cf35ab6ff393f360d59
<< Solr1.3 added support for multiple "Solr Cores" in a single
deployment of Solr -- each Solr Core has its own index. For more
information please see CoreAdmin.>>

As we are going to have just one index, the only way to use this that I
see is to configure a Master on Core 1 and a Slave on Core 2, or 2 slaves
on 2 cores.

Am I missing something here?

> > 4) Does it make much difference to get a more powerful Master?
> >
> > Or, on the contrary, as slaves will be queried more often, they should
> > be the better ones? Maybe just the HDDs for the slaves should be as
> > fast
> > as possible?
> 
> Depends on where your bottlenecks are.  Are you getting a lot of
> queries or a lot of updates?

Both, but more queries than updates. That means we shouldn't neglect the
slaves, I guess?


> As for HDDs, people have noted some nice speedups in Lucene using
> Solid-state drives, if you can afford them.  Fast I/O is good if
> you're retrieving whole documents, but once things are warmed up more
> RAM is most important, I think, as many things can be cached.


> > 5) How many slaves does it make sense to have per one Master?
> > What's (roughly) the performance gain from 1 to 2, 2 -> 3, etc?
> > When does it stop making sense to add more slaves?
> 
> I suppose it's when you can handle your peak load, but I don't have
> numbers.  One of the keys is to incrementally test and see what makes
> sense for your scenario.

Right, the numbers given in other responses (thanks Karl and Lars) look
impressive, so we'll consider this option.

> > As far as I understand, it depends mainly on the size of the index.
> > However, I'd guess the time required to do a push for too many slaves
> > can be a problem too, correct?
> 
> The biggest problem for slaves is if the master does an optimization,
> in which case the whole snapshot must be downloaded, whereas incremental
> additions can be handled by getting just the deltas.

Our initial idea is to send batch updates several times per 

Re: a question about solr queryparser

2008-09-21 Thread Otis Gospodnetic
Hi,

Hm, it looks as if you have not plugged in your custom Analyzer (correctly) in 
the schema.xml.  Could you paste the relevant part of your schema.xml?  I 
don't recall a bug related to this, but you could also try Solr 1.3 if you 
believe you configured things correctly.
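
For reference, a custom analyzer is normally wired to a field type in
schema.xml, either as a whole Analyzer class or as a tokenizer/filter chain.
A minimal sketch -- the class name and type name below are placeholders, not
your actual code:

   <fieldType name="text_seg" class="solr.TextField">
     <analyzer class="com.example.MySegmentingAnalyzer"/>
   </fieldType>

   <field name="title" type="text_seg" indexed="true" stored="true"/>

If the title field is still mapped to a type whose analyzer keeps "oneworld"
as a single token, the query parser will never see the "one"/"world" split.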


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: finy finy <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Sunday, September 21, 2008 10:42:17 PM
> Subject: a question about solr queryparser
> 
> I use Solr 1.2 in my project, and I use my own Analyzer.
> For example:
>    "oneworld onedream" is segmented into "one", "world", "one",
> "dream".
> That is my analyzer's output, and I use it in Solr.
> But when I run the query "title:oneworld onedream", Solr parses it
> like this:
> 
> title:oneworld title:onedream
> 
> Why?
> 
> I think Solr should analyze it and generate the query "title:one
> title:world title:one title:dream".
> 
> Please help me~



Re: How to keep a slave offline until the index is pulled from master

2008-09-21 Thread Otis Gospodnetic
Hi Jacob,

Aha, the first time, I see.  Without knowing the background I'd say: "So why 
would you expose your Solr instances to search in the first place if the index 
is not in place?  Just copy the index to Solr slaves manually the first time 
and then start the slaves."

On the other hand, I think the "I'm not ready yet" type of response might be 
something that Solr should have in the future.


Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Jacob Singh <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Sunday, September 21, 2008 12:43:09 AM
> Subject: Re: How to keep a slave offline until the index is pulled from master
> 
> Hi Otis,
> 
> Thanks for the response.  I was actually talking about the initial
> sync over from the master.  What I'd like, I guess, is a "lock" command
> which would start true and, when snapinstaller ran successfully for
> the first time, would become false.  I can write the bash, but I'm not
> sure how to get Solr to push out the 503 (I guess that would be the
> appropriate code)...
> 
> Best,
> Jacob
> 
> 
> 
> On Sun, Sep 21, 2008 at 12:29 AM, Otis Gospodnetic
> wrote:
> > Even with your current setup (if it's done correctly) slaves should not be 
> returning 0 hits for a query that previously returned hits.  That is, nothing 
> should be off-line.  Index searcher warmup and swapping happens in the 
> background and while that's happening the old searcher should be serving 
> queries.
> >
> >
> > Otis --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> >
> >
> > - Original Message 
> >> From: Jacob Singh 
> >> To: solr-user@lucene.apache.org
> >> Sent: Saturday, September 20, 2008 5:54:39 AM
> >> Subject: How to keep a slave offline until the index is pulled from master
> >>
> >> Hi,
> >>
> >> I'm running multiple instances (solr 1.2) on a single jetty server using 
> JNDI.
> >>
> >> When I launch a slave, it has to retrieve all of the indexes from the
> >> master server using the snappuller / snapinstaller.
> >>
> >> This works fine, however, I don't want to wait to activate the slave
> >> (turn on jetty) while waiting for every slave to get its data.
> >>
> >> Is there any way to make sure that a slave is "up2date" before letting
> >> it accept queries?  As it is, the last slave will take 10-15 minutes to
> >> get its data, and for those 15 minutes it is active in the load balancer
> >> and therefore taking requests which return 0 results.
> >>
> >> Also, if I switch to multi-core (1.3) is this problem avoided?
> >>
> >> Thanks,
> >> Jacob
> >>
> >>
> >>
> >>
> >> --
> >>
> >> +1 510 277-0891 (o)
> >> +91  33 7458 (m)
> >>
> >> web: http://pajamadesign.com
> >>
> >> Skype: pajamadesign
> >> Yahoo: jacobsingh
> >> AIM: jacobsingh
> >> gTalk: [EMAIL PROTECTED]
> >
> >
> 
> 
> 
> -- 
> 
> +1 510 277-0891 (o)
> +91  33 7458 (m)
> 
> web: http://pajamadesign.com
> 
> Skype: pajamadesign
> Yahoo: jacobsingh
> AIM: jacobsingh
> gTalk: [EMAIL PROTECTED]



Re: How to keep a slave offline until the index is pulled from master

2008-09-21 Thread Jacob Singh
Hi Otis,

Yeah, I know it is a bit of an edge case.  In my scenario, though, the
issue is that I want to start serving some slaves before ALL of them become
available.  I've decided to just handle it through Jetty: I will only
enable the JNDI config for a host once the snapshot has run there.
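
A rough sketch of that approach -- assuming the stock snappuller/snapinstaller
scripts and a Jetty hot-deploy contexts/ directory; every path, hostname and
file name below is illustrative, not from an actual setup:

   #!/bin/bash
   # Pull and install the first snapshot, then enable this Solr context in Jetty.
   SOLR_HOME=/opt/solr/slave1
   JETTY_HOME=/opt/jetty

   if "$SOLR_HOME"/bin/snappuller && "$SOLR_HOME"/bin/snapinstaller; then
       # Only once an index is actually installed does the slave start taking traffic.
       cp /opt/configs/solr-slave1-context.xml "$JETTY_HOME"/contexts/
   else
       echo "initial snapshot failed; leaving slave disabled" >&2
       exit 1
   fi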

Best,
Jacob

On Mon, Sep 22, 2008 at 9:21 AM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Hi Jacob,
>
> Aha, the first time, I see.  Without knowing the background I'd say: "So why 
> would you expose your Solr instances to search in the first place if the 
> index is not in place?  Just copy the index to Solr slaves manually the first 
> time and then start the slaves."
>
> On the other hand, I think the "I'm not ready yet" type of response might be 
> something that Solr should have in the future.
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: Jacob Singh <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Sunday, September 21, 2008 12:43:09 AM
>> Subject: Re: How to keep a slave offline until the index is pulled from master
>>
>> Hi Otis,
>>
>> Thanks for the response.  I was actually talking about the initial
>> sync over from the master.  What I'd like, I guess, is a "lock" command
>> which would start true and, when snapinstaller ran successfully for
>> the first time, would become false.  I can write the bash, but I'm not
>> sure how to get Solr to push out the 503 (I guess that would be the
>> appropriate code)...
>>
>> Best,
>> Jacob
>>
>>
>>
>> On Sun, Sep 21, 2008 at 12:29 AM, Otis Gospodnetic
>> wrote:
>> > Even with your current setup (if it's done correctly) slaves should not be
>> returning 0 hits for a query that previously returned hits.  That is, nothing
>> should be off-line.  Index searcher warmup and swapping happens in the
>> background and while that's happening the old searcher should be serving
>> queries.
>> >
>> >
>> > Otis --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> >
>> >
>> > - Original Message 
>> >> From: Jacob Singh
>> >> To: solr-user@lucene.apache.org
>> >> Sent: Saturday, September 20, 2008 5:54:39 AM
>> >> Subject: How to keep a slave offline until the index is pulled from master
>> >>
>> >> Hi,
>> >>
>> >> I'm running multiple instances (solr 1.2) on a single jetty server using
>> JNDI.
>> >>
>> >> When I launch a slave, it has to retrieve all of the indexes from the
>> >> master server using the snappuller / snapinstaller.
>> >>
>> >> This works fine, however, I don't want to wait to activate the slave
>> >> (turn on jetty) while waiting for every slave to get its data.
>> >>
>> >> Is there any way to make sure that a slave is "up2date" before letting
>> >> it accept queries?  As it is, the last slave will take 10-15 minutes to
>> >> get its data, and for those 15 minutes it is active in the load balancer
>> >> and therefore taking requests which return 0 results.
>> >>
>> >> Also, if I switch to multi-core (1.3) is this problem avoided?
>> >>
>> >> Thanks,
>> >> Jacob
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> +1 510 277-0891 (o)
>> >> +91  33 7458 (m)
>> >>
>> >> web: http://pajamadesign.com
>> >>
>> >> Skype: pajamadesign
>> >> Yahoo: jacobsingh
>> >> AIM: jacobsingh
>> >> gTalk: [EMAIL PROTECTED]
>> >
>> >
>>
>>
>>
>> --
>>
>> +1 510 277-0891 (o)
>> +91  33 7458 (m)
>>
>> web: http://pajamadesign.com
>>
>> Skype: pajamadesign
>> Yahoo: jacobsingh
>> AIM: jacobsingh
>> gTalk: [EMAIL PROTECTED]
>
>



-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: [EMAIL PROTECTED]


Re: Hardware config for SOLR

2008-09-21 Thread Otis Gospodnetic
Hi Andrey,

Responses inlined.



- Original Message 
> From: Andrey Shulinskiy <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Sunday, September 21, 2008 11:23:00 PM
> Subject: RE: Hardware config for SOLR
> 
> Grant,
> 
> Thanks a lot for the answers. Please see my replies below.
> 
> > > 1) Should we do sharding or not?
> > > If we start without sharding, how hard will it be to enable it?
> > > Is it just some config changes + the index rebuild or is it more?
> > 
> > There will be operations setup, etc.  And you'll have to add in the
> > appropriate query stuff.
> > 
> > Your install and requirements aren't that large, so I doubt you'll
> > need sharding, but it always depends on your exact configuration.
> > I've seen indexes as big as 80 million docs on a single machine, but
> > the docs were smaller in size.
> > 
> > > My personal opinion is to go without sharding at first and enable it
> > > later if we do get a lot of documents.
> > 
> > Sounds reasonable.
> 
> One more question - is it worth it to try to keep the whole index in
> memory and shard when it doesn't fit anymore? To me it seems like a bit
> of overhead, but I may be very wrong here.
> What's a recommended ratio of the parts to keep in RAM and on the HDDs?

It's well worth trying to keep the index buffered (i.e. in memory).  Yes, once 
you can't fit the hot parts of the index in RAM it's time to think about 
sharding (or buying more RAM).  However, it's not as simple as looking at the 
index size and RAM size, as not all parts of the index need to be cached.

> > > 2) How should we organize our clusters to ensure redundancy?
> > >
> > > Should we have 2 or more identical Masters (means that all the
> > > updates/optimisations/etc. are done for every one of them)?
> > >
> > > An alternative, afaik, is to reconfigure one slave to become the new
> > > Master, how hard is that?
> > 
> > I don't have a good answer here, maybe someone else can chime in.  I
> > know master failover is a concern, but I'm not sure how others handle
> > it right now.  Would be good to have people share their approach.
> > That being said, it seems reasonable to me to have identical masters.
> 
> I found this thread related to this issue:
> http://www.nabble.com/High-Availability-deployment-to13094489.html#a13098729
> 
> I guess it depends on how easily we can fill the gap between the last
> commit and the time the Master goes down. Most likely, we'll have to
> have 2 Masters.

Or you could simply have 2 masters and index the same data on both of them.  
Then, in case #1 fails, you simply get your slaves to start copying from #2.  
You could have the slaves talk to the master via an LB VIP, so a change from #1 
to #2 can be done quickly in the LB and the slaves don't have to be changed.  Or 
you could have the masters keep the index on some sort of shared storage (e.g. a SAN).
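
A sketch of what the VIP approach looks like on a slave with the 1.2/1.3
collection distribution scripts -- the variable names below are recalled from
conf/scripts.conf and should be treated as placeholders, as should every
hostname and path:

   # conf/scripts.conf on each slave
   user=solr
   solr_hostname=localhost
   solr_port=8983
   rsyncd_port=18983
   data_dir=/opt/solr/data
   webapp_name=solr
   # point at the load-balancer VIP rather than an individual master box,
   # so failing over from master #1 to #2 needs no change on the slaves
   master_host=solr-master-vip.example.com
   master_data_dir=/opt/solr/data
   master_status_dir=/opt/solr/logs/clients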

> > > 3) Basically, we can get servers of two kinds:
> > > * Single Processor, Dual Core Opteron 2214HE
> > > * 2 GB DDR2 SDRAM
> > > * 1 x 250 GB (7200 RPM) SATA Drive(s)
> > >
> > > * Dual Processor, Quad Core 5335
> > > * 16 GB Memory (Fully Buffered)
> > > * 2 x 73 GB (10k RPM) 2.5" SAS Drive(s), RAID 1
> > >
> > > The second - more powerful - one is more expensive, of course.
> > 
> > Get as much RAM as you can afford.  Surely there is an in between
> > machine as well that might balance cost and capabilities.  The first
> > machine seems a bit light, especially in memory.
> 
> Fair enough.
> 
> > > How can we take advantage of the multiprocessor/multicore servers?
> > >
> > > Is there some special setup required to make, say, 2 instances of
> > > SOLR run on the same server using different processors/cores?
> > 
> > See the Core Admin stuff http://wiki.apache.org/solr/CoreAdmin.  Solr
> > is thread-safe by design (so it's a bug, if you hit issues).  You can
> > send it documents on multiple threads and it will be fine.
> 
> Hmmm, it seems that several cores are supposed to handle different
> indexes:
> http://wiki.apache.org/solr/MultipleIndexes#head-e517417ef9b96e32168b2cf35ab6ff393f360d59

Yes.

> << Solr1.3 added support for multiple "Solr Cores" in a single
> deployment of Solr -- each Solr Core has its own index. For more
> information please see CoreAdmin.>>
> 
> As we are going to have just one index, the only way to use this that I
> see is to configure a Master on Core 1 and a Slave on Core 2, or 2 slaves
> on 2 cores.
> 
> Am I missing something here?

It sounds like you are talking about a single server hosting both the master and 
the slave(s).
That's not what you want to do, though.  The master and the slave(s) each live on 
their own server.  But I think you are aware of this.
You don't need to think about Solr Multicore functionality if you have but a 
single index.

> > > 4) Does it make much difference to get a more powerful Master?
> > >
> > > Or, on the contrary, as slaves will be queried more often, they should
> > > be the better ones? Maybe ju

Re: Solr case sensitive searching

2008-09-21 Thread Otis Gospodnetic
Hi,

Please use solr-user list instead of solr-dev.

If you want the search to be case SENSITIVE then you do not want to have 
LowerCase* in the analyzer chain.
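
A minimal sketch of a case-sensitive text field type for schema.xml -- the
type and field names here are placeholders, not from your configuration:

   <fieldType name="text_cs" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <!-- no solr.LowerCaseFilterFactory, so case is preserved at both
            index time and query time -->
     </analyzer>
   </fieldType>

   <field name="title_cs" type="text_cs" indexed="true" stored="true"/>

Remember to reindex after changing the analyzer chain, since already-indexed
terms keep whatever case handling was in effect when they were written.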


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: mahendra mahendra <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Friday, September 19, 2008 8:26:01 PM
> Subject: Solr case sensitive searching
> 
> Hi Guys,
>  
> Could you please tell me how I can configure case-sensitive searching?
>  
> The following ways are not working
> or
> 
>  
> I would appreciate any help!!
> 
> 
> Thanks & Regards,
> Mahendra



Re: SynonymFilter and inch/foot symbols

2008-09-21 Thread Chris Hostetter

: How would I handle a search for 21" or 3'. The " and ' symbols appear to 
: get stripped away by Lucene before passing the query off to the 
: analyzers.
...
: We are also using the DisjunctionMaxQueryParser to build the actual 
: query from the front end.

Nothing should be stripping apostrophes before handing them to the 
QParser.

DisMaxQParserPlugin automatically strips double-quotes when there is 
an odd number (using SolrPluginUtils.stripUnbalancedQuotes) .. if 
there is an even number it assumes they are there to force a phrase 
query -- but if you are using DisjunctionMaxQueryParser directly, nothing 
should be touching your quote characters.
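
A quick illustration of that distinction, with made-up example queries:

   q=21"            one (unbalanced) double-quote: dismax strips it  -> 21
   q="one world"    two (balanced) double-quotes: kept, forces a phrase query
   q=3'             apostrophes are left alone by the query parsers themselves

If the symbols really are disappearing, a more likely suspect is an analyzer
whose tokenizer discards punctuation, or something in the front end that
munges the string before it reaches Solr.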


-Hoss



solr's synonyms and stopwords?

2008-09-21 Thread finy finy
I use Solr 1.2.

The synonyms and stopwords files are in the conf directory.

When I have more than one webapp, I must configure synonyms and stopwords for
each one.

I want to define a single directory of synonyms and stopwords for all webapps,
meaning that all the webapps share one set of synonyms and stopwords.

How can I do that?