How to keep a slave offline until the index is pulled from master

2008-09-20 Thread Jacob Singh
Hi,

I'm running multiple instances (solr 1.2) on a single jetty server using JNDI.

When I launch a slave, it has to retrieve all of the indexes from the
master server using the snapuller / snapinstaller.

This works fine; however, I don't want to delay activating the slave
(turning on jetty) while waiting for every slave to get its data.

Is there any way to make sure that a slave is "up to date" before letting
it accept queries?  As it is, the last slave takes 10-15 minutes to get
its data, and for those 15 minutes it is active in the load balancer
and is therefore taking requests which return 0 results.

Also, if I switch to multi-core (1.3), is this problem avoided?

Thanks,
Jacob




-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: [EMAIL PROTECTED]


Re: Delta importing issues

2008-09-20 Thread Noble Paul നോബിള്‍ नोब्ळ्
If an entity is specified, e.g. entity=one&entity=two, the command
will be run only for those entities.  Absence of the entity parameter
means all entities will be executed.
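
For example (host/port and entity names hypothetical):

    http://localhost:8983/solr/dataimport?command=delta-import&entity=one&entity=two
    http://localhost:8983/solr/dataimport?command=delta-import

The first form runs the delta import only for entities "one" and "two";
the second runs it for every entity defined in data-config.xml.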

The last_index_time handling is another piece which must be improved.

It is hard to get use cases.  If users can give me more use cases it
would be great.

One thing I have in mind is to allow users to store arbitrary
properties through an API, say context.persistProperty("key","value"),
and read them back using context.getPersistedProperty("key").
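
A sketch of how that proposed API might look from inside a custom DIH
transformer -- persistProperty/getPersistedProperty do not exist in Solr
yet, so the two calls below are hypothetical:

    import java.util.Date;
    import java.util.Map;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.Transformer;

    public class LastRunTransformer extends Transformer {
      public Object transformRow(Map<String, Object> row, Context context) {
        // Hypothetical: read the value stored by the previous run (may be null)
        String lastRun = context.getPersistedProperty("one.last_index_time");
        // ... use lastRun to decide how to transform the row ...
        // Hypothetical: store a new value for the next run to pick up
        context.persistProperty("one.last_index_time", new Date().toString());
        return row;
      }
    }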

This would be generic enough for users to get going.

Thoughts?

--Noble

On Sat, Sep 20, 2008 at 1:52 AM, Jon Baer <[EMAIL PROTECTED]> wrote:
> Actually how does ${deltaimporter.last_index_time} know which entity I'm
> specifically updating?  I feel like I'm missing something; can it work like
> that?
>
> Thanks.
>
> - Jon
>
> On Sep 19, 2008, at 4:14 PM, Jon Baer wrote:
>
>> Question -
>>
>> So if I issued a dataimport?command=delta-import&entity=one,two,three
>>
>> Would this also hit items w/o a delta-import like four,five,six, etc?  I'm
>> trying to set something up and I ended up with 28k+ documents, which seems
>> more like a full import, so do I need to do something like delta-query=""
>> to say no delta?
>>
>> @ the moment I don't have anything defined for those since I don't need it;
>> just wondering what the proper behavior is supposed to be?
>>
>> Thanks.
>>
>> - Jon
>
>



-- 
--Noble Paul


Re: Delta importing issues

2008-09-20 Thread Jon Baer
Would that context be available for *each* entity?  @ present it
seems like there should be a last_index_time written for each top
level entity ... no?

Umm, would it be possible to hack something like
${deltaimporter.[name of entity].last_index_time} as is, or are there
too many moving parts?


Thanks.

- Jon

On Sep 20, 2008, at 9:21 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

> If an entity is specified, e.g. entity=one&entity=two, the command
> will be run only for those entities.  Absence of the entity parameter
> means all entities will be executed.
>
> The last_index_time handling is another piece which must be improved.
>
> It is hard to get use cases.  If users can give me more use cases it
> would be great.
>
> One thing I have in mind is to allow users to store arbitrary
> properties through an API, say context.persistProperty("key","value"),
> and read them back using context.getPersistedProperty("key").
>
> This would be generic enough for users to get going.
>
> Thoughts?
>
> --Noble
>
> On Sat, Sep 20, 2008 at 1:52 AM, Jon Baer <[EMAIL PROTECTED]> wrote:
>> Actually how does ${deltaimporter.last_index_time} know which entity I'm
>> specifically updating?  I feel like I'm missing something; can it work
>> like that?
>>
>> Thanks.
>>
>> - Jon
>>
>> On Sep 19, 2008, at 4:14 PM, Jon Baer wrote:
>>
>>> Question -
>>>
>>> So if I issued a dataimport?command=delta-import&entity=one,two,three
>>>
>>> Would this also hit items w/o a delta-import like four,five,six, etc?
>>> I'm trying to set something up and I ended up with 28k+ documents,
>>> which seems more like a full import, so do I need to do something like
>>> delta-query="" to say no delta?
>>>
>>> @ the moment I don't have anything defined for those since I don't
>>> need it; just wondering what the proper behavior is supposed to be?
>>>
>>> Thanks.
>>>
>>> - Jon
>
> --
> --Noble Paul




Re: How to keep a slave offline until the index is pulled from master

2008-09-20 Thread Otis Gospodnetic
Even with your current setup (if it's done correctly) slaves should not be
returning 0 hits for a query that previously returned hits.  That is, nothing
should be off-line.  Index searcher warmup and swapping happen in the
background, and while that's happening the old searcher should be serving
queries.
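
For reference, that warming is driven by the newSearcher listener in
solrconfig.xml; a minimal sketch (the warming query itself is just a
placeholder):

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">some popular query</str>
          <str name="start">0</str>
          <str name="rows">10</str>
        </lst>
      </arr>
    </listener>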


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Jacob Singh <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Saturday, September 20, 2008 5:54:39 AM
> Subject: How to keep a slave offline until the index is pulled from master
> 
> Hi,
> 
> I'm running multiple instances (solr 1.2) on a single jetty server using JNDI.
> 
> When I launch a slave, it has to retrieve all of the indexes from the
> master server using the snapuller / snapinstaller.
> 
> This works fine; however, I don't want to delay activating the slave
> (turning on jetty) while waiting for every slave to get its data.
>
> Is there any way to make sure that a slave is "up to date" before letting
> it accept queries?  As it is, the last slave takes 10-15 minutes to get
> its data, and for those 15 minutes it is active in the load balancer
> and is therefore taking requests which return 0 results.
>
> Also, if I switch to multi-core (1.3), is this problem avoided?
> 
> Thanks,
> Jacob
> 
> 
> 
> 
> -- 
> 
> +1 510 277-0891 (o)
> +91  33 7458 (m)
> 
> web: http://pajamadesign.com
> 
> Skype: pajamadesign
> Yahoo: jacobsingh
> AIM: jacobsingh
> gTalk: [EMAIL PROTECTED]



Re: Capabilities of solr

2008-09-20 Thread Otis Gospodnetic
Hi Chris,

Yes, from what you described, Solr sounds like a good choice.  It sounds like
for each type of entity (doc vs. product vs. ...) you may want to have a
separate index/schema.  The best place to start is the tutorial.
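
For code-like documents, the schema might start with fields along these
lines (all field names are made up for illustration):

    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="function_name" type="string" indexed="true" stored="true"/>
    <field name="body" type="text" indexed="true" stored="true"/>
    <field name="comment" type="text" indexed="true" stored="true" multiValued="true"/>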

Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Chris <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Saturday, September 20, 2008 12:40:13 AM
> Subject: Capabilities of solr
> 
> Hello,
> 
> We currently have a ton of documents that we would like to index and
> make searchable.  I came across solr and it seems like it offers a lot
> of nice features and would suit our needs.
> 
> The documents are in similar structure to java code, blocks
> representing functions, variables, comment blocks etc.
> 
> We would also like to provide our users the ability to "tag" a line,
> or multiple lines of the document with comments that would be stored
> externally, for future reference or notes for enhancements. These
> documents are also updated frequently.
> 
> I also noticed in the examples that XML documents are used to import
> documents into solr.  If we have code-like documents vs., for example,
> products, is there any specific way to define the solr schema for these
> types of documents?
> 
> Currently we maintain these documents as flat files and in MySQL.
> 
> Does solr sound like a good option for what we are looking to do? If
> so, could anybody provide some starting points for my research?
> 
> Thank you



Re: SynonymFilter and inch/foot symbols

2008-09-20 Thread Otis Gospodnetic
Hi Kevin,


Find the component that's stripping your " and ' characters
(WordDelimiterFilterFactory?) and make sure those characters are indexed
first.  Then make sure the query-time analyzer keeps those tokens, too.
Finally, escape special characters (e.g. " in your example) in the query
before passing it to Solr (I *think* Solr won't do it for you).
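
That client-side escaping can be as simple as this sketch (the field name
is hypothetical; an unescaped " would otherwise open a phrase query):

    // Hypothetical: backslash-escape quote characters in the user's
    // input before embedding it in the query string.
    String escaped = userInput.replace("\"", "\\\"");
    String q = "name:" + escaped;   // e.g. name:21\"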

 
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Kevin Osborn <[EMAIL PROTECTED]>
> To: Solr 
> Sent: Friday, September 19, 2008 7:18:15 PM
> Subject: SynonymFilter and inch/foot symbols
> 
> How would I handle a search for 21" or 3'?  The " and ' symbols appear to get
> stripped away by Lucene before passing the query off to the analyzers.
> 
> Here is my analyzer in the schema.xml:
>
>   <analyzer>
>     <filter class="solr.SynonymFilterFactory" ... ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ... words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>       generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>       catenateAll="0"/>
>     ...
>   </analyzer>
>
> 
> I could certainly replace X" with X inch using regex in my custom request
> handler.  But I would rather not have synonyms in two separate places.
>
> We are also using the DisjunctionMaxQueryParser to build the actual query
> from the front end.



Re: Hardware config for SOLR

2008-09-20 Thread Otis Gospodnetic
I have not worked with SSDs, though I've read all the good information that's 
trickling to us from Denmark.  One thing that I've been wondering all along is 
- what about writes?  That is, what about writes "wearing out" the SSD?  How 
quickly does that happen and when it does happen, what are the symptoms?  For 
example, does it happen after N write operations?  Do writes start failing,
and does one start getting IOExceptions in the case of Lucene and Solr?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Karl Wettin <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, September 19, 2008 6:15:53 PM
> Subject: Re: Hardware config for SOLR
> 
> 
> On 19 Sep 2008, at 23:22, Grant Ingersoll wrote:
> 
> > As for HDDs, people have noted some nice speedups in Lucene using  
> > Solid-state drives, if you can afford them.
> 
> I've seen the average response time cut 5-10x when switching to SSD.
> A 64GB SSD starts at around EUR 200, so it can be a lot cheaper to
> replace the disk than to get more servers, given you can fit your
> index on one of those.
> 
> 
>   karl



Re: Hardware config for SOLR

2008-09-20 Thread Lars Kotthoff
> I have not worked with SSDs, though I've read all the good information that's
> trickling to us from Denmark.  One thing that I've been wondering all along is
> - what about writes?  That is, what about writes "wearing out" the SSD?  How
> quickly does that happen and when it does happen, what are the symptoms?  For
> example, does it happen after N write operations?  Do writes start failing,
> and does one start getting IOExceptions in the case of Lucene and Solr?

With modern SSDs you get something in the region of 500,000 to 1,000,000 write
cycles per memory cell. Additionally they all use wear leveling, i.e. the writes
are spread over the whole disk -- you can write to a file system block many
times more. One of the manufacturers of high-end SSDs [1] claims that at a
sustained write rate of 50GB per day their drives will last more than 140 years,
i.e. it's much more likely that something else will fail first ;)
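
As a rough sanity check on those numbers (assuming a 64GB drive, 500,000
cycles per cell and perfect leveling): 64GB x 500,000 cycles is about 32PB
of total writes, and at 50GB per day that works out to roughly 1,750 years,
so a 140-year rating leaves plenty of margin for write amplification and
imperfect leveling.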

When the write cycles are "exhausted" much the same thing as with a bad
conventional disk happens -- you'll see lots of write errors. If the wear
leveling is perfect (i.e. all memory locations have exactly the same number of
writes) it's even possible that the whole disk will fail at once.

Lars

[1] http://www.mtron.net


Re: Delta importing issues

2008-09-20 Thread Noble Paul നോബിള്‍ नोब्ळ्
The context is available for each entity, but the implementation just
stores one value for last_index_time.
If it stored the last_index_time per top-level entity, it could return
the correct value from the context.

Anyway, raise an issue and I can provide a patch soon.

Something like this shall also be supported:

${dataimporter.[name of entity].last_index_time}
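
A deltaQuery could then pick up the per-entity value, e.g. (the variable
syntax is only proposed here, and the table/column names are made up):

    <entity name="one" pk="id"
            query="select * from one"
            deltaQuery="select id from one
                        where updated_at &gt; '${dataimporter.one.last_index_time}'"/>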

On Sat, Sep 20, 2008 at 9:32 PM, Jon Baer <[EMAIL PROTECTED]> wrote:
> Would that context be available for *each* entity?  @ present it seems like
> there should be a last_index_time written for each top level entity ... no?
>
> Umm, would it be possible to hack something like ${deltaimporter.[name of
> entity].last_index_time} as is, or are there too many moving parts?
>
> Thanks.
>
> - Jon
>
> On Sep 20, 2008, at 9:21 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote:
>
>> If an entity is specified, e.g. entity=one&entity=two, the command
>> will be run only for those entities.  Absence of the entity parameter
>> means all entities will be executed.
>>
>> The last_index_time handling is another piece which must be improved.
>>
>> It is hard to get use cases.  If users can give me more use cases it
>> would be great.
>>
>> One thing I have in mind is to allow users to store arbitrary
>> properties through an API, say context.persistProperty("key","value"),
>> and read them back using context.getPersistedProperty("key").
>>
>> This would be generic enough for users to get going.
>>
>> Thoughts?
>>
>> --Noble
>>
>> On Sat, Sep 20, 2008 at 1:52 AM, Jon Baer <[EMAIL PROTECTED]> wrote:
>>>
>>> Actually how does ${deltaimporter.last_index_time} know which entity I'm
>>> specifically updating?  I feel like I'm missing something; can it work
>>> like that?
>>>
>>> Thanks.
>>>
>>> - Jon
>>>
>>> On Sep 19, 2008, at 4:14 PM, Jon Baer wrote:
>>>
>>>> Question -
>>>>
>>>> So if I issued a dataimport?command=delta-import&entity=one,two,three
>>>>
>>>> Would this also hit items w/o a delta-import like four,five,six, etc?
>>>> I'm trying to set something up and I ended up with 28k+ documents,
>>>> which seems more like a full import, so do I need to do something like
>>>> delta-query="" to say no delta?
>>>>
>>>> @ the moment I don't have anything defined for those since I don't
>>>> need it; just wondering what the proper behavior is supposed to be?
>>>>
>>>> Thanks.
>>>>
>>>> - Jon
>>>
>>>
>>
>>
>>
>> --
>> --Noble Paul
>
>



-- 
--Noble Paul


Re: How to keep a slave offline until the index is pulled from master

2008-09-20 Thread Jacob Singh
Hi Otis,

Thanks for the response.  I was actually talking about the initial
sync over from the master.  What I'd like, I guess, is a "lock" which
would start out true and, once snapinstaller ran successfully for the
first time, would become false.  I can write the bash, but I'm not
sure how to get solr to push out the 503 (I guess that would be the
appropriate status code)...
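
One way to get the 503 without touching Solr itself: a tiny servlet
Filter mapped in front of Solr in web.xml, gated on a flag file that the
snapinstaller wrapper touches after its first successful run.  A minimal
sketch (the flag-file path and init-param name are made up):

    // Hypothetical gate: answer 503 until the flag file exists, so the
    // load balancer keeps the slave out of rotation during the first sync.
    import java.io.File;
    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletResponse;

    public class FirstSyncGateFilter implements Filter {
      private File flag;

      public void init(FilterConfig cfg) {
        // e.g. an <init-param> pointing at /var/solr/first-sync-done
        flag = new File(cfg.getInitParameter("flagFile"));
      }

      public void doFilter(ServletRequest req, ServletResponse res,
                           FilterChain chain) throws IOException, ServletException {
        if (!flag.exists()) {
          ((HttpServletResponse) res).sendError(
              HttpServletResponse.SC_SERVICE_UNAVAILABLE,
              "index not yet pulled from master");
          return;
        }
        chain.doFilter(req, res);
      }

      public void destroy() {}
    }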

Best,
Jacob



On Sun, Sep 21, 2008 at 12:29 AM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Even with your current setup (if it's done correctly) slaves should not be
> returning 0 hits for a query that previously returned hits.  That is, nothing
> should be off-line.  Index searcher warmup and swapping happen in the
> background, and while that's happening the old searcher should be serving
> queries.
>
>
> Otis --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
>> From: Jacob Singh <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Saturday, September 20, 2008 5:54:39 AM
>> Subject: How to keep a slave offline until the index is pulled from master
>>
>> Hi,
>>
>> I'm running multiple instances (solr 1.2) on a single jetty server using 
>> JNDI.
>>
>> When I launch a slave, it has to retrieve all of the indexes from the
>> master server using the snapuller / snapinstaller.
>>
>> This works fine; however, I don't want to delay activating the slave
>> (turning on jetty) while waiting for every slave to get its data.
>>
>> Is there any way to make sure that a slave is "up to date" before letting
>> it accept queries?  As it is, the last slave takes 10-15 minutes to get
>> its data, and for those 15 minutes it is active in the load balancer
>> and is therefore taking requests which return 0 results.
>>
>> Also, if I switch to multi-core (1.3), is this problem avoided?
>>
>> Thanks,
>> Jacob
>>
>>
>>
>>
>> --
>>
>> +1 510 277-0891 (o)
>> +91  33 7458 (m)
>>
>> web: http://pajamadesign.com
>>
>> Skype: pajamadesign
>> Yahoo: jacobsingh
>> AIM: jacobsingh
>> gTalk: [EMAIL PROTECTED]
>
>



-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: [EMAIL PROTECTED]