How to keep a slave offline until the index is pulled from master
Hi,

I'm running multiple instances (Solr 1.2) on a single Jetty server using JNDI.

When I launch a slave, it has to retrieve all of the indexes from the master server using snappuller / snapinstaller. This works fine; however, I don't want to hold off starting Jetty until every slave has finished getting its data.

Is there any way to make sure that a slave is "up2date" before letting it accept queries? As it is, the last slave will take 10-15 minutes to get its data, and for those 15 minutes it is active in the load balancer and therefore taking requests which return 0 results.

Also, if I switch to multi-core (1.3), is this problem avoided?

Thanks,
Jacob

--
+1 510 277-0891 (o)
+91 33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: [EMAIL PROTECTED]
Re: Delta importing issues
If an entity is specified like entity=one&entity=two, the command will be run only for those entities. Absence of the entity parameter means all entities will be executed.

The last_index_time is another piece which must be improved.

It is hard to get use cases. If users can give me more use cases it would be great.

One thing I have in mind is to allow users to store arbitrary properties through an API, say context.persistProperty("key","value"), and you must be able to read it back using context.getPersistedProperty("key");

This would be generic enough for users to get going.

Thoughts?

--Noble

On Sat, Sep 20, 2008 at 1:52 AM, Jon Baer <[EMAIL PROTECTED]> wrote:
> Actually how does ${deltaimporter.last_index_time} know which entity I'm
> specifically updating? I feel like I'm missing something, can it work like
> that?
>
> Thanks.
>
> - Jon
>
> On Sep 19, 2008, at 4:14 PM, Jon Baer wrote:
>
>> Question -
>>
>> So if I issued a dataimport?command=delta-import&entity=one,two,three
>>
>> Would this also hit items w/o a delta-import like four,five,six, etc? I'm
>> trying to set something up and I ended up with 28k+ documents which seems
>> more like a full import, so do I need to do something like delta-query="" to
>> say no delta?
>>
>> @ the moment I don't have anything defined for those since I don't need it,
>> just wondering what the proper behavior is supposed to be?
>>
>> Thanks.
>>
>> - Jon
>
>

--
--Noble Paul
Re: Delta importing issues
Would that context be available for *each* entity? @ present it seems like there should be a last_index_time written for each top-level entity ... no?

Umm, would it be possible to hack something like ${deltaimporter.[name of entity].last_index_time} as is, or are there too many moving parts?

Thanks.

- Jon

On Sep 20, 2008, at 9:21 AM, Noble Paul നോബിള് नोब्ळ् wrote:

> If an entity is specified like entity=one&entity=two, the command
> will be run only for those entities. Absence of the entity parameter
> means all entities will be executed.
>
> The last_index_time is another piece which must be improved.
>
> It is hard to get use cases. If users can give me more use cases it
> would be great.
>
> One thing I have in mind is to allow users to store arbitrary properties
> through an API, say context.persistProperty("key","value"), and you must
> be able to read it back using context.getPersistedProperty("key");
>
> This would be generic enough for users to get going.
>
> Thoughts?
>
> --Noble
>
> On Sat, Sep 20, 2008 at 1:52 AM, Jon Baer <[EMAIL PROTECTED]> wrote:
>> Actually how does ${deltaimporter.last_index_time} know which entity I'm
>> specifically updating? I feel like I'm missing something, can it work like
>> that?
>>
>> Thanks.
>>
>> - Jon
>>
>> On Sep 19, 2008, at 4:14 PM, Jon Baer wrote:
>>
>>> Question -
>>>
>>> So if I issued a dataimport?command=delta-import&entity=one,two,three
>>>
>>> Would this also hit items w/o a delta-import like four,five,six, etc? I'm
>>> trying to set something up and I ended up with 28k+ documents which seems
>>> more like a full import, so do I need to do something like delta-query=""
>>> to say no delta?
>>>
>>> @ the moment I don't have anything defined for those since I don't need
>>> it, just wondering what the proper behavior is supposed to be?
>>>
>>> Thanks.
>>>
>>> - Jon
>
> --
> --Noble Paul
Re: How to keep a slave offline until the index is pulled from master
Even with your current setup (if it's done correctly) slaves should not be returning 0 hits for a query that previously returned hits. That is, nothing should be off-line. Index searcher warmup and swapping happen in the background, and while that's happening the old searcher should be serving queries.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: Jacob Singh <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Saturday, September 20, 2008 5:54:39 AM
> Subject: How to keep a slave offline until the index is pulled from master
>
> Hi,
>
> I'm running multiple instances (Solr 1.2) on a single Jetty server using JNDI.
>
> When I launch a slave, it has to retrieve all of the indexes from the
> master server using snappuller / snapinstaller. This works fine; however,
> I don't want to hold off starting Jetty until every slave has finished
> getting its data.
>
> Is there any way to make sure that a slave is "up2date" before letting
> it accept queries? As it is, the last slave will take 10-15 minutes to get
> its data, and for those 15 minutes it is active in the load balancer
> and therefore taking requests which return 0 results.
>
> Also, if I switch to multi-core (1.3), is this problem avoided?
>
> Thanks,
> Jacob
>
> --
> +1 510 277-0891 (o)
> +91 33 7458 (m)
>
> web: http://pajamadesign.com
>
> Skype: pajamadesign
> Yahoo: jacobsingh
> AIM: jacobsingh
> gTalk: [EMAIL PROTECTED]
Re: Capabilities of solr
Hi Chris,

Yes, from what you described, Solr sounds like a good choice. It sounds like for each type of entity (doc vs. product vs. ...) you may want to have a separate index/schema. The best place to start is the tutorial.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: Chris <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Saturday, September 20, 2008 12:40:13 AM
> Subject: Capabilities of solr
>
> Hello,
>
> We currently have a ton of documents that we would like to index and
> make searchable. I came across Solr and it seems like it offers a lot
> of nice features and would suit our needs.
>
> The documents are similar in structure to Java code, with blocks
> representing functions, variables, comment blocks, etc.
>
> We would also like to provide our users the ability to "tag" a line,
> or multiple lines of a document, with comments that would be stored
> externally, for future reference or notes for enhancements. These
> documents are also updated frequently.
>
> I also noticed in the examples that XML documents are used to import
> documents into Solr. If we have code-like documents vs., for example,
> products, is there any specific way to define the Solr schema for these
> types of documents?
>
> Currently we maintain these documents as flat files and in MySQL.
>
> Does Solr sound like a good option for what we are looking to do? If
> so, could anybody provide some starting points for my research?
>
> Thank you
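For reference, the XML Chris mentions is Solr's update message format: each document is posted to the update handler as a set of named fields. A minimal, made-up example (the field names would come from whatever schema is defined for the code-like documents):

    <add>
      <doc>
        <field name="id">doc-001</field>
        <field name="type">function</field>
        <field name="body">public void reindex() { ... }</field>
      </doc>
    </add>

Tags or annotations stored externally (in MySQL, say) could be indexed as additional fields on the same document when it is re-posted.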
Re: SynonymFilter and inch/foot symbols
Hi Kevin,

Find the component that's stripping your " and ' characters (WordDelimiterFilterFactory?) and make sure those characters are indexed first. Then make sure the query-time analyzer keeps those tokens, too. Finally, escape special characters (e.g. " in your example) in the query before passing it to Solr (I *think* Solr won't do it for you).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: Kevin Osborn <[EMAIL PROTECTED]>
> To: Solr
> Sent: Friday, September 19, 2008 7:18:15 PM
> Subject: SynonymFilter and inch/foot symbols
>
> How would I handle a search for 21" or 3'? The " and ' symbols appear to get
> stripped away by Lucene before passing the query off to the analyzers.
>
> Here is my analyzer in the schema.xml:
>
>   <analyzer>
>     <tokenizer .../>
>     <filter class="solr.SynonymFilterFactory" ... ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ... words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
>             catenateAll="0"/>
>   </analyzer>
>
> I could certainly replace X" with X inch using regex in my custom request
> handler. But I would rather not have synonyms in two separate places.
>
> We are also using the DisjunctionMaxQueryParser to build the actual query
> from the front end.
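For illustration, the regex normalization Kevin mentions -- mapping 21" to 21 inch before the query string ever reaches Solr -- might look roughly like this; the class name and unit words are made up, and it sidesteps rather than solves the escaping question:

    import java.util.regex.Pattern;

    // Sketch of pre-query normalization, assumed to run in a custom request
    // handler or front-end layer before the query string is handed to Solr.
    public class UnitNormalizer {
        private static final Pattern INCHES = Pattern.compile("(\\d+)\"");
        private static final Pattern FEET   = Pattern.compile("(\\d+)'");

        public static String normalize(String userQuery) {
            String q = INCHES.matcher(userQuery).replaceAll("$1 inch");
            return FEET.matcher(q).replaceAll("$1 foot");   // e.g. 3' -> 3 foot
        }
    }

    // UnitNormalizer.normalize("21\" monitor") returns "21 inch monitor"

Whether that duplication with synonyms.txt is acceptable is exactly the trade-off Kevin raises.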
Re: Hardware config for SOLR
I have not worked with SSDs, though I've read all the good information that's trickling to us from Denmark. One thing that I've been wondering all along is - what about writes? That is, what about writes "wearing out" the SSD? How quickly does that happen, and when it does happen, what are the symptoms? For example, does it happen after N write operations? Do writes start failing, and does one start getting IOExceptions in the case of Lucene and Solr?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
> From: Karl Wettin <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, September 19, 2008 6:15:53 PM
> Subject: Re: Hardware config for SOLR
>
> On 19 Sep 2008, at 23:22, Grant Ingersoll wrote:
>
> > As for HDDs, people have noted some nice speedups in Lucene using
> > solid-state drives, if you can afford them.
>
> I've seen the average response time cut 5-10 times when switching
> to SSD. A 64GB SSD starts at EUR 200, so it can be a lot cheaper
> to replace the disk than to get more servers, given you can fit
> your index on one of those.
>
> karl
Re: Hardware config for SOLR
> I have not worked with SSDs, though I've read all the good information that's
> trickling to us from Denmark. One thing that I've been wondering all along is
> - what about writes? That is, what about writes "wearing out" the SSD? How
> quickly does that happen, and when it does happen, what are the symptoms? For
> example, does it happen after N write operations? Do writes start failing, and
> does one start getting IOExceptions in the case of Lucene and Solr?

With modern SSDs you get something in the region of 500,000 to 1,000,000 write cycles per memory cell. Additionally, they all use wear leveling, i.e. the writes are spread over the whole disk -- so you can write to a file system block many more times than that.

One of the manufacturers of high-end SSDs [1] claims that at a sustained write rate of 50GB per day their drives will last more than 140 years, i.e. it's much more likely that something else will fail first ;)

When the write cycles are "exhausted", much the same thing happens as with a bad conventional disk -- you'll see lots of write errors. If the wear leveling is perfect (i.e. all memory locations have exactly the same number of writes) it's even possible that the whole disk will fail at once.

Lars

[1] http://www.mtron.net
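A rough back-of-envelope using the figures above (idealized: perfect wear leveling, no write amplification, and the 64GB drive size Karl mentions) shows why wear is rarely the first thing to fail:

    64 GB x 500,000 write cycles   =  32,000,000 GB of total writes
    32,000,000 GB / 50 GB per day  =  640,000 days, i.e. roughly 1,750 years

Real drives reserve spare capacity and suffer write amplification, so even with a large safety margin the manufacturer's 140+ year figure quoted above is plausible.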
Re: Delta importing issues
context is available for each entity, but the implementation just stores one value for last_index_time. If it stored the last_index_time per top-level entity, it could return the correct value from context.

Anyway, raise an issue and I can provide a patch soon. Something like this shall also be supported:

${dataimporter.[name of entity].last_index_time}

On Sat, Sep 20, 2008 at 9:32 PM, Jon Baer <[EMAIL PROTECTED]> wrote:
> Would that context be available for *each* entity? @ present it seems like
> there should be a last_index_time written for each top-level entity ... no?
>
> Umm, would it be possible to hack something like ${deltaimporter.[name of
> entity].last_index_time} as is, or are there too many moving parts?
>
> Thanks.
>
> - Jon
>
> On Sep 20, 2008, at 9:21 AM, Noble Paul നോബിള് नोब्ळ् wrote:
>
>> If an entity is specified like entity=one&entity=two, the command
>> will be run only for those entities. Absence of the entity parameter
>> means all entities will be executed.
>>
>> The last_index_time is another piece which must be improved.
>>
>> It is hard to get use cases. If users can give me more use cases it
>> would be great.
>>
>> One thing I have in mind is to allow users to store arbitrary properties
>> through an API, say context.persistProperty("key","value"), and you must
>> be able to read it back using context.getPersistedProperty("key");
>>
>> This would be generic enough for users to get going.
>>
>> Thoughts?
>>
>> --Noble
>>
>> On Sat, Sep 20, 2008 at 1:52 AM, Jon Baer <[EMAIL PROTECTED]> wrote:
>>> Actually how does ${deltaimporter.last_index_time} know which entity I'm
>>> specifically updating? I feel like I'm missing something, can it work
>>> like that?
>>>
>>> Thanks.
>>>
>>> - Jon
>>>
>>> On Sep 19, 2008, at 4:14 PM, Jon Baer wrote:
>>>
>>>> Question -
>>>>
>>>> So if I issued a dataimport?command=delta-import&entity=one,two,three
>>>>
>>>> Would this also hit items w/o a delta-import like four,five,six, etc? I'm
>>>> trying to set something up and I ended up with 28k+ documents which seems
>>>> more like a full import, so do I need to do something like delta-query=""
>>>> to say no delta?
>>>>
>>>> @ the moment I don't have anything defined for those since I don't need
>>>> it, just wondering what the proper behavior is supposed to be?
>>>>
>>>> Thanks.
>>>>
>>>> - Jon
>>
>> --
>> --Noble Paul
>
>

--
--Noble Paul
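For anyone following the thread, a delta definition along these general lines is what is being discussed. This is only a sketch: the table and column names are made up, and note that today a single ${dataimporter.last_index_time} is shared by all entities -- the per-entity form above is the proposed improvement.

    <dataConfig>
      <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/db" user="..." password="..."/>
      <document>
        <!-- "item" and "last_modified" are illustrative names, not from the thread -->
        <entity name="one" pk="id"
                query="select id, name from item"
                deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
          <field column="id" name="id"/>
          <field column="name" name="name"/>
        </entity>
      </document>
    </dataConfig>

As Noble notes above, passing entity=one&entity=two on the delta-import command restricts the run to those entities; omitting the parameter runs them all.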
Re: How to keep a slave offline until the index is pulled from master
Hi Otis,

Thanks for the response. I was actually talking about the initial sync over from the master. What I'd like, I guess, is a "lock" command which would start out true and, when snapinstaller ran successfully for the first time, would become false. I can write the bash, but I'm not sure how to get Solr to push out the 503 (I guess that would be the appropriate code)...

Best,
Jacob

On Sun, Sep 21, 2008 at 12:29 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Even with your current setup (if it's done correctly) slaves should not be
> returning 0 hits for a query that previously returned hits. That is, nothing
> should be off-line. Index searcher warmup and swapping happen in the
> background, and while that's happening the old searcher should be serving
> queries.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message
>> From: Jacob Singh <[EMAIL PROTECTED]>
>> To: solr-user@lucene.apache.org
>> Sent: Saturday, September 20, 2008 5:54:39 AM
>> Subject: How to keep a slave offline until the index is pulled from master
>>
>> Hi,
>>
>> I'm running multiple instances (Solr 1.2) on a single Jetty server using JNDI.
>>
>> When I launch a slave, it has to retrieve all of the indexes from the
>> master server using snappuller / snapinstaller. This works fine; however,
>> I don't want to hold off starting Jetty until every slave has finished
>> getting its data.
>>
>> Is there any way to make sure that a slave is "up2date" before letting
>> it accept queries? As it is, the last slave will take 10-15 minutes to get
>> its data, and for those 15 minutes it is active in the load balancer
>> and therefore taking requests which return 0 results.
>>
>> Also, if I switch to multi-core (1.3), is this problem avoided?
>>
>> Thanks,
>> Jacob
>>
>> --
>> +1 510 277-0891 (o)
>> +91 33 7458 (m)
>>
>> web: http://pajamadesign.com
>>
>> Skype: pajamadesign
>> Yahoo: jacobsingh
>> AIM: jacobsingh
>> gTalk: [EMAIL PROTECTED]
>
>

--
+1 510 277-0891 (o)
+91 33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: [EMAIL PROTECTED]
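One way to get the effect without teaching Solr to return a 503 is to put the check in the load balancer's health probe instead. Below is a minimal bash sketch along those lines; the host, port and URL path are assumptions (adjust them for the JNDI multi-instance setup), and the *:* query simply asks whether the slave has any documents yet:

    #!/bin/bash
    # Hypothetical load-balancer health check: exit 0 (healthy) only once this
    # slave answers queries and its index is non-empty. URL, port and the
    # "non-empty index" criterion are all assumptions, not a Solr feature.
    URL="http://localhost:8983/solr/select?q=*:*&rows=0"

    RESPONSE=$(curl -s --max-time 5 "$URL") || exit 1        # Solr not reachable
    echo "$RESPONSE" | grep -q 'numFound="'  || exit 1       # unexpected response
    echo "$RESPONSE" | grep -q 'numFound="0"' && exit 1      # index still empty
    exit 0

A variant closer to the "lock" idea would be to have a post-snapinstaller hook touch a flag file and have the script test for that file instead; either way the load balancer keeps the slave out of rotation until the first snapshot is installed.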