analyzer index vs query vs {missing}
Hi there. When defining a field type, I understand the meaning of '<analyzer type="index">' or type="query". What does it mean when the type attribute is missing? Does it apply at both index and query time? This can be found in the example schema.xml. Thanks! B _ {Beto|Norberto|Numard} Meijome "Humans die and turn to dust, but writing makes us remembered" 4000-year-old words of an Egyptian scribe I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Solr Master Slave Architecture over NFS
Hey, I'm looking for some feedback on the following setup. Due to the architect's decision I will be working with NFS, not Solr's own distribution scripts. A few Solr indexing machines use multicore to divide the 300,000 users into 1000 shards. For several reasons we have to go with per-user sharding (as you can see, 300 users per shard). Updates come in at about 166 updates per hour on each shard, so not a problem. The question lies more in this concept: I set up a few query slaves using NFS read-only mounts. I do not use the index directory for the read-only slaves. I patched the slaves to use the most recent snapshot directory to avoid all the nasty NFS issues (only a quick-and-dirty hack for testing). On a not-yet-defined interval I take a snapshot on the masters and send an HTTP commit to the slaves, so a new reader is opened on the fresh snapshot. This seems to work without trouble so far, but I've not done extensive testing. To take this a step further (only an idea yet): I let the slaves work on the real index as long as I do not optimize. Because the directory structure is not changing as long as I do not optimize, I can send commits to the slaves. Before I optimize, I take a snapshot, send the slaves a special "commit" to make them fall back to the most recent snapshot dir, optimize the index, and send them a real commit when done. Even though a little trickier, I would be more up to date with the query slaves. So if you have any design comments or see major or minor flaws, feedback would be very welcome. I do not use live data yet; this is the experimental stage. But I'll give feedback on how it performs and what issues I run into. There's also the faint chance of letting this setup (or a "fixed" one) run on the real user data, which would be roughly 20TB of usable data for indexing. This would be really interesting :-) Have a nice week Nico
RE: Benchmarking tools?
Hi, I did some trivial tests with JMeter. I set up JMeter to increase the number of threads steadily. For requests I either use a random word or combination of words from a wordlist, or some sample data from the test system. (This is described in the JMeter manual.) In my case the system works fine as long as I don't exceed the max number of requests per second it can handle. But that's not a big surprise. More interesting is the fact that, to a certain degree, after exceeding the max number of requests, response time seems to rise linearly for a little while and then exponentially. But that might also be the result of my test scenario. Nico > -Original Message- > From: Jacob Singh [mailto:[EMAIL PROTECTED] > Sent: Sunday, June 29, 2008 6:04 PM > To: solr-user@lucene.apache.org > Subject: Benchmarking tools? > > Hi folks, > > Does anyone have any bright ideas on how to benchmark Solr? > Unless someone has something better, here is what I am thinking: > > 1. Have a config file where one can specify info like how > many docs, how large, how many facets, and how many updates / > searches per minute > > 2. Use one of the various client APIs to generate XML files > for updates using some kind of lorem ipsum text as a base and > store them in a dir. > > 3. Use siege to set the update run at whatever interval is > specified in the config, sending an update every x seconds > and removing it from the directory > > 4. Generate a list of search queries based upon the facets > created, and build a urls.txt with all of these search urls > > 5. Run the searches through siege > > 6. Monitor the output using nagios to see where load kicks in. > > This is not that sophisticated, and feels like it won't > really pinpoint bottlenecks, but would approximately tell us > where a server will start to bail. > > Does anyone have any better ideas? > > Best, > Jacob Singh >
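Step 4 of Jacob's plan (generating a urls.txt of search queries for siege) can be sketched roughly like this. The base URL, facet field name, and wordlist are hypothetical placeholders, not anything from the thread:

```java
import java.util.Random;

public class UrlsTxtGenerator {
    // Hypothetical Solr host; adjust to your own setup.
    static final String BASE = "http://localhost:8983/solr/select";

    // Build one search URL; roughly half the queries also request facets,
    // mirroring step 4's mix of plain and facet-based searches.
    static String searchUrl(String term, String facetField, Random rnd) {
        String url = BASE + "?q=" + term + "&rows=10";
        if (rnd.nextBoolean()) {
            url += "&facet=true&facet.field=" + facetField;
        }
        return url;
    }

    public static void main(String[] args) {
        String[] words = {"apple", "solr", "lucene"}; // stand-in wordlist
        Random rnd = new Random(42);
        // Each printed line would become one line of urls.txt for siege.
        for (String w : words) {
            System.out.println(searchUrl(w, "category", rnd));
        }
    }
}
```

Redirecting the output to a file gives siege something to replay; the same file can be reused across runs so different Solr settings are compared against identical query mixes, which is the point Nico makes below about flat query files.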
Limit Porter stemmer to plural stemming only?
Hi all, Porter stemmer in general is really good. However, there are some cases where it doesn't work. For example, "accountant" matches "Accountant" as well as "Account Manager" which isn't desirable. Is it possible to use this analyser for plural words only? For example: +Accountant -> accountant +Accountants -> accountant +Account -> Account +Accounts -> account Thanks. -- Regards, Cuong Hoang
Re: Benchmarking tools?
Hi Nico, Thanks for the info. Do you have your scripts available for this? Also, is it configurable to give variable numbers of facets and facet-based searches? I have a feeling this will be the limiting factor, and much slower than keyword searches, but I could be (and usually am) wrong. Best, Jacob
Re: Limit Porter stemmer to plural stemming only?
OK, it looks like step 1a in the Porter algorithm does what I need.
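For reference, step 1a of the Porter algorithm covers exactly the plural suffixes. A minimal standalone sketch of just that rule (not the full stemmer, and not Solr's actual filter) looks like:

```java
public class Step1a {
    // Porter step 1a: SSES -> SS, IES -> I, SS -> SS (unchanged), S -> ""
    static String step1a(String w) {
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ies"))  return w.substring(0, w.length() - 2);
        if (w.endsWith("ss"))   return w;          // "caress" stays "caress"
        if (w.endsWith("s"))    return w.substring(0, w.length() - 1);
        return w;
    }

    public static void main(String[] args) {
        System.out.println(step1a("accountants")); // -> accountant
        System.out.println(step1a("ponies"));      // -> poni
        System.out.println(step1a("caress"));      // -> caress
    }
}
```

Wrapping logic like this in a Lucene TokenFilter (plus a factory class for Solr) would give the plural-only behavior asked about above, without the "accountant"/"account" conflation the full stemmer causes.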
Re: analyzer index vs query vs {missing}
Yes, that's exactly what it means. Erik On Jun 30, 2008, at 3:01 AM, Norberto Meijome wrote: > when defining a field type, i understand the meaning of 'analyzer type="index"', or type="query". What does it mean when the type is missing? does it apply at both index and query?
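For reference, the two forms look like this in a typical Solr 1.x schema.xml (the field type names and filter choices here are illustrative, not copied from the example schema):

```xml
<!-- One analyzer, no type attribute: applied at both index and query time -->
<fieldType name="text_simple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Separate analyzers for index time and query time -->
<fieldType name="text_split" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```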
1.3 maven artifact
Hi, I just wanted to ask if Solr 1.3 is already available as a Maven artifact? If it is not, could you give me an estimate of when it will be? TIA, Stefan Oestreicher -- Dr. Maté GmbH Stefan Oestreicher / Entwicklung [EMAIL PROTECTED] http://www.netdoktor.at Tel Buero: + 43 1 405 55 75 24 Fax Buero: + 43 1 405 55 75 55 Alser Str. 4 1090 Wien Altes AKH Hof 1 1.6.6
Re: Benchmarking tools?
Hi, I basically followed this: http://wiki.apache.org/jakarta-jmeter/JMeterFAQ#head-1680863678257fbcb85bd97351860eb0049f19ae I put all my queries in a flat text file; you could either use two parameters or put them in one file. The good thing about this is that each test uses the same queries, so you can compare the settings better afterwards. If you use varying facets, you might just go with two text files. If it stays the same in one test you can hardcode it into the test case. I polished the result a little, if you want to take a look: http://i31.tinypic.com/28c2blk.jpg ; JMeter itself does not plot such nice graphs. (Green is the max results delivered; above 66 "active users" per second the response time increases. Orange/yellow are the average and median of the response times. I know the scales and descriptions are missing :-) but you should get the picture.) I manually reduced the machine's capacity; otherwise Solr would serve more than 12000 requests per second (the whole index fit into RAM). I can send you my saved test case if this would help you. Nico
Re: analyzer index vs query vs {missing}
On Mon, 30 Jun 2008 05:52:33 -0400 Erik Hatcher <[EMAIL PROTECTED]> wrote: > Yes, that's exactly what it means. > > Erik great, thanks for the clarification. B _ {Beto|Norberto|Numard} Meijome "A dream you dream together is reality." John Lennon
Re: Benchmarking tools?
Nice stuff. Please send me the test case, I'd love to see it. Thanks, Jacob
Minimum JDK for SolrJ?
What is the minimum JDK that can be used for developing clients that use SolrJ? I am stuck on JDK 1.4.2 at the moment and am wondering if SolrJ is an option for me. Thanks! Todd
Re: Minimum JDK for SolrJ?
SolrJ needs a minimum of Java 5. --Noble Paul
Re: Benchmarking tools?
Me too. Thanks.
Re: Solr Master Slave Architecture over NFS
Isn't using Lucene over NFS *not* recommended? Bill
RE: UnicodeNormalizationFilterFactory
Hi Robert, Could you create a JIRA issue and attach your code to it? That makes it easier for people to evaluate (rather than just a binary distribution). This sounds general enough to me that it would be a useful addition to Lucene itself; Solr's factory could just be sugar on top then. Thanks, Steve On 06/26/2008 at 4:41 PM, Robert Haschart wrote: > Lance Norskog wrote: > > ISOLatin1AccentFilterFactory works quite well for us. It solves our basic euro-text keyboard searching problem, where "protege" should find protégé ("protégé" with two accents). > > Chris Hostetter wrote: > > > I've seen mention of these filters: > > > Are you asking because you saw these in Robert Haschart's reply to your previous question? I think those are custom Filters that he has in his project ... not open source (but I may be wrong). They are certainly not something that comes out of the box with Solr. -Hoss > > The ISOLatin1AccentFilter works well in the case above described by Lance Norskog, i.e. for words containing characters with accents where the accented character is a single Unicode character for the letter with the accent mark, as in protégé. However, in the data that we work with, accented characters are often represented by a plain unaccented character followed by the Unicode combining character for the accent mark, roughly like this: prote'ge', which emerges from the ISOLatin1AccentFilter unchanged.
> After some research I found the UnicodeNormalizationFilter mentioned above, which did not work on my development system (because it relies on features only available in Java 6), and which, when combined with the DiacriticsFilter also mentioned above, would remove diacritics from characters, but would also discard any Chinese characters or Russian characters, or anything else outside the 0x0-0x7f range. Which is bad. > > I first modified the filter to normalize the characters to the composed normalized form (changing prote'ge' to protégé) and then pass the results through the ISOLatin1AccentFilter. However, for accented characters for which there is no composed normalized form (such as the n and s in Zarin̦š), the accents are not removed. > > So I took the approach of decomposing the accented characters, and then only removing the valid diacritics and zero-width combining characters from the result, and the resulting filter works quite well. And since it was developed as part of the Blacklight project at the University of Virginia, it is open source under the Apache License. > > If anyone is interested in evaluating or using the UnicodeNormalizationFilter in conjunction with their Solr installation, get the UnicodeNormalizeFilter.jar from: > > http://blacklight.rubyforge.org/svn/trunk/solr/lib/ > > and place it in a lib directory next to the conf directory in your Solr home directory. > > Robert Haschart
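The decompose-then-strip approach Robert describes can be sketched with java.text.Normalizer (available from Java 6, which matches his note about the Java 6 dependency). This is an illustrative standalone version, not the actual Blacklight filter:

```java
import java.text.Normalizer;

public class StripDiacritics {
    // Decompose to NFD, then drop only combining marks (\p{M});
    // CJK, Cyrillic, etc. are untouched because they are not marks.
    static String strip(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("protégé")); // -> protege
    }
}
```

Because NFD splits precomposed characters into base letter plus combining mark first, this handles both the single-codepoint protégé case and the already-decomposed prote'ge' case, while leaving scripts outside Latin-1 alone.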
Re: Solr Master Slave Architecture over NFS
I think it comes with some caveats, but is now workable (although it may not give great performance), assuming you're using 2.3 (2.2) or later. I would definitely do a search in the Lucene archives about NFS, especially paying attention to Mike McCandless' comments. On Jun 30, 2008, at 1:08 PM, Bill Au wrote: > Isn't using Lucene over NFS *not* recommended? Bill
-- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
Re: Limit Porter stemmer to plural stemming only?
If you find a solution that works well, I encourage you to contribute it back to Solr. Plural-only stemming is probably a common need (I've definitely wanted to use it before). cheers, -Mike On 30-Jun-08, at 2:25 AM, climbingrose wrote: > Ok, it looks like step 1a in the Porter algorithm does what I need.
Re: Efficient date-based results sorting
: Subject: Efficient date-based results sorting Sorting on anything but score is done pretty much the exact same way regardless of data type. The one thing you can do to make sorting on any field more efficient is to reduce the cardinality of the field, i.e. reduce the number of unique indexed terms. With date-based fields, that means that if you don't care about millisecond granularity when you sort by date, round to the nearest second when you index that field. If you don't care about second granularity, round to the nearest minute, etc. I suppose there is also this issue... http://issues.apache.org/jira/browse/SOLR-440 ...if someone implements a new DateField class that uses SortableLong as the underlying format instead of string, you can be more *memory* efficient, but the speed of sorted queries will be about the same. -Hoss
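The rounding Hoss suggests can be done client-side before the document is sent to Solr. A minimal sketch (the class and method names are illustrative, not Solr API):

```java
import java.util.Date;

public class DateRounding {
    static final long MINUTE_MS = 60_000L;

    // Truncate a timestamp down to the minute so the indexed field has
    // far fewer unique terms, shrinking the sort field cache.
    static Date roundToMinute(Date d) {
        return new Date((d.getTime() / MINUTE_MS) * MINUTE_MS);
    }

    public static void main(String[] args) {
        Date d = new Date(1214812345678L);
        System.out.println(roundToMinute(d).getTime()); // -> 1214812320000
    }
}
```

The same idea extends to seconds, hours, or days by changing the divisor; the coarser the rounding you can tolerate, the fewer unique terms the sort has to load.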
Re: Search query optimization
If I know that condition C will eliminate more results than either A or B, does specifying the query as: "C AND A AND B" make it any faster (than the original "A AND B AND C")? -- View this message in context: http://www.nabble.com/Search-query-optimization-tp17544667p18205504.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search query optimization
: If I know that condition C will eliminate more results than either A or B, : does specifying the query as: "C AND A AND B" make it any faster (than the : original "A AND B AND C")? Nope. Lucene takes care of that for you. -Hoss
Re: Limit Porter stemmer to plural stemming only?
I modified the original English stemmer written in the Snowball language and regenerated the Java implementation using the Snowball compiler. It's been working for me so far. I can certainly share the modified Snowball English stemmer if anyone wants to use it. Cheers, Cuong