RE: Vote on a new solr logo
Mark, No worries, you are still way ahead of electronic voting! -Original Message- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Monday, July 21, 2008 2:37 PM To: solr-user@lucene.apache.org Subject: Re: Vote on a new solr logo Sorry you can't vote, guys - makes me look really smart, eh? I can vote even when I log out of the admin section, so I am clueless why you cannot. But I am going to fix things somehow and add those new logos for consideration. - Mark Ryan McKinley wrote: > I can't figure out how to use the poll either... > > here are a few others to check out: > http://lapnap.net/solr/ > perhaps "a" and "f" could live together, you use 'a' if you need a > background other than white > > > On Jul 21, 2008, at 2:14 PM, Mike Klaas wrote: > >> On 20-Jul-08, at 6:19 PM, Mark Miller wrote: >> >>> From the dev list: >>> >>> Shalin Shekhar Mangar: >>> > +1 for a new logo. It's a new release, let's have a new logo too! > First step > is to decide which one of these is more Solr-ish. >>> >>> I'm looking to improve the look of solr, so I am going to do my best >>> to push this process along. >>> Not to keep shoving polls down everyone's throat, but if you could, >>> please go to the following site >>> and rate the solr logos that you love or hate: >>> http://solrlogo.myhardshadow.com/solr-logo-vote/ >> >> I don't really understand how to use the poll. I click on a logo, >> and am then taken to a page on which the stars are unclickable. >> Which stars should be clicked on? >> >> -Mike >
RE: Less aggressive stemmer?
We use KStem also and are very happy with it. I think it has been integrated into Solr and will be included in 1.3 (someone please correct me if this is not the case). You should be able to get it from the nightly builds now. Cheers! harry -Original Message- From: Kevin Osborn [mailto:[EMAIL PROTECTED] Sent: Thursday, August 21, 2008 5:30 PM To: solr-user@lucene.apache.org Subject: Re: Less aggressive stemmer? We had similar problems and then switched to KStem and have been pretty happy with the results. http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi - Original Message From: Jason Rennie <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Thursday, August 21, 2008 2:23:36 PM Subject: Less aggressive stemmer? Is there an option to perform less aggressive stemming in solr? We're using the Porter stemmer. I see that there is an option for Snowball, but my understanding is that Snowball is a refinement of Porter rather than something radically different. I think we'd be best off with something very basic, possibly as simple as removing plural endings. Our index is over product descriptions, so it's important that we stem normal variations in nouns, but adverbs, verbs and possibly adjective variations are not so important and sometimes cause problems for us. Jason
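Jason's "as simple as removing plural endings" option needs only a few lines of code. The Java below is a hypothetical illustration of a Harman-style "S" stemmer: three suffix rules, and the first one that matches wins. It is not KStem, not Porter, and not an existing Solr class. To use it in Solr, it would still have to be wrapped in a token filter and a factory.

```java
import java.util.Locale;

/**
 * Hypothetical illustration: a Harman-style "S" stemmer that only removes
 * English plural endings. Not KStem, not Porter, not a Solr/Lucene class.
 */
final class SStemmer {
    private SStemmer() {}

    /** Lowercases the word, then applies the first rule that matches. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        // Rule 1: "-ies" -> "-y", except for "-eies" and "-aies".
        if (w.endsWith("ies") && !w.endsWith("eies") && !w.endsWith("aies")) {
            return w.substring(0, w.length() - 3) + "y";
        }
        // Rule 2: "-es" -> "-e", except for "-aes", "-ees" and "-oes".
        if (w.endsWith("es") && !w.endsWith("aes") && !w.endsWith("ees") && !w.endsWith("oes")) {
            return w.substring(0, w.length() - 1);
        }
        // Rule 3: drop a final "-s", except for "-us" and "-ss".
        if (w.endsWith("s") && !w.endsWith("us") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        String[] words = {"Batteries", "cables", "glass", "organizations", "organ"};
        for (String w : words) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Because the rules only touch plural endings, "organizations" becomes "organization" and never "organ". The rules also strip the final "s" from short words such as "this", so in practice a stemmer like this would usually run after a stop filter.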
RE: Less aggressive stemmer?
Otis, I'd be happy to. Where do you think the best place to put this is - under 'hacking Solr' or with the other stemming text? -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Friday, August 22, 2008 12:24 PM To: solr-user@lucene.apache.org Subject: Re: Less aggressive stemmer? It won't be integrated in Solr 1.3, I believe, because of KStem's license. But we should document what the Factory for it can look like, perhaps by posting it on the Wiki. Harry, if you have the code handy, feel free to post it on the Solr Wiki somewhere. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message ---- > From: "Wagner,Harry" <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Friday, August 22, 2008 8:40:18 AM > Subject: RE: Less aggressive stemmer? > > We use KStem also and are very happy with it. I think it has been > integrated into Solr and will be included in 1.3 (someone please correct > me if this is not the case). You should be able to get it from the > nightly builds now. > > Cheers! > harry > > -Original Message- > From: Kevin Osborn [mailto:[EMAIL PROTECTED] > Sent: Thursday, August 21, 2008 5:30 PM > To: solr-user@lucene.apache.org > Subject: Re: Less aggressive stemmer? > > We had similar problems and then switched to KStem and have been pretty > happy with the results. > > http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi > > > > - Original Message > From: Jason Rennie > To: solr-user@lucene.apache.org > Sent: Thursday, August 21, 2008 2:23:36 PM > Subject: Less aggressive stemmer? > > Is there an option to perform less aggressive stemming in solr? We're > using > the Porter stemmer. I see that there is an option for Snowball, but my > understanding is that Snowball is a refinement of Porter rather than > something radically different. I think we'd be best off with something > very > basic, possibly as simple as removing plural endings. 
Our index is over > product descriptions, so it's important that we stem normal variations > in > nouns, but adverbs, verbs and possibly adjective variations are not so > important and sometimes cause problems for us. > > Jason
RE: Less aggressive stemmer?
OK. I put it here http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem and linked it from the stemming paragraph found here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters Cheers! harry -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Monday, August 25, 2008 3:20 PM To: solr-user@lucene.apache.org Subject: Re: Less aggressive stemmer? I'd create a new page and link it from, perhaps, the page about stemming if there is one or from the page about analyzers/tokens/filters. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: "Wagner,Harry" <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Monday, August 25, 2008 1:57:38 PM > Subject: RE: Less aggressive stemmer? > > Otis, > I'd be happy to. Where do you think the best place to put this is - > under 'hacking Solr' or with the other stemming text? > > -Original Message- > From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] > Sent: Friday, August 22, 2008 12:24 PM > To: solr-user@lucene.apache.org > Subject: Re: Less aggressive stemmer? > > It won't be integrated in Solr 1.3, I believe, because of KStem's > license. But we should document what the Factory for it can look like, > perhaps by posting it on the Wiki. Harry, if you have the code handy, > feel free to post it on the Solr Wiki somewhere. > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message > > From: "Wagner,Harry" > > To: solr-user@lucene.apache.org > > Sent: Friday, August 22, 2008 8:40:18 AM > > Subject: RE: Less aggressive stemmer? > > > > We use KStem also and are very happy with it. I think it has been > > integrated into Solr and will be included in 1.3 (someone please > correct > > me if this is not the case). You should be able to get it from the > > nightly builds now. > > > > Cheers! 
> > harry > > > > -Original Message- > > From: Kevin Osborn [mailto:[EMAIL PROTECTED] > > Sent: Thursday, August 21, 2008 5:30 PM > > To: solr-user@lucene.apache.org > > Subject: Re: Less aggressive stemmer? > > > > We had similar problems and then switched to KStem and have been > pretty > > happy with the results. > > > > http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi > > > > > > > > - Original Message > > From: Jason Rennie > > To: solr-user@lucene.apache.org > > Sent: Thursday, August 21, 2008 2:23:36 PM > > Subject: Less aggressive stemmer? > > > > Is there an option to perform less aggressive stemming in solr? We're > > using > > the Porter stemmer. I see that there is an option for Snowball, but > my > > understanding is that Snowball is a refinement of Porter rather than > > something radically different. I think we'd be best off with > something > > very > > basic, possibly as simple as removing plural endings. Our index is > over > > product descriptions, so it's important that we stem normal variations > > in > > nouns, but adverbs, verbs and possibly adjective variations are not so > > important and sometimes cause problems for us. > > > > Jason
Solr and KStem
There is a version of the KStem stemmer (http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi) that has been adapted for Lucene. What would be the simplest way to implement this in Solr? As a plug-in? Has anyone already done this? Thanks... harry
RE: Solr and KStem
I've implemented a Solr plug-in that wraps KStem for Solr use. KStem is considered to be more appropriate for library usage since it is much less aggressive than Porter (i.e., searches for organization do NOT match on organ!). If there is any interest in feeding this back into Solr I would be happy to contribute it. Cheers! harry -Original Message- From: Wagner,Harry Sent: Tuesday, August 28, 2007 4:09 PM To: 'solr-user@lucene.apache.org' Subject: Solr and KStem There is a version of the KStem stemmer (http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi) that has been adapted for Lucene. What would be the simplest way to implement this in Solr? As a plug-in? Has anyone already done this? Thanks... harry
RE: Solr and KStem
Yes, I don't think the licensing will be a problem as KStem already includes a wrapper for Lucene. Cheers! harry -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Friday, September 07, 2007 4:40 PM To: solr-user@lucene.apache.org Subject: Re: Solr and KStem Look for KStem in Lucene JIRA. Many years ago something KStem-related was contributed, and there was a discussion about licenses then. Otis -- Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: Walter Underwood <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Friday, September 7, 2007 4:31:25 PM Subject: Re: Solr and KStem Even if KStem isn't ASL, we could include the plug-in code with notes about how to get the stemmer. Or, the Solr plug-in could be contributed to the group that manages the KStem distribution: http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi wunder On 9/7/07 12:59 PM, "Yonik Seeley" <[EMAIL PROTECTED]> wrote: > On 9/7/07, Wagner,Harry <[EMAIL PROTECTED]> wrote: >> I've implemented a Solr plug-in that wraps KStem for Solr use. KStem is >> considered to be more appropriate for library usage since it is much >> less aggressive than Porter (i.e., searches for organization do NOT >> match on organ!). If there is any interest in feeding this back into >> Solr I would be happy to contribute it. > > Absolutely. > We need to make sure that the license for that k-stemmer is ASL > compatible of course. > > -Yonik
RE: Solr and KStem
Hi Yonik and Mike, No problem regarding my employer. I've checked and they are happy to contribute it. I'm not sure what to do about the KStem code though. It was originally written by Bob Krovetz and then modified for Lucene by Sergio Guzman-Lara (both from UMASS Amherst). I modified the Guzman version for Solr. Perhaps I should contribute only what I modified, with instructions for making it work? Let me know... harry -Original Message- From: Mike Klaas [mailto:[EMAIL PROTECTED] Sent: Monday, September 10, 2007 2:49 PM To: solr-user@lucene.apache.org Subject: Re: Solr and KStem Hi Harry, Thanks for your contribution! Unfortunately, we can't include it in Solr unless the necessary legal hurdles are cleared. An issue needs to be opened on http://issues.apache.org/jira/browse/SOLR and you have to attach the file and check the "Grant License to ASF" button. It is also important to verify that you have the legal right to grant the code to ASF (since it is probably your employer's intellectual property). Legal issues are a hassle, but are unavoidable, I'm afraid. Thanks again, -Mike On 10-Sep-07, at 10:22 AM, Wagner,Harry wrote: > Hi Yonik, > The modified KStemmer source is attached. The original KStemFilter is > now wrapped (and replaced) by KStemFilterFactory. I also changed the > path to avoid any naming collisions with existing Lucene code. > > I included the jar file also, for anyone who wants to just drop and > play: > > - put KStem2.jar in your solr/lib directory. > - change your schema to use: <filter class="org.oclc.solr.analysis.KStemFilterFactory" cacheSize="2"/> > - restart your app server > > I don't know if you credit contributions, but if so please include > OCLC. > Seems only fair since I did this on their dime :) > > Cheers!
> harry > > > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik > Seeley > Sent: Friday, September 07, 2007 3:59 PM > To: solr-user@lucene.apache.org > Subject: Re: Solr and KStem > > On 9/7/07, Wagner,Harry <[EMAIL PROTECTED]> wrote: >> I've implemented a Solr plug-in that wraps KStem for Solr use. KStem > is >> considered to be more appropriate for library usage since it is much >> less aggressive than Porter (i.e., searches for organization do NOT >> match on organ!). If there is any interest in feeding this back into >> Solr I would be happy to contribute it. > > Absolutely. > We need to make sure that the license for that k-stemmer is ASL > compatible of course. > > -Yonik >
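Harry's factory takes a cacheSize attribute. The thread does not show how his wrapper uses it. One plausible reading is a bounded memo of term-to-stem results, since a dictionary-based stemmer sees the same words over and over. The sketch below shows that idea, assuming LRU eviction. CachingStemmer is a hypothetical name, not the OCLC code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch: wraps any stemming function with a bounded LRU cache.
 * The name and the link to a "cacheSize" setting are assumptions, not the OCLC code.
 */
final class CachingStemmer {
    private final UnaryOperator<String> stemmer;
    private final Map<String, String> cache;
    private int misses;

    CachingStemmer(UnaryOperator<String> stemmer, int cacheSize) {
        if (cacheSize < 1) {
            throw new IllegalArgumentException("cacheSize must be positive");
        }
        this.stemmer = stemmer;
        // accessOrder = true: every get() moves the entry to the most-recent end.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > cacheSize; // evict the least recently used term
            }
        };
    }

    /** Returns the cached stem, computing and caching it on a miss. */
    synchronized String stem(String term) {
        String cached = cache.get(term);
        if (cached != null) {
            return cached;
        }
        misses++;
        String result = stemmer.apply(term);
        cache.put(term, result);
        return result;
    }

    synchronized int misses() { return misses; }

    synchronized int cachedTerms() { return cache.size(); }

    public static void main(String[] args) {
        // A toy plural stripper stands in for the real stemmer.
        CachingStemmer cs = new CachingStemmer(
                t -> t.endsWith("s") ? t.substring(0, t.length() - 1) : t, 2);
        for (String t : new String[] {"books", "books", "maps", "books"}) {
            System.out.println(t + " -> " + cs.stem(t));
        }
        System.out.println("misses: " + cs.misses());
    }
}
```

The stemming function is passed in, so any stemmer can sit behind the cache. The synchronized methods keep it safe if one instance is shared across analyzer threads, at the cost of some contention.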
RE: Solr and KStem
Bill, Currently it is a plug-in. Put the lower case filter ahead of kstem, just as for porter (example below). You can use it with porter, but I can't imagine why you would want to. At least not in the same analyzer. Hope this helps. Cheers... harry -Original Message- From: Bill Fowler [mailto:[EMAIL PROTECTED] Sent: Monday, September 10, 2007 8:33 PM To: solr-user@lucene.apache.org Subject: Re: Solr and KStem Hello, I would like to test this and have a few questions (please excuse what may seem naive questions). I would like to verify that this is purely a configuration feature -- since the schema.xml defines the analysis/tokenizer chain, no other changes are required. Also, the source seems to say that a lower case factory needs to be "farther down" the tokenizer chain. So does this mean that the KStem factory appears before the lower case filter factory in the schema.xml? Is there a recommended (required?) tokenizer factory? I am using the WhiteSpaceFactory, which seems OK. Finally, I take it that I need to remove the EnglishPorterFilterFactory item in the schema.xml -- or no? Thanks, Bill On 9/10/07, Wagner,Harry <[EMAIL PROTECTED]> wrote: > > Hi Yonik, > The modified KStemmer source is attached. The original KStemFilter is > now wrapped (and replaced) by KStemFilterFactory. I also changed the > path to avoid any naming collisions with existing Lucene code. > > I included the jar file also, for anyone who wants to just drop and > play: > > - put KStem2.jar in your solr/lib directory. > - change your schema to use: <filter class="org.oclc.solr.analysis.KStemFilterFactory" cacheSize="2"/> > - restart your app server > > I don't know if you credit contributions, but if so please include OCLC. > Seems only fair since I did this on their dime :) > > Cheers!
> harry > > > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik > Seeley > Sent: Friday, September 07, 2007 3:59 PM > To: solr-user@lucene.apache.org > Subject: Re: Solr and KStem > > On 9/7/07, Wagner,Harry <[EMAIL PROTECTED]> wrote: > > I've implemented a Solr plug-in that wraps KStem for Solr use. KStem > is > > considered to be more appropriate for library usage since it is much > > less aggressive than Porter (i.e., searches for organization do NOT > > match on organ!). If there is any interest in feeding this back into > > Solr I would be happy to contribute it. > > Absolutely. > We need to make sure that the license for that k-stemmer is ASL > compatible of course. > > -Yonik > >
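Harry's advice to put the lowercase filter ahead of KStem comes down to order: each filter sees only what the previous one emitted. The toy chain below is hypothetical. A whitespace split and a tiny lowercase-only dictionary stand in for a dictionary-based stemmer, and the example does not reproduce KStem's internals. It only shows why order matters when the stemmer expects lowercase input. It uses Java 9+ collection factories.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Toy model of an analyzer chain: a tokenizer followed by ordered token filters. */
final class FilterOrderDemo {
    // Hypothetical lowercase-only dictionary standing in for a dictionary-based stemmer.
    private static final Map<String, String> DICTIONARY =
            Map.of("organizations", "organization", "libraries", "library");

    static final UnaryOperator<String> LOWERCASE = t -> t.toLowerCase(Locale.ROOT);
    static final UnaryOperator<String> DICT_STEM = t -> DICTIONARY.getOrDefault(t, t);

    /** Splits on whitespace, then applies each filter to every token in order. */
    static List<String> analyze(String text, List<UnaryOperator<String>> filters) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            String t = token;
            for (UnaryOperator<String> f : filters) {
                t = f.apply(t);
            }
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Libraries Organizations";
        System.out.println(analyze(text, List.of(LOWERCASE, DICT_STEM))); // [library, organization]
        System.out.println(analyze(text, List.of(DICT_STEM, LOWERCASE))); // [libraries, organizations]
    }
}
```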
RE: Solr live at Netflix
Otis, Take a look at KStem: http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi It's less aggressive than Porter. I modified the Lucene version to work with Solr, but don't know if it was adopted into the Solr source. Let me know if you are interested and I'll send you a jar file. Cheers! harry -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Thursday, October 04, 2007 10:36 AM To: solr-user@lucene.apache.org Subject: Re: Solr live at Netflix I'm curious about this one. I'm assuming Porter stemmer would stem Gamers and Gamera to the same stem (Game?). If the stems are different, which stemmer are you using? A smarter custom morphological stemmer? Thanks, Otis - Original Message From: Tom Hill <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, October 2, 2007 8:16:18 PM Subject: Re: Solr live at Netflix Nice! And there seem to be some improvements. For example, "Gamers" and "Gamera" no longer stem to the same word :-) Tom On 10/2/07, Walter Underwood <[EMAIL PROTECTED]> wrote: > > Here at Netflix, we switched over our site search to Solr two weeks ago. > We've seen zero problems with the server. We average 1.2 million > queries/day on a 250K item index. We're running four Solr servers > with simple round-robin HTTP load-sharing. > > This is all on 1.1. I've been too busy tuning to upgrade. > > Thanks everyone, this is a great piece of software. > > wunder > -- > Walter Underwood > Search Guy, Netflix > >
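Walter describes four Solr servers behind "simple round-robin HTTP load-sharing". The thread does not say whether that lives in a hardware balancer, a proxy, or client code. As a client-side illustration only, a thread-safe round-robin picker could look like the sketch below. The server URLs are placeholders, and the code uses Java 10+ collection methods.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Client-side round-robin selection over a fixed list of Solr base URLs (illustrative only). */
final class RoundRobin {
    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobin(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = List.copyOf(servers); // defensive, immutable copy
    }

    /** Returns the next server in turn; safe to call from many threads. */
    String next() {
        // floorMod keeps the index in range even after the counter wraps negative.
        int i = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(i);
    }

    public static void main(String[] args) {
        RoundRobin rr = new RoundRobin(List.of(
                "http://solr1:8983/solr", "http://solr2:8983/solr",
                "http://solr3:8983/solr", "http://solr4:8983/solr"));
        for (int q = 0; q < 6; q++) {
            System.out.println("query " + q + " -> " + rr.next());
        }
    }
}
```

Round-robin ignores server load and health. It works well when the servers are identical replicas of the same index, as in the setup described above.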
RE: Solr and KStem
Hi Piete, Good idea. Thanks. One other change that should probably be made is to change the package statement from org.oclc.solr.analysis to org.apache.solr.analysis. Thanks again. Cheers! harry -Original Message- From: Pieter Berkel [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 09, 2007 9:10 PM To: solr-user@lucene.apache.org Subject: Re: Solr and KStem Hi Harry, I re-discovered this thread last week and have made some minor changes to the code (removing deprecation warnings) so that it compiles with trunk. I think it would be quite useful to get this stemmer into Solr once all the legal / licensing issues are resolved. If there are no objections, I'll open a JIRA ticket and upload my changes so we can make sure we're all working with the same code. cheers, Piete On 11/09/2007, Wagner,Harry <[EMAIL PROTECTED]> wrote: > > Bill, > Currently it is a plug-in. Put the lower case filter ahead of kstem, > just as for porter (example below). You can use it with porter, but I > can't imagine why you would want to. At least not in the same analyzer. > Hope this helps. > > > > > words="stopwords.txt"/> > generateWordParts="1" generateNumberParts="1" catenateWords="1" > catenateNumbers="1" catenateAll="0"/> > > cacheSize="2"/> > > > > > synonyms="synonyms.txt" ignoreCase="true" expand="true"/> > words="stopwords.txt"/> > generateWordParts="1" generateNumberParts="1" catenateWords="0" > catenateNumbers="0" catenateAll="0"/> > > cacheSize="2"/> > > > > > Cheers... harry > >
RE: Solr and security
One effective method is to block access to the port Solr runs on. Force application access to come through the HTTP server, and let it map to the application server (e.g., as mod_jk does for Apache & Tomcat). Simple, but effective. Cheers! harry -Original Message- From: Cool Coder [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 24, 2007 12:17 PM To: solr-user@lucene.apache.org Subject: Solr and security Hi Group, As far as I know, to use Solr we need to deploy it as a server and communicate with it over HTTP. What about its security? That is, how can we ensure that it only accepts requests from a predefined set of users? Is there any way to specify this in Solr, or does Solr depend only on the web server's security model? I am not sure whether my interpretation is right. Your suggestion/input? - BR
Performance Recommendation
Where is a good place to look for some performance recommendations? We have a 2.4G index running on a server with 16G. Overall performance is very good, but the initial sort on an index is too slow. Any idea what, if anything, in solrconfig.xml would help that? Thanks... harry
RE: Performance Recommendation
Thanks Erik, that fixed the problem. Cheers! harry -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Thursday, October 25, 2007 4:41 PM To: solr-user@lucene.apache.org Subject: Re: Performance Recommendation On Oct 25, 2007, at 4:19 PM, Wagner,Harry wrote: > Where is a good place to look for some performance recommendations? We > have a 2.4G index running on server with 16G. Overall performance is > very good, but the initial sort on an index is too slow. Any idea > what, > if anything, in the solrConfig would help that? One option is to configure a warming query with sorts in solrconfig.xml. Check out the newSearcher feature: <http://wiki.apache.org/solr/SolrConfigXml#head-451a45dd507ac621e7349576b475e9ec37c0e7b9> Erik
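Erik's suggestion works because the first query that sorts on a field pays the cost of loading that field's sort data, and a warming query makes a new searcher pay that cost before users do. The actual fix is the query-listener configuration in solrconfig.xml: the newSearcher event Erik links to, plus the firstSearcher event for the searcher opened at startup. The hypothetical helper below is only a rough sketch of which requests are worth warming. It builds one sorted /select URL per sort field. The base URL, query, and field names are assumptions, and the URL encoding needs Java 10+.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: one sorted /select request per sort field, so that each
 * field's sort data is loaded before real users hit it.
 */
final class WarmingRequests {
    static List<String> build(String baseUrl, String query, List<String> sortFields) {
        List<String> urls = new ArrayList<>();
        for (String field : sortFields) {
            urls.add(baseUrl + "/select?q=" + enc(query) + "&sort=" + enc(field + " asc"));
        }
        return urls;
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Java 10+ overload
    }

    public static void main(String[] args) {
        // Placeholder base URL, query and field names.
        for (String url : build("http://localhost:8983/solr", "solr", List.of("title_sort", "pub_date"))) {
            System.out.println(url);
        }
    }
}
```

The same q/sort pairs are the ones that would go into the listener's warming queries. Sort direction should not matter for warming, since ascending and descending sorts on a field rely on the same underlying data.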
Analysis / Query problem
I have the following custom field defined for author names. After indexing the 2 documents below, the admin analysis tool looks right for field-name=au and field-value=Schröder, Jürgen. The highlight matching also seems right. However, if I search for au:Schröder, Jürgen using the admin tool, I do not get any hits (see below). This appears to be the case whenever there are 2 non-ASCII characters in the author name. Searching for au:Schröder, Jurgen finds both of these records. Any idea what is causing this? Thanks! harry 008053223 Schröder, Jürgen, Gottfried Benn : Poesie u. Sozialisation / Includes index. Benn, Gottfried,. Authors, German--20th century--Biography. 831.912 8 83 831 Special 137 008053223 schroder, jurgen$1935/gottfried benn poesie u sozialisation 1 01 Book ger 1978-12-31T23:59:59Z 317004446X 9783170044463 1 Schröder, Jurgen, Gottfried Benn : Poesie u. Sozialisation / Includes index. Benn, Gottfried,. Authors, German--20th century--Biography. 831.912 8 83 831 Special 137 008053223 schroder, jurgen$1935/gottfried benn poesie u sozialisation 1 01 Book ger 1978-12-31T23:59:59Z 317004446X 9783170044463 0 0 on 0 au:Schröder, Jürgen 10 2.2
RE: Analysis / Query problem
Thanks Erik. That helps. -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 07, 2007 11:36 AM To: solr-user@lucene.apache.org Subject: Re: Analysis / Query problem On Nov 7, 2007, at 10:26 AM, Wagner,Harry wrote: > I have the following custom field defined for author names. After > indexing the 2 documents below the admin analysis tool looks right > for field-name=au and field-value=Schröder, Jürgen The highlight > matching also seems right. However, if I search for au:Schröder, > Jürgen using the admin tool I do not get any hits (see below). > This appears to be the case whenever there are 2 non-ascii > characters in the author name. Searching for au:Schröder, Jurgen > finds both of these records. Any idea what is causing this? > > > > > > 0 > > 0 > > > > on > > 0 > > au:Schröder, Jürgen One thing to note is that query "au:Schröder, Jürgen" is being translated (try &debugQuery=true to see) to: au:schröder <default field>:jürgen Whether the two clauses are combined with AND or OR depends on how you have things configured, as does the default field. You probably want to use the ISOLatin1AccentFilterFactory to have the diacritics "flattened" to the ASCII characters they look like. Erik
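The Java below sketches two ideas from Erik's reply, under stated assumptions. First, quoting the name keeps "Jürgen" inside the au clause instead of letting it fall through to the default field. Second, accent folding lets "Jürgen" and "Jurgen" become the same term, provided the folding is applied at both index and query time. fold() uses java.text.Normalizer (NFD decomposition, then removal of combining marks). This is not how ISOLatin1AccentFilter is implemented, and unlike that filter it leaves characters with no decomposition, such as "ß" or "ø", unchanged. The base URL is a placeholder for a local default install, and the URL encoding needs Java 10+.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

/** Illustrative helpers for the accented author-name query (not Solr classes). */
final class AuthorQuery {
    /** NFD-decomposes accented letters, then removes the combining marks. */
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    /** Wraps the value in quotes so every word stays in the given field. */
    static String phraseQuery(String field, String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    /** Builds a /select URL with debugQuery=true so the parsed query is echoed back. */
    static String selectUrl(String baseUrl, String q) {
        return baseUrl + "/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8) + "&debugQuery=true";
    }

    public static void main(String[] args) {
        String name = "Schr\u00f6der, J\u00fcrgen"; // the author name, written with escapes
        System.out.println(fold(name));
        System.out.println(selectUrl("http://localhost:8983/solr", phraseQuery("au", name)));
    }
}
```

The debugQuery output is the quickest way to confirm that both words really end up in the au field once the query is quoted.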
Field separator for highlighting multi-value fields
Hi, The default field separator seems to be a '.' when highlighting multi-value fields. Can this be overridden in 1.2 to another character? Thanks! harry
RE: Field separator for highlighting multi-value fields
Hi Chris, Forget about this. I was doing something stupid. I should not send email before I've had coffee. Cheers... harry -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Saturday, December 08, 2007 12:30 AM To: solr-user@lucene.apache.org Subject: Re: Field separator for highlighting multi-value fields : The default field separator seems to be a '.' when highlighting : multi-value fields. Can this be overridden in 1.2 to another character? Default field separator where? in the response? can you give a specific example of what you are talking about? -Hoss
RE: better stemming engine than Porter?
Hi HH, Here's a note I sent Solr-dev a while back: --- I've implemented a Solr plug-in that wraps KStem for Solr use (someone else had already written a Lucene wrapper for it). KStem is considered to be more appropriate for library usage since it is much less aggressive than Porter (i.e., searches for organization do NOT match on organ!). If there is any interest in feeding this back into Solr I would be happy to contribute it. --- I believe there was interest in it, but I never opened an issue for it and I don't know if it was ever followed up on. I'd be happy to do that now. Can someone on the Solr-dev team point me in the right direction for opening an issue? Thanks... harry -Original Message- From: Hung Huynh [mailto:[EMAIL PROTECTED] Sent: Monday, April 21, 2008 11:59 AM To: solr-user@lucene.apache.org Subject: better stemming engine than Porter? I recall reading somewhere in the mailing-list archives that someone had developed a better stemming algorithm for Solr than the built-in Porter stemmer. Does anyone have a link to that stemming module? Thanks, HH
RE: better stemming engine than Porter?
Mathieu, It's not my Kstem. It was written by someone at UMass Amherst. More info here: http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi Someone else had already ported it to Lucene. I simply modified that wrapper to work with Solr. I'll open an issue for it so that it can (hopefully) be integrated into the project. Cheers... harry -Original Message- From: Mathieu Lecarme [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 22, 2008 3:57 AM To: solr-user@lucene.apache.org Subject: Re: better stemming engine than Porter? The Porter stemmer is not only aggressive, it is ugly, too. The generated code is old, not very object-oriented, and probably too slow. If your KStem compiles with Java 1.4, why don't you suggest it for Lucene core? M. Wagner,Harry wrote: > Hi HH, > Here's a note I sent Solr-dev a while back: > > --- > I've implemented a Solr plug-in that wraps KStem for Solr use (someone > else had already written a Lucene wrapper for it). KStem is considered > to be more appropriate for library usage since it is much less > aggressive than Porter (i.e., searches for organization do NOT match on > organ!). If there is any interest in feeding this back into Solr I would > be happy to contribute it. > --- > > I believe there was interest in it, but I never opened an issue for it > and I don't know if it was ever followed-up on. I'd be happy to do that > now. Can someone on the Solr-dev team point me in the right direction > for opening an issue? > > Thanks... harry > > > -Original Message- > From: Hung Huynh [mailto:[EMAIL PROTECTED] > Sent: Monday, April 21, 2008 11:59 AM > To: solr-user@lucene.apache.org > Subject: better stemming engine than Porter? > > I recall I've read some where in one of the mailing-list archives that > some > one had developed a better stemming algo for Solr than the built-in > Porter > stemming. Does anyone have link to that stemming module? > > Thanks, > > HH > > > > >
RE: better stemming engine than Porter?
Thanks Ryan. I just opened SOLR-546. Please let me know if I can provide further help. Cheers! h -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, April 21, 2008 2:33 PM To: solr-user@lucene.apache.org Subject: Re: better stemming engine than Porter? Hey- to create an issue, make an account on jira and post it... https://issues.apache.org/jira/browse/SOLR Give that a try and holler if you have trouble. ryan On Apr 21, 2008, at 12:31 PM, Wagner,Harry wrote: > Hi HH, > Here's a note I sent Solr-dev a while back: > > --- > I've implemented a Solr plug-in that wraps KStem for Solr use (someone > else had already written a Lucene wrapper for it). KStem is > considered > to be more appropriate for library usage since it is much less > aggressive than Porter (i.e., searches for organization do NOT match > on > organ!). If there is any interest in feeding this back into Solr I > would > be happy to contribute it. > --- > > I believe there was interest in it, but I never opened an issue for it > and I don't know if it was ever followed-up on. I'd be happy to do > that > now. Can someone on the Solr-dev team point me in the right direction > for opening an issue? > > Thanks... harry > > > -Original Message- > From: Hung Huynh [mailto:[EMAIL PROTECTED] > Sent: Monday, April 21, 2008 11:59 AM > To: solr-user@lucene.apache.org > Subject: better stemming engine than Porter? > > I recall I've read some where in one of the mailing-list archives that > some > one had developed a better stemming algo for Solr than the built-in > Porter > stemming. Does anyone have link to that stemming module? > > Thanks, > > HH > > >
RE: better stemming engine than Porter?
Hi Jay, I did not do a timing comparison either, but any change in performance after switching to Kstem was not noticeable. Cheers... h -Original Message- From: Jay [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 22, 2008 12:26 PM To: solr-user@lucene.apache.org Subject: Re: better stemming engine than Porter? Hi Wagner, Thanks for the introduction to KStem! I quickly scanned the original paper on KStem by Robert Krovetz but could not find any timing comparison between KStem and the Porter stemmer. Based on your experience in your application, how fast or slow is KStem compared to Porter? Jay Wagner,Harry wrote: > Mathieu, > It's not my Kstem. It was written by someone at Umass, Amherst. More info > here: > http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi > > Someone else had already ported it to Lucene. I simply modified that wrapper > to work with Solr. I'll open an issue for it so that it can (hopefully) be > integrated into the project. > > Cheers... harry > > -Original Message- > From: Mathieu Lecarme [mailto:[EMAIL PROTECTED] > Sent: Tuesday, April 22, 2008 3:57 AM > To: solr-user@lucene.apache.org > Subject: Re: better stemming engine than Porter? > > Porter stemmer is not only agressive, it is ugly, too. The generated > code is too old, too few object centric and should be too slow. > If your kstem compile with java 1.4, why don't you suggest it to lucene > core? > > M. > > Wagner,Harry wrote: >> Hi HH, >> Here's a note I sent Solr-dev a while back: >> >> --- >> I've implemented a Solr plug-in that wraps KStem for Solr use (someone >> else had already written a Lucene wrapper for it). KStem is considered >> to be more appropriate for library usage since it is much less >> aggressive than Porter (i.e., searches for organization do NOT match on >> organ!). If there is any interest in feeding this back into Solr I would >> be happy to contribute it.
>> --- >> >> I believe there was interest in it, but I never opened an issue for it >> and I don't know if it was ever followed-up on. I'd be happy to do that >> now. Can someone on the Solr-dev team point me in the right direction >> for opening an issue? >> >> Thanks... harry >> >> >> -Original Message- >> From: Hung Huynh [mailto:[EMAIL PROTECTED] >> Sent: Monday, April 21, 2008 11:59 AM >> To: solr-user@lucene.apache.org >> Subject: better stemming engine than Porter? >> >> I recall I've read some where in one of the mailing-list archives that >> some >> one had developed a better stemming algo for Solr than the built-in >> Porter >> stemming. Does anyone have link to that stemming module? >> >> Thanks, >> >> HH >> >> >> >> >> > > >
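Jay's question about relative speed gets no measured answer in the thread. For a first rough number, a naive timing loop like the one below could compare two stemmers on the same word list. The two stemmers here are placeholders, not Porter or KStem. For trustworthy results, a dedicated harness such as JMH is the better tool, because JIT warm-up and dead-code elimination easily distort hand-rolled loops. The code uses Java 9+ collection factories.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/** Naive timing loop for comparing stemmers; a rough indication only, not a benchmark. */
final class StemTiming {
    // Written to after timing so the JIT cannot treat the stemming work as dead code.
    private static volatile int blackhole;

    /** Average nanoseconds per stem call over the timed passes (after one warm-up pass). */
    static double nanosPerCall(UnaryOperator<String> stemmer, List<String> words, int passes) {
        if (words.isEmpty() || passes < 1) {
            throw new IllegalArgumentException("need words and at least one pass");
        }
        int sink = 0;
        for (String w : words) {
            sink += stemmer.apply(w).length(); // one untimed warm-up pass
        }
        long start = System.nanoTime();
        for (int p = 0; p < passes; p++) {
            for (String w : words) {
                sink += stemmer.apply(w).length();
            }
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / ((long) passes * words.size());
    }

    public static void main(String[] args) {
        List<String> words = List.of("organizations", "libraries", "searching", "gamers", "gamera");
        // Placeholders: substitute the real Porter and KStem calls here.
        UnaryOperator<String> stemA = w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
        UnaryOperator<String> stemB = w -> w.length() > 4 ? w.substring(0, 4) : w;
        System.out.printf("A: %.1f ns/call%n", nanosPerCall(stemA, words, 100_000));
        System.out.printf("B: %.1f ns/call%n", nanosPerCall(stemB, words, 100_000));
    }
}
```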
Too many open files
I'm getting this error with Solr 1.2 while trying to load a large database. Is there a workaround?
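The thread has no reply to this question. "Too many open files" means the process has hit its per-process limit on open file descriptors, which on Unix-like systems is typically raised with ulimit -n or the system's limits configuration. As a starting point for diagnosis, the sketch below reports how close a JVM is to that limit. It relies on com.sun.management.UnixOperatingSystemMXBean, which is specific to HotSpot/OpenJDK on Unix-like systems. The numbers describe whichever JVM runs the code, so to watch Solr itself the same attributes would have to be read from Solr's JVM, for example over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Reports this JVM's open and maximum file descriptors where the platform exposes them. */
final class OpenFiles {
    /** Returns {open, max}, or null when the Unix-specific bean is unavailable (e.g. on Windows). */
    static long[] descriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = descriptorCounts();
        if (counts == null) {
            System.out.println("File descriptor counts are not available on this platform/JVM.");
        } else {
            System.out.println("open=" + counts[0] + " max=" + counts[1]);
        }
    }
}
```

If the open count climbs steadily toward the maximum during the load, raising the limit is the direct fix. This sketch only measures the problem and does not change anything.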