Re: Federated Search
- While browsing the web I came across an application called the Lucene Web Service: what do you think of it? (Its goal seems to be precisely to query multiple indices, so it would be the thing I'm searching for; but considering the scale of this project, I think I'd prefer to base my work on a project whose long-term activity is guaranteed, such as Solr.)

Lucene Web Service still exists: http://www.opensearch.org/Home You can add a specific tag in your own namespace if it does not already exist.

M.
Re: Fwd: Favouring recent matches
1) The document boost is periodically recomputed with age (or log(age)) as a factor. This is likely to be slow.
2) Use your own Similarity implementation: DefaultSimilarity with a dynamic document boost. The map of document id -> age (or document id -> date) should be cached with a Map, ehCache, whirlycache, OSCache or a BDB database. Use expiring cache entries, and be careful: warm-up (i.e. populating the cache) is likely to be slow.

M.

James Brady wrote:

Sorry, I really should have directly explained what I was looking to do: theserverside.com gives higher scores to documents that were added more recently. I'd like to do the same, without the date boost being too overbearing (or unnoticeable...) - some ideas on how to approach this would be great.

James

Begin forwarded message:

From: James Brady <[EMAIL PROTECTED]>
Date: 8 March 2008 19:41:56 PST
To: solr-user@lucene.apache.org
Subject: Favouring recent matches

Hello all,

In Lucene in Action (replicated here: http://www.theserverside.com/tt/articles/article.tss?l=ILoveLucene), the theserverside.com team say "The date boost has been really important for us". I'm looking for some advice on the best way to actually implement this - the only way I can see to do it right now is to set a boost for documents at index time that increases linearly over time. However, I'm wary of skewing Lucene's scoring in some strange way, or interfering with the document boosts I'm setting for other reasons. Any suggestions?

Thanks
James
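As a rough illustration of option 2 in the reply above, here is a minimal Python sketch of a bounded, log(age)-damped boost combined with an expiring id -> date cache. It does not use the Lucene or Solr API; `recency_boost`, `ExpiringAgeCache`, `load_date`, the `weight` parameter and the one-hour expiry are all hypothetical names and values chosen for the example.

```python
import math
import time

SECONDS_PER_DAY = 86400.0


def recency_boost(age_days, weight=1.0):
    """Return a multiplier in (1, 1 + weight] that shrinks with log(age).

    A brand-new document gets 1 + weight; old documents drift towards 1,
    so the boost never dominates or vanishes completely.
    """
    return 1.0 + weight / (1.0 + math.log1p(max(age_days, 0.0)))


class ExpiringAgeCache:
    """Cache of doc id -> indexing timestamp with per-entry expiration."""

    def __init__(self, load_date, ttl_seconds=3600.0, clock=time.time):
        # load_date is a hypothetical callable: doc_id -> epoch seconds.
        self._load_date = load_date
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # doc_id -> (indexed_at, cached_at)

    def age_days(self, doc_id):
        now = self._clock()
        entry = self._entries.get(doc_id)
        if entry is None or now - entry[1] > self._ttl:
            # Cache miss or expired entry: reload (this is the slow path).
            entry = (self._load_date(doc_id), now)
            self._entries[doc_id] = entry
        return (now - entry[0]) / SECONDS_PER_DAY


def boosted_score(base_score, doc_id, cache, weight=1.0):
    """Multiply the relevance score by the recency boost for this document."""
    return base_score * recency_boost(cache.age_days(doc_id), weight)
```

Because the multiplier stays between 1 and 1 + weight, tuning `weight` is one way to keep the date boost from being either overbearing or unnoticeable, which was the concern in the original question.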
Re: Human Powered Search Module
Sushan Rungta wrote:

Hello Everybody,

I am a newbie in Lucene and I am from India, currently working on a search module for our classified website, clickindia.com. I have implemented the basic functionality of Solr/Lucene and am pretty happy with the results. Search in India has its own share of nuances. 'Maruti' is spelt as 'Maruthi' in most of South India. People often spell 'Naukri' as 'Naukari'; a loan request would simply appear in the query as 'need money'. These and many more such intricacies are typical of Indian users and require a special kind of module. Is there any ready-made solution for this? Can I get access to a list of words such as those mentioned above, as used in India, so that I could implement it?

Synonyms are easy to handle, but semantic analysis is a bit trickier. Perhaps Weka can help you: http://weka.sourceforge.net

M.
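To make the "synonyms are easy" part concrete, here is a minimal Python sketch that maps spelling variants to one canonical form before the query is sent to the index. The `VARIANTS` table and `normalize_query` are hypothetical; a real table would have to be built from your own query logs or data. It only rewrites single tokens, so a phrase such as 'need money' meaning a loan request is outside its scope.

```python
# Hypothetical table of regional spelling variants -> canonical spelling.
VARIANTS = {
    "maruthi": "maruti",
    "naukari": "naukri",
}


def normalize_query(query, variants=VARIANTS):
    """Lower-case the query and replace each known variant token."""
    tokens = query.lower().split()
    return " ".join(variants.get(token, token) for token in tokens)
```

The same table has to be applied at index time as well (or the variants indexed alongside the canonical form), otherwise documents that contain 'Maruthi' would no longer match.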
Re: filtering search using regex
hi, I have a question ... I need to be able to filter a search using a regex. I cannot use facets as the filtering is pretty complex (but easy to perform using a regex). For instance I have stored in the field ID the value 12G and I want to basically filter out all the results except those that are > 12 with G, so for instance 14G will match but 8G and 14B would not. Using a regex this is simply "[1-9]+[3-9]G" .. I am wondering what the right approach is to tackle such a situation .. thanks.

A regex match is only useful when you first select a prefix, which is a basic Lucene feature: it puts the pointer just at the first term beginning with "toto". Your query doesn't have any prefix. What happens if you split your data into two fields, "12" and "G", "14" and "B", or, better, if it's a number, "12G" can be indexed as "1200"?

M.
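The two-field idea from the reply can be sketched in plain Python as follows. This is only an illustration of the logic, not Lucene or Solr code; `split_id`, `matches` and the assumed ID layout (digits followed by letters) are hypothetical.

```python
import re

# Assumed ID layout: one or more digits followed by one or more letters.
ID_PATTERN = re.compile(r"^(\d+)([A-Za-z]+)$")


def split_id(value):
    """Split an ID such as "12G" into its numeric part and its suffix."""
    match = ID_PATTERN.match(value)
    if match is None:
        raise ValueError("unexpected ID format: %r" % value)
    return int(match.group(1)), match.group(2)


def matches(value, min_exclusive=12, suffix="G"):
    """True when the suffix is equal and the number is strictly greater."""
    number, letter = split_id(value)
    return letter == suffix and number > min_exclusive


ids = ["8G", "12G", "14G", "14B", "20G"]
selected = [value for value in ids if matches(value)]
```

Here `selected` contains "14G" and "20G". Note that the pattern quoted in the question, "[1-9]+[3-9]G", does not match "20G", which is one more reason to compare the numeric part as a number (for example with a range query on a separate numeric field) rather than as text.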
Re: better stemming engine than Porter?
The Porter stemmer is not only aggressive, it is ugly, too. The generated code is too old, not object-oriented enough, and probably slow. If your KStem compiles with Java 1.4, why don't you propose it for Lucene core?

M.

Wagner, Harry wrote:

Hi HH,

Here's a note I sent Solr-dev a while back:

---
I've implemented a Solr plug-in that wraps KStem for Solr use (someone else had already written a Lucene wrapper for it). KStem is considered to be more appropriate for library usage since it is much less aggressive than Porter (i.e., searches for organization do NOT match on organ!). If there is any interest in feeding this back into Solr I would be happy to contribute it.
---

I believe there was interest in it, but I never opened an issue for it and I don't know if it was ever followed up on. I'd be happy to do that now. Can someone on the Solr-dev team point me in the right direction for opening an issue?

Thanks... harry

-Original Message-
From: Hung Huynh [mailto:[EMAIL PROTECTED]
Sent: Monday, April 21, 2008 11:59 AM
To: solr-user@lucene.apache.org
Subject: better stemming engine than Porter?

I recall reading somewhere in one of the mailing-list archives that someone had developed a better stemming algorithm for Solr than the built-in Porter stemmer. Does anyone have a link to that stemming module?

Thanks,
HH