Re: Federated Search

2008-02-29 Thread Mathieu Lecarme



 - While browsing the web I came across an application called the Lucene
Web Service: what do you think of it? (Its goal seems to be precisely to
query multiple indices, so it would be the thing I'm searching for; but
considering the scale of this project, I think I'd prefer to base my work
on a project whose long-term activity is guaranteed, such as Solr.)
  

Lucene Web Service still exists:
http://www.opensearch.org/Home

You can define specific tags in your own namespace if they do not already exist.

M.


Re: Fwd: Favouring recent matches

2008-03-10 Thread Mathieu Lecarme
1) Periodically recompute the document boost with age (or log(age)) as 
a factor. This is likely to be slow.
2) Use your own Similarity implementation: use the DefaultSimilarity 
with a dynamic document boost. The mapping document id -> age (or 
document id -> date) should be cached with a Map, ehCache, whirlycache, 
oscache or a bdb base. Use expiration caching, and be careful: warm-up 
(i.e. populating the cache) is likely to be slow.
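
A minimal sketch of the second idea, in Python rather than as a real
Similarity subclass, and only under stated assumptions: the names
recency_boost, ExpiringAgeCache and boosted_score, the scale_days
parameter, and the load_timestamp callback are all hypothetical and not
part of Lucene or Solr. It shows a log(age) damping factor combined with
an expiring cache of document id -> indexing date.

```python
import math
import time


def recency_boost(age_days, scale_days=30.0):
    """Return a multiplier in (0, 1] that shrinks slowly as a document ages."""
    # log1p keeps the factor at exactly 1.0 for a brand-new document.
    return 1.0 / (1.0 + math.log1p(max(age_days, 0.0) / scale_days))


class ExpiringAgeCache:
    """Caches document id -> indexing timestamp and reloads stale entries."""

    def __init__(self, load_timestamp, ttl_seconds=3600.0, clock=time.time):
        self._load = load_timestamp  # hypothetical callback: doc id -> epoch seconds
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # doc id -> (timestamp, expiry time)

    def age_days(self, doc_id):
        now = self._clock()
        entry = self._entries.get(doc_id)
        if entry is None or entry[1] <= now:
            # Cache miss or expired entry: this is the slow path to warm up.
            entry = (self._load(doc_id), now + self._ttl)
            self._entries[doc_id] = entry
        return (now - entry[0]) / 86400.0


def boosted_score(raw_score, doc_id, cache):
    """Scale a relevance score by the recency factor of the document."""
    return raw_score * recency_boost(cache.age_days(doc_id))
```

Because the factor is logarithmic, the boost stays noticeable for recent
documents without overwhelming the text score of older ones; scale_days
is the knob to tune.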


M.
James Brady wrote:
Sorry, I really should have directly explained what I was looking to 
do: theserverside.com gives higher scores to documents that were added 
more recently.


I'd like to do the same, without the date boost being too overbearing 
(or unnoticeable...) - some ideas on how to approach this would be great.


James

Begin forwarded message:


From: James Brady <[EMAIL PROTECTED]>
Date: 8 March 2008 19:41:56 PST
To: solr-user@lucene.apache.org
Subject: Favouring recent matches

Hello all,
In Lucene in Action (replicated here: 
http://www.theserverside.com/tt/articles/article.tss?l=ILoveLucene), 
the theserverside.com team say "The date boost has been really important 
for us".


I'm looking for some advice on the best way to actually implement 
this - the only way I can see to do it right now is to set a boost 
for documents at index time that increases linearly over time. 
However, I'm wary of skewing Lucene's scoring in some strange way, or 
interfering with the document boosts I'm setting for other reasons.


Any suggestions?

Thanks
James







Re: Human Powered Search Module

2008-04-10 Thread Mathieu Lecarme

Sushan Rungta wrote:

Hello Everybody,

I am a newbie in Lucene and I am from India, currently working on the 
search module for our classified website, clickindia.com. I have 
implemented the basic functionality of Solr/Lucene and am pretty happy 
with the results.


Search in India has its own share of nuances. 'Maruti' is spelt as 
'Maruthi' in most of South India. People often spell 'Naukri' as 
'Naukari'; a loan request would simply be expressed in the query as 
'need money'. These and many more such intricacies are typical of 
Indian users and require a special kind of module.


Is there a ready-made solution for this? Can I get access to a list of 
words like those mentioned above, as used in India, so that I could 
implement it?

Synonyms are easy to handle, but semantic analysis is a bit trickier. 
Weka may help you: http://weka.sourceforge.net
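
To illustrate the "synonyms are easy" part, here is a minimal sketch in
Python that folds spelling variants onto one canonical form. The table
VARIANTS and the function normalize_terms are hypothetical names; the
two word pairs are the ones from the question. In Solr the same idea
would normally live in a synonym filter in the analyzer rather than in
application code.

```python
# Variant -> canonical spelling; both pairs come from the question above.
VARIANTS = {
    "maruthi": "maruti",
    "naukari": "naukri",
}


def normalize_terms(text, variants=VARIANTS):
    """Lowercase, split on whitespace, and map known variants to one form."""
    return [variants.get(token, token) for token in text.lower().split()]
```

Applying the same normalization at index time and at query time makes
'Maruthi' and 'Maruti' hit the same term. A phrase such as 'need money'
standing for a loan request is a matter of intent rather than spelling,
which is where the semantic analysis mentioned above comes in.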


M.


Re: filtering search using regex

2008-04-12 Thread Mathieu Lecarme

hi,

I have a question ... I need to be able to filter a search using a 
regex. I cannot use facets as the filtering is pretty complex (but easy 
to perform using a regex).
For instance I have stored in the field ID the value 12G and I want to
basically keep only the results that are > 12 with G, so for instance 
14G will match but 8G and 14B would not. Using a regex this is simply
"[1-9]+[3-9]G" ..
I am wondering what the right approach is to tackle such a situation ..


thanks.

A regex match is only useful when you first select a prefix, which is a 
basic Lucene feature: it puts the pointer right at the first term 
beginning with, say, "toto".

Your query doesn't have any prefix.
What happens if you split your data into two fields, "12" and "G", "14" 
and "B", or, better, if it is a number, index "12G" as "1200"?
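
The two-field split suggested above can be sketched as follows, in 
Python and outside of Lucene; split_id and matches are hypothetical 
helper names, and the ID shape (digits followed by letters) is an 
assumption based on the examples in the question. Comparing the numeric 
part as a number also covers values such as 20G, which the regex 
"[1-9]+[3-9]G" does not match.

```python
import re

# Assumed ID shape: one or more digits followed by one or more letters.
_ID_PATTERN = re.compile(r"(\d+)([A-Za-z]+)")


def split_id(value):
    """Split an ID such as '12G' into (12, 'G'); raise ValueError otherwise."""
    match = _ID_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"unexpected ID format: {value!r}")
    return int(match.group(1)), match.group(2)


def matches(value, min_exclusive=12, letter="G"):
    """True when the numeric part exceeds min_exclusive and the letter matches."""
    number, suffix = split_id(value)
    return suffix == letter and number > min_exclusive
```

With the number and the letter indexed as separate fields, the filter 
becomes a numeric range on one field plus an exact match on the other, 
and no regex has to scan every term.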


M.


Re: better stemming engine than Porter?

2008-04-22 Thread Mathieu Lecarme
The Porter stemmer is not only aggressive, it is ugly, too. The generated 
code is too old, not object-oriented enough, and is probably too slow.
If your KStem compiles with Java 1.4, why don't you suggest it for Lucene 
core?


M.

Wagner,Harry wrote:

Hi HH,
Here's a note I sent Solr-dev a while back:

---
I've implemented a Solr plug-in that wraps KStem for Solr use (someone
else had already written a Lucene wrapper for it).  KStem is considered
to be more appropriate for library usage since it is much less
aggressive than Porter (i.e., searches for organization do NOT match on
organ!). If there is any interest in feeding this back into Solr I would
be happy to contribute it.
---

I believe there was interest in it, but I never opened an issue for it
and I don't know if it was ever followed-up on. I'd be happy to do that
now. Can someone on the Solr-dev team point me in the right direction
for opening an issue?

Thanks... harry


-----Original Message-----
From: Hung Huynh [mailto:[EMAIL PROTECTED] 
Sent: Monday, April 21, 2008 11:59 AM

To: solr-user@lucene.apache.org
Subject: better stemming engine than Porter?

I recall I've read somewhere in one of the mailing-list archives that
someone had developed a better stemming algorithm for Solr than the
built-in Porter stemming. Does anyone have a link to that stemming module?


Thanks,

HH