I haven't been able to work on it because of some other commitments. The MemoryIndex approach seems promising. The only thing I will have to check is the memory requirement, as I have close to 2 million documents.
Will let you know if I can make it work. Thanks a lot!

--
Varun Gupta

On Sat, Nov 6, 2010 at 3:48 AM, Steven A Rowe <sar...@syr.edu> wrote:

> Hi Varun,
>
> On 10/26/2010 at 11:26 PM, Varun Gupta wrote:
> > I will try to implement the two filters suggested by Steven and see how
> > the performance matches up.
>
> Have you made any progress?
>
> I was thinking about your use case, and it occurred to me that you could
> get what you want by reversing the problem, using Lucene's MemoryIndex
> <http://lucene.apache.org/java/3_0_2/api/contrib-memory/org/apache/lucene/index/memory/MemoryIndex.html>.
> (As far as I can tell, this functionality -- i.e. standing queries a.k.a.
> routing a.k.a. filtering -- is not present in Solr.)
>
> You can load your query (as a document) into a MemoryIndex, and then use
> each of your documents to query against it, something like (untested!):
>
> // Each stored "document" is parsed into an AND query over its words.
> Map<String,Query> documents = new HashMap<String,Query>();
> Analyzer analyzer = new WhitespaceAnalyzer();
> QueryParser parser = new QueryParser(Version.LUCENE_30, "content", analyzer);
> parser.setDefaultOperator(QueryParser.Operator.AND);
> documents.put("ID001", parser.parse("nokia n95"));
> documents.put("ID002", parser.parse("GPS"));
> documents.put("ID003", parser.parse("android"));
> documents.put("ID004", parser.parse("samsung"));
> documents.put("ID005", parser.parse("samsung android"));
> documents.put("ID006", parser.parse("nokia android"));
> documents.put("ID007", parser.parse("mobile with GPS"));
>
> // The user's search text is indexed as the single in-memory document.
> MemoryIndex index = new MemoryIndex();
> index.addField("content", "samsung android GPS", analyzer);
>
> for (Map.Entry<String,Query> entry : documents.entrySet()) {
>   Query query = entry.getValue();
>   if (index.search(query) > 0.0f) {
>     String docId = entry.getKey();
>     // Do something with the hits here ...
>   }
> }
>
> In the above example, the documents "samsung", "GPS", "android" and
> "samsung android" would be hits, and the other documents would not be, just
> as you wanted.
>
> MemoryIndex is designed to be very fast for this kind of usage, so even
> hundreds of thousands of documents should be feasible.
>
> Steve
>
> > -----Original Message-----
> > From: Varun Gupta [mailto:varun.vgu...@gmail.com]
> > Sent: Tuesday, October 26, 2010 11:26 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: How do I this in Solr?
> >
> > Thanks everybody for the inputs.
> >
> > Looks like Steven's solution is the closest one but will lead to
> > performance issues when the query string has many terms.
> >
> > I will try to implement the two filters suggested by Steven and see how
> > the performance matches up.
> >
> > --
> > Thanks
> > Varun Gupta
> >
> >
> > On Wed, Oct 27, 2010 at 8:04 AM, scott chu (朱炎詹)
> > <scott....@udngroup.com> wrote:
> >
> > > I think you have to write a "yet exact match" handler yourself (I say
> > > "yet" because it's not quite the exact match we normally know). Steve's
> > > answer is quite near your request. You can do further work based on his
> > > solution.
> > >
> > > As the last step, I'd suggest you strip all blanks from the query string
> > > and from each query result, and return only those results that have the
> > > same string length as the query string.
> > >
> > > For example, given:
> > > *query string = "Samsung with GPS"
> > > *query results:
> > > result 1 = "Samsung has lots of mobile with GPS"
> > > result 2 = "with GPS Samsung"
> > > result 3 = "GPS mobile with vendors, such as Sony, Samsung"
> > >
> > > they become:
> > > *query string = "SamsungwithGPS" (length = 14)
> > > *query results:
> > > result 1 = "SamsunghaslotsofmobilewithGPS" (length = 29)
> > > result 2 = "withGPSSamsung" (length = 14)
> > > result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length = 39)
> > >
> > > so result 2 matches your request.
> > >
> > > In this way, you can avoid the work of handling case sensitivity and
> > > word-order rearrangement. Furthermore, you can refine this, e.g. by
> > > removing other whitespace characters, etc.
> > >
> > > Scott @ Taiwan
> > >
> > >
> > > ----- Original Message ----- From: "Varun Gupta" <varun.vgu...@gmail.com>
> > > To: <solr-user@lucene.apache.org>
> > > Sent: Tuesday, October 26, 2010 9:07 PM
> > > Subject: How do I this in Solr?
> > >
> > >
> > >> Hi,
> > >>
> > >> I have a lot of small documents (each containing 1 to 15 words) indexed
> > >> in Solr. For the search query, I want the search results to contain only
> > >> those documents that satisfy this criterion: "All of the words of the
> > >> search result document are present in the search query".
> > >>
> > >> For example:
> > >> If I have the following documents indexed: "nokia n95", "GPS", "android",
> > >> "samsung", "samsung android", "nokia android", "mobile with GPS"
> > >>
> > >> If I search with the text "samsung android GPS", search results should
> > >> only contain "samsung", "GPS", "android" and "samsung android".
> > >>
> > >> Is there a way to do this in Solr?
> > >>
> > >> --
> > >> Thanks
> > >> Varun Gupta
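
The criterion in the original question (every word of a stored document must also appear in the search text) amounts to a set-containment test. As a minimal sketch of that idea, and not of the MemoryIndex approach itself, it could look something like this in plain Java. It assumes simple lower-cased whitespace tokenization rather than Solr's analysis chain, and the class and method names (`SubsetMatch`, `tokens`, `matches`, `search`) are hypothetical, not part of Lucene or Solr:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class SubsetMatch {

    // Lower-case the text and split it on whitespace. This tokenization is an
    // assumption for illustration; it is not Solr's analysis chain.
    public static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        for (String t : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    // True when every word of the document also occurs in the query.
    public static boolean matches(String query, String document) {
        return tokens(query).containsAll(tokens(document));
    }

    // Linear scan over all documents; returns the ids of the matching ones
    // in the iteration order of the map.
    public static List<String> search(String query, Map<String, String> docs) {
        Set<String> queryTokens = tokens(query);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            if (queryTokens.containsAll(tokens(e.getValue()))) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample documents taken from the original question.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("ID001", "nokia n95");
        docs.put("ID002", "GPS");
        docs.put("ID003", "android");
        docs.put("ID004", "samsung");
        docs.put("ID005", "samsung android");
        docs.put("ID006", "nokia android");
        docs.put("ID007", "mobile with GPS");

        System.out.println(search("samsung android GPS", docs));
    }
}
```

With the seven sample documents from the question, searching for "samsung android GPS" selects ID002 through ID005 ("GPS", "android", "samsung", "samsung android"). The scan visits every document for every search, so at around 2 million documents it only illustrates the matching rule; it says nothing about how MemoryIndex or Solr would perform.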