Re: How do I this in Solr?

Varun Gupta Mon, 08 Nov 2010 00:58:31 -0800

I haven't been able to work on it because of some other commitments. The
MemoryIndex approach seems promising. The only thing I will have to check is
the memory requirement, as I have close to 2 million documents.


Will let you know if I can make it work.

Thanks a lot!

--
Varun Gupta

On Sat, Nov 6, 2010 at 3:48 AM, Steven A Rowe <sar...@syr.edu> wrote:

> Hi Varun,
>
> On 10/26/2010 at 11:26 PM, Varun Gupta wrote:
> > I will try to implement the two filters suggested by Steven and see how
> > the performance matches up.
>
> Have you made any progress?
>
> I was thinking about your use case, and it occurred to me that you could
> get what you want by reversing the problem, using Lucene's MemoryIndex
> <http://lucene.apache.org/java/3_0_2/api/contrib-memory/org/apache/lucene/index/memory/MemoryIndex.html>.
> (As far as I can tell, this functionality -- i.e. standing queries a.k.a.
> routing a.k.a. filtering -- is not present in Solr.)
>
> You can load your query (as a document) into a MemoryIndex, and then use
> each of your documents to query against it, something like (untested!):
>
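>        import java.util.HashMap;
>        import java.util.Map;
>        import org.apache.lucene.analysis.Analyzer;
>        import org.apache.lucene.analysis.WhitespaceAnalyzer;
>        import org.apache.lucene.index.memory.MemoryIndex;
>        import org.apache.lucene.queryParser.QueryParser;
>        import org.apache.lucene.search.Query;
>        import org.apache.lucene.util.Version;
>
>        // Each stored document becomes an AND query, keyed by its ID.
>        Map<String,Query> documents = new HashMap<String,Query>();
>        Analyzer analyzer = new WhitespaceAnalyzer();
>        // Lucene 3.0.x requires a Version argument here.
>        QueryParser parser = new QueryParser(Version.LUCENE_30, "content", analyzer);
>        parser.setDefaultOperator(QueryParser.Operator.AND);
>        documents.put("ID001", parser.parse("nokia n95"));
>        documents.put("ID002", parser.parse("GPS"));
>        documents.put("ID003", parser.parse("android"));
>        documents.put("ID004", parser.parse("samsung"));
>        documents.put("ID005", parser.parse("samsung android"));
>        documents.put("ID006", parser.parse("nokia android"));
>        documents.put("ID007", parser.parse("mobile with GPS"));
>
>        // The user's search text is indexed as the single in-memory document.
>        MemoryIndex index = new MemoryIndex();
>        index.addField("content", "samsung android GPS", analyzer);
>
>        // A score above zero means every term of the stored document
>        // occurs in the search text.
>        for (Map.Entry<String,Query> entry : documents.entrySet()) {
>          Query query = entry.getValue();
>          if (index.search(query) > 0.0f) {
>            String docId = entry.getKey();
>            // Do something with the hits here ...
>          }
>        }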
>
> In the above example, the documents "samsung", "GPS", "android" and
> "samsung android" would be hits, and the other documents would not be, just
> as you wanted.
>
> MemoryIndex is designed to be very fast for this kind of usage, so even
> hundreds of thousands of documents should be feasible.
>
> Steve
>
> > -----Original Message-----
> > From: Varun Gupta [mailto:varun.vgu...@gmail.com]
> > Sent: Tuesday, October 26, 2010 11:26 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: How do I this in Solr?
> >
> > Thanks everybody for the inputs.
> >
> > Looks like Steven's solution is the closest one but will lead to
> > performance issues when the query string has many terms.
> >
> > I will try to implement the two filters suggested by Steven and see how
> > the performance matches up.
> >
> > --
> > Thanks
> > Varun Gupta
> >
> >
> > On Wed, Oct 27, 2010 at 8:04 AM, scott chu (朱炎詹)
> > <scott....@udngroup.com> wrote:
> >
> > > I think you will have to write a "not quite exact match" handler
> > > yourself (I say "not quite" because it is not the exact match we
> > > normally mean). Steve's answer is quite close to your request. You can
> > > do further work based on his solution.
> > >
> > > As a last step, I suggest you strip all blanks from the query string
> > > and from each query result, then return only those results whose
> > > length equals the query string's.
> > >
> > > For example, given:
> > > *query string = "Samsung with GPS"
> > > *query results:
> > > result 1 = "Samsung has lots of mobile with GPS"
> > > result 2 = "with GPS Samsung"
> > > result 3 = "GPS mobile with vendors, such as Sony, Samsung"
> > >
> > > they become:
> > > *query string = "SamsungwithGPS" (length = 14)
> > > *query results:
> > > result 1 = "SamsunghaslotsofmobilewithGPS" (length = 29)
> > > result 2 = "withGPSSamsung" (length = 14)
> > > result 3 = "GPSmobilewithvendors,suchasSony,Samsung" (length = 39)
> > >
> > > so result 2 matches your request.
> > >
> > > This way, you avoid a lot of work handling case sensitivity and word
> > > reordering. You can refine it further, for example by also removing
> > > other whitespace characters.
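
The length comparison described above could be sketched in plain Java roughly as follows. This is only an illustration: the class `LengthFilter` and its methods are hypothetical names, not part of Solr or Lucene, and the sketch assumes it runs as a post-filter over results the search has already returned. Equal length alone does not show that two strings contain the same words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical post-filter; class and method names are illustrative only.
class LengthFilter {

    // Remove every whitespace character (spaces, tabs, newlines).
    static String stripWhitespace(String text) {
        return text.replaceAll("\\s+", "");
    }

    // Keep the results whose stripped length equals the stripped query length.
    static List<String> sameLengthAs(String query, List<String> results) {
        int target = stripWhitespace(query).length();
        List<String> kept = new ArrayList<String>();
        for (String result : results) {
            if (stripWhitespace(result).length() == target) {
                kept.add(result);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> results = Arrays.asList(
            "Samsung has lots of mobile with GPS",
            "with GPS Samsung",
            "GPS mobile with vendors, such as Sony, Samsung");
        System.out.println(sameLengthAs("Samsung with GPS", results));
    }
}
```

With the three sample results from the example, only "with GPS Samsung" is kept.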
> > >
> > > Scott @ Taiwan
> > >
> > >
> > > ----- Original Message ----- From: "Varun Gupta"
> > <varun.vgu...@gmail.com>
> > >
> > > To: <solr-user@lucene.apache.org>
> > > Sent: Tuesday, October 26, 2010 9:07 PM
> > >
> > > Subject: How do I this in Solr?
> > >
> > >
> > >  Hi,
> > >>
> > >> I have a lot of small documents (each containing 1 to 15 words) indexed
> > >> in Solr. For the search query, I want the search results to contain only
> > >> those documents that satisfy this criterion: "All of the words of the
> > >> search result document are present in the search query".
> > >>
> > >> For example:
> > >> If I have the following documents indexed: "nokia n95", "GPS",
> > >> "android", "samsung", "samsung android", "nokia android", "mobile with GPS"
> > >>
> > >> If I search with the text "samsung android GPS", search results should
> > >> only contain "samsung", "GPS", "android" and "samsung android".
> > >>
> > >> Is there a way to do this in Solr?
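
The criterion in this question amounts to a subset test: a document is a hit when its set of words is contained in the query's set of words. A minimal plain-Java sketch of that test follows, assuming whitespace tokenization and case-sensitive comparison. The class `QueryCoversDocument` and its methods are hypothetical names for illustration and are not Solr or Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Solr or Lucene.
class QueryCoversDocument {

    // Split on runs of whitespace; case is preserved, so "GPS" and "gps" differ.
    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<String>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    // True when every word of the document also occurs in the query.
    static boolean covers(String query, String document) {
        return tokens(query).containsAll(tokens(document));
    }

    // Keep only the documents whose words all occur in the query, in input order.
    static List<String> filter(String query, List<String> documents) {
        Set<String> queryTokens = tokens(query);
        List<String> hits = new ArrayList<String>();
        for (String document : documents) {
            if (queryTokens.containsAll(tokens(document))) {
                hits.add(document);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList(
            "nokia n95", "GPS", "android", "samsung",
            "samsung android", "nokia android", "mobile with GPS");
        System.out.println(filter("samsung android GPS", documents));
    }
}
```

For the seven sample documents and the search text "samsung android GPS", the hits are "GPS", "android", "samsung" and "samsung android". A linear scan like this touches every document per search, so it only illustrates the matching rule, not a way to serve millions of documents.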
> > >>
> > >> --
> > >> Thanks
> > >> Varun Gupta
> > >>
> > >>
> > >
> > >
> > >
>
