RE: Best use of wildcard searches

Jonathan Woods Thu, 09 Aug 2007 22:45:24 -0700

Maybe there's a different way, in which path-like values like this are
treated explicitly.

I use a similar approach to Matthew at www.colfes.com, where all pages are
generated from Lucene searches according to filters on a couple of
hierarchical categories ('spaces'), i.e. subject and organisational unit.
From that experience, a few things occur to me here:

1.  The structure of any particular category/space is not immediately
derivable from data, so unless we're Google or doing something RDF-like
it's something you define up front.  For this reason, and because it
makes internationalisation easier, I feel you should model this kind of
standing data independently of its representation.

So instead of searching for Departments>Men's Apparel>Jackets, I index (and
search for) a String "/departments/mensapparel/jackets/", and use a simple
standing data mapping to resolve each of the nodes along the path to a
human-readable form when necessary.  In my case, the values for any
particular resource (e.g. a news article) are defined by CMS users from
drop-downs.
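
A minimal sketch of such a standing data mapping, in Python purely for
brevity.  The node keys come from the example path above; the labels, the
locale codes and the names LABELS and display_path are assumptions made up
for illustration, not the code behind the site.

```python
# Hypothetical standing data: node key -> human-readable label per locale.
LABELS = {
    "departments": {"en": "Departments", "fr": "Rayons"},
    "mensapparel": {"en": "Men's Apparel", "fr": "Mode homme"},
    "jackets": {"en": "Jackets", "fr": "Vestes"},
}


def display_path(path, locale="en", separator=">"):
    """Resolve each node of an indexed path to its human-readable label."""
    nodes = [node for node in path.split("/") if node]
    # Fall back to the raw node key when no label is defined for it.
    return separator.join(
        LABELS.get(node, {}).get(locale, node) for node in nodes
    )


print(display_path("/departments/mensapparel/jackets/"))
```

Only the keys are indexed, so a label can be reworded or translated
without reindexing anything.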

2.  In my Lucene library, I redundantly indexed paths like
"/departments/mensapparel/jackets/" into successive fragments, together with
the whole path value:

/departments
/departments/mensapparel
/departments/mensapparel/jackets
/departments/mensapparel/jackets/

using my own PathAnalyzer (extends Analyzer, of course) which makes it very
fast to query on path fragments: "all goods anywhere in the men's apparel
section" -> query on "/departments/mensapparel"; "all goods categorised as
exactly in the men's apparel section" -> query on
"/departments/mensapparel/".

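The fragment expansion above can be sketched as follows.  This is plain
Python, not the PathAnalyzer itself and not Lucene API; path_fragments, the
document ids and the dict-based index are hypothetical stand-ins that only
show which terms end up in the index and why the two queries differ.

```python
def path_fragments(path):
    """Return every ancestor prefix of a path, then the full path itself."""
    nodes = [node for node in path.split("/") if node]
    fragments = []
    prefix = ""
    for node in nodes:
        prefix += "/" + node
        fragments.append(prefix)
    # The whole value, with its trailing slash, means "exactly this node".
    fragments.append(prefix + "/")
    return fragments


# Hypothetical documents and the path each one was categorised under.
docs = {
    "blazer": "/departments/mensapparel/jackets/",
    "gift-card": "/departments/mensapparel/",
}

# Toy inverted index: term -> set of document ids.
index = {}
for doc_id, path in docs.items():
    for term in path_fragments(path):
        index.setdefault(term, set()).add(doc_id)

anywhere_in_section = index["/departments/mensapparel"]
exactly_in_section = index["/departments/mensapparel/"]
```

Here anywhere_in_section holds both documents, while exactly_in_section
holds only "gift-card", because only the full path value carries the
trailing slash.  Each query is a single exact term lookup.
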
I implemented all queries like this as filters, and cached the filter
definitions.  I guess Solr's query optimisation and filter caching do all
this out of the box, so it may end up being just as fast to use the kind of
PrefixQuery suggested in this thread.
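
For comparison, a rough sketch of the prefix-plus-cache alternative.  It
uses neither Lucene nor Solr; DOC_PATHS, prefix_filter and search are
hypothetical names, and functools.lru_cache merely stands in for a filter
cache.

```python
from functools import lru_cache

# Hypothetical stored path value per document id.
DOC_PATHS = {
    1: "/departments/mensapparel/jackets/",
    2: "/departments/mensapparel/",
    3: "/departments/womensapparel/jackets/",
}


@lru_cache(maxsize=128)
def prefix_filter(prefix):
    """Ids of documents whose path starts with prefix, computed once per prefix."""
    return frozenset(
        doc_id for doc_id, path in DOC_PATHS.items() if path.startswith(prefix)
    )


def search(candidate_ids, prefix):
    """Intersect some other query's hits with the cached path filter."""
    return sorted(set(candidate_ids) & prefix_filter(prefix))
```

One difference worth noting: a raw prefix such as "/departments/mens" also
matches here, whereas indexed fragments only match on whole nodes.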

3.  However, I can post/attach/donate PathAnalyzer if anyone thinks it might
still be useful.  I started off calling it HierarchyValueAnalyzer, then
TreeNodePathAnalyzer, but now that it's PathAnalyzer I can't help thinking
it might have lots of applications....

Jon 

> -----Original Message-----
> From: Yonik Seeley [mailto:[EMAIL PROTECTED] 
> Sent: 09 August 2007 21:50
> To: solr-user@lucene.apache.org
> Subject: Re: Best use of wildcard searches
> 
> On 8/9/07, Matthew Runo <[EMAIL PROTECTED]> wrote:
> > http://66.209.92.171:8080/solr/select/?q=department_exact:Apparel%
> > 3EMen's%20Apparel%
> > 3EJackets*&fq=country_code:US&fq=brand_exact:adidas&wt=python
> >
> > The same exact query, with... wait..
> >
> > Wow. I'm making myself look like an idiot.
> >
> > I swear that these queries didn't work the first time I ran them...
> >
> > But now "\ " and "?" give the same results, as would be expected, 
> > while " " returns nothing.
> >
> > I'm sorry for wasting your time, but I do appreciate the help!
> 
> lo - these things can happen when you get too many levels of 
> escaping needed.
> Hopefully we can improve the situation in the future to get 
> rid of the query parser escaping for certain queries such as 
> prefix and term.
