On May 12, 2006, at 9:06 AM, Fabio Confalonieri wrote:
I see our needs have already surfaced on the mailing list: it's the refine-search problem you have sometimes called faceted browsing, and which is the basis of CNET's browsing architecture. We have ads in different categories, and each category has different attributes ("fields" in Lucene language): say, the motors-cars category has make, model, price, and color, while real-estate-houses has bathroom ranges, bedroom ranges, etc.
I understand you developed Solr in part to have a filter cache storing bitsets of search results, giving a fast way to intersect those bitsets, count the resulting sub-queries, and present the counts for refinement searches (I have read the CNET announcement, the NINES-related thread, and some other related threads).
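The counting idea you describe boils down to ANDing a cached per-value bitset with the current result set and taking the cardinality. Here's a minimal sketch using plain java.util.BitSet — illustrative only, not Solr's actual implementation, and the names are made up:

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of refinement counting: each facet value owns a cached
// bitset of the docs containing it; the count shown next to a refinement
// link is the cardinality of (facet bits AND current result bits).
public class FacetCountSketch {

    // For each facet value, count docs present in both its cached bitset
    // and the current search results.
    static Map<String, Integer> counts(Map<String, BitSet> facet, BitSet results) {
        Map<String, Integer> out = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, BitSet> e : facet.entrySet()) {
            BitSet intersection = (BitSet) e.getValue().clone();
            intersection.and(results);              // intersect, don't mutate the cache
            out.put(e.getKey(), intersection.cardinality());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, BitSet> color = new LinkedHashMap<String, BitSet>();
        BitSet red = new BitSet();  red.set(0); red.set(2); red.set(5);
        BitSet blue = new BitSet(); blue.set(1); blue.set(6);
        color.put("red", red);
        color.put("blue", blue);

        BitSet results = new BitSet();
        results.set(0, 4);                           // user's query matched docs 0-3

        System.out.println(counts(color, results));  // prints "{red=2, blue=1}"
    }
}
```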
As Yonik has pointed out, Solr provides some nice facilities to build upon, but the actual implementation is still custom for this sort of thing. For example, here's the (pseudo)code for how my intersecting BitSet (soon to become DocSet) processing works:
    private Query createConstraintMask(final Map facetCache, String[] constraints,
                                       BitSet constraintMask, IndexReader reader)
        throws ParseException, IOException {
      Query query = new BooleanQuery(); // used for full-text expression constraints, but not for facets
      constraintMask.set(0, constraintMask.size()); // light up all documents initially
      if (constraints != null) {
        // Loop over all constraints, ANDing each cached bit set with the constraint mask
        for (String constraint : constraints) {
          if (constraint == null || constraint.length() == 0) continue;

          // constraint looks like this: [-]field:value
          int colonPosition = constraint.indexOf(':');
          if (colonPosition <= 0) continue;
          String field = constraint.substring(0, colonPosition);
          boolean invert = false;
          if (field.startsWith("-")) {
            invert = true;
            field = field.substring(1);
          }
          String value = constraint.substring(colonPosition + 1);

          BitSet valueMask;
          if (!field.equals("?")) {
            Map fieldMap = (Map) facetCache.get(field); // facetCache is from a custom Solr cache currently
            if (fieldMap == null) continue; // field name doesn't correspond to predefined facets
            valueMask = (BitSet) fieldMap.get(value);
            if (valueMask == null) {
              valueMask = new BitSet(constraintMask.size());
              System.out.println("invalid value requested for field " + field + ": " + value);
            }
          } else {
            Query clause = null; // some query from parsing "value" (elided)
            QueryFilter filter = new QueryFilter(clause); // this should change to get the DocSet from Solr's facilities :)
            valueMask = filter.bits(reader);
          }

          if (!invert) {
            constraintMask.and(valueMask);
          } else {
            constraintMask.andNot(valueMask); // this is what would be nice for DocSets to be capable of
          }
        }
      }
      if (((BooleanQuery) query).getClauses().length == 0) {
        query = new MatchAllDocsQuery();
      }
      return query;
    }
And then basically it gets called like this in my custom handler:

    BitSet constraintMask = new BitSet(reader.numDocs());
    Query query = createConstraintMask(facetCache, req.getParams("constraint"),
                                       constraintMask, reader);
    DocList results = req.getSearcher().getDocList(query, new BitDocSet(constraintMask),
                                                   sort, req.getStart(), req.getLimit());
[critique of this code more than welcome!]
My client (Ruby on Rails) is POSTing parameters that look like this:

    constraint=#{invert}#{field}:#{constraint[:value]}

It works really well even before my refactoring to use Solr's DocSet and caching capabilities, and I'm sure it'll do even better leveraging those. Really nice stuff!
A more general question: is all the CNET logic of intersecting bitsets available through the servlet, or do I have to write some Java code to be plugged into Solr?
Currently you have to piece it together. The goal is to build these facilities more into the core, but we should do so based on folks implementing it themselves and contributing it, so that we can compare the needs that others have and come up with some great groundwork in the faceted browsing area, just as Solr itself has built above raw Lucene. So, let's all flesh this stuff out, compare/contrast real-world working implementations, and factor the common ground on top.
As an example of another facility I've just added on top: the ability to return all terms that match a client-provided prefix. This enables a Google Suggest-like convenience: when someone types "Yo" and pauses, an Ajaxified UI hits my Rails app, which in turn pings Solr with the prefix, and a custom request handler responds with the matching terms ("Yonik", for example) for a specified field. Not only that, but my implementation returns the number of documents matching each term, constrained by the same types of constraints above, including full-text queries. This allows our users to pick people by typing a name rather than us having to populate a drop-down (we'll still have some kind of browse interface too, I'm sure), but only with names of folks involved in the document set they are currently constraining their view to.
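The shape of that suggest logic can be sketched with a sorted term-to-bitset map standing in for a Lucene term walk over one field — everything below is illustrative (the names and the TreeMap stand-in are mine, not Solr's or the actual handler's):

```java
import java.util.BitSet;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of prefix suggestions with constrained counts: for
// every indexed term starting with the prefix, report how many documents
// contain it AND satisfy the user's current constraint mask.
public class PrefixSuggestSketch {

    static SortedMap<String, Integer> suggest(TreeMap<String, BitSet> termDocs,
                                              String prefix, BitSet constraintMask) {
        SortedMap<String, Integer> counts = new TreeMap<String, Integer>();
        // subMap(prefix, prefix + '\uffff') selects exactly the keys that
        // start with the prefix, since the map keys are sorted.
        for (Map.Entry<String, BitSet> e
                 : termDocs.subMap(prefix, prefix + '\uffff').entrySet()) {
            BitSet docs = (BitSet) e.getValue().clone();
            docs.and(constraintMask);            // constrain to the current view
            if (docs.cardinality() > 0) {
                counts.put(e.getKey(), docs.cardinality());
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        TreeMap<String, BitSet> termDocs = new TreeMap<String, BitSet>();
        BitSet yonik = new BitSet(); yonik.set(0); yonik.set(3);
        BitSet yoda  = new BitSet(); yoda.set(7);
        BitSet erik  = new BitSet(); erik.set(1);
        termDocs.put("yonik", yonik);
        termDocs.put("yoda", yoda);
        termDocs.put("erik", erik);

        BitSet mask = new BitSet();
        mask.set(0, 4);                          // user's current result set: docs 0-3

        // "yoda" drops out: its only doc (7) is outside the constraint mask.
        System.out.println(suggest(termDocs, "yo", mask)); // prints "{yonik=2}"
    }
}
```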
I've been thinking about this in a general sense: if Solr were driven by a slick servlet filter rather than servlets, then these types of handlers could be plugged in a lot more easily, including automatic URL handling, rather than having to twiddle web.xml. I realize that the handler configuration allows this with the qt parameter, and I'm leveraging that myself, but I think some HiveMind mojo could allow true "plugins" to drop right into the classpath and be immediately available (perhaps even hot-deployed with some containers, though I personally would rebuild a WAR, then stop/deploy/restart).
In this case, which is the correct level at which to do this? Perhaps a new RequestHandler understanding some new query syntax to exploit filters?
Back to your specific case: currently, yes, a new request handler is needed to go above and beyond what the built-in standard one provides. I expect a flood of cool handlers on top of Solr :) and that is why I am thinking more along the lines of a true plugin architecture.
We only need to sort on a single, precalculated rank field stored as a range field, so we don't need relevance and consequently don't need scores (which is a prerequisite for using BitSets, if I understand well).
You're pretty much right on!
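For what it's worth, ordering a bitset of filtered docs by a precomputed rank is the easy part — here's an illustrative sketch (not Solr code; the rank array stands in for a cached rank field):

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: order a filtered document set by a precalculated
// per-document rank value, with no relevance scoring involved at all.
public class RankSortSketch {

    static List<Integer> sortByRank(BitSet results, final int[] rank) {
        List<Integer> docs = new ArrayList<Integer>();
        // Walk the set bits: these are the matching document ids.
        for (int doc = results.nextSetBit(0); doc >= 0; doc = results.nextSetBit(doc + 1)) {
            docs.add(doc);
        }
        // Order by the precomputed rank, ascending; scores never enter into it.
        Collections.sort(docs, new Comparator<Integer>() {
            public int compare(Integer a, Integer b) { return rank[a] - rank[b]; }
        });
        return docs;
    }

    public static void main(String[] args) {
        int[] rank = {40, 10, 30, 20};   // rank per document id 0..3
        BitSet results = new BitSet();
        results.set(0); results.set(2); results.set(3);
        System.out.println(sortByRank(results, rank)); // prints "[3, 2, 0]"
    }
}
```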
PS: I think Solr and Lucene are really great work! I'll be happy, when we have finished, to add our project (for a major press group here in Italy) to the public websites page on the Solr wiki.
I'm looking forward to your work on top of Solr! I'm personally
quite thrilled with it and really believe it'll go far. If only I
had more time to play with it myself rather than just contemplating
it :)
Erik