Let's take this a step further, like I do with a (messy) custom request handler in Collex. For an example, go to http://www.nines.org/collex and type "sol" into the (slightly misnamed from our technical perspective) "phrase" text box. The drop-down shows all terms beginning with "sol", and _also_ the counts of the documents *within the current constraints*. Add some constraints and you'll see the results in the drop-down change.

      Terms are facets too!

My hacked code is here: <http://patacriticism.svn.sourceforge.net/viewvc/patacriticism/collex/trunk/src/solr/org/nines/FacetRequestHandler.java?revision=483&view=markup>, starting around line 152.

    152  TermEnum termEnum = reader.terms(new Term(field, prefix));
    153  while (true) {
    154    Term term = termEnum.term();
    155    if (term == null || !term.field().equals(field) || !term.text().startsWith(prefix)) break;
    156
    157    DocSet docSet = searcher.getDocSet(new TermQuery(term));
    158    int size = docSet.intersectionSize(constraintMask);
    159    if (size > 0) map.put(term.text(), size);
    160
    161    if (!termEnum.next()) break;
    162  }

Don't bother critiquing the code, I know it's an unscalable hack :/ As you'll see if you're crazy enough to peruse the rest of that code, the whole thing can practically be replaced with the Solr faceting, but I've got little custom things like this that make it trickier to replace than meets the eye.

Part of my Flare effort is to distill goodies from Collex (at least idea-wise, likely not copy/paste-wise).

What the user interface needs is a way to ask Solr for terms that begin with a specified prefix, as the user types. Paging via start/rows is necessary, and also sorting by frequency given some specified constraints. I like the start/end term idea also, though I can't think of a scenario in my application where this would be different than having a prefix parameter. If I want all the 1860s, prefix=186&field=year, for example.
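
To make that concrete, here is a rough plain-Java sketch of the behavior I mean. It deliberately uses no Solr or Lucene classes: PrefixTermCounts, add, and suggest are made-up names, the TreeMap stands in for the term dictionary of one field, and the BitSet stands in for the constraint mask. It only illustrates prefix matching, counting within constraints, frequency sort, and start/rows paging; it is not a proposal for the actual handler code.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for one indexed field; not Solr or Lucene API. */
public class PrefixTermCounts {
    // term text -> ids of the documents containing that term
    private final TreeMap<String, BitSet> postings = new TreeMap<>();

    public void add(String term, int docId) {
        postings.computeIfAbsent(term, t -> new BitSet()).set(docId);
    }

    /**
     * Terms starting with prefix, counted only within constraintMask,
     * sorted by descending count (ties broken by term), then paged by start/rows.
     */
    public List<Map.Entry<String, Integer>> suggest(
            String prefix, BitSet constraintMask, int start, int rows) {
        List<Map.Entry<String, Integer>> counted = new ArrayList<>();
        // tailMap positions us at the first term >= prefix, much like seeking a term enumerator
        for (Map.Entry<String, BitSet> e : postings.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break; // walked past the prefix range
            BitSet docs = (BitSet) e.getValue().clone();
            docs.and(constraintMask);                  // keep only docs inside the constraints
            int size = docs.cardinality();
            if (size > 0) {
                counted.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), size));
            }
        }
        counted.sort(
            Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        // clamp the requested page to what is actually available
        int from = Math.min(Math.max(start, 0), counted.size());
        int to = from + Math.min(Math.max(rows, 0), counted.size() - from);
        return new ArrayList<>(counted.subList(from, to));
    }
}
```

Like my hack above, this counts every matching term before sorting, so it has the same scalability problem; the point is only the shape of the request (prefix, constraints, start, rows) and of the response (term/count pairs).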

I would be thrilled if this just magically appeared in Solr's codebase before I have a chance to build it. :)

As for Hoss's suggestion of a Stats handler - I still hold the opinion that all of the admin JSPs really ought to be first-class request handlers that go through the whole ResponseWriter stuff, so I can get all of that great capability in Ruby format instead of XML. As it is, to build a Ruby API to Solr and provide access to the stats, there have to be two different parsing mechanisms. I know he meant index stats, not Solr admin stats, but it reminded me of the XML pain I'm going to feel in solrb to add Solr stats :)

        Erik


On Jan 11, 2007, at 3:13 PM, Yonik Seeley wrote:

On 1/11/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
Writing a more generic "Stats" request handler that does what you're
describing certainly seems like a good idea.

Hmmm, I hadn't thought of it as a separate handler, but as long as
these types of requests aren't related to a base query, and not needed
along with every query, I guess that could make sense.

Attempting to enumerate
all of the values for a field could be dangerous

We do it for faceting :-) But we don't drag it all into memory at once...

but an API where the
client specifies a starting term and a number of terms and we use the
TermEnum.seek() would be fairly straightforward.

Adding a start and end (like a range query) is a great idea!
Additionally, I think adding support to incrementally write all the
terms to the response might be important... loading them all into
memory doesn't seem like a great idea.

Perhaps adding Iterator or Iterable to the list of supported types in
TextWriter would be a nice general way to go.

-Yonik
