Re: listing/enumerating field information

Erik Hatcher Fri, 12 Jan 2007 02:06:46 -0800

Let's take this a step further, like I do with a (messy) custom request handler in Collex. For an example, go to http://www.nines.org/collex and type "sol" into the (slightly misnamed from our technical perspective) "phrase" text box. The drop-down shows all terms beginning with "sol", and _also_ the counts of the documents *within the current constraints*. Add some constraints and you'll see the results in the drop-down change.


      Terms are facets too!

My hacked code is here: <http://patacriticism.svn.sourceforge.net/viewvc/patacriticism/collex/trunk/src/solr/org/nines/FacetRequestHandler.java?revision=483&view=markup>, starting around line 152.

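    // Position the enumeration at the first term >= (field, prefix).
    TermEnum termEnum = reader.terms(new Term(field, prefix));
    try {
      while (true) {
        Term term = termEnum.term();
        // Stop at the end of the index, or once we leave the field or the prefix.
        if (term == null || !term.field().equals(field) || !term.text().startsWith(prefix)) break;

        // Count the documents matching this term AND the current constraints.
        DocSet docSet = searcher.getDocSet(new TermQuery(term));
        int size = docSet.intersectionSize(constraintMask);
        if (size > 0) map.put(term.text(), size);

        if (!termEnum.next()) break;
      }
    } finally {
      termEnum.close();  // release the enumeration even if an exception is thrown
    }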
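For readers without a Lucene index at hand, here is a minimal in-memory model of what that loop computes, offered as a sketch under stated assumptions: a sorted TreeMap from term text to a BitSet of document ids stands in for one field's term dictionary, and a second BitSet stands in for the constraint mask. The names PrefixTermCounts, countWithin and docs are hypothetical and not part of Lucene or Solr.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for the Lucene loop (assumption: this is NOT Lucene's API).
// postings: sorted map of term text -> doc ids containing that term, for one field.
// constraintMask: doc ids matching the current constraints.
class PrefixTermCounts {

    static Map<String, Integer> countWithin(NavigableMap<String, BitSet> postings,
                                            String prefix, BitSet constraintMask) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // tailMap(prefix, true) starts at the first term >= prefix, in sorted order,
        // playing the role of reader.terms(new Term(field, prefix)).
        for (Map.Entry<String, BitSet> e : postings.tailMap(prefix, true).entrySet()) {
            String term = e.getKey();
            if (!term.startsWith(prefix)) break;       // left the prefix range: stop
            BitSet hits = (BitSet) e.getValue().clone();
            hits.and(constraintMask);                   // docs with term AND constraints
            int size = hits.cardinality();
            if (size > 0) counts.put(term, size);       // skip terms with no hits
        }
        return counts;
    }

    // Helper to build a doc-id set from a list of ids.
    static BitSet docs(int... ids) {
        BitSet b = new BitSet();
        for (int id : ids) b.set(id);
        return b;
    }

    public static void main(String[] args) {
        NavigableMap<String, BitSet> postings = new TreeMap<>();
        postings.put("sol", docs(1, 2));
        postings.put("solar", docs(2, 3, 4));
        postings.put("soldier", docs(5));
        postings.put("some", docs(1, 2, 3));
        System.out.println(countWithin(postings, "sol", docs(2, 3, 4)));  // {sol=1, solar=3}
    }
}
```

As in the original loop, every term under the prefix costs one set intersection, so the work grows with the number of matching terms.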

Don't bother critiquing the code; I know it's an unscalable hack :/ As you'll see if you're crazy enough to peruse the rest of that code, the whole thing can practically be replaced with Solr's faceting, but I've got little custom things like this that make it trickier to replace than meets the eye.

Part of my Flare effort is to distill goodies from Collex (at least idea-wise, likely not copy/paste-wise).

What the user interface needs is a way to ask Solr for terms that begin with a specified prefix, as the user types. Paging via start/rows is necessary, and so is sorting by frequency given some specified constraints. I like the start/end term idea too, though I can't think of a scenario in my application where it would differ from having a prefix parameter. If I want all the 1860s, for example: prefix=186&field=year.
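One way the paging and frequency sorting could sit on top of per-term counts is sketched below. It assumes the counts for the prefix have already been computed under the current constraints; the request shape field=year&prefix=186&start=0&rows=10 and the TermPager class are hypothetical, not an existing Solr API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical paging step for a prefix-completion request such as
// field=year&prefix=186&start=0&rows=10 (parameter names are assumptions).
// Input: per-term document counts already restricted to the current constraints.
class TermPager {

    static List<Map.Entry<String, Integer>> page(Map<String, Integer> counts,
                                                 int start, int rows) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; equal counts ordered by term text for stable paging.
        Comparator<Map.Entry<String, Integer>> byCountDesc =
                Map.Entry.<String, Integer>comparingByValue().reversed();
        entries.sort(byCountDesc.thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        // Clamp start/rows to the available entries (no int overflow for large rows).
        int from = Math.min(Math.max(start, 0), entries.size());
        int to = from + Math.min(Math.max(rows, 0), entries.size() - from);
        return new ArrayList<>(entries.subList(from, to));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("1860", 4);
        counts.put("1861", 9);
        counts.put("1862", 4);
        counts.put("1865", 1);
        System.out.println(page(counts, 0, 2));  // [1861=9, 1860=4]
        System.out.println(page(counts, 2, 2));  // [1862=4, 1865=1]
    }
}
```

Ties are broken by term text so that consecutive pages neither overlap nor skip terms when several terms share a count.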

I would be thrilled if this just magically appeared in Solr's codebase before I have a chance to build it. :)

As for Hoss's suggestion of a Stats handler: I still hold the opinion that all of the admin JSPs really ought to be first-class request handlers that go through the whole ResponseWriter stuff, so I can get all of that great capability in Ruby format instead of XML. As it is, to build a Ruby API to Solr and provide access to the stats, there have to be two different parsing mechanisms. I know he meant index stats, not Solr admin stats, but it reminded me of the XML pain I'm going to feel in solrb when adding Solr stats :)


        Erik


On Jan 11, 2007, at 3:13 PM, Yonik Seeley wrote:

On 1/11/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

Writing a more generic "Stats" request handler that does what you're
describing certainly seems like a good idea.


Hmmm, I hadn't thought of it as a separate handler, but as long as
these types of requests aren't related to a base query and aren't needed
along with every query, I guess that could make sense.

Attempting to enumerate
all of the values for a field could be dangerous

We do it for faceting :-) But we don't drag it all into memory at once...

but an API where the
client specifies a starting term and a number of terms and we use
TermEnum.seek() would be fairly straightforward.
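The starting-term-plus-count idea can be modeled with a sorted set: position at the first term greater than or equal to the start term, then read a fixed number of terms. This sketch assumes a java.util.NavigableSet as a stand-in for the term enumeration; TermSeekPage and termsFrom are hypothetical names, and nothing here reproduces TermEnum's actual interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Sketch of "starting term + number of terms" over a sorted term list.
// Assumption: a NavigableSet stands in for the field's term enumeration;
// tailSet(startTerm, true) plays the role of seeking to the first term >= startTerm.
class TermSeekPage {

    static List<String> termsFrom(NavigableSet<String> dictionary, String startTerm, int limit) {
        List<String> out = new ArrayList<>();
        for (String term : dictionary.tailSet(startTerm, true)) {
            if (out.size() >= limit) break;  // stop once 'limit' terms are collected
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> dict =
                new TreeSet<>(List.of("apple", "banana", "cherry", "date", "fig"));
        List<String> firstPage = termsFrom(dict, "b", 2);
        System.out.println(firstPage);  // [banana, cherry]
        // The next page starts just after the last term seen: appending "\0"
        // gives the smallest string that sorts after it.
        String last = firstPage.get(firstPage.size() - 1);
        System.out.println(termsFrom(dict, last + "\0", 2));  // [date, fig]
    }
}
```

Because the client only sends the last term it saw plus a count, the server never has to hold more than one page of terms.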


Adding a start and end (like a range query) is a great idea!
Additionally, I think adding support to incrementally write all the
terms to the response might be important... loading them all into
memory doesn't seem like a great idea.
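A start/end pair behaves like a half-open range over the sorted terms, and Erik's prefix case fits inside it: prefix=186 selects the same terms as the range from "186" (inclusive) to "187" (exclusive). The sketch below assumes Java String ordering and a NavigableSet standing in for the term dictionary; TermRange, between and prefixUpperBound are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Sketch: select terms in a half-open range [start, end), like a range query
// over the term list. Assumptions: a NavigableSet stands in for the term
// dictionary, and terms are ordered by String.compareTo (UTF-16 code units).
class TermRange {

    // end == null means "no upper bound"; otherwise start must not sort after end.
    static List<String> between(NavigableSet<String> dictionary, String start, String end) {
        NavigableSet<String> range = (end == null)
                ? dictionary.tailSet(start, true)
                : dictionary.subSet(start, true, end, false);
        return new ArrayList<>(range);
    }

    // Exclusive upper bound for all strings beginning with prefix: drop trailing
    // Character.MAX_VALUE chars, then increment the last remaining char
    // ("186" -> "187"). Returns null when no finite bound exists (e.g. "").
    static String prefixUpperBound(String prefix) {
        int i = prefix.length() - 1;
        while (i >= 0 && prefix.charAt(i) == Character.MAX_VALUE) i--;
        if (i < 0) return null;
        return prefix.substring(0, i) + (char) (prefix.charAt(i) + 1);
    }

    public static void main(String[] args) {
        NavigableSet<String> years =
                new TreeSet<>(List.of("1859", "1860", "1861", "1869", "1870", "1886"));
        // "All the 1860s": prefix=186 is the same as the range ["186", "187").
        System.out.println(between(years, "186", prefixUpperBound("186")));  // [1860, 1861, 1869]
    }
}
```

So a handler that accepts start/end could also serve prefix requests, provided the upper bound is derived from the prefix as above.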

Perhaps adding Iterator or Iterable to the list of supported types in
TextWriter would be a nice general way to go.
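To illustrate streaming Iterator/Iterable values, here is a sketch of a writer method that pulls one element at a time and writes it immediately, so the full term list never has to be held in memory. It is hypothetical: StreamingListWriter, writeIterator, writeIterable, lazyYears and the JSON-like output are assumptions for illustration, not Solr's TextWriter API.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch (not Solr's API): write any Iterator/Iterable as a list,
// one element at a time, instead of collecting the elements into memory first.
class StreamingListWriter {

    static void writeIterator(Writer out, Iterator<?> items) throws IOException {
        out.write('[');
        boolean first = true;
        while (items.hasNext()) {
            if (!first) out.write(',');
            writeString(out, String.valueOf(items.next()));  // pull one, write it, move on
            first = false;
        }
        out.write(']');
    }

    static void writeIterable(Writer out, Iterable<?> items) throws IOException {
        writeIterator(out, items.iterator());
    }

    // Minimal quoting: escapes only backslash and double quote (assumption:
    // enough for this illustration, not a complete JSON string encoder).
    static void writeString(Writer out, String s) throws IOException {
        out.write('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') out.write('\\');
            out.write(c);
        }
        out.write('"');
    }

    // Lazy iterator standing in for a term enumeration: each term is produced
    // only when the writer asks for it.
    static Iterator<String> lazyYears(int from, int toExclusive) {
        return new Iterator<String>() {
            private int current = from;
            public boolean hasNext() { return current < toExclusive; }
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                return Integer.toString(current++);
            }
        };
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        writeIterator(sw, lazyYears(1860, 1863));
        System.out.println(sw);  // ["1860","1861","1862"]
    }
}
```

With this shape, memory use is bounded by whatever the iterator itself holds rather than by the length of the term list.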

-Yonik

