Let's take this a step further, like I do with a (messy) custom
request handler in Collex. For an example, go to
http://www.nines.org/collex and type "sol" into the (slightly misnamed from
our technical perspective) "phrase" text box. The drop-down shows
all terms beginning with "sol", and _also_ the counts of the
documents *within the current constraints*. Add some constraints and
you'll see the results in the drop-down change.
Terms are facets too!
My hacked code is here:
<http://patacriticism.svn.sourceforge.net/viewvc/patacriticism/collex/trunk/src/solr/org/nines/FacetRequestHandler.java?revision=483&view=markup>,
starting around line 152.
152     TermEnum termEnum = reader.terms(new Term(field, prefix));
153     while (true) {
154       Term term = termEnum.term();
155       if (term == null || !term.field().equals(field) || !term.text().startsWith(prefix)) break;
156
157       DocSet docSet = searcher.getDocSet(new TermQuery(term));
158       int size = docSet.intersectionSize(constraintMask);
159       if (size > 0) map.put(term.text(), size);
160
161       if (! termEnum.next()) break;
162     }
Don't bother critiquing the code; I know it's an unscalable hack :/
As you'll see if you're crazy enough to peruse the rest of that code,
the whole thing can practically be replaced with the Solr faceting,
but I've got little custom things like this that make it trickier to
replace than meets the eye.
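For anyone who wants the gist of that loop without pulling in Lucene, here is a minimal, self-contained sketch of the same idea. It is not the Collex code: the class name PrefixFacetSketch is made up, a TreeMap stands in for the sorted term dictionary, and plain Set<Integer> values stand in for both the per-term DocSet and the constraintMask. It assumes Java 9+ for Set.of.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical toy model: a sorted term dictionary mapping term -> ids of docs containing it. */
final class PrefixFacetSketch {

    /**
     * For every term starting with the prefix, counts how many of its documents
     * also fall inside the constraint set. Terms with no matching documents are dropped.
     */
    static SortedMap<String, Integer> countByPrefix(
            TreeMap<String, Set<Integer>> termToDocs,
            String prefix,
            Set<Integer> constraintMask) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        // tailMap positions us at the first term >= prefix, playing the role of
        // reader.terms(new Term(field, prefix)) in the loop above.
        for (Map.Entry<String, Set<Integer>> e : termToDocs.tailMap(prefix, true).entrySet()) {
            // Terms are sorted, so the first non-matching term ends the prefix range.
            if (!e.getKey().startsWith(prefix)) break;
            int size = 0;
            for (int doc : e.getValue()) {
                if (constraintMask.contains(doc)) size++; // intersection size
            }
            if (size > 0) counts.put(e.getKey(), size);
        }
        return counts;
    }

    public static void main(String[] args) {
        TreeMap<String, Set<Integer>> index = new TreeMap<>();
        index.put("sofa", Set.of(1));
        index.put("solar", Set.of(1, 2, 3));
        index.put("solid", Set.of(4));
        index.put("solo", Set.of(2, 5));
        index.put("sonnet", Set.of(3));
        // Current constraints match docs 2, 3 and 5.
        System.out.println(countByPrefix(index, "sol", Set.of(2, 3, 5))); // prints {solar=2, solo=2}
    }
}
```

Like the original, this walks every term under the prefix and intersects each one with the constraints, so it shares the same scaling problem for short prefixes on big fields.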
Part of my Flare effort is to distill goodies from Collex (at least
idea-wise, likely not copy/paste-wise).
What the user-interface needs is a way to ask Solr for terms that
begin with a specified prefix, as the user types. Paging via start/
rows is necessary, and also sorting by frequency given some specified
constraints. I like the start/end term idea also, though I can't
think of a scenario in my application where this would be different
than having a prefix parameter. If I want all of the 1860s, that's
prefix=186&field=year, for example.
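To make that wish list concrete, here is a rough sketch of the behaviour I have in mind, written against a plain map of term counts rather than any real Solr API. The class TermPagingSketch and its page method are hypothetical names, the counts are assumed to have already been computed within the current constraints, and the alphabetical tie-break is my own assumption. It assumes Java 9+ for Map.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: prefix filter, frequency sort, then start/rows paging. */
final class TermPagingSketch {

    /** Returns one page of terms under the prefix, most frequent first, ties alphabetical. */
    static List<String> page(Map<String, Integer> counts, String prefix, int start, int rows) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getKey().startsWith(prefix)) entries.add(e);
        }
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // descending frequency
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        int from = Math.max(start, 0);
        int to = (int) Math.min((long) entries.size(), (long) from + Math.max(rows, 0));
        List<String> out = new ArrayList<>();
        for (int i = from; i < to; i++) out.add(entries.get(i).getKey());
        return out;
    }

    public static void main(String[] args) {
        // Made-up per-year document counts within some set of constraints.
        Map<String, Integer> years =
            Map.of("1859", 3, "1860", 5, "1861", 2, "1865", 5, "1870", 1);
        System.out.println(page(years, "186", 0, 2)); // prints [1860, 1865]
        System.out.println(page(years, "186", 2, 2)); // prints [1861]
    }
}
```

Sorting by frequency means every term under the prefix has to be counted before the first page can be returned, which is why paging alone does not make this cheap.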
I would be thrilled if this just magically appeared in Solr's
codebase before I have a chance to build it. :)
As for Hoss's suggestion of a Stats handler - I still hold the
opinion that all of the admin JSPs really ought to be first class
request handlers that go through the whole ResponseWriter stuff, so I
can get all of that great capability in Ruby format instead of XML.
As it is, to build a Ruby API to Solr and provide access to the
stats, there have to be two different parsing mechanisms. I know he
meant index stats, not Solr admin stats, but it reminded me of the
XML pain I'm going to feel in solrb to add Solr stats :)
Erik
On Jan 11, 2007, at 3:13 PM, Yonik Seeley wrote:
> On 1/11/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>> Writing a more generic "Stats" request handler that does what you're
>> describing certainly seems like a good idea.
>
> Hmmm, I hadn't thought of it as a separate handler, but as long as
> these types of requests aren't related to a base query, and not needed
> along with every query, I guess that could make sense.
>
>> Attempting to enumerate
>> all of the values for a field could be dangerous
>
> We do it for faceting :-) But we don't drag it all into memory at
> once...
>
>> but an API where the
>> client specifies a starting term and a number of terms and we use the
>> TermEnum.seek() would be fairly straightforward.
>
> Adding a start and end (like a range query) is a great idea!
> Additionally, I think adding support to incrementally write all the
> terms to the response might be important... loading them all into
> memory doesn't seem like a great idea.
> Perhaps adding Iterator or Iterable to the list of supported types in
> TextWriter would be a nice general way to go.
>
> -Yonik
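
The incremental-writing idea in the quoted message can be sketched roughly as follows. This is not Solr's TextWriter: StreamingTermWriterSketch and writeTerms are hypothetical names, and the one-term-per-line output is only a stand-in for whatever a real response writer would emit. The point is just that each term is written as it is pulled from the Iterator, so the full term list never has to sit in memory.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of writing terms to a response one at a time. */
final class StreamingTermWriterSketch {

    /** Writes each term on its own line as it is pulled; returns how many were written. */
    static int writeTerms(Iterator<String> terms, Writer out) throws IOException {
        int written = 0;
        while (terms.hasNext()) {
            out.write(terms.next()); // only the current term is held in memory
            out.write('\n');
            written++;
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // A List is used here only for the demo; any lazy Iterator works the same way.
        int n = writeTerms(List.of("solar", "solid", "solo").iterator(), out);
        System.out.print(out);
        System.out.println(n + " terms written");
    }
}
```

In a real handler the Iterator would presumably wrap the term enumeration directly, so terms flow from the index to the response without an intermediate collection.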