Hi all,

I have a particular data structure I'm trying to index into a Solr document
so that I can query and facet it in a particular way, and I can't quite
figure out the best way to go about it.

One sample object is here: https://gist.github.com/1139065

The part that's tripping me up is the workflows. Each workflow has a name
(in this case, digitizationWF and accessionWF). Each workflow is made up of
a number of processes, each of which has its own current status. Every time
the status of a process within a workflow changes, the object is reindexed.

What I'd like to be able to do is present several hierarchies of facets. In
one, the workflow name is the top-level facet and each process is the second
level. Under each process are its statuses (completed, waiting, or error),
each with the number of documents in that status for that process (some
values omitted for brevity):

accessionWF (583)
  publish (583)
    completed (574)
    waiting (6)
    error (3)
  shelve (583)
    completed (583)

etc.

I'd also like to be able to invert that presentation:

accessionWF (583)
  completed (583)
    publish (574)
    shelve (583)
  waiting (6)
    publish (6)
  error (3)
    publish (3)

or even

completed (583)
  accessionWF (583)
    publish (574)
    shelve (583)
  digitizationWF (583)
    initiate (583)
error (3)
  accessionWF (3)
    publish (3)

etc.
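
To make the counts I'm after concrete, here is a minimal Python sketch (plain
Python, not Solr code). The three sample documents and the helper name
facet_tree are made up for illustration; the real objects look like the gist
above. A document is counted once per node, however many of its processes
match that node.

```python
from collections import defaultdict

# Hypothetical sample: each document maps workflow -> {process: status}.
docs = [
    {"accessionWF": {"publish": "completed", "shelve": "completed"}},
    {"accessionWF": {"publish": "waiting", "shelve": "completed"}},
    {"accessionWF": {"publish": "error", "shelve": "completed"}},
]


def facet_tree(docs, order):
    """Count distinct documents at every level of one hierarchy.

    `order` is a permutation of ("workflow", "process", "status").
    Returns a dict mapping each path tuple to its document count.
    """
    seen = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for workflow, processes in doc.items():
            for process, status in processes.items():
                triple = {"workflow": workflow, "process": process, "status": status}
                path = tuple(triple[key] for key in order)
                # Record the document at each prefix of the path, so parent
                # nodes count documents rather than summing child counts.
                for depth in range(1, len(path) + 1):
                    seen[path[:depth]].add(doc_id)
    return {path: len(ids) for path, ids in seen.items()}


by_process = facet_tree(docs, ("workflow", "process", "status"))
by_status = facet_tree(docs, ("status", "workflow", "process"))
```

With this sample, by_status[("completed",)] is 3 because every document has at
least one completed process, while by_status[("completed", "accessionWF",
"publish")] is 1.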

I don't think Solr 4.0's pivot/hierarchical facets are what I'm looking for,
because the status values are ambiguous when not qualified by the process
name -- the object itself has no "completed" status, only a
"publish:completed" and a "shelve:completed" that I want to be able to group
together into a count/list of objects with "completed" processes. I also
don't think PathHierarchyTokenizerFactory is quite the answer either.
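
The ambiguity is easy to show outside Solr. The Python sketch below uses a
made-up document and assumes process and status were indexed as two
independent multivalued fields. Combining the two fields per document pairs
every process with every status, so the document appears to have a completed
publish step when it doesn't. As far as I understand it, that is the kind of
count a pivot over two such fields would produce.

```python
from itertools import product

# Hypothetical document: publish is still waiting, shelve is completed.
processes = {"publish": "waiting", "shelve": "completed"}

# Two independent multivalued fields lose the process/status pairing.
process_field = sorted(processes)               # ['publish', 'shelve']
status_field = sorted(set(processes.values()))  # ['completed', 'waiting']

# Any per-document combination of the two fields sees every pairing.
apparent_pairs = set(product(process_field, status_field))

# Qualified values (process plus status together) keep the pairing intact.
actual_pairs = set(processes.items())

# Pairings that a field-by-field combination reports but that are not real.
spurious = apparent_pairs - actual_pairs
print(sorted(spurious))  # [('publish', 'completed'), ('shelve', 'waiting')]
```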

What kind of Solr magic, if any, am I looking for here?

Thanks in advance for any help or advice.
Michael

---
Michael B. Klein
Digitization Workflow Engineer
Stanford University Libraries
