Re: Need help indexing/querying a particular type of hierarchy

Michael B. Klein Thu, 11 Aug 2011 11:46:28 -0700

I've been experimenting with that, but that fq wouldn't limit my facet
counts adequately. Since the document has both an accessionWF and a
digitizationWF, the fq would match (and count) the document no matter what
the status for each process.
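
To illustrate the problem, here is a small Python sketch that mimics fq=wf:<name> followed by facet.field=status over flattened documents. It does not call Solr; the field names wf and status, the sample documents, and the facet_status helper are all assumptions made up for the illustration.

```python
from collections import Counter

# Hypothetical flattened documents: one multi-valued "wf" field and one
# multi-valued "status" field per document. Once flattened, nothing records
# which workflow or process a given status value came from.
docs = [
    {"id": "obj1", "wf": ["accessionWF", "digitizationWF"],
     "status": ["completed", "error"]},
    {"id": "obj2", "wf": ["accessionWF", "digitizationWF"],
     "status": ["completed"]},
]


def facet_status(docs, wf):
    """Mimic fq=wf:<wf> plus facet.field=status on the flattened docs."""
    # The filter keeps every document that has the workflow at all.
    matched = [d for d in docs if wf in d["wf"]]
    counts = Counter()
    for d in matched:
        # Each distinct status value counts once per matching document.
        counts.update(set(d["status"]))
    return counts


if __name__ == "__main__":
    print(facet_status(docs, "accessionWF"))
    print(facet_status(docs, "digitizationWF"))
```

Both calls return the same counts, because every document carries both workflows and the filter cannot tie a status value to the workflow it belongs to.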


I suppose I could do something like this:

        <field name="status_wps">accessionWF:start-accession:completed</field>
        <field name="status_wps">accessionWF:cleanup:waiting</field>
        <field name="status_wps">accessionWF:descriptive-metadata:completed</field>
        <field name="status_wps">accessionWF:content-metadata:completed</field>
        <field name="status_wps">accessionWF:rights-metadata:completed</field>
        <field name="status_wps">accessionWF:publish:completed</field>
        <field name="status_wps">accessionWF:shelve:error</field>
        <field name="status_wsp">accessionWF:completed:start-accession</field>
        <field name="status_wsp">accessionWF:waiting:cleanup</field>
        <field name="status_wsp">accessionWF:completed:descriptive-metadata</field>
        <field name="status_wsp">accessionWF:completed:content-metadata</field>
        <field name="status_wsp">accessionWF:completed:rights-metadata</field>
        <field name="status_wsp">accessionWF:completed:publish</field>
        <field name="status_wsp">accessionWF:error:shelve</field>
        <field name="status_swp">completed:accessionWF:start-accession</field>
        <field name="status_swp">waiting:accessionWF:cleanup</field>
        <field name="status_swp">completed:accessionWF:descriptive-metadata</field>
        <field name="status_swp">completed:accessionWF:content-metadata</field>
        <field name="status_swp">completed:accessionWF:rights-metadata</field>
        <field name="status_swp">completed:accessionWF:publish</field>
        <field name="status_swp">error:accessionWF:shelve</field>

and use a PathHierarchyTokenizerFactory with : as the delimiter. Then I
could use facet.field=status_wps&f.status_wps.facet.prefix=accessionWF: to
get the counts for all the accessionWF processes and statuses, then repeat
using status_wsp and status_swp for the various inversions. I was hoping for
something easier. :)
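
A rough Python sketch of that approach, for whoever builds the index documents: it generates the three orderings for each process, shows the prefixes I would expect a path-hierarchy tokenizer to emit for one value, and builds the facet parameters. The helper names (permuted_fields, path_prefixes, facet_params) are hypothetical, and path_prefixes only imitates the tokenizer rather than calling Solr.

```python
from urllib.parse import urlencode

# Field name -> order of the three components in the indexed value.
ORDERS = {
    "status_wps": ("workflow", "process", "status"),
    "status_wsp": ("workflow", "status", "process"),
    "status_swp": ("status", "workflow", "process"),
}


def permuted_fields(workflow, processes):
    """Return (field_name, value) pairs for every process in one workflow.

    `processes` maps a process name to its current status.
    """
    pairs = []
    for field, order in ORDERS.items():
        for process, status in processes.items():
            parts = {"workflow": workflow, "process": process, "status": status}
            pairs.append((field, ":".join(parts[key] for key in order)))
    return pairs


def path_prefixes(value, delimiter=":"):
    """Imitate the tokens a path-hierarchy tokenizer emits for `value`."""
    parts = value.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


def facet_params(field, prefix):
    """Build the URL-encoded facet parameters for one field and prefix."""
    return urlencode({
        "facet": "true",
        "facet.field": field,
        f"f.{field}.facet.prefix": prefix,
    })


if __name__ == "__main__":
    for name, value in permuted_fields("accessionWF", {"shelve": "error"}):
        print(f'<field name="{name}">{value}</field>')
    print(path_prefixes("accessionWF:publish:completed"))
    print(facet_params("status_wps", "accessionWF:"))
```

Generating the values from a single table of orderings at least keeps the three fields in sync when a process status changes and the object is reindexed.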

On Thu, Aug 11, 2011 at 6:40 AM, Dmitry Kan <dmitry....@gmail.com> wrote:

> Hi,
>
> Can you keep your hierarchy flat in Solr and then use filter queries
> (fq=wf:accessionWF) inside your facet queries (facet.field=status)?
>
> Or is the requirement to have one single facet query producing the
> hierarchical facet counts?
>
> On Thu, Aug 11, 2011 at 10:43 AM, Michael B. Klein <mbkl...@gmail.com> wrote:
>
> > Hi all,
> >
> > I have a particular data structure I'm trying to index into a solr document
> > so that I can query and facet it in a particular way, and I can't quite
> > figure out the best way to go about it.
> >
> > One sample object is here: https://gist.github.com/1139065
> >
> > The part that's tripping me up is the workflows. Each workflow has a name
> > (in this case, digitizationWF and accessionWF). Each workflow is made up of
> > a number of processes, each of which has its own current status. Every time
> > the status of a process within a workflow changes, the object is reindexed.
> >
> > What I'd like to be able to do is present several hierarchies of facets: In
> > one, the workflow name is the top-level facet, with the second level showing
> > each process, under which is listed each status (completed, waiting, or
> > error) and the number of documents with that status for that process (some
> > values omitted for brevity):
> >
> > accessionWF (583)
> >  publish (583)
> >    completed (574)
> >    waiting (6)
> >    error (3)
> >  shelve (583)
> >    completed (583)
> >
> > etc.
> >
> > I'd also like to be able to invert that presentation:
> >
> > accessionWF (583)
> >  completed (583)
> >    publish (574)
> >    shelve (583)
> >  waiting (6)
> >    publish (6)
> >  error (3)
> >    publish (3)
> >
> > or even
> >
> > completed (583)
> >  accessionWF (583)
> >    publish (574)
> >    shelve (583)
> >  digitizationWF (583)
> >    initiate (583)
> > error (3)
> >  accessionWF (3)
> >    shelve (3)
> >
> > etc.
> >
> > I don't think Solr 4.0's pivot/hierarchical facets are what I'm looking for,
> > because the status values are ambiguous when not qualified by the process
> > name -- the object itself has no "completed" status, only a
> > "publish:completed" and a "shelve:completed" that I want to be able to group
> > together into a count/list of objects with "completed" processes. I also
> > don't think PathHierarchyTokenizerFactory is quite the answer either.
> >
> > What kind of Solr magic, if any, am I looking for here?
> >
> > Thanks in advance for any help or advice.
> > Michael
> >
> > ---
> > Michael B. Klein
> > Digitization Workflow Engineer
> > Stanford University Libraries
> >
>
>
>
> --
> Regards,
>
> Dmitry Kan
>
