Re: Hierarchical faceting

2014-11-14 Thread Evan Pease
Hi Rashmi,

Here are some more details on how to use the PathHierarchyTokenizer that
Oleg linked to.

If this is your document:

> *Sample document*
> 
> name=Pbook1
> category=NonFic/Sci/Phy/Quantum
> author=ABC
> price=20.00
> 

Then, in your schema.xml:



  

  
  

  


Then, in your Solr query, you can simply add:

&facet=true
&facet.field=category

You should see a facet that contains each level of the taxonomy with counts.
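
To make the "each level" part concrete, here is a rough Python sketch of the
tokens a forward path-hierarchy tokenizer would emit for one category value,
assuming "/" as the delimiter. The function name path_tokens is made up for
illustration; it only imitates the behaviour and is not Solr code.

```python
def path_tokens(value, delimiter="/"):
    """Imitate the prefixes a forward path-hierarchy tokenizer emits."""
    parts = value.split(delimiter)
    # One token per level: the first part, the first two parts, and so on.
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


tokens = path_tokens("NonFic/Sci/Phy/Quantum")
print(tokens)
```

Each of those tokens is indexed as a term for the document, which is why a
single document is counted under every level of its path in the facet.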

To navigate the taxonomy, add filter queries using the part of the path
you want to narrow the results down to (values from the category facet).

So, for example, when a user clicks on "NonFic":

&facet=true
&facet.field=category
&fq={!term f=category}NonFic

Then "NonFic/Sci"

&fq={!term f=category}NonFic/Sci

Then "NonFic/Sci/Phy"

&fq={!term f=category}NonFic/Sci/Phy

etc.
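
If you build these requests from code, something along these lines might
work as a starting point. It is only a sketch: the helper name
drill_down_params and the q=*:* main query are my assumptions, and the
output still has to be appended to your own select URL.

```python
from urllib.parse import urlencode


def drill_down_params(selected_path, field="category"):
    """Build facet and filter parameters for one selected path."""
    return [
        ("q", "*:*"),  # assumed "match everything" main query
        ("facet", "true"),
        ("facet.field", field),
        # {!term} matches the indexed token exactly, without query parsing.
        ("fq", "{!term f=%s}%s" % (field, selected_path)),
    ]


# urlencode takes care of escaping the braces, spaces and slashes.
query_string = urlencode(drill_down_params("NonFic/Sci"))
print(query_string)
```

Letting urlencode do the escaping avoids hand-encoding the local-params
syntax in the fq value.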

If you only want to display the leaf level category and indent child
categories you can easily do this in your UI by splitting the facet value
on your separator, "/" in this case.
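
As a rough illustration of that UI step, the sketch below turns facet values
into indented leaf labels. The helper name display_rows and the two-space
indent are arbitrary choices of mine; the counts are the "Fic" numbers from
the original question.

```python
def display_rows(facet_counts, delimiter="/"):
    """Turn {path: count} into indented 'leaf (count)' lines."""
    rows = []
    # Sorting on the split parts keeps every parent right above its children.
    for path in sorted(facet_counts, key=lambda p: p.split(delimiter)):
        parts = path.split(delimiter)
        indent = "  " * (len(parts) - 1)  # two spaces per level below the root
        rows.append("%s%s (%d)" % (indent, parts[-1], facet_counts[path]))
    return rows


counts = {"Fic": 3, "Fic/Mystery": 1, "Fic/Childrens": 2}
print("\n".join(display_rows(counts)))
```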


Thanks,
Evan



On Nov 14, 2014 8:06 PM, "Oleg Savrasov" wrote:

> Hi Rashmi,
>
> I believe you are looking for PathHierarchyTokenizer,
> see
>
> https://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/path/PathHierarchyTokenizer.html
>
> Oleg
>
> 2014-11-14 17:53 GMT-05:00 rashmy1:
>
> > Hello,
> > I'm trying to set up Solr for fetching hierarchical facets.
> > Please advise which of the below approaches should be followed for my
> > scenario.
> > *Scenario:*
> > NonFic
> >   Hist
> >     HistBook1
> >     HistBook2
> >   Sci
> >     Phy
> >       Quantum
> >         Pbook1
> >         Pbook2
> >       Thermodynamics
> >         Pbook3
> >         Pbook4
> >     Chem
> >       Cbook1
> >     Math
> >       Mbook1
> > Fic
> >   Mystery
> >     Mybook1
> >   Childrens
> >     Chbook1
> >     Chbook2
> >
> > *Sample document*
> > 
> > name=Pbook1
> > category=NonFic/Sci/Phy/Quantum
> > author=ABC
> > price=20.00
> > 
> >
> > *Requirements:*
> > -Show drill down facets
> > -If user searched for "*", the initial set of facets to be shown are
> > 'NonFic' and 'Fic'
> > -If user selects facet 'NonFic', we then show the facets 'Hist' and 'Sci'
> > only.
> >
> > *Option1:*
> > /Solr schema:/
> >  > stored="true" type="string"/>
> > /Document supplied for indexing:/
> > 
> > name=Pbook1
> > category=0/NonFic
> > category=1/NonFic/Sci
> > category=2/NonFic/Sci/Phy
> > category=3/NonFic/Sci/Phy/Quantum
> > category=0/Other (a book can belong to multiple categories)
> > author=ABC
> > price=20.00
> > 
> > With Option1, we can do a drill down facet query.
> > For example, if we give facet.prefix=NonFic/Sci/, the facet results are:
> > NonFic/Sci/Phy
> > NonFic/Sci/Chem
> > NonFic/Sci/Math
> > The only issue is that I have to take care of generating all possible
> > path information for 'category'.
> >
> > *Option2:*
> > /Solr schema:/
> > 
> >   
> >  > delimiter="/"/>
> >   
> > 
> >  > stored="true" type="path"/>
> > /Document supplied for indexing:/
> > 
> > name=Pbook1
> > category=NonFic/Sci/Phy/Quantum
> > author=ABC
> > price=20.00
> > 
> > With Option2, we can do a facet query, but it returns all possible
> > combinations of paths.
> > For example, if we give facet.prefix=Fic, the facet results are:
> > Fic (3)
> > Fic/Mystery (1)
> > Fic/Childrens (2)
> >
> >
> > I'm looking to supply a doc with just a single entry (like
> > 'category=NonFic/Sci/Phy/Quantum') and be able to do a drill down query.
> > Is there some existing Solr tokenizer which takes care of generating all
> > possible combinations while indexing, instead of having to generate them
> > as part of document creation?
> >
> > Thanks
> >
> >
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Hierarchical-faceting-tp4169263.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Re: Hierarchical faceting

2014-11-17 Thread Evan Pease
> I'm looking to see if Solr has any in-built tokenizer that splits the
> tokens and prepends with the depth information. I'd like to avoid building
> depth information into the field values if Solr already has something that
> can be used.

So the goal is to find out the level of the tree for each category? You
could determine this in the UI by splitting the category facet value string
by the separator.
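
For instance, if the UI should only show the next level down (as in the
original requirements), a sketch like this could filter the returned facet
values by depth. The helper name child_facets is hypothetical and the counts
are made up.

```python
def child_facets(facet_counts, current_path="", delimiter="/"):
    """Keep only facet values exactly one level below current_path."""
    depth = len(current_path.split(delimiter)) if current_path else 0
    prefix = current_path + delimiter if current_path else ""
    return {
        path: count
        for path, count in facet_counts.items()
        if path.startswith(prefix) and len(path.split(delimiter)) == depth + 1
    }


# Made-up counts, only to show the filtering.
counts = {"NonFic": 10, "NonFic/Hist": 2, "NonFic/Sci": 8,
          "NonFic/Sci/Phy": 4, "Fic": 3}
print(child_facets(counts))            # top-level categories only
print(child_facets(counts, "NonFic"))  # direct children of NonFic
```

The depth here comes purely from counting separators, so nothing extra has
to be stored in the field values.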

As you're aware, when you query a field indexed using
solr.PathHierarchyTokenizerFactory
you still get the full category path back as a facet value.

For example, if a user navigates to "Phy":
fq={!term f=category}NonFic/Sci/Phy

The facet values that are returned will look like this (made up counts):


  10
  
wrote:

> I realize you want to avoid putting depth details into the field values,
> but something has to imply the depth.  So with that in mind, here is
> another approach (with the assumption that you are chasing down a single
> branch of a tree (and all its subbranch offshoots)),
>
> Use dynamic fields
> Step from one level to the next with a simple increment
> Build the facet for the next level on the call
> The UI needs only know the current level
>
> This would possibly be as so:
>
> step_fieldname_n
>
> With a dynamic field configuration of:
>
> step_*
>
> The content of the step_fieldname_n field would either be the string of
> the field value or the delimited path of the current level (as suited to
> taste).  Either way, most likely a fieldType of String (or some variation
> thereof).
>
> The UI would then call:
>
> facet.field=step_fieldname_n+1
>
> And the UI would need to be aware to carry the n+1 into the fq link
> verbiage:
>
> fq=step_fieldname_n+1:facetvalue
>
> The trick of all of this is that you must build your index with the depth
> of your hierarchy in mind to place the values into the suitable fields.
> You could, of course, write an UpdateProcessor to accomplish this if that
> seems fitting.
>
> Jason
>
> > On Nov 17, 2014, at 12:22 PM, Alexandre Rafalovitch wrote:
> >
> > You might be able to stick in a couple of PatternReplaceFilterFactory
> > in a row with regular expressions to catch different levels.
> >
> > Something like:
> >
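> > <filter class="solr.PatternReplaceFilterFactory"
> >   pattern="^[^0-9][^/]+/[^/]+/[^/]+$" replacement="2$0" />
> > <filter class="solr.PatternReplaceFilterFactory"
> >   pattern="^[^0-9][^/]+/[^/]+$" replacement="1$0" />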
> > ...
> >
> > I did not test this; you may need to escape some things or put explicit
> > groups in there.
> >
> > Regards,
> >   Alex.
> > P.S.
> > http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/pattern/PatternReplaceFilterFactory.html
> >
> > Personal: http://www.outerthoughts.com/ and @arafalov
> > Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
> >
> >
> > On 17 November 2014 15:01, rashmy1 wrote:
> >> Hi Alexandre,
> >> Yes, I've read this post and that's the 'Option1' listed in my initial
> >> post.
> >>
> >> I'm looking to see if Solr has any in-built tokenizer that splits the
> >> tokens and prepends with the depth information. I'd like to avoid
> >> building depth information into the field values if Solr already has
> >> something that can be used.
> >>
> >> Thanks!
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/Hierarchical-faceting-tp4169263p4169536.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
>
>