I'm not sure I fully understand your ultimate goal or Yonik's response. However, in the past I've been able to represent hierarchical data as a simple enumeration of delimited paths:

<field name="taxonomy">root</field>
<field name="taxonomy">root/region</field>
<field name="taxonomy">root/region/north america</field>
<field name="taxonomy">root/region/south america</field>

Then, at response time, you can walk the facet results and build a hierarchy with counts that can be put into a tree view. The tree can be of arbitrary depth, and documents can live in any combination of nodes on the tree.
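
As a rough sketch of that walk, assuming the facet counts have already been read into a plain mapping of path to count (the counts and the function names build_tree and render below are made up for illustration, not part of Solr), something like this would do:

```python
def build_tree(facet_counts, sep="/"):
    """Fold {delimited path: count} pairs into a nested tree of nodes."""
    root = {"count": 0, "children": {}}
    for path, count in facet_counts.items():
        node = root
        for part in path.split(sep):
            # Create each intermediate node on first sight.
            node = node["children"].setdefault(
                part, {"count": 0, "children": {}}
            )
        # The count belongs to the node the full path ends on.
        node["count"] = count
    return root


def render(node, depth=0):
    """Return the tree as indented lines, children sorted by name."""
    lines = []
    for name, child in sorted(node["children"].items()):
        lines.append("  " * depth + f"{name} ({child['count']})")
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical counts for the taxonomy values shown above.
facets = {
    "root": 12,
    "root/region": 9,
    "root/region/north america": 5,
    "root/region/south america": 4,
}
tree = build_tree(facets)
print("\n".join(render(tree)))
```

A path whose parent never appears in the facet still gets a parent node, just with a count of 0, so the tree view stays connected.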

In addition, you can represent any arbitrary name/value pair (attribute/tuple) as a two-level tree. That way, you can put any combination of attributes in the facet and parse them out when you build the results list. For example, you might be indexing computer hardware. Memory, bus speed, and resolution may be valid for some objects but not for others. Just put them in a facet field and use a separator:

<field name="attribute">memory:1GB</field>
<field name="attribute">busspeed:133Mhz</field>
<field name="attribute">voltage:110/220</field>
<field name="attribute">manufacturer:Shiangtsu</field>


When you do a facet query, you can easily display the categories appropriate to each object, and offer facet selections like "show me all green things" and "show me all size 4 things".
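
Parsing the attribute facet back out could look roughly like the following. This is only a sketch: it assumes the facet counts are already in a mapping of term to count, the counts are invented, and group_attributes is a hypothetical helper name.

```python
from collections import defaultdict


def group_attributes(facet_counts, sep=":"):
    """Group {"name:value": count} facet terms by attribute name."""
    grouped = defaultdict(dict)
    for term, count in facet_counts.items():
        # Split on the first separator only, so values may contain it.
        name, _, value = term.partition(sep)
        grouped[name][value] = count
    return dict(grouped)


# Hypothetical counts for attribute values like the ones shown above.
attrs = group_attributes({
    "memory:1GB": 7,
    "memory:2GB": 3,
    "busspeed:133Mhz": 4,
    "voltage:110/220": 10,
    "manufacturer:Shiangtsu": 2,
})

for name in sorted(attrs):
    print(name, attrs[name])
```

Each attribute name then becomes its own category in the display, with only the values that actually occur in the result set.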


Even if that's not your goal, this might help someone else.


George Everitt







On Nov 7, 2007, at 3:15 PM, Kristen Roth wrote:

So, I think I have things set up correctly in my schema, but it doesn't
appear that any logic is being applied to my Category_# fields - they
are being populated with the full string copied from the Category field
(facet1::facet2::facet3...facetn) instead of just facet1, facet2, etc.

I have several different field types, each with a different regex to
match a specific part of the input string.  In this example, I'm
matching facet1 in input string facet1::facet2::facet3...facetn

   <fieldtype name="cat1str" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="solr.PatternTokenizerFactory"
                       pattern="^([^:]+)" group="1"/>
        </analyzer>
   </fieldtype>

I have copyfields set up for each Category_# field. Anything obviously
wrong?

Thanks!
Kristen

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Wednesday, November 07, 2007 9:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Can you parse the contents of a field to populate other
fields?

On 11/6/07, Kristen Roth <[EMAIL PROTECTED]> wrote:
Yonik - thanks so much for your help!  Just to clarify: where should
the regex go for each field?

Each field should have a different FieldType (referenced by the "type"
XML attribute).  Each fieldType can have its own analyzer.  You can
use a different PatternTokenizer (which specifies a regex) for each
analyzer.

-Yonik

