I'm not sure I fully understand your ultimate goal or Yonik's
response. However, in the past I've been able to represent
hierarchical data as a simple enumeration of delimited paths:
<field name="taxonomy">root</field>
<field name="taxonomy">root/region</field>
<field name="taxonomy">root/region/north america</field>
<field name="taxonomy">root/region/south america</field>
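Here is a minimal sketch of the index-time side. It assumes Python and uses hypothetical helper names (`path_prefixes`, `taxonomy_fields`); nothing here is a Solr API. Each full path is expanded into every ancestor path, so a document tagged "root/region/north america" also counts toward "root" and "root/region":

```python
from typing import Iterable, List
from xml.sax.saxutils import escape


def path_prefixes(path: str, sep: str = "/") -> List[str]:
    """Return every ancestor path of `path`, shortest first, including itself."""
    parts = path.split(sep)
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


def taxonomy_fields(paths: Iterable[str], sep: str = "/") -> List[str]:
    """Union of all prefixes of all paths, de-duplicated, in first-seen order."""
    seen: List[str] = []
    for path in paths:
        for prefix in path_prefixes(path, sep):
            if prefix not in seen:
                seen.append(prefix)
    return seen


if __name__ == "__main__":
    paths = ["root/region/north america", "root/region/south america"]
    for value in taxonomy_fields(paths):
        # escape() guards against &, < and > in category names
        print(f'<field name="taxonomy">{escape(value)}</field>')
```

For the two leaf paths in the example, this prints the same four taxonomy field lines shown above.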
Then, at response time, you can walk the facet counts returned for that
field and build a hierarchy with counts that can be put into a tree view.
The tree can be of arbitrary depth, and documents can live in any
combination of nodes on the tree.
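The response-time walk could look roughly like this. The sketch assumes the facet counts for the taxonomy field have already been pulled out of the Solr response into a plain dict of path to count. The counts below are made up for illustration, and `build_tree`/`render` are hypothetical names:

```python
from typing import Dict, List


def new_node() -> dict:
    return {"count": 0, "children": {}}


def build_tree(facet_counts: Dict[str, int], sep: str = "/") -> dict:
    """Turn {"root/region": 9, ...} into nested {"children": {...}} nodes."""
    tree = new_node()
    for path, count in facet_counts.items():
        node = tree
        for part in path.split(sep):
            # setdefault creates any missing intermediate node on the way down
            node = node["children"].setdefault(part, new_node())
        node["count"] = count
    return tree


def render(node: dict, depth: int = 0) -> List[str]:
    """Indented 'name (count)' lines, suitable for a simple tree view."""
    lines: List[str] = []
    for name, child in sorted(node["children"].items()):
        lines.append("  " * depth + f"{name} ({child['count']})")
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    counts = {  # hypothetical facet counts
        "root": 12,
        "root/region": 9,
        "root/region/north america": 5,
        "root/region/south america": 4,
    }
    print("\n".join(render(build_tree(counts))))
```

Because intermediate nodes are created on demand, the order of the facet entries doesn't matter. A node with no facet entry of its own simply shows a count of 0.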
In addition, you can represent any arbitrary name/value pair
(attribute tuple) as a two-level tree. That way, you can put any
combination of attributes in one facet field and parse them out when
you build the results list. For example, you might be indexing computer
hardware. Memory, Bus Speed and Resolution may be valid for some objects
but not for others. Just put them in the facet and specify a separator
between the name and the value:
<field name="attribute">memory:1GB</field>
<field name="attribute">busspeed:133MHz</field>
<field name="attribute">voltage:110/220</field>
<field name="attribute">manufacturer:Shiangtsu</field>
When you do a facet query, you can easily display the categories
appropriate to the object, and do facet selections like "show me all
green things" or "show me all size 4 things".
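Parsing the attribute facet at results-list time is mostly a matter of splitting each value on the first separator, so a value such as "voltage:110/220" keeps its slash. Here is a sketch, again in Python with hypothetical names. The `filter_query` helper assumes the attribute field is an untokenized string field, so a quoted fq value matches the whole name:value term:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_attributes(
    facet_counts: Dict[str, int], sep: str = ":"
) -> Dict[str, List[Tuple[str, int]]]:
    """Group 'name:value' facet entries into {name: [(value, count), ...]}."""
    groups: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for raw, count in facet_counts.items():
        name, found, value = raw.partition(sep)  # split on the FIRST separator only
        if not found:
            continue  # ignore entries that don't follow the name:value convention
        groups[name].append((value, count))
    return {name: sorted(values) for name, values in groups.items()}


def filter_query(name: str, value: str, field: str = "attribute", sep: str = ":") -> str:
    """Build an fq string such as attribute:"memory:1GB" (assumes a string field)."""
    term = f"{name}{sep}{value}".replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{term}"'


if __name__ == "__main__":
    counts = {  # hypothetical facet counts
        "memory:1GB": 3,
        "memory:2GB": 1,
        "busspeed:133MHz": 2,
        "voltage:110/220": 4,
    }
    for name, values in sorted(group_attributes(counts).items()):
        print(name, values)
    print(filter_query("memory", "1GB"))
```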
Even if that's not your goal, this might help someone else.
George Everitt
On Nov 7, 2007, at 3:15 PM, Kristen Roth wrote:
So, I think I have things set up correctly in my schema, but it doesn't
appear that any logic is being applied to my Category_# fields - they
are being populated with the full string copied from the Category field
(facet1::facet2::facet3...facetn) instead of just facet1, facet2, etc.
I have several different field types, each with a different regex to
match a specific part of the input string. In this example, I'm
matching facet1 in the input string facet1::facet2::facet3...facetn:
<fieldtype name="cat1str" class="solr.TextField">
<analyzer type="index">
<tokenizer class="solr.PatternTokenizerFactory"
pattern="^([^:]+)" group="1"/>
</analyzer>
</fieldtype>
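To check that each field type's regex picks out the segment you expect, the per-level patterns can be generated and tried outside Solr. This Python sketch roughly mimics what PatternTokenizerFactory does with group="1", which is to emit group 1 of each match. Python's `re` and Java's regex engine should agree on patterns this simple, but that is an assumption worth confirming on Solr's admin analysis page. `level_pattern` and `extract` are hypothetical helpers. Also worth keeping in mind: an analyzer only changes the indexed terms (what searching and faceting see). The stored value returned for a copyField target is still the full copied string.

```python
import re
from typing import List


def level_pattern(level: int) -> str:
    """Regex whose group 1 is the level-th '::'-separated segment (1-based)."""
    if level < 1:
        raise ValueError("level is 1-based")
    # Skip (level - 1) leading "segment::" pairs, then capture the next segment.
    skip = "" if level == 1 else "(?:[^:]+::){%d}" % (level - 1)
    return "^" + skip + "([^:]+)"


def extract(value: str, level: int) -> List[str]:
    """Roughly what PatternTokenizer with group="1" would emit for `value`."""
    return [m.group(1) for m in re.finditer(level_pattern(level), value)]


if __name__ == "__main__":
    sample = "facet1::facet2::facet3"
    for level in (1, 2, 3):
        print(level, level_pattern(level), extract(sample, level))
```

Level 1 reproduces the ^([^:]+) pattern from the fieldtype above. Higher levels skip the preceding segments with a non-capturing group.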
I have copyField directives set up for each Category_# field. Anything
obviously wrong?
Thanks!
Kristen
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Wednesday, November 07, 2007 9:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Can you parse the contents of a field to populate other fields?
On 11/6/07, Kristen Roth <[EMAIL PROTECTED]> wrote:
Yonik - thanks so much for your help! Just to clarify: where should the
regex go for each field?
Each field should have a different FieldType (referenced by the "type"
XML attribute). Each fieldType can have its own analyzer. You can
use a different PatternTokenizer (which specifies a regex) for each
analyzer.
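Following that description, each Category_# field gets its own field type with its own PatternTokenizer regex, plus a copyField from the source field. A small Python generator makes that one-type-per-field layout explicit. `schema_snippets` is a hypothetical helper, the names follow the thread's Category_#/catNstr convention, and the indexed/stored flags are assumptions to adjust. The three lists belong in the schema's types section, its fields section, and the top-level copyField area respectively.

```python
from typing import Dict, List
from xml.sax.saxutils import escape


def schema_snippets(patterns: List[str], source: str = "Category") -> Dict[str, List[str]]:
    """One fieldType, field and copyField per regex; level n uses patterns[n - 1]."""
    out: Dict[str, List[str]] = {"types": [], "fields": [], "copyFields": []}
    for n, pattern in enumerate(patterns, start=1):
        type_name = f"cat{n}str"
        attr = escape(pattern, {'"': "&quot;"})  # make the regex safe inside an XML attribute
        out["types"].append(
            f'<fieldType name="{type_name}" class="solr.TextField">'
            '<analyzer type="index">'
            f'<tokenizer class="solr.PatternTokenizerFactory" pattern="{attr}" group="1"/>'
            "</analyzer></fieldType>"
        )
        out["fields"].append(
            f'<field name="{source}_{n}" type="{type_name}" indexed="true" stored="true"/>'
        )
        out["copyFields"].append(f'<copyField source="{source}" dest="{source}_{n}"/>')
    return out


if __name__ == "__main__":
    snippets = schema_snippets(["^([^:]+)", "^[^:]+::([^:]+)"])
    for section, lines in snippets.items():
        print(f"<!-- {section} -->")
        print("\n".join(lines))
```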
-Yonik