On 1/9/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
I would like to use faceted browsing to group documents by year,
month, and day.

I don't know what your particular use-case is, but this might be a job
for facet hierarchies.
http://www.nabble.com/Hierarchical-Facets--tf2560327.html#a7135353
No, there isn't anything implemented in Solr yet... but then again,
built-in faceting didn't even exist in Solr until 4 months ago :-)
I'm pretty sure we could handle the computational requirements; it's
more a matter of defining useful generic semantics and the interface.

Option 1:
Add three fields, one for year, month, day.  Something like:

 <field name="addedTime" type="date" indexed="true" stored="true" />
 <field name="addedTimeYEAR" type="string" ... />
 <field name="addedTimeMONTH" type="string" ... />
 <field name="addedTimeDAY" type="string" ... />

then use copyField to generate the various versions:
 <copyField source="addedTime" dest="addedTimeYEAR"/>
 <copyField source="addedTime" dest="addedTimeMONTH"/>
 <copyField source="addedTime" dest="addedTimeDAY"/>

this would somehow convert the original date format for each copy:
 addedTime      = "2007-01-08T21:36:15.635Z"
 addedTimeYEAR  = "2007"
 addedTimeMONTH = "2007-01"
 addedTimeDAY   = "2007-01-08"

Perhaps this requires a custom FieldType for Y/M/D to convert the
larger string to the smaller one.
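
A rough sketch of the conversion itself, whichever component ends up
doing it. DateFacetKeys is a hypothetical helper, not a Solr class; it
assumes the value is always a full ISO-8601 UTC timestamp in the form
shown above, so that each resolution is just a fixed-length prefix:

```java
// Hypothetical helper, not part of Solr: derives the year / month / day
// strings by truncating a timestamp such as "2007-01-08T21:36:15.635Z".
final class DateFacetKeys {

    static String year(String iso)  { return prefix(iso, 4); }   // "2007"
    static String month(String iso) { return prefix(iso, 7); }   // "2007-01"
    static String day(String iso)   { return prefix(iso, 10); }  // "2007-01-08"

    // Returns the first `length` characters, rejecting values too short
    // to contain that resolution.
    private static String prefix(String iso, int length) {
        if (iso == null || iso.length() < length) {
            throw new IllegalArgumentException("not a full timestamp: " + iso);
        }
        return iso.substring(0, length);
    }
}
```

The same three methods could also be called by the indexing client to
fill the extra fields before the document is sent, which would avoid
the custom FieldType entirely.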

pros:
* Can use SimpleFacets directly
cons:
* seems messy, particularly since I have multiple fields that should
have the same behavior.

There's also the question of whether you would really want a breakdown
by each day (if you had, say, 10 years of data) returned to the client.
It starts to be a lot of data.  That's what made me think of a hierarchy
where you could start out at a higher level and drill down.  Of
course, that's possible with simple facets too, I guess (via filtering).
Option 2:
Add an analyzer to the date field that adds multiple Tokens with
various resolutions, then write a custom faceter that knows a string
length of 4=year, 7=month, 10=day.  Or, perhaps it could look at the
token type.

I don't think adding to the same field buys you much (anything?) over
adding to a different field.  In any case, you could also do simple
faceting on this field as-is if your client has knowledge of the
different lengths of strings.
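
As an illustration of what "client has knowledge of the different
lengths" could look like, here is a sketch that splits one field's
facet counts into year / month / day buckets.  FacetByResolution is a
hypothetical name, and the input is assumed to be a plain map of
term -> count as the client has already parsed it from the response:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client-side helper: groups facet terms from a single
// multi-resolution field by the length of the term.
final class FacetByResolution {

    // Maps a term length to its resolution, or null for anything else
    // (for example the full original timestamp).
    static String levelFor(int length) {
        switch (length) {
            case 4:  return "year";   // "2007"
            case 7:  return "month";  // "2007-01"
            case 10: return "day";    // "2007-01-08"
            default: return null;
        }
    }

    // Splits term -> count into level -> (term -> count), keeping the
    // order in which the terms were returned.
    static Map<String, Map<String, Integer>> split(Map<String, Integer> counts) {
        Map<String, Map<String, Integer>> out = new LinkedHashMap<>();
        out.put("year", new LinkedHashMap<>());
        out.put("month", new LinkedHashMap<>());
        out.put("day", new LinkedHashMap<>());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String level = levelFor(e.getKey().length());
            if (level != null) {
                out.get(level).put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

Note that this still pulls every day-level term to the client before
grouping, so it does not help with the volume concern above.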

schema.xml:

  <fieldtype name="fdate" class="solr.DateField">
    <analyzer type="index" class="...DateFacetAnalyzer"/>
  </fieldtype>

DateFacetAnalyzer:
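 // Token( text, startOffset, endOffset, type ): the first argument is
 // the term text itself, so each resolution gets its own substring.
 Token t = new Token( date, 0, date.length(), "original" );
 t.setPositionIncrement( 0 );
 tokens.add( t );

 // year, e.g. "2007"
 t = new Token( date.substring( 0, 4 ), 0, 4, "year" );
 t.setPositionIncrement( 0 );
 tokens.add( t );

 // month, e.g. "2007-01"
 t = new Token( date.substring( 0, 7 ), 0, 7, "month" );
 t.setPositionIncrement( 0 );
 tokens.add( t );

 ...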

pros:
* simple / reusable
cons:
* I don't fully understand how it would affect search & sorting

Any thoughts / pointers / advice?

thanks
ryan


-Yonik
