On 1/9/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:
I would like to use faceted browsing to group documents by year, month, and day.
I don't know what your particular use case is, but this might be a job for facet hierarchies:
http://www.nabble.com/Hierarchical-Facets--tf2560327.html#a7135353

No, there isn't anything implemented in Solr yet... but then again, built-in faceting didn't even exist in Solr until 4 months ago :-) I'm pretty sure we could handle the computational requirements; it's more a matter of defining useful generic semantics and the interface.
Option 1: Add three fields, one each for year, month, and day. Something like:

  <field name="addedTime" type="date" indexed="true" stored="true" />
  <field name="addedTimeYEAR" type="string" ... />
  <field name="addedTimeMONTH" type="string" ... />
  <field name="addedTimeDAY" type="string" ... />

then use copyField to generate the various versions:

  <copyField source="addedTime" dest="addedTimeYEAR"/>
  <copyField source="addedTime" dest="addedTimeMONTH"/>
  <copyField source="addedTime" dest="addedTimeDAY"/>

This would somehow convert the original date format for each copy:

  addedTime      = "2007-01-08T21:36:15.635Z"
  addedTimeYEAR  = "2007"
  addedTimeMONTH = "2007-01"
  addedTimeDAY   = "2007-01-08"

Perhaps this requires a custom FieldType for Y/M/D to convert the larger string to the smaller one.

pros:
 * Can use SimpleFacets directly
cons:
 * Seems messy, particularly since I have multiple fields I'd like to have the same behavior.
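As far as I know, copyField passes the source value along unchanged, so the truncation in Option 1 has to happen somewhere else. One possibility, sketched here only as an illustration, is to compute the three strings in the indexing client before the document is sent. The class and method names (DateFacetFields, facetValues) are hypothetical, and the sketch assumes the timestamp always arrives in the "yyyy-MM-ddTHH:mm:ss.SSSZ" form shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DateFacetFields {

    /**
     * Hypothetical helper: derives the YEAR / MONTH / DAY field values from an
     * ISO-8601 timestamp string by taking fixed-length prefixes.
     * Assumes the input starts with "yyyy-MM-dd".
     */
    public static Map<String, String> facetValues(String fieldName, String isoTimestamp) {
        if (isoTimestamp == null || isoTimestamp.length() < 10) {
            throw new IllegalArgumentException("expected a value starting with yyyy-MM-dd: " + isoTimestamp);
        }
        Map<String, String> values = new LinkedHashMap<>();
        values.put(fieldName + "YEAR", isoTimestamp.substring(0, 4));   // "2007"
        values.put(fieldName + "MONTH", isoTimestamp.substring(0, 7));  // "2007-01"
        values.put(fieldName + "DAY", isoTimestamp.substring(0, 10));   // "2007-01-08"
        return values;
    }

    public static void main(String[] args) {
        // prints {addedTimeYEAR=2007, addedTimeMONTH=2007-01, addedTimeDAY=2007-01-08}
        System.out.println(facetValues("addedTime", "2007-01-08T21:36:15.635Z"));
    }
}
```

Because the field name is a parameter, the same helper covers every date field that needs this behavior, which addresses part of the "messy" concern at the cost of moving the logic out of the schema.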
There's also the question of whether you would really want a breakdown by each day returned to the client (if you had 10 years of data, say). It starts to be a lot of data. That's what made me think of a hierarchy where you could start out at a higher level and drill down. Of course, that's possible with simple facets too, I guess (via filtering).
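The drill-down via filtering mentioned above could look roughly like the sketch below: facet on the year field first, then add a filter on the chosen year and facet on the month field, and so on. It only assembles the faceting part of the request, using the facet, facet.field and fq parameters; the class and method names are hypothetical, and the field names are the ones from Option 1.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FacetDrillDown {

    /**
     * Hypothetical helper: builds the faceting parameters for one drill-down
     * level. filterField/filterValue may be null for the top level.
     */
    public static String facetParams(String facetField, String filterField, String filterValue) {
        StringBuilder params = new StringBuilder("facet=true&facet.field=").append(enc(facetField));
        if (filterField != null && filterValue != null) {
            // Restrict the result set to the level chosen so far.
            params.append("&fq=").append(enc(filterField + ":" + filterValue));
        }
        return params.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Level 1: counts per year.
        System.out.println(facetParams("addedTimeYEAR", null, null));
        // Level 2: counts per month within 2007.
        System.out.println(facetParams("addedTimeMONTH", "addedTimeYEAR", "2007"));
        // Level 3: counts per day within January 2007.
        System.out.println(facetParams("addedTimeDAY", "addedTimeMONTH", "2007-01"));
    }
}
```

Each request returns at most one level's worth of values (a handful of years, 12 months, or about 30 days), so the client never receives the full ten-year daily breakdown in one response.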
Option 2: Add an analyzer to the date field that adds multiple Tokens with various resolutions, then write a custom faceter that knows a string length of 4=year, 7=month, 10=day. Or, perhaps it could look at the token type.
I don't think adding to the same field buys you much (anything?) over adding to a different field. In any case, you could also do simple faceting on this field as-is if your client has knowledge of the different lengths of strings.
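To illustrate the "client has knowledge of the different lengths" idea: if simple faceting on the multi-resolution field returns all terms mixed together, the client could sort them into levels by string length. This is only a sketch; the class and method names are hypothetical, and the input map stands in for whatever term/count pairs the facet response contains.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class FacetLengthSplitter {

    /** Maps a term length to a resolution name, or null if it is not one we facet on. */
    static String levelFor(int length) {
        switch (length) {
            case 4:  return "year";   // "2007"
            case 7:  return "month";  // "2007-01"
            case 10: return "day";    // "2007-01-08"
            default: return null;     // e.g. the full original timestamp
        }
    }

    /** Splits mixed facet counts into per-resolution maps, each sorted by term. */
    public static Map<String, Map<String, Integer>> split(Map<String, Integer> counts) {
        Map<String, Map<String, Integer>> byLevel = new LinkedHashMap<>();
        byLevel.put("year", new TreeMap<>());
        byLevel.put("month", new TreeMap<>());
        byLevel.put("day", new TreeMap<>());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String level = levelFor(e.getKey().length());
            if (level != null) {
                byLevel.get(level).put(e.getKey(), e.getValue());
            }
        }
        return byLevel;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("2007", 12);
        counts.put("2007-01", 12);
        counts.put("2007-01-08", 5);
        counts.put("2007-01-09", 7);
        counts.put("2007-01-08T21:36:15.635Z", 1);
        System.out.println(split(counts));
    }
}
```

The downside is the one raised earlier: the server still returns every day-level term in the response, so this only helps when the data set is small.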
schema.xml:

  <fieldtype name="fdate" class="solr.DateField">
    <analyzer type="index" class="...DateFacetAnalyzer"/>
  </fieldtype>

DateFacetAnalyzer:

  // Full-resolution token.
  Token t = new Token( date, 0, date.length(), "original" );
  t.setPositionIncrement( 0 );
  tokens.add( t );

  // Assuming Lucene's Token( text, start, end, type ) constructor, the first
  // argument is the term text, so pass the truncated string, not the full date.
  t = new Token( date.substring( 0, 4 ), 0, 4, "year" );
  t.setPositionIncrement( 0 );
  tokens.add( t );

  t = new Token( date.substring( 0, 7 ), 0, 7, "month" );
  t.setPositionIncrement( 0 );
  tokens.add( t );
  ...

pros:
 * simple / reusable
cons:
 * I don't fully understand how it would affect search & sorting

Any thoughts / pointers / advice?

thanks
ryan
-Yonik