: The right approach for more flexible date parsing is probably to add
: more functionality to the date field and configure via optional
: attributes.

Adding configuration options to DateField seems like it might ultimately
be the right choice for changing the *internal* format, but assuming we
want to keep the internal representation of DateField fixed and
unconfigurable for the time being and address the various *external*
formatting issues, I imagine the simplest way to tackle this (in a way
that is consistent with the other datatypes) would be...

1) change DateField to support Analyzers.  That way you could have
separate analyzers for indexing vs querying just like a text field (so you
could, for example, send Solr seconds since epoch when indexing, and
query using MM/DD/YYYY)

The Analyzers used would be responsible for producing Tokens which match
the values the current DateField.toInternal() already considers legal
(either a DateMath string or an ISO 8601 string).
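To make the index-vs-query split concrete, here is a rough sketch of the
conversion each analyzer would have to perform. ExternalDateFormats and its
two methods are hypothetical names, not existing Solr or Lucene code, and it
assumes the ISO 8601 form wanted is a UTC instant such as
1970-01-01T00:00:00Z and that a bare date means midnight UTC.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Solr or Lucene.
final class ExternalDateFormats {
    private static final DateTimeFormatter US_DATE =
        DateTimeFormatter.ofPattern("MM/dd/uuuu");

    // Index side: seconds since epoch -> ISO 8601 instant in UTC.
    static String fromEpochSeconds(String token) {
        Instant instant = Instant.ofEpochSecond(Long.parseLong(token.trim()));
        return DateTimeFormatter.ISO_INSTANT.format(instant);
    }

    // Query side: MM/DD/YYYY -> ISO 8601 instant.
    // Assumption: a bare date is taken to mean midnight UTC of that day.
    static String fromUsDate(String token) {
        LocalDate day = LocalDate.parse(token.trim(), US_DATE);
        return DateTimeFormatter.ISO_INSTANT.format(
            day.atStartOfDay(ZoneOffset.UTC).toInstant());
    }
}
```

Both methods end up producing the same kind of string, which is the point:
the field type only ever sees one format, whatever the client sent.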

(In general a "DateTranslatingTokenFilter" class would be a pretty cool
addition to Lucene. It could take as constructor args two DateFormatters
(one for parsing the incoming tokens, and one for formatting the outgoing
tokens) and a boolean indicating whether its job was to replace matching
tokens or inject duplicate tokens in the same position ... maybe another
option indicating whether incoming Tokens that can't be parsed should be
stripped or passed through ... the idea being that for something like
DateField you would use KeywordTokenizer along with an instance of this to
parse whatever format you wanted -- but when parsing generic text you
might have several of these TokenFilters configured with different
DateFormatters so if they see a Token in the text that matches a known
DateFormat they could inject the name of the month, or the day of the week
into the text at the same position.)
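A minimal sketch of that filter's core logic, deliberately written without
any Lucene dependency so it stands alone. Tok is a simplified stand-in for a
Lucene Token (text plus position increment), and DateTranslator, its
constructor arguments and translate() are all hypothetical. A real version
would extend Lucene's TokenFilter. It also assumes date-only formats, so it
parses into a LocalDate.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a Lucene Token: text plus position increment
// (0 means "same position as the previous token").
final class Tok {
    final String text;
    final int posInc;
    Tok(String text, int posInc) { this.text = text; this.posInc = posInc; }
}

// Hypothetical sketch of the translation logic, not an actual Lucene class.
final class DateTranslator {
    private final DateTimeFormatter in;   // parses incoming tokens
    private final DateTimeFormatter out;  // formats outgoing tokens
    private final boolean replace;        // true: replace; false: inject duplicate
    private final boolean dropUnparsed;   // true: strip tokens that don't parse

    DateTranslator(DateTimeFormatter in, DateTimeFormatter out,
                   boolean replace, boolean dropUnparsed) {
        this.in = in;
        this.out = out;
        this.replace = replace;
        this.dropUnparsed = dropUnparsed;
    }

    List<Tok> translate(List<Tok> tokens) {
        List<Tok> result = new ArrayList<>();
        int skipped = 0; // position increments of dropped tokens
        for (Tok t : tokens) {
            String translated;
            try {
                translated = out.format(LocalDate.parse(t.text, in));
            } catch (DateTimeParseException e) {
                if (dropUnparsed) {
                    skipped += t.posInc; // keep later positions correct
                } else {
                    result.add(new Tok(t.text, t.posInc + skipped));
                    skipped = 0;
                }
                continue;
            }
            if (replace) {
                result.add(new Tok(translated, t.posInc + skipped));
            } else {
                result.add(new Tok(t.text, t.posInc + skipped));
                result.add(new Tok(translated, 0)); // same position as original
            }
            skipped = 0;
        }
        return result;
    }
}
```

For the DateField case you would construct it with replace=true and feed it
the single token from KeywordTokenizer. For generic text you would use
replace=false with an output formatter such as "EEEE" or "MMMM", so the day
of the week or month name lands at the same position as the original token.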


2) add options to the various QueryResponseWriters to control which format
they use when writing fields out ... in the case of XmlResponseWriter it
would still produce a <date> tag, but the value would be formatted
according to the configuration.
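Roughly what that could look like on the output side, as a sketch only.
DateTagWriter and its pattern argument are hypothetical (no such option
exists in the response writers today), and it assumes the stored value
is an ISO 8601 UTC instant and that the configured pattern should be
rendered in UTC.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical sketch, not an existing Solr class.
final class DateTagWriter {
    private final DateTimeFormatter external;

    // pattern: the configured output format, e.g. "MM/dd/uuuu"
    DateTagWriter(String pattern) {
        this.external = DateTimeFormatter.ofPattern(pattern)
                                         .withZone(ZoneOffset.UTC);
    }

    // Still emits a <date> tag; only the value changes with the config.
    // Note: no XML escaping is done, so the pattern is assumed to produce
    // no characters such as '<' or '&'.
    String write(String internalIso) {
        Instant instant = Instant.parse(internalIso);
        return "<date>" + external.format(instant) + "</date>";
    }
}
```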


-Hoss
