Regarding Hoss's points about the internal format, resolution of
date-times, etc.: maybe a good starting point would be to implement the
date-time algorithms of XML Schema
(http://www.w3.org/TR/xmlschema-2/#isoformats), where these behaviors
are spelled out in reasonably precise terms. There must be code
somewhere that Solr could steal to help with this. This would mesh well
with XSLT 2.0, and presumably other modern XML environments.
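As a rough sketch of the kind of existing code that could be reused: the JDK already ships an XML Schema date/time parser in javax.xml.datatype. The class name XsdDateTime is hypothetical. Requiring an explicit timezone, so that every value normalizes cleanly to UTC, is an assumption of this sketch and not something XML Schema mandates for xsd:dateTime.

```java
import java.time.Instant;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeConstants;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

/** Hypothetical helper: parse an xsd:dateTime lexical value into a UTC instant. */
final class XsdDateTime {
    private static final DatatypeFactory FACTORY;

    static {
        try {
            FACTORY = DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static Instant parse(String lexical) {
        // Throws IllegalArgumentException if the string is not a valid XML Schema date/time lexical value
        XMLGregorianCalendar cal = FACTORY.newXMLGregorianCalendar(lexical.trim());
        // newXMLGregorianCalendar accepts every XML Schema date/time type, so insist on dateTime
        if (!DatatypeConstants.DATETIME.equals(cal.getXMLSchemaType())) {
            throw new IllegalArgumentException("Not an xsd:dateTime: " + lexical);
        }
        // Assumption for this sketch: require an explicit offset so the result is unambiguous in UTC
        if (cal.getTimezone() == DatatypeConstants.FIELD_UNDEFINED) {
            throw new IllegalArgumentException("xsd:dateTime without timezone: " + lexical);
        }
        // The GregorianCalendar carries the parsed offset, so toInstant() yields the UTC instant
        return cal.toGregorianCalendar().toInstant();
    }
}
```

With this, "2007-03-15T02:41:52+02:00" and "2007-03-15T00:41:52Z" parse to the same instant, which is the kind of normalization the XML Schema spec spells out.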

peter

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 10, 2007 12:30 PM
To: solr-user@lucene.apache.org
Subject: Re: dates & times


: It's more than string processing, anyway. I would want to convert the
: Solr time 2007-03-15T00:41:52Z to "March 15th, 2007" in a web app.
: I'd also like to say "Posted 3 days ago." In my vision of things,
: that work is done on Solr's side (the former case with a strftime-type
: formatter in solrconfig, the latter by having strftime return the day
: number this year).

One of the early architecture/design principles of the Solr "search"
APIs was "compute secondary info about a result if it's more efficient
or easier to compute in Solr than it would be for a client to do it."
DocSet caches, facet counts, and sorting/pagination are great examples
of things where Solr can do less "work" to get the same info out of raw
data than a client app would, both because of its low-level access to
the data and because of how much data would need to go over the wire for
the client to do the same computation. That's largely just a bit of
historical trivia, however; Solr has a lot of features now which might
not hold up to that yardstick. I mention it only to clarify one of the
reasons Solr didn't have more "configurable" date formatting to start
with.
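To make that yardstick concrete, here is roughly what the two examples from the quoted message cost a Java client once it has the date string. This is a sketch using java.time (Java 8+, which postdates this thread); the class and method names are hypothetical, and the display time zone is an explicit parameter because 00:41 UTC on March 15 is still March 14 in US time zones.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

/** Hypothetical client-side display helpers for Solr date strings. */
final class SolrDateDisplay {
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("MMMM", Locale.ENGLISH);

    /** e.g. "2007-03-15T00:41:52Z" in UTC -> "March 15th, 2007". */
    static String longDate(String solrDate, ZoneId zone) {
        LocalDate d = Instant.parse(solrDate).atZone(zone).toLocalDate();
        int day = d.getDayOfMonth();
        return MONTH.format(d) + " " + day + ordinalSuffix(day) + ", " + d.getYear();
    }

    /** English ordinal suffix: 1st, 2nd, 3rd, 4th, ..., 11th, 12th, 13th, ..., 21st. */
    static String ordinalSuffix(int day) {
        if (day >= 11 && day <= 13) return "th";
        switch (day % 10) {
            case 1: return "st";
            case 2: return "nd";
            case 3: return "rd";
            default: return "th";
        }
    }

    /** "Posted N days ago", counting calendar days between the post date and today in the display zone. */
    static String postedAgo(String solrDate, LocalDate today, ZoneId zone) {
        LocalDate posted = Instant.parse(solrDate).atZone(zone).toLocalDate();
        long days = ChronoUnit.DAYS.between(posted, today);
        return days == 1 ? "Posted 1 day ago" : "Posted " + days + " days ago";
    }
}
```

Neither computation needs anything beyond the returned date string, which is why it is arguably fine to leave this on the client side.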

It has been on the TaskList since the start of incubation, however:

  * a DateTime field (or Query Parser extension) that allows flexible
    input for easier human entered queries
  * allow alternate format for date output to ease client creation of
    date objects?
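For the first TaskList item, one simple shape for "flexible input" is a list of accepted patterns tried in order. This is only a sketch: the class name, the particular patterns, and the choice to map every date to midnight UTC are all hypothetical assumptions, and java.time is Java 8+.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical lenient parser for human-entered query dates. */
final class LenientDateInput {
    // Hypothetical accepted formats, tried in this order
    private static final List<DateTimeFormatter> FORMATS = Arrays.asList(
        DateTimeFormatter.ISO_LOCAL_DATE,                            // 2007-05-10
        DateTimeFormatter.ofPattern("MMMM d, uuuu", Locale.ENGLISH), // May 10, 2007
        DateTimeFormatter.ofPattern("d MMM uuuu", Locale.ENGLISH),   // 10 May 2007
        DateTimeFormatter.ofPattern("M/d/uuuu", Locale.ENGLISH)      // 5/10/2007 (US order)
    );

    /** Returns midnight UTC for the first format that matches, or throws if none do. */
    static Instant parseDay(String input) {
        for (DateTimeFormatter f : FORMATS) {
            try {
                return LocalDate.parse(input.trim(), f).atStartOfDay(ZoneOffset.UTC).toInstant();
            } catch (DateTimeParseException ignored) {
                // not this format; try the next candidate
            }
        }
        throw new IllegalArgumentException("Unrecognized date: " + input);
    }
}
```

Ordering matters once ambiguous formats are allowed (5/10/2007 is May 10 in US order but October 5 elsewhere), which is part of why this is harder than it looks.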

One of the reasons I don't think anyone has tackled them yet is that
it's hard to get a holistic view of a solution, because date formatting
really involves several loosely related problems:

The first is a discussion of the "internal format" and what resolution
the dates are stored at in the index itself. If you *know* that you
never plan on querying with anything more fine-grained than day
resolution, storing your dates with only day resolution can make your
index a lot smaller (fewer unique terms) and make date searches a lot
faster. With the current DateField the same performance benefits can be
achieved by "rounding" your dates before indexing them, but if we were
to make automatic rounding a config option on DateField itself, we would
need to take this info into account when parsing updates. Should the
client be expected to know what precision each date field uses? Do they
send dates expressed using the "internal" format, or as fully qualified
times? Is it an error/warning to attempt to index more datetime
precision than a field supports?
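Client-side rounding before indexing can be sketched in a few lines. The class name and the `strict` flag are hypothetical; the flag just shows the two answers to the last question above: silently truncate, or reject input that is more precise than the field stores. java.time is Java 8+.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/** Hypothetical pre-indexing step for a field stored at day resolution. */
final class DayResolution {
    /**
     * Rounds a fully qualified UTC timestamp down to day resolution.
     * If strict is true, input carrying finer precision is rejected instead of truncated.
     */
    static String toIndexed(String solrDate, boolean strict) {
        Instant t = Instant.parse(solrDate);
        Instant day = t.truncatedTo(ChronoUnit.DAYS); // drop hours, minutes, seconds, fractions
        if (strict && !day.equals(t)) {
            throw new IllegalArgumentException("Field stores day resolution; got " + solrDate);
        }
        return day.toString(); // ISO-8601 in UTC, e.g. 2007-03-15T00:00:00Z
    }
}
```

Either way the client has to know the field's precision, which is exactly the coupling described above.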

The second is a discussion of the "external format" (which seems to be
what you are mostly discussing). The most trivial way to address this
would be options on the ResponseWriters that allow them to be configured
with date format pattern strings they would use to process any date they
return, but that raises questions about the query parsing aspect as
well: should date formatting be a property of the response, or a
property of the request, such that both input and output formats are
identical?
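If the format were a property of the request, one object could serve both directions. The sketch below is a hypothetical class, not an existing Solr API; it assumes all external dates are in UTC and that the configured pattern includes a time of day (a date-only pattern would fail to parse into a full timestamp).

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical per-request external date format used for both query input and response output. */
final class ExternalDateFormat {
    private final DateTimeFormatter formatter;

    ExternalDateFormat(String pattern) {
        // Assumption: external dates are interpreted and rendered in UTC
        this.formatter = DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH).withZone(ZoneOffset.UTC);
    }

    /** Request side: client-supplied string -> internal instant. */
    Instant parse(String external) {
        return ZonedDateTime.parse(external, formatter).toInstant();
    }

    /** Response side: internal instant -> client-facing string. */
    String format(Instant internal) {
        return formatter.format(internal);
    }
}
```

Because the same formatter handles both directions, a date the client receives can be sent back unchanged in a later query.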

The third is that the internal format and the external format can't be
treated as completely independent. It's tempting to say that there will
be a clean abstraction between the two: all client interaction will be
done using configured "external" formatter(s) to create internal Java
Date objects, which will then be translated back to Strings by an
"internal" formatter for the purpose of indexing (and querying). But
what happens when a query expresses a date range too precise for the
granularity of the internal format? Do we match nothing? Everything?
And what if the indexed granularity is *more* precise than the query
granularity? How do we know whether a range query between March 6, 2007
and May 10, 2007 on a field that stores millisecond granularity is
supposed to go from the first millisecond of each day or the last?
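The ambiguity in that last example can be made explicit by forcing the caller to choose. The helper, the `posted` field name, and the EndPolicy enum below are hypothetical; the output uses Solr's inclusive `field:[a TO b]` range syntax, and dates are assumed to be UTC days.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Hypothetical expansion of a day-granularity range onto a millisecond-granularity field. */
final class DayRange {
    /** Which millisecond of the end day the (inclusive) upper bound should be. */
    enum EndPolicy { START_OF_END_DAY, END_OF_END_DAY }

    static String toRangeQuery(String field, LocalDate from, LocalDate to, EndPolicy policy) {
        // Lower bound: first millisecond of the start day
        Instant lower = from.atStartOfDay(ZoneOffset.UTC).toInstant();
        Instant upper = policy == EndPolicy.END_OF_END_DAY
            ? to.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant().minusMillis(1) // last ms of 'to'
            : to.atStartOfDay(ZoneOffset.UTC).toInstant();                          // first ms of 'to'
        return field + ":[" + lower + " TO " + upper + "]";
    }
}
```

For March 6 to May 10, 2007, END_OF_END_DAY yields `posted:[2007-03-06T00:00:00Z TO 2007-05-10T23:59:59.999Z]`, while START_OF_END_DAY stops at `2007-05-10T00:00:00Z` and so excludes almost all of May 10.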



Questions like these are why I'm glad Solr currently keeps it simple
and makes people deal in absolutes... less room for confusion  :)


-Hoss
