Perhaps my answer is useless, because I don't have an answer to your direct question, but you *might* want to consider whether your concept of a Solr document is at the right level of granularity.

As far as I know, the problem you posted could be tackled by defining a document as a 'sub-event' with only one daterange. Each event-doc you have now would be replaced by several sub-event docs. Each sub-event doc also gets an additional field, 'parent-eventid', which maps to something like the event-id you're probably already using, so several sub-event docs can point to the same event-id. Lastly, every sub-event doc belonging to a particular event carries all the other fields you may have stored in that event-doc.

Now you can query for events based on date ranges as you envisioned, but instead of returning events you return sub-event docs. Since all data of the original event (except the multiple dateranges) is available in the sub-event doc, this shouldn't really bother the client. If you need to display all dates of an event (the only info missing from the returned Solr doc), you could easily store them in an RDB and fetch them using the parent-eventid.

The only caveat I see is that multiple sub-events with the same 'parent-eventid' might be returned for a particular query. Whether that happens depends on the type of queries you envision:

1) If you always issue queries with date filters, and *assuming* that the sub-events of a particular event don't overlap in time, a single-date query will never return multiple sub-events of the same event. (A query range wide enough to span two dateranges could still match both.)

2) If 1) doesn't hold, and assuming you *do* mind getting multiple sub-events of the same actual event, you could try Field Collapsing on 'parent-eventid' to return only the first sub-event per parent-eventid that matches the rest of your query. (Note, however, that Field Collapsing is only available as a patch at the moment:
http://wiki.apache.org/solr/FieldCollapsing)

Not sure if this helped you at all, but at the very least it was a nice conceptual exercise ;-)

Cheers,
Geert-Jan

2010/6/22 Mark Allan <mark.al...@ed.ac.uk>

> Hi all,
>
> Firstly, I apologise for the length of this email but I need to describe
> properly what I'm doing before I get to the problem!
>
> I'm working on a project just now which requires the ability to store and
> search on temporal coverage data - ie. a field which specifies a date range
> during which a certain event took place.
>
> I hunted around for a few days and couldn't find anything which seemed to
> fit, so I had a go at writing my own field type based on solr.PointType.
> It's used as follows:
> schema.xml
> <fieldType name="temporal" class="solr.TemporalCoverage"
> dimension="2" subFieldSuffix="_i"/>
> <field name="daterange" type="temporal" indexed="true" stored="true"
> multiValued="true"/>
> data.xml
> <add>
> <doc>
> ...
> <field name="daterange">1940,1945</field>
> </doc>
> </add>
>
> Internally, this gets stored as:
> <arr name="daterange"><str>1940,1945</str></arr>
> <int name="daterange_0_i">19400000</int>
> <int name="daterange_1_i">19450000</int>
>
> In due course, I'll declare the subfields as a proper date type, but in the
> meantime, this works absolutely fine. I can search for an individual date
> and Solr will check (queryDate > daterange_0 AND queryDate < daterange_1 )
> and the correct documents are returned. My code also allows the user to
> input a date range in the query but I won't complicate matters with that
> just now!
>
> The problem arises when a document has more than one "daterange" field
> (imagine a news broadcast which covers a variety of topics and hence time
> periods).
>
> A document with two daterange fields
> <doc>
> ...
> <field name="daterange">19820402,19820614</field>
> <field name="daterange">1990,2000</field>
> </doc>
> gets stored internally as
> <arr
> name="daterange"><str>19820402,19820614</str><str>1990,2000</str></arr>
> <arr name="daterange_0_i"><int>19820402</int><int>19900000</int></arr>
> <arr name="daterange_1_i"><int>19820614</int><int>20000000</int></arr>
>
> In this situation, searching for 1985 should yield zero results as it is
> contained within neither daterange, however, the above document is returned
> in the result set. What Solr is doing is checking that the queryDate (1985)
> is greater than *any* of the values in daterange_0 AND queryDate is less
> than *any* of the values in daterange_1.
>
> How can I get Solr to respect the positions of each item in the daterange_0
> and _1 arrays? Ideally I'd like the search to use the following logic, thus
> preventing the above document from being returned in a search for 1985:
> (queryDate > daterange_0[0] AND queryDate < daterange_1[0]) OR
> (queryDate > daterange_0[1] AND queryDate < daterange_1[1])
>
> Someone else had a very similar problem recently on the mailing list with a
> multiValued PointType field but the thread went cold without a final
> solution.
>
> While I could filter the results when they get back to my application
> layer, it seems like it's not really the right place to do it.
>
> Any help getting Solr to respect the positions of items in arrays would be
> very gratefully received.
>
> Many thanks,
> Mark
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
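
P.S. To make the sub-event idea from the top of this reply a bit more concrete, here is a rough Python sketch of the reshaping step, using the two dateranges from the example document quoted above. It is only an illustration under assumptions: the 'id', 'title', 'start' and 'end' names and the 'news-42' value are made up, and search() is a plain-Python stand-in for a date filter followed by collapsing on 'parent-eventid', not a call into Solr.

```python
from typing import Dict, List


def split_into_sub_events(event: Dict) -> List[Dict]:
    """Turn one event with N dateranges into N sub-event docs.

    Every sub-event copies the event's other fields and points back to
    the original event through 'parent-eventid'.
    """
    shared = {k: v for k, v in event.items() if k not in ("id", "dateranges")}
    return [
        {
            **shared,
            "id": f"{event['id']}-{i}",      # hypothetical sub-event id scheme
            "parent-eventid": event["id"],
            "start": start,                  # hypothetical field names
            "end": end,
        }
        for i, (start, end) in enumerate(event["dateranges"])
    ]


def search(docs: List[Dict], query_date: int) -> List[Dict]:
    """Keep the first matching sub-event per parent-eventid.

    The date test mirrors (queryDate > start AND queryDate < end); the
    per-parent dict imitates what collapsing on 'parent-eventid' would do.
    """
    first_per_parent: Dict[str, Dict] = {}
    for doc in docs:
        if doc["start"] < query_date < doc["end"]:
            first_per_parent.setdefault(doc["parent-eventid"], doc)
    return list(first_per_parent.values())


# The two-daterange document from the quoted mail, as yyyymmdd integers.
event = {
    "id": "news-42",                         # made-up event id
    "title": "Evening broadcast",            # made-up extra field
    "dateranges": [(19820402, 19820614), (19900000, 20000000)],
}
sub_events = split_into_sub_events(event)

hits_1985 = search(sub_events, 19850000)     # falls between the two ranges
hits_1982 = search(sub_events, 19820501)     # falls inside the first range
```

Because each sub-event holds exactly one start/end pair, the date test can no longer mix the start of one range with the end of another, which is the mismatch described in the quoted mail.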