Sorry for the somewhat lengthy post. I would like to make clear that I have covered my bases here and am looking for an alternative solution, because the more trivial solutions don't seem to work for my use case.
Consider bars, museums, etc. These places have multiple opening hours that can depend on:

REQ 1. day of week
REQ 2. special days on which they are closed, or otherwise have different opening hours than their usual 'day of week'

Now, I want to model these 'places' in a way that lets me do temporal queries like:
- which bars are open NOW (and stay open for at least another 3 hours)
- which museums are (already) open at 25-12-2011 10AM and stay open until (at least) 3PM.

I believe having opening/closing hours available for each date gives me the data needed to answer the queries above. (Note that having dayOfWeek*openinghours is not enough, because of the special cases in REQ 2.)

Okay, knowing I need openinghours*dates for each place, how would I format this in documents?

OPTION A)
-----------
Considering granularity: I want documents to represent Places and not Places*dates. Although the latter would trivially allow me to do the querying mentioned above, it has these disadvantages:
- The same place is returned multiple times (each with a different date) when queries are not constrained by date.
- Lots of data needs to be duplicated, all for the conceptually 'simple' functionality of needing multiple date ranges. It feels bad, and a simpler solution should exist?
- It explodes the resultset (documents = say, 100 dates * 1.000.000 places = 100.000.000). Suddenly the size of the resultset goes from 'easily doable' to 'hmmm I have to think about this'. Given that places also have some other fields to sort on, Lucene fieldcache memory usage would explode by a factor of 100.

OPTION B)
----------
Another, faulty, option would be to model opening/closing hours in 2 multivalued date fields, i.e. open and close,
and insert an open/close pair for each day, e.g.:

open: 2011-11-08:1800 - close: 2011-11-09:0300
open: 2011-11-09:1700 - close: 2011-11-10:0500
open: 2011-11-10:1700 - close: 2011-11-11:0300

Queries would then be of the form:

'open < now && close > now+3h'

But since there is no way to indicate that 'open' and 'close' are pairwise related, I will get a lot of false positives. E.g. the above document would be returned for:

open < 2011-11-09:0100 && close > 2011-11-09:0600

because SOME open date is before 2011-11-09:0100 (i.e. 2011-11-08:1800) and SOME close date is after 2011-11-09:0600 (for example 2011-11-11:0300), but these open and close dates are not pairwise related.

OPTION C) The best of what I have now:
---------------------------------------
I have been thinking about a totally different approach using Solr dynamic fields, in which each and every opening and closing date gets its own dynamic field, e.g.:

_date_2011-11-08_open: 1800
_date_2011-11-09_close: 0300
_date_2011-11-09_open: 1700
_date_2011-11-10_close: 0500
_date_2011-11-10_open: 1700
_date_2011-11-11_close: 0300

The client then has to know the date to query, and thus the correct fields to query. This would solve the problem, since start date and end date are now pairwise related, but I fear this can be a big issue from a performance standpoint (especially memory consumption of the Lucene fieldcache).

IDEAL OPTION D)
----------------
I'm pretty sure this does not exist out of the box, but perhaps Solr could be extended to support it. Okay, Solr has a fieldtype 'date', but what if it also had a fieldtype 'Daterange'? A Daterange would be modeled as <DateTimeA,DateTimeB> or <DateTimeA,Delta DateTimeA>.

This problem could then be modelled really easily as a multivalued field 'openinghours' of type 'Daterange'. However, I have the feeling that the standard range-query implementation can't be used on this fieldtype, or perhaps would have to be run for each of the N Daterange values in 'openinghours'.
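To make the difference between option B and option D concrete, here is a minimal sketch in plain Python. It is not Solr code: the helper names (ts, matches_option_b, matches_option_d) are made up for illustration, and it only shows the matching semantics, not how an index would evaluate them efficiently.

```python
from datetime import datetime

FMT = "%Y-%m-%d:%H%M"  # same notation as the examples above


def ts(text):
    """Parse a timestamp such as '2011-11-08:1800'."""
    return datetime.strptime(text, FMT)


# Opening hours from the option B example, kept as (open, close) pairs.
SLOTS = [
    (ts("2011-11-08:1800"), ts("2011-11-09:0300")),
    (ts("2011-11-09:1700"), ts("2011-11-10:0500")),
    (ts("2011-11-10:1700"), ts("2011-11-11:0300")),
]


def matches_option_b(slots, start, end):
    """Option B semantics: 'open' and 'close' are two independent
    multivalued fields, so ANY open value and ANY close value may match."""
    opens = [o for o, _ in slots]
    closes = [c for _, c in slots]
    return any(o <= start for o in opens) and any(c >= end for c in closes)


def matches_option_d(slots, start, end):
    """Option D semantics: a single (open, close) pair must contain
    the whole requested window [start, end]."""
    return any(o <= start and end <= c for o, c in slots)


query_start = ts("2011-11-09:0100")
query_end = ts("2011-11-09:0600")

# Option B reports a hit although the place closes at 03:00 (false positive).
print(matches_option_b(SLOTS, query_start, query_end))
# The pairwise check correctly rejects the same query.
print(matches_option_d(SLOTS, query_start, query_end))
```

The pairwise check loops over every timeslot of a place, which is exactly the 'run the range query for each of the N values' concern mentioned under option D.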
To make matters worse (I didn't want to introduce this above):

REQ 3: Certain places may have multiple opening hours / timeslots each day. Consider a museum in Spain which closes around noon because of siesta time.

OPTION D) would be able to handle this natively; none of the other options can.

I would very much appreciate any pointers to:
- how to start with option D, and whether this approach is at all feasible.
- whether option C would suffice (excluding REQ 3), and whether I'm likely to run into performance / memory troubles.
- any other possible solutions I haven't thought of to tackle this.

Thanks a lot.

Cheers,
Geert-Jan

--
View this message in context: http://lucene.472066.n3.nabble.com/multiple-dateranges-timeslots-per-doc-modeling-openinghours-tp3368790p3368790.html
Sent from the Solr - User mailing list archive at Nabble.com.