Sorry for the somewhat lengthy post. I want to make clear that I have covered
my bases here and am looking for an alternative solution, because the more
trivial solutions don't seem to work for my use case. 

Consider bars, museums, etc. 

These places have multiple opening hours that can depend on: 
REQ 1. day of week
REQ 2. special days on which they are closed, or otherwise have opening
hours that differ from those of the corresponding 'day of week'

Now, I want to model these 'places' in a way that lets me do temporal
queries like: 
- which bars are open NOW (and stay open for at least another 3 hours)
- which museums are (already) open on 25-12-2011 at 10AM and stay open until
(at least) 3PM. 

I believe that having opening/closing hours available for each date gives
me at least the data needed to answer the queries above. (Note that having
dayOfWeek*openinghours is not enough, because of the special cases in REQ 2.) 

Okay, knowing that I need openinghours*dates for each place, how would I
model this in documents? 

OPTION A) 
-----------
Considering granularity: I want documents to represent Places and not
Places*dates. Although the latter would trivially allow me to do the querying
mentioned above, it has these disadvantages: 

 - The same place is returned multiple times (each with a different date) when
queries are not constrained to a date. 
 - Lots of data needs to be duplicated, all for the conceptually 'simple' 
functionality of needing multiple date ranges. It feels bad, and a simpler
solution should exist. 
 - The document count explodes (documents = say, 100 dates * 1.000.000 places
= 100.000.000). Suddenly the size of the resultset goes from 'easily doable'
to 'hmmm, I have to think about this'. Given that places also have some other
fields to sort on, Lucene fieldcache memory usage would grow by a factor of
100. 

OPTION B)
----------
Another, faulty, option would be to model opening/closing hours in 2
multivalued date fields, i.e. open and close, and insert an open/close pair
for each day, e.g.: 

open: 2011-11-08:1800 - close: 2011-11-09:0300
open: 2011-11-09:1700 - close: 2011-11-10:0500
open: 2011-11-10:1700 - close: 2011-11-11:0300

And queries would be of the form:

'open < now && close > now+3h'

But since there is no way to indicate that 'open' and 'close' are pairwise
related, I will get a lot of false positives. E.g. the above document would be
returned for:

open < 2011-11-09:0100 && close > 2011-11-09:0600
because SOME open date is before 2011-11-09:0100 (i.e. 2011-11-08:1800) and
SOME close date is after 2011-11-09:0600 (for example 2011-11-11:0300), but
these open and close dates are not pairwise related.
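
To make the false positive concrete, here is a minimal Python sketch (plain
Python, not Solr code). The helper names matches_unpaired and matches_paired
are made up for illustration; the first mimics two independent multivalued
fields, the second is what the query is actually supposed to mean.

```python
from datetime import datetime

# Opening slots from the example document, as (open, close) pairs.
slots = [
    (datetime(2011, 11, 8, 18, 0), datetime(2011, 11, 9, 3, 0)),
    (datetime(2011, 11, 9, 17, 0), datetime(2011, 11, 10, 5, 0)),
    (datetime(2011, 11, 10, 17, 0), datetime(2011, 11, 11, 3, 0)),
]


def matches_unpaired(slots, start, end):
    """Mimic two independent multivalued fields: ANY open and ANY close may match."""
    opens = [opened for opened, _ in slots]
    closes = [closed for _, closed in slots]
    return any(o < start for o in opens) and any(c > end for c in closes)


def matches_paired(slots, start, end):
    """Require one single slot to span the whole requested window."""
    return any(o < start and c > end for o, c in slots)


start = datetime(2011, 11, 9, 1, 0)
end = datetime(2011, 11, 9, 6, 0)
print(matches_unpaired(slots, start, end))  # True: the false positive
print(matches_paired(slots, start, end))    # False: no single slot covers 01:00-06:00
```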

OPTION C) The best of what I have now:
---------------------------------------
I have been thinking about a totally different approach using Solr dynamic
fields, in which each and every opening and closing date gets its own
dynamic field, e.g.:

_date_2011-11-08_open: 1800
_date_2011-11-09_close: 0300
_date_2011-11-09_open: 1700
_date_2011-11-10_close: 0500
_date_2011-11-10_open: 1700
_date_2011-11-11_close: 0300

Then the client should know the date to query, and thus the correct fields
to query. This would solve the problem, since start date / end date are now
pairwise related, but I fear this can be a big issue from a performance
standpoint (especially memory consumption of the Lucene fieldcache).
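
As a rough illustration of how a client or indexer could derive these field
names, here is a small Python sketch. It follows one reading of the naming
scheme above (each value keyed by the calendar date on which the open or close
happens) and assumes at most one timeslot per day. The helper dynamic_fields
is hypothetical, not part of Solr.

```python
from datetime import datetime


def dynamic_fields(slots):
    """Map (open, close) datetime pairs to per-date dynamic field names."""
    fields = {}
    for opened, closed in slots:
        # Each field is keyed by the date on which the open/close occurs.
        fields[f"_date_{opened:%Y-%m-%d}_open"] = f"{opened:%H%M}"
        fields[f"_date_{closed:%Y-%m-%d}_close"] = f"{closed:%H%M}"
    return fields


slots = [
    (datetime(2011, 11, 8, 18, 0), datetime(2011, 11, 9, 3, 0)),
    (datetime(2011, 11, 9, 17, 0), datetime(2011, 11, 10, 5, 0)),
]
for name, value in dynamic_fields(slots).items():
    print(f"{name}: {value}")
```

Every date adds two more distinct field names to the index, which is where
the worry about fieldcache memory comes from.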


IDEAL OPTION D) 
----------------
I'm pretty sure this does not exist out of the box, but Solr might be
extended to support it. 
Okay, Solr has a fieldtype: date, but what if it also had a fieldtype:
Daterange? A Daterange would be modeled as <DateTimeA,DateTimeB> or
<DateTimeA,Delta DateTimeA>

Then this problem could be modelled really easily as a multivalued field
'openinghours' of type 'Daterange'. 
However, I have the feeling that the standard range-query implementation
can't be used on this fieldtype, or perhaps would have to be run for each of
the N daterange values in 'openinghours'. 

To make matters worse (I didn't want to introduce this above): 
REQ 3: Certain places may have multiple opening hours / timeslots each day.
Consider a museum in Spain which closes around noon because of siesta time. 
OPTION D) would be able to handle this natively; none of the other options
can. 
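
To show the semantics I have in mind for option D, here is a minimal Python
sketch. It is only a model of the desired behaviour, not a Solr fieldtype: the
names DateRange, from_delta, covers and is_open are made up, the museum hours
are invented for illustration, and the comparison is assumed to be inclusive
at both ends. The check is simply run against each of the N ranges.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DateRange:
    start: datetime
    end: datetime

    @classmethod
    def from_delta(cls, start, delta):
        """Build a range from <DateTimeA, Delta DateTimeA>."""
        return cls(start, start + delta)

    def covers(self, start, end):
        """True if this single range spans the whole requested window."""
        return self.start <= start and end <= self.end


def is_open(opening_hours, start, end):
    """A place matches if ANY one of its ranges covers the window."""
    return any(r.covers(start, end) for r in opening_hours)


# Hypothetical museum with a siesta break: two timeslots on the same day.
museum = [
    DateRange(datetime(2011, 12, 25, 9, 0), datetime(2011, 12, 25, 13, 0)),
    DateRange.from_delta(datetime(2011, 12, 25, 16, 0), timedelta(hours=4)),
]

print(is_open(museum, datetime(2011, 12, 25, 10, 0), datetime(2011, 12, 25, 12, 0)))  # True
print(is_open(museum, datetime(2011, 12, 25, 10, 0), datetime(2011, 12, 25, 15, 0)))  # False
```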

I would very much appreciate any pointers on: 
 - how to start with option D, and whether this approach is feasible at all. 
 - whether option C would suffice (excluding REQ 3), and whether I'm likely to
run into performance / memory trouble. 
 - any other possible solutions I haven't thought of to tackle this. 

Thanks a lot. 

Cheers,
Geert-Jan






--
View this message in context: 
http://lucene.472066.n3.nabble.com/multiple-dateranges-timeslots-per-doc-modeling-openinghours-tp3368790p3368790.html
Sent from the Solr - User mailing list archive at Nabble.com.
