Re: multiple dateranges/timeslots per doc: modeling openinghours.

Geert-Jan Brits Mon, 03 Oct 2011 04:42:55 -0700

Thanks Hoss for that in-depth walkthrough.

I like your solution of using (something akin to) FieldMaskingSpanQuery
<https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html>.
Conceptually, the join approach looks like it would work on paper, although
I'm not a big fan of introducing a lot of complexity to the frontend /
querying part of the solution.


As an alternative, what about using only your FieldMaskingSpanQuery approach
(without the join approach) and encoding open/close on a per-day basis?
I didn't mention it, but I 'only' need 100 days of data, which would lead to
100 open and 100 close values per document, not counting the POIs with
multiple opening hours per day, which are pretty rare.
The index is rebuilt each night, refreshing the date data.

I'm not sure what the performance implications would be, but somehow
that feels doable. Perhaps it even offsets the extra time needed for doing
the joins; only one way to find out, I guess.
A disadvantage would be fewer cache hits when using fq.

Data then becomes:

open: 20111020_12_30, 20111021_12_30, 20111022_07_30, ...
close: 20111020_20_00, 20111021_26_30, 20111022_12_30, ...

Notice the 20111021_26_30, which indicates closing at 2:30 AM the next day.
That would work (in contrast to encoding it as 20111022_02_30).
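
To make the encoding concrete, here is a minimal Python sketch of how the
nightly index build could produce those values. The function names
(encode_slot, encode_hours) are made up for illustration and are not part
of Solr or Lucene; the only assumption is the YYYYMMDD_HH_MM layout shown
above, with hours of 24 and up meaning "after midnight, same opening day".

```python
from datetime import date

def encode_slot(day: date, hour: int, minute: int) -> str:
    """Encode a time on a given opening day as YYYYMMDD_HH_MM.

    Hours of 24 and above mean "after midnight, still belonging to `day`".
    """
    if not (0 <= hour < 48 and 0 <= minute < 60):
        raise ValueError("hour must be in 0..47 and minute in 0..59")
    return f"{day:%Y%m%d}_{hour:02d}_{minute:02d}"

def encode_hours(day, open_hm, close_hm):
    """Return the (open, close) token pair for one opening period."""
    open_h, open_m = open_hm
    close_h, close_m = close_hm
    if (close_h, close_m) <= (open_h, open_m):
        close_h += 24  # closing time falls on the next calendar day
    return encode_slot(day, open_h, open_m), encode_slot(day, close_h, close_m)

# Hypothetical opening hours matching the example values above.
pairs = [
    encode_hours(date(2011, 10, 20), (12, 30), (20, 0)),
    encode_hours(date(2011, 10, 21), (12, 30), (2, 30)),
    encode_hours(date(2011, 10, 22), (7, 30), (12, 30)),
]
open_values = [o for o, _ in pairs]    # goes into the multivalued "open" field
close_values = [c for _, c in pairs]   # goes into the multivalued "close" field
```

Because every part is zero-padded, plain string comparison orders the
values of one opening day chronologically, which is what a range query on
these fields would rely on.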

Alternatively, how would you compare your suggested approach with the
approach by David Smiley using either SOLR-2155 (Geohash prefix query
filter) or LSP:
https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=13115244#comment-13115244.
That would work right now, and the LSP-approach seems pretty elegant to me.
FQ-style caching is probably not possible though.

Geert-Jan

On 1 October 2011 04:25, Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> : Another, faulty, option would be to model opening/closing hours in 2
> : multivalued date-fields, i.e: open, close. and insert open/close for each
> : day, e.g:
> :
> : open: 2011-11-08:1800 - close: 2011-11-09:0300
> : open: 2011-11-09:1700 - close: 2011-11-10:0500
> : open: 2011-11-10:1700 - close: 2011-11-11:0300
> :
> : And queries would be of the form:
> :
> : 'open < now && close > now+3h'
> :
> : But since there is no way to indicate that 'open' and 'close' are pairwise
> : related I will get a lot of false positives, e.g the above document would be
> : returned for:
>
> This isn't possible out of the box, but the general idea of "position
> linked" queries is possible using the same approach as the
> FieldMaskingSpanQuery...
>
>
> https://lucene.apache.org/java/3_4_0/api/core/org/apache/lucene/search/spans/FieldMaskingSpanQuery.html
> https://issues.apache.org/jira/browse/LUCENE-1494
>
> ..implementing something like this that would work with
> (Numeric)RangeQueries however would require some additional work, but it
> should certainly be doable -- i've suggested this before but no one has
> taken me up on it...
> http://markmail.org/search/?q=hoss+FieldMaskingSpanQuery
>
> If we take it as a given that you can do multiple ranges "at the same
> position", then you can imagine supporting all of your "regular" hours
> using just two fields ("open" and "close") by encoding the day+time of
> each range of open hours into them -- even if a store is open for multiple
> sets of ranges per day (ie: closed for siesta)...
>
>  open: mon_12_30, tue_12_30, wed_07_30, wed_15_30, ...
>  close: mon_20_00, tue_20_30, wed_12_30, wed_22_30, ...
>
> then asking for "stores open now and for the next 3 hours" on "wed" at
> "2:13PM" becomes a query for...
>
> sameposition(open:[* TO wed_14_13], close:[wed_17_13 TO *])
>
> For the special case part of your problem when there are certain dates
> that a store will be open atypical hours, i *think* that could be solved
> using some special docs and the new "join" QParser in a filter query...
>
>        https://wiki.apache.org/solr/Join
>
> imagine you have your "regular" docs with all the normal data about a
> store, and the open/close fields i describe above.  but in addition to
> those, for any store that you know is "closed on dec 25" or "only open
> 12:00-15:00 on Jan 01" you add an additional small doc encapsulating
> the information about the store's closures on that special date - so that
> each special case would be its own doc, even if one store had 5 days
> where there was a special case...
>
>  specialdoc1:
>    store_id: 42
>    special_date: Dec-25
>    status: closed
>  specialdoc2:
>    store_id: 42
>    special_date: Jan-01
>    status: irregular
>    open: 12_00
>    close: 15_00
>
> then when you are executing your query, you use an "fq" to constrain to
> stores that are (normally) open right now (like i mentioned above) and you
> use another fq to find all docs *except* those resulting from a join
> against these special case docs based on the current date.
>
> so if your query is "open now and for the next 3 hours" and "now" ==
> "sunday, 2011-12-25 @ 10:17AM" your query would be something like...
>
> q=...user input...
> time=sameposition(open:[* TO sun_10_17], close:[sun_13_17 TO *])
> fq={!v=$time}
> fq={!join from=store_id to=unique_key v=$vv}
> vv=-(+special_date:Dec-25 +(status:closed OR _query_:"{!v=$time}"))
>
> That join based approach for dealing with the special dates should work
> regardless of whether someone implements a way to do pair wise
> "sameposition()" rangequeries ... so if you can live w/o the multiple
> open/close pairs per day, you can just use the "one field per day of the
> week" type approach you mentioned combined with the "join" for special
> case days of the year and everything you need should already work w/o any
> code (on trunk).
>
> (disclaimer: obviously i haven't tested that query, the exact syntax may
> be off but the principle for modeling the "special docs" and using
> them in a join should work)
>
>
> -Hoss
>
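
The difference between plain multivalued matching and the pairwise
"sameposition" idea from the quoted message can be sketched in a few lines
of Python. This only models the matching semantics in memory; it is not a
Lucene or Solr implementation, the function names are made up, and the
field values and query times are illustrative.

```python
def matches_independently(opens, closes, now, until):
    """Plain multivalued semantics: any open value and any close value may match."""
    return any(o <= now for o in opens) and any(c >= until for c in closes)

def matches_same_position(opens, closes, now, until):
    """Pairwise semantics: the open and close value at the same index must both match."""
    return any(o <= now and c >= until for o, c in zip(opens, closes))

# A store that closes for a siesta between 12:30 and 15:30 on Wednesday.
opens = ["wed_07_30", "wed_15_30"]
closes = ["wed_12_30", "wed_22_30"]

# "Open at 11:00 and for the next 3 hours" should NOT match this store.
now, until = "wed_11_00", "wed_14_00"

false_positive = matches_independently(opens, closes, now, until)
pairwise_result = matches_same_position(opens, closes, now, until)
```

With independent matching the morning "open" value and the evening "close"
value satisfy the two range clauses separately, so the store is returned
even though it is closed from 12:30. Requiring both clauses to hit the same
position removes that false positive.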
