On 11 October 2011 03:21, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> : Conceptually the Join-approach looks like it would work on paper,
> : although I'm not a big fan of introducing a lot of complexity to the
> : frontend / querying part of the solution.
>
> you lost me there -- i don't see how using join would impact the front end
> / query side at all. your query clients would never even know that a join
> had happened (your indexing code would certainly have to know about
> creating those special case docs to join against, obviously)
>
> : As an alternative, what about using your fieldMaskingSpanQuery-approach
> : solely (without the JOIN-approach) and encode open/close on a per-day
> : basis?
> : I didn't mention it, but I 'only' need 100 days of data, which would lead
> : to 100 open and 100 close values, not counting the pois with multiple
> ...
> : Data then becomes:
> :
> : open: 20111020_12_30, 20111021_12_30, 20111022_07_30, ...
> : close: 20111020_20_00, 20111021_26_30, 20111022_12_30, ...
>
> aw hell ... i assumed you needed to support an arbitrarily large number
> of special case open+close pairs per doc.

I didn't express myself well. A POI can have multiple open+close pairs per
day, but each night I only index the coming 100 days. So most POIs will have
100 open+close pairs (one set of opening hours per day), but some have more.

> if you only have to support a fixed number (N=100) of open+close values you
> could just have N*2 date fields and a BooleanQuery containing N 2-clause
> BooleanQueries containing range queries against each pair of your date
> fields. ie...
>
> ((+open00:[* TO NOW] +close00:[NOW+3HOURS TO *])
>  (+open01:[* TO NOW] +close01:[NOW+3HOURS TO *])
>  (+open02:[* TO NOW] +close02:[NOW+3HOURS TO *])
>  ...etc...
>  (+open99:[* TO NOW] +close99:[NOW+3HOURS TO *]))
>
> ...for a lot of indexes, 100 clauses is small potatoes as far as number of
> boolean clauses go, especially if many of them are going to short circuit
> out because there won't be any matches at all.

Given that I need multiple open+close pairs per day, this can't be used
directly. However, when setting a reasonable upper bound on the maximum
number of opening-hour slots per day (say 3), which would be possible, this
could be extended to:

open00 = day0  -->  open00-0 = day0 timeslot 0, open00-1 = day0 timeslot 1, etc.

So:

((+open00-0:[* TO NOW] +close00-0:[NOW+3HOURS TO *])
 (+open00-1:[* TO NOW] +close00-1:[NOW+3HOURS TO *])
 (+open00-2:[* TO NOW] +close00-2:[NOW+3HOURS TO *])
 (+open01-0:[* TO NOW] +close01-0:[NOW+3HOURS TO *])
 (+open01-1:[* TO NOW] +close01-1:[NOW+3HOURS TO *])
 (+open01-2:[* TO NOW] +close01-2:[NOW+3HOURS TO *])
 ...etc...
 (+open99-2:[* TO NOW] +close99-2:[NOW+3HOURS TO *]))

This would need 2*3*100 = 600 dynamic fields to cover the opening hours. You
mention 100 clauses is peanuts for constructing a BooleanQuery, but how about
memory consumption? I'm particularly concerned about the Lucene FieldCache
getting populated for each of the 600 fields, since I had some nasty OOM
experiences with that in the past. (Two or three years ago the memory
consumption of the Lucene FieldCache couldn't be controlled; I'm not sure how
that is now, to be honest.)

I will not be sorting on any of the 600 dynamic fields, by the way. I will
only use them as part of the above BooleanQuery, which I will likely define
as a filter query. Just to be sure: in that situation the Lucene FieldCache
won't be touched, correct? If so, this will probably be a good workable
solution!
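For what it's worth, below is a rough sketch (plain Java) of how I could
generate that filter query on the client side. The openNN-S / closeNN-S names
are just the naming I'm assuming above; they would have to be backed by
matching dynamic date fields in schema.xml.

public class OpeningHoursFilter {

    /**
     * Builds a filter query of the form shown above:
     * ((+open00-0:[* TO NOW] +close00-0:[NOW+3HOURS TO *]) ...etc... )
     * i.e. one 2-clause conjunction per (day, timeslot) pair.
     */
    static String buildOpeningHoursFq(int days, int slotsPerDay,
                                      String visit, String depart) {
        StringBuilder fq = new StringBuilder("(");
        for (int day = 0; day < days; day++) {
            for (int slot = 0; slot < slotsPerDay; slot++) {
                String suffix = String.format("%02d-%d", day, slot);
                fq.append("(+open").append(suffix).append(":[* TO ").append(visit).append("]");
                fq.append(" +close").append(suffix).append(":[").append(depart).append(" TO *]) ");
            }
        }
        return fq.append(")").toString();
    }

    public static void main(String[] args) {
        // must be open at NOW and not close before NOW+3HOURS -> 100 * 3 = 300 clauses
        System.out.println(buildOpeningHoursFq(100, 3, "NOW", "NOW+3HOURS"));
    }
}

The resulting string would then just be sent as an fq parameter (via SolrJ's
SolrQuery.addFilterQuery, for example), so the user-facing query itself stays
untouched.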
> : Alternatively, how would you compare your suggested approach with the
> : approach by David Smiley using either SOLR-2155 (Geohash prefix query
> : filter) or LSP:
> : https://issues.apache.org/jira/browse/SOLR-2155?focusedCommentId=13115244#comment-13115244
> : That would work right now, and the LSP-approach seems pretty elegant to me.
>
> I'm afraid i'm totally ignorant of how the LSP stuff works so i can't
> really comment there.
>
> If i understand what you mean about mapping the open/close concepts to
> lat/lon concepts, then i can see how it would be useful for multiple
> pairwise (absolute) date ranges, but i'm not really sure how you would deal
> with the diff open+close pairs per day (or on diff days of the week, or
> special days of the year) using the lat+lon conceptual model ... I guess
> if the LSP stuff supports arbitrary N-dimensional spaces then you could
> model day or week as a dimension .. but it still seems like you'd need
> multiple fields for the special case days, right?

I planned to do the following using LSP (with help from David):

Each <open,close>-tuple is modeled as a point (x,y), with x = open and
y = close. So a POI can have many (100 or more) points, each representing one
<open,close>-tuple. Given a 100-day lookahead and a granularity of 5 minutes,
dimensions x and y can both be mapped to the range [0,30000].

E.g.:
- indexing starts at / baseline is: 2011-11-01 00:00
- poi open: 2011-11-08 18:00
- poi close: 2011-11-09 03:00
- (query) user visit: 2011-11-08 23:00
- (query) user depart: 2011-11-09 02:00

Would map to:
- poi open: 2520, poi close: 2628  =>  point (x,y) = (2520,2628)
- (query) user visit: 2580, user depart: 2616  =>  bbox filter with the
  ranges x:[0 TO 2580], y:[2616 TO 30000]

All POIs which have one or more points within the bbox are returned.

Both approaches seem pretty good to me. I'll be testing both soon.

Thanks!
Geert-Jan

> How it would compare performance wise: no idea.
>
> -Hoss
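PS: to make the slot mapping above concrete for myself, here is a small Java
sketch. The baseline, the 5-minute slot size and the extra one-day offset are
my own assumptions, chosen so the numbers come out the same as in the example;
any consistent mapping works as long as indexing and querying share it.

import java.time.Duration;
import java.time.LocalDateTime;

public class OpeningHoursToPoint {

    static final LocalDateTime BASELINE = LocalDateTime.of(2011, 11, 1, 0, 0);
    static final int SLOTS_PER_DAY = 24 * 60 / 5; // 288 five-minute slots per day

    /** Maps a timestamp to its 5-minute slot number, staying within [0,30000]. */
    static long toSlot(LocalDateTime t) {
        long minutes = Duration.between(BASELINE, t).toMinutes();
        return minutes / 5 + SLOTS_PER_DAY; // one-day offset, matching the example above
    }

    public static void main(String[] args) {
        // an <open,close>-tuple of a POI becomes an indexed point (x,y)
        long x = toSlot(LocalDateTime.of(2011, 11, 8, 18, 0)); // 2520
        long y = toSlot(LocalDateTime.of(2011, 11, 9, 3, 0));  // 2628
        System.out.println("point = (" + x + "," + y + ")");

        // the query window becomes a bbox: x in [0, visit], y in [depart, 30000]
        long visit  = toSlot(LocalDateTime.of(2011, 11, 8, 23, 0)); // 2580
        long depart = toSlot(LocalDateTime.of(2011, 11, 9, 2, 0));  // 2616
        System.out.println("bbox: x:[0 TO " + visit + "], y:[" + depart + " TO 30000]");
    }
}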