Hi,

I was just after some advice on how to map some relational metadata to a Solr index. The web application I'm working on is based around people, and the searching is based around properties of these people. Several properties are more complex - for example, a person's occupations have a place, from/to dates and other descriptive text; texts about a person have authors, sources and publication dates. Despite the usefulness of facets and search-based navigation, an advanced search feature is a non-negotiable requirement of the application.

An advanced search needs to be able to query a person on any set of attributes (e.g. gender, birth date, death date, place of birth), including the more complex search criteria described above (occupations, texts). Taking occupation as an example: because an occupation has its own metadata and a person could have held an arbitrary number of occupations throughout their lifetime, I was wondering how/if this information can be denormalised into a single person index document to support such a search. I can't use text concatenation in a multivalued field, as I need to be able to run date-based range queries (e.g. publication dates, occupation dates). And I'm not sure that resorting to a fixed number of repeated fields (e.g. occ1, occ1startdate, occ1enddate, occ1place, occ2, etc.) is a good approach, although it would work.
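To make the repeated-fields idea concrete, here is a rough sketch of what it might look like in schema.xml, assuming the suffix-based dynamic field conventions from Solr's example schema (the occ* field names are just illustrative, not an existing schema):

```xml
<!-- Sketch only: suffix-based dynamic fields, as in Solr's example schema -->
<dynamicField name="*_s"  type="string" indexed="true" stored="true"/>
<dynamicField name="*_dt" type="date"   indexed="true" stored="true"/>

<!-- Person documents would then carry fields such as:
     occ1_s, occ1_place_s, occ1_start_dt, occ1_end_dt,
     occ2_s, occ2_place_s, occ2_start_dt, occ2_end_dt, ...
     which keeps range queries possible (e.g. occ1_start_dt:[1950-01-01T00:00:00Z TO *])
     but caps the number of occupations at however many slots are defined/queried. -->
```

The drawback, as noted, is that an advanced search over "any occupation" would have to OR across every occN slot.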

If there isn't a sensible way to denormalise this, what is the best approach? For example, should I have an occupation document type, a person document type and a text/source document type, each containing the relevant person id, and (in the advanced search context) run a query against each document type, then use the intersecting set of person ids as the result for the application's display/pagination? If so, how do I ensure I capture all records? For example, if there are 100,000 hits on someone having worked in Australia in 1956, is there any way to ensure all 100,000 are returned in a query (similar to facet.limit=-1), other than specifying an arbitrarily high number in the "rows" parameter and hoping a query doesn't hit more than 100,000, which would exclude those above the limit from the "intersect" processing?
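For clarity, the intersection step I have in mind would look something like the sketch below (assuming each document type has already been queried separately, e.g. via /select with fl=person_id and a high rows value, and the person ids collected into sets - the function and data here are hypothetical):

```python
def intersect_person_ids(*id_sets):
    """Return the person ids present in every per-document-type result set,
    sorted so the application can paginate over a stable ordering."""
    if not id_sets:
        return []
    # Intersect the first set with all the others.
    common = set(id_sets[0]).intersection(*map(set, id_sets[1:]))
    return sorted(common)

# Example: ids matching the occupation query and the text/source query.
occupation_hits = {"p1", "p2", "p3"}
text_hits = {"p2", "p3", "p4"}
print(intersect_person_ids(occupation_hits, text_hits))  # ['p2', 'p3']
```

The obvious weakness is the one raised above: the intersection is only correct if each component query really returned all of its hits.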

Or is there a single query solution?

Any advice/hints welcome.

Scott.
