On 26.04.2010, at 12:48, Lukas Kahwe Smith wrote:

> Hi,
> 
> I am currently putting together a search for a DB where I have resolutions 
> along with their metadata, as well as chapters with their text and metadata. 
> Most of the searching will actually be done on the metadata. The plan atm is 
> to support 2 search modes: (a) one where the results will be resolutions and 
> (b) another where the results will be chapters.
> 
> (a) Here I will search both the document and chapter data, but the actual 
> result entities I want are resolutions. In terms of rating I obviously want 
> stuff to rate higher with more relevant chapters, so I sort of need to group 
> the hits on the chapters when computing the score. For good measure I might 
> also want to show the number of chapters that had a match, potentially even 
> with links to these chapters, so I would also need the ids of the chapters 
> that matched.
> 
> (b) Here I will just search across the chapters and rank them each on their 
> own. Seems straightforward.
> 
> Now how should I best structure my index for this?
> 
> number of cores:
> I guess I will have two cores, one for documents and one for chapters? Then 
> again there is some minor overlap in fields between the two and there is no 
> real overhead with having unused fields, so I could just as well use one core.
> 
> grouping:
> how do I best group the scores for the (a) type search? should I just do two 
> searches and combine the results? then again this will make paging tricky.


after a bit more searching i found 
https://issues.apache.org/jira/browse/SOLR-236 aka 
http://wiki.apache.org/solr/FieldCollapsing
it seems to be mostly tailored towards removing (near) duplicates, so it 
doesn't seem to factor in multiple matches per group when computing the 
score, instead just using the max score of the documents inside a group.
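
to make the difference concrete, here is a rough client-side sketch (python, 
not anything solr does itself) of grouping chapter hits by resolution and 
combining their scores. the hit layout and the field names `chapter_id`, 
`resolution_id` and `score` are assumptions for illustration only. passing 
`max` mimics the collapsing behaviour described above, `sum` rewards 
resolutions with several matching chapters.

```python
from collections import defaultdict


def group_chapter_hits(hits, combine=sum):
    """Group chapter-level hits by resolution and combine their scores.

    `hits` is assumed to be a list of dicts with the hypothetical keys
    `chapter_id`, `resolution_id` and `score`.
    """
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit["resolution_id"]].append(hit)

    results = []
    for resolution_id, chapter_hits in grouped.items():
        results.append({
            "resolution_id": resolution_id,
            # sum rewards many matching chapters, max keeps only the best one
            "score": combine(h["score"] for h in chapter_hits),
            "chapter_ids": [h["chapter_id"] for h in chapter_hits],
            "chapter_count": len(chapter_hits),
        })

    # best-scoring resolution first
    results.sort(key=lambda r: r["score"], reverse=True)
    return results


# made-up example hits
hits = [
    {"chapter_id": "c1", "resolution_id": "r1", "score": 0.9},
    {"chapter_id": "c2", "resolution_id": "r2", "score": 0.5},
    {"chapter_id": "c3", "resolution_id": "r2", "score": 0.6},
]
ranked_by_sum = group_chapter_hits(hits)
ranked_by_max = group_chapter_hits(hits, combine=max)
```

with the made-up hits, summing puts r2 first because it has two matching 
chapters, while max puts r1 first. the catch is that this needs all chapter 
hits fetched before grouping, which is exactly what makes paging tricky.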

as an intermediate hack the best solution i see is to just make the chapter 
fields multivalued fields inside the resolutions. this should be a decent 
solution for (a), though this way i will not really have any information on the 
number of chapters matched (let alone their ids). this approach also means 
that i need to index the chapters separately again for the type (b) search.
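
a minimal sketch of what that double indexing could look like, written as 
plain python dicts rather than calls to any solr client. all field names 
(`type`, `title`, `chapter_text`, `resolution_id`, `text`) and the id prefixes 
are made up for illustration and would have to match the real schema.

```python
def build_index_docs(resolution, chapters):
    """Build one resolution doc plus one doc per chapter.

    The resolution doc carries the chapter text as a multivalued field
    for the type (a) search; the chapter docs serve the type (b) search.
    All field names are hypothetical.
    """
    resolution_doc = {
        "id": f"resolution-{resolution['id']}",
        "type": "resolution",
        "title": resolution["title"],
        # multivalued field: one entry per chapter
        "chapter_text": [c["text"] for c in chapters],
    }
    chapter_docs = [
        {
            "id": f"chapter-{c['id']}",
            "type": "chapter",
            "resolution_id": resolution["id"],
            "text": c["text"],
        }
        for c in chapters
    ]
    return [resolution_doc] + chapter_docs


# made-up example data
docs = build_index_docs(
    {"id": 1, "title": "example resolution"},
    [{"id": 10, "text": "first chapter"}, {"id": 11, "text": "second chapter"}],
)
```

the `type` field is only there so both kinds of document could live in one 
core and be filtered per search mode. with two cores it would not be needed.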

regards,
Lukas Kahwe Smith
m...@pooteeweet.org


