I think we can treat this as a special join operation.
Here are some points to support that.
1, build each group as a separate index
        Index 1's name group1
                Key
                Group 1's fields
        Index 2's name group2
                Key
                Group 2's fields
2, the query looks like an RDBMS join operation. For example:
        select g1.key, g1.field1, g2.field1 from group1 g1, group2 g2
        where g1.key = g2.key AND g1.field1 > 1
        AND (g1.field2 < 100 OR g2.field1 < 99).
3, how do Solr/Lucene support the above query?
        It looks like they do not support it.
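
To make the intended semantics concrete, here is a minimal sketch in plain Python, not Solr/Lucene code. The two dicts are hypothetical in-memory stand-ins for the per-group indexes, and join_query is an illustrative name; it only shows what the join above should return.

```python
# Hypothetical in-memory stand-ins for the two per-group indexes: key -> fields.
group1 = {
    "k1": {"field1": 5, "field2": 50},
    "k2": {"field1": 0, "field2": 10},
    "k3": {"field1": 7, "field2": 500},
}
group2 = {
    "k1": {"field1": 200},
    "k3": {"field1": 42},
    "k4": {"field1": 1},
}


def join_query(g1_index, g2_index):
    """Emulate: g1.key = g2.key AND g1.field1 > 1
    AND (g1.field2 < 100 OR g2.field1 < 99)."""
    rows = []
    # Only keys present in both indexes can satisfy an inner join.
    for key in sorted(g1_index.keys() & g2_index.keys()):
        g1, g2 = g1_index[key], g2_index[key]
        if g1["field1"] > 1 and (g1["field2"] < 100 or g2["field1"] < 99):
            rows.append((key, g1["field1"], g2["field1"]))
    return rows


result = join_query(group1, group2)
```

Here "k1" qualifies through the group1 condition and "k3" only through the group2 condition, which is why the OR cannot be answered by either index alone.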

I have two ideas for a solution.
First, is it possible to use the same docid for the same key in all indexes? If 
so, all we need is a global docid generator that generates the same 
docid for the same key, and a Hit that carries index information (maybe like a Segment).
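
A rough sketch of what such a generator could look like, in plain Python. GlobalDocIdGenerator and docid_for are hypothetical names for illustration only; this says nothing about how Lucene actually assigns docids.

```python
class GlobalDocIdGenerator:
    """Hypothetical registry handing out one docid per key, shared by all indexes."""

    def __init__(self):
        self._ids = {}

    def docid_for(self, key):
        # The first sighting of a key assigns the next free id; later calls reuse it.
        return self._ids.setdefault(key, len(self._ids))


gen = GlobalDocIdGenerator()
# Two group indexes receive their documents in different orders.
group1_docids = {key: gen.docid_for(key) for key in ["a", "b", "c"]}
group2_docids = {key: gen.docid_for(key) for key in ["c", "a", "d"]}
```

With this input the second index receives the docids 2, 0, 3 in that order, so the ids are shared across indexes but no longer increase in the order the documents are added.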

        I reviewed the Lucene/Solr source code and found that docids seem to be 
generated internally while the index is built; more importantly, some operations 
depend on their order. In other words, you cannot give a document a smaller 
docid. Am I right?

Second, let scoring merge results by key rather than by docid. Of course, this is 
not as efficient as merging by docid. Since Lucene has already built the index, I think it 
should still be fast enough.
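
As an illustration of merging by key, here is a small sketch in plain Python. It assumes each index can return its hits sorted by key as (key, score) pairs, and it sums the scores, which is an arbitrary choice; intersect_by_key is a hypothetical helper, not a Lucene API.

```python
def intersect_by_key(hits_a, hits_b):
    """Merge two hit lists sorted by key, keeping only keys present in both.

    Each hit is a (key, score) pair; scores of matching keys are summed.
    """
    merged = []
    i = j = 0
    while i < len(hits_a) and j < len(hits_b):
        key_a, score_a = hits_a[i]
        key_b, score_b = hits_b[j]
        if key_a == key_b:
            merged.append((key_a, score_a + score_b))
            i += 1
            j += 1
        elif key_a < key_b:
            i += 1  # key_a cannot appear later in hits_b, skip it
        else:
            j += 1  # key_b cannot appear later in hits_a, skip it
    return merged


group1_hits = [("k1", 1.0), ("k3", 0.5), ("k7", 2.0)]
group2_hits = [("k3", 1.5), ("k5", 1.0), ("k7", 0.25)]
merged = intersect_by_key(group1_hits, group2_hits)
```

The walk is linear in the two list lengths, but it compares keys rather than integer docids, which is the extra cost mentioned above.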
        I'd like to hear your opinions on this topic.

Thanks,
-----Original Message-----
From: Robert Yu [mailto:robert...@morningstar.com] 
Sent: Friday, September 30, 2011 9:54 AM
To: solr-user@lucene.apache.org
Subject: split index horizontally

Is there an efficient way to handle my case?

Each document has several groups of fields; some groups are updated frequently, 
others infrequently. Is it possible to maintain a separate index per group 
but search over all of them as ONE index?

 

To some extent, it is a three-layer document model (I think the current one has 
two layers):

document = {key: groups},...

groups = {group-name: fields},...

fields = {field-name: field-value},...
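
The three layers map naturally onto nested dictionaries. The following is only a sketch in plain Python with made-up keys and values; split_by_group is a hypothetical helper showing how the per-group view could be derived from the per-document view.

```python
# document = {key: groups}, groups = {group-name: fields},
# fields = {field-name: field-value}
documents = {
    "doc-1": {
        "group-name-1": {"field-1": "val-1", "field-2": 10},
        "group-name-2": {"field-3": 3.5},
    },
    "doc-2": {
        "group-name-1": {"field-1": "other"},
    },
}


def split_by_group(docs):
    """Regroup {key: {group: fields}} into {group: {key: fields}}."""
    per_group = {}
    for key, groups in docs.items():
        for group_name, fields in groups.items():
            per_group.setdefault(group_name, {})[key] = fields
    return per_group


per_group = split_by_group(documents)
```

Each entry of per_group corresponds to the data that one group's index would hold, keyed by the document key.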

 

We can maintain an index for each group and search it like below:

               query: group-name-1:field-1:val-1 AND (group-name-2:field-2:val-2 OR group-name-3:field-3:[min-3 TO max-3])

               return data: group-name-1:field-1,field-2;group-name-2:field-3,field-4,...
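
One way to read the query is as set algebra over the keys matched in each group. The sketch below is plain Python with made-up data; matching_keys and the concrete bounds 10 and 20 for min-3 and max-3 are assumptions for illustration.

```python
def matching_keys(group_index, predicate):
    """Return the set of keys whose fields in one group satisfy the predicate."""
    return {key for key, fields in group_index.items() if predicate(fields)}


# Hypothetical per-group data: key -> fields.
group_1 = {"a": {"field-1": "val-1"}, "b": {"field-1": "val-1"}, "c": {"field-1": "x"}}
group_2 = {"a": {"field-2": "val-2"}, "c": {"field-2": "val-2"}}
group_3 = {"b": {"field-3": 15}, "c": {"field-3": 12}}
min_3, max_3 = 10, 20  # assumed range bounds, inclusive like [min-3 TO max-3]

# group-name-1 clause AND (group-name-2 clause OR group-name-3 clause)
keys = matching_keys(group_1, lambda f: f.get("field-1") == "val-1") & (
    matching_keys(group_2, lambda f: f.get("field-2") == "val-2")
    | matching_keys(group_3, lambda f: min_3 <= f.get("field-3", min_3 - 1) <= max_3)
)
```

Key "c" matches both inner clauses but fails the group-name-1 clause, so only "a" and "b" remain.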

 

Thanks,

Robert Yu

 
