I think we can treat this as a special kind of join operation.
Here are some points to support that.
1. Build each group as a separate index:
   Index 1 (name: group1)
     Key
     Group 1's fields
   Index 2 (name: group2)
     Key
     Group 2's fields
2. The query looks like an RDBMS join operation. For example:
select g1.key, g1.field1, g2.field1 from group1 g1, group2 g2 where
g1.key = g2.key AND g1.field1 > 1 AND (g1.field2 < 100 OR g2.field1 < 99)
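
To make the intended semantics concrete, here is a minimal Python sketch (not Lucene or Solr code) that models each group index as a dict from key to fields and evaluates the query above as a join on key. The names `group1`, `group2`, and `join_query`, and all sample values, are made up for illustration.

```python
# Hypothetical in-memory stand-ins for the two per-group indexes:
# each maps a document key to that group's fields.
group1 = {
    "k1": {"field1": 5, "field2": 50},
    "k2": {"field1": 0, "field2": 10},
    "k3": {"field1": 7, "field2": 500},
}
group2 = {
    "k1": {"field1": 200},
    "k2": {"field1": 1},
    "k3": {"field1": 3},
}


def join_query(g1_index, g2_index):
    """Join the two indexes on key and apply the predicate
    g1.field1 > 1 AND (g1.field2 < 100 OR g2.field1 < 99)."""
    rows = []
    # Only keys present in both indexes take part in the join.
    for key in sorted(g1_index.keys() & g2_index.keys()):
        g1, g2 = g1_index[key], g2_index[key]
        if g1["field1"] > 1 and (g1["field2"] < 100 or g2["field1"] < 99):
            rows.append((key, g1["field1"], g2["field1"]))
    return rows


result = join_query(group1, group2)
```

Note that the OR clause mixes fields from both groups, so neither index can decide a match on its own; that is why the result has to be merged per key.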
3. How do Solr/Lucene support the above query?
As far as I can tell, they do not support it.
I have two ideas for a solution.
First, is it possible to use the same docid for the same key in all indexes? If
so, what we need is a global docid generator that generates the same
docid for the same key, and a Hit that carries index information (maybe like Segment).
I reviewed the Lucene/Solr source code, and it seems that docids are
generated internally while the index is built and, more importantly, that some
operations depend on their order. In other words, you cannot assign a document a
smaller docid. Am I right?
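
As a rough illustration of the first idea and of the ordering concern, here is a small Python sketch. `GlobalDocIdGenerator` is a hypothetical helper, not a Lucene or Solr class, and it does not reflect how Lucene assigns docids internally.

```python
class GlobalDocIdGenerator:
    """Hands out one docid per key, shared by every group index (hypothetical)."""

    def __init__(self):
        self._ids = {}

    def docid_for(self, key):
        # First sight of a key allocates the next id; later calls reuse it.
        if key not in self._ids:
            self._ids[key] = len(self._ids)
        return self._ids[key]


gen = GlobalDocIdGenerator()
# Keys arrive in different orders when the two group indexes are built.
group1_docids = [gen.docid_for(k) for k in ["a", "b", "c"]]
group2_docids = [gen.docid_for(k) for k in ["c", "a"]]
```

The second index receives its docids out of order (the id for "c" comes before the smaller id for "a"), which is exactly the situation that would break any operation relying on docids increasing in insertion order.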
Second, let scoring merge results by key rather than by docid. Of course, this
is not as efficient as merging by docid. Since Lucene has already built the
index, I think it should still be fast enough.
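
A minimal Python sketch of the second idea, assuming each index can return the matching keys for one clause as a sorted list. The function names and the sample hit lists are hypothetical; this is not how Lucene's scorers are implemented.

```python
import heapq


def intersect_sorted(a, b):
    """AND: keys present in both sorted lists, via a two-pointer merge."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union_sorted(a, b):
    """OR: keys present in either sorted list, without duplicates."""
    out = []
    for key in heapq.merge(a, b):
        if not out or out[-1] != key:
            out.append(key)
    return out


# Hypothetical per-clause hits, each a sorted list of matching keys.
g1_field1_hits = ["k1", "k3", "k4"]   # g1.field1 > 1
g1_field2_hits = ["k1", "k2"]         # g1.field2 < 100
g2_field1_hits = ["k2", "k3"]         # g2.field1 < 99

matches = intersect_sorted(
    g1_field1_hits, union_sorted(g1_field2_hits, g2_field1_hits)
)
```

Both merges are linear in the length of the hit lists, but they compare keys (here strings) instead of integer docids, which is where the extra cost mentioned above would come from.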
I'd like to hear your opinions on this topic.
Thanks,
-----Original Message-----
From: Robert Yu [mailto:[email protected]]
Sent: Friday, September 30, 2011 9:54 AM
To: [email protected]
Subject: split index horizontally
Is there an efficient way to handle my case?
Each document has several groups of fields; some groups are updated frequently,
others infrequently. Is it possible to maintain a separate index per group
but still search over all of them as ONE index?
To some extent, it is a three-layer document model (I think the current model
has two layers):
document = {key: groups},...
groups = {group-name: fields},...
fields = {field-name: field-value},...
We could maintain an index for each group and search across them like this:
query: group-name-1:field-1:val-1 AND (
group-name-2:field-2:val-2 OR group-name-3:field-3:[min-3 TO max-3])
return data:
group-name-1:field-1,field-2;group-name-2:field-3,field-4,...
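
The three-layer model and the return format above could be sketched in Python as nested dicts. This is only an illustration of the data shape; `update_group`, `format_groups`, and the sample values are hypothetical.

```python
# document = {key: groups}; groups = {group-name: fields};
# fields = {field-name: field-value}
documents = {
    "doc-1": {
        "group-name-1": {"field-1": "val-1", "field-2": "x"},
        "group-name-2": {"field-3": 10, "field-4": 20},
    },
}


def update_group(docs, key, group_name, fields):
    """Replace one group's fields without touching the document's other groups."""
    docs.setdefault(key, {})[group_name] = dict(fields)


def format_groups(groups):
    """Render groups in the 'group:field,field;group:field,...' shape."""
    return ";".join(
        f"{name}:{','.join(fields)}" for name, fields in groups.items()
    )


# A frequently updated group changes; group-name-1 is left alone.
update_group(documents, "doc-1", "group-name-2", {"field-3": 11, "field-4": 20})
rendered = format_groups(documents["doc-1"])
```

The point of the per-group update is that a change to a frequently updated group would only require reindexing that group, not the whole document.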
Thanks,
Robert Yu