Ah, I should have read more carefully...

I remember this being discussed on the dev list, and I thought there might
be a Jira attached, but I sure can't find it.

If you're willing to work on it, you might hop over to the Solr dev list
and start a discussion, maybe ask for a place to start. I'm sure some of
the devs have thought about this...

If nobody on the dev list says "There's already a JIRA on it", then you
should open one. Jira issues are generally preferred once you start getting
into design, because the comments are preserved for the next person who
tries the idea or makes changes, etc.

Best
Erick

On Wed, Oct 20, 2010 at 9:52 PM, Ben Boggess <ben.bogg...@gmail.com> wrote:

> Thanks Erick.  The problem with multiple cores is that the documents are
> scored independently in each core.  I would like to be able to search across
> both cores and have the scores 'normalized' in a way that's similar to what
> Lucene's MultiSearcher would do.  As far as I understand, multiple cores
> would likely result in seriously skewed scores in my case since the
> documents are not distributed evenly or randomly.  I could have one
> core/index with 20 million docs and another with 200.
>
> I've poked around in the code and this feature doesn't seem to exist.  I
> would be happy with finding a decent place to try to add it.  I'm not sure
> if there is a clean place for it.
>
> Ben
>
> On Oct 20, 2010, at 8:36 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > It seems to me that multiple cores are along the lines you
> > need, a single instance of Solr that can search across multiple
> > sub-indexes that do not necessarily share schemas, and are
> > independently maintainable......
> >
> > This might be a good place to start:
> > http://wiki.apache.org/solr/CoreAdmin
> >
> > HTH
> > Erick
> >
> > On Wed, Oct 20, 2010 at 3:23 PM, ben boggess <ben.bogg...@gmail.com>
> > wrote:
> >
> >> We are trying to convert a Lucene-based search solution to a
> >> Solr/Lucene-based solution.  The problem we have is that we currently
> >> have our data split into many indexes and Solr expects things to be in
> >> a single index unless you're sharding.  In addition to this, our indexes
> >> wouldn't work well using the distributed search functionality in Solr
> >> because the documents are not evenly or randomly distributed.  We are
> >> currently using Lucene's MultiSearcher to search over subsets of these
> >> indexes.
> >>
> >> I know this has been brought up a number of times in previous posts and
> >> the typical response is that the best thing to do is to convert
> >> everything into a single index.  One of the major reasons for having the
> >> indexes split up the way we do is because different types of data need
> >> to be indexed at different intervals.  You may need one index to be
> >> updated every 20 minutes and another is only updated every week.  If we
> >> move to a single index, then we will constantly be warming and replacing
> >> searchers for the entire dataset, and will essentially render the
> >> searcher caches useless.  If we were able to have multiple indexes, they
> >> would each have a searcher and updates would be isolated to a subset of
> >> the data.
> >>
> >> The other problem is that we will likely need to shard this large single
> >> index and there isn't a clean way to shard randomly and evenly across
> >> all of the data.  We would, however, like to shard a single data type.
> >> If we could use multiple indexes, we would likely also be sharding a
> >> small subset of them.
> >>
> >> Thanks in advance,
> >>
> >> Ben
> >>
>
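
To make the scoring concern in this thread concrete, here is a minimal
sketch of why per-core scores can diverge when cores differ greatly in size.
It assumes the classic Lucene TF-IDF similarity, whose inverse document
frequency is 1 + ln(numDocs / (docFreq + 1)). The core names and document
frequencies are hypothetical; only the 20 million and 200 document counts
come from the thread. The "global" figure only illustrates the idea of
combining statistics across indexes and is not a reproduction of
MultiSearcher's code.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """IDF as defined by Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for one query term in each core.
# num_docs mirrors the sizes mentioned in the thread; doc_freq is invented.
cores = {
    "large": {"num_docs": 20_000_000, "doc_freq": 50_000},
    "small": {"num_docs": 200, "doc_freq": 50},
}

# Each core scored independently uses only its own statistics.
per_core_idf = {
    name: classic_idf(stats["doc_freq"], stats["num_docs"])
    for name, stats in cores.items()
}

# Combining statistics across cores yields one IDF shared by all documents.
global_idf = classic_idf(
    sum(stats["doc_freq"] for stats in cores.values()),
    sum(stats["num_docs"] for stats in cores.values()),
)

if __name__ == "__main__":
    for name, idf in per_core_idf.items():
        print(f"{name}: per-core idf = {idf:.3f}")
    print(f"combined: global idf = {global_idf:.3f}")
```

With these made-up numbers the same term gets a much higher IDF in the
large core than in the small one, so scores from the two cores are not
directly comparable; a shared IDF removes that particular source of skew.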
