I'm not very happy with this proposal, though I certainly understand
what it is trying to achieve. I'd like to see a tighter integration
and communication between Lucene core and SOLR too, but the proposed
requirements seem much too strict. For example, I think it's a good
idea for SOLR to ride on Lucene's trunk again. This will show
potential problems of API changes and new features in Lucene much more
quickly. It will also help SOLR to use new Lucene features much more
quickly.
However, I'm -1 for these points:
* When a change is committed to Lucene, it must pass all Solr tests.
* Release both at once.
SOLR is a consumer of Lucene's API. So what this requirement basically
translates to is that I, as a Lucene committer, now have to not only
make sure Lucene's backwards-compatibility is ensured, but also that I
make all necessary changes in SOLR. So I have to know much more code
suddenly and potentially make many more changes. But this doesn't
help all the other Lucene consumers out there. I invested several
weeks upgrading our software at IBM to 3.0 APIs, because I had 5000
compile errors.
I think the Lucene backwards-compatibility policy is very strict
already and it often takes more time working on bw-compat than the
actual feature. With the additional requirement above this will get
worse, and I'm afraid it might slow down Lucene's progress.
I don't disagree that things like moving function queries from SOLR to
Lucene have failed - but we have to ask why they weren't added to
Lucene in the first place. Was there ever a discussion whether those
queries should be added to Lucene or SOLR when they were developed? I'd
also love to see a powerful facet engine in Lucene, with SOLR
building its faceting features on top of those APIs.
So I'm +1 for better communication (maybe even merging the dev lists) and
especially talking about where a new feature should live before
working on a patch.
Michael
On 2/28/10 2:57 AM, Michael McCandless wrote:
To make this more concrete, I think this is roughly what's being
proposed:
* Merging the dev lists into a single list.
* Merging committers.
* When a change is committed to Lucene, it must pass all Solr
tests.
* Release both at once.
These things would not change:
* Most importantly, the source code would remain factored into
separate dirs/modules.
* User's lists should remain separate.
* Web sites would remain separate.
* Solr & Lucene are still separate downloads, separate JARs,
separate subdirs in the source tree, etc.
The outside world still sees Solr & Lucene as separate entities. It's
only that they would now be developed/released in synchrony.
There are some important gains by doing this:
* Single source for all the code dup we now have across the
projects (my original reason, specifically on analyzers, for
starting this).
* Whenever a new feature is added to Lucene, we'd work through what
the impact is to Solr. This can still mean we separately develop
exposure in Solr, but it'd get us to at least more immediately
think about it.
* Solr is Lucene's biggest direct user -- most people who use Lucene
use it through Solr -- so having it more closely integrated means
we know sooner if we broke something.
* If we were integrated, I could test right now whether flex breaks
anything in Solr. I can't do that today since Solr isn't upgraded to 3.1.
Recent big changes (eg segment based searching, Version, attr based
tokenstream api) caused a lot of work in Solr that could've been much
smoother had Solr "been there" as we were working through them.
Recent new features, eg near-real-time search, which are unavailable
in Solr still, would have at least had some discussion about how to
expose this in Solr.
Over time (and we don't have to do this right on day 1) we can make
core capabilities available to pure Lucene. EG core Lucene users
should be able to use faceting, use a schema, etc.
I think this idea makes a lot of sense and I think now is a good time
to do it. Yes, this is a big change, but I think the gains are sizable.
As Lucene & Solr diverge more, it'll only become harder and harder to
merge.
Robert's massive patch on SOLR-1657, upgrading most of Solr's analyzers
to 3.0, is aging... while other changes to analyzers are being
proposed (SOLR-1799). If we were integrated (or at least single
source for analyzers), Robert would already have committed it.
Mike
On Fri, Feb 26, 2010 at 5:20 PM, Yonik Seeley
<[email protected]> wrote:
On Fri, Feb 26, 2010 at 5:15 PM, Steven A Rowe <[email protected]> wrote:
On 02/24/2010 at 2:20 PM, Yonik Seeley wrote:
I've started to think that a merge of Solr and Lucene would be in the
best interest of both projects.
The Sorlucene :) merger could be achieved virtually, i.e. via policy, rather
than physically merging:
Everything is virtual here anyway :-)
I agree with Mike that a single dev list is highly desirable. There
would still be separate downloads. What to do with some of the other
stuff is unspecified.
Committers would need to be merged though - that's the only way to
make a change across projects w/o breaking stuff.
-Yonik