Hi,
Just found out about this discussion so I realize I'm stepping in rather
late with my feedback... still for what it's worth, here it is :-).
In general I'm against this proposal as I believe it can cause more
harm than good. The way I (and many others) see Lucene is as a separate
effort from Solr. I'm a *big* fan of Solr and (as some of you may know)
I'm using it daily and promoting it where/when I can. That said, I'm
also a big fan of Lucene and I believe Solr has its value and use cases
while Lucene has its own.
Joining Solr with Lucene has the potential of creating a "virtual"
monopoly over Solr-like solutions built on top of Lucene which is not
community friendly but more importantly it puts the competition for Solr
in jeopardy. IMO competition is a key advantage for products/projects.
Yes, there is competition that will always come from the commercial
vendors, but competition and challenges must also come from the open
source community. This is a big part of what drives innovation.
Furthermore, the community and the users of Lucene should have the
power/ability to decide on which solutions they want to go for - this is
the true community-driven way of development.
I fully agree that there is a lot of duplication in the work that is
currently being done in Solr. But it mainly originates in Solr, not in
Lucene and the Lucene community should not be bothered by that. Such
duplicate work should be addressed in the Solr project. So for example,
take the analysis code... if all the work that has gone into the
analyzers in Solr had been committed to Lucene from the start,
there wouldn't have been any duplication. Same goes for the spatial support
or other duplicate work. Solr development has certainly proven to push
Lucene development in many ways, and the best way to handle it is to
contribute back all this goodness to Lucene. And yes, it means that Solr
releases will need to wait for official Lucene releases, or in the
meantime have their own custom Lucene distributions, but this is the fair
play that all Lucene-based solutions (be it Katta, ElasticSearch,
Sensei, or any other) will have to deal with.
> Merging committers.
I believe this will create a proliferation of committers on these
projects which can bring a lot of mess. Let Lucene committers focus on
what they do and know best - which is Lucene, and let Solr committers
focus on Solr. If a Solr committer can bring a lot of value to Lucene,
then yes, sure, make him/her a Lucene committer, but IMO being a Solr
committer doesn't automatically give anyone the credentials or the
skills to be a Lucene committer... mainly because the work done in Solr
is often at a higher level and often not related to Lucene at all.
> Single source for all the code dup we now have across the
> projects (my original reason, specifically on analyzers, for
> starting this).
As mentioned above, this can easily be done by contributing the changes
to the analyzers back to Lucene.
> Whenever a new feature is added to Lucene, we'd work through what
> the impact is to Solr. This can still mean we separately develop
> exposure in Solr, but it'd get us to at least more immediately
> think about it.
This is something that Solr committers need to be responsible for, not
Lucene committers. Lucene committers need to make sure that Lucene works
and is bug free. I don't think it makes sense to push Solr
responsibilities on to Lucene committers.
> Solr is Lucene's biggest direct user -- most people who use Lucene
> use it through Solr -- so having it more closely integrated means
> we know sooner if we broke something.
I disagree here. I believe Lucene still has a larger install base than
Solr. Think of Jackrabbit which uses Lucene directly and all the CMSs
that use Jackrabbit. Think of frameworks like Compass and Hibernate
Search (that use Lucene directly) which are used in a lot of JEE
deployments around the world. And certainly there are a lot of large
infrastructures that use Lucene directly as well (as in LinkedIn for
example). Solr is great in what it does but it is certainly not
everything when it comes to open source search or Lucene.
> Right now I could test whether flex breaks anything in Solr. I
> can't do that now since Solr isn't upgraded to 3.1.
True, but again, this is an issue Solr committers will have to deal
with. And yes, it means that Solr will almost always be one step behind
Lucene, but that's how it works with every dependency on every library
you use. If you want to test the flex stuff and it's currently being
developed as a separate Lucene branch, then you can create a separate
Solr branch to see how it works and what future changes might need to go
into Solr. Again, Lucene committers shouldn't bother with this problem
and the development of Lucene shouldn't be affected by Solr-related
issues.
Also take into account the huge difference in the release cycles between
the projects. Lucene has quite a steady release cycle (last year it was
quite constant on a release every 3 months or so). Solr on the other
hand, has longer release cycles that can span more than a year. A lot of
the issues that stall Solr releases have nothing to do with Lucene and
the Lucene release cycle shouldn't suffer from that. Furthermore,
users/projects/products that use Lucene directly should not suffer from
that either. All the goodness that is developed in Lucene and all the
bug fixes should be available to Lucene users to download as soon as
they're ready - they don't need to suffer from any Solr related issues.
Please rest assured that my goal here is not to step on anyone's toes.
I'm not a committer on either project but I certainly want to see these
two projects go in the right direction (at least the direction I believe is
right). So just wanted to express my concerns here.
Keep up the good work!
Cheers,
Uri