Lucene?

Uri Boness Tue, 02 Mar 2010 09:40:30 -0800

Hi,

Just found out about this discussion so I realize I'm stepping in rather late with my feedback... still, for what it's worth, here it is :-).

In general I'm against this proposal as I believe it can cause more harm than good. The way I (and many others) see Lucene is as a separate effort from Solr. I'm a *big* fan of Solr and (as some of you may know) I'm using it daily and promoting it where/when I can. That said, I'm also a big fan of Lucene and I believe Solr has its value and use cases while Lucene has its own.

Joining Solr with Lucene has the potential of creating a "virtual" monopoly over Solr-like solutions built on top of Lucene, which is not community friendly, but more importantly it puts the competition for Solr in jeopardy. IMO competition is a key advantage for products/projects. Yes, there is competition that will always come from the commercial vendors, but competition and challenges must also come from the open source community. This is a big part of what drives innovation. Furthermore, the community and the users of Lucene should have the power/ability to decide on which solutions they want to go for - this is the true community-driven development way.

I fully agree that there is a lot of duplication in the work that is currently being done in Solr. But it mainly originates in Solr, not in Lucene, and the Lucene community should not be bothered by that. Such duplicate work should be addressed in the Solr project. So for example, take the analysis code... if all the work that has gone into the analyzers in Solr had been committed to Lucene from the start, there wouldn't have been any duplication. Same goes for the spatial support or other duplicate work. Solr development has certainly proven to push Lucene development in many ways, and the best way to handle it is to contribute all this goodness back to Lucene. And yes, it means that Solr releases will need to wait for official Lucene releases, or in the meantime have their own custom Lucene distributions, but this is the fair play that all Lucene-based solutions (be it Katta, ElasticSearch, Sensei, or any other) will have to deal with.

 Merging committers.

I believe this will create a proliferation of committers on these projects, which can bring a lot of mess. Let Lucene committers focus on what they do and know best - which is Lucene, and let Solr committers focus on Solr. If a Solr committer can bring a lot of value to Lucene, then yes, sure, make him/her a Lucene committer, but IMO being a Solr committer doesn't automatically give anyone the credentials or the skills to be a Lucene committer... mainly because the work done in Solr is often at a higher level and often not related to Lucene at all.

Single source for all the code dup we now have across the
    projects (my original reason, specifically on analyzers, for
    starting this).

As mentioned above, this can easily be done by contributing the changes to the analyzers back to Lucene.

Whenever a new feature is added to Lucene, we'd work through what
    the impact is to Solr.  This can still mean we separately develop
    exposure in Solr, but it'd get us to at least more immediately
    think about it.

This is something that Solr committers need to be responsible for, not Lucene committers. Lucene committers need to make sure that Lucene works and is bug free. I don't think it makes sense to push Solr responsibilities on to Lucene committers.

Solr is Lucene's biggest direct user -- most people who use Lucene
    use it through Solr -- so having it more closely integrated means
    we know sooner if we broke something.

I disagree here. I believe Lucene still has a larger install base than Solr. Think of Jackrabbit, which uses Lucene directly, and all the CMSs that use Jackrabbit. Think of frameworks like Compass and Hibernate Search (that use Lucene directly) which are used in a lot of JEE deployments around the world. And certainly there are a lot of large infrastructures that use Lucene directly as well (as in LinkedIn, for example). Solr is great in what it does but it is certainly not everything when it comes to open source search or Lucene.

Right now I could test whether flex breaks anything in Solr.  I
    can't do that now since Solr isn't upgraded to 3.1.

True, but again, this is an issue Solr committers will have to deal with. And yes, it means that Solr will almost always be one step behind Lucene, but that's how it works with every dependency on every library you use. If you want to test the flex stuff and it's currently being developed as a separate Lucene branch, then you can create a separate Solr branch to see how it works and what future changes might need to go into Solr. Again, Lucene committers shouldn't be bothered with this problem and the development of Lucene shouldn't be affected by Solr-related issues.

Also take into account the huge difference in the release cycles between the projects. Lucene has quite a steady release cycle (last year it was quite constant, with a release every 3 months or so). Solr, on the other hand, has longer release cycles that can span more than a year. A lot of the issues that stall Solr releases have nothing to do with Lucene, and the Lucene release cycle shouldn't suffer from that. Furthermore, users/projects/products that use Lucene directly should not suffer from that either. All the goodness that is developed in Lucene and all the bug fixes should be available to Lucene users to download as soon as they're ready - they don't need to suffer from any Solr-related issues.

Please rest assured that my goal here is not to step on anyone's toes. I'm not a committer on either project but I certainly want to see these two projects go in the right direction (at least the direction I believe is right). So I just wanted to express my concerns here.


Keep up the good work!

Cheers,
Uri

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?