Re: [VOTE] merge lucene/solr development

Grant Ingersoll Tue, 09 Mar 2010 07:27:31 -0800

Dropping priv...@.

On Mar 9, 2010, at 6:30 AM, mark harwood wrote:


> Another 2 cents late to the party.....
> 
>> I believe this is a question of identity.  What is Lucene?
> 
> Absolutely.
> I think one of the clearest differences in outlook between Lucene and Solr is 
> in the support for distributed deployments. Solr clearly aims to support 
> distributed deployments while Lucene is "just a library".
> Many index operations (faceting, search, top terms) that work in a 
> distributed fashion must be written differently to a single-index counterpart.
> If we do aim to share any distribution-capable functionality, will Lucene need a 
> brand new set of abstractions to avoid binding directly to the Solr server 
> platform? Is that at all realistic?
> I speak as someone else who needs to maintain a Lucene extension similar to 
> Solr, but where using Solr is not the answer so am keen for Lucene to 
> maintain independence.
> 
> Another potential big difference is any functionality that is 
> Solr-schema-aware. Again, would we need to introduce an abstraction for 
> schemas?
> 
> Maybe it's useful to consider what is fundamentally different between Solr 
> and Lucene (I suggest schema vs no schema and distributed vs local) and use 
> this to help put a limit on what functionality we consider sharing.
> If a function is untainted by a fundamental difference (e.g. Analyzers 
> typically couldn't care less about schemas or distribution) then that is a 
> candidate for sharing.
> 
> At the end of this process we get a good idea about what really can be shared.

I agree.  I maintain both Lucene and Solr instances.  Sometimes I need things 
in Solr that are in Lucene.  Sometimes I need things in Lucene that are 
in Solr.  In the Lucene instances I maintain/help with, I don't need the Solr 
server stuff.  So, to me, there will always need to be that distinction.  At 
the same time, it is very frustrating for me to write code that I know belongs 
in Lucene, but that I put into Solr for the sole fact that I need it for one of 
the Solr instances and simply can't afford to wait for Solr to be on the 
appropriate version of trunk.  Likewise, I may want something for Lucene from 
Solr but it is a fair amount of work to bring it up to the new Lucene APIs.

As for the sharing list, I started such a list on the other thread, but can 
duplicate here.

To me, there are at least the following:
1. Analyzers
2. Functions
3. Schema (although likely abstracted/reworked)
4. Warming/Reopen - this is hard code to get right and I've seen many people do 
it wrong.  It is also yet another area of duplication where something started 
in Solr b/c for years the Lucene community had no interest in donating code for 
it (incRef/decRef)
5. Faceting
6. Spatial

and on and on.  In fact, in my mind, it's pretty much everything other than 
stuff that is explicitly to do with Input/Output (Request Handlers, Response 
Writers)  and HTTP as the server mechanism.  Even with that list, though, I 
believe we can keep these separated enough that people can pick and choose.  In 
fact, your input, Mark, would be valuable in helping maintain that distinction.
As they say in the ASF, those who do, decide.

-Grant
