Lucene?

Mark Miller Mon, 01 Mar 2010 11:28:07 -0800

On 03/01/2010 01:43 PM, Chris Hostetter wrote:

(Man, why is it you guys always decide to start the monolithic
"let's redesign the world" threads while i'm offline for a few days ...
I figured at worst I'd 'svn up' and discover that McCandless had
reimplemented all of the indexing code in Scala, but i certainly wasn't
expecting all of this.)


As someone who has attempted to read it all at once, let me just say that
this thread is way too big.

I say this not as a facetious comment about the number of messages or the
depth of replies but as a serious comment about the breadth and depth of
the core issues that people seem to be trying to address in a monolithic
fashion -- monolithic suggestions which are in many ways diametrically
opposed to each other.

Personally, I don't think the idea of a merge is too big. I think the implications of it are less than you are making them out to be. Monolithic suggestions? Let's half merge? Let's draft a resolution indicating that both Lucene and Solr devs would like to possibly play nicer together with more communication? I don't think there are a lot of baby steps towards this goal that will have any meaning or ramifications.

Without obvious consensus on where we want to go, or a clear sense of
how well things will work when we get "there" it seems most productive
to focus on what would be needed to achieve some incremental steps that
could be productive for any/all goals.

That sounds like magic to me :) Or focusing on stuff that has nothing to do with a merge or TLP.

At its core: this thread started with McCandless's suggestion that
refactoring some of the text analysis code from Solr, Nutch and Lucene-Java
out of all three projects and into a common code base would be beneficial
to all three subprojects -- Not only do I see no flaw to that reasoning,
but it also seems like it would (oddly enough) serve as a good first step
towards *either* tighter development integration between Lucene-Java and
Solr, *OR* towards looser development of the two code bases (via making
Solr a separate TLP).

Developing a new code module like this should help demonstrate / exercise
some of the "process" issues that might come up in trying to integrate the
development and release processes of the existing products.  If things
work out "well" that may illustrate that tighter integration is better; if
things work out "poorly" that should also tell us something, and may give
us guidance on how to move forward.  In the worst case scenario that i can
imagine: some code is refactored out of Solr and Nutch in a way that makes
it more directly usable by other consumers of Lucene-Java.  (Even if Solr
and Nutch never use that code and become their own TLPs and secede from
the ASF to become a Caribbean tax haven, that seems like a net win for
Lucene-Java)

To put the issue another way: Does anyone see how McCandless's suggestion
would be counter-productive towards your vision of what Lucene/Solr/Nutch
should be like in the future? (regardless of what your particular vision is)

No, not necessarily - but I don't think it's going to tell us anything useful about a merge. It's just going to factor out some analyzers into what is likely going to be yet *another* project with more "do we run on trunk" or "don't we" issues. Or it will be a Lucene contrib, and cause us even more headaches due to Solr not running on trunk.

                        ...

: I started here with analysis because I think that's the biggest pain
: point: it seemed like an obvious first step to fixing the code
: duplication and thus the most likely to reach some consensus.  And
: it's also very timely: Robert is right now making all kinds of great
: fixes to our collective analyzers (in between bouts of fuzzy DFA
: debugging).



-Hoss



--
- Mark

http://www.lucidimagination.com

Re: Factor out a standalone, shared analysis package for Nutch/Solr/Lucene?
