Hey Hoss, I support Mike's original suggestion of having a shared, independently maintained/released analysis package for Nutch/Solr/Lucene. I emphatically do not support merging Solr and Lucene in the way proposed.
Hope that clarifies things, at least from me.

Cheers,
Chris

On 3/1/10 11:43 AM, "Chris Hostetter" <[email protected]> wrote:

(Man, why is it you guys always decide to start the monolithic "let's redesign the world" threads while I'm offline for a few days ... I figured at worst I'd 'svn up' and discover that McCandless had reimplemented all of the indexing code in Scala, but I certainly wasn't expecting all of this.)

As someone who has attempted to read it all at once, let me just say that this thread is way too big. I say this not as a facetious comment about the number of messages or the depth of replies, but as a serious comment about the breadth and depth of the core issues that people seem to be trying to address in a monolithic fashion -- monolithic suggestions which are in many ways diametrically opposed to each other. Without obvious consensus on where we want to go, or a clear sense of how well things will work once we get "there", it seems most productive to focus on what would be needed to achieve some incremental steps that could be productive for any/all goals.

At its core: this thread started with McCandless's suggestion that refactoring some of the text analysis code out of Solr, Nutch, and Lucene-Java into a common code base would be beneficial to all three subprojects. Not only do I see no flaw in that reasoning, but it also seems like it would (oddly enough) serve as a good first step towards *either* tighter development integration between Lucene-Java and Solr, *OR* towards looser development of the two code bases (via making Solr a separate TLP). Developing a new code module like this should help demonstrate / exercise some of the "process" issues that might come up in trying to integrate the development and release processes of the existing products.
If things work out "well", that may illustrate that tighter integration is better; if things work out "poorly", that should also tell us something, and may give us guidance on how to move forward. In the worst-case scenario that I can imagine: some code is refactored out of Solr and Nutch in a way that makes it more directly usable by other consumers of Lucene-Java. (Even if Solr and Nutch never use that code, become their own TLPs, and secede from the ASF to become a Caribbean tax haven, that seems like a net win for Lucene-Java.)

To put the issue another way: does anyone see how McCandless's suggestion would be counter-productive towards your vision of what Lucene/Solr/Nutch should be like in the future? (regardless of what your particular vision is) ...

: I started here with analysis because I think that's the biggest pain
: point: it seemed like an obvious first step to fixing the code
: duplication and thus the most likely to reach some consensus. And
: it's also very timely: Robert is right now making all kinds of great
: fixes to our collective analyzers (in between bouts of fuzzy DFA
: debugging).

-Hoss

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory, Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
