Solr Project Structure (was Re: Field Collapsing SOLR-236)

Mark Diggory Thu, 17 Jun 2010 16:45:15 -0700

Erik,

I try not to be exclusionary of other people's development tool choices in the 
selection of my own.  However, just to be sure to stir up a hornet's nest in true 
Apache fashion... when I saw what was done with the "templating" of the Maven 
pom work that was originally donated to Solr, I just cringed at it.  The point 
of using Maven as a build tool is to avoid exactly the complexity that was 
introduced by the "one-off" approach that was finally committed.

https://issues.apache.org/jira/browse/SOLR-19
https://issues.apache.org/jira/browse/SOLR-586

I chose to do the inverse of the Solr build process as a means to manage our 
own dependency on Solr in a Maven way, conventional to our current build and 
modularity practices.  I don't think Solr needs to adopt Maven if its developers 
prefer Ant; it just needs clearer lines drawn through the project about how to 
separate code by functional area, plus clear documentation of the interfaces 
that are meant to be customizable/changeable.  JIRA tasks should tackle core 
code changes separately from add-on functionality that can be swapped out or 
left behind, so as to avoid the risk of "spaghetti" interdependencies in the 
codebase.  And if using Ant, efforts should be made to avoid highly complex 
transformations of the source code and/or generated artifacts.  Ideally, source 
directories should have a one-to-one relationship to the artifacts that are 
produced.

SOLR-236 is a poster child for the lack of a clear practice or convention for 
how to package customizations to Solr.  Really, isn't SOLR-236 "wanted" enough 
to warrant actually residing in svn, where it could be developed properly, 
rather than as a task that's been open for "how many years"?!  I'd strongly 
recommend that the Field Collapsing prototype stop being managed as patches in 
a JIRA task and actually get some revision control behind it, with interim 
release builds made available.

I'll even confess that my "patch kludge" in my Maven project to apply SOLR-236 
to the Solr source is not at all a best practice in terms of supporting add-ons 
to Solr.  It was simply an attempt to compensate.  Ideally, Field Collapsing 
should have been a separately maintained codebase in a separate Maven project 
that did not interfere with the Solr core, request handler, or configuration 
implementations, and simply depended on them.  Then it could be dropped into 
the lib dir of any Solr 1.4.0 install (and, conversely, just added to my webapp 
poms as a Maven dependency when they are assembled in our own build processes).
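The separate-module arrangement described above can be sketched in plain Java. This is a minimal sketch, assuming a hypothetical core interface (`SearchPlugin`) that an add-on implements and a hypothetical `PluginLoader` that only knows class names read from configuration; none of these names are Solr's actual plugin API. The point it illustrates is that the core never references the add-on at compile time, so the add-on can ship as its own JAR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical core-side loader: it only sees class names (e.g. read from a config file)
// and has no compile-time reference to any add-on class.
final class PluginLoader {
    private PluginLoader() {}

    static List<SearchPlugin> load(List<String> classNames) {
        List<SearchPlugin> plugins = new ArrayList<>();
        for (String className : classNames) {
            try {
                Class<?> cls = Class.forName(className);
                // asSubclass throws ClassCastException if the configured class is not a SearchPlugin.
                plugins.add(cls.asSubclass(SearchPlugin.class).getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot load plugin " + className, e);
            }
        }
        return plugins;
    }

    public static void main(String[] args) {
        // In a deployment this name would come from configuration; getName() keeps the demo package-agnostic.
        List<SearchPlugin> plugins = load(List.of(FieldCollapsePlugin.class.getName()));
        System.out.println(plugins.get(0).name()); // prints field-collapse
    }
}

// Hypothetical extension point published by the "core" artifact.
interface SearchPlugin {
    String name();
}

// Hypothetical add-on; in practice it would ship in its own JAR and depend only on SearchPlugin.
class FieldCollapsePlugin implements SearchPlugin {
    @Override
    public String name() {
        return "field-collapse";
    }
}
```

The reflective lookup by class name is just one simple way to get "drop it in a lib dir" behavior; a DI container or `java.util.ServiceLoader` would serve the same purpose.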

Further comments below...

On Jun 17, 2010, at 11:20 AM, Erik Hatcher wrote:
> On Jun 16, 2010, at 7:31 PM, Mark Diggory wrote:
>> p.s. I'd be glad to contribute our Maven build re-organization back to the 
>> community to get Solr properly Mavenized so that it can be distributed and 
>> released more often.  For us the benefit of this structure is that we will 
>> be able to overlay addons such as RequestHandlers and other third party 
>> support without having to rebuild Solr from scratch.
> 
> But you don't have to rebuild Solr from scratch to add a new request handler 
> or other plugins - simply compile your custom stuff into a JAR and put it in 
> <solr-home>/lib (or point to it with <lib> in solrconfig.xml).

Touché!

>> Ideally, a Maven Archetype could be created that would allow one to rapidly 
>> produce a Solr webapp and fire it up in Jetty in mere seconds.
> How's that any different than cd example; java -jar start.jar?  Or do you 
> mean a Solr client webapp?

mvn package jetty:run

It's not much different, but it is different in that it's web-application and 
development-tool centric: there's no special startup code, just Maven+Jetty or 
your debugging environment firing up the WAR for testing.

>> Finally, with projects such as Bobo, integration with Spring would make 
>> configuration more consistent and require significantly less Java coding 
>> just to add new capabilities every time someone authors a new RequestHandler.
> 
> It's one line of config to add a new request handler.  How many ridiculously 
> ugly confusing lines of Spring XML would it take?

But if I have my own configuration for that request handler, how many lines of 
Java do I need to add/alter to get that configuration parsed from the Solr 
config and made available?  Even if it's just a few, IMO it's still the wrong 
way to be cutting the cake.
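One way to picture the alternative being argued for: if the core handed each handler its own raw parameters, a new property would require code only in the handler, not in the core's config parsing. A minimal sketch, assuming a hypothetical `ConfigurableHandler` interface and a plain `Map<String, String>` standing in for the handler's section of the config file (this is illustrative, not Solr's actual init contract):

```java
import java.util.Map;

// Hypothetical handler: adding a new property such as "groupLimit" touches only this class.
class CollapseHandler implements ConfigurableHandler {
    private String field = "id";
    private int groupLimit = 1;

    @Override
    public void init(Map<String, String> args) {
        // Each property falls back to its current default when absent from the config.
        field = args.getOrDefault("field", field);
        groupLimit = Integer.parseInt(args.getOrDefault("groupLimit", String.valueOf(groupLimit)));
    }

    String describe() {
        return field + ":" + groupLimit;
    }

    public static void main(String[] args) {
        CollapseHandler handler = new CollapseHandler();
        handler.init(Map.of("field", "site", "groupLimit", "3"));
        System.out.println(handler.describe()); // prints site:3
    }
}

// Hypothetical contract: the core passes a handler its own raw parameters untouched.
interface ConfigurableHandler {
    void init(Map<String, String> args);
}
```

The design choice is that the core treats handler configuration as opaque key/value pairs, so interpreting them is the handler's responsibility.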

> 
>> The biggest thing I learned about Solr in my work thus far is that patches 
>> like these could be standalone modules in separate projects if it weren't 
>> for having to hack the configuration and SolrJ methods up to adopt them.  
>> Which brings me to SolrJ: a great API, if it would stay generic and show less 
>> concern for adding methods each time some custom collections and query 
>> support for morelikethis or collapseddocs need to be added.
> I personally find it silly that we customize SolrJ for all these request 
> handlers anyway.  You get a decent navigable data structure back from general 
> SolrJ query requests as it is, there's no need to build in all these 
> convenience methods specific to all the Solr componetry.  Sure, it's 
> "convenient", but it's a maintenance headache and as you say, not generic.

It's an example of something I call a "policing bottleneck": the core code 
introduces a pattern for convenience that restricts the ability to add 
features to the application without "approval", i.e. consensus that the code 
contribution be part of the central API.  Thus, as long as a patch alters core 
code, you can't easily make the whole solution available to your users without 
complete approval.
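Erik's point about a "decent navigable data structure" suggests how to avoid that bottleneck on the client side: read a new component's output generically instead of waiting for a convenience method. A minimal sketch, assuming the response is modeled as nested `Map`s (a stand-in for SolrJ's generic response types, not their actual API) and a hypothetical `path` helper; the `collapse_counts` section is likewise a made-up example of add-on output:

```java
import java.util.List;
import java.util.Map;

final class GenericResponse {
    private GenericResponse() {}

    // Walks nested maps key by key; returns null if a key is missing or a step is not a map.
    static Object path(Map<String, ?> root, String... keys) {
        Object current = root;
        for (String key : keys) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical response containing a section contributed by an add-on component.
        Map<String, Object> response = Map.of(
                "response", Map.of("numFound", 42),
                "collapse_counts", Map.of("field", "site", "results", List.of("a", "b")));
        System.out.println(path(response, "collapse_counts", "field")); // prints site
        System.out.println(path(response, "response", "numFound"));    // prints 42
    }
}
```

Nothing in the client changes when a new component adds a new section; only the caller that cares about it needs to know its keys.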

> But hacking configuration is reasonable, I think, for adding in plugins.  I 
> guess you're aiming for some kind of Spring-like auto-discovery of plugins?  
> Yeah, maybe, but I'm pretty -1 on Spring coming into Solr.  It's overkill and 
> ugly, IMO.  But you like it :)  And that's cool by me, to each their own.

You're welcome to your opinions; I have some pretty strong ones in favor of 
Spring as well.  Hacking a configuration file is one thing; having to alter code 
to support new configuration properties creates a bottleneck in getting 
changes into the codebase.

1.) Having simple configuration is not really a benefit if it takes coding for 
your users to customize it, which they will, as seen in many of the patches 
provided to Solr.  Ultimately, this burdens your own efforts to maintain the 
application and delays the processing of tasks that contain good functional 
add-ons.

2.) Adopting Spring (or any other IoC/DI framework) forces your application's 
code instantiation and binding to be evaluated and refactored; this frees up how 
your application is assembled, allows its functional areas to be more cleanly 
separated, and hardens its interface contracts.  Doing so, in turn, allows 
your application to be assembled and reused inside other frameworks and 
environments, increasing your target user base and participation, and 
ultimately making your project more active and more successful.

So, I think the goal of any development activity around adopting a DI framework 
would be to free the application from being hard-bound to the configuration 
file, so those of us who choose to use other configuration tools can do so.  I 
even suspect that this might already be possible if the instantiation and 
binding of the objects that form a Solr application are sufficiently separate 
from those configuration classes in the first place.  A good area to explore.
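The separation argued for here can be shown in plain Java with no Spring dependency: if components receive their collaborators through constructors, the same object graph can be wired from a config file or by any other assembly tool. A minimal sketch, assuming hypothetical `Analyzer`, `SearchService`, and `ConfigAssembler` names and a `Map` standing in for parsed configuration; it does not describe how Solr is actually assembled today.

```java
import java.util.Locale;
import java.util.Map;

// One possible assembler: reads values from a config map (standing in for an XML file).
// A DI container such as Spring, or a unit test, could build the same graph without touching SearchService.
final class ConfigAssembler {
    private ConfigAssembler() {}

    static SearchService fromConfig(Map<String, String> config) {
        int rows = Integer.parseInt(config.getOrDefault("maxRows", "10"));
        return new SearchService(new LowercaseAnalyzer(), rows);
    }

    public static void main(String[] args) {
        SearchService fromFile = fromConfig(Map.of("maxRows", "20"));
        SearchService byHand = new SearchService(String::trim, 5); // wired directly in code
        System.out.println(fromFile.plan("Field Collapsing")); // prints field collapsing rows=20
        System.out.println(byHand.plan("  Solr  "));           // prints Solr rows=5
    }
}

// Hypothetical collaborator interface.
interface Analyzer {
    String analyze(String text);
}

class LowercaseAnalyzer implements Analyzer {
    @Override
    public String analyze(String text) {
        return text.toLowerCase(Locale.ROOT);
    }
}

// Constructor injection: SearchService neither parses configuration nor constructs its dependencies.
class SearchService {
    private final Analyzer analyzer;
    private final int maxRows;

    SearchService(Analyzer analyzer, int maxRows) {
        this.analyzer = analyzer;
        this.maxRows = maxRows;
    }

    String plan(String query) {
        return analyzer.analyze(query) + " rows=" + maxRows;
    }
}
```

Because `SearchService` is only bound to the configuration through the assembler, swapping the configuration tool means writing a different assembler, not editing the service.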

> 
> Oh, and Hi Mark! :)

Your Solr Blackbelt at code4lib was excellent, as were our conversations 
afterward.  :-)

Cheers,
Mark
