Thanks to everyone who responded. No wonder I was getting confused: I was focusing on completely the wrong half of the equation.
I had a cursory look through some of the available Nutch documentation, and it looks promising. Thanks, everyone.

Mark

On Tue, Dec 7, 2010 at 10:19 PM, webdev1977 <webdev1...@gmail.com> wrote:
>
> In my experience, the hardest (but most flexible) part is exactly what was
> mentioned: processing the data. Nutch has a really easy plugin interface
> that you can use, and the example plugin is a great place to start. Once
> you have the raw parsed text, you can do whatever you want with it. For
> example, I wrote a plugin to add geospatial information to my
> NutchDocument. You then map the fields you added in the NutchDocument to
> the fields you want Solr to index. In my case, I created a geography field
> where I put lat/lon info. Then you create that same geography field in the
> Nutch-to-Solr mapping file as well as in your Solr schema.xml file. Then,
> when you run the crawl and tell it to use "solrindex", it will send the
> document to Solr to be indexed. Since your new field is in the schema,
> Solr knows what to do with it at index time. Now you can build a user
> interface around whatever you want to do with that field.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Newbie-need-a-point-in-the-right-direction-tp2031381p2033687.html
> Sent from the Solr - User mailing list archive at Nabble.com.
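The plugin-then-mapping flow described in the quoted message can be sketched in plain Java. This is a minimal illustration under stated assumptions: documents are modeled as simple string maps, and `GeoIndexingSketch`, `addGeography`, `mapToSolr`, and `checkAgainstSchema` are hypothetical names. They are not the Nutch `IndexingFilter`/`NutchDocument` API, the real solrindex mapping reader, or Solr's own schema validation. The sketch only shows the order of the steps: a plugin adds the field, a mapping renames fields for Solr, and the schema must declare the field for indexing to succeed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are NOT the Nutch or Solr classes.
public class GeoIndexingSketch {

    // Step 1 (plugin): copy the parsed document and add a "geography" field holding "lat,lon".
    static Map<String, String> addGeography(Map<String, String> doc, double lat, double lon) {
        Map<String, String> out = new LinkedHashMap<>(doc);
        out.put("geography", lat + "," + lon);
        return out;
    }

    // Step 2 (mapping file): rename Nutch field names to Solr field names.
    // In this sketch, fields without a mapping entry keep their original name.
    static Map<String, String> mapToSolr(Map<String, String> doc, Map<String, String> mapping) {
        Map<String, String> solrDoc = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            solrDoc.put(mapping.getOrDefault(e.getKey(), e.getKey()), e.getValue());
        }
        return solrDoc;
    }

    // Step 3 (schema.xml): reject any field the hypothetical schema does not declare.
    static void checkAgainstSchema(Map<String, String> solrDoc, Set<String> schemaFields) {
        for (String field : solrDoc.keySet()) {
            if (!schemaFields.contains(field)) {
                throw new IllegalArgumentException("unknown field '" + field + "'");
            }
        }
    }

    public static void main(String[] args) {
        // A parsed page, reduced to two fields for the example.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("url", "http://example.com/");
        doc.put("content", "parsed text");

        Map<String, String> withGeo = addGeography(doc, 51.5, -0.12);

        // Hypothetical mapping: url becomes Solr's id; geography keeps its name.
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("url", "id");
        mapping.put("geography", "geography");

        Map<String, String> solrDoc = mapToSolr(withGeo, mapping);

        // The geography field must also be declared in the schema.
        Set<String> schema = new HashSet<>(Arrays.asList("id", "content", "geography"));
        checkAgainstSchema(solrDoc, schema);

        // prints {id=http://example.com/, content=parsed text, geography=51.5,-0.12}
        System.out.println(solrDoc);
    }
}
```

Removing "geography" from the `schema` set makes `checkAgainstSchema` throw, which mirrors the point in the quoted message that the new field has to exist in both the mapping and the schema before indexing works.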