On 2010-03-24 16:15, Markus Jelsma wrote:
A bit off-topic, but how about Nutch grabbing some content and having it indexed in Solr?
The problem is not with collecting and submitting the documents; the problem is with parsing the MediaWiki markup embedded in the XML. WikipediaTokenizer from Lucene contrib/ is a quick and perhaps acceptable solution ...
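To illustrate the point, here is a minimal sketch using only the JDK's StAX parser. The class name WikiDumpText, the method extractText, and the inline sample page are all made up for illustration, and the sample is far simpler than a real dump. It shows that pulling the article body out of the XML is the easy part: what comes back is still raw wiki markup, which is the part that needs a tokenizer such as WikipediaTokenizer (not used here, since it requires the Lucene contrib jar).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Nutch, Solr, or Lucene.
public class WikiDumpText {

    /** Returns the raw content of every <text> element, wiki markup included. */
    public static List<String> extractText(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent character events so each body arrives as one string.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));

        List<String> texts = new ArrayList<>();
        while (reader.hasNext()) {
            // getLocalName() ignores any namespace prefix on the element.
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "text".equals(reader.getLocalName())) {
                // Reads up to the matching end tag; assumes text-only content.
                texts.add(reader.getElementText());
            }
        }
        reader.close();
        return texts;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up, heavily simplified page; real dumps carry many more fields.
        String xml = "<page><title>Lucene</title><revision><text>"
                + "'''Lucene''' is a [[search engine]] library."
                + "</text></revision></page>";
        for (String body : extractText(xml)) {
            // The XML layer is gone, but the ''' and [[ ]] markup remains.
            System.out.println(body);
        }
    }
}
```

The quotes and brackets survive the XML parse untouched, so whatever indexes this text still has to deal with the markup itself.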
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com