On 2010-03-24 16:15, Markus Jelsma wrote:
A bit off-topic, but how about Nutch grabbing some content and having it indexed
in Solr?

The problem is not collecting and submitting the documents; it is parsing the MediaWiki markup embedded in the XML. WikipediaTokenizer from Lucene contrib/ is a quick and perhaps acceptable solution ...
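
To illustrate the two layers involved (XML on the outside, wiki markup inside the text element), here is a rough Python sketch. It is not the WikipediaTokenizer approach; the sample page, the function names, and the regex cleanup are all assumptions made for illustration, and real markup (templates, tables, refs) needs far more than this.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified page in the style of a wiki XML export.
# Real dumps declare an XML namespace and carry many more elements.
SAMPLE = """<page>
  <title>Apache Solr</title>
  <revision>
    <text>'''Solr''' is a [[search engine|search platform]] built on [[Lucene]].</text>
  </revision>
</page>"""


def strip_markup(wikitext):
    """Crude cleanup that handles only internal links and bold/italic quotes."""
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", wikitext)
    # Remove the quote runs used for ''italic'' and '''bold'''
    text = re.sub(r"'{2,}", "", text)
    return text


def extract(page_xml):
    """Return (title, plain text) for one page element."""
    page = ET.fromstring(page_xml)
    title = page.findtext("title")
    body = page.findtext("revision/text") or ""
    return title, strip_markup(body)


if __name__ == "__main__":
    print(extract(SAMPLE))
```

The XML step is the easy part; the markup step is where the effort goes, which is why a ready-made tokenizer is attractive.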

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
