I note that there is a full download option available; that might be easier than crawling.
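If you go with the full download, you can skip crawling entirely. Walk the unpacked files and send each page to Solr yourself. Below is a minimal Python sketch of that idea. It rests on several assumptions, all hypothetical, so adjust them to your setup: the download unpacks to plain .htm/.html files, your schema has `id`, `title` and `text` fields, and Solr's JSON update handler is reachable at the URL shown. Nutch, as Markus suggests, or Solr's own extraction handler would be alternatives.

```python
import json
import os
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the <title> and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_doc(doc_id, html):
    """Turn one page into a Solr document dict.

    The field names id/title/text are assumptions; match them to your schema.
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return {"id": doc_id, "title": parser.title.strip(), "text": " ".join(parser.chunks)}


def collect_docs(root_dir):
    """Walk the unpacked download and build one document per .htm/.html file."""
    docs = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if name.lower().endswith((".htm", ".html")):
                path = os.path.join(dirpath, name)
                # Use the path relative to the download root as a stable unique id.
                rel_id = os.path.relpath(path, root_dir).replace(os.sep, "/")
                with open(path, encoding="utf-8", errors="replace") as fh:
                    docs.append(html_to_doc(rel_id, fh.read()))
    return docs


def post_to_solr(docs, update_url="http://localhost:8983/solr/update/json?commit=true"):
    """POST the documents as a JSON array to Solr.

    The default URL is an assumption (a local Solr 3.x with the JSON update
    handler enabled); change it to match your installation.
    """
    body = json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(
        update_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Usage would be something like `post_to_solr(collect_docs("/path/to/wikipedia-for-schools"))`. For a download this size you would probably want to send the documents in batches rather than in one request.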
François

On Sep 4, 2011, at 9:56 AM, Markus Jelsma wrote:

> Hi,
>
> Solr is a search engine, not a crawler. You can use Apache Nutch to crawl
> your site and have it indexed in Solr.
>
> Cheers,
>
>> Hi,
>>
>> I am new to Solr/Lucene, and have some problems trying to figure out the
>> best way to perform indexing. I think I understand the general principles,
>> but have some trouble translating this to my specific goal, which is the
>> following:
>>
>> I want to use SolR as a search engine based on general (English) keywords,
>> that has indexed Wikipedia for Schools
>> (http://www.soschildrensvillages.org.uk/charity-news/archive/2008/10/2008-wikipedia-for-schools).
>>
>> I initially thought that it would be sufficient to add the root document
>> (index.html) to Solr, after which everything would be automagically
>> indexed, but this does not seem to work. I have also tried to use
>> urldatasource in data-config.xml, but there I get a bit confused by the
>> settings.
>>
>> Could anyone help me understand how I can achieve my goal?
>>
>> Thanks
>>
>> Kees