: Been testing nutch to crawl for solr and I was wondering if anyone had
: already worked on a system for getting the urls out of solr and generating
: an XML sitemap for Google.
It's pretty easy to just paginate through all the docs in Solr, so you could do that. But I'd be really surprised if Nutch wasn't also logging all the URLs it indexed, so you could just post-process that log to build the sitemap as well.

-Hoss
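A minimal sketch of the "paginate through all docs in Solr" approach, in Python with only the standard library. It assumes a few things that are not in the original thread. Each document stores its page address in a single-valued stored field named `url`. The standard `/select` handler is reachable and accepts `start`/`rows` paging with `wt=json`. The index does not change while you page through it. The helper names `solr_fetch`, `iter_urls`, and `build_sitemap` are hypothetical. `build_sitemap` accepts any iterable of URLs, so it would work just as well on URLs pulled out of a Nutch log.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterable, Iterator
from xml.sax.saxutils import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def solr_fetch(base_url: str, params: dict) -> dict:
    """Call <base_url>/select with the given params and decode the JSON reply.

    base_url is assumed to point at the core, e.g. http://host:8983/solr/<core>.
    """
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{base_url}/select?{query}") as resp:
        return json.load(resp)


def iter_urls(fetch: Callable[[dict], dict],
              url_field: str = "url",  # assumed field name
              rows: int = 500) -> Iterator[str]:
    """Page through every doc with start/rows and yield the stored URL field."""
    start = 0
    while True:
        params = {"q": "*:*", "fl": url_field, "start": start,
                  "rows": rows, "wt": "json"}
        docs = fetch(params)["response"]["docs"]
        for doc in docs:
            if url_field in doc:  # skip docs that lack the field
                yield doc[url_field]
        if len(docs) < rows:  # a short page means we've reached the end
            return
        start += rows


def build_sitemap(urls: Iterable[str]) -> str:
    """Render URLs as a sitemaps.org <urlset> document."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             f'<urlset xmlns="{SITEMAP_NS}">']
    for u in urls:
        # Entity-escape &, <, > plus both quote characters.
        loc = escape(u, {"'": "&apos;", '"': "&quot;"})
        lines.append(f"  <url><loc>{loc}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)
```

For example, `build_sitemap(iter_urls(lambda p: solr_fetch("http://localhost:8983/solr/mycore", p)))` returns the sitemap as a string. The host and core name here are placeholders. The sitemaps.org protocol caps a single sitemap file at 50,000 URLs, so a larger index would need several files plus a sitemap index. For big indexes, deep `start` offsets also get slow, so you may want to sort on the unique key and page with a range query instead.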