It's also possible to use the Velocity contrib response writer and page through the results with a template that emits the sitemap elements.
BTW, generating a sitemap was a big reason for our switch from GSA to Solr, because (for some reason) the map took way too long to generate on GSA (even for simple requests). If you page through with Solr (i.e. rows=100&wt=velocity&v.template=sitemap), it's fairly painless to build on cron.

- Jon

On Mar 18, 2010, at 6:25 PM, Chris Hostetter wrote:

> : Been testing nutch to crawl for solr and I was wondering if anyone had
> : already worked on a system for getting the urls out of solr and generating
> : an XML sitemap for Google.
>
> it's pretty easy to just paginate through all docs in solr, so you could
> do that -- but I'd be really surprised if Nutch wasn't also logging all the
> URLs it indexed, so you could just post-process that log to build the
> sitemap as well.
>
> -Hoss
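
For anyone who would rather script the paging than write a Velocity template, here is a rough sketch of the same idea in Python. It is not the setup described above: it swaps the Velocity writer for wt=json and builds the XML on the client. The select URL, the field name "url", and the helper names (fetch_page, iter_urls, build_sitemap) are all assumptions made for illustration, so adjust them to your own schema and install.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def fetch_page(base_url, start, rows, url_field="url"):
    """Fetch one page of results from a Solr select handler as JSON.

    base_url and url_field are assumptions; use whatever your install exposes.
    """
    params = urllib.parse.urlencode({
        "q": "*:*",
        "fl": url_field,
        "start": start,
        "rows": rows,
        "wt": "json",
    })
    with urllib.request.urlopen(f"{base_url}?{params}") as resp:
        return json.load(resp)["response"]


def iter_urls(fetch, rows=100, url_field="url"):
    """Yield the URL field of every doc, paging with start/rows.

    fetch(start, rows) must return a dict with "numFound" and "docs".
    """
    start = 0
    while True:
        page = fetch(start, rows)
        docs = page["docs"]
        if not docs:
            break
        for doc in docs:
            if url_field in doc:
                yield doc[url_field]
        start += len(docs)
        if start >= page["numFound"]:
            break


def build_sitemap(urls):
    """Return a sitemap <urlset> document as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical endpoint; replace with your own select handler URL.
    solr = "http://localhost:8983/solr/select"
    urls = iter_urls(lambda start, rows: fetch_page(solr, start, rows))
    print(build_sitemap(urls))
```

Run from cron with the output redirected to a file. Note that the sitemap protocol caps a single file at 50,000 URLs, so a large index would need the output split across several files plus a sitemap index, which this sketch does not do.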