It's also possible to use the Velocity contrib response writer and page 
through the results with a template that emits the sitemap elements.

BTW, generating a sitemap was a big reason we switched from GSA to Solr: 
for some reason, the map took way too long to generate on GSA (even for 
simple requests).

If you page through with Solr (i.e. rows=100&wt=velocity&v.template=sitemap), 
it's fairly painless to build from cron.
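
For anyone who would rather not set up a Velocity template, here is a rough 
sketch of the same paging idea done client-side. Note that it swaps in the 
JSON response writer (wt=json) and builds the XML in Python instead of using 
wt=velocity. The Solr address, the "url" field name, and all function names 
are assumptions for illustration, so adjust them to your schema.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumptions: a Solr select handler at this address, and a stored,
# single-valued field called "url" holding each page's address.
SOLR_SELECT = "http://localhost:8983/solr/select"
URL_FIELD = "url"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def fetch_page(start, rows):
    """Fetch one page of docs from Solr using the JSON response writer."""
    params = urllib.parse.urlencode({
        "q": "*:*",
        "fl": URL_FIELD,
        "start": start,
        "rows": rows,
        "wt": "json",
    })
    with urllib.request.urlopen(SOLR_SELECT + "?" + params) as resp:
        return json.load(resp)["response"]["docs"]


def iter_urls(fetch=fetch_page, rows=100):
    """Yield every URL, advancing `start` by `rows` until a page is empty."""
    start = 0
    while True:
        docs = fetch(start, rows)
        if not docs:
            break
        for doc in docs:
            if URL_FIELD in doc:
                yield doc[URL_FIELD]
        start += rows


def build_sitemap(urls):
    """Return a sitemap <urlset> document as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap(iter_urls()))
```

Run from cron with stdout redirected to a file. The sitemap protocol caps a 
single file at 50,000 URLs, so a larger index would need to be split across 
several files plus a sitemap index.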

- Jon

On Mar 18, 2010, at 6:25 PM, Chris Hostetter wrote:

> 
> : Been testing nutch to crawl for solr and I was wondering if anyone had
> : already worked on a system for getting the urls out of solr and generating
> : an XML sitemap for Google.
> 
> it's pretty easy to just paginate through all docs in solr, so you could 
> do that -- but I'd be really surprised if Nutch wasn't also logging all the 
> URLs it indexed, so you could just post-process that log to build the 
> sitemap as well.
> 
> 
> 
> -Hoss
> 
