Unfortunately it's a pretty domain-specific thing (URLs, content, etc.), and there are also limits at certain points (see ...), but we took CNN.com as a model, for example:

http://www.cnn.com/video_sitemap_index.xml
http://www.cnn.com/sitemap_videos_0001.xml

Then you just line up the big 3 w/ the static URLs, etc.

http://en.wikipedia.org/wiki/Sitemaps (the submission URLs are there)
http://www.bing.com/toolbox/posts/archive/2009/10/09/submit-a-sitemap-to-bing.aspx

In general though it's great to create custom handlers and use Velocity templates for pretty much anything, plus it's great for prototyping.

- Jon

On Mar 19, 2010, at 8:55 AM, Erik Hatcher wrote:

> Jon -
>
> Very cool use of VelocityResponseWriter!
>
> Would you happen to have a sitemap.vm template to contribute? I realize
> there'd need to be an external URL configurable, but this would be trivially
> added as a request parameter and leveraged in the template.
>
> Erik
>
> p.s. Anyone else using VelocityResponseWriter out there? Sitemaps is a
> great use of it. And also I've got a report of a big company in Brazil using
> it for e-mail generation of search results. I'm in the process of baking
> VrW into the main Solr example (it's there on trunk, basically) and more
> examples are better.
>
> On Mar 18, 2010, at 7:40 PM, Jon Baer wrote:
>
>> It's also possible to try and use the Velocity contrib response writer and
>> paging it w/ the sitemap elements.
>>
>> BTW generating a sitemap was a big reason of a switch we did from GSA to
>> Solr because (for some reason) the map took way too long to generate (even
>> simple requests).
>>
>> If you page through w/ Solr (i.e. rows=100&wt=velocity&v.template=sitemap) it's
>> fairly painless to build on cron.
>>
>> - Jon
>>
>> On Mar 18, 2010, at 6:25 PM, Chris Hostetter wrote:
>>
>>>
>>> : Been testing nutch to crawl for solr and I was wondering if anyone had
>>> : already worked on a system for getting the urls out of solr and generating
>>> : an XML sitemap for Google.
>>>
>>> it's pretty easy to just paginate through all docs in solr, so you could
>>> do that -- but I'd be really surprised if Nutch wasn't also logging all the
>>> URLs it indexed, so you could just post-process that log to build the
>>> sitemap as well.
>>>
>>>
>>> -Hoss
>>>
>>
>
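
For anyone who wants the index-plus-numbered-files layout from the CNN example without a Velocity template, here is a minimal Python sketch. It is only an illustration under some assumptions: the URL list stands in for whatever you collect by paging through Solr (or post-processing the Nutch log), the `sitemap_%04d.xml` file naming and the helper names are made up for this example, and the 50,000 figure is the per-file URL cap from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from itertools import islice

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50000  # per-file URL cap in the sitemaps.org protocol


def chunked(urls, size):
    """Yield successive lists of at most `size` URLs."""
    it = iter(urls)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def _build(root_tag, entry_tag, locations):
    """Build a sitemap-style document with one <loc> per entry."""
    root = ET.Element(root_tag, xmlns=SITEMAP_NS)
    for location in locations:
        entry = ET.SubElement(root, entry_tag)
        ET.SubElement(entry, "loc").text = location  # text is XML-escaped on output
    return ET.tostring(root, encoding="unicode")


def build_sitemap(urls):
    """Return a <urlset> document listing the given page URLs."""
    return _build("urlset", "url", urls)


def build_sitemap_index(sitemap_urls):
    """Return a <sitemapindex> document pointing at the sitemap files."""
    return _build("sitemapindex", "sitemap", sitemap_urls)


def build_all(urls, base_url, per_file=MAX_URLS_PER_FILE):
    """Split URLs into numbered sitemap files and build an index for them.

    Returns ({filename: xml_string}, index_xml_string). The file naming
    scheme is hypothetical, chosen only for this sketch.
    """
    files = {}
    for n, batch in enumerate(chunked(urls, per_file), start=1):
        files["sitemap_%04d.xml" % n] = build_sitemap(batch)
    base = base_url.rstrip("/")
    index = build_sitemap_index(base + "/" + name for name in files)
    return files, index
```

Writing each entry of the returned dict to disk from a cron job, plus the index, gives the same shape as the two CNN URLs above; the index is what you would then submit to the search engines.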