Re: Generating a sitemap

Jon Baer Fri, 19 Mar 2010 06:15:45 -0700

Unfortunately it's actually a pretty domain-specific thing (URLs, content, 
etc.), and there are also limits at certain points (see ...), but we took 
CNN.com as a model, for example:


http://www.cnn.com/video_sitemap_index.xml
http://www.cnn.com/sitemap_videos_0001.xml
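
The index-plus-numbered-files layout above can be sketched roughly as follows. This is a minimal illustration, not CNN's or anyone's actual setup: the function names, the `sitemap_NNNN.xml` file naming, and the `base_url` argument are all made up for the example, and the default cap of 50,000 URLs per file is the one from the sitemaps.org protocol, which may or may not be the limit meant above.

```python
from xml.sax.saxutils import escape

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk(urls, size):
    """Split a list of URLs into consecutive lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap(urls):
    """Render one <urlset> document listing the given page URLs."""
    entries = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<urlset xmlns="{SITEMAP_NS}">\n{entries}</urlset>\n'
    )


def build_index(sitemap_urls):
    """Render a <sitemapindex> document pointing at the individual sitemap files."""
    entries = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<sitemapindex xmlns="{SITEMAP_NS}">\n{entries}</sitemapindex>\n'
    )


def build_all(urls, base_url, max_urls=50000):
    """Return {filename: xml} for the numbered sitemap files plus the index.

    `base_url` is a hypothetical public prefix under which the files are served.
    """
    files = {}
    for n, part in enumerate(chunk(urls, max_urls), start=1):
        files[f"sitemap_{n:04d}.xml"] = build_sitemap(part)
    index_locs = [f"{base_url}/{name}" for name in files]
    files["sitemap_index.xml"] = build_index(index_locs)
    return files
```

Calling `build_all(urls, "http://www.example.com")` gives back a dict that can be written to disk file by file; only the index URL then needs to be submitted.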

Then you just line up the big 3 search engines w/ the static URLs, etc.

http://en.wikipedia.org/wiki/Sitemaps (the submission URLs are there)
http://www.bing.com/toolbox/posts/archive/2009/10/09/submit-a-sitemap-to-bing.aspx

In general though it's great to create custom handlers and use Velocity 
templates for pretty much anything, and it's great for prototyping.
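
For the paging approach mentioned in the earlier message quoted below (rows=100&wt=velocity&v.template=sitemap), here is a rough sketch of generating the per-page request URLs that a cron job could fetch. The select URL, the `q=*:*` query, and the function name are assumptions for illustration; `num_found` would come from an initial query against the index, and a `sitemap.vm` template is assumed to exist on the Solr side.

```python
from urllib.parse import urlencode


def page_urls(solr_select_url, num_found, rows=100, template="sitemap"):
    """Yield one request URL per page of `rows` documents.

    solr_select_url: hypothetical endpoint, e.g. "http://localhost:8983/solr/select"
    num_found:       total number of matching docs, taken from a first query
    """
    for start in range(0, num_found, rows):
        params = {
            "q": "*:*",              # assumption: every indexed doc goes in the sitemap
            "start": start,          # offset of the first doc on this page
            "rows": rows,            # page size
            "wt": "velocity",        # render via VelocityResponseWriter
            "v.template": template,  # name of the template to apply
        }
        yield f"{solr_select_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URLs a cron job would fetch for a 250-document index.
    for url in page_urls("http://localhost:8983/solr/select", num_found=250):
        print(url)
```

Each fetched page would then be saved as one sitemap fragment or file, depending on how the template is written.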

- Jon

On Mar 19, 2010, at 8:55 AM, Erik Hatcher wrote:

> Jon -
> 
> Very cool use of VelocityResponseWriter!
> 
> Would you happen to have a sitemap.vm template to contribute?   I realize 
> there'd need to be an external URL configurable, but this would be trivially 
> added as a request parameter and leveraged in the template.
> 
>       Erik
> 
> p.s. Anyone else using VelocityResponseWriter out there?   Sitemaps is a 
> great use of it.  And also I've got a report of a big company in Brazil using 
> it for e-mail generation of search results.   I'm in the process of baking 
> VrW into the main Solr example (it's there on trunk, basically) and more 
> examples are better.
> 
> On Mar 18, 2010, at 7:40 PM, Jon Baer wrote:
> 
>> It's also possible to use the Velocity contrib response writer and 
>> page through the sitemap elements with it.
>> 
>> BTW generating a sitemap was a big reason for the switch we made from GSA to 
>> Solr because (for some reason) the map took way too long to generate (even 
>> for simple requests).
>> 
>> If you page through w/ Solr (i.e. rows=100&wt=velocity&v.template=sitemap) it's 
>> fairly painless to build on cron.
>> 
>> - Jon
>> 
>> On Mar 18, 2010, at 6:25 PM, Chris Hostetter wrote:
>> 
>>> 
>>> : Been testing nutch to crawl for solr and I was wondering if anyone had
>>> : already worked on a system for getting the urls out of solr and generating
>>> : an XML sitemap for Google.
>>> 
>>> it's pretty easy to just paginate through all docs in solr, so you could
>>> do that -- but I'd be really surprised if Nutch wasn't also logging all the
>>> URLs it indexed, so you could just post-process that log to build the
>>> sitemap as well.
>>> 
>>> 
>>> 
>>> -Hoss
>>> 
>> 
>
