Actually, the robots.txt file should also disallow the 9.x guides. That won’t 
touch guide/latest.

User-agent: *
Disallow: /guide/9* 
Disallow: /guide/8* 
Disallow: /guide/7*
Disallow: /guide/6*

wunder
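One way to sanity-check these rules before deploying them is a minimal Python sketch that applies RFC 9309-style matching: `*` matches any run of characters, a trailing `$` anchors the end, and everything else is a plain prefix match. It assumes a single `User-agent: *` group with only Disallow lines. The paths such as /guide/8_11/aliases.html are made-up examples, so check them against the real site layout. Python's urllib.robotparser is not used here because, as far as I can tell from its source, it does a plain prefix match and treats `*` inside a path literally. That means it would not reproduce how the major crawlers read these lines.

```python
import re

# A copy of the rules proposed above (single "User-agent: *" group).
ROBOTS_TXT = """\
User-agent: *
Disallow: /guide/9*
Disallow: /guide/8*
Disallow: /guide/7*
Disallow: /guide/6*
"""


def disallow_patterns(robots_txt):
    """Collect Disallow values; assumes the file has one User-agent group."""
    patterns = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        field, _, value = line.partition(":")
        # An empty Disallow value means "allow everything", so skip it.
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns


def pattern_to_regex(pattern):
    """'*' matches any characters, a trailing '$' anchors the end,
    everything else is matched literally as a prefix."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path, patterns):
    # re.match anchors only at the start of the string: a prefix match.
    # With no Allow rules, "any rule matches" equals the RFC's
    # longest-match outcome.
    return any(pattern_to_regex(p).match(path) for p in patterns)


if __name__ == "__main__":
    rules = disallow_patterns(ROBOTS_TXT)
    # Hypothetical example paths.
    for path in ["/guide/8_11/aliases.html", "/guide/solr/latest/index.html"]:
        print(path, "blocked" if is_disallowed(path, rules) else "allowed")
```

Under these semantics, `Disallow: /guide/8` and `Disallow: /guide/8*` block exactly the same paths. The trailing `*` only documents the prefix match for human readers.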

> On Sep 21, 2023, at 2:38 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> 
> I’m actually OK with them being indexed. It could be helpful to search for 
> “Solr 8.11 aliases” or something like that.
> 
> The <priority> element in sitemap.xml should boost the default (latest)
> manual, and that shouldn’t require any web server config. I’m glad to craft
> a static sitemap.xml file. One generated from the guide would be better, but
> that can be a later improvement.
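Here is a minimal sketch of the static sitemap.xml idea using only Python's standard library. It emits <url>, <loc>, and <priority> entries in the sitemaps.org 0.9 namespace and rejects priorities outside the protocol's 0.0 to 1.0 range. The latest-guide URL comes from this thread. The 8.11 URL and both priority values are placeholders, not a proposal for the real site layout.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap.xml text for an iterable of (loc, priority) pairs."""
    # Declare the sitemap namespace as the default namespace on the root.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in pages:
        # The sitemaps.org protocol allows priorities from 0.0 to 1.0.
        if not 0.0 <= priority <= 1.0:
            raise ValueError(f"priority {priority} out of range for {loc}")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes '&' etc.
        # Written with one decimal place, e.g. "1.0" or "0.1".
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


if __name__ == "__main__":
    pages = [
        # Preferred: the latest reference guide (URL from this thread).
        ("https://solr.apache.org/guide/solr/latest/index.html", 1.0),
        # Hypothetical old-version URL, given a low priority.
        ("https://solr.apache.org/guide/8_11/index.html", 0.1),
    ]
    print(build_sitemap(pages), end="")
```

The output could be saved as sitemap.xml next to robots.txt under contents/ in the solr-site repo, assuming that directory is published as-is.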
> 
> To get the old versions completely out of the index, add a robots.txt file to 
> the solr-site repo under contents/ with these lines:
> 
> User-agent: *
> Disallow: /guide/8*
> Disallow: /guide/7*
> Disallow: /guide/6*
> 
> Note that the wildcards on the paths aren't needed, but they help humans
> understand that the disallows are prefix matches.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Sep 21, 2023, at 12:08 PM, Houston Putman <hous...@apache.org> wrote:
>> 
>> I've been trying to get this working for the last year. Basically our issue
>> is that the htaccess files do not add the right X-Robots-Tag header for old
>> ref guide pages.
>> 
>> https://github.com/apache/solr-site/blob/main/themes/solr/templates/htaccess.ref-guide-old#L1
>> 
>> This works locally, but on the actual Solr site the headers are not
>> returned. I have no idea why. Would love some help though, as I also hate
>> seeing the old ref guide in the Google results.
>> 
>> - Houston
>> 
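To compare what the local setup returns with what the live site returns, here is a small Python helper that sends a HEAD request and reports the X-Robots-Tag header. The demo runs against a throwaway local server. Its `noindex` value and /guide/8_11/ path are hypothetical stand-ins for whatever the htaccess template is meant to add. Pointing `fetch_x_robots_tag` at a real old ref guide URL shows whether the header actually reaches clients.

```python
import http.server
import threading
import urllib.request


def fetch_x_robots_tag(url, timeout=10.0):
    """HEAD the URL and return its X-Robots-Tag header, or None if absent."""
    # Ignore *_proxy environment variables so the origin's headers are seen
    # directly (drop the ProxyHandler if you must go through a proxy).
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, method="HEAD")
    with opener.open(request, timeout=timeout) as response:
        return response.headers.get("X-Robots-Tag")


class FakeOldGuideHandler(http.server.BaseHTTPRequestHandler):
    """Stand-in server: tags hypothetical /guide/8_11/ pages as noindex."""

    def do_HEAD(self):
        self.send_response(200)
        if self.path.startswith("/guide/8_11/"):
            self.send_header("X-Robots-Tag", "noindex")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


def run_local_check():
    """Return (old-guide header, latest-guide header) from the fake server."""
    server = http.server.HTTPServer(("127.0.0.1", 0), FakeOldGuideHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        base = f"http://127.0.0.1:{server.server_address[1]}"
        return (
            fetch_x_robots_tag(base + "/guide/8_11/index.html"),
            fetch_x_robots_tag(base + "/guide/solr/latest/index.html"),
        )
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_local_check())
```

If the same call against the production URL returns None while the local check returns a value, the header is being set by the template but not delivered by the deployed server.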
>> On Thu, Sep 21, 2023 at 11:30 AM Walter Underwood <wun...@wunderwood.org>
>> wrote:
>> 
>>> When I get web search results that include the Solr Reference Guide, I
>>> often get older versions (6.6, 7.4) in the results. I would prefer to
>>> always get the latest reference
>>> (https://solr.apache.org/guide/solr/latest/index.html).
>>> 
>>> I think we can list the URLs for that in a sitemap.xml file with a higher
>>> priority to suggest to the crawlers that these are the preferred pages.
>>> 
>>> I don’t see a sitemap.xml or sitemap.xml.gz at https://solr.apache.org/.
>>> 
>>> Should we prefer the latest manual? How do we build/deploy a sitemap? See:
>>> https://www.sitemaps.org/
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>> 
> 


