Actually, the robots.txt file should also disallow the 9.x guides. That won’t touch guide/latest.

User-agent: *
Disallow: /guide/9*
Disallow: /guide/8*
Disallow: /guide/7*
Disallow: /guide/6*

wunder

> On Sep 21, 2023, at 2:38 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>
> I’m actually OK with them being indexed. It could be helpful to search for
> “Solr 8.11 aliases” or something like that.
>
> The priority attribute in sitemap.xml should boost the default, latest manual
> and that shouldn’t require any web server config. I’m glad to craft a static
> sitemap.xml file. One generated from the guide would be better, but that can
> be a later improvement.
>
> To get the old versions completely out of the index, add a robots.txt file to
> the solr-site repo under contents/ with these lines:
>
> User-agent: *
> Disallow: /guide/8*
> Disallow: /guide/7*
> Disallow: /guide/6*
>
> Note that the wildcards on the paths aren't needed, but they help humans
> understand that the disallows are a prefix match.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>> On Sep 21, 2023, at 12:08 PM, Houston Putman <hous...@apache.org> wrote:
>>
>> I've been trying to get this working for the last year. Basically our issue
>> is that the htaccess files do not add the right X-Robots-Tag header for old
>> ref guide pages.
>>
>> https://github.com/apache/solr-site/blob/main/themes/solr/templates/htaccess.ref-guide-old#L1
>>
>> This works locally, but in the actual Solr site, the headers are not
>> returned. I have no idea why. Would love some help though, as I also hate
>> seeing the old ref guide in the google results.
>>
>> - Houston
>>
>> On Thu, Sep 21, 2023 at 11:30 AM Walter Underwood <wun...@wunderwood.org>
>> wrote:
>>
>>> When I get web search results that include the Solr Reference Guide, I
>>> often get older versions (6.6, 7.4) in the results. I would prefer to
>>> always get the latest reference (
>>> https://solr.apache.org/guide/solr/latest/index.html).
>>>
>>> I think we can list the URLs for that in a sitemap.xml file with a higher
>>> priority to suggest to the crawlers that these are the preferred pages.
>>>
>>> I don’t see a sitemap.xml or sitemap.xml.gz at https://solr.apached.org <
>>> https://solr.apached.org/>.
>>>
>>> Should we prefer the latest manual? How do we build/deploy a sitemap? See:
>>> https://www.sitemaps.org/
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/ (my blog)
>>>
>>>
>
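
For anyone who wants to sanity-check the prefix-match behavior before a robots.txt goes into the solr-site repo, here is a minimal sketch using Python's standard urllib.robotparser. Some assumptions: the rules are written without the trailing "*" (the stdlib parser does plain prefix matching and, as far as I know, does not implement the wildcard extension), the old-guide paths such as /guide/8_11/aliases.html are made up for illustration, and "ExampleBot" is a hypothetical crawler name. Real crawlers may behave differently, so treat this as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Rules from the message, minus the trailing "*" on each path.
# Assumption: without wildcards these are still prefix matches.
ROBOTS_TXT = """\
User-agent: *
Disallow: /guide/9
Disallow: /guide/8
Disallow: /guide/7
Disallow: /guide/6
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the hypothetical crawler `agent` may fetch `path`."""
    return parser.can_fetch(agent, "https://solr.apache.org" + path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # The first path is from the thread; the other two are illustrative.
    for path in (
        "/guide/solr/latest/index.html",
        "/guide/8_11/aliases.html",
        "/guide/6_6/index.html",
    ):
        verdict = "allowed" if is_allowed(parser, path) else "disallowed"
        print(f"{path}: {verdict}")
```

Under those assumptions, the latest guide path should come back allowed and the two old-version paths disallowed, which matches the point that the disallows are a prefix match and do not touch guide/latest.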