I would also prefer to have the old versions in web search. Antora can build a sitemap.xml file, so the right place to do this work is probably in the ref guide part of the Solr build.
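Until the ref guide build produces one, a static sitemap.xml could be generated by a small script. The following is only a rough sketch, not what Antora emits: the `build_sitemap` helper and the `LATEST_PAGES` list are hypothetical names, the list holds only the one URL mentioned in this thread, and the 0.8 priority is simply a value above the sitemap protocol's 0.5 default.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical list of pages to boost; only this URL comes from the thread.
LATEST_PAGES = [
    "https://solr.apache.org/guide/solr/latest/index.html",
]


def build_sitemap(urls, priority="0.8"):
    """Return a sitemap.xml document (as a string) listing each URL
    with the given priority. Assumption: 0.8 is an illustrative value."""
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = priority
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap(LATEST_PAGES))
```

The output could be saved as sitemap.xml at the site root; where exactly it belongs in the solr-site repo is something I have not checked.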
URLs that are not in the sitemap will still get indexed, so we can use the sitemap to hint that the latest guide is preferred. The entries would look something like this.

<url>
<loc>https://solr.apache.org/guide/solr/latest/index.html</loc>
<priority>0.80</priority>
</url>

Default priority is 0.5, so 0.8 would make the latest more important.

wunder

> On Sep 21, 2023, at 3:14 PM, Arrieta, Alejandro <aarri...@perrinsoftware.com> wrote:
>
> Hello,
>
> Please don't remove the indexing of older Solr guides. It helps to search
> for "Solr X.Y what_to_search" and get the link to the corresponding guide.
> Thumbs up to give higher priority to the latest guide.
>
> Kind Regards,
> Alejandro Arrieta
>
> On Thu, Sep 21, 2023 at 3:42 PM Walter Underwood <wun...@wunderwood.org> wrote:
>
>> Actually, the robots.txt file should also disallow the 9.x guides. That
>> won’t touch guide/latest.
>>
>> User-agent: *
>> Disallow: /guide/9*
>> Disallow: /guide/8*
>> Disallow: /guide/7*
>> Disallow: /guide/6*
>>
>> wunder
>>
>>> On Sep 21, 2023, at 2:38 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>>
>>> I’m actually OK with them being indexed. It could be helpful to search
>>> for “Solr 8.11 aliases” or something like that.
>>>
>>> The priority attribute in sitemap.xml should boost the default, latest
>>> manual and that shouldn’t require any web server config. I’m glad to craft
>>> a static sitemap.xml file. One generated from the guide would be better,
>>> but that can be a later improvement.
>>>
>>> To get the old versions completely out of the index, add a robots.txt
>>> file to the solr-site repo under contents/ with these lines:
>>>
>>> User-agent: *
>>> Disallow: /guide/8*
>>> Disallow: /guide/7*
>>> Disallow: /guide/6*
>>>
>>> Note that the wildcards on the paths aren't needed, but they help
>>> humans understand that the disallows are a prefix match.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/ (my blog)
>>>
>>>> On Sep 21, 2023, at 12:08 PM, Houston Putman <hous...@apache.org> wrote:
>>>>
>>>> I've been trying to get this working for the last year. Basically our issue
>>>> is that the htaccess files do not add the right X-Robots-Tag header for old
>>>> ref guide pages.
>>>>
>>>> https://github.com/apache/solr-site/blob/main/themes/solr/templates/htaccess.ref-guide-old#L1
>>>>
>>>> This works locally, but in the actual Solr site, the headers are not
>>>> returned. I have no idea why. Would love some help though, as I also hate
>>>> seeing the old ref guide in the google results.
>>>>
>>>> - Houston
>>>>
>>>> On Thu, Sep 21, 2023 at 11:30 AM Walter Underwood <wun...@wunderwood.org> wrote:
>>>>
>>>>> When I get web search results that include the Solr Reference Guide, I
>>>>> often get older versions (6.6, 7.4) in the results. I would prefer to
>>>>> always get the latest reference (https://solr.apache.org/guide/solr/latest/index.html).
>>>>>
>>>>> I think we can list the URLs for that in a sitemap.xml file with a higher
>>>>> priority to suggest to the crawlers that these are the preferred pages.
>>>>>
>>>>> I don’t see a sitemap.xml or sitemap.xml.gz at https://solr.apache.org <https://solr.apache.org/>.
>>>>>
>>>>> Should we prefer the latest manual? How do we build/deploy a sitemap?
>>>>> See: https://www.sitemaps.org/
>>>>>
>>>>> wunder
>>>>> Walter Underwood
>>>>> wun...@wunderwood.org
>>>>> http://observer.wunderwood.org/ (my blog)
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
>> For additional commands, e-mail: dev-h...@solr.apache.org