thisisnic opened a new issue, #49692: URL: https://github.com/apache/arrow/issues/49692
### Describe the enhancement requested

Google searches for docs take users to older versions: https://bsky.app/profile/tanho.ca/post/3miwyhp63q22w

AI has the following recommendations, but I think that we might be able to add in custom HTML or make a PR to pkgdown if it doesn't work. We should also make a PR to arrow site with the fixes for previously rendered docs.

:robot: analysis below

---

Three things combine:

1. No canonical tags on R docs: The Python/C++ docs (built with Sphinx) have `<link rel="canonical">` tags on every page, including old versions, pointing to the current URL. The R docs (built with pkgdown) have none. So Google sees 24 copies of the same content and has to guess which is authoritative.
2. URL changed between v12 and v13: The schema page was `Schema.html` (capital S) in v12, but became `schema.html` (lowercase) from v13 onward. Google indexed the old URL, it still works on v12, and the current docs return 404 for the capitalized version. Google has no reason to switch.
3. All 24 old versions are fully crawlable: No noindex, no canonical tags, no robots.txt restrictions. The old versions collectively have more inbound links from years of Stack Overflow answers and blog posts.

What could fix it

The most impactful approach would be:

- Add canonical tags pointing to the current (unversioned) URL on all pages
- Add noindex to old versioned docs so Google stops surfacing them

pkgdown doesn't natively support canonical tags, so this would likely need a post-build script that injects them into the HTML after pkgdown generates the docs. The Python/C++ docs already solve this via Sphinx's built-in canonical URL support.

### Component(s)

Documentation

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
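The post-build injection step suggested in the analysis could look roughly like the sketch below. It is only an illustration, not an existing Arrow or pkgdown script: the function names `inject_head_tags` and `process_site` are hypothetical, the canonical base URL is supplied by the caller, and it assumes each rendered page contains a literal `</head>` tag.

```python
import html
import re
from pathlib import Path

# Matches the closing head tag regardless of case.
HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def inject_head_tags(page: str, canonical_url: str, noindex: bool = False) -> str:
    """Return the page with a canonical link (and optionally noindex) before </head>."""
    if 'rel="canonical"' in page:
        return page  # already tagged; leaving it alone keeps reruns idempotent
    tags = f'<link rel="canonical" href="{html.escape(canonical_url, quote=True)}">\n'
    if noindex:
        tags += '<meta name="robots" content="noindex">\n'
    # Insert the tags directly in front of the first </head>; pages without one are unchanged.
    return HEAD_CLOSE.sub(lambda m: tags + m.group(0), page, count=1)


def process_site(site_dir: str, canonical_base: str, noindex: bool = False) -> int:
    """Rewrite every .html file under site_dir in place; return the number of files changed."""
    root = Path(site_dir)
    base = canonical_base.rstrip("/")
    changed = 0
    for path in sorted(root.rglob("*.html")):
        # Assumption: the page lives at the same relative path in the current docs.
        rel = path.relative_to(root).as_posix()
        original = path.read_text(encoding="utf-8")
        updated = inject_head_tags(original, f"{base}/{rel}", noindex)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Running `process_site` over an old versioned output directory with `noindex=True`, and over the current docs with the default, would cover both recommendations. The sketch maps paths one-to-one, so pages that were renamed between versions (such as `Schema.html` to `schema.html`) would still need an explicit mapping or a redirect.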
