thisisnic opened a new issue, #49692:
URL: https://github.com/apache/arrow/issues/49692

   ### Describe the enhancement requested
   
   Google searches for docs take users to older versions: 
https://bsky.app/profile/tanho.ca/post/3miwyhp63q22w
   
   An AI analysis produced the recommendations below, but I think we might be
   able to add custom HTML instead, or make a PR to pkgdown if that doesn't work.
   
   We should also make a PR to the arrow site with the fixes for the previously
   rendered docs.
   
   :robot: analysis below
   
   
   
-------------------------------------------------------------------------------------------------------------------
   
   Three things combine:

     1. No canonical tags on R docs. The Python/C++ docs (built with Sphinx)
        have `<link rel="canonical">` tags on every page, including old
        versions, pointing to the current URL. The R docs (built with pkgdown)
        have none, so Google sees 24 copies of the same content and has to
        guess which is authoritative.
     2. The URL changed between v12 and v13. The schema page was `Schema.html`
        (capital S) in v12 but became `schema.html` (lowercase) from v13
        onward. Google indexed the old URL, which still works on v12, while the
        current docs return 404 for the capitalized version, so Google has no
        reason to switch.
     3. All 24 old versions are fully crawlable: no noindex, no canonical tags,
        no robots.txt restrictions. The old versions collectively have more
        inbound links from years of Stack Overflow answers and blog posts.
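
To check points 1 and 3 on any rendered page, a small audit along these lines might help. It is only a sketch using Python's standard `html.parser`; the `HeadTagAudit` class and `audit_page` function are hypothetical names, not part of any existing Arrow or pkgdown tooling.

```python
from html.parser import HTMLParser


class HeadTagAudit(HTMLParser):
    """Collect the canonical URL and robots directives found in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None  # href of <link rel="canonical">, if present
        self.robots = None     # content of <meta name="robots">, if present

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            self.canonical = attrs.get("href")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots = attrs.get("content")


def audit_page(html):
    """Return (canonical_url, is_noindex) for one HTML document."""
    parser = HeadTagAudit()
    parser.feed(html)
    is_noindex = "noindex" in (parser.robots or "").lower()
    return parser.canonical, is_noindex
```

A page with neither tag yields `(None, False)`, which is the situation described above for the R docs.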
                                                                                
                             
     What could fix it
   
     The most impactful approach would be:

     - Add canonical tags pointing to the current (unversioned) URL on all pages
     - Add noindex to old versioned docs so Google stops surfacing them
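
As a rough illustration of what "pointing to the current (unversioned) URL" could mean in practice, the sketch below maps a versioned path to its unversioned form. The `/docs/<version>/r/...` layout, the `canonical_path` function, and the `renames` argument are all assumptions made for this example.

```python
import re

# Assumed layout: /docs/<version>/r/... for old builds, /docs/r/... for current.
_VERSION_SEGMENT = re.compile(r"^(/docs)/\d+(?:\.\d+)*(/r/.*)$")


def canonical_path(path, renames=None):
    """Map a versioned docs path to its unversioned equivalent.

    `renames` maps old unversioned paths to their current names, for pages
    whose file name changed between releases.
    """
    match = _VERSION_SEGMENT.match(path)
    unversioned = match.group(1) + match.group(2) if match else path
    return (renames or {}).get(unversioned, unversioned)
```

Simply dropping the version segment would send the v12 `Schema.html` page to a canonical URL that returns 404, so renamed pages would need an entry in something like `renames`.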
                             
                                                                       
     pkgdown doesn't natively support canonical tags, so this would likely need
     a post-build script that injects them into the HTML after pkgdown generates
     the docs. The Python/C++ docs already solve this via Sphinx's built-in
     canonical URL support.
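
A post-build script of that kind might look roughly like the following. This is a minimal sketch under stated assumptions, not a tested fix for the Arrow docs: `inject_head_tags`, `process_site`, and the `base_url` argument are hypothetical, and it assumes each rendered page has a `</head>` tag and that a page's path under the site directory matches its path under the canonical base URL.

```python
import re
from html import escape
from pathlib import Path

_HEAD_END = re.compile(r"</head\s*>", re.IGNORECASE)


def inject_head_tags(html, canonical_url, noindex=False):
    """Insert a canonical link (and optionally noindex) before </head>.

    Returns the HTML unchanged if it already declares a canonical URL or has
    no closing </head> tag.
    """
    if 'rel="canonical"' in html:
        return html
    head_end = _HEAD_END.search(html)
    if head_end is None:
        return html
    tags = f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">\n'
    if noindex:
        tags += '<meta name="robots" content="noindex">\n'
    return html[:head_end.start()] + tags + html[head_end.start():]


def process_site(site_dir, base_url, noindex=False):
    """Rewrite every .html file under site_dir in place; return count changed."""
    root = Path(site_dir)
    changed = 0
    for page in sorted(root.rglob("*.html")):
        original = page.read_text(encoding="utf-8")
        rel = page.relative_to(root).as_posix()
        url = f"{base_url.rstrip('/')}/{rel}"
        updated = inject_head_tags(original, url, noindex)
        if updated != original:
            page.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Running `process_site` with `noindex=True` over an old versioned build and with `noindex=False` over the current build would cover both bullets above. The function skips pages that already carry a canonical tag, so it should be safe to run more than once.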
   
   
   ### Component(s)
   
   Documentation


