[ https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098254#comment-17098254 ]

Tomoko Uchida commented on LUCENE-9321:
---------------------------------------

bq. Maybe we can dump all links by inserting 'print' and run the script.

I tried to dump all cross-module (relative) links with this patch to checkJavadocLinks.py:
{code}
diff --git a/dev-tools/scripts/checkJavadocLinks.py b/dev-tools/scripts/checkJavadocLinks.py
index 5d07e27a588..a96879536c9 100644
--- a/dev-tools/scripts/checkJavadocLinks.py
+++ b/dev-tools/scripts/checkJavadocLinks.py
@@ -74,6 +74,12 @@ class FindHyperlinks(HTMLParser):
       elif href is not None:
         assert name is None
         href = href.strip()
+        absolute_url = urlparse.urljoin(self.baseURL, href)
+        prefix1 = '/'.join(urlparse.urlparse(self.baseURL).path.split('/')[:5])
+        prefix2 = '/'.join(urlparse.urlparse(absolute_url).path.split('/')[:5])
+        # print only cross-module relative links
+        if href.startswith('../') and prefix1 != prefix2:
+          print('%s\t%s\t%s' % (self.baseURL, href, absolute_url))
         self.links.append(urlparse.urljoin(self.baseURL, href))
       elif id is None:
         raise RuntimeError('couldn\'t find an href nor name in link in %s: only got these attrs: %s' % (self.baseURL, attrs))
@@ -130,8 +136,9 @@ def checkAll(dirName):
   global failures
 
   # Find/parse all HTML files first
-  print()
-  print('Crawl/parse...')
+  #print()
+  #print('Crawl/parse...')
+  print('filename\trelative path\tabsolute url')
   allFiles = {}
 
   if os.path.isfile(dirName):
@@ -160,8 +167,8 @@ def checkAll(dirName):
         allFiles[fullPath] = parse(fullPath, open('%s/%s' % (root, f), encoding='UTF-8').read())
 
   # ... then verify:
-  print()
-  print('Verify...')
+  #print()
+  #print('Verify...')
   for fullPath, (links, anchors) in allFiles.items():
     #print fullPath
     printed = False
{code}
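The check the patch adds can also be tried on its own. Below is a minimal standalone sketch (Python 3, standard library only), not the script's actual code: `is_cross_module` and its `depth` parameter are hypothetical names, and `depth` stands in for the hard-coded `[:5]` slice, whose right value depends on how deep the docs directory sits in the parsed paths. The sketch tests `href.startswith('../')` rather than `re.match('^../', href)`, because in that regex each `.` matches any character, so an href such as `ab/Foo.html` would also match.

```python
import urllib.parse as urlparse


def is_cross_module(base_url, href, depth=5):
    """Return the resolved URL if `href` is a '../' link whose target lies
    outside the first `depth` path components of `base_url`, else None.

    Hypothetical helper mirroring the patch's logic; `depth` plays the role
    of the hard-coded `[:5]` slice.
    """
    # Only relative links that climb out of the current directory qualify.
    if not href.startswith('../'):
        return None
    absolute_url = urlparse.urljoin(base_url, href)
    # Compare the leading path components of the page and of the link target.
    prefix1 = '/'.join(urlparse.urlparse(base_url).path.split('/')[:depth])
    prefix2 = '/'.join(urlparse.urlparse(absolute_url).path.split('/')[:depth])
    return absolute_url if prefix1 != prefix2 else None
```

For example, for a page at `file:///docs/core/org/apache/lucene/index/IndexWriter.html` with `depth=3`, the link `../store/Directory.html` stays under `/docs/core` and yields `None`, while `../../../../../analysis-common/index.html` resolves to `file:///docs/analysis-common/index.html` and is reported.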

I don't want to attach the results (the output is large), but the script can be run as follows:
{code}
lucene-solr $ python -B dev-tools/scripts/checkJavadocLinks.py lucene/build/docs/ > ~/work/lucene-javadocs-relative-paths.tsv
lucene-solr $ wc -l ~/work/lucene-javadocs-relative-paths.tsv
31434 /home/moco/work/lucene-javadocs-relative-paths.tsv

lucene-solr $ python -B dev-tools/scripts/checkJavadocLinks.py solr/build/docs/ > ~/work/solr-javadocs-relative-paths.tsv
lucene-solr $ wc -l ~/work/solr-javadocs-relative-paths.tsv
9307 /home/moco/work/solr-javadocs-relative-paths.tsv
{code}
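Since the output is a TSV whose first row is the header printed by the patch, it can be summarized without attaching it. A minimal sketch, assuming the three-column layout above: `count_module_pairs` and `module_prefix` are hypothetical helpers that count cross-module links per (source prefix, target prefix) pair, using the same path-prefix rule as the patch with `depth` in place of `[:5]`. Lines without exactly three tab-separated fields (for example any other output the script may still print) are skipped, so the totals can be lower than the `wc -l` counts, which also include the header line.

```python
import csv
import urllib.parse as urlparse
from collections import Counter


def module_prefix(url, depth):
    # Same rule as the patch: keep the first `depth` components of the path.
    return '/'.join(urlparse.urlparse(url).path.split('/')[:depth])


def count_module_pairs(lines, depth=5):
    """Count cross-module links per (source prefix, target prefix) pair.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    counts = Counter()
    # The TSV is written with plain '%s\t%s\t%s' formatting, so no quoting.
    reader = csv.reader(lines, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip the header row and any line that is not a 3-column record.
        if len(row) != 3 or row[0] == 'filename':
            continue
        base_url, _href, absolute_url = row
        counts[(module_prefix(base_url, depth),
                module_prefix(absolute_url, depth))] += 1
    return counts
```

The function can be given an open file, e.g. `count_module_pairs(open(path, encoding='utf-8', newline=''))`, and calling `.most_common(10)` on the result lists the most frequent module pairs.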

This includes both kinds of relative paths: links generated automatically by the
javadoc tool and links written by hand (I don't know of a way to distinguish
them). With the gradle scripts on current master, the number should be smaller,
since the "renderJavadoc" task makes all automatically generated links absolute.



> Port documentation task to gradle
> ---------------------------------
>
>                 Key: LUCENE-9321
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9321
>             Project: Lucene - Core
>          Issue Type: Sub-task
>          Components: general/build
>            Reporter: Tomoko Uchida
>            Assignee: Tomoko Uchida
>            Priority: Major
>         Attachments: screenshot-1.png
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This is a placeholder issue for porting the ant "documentation" task to gradle.
> The generated documents should be publishable on the lucene.apache.org
> web site on an "as-is" basis.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
