[ https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098254#comment-17098254 ]
Tomoko Uchida commented on LUCENE-9321:
---------------------------------------

bq. Maybe we can dump all links by inserting 'print' and run the script.

I tried to dump all cross-module (relative) links by applying this patch to checkJavadocLinks.py:

{code}
diff --git a/dev-tools/scripts/checkJavadocLinks.py b/dev-tools/scripts/checkJavadocLinks.py
index 5d07e27a588..a96879536c9 100644
--- a/dev-tools/scripts/checkJavadocLinks.py
+++ b/dev-tools/scripts/checkJavadocLinks.py
@@ -74,6 +74,12 @@ class FindHyperlinks(HTMLParser):
         elif href is not None:
           assert name is None
           href = href.strip()
+          absolute_url = urlparse.urljoin(self.baseURL, href)
+          prefix1 = '/'.join(urlparse.urlparse(self.baseURL).path.split('/')[:5])
+          prefix2 = '/'.join(urlparse.urlparse(absolute_url).path.split('/')[:5])
+          # print only cross-module relative links
+          if re.match('^../', href) and prefix1 != prefix2:
+            print('%s\t%s\t%s' % (self.baseURL, href, absolute_url))
           self.links.append(urlparse.urljoin(self.baseURL, href))
         elif id is None:
           raise RuntimeError('couldn\'t find an href nor name in link in %s: only got these attrs: %s' % (self.baseURL, attrs))
@@ -130,8 +136,9 @@ def checkAll(dirName):
   global failures
 
   # Find/parse all HTML files first
-  print()
-  print('Crawl/parse...')
+  #print()
+  #print('Crawl/parse...')
+  print('filename\trelative path\tabsolute url')
   allFiles = {}
 
   if os.path.isfile(dirName):
@@ -160,8 +167,8 @@ def checkAll(dirName):
           allFiles[fullPath] = parse(fullPath, open('%s/%s' % (root, f), encoding='UTF-8').read())
 
   # ... then verify:
-  print()
-  print('Verify...')
+  #print()
+  #print('Verify...')
   for fullPath, (links, anchors) in allFiles.items():
     #print fullPath
     printed = False
{code}

I won't attach the results because the output files are large, but the script can be run as follows:

{code}
lucene-solr $ python -B dev-tools/scripts/checkJavadocLinks.py lucene/build/docs/ > ~/work/lucene-javadocs-relative-paths.tsv
lucene-solr $ wc -l ~/work/lucene-javadocs-relative-paths.tsv
31434 /home/moco/work/lucene-javadocs-relative-paths.tsv
lucene-solr $ python -B dev-tools/scripts/checkJavadocLinks.py solr/build/docs/ > ~/work/solr-javadocs-relative-paths.tsv
lucene-solr $ wc -l ~/work/solr-javadocs-relative-paths.tsv
9307 /home/moco/work/solr-javadocs-relative-paths.tsv
{code}

The output includes both kinds of relative paths: links generated automatically by the javadoc tool and links written by hand (I don't know whether there is a way to distinguish them). With the gradle scripts on the current master, the number should be lower, since the "renderJavadoc" task makes all automatically generated links absolute.

> Port documentation task to gradle
> ---------------------------------
>
>                 Key: LUCENE-9321
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9321
>             Project: Lucene - Core
>          Issue Type: Sub-task
>          Components: general/build
>            Reporter: Tomoko Uchida
>            Assignee: Tomoko Uchida
>            Priority: Major
>         Attachments: screenshot-1.png
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This is a placeholder issue for porting ant "documentation" task to gradle.
> The generated documents should be able to be published on lucene.apache.org
> web site on "as-is" basis.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
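
For anyone who wants to try the cross-module check from the patch outside of checkJavadocLinks.py, here is a minimal standalone sketch. The function names (module_prefix, is_cross_module_relative), the depth parameter and the sample URLs are my own assumptions for illustration; they are not part of the script. It also uses startswith('../') rather than the regex, because the unescaped dots in '^../' match any two characters followed by a slash, not only a literal "..".

```python
from urllib.parse import urljoin, urlparse


def module_prefix(url, depth=5):
    """Return the first `depth` '/'-separated path components of `url`.

    `depth` is an assumption: 5 mirrors the [:5] slice in the patch, but the
    right value depends on how deep the module directory sits in the path.
    """
    return '/'.join(urlparse(url).path.split('/')[:depth])


def is_cross_module_relative(base_url, href, depth=5):
    """True if `href` begins with '../' and resolves to a different prefix."""
    if not href.startswith('../'):
        return False
    absolute_url = urljoin(base_url, href)
    return module_prefix(base_url, depth) != module_prefix(absolute_url, depth)


if __name__ == '__main__':
    # Hypothetical page location, chosen so that depth=5 ends at the module.
    base = 'file:///lucene/build/docs/core/org/apache/lucene/index/IndexWriter.html'
    for href in ('../../../../../analyzers-common/overview-summary.html',
                 '../search/Query.html'):
        print('%s\t%s' % (href, is_cross_module_relative(base, href)))
```

With the hypothetical base URL above, the first href climbs out of the "core" directory and is reported as cross-module, while the second stays inside it.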