[ https://issues.apache.org/jira/browse/LUCENE-9496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189214#comment-17189214 ]

Robert Muir commented on LUCENE-9496:
-------------------------------------

It looks like the doc tree will give you a "Link" node that has a "Reference" 
(basically just the signature of what we're linking to). Ideally we would just 
"check" that each reference is "included", but I'm not sure it can be that 
simple, since some of the links might go across JARs.
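A minimal sketch of that first idea, assuming JDK 11+ and the standard `jdk.javadoc.doclet` API: the doclet resolves each `{@link}`/`{@linkplain}` reference with `DocTrees.getElement` and asks `DocletEnvironment.isIncluded` whether the target is part of the documented set. The class `LinkCheckDoclet` and its `check` helper are hypothetical names for illustration; `@see` tags are not handled, and links into other JARs (e.g. `java.lang.String`) would be reported too, so a real check would need an allow-list or similar.

```java
import com.sun.source.doctree.DocCommentTree;
import com.sun.source.doctree.LinkTree;
import com.sun.source.util.DocTreePath;
import com.sun.source.util.DocTreePathScanner;
import com.sun.source.util.DocTrees;
import com.sun.source.util.TreePath;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.tools.DocumentationTool;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import jdk.javadoc.doclet.Doclet;
import jdk.javadoc.doclet.DocletEnvironment;
import jdk.javadoc.doclet.Reporter;

// Hypothetical doclet: reports {@link}/{@linkplain} references that do not
// resolve, or that resolve to elements outside the documented ("included") set.
public class LinkCheckDoclet implements Doclet {
  // The javadoc tool instantiates the doclet itself, so results go to a static list.
  static final List<String> PROBLEMS = new ArrayList<>();

  @Override public void init(Locale locale, Reporter reporter) {}
  @Override public String getName() { return "LinkCheckDoclet"; }
  @Override public Set<? extends Doclet.Option> getSupportedOptions() { return Set.of(); }
  @Override public SourceVersion getSupportedSourceVersion() { return SourceVersion.latest(); }

  @Override
  public boolean run(DocletEnvironment env) {
    DocTrees trees = env.getDocTrees();
    for (Element owner : env.getIncludedElements()) {
      TreePath path = trees.getPath(owner);
      if (path == null) continue; // no source tree (e.g. a package without package-info)
      DocCommentTree comment = trees.getDocCommentTree(path);
      if (comment == null) continue; // element has no doc comment
      new DocTreePathScanner<Void, Void>() {
        @Override
        public Void visitLink(LinkTree link, Void unused) {
          // Resolve the reference (e.g. "#foo()" or "Bar") in the context of its comment.
          DocTreePath refPath = new DocTreePath(getCurrentPath(), link.getReference());
          Element target = trees.getElement(refPath);
          String sig = link.getReference().getSignature();
          if (target == null) {
            PROBLEMS.add(owner + ": unresolved link " + sig);
          } else if (!env.isIncluded(target)) {
            // Private/package-private targets end up here, but so do links into other JARs.
            PROBLEMS.add(owner + ": link " + sig + " points outside the documented elements");
          }
          return super.visitLink(link, unused);
        }
      }.scan(new DocTreePath(path, comment), null);
    }
    return true;
  }

  // Runs the doclet in-process over one in-memory source file and returns the problems.
  public static List<String> check(String fileName, String source) {
    JavaFileObject file = new SimpleJavaFileObject(
        URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
      @Override public CharSequence getCharContent(boolean ignoreErrors) { return source; }
    };
    PROBLEMS.clear();
    DocumentationTool tool = ToolProvider.getSystemDocumentationTool();
    tool.getTask(null, null, null, LinkCheckDoclet.class, List.of(), List.of(file)).call();
    return new ArrayList<>(PROBLEMS);
  }
}
```

With javadoc's default (protected) access level, a link to a private method is not "included", which is exactly the case where the generated HTML would contain a broken link.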

Another alternative is to explicitly implement logic that checks for the 
reasons why a link would be broken: e.g. that the signature refers to something 
with private/package-private scope. It is worth considering because we could 
probably give the developer very explicit, helpful error messages rather than 
just a generic "broken link" error. Today the errors can be non-intuitive when 
you do such a thing.
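A minimal sketch of the "explicit error message" part, assuming the link target has already been resolved to a `javax.lang.model` `Element` whose `getModifiers()` set is passed in. `LinkDiagnostics` and `explain` are hypothetical names; the sketch assumes javadoc's default protected access level and deliberately ignores the visibility of enclosing classes.

```java
import java.util.Set;
import javax.lang.model.element.Modifier;

// Hypothetical helper (not an existing Lucene or JDK API): explains why a
// {@link} target would be missing from javadocs generated at the default
// "protected" access level. Enclosing-class visibility is NOT considered.
public final class LinkDiagnostics {
  private LinkDiagnostics() {}

  /**
   * @param signature the reference as written in the comment, e.g. "#helper()"
   * @param modifiers the modifiers of the resolved target (Element.getModifiers())
   * @return a human-readable reason, or null if visibility is not the problem
   */
  public static String explain(String signature, Set<Modifier> modifiers) {
    if (modifiers.contains(Modifier.PRIVATE)) {
      return "{@link " + signature + "} points to a private element, which is not"
          + " documented; use {@code ...} instead or widen its visibility";
    }
    if (!modifiers.contains(Modifier.PUBLIC) && !modifiers.contains(Modifier.PROTECTED)) {
      return "{@link " + signature + "} points to a package-private element, which is not"
          + " documented; use {@code ...} instead or widen its visibility";
    }
    return null; // public or protected: visibility is not why the link would break
  }
}
```

A doclet would call this only after a reference resolves but fails the "is it documented?" check, so the developer sees the cause instead of a bare "broken link".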

A third option would be to just parse the HTML output with SAX or something, if 
we want it to be Java or Groovy instead of Python. This wouldn't be any "better" 
(error messaging, speed, wrestling with bugs in javadoc generation itself, 
etc.), but it would at least get the precommit to pure Java.

> Replace (or accelerate) check-broken-links.gradle with a doclet pass
> --------------------------------------------------------------------
>
>                 Key: LUCENE-9496
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9496
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Dawid Weiss
>            Priority: Minor
>
> This is just a placeholder, but perhaps somebody will find the time to push
> this forward. The current Python script in check-broken-links reparses all
> emitted HTML files to find links. I have a strong feeling this could be done
> better.
> Javadoc doclets have access to parse trees for both the code and the javadoc 
> comments (including information about HTML tags, code links, etc.). For 
> example, this information is used by the built-in javac HTML linter.
> Maybe we could replace the Python linter entirely: verify where code links
> point, where existing HTML links point, and validate this information. I
> wrote some of that link-parsing code in Carrot2 (to convert javadocs into a
> structured JSON format used in other documentation). The code there is free
> to eyeball and borrow, if needed.
> https://docs.oracle.com/en/java/javase/11/docs/api/jdk.compiler/com/sun/source/util/DocTreeScanner.html
> https://github.com/carrot2/carrot2/blob/master/infra/jsondoclet/src/main/java/com/carrotsearch/jsondoclet/JavaDocsVisitor.java#L135
> https://github.com/carrot2/carrot2/blob/master/infra/jsondoclet/src/main/java/com/carrotsearch/jsondoclet/PlainReferenceConverter.java
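The issue description also suggests validating existing HTML links straight from the comment parse trees. A minimal sketch, assuming JDK 11+: it parses an in-memory source with the javac API (`JavacTask.parse()`, no javadoc run and no HTML output) and uses a `DocTreeScanner` to collect the `href` of every `<a>` element in doc comments. `CommentHrefs` and `hrefs` are hypothetical names; deciding whether each collected href actually resolves (files, anchors, external URLs) is left out.

```java
import com.sun.source.doctree.AttributeTree;
import com.sun.source.doctree.DocCommentTree;
import com.sun.source.doctree.DocTree;
import com.sun.source.doctree.StartElementTree;
import com.sun.source.doctree.TextTree;
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.DocTreeScanner;
import com.sun.source.util.DocTrees;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreePathScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Hypothetical helper: collects href values of <a> elements found in doc
// comments, using the parse trees javac builds.
public final class CommentHrefs {
  private CommentHrefs() {}

  public static List<String> hrefs(String fileName, String source) {
    JavaFileObject file = new SimpleJavaFileObject(
        URI.create("string:///" + fileName), JavaFileObject.Kind.SOURCE) {
      @Override public CharSequence getCharContent(boolean ignoreErrors) { return source; }
    };
    JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
        .getTask(null, null, null, List.of(), null, List.of(file));
    DocTrees trees = DocTrees.instance(task);
    List<String> found = new ArrayList<>();

    // Visits HTML start elements inside a single doc comment.
    DocTreeScanner<Void, Void> hrefScanner = new DocTreeScanner<Void, Void>() {
      @Override
      public Void visitStartElement(StartElementTree element, Void p) {
        if (element.getName().toString().equalsIgnoreCase("a")) {
          for (DocTree attr : element.getAttributes()) {
            if (!(attr instanceof AttributeTree)) continue;
            AttributeTree a = (AttributeTree) attr;
            if (!a.getName().toString().equalsIgnoreCase("href") || a.getValue() == null) continue;
            StringBuilder value = new StringBuilder();
            for (DocTree part : a.getValue()) {
              value.append(part instanceof TextTree ? ((TextTree) part).getBody() : part.toString());
            }
            found.add(value.toString());
          }
        }
        return super.visitStartElement(element, p);
      }
    };

    // Visits declarations and hands each doc comment to hrefScanner.
    TreePathScanner<Void, Void> declScanner = new TreePathScanner<Void, Void>() {
      private void scanComment() {
        DocCommentTree comment = trees.getDocCommentTree(getCurrentPath());
        if (comment != null) hrefScanner.scan(comment, null);
      }
      @Override public Void visitClass(ClassTree t, Void p) { scanComment(); return super.visitClass(t, p); }
      @Override public Void visitMethod(MethodTree t, Void p) { scanComment(); return super.visitMethod(t, p); }
      @Override public Void visitVariable(VariableTree t, Void p) { scanComment(); return super.visitVariable(t, p); }
    };
    try {
      for (CompilationUnitTree unit : task.parse()) declScanner.scan(unit, null);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return found;
  }
}
```

Because this works on the comment trees rather than the HTML output, a failing href could be reported against the exact declaration whose comment contains it.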



--
This message was sent by Atlassian Jira
(v8.3.4#803005)