https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116419
--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by David Malcolm <dmalc...@gcc.gnu.org>:

https://gcc.gnu.org/g:aff7f677120ec394adcedd0dd5cc3afa3b5be102

commit r15-3312-gaff7f677120ec394adcedd0dd5cc3afa3b5be102
Author: David Malcolm <dmalc...@redhat.com>
Date:   Thu Aug 29 18:48:32 2024 -0400

SARIF output: implement embedded URLs in messages (§3.11.6; PR other/116419)

GCC diagnostic messages can contain URLs, such as links to our
documentation when we suggest an option name to correct a misspelling.

SARIF message strings can contain embedded URLs in plain text messages
(see SARIF v2.1.0 §3.11.6), but previously we were simply dropping any
URLs from the diagnostic messages.

This patch adds support for encoding URLs into messages in our SARIF
output, using the pp_token machinery added in the previous patch.

As well as supporting URLs, the patch also adjusts how we report event
IDs in SARIF messages, so that rather than e.g.

  "text": "second 'free' here; first 'free' was at (1)"

we now report:

  "text": "second 'free' here; first 'free' was at [(1)](sarif:/runs/0/results/0/codeFlows/0/threadFlows/0/locations/0)"

i.e. the text "(1)" now has an embedded link referring within the SARIF
log to the threadFlowLocation object for the other event, via a JSON
pointer (see §3.10.3 "URIs that use the sarif scheme"). Doing so
requires the various objects to know their index within their
containing array, which required some reworking of how they are
constructed.

gcc/ChangeLog:

	PR other/116419
	* diagnostic-event-id.h (diagnostic_event_id_t::zero_based): New.
	* diagnostic-format-sarif.cc: Include "pretty-print-format-impl.h"
	and "pretty-print-urlifier.h".
	(sarif_result::sarif_result): Add param "idx_within_parent".
	(sarif_result::get_index_within_parent): New accessor.
	(sarif_result::m_idx_within_parent): New field.
	(sarif_code_flow::sarif_code_flow): New ctor.
	(sarif_code_flow::get_parent): New accessor.
	(sarif_code_flow::get_index_within_parent): New accessor.
	(sarif_code_flow::m_parent): New field.
	(sarif_code_flow::m_thread_id_map): New field.
	(sarif_code_flow::m_thread_flows_arr): New field.
	(sarif_code_flow::m_all_tfl_objs): New field.
	(sarif_thread_flow::sarif_thread_flow): Add "parent" and
	"idx_within_parent" params.
	(sarif_thread_flow::get_parent): New accessor.
	(sarif_thread_flow::get_index_within_parent): New accessor.
	(sarif_thread_flow::m_parent): New field.
	(sarif_thread_flow::m_idx_within_parent): New field.
	(sarif_thread_flow_location::sarif_thread_flow_location): New ctor.
	(sarif_thread_flow_location::get_parent): New accessor.
	(sarif_thread_flow_location::get_index_within_parent): New accessor.
	(sarif_thread_flow_location::m_parent): New field.
	(sarif_thread_flow_location::m_idx_within_parent): New field.
	(sarif_builder::get_code_flow_for_event_ids): New accessor.
	(class sarif_builder::sarif_token_printer): New.
	(sarif_builder::m_token_printer): New member.
	(sarif_builder::m_next_result_idx): New field.
	(sarif_builder::m_current_code_flow): New field.
	(sarif_code_flow::get_or_append_thread_flow): New.
	(sarif_code_flow::get_thread_flow): New.
	(sarif_code_flow::add_location): New.
	(sarif_code_flow::get_thread_flow_loc_obj): New.
	(sarif_thread_flow::add_location): Create the new
	sarif_thread_flow_location internally, rather than passing it in
	as a parm, so that we can keep track of its index in the array.
	Return a reference to it.
	(sarif_builder::sarif_builder): Initialize m_token_printer,
	m_next_result_idx, and m_current_code_flow.
	(sarif_builder::on_report_diagnostic): Pass index to
	make_result_object.
	(sarif_builder::make_result_object): Add "idx_within_parent" param
	and pass to sarif_result ctor. Pass code flow index to call to
	make_code_flow_object.
	(make_sarif_url_for_event): New.
	(sarif_builder::make_code_flow_object): Add "idx_within_parent"
	param and pass it to sarif_code_flow ctor.
	Reimplement walking of events so that we first create threadFlow
	objects for each thread, then populate them with threadFlowLocation
	objects, so that the IDs work. Set m_current_code_flow whilst
	creating the latter, so that we can create correct URIs for "%@".
	(sarif_builder::make_thread_flow_location_object): Replace with...
	(sarif_builder::populate_thread_flow_location_object): ...this.
	(sarif_output_format::get_builder): New accessor.
	(sarif_begin_embedded_link): New.
	(sarif_end_embedded_link): New.
	(sarif_builder::sarif_token_printer::print_tokens): New.
	(diagnostic_output_format_init_sarif): Add "fmt" param; use it to
	set the token printer and output format for the context.
	(diagnostic_output_format_init_sarif_stderr): Move responsibility
	for setting the context's output format to within
	diagnostic_output_format_init_sarif.
	(diagnostic_output_format_init_sarif_file): Likewise.
	(diagnostic_output_format_init_sarif_stream): Likewise.
	(test_sarif_diagnostic_context::test_sarif_diagnostic_context):
	Likewise.
	(selftest::test_make_location_object): Provide an idx for the
	result.
	(selftest::get_result_from_log): New.
	(selftest::get_message_from_log): New.
	(selftest::test_message_with_embedded_link): New test.
	(selftest::diagnostic_format_sarif_cc_tests): Call it.
	* pretty-print-format-impl.h: Include "diagnostic-event-id.h".
	(pp_token::kind): Add "event_id".
	(struct pp_token_event_id): New.
	(is_a_helper <pp_token_event_id *>::test): New.
	(is_a_helper <const pp_token_event_id *>::test): New.
	* pretty-print.cc (pp_token::dump): Handle kind::event_id.
	(pretty_printer::format): Update handling of "%@" in phase 2 so
	that we add a pp_token_event_id, rather than the text "(N)".
	(default_token_printer): Handle pp_token::kind::event_id by
	printing the text "(N)".

gcc/testsuite/ChangeLog:

	PR other/116419
	* gcc.dg/sarif-output/bad-pragma.c: New test.
	* gcc.dg/sarif-output/test-bad-pragma.py: New test.
	* gcc.dg/sarif-output/test-include-chain-2.py
	(test_location_relationships): Update expected text of event to
	include an intra-sarif URI to the other event.

Signed-off-by: David Malcolm <dmalc...@redhat.com>
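To illustrate the message change described above, here is a minimal C++ sketch, assuming a message is held as a sequence of tokens in which an event ID is a separate token, loosely in the spirit of pp_token_event_id. A plain-text printer renders that token as "(N)", while a SARIF printer renders it as an embedded link "[(N)](sarif:...)" whose target is a JSON pointer to a threadFlowLocation. The names text_token, event_id_token, make_event_url, print_plain and print_sarif are hypothetical and are not GCC's API. The sketch also assumes a single run and a single thread flow whose location array is indexed by the event's zero-based ID. Any escaping of literal square brackets that SARIF §3.11.6 may require is omitted.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical token kinds: a run of plain text, or a reference to an event.
struct text_token { std::string text; };
struct event_id_token { int zero_based; };  // displayed as "(zero_based + 1)"
using token = std::variant<text_token, event_id_token>;

// Hypothetical helper: an intra-log "sarif:" URI holding a JSON pointer to a
// threadFlowLocation. Assumes run 0 and thread flow 0.
std::string make_event_url(int result_idx, int code_flow_idx, int event_idx) {
  // Indices are positions within JSON arrays, so they must be non-negative.
  assert(result_idx >= 0 && code_flow_idx >= 0 && event_idx >= 0);
  return "sarif:/runs/0/results/" + std::to_string(result_idx) +
         "/codeFlows/" + std::to_string(code_flow_idx) +
         "/threadFlows/0/locations/" + std::to_string(event_idx);
}

// Event IDs are shown to users one-based: zero-based 0 prints as "(1)".
std::string event_label(const event_id_token& e) {
  return "(" + std::to_string(e.zero_based + 1) + ")";
}

// Plain-text printing: an event ID becomes just "(N)".
std::string print_plain(const std::vector<token>& tokens) {
  std::string out;
  for (const token& t : tokens) {
    if (const auto* txt = std::get_if<text_token>(&t))
      out += txt->text;
    else
      out += event_label(std::get<event_id_token>(t));
  }
  return out;
}

// SARIF printing: an event ID becomes an embedded link "[(N)](uri)".
std::string print_sarif(const std::vector<token>& tokens,
                        int result_idx, int code_flow_idx) {
  std::string out;
  for (const token& t : tokens) {
    if (const auto* txt = std::get_if<text_token>(&t)) {
      out += txt->text;
    } else {
      const auto& e = std::get<event_id_token>(t);
      out += "[" + event_label(e) + "](" +
             make_event_url(result_idx, code_flow_idx, e.zero_based) + ")";
    }
  }
  return out;
}
```

For a message made of the text "second 'free' here; first 'free' was at " followed by an event-ID token with zero-based value 0, print_plain yields the old "...was at (1)" form. print_sarif(msg, 0, 0) yields the linked form quoted in the commit message. Keeping the event ID as a token until output time means that only the final printer needs to know whether a link can be formed.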
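The reworking mentioned in the commit message, where objects know their index within their containing array, can be pictured with the simplified C++ sketch below. It is not GCC's implementation. The struct names only loosely echo sarif_code_flow, sarif_thread_flow and sarif_thread_flow_location. The result index is stored directly on the code flow rather than reached through a parent sarif_result, and run 0 is assumed. As the ChangeLog describes for sarif_thread_flow::add_location, each parent creates its children itself, so it can hand each child its index at construction time. A JSON pointer for any location is then built by walking up the parent chain. The function make_pointer is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct code_flow;
struct thread_flow;

// A location within a thread flow; knows its parent and its array index.
struct thread_flow_location {
  thread_flow_location(thread_flow& parent, std::size_t idx)
      : m_parent(parent), m_idx_within_parent(idx) {}
  thread_flow& m_parent;
  std::size_t m_idx_within_parent;
};

struct thread_flow {
  thread_flow(code_flow& parent, std::size_t idx)
      : m_parent(parent), m_idx_within_parent(idx) {}

  // The location is created here rather than passed in, so that its index
  // in m_locations is known when it is constructed.
  thread_flow_location& add_location() {
    m_locations.push_back(
        std::make_unique<thread_flow_location>(*this, m_locations.size()));
    return *m_locations.back();
  }

  code_flow& m_parent;
  std::size_t m_idx_within_parent;
  // unique_ptr keeps returned references valid when the vector grows.
  std::vector<std::unique_ptr<thread_flow_location>> m_locations;
};

struct code_flow {
  code_flow(std::size_t result_idx, std::size_t idx)
      : m_result_idx(result_idx), m_idx_within_parent(idx) {}

  // Return the thread flow for THREAD_ID, appending a new one on first use.
  thread_flow& get_or_append_thread_flow(int thread_id) {
    auto it = m_thread_id_map.find(thread_id);
    if (it != m_thread_id_map.end())
      return *m_thread_flows[it->second];
    const std::size_t idx = m_thread_flows.size();
    m_thread_flows.push_back(std::make_unique<thread_flow>(*this, idx));
    m_thread_id_map[thread_id] = idx;
    return *m_thread_flows.back();
  }

  // Look up an existing thread flow; THREAD_ID must already be present.
  thread_flow& get_thread_flow(int thread_id) {
    auto it = m_thread_id_map.find(thread_id);
    assert(it != m_thread_id_map.end());
    return *m_thread_flows[it->second];
  }

  std::size_t m_result_idx;
  std::size_t m_idx_within_parent;
  std::map<int, std::size_t> m_thread_id_map;  // thread ID -> array index
  std::vector<std::unique_ptr<thread_flow>> m_thread_flows;
};

// Walk up from a location to build its "sarif:" JSON-pointer URI.
std::string make_pointer(const thread_flow_location& loc) {
  const thread_flow& tf = loc.m_parent;
  const code_flow& cf = tf.m_parent;
  return "sarif:/runs/0/results/" + std::to_string(cf.m_result_idx) +
         "/codeFlows/" + std::to_string(cf.m_idx_within_parent) +
         "/threadFlows/" + std::to_string(tf.m_idx_within_parent) +
         "/locations/" + std::to_string(loc.m_idx_within_parent);
}
```

Creating every threadFlow before populating it with locations, as the reworked event walk does, means that indices are fixed by the time a message needs to link to another event. In this sketch, the index is fixed at creation because children are only ever appended.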