[I] Missing URL Protocol Validation for Third-Party Distribution URLs Rendered in HTML (tooling-trusted-releases)

via GitHub Mon, 30 Mar 2026 18:35:08 -0700


asf-tooling opened a new issue, #986:
URL: https://github.com/apache/tooling-trusted-releases/issues/986


   **ASVS Level(s):** [L1]
   
   **Description:**
   
   ### Summary
   URLs from third-party API responses (NPM, ArtifactHub, PyPI) are rendered as
clickable HTML links without protocol validation. The `distribution_web_url()`
function extracts URLs directly from API responses and stores them in the
database. These URLs are later rendered via `html_tr_a()` as `<a href>`
elements without validating the protocol scheme. An attacker could publish a
package with a `javascript:` or `data:` URL in the homepage field. The URL would
be stored and would later execute in a user's browser when the user clicks the
link on the distribution page, resulting in stored XSS. Jinja2 auto-escaping
prevents breaking out of HTML attributes but does NOT prevent `javascript:`
protocol execution in href attributes.
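The following sketch shows why escaping alone is not enough. HTML escaping only rewrites the characters `& < > " '`, so a `javascript:` URL passes through unchanged and stays a live `href`. The standard library's `html.escape` stands in here for Jinja2/MarkupSafe auto-escaping, which is an assumption for illustration. The payload is hypothetical.

```python
import html

# Hypothetical attacker-controlled value from a package's homepage field.
payload = "javascript:alert(document.cookie)"

# html.escape() stands in for template auto-escaping: with quote=True (the
# default) it rewrites & < > " ' so a value cannot break out of an attribute.
escaped = html.escape(payload)
link = f'<a href="{escaped}">Homepage</a>'

# The payload contains none of those characters, so it survives unchanged and
# the link still runs script when clicked.
print(link)  # <a href="javascript:alert(document.cookie)">Homepage</a>

# By contrast, a quote-based attribute breakout is neutralised.
breakout = '" onmouseover="alert(1)'
print(html.escape(breakout))  # &quot; onmouseover=&quot;alert(1)
```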
   
   ### Details
   **Affected Files and Lines:**
   - `atr/shared/distribution.py:161-202` - URL extraction without validation
   - `atr/shared/distribution.py:248` - URL rendering
   - `atr/get/distribution.py:105` - Distribution display
   
   URLs are extracted from third-party APIs and rendered without protocol
validation, so dangerous schemes such as `javascript:` and `data:` reach the
rendered `href` unchanged.
   
   ### Recommended Remediation
   Create a centralized URL protocol validation function and apply it to all 
third-party URLs:
   
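   ```python
   import urllib.parse
   
   # `htm` is the project's HTML builder module (its import path is not shown here).
   
   _SAFE_URL_SCHEMES = frozenset({'http', 'https'})
   
   def validate_url_protocol(url: str) -> str | None:
       """Return the URL unchanged if its scheme is http or https, else None."""
       try:
           parsed = urllib.parse.urlparse(url)
       except ValueError:
           # urlparse() raises ValueError for some malformed input,
           # e.g. an unterminated IPv6 host such as "https://[::1".
           return None
       # urlparse() already lowercases the scheme; .lower() is kept defensively.
       if parsed.scheme.lower() not in _SAFE_URL_SCHEMES:
           return None
       return url
   
   # Apply inside distribution_web_url() for every platform branch, e.g.:
   #
   #     web_url = validate_url_protocol(raw_url)
   #     if web_url is None:
   #         return None
   
   # Defense-in-depth at the render layer
   def html_tr_a(url: str, text: str) -> htm.Element:
       """Render link with protocol validation."""
       safe_url = validate_url_protocol(url)
       if safe_url is None:
           return htm.td[text]  # Unsafe URL: render the label as plain text
       return htm.td[htm.a(href=safe_url)[text]]
   ```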
   
   Apply the check in `distribution_web_url()` for every platform (NPM,
ArtifactHub, PyPI). As defense-in-depth, validate URLs again at the render
layer in `html_tr_a()` before rendering.
   
   ### Acceptance Criteria
   - [ ] URL validation function created
   - [ ] Validation applied at storage
   - [ ] Validation applied at rendering
   - [ ] Dangerous protocols rejected
   - [ ] Integration test verifies rejection
   - [ ] Unit test verifies the fix
   
   ### References
   - Source reports: L1:1.2.2.md
   - Related findings: FINDING-070
   - ASVS sections: 1.2.2
   
   ### Priority
   Medium
   
   ---
   
   ### Consolidated: FINDING-070 - Missing URL Protocol Validation for SBOM Supplier URLs
   
   **ASVS Level(s):** [L1]
   
   **Description:**
   
   ### Summary
   The `supplier_op_from_url()` function in SBOM conformance processing accepts 
URLs from deps.dev API responses without protocol validation. When processing 
SBOM documents, the system queries the deps.dev API for Maven package homepage 
URLs and extracts the URL from the 'HOMEPAGE' link label. The fallback case 
accepts ANY URL as both the supplier name and URL without validating the 
protocol scheme. A `javascript:` or `data:` URL from the deps.dev API would be 
stored in the SBOM supplier URL field. If this data is later rendered in a web 
context with the URL as a clickable link, it could enable stored XSS.
   
   ### Details
   **Affected Files and Lines:**
   - `atr/sbom/conformance.py:104-115` - supplier_op_from_url without validation
   - `atr/sbom/conformance.py:124-132` - URL extraction from API
   
   The function accepts any URL without protocol validation, allowing dangerous 
protocols to be stored.
   
   ### Recommended Remediation
   Add protocol validation to `supplier_op_from_url()`:
   
   ```python
   import urllib.parse
   
   def supplier_op_from_url(url: str) -> tuple[str, str] | None:
       """Extract supplier from URL with protocol validation."""
       try:
           parsed = urllib.parse.urlparse(url)
           
           # Reject anything that is not http(s): javascript:, data:, file:, ...
           if parsed.scheme.lower() not in ('http', 'https'):
               return None
           
           # ... rest of function
       except Exception:
           return None
   ```
   
   Check `parsed.scheme.lower() in ('http', 'https')` and return None for 
non-HTTP(S) URLs. This prevents `javascript:`, `data:`, `file:`, and other 
dangerous protocols from being stored and potentially rendered.
   
   ### Acceptance Criteria
   - [ ] Protocol validation added
   - [ ] Only HTTP(S) URLs accepted
   - [ ] Dangerous protocols rejected
   - [ ] None returned for invalid URLs
   - [ ] Integration test verifies rejection
   - [ ] Unit test verifies the fix
   
   ### References
   - Source reports: L1:1.2.2.md
   - Related findings: FINDING-069
   - ASVS sections: 1.2.2
   
   ### Priority
   Medium
   
   ---


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


