[ https://issues.apache.org/jira/browse/MRESOLVER-90?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897832#comment-16897832 ]
Jörg Hohwiller commented on MRESOLVER-90:
-----------------------------------------

> > You could change the default so checksums are validated by default
> I tried, it was pulled back for compat reasons. I will retry for 3.7.0.

Awesome. Sounds great. Fingers crossed for 3.7.0.

> > You could first download the checksums. If the downloaded checksum
> > contains HTML, it is not a checksum, and any further download for that
> > artifact could already be aborted with an error.
> What if the checksum file contains just {{123}} or something else, but not
> HTML?

Well, either you do a specific validation for checksums that ignores leading and trailing whitespace and otherwise only accepts an alphanumeric word, or you are pragmatic and do not care about the rest (see next point).

> > You could try to detect if the content is HTML (which is quite easy).
> > Assuming the type is not "html" or "xhtml" you could consider it as invalid
> Content type or sniffing?

Sniffing. Content types have the same problem as HTTP status codes with form login. In an ideal world they are reliable and correct. However, Firefox still insists on showing the raw content of HTML files or SVGs if the content type is not perfectly right. This is correct according to the specification and from an academic point of view. However, it is a pain for end users. Ever tried to place SVGs in a GitHub wiki? It would be much smarter of Firefox to show the content properly but raise a warning icon somewhere to still inform the makers that they are doing something wrong.

> > You could at least add a validation for POM files. We know that POM files
> > are XML and we even have a parser that can validate a POM. Therefore, for
> > POMs we could reject entirely invalid content before putting it persistently
> > into the local repo
> The POMs are already parsed by the model builder/parser and this would cause
> duplicate processing tasks, which will impact performance.

Of course it would be tricky to do this without parsing the POM twice, but it is still doable. Anyhow, a cheap alternative might be to scan only the first 512 bytes and check that the root tag matches, using just a string lookahead.

> Please look at
> {{org.eclipse.aether.connector.basic.BasicRepositoryConnector.get(Collection<? extends ArtifactDownload>, Collection<? extends MetadataDownload>)}}
> as well as the
> {{org.eclipse.aether.connector.basic.BasicRepositoryConnector.GetTaskRunner.fetchChecksum(URI, File)}}.
> This is a starting point to improve things.

Thanks for pointing this out. I will have a look.

> HTML content in POM: Maven should validate content before storing in local repo
> -------------------------------------------------------------------------------
>
>                  Key: MRESOLVER-90
>                  URL: https://issues.apache.org/jira/browse/MRESOLVER-90
>              Project: Maven Resolver
>           Issue Type: New Feature
>     Affects Versions: 1.4.0
>          Environment: both with maven 3.6.0 in CMD or in Eclipse 4.9.0
>             Reporter: Jörg Hohwiller
>             Priority: Major
>
> For some odd reason, errors sometimes just happen and a Maven repo delivers
> an HTML error or login page for a request of a POM or JAR file. It seems that
> if the status code is valid, Maven (or whatever is under the hood,
> maybe even Aether?) saves the result without any sanity check or
> validation.
> Therefore I frequently end up with "POM" or "JAR" files in my local repo that
> are not XML but HTML nonsense.
>
> Example:
> {code:java}
> <!--
>    DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS HEADER.
>
>    Copyright (c) 2007 Sun Microsystems Inc. All Rights Reserved
>
>    The contents of this file are subject to the terms
>    of the Common Development and Distribution License
>    (the License). You may not use this file except in
>    compliance with the License.
>    You can obtain a copy of the License at
>    https://opensso.dev.java.net/public/CDDLv1.0.html or
>    opensso/legal/CDDLv1.0.txt
>    See the License for the specific language governing
>    permission and limitations under the License.
>    When distributing Covered Code, include this CDDL
>    Header Notice in each file and include the License file
>    at opensso/legal/CDDLv1.0.txt.
>    If applicable, add the following below the CDDL Header,
>    with the fields enclosed by brackets [] replaced by
>    your own identifying information:
>    "Portions Copyrighted [year] [name of copyright owner]"
>    $Id: index.html,v 1.2 2008/06/25 05:48:51 qcheng Exp $
> -->
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
> <html>
> <head>
> <title>Please Wait While Redirecting to Login page</title>
> <script language="JavaScript">
> <!--
> function redirectToAuth() {
>     var params = getQueryParameters();
>     var url = 'UI/Login';
>     if (params != '') {
>         url += params;
>     }
>     top.location.replace(url);
> }
> function getQueryParameters() {
>     var loc = '' + location;
>     var idx = loc.indexOf('?');
>     if (idx != -1) {
>         return loc.substring(idx);
>     } else {
>         return '';
>     }
> }
> //-->
> </script>
> </head>
> <body bgcolor="#FFFFFF" onLoad="redirectToAuth();">
> </body>
> </html>
> {code}
> I would expect maven to verify the content before officially placing it in
> the correct location inside the local maven repository on my disc.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
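
For illustration, the checks discussed in the comment (a strict checksum format, HTML sniffing, and a root-tag lookahead on the first 512 bytes) could be sketched in Java roughly as follows. This is a minimal sketch and not Maven Resolver code: the class {{DownloadSanityCheck}}, its method names, and the "32 to 128 hex digits" rule are assumptions made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Maven Resolver): cheap plausibility checks
 * that could run on downloaded content before it is stored in the local repo.
 */
final class DownloadSanityCheck {

    /** Assumption: a checksum is one hex word, 32 (MD5) to 128 (SHA-512) digits long. */
    private static final Pattern HEX_WORD = Pattern.compile("[0-9a-fA-F]{32,128}");

    /** Number of leading bytes to inspect, as suggested in the comment. */
    private static final int SNIFF_LIMIT = 512;

    private DownloadSanityCheck() {
    }

    /** Ignores leading/trailing whitespace and accepts only a single hex word. */
    static boolean looksLikeChecksum(String content) {
        return HEX_WORD.matcher(content.trim()).matches();
    }

    /** Sniffs the leading bytes for an HTML doctype or root tag, ignoring case. */
    static boolean looksLikeHtml(byte[] head) {
        String s = prefix(head).toLowerCase(Locale.ROOT);
        return s.contains("<!doctype html") || s.contains("<html");
    }

    /** String lookahead only: expects a <project root tag and no HTML markers. */
    static boolean looksLikePom(byte[] head) {
        return prefix(head).contains("<project") && !looksLikeHtml(head);
    }

    /** Decodes at most SNIFF_LIMIT bytes; ISO-8859-1 maps every byte to a char. */
    private static String prefix(byte[] head) {
        int len = Math.min(head.length, SNIFF_LIMIT);
        return new String(head, 0, len, StandardCharsets.ISO_8859_1);
    }
}
```

One caveat with a fixed window: the license comment at the top of the example page in the issue description is already longer than 512 bytes, so {{looksLikeHtml}} alone would miss it, and a POM with a long license header would fail {{looksLikePom}} for the same reason. A real check would probably need a larger limit or would have to skip leading comments.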