[issue46337] urllib.parse: Allow more flexibility in schemes and URL resolution behavior

2022-01-10 Thread Lincoln Auster


New submission from Lincoln Auster :

It looks like this was discussed in 2013-2015 here: 
https://bugs.python.org/issue18828

Basically, with all the URL schemes that exist in the world (and the 
possibility of a custom scheme), the current strategy of enumerating what does 
what in a hard-coded variable is a bit ... weird. Among the proposed solutions 
in 18828, some were:

+ Have a global registry of what schemes do what (criticized for being 
overkill, and I can't say I disagree)
+ Get rid of the scheme lists altogether, and assume every scheme supports 
everything (isn't backwards compatible; might break with intended behavior, 
too).
+ Switch the uses_relative whitelist to a blacklist (maybe fine in practice, 
maybe not; either way it doesn't really fix the underlying issue)
+ Work around it with global state (modify the uses_* lists; this is what I'm 
doing in my code, and I can't say I like it much).
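
For reference, the global-state workaround in the last bullet looks roughly 
like the sketch below. The scheme name "x-custom" and the example URLs are 
made up for illustration; uses_relative, uses_netloc and urljoin are the 
existing urllib.parse names.

```python
from urllib.parse import urljoin, uses_netloc, uses_relative

# "x-custom" is a made-up scheme used only for this illustration.
BASE = "x-custom://example.org/a/b"

# The scheme is not in uses_relative, so urljoin gives up and
# returns the relative reference unchanged.
before = urljoin(BASE, "c")

# The workaround: mutate the module-level lists. This affects every
# caller of urllib.parse in the process, which is the unpleasant part.
for scheme_list in (uses_relative, uses_netloc):
    if "x-custom" not in scheme_list:
        scheme_list.append("x-custom")

# Now the reference is resolved against the base like an http URL would be.
after = urljoin(BASE, "c")

print(before)
print(after)
```

Since the lists are shared module state, any other library in the same 
process sees the change too.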

An alternative I've implemented in my fork 
(https://github.com/lincolnauster/cpython/tree/urllib-custom-schemes) is to 
have an Enum with all the weird scheme-based behaviors that may occur 
(urllib.parse.SchemeClass in my fork) and to allow passing a set of those Enums 
to functions relying on scheme-specific behavior. The elements of that set are 
added to whatever has already been determined from the scheme. (See the test 
case for a concrete example; this explanation is not great).

Some things I like about this:
+ Backwards compatibility.
+ It makes the functions using it as a general strategy a bit more pure.
+ It makes client code deal with edge cases.

Some things that could be changed:
+ There's no way to remove behaviors you *don't* want.
+ It makes client code deal with edge cases.

As a side thought: if the above could be adopted, the uses_* lists could be 
enforced as immutable, which, while breaking compatibility, could make client 
code a bit cleaner.

--
components: Library (Lib)
messages: 410259
nosy: lincolnauster
priority: normal
severity: normal
status: open
title: urllib.parse: Allow more flexibility in schemes and URL resolution 
behavior
type: behavior

___
Python tracker 
<https://bugs.python.org/issue46337>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46337] urllib.parse: Allow more flexibility in schemes and URL resolution behavior

2022-01-10 Thread Lincoln Auster


Change by Lincoln Auster :


--
keywords: +patch
pull_requests: +28721
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/30520




[issue46337] urllib.parse: Allow more flexibility in schemes and URL resolution behavior

2022-02-11 Thread Lincoln Auster


Lincoln Auster  added the comment:

> Maybe a new parse function, or new parameter to the existing one,
> could be easier to add.

If I'm understanding you right, that's what this (and the PR) is - an
extra optional parameter on the urllib.parse functions to supplement the
existing (legacy?) hard-coded lists.

--




[issue46337] urllib.parse: Allow more flexibility in schemes and URL resolution behavior

2022-02-12 Thread Lincoln Auster


Lincoln Auster  added the comment:

> In my idea it would not be a list of things that you have to pass
> piecemeal to request specific behaviour, but another function or a new
> param (like `parse(string, universal=True)`) that implements universal
> parsing.

If I'm correct in my understanding of a universal parse function (a
function with all the SchemeClasses enabled unilaterally), some
parse_universal function would be a pretty trivial thing to add with the
API I've already got here (though it wouldn't address issue 22852 without
some extra work afaict). I do think keeping the 'piecemeal' options exposed
has some utility, though, especially since the uses_* lists already
treat them on such a granular level.

Do we think a parse_universal function would be helpful to add on top of
this, or just repetitive?

--
