[Python-Dev] thes mapping data type and thescfg cfg file module - review and suggestion request

2019-11-17 Thread Dave Cinege

If you are not aware:

 - Thesaurus is a mapping data type with recursive keypath map
and attribute aliasing. It is a subclass of dict() and is mostly
compatible as a general use dictionary replacement.

 - ThesaurusExtended is a subclass of Thesaurus providing additional 
usability methods such as recursive key and value searching.


 - ThesaurusCfg is a subclass of ThesaurusExtended providing a nested
key configuration file parser and per key data coercion methods.
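
A minimal sketch of what a "recursive keypath map" on top of dict could look like. This is an illustration only, not the Thesaurus code: the class name KeypathDict and the get_path method are hypothetical, and set_path is modeled loosely on the call shown later in this thread.

```python
class KeypathDict(dict):
    """Hypothetical stand-in for illustration; not the Thesaurus implementation."""

    def get_path(self, path, sep="."):
        # Walk one nested level per path component.
        node = self
        for key in path.split(sep):
            node = node[key]
        return node

    def set_path(self, path, value, sep="."):
        # Create intermediate mappings on demand, then set the leaf key.
        *parents, last = path.split(sep)
        node = self
        for key in parents:
            node = node.setdefault(key, KeypathDict())
        node[last] = value


t = KeypathDict()
t.set_path("a.b.c.d", 1)
```

Because the class subclasses dict, plain indexing such as t['a']['b']['c']['d'] keeps working alongside the keypath calls.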

The README.rsl will give a better idea:
https://git.cinege.com/thesaurus/


After 7 years I might have reached the point of 'interesting' with
my Thesaurus and ThesaurusCfg modules.

To anyone that is overly bored, I'd appreciate your terse review and 
comments on how mundane and worthless they still actually are. :-)


I'm primarily interested in suggestions to what I've done conceptually 
here. While I'm not completely ashamed of the state of the Thesaurus 
code, ThesaurusCfg is not far beyond the original few hours I slapped it 
together one day in frustration.


After considering suggestions I intend to make changes towards a formal 
release.


Thanks in advance,

Dave
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CTXKJUTTJKOS47T6XI2O6WU7EYVEQQ3N/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: thes mapping data type and thescfg cfg file module - review and suggestion request

2019-11-18 Thread Dave Cinege

Hello Tal,

Yes I understand that. I posted this here because it's been suggested by 
those less in the know that Thesaurus and ThesaurusCfg (or some of the 
concepts contained in them) might have a place in the future mainline.


Remember that Thesaurus is a data type and ThesaurusCfg is an alternative 
to configparser. Additionally I could bring this full circle to 
something that integrates with or provides an alternative to argparse. 
(providing an integrated cfg file and command line argument solution 
which currently does not exist.)
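
For context, a sketch of the manual glue that is typically written today with the stdlib alone: read defaults from a config file with configparser, then let argparse override them. The section name, option names, and the load_settings helper are assumptions made up for this illustration; nothing here reflects how ThesaurusCfg works.

```python
import argparse
import configparser

# Hypothetical config text; a real program would read a file instead.
CFG_TEXT = """
[server]
host = example.invalid
port = 8080
"""


def load_settings(argv, cfg_text=CFG_TEXT):
    cfg = configparser.ConfigParser()
    cfg.read_string(cfg_text)

    parser = argparse.ArgumentParser()
    parser.add_argument("--host")
    parser.add_argument("--port", type=int)
    # Config values become defaults; command-line arguments win when given.
    parser.set_defaults(
        host=cfg.get("server", "host"),
        port=cfg.getint("server", "port"),
    )
    return parser.parse_args(argv)
```

Calling load_settings(["--port", "9000"]) keeps the host from the config text and takes the port from the command line.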


I hope with this in mind this thread is considered on-topic and I'd 
highly value any feedback from the 'serious people' on this list if they 
should take a look.


Dave



On 2019/11/18 08:51, Tal Einat wrote:

Hi Dave,

Thesaurus looks interesting and it is obvious that you've put a lot of 
effort into it!


This list is for the discussion of the development *of* Python itself, 
however, rather than development *with* Python, so it's not an 
appropriate place for such posts.


I suggest you post this on python-list and/or python-announce, to get 
this in front of a wider audience.


- Tal Einat


Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/VXA7EZYVADF3I3OV7WADMMD5PO4YB2OU/


[Python-Dev] Re: thes mapping data type and thescfg cfg file module - review and suggestion request

2019-11-18 Thread Dave Cinege

Tal,

On 2019/11/18 10:59, Tal Einat wrote:


These days, thanks to pip and PyPI, anyone can publish libraries and it 
is easy for developers to find and use them if they like. There is no 
longer a need to add things to the stdlib just to make them widely 
available.


Fair enough, only that you possibly neglected to remember the benefits 
to my ego if I get something accepted into mainline. :-D


Framing this a little more specifically:

Aside from my original Thesaurus release in Dec 2012, there have been 
many, many attempts at extended/recursive dictionary objects. The use 
case for such a thing is well established. (JSON may be enough to mention.)


I would argue that a new standardized mapping datatype, possibly (some 
form of) Thesaurus, could be warranted in stdlib.


At this stage I consider Thesaurus 'serious' and 'interesting' because:

- I've been using some version of it for 7 years in my own
production code; many of the concepts in it are mature.

- It's tightly wrapped around and highly compatible with dict.

- I've paid close attention to performance

- The recent extended features for tree searches and the slicing
you can do with 'keypaths' (my own term) appear novel.

I am also sure that some will consider things I currently do in 
Thesaurus complete heresy. For now I'll point to ThesaurusCfg as an 
actual use case. (But I still consider my use of coercion methods really 
cool :-)


My goal is to attempt to take Thesaurus toward a standardized or de 
facto recursive mapping object and I welcome any dialog that results.


Dave

Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QLO2QDSAD3A3NUDJOGIJA2NUHD3N2GGN/


[Python-Dev] Re: thes mapping data type and thescfg cfg file module - review and suggestion request

2019-11-18 Thread Dave Cinege

Guido,

Thank you for your generous comments. All of them.

On 2019/11/18 15:33, Guido van Rossum wrote:

It won't be easy to add this to the stdlib.


Maybe there are other existing threads that considered a new recursive 
mapping object for stdlib that someone can point me to?


One thing that bugs me in 
particular is that you can access key via '.attr' notation (the example 
shows t['a']['b']['c']['d'] as equivalent to t.a.b.c.d). This feels 
problematic: What should happen if the key happens to be the name of a 
method (e.g. .keys or .update)? The choices you have are either to mask 
the method or to mask the key. Neither solution seems ideal. JavaScript 
has this equivalence and it makes me very uncomfortable.


Well, let's see:

>>> t.set_path('a.b.c.update', 'Hello')
>>> t.a.b.c.update

>>> t.a.b.c['update']
'Hello'

Attribute aliasing has limitations the coder must understand. Thesaurus 
attempts to deal with them gracefully. When in doubt normal key names 
should always work (including keys with dots in them) and if they don't 
I have something to fix. With the focus on being a datatype primitive I 
consider anything more to be the job of higher level modules. 
(ThesaurusCfg being an example)
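
The behaviour in the session above (the method wins over the key for attribute access, while the key stays reachable by subscript) is what falls out of a plain __getattr__ fallback on a dict subclass. A minimal sketch, assuming that mechanism; the AttrDict class is hypothetical and is not how Thesaurus is necessarily implemented:

```python
class AttrDict(dict):
    """Hypothetical illustration of attribute aliasing; not Thesaurus itself."""

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal attribute lookup fails,
        # so real dict methods such as .update or .keys are never masked.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


t = AttrDict(c=AttrDict(update="Hello", d=1))

value_by_attr = t.c.d           # key reached through attribute aliasing
value_by_key = t.c["update"]    # the key is still reachable by subscript
method = t.c.update             # the bound dict method, not the key
```

With this design the key named 'update' is masked for attribute access only; masking the method instead would require overriding __getattribute__, which is the other choice described in the quoted concern.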


The benefit of course is dramatically cleaner code for heavily nested 
objects and recursive string replacement compatibility. (f-string, 
printf, format, template)


And with that said, if the only obstacle to a new recursive mapping 
object for stdlib is attribute aliasing, then it can be removed.


Dave

I'm not denying that Thesaurus[Cfg] looks useful. But, like Tal, I must 
stress that that's not enough to consider inclusion in the stdlib.



[Python-Dev] New string method - splitquoted

2006-05-17 Thread Dave Cinege
Very often...make that very very very very very very very very very often,
I find myself processing text in Python where, when .split()'ing a line, I'd 
like to exclude the split for a 'quoted' item...quoted because it contains 
whitespace or the sep char.

For example:

s = '  Chan: 11  SNR: 22  ESSID: "Spaced Out Wifi"  Enc: On'

If I want to yank the essid in the above example, it's a pain. But with my new 
dandy split quoted method, we have a 3rd argument to .split() with which we can 
spec the quote delimiter. No splitting will occur between a pair of these 
delimiters, and the quote char itself will be dropped:

s.split(None,-1,'"')[5]
'Spaced Out Wifi'

Attached is a proof of concept patch against 
Python-2.4.1/Objects/stringobject.c  that implements this. It is limited to 
whitespace splitting only. (sep == None)

As implemented, the quote delimiter also doubles as an additional separator for 
splitting out a substr. 

For example:
'There is"no whitespace before these"quotes'.split(None,-1,'"')
['There', 'is', 'no whitespace before these', 'quotes']

This is useful, but possibly better put into practice as a separate method??
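
The semantics described above can be sketched in pure Python. This is an approximation written from the examples in this message, not a translation of the C patch: the function name split_quoted is hypothetical, maxsplit is not handled, and only whitespace splitting is covered.

```python
def split_quoted(s, quote='"'):
    """Split s on whitespace, keeping text between quote chars together.

    The quote char is dropped and also acts as a separator.
    """
    tokens = []
    buf = []
    in_quote = False
    for ch in s:
        if in_quote:
            if ch == quote:
                # Closing quote: emit the item, even if it is empty.
                tokens.append("".join(buf))
                buf = []
                in_quote = False
            else:
                buf.append(ch)
        elif ch == quote:
            # Opening quote: it also terminates any token in progress.
            if buf:
                tokens.append("".join(buf))
                buf = []
            in_quote = True
        elif ch.isspace():
            if buf:
                tokens.append("".join(buf))
                buf = []
        else:
            buf.append(ch)
    if buf:
        # Trailing token, including the tail of an unterminated quote.
        tokens.append("".join(buf))
    return tokens
```

On the two example strings above this returns the same lists as shown, and an unterminated quote is handled by keeping the remainder of the line as one item.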

Comments please.

Dave
--- stringobject.c.orig	2006-05-17 16:12:13.0 -0400
+++ stringobject.c	2006-05-17 23:49:52.0 -0400
@@ -1336,6 +1336,85 @@
 	return NULL;
 }
 
+// dc: split quoted example
+// 'This string has  "not only this" "and this" but"this mixed in string"as well as this "" empty one and two more at the end'.split(None,-1,'"')
+// CORRECT: ['This', 'string', 'has', 'not only this', 'and this', 'but', 'this mixed in string', 'as', 'well', 'as', 'this', '', 'empty', 'one', 'and', 'two', 'more', 'at', 'the', 'end', '', '']
+static PyObject *
+split_whitespace_quoted(const char *s, int len, int maxsplit, const char *qsub)
+{
+	int i, j, quoted = 0;
+	PyObject *str;
+	PyObject *list = PyList_New(0);
+
+	if (list == NULL)
+		return NULL;
+
+	for (i = j = 0; i < len; ) {
+
+		if (!quoted) {
+			while (i < len && isspace(Py_CHARMASK(s[i])) )
+				i++;
+		}
+
+		if (Py_CHARMASK(s[i]) == Py_CHARMASK(qsub[0])) {
+			quoted = 1;
+			i++;
+		}
+
+		j = i;
+
+		while (i < len) {
+			if (Py_CHARMASK(s[i]) == Py_CHARMASK(qsub[0])) {
+				if (quoted)
+					quoted = 2;	// End of quotes found
+				else {
+					quoted = 1;	// Else start of new quotes in the middle of a string
+				}
+				break;
+			} else if (!quoted && isspace(Py_CHARMASK(s[i])))
+				break;
+			i++;
+		}
+
+		if (quoted == 2 && j == i) {	// Empty string in quotes
+			SPLIT_APPEND("", 0, 0);
+			quoted = 0;
+			i++;
+			j = i;
+
+		} else if (j < i) {
+			if (maxsplit-- <= 0)
+				break;
+			SPLIT_APPEND(s, j, i);
+
+			if (quoted == 2) {
+				quoted = 0;
+				i++;
+			} else if (quoted == 1) {
+				i++;
+				if (Py_CHARMASK(s[i]) == Py_CHARMASK(qsub[0])) { // Embedded empty string in quotes (at end of string?)
+					SPLIT_APPEND("", 0, 0);
+					quoted = 0;
+					i++;
+				}
+			} else {
+				while (i < len && isspace(Py_CHARMASK(s[i])))
+					i++;
+			}
+
+			j = i;
+		}
+	}
+	if (j < len) {
+		SPLIT_APPEND(s, j, len);
+	}
+	return list;
+  onError:
+	Py_DECREF(list);
+	return NULL;
+}
+
+
 static PyObject *
 split_char(const char *s, int len, char ch, int maxcount)
 {
@@ -1376,15 +1455,27 @@
 static PyObject *
 string_split(PyStringObject *self, PyObject *args)
 {
-	int len = PyString_GET_SIZE(self), n, i, j, err;
+	int len = PyString_GET_SIZE(self), n, qn, i, j, err;
 	int maxsplit = -1;
-	const char *s = PyString_AS_STRING(self), *sub;
-	PyObject *list, *item, *subobj = Py_None;
+	const char *s = PyString_AS_STRING(self), *sub, *qsub;
+	PyObject *list, *item, *subobj = Py_None, *qsubobj = Py_None;
 
-	if (!PyArg_ParseTuple(args, "|Oi:split", &subobj, &maxsplit))
+	if (!PyArg_ParseTuple(args, "|OiO:split", &subobj, &maxsplit, &qsubobj))
 		return NULL;
 	if (maxsplit < 0)
 		maxsplit = INT_MAX;
+	if (qsubobj != Py_None) {
+		if (PyString_Check(qsubobj)) {
+			qsub = PyString_AS_STRING(qsubobj);
+			qn = PyString_GET_SIZE(qsubobj);
+		}
+		if (qn == 0) {
+			PyErr_SetString(PyExc_ValueError, "empty delimiter");
+			return NULL;
+		}
+		if (subobj == Py_None)
+			return split_whitespace_quoted(s, len, maxsplit, qsub);
+	}		
 	if (subobj == Py_None)
 		return split_whitespace(s, len, maxsplit);
 	if (PyString_Check(subobj)) {
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] New string method - splitquoted

2006-05-18 Thread Dave Cinege

On Thursday 18 May 2006 03:00, Heiko Wundram wrote:
> Am Donnerstag 18 Mai 2006 06:06 schrieb Dave Cinege:
> > This is useful, but possibly better put into practice as a separate
> > method??
>
> I personally don't think it's particularily useful, at least not in the
> special case that your patch tries to address.

Well I'm thinking along the lines of a method to extract only quoted substr's:
' this is "something" and"nothing else"but junk'.splitout('"')
['something', 'nothing else']

Useful? I dunno

> splitters), but if you have more complicated quoting operators (such as
> """), are you sure it's sensible to implement the logic in split()?

Probably not. See below...

> 2) What should the result of "this is a \"test string".split(None,-1,'"')
> be? An exception (ParseError)?

I'd probably vote for that. However my current patch will simply play dumb and stop split'ing the rest of the line, dropping the first quote.

'this is a "test string'.split(None,-1,'"')
['this', 'is', 'a', 'test string']

> Silently ignoring the missing delimiter, and 
> returning ['this','is','a','test string']? Ignoring the delimiter
> altogether, returning ['this','is','a','"test','string']? I don't think
> there's one case to satisfy all here...

Well the point to the patch is a KISS approach to extending the split() method just slightly to exclude a range of substr from split'ing by delimiter, not to engage in further text processing. 

I'm dealing with this ALL the time, while processing output from other programs. (Windope) filenames, (poorly considered) wifi network names, etc. For me it's always some element with whitespace in it and double quotes surrounding it, where otherwise I could just use a slice to dump the quotes for the needed element.

'filename: "/root/tmp.txt"'.split()[1] [1:-1]
'/root/tmp.txt'
OK

'filename: "/root/is a bit slow.txt"'.split()[1] [1:-1]
'/root/i'
NOT OK

This exact bug just zapped me in a product I have, where I didn't foresee whitespace turning up in that element.

Thus my patch:
'filename: "/root/is a bit slow.txt"'.split(None,-1,'"')[1]
'/root/is a bit slow.txt'
LIFE IS GOOD

> 3) What about escapes of the delimiter? Your current patch doesn't address
> them at all (AFAICT) at the moment, 

And it wouldn't, just like the current split doesn't.
'this is a \ test string'.split()
['this', 'is', 'a', '\\', 'test', 'string']

> Don't get me wrong, I personally find this functionality very, very
> interesting (I'm +0.5 on adding it in some way or another), especially as a
> part of the standard library (not necessarily as an extension to .split()).

I'd be happy to have this in as .splitquoted(), but once you use it, it seems more to me like a natural 'ought to be there' extension to split itself.
>
> Why not write up a PEP?

Because I have no idea of the procedure.   : )  URL?

Dave



Re: [Python-Dev] New string method - splitquoted

2006-05-18 Thread Dave Cinege
On Thursday 18 May 2006 04:21, Giovanni Bajo wrote:

> It's already there. It's called shlex.split(), and follows the semantic of
> a standard UNIX shell, including escaping and other things.

Not quite. As I said in my other post, simple is the idea for this, just like 
the split method itself.  (no escaping, etc...just recognizing delimiters 
as an exception to the split separation) 

shlex.split() does not let one choose the separator or use a maxsplit, nor is 
it a pure method to strings.
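
For comparison, a short sketch of what shlex.split() does with the strings used in this thread. It only uses the documented stdlib call; the variable names are made up for the example.

```python
import shlex

line = '  Chan: 11  SNR: 22  ESSID: "Spaced Out Wifi"  Enc: On'

# Quoted items survive as single fields, with the quotes removed.
fields = shlex.split(line)
essid = fields[5]

# Unlike str.split(), shlex treats a backslash as an escape character,
# so the backslash-space below becomes a literal space inside a field.
escaped = shlex.split(r'this is a \ test string')

# An unterminated quote is an error rather than being silently accepted.
try:
    shlex.split('this is a "test string')
    unterminated_error = None
except ValueError as exc:
    unterminated_error = str(exc)
```

This shows both the overlap with the proposal (quoted items kept together) and the shell-style extras it brings along, such as escape handling and the ValueError on a missing closing quote.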

Dave



Re: [Python-Dev] New string method - splitquoted - New EmailAddress

2006-05-20 Thread Dave Cinege
Sorry to all about tmda on my dcinege-mlists email addy. It was not supposed 
to be, however the dash in dcinege-mlists was flipping out the latest 
incarnation of my mail server config. Please use this address to reply to me
in this thread.

Dave



Re: [Python-Dev] New string method - splitquoted

2006-05-20 Thread Dave Cinege
On Thursday 18 May 2006 11:11, Guido van Rossum wrote:
> This is not an appropriate function to add as a string method. There
> are too many conventions for quoting and too many details to get
> right. One method can't possibly handle them all without an enormous
> number of weird options. It's better to figure out how to do this with
> regexps or use some of the other approaches that have been suggested.
> (Did anyone mention the csv module yet? It deals with this too.)

Maybe my idea is better called splitexcept instead of splitquoted, as my goal 
is to (simply) provide a way to limit the split by delimiters, and not dive 
into an all encompassing quoting algorithm.

To me this is in the spirit of the maxsplit option already present.
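
As a point of reference for the csv suggestion quoted above, a minimal sketch using the stdlib csv module with a space as the delimiter. The input string is the filename example from earlier in this thread; treating such a line as a one-row CSV record is an assumption of this illustration.

```python
import csv

line = 'filename: "/root/is a bit slow.txt"'

# csv.reader accepts any iterable of lines; a one-element list is enough.
# With a space delimiter, the double-quoted item is kept as one field.
row = next(csv.reader([line], delimiter=' ', quotechar='"'))
path = row[1]
```

This works for single-space separated input; runs of whitespace and leading blanks need extra care with csv, which is part of what the split-based proposal is trying to avoid.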

Dave



Re: [Python-Dev] New string method - splitquoted

2006-05-20 Thread Dave Cinege
On Thursday 18 May 2006 16:13, you wrote:
> Dave Cinege wrote:
> > For example:
> >
> > s = '  Chan: 11  SNR: 22  ESSID: "Spaced Out Wifi"  Enc: On'
>
> My complaint with this example is that you are just using the wrong tool
> to do this job. If I was going to do this, I would've immediately jumped
> on the regex-press train.
>
> wifi_info = re.match('^\s+'
>   'Chan:\s+(?P[0-9]+)\s+'
>   'SNR:\s+(?P[0-9]+)\s+'
>   'ESSID:\s+"(?P[^"]*)"\s+'
>   'Enc:\s+(?P[a-zA-Z]+)'
>   , s)

For the 5 years I've been pythoning, I've used re probably twice. 
I find regex to be a tool of last resort, and quite a bit of effort to get 
right, as regex (for me) is quite prone to giving unintended results without 
a good deal of thought. I don't want to have to think. That's why I use 
python.  : )

.split() and slicing have always been python's holy grail for me, and I find it 
a lot easier to .replace() 'stray' chars with spaces or a delimiter and then 
split() that.  It's easier to read and (should be) a lot quicker to process 
than regex. (Which I care about, as I'm also often on embedded CPU's of a few 
hundred MHz)

So .split works just super duper...but I keep running into situations where 
I'd like a substr to be excluded from the split'ing.

The clearest one is excluding a 'quoted' string that has whitespace.
Here's another, albeit a very poor example: 

s = '\t\tFrequency:2.462 GHz (Channel 11)'  # This is real output from iwlist
s.replace(':',')').replace(' (','))').split(None,-1,')')
['Frequency', '2.462 GHz', 'Channel 11']

I wanted to preserve the '2.462 GHz' substr. Let's assume, that could come out 
as '900 MHz' or '11.3409 GHz'. The above code gets what I want in 1 shot, 
either way. Show me an easier way, that doesn't need multiple splits, and 
string re-assembly, and I'll use it.
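
One possible regex-free alternative for this particular line, sketched with str.partition (available from Python 2.5 on). It assumes the line always has the shape 'Name:value (extra)'; the variable names are made up for the example.

```python
s = '\t\tFrequency:2.462 GHz (Channel 11)'

# partition() splits once on the first occurrence of the separator,
# so the space inside '2.462 GHz' is never considered.
name, _, rest = s.strip().partition(':')
value, _, extra = rest.partition(' (')
extra = extra.rstrip(')')

result = [name, value, extra]
```

The same three fields come out whether the value is '2.462 GHz', '900 MHz', or '11.3409 GHz', since only the ':' and ' (' markers are used to cut the line.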

Dave
