Re: Name mangling vs qualified access to class attributes

2016-12-14 Thread dieter
[email protected] writes:

> The official Python tutorial at
>
> https://docs.python.org/3/tutorial/classes.html#private-variables
>
> says that "name mangling is helpful for letting subclasses override methods 
> without breaking intraclass method calls" and makes an interesting example:
>
> class Mapping:
>     def __init__(self, iterable):
>         self.items_list = []
>         self.__update(iterable)
>
>     def update(self, iterable):
>         for item in iterable:
>             self.items_list.append(item)
>
>     __update = update   # private copy of original update() method
>
> class MappingSubclass(Mapping):
>
>     def update(self, keys, values):
>         # provides new signature for update()
>         # but does not break __init__()
>         for item in zip(keys, values):
>             self.items_list.append(item)
>
>
> It seems to me that, in this example, one could just have:
>
> class Mapping:
>     def __init__(self, iterable):
>         self.items_list = []
>         Mapping.update(self, iterable)
>
>     def update(self, iterable):
>         for item in iterable:
>             self.items_list.append(item)
>
> and avoid copying 'Mapping.update' into 'Mapping.__update'. More generally, 
> any time one needs to "let subclasses override methods without breaking 
> intraclass method calls" (the goal stated in the tutorial), using qualified 
> access to class attributes/methods should suffice.
>
> Am I missing something? Is 'self.__update(iterable)' in 'Mapping.__init__' 
> preferable to 'Mapping.update(self, iterable)'?
>
> I think that, instead, name mangling is helpful to avoid accidental overrides 
> of methods/attributes by the *current* class (rather than its subclasses). 
> Given the way that C3 linearization works, you can't know in advance who will 
> follow your class A in B.__mro__ when B extends A. Name mangling allows you 
> to avoid overriding methods/attributes of classes that might follow.
>
> Any thoughts?

You can do that indeed for class level attributes (such as methods);
you cannot do it for instance level attributes (e.g. holding instance specific
values).

From my point of view, "__" name mangling is particularly interesting
for mixin classes (i.e. classes that each implement a single feature
and are designed to be combined, through multiple inheritance, in a
derived class that needs those features).
Those classes are typically combined with other classes - with a
corresponding risk of name clashes. Therefore, the above name mangling
is helpful to reduce that risk for private attributes (whether methods
or data attributes).
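
As a minimal sketch of the mixin case (the class and attribute names
below are made up for illustration, not taken from any real library):
two mixins both use a private attribute called __state, and name
mangling keeps the two apart on the same instance. Qualified access such
as Mapping.update(self, ...) has no equivalent here, because the state
lives on the instance rather than on the class.

```python
class CounterMixin:
    def bump(self):
        # Inside this class body, self.__state is compiled as
        # self._CounterMixin__state (string literals are not mangled).
        self.__state = getattr(self, "_CounterMixin__state", 0) + 1
        return self.__state


class LoggerMixin:
    def log(self, message):
        # Compiled as self._LoggerMixin__state: a different attribute.
        self.__state = getattr(self, "_LoggerMixin__state", []) + [message]
        return self.__state


class Widget(CounterMixin, LoggerMixin):
    pass


w = Widget()
w.bump()
w.bump()
w.log("hello")
print(sorted(vars(w)))  # ['_CounterMixin__state', '_LoggerMixin__state']
```

Without the double underscore, both mixins would write to the same
attribute and silently clobber each other.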

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a way to insert hooks into a native dictionary type to see when a query arrives and what's looked up?

2016-12-14 Thread Steven D'Aprano
On Wednesday 14 December 2016 17:11, Veek M wrote:

> I know that with user classes one can define getattr, setattr to handle
> dictionary lookup. Is there a way to hook into the native dict() type
> and see in real time what's being queried.

Not easily, and maybe not at all.

There are two obvious ways to do this:

(1) monkey-patch the object's __dict__, and the class __dict__.

Unfortunately, Python doesn't support monkey-patching built-ins.

https://en.wikipedia.org/wiki/Monkey_patch

Or perhaps I should say, *fortunately* Python doesn't support it.

http://www.virtuouscode.com/2008/02/23/why-monkeypatching-is-destroying-ruby/

(2) Alternatively, you could make a dict subclass, and replace the class and 
instance __dict__ with your own.

Unfortunately, you cannot replace the __dict__ of a class:

py> class X:  # the class you want to hook into
...     pass
... 
py> class MyDict(dict):  # my custom dict
...     def __getitem__(self, key):
...         print(key)
...         return super().__getitem__(key)
... 
py> d = MyDict()
py> d.update(X.__dict__)
py> X.__dict__ = d
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: attribute '__dict__' of 'type' objects is not writable


You can replace the instance dict, but Python won't call your __getitem__ 
method:

py> instance = X()
py> instance.__dict__ = MyDict()
py> instance.a = 999
py> instance.a
999

So the short answer is, No.

You might be able to create a completely new metaclass that supports this, but 
it would be a lot of work, and I'm not even sure that it would be successful.



> I wanted to check if when one does:
> 
> x.sin()
> 
> if the x.__dict__ was queried or if the Foo.__dict__ was queried..

The easiest way to do that is something like this:


py> class Test:
...     def sin(self):
...         return 999
... 
py> x = Test()
py> x.sin
<bound method Test.sin of <__main__.Test object at 0x...>>
py> x.sin()
999
py> x.sin = "surprise!"
py> x.sin
'surprise!'



So now you know: an instance attribute will shadow the class attribute.

(Actually, that's not *completely* true. It depends on whether x.sin is a 
descriptor or not, and if so, what kind of descriptor.)
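
To illustrate the descriptor caveat, here is a small sketch (the
property and the direct writes into vars(x) are just for demonstration).
A data descriptor such as a property on the class wins over an entry in
the instance dict, while a plain method, being a non-data descriptor,
is shadowed by it.

```python
class Test:
    @property
    def sin(self):   # property: a data descriptor (has __get__ and __set__)
        return 999

    def cos(self):   # plain function: a non-data descriptor (only __get__)
        return 1


x = Test()
vars(x)["sin"] = "surprise!"   # write straight into the instance dict
vars(x)["cos"] = "surprise!"
print(x.sin)   # 999
print(x.cos)   # surprise!
```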


-- 
Steven
"Ever since I learned about confirmation bias, I've been seeing 
it everywhere." - Jon Ronson



Parsing a potentially corrupted file

2016-12-14 Thread Paul Moore
I'm looking for a reasonably "clean" way to parse a log file that potentially 
has incomplete records in it.

The basic structure of the file is a set of multi-line records. Each record 
starts with a series of fields delimited by [...] (the first of which is always 
a date), optionally separated by whitespace. Then there's a trailing "free 
text" field, optionally followed by a multi-line field delimited by [[...]]

So, example records might be

[2016-11-30T20:04:08.000+00:00] [Component] [level] [] [] [id] Description of 
the issue goes here

(a record delimited by the end of the line)

or 

[2016-11-30T20:04:08.000+00:00] [Component] [level] [] [] [id] Description of 
the issue goes here [[Additional
data, potentially multiple lines

including blank lines
goes here
]]

The terminating ]] is on a line of its own.

This is a messy format to parse, but it's manageable. However, there's a catch. 
Because the logging software involved is broken, I can occasionally get a log 
record prematurely terminated with a new record starting mid-stream. So 
something like the following:

[2016-11-30T20:04:08.000+00:00] [Component] [le[2016-11-30T20:04:08.000+00:00] 
[Component] [level] [] [] [id] Description of the issue goes here

I'm struggling to find a "clean" way to parse this. I've managed a clumsy 
approach, by splitting the file contents on the pattern 
[dddd-dd-ddTdd:dd:dd.ddd+dd:dd] (the timestamp - I've never seen a case where 
this gets truncated) and then treating each entry as a record and parsing it 
individually. But the resulting code isn't exactly maintainable, and I'm 
looking for something cleaner.
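
For reference, the split-on-timestamp approach described above might
look roughly like the following. It is a minimal sketch, not the actual
code: the regular expression is an assumption based on the sample
records, split_records is a made-up name, and any text before the first
timestamp is simply dropped.

```python
import re

# Assumed timestamp shape, e.g. [2016-11-30T20:04:08.000+00:00]
TIMESTAMP = re.compile(
    r"\[\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2}\]"
)


def split_records(text):
    """Return chunks of text, each starting at a [timestamp]."""
    starts = [m.start() for m in TIMESTAMP.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[a:b] for a, b in zip(starts, ends)]


sample = (
    "[2016-11-30T20:04:08.000+00:00] [Component] [le"
    "[2016-11-30T20:04:08.000+00:00] [Component] [level] [] [] [id] "
    "Description of the issue goes here\n"
)
records = split_records(sample)
print(len(records))   # 2
print(records[0])     # the truncated record, ending in "[le"
```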

Does anyone have any suggestions for a good way to parse this data?

Thanks,
Paul


Re: Parsing a potentially corrupted file

2016-12-14 Thread Chris Angelico
On Wed, Dec 14, 2016 at 10:43 PM, Paul  Moore  wrote:
> This is a messy format to parse, but it's manageable. However, there's a 
> catch. Because the logging software involved is broken, I can occasionally 
> get a log record prematurely terminated with a new record starting 
> mid-stream. So something like the following:
>
> [2016-11-30T20:04:08.000+00:00] [Component] 
> [le[2016-11-30T20:04:08.000+00:00] [Component] [level] [] [] [id] Description 
> of the issue goes here
>
> I'm struggling to find a "clean" way to parse this. I've managed a clumsy 
> approach, by splitting the file contents on the pattern 
> [dddd-dd-ddTdd:dd:dd.ddd+dd:dd] (the timestamp - I've never seen a case where 
> this gets truncated) and then treating each entry as a record and parsing it 
> individually. But the resulting code isn't exactly maintainable, and I'm 
> looking for something cleaner.
>

Is the "[Component]" section something you could verify? (That is - is
there a known list of components?) If so, I would include that as a
secondary check. Ditto anything else you can check (I'm guessing the
[level] is one of a small set of values too.) The logic would be
something like this:

Read line from file.
Verify line as a potential record:
    Assert that line begins with timestamp.
    Verify as many fields as possible (component, level, etc)
    Search line for additional timestamp.
    If additional timestamp found:
        Recurse. If verification fails, assume we didn't really have a
        corrupted line.
        (Process partial line? Or discard?)
If "[[" in line:
    Until line is "]]":
        Read line from file, append to description
        If timestamp found:
            Recurse. If verification succeeds, break out of loop.

Unfortunately it's still not really clean; but that's the nature of
working with messy data. Coping with ambiguity is *hard*.

ChrisA


Re: Parsing a potentially corrupted file

2016-12-14 Thread alister
On Wed, 14 Dec 2016 03:43:44 -0800, Paul  Moore wrote:

> I'm looking for a reasonably "clean" way to parse a log file that
> potentially has incomplete records in it.
> 
> The basic structure of the file is a set of multi-line records. Each
> record starts with a series of fields delimited by [...] (the first of
> which is always a date), optionally separated by whitespace. Then
> there's a trailing "free text" field, optionally followed by a
> multi-line field delimited by [[...]]
> 
> So, example records might be
> 
> [2016-11-30T20:04:08.000+00:00] [Component] [level] [] [] [id]
> Description of the issue goes here
> 
> (a record delimited by the end of the line)
> 
> or
> 
> [2016-11-30T20:04:08.000+00:00] [Component] [level] [] [] [id]
> Description of the issue goes here [[Additional data, potentially
> multiple lines
> 
> including blank lines goes here ]]
> 
> The terminating ]] is on a line of its own.
> 
> This is a messy format to parse, but it's manageable. However, there's a
> catch. Because the logging software involved is broken, I can
> occasionally get a log record prematurely terminated with a new record
> starting mid-stream. So something like the following:
> 
> [2016-11-30T20:04:08.000+00:00] [Component]
> [le[2016-11-30T20:04:08.000+00:00] [Component] [level] [] [] [id]
> Description of the issue goes here
> 
> I'm struggling to find a "clean" way to parse this. I've managed a
> clumsy approach, by splitting the file contents on the pattern
> [dddd-dd-ddTdd:dd:dd.ddd+dd:dd] (the timestamp - I've never seen a case
> where this gets truncated) and then treating each entry as a record and
> parsing it individually. But the resulting code isn't exactly
> maintainable, and I'm looking for something cleaner.
> 
> Does anyone have any suggestions for a good way to parse this data?
> 
> Thanks,
> Paul

First question: do you (or anyone you can contact) have any control over
the logging application?

If so, the best approach would be to get the log file output fixed.

If not, then you will probably be stuck with a messy solution :-(



-- 
Sin has many tools, but a lie is the handle which fits them all.


Wrong release date in 3.6 whats new docs?

2016-12-14 Thread Nick Sarbicki
Afternoon everyone.

Might be missing something obvious but the 3.6 What's New docs point to the
release date being the 12th.

https://docs.python.org/3.6/whatsnew/3.6.html#what-s-new-in-python-3-6

I got the team excited about Friday's release so that caused some confusion
here.

Guessing it's a typo?


Re: Parsing a potentially corrupted file

2016-12-14 Thread Paul Moore
On Wednesday, 14 December 2016 12:57:23 UTC, Chris Angelico  wrote:
> Is the "[Component]" section something you could verify? (That is - is
> there a known list of components?) If so, I would include that as a
> secondary check. Ditto anything else you can check (I'm guessing the
> [level] is one of a small set of values too.)

Possibly, although this is to analyze the structure of a basically undocumented 
log format. So if I validate too tightly, I end up just checking my assumptions 
rather than checking the data :-(

> The logic would be
> something like this:
> 
> Read line from file.
> Verify line as a potential record:
>     Assert that line begins with timestamp.
>     Verify as many fields as possible (component, level, etc)
>     Search line for additional timestamp.
>     If additional timestamp found:
>         Recurse. If verification fails, assume we didn't really have a
>         corrupted line.
>         (Process partial line? Or discard?)
> If "[[" in line:
>     Until line is "]]":
>         Read line from file, append to description
>         If timestamp found:
>             Recurse. If verification succeeds, break out of loop.
>
> Unfortunately it's still not really clean; but that's the nature of
> working with messy data. Coping with ambiguity is *hard*.

Yeah, that's essentially what I have now. As I say, it's working but nobody 
could really love it. But you're right, it's more the fault of the data than of 
the code.

One thought I had, which I might try, is to go with the timestamp as the one 
assumption I make of the data, and read the file in as, in effect, a text 
stream, spitting out a record every time I see something matching the 
[timestamp] pattern. Then parse record by record. Truncated records should 
either be obvious (because the delimited fields have start and end markers, so 
unmatched markers = truncated record) or acceptable (because undelimited fields 
are free text). I'm OK with ignoring the possibility that the free text 
contains something that looks like a timestamp.

The only problem with this approach is that I have more data than I'd really 
like to read into memory all at once, so I'd need to do some sort of streamed 
match/split processing. But thinking about it, that sounds like the sort of job 
a series of chained generators could manage. Maybe I'll look at that approach...
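
A rough sketch of what that chain of generators might look like follows.
The names raw_records and classify are invented for this example, the
timestamp regex is an assumption based on the sample records, and the
bracket count is a crude truncation test that free text containing
brackets could fool. It relies on a timestamp never spanning two lines.

```python
import io
import re

TIMESTAMP = re.compile(
    r"\[\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2}\]"
)


def raw_records(lines):
    """Yield one string per record, starting a new one at every timestamp."""
    current = ""
    for line in lines:
        pos = 0
        for match in TIMESTAMP.finditer(line):
            current += line[pos:match.start()]
            if current.strip():
                yield current
            current = ""
            pos = match.start()
        current += line[pos:]
    if current.strip():
        yield current


def classify(records):
    """Yield (record, truncated); unbalanced brackets suggest truncation."""
    for record in records:
        yield record, record.count("[") != record.count("]")


# Usage with an in-memory file; a real file object works the same way.
log = io.StringIO(
    "[2016-11-30T20:04:08.000+00:00] [Component] [le"
    "[2016-11-30T20:04:08.000+00:00] [Component] [level] [] [] [id] First\n"
    "[2016-11-30T20:04:09.000+00:00] [Component] [level] [] [] [id] Second"
    " [[Additional\n"
    "data\n"
    "\n"
    "]]\n"
)
results = list(classify(raw_records(log)))
print([truncated for _, truncated in results])  # [True, False, False]
```

Only one record is held in memory at a time, so the size of the file
should not matter.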

Paul


Re: Name mangling vs qualified access to class attributes

2016-12-14 Thread Steve D'Aprano
On Wed, 14 Dec 2016 07:27 am, [email protected] wrote:

> The official Python tutorial at
> 
> https://docs.python.org/3/tutorial/classes.html#private-variables
> 
> says that "name mangling is helpful for letting subclasses override
> methods without breaking intraclass method calls" and makes an interesting
> example:
> 
> class Mapping:
>     def __init__(self, iterable):
>         self.items_list = []
>         self.__update(iterable)
>
>     def update(self, iterable):
>         for item in iterable:
>             self.items_list.append(item)
>
>     __update = update   # private copy of original update() method
>
> class MappingSubclass(Mapping):
>
>     def update(self, keys, values):
>         # provides new signature for update()
>         # but does not break __init__()
>         for item in zip(keys, values):
>             self.items_list.append(item)
>
>
> It seems to me that, in this example, one could just have:
>
> class Mapping:
>     def __init__(self, iterable):
>         self.items_list = []
>         Mapping.update(self, iterable)
>
>     def update(self, iterable):
>         for item in iterable:
>             self.items_list.append(item)
> 
> and avoid copying 'Mapping.update' into 'Mapping.__update'. 

Perhaps.

But remember that copying Mapping.update in this way is very cheap: it's
only a new reference (e.g. a copy of a pointer), it doesn't have to copy
the entire function object.

The differences between:

Mapping.update(self, iterable)

and

self.__update(iterable)

are very subtle and (as far as I can see) only matter in some fairly hairy
situations. Thanks to name mangling, the second is equivalent to:

self._Mapping__update(iterable)

which gives subclasses the opportunity to override it, if they dare. They
probably shouldn't, because it is a private method, but if you really,
really need to, you can.
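
Here is a small sketch of that equivalence. The Daring subclass is a
made-up example of a subclass overriding the mangled name on purpose.

```python
class Mapping:
    def __init__(self, iterable):
        self.items_list = []
        self.__update(iterable)      # compiled as self._Mapping__update

    def update(self, iterable):
        for item in iterable:
            self.items_list.append(item)

    __update = update                # stored as Mapping._Mapping__update


class Daring(Mapping):
    # Overriding the "private" name deliberately: possible, rarely wise.
    def _Mapping__update(self, iterable):
        self.items_list.extend(item * 2 for item in iterable)


print("_Mapping__update" in vars(Mapping))   # True
print(Mapping([1, 2]).items_list)            # [1, 2]
print(Daring([1, 2]).items_list)             # [2, 4]
```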

A more exotic difference is that the first example looks directly at the
class, while the second checks for an instance attribute first, giving the
instance the opportunity to shadow _Mapping__update.

One last subtle difference: the second version will work even if you bind
another object to Mapping:

class Mapping: ...

instance = Mapping()  # create instance
Mapping = None  # rebind the name to something else
d = type(instance)(iterable)  # create a new instance

In this (admittedly exotic) situation Raymond Hettinger's code with
self.__update will continue to work perfectly, while your alternative with
Mapping.update will fail.
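
A runnable sketch of that situation, using two hypothetical classes
(Qualified and Mangled) so that both styles can be compared side by
side:

```python
class Qualified:
    def __init__(self, iterable):
        self.items_list = []
        Qualified.update(self, iterable)   # looks up the global name each call

    def update(self, iterable):
        self.items_list.extend(iterable)


class Mangled:
    def __init__(self, iterable):
        self.items_list = []
        self.__update(iterable)            # looked up through the instance

    def update(self, iterable):
        self.items_list.extend(iterable)

    __update = update


q, m = Qualified([1]), Mangled([1])
Qualified = Mangled = None                 # rebind both global names

print(type(m)([2, 3]).items_list)          # [2, 3]
try:
    type(q)([2, 3])
except AttributeError as exc:
    print("qualified access failed:", exc)
```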


I don't know if Raymond has an objective reason for preferring one over the
other, or if it is just a matter of personal taste. If you have a Twitter
account, perhaps you could ask him to comment?

https://twitter.com/raymondh



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: Running python from pty without prompt

2016-12-14 Thread Random832
On Tue, Dec 13, 2016, at 19:10, Steve D'Aprano wrote:
> Can you show a simple demonstration of what you are doing?
> 
> I'm having difficulty following this thread because I don't know
> what "script run on a tty" means.

The question is literally about the input/script being the tty and not
redirected from any other file, which causes an interactive prompt in
CPython, but does not do so in some other languages. I don't understand
what part of this you're not getting.

> I thought that with the exception of scripts run from cron, any time you
> run a script *or* in interactive mode, there is an associated tty. Am I
> wrong?


Re: Running python from pty without prompt

2016-12-14 Thread Random832
On Tue, Dec 13, 2016, at 19:20, Steve D'Aprano wrote:
> sys.flags.interactive will tell you whether or not your script was
> launched
> with the -i flag.
> 
> hasattr(sys, 'ps1') or hasattr(sys, 'ps2') will tell you if you are
> running
> in the REPL (interactive interpreter). The ps1 and ps2 variables aren't
> defined in non-interactive mode.

There's no way to *tell python to* run in non-interactive mode without
using a file other than the tty as the script. It's not a matter of
finding out from within python whether it's in interactive mode, it's a
matter of python finding out whether the user *wants* it to run in
interactive mode.
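
For what it's worth, the checks quoted above can be collected into one
small helper (interpreter_state is a made-up name). As a sketch, it only
reports what the interpreter decided; it does not give the user a way to
ask for non-interactive mode, which is the actual problem.

```python
import sys


def interpreter_state():
    """Report how this interpreter was started, as far as Python can tell."""
    stdin = sys.stdin
    return {
        # True when standard input is a terminal rather than a file or pipe
        "stdin_is_tty": stdin is not None and stdin.isatty(),
        # True when python was launched with the -i flag
        "dash_i_flag": bool(sys.flags.interactive),
        # sys.ps1 is only defined while the REPL is running
        "in_repl": hasattr(sys, "ps1"),
    }


print(interpreter_state())
```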


Re: OT - "Soft" ESC key on the new MacBook Pro

2016-12-14 Thread John Gordon
Gregory Ewing writes:

> Once you're in the clutches of Apple, there is no Escape.

Ha!

-- 
John Gordon   A is for Amy, who fell down the stairs
[email protected]  B is for Basil, assaulted by bears
-- Edward Gorey, "The Gashlycrumb Tinies"



Re: Wrong release date in 3.6 whats new docs?

2016-12-14 Thread Ned Batchelder
On Wednesday, December 14, 2016 at 9:09:22 AM UTC-5, Nick Sarbicki wrote:
> Afternoon everyone.
> 
> Might be missing something obvious but the 3.6 What's New docs point to the
> release date being the 12th.
> 
> https://docs.python.org/3.6/whatsnew/3.6.html#what-s-new-in-python-3-6
> 
> I got the team excited about Friday's release so that caused some confusion
> here.
> 
> Guessing it's a typo?

3.6 hasn't been released yet. I think the 12th was the original target date.
3.6.0rc1 was the latest version on Dec 6th, but a problem was discovered
that means an rc2 will be needed.

--Ned.


Re: OT - "Soft" ESC key on the new MacBook Pro

2016-12-14 Thread Michael Torrie
On 12/13/2016 09:22 PM, Gregory Ewing wrote:
> Paul Rubin wrote:
>> First it was the hipster Mac users
>> with the Beatnik black berets and turtlenecks, and now this. 
> 
> Once you're in the clutches of Apple, there is no Escape.

That's so not true! I've escaped dozens of times! ;)



Re: OT - "Soft" ESC key on the new MacBook Pro

2016-12-14 Thread mm0fmf

On 14/12/2016 02:40, Paul Rubin wrote:
> Skip Montanaro writes:
>> Does the lack of a physical ESC key create problems for people, especially
>> Emacs users?
>
> Not a Mac user and I rarely use ESC instead of ALT while editing with
> Emacs on a local computer, but when editing remotely I do have to use
> ESC because the Gnome terminal emulator steals a few ALTed keys.  Maybe
> there is a way to stop that behaviour but it didn't occur to me til just
> now.  Hmm.
>
> Meanwhile the concept of a computer with "no escape" just shows Apple
> getting deeper into existentialism.  First it was the hipster Mac users
> with the Beatnik black berets and turtlenecks, and now this.



If you need a full time ESC key then you are just "typing it wrong" as 
Steve Jobs would say if he wasn't dead.




Re: OT - "Soft" ESC key on the new MacBook Pro

2016-12-14 Thread Jon Ribbens
On 2016-12-14, mm0fmf  wrote:
> On 14/12/2016 02:40, Paul Rubin wrote:
>> Skip Montanaro  writes:
>>> Does the lack of a physical ESC key create problems for people, especially
>>> Emacs users?
>>
>> Not a Mac user and I rarely use ESC instead of ALT while editing with
>> Emacs on a local computer, but when editing remotely I do have to use
>> ESC because the Gnome terminal emulator steals a few ALTed keys.  Maybe
>> there is a way to stop that behaviour but it didn't occur to me til just
>> now.  Hmm.
>>
>> Meanwhile the concept of a computer with "no escape" just shows Apple
>> getting deeper into existentialism.  First it was the hipster Mac users
>> with the Beatnik black berets and turtlenecks, and now this. 
>
> If you need a full time ESC key then you are just "typing it wrong" as 
> Steve Jobs would say if he wasn't dead.

With respect to pressing ESC in Emacs or Vim, you literally don't need
a "full time" ESC key, you only need one when the keyboard input focus
is on a terminal, Emacs or Vim window - and I should imagine that the
ESC key is indeed present whenever that's the case.


Re: OT - "Soft" ESC key on the new MacBook Pro

2016-12-14 Thread Peter Pearson
On Tue, 13 Dec 2016 19:06:45 -0600, Skip Montanaro wrote:
> I know this isn't a Python-specific question, but 
[snip]
> Yes, I know I can use C-[ or the Alt key instead of ESC.

I know this isn't the sort of answer you wanted, but . . . 

Train your fingers to use C-[.  I did, decades ago, because the darn
escape key kept changing places from one keyboard to the next.  Combined
with the ease with which one can remap the CTRL key to a familiar place,
C-[ has been a blessing.

-- 
To email me, substitute nowhere->runbox, invalid->com.


Re: OT - "Soft" ESC key on the new MacBook Pro

2016-12-14 Thread Skip Montanaro
> If you need a full time ESC key then you are just "typing it wrong" as Steve 
> Jobs
> would say if he wasn't dead.

Shouldn't the use of ESC, C-[, Alt, or some other mapped key be
treated as a valid personal preference?

I've been using some variety of Emacs (there have been many) since the
early 80s on many different operating systems, with keyboards (or
terminals - think VT100, VT52, or ADM3a) of all types. I think when I
first started using Gosmacs on a VMS machine, ESC was the prefix key,
and Alt wasn't used.

Over the years, I have remapped the backtick and Caps Lock keys, and used
C-[ or Alt, none of which I find suitable, either because I don't wind
up with the same setup across multiple machines (what's the Windows
equivalent of xmodmap?), the darn thing moves around (Sun and PC101
keyboards were always different), is hidden (Alt keys are always
hiding somewhere under my left or right hands), or I actually use those
remapped keys for useful stuff already. By and large, the ESC key has
remained in the same place on all the keyboards I've ever used. I
trust that a soft ESC key on the Touch Bar will probably be in just
about the right place. My fingers know where the ESC key is without
thinking. I use multiple platforms from time to time (typing right now
on a Dell keyboard connected to a Windows machine running Remote
Desktop to connect to a DMZ-hosted Windows machine). My current aging
MBP has a physical ESC key, as does the Apple keyboard attached to my
wife's iMac (which I sometimes use). I'm specifically interested in
how the lack of a physical ESC key affects people who are used to
the real deal.

If nobody has any experience because the Touch-Bar-equipped MBPs are
too new, that's fine. I thought some people would have purchased it by
now.

Skip


Re: OT - "Soft" ESC key on the new MacBook Pro

2016-12-14 Thread Skip Montanaro
On Wed, Dec 14, 2016 at 11:40 AM, Peter Pearson
 wrote:
> Train your fingers to use C-[.

As I recall, the location of the Ctrl key was one of the differences
between Sun and PC101 keyboards. Doesn't matter so much now, as Sun
has gone the way of the dodo, but it moved around more for me than ESC
over the years.

S


Recipe request: asyncio "spin off coroutine"

2016-12-14 Thread Chris Angelico
When you work with threads, it's pretty straight-forward to spin off
another thread: you start it. Your current thread continues, and the
other one runs in parallel.

What's the most straight-forward way to do this in asyncio? I know
this has been asked before, but threads keep devolving, and I can't
find back the one I want :| This seems like a recipe that needs to be
in the docs somewhere. Consider:

async def parallel():
    print("Doing stuff in parallel")
    await asyncio.sleep(1)
    print("Done stuff in parallel")

async def do_stuff():
    print("Doing stuff")
    asyncio.spin_off(parallel()) # ???
    print("Doing more stuff")
    await asyncio.sleep(2)
    print("Done doing stuff")

Whenever do_stuff gets called, I want it to run to completion in two
seconds, AND the parallel process should run to completion during that
time.

What code should go on the "???" line to accomplish this?

ChrisA


Re: Recipe request: asyncio "spin off coroutine"

2016-12-14 Thread Marko Rauhamaa
Chris Angelico :

> asyncio.spin_off(parallel()) # ???
>
> [...]
>
> What code should go on the "???" line to accomplish this?

asyncio.ensure_future(parallel())
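
Put together as a complete program, that might look like the sketch
below. Two things here are my own assumptions rather than part of the
question: asyncio.run() (which needs Python 3.7 or later; on older
versions use loop.run_until_complete()) and the shortened sleeps with an
events list in place of the prints, so the ordering is easy to inspect.

```python
import asyncio

events = []


async def parallel():
    events.append("parallel start")
    await asyncio.sleep(0.1)
    events.append("parallel done")


async def do_stuff():
    events.append("main start")
    task = asyncio.ensure_future(parallel())   # scheduled, not awaited here
    events.append("main continues")
    await asyncio.sleep(0.2)
    events.append("main done")
    await task   # already finished; surfaces any exception it raised


asyncio.run(do_stuff())
print(events)
```

Keeping a reference to the returned task and awaiting it at the end is
optional for the scheduling itself, but it avoids losing exceptions
raised inside parallel().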


Marko


Re: Parsing a potentially corrupted file

2016-12-14 Thread MRAB

On 2016-12-14 11:43, Paul Moore wrote:

> I'm looking for a reasonably "clean" way to parse a log file that potentially
> has incomplete records in it.
>
> The basic structure of the file is a set of multi-line records. Each record
> starts with a series of fields delimited by [...] (the first of which is always
> a date), optionally separated by whitespace. Then there's a trailing "free
> text" field, optionally followed by a multi-line field delimited by [[...]]
>
> So, example records might be
>
> [2016-11-30T20:04:08.000+00:00] [Component] [level] [] [] [id] Description of
> the issue goes here
>
> (a record delimited by the end of the line)
>
> or
>
> [2016-11-30T20:04:08.000+00:00] [Component] [level] [] [] [id] Description of
> the issue goes here [[Additional
> data, potentially multiple lines
>
> including blank lines
> goes here
> ]]
>
> The terminating ]] is on a line of its own.
>
> This is a messy format to parse, but it's manageable. However, there's a catch.
> Because the logging software involved is broken, I can occasionally get a log
> record prematurely terminated with a new record starting mid-stream. So
> something like the following:
>
> [2016-11-30T20:04:08.000+00:00] [Component] [le[2016-11-30T20:04:08.000+00:00]
> [Component] [level] [] [] [id] Description of the issue goes here
>
> I'm struggling to find a "clean" way to parse this. I've managed a clumsy
> approach, by splitting the file contents on the pattern
> [dddd-dd-ddTdd:dd:dd.ddd+dd:dd] (the timestamp - I've never seen a case where
> this gets truncated) and then treating each entry as a record and parsing it
> individually. But the resulting code isn't exactly maintainable, and I'm
> looking for something cleaner.
>
> Does anyone have any suggestions for a good way to parse this data?


I think I'd do something like this:

while have_more(input):
    # At the start of a record.
    timestamp = parse_timestamp(input)

    fields = []
    description = None
    additional = None

    try:
        for i in range(5):
            # A field shouldn't contain a '[', so if it sees one, it'll
            # push it back and return True for truncated.
            field, truncated = parse_field(input)
            fields.append(field)

            if truncated:
                raise TruncatedError()

        # The description shouldn't contain a timestamp, but if it does, it'll
        # push it back from that point and return True for truncated.
        description, truncated = parse_description(input)

        if truncated:
            raise TruncatedError()

        # The additional information shouldn't contain a timestamp, but if it
        # does, it'll push it back from that point and return True for
        # truncated.
        additional, truncated = parse_additional_information(input)

        if truncated:
            raise TruncatedError()
    except TruncatedError:
        process_record(timestamp, fields, description, additional,
                       truncated=True)
    else:
        process_record(timestamp, fields, description, additional)



Re: Recipe request: asyncio "spin off coroutine"

2016-12-14 Thread Chris Angelico
On Thu, Dec 15, 2016 at 6:27 AM, Marko Rauhamaa  wrote:
> Chris Angelico :
>
>> asyncio.spin_off(parallel()) # ???
>>
>> [...]
>>
>> What code should go on the "???" line to accomplish this?
>
> asyncio.ensure_future(parallel())

Hmm. I tried that but it didn't work in the full program (it hung the
calling coroutine until completion). But it does work in the toy
example I posted here. Weird. Must be something else that's wrong,
then. I'll keep poking around, thanks.

ChrisA


Re: Recipe request: asyncio "spin off coroutine"

2016-12-14 Thread Marko Rauhamaa
Chris Angelico :

> On Thu, Dec 15, 2016 at 6:27 AM, Marko Rauhamaa  wrote:
>> Chris Angelico :
>>
>>> asyncio.spin_off(parallel()) # ???
>>>
>>> [...]
>>>
>>> What code should go on the "???" line to accomplish this?
>>
>> asyncio.ensure_future(parallel())
>
> Hmm. I tried that but it didn't work in the full program (it hung the
> calling coroutine until completion). But it does work in the toy
> example I posted here. Weird. Must be something else that's wrong,
> then. I'll keep poking around, thanks.

What version of Python are you running?

   Changed in version 3.5.1: The function accepts any awaitable object.

   <https://docs.python.org/3/library/asyncio-task.html#asyncio.ensure_future>


Marko


Re: Recipe request: asyncio "spin off coroutine"

2016-12-14 Thread Chris Angelico
On Thu, Dec 15, 2016 at 7:25 AM, Marko Rauhamaa  wrote:
> What version of Python are you running?
>
>Changed in version 3.5.1: The function accepts any awaitable object.
>

3.7 :)

ChrisA


Re: Recipe request: asyncio "spin off coroutine"

2016-12-14 Thread Ian Kelly
On Wed, Dec 14, 2016 at 12:53 PM, Chris Angelico  wrote:
> On Thu, Dec 15, 2016 at 6:27 AM, Marko Rauhamaa  wrote:
>> Chris Angelico :
>>
>>> asyncio.spin_off(parallel()) # ???
>>>
>>> [...]
>>>
>>> What code should go on the "???" line to accomplish this?
>>
>> asyncio.ensure_future(parallel())
>
> Hmm. I tried that but it didn't work in the full program (it hung the
> calling coroutine until completion). But it does work in the toy
> example I posted here. Weird. Must be something else that's wrong,
> then. I'll keep poking around, thanks.

Did you just use "asyncio.ensure_future(parallel())" in the full
program or did you throw in an await by mistake?


Re: Recipe request: asyncio "spin off coroutine"

2016-12-14 Thread Chris Angelico
On Thu, Dec 15, 2016 at 7:53 AM, Ian Kelly  wrote:
> On Wed, Dec 14, 2016 at 12:53 PM, Chris Angelico  wrote:
>> On Thu, Dec 15, 2016 at 6:27 AM, Marko Rauhamaa  wrote:
>>> Chris Angelico :
>>>
>>>> asyncio.spin_off(parallel()) # ???
>>>>
>>>> [...]
>>>>
>>>> What code should go on the "???" line to accomplish this?
>>>
>>> asyncio.ensure_future(parallel())
>>
>> Hmm. I tried that but it didn't work in the full program (it hung the
>> calling coroutine until completion). But it does work in the toy
>> example I posted here. Weird. Must be something else that's wrong,
>> then. I'll keep poking around, thanks.
>
> Did you just use "asyncio.ensure_future(parallel())" in the full
> program or did you throw in an await by mistake?

I didn't await that, no. Have just pinned down the problem, and it's a
total facepalm moment: the secondary task had gotten stuck in an
infinite loop with no await points in it. So technically they _would_
both have been scheduled when I did it as per Marko's suggestion, but
I couldn't tell. Whoops. Sorry about that!

Still might be worth having a simple recipe in the docs though; it
would have saved me the trouble of digging through my "fork" code.
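
A minimal sketch of such a recipe, assuming Python 3.7 or later for asyncio.run(). The names parallel(), caller() and events are illustrative only. It shows that asyncio.ensure_future() schedules the coroutine and returns at once, and that the spun-off task only gives other tasks a turn at its await points:

```python
import asyncio

events = []  # records the order in which things happen


async def parallel():
    for i in range(3):
        events.append("parallel %d" % i)
        # An await point. A loop with no await in it would never hand
        # control back to the event loop, so nothing else could run.
        await asyncio.sleep(0)


async def caller():
    # Schedules parallel() on the running loop and returns immediately.
    task = asyncio.ensure_future(parallel())
    events.append("caller continues")
    # Awaited here only so that the demo exits cleanly; a real caller
    # would carry on with its own work instead.
    await task
    events.append("caller done")


asyncio.run(caller())
print(events)
```

The first entry recorded is "caller continues", which is the spin-off behaviour being asked for: the caller was not held up by the call itself.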

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Parsing a potentially corrupted file

2016-12-14 Thread Paul Rubin
Paul Moore writes:
> I'm looking for a reasonably "clean" way to parse a log file that
> potentially has incomplete records in it.

Basically trial and error.  Code something reasonable, run your program
til it crashes on a record that it doesn't know what to do with, add
code to deal with that, rinse and repeat.  I've done this kind of thing
multiple times.  You tend to get exponentially further along with each
run/crash/fix iteration til you get most of everything, though there
might still be an occasional hopeless record that you have to just log.
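
A rough sketch of where that loop tends to end up, assuming a made-up "timestamp|level|message" record format (the original question does not show the real one). Each run/crash/fix iteration adds another check to parse_record(); anything it still rejects is logged and set aside instead of crashing the run:

```python
import logging

log = logging.getLogger("parser")


def parse_record(line):
    """Parse one 'timestamp|level|message' record (hypothetical format)."""
    # Raises ValueError if there are fewer than three fields.
    timestamp, level, message = line.rstrip("\n").split("|", 2)
    if level not in {"INFO", "WARN", "ERROR"}:
        raise ValueError("unknown level %r" % level)
    return {"timestamp": timestamp, "level": level, "message": message}


def parse_lines(lines):
    """Return (parsed records, hopeless lines) without stopping on bad input."""
    good, bad = [], []
    for lineno, line in enumerate(lines, 1):
        try:
            good.append(parse_record(line))
        except ValueError as exc:
            log.warning("line %d unparseable (%s): %r", lineno, exc, line)
            bad.append((lineno, line))
    return good, bad


sample = [
    "2016-12-14T10:00|INFO|started\n",
    "2016-12-14T10:0",                      # truncated record
    "2016-12-14T10:01|ERROR|disk|full\n",   # message itself contains '|'
]
good, bad = parse_lines(sample)
print(len(good), "parsed;", len(bad), "set aside")
```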
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Wrong release date in 3.6 whats new docs?

2016-12-14 Thread breamoreboy
On Wednesday, December 14, 2016 at 2:09:22 PM UTC, Nick Sarbicki wrote:
> Afternoon everyone.
> 
> Might be missing something obvious but the 3.6 What's New docs point to the
> release date being the 12th.
> 
> https://docs.python.org/3.6/whatsnew/3.6.html#what-s-new-in-python-3-6
> 
> I got the team excited about Friday's release so that caused some confusion
> here.
> 
> Guessing it's a typo?

Are you confusing the date on which the what's new was updated with the release 
schedule here https://www.python.org/dev/peps/pep-0494/ ?

Kindest regards.

Mark Lawrence.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Running python from pty without prompt

2016-12-14 Thread Steve D'Aprano
On Thu, 15 Dec 2016 01:28 am, Random832 wrote:

> On Tue, Dec 13, 2016, at 19:10, Steve D'Aprano wrote:
>> Can you show a simple demonstration of what you are doing?
>> 
>> I'm having difficulty following this thread because I don't know
>> what "script run on a tty" means.
> 
> The question is literally about the input/script being the tty and not
> redirected from any other file, which causes an interactive prompt in
> CPython, but does not do so in some other languages. I don't understand
> what part of this you're not getting.

What can I say? Maybe I'm just slow. Or maybe you're falling into the curse
of knowledge:

https://en.wikipedia.org/wiki/Curse_of_knowledge

I'm not the only one having trouble understanding the nature of this
problem -- Michael Torrie has also said "though a tty comes into this
somehow and I'm not clear on that".

What is meant by "the input/script being the tty"? And how does that relate
to the subject line which refers to a pty?

That's why I've asked for a simple example that demonstrates the issue. But
apparently this simple example is so simple that nobody knows how to write
it. I cannot replicate the OP's problem from his description alone, and I
have not seen an example where the Python prompt is shown apart from when
running Python interactively.

So... the input is the tty. I don't know what that means, but I think I know
what it isn't. I'm fairly confident it isn't when you pipe the output of
one process to Python:

# not this
[steve@ando ~]$ echo "import sys; print sys.version" | python
2.7.2 (default, May 18 2012, 18:25:10)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-52)]


And likewise you said it is not when input is *redirected* from a file, so
it probably isn't this:

[steve@ando ~]$ cat myfile
import sys; print sys.version

[steve@ando ~]$ python < myfile
2.7.2 (default, May 18 2012, 18:25:10)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-52)]


and surely I can eliminate passing the file name as a command line argument
(python myfile) as well. So what is left?


Michael Torrie suggests something more or less like this, redirecting stdin
to Python with a here-doc:

[steve@ando ~]$ python << .
> import sys
> print sys.version
> .
2.7.2 (default, May 18 2012, 18:25:10) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-52)]

(Michael's version used EOF rather than a dot.) There's a prompt, but it's
not the Python prompt, it's from the shell. Since you are insisting that
the Python interactive prompt is involved, then surely Michael's example
isn't what you mean either.

So I now have *four* ways of running code in Python that *don't* match the
OP's problem (five if you include the standard REPL) and I'm not closer to
understanding the OP's problem.



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list


Python3, column names from array - numpy or pandas

2016-12-14 Thread renjith madhavan
I have a dataset in the below format.

id  A   B   C   D   E
100 1   0   0   0   0
101 0   1   1   0   0
102 1   0   0   0   0
103 0   0   0   1   1

I would like to convert this into below:
100, A
101, B C
102, A
103, D E

How do I do this ? I tried numpy argsort but I am new to Python and finding 
this challenging.
Appreciate any help in this.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Running python from pty without prompt

2016-12-14 Thread Samuel Williams
Here are some examples of different languages:

https://github.com/ioquatix/script-runner/blob/master/examples/python-eot.py
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python3, column names from array - numpy or pandas

2016-12-14 Thread Miki Tebeka
You can do this with pandas:

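import pandas as pd
from io import StringIO

# Sample data from the question, whitespace-separated
data = StringIO('''\
id  A   B   C   D   E
100 1   0   0   0   0
101 0   1   1   0   0
102 1   0   0   0   0
103 0   0   0   1   1
''')

df = pd.read_csv(data, sep=r'\s+', index_col='id')
# For each row, join the names of the columns whose value is 1
val = df.apply(lambda row: ' '.join(df.columns[row == 1]), axis=1)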

Google for "boolean indexing" to see what's going on :)
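
For comparison, the same selection can be sketched with the standard library alone. This is not the pandas approach, just a plain-Python illustration of what it computes: for each row, keep the column names whose flag is 1. The variable names are illustrative.

```python
text = """\
id  A   B   C   D   E
100 1   0   0   0   0
101 0   1   1   0   0
102 1   0   0   0   0
103 0   0   0   1   1
"""

lines = text.splitlines()
header = lines[0].split()[1:]   # column names without the leading 'id'

result = {}
for line in lines[1:]:
    row_id, *flags = line.split()
    # Keep the names of the columns whose flag is "1" in this row.
    result[row_id] = " ".join(
        name for name, flag in zip(header, flags) if flag == "1"
    )

for row_id, names in result.items():
    print("%s, %s" % (row_id, names))
```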
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Running python from pty without prompt

2016-12-14 Thread Michael Torrie
On 12/14/2016 09:29 PM, Samuel Williams wrote:
> Here are some examples of different languages:
> 
> https://github.com/ioquatix/script-runner/blob/master/examples/python-eot.py

Okay so it looks like you're just opening a pipe to a subprocess and
feeding it a script and input.  So there's no pty involved here.  Or am
I wrong?

In any case, I think if you made a python wrapper script that could take
the standard in up until the ctrl-d, and then exec() that, what's left
on standard in should be able to feed the exec'd script any input it needs.
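
A minimal sketch of that wrapper idea, under the assumption that the end of the script is marked by a literal EOT character (0x04, what ctrl-d sends) arriving in the stream, as it would through a pipe. run_script_until_eot() is a hypothetical helper, and an in-memory stream stands in for the real standard input:

```python
import io
import sys

EOT = "\x04"  # assumed delimiter between the script and its input


def run_script_until_eot(stream):
    """Read source up to EOT, exec it, and leave the rest of stream as its stdin."""
    chars = []
    while True:
        # One character at a time, so nothing past the EOT is consumed.
        ch = stream.read(1)
        if ch == "" or ch == EOT:
            break
        chars.append(ch)
    source = "".join(chars)

    namespace = {"__name__": "__main__"}
    old_stdin = sys.stdin
    sys.stdin = stream  # what is left on the stream feeds the script
    try:
        exec(compile(source, "<stdin-script>", "exec"), namespace)
    finally:
        sys.stdin = old_stdin
    return namespace


demo = io.StringIO(
    "import sys\n"
    "greeting = 'got: ' + sys.stdin.read()\n"
    + EOT +
    "hello\n"
)
ns = run_script_until_eot(demo)
print(repr(ns["greeting"]))
```

A real wrapper would pass sys.stdin instead of the in-memory stream. How ctrl-d is delivered on an actual pty depends on the terminal mode, which this sketch does not address.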


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Recipe request: asyncio "spin off coroutine"

2016-12-14 Thread Frank Millman
"Chris Angelico"  wrote in message 
news:CAPTjJmoFyJqYw4G_kNo5Sn=ULyjgm=kexqrqvwubr87evbz...@mail.gmail.com...


When you work with threads, it's pretty straight-forward to spin off
another thread: you start it. Your current thread continues, and the
other one runs in parallel.

What's the most straight-forward way to do this in asyncio?



You have got the answer - asyncio.ensure_future(...)

I thought I would share something that happened just yesterday, where this 
came to my rescue.


I mentioned in a recent post that I had set up a background task to process 
HTTP requests for a given session, using an asyncio.Queue and an endless 
loop.


I have a session.close() coroutine which has, among other steps -
   await self.request_queue.put(None)  # to stop the loop
   await self.request_queue.join()  # to ensure all requests completed

There are three places in my code that can await session.close() -

   - on closing the program, close all open sessions
   - on detecting that the session is no longer responding, kill the session
   - on receiving a message from the client indicating that it had closed

The first two worked fine, but the third one hung. I eventually found that, 
because I was awaiting session.close() from within the request handler, that 
request could not complete until close() returned, while the 'join' inside 
close() was waiting for that same request to complete.


I was stumped for a while, and then I had an 'aha' moment.

I changed 'await self.close()', to 'asyncio.ensure_future(self.close())'.

Problem solved.
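
A stripped-down sketch of the situation, assuming Python 3.7 or later for asyncio.run(). The Session class, worker() and handle() are invented for illustration and are not the real code. Awaiting close() inside the handler would wait on join() for the very request being handled; scheduling it with ensure_future() lets the handler return first:

```python
import asyncio


class Session:
    def __init__(self):
        self.request_queue = asyncio.Queue()
        self.closed = False

    async def worker(self):
        # Endless loop: handle requests until the None sentinel arrives.
        while True:
            request = await self.request_queue.get()
            try:
                if request is None:
                    break
                await self.handle(request)
            finally:
                self.request_queue.task_done()

    async def handle(self, request):
        if request == "client closed":
            # 'await self.close()' here would hang: join() waits for this
            # request, which cannot finish until handle() returns.
            asyncio.ensure_future(self.close())

    async def close(self):
        await self.request_queue.put(None)  # to stop the loop
        await self.request_queue.join()     # to ensure all requests completed
        self.closed = True


async def main():
    session = Session()
    worker = asyncio.ensure_future(session.worker())
    await session.request_queue.put("client closed")
    await asyncio.wait_for(worker, timeout=1)  # would time out on a hang
    await asyncio.sleep(0.05)  # give the spun-off close() a moment to finish
    return session.closed


closed = asyncio.run(main())
print("session closed:", closed)
```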

Frank Millman



--
https://mail.python.org/mailman/listinfo/python-list


Re: Recipe request: asyncio "spin off coroutine"

2016-12-14 Thread Marko Rauhamaa
"Frank Millman" :

> I changed 'await self.close()', to 'asyncio.ensure_future(self.close())'.
>
> Problem solved.

A nice insight.

However, shouldn't somebody somewhere in your code be keeping track of
the returned task?


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Wrong release date in 3.6 whats new docs?

2016-12-14 Thread Nick Sarbicki
I'm aware of the schedule in the PEP.

But if the date at the top of the What's New page is the last day it was
updated and not the release date then that is what has caused the confusion.

On Wed, 14 Dec 2016, 22:58 ,  wrote:

> On Wednesday, December 14, 2016 at 2:09:22 PM UTC, Nick Sarbicki wrote:
> > Afternoon everyone.
> >
> > Might be missing something obvious but the 3.6 What's New docs point to
> > the release date being the 12th.
> >
> > https://docs.python.org/3.6/whatsnew/3.6.html#what-s-new-in-python-3-6
> >
> > I got the team excited about Friday's release so that caused some
> > confusion here.
> >
> > Guessing it's a typo?
>
> Are you confusing the date on which the what's new was updated with the
> release schedule here https://www.python.org/dev/peps/pep-0494/ ?
>
> Kindest regards.
>
> Mark Lawrence.
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Recipe request: asyncio "spin off coroutine"

2016-12-14 Thread Frank Millman

"Marko Rauhamaa"  wrote in message news:[email protected]...


"Frank Millman" :

> I changed 'await self.close()', to 'asyncio.ensure_future(self.close())'.
>
> Problem solved.

A nice insight.

However, shouldn't somebody somewhere in your code be keeping track of
the returned task?



I don't know. What is the worst that could happen?

My way of looking at it is that it is similar to setTimeout() in javascript. 
I am requesting that the enclosed function/coroutine be scheduled for 
execution at the next available opportunity in the event loop. In 
javascript, I don't keep track of it, I just assume that it will be executed 
at some point. Is it not reasonable to do the same here?


Frank


--
https://mail.python.org/mailman/listinfo/python-list


Re: Recipe request: asyncio "spin off coroutine"

2016-12-14 Thread Marko Rauhamaa
"Frank Millman" :

> "Marko Rauhamaa"  wrote in message news:[email protected]...
>>
>> "Frank Millman" :
>>
>> > I changed 'await self.close()', to
>> > 'asyncio.ensure_future(self.close())'.
>> >
>> > Problem solved.
>>
>> A nice insight.
>>
>> However, shouldn't somebody somewhere in your code be keeping track of
>> the returned task?
>
> I don't know. What is the worst that could happen?

Only you can tell.

> I just assume that it will be executed at some point. Is it not
> reasonable to do the same here?

It ain't over till the fat lady sings. Things can accumulate, hang
and/or fail in surprising ways.

At the very least you should maintain statistics that reveal the number
of pending closing tasks for troubleshooting when things go south.
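
One way this could be sketched, assuming Python 3.7 or later for asyncio.run(). The names pending_closes, spin_off() and close_session() are hypothetical. Each spun-off task is kept in a set until it finishes, so the number still pending can be inspected, and a failure is reported instead of being silently dropped:

```python
import asyncio

pending_closes = set()  # hypothetical registry of in-flight close tasks


def _finished(task):
    # Runs when the task completes, fails, or is cancelled.
    pending_closes.discard(task)
    if not task.cancelled() and task.exception() is not None:
        print("background task failed:", repr(task.exception()))


def spin_off(coro):
    """Schedule coro and keep track of it until it is done."""
    task = asyncio.ensure_future(coro)
    pending_closes.add(task)
    task.add_done_callback(_finished)
    return task


async def close_session(delay):
    # Stand-in for a real session.close() coroutine.
    await asyncio.sleep(delay)


async def main():
    spin_off(close_session(0.01))
    spin_off(close_session(0.01))
    during = len(pending_closes)   # both still pending at this point
    await asyncio.sleep(0.05)
    after = len(pending_closes)    # both finished and removed
    return during, after


during, after = asyncio.run(main())
print("pending during:", during, "after:", after)
```

Holding the tasks in a set also keeps a reference to each one for as long as it is running.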


Marko
-- 
https://mail.python.org/mailman/listinfo/python-list