Re: SnakeCard products source code

2005-11-14 Thread Kris
Thanks for sharing it. I am interested in the GINA part.


Philippe C. Martin wrote:
> Dear all,
>
> The source code is available in the download section: www.snakecard.com
> 
> Regards,
> 
> Philippe

-- 
http://mail.python.org/mailman/listinfo/python-list


Program blocked in Queue.Queue.get and Queue.Queue.put

2012-01-04 Thread Kris
I have a program that is blocked and all threads are blocked on a
Queue.Queue.get or Queue.Queue.put method (on the same Queue.Queue
object).

1 thread shows the below as its last entry in the stack:
File: "c:\python27\lib\Queue.py", line 161, in get
  self.not_empty.acquire()

2 threads show the below as their last entry in the stack:
File: "c:\python27\lib\Queue.py", line 118, in put
  self.not_full.acquire()

As I understand it, this means both the Queue.Queue.not_full and
Queue.Queue.not_empty locks are held, but no other thread appears to
hold them. Of course, I don't access the locks myself directly.

I did send a KeyboardInterrupt to the main thread, however. Could it
be that it was doing a Queue.Queue.put at that moment and got
interrupted while it held the lock, but before it entered the try block
with the finally that releases the lock (i.e., between lines 118 and 119
in the Queue.py file)?

If this is the case, how do I avoid that? Or is it a bug in the
Queue.Queue class?
If this is not the case, any clue what else could have happened?
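One workaround, sketched below for Python 2.7 and not a guaranteed fix: since
CPython raises KeyboardInterrupt only in the main thread, keeping every
get()/put() call in worker threads means the interrupt can never land between
Queue's lock acquire and its try/finally:

import threading, Queue

q = Queue.Queue(maxsize=10)

def worker():
    while True:
        item = q.get()        # blocking queue calls live only in worker threads
        # ... process item ...
        q.task_done()

t = threading.Thread(target=worker)
t.daemon = True
t.start()

try:
    while t.is_alive():
        t.join(1.0)           # interruptible wait; no queue locks held here
except KeyboardInterrupt:
    pass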

Thanks
-- 
http://mail.python.org/mailman/listinfo/python-list


Static Methods in Python

2005-05-01 Thread Kris
Hi,
  I am a newbie to Python. Coming from a Java background, I was attempting
to write static methods in a class without self as the first
parameter, when I got an error. I searched for this on Google
and found that there was no consistent approach. I would
like to know the prescribed approach. Any
thoughts or pointers would be very much appreciated.
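For reference, a minimal sketch of the decorator approach available since
Python 2.4 (the class and names here are made up):

class Greeter(object):
    @staticmethod
    def greet(name):
        # no self: called as Greeter.greet(...) or instance.greet(...)
        return "Hello, %s" % name

    @classmethod
    def make_default(cls):
        # classmethod receives the class itself instead of an instance
        return cls()

print Greeter.greet("world")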

Thanks,
Kris

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Issues with if and elif statements in 3.3

2013-08-08 Thread Kris Mesenbrink
WOW, as if it was something as easy as that! I had been looking for a while at 
what I was doing wrong. It seems I just don't know my way around if 
statements at all. Thanks a bunch for this; it makes everything else I have been 
coding work.

thanks again
-- 
http://mail.python.org/mailman/listinfo/python-list


back with more issues

2013-08-11 Thread Kris Mesenbrink
import random

def player():
    hp = 10
    speed = 5
    attack = random.randint(0,5)

def monster():
    hp = 10
    speed = 4

def battle(player):
    print ("a wild mosnter appered!")
    print ("would you like to battle?")
    answer = input()
    if answer == ("yes"):
        return player(attack)
    else:
        print("nope")


battle()


++

this was a variation on code that you guys already helped me with. In the 
long run I plan to incorporate them together, but as it stands I don't know how 
to call a specific variable from one function (attack, from player) for use in 
another function (battle). What I want is to be able to use the variables from 
both player and monster in battle. Any ideas?
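One classless option, sketched here with a made-up helper name, is to return
the values from the function and unpack them where they are needed:

import random

def make_player():
    # return the stats; local names vanish when the function ends
    hp = 10
    attack = random.randint(0, 5)
    return hp, attack

def battle():
    hp, attack = make_player()   # unpack the returned values here
    print("you do", attack, "damage, with", hp, "hp left")

battle()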
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: back with more issues

2013-08-11 Thread Kris Mesenbrink
the idea was to store variables for later use, but you are correct: I don't 
understand functions, or whether that is even the best way to do it. I guess I'd 
want to be able to call the HP and ATTACK variables of player when the battle 
gets called. I would then use the variables in battle to figure out who would 
win. Is there a better way to store these variables than in the functions? I also 
read somewhere about classes, but that makes even less sense to me.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: back with more issues

2013-08-11 Thread Kris Mesenbrink
darn, I was hoping I could put off learning classes for a bit, but it seems that 
is not the case. I have tested it a bit and it seems to be working correctly 
now.


import random

class player():
    hp = 10
    speed = 5
    attack = random.randint(0,5)

print (player.attack)

+++

I know it's not nearly as complicated as your examples, but it seems to work. 
The self part of it has always eluded me and continues to do so. Just so you 
know, I'm learning through codecademy.com; it's based on Python 2.7 and I'm 
trying to code in 3.3. Thanks for your help again; classes are starting 
(I think) to make some sort of sense. I'll have to reread both replies over and 
over again, but it looks like a lot of useful info is there. But is the example 
I posted sort of right? I know I left the self part out, but I think I'm on the 
right track.
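For comparison, a minimal sketch of the same idea with self and __init__, so
each created player gets its own independent values:

import random

class Player:
    def __init__(self):
        # __init__ runs once per Player() call; self is the new instance
        self.hp = 10
        self.attack = random.randint(0, 5)

p = Player()
q = Player()
print(p.attack, q.attack)   # two independent rolls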
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: back with more issues

2013-08-12 Thread Kris Mesenbrink
import random

class player():
    hp = 10
    attack = random.randint(0,5)

class monster():
    hp = 10
    attack = random.randint(0,4)


def battle():
    print ("a wild mosnter appered!")
    print ("would you like to battle?")
    answer = input()
    if answer == ("yes"):
        while monster.hp >= 0:
            print ("you do", player.attack, "damage")
            monster.hp -= player.attack
            print (monster.hp)
    elif answer == ("no"):
        print ("you run away")
    else:
        print("you stand there")



battle()




Hello! Just wanted to show you guys how it's coming together. I'm starting to 
understand it a bit more (hopefully it's right). At the moment it seems to only 
roll the attack once and use that value, but that's another issue altogether 
that I'll bother you with (yet, anyway).
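On the single roll: random.randint runs once, when the class body is executed,
which is why the value never changes. A minimal sketch of re-rolling each turn
by moving the roll into a method:

import random

class Player:
    hp = 10

    def roll_attack(self):
        # a method body runs on every call, so this re-rolls each turn
        return random.randint(0, 5)

p = Player()
print(p.roll_attack())
print(p.roll_attack())   # usually a different number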

thanks again guys you are awesome
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: back with more issues

2013-08-12 Thread Kris Mesenbrink
The classes and __init__ still don't make much sense, actually. I have tried and 
tried again to make it generate numbers between 0 and 5 in a while statement, 
but it just doesn't seem to be working. 

import random


class Player():
    hp = 10
    def __init__(self, patt):
        self.att = random.randint(0,5)



while Player.hp == 10:
    print (Player.__init__)

At the moment it seems to be printing "<function __init__ at 0x...>" 
over and over again. I don't mind the repetition, but from my understanding 
there should be numbers there - numbers that change. It's crazy frustrating that 
I just don't understand how this works.
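For what it's worth, a sketch of what was probably intended: call the class to
run __init__ on a fresh instance, then read the instance attribute (the unused
patt argument is dropped here):

import random

class Player:
    hp = 10

    def __init__(self):
        self.att = random.randint(0, 5)

for _ in range(3):
    p = Player()     # calling the class runs __init__ on a fresh instance
    print(p.att)     # a (usually) different number each time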
-- 
http://mail.python.org/mailman/listinfo/python-list


Multi-threaded SSL

2006-02-17 Thread Kris Kowal
Dear Ophidians,

I'm attempting to create an SSL-secured AJAX chat server.  I'm working
on the hypothesis that I'll need to leave an XMLHttpRequest response
blocking on the server until a new message is ready to be dispatched.
This means that my server must be able to handle many open SSL sockets
in separate threads.

I started with Twisted but, as far as I can see, SSL is either not
implemented or not documented for that library.  There are hints that
it's in the works, but that's all.  So I've moved on.

I'm using PyOpenSSL on a Debian box, and I started with the ActiveState
Cookbook article,
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/442473  The SSL
server works very well as suggested in this article.

Starting with this code and adding threads, I've been trying to make
simultaneous HTTP requests operate in parallel on the server.  To test,
I've added busy waiting and then sleep-based waiting to the GET
processing segment of the request handler.  The threads work fine; every
time the server accepts a connection, it clearly starts accepting
connections in a new thread.  However, the problem runs deeper than I
can see.  The SSL listening socket blocks on accept in all threads until
the one open SSL connection finishes its waiting, responds, and closes.
This means that I can only have one client waiting for a response at a time.

Is there a limitation of SSL, or this SSL implementation, or something
else preventing me from having multiple connections waiting for
responses simultaneously?
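For comparison, a minimal sketch of the usual layout, one accepting thread and
one handler thread per connection, written against today's standard-library
ssl module rather than PyOpenSSL (the certificate path and port are
placeholders):

import socket
import ssl
import threading

def handle(conn):
    # one thread per connection: a response blocked here does not
    # stop the listener thread from accepting other clients
    try:
        data = conn.recv(4096)
        conn.sendall(data)
    finally:
        conn.close()

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain("server.pem")            # placeholder certificate

listener = socket.socket()
listener.bind(("0.0.0.0", 8443))             # placeholder port
listener.listen(5)

while True:
    raw, addr = listener.accept()            # only this thread accepts
    conn = ctx.wrap_socket(raw, server_side=True)
    threading.Thread(target=handle, args=(conn,)).start()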

Many thanks,
Kris Kowal

-- 
http://mail.python.org/mailman/listinfo/python-list


Python IRC Zork

2008-02-27 Thread Kris Davidson
Hi,

If this has been done before in another language, could someone please
tell me? If not, I was wondering if it's possible, and what the easiest
way is to create an IRC bot that allows you to play Zork.

I was thinking of just creating a simple Python IRC bot, or finding an
existing one, then having it run Zork and read/write from stdout/stdin.

Is that possible? Is there a better or easier way to do it? Are there
any existing programs that do something similar?

Or just really anything else people have to say on the subject.
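A minimal sketch of the subprocess idea, assuming a hypothetical zork binary
that plays on stdin/stdout (Python 2 style, to match the era):

import subprocess

game = subprocess.Popen(["zork"], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)

def send_command(cmd):
    game.stdin.write(cmd + "\n")
    game.stdin.flush()
    return game.stdout.readline()   # a real bot would read to the prompt

print send_command("open mailbox")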

Thanks

Kris
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python IRC Zork

2008-02-27 Thread Kris Davidson
I should have said, I'm guessing subprocess is the way to go but I'm
probably wrong.

On 28/02/2008, Kris Davidson <[EMAIL PROTECTED]> wrote:
> Hi,
>
>  If this has been done before in another language, could someone please
>  tell me? If not, I was wondering if it's possible, and what the easiest
>  way is to create an IRC bot that allows you to play Zork.
>
>  I was thinking of just creating a simple Python IRC bot, or finding an
>  existing one, then having it run Zork and read/write from stdout/stdin.
>
>  Is that possible? Is there a better or easier way to do it? Are there
>  any existing programs that do something similar?
>
>  Or just really anything else people have to say on the subject.
>
>  Thanks
>
>
>  Kris
>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python IRC Zork

2008-02-28 Thread Kris Davidson
>  The bigger picture would be writing a full Z machine in Python, which is
>  something I embarked on for my own amusement a while back but never got
>  far enough to do anything useful at all, given the size of the task.

Might be worth trying that, or setting up a project somewhere; do any
exist? Have you posted the code you had somewhere?
-- 
http://mail.python.org/mailman/listinfo/python-list


mmap class has slow "in" operator

2008-05-29 Thread Kris Kennaway

If I do the following:

import mmap

def mmap_search(f, string):
    fh = file(f)
    mm = mmap.mmap(fh.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)

    return mm.find(string)

def mmap_is_in(f, string):
    fh = file(f)
    mm = mmap.mmap(fh.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)

    return string in mm

then a sample mmap_search() call on a 50MB file takes 0.18 seconds, but 
the mmap_is_in() call takes 6.6 seconds.  Is the mmap class missing an 
operator and falling back to a slow default implementation?  Presumably 
I can implement the latter in terms of the former.


Kris
--
http://mail.python.org/mailman/listinfo/python-list


UNIX credential passing

2008-05-29 Thread Kris Kennaway
I want to make use of UNIX credential passing on a local domain socket 
to verify the identity of a user connecting to a privileged service. 
However it looks like the socket module doesn't implement 
sendmsg/recvmsg wrappers, and I can't find another module that does this 
either.  Is there something I have missed?


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: UNIX credential passing

2008-05-30 Thread Kris Kennaway

Sebastian 'lunar' Wiesner wrote:

[ Kris Kennaway <[EMAIL PROTECTED]> ]


I want to make use of UNIX credential passing on a local domain socket
to verify the identity of a user connecting to a privileged service.
However it looks like the socket module doesn't implement
sendmsg/recvmsg wrappers, and I can't find another module that does this
either.  Is there something I have missed?


http://pyside.blogspot.com/2007/07/unix-socket-credentials-with-python.html

Illustrates, how to use socket credentials without sendmsg/recvmsg and so
without any need for patching.




Thanks to both you and Paul for your suggestions.  For the record, the 
URL above is Linux-specific, but it put me on the right track.  Here is 
an equivalent FreeBSD implementation:


import struct

def getpeereid(sock):
    """ Get peer credentials on a UNIX domain socket.

    Returns a nested tuple: (uid, (gids)) """

    LOCAL_PEERCRED = 0x001
    NGROUPS = 16

    #struct xucred {
    #    u_int   cr_version;          /* structure layout version */
    #    uid_t   cr_uid;              /* effective user id */
    #    short   cr_ngroups;          /* number of groups */
    #    gid_t   cr_groups[NGROUPS];  /* groups */
    #    void    *_cr_unused1;        /* compatibility with old ucred */
    #};

    xucred_fmt = '2ih16iP'
    res = tuple(struct.unpack(xucred_fmt,
        sock.getsockopt(0, LOCAL_PEERCRED, struct.calcsize(xucred_fmt))))

    # Check this is the above version of the structure
    if res[0] != 0:
        raise OSError

    return (res[1], res[3:3 + res[2]])


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: "Faster" I/O in a script

2008-06-04 Thread Kris Kennaway

Gary Herron wrote:

[EMAIL PROTECTED] wrote:

On Jun 2, 2:08 am, "kalakouentin" <[EMAIL PROTECTED]> wrote:

Do you know a way to actually load my data in a more "batch-like" way
so I will avoid the constant line-by-line reading?



If your files will fit in memory, you can just do

text = file.readlines()

and Python will read the entire file into a list of strings named
'text,' where each item in the list corresponds to one 'line' of the
file.
  


No, that won't help.  That has to do *all* the same work (reading blocks 
and finding line endings) as the iterator PLUS allocate and build a list.

Better to just use the iterator.

for line in file:
    ...


Actually this *can* be much slower.  Suppose I want to search a file to 
see if a substring is present.


st = "some substring that is not actually in the file"
f = <50 MB log file>

Method 1:

for i in file(f):
    if st in i:
        break

--> 0.472416 seconds

Method 2:

Read whole file:

fh = file(f)
rl = fh.read()
fh.close()

--> 0.098834 seconds

"st in rl" test --> 0.037251 (total: .136 seconds)

Method 3:

mmap the file:

mm = mmap.mmap(fh.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
"st in mm" test --> 3.589938 (<-- see my post the other day)

mm.find(st) --> 0.186895

Summary:

If you can afford the memory, it can be more efficient (more than 3 
times faster in this example) to read the file into memory and process 
it at once (if possible).


Mmapping the file and processing it at once is roughly as fast (I didn't 
measure the difference carefully), but has the advantage that if there 
are parts of the file you do not touch you don't fault them into memory. 
You could also play more games and mmap chunks at a time to limit the 
memory use (but you'd have to be careful with mmapping that doesn't 
match record boundaries).
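A sketch of that chunked variant, keeping len(st)-1 bytes of overlap so a
match straddling two chunks is not missed (the chunk size is arbitrary):

def chunked_find(fh, st, chunksize=16 * 1024 * 1024):
    # scan fh for st one chunk at a time, carrying a small tail of the
    # previous chunk so boundary-spanning matches are still seen
    overlap = len(st) - 1
    offset = 0
    prev_tail = ""
    while True:
        block = fh.read(chunksize)
        if not block:
            return -1
        buf = prev_tail + block
        pos = buf.find(st)
        if pos != -1:
            return offset - len(prev_tail) + pos
        prev_tail = buf[-overlap:] if overlap else ""
        offset += len(block)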


Kris
--
http://mail.python.org/mailman/listinfo/python-list


PEP on breaking outer loops with StopIteration

2008-06-09 Thread Kris Kowal
I had a thought that might be pepworthy.  Might we be able to break
outer loops using an iter-instance specific StopIteration type?

This is the desired, if not desirable, syntax::

import string
letters = iter(string.lowercase)
for letter in letters:
    for number in range(10):
        print letter, number
        if letter == 'a' and number == 5:
            raise StopIteration()
        if letter == 'b' and number == 5:
            raise letters.StopIteration()

The first StopIteration would halt the inner loop.  The second
StopIteration would halt the outer loop.  The inner for-loop would
note that the letters.StopIteration instance is specifically targeted
at another iteration and raise it back up.

For this output::

a 0
a 1
a 2
a 3
a 4
a 5
b 0
b 1
b 2
b 3
b 4
b 5

This could be incrementally refined with the addition of an "as"
clause to "for" that would be bound after an iterable is implicitly
iter()ed::

import string
for letter in string.lowercase as letters:
    …
    raise letters.StopIteration()

I took the liberty to create a demo using a "for_in" decorator instead
of a "for" loop::

former_iter = iter

class iter(object):
    def __init__(self, values):
        if hasattr(values, 'next'):
            self.iter = values
        else:
            self.iter = former_iter(values)
        class Stop(StopIteration):
            pass
        if hasattr(values, 'StopIteration'):
            self.StopIteration = values.StopIteration
        else:
            self.StopIteration = Stop

    def next(self):
        try:
            return self.iter.next()
        except StopIteration, exception:
            raise self.StopIteration()

def for_in(values):
    def decorate(function):
        iteration = iter(values)
        while True:
            try:
                function(iteration.next())
            except iteration.StopIteration:
                break
            except StopIteration, exception:
                if type(exception) is StopIteration:
                    break
                else:
                    raise
    return decorate

import string
letters = iter(string.lowercase)

@for_in(letters)
def _as(letter):
    @for_in(range(10))
    def _as(number):
        print letter, number
        if letter == 'a' and number == 5:
            raise StopIteration()
        if letter == 'b' and number == 5:
            raise letters.StopIteration()

I imagine that this would constitute a lot of overhead in
StopIteration type instances, but perhaps a C implementation would use
flyweight StopIteration types for immutable direct subtypes of the
builtin StopIteration.

Kris Kowal
--
http://mail.python.org/mailman/listinfo/python-list


Re: PEP on breaking outer loops with StopIteration

2008-06-09 Thread Kris Kowal
On Mon, Jun 9, 2008 at 7:39 PM, Paul Hankin <[EMAIL PROTECTED]> wrote:

> Have you checked out http://www.python.org/dev/peps/pep-3136/
>
> It contains exactly this idea, but using 'break letters' rather than
> 'raise letters.StopIteration()'. I think I like the PEP's syntax
> better than yours, but anyway, it was rejected.

I concur that "break letters" is better than "raise
letters.StopIteration()".  Perhaps the novelty of the implementation
idea (adding another exception case to the "while: try" that
must already be there, plus the specialized exception type) can wake this
dead issue.  Maybe "break letters" could, under the hood, raise the
specialized StopIteration.

But then again, Guido has already said "No" on other, albeit
subjective, grounds.
I'll drop it, or champion it if there's interest.

Kris Kowal
--
http://mail.python.org/mailman/listinfo/python-list


ZFS bindings

2008-06-18 Thread Kris Kennaway
Is anyone aware of python bindings for ZFS?  I just want to replicate 
(or at least wrap) the command line functionality for interacting with 
snapshots etc.  Searches have turned up nothing.
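Failing real bindings, a minimal sketch of wrapping the CLI with subprocess
(dataset and snapshot names are placeholders):

import subprocess

def zfs(*args):
    # shell out to the zfs(8) CLI and return its stdout
    proc = subprocess.Popen(("zfs",) + args,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(err.strip())
    return out

def list_snapshots():
    return zfs("list", "-H", "-t", "snapshot", "-o", "name").splitlines()

def take_snapshot(dataset, name):
    zfs("snapshot", "%s@%s" % (dataset, name))   # e.g. tank/home@backup1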


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: Looking for lots of words in lots of files

2008-06-18 Thread Kris Kennaway

Calvin Spealman wrote:

Upload, wait, and google them.

Seriously though, aside from using a real indexer, I would build a set of 
the words I'm looking for, and then loop over each file, looping over 
the words and doing quick checks for containment in the set. If so, add 
to a dict of file names to lists of words found, until the list hits 
length 10. I don't think that would be a complicated solution, and it 
shouldn't be terrible at performance.
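A minimal sketch of that set-based scan (the word list and file names are
placeholders):

words = set(["alpha", "beta", "gamma"])          # placeholder word list
filenames = ["a.txt", "b.txt"]                   # placeholder files

found = {}
for fname in filenames:
    text = open(fname).read()
    hits = []
    for w in words:
        if w in text:
            hits.append(w)
            if len(hits) == 10:                  # stop at 10 per file
                break
    found[fname] = hits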


If you need to run this more than once, use an indexer.

If you only need to use it once, use an indexer, so you learn how for 
next time.


If you can't use an indexer, and performance matters, evaluate using 
grep and a shell script.  Seriously.


grep is a couple of orders of magnitude faster at pattern matching 
strings in files (and especially regexps) than Python is.  Even if you 
are invoking grep multiple times it is still likely to be faster than a 
"maximally efficient" single pass over the file in Python.  This 
realization was disappointing to me :)


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Bit substring search

2008-06-24 Thread Kris Kennaway
I am trying to parse a bit-stream file format (bzip2) that does not have 
byte-aligned record boundaries, so I need to do efficient matching of 
bit substrings at arbitrary bit offsets.


Is there a package that can do this?  This one comes close:

http://ilan.schnell-web.net/prog/bitarray/

but it only supports single bit substring match.
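For scale, a naive pure-Python sketch of matching a multi-bit pattern at
arbitrary bit offsets; it is far too slow for large files, which is rather the
point of the question:

def find_bit_pattern(data, pattern, nbits):
    # data: a byte string; pattern: an int holding nbits bits.
    # Returns the first bit offset of pattern in data, or -1.
    haystack = 0
    for ch in data:
        haystack = (haystack << 8) | ord(ch)
    total = len(data) * 8
    mask = (1 << nbits) - 1
    for offset in range(total - nbits + 1):
        if (haystack >> (total - nbits - offset)) & mask == pattern:
            return offset
    return -1

print find_bit_pattern('\x0f\x00', 0x3c, 6)   # 111100 starts at bit offset 4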

Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: Bit substring search

2008-06-24 Thread Kris Kennaway

[EMAIL PROTECTED] wrote:

Kris Kennaway:

I am trying to parse a bit-stream file format (bzip2) that does not have
byte-aligned record boundaries, so I need to do efficient matching of
bit substrings at arbitrary bit offsets.
Is there a package that can do this?


You may take a look at Hachoir or some other modules:
http://hachoir.org/wiki/hachoir-core
http://pypi.python.org/pypi/construct/2.00


Thanks.  hachoir also comes close, but it also doesn't seem to be able to 
match substrings at a bit level (e.g. the included bzip2 parser just 
reads the header and hands the entire file off to libbzip2 to extract 
data from).


construct exports a bit stream, but it's again pure Python, and matching 
substrings will be slow.  It will need C support to do that efficiently.



http://pypi.python.org/pypi/FmtRW/20040603
Etc. More:
http://pypi.python.org/pypi?%3Aaction=search&term=binary


Unfortunately I didn't find anything else useful here yet :(

Kris

--
http://mail.python.org/mailman/listinfo/python-list


Re: Bit substring search

2008-06-24 Thread Kris Kennaway

[EMAIL PROTECTED] wrote:

Kris Kennaway:

Unfortunately I didnt find anything else useful here yet :(


I see, I'm sorry; I have found hachoir quite nice in the past. Maybe
there's no really efficient way to do it with Python, but you can
create a compiled extension, so you can see if it's fast enough for
your purposes.
To create such an extension you can:
- One thing that requires very little time is to create an extension
with ShedSkin; once installed it just needs Python code.
- Cython (ex-Pyrex) too may be okay, but it's a bit trickier on Windows
machines.
- Using Pyd to create a D extension for Python is often the fastest way
I have found to create extensions. I need just a few minutes to create
them this way. But you need to know a bit of D.
- Then, if you want, you can write a C extension, but if you have not
done it before you may need some hours to make it work.


Thanks for the pointers, I think a C extension will end up being the way 
to go, unless someone has beaten me to it and I just haven't found it yet.


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: Bit substring search

2008-06-25 Thread Kris Kennaway

Scott David Daniels wrote:

Kris Kennaway wrote:
Thanks for the pointers, I think a C extension will end up being the 
way to go, unless someone has beaten me to it and I just haven't found 
it yet.


Depending on the pattern length you are targeting, it may be fastest to
increase the out-of-loop work.  For a 40-bit string, build an 8-target
Aho-Corasick machine, and at each match check the endpoints.  This will
only work well if 40 bits is at the low end of what you are hunting for.


Thanks, I wasn't aware of Aho-Corasick.

Kris

--
http://mail.python.org/mailman/listinfo/python-list


Re: re.search much slower then grep on some regular expressions

2008-07-07 Thread Kris Kennaway

Paddy wrote:

On Jul 4, 1:36 pm, Peter Otten <[EMAIL PROTECTED]> wrote:

Henning_Thornblad wrote:

What can be the cause of the large difference between re.search and
grep?

grep uses a smarter algorithm ;)




This script takes about 5 min to run on my computer:

#!/usr/bin/env python
import re
row=""
for a in range(156000):
    row+="a"
print re.search('[^ "=]*/',row)

While doing a simple grep:

grep '[^ "=]*/' input    (input contains 156,000 "a" characters in one row)

doesn't even take a second.

Is this a bug in Python?

You could call this a performance bug, but it's not common enough in real
code to get the necessary brain cycles from the core developers.
So you can either write a patch yourself or use a workaround.

re.search('[^ "=]*/', row) if "/" in row else None

might be good enough.

Peter


It is not a smarter algorithm that is used in grep. Python REs have
more capabilities than grep REs, which need a slower, more complex
algorithm.
You could argue that if the costly RE features are not used then maybe
simpler, faster algorithms should be automatically swapped in, but ...


I can and do :-)

It's a major problem that regular expression parsing in Python has 
exponential complexity when polynomial algorithms (for a subset of 
regexp expressions, e.g. excluding back-references) are well-known.

It rules out using Python for entire classes of applications where 
regexp parsing is on the critical path.


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: re.search much slower then grep on some regular expressions

2008-07-08 Thread Kris Kennaway

samwyse wrote:

On Jul 4, 6:43 am, Henning_Thornblad <[EMAIL PROTECTED]>
wrote:

What can be the cause of the large difference between re.search and
grep?



While doing a simple grep:
grep '[^ "=]*/' input  (input contains 156.000 a in
one row)
doesn't even take a second.

Is this a bug in python?


You might want to look at Plex.
http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/

"Another advantage of Plex is that it compiles all of the regular
expressions into a single DFA. Once that's done, the input can be
processed in a time proportional to the number of characters to be
scanned, and independent of the number or complexity of the regular
expressions. Python's existing regular expression matchers do not have
this property. "


Very interesting!  Thanks very much for the pointer.

Kris

--
http://mail.python.org/mailman/listinfo/python-list


Re: re.search much slower then grep on some regular expressions

2008-07-08 Thread Kris Kennaway

samwyse wrote:

On Jul 4, 6:43 am, Henning_Thornblad <[EMAIL PROTECTED]>
wrote:

What can be the cause of the large difference between re.search and
grep?



While doing a simple grep:
grep '[^ "=]*/' input  (input contains 156.000 a in
one row)
doesn't even take a second.

Is this a bug in python?


You might want to look at Plex.
http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/

"Another advantage of Plex is that it compiles all of the regular
expressions into a single DFA. Once that's done, the input can be
processed in a time proportional to the number of characters to be
scanned, and independent of the number or complexity of the regular
expressions. Python's existing regular expression matchers do not have
this property. "

I haven't tested this, but I think it would do what you want:

from Plex import *
lexicon = Lexicon([
    (Rep(AnyBut(' "=')) + Str('/'),  TEXT),
    (AnyBut('\n'), IGNORE),
])
filename = "my_file.txt"
f = open(filename, "r")
scanner = Scanner(lexicon, f, filename)
while 1:
    token = scanner.read()
    print token
    if token[0] is None:
        break


Hmm, unfortunately it's still orders of magnitude slower than grep in my 
own application that involves matching lots of strings and regexps 
against large files (I killed it after 400 seconds, compared to 1.5 for 
grep), and that's leaving aside the much longer compilation time (over a 
minute).  If the matching was fast then I could possibly pickle the 
lexer though (but it's not).


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: re.search much slower then grep on some regular expressions

2008-07-09 Thread Kris Kennaway

John Machin wrote:


Hmm, unfortunately it's still orders of magnitude slower than grep in my
own application that involves matching lots of strings and regexps
against large files (I killed it after 400 seconds, compared to 1.5 for
grep), and that's leaving aside the much longer compilation time (over a
minute).  If the matching was fast then I could possibly pickle the
lexer though (but it's not).



Can you give us some examples of the kinds of patterns that you are
using in practice and are slow using Python re?


Trivial stuff like:

  (Str('error in pkg_delete'), ('mtree', 'mtree')),
  (Str('filesystem was touched prior to .make install'), ('mtree', 'mtree')),
  (Str('list of extra files and directories'), ('mtree', 'mtree')),
  (Str('list of files present before this port was installed'), ('mtree', 'mtree')),
  (Str('list of filesystem changes from before and after'), ('mtree', 'mtree')),

  (re('Configuration .* not supported'), ('arch', 'arch')),

  (re('(configure: error:|Script.*configure.*failed unexpectedly|script.*failed: here are the contents of)'),
   ('configure_error', 'configure')),
...

There are about 150 of them and I want to find which is the first match 
in a text file that ranges from a few KB up to 512MB in size.


> How large is "large"?  What kind of text?

It's compiler/build output.


Instead of grep, you might like to try nrgrep ... google("nrgrep
Navarro Raffinot"): PDF paper about it on Citeseer (if it's up),
postscript paper and C source findable from Gonzalo Navarro's home-
page.


Thanks, looks interesting but I don't think it is the best fit here.  I 
would like to avoid spawning hundreds of processes to process each file 
(since I have tens of thousands of them to process).


Kris

--
http://mail.python.org/mailman/listinfo/python-list


Re: re.search much slower then grep on some regular expressions

2008-07-09 Thread Kris Kennaway

Jeroen Ruigrok van der Werven wrote:

-On [20080709 14:08], Kris Kennaway ([EMAIL PROTECTED]) wrote:

It's compiler/build output.


Sounds like the FreeBSD ports build cluster. :)


Yes indeed!


Kris, have you tried a PGO build of Python with your specific usage? I
cannot guarantee it will significantly speed things up though.


I am pretty sure the problem is algorithmic, not bad byte code :)  If it 
were a matter of a few % then that would be in the scope of compiler tweaks, 
but we're talking orders of magnitude.


Kris


Also, a while ago I did tests with various GCC compilers and their effect on
Python running time as well as Intel's cc. Intel won on (nearly) all
accounts, meaning it was faster overall.

From the top of my mind: GCC 4.1.x was faster than GCC 4.2.x.



--
http://mail.python.org/mailman/listinfo/python-list


Re: re.search much slower then grep on some regular expressions

2008-07-09 Thread Kris Kennaway

samwyse wrote:

On Jul 8, 11:01 am, Kris Kennaway <[EMAIL PROTECTED]> wrote:

samwyse wrote:



You might want to look at Plex.
http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/
"Another advantage of Plex is that it compiles all of the regular
expressions into a single DFA. Once that's done, the input can be
processed in a time proportional to the number of characters to be
scanned, and independent of the number or complexity of the regular
expressions. Python's existing regular expression matchers do not have
this property. "



Hmm, unfortunately it's still orders of magnitude slower than grep in my
own application that involves matching lots of strings and regexps
against large files (I killed it after 400 seconds, compared to 1.5 for
grep), and that's leaving aside the much longer compilation time (over a
minute).  If the matching was fast then I could possibly pickle the
lexer though (but it's not).


That's funny, the compilation is almost instantaneous for me.


My lexicon was quite a bit bigger, containing about 150 strings and regexps.


However, I just tested it on several files, the first containing
4875*'a', the rest each twice the size of the previous.  And you're
right: for each doubling of the file size, the match takes four times
as long, meaning O(n^2).  156000*'a' would probably take 8 hours.
Here are my results:


The docs say it is supposed to be linear in the file size ;-) ;-(

Kris

--
http://mail.python.org/mailman/listinfo/python-list


Re: re.search much slower then grep on some regular expressions

2008-07-10 Thread Kris Kennaway

John Machin wrote:


Uh-huh ... try this, then:

http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/

You could use this to find the "Str" cases and the prefixes of the
"re" cases (which seem to be no more complicated than 'foo.*bar.*zot')
and use something slower like Python's re to search the remainder of
the line for 'bar.*zot'.


If it was just strings, then sure...with regexps it might be possible to 
make it work, but it doesn't sound particularly maintainable.  I will 
stick with my shell script until python gets a regexp engine of 
equivalent performance.


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: re.search much slower then grep on some regular expressions

2008-07-10 Thread Kris Kennaway

J. Cliff Dyer wrote:

On Wed, 2008-07-09 at 12:29 -0700, samwyse wrote:

On Jul 8, 11:01 am, Kris Kennaway <[EMAIL PROTECTED]> wrote:

samwyse wrote:

You might want to look at Plex.
http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/
"Another advantage of Plex is that it compiles all of the regular
expressions into a single DFA. Once that's done, the input can be
processed in a time proportional to the number of characters to be
scanned, and independent of the number or complexity of the regular
expressions. Python's existing regular expression matchers do not have
this property. "

Hmm, unfortunately it's still orders of magnitude slower than grep in my
own application that involves matching lots of strings and regexps
against large files (I killed it after 400 seconds, compared to 1.5 for
grep), and that's leaving aside the much longer compilation time (over a
minute).  If the matching was fast then I could possibly pickle the
lexer though (but it's not).

That's funny, the compilation is almost instantaneous for me.
However, I just tested it on several files, the first containing
4875*'a', the rest each twice the size of the previous.  And you're
right: for each doubling of the file size, the match takes four times
as long, meaning O(n^2).  156000*'a' would probably take 8 hours.
Here are my results:

compile_lexicon() took 0.0236021580595 secs
test('file-0.txt') took 24.8322969831 secs
test('file-1.txt') took 99.3956799681 secs
test('file-2.txt') took 398.349623132 secs


Sounds like a good strategy would be to find the smallest chunk of the
file that matches can't cross, and iterate your search on units of those
chunks.  For example, if none of your regexes cross line boundaries,
search each line of the file individually.  That may help turn around
the speed degradation you're seeing.


That's what I'm doing.  I've also tried various other things like 
mmapping the file and searching it all at once, etc., but almost all of 
the time is spent in the regexp engine, so optimizing other things only 
gives marginal improvement.


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: multithreading in python ???

2008-07-10 Thread Kris Kennaway

Laszlo Nagy wrote:

Abhishek Asthana wrote:


Hi all ,

I have a large set of data computation and I want to break it into 
small batches and assign them to different threads. I am implementing it 
in Python only. Kindly help with what libraries I should refer to in 
order to implement multithreading in Python.


You should not do this. Python can handle multiple threads but they 
always use the same processor. (at least in CPython.) In order to take 
advantage of multiple processors, use different processes.


Only partly true.  Threads executing in the Python interpreter are 
serialized and only run on a single CPU at a time.  Depending on what 
modules you use, they may be able to operate independently on multiple 
CPUs.  The term to research is "GIL" (Global Interpreter Lock).  There 
are many webpages discussing it, and the alternative strategies you can use.
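A minimal sketch of the multiple-process route using the multiprocessing
module (crunch is a placeholder for the real computation):

import multiprocessing

def crunch(batch):
    # placeholder for the real computation
    return sum(x * x for x in batch)

if __name__ == '__main__':
    data = range(1000000)
    step = 100000
    batches = [data[i:i + step] for i in range(0, len(data), step)]
    pool = multiprocessing.Pool()           # one worker process per CPU
    results = pool.map(crunch, batches)     # batches run in parallel
    print sum(results)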


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: pyprocessing/multiprocessing for x64?

2008-08-07 Thread Kris Kennaway

Benjamin Kaplan wrote:

The only problem I can see is that 32-bit programs can't access 64-bit 
dlls, so the OP might have to install the 32-bit version of Python for 
it to work.


Anyway, all of this is beside the point, because the multiprocessing 
module works fine on amd64 systems.


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: variable expansion with sqlite

2008-08-08 Thread Kris Kennaway

marc wyburn wrote:

Hi and thanks,

I was hoping to avoid having to weld qmarks together but I guess
that's why people use things like SQL alchemy instead.  It's a good
lesson anyway.


The '?' substitution is there to safely handle untrusted input.  You 
*don't* want to pass arbitrary user data into random parts of an SQL 
statement (or your database will get 0wned).  I think of it as a 
reminder that when you have to construct your own query template 
using "... %s ..." % (foo) to bypass this limitation, you had 
better be darn sure the parameters you are passing in are safe.
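A minimal sketch of the distinction, with made-up table and values: '?'
safely carries untrusted values, while identifiers such as column names
cannot be parameterized and must be validated before any string formatting:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")

evil = "Robert'); DROP TABLE users;--"      # hostile input, handled safely
conn.execute("INSERT INTO users VALUES (?, ?)", (evil, 42))

# Identifiers (table/column names) cannot go through '?', so they must
# be checked against a whitelist before any string formatting.
column = "age"
assert column in ("name", "age")
for row in conn.execute("SELECT %s FROM users" % column):
    print row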


Kris

--
http://mail.python.org/mailman/listinfo/python-list


Constructing MIME message without loading message stream

2008-08-09 Thread Kris Kennaway
I would like to MIME encode a message from a large file without first 
loading the file into memory.  Assume the file has been pre-encoded on 
disk (actually I am using encode_7or8bit, so the encoding should be 
null).  Is there a way to construct the flattened MIME message such that 
data is streamed from the file as needed instead of being resident in 
memory?  Do I have to subclass the MIMEBase class myself?
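Absent a stock streaming flattener, one hand-rolled sketch (the headers are
illustrative, and it assumes the payload is already 7/8-bit clean on disk):

def stream_part(path, out, chunksize=64 * 1024):
    # emit headers, then copy the pre-encoded payload through in
    # chunks so the file is never fully resident in memory
    out.write("Content-Type: application/octet-stream\r\n")
    out.write("Content-Transfer-Encoding: 8bit\r\n")
    out.write("\r\n")
    f = open(path, "rb")
    while True:
        chunk = f.read(chunksize)
        if not chunk:
            break
        out.write(chunk)
    f.close()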


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: benchmark

2008-08-10 Thread Kris Kennaway

Angel Gutierrez wrote:

Steven D'Aprano wrote:


On Thu, 07 Aug 2008 00:44:14 -0700, alex23 wrote:


Steven D'Aprano wrote:

In other words, about 20% of the time he measures is the time taken to
print junk to the screen.

Which makes his claim that "all the console outputs have been removed so
that the benchmarking activity is not interfered with by the IO
overheads" somewhat confusing...he didn't notice the output? Wrote it
off as a weird Python side-effect?

Wait... I've just remembered, and a quick test confirms... Python only
prints bare objects if you are running in a interactive shell. Otherwise
output of bare objects is suppressed unless you explicitly call print.

Okay, I guess he is forgiven. False alarm, my bad.



Well.. there must be somthing because this is what I got in a normal script
execution:

[EMAIL PROTECTED] test]$ python iter.py
Time per iteration = 357.467989922 microseconds
[EMAIL PROTECTED] test]$ vim iter.py
[EMAIL PROTECTED] test]$ python iter2.py
Time per iteration = 320.306909084 microseconds
[EMAIL PROTECTED] test]$ vim iter2.py
[EMAIL PROTECTED] test]$ python iter2.py
Time per iteration = 312.917997837 microseconds


What is the standard deviation of those numbers?  What is the confidence 
level that they are distinct?  In a thread complaining about poor 
benchmarking it's disappointing to see crappy test methodology being 
used to try to demonstrate flaws in the test.
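For example, a sketch of a less noisy measurement with timeit.repeat,
reporting mean and standard deviation instead of a single run (the statement
is a placeholder):

import timeit, math

runs = timeit.repeat(stmt="for i in range(1000): pass",
                     repeat=10, number=1000)
mean = sum(runs) / len(runs)
stdev = math.sqrt(sum((r - mean) ** 2 for r in runs) / (len(runs) - 1))
print "mean %.6fs  stdev %.6fs per %d loops" % (mean, stdev, 1000)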


Kris

--
http://mail.python.org/mailman/listinfo/python-list


Re: benchmark

2008-08-10 Thread Kris Kennaway

jlist wrote:

I think what makes more sense is to compare the code one most
typically writes. In my case, I always use range() and never use psyco.
But I guess for most of my work with Python performance hasn't been
a issue. I haven't got to write any large systems with Python yet, where
performance starts to matter.


Hopefully when you do you will improve your programming practices to not 
make poor choices - there are few excuses for not using xrange ;)


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: Constructing MIME message without loading message stream

2008-08-10 Thread Kris Kennaway

Diez B. Roggisch wrote:

Kris Kennaway schrieb:
I would like to MIME encode a message from a large file without first 
loading the file into memory.  Assume the file has been pre-encoded on 
disk (actually I am using encode_7or8bit, so the encoding should be 
null).  Is there a way to construct the flattened MIME message such 
that data is streamed from the file as needed instead of being 
resident in memory?  Do I have to subclass the MIMEBase class myself?


I don't know what you are after here - but I *do* know that anything 
above 10MB or so is most probably not transferable using mail, as MTAs 
impose limits on message-sizes. Or in other words: usually, whatever you 
want to encode should fit in memory as the network is limiting you.


MIME encoding is used for other things than emails.

Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: benchmark

2008-08-11 Thread Kris Kennaway

Peter Otten wrote:

[EMAIL PROTECTED] wrote:


On Aug 10, 10:10 pm, Kris Kennaway <[EMAIL PROTECTED]> wrote:

jlist wrote:

I think what makes more sense is to compare the code one most
typically writes. In my case, I always use range() and never use psyco.
But I guess for most of my work with Python performance hasn't been
a issue. I haven't got to write any large systems with Python yet,
where performance starts to matter.

Hopefully when you do you will improve your programming practices to not
make poor choices - there are few excuses for not using xrange ;)

Kris

And can you shed some light on how that relates with one of the zens
of python ?

There should be one-- and preferably only one --obvious way to do it.


For the record, the impact of range() versus xrange() is negligible -- on my
machine the xrange() variant even runs a tad slower. So it's not clear
whether Kris actually knows what he's doing.


You are only thinking in terms of execution speed.  Now think about 
memory use.  Using iterators instead of constructing lists is something 
that needs to permeate your thinking about Python, or you will forever be 
writing code that wastes memory, sometimes to a large extent.
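A quick illustration on Python 2 (exact sizes vary by platform and version):

import sys

print sys.getsizeof(range(10 ** 6))    # a full million-element list
print sys.getsizeof(xrange(10 ** 6))   # a tiny constant-size object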


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: SSH utility

2008-08-11 Thread Kris Kennaway

James Brady wrote:

Hi all,
I'm looking for a python library that lets me execute shell commands
on remote machines.

I've tried a few SSH utilities so far: paramiko, PySSH and pssh;
unfortunately all been unreliable, and repeated questions on their
respective mailing lists haven't been answered...

It seems like the sort of commodity task that there should be a pretty
robust library for. Are there any suggestions for alternative
libraries or approaches?


Personally I just Popen ssh directly.  Things like paramiko make me 
concerned; getting the SSH protocol right is tricky and not something I 
want to trust to projects that have not had significant experience and 
auditing.
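A minimal sketch of that approach (host and command are placeholders;
BatchMode makes ssh fail instead of hanging on a password prompt):

import subprocess

def remote_run(host, command):
    proc = subprocess.Popen(["ssh", "-o", "BatchMode=yes", host, command],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out, err

rc, out, err = remote_run("example.com", "uptime")   # placeholders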


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: benchmark

2008-08-11 Thread Kris Kennaway

Peter Otten wrote:

Kris Kennaway wrote:


Peter Otten wrote:

[EMAIL PROTECTED] wrote:


On Aug 10, 10:10 pm, Kris Kennaway <[EMAIL PROTECTED]> wrote:

jlist wrote:

I think what makes more sense is to compare the code one most
typically writes. In my case, I always use range() and never use
psyco. But I guess for most of my work with Python performance hasn't
been a issue. I haven't got to write any large systems with Python
yet, where performance starts to matter.

Hopefully when you do you will improve your programming practices to
not make poor choices - there are few excuses for not using xrange ;)

Kris

And can you shed some light on how that relates with one of the zens
of python ?

There should be one-- and preferably only one --obvious way to do it.

For the record, the impact of range() versus xrange() is negligible -- on
my machine the xrange() variant even runs a tad slower. So it's not clear
whether Kris actually knows what he's doing.
You are only thinking in terms of execution speed.  


Yes, because my remark was made in the context of the particular benchmark
supposed to be the topic of this thread.


No, you may notice that the above text has moved off onto another 
discussion.


Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: In-place memory manager, mmap (was: Fastest way to store ints and floats on disk)

2008-08-24 Thread Kris Kennaway

castironpi wrote:

Hi,

I've got an "in-place" memory manager that uses a disk-backed memory-
mapped buffer.  Among its possibilities are: storing variable-length
strings and structures for persistence and interprocess communication
with mmap.

It allocates segments of a generic buffer by length and returns an
offset to the reserved block, which can then be used with struct to
pack values to store.  The data structure is adapted from the GNU PAVL
binary tree.

Allocated blocks can be cast to ctypes.Structure instances using some
monkey patching, which is optional.

Want to open-source it.  Any interest?


Just do it.  That way users can come along later.

Kris
--
http://mail.python.org/mailman/listinfo/python-list


Re: In-place memory manager, mmap

2008-08-24 Thread Kris Kennaway

castironpi wrote:

On Aug 24, 9:52 am, Kris Kennaway <[EMAIL PROTECTED]> wrote:

castironpi wrote:

Hi,
I've got an "in-place" memory manager that uses a disk-backed memory-
mapped buffer.  Among its possibilities are: storing variable-length
strings and structures for persistence and interprocess communication
with mmap.
It allocates segments of a generic buffer by length and returns an
offset to the reserved block, which can then be used with struct to
pack values to store.  The data structure is adapted from the GNU PAVL
binary tree.
Allocated blocks can be cast to ctypes.Structure instances using some
monkey patching, which is optional.
Want to open-source it.  Any interest?

Just do it.  That way users can come along later.

Kris


How?  My website?  Google Code?  Too small for SourceForge, I think.
--
http://mail.python.org/mailman/listinfo/python-list




Any of those 3 would work fine, but the last two are probably better 
(SourceForge hosts plenty of tiny projects) if you don't want to have to 
manage your server and related infrastructure yourself.


Kris
--
http://mail.python.org/mailman/listinfo/python-list


GDAL installation

2015-02-10 Thread Leo Kris Palao
Hi Python Users,

I recently installed Python 2.7.9 and then installed the GDAL package.
First, I tried to install GDAL using pip, but it threw an error - I cannot
remember the exact error message. So I installed it using the easy_install
command. But when I import the package I get this message, which I
really don't understand.

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\lpalao>python
> Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit (AMD64)]
> on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import gdal
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "C:\Python27\Python2.7.9\lib\site-packages\gdal.py", line 2, in
> 
> from osgeo.gdal import deprecation_warn
>   File "C:\Python27\Python2.7.9\lib\site-packages\osgeo\__init__.py", line
> 21, in 
> _gdal = swig_import_helper()
>   File "C:\Python27\Python2.7.9\lib\site-packages\osgeo\__init__.py", line
> 17, in swig_import_helper
> _mod = imp.load_module('_gdal', fp, pathname, description)
> ImportError: DLL load failed: The specified module could not be found.
> >>>


Thanks in advance,
-Leo
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: GDAL installation

2015-02-11 Thread Leo Kris Palao
Hi Asim, thanks for your help. It is working properly now.

Thanks,
-Leo

On Wed, Feb 11, 2015 at 4:48 PM, Asim Jalis  wrote:

> Hi Leo,
>
> This might be a PATH issue.
>
> See this discussion for details.
>
>
> https://pythongisandstuff.wordpress.com/2011/07/07/installing-gdal-and-ogr-for-python-on-windows/
>
> Asim
>
> On Tue, Feb 10, 2015 at 9:11 PM, Leo Kris Palao 
> wrote:
>
>> Hi Python Users,
>>
>> I recently installed Python 2.7.9 and then installed the GDAL package.
>> First, I tried to install GDAL using pip, but it threw an error - I cannot
>> remember the exact error message. So I installed it using the easy_install
>> command. But when I import the package I get this message, which I
>> really don't understand.
>>
>> Microsoft Windows [Version 6.1.7601]
>> Copyright (c) 2009 Microsoft Corporation.  All rights reserved.
>>
>> C:\Users\lpalao>python
>>> Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit
>>> (AMD64)] on win32
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> import gdal
>>> Traceback (most recent call last):
>>>   File "", line 1, in 
>>>   File "C:\Python27\Python2.7.9\lib\site-packages\gdal.py", line 2, in
>>> 
>>> from osgeo.gdal import deprecation_warn
>>>   File "C:\Python27\Python2.7.9\lib\site-packages\osgeo\__init__.py",
>>> line 21, in 
>>> _gdal = swig_import_helper()
>>>   File "C:\Python27\Python2.7.9\lib\site-packages\osgeo\__init__.py",
>>> line 17, in swig_import_helper
>>> _mod = imp.load_module('_gdal', fp, pathname, description)
>>> ImportError: DLL load failed: The specified module could not be found.
>>> >>>
>>
>>
>> Thanks in advance,
>> -Leo
>>
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>>
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Random forest and svm for remote sensing in python

2015-02-12 Thread Leo Kris Palao
Hi Python Users,

Good day!

I am currently using ENVI for my image processing/remote sensing work, but
would love to move to open-source Python programming for remote
sensing. Can you give me some good sites where I can see practical examples
of how Python is used for remote sensing, especially using random forest and
support vector machine algorithms?

Thanks,
-Leo
-- 
https://mail.python.org/mailman/listinfo/python-list


Configuring problems with GDAL in enthought Python Canopy

2015-02-25 Thread Leo Kris Palao
Hi ALL,

Just wanted to ask if somebody could guide me in installing GDAL in my
Python installed using Canopy. Could you give me the steps to
successfully install this package? I got it running using my previous
Python installation, but I removed it and use Canopy Python now.

btw: my python installation is located in:
C:\Users\lpalao\AppData\Local\Enthought\Canopy\User\Scripts\python.exe

Thanks in advance for the help.
-Leo
-- 
https://mail.python.org/mailman/listinfo/python-list


GDAL Installation in Enthought Python Distribution

2015-02-25 Thread Leo Kris Palao
Hi Python Users,

I would like to ask how to install GDAL in my Enthought Python
Distribution (64-bit). I am having some problems making GDAL work. Or can
you point me to a blog that describes how to set up GDAL in the Enthought
Python Distribution?

Thanks for any help.
-Leo
-- 
https://mail.python.org/mailman/listinfo/python-list


PyObject_CallFunctionObjArgs segfaults

2022-09-29 Thread Jen Kris via Python-list
Recently I completed a project where I used PyObject_CallFunctionObjArgs 
extensively with the NLTK library from a program written in NASM, with no 
problems.  Now I am on a new project where I call the Python random library.  I 
use the same setup as before, but I am getting a segfault with random.seed.  

At the start of the NASM program I call a C API program that gets PyObject 
pointers to “seed” and “randrange” in the same way as I did before:

int64_t Get_LibModules(int64_t * return_array)
{
    PyObject * pName_random = PyUnicode_FromString("random");
    PyObject * pMod_random = PyImport_Import(pName_random);

    if (pMod_random == 0x0){
        PyErr_Print();
        return 1;}

    PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
    PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, "randrange");

    return_array[0] = (int64_t)pAttr_seed;
    return_array[1] = (int64_t)pAttr_randrange;

    return 0;
}

Later in the same program I call a C API program to call random.seed:

int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
{
    PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_1);

    if (p_seed_calc == 0x0){
        PyErr_Print();
        return 1;}

    //Prepare return values
    long return_val = PyLong_AsLong(p_seed_calc);

    return return_val;
}

The first program correctly imports “random” and gets pointers to “seed” and 
“randrange.”  I verified that the same pointer is correctly passed into 
C_API_2, and the seed value (1234) is passed as  Py_ssize_t value_1.  But I get 
this segfault:

Program received signal SIGSEGV, Segmentation fault.
0x764858d5 in _Py_INCREF (op=0x4d2) at ../Include/object.h:459
459 ../Include/object.h: No such file or directory.

So I tried Py_INCREF in the first program: 

Py_INCREF(pMod_random);
Py_INCREF(pAttr_seed);

Then I moved Py_INCREF(pAttr_seed) to the second program.  Same segfault.

Finally, I initialized “random” and “seed” in the second program, where they 
are used.  Same segfault. 

The segfault refers to Py_INCREF, so this seems to do with reference counting, 
but Py_INCREF didn’t solve it.   

I’m using Python 3.8 on Ubuntu. 

Thanks for any ideas on how to solve this. 

Jen

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: PyObject_CallFunctionObjArgs segfaults

2022-09-29 Thread Jen Kris via Python-list
Thanks very much to @MRAB for taking the time to answer.  I changed my code to 
conform to your answer (as best I understand your comments on references), but 
I still get the same error.  My comments continue below the new code.  

int64_t Get_LibModules(int64_t * return_array)
{
    PyObject * pName_random = PyUnicode_FromString("random");
    PyObject * pMod_random = PyImport_Import(pName_random);

    Py_INCREF(pName_random);
    Py_INCREF(pMod_random);

    if (pMod_random == 0x0){
        PyErr_Print();
        return 1;}

    PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
    PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, "randrange");

    Py_INCREF(pAttr_seed);
    Py_INCREF(pAttr_randrange);

    return_array[0] = (int64_t)pAttr_seed;
    return_array[1] = (int64_t)pAttr_randrange;

    return 0;
}

int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
{
    PyObject * value_ptr = (PyObject * )value_1;
    PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_ptr, NULL);

    if (p_seed_calc == 0x0){
        PyErr_Print();
        return 1;}

    //Prepare return values
    long return_val = PyLong_AsLong(p_seed_calc);

    return return_val;
}

So I incremented the reference count of all objects in Get_LibModules, but I 
still get the same segfault at PyObject_CallFunctionObjArgs.  Unfortunately, 
reference counting is not well documented, so I'm not clear what's wrong. 




Sep 29, 2022, 10:06 by [email protected]:

> On 2022-09-29 16:54, Jen Kris via Python-list wrote:
>
>> Recently I completed a project where I used PyObject_CallFunctionObjArgs 
>> extensively with the NLTK library from a program written in NASM, with no 
>> problems.  Now I am on a new project where I call the Python random library. 
>>  I use the same setup as before, but I am getting a segfault with 
>> random.seed.
>>
>> At the start of the NASM program I call a C API program that gets PyObject 
>> pointers to “seed” and “randrange” in the same way as I did before:
>>
>> int64_t Get_LibModules(int64_t * return_array)
>> {
>> PyObject * pName_random = PyUnicode_FromString("random");
>> PyObject * pMod_random = PyImport_Import(pName_random);
>>
> Both PyUnicode_FromString and PyImport_Import return new references or null 
> pointers.
>
>> if (pMod_random == 0x0){
>> PyErr_Print();
>>
>
> You're leaking a reference here (pName_random).
>
>> return 1;}
>>
>> PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
>> PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, 
>> "randrange");
>>
>> return_array[0] = (int64_t)pAttr_seed;
>> return_array[1] = (int64_t)pAttr_randrange;
>>
>
> You're leaking 2 references here (pName_random and pMod_random).
>
>> return 0;
>> }
>>
>> Later in the same program I call a C API program to call random.seed:
>>
>> int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
>> {
>> PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_1);
>>
>
> It's expecting all of the arguments to be PyObject*, but value_1 is 
> Py_ssize_t instead of PyObject* (a pointer to a _Python_ int).
>
> The argument list must end with a null pointer.
>
> It returns a new reference or a null pointer.
>
>>
>> if (p_seed_calc == 0x0){
>>      PyErr_Print();
>>      return 1;}
>>
>> //Prepare return values
>> long return_val = PyLong_AsLong(p_seed_calc);
>>
> You're leaking a reference here (p_seed_calc).
>
>> return return_val;
>> }
>>
>> The first program correctly imports “random” and gets pointers to “seed” and 
>> “randrange.”  I verified that the same pointer is correctly passed into 
>> C_API_2, and the seed value (1234) is passed as  Py_ssize_t value_1.  But I 
>> get this segfault:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x764858d5 in _Py_INCREF (op=0x4d2) at ../Include/object.h:459
>> 459 ../Include/object.h: No such file or directory.
>>
>> So I tried Py_INCREF in the first program:
>>
>> Py_INCREF(pMod_random);
>> Py_INCREF(pAttr_seed);
>>
>> Then I moved Py_INCREF(pAttr_seed) to the second program.  Same segfault.
>>
>> Finally, I initialized “random” and “seed” in the second program, where they 
>> are used.  Same segfault.
>>
>> The segfault refers to Py_INCREF, so this seems to do with reference 
>> counting, but Py_INCREF didn’t solve it.
>>
>> I’m using Python 3.8 on Ubuntu.
>>
>> Thanks for any ideas on how to solve this.
>>
>> Jen
>>
>
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: PyObject_CallFunctionObjArgs segfaults

2022-09-29 Thread Jen Kris via Python-list
To update my previous email, I found the problem, but I have a new problem.  

Previously I cast PyObject * value_ptr = (PyObject * )value_1 but that's not 
correct.  Instead I used PyObject * value_ptr = PyLong_FromLong(value_1) and 
that works.  HOWEVER, while PyObject_CallFunctionObjArgs does work now, it 
returns -1, which is not the right answer for random.seed.  I use "long 
return_val = PyLong_AsLong(p_seed_calc);" to convert it to a long.  

So my question is why do I get -1 as return value?  When I query p_seed_calc I 
get:

(gdb) p p_seed_calc
$2 = (PyObject *) 0x769be120 <_Py_NoneStruct>

Thanks again.

Jen




Sep 29, 2022, 13:02 by [email protected]:

> Thanks very much to @MRAB for taking time to answer.  I changed my code to 
> conform to your answer (as best I understand your comments on references), 
> but I still get the same error.  My comments continue below the new code 
> immediately below.  
>
> int64_t Get_LibModules(int64_t * return_array)
> {
> PyObject * pName_random = PyUnicode_FromString("random");
> PyObject * pMod_random = PyImport_Import(pName_random);
>
> Py_INCREF(pName_random);
> Py_INCREF(pMod_random);
>
> if (pMod_random == 0x0){
> PyErr_Print();
> return 1;}
>
> PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
> PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, "randrange");
>
> Py_INCREF(pAttr_seed);
> Py_INCREF(pAttr_randrange);
>
> return_array[0] = (int64_t)pAttr_seed;
> return_array[1] = (int64_t)pAttr_randrange;
>
> return 0;
> }
>
> int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
> {
> PyObject * value_ptr = (PyObject * )value_1;
> PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_ptr, 
> NULL);
>
> if (p_seed_calc == 0x0){
>     PyErr_Print();
>     return 1;}
>
> //Prepare return values
> long return_val = PyLong_AsLong(p_seed_calc);
>
> return return_val;
> }
>
> So I incremented the reference to all objects in Get_LibModules, but I still 
> get the same segfault at PyObject_CallFunctionObjArgs.  Unfortunately, 
> reference counting is not well documented so I’m not clear what’s wrong. 
>
>
>
>
> Sep 29, 2022, 10:06 by [email protected]:
>
>> On 2022-09-29 16:54, Jen Kris via Python-list wrote:
>>
>>> Recently I completed a project where I used PyObject_CallFunctionObjArgs 
>>> extensively with the NLTK library from a program written in NASM, with no 
>>> problems.  Now I am on a new project where I call the Python random 
>>> library.  I use the same setup as before, but I am getting a segfault with 
>>> random.seed.
>>>
>>> At the start of the NASM program I call a C API program that gets PyObject 
>>> pointers to “seed” and “randrange” in the same way as I did before:
>>>
>>> int64_t Get_LibModules(int64_t * return_array)
>>> {
>>> PyObject * pName_random = PyUnicode_FromString("random");
>>> PyObject * pMod_random = PyImport_Import(pName_random);
>>>
>> Both PyUnicode_FromString and PyImport_Import return new references or null 
>> pointers.
>>
>>> if (pMod_random == 0x0){
>>> PyErr_Print();
>>>
>>
>> You're leaking a reference here (pName_random).
>>
>>> return 1;}
>>>
>>> PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
>>> PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, 
>>> "randrange");
>>>
>>> return_array[0] = (int64_t)pAttr_seed;
>>> return_array[1] = (int64_t)pAttr_randrange;
>>>
>>
>> You're leaking 2 references here (pName_random and pMod_random).
>>
>>> return 0;
>>> }
>>>
>>> Later in the same program I call a C API program to call random.seed:
>>>
>>> int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
>>> {
>>> PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_1);
>>>
>>
>> It's expecting all of the arguments to be PyObject*, but value_1 is 
>> Py_ssize_t instead of PyObject* (a pointer to a _Python_ int).
>>
>> The argument list must end with a null pointer.
>>
>> It returns a new reference or a null pointer.
>>
>>>
>>> if (p_seed_calc == 0x0){
>>>      PyErr_Print();
>>>      return 1;}
>>>
>>> //Prepare return values
>>> long return_val = PyLong_AsLong(p_seed_calc);
>>>
>> You're leaking a reference here (p_seed_calc).
>>

Re: PyObject_CallFunctionObjArgs segfaults

2022-09-29 Thread Jen Kris via Python-list

I just solved this C API problem, and I’m posting the answer to help anyone 
else who might need it.  

The errors were:

(1) we must call Py_INCREF on each object when it’s created.

(2) in C_API_2 (see below) we don’t cast value_1 as I did before with PyObject 
* value_ptr = (PyObject * )value_1.  Instead we use PyObject * value_ptr = 
PyLong_FromLong(value_1);

(3) The argument list passed to PyObject_CallFunctionObjArgs must end with a NULL pointer.

Here’s the revised code:

First we load the modules, and increment the reference to each object: 

int64_t Get_LibModules(int64_t * return_array)
{
PyObject * pName_random = PyUnicode_FromString("random");
PyObject * pMod_random = PyImport_Import(pName_random);

Py_INCREF(pName_random);
Py_INCREF(pMod_random);

if (pMod_random == 0x0){
PyErr_Print();
return 1;}

PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, "randrange");

Py_INCREF(pAttr_seed);
Py_INCREF(pAttr_randrange);

return_array[0] = (int64_t)pAttr_seed;
return_array[1] = (int64_t)pAttr_randrange;

return 0;
}

Next we call a program to initialize the random number generator with 
random.seed(), and increment the reference to its return value p_seed_calc:

int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
{
PyObject * value_ptr = PyLong_FromLong(value_1);
PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_ptr, 
NULL);

// _

if (p_seed_calc == 0x0){
    PyErr_Print();
    return 1;}

Py_INCREF(p_seed_calc);

return 0;
}

Now we call another program to get a random number:

int64_t C_API_12(PyObject * pAttr_randrange, Py_ssize_t value_1)
{
PyObject * value_ptr = PyLong_FromLong(value_1);
PyObject * p_randrange_calc = PyObject_CallFunctionObjArgs(pAttr_randrange, 
value_ptr, NULL);

if (p_randrange_calc == 0x0){
    PyErr_Print();
    return 1;}

//Prepare return values
long return_val = PyLong_AsLong(p_randrange_calc);

return return_val;
}

That returns 28, which is what I get from the Python command line. 
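For reference, the Python-level equivalent of these calls is just the following (the randrange bound of 50 is made up for illustration, since the original argument isn't shown above; only the seed value 1234 comes from this thread):

import random

result = random.seed(1234)
print(result)                # None - seed() has no return value, which is
                             # why PyLong_AsLong on its result gave -1 earlier
print(random.randrange(50))  # deterministic after seeding; bound is invented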

Thanks again to MRAB for helpful comments. 

Jen


Sep 29, 2022, 15:31 by [email protected]:

> On 2022-09-29 21:47, Jen Kris wrote:
>
>> To update my previous email, I found the problem, but I have a new problem.
>>
>> Previously I cast PyObject * value_ptr = (PyObject * )value_1 but that's not 
>> correct.  Instead I used PyObject * value_ptr = PyLong_FromLong(value_1) and 
>> that works.  HOWEVER, while PyObject_CallFunctionObjArgs does work now, it 
>> returns -1, which is not the right answer for random.seed.  I use "long 
>> return_val = PyLong_AsLong(p_seed_calc);" to convert it to a long.
>>
> random.seed returns None, so when you call PyObject_CallFunctionObjArgs it 
> returns a new reference to Py_None.
>
> If you then pass to PyLong_AsLong a reference to something that's not a 
> PyLong, it'll set an error and return -1.
>
>> So my question is why do I get -1 as return value?  When I query p_seed_calc
>> I get:
>>
>> (gdb) p p_seed_calc
>> $2 = (PyObject *) 0x769be120 <_Py_NoneStruct>
>>
> Exactly. It's Py_None, not a PyLong.
>
>> Thanks again.
>>
>> Jen
>>
>>
>>
>>
>> Sep 29, 2022, 13:02 by [email protected]:
>>
>>  Thanks very much to @MRAB for taking time to answer.  I changed my
>>  code to conform to your answer (as best I understand your comments
>>  on references), but I still get the same error.  My comments
>>  continue below the new code immediately below.
>>
>>  int64_t Get_LibModules(int64_t * return_array)
>>  {
>>  PyObject * pName_random = PyUnicode_FromString("random");
>>  PyObject * pMod_random = PyImport_Import(pName_random);
>>
>>  Py_INCREF(pName_random);
>>  Py_INCREF(pMod_random);
>>
>>  if (pMod_random == 0x0){
>>  PyErr_Print();
>>  return 1;}
>>
>>  PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
>>  PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random,
>>  "randrange");
>>
>>  Py_INCREF(pAttr_seed);
>>  Py_INCREF(pAttr_randrange);
>>
>>  return_array[0] = (int64_t)pAttr_seed;
>>  return_array[1] = (int64_t)pAttr_randrange;
>>
>>  return 0;
>>  }
>>
>>  int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
>>  {
>>  PyObject * value_ptr = (PyObject * )value_1;
>>  PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed,
>>  value_ptr, NULL);
>>
>>  if (p_seed_calc == 0x0){
>>      PyErr_Print();
>>      return 1;}
>>
>>  //Prepare return values
>>  long return_val = PyLong_AsLong(p_seed_calc);

Re: PyObject_CallFunctionObjArgs segfaults

2022-09-30 Thread Jen Kris via Python-list

Thanks very much for your detailed reply.  I have a few followup questions.  

You said, “Some functions return an object that has already been incref'ed 
("new reference"). This occurs when it has either created a new object (the 
refcount will be 1) or has returned a pointer to an existing object (the 
refcount will be > 1 because it has been incref'ed).  Other functions return an 
object that hasn't been incref'ed. This occurs when you're looking up 
something, for example, looking at a member of a list or the value of an 
attribute.” 

In the official docs some functions show “Return value: New reference” and 
others do not.  Is there any reason why I should not just INCREF on every new 
object, regardless of whether it’s a new reference or not, and DECREF when I am 
finished with it?  The answer at 
https://stackoverflow.com/questions/59870703/python-c-extension-need-to-py-incref-a-borrowed-reference-if-not-returning-it-to
 says “With out-of-order execution, the INCREF/DECREF are basically free 
operations, so performance is no reason to leave them out.”  Doing so means I 
don’t have to check each object to see if it needs to be INCREF’d or not, and 
that is a big help. 

Also: 

What is a borrowed reference, and how does it affect reference counting?  
According to https://jayrambhia.com/blog/pythonc-api-reference-counting, “Use 
Py_INCREF on a borrowed PyObject pointer you already have. This increments the 
reference count on the object, and obligates you to dispose of it properly.”  
So I guess it’s yes, but I’m confused by “pointer you already have.” 

What does it mean to steal a reference?  If a function steals a reference, does 
it have to decref it without an incref (because it’s stolen)?

Finally, you said:

if (pMod_random == 0x0){
    PyErr_Print();
Leaks here because of the refcount

Assuming pMod_random is not null, why would this leak? 

Thanks again for your input on this question. 

Jen



Sep 29, 2022, 17:33 by [email protected]:

> On 2022-09-30 01:02, MRAB wrote:
>
>> On 2022-09-29 23:41, Jen Kris wrote:
>>
>>>
>>> I just solved this C API problem, and I’m posting the answer to help anyone 
>>> else who might need it.
>>>
> [snip]
>
> What I like to do is write comments that state which variables hold a 
> reference, followed by '+' if it's a new reference (incref'ed) and '?' if it 
> could be null. '+?' means that it's probably a new reference but could be 
> null. Once I know that it's not null, I can remove the '?', and once I've 
decref'ed it (if required) and no longer need it, I remove it from the 
> comment.
>
> Clearing up references, as soon as they're not needed, helps to keep the 
> number of current references more manageable.
>
>
> int64_t Get_LibModules(int64_t * return_array) {
>  PyObject * pName_random = PyUnicode_FromString("random");
>  //> pName_random+?
>  if (!pName_random) {
>  PyErr_Print();
>  return 1;
>  }
>
>  //> pName_random+
>  PyObject * pMod_random = PyImport_Import(pName_random);
>  //> pName_random+ pMod_random+?
>  Py_DECREF(pName_random);
>  //> pMod_random+?
>  if (!pMod_random) {
>  PyErr_Print();
>  return 1;
>  }
>
>  //> pMod_random+
>  PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
>  //> pMod_random+ pAttr_seed?
>  if (!pAttr_seed) {
>  Py_DECREF(pMod_random);
>  PyErr_Print();
>  return 1;
>  }
>
>  //> pMod_random+ pAttr_seed
>  PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, 
> "randrange");
>  //> pMod_random+ pAttr_seed pAttr_randrange?
>  Py_DECREF(pMod_random);
>  //> pAttr_seed pAttr_randrange?
>  if (!pAttr_randrange) {
>  PyErr_Print();
>  return 1;
>  }
>
>  //> pAttr_seed pAttr_randrange
>  return_array[0] = (int64_t)pAttr_seed;
>  return_array[1] = (int64_t)pAttr_randrange;
>
>  return 0;
> }
>
> int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1) {
>  PyObject * value_ptr = PyLong_FromLong(value_1);
>  //> value_ptr+?
>  if (!value_ptr) {
>  PyErr_Print();
>  return 1;
>  }
>
>  //> value_ptr+
>  PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_ptr, 
> NULL);
>  //> value_ptr+ p_seed_calc+?
>  Py_DECREF(value_ptr);
>  //> p_seed_calc+?
>  if (!p_seed_calc) {
>  PyErr_Print();
>  return 1;
>  }
>
>  //> p_seed_calc+
>  Py_DECREF(p_seed_calc);
>  return 0;
> }
>
> int64_t C_API_12(PyObject * pAttr_randrange, Py_ssize_t value_1) {
>  PyObject * value_ptr = PyLong_FromLong(value_1);
>  //> value_ptr+?
>  if (!value_ptr) {
>  PyErr_Print();
>  return 1;

Re: PyObject_CallFunctionObjArgs segfaults

2022-09-30 Thread Jen Kris via Python-list

That's great.  It clarifies things a lot for me, particularly re ref count for 
new references.  I would have had trouble if I didn't decref it twice.  

Thanks very much once again.  


Sep 30, 2022, 12:18 by [email protected]:

> On 2022-09-30 17:02, Jen Kris wrote:
>
>>
>> Thanks very much for your detailed reply.  I have a few followup questions.
>>
>> You said, “Some functions return an object that has already been incref'ed 
>> ("new reference"). This occurs when it has either created a new object (the 
>> refcount will be 1) or has returned a pointer to an existing object (the 
>> refcount will be > 1 because it has been incref'ed).  Other functions return 
>> an object that hasn't been incref'ed. This occurs when you're looking up 
>> something, for example, looking at a member of a list or the value of an 
>> attribute.”
>>
>> In the official docs some functions show “Return value: New reference” and 
>> others do not.  Is there any reason why I should not just INCREF on every 
>> new object, regardless of whether it’s a new reference or not, and DECREF 
>> when I am finished with it?  The answer at 
>> https://stackoverflow.com/questions/59870703/python-c-extension-need-to-py-incref-a-borrowed-reference-if-not-returning-it-to
>>  says “With out-of-order execution, the INCREF/DECREF are basically free 
>> operations, so performance is no reason to leave them out.”  Doing so means 
>> I don’t have to check each object to see if it needs to be INCREF’d or not, 
>> and that is a big help.
>>
> It's OK to INCREF them, provided that you DECREF them when you no longer need 
> them, and remember that if it's a "new reference" you'd need to DECREF it 
> twice.
>
>> Also:
>>
>> What is a borrowed reference, and how does it effect reference counting?  
>> According to https://jayrambhia.com/blog/pythonc-api-reference-counting, 
>> “Use Py_INCREF on a borrowed PyObject pointer you already have. This 
>> increments the reference count on the object, and obligates you to dispose 
>> of it properly.”  So I guess it’s yes, but I’m confused by “pointer you 
>> already have.”
>>
>
> A borrowed reference is when it hasn't been INCREFed.
>
> You can think of INCREFing as a way of indicating ownership, which is often 
> shared ownership (refcount > 1). When you're borrowing a reference, you're 
> using it temporarily, but not claiming ownership. When the last owner 
> releases its ownership (DECREF reduces the refcount to 0), the object can be 
> garbage collected.
>
> When, say, you lookup an attribute, or get an object from a list with 
> PyList_GetItem, it won't have been INCREFed. You're using it temporarily, 
> just borrowing a reference.
>
>>
>> What does it mean to steal a reference?  If a function steals a reference 
>> does it have to decref it without incref (because it’s stolen)?
>>
> When a function steals a reference, it's claiming ownership but not INCREFing 
> it.
>
>>
>> Finally, you said:
>>
>> if (pMod_random == 0x0){
>>     PyErr_Print();
>> Leaks here because of the refcount
>>
>> Assuming pMod_random is not null, why would this leak?
>>
> It's pName_random that's the leak.
>
> PyUnicode_FromString("random") will either create and return a new object for 
> the string "random" (refcount == 1) or return a reference to an existing 
> object (refcount > 1). You need to DECREF it before returning from the 
> function.
>
> Suppose it created a new object. You call the function, it creates an object, 
> you use it, then return from the function. The object still exists, but 
> there's no reference to it. Now call the function again. It creates another 
> object, you use it, then return from the function. You now have 2 objects 
> with no reference to them.
>
>> Thanks again for your input on this question.
>>
>> Jen
>>
>>
>>
>> Sep 29, 2022, 17:33 by [email protected]:
>>
>>  On 2022-09-30 01:02, MRAB wrote:
>>
>>  On 2022-09-29 23:41, Jen Kris wrote:
>>
>>
>>  I just solved this C API problem, and I’m posting the
>>  answer to help anyone else who might need it.
>>
>>  [snip]
>>
>>  What I like to do is write comments that state which variables
>>  hold a reference, followed by '+' if it's a new reference
>>  (incref'ed) and '?' if it could be null. '+?' means that it's
>>  probably a new reference but cou

Debugging Python C extensions with GDB

2022-11-14 Thread Jen Kris via Python-list
In September 2021, Victor Stinner wrote “Debugging Python C extensions with 
GDB” 
(https://developers.redhat.com/articles/2021/09/08/debugging-python-c-extensions-gdb#getting_started_with_python_3_9).
  

My question is:  with Python 3.9+, can I debug into a C extension written in 
pure C and called from ctypes  -- that is not written using the C_API? 

Thanks. 

Jen



-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Debugging Python C extensions with GDB

2022-11-14 Thread Jen Kris via Python-list
Thanks for your reply.  Victor's article didn't mention ctypes extensions, so I 
wanted to post a question before I build from source.  
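For anyone finding this thread later, a minimal session along the lines Barry describes might look like this (the script name and C function name are hypothetical):

$ gdb --args python3 client.py
(gdb) break my_c_function
Function "my_c_function" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
(gdb) run

The breakpoint is registered as pending and fires once ctypes loads the .so and the function is entered.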


Nov 14, 2022, 14:32 by [email protected]:

>
>
>> On 14 Nov 2022, at 19:10, Jen Kris via Python-list  
>> wrote:
>>
>> In September 2021, Victor Stinner wrote “Debugging Python C extensions with 
>> GDB” 
>> (https://developers.redhat.com/articles/2021/09/08/debugging-python-c-extensions-gdb#getting_started_with_python_3_9).
>>  
>>
>> My question is:  with Python 3.9+, can I debug into a C extension written in 
>> pure C and called from ctypes  -- that is not written using the C_API?
>>
>
> Yes.
>
> Just put a breakpoint on the function in the c library that you want to debug.
> You can set the breakpoint before a .so is loaded.
>
> Barry
>
>>
>> Thanks. 
>>
>> Jen
>>
>>
>>
>> -- 
>> https://mail.python.org/mailman/listinfo/python-list
>>

-- 
https://mail.python.org/mailman/listinfo/python-list


To clarify how Python handles two equal objects

2023-01-10 Thread Jen Kris via Python-list
I am writing a spot speedup in assembly language for a short but 
computation-intensive Python loop, and I discovered something about Python 
array handling that I would like to clarify.  

For a simplified example, I created a matrix mx1 and assigned the array arr1 to 
the third row of the matrix:

mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
arr1 = mx1[2]

The pointers to these are now the same:

ida = id(mx1[2]) - 140260325306880
idb = id(arr1) - 140260325306880

That’s great because when I encounter this in assembly or C, I can just borrow 
the pointer to row 3 for the array arr1, on the assumption that they will 
continue to point to the same object.  Then when I do any math operations in 
arr1 it will be reflected in both arrays because they are now pointing to the 
same array:

arr1[0] += 2
print(mx1[2]) - [9, 8, 9]
print(arr1) - [9, 8, 9]

Now mx1 looks like this:

[ 1, 2, 3 ]
[ 4, 5, 6 ]
[ 9, 8, 9 ]

and it stays that way for remaining iterations.  

But on the next iteration we assign arr1 to something else:

arr1 = [ 10, 11, 12 ]
idc = id(arr1) – 140260325308160
idd = id(mx1[2]) – 140260325306880

Now arr1 is no longer equal to mx1[2], and any subsequent operations in arr1 
will not affect mx1.  So where I’m rewriting some Python code in a low level 
language, I can’t assume that the two objects are equal because that equality 
will not remain if either is reassigned.  So if I do some operation on one 
array I have to conform the two arrays for as long as they remain equal, I 
can’t just do it in one operation because I can’t rely on the objects remaining 
equal. 

Is my understanding of this correct?  Is there anything I’m missing? 

Thanks very much. 

Jen


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: To clarify how Python handles two equal objects

2023-01-10 Thread Jen Kris via Python-list

Thanks for your comments.  I'd like to make one small point.  You say:

"Assignment in Python is a matter of object references. It's not
"conform them as long as they remain equal". You'll have to think in
terms of object references the entire way."

But where they have been set to the same object, an operation on one will 
affect the other as long as they are equal (in Python).  So I will have to 
conform them in those cases because Python will reflect any math operation in 
both the array and the matrix.  



Jan 10, 2023, 12:28 by [email protected]:

> On Wed, 11 Jan 2023 at 07:14, Jen Kris via Python-list
>  wrote:
>
>>
>> I am writing a spot speedup in assembly language for a short but 
>> computation-intensive Python loop, and I discovered something about Python 
>> array handling that I would like to clarify.
>>
>> For a simplified example, I created a matrix mx1 and assigned the array arr1 
>> to the third row of the matrix:
>>
>> mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
>> arr1 = mx1[2]
>>
>> The pointers to these are now the same:
>>
>> ida = id(mx1[2]) - 140260325306880
>> idb = id(arr1) - 140260325306880
>>
>> That’s great because when I encounter this in assembly or C, I can just 
>> borrow the pointer to row 3 for the array arr1, on the assumption that they 
>> will continue to point to the same object.  Then when I do any math 
>> operations in arr1 it will be reflected in both arrays because they are now 
>> pointing to the same array:
>>
>
> That's not an optimization; what you've done is set arr1 to be a
> reference to that object.
>
>> But on the next iteration we assign arr1 to something else:
>>
>> arr1 = [ 10, 11, 12 ]
>> idc = id(arr1) – 140260325308160
>> idd = id(mx1[2]) – 140260325306880
>>
>> Now arr1 is no longer equal to mx1[2], and any subsequent operations in arr1 
>> will not affect mx1.
>>
>
> Yep, you have just set arr1 to be a completely different object.
>
>> So where I’m rewriting some Python code in a low level language, I can’t 
>> assume that the two objects are equal because that equality will not remain 
>> if either is reassigned.  So if I do some operation on one array I have to 
>> conform the two arrays for as long as they remain equal, I can’t just do it 
>> in one operation because I can’t rely on the objects remaining equal.
>>
>> Is my understanding of this correct?  Is there anything I’m missing?
>>
>
> Assignment in Python is a matter of object references. It's not
> "conform them as long as they remain equal". You'll have to think in
> terms of object references the entire way.
>
> ChrisA
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: To clarify how Python handles two equal objects

2023-01-10 Thread Jen Kris via Python-list
There are cases where NumPy would be the best choice, but that wasn’t the case 
here with what the loop was doing.  
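As an aside for readers who do reach for NumPy: basic row indexing there returns a view, with the same aliasing behavior discussed in this thread.  A small sketch, assuming NumPy is installed:

import numpy as np

mx = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row = mx[2]                 # a view onto the third row, not a copy
row[0] += 2
print(mx[2])                # [9 8 9] - the change shows through the matrix too
independent = mx[2].copy()  # an explicit copy when aliasing is not wanted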

To sum up what I learned from this post, where one object derives from another 
object (a = b[0], for example), any operation that would alter one will alter 
the other.  When either is assigned to something else, then they no longer 
point to the same memory location and they’re once again independent.   I hope 
the word "derives" sidesteps the semantic issue of whether they are "equal."    

Thanks to all who replied to this post.  

Jen


Jan 10, 2023, 13:59 by [email protected]:

> Just to add a possibly picky detail to what others have said, Python does not 
> have an "array" type.  It has a "list" type, as well as some other, not 
> necessarily mutable, sequence types.
>
> If you want to speed up list and matrix operations, you might use NumPy.  Its 
> arrays and matrices are heavily optimized for fast processing and provide 
> many useful operations on them.  No use calling out to C code yourself when 
> NumPy has been refining that for many years.
>
> On 1/10/2023 4:10 PM, MRAB wrote:
>
>> On 2023-01-10 20:41, Jen Kris via Python-list wrote:
>>
>>>
>>> Thanks for your comments.  I'd like to make one small point.  You say:
>>>
>>> "Assignment in Python is a matter of object references. It's not
>>> "conform them as long as they remain equal". You'll have to think in
>>> terms of object references the entire way."
>>>
>>> But where they have been set to the same object, an operation on one will 
>>> affect the other as long as they are equal (in Python).  So I will have to 
>>> conform them in those cases because Python will reflect any math operation 
>>> in both the array and the matrix.
>>>
>> It's not a 2D matrix, it's a 1D list containing references to 1D lists, each 
>> of which contains references to Python ints.
>>
>> In CPython, references happen to be pointers, but that's just an 
>> implementation detail.
>>
>>>
>>>
>>> Jan 10, 2023, 12:28 by [email protected]:
>>>
>>>> On Wed, 11 Jan 2023 at 07:14, Jen Kris via Python-list
>>>>  wrote:
>>>>
>>>>>
>>>>> I am writing a spot speedup in assembly language for a short but 
>>>>> computation-intensive Python loop, and I discovered something about 
>>>>> Python array handling that I would like to clarify.
>>>>>
>>>>> For a simplified example, I created a matrix mx1 and assigned the array 
>>>>> arr1 to the third row of the matrix:
>>>>>
>>>>> mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
>>>>> arr1 = mx1[2]
>>>>>
>>>>> The pointers to these are now the same:
>>>>>
>>>>> ida = id(mx1[2]) - 140260325306880
>>>>> idb = id(arr1) - 140260325306880
>>>>>
>>>>> That’s great because when I encounter this in assembly or C, I can just 
>>>>> borrow the pointer to row 3 for the array arr1, on the assumption that 
>>>>> they will continue to point to the same object.  Then when I do any math 
>>>>> operations in arr1 it will be reflected in both arrays because they are 
>>>>> now pointing to the same array:
>>>>>
>>>>
>>>> That's not an optimization; what you've done is set arr1 to be a
>>>> reference to that object.
>>>>
>>>>> But on the next iteration we assign arr1 to something else:
>>>>>
>>>>> arr1 = [ 10, 11, 12 ]
>>>>> idc = id(arr1) – 140260325308160
>>>>> idd = id(mx1[2]) – 140260325306880
>>>>>
>>>>> Now arr1 is no longer equal to mx1[2], and any subsequent operations in 
>>>>> arr1 will not affect mx1.
>>>>>
>>>>
>>>> Yep, you have just set arr1 to be a completely different object.
>>>>
>>>>> So where I’m rewriting some Python code in a low level language, I can’t 
>>>>> assume that the two objects are equal because that equality will not 
>>>>> remain if either is reassigned.  So if I do some operation on one array I 
>>>>> have to conform the two arrays for as long as they remain equal, I can’t 
>>>>> just do it in one operation because I can’t rely on the objects remaining 
>>>>> equal.
>>>>>
>>>>> Is my understanding of this correct?  Is there anything I’m missing?
>>>>>
>>>>
>>>> Assignment in Python is a matter of object references. It's not
>>>> "conform them as long as they remain equal". You'll have to think in
>>>> terms of object references the entire way.
>>>>
>>>> ChrisA
>>>> -- 
>>>> https://mail.python.org/mailman/listinfo/python-list
>>>>
>
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: To clarify how Python handles two equal objects

2023-01-11 Thread Jen Kris via Python-list
Yes, I did understand that.  In your example, "a" and "b" are the same pointer, 
so an operation on one is an operation on the other (because they’re the same 
memory block).  My issue in Python came up because Python can dynamically 
change one or the other to a different object (memory block) so I have to be 
aware of that when handing this kind of situation. 


Jan 10, 2023, 17:31 by [email protected]:

> On 11/01/23 11:21 am, Jen Kris wrote:
>
>> where one object derives from another object (a = b[0], for example), any 
>> operation that would alter one will alter the other.
>>
>
> I think you're still confused. In C terms, after a = b[0], a and b[0]
> are pointers to the same block of memory. If you change that block of
> memory, then of course you will see the change through either pointer.
>
> Here's a rough C translation of some of your Python code:
>
> /* mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ] */
> int **mx1 = (int **)malloc(3 * sizeof(int *));
> mx1[0] = (int *)malloc(3 * sizeof(int));
> mx1[0][0] = 1;
> mx1[0][1] = 2;
> mx1[0][2] = 3;
> mx1[1] = (int *)malloc(3 * sizeof(int));
> mx1[1][0] = 4;
> mx1[1][1] = 5;
> mx1[1][2] = 6;
> mx1[2] = (int *)malloc(3 * sizeof(int));
> mx1[2][0] = 7;
> mx1[2][1] = 8;
> mx1[2][2] = 9;
>
> /* arr1 = mx1[2] */
> int *arr1 = mx1[2];
>
> /* arr1 = [ 10, 11, 12 ] */
> arr1 = (int *)malloc(3 * sizeof(int));
> arr1[0] = 10;
> arr1[1] = 11;
> arr1[2] = 12;
>
> Does that help your understanding?
>
> -- 
> Greg
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: To clarify how Python handles two equal objects

2023-01-11 Thread Jen Kris via Python-list
Thanks for your comments.  After all, I asked for clarity so it’s not pedantic 
to be precise, and you’re helping to clarify.  

Going back to my original post,

mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
arr1 = mx1[2]

Now if I write "arr1[1] += 5" then both arr1 and mx1[2][1] will be changed 
because while they are different names, they are assigned the same memory 
location (pointer).  Similarly, if I write "mx1[2][1] += 5" then again both 
names will be updated. 

That’s what I meant by "an operation on one is an operation on the other."  To 
be more precise, an operation on one name will be reflected in the other name.  
The difference is in the names,  not the pointers.  Each name has the same 
pointer in my example, but operations can be done in Python using either name. 
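Concretely, continuing the example from the top of this thread:

mx1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
arr1 = mx1[2]
arr1[1] += 5
print(mx1[2])   # [7, 13, 9] - updated through the name arr1
mx1[2][1] += 5
print(arr1)     # [7, 18, 9] - updated through the name mx1[2]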




Jan 11, 2023, 09:13 by [email protected]:

> Op 11/01/2023 om 16:33 schreef Jen Kris via Python-list:
>
>> Yes, I did understand that.  In your example, "a" and "b" are the same 
>> pointer, so an operation on one is an operation on the other (because 
>> they’re the same memory block).
>>
>
> Sorry if you feel I'm being overly pedantic, but your explanation "an 
> operation on one is an operation on the other (because they’re the same 
> memory block)" still feels a bit misguided. "One" and "other" still make it 
> sound like there are two objects, and "an operation on one" and "an operation 
> on the other" make it sound like there are two operations.
> Sometimes it doesn't matter if we're a bit sloppy for sake of simplicity or 
> convenience, sometimes we really need to be precise. I think this is a case 
> where we need to be precise.
>
> So, to be precise: there is only one object, with possible multiple names to 
> it. We can change the object, using one of the names. That is one and only 
> one operation on one and only one object. Since the different names refer to 
> the same object, that change will of course be visible through all of them.
> Note that 'name' in that sentence doesn't just refer to variables (mx1, arr1, 
> ...) but also things like indexed lists (mx1[0], mx1[[0][0], ...), loop 
> variables, function arguments.
>
> The correct mental model is important here, and I do think you're on track or 
> very close to it, but the way you phrase things does give me that nagging 
> feeling that you still might be just a bit off.
>
> -- 
> "Peace cannot be kept by force. It can only be achieved through 
> understanding."
>  -- Albert Einstein
>
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: To clarify how Python handles two equal objects

2023-01-13 Thread Jen Kris via Python-list

Avi,

Thanks for your comments.  You make a good point. 

Going back to my original question, and using your slice() example: 

middle_by_two = slice(5, 10, 2)
nums = [n for n in range(12)]
q = nums[middle_by_two]
x = id(q)
b = q
y = id(b)

If I assign "b" to "q", then x and y match – they point to the same memory 
until "b" OR "q" are  reassigned to something else.  If "q" changes during the 
lifetime of "b" then it’s not safe to use the pointer to "q" for "b", as in:

nums = [n for n in range(2, 14)]
q = nums[middle_by_two]
x = id(q)
y = id(b)

Now "x" and "y" are different, as we would expect.  So when writing a spot 
speedup in a compiled language, you can see in the Python source if either is 
reassigned, so you’ll know how to handle it.  The motivation behind my question 
was that in a compiled extension it’s faster to borrow a pointer than to move 
an entire array if it’s possible, but special care must be taken. 
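A short demonstration of that caveat, using the same named slice as above:

middle_by_two = slice(5, 10, 2)
nums = [n for n in range(12)]
q = nums[middle_by_two]   # slicing builds a new list: [5, 7, 9]
b = q
print(b is q)             # True: one list, two names
q = nums[middle_by_two]   # rebind q to a fresh slice copy
print(b is q)             # False: b still refers to the old list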

Jen



Jan 12, 2023, 20:51 by [email protected]:

> Jen,
>
> It is dangerous territory you are treading as there are times all or parts of 
> objects are copied, or changed in place or the method you use to make a view 
> is not doing quite what you want.
>
> As an example, you can create a named slice such as:
>
>  middle_by_two = slice(5, 10, 2)
>
> The above is not in any sense pointing at anything yet. But given a long 
> enough list or other such objects, it will take items (starting at index 0) 
> starting with item that are at indices 5 then 7 then 9  as in this:
>
>  nums = [n for n in range(12)]
>  nums[middle_by_two]
>
> [5, 7, 9]
>
> The same slice will work on anything else:
>
>  list('abcdefghijklmnopqrstuvwxyz')[middle_by_two]
> ['f', 'h', 'j']
>
> So although you may think the slice is bound to something, it is not. It is 
> an object that only later is briefly connected to whatever you want to apply 
> it to.
>
> If I later change nums, above, like this:
>
>  nums = [-3, -2, -1] + nums
>  nums
> [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>  nums[middle_by_two]
> [2, 4, 6]
>
> In the example, you can forget about whether we are talking about pointers 
> directly or indirectly or variable names and so on. Your "view" remains valid 
> ONLY as long as you do not change either the slice or the underlying object 
> you are applying to -- at least not the items you want to extract.
>
> Since my example inserted three new items at the start using negative numbers 
> for illustration, you would need to adjust the slice by making a new slice 
> designed to fit your new data. The example below created an adjusted slice 
> that adds 3 to the start and stop settings of the previous slice while 
> copying the step value and then it works on the elongated object:
>
>  middle_by_two_adj = slice(middle_by_two.start + 3, middle_by_two.stop + 3, 
> middle_by_two.step)
>  nums[middle_by_two_adj]
> [5, 7, 9]
>
> A suggestion is  that whenever you are not absolutely sure that the contents 
> of some data structure might change without your participation, then don't 
> depend on various kinds of aliases to keep the contents synchronized. Make a 
> copy, perhaps  a deep copy and make sure the only thing ever changing it is 
> your code and later, if needed, copy the result back to any other data 
> structure. Of course, if anything else is accessing the result in the 
> original in between, it won't work.
>
> Just FYI, a similar analysis applies to uses of the numpy and pandas and 
> other modules if you get some kind of object holding indices to a series such 
> as integers or Booleans and then later try using it after the number of items 
> or rows or columns have changed. Your indices no longer match.
>
> Avi
>
> -Original Message-
> From: Python-list  On 
> Behalf Of Jen Kris via Python-list
> Sent: Wednesday, January 11, 2023 1:29 PM
> To: Roel Schroeven 
> Cc: [email protected]
> Subject: Re: To clarify how Python handles two equal objects
>
> Thanks for your comments.  After all, I asked for clarity so it’s not 
> pedantic to be precise, and you’re helping to clarify. 
>
> Going back to my original post,
>
> mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
> arr1 = mx1[2]
>
> Now if I write "arr1[1] += 5" then both arr1 and mx1[2][1] will be changed 
> because while they are different names, they are the assigned same memory 
> location (pointer).  Similarly, if I write "mx1[2][1] += 5" then again both 
> names will be updated. 
>
> That’s what I meant by "an operation on one is an operation on the other."  
> 

Re: To clarify how Python handles two equal objects

2023-01-13 Thread Jen Kris via Python-list
Bob, 

Your examples show a and b separately defined.  My example is where the 
definition is a=1; b = a.  But I'm only interested in arrays.  I would not rely 
on this for integers, and there's not likely to be any real cost savings there. 
  


Jan 13, 2023, 08:45 by [email protected]:

> It seems to me that the the entire concept of relying on python's idea of 
> where an object is stored is just plain dangerous. A most simple example 
> might be:
>    >>> a=1
>    >>> b=1
>    >>> a is b
>   True
>   >>> a=1234
>   >>> b=1234
>   >>> a is b
>   False
>
> Not sure what happens if you manipulate the data referenced by 'b' in the 
> first example thinking you are changing something referred to by 'a' ... but 
> you might be smart to NOT think that you know.
>
>
>
> On Fri, Jan 13, 2023 at 9:00 AM Jen Kris via Python-list <> 
> [email protected]> > wrote:
>
>>
>> Avi,
>>  
>>  Thanks for your comments.  You make a good point. 
>>  
>>  Going back to my original question, and using your slice() example: 
>>  
>>  middle_by_two = slice(5, 10, 2)
>>  nums = [n for n in range(12)]
>>  q = nums[middle_by_two]
>>  x = id(q)
>>  b = q
>>  y = id(b)
>>  
>>  If I assign "b" to "q", then x and y match – they point to the same memory 
>> until "b" OR "q" are  reassigned to something else.  If "q" changes during 
>> the lifetime of "b" then it’s not safe to use the pointer to "q" for "b", as 
>> in:
>>  
>>  nums = [n for n in range(2, 14)]
>>  q = nums[middle_by_two]
>>  x = id(q)
>>  y = id(b)
>>  
>>  Now "x" and "y" are different, as we would expect.  So when writing a spot 
>> speed up in a compiled language, you can see in the Python source if either 
>> is reassigned, so you’ll know how to handle it.  The motivation behind my 
>> question was that in a compiled extension it’s faster to borrow a pointer 
>> than to move an entire array if it’s possible, but special care must be 
>> taken. 
>>  
>>  Jen
>>  
>>  
>>  
>>  Jan 12, 2023, 20:51 by >> [email protected]>> :
>>  
>>  > Jen,
>>  >
>>  > It is dangerous territory you are treading as there are times all or 
>> parts of objects are copied, or changed in place or the method you use to 
>> make a view is not doing quite what you want.
>>  >
>>  > As an example, you can create a named slice such as:
>>  >
>>  >  middle_by_two = slice(5, 10, 2)
>>  >
>>  > The above is not in any sense pointing at anything yet. But given a long 
>> enough list or other such objects, it will take items (starting at index 0) 
>> starting with item that are at indices 5 then 7 then 9  as in this:
>>  >
>>  >  nums = [n for n in range(12)]
>>  >  nums[middle_by_two]
>>  >
>>  > [5, 7, 9]
>>  >
>>  > The same slice will work on anything else:
>>  >
>>  >  list('abcdefghijklmnopqrstuvwxyz')[middle_by_two]
>>  > ['f', 'h', 'j']
>>  >
>>  > So although you may think the slice is bound to something, it is not. It 
>> is an object that only later is briefly connected to whatever you want to 
>> apply it to.
>>  >
>>  > If I later change nums, above, like this:
>>  >
>>  >  nums = [-3, -2, -1] + nums
>>  >  nums
>>  > [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>  >  nums[middle_by_two]
>>  > [2, 4, 6]
>>  >
>>  > In the example, you can forget about whether we are talking about 
>> pointers directly or indirectly or variable names and so on. Your "view" 
>> remains valid ONLY as long as you do not change either the slice or the 
>> underlying object you are applying to -- at least not the items you want to 
>> extract.
>>  >
>>  > Since my example inserted three new items at the start using negative 
>> numbers for illustration, you would need to adjust the slice by making a new 
>> slice designed to fit your new data. The example below created an adjusted 
>> slice that adds 3 to the start and stop settings of the previous slice while 
>> copying the step value and then it works on the elongated object:
>>  >
>>  >  middle_by_two_adj = slice(middle_by_two.start + 3, middle_by_two.stop + 
>> 3, middle_by_two.step)
>>  >  nums[middle_by_two_adj]
>>  > [5, 7, 9]

RE: To clarify how Python handles two equal objects

2023-01-14 Thread Jen Kris via Python-list
Avi, 

Your comments go farther afield than my original question, but you made some 
interesting additional points.  For example, I sometimes work with the C API 
and sys.getrefcount may be helpful in deciding when to INCREF and DECREF.  But 
that’s another issue. 

The situation I described in my original post is limited to a case such as x = 
y where both "x" and "y" are arrays – whether they are lists in Python, or from 
the array module – and the question in a compiled C extension is whether the 
assignment can be done simply by "x" taking the pointer to "y" rather than 
moving all the data from "y" into the memory buffer for "x" which, for a wide 
array, would be much more time consuming than just moving a pointer.  The other 
advantage to doing it that way is if, as in my case, we perform a math 
operation on any element in "x" then Python expects that the same change to be 
reflected in "y."  If I don’t use the same pointers then I would have to 
perform that operation twice – once for "x" and once  for "y" – in addition to 
the expense of moving all the data. 

The answers I got from this post confirmed that it I can use the pointer if "y" 
is not re-defined to something else during the lifespan of "x."  If it is then 
"x" has to be restored to its original pointer.  I did it that way, and 
helpfully the compiler did not overrule me. 


Jan 13, 2023, 18:41 by [email protected]:

> Jen,
>
> This may not be on target but I was wondering about your needs in this 
> category. Are all your data in a form where all in a cluster are the same 
> object type, such as floating point?
>
> Python has features designed to allow you to get multiple views on such 
> objects such as memoryview that can be used to say see an array as a matrix 
> of n rows by m columns, or m x n, or any other combo. And of course the 
> fuller numpy package has quite a few features.
>
> However, as you note, there is no guarantee that any reference to the data 
> may not shift away from it unless you build fairly convoluted logic or data 
> structures such as having an object that arranges to do something when you 
> try to remove it, such as tinkering with the __del__ method as well as 
> whatever method is used to try to set it to a new value. I guess that might 
> make sense for something like asynchronous programming including when setting 
> locks so multiple things cannot overlap when being done.
>
> Anyway, some of the packages like numpy are optimized in many ways but if you 
> want to pass a subset of sorts to make processing faster, I suspect you could 
> do things like pass a memoryview but it might not be faster than what you 
> build albeit probably more reliable and portable.
>
> I note another odd idea that others may have mentioned, with caution.
>
> If you load the sys module, you can CAREFULLY use code like this.
>
> a="Something Unique"
> sys.getrefcount(a)
> 2
>
> Note if a==1 you will get some huge number of references and this is 
> meaningless. The 2 above is because asking about how many references also 
> references it.
>
> So save what ever number you have and see what happens when you make a second 
> reference or a third, and what happens if you delete or alter a reference:
>
> a="Something Unique"
> sys.getrefcount(a)
> 2
> b = a
> sys.getrefcount(a)
> 3
> sys.getrefcount(b)
> 3
> c = b
> d = a
> sys.getrefcount(a)
> 5
> sys.getrefcount(d)
> 5
> del(a)
> sys.getrefcount(d)
> 4
> b = "something else"
> sys.getrefcount(d)
> 3
>
> So, in theory, you could carefully write your code to CHECK the reference 
> count had not changed but there remain edge cases where a removed reference 
> is replaced by yet another new reference and you would have no idea.
>
> Avi
>
>
> -Original Message-
> From: Python-list  On 
> Behalf Of Jen Kris via Python-list
> Sent: Wednesday, January 11, 2023 1:29 PM
> To: Roel Schroeven 
> Cc: [email protected]
> Subject: Re: To clarify how Python handles two equal objects
>
> Thanks for your comments.  After all, I asked for clarity so it’s not 
> pedantic to be precise, and you’re helping to clarify. 
>
> Going back to my original post,
>
> mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
> arr1 = mx1[2]
>
> Now if I write "arr1[1] += 5" then both arr1 and mx1[2][1] will be changed 
> because while they are different names, they are assigned the same memory 
> location (pointer).  Similarly, if I write "mx1[2][1] += 5" then again both 
> names will be updated. 
>
> That’s what I meant by "an operation on one is an operation on the other."

Re: To clarify how Python handles two equal objects

2023-01-14 Thread Jen Kris via Python-list
Yes, in fact I asked my original question – "I discovered something about 
Python array handling that I would like to clarify" -- because I saw that 
Python did it that way.  



Jan 14, 2023, 15:51 by [email protected]:

> On Sun, 15 Jan 2023 at 10:32, Jen Kris via Python-list
>  wrote:
>
>> The situation I described in my original post is limited to a case such as x 
>> = y ... the assignment can be done simply by "x" taking the pointer to "y" 
>> rather than moving all the data from "y" into the memory buffer for "x"
>>
>
> It's not simply whether it *can* be done. It, in fact, *MUST* be done
> that way. The ONLY meaning of "x = y" is that you now have a name "x"
> which refers to whatever object is currently found under the name "y".
> This is not an optimization, it is a fundamental of Python's object
> model. This is true regardless of what kind of object this is; every
> object must behave this way.
>
> ChrisA
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


How to escape strings for re.finditer?

2023-02-27 Thread Jen Kris via Python-list
When matching a string against a longer string, where both strings have spaces 
in them, we need to escape the spaces.  

This works (no spaces):

import re
example = 'abcdefabcdefabcdefg'
find_string = "abc"
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

That gives me the start and end character positions, which is what I want. 

However, this does not work:

import re
example = re.escape('X - cty_degrees + 1 + qq')
find_string = re.escape('cty_degrees + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

I’ve tried several other attempts based on my reseearch, but still no match. 

I don’t have much experience with regex, so I hoped a reg-expert might help. 

Thanks,

Jen

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-27 Thread Jen Kris via Python-list
Yes, that's it.  I don't know how long it would have taken to find that detail 
with research through the voluminous re documentation.  Thanks very much.  
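For the record, here is the corrected version of the snippet from my first message, escaping only the pattern:

import re
example = 'X - cty_degrees + 1 + qq'        # leave the search text alone
find_string = re.escape('cty_degrees + 1')  # escape only the pattern
for match in re.finditer(find_string, example):
    print(match.start(), match.end())       # prints: 4 19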

Feb 27, 2023, 15:47 by [email protected]:

> On 2023-02-27 23:11, Jen Kris via Python-list wrote:
>
>> When matching a string against a longer string, where both strings have 
>> spaces in them, we need to escape the spaces.
>>
>> This works (no spaces):
>>
>> import re
>> example = 'abcdefabcdefabcdefg'
>> find_string = "abc"
>> for match in re.finditer(find_string, example):
>>      print(match.start(), match.end())
>>
>> That gives me the start and end character positions, which is what I want.
>>
>> However, this does not work:
>>
>> import re
>> example = re.escape('X - cty_degrees + 1 + qq')
>> find_string = re.escape('cty_degrees + 1')
>> for match in re.finditer(find_string, example):
>>      print(match.start(), match.end())
>>
>> I’ve tried several other attempts based on my reseearch, but still no match.
>>
>> I don’t have much experience with regex, so I hoped a reg-expert might help.
>>
> You need to escape only the pattern, not the string you're searching.
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-27 Thread Jen Kris via Python-list

I went to the re module because the specified string may appear more than once 
in the string (in the code I'm writing).  For example:  

a = "X - abc_degree + 1 + qq + abc_degree + 1"
 b = "abc_degree + 1"
 q = a.find(b)

print(q)
4

So it correctly finds the start of the first instance, but not the second one.  
The re code finds both instances.  If I knew that the substring occurred only 
once then the str.find would be best.  

I changed my re code after MRAB's comment, it now works.  

Thanks much.  

Jen


Feb 27, 2023, 15:56 by [email protected]:

> On 28Feb2023 00:11, Jen Kris  wrote:
>
>> When matching a string against a longer string, where both strings have 
>> spaces in them, we need to escape the spaces. 
>>
>> This works (no spaces):
>>
>> import re
>> example = 'abcdefabcdefabcdefg'
>> find_string = "abc"
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> That gives me the start and end character positions, which is what I want. 
>>
>> However, this does not work:
>>
>> import re
>> example = re.escape('X - cty_degrees + 1 + qq')
>> find_string = re.escape('cty_degrees + 1')
>> for match in re.finditer(find_string, example):
>>     print(match.start(), match.end())
>>
>> I’ve tried several other attempts based on my reseearch, but still no match. 
>>
>
> You need to print those strings out. You're escaping the _example_ string, 
> which would make it:
>
>  X - cty_degrees \+ 1 \+ qq
>
> because `+` is a special character in regexps and so `re.escape` escapes it. 
> But you don't want to mangle the string you're searching! After all, the text 
> above does not contain the string `cty_degrees + 1`.
>
> My secondary question is: if you're escaping the thing you're searching 
> _for_, then you're effectively searching for a _fixed_ string, not a 
> pattern/regexp. So why on earth are you using regexps to do your searching?
>
> The `str` type has a `find(substring)` function. Just use that! It'll be 
> faster and the code simpler!
>
> Cheers,
> Cameron Simpson 
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-27 Thread Jen Kris via Python-list

string.count() only tells me there are N instances of the string; it does not 
say where they begin and end, as does re.finditer.  

Feb 27, 2023, 16:20 by [email protected]:

> Would string.count() work for you then?
>
> On Mon, Feb 27, 2023 at 5:16 PM Jen Kris via Python-list <> 
> [email protected]> > wrote:
>
>>
>> I went to the re module because the specified string may appear more than 
>> once in the string (in the code I'm writing).  For example: 
>>  
>>  a = "X - abc_degree + 1 + qq + abc_degree + 1"
>>   b = "abc_degree + 1"
>>   q = a.find(b)
>>  
>>  print(q)
>>  4
>>  
>>  So it correctly finds the start of the first instance, but not the second 
>> one.  The re code finds both instances.  If I knew that the substring 
>> occurred only once then the str.find would be best.  
>>  
>>  I changed my re code after MRAB's comment, it now works.  
>>  
>>  Thanks much.  
>>  
>>  Jen
>>  
>>  
>>  Feb 27, 2023, 15:56 by >> [email protected]>> :
>>  
>>  > On 28Feb2023 00:11, Jen Kris <>> [email protected]>> > wrote:
>>  >
>>  >> When matching a string against a longer string, where both strings have 
>> spaces in them, we need to escape the spaces. 
>>  >>
>>  >> This works (no spaces):
>>  >>
>>  >> import re
>>  >> example = 'abcdefabcdefabcdefg'
>>  >> find_string = "abc"
>>  >> for match in re.finditer(find_string, example):
>>  >>     print(match.start(), match.end())
>>  >>
>>  >> That gives me the start and end character positions, which is what I 
>> want. 
>>  >>
>>  >> However, this does not work:
>>  >>
>>  >> import re
>>  >> example = re.escape('X - cty_degrees + 1 + qq')
>>  >> find_string = re.escape('cty_degrees + 1')
>>  >> for match in re.finditer(find_string, example):
>>  >>     print(match.start(), match.end())
>>  >>
>>  >> I’ve tried several other attempts based on my reseearch, but still no 
>> match. 
>>  >>
>>  >
>>  > You need to print those strings out. You're escaping the _example_ 
>> string, which would make it:
>>  >
>>  >  X - cty_degrees \+ 1 \+ qq
>>  >
>>  > because `+` is a special character in regexps and so `re.escape` escapes 
>> it. But you don't want to mangle the string you're searching! After all, the 
>> text above does not contain the string `cty_degrees + 1`.
>>  >
>>  > My secondary question is: if you're escaping the thing you're searching 
>> _for_, then you're effectively searching for a _fixed_ string, not a 
>> pattern/regexp. So why on earth are you using regexps to do your searching?
>>  >
>>  > The `str` type has a `find(substring)` function. Just use that! It'll be 
>> faster and the code simpler!
>>  >
>>  > Cheers,
>>  > Cameron Simpson <>> [email protected]>> >
>>  > -- 
>>  > >> https://mail.python.org/mailman/listinfo/python-list
>>  >
>>  
>>  -- 
>>  >> https://mail.python.org/mailman/listinfo/python-list
>>
>
>
> -- 
>  Listen to my CD at > http://www.mellowood.ca/music/cedars>  
> Bob van der Poel ** Wynndel, British Columbia, CANADA **
> EMAIL: > [email protected]
> WWW:   > http://www.mellowood.ca
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-27 Thread Jen Kris via Python-list

I haven't tested it either but it looks like it would work.  But for this case 
I prefer the relative simplicity of:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

4 18
26 40

I don't insist on terseness for its own sake, but it's cleaner this way.  
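For completeness, Cameron's plain-str sketch (quoted below, untested there) does work once the loop body is filled in.  A runnable version:

def find_all(s, substring):
    # yield (start, end) for each non-overlapping occurrence
    pos = 0
    while True:
        found = s.find(substring, pos)
        if found < 0:
            break
        yield found, found + len(substring)
        pos = found + len(substring)

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
for start, end in find_all(example, 'abc_degree + 1'):
    print(start, end)   # 4 18, then 26 40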

Jen


Feb 27, 2023, 16:55 by [email protected]:

> On 28Feb2023 01:13, Jen Kris  wrote:
>
>> I went to the re module because the specified string may appear more than 
>> once in the string (in the code I'm writing).
>>
>
> Sure, but writing a `finditer` for plain `str` is pretty easy (untested):
>
>  pos = 0
>  while True:
>  found = s.find(substring, pos)
>  if found < 0:
>  break
>  start = found
>  end = found + len(substring)
>  ... do whatever with start and end ...
>  pos = end
>
> Many people go straight to the `re` module whenever they're looking for 
> strings. It is often cryptic error prone overkill. Just something to keep in 
> mind.
>
> Cheers,
> Cameron Simpson 
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: How to escape strings for re.finditer?

2023-02-28 Thread Jen Kris via Python-list
The code I sent is correct, and it runs here.  Maybe you received it with a 
carriage return removed, but on my copy after posting, it is correct:

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
    print(match.start(), match.end())

One question:  several people have made suggestions other than regex (not your 
terser example with regex you shown below).  Is there a reason why regex is not 
preferred to, for example, a list comp?  Performance?  Reliability?  



  


Feb 27, 2023, 18:16 by [email protected]:

> Jen,
>
> Can you see what SOME OF US see as ASCII text? We can help you better if we 
> get code that can be copied and run as-is.
>
>  What you sent is not terse. It is wrong. It will not run on any python 
> interpreter because you somehow lost a carriage return and indent.
>
> This is what you sent:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1') for match in 
> re.finditer(find_string, example):
>  print(match.start(), match.end())
>
> This is the code indented properly:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1') 
> for match in re.finditer(find_string, example):
>  print(match.start(), match.end())
>
> Of course I am sure you wrote and ran code more like the latter version but 
> somewhere in your copy/paste process the line break was lost.
>
> And, just for fun, since there is nothing wrong with your code, this minor 
> change is terser:
>
>>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>>> for match in re.finditer(re.escape('abc_degree + 1') , example):
>>>>
> ... print(match.start(), match.end())
> ... 
> ... 
> 4 18
> 26 40
>
> But note once you use regular expressions, and not in your case, you might 
> match multiple things that are far from the same such as matching two 
> repeated words of any kind in any case including "and and" and "so so" or 
> finding words that have multiple doubled letter as in the  stereotypical 
> bookkeeper. In those cases, you may want even more than offsets but also show 
> the exact text that matched or even show some characters before and/or after 
> for context.
>
>
> -Original Message-
> From: Python-list  On 
> Behalf Of Jen Kris via Python-list
> Sent: Monday, February 27, 2023 8:36 PM
> To: Cameron Simpson 
> Cc: Python List 
> Subject: Re: How to escape strings for re.finditer?
>
>
> I haven't tested it either but it looks like it would work.  But for this 
> case I prefer the relative simplicity of:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1') for match in 
> re.finditer(find_string, example):
>  print(match.start(), match.end())
>
> 4 18
> 26 40
>
> I don't insist on terseness for its own sake, but it's cleaner this way. 
>
> Jen
>
>
> Feb 27, 2023, 16:55 by [email protected]:
>
>> On 28Feb2023 01:13, Jen Kris  wrote:
>>
>>> I went to the re module because the specified string may appear more than 
>>> once in the string (in the code I'm writing).
>>>
>>
>> Sure, but writing a `finditer` for plain `str` is pretty easy (untested):
>>
>>  pos = 0
>>  while True:
>>      found = s.find(substring, pos)
>>      if found < 0:
>>          break
>>      start = found
>>      end = found + len(substring)
>>      ... do whatever with start and end ...
>>      pos = end
>>
>> Many people go straight to the `re` module whenever they're looking for 
>> strings. It is often cryptic error prone overkill. Just something to keep in 
>> mind.
>>
>> Cheers,
>> Cameron Simpson 
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-28 Thread Jen Kris via Python-list

Using str.startswith is a cool idea in this case.  But is it better than regex 
for performance or reliability?  Regex syntax is not a model of simplicity, but 
in my simple case it's not too difficult.  


Feb 27, 2023, 18:52 by [email protected]:

> On 2/27/2023 9:16 PM, [email protected] wrote:
>
>> And, just for fun, since there is nothing wrong with your code, this minor 
>> change is terser:
>>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> for match in re.finditer(re.escape('abc_degree + 1') , example):
>
>> ... print(match.start(), match.end())
>> ...
>> ...
>> 4 18
>> 26 40
>>
>
> Just for more fun :) -
>
> Without knowing how general your expressions will be, I think the following 
> version is very readable, certainly more readable than regexes:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> KEY = 'abc_degree + 1'
>
> for i in range(len(example)):
>  if example[i:].startswith(KEY):
>  print(i, i + len(KEY))
> # prints:
> 4 18
> 26 40
>
> If you may have variable numbers of spaces around the symbols, OTOH, the 
> whole situation changes and then regexes would almost certainly be the best 
> approach.  But the regular expression strings would become harder to read.
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to escape strings for re.finditer?

2023-02-28 Thread Jen Kris via Python-list

I wrote my previous message before reading this.  Thank you for the test you 
ran -- it answers the question of performance.  You show that re.finditer is 
30x faster, so that certainly recommends that over a simple loop, which 
introduces looping overhead.  


Feb 28, 2023, 05:44 by [email protected]:

> On 2/28/2023 4:33 AM, Roel Schroeven wrote:
>
>> Op 28/02/2023 om 3:44 schreef Thomas Passin:
>>
>>> On 2/27/2023 9:16 PM, [email protected] wrote:
>>>
 And, just for fun, since there is nothing wrong with your code, this minor 
 change is terser:

>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> for match in re.finditer(re.escape('abc_degree + 1') , example):
>>>
 ... print(match.start(), match.end())
 ...
 ...
 4 18
 26 40

>>>
>>> Just for more fun :) -
>>>
>>> Without knowing how general your expressions will be, I think the following 
>>> version is very readable, certainly more readable than regexes:
>>>
>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> KEY = 'abc_degree + 1'
>>>
>>> for i in range(len(example)):
>>>     if example[i:].startswith(KEY):
>>>     print(i, i + len(KEY))
>>> # prints:
>>> 4 18
>>> 26 40
>>>
>> I think it's often a good idea to use a standard library function instead of 
>> rolling your own. The issue becomes less clear-cut when the standard library 
>> doesn't do exactly what you need (as here, where re.finditer() uses regular 
>> expressions while the use case only uses simple search strings). Ideally 
>> there would be a str.finditer() method we could use, but in the absence of 
>> that I think we still need to consider using the almost-but-not-quite 
>> fitting re.finditer().
>>
>> Two reasons:
>>
>> (1) I think it's clearer: the name tells us what it does (though of course 
>> we could solve this in a hand-written version by wrapping it in a suitably 
>> named function).
>>
>> (2) Searching for a string in another string, in a performant way, is not as 
>> simple as it first appears. Your version works correctly, but slowly. In 
>> some situations it doesn't matter, but in other cases it will. For better 
>> performance, string searching algorithms jump ahead either when they found a 
>> match or when they know for sure there isn't a match for some time (see e.g. 
>> the Boyer–Moore string-search algorithm). You could write such a more 
>> efficient algorithm, but then it becomes more complex and more error-prone. 
>> Using a well-tested existing function becomes quite attractive.
>>
>
> Sure, it all depends on what the real task will be.  That's why I wrote 
> "Without knowing how general your expressions will be". For the example 
> string, it's unlikely that speed will be a factor, but who knows what target 
> strings and keys will turn up in the future?
>
>> To illustrate the difference performance, I did a simple test (using the 
>> paragraph above is test text):
>>
>>      import re
>>      import timeit
>>
>>      def using_re_finditer(key, text):
>>      matches = []
>>      for match in re.finditer(re.escape(key), text):
>>      matches.append((match.start(), match.end()))
>>      return matches
>>
>>
>>      def using_simple_loop(key, text):
>>      matches = []
>>      for i in range(len(text)):
>>      if text[i:].startswith(key):
>>      matches.append((i, i + len(key)))
>>      return matches
>>
>>
>>      CORPUS = """Searching for a string in another string, in a performant 
>> way, is
>>      not as simple as it first appears. Your version works correctly, but 
>> slowly.
>>      In some situations it doesn't matter, but in other cases it will. For 
>> better
>>      performance, string searching algorithms jump ahead either when they 
>> found a
>>      match or when they know for sure there isn't a match for some time (see 
>> e.g.
>>      the Boyer–Moore string-search algorithm). You could write such a more
>>      efficient algorithm, but then it becomes more complex and more 
>> error-prone.
>>      Using a well-tested existing function becomes quite attractive."""
>>      KEY = 'in'
>>      print('using_simple_loop:', timeit.repeat(stmt='using_simple_loop(KEY, 
>> CORPUS)', globals=globals(), number=1000))
>>      print('using_re_finditer:', timeit.repeat(stmt='using_re_finditer(KEY, 
>> CORPUS)', globals=globals(), number=1000))
>>
>> This does 5 runs of 1000 repetitions each, and reports the time in seconds 
>> for each of those runs.
>> Result on my machine:
>>
>>      using_simple_loop: [0.1395295020792, 0.1306313000456, 
>> 0.1280345001249, 0.1318618002423, 0.1308461032626]
>>      using_re_finditer: [0.00386140005233, 0.00406190124297, 
>> 0.00347899970256, 0.00341310216218, 0.003732001273]
>>
>> We find that in this test re.finditer() is more than 30 times faster 
>> (despite the overhead of regular expressions.
>>
>> While speed isn't everything in programming, ...

How does a method of a subclass become a method of the base class?

2023-03-26 Thread Jen Kris via Python-list

The base class:


class Constraint(object):

    def __init__(self, strength):
        super(Constraint, self).__init__()
        self.strength = strength

    def satisfy(self, mark):
        global planner
        self.choose_method(mark)

The subclass:

class UrnaryConstraint(Constraint):

    def __init__(self, v, strength):
        super(UrnaryConstraint, self).__init__(strength)
        self.my_output = v
        self.satisfied = False
        self.add_constraint()

    def choose_method(self, mark):
        if self.my_output.mark != mark and \
           Strength.stronger(self.strength, self.my_output.walk_strength):
            self.satisfied = True
        else:
            self.satisfied = False

The base class Constraint doesn’t have a "choose_method" class method, but it’s 
called as self.choose_method(mark) on the final line of Constraint shown above. 

My question is:  what makes "choose_method" a method of the base class, called 
as self.choose_method instead of UrnaryConstraint.choose_method?  Is it 
super(UrnaryConstraint, self).__init__(strength) or just the fact that 
Constraint is its base class? 

Also, this program also has a class BinaryConstraint that is also a subclass of 
Constraint and it also has a choose_method class method that is similar but not 
identical:

def choose_method(self, mark):
    if self.v1.mark == mark:
        if self.v2.mark != mark and \
           Strength.stronger(self.strength, self.v2.walk_strength):
            self.direction = Direction.FORWARD
        else:
            self.direction = Direction.BACKWARD

When called from Constraint, it uses the one at UrnaryConstraint.  How does it 
know which one to use? 

Thanks,

Jen


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How does a method of a subclass become a method of the base class?

2023-03-26 Thread Jen Kris via Python-list
Thanks to Richard Damon and Peter Holzer for your replies.  I'm working through 
the call chain to understand better so I can post a followup question if 
needed.  

Thanks again.

Jen


Mar 26, 2023, 19:21 by [email protected]:

> On 3/26/23 1:43 PM, Jen Kris via Python-list wrote:
>
>> The base class:
>>
>>
>> class Constraint(object):
>>
>> def __init__(self, strength):
>>      super(Constraint, self).__init__()
>>      self.strength = strength
>>
>> def satisfy(self, mark):
>>      global planner
>>      self.choose_method(mark)
>>
>> The subclass:
>>
>> class UrnaryConstraint(Constraint):
>>
>> def __init__(self, v, strength):
>>      super(UrnaryConstraint, self).__init__(strength)
>>      self.my_output = v
>>      self.satisfied = False
>>      self.add_constraint()
>>
>>      def choose_method(self, mark):
>>      if self.my_output.mark != mark and \
>>     Strength.stronger(self.strength, self.my_output.walk_strength):
>> self.satisfied = True
>>      else:
>>      self.satisfied = False
>>
>> The base class Constraint doesn’t have a "choose_method" class method, but 
>> it’s called as self.choose_method(mark) on the final line of Constraint 
>> shown above.
>>
>> My question is:  what makes "choose_method" a method of the base class, 
>> called as self.choose_method instead of UrnaryConstraint.choose_method?  Is 
>> it super(UrnaryConstraint, self).__init__(strength) or just the fact that 
>> Constraint is its base class?
>>
>> Also, this program also has a class BinaryConstraint that is also a subclass 
>> of Constraint and it also has a choose_method class method that is similar 
>> but not identical:
>>
>> def choose_method(self, mark):
>>      if self.v1.mark == mark:
>>      if self.v2.mark != mark and Strength.stronger(self.strength, 
>> self.v2.walk_strength):
>>      self.direction = Direction.FORWARD
>>      else:
>>      self.direction = Direction.BACKWARD
>>
>> When called from Constraint, it uses the one at UrnaryConstraint.  How does 
>> it know which one to use?
>>
>> Thanks,
>>
>> Jen
>>
>
> Perhaps the key point to remember is that when looking up the methods on an 
> object, those methods are part of the object as a whole, not particually 
> "attached" to a given class. When creating the subclass typed object, first 
> the most base class part is built, and all the methods of that class are put 
> into the object, then the next level, and so on, and if a duplicate method is 
> found, it just overwrites the connection. Then when the object is used, we 
> see if there is a method by that name to use, so methods in the base can find 
> methods in subclasses to use.
>
> Perhaps a more modern approach would be to use the concept of an "abstract 
> base" which allows the base to indicate that a derived class needs to define 
> certain abstract methods, (If you need that sort of support, not defining a 
> method might just mean the subclass doesn't support some optional behavior 
> defined by the base)
>
> -- 
> Richard Damon
>
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>
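
(A minimal sketch of the abstract-base idea Richard mentions -- the names here 
are illustrative only, not the DeltaBlue classes:)

    from abc import ABC, abstractmethod

    class Constraint(ABC):
        @abstractmethod
        def choose_method(self, mark):
            """Subclasses must implement this."""

        def satisfy(self, mark):
            self.choose_method(mark)   # resolved on the concrete subclass

    # Instantiating a subclass that forgets choose_method raises TypeError
    # up front, instead of an AttributeError later inside satisfy().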

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How does a method of a subclass become a method of the base class?

2023-03-26 Thread Jen Kris via Python-list

Based on your explanations, I went through the call chain and now I understand 
better how it works, but I have a follow-up question at the end.    

This code comes from the DeltaBlue benchmark in the Python benchmark suite. 

1
The call chain starts in a non-class program with the following call:

EqualityConstraint(prev, v, Strength.REQUIRED)

2
EqualityConstraint is a subclass of BinaryConstraint, so first it calls the 
__init__ method of BinaryConstraint:

    def __init__(self, v1, v2, strength):
        super(BinaryConstraint, self).__init__(strength)
        self.v1 = v1
        self.v2 = v2
        self.direction = Direction.NONE
        self.add_constraint()

3
At the final line shown above it calls add_constraint in the Constraint class, 
the base class of BinaryConstraint:

    def add_constraint(self):
        global planner
        self.add_to_graph()
        planner.incremental_add(self)

4
At planner.incremental_add it calls incremental_add in the Planner class 
because planner is a global instance of the Planner class: 

    def incremental_add(self, constraint):
        mark = self.new_mark()
        overridden = constraint.satisfy(mark)

At the final line it calls "satisfy" in the Constraint class, and that line 
calls choose_method in the BinaryConstraint class.  Just as Peter Holzer said, 
it requires a call to "satisfy." 
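
(A quick way to see which choose_method a given instance will use is to print 
the method resolution order -- a sketch, assuming the class names above:)

    print(EqualityConstraint.__mro__)
    # (<class 'EqualityConstraint'>, <class 'BinaryConstraint'>,
    #  <class 'Constraint'>, <class 'object'>)
    # satisfy() finds choose_method on the first class in this list that
    # defines it -- here, BinaryConstraint.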

My only remaining question is, did it select the choose_method in the 
BinaryConstraint class instead of the choose_method in the UrnaryConstraint 
class because of "super(BinaryConstraint, self).__init__(strength)" in step 2 
above? 

Thanks for helping me clarify that. 

Jen



Mar 26, 2023, 18:55 by [email protected]:

> On 2023-03-26 19:43:44 +0200, Jen Kris via Python-list wrote:
>
>> The base class:
>>
>>
>> class Constraint(object):
>>
> [...]
>
>> def satisfy(self, mark):
>>     global planner
>>     self.choose_method(mark)
>>
>> The subclass:
>>
>> class UrnaryConstraint(Constraint):
>>
> [...]
>
>>     def choose_method(self, mark):
>>     if self.my_output.mark != mark and \
>>    Strength.stronger(self.strength, self.my_output.walk_strength):
>> self.satisfied = True
>>     else:
>>     self.satisfied = False
>>
>> The base class Constraint doesn’t have a "choose_method" class method,
>> but it’s called as self.choose_method(mark) on the final line of
>> Constraint shown above. 
>>
>> My question is:  what makes "choose_method" a method of the base
>> class,
>>
>
> Nothing. choose_method isn't a method of the base class.
>
>> called as self.choose_method instead of
>> UrnaryConstraint.choose_method?  Is it super(UrnaryConstraint,
>> self).__init__(strength) or just the fact that Constraint is its base
>> class? 
>>
>
> This works only if satisfy() is called on a subclass of Constraint which
> actually implements this method.
>
> If you do something like
>
> x = UrnaryConstraint()
> x.satisfy(whatever)
>
> Then x is a member of class UrnaryConstraint and will have a
> choose_method() method which can be called.
>
>
>> Also, this program also has a class BinaryConstraint that is also a
>> subclass of Constraint and it also has a choose_method class method
>> that is similar but not identical:
>>
> ...
>
>> When called from Constraint, it uses the one at UrnaryConstraint.  How
>> does it know which one to use? 
>>
>
> By inspecting self. If you call x.satisfy() on an object of class
> UrnaryConstraint, then self.choose_method will be the choose_method from
> UrnaryConstraint. If you call it on an object of class BinaryConstraint,
> then self.choose_method will be the choose_method from BinaryConstraint.
>
>  hp
>
> PS: Pretty sure there's one "r" too many in UrnaryConstraint.
>
> -- 
>  _  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | [email protected] |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How does a method of a subclass become a method of the base class?

2023-03-26 Thread Jen Kris via Python-list

Cameron, 

Thanks for your reply.  You are correct about the class definition lines – e.g. 
class EqualityConstraint(BinaryConstraint).  I didn’t post all of the code 
because this program is over 600 lines long.  It's DeltaBlue in the Python 
benchmark suite.  

I’ve done some more work since this morning, and now I see what’s happening.  
But it gave rise to another question, which I’ll ask at the end. 

The call chain starts at

    EqualityConstraint(prev, v, Strength.REQUIRED) 

The class EqualityConstraint is a subclass of BinaryConstraint.  The entire 
class code is:

    class EqualityConstraint(BinaryConstraint):
        def execute(self):
            self.output().value = self.input().value

Because EqualityConstraint is a subclass of BinaryConstraint, the init method 
of BinaryConstraint is called first.  During that initialization (I showed the 
call chain in my previous message), it calls choose_method.  When I inspect the 
code at "self.choose_method(mark):" in PyCharm, it shows:

    <bound method BinaryConstraint.choose_method of <EqualityConstraint object at 0x...>>

As EqualityConstraint is a subclass of BinaryConstraint it has bound the choose 
method from BinaryConstraint, apparently during the BinaryConstraint init 
process, and that’s the one it uses.  So that answers my original question. 
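
(A standalone toy showing the same lookup, independent of DeltaBlue -- all 
names here are made up:)

    class Base:
        def satisfy(self):
            return self.choose_method()   # looked up on type(self), not Base

    class Sub(Base):
        def choose_method(self):
            return "Sub.choose_method"

    print(Sub().satisfy())   # prints: Sub.choose_method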

But that brings up a new question.  I can create a class instance with x = 
BinaryConstraint(), but what happens when I have a line like 
"EqualityConstraint(prev, v, Strength.REQUIRED)"? Is it because the only method 
of EqualityConstraint is execute(self)?  Is execute a special function like a 
class __init__?  I’ve done research on that but I haven’t found an answer. 

I’m asking all these question because I have worked in a procedural style for 
many years, with class work limited to only simple classes, but now I’m 
studying classes in more depth. The three answers I have received today, 
including yours, have helped a lot. 

Thanks very much. 

Jen


Mar 26, 2023, 22:45 by [email protected]:

> On 26Mar2023 22:36, Jen Kris  wrote:
>
>> At the final line it calls "satisfy" in the Constraint class, and that line 
>> calls choose_method in the BinaryConstraint class.  Just as Peter Holzer 
>> said, it requires a call to "satisfy." 
>>
>> My only remaining question is, did it select the choose_method in the 
>> BinaryConstraint class instead of the choose_method in the UrnaryConstraint 
>> class because of "super(BinaryConstraint, self).__init__(strength)" in step 
>> 2 above? 
>>
>
> Basicly, no.
>
> You've omitting the "class" lines of the class definitions, and they define 
> the class inheritance, _not "__init__". The "__init__" method just 
> initialises the state of the new objects (which has already been created). 
> The:
>
>  super(BinaryConstraint, self).__init__(strength)
>
> line simply calls the appropriate superclass "__init__" with the "strength" 
> parameter to do that aspect of the initialisation.
>
> You haven't cited the line which calls the "choose_method" method, but I'm 
> imagining it calls "choose_method" like this:
>
>  self.choose_method(...)
>
> That searchs for the "choose_method" method based on the method resolution 
> order of the object "self". So if "self" was an instance of 
> "EqualityConstraint", and I'm guessing abut its class definition, assuming 
> this:
>
>  class EqualityConstraint(BinaryConstraint):
>
> Then a call to "self.choose_method" would look for a "choose_method" method 
> first in the EqualityConstraint class and then via the BinaryConstraint 
> class. I'm also assuming UrnaryConstraint is not in that class ancestry i.e. 
> not an ancestor of BinaryConstraint, for example.
>
> The first method found is used.
>
> In practice, when you define a class like:
>
>  class EqualityConstraint(BinaryConstraint):
>
> the complete class ancestry (the addition classes from which BinaryConstraint 
> inherits) gets flatterned into a "method resultion order" list of classes to 
> inspect in order, and that is stored as the ".__mro__" field on the new class 
> (EqualityConstraint). You can look at it directly as 
> "EqualityConstraint.__mro__".
>
> So looking up:
>
>  self.choose_method()
>
> looks for a "choose_method" method on the classes in "type(self).__mro__".
>
> Cheers,
> Cameron Simpson 
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How does a method of a subclass become a method of the base class?

2023-03-27 Thread Jen Kris via Python-list

Thanks to everyone who answered this question.  Your answers have helped a lot. 
 

Jen


Mar 27, 2023, 14:12 by [email protected]:

> On 3/26/23 17:53, Jen Kris via Python-list wrote:
>
>> I’m asking all these question because I have worked in a procedural style 
>> for many years, with class work limited to only simple classes, but now I’m 
>> studying classes in more depth. The three answers I have received today, 
>> including yours, have helped a lot.
>>
>
> Classes in Python don't work quite like they do in many other languages.
>
> You may find a lightbulb if you listen to Raymond Hettinger talk about them:
>
> https://dailytechvideo.com/raymond-hettinger-pythons-class-development-toolkit/
>
> I'd also advise that benchmarks often do very strange things to set up the 
> scenario they're trying to test, a benchmark sure wouldn't be my first place 
> to look in learning a new piece of Python - I don't know if it was the first 
> place, but thought this was worth a mention.
>
>
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


How to write list of integers to file with struct.pack_into?

2023-10-02 Thread Jen Kris via Python-list

I want to write a list of 64-bit integers to a binary file.  Every example I 
have seen in my research converts it to .txt, but I want it in binary.  I wrote 
this code, based on some earlier work I have done:

buf = bytes((len(qs_array)) * 8)

for offset in range(len(qs_array)):

    item_to_write = bytes(qs_array[offset])

    struct.pack_into(buf, "<Q", offset, item_to_write)

But I get the error "struct.error: embedded null character."

Maybe there's a better way to do this?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to write list of integers to file with struct.pack_into?

2023-10-02 Thread Jen Kris via Python-list
Thanks very much, MRAB.  I just tried that and it works.  What frustrated me is 
that every research example I found writes integers as strings.  That works -- 
sort of -- but it requires re-casting each string to integer when reading the 
file.  If I'm doing binary work I don't want the extra overhead, and it's more 
difficult yet if I'm using the Python integer output in a C program.  Your 
solution solves those problems.  
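
(For the archive, a minimal end-to-end sketch of that approach -- the file 
name and list contents here are made up:)

    import struct

    qs_array = [1, 2, 32894, 2**63]                      # 64-bit values
    buf = struct.pack("<%sQ" % len(qs_array), *qs_array)

    with open("qs_array.bin", "wb") as fh:               # raw little-endian bytes
        fh.write(buf)

    with open("qs_array.bin", "rb") as fh:               # read back as integers
        data = fh.read()
    restored = list(struct.unpack("<%sQ" % (len(data) // 8), data))
    assert restored == qs_array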



Oct 2, 2023, 17:11 by [email protected]:

> On 2023-10-01 23:04, Jen Kris via Python-list wrote:
>
>>
>> I want to write a list of 64-bit integers to a binary file.  Every example I
>> have seen in my research converts it to .txt, but I want it in binary.  I
>> wrote this code, based on some earlier work I have done:
>>
>> buf = bytes((len(qs_array)) * 8)
>>
>> for offset in range(len(qs_array)):
>>
>>     item_to_write = bytes(qs_array[offset])
>>
>>     struct.pack_into(buf, "<Q", offset, item_to_write)
>>
>> But I get the error "struct.error: embedded null character."
>>
>> Maybe there's a better way to do this?
>>
> You can't pack into a 'bytes' object because it's immutable.
>
> The simplest solution I can think of is:
>
> buf = struct.pack("<%sQ" % len(qs_array), *qs_array)
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to write list of integers to file with struct.pack_into?

2023-10-02 Thread Jen Kris via Python-list
Dieter, thanks for your comment that:

* In your code, `offset` is `0`, `1`, `2`, ...
but it should be `0 *8`, `1 * 8`, `2 * 8`, ...

But you concluded with essentially the same solution proposed by MRAB, so that 
would obviate the need to write item by item because it writes the whole buffer 
at once.  
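
(Putting those three corrections together, a loop version would look like the 
sketch below; shown only for comparison with the single pack() call, which 
remains simpler:)

    import struct

    buf = bytearray(len(qs_array) * 8)   # pack_into needs a writable buffer
    for i, value in enumerate(qs_array):
        # format first, then buffer, then the offset in *bytes*, then an int
        struct.pack_into("<Q", buf, i * 8, value)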

Thanks for your help.  


Oct 2, 2023, 17:47 by [email protected]:

> Jen Kris wrote at 2023-10-2 00:04 +0200:
> >I want to write a list of 64-bit integers to a binary file.  Every example I
> >have seen in my research converts it to .txt, but I want it in binary.  I
> >wrote this code, based on some earlier work I have done:
> >
> >buf = bytes((len(qs_array)) * 8)
> >
> >for offset in range(len(qs_array)):
> >    item_to_write = bytes(qs_array[offset])
> >    struct.pack_into(buf, "<Q", offset, item_to_write)
>
> >But I get the error "struct.error: embedded null character."
>
> You made a lot of errors:
>
>  * the signature of `struct.pack_into` is
>  `(format, buffer, offset, v1, v2, ...)`.
>  Especially: `format` is the first, `buffer` the second argument
>
>  * In your code, `offset` is `0`, `1`, `2`, ...
>  but it should be `0 *8`, `1 * 8`, `2 * 8`, ...
>
>  * The `vi` should be something which fits with the format:
>  integers in your case. But you pass bytes.
>
> Try `struct.pack_into("<%sQ" % len(qs_array), buf, 0, *qs_array)` instead of your loop.
>
>
> Next time: carefully read the documentation and think carefully
> about the types involved.
>

-- 
https://mail.python.org/mailman/listinfo/python-list


How to write list of integers to file with struct.pack_into?

2023-10-03 Thread Jen Kris via Python-list
My previous message just went up -- sorry for the mangled formatting.  Here it 
is properly formatted:

I want to write a list of 64-bit integers to a binary file.  Every example I 
have seen in my research converts it to .txt, but I want it in binary.  I wrote 
this code, based on some earlier work I have done:

    buf = bytes((len(qs_array)) * 8)
    for offset in range(len(qs_array)):
        item_to_write = bytes(qs_array[offset])
        struct.pack_into(buf, "<Q", offset, item_to_write)

But I get the error "struct.error: embedded null character."

Maybe there's a better way to do this?
-- 
https://mail.python.org/mailman/listinfo/python-list


Python child process in while True loop blocks parent

2021-11-29 Thread Jen Kris via Python-list
I have a C program that forks to create a child process and uses execv to call 
a Python program.  The Python program communicates with the parent process (in 
C) through a FIFO pipe monitored with epoll().  

The Python child process is in a while True loop, which is intended to keep it 
running while the parent process proceeds, and perform functions for the C 
program only at intervals when the parent sends data to the child -- similar to 
a daemon process. 

The C process writes to its end of the pipe and the child process reads it, but 
then the child process continues to loop, thereby blocking the parent. 

This is the Python code:

#!/usr/bin/python3
import os
import select

#Open the named pipes
pr = os.open('/tmp/Pipe_01', os.O_RDWR)
pw = os.open('/tmp/Pipe_02', os.O_RDWR)

ep = select.epoll(-1)
ep.register(pr, select.EPOLLIN)

while True:

    events = ep.poll(timeout=2.5, maxevents=-1)
    #events = ep.poll(timeout=None, maxevents=-1)

    print("child is looping")

    for fileno, event in events:
    print("Python fileno")
    print(fileno)
    print("Python event")
    print(event)
    v = os.read(pr,64)
    print("Pipe value")
    print(v)

The child process correctly receives the signal from ep.poll and correctly 
reads the data in the pipe, but then it continues looping.  For example, when I 
put in a timeout:

child is looping
Python fileno
4
Python event
1
Pipe value
b'10\x00'
child is looping
child is looping

That suggests that a while True loop is not the right thing to do in this case. 
 My question is, what type of process loop is best for this situation?  The 
multiprocessing, asyncio and subprocess libraries are very extensive, and it 
would help if someone could suggest the best alternative for what I am doing 
here. 

Thanks very much for any ideas. 


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python child process in while True loop blocks parent

2021-11-29 Thread Jen Kris via Python-list
Thanks to you and Cameron for your replies.  The C side has an epoll_ctl set, 
but no event loop to handle it yet.  I'm putting that in now with a pipe write 
in Python-- as Cameron pointed out that is the likely source of blocking on C.  
The pipes are opened as rdwr in Python because that's nonblocking by default.  
The child will become more complex, but not in a way that affects polling.  And 
thanks for the tip about the c-string termination. 



Nov 29, 2021, 14:12 by [email protected]:

>
>
>> On 29 Nov 2021, at 20:36, Jen Kris via Python-list  
>> wrote:
>>
>> I have a C program that forks to create a child process and uses execv to 
>> call a Python program.  The Python program communicates with the parent 
>> process (in C) through a FIFO pipe monitored with epoll(). 
>>
>> The Python child process is in a while True loop, which is intended to keep 
>> it running while the parent process proceeds, and perform functions for the 
>> C program only at intervals when the parent sends data to the child -- 
>> similar to a daemon process. 
>>
>> The C process writes to its end of the pipe and the child process reads it, 
>> but then the child process continues to loop, thereby blocking the parent. 
>>
>> This is the Python code:
>>
>> #!/usr/bin/python3
>> import os
>> import select
>>
>> #Open the named pipes
>> pr = os.open('/tmp/Pipe_01', os.O_RDWR)
>>
> Why open rdwr if you are only going to read the pipe?
>
>> pw = os.open('/tmp/Pipe_02', os.O_RDWR)
>>
> Only need to open for write.
>
>>
>> ep = select.epoll(-1)
>> ep.register(pr, select.EPOLLIN)
>>
>
> Is the only thing that the child does this:
> 1. Read message from pr
> 2. Process message
> 3. Write result to pw.
> 4. Loop from 1
>
> If so as Cameron said you do not need to worry about the poll.
> Do you plan for the child to become more complex?
>
>>
>> while True:
>>
>>  events = ep.poll(timeout=2.5, maxevents=-1)
>>  #events = ep.poll(timeout=None, maxevents=-1)
>>
>>  print("child is looping")
>>
>>  for fileno, event in events:
>>  print("Python fileno")
>>  print(fileno)
>>  print("Python event")
>>  print(event)
>>  v = os.read(pr,64)
>>  print("Pipe value")
>>  print(v)
>>
>> The child process correctly receives the signal from ep.poll and correctly 
>> reads the data in the pipe, but then it continues looping.  For example, 
>> when I put in a timeout:
>>
>> child is looping
>> Python fileno
>> 4
>> Python event
>> 1
>> Pipe value
>> b'10\x00'
>>
> The C code does not need to write a 0 bytes at the end.
> I assume the 0 is from the end of a C string.
> UDS messages have a length.
> In the C just write 2 byes in the case.
>
> Barry
>
>> child is looping
>> child is looping
>>
>> That suggests that a while True loop is not the right thing to do in this 
>> case.  My question is, what type of process loop is best for this situation? 
>>  The multiprocessing, asyncio and subprocess libraries are very extensive, 
>> and it would help if someone could suggest the best alternative for what I 
>> am doing here. 
>>
>> Thanks very much for any ideas. 
>>
>>
>> -- 
>> https://mail.python.org/mailman/listinfo/python-list
>>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python child process in while True loop blocks parent

2021-12-01 Thread Jen Kris via Python-list
Thanks for your comment re blocking.  

I removed pipes from the Python and C programs to see if it blocks without 
them, and it does.  It looks now like the problem is not pipes.  I use fork() 
and execv() in C to run Python in a child process, but the Python process 
blocks because fork() does not create a new thread, so the Python global 
interpreter lock (GIL) prevents the C program from running once Python starts.  
So the solution appears to be run Python in a separate thread, which I can do 
with pthread create.  See "Thread State and the Global Interpreter Lock" 
https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock
 and the sections below that "Non-Python created threads" and "Cautions about 
fork()." 

I'm working on that today and I hope all goes well :) 



Nov 30, 2021, 11:42 by [email protected]:

>
>
>
>> On 29 Nov 2021, at 22:31, Jen Kris <[email protected]> wrote:
>>
>> Thanks to you and Cameron for your replies.  The C side has an epoll_ctl 
>> set, but no event loop to handle it yet.  I'm putting that in now with a 
>> pipe write in Python-- as Cameron pointed out that is the likely source of 
>> blocking on C.  The pipes are opened as rdwr in Python because that's 
>> nonblocking by default.  The child will become more complex, but not in a 
>> way that affects polling.  And thanks for the tip about the c-string 
>> termination. 
>>
>>
>
> flags is a bit mask. You say its BLOCKing by not setting os.O_NONBLOCK.
> You should not use O_RDWR when you only need O_RDONLY access or only O_WRONLY 
> access.
>
> You may find
>
> man 2 open
>
> useful to understand in detail what is behind os.open().
>
> Barry
>
>
>
>
>>
>>
>> Nov 29, 2021, 14:12 by [email protected]:
>>
>>>
>>>
>>>> On 29 Nov 2021, at 20:36, Jen Kris via Python-list <[email protected]> wrote:
>>>>
>>>> I have a C program that forks to create a child process and uses execv to 
>>>> call a Python program.  The Python program communicates with the parent 
>>>> process (in C) through a FIFO pipe monitored with epoll(). 
>>>>
>>>> The Python child process is in a while True loop, which is intended to 
>>>> keep it running while the parent process proceeds, and perform functions 
>>>> for the C program only at intervals when the parent sends data to the 
>>>> child -- similar to a daemon process. 
>>>>
>>>> The C process writes to its end of the pipe and the child process reads 
>>>> it, but then the child process continues to loop, thereby blocking the 
>>>> parent. 
>>>>
>>>> This is the Python code:
>>>>
>>>> #!/usr/bin/python3
>>>> import os
>>>> import select
>>>>
>>>> #Open the named pipes
>>>> pr = os.open('/tmp/Pipe_01', os.O_RDWR)
>>>>
>>> Why open rdwr if you are only going to read the pipe?
>>>
>>>> pw = os.open('/tmp/Pipe_02', os.O_RDWR)
>>>>
>>> Only need to open for write.
>>>
>>>>
>>>> ep = select.epoll(-1)
>>>> ep.register(pr, select.EPOLLIN)
>>>>
>>>
>>> Is the only thing that the child does this:
>>> 1. Read message from pr
>>> 2. Process message
>>> 3. Write result to pw.
>>> 4. Loop from 1
>>>
>>> If so as Cameron said you do not need to worry about the poll.
>>> Do you plan for the child to become more complex?
>>>
>>>>
>>>> while True:
>>>>
>>>> events = ep.poll(timeout=2.5, maxevents=-1)
>>>> #events = ep.poll(timeout=None, maxevents=-1)
>>>>
>>>> print("child is looping")
>>>>
>>>> for fileno, event in events:
>>>> print("Python fileno")
>>>> print(fileno)
>>>> print("Python event")
>>>> print(event)
>>>> v = os.read(pr,64)
>>>> print("Pipe value")
>>>> print(v)
>>>>
>>>> The child process correctly receives the signal from ep.poll and correctly 
>>>> reads the data in the pipe, but then it continues looping.  For example, 
>>>> when I put in a timeout:
>>>>
>>>> child is looping
>>>> Python fileno
>>>> 4
>>>> Python event
>>>> 1
>>>> Pipe value
>>>> b'10\x00'
>>>>
>>> The C code does not need to write a 0 bytes at the end.
>>> I assume the 0 is from the end of a C string.
>>> UDS messages have a length.
>>> In the C just write 2 byes in the case.
>>>
>>> Barry
>>>
>>>> child is looping
>>>> child is looping
>>>>
>>>> That suggests that a while True loop is not the right thing to do in this 
>>>> case.  My question is, what type of process loop is best for this 
>>>> situation?  The multiprocessing, asyncio and subprocess libraries are very 
>>>> extensive, and it would help if someone could suggest the best alternative 
>>>> for what I am doing here. 
>>>>
>>>> Thanks very much for any ideas. 
>>>>
>>>>
>>>> -- 
>>>> https://mail.python.org/mailman/listinfo/python-list
>>>>
>>
>>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python child process in while True loop blocks parent

2021-12-05 Thread Jen Kris via Python-list
Thanks for your comments.  

I put the Python program on its own pthread, and call a small C program to 
fork-execv to call the Python program as a child process.  I revised the Python 
program to be a multiprocessing loop using the Python multiprocessing module.  
That bypasses the GIL and allows Python to run concurrently with C.  So far so 
good.  

Next I will use Linux pipes, not Python multiprocessing pipes, for IPC between 
Python and C.  Multiprocessing pipes are (as far as I can tell) only for commo 
between two Python processes.  I will have the parent thread send a signal 
through the pipe to the child process to exit when the parent thread is ready 
to exit, then call wait() to finalize the child process.  

I will reply back when it's finished and post the code so you can see what I 
have done.  

Thanks again.  

Jen


Dec 4, 2021, 09:22 by [email protected]:

>
>
>> On 1 Dec 2021, at 16:01, Jen Kris <[email protected]> wrote:
>>
>> Thanks for your comment re blocking.  
>>
>> I removed pipes from the Python and C programs to see if it blocks without 
>> them, and it does.
>>
>> It looks now like the problem is not pipes.
>>
>
> Ok.
>
>
>> I use fork() and execv() in C to run Python in a child process, but the 
>> Python process blocks
>>
>
> Use strace on the parent process to see what is happening.
> You will need to use the option to follow subprocesses so that you can see 
> what goes on in the python process.
>
> See man strace and the --follow-forks and --output-separately options.
> That will allow you to find the blocking system call that your code is making.
>
>
>> because fork() does not create a new thread, so the Python global 
>> interpreter lock (GIL) prevents the C program from running once Python 
>> starts.
>>
>
> Not sure why you think this.
>
>
>>   So the solution appears to be run Python in a separate thread, which I can 
>> do with pthread create.
>>
>>   See "Thread State and the Global Interpreter Lock" >> 
>> https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock>>
>>   and the sections below that "Non-Python created threads" and "Cautions 
>> about fork()." 
>>
>
> I take it you mean that in the parent you think that using pthreads will 
> affect python after the exec() call?
> I does not. After exec() the process has one main thread create by the kernel 
> and a new address space as defined by the /usr/bin/python.
> The only state that in inherited from the parent are open file descriptors, 
> the current working directory and security state like UID, GID.
>
>
>> I'm working on that today and I hope all goes well :) 
>>
>
> You seem to be missing background information on how processes work.
> Maybe "Advanced Programming in the UNIX Environment" > would be helpful?
>
> https://www.amazon.co.uk/Programming-Environment-Addison-Wesley-Professional-Computing-dp-0321637739/dp/0321637739/ref=dp_ob_image_bk
>   
>
> It's a great book and covers a wide range of Unix systems programming topics.
>
> Have you created a small C program that just does the fork and exec of a 
> python program to test out your assumptions?
> If not I recommend that you do.
>
> Barry
>
>
>
>>
>>
>>
>> Nov 30, 2021, 11:42 by [email protected]:
>>
>>>
>>>
>>>
>>>> On 29 Nov 2021, at 22:31, Jen Kris <[email protected]> wrote:
>>>>
>>>> Thanks to you and Cameron for your replies.  The C side has an epoll_ctl 
>>>> set, but no event loop to handle it yet.  I'm putting that in now with a 
>>>> pipe write in Python-- as Cameron pointed out that is the likely source of 
>>>> blocking on C.  The pipes are opened as rdwr in Python because that's 
>>>> nonblocking by default.  The child will become more complex, but not in a 
>>>> way that affects polling.  And thanks for the tip about the c-string 
>>>> termination. 
>>>>
>>>>
>>>
>>> flags is a bit mask. You say its BLOCKing by not setting os.O_NONBLOCK.
>>> You should not use O_RDWR when you only need O_RDONLY access or only 
>>> O_WRONLY access.
>>>
>>> You may find
>>>
>>> man 2 open
>>>
>>> useful to understand in detail what is behind os.open().
>>>
>>> Barry
>>>
>>>
>>>
>>>
>>>>
>>>>
>>>> ...

Re: Python child process in while True loop blocks parent

2021-12-05 Thread Jen Kris via Python-list
By embedding, I think you may be referring to embedding Python in a C program 
with the Python C API.   That's not what I'm doing here -- I'm not using the 
Python C API.  The C program creates two threads (using pthreads), one for 
itself and one for the child process.  On creation, the second pthread is 
pointed to a C program that calls fork-execv to run the Python program.  That 
way Python runs on a separate thread.  The multiprocessing library "effectively 
side-step[s] the Global Interpreter Lock by using subprocesses instead of 
threads."  https://docs.python.org/3/library/multiprocessing.html.  This way I 
can get the Python functionality I want on call from the C program through 
pipes and shared memory.  

I don't want to use the C API because I will be making certain library calls 
from the C program, and the syntax is much easier with native Python code than 
with C API code. 

I hope that clarifies what I'm doing. 

Jen



Dec 5, 2021, 15:19 by [email protected]:

>
>
>
>
>> On 5 Dec 2021, at 17:54, Jen Kris  wrote:
>>
>>   
>> Thanks for your comments.  
>>
>> I put the Python program on its own pthread, and call a small C program to 
>> fork-execv to call the Python program as a child process. 
>>
>
> What do you mean by putting python in it’s own pthread?
> Are you embedding python in an other program?
>
> Barry
>
>
>
>> I revised the Python program to be a multiprocessing loop using the Python 
>> multiprocessing module.  That bypasses the GIL and allows Python to run 
>> concurrently with C.  So far so good.  
>>
>> Next I will use Linux pipes, not Python multiprocessing pipes, for IPC 
>> between Python and C.  Multiprocessing pipes are (as far as I can tell) only 
>> for commo between two Python processes.  I will have the parent thread send 
>> a signal through the pipe to the child process to exit when the parent 
>> thread is ready to exit, then call wait() to finalize the child process.  
>>
>> I will reply back when it's finished and post the code so you can see what I 
>> have done.  
>>
>> Thanks again.  
>>
>> Jen
>>
>>
>> Dec 4, 2021, 09:22 by [email protected]:
>>
>>>
>>>
>>>> On 1 Dec 2021, at 16:01, Jen Kris <[email protected]> wrote:
>>>>
>>>> Thanks for your comment re blocking.  
>>>>
>>>> I removed pipes from the Python and C programs to see if it blocks without 
>>>> them, and it does.
>>>>
>>>> It looks now like the problem is not pipes.
>>>>
>>>
>>> Ok.
>>>
>>>
>>>> I use fork() and execv() in C to run Python in a child process, but the 
>>>> Python process blocks
>>>>
>>>
>>> Use strace on the parent process to see what is happening.
>>> You will need to use the option to follow subprocesses so that you can see 
>>> what goes on in the python process.
>>>
>>> See man strace and the --follow-forks and --output-separately options.
>>> That will allow you to find the blocking system call that your code is 
>>> making.
>>>
>>>
>>>> because fork() does not create a new thread, so the Python global 
>>>> interpreter lock (GIL) prevents the C program from running once Python 
>>>> starts.
>>>>
>>>
>>> Not sure why you think this.
>>>
>>>
>>>>   So the solution appears to be run Python in a separate thread, which I 
>>>> can do with pthread create.
>>>>
>>>>   See "Thread State and the Global Interpreter Lock" >>>> 
>>>> https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock>>>>
>>>>   and the sections below that "Non-Python created threads" and "Cautions 
>>>> about fork()." 
>>>>
>>>
>>> I take it you mean that in the parent you think that using pthreads will 
>>> affect python after the exec() call?
>>> I does not. After exec() the process has one main thread create by the 
>>> kernel and a new address space as defined by the /usr/bin/python.
>>> The only state that in inherited from the parent are open file descriptors, 
>>> the current working directory and security state like UID, GID.
>>>
>>>
>>>> I'm working on that today and I hope all goes well :) 
>>>>
>>>
>>> You seem to be missing background information on how processes work. ...

Re: Python child process in while True loop blocks parent

2021-12-06 Thread Jen Kris via Python-list
I can't find any support for your comment that "Fork creates a new
process and therefore also a new thread."  From the Linux man pages 
https://www.man7.org/linux/man-pages/man2/fork.2.html, "The child process is 
created with a single thread—the one that called fork()." 

I have a one-core one-thread instance at Digital Ocean available running Ubuntu 
18.04.  I can fork and create a new process on it, but it doesn't create a new 
thread because it doesn't have one available. 

You may also want to see "Forking vs Threading" 
(https://www.geekride.com/fork-forking-vs-threading-thread-linux-kernel), "Fork 
vs Thread" (https://medium.com/obscure-system/fork-vs-thread-38e09ec099e2), and 
"Linux process and thread" (https://zliu.org/post/linux-process-and-thread) 
("This means that to create a normal process fork() is used that further calls 
clone() with appropriate arguments while to create a thread or LWP, a function 
from pthread library calls clone() with relvant flags. So, the main difference 
is generated by using different flags that can be passed to clone() funciton(to 
be exact, it is a system call"). 

You may be confused by the fact that threads are called light-weight processes. 

Or maybe I'm confused :)

If you have other information, please let me know.  Thanks. 

Jen


Dec 5, 2021, 18:08 by [email protected]:

> On 2021-12-06 00:51:13 +0100, Jen Kris via Python-list wrote:
>
>> The C program creates two threads (using pthreads), one for itself and
>> one for the child process.  On creation, the second pthread is pointed
>> to a C program that calls fork-execv to run the Python program.  That
>> way Python runs on a separate thread. 
>>
>
> I think you have the relationship between processes and threads
> backwards. A process consists of one or more threads. Fork creates a new
> process and therefore also a new thread.
>
>  hp
>
> -- 
>  _  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | [email protected] |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python child process in while True loop blocks parent

2021-12-06 Thread Jen Kris via Python-list
Here is what I don't understand from what you said.  "The child process is 
created with a single thread—the one that called fork()."  To me that implies 
that the thread that called fork() is the same thread as the child process.  I 
guess you're talking about the distinction between logical threads and physical 
threads.  

But the main issue is your suggestion that I should call fork-execv from the 
thread that runs the main C program, not from a separate physical pthread.  
That would certainly eliminate the overhead of creating a new pthread. 

I am working now to finish this, and I will try your suggestion of calling 
fork-execv from the "main" thread.  When I reply back next I can give you a 
complete picture of what I'm doing. 

Your comments, and those of Peter Holzer and Chris Angelico, are most 
appreciated. 




Dec 6, 2021, 10:37 by [email protected]:

>
>
>> On 6 Dec 2021, at 17:09, Jen Kris via Python-list  
>> wrote:
>>
>> I can't find any support for your comment that "Fork creates a new
>> process and therefore also a new thread."  From the Linux man pages 
>> https://www.man7.org/linux/man-pages/man2/fork.2.html, "The child process is 
>> created with a single thread—the one that called fork()."
>>
>
> You just quoted the evidence!
>
> All new processes on unix (may all OS) only ever have one thread when they 
> start.
> The thread-id of the first thread is the same as the process-id and referred 
> to as the main thread.
>
>>
>> I have a one-core one-thread instance at Digital Ocean available running 
>> Ubuntu 18.04.  I can fork and create a new process on it, but it doesn't 
>> create a new thread because it doesn't have one available.
>>
>
>
> By that logic it can only run one process...
>
> It has one hardware core that support one hardware thread.
> Linux can create as many software threads as it likes.
>
>> You may also want to see "Forking vs Threading" 
>> (https://www.geekride.com/fork-forking-vs-threading-thread-linux-kernel), 
>> "Fork vs Thread" 
>> (https://medium.com/obscure-system/fork-vs-thread-38e09ec099e2), and "Linux 
>> process and thread" (https://zliu.org/post/linux-process-and-thread) ("This 
>> means that to create a normal process fork() is used that further calls 
>> clone() with appropriate arguments while to create a thread or LWP, a 
>> function from pthread library calls clone() with relvant flags. So, the main 
>> difference is generated by using different flags that can be passed to 
>> clone() funciton(to be exact, it is a system call"). 
>>
>> You may be confused by the fact that threads are called light-weight 
>> processes.
>>
>
> No Peter and I are not confused.
>
>>
>> Or maybe I'm confused :)
>>
>
> Yes you are confused.
>
>>
>> If you have other information, please let me know.  Thanks.
>>
>
> Please get the book I recommended, or another that covers systems programming 
> on unix, and have a read.
>
> Barry
>
>>
>> Jen
>>
>>
>> Dec 5, 2021, 18:08 by [email protected]:
>>
>>> On 2021-12-06 00:51:13 +0100, Jen Kris via Python-list wrote:
>>>
>>>> The C program creates two threads (using pthreads), one for itself and
>>>> one for the child process.  On creation, the second pthread is pointed
>>>> to a C program that calls fork-execv to run the Python program.  That
>>>> way Python runs on a separate thread. 
>>>>
>>>
>>> I think you have the relationship between processes and threads
>>> backwards. A process consists of one or more threads. Fork creates a new
>>> process and therefore also a new thread.
>>>
>>> hp
>>>
>>> -- 
>>> _  | Peter J. Holzer| Story must make more sense than reality.
>>> |_|_) ||
>>> | |   | [email protected] |-- Charles Stross, "Creative writing
>>> __/   | http://www.hjp.at/ |   challenge!"
>>>
>>
>> -- 
>> https://mail.python.org/mailman/listinfo/python-list
>>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python child process in while True loop blocks parent

2021-12-08 Thread Jen Kris via Python-list
I started this post on November 29, and there have been helpful comments since 
then from Barry Scott, Cameron Simpson, Peter Holzer and Chris Angelico.  
Thanks to all of you.  

I've found a solution that works for my purpose, and I said earlier that I 
would post the solution I found. If anyone has a better solution I would 
appreciate any feedback. 

To recap, I'm using a pair of named pipes for IPC between C and Python.  Python 
runs as a child process after fork-execv.  The Python program continues to run 
concurrently in a while True loop, and responds to requests from C at 
intervals, and continues to run until it receives a signal from C to exit.  C 
sends signals to Python, then waits to receive data back from Python.  My 
problem was that C was blocked when Python started. 

The solution was twofold:  (1) for Python to run concurrently it must be a 
multiprocessing loop (from the multiprocessing module), and (2) Python must 
terminate its write strings with \n, or read will block in C waiting for 
something that never comes.  The multiprocessing module sidesteps the GIL; 
without multiprocessing the GIL will block all other threads once Python 
starts. 

Originally I used epoll() on the pipes.  Cameron Smith and Barry Scott advised 
against epoll, and for this case they are right.  Blocking pipes work here, and 
epoll is too much overhead for watching on a single file descriptor. 

This is the Python code now:

#!/usr/bin/python3
from multiprocessing import Process
import os

print("Python is running")

child_pid = os.getpid()
print('child process id:', child_pid)

def f(a, b):

    print("Python now in function f")

    pr = os.open('/tmp/Pipe_01', os.O_RDONLY)
    print("File Descriptor1 Opened " + str(pr))
    pw = os.open('/tmp/Pipe_02', os.O_WRONLY)
    print("File Descriptor2 Opened " + str(pw))

    while True:

        v = os.read(pr, 64)
        print("Python read from pipe pr")
        print(v)

        if v == b'99':
            os.close(pr)
            os.close(pw)
            print("Python is terminating")
            os._exit(os.EX_OK)

        if v != b"Send child PID":   # compare bytes to bytes, not str
            os.write(pw, b"OK message received\n")
            print("Python wrote back")

if __name__ == '__main__':
    a = 0
    b = 0
    p = Process(target=f, args=(a, b,))
    p.start()
    p.join()

The variables a and b are not currently used in the body, but they will be 
later. 

This is the part of the C code that communicates with Python:

    fifo_fd1 = open(fifo_path1, O_WRONLY);
    fifo_fd2 = open(fifo_path2, O_RDONLY);

    status_write = write(fifo_fd1, py_msg_01, sizeof(py_msg_01));
    if (status_write < 0) perror("write");

    status_read = read(fifo_fd2, fifo_readbuf, sizeof(py_msg_01));
    if (status_read < 0) perror("read");
    printf("C received message 1 from Python\n");
    printf("%.*s",(int)buf_len, fifo_readbuf);

    status_write = write(fifo_fd1, py_msg_02, sizeof(py_msg_02));
    if (status_write < 0) perror("write");

    status_read = read(fifo_fd2, fifo_readbuf, sizeof(py_msg_02));
    if (status_read < 0) perror("read");
    printf("C received message 2 from Python\n");
    printf("%.*s",(int)buf_len, fifo_readbuf);

    // Terminate Python multiprocessing
    printf("C is sending exit message to Python\n");
    status_write = write(fifo_fd1, py_msg_03, 2);

    printf("C is closing\n");
    close(fifo_fd1);
    close(fifo_fd2);

Screen output:

Python is running
child process id: 5353
Python now in function f
File Descriptor1 Opened 6
Thread created 0
File Descriptor2 Opened 7
Process ID: 5351
Parent Process ID: 5351
I am the parent
Core joined 0
I am the child
Python read from pipe pr
b'Hello to Python from C\x00\x00'
Python wrote back
C received message 1 from Python
OK message received
Python read from pipe pr
b'Message to Python 2\x00\x00'
Python wrote back
C received message 2 from Python
OK message received
C is sending exit message to Python
C is closing
Python read from pipe pr
b'99'
Python is terminating

Python runs on a separate thread (created with pthreads) because I want the 
flexibility of using this same basic code as a stand-alone .exe, or for a C 
extension from Python called with ctypes.  If I use it as a C extension then I 
want the Python code on a separate thread because I can't have two instances of 
the Python interpreter running on one thread, and one instance will already be 
running on the main thread, albeit "suspended" by the call from ctypes. 

So that's my solution:  (1) Python multiprocessing module; (2) Python strings 
written to the pipe must be terminated with \n. 

Thanks again to all who commented. 



Dec 6, 2021, 13:33 by ba...

Data unchanged when passing data to Python in multiprocessing shared memory

2022-02-01 Thread Jen Kris via Python-list
I am using multiprocesssing.shared_memory to pass data between NASM and Python. 
 The shared memory is created in NASM before Python is called.  Python connects 
to the shm:  shm_00 = 
shared_memory.SharedMemory(name='shm_object_00',create=False).  

I have used shared memory at other points in this project to pass text data 
from Python back to NASM with no problems.  But now this time I need to pass a 
32-bit integer (specifically 32,894) from NASM to Python. 

First I convert the integer to bytes in a C program linked into NASM:

    unsigned char bytes[4];
    unsigned long int_to_convert = 32894;

    bytes[0] = (int_to_convert >> 24) & 0xFF;
    bytes[1] = (int_to_convert >> 16) & 0xFF;
    bytes[2] = (int_to_convert >> 8) & 0xFF;
    bytes[3] = int_to_convert & 0xFF;
    memcpy(outbuf, bytes, 4);

where outbuf is a pointer to the shared memory.  On return from C to NASM, I 
verify that the first four bytes of the shared memory contain what I want, and 
they are 0, 0, -128, 126 which is binary 00000000 00000000 10000000 01111110, 
and that's correct (32,894). 
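
For comparison, the same conversion on the Python side -- a minimal sketch 
using only the standard library, not part of my NASM build:

import struct

int_to_convert = 32894

# Big-endian, 4 bytes -- the same layout the C code above produces: 00 00 80 7E
big_bytes = struct.pack('>I', int_to_convert)
print(list(big_bytes))                            # [0, 0, 128, 126]

# int.to_bytes gives the identical result
assert big_bytes == int_to_convert.to_bytes(4, 'big')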

Next I send a message to Python through a FIFO to read the data from shared 
memory.  Python uses the following code to read the first four bytes of the 
shared memory:

    byte_val = shm_00.buf[:4]
    print(shm_00.buf[0])
    print(shm_00.buf[1])
    print(shm_00.buf[2])
    print(shm_00.buf[3])

But the bytes show as 40 39 96 96, which is exactly what the first four bytes 
of this shared memory contained before I called C to overwrite them with the 
bytes 0, 0, -128, 126.  So Python does not see the updated bytes, and naturally 
int.from_bytes(byte_val, "little") does not return the result I want. 

I know that Python refers to shm_00.buf using the buffer protocol.  Is that the 
reason that Python can't see the data that has been updated by another 
language? 

So my question is, how can I alter the data in shared memory in a non-Python 
language to pass back to Python? 
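
For reference, this is how I expect to read the value back on the Python side 
once the updated bytes are visible -- a minimal sketch, assuming the segment 
exists under the name shown above:

import struct
from multiprocessing import shared_memory

# Attach to the existing segment created on the NASM side (create=False)
shm_00 = shared_memory.SharedMemory(name='shm_object_00', create=False)

# '>I' = big-endian unsigned 32-bit, matching the byte order the C code wrote
value = struct.unpack_from('>I', shm_00.buf, 0)[0]
print(value)                                      # expected: 32894

shm_00.close()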

Thanks,

Jen

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Data unchanged when passing data to Python in multiprocessing shared memory

2022-02-01 Thread Jen Kris via Python-list
Barry, thanks for your reply.  

On the theory that it is not yet possible to pass data from a non-Python 
language to Python with multiprocessing.shared_memory, I bypassed the problem 
by attaching 4 bytes to my FIFO pipe message from NASM to Python:

byte_val = v[10:14]

where v is the message read from the FIFO.  Then:

breakup = int.from_bytes(byte_val, "big")
print("this is breakup " + str(breakup))

Python prints:  this is breakup 32894

Note that I had to switch from little endian to big endian.  My machine is 
little endian, but the C code filled the buffer most-significant byte first, so 
in this case the data is big endian.  
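
A minimal sketch of the decode, for the record:

import struct

byte_val = b'\x00\x00\x80\x7e'             # the four bytes as written by the C code

print(int.from_bytes(byte_val, 'big'))     # 32894 -- correct
print(struct.unpack('>I', byte_val)[0])    # 32894 -- the same decode via struct
print(int.from_bytes(byte_val, 'little'))  # 2122317824 -- wrong byte order, wrong value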

However, if anyone on this list knows how to pass data from a non-Python 
language to Python in multiprocessing.shared_memory please let me (and the 
list) know.  

Thanks.  


Feb 1, 2022, 14:20 by [email protected]:

>
>
>> On 1 Feb 2022, at 20:26, Jen Kris via Python-list  
>> wrote:
>>
>> I am using multiprocesssing.shared_memory to pass data between NASM and 
>> Python.  The shared memory is created in NASM before Python is called.  
>> Python connects to the shm:  shm_00 = 
>> shared_memory.SharedMemory(name='shm_object_00',create=False). 
>>
>> I have used shared memory at other points in this project to pass text data 
>> from Python back to NASM with no problems.  But now this time I need to pass 
>> a 32-bit integer (specifically 32,894) from NASM to Python. 
>>
>> First I convert the integer to bytes in a C program linked into NASM:
>>
>>  unsigned char bytes[4];
>>  unsigned long int_to_convert = 32894;
>>
>>  bytes[0] = (int_to_convert >> 24) & 0xFF;
>>  bytes[1] = (int_to_convert >> 16) & 0xFF;
>>  bytes[2] = (int_to_convert >> 8) & 0xFF;
>>  bytes[3] = int_to_convert & 0xFF;
>>  memcpy(outbuf, bytes, 4);
>>
>> where outbuf is a pointer to the shared memory.  On return from C to NASM, I 
>> verify that the first four bytes of the shared memory contain what I want, 
>> and they are 0, 0, -128, 126 which is binary 00000000 00000000 10000000
>> 01111110, and that's correct (32,894). 
>>
>> Next I send a message to Python through a FIFO to read the data from shared 
>> memory.  Python uses the following code to read the first four bytes of the 
>> shared memory:
>>
>>  byte_val = shm_00.buf[:4]
>>  print(shm_00.buf[0])
>>  print(shm_00.buf[1])
>>  print(shm_00.buf[2])
>>  print(shm_00.buf[3])
>>
>> But the bytes show as 40 39 96 96, which is exactly what the first four 
>> bytes of this shared memory contained before I called C to overwrite them 
>> with the bytes 0, 0, -128, 126.  So Python does not see the updated bytes, 
>> and naturally int.from_bytes(byte_val, "little") does not return the result 
>> I want. 
>>
>> I know that Python refers to shm_00.buf using the buffer protocol.  Is that 
>> the reason that Python can't see the data that has been updated by another 
>> language? 
>>
>> So my question is, how can I alter the data in shared memory in a non-Python 
>> language to pass back to Python?
>>
>
> Maybe you need to use a memory barrier to force the data to be seen by 
> another cpu?
> Maybe use shm lock operation to sync both sides?
> Googling I see people talking about using stdatomic.h for this.
>
> But I am far from clear what you would need to do.
>
> Barry
>
>>
>> Thanks,
>>
>> Jen
>>
>> -- 
>> https://mail.python.org/mailman/listinfo/python-list
>>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Data unchanged when passing data to Python in multiprocessing shared memory

2022-02-02 Thread Jen Kris via Python-list
As far as I can tell, the struct module can't auto-detect endianness: it 
defaults to the machine's native byte order unless the format string starts 
with '<', '>' or '!', so the order must be specified, just as I had to do with 
int.from_bytes().  In my case endianness was dictated by how the four bytes 
were populated, starting with the zero bytes on the left.  
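
A minimal sketch of the byte-order prefixes, for reference:

import struct
import sys
import socket

n = 32894
print(struct.pack('>I', n))   # b'\x00\x00\x80~' -- big endian ('!', network order, is the same)
print(struct.pack('<I', n))   # b'~\x80\x00\x00' -- little endian
print(struct.pack('=I', n))   # native byte order of the running machine
print(sys.byteorder)          # 'little' on x86, so '=I' matches '<I' there
print(socket.htonl(n))        # host-to-network helper from the reply below: 2122317824 on x86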


Feb 1, 2022, 21:30 by [email protected]:

> On Wed, 2 Feb 2022 00:40:22 +0100 (CET), Jen Kris 
> declaimed the following:
>
>>
>> breakup = int.from_bytes(byte_val, "big")
>>
> >print("this is breakup " + str(breakup))
>
>>
>>
> >Python prints:  this is breakup 32894
>
>>
>>
> >Note that I had to switch from little endian to big endian.  Python is 
> >little endian by default, but in this case it's big endian.  
>
>>
>>
> Look at the struct module. I'm pretty certain it has flags for big or
> little end, or system native (that, or run your integers through the
> various "network byte order" functions that I think C and Python both
> support.
>
> https://www.gta.ufrj.br/ensino/eel878/sockets/htonsman.html
>
>
> >However, if anyone on this list knows how to pass data from a non-Python 
> >language to Python in multiprocessing.shared_memory please let me (and the 
> >list) know.  
>
>  MMU cache lines not writing through to RAM? Can't find
> anything on Google to force a cache flush. Can you test on a
> different OS? (Windows vs Linux)
>
>
>
> -- 
>  Wulfraed Dennis Lee Bieber AF6VN
>  [email protected]    http://wlfraed.microdiversity.freeddns.org/
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Data unchanged when passing data to Python in multiprocessing shared memory

2022-02-02 Thread Jen Kris via Python-list
An ASCII string will not work.  If you convert 32894 to an ascii string you 
will have five bytes, but you need four.  In my original post I showed the C 
program I used to convert any 32-bit number to 4 bytes.  
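
A quick illustration of the length difference, and of a fixed-width alternative:

n = 32894

ascii_form = str(n).encode('ascii')        # b'32894' -- five bytes, and the width varies
fixed_form = n.to_bytes(4, 'big')          # b'\x00\x00\x80~' -- always exactly four bytes
print(len(ascii_form), len(fixed_form))    # 5 4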


Feb 2, 2022, 10:16 by [email protected]:

> I applaud trying to find the right solution but wonder if a more trivial 
> solution is even being considered. It ignores big and little endians and just 
> converts your data into another form and back.
>
> If all you want to do is send an integer that fit in 32 bits or 64 bits, why 
> not convert it to a character string in a form that both machines will see 
> the same way and when read back, convert it back to an integer? 
>
> As long as both side see the same string, this can be done in reasonable time 
> and portably.
>
> Or am I missing something? Is "1234" not necessarily seen in the same order, 
> or "1.234e3" or whatever?
>
> Obviously, if the mechanism is heavily used and multiple sides keep reading 
> and even writing the same memory location, this is not ideal. But having 
> different incompatible processors looking at the same memory is also not.
>
> -Original Message-
> From: Dennis Lee Bieber 
> To: [email protected]
> Sent: Wed, Feb 2, 2022 12:30 am
> Subject: Re: Data unchanged when passing data to Python in multiprocessing 
> shared memory
>
>
> On Wed, 2 Feb 2022 00:40:22 +0100 (CET), Jen Kris 
>
> declaimed the following:
>
>
>
>>
>>
>> breakup = int.from_bytes(byte_val, "big")
>>
>
> >print("this is breakup " + str(breakup))
>
>>
>>
>
> >Python prints:  this is breakup 32894
>
>>
>>
>
> >Note that I had to switch from little endian to big endian.  Python is 
> >little endian by default, but in this case it's big endian.  
>
>>
>>
>
>     Look at the struct module. I'm pretty certain it has flags for big or
>
> little end, or system native (that, or run your integers through the
>
> various "network byte order" functions that I think C and Python both
>
> support.
>
>
>
> https://www.gta.ufrj.br/ensino/eel878/sockets/htonsman.html
>
>
>
>
>
> >However, if anyone on this list knows how to pass data from a non-Python 
> >language to Python in multiprocessing.shared_memory please let me (and the 
> >list) know.  
>
>
>
>     MMU cache lines not writing through to RAM? Can't find
>
> anything on Google to force a cache flush. Can you test on a
>
> different OS? (Windows vs Linux)
>
>
>
>
>
>
>
> -- 
>
>     Wulfraed                 Dennis Lee Bieber         AF6VN
>
>     [email protected]    http://wlfraed.microdiversity.freeddns.org/
>
> -- 
>
> https://mail.python.org/mailman/listinfo/python-list
>
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


Can't get iterator in the C API

2022-02-08 Thread Jen Kris via Python-list
I am using the Python C API to load the Gutenberg corpus from the nltk library 
and iterate through the sentences.  The Python code I am trying to replicate is:

from nltk.corpus import gutenberg
for i, fileid in enumerate(gutenberg.fileids()):
    sentences = gutenberg.sents(fileid)
    etc

where gutenberg.fileids is, of course, iterable. 

I use the following C API code to import the module and get pointers:

int64_t Call_PyModule()
{
    PyObject *pModule, *pName, *pSubMod, *pFidMod, *pFidIter, *pFidSeqIter, *pSentMod;

    pName = PyUnicode_FromString("nltk.corpus");
    pModule = PyImport_Import(pName);

    if (pModule == 0x0) {
        PyErr_Print();
        return 1;
    }

    pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
    pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
    pSentMod = PyObject_GetAttrString(pSubMod, "sents");

    pFidIter = PyObject_GetIter(pFidMod);
    int ckseq_ok = PySeqIter_Check(pFidMod);
    pFidSeqIter = PySeqIter_New(pFidMod);

    return 0;
}

pSubMod, pFidMod and pSentMod all return valid pointers, but the iterator lines 
return zero: 

pFidIter = PyObject_GetIter(pFidMod);
int ckseq_ok = PySeqIter_Check(pFidMod);
pFidSeqIter  = PySeqIter_New(pFidMod);

So the C API thinks gutenberg.fileids is not iterable, but it is.  What am I 
doing wrong?


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Can't get iterator in the C API

2022-02-09 Thread Jen Kris via Python-list
Thank you for clarifying that.  Now on to getting the iterator from the method. 
 

Jen
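
P.S.  The Python-level behaviour behind the fix, as a minimal sketch (assuming 
nltk and the Gutenberg corpus are installed):

from nltk.corpus import gutenberg

# The 'fileids' attribute is a bound method, not an iterable...
try:
    iter(gutenberg.fileids)
except TypeError as exc:
    print(exc)                    # 'method' object is not iterable

# ...but calling it returns a list, which is iterable
for fileid in gutenberg.fileids():
    print(fileid)                 # austen-emma.txt, austen-persuasion.txt, ...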


Feb 8, 2022, 18:10 by [email protected]:

> On 2022-02-09 01:12, Jen Kris via Python-list wrote:
>
>> I am using the Python C API to load the Gutenberg corpus from the nltk 
>> library and iterate through the sentences.  The Python code I am trying to 
>> replicate is:
>>
>> from nltk.corpus import gutenberg
>> for i, fileid in enumerate(gutenberg.fileids()):
>>      sentences = gutenberg.sents(fileid)
>>      etc
>>
>> where gutenberg.fileids is, of course, iterable.
>>
>> I use the following C API code to import the module and get pointers:
>>
>> int64_t Call_PyModule()
>> {
>>      PyObject *pModule, *pName, *pSubMod, *pFidMod, *pFidIter, *pFidSeqIter, *pSentMod;
>>
>>      pName = PyUnicode_FromString("nltk.corpus");
>>      pModule = PyImport_Import(pName);
>>
>>      if (pModule == 0x0){
>>      PyErr_Print();
>>      return 1; }
>>
>>      pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
>>      pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
>>      pSentMod = PyObject_GetAttrString(pSubMod, "sents");
>>
>>      pFidIter = PyObject_GetIter(pFidMod);
>>      int ckseq_ok = PySeqIter_Check(pFidMod);
>>      pFidSeqIter  = PySeqIter_New(pFidMod);
>>
>>      return 0;
>> }
>>
>> pSubMod, pFidMod and pSentMod all return valid pointers, but the iterator 
>> lines return zero:
>>
>> pFidIter = PyObject_GetIter(pFidMod);
>> int ckseq_ok = PySeqIter_Check(pFidMod);
>> pFidSeqIter  = PySeqIter_New(pFidMod);
>>
>> So the C API thinks gutenberg.fileids is not iterable, but it is.  What am I 
>> doing wrong?
>>
> Look at your Python code. You have "gutenberg.fileids()", so the 'fileids' 
> attribute is not an iterable itself, but a method that you need to call to 
> get the iterable.
> -- 
> https://mail.python.org/mailman/listinfo/python-list
>

-- 
https://mail.python.org/mailman/listinfo/python-list


C API PyObject_Call segfaults with string

2022-02-09 Thread Jen Kris via Python-list
This is a follow-on to a question I asked yesterday, which was answered by 
MRAB.   I'm using the Python C API to load the Gutenberg corpus from the nltk 
library and iterate through the sentences.  The Python code I am trying to 
replicate is:

from nltk.corpus import gutenberg
for i, fileid in enumerate(gutenberg.fileids()):
    sentences = gutenberg.sents(fileid)
    etc

I have everything finished down to the last line (sentences = 
gutenberg.sents(fileid)) where I use  PyObject_Call to call gutenberg.sents, 
but it segfaults.  The fileid is a string -- the first fileid in this corpus is 
"austen-emma.txt."  

pName = PyUnicode_FromString("nltk.corpus");
pModule = PyImport_Import(pName);

pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
pSentMod = PyObject_GetAttrString(pSubMod, "sents");

pFileIds = PyObject_CallObject(pFidMod, 0);
pListItem = PyList_GetItem(pFileIds, listIndex);
pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
pListStr = PyBytes_AS_STRING(pListStrE);
Py_DECREF(pListStrE);

// sentences = gutenberg.sents(fileid)
PyObject *c_args = Py_BuildValue("s", pListStr);  
PyObject *NullPtr = 0;
pSents = PyObject_Call(pSentMod, c_args, NullPtr);

The final line segfaults:
Program received signal SIGSEGV, Segmentation fault.
0x76e4e8d5 in _PyEval_EvalCodeWithName ()
   from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0

My guess is the problem is in Py_BuildValue, which returns a pointer but it may 
not be constructed correctly.  I also tried it with "O" and it doesn't segfault 
but it returns 0x0. 

I'm new to using the C API.  Thanks for any help. 

Jen


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: C API PyObject_Call segfaults with string

2022-02-09 Thread Jen Kris via Python-list
Thanks for your reply.  

I eliminated the DECREF and now it doesn't segfault but it returns 0x0.  Same 
when I substitute pListStrE for pListStr.  pListStr contains the string 
representation of the fileid, so it seemed like the one to use.  According to 
http://web.mit.edu/people/amliu/vrut/python/ext/buildValue.html, Py_BuildValue 
"builds a tuple only if its format string contains two or more format units" 
and that doc contains examples.  (The official C API docs also show that 
wrapping a single format unit in parentheses -- Py_BuildValue("(s)", pListStr) 
-- forces a one-element tuple, which is what PyObject_Call expects for its 
positional args.)
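
At the Python level, what the PyObject_Call line has to reproduce is this -- a 
minimal sketch, assuming the corpus is installed:

from nltk.corpus import gutenberg

fileid = gutenberg.fileids()[0]     # "austen-emma.txt"

# PyObject_Call(callable, args, kwargs) corresponds to callable(*args, **kwargs),
# and args must be a tuple -- e.g. the one-element tuple Py_BuildValue("(s)", ...) builds
args = (fileid,)
sentences = gutenberg.sents(*args)
print(sentences[0])                 # ['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']']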


Feb 9, 2022, 16:52 by [email protected]:

> On Thu, Feb 10, 2022 at 9:42 AM Jen Kris via Python-list
>  wrote:
>
>>
>> I have everything finished down to the last line (sentences = 
>> gutenberg.sents(fileid)) where I use  PyObject_Call to call gutenberg.sents, 
>> but it segfaults.  The fileid is a string -- the first fileid in this corpus 
>> is "austen-emma.txt."
>>
>> pName = PyUnicode_FromString("nltk.corpus");
>> pModule = PyImport_Import(pName);
>>
>> pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
>> pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
>> pSentMod = PyObject_GetAttrString(pSubMod, "sents");
>>
>> pFileIds = PyObject_CallObject(pFidMod, 0);
>> pListItem = PyList_GetItem(pFileIds, listIndex);
>> pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
>> pListStr = PyBytes_AS_STRING(pListStrE);
>> Py_DECREF(pListStrE);
>>
>
> HERE.
> PyBytes_AS_STRING() returns pointer in the pListStrE Object.
> So Py_DECREF(pListStrE) makes pListStr a dangling pointer.
>
>>
>> // sentences = gutenberg.sents(fileid)
>> PyObject *c_args = Py_BuildValue("s", pListStr);
>>
>
> Why do you encode&decode pListStrE?
> Why don't you use just pListStrE?
>
>> PyObject *NullPtr = 0;
>> pSents = PyObject_Call(pSentMod, c_args, NullPtr);
>>
>
> c_args must tuple, but you passed a unicode object here.
> Read https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue
>
>
>> The final line segfaults:
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x76e4e8d5 in _PyEval_EvalCodeWithName ()
>>  from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
>>
>> My guess is the problem is in Py_BuildValue, which returns a pointer but it 
>> may not be constructed correctly.  I also tried it with "O" and it doesn't 
>> segfault but it returns 0x0.
>>
>> I'm new to using the C API.  Thanks for any help.
>>
>> Jen
>>
>>
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>
> Bests,
>
> -- 
> Inada Naoki  
>

-- 
https://mail.python.org/mailman/listinfo/python-list

