Re: SnakeCard products source code
Thanks for sharing it. I am interested in the GINA part. Philippe C. Martin wrote: > Dear all, > > The source code is available in the download section: www.snakecard.com > > Regards, > > Philippe -- http://mail.python.org/mailman/listinfo/python-list
Program blocked in Queue.Queue.get and Queue.Queue.put
I have a program that is blocked, and all of its threads are blocked on a Queue.Queue.get or Queue.Queue.put method (on the same Queue.Queue object).

One thread shows the following as the last entry in its stack:

File: "c:\python27\lib\Queue.py", line 161, in get
self.not_empty.acquire()

Two threads show the following as the last entry in their stacks:

File: "c:\python27\lib\Queue.py", line 118, in put
self.not_full.acquire()

As far as I can tell, this means both the Queue.Queue.not_full and Queue.Queue.not_empty locks are held, but no other thread seems to hold them. Of course, I don't access the locks myself directly. I did, however, send a KeyboardInterrupt to the main thread. Could it be that it was doing a Queue.Queue.put at that moment and was interrupted while it held the lock, but before it entered the try block whose finally releases the lock (i.e. between lines 118 and 119 in the Queue.py file)? If this is the case, how do I avoid that? Or is it a bug in the Queue.Queue class? If this is not the case, any clue what else could have happened?

Thanks
--
http://mail.python.org/mailman/listinfo/python-list
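The window being described can be sketched like this (a simplified analog of Queue.Queue.put, not the real Queue.py source):

import threading

class MiniQueue(object):
    # Simplified analog of Queue.Queue.put() in Python 2.7; the line
    # numbers refer to the real c:\python27\lib\Queue.py.
    def __init__(self):
        self.not_full = threading.Condition(threading.Lock())
        self.items = []

    def put(self, item):
        self.not_full.acquire()        # line 118: lock acquired
        # If a KeyboardInterrupt is delivered exactly here, before the
        # try below (line 119) is entered, the finally never runs and
        # not_full stays held forever; every later put()/get() that
        # needs it then deadlocks, matching the symptom above.
        try:
            self.items.append(item)
        finally:
            self.not_full.release()

One common workaround is to keep KeyboardInterrupt away from code holding queue locks, e.g. by catching it only in the main thread and having workers use timeouts on get()/put() rather than being interrupted.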
Static Methods in Python
Hi, I am a newbie to Python. With a background in Java, I was attempting to write static methods in a class without self as the first parameter, and I got an error. I did a search on Google and found that there was no consistent approach to this. I would like to know what the prescribed approach is. Any thoughts or pointers would be very much appreciated. Thanks, Kris -- http://mail.python.org/mailman/listinfo/python-list
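For reference, the prescribed approach is the staticmethod decorator; a minimal sketch (not from the original thread):

class MathUtil(object):
    @staticmethod
    def add(a, b):
        # No 'self' parameter: the method is looked up on the class and
        # called without an instance, much like a Java static method.
        return a + b

    @classmethod
    def name(cls):
        # classmethod is the alternative when the class itself is needed.
        return cls.__name__

print(MathUtil.add(2, 3))   # -> 5
print(MathUtil.name())      # -> MathUtil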
Re: Issues with if and elif statements in 3.3
WOW, as if it was something as easy as that! I had been looking for a while at what I was doing wrong. It seems I just don't know my way around if statements at all. Thanks a bunch for this; it makes everything else I have been coding work. Thanks again -- http://mail.python.org/mailman/listinfo/python-list
back with more issues
import random

def player():
    hp = 10
    speed = 5
    attack = random.randint(0,5)

def monster():
    hp = 10
    speed = 4

def battle(player):
    print ("a wild mosnter appered!")
    print ("would you like to battle?")
    answer = input()
    if answer == ("yes"):
        return player(attack)
    else:
        print("nope")

battle()
++
this was a variation on code that you guys had already helped me with; in the
long run I plan to incorporate them together, but as it stands I don't know how
to call a specific variable from one function (attack from player) to use in
another function (battle). What I want is to be able to use the variables from
both player and monster in battle. Any ideas?
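One minimal way to do this, sketched here for illustration (this is not code from the thread), is to have the creature functions return their values so battle() can receive them as arguments:

import random

def player():
    # Return the stats instead of assigning them to local variables,
    # which are discarded as soon as the function returns.
    return {"hp": 10, "speed": 5, "attack": random.randint(0, 5)}

def monster():
    return {"hp": 10, "speed": 4}

def battle(p, m):
    print("a wild monster appeared!")
    print("you do", p["attack"], "damage")
    m["hp"] -= p["attack"]
    print("monster hp:", m["hp"])

battle(player(), monster())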
--
http://mail.python.org/mailman/listinfo/python-list
Re: back with more issues
the idea was to store variables for later use, but you are correct, I don't understand functions, or whether that is even the best way to do it. I guess I'd want to be able to call the HP and ATTACK variables of player for when the battle gets called. I would then use the variables in battle to figure out who would win. Is there a better way to store these variables than in the functions? I also read somewhere about classes, but that makes even less sense to me. -- http://mail.python.org/mailman/listinfo/python-list
Re: back with more issues
darn, I was hoping I could put off learning classes for a bit, but it seems that is not the case. I have tested it a bit and it seems to be working correctly now.

import random

class player():
    hp = 10
    speed = 5
    attack = random.randint(0,5)

print (player.attack)

+++

I know it's not nearly as complicated as your examples, but it seems to work. The self part of it always eluded me and continues to do so. And just so you know, I'm learning through codecademy.com; it's based on Python 2.7 and I'm trying to code in 3.3. But thanks for your help again, and classes are starting (I think) to make some sort of sense. I'll have to reread both replies over and over again, but it looks like a lot of useful info is there. But is the example I posted sorta right? I know I left the self part out, but I think I'm on the right track. -- http://mail.python.org/mailman/listinfo/python-list
Re: back with more issues
import random

class player():
    hp = 10
    attack = random.randint(0,5)

class monster():
    hp = 10
    attack = random.randint(0,4)

def battle():
    print ("a wild mosnter appered!")
    print ("would you like to battle?")
    answer = input()
    if answer == ("yes"):
        while monster.hp >= 0:
            print ("you do", player.attack, "damage")
            monster.hp -= player.attack
            print (monster.hp)
    elif answer == ("no"):
        print ("you run away")
    else:
        print("you stand there")

battle()
Hello! Just wanted to show you guys how it's coming together. I'm starting to
understand it a bit more (hopefully it's right). At the moment it seems to only
roll the attack once and use that value, but that's another issue altogether
that I won't bother you with (yet, anyway).
Thanks again guys, you are awesome
--
http://mail.python.org/mailman/listinfo/python-list
Re: back with more issues
the Classes and __init__ still don't make much sense, actually. I have tried and tried again to make it generate numbers between 0 and 5 in a while statement, but it just doesn't seem to be working.

import random

class Player():
    hp = 10
    def __init__(self, patt):
        self.att = random.randint(0,5)

while Player.hp == 10:
    print (Player.__init__)

atm it seems to be printing "<function __init__ at ...>" over and over again. I don't mind the repetition, but from my understanding there should be numbers there, numbers that change. Crazy frustrating that I just don't understand how this works. -- http://mail.python.org/mailman/listinfo/python-list
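For comparison, a minimal sketch (illustrative, not from the thread) of how __init__ is normally used here: the roll is stored on an instance, and each new instance triggers a fresh call to __init__.

import random

class Player(object):
    def __init__(self):
        self.hp = 10
        self.att = random.randint(0, 5)   # rolled once per instance

while True:
    p = Player()      # creating an instance calls __init__ for you
    print(p.att)      # a fresh 0-5 number each time around the loop
    if p.att == 5:
        break

Printing Player.__init__ itself just prints the function object, which is why the loop above creates instances instead.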
Multi-threaded SSL
Dear Ophidians,

I'm attempting to create an SSL secured, AJAX chat server. I'm moving on the hypothesis that I'll need to hang an XMLHttpRequest response blocking on the server until a new message is ready to be dispatched. This means that my server must be able to handle many open SSL sockets in separate threads.

I started with Twisted, but, having looked as far as I can see, SSL is either not implemented or not documented for that library. There are hints that it's in the works, but that's all. So, I've moved on.

I'm using PyOpenSSL on a Debian box, and I started with the ActiveState Cookbook article, http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/442473. The SSL server works very well as suggested in this article. Starting with this code and adding threads, I've been trying to make simultaneous HTTP requests operate in parallel on the server. To test, I've added in turn busy and sleepy waiting to the GET processing segment of the request handler. The threads work fine; every time the server accepts a connection, it clearly starts accepting connections in a new thread.

However, the problem runs deeper than I can see. The SSL listening socket blocks on accept in all threads until the one open SSL connection finishes its waiting, responds, and closes. This means that I can only have one client waiting for a response at a time. Is there a limitation of SSL, or this SSL implementation, or something else preventing me from having multiple connections waiting for responses simultaneously?

Many thanks,
Kris Kowal
--
http://mail.python.org/mailman/listinfo/python-list
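For comparison, a one-thread-per-connection TLS server can be sketched with today's standard-library ssl module (which postdates this post; the certificate path is a placeholder). The key point is doing the handshake inside the worker thread, so the accept loop never blocks on a single slow client:

import socket
import ssl
import threading

def handle(raw_sock, context):
    # Handshake *inside* the worker thread: a slow handshake or a
    # long-polling client never stalls the accept loop.
    conn = context.wrap_socket(raw_sock, server_side=True)
    try:
        data = conn.recv(1024)
        conn.sendall(data)   # echo; a chat server would block here instead
    finally:
        conn.close()

context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain("server.pem")   # placeholder cert/key file

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("0.0.0.0", 8443))
listener.listen(5)

while True:
    raw, addr = listener.accept()
    threading.Thread(target=handle, args=(raw, context)).start()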
Python IRC Zork
Hi, If this has been done before in another language, could someone please tell me; if not, I was wondering if it's possible, and what the easiest way is, to create an IRC bot that allows you to play Zork. I was thinking of just creating a simple Python IRC bot, or finding an existing one, then having it run Zork and read/write from stdout/stdin. Is that possible? Is there a better or easier way to do it? Are there any existing programs that do something similar? Or just really anything else people have to say on the subject. Thanks Kris -- http://mail.python.org/mailman/listinfo/python-list
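A minimal sketch of the subprocess idea raised in the follow-up ("zork" is a placeholder for whatever Z-machine interpreter binary is actually installed):

import subprocess

game = subprocess.Popen(["zork"],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        universal_newlines=True)

def send_command(cmd):
    # Forward one line from IRC to the game and read one line back.
    # A real bot would keep reading until the game's prompt appears,
    # since responses usually span several lines.
    game.stdin.write(cmd + "\n")
    game.stdin.flush()
    return game.stdout.readline()

print(send_command("look"))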
Re: Python IRC Zork
I should have said, I'm guessing subprocess is the way to go but I'm probably wrong. On 28/02/2008, Kris Davidson <[EMAIL PROTECTED]> wrote: > Hi, > > If this has been done before in another language could someone please > tell me, if not I was wondering is its possible and what the easier > way is to create an IRC bot that allows you to play Zork: > > I was thinking of just creating a simple Python IRC bot or finding an > existing one then have it run Zork and read/write from stdout/stdin. > > Is that possible? Is there a better or easier way to do it? Are there > any existing programs that do something similar? > > Or just really anything else people have to say on the subject. > > Thanks > > > Kris > -- http://mail.python.org/mailman/listinfo/python-list
Re: Python IRC Zork
> The bigger picture would be writing a full Z machine in Python, which is > something I embarked on for my own amusement a while back but never got > far enough to do anything useful at all, given the size of the task. Might be worth trying that or setting up a project somewhere, do any exist? Have you posted what code you had somewhere? -- http://mail.python.org/mailman/listinfo/python-list
mmap class has slow "in" operator
If I do the following:

import mmap

def mmap_search(f, string):
    fh = file(f)
    mm = mmap.mmap(fh.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
    return mm.find(string)

def mmap_is_in(f, string):
    fh = file(f)
    mm = mmap.mmap(fh.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)
    return string in mm

then a sample mmap_search() call on a 50MB file takes 0.18 seconds, but the mmap_is_in() call takes 6.6 seconds. Is the mmap class missing an operator and falling back to a slow default implementation? Presumably I can implement the latter in terms of the former.

Kris
--
http://mail.python.org/mailman/listinfo/python-list
UNIX credential passing
I want to make use of UNIX credential passing on a local domain socket to verify the identity of a user connecting to a privileged service. However it looks like the socket module doesn't implement sendmsg/recvmsg wrappers, and I can't find another module that does this either. Is there something I have missed? Kris -- http://mail.python.org/mailman/listinfo/python-list
Re: UNIX credential passing
Sebastian 'lunar' Wiesner wrote:
[ Kris Kennaway <[EMAIL PROTECTED]> ]
I want to make use of UNIX credential passing on a local domain socket
to verify the identity of a user connecting to a privileged service.
However it looks like the socket module doesn't implement
sendmsg/recvmsg wrappers, and I can't find another module that does this
either. Is there something I have missed?
http://pyside.blogspot.com/2007/07/unix-socket-credentials-with-python.html
Illustrates how to use socket credentials without sendmsg/recvmsg, and so
without any need for patching.
Thanks to both you and Paul for your suggestions. For the record, the
URL above is linux-specific, but it put me on the right track. Here is
an equivalent FreeBSD implementation:
import struct

def getpeereid(sock):
    """ Get peer credentials on a UNIX domain socket.

        Returns a nested tuple: (uid, (gids)) """

    LOCAL_PEERCRED = 0x001
    NGROUPS = 16

    # struct xucred {
    #     u_int  cr_version;           /* structure layout version */
    #     uid_t  cr_uid;               /* effective user id */
    #     short  cr_ngroups;           /* number of groups */
    #     gid_t  cr_groups[NGROUPS];   /* groups */
    #     void   *_cr_unused1;         /* compatibility with old ucred */
    # };
    xucred_fmt = '2ih16iP'
    res = tuple(struct.unpack(xucred_fmt,
        sock.getsockopt(0, LOCAL_PEERCRED, struct.calcsize(xucred_fmt))))

    # Check this is the above version of the structure
    if res[0] != 0:
        raise OSError
    return (res[1], res[3:3+res[2]])
Kris
--
http://mail.python.org/mailman/listinfo/python-list
Re: "Faster" I/O in a script
Gary Herron wrote:
[EMAIL PROTECTED] wrote:
On Jun 2, 2:08 am, "kalakouentin" <[EMAIL PROTECTED]> wrote:
Do you know a way to actually load my data in a more "batch-like" way so I will avoid the constant line by line reading?

If your files will fit in memory, you can just do

text = file.readlines()

and Python will read the entire file into a list of strings named 'text,' where each item in the list corresponds to one 'line' of the file.

No, that won't help. That has to do *all* the same work (reading blocks and finding line endings) as the iterator PLUS allocate and build a list. Better to just use the iterator.

for line in file:
    ...

Actually this *can* be much slower. Suppose I want to search a file to see if a substring is present.

st = "some substring that is not actually in the file"
f = <50 MB log file>

Method 1:

for i in file(f):
    if st in i:
        break

--> 0.472416 seconds

Method 2: Read whole file:

fh = file(f)
rl = fh.read()
fh.close()

--> 0.098834 seconds
"st in rl" test --> 0.037251 (total: .136 seconds)

Method 3: mmap the file:

mm = mmap.mmap(fh.fileno(), 0, mmap.MAP_SHARED, mmap.PROT_READ)

"st in mm" test --> 3.589938 (<-- see my post the other day)
mm.find(st) --> 0.186895

Summary: If you can afford the memory, it can be more efficient (more than 3 times faster in this example) to read the file into memory and process it at once (if possible). Mmapping the file and processing it at once is roughly as fast (I didn't measure the difference carefully), but has the advantage that if there are parts of the file you do not touch you don't fault them into memory. You could also play more games and mmap chunks at a time to limit the memory use (but you'd have to be careful with mmapping that doesn't match record boundaries).

Kris
--
http://mail.python.org/mailman/listinfo/python-list
PEP on breaking outer loops with StopIteration
I had a thought that might be pepworthy. Might we be able to break outer loops using an iter-instance specific StopIteration type? This is the desired, if not desirable, syntax::

import string
letters = iter(string.lowercase)
for letter in letters:
    for number in range(10):
        print letter, number
        if letter == 'a' and number == 5:
            raise StopIteration()
        if letter == 'b' and number == 5:
            raise letters.StopIteration()

The first StopIteration would halt the inner loop. The second StopIteration would halt the outer loop. The inner for-loop would note that the letters.StopIteration instance is specifically targeted at another iteration and raise it back up. For this output::

a 0
a 1
a 2
a 3
a 4
a 5
b 0
b 1
b 2
b 3
b 4
b 5

This could be incrementally refined with the addition of an "as" clause to "for" that would be bound after an iterable is implicitly iter()ed::

import string
for letter in string.lowercase as letters:
    …
    raise letters.StopIteration()

I took the liberty to create a demo using a "for_in" decorator instead of a "for" loop::

former_iter = iter

class iter(object):
    def __init__(self, values):
        if hasattr(values, 'next'):
            self.iter = values
        else:
            self.iter = former_iter(values)
        class Stop(StopIteration):
            pass
        if hasattr(values, 'StopIteration'):
            self.StopIteration = values.StopIteration
        else:
            self.StopIteration = Stop
    def next(self):
        try:
            return self.iter.next()
        except StopIteration, exception:
            raise self.StopIteration()

def for_in(values):
    def decorate(function):
        iteration = iter(values)
        while True:
            try:
                function(iteration.next())
            except iteration.StopIteration:
                break
            except StopIteration, exception:
                if type(exception) is StopIteration:
                    break
                else:
                    raise
    return decorate

import string
letters = iter(string.lowercase)

@for_in(letters)
def _as(letter):
    @for_in(range(10))
    def _as(number):
        print letter, number
        if letter == 'a' and number == 5:
            raise StopIteration()
        if letter == 'b' and number == 5:
            raise letters.StopIteration()

I imagine that this would constitute a lot of overhead in StopIteration type instances, but perhaps a C implementation would use flyweight StopIteration types for immutable direct subtypes of the builtin StopIteration.

Kris Kowal
--
http://mail.python.org/mailman/listinfo/python-list
Re: PEP on breaking outer loops with StopIteration
On Mon, Jun 9, 2008 at 7:39 PM, Paul Hankin <[EMAIL PROTECTED]> wrote: > Have you checked out http://www.python.org/dev/peps/pep-3136/ > > It contains exactly this idea, but using 'break letters' rather than > 'raise letters.StopIteration()'. I think I like the PEP's syntax > better than yours, but anyway, it was rejected. I concur that "break letters" is better than "raise letters.StopIteration()". Perhaps the novelty of the implementation idea (adding another exception case to the "while: try" that must already be there, and the specialized exception type) can wake this dead issue. Maybe "break letters" could under the hood raise the specialized StopIteration. But, then again. Guido has said, "No", already on other, albeit subjective, grounds. I'll drop it or champion it if there's interest. Kris Kowal -- http://mail.python.org/mailman/listinfo/python-list
ZFS bindings
Is anyone aware of python bindings for ZFS? I just want to replicate (or at least wrap) the command line functionality for interacting with snapshots etc. Searches have turned up nothing. Kris -- http://mail.python.org/mailman/listinfo/python-list
Re: Looking for lots of words in lots of files
Calvin Spealman wrote: Upload, wait, and google them. Seriously tho, aside from using a real indexer, I would build a set of the words I'm looking for, and then loop over each file, looping over the words and doing quick checks for containment in the set. If so, add to a dict of file names to list of words found until the list hits 10 length. I don't think that would be a complicated solution and it shouldn't be terrible at performance. If you need to run this more than once, use an indexer. If you only need to use it once, use an indexer, so you learn how for next time. If you can't use an indexer, and performance matters, evaluate using grep and a shell script. Seriously. grep is a couple of orders of magnitude faster at pattern matching strings in files (and especially regexps) than python is. Even if you are invoking grep multiple times it is still likely to be faster than a "maximally efficient" single pass over the file in python. This realization was disappointing to me :) Kris -- http://mail.python.org/mailman/listinfo/python-list
Bit substring search
I am trying to parse a bit-stream file format (bzip2) that does not have byte-aligned record boundaries, so I need to do efficient matching of bit substrings at arbitrary bit offsets. Is there a package that can do this? This one comes close: http://ilan.schnell-web.net/prog/bitarray/ but it only supports single bit substring match. Kris -- http://mail.python.org/mailman/listinfo/python-list
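Absent a ready-made package, one slow but workable pure-Python fallback (Python 3 shown; illustrative only) is to load the data as one big integer and compare a window at every bit offset:

def find_bits(data, pattern, pattern_bits):
    # data: bytes; pattern: int holding pattern_bits bits.
    # Naive O(n*m) scan of every bit offset -- fine for a sketch,
    # far too slow for big files without C help.
    total_bits = len(data) * 8
    value = int.from_bytes(data, "big")
    mask = (1 << pattern_bits) - 1
    for offset in range(total_bits - pattern_bits + 1):
        shift = total_bits - pattern_bits - offset
        if (value >> shift) & mask == pattern:
            return offset
    return -1

# e.g. the 48-bit bzip2 block magic:
# find_bits(open("file.bz2", "rb").read(), 0x314159265359, 48)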
Re: Bit substring search
[EMAIL PROTECTED] wrote:
Kris Kennaway:
I am trying to parse a bit-stream file format (bzip2) that does not have byte-aligned record boundaries, so I need to do efficient matching of bit substrings at arbitrary bit offsets. Is there a package that can do this?

You may take a look at Hachoir or some other modules:
http://hachoir.org/wiki/hachoir-core
http://pypi.python.org/pypi/construct/2.00

Thanks. hachoir also comes close, but it doesn't seem to be able to match substrings at a bit level (e.g. the included bzip2 parser just reads the header and hands the entire file off to libbzip2 to extract data from). construct exports a bit stream, but it's again pure Python, and matching substrings will be slow. It will need C support to do that efficiently.

http://pypi.python.org/pypi/FmtRW/20040603 Etc. More: http://pypi.python.org/pypi?%3Aaction=search&term=binary

Unfortunately I didn't find anything else useful here yet :(

Kris
--
http://mail.python.org/mailman/listinfo/python-list
Re: Bit substring search
[EMAIL PROTECTED] wrote:
Kris Kennaway:
Unfortunately I didn't find anything else useful here yet :(

I see, I'm sorry, I have found hachoir quite nice in the past. Maybe there's no really efficient way to do it with Python, but you can create a compiled extension, so you can see if it's fast enough for your purposes. To create such an extension you can:
- One thing that requires very little time is to create an extension with ShedSkin; once installed it just needs Python code.
- Cython (ex-Pyrex) too may be okay, but it's a bit trickier on Windows machines.
- Using Pyd to create a D extension for Python is often the fastest way I have found to create extensions. I need just a few minutes to create them this way. But you need to know a bit of D.
- Then, if you want, you can write a C extension, but if you have not done it before you may need some hours to make it work.

Thanks for the pointers. I think a C extension will end up being the way to go, unless someone has beaten me to it and I just haven't found it yet.

Kris
--
http://mail.python.org/mailman/listinfo/python-list
Re: Bit substring search
Scott David Daniels wrote: Kris Kennaway wrote: Thanks for the pointers, I think a C extension will end up being the way to go, unless someone has beaten me to it and I just haven't found it yet. Depending on the pattern length you are targeting, it may be fastest to increase the out-of-loop work. For a 40-bit string, build an 8-target Aho-Corasick machine, and at each match check the endpoints. This will only work well if 40 bits is at the low end of what you are hunting for. Thanks, I wasn't aware of Aho-Corasick. Kris -- http://mail.python.org/mailman/listinfo/python-list
Re: re.search much slower then grep on some regular expressions
Paddy wrote:
On Jul 4, 1:36 pm, Peter Otten <[EMAIL PROTECTED]> wrote:
Henning_Thornblad wrote:
What can be the cause of the large difference between re.search and
grep?
grep uses a smarter algorithm ;)
This script takes about 5 min to run on my computer:
#!/usr/bin/env python
import re
row=""
for a in range(156000):
    row+="a"
print re.search('[^ "=]*/',row)
While doing a simple grep:
grep '[^ "=]*/' input (input contains 156.000 a in
one row)
doesn't even take a second.
Is this a bug in python?
You could call this a performance bug, but it's not common enough in real
code to get the necessary brain cycles from the core developers.
So you can either write a patch yourself or use a workaround.
re.search('[^ "=]*/', row) if "/" in row else None
might be good enough.
Peter
It is not a smarter algorithm that is used in grep. Python REs have
more capabilities than grep REs, and those extra capabilities require a
slower, more complex matching algorithm.
You could argue that if the costly RE features are not used then maybe
simpler, faster algorithms should be automatically swapped in, but ...
I can and do :-)
It's a major problem that regular expression matching in Python has
exponential complexity in cases where polynomial algorithms (for a
subset of regexp expressions, e.g. excluding back-references) are
well-known. It rules out using Python for entire classes of
applications where regexp matching is on the critical path.
Kris
--
http://mail.python.org/mailman/listinfo/python-list
Re: re.search much slower then grep on some regular expressions
samwyse wrote: On Jul 4, 6:43 am, Henning_Thornblad <[EMAIL PROTECTED]> wrote: What can be the cause of the large difference between re.search and grep? While doing a simple grep: grep '[^ "=]*/' input (input contains 156.000 a in one row) doesn't even take a second. Is this a bug in python? You might want to look at Plex. http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/ "Another advantage of Plex is that it compiles all of the regular expressions into a single DFA. Once that's done, the input can be processed in a time proportional to the number of characters to be scanned, and independent of the number or complexity of the regular expressions. Python's existing regular expression matchers do not have this property. " Very interesting! Thanks very much for the pointer. Kris -- http://mail.python.org/mailman/listinfo/python-list
Re: re.search much slower then grep on some regular expressions
samwyse wrote:
On Jul 4, 6:43 am, Henning_Thornblad <[EMAIL PROTECTED]>
wrote:
What can be the cause of the large difference between re.search and
grep?
While doing a simple grep:
grep '[^ "=]*/' input (input contains 156.000 a in
one row)
doesn't even take a second.
Is this a bug in python?
You might want to look at Plex.
http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/
"Another advantage of Plex is that it compiles all of the regular
expressions into a single DFA. Once that's done, the input can be
processed in a time proportional to the number of characters to be
scanned, and independent of the number or complexity of the regular
expressions. Python's existing regular expression matchers do not have
this property. "
I haven't tested this, but I think it would do what you want:
from Plex import *

lexicon = Lexicon([
    (Rep(AnyBut(' "='))+Str('/'), TEXT),
    (AnyBut('\n'), IGNORE),
])

filename = "my_file.txt"
f = open(filename, "r")
scanner = Scanner(lexicon, f, filename)
while 1:
    token = scanner.read()
    print token
    if token[0] is None:
        break
Hmm, unfortunately it's still orders of magnitude slower than grep in my
own application that involves matching lots of strings and regexps
against large files (I killed it after 400 seconds, compared to 1.5 for
grep), and that's leaving aside the much longer compilation time (over a
minute). If the matching was fast then I could possibly pickle the
lexer though (but it's not).
Kris
--
http://mail.python.org/mailman/listinfo/python-list
Re: re.search much slower then grep on some regular expressions
John Machin wrote:
Hmm, unfortunately it's still orders of magnitude slower than grep in my
own application that involves matching lots of strings and regexps
against large files (I killed it after 400 seconds, compared to 1.5 for
grep), and that's leaving aside the much longer compilation time (over a
minute). If the matching was fast then I could possibly pickle the
lexer though (but it's not).
Can you give us some examples of the kinds of patterns that you are
using in practice and are slow using Python re?
Trivial stuff like:

(Str('error in pkg_delete'), ('mtree', 'mtree')),
(Str('filesystem was touched prior to .make install'), ('mtree', 'mtree')),
(Str('list of extra files and directories'), ('mtree', 'mtree')),
(Str('list of files present before this port was installed'), ('mtree', 'mtree')),
(Str('list of filesystem changes from before and after'), ('mtree', 'mtree')),
(re('Configuration .* not supported'), ('arch', 'arch')),
(re('(configure: error:|Script.*configure.*failed unexpectedly|script.*failed: here are the contents of)'), ('configure_error', 'configure')),
...
There are about 150 of them and I want to find which is the first match
in a text file that ranges from a few KB up to 512MB in size.
How large is "large"? What kind of text?

It's compiler/build output.
Instead of grep, you might like to try nrgrep ... google("nrgrep
Navarro Raffinot"): PDF paper about it on Citeseer (if it's up),
postscript paper and C source findable from Gonzalo Navarro's home-
page.
Thanks, looks interesting but I don't think it is the best fit here. I
would like to avoid spawning hundreds of processes to process each file
(since I have tens of thousands of them to process).
Kris
--
http://mail.python.org/mailman/listinfo/python-list
Re: re.search much slower then grep on some regular expressions
Jeroen Ruigrok van der Werven wrote:

-On [20080709 14:08], Kris Kennaway ([EMAIL PROTECTED]) wrote:
It's compiler/build output.

Sounds like the FreeBSD ports build cluster. :)

Yes indeed!

Kris, have you tried a PGO build of Python with your specific usage? I cannot guarantee it will significantly speed things up though. Also, a while ago I did tests with various GCC compilers and their effect on Python running time as well as Intel's cc. Intel won on (nearly) all accounts, meaning it was faster overall. From the top of my mind: GCC 4.1.x was faster than GCC 4.2.x.

I am pretty sure the problem is algorithmic, not bad byte code :) If it was a matter of a few % then that is in the scope of compiler tweaks, but we're talking orders of magnitude.

Kris
--
http://mail.python.org/mailman/listinfo/python-list
Re: re.search much slower then grep on some regular expressions
samwyse wrote:

On Jul 8, 11:01 am, Kris Kennaway <[EMAIL PROTECTED]> wrote:
samwyse wrote:
You might want to look at Plex. http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/ "Another advantage of Plex is that it compiles all of the regular expressions into a single DFA. Once that's done, the input can be processed in a time proportional to the number of characters to be scanned, and independent of the number or complexity of the regular expressions. Python's existing regular expression matchers do not have this property."
Hmm, unfortunately it's still orders of magnitude slower than grep in my own application that involves matching lots of strings and regexps against large files (I killed it after 400 seconds, compared to 1.5 for grep), and that's leaving aside the much longer compilation time (over a minute). If the matching was fast then I could possibly pickle the lexer though (but it's not).

That's funny, the compilation is almost instantaneous for me. My lexicon was quite a bit bigger, containing about 150 strings and regexps. However, I just tested it against several files, the first containing 4875*'a', the rest each twice the size of the previous. And you're right: for each doubling of the file size, the match takes four times as long, meaning O(n^2). 156000*'a' would probably take 8 hours. Here are my results:

compile_lexicon() took 0.0236021580595 secs
test('file-0.txt') took 24.8322969831 secs
test('file-1.txt') took 99.3956799681 secs
test('file-2.txt') took 398.349623132 secs

The docs say it is supposed to be linear in the file size ;-) ;-(

Kris
--
http://mail.python.org/mailman/listinfo/python-list
Re: re.search much slower then grep on some regular expressions
John Machin wrote: Uh-huh ... try this, then: http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/ You could use this to find the "Str" cases and the prefixes of the "re" cases (which seem to be no more complicated than 'foo.*bar.*zot') and use something slower like Python's re to search the remainder of the line for 'bar.*zot'. If it was just strings, then sure...with regexps it might be possible to make it work, but it doesn't sound particularly maintainable. I will stick with my shell script until python gets a regexp engine of equivalent performance. Kris -- http://mail.python.org/mailman/listinfo/python-list
Re: re.search much slower then grep on some regular expressions
J. Cliff Dyer wrote:
On Wed, 2008-07-09 at 12:29 -0700, samwyse wrote:
On Jul 8, 11:01 am, Kris Kennaway <[EMAIL PROTECTED]> wrote:
samwyse wrote:
You might want to look at Plex.
http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/
"Another advantage of Plex is that it compiles all of the regular
expressions into a single DFA. Once that's done, the input can be
processed in a time proportional to the number of characters to be
scanned, and independent of the number or complexity of the regular
expressions. Python's existing regular expression matchers do not have
this property. "
Hmm, unfortunately it's still orders of magnitude slower than grep in my
own application that involves matching lots of strings and regexps
against large files (I killed it after 400 seconds, compared to 1.5 for
grep), and that's leaving aside the much longer compilation time (over a
minute). If the matching was fast then I could possibly pickle the
lexer though (but it's not).
That's funny, the compilation is almost instantaneous for me.
However, I just tested it to several files, the first containing
4875*'a', the rest each twice the size of the previous. And you're
right, for each doubling of the file size, the match take four times
as long, meaning O(n^2). 156000*'a' would probably take 8 hours.
Here are my results:
compile_lexicon() took 0.0236021580595 secs
test('file-0.txt') took 24.8322969831 secs
test('file-1.txt') took 99.3956799681 secs
test('file-2.txt') took 398.349623132 secs
Sounds like a good strategy would be to find the smallest chunk of the
file that matches can't cross, and iterate your search on units of those
chunks. For example, if none of your regexes cross line boundaries,
search each line of the file individually. That may help turn around
the speed degradation you're seeing.
That's what I'm doing. I've also tried various other things like
mmapping the file and searching it at once, etc, but almost all of the
time is spent in the regexp engine so optimizing other things only gives
marginal improvement.
Kris
--
http://mail.python.org/mailman/listinfo/python-list
Re: multithreading in python ???
Laszlo Nagy wrote: Abhishek Asthana wrote: Hi all , I have large set of data computation and I want to break it into small batches and assign it to different threads .I am implementing it in python only. Kindly help what all libraries should I refer to implement the multithreading in python. You should not do this. Python can handle multiple threads but they always use the same processor. (at least in CPython.) In order to take advantage of multiple processors, use different processes. Only partly true. Threads executing in the python interpreter are serialized and only run on a single CPU at a time. Depending on what modules you use they may be able to operate independently on multiple CPUs. The term to research is "GIL" (Global Interpreter Lock). There are many webpages discussing it, and the alternative strategies you can use. Kris -- http://mail.python.org/mailman/listinfo/python-list
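A minimal sketch of the multiple-process route with the standard library (the work function and batch split are illustrative assumptions):

import multiprocessing

def compute(batch):
    # CPU-bound work; each batch runs in its own process, so the GIL
    # of any single interpreter is not a bottleneck.
    return sum(x * x for x in batch)

if __name__ == "__main__":
    data = range(1000000)
    batches = [data[i::4] for i in range(4)]   # four interleaved batches
    pool = multiprocessing.Pool(processes=4)
    results = pool.map(compute, batches)
    pool.close()
    pool.join()
    print(sum(results))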
Re: pyprocessing/multiprocessing for x64?
Benjamin Kaplan wrote: The only problem I can see is that 32-bit programs can't access 64-bit dlls, so the OP might have to install the 32-bit version of Python for it to work. Anyway, all of this is beside the point, because the multiprocessing module works fine on amd64 systems. Kris -- http://mail.python.org/mailman/listinfo/python-list
Re: variable expansion with sqlite
marc wyburn wrote: Hi and thanks, I was hoping to avoid having to weld qmarks together but I guess that's why people use things like SQL alchemy instead. It's a good lesson anyway. The '?' substitution is there to safely handle untrusted input. You *don't* want to pass in arbitrary user data into random parts of an SQL statement (or your database will get 0wned). I think of it as a reminder that when you have to construct your own query template by using "... %s ..." % (foo) to bypass this limitation, that you had better be darn sure the parameters you are passing in are safe. Kris -- http://mail.python.org/mailman/listinfo/python-list
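A quick sketch of both sides of that point (table and values invented for illustration):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

name = "x'; DROP TABLE users; --"   # hostile input

# Safe: the driver binds the value; it is never spliced into the SQL.
conn.execute("INSERT INTO users (name) VALUES (?)", (name,))

# The %-construction is only for parts of the statement that '?'
# cannot express (e.g. a table name), and only with trusted values.
table = "users"
rows = conn.execute("SELECT name FROM %s WHERE name = ?" % table,
                    (name,)).fetchall()
print(rows)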
Constructing MIME message without loading message stream
I would like to MIME encode a message from a large file without first loading the file into memory. Assume the file has been pre-encoded on disk (actually I am using encode_7or8bit, so the encoding should be null). Is there a way to construct the flattened MIME message such that data is streamed from the file as needed instead of being resident in memory? Do I have to subclass the MIMEBase class myself? Kris -- http://mail.python.org/mailman/listinfo/python-list
Re: benchmark
Angel Gutierrez wrote: Steven D'Aprano wrote: On Thu, 07 Aug 2008 00:44:14 -0700, alex23 wrote: Steven D'Aprano wrote: In other words, about 20% of the time he measures is the time taken to print junk to the screen. Which makes his claim that "all the console outputs have been removed so that the benchmarking activity is not interfered with by the IO overheads" somewhat confusing...he didn't notice the output? Wrote it off as a weird Python side-effect? Wait... I've just remembered, and a quick test confirms... Python only prints bare objects if you are running in an interactive shell. Otherwise output of bare objects is suppressed unless you explicitly call print. Okay, I guess he is forgiven. False alarm, my bad. Well... there must be something, because this is what I got in a normal script execution: [EMAIL PROTECTED] test]$ python iter.py Time per iteration = 357.467989922 microseconds [EMAIL PROTECTED] test]$ vim iter.py [EMAIL PROTECTED] test]$ python iter2.py Time per iteration = 320.306909084 microseconds [EMAIL PROTECTED] test]$ vim iter2.py [EMAIL PROTECTED] test]$ python iter2.py Time per iteration = 312.917997837 microseconds What is the standard deviation on those numbers? What is the confidence level that they are distinct? In a thread complaining about poor benchmarking it's disappointing to see crappy test methodology being used to try and demonstrate flaws in the test. Kris -- http://mail.python.org/mailman/listinfo/python-list
Re: benchmark
jlist wrote: I think what makes more sense is to compare the code one most typically writes. In my case, I always use range() and never use psyco. But I guess for most of my work with Python performance hasn't been a issue. I haven't got to write any large systems with Python yet, where performance starts to matter. Hopefully when you do you will improve your programming practices to not make poor choices - there are few excuses for not using xrange ;) Kris -- http://mail.python.org/mailman/listinfo/python-list
Re: Constructing MIME message without loading message stream
Diez B. Roggisch wrote: Kris Kennaway schrieb: I would like to MIME encode a message from a large file without first loading the file into memory. Assume the file has been pre-encoded on disk (actually I am using encode_7or8bit, so the encoding should be null). Is there a way to construct the flattened MIME message such that data is streamed from the file as needed instead of being resident in memory? Do I have to subclass the MIMEBase class myself? I don't know what you are after here - but I *do* know that anything above 10MB or so is most probably not transferable using mail, as MTAs impose limits on message-sizes. Or in other words: usually, whatever you want to encode should fit in memory as the network is limiting you. MIME encoding is used for other things than emails. Kris -- http://mail.python.org/mailman/listinfo/python-list
Re: benchmark
Peter Otten wrote: [EMAIL PROTECTED] wrote: On Aug 10, 10:10 pm, Kris Kennaway <[EMAIL PROTECTED]> wrote: jlist wrote: I think what makes more sense is to compare the code one most typically writes. In my case, I always use range() and never use psyco. But I guess for most of my work with Python performance hasn't been a issue. I haven't got to write any large systems with Python yet, where performance starts to matter. Hopefully when you do you will improve your programming practices to not make poor choices - there are few excuses for not using xrange ;) Kris And can you shed some light on how that relates with one of the zens of python ? There should be one-- and preferably only one --obvious way to do it. For the record, the impact of range() versus xrange() is negligable -- on my machine the xrange() variant even runs a tad slower. So it's not clear whether Kris actually knows what he's doing. You are only thinking in terms of execution speed. Now think about memory use. Using iterators instead of constructing lists is something that needs to permeate your thinking about python or you will forever be writing code that wastes memory, sometimes to a large extent. Kris -- http://mail.python.org/mailman/listinfo/python-list
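A concrete Python 2 sketch of the memory point (illustrative, not from the thread):

# Both compute the same sum, but range() first materializes a list of
# ten million ints, while xrange() yields them one at a time in
# constant memory.
total = sum(range(10 ** 7))     # builds the whole list first
total = sum(xrange(10 ** 7))    # streams, same result

# The same principle: generator expression vs. list comprehension.
squares = sum(x * x for x in xrange(10 ** 7))    # streams
squares = sum([x * x for x in xrange(10 ** 7)])  # builds the list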
Re: SSH utility
James Brady wrote: Hi all, I'm looking for a python library that lets me execute shell commands on remote machines. I've tried a few SSH utilities so far: paramiko, PySSH and pssh; unfortunately all been unreliable, and repeated questions on their respective mailing lists haven't been answered... It seems like the sort of commodity task that there should be a pretty robust library for. Are there any suggestions for alternative libraries or approaches? Personally I just Popen ssh directly. Things like paramiko make me concerned; getting the SSH protocol right is tricky and not something I want to trust to projects that have not had significant experience and auditing. Kris -- http://mail.python.org/mailman/listinfo/python-list
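A minimal sketch of that Popen-based approach (host and command are placeholders):

import subprocess

def run_remote(host, command):
    # Let the system ssh binary handle the protocol, keys and crypto;
    # BatchMode stops it from hanging on a password prompt.
    proc = subprocess.Popen(
        ["ssh", "-o", "BatchMode=yes", host, command],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out, err

rc, out, err = run_remote("[email protected]", "uptime")
print(rc, out)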
Re: benchmark
Peter Otten wrote: Kris Kennaway wrote: Peter Otten wrote: [EMAIL PROTECTED] wrote: On Aug 10, 10:10 pm, Kris Kennaway <[EMAIL PROTECTED]> wrote: jlist wrote: I think what makes more sense is to compare the code one most typically writes. In my case, I always use range() and never use psyco. But I guess for most of my work with Python performance hasn't been a issue. I haven't got to write any large systems with Python yet, where performance starts to matter. Hopefully when you do you will improve your programming practices to not make poor choices - there are few excuses for not using xrange ;) Kris And can you shed some light on how that relates with one of the zens of python ? There should be one-- and preferably only one --obvious way to do it. For the record, the impact of range() versus xrange() is negligable -- on my machine the xrange() variant even runs a tad slower. So it's not clear whether Kris actually knows what he's doing. You are only thinking in terms of execution speed. Yes, because my remark was made in the context of the particular benchmark supposed to be the topic of this thread. No, you may notice that the above text has moved off onto another discussion. Kris -- http://mail.python.org/mailman/listinfo/python-list
Re: In-place memory manager, mmap (was: Fastest way to store ints and floats on disk)
castironpi wrote: Hi, I've got an "in-place" memory manager that uses a disk-backed memory- mapped buffer. Among its possibilities are: storing variable-length strings and structures for persistence and interprocess communication with mmap. It allocates segments of a generic buffer by length and returns an offset to the reserved block, which can then be used with struct to pack values to store. The data structure is adapted from the GNU PAVL binary tree. Allocated blocks can be cast to ctypes.Structure instances using some monkey patching, which is optional. Want to open-source it. Any interest? Just do it. That way users can come along later. Kris -- http://mail.python.org/mailman/listinfo/python-list
Re: In-place memory manager, mmap
castironpi wrote: On Aug 24, 9:52 am, Kris Kennaway <[EMAIL PROTECTED]> wrote: castironpi wrote: Hi, I've got an "in-place" memory manager that uses a disk-backed memory- mapped buffer. Among its possibilities are: storing variable-length strings and structures for persistence and interprocess communication with mmap. It allocates segments of a generic buffer by length and returns an offset to the reserved block, which can then be used with struct to pack values to store. The data structure is adapted from the GNU PAVL binary tree. Allocated blocks can be cast to ctypes.Structure instances using some monkey patching, which is optional. Want to open-source it. Any interest? Just do it. That way users can come along later. Kris How? My website? Google Code? Too small for source forge, I think. -- http://mail.python.org/mailman/listinfo/python-list Any of those 3 would work fine, but the last two are probably better (sourceforge hosts plenty of tiny projects) if you don't want to have to manage your server and related infrastructure yourself. Kris -- http://mail.python.org/mailman/listinfo/python-list
GDAL installation
Hi Python Users,
I currently installed the Python 2.7.9 and installed the GDAL package.
First, I tried to install GDAL using pip, but it threw an error - I cannot
remember the exact error message. So, I installed it using the easy_install
command. But when I import the package I get this message, which I
really don't understand.
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.
C:\Users\lpalao>python
> Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit (AMD64)]
> on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import gdal
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Python27\Python2.7.9\lib\site-packages\gdal.py", line 2, in <module>
>     from osgeo.gdal import deprecation_warn
>   File "C:\Python27\Python2.7.9\lib\site-packages\osgeo\__init__.py", line 21, in <module>
>     _gdal = swig_import_helper()
>   File "C:\Python27\Python2.7.9\lib\site-packages\osgeo\__init__.py", line 17, in swig_import_helper
>     _mod = imp.load_module('_gdal', fp, pathname, description)
> ImportError: DLL load failed: The specified module could not be found.
> >>>
Thanks in advance,
-Leo
--
https://mail.python.org/mailman/listinfo/python-list
Re: GDAL installation
Hi Asim, thanks for your help. It is working properly now.
Thanks,
-Leo
On Wed, Feb 11, 2015 at 4:48 PM, Asim Jalis wrote:
> Hi Leo,
>
> This might be a PATH issue.
>
> See this discussion for details.
>
>
> https://pythongisandstuff.wordpress.com/2011/07/07/installing-gdal-and-ogr-for-python-on-windows/
>
> Asim
>
> On Tue, Feb 10, 2015 at 9:11 PM, Leo Kris Palao
> wrote:
>
>> Hi Python Users,
>>
>> I currently installed the Python 2.7.9 and installed the GDAL package.
>> First, I tried to install GDAL using pip, but it threw an error - I cannot
>> remember the exact error message. So, I installed it using the easy_install
>> command. But when I import the package I get this message, which I
>> really don't understand.
>>
>> Microsoft Windows [Version 6.1.7601]
>> Copyright (c) 2009 Microsoft Corporation. All rights reserved.
>>
>> C:\Users\lpalao>python
>>> Python 2.7.9 (default, Dec 10 2014, 12:28:03) [MSC v.1500 64 bit
>>> (AMD64)] on win32
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> import gdal
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>>   File "C:\Python27\Python2.7.9\lib\site-packages\gdal.py", line 2, in <module>
>>>     from osgeo.gdal import deprecation_warn
>>>   File "C:\Python27\Python2.7.9\lib\site-packages\osgeo\__init__.py", line 21, in <module>
>>>     _gdal = swig_import_helper()
>>>   File "C:\Python27\Python2.7.9\lib\site-packages\osgeo\__init__.py", line 17, in swig_import_helper
>>>     _mod = imp.load_module('_gdal', fp, pathname, description)
>>> ImportError: DLL load failed: The specified module could not be found.
>>> >>>
>>
>>
>> Thanks in advance,
>> -Leo
>>
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>>
>
--
https://mail.python.org/mailman/listinfo/python-list
Random forest and svm for remote sensing in python
Hi Python Users, Good day! I am currently using ENVI for my image processing/remote sensing work, but would love to move into open-source Python programming for remote sensing. Can you give me some good sites where I can see practical examples of how Python is used for remote sensing, especially using random forest and support vector machine algorithms? Thanks, -Leo -- https://mail.python.org/mailman/listinfo/python-list
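As one starting point (a suggestion, not from a reply in this thread): scikit-learn provides both classifiers, and the classification step can be sketched like this, with array shapes invented for illustration:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Assume `bands` is (n_pixels, n_bands) of spectral values and
# `labels` is (n_pixels,) of training class ids, e.g. from ROIs.
rng = np.random.RandomState(0)
bands = rng.rand(1000, 6)
labels = rng.randint(0, 4, size=1000)

rf = RandomForestClassifier(n_estimators=100).fit(bands, labels)
svm = SVC(kernel="rbf").fit(bands, labels)

# Classify a whole image by flattening it to (rows*cols, n_bands).
image = rng.rand(50, 50, 6)
flat = image.reshape(-1, 6)
classified = rf.predict(flat).reshape(50, 50)
print(classified.shape)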
Configuring problems with GDAL in enthought Python Canopy
Hi ALL, Just wanted to ask if somebody could guide me in installing GDAL in my Python installed using Canopy. Could you give me some steps how to successfully install this package? I got it running using my previous Python Installation, but I removed it and used Canopy Python now. btw: my python installation is located in: C:\Users\lpalao\AppData\Local\Enthought\Canopy\User\Scripts\python.exe Thanks in advance for the help. -Leo -- https://mail.python.org/mailman/listinfo/python-list
GDAL Installation in Enthought Python Distribution
Hi Python Users, Would like to request how to install GDAL in my Enthought Python Distribution (64-bit). I am having some problems making GDAL work. Or can you point me into a blog that describes how to set up GDAL in Enthought Python Distribution. Thanks for any help. -Leo -- https://mail.python.org/mailman/listinfo/python-list
PyObject_CallFunctionObjArgs segfaults
Recently I completed a project where I used PyObject_CallFunctionObjArgs
extensively with the NLTK library from a program written in NASM, with no
problems. Now I am on a new project where I call the Python random library. I
use the same setup as before, but I am getting a segfault with random.seed.
At the start of the NASM program I call a C API program that gets PyObject
pointers to “seed” and “randrange” in the same way as I did before:
int64_t Get_LibModules(int64_t * return_array)
{
    PyObject * pName_random = PyUnicode_FromString("random");
    PyObject * pMod_random = PyImport_Import(pName_random);

    if (pMod_random == 0x0){
        PyErr_Print();
        return 1;}

    PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
    PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, "randrange");

    return_array[0] = (int64_t)pAttr_seed;
    return_array[1] = (int64_t)pAttr_randrange;

    return 0;
}
Later in the same program I call a C API program to call random.seed:
int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
{
    PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_1);

    if (p_seed_calc == 0x0){
        PyErr_Print();
        return 1;}

    //Prepare return values
    long return_val = PyLong_AsLong(p_seed_calc);

    return return_val;
}
The first program correctly imports “random” and gets pointers to “seed” and
“randrange.” I verified that the same pointer is correctly passed into
C_API_2, and the seed value (1234) is passed as Py_ssize_t value_1. But I get
this segfault:
Program received signal SIGSEGV, Segmentation fault.
0x764858d5 in _Py_INCREF (op=0x4d2) at ../Include/object.h:459
459 ../Include/object.h: No such file or directory.
So I tried Py_INCREF in the first program:
Py_INCREF(pMod_random);
Py_INCREF(pAttr_seed);
Then I moved Py_INCREF(pAttr_seed) to the second program. Same segfault.
Finally, I initialized “random” and “seed” in the second program, where they
are used. Same segfault.
The segfault refers to Py_INCREF, so this seems to do with reference counting,
but Py_INCREF didn’t solve it.
I’m using Python 3.8 on Ubuntu.
Thanks for any ideas on how to solve this.
Jen
--
https://mail.python.org/mailman/listinfo/python-list
Re: PyObject_CallFunctionObjArgs segfaults
Thanks very much to @MRAB for taking time to answer. I changed my code to
conform to your answer (as best I understand your comments on references), but
I still get the same error. My comments continue below the new code
immediately below.
int64_t Get_LibModules(int64_t * return_array)
{
    PyObject * pName_random = PyUnicode_FromString("random");
    PyObject * pMod_random = PyImport_Import(pName_random);

    Py_INCREF(pName_random);
    Py_INCREF(pMod_random);

    if (pMod_random == 0x0){
        PyErr_Print();
        return 1;}

    PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
    PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, "randrange");

    Py_INCREF(pAttr_seed);
    Py_INCREF(pAttr_randrange);

    return_array[0] = (int64_t)pAttr_seed;
    return_array[1] = (int64_t)pAttr_randrange;

    return 0;
}
int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
{
    PyObject * value_ptr = (PyObject * )value_1;
    PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_ptr, NULL);

    if (p_seed_calc == 0x0){
        PyErr_Print();
        return 1;}

    //Prepare return values
    long return_val = PyLong_AsLong(p_seed_calc);

    return return_val;
}
So I incremented the reference to all objects in Get_LibModules, but I still
get the same segfault at PyObject_CallFunctionObjArgs. Unfortunately,
reference counting is not well documented so I’m not clear what’s wrong.
Sep 29, 2022, 10:06 by [email protected]:
> On 2022-09-29 16:54, Jen Kris via Python-list wrote:
>
>> Recently I completed a project where I used PyObject_CallFunctionObjArgs
>> extensively with the NLTK library from a program written in NASM, with no
>> problems. Now I am on a new project where I call the Python random library.
>> I use the same setup as before, but I am getting a segfault with
>> random.seed.
>>
>> At the start of the NASM program I call a C API program that gets PyObject
>> pointers to “seed” and “randrange” in the same way as I did before:
>>
>> int64_t Get_LibModules(int64_t * return_array)
>> {
>> PyObject * pName_random = PyUnicode_FromString("random");
>> PyObject * pMod_random = PyImport_Import(pName_random);
>>
> Both PyUnicode_FromString and PyImport_Import return new references or null
> pointers.
>
>> if (pMod_random == 0x0){
>> PyErr_Print();
>>
>
> You're leaking a reference here (pName_random).
>
>> return 1;}
>>
>> PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
>> PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random,
>> "randrange");
>>
>> return_array[0] = (int64_t)pAttr_seed;
>> return_array[1] = (int64_t)pAttr_randrange;
>>
>
> You're leaking 2 references here (pName_random and pMod_random).
>
>> return 0;
>> }
>>
>> Later in the same program I call a C API program to call random.seed:
>>
>> int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
>> {
>> PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_1);
>>
>
> It's expecting all of the arguments to be PyObject*, but value_1 is
> Py_ssize_t instead of PyObject* (a pointer to a _Python_ int).
>
> The argument list must end with a null pointer.
>
> It returns a new reference or a null pointer.
>
>>
>> if (p_seed_calc == 0x0){
>> PyErr_Print();
>> return 1;}
>>
>> //Prepare return values
>> long return_val = PyLong_AsLong(p_seed_calc);
>>
> You're leaking a reference here (p_seed_calc).
>
>> return return_val;
>> }
>>
>> The first program correctly imports “random” and gets pointers to “seed” and
>> “randrange.” I verified that the same pointer is correctly passed into
>> C_API_2, and the seed value (1234) is passed as Py_ssize_t value_1. But I
>> get this segfault:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x764858d5 in _Py_INCREF (op=0x4d2) at ../Include/object.h:459
>> 459 ../Include/object.h: No such file or directory.
>>
>> So I tried Py_INCREF in the first program:
>>
>> Py_INCREF(pMod_random);
>> Py_INCREF(pAttr_seed);
>>
>> Then I moved Py_INCREF(pAttr_seed) to the second program. Same segfault.
>>
>> Finally, I initialized “random” and “seed” in the second program, where they
>> are used. Same segfault.
>>
>> The segfault refers to Py_INCREF, so this seems to do with reference
>> counting, but Py_INCREF didn’t solve it.
>>
>> I’m using Python 3.8 on Ubuntu.
>>
>> Thanks for any ideas on how to solve this.
>>
>> Jen
>>
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>
--
https://mail.python.org/mailman/listinfo/python-list
Re: PyObject_CallFunctionObjArgs segfaults
To update my previous email, I found the problem, but I have a new problem.

Previously I cast PyObject * value_ptr = (PyObject * )value_1, but that's not correct. Instead I used PyObject * value_ptr = PyLong_FromLong(value_1), and that works. HOWEVER, while PyObject_CallFunctionObjArgs does work now, it returns -1, which is not the right answer for random.seed. I use "long return_val = PyLong_AsLong(p_seed_calc);" to convert it to a long.

So my question is: why do I get -1 as the return value? When I query p_seed_calc I get:

(gdb) p p_seed_calc
$2 = (PyObject *) 0x769be120 <_Py_NoneStruct>

Thanks again.

Jen

Sep 29, 2022, 13:02 by [email protected]:

> Thanks very much to @MRAB for taking time to answer. I changed my code to
> conform to your answer (as best I understand your comments on references), but
> I still get the same error.

--
https://mail.python.org/mailman/listinfo/python-list
Re: PyObject_CallFunctionObjArgs segfaults
I just solved this C API problem, and I’m posting the answer to help anyone
else who might need it.
The errors were:
(1) we must call Py_INCREF on each object when it’s created.
(2) in C_API_2 (see below) we don’t cast value_1 as I did before with PyObject
* value_ptr = (PyObject * )value_1. Instead we use PyObject * value_ptr =
PyLong_FromLong(value_1);
(3) The argument list passed to PyObject_CallFunctionObjArgs must be terminated with NULL.
Here’s the revised code:
First we load the modules, and increment the reference to each object:
int64_t Get_LibModules(int64_t * return_array)
{
PyObject * pName_random = PyUnicode_FromString("random");
PyObject * pMod_random = PyImport_Import(pName_random);
Py_INCREF(pName_random);
Py_INCREF(pMod_random);
if (pMod_random == 0x0){
PyErr_Print();
return 1;}
PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random, "randrange");
Py_INCREF(pAttr_seed);
Py_INCREF(pAttr_randrange);
return_array[0] = (int64_t)pAttr_seed;
return_array[1] = (int64_t)pAttr_randrange;
return 0;
}
Next we call a program to initialize the random number generator with
random.seed(), and increment the reference to its return value p_seed_calc:
int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
{
PyObject * value_ptr = PyLong_FromLong(value_1);
PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_ptr,
NULL);
if (p_seed_calc == 0x0){
PyErr_Print();
return 1;}
Py_INCREF(p_seed_calc);
return 0;
}
Now we call another program to get a random number:
int64_t C_API_12(PyObject * pAttr_randrange, Py_ssize_t value_1)
{
PyObject * value_ptr = PyLong_FromLong(value_1);
PyObject * p_randrange_calc = PyObject_CallFunctionObjArgs(pAttr_randrange,
value_ptr, NULL);
if (p_randrange_calc == 0x0){
PyErr_Print();
return 1;}
//Prepare return values
long return_val = PyLong_AsLong(p_randrange_calc);
return return_val;
}
That returns 28, which is what I get from the Python command line.
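As a cross-check from the Python side, these are the same two calls at the
interpreter (the seed and range values here are placeholders for whatever the
NASM caller actually passes in):

import random

print(random.seed(17))       # None -- seed() has no return value
print(random.randrange(50))  # the int that C_API_12 converts with PyLong_AsLong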
Thanks again to MRAB for helpful comments.
Jen
Sep 29, 2022, 15:31 by [email protected]:
> On 2022-09-29 21:47, Jen Kris wrote:
>
>> To update my previous email, I found the problem, but I have a new problem.
>>
>> Previously I cast PyObject * value_ptr = (PyObject * )value_1 but that's not
>> correct. Instead I used PyObject * value_ptr = PyLong_FromLong(value_1) and
>> that works. HOWEVER, while PyObject_CallFunctionObjArgs does work now, it
>> returns -1, which is not the right answer for random.seed. I use "long
>> return_val = PyLong_AsLong(p_seed_calc);" to convert it to a long.
>>
> random.seed returns None, so when you call PyObject_CallFunctionObjArgs it
> returns a new reference to Py_None.
>
> If you then pass to PyLong_AsLong a reference to something that's not a
> PyLong, it'll set an error and return -1.
>
>> So my question is why do I get -1 as return value? When I query p_seed calc
>> : get:
>>
>> (gdb) p p_seed_calc
>> $2 = (PyObject *) 0x769be120 <_Py_NoneStruct>
>>
> Exactly. It's Py_None, not a PyLong.
>
>> Thanks again.
>>
>> Jen
>>
>>
>>
>>
>> Sep 29, 2022, 13:02 by [email protected]:
>>
>> Thanks very much to @MRAB for taking time to answer. I changed my
>> code to conform to your answer (as best I understand your comments
>> on references), but I still get the same error. My comments
>> continue below the new code immediately below.
>>
>> int64_t Get_LibModules(int64_t * return_array)
>> {
>> PyObject * pName_random = PyUnicode_FromString("random");
>> PyObject * pMod_random = PyImport_Import(pName_random);
>>
>> Py_INCREF(pName_random);
>> Py_INCREF(pMod_random);
>>
>> if (pMod_random == 0x0){
>> PyErr_Print();
>> return 1;}
>>
>> PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
>> PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random,
>> "randrange");
>>
>> Py_INCREF(pAttr_seed);
>> Py_INCREF(pAttr_randrange);
>>
>> return_array[0] = (int64_t)pAttr_seed;
>> return_array[1] = (int64_t)pAttr_randrange;
>>
>> return 0;
>> }
>>
>> int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1)
>> {
>> PyObject * value_ptr = (PyObject * )value_1;
>> PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed,
>> value_ptr, NULL);
>>
>> if (p_seed_calc == 0x0){
>> PyErr_Print();
>> return 1;}
>>
>> //Prepare return values
>> long return_val = PyLong_AsLong(p_seed_calc);
>>
>> return return_val;
>> }
Re: PyObject_CallFunctionObjArgs segfaults
Thanks very much for your detailed reply. I have a few followup questions.
You said, “Some functions return an object that has already been incref'ed
("new reference"). This occurs when it has either created a new object (the
refcount will be 1) or has returned a pointer to an existing object (the
refcount will be > 1 because it has been incref'ed). Other functions return an
object that hasn't been incref'ed. This occurs when you're looking up
something, for example, looking at a member of a list or the value of an
attribute.”
In the official docs some functions show “Return value: New reference” and
others do not. Is there any reason why I should not just INCREF on every new
object, regardless of whether it’s a new reference or not, and DECREF when I am
finished with it? The answer at
https://stackoverflow.com/questions/59870703/python-c-extension-need-to-py-incref-a-borrowed-reference-if-not-returning-it-to
says “With out-of-order execution, the INCREF/DECREF are basically free
operations, so performance is no reason to leave them out.” Doing so means I
don’t have to check each object to see if it needs to be INCREF’d or not, and
that is a big help.
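For anyone following along, the counts themselves can be watched from the
Python side with sys.getrefcount -- a quick sketch (CPython-specific; note
that the count it reports includes its own temporary argument reference):

import sys

a = object()               # fresh object, one reference held by 'a'
print(sys.getrefcount(a))  # 2: 'a' plus getrefcount's own argument
b = a
print(sys.getrefcount(a))  # 3: 'a', 'b', plus the argument
del b
print(sys.getrefcount(a))  # back to 2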
Also:
What is a borrowed reference, and how does it affect reference counting?
According to https://jayrambhia.com/blog/pythonc-api-reference-counting, “Use
Py_INCREF on a borrowed PyObject pointer you already have. This increments the
reference count on the object, and obligates you to dispose of it properly.”
So I guess it’s yes, but I’m confused by “pointer you already have.”
What does it mean to steal a reference? If a function steals a reference does
it have to decref it without incref (because it’s stolen)?
Finally, you said:
if (pMod_random == 0x0){
PyErr_Print();
Leaks here because of the refcount
Assuming pMod_random is not null, why would this leak?
Thanks again for your input on this question.
Jen
Sep 29, 2022, 17:33 by [email protected]:
> On 2022-09-30 01:02, MRAB wrote:
>
>> On 2022-09-29 23:41, Jen Kris wrote:
>>
>>>
>>> I just solved this C API problem, and I’m posting the answer to help anyone
>>> else who might need it.
>>>
> [snip]
>
> What I like to do is write comments that state which variables hold a
> reference, followed by '+' if it's a new reference (incref'ed) and '?' if it
> could be null. '+?' means that it's probably a new reference but could be
> null. Once I know that it's not null, I can remove the '?', and once I've
> decref'ed it (if required) and no longer need it, I remove it from the
> comment.
>
> Clearing up references, as soon as they're not needed, helps to keep the
> number of current references more manageable.
>
>
> int64_t Get_LibModules(int64_t * return_array) {
> PyObject * pName_random = PyUnicode_FromString("random");
> //> pName_random+?
> if (!pName_random) {
> PyErr_Print();
> return 1;
> }
>
> //> pName_random+
> PyObject * pMod_random = PyImport_Import(pName_random);
> //> pName_random+ pMod_random+?
> Py_DECREF(pName_random);
> //> pMod_random+?
> if (!pMod_random) {
> PyErr_Print();
> return 1;
> }
>
> //> pMod_random+
> PyObject * pAttr_seed = PyObject_GetAttrString(pMod_random, "seed");
> //> pMod_random+ pAttr_seed?
> if (!pAttr_seed) {
> Py_DECREF(pMod_random);
> PyErr_Print();
> return 1;
> }
>
> //> pMod_random+ pAttr_seed
> PyObject * pAttr_randrange = PyObject_GetAttrString(pMod_random,
> "randrange");
> //> pMod_random+ pAttr_seed pAttr_randrange?
> Py_DECREF(pMod_random);
> //> pAttr_seed pAttr_randrange?
> if (!pAttr_randrange) {
> PyErr_Print();
> return 1;
> }
>
> //> pAttr_seed pAttr_randrange
> return_array[0] = (int64_t)pAttr_seed;
> return_array[1] = (int64_t)pAttr_randrange;
>
> return 0;
> }
>
> int64_t C_API_2(PyObject * pAttr_seed, Py_ssize_t value_1) {
> PyObject * value_ptr = PyLong_FromLong(value_1);
> //> value_ptr+?
> if (!value_ptr) {
> PyErr_Print();
> return 1;
> }
>
> //> value_ptr+
> PyObject * p_seed_calc = PyObject_CallFunctionObjArgs(pAttr_seed, value_ptr,
> NULL);
> //> value_ptr+ p_seed_calc+?
> Py_DECREF(value_ptr);
> //> p_seed_calc+?
> if (!p_seed_calc) {
> PyErr_Print();
> return 1;
> }
>
> //> p_seed_calc+
> Py_DECREF(p_seed_calc);
> return 0;
> }
>
> int64_t C_API_12(PyObject * pAttr_randrange, Py_ssize_t value_1) {
> PyObject * value_ptr = PyLong_FromLong(value_1);
> //> value_ptr+?
> if (!value_ptr) {
> PyErr_Print();
> return 1;
> }
>
> //> value_ptr+
> PyObject * p_randrange_calc = PyObject_CallFunctionObjArgs(pAttr_randrange,
> value_ptr, NULL);
> //> value_ptr+ p_randrange_calc+?
> Py_DECREF(value_ptr);
> //> p_randrange_calc+?
> if (!p_randrange_calc) {
> PyErr_Print();
> return 1;
> }
>
> //> p_randrange_calc+
> long return_val = PyLong_AsLong(p_randrange_calc);
> Py_DECREF(p_randrange_calc);
> return return_val;
> }
Re: PyObject_CallFunctionObjArgs segfaults
That's great. It clarifies things a lot for me, particularly re the ref count
for new references. I would have had trouble if I didn't decref it twice.
Thanks very much once again.

Sep 30, 2022, 12:18 by [email protected]:

> On 2022-09-30 17:02, Jen Kris wrote:
>
>> Thanks very much for your detailed reply. I have a few followup questions.
>>
>> You said, “Some functions return an object that has already been incref'ed
>> ("new reference"). This occurs when it has either created a new object (the
>> refcount will be 1) or has returned a pointer to an existing object (the
>> refcount will be > 1 because it has been incref'ed). Other functions return
>> an object that hasn't been incref'ed. This occurs when you're looking up
>> something, for example, looking at a member of a list or the value of an
>> attribute.”
>>
>> In the official docs some functions show “Return value: New reference” and
>> others do not. Is there any reason why I should not just INCREF on every
>> new object, regardless of whether it’s a new reference or not, and DECREF
>> when I am finished with it? The answer at
>> https://stackoverflow.com/questions/59870703/python-c-extension-need-to-py-incref-a-borrowed-reference-if-not-returning-it-to
>> says “With out-of-order execution, the INCREF/DECREF are basically free
>> operations, so performance is no reason to leave them out.” Doing so means
>> I don’t have to check each object to see if it needs to be INCREF’d or not,
>> and that is a big help.
>>
> It's OK to INCREF them, provided that you DECREF them when you no longer
> need them, and remember that if it's a "new reference" you'd need to DECREF
> it twice.
>
>> Also:
>>
>> What is a borrowed reference, and how does it affect reference counting?
>> According to https://jayrambhia.com/blog/pythonc-api-reference-counting,
>> “Use Py_INCREF on a borrowed PyObject pointer you already have. This
>> increments the reference count on the object, and obligates you to dispose
>> of it properly.” So I guess it’s yes, but I’m confused by “pointer you
>> already have.”
>>
> A borrowed reference is when it hasn't been INCREFed.
>
> You can think of INCREFing as a way of indicating ownership, which is often
> shared ownership (refcount > 1). When you're borrowing a reference, you're
> using it temporarily, but not claiming ownership. When the last owner
> releases its ownership (DECREF reduces the refcount to 0), the object can be
> garbage collected.
>
> When, say, you lookup an attribute, or get an object from a list with
> PyList_GetItem, it won't have been INCREFed. You're using it temporarily,
> just borrowing a reference.
>
>> What does it mean to steal a reference? If a function steals a reference
>> does it have to decref it without incref (because it’s stolen)?
>>
> When a function steals a reference, it's claiming ownership but not
> INCREFing it.
>
>> Finally, you said:
>>
>> if (pMod_random == 0x0){
>> PyErr_Print();
>> Leaks here because of the refcount
>>
>> Assuming pMod_random is not null, why would this leak?
>>
> It's pName_random that's the leak.
>
> PyUnicode_FromString("random") will either create and return a new object
> for the string "random" (refcount == 1) or return a reference to an existing
> object (refcount > 1). You need to DECREF it before returning from the
> function.
>
> Suppose it created a new object. You call the function, it creates an
> object, you use it, then return from the function. The object still exists,
> but there's no reference to it. Now call the function again. It creates
> another object, you use it, then return from the function. You now have 2
> objects with no reference to them.
>
>> Thanks again for your input on this question.
>>
>> Jen
>>
>> Sep 29, 2022, 17:33 by [email protected]:
>>
>> On 2022-09-30 01:02, MRAB wrote:
>>
>> On 2022-09-29 23:41, Jen Kris wrote:
>>
>> I just solved this C API problem, and I’m posting the
>> answer to help anyone else who might need it.
>>
>> [snip]
>>
>> What I like to do is write comments that state which variables
>> hold a reference, followed by '+' if it's a new reference
>> (incref'ed) and '?' if it could be null. '+?' means that it's
>> probably a new reference but could be null.
Debugging Python C extensions with GDB
In September 2021, Victor Stinner wrote “Debugging Python C extensions with
GDB”
(https://developers.redhat.com/articles/2021/09/08/debugging-python-c-extensions-gdb#getting_started_with_python_3_9).

My question is: with Python 3.9+, can I debug into a C extension written in
pure C and called from ctypes -- that is not written using the C_API?
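To make the question concrete, this is the shape of the call I want to step
into (the library and function names here are invented for illustration, not
my actual code):

import ctypes

# hypothetical shared library exposing a plain C function: int compute(int)
lib = ctypes.CDLL("./libfoo.so")
lib.compute.argtypes = [ctypes.c_int]
lib.compute.restype = ctypes.c_int
print(lib.compute(42))  # the goal: break inside compute() under gdb

Thanks.

Jen

--
https://mail.python.org/mailman/listinfo/python-list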
Re: Debugging Python C extensions with GDB
Thanks for your reply. Victor's article didn't mention ctypes extensions, so
I wanted to post a question before I build from source.

Nov 14, 2022, 14:32 by [email protected]:

>
>> On 14 Nov 2022, at 19:10, Jen Kris via Python-list wrote:
>>
>> In September 2021, Victor Stinner wrote “Debugging Python C extensions
>> with GDB”
>> (https://developers.redhat.com/articles/2021/09/08/debugging-python-c-extensions-gdb#getting_started_with_python_3_9).
>>
>> My question is: with Python 3.9+, can I debug into a C extension written
>> in pure C and called from ctypes -- that is not written using the C_API?
>>
> Yes.
>
> Just put a breakpoint on the function in the c library that you want to
> debug. You can set the breakpoint before a .so is loaded.
>
> Barry
>
>> Thanks.
>>
>> Jen
>>
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>

--
https://mail.python.org/mailman/listinfo/python-list
To clarify how Python handles two equal objects
I am writing a spot speedup in assembly language for a short but
computation-intensive Python loop, and I discovered something about Python
array handling that I would like to clarify.

For a simplified example, I created a matrix mx1 and assigned the array arr1
to the third row of the matrix:

mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
arr1 = mx1[2]

The pointers to these are now the same:

ida = id(mx1[2]) - 140260325306880
idb = id(arr1) - 140260325306880

That’s great because when I encounter this in assembly or C, I can just
borrow the pointer to row 3 for the array arr1, on the assumption that they
will continue to point to the same object. Then when I do any math
operations in arr1 it will be reflected in both arrays because they are now
pointing to the same array:

arr1[0] += 2
print(mx1[2]) - [9, 8, 9]
print(arr1) - [9, 8, 9]

Now mx1 looks like this:

[ 1, 2, 3 ]
[ 4, 5, 6 ]
[ 9, 8, 9 ]

and it stays that way for remaining iterations. But on the next iteration we
assign arr1 to something else:

arr1 = [ 10, 11, 12 ]
idc = id(arr1) – 140260325308160
idd = id(mx1[2]) – 140260325306880

Now arr1 is no longer equal to mx1[2], and any subsequent operations in arr1
will not affect mx1.

So where I’m rewriting some Python code in a low level language, I can’t
assume that the two objects are equal because that equality will not remain
if either is reassigned. So if I do some operation on one array I have to
conform the two arrays for as long as they remain equal; I can’t just do it
in one operation because I can’t rely on the objects remaining equal.

Is my understanding of this correct? Is there anything I’m missing?
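The aliasing can also be checked directly with the "is" operator, which
compares object identity just as the matching id() values do -- a quick
sketch of the same example:

mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
arr1 = mx1[2]
print(arr1 is mx1[2])   # True: two names, one list object
arr1[0] += 2
print(mx1[2])           # [9, 8, 9] -- the change shows through both names
arr1 = [ 10, 11, 12 ]
print(arr1 is mx1[2])   # False: arr1 now names a different object

Thanks very much.

Jen

--
https://mail.python.org/mailman/listinfo/python-list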
Re: To clarify how Python handles two equal objects
Thanks for your comments. I'd like to make one small point. You say: "Assignment in Python is a matter of object references. It's not "conform them as long as they remain equal". You'll have to think in terms of object references the entire way." But where they have been set to the same object, an operation on one will affect the other as long as they are equal (in Python). So I will have to conform them in those cases because Python will reflect any math operation in both the array and the matrix. Jan 10, 2023, 12:28 by [email protected]: > On Wed, 11 Jan 2023 at 07:14, Jen Kris via Python-list > wrote: > >> >> I am writing a spot speedup in assembly language for a short but >> computation-intensive Python loop, and I discovered something about Python >> array handling that I would like to clarify. >> >> For a simplified example, I created a matrix mx1 and assigned the array arr1 >> to the third row of the matrix: >> >> mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ] >> arr1 = mx1[2] >> >> The pointers to these are now the same: >> >> ida = id(mx1[2]) - 140260325306880 >> idb = id(arr1) - 140260325306880 >> >> That’s great because when I encounter this in assembly or C, I can just >> borrow the pointer to row 3 for the array arr1, on the assumption that they >> will continue to point to the same object. Then when I do any math >> operations in arr1 it will be reflected in both arrays because they are now >> pointing to the same array: >> > > That's not an optimization; what you've done is set arr1 to be a > reference to that object. > >> But on the next iteration we assign arr1 to something else: >> >> arr1 = [ 10, 11, 12 ] >> idc = id(arr1) – 140260325308160 >> idd = id(mx1[2]) – 140260325306880 >> >> Now arr1 is no longer equal to mx1[2], and any subsequent operations in arr1 >> will not affect mx1. >> > > Yep, you have just set arr1 to be a completely different object. > >> So where I’m rewriting some Python code in a low level language, I can’t >> assume that the two objects are equal because that equality will not remain >> if either is reassigned. So if I do some operation on one array I have to >> conform the two arrays for as long as they remain equal, I can’t just do it >> in one operation because I can’t rely on the objects remaining equal. >> >> Is my understanding of this correct? Is there anything I’m missing? >> > > Assignment in Python is a matter of object references. It's not > "conform them as long as they remain equal". You'll have to think in > terms of object references the entire way. > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: To clarify how Python handles two equal objects
There are cases where NumPy would be the best choice, but that wasn’t the case here with what the loop was doing. To sum up what I learned from this post, where one object derives from another object (a = b[0], for example), any operation that would alter one will alter the other. When either is assigned to something else, then they no longer point to the same memory location and they’re once again independent. I hope the word "derives" sidesteps the semantic issue of whether they are "equal." Thanks to all who replied to this post. Jen Jan 10, 2023, 13:59 by [email protected]: > Just to add a possibly picky detail to what others have said, Python does not > have an "array" type. It has a "list" type, as well as some other, not > necessarily mutable, sequence types. > > If you want to speed up list and matrix operations, you might use NumPy. Its > arrays and matrices are heavily optimized for fast processing and provide > many useful operations on them. No use calling out to C code yourself when > NumPy has been refining that for many years. > > On 1/10/2023 4:10 PM, MRAB wrote: > >> On 2023-01-10 20:41, Jen Kris via Python-list wrote: >> >>> >>> Thanks for your comments. I'd like to make one small point. You say: >>> >>> "Assignment in Python is a matter of object references. It's not >>> "conform them as long as they remain equal". You'll have to think in >>> terms of object references the entire way." >>> >>> But where they have been set to the same object, an operation on one will >>> affect the other as long as they are equal (in Python). So I will have to >>> conform them in those cases because Python will reflect any math operation >>> in both the array and the matrix. >>> >> It's not a 2D matrix, it's a 1D list containing references to 1D lists, each >> of which contains references to Python ints. >> >> In CPython, references happen to be pointers, but that's just an >> implementation detail. >> >>> >>> >>> Jan 10, 2023, 12:28 by [email protected]: >>> >>>> On Wed, 11 Jan 2023 at 07:14, Jen Kris via Python-list >>>> wrote: >>>> >>>>> >>>>> I am writing a spot speedup in assembly language for a short but >>>>> computation-intensive Python loop, and I discovered something about >>>>> Python array handling that I would like to clarify. >>>>> >>>>> For a simplified example, I created a matrix mx1 and assigned the array >>>>> arr1 to the third row of the matrix: >>>>> >>>>> mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ] >>>>> arr1 = mx1[2] >>>>> >>>>> The pointers to these are now the same: >>>>> >>>>> ida = id(mx1[2]) - 140260325306880 >>>>> idb = id(arr1) - 140260325306880 >>>>> >>>>> That’s great because when I encounter this in assembly or C, I can just >>>>> borrow the pointer to row 3 for the array arr1, on the assumption that >>>>> they will continue to point to the same object. Then when I do any math >>>>> operations in arr1 it will be reflected in both arrays because they are >>>>> now pointing to the same array: >>>>> >>>> >>>> That's not an optimization; what you've done is set arr1 to be a >>>> reference to that object. >>>> >>>>> But on the next iteration we assign arr1 to something else: >>>>> >>>>> arr1 = [ 10, 11, 12 ] >>>>> idc = id(arr1) – 140260325308160 >>>>> idd = id(mx1[2]) – 140260325306880 >>>>> >>>>> Now arr1 is no longer equal to mx1[2], and any subsequent operations in >>>>> arr1 will not affect mx1. >>>>> >>>> >>>> Yep, you have just set arr1 to be a completely different object. 
>>>> >>>>> So where I’m rewriting some Python code in a low level language, I can’t >>>>> assume that the two objects are equal because that equality will not >>>>> remain if either is reassigned. So if I do some operation on one array I >>>>> have to conform the two arrays for as long as they remain equal, I can’t >>>>> just do it in one operation because I can’t rely on the objects remaining >>>>> equal. >>>>> >>>>> Is my understanding of this correct? Is there anything I’m missing? >>>>> >>>> >>>> Assignment in Python is a matter of object references. It's not >>>> "conform them as long as they remain equal". You'll have to think in >>>> terms of object references the entire way. >>>> >>>> ChrisA >>>> -- >>>> https://mail.python.org/mailman/listinfo/python-list >>>> > > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: To clarify how Python handles two equal objects
Yes, I did understand that. In your example, "a" and "b" are the same
pointer, so an operation on one is an operation on the other (because
they’re the same memory block). My issue in Python came up because Python
can dynamically change one or the other to a different object (memory
block), so I have to be aware of that when handling this kind of situation.

Jan 10, 2023, 17:31 by [email protected]:

> On 11/01/23 11:21 am, Jen Kris wrote:
>
>> where one object derives from another object (a = b[0], for example),
>> any operation that would alter one will alter the other.
>>
> I think you're still confused. In C terms, after a = b[0], a and b[0]
> are pointers to the same block of memory. If you change that block of
> memory, then of course you will see the change through either pointer.
>
> Here's a rough C translation of some of your Python code:
>
> /* mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ] */
> int **mx1 = (int **)malloc(3 * sizeof(int *));
> mx1[0] = (int *)malloc(3 * sizeof(int));
> mx1[0][0] = 1;
> mx1[0][1] = 2;
> mx1[0][2] = 3;
> mx1[1] = (int *)malloc(3 * sizeof(int));
> mx1[1][0] = 4;
> mx1[1][1] = 5;
> mx1[1][2] = 6;
> mx1[2] = (int *)malloc(3 * sizeof(int));
> mx1[2][0] = 7;
> mx1[2][1] = 8;
> mx1[2][2] = 9;
>
> /* arr1 = mx1[2] */
> int *arr1 = mx1[2];
>
> /* arr1 = [ 10, 11, 12 ] */
> arr1 = (int *)malloc(3 * sizeof(int));
> arr1[0] = 10;
> arr1[1] = 11;
> arr1[2] = 12;
>
> Does that help your understanding?
>
> --
> Greg
> --
> https://mail.python.org/mailman/listinfo/python-list

--
https://mail.python.org/mailman/listinfo/python-list
Re: To clarify how Python handles two equal objects
Thanks for your comments. After all, I asked for clarity so it’s not pedantic to be precise, and you’re helping to clarify. Going back to my original post, mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ] arr1 = mx1[2] Now if I write "arr1[1] += 5" then both arr1 and mx1[2][1] will be changed because while they are different names, they are the assigned same memory location (pointer). Similarly, if I write "mx1[2][1] += 5" then again both names will be updated. That’s what I meant by "an operation on one is an operation on the other." To be more precise, an operation on one name will be reflected in the other name. The difference is in the names, not the pointers. Each name has the same pointer in my example, but operations can be done in Python using either name. Jan 11, 2023, 09:13 by [email protected]: > Op 11/01/2023 om 16:33 schreef Jen Kris via Python-list: > >> Yes, I did understand that. In your example, "a" and "b" are the same >> pointer, so an operation on one is an operation on the other (because >> they’re the same memory block). >> > > Sorry if you feel I'm being overly pedantic, but your explanation "an > operation on one is an operation on the other (because they’re the same > memory block)" still feels a bit misguided. "One" and "other" still make it > sound like there are two objects, and "an operation on one" and "an operation > on the other" make it sound like there are two operations. > Sometimes it doesn't matter if we're a bit sloppy for sake of simplicity or > convenience, sometimes we really need to be precise. I think this is a case > where we need to be precise. > > So, to be precise: there is only one object, with possible multiple names to > it. We can change the object, using one of the names. That is one and only > one operation on one and only one object. Since the different names refer to > the same object, that change will of course be visible through all of them. > Note that 'name' in that sentence doesn't just refer to variables (mx1, arr1, > ...) but also things like indexed lists (mx1[0], mx1[[0][0], ...), loop > variables, function arguments. > > The correct mental model is important here, and I do think you're on track or > very close to it, but the way you phrase things does give me that nagging > feeling that you still might be just a bit off. > > -- > "Peace cannot be kept by force. It can only be achieved through > understanding." > -- Albert Einstein > > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
RE: To clarify how Python handles two equal objects
Avi,

Thanks for your comments. You make a good point. Going back to my original
question, and using your slice() example:

middle_by_two = slice(5, 10, 2)
nums = [n for n in range(12)]
q = nums[middle_by_two]
x = id(q)
b = q
y = id(b)

If I assign "b" to "q", then x and y match – they point to the same memory
until "b" OR "q" are reassigned to something else. If "q" changes during the
lifetime of "b" then it’s not safe to use the pointer to "q" for "b", as in:

nums = [n for n in range(2, 14)]
q = nums[middle_by_two]
x = id(q)
y = id(b)

Now "x" and "y" are different, as we would expect. So when writing a spot
speedup in a compiled language, you can see in the Python source if either is
reassigned, so you’ll know how to handle it. The motivation behind my
question was that in a compiled extension it’s faster to borrow a pointer
than to move an entire array if it’s possible, but special care must be
taken.

Jen

Jan 12, 2023, 20:51 by [email protected]:

> Jen,
>
> It is dangerous territory you are treading, as there are times all or
> parts of objects are copied, or changed in place, or the method you use to
> make a view is not doing quite what you want.
>
> As an example, you can create a named slice such as:
>
> middle_by_two = slice(5, 10, 2)
>
> The above is not in any sense pointing at anything yet. But given a long
> enough list or other such object, it will take the items (starting at
> index 0) that are at indices 5, then 7, then 9, as in this:
>
> nums = [n for n in range(12)]
> nums[middle_by_two]
> [5, 7, 9]
>
> The same slice will work on anything else:
>
> list('abcdefghijklmnopqrstuvwxyz')[middle_by_two]
> ['f', 'h', 'j']
>
> So although you may think the slice is bound to something, it is not. It
> is an object that only later is briefly connected to whatever you want to
> apply it to.
>
> If I later change nums, above, like this:
>
> nums = [-3, -2, -1] + nums
> nums
> [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
> nums[middle_by_two]
> [2, 4, 6]
>
> In the example, you can forget about whether we are talking about pointers
> directly or indirectly or variable names and so on. Your "view" remains
> valid ONLY as long as you do not change either the slice or the underlying
> object you are applying it to -- at least not the items you want to
> extract.
>
> Since my example inserted three new items at the start, using negative
> numbers for illustration, you would need to adjust the slice by making a
> new slice designed to fit your new data. The example below creates an
> adjusted slice that adds 3 to the start and stop settings of the previous
> slice while copying the step value, and then it works on the elongated
> object:
>
> middle_by_two_adj = slice(middle_by_two.start + 3, middle_by_two.stop + 3,
> middle_by_two.step)
> nums[middle_by_two_adj]
> [5, 7, 9]
>
> A suggestion is that whenever you are not absolutely sure that the
> contents of some data structure might change without your participation,
> then don't depend on various kinds of aliases to keep the contents
> synchronized. Make a copy, perhaps a deep copy, and make sure the only
> thing ever changing it is your code, and later, if needed, copy the result
> back to any other data structure. Of course, if anything else is accessing
> the result in the original in between, it won't work.
>
> Just FYI, a similar analysis applies to uses of the numpy and pandas and
> other modules if you get some kind of object holding indices to a series
> such as integers or Booleans and then later try using it after the number
> of items or rows or columns have changed. Your indices no longer match.
>
> Avi
>
> -Original Message-
> From: Python-list On Behalf Of Jen Kris via Python-list
> Sent: Wednesday, January 11, 2023 1:29 PM
> To: Roel Schroeven
> Cc: [email protected]
> Subject: Re: To clarify how Python handles two equal objects
>
> Thanks for your comments. After all, I asked for clarity so it’s not
> pedantic to be precise, and you’re helping to clarify.
>
> Going back to my original post,
>
> mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
> arr1 = mx1[2]
>
> Now if I write "arr1[1] += 5" then both arr1 and mx1[2][1] will be changed
> because while they are different names, they are assigned the same memory
> location (pointer). Similarly, if I write "mx1[2][1] += 5" then again both
> names will be updated.
>
> That’s what I meant by "an operation on one is an operation on the other."
Re: To clarify how Python handles two equal objects
Bob, Your examples show a and b separately defined. My example is where the definition is a=1; b = a. But I'm only interested in arrays. I would not rely on this for integers, and there's not likely to be any real cost savings there. Jan 13, 2023, 08:45 by [email protected]: > It seems to me that the the entire concept of relying on python's idea of > where an object is stored is just plain dangerous. A most simple example > might be: > >>> a=1 > >>> b=1 > >>> a is b > True > >>> a=1234 > >>> b=1234 > >>> a is b > False > > Not sure what happens if you manipulate the data referenced by 'b' in the > first example thinking you are changing something referred to by 'a' ... but > you might be smart to NOT think that you know. > > > > On Fri, Jan 13, 2023 at 9:00 AM Jen Kris via Python-list <> > [email protected]> > wrote: > >> >> Avi, >> >> Thanks for your comments. You make a good point. >> >> Going back to my original question, and using your slice() example: >> >> middle_by_two = slice(5, 10, 2) >> nums = [n for n in range(12)] >> q = nums[middle_by_two] >> x = id(q) >> b = q >> y = id(b) >> >> If I assign "b" to "q", then x and y match – they point to the same memory >> until "b" OR "q" are reassigned to something else. If "q" changes during >> the lifetime of "b" then it’s not safe to use the pointer to "q" for "b", as >> in: >> >> nums = [n for n in range(2, 14)] >> q = nums[middle_by_two] >> x = id(q) >> y = id(b) >> >> Now "x" and "y" are different, as we would expect. So when writing a spot >> speed up in a compiled language, you can see in the Python source if either >> is reassigned, so you’ll know how to handle it. The motivation behind my >> question was that in a compiled extension it’s faster to borrow a pointer >> than to move an entire array if it’s possible, but special care must be >> taken. >> >> Jen >> >> >> >> Jan 12, 2023, 20:51 by >> [email protected]>> : >> >> > Jen, >> > >> > It is dangerous territory you are treading as there are times all or >> parts of objects are copied, or changed in place or the method you use to >> make a view is not doing quite what you want. >> > >> > As an example, you can create a named slice such as: >> > >> > middle_by_two = slice(5, 10, 2) >> > >> > The above is not in any sense pointing at anything yet. But given a long >> enough list or other such objects, it will take items (starting at index 0) >> starting with item that are at indices 5 then 7 then 9 as in this: >> > >> > nums = [n for n in range(12)] >> > nums[middle_by_two] >> > >> > [5, 7, 9] >> > >> > The same slice will work on anything else: >> > >> > list('abcdefghijklmnopqrstuvwxyz')[middle_by_two] >> > ['f', 'h', 'j'] >> > >> > So although you may think the slice is bound to something, it is not. It >> is an object that only later is briefly connected to whatever you want to >> apply it to. >> > >> > If I later change nums, above, like this: >> > >> > nums = [-3, -2, -1] + nums >> > nums >> > [-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] >> > nums[middle_by_two] >> > [2, 4, 6] >> > >> > In the example, you can forget about whether we are talking about >> pointers directly or indirectly or variable names and so on. Your "view" >> remains valid ONLY as long as you do not change either the slice or the >> underlying object you are applying to -- at least not the items you want to >> extract. 
>> > >> > Since my example inserted three new items at the start using negative >> numbers for illustration, you would need to adjust the slice by making a new >> slice designed to fit your new data. The example below created an adjusted >> slice that adds 3 to the start and stop settings of the previous slice while >> copying the step value and then it works on the elongated object: >> > >> > middle_by_two_adj = slice(middle_by_two.start + 3, middle_by_two.stop + >> 3, middle_by_two.step) >> > nu
RE: To clarify how Python handles two equal objects
Avi,

Your comments go farther afield than my original question, but you made some
interesting additional points. For example, I sometimes work with the C API,
and sys.getrefcount may be helpful in deciding when to INCREF and DECREF.
But that’s another issue.

The situation I described in my original post is limited to a case such as
x = y where both "x" and "y" are arrays – whether they are lists in Python,
or from the array module – and the question in a compiled C extension is
whether the assignment can be done simply by "x" taking the pointer to "y"
rather than moving all the data from "y" into the memory buffer for "x"
which, for a wide array, would be much more time consuming than just moving
a pointer.

The other advantage to doing it that way is if, as in my case, we perform a
math operation on any element in "x" then Python expects that the same
change will be reflected in "y." If I don’t use the same pointers then I
would have to perform that operation twice – once for "x" and once for "y" –
in addition to the expense of moving all the data.

The answers I got from this post confirmed that I can use the pointer if "y"
is not re-defined to something else during the lifespan of "x." If it is,
then "x" has to be restored to its original pointer. I did it that way, and
helpfully the compiler did not overrule me.

Jan 13, 2023, 18:41 by [email protected]:

> Jen,
>
> This may not be on target, but I was wondering about your needs in this
> category. Are all your data in a form where all in a cluster are the same
> object type, such as floating point?
>
> Python has features designed to allow you to get multiple views on such
> objects, such as memoryview, that can be used to say see an array as a
> matrix of n rows by m columns, or m x n, or any other combo. And of course
> the fuller numpy package has quite a few features.
>
> However, as you note, there is no guarantee that any reference to the data
> may not shift away from it unless you build fairly convoluted logic or
> data structures, such as having an object that arranges to do something
> when you try to remove it, such as tinkering with the __del__ method as
> well as whatever method is used to try to set it to a new value. I guess
> that might make sense for something like asynchronous programming,
> including when setting locks so multiple things cannot overlap when being
> done.
>
> Anyway, some of the packages like numpy are optimized in many ways, but if
> you want to pass a subset of sorts to make processing faster, I suspect
> you could do things like pass a memoryview, but it might not be faster
> than what you build, albeit probably more reliable and portable.
>
> I note another odd idea that others may have mentioned, with caution.
>
> If you load the sys module, you can CAREFULLY use code like this:
>
> a = "Something Unique"
> sys.getrefcount(a)
> 2
>
> Note if a == 1 you will get some huge number of references, and this is
> meaningless. The 2 above is because asking about how many references also
> references it.
>
> So save whatever number you have and see what happens when you make a
> second reference or a third, and what happens if you delete or alter a
> reference:
>
> a = "Something Unique"
> sys.getrefcount(a)
> 2
> b = a
> sys.getrefcount(a)
> 3
> sys.getrefcount(b)
> 3
> c = b
> d = a
> sys.getrefcount(a)
> 5
> sys.getrefcount(d)
> 5
> del(a)
> sys.getrefcount(d)
> 4
> b = "something else"
> sys.getrefcount(d)
> 3
>
> So, in theory, you could carefully write your code to CHECK that the
> reference count had not changed, but there remain edge cases where a
> removed reference is replaced by yet another new reference and you would
> have no idea.
>
> Avi
>
> -Original Message-
> From: Python-list On Behalf Of Jen Kris via Python-list
> Sent: Wednesday, January 11, 2023 1:29 PM
> To: Roel Schroeven
> Cc: [email protected]
> Subject: Re: To clarify how Python handles two equal objects
>
> Thanks for your comments. After all, I asked for clarity so it’s not
> pedantic to be precise, and you’re helping to clarify.
>
> Going back to my original post,
>
> mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
> arr1 = mx1[2]
>
> Now if I write "arr1[1] += 5" then both arr1 and mx1[2][1] will be changed
> because while they are different names, they are assigned the same memory
> location (pointer). Similarly, if I write "mx1[2][1] += 5" then again both
> names will be updated.
>
> That’s what I meant by "an operation on one is an operation on the other."
Re: To clarify how Python handles two equal objects
Yes, in fact I asked my original question – "I discovered something about
Python array handling that I would like to clarify" – because I saw that
Python did it that way.

Jan 14, 2023, 15:51 by [email protected]:

> On Sun, 15 Jan 2023 at 10:32, Jen Kris via Python-list wrote:
>
>> The situation I described in my original post is limited to a case such
>> as x = y ... the assignment can be done simply by "x" taking the pointer
>> to "y" rather than moving all the data from "y" into the memory buffer
>> for "x"
>>
> It's not simply whether it *can* be done. It, in fact, *MUST* be done
> that way. The ONLY meaning of "x = y" is that you now have a name "x"
> which refers to whatever object is currently found under the name "y".
> This is not an optimization, it is a fundamental of Python's object
> model. This is true regardless of what kind of object this is; every
> object must behave this way.
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
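A minimal illustration of ChrisA's point, true for any object type (a
sketch):

y = [1, 2, 3]
x = y              # binds the name x to the very same list object
print(x is y)      # True -- assignment never copies
x.append(4)
print(y)           # [1, 2, 3, 4]

--
https://mail.python.org/mailman/listinfo/python-list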
How to escape strings for re.finditer?
When matching a string against a longer string, where both strings have spaces
in them, we need to escape the spaces.
This works (no spaces):
import re
example = 'abcdefabcdefabcdefg'
find_string = "abc"
for match in re.finditer(find_string, example):
print(match.start(), match.end())
That gives me the start and end character positions, which is what I want.
However, this does not work:
import re
example = re.escape('X - cty_degrees + 1 + qq')
find_string = re.escape('cty_degrees + 1')
for match in re.finditer(find_string, example):
print(match.start(), match.end())
I’ve tried several other attempts based on my research, but still no match.
I don’t have much experience with regex, so I hoped a reg-expert might help.
Thanks,
Jen
--
https://mail.python.org/mailman/listinfo/python-list
Re: How to escape strings for re.finditer?
Yes, that's it. I don't know how long it would have taken to find that detail with research through the voluminous re documentation. Thanks very much. Feb 27, 2023, 15:47 by [email protected]: > On 2023-02-27 23:11, Jen Kris via Python-list wrote: > >> When matching a string against a longer string, where both strings have >> spaces in them, we need to escape the spaces. >> >> This works (no spaces): >> >> import re >> example = 'abcdefabcdefabcdefg' >> find_string = "abc" >> for match in re.finditer(find_string, example): >> print(match.start(), match.end()) >> >> That gives me the start and end character positions, which is what I want. >> >> However, this does not work: >> >> import re >> example = re.escape('X - cty_degrees + 1 + qq') >> find_string = re.escape('cty_degrees + 1') >> for match in re.finditer(find_string, example): >> print(match.start(), match.end()) >> >> I’ve tried several other attempts based on my reseearch, but still no match. >> >> I don’t have much experience with regex, so I hoped a reg-expert might help. >> > You need to escape only the pattern, not the string you're searching. > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: How to escape strings for re.finditer?
I went to the re module because the specified string may appear more than once in the string (in the code I'm writing). For example: a = "X - abc_degree + 1 + qq + abc_degree + 1" b = "abc_degree + 1" q = a.find(b) print(q) 4 So it correctly finds the start of the first instance, but not the second one. The re code finds both instances. If I knew that the substring occurred only once then the str.find would be best. I changed my re code after MRAB's comment, it now works. Thanks much. Jen Feb 27, 2023, 15:56 by [email protected]: > On 28Feb2023 00:11, Jen Kris wrote: > >> When matching a string against a longer string, where both strings have >> spaces in them, we need to escape the spaces. >> >> This works (no spaces): >> >> import re >> example = 'abcdefabcdefabcdefg' >> find_string = "abc" >> for match in re.finditer(find_string, example): >> print(match.start(), match.end()) >> >> That gives me the start and end character positions, which is what I want. >> >> However, this does not work: >> >> import re >> example = re.escape('X - cty_degrees + 1 + qq') >> find_string = re.escape('cty_degrees + 1') >> for match in re.finditer(find_string, example): >> print(match.start(), match.end()) >> >> I’ve tried several other attempts based on my reseearch, but still no match. >> > > You need to print those strings out. You're escaping the _example_ string, > which would make it: > > X - cty_degrees \+ 1 \+ qq > > because `+` is a special character in regexps and so `re.escape` escapes it. > But you don't want to mangle the string you're searching! After all, the text > above does not contain the string `cty_degrees + 1`. > > My secondary question is: if you're escaping the thing you're searching > _for_, then you're effectively searching for a _fixed_ string, not a > pattern/regexp. So why on earth are you using regexps to do your searching? > > The `str` type has a `find(substring)` function. Just use that! It'll be > faster and the code simpler! > > Cheers, > Cameron Simpson > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: How to escape strings for re.finditer?
string.count() only tells me there are N instances of the string; it does not say where they begin and end, as does re.finditer. Feb 27, 2023, 16:20 by [email protected]: > Would string.count() work for you then? > > On Mon, Feb 27, 2023 at 5:16 PM Jen Kris via Python-list <> > [email protected]> > wrote: > >> >> I went to the re module because the specified string may appear more than >> once in the string (in the code I'm writing). For example: >> >> a = "X - abc_degree + 1 + qq + abc_degree + 1" >> b = "abc_degree + 1" >> q = a.find(b) >> >> print(q) >> 4 >> >> So it correctly finds the start of the first instance, but not the second >> one. The re code finds both instances. If I knew that the substring >> occurred only once then the str.find would be best. >> >> I changed my re code after MRAB's comment, it now works. >> >> Thanks much. >> >> Jen >> >> >> Feb 27, 2023, 15:56 by >> [email protected]>> : >> >> > On 28Feb2023 00:11, Jen Kris <>> [email protected]>> > wrote: >> > >> >> When matching a string against a longer string, where both strings have >> spaces in them, we need to escape the spaces. >> >> >> >> This works (no spaces): >> >> >> >> import re >> >> example = 'abcdefabcdefabcdefg' >> >> find_string = "abc" >> >> for match in re.finditer(find_string, example): >> >> print(match.start(), match.end()) >> >> >> >> That gives me the start and end character positions, which is what I >> want. >> >> >> >> However, this does not work: >> >> >> >> import re >> >> example = re.escape('X - cty_degrees + 1 + qq') >> >> find_string = re.escape('cty_degrees + 1') >> >> for match in re.finditer(find_string, example): >> >> print(match.start(), match.end()) >> >> >> >> I’ve tried several other attempts based on my reseearch, but still no >> match. >> >> >> > >> > You need to print those strings out. You're escaping the _example_ >> string, which would make it: >> > >> > X - cty_degrees \+ 1 \+ qq >> > >> > because `+` is a special character in regexps and so `re.escape` escapes >> it. But you don't want to mangle the string you're searching! After all, the >> text above does not contain the string `cty_degrees + 1`. >> > >> > My secondary question is: if you're escaping the thing you're searching >> _for_, then you're effectively searching for a _fixed_ string, not a >> pattern/regexp. So why on earth are you using regexps to do your searching? >> > >> > The `str` type has a `find(substring)` function. Just use that! It'll be >> faster and the code simpler! >> > >> > Cheers, >> > Cameron Simpson <>> [email protected]>> > >> > -- >> > >> https://mail.python.org/mailman/listinfo/python-list >> > >> >> -- >> >> https://mail.python.org/mailman/listinfo/python-list >> > > > -- > Listen to my CD at > http://www.mellowood.ca/music/cedars> > Bob van der Poel ** Wynndel, British Columbia, CANADA ** > EMAIL: > [email protected] > WWW: > http://www.mellowood.ca > -- https://mail.python.org/mailman/listinfo/python-list
Re: How to escape strings for re.finditer?
I haven't tested it either but it looks like it would work. But for this case
I prefer the relative simplicity of:
example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
print(match.start(), match.end())
4 18
26 40
I don't insist on terseness for its own sake, but it's cleaner this way.
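For completeness, here is that loop wrapped as a generator so the two
versions line up (a sketch based on Cameron's outline):

def str_finditer(substring, s):
    # plain-str stand-in for re.finditer, yielding (start, end) spans
    pos = 0
    while True:
        found = s.find(substring, pos)
        if found < 0:
            return
        yield found, found + len(substring)
        pos = found + len(substring)

for start, end in str_finditer('abc_degree + 1',
                               'X - abc_degree + 1 + qq + abc_degree + 1'):
    print(start, end)   # prints 4 18, then 26 40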
Jen
Feb 27, 2023, 16:55 by [email protected]:
> On 28Feb2023 01:13, Jen Kris wrote:
>
>> I went to the re module because the specified string may appear more than
>> once in the string (in the code I'm writing).
>>
>
> Sure, but writing a `finditer` for plain `str` is pretty easy (untested):
>
> pos = 0
> while True:
> found = s.find(substring, pos)
> if found < 0:
> break
> start = found
> end = found + len(substring)
> ... do whatever with start and end ...
> pos = end
>
> Many people go straight to the `re` module whenever they're looking for
> strings. It is often cryptic error prone overkill. Just something to keep in
> mind.
>
> Cheers,
> Cameron Simpson
> --
> https://mail.python.org/mailman/listinfo/python-list
>
--
https://mail.python.org/mailman/listinfo/python-list
RE: How to escape strings for re.finditer?
The code I sent is correct, and it runs here. Maybe you received it with a
carriage return removed, but on my copy after posting, it is correct:
example = 'X - abc_degree + 1 + qq + abc_degree + 1'
find_string = re.escape('abc_degree + 1')
for match in re.finditer(find_string, example):
print(match.start(), match.end())
One question: several people have made suggestions other than regex (not your
terser example with regex you showed below). Is there a reason why regex is not
preferred to, for example, a list comp? Performance? Reliability?
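For concreteness, these are the two shapes I am comparing -- both produce the
same spans on my example (a sketch):

import re

text = 'X - abc_degree + 1 + qq + abc_degree + 1'
key = 'abc_degree + 1'

# regex version
spans_re = [(m.start(), m.end())
            for m in re.finditer(re.escape(key), text)]

# list-comp version over candidate offsets
spans_lc = [(i, i + len(key))
            for i in range(len(text)) if text.startswith(key, i)]

assert spans_re == spans_lc == [(4, 18), (26, 40)]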
Feb 27, 2023, 18:16 by [email protected]:
> Jen,
>
> Can you see what SOME OF US see as ASCII text? We can help you better if we
> get code that can be copied and run as-is.
>
> What you sent is not terse. It is wrong. It will not run on any python
> interpreter because you somehow lost a carriage return and indent.
>
> This is what you sent:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1') for match in
> re.finditer(find_string, example):
> print(match.start(), match.end())
>
> This is the code indented properly:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1')
> for match in re.finditer(find_string, example):
> print(match.start(), match.end())
>
> Of course I am sure you wrote and ran code more like the latter version but
> somewhere in your copy/paste process, it got mangled.
>
> And, just for fun, since there is nothing wrong with your code, this minor
> change is terser:
>
>>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>>> for match in re.finditer(re.escape('abc_degree + 1') , example):
>>>>
> ... print(match.start(), match.end())
> ...
> ...
> 4 18
> 26 40
>
> But note once you use regular expressions, and not in your case, you might
> match multiple things that are far from the same such as matching two
> repeated words of any kind in any case including "and and" and "so so" or
> finding words that have multiple doubled letter as in the stereotypical
> bookkeeper. In those cases, you may want even more than offsets but also show
> the exact text that matched or even show some characters before and/or after
> for context.
>
>
> -Original Message-
> From: Python-list On
> Behalf Of Jen Kris via Python-list
> Sent: Monday, February 27, 2023 8:36 PM
> To: Cameron Simpson
> Cc: Python List
> Subject: Re: How to escape strings for re.finditer?
>
>
> I haven't tested it either but it looks like it would work. But for this
> case I prefer the relative simplicity of:
>
> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
> find_string = re.escape('abc_degree + 1') for match in
> re.finditer(find_string, example):
> print(match.start(), match.end())
>
> 4 18
> 26 40
>
> I don't insist on terseness for its own sake, but it's cleaner this way.
>
> Jen
>
>
> Feb 27, 2023, 16:55 by [email protected]:
>
>> On 28Feb2023 01:13, Jen Kris wrote:
>>
>>> I went to the re module because the specified string may appear more than
>>> once in the string (in the code I'm writing).
>>>
>>
>> Sure, but writing a `finditer` for plain `str` is pretty easy (untested):
>>
>> pos = 0
>> while True:
>> found = s.find(substring, pos)
>> if found < 0:
>> break
>> start = found
>> end = found + len(substring)
>> ... do whatever with start and end ...
>> pos = end
>>
>> Many people go straight to the `re` module whenever they're looking for
>> strings. It is often cryptic error prone overkill. Just something to keep in
>> mind.
>>
>> Cheers,
>> Cameron Simpson
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>
--
https://mail.python.org/mailman/listinfo/python-list
Re: How to escape strings for re.finditer?
Using str.startswith is a cool idea in this case. But is it better than regex for performance or reliability? Regex syntax is not a model of simplicity, but in my simple case it's not too difficult. Feb 27, 2023, 18:52 by [email protected]: > On 2/27/2023 9:16 PM, [email protected] wrote: > >> And, just for fun, since there is nothing wrong with your code, this minor >> change is terser: >> > example = 'X - abc_degree + 1 + qq + abc_degree + 1' > for match in re.finditer(re.escape('abc_degree + 1') , example): > >> ... print(match.start(), match.end()) >> ... >> ... >> 4 18 >> 26 40 >> > > Just for more fun :) - > > Without knowing how general your expressions will be, I think the following > version is very readable, certainly more readable than regexes: > > example = 'X - abc_degree + 1 + qq + abc_degree + 1' > KEY = 'abc_degree + 1' > > for i in range(len(example)): > if example[i:].startswith(KEY): > print(i, i + len(KEY)) > # prints: > 4 18 > 26 40 > > If you may have variable numbers of spaces around the symbols, OTOH, the > whole situation changes and then regexes would almost certainly be the best > approach. But the regular expression strings would become harder to read. > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: How to escape strings for re.finditer?
I wrote my previous message before reading this. Thank you for the test you ran -- it answers the question of performance. You show that re.finditer is about 30 times faster, so that certainly recommends it over a simple loop, which carries looping overhead.

Feb 28, 2023, 05:44 by [email protected]:

> On 2/28/2023 4:33 AM, Roel Schroeven wrote:
>
>> Op 28/02/2023 om 3:44 schreef Thomas Passin:
>>
>>> On 2/27/2023 9:16 PM, [email protected] wrote:
>>> And, just for fun, since there is nothing wrong with your code, this minor change is terser:
>>>
>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> for match in re.finditer(re.escape('abc_degree + 1'), example):
>>>     print(match.start(), match.end())
>>>
>>> 4 18
>>> 26 40
>>>
>>> Just for more fun :) -
>>>
>>> Without knowing how general your expressions will be, I think the following version is very readable, certainly more readable than regexes:
>>>
>>> example = 'X - abc_degree + 1 + qq + abc_degree + 1'
>>> KEY = 'abc_degree + 1'
>>>
>>> for i in range(len(example)):
>>>     if example[i:].startswith(KEY):
>>>         print(i, i + len(KEY))
>>> # prints:
>>> # 4 18
>>> # 26 40
>>
>> I think it's often a good idea to use a standard library function instead of rolling your own. The issue becomes less clear-cut when the standard library doesn't do exactly what you need (as here, where re.finditer() uses regular expressions while the use case only uses simple search strings). Ideally there would be a str.finditer() method we could use, but in the absence of that I think we still need to consider using the almost-but-not-quite fitting re.finditer().
>>
>> Two reasons:
>>
>> (1) I think it's clearer: the name tells us what it does (though of course we could solve this in a hand-written version by wrapping it in a suitably named function).
>>
>> (2) Searching for a string in another string, in a performant way, is not as simple as it first appears. Your version works correctly, but slowly. In some situations it doesn't matter, but in other cases it will. For better performance, string searching algorithms jump ahead either when they found a match or when they know for sure there isn't a match for some time (see e.g. the Boyer–Moore string-search algorithm). You could write such a more efficient algorithm, but then it becomes more complex and more error-prone. Using a well-tested existing function becomes quite attractive.
>
> Sure, it all depends on what the real task will be. That's why I wrote "Without knowing how general your expressions will be". For the example string, it's unlikely that speed will be a factor, but who knows what target strings and keys will turn up in the future?
>
>> To illustrate the difference in performance, I did a simple test (using the paragraph above as test text):
>>
>> import re
>> import timeit
>>
>> def using_re_finditer(key, text):
>>     matches = []
>>     for match in re.finditer(re.escape(key), text):
>>         matches.append((match.start(), match.end()))
>>     return matches
>>
>> def using_simple_loop(key, text):
>>     matches = []
>>     for i in range(len(text)):
>>         if text[i:].startswith(key):
>>             matches.append((i, i + len(key)))
>>     return matches
>>
>> CORPUS = """Searching for a string in another string, in a performant way, is
>> not as simple as it first appears. Your version works correctly, but slowly.
>> In some situations it doesn't matter, but in other cases it will. For better
>> performance, string searching algorithms jump ahead either when they found a
>> match or when they know for sure there isn't a match for some time (see e.g.
>> the Boyer–Moore string-search algorithm). You could write such a more
>> efficient algorithm, but then it becomes more complex and more error-prone.
>> Using a well-tested existing function becomes quite attractive."""
>> KEY = 'in'
>> print('using_simple_loop:', timeit.repeat(stmt='using_simple_loop(KEY, CORPUS)', globals=globals(), number=1000))
>> print('using_re_finditer:', timeit.repeat(stmt='using_re_finditer(KEY, CORPUS)', globals=globals(), number=1000))
>>
>> This does 5 runs of 1000 repetitions each, and reports the time in seconds for each of those runs. Result on my machine:
>>
>> using_simple_loop: [0.1395295020792, 0.1306313000456, 0.1280345001249, 0.1318618002423, 0.1308461032626]
>> using_re_finditer: [0.00386140005233, 0.00406190124297, 0.00347899970256, 0.00341310216218, 0.003732001273]
>>
>> We find that in this test re.finditer() is more than 30 times faster (despite the overhead of regular expressions).
>>
>> While speed isn't everything in programming, [...]
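A str.finditer() along the lines Roel wished for is easy to sketch as a generator over str.find, which jumps from match to match instead of testing every index (non-overlapping, like re.finditer; the function name here is invented):

def str_finditer(key, text):
    # Yield (start, end) for each non-overlapping occurrence of key in text.
    i = text.find(key)
    while i != -1:
        yield i, i + len(key)
        i = text.find(key, i + len(key))

example = 'X - abc_degree + 1 + qq + abc_degree + 1'
for start, end in str_finditer('abc_degree + 1', example):
    print(start, end)
# prints:
# 4 18
# 26 40

Since str.find's scanning loop runs in C, this should recover much of the speed difference measured above without involving regular expressions -- though that is an expectation, not a measurement.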
How does a method of a subclass become a method of the base class?
The base class:

class Constraint(object):

    def __init__(self, strength):
        super(Constraint, self).__init__()
        self.strength = strength

    def satisfy(self, mark):
        global planner
        self.choose_method(mark)

The subclass:

class UrnaryConstraint(Constraint):

    def __init__(self, v, strength):
        super(UrnaryConstraint, self).__init__(strength)
        self.my_output = v
        self.satisfied = False
        self.add_constraint()

    def choose_method(self, mark):
        if self.my_output.mark != mark and \
                Strength.stronger(self.strength, self.my_output.walk_strength):
            self.satisfied = True
        else:
            self.satisfied = False

The base class Constraint doesn't have a "choose_method" method, but it's called as self.choose_method(mark) on the final line of Constraint shown above.

My question is: what makes "choose_method" a method of the base class, called as self.choose_method instead of UrnaryConstraint.choose_method? Is it super(UrnaryConstraint, self).__init__(strength), or just the fact that Constraint is its base class?

Also, this program has a class BinaryConstraint, another subclass of Constraint, with a choose_method that is similar but not identical:

    def choose_method(self, mark):
        if self.v1.mark == mark:
            if self.v2.mark != mark and Strength.stronger(self.strength, self.v2.walk_strength):
                self.direction = Direction.FORWARD
            else:
                self.direction = Direction.BACKWARD

When called from Constraint, it uses the one at UrnaryConstraint. How does it know which one to use?

Thanks,

Jen

--
https://mail.python.org/mailman/listinfo/python-list
Re: How does a method of a subclass become a method of the base class?
Thanks to Richard Damon and Peter Holzer for your replies. I'm working through the call chain to understand it better, so I can post a follow-up question if needed. Thanks again. Jen

Mar 26, 2023, 19:21 by [email protected]:

> On 3/26/23 1:43 PM, Jen Kris via Python-list wrote:
>
>> The base class:
>>
>> class Constraint(object):
>>
>>     def __init__(self, strength):
>>         super(Constraint, self).__init__()
>>         self.strength = strength
>>
>>     def satisfy(self, mark):
>>         global planner
>>         self.choose_method(mark)
>>
>> The subclass:
>>
>> class UrnaryConstraint(Constraint):
>>
>>     def __init__(self, v, strength):
>>         super(UrnaryConstraint, self).__init__(strength)
>>         self.my_output = v
>>         self.satisfied = False
>>         self.add_constraint()
>>
>>     def choose_method(self, mark):
>>         if self.my_output.mark != mark and \
>>                 Strength.stronger(self.strength, self.my_output.walk_strength):
>>             self.satisfied = True
>>         else:
>>             self.satisfied = False
>>
>> The base class Constraint doesn't have a "choose_method" method, but it's called as self.choose_method(mark) on the final line of Constraint shown above.
>>
>> My question is: what makes "choose_method" a method of the base class, called as self.choose_method instead of UrnaryConstraint.choose_method? Is it super(UrnaryConstraint, self).__init__(strength), or just the fact that Constraint is its base class?
>>
>> Also, this program has a class BinaryConstraint, another subclass of Constraint, with a choose_method that is similar but not identical.
>>
>> When called from Constraint, it uses the one at UrnaryConstraint. How does it know which one to use?
>>
>> Thanks,
>>
>> Jen
>
> Perhaps the key point to remember is that when looking up methods on an object, those methods are part of the object as a whole, not particularly "attached" to a given class. When creating an object of the subclass's type, first the most basic class part is built, and all the methods of that class are put into the object, then the next level, and so on; if a duplicate method is found, it just overwrites the connection. Then when the object is used, we see if there is a method by that name to use, so methods in the base can find methods in subclasses to use.
>
> Perhaps a more modern approach would be to use the concept of an "abstract base", which allows the base to indicate that a derived class needs to define certain abstract methods. (If you don't need that sort of support, not defining a method might just mean the subclass doesn't support some optional behavior defined by the base.)
>
> --
> Richard Damon
>
> --
> https://mail.python.org/mailman/listinfo/python-list

--
https://mail.python.org/mailman/listinfo/python-list
Re: How does a method of a subclass become a method of the base class?
Based on your explanations, I went through the call chain and now I understand better how it works, but I have a follow-up question at the end. This code comes from the DeltaBlue benchmark in the Python benchmark suite.

1. The call chain starts in a non-class program with the following call:

EqualityConstraint(prev, v, Strength.REQUIRED)

2. EqualityConstraint is a subclass of BinaryConstraint, so first it calls the __init__ method of BinaryConstraint:

def __init__(self, v1, v2, strength):
    super(BinaryConstraint, self).__init__(strength)
    self.v1 = v1
    self.v2 = v2
    self.direction = Direction.NONE
    self.add_constraint()

3. At the final line shown above it calls add_constraint in the Constraint class, the base class of BinaryConstraint:

def add_constraint(self):
    global planner
    self.add_to_graph()
    planner.incremental_add(self)

4. At planner.incremental_add it calls incremental_add in the Planner class, because planner is a global instance of the Planner class:

def incremental_add(self, constraint):
    mark = self.new_mark()
    overridden = constraint.satisfy(mark)

At the final line it calls "satisfy" in the Constraint class, and that line calls choose_method in the BinaryConstraint class. Just as Peter Holzer said, it requires a call to "satisfy."

My only remaining question is: did it select the choose_method in the BinaryConstraint class, instead of the choose_method in the UrnaryConstraint class, because of "super(BinaryConstraint, self).__init__(strength)" in step 2 above?

Thanks for helping me clarify that.

Jen

Mar 26, 2023, 18:55 by [email protected]:

> On 2023-03-26 19:43:44 +0200, Jen Kris via Python-list wrote:
>
>> The base class:
>>
>> class Constraint(object):
>>
> [...]
>
>> def satisfy(self, mark):
>>     global planner
>>     self.choose_method(mark)
>>
>> The subclass:
>>
>> class UrnaryConstraint(Constraint):
>>
> [...]
>
>> def choose_method(self, mark):
>>     if self.my_output.mark != mark and \
>>             Strength.stronger(self.strength, self.my_output.walk_strength):
>>         self.satisfied = True
>>     else:
>>         self.satisfied = False
>>
>> The base class Constraint doesn't have a "choose_method" method, but it's called as self.choose_method(mark) on the final line of Constraint shown above.
>>
>> My question is: what makes "choose_method" a method of the base class,
>
> Nothing. choose_method isn't a method of the base class.
>
>> called as self.choose_method instead of UrnaryConstraint.choose_method? Is it super(UrnaryConstraint, self).__init__(strength) or just the fact that Constraint is its base class?
>
> This works only if satisfy() is called on a subclass of Constraint which actually implements this method.
>
> If you do something like
>
> x = UrnaryConstraint()
> x.satisfy(whatever)
>
> then x is a member of class UrnaryConstraint and will have a choose_method() method which can be called.
>
>> Also, this program has a class BinaryConstraint, another subclass of Constraint, with a choose_method that is similar but not identical:
>
> ...
>
>> When called from Constraint, it uses the one at UrnaryConstraint. How does it know which one to use?
>
> By inspecting self. If you call x.satisfy() on an object of class UrnaryConstraint, then self.choose_method will be the choose_method from UrnaryConstraint. If you call it on an object of class BinaryConstraint, then self.choose_method will be the choose_method from BinaryConstraint.
>
> hp
>
> PS: Pretty sure there's one "r" too many in UrnaryConstraint.
>
> --
> Peter J. Holzer | [email protected] | http://www.hjp.at/
> "Story must make more sense than reality." -- Charles Stross, "Creative writing challenge!"

--
https://mail.python.org/mailman/listinfo/python-list
Re: How does a method of a subclass become a method of the base class?
Cameron,

Thanks for your reply. You are correct about the class definition lines -- e.g. class EqualityConstraint(BinaryConstraint). I didn't post all of the code because this program is over 600 lines long. It's DeltaBlue in the Python benchmark suite. I've done some more work since this morning, and now I see what's happening. But it gave rise to another question, which I'll ask at the end.

The call chain starts at EqualityConstraint(prev, v, Strength.REQUIRED). The class EqualityConstraint is a subclass of BinaryConstraint. The entire class code is:

class EqualityConstraint(BinaryConstraint):

    def execute(self):
        self.output().value = self.input().value

Because EqualityConstraint is a subclass of BinaryConstraint, the __init__ method of BinaryConstraint is called first. During that initialization (I showed the call chain in my previous message), it calls choose_method. When I inspect the code at "self.choose_method(mark)" in PyCharm, it shows that, as EqualityConstraint is a subclass of BinaryConstraint, it has bound the choose_method from BinaryConstraint, apparently during the BinaryConstraint __init__ process, and that's the one it uses. So that answers my original question.

But that brings up a new question. I can create a class instance with x = BinaryConstraint(), but what happens when I have a line like "EqualityConstraint(prev, v, Strength.REQUIRED)"? Is it because the only method of EqualityConstraint is execute(self)? Is execute a special function like a class __init__? I've done research on that but I haven't found an answer.

I'm asking all these questions because I have worked in a procedural style for many years, with class work limited to only simple classes, but now I'm studying classes in more depth. The three answers I have received today, including yours, have helped a lot. Thanks very much.

Jen

Mar 26, 2023, 22:45 by [email protected]:

> On 26Mar2023 22:36, Jen Kris wrote:
>
>> At the final line it calls "satisfy" in the Constraint class, and that line calls choose_method in the BinaryConstraint class. Just as Peter Holzer said, it requires a call to "satisfy."
>>
>> My only remaining question is, did it select the choose_method in the BinaryConstraint class instead of the choose_method in the UrnaryConstraint class because of "super(BinaryConstraint, self).__init__(strength)" in step 2 above?
>
> Basically, no.
>
> You've omitted the "class" lines of the class definitions, and they define the class inheritance, not "__init__". The "__init__" method just initialises the state of the new object (which has already been created). The:
>
> super(BinaryConstraint, self).__init__(strength)
>
> line simply calls the appropriate superclass "__init__" with the "strength" parameter to do that aspect of the initialisation.
>
> You haven't cited the line which calls the "choose_method" method, but I'm imagining it calls "choose_method" like this:
>
> self.choose_method(...)
>
> That searches for the "choose_method" method based on the method resolution order of the object "self". So if "self" was an instance of "EqualityConstraint", and I'm guessing about its class definition, assuming this:
>
> class EqualityConstraint(BinaryConstraint):
>
> then a call to "self.choose_method" would look for a "choose_method" method first in the EqualityConstraint class and then via the BinaryConstraint class. I'm also assuming UrnaryConstraint is not in that class ancestry, i.e. not an ancestor of BinaryConstraint, for example.
>
> The first method found is used.
>
> In practice, when you define a class like:
>
> class EqualityConstraint(BinaryConstraint):
>
> the complete class ancestry (the additional classes from which BinaryConstraint inherits) gets flattened into a "method resolution order" list of classes to inspect in order, and that is stored as the ".__mro__" field on the new class (EqualityConstraint). You can look at it directly as "EqualityConstraint.__mro__".
>
> So looking up:
>
> self.choose_method()
>
> looks for a "choose_method" method on the classes in "type(self).__mro__".
>
> Cheers,
> Cameron Simpson
>
> --
> https://mail.python.org/mailman/listinfo/python-list

--
https://mail.python.org/mailman/listinfo/python-list
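To make the method lookup described above concrete, here is a minimal, runnable sketch (the class names are invented for illustration and are not from DeltaBlue):

class Base:
    def satisfy(self):
        # The attribute lookup happens at call time on type(self).__mro__,
        # so Base itself never needs to define choose_method.
        return self.choose_method()

class Sub(Base):
    def choose_method(self):
        return "Sub.choose_method"

x = Sub()
print(x.satisfy())   # prints: Sub.choose_method
print(Sub.__mro__)   # (Sub, Base, object)

Calling Base().satisfy() directly would raise AttributeError, which is the flip side of the same rule: satisfy() only works on instances whose class (or an ancestor of it) actually defines choose_method.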
Re: How does a method of a subclass become a method of the base class?
Thanks to everyone who answered this question. Your answers have helped a lot. Jen Mar 27, 2023, 14:12 by [email protected]: > On 3/26/23 17:53, Jen Kris via Python-list wrote: > >> I’m asking all these question because I have worked in a procedural style >> for many years, with class work limited to only simple classes, but now I’m >> studying classes in more depth. The three answers I have received today, >> including yours, have helped a lot. >> > > Classes in Python don't work quite like they do in many other languages. > > You may find a lightbulb if you listen to Raymond Hettinger talk about them: > > https://dailytechvideo.com/raymond-hettinger-pythons-class-development-toolkit/ > > I'd also advise that benchmarks often do very strange things to set up the > scenario they're trying to test, a benchmark sure wouldn't be my first place > to look in learning a new piece of Python - I don't know if it was the first > place, but thought this was worth a mention. > > > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
How to write list of integers to file with struct.pack_into?
I want to write a list of 64-bit integers to a binary file. Every example I have seen in my research converts it to .txt, but I want it in binary. I wrote this code, based on some earlier work I have done:

buf = bytes((len(qs_array)) * 8)

for offset in range(len(qs_array)):
    item_to_write = bytes(qs_array[offset])
    struct.pack_into(buf, "<Q", offset, item_to_write)

But I get the error "struct.error: embedded null character."

Maybe there's a better way to do this?

--
https://mail.python.org/mailman/listinfo/python-list
Re: How to write list of integers to file with struct.pack_into?
Thanks very much, MRAB. I just tried that and it works. What frustrated me is that every research example I found writes integers as strings. That works -- sort of -- but it requires re-casting each string to an integer when reading the file. If I'm doing binary work I don't want the extra overhead, and it's more difficult yet if I'm using the Python integer output in a C program. Your solution solves those problems.

Oct 2, 2023, 17:11 by [email protected]:

> On 2023-10-01 23:04, Jen Kris via Python-list wrote:
>
>> I want to write a list of 64-bit integers to a binary file. Every example I have seen in my research converts it to .txt, but I want it in binary. I wrote this code, based on some earlier work I have done:
>>
>> buf = bytes((len(qs_array)) * 8)
>>
>> for offset in range(len(qs_array)):
>>     item_to_write = bytes(qs_array[offset])
>>     struct.pack_into(buf, "<Q", offset, item_to_write)
>>
>> But I get the error "struct.error: embedded null character."
>>
>> Maybe there's a better way to do this?
>>
> You can't pack into a 'bytes' object because it's immutable.
>
> The simplest solution I can think of is:
>
> buf = struct.pack("<%sQ" % len(qs_array), *qs_array)
>
> --
> https://mail.python.org/mailman/listinfo/python-list

--
https://mail.python.org/mailman/listinfo/python-list
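For anyone landing here from a search, a minimal sketch of MRAB's approach with the write and a read-back check (qs_array holds sample values; the file name is invented):

import struct

qs_array = [1, 2, 32894, 2**63]
packed = struct.pack("<%dQ" % len(qs_array), *qs_array)   # little-endian unsigned 64-bit

with open("qs.bin", "wb") as f:
    f.write(packed)

with open("qs.bin", "rb") as f:
    data = f.read()

restored = list(struct.unpack("<%dQ" % (len(data) // 8), data))
assert restored == qs_array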
Re: How to write list of integers to file with struct.pack_into?
Dieter, thanks for your comment that:

* In your code, `offset` is `0`, `1`, `2`, ... but it should be `0 * 8`, `1 * 8`, `2 * 8`, ...

But you concluded with essentially the same solution proposed by MRAB, so that obviates the need to write item by item, because it writes the whole buffer at once. Thanks for your help.

Oct 2, 2023, 17:47 by [email protected]:

> Jen Kris wrote at 2023-10-2 00:04 +0200:
>
>> I want to write a list of 64-bit integers to a binary file. Every example I have seen in my research converts it to .txt, but I want it in binary. I wrote this code, based on some earlier work I have done:
>>
>> buf = bytes((len(qs_array)) * 8)
>>
>> for offset in range(len(qs_array)):
>>     item_to_write = bytes(qs_array[offset])
>>     struct.pack_into(buf, "<Q", offset, item_to_write)
>>
>> But I get the error "struct.error: embedded null character."
>
> You made a lot of errors:
>
> * the signature of `struct.pack_into` is `(format, buffer, offset, v1, v2, ...)`.
> Especially: `format` is the first, `buffer` the second argument.
>
> * In your code, `offset` is `0`, `1`, `2`, ... but it should be `0 * 8`, `1 * 8`, `2 * 8`, ...
>
> * The `vi` should be something which fits with the format: integers in your case. But you pass bytes.
>
> Try `struct.pack_into("<%dQ" % len(qs_array), buf, 0, *qs_array)` instead of your loop.
>
> Next time: carefully read the documentation and think carefully about the types involved.

--
https://mail.python.org/mailman/listinfo/python-list
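If the item-by-item loop is ever genuinely needed (say, the values arrive one at a time), a corrected sketch of the original approach, folding in all three of Dieter's points, would look like this:

import struct

qs_array = [10, 20, 32894]

buf = bytearray(len(qs_array) * 8)    # mutable, unlike bytes
for i, value in enumerate(qs_array):
    # format first, then buffer; the offset is in bytes; the value is an int
    struct.pack_into("<Q", buf, i * 8, value)

assert bytes(buf) == struct.pack("<%dQ" % len(qs_array), *qs_array)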
How to write list of integers to file with struct.pack_into?
My previous message just went up -- sorry for the mangled formatting. Here it is properly formatted:

I want to write a list of 64-bit integers to a binary file. Every example I have seen in my research converts it to .txt, but I want it in binary. I wrote this code, based on some earlier work I have done:

buf = bytes((len(qs_array)) * 8)

for offset in range(len(qs_array)):
    item_to_write = bytes(qs_array[offset])
    struct.pack_into(buf, "<Q", offset, item_to_write)

But I get the error "struct.error: embedded null character."

Maybe there's a better way to do this?

--
https://mail.python.org/mailman/listinfo/python-list
Python child process in while True loop blocks parent
I have a C program that forks to create a child process and uses execv to call
a Python program. The Python program communicates with the parent process (in
C) through a FIFO pipe monitored with epoll().
The Python child process is in a while True loop, which is intended to keep it
running while the parent process proceeds, and perform functions for the C
program only at intervals when the parent sends data to the child -- similar to
a daemon process.
The C process writes to its end of the pipe and the child process reads it, but
then the child process continues to loop, thereby blocking the parent.
This is the Python code:
#!/usr/bin/python3
import os
import select
#Open the named pipes
pr = os.open('/tmp/Pipe_01', os.O_RDWR)
pw = os.open('/tmp/Pipe_02', os.O_RDWR)
ep = select.epoll(-1)
ep.register(pr, select.EPOLLIN)
while True:
events = ep.poll(timeout=2.5, maxevents=-1)
#events = ep.poll(timeout=None, maxevents=-1)
print("child is looping")
for fileno, event in events:
print("Python fileno")
print(fileno)
print("Python event")
print(event)
v = os.read(pr,64)
print("Pipe value")
print(v)
The child process correctly receives the signal from ep.poll and correctly
reads the data in the pipe, but then it continues looping. For example, when I
put in a timeout:
child is looping
Python fileno
4
Python event
1
Pipe value
b'10\x00'
child is looping
child is looping
That suggests that a while True loop is not the right thing to do in this case.
My question is, what type of process loop is best for this situation? The
multiprocessing, asyncio and subprocess libraries are very extensive, and it
would help if someone could suggest the best alternative for what I am doing
here.
Thanks very much for any ideas.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Python child process in while True loop blocks parent
Thanks to you and Cameron for your replies. The C side has an epoll_ctl set, but no event loop to handle it yet. I'm putting that in now with a pipe write in Python -- as Cameron pointed out, that is the likely source of blocking on C. The pipes are opened as rdwr in Python because that's nonblocking by default. The child will become more complex, but not in a way that affects polling. And thanks for the tip about the c-string termination.

Nov 29, 2021, 14:12 by [email protected]:

> On 29 Nov 2021, at 20:36, Jen Kris via Python-list wrote:
>
>> I have a C program that forks to create a child process and uses execv to call a Python program. The Python program communicates with the parent process (in C) through a FIFO pipe monitored with epoll(). [...]
>>
>> This is the Python code:
>>
>> #!/usr/bin/python3
>> import os
>> import select
>>
>> #Open the named pipes
>> pr = os.open('/tmp/Pipe_01', os.O_RDWR)
>
> Why open rdwr if you are only going to read the pipe?
>
>> pw = os.open('/tmp/Pipe_02', os.O_RDWR)
>
> Only need to open for write.
>
>> ep = select.epoll(-1)
>> ep.register(pr, select.EPOLLIN)
>
> Is the only thing that the child does this:
> 1. Read message from pr
> 2. Process message
> 3. Write result to pw.
> 4. Loop from 1
>
> If so, as Cameron said, you do not need to worry about the poll.
> Do you plan for the child to become more complex?
>
>> while True:
>>
>>     events = ep.poll(timeout=2.5, maxevents=-1)
>>     #events = ep.poll(timeout=None, maxevents=-1)
>>
>>     print("child is looping")
>>
>>     for fileno, event in events:
>>         print("Python fileno")
>>         print(fileno)
>>         print("Python event")
>>         print(event)
>>         v = os.read(pr,64)
>>         print("Pipe value")
>>         print(v)
>>
>> The child process correctly receives the signal from ep.poll and correctly reads the data in the pipe, but then it continues looping. For example, when I put in a timeout:
>>
>> child is looping
>> Python fileno
>> 4
>> Python event
>> 1
>> Pipe value
>> b'10\x00'
>
> The C code does not need to write a 0 byte at the end.
> I assume the 0 is from the end of a C string.
> UDS messages have a length.
> In the C just write 2 bytes in this case.
>
> Barry
>
>> child is looping
>> child is looping
>>
>> That suggests that a while True loop is not the right thing to do in this case. My question is, what type of process loop is best for this situation? The multiprocessing, asyncio and subprocess libraries are very extensive, and it would help if someone could suggest the best alternative for what I am doing here.
>>
>> Thanks very much for any ideas.
>>
>> --
>> https://mail.python.org/mailman/listinfo/python-list

--
https://mail.python.org/mailman/listinfo/python-list
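Putting Barry's and Cameron's advice together, the child's loop can be reduced to plain blocking reads on one-directional pipes, with no epoll at all. A minimal sketch (pipe paths are from the original post; the b'99' exit sentinel is an invented convention):

import os

pr = os.open('/tmp/Pipe_01', os.O_RDONLY)   # read end only
pw = os.open('/tmp/Pipe_02', os.O_WRONLY)   # write end only

while True:
    v = os.read(pr, 64)          # blocks until the parent writes
    if v == b'99':               # invented exit sentinel
        break
    os.write(pw, b"OK message received\n")

os.close(pr)
os.close(pw)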
Re: Python child process in while True loop blocks parent
Thanks for your comment re blocking. I removed pipes from the Python and C programs to see if it blocks without them, and it does. It looks now like the problem is not pipes.

I use fork() and execv() in C to run Python in a child process, but the Python process blocks because fork() does not create a new thread, so the Python global interpreter lock (GIL) prevents the C program from running once Python starts. So the solution appears to be to run Python in a separate thread, which I can do with pthread_create.

See "Thread State and the Global Interpreter Lock" https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock and the sections below that, "Non-Python created threads" and "Cautions about fork()."

I'm working on that today and I hope all goes well :)

Nov 30, 2021, 11:42 by [email protected]:

>> On 29 Nov 2021, at 22:31, Jen Kris <[email protected]> wrote:
>>
>> Thanks to you and Cameron for your replies. The C side has an epoll_ctl set, but no event loop to handle it yet. I'm putting that in now with a pipe write in Python -- as Cameron pointed out that is the likely source of blocking on C. The pipes are opened as rdwr in Python because that's nonblocking by default. The child will become more complex, but not in a way that affects polling. And thanks for the tip about the c-string termination.
>
> flags is a bit mask. You say it's BLOCKing by not setting os.O_NONBLOCK.
> You should not use O_RDWR when you only need O_RDONLY access or only O_WRONLY access.
>
> You may find
>
> man 2 open
>
> useful to understand in detail what is behind os.open().
>
> Barry
>
>> Nov 29, 2021, 14:12 by [email protected]:
>>
>>> [snip -- the rest of this exchange is quoted in full earlier in the thread]

--
https://mail.python.org/mailman/listinfo/python-list
Re: Python child process in while True loop blocks parent
Thanks for your comments.

I put the Python program on its own pthread, and call a small C program to fork-execv to call the Python program as a child process. I revised the Python program to be a multiprocessing loop using the Python multiprocessing module. That bypasses the GIL and allows Python to run concurrently with C. So far so good.

Next I will use Linux pipes, not Python multiprocessing pipes, for IPC between Python and C. Multiprocessing pipes are (as far as I can tell) only for commo between two Python processes. I will have the parent thread send a signal through the pipe to the child process to exit when the parent thread is ready to exit, then call wait() to finalize the child process.

I will reply back when it's finished and post the code so you can see what I have done.

Thanks again.

Jen

Dec 4, 2021, 09:22 by [email protected]:

>> On 1 Dec 2021, at 16:01, Jen Kris <[email protected]> wrote:
>>
>> Thanks for your comment re blocking.
>>
>> I removed pipes from the Python and C programs to see if it blocks without them, and it does. It looks now like the problem is not pipes.
>
> Ok.
>
>> I use fork() and execv() in C to run Python in a child process, but the Python process blocks
>
> Use strace on the parent process to see what is happening. You will need to use the option to follow subprocesses so that you can see what goes on in the python process.
>
> See man strace and the --follow-forks and --output-separately options. That will allow you to find the blocking system call that your code is making.
>
>> because fork() does not create a new thread, so the Python global interpreter lock (GIL) prevents the C program from running once Python starts.
>
> Not sure why you think this.
>
>> So the solution appears to be run Python in a separate thread, which I can do with pthread create.
>>
>> See "Thread State and the Global Interpreter Lock" https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock and the sections below that, "Non-Python created threads" and "Cautions about fork()."
>
> I take it you mean that in the parent you think that using pthreads will affect python after the exec() call? It does not. After exec() the process has one main thread created by the kernel and a new address space as defined by /usr/bin/python. The only state that is inherited from the parent are open file descriptors, the current working directory, and security state like UID and GID.
>
>> I'm working on that today and I hope all goes well :)
>
> You seem to be missing background information on how processes work. Maybe "Advanced Programming in the UNIX Environment" would be helpful?
>
> https://www.amazon.co.uk/Programming-Environment-Addison-Wesley-Professional-Computing-dp-0321637739/dp/0321637739/ref=dp_ob_image_bk
>
> It's a great book and covers a wide range of Unix systems programming topics.
>
> Have you created a small C program that just does the fork and exec of a python program to test out your assumptions? If not I recommend that you do.
>
> Barry
>
>> Nov 30, 2021, 11:42 by [email protected]:
>>
>>> [snip -- the rest of this exchange is quoted in full earlier in the thread]

--
https://mail.python.org/mailman/listinfo/python-list
Re: Python child process in while True loop blocks parent
By embedding, I think you may be referring to embedding Python in a C program with the Python C API. That's not what I'm doing here -- I'm not using the Python C API. The C program creates two threads (using pthreads), one for itself and one for the child process. On creation, the second pthread is pointed to a C program that calls fork-execv to run the Python program. That way Python runs on a separate thread.

The multiprocessing library "effectively side-step[s] the Global Interpreter Lock by using subprocesses instead of threads" (https://docs.python.org/3/library/multiprocessing.html). This way I can get the Python functionality I want on call from the C program through pipes and shared memory.

I don't want to use the C API because I will be making certain library calls from the C program, and the syntax is much easier with native Python code than with C API code.

I hope that clarifies what I'm doing.

Jen

Dec 5, 2021, 15:19 by [email protected]:

>> On 5 Dec 2021, at 17:54, Jen Kris wrote:
>>
>> Thanks for your comments.
>>
>> I put the Python program on its own pthread, and call a small C program to fork-execv to call the Python program as a child process.
>
> What do you mean by putting python in its own pthread?
> Are you embedding python in another program?
>
> Barry
>
>> [snip -- the rest of this exchange is quoted in full earlier in the thread]
Re: Python child process in while True loop blocks parent
I can't find any support for your comment that "Fork creates a new
process and therefore also a new thread." From the Linux man pages
https://www.man7.org/linux/man-pages/man2/fork.2.html, "The child process is
created with a single thread—the one that called fork()."
I have a one-core one-thread instance at Digital Ocean available running Ubuntu
18.04. I can fork and create a new process on it, but it doesn't create a new
thread because it doesn't have one available.
You may also want to see "Forking vs Threading"
(https://www.geekride.com/fork-forking-vs-threading-thread-linux-kernel), "Fork
vs Thread" (https://medium.com/obscure-system/fork-vs-thread-38e09ec099e2), and
"Linux process and thread" (https://zliu.org/post/linux-process-and-thread)
("This means that to create a normal process fork() is used that further calls
clone() with appropriate arguments while to create a thread or LWP, a function
from pthread library calls clone() with relevant flags. So, the main difference
is generated by using different flags that can be passed to the clone() function (to
be exact, it is a system call").
You may be confused by the fact that threads are called light-weight processes.
Or maybe I'm confused :)
If you have other information, please let me know. Thanks.
Jen
Dec 5, 2021, 18:08 by [email protected]:
> On 2021-12-06 00:51:13 +0100, Jen Kris via Python-list wrote:
>
>> The C program creates two threads (using pthreads), one for itself and
>> one for the child process. On creation, the second pthread is pointed
>> to a C program that calls fork-execv to run the Python program. That
>> way Python runs on a separate thread.
>>
>
> I think you have the relationship between processes and threads
> backwards. A process consists of one or more threads. Fork creates a new
> process and therefore also a new thread.
>
> hp
>
> --
> Peter J. Holzer | [email protected] | http://www.hjp.at/
> "Story must make more sense than reality." -- Charles Stross, "Creative writing challenge!"
>
--
https://mail.python.org/mailman/listinfo/python-list
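The man-page sentence can be demonstrated directly from Python. A small sketch (POSIX only): it shows that the forked child is a new process that begins life with exactly one thread, regardless of how many cores the machine has:

import os
import threading

pid = os.fork()
if pid == 0:
    # Child: a brand-new process whose only thread is the one that called fork().
    print("child pid:", os.getpid(), "threads:", threading.active_count())
    os._exit(0)

os.waitpid(pid, 0)
print("parent pid:", os.getpid(), "threads:", threading.active_count())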
Re: Python child process in while True loop blocks parent
Here is what I don't understand from what you said. "The child process is created with a single thread—the one that called fork()." To me that implies that the thread that called fork() is the same thread as the child process. I guess you're talking about the distinction between logical threads and physical threads.

But the main issue is your suggestion that I should call fork-execv from the thread that runs the main C program, not from a separate physical pthread. That would certainly eliminate the overhead of creating a new pthread.

I am working now to finish this, and I will try your suggestion of calling fork-execv from the "main" thread. When I reply back next I can give you a complete picture of what I'm doing.

Your comments, and those of Peter Holzer and Chris Angelico, are most appreciated.

Dec 6, 2021, 10:37 by [email protected]:

> On 6 Dec 2021, at 17:09, Jen Kris via Python-list wrote:
>
>> I can't find any support for your comment that "Fork creates a new process and therefore also a new thread." From the Linux man pages https://www.man7.org/linux/man-pages/man2/fork.2.html, "The child process is created with a single thread—the one that called fork()."
>
> You just quoted the evidence!
>
> All new processes on unix (maybe all OSes) only ever have one thread when they start. The thread-id of the first thread is the same as the process-id and referred to as the main thread.
>
>> I have a one-core one-thread instance at Digital Ocean available running Ubuntu 18.04. I can fork and create a new process on it, but it doesn't create a new thread because it doesn't have one available.
>
> By that logic it can only run one process...
>
> It has one hardware core that supports one hardware thread. Linux can create as many software threads as it likes.
>
>> You may also want to see "Forking vs Threading" (https://www.geekride.com/fork-forking-vs-threading-thread-linux-kernel), "Fork vs Thread" (https://medium.com/obscure-system/fork-vs-thread-38e09ec099e2), and "Linux process and thread" (https://zliu.org/post/linux-process-and-thread) [...]
>>
>> You may be confused by the fact that threads are called light-weight processes.
>
> No, Peter and I are not confused.
>
>> Or maybe I'm confused :)
>
> Yes, you are confused.
>
>> If you have other information, please let me know. Thanks.
>
> Please get the book I recommended, or another that covers systems programming on unix, and have a read.
>
> Barry
>
>> Jen
>>
>> Dec 5, 2021, 18:08 by [email protected]:
>>
>>> On 2021-12-06 00:51:13 +0100, Jen Kris via Python-list wrote:
>>>
>>>> The C program creates two threads (using pthreads), one for itself and one for the child process. On creation, the second pthread is pointed to a C program that calls fork-execv to run the Python program. That way Python runs on a separate thread.
>>>
>>> I think you have the relationship between processes and threads backwards. A process consists of one or more threads. Fork creates a new process and therefore also a new thread.
>>>
>>> hp
>>>
>>> --
>>> Peter J. Holzer | [email protected] | http://www.hjp.at/
>>> "Story must make more sense than reality." -- Charles Stross, "Creative writing challenge!"
>>
>> --
>> https://mail.python.org/mailman/listinfo/python-list

--
https://mail.python.org/mailman/listinfo/python-list
Re: Python child process in while True loop blocks parent
I started this post on November 29, and there have been helpful comments since
then from Barry Scott, Cameron Simpson, Peter Holzer and Chris Angelico.
Thanks to all of you.
I've found a solution that works for my purpose, and I said earlier that I
would post the solution I found. If anyone has a better solution I would
appreciate any feedback.
To recap, I'm using a pair of named pipes for IPC between C and Python. Python
runs as a child process after fork-execv. The Python program continues to run
concurrently in a while True loop, and responds to requests from C at
intervals, and continues to run until it receives a signal from C to exit. C
sends signals to Python, then waits to receive data back from Python. My
problem was that C was blocked when Python started.
The solution was twofold: (1) for Python to run concurrently it must be a
multiprocessing loop (from the multiprocessing module), and (2) Python must
terminate its write strings with \n, or read will block in C waiting for
something that never comes. The multiprocessing module sidesteps the GIL;
without multiprocessing the GIL will block all other threads once Python
starts.
Originally I used epoll() on the pipes. Cameron Simpson and Barry Scott advised
against epoll, and for this case they are right. Blocking pipes work here, and
epoll is too much overhead for watching on a single file descriptor.
This is the Python code now:
#!/usr/bin/python3
from multiprocessing import Process
import os
print("Python is running")
child_pid = os.getpid()
print('child process id:', child_pid)
def f(a, b):
print("Python now in function f")
pr = os.open('/tmp/Pipe_01', os.O_RDONLY)
print("File Descriptor1 Opened " + str(pr))
pw = os.open('/tmp/Pipe_02', os.O_WRONLY)
print("File Descriptor2 Opened " + str(pw))
while True:
v = os.read(pr,64)
print("Python read from pipe pr")
print(v)
if v == b'99':
os.close(pr)
os.close(pw)
print("Python is terminating")
os._exit(os.EX_OK)
if v != "Send child PID":
os.write(pw, b"OK message received\n")
print("Python wrote back")
if __name__ == '__main__':
a = 0
b = 0
p = Process(target=f, args=(a, b,))
p.start()
p.join()
The variables a and b are not currently used in the body, but they will be
later.
This is the part of the C code that communicates with Python:
fifo_fd1 = open(fifo_path1, O_WRONLY);
fifo_fd2 = open(fifo_path2, O_RDONLY);
status_write = write(fifo_fd1, py_msg_01, sizeof(py_msg_01));
if (status_write < 0) perror("write");
status_read = read(fifo_fd2, fifo_readbuf, sizeof(py_msg_01));
if (status_read < 0) perror("read");
printf("C received message 1 from Python\n");
printf("%.*s",(int)buf_len, fifo_readbuf);
status_write = write(fifo_fd1, py_msg_02, sizeof(py_msg_02));
if (status_write < 0) perror("write");
status_read = read(fifo_fd2, fifo_readbuf, sizeof(py_msg_02));
if (status_read < 0) perror("read");
printf("C received message 2 from Python\n");
printf("%.*s",(int)buf_len, fifo_readbuf);
// Terminate Python multiprocessing
printf("C is sending exit message to Python\n");
status_write = write(fifo_fd1, py_msg_03, 2);
printf("C is closing\n");
close(fifo_fd1);
close(fifo_fd2);
Screen output:
Python is running
child process id: 5353
Python now in function f
File Descriptor1 Opened 6
Thread created 0
File Descriptor2 Opened 7
Process ID: 5351
Parent Process ID: 5351
I am the parent
Core joined 0
I am the child
Python read from pipe pr
b'Hello to Python from C\x00\x00'
Python wrote back
C received message 1 from Python
OK message received
Python read from pipe pr
b'Message to Python 2\x00\x00'
Python wrote back
C received message 2 from Python
OK message received
C is sending exit message to Python
C is closing
Python read from pipe pr
b'99'
Python is terminating
Python runs on a separate thread (created with pthreads) because I want the
flexibility of using this same basic code as a stand-alone .exe, or for a C
extension from Python called with ctypes. If I use it as a C extension then I
want the Python code on a separate thread because I can't have two instances of
the Python interpreter running on one thread, and one instance will already be
running on the main thread, albeit "suspended" by the call from ctypes.
So that's my solution: (1) Python multiprocessing module; (2) Python strings
written to the pipe must be terminated with \n.
Thanks again to all who commented.
Dec 6, 2021, 13:33 by ba...
Data unchanged when passing data to Python in multiprocessing shared memory
I am using multiprocessing.shared_memory to pass data between NASM and Python. The shared memory is created in NASM before Python is called. Python connects to the shm:

shm_00 = shared_memory.SharedMemory(name='shm_object_00', create=False)

I have used shared memory at other points in this project to pass text data from Python back to NASM with no problems. But now this time I need to pass a 32-bit integer (specifically 32,894) from NASM to Python.

First I convert the integer to bytes in a C program linked into NASM:

unsigned char bytes[4];
unsigned long int_to_convert = 32894;

bytes[0] = (int_to_convert >> 24) & 0xFF;
bytes[1] = (int_to_convert >> 16) & 0xFF;
bytes[2] = (int_to_convert >> 8) & 0xFF;
bytes[3] = int_to_convert & 0xFF;
memcpy(outbuf, bytes, 4);

where outbuf is a pointer to the shared memory. On return from C to NASM, I verify that the first four bytes of the shared memory contain what I want, and they are 0, 0, -128, 126, which is binary 1000 0000 0111 1110, and that's correct (32,894).

Next I send a message to Python through a FIFO to read the data from shared memory. Python uses the following code to read the first four bytes of the shared memory:

byte_val = shm_00.buf[:4]
print(shm_00.buf[0])
print(shm_00.buf[1])
print(shm_00.buf[2])
print(shm_00.buf[3])

But the bytes show as 40 39 96 96, which is exactly what the first four bytes of this shared memory contained before I called C to overwrite them with the bytes 0, 0, -128, 126. So Python does not see the updated bytes, and naturally int.from_bytes(byte_val, "little") does not return the result I want.

I know that Python refers to shm_00.buf using the buffer protocol. Is that the reason that Python can't see the data that has been updated by another language?

So my question is, how can I alter the data in shared memory in a non-Python language to pass back to Python?

Thanks,

Jen

--
https://mail.python.org/mailman/listinfo/python-list
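A pure-Python sketch of the reading side may help isolate the problem; here both writer and reader are Python, which at least verifies the segment name, the slice, and the byte order ('shm_object_00' is the name from the post; creating the segment in Python stands in for the NASM side):

from multiprocessing import shared_memory

# Stand-in for the NASM/C writer:
shm = shared_memory.SharedMemory(name='shm_object_00', create=True, size=64)
shm.buf[:4] = (32894).to_bytes(4, "big")      # bytes 0, 0, 0x80, 0x7e

# The reader, as in the post:
shm_00 = shared_memory.SharedMemory(name='shm_object_00', create=False)
print(int.from_bytes(shm_00.buf[:4], "big"))  # 32894

shm_00.close()
shm.close()
shm.unlink()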
Re: Data unchanged when passing data to Python in multiprocessing shared memory
Barry, thanks for your reply.
On the theory that it is not yet possible to pass data from a non-Python
language to Python with multiprocessing.shared_memory, I bypassed the problem
by attaching 4 bytes to my FIFO pipe message from NASM to Python:
byte_val = v[10:14]
where v is the message read from the FIFO. Then:
breakup = int.from_bytes(byte_val, "big")
print("this is breakup " + str(breakup))
Python prints: this is breakup 32894
Note that I had to switch from little endian to big endian: my machine is little
endian, but the C code stored these bytes most significant first, so "big" is correct here.
However, if anyone on this list knows how to pass data from a non-Python
language to Python in multiprocessing.shared_memory please let me (and the
list) know.
Thanks.
Feb 1, 2022, 14:20 by [email protected]:
>
>
>> On 1 Feb 2022, at 20:26, Jen Kris via Python-list
>> wrote:
>>
>> I am using multiprocessing.shared_memory to pass data between NASM and
>> Python. The shared memory is created in NASM before Python is called.
>> Python connects to the shm: shm_00 =
>> shared_memory.SharedMemory(name='shm_object_00',create=False).
>>
>> I have used shared memory at other points in this project to pass text data
>> from Python back to NASM with no problems. But now this time I need to pass
>> a 32-bit integer (specifically 32,894) from NASM to Python.
>>
>> First I convert the integer to bytes in a C program linked into NASM:
>>
>> unsigned char bytes[4];
>> unsigned long int_to_convert = 32894;
>>
>> bytes[0] = (int_to_convert >> 24) & 0xFF;
>> bytes[1] = (int_to_convert >> 16) & 0xFF;
>> bytes[2] = (int_to_convert >> 8) & 0xFF;
>> bytes[3] = int_to_convert & 0xFF;
>> memcpy(outbuf, bytes, 4);
>>
>> where outbuf is a pointer to the shared memory. On return from C to NASM, I
>> verify that the first four bytes of the shared memory contain what I want,
>> and they are 0, 0, -128, 126, which is binary 1000 0000 0111 1110,
>> and that's correct (32,894).
>>
>> Next I send a message to Python through a FIFO to read the data from shared
>> memory. Python uses the following code to read the first four bytes of the
>> shared memory:
>>
>> byte_val = shm_00.buf[:4]
>> print(shm_00.buf[0])
>> print(shm_00.buf[1])
>> print(shm_00.buf[2])
>> print(shm_00.buf[3])
>>
>> But the bytes show as 40 39 96 96, which is exactly what the first four
>> bytes of this shared memory contained before I called C to overwrite them
>> with the bytes 0, 0, -128, 126. So Python does not see the updated bytes,
>> and naturally int.from_bytes(byte_val, "little") does not return the result
>> I want.
>>
>> I know that Python refers to shm_00.buf using the buffer protocol. Is that
>> the reason that Python can't see the data that has been updated by another
>> language?
>>
>> So my question is, how can I alter the data in shared memory in a non-Python
>> language to pass back to Python?
>>
>
> Maybe you need to use a memory barrier to force the data to be seen by
> another cpu?
> Maybe use shm lock operation to sync both sides?
> Googling I see people talking about using stdatomic.h for this.
>
> But I am far from clear what you would need to do.
>
> Barry
>
>>
>> Thanks,
>>
>> Jen
>>
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
--
https://mail.python.org/mailman/listinfo/python-list
Re: Data unchanged when passing data to Python in multiprocessing shared memory
It's not clear to me from the struct module whether it can actually auto-detect endianness. I think it must be specified, just as I had to do with int.from_bytes(). In my case endianness was dictated by how the four bytes were populated, starting with the zero bytes on the left. Feb 1, 2022, 21:30 by [email protected]: > On Wed, 2 Feb 2022 00:40:22 +0100 (CET), Jen Kris > declaimed the following: > >> >> breakup = int.from_bytes(byte_val, "big") >> > >print("this is breakup " + str(breakup)) > >> >> > >Python prints: this is breakup 32894 > >> >> > >Note that I had to switch from little endian to big endian. Python is > >little endian by default, but in this case it's big endian. > >> >> > Look at the struct module. I'm pretty certain it has flags for big or > little end, or system native (that, or run your integers through the > various "network byte order" functions that I think C and Python both > support. > > https://www.gta.ufrj.br/ensino/eel878/sockets/htonsman.html > > > >However, if anyone on this list knows how to pass data from a non-Python > >language to Python in multiprocessing.shared_memory please let me (and the > >list) know. > > MMU cache lines not writing through to RAM? Can't find > anything on Google to force a cache flush Can you test on a > different OS? (Windows vs Linux) > > > > -- > Wulfraed Dennis Lee Bieber AF6VN > [email protected]://wlfraed.microdiversity.freeddns.org/ > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
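To make the struct flags concrete, a quick sketch using the four bytes from this thread (0, 0, -128, 126, i.e. 0x00 0x00 0x80 0x7e):

import struct

raw = bytes([0x00, 0x00, 0x80, 0x7e])

print(struct.unpack(">I", raw)[0])    # big-endian:    32894
print(struct.unpack("<I", raw)[0])    # little-endian: 2122317824
print(int.from_bytes(raw, "big"))     # 32894 again

struct does not auto-detect byte order: the >, < or = prefix states it explicitly (= means the machine's native order), which is the same explicit choice int.from_bytes forces.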
Re: Data unchanged when passing data to Python in multiprocessing shared memory
An ASCII string will not work. If you convert 32894 to an ascii string you will have five bytes, but you need four. In my original post I showed the C program I used to convert any 32-bit number to 4 bytes. Feb 2, 2022, 10:16 by [email protected]: > I applaud trying to find the right solution but wonder if a more trivial > solution is even being considered. It ignores big and little endians and just > converts your data into another form and back. > > If all you want to do is send an integer that fit in 32 bits or 64 bits, why > not convert it to a character string in a form that both machines will see > the same way and when read back, convert it back to an integer? > > As long as both side see the same string, this can be done in reasonable time > and portably. > > Or am I missing something? Is "1234" not necessarily seen in the same order, > or "1.234e3" or whatever? > > Obviously, if the mechanism is heavily used and multiple sides keep reading > and even writing the same memory location, this is not ideal. But having > different incompatible processors looking at the same memory is also not. > > -Original Message- > From: Dennis Lee Bieber > To: [email protected] > Sent: Wed, Feb 2, 2022 12:30 am > Subject: Re: Data unchanged when passing data to Python in multiprocessing > shared memory > > > On Wed, 2 Feb 2022 00:40:22 +0100 (CET), Jen Kris > > declaimed the following: > > > >> >> >> breakup = int.from_bytes(byte_val, "big") >> > > >print("this is breakup " + str(breakup)) > >> >> > > >Python prints: this is breakup 32894 > >> >> > > >Note that I had to switch from little endian to big endian. Python is > >little endian by default, but in this case it's big endian. > >> >> > > Look at the struct module. I'm pretty certain it has flags for big or > > little end, or system native (that, or run your integers through the > > various "network byte order" functions that I think C and Python both > > support. > > > > https://www.gta.ufrj.br/ensino/eel878/sockets/htonsman.html > > > > > > >However, if anyone on this list knows how to pass data from a non-Python > >language to Python in multiprocessing.shared_memory please let me (and the > >list) know. > > > > MMU cache lines not writing through to RAM? Can't find > > anything on Google to force a cache flush Can you test on a > > different OS? (Windows vs Linux) > > > > > > > > -- > > Wulfraed Dennis Lee Bieber AF6VN > > [email protected] http://wlfraed.microdiversity.freeddns.org/ > > -- > > https://mail.python.org/mailman/listinfo/python-list > > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
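Both positions in this exchange can be made concrete in two lines (values from the thread):

n = 32894
print(str(n).encode("ascii"))   # b'32894' -- five bytes here, and the width varies with the value
print(n.to_bytes(4, "big"))     # b'\x00\x00\x80~' -- always exactly four bytes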
Can't get iterator in the C API
I am using the Python C API to load the Gutenberg corpus from the nltk library
and iterate through the sentences. The Python code I am trying to replicate is:
from nltk.corpus import gutenberg
for i, fileid in enumerate(gutenberg.fileids()):
    sentences = gutenberg.sents(fileid)
    # etc.
where gutenberg.fileids is, of course, iterable.
I use the following C API code to import the module and get pointers:
int64_t Call_PyModule()
{
    PyObject *pModule, *pName, *pSubMod, *pFidMod, *pFidIter, *pFidSeqIter, *pSentMod;

    pName = PyUnicode_FromString("nltk.corpus");
    pModule = PyImport_Import(pName);

    if (pModule == 0x0) {
        PyErr_Print();
        return 1;
    }

    pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
    pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
    pSentMod = PyObject_GetAttrString(pSubMod, "sents");

    pFidIter = PyObject_GetIter(pFidMod);
    int ckseq_ok = PySeqIter_Check(pFidMod);
    pFidSeqIter = PySeqIter_New(pFidMod);

    return 0;
}
pSubMod, pFidMod and pSentMod all return valid pointers, but the iterator lines
return zero:
pFidIter = PyObject_GetIter(pFidMod);
int ckseq_ok = PySeqIter_Check(pFidMod);
pFidSeqIter = PySeqIter_New(pFidMod);
So the C API thinks gutenberg.fileids is not iterable, but it is. What am I
doing wrong?
--
https://mail.python.org/mailman/listinfo/python-list
Re: Can't get iterator in the C API
Thank you for clarifying that. Now on to getting the iterator from the method. Jen Feb 8, 2022, 18:10 by [email protected]: > On 2022-02-09 01:12, Jen Kris via Python-list wrote: > >> I am using the Python C API to load the Gutenberg corpus from the nltk library and iterate through the sentences. The Python code I am trying to replicate is: >> >> from nltk.corpus import gutenberg >> for i, fileid in enumerate(gutenberg.fileids()): >> sentences = gutenberg.sents(fileid) >> etc >> >> where gutenberg.fileids is, of course, iterable. >> >> I use the following C API code to import the module and get pointers: >> >> int64_t Call_PyModule() >> { >> PyObject *pModule, *pName, *pSubMod, *pFidMod, *pFidIter, *pFidSeqIter, *pSentMod; >> >> pName = PyUnicode_FromString("nltk.corpus"); >> pModule = PyImport_Import(pName); >> >> if (pModule == 0x0) { >> PyErr_Print(); >> return 1; } >> >> pSubMod = PyObject_GetAttrString(pModule, "gutenberg"); >> pFidMod = PyObject_GetAttrString(pSubMod, "fileids"); >> pSentMod = PyObject_GetAttrString(pSubMod, "sents"); >> >> pFidIter = PyObject_GetIter(pFidMod); >> int ckseq_ok = PySeqIter_Check(pFidMod); >> pFidSeqIter = PySeqIter_New(pFidMod); >> >> return 0; >> } >> >> pSubMod, pFidMod and pSentMod all return valid pointers, but the iterator lines return zero: >> >> pFidIter = PyObject_GetIter(pFidMod); >> int ckseq_ok = PySeqIter_Check(pFidMod); >> pFidSeqIter = PySeqIter_New(pFidMod); >> >> So the C API thinks gutenberg.fileids is not iterable, but it is. What am I doing wrong? >> > Look at your Python code. You have "gutenberg.fileids()", so the 'fileids' attribute is not an iterable itself, but a method that you need to call to get the iterable. > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
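A minimal sketch of that fix, reusing the pointer names from the post above: call fileids() first, then take an iterator over the list it returns (error checks trimmed for brevity):

/* fileids is a bound method, not an iterable: call it, then iterate
   over its result. */
PyObject *pFileIds = PyObject_CallObject(pFidMod, NULL);  /* gutenberg.fileids() */
PyObject *pFidIter = PyObject_GetIter(pFileIds);          /* iter(result) */
PyObject *pItem;
while ((pItem = PyIter_Next(pFidIter)) != NULL) {
    /* ... use the fileid string object ... */
    Py_DECREF(pItem);
}
Py_DECREF(pFidIter);
Py_DECREF(pFileIds);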
C API PyObject_Call segfaults with string
This is a follow-on to a question I asked yesterday, which was answered by
MRAB. I'm using the Python C API to load the Gutenberg corpus from the nltk
library and iterate through the sentences. The Python code I am trying to
replicate is:
from nltk.corpus import gutenberg
for i, fileid in enumerate(gutenberg.fileids()):
    sentences = gutenberg.sents(fileid)
    # etc.
I have everything finished down to the last line (sentences =
gutenberg.sents(fileid)) where I use PyObject_Call to call gutenberg.sents,
but it segfaults. The fileid is a string -- the first fileid in this corpus is
"austen-emma.txt."
pName = PyUnicode_FromString("nltk.corpus");
pModule = PyImport_Import(pName);
pSubMod = PyObject_GetAttrString(pModule, "gutenberg");
pFidMod = PyObject_GetAttrString(pSubMod, "fileids");
pSentMod = PyObject_GetAttrString(pSubMod, "sents");
pFileIds = PyObject_CallObject(pFidMod, 0);
pListItem = PyList_GetItem(pFileIds, listIndex);
pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict");
pListStr = PyBytes_AS_STRING(pListStrE);
Py_DECREF(pListStrE);
// sentences = gutenberg.sents(fileid)
PyObject *c_args = Py_BuildValue("s", pListStr);
PyObject *NullPtr = 0;
pSents = PyObject_Call(pSentMod, c_args, NullPtr);
The final line segfaults:
Program received signal SIGSEGV, Segmentation fault.
0x76e4e8d5 in _PyEval_EvalCodeWithName ()
from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0
My guess is the problem is in Py_BuildValue, which returns a pointer but it may
not be constructed correctly. I also tried it with "O" and it doesn't segfault
but it returns 0x0.
I'm new to using the C API. Thanks for any help.
Jen
--
https://mail.python.org/mailman/listinfo/python-list
Re: C API PyObject_Call segfaults with string
Thanks for your reply. I eliminated the DECREF and now it doesn't segfault but it returns 0x0. Same when I substitute pListStrE for pListStr. pListStr contains the string representation of the fileid, so it seemed like the one to use. According to http://web.mit.edu/people/amliu/vrut/python/ext/buildValue.html, Py_BuildValue "builds a tuple only if its format string contains two or more format units" and that doc contains examples. Feb 9, 2022, 16:52 by [email protected]: > On Thu, Feb 10, 2022 at 9:42 AM Jen Kris via Python-list wrote: > >> I have everything finished down to the last line (sentences = gutenberg.sents(fileid)) where I use PyObject_Call to call gutenberg.sents, but it segfaults. The fileid is a string -- the first fileid in this corpus is "austen-emma.txt". >> >> pName = PyUnicode_FromString("nltk.corpus"); >> pModule = PyImport_Import(pName); >> >> pSubMod = PyObject_GetAttrString(pModule, "gutenberg"); >> pFidMod = PyObject_GetAttrString(pSubMod, "fileids"); >> pSentMod = PyObject_GetAttrString(pSubMod, "sents"); >> >> pFileIds = PyObject_CallObject(pFidMod, 0); >> pListItem = PyList_GetItem(pFileIds, listIndex); >> pListStrE = PyUnicode_AsEncodedString(pListItem, "UTF-8", "strict"); >> pListStr = PyBytes_AS_STRING(pListStrE); >> Py_DECREF(pListStrE); > > HERE. > PyBytes_AS_STRING() returns a pointer into the pListStrE object. > So Py_DECREF(pListStrE) makes pListStr a dangling pointer. > >> // sentences = gutenberg.sents(fileid) >> PyObject *c_args = Py_BuildValue("s", pListStr); > > Why do you encode & decode pListStrE? > Why don't you just use pListStrE? > >> PyObject *NullPtr = 0; >> pSents = PyObject_Call(pSentMod, c_args, NullPtr); > > c_args must be a tuple, but you passed a unicode object here. > Read https://docs.python.org/3/c-api/arg.html#c.Py_BuildValue > >> The final line segfaults: >> Program received signal SIGSEGV, Segmentation fault. >> 0x76e4e8d5 in _PyEval_EvalCodeWithName () >> from /usr/lib/x86_64-linux-gnu/libpython3.8.so.1.0 >> >> My guess is the problem is in Py_BuildValue, which returns a pointer but it may not be constructed correctly. I also tried it with "O" and it doesn't segfault but it returns 0x0. >> >> I'm new to using the C API. Thanks for any help. >> >> Jen >> >> -- >> https://mail.python.org/mailman/listinfo/python-list > > Bests, > > -- > Inada Naoki > -- https://mail.python.org/mailman/listinfo/python-list
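Putting both points together -- don't DECREF pListStrE while pListStr is still in use (or skip the bytes round-trip and pass the unicode object straight through), and give PyObject_Call a real tuple -- a minimal sketch reusing the pointer names from the post above:

/* sentences = gutenberg.sents(fileid) -- pass the fileid object itself;
   PyObject_CallFunctionObjArgs builds the argument tuple internally. */
PyObject *pSents = PyObject_CallFunctionObjArgs(pSentMod, pListItem, NULL);

/* Equivalent, with an explicit one-element tuple: "(O)" rather than "O". */
PyObject *c_args = Py_BuildValue("(O)", pListItem);
PyObject *pSents2 = PyObject_Call(pSentMod, c_args, NULL);
Py_DECREF(c_args);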
