[Tutor] List weirdness
Hi,

I was wondering why this happens. I was trying to create a list of lists.

>>> d = [[]]
>>> d[0][0]=1
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IndexError: list assignment index out of range
>>> d
[[]]

What's wrong with that? However:

>>> d[0].append(1)
>>> d
[[1]]

I guess I can't reference [0] on an empty list. (I come from a C background.)

___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
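A minimal Python 3 sketch (not from the original thread) of the difference: assigning by index only replaces an element that already exists, while append() grows the list. It also shows the usual way to pre-size a list of lists, and the shared-row pitfall that C programmers often hit when they try `[[0] * n] * m`. The names d, grid, and shared are illustrative.

```python
# Index assignment can only replace an element that already exists.
d = [[]]
try:
    d[0][0] = 1              # d[0] is empty, so there is no index 0 yet
except IndexError as exc:
    error_message = str(exc)

# append() grows the inner list instead.
d[0].append(1)

# To pre-size a 3x2 grid, build each row separately...
grid = [[0] * 2 for _ in range(3)]
grid[0][0] = 1               # only the first row changes

# ...because multiplying the outer list repeats one shared row object.
shared = [[0] * 2] * 3
shared[0][0] = 1             # every "row" shows the change
```

Unlike a C array, a Python list has no slots until something is put in them, so "reserve then assign" has to be spelled out explicitly as above.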
Re: [Tutor] ConfigParser re-read fails
It looks way too simplified; I can't tell where the problem is. Would you mind showing the script? gist.github.com is a good place to post code.
[Tutor] Default list arguments in __init__
Hi,

This behavior was totally unexpected. I only caught it because it was the only thing I changed.

>>> class foo:
...     def __init__(self, lst=[]):
...         self.items = lst
...
>>> f1 = foo()
>>> f1.items
[]
>>> f1.items.append(1)
>>> f2 = foo()
>>> f2.items
[1]

Huh? lst is a reference to the *same list* for every instance? I guess I have to do it like this. It seems to work (i.e. every foo instance created with the default lst now gets its own new list):

def __init__(self, lst=None):
    self.items = lst or []

This is on Python 2.4.4c1.
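A short Python 3 sketch of why this happens and of the common idiom. A default value is evaluated once, when the def statement runs, so every call that uses the default sees the same list object. Testing `lst is None` instead of `lst or []` has one extra benefit: a caller who deliberately passes their own empty list keeps it, whereas `lst or []` silently swaps in a new one because `[]` is falsy. The class names Foo and FooOr are illustrative.

```python
class Foo:
    def __init__(self, lst=None):
        # None is a safe sentinel; a fresh list is made on each call.
        # "is None" also keeps a caller's own empty list.
        self.items = [] if lst is None else lst

f1 = Foo()
f1.items.append(1)
f2 = Foo()               # gets its own fresh list

shared = []
f3 = Foo(shared)
f3.items.append(2)       # appends to the caller's list, as intended

class FooOr:
    def __init__(self, lst=None):
        self.items = lst or []   # the version from the post

other = []
FooOr(other).items.append(3)     # other stays [] because [] is falsy
```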
Re: [Tutor] extracting a column from many files
On Thu, Feb 19, 2009 at 2:41 AM, Bala subramanian wrote:
> Dear friends,
>
> I want to extract certain 6 different columns from a many files and write it
> to 6 separate output files. I took some help from the following link
>
> http://mail.python.org/pipermail/tutor/2004-November/033475.html
>
> to write one column from many input files to a particular output file.

Let me see if I understand what you want to do. You have file1.txt, file2.txt, file3.txt ... and you want to read n columns from those files? It gets confusing. How many columns do you want to read from each file? How many columns does each output file have?

Also, it would be very helpful if you could give us the format of the input and output files.
Re: [Tutor] extracting a column from many files
For example, if you have these input files:

file1:
1a1 1b1 1c1 1d1 1e1 1f1
2a1 2b1 2c1 2d1 2e1 2f1
3a1 3b1 3c1 3d1 3e1 3f1

file2:
1a2 1b2 1c2 1d2 1e2 1f2
2a2 2b2 2c2 2d2 2e2 2f2
3a2 3b2 3c2 3d2 3e2 3f2

What do you want the output files to look like?
Re: [Tutor] extracting a column from many files
Here's some simple code for repositioning the fields, assuming you've already extracted them. All files must have the same dimensions.

file1 = [["1a1", "1b1", "1c1",], ["2a1", "2b1", "2c1"],]
file2 = [["1a2", "1b2", "1c2",], ["2a2", "2b2", "2c2"],]
files = [file1, file2]
out_lines = []
for column in range(len(files[0][0])):
    for fileno in range(len(files)):
        out_lines.append([])
        for row in range(len(files[0])):
            out_lines[-1].append(files[fileno][row][column])
    # write out_lines to file "file%s" % column
    print out_lines
    out_lines = []

No offense, but your extracting code looks a bit inflexible. It has a lot of magic numbers, and most of it is hardcoded.
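The same regrouping as a Python 3 sketch, with a small helper for the extraction step so the number of rows and columns comes from the data instead of being hardcoded. Both function names are hypothetical, and the sketch assumes whitespace-separated fields and files of equal dimensions, as in the post. Writing each group to its own output file is left out because the desired output format was never settled in the thread.

```python
def parse_fields(text):
    """Split whitespace-separated text into rows of fields, skipping blank lines."""
    return [line.split() for line in text.splitlines() if line.strip()]

def regroup_by_column(files):
    """Return one entry per column; each entry holds that column from every file.

    `files` is a list of files, each a list of rows, each a list of fields.
    All files are assumed to have the same number of rows and columns.
    """
    n_columns = len(files[0][0])
    result = []
    for column in range(n_columns):
        # One inner list per input file, containing that file's column values.
        result.append([[row[column] for row in f] for f in files])
    return result

file1 = parse_fields("1a1 1b1 1c1\n2a1 2b1 2c1\n")
file2 = parse_fields("1a2 1b2 1c2\n2a2 2b2 2c2\n")
by_column = regroup_by_column([file1, file2])
```

For a single file, `list(zip(*rows))` is another common way to turn rows into columns (it yields tuples).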
Re: [Tutor] how to instantiate a class
Here is a good book if you are already familiar with other languages.
http://diveintopython.org/object_oriented_framework/instantiating_classes.html
Re: [Tutor] Binary to Decimal conversion
You're making it more complicated than it needs to be. Also, you first used binnum, then binum, and you never defined binsum.

It could easily be done like this:

binnum = raw_input("Please enter a binary number: ")
decnum = 0
rank = 1
for i in reversed(binnum):
    decnum += rank * int(i)
    rank *= 2

Moos
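For comparison, a Python 3 sketch (the thread uses Python 2's raw_input) that wraps the same loop in a function. The name binary_to_decimal is hypothetical, and the digit check is an addition not in the original. Python's built-in int(s, 2) performs the same conversion and is handy for cross-checking.

```python
def binary_to_decimal(binnum):
    """Convert a string of 0s and 1s to an int, starting from the last digit."""
    decnum = 0
    rank = 1                      # place value of the current bit: 1, 2, 4, ...
    for digit in reversed(binnum):
        if digit not in "01":
            raise ValueError("not a binary digit: %r" % digit)
        decnum += rank * int(digit)
        rank *= 2
    return decnum

# Cross-check against the built-in conversion.
for sample in ("0", "1", "1011", "11111111"):
    assert binary_to_decimal(sample) == int(sample, 2)
```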
Re: [Tutor] memory error
On Fri, Mar 6, 2009 at 5:03 PM, Harris, Sarah L wrote:
> fname=filter(isfile, glob.glob('*.zip'))
> for fname in fname:
>     zipnames=filter(isfile, glob.glob('*.zip'))
>     for zipname in zipnames:
>         ...

It looks like you're using an unnecessary extra loop. Aren't the contents of fname the same as zipnames? I tried it with one loop (for zipname in zipnames:) and it worked.

P.S. You're at JPL? That's awesome! I was looking at their internships a few days ago.
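A Python 3 sketch of the single-loop version: build the list of archives once, then visit each name once. The helper name list_zip_files is hypothetical, and the demo creates empty placeholder files in a temporary directory rather than real archives, since only the file names matter here.

```python
import glob
import os
import tempfile

def list_zip_files(directory="."):
    """Return the regular files matching *.zip in `directory`, sorted."""
    pattern = os.path.join(directory, "*.zip")
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))

# Demo in a throwaway directory: two .zip names, a .txt file, and a
# directory whose name ends in .zip (isfile() filters that one out).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.zip", "b.zip", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "folder.zip"))
    # Build the list once, then loop over it once.
    visited = []
    for zipname in list_zip_files(tmp):
        visited.append(os.path.basename(zipname))  # real code would open the archive here
```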
Re: [Tutor] memory error files over 100MB
On Tue, Mar 10, 2009 at 8:45 AM, Harris, Sarah L wrote:
> That looks better, thank you.
> However I still have a memory error when I try to run it on three or more
> files that are over 100 MB?

How big are the files inside the zip file? It seems that in this line

newFile.write(zf.read(zfilename))

the compressed file is unzipped into memory first, then written to the new file. You can read and write in smaller chunks using the file-like objects returned by zf.open(); their read() method takes a size parameter. (Maybe that wouldn't help, if the file gets extracted to memory anyway.)

However, the open() method was added in Python 2.6, and Python 2.6 also added the extractall() method

http://docs.python.org/library/zipfile.html#zipfile.ZipFile.extractall

which does what you're doing. I'm not sure whether it will still cause a memory error.

Also, are the files in your zip archives all at the top level? You're not creating any directories.

Since this is a learning experience, it might also help to create functions to reduce clutter, or just to get more familiar with the language.

Moos
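A Python 3 sketch of the chunked approach. In Python 3, ZipFile.open() returns a file object that decompresses as it is read, and shutil.copyfileobj() copies at most `chunk_size` bytes per read, so a member is never loaded whole. The function name extract_member_chunked and the 1 MiB default chunk size are assumptions for illustration; the demo uses an in-memory archive so it runs anywhere.

```python
import io
import shutil
import zipfile

def extract_member_chunked(zf, member, dest, chunk_size=1024 * 1024):
    """Copy one archive member into the writable binary file `dest`, chunk by chunk."""
    with zf.open(member) as src:
        shutil.copyfileobj(src, dest, chunk_size)

# Demo with an in-memory archive; real code would use files on disk,
# e.g. open(target_path, "wb") for `dest`.
payload = b"0123456789" * 50000
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.bin", payload)

out = io.BytesIO()
with zipfile.ZipFile(buf) as zf:
    extract_member_chunked(zf, "data.bin", out, chunk_size=4096)
```

If every member should simply be written out, ZipFile.extractall(path) does this for the whole archive and creates any directories stored in it.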
Re: [Tutor] Issues Parsing XML
So you want one line for each FINDING element? Easy:

# Get FINDING elements
findings = domDatasource.getElementsByTagName('FINDING')

# Get the text of all direct child nodes in each element.
# That's assuming every child has a TEXT_NODE node.
lines = []
for finding in findings:
    lines.append([f.firstChild.data for f in finding.childNodes])

# print
for line in lines:
    print ", ".join(line)

Not sure how you want to deal with newlines. You can escape them to \n in the output, or you might find something in the csv module. (I haven't looked at it.)

Now, this doesn't deal with missing elements. I found that some FINDING elements have 7 children and others have 9. You might be able to insert two empty elements in lines of length 7. Or, if you want more control, you can make a dictionary with keys for all available tag names, and insert each element found in a FINDING into the dictionary (if it's a valid tag name). Then you have a list of dictionaries, and you can print the elements in any order you want. Missing elements will have empty strings as values.

Moos
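A Python 3 sketch of the dictionary idea combined with the csv module, which quotes fields containing commas or newlines so they survive a round trip. The real tag names in the thread's file are not shown, so ROOT, ID, STATUS, and COMMENT, the sample XML, and the function name finding_to_dict are all hypothetical.

```python
import csv
import io
from xml.dom import minidom

# Hypothetical sample; the second FINDING is missing its COMMENT.
XML = """<ROOT>
<FINDING><ID>V1</ID><STATUS>NF</STATUS><COMMENT>line one
line two</COMMENT></FINDING>
<FINDING><ID>V2</ID><STATUS>O</STATUS></FINDING>
</ROOT>"""

FIELDS = ["ID", "STATUS", "COMMENT"]   # assumed column order

def finding_to_dict(finding):
    """Map each known child tag to its text; missing tags become ''."""
    row = dict.fromkeys(FIELDS, "")
    for child in finding.childNodes:
        if child.nodeType == child.ELEMENT_NODE and child.tagName in row:
            # Join all text nodes so an empty element yields '' instead of failing.
            row[child.tagName] = "".join(
                n.data for n in child.childNodes if n.nodeType == n.TEXT_NODE)
    return row

doc = minidom.parseString(XML)
rows = [finding_to_dict(f) for f in doc.getElementsByTagName("FINDING")]

out = io.StringIO(newline="")
writer = csv.DictWriter(out, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)   # fields containing newlines are quoted automatically
csv_text = out.getvalue()
```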
Re: [Tutor] Issues Parsing XML
I'm a little bored, so I wrote a function that gets the elements and puts them in a dictionary. Missing elements are just empty strings.

http://gist.github.com/78385

Usage:

>>> d = process_finding(findings[0])
>>> ", ".join(map(lambda e: d[e], elements))
u'V0006310, NF, , , GD, 2.0.8.8, TRUE, DTBI135-Scripting\nof Java applets -\nRestricted, 2'

Now for a FINDING with 9 elements:

>>> d = process_finding(findings[1])
>>> ", ".join(map(lambda e: d[e], elements))
u'V0006311, O, The value:\nSoftware\\Policies\\Microsoft\\Windows\\CurrentVersion\\Internet\nSettings\\Zones\\4\\1A00 does not exist.\n\n, The value:\nSoftware\\Policies\\Microsoft\\Windows\\CurrentVersion\\Internet\nSettings\\Zones\\4\\1A00 does not exist.\n\n, GD, 2.0.8.8, TRUE, DTBI136-User\nAuthentication - Logon -\nRestricted, 2'

The map() call just looks up each name in the elements list in the dictionary. You can reorder the names any way you want.

You're welcome :)

Moos
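The gist's process_finding() and the elements list are not reproduced here, so this Python 3 sketch uses a made-up dictionary and name list to show equivalent ways of pulling values out in a chosen order. One caveat: operator.itemgetter with a single name returns a bare value rather than a tuple, so the generator expression is the more uniform choice.

```python
from operator import itemgetter

# Hypothetical stand-ins for the thread's `elements` list and the
# dictionary returned by process_finding().
elements = ["ID", "STATUS", "COMMENT"]
d = {"STATUS": "NF", "ID": "V0006310", "COMMENT": ""}

via_map = ", ".join(map(lambda e: d[e], elements))
via_genexp = ", ".join(d[e] for e in elements)
via_itemgetter = ", ".join(itemgetter(*elements)(d))

# Reordering is just a different name list.
reordered = ", ".join(d[e] for e in ["STATUS", "ID"])
```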
Re: [Tutor] problem in replacing regex
Hi,

You can do the substitution in many ways. You can first search for bare account numbers and replace them with URLs, then substitute the URLs into tags. To replace only account numbers that aren't already in URLs, you simply substitute account numbers that aren't preceded by a "/", as you have been trying to do.

re.sub() can accept a function instead of a string. The function receives the match object and returns the replacement. This way you can do extra processing on each match.

import re

text = """https://hello.com/accid/12345-12
12345-12
http://sadfsdf.com/asdf/asdf/asdf/12345-12
start12345-12end
this won't be replaced
start/123-45end
"""

def sub_num(m):
    if m.group(1) == '/':
        return m.group(0)
    else:
        # put url here
        return m.group(1) + 'http://example.com/' + m.group(2)

>>> print re.sub(r'(\D)(\d+-\d+)', sub_num , text)
https://hello.com/accid/12345-12
http://example.com/12345-12
http://sadfsdf.com/asdf/asdf/asdf/12345-12
starthttp://example.com/12345-12end
this won't be replaced
start/123-45end

This assumes there aren't any tags in the input yet, so you should do this before substituting URLs into tags.

I have super cow powers!

Moos
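A different technique for the same job, sketched in Python 3: a negative lookbehind, `(?<![/\d])`, rejects numbers preceded by "/" (already in a URL) or by another digit (which would otherwise let the regex start matching in the middle of a number). This replaces the callback, and unlike the `(\D)` version it also handles an account number at the very start of the text. The function name link_accounts and the base URL are placeholders.

```python
import re

text = """https://hello.com/accid/12345-12
12345-12
http://sadfsdf.com/asdf/asdf/asdf/12345-12
start12345-12end
this won't be replaced
start/123-45end
"""

# Match an account number only when it is not preceded by "/" or a digit.
ACCOUNT = re.compile(r"(?<![/\d])(\d+-\d+)")

def link_accounts(s, base="http://example.com/"):
    # "\g<1>" in the replacement template refers to the captured number.
    return ACCOUNT.sub(base + r"\g<1>", s)

result = link_accounts(text)
```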
[Tutor] Optional groups in RE's
Hello Tutors!

I was trying to make some groups optional in a regular expression, but I couldn't do it.

For example, I have the string:

>>> data = "42 sdlfks d f60 sdf sdf Title"

and the pattern:

>>> pattern = "(.*?).*?(.*?).*?(.*?)"

This works when all the groups are present.

>>> re.search(pattern, data).groups()
('42', '60', 'Title')

However, I don't know how to make an re that deals with possibly missing groups. For example, with:

>>> data = "42 sdlfks d f60 sdf sdf"

I tried

>>> pattern = "(.*?).*?(.*?).*?(?:(.*?))?"
>>> re.search(pattern, data).groups()
('42', '60', None)

but it doesn't work when it _is_ present.

>>> data = "42 sdlfks d f60 sdf sdf Title"
>>> re.search(pattern, data).groups()
('42', '60', None)

I tried something like (?:pattern)+ and (?:pattern)* but I couldn't get what I wanted. (.*?)? doesn't seem to be a valid re either. I know (?:pattern) is a non-capturing group.

I just read that | has very low precedence, so I used parentheses liberally to OR the pattern with a null string.

>>> pattern = "(.*?).*?(.*?).*?(?:(?:(.*?))|)"
>>> re.search(pattern, data).groups()
('42', '60', None)

(?:(?:pattern)|(?:.*)) didn't work either.

I want to be able to make some groups optional, so that when a group isn't matched, it returns None, and when it is matched, it returns what was matched. Is that possible with one re?

I could probably do it with more than one re (and did), but with one re the solution is much more elegant (i.e. I could use named groups, then pass the resulting dictionary to a processing function).

I've also tried matching optional groups before, and I'm curious about the solution.

Moos
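The tag names in the post were lost, so this Python 3 sketch uses placeholder tags `<a>`, `<b>`, `<c>`. The attempts above fail because the lazy `.*?` in front of the optional group can match the empty string; the optional group then fails at that position and is skipped, and the overall match succeeds without it. Moving the skip *inside* the optional group fixes that: since `?` is greedy, the engine tries the group (and its `.*?` search for `<c>`) first, and only falls back to skipping it if `<c>` never appears. Named groups give the dictionary the post asks for.

```python
import re

# <a>, <b>, <c> are placeholders for the tags stripped from the post.
PATTERN = re.compile(
    r"<a>(?P<first>.*?)</a>.*?<b>(?P<second>.*?)</b>"
    # The skip ".*?" sits inside the optional group, so the group is tried first.
    r"(?:.*?<c>(?P<third>.*?)</c>)?",
    re.DOTALL,  # let "." cross newlines in multi-line input
)

with_c = "<a>42</a> sdlfks d f<b>60</b> sdf sdf\n<c>Title</c>"
without_c = "<a>42</a> sdlfks d f<b>60</b> sdf sdf"

full = PATTERN.search(with_c).groupdict()
partial = PATTERN.search(without_c).groupdict()
```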
Re: [Tutor] Optional groups in RE's
Mark Tolonen wrote:
> Your data looks like XML. If it is actually well-formed XML, have you tried
> ElementTree?

It is XML. I used minidom from xml.dom, and it worked fine, except it was ~16 times slower. I'm parsing a ~70 MB file, and the difference is 3 minutes versus 10 seconds with re's. I used a separate re for each field I wanted, and it worked nicely (a 1-to-1 correspondence between DOM calls and re.search/re.finditer calls). This problem came up when I tried to do the match with one re.

I guess instead of minidom I could try lxml, which uses libxml2, which is written in C.

Kent Johnson wrote:
> This re doesn't have to match anything after so it doesn't.
> You can force it to match to the end by adding $ at the end but that
> is not enough, you have to make the ".*?" *not* match .
> One way to do that is to use [^<]*? instead of .*?:

Ah. Thanks. Unfortunately, the input string is multi-line, and doesn't end in

Moos

P.S. I'm still relatively new to RE's, or IRE's. sed, awk, grep, and perl have different formats for re's. grep alone has four different versions of RE's! Since the only form of re I'm using is "start(.*?)end", I was thinking about writing a C program to do that.
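Along the lines of Mark's suggestion, a Python 3 sketch using xml.etree.ElementTree.iterparse, which processes the file as a stream and lets you discard each finished element with clear(), so memory stays bounded on a large file. The tag names and sample document are hypothetical, and whether this closes the speed gap with the regex approach on a 70 MB file is untested here.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document; FINDING/ID/STATUS stand in for the real tag names.
XML = b"""<ROOT>
<FINDING><ID>V1</ID><STATUS>NF</STATUS></FINDING>
<FINDING><ID>V2</ID></FINDING>
</ROOT>"""

def iter_findings(source, fields=("ID", "STATUS")):
    """Yield one dict per FINDING element without building the whole tree first."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "FINDING":
            # findtext() returns the default when the child element is missing.
            yield {name: elem.findtext(name, default="") for name in fields}
            elem.clear()  # drop the finished subtree's contents

rows = list(iter_findings(io.BytesIO(XML)))
```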
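[Tutor] Splitting strings and undefined variables
Hello all,

I was looking at this:
http://www.debian.org/doc/manuals/reference/ch-program.en.html#s-python

I have a question about the line of code that uses split(). In the Python version, the line below only works if there are exactly three fields in line.

(first, last, passwd) = line.split()

Also, since the variables are used like this:

lineout = "%s:%s:%d:%d:%s %s,,/home/%s:/bin/bash\n" % \
          (user, passwd, uid, gid, first, last, user)

I can't use ":".join(line.split()). But maybe a dictionary could be used for string substitution.

In the Perl version (above the Python version in the link), the script works with an input line of one to three fields, like "fname lname pw" or "fname lname":

($n1, $n2, $n3) = split / /;

Is there a better way to extract the fields from line more flexibly, so that the number of fields can vary? I guess we could use conditionals to check each field, but is there a more elegant (or Pythonic!) way to do it?

Moos

P.S. I'm not a Perl user; I was just reading the examples. I've been using C and awk for a few years, and Python for a few months. Also, I know blank passwords aren't very practical, but I'm just asking this to explore possibilities :)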
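One common approach, sketched in Python 3: split, then pad (or trim) the result to a fixed length so the unpacking always succeeds, which roughly mirrors Perl leaving missing variables undefined. The helper name split_fields, the empty-string filler, and the uid/gid values are assumptions for illustration. The second half uses the dictionary-based `%(name)s` substitution the post mentions.

```python
def split_fields(line, n=3, fill=""):
    """Split on whitespace and pad (or trim) the result to exactly n fields."""
    fields = line.split()[:n]
    return fields + [fill] * (n - len(fields))

first, last, passwd = split_fields("fname lname")   # passwd becomes ""

# Dictionary-based substitution; user, uid, and gid are made-up values.
record = {"user": first, "passwd": passwd, "uid": 1001, "gid": 1001,
          "first": first, "last": last}
lineout = ("%(user)s:%(passwd)s:%(uid)d:%(gid)d:%(first)s %(last)s"
           ",,/home/%(user)s:/bin/bash\n") % record
```

With a dictionary, each value can be named once and reused (user appears twice in the template) without repeating it in a tuple.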
Hello all, I was looking at this: http://www.debian.org/doc/manuals/reference/ch-program.en.html#s-python I have a question about the line of code that uses split() With the python version, the line below only works if there are three fields in line. (first, last, passwd) = line.split() Also, since the variables are used like this: lineout = "%s:%s:%d:%d:%s %s,,/home/%s:/bin/bash\n" % \ (user, passwd, uid, gid, first, last, user) I can't use ":".join(line.split()) But maybe a dictionary could be used for string substitution. In the perl version (above the python version in the link), the script works with the input line having one to three fields. Like "fname lname pw" or "fname lname" ($n1, $n2, $n3) = split / /; Is there a better way to extract the fields from line in a more flexible way, so that the number of fields could vary? I guess we could use conditionals to check each field, but is there a more elegant (or pythonic!) way to do it? Moos P.S. I'm not a Perl user, I was just reading the examples. I've been using C and awk for few years, and Python for few months. Also, I know blank passwords aren't very practical, but I'm just asking this to explore possibilities :) ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor