[Tutor] removing nodes using ElementTree

2015-06-28 Thread street . sweeper
Hello all,

I'm trying to merge and filter some XML.  This is mostly working, but I'm
getting one node that isn't in my include list.  Python version is
3.4.0.

The goal is to merge multiple XML files and then write a new one based
on whether or not <pid> is in an include list.  In the mock data below,
the 3 XML files have a total of 8 <rec> nodes, and I have 4 <pid> values
in my list.  The output is well-formed XML, but it includes 5 <rec>
nodes: the 4 in the list, plus 89012 from input1.xml.  It runs without
error.  I've used type() to compare
rec.find('part').find('pid').text and the items in the list, and they're
both strings.  When the first for loop is done, xmlet has 8 rec nodes.  Is
there a problem with the iteration in the second for loop?  Any other
recommendations are also welcome.  Thanks!


The code itself was cobbled together from two sources,
http://stackoverflow.com/questions/9004135/merge-multiple-xml-files-from-command-line/11315257#11315257
and http://bryson3gps.wordpress.com/tag/elementtree/

Here's the code and data:

#!/usr/bin/env python3

import os, glob
from xml.etree import ElementTree as ET

xmls = glob.glob('input*.xml')
ilf = os.path.join(os.path.expanduser('~'),'include_list.txt')
xo = os.path.join(os.path.expanduser('~'),'mergedSortedOutput.xml')

il = [x.strip() for x in open(ilf)]

xmlet = None

for xml in xmls:
    d = ET.parse(xml).getroot()
    for rec in d.iter('inv'):
        if xmlet is None:
            xmlet = d
        else:
            xmlet.extend(rec)

for rec in xmlet:
    if rec.find('part').find('pid').text not in il:
        xmlet.remove(rec)

ET.ElementTree(xmlet).write(xo)

quit()





include_list.txt

12345
34567
56789
67890

input1.xml




67890
67890t


67890d




78901
78901t


78901d




89012
89012t


89012d




input2.xml




45678
45678t


45678d




56789
56789t


56789d




input3.xml




12345
12345t


12345d




23456
23456t


23456d




34567
34567t


34567d




mergedSortedOutput.xml:




67890
67890t


67890d




89012
89012t


89012d




12345
12345t


12345d




34567
34567t


34567d




56789
56789t


56789d



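A likely cause, offered as a hedged sketch: removing children from an Element while looping over that same Element shifts the remaining children down one position, so the loop skips the child right after each removal. In input1.xml, 89012 comes straight after 78901, which is removed, so 89012 is never tested. Looping over a snapshot (`list(xmlet)`) avoids this. The XML here is a made-up minimal stand-in: only `rec`, `part` and `pid` come from the code above, and the `<inv>` root is assumed from the `d.iter('inv')` call.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the merged tree: an <inv> root holding <rec>
# elements, each with <part><pid>...</pid></part> as the original code expects.
SAMPLE = """<inv>
<rec><part><pid>67890</pid></part></rec>
<rec><part><pid>78901</pid></part></rec>
<rec><part><pid>89012</pid></part></rec>
<rec><part><pid>12345</pid></part></rec>
</inv>"""

def filter_recs_buggy(root, include):
    """Same loop as in the post: removes from root while iterating over root."""
    for rec in root:
        if rec.find('part').find('pid').text not in include:
            root.remove(rec)
    return [rec.findtext('part/pid') for rec in root]

def filter_recs(root, include):
    """Iterate over a snapshot (list(root)) so removals don't shift the loop."""
    for rec in list(root):
        if rec.find('part').find('pid').text not in include:
            root.remove(rec)
    return [rec.findtext('part/pid') for rec in root]

# A set instead of a list makes each 'in' check constant time.
include = {'12345', '34567', '56789', '67890'}

print(filter_recs_buggy(ET.fromstring(SAMPLE), include))  # -> ['67890', '89012', '12345']
print(filter_recs(ET.fromstring(SAMPLE), include))        # -> ['67890', '12345']
```

In the script itself, changing `for rec in xmlet:` to `for rec in list(xmlet):` should be the only change needed; the merge loop can stay as it is.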
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Python Certifications

2015-08-08 Thread street . sweeper
On Mon, Aug 3, 2015, at 04:55 PM, acolta wrote:
> Hi,
> 
> I am new in python, so just curios if there are any good and appreciated
> python certification programs/courses ?

I'm interested in this too, but some googling only finds a 4-part
O'Reilly program that's no longer available.  They're moving their study
materials to https://beta.oreilly.com/learning but I don't see any
obvious replacement for this course.

You might try Alan Gauld's site http://www.alan-g.me.uk/ (he's on this
list), http://learnpythonthehardway.org or
http://www.diveintopython3.net/ for step-by-step introduction of
concepts.  There are also a lot of universities offering classes via
OpenCourseWare.  But there's no way to earn any kind of formal
certificate through these, as far as I know.


[Tutor] library terminology and importing

2016-02-21 Thread street . sweeper
Hello all,

I often use now() and strftime() from datetime, but it seems like I
can't import just those functions.  The os module allows me to import
like this:

from os.path import join,expanduser

but I get an error if I try

from datetime.datetime import now, strftime

But if I import all of os and datetime, I can use those functions by
writing the full 'path' 3 levels deep:

os.path.expanduser('~')
datetime.datetime.now()

Is there a way to import individual functions from datetime.datetime?
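As far as I know, the difference is that `os.path` is a module, while `datetime.datetime` is a class defined inside the `datetime` module. The name before `import` in a `from X import Y` statement has to be a module or package, so the usual approach is to import the class and call `now()` on it, and `strftime()` on the object it returns. This is a minimal sketch; the fixed date is just an example value.

```python
import importlib
from datetime import datetime          # import the class from the module
from os.path import join, expanduser   # works: os.path is itself a module

# 'from datetime.datetime import now' fails because the import system looks
# for a *module* named datetime.datetime, and datetime.datetime is a class.
try:
    importlib.import_module('datetime.datetime')
except ImportError as exc:             # ModuleNotFoundError is a subclass
    print('cannot import:', exc)

# now() is called on the class, strftime() on an instance of it.
right_now = datetime.now()
print(right_now.strftime('%Y-%m-%d %H:%M'))

# If a bare name is really wanted, bind the method to a variable yourself.
now = datetime.now
example = datetime(2016, 2, 21, 9, 30)
print(example.strftime('%Y-%m-%d %H:%M'))   # -> 2016-02-21 09:30
print(join(expanduser('~'), 'temp'))
```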


Also, is there proper terminology for each of the 3 sections of
os.path.expanduser('~') for example?  Such as

os - library (or module?)
path - ?
expanduser - function
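For the terminology, one way to check for yourself is the `inspect` module. As far as I know, `os` is a module; `os.path` is also a module (os imports a platform-specific module such as `posixpath` and exposes it under the name `path`); and `expanduser` is a function defined in it. By the same check, `datetime` is a module and `datetime.datetime` is a class, which is why `now()` and `strftime()` are methods of that class rather than module-level functions. The `kind()` helper is made up for this sketch.

```python
import datetime
import inspect
import os

def kind(obj):
    """Return a rough label for the kind of object a dotted name refers to."""
    if inspect.ismodule(obj):
        return 'module'
    if inspect.isclass(obj):
        return 'class'
    if inspect.isfunction(obj):
        return 'function'
    return type(obj).__name__

names = [('os', os),
         ('os.path', os.path),
         ('os.path.expanduser', os.path.expanduser),
         ('datetime', datetime),
         ('datetime.datetime', datetime.datetime)]

for name, obj in names:
    print('{:20} {}'.format(name, kind(obj)))
```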


Thanks!


[Tutor] improvements on a renaming script

2014-03-09 Thread street . sweeper
Hello all,

A bit of background: I had some slides scanned, and a 3-character
slice of each file name indicates which roll of film it came from.
This is recorded in a tab-separated file called fileNames.tab.
Its content looks something like:

p01 200511_autumn_leaves
p02 200603_apple_plum_cherry_blossoms

The original file names looked like:

1p01_abc_0001.jpg
1p02_abc_0005.jpg

The renamed files are:

200511_autumn_leaves_-_001.jpeg
200603_apple_plum_cherry_blossoms_-_005.jpeg

The script below works and has done what I wanted, but I have a
few questions:

- In the get_long_name() function, the for/if loop reads through
the whole fileNames.tab file on every call, doesn't it?  In reality,
the file was only a few dozen lines long, so I suppose it doesn't
matter, but is there a better way to do this?

- Really, I wanted to create a new sequence number at the end of
each file name, but I thought this would be difficult.  In order
for it to count from 01 to whatever the last file is per set p01,
p02, etc, it would have to be aware of the set name and how many
files are in it.  So I settled for getting the last 3 digits of
the original file name using splitext().  The strings were unique,
so it worked out.  However, I can see this being useful in other
places, so I was wondering if there is a good way to do this.
Is there a term or phrase I can search on?

- I'd be interested to read any other comments on the code.
I'm new to python and I have only a bit of computer science study,
quite some time ago.
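On the first question: yes, as written, `get_long_name()` reopens and rescans fileNames.tab on every call. A common alternative, sketched here, is to read the file once into a dict keyed by the 3-character code and then look names up directly. The `load_long_names` name is hypothetical, and an in-memory file stands in for fileNames.tab so the example runs on its own.

```python
import csv
import io

def load_long_names(fileobj):
    """Read a two-column, tab-separated file into {abbrev: long_name}."""
    reader = csv.reader(fileobj, delimiter='\t')
    return {row[0]: row[1] for row in reader if len(row) >= 2}

# Stand-in for open(os.path.join(os.path.expanduser('~'), 'temp2', 'fileNames.tab'))
sample = io.StringIO('p01\t200511_autumn_leaves\n'
                     'p02\t200603_apple_plum_cherry_blossoms\n')
long_names = load_long_names(sample)

print(long_names['p01'])       # -> 200511_autumn_leaves
print(long_names.get('p99'))   # -> None for an unknown code
```

In the script, you would build the dict once before the rename loop (inside a `with open(...)` block) and replace `get_long_name(get_slice(f))` with `long_names[get_slice(f)]`.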


#!/usr/bin/env python3

import os
import csv

# get longnames from fileNames.tab
def get_long_name(glnAbbrev):
    with open(
        os.path.join(os.path.expanduser('~'),'temp2','fileNames.tab')
        ) as filenames:
        filenamesdata = csv.reader(filenames, delimiter='\t')
        for row in filenamesdata:
            if row[0] == glnAbbrev:
                return row[1]

# find shortname from slice in picture filename
def get_slice(fn):
threeColSlice = fn[1:4]
return threeColSlice

# get 3-digit sequence number from basename
def get_bn_seq(fn):
seq = os.path.splitext(fn)[0][-3:]
return seq

# directory locations
indir = os.path.join(os.path.expanduser('~'),'temp4')
outdir = os.path.join(os.path.expanduser('~'),'temp5')

# rename
for f in os.listdir(indir):
    if f.endswith(".jpg"):
        os.rename(
            os.path.join(indir,f),os.path.join(
                outdir,
                get_long_name(get_slice(f))+"_-_"+get_bn_seq(f)+".jpeg")
            )

exit()
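For the second question, one approach (not the only one) is to group the file names by their 3-character set code and number each group with `enumerate()`. Useful search terms are "group by" and "enumerate", along with `collections.defaultdict` and `itertools.groupby`. This sketch only works out old-name/new-name pairs and does not touch the disk; `plan_renames` is a hypothetical name, and the file names and `long_names` mapping are sample values.

```python
from collections import defaultdict

def plan_renames(filenames, long_names):
    """Map each .jpg to '<long name>_-_NNN.jpeg', numbering from 001 within each set."""
    groups = defaultdict(list)
    for fn in sorted(filenames):          # sorted so the numbering is predictable
        if fn.endswith('.jpg'):
            groups[fn[1:4]].append(fn)    # fn[1:4] is the set code, as in get_slice()
    plan = {}
    for code, members in groups.items():
        for n, fn in enumerate(members, start=1):
            plan[fn] = '{}_-_{:03d}.jpeg'.format(long_names[code], n)
    return plan

long_names = {'p01': '200511_autumn_leaves',
              'p02': '200603_apple_plum_cherry_blossoms'}
files = ['1p01_abc_0001.jpg', '1p01_abc_0007.jpg', '1p02_abc_0005.jpg', 'notes.txt']

for old, new in sorted(plan_renames(files, long_names).items()):
    print(old, '->', new)
```

To do the actual renaming, loop over the returned dict and call `os.rename(os.path.join(indir, old), os.path.join(outdir, new))` for each pair.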


Thanks!


[Tutor] ElementTree, iterable container, depth of elements

2014-03-29 Thread street . sweeper
I'm trying to sort the order of elements in an xml file, mostly
to make visual inspection/comparison easier.  The example xml and
code on http://effbot.org/zone/element-sort.htm get me almost
what I need, but the xml I'm working with has the element I'm
trying to sort on one level deeper.


That page's example xml:


  

  Ned
  555-8904


  John
  555-5782


  Julius
  555-3642

  



And that page's last example of code:

  import xml.etree.ElementTree as ET
  tree = ET.parse("data.xml")
  def getkey(elem):
      return elem.findtext("number")
  container = tree.find("entries")
  container[:] = sorted(container,key=getkey)
  tree.write("new-data.xml")

I used the interactive shell to experiment a bit with that,
and I can see that 'container' in

  container = tree.find("entries")

is iterable, using

  for a in container:
      print(a)

However, the xml I'm working with looks something like this:


  

  
20140325
dentist
  
  
20140324
barber
  

  



What I'd like to do is rearrange the  elements within
 based on the  element.  If I remove the 
level, this will work, but I'm interested in getting the code to
work without editing the file.

I look for "Date" and "diary" rather than "number" and "entries"
but when I try to process the file as-is, I get an error like


Traceback (most recent call last):
  File "./xmlSort.py", line 16, in <module>
    container[:] = sorted(container, key=getkey)
TypeError: 'NoneType' object is not iterable


"container[:] = sorted(container, key=getkey)" confuses me,
particularly because I don't see how the elem parameter is passed
to the getkey function.
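On how `elem` reaches `getkey`: `sorted()` itself calls the key function once for each item it is given, passing that item as the only argument, and then orders the items by the returned values. You pass only the function object (`key=getkey`, with no parentheses). `container[:] = ...` then replaces all of the element's children with the sorted list, in place. A tiny illustration with plain strings, with a print inside the key function to make the calls visible:

```python
def getkey(item):
    # sorted() calls this once per item, passing that item in
    print('getkey called with', repr(item))
    return item[-4:]          # sort by the last four characters

numbers = ['Ned 555-8904', 'John 555-5782', 'Julius 555-3642']
result = sorted(numbers, key=getkey)
print(result)  # -> ['Julius 555-3642', 'John 555-5782', 'Ned 555-8904']
```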

I know if I do

  root = tree.getroot()

(from the python.org ElementTree docs) it is possible to step
down through the levels of root with root[0], root[0][0], etc,
and it seems to be possible to iterate with

  for i in root[0][0]:
      print(i)

but trying to work root[0][0] into the code has not worked,
and tree[0] is not possible.

How can I get this code to do its work one level down in the xml?
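A likely explanation for the `NoneType` error: `tree.find("diary")` only looks at the direct children of the root element, so when `diary` sits one level lower it returns `None`. An ElementPath expression such as `.//diary` searches at any depth. Because the tags in the quoted XML did not survive, every element name below other than `Date` and `diary` (`calendar`, `year`, `entry`, `task`) is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout with one extra level between the root and <diary>.
SAMPLE = """<calendar>
  <year>
    <diary>
      <entry><Date>20140325</Date><task>dentist</task></entry>
      <entry><Date>20140324</Date><task>barber</task></entry>
    </diary>
  </year>
</calendar>"""

def getkey(elem):
    return elem.findtext('Date')

root = ET.fromstring(SAMPLE)
print(root.find('diary'))            # -> None: not a direct child of the root

container = root.find('.//diary')    # search at any depth instead
container[:] = sorted(container, key=getkey)

print([e.findtext('task') for e in container])  # -> ['barber', 'dentist']
```

With a file, `tree = ET.parse(...)` followed by `tree.find('.//diary')` should behave the same way, and `tree.write(...)` saves the result. `tree[0]` fails because an ElementTree is not an Element; `tree.getroot()[0]` is the indexed equivalent.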

Thanks


[Tutor] python3 equivalent of coreutils stat command

2014-06-14 Thread street . sweeper
With the stat command in GNU coreutils, I can get a file's
modification time, with timezone offset.  For example, the
output of "stat -c %y *" looks like

2014-02-03 14:48:17.0 -0200
2014-05-29 19:00:05.0 -0100

What I want to do is get the mtime in ISO8601 format, and I've
gotten close with os.path.getmtime and os.stat, for example
2014-02-03T14:48:17.  But, no timezone offset.  coreutils stat
can get it, so it must be recorded by the filesystem (ext4 in
this case).  What do I need to do in python to include this
piece of information?
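As far as I know, ext4 stores the modification time as seconds (plus nanoseconds) since the Unix epoch, which is UTC-based, and `stat` adds the offset of the local time zone when it formats the value rather than reading an offset from the filesystem. A minimal sketch of the same idea in Python: build a timezone-aware UTC datetime from the timestamp, then convert it to local time with `astimezone()`. The `mtime_iso` name is hypothetical, and the temporary file is only there so the example runs on its own.

```python
import os
import tempfile
from datetime import datetime, timezone

def mtime_iso(path):
    """Return a file's mtime as ISO 8601 text with the local UTC offset."""
    ts = os.stat(path).st_mtime                        # seconds since the epoch
    as_utc = datetime.fromtimestamp(ts, tz=timezone.utc)
    return as_utc.astimezone().isoformat()             # convert to local time zone

with tempfile.NamedTemporaryFile() as tmp:
    stamp = mtime_iso(tmp.name)
print(stamp)   # ends with an offset such as -02:00 or +00:00
```

To mimic `stat -c %y *`, call `mtime_iso()` on each name from `os.listdir('.')`. On Python 3.6 and later, `isoformat(timespec='seconds')` drops the fractional seconds.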

Thanks


[Tutor] sql-like join on two lists or dictionaries

2015-02-14 Thread street . sweeper
Hello all,

Basically what I have here is header and line data for sales or purchase
orders, and I'm trying to do an SQL-like join to bring them together
(which I ultimately did in SQL, because I couldn't figure this out in
Python :)).  I've managed to get the files into Python using string
slicing; that's not a problem.

headers - h.dat

B134542Bob  ZQ775235
B875432Joe  ZQ987656
B567943SteveZQ256222

lines - l.dat

B134542   112342   0012
B134542   176542   0001
B875732   765420003
B567943   654565   0001
B567943   900011   0001

desired result - hl.dat

B134542   112342   0012BobZQ775235
B134542   176542   0001BobZQ775235
B875732   765420003JoeZQ987656
B567943   654565   0001Steve  ZQ256222
B567943   900011   0001Steve  ZQ256222



in python3 on linux:

#!/usr/bin/env python3

import os

basepath=os.path.join(os.path.expanduser('~'),'temp',)
linefile=os.path.join(basepath,'l.dat')
headerfile=os.path.join(basepath,'h.dat')

with open(headerfile) as h, open(linefile) as l:
  lines = l.readlines()
  headers = h.readlines()

llist = [[linedata[0:7],
  linedata[14:23],
  linedata[23:27]] for linedata in lines]

hlist = [[headerdata[0:7],
  headerdata[11:19],
  headerdata[19:28]] for headerdata in headers]

ldict = [{linedata[0:7]:
  [linedata[14:23],
   linedata[23:27]]} for linedata in lines]

hdict = [{headerdata[0:7]:
  [headerdata[11:19],
   headerdata[19:28]]} for headerdata in headers]

# :)

quit()



Details on the data are that it's a one or many lines to one header
relationship, at least one of each will exist in each file, and
performance probably isn't an issue as it will only be a few tens to
about 100 lines maximum in the lines file.  The match string will be the
0:7 slice.

You can probably guess my questions: should I be making lists or
dictionaries out of this data, and then of course, what should I do with
them to arrive at the combined file?  I saw some examples of joining two
two-item lists, or dictionaries with a single string as the value, but I
couldn't seem to adapt them to what I'm doing here. I also ran across
the dict.extend method, but looking at the result, I didn't think that
was going to go anywhere, particularly with the one to many
headers:lines relationship.

After a while I pulled this into a sqlite file in memory and did the
join.  Using writelines I think I'll be able to get it out to a file,
but it seems to me that there's probably a way to do this without
resorting to sql.  Or is there?
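To answer the lists-or-dicts question with a hedged sketch: since each header key is unique and every line belongs to exactly one header, a plain dict mapping the 0:7 key to the header fields gives the join in a single pass over the lines, much like an SQL inner join. The tuples below stand in for the sliced `llist`/`hlist` rows, using a subset of the sample values with the padding removed; `join_lines` is a hypothetical name.

```python
# Header rows: (order_key, name, reference); line rows: (order_key, item, qty)
hlist = [('B134542', 'Bob', 'ZQ775235'),
         ('B567943', 'Steve', 'ZQ256222')]
llist = [('B134542', '112342', '0012'),
         ('B134542', '176542', '0001'),
         ('B567943', '654565', '0001'),
         ('B567943', '900011', '0001')]

def join_lines(llist, hlist):
    """Inner join: attach the matching header fields to every line row."""
    headers = {key: (name, ref) for key, name, ref in hlist}  # one entry per header
    joined = []
    for key, item, qty in llist:
        if key in headers:             # skip lines with no header, as an inner join would
            joined.append((key, item, qty) + headers[key])
    return joined

for row in join_lines(llist, hlist):
    print('\t'.join(row))
```

For fixed-width output like hl.dat, each row could be formatted with `str.ljust()` or a format string such as `'{:<10}'.format(...)`, ending in `'\n'`, before handing the strings to `writelines()`.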

Thanks!


[Tutor] removing xml elements with ElementTree

2019-03-20 Thread street . sweeper
An opportunity to work in Python, and the necessity of working with some XML 
too large to visualize, got me thinking about an answer Alan Gauld had written 
to me a few years ago 
(https://mail.python.org/pipermail/tutor/2015-June/105810.html).  I have 
applied that information in this script, but I have another question :)

Let's say I have an xml file like this:

-- order.xml 


Bob
321 Main St


D20
4


CS211
1


BL5
7


AC400
1




-- end order.xml 

Items CS211 and AC400 are not valid items, and I want to remove their 
<salesline> nodes.  I came up with the following (python 3.6.7 on linux):

-- xml_delete_test.py --

import os
import xml.etree.ElementTree as ET

hd = os.path.expanduser('~')
inputxml = os.path.join(hd,'order.xml')
outputxml = os.path.join(hd,'fixed_order.xml')

valid_items = ['D20','BL5']

tree = ET.parse(inputxml)
root = tree.getroot()
saleslines = root.find('saleslines').findall('salesline')
for e in saleslines[:]:
    if e.find('item').text not in valid_items:
        saleslines.remove(e)

tree.write(outputxml)

-- end xml_delete_test.py --

The above code runs without error, but it simply writes the original file back to disk.
The desired output would be:

-- fixed_order.xml 


Bob
321 Main St


D20
4


BL5
7




-- end fixed_order.xml 

What I find particularly confusing about the problem is that after running 
xml_delete_test.py in the Idle editor, if I go over to the shell and type 
saleslines, I can see that it's now a list of two elements.  I run the 
following:

for i in saleslines:
    print(i.find('item').text)

and I see that it's D20 and BL5, my two valid items.  Yet when I write tree out 
to the disk, it has the original four.  Do I need to refresh tree somehow?
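A likely explanation: `findall()` returns a new Python list, so `saleslines.remove(e)` only takes the element out of that list, and the `<saleslines>` element inside the tree is untouched. That is why the shell shows two items while the written file still has four. Removing each child from its parent element changes the tree itself. The XML below is a cut-down stand-in for order.xml: `saleslines`, `salesline` and `item` come from the script, while `order` and `qty` are hypothetical tag names.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for order.xml; 'order' and 'qty' are hypothetical tag names.
SAMPLE = """<order>
  <saleslines>
    <salesline><item>D20</item><qty>4</qty></salesline>
    <salesline><item>CS211</item><qty>1</qty></salesline>
    <salesline><item>BL5</item><qty>7</qty></salesline>
    <salesline><item>AC400</item><qty>1</qty></salesline>
  </saleslines>
</order>"""

valid_items = {'D20', 'BL5'}

root = ET.fromstring(SAMPLE)
parent = root.find('saleslines')            # the element that owns the <salesline>s
for line in parent.findall('salesline'):    # findall() returns a separate list, safe to loop over
    if line.findtext('item') not in valid_items:
        parent.remove(line)                 # remove from the tree, not from a list copy

print([line.findtext('item') for line in parent])  # -> ['D20', 'BL5']
```

In the original script, the same change (calling `remove()` on `root.find('saleslines')` instead of on the list) should make `tree.write(outputxml)` write only the two valid lines.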

Thanks!