searching and storing large quantities of xml!

2010-01-16 Thread dads
I work in 1st line support and Python is one of my hobbies. We get
quite a few requests for XML from our website and it's a long,
strung-out process. So I thought I'd try and create a system that deals
with it, for fun.

I've been tidying up the archived XML and have been thinking about the
best way to approach this, as it took a long time to deal with big
quantities of XML. There are 5 or 6 years' worth, at 26000+ XML files of
5-20k each per year. The archived stuff is zipped, but which is better:
26000 files in one big zip file, 26000 files in one big zip file but in
folders for months and days, or zip files in zip files?
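
For what it's worth, a single zip can carry the month/day folders as
part of each member name, and one member can be read without extracting
the rest. A minimal sketch, written for a current Python 3 rather than
the Python 2 used in these posts; the member paths and XML content are
made up for illustration:

```python
import io
import zipfile

def read_member(archive, member):
    """Return the bytes of one member without unpacking the whole archive."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(member)

def members_for_day(archive, yearmonth, day):
    """List the member names stored under a yearmonth/day/ prefix."""
    prefix = '%s/%s/' % (yearmonth, day)
    with zipfile.ZipFile(archive) as zf:
        return [n for n in zf.namelist() if n.startswith(prefix)]

# Build a tiny in-memory archive with hypothetical names to try it out.
demo = io.BytesIO()
with zipfile.ZipFile(demo, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('201001/16/order_12345.xml', '<Order><WebNo>12345</WebNo></Order>')
    zf.writestr('201001/17/order_12346.xml', '<Order><WebNo>12346</WebNo></Order>')

print(members_for_day(demo, '201001', '16'))
print(read_member(demo, '201001/16/order_12345.xml'))
```

Whether one big zip or one zip per month is faster for the real archive
is something only a test on the real files would settle.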

I created an app in wxPython that searches the unzipped XML files by
modified date, opens them up and tests them with something like
l.find('>%s<' % fiveDigitNumber) != -1. Is this quicker than parsing
the XML?
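
A rough way to compare the two approaches is sketched below, for a
current Python 3. The sample document and the element names (WebNo,
Phone) are invented, so the timings only mean something once it is
pointed at real files. A plain substring test skips building a tree, but
it also matches the same digits in any other element:

```python
import timeit
import xml.etree.ElementTree as ET

# Made-up document; the real files and their element names will differ.
SAMPLE = ('<Order><Summary><FileType>Order</FileType></Summary>'
          '<OrderHeader><WebNo>12345</WebNo><Phone>54321</Phone>'
          '</OrderHeader></Order>')

def found_by_scan(text, number):
    """Substring test: true if >number< appears anywhere, in any element."""
    return ('>%s<' % number) in text

def found_by_parse(text, number, tag='OrderHeader/WebNo'):
    """Parse the XML and compare the text of one specific element."""
    return ET.fromstring(text).findtext(tag) == number

# Both find the web number.
print(found_by_scan(SAMPLE, '12345'), found_by_parse(SAMPLE, '12345'))
# Only the scan "finds" digits that belong to an unrelated element.
print(found_by_scan(SAMPLE, '54321'), found_by_parse(SAMPLE, '54321'))

# Time 10000 calls of each on the sample document.
for fn in (found_by_scan, found_by_parse):
    seconds = timeit.timeit(lambda: fn(SAMPLE, '12345'), number=10000)
    print(fn.__name__, seconds)
```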

Generally the requests are less than 3 months old, which got me
thinking: should I create a script that finds all the file names and
corresponding web numbers in the old XML and bungs them into a db, one
table for each year, plus another script that archives the XML at the
end of every day, zips it up after 3 months, bungs the info into the
table, etc.? Sorry for the ramble, I just want other people's opinions
on the matter. =)
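
The lookup-table idea could look something like the sketch below, using
sqlite3 from the standard library on a current Python 3. The table name,
column names and sample rows are assumptions made for illustration:

```python
import sqlite3

def build_index(con, rows):
    """Store (webno, filename, order_date) tuples and index them by webno."""
    con.execute('CREATE TABLE IF NOT EXISTS orders '
                '(webno TEXT, filename TEXT, order_date TEXT)')
    con.execute('CREATE INDEX IF NOT EXISTS idx_webno ON orders (webno)')
    con.executemany('INSERT INTO orders VALUES (?, ?, ?)', rows)
    con.commit()

def find_files(con, webno):
    """Return the archived file names recorded for a web number."""
    cur = con.execute('SELECT filename FROM orders WHERE webno = ?', (webno,))
    return [r[0] for r in cur]

# In-memory db for the demo; a real script would open a file on disk.
con = sqlite3.connect(':memory:')
build_index(con, [('12345', '201001/16/order_12345.xml', '2010-01-16'),
                  ('12346', '201001/17/order_12346.xml', '2010-01-17')])
print(find_files(con, '12345'))
```

With the index in place, a request only needs the one file the table
points at, rather than a scan of every file for the period.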
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: searching and storing large quantities of xml!

2010-01-18 Thread dads
Thanks all, I took your advice and have been playing all weekend, which
has been great fun. ElementTree is awesome. I created a script that
organises the XML, as it's in year blocks, and I hadn't realised the
required XML is mixed up with other XML. Plus the volumes are much
greater than I thought: I checked once I was back at work and it was
something like 600,000 files in a year, just over a gig for each
year.

I'm going to add zipping up of the files, and getting the required info
and putting it in a db, this week hopefully. The script has been
completely overhauled: originally I used the modified date, now it gets
the date from the parsed XML, which is safer. The code is below, but a
word of caution, it's hobbyist code so it'll probably make your eyes
bleed =), thanks again:

There was one thing that I forgot about: when ElementTree fails to
parse because an element isn't closed, why doesn't it close the
file-like object? Later on I would get 'WindowsError: [Error
32] ...file being used by other process' when using shutil.move(). I
got round this by using a 'try except' block.
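
One way to sidestep the question is to open the file yourself and hand
the file object to the parser, so that a with-block closes it whether or
not parsing succeeds. A minimal sketch for a current Python 3 (the helper
name parse_or_none is made up):

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def parse_or_none(path):
    """Parse an XML file, returning None if it is malformed.

    The file is opened here, so the with-block closes it even when the
    parser raises.
    """
    with open(path, 'rb') as fh:
        try:
            return ET.parse(fh)
        except ET.ParseError:
            return None

# Try it on a deliberately broken file (unclosed elements).
fd, bad = tempfile.mkstemp(suffix='.xml')
with os.fdopen(fd, 'w') as fh:
    fh.write('<Order><Summary>')

tree = parse_or_none(bad)
print(tree)

# The handle is closed by now, so moving the file should not be blocked.
os.rename(bad, bad + '.moved')
os.remove(bad + '.moved')
```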

from __future__ import print_function
import xml.etree.cElementTree as ET
import calendar
import zipfile
import os.path
import shutil
import zlib
import os


class Xmlorg(object):

    def __init__(self):

        self.cwd = os.getcwd()
        self.year = os.path.basename(self.cwd)

    def _mkMonthAndDaysDirs(self):

        ''' creates dirs for every month and day of a specified year.
        Works for leap years as well.

        (specified)year/(year)month/day


        ...2010/201001/01
        ...2010/201001/02
        ...2010/201001/03 '''


        def addZero(n):

            if len(str(n)) < 2:
                return '0' + str(n)
            else:
                return str(n)

        dim = [ calendar.monthrange(year,month)[1] for year in \
                [int(self.year)] for month in range(1,13) ]

        count = 1
        for n in dim:
            month = addZero(count)
            count += 1
            ym = os.path.join(self.cwd, self.year + month)
            os.mkdir(ym)
            for x in range(1,n+1):
                x = addZero(x)
                os.mkdir(os.path.join(ym, x))


    def ParseAndOrg(self):

        '''requires dir and zip struct:

        .../(year)/(year).zip - example .../2008/2008.zip '''


        def movef(fp1,fp2):

            '''moves files with exception handling'''

            try:
                shutil.move(fp1,fp2)
            except IOError, e:
                print(e)
            except WindowsError, e:
                print(e)

        self._mkMonthAndDaysDirs()
        os.mkdir(os.path.join(self.cwd, 'otherFileType'))

        # dir struct .../(year)/(year).zip - ex. .../2008/2008.zip
        zf = zipfile.ZipFile(os.path.join(self.cwd, self.year + '.zip'))
        zf.extractall()
        ld = os.listdir(self.cwd)
        for i in ld:
            if os.path.isfile(i) and i.endswith('.xml'):
                try:
                    tree = ET.parse(i)
                except:
                    print('%s np' % i) #not parsed
                    continue #skip it, there is no tree to read from
                root = tree.getroot()
                if root.findtext('Summary/FileType') == 'Order':
                    date = root.findtext('OrderHeader/OrderDate')[:10]
                    #dd/mm/
                    dc = date.split('/')
                    fp1 = os.path.join(self.cwd, i)
                    fp2 = os.path.join(self.cwd, dc[2] + dc[1], dc[0])
                    movef(fp1,fp2)
                else:
                    fp1 = os.path.join(self.cwd, i)
                    fp2 = os.path.join(self.cwd, 'otherFileType')
                    movef(fp1,fp2)


if __name__ == '__main__':
    os.chdir('c:/sv_zip_test/2010/') #remove
    xo = Xmlorg()
    xo.ParseAndOrg()


unexplainable python

2009-09-26 Thread dads
When creating a script that converts digits to words I've come across
some unexplainable Python. The script works fine until I use a 5 digit
number, when I get an 'IndexError: string index out of range'. After
looking into it and adding some print calls, it looks like a variable
changes for no reason. The example beneath uses the digits 34567:
the _5digit function slices 34 off and passes it to the _2digit
function, which works with 2 digit strings, but the IndexError is
still raised. Please accept my apologies for the explanation, I'm
finding it hard to put into words. Has anyone any idea why it's acting
the way it is?

enter number: 34567
_5digit function used
34 before sent to _2digit
34 slice when at _2digit function
34 before sent to plus_ten function
7 slice when at _2digit function
7 before sent to plus_ten function


from __future__ import print_function
import sys

class number(object):

    def __init__(self, number):

        #remove any preceding zero's
        num = int(number)
        self.num = str(num)
        self.num = number

        self.single = {'0':'zero','1':'one','2':'two','3':'three','4':'four',
                       '5':'five','6':'six','7':'seven','8':'eight','9':'nine'}
        self.teen = {'11':'eleven','12':'twelve','13':'thirteen',
                     '14':'fourteen','15':'fifteen','16':'sixteen',
                     '17':'seventeen','18':'eighteen','19':'nineteen'}
        self.plus_ten = {'10':'ten','20':'twenty','30':'thirty','40':'forty',
                         '50':'fifty','60':'sixty','70':'seventy',
                         '80':'eighty','90':'ninety'}
        self._translate()

    def _translate(self):

        fns = [ i for i in number.__dict__ if 'digit' in i ]
        fns.sort()
        fn_name = fns[len(self.num)-1]
        print(fn_name,'function used')
        fn = number.__dict__[fn_name]
        print(fn(self, self.num))


    def _1digit(self, n):

        return self.single[n]

    def _2digit(self, n):

        print(n, 'slice when at _2digit function')
        if '0' in self.num:
            return self.plus_ten[n]
        elif self.num[0] == '1':
            return self.teen[n]
        else:
            print(n,'before sent to plus_ten function')
            var = self.plus_ten[n[0]+'0'] + ' ' + self._1digit(n[1])
            return var

    def _3digit(self, n):

        var = self._1digit(n[0]) + ' hundred and ' + self._2digit(n[1:])
        return var

    def _4digit(self, n):

        var = self._1digit(n[0]) + ' thousand ' + self._3digit(n[1:])
        return var


    def _5digit(self, n):

        print(n[:2],'before sent to _2digit')
        var = self._2digit(n[:2]) + ' thousand ' + self._4digit(n[2:])
        return var

class control(object):

    def __init__(self):
        pass

    def data_input(self):


        while True:
            i = raw_input('enter number: ')
            if i == 's':
                break
            #try:
            n = number(i)
            #except:
                #print('not a number')


if __name__ == '__main__':
    c = control()
    c.data_input()


Re: unexplainable python

2009-09-26 Thread dads
Sorry forgot to mention I'm using python 2.6


Re: unexplainable python

2009-09-27 Thread dads
Thank you for the help, it's amazing what you can't spot. It seems the
harder you look the less likely you are to find the issue. Fresh eyes
make a world of difference.

To Matt and John:

No this certainly isn't homework, I'm 29 and in full time work. I
decided to learn to program about a year ago and picked up python, so
it's one of my hobbies. Starting from level 0 it's been challenging
and fun.

This exercise was just a bit of fun, I got the idea from a forum. I'm
using classes to help me solidify how they work. Unfortunately I don't
have the experience to know that this is a bad place to use them.
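
For anyone reading the archive: the replies are not reproduced here, so
what follows is a reading of the posted code rather than a record of
what was suggested. _5digit hands the last three digits to _4digit,
which passes two of them to _3digit, which passes a single character to
_2digit, where n[1] fails. _2digit also tests self.num where it probably
meant its argument n. A minimal sketch with both changed, written as
plain functions for a current Python 3; zeros inside longer numbers are
deliberately left unhandled:

```python
SINGLE = {'0': 'zero', '1': 'one', '2': 'two', '3': 'three', '4': 'four',
          '5': 'five', '6': 'six', '7': 'seven', '8': 'eight', '9': 'nine'}
TEEN = {'11': 'eleven', '12': 'twelve', '13': 'thirteen',
        '14': 'fourteen', '15': 'fifteen', '16': 'sixteen',
        '17': 'seventeen', '18': 'eighteen', '19': 'nineteen'}
PLUS_TEN = {'10': 'ten', '20': 'twenty', '30': 'thirty', '40': 'forty',
            '50': 'fifty', '60': 'sixty', '70': 'seventy',
            '80': 'eighty', '90': 'ninety'}

def two_digit(n):
    # Decide from the two characters passed in, not from the whole number.
    if n[1] == '0':
        return PLUS_TEN[n]
    if n[0] == '1':
        return TEEN[n]
    return PLUS_TEN[n[0] + '0'] + ' ' + SINGLE[n[1]]

def three_digit(n):
    return SINGLE[n[0]] + ' hundred and ' + two_digit(n[1:])

def five_digit(n):
    # The last three digits go to the three-digit handler.
    return two_digit(n[:2]) + ' thousand ' + three_digit(n[2:])

print(five_digit('34567'))
```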



sqlite3 bug?

2010-02-10 Thread dads
When the method below is run it raises 'sqlite3.OperationalError: no
such table: dave'. The arguments are ds = a datestamp and w = a string
of digits. The path of the db is C:\sv_zip_test\2006\2006.db and the
table is definitely named dave. I've run the SQL in SQL manager and it
works. Is this a bug?


def findArchive(self, ds, w):

    year = ds.GetYear()
    if year < 2005:
        wx.MessageBox('Year out of Archive, check the date!')
        return

    year = str(year)
    archive = 'C:/sv_zip_test'
    dbfp = os.path.abspath(os.path.join(archive, year, year + '.db'))
    if os.path.exists(dbfp):
        con = sqlite3.connect('dbfp')
        cur = con.cursor()
        #cur.execute("SELECT * FROM dave WHERE webno = ?", [w])
        cur.execute("SELECT * FROM dave")
        for r in cur:
            self.fil.AppendText(r[2] + '\n')
    else:
        wx.MessageBox('no db, %s' % dbfp)


Re: sqlite3 bug?

2010-02-12 Thread dads
Thank you

It just highlights that when you're tired things can easily be missed,
and maybe you should leave things until the morning to view them with
fresh eyes =)
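
For the record, a likely reading of the posted method (the reply itself
is not reproduced here): sqlite3.connect('dbfp') is given the literal
string 'dbfp' rather than the variable dbfp, so SQLite opens a separate,
empty database of that name in the working directory, and that one has
no table dave. A small sketch that shows the effect in a temporary
directory, for a current Python 3; the table columns and the helper
table_names are made up:

```python
import os
import sqlite3
import tempfile

def table_names(path):
    """Return the names of the tables in the database at path."""
    con = sqlite3.connect(path)
    try:
        cur = con.execute("SELECT name FROM sqlite_master WHERE type='table'")
        return [r[0] for r in cur]
    finally:
        con.close()

workdir = tempfile.mkdtemp()

# The real database, with the table in it.
dbfp = os.path.join(workdir, '2006.db')
con = sqlite3.connect(dbfp)
con.execute('CREATE TABLE dave (id INTEGER, webno TEXT, filename TEXT)')
con.commit()
con.close()

# Stands in for the quoted literal: a different path that holds no tables.
wrong = os.path.join(workdir, 'dbfp')

print(table_names(dbfp))    # connecting via the variable
print(table_names(wrong))   # connecting via the wrong name
```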


filecmp.dircmp performance

2011-01-08 Thread dads
I'm creating a one-way sync program to automate backing up data over
the WAN from our shops to a server at head office. It uses
filecmp.dircmp() but the performance seems poor to me.

for x in dc.diff_files:
    srcfp = os.path.join(src, x)
    self.fn777(srcfp)
    if os.path.isfile(srcfp):
        try:
            shutil.copy2(srcfp, dst)
            self.lg.add_diffiles(src, x)
        except Exception, e:
            self.lg.add_errors(e)

I tested it at a store which is only around 50 miles away on a 10Mbps
line; the directory has 59 files that are under 100KB. When it gets to
dc.diff_files it takes 15 minutes to complete. Looking at filecmp.py,
it's only using os.stat, so that seems excessively long.
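
One possible factor, offered as an assumption rather than a diagnosis:
filecmp's default shallow comparison treats two files as equal when
their os.stat signatures (type, size, modification time) match, but when
the sizes match and the times differ it falls back to reading both files
in full, which over a WAN link means pulling the remote copy across. A
sketch of a comparison that never reads file contents, for a current
Python 3; the function name and the 2 second tolerance are illustrative
choices and not part of filecmp:

```python
import os
import tempfile

def stat_diff(src, dst, names, tolerance=2.0):
    """Return the names whose size or mtime differs between src and dst.

    Only os.stat is used, so no file contents are read.
    """
    changed = []
    for name in names:
        a = os.stat(os.path.join(src, name))
        b = os.stat(os.path.join(dst, name))
        if a.st_size != b.st_size or abs(a.st_mtime - b.st_mtime) > tolerance:
            changed.append(name)
    return changed

# Small local demonstration with two temporary directories.
src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
for folder, text in ((src, 'new contents'), (dst, 'old')):
    with open(os.path.join(folder, 'a.txt'), 'w') as fh:
        fh.write(text)
for folder in (src, dst):
    path = os.path.join(folder, 'b.txt')
    with open(path, 'w') as fh:
        fh.write('same')
    os.utime(path, (1000000000, 1000000000))  # identical timestamps

print(stat_diff(src, dst, ['a.txt', 'b.txt']))
```

The trade-off is that a file changed without its size or timestamp
moving would be missed.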

code:
http://pastebin.com/QskXGDQT


Re: Good books in computer science?

2009-06-14 Thread dads
I'm wanting to purchase some of the titles that have been raised in
this thread. When I look they are very expensive books which is
understandable. Do you think getting earlier editions that are cheaper
is a daft thing or should I fork out the extra £10-£30 to get the
latest edition?



Re: Good books in computer science?

2009-06-15 Thread dads
I remember someone earlier in the thread mentioning reading source
code from good coders. I've been wanting to give this a go as it makes
perfect sense, I suppose the standard library would be a good start.
What would your recommendations be? Something not too hard, so that I
can still understand it.


File Syncing

2009-06-19 Thread dads
I've created a small application that, when you click one of the
buttons, randomly picks a paragraph from a list that it generates from
a text file and copies it to the clipboard. It also has make
new/edit/delete/print etc. functionality.

It's for work, so I get some brownie points and every now and then I
could work on it and learn Python while getting paid (heaven) instead
of doing my normal customer service job (mind, I've done 95% of it at
home). I've been allowed to install it on one of the blade servers so
one of the team can use it if they connect to that server. Great stuff.

When we connect through one of the thin clients we are normally
connected at random to one of three blade servers. I've just realised
that when I add the app to the other servers the copies will be
completely separate. So if the paragraphs, which are stored in text
files, are amended/deleted/created, the change will only happen on one
server and not on them all. I've a couple of questions:

What would happen if more than one person used my application at the
same time? I haven't added any I/O exception code so I think that
would be an issue, but would Python crash? (It's only got simple
functions and controls in it, no threading or process code or anything
like that. I'd post it but it's 2500 lines long.)
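
On the first question, an unhandled I/O error shows up as a Python
exception with a traceback rather than a crash of the interpreter. The
bigger risk with two people saving at once is one write landing on top
of another, or a reader catching a half-written file. A minimal sketch
of one common precaution, writing to a temporary file and swapping it
into place, for a current Python 3; the function names and file name are
hypothetical, and this is not a lock, so the last save still wins:

```python
import os
import tempfile

def save_paragraphs(path, paragraphs):
    """Write to a temp file, then replace the target in one step, so a
    reader never sees a half-written file."""
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=folder, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as fh:
            fh.write('\n\n'.join(paragraphs))
        os.replace(tmp, path)
    except OSError:
        # Clean up the temp file, then let the caller see the error.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise

def load_paragraphs(path):
    """Return the paragraphs, or an empty list if the file can't be read."""
    try:
        with open(path) as fh:
            return [p for p in fh.read().split('\n\n') if p.strip()]
    except OSError:
        return []

target = os.path.join(tempfile.mkdtemp(), 'paragraphs.txt')
save_paragraphs(target, ['First paragraph.', 'Second paragraph.'])
print(load_paragraphs(target))
```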

What would I have to learn to be able to sync the text files on each
server? python network programming? Or something else? Sorry for my
naivety =p


Re: File Syncing

2009-06-21 Thread dads
On Jun 20, 11:21 am, Lawrence D'Oliveiro wrote:
> dads wrote:
> > What would I have to learn to be able to sync the text files on each
> > server?
>
> How big is the text file? If it's small, why not have your script read it
> directly from a master server every time it runs, instead of having a local
> copy.

Yeah, the text files will never get bigger than 100k. I don't think
they have a master server but I'll check. Thanks
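
A sketch of what that suggestion might look like, assuming the master
copy sits on a path every server can reach (a shared folder, say). It is
written for a current Python 3, and the function name and both paths are
hypothetical:

```python
import os
import tempfile

def read_paragraph_file(master_path, local_path):
    """Prefer the shared master copy; fall back to the local copy if the
    master can't be read. Refresh the local copy on success."""
    try:
        with open(master_path) as fh:
            text = fh.read()
    except OSError:
        with open(local_path) as fh:
            return fh.read()
    with open(local_path, 'w') as fh:
        fh.write(text)
    return text

# Demonstration with temporary files standing in for the two copies.
base = tempfile.mkdtemp()
master = os.path.join(base, 'master.txt')
local = os.path.join(base, 'local.txt')
with open(local, 'w') as fh:
    fh.write('old paragraph')

print(read_paragraph_file(master, local))   # no master yet: local copy used
with open(master, 'w') as fh:
    fh.write('new paragraph')
print(read_paragraph_file(master, local))   # master found: local refreshed
```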



Re: Re: sqlite3 bug?

2010-02-11 Thread wayne . dads . bell

Thank you

It just highlights that when you're tired things can easily be missed,
and maybe you should leave things until the morning to view them with
fresh eyes =)