Re: [Tutor] Tutor Digest, Vol 104, Issue 60

2012-10-16 Thread Osemeka Osuagwu
On Sun, 14 Oct 2012 16:21:38 +0100 Alan Gauld wrote:
>
>> In the code below, I can't seem to get around calling an instance
>> method (__reality_check()) from within a class method (update_grid()),
>
> Of course you can't :)
>
> Since the class method has no access to the instance, it cannot call
> instance methods, since it doesn't know which instance to use!
>
> Instead, update_grid needs to be an instance method. Of course it
> does! It needs to update a specific grid, right? That will be the
> instance.

All of a sudden it's clear as day!

>
> Class methods are generally used for alternate constructor functions, e.g.
> dict.fromkeys. Since it makes no difference whether you call fromkeys
> from the class (dict) or from an instance, it is made a classmethod.

...this makes a whole lot of sense now.

>>
>> class Grid:
>>     '''Grid(rows = 26, columns = 26) ->  Provides a world object
>>     for the cells in Game of Life'''
>>     from time import sleep
>>     from os import system
>>     from sys import platform
>
> Move the imports outside of the class. They should go at the start of
> the module.
>
> As written, your class has attributes Grid.sleep, Grid.system, etc. Since
> these attributes have nothing to do with Grids, they don't belong in the
> Grid class. Move them out.
>
> Classes are not catch-all collections of unrelated methods. That's
> what modules are for :)

I see, thanks.


>>  @staticmethod
>>  def display_grid(generation, os_type):
>>      try:
>>          (lambda x: Grid.system('cls') if 'win' in os_type
>>              else Grid.system('clear'))(1)
>
>
> That's wrong! Your display_grid method uses the class (Grid), so it
> shouldn't be a static method.
>
> Move the imports to the module level, and display_grid as a regular
> function:
>
> def display_grid(generation, array):
>     clear_screen()
>     print 'Now in Generation %d' % generation
>     for line in array:
>         print line
>
> def clear_screen(platform=sys.platform):
>     # The default here is set once, and defaults to your platform.
>     if 'win' in platform:
>         command = 'cls'
>     else:
>         command = 'clear'
>     os.system(command)
>
>
> Or leave display_grid as an instance method:
>
>
> class Grid:
>     def display_grid(self, generation):
>         clear_screen()
>         print 'Now in Generation %d' % generation
>         for line in self._array:
>             print line
>
>
> Either way, clear_screen has nothing to do with grids, so it should
> not be a method of Grid.

After looking at the second option, I thought: of course! After all,
it's "class Grid", so a "display_grid" instance method makes sense.
I get the logic of it now.

>>  except:
>>  print 'Cannot display Grid'
>
> Never use a bare except like that. Bare except should *only* be used
> at the interactive interpreter, for lazy people who want to avoid
> typing. It should never be used non-interactively, it's simply poor
> practice.
>
> The problem with bare excepts is that they cover up too much. Your aim
> is to write code that works. If it has a bug, you need to *see* the bug
> so you can identify it and then fix it. Bare excepts make bugs
> invisible.
>

I admit I used that to avoid dealing with 'exception tantrums' during
debugging, so that the program wouldn't halt if something went wrong
at that point. It was going away anyway. I'll avoid it henceforth.

>>  @staticmethod
>>  def extend_grid(thickness=4, force_extend=False):
>>      '''extend_grid([thickness][, force_extend]) -->
>>      Extends the edges of the grid by 4 units in all directions
>>      if the boundary of the simulation comes close to the edge.
>>      Extends anyway if force_extend option is True'''
>
> Why is this a static method?
>
> Which grid is this operating on? It should either take a grid as an
> argument, or extend_grid should be an instance method that operates
> on itself:
>
> # either this top-level function
> def extend_grid(grid, thickness=4, force=False):
>     # do stuff to the grid to extend it
>     # and then return it
>     return grid
>
>
> # or make this a regular instance method
> class Grid:
>     def extend_grid(self, thickness=4, force=False):
>         # do stuff to self to extend it
>         # no need to return anything

Just to clarify, if I went with the top level function; then I guess
I'll define a 'name' attribute for each individual grid and then pass
that name in place of the 'grid' argument in your example. Is this
correct?

>>  '''Runs the 'Life' World simulation for the specified
>> number of generations at the given speed'''
>>  #control = input
>>  #while
>>  for times in range(generations + 1):
>
> This contains an off-by-one bug. If you pass generations=100 as argument,
> it will run 101 times instead.
>
> Instead, if you want your generations to run exactly that many times,
> use range(generations).
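A minimal sketch of the off-by-one, in plain Python and independent of the
Game of Life code:

```python
# range(n + 1) yields n + 1 values, so the loop body runs one extra time.
generations = 100

runs_buggy = len(list(range(generations + 1)))  # iterations with the bug
runs_fixed = len(list(range(generations)))      # iterations after the fix

print(runs_buggy)  # 101
print(runs_fixed)  # 100
```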

Re: [Tutor] Tutor Digest, Vol 104, Issue 60

2012-10-16 Thread Alan Gauld

On 16/10/12 08:53, Osemeka Osuagwu wrote:


# or make this a regular instance method
class Grid:
    def extend_grid(self, thickness=4, force=False):
        # do stuff to self to extend it
        # no need to return anything


Just to clarify, if I went with the top level function; then I guess
I'll define a 'name' attribute for each individual grid and then pass
that name in place of the 'grid' argument in your example. Is this
correct?


Yes, but then it wouldn't be OOP. You'd be back in the world of
traditional procedural programming, passing explicit data references
around. It's much better to make it an instance method, with the self
reference passed implicitly.



  def edit_cell(self, cells, state = '##'):
      cells = cells
      for eachcell in cells:
          Grid.__array[eachcell[0]][eachcell[1]] = state
      return

...
As you have written this, you cannot have two Grids. Actually you can, but
since they both share the same state, you cannot have two DIFFERENT Grids.


I don't quite get this one; I intended state to be just the value of
the particular cell in the concerned instance. I don't think it gets
shared by all the grids.


Notice the line:

Grid.__array[] = state

The array is shared by all instances because it's an attribute of the
class, not the instance. You want every instance to have its own
__array of cells, i.e. self.__array.


Incidentally, the cells = cells line is pointless; it does nothing.


--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Tutor Digest, Vol 104, Issue 60

2012-10-16 Thread eryksun
On Tue, Oct 16, 2012 at 3:53 AM, Osemeka Osuagwu  wrote:
> On Sun, 14 Oct 2012 16:21:38 +0100 Alan Gauld wrote:
>
> Just to clarify, if I went with the top level function; then I guess
> I'll define a 'name' attribute for each individual grid and then pass
> that name in place of the 'grid' argument in your example. Is this
> correct?

No, you'd just pass a grid object, but you probably want an instance
method, anyway. Perhaps some background can clarify.

Here's a function and a class that have no relation:

>>> def f(grid): pass
>>> class Test(object): pass

The __get__ method-wrapper of the function can bind it as a method of
a Test object:

>>> f.__get__(Test(), Test)
<bound method Test.f of <__main__.Test object at 0x...>>

If f is accessed as an attribute of Test, it behaves the same as above:

>>> Test.f = f
>>> Test().f
<bound method Test.f of <__main__.Test object at 0x...>>

The first argument of a function, when the function is intended to be
a method, is conventionally called "self". The bound method created by
__get__ automatically supplies this argument for you. If you instead
make the function global (module level), it's best to use a name that
suggests the kind of object you expect as the first argument (e.g.
grid). But really, if it expects to work on a particular grid, you
should make it a method of Grid.

Other kinds of methods:

If the instance is None, __get__ returns an unbound method:

>>> f.__get__(None, Test)
<unbound method Test.f>

A classmethod binds to the class instead of an instance:

>>> classmethod(f).__get__(None, Test)
<bound method type.f of <class '__main__.Test'>>

>>> classmethod(f).__get__(Test(), Test)
<bound method type.f of <class '__main__.Test'>>

When a function is intended to be a classmethod, it's common to name
the first argument "cls".

Finally, a staticmethod has no binding at all. Its __get__
method-wrapper simply returns the wrapped function:

>>> staticmethod(f).__get__(None, Test)
<function f at 0x...>

>>> staticmethod(f).__get__(Test(), Test)
<function f at 0x...>

There's not much use for this.

Refer to the Descriptor HowTo Guide for more details:

http://docs.python.org/howto/descriptor.html


> I don't quite get this one; I intended state to be just the value of
> the particular cell in the concerned instance. I don't think it gets
> shared by all the grids. The edit method was meant as a way of seeding
> the initial cell configuration, so it just writes whatever it is
> passed in the 'state' arg over that cell location; then again, I'm
> probably missing something here...

Grid.__array is used throughout the class definition. This is a class
attribute. Instances will not get their own copy. Instead, create the
list of lists in Grid.__init__:

def __init__(self, rows=26, cols=26):
    '''Create a rows x cols grid'''
    self._array = [[''] * cols for i in range(rows)]
    print 'A %d x %d cell World has been Created' % (rows, cols)

Note that this uses a list comprehension to create each row as a
unique list. If instead it multiplied a list literal by a number, each
row would be the *same* list. For example:

>>> rows = [[''] * 2] * 2
>>> rows[0][0] = '##'
>>> rows
[['##', ''], ['##', '']]

Also note that it uses _array with a single leading underscore. This
is the convention for 'private' data or methods. It signals to other
developers that this is not part of the public API. Using a leading
double underscore causes the compiler to use name mangling in order to
prevent a subclass from easily overriding an attribute; it's used
infrequently.
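A small sketch of the mangling (hypothetical Grid, showing the stored
attribute name):

```python
# A double-underscore name inside a class body is compiled to
# _ClassName__name, so it doesn't clash with subclass attributes.
class Grid(object):
    def __init__(self):
        self.__array = [[]]          # stored as self._Grid__array

g = Grid()
print('_Grid__array' in vars(g))     # True
print(hasattr(g, '__array'))         # False -- only the mangled name exists
```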

Here's how the edit_cell method changes now that _array is instance data:

def edit_cell(self, cells, state='##'):
    '''cells is a list of tuples containing coordinates of
    the cells to edit
    '''
    for row, col in cells:
        self._array[row][col] = state
    return


Re: [Tutor] extract uri from beautiful soup string

2012-10-16 Thread Norman Khine
On Tue, Oct 16, 2012 at 6:40 AM, eryksun  wrote:
> On Mon, Oct 15, 2012 at 1:17 PM, Norman Khine  wrote:
>>
>> i made an update: https://gist.github.com/3891927 which works based on
>> some of the recommendations posted here.
>>
>> any suggestions for improvement?
>
> I can't make any specific recommendations about using BeautifulSoup
> since I haven't used it in a long time. Here are a few general
> suggestions.
>
> PEP 8 recommends using 4-space indents instead of tabs, if possible.
>
> Move constant definitions out of loops. For example, there's no point
> to repeatedly building the dict literal and assigning it to "headers"
> for each page_no. Also, "headers" isn't used. You need to make a
> Request object:
>
> http://docs.python.org/library/urllib2#urllib2.Request
>
>
> Don't put unnecessary parentheses (round brackets) around assigned
> expressions. For example, replace
>
> req = ('%s%s' % (origin_site, page_no))
> ...
> assoc_link = (tag.string)
>
> with
>
> req = '%s%s' % (origin_site, page_no)
> ...
> assoc_link = tag.string
>
> I'd also move the "%s" into origin_site:
>
> origin_site = ('http://typo3.nimes.fr/index.php?'
>'id=annuaire_assos&theme=0&rech=&num_page=%s')
>
> req = origin_site % page_no
>
> The parentheses around the string literal are necessary because it's
> split over two lines. (I use Gmail's webmail interface, which line
> wraps at 69 columns -- sometimes; it tries to be clever.)
>
> Use "continue" to avoid unnecessary nesting:
>
> for page_no in pages:
>     ...
>     try:
>         doc = urllib2.urlopen(req)
>     except urllib2.URLError, e:
>         continue
>     soup = BeautifulSoup.BeautifulSoup(doc)
>     for row in soup.findAll('tr', {'class': 'menu2'}):
>         assoc_data = []
>         ...
>         for i in soup.findAll('a', {'href': '#'}):
>             if 'associations' not in i.attrMap['onclick']:
>                 continue
>             req = i.attrMap['onclick'].split("'")[1]
>             try:
>                 doc = urllib2.urlopen(req)
>             except urllib2.URLError, e:
>                 continue
>             soup = BeautifulSoup.BeautifulSoup(doc)
>
>
> Don't test "if emails != []". Use "if emails" instead. A non-empty
> list is always True, and an empty list is always False.
>
> Finally, in several places you create an empty list and populate it
> with a for loop. For simple cases like these, using comprehensions and
> generator expressions executes faster, yet it's still easy to code and
> easy to understand. For example, you can replace the following:
>
> assoc_data = []
> for assoc_theme in soup.findAll('u'):
>     assoc_data.append(assoc_theme.renderContents())
> for assoc_name in soup.findAll('td', {'width': '70%'}):
>     assoc_data.append(assoc_name.renderContents())
>
> with something like this:
>
> assoc_data = [assoc_theme.renderContents()
>               for assoc_theme in soup.findAll('u')]
> assoc_data.extend(assoc_name.renderContents()
>                   for assoc_name in soup.findAll('td', {'width': '70%'}))
>
> At least to me this is just as readable as the original, and the
> generated code is more efficient. If, however, BeautifulSoup is the
> limiting factor here, the efficiency gain will be chump change in the
> big picture. Still, in a simple case like this it's hard to argue
> against using a comprehension or generator expression. In cases that
> use complex expressions for the value and conditions, I think it makes
> more sense to use a for loop, which is easier to read and debug.

thanks, i made the changes https://gist.github.com/3891927

-- 
%>>> "".join( [ {'*':'@','^':'.'}.get(c,None) or
chr(97+(ord(c)-83)%26) for c in ",adym,*)&uzq^zqf" ] )


Re: [Tutor] extract uri from beautiful soup string

2012-10-16 Thread eryksun
On Tue, Oct 16, 2012 at 7:52 AM, Norman Khine  wrote:
>
> thanks, i made the changes https://gist.github.com/3891927

On line 67, use the result of soup.findAll directly:

assoc_data.extend(assoc_cont.renderContents() for assoc_cont in
                  soup.findAll('td', {'width': '49%', 'class': 'menu2'}))

On line 72, can't the result of findAll() be subscripted, if you only
want the first item? For example:

assoc_tel = soup.findAll('td', {'width': '45%', 'class': 'menu2'})
assoc_data.append(assoc_tel[0].renderContents())

On line 80 you can use writerows instead:

with open('nimes_assoc.csv', 'wb') as f:
    csv.writer(f).writerows(assoc_table)


[Tutor] managing memory large dictionaries in python

2012-10-16 Thread Abhishek Pratap
Hi Guys

For my problem I need to store 400-800 million 20-character keys in a
dictionary and do counting. This data structure takes about 60-100 GB
of RAM.
I am wondering if there are slick ways to map the dictionary to a file
on disk and not store it in memory but still access it as dictionary
object. Speed is not the main concern in this problem and persistence
is not needed as the counting will only be done once on the data. We
want the script to run on smaller memory machines if possible.

I did think about databases for this but intuitively it looks like
overkill, because for each key you have to first check whether it is
already present and increase the count by 1, and if not then insert
the key into the database.

Just want to take your opinion on this.

Thanks!
-Abhi


Re: [Tutor] managing memory large dictionaries in python

2012-10-16 Thread Prasad, Ramit
Abhishek Pratap wrote:
> Sent: Tuesday, October 16, 2012 11:57 AM
> To: tutor@python.org
> Subject: [Tutor] managing memory large dictionaries in python
> 
> Hi Guys
> 
> For my problem I need to store 400-800 million 20 characters keys in a
> dictionary and do counting. This data structure takes about 60-100 Gb
> of RAM.
> I am wondering if there are slick ways to map the dictionary to a file
> on disk and not store it in memory but still access it as dictionary
> object. Speed is not the main concern in this problem and persistence
> is not needed as the counting will only be done once on the data. We
> want the script to run on smaller memory machines if possible.
> 
> I did think about databases for this but intuitively it looks like a
> overkill coz for each key you have to first check whether it is
> already present and increase the count by 1  and if not then insert
> the key into dbase.
> 
> Just want to take your opinion on this.
> 
> Thanks!
> -Abhi

I do not think that a database would be overkill for this type of task.
Your process may be trivial but the amount of data it has to manage is not
trivial. You can use a simple database like SQLite. Otherwise, you
could create a file for each key and update the count in there. It will
run on a small amount of memory but will be slower than using a db.

# Pseudocode
key = get_key()
filename = os.path.join(directory, key)
if os.path.exists(filename):
    # read the current count and write it back incremented
    with open(filename) as f:
        count = int(f.read())
    with open(filename, 'w') as f:
        f.write(str(count + 1))
else:
    with open(filename, 'w') as f:
        f.write('1')

Given that SQLite is included in Python and is easy to use, I would just
use that.
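A minimal sketch of SQLite-based counting (table and column names are made
up; an in-memory database is used here for illustration, a real run would
pass a file path so the data lives on disk). The UPDATE-then-INSERT pattern
avoids needing any UPSERT support:

```python
# Count keys in SQLite instead of an in-memory dict.
import sqlite3

conn = sqlite3.connect(':memory:')   # use a filename for on-disk counting
conn.execute('CREATE TABLE IF NOT EXISTS counts '
             '(key TEXT PRIMARY KEY, n INTEGER NOT NULL)')

def add_key(key):
    # Try to bump an existing count; if no row was touched, insert it.
    cur = conn.execute('UPDATE counts SET n = n + 1 WHERE key = ?', (key,))
    if cur.rowcount == 0:
        conn.execute('INSERT INTO counts (key, n) VALUES (?, 1)', (key,))

for k in ['a', 'b', 'a']:            # stand-in for the real key stream
    add_key(k)
conn.commit()
print(sorted(conn.execute('SELECT key, n FROM counts')))  # [('a', 2), ('b', 1)]
```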


-Ramit




Re: [Tutor] managing memory large dictionaries in python

2012-10-16 Thread emile

On 10/16/2012 01:03 PM, Prasad, Ramit wrote:

Abhishek Pratap wrote:

Sent: Tuesday, October 16, 2012 11:57 AM
To: tutor@python.org
Subject: [Tutor] managing memory large dictionaries in python

Hi Guys

For my problem I need to store 400-800 million 20 characters keys in a
dictionary and do counting. This data structure takes about 60-100 Gb
of RAM.
I am wondering if there are slick ways to map the dictionary to a file
on disk and not store it in memory but still access it as dictionary
object. Speed is not the main concern in this problem and persistence
is not needed as the counting will only be done once on the data. We
want the script to run on smaller memory machines if possible.

I did think about databases for this but intuitively it looks like a
overkill coz for each key you have to first check whether it is
already present and increase the count by 1  and if not then insert
the key into dbase.

Just want to take your opinion on this.

Thanks!
-Abhi


I do not think that a database would be overkill for this type of task.


Agreed.


Your process may be trivial but the amount of data it has to manage is not
trivial. You can use a simple database like SQLite. Otherwise, you
could create a file for each key and update the count in there. It will
run on a small amount of memory but will be slower than using a db.


Well, maybe -- depends on how many unique entries exist.  Most vanilla
systems are going to crash (or give the appearance thereof) if you end
up with millions of file entries in a directory.  If a filesystem based
answer is sought, I'd consider generating a 16-bit CRC per key,
appending each key to the file named for its CRC, then processing those
files one at a time: sort each and do the final counting.
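A sketch of that bucketing idea (names hypothetical; the phase-1 file I/O
is only indicated in a comment, and phase 2 counts one bucket at a time so
memory stays bounded):

```python
# Hash each key into one of 2**16 bucket files, then count each
# (much smaller) bucket separately.
import zlib
from collections import Counter

def bucket_of(key, bits=16):
    # zlib.crc32 returns a 32-bit checksum; keep only the low 16 bits.
    return zlib.crc32(key.encode()) & ((1 << bits) - 1)

# Phase 1 (sketched): append each key to the file 'bucket_%04x' % bucket_of(key).
# Phase 2: count the lines of one bucket file at a time.
def count_bucket(lines):
    return Counter(line.strip() for line in lines)

print(count_bucket(['aaa', 'bbb', 'aaa']))   # Counter({'aaa': 2, 'bbb': 1})
```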


Emile



Re: [Tutor] managing memory large dictionaries in python

2012-10-16 Thread Oscar Benjamin
On 16 October 2012 21:03, Prasad, Ramit  wrote:
> Abhishek Pratap wrote:
>> Sent: Tuesday, October 16, 2012 11:57 AM
>> To: tutor@python.org
>> Subject: [Tutor] managing memory large dictionaries in python
>>
>> Hi Guys
>>
>> For my problem I need to store 400-800 million 20 characters keys in a
>> dictionary and do counting. This data structure takes about 60-100 Gb
>> of RAM.
>> I am wondering if there are slick ways to map the dictionary to a file
>> on disk and not store it in memory but still access it as dictionary
>> object. Speed is not the main concern in this problem and persistence
>> is not needed as the counting will only be done once on the data. We
>> want the script to run on smaller memory machines if possible.
>>
>> I did think about databases for this but intuitively it looks like a
>> overkill coz for each key you have to first check whether it is
>> already present and increase the count by 1  and if not then insert
>> the key into dbase.
>>
>> Just want to take your opinion on this.
>>
>> Thanks!
>> -Abhi
>
> I do not think that a database would be overkill for this type of task.

Neither do I but I think that there are also ways to make it more
memory efficient without the use of disk storage. I suspect that the
bulk of the memory usage is for the strings and there are often ways
to store them more efficiently in memory.

For example, are you using Python 3? You might be able to reduce the
memory consumption by 25-50% by using byte strings instead of unicode
strings (assuming that the keys are ascii).

Further gains are possible if your strings only use a subset of ascii
characters, as is the case for e.g. hex strings. If you can map the
strings reversibly to integers with a function (rather than a dict)
then you should be able to achieve a significant reduction in memory
usage by using ints as the dictionary keys.

A 20 character hex key can theoretically be stored with 80 bits or 10
bytes. If you could hit this limit then you would only need 4-8 GB of
memory for the strings themselves (which may still be too much).
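A sketch of the reversible mapping for hex keys (the key value here is made
up; '%020x' restores the 20-character form, leading zeros included):

```python
# Map a 20-character hex key to an int and back; the int makes a much
# cheaper dict key than the string.
key = '00deadbeef0123456789'        # a hypothetical 20-character hex key
n = int(key, 16)                    # store this as the dict key

assert '%020x' % n == key           # the mapping is reversible

counts = {}
counts[n] = counts.get(n, 0) + 1    # count against the int, not the string
```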

> Your process may be trivial but the amount of data it has manage is not 
> trivial. You can use a simple database like SQLite. Otherwise, you
> could create a file for each key and update the count in there. It will
> run on a small amount of memory but will be slower than using a db.
>
> # Pseudocode
> key = get_key()
> filename = os.path.join(directory, key)
> if os.path.exists(filename):
> # read and update count
> else:
> with open(os.path.join(directory, key), 'w') as f:
> f.write('1')

Depending on the file system this could require a large amount of disk
space. If each file needed 4096 bytes of disk space then you would
need around 2-3 TB of disk space with this solution.

>
> Given that SQLite is included in Python and is easy to use, I would just
> use that.

I would also try this. A standard database solution will likely give
the least headaches in the long run.


Oscar


Re: [Tutor] managing memory large dictionaries in python

2012-10-16 Thread Alan Gauld

On 16/10/12 17:57, Abhishek Pratap wrote:


For my problem I need to store 400-800 million 20 characters keys in a
dictionary and do counting. This data structure takes about 60-100 Gb
of RAM.


That's a lot of records, but without details of what kind of counting you
plan to do we can't give specific advice.



I am wondering if there are slick ways to map the dictionary to a file
on disk and not store it in memory but still access it as dictionary
object. Speed is not the main concern


The trivial solution is to use shelve since that makes a file look like 
a dictionary. There are security issues but they don't sound like they'd 
be a problem. I've no idea what performance of shelve would be like with 
that many records though...
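A minimal sketch of the shelve approach (the filename is illustrative; a
shelf behaves like a dict but keeps its data in a file on disk):

```python
# Dict-like counting backed by a file instead of RAM.
import shelve

db = shelve.open('key_counts')           # creates key_counts.* on disk
for key in ['aaaa', 'bbbb', 'aaaa']:     # stand-in for the real key stream
    db[key] = db.get(key, 0) + 1
print(db['aaaa'])                        # 2
db.close()
```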



I did think about databases for this but intuitively it looks like a
overkill coz for each key you have to first check whether it is
already present and increase the count by 1  and if not then insert
the key into dbase.


The database does all of that automatically and fast.

You just need to set it up, load the data and use it - probably around 
50 lines of SQL... And you don't need anything fancy for a single table 
database - Access, SQLite, even FoxPro...


Or you could just create a big text file and process it line by line if 
the data fits that model. Lots of options.


Personally I'd go with a database for speed, flexibility and ease of coding.


--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/



Re: [Tutor] managing memory large dictionaries in python

2012-10-16 Thread Mark Lawrence

On 17/10/2012 01:03, Alan Gauld wrote:



You just need to set it up, load the data and use it - probably around
50 lines of SQL... And you don't need anything fancy for a single table
database - Access, SQLite, even FoxPro...



For the record Access is not a database, or so some geezer called Alex 
Martelli reckons http://code.activestate.com/lists/python-list/48130/, 
so please don't shoot the messenger:)


--
Cheers.

Mark Lawrence.



Re: [Tutor] managing memory large dictionaries in python

2012-10-16 Thread Alexander
On Tue, Oct 16, 2012 at 20:43 EST, Mark Lawrence
 wrote:
> For the record Access is not a database, or so some geezer called Alex
> Martelli reckons http://code.activestate.com/lists/python-list/48130/, so
> please don't shoot the messenger:)
> Cheers.
> Mark Lawrence.

Mark I don't believe your response is relevant or helpful to the
original post so please don't hijack.


-- 
7D9C597B


[Tutor] CSV -> sqlite tables with foreign keys

2012-10-16 Thread Monte Milanuk
Hello,

I'm working on a Python script to take the SQL script, create a sqlite3
database, create the tables, and then populate them using the info from the
CSV file. The sticking point seems to be creating the foreign keys between
the tables.

I've got a data file with lines like this:

"John","G.","Smith","1972-11-10","123 Any Place","Somewhere","Missouri","58932"

I have an SQL script set up to create the tables with the appropriate
fields.

What I have so far that works is something like this:

try:
    data = open(CSV_FILE, 'rb')
    reader = csv.reader(data)
    for row in reader:
        cursor.execute('''INSERT INTO people (first_name, mid_init,
            last_name, birth_date)
            VALUES (?,?,?,?)''', row[0:4])

        person = cursor.lastrowid
        address = list(row)
        address.append(person)
        row = tuple(address)

        cursor.execute('''INSERT INTO physical_address (street, city,
            state, zip_code, person)
            VALUES (?,?,?,?,?)''', row[4:])
finally:
    data.close()


It works... but from what I've found on the web, I get the distinct
impression that converting from a tuple to a list and back is considered
poor practice at best and generally to be avoided.

I'm not really sure how else to go about this, though, when I need to split
one row from a CSV file across two (or more) tables in a database, and
maintain some sort of relation between them.

Any suggestions?


Thanks,

Monte


Re: [Tutor] managing memory large dictionaries in python

2012-10-16 Thread Abhishek Pratap
On Tue, Oct 16, 2012 at 7:22 PM, Alexander  wrote:
> On Tue, Oct 16, 2012 at 20:43 EST, Mark Lawrence
>  wrote:
>> For the record Access is not a database, or so some geezer called Alex
>> Martelli reckons http://code.activestate.com/lists/python-list/48130/, so
>> please don't shoot the messenger:)
>> Cheers.
>> Mark Lawrence.
>
> Mark I don't believe your response is relevant or helpful to the
> original post so please don't hijack.
>
>

Thanks guys. I think I will try shelve and sqlite. I don't think
creating millions of files (one for each key) will make the sys
admins/file system happy.

Best,
-Abhi


Re: [Tutor] CSV -> sqlite tables with foreign keys

2012-10-16 Thread Hugo Arts
On Wed, Oct 17, 2012 at 5:59 AM, Monte Milanuk  wrote:

> Hello,
>
> I'm working on a python script to take the sql script, create a sqlite3
> database, create the tables, and then populate them using the info from the
> csv file.
> The sticking point seems to be creating the foreign keys between the tables
>
> I've got a data file with lines like this:
>
> "John","G.","Smith","1972-11-10","123 Any Place","Somewhere","Missouri","58932"
>
> I have an SQL script set up to create the tables with the appropriate
> fields.
>
> What I have so far that works is something like this:
>
> try:
>     data = open(CSV_FILE, 'rb')
>     reader = csv.reader(data)
>     for row in reader:
>         cursor.execute('''INSERT INTO people (first_name, mid_init,
>             last_name, birth_date)
>             VALUES (?,?,?,?)''', row[0:4])
>
>         person = cursor.lastrowid
>         address = list(row)
>         address.append(person)
>         row = tuple(address)
>
>         cursor.execute('''INSERT INTO physical_address (street, city,
>             state, zip_code, person)
>             VALUES (?,?,?,?,?)''', row[4:])
> finally:
>     data.close()
>
>
> It works... but from what I've found on the web, I get the distinct
> impression that converting from a tuple to a list and back is considered
> poor practice at best and generally to be avoided.
>
> I'm not really sure how else to go about this, though, when I need to
> split one row from a CSV file across two (or more) tables in a database,
> and maintain some sort of relation between them.
>
> Any suggestions?
>
>
> Thanks,
>
> Monte
>

Well, converting to a list and back is a little superfluous if all you need
to do is add a single element. You could just construct a new tuple like so:

# address_tuple now looks like ('123 Any Place', 'Somewhere',
# 'Missouri', '58932', cursor.lastrowid)
address_tuple = row[4:] + (cursor.lastrowid,)
cursor.execute('''INSERT INTO physical_address (street, city, state,
    zip_code, person) VALUES (?,?,?,?,?)''', address_tuple)

Note that we put cursor.lastrowid into a 1-element tuple so we can + the
two together. It might look a tad clunky but it's pretty easy to read.

Hugo


Re: [Tutor] Tutor Digest, Vol 104, Issue 65

2012-10-16 Thread Osemeka Osuagwu
On Tue, 16 Oct 2012 10:30:54 +0100 Alan Gauld wrote

>
> Yes but then it wouldn't be OOP. You'd be back in the world of
> traditional procedural programming passing explicit data references
> around. Its much better to make it an instance method with the self
> reference being passed implicitly
>

I guess so, implicit is better than explicit :)

Thanks Alan.