Working with Cursors

2012-04-17 Thread timlash
I've searched the web and this forum without success.  Using Python 2.7 and
pyODBC on Windows XP, I can run the code below and produce two cursors
from two different databases without problems.  Ideally, I'd then like to
join the two result sets like this:

SELECT a.state, sum(b.Sales) FROM cust_curs a INNER JOIN fin_curs b ON 
a.Cust_id = b.Cust_id GROUP BY a.state

Is there a way to join cursors with SQL statements in Python or pyODBC?
Would I need to stage the results in a common DB (SQLite3?) to accomplish
this?  Or is there a pure Python data handling approach that would produce
this summary from the two cursors?  (Rough sketches of both ideas follow
the working code below.)

Thanks for your consideration.


Working code:

import pyodbc

#
# DB2 Financial Data Cursor
#
fin_cnxn = pyodbc.connect('DSN=DB2_Fin;UID=;PWD=')
fin_curs = fin_cnxn.cursor()

fin_curs.execute("""SELECT Cust_id, sum(Sales) as Sales
                    FROM Finance.Sales_Tbl
                    GROUP BY Cust_id""")


#
# Oracle Customer Data Cursor
#
cust_cnxn = pyodbc.connect('DSN=Ora_Cust;UID=;PWD=')
cust_curs = cust_cnxn.cursor()

cust_curs.execute("""SELECT DISTINCT Cust_id, gender, address, state
                     FROM Customers.Cust_Data""")
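

For what it's worth, here are rough sketches of the two approaches I'm
considering.  Both consume the cursors above, so only one of them would
run in a given session, and the column types in the SQLite DDL below are
guesses on my part.

Staging both result sets in an in-memory SQLite3 database, so the join
can be written in plain SQL:

import sqlite3

# Stage both result sets in an in-memory database.
mem = sqlite3.connect(':memory:')
mem.execute("CREATE TABLE fin_curs (Cust_id TEXT, Sales REAL)")
mem.execute("CREATE TABLE cust_curs (Cust_id TEXT, gender TEXT, "
            "address TEXT, state TEXT)")

# pyODBC rows are sequences, so tuple() converts them for sqlite3.
mem.executemany("INSERT INTO fin_curs VALUES (?, ?)",
                (tuple(row) for row in fin_curs))
mem.executemany("INSERT INTO cust_curs VALUES (?, ?, ?, ?)",
                (tuple(row) for row in cust_curs))

# Now the join from my question runs as-is.
for state, sales in mem.execute(
        """SELECT a.state, sum(b.Sales) FROM cust_curs a
           INNER JOIN fin_curs b ON a.Cust_id = b.Cust_id
           GROUP BY a.state"""):
    print state, sales

Or a pure Python merge: build a Cust_id -> Sales lookup from the
financial cursor, then aggregate by state while scanning the customer
cursor (the GROUP BY in fin_curs guarantees one row per Cust_id):

from collections import defaultdict

sales_by_cust = dict((row[0], row[1]) for row in fin_curs)

state_sales = defaultdict(float)
for cust_id, gender, address, state in cust_curs:
    if cust_id in sales_by_cust:
        state_sales[state] += sales_by_cust[cust_id]

for state in sorted(state_sales):
    print state, state_sales[state]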




Numpy Performance

2009-04-23 Thread timlash
I'm still fairly new to Python.  I wrote a program that uses a class
called RectangularArray, defined as follows:

class RectangularArray:
    def __init__(self, rows, cols, value=0):
        self.arr = [None] * rows    # one slot per row; None = untouched
        self.row = [value] * cols   # shared default row for untouched rows
    def __getitem__(self, (i, j)):
        # An untouched row reads from the shared default row.
        return (self.arr[i] or self.row)[j]
    def __setitem__(self, (i, j), value):
        # Copy the default row on first write so rows stay independent.
        if self.arr[i] is None:
            self.arr[i] = self.row[:]
        self.arr[i][j] = value
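
For example (toy sizes):

grid = RectangularArray(1000, 50)
grid[3, 7] = 'abc'
print grid[3, 7]    # -> 'abc'
print grid[0, 0]    # untouched row falls back to the default: 0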

This class was found in a 14-year-old post:
http://www.python.org/search/hypermail/python-recent/0106.html

This worked great and let me process a few hundred thousand data
points with relative ease.  However, I soon wanted to sort arbitrary
portions of my arrays and to transpose others, so rather than reinvent
the wheel with custom methods in the serviceable RectangularArray
class, I turned to Numpy.  Once I refactored with Numpy, I was
surprised to find that my program's execution time doubled!  I expected
a purpose-built array module to be more efficient, not less.

I'm not doing any linear algebra with my data.  I'm working with
rectangular datasets, evaluating individual rows, grouping, sorting
and summarizing various subsets of rows.
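
To make the access pattern concrete, here's a stripped-down example of
the element-by-element work I'm describing (the sizes are made up, and
exact timings will vary by machine):

import timeit
import numpy as np

ROWS, COLS = 1000, 100

def fill_lists():
    # Element-by-element writes into a plain list of lists.
    data = [[0] * COLS for _ in range(ROWS)]
    for i in range(ROWS):
        for j in range(COLS):
            data[i][j] = i + j
    return data

def fill_numpy():
    # The same writes into a numpy array; every arr[i, j]
    # assignment pays numpy's generic indexing overhead.
    arr = np.zeros((ROWS, COLS))
    for i in range(ROWS):
        for j in range(COLS):
            arr[i, j] = i + j
    return arr

print timeit.timeit(fill_lists, number=10)
print timeit.timeit(fill_numpy, number=10)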

Is a Numpy implementation overkill for my data handling uses?  Should
I evaluate prior array modules such as Numeric or Numarray?  Are there
any other modules suited to handling tabular data?  Would I be best
off expanding the RectangularArray class for the few data
transformation methods I need?

Any guidance or suggestions would be greatly appreciated!

Cheers,

Tim


Re: Numpy Performance

2009-04-24 Thread timlash
Thanks for your replies.

@Peter - My arrays are not sparse at all, but I'll take a quick look
at scipy.  I also should have mentioned that my numpy arrays are of
object dtype, as each data point (row) has one or more text labels
for categorization.

@Robert - Thanks for the comments about how numpy was optimized for
bulk transactions.  Most of the processing I'm doing is with
individual elements.

Essentially, I'm testing tens of thousands of scenarios on a
relatively small number of test cases.  Each scenario requires all
elements of each test case to be scored, then summarized, then sorted
and grouped with some top scores captured for reporting.

It seems I can either work toward indexed categorization, so that my
arrays are of integer type and each scenario can be handled in bulk
numpy fashion, or expand RectangularArray with custom data handling
methods.  (A sketch of the first idea follows.)
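
For example, the indexed-categorization version might look something
like this (toy data; the labels, weights and scoring rule are all made
up for illustration):

import numpy as np

# Replace text labels with small integer codes.
labels = ['red', 'blue', 'red', 'green', 'blue']
codes = dict((lab, i) for i, lab in enumerate(sorted(set(labels))))
label_codes = np.array([codes[lab] for lab in labels])

values = np.array([10.0, 4.0, 7.0, 1.0, 3.0])

# One hypothetical scenario: a weight per label category, applied
# to every element at once instead of looping element by element.
weights = np.array([0.5, 2.0, 1.0])       # indexed by label code
scores = values * weights[label_codes]    # bulk, vectorized scoring

# Group the scores by category and pull out the top one.
totals = np.bincount(label_codes, weights=scores)
top = totals.argmax()
print sorted(codes)[top], totals[top]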

Any other recommended approaches to working with tabular data in
Python?

Cheers,

Tim