Working with Cursors
I've searched the web and this forum without satisfaction. Using Python 2.7 and
pyODBC on Windows XP, I can get the code below to run and generate two cursors
from two different databases without problems. Ideally, I'd then like to join
these result sets like so:
SELECT a.state, sum(b.Sales) FROM cust_curs a INNER JOIN fin_curs b ON
a.Cust_id = b.Cust_id GROUP BY a.state
Is there a way to join cursors using SQL statements in Python or pyODBC? Would
I need to store these results in a common DB (SQLite3?) to accomplish this? Is
there a pure-Python data handling approach that would generate this summary
from these two cursors? (Two sketches follow the working code below.)
Thanks for your consideration.
Working code:
import pyodbc
#
# DB2 Financial Data Cursor
#
fin_cnxn = pyodbc.connect('DSN=DB2_Fin;UID=;PWD=')
fin_curs = fin_cnxn.cursor()
fin_curs.execute("""SELECT Cust_id, sum(Sales) as Sales
FROM Finance.Sales_Tbl
GROUP BY Cust_id""")
#
# Oracle Customer Data Cursor
#
cust_cnxn = pyodbc.connect('DSN=Ora_Cust;UID=;PWD=')
cust_curs = cust_cnxn.cursor()
cust_curs.execute("""SELECT DISTINCT Cust_id, gender, address, state
FROM Customers.Cust_Data""")
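
One way to do exactly this, sketched rather than tested: stage both result sets
in an in-memory SQLite database and run the join there with plain SQL. The
column types below are guesses based on the two queries above, and either
sketch consumes the cursors, so use one or the other (or fetchall() once and
reuse the lists).

import sqlite3

# Copy both pyODBC result sets into an in-memory SQLite DB, then join in SQL.
mem = sqlite3.connect(':memory:')
mem.execute("CREATE TABLE fin (Cust_id TEXT, Sales REAL)")
mem.execute("CREATE TABLE cust (Cust_id TEXT, gender TEXT, address TEXT, state TEXT)")
# Depending on driver types you may need [tuple(r) for r in curs.fetchall()]
# (and Decimal-to-float conversion) before handing rows to sqlite3.
mem.executemany("INSERT INTO fin VALUES (?, ?)", fin_curs.fetchall())
mem.executemany("INSERT INTO cust VALUES (?, ?, ?, ?)", cust_curs.fetchall())

for state, sales in mem.execute(
        """SELECT a.state, SUM(b.Sales)
           FROM cust a INNER JOIN fin b ON a.Cust_id = b.Cust_id
           GROUP BY a.state"""):
    print state, sales

The pure-Python route needs no extra database at all: index the customer rows
by Cust_id in a dict, then aggregate the sales rows against it.

# Pure-Python alternative: a dict join plus a running total per state.
state_of = dict((row.Cust_id, row.state) for row in cust_curs)
totals = {}
for cust_id, sales in fin_curs:
    state = state_of.get(cust_id)
    if state is not None:          # skip sales with no matching customer
        totals[state] = totals.get(state, 0) + sales
print totals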
Numpy Performance
Still fairly new to Python. I wrote a program that used a class called
RectangularArray, as described here:

class RectangularArray:
    def __init__(self, rows, cols, value=0):
        self.arr = [None] * rows
        self.row = [value] * cols
    def __getitem__(self, (i, j)):
        return (self.arr[i] or self.row)[j]
    def __setitem__(self, (i, j), value):
        if self.arr[i] is None:
            self.arr[i] = self.row[:]  # copy the default row on first write
        self.arr[i][j] = value

This class was found in a 14-year-old post:
http://www.python.org/search/hypermail/python-recent/0106.html

This worked great and let me process a few hundred thousand data points with
relative ease. However, I soon wanted to start sorting arbitrary portions of my
arrays and to transpose others, so I turned to Numpy rather than reinventing
the wheel with custom methods in the serviceable RectangularArray class. Once I
refactored with Numpy, I was surprised to find that the execution time of my
program doubled! I expected a purpose-built array module to be more efficient,
not less.

I'm not doing any linear algebra with my data. I'm working with rectangular
datasets: evaluating individual rows, grouping, sorting, and summarizing
various subsets of rows. Is a Numpy implementation overkill for my data
handling needs? Should I evaluate prior array modules such as Numeric or
Numarray? Are there any other modules suited to handling tabular data? Would I
be best off expanding the RectangularArray class with the few data
transformation methods I need?

Any guidance or suggestions would be greatly appreciated!

Cheers,
Tim
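
On the tabular-data question above: NumPy's structured arrays handle
field-wise sorting and row selection directly, with no linear algebra
involved. A minimal sketch with made-up data:

import numpy as np

# A small table: each row has a text label, an integer group, and a score.
data = np.array([('b', 3, 0.5), ('a', 1, 2.5), ('a', 2, 1.0)],
                dtype=[('label', 'S8'), ('group', 'i4'), ('score', 'f8')])

by_score = np.sort(data, order='score')   # sort whole rows by one field
group_one = data[data['group'] == 1]      # boolean mask selects a row subset
print by_score['label']
print group_one['score'].sum()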
Re: Numpy Performance
Thanks for your replies.

@Peter - My arrays are not sparse at all, but I'll take a quick look at scipy.
I also should have mentioned that my numpy arrays are of Object type, as each
data point (row) has one or more text labels for categorization.

@Robert - Thanks for the comments about how numpy is optimized for bulk
operations. Most of the processing I'm doing is with individual elements.
Essentially, I'm testing tens of thousands of scenarios against a relatively
small number of test cases. Each scenario requires every element of each test
case to be scored, then summarized, sorted, and grouped, with some top scores
captured for reporting.

It seems I can either work toward a design with indexed categorization, so
that my arrays are of integer type and each scenario can be handled in bulk
numpy fashion, or expand RectangularArray with custom data handling methods.

Any other recommended approaches to working with tabular data in Python?

Cheers,
Tim
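
A minimal sketch of the "indexed categorization" idea above: map the text
labels to integer codes once, keep the working arrays numeric, and do the
per-scenario scoring in bulk. The data and names here are illustrative, not
from the thread.

import numpy as np

# Map each distinct text label to a small integer code (illustrative data).
labels = ['red', 'blue', 'red', 'green', 'blue']
code_of = dict((lab, i) for i, lab in enumerate(sorted(set(labels))))

codes = np.array([code_of[lab] for lab in labels])   # integer array, not Object
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# One bulk call replaces a per-element Python loop: sum values per label code.
totals = np.bincount(codes, weights=values)
for lab, i in sorted(code_of.items(), key=lambda kv: kv[1]):
    print lab, totals[i]

Keeping the labels out of the array itself avoids the Object dtype, which is
largely what forces numpy back into slow per-element Python calls.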
