Hi, I am writing code to sort the columns of a dataset by each column's sum. The dataset is huge (50k rows x 300k columns), so I have to read it line by line and accumulate the sums to avoid running out of memory. But it runs very slowly, and part of the code is shown below. Can anyone point out what I should change to make it faster? Thanks in advance!
...
from numpy import *
...
currSum = zeros(self.componentcount)
currRow = zeros(self.componentcount)
for featureDict in self.featureDictList:
    currRow[:] = 0
    for components in self.componentdict1:
        if components in featureDict:  # "in" replaces the Python-2-only has_key()
            col = self.componentdict1[components]
            value = featureDict[components]
            currRow[col] = value
    currSum += currRow  # the original "currSum = currSum + row" used the undefined name "row"
...
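The main cost in the loop above is iterating over all of self.componentdict1 (300k entries) for every row, even though each row's featureDict is presumably sparse. A sketch of the usual fix, iterating over the row's own entries and accumulating in place instead of building a dense currRow each time (column_sums, col_index, and the argument names are hypothetical stand-ins for the poster's attributes):

```python
import numpy as np

def column_sums(feature_dicts, col_index, n_cols):
    """Accumulate per-column sums from a stream of sparse rows.

    feature_dicts: iterable of dicts mapping component name -> value
                   (stands in for self.featureDictList)
    col_index:     dict mapping component name -> column index
                   (stands in for self.componentdict1)
    n_cols:        total number of columns (self.componentcount)
    """
    sums = np.zeros(n_cols)
    for fd in feature_dicts:
        # Loop over the row's few nonzero entries, not all 300k columns,
        # and add straight into sums -- no temporary currRow array and no
        # new array allocation per row.
        for name, value in fd.items():
            col = col_index.get(name)
            if col is not None:
                sums[col] += value
    return sums
```

If each row has only a handful of nonzero components, this turns the per-row work from O(300k) dictionary probes into O(nonzeros), which is typically the difference between hours and seconds on data of this shape.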
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion