I want to read data from a CSV I wrote using Python's csv module, but apart from the field names and row count I am unable to read the rest of the data

2020-04-12 Thread Rahul Gupta
The cells in the CSV that I wrote look like this:
['82#201#426#553#602#621#811#908#1289#1342#1401#1472#1593#1641#1794#2290#2341#2391#3023#3141#3227#3240#3525#3529#3690#3881#4406#4421#4497#4719#4722#4920#5053#5146#5433']
and the empty cells look like ['']
I have tried the following code:
import csv
import numpy as np
with open("D:\PHD\obranking\\cell_split_demo.csv", mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    print(csv_reader.fieldnames)
    col_count = print(len(csv_reader.fieldnames))
    print(sum(1 for row in csv_file))
    for line in csv_reader:
        print(line)
but when I print line, it shows nothing.
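A likely cause: `sum(1 for row in csv_file)` iterates the underlying file to its end, so when the `for line in csv_reader` loop starts there is nothing left to read. Separately, `col_count = print(...)` stores `None`, because `print` returns `None`. A minimal sketch, using an in-memory `io.StringIO` with made-up cells in place of the real file, that reads everything once and takes the row count from the parsed list (`read_csv_once` is a hypothetical helper):

```python
import csv
import io

def read_csv_once(f):
    """Read field names, column count and all data rows from an open CSV file in one pass."""
    reader = csv.DictReader(f)
    fieldnames = reader.fieldnames      # reads the header line
    col_count = len(fieldnames)         # keep the number itself; print() would return None
    rows = list(reader)                 # consume the reader, not the raw file object
    return fieldnames, col_count, rows

# Stand-in for open(r"D:\PHD\obranking\cell_split_demo.csv", newline="")
sample = io.StringIO("0,1\n['82#201'],['']\n[''],['5#7']\n")
fieldnames, col_count, rows = read_csv_once(sample)
print(fieldnames, col_count, len(rows))  # → ['0', '1'] 2 2
```

Because the rows are kept in a list, counting them and looping over them no longer compete for the same file position.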
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: I want to read data from a CSV I wrote using Python's csv module, but apart from the field names and row count I am unable to read the rest of the data

2020-04-12 Thread Rahul Gupta
On Sunday, April 12, 2020 at 1:35:10 PM UTC+5:30, Rahul Gupta wrote:
@Peter Otten Thanks, that problem got solved, but now when I try to access a particular column for every row in the CSV I get an error.
The code I used in addition to the above code:
for line in enumerate(csv_reader):
    print(line[csv_reader.fieldnames[1]])
The error is as follows:
"C:\Users\Rahul Gupta\PycharmProjects\CSVLearn\venv\Scripts\python.exe" "C:/Users/Rahul Gupta/PycharmProjects/CSVLearn/test10.py"
Traceback (most recent call last):
  File "C:/Users/Rahul Gupta/PycharmProjects/CSVLearn/test10.py", line 16, in <module>
    print(line[csv_reader.fieldnames[1]])
TypeError: tuple indices must be integers or slices, not str
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', 
'14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', 
'27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', 
'40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', 
'53', '54', '55', '56', '57', '58', '59', '60', '61', '62', '63', '64', '65', 
'66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78', 
'79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '90', '91', 
'92', '93', '94', '95', '96', '97', '98', '99', '100', '101', '102', '103', 
'104', '105', '106', '107', '108', '109', '110', '111', '112', '113', '114', 
'115', '116', '117', '118', '119', '120', '121', '122', '123', '124', '125', 
'126', '127', '128', '129', '130', '131', '132', '133', '134', '135', '136', 
'137', '138', '139', '140', '141', '142', '143', '144', '145', '146', '147', 
'148', '149', '150', '151', '152', '153', '154', '155', '156', '157', 
'158', '159', '160', '161', '162', '163', '164', '165', '166', '167', '168', 
'169', '170', '171', '172', '173', '174', '175', '176', '177', '178', '179', 
'180', '181', '182', '183', '184', '185', '186', '187', '188', '189', '190', 
'191', '192', '193', '194', '195', '196', '197', '198', '199', '200', '201', 
'202', '203', '204', '205', '206', '207', '208', '209', '210', '211', '212', 
'213', '214', '215', '216', '217', '218', '219', '220', '221', '222', '223', 
'224', '225', '226', '227', '228', '229', '230', '231', '232', '233', '234', 
'235', '236', '237', '238', '239', '240', '241', '242', '243', '244', '245', 
'246', '247', '248', '249', '250', '251', '252', '253', '254', '255', '256', 
'257', '258', '259', '260', '261', '262', '263', '264', '265', '266', '267', 
'268', '269', '270', '271', '272', '273', '274', '275', '276', '277', '278', 
'279', '280', '281', '282', '283', '284', '285', '286', '287', '288', '289', 
'290', '291', '292', '293', '294', '295', '296', '297', '298', '299']
300

Process finished with exit code 1
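The traceback comes from `enumerate`: it yields `(index, row)` tuples, so `line` is a tuple and cannot be indexed with a column name; unpacking the pair fixes that. The sketch below also parses each cell, assuming (as the first message shows) that a cell holds the text of a one-element list such as `['82#201#426']` or `['']`. `parse_cell` and `read_column` are hypothetical helpers, and the sample data is made up:

```python
import ast
import csv
import io

def parse_cell(cell):
    """Turn cell text like "['82#201']" into [82, 201]; "['']" or "" gives []."""
    if not cell:
        return []
    text = ast.literal_eval(cell)[0]   # the cell holds the text of a one-element list
    return [int(x) for x in text.split('#')] if text else []

def read_column(f, col_index=1):
    """Yield (row_number, parsed_cell) for one column of the CSV."""
    reader = csv.DictReader(f)
    name = reader.fieldnames[col_index]
    for i, line in enumerate(reader):  # unpack the (index, row) tuple
        yield i, parse_cell(line[name])

sample = io.StringIO("0,1\n['82#201'],['']\n[''],['5#7']\n")
print(list(read_column(sample, col_index=1)))  # → [(0, []), (1, [5, 7])]
```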



Re: I want to read data from a CSV I wrote using Python's csv module, but apart from the field names and row count I am unable to read the rest of the data

2020-04-12 Thread Rahul Gupta




import csv
import numpy as np
with open("D:\PHD\obranking\\cell_split_demo.csv", mode='r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    print(csv_reader.fieldnames)
    col_count = print(len(csv_reader.fieldnames))
    #print(sum(1 for row in csv_file))
    row_count = 0

    for line in enumerate(csv_reader):
        print(line[csv_reader.fieldnames[1]])

@Peter Otten The code above is test10.py.
@Peter Otten Below is how I created cell_split_demo.csv; this is test9.py:
import csv
import numpy as np

with open("D:\PHD\obranking\\demo.csv", mode='r') as csv_file1, open("D:\PHD\obranking\\demo.csv", mode='r') as csv_file2:
    csv_reader1 = csv.DictReader(csv_file1)
    csv_reader2 = csv.DictReader(csv_file2)

    #csv_contents = list(csv_reader)
    #for i in csv_contents:
    #print(i['label'])
    #print(csv_contents)

    filename = "cell_split_demo.csv"
    with open("D:\PHD\obranking\\cell_split_demo.csv", 'w') as csvfilew1:
        fields = (range(0, 300))
        csvwriter1 = csv.DictWriter(csvfilew1, fieldnames=fields)
        csvwriter1.writeheader()

        for i, row in enumerate(csv_reader1):
            Mat = np.full([1, 300], '', dtype='object')
            matrixrows = dict().fromkeys(fields)
            for j, line in enumerate(csv_reader2):
                if j != 300:
                    matrixrows[j] = []
                    if row['label'] != line['label']:
                        for k in range(1,5502):
                            if row[csv_reader1.fieldnames[k]] != line[csv_reader2.fieldnames[k]]:
                                if Mat[0][j] == '':
                                    Mat[0][j] = str(k)
                                else:
                                    Mat[0][j] += '#' + str(k)
                                #print(Mat[0][j])
                    print(i)
                    #print(j)
                    matrixrows[j].append(Mat[0][j])
                    if j == 299:
                        csvwriter1.writerow(matrixrows)
            csv_file2.seek(0)
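The core of test9.py compares two rows and records, as a '#'-joined string, the indices of the columns whose values differ. A minimal sketch of just that step, assuming (as the loop over k starting at 1 suggests) that column 0 holds the label, and using hypothetical dict rows; `differing_columns` is a hypothetical helper that takes the column count from `fieldnames` instead of the hard-coded 5502:

```python
def differing_columns(row_a, row_b, fieldnames):
    """Return the column indices (skipping column 0) where two rows differ, joined by '#'."""
    return '#'.join(str(k) for k in range(1, len(fieldnames))
                    if row_a[fieldnames[k]] != row_b[fieldnames[k]])

fieldnames = ['label', 'f1', 'f2', 'f3']
a = {'label': 'BW', 'f1': '0', 'f2': '1', 'f3': '1'}
b = {'label': 'XX', 'f1': '1', 'f2': '1', 'f3': '0'}
print(differing_columns(a, b, fieldnames))  # → 1#3
```

The script above appends this string to a list (`matrixrows[j] = []` followed by `.append(...)`), and `DictWriter` writes `str()` of that list, which is why the cells come out as `['82#201#...']`. Storing the string itself (for example `matrixrows[j] = Mat[0][j]`) would give plain `82#201#...` cells.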




Re: I want to read data from a CSV I wrote using Python's csv module, but apart from the field names and row count I am unable to read the rest of the data

2020-04-12 Thread Rahul Gupta
@Peter Thanks a lot


Applying PCA to a large CSV

2020-04-14 Thread Rahul Gupta
Hello all, I have a 1 GB CSV which consists of 25000 columns and 2 rows. 
I want to apply PCA, and I have seen that scikit-learn has built-in functionality 
for it. But I have seen that to do so you have to load the data into a DataFrame, 
and my machine, an i5 with 8 GB of RAM, fails to load all this data into a 
DataFrame and shows a memory error. Is there any alternative way I could still 
apply PCA to the same data set on the same machine?
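scikit-learn's `IncrementalPCA` can be fitted batch by batch with `partial_fit`, and `pandas.read_csv(..., chunksize=...)` yields the file in pieces, so the whole CSV never has to be in memory at once. A minimal sketch, assuming every column is numeric (any label column dropped first); the chunk size and component count are placeholders, and each chunk must contain at least `n_components` rows. `fit_pca_in_chunks` is a hypothetical helper and the data is synthetic:

```python
import io
import numpy as np
import pandas as pd
from sklearn.decomposition import IncrementalPCA

def fit_pca_in_chunks(source, n_components, chunksize):
    """Fit IncrementalPCA on a CSV read chunk by chunk (source: path or file-like)."""
    ipca = IncrementalPCA(n_components=n_components)
    for chunk in pd.read_csv(source, chunksize=chunksize):
        ipca.partial_fit(chunk.to_numpy(dtype=float))  # each chunk needs >= n_components rows
    return ipca

# Small synthetic stand-in for the real file: 40 rows x 6 numeric columns
rng = np.random.default_rng(0)
csv_text = pd.DataFrame(rng.normal(size=(40, 6))).to_csv(index=False)
ipca = fit_pca_in_chunks(io.StringIO(csv_text), n_components=2, chunksize=10)
print(ipca.components_.shape)  # → (2, 6)
```

To project the data afterwards, read the file again in chunks and call `ipca.transform(chunk)` on each one.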


Re: Applying PCA to a large CSV

2020-04-14 Thread Rahul Gupta
64-bit version


Incremental PCA

2020-04-18 Thread Rahul Gupta
I wanted to implement incremental PCA.
I got this code from Stack Overflow, but I am wondering what y = chunk.pop("y") 
does and what this argument "y" to pop is:
from sklearn.decomposition import IncrementalPCA
import csv
import sys
import numpy as np
import pandas as pd

dataset = sys.argv[1]
chunksize_ = 5 * 25000
dimensions = 300

reader = pd.read_csv(dataset, sep=',', chunksize=chunksize_)
sklearn_pca = IncrementalPCA(n_components=dimensions)
for chunk in reader:
    y = chunk.pop("Y")
    sklearn_pca.partial_fit(chunk)

# Computed mean per feature
mean = sklearn_pca.mean_
# and stddev
stddev = np.sqrt(sklearn_pca.var_)

Xtransformed = None
for chunk in pd.read_csv(dataset, sep=',', chunksize=chunksize_):
    y = chunk.pop("Y")
    Xchunk = sklearn_pca.transform(chunk)
    # "is None": "== None" on a NumPy array compares element-wise, and "if" on that array raises ValueError
    if Xtransformed is None:
        Xtransformed = Xchunk
    else:
        Xtransformed = np.vstack((Xtransformed, Xchunk))
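`DataFrame.pop(name)` removes the column `name` from the DataFrame in place and returns it as a Series. In the Stack Overflow code the CSV evidently has a target column named `"Y"` (the name is case-sensitive, so `"y"` would raise `KeyError` there); popping it keeps the labels out of the data passed to `partial_fit` and `transform`. For a CSV whose label column is called `label`, that name would be passed instead, and without any label column the `pop` line can be removed. A small illustration with made-up data; `split_label` is a hypothetical helper:

```python
import pandas as pd

def split_label(chunk, label="Y"):
    """Separate the label column from the feature columns of one chunk."""
    y = chunk.pop(label)  # removes column `label` from chunk in place and returns it as a Series
    return chunk, y

chunk = pd.DataFrame({"f1": [0.5, 1.5], "f2": [2.0, 3.0], "Y": ["BW", "XX"]})
features, labels = split_label(chunk)
print(list(features.columns))  # → ['f1', 'f2']
print(labels.tolist())         # → ['BW', 'XX']
```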


Chi-square test in sklearn printing NaN values for most of the columns

2020-04-27 Thread Rahul Gupta
Hi, I am trying to use the chi-square test to select the most important columns 
among 5501 columns. But for most of the columns I am getting NaN as the 
chi-square value.

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_selection import chi2
cols =[]
cols.append(int(0))
#for i in range(1, 5502):
cols.append(int(10))

df = pd.read_csv("D:\PHD\obranking\\demo.csv", usecols=cols)
df = df.apply(LabelEncoder().fit_transform)  # apply() returns a new DataFrame; assign it back
X = df.drop(labels='label', axis=1)
Y = df['label']
chi_scores = chi2(X, Y)
print(chi_scores)
In this code I printed the chi-square value for the 10th column, but most of the columns behave like this:
"C:\Users\Rahul Gupta\PycharmProjects\CSVLearn\venv\Scripts\python.exe" "C:/Users/Rahul Gupta/PycharmProjects/CSVLearn/ChiSq_learn.py"
(array([nan]), array([nan]))

Process finished with exit code 0
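One likely reason for the NaN: the chi-square statistic compares the observed per-class sums of a feature with the sums expected if feature and class were independent. When a column sums to zero (for example a constant column after label encoding, which maps a single value to 0), every expected value is 0 and the ratio becomes 0/0 = NaN. The sketch below is a NumPy re-implementation of that observed-versus-expected computation for illustration, not scikit-learn's own code; `chi2_statistic` is a hypothetical name and the data is made up:

```python
import numpy as np

def chi2_statistic(X, y):
    """Per-feature chi-square statistic for non-negative features X and class labels y."""
    X = np.asarray(X, dtype=float)
    classes, y_idx = np.unique(y, return_inverse=True)
    Y = np.eye(len(classes))[y_idx]                  # one-hot class matrix, (n_samples, n_classes)
    observed = Y.T @ X                               # per-class sum of each feature
    class_prob = Y.mean(axis=0).reshape(-1, 1)       # share of samples in each class
    feature_total = X.sum(axis=0).reshape(1, -1)     # total of each feature over all samples
    expected = class_prob @ feature_total            # per-class sum expected under independence
    with np.errstate(divide="ignore", invalid="ignore"):  # let 0/0 become nan without a warning
        return ((observed - expected) ** 2 / expected).sum(axis=0)

X = np.array([[0, 1], [0, 3], [0, 0], [0, 2]])  # column 0 is all zeros
y = ['BW', 'BW', 'XX', 'XX']
print(chi2_statistic(X, y))  # first entry is nan, second is 2/3
```

If that is the cause, dropping all-zero columns before calling `chi2` (for example `X = X.loc[:, X.sum() > 0]` on the DataFrame) avoids the NaN entries.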


Unable to write content to a CSV file

2020-04-27 Thread Rahul Gupta
Following is my code:
import pandas as pd
import csv
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_selection import chi2
with open("D:\PHD\obranking\\test_chi.csv", 'w') as csvfilew1:
    fields = ['index', 'feature name', 'p_value']
    csvwriter1 = csv.DictWriter(csvfilew1, fieldnames=fields)
    csvwriter1.writeheader()
    for i in range(1, 5502):
        csv_data = dict().fromkeys(fields)
        csv_data['index'] = i
        cols = []
        cols.append(int(0))
        cols.append(int(i))

        df = pd.read_csv("D:\PHD\obranking\\demo.csv", usecols=cols)
        df = df.apply(LabelEncoder().fit_transform)  # apply() returns a new DataFrame; assign it back
        X = df.drop(labels='label', axis=1)
        Y = df['label']
        chi_scores = chi2(X, Y)
        if(chi_scores[1] < 0.05):
            f_name = str(X.columns)
            f_name = f_name[8:-19]
            csv_data['feature name'] = f_name
            p_val = str(chi_scores[1])
            p_val = p_val[1:-1]
            csv_data['p_value'] = p_val
            print(csv_data)
            csvwriter1.writerow(csv_data)
            #print(csv_data)
            #print(f_name + p_val)
            #print(str(X.col + str(chi_scores[1]))
test_chi.csv is created, but it remains empty after the code executes. 
When I print csv_data it does get printed, but it is not written to the CSV by 
writerow(csv_data). There are not even field names in the CSV, so writeheader() 
seems not to work either. I am confused about what is wrong. Could someone 
help?
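One plausible explanation, assuming the script was still running (5501 passes over demo.csv take a long time) or was stopped before the `with` block ended: text files are buffered, so the header and rows sit in Python's buffer until it fills or the file is closed, and a process that is stopped may never write them. Calling `flush()` after each `writerow` makes progress visible on disk. Note also that a NaN p-value fails `< 0.05`, so columns with NaN scores are never written. A minimal sketch with a temporary file and made-up values; `write_results` is a hypothetical helper, and `newline=''` follows the csv module documentation's advice for files used with the csv module:

```python
import csv
import os
import tempfile

FIELDS = ['index', 'feature name', 'p_value']

def write_results(path, results):
    """Write result dicts to path, flushing after each row so partial output survives."""
    with open(path, 'w', newline='') as f:   # newline='' avoids blank lines between rows on Windows
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        f.flush()
        for row in results:
            writer.writerow(row)
            f.flush()                        # hand the buffered text to the OS now

path = os.path.join(tempfile.mkdtemp(), 'test_chi.csv')
write_results(path, [{'index': 3, 'feature name': 'f3', 'p_value': 0.01}])
with open(path, newline='') as f:
    print(list(csv.DictReader(f)))
```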


Error when resetting a CSV reader with seek(0)

2020-05-01 Thread Rahul Gupta
Consider the following code:
import csv
import numpy as np

with open("D:\PHD\obranking\\demo.csv", mode='r') as csv_file1, open("D:\PHD\obranking\\demo.csv", mode='r') as csv_file2:
    csv_reader1 = csv.DictReader(csv_file1)
    csv_reader2 = csv.DictReader(csv_file2)

    filename = "cell_split_demo.csv"
    with open("D:\PHD\obranking\\cell_split_demo.csv", 'w') as csvfilew1:
        fields = (range(0, 300))
        csvwriter1 = csv.DictWriter(csvfilew1, fieldnames=fields)
        csvwriter1.writeheader()

        for i, row in enumerate(csv_reader1):
            print(f"value_i({i}) label({row['label']})")
            matrixrows = dict().fromkeys(fields)  # one output row, as in test9.py
            for j, line in enumerate(csv_reader2):
                if j <= i:
                    matrixrows[j] = []
                    if row['label'] != line['label']:
                        print(f"value_j({j})Unequal label({line['label']})")
                    else:
                        print(f"value_j({j})   equal label({line['label']})")
                        pass
                else:
                    break
            csv_file2.seek(0)
Here are some of the output samples:
value_i(0) label(BW)
value_j(0)   equal label(BW)
value_i(1) label(BW)
value_j(0)   Unequal label(label)
value_j(1)   equal label(BW)
value_i(2) label(BW)
value_j(0)   Unequal label(label)
value_j(1)   equal label(BW)
value_j(2)   equal label(BW)
You can see that for j = 0, as i goes from 1 to n, it cannot access the 
line['label'] value (it prints label instead of BW).
Kindly help: what is wrong with this?
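What happens: `csv.DictReader` reads the header once and caches `fieldnames`. `csv_file2.seek(0)` rewinds only the underlying file, so on the next pass the same reader returns the header line itself as a data row, which is why `line['label']` is the string `label` when j = 0. A minimal sketch with an in-memory file and made-up sample data, showing the problem and one fix (a fresh `DictReader` after every `seek(0)`); both function names are hypothetical:

```python
import csv
import io

text = "label,f1\nBW,1\nBW,2\nXX,3\n"   # made-up sample data

def first_row_after_seek(f):
    """Reuse one DictReader across seek(0): the header comes back as a data row."""
    reader = csv.DictReader(f)
    first = next(reader)['label']          # header consumed here; fieldnames cached
    f.seek(0)                              # rewinds the file, not the reader's state
    return first, next(reader)['label']    # the header line, parsed as data

def labels_per_pass(f, passes=2):
    """Fix: build a fresh DictReader after every seek(0) so the header is skipped again."""
    result = []
    for _ in range(passes):
        f.seek(0)
        result.append([row['label'] for row in csv.DictReader(f)])
    return result

print(first_row_after_seek(io.StringIO(text)))   # → ('BW', 'label')
print(labels_per_pass(io.StringIO(text)))        # → [['BW', 'BW', 'XX'], ['BW', 'BW', 'XX']]
```

Alternatively, the second file can be read once with `rows2 = list(csv.DictReader(csv_file2))` and that list looped over for every row of the first reader; this avoids re-parsing at the cost of holding the rows in memory.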


Re: Error when resetting a CSV reader with seek(0)

2020-05-02 Thread Rahul Gupta
@Peter Otten Thanks


RAM not freed after executing Python scripts on an Ubuntu system

2020-05-27 Thread Rahul Gupta


I have an Ubuntu system with 125 GB of RAM. I executed a few Python scripts on 
that system; those scripts use NumPy arrays and pandas. Now execution is over, 
but 50 GB of RAM, 2 GB of cache and 8.4 GB of swap are still occupied. At this 
moment nothing is running on the system. I have googled it, and most of the 
results say that the Python garbage collector performs poorly. I want this 
memory to be cleaned and reclaimed. One of the easiest ways is to restart the 
system, but I don't want to restart; I want a way to do this while the system 
is up and running. Kindly tell me how to do this. Thanks


Re: RAM not freed after executing Python scripts on an Ubuntu system

2020-05-28 Thread Rahul Gupta
On Thursday, May 28, 2020 at 11:20:05 AM UTC+5:30, Rahul Gupta wrote:
Yes, I am sure there is 125 GB of RAM.
You talked about references; see these links:
https://stackoverflow.com/questions/39100971/how-do-i-release-memory-used-by-a-pandas-dataframe
http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm
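Once a process exits, Linux reclaims its memory regardless of Python's garbage collector, so RAM that stays in use usually belongs to a process that is still alive (for example an idle Jupyter kernel or an interpreter left open in a terminal), and "cache" is the kernel's reusable page cache. A minimal sketch, assuming Linux's `/proc` file system, that lists the processes with the largest resident set size so the owner can be found and exited; `top_rss_processes` is a hypothetical helper:

```python
import os

def top_rss_processes(limit=10):
    """Return (rss_kib, pid, name) tuples for the processes with the largest resident memory."""
    result = []
    for entry in os.listdir('/proc'):
        if not entry.isdigit():
            continue                                  # only numeric entries are processes
        try:
            with open(f'/proc/{entry}/status') as f:
                status = dict(line.split(':', 1) for line in f if ':' in line)
        except OSError:
            continue                                  # process exited or is not readable
        if 'VmRSS' not in status:
            continue                                  # kernel threads report no VmRSS
        rss_kib = int(status['VmRSS'].split()[0])     # value looks like "   12345 kB"
        result.append((rss_kib, int(entry), status['Name'].strip()))
    result.sort(reverse=True)
    return result[:limit]

for rss_kib, pid, name in top_rss_processes():
    print(f"{rss_kib / 1024**2:8.2f} GiB  pid {pid:>7}  {name}")
```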