> I have a very large .csv (correlationfile, which is 16 million lines long)
> which I want to split into smaller .csvs. The smaller csvs should be created
> by searching for a value and printing any line which contains that value - all
> these values are contained in another .csv (vertexfile). I think that I have
> an indentation problem or have made a mistake with my loops because I only get
> data in one of the output .csvs (outputfile) which is for the first one of the
> values. The other .csvs are empty.
>
> Can somebody help me please?
>
> Thanks so much!
>
> Emma
>
> import os
> path = os.getcwd()
> x = ''
> for v in vertexfile:
>     vs = v.replace('\n','')
>     outputfile = open(os.path.join(path,vs+'.csv'),'w')
>     for c in correlationfile:
>         cs = c.replace('\n','').split(',')
>         if vs == cs[0]: print vs
>         outputfile.write(x)
> outputfile.close()

Indent the outputfile.close() call so that it is inside the outer for loop; each output file is then closed before the next one is opened. That should fix your problem.

I would also recommend working with the csv module instead. Then there is no need to worry about stripping newlines or about commas contained inside your data. Note that when using the csv module (on Python 2, which your print statement suggests you are using) you should open the files in binary mode: 'rb' for reading and 'wb' for writing.

Ramit

Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
_______________________________________________
Tutor maillist - Tutor@python.org
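
For illustration, the csv-module suggestion above could look roughly like the sketch below. It is written for Python 3, where the files are opened in text mode with newline='' rather than with 'rb' and 'wb'. The function name split_by_vertex and its three parameters are hypothetical, since the original post does not show how vertexfile and correlationfile are opened. The sketch also assumes they were open file objects; if so, the inner loop would consume correlationfile on the first pass and find nothing left for later values, which is why the correlation file is reopened for every vertex here.

```python
import csv
import os


def split_by_vertex(vertex_path, correlation_path, out_dir):
    """Write one CSV per vertex value, holding the matching correlation rows.

    Hypothetical helper: the name and parameters are not from the original post.
    """
    # Read the vertex values once; the first column of each row is the value.
    with open(vertex_path, newline='') as vf:
        vertices = [row[0] for row in csv.reader(vf) if row]

    for vs in vertices:
        out_path = os.path.join(out_dir, vs + '.csv')
        # Each "with" closes its files at the end of this iteration, so the
        # close happens inside the outer loop. Reopening the correlation file
        # restarts the scan from its first line for every vertex.
        with open(out_path, 'w', newline='') as out, \
                open(correlation_path, newline='') as cf:
            writer = csv.writer(out)
            for row in csv.reader(cf):
                # Keep only rows whose first field equals this vertex value.
                if row and row[0] == vs:
                    writer.writerow(row)
```

A call such as split_by_vertex('vertexfile.csv', 'correlationfile.csv', os.getcwd()) would mirror the original script. With 16 million lines, rescanning the whole file once per vertex may be slow; a single pass that routes each row to its output file would likely be faster, at the cost of keeping many files open at once.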