On Mon, 23 Feb 2009 14:41:10 +0100, Norman Khine <nor...@khine.net> wrote:
> Hello,
>
> I have this csv file:
>
> $ cat licences.csv
> "1","Air Travel Organisation Licence (ATOL)\n Operates Inclusive Tours (IT)"
> "2","Air Travel Organisation Licence (ATOL)\n Appointed Agents of IATA
> (IATA)"
> "3", "Association of British Travel Agents (ABTA) No. 56542\n Air Travel
> Organisation Licence (ATOL)\n Appointed Agents of IATA (IATA)\n
> Incentive Travel & Meet. Association (ITMA)"

I have the impression that the csv module is not much help here. Yes, it parses the data, but you only need a subset of it, and that subset may be harder to extract from the parsed rows than from the raw text. I would do the following (all untested):

-0- Read in the file as a single string.

> I would like to create a set of unique values for all the memberships. i.e.
>
> ATOL
> IT
> ABTA
> etc..

-1- Use re.findall with a pattern like r'\((\w+)\)' to get the membership codes, then build a set out of the result list.

> and also I would like to extract the No. 56542

-2- Same again, with r'No. (\d+)' (a set is probably not necessary here).

> and lastly I would like to map each record to the set of unique
> membership values, so that:
>
> I have a dictionary like:
>
> {0: ['1', '('ATOL', 'IT')'],
> 1: ['2','('ATOL', 'IATA')'],
> 2: ['3','('ABTA', 'ATOL', 'IATA', 'ITMA')']}

(The dict looks strange...)

-3- Now "splitlines" the string, and on each line
    * read the ordinal number (maybe useless, actually)
    * read the codes again

I don't know what your dict is useful for, as the keys are simple ordinals. It's a list in disguise, actually. Unless you want this instead:

{'1': ['ATOL', 'IT'], '2': ['ATOL', 'IATA'], '3': ['ABTA', 'ATOL', 'IATA', 'ITMA']}

But here the keys are still predictable ordinals.

denis
------
la vita e estrany

> Here is what I have so far:
>
> >>> import csv
> >>> inputFile = open(str("licences.csv"), 'r')
> >>> outputDic = {}
> >>> keyIndex = 0
> >>> fileReader = csv.reader(inputFile)
> >>> for line in fileReader:
> ...     outputDic[keyIndex] = line
> ...     keyIndex+=1
> ...
> >>> print outputDic
> {0: ['2', 'Air Travel Organisation Licence (ATOL) Appointed Agents of
> IATA (IATA)'], 1: ['3', ' "Association of British Travel Agents (ABTA)
> No. 56542 Air Travel'], 2: ['Organisation Licence (ATOL) Appointed
> Agents of IATA (IATA) Incentive Travel & Meet. Association (ITMA)"']}
>
> So basically I would like to keep only the data in the brackets, i.e.
> (ABTA) etc..
>
> Cheers
>
> Norman
> _______________________________________________
> Tutor maillist - Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
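
Steps -0- to -3- from the reply above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested solution for the real file: the sample text is typed in by hand, it assumes each record sits on one physical line with "\n" stored as the two literal characters backslash and n, and the names CODE, NUMBER, ORDINAL and per_record are made up for the example. It is written for Python 3, and the dot in the "No." pattern is escaped so that it only matches a literal dot.

```python
import re

# -0- In practice: text = open("licences.csv").read()
# Here the data is inlined so the sketch is self-contained (assumed layout:
# one record per line, "\n" kept as literal backslash + n).
text = r'''"1","Air Travel Organisation Licence (ATOL)\n Operates Inclusive Tours (IT)"
"2","Air Travel Organisation Licence (ATOL)\n Appointed Agents of IATA (IATA)"
"3", "Association of British Travel Agents (ABTA) No. 56542\n Air Travel Organisation Licence (ATOL)\n Appointed Agents of IATA (IATA)\n Incentive Travel & Meet. Association (ITMA)"
'''

CODE = re.compile(r'\((\w+)\)')      # word characters inside parentheses
NUMBER = re.compile(r'No\. (\d+)')   # digits following "No. "
ORDINAL = re.compile(r'"(\d+)"')     # quoted record number at line start

# -1- unique membership codes over the whole file
codes = set(CODE.findall(text))

# -2- licence numbers (a list; duplicates are not expected)
numbers = NUMBER.findall(text)

# -3- per-record codes, keyed by the record's own ordinal
per_record = {}
for line in text.splitlines():
    match = ORDINAL.match(line)
    if match is None:
        continue  # skip blank or continuation lines
    per_record[match.group(1)] = CODE.findall(line)

print(sorted(codes))
print(numbers)
print(per_record)
```

If records in the real file do wrap over several physical lines, the splitlines step would need replacing, for example by letting csv.reader assemble the rows first and running the code pattern on the second field of each row.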