[Tutor] 1 to N searches in files
Hi all

I have two files (File A and File B) with strings of data in them (each string on a separate line). Basically, each string in File B will be compared with all the strings in File A and the resulting output is to show a list of matched/unmatched lines and optionally to write to a third File C

File A: Unique strings
File B: Can have duplicate strings (that is, "string1" may appear more than once)

My code currently looks like this:

-
FirstFile = open('C:\FileA.txt', 'r')
SecondFile = open('C:\FileB.txt', 'r')
ThirdFile = open('C:\FileC.txt', 'w')

a = FirstFile.readlines()
b = SecondFile.readlines()

mydiff = difflib.Differ()
results = mydiff(a,b)
print("\n".join(results))

#ThirdFile.writelines(results)

FirstFile.close()
SecondFile.close()
ThirdFile.close()
-

However, it seems that the results do not correctly reflect the matched/unmatched lines. As an example, if FileA contains "string1" and FileB contains multiple occurrences of "string1", it seems that the first occurrence matches correctly but subsequent "string1"s are treated as unmatched strings.

I am thinking perhaps I don't understand Differ() that well and that it is not doing what I hoped to do? Is Differ() comparing first line to first line and second line to second line etc in contrast to what I wanted to do?

Regards
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
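The symptom described in this post can be reproduced with a small sketch. difflib.Differ treats the two inputs as ordered sequences and aligns them, so each line of File A can pair with at most one line of File B; it does not test whether a line merely occurs somewhere in the other file. The sample data below (file_a, file_b) is made up for illustration. Note also that a Differ object is not callable, so the comparison goes through its compare() method.

```python
import difflib

# Made-up sample data standing in for the contents of File A and File B.
file_a = ["string1\n", "string2\n"]
file_b = ["string1\n", "string1\n", "string2\n"]

# Differ objects are not callable; compare() yields the delta lines.
delta = list(difflib.Differ().compare(file_a, file_b))
print("".join(delta), end="")

# Lines prefixed with two spaces are common to both sequences;
# lines prefixed with "+ " exist only in the second sequence.
unchanged = [d for d in delta if d.startswith("  ")]
added = [d for d in delta if d.startswith("+ ")]
print(len(unchanged), "paired,", len(added), "reported as extra")
```

One of the two "string1" lines in file_b is reported as an addition even though "string1" exists in file_a, which is the behaviour the post describes.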
Re: [Tutor] Tutor Digest, Vol 106, Issue 5
From: "tutor-requ...@python.org"
To: tutor@python.org
Sent: Sunday, 2 December 2012, 17:34
Subject: Tutor Digest, Vol 106, Issue 5

Send Tutor mailing list submissions to
    tutor@python.org

To subscribe or unsubscribe via the World Wide Web, visit
    http://mail.python.org/mailman/listinfo/tutor
or, via email, send a message with subject or body 'help' to
    tutor-requ...@python.org

You can reach the person managing the list at
    tutor-ow...@python.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Tutor digest..."

Today's Topics:

   1. Re: reverse diagonal (Dave Angel)
   2. To Find the Answers (Sujit Baniya)
   3. Re: To Find the Answers (Dave Angel)
   4. Re: reverse diagonal (Steven D'Aprano)
   5. 1 to N searches in files (Spectral None)
   6. Re: 1 to N searches in files (Steven D'Aprano)

--

Message: 1
Date: Sat, 01 Dec 2012 23:18:44 -0500
From: Dave Angel
To: eryksun
Cc: tutor@python.org
Subject: Re: [Tutor] reverse diagonal
Message-ID: <50bad6a4.1020...@davea.name>
Content-Type: text/plain; charset=UTF-8

On 12/01/2012 09:55 PM, eryksun wrote:
> On Sat, Dec 1, 2012 at 9:35 PM, Dave Angel wrote:
>>
>> [M[i][~i] for i,dummy in enumerate(M) ]
>
> Since enumerate() iterates the rows, you could skip the first index:
>
>     >>> [row[~i] for i,row in enumerate(M)]
>     [3, 5, 7]
>
>

Great job. And I can't see any way to improve on that.

--
DaveA

--

Message: 2
Date: Sun, 2 Dec 2012 10:24:19 +0545
From: Sujit Baniya
To: tutor@python.org
Subject: [Tutor] To Find the Answers
Message-ID:
Content-Type: text/plain; charset="iso-8859-1"

Write a function named countRepresentations that returns the number
of ways that an amount of money in rupees can be represented as rupee
notes. For this problem we only use rupee notes in denominations of
1, 2, 5, 10 and 20 rupee notes.

The signature of the function is:
    def countRepresentations(int numRupees)

For example, countRepresentations(12) should return 15 because 12
rupees can be represented in the following 15 ways.
   1. 12 one rupee notes
   2. 1 two rupee note plus 10 one rupee notes
   3. 2 two rupee notes plus 8 one rupee notes
   4. 3 two rupee notes plus 6 one rupee notes
   5. 4 two rupee notes plus 4 one rupee notes
   6. 5 two rupee notes plus 2 one rupee notes
   7. 6 two rupee notes
   8. 1 five rupee note plus 7 one rupee notes
   9. 1 five rupee note, 1 two rupee note and 5 one rupee notes
  10. 1 five rupee note, 2 two rupee notes and 3 one rupee notes
  11. 1 five rupee note, 3 two notes and 1 one rupee note
  12. 2 five rupee notes and 2 one rupee notes
  13. 2 five rupee notes and 1 two rupee note
  14. 1 ten rupee note and 2 one rupee notes
  15. 1 ten rupee note and 1 two rupee note

Hint: Use a nested loop that looks like this. Please fill in the
blanks intelligently, i.e. minimize the number of times that the if
statement is executed.

    for (int rupee20=0; rupee20<=__; rupee20++)
        for (int rupee10=0; rupee10<=__; rupee10++)
            for (int rupee5=0; rupee5<=__; rupee5++)
                for (int rupee2=0; rupee2<=__; rupee2++)
                    for (int rupee1=0; rupee1<=__; rupee1++)
                    {
                        if (___)
                            count++
                    }

--
Sujit Baniya
-- next part --
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20121202/ffecad69/attachment-0001.html>

--

Message: 3
Date: Sun, 02 Dec 2012 00:27:26 -0500
From: Dave Angel
To: Sujit Baniya
Cc: tutor@python.org
Subject: Re: [Tutor] To Find the Answers
Message-ID: <50bae6be.4070...@davea.name>
Content-Type: text/plain; charset=ISO-8859-1

On 12/01/2012 11:39 PM, Sujit Baniya wrote:
> Write a function named countRepresentations that returns the number
> of ways that an amount of money in rupees can be represented as rupee
> notes. For this problem we only use rupee notes in denominations of
> 1, 2, 5, 10 and 20 rupee notes.
>
> The signature of the function is:
>     def countRepresentations(int numRupees)
>
> For example, countRepresentations(12) should return 15 because 12
> rupees can be represented in the following 15 ways.
>   1. 12 one rupee notes
>   2. 1 two rupee note plus 10 one rupee notes
>   3. 2 two rupee notes plus 8 one rupee notes
>   4. 3 two rupee notes plus 6 one rupee notes
>   5. 4 two rupee notes plus 4 one rupee notes
>   6. 5 two rupee notes plus 2 one rupee no
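For the countRepresentations exercise quoted in Message 2, one possible reading of the hint is sketched below in Python. This is an illustration only, not an answer given anywhere in the thread. The name count_representations and the variable names are assumptions, and the sketch goes one step beyond the hint: each loop is bounded by the amount still left, and the two innermost loops are replaced by arithmetic, since once the number of two-rupee notes is chosen the one-rupee notes are fully determined.

```python
def count_representations(num_rupees):
    """Count the ways to pay num_rupees with 1, 2, 5, 10 and 20 rupee notes."""
    count = 0
    # Each loop only runs over note counts that fit in the amount left.
    for rupee20 in range(num_rupees // 20 + 1):
        left_after_20 = num_rupees - 20 * rupee20
        for rupee10 in range(left_after_20 // 10 + 1):
            left_after_10 = left_after_20 - 10 * rupee10
            for rupee5 in range(left_after_10 // 5 + 1):
                left_after_5 = left_after_10 - 5 * rupee5
                # 0 .. left_after_5 // 2 two-rupee notes are possible, and
                # one-rupee notes fill the remainder in exactly one way.
                count += left_after_5 // 2 + 1
    return count


print(count_representations(12))  # 15, matching the example in the exercise
```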
Re: [Tutor] 1 to N searches in files
From: Dave Angel
To: Spectral None
Cc: "tutor@python.org"
Sent: Sunday, 2 December 2012, 20:05
Subject: Re: [Tutor] 1 to N searches in files

> On 12/02/2012 03:53 AM, Spectral None wrote:
>> Hi all
>>
>> I have two files (File A and File B) with strings of data in them (each
>> string on a separate line). Basically, each string in File B will be compared
>> with all the strings in File A and the resulting output is to show a list of
>> matched/unmatched lines and optionally to write to a third File C
>>
>> File A: Unique strings
>> File B: Can have duplicate strings (that is, "string1" may appear more than
>> once)
>>
>> My code currently looks like this:
>>
>> -
>> FirstFile = open('C:\FileA.txt', 'r')
>> SecondFile = open('C:\FileB.txt', 'r')
>> ThirdFile = open('C:\FileC.txt', 'w')
>>
>> a = FirstFile.readlines()
>> b = SecondFile.readlines()
>>
>> mydiff = difflib.Differ()
>> results = mydiff(a,b)
>> print("\n".join(results))
>>
>> #ThirdFile.writelines(results)
>>
>> FirstFile.close()
>> SecondFile.close()
>> ThirdFile.close()
>> -
>>
>> However, it seems that the results do not correctly reflect the
>> matched/unmatched lines. As an example, if FileA contains "string1" and FileB
>> contains multiple occurrences of "string1", it seems that the first
>> occurrence matches correctly but subsequent "string1"s are treated as
>> unmatched strings.
>>
>> I am thinking perhaps I don't understand Differ() that well and that it is
>> not doing what I hoped to do? Is Differ() comparing first line to first line
>> and second line to second line etc in contrast to what I wanted to do?
>>
>> Regards
>>
>
> Let me guess your goal, and then, on that assumption, discuss your code.
>
> I think your File A is supposed to be a dictionary of valid words
> (strings). You want to process File B, checking each line against that
> dictionary, and make a list of which lines are "valid" (in the
> dictionary), and another of which lines are not (missing from the
> dictionary). That's one list for matched lines, and one for unmatched.
>
> That isn't even close to what difflib does. This can be solved with
> minimal code, but not by starting with difflib.
>
> What you should do is to loop through File A, adding all the lines to a
> set called valid_dictionary. Calling set(FirstFile) can do that in one
> line, without even calling readlines().
>
> Then a simple loop can build the desired lists. The matched_lines is
> simply all lines which are in the dictionary, while unmatched_lines are
> those which are not.
>
> The heart of the comparison could simply look like:
>
>     if line in valid_dictionary:
>         matched_lines.append(line)
>     else:
>         unmatched_lines.append(line)
>
> --
> DaveA

-

Hi Dave

Your solution seems to work:

setA = set(FileA)
setB = set(FileB)

for line in setB:
    if line in setA:
        matched_lines.writelines(line)
    else:
        non_matched_lines.writelines(line)

There are no duplicates in the results as well.

Thanks for helping out

Regards
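The approach Dave describes can be put together as a small self-contained sketch. The function name partition_lines and the sample data are assumptions made for illustration; only valid_dictionary-style set membership comes from the thread. Unlike the set(FileB) variant above, this version walks File B in its original order, so duplicates are kept in the output. It also strips the trailing newline before comparing, on the assumption that the last line of a file may lack one.

```python
def partition_lines(valid_lines, candidate_lines):
    """Split candidate_lines into (matched, unmatched) against valid_lines.

    Both arguments may be open text files or any iterable of strings.
    """
    # Strip line endings so a final line without "\n" still compares equal.
    valid = {line.rstrip("\n") for line in valid_lines}
    matched, unmatched = [], []
    for line in candidate_lines:
        text = line.rstrip("\n")
        if text in valid:
            matched.append(text)
        else:
            unmatched.append(text)
    return matched, unmatched


# Made-up stand-ins for the contents of File A and File B.
file_a = ["string1\n", "string2\n", "string3"]
file_b = ["string1\n", "string4\n", "string1\n", "string3\n"]

matched, unmatched = partition_lines(file_a, file_b)
print(matched)    # ['string1', 'string1', 'string3']
print(unmatched)  # ['string4']
```

With real files, the two open file objects can be passed directly in place of the lists, and the results written to File C with one write per line. Whether duplicates from File B should appear in the output is a choice: wrapping candidate_lines in set() first gives the de-duplicated behaviour reported in the reply above, at the cost of losing the original order.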
Re: [Tutor] 1 to N searches in files
From: "tutor-requ...@python.org"
To: tutor@python.org
Sent: Monday, 3 December 2012, 21:57
Subject: Tutor Digest, Vol 106, Issue 9

Today's Topics:

   1. Re: Tutor Digest, Vol 106, Issue 5 (Spectral None)

--

Message: 1
Date: Mon, 3 Dec 2012 21:55:35 +0800 (SGT)
From: Spectral None
To: "tutor@python.org"
Subject: Re: [Tutor] Tutor Digest, Vol 106, Issue 5
Message-ID: <1354542935.11347.yahoomail...@web190604.mail.sg3.yahoo.com>
Content-Type: text/plain; charset="iso-8859-1"