[Tutor] 1 to N searches in files

2012-12-02 Thread Spectral None
Hi all

I have two files (File A and File B) with strings of data in them (each string 
on a separate line). Basically, each string in File B should be compared with 
all the strings in File A, and the output should list the matched and unmatched 
lines, optionally written to a third file, File C.

File A: Unique strings
File B: Can have duplicate strings (that is, "string1" may appear more than 
once)

My code currently looks like this:

-
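import difflib

# Raw strings keep the backslashes in the Windows paths literal
FirstFile = open(r'C:\FileA.txt', 'r')
SecondFile = open(r'C:\FileB.txt', 'r')
ThirdFile = open(r'C:\FileC.txt', 'w')

a = FirstFile.readlines()
b = SecondFile.readlines()

mydiff = difflib.Differ()
# A Differ object is not callable; compare() produces the delta lines.
# list() keeps the lines so they can be printed and written out later.
results = list(mydiff.compare(a, b))
print("\n".join(results))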

#ThirdFile.writelines(results)

FirstFile.close()
SecondFile.close()
ThirdFile.close()
-

However, it seems that the results do not correctly reflect the 
matched/unmatched lines. As an example, if FileA contains "string1" and FileB 
contains multiple occurrences of "string1", it seems that the first occurrence 
matches correctly but subsequent "string1"s are treated as unmatched strings.

I suspect I don't understand Differ() well and that it is not doing what I 
hoped it would. Is Differ() comparing the first line to the first line, the 
second line to the second line, and so on, rather than comparing each line in 
File B against every line in File A?
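To make the question concrete, here is a small sketch of what Differ reports when the second sequence repeats a line. The lists stand in for the two files and their contents are made up for illustration; nothing here comes from the original FileA.txt or FileB.txt.

```python
import difflib

# Hypothetical stand-ins for the lines of File A and File B
a = ["apple\n", "pear\n"]
b = ["apple\n", "apple\n", "kiwi\n"]

# compare() aligns the two sequences and describes how to turn a into b;
# it does not test each line of b for membership in a
delta = list(difflib.Differ().compare(a, b))
print("".join(delta), end="")

# Lines that start with two spaces are the ones Differ treats as common
common = [line for line in delta if line.startswith("  ")]
print(len(common))
```

Only one "apple" line is reported as common. The repeat is reported as an added line, which matches the behaviour described above: Differ produces an edit script between two sequences, so a line in File A can pair with at most one line in File B.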

Regards
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Tutor Digest, Vol 106, Issue 5

2012-12-03 Thread Spectral None
From: "tutor-requ...@python.org" 
To: tutor@python.org 
Sent: Sunday, 2 December 2012, 17:34
Subject: Tutor Digest, Vol 106, Issue 5

Send Tutor mailing list submissions to
    tutor@python.org

To subscribe or unsubscribe via the World Wide Web, visit
    http://mail.python.org/mailman/listinfo/tutor
or, via email, send a message with subject or body 'help' to
    tutor-requ...@python.org

You can reach the person managing the list at
    tutor-ow...@python.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Tutor digest..."


Today's Topics:

  1. Re: reverse diagonal (Dave Angel)
  2. To Find the Answers (Sujit Baniya)
  3. Re: To Find the Answers (Dave Angel)
  4. Re: reverse diagonal (Steven D'Aprano)
  5. 1 to N searches in files (Spectral None)
  6. Re: 1 to N searches in files (Steven D'Aprano)


--

Message: 1
Date: Sat, 01 Dec 2012 23:18:44 -0500
From: Dave Angel 
To: eryksun 
Cc: tutor@python.org
Subject: Re: [Tutor] reverse diagonal
Message-ID: <50bad6a4.1020...@davea.name>
Content-Type: text/plain; charset=UTF-8

On 12/01/2012 09:55 PM, eryksun wrote:
> On Sat, Dec 1, 2012 at 9:35 PM, Dave Angel  wrote:
>>
>> [M[i][~i] for i,dummy in enumerate(M) ]
> 
> Since enumerate() iterates the rows, you could skip the first index:
> 
>    >>> [row[~i] for i,row in enumerate(M)]
>    [3, 5, 7]
> 
> 

Great job.  And I can't see any way to improve on that.
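For anyone reading along without the earlier messages, a minimal sketch of the idiom follows. The matrix M is an assumption, chosen only because it reproduces the [3, 5, 7] shown in the quoted session.

```python
# Hypothetical 3x3 matrix that yields the [3, 5, 7] result quoted above
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# ~i equals -i - 1, so row[~i] is the element i places in from the right
anti_diagonal = [row[~i] for i, row in enumerate(M)]
print(anti_diagonal)  # [3, 5, 7]
```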

-- 

DaveA


--

Message: 2
Date: Sun, 2 Dec 2012 10:24:19 +0545
From: Sujit Baniya 
To: tutor@python.org
Subject: [Tutor] To Find the Answers
Message-ID:
    
Content-Type: text/plain; charset="iso-8859-1"

Write a function named countRepresentations that returns the number
of ways that an amount of money in rupees can be represented as rupee
notes. For this problem we only use rupee notes in denominations of
1, 2, 5, 10 and 20 rupee notes.

The signature of the function is:
    def countRepresentations(int numRupees)

For example, countRepresentations(12) should return 15 because 12
rupees can be represented in the following 15 ways.
  1. 12 one rupee notes
  2. 1 two rupee note plus 10 one rupee notes
  3. 2 two rupee notes plus 8 one rupee notes
  4. 3 two rupee notes plus 6 one rupee notes
  5. 4 two rupee notes plus 4 one rupee notes
  6. 5 two rupee notes plus 2 one rupee notes
  7. 6 two rupee notes
  8. 1 five rupee note plus 7 one rupee notes
  9. 1 five rupee note, 1 two rupee note and 5 one rupee notes
  10. 1 five rupee note, 2 two rupee notes and 3 one rupee notes
  11. 1 five rupee note, 3 two notes and 1 one rupee note
  12. 2 five rupee notes and 2 one rupee notes
  13. 2 five rupee notes and 1 two rupee note
  14. 1 ten rupee note and 2 one rupee notes
  15. 1 ten rupee note and 1 two rupee note

Hint: Use a nested loop that looks like this. Please fill in the
blanks intelligently, i.e. minimize the number of times that the if
statement is executed.
for (int rupee20=0; rupee20<=__; rupee20++)
   for (int rupee10=0; rupee10<=__; rupee10++)
      for (int rupee5=0; rupee5<=__; rupee5++)
         for (int rupee2=0; rupee2<=__; rupee2++)
            for (int rupee1=0; rupee1<=__; rupee1++)
            {
                if (___)
                   count++
            }
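One possible reading of the exercise, sketched in Python, is shown here. This is not the intended answer to the blanks in the hint: it assumes each loop can be bounded by the amount still left to pay, and it drops the innermost loop and the if test altogether, since the remainder can always be paid in one-rupee notes. The name count_representations is a Python-style rename of the function in the exercise.

```python
def count_representations(num_rupees):
    """Count the ways to make num_rupees from 1, 2, 5, 10 and 20 rupee notes."""
    count = 0
    for rupee20 in range(num_rupees // 20 + 1):
        left20 = num_rupees - 20 * rupee20
        for rupee10 in range(left20 // 10 + 1):
            left10 = left20 - 10 * rupee10
            for rupee5 in range(left10 // 5 + 1):
                left5 = left10 - 5 * rupee5
                for rupee2 in range(left5 // 2 + 1):
                    # Whatever remains is paid in one-rupee notes, so each
                    # choice of larger notes gives exactly one representation
                    count += 1
    return count


print(count_representations(12))  # 15, as in the worked example
```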



-- 
Sujit Baniya

--

Message: 3
Date: Sun, 02 Dec 2012 00:27:26 -0500
From: Dave Angel 
To: Sujit Baniya 
Cc: tutor@python.org
Subject: Re: [Tutor] To Find the Answers
Message-ID: <50bae6be.4070...@davea.name>
Content-Type: text/plain; charset=ISO-8859-1

On 12/01/2012 11:39 PM, Sujit Baniya wrote:
> Write a function named countRepresentations that returns the number
> of ways that an amount of money in rupees can be represented as rupee
> notes. For this problem we only use rupee notes in denominations of
> 1, 2, 5, 10 and 20 rupee notes.
>
> The signature of the function is:
>     def countRepresentations(int numRupees)
>
> For example, countRepresentations(12) should return 15 because 12
> rupees can be represented in the following 15 ways.
>   1. 12 one rupee notes
>   2. 1 two rupee note plus 10 one rupee notes
>   3. 2 two rupee notes plus 8 one rupee notes
>   4. 3 two rupee notes plus 6 one rupee notes
>   5. 4 two rupee notes plus 4 one rupee notes
>   6. 5 two rupee notes plus 2 one rupee no

Re: [Tutor] 1 to N searches in files

2012-12-03 Thread Spectral None
From: Dave Angel 
To: Spectral None  
Cc: "tutor@python.org"  
Sent: Sunday, 2 December 2012, 20:05
Subject: Re: [Tutor] 1 to N searches in files

On 12/02/2012 03:53 AM, Spectral None wrote:
> Hi all
>
> I have two files (File A and File B) with strings of data in them (each 
> string on a separate line). Basically, each string in File B will be compared 
> with all the strings in File A and the resulting output is to show a list of 
> matched/unmatched lines and optionally to write to a third File C
>
> File A: Unique strings
> File B: Can have duplicate strings (that is, "string1" may appear more than 
> once)
>
> My code currently looks like this:
>
> -
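> import difflib
>
> # Raw strings keep the backslashes in the Windows paths literal
> FirstFile = open(r'C:\FileA.txt', 'r')
> SecondFile = open(r'C:\FileB.txt', 'r')
> ThirdFile = open(r'C:\FileC.txt', 'w')
>
> a = FirstFile.readlines()
> b = SecondFile.readlines()
>
> mydiff = difflib.Differ()
> # A Differ object is not callable; compare() produces the delta lines.
> # list() keeps the lines so they can be printed and written out later.
> results = list(mydiff.compare(a, b))
> print("\n".join(results))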
>
> #ThirdFile.writelines(results)
>
> FirstFile.close()
> SecondFile.close()
> ThirdFile.close()
> -
>
> However, it seems that the results do not correctly reflect the 
> matched/unmatched lines. As an example, if FileA contains "string1" and FileB 
> contains multiple occurrences of "string1", it seems that the first 
> occurrence matches correctly but subsequent "string1"s are treated as 
> unmatched strings.
>
> I am thinking perhaps I don't understand Differ() that well and that it is 
> not doing what I hoped to do? Is Differ() comparing first line to first line 
> and second line to second line etc in contrast to what I wanted to do?
>
> Regards
>
>
> Let me guess your goal, and then, on that assumption, discuss your code.


> I think your File A is supposed to be a dictionary of valid words
> (strings).  You want to process File B, checking each line against that
> dictionary, and make a list of which lines are "valid" (in the
> dictionary), and another of which lines are not (missing from the
> dictionary).  That's one list for matched lines, and one for unmatched.

> That isn't even close to what difflib does.  This can be solved with
> minimal code, but not by starting with difflib.

> What you should do is to loop through File A, adding all the lines to a
> set called valid_dictionary.  Calling set(FirstFile) can do that in one
> line, without even calling readlines().
> Then a simple loop can build the desired lists.  The matched_lines is
> simply all lines which are in the dictionary, while unmatched_lines are
> those which are not.

> The heart of the comparison could simply look like:

>     if line in valid_dictionary:
>         matched_lines.append(line)
>     else:
>         unmatched_lines.append(line)


> -- 

> DaveA
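The approach described above can be sketched as a complete function. The function name split_matches and the sample lists are made up for illustration; the names valid_dictionary, matched_lines and unmatched_lines follow the message.

```python
def split_matches(valid_lines, candidate_lines):
    """Split candidate_lines into those found in valid_lines and those not."""
    # A set gives fast membership tests, however many lines File A has
    valid_dictionary = set(valid_lines)
    matched_lines = []
    unmatched_lines = []
    for line in candidate_lines:
        if line in valid_dictionary:
            matched_lines.append(line)
        else:
            unmatched_lines.append(line)
    return matched_lines, unmatched_lines


# Hypothetical contents standing in for File A and File B
file_a = ["string1\n", "string2\n"]
file_b = ["string1\n", "string3\n", "string1\n"]

matched, unmatched = split_matches(file_a, file_b)
print(matched)    # ['string1\n', 'string1\n']
print(unmatched)  # ['string3\n']
```

Because File B is walked as a list, repeated lines are kept and reported each time they occur. With real files, an open file object can be passed for either argument; note that a final line without a trailing newline will not match the same text followed by one, so stripping each line first may be worthwhile.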

-

Hi Dave

Your solution seems to work:

setA = set(FileA)
setB = set(FileB)

for line in setB:
  if line in setA:
    matched_lines.write(line)
  else:
    non_matched_lines.write(line)

The results contain no duplicates either. Thanks for helping out.
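The duplicates disappear because File B is turned into a set before the loop runs. Under that assumption the loop is equivalent to two set operations, sketched here with made-up data in place of the files:

```python
# Hypothetical contents standing in for File A and File B
set_a = {"string1", "string2"}
set_b = set(["string1", "string3", "string1"])  # the repeat collapses here

matched = set_b & set_a      # lines of File B that also appear in File A
unmatched = set_b - set_a    # lines of File B that do not

# Sets are unordered, so sort before printing for a stable result
print(sorted(matched))    # ['string1']
print(sorted(unmatched))  # ['string3']
```

Sets also discard the original line order. If the order or the repeat count of lines in File B matters, looping over the file itself rather than over set(FileB) keeps both.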

Regards


Re: [Tutor] 1 to N searches in files

2012-12-03 Thread Spectral None
From: "tutor-requ...@python.org" 
To: tutor@python.org 
Sent: Monday, 3 December 2012, 21:57
Subject: Tutor Digest, Vol 106, Issue 9


Today's Topics:

  1. Re: Tutor Digest, Vol 106, Issue 5 (Spectral None)


--

Message: 1
Date: Mon, 3 Dec 2012 21:55:35 +0800 (SGT)
From: Spectral None 
To: "tutor@python.org" 
Subject: Re: [Tutor] Tutor Digest, Vol 106, Issue 5
Message-ID:
    <1354542935.11347.yahoomail...@web190604.mail.sg3.yahoo.com>
Content-Type: text/plain; charset="iso-8859-1"
