date:20090504

Re: [Tutor] Iterating over a long list with regular expressions and changing each item?

2009-05-04 Thread spir

On Sun, 3 May 2009 21:59:23 -0400,
Dan Liang  wrote:

> Hi tutors,
> 
> I am working on a file and need to replace each occurrence of a certain
> label (part of speech tag in this case) by a number of sub-labels. The file
> has the following format:
> 
> word1  \tTag1
> word2  \tTag2
> word3  \tTag3
> 
> Now the tags are complex and I wanted to split them in a tab-delimited
> fashion to have this:
> 
> word1   \t   Tag1Part1   \t   Tag2Part2   \t   Tag3Part3
> 
> I searched online for some solution and found the code below which uses a
> dictionary to store the tags that I want to replace in keys and the sub-tags
> as values. The problem with this is that it sometimes replaces tags that are
> not surrounded by spaces, which I do not like to happen*1*. Also, I wanted
> each new sub-tag to be followed by a tab, so that the new items that I end
> up having in my file are tab-delimited. For this, I put tabs between the
> items of each key in the dictionary*2*. I started thinking that this will
> not be the best solution of the problem and perhaps a script that uses
> regular expressions would be better*3*. Since I am new to Python, I thought
> I should ask you for your thoughts for a best solution. The items I want to
> replace are about 150 and I did not know how to iterate over them with
> regular expressions.

*3* I think regular expressions are not the proper tool here, partly because
you are new to Python and regexes are really hairy, but above all because
they help with parsing, not rewriting. Here the input is very simple, while
most of the work lies in the replacement function.

*1* If the source really looks like the above then, as I understand it, "tags
that are not surrounded by spaces" can only occur inside words (e.g. the word
'noun'). One more reason for not using regexes. You just need to read each
line, keep the left part unchanged and deal with the tag. The issue is that
you replace tags "blindly", without taking into account the simple structure
of the source, which would help you.

*2* I would rather have a dict whose values are lists of (sub)tags, and let a
replacement function deal with output formatting.
word_dic = {
'abbrev': ['abbrev, null, null'],
'adj': ['adj, null, null'],
'adv': ['adv, null, null'],
...
}
It's not only cleaner, it also lets you change the formatting at will. The
dict is only constant *data*. Separating data from processing is good practice.
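A minimal sketch of that separation, written for Python 3. The names TAGS and
format_tags and the two sample entries are assumptions made for illustration;
they are not part of the original post.

```python
# Hypothetical sample data: each tag maps to a list of sub-tags.
TAGS = {
    'case_def_gen': ['case_def', 'gen', 'null'],
    'adj': ['adj', 'null', 'null'],
}

def format_tags(tag, sep='\t'):
    """Return the sub-tags of `tag` joined by `sep`."""
    return sep.join(TAGS[tag])

# The data stays the same; only the formatting choice changes.
tab_version = format_tags('case_def_gen')
comma_version = format_tags('adj', sep=',')
```

Because the separator lives in the function, switching from tabs to another
delimiter does not require touching the dictionary.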

I would do something like (untested):


tags = {.., 'foo':['foo1','foo2','foo3'],..} # tag dict
TAB = '\t'

def newlyTaggedWord(line):
    (word,tag) = line.split(TAB)    # separate parts of line, keeping data only
    new_tags = tags['tag']          # read in dict
    tagging = TAB.join(new_tags)    # join with TABs
    return word + TAB + tagging     # formatted result

def replaceTagging(source_name, target_name):
    source_file = file(source_name, 'r')
    source = source_file.read()     # not really necessary
    target_file = open(target_name, "w")
    # replacement loop
    for line in source:
        new_line = newlyTaggedWord(line) + '\n'
        target_file.write(new_line)
    source_file.close()
    target_file.close()

if __name__ == "__main__":
    source_name = sys.argv[1]
    target_name = sys.argv[2]
    replaceTagging(source_name, target_name)



> Below is my previous code:
> 
> 
> #!usr/bin/python
> 
> import re, sys
> f = file(sys.argv[1])
> readed= f.read()
> 
> def replace_words(text, word_dic):
>     for k, v in word_dic.iteritems():
>         text = text.replace(k, v)
>     return text
> 
> # the dictionary has target_word:replacement_word pairs
> 
> word_dic = {
> 'abbrev': 'abbrevnullnull',
> 'adj': 'adjnullnull',
> 'adv': 'advnullnull',
> 'case_def_acc': 'case_defaccnull',
> 'case_def_gen': 'case_defgennull',
> 'case_def_nom': 'case_defnomnull',
> 'case_indef_acc': 'case_indefaccnull',
> 'verb_part': 'verb_partnullnull'}
> 
> 
> # call the function and get the changed text
> 
> myString = replace_words(readed, word_dic)
> 
> 
> fout = open(sys.argv[2], "w")
> fout.write(myString)
> fout.close()
> 
> --dan


--
la vita e estrany
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] Iterating over a long list with regular expressions and changing each item?

2009-05-04 Thread Alan Gauld



"Dan Liang"  wrote 


> def replaceTagging(source_name, target_name):
>   source_file = file(source_name, 'r')
>   source = source_file.read()   # not really necessary

This reads the entire file as a single string.

>   target_file = open(target_name, "w")
>   # replacement loop
>   for line in source:

This iterates over the characters in the string, not over lines.
Remove the two source lines above and use

for line in open(source_name):
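The difference can be seen in a small sketch, written for Python 3. Here
io.StringIO stands in for an open file, and the sample text, the helper name
split_line and the rstrip call are assumptions added for illustration.

```python
import io

TAB = '\t'
sample = "word1\tadj\nword2\tadv\n"   # hypothetical two-line input

# Iterating over a string yields single characters ...
chars = [item for item in sample][:3]

# ... whereas iterating over a file object yields whole lines.
lines = [line for line in io.StringIO(sample)]

def split_line(line):
    """Split one 'word<TAB>tag' line into its two fields."""
    word, tag = line.rstrip('\n').split(TAB)
    return word, tag

pairs = [split_line(line) for line in lines]

# A single character cannot be unpacked into two names.
try:
    split_line(chars[0])
    unpack_failed = False
except ValueError:
    unpack_failed = True
```

Note also that tags['tag'] in the posted function looks up the literal key
'tag'; tags[tag], using the variable, is presumably what was meant.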

HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/


Re: [Tutor] Iterating over a long list with regular expressions and changing each item?

2009-05-04 Thread spir

On Mon, 4 May 2009 10:15:35 -0400,
Dan Liang  wrote:

> Hi Spir and tutors,
> 
> Thank you Spir for your response. I went ahead and tried your code after
> adding a couple of dictionary entries, as below:
> ---Code Begins---
> #!usr/bin/python
> 
> tags = {
> 
> 
>  'case_def_gen':['case_def','gen','null'],
>  'nsuff_fem_pl':['nsuff','null', 'null'],
>  'abbrev': ['abbrev, null, null'],
>  'adj': ['adj, null, null'],
>  'adv': ['adv, null, null'],} # tag dict
> TAB = '\t'
> 
> def newlyTaggedWord(line):
>    (word,tag) = line.split(TAB)    # separate parts of line, keeping data only
>    new_tags = tags['tag']          # read in dict--Index by string
>
>    tagging = TAB.join(new_tags)    # join with TABs
>    return word + TAB + tagging     # formatted result
>
> def replaceTagging(source_name, target_name):
>    source_file = file(source_name, 'r')
>    source = source_file.read()   # not really necessary
>    target_file = open(target_name, "w")
>    # replacement loop
>    for line in source:
>        new_line = newlyTaggedWord(line) + '\n'
>        target_file.write(new_line)
>    source_file.close()
>    target_file.close()
>
> if __name__ == "__main__":
>    source_name = sys.argv[1]
>    target_name = sys.argv[2]
>    replaceTagging(source_name, target_name)
> 
> ---Code Ends---
> 
> The file I am working on looks like this:
> 
> 
>   word  \t case_def_gen
>   word  \t nsuff_fem_pl
>   word  \t adj
>   word  \t abbrev
>   word  \t adv
> 
> I get the following error when I try to run it, and I cannot figure out
> where the problem lies:
> 
> ---Error Begins---
> 
> Traceback (most recent call last):
>   File "tag.formatter.py", line 36, in ?
>     replaceTagging(source_name, target_name)
>   File "tag.formatter.py", line 28, in replaceTagging
>     new_line = newlyTaggedWord(line) + '\n'
>   File "tag.formatter.py", line 16, in newlyTaggedWord
>     (word,tag) = line.split(TAB)# separate parts of line, keeping data only
> ValueError: unpack list of wrong size
> 
> ---Error Ends---
> 
> Any ideas?
> 
> Thank you!
> 
> --dan

Good that I mentioned "untested" ;-)
Can you decipher the error message? What can you reason or guess from it?
Where, how, and why does the error happen? What kind of test could you run to
narrow down the diagnosis?
I ask all of this because you do not tell us what reasoning or experiments
you tried in order to solve the issue yourself; instead you just write
"Any ideas?".

Denis
--
la vita e estrany

Re: [Tutor] returning the entire line when regex matches

2009-05-04 Thread Nick Burgess

So far the script works fine: it leaves out the lines I don't want, and
I can add new domain names as needed. It looks like this:

#!/usr/bin/python
import re

outFile = open('outFile.dat', 'w')
log = file("log.dat", 'r').read().split('Source') # Set the line delimiter
for line in log:
    if not re.search(r'notneeded.com|notneeded1.com',line):
        outFile.write(line)

I tried the "in" method, but it missed any other strings I put in, as if
the pipe had no effect.  More complex strings will likely be needed, so
perhaps re might be better?

The next task would be to parse all files in all subdirectories,
regardless of the name of the file, as the file names are the same but
the directory names change.

I have been playing with os.walk, but I'm not sure if it is the best way.

for root, dirs, files in os.walk

I guess merging all of the files into one big one before the parse
would work, but I would need help with that too.

The tutelage is much appreciated.

-nick

On Sun, May 3, 2009 at 6:21 PM, Alan Gauld  wrote:
>
> "Alan Gauld"  wrote
>
>>> How do I make this code print lines NOT containing the string 'Domains'?
>>>
>>
>> Don't use regex, use in:
>>
>> for line in log:
>>    if "Domains" in line:
>>        print line
>
> Should, of course,  be
>
>      if "Domains" not in line:
>          print line
>
> Alan G.
>
>

Re: [Tutor] Iterating over a long list with regular expressions and changing each item?

2009-05-04 Thread Dan Liang

Hi Spir and tutors,

Thank you Spir for your response. I went ahead and tried your code after
adding a couple of dictionary entries, as below:
---Code Begins---
#!usr/bin/python

tags = {


 'case_def_gen':['case_def','gen','null'],
 'nsuff_fem_pl':['nsuff','null', 'null'],
 'abbrev': ['abbrev, null, null'],
 'adj': ['adj, null, null'],
 'adv': ['adv, null, null'],} # tag dict
TAB = '\t'

def newlyTaggedWord(line):
   (word,tag) = line.split(TAB)    # separate parts of line, keeping data only
   new_tags = tags['tag']          # read in dict--Index by string

   tagging = TAB.join(new_tags)    # join with TABs
   return word + TAB + tagging     # formatted result

def replaceTagging(source_name, target_name):
   source_file = file(source_name, 'r')
   source = source_file.read()   # not really necessary
   target_file = open(target_name, "w")
   # replacement loop
   for line in source:
       new_line = newlyTaggedWord(line) + '\n'
       target_file.write(new_line)
   source_file.close()
   target_file.close()

if __name__ == "__main__":
   source_name = sys.argv[1]
   target_name = sys.argv[2]
   replaceTagging(source_name, target_name)

---Code Ends---

The file I am working on looks like this:


  word  \t case_def_gen
  word  \t nsuff_fem_pl
  word  \t adj
  word  \t abbrev
  word  \t adv

I get the following error when I try to run it, and I cannot figure out
where the problem lies:

---Error Begins---

Traceback (most recent call last):
  File "tag.formatter.py", line 36, in ?
    replaceTagging(source_name, target_name)
  File "tag.formatter.py", line 28, in replaceTagging
    new_line = newlyTaggedWord(line) + '\n'
  File "tag.formatter.py", line 16, in newlyTaggedWord
    (word,tag) = line.split(TAB)# separate parts of line, keeping data only
ValueError: unpack list of wrong size

---Error Ends---

Any ideas?

Thank you!

--dan



Re: [Tutor] Iterating over a long list with regular expressions and changing each item?

2009-05-04 Thread Paul McGuire

Original:
 'case_def_gen':['case_def','gen','null'],
 'nsuff_fem_pl':['nsuff','null', 'null'],
 'abbrev': ['abbrev, null, null'],
 'adj': ['adj, null, null'],
 'adv': ['adv, null, null'],}

Note the values for 'abbrev', 'adj' and 'adv' are not lists, but strings
containing comma-separated lists.

Should be:
 'case_def_gen':['case_def','gen','null'],
 'nsuff_fem_pl':['nsuff','null', 'null'],
 'abbrev': ['abbrev', 'null', 'null'],
 'adj': ['adj', 'null', 'null'],
 'adv': ['adv', 'null', 'null'],}

For much of my own code, I find lists of string literals to be tedious to
enter, and easy to drop a ' character.  This style is a little easier on the
eyes, and harder to screw up.

 'case_def_gen':['case_def gen null'.split()],
 'nsuff_fem_pl':['nsuff null null'.split()],
 'abbrev': ['abbrev null null'.split()],
 'adj': ['adj null null'.split()],
 'adv': ['adv null null'.split()],}

Since all that your code does at runtime with the value strings is
"\t".join() them, then you might as well initialize the dict with these
computed values, for at least some small gain in runtime performance:

 T = lambda s : "\t".join(s.split())
 'case_def_gen' : T('case_def gen null'),
 'nsuff_fem_pl' : T('nsuff null null'),
 'abbrev' :   T('abbrev null null'),
 'adj' :  T('adj null null'),
 'adv' :  T('adv null null'),}
 del T

(Yes, I know PEP8 says *not* to add spaces to line up assignments or other
related values, but I think there are isolated cases where it does help to
see what's going on.  You could even write this as:

 T = lambda s : "\t".join(s.split())
 'case_def_gen' : T('case_def  gen   null'),
 'nsuff_fem_pl' : T('nsuff     null  null'),
 'abbrev' :       T('abbrev    null  null'),
 'adj' :          T('adj       null  null'),
 'adv' :          T('adv       null  null'),}
 del T

and the extra spaces help you to see the individual subtags more easily,
with no change in the resulting values since split() splits on multiple
whitespace the same as a single space.)

Of course you could simply code as:

 'case_def_gen' : T('case_def\tgen\t null'),
 'nsuff_fem_pl' : T('nsuff\tnull\tnull'),
 'abbrev' :   T('abbrev\tnull\tnull'),
 'adj' :  T('adj\tnull\tnull'),
 'adv' :  T('adv\tnull\tnull'),}

But I think readability definitely suffers here; I would probably go with
the penultimate version.
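As a rough illustration of the split-then-join idea, here is a small sketch
for Python 3. The helper name to_tabbed is hypothetical (it plays the role of
the lambda T above), and only two of the dictionary entries are shown.

```python
# Helper in the spirit of the lambda T above; the name to_tabbed is hypothetical.
def to_tabbed(subtags):
    """Collapse any run of whitespace between sub-tags into a single tab."""
    return "\t".join(subtags.split())

tags = {
    'case_def_gen': to_tabbed('case_def  gen   null'),
    'abbrev':       to_tabbed('abbrev    null  null'),
}

# A one-element list holding a comma-separated string is returned unchanged
# by join, which is why the original 'abbrev' entry never gained any tabs.
unsplit = "\t".join(['abbrev, null, null'])
```

The extra spaces inside the literals are only there for the reader; split()
discards them before the tabs are inserted.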

-- Paul



[Tutor] Advanced String Search using operators AND, OR etc..

2009-05-04 Thread Alex Feddor

Hi

I am looking for a method that enables advanced text string searching. Neither
string.find() nor the re module seems to support what I am looking for. The
idea is as follows:

Text ="FDA meeting was successful. New drug is approved for whole sale
distribution!"

I would like to scan the text using AND and OR operators and get -1 or
some other value if the search terms are not found in the text.
Example 01:
search criteria:  "FDA" AND ( "approve*" OR "supported")
The catch is that in the Text variable the words FDA and approve are not
adjacent (other words come in between).
Example 02:
search criteria: "Ben"
The catch is that the code should find only the exact word Ben, not words that
merely start with the letters Ben, such as Benquick, Benseek etc. Only Ben is
the word we are looking for.

I would really appreciate your advice: a code sample or links showing how the
above can be achieved! If possible, I would appreciate a solution that uses a
free-of-charge module.

Cheers,  Alex
PS:
A few months ago I discovered Python. I am amazed at all that can be done
with it. Really cool programming language..

Re: [Tutor] Encode problem

2009-05-04 Thread Pablo P. F. de Faria

Thanks, Kent, but that doesn't solve my problem. In fact, I need
ConfigParser to work with non-ascii characters, since my App may run
in "latin-1" environments (folder and file names). I must find out why
the str() function in the module ConfigParser doesn't use the encoding
defined for the application (# -*- coding: utf-8 -*-). The rest of the
application works properly with utf-8, except for ConfigParser. What I
found out is that ConfigParser seems to make use of the configuration
in Site.py (which is set to 'ascii'), instead of the configuration
defined for the App (if I change . But it is very problematic to
have to change Site.py on every computer... So I wonder if there is a
way to override the settings in Site.py only for my App.

2009/5/1 Kent Johnson :
> On Fri, May 1, 2009 at 4:54 PM, Pablo P. F. de Faria
>  wrote:
>> Hi, Kent.
>>
>> The stack trace is:
>>
>> Traceback (most recent call last):
>>  File "/home/pablo/workspace/E-Dictor/src/MainFrame.py", line 1057, in 
>> OnClose
>>    self.SavePreferences()
>>  File "/home/pablo/workspace/E-Dictor/src/MainFrame.py", line 1068,
>> in SavePreferences
>>    self.cfg.set(u'File Settings',u'Recent files',
>> unicode(",".join(self.recent_files)))
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
>> 12: ordinal not in range(128)
>>
>> The "unicode" function, actually doesn't do any difference... The
>> content of the string being saved is "/home/pablo/Área de
>> Trabalho/teste.xml".
>
> OK, this error is in your code, not the ConfigParser. The problem is with
> ",".join(self.recent_files)
>
> Are the entries in self.recent_files unicode strings? If so, then I
> think the join is trying to convert to a string using the default
> codec. Try
>
> self.cfg.set('File Settings','Recent files',
> ','.join(name.encode('utf-8') for name in self.recent_files))
>
> Looking at the ConfigParser.write() code, it wants the values to be
> strings or convertible to strings by calling str(), so non-ascii
> unicode values will be a problem there. I would use plain strings for
> all the interaction with ConfigParser and convert to Unicode yourself.
>
> Kent
>
> PS Please Reply All to reply to the list.
>



-- 
-
"Estamos todos na sarjeta, mas alguns de nós olham para as estrelas."
(Oscar Wilde)
-
Pablo Faria
Mestrando em Aquisição de Linguagem - IEL/Unicamp
Bolsista técnico FAPESP no Projeto Padrões Rítmicos e Mudança Lingüística
(19) 3521-1570
http://www.tycho.iel.unicamp.br/~pablofaria/
pablofa...@gmail.com

Re: [Tutor] Encode problem

2009-05-04 Thread Pablo P. F. de Faria

Here is the traceback, after the last change you suggested:

Traceback (most recent call last):
  File "/home/pablo/workspace/E-Dictor/src/MainFrame.py", line 1057, in OnClose
    self.SavePreferences()
  File "/home/pablo/workspace/E-Dictor/src/MainFrame.py", line 1069, in SavePreferences
    self.cfg.write(codecs.open(self.properties_file,'w','utf-8'))
  File "/usr/lib/python2.5/ConfigParser.py", line 373, in write
    (key, str(value).replace('\n', '\n\t')))
  File "/usr/lib/python2.5/codecs.py", line 638, in write
    return self.writer.write(data)
  File "/usr/lib/python2.5/codecs.py", line 303, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 27: ordinal not in range(128)

So, in "str(value)" the content is a folder name with an accented character (Á).


Re: [Tutor] Advanced String Search using operators AND, OR etc..

2009-05-04 Thread vince spicer

Advanced string searches can be done with regular expressions, via the re module.

EX:

import re

# \b marks a word boundary, so "Ben" matches but "Benquick" does not
m = re.compile(r"FDA.*?(approved|supported)|\bBen\b")

if m.search(Text):
    print m.search(Text).group()


Vince


On Mon, May 4, 2009 at 6:45 AM, Alex Feddor  wrote:

> Hi
>
> I am looking for method enables advanced text string search. Method
> string.find() or re module seems no  supporting what I am looking for. The
> idea is as follows:
>
> Text ="FDA meeting was successful. New drug is approved for whole sale
> distribution!"
>
> I would like to scan the text using AND and OR operators and gets -1 or
> other value if the searching elements haven't found in the text.
> Example 01:
> search criteria:  "FDA" AND ( "approve*" OR "supported")
> The catch is that in Text variable FDA and approve words  are not one after
> another (other words are in between).
>  Example 02:
> search criteria: "Ben"
> The catch is that code sould find only exact Ben words not also words which
> that has firts three letters Ben such as Benquick, Benseek etc.. Only Ben is
> the right word we are looking for.
>
> I would really appreciated your advice - code sample / links how above can
> be achieved! if possible I would appreciated solution achieved with free of
> charge module.
>
> Cheers,  Alex
> PS:
> A few moths ago I have discovered Python. I am amazed what all can be done
> with it. Really cool programming language..
>

Re: [Tutor] returning the entire line when regex matches

2009-05-04 Thread Alan Gauld

"Nick Burgess"  wrote 

> for line in log:
>     if not re.search(r'notneeded.com|notneeded1.com',line):
>         outFile.write(line)
>
> I tried the in method but it missed any other strings I put in, like
> the pipe has no effect.  More complex strings will likely be needed so
> perhaps re might be better..?

Yes, in only works for simple strings. If you need combinations
then the regex is better.

> I have been playing with os.walk but im not sure if it is the best way.

It is almost certainly the best way.

> I guess merging all of the files into one big one before the parse
> would work but I would need help with that too.

You shouldn't need to do that. Your function can take a file and process
it, so just use os.walk to feed it files one by one as you find them.

If the file names vary you might find glob.glob useful too.

I show examples of using os.walk and glob in the OS topic
in my tutorial. Look under the heading 'Manipulating Files'.

HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/


Re: [Tutor] returning the entire line when regex matches

2009-05-04 Thread Martin Walsh

Nick Burgess wrote:
> So far the script works fine, it avoids printing the lines i want and
> I can add new domain names as needed. It looks like this:
> 
> #!/usr/bin/python
> import re
> 
> outFile = open('outFile.dat', 'w')
> log = file("log.dat", 'r').read().split('Source') # Set the line delimiter
> for line in log:
>     if not re.search(r'notneeded.com|notneeded1.com',line):
>         outFile.write(line)

There is a subtle problem here -- the '.' means match any single
character. I suppose it's unlikely to bite you, but it could -- for
example, a line containing a domain named notneeded12com.net would
match. You should probably escape the dot, and while you're at it
compile the regular expression.

# untested
pattern = re.compile(r'notneeded\.com|notneeded1\.com')
for line in log:
    if not pattern.search(line):
        outFile.write(line)
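If the list of domains grows, the escaping can be left to re.escape instead of
being typed by hand. A small sketch for Python 3; the domains list and the
sample lines are assumptions for illustration.

```python
import re

# Hypothetical list of domains to filter out.
domains = ['notneeded.com', 'notneeded1.com']

# re.escape turns each '.' into a literal dot before the names are joined.
pattern = re.compile('|'.join(re.escape(d) for d in domains))
unescaped = re.compile('|'.join(domains))

line = 'request from notneeded12com.net'
escaped_hit = pattern.search(line) is not None      # no false positive
unescaped_hit = unescaped.search(line) is not None  # '.' matched the '2'
real_hit = pattern.search('request from notneeded1.com') is not None
```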

HTH,
Marty


Re: [Tutor] Iterating over a long list with regular expressions and changing each item?

2009-05-04 Thread Alan Gauld



"Paul McGuire"  wrote


> For much of my own code, I find lists of string literals to be tedious to
> enter, and easy to drop a ' character.  This style is a little easier on
> the eyes, and harder to screw up.
>
> 'case_def_gen':['case_def gen null'.split()],
> 'nsuff_fem_pl':['nsuff null null'.split()],

Shouldn't that be:

'case_def_gen':'case_def gen null'.split(),
'nsuff_fem_pl':'nsuff null null'.split(),

Otherwise you get a list inside a list.
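A quick check of the difference, sketched for Python 3; the variable names are
chosen here for illustration only.

```python
nested = ['case_def gen null'.split()]   # a list whose only item is a list
flat = 'case_def gen null'.split()       # a plain list of three strings

joined = "\t".join(flat)

# join() only accepts strings, so the nested form fails.
try:
    "\t".join(nested)
    join_failed = False
except TypeError:
    join_failed = True
```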


> 'abbrev' :   T('abbrev null null'),
> 'adj' :  T('adj null null'),
> 'adv' : T('adv null null'),}
>
> (Yes, I know PEP8 says *not* to add spaces to line up assignments or
> other related values, but I think there are isolated cases where it does
> help to see what's going on.  You could even write this as:

Absolutely! There are a few of the Python style PEPs that I disagree
with, this looks like another one.

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/ 




Re: [Tutor] Advanced String Search using operators AND, OR etc..

2009-05-04 Thread Alan Gauld


"Alex Feddor"  wrote


> I am looking for method enables advanced text string search. Method
> string.find() or re module seems no  supporting what I am looking for.
> The idea is as follows:

The re module almost certainly can do what you want, but regexes
are notoriously hard to master and often obscure.

> Text ="FDA meeting was successful. New drug is approved for whole sale
> distribution!"
>
> Example 01:
> search criteria:  "FDA" AND ( "approve*" OR "supported")

A regex can search for FDA followed by either approve or supported.
There is no AND operator in regex, since AND just implies a sequence
within the string. There is an OR operator, however, which is '|'.

> The catch is that in Text variable FDA and approve words  are not one
> after another (other words are in between).

And regex allows you to specify a sequence of anything after FDA.

> Example 02:
> search criteria: "Ben"
> The catch is that code sould find only exact Ben words not also words
> which that has firts three letters Ben such as Benquick, Benseek etc..
> Only Ben is the right word we are looking for.

And again regex provides ways of ensuring an exact match.

> I would really appreciated your advice - code sample / links how above
> can be achieved! if possible I would appreciated solution achieved with
> free of charge module.

You need to go through one of the many regex tutorials to understand
what can be done with these extremely powerful search tools (and
what can't!). There is a very basic introduction in my tutorial which
unfortunately doesn't cover all that you need here but might be a
good starting point.

The Python HOWTO is another good start and goes a bit deeper
with a different approach:

http://docs.python.org/howto/regex.html
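For what it is worth, the two examples might be expressed roughly as follows
(Python 3). The pattern names are hypothetical, and reading "approve*" as
"approve followed by any word characters" is an assumption about the intended
wildcard.

```python
import re

text = ("FDA meeting was successful. New drug is approved for whole sale "
        "distribution!")

# "FDA" AND ("approve*" OR "supported"), with FDA coming first:
# .*? allows any characters in between, \w* stands in for the * wildcard.
fda_then_verdict = re.compile(r'\bFDA\b.*?\b(approve\w*|supported)\b')

# Exact word "Ben": \b is a word boundary.
ben_only = re.compile(r'\bBen\b')

match = fda_then_verdict.search(text)
verdict = match.group(1) if match else None

ben_in_names = ben_only.search('Benquick and Benseek') is not None
ben_alone = ben_only.search('Ask Ben today') is not None
```

This only finds the terms in the stated order; a true order-independent AND
would need either lookaheads or two separate searches.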

HTH,

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/ 




Re: [Tutor] Advanced String Search using operators AND, OR etc..

2009-05-04 Thread Emile van Sebille


On 5/4/2009 11:03 AM Alan Gauld said...

> "Alex Feddor"  wrote
>
>> I am looking for method enables advanced text string search. Method
>> string.find() or re module seems no  supporting what I am looking for.
>> The idea is as follows:
>
> The re module almost certainly can do what you want but regex
> are notoriously hard to master and often obscure.

Seconded.  I almost always find it faster and easier to simply write the
python routine I need rather than suffer the pain that results from
getting the regex to actually perform what's needed ...
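A plain-Python routine along those lines could look like this sketch (Python
3). The helper names are hypothetical, and treating the search as
case-insensitive is an assumption that the original question does not state.

```python
import string

def words_of(text):
    """Lower-case `text`, strip surrounding punctuation, return its words."""
    return [w.strip(string.punctuation).lower() for w in text.split()]

def has_word(words, word):
    """True if `word` occurs as a whole word."""
    return word.lower() in words

def has_prefix(words, prefix):
    """True if any word starts with `prefix` (the 'approve*' style term)."""
    return any(w.startswith(prefix.lower()) for w in words)

text = ("FDA meeting was successful. New drug is approved for whole sale "
        "distribution!")
words = words_of(text)

# "FDA" AND ("approve*" OR "supported")
example_01 = has_word(words, 'FDA') and (
    has_prefix(words, 'approve') or has_word(words, 'supported'))

# Exact word "Ben" only: Benquick and Benseek must not count.
example_02 = has_word(words_of('Benquick met Benseek'), 'Ben')
```

Ordinary and/or expressions then give the AND and OR operators directly, with
no dependence on the order of the words in the text.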


Emile


Re: [Tutor] Encode problem

2009-05-04 Thread Kent Johnson

On Mon, May 4, 2009 at 10:09 AM, Pablo P. F. de Faria
 wrote:
> Thanks, Kent, but that doesn't solve my problem. In fact, I need
> ConfigParser to work with non-ascii characters, since my App may run
> in "latin-1" environments (folders e files names).

Yes, I understand that.

Python has two different kinds of strings - byte strings, which are
instances of class str,  and unicode strings, which are instances of
class unicode. String objects are byte strings - sequences of bytes.
They are not limited to ascii characters, they hold encoded strings in
any supported encoding. In particular, UTF-8 data is stored in string
objects.

Unicode objects hold "unencoded" unicode data. (I know, Unicode is an
encoding, but it is useful to think of it this way in this context.)

str.decode() converts a string to a unicode object. unicode.encode()
converts a unicode object to a (byte) string. Both of these functions
take the encoding as a parameter. When Python is given a string, but
it needs a unicode object, or vice-versa, it will encode or decode as
needed. The encode or decode will use the system default encoding,
which as you have discovered is ascii. If the data being encoded or
decoded contains non-ascii characters, you get an error that you are
familiar with. These errors indicate that you are not correctly
handling encoded data.
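
A small illustration of the encode/decode round trip, as a sketch only. It
is written for Python 3, where the names differ from the ones used above:
bytes plays the role of Python 2's str, and str plays the role of Python
2's unicode. The sample character is an arbitrary choice.

```python
# Python 3 names: bytes ~ Python 2 str, str ~ Python 2 unicode.
text = "Á"                               # unencoded (unicode) text

utf8_bytes = text.encode("utf-8")        # text -> bytes, encoding named explicitly
latin1_bytes = text.encode("latin-1")    # same character, different bytes

round_trip = utf8_bytes.decode("utf-8")  # bytes -> text, same encoding

# Decoding with the ascii codec fails for any byte >= 0x80, which is the
# kind of error described above.
try:
    utf8_bytes.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(utf8_bytes, latin1_bytes, round_trip, ascii_failed)
```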

See the references at the end of this essay for more background information:
http://personalpages.tds.net/~kent37/stories/00018.html

> I must find out why
> the str() function in the module ConfigParser doesn't use the encoding
> defined for the application (# -*- coding: utf-8 -*-).

Because the encoding declaration doesn't define an encoding for the
application. It defines the encoding of the text of the source file
containing the declaration, that's all.

> The rest of the
> application works properly with utf-8, except for ConfigParser.

I guess you have been lucky.

> What I
> found out is that ConfigParser seems to make use of the configuration
> in Site.py (which is set to 'ascii'), instead of the configuration
> defined for the App (if I change . But this is very problematic to
> have to change Site.py in every computer... So I wonder if there is a
> way to replace the settings in Site.py only for my App.

It is the wrong solution. What you should do is
- understand why you have a problem. Hint: it's not a ConfigParser bug
- give only utf-8-encoded strings to ConfigParser
- don't use the codecs module, because the data you are writing will
already be encoded.

Kent

Re: [Tutor] Encode problem

2009-05-04 Thread Kent Johnson

On Mon, May 4, 2009 at 1:32 PM, Pablo P. F. de Faria
 wrote:
> Hi, all.
>
> I've found something that worked for me, but I'm not sure of its
> secureness. The solution is:
>
> reload(sys)
> sys.setdefaultencoding('utf-8')
>
> That's exactly what I wanted to do, but is this good practice?

No. You should understand and fix the underlying problem.

Kent

Re: [Tutor] Advanced String Search using operators AND, OR etc..

2009-05-04 Thread spir

Le Mon, 4 May 2009 10:38:31 -0600,
vince spicer  s'exprima ainsi:

> Advanced Strings searches are Regex via re module.
> 
> EX:
> 
> import re
> 
> m = re.compile("(FDA.*?(approved|supported)|Ben[^\s])*")
> 
> if m.search(Text):
>     print m.search(Text).group()
> 
> 
> Vince

This is not at all what the original poster is looking for, I guess (or maybe I 
didn't understand?). A regex can only match one individual sample of a request 
expressed in a logical form with AND and OR clauses.
What he wants is a module able to decode and perform logical searches. It can 
certainly be built on top of regex, with a layer that:
* decodes logical requests
* performs "sub-matches" for items in the request
* then builds unions (OR) or intersections (AND) of results
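
A rough sketch of the last two steps of such a layer (Python 3). It does
not decode a request string; the request is spelled out by hand with set
operators. The helper name docs_matching, the treatment of '*' as a
wildcard, and the second and third sample texts are all assumptions made
for illustration:

```python
import re

def docs_matching(term, docs):
    """Indices of the documents containing a word that matches term.

    A '*' in term is treated as "any further word characters"
    (an assumption, modelled on the "approve*" example).
    """
    body = re.escape(term).replace(r"\*", r"\w*")
    pattern = re.compile(r"\b" + body + r"\b")
    return {i for i, doc in enumerate(docs) if pattern.search(doc)}

docs = [
    "FDA meeting was successful. New drug is approved for whole sale distribution!",
    "The drug was approved last year.",   # hypothetical
    "FDA meeting postponed.",             # hypothetical
]

# "FDA" AND ("approve*" OR "supported")
either = docs_matching("approve*", docs) | docs_matching("supported", docs)  # OR = union
result = docs_matching("FDA", docs) & either                                 # AND = intersection

print(sorted(result))
```

Keeping each sub-match as a set of document indices is what lets OR and
AND map directly onto union and intersection.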

I do not know of anything like that for python. But it would be a nice project 
;-)

Denis
--
la vita e estrany

Re: [Tutor] Encode problem

2009-05-04 Thread spir

Le Mon, 4 May 2009 11:09:25 -0300,
"Pablo P. F. de Faria"  s'exprima ainsi:

> Thanks, Kent, but that doesn't solve my problem. In fact, I need
> ConfigParser to work with non-ascii characters, since my App may run
> in "latin-1" environments (folders e files names). I must find out why
> the str() function in the module ConfigParser doesn't use the encoding
> defined for the application (# -*- coding: utf-8 -*-). The rest of the
> application works properly with utf-8, except for ConfigParser. What I
> found out is that ConfigParser seems to make use of the configuration
> in Site.py (which is set to 'ascii'), instead of the configuration
> defined for the App (if I change . But this is very problematic to
> have to change Site.py in every computer... So I wonder if there is a
> way to replace the settings in Site.py only for my App.


The parameter in question is the default encoding. We used to read it 
(sys.getdefaultencoding()) and define it (e.g. sys.setdefaultencoding('utf8')), 
but I remember something has changed in later versions of python.
Someone?

Denis
--
la vita e estrany

Re: [Tutor] returning the entire line when regex matches

2009-05-04 Thread Nick Burgess

Compiling the regular expression works great. I can't find the tutorial
Mr. Gauld is referring to!!  I searched python.org and alan-g.me.uk.
Does anyone have a link?



On Mon, May 4, 2009 at 1:46 PM, Martin Walsh  wrote:
> Nick Burgess wrote:
>> So far the script works fine, it avoids printing the lines i want and
>> I can add new domain names as needed. It looks like this:
>>
>> #!/usr/bin/python
>> import re
>>
>> outFile = open('outFile.dat', 'w')
>> log = file("log.dat", 'r').read().split('Source') # Set the line delimiter
>> for line in log:
>>     if not re.search(r'notneeded.com|notneeded1.com',line):
>>         outFile.write(line)
>
> There is a subtle problem here -- the '.' means match any single
> character. I suppose it's unlikely to bite you, but it could -- for
> example, a line containing a domain named notneeded12com.net would
> match. You should probably escape the dot, and while you're at it
> compile the regular expression.
>
> # untested
> pattern = re.compile(r'notneeded\.com|notneeded1\.com')
> for line in log:
>    if not pattern.search(line):
>        outFile.write(line)
>
> HTH,
> Marty
>
>
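
To make the point about the unescaped dot concrete, here is a small check
(Python 3 syntax, otherwise the same patterns as above). The second test
line is an assumption added for contrast:

```python
import re

loose = re.compile(r'notneeded.com|notneeded1.com')      # '.' = any character
strict = re.compile(r'notneeded\.com|notneeded1\.com')   # '\.' = literal dot

line = 'Source notneeded12com.net'   # the example domain from the reply

loose_hit = loose.search(line) is not None     # the '.' swallows the '2'
strict_hit = strict.search(line) is not None   # no literal dot in that position

# A line that really contains one of the unwanted domains still matches.
real_hit = strict.search('Source notneeded1.com') is not None

print(loose_hit, strict_hit, real_hit)
```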

Re: [Tutor] Encode problem

2009-05-04 Thread Sander Sweers

2009/5/4 Kent Johnson :
> str.decode() converts a string to a unicode object. unicode.encode()
> converts a unicode object to a (byte) string. Both of these functions
> take the encoding as a parameter. When Python is given a string, but
> it needs a unicode object, or vice-versa, it will encode or decode as
> needed. The encode or decode will use the system default encoding,
> which as you have discovered is ascii. If the data being encoded or
> decoded contains non-ascii characters, you get an error that you are
> familiar with. These errors indicate that you are not correctly
> handling encoded data.

Very interesting read Kent!

So if I get it correctly you are saying the join() is joining strings
of str and unicode type? Then would adding a couple of "print
type(the_string), the_string" before the .join() help find which
string is not unicode or is unicode where it shouldn't be?

Thanks
Sander

Re: [Tutor] Encode problem

2009-05-04 Thread Kent Johnson

On Mon, May 4, 2009 at 3:54 PM, Sander Sweers  wrote:
> 2009/5/4 Kent Johnson :
>> str.decode() converts a string to a unicode object. unicode.encode()
>> converts a unicode object to a (byte) string. Both of these functions
>> take the encoding as a parameter. When Python is given a string, but
>> it needs a unicode object, or vice-versa, it will encode or decode as
>> needed. The encode or decode will use the system default encoding,
>> which as you have discovered is ascii. If the data being encoded or
>> decoded contains non-ascii characters, you get an error that you are
>> familiar with. These errors indicate that you are not correctly
>> handling encoded data.
>
> Very interesting read Kent!
>
> So if I get it correctly you are saying the join() is joining strings
> of str and unicode type? Then would it help to add a couple of "print
> type(the_string), the_string" before the .join() help finding which
> string is not unicode or is unicode where it shouldn't?

I think that was the original problem though I haven't seen enough
code to be sure. The current problem is (I think) that he is writing
encoded data to a codec writer that expects unicode input, so it is
trying to convert str to unicode (so it can convert back to str!) and
failing.
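
A sketch of the same mistake in Python 3 terms, with io.TextIOWrapper
standing in for the codec writer (an assumption; the original code uses
codecs.open). Python 2 tries an implicit ascii decode and raises
UnicodeDecodeError, whereas Python 3 simply refuses the bytes with a
TypeError, but the cause is the same: the writer wants unencoded text.

```python
import io

raw = io.BytesIO()
writer = io.TextIOWrapper(raw, encoding="utf-8")   # encodes on the way out

writer.write("Á")          # unicode text in; the writer does the encoding
writer.flush()
encoded_once = raw.getvalue()

try:
    writer.write("Á".encode("utf-8"))   # already-encoded data: rejected
    rejected = False
except TypeError:
    rejected = True

print(encoded_once, rejected)
```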

Kent

Re: [Tutor] Advanced String Search using operators AND, OR etc..

2009-05-04 Thread Kent Johnson

On Mon, May 4, 2009 at 8:45 AM, Alex Feddor  wrote:
> Hi
>
> I am looking for method enables advanced text string search. Method
> string.find() or re module seems no  supporting what I am looking for. The
> idea is as follows:
>
> Text ="FDA meeting was successful. New drug is approved for whole sale
> distribution!"
>
> I would like to scan the text using AND and OR operators and gets -1 or
> other value if the searching elements haven't found in the text.

There are some Python search engines that will do this. They might be
overkill unless you have a lot of text to search:
http://whoosh.ca/
http://lucene.apache.org/pylucene/
http://pypi.python.org/pypi/pyswish/20080920

Kent

Re: [Tutor] Advanced String Search using operators AND, OR etc..

2009-05-04 Thread Kent Johnson

On Mon, May 4, 2009 at 12:38 PM, vince spicer  wrote:
> Advanced Strings searches are Regex via re module.
>
> EX:
>
> import re
>
> m = re.compile("(FDA.*?(approved|supported)|Ben[^\s])*")
>
> if m.search(Text):
>     print m.search(Text).group()

This won't match "approved FDA" which may be desired. It also quickly
gets complicated as the search expressions get more complex. Regex
would also have a hard time with something like
"FDA" AND NOT "approved"

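A short sketch of both points (Python 3). The sample sentences and the
helper has_word are made up for illustration; the idea is only that
separate whole-word tests combined with and / not are indifferent to word
order, while a single ordered pattern is not:

```python
import re

text_a = "FDA has approved the new drug."
text_b = "The new drug was approved by the FDA."   # reversed word order
text_c = "FDA meeting postponed."

ordered = re.compile(r"FDA.*?approved")

def has_word(word, text):
    """True if word occurs as a whole word in text."""
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None

ordered_a = ordered.search(text_a) is not None
ordered_b = ordered.search(text_b) is not None   # the ordered pattern misses this

# "FDA" AND "approved", independent of order
both_b = has_word("FDA", text_b) and has_word("approved", text_b)

# "FDA" AND NOT "approved"
fda_not_approved = has_word("FDA", text_c) and not has_word("approved", text_c)

print(ordered_a, ordered_b, both_b, fda_not_approved)
```
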
Kent

[Tutor] quick question to open(filename, 'r') vs. file(filename, 'r')

2009-05-04 Thread David

Dear list,

in different books I come across different syntax for dealing with
files. It seems that open(filename, 'r') and file(filename, 'r') are
used interchangeably, and I wonder what this is all about. Is there a
reason why Python allows such ambiguity here?

Cheers for a quick shot of enlightenment ;-)

David

Re: [Tutor] quick question to open(filename, 'r') vs. file(filename, 'r')

2009-05-04 Thread Bill Campbell

On Tue, May 05, 2009, David wrote:
>Dear list,
>
>in different books I come across different syntax for dealing with
>files. It seems that open(filename, 'r') and file(filename, 'r') are
>used interchangeably, and I wonder what this is all about. Is there a
>reason why Python allows such ambiguity here?
>
>Cheers for a quick shot of enlightenment ;-)

``pydoc file'' is your friend.  It says open is an alias for file.

Bill
-- 
INTERNET:   b...@celestial.com  Bill Campbell; Celestial Software LLC
URL: http://www.celestial.com/  PO Box 820; 6641 E. Mercer Way
Voice:  (206) 236-1676  Mercer Island, WA 98040-0820
Fax:(206) 232-9186  Skype: jwccsllc (206) 855-5792

A petty thief is put in jail. A great brigand becomes ruler of a
State. -- Chuang Tzu

Re: [Tutor] quick question to open(filename, 'r') vs. file(filename, 'r')

2009-05-04 Thread bob gailer


David wrote:

Dear list,

in different books I come across different syntax for dealing with
files. It seems that open(filename, 'r') and file(filename, 'r') are
used interchangeably, and I wonder what this is all about. Is there a
reason why Python allows such ambiguity here?
  


regarding file, the docs say:

Constructor function for the file type, described further in section 
3.9, ``File Objects''. The constructor's arguments are the same as those 
of the open() built-in function described below.
When opening a file, it's preferable to use open() instead of invoking 
this constructor directly. file is more suited to type testing (for 
example, writing "isinstance(f, file)").


Unfortunately no explanation as to WHY open is preferred. I have long 
wondered that myself.


Perhaps someone with more enlightenment can tell us!

--
Bob Gailer
Chapel Hill NC
919-636-4239

[Tutor] how to reference a function itself when accessing its private functions?

2009-05-04 Thread Tim Michelsen


Dear Tutors and fellow pythonistas,
I would like to get access to the private methods of my function.

For instance:
How can I reference the docstring of a function within the function itself?

Please have a look at the code below and assist me.

Thanks and regards,
Timmie

### CODE ###

s = 'hello'

def show(str):
    """prints str"""
    print str

    return str



def show2(str):
    """prints str"""
    print str
    d = self.__doc__
    print d

>>> show2(s)
hello
---
NameError Traceback (most recent call last)

 in ()

 in show2(str)

NameError: global name 'self' is not defined


Re: [Tutor] quick question to open(filename, 'r') vs. file(filename, 'r')

2009-05-04 Thread Emile van Sebille


On 5/4/2009 2:50 PM bob gailer said...

David wrote:

Dear list,

in different books I come across different syntax for dealing with
files. It seems that open(filename, 'r') and file(filename, 'r') are
used interchangeably, and I wonder what this is all about. Is there a
reason why Python allows such ambiguity here?


Backwards compatibility.  The file type was introduced in python 2.2, 
before which there was open.


Emile


Re: [Tutor] how to reference a function itself when accessing its private functions?

2009-05-04 Thread Emile van Sebille


On 5/4/2009 3:37 PM Tim Michelsen said...

Dear Tutors and fellow pythonistas,
I would like to get access to the private methods of my function.

For instance:
Who can I reference the docstring of a function within the function itself?




def show2(str):
    """prints str"""
    print str
    d = self.__doc__
    print d


>>> def show2(str):
...     """prints str"""
...     print str
...     print globals()['show2'].__doc__
...
>>> show2('hello')
hello
prints str
>>>


This is the easy way -- ie, you know where to look and what name to use. 
 You can discover the name using the inspect module, but it can get 
ugly.  If you're interested start with...


from inspect import getframeinfo, currentframe
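
A minimal sketch of where that import could lead (Python 3 syntax). It
assumes the function is defined at module level, so that its name can be
looked up in the frame's globals; methods and nested functions would need
more work, which is where it gets ugly:

```python
from inspect import getframeinfo, currentframe

def show2(text):
    """prints text"""
    print(text)
    frame = currentframe()
    name = getframeinfo(frame).function   # name of the running function
    func = frame.f_globals[name]          # assumes a module-level function
    return func.__doc__

doc = show2("hello")
print(doc)
```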

HTH,

Emile


[Tutor] Conversion question

2009-05-04 Thread Tom Green

First, thanks in advance for any insight on how to assist in making me a
better Python programmer.

Here is my question.  I work with a lot of sockets and most of them require
hex data.  I am usually given a string of data to send to the socket.
Example:

"414243440d0a"

Is there a way in Python to say this is a string of HEX characters like
Perl's pack?  Right now I have to take the string and add a \x to every two
values i.e. \x41\x42...

Sometimes my string values are 99+ bytes in length.  I did write a parsing
program that would basically loop thru the string and insert the \x, but I
was wondering if there was another or better way.

Again, thanks in advance for any feedback.

Mike.

Re: [Tutor] returning the entire line when regex matches

2009-05-04 Thread Alan Gauld


Mr. Gauld is referring to!!  I searched python.org and alan-g.me.uk.
Does anyone have a link?


I posted a link to the Python howto and my tutorial is at alan-g.me.uk
You will find it on the contents frame under Regular Expressions...
Its in the Advanced Topics section.


--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/


Re: [Tutor] Conversion question

2009-05-04 Thread Emile van Sebille


On 5/4/2009 4:17 PM Tom Green said...
First, thanks in advance for any insight on how to assist in making me a 
better Python programmer.


Here is my question.  I work with a lot of sockets and most of them 
require hex data.  I am usually given a string of data to send to the 
socket.  Example:


"414243440d0a"

Is there a way in Python to say this is a string of HEX characters like 
Perl's pack?  Right now I have to take the string and add a \x to every 
two values i.e. \x41\x42...



import binascii
binascii.a2b_hex('41424344')
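
Applied to the full example string from the question, as a sketch in
Python 3 (where the result is a bytes object, and bytes.fromhex offers a
built-in alternative):

```python
import binascii

hex_string = "414243440d0a"               # the example from the question

payload = binascii.a2b_hex(hex_string)    # hex text -> raw bytes
same = bytes.fromhex(hex_string)          # Python 3 built-in equivalent

# And back again, e.g. for logging what was sent.
back = binascii.b2a_hex(payload).decode("ascii")

print(repr(payload), back)
```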

Emile


Re: [Tutor] quick question to open(filename, 'r') vs. file(filename, 'r')

2009-05-04 Thread Alan Gauld

"Emile van Sebille"  wrote in message 
news:gtnrtf$pi...@ger.gmane.org...

On 5/4/2009 2:50 PM bob gailer said...

David wrote:

Dear list,

in different books I come across different syntax for dealing with
files. It seems that open(filename, 'r') and file(filename, 'r') are
used interchangeably, and I wonder what this is all about. Is there a
reason why Python allows such ambiguity here?


Backwards compatibility.  The file type was introduced in python 2.2, 
before which there was open.


And file has been removed again in Python v3
In fact open is now an alias for io.open and no longer simply returns
a file object - in fact the file type itself is gone too!

A pity, there are cases where I found file() more intuitive than
open and vice versa so liked having both available. The fact that it
looked like creating an instance of a class seemed to fit well
in OO code.
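
For anyone who wants to see this on a Python 3 installation, a quick
check; the temporary file is only there so that something can be opened:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("hello\n")
    text_type = type(f)       # type returned for text mode

with open(path, "rb") as f:
    binary_type = type(f)     # type returned for binary read mode

same_function = open is io.open

print(same_function, text_type, binary_type)
```

The type of the returned object depends on the mode, which is why there
is no longer one file type to instantiate.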

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/l2p/ 




Re: [Tutor] Conversion question

2009-05-04 Thread Tom Green

Thank you, I didn't realize it was that easy.  I tried binascii before and I
thought it didn't work properly.

I appreciate it.

Mike.

On Mon, May 4, 2009 at 7:40 PM, Emile van Sebille  wrote:

> On 5/4/2009 4:17 PM Tom Green said...
>
>> First, thanks in advance for any insight on how to assist in making me a
>> better Python programmer.
>>
>> Here is my question.  I work with a lot of sockets and most of them
>> require hex data.  I am usually given a string of data to send to the
>> socket.  Example:
>>
>> "414243440d0a"
>>
>> Is there a way in Python to say this is a string of HEX characters like
>> Perl's pack?  Right now I have to take the string and add a \x to every two
>> values i.e. \x41\x42...
>>
>
>
> import binascii
> binascii.a2b_hex('41424344')
>
> Emile
>
>

Re: [Tutor] Conversion question

2009-05-04 Thread Alan Gauld



"Tom Green"  wrote

Here is my question.  I work with a lot of sockets and most of them require
hex data.  I am usually given a string of data to send to the socket.
Example:

"414243440d0a"

Is there a way in Python to say this is a string of HEX characters like
Perl's pack?  Right now I have to take the string and add a \x to every two
values i.e. \x41\x42...


Assuming you actually want to send the hex values rather than
a hex string representation then the way I'd send that would be
to convert that to a number using int() then pack it for
transmission using the struct module.
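
A sketch of that route in Python 3. The format string ">I" (big-endian,
4-byte unsigned) is an assumption about what the receiving end expects; a
fixed struct format only fits fixed-size values, so the longer example
string is handled with int.to_bytes instead:

```python
import struct

hex_string = "41424344"

number = int(hex_string, 16)          # hex text -> integer
packed = struct.pack(">I", number)    # 4-byte big-endian unsigned int

# Arbitrary-length hex strings: one byte per two hex digits.
long_hex = "414243440d0a"
long_bytes = int(long_hex, 16).to_bytes(len(long_hex) // 2, "big")

print(repr(packed), repr(long_bytes))
```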

Sometimes my string values are 99+ bytes in length.  I did write a parsing
program that would basically loop thru the string and insert the \x, but I
was wondering if there was another or better way.


OK, Maybe you do want to send the hex representation rather than
the actual data (I can't think why unless you have a very strange
parser at the other end). In that case I think you do need  to insert
the \x characters.


--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/ 




Re: [Tutor] how to reference a function itself when accessing its private functions?

2009-05-04 Thread Kent Johnson

On Mon, May 4, 2009 at 6:37 PM, Tim Michelsen
 wrote:

> Who can I reference the docstring of a function within the function itself?

You can refer to the function by name inside the function. By the time
the body is actually executed, the name is defined:
In [1]: def show2(s):
   ...:     """prints s"""
   ...:     print s
   ...:     print show2.__doc__

In [2]: show2("test")
test
prints s

> Please have a look at the code below and assist me.
>
> def show(str):
>    """prints str"""
>    print str

It's a good idea not to use the names of builtins, such as 'str', as
variable names in your program.

Kent

Re: [Tutor] Tutor Digest, Vol 63, Issue 8

2009-05-04 Thread Dan Liang

as:
>
>  T = lambda s : "\t".join(s.split())
>  'case_def_gen' : T('case_def  gen  null'),
>  'nsuff_fem_pl' : T('nsuff null null'),
>  'abbrev' :   T('abbrevnull null'),
>  'adj' :  T('adj   null null'),
>  'adv' :  T('adv   null null'),}
>  del T
>
> and the extra spaces help you to see the individual subtags more easily,
> with no change in the resulting values since split() splits on multiple
> whitespace the same as a single space.)
>
> Of course you could simply code as:
>
>  'case_def_gen' : T('case_def\tgen\t null'),
>  'nsuff_fem_pl' : T('nsuff\tnull\tnull'),
>  'abbrev' :   T('abbrev\tnull\tnull'),
>  'adj' :  T('adj\tnull\tnull'),
>  'adv' :  T('adv\tnull\tnull'),}
>
> But I think readability definitely suffers here, I would probably go with
> the penultimate version.
>
> -- Paul
>
>
>
>
> --
>
> Message: 2
> Date: Mon, 4 May 2009 14:45:06 +0200
> From: Alex Feddor 
> Subject: [Tutor] Advanced String Search using operators AND, OR etc..
> To: tutor@python.org
> Message-ID:
><5bf184e30905040545i78bc75b8ic78eabf44a55a...@mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi
>
> I am looking for method enables advanced text string search. Method
> string.find() or re module seems no  supporting what I am looking for. The
> idea is as follows:
>
> Text ="FDA meeting was successful. New drug is approved for whole sale
> distribution!"
>
> I would like to scan the text using AND and OR operators and gets -1 or
> other value if the searching elements haven't found in the text.
> Example 01:
> search criteria:  "FDA" AND ( "approve*" OR "supported")
> The catch is that in Text variable FDA and approve words  are not one after
> another (other words are in between).
> Example 02:
> search criteria: "Ben"
> The catch is that code sould find only exact Ben words not also words which
> that has firts three letters Ben such as Benquick, Benseek etc.. Only Ben
> is
> the right word we are looking for.
>
> I would really appreciated your advice - code sample / links how above can
> be achieved! if possible I would appreciated solution achieved with free of
> charge module.
>
> Cheers,  Alex
> PS:
> A few moths ago I have discovered Python. I am amazed what all can be done
> with it. Really cool programming language..
> -- next part --
> An HTML attachment was scrubbed...
> URL: <
> http://mail.python.org/pipermail/tutor/attachments/20090504/bbd34b5a/attachment-0001.htm
> >
>
> --
>
> Message: 3
> Date: Mon, 4 May 2009 11:09:25 -0300
> From: "Pablo P. F. de Faria" 
> Subject: Re: [Tutor] Encode problem
> To: Kent Johnson 
> Cc: *tutor python 
> Message-ID:
><3ea81d4c0905040709m78a45d11j2037943380817...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Thanks, Kent, but that doesn't solve my problem. In fact, I need
> ConfigParser to work with non-ascii characters, since my App may run
> in "latin-1" environments (folders e files names). I must find out why
> the str() function in the module ConfigParser doesn't use the encoding
> defined for the application (# -*- coding: utf-8 -*-). The rest of the
> application works properly with utf-8, except for ConfigParser. What I
> found out is that ConfigParser seems to make use of the configuration
> in Site.py (which is set to 'ascii'), instead of the configuration
> defined for the App (if I change . But this is very problematic to
> have to change Site.py in every computer... So I wonder if there is a
> way to replace the settings in Site.py only for my App.
>
> 2009/5/1 Kent Johnson :
> > On Fri, May 1, 2009 at 4:54 PM, Pablo P. F. de Faria
> >  wrote:
> >> Hi, Kent.
> >>
> >> The stack trace is:
> >>
> >> Traceback (most recent call last):
> >>   File "/home/pablo/workspace/E-Dictor/src/MainFrame.py", line 1057, in OnClose
> >>     self.SavePreferences()
> >>   File "/home/pablo/workspace/E-Dictor/src/MainFrame.py", line 1068, in SavePreferences
> >>     self.cfg.set(u'File Settings',u'Recent files',
> >> unicode(",".join(self.recent_files)))
> >> UnicodeDecodeError: 'ascii'

[Tutor] Replacing fields in lines of various lengths

2009-05-04 Thread Dan Liang

(Please disregard my earlier message that was sent by mistake before I
finished composing. Sorry about that! :().

Hello Spir, Alan, and Paul, and tutors,

Thank you Spir, Alan, and Paul for your help with my previous code! Earlier,
I was asking how to separate a composite tag like the one in field 2 below
with sub-tags like those in the values of the dictionary below. In my
original question, I was asking about data formatted as follows:

w1\t   case_def_acc
w2\t   noun_prop
w3\t   case_def_gen
w4\t   dem_pron_f


And I put together the code below based on your suggestions, with minor
changes and it does work.


-Begin code

#!/usr/bin/python
import sys

tags = {
    'noun_prop': 'noun_prop null null'.split(),
    'case_def_gen': 'case_def gen null'.split(),
    'dem_pron_f': 'dem_pron f null'.split(),
    'case_def_acc': 'case_def acc null'.split(),
}


TAB = '\t'


def newlyTaggedWord(line):
    line = line.rstrip()            # I strip line ending
    (word, tag) = line.split(TAB)   # separate parts of line, keeping data only
    new_tags = tags[tag]            # read in dict
    tagging = TAB.join(new_tags)    # join with TABs
    return word + TAB + tagging     # formatted result

def replaceTagging(source_name, target_name):
    source_file = open(source_name, "r")
    target_file = open(target_name, "w")
    # replacement loop
    for line in source_file:
        new_line = newlyTaggedWord(line) + '\n'
        target_file.write(new_line)

    # close the file objects (not the file names)
    source_file.close()
    target_file.close()

if __name__ == "__main__":
    source_name = sys.argv[1]
    target_name = sys.argv[2]
    replaceTagging(source_name, target_name)



-End code


Now I have to work on a different data format, as follows:

-Begin data

w1\t   case_def_acc   \t  yes
w2\t   noun_prop   \t   no
w3\t   case_def_gen   \t
w4\t   dem_pron_f   \t no
w3\t   case_def_gen   \t
w4\t   dem_pron_f   \t no
w1\t   case_def_acc   \t  yes
w3\t   case_def_gen   \t
w3\t   case_def_gen   \t

-End data
Notice that some lines have nothing in the yes-no field, and hence end in a
tab.

My question is how to replace data in the field of composite tags by
sub-tags like those in the dictionary values above and still be able to
print the whole line only with this change (i.e., composite tags replaced by
sub-tags). Earlier, we read words and tags from line directly into the
dictionary since we were sure each line had 2 fields after separating by
tabs. Here, lines have various field lengths and sometimes end with yes or
no, and sometimes not.

I tried to  make changes to the code above by changing the function where we
read the dictionary, but it did not work. While it is ugly, I include it as
a proof that I have worked on the problem. I am sure you will have various
nice ideas.


-Begin code
def newlyTaggedWord(line):
    tagging = ""
    line = line.split(TAB)    # separate parts of line, keeping data only
    if len(line)==3:
        word = line[-3]
        tag = line[-2]
        new_tags = tags[tag]
        decision = line[-1]
        # in decision I wanted to store either yes or no if one of
        # these existed

    elif len(line)==2:
        word = line[-2]
        tag = line[-1]
        decision = TAB
        # I thought if it is a must to put sth in decision while decision
        # is really absent in line, I would put a tab. But I really want to
        # avoid putting anything there.

    new_tags = tags[tag]            # read in dict
    tagging = TAB.join(new_tags)    # join with TABs
    return word + TAB + tagging + TAB + decision
-End code
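
For reference, the goal described above could be sketched roughly as
follows. This is one possible direction only, not a tested solution for
the real data: the function name newly_tagged_line is made up, and it
assumes that the tag is always the second field and that the padding
spaces around fields can be dropped.

```python
TAB = '\t'

tags = {
    'noun_prop': 'noun_prop null null'.split(),
    'case_def_gen': 'case_def gen null'.split(),
    'dem_pron_f': 'dem_pron f null'.split(),
    'case_def_acc': 'case_def acc null'.split(),
}

def newly_tagged_line(line):
    """Expand the composite tag in field 2; pass all other fields through."""
    # Remove only the line ending, so an empty last field survives the split.
    fields = [f.strip() for f in line.rstrip('\n').split(TAB)]
    # Slice assignment swaps the single tag field for its list of sub-tags.
    fields[1:2] = tags[fields[1]]
    return TAB.join(fields)

print(repr(newly_tagged_line('w1\t   case_def_acc   \t  yes\n')))
print(repr(newly_tagged_line('w3\t   case_def_gen   \t\n')))
```

Because only the second field is touched, it does not matter how many
fields follow it or whether the last one is empty.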


I appreciate your support!

--dan

Re: [Tutor] Encode problem

2009-05-04 Thread Mark Tolonen



"spir"  wrote in message 
news:20090501220601.31891...@o...

Le Fri, 1 May 2009 15:19:29 -0300,
"Pablo P. F. de Faria"  s'exprima ainsi:


self.cfg.write(codecs.open(self.properties_file,'w','utf-8'))

As one can see, the character encoding is explicitly UTF-8. But
ConfigParser keeps trying to save it as a 'ascii' file and gives me
error for directory-names containing >128 code characters (like "Á").
It is just a horrible thing to me, for my app will be used mostly by
brazillians.


Just superficial suggestions, only because it's 1st of May and WE so that 
better answers won't maybe come up before monday.


If all what you describe is right, then there must be something wrong with 
char encoding in configParser's write method. Have you had a look at it? 
While I hardly imagine why/how ConfigParser would limit file pathes to 
7-bit ASCII...
Also, for porteguese characters, you shouldn't even need explicit 
encoding; they should pass through silently because they fit in an 8 bit 
latin charset. (I never encode french path/file names.)


The below works.  ConfigParser isn't written to support Unicode correctly. 
I was able to get Unicode sections to write out, but it was just luck. 
Unicode keys and values break as the OP discovered.  So treat everything as 
byte strings:



# coding: utf-8
# Note coding is required because of non-ascii
# in the source code.  This ONLY controls the
# encoding of the source file characters saved to disk.
import ConfigParser
import glob
import sys
c = ConfigParser.ConfigParser()
c.add_section('马克') # this is a utf-8 encoded byte string...no u'')
c.set('马克','多少','明白') # so are these

# The following could be glob.glob(u'.') to get a filename in
# Unicode, but this is for illustration that the encoding of the
# source file has no bearing on the encoding of strings other than
# ones hard-coded in the source file.  The 'files' list will be byte
# strings in the default file system encoding.  Which for Windows
# is 'mbcs'...a magic value that changes depending on
# which country's version of Windows is running.
files = glob.glob('*.txt')
c.add_section('files')

for i,fn in enumerate(files):
   fn = fn.decode(sys.getfilesystemencoding())
   fn = fn.encode('utf-8')
   c.set('files','file%d'%(i+1),fn)

# Don't need a codec here...everything is already UTF8.
c.write(open('chinese.txt','wt'))
--

Here is the content of my utf-8 file:

-
[files]
file3 = ascii.txt
file2 = chinese.txt
file1 = blah.txt
file5 = ÀÈÌÒÙ.txt
file4 = other.txt

[马克]
多少 = 明白
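
For comparison only, a sketch of the same round trip on Python 3, where
the module is named configparser and works with text rather than byte
strings. This is a different technique from the byte-string approach
above: the encoding is chosen when the file is opened. The temporary
directory is just a stand-in location:

```python
import configparser
import os
import tempfile

c = configparser.ConfigParser()
c.add_section('马克')
c.set('马克', '多少', '明白')

path = os.path.join(tempfile.mkdtemp(), 'chinese.txt')

# The encoding belongs to the file object, not to the parser.
with open(path, 'w', encoding='utf-8') as f:
    c.write(f)

c2 = configparser.ConfigParser()
c2.read(path, encoding='utf-8')
value = c2.get('马克', '多少')

print(value)
```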


Hope this helps,
Mark



Re: [Tutor] quick question to open(filename, 'r') vs. file(filename, 'r')

2009-05-04 Thread Lie Ryan


Alan Gauld wrote:
 > And file has been removed again in Python v3

In fact open is now an alias for io.open and no longer simply returns
a file object - in fact the file type itself is gone too!

A pity, there are cases where I found file() more intuitive than
open and vice versa so liked having both available. The fact that it
looked like creating an instance of a class seemed to fit well
in OO code.


But having both of them is a violation of "There should be one-- and 
preferably only one --obvious way to do it."


I think Python's duck-typing culture makes it very rare that you want to 
test whether a file is an impostor.
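
A small sketch of both points, assuming Python 3: the built-in open is the same object as io.open, and a function that only uses the behaviour it needs will accept any file-like object without checking its type. The helper name count_lines is made up for illustration.

```python
import io

def count_lines(f):
    """Count lines in anything that can be iterated line by line."""
    return sum(1 for _ in f)

# In Python 3 the built-in open and io.open are one and the same.
print(open is io.open)

# An in-memory text buffer stands in for a real file; no type check needed.
fake = io.StringIO("first\nsecond\nthird\n")
print(count_lines(fake))
```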



Re: [Tutor] Advanced String Search using operators AND, OR etc..

2009-05-04 Thread C or L Smith

>> From: Alex Feddor 
>> 
>> I am looking for method enables advanced text string search. Method
>> string.find() or re module seems no  supporting what I am looking
>> for. The idea is as follows:
>> 
>> Text ="FDA meeting was successful. New drug is approved for whole
>> sale distribution!"
>> 
>> 
>> I would really appreciated your advice - code sample / links how
>> above can 
>> be achieved! if possible I would appreciated solution achieved with
>> free of 
>> charge module.

The pieces to assemble a solution are not too hard to master. Instead of 
thinking of searching your text, think about searching a list of words in the 
text for what you are interested in.

The re pattern to match a word containing only letters is [a-zA-Z]+. This 
pattern can cut your text into words for you. A list of words corresponding to 
your text can then be made with re.findall():

###
>>> import re
>>> word=re.compile('[a-zA-Z]+')
>>> text = """FDA meeting was successful."""
>>> Words = re.findall(word, text)
>>> Words
['FDA', 'meeting', 'was', 'successful']
>>> 
###

There are some gems hidden in some of the modules that are intended for one 
purpose but can be handy for another. For your purposes, the fnmatch module has 
a lightweight (compared to re) string matching function that can be used to 
find out whether a word matches a given criterion or not. There are only 4 
types of patterns to master:

* matches anything
? matches a single character
[seq] matches any character in the sequence
[!seq] matches any character NOT in the sequence
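
To make the four pattern types concrete, here is a short sketch. The word list and the helper name matching are invented for the example; fnmatchcase() is used so the result does not depend on the operating system.

```python
import fnmatch

def matching(pat, words):
    """Return the words that match pat, comparing case-sensitively."""
    return [w for w in words if fnmatch.fnmatchcase(w, pat)]

words = ['FDA', 'meeting', 'was', 'successful', 'approved']

print(matching('approve*', words))  # * : any run of characters
print(matching('?as', words))       # ? : exactly one character
print(matching('[ms]*', words))     # [seq] : first letter is m or s
print(matching('[!ms]*', words))    # [!seq] : first letter is neither
```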

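Within the module, fnmatchcase() is always case sensitive, while fnmatch() 
follows the operating system's file name rules, so it ignores case on Windows 
but not on Linux. To behave the same everywhere, the helper function below 
always uses fnmatchcase() and lowercases both sides when case should be ignored 
(it is set right now to be case sensitive by default):

###
import fnmatch
def match(pat, words, case=True):
    """See if pat matches a word in the words list. It uses a generator
    rather than a list inside the any() so as not to generate the
    whole list if at all possible."""
    if case:
        return any(fnmatch.fnmatchcase(x, pat) for x in words)
    else:
        # Lowercase both sides; fnmatch.fnmatch() only ignores case on
        # platforms whose file names are case-insensitive.
        pat = pat.lower()
        return any(fnmatch.fnmatchcase(x.lower(), pat) for x in words)
###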

Now you can see if a certain pattern is in your list of words or not:

###
>>> Words=['FDA', 'meeting', 'was', 'successful']
>>> match('FDA',Words)
True
>>> match('fda',Words)
False
>>> match('fda',Words, case=False)
True
>>> 
###

And now string together whatever tests you like for a given line:

###
>>> match('FDA',Words) and (match('approve*',Words) or match('success*',Words))
True
>>> 
###

If you are searching a large piece of text you might want to turn the list of 
words into a set of unique words so there is less to search. The match function 
will work with it equally well.

###
>>> text='this is a list is a list is a list'
>>> re.findall(word,text)
['this', 'is', 'a', 'list', 'is', 'a', 'list', 'is', 'a', 'list']
>>> set(_)
set(['this', 'a', 'is', 'list'])
>>> match('is', _)
True
>>> 
###

You also might want to apply your search line by line, but those are details 
you might already know how to handle. 
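
In case it is useful, here is one way the line-by-line version might look. It is only a sketch: the names has, search_lines and fda_query are made up, the sample text is invented, and each line is reduced to a set of words before the test is applied.

```python
import fnmatch
import re

WORD = re.compile('[a-zA-Z]+')

def has(pat, words):
    """True if any word matches pat (case-sensitive)."""
    return any(fnmatch.fnmatchcase(w, pat) for w in words)

def search_lines(text, test):
    """Yield (line_number, line) for each line whose words pass test."""
    for n, line in enumerate(text.splitlines(), 1):
        words = set(WORD.findall(line))
        if test(words):
            yield n, line

def fda_query(words):
    # FDA AND (approve* OR success*)
    return has('FDA', words) and (has('approve*', words) or has('success*', words))

text = """FDA meeting was successful.
New drug is approved for wholesale distribution!
The FDA approved it."""

for n, line in search_lines(text, fda_query):
    print(n, line)
```

Passing the test in as a function keeps the AND/OR logic in ordinary Python, so no query parser is needed.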

Hope that helps!

/chris
