need some kind of "coherence index" for a group of strings

2016-11-03 Thread Fillmore


Hi there, apologies for the generic question. Here is my problem: let's
say that I have a list of lists of strings.


list1:#strings are sort of similar to one another

  my_nice_string_blabla
  my_nice_string_blqbli
  my_nice_string_bl0bla
  my_nice_string_aru


list2:#strings are mostly different from one another

  my_nice_string_blabla
  some_other_string
  yet_another_unrelated string
  wow_totally_different_from_others_too


I would like an algorithm that can look at the strings and determine 
that strings in list1 are sort of similar to one another, while the 
strings in list2 are all different.
Ideally, it would be nice to have some kind of 'coherence index' that I 
can exploit to separate lists given a certain threshold.


I was about to concoct something using Levenshtein distance, but then I
figured that it would be expensive to compute and I may be reinventing
the wheel.


Thanks in advance to python masters that may have suggestions...



--
https://mail.python.org/mailman/listinfo/python-list


Re: need some kind of "coherence index" for a group of strings

2016-11-03 Thread Fillmore

On 11/3/2016 6:47 PM, [email protected] wrote:

On Thursday, November 3, 2016 at 1:09:48 PM UTC-7, Neil D. Cerutti wrote:

you may also be
able to use some items "off the shelf" from Python's difflib.


I wasn't aware of that module, thanks for the tip!

difflib.SequenceMatcher.ratio() returns a numerical value which represents
the "similarity" between two strings. I don't see a precise definition of
"similar", but it may do what the OP needs.





I may end up rolling my own algo, but thanks for the tip, this does seem 
like useful stuff indeed
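Following the difflib suggestion, a rough "coherence index" can be sketched as the mean pairwise `SequenceMatcher.ratio()` over every pair of strings in a list. This is only a sketch, assuming that averaging pairwise similarity is a good enough proxy for "sort of similar to one another"; `coherence_index` is a hypothetical name, not an established metric. It costs O(n²) comparisons per list.

```python
from difflib import SequenceMatcher
from itertools import combinations


def coherence_index(strings):
    """Mean pairwise SequenceMatcher ratio: 1.0 = identical, near 0 = unrelated."""
    pairs = list(combinations(strings, 2))
    if not pairs:
        return 1.0  # zero or one string: trivially coherent
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


list1 = ["my_nice_string_blabla", "my_nice_string_blqbli",
         "my_nice_string_bl0bla", "my_nice_string_aru"]
list2 = ["my_nice_string_blabla", "some_other_string",
         "yet_another_unrelated string", "wow_totally_different_from_others_too"]

print(round(coherence_index(list1), 2), round(coherence_index(list2), 2))
```

A cut-off somewhere between the two printed values would separate these lists; where to put it for real data has to be found by experiment.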





MemoryError and Pickle

2016-11-21 Thread Fillmore


Hi there, Python newbie here.

I am working with large files. For this reason I figured that I would 
capture the large input into a list and serialize it with pickle for 
later (faster) usage.
Everything has worked beautifully until today when the large data (1GB) 
file caused a MemoryError :(


Question for experts: is there a way to refactor this so that data may 
be filled/written/released as the scripts go and avoid the problem?

code below.

Thanks

import pickle
import sys

filename = "data.pickle"  # hypothetical; the post does not show where filename is set

data = list()
for line in sys.stdin:

    try:
        parts = line.strip().split("\t")
        t = parts[0]
        w = parts[1]
        u = parts[2]

        #let's retain in-memory copy of data
        data.append({"ta": t,
                     "wa": w,
                     "ua": u
                     })

    except IndexError:
        print("Problem with line :"+line, file=sys.stderr)

#time to save data object into a pickle file

fileObject = open(filename,"wb")
pickle.dump(data,fileObject)
fileObject.close()
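One way to avoid holding the whole list in memory, sketched here under the assumption that the records can later be consumed one at a time, is to call `pickle.dump()` once per record on the same open file and read them back with repeated `pickle.load()` calls until it raises `EOFError`. `dump_records` and `load_records` are illustrative names, not part of the original script.

```python
import pickle
import sys


def dump_records(lines, out):
    """Pickle one dict per input line straight to `out`; nothing accumulates in memory."""
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) < 3:
            print("Problem with line :" + line, file=sys.stderr)
            continue
        pickle.dump({"ta": parts[0], "wa": parts[1], "ua": parts[2]}, out)


def load_records(inp):
    """Yield the records back one at a time until the file is exhausted."""
    while True:
        try:
            yield pickle.load(inp)
        except EOFError:
            return
```

Writing would then be `with open(filename, "wb") as f: dump_records(sys.stdin, f)`, and reading `for rec in load_records(f): ...` with the file opened in `"rb"` mode. Note that this changes the file format from one pickled list to a stream of pickles, so any existing reader has to change too.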


Drowning in a teacup?

2016-04-01 Thread Fillmore


notorious pass by reference vs pass by value biting me in the backside 
here. Proceeding in order.


I need to scan a list of strings. If one of the elements matches the 
beginning of a search keyword, that element needs to snap to the front 
of the list.

I achieved that this way:


for i in range(len(mylist)):
    if(mylist[i].startswith(key)):
        mylist = [mylist[i]] + mylist[:i] + mylist[i+1:]

Since I need this code in multiple places, I placed it inside a function

def bringOrderStringToFront(mylist, key):

    for i in range(len(mylist)):
        if(mylist[i].startswith(key)):
            mylist = [mylist[i]] + mylist[:i] + mylist[i+1:]

and called it this way:

if orderstring:
    bringOrderStringToFront(Tokens, orderstring)

right?
Nope, wrong! contrary to what I thought I had understood about how 
parameters are passed in Python, the function is acting on a copy(!) and 
my original list is unchanged.


I fixed it this way:

def bringOrderStringToFront(mylist, key):

    for i in range(len(mylist)):
        if(mylist[i].startswith(key)):
            mylist = [mylist[i]] + mylist[:i] + mylist[i+1:]
    return(mylist)

and:

if orderstring:
    Tokens = bringOrderStringToFront(Tokens, orderstring)

but I'm left with a sour taste of not understanding what I was doing 
wrong. Can anyone elaborate? what's the pythonista way to do it right?


Thanks
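The function rebinds the local name `mylist` to a brand-new list, so the caller's `Tokens` never sees the change. A minimal sketch of an in-place alternative keeps the original loop and only changes the assignment to slice assignment (`mylist[:] = ...`), which replaces the contents of the very list object the caller passed in. Returning the new list, as in the second version of the function, is the other common idiom.

```python
def bringOrderStringToFront(mylist, key):
    """Move elements starting with `key` to the front, mutating `mylist` in place."""
    for i in range(len(mylist)):
        if mylist[i].startswith(key):
            # Slice assignment changes the list object itself, so the caller
            # sees the result; `mylist = ...` would only rebind the local name.
            mylist[:] = [mylist[i]] + mylist[:i] + mylist[i + 1:]


Tokens = ["x", "y", "ord1", "z"]
bringOrderStringToFront(Tokens, "ord")
print(Tokens)  # ['ord1', 'x', 'y', 'z']
```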



Re: Drowning in a teacup?

2016-04-01 Thread Fillmore

On 04/01/2016 04:27 PM, Fillmore wrote:


notorious pass by reference vs pass by value biting me in the backside here. 
Proceeding in order.



Many thanks to all of those who replied!


Most probably a stupid question, but I still want to ask

2016-04-10 Thread Fillmore


let's look at this:

$ python3.4
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> line1 = '"String1" | bla'
>>> parts1 = line1.split(" | ")
>>> parts1
['"String1"', 'bla']
>>> tokens1 = eval(parts1[0])
>>> tokens1
'String1'
>>> tokens1[0]
'S'

and now this

>>> line2 = '"String1","String2" | bla'
>>> parts2 = line2.split(" | ")
>>> tokens2 = eval(parts2[0])
>>> tokens2
('String1', 'String2')
>>> tokens2[0]
'String1'
>>> type(tokens1)
<class 'str'>
>>> type(tokens2)
<class 'tuple'>
>>>


the question is: at which point did the language designers decide to betray the
"path of least surprise" principle and create a 'discontinuity' in the language?
Open to the idea that I am getting something fundamentally wrong. I'm new to 
Python...

Thanks




one-element tuples [Was: Most probably a stupid question, but I still want to ask]

2016-04-10 Thread Fillmore


Sorry guys. It was not my intention to piss off anyone...just trying to
understand how the language works.

I guess that the answer to my question is: there is no such thing as a
one-element tuple, and Python will automatically convert a one-element
tuple to a string... hence the behavior I observed is explained...

>>> a = ('hello','bonjour')
>>> b = ('hello')
>>> b
'hello'
>>> a
('hello', 'bonjour')
>>>


Did I get this right this time?



Re: Most probably a stupid question, but I still want to ask

2016-04-10 Thread Fillmore

On 04/10/2016 07:30 PM, Stephen Hansen wrote:


There's nothing inconsistent or surprising going on besides you doing
something vaguely weird and not really expressing what you find
surprising.


well, I was getting some surprising results for some of my data, so I can
guarantee that I was surprised!

apparently my 'discontinuity' is mappable to the fact that there's no such
thing as one-element tuples in Python, and attempts to create one will
result in a string (i.e. an object of a different kind!)...







Re: one-element tuples [Was: Most probably a stupid question, but I still want to ask]

2016-04-10 Thread Fillmore

On 04/10/2016 08:13 PM, Fillmore wrote:




Hold on a sec! It turns out that there is such a thing as single-element tuples in
Python:

>>> c = ('hello',)
>>> c
('hello',)
>>> c[0]
'hello'
>>> c[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: tuple index out of range
>>>

So, my original question makes sense. Why was a discontinuation point 
introduced by the language designer?




Re: one-element tuples

2016-04-10 Thread Fillmore

On 04/10/2016 08:31 PM, Ben Finney wrote:

Can you describe explicitly what that “discontinuation point” is? I'm
not seeing it.


Here you go:

>>> a = '"string1"'
>>> b = '"string1","string2"'
>>> c = '"string1","string2","string3"'
>>> ea = eval(a)
>>> eb = eval(b)
>>> ec = eval(c)
>>> type(ea)
<class 'str'>   <--- HERE
>>> type(eb)
<class 'tuple'>
>>> type(ec)
<class 'tuple'>

I can tell you that it exists because it bit me in the butt today...

and mind you, I am not saying that this is wrong. I'm just saying that it 
surprised me.





Re: Most probably a stupid question, but I still want to ask

2016-04-10 Thread Fillmore


Funny, but it seems to me that you are taking it personally... thank God I even apologized
in advance for what was most probably a stupid question.

On 04/10/2016 09:50 PM, Steven D'Aprano wrote:


Fillmore, you should feel very pleased with yourself. All the tens of
thousands of Python programmers, and millions of lines of code written in
the language, but nobody until now was able to see what you alone had the
intelligence and clarity of thought to spot. Well done!







Re: one-element tuples

2016-04-10 Thread Fillmore

On 04/10/2016 09:36 PM, Ben Finney wrote:

If the two examples give you different responses (one surprises you, the
other does not), I would really like to know*what the surprise is*.
What specifically did you expect, that did not happen?


now that I get the role of commas it's not surprising anymore...
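The rule that resolved the surprise fits in a few lines: the comma, not the parentheses, is what builds a tuple, and `ast.literal_eval()` (a safer alternative to `eval()` for literals) follows the same rule. A trailing comma is how a one-element tuple is spelled.

```python
import ast

a = ("hello")      # parentheses alone only group: a plain str
b = ("hello",)     # the trailing comma makes a one-element tuple
c = "hello",       # ...and the comma works without parentheses too

print(type(a).__name__, type(b).__name__, type(c).__name__)  # str tuple tuple

# The same rule applies to text parsed from a file:
print(repr(ast.literal_eval('"string1"')))            # 'string1'
print(repr(ast.literal_eval('"string1",')))           # ('string1',)
print(repr(ast.literal_eval('"string1","string2"')))  # ('string1', 'string2')
```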

thanks



Re: one-element tuples

2016-04-10 Thread Fillmore


Thank you for trying to help, Martin. So:

On 04/10/2016 09:08 PM, Martin A. Brown wrote:

#1: I would not choose eval() except when there is no other
 solution.  If you don't need eval(), it may save you some
 headache in the future, as well, to find an alternate way.
 So, can we help you choose something other than eval()?
 What are you trying to do with that usage?


so, I do not quite control the format of the file I am trying to parse.

it has the format:

"str1","str2",,"strN" => more stuff
  :

in some cases there is just one "str", which is what created my problem.
The first "str1" has special meaning and, at times, it can be alone.

The way I handle this is:

parts = line.strip().split(" => ")
tokens = eval(parts[0])

if type(tokens) == str:   #Handle case that there's only one token
    columns.add(tokens)
    rowTokenString = "__Empty__"
    rows.add(rowTokenString)
    value = parts[1][:2]
    addCell(table, rowTokenString, tokens, value)
else:
    columns.add(tokens[0])
    rowTokenString = '"'+'","'.join(tokens[1:]) + '"'
    rows.add(rowTokenString)
    value = parts[1][:2]
    addCell(table, rowTokenString, tokens[0],value)

which admittedly is not very elegant. If you have suggestions on how to avoid the use
of eval() and still achieve the same, I would be delighted to hear them.
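One way to drop `eval()` here, sketched under the assumption that the part before ` => ` is always a comma-separated list of double-quoted strings, is to let the `csv` module parse it. `csv.reader` always returns a list, so a single token comes back as a one-element list and the `str`/`tuple` split disappears. (`ast.literal_eval()` is a safer drop-in for `eval()`, but it would still return a `str` for a single token.) `parse_line` is a hypothetical helper; `columns`, `rows` and `addCell` from the original are left out.

```python
import csv


def parse_line(line):
    """Split '"str1","str2",... => more stuff' into (column, row_key, value)."""
    left, _, right = line.strip().partition(" => ")
    # csv.reader takes an iterable of lines and always yields a list of
    # fields, with the surrounding double quotes removed.
    tokens = next(csv.reader([left]))
    column = tokens[0]
    if len(tokens) > 1:
        row_key = '"' + '","'.join(tokens[1:]) + '"'
    else:
        row_key = "__Empty__"  # the single-token case from the original code
    value = right[:2]
    return column, row_key, value


print(parse_line('"str1" => 1x'))
print(parse_line('"str1","str2","str3" => 0y'))
```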




Re: Most probably a stupid question, but I still want to ask

2016-04-10 Thread Fillmore

On 04/10/2016 11:54 PM, Steven D'Aprano wrote:

On Mon, 11 Apr 2016 12:48 pm, Fillmore wrote:



funny, but it seems to me that you are taking it personally... thank god i
even apologized in advance for what was most probably a stupid question..


I hope you did get a laugh out of it, because it wasn't meant to be nasty.
But it was meant to get you to think about statements about betraying
principles and other inflammatory remarks.


I did have a laugh.

I don't think I talked about betraying principles. I just mentioned that in my newbie
mind, I experienced what I perceived as a discontinuity. My limited understanding of
what builds tuples and the (almost always to be avoided) use of eval() were the origin
of my perplexities.

I'll make sure I approach the temple of pythonistas bare-footed and with 
greater humility next time






Re: one-element tuples

2016-04-10 Thread Fillmore

On 04/11/2016 12:10 AM, Ben Finney wrote:


So, will we never get your statement of what surprised you between those
examples?

Clearly there is something of interest here. I'd like to know what the
facts of the matter were; “beginner's mind” is a precious resource, not
to be squandered.



I thought I had made the point clear with the REPL session below. I had (what seemed
to me like) a list of strings getting turned into a tuple. I was surprised that
a single string wasn't turned into a single-element tuple.
Now that I know that commas create tuples, but lack of commas doesn't, I'm not
surprised anymore.

>>> a = '"string1"'
>>> b = '"string1","string2"'
>>> c = '"string1","string2","string3"'
>>> ea = eval(a)
>>> eb = eval(b)
>>> ec = eval(c)
>>> type(ea)
<class 'str'>
>>> type(eb)
<class 'tuple'>
>>> type(ec)
<class 'tuple'>



Re: one-element tuples

2016-04-11 Thread Fillmore

On 04/11/2016 10:10 AM, Grant Edwards wrote:


What behaviour did you expect instead? That's still unclear.


I must admit this is one of the best trolls I've seen in a while...



shall I take it as a compliment?


I have been dealing with Python for a few weeks...

2016-04-14 Thread Fillmore


...and I'm loving it.

Sooo much more elegant than Perl...and so much less going back to the
manual to look up the syntax of simple data structures and operations...


REPL is so useful

and you guys rock too

cheers


Re: I have been dealing with Python for a few weeks...

2016-04-18 Thread Fillmore

On 04/14/2016 10:12 PM, justin walters wrote:

On Thu, Apr 14, 2016 at 1:50 PM, Fillmore 
wrote:




Good to hear you're enjoying it. Out of curiosity, what were you using Perl
for?



I am a programmer by education but have not programmed in years. Sometimes I need to get
my point across to people in my team that something can be done. The fact that I could prototype
in Perl (now Python) certainly makes it harder for others to argue that this or that cannot be done
in Java/PHP, or that it would take a disproportionate amount of time to carry out...

Thanks



Python script reading from sys.stdin and debugger

2016-05-19 Thread Fillmore

Hello PyMasters!

Long story short:

cat myfile.txt | python -m pdb myscript.py

doesn't work (pdb hijacking stdin?).

Google indicates that someone has fixed this with named pipes, but, call 
me stupid, I don't understand how I need to set up those pipes, how I 
need to modify my script and, above all, how I now need to invoke the 
program.


Help and suggestions appreciated. I am using Python 3.4 on Cygwin and 
Ubuntu.
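A common workaround that avoids named pipes altogether, sketched here, is to have the script read from a file named on the command line and fall back to stdin only when no file is given; `fileinput` in the standard library does exactly that. pdb then keeps the terminal as stdin for its own prompt. `count_lines` is a hypothetical placeholder for whatever the real script does with each line.

```python
import fileinput


def count_lines(files=None):
    """Read lines from `files`, else from the names in sys.argv[1:], else from stdin."""
    n = 0
    with fileinput.input(files=files) as lines:
        for _line in lines:
            n += 1  # placeholder for the real per-line processing
    return n
```

With `if __name__ == "__main__": print(count_lines())` at the bottom of the script, both `cat myfile.txt | python myscript.py` and `python -m pdb myscript.py myfile.txt` work; in the second form the data comes from the file and stdin stays free for pdb.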


Thanks



reduction

2016-05-31 Thread Fillmore


My problem. I have lists of substrings associated to values:

['a','b','c','g'] => 1
['a','b','c','h'] => 1
['a','b','c','i'] => 1
['a','b','c','j'] => 1
['a','b','c','k'] => 1
['a','b','c','l'] => 0  # <- Black sheep!!!
['a','b','c','m'] => 1
['a','b','c','n'] => 1
['a','b','c','o'] => 1
['a','b','c','p'] => 1

I can check a bit of data against elements in this list
and determine whether the value to be associated to the data is 1 or 0.

I would like to make my matching algorithm smarter so I can
reduce the total number of lists:

['a','b','c','l'] => 0  # If "l" is in my data I have a zero
['a','b','c'] => 1  # or a more generic match will do the job

I am trying to think of a way to perform this "reduction", but
I have a feeling I am reinventing the wheel.

Is this a common problem that is already addressed by an existing module?

I realize this is vague. Apologies for that.

thank you
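The reduction described above amounts to longest-prefix matching, which a prefix tree (trie) handles directly: store only the general rule `['a','b','c'] => 1` plus the exception `['a','b','c','l'] => 0`, and let the most specific stored prefix win. This is a minimal dict-based sketch of that idea, not an existing module; using `None` as the "a rule ends here" key assumes tokens are never `None`.

```python
def build_trie(rules):
    """rules: iterable of (token_list, value). Returns a nested-dict trie."""
    root = {}
    for prefix, value in rules:
        node = root
        for tok in prefix:
            node = node.setdefault(tok, {})
        node[None] = value  # marks that a rule ends at this node
    return root


def lookup(trie, tokens):
    """Return the value of the longest stored prefix of `tokens`, or None."""
    node, best = trie, None
    for tok in tokens:
        if None in node:
            best = node[None]
        if tok not in node:
            return best
        node = node[tok]
    return node.get(None, best)


rules = [(["a", "b", "c"], 1),
         (["a", "b", "c", "l"], 0)]
trie = build_trie(rules)
print(lookup(trie, ["a", "b", "c", "g"]))  # 1: falls back to the generic rule
print(lookup(trie, ["a", "b", "c", "l"]))  # 0: the exception wins
```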


Re: reduction

2016-06-01 Thread Fillmore


Thank you, guys. Your suggestions are valuable. I think I'll go with the tree

On 05/31/2016 10:22 AM, Fillmore wrote:




loading trees...

2016-06-12 Thread Fillmore


Hi, problem for today. I have a batch file that creates "trees of data".
I can save these trees in the form of Python code or serialize them with something
like pickle.

I then need to run a program that loads the whole forest in the form of a dict()
where each item will point to a dynamically loaded tree.

What's my best way to achieve this? Pickle? Or creating custom Python code?

Or maybe I am just reinventing the wheel and there are better ways to achieve this?

The idea is that I'll receive a bit of data, determine which tree is suitable for handling it,
and dispatch the data to the right tree for further processing...
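One standard-library option that matches "a dict where each item points to a dynamically loaded tree" is `shelve`: it behaves like a dict of pickled values stored on disk, and a value is only unpickled when its key is accessed. This sketch assumes each tree is picklable and keyed by a string name; `save_forest` and `dispatch` are hypothetical helpers.

```python
import shelve


def save_forest(path, forest):
    """Batch step: store each tree under its name (shelf keys must be str)."""
    with shelve.open(path) as db:
        for name, tree in forest.items():
            db[name] = tree


def dispatch(path, tree_name, data, handler):
    """Open the shelf read-only and unpickle just the tree that is needed."""
    with shelve.open(path, flag="r") as db:
        tree = db[tree_name]
    return handler(tree, data)
```

For many lookups in one run it is cheaper to keep the shelf open rather than reopening it per item. `shelve.open` may create one or more files whose exact names depend on the dbm backend available.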

Thanks


psss...I want to move from Perl to Python

2016-01-28 Thread Fillmore


I taught myself Perl as a scripting language over two decades ago. All
through this time, I would revert to it from time to time whenever I
needed some text manipulation and data analysis script.


My problem? Maybe I am stupid, but each time I have to go back and
re-learn the syntax, the gotchas, the references and the dereferencing,
the different syntax between Perl 4 and Perl 5, that messy CPAN in which
every author seems to have a different idea of how things should be
done...


I get this feeling I am wasting a lot of time restudying the wheel each
time...


I look at Python and it looks so much cleaner

add to that that it is the language of choice of data miners...

add to that that iNotebook looks powerful

Does Python have Regexps?

How was the Python 2.7 vs Python 3.X solved? which version should I go for?

Do you think that switching to Python from Perl is a good idea at 45?

Where do I get started moving from Perl to Python?

which gotchas need I be aware of?

Thank you


Re: psss...I want to move from Perl to Python

2016-01-29 Thread Fillmore

So many answers. So much wisdom...thank you everyone

On 01/28/2016 07:01 PM, Fillmore wrote:






Re: psss...I want to move from Perl to Python

2016-01-29 Thread Fillmore


+1

On 1/29/2016 10:07 AM, Random832 wrote:


The main source of confusion is that $foo[5] is an element of @foo.
$foo{'x'} is an element of %foo. Both of these have absolutely nothing
to do with $foo.





Re: psss...I want to move from Perl to Python

2016-01-29 Thread Fillmore


I actually have a few follow-up questions.

- will iNotebook also work in Python 3?

- What version of Python 3 do you recommend I install on Windows?

- Is Python 3 available also for CygWin?

- I use Ubuntu at home. Will I be able to install Python 3 with apt-get? 
will I need to uninstall previous versions?


- Is there a good IDE that can be used for debugging? all free IDEs for 
Perl suck and it would be awesome if Python was better than that.


Thanks


On 1/28/2016 7:01 PM, Fillmore wrote:


I learned myself Perl as a scripting language over two decades ago. All




Re: psss...I want to move from Perl to Python

2016-01-29 Thread Fillmore

On 1/29/2016 4:30 PM, Rick Johnson wrote:

 People who are unwilling to "expanding their
intellectual horizons" make me sick!!!


did I miss something or is this aggressiveness unjustified?




Re: psss...I want to move from Perl to Python

2016-01-31 Thread Fillmore

On 01/30/2016 05:26 AM, [email protected] wrote:


Python 2 vs python 3 is anything but "solved".



Python 3.5.1 is still suffering from the same buggy
behaviour as in Python 3.0 .



Can you elaborate?





Cygwin and Python3

2016-02-09 Thread Fillmore


Hi, I am having a hard time making my Cygwin run Python 3.5 (or Python 2.7 for that matter).
The command will hang and nothing happens.

A cursory search on the net reveals many possibilities, which might mean a lot
of trial and error, which I would very much like to avoid.

Any suggestions on how I can get cygwin and Python3.5 to play together like brother and sister?

thanks


Re: Cygwin and Python3

2016-02-09 Thread Fillmore

On 2/9/2016 2:29 PM, [email protected] wrote:




$ ls -l /usr/bin/python
rm /usr/bin/python

$ ln -s /usr/bin/python /usr/bin/python3.2m.exe

$ /usr/bin/python --version
Python 3.2.5

$  pydoc modules



Still no luck (:

 ~
$ python --version
Python 3.5.1

 ~
$ python
(..hangs indefinitely)
^C

 ~
$ pydoc modules
-bash: pydoc: command not found

 ~
$ echo $PATH
/usr/local/bin:/usr/bin:/cygdrive/c/Python27:/cygdrive/c/
Python27/Scripts:/cygdrive/c/Windows/system32:/cygdrive/
c/Windows:/cygdrive/c/Windows/System32/Wbem:/cygdrive/
c/Windows/System32/WindowsPowerShell/v1.0:/cygdrive/
c/Program Files (x86)/Common Files/Roxio Shared/OEM/
DLLShared:/cygdrive/c/Program Files (x86)/Common Files/
Roxio Shared/OEM/DLLShared:/cygdrive/c/Program
 Files (x86)/Common Files/Roxio Shared/OEM/12.0/
DLLShared:/cygdrive/c/Program Files (x86)/Roxio/OEM/
AudioCore:/cygdrive/c/unxutils/bin:/cygdrive/c/unxutils
/usr/local/wbin:/cygdrive/c/strawberry/c/bin:/cygdrive/
c/strawberry/perl/site/bin:/cygdrive/c/strawberry/
perl/bin:/cygdrive/c/Program Files/Intel/WiFi/bin:/
cygdrive/c/Program Files/Common Files/Intel/
WirelessCommon:/cygdrive/c/Users/user/AppData/Local/
Programs/Python/Python35/Scripts:/cygdrive/c/Users/
user/AppData/Local/Programs/Python/Python35:%APPDATA%
/Python/Scripts:/cygdrive/c/Program Files/Intel/WiFi/
bin:/cygdrive/c/Program Files/Common Files/Intel/
WirelessCommon




Re: Cygwin and Python3

2016-02-09 Thread Fillmore

On 2/9/2016 3:30 PM, [email protected] wrote:



When you run the cygwin installer you have the option of installing 2.7
and 3.2.5, by default it will install 2.7 and 3.2 together.
After running the installer run whereis python and use the alternatives
to change it or use python3 instead of python #!/usr/bin/python3


Hope this helps.



I see. I was trying to do it the Perl way. I simply linked the
strawberry perl.exe from the cygwin environment and it replaced the built-in
perl that sucked.

OK. Backtrack. I'll try with a purely cygwin solution...

Thank you




Re: Cygwin and Python3

2016-02-09 Thread Fillmore

On 2/9/2016 4:47 PM, Fillmore wrote:


$ python --version
Python 2.7.10

$ python3 --version
Python 3.4.3

Thank you, Alvin



Regex: Perl to Python

2016-03-06 Thread Fillmore


Hi, I'm trying to move away from Perl and go to Python.
Regex seems to be the hardest challenge so far.

Perl:

while () {
    if (/(\d+)\t(.+)$/) {
        print $1." - ". $2."\n";
    }
}

into python

import re

pattern = re.compile(r"(\d+)\t(.+)$")
with open(fields_Indexfile,mode="rt",encoding='utf-8') as headerfile:
    for line in headerfile:
        #sys.stdout.write(line)
        m = pattern.match(line)
        print(m.group(0))
headerfile.close()

but I must be getting something fundamentally wrong because:

Traceback (most recent call last):
  File "./slicer.py", line 30, in <module>
print(m.group(0))
AttributeError: 'NoneType' object has no attribute 'group'


 why is 'm' a None?

the input data has this format:

 :
 3  prop1
 4  prop2
 5  prop3

Thanks
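Two differences from Perl probably explain the `None`: `re.match()` only matches at the very start of the string (Perl's `/.../` searches anywhere, like `re.search()`), so leading whitespace defeats it; and the Perl `if` silently skipped non-matching lines, while the Python loop calls `.group()` unconditionally. A sketch closer to the Perl loop, assuming the separator really is a tab (if the file uses spaces, `\t` would need to become `\s+`); `pairs` is a hypothetical name:

```python
import re

pattern = re.compile(r"(\d+)\t(.+)$")


def pairs(lines):
    """Yield 'N - prop' strings for matching lines, skipping the rest."""
    for line in lines:
        m = pattern.search(line)  # search anywhere, like Perl's /(\d+)\t(.+)$/
        if m:                     # skip non-matching lines, as the Perl `if` does
            yield m.group(1) + " - " + m.group(2)


for text in pairs([" 3\tprop1\n", " 4\tprop2\n", "header line\n"]):
    print(text)
```

With the real file, `pairs(headerfile)` inside the existing `with` block would do; the explicit `headerfile.close()` is unnecessary because `with` already closes the file.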


Re: Regex: Perl to Python

2016-03-07 Thread Fillmore


Big thank you to everyone who offered their help!

On 03/06/2016 11:38 PM, Fillmore wrote:






Pythonic love

2016-03-07 Thread Fillmore


learning Python from Perl here. Want to do things as Pythonically as possible.

I am reading a TSV, but need to skip the first 5 lines. The following
works, but I wonder if there's a more Pythonic way to do things. Thanks


ctr = 0
with open(prfile,mode="rt",encoding='utf-8') as pfile:
    for line in pfile:
        ctr += 1

        if ctr < 5:
            continue

        allVals = line.strip().split("\t")
        print(allVals)
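A more idiomatic way to skip a fixed number of leading lines is `itertools.islice`, which drops them without a manual counter. Note that the counter version skips only four lines: `ctr` is already 5 when the fifth line is tested. The sketch below skips five; `read_tsv` is a hypothetical name.

```python
import itertools


def read_tsv(path, skip=5):
    """Yield the tab-separated fields of each line after the first `skip` lines."""
    with open(path, mode="rt", encoding="utf-8") as pfile:
        for line in itertools.islice(pfile, skip, None):
            yield line.strip().split("\t")
```

Usage would be `for allVals in read_tsv(prfile): print(allVals)`.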


breaking out of outer loops

2016-03-07 Thread Fillmore


I must be missing something simple because I can't find a way to break 
out of a nested loop in Python.


Is there a way to label loops?

For the record, here's a Perl script of mine I'm trying to port...there 
may be 'malformed' lines in a TSV file I'm parsing that are better 
discarded than fixed.


my $ctr = 0;
OUTER:
while($line = ) {

    $ctr++;
    if ($ctr < 5) {next;}

    my @allVals  = split /\t/,$line;

    my $newline;
    foreach my $i (0..$#allVals) {

        if ($i == 0) {
            if ($allVals[0] =~ /[^[:print:]]/) {next OUTER;}

            $newline =  $allVals[0];
        }

        if (defined $headers{$i}) {

            #if column not a number, skip line
            if ($allVals[$i+1] !~ /^\d+$/) {next OUTER;}

            $newline .= "\t".$allVals[$i+1];
        }
    }
    print $newline."\n";

}
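Following Ian Kelly's suggestion in this thread (structure the code so that `return` does what you want), the inner loop can move into a function where `return None` plays the role of `next OUTER`. This is a sketch of a port under assumptions: the Perl `%headers` hash is not shown in the post, so `headers` here is a hypothetical set of column indices, and `str.isprintable()` stands in for `[^[:print:]]` but may treat non-ASCII characters differently from Perl's POSIX class.

```python
import re

DIGITS = re.compile(r"[0-9]+")


def clean_line(line, headers):
    """Rebuild one TSV line, or return None if it is malformed (Perl: next OUTER).

    `headers` is a hypothetical set of column indices standing in for the
    Perl %headers hash, which the original post does not show.
    """
    all_vals = line.rstrip("\n").split("\t")
    if not all_vals[0].isprintable():
        return None
    fields = [all_vals[0]]
    for i in range(len(all_vals)):
        if i in headers:
            # Missing or non-numeric next column: skip the whole line.
            if i + 1 >= len(all_vals) or not DIGITS.fullmatch(all_vals[i + 1]):
                return None
            fields.append(all_vals[i + 1])
    return "\t".join(fields)


def process(lines, headers, skip=4):
    """Skip the first `skip` lines (the Perl counter skips 4), print the rest."""
    for n, line in enumerate(lines, start=1):
        if n <= skip:
            continue
        cleaned = clean_line(line, headers)
        if cleaned is not None:
            print(cleaned)
```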


Re: Pythonic love

2016-03-07 Thread Fillmore

On 3/7/2016 6:03 PM, [email protected] wrote:


On a side note, your "with open..." line uses inconsistent quoting.
You have "" on one string, but '' on another.

Thanks. I'll make sure I flog myself three times later tonight...





Re: breaking out of outer loops

2016-03-07 Thread Fillmore

On 3/7/2016 6:17 PM, Ian Kelly wrote:

On Mon, Mar 7, 2016 at 4:09 PM, Fillmore  wrote:


I must be missing something simple because I can't find a way to break out
of a nested loop in Python.

Is there a way to label loops?


No, you can't break out of nested loops,


wow...this is a bit of a WTF moment to me :(


apart from structuring your
code such that return does what you want.



Can you elaborate? apologies, but I'm new to python and trying to find 
my way out of perl




Re: breaking out of outer loops

2016-03-07 Thread Fillmore

On 3/7/2016 6:29 PM, Rob Gaddi wrote:


You're used to Perl, you're used to exceptions being A Thing.  This is
Python, and exceptions are just another means of flow control.

class MalformedLineError(Exception): pass

for line in file:
    try:
        for part in line.split('\t'):
            if thispartisbadforsomereason:
                raise MalformedLineError()
            otherwisewedothestuff
    except MalformedLineError:
        pass



I am sure you are right, but adapting this thing here into something
that is a fix to my problem seems like abusing my 'system 2' (for those
who read a certain book by a guy called Daniel Kahneman :)




Re: breaking out of outer loops

2016-03-07 Thread Fillmore

On 3/7/2016 6:09 PM, Fillmore wrote:


I must be missing something simple because I can't find a way to break
out of a nested loop in Python.


Thanks to everyone who has tried to help so far. I suspect this may be a 
case where I just need to get my head around a new paradigm






Re: breaking out of outer loops

2016-03-07 Thread Fillmore

On 3/7/2016 7:08 PM, Chris Angelico wrote:



Yep, which is why we're offering a variety of new paradigms. Because
it's ever so much easier to get your head around three than one!

We are SO helpful, guys. So helpful. :)


not too dissimilarly from human languages, speaking a foreign language 
is more often than not a matter of learning to think differently...




Other difference with Perl: Python scripts in a pipe

2016-03-10 Thread Fillmore


when I put a Python script in pipe with other commands, it will refuse 
to let go silently. Any way I can avoid this?


$ python somescript.py | head -5
line 1
line 3
line 3
line 4
line 5
Traceback (most recent call last):
  File "./somescript.py", line 50, in <module>
    sys.stdout.write(row[0])
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe

thanks


Re: Other difference with Perl: Python scripts in a pipe

2016-03-10 Thread Fillmore

On 3/10/2016 4:46 PM, Ian Kelly wrote:

On Thu, Mar 10, 2016 at 2:33 PM, Fillmore  wrote:


when I put a Python script in pipe with other commands, it will refuse to
let go silently. Any way I can avoid this?


What is your script doing? I don't see this problem.

ikelly@queso:~ $ cat somescript.py
import sys

for i in range(20):
 sys.stdout.write('line %d\n' % i)


you are right. it's the with block :(

import sys
import csv

with open("somefile.tsv", newline='') as csvfile:

    myReader = csv.reader(csvfile, delimiter='\t')
    for row in myReader:

        for i in range(20):
            sys.stdout.write('line %d\n' % i)

$ ./somescript.py | head -5
line 0
line 1
line 2
line 3
line 4
Traceback (most recent call last):
  File "./somescript.py", line 12, in <module>
    sys.stdout.write('line %d\n' % i)
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>
BrokenPipeError: [Errno 32] Broken pipe



Re: Other difference with Perl: Python scripts in a pipe

2016-03-10 Thread Fillmore

On 3/10/2016 5:16 PM, Ian Kelly wrote:


Interesting, both of these are probably worth bringing up as issues on
the bugs.python.org tracker. I'm not sure that the behavior should be
changed (if we get an error, we shouldn't just swallow it) but it does
seem like a significant hassle for writing command-line
text-processing tools.


is it possible that I am the first one encountering this kind of issues?






non printable (moving away from Perl)

2016-03-10 Thread Fillmore


Here's another handy Perl regex which I am not sure how to translate to 
Python.


I use it to avoid processing lines that contain funny chars...

if ($string =~ /[^[:print:]]/) {next OUTER;}

:)
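One fairly direct translation, assuming the ASCII/C-locale meaning of `[:print:]` (space through `~`), is a negated character range used with `re.search`. Python's `re` has no POSIX `[:print:]` class, so the range is spelled out; `str.isprintable()` is the Unicode-aware alternative. Like the Perl class, this treats a tab as non-printable, so it belongs on individual fields rather than whole tab-separated lines. `keep_clean` is a hypothetical helper.

```python
import re

# Anything outside space (0x20) .. tilde (0x7E) is treated as "funny".
NON_PRINTABLE = re.compile(r"[^\x20-\x7e]")


def keep_clean(lines):
    """Yield lines (without the newline) that contain only printable ASCII."""
    for line in lines:
        line = line.rstrip("\n")
        if NON_PRINTABLE.search(line):
            continue  # Perl: next OUTER
        yield line
```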



Re: Other difference with Perl: Python scripts in a pipe

2016-03-10 Thread Fillmore

On 3/10/2016 7:08 PM, INADA Naoki wrote:


No.  I see it usually.

Python's zen says:


Errors should never pass silently.
Unless explicitly silenced.


When writing to stdout fails, Python should raise an exception.
You can silence it explicitly when it's safe:

import os

try:
    print(...)
except BrokenPipeError:
    os._exit(0)  # ends the process at once, skipping the stdout flush that would raise again



I don't like it. It makes Python not so good for command-line utilities
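The signal module documentation (its note on SIGPIPE) describes a pattern for this situation: catch BrokenPipeError, then point the stdout file descriptor at os.devnull so that the flush Python performs at exit cannot fail a second time. A minimal sketch adapted to the line-printing script above; emit_lines and silence_stdout are hypothetical names, and the exit status is a matter of taste.

```python
import os
import sys


def emit_lines(count, out=None):
    """Write `count` numbered lines to `out` (default: sys.stdout).

    Returns False if the reader on the other end of the pipe went away.
    """
    out = sys.stdout if out is None else out
    try:
        for i in range(count):
            out.write('line %d\n' % i)
        # Flush inside the try block so a closed pipe is detected here,
        # not later during interpreter shutdown.
        out.flush()
    except BrokenPipeError:
        return False
    return True


def silence_stdout():
    """Point file descriptor 1 at /dev/null so the final flush at exit succeeds."""
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, sys.stdout.fileno())


if __name__ == '__main__':
    if not emit_lines(20):
        silence_stdout()
        sys.exit(1)  # the documented example uses 1; 0 also works
```

With this in place, ./somescript.py | head -5 should print its five lines without a traceback.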


--
https://mail.python.org/mailman/listinfo/python-list


Re: non printable (moving away from Perl)

2016-03-11 Thread Fillmore

On 03/11/2016 07:13 AM, Wolfgang Maier wrote:

One lesson for Perl regex users is that in Python many things can be solved 
without regexes.
How about defining:

printable = {chr(n) for n in range(32, 127)}

then using:

if set(my_string) - printable:
    break


Seems computationally heavy. I have a file with about 70k lines, of which only 20 contain
"funny" chars.

Any idea how I can create a script that compares Perl speed vs. Python speed
in performing the cleaning operation?
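For the Python side of such a comparison, the standard timeit module can time competing checks on the same in-memory data, while the complete Perl and Python scripts can be compared from the shell with the time command. A minimal sketch, assuming a synthetic sample (70,000 lines, 20 of them carrying a control character) in place of the real file; all names are hypothetical, and no timings are shown because they depend on the machine.

```python
import re
import timeit

PRINTABLE = frozenset(chr(n) for n in range(32, 127))
NON_PRINTABLE = re.compile(r'[^\x20-\x7e]')


def has_bad_set(s):
    # Set approach: anything left over after removing printables is "funny"
    return bool(set(s) - PRINTABLE)


def has_bad_regex(s):
    # Regex approach: stops at the first non-printable character
    return NON_PRINTABLE.search(s) is not None


def make_sample(n_lines=70000, n_bad=20):
    """Synthetic stand-in for the real file: n_bad lines get a BEL (0x07)."""
    lines = ['ordinary printable text, line %d' % i for i in range(n_lines)]
    for i in range(0, n_lines, n_lines // n_bad)[:n_bad]:
        lines[i] += '\x07'
    return lines


def count_bad(lines, is_bad):
    return sum(1 for line in lines if is_bad(line))


if __name__ == '__main__':
    lines = make_sample()
    for name, check in [('set difference', has_bad_set),
                        ('regex search', has_bad_regex)]:
        seconds = timeit.timeit(lambda: count_bad(lines, check), number=5)
        print('%-15s %.3f s for 5 passes' % (name, seconds))
```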

--
https://mail.python.org/mailman/listinfo/python-list


Re: non printable (moving away from Perl)

2016-03-11 Thread Fillmore

On 3/11/2016 2:23 PM, MRAB wrote:

On 2016-03-11 00:07, Fillmore wrote:


Here's another handy Perl regex which I am not sure how to translate to
Python.

I use it to avoid processing lines that contain funny chars...

if ($string =~ /[^[:print:]]/) {next OUTER;}

:)


Python 3 (Unicode) strings have an .isprintable method:

mystring.isprintable()



My strings are UTF-8. Will it work there too?
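In Python 3, UTF-8 is handled when the file is decoded: a file opened in text mode with encoding='utf-8' yields str objects, and str.isprintable() then judges Unicode characters rather than bytes. It is broader than an ASCII [:print:] (a letter such as é counts as printable) and just as strict about control characters (tab and newline do not count). A minimal sketch; printable_lines is a hypothetical name.

```python
import io


def printable_lines(stream):
    """Yield lines (minus the newline) whose characters are all printable."""
    for line in stream:
        text = line.rstrip('\n')
        # isprintable() is True for the empty string and for non-ASCII
        # letters, False for control characters such as '\t' or '\x07'
        if text.isprintable():
            yield text


# With a real file:
#   with open('input.txt', encoding='utf-8') as f:
#       for text in printable_lines(f): ...
# Here the UTF-8 bytes are decoded through an in-memory TextIOWrapper instead.
raw = 'caf\u00e9\nbad\x07bell\nok\n'.encode('utf-8')
demo = list(printable_lines(io.TextIOWrapper(io.BytesIO(raw), encoding='utf-8')))
```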

--
https://mail.python.org/mailman/listinfo/python-list


issue with CVS module

2016-03-11 Thread Fillmore


I have a TSV file containing a few strings like this (double quotes are 
part of the string):


'"pragma: CacheHandler=08616B7E907744E026C9F044250EA55844CCFD52"'

After Python and the csv module have read the file and re-printed the
value, the string has become:


'pragma: CacheHandler=08616B7E907744E026C9F044250EA55844CCFD52'

which is NOT good for me. I went back to Perl and noticed that Perl was 
correctly leaving the original string intact.


This is what I am using to read the file:


with open(file, newline='') as csvfile:

    myReader = csv.reader(csvfile, delimiter='\t')
    for row in myReader:

And this is what I use to write the cell value:

sys.stdout.write(row[0])

Is there some directive I can give the csv reader to tell it to stop
screwing with my text?


Thanks
--
https://mail.python.org/mailman/listinfo/python-list


Re: issue with CVS module

2016-03-11 Thread Fillmore

On 3/11/2016 3:05 PM, Joel Goldstick wrote:


Enter the python shell.  Import csv

then type help(csv)

It is highly configurable



Possibly, but I am having a hard time letting it know that it should 
leave each and every char alone, ignore quoting and just handle strings 
as strings. I tried playing with the quoting related parameters, to no 
avail:


Traceback (most recent call last):
  File "./myscript.py", line 47, in <module>
    myReader = csv.reader(csvfile, delimiter='\t',quotechar='')
TypeError: quotechar must be set if quoting enabled

I tried adding csv.QUOTE_NONE, but things get messy :(

Traceback (most recent call last):
  File "./myscript.py", line 64, in <module>
    sys.stdout.write("\t"+row[h])
IndexError: list index out of range

Sorry for being a pain, but I am porting from Perl, and split
/\t/,$line; was doing the job for me. Maybe I should go back to splitting on
'\t' in Python too...
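If you do fall back to splitting by hand, a close Python counterpart of Perl's split /\t/, $line on a chomped line is sketched below. One difference worth knowing: Perl's split discards trailing empty fields by default, while Python's str.split keeps them, so the sketch trims them to match. The function name perl_tab_split is hypothetical.

```python
def perl_tab_split(line):
    r"""Approximate Perl's  chomp $line; split /\t/, $line  in Python."""
    if line.endswith('\n'):
        line = line[:-1]  # like chomp: drop one trailing newline
    fields = line.split('\t')
    # Perl's split drops trailing empty fields by default; str.split keeps them
    while fields and fields[-1] == '':
        fields.pop()
    return fields


# Quotes are ordinary characters here, so "abc" keeps its double quotes.
```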


--
https://mail.python.org/mailman/listinfo/python-list


Re: issue with CVS module

2016-03-11 Thread Fillmore

On 3/11/2016 2:41 PM, Fillmore wrote:

Is there some directive I can give CVS reader to tell it to stop
screwing with my text?


OK, I think I reproduced my problem at the REPL:

>>> import csv
>>> s = '"Please preserve my doublequotes"\ttext1\ttext2'
>>> reader = csv.reader([s], delimiter='\t')
>>> for row in reader:
...     print(row[0])
...
Please preserve my doublequotes
>>>

:(

How do I instruct the reader to preserve my doublequotes?

As an aside, split() does the job correctly:

>>> allVals = s.split("\t")
>>> print(allVals[0])
"Please preserve my doublequotes"
>>>


--
https://mail.python.org/mailman/listinfo/python-list


Re: issue with CVS module

2016-03-11 Thread Fillmore

On 3/11/2016 4:14 PM, MRAB wrote:



>>> import csv
>>> s = '"Please preserve my doublequotes"\ttext1\ttext2'
>>> reader = csv.reader([s], delimiter='\t', quotechar=None)
>>> for row in reader:
...     print(row[0])
...
"Please preserve my doublequotes"
>>>



This worked! Thank you, MRAB.
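As far as I can tell, quotechar=None works because the csv module then switches quoting off. Spelling that out with quoting=csv.QUOTE_NONE makes the intent explicit: with quoting disabled, the reader treats double quotes as ordinary characters. A minimal sketch on the REPL example above; read_tsv is a hypothetical helper name.

```python
import csv
import io


def read_tsv(stream):
    """Return a reader yielding tab-separated rows, with quotes kept verbatim."""
    # QUOTE_NONE: the reader gives '"' no special meaning, so a field such
    # as "abc" keeps its surrounding double quotes, as Perl's split would.
    return csv.reader(stream, delimiter='\t', quoting=csv.QUOTE_NONE)


# With a real file, open it with newline='' as in the original script:
#   with open('somefile.tsv', newline='') as f:
#       for row in read_tsv(f): ...
sample = io.StringIO('"Please preserve my doublequotes"\ttext1\ttext2\n')
rows = list(read_tsv(sample))
```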


--
https://mail.python.org/mailman/listinfo/python-list


Re: issue with CVS module

2016-03-11 Thread Fillmore

On 3/11/2016 4:15 PM, Mark Lawrence wrote:


https://docs.python.org/3/library/csv.html#csv.Dialect.doublequote



Thanks, but as far as I understand, my TSV is not using any particular
dialect...


Thank you, anyway
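For what it's worth, csv.reader always uses a dialect: when none is passed it falls back to 'excel', whose quotechar is a double quote and whose quoting is QUOTE_MINIMAL, which is why the quotes were being consumed. A minimal sketch that inspects those defaults and registers a named dialect for quote-free TSV; the name 'raw-tsv' is hypothetical.

```python
import csv
import io

# The defaults csv.reader falls back to when no dialect is passed
excel = csv.get_dialect('excel')
print(repr(excel.delimiter), repr(excel.quotechar),
      excel.quoting == csv.QUOTE_MINIMAL)

# A named dialect for tab-separated data with no quote processing
csv.register_dialect('raw-tsv', delimiter='\t', quoting=csv.QUOTE_NONE)

rows = list(csv.reader(io.StringIO('"kept"\tx\n'), dialect='raw-tsv'))
```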


--
https://mail.python.org/mailman/listinfo/python-list


Re: issue with CVS module

2016-03-11 Thread Fillmore

On 3/11/2016 2:41 PM, Fillmore wrote:


I have a TSV file containing a few strings like this (double quotes are
part of the string):



A big thank you to everyone who helped with this and with other 
questions. My porting of one of my Perl scripts to Python is over now 
that the two scripts produce virtually the same result:


$ wc -l test2.txt test3.txt
   70823 test2.txt
   70822 test3.txt
  141645 total
$ diff test2.txt test3.txt
69351d69350
<

There's only an extra empty line at the bottom that I'll leave as a tip
to Perl ;)


It was instructive.



--
https://mail.python.org/mailman/listinfo/python-list


argparse

2016-03-11 Thread Fillmore


Playing with ArgumentParser. I can't find a way to override the -h and 
--help options so that it provides my custom help message.


  -h, --help show this help message and exit

Here is what I am trying:

parser = argparse.ArgumentParser("csresolver.py", add_help=False)
parser.add_argument("-h", "--help",
                    help="USAGE:  | myscript.py [-exf Exception File]")

parser.add_argument("-e", "--ext", type=str,
                    help="Exception file")
args = parser.parse_args()


The result:

$ ./myscript.py -h
usage: myscript.py [-h HELP] [-e EXT]
csresolver.py: error: argument -h/--help: expected one argument

Am I missing something obvious?


--
https://mail.python.org/mailman/listinfo/python-list


Re: argparse

2016-03-11 Thread Fillmore

On 3/11/2016 6:26 PM, Larry Martell wrote:

am I missing something obvious?


https://docs.python.org/2/library/argparse.html#usage



you rock!
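Two things seem to be going on in the example above: usage= (the parameter behind the link) replaces the generated usage line, and the -h error comes from re-adding -h/--help without an action, so argparse falls back to storing a value and expects one argument. Re-adding it with action='help' keeps it a plain flag that prints the help and exits. A minimal sketch; the usage text 'some_command | myscript.py ...' is a hypothetical placeholder.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog='myscript.py',
        # Hypothetical placeholder text; replaces the generated "usage:" line
        usage='some_command | myscript.py [-e EXCEPTION_FILE]',
        add_help=False,  # -h/--help is re-added below with custom text
    )
    # action='help' makes -h/--help a flag (no value expected) that prints
    # the help message and exits
    parser.add_argument('-h', '--help', action='help',
                        help='show this custom help message and exit')
    parser.add_argument('-e', '--ext', type=str, help='exception file')
    return parser


# Typical use in the script:
#   args = build_parser().parse_args()
```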



--
https://mail.python.org/mailman/listinfo/python-list


Perl to Python again

2016-03-11 Thread Fillmore


So, now I need to split a string in a way that the first element goes 
into a string and the others in a list:


while($line = ) {

my ($s,@values)  = split /\t/,$line;

I am trying with:

for line in sys.stdin:
    s,values = line.strip().split("\t")
    print(s)

but no luck:

ValueError: too many values to unpack (expected 2)

What's the elegant Python way to achieve this?
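Python 3's extended iterable unpacking (PEP 3132) is the closest match to Perl's my ($s, @values) = ...: a starred target collects whatever is left into a list. A minimal sketch; rstrip('\n') is used instead of strip() because strip() also removes leading and trailing tabs, which would silently drop empty first or last fields. The name split_first is hypothetical.

```python
def split_first(line):
    r"""Python analogue of Perl's  my ($s, @values) = split /\t/, $line;"""
    s, *values = line.rstrip('\n').split('\t')
    return s, values


# Typical use:
#   for line in sys.stdin:
#       s, values = split_first(line)
#       print(s)
```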

Thanks
--
https://mail.python.org/mailman/listinfo/python-list


Re: Perl to Python again

2016-03-11 Thread Fillmore

On 3/11/2016 7:12 PM, Martin A. Brown wrote:


Aside from your csv question today, many of your questions could be
answered by reading through the manual documenting the standard
datatypes (note, I am assuming you are using Python 3).



Are you accusing me of being lazy?

If that's your accusation, then guilty as charged, but

--
https://mail.python.org/mailman/listinfo/python-list


Re: Perl to Python again

2016-03-12 Thread Fillmore

On 03/12/2016 04:40 AM, alister wrote:

On Fri, 11 Mar 2016 19:15:48 -0500, Fillmore wrote:
I'm not sure you were being accused of being lazy as such; rather, you were
being given the suggestion that there are other places where you can find
these answers, and that they are probably better for a number of reasons:

1) Speed: you don't have to wait for someone to reply, although I hope you
are continuing your research whilst waiting.

2) Accuracy: I have not seen it here, but there are some people who would
consider it fun to provide an incorrect or dangerous solution to someone
they thought was asking too basic a question.

3) Collateral learning: whilst looking for the solution, it is highly
likely that you will unearth other information that answers questions you
have yet to raise.


Alister, you are right, of course. The reality is that I discovered this trove of
a newsgroup and I am rather shamelessly taking advantage of it.
Rest assured that I cross-check and learn what is associated with each
and every answer I get, so nothing is wasted. Hopefully your answers
are also useful to others who may find them at a later stage through Google
Groups.

Also very important, I am very grateful for the support I am getting from you
and from others.



--
https://mail.python.org/mailman/listinfo/python-list


Re: retrieve key of only element in a dictionary (Python 3)

2016-03-18 Thread Fillmore


OK, this seems to do the trick, but boy, is it a lot of code. Anything more
Pythonic?

>>> l = list(d.items())
>>> l
[('squib', '007')]
>>> l[0]
('squib', '007')
>>> l[0][0]
'squib'
>>>


On 03/18/2016 05:33 PM, Fillmore wrote:


I must be missing something simple, but...

Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
 >>> d = dict()
 >>> d['squib'] = "007"
 >>> # I forget that 'squib' is my key to retrieve the only element in d
...
 >>> type(d.items())
<class 'dict_items'>
 >>> key = d.items()[0]
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: 'dict_items' object does not support indexing
 >>> key,_ = d.items()
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
ValueError: need more than 1 value to unpack
 >>> key,b = d.items()
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
ValueError: need more than 1 value to unpack
 >>> print(d.items())
dict_items([('squib', '007')])
 >>> print(d.items()[0])
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: 'dict_items' object does not support indexing
 >>>

What am I missing? I don't want to iterate over the dictionary.
I know that there's only one element and I need to retrieve the key.

thanks


--
https://mail.python.org/mailman/listinfo/python-list


retrieve key of only element in a dictionary (Python 3)

2016-03-19 Thread Fillmore


I must be missing something simple, but...

Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> d = dict()
>>> d['squib'] = "007"
>>> # I forget that 'squib' is my key to retrieve the only element in d
...
>>> type(d.items())
<class 'dict_items'>
>>> key = d.items()[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dict_items' object does not support indexing
>>> key,_ = d.items()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 1 value to unpack
>>> key,b = d.items()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 1 value to unpack
>>> print(d.items())
dict_items([('squib', '007')])
>>> print(d.items()[0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dict_items' object does not support indexing
>>>

What am I missing? I don't want to iterate over the dictionary.
I know that there's only one element and I need to retrieve the key.

thanks
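Iterating a dict yields its keys, so a one-element unpacking target retrieves the only key without writing a loop, and it raises ValueError if the dict does not hold exactly one item. next(iter(d)) is an alternative that simply takes the first key with no length check. A minimal sketch; the helper names only_key and only_item are hypothetical.

```python
def only_key(d):
    """Return the key of a dict expected to hold exactly one item."""
    # Unpacking iterates over the keys; the one-element target raises
    # ValueError if there are zero keys or more than one.
    (key,) = d
    return key


def only_item(d):
    """Return the (key, value) pair of a single-item dict."""
    (item,) = d.items()
    return item


d = {'squib': '007'}
print(only_key(d))     # squib
print(only_item(d))    # ('squib', '007')
print(next(iter(d)))   # squib
```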
--
https://mail.python.org/mailman/listinfo/python-list