os.system and subprocess odd behavior
Example of the issue, for argument's sake:
Platform: Ubuntu Server 12.04 LTS, Python 2.7
Say file1.txt has "hello world" in it.
subprocess.Popen("cat < file1.txt > file2.txt", shell=True)
subprocess.call("cat < file1.txt > file2.txt", shell=True)
os.system("cat < file1.txt > file2.txt")
I'm finding that file2 IS created, but with 0 bytes in it. This happens whenever
I try any sort of command of this nature, where I'm redirecting the output
into a file.
I've made sure it isn't a permission issue. The command runs fine from the
command line, and Python is being run with superuser privileges. Straight from
the terminal I get a "hello world" copy as file2, as expected.
I would like Python to simply exec the command and move on; I don't want to read
the stdout etc. into Python and write it to a file myself. Any thoughts as
to why this creates file2, but no data appears? Is there a better way to do
this?
Thank you!
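For reference, the shell redirection can be sidestepped entirely by letting Python own the file handles and pass them to the child process; a minimal sketch (the file names here are just placeholders, not from a real job):

```python
import subprocess

# create a small input file for the demo
with open('file1.txt', 'w') as f:
    f.write('hello world\n')

# Let Python open the files and hand them to the child as its
# stdin/stdout, instead of relying on the shell's "<" and ">".
with open('file1.txt', 'rb') as fin, open('file2.txt', 'wb') as fout:
    rc = subprocess.call(['cat'], stdin=fin, stdout=fout)
```

This avoids shell=True altogether, so there is no intermediate shell whose permissions or exit status can get in the way.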
--
http://mail.python.org/mailman/listinfo/python-list
Re: os.system and subprocess odd behavior
Thanks! I am using .txt extensions. Sorry for being a little vague.
Re: os.system and subprocess odd behavior
Thanks for verifying this for me, Steven. I'm glad you are seeing it work. It's really the strangest thing. The issue seems to be with the " > outfile.txt" portion of the command. The actual command runs a query on a columnar DB and dumps the result. The EXACT command run from the command line runs just fine. Even if I use the simple cat command and an output file as just a simple test case, the file is created with zero bytes (see below). It's as if Python moves on, or gets a 0 exit code, after the first part of the command is executed, and no data is written.

-rw-r--r-- 1 root root 0 Dec 14 15:33 QUAD_12142012203251.TXT

Any thoughts as to why this may happen on my end? Thanks again!
Re: os.system and subprocess odd behavior
Oscar, it seems you may be correct. I need to run this program as a superuser. However, after some more tests with simple commands, everything seems to work correctly from any permission level in Python EXCEPT the command that writes output from the database to a file, which runs fine if I paste it into the command line. Also, subprocess.check_call() returns clean; however, nothing is written to the output file when called from Python.

So this command runs great from the command line (sudo or not), although the output file in this case is owned by the sysadmin either way, not root:

/usr/local/Calpont/mysql/bin/mysql --defaults-file=/usr/local/Calpont/mysql/my.cnf -u root myDB < /home/myusr/jobs/APP_JOBS/JOB_XXX.SQL > /home/myusr/jobs/APP_JOBS/JOB_XXX.TXT

When run from sudo python (other files are also created and owned by root correctly), no output is written from the db command: a zero-byte file only (owned by root), and it returns to Python with no errors. I'm sort of at a loss. I'd still rather avoid having Python connect to the db directly or read the data from stdout; it's a waste of memory and time for what I need. Thanks for any more thoughts.

> Because of the root permissions on the file? What happens if you write
> to a file that doesn't need privileged access?
>
> Instead of running the "exact command", run the cat commands you
> posted (that Steven has confirmed as working) and run them somewhere
> in your user directory without root permissions.
>
> Also you may want to use subprocess.check_call as this raises a Python
> error if the command returns an error code.
>
> Oscar
Re: os.system and subprocess odd behavior
Oscar, I can confirm this behavior from the terminal.
AND this works as well, simulating exactly what I'm doing permissions-wise, and
calling sudo python test.py below:
from subprocess import Popen

f1 = open('TESTDIR/file1.txt', 'w')
f1.write('some test here\n')
f1.close()

cmd1 = 'cat < TESTDIR/file1.txt > TESTDIR/file2.txt'
P = Popen(cmd1, shell=True)
P.wait()

cmd2 = 'cat < TESTDIR/file1.txt | sudo tee TESTDIR/file3.txt'
P = Popen(cmd2, shell=True)
P.wait()
-rw-r--r-- 1 root root 15 Dec 18 12:57 file1.txt
-rw-r--r-- 1 root root 15 Dec 18 12:57 file2.txt
-rw-r--r-- 1 root root 15 Dec 18 12:57 file3.txt
HOWEVER...
when using this command from before, no dice:
/usr/local/Calpont/mysql/bin/mysql
--defaults-file=/usr/local/Calpont/mysql/my.cnf -u root myDB <
/home/myusr/jobs/APP_JOBS/JOB_XXX.SQL > /home/myusr/jobs/APP_JOBS/JOB_XXX.TXT
OR
/usr/local/Calpont/mysql/bin/mysql
--defaults-file=/usr/local/Calpont/mysql/my.cnf -u root myDB <
/home/myusr/jobs/APP_JOBS/JOB_XXX.SQL | sudo tee
/home/myusr/jobs/APP_JOBS/JOB_XXX.TXT
So it's basically as if Python gets a response instantly (perhaps from the
query) and closes the process, since we've verified it's not permissions-related.
Perhaps someone can try a mysql command line such as the above within Python, and
see if you can verify this behavior? I believe the query returning with no errors
is shutting the subshell/process.
I've tried this with all options, p.wait() etc., as well as parsing the command
and running with shell=False.
Again, the exact command runs perfectly when pasted and run from the shell. I'll
try running it a few other ways with some different db options.
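One more variation worth trying (a sketch, not something from the thread): build the pipeline in Python so the shell's redirection, and whichever user the shell runs as, drops out of the picture entirely. `echo` stands in for the real command here:

```python
import subprocess

# Run the producer with its stdout captured by Python...
producer = subprocess.Popen(['echo', 'some test here'],
                            stdout=subprocess.PIPE)
out, _ = producer.communicate()

# ...and write the result with whatever permissions this Python
# process has, instead of whatever an intermediate shell had.
with open('file3.txt', 'wb') as f:
    f.write(out)
```

If the producer writes its data but the redirected file stays empty, this form will show the data in `out`, which narrows the problem down to the redirection step.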
> Follow through the bash session below
>
> $ cd /usr
> $ ls
> bin games include lib local sbin share src
> $ touch file
> touch: cannot touch `file': Permission denied
> $ sudo touch file
> [sudo] password for oscar:
> $ ls
> bin file games include lib local sbin share src
> $ cat < file > file2
> bash: file2: Permission denied
> $ sudo cat < file > file2
> bash: file2: Permission denied
> $ sudo cat < file | tee file2
> tee: file2: Permission denied
> $ sudo cat < file | sudo tee file2
> $ ls
> bin file file2 games include lib local sbin share src
>
> The problem is that when you do
>
> $ sudo cmd > file2
>
> it is sort of like doing
>
> $ sudo cmd | this_bash_session > file2
>
> so the permissions used to write to file2 are the same as the bash
> session rather than the command cmd, which has root permissions. By
> piping my output into "sudo tee file2" I can get file2 to be written
> by a process that has root permissions.
>
> I suspect you have the same problem, although it is all complicated by the
> fact that everything is a subprocess of Python. Is it possibly the
> case that the main Python process does not have root permissions but
> you are using it to run a command with sudo that then does have root
> permissions?
>
> Does piping through something like "sudo tee" help?
>
> Oscar
Re: os.system and subprocess odd behavior
Solved the issue by injecting the query into the command line. A shell script worked fine, as if I was cutting and pasting to the prompt. There still seems to be something with the subprocess receiving an exit code before or when the query finishes, but only when I ask it to read from the .SQL file.

Example, called from within Python:

mysql < file.txt > out.txt    <- doesn't work (query is run, 0-byte output)
mysql -e "my query" > out.txt <- does work

However, this isn't standard mysql, as it's InfiniDB, so maybe this is an esoteric issue. Thanks for the help, Oscar. It's frustrating, since it seems illogical: if the command runs in the shell, it should have the exact same behavior from a subprocess shell=True command-string call. If I find anything else, I'll update this.
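The working `-e` form can be driven from Python by reading the .SQL file yourself and passing its text as an argument; a sketch with `echo` standing in for the mysql binary (the file names are placeholders):

```python
import subprocess

# Stand-in query file; in the real case this would be JOB_XXX.SQL.
with open('job.sql', 'w') as f:
    f.write('SELECT 1')

# Read the query text ourselves, then pass it as an argument
# (the working "mysql -e" form) rather than via "< job.sql".
query = open('job.sql').read()

# 'echo' stands in for the real binary here; with mysql it would be
# ['mysql', '--defaults-file=...', '-u', 'root', 'myDB', '-e', query].
with open('job_out.txt', 'wb') as out:
    rc = subprocess.call(['echo', query], stdout=out)
```

This keeps the "inject the query into the command line" fix while letting Python handle the output file, so no shell redirection is involved at all.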
pytables - best practices / mem leaks
I have an H5 file with one group (off the root) and two large main
tables, and I'm attempting to aggregate my data into 50+ new groups (off
the root) with two tables per subgroup.
sys info:
PyTables version: 1.3.2
HDF5 version: 1.6.5
numarray version: 1.5.0
Zlib version: 1.2.3
BZIP2 version: 1.0.3 (15-Feb-2005)
Python version:2.4.2 (#1, Jul 13 2006, 20:16:08)
[GCC 4.0.1 (Apple Computer, Inc. build 5250)]
Platform: darwin-Power Macintosh (v10.4.7)
Byte-ordering: big
Ran all pytables tests included with the package and received an OK.
Using the following code I get one of three errors:
1. Illegal Instruction
2. Malloc(): trying to call free() twice
3. Bus Error
I believe all three stem from the same issue, involving a malloc()
memory problem in the PyTables C libraries. I also believe this may be
due to how I'm attempting to write my sorting script.
The script executes fine and all goes well until I'm sorting about
group 20 to 30, when I throw one of the three above errors, depending on
how/when I flush() and close() the file. When I open the file after the
error using h5ls, all tables are in perfect order up to the crash, and if
I continue from that point everything runs fine until Python throws the
same error again after another 10 sorts or so. The somewhat random
crashing is what leads me to believe I have a memory leak, or that my
method of doing this is incorrect.
Is there a better way to aggregate data using pytables/python? Is there
a better way to be doing this? This seems straightforward enough.
Thanks,
Conor
# function to agg state data from main neg/pos tables into neg/pos state tables
import string
import tables

def aggstate(state, h5file):
    print state

    class PosRecords(tables.IsDescription):
        sic = tables.IntCol(0, 1, 4, 0, None, 0)
        numsic = tables.IntCol(0, 1, 4, 0, None, 0)
        empsiz = tables.StringCol(1, '?', 1, None, 0)
        salvol = tables.StringCol(1, '?', 1, None, 0)
        popcod = tables.StringCol(1, '?', 1, None, 0)
        state = tables.StringCol(2, '?', 1, None, 0)
        zip = tables.IntCol(0, 1, 4, 0, None, 1)

    class NegRecords(tables.IsDescription):
        sic = tables.IntCol(0, 1, 4, 0, None, 0)
        numsic = tables.IntCol(0, 1, 4, 0, None, 0)
        empsiz = tables.StringCol(1, '?', 1, None, 0)
        salvol = tables.StringCol(1, '?', 1, None, 0)
        popcod = tables.StringCol(1, '?', 1, None, 0)
        state = tables.StringCol(2, '?', 1, None, 0)
        zip = tables.IntCol(0, 1, 4, 0, None, 1)

    group1 = h5file.createGroup("/", state+"_raw_records", state+" raw records")
    table1 = h5file.createTable(group1, "pos_records", PosRecords, state+" raw pos record table")
    table2 = h5file.createTable(group1, "neg_records", NegRecords, state+" raw neg record table")

    table = h5file.root.raw_records.pos_records
    point = table1.row
    for x in table.iterrows():
        if x['state'] == state:
            point['sic'] = x['sic']
            point['numsic'] = x['numsic']
            point['empsiz'] = x['empsiz']
            point['salvol'] = x['salvol']
            point['popcod'] = x['popcod']
            point['state'] = x['state']
            point['zip'] = x['zip']
            point.append()
    h5file.flush()

    table = h5file.root.raw_records.neg_records
    point = table2.row
    for x in table.iterrows():
        if x['state'] == state:
            point['sic'] = x['sic']
            point['numsic'] = x['numsic']
            point['empsiz'] = x['empsiz']
            point['salvol'] = x['salvol']
            point['popcod'] = x['popcod']
            point['state'] = x['state']
            point['zip'] = x['zip']
            point.append()
    h5file.flush()

states = ['AL','AK','AZ','AR','CA','CO','CT','DC','DE','FL','GA','HI','ID','IL','IN','IA','KS','KY','LA','ME','MD','MA','MI','MN','MS','MO','MT','NE','NV','NH','NJ','NM','NY','NC','ND','OH','OK','OR','PA','RI','SC','SD','TN','TX','UT','VT','VA','WA','WV','WI','WY']

h5file = tables.openFile("200309_data.h5", mode='a')
for i in xrange(len(states)):
    aggstate(states[i], h5file)
h5file.close()
run a string as code?
How can you make Python interpret a string (of Python code) as code? For example, if you want a Python program to modify itself as it runs. I know this is an advantage of interpreted languages; how is this done in Python? Thanks.
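For the record, Python has three levels of this: eval() for expressions, exec for statements, and compile() for reusable code objects. A quick sketch:

```python
# eval: evaluate a string containing a single expression
assert eval('2 + 3') == 5

# exec: run a string containing arbitrary statements,
# here into an explicit namespace dict
namespace = {}
exec('x = [i * i for i in range(5)]', namespace)
assert namespace['x'] == [0, 1, 4, 9, 16]

# compile: parse once, run many times
code = compile('x * 2', '<string>', 'eval')
assert eval(code, {'x': 21}) == 42
```

Both eval and exec run whatever code is in the string, so they should never be fed untrusted input.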
Re: run a string as code?
[EMAIL PROTECTED] wrote:
> py_genetic wrote:
> > How can you make python interpret a string (of py code) as code. For
> > example if you want a py program to modify itself as it runs. I know
> > this is an advantage of interpreted languages, how is this done in
> > python. Thanks.
>
> This might do it...
>
> >>> print eval.__doc__
> eval(source[, globals[, locals]]) -> value
>
> Evaluate the source in the context of globals and locals.
> The source may be a string representing a Python expression
> or a code object as returned by compile().
> The globals must be a dictionary and locals can be any mapping,
> defaulting to the current globals and locals.
> If only globals is given, locals defaults to it.
For example, each time this line is interpreted I would like to use the
new value of the state var, which is a global var. How can I force
state to be identified and used in this string?

r_table = h5file.root.state_raw_records.neg_records

r_table = eval("h5file.root.state_raw_records.neg_records") ??
r_table = h5file.root.eval("state")_raw_records.neg_records ?? (eval is
not a part of root)

I don't think either of these is very logical. Any ideas? Possibly the
parser mod?
Re: run a string as code?
py_genetic wrote:
> [earlier eval discussion snipped]
Got it!
tmp = "h5file.root."+state+"_raw_records.pos_records"
r_table = eval(tmp)
works great, thanks for the help!
Re: run a string as code?
Gary Herron wrote:
> py_genetic wrote:
> > [earlier eval discussion snipped]
> >
> > Got it!
> >
> > tmp = "h5file.root."+state+"_raw_records.pos_records"
> > r_table = eval(tmp)
> >
> > works great thanks for the help!
> >
> Yes, it works, but this is not a good place to use eval. Now that we see
> how you want to use it, we can find a *much* better way to do it.
>
> If you want to lookup an attribute of an object, but the attribute name
> is a string in a variable, then use getattr to do the lookup.
>
> If I interpret your code correctly:
>
> attrname = state + "_raw_records"
> obj = getattr(h5file.root, attrname)
> r_table = obj.pos_records
>
> These, of course, could be combined into a single (but not necessarily
> clearer) line.
>
> Gary Herron
So eval() is more appropriate when evaluating blocks of string
code, and getattr() is more efficient for dealing with objects such as
the h5file object above? Thanks.
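To make the difference concrete, here is a small sketch; the Dummy class is just a stand-in for the h5file object, not part of PyTables:

```python
class Dummy(object):
    pass

# build a stand-in for h5file.root with a CA_raw_records attribute
root = Dummy()
ca = Dummy()
ca.pos_records = ['row1', 'row2']
setattr(root, 'CA_raw_records', ca)

state = 'CA'

# getattr: attribute lookup by string name -- no parsing, and no
# chance of executing arbitrary code embedded in the string
r_table = getattr(root, state + '_raw_records').pos_records
```

Beyond efficiency, getattr also avoids the security problem: eval will happily run any expression that ends up in the string, while getattr can only look up an attribute.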
Re: Create a new class on the fly
Alex, thanks for the advice:

> > class PosRecords(tables.IsDescription):
> > class A(object):
> > self.__init__(self, args):
>
> This makes 0 sense; maybe you should learn elementary Python syntax well
> _before_ trying advanced stuff, no?

I accidentally left that erroneous snippet in. However, if you're offering a class in smart-ass, let me know where to sign up.
Efficient way of generating original alphabetic strings like unix file "split"
Hi, I'm looking to generate x alphabetic strings in a list of size x. This is exactly the same output that the unix command "split" generates as default file-name output when splitting large files.

Example: produce x original, but not random, strings from the English alphabet, all lowercase. The length of each string and the possible combinations depend on x. You don't want any repeats.

[aaa, aab, aac, aad, aax, .. bbc, bbd, bcd]

I'm assuming there is a slick, Pythonic way of doing this, besides writing out a beast of a looping function. I've looked around on the ActiveState cookbook, but have come up empty-handed. Any suggestions?

Thanks,
Conor
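itertools makes this nearly a one-liner; a sketch of split-style fixed-width names (width 3 here, like aaa, aab, ...), generated lazily so large counts don't build the full list:

```python
import itertools
import string

def split_names(width, count):
    """First `count` fixed-width lowercase names, in sorted order."""
    letters = string.ascii_lowercase
    names = (''.join(t) for t in
             itertools.product(letters, repeat=width))
    return list(itertools.islice(names, count))

names = split_names(3, 5)   # ['aaa', 'aab', 'aac', 'aad', 'aae']
```

itertools.product yields tuples in lexicographic order, which is exactly the aaa, aab, aac sequence split uses, with no repeats by construction.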
Re: Efficient way of generating original alphabetic strings like unix file "split"
> You didn't try hard enough. :)
>
> http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/190465
>
> --
> HTH,
> Rob

Thanks Rob, "permutation" was the keyword I should have used!
Re: Efficient way of generating original alphabetic strings like unix file "split"
On Jun 14, 3:02 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote:
> See my other post to see if that is indeed what you mean.

Thanks, mensanator. I see what you are saying, and I appreciate the clarification. I modified the unique version to fit my needs; sometimes you just want the first x unique combinations, of the right "width" (A or AA or AAA...), so I reworked it a bit to be more efficient.

Isn't this a case of base^n - 1 for the number of unique combinations? Using the alphabet: 26^strlen - 1. Or, to figure out strlen from the number of combinations needed: ln(26 * #combinations needed) / ln(26). Obviously a float, but a pretty good idea of the strlen needed when rounded?
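As a sanity check on that arithmetic: there are exactly 26**n names of length n over a 26-letter alphabet, so the minimum fixed width for x names can be found directly rather than trusting the log formula's float rounding. A small sketch:

```python
def min_width(x, base=26):
    """Smallest n such that base**n >= x, i.e. the fixed width
    needed to provide x distinct names of a single length."""
    n = 1
    while base ** n < x:
        n += 1
    return n

assert min_width(26) == 1          # a..z covers 26 names
assert min_width(27) == 2          # one more forces width 2
assert min_width(26 ** 2) == 2     # aa..zz covers 676
assert min_width(26 ** 2 + 1) == 3
```

The integer comparison avoids the edge cases where ln(x)/ln(26) lands just above or below a whole number due to floating-point error.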
converting strings to most their efficient types '1' --> 1, 'A' ---> 'A', '1.2'---> 1.2
Hello, I'm importing large text files of data using csv. I would like to add some more auto-sensing abilities. I'm considering sampling the data file and doing some fuzzy-logic scoring on the attributes (columns in a database / csv file, e.g. height, weight, income, etc.) to determine the most efficient 'type' to convert each attribute column into, for further processing and efficient storage.

Example row from sampled file data:

[['8', '2.33', 'A', 'BB', 'hello there', '100,000,000,000'], [next row...]]

Aside from a missing attribute designator, we can assume that the same type of data continues through a column: for example, a string, int8, int16, float, etc.

1. What is the most efficient way in Python to test whether a string can be converted into a given numeric type, or left alone if it's really a string like 'A' or 'hello'? Speed is key. Any thoughts?

2. Is there anything out there already which deals with this issue?

Thanks,
Conor
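For question 1, the usual idiom is to just try the conversions and catch the failure; a minimal sketch (the function name is made up):

```python
def sniff(value):
    """Return value coerced to int or float when possible,
    otherwise return it unchanged as a string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

row = ['8', '2.33', 'A', 'hello there']
typed = [sniff(v) for v in row]   # [8, 2.33, 'A', 'hello there']
```

Note the order matters: int('2.33') raises ValueError, so float gets a chance; trying float first would turn every integer column into floats.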
Re: converting strings to most their efficient types '1' --> 1, 'A' ---> 'A', '1.2'---> 1.2
This is excellent advice, thank you gentlemen.

Paddy: We can't really, in this arena, make assumptions about the data source. I fully agree with your point, but if we had the luxury of really knowing the source we wouldn't be having this conversation. Files we deal with could be consumer data files, log files, financial files, all from different users, BCP-ed out or csv, Excel, etc. However, I agree that we can make one basic assumption: for each column there is a correct, and furthermore optimal, format. In many cases we may have a supplied "data dictionary" with the data, in which case you are right and we can override much of this process, except we still need to find the optimal format, like int8 vs int16.

James: Using a Bayesian method was my initial thought as well. The key to this method, I feel, is getting a solid random sample of the entire file without having to load the whole beast into memory. What are your thoughts on other techniques? For example, training a neural net and feeding it a sample; this might be nice and very fast, since after training (we would have to create a good global training set) we could just do a quick transform on a column sample and average the probabilities of the output units (one output unit for each type). The question here would be encoding; any ideas? A binary rep of the vars? Furthermore, naive Bayes, decision trees, etc.?

John:
> The approach that I've adopted is to test the values in a column for all
> types, and choose the non-text type that has the highest success rate
> (provided the rate is greater than some threshold e.g. 90%, otherwise
> it's text).
> For large files, taking a 1/N sample can save a lot of time with little
> chance of misdiagnosis.

I like your approach; this could be simple. Initially, I was thinking a loop that did exactly this: just test the sample columns for "hits" and take the best. Thanks for the sample code.

George: Thank you for offering to share your transform function. I'm very interested.
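John's highest-success-rate approach can be sketched like this (threshold and sampling are simplified; the function name is made up):

```python
def column_type(values, threshold=0.9):
    """Pick the non-text type with the highest conversion success
    rate over a sampled column; fall back to str below threshold."""
    best, best_rate = str, 0.0
    for cast in (int, float):
        hits = 0
        for v in values:
            try:
                cast(v)
                hits += 1
            except ValueError:
                pass
        rate = float(hits) / len(values)
        if rate >= threshold and rate > best_rate:
            best, best_rate = cast, rate
    return best

# 9 of 10 parse as int: one bad value doesn't force the column to text
assert column_type(['1', '2', 'x', '4', '5',
                    '6', '7', '8', '9', '10']) is int
assert column_type(['1.5', '2.25', '3']) is float
assert column_type(['a', 'b', '3']) is str
```

Trying int before float, and only replacing on a strictly better rate, means an all-integer column stays int even though every int also parses as float.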
Create a new class on the fly
Is this possible, or is there a better way? I need to create a new class during runtime, to be used inside a function. The class definition and body are dependent on unknown vars at time of exec, thus my reasoning here.

class PosRecords(tables.IsDescription):

class A(object):
    self.__init__(self, args):

    def mkClass(self, args):
        eval("class B(object): ...")  # definition of B is dependent on dynamic values in string

    ..do stuff with class

Thanks.
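For what it's worth, the usual way to build a class at runtime is the three-argument form of type(), rather than eval on a class statement; a sketch with made-up field names:

```python
def make_record_class(name, fields):
    """Create a new class at runtime: name, base classes,
    and a dict of class attributes built from runtime data."""
    return type(name, (object,), dict(fields))

# the class definition is driven entirely by values known only at runtime
B = make_record_class('B', {'state': 'CA', 'zip': 0})
rec = B()
assert rec.state == 'CA'
assert B.__name__ == 'B'
```

Methods can be added the same way, by putting plain functions in the attribute dict; no string evaluation is needed at any point.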
Re: segmentation fault in scipy?
> No! matrix objects use matrix multiplication for *. You seem to need
> elementwise multiplication.

No! When you multiply a vector with itself transposed, the diagonal of the resulting matrix is the squares of each error (albeit you do a lot of extra calc); then sum the squares, i.e. trace(). It's a nifty trick if you don't have too much data (a 25000x25000 matrix in mem) and you're using matrices, i.e. batch learning. The actual equation includes multiplying by 1/2 * (sum of the squares), but mean squared error can be more telling about error, and cross-entropy is even better, because it tells you how well you're predicting the posterior probabilities...
Re: segmentation fault in scipy?
True! It is ridiculous/insane, as I mentioned and noted and agreed with you (in all your responses), and was my problem. However, it's not wrong (same result), as I was just simply noting (not trying to be right); although, yes, insane. Thanks again.
Re: segmentation fault in scipy?
> Now I'm even more confused. What kind of array is "error" here? First you tell
> me it's a (25000, 80) array and now you are telling me it is a (25000,) array.
> Once you've defined what "error" is, then please tell me what the quantity is
> that you want to calculate. I think I told you several different wrong things,
> previously, based on wrong assumptions.

It's just that in my original post I was trying to get across the maximum size of the arrays I'm using. Sorry for the confusion; I didn't state the actual size of my output vectors. I discovered the problem when you first stated:

> If error.shape == (25000, 80), then dot(error, transpose(error)) will be
> returning an array of shape (25000, 25000)

which was exactly related to the excessive calculation I was running, and set off the red flags and made it very clear. Later I was somewhat confused, and believed that we were talking about two different things regarding SSE when you said:

SSE = sum(multiply(error, error), axis=None)

and didn't realize that multiply() was an efficient element-wise mult method for matrices, thinking it was like matrixmultiply() and gave back the old trace SSE method but with the casting (not meaning to contradict you). However, all is now very clear, and I agree with element-wise mult, or even squaring the output error, and have no real reason why I was using trace, except that I had a faint memory of using it in a class for small experiments in matlab (I guess the idea was to keep everything in linear algebra) and I spit it up for some reason when I was doing a quick error function.
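The two routes to the sum of squared errors can be compared in plain Python (no numpy needed for the sketch): the trace of the outer product equals the direct element-wise sum of squares, but the outer product builds an n x n matrix just to read n useful numbers off its diagonal.

```python
error = [1.0, -2.0, 3.0]

# direct: square and sum -- O(n) work and memory
sse_direct = sum(e * e for e in error)

# trace trick: form the outer product e e^T, then sum the
# diagonal -- same answer, but O(n^2) work and memory
outer = [[a * b for b in error] for a in error]
sse_trace = sum(outer[i][i] for i in range(len(error)))

assert sse_direct == sse_trace == 14.0
```

At n = 25000 the outer-product route allocates 625 million entries where the direct route touches 25000, which is exactly the explosion described in the thread.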
