Escaping commas within parens in CSV parsing?

2005-06-30 Thread felciano
Hi --

I am trying to use the csv module to parse a column containing
comma-delimited values with unusual escaping:

AAA, BBB, CCC (some text, right here), DDD

I want this to come back as:

["AAA", "BBB", "CCC (some text, right here)", "DDD"]

I think this is probably non-standard escaping, as I can't figure out
how to structure a csv dialect to handle it correctly. I can probably
hack this with regular expressions but I thought I'd check to see if
anyone had any quick suggestions for how to do this elegantly first.

Thanks!

Ramon

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Escaping commas within parens in CSV parsing?

2005-07-01 Thread felciano
Thanks for all the postings. I can't change the delimiter in the source
itself, so I'm swapping in a temporary delimiter just to handle the escaping:

import re

def splitWithEscapedCommasInParens(s, trim=False):
    # Temporarily turn commas inside parens into '|' so a plain split works.
    pat = re.compile(r"(.+?\([^\(\),]*?),(.+?\).*)")
    while pat.search(s):
        s = pat.sub(r"\1|\2", s)
    if trim:
        return [x.replace("|", ",").strip() for x in s.split(",")]
    else:
        return [x.replace("|", ",") for x in s.split(",")]

Probably not the most efficient, but it's "the simplest thing that
works" for me :-)
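For comparison, the same split can be done in one pass with re.split and a
negative lookahead -- a minimal sketch, assuming parentheses are never nested
(the function name is mine):

```python
import re

def split_outside_parens(s):
    # Split on commas that are NOT inside a (non-nested) paren group:
    # a comma is "inside" if some run of non-paren characters after it
    # reaches a closing paren before any opening one.
    return re.split(r",\s*(?![^()]*\))", s)
```

On the sample line from the original post this yields the four desired fields
directly, with no temporary delimiter needed.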

Thanks again for all the quick responses.

Ramon



Python plain-text database or library that supports joins?

2007-06-22 Thread felciano
Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.
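As a concrete illustration of that target, the zip-to-district join is only a
dozen lines with csv.DictReader -- a sketch, with filenames and column names
made up for the example:

```python
import csv

def append_district(addresses_path, zips_path, out_path):
    # Build a zip -> district lookup from the second file, then stream
    # the address file through, appending the looked-up column.
    with open(zips_path, newline="") as f:
        district = {r["zip"]: r["district"] for r in csv.DictReader(f)}
    with open(addresses_path, newline="") as f, \
         open(out_path, "w", newline="") as out:
        reader = csv.DictReader(f)
        writer = csv.DictWriter(out,
                                fieldnames=reader.fieldnames + ["district"])
        writer.writeheader()
        for row in reader:
            row["district"] = district.get(row["zip"], "")
            writer.writerow(row)
```

The pain point is that each new pair of files needs its own variant of this
glue, which is exactly what a generic library would avoid.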

Conceptually this is a simple database operation with a join on a
common field (zip code in the above example). Other cases use other
relational operators (projection, cross-product, etc.), so I'm really
looking for something SQL-like in functionality. However, the data is
in flat files, the file structure changes frequently, the files are
dynamically generated from a range of sources and short-lived in
nature, and they don't otherwise warrant the hassle of a database
setup. So I've been looking around for a nice, Pythonic, zero-config
(no parsers, no setup/teardown, etc.) solution for simple queries that
handles a database of csv-files-with-headers automatically. There are
a number of solutions that come close, but in the end fall short:

- KirbyBase 1.9 (latest Python version) is the closest that I could
find, as it lets you keep your data in flatfiles and perform
operations using the field names from those text-based tables, but it
doesn't support joins (the more recent Ruby version seems to).
- Buzhug and Sqlite use their own data structures, with no automatic .tab
or .csv parsing (unless sqlite includes a way to map flatfiles to
sqlite virtual tables that I don't know about).
- http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/159974 is
heading in the right direction, as it shows how to perform relational
operations on lists, though it is index-based rather than field-name based.
- http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498130 and
http://furius.ca/pubcode/pub/conf/common/bin/csv-db-import.html
provide ways of automatically populating DBs, but not the reverse
(persisting changes back out to the data files).

The closest alternatives I've found are the GNU textutils, which
support join, cut, merge, etc., but I need to add logic they don't
support, and they don't allow field-level write operations from Python
(UPDATE ... WHERE ...). Normally I'd jump right in and start coding
but this seems like something so common that I would have expected
someone else to have solved, so in the interest of not re-inventing
the wheel I thought I'd see if anyone had any other suggestions. Any
thoughts?

Thanks!

Ramon



Re: What was that web interaction library called again?

2007-06-22 Thread felciano
Maybe http://twill.idyll.org/



Re: Python plain-text database or library that supports joins?

2007-06-22 Thread felciano
>
> i don't think that using flat text files as a database is common these
> days. if you need relational database features what stops you from
> using rdbms? if the only reason for that is some legacy system then
> i'd still use in-memory sqlite database for all relational operations.
> import, process, export back to text if you need to.
>
These are often one-off operations, so those import + export steps are
non-trivial overhead. For example, most log files are structured, but
it seems like we still use scripts or command line tools to find data
in those files. I'm essentially doing the same thing, only with
operations across multiple files (e.g. merge the records of two files
based on a common key, or append a column based on a lookup value). I
may end up having to go to DB, but that seems like a heavyweight jump
for what are otherwise simple operations.
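That said, the import/export glue for the in-memory sqlite route suggested
above is fairly small. A sketch, with everything typed as TEXT and table and
column names assumed not to need escaping:

```python
import csv
import sqlite3

def load_csv(conn, path, table):
    # Create a TEXT-typed table from the CSV header and bulk-insert rows.
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join('"%s" TEXT' % c for c in header)
        conn.execute('CREATE TABLE "%s" (%s)' % (table, cols))
        marks = ", ".join("?" * len(header))
        conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, marks),
                         reader)

def dump_query(conn, sql, path):
    # Run a query and write the result back out as a CSV with headers.
    cur = conn.execute(sql)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([d[0] for d in cur.description])
        writer.writerows(cur)
```

With those two helpers, the join itself is a one-line SELECT against
sqlite3.connect(":memory:") -- though it still isn't the zero-config
experience described above.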

Maybe this is the wrong forum for the question. I prefer programming
in Python, but the use case I'm looking at is closer to shell scripting.
I'd be perfectly happy with a more powerful version of GNU textutils
that allowed for greater flexibility in text manipulation.

HTH,

Ramon



Multiline CSV export to Excel produces unrecognized characters?

2008-03-23 Thread felciano
Hi --

Is there a standard way to use the csv module to export data that
contains multi-line values to Excel? I can get it mostly working, but
Excel seems to have difficulty displaying the generated multi-line
cells.

The following reproduces the problem in python 2.5:

import csv
row = [1,"hello","this is\na multiline\ntext field"]
writer = csv.writer(open("test.tsv", "w"), dialect="excel-tab",
                    quotechar='"')
writer.writerow(row)

When opening the resulting test.tsv file, I do indeed see a cell with
a multi-line value, but there is a small boxed question mark at the
end of each of the lines, as if Excel didn't recognize the linebreak.

Any idea why these are there or how to get rid of them?

Ramon


Re: Multiline CSV export to Excel produces unrecognized characters?

2008-03-24 Thread felciano
>
> Any chance you are doing this on Windows and Excel doesn't like the
> return+linefeed line endings that Windows produces when writing files in
> text mode?  Try 'wb' as mode for the output file.
>
> Ciao,
> Marc 'BlackJack' Rintsch
>
Fixed! Can't believe I missed that...
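For the archives, here is the working version. The suggested fix is 'wb' on
Python 2; this snippet is the Python 3 equivalent, where newline="" has the
same effect:

```python
import csv

row = [1, "hello", "this is\na multiline\ntext field"]
# newline="" stops the io layer from translating the \r\n line endings
# the csv module writes, which is what produced the stray boxed
# characters in Excel.
with open("test.tsv", "w", newline="") as f:
    csv.writer(f, dialect="excel-tab", quotechar='"').writerow(row)
```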

Thank you for the quick reply!

Ramon


Checking for network connectivity

2008-06-19 Thread felciano
Hi --

Is there a clean pythonic way to check for network connectivity? I
have a script that needs to run periodically on a laptop to create a
local cache of some network files. I would like it to fail gracefully
when disconnected, as well as issue a warning if it hasn't been able
to connect for X minutes / hours.

I currently just use try / except to catch the network errors when the
first call times out, but the timeout takes a while, and this doesn't
feel like the right design because technically this isn't an exception
-- it is expected behavior. Is there a better way to do this (and
still be reasonably portable)?
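One common approach is a cheap TCP probe with a short timeout before
attempting the real sync -- a sketch (the default host/port, a public DNS
server, is an arbitrary choice; substitute the server the cache actually
talks to):

```python
import socket

def is_connected(host="8.8.8.8", port=53, timeout=3.0):
    # Try a quick TCP connect; any failure (DNS error, connection
    # refused, timeout) is treated as "offline" rather than raised.
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False
```

The caller can then record the timestamp of the last successful probe to
drive the "no connection for X minutes" warning, which keeps the
disconnected case as an ordinary return value rather than an exception.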

Thanks,

Ramon


Python libraries for log mining and event abstraction? (possibly OT)

2008-06-24 Thread felciano
Hi --

I am trying to do some event abstraction to mine a set of HTTP logs.
We have a pretty clean stateless architecture with user IDs that
allows us to understand what is retrieved in each session, and should
allow us to detect higher-order user activity from the logs.
Ideally I'd love a python toolkit that has abstracted this out into a
basic set of API calls or even a query language.

A simple example is: find all instances of a search request, followed
by 2+ search requests with additional words in the search string,
and group these into a higher-order "Iterative Search Refinement"
event (i.e. the user got too many search results to start with, and is
adding additional words to narrow down the results). So what I need is
the ability to select temporally-related events out of the event
stream (e.g. find searches by the same user within 10 seconds of each
other), further filter based on additional criteria across these
events (e.g. select only search events where there are additional
search criteria relative to the previous search), and a way to
annotate, roll-up or otherwise group matching patterns into a
higher-level event.
Some of these patterns may require non-trivial criteria / logic not
supported by COTS log analytics, which is why I'm trying a toolkit
approach that allows customization.

I've been hunting around Google and the usual open source sites for
something like this and haven't found anything (in python or
otherwise). This is surprising to me, as I would think many people
would benefit from something like this, so maybe I'm just describing
the problem wrong or using the wrong keywords. I'm posting this to
this group because it feels somewhat AI-ish (temporal event
abstraction, etc), and is therefore something pythonistas may have
experience with (there seems to be a reasonably high correlation
there). Further,
if I can't find anything I'm going to have to build it myself, and it
will be in python, so any pointers on elegant design patterns for how
to do this using pythonic functional programming would be appreciated.
Barring anything else I will start from itertools and work from
there.
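In case it helps anyone searching later, the itertools starting point might
look roughly like this. Purely a sketch: the (user, time, query) event shape,
the 10-second window, and the "each query adds words" test are all my own
assumptions standing in for the real criteria:

```python
from itertools import groupby
from operator import itemgetter

def refinement_runs(events, window=10, min_len=3):
    """Find 'Iterative Search Refinement' runs in (user, time, query) events.

    A run is min_len+ consecutive searches by one user, each within
    `window` seconds of the previous one and each adding words to it.
    """
    runs = []

    def flush(session):
        terms = [set(q.split()) for _, _, q in session]
        # Proper-subset chain == every query strictly adds words.
        if len(terms) >= min_len and all(a < b
                                         for a, b in zip(terms, terms[1:])):
            runs.append(session)

    for user, evs in groupby(sorted(events), key=itemgetter(0)):
        session = []
        for ev in evs:
            if session and ev[1] - session[-1][1] > window:
                flush(session)   # time gap: close the current session
                session = []
            session.append(ev)
        flush(session)
    return runs
```

The same shape (sessionize, then test a predicate over the session) should
extend to the other temporal patterns by swapping out the flush criterion.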

That said, I'm hoping to use an existing library rather than re-invent
the wheel. Any suggestions on where to look for something like this?

Thanks!

Ramon


Pythonic use of CSV module to skip headers?

2004-12-02 Thread Ramon Felciano
Hi --

I'm using the csv module to parse a tab-delimited file and wondered
whether there was a more elegant way to skip a possible header line.
I'm doing

line = 0
reader = csv.reader(open(filename))
for row in reader:
    line = line + 1
    if ignoreFirstLine and line == 1:
        continue
    # do something with row

The only thing I could think of was to specialize the default reader
class with an extra skipHeaderLine constructor parameter so that its
next() method can skip the first line appropriately. Is there any
cleaner way to do it w/out subclassing the stdlib?
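For later readers: no subclass is needed, because the reader object is its
own iterator -- you can consume the header before the loop (next(reader) in
current Python, reader.next() in the 2.x of the day). Shown here with an
in-memory file for illustration:

```python
import csv
import io

data = io.StringIO("name\tzip\nAlice\t94110\n")
reader = csv.reader(data, delimiter="\t")
header = next(reader)  # consume (and keep) the header row
rows = [row for row in reader]  # the loop now sees only data rows
```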

Thanks!

Ramon


Re: Pythonic list to bitflag mapping

2004-12-02 Thread Ramon Felciano
> Or can be used directly as an integer index to get a character
> 
>  >>> ['01'[x in a] for x in xrange(10)]
>  ['0', '0', '0', '1', '1', '0', '1', '0', '0', '0']
> 
Very cool -- this does the trick nicely and seems quite extensible,
now that I get the basic idiom.

Belated thanks for the quick replies on this one!

Ramon