Escaping commas within parens in CSV parsing?
Hi --

I am trying to use the csv module to parse a column of values containing
comma-delimited values with unusual escaping:

    AAA, BBB, CCC (some text, right here), DDD

I want this to come back as:

    ["AAA", "BBB", "CCC (some text, right here)", "DDD"]

I think this is probably non-standard escaping, as I can't figure out how to
structure a csv dialect to handle it correctly. I can probably hack this with
regular expressions, but I thought I'd check first to see if anyone had any
quick suggestions for how to do this elegantly.

Thanks!
Ramon
--
http://mail.python.org/mailman/listinfo/python-list
Re: Escaping commas within parens in CSV parsing?
Thanks for all the postings. I can't change the delimiter in the source
itself, so I'm swapping it out temporarily just to handle the escaping:

    import re
    import string

    def splitWithEscapedCommasInParens(s, trim=False):
        pat = re.compile(r"(.+?\([^\(\),]*?),(.+?\).*)")
        while pat.search(s):
            s = re.sub(pat, r"\1|\2", s)
        if trim:
            return [string.strip(string.replace(x, "|", ","))
                    for x in string.split(s, ",")]
        else:
            return [string.replace(x, "|", ",")
                    for x in string.split(s, ",")]

Probably not the most efficient, but it's "the simplest thing that works"
for me :-)

Thanks again for all the quick responses.
Ramon
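For the archive: the same split can be done in a single pass, without the
temporary "|" delimiter used in the regex-substitution approach, by tracking
parenthesis depth. A sketch (the function name is my own, not from the thread):

```python
def split_outside_parens(s):
    """Split s on commas, ignoring commas nested inside parentheses."""
    parts, buf, depth = [], [], 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate an unbalanced ')'
        if ch == "," and depth == 0:
            # top-level comma: close out the current field
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return parts
```

On the example from the original post, `split_outside_parens("AAA, BBB, CCC
(some text, right here), DDD")` yields the four desired fields, with the
parenthesized comma left intact.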
Python plain-text database or library that supports joins?
Hello --

Is there a convention, library, or Pythonic idiom for performing lightweight
relational operations on flat files?

I frequently find myself writing code to do simple SQL-like operations
between flat files, such as appending columns from one file to another,
linked through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a second file
that maps zip codes to districts.

Conceptually this is a simple database operation with a join on a common
field (zip code in the above example). Other cases use other relational
operators (projection, cross-product, etc.), so I'm really looking for
something SQL-like in functionality. However, the data is in flat files, the
file structure changes frequently, the files are dynamically generated from
a range of sources and are short-lived in nature, and they otherwise don't
warrant the hassle of a database setup. So I've been looking around for a
nice, Pythonic, zero-config (no parsers, no setup/teardown, etc.) solution
for simple queries that handles a database of csv-files-with-headers
automatically.

There are a number of solutions that come close, but in the end come up
short:

- KirbyBase 1.9 (the latest Python version) is the closest I could find, as
  it lets you keep your data in flat files and perform operations using the
  field names from those text-based tables, but it doesn't support joins
  (the more recent Ruby version seems to).
- Buzhug and SQLite have their own data structures, with no automatic .tab
  or .csv parsing (unless sqlite includes a way to map flat files to sqlite
  virtual tables that I don't know about).
- http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/159974 is heading
  in the right direction, as it shows how to perform relational operations
  on lists, but it is index-based rather than field-name-based.
- http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498130 and
  http://furius.ca/pubcode/pub/conf/common/bin/csv-db-import.html provide
  ways of automatically populating DBs, but not the reverse (persisting
  changes back out to the data files).

The closest alternatives I've found are the GNU textutils, which support
join, cut, merge, etc., but I need to add additional logic they don't
support, and they don't allow field-level write operations from Python
(UPDATE ... WHERE ...).

Normally I'd jump right in and start coding, but this seems like something
so common that I would have expected someone else to have solved it already,
so in the interest of not re-inventing the wheel I thought I'd see if anyone
had any other suggestions.

Any thoughts? Thanks!
Ramon
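For what it's worth, the zip-code-to-district join described above can be
sketched with the stdlib alone: read each flat file with csv and load it into
an in-memory sqlite3 table named after its header. The file contents and
table names below are invented for illustration (StringIO stands in for real
file handles):

```python
import csv
import io
import sqlite3

# Hypothetical data standing in for the flat files; in practice these would
# be open(...) handles on real .csv files.
addresses = io.StringIO("id,name,zip\n1,Alice,02139\n2,Bob,94105\n")
districts = io.StringIO("zip,district\n02139,MA-7\n94105,CA-11\n")

con = sqlite3.connect(":memory:")

def load(con, table, fileobj):
    # Create an untyped table straight from the csv header row,
    # then bulk-insert the remaining rows.
    rows = list(csv.reader(fileobj))
    header, data = rows[0], rows[1:]
    con.execute("CREATE TABLE %s (%s)" % (table, ", ".join(header)))
    marks = ", ".join("?" * len(header))
    con.executemany("INSERT INTO %s VALUES (%s)" % (table, marks), data)

load(con, "addresses", addresses)
load(con, "districts", districts)

# The join: append a district column via the shared zip field.
joined = con.execute(
    "SELECT a.id, a.name, a.zip, d.district "
    "FROM addresses a JOIN districts d ON a.zip = d.zip "
    "ORDER BY a.id"
).fetchall()
```

This is zero-config in the sense the post asks for (no schema beyond the csv
headers), though everything comes back as text since the tables are untyped.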
Re: What was that web interaction library called again?
Maybe http://twill.idyll.org/
Re: Python plain-text database or library that supports joins?
> i don't think that using flat text files as a database is common these
> days. if you need relational database features what stops you from using
> rdbms? if the only reason for that is some legacy system then i'd still
> use in-memory sqlite database for all relational operations. import,
> process, export back to text if you need to.

These are often one-off operations, so those import + export steps are
non-trivial overhead. For example, most log files are structured, but it
seems like we still use scripts or command-line tools to find data in those
files. I'm essentially doing the same thing, only with operations across
multiple files (e.g. merge the records of two files based on a common key,
or append a column based on a looked-up value). I may end up having to go to
a DB, but that seems like a heavyweight jump for what are otherwise simple
operations.

Maybe this is the wrong forum for the question. I prefer programming in
Python, but the use case I'm looking at is closer to shell scripting. I'd be
perfectly happy with a more powerful version of GNU textutils that allowed
for greater flexibility in text manipulation.

HTH,
Ramon
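As a middle ground between textutils and a real RDBMS, the append-a-column
case can be sketched with a plain dict and csv.DictReader / csv.DictWriter,
with no import/export round trip. The field names and data here are
hypothetical, and StringIO stands in for the real files:

```python
import csv
import io

# Hypothetical lookup table, as if read once from the zip->district file.
zips = {"02139": "MA-7", "94105": "CA-11"}

# Stand-ins for the input and output files.
src = io.StringIO("id,name,zip\n1,Alice,02139\n2,Bob,94105\n")
out = io.StringIO()

reader = csv.DictReader(src)
writer = csv.DictWriter(out, fieldnames=reader.fieldnames + ["district"],
                        lineterminator="\n")
writer.writeheader()
for row in reader:
    # Append the looked-up column; blank when the key is missing.
    row["district"] = zips.get(row["zip"], "")
    writer.writerow(row)
```

This keeps the field-name-based access the original post asks for, at the
cost of only handling one relational operator at a time.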
Multiline CSV export to Excel produces unrecognized characters?
Hi --
Is there a standard way to use the csv module to export data that
contains multi-line values to Excel? I can get it mostly working, but
Excel seems to have difficulty displaying the generated multi-line
cells.
The following reproduces the problem in python 2.5:
    import csv

    row = [1, "hello", "this is\na multiline\ntext field"]
    writer = csv.writer(open("test.tsv", "w"), dialect="excel-tab",
                        quotechar='"')
    writer.writerow(row)
When opening the resulting test.tsv file, I do indeed see a cell with
a multi-line value, but there is a small boxed question mark at the
end of each of the lines, as if Excel didn't recognize the linebreak.
Any idea why these are there or how to get rid of them?
Ramon
Re: Multiline CSV export to Excel produces unrecognized characters?
> Any chance you are doing this on Windows and Excel doesn't like the
> return+linefeed line endings that Windows produces when writing files in
> text mode? Try 'wb' as mode for the output file.
>
> Ciao,
> Marc 'BlackJack' Rintsch

Fixed! Can't believe I missed that... Thank you for the quick reply!
Ramon
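For anyone finding this in the archive, the corrected write looks like this.
On Python 2 the fix is opening the file in binary mode ('wb'); on Python 3
(shown below, with a StringIO standing in for the file) the equivalent is
passing newline="" so the csv module's own \r\n line endings are written
untranslated:

```python
import csv
import io

# StringIO stands in for open("test.tsv", "w", newline=""); newline=""
# stops Python from translating the \r\n that the excel-tab dialect emits,
# which is what produced the stray characters in text mode on Windows.
buf = io.StringIO(newline="")
writer = csv.writer(buf, dialect="excel-tab")
writer.writerow([1, "hello", "this is\na multiline\ntext field"])
```

The multi-line field comes out quoted with bare \n inside it and a single
\r\n terminating the row, which Excel displays cleanly.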
Checking for network connectivity
Hi --

Is there a clean, pythonic way to check for network connectivity? I have a
script that needs to run periodically on a laptop to create a local cache of
some network files. I would like it to fail gracefully when disconnected, as
well as issue a warning if it hasn't been able to connect for X minutes /
hours.

I currently just use try / except to catch the network errors when the first
call times out, but the timeout takes a while, and this doesn't feel like
the right design because technically this isn't an exception -- it is
expected behavior. Is there a better way to do this (and still be reasonably
portable)?

Thanks,
Ramon
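One common pattern (my suggestion, not from the thread) is a cheap probe
with a deliberately short socket timeout before starting the real transfer,
so the "offline" case fails fast instead of waiting out the default timeout.
The host and port defaults below are placeholders:

```python
import socket

def is_connected(host="example.com", port=80, timeout=2.0):
    """Cheap connectivity probe: attempt a TCP connect with a short timeout.

    The host/port defaults are placeholders -- probe the server the cache
    script actually pulls from, so "connected" means "can reach my source".
    """
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False
```

The caller can then treat "offline" as an ordinary return value rather than
an exception, and track the time of the last successful probe to drive the
X-minutes warning.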
Python libraries for log mining and event abstraction? (possibly OT)
Hi --

I am trying to do some event abstraction to mine a set of HTTP logs. We have
a pretty clean stateless architecture with user IDs that allows us to
understand what is retrieved in each session, and should allow us to detect
higher-order user activity from the logs. Ideally I'd love a python toolkit
that has abstracted this out into a basic set of API calls or even a query
language.

A simple example: find all instances of a search request, followed by 2+
search requests with additional words in the search string, and group these
into a higher-order "Iterative Search Refinement" event (i.e. the user got
too many search results to start with, and is adding additional words to
narrow down the results).

So what I need is the ability to select temporally-related events out of the
event stream (e.g. find searches by the same user within 10 seconds of each
other), further filter based on additional criteria across these events
(e.g. select only search events where there are additional search criteria
relative to the previous search), and a way to annotate, roll up, or
otherwise group matching patterns into a higher-level event. Some of these
patterns may require non-trivial criteria / logic not supported by COTS log
analytics, which is why I'm trying a toolkit approach that allows
customization.

I've been hunting around Google and the usual open source sites for
something like this and haven't found anything (in python or otherwise).
This is surprising to me, as I would think many people would benefit from
something like this, so maybe I'm just describing the problem wrong or using
the wrong keywords. I'm posting this to this group because it feels somewhat
AI-ish (temporal event abstraction, etc.) and therefore pythonistas may have
experience with it (there seems to be a reasonably high correlation there).

Further, if I can't find anything I'm going to have to build it myself, and
it will be in python, so any pointers on elegant design patterns for doing
this with pythonic functional programming would be appreciated. Barring
anything else I will start from itertools and work from there. That said,
I'm hoping to use an existing library rather than re-invent the wheel.

Any suggestions on where to look for something like this?

Thanks!
Ramon
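As a starting point for the itertools route, here is a rough sketch of the
"Iterative Search Refinement" detector described above. The event tuples,
thresholds, and function name are all invented for illustration, and it
assumes the log is already ordered by user and time:

```python
from itertools import groupby

# Invented pre-parsed log events: (user_id, timestamp_seconds, search_terms).
events = [
    ("u1", 100, {"python"}),
    ("u1", 104, {"python", "csv"}),
    ("u1", 107, {"python", "csv", "excel"}),
    ("u2", 200, {"cats"}),
]

def refinement_sessions(events, window=10, min_len=3):
    """Collect runs of searches by one user, each within `window` seconds of
    the previous one and strictly adding terms (set superset test) -- the
    'Iterative Search Refinement' pattern."""
    sessions = []
    for user, stream in groupby(events, key=lambda e: e[0]):
        run = []
        for ev in stream:
            # This event refines the run if it is close in time and its
            # term set strictly extends the previous one.
            refines = (run and ev[1] - run[-1][1] <= window
                       and ev[2] > run[-1][2])
            if run and not refines:
                if len(run) >= min_len:
                    sessions.append(run)
                run = []
            run.append(ev)
        if len(run) >= min_len:
            sessions.append(run)
    return sessions
```

On the sample data this finds one session (u1's three-step refinement); u2's
lone search is discarded. The same shape generalizes to other temporal
patterns by swapping the `refines` predicate.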
Pythonic use of CSV module to skip headers?
Hi --

I'm using the csv module to parse a tab-delimited file and wondered whether
there was a more elegant way to skip a possible header line. I'm doing:

    line = 0
    reader = csv.reader(file(filename))
    for row in reader:
        line = line + 1
        if ignoreFirstLine and line == 1:
            continue
        # do something with row

The only thing I could think of was to specialize the default reader class
with an extra skipHeaderLine constructor parameter so that its next() method
can skip the first line appropriately. Is there any other cleaner way to do
it without subclassing the stdlib?

Thanks!
Ramon
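For later readers: since the reader is itself an iterator, a clean approach
is to consume the first row before the loop, with no counter and no
subclassing. A sketch in Python 3 spelling (on 2.x it would be
`reader.next()`), with StringIO standing in for the file:

```python
import csv
import io

data = io.StringIO("name\tzip\nAlice\t02139\nBob\t94105\n")
reader = csv.reader(data, delimiter="\t")

ignore_first_line = True
if ignore_first_line:
    next(reader, None)  # consume the header; harmless on an empty file

rows = list(reader)  # the loop body then sees only data rows
```

When it is genuinely unknown whether a header is present,
`csv.Sniffer().has_header(sample)` can drive the `ignore_first_line` flag.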
Re: Pythonic list to bitflag mapping
> Or can be used directly as an integer index to get a character
>
> >>> ['01'[x in a] for x in xrange(10)]
> ['0', '0', '0', '1', '1', '0', '1', '0', '0', '0']

Very cool -- this does the trick nicely and seems quite extensible, now that
I get the basic idiom. Belated thanks for the quick replies on this one!
Ramon
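For the archive, the same membership idiom alongside packing the positions
into an actual integer bitflag word (assuming, as the quoted output
suggests, that `a` is the collection of set bit positions):

```python
# Assumed from the quoted example: `a` holds the set bit positions.
a = {3, 4, 6}

# The membership idiom from the reply (Python 3 spelling of xrange):
bits = ['01'[x in a] for x in range(10)]

# Packing the same positions into a single integer flag word...
mask = sum(1 << x for x in a)

# ...and recovering the positions from the mask:
positions = {x for x in range(10) if mask >> x & 1}
```

Here `mask` is 88 (bits 3, 4, and 6 set), and the round trip gives back the
original positions.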
