Re: Obtain the query interface url of BCS server.
On 9/12/2022 5:00 AM, [email protected] wrote:
> I want to run the query from within a script, based on the interface here [1]. For this purpose, the underlying posting URL must be obtained, say, the URL corresponding to the "ITA Settings" button, so that I can build the corresponding query URL and issue the query from the script. However, I did not find the conversion rules from these buttons to the corresponding URLs. Any hints for achieving this aim?
>
> [1] https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=10
>
> Regards, Zhao

You didn't say what you want to query. Are you trying to download entire sections of the Bilbao Crystallographic Server? Maybe the admins will give you access to the data.

* this link: https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen brings up the table of space group symbols.
* choose say #7: Pc
* now click ITA Settings, then choose the last entry "P c 1 1" and it loads: https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,-a-c,c&unconv=P%20c%201%201&from=ita

You might be able to fool around with that URL and substitute values and get back the data you want (in HTML) via Python. Do you really want HTML results?

Hit Ctrl+U to see the source HTML of a webpage.
Right-click or hit Ctrl + Shift + C to inspect the individual elements of the page.

--
https://mail.python.org/mailman/listinfo/python-list
Re: Obtain the query interface url of BCS server.
On 9/13/2022 3:46 AM, [email protected] wrote: On Tuesday, September 13, 2022 at 4:20:12 AM UTC+8, DFS wrote: On 9/12/2022 5:00 AM, [email protected] wrote: I want to do the query from with in script based on the interface here [1]. For this purpose, the underlying posting URL must be obtained, say, the URL corresponding to "ITA Settings" button, so that I can make the corresponding query URL and issue the query from the script. However, I did not find the conversion rules from these buttons to the corresponding URL. Any hints for achieving this aim? [1] https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=10 Regards, Zhao You didn't say what you want to query. Are you trying to download entire sections of the Bilbao Crystallographic Server? I am engaged in some related research and need some specific data used by BCS server. What specific data? Is it available elsewhere? Maybe the admins will give you access to the data. I don't think they will provide such convenience to researchers who have no cooperative relationship with them. You can try. Tell the admins what data you want, and ask them for the easiest way to get it. * this link: https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen brings up the table of space group symbols. * choose say #7: Pc * now click ITA Settings, then choose the last entry "P c 1 1" and it loads: https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,-a-c,c&unconv=P%20c%201%201&from=ita Not only that, but I want to obtain all such URLs programmatically! You might be able to fool around with that URL and substitute values and get back the data you want (in HTML) via Python. Do you really want HTML results? Hit Ctrl+U to see the source HTML of a webpage Right-click or hit Ctrl + Shift + C to inspect the individual elements of the page For batch operations, all these manual methods are inefficient. Yes, but I don't think you'll be able to retrieve the URLs programmatically. 
The JavaScript code doesn't put them in the HTML result, except for that one I showed you, which seems like a mistake on their part.

So you'll have to figure out the search fields, and your Python program will have to cycle through the search values:

Sample from above
gnum = 007
what = gp
trmat = b,-a-c,c
unconv = P c 1 1
from = ita

wBase = "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen"
wGnum = "?gnum=" + findgnum
wWhat = "&what=" + findWhat
wTrmat = "&trmat=" + findTrmat
wUnconv = "&unconv=" + findUnconv
wFrom = "&from=" + findFrom
webpage = wBase + wGnum + wWhat + wTrmat + wUnconv + wFrom

Then if that returns a hit, you'll have to parse the resulting HTML and extract the exact data you want.

I did something similar a while back using the requests and lxml libraries:

import requests
from lxml import html

#build url
wBase = "http://www.usdirectory.com"
wForm = "/ypr.aspx?fromform=qsearch"
wKeyw = "&qhqn=" + keyw
wCityZip = "&qc=" + cityzip
wState = "&qs=" + state
wDist = "&rg=" + str(miles)
wSort = "&sb=a2z" #sort alpha
wPage = "&ap=" #used with the results page number
webpage = wBase + wForm + wKeyw + wCityZip + wState + wDist

#open URL
page = requests.get(webpage)
tree = html.fromstring(page.content)

#no matches
matches = tree.xpath('//strong/text()')
if passNbr == 1 and ("No results were found" in str(matches)):
    print("No results found for that search")
    exit(0)

2.x code file: https://file.io/VdptORSKh5CN

Best Regards, Zhao
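The URL assembly described in this message can also be sketched with the standard library's urllib.parse, which handles the percent-encoding. This is only a sketch: it assumes the five parameters from the sample URL are all the server needs, and build_trgen_url is a hypothetical helper name, not part of any BCS interface.

```python
from urllib.parse import urlencode, quote

# Base address copied from the sample URL in this thread.
BASE = "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen"

def build_trgen_url(gnum, what, trmat, unconv, source="ita"):
    """Hypothetical helper: assemble one nph-trgen query URL."""
    params = {
        "gnum": f"{int(gnum):03d}",  # the sample URL zero-pads to 3 digits (007)
        "what": what,
        "trmat": trmat,
        "unconv": unconv,
        "from": source,
    }
    # quote() encodes spaces as %20; keeping ',' safe matches the sample URL.
    return BASE + "?" + urlencode(params, quote_via=quote, safe=",")

print(build_trgen_url(7, "gp", "b,-a-c,c", "P c 1 1"))
```

For the values from the sample above this reproduces the address that the ITA Settings page loaded. Whether the server accepts other combinations is something only a real request can tell.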
Re: Obtain the query interface url of BCS server.
On 9/13/2022 7:29 PM, [email protected] wrote: On Tuesday, September 13, 2022 at 9:33:20 PM UTC+8, DFS wrote: On 9/13/2022 3:46 AM, [email protected] wrote: On Tuesday, September 13, 2022 at 4:20:12 AM UTC+8, DFS wrote: On 9/12/2022 5:00 AM, [email protected] wrote: I want to do the query from with in script based on the interface here [1]. For this purpose, the underlying posting URL must be obtained, say, the URL corresponding to "ITA Settings" button, so that I can make the corresponding query URL and issue the query from the script. However, I did not find the conversion rules from these buttons to the corresponding URL. Any hints for achieving this aim? [1] https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=10 Regards, Zhao You didn't say what you want to query. Are you trying to download entire sections of the Bilbao Crystallographic Server? I am engaged in some related research and need some specific data used by BCS server. What specific data? All the data corresponding to the total catalog here: https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen Is it available elsewhere? This is an internationally recognized authoritative data source in this field. Data from other places, even if there are readily available electronic versions, are basically taken from here and are not comprehensive. Maybe the admins will give you access to the data. I don't think they will provide such convenience to researchers who have no cooperative relationship with them. You can try. Tell the admins what data you want, and ask them for the easiest way to get it. * this link: https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen brings up the table of space group symbols. 
* choose say #7: Pc * now click ITA Settings, then choose the last entry "P c 1 1" and it loads: https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,-a-c,c&unconv=P%20c%201%201&from=ita Not only that, but I want to obtain all such URLs programmatically! You might be able to fool around with that URL and substitute values and get back the data you want (in HTML) via Python. Do you really want HTML results? Hit Ctrl+U to see the source HTML of a webpage Right-click or hit Ctrl + Shift + C to inspect the individual elements of the page For batch operations, all these manual methods are inefficient. Yes, but I don't think you'll be able to retrieve the URLs programmatically. The JavaScript code doesn't put them in the HTML result, except for that one I showed you, which seems like a mistake on their part. So you'll have to figure out the search fields, and your python program will have to cycle through the search values: Sample from above gnum = 007 what = gp trmat = b,-a-c,c unconv = P c 1 1 from = ita The problem is that I must first get all possible combinations of these variables. Shouldn't be too hard, but I've never done some of these things and have no code for you: space group number = gnum = 1 to 230 * use python to put each of those values, one at a time, into the group number field on the webpage * use python to simulate a button click of the ITA Settings button * it should load the HTML of the list of ITA settings for that space group * use python to parse the HTML and extract each of the ITA settings. The line of HTML has 'ITA number' in it. Find each of the 'href' values in the line(s). 
Real HTML from ITA Settings for space group 10: ITA number bgcolor="#bb">Settingbgcolor="#f0f0f0">10 href="/cgi-bin/cryst/programs//nph-getgen?gnum=010&what=gp">P 1 2/m 1bgcolor="#f0f0f0">10 href="/cgi-bin/cryst/programs//nph-trgen?gnum=010&what=gp&trmat=c,a,b&unconv=P 1 1 2/m&from=ita">P 1 1 2/malign="center" bgcolor="#f0f0f0">10 href="/cgi-bin/cryst/programs//nph-trgen?gnum=010&what=gp&trmat=b,c,a&unconv=P 2/m 1 1&from=ita">P 2/m 1 1 If you parse it right you'll have these addresses: "/cgi-bin/cryst/programs//nph-getgen?gnum=010&what=gp" "/cgi-bin/cryst/programs//nph-trgen?gnum=010&what=gp&trmat=c,a,b&unconv=P 1 1 2/m&from=ita" "/cgi-bin/cryst/programs//nph-trgen?gnum=010&what=gp&trmat=b,c,a&unconv=P 2/m 1 1&from=ita" Then you can parse each of these addresses and build a master list of the valid combinations of: gnum, what, trmat, unconv, from Check into the lxml library, and the 'etree' class. https://lxml.de You can also search gen.lib.rus.ec for the crystallography volumes, and maybe cut and paste data from them. wBase = "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen"; wGnum = "?gnum=" + findgnum wWhat = "&wh
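The last step described here, splitting each extracted address into gnum, what, trmat, unconv and from, can be sketched with urllib.parse alone. The three addresses are the ones quoted in this message for space group 10; split_address and FIELDS are hypothetical names used only for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# The three relative addresses quoted in the message for space group 10.
hrefs = [
    "/cgi-bin/cryst/programs//nph-getgen?gnum=010&what=gp",
    "/cgi-bin/cryst/programs//nph-trgen?gnum=010&what=gp&trmat=c,a,b&unconv=P 1 1 2/m&from=ita",
    "/cgi-bin/cryst/programs//nph-trgen?gnum=010&what=gp&trmat=b,c,a&unconv=P 2/m 1 1&from=ita",
]

FIELDS = ("gnum", "what", "trmat", "unconv", "from")

def split_address(href):
    """Return the query fields of one address; missing fields become None."""
    query = parse_qs(urlsplit(href).query)
    return {name: query.get(name, [None])[0] for name in FIELDS}

# Master list of the combinations found for this space group.
master = [split_address(h) for h in hrefs]
for row in master:
    print(row)
```

Repeating this for every space group number would give the full list of valid combinations, assuming the settings pages can be fetched for each gnum.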
Re: Uninstall tool not working.
On 9/13/2022 3:54 PM, Salvatore Bruzzese wrote:
> Hi, I was trying to uninstall version 3.10.7 of python but I've encountered problems with the uninstall tool. I open the python setup program, click on the uninstall button but it doesn't even start deleting python even though it says that the process has finished.
> Feel free to ask for more details in case I didn't explain it correctly.
> Thanks in advance for your help.

https://stackoverflow.com/questions/3515673/how-to-completely-remove-python-from-a-windows-machine
Quick question about CPython interpreter
- this does a str() conversion in the loop -
for i in range(cells.count()):
    if text == str(ID): break

- this does one str() conversion before the loop -
strID = str(ID)
for i in range(cells.count()):
    if text == strID: break

But does CPython optimize the str() conversion away and essentially do it for me in the first example?
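As far as I know, CPython does not hoist the call out of the loop: str is an ordinary name that could be rebound at any time, so the call is evaluated on every iteration. A small sketch that makes this visible by counting calls; the Counted class and both function names are purely illustrative.

```python
class Counted:
    """Illustrative object that counts how often str() is applied to it."""
    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return "42"

def in_loop(obj, n):
    for _ in range(n):
        if "no match" == str(obj):  # str() is evaluated on every pass
            break

def before_loop(obj, n):
    s = str(obj)                    # str() is evaluated once
    for _ in range(n):
        if "no match" == s:
            break

a, b = Counted(), Counted()
in_loop(a, 1000)
before_loop(b, 1000)
print(a.calls, b.calls)
```

The first counter ends up equal to the number of iterations and the second at one, so doing the conversion before the loop is the version that avoids the repeated work.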
Any PyQt developers here?
Having problems with removeRow() on a QTableView object.
After calling removeRow(), the screen isn't updating. It's as if the
model is read-only, but it's a QSqlTableModel() model, which is not
read-only.
The underlying SQL is straightforward (one table) and all columns are
editable.
None of the editStrategies are working either.
I tried everything I can think of, including changes to the
EditTriggers, but no luck. HELP!
FWIW, the same removeRow() code works fine with a QTableWidget.
---
object creation and data loading all works fine
---
#open db connection
qdb = QSqlDatabase.addDatabase("QSQLITE")
qdb.setDatabaseName(dbname)
qdb.open()
#prepare query and execute to return data
query = QSqlQuery()
query.prepare(cSQL)
query.exec_()
#set model type and query
model = QSqlTableModel()
model.setQuery(query)
#assign model to QTableView object
view = frm.tblPostsView
view.setModel(model)
#get all data
while(model.canFetchMore()): model.fetchMore()
datarows = model.rowCount()
---
iterate selected rows also works fine
SelectionMode is Extended.
identical code works for a QTableWidget
---
selected = tbl.selectionModel().selectedRows()
#reverse sort the selected items to delete from bottom up
selected = sorted(selected,reverse=True)
for i,val in enumerate(selected):
    tbl.model().removeRow(selected[i].row())
Re: Any PyQt developers here?
On 10/25/2022 1:45 PM, Thomas Passin wrote:
On 10/25/2022 1:03 PM, DFS wrote:
Having problems with removeRow() on a QTableView object.
removeRow() isn't listed as being a method of a QTableView, not even an
inherited method, so how are you calling removeRow() on it? (See
https://doc.qt.io/qt-6/qtableview-members.html)
* I thought I was calling it the same way it's called with
QTableWidgets: tbl.removeRow()
But looking at my code again I was using tbl.model().removeRow()
* Plus I found several others online with similar removeRow() issues
with QTableViews.
* Plus the code didn't throw an error:
selected = tbl.selectionModel().selectedRows()
#reverse sort the selected items to delete from bottom up
selected = sorted(selected,reverse=True)
for i,val in enumerate(selected):
    tbl.model().removeRow(selected[i].row())
But... as you say, when looking at the docs, removeRow() isn't even one
of the slots for QTableViews. So duh!
I see the QTableView.hideRow(row) method, which does exactly what I need.
Thanks man!
Re: Any PyQt developers here?
On 10/25/2022 2:03 PM, Barry Scott wrote:
> There is an active PyQt mailing list that has lots of helpful and knowledgeable people on it.
> https://www.riverbankcomputing.com/mailman/listinfo/pyqt
>
> Barry

Thanks. I'll send some questions their way, I'm sure.
A little source file analyzer
Nothing special, but kind of fun to use
$python progname.py sourcefile.py
-
#count blank lines, comments, source code
import sys

#counters
imports, blanks, comments, source = 0,0,0,0
functions, dbexec, total = 0,0,0

#python builtins
builtins = 0
bins = ['abs','aiter','all','any','anext','ascii','bin','bool','breakpoint','bytearray','bytes','callable','chr','classmethod','compile','complex','delattr','dict','dir','divmod','enumerate','eval','exec','filter','float','format','frozenset','getattr','globals','hasattr','hash','help','hex','id','input','int','isinstance','issubclass','iter','len','list','locals','map','max','memoryview','min','next','object','oct','open','ord','pow','property','range','repr','reversed','round','set','setattr','slice','sorted','staticmethod','str','sum','super','tuple','type','vars','zip']
bins2,bins3 = [],[]
#look for leading space then builtin then open paren
for bi in bins: bins2.append(' ' + bi + '(')

#todo use for source files other than .py
ccomments = 0
py_comment = ['#','~','@']
c_comment = ['/*','//']

#read file
f = open(sys.argv[1], encoding='utf-8')
lines = f.read().splitlines()
f.close()

#print builtin usage count
#def binusage():

#iterate file
linenbr = 0
for line in lines:
    line = line.strip()
    linenbr += 1
    if line == '' : blanks += 1
    if line != '':
        if line[0:1] == '#': comments += 1
        #a line that opens or closes a triple-quoted string counts once
        #(the original slice line[-3:1] could never match)
        if line[0:3] == '"""' or line[-3:] == '"""' : comments += 1
        if line[0:1] not in ['#','"']:
            source += 1
            if line[0:3] == 'def' and line[-2:] == '):' :
                functions += 1
            if '.execute' in line : dbexec += 1
            if 'commit()' in line : dbexec += 1
            if 'import' in line : imports += 1
            if 'print(' in line : bins3.append('print')
            #usage of a python builtin function
            for bi in bins2:
                if bi in line:
                    bins3.append(bi[1:-1])
    total += 1

#output
print('imports : ' + str(imports))
print('source: ' + str(source))
print('-functions: ' + str(functions))
print('-db exec : ' + str(dbexec) + 'x')
ctxt = ''
x = [(i,bins3.count(i)) for i in sorted(set(bins3))]
for bi,cnt in x: ctxt += bi + '('+ str(cnt) + '), '
print('-builtins : ' + str(len(bins3)) + 'x [' + ctxt[:-2] + ']')
print('comments : ' + str(comments))
print('blanks: ' + str(blanks))
print('Total : ' + str(total))
-
Re: Any PyQt developers here?
On 10/25/2022 1:45 PM, Thomas Passin wrote:
> On 10/25/2022 1:03 PM, DFS wrote:
>> Having problems with removeRow() on a QTableView object.
>
> removeRow() isn't listed as being a method of a QTableView, not even an inherited method, so how are you calling removeRow() on it? (See https://doc.qt.io/qt-6/qtableview-members.html)

Since you helped me on the last one, maybe you could try to answer a couple more [probably simple] roadblocks I'm hitting.

I just wanna set the font to bold/not-bold when clicking on a row in QTableView.

With a QTableWidget I do it like this:

font = QFont()
font.setBold(True)  # or False
QTableWidget.item(row,col).setFont(font)

But the QTableView has data/view 'models' attached to it and that syntax doesn't work:

Tried:
font = QFont()
font.setBold(True)  # or False
model = QTableView.model()
model.setFont(model.index(row,col), font)

Throws AttributeError: 'QSqlTableModel' object has no attribute 'setFont'

This doesn't throw an error, but doesn't show bold:
model.setData(model.index(tblRow, col), font, Qt.FontRole)

Any ideas?

Thanks
Re: Any PyQt developers here?
On 10/27/2022 3:47 PM, Thomas Passin wrote:
On 10/27/2022 11:15 AM, DFS wrote:
On 10/25/2022 1:45 PM, Thomas Passin wrote:
On 10/25/2022 1:03 PM, DFS wrote:
Having problems with removeRow() on a QTableView object.
removeRow() isn't listed as being a method of a QTableView, not even
an inherited method, so how are you calling removeRow() on it? (See
https://doc.qt.io/qt-6/qtableview-members.html)
Since you helped me on the last one, maybe you could try to answer a
couple more [probably simple] roadblocks I'm hitting.
I just wanna set the font to bold/not-bold when clicking on a row in
QTableView.
With a QTableWidget I do it like this:
font = QFont()
font.setBold(True) or False
QTableWidget.item(row,col).setFont(font)
But the QTableView has data/view 'models' attached to it and that
syntax doesn't work:
Tried:
font = QFont()
font.setBold(True) or False
model = QTableView.model()
model.setFont(model.index(row,col), font)
Throws AttributeError: 'QSqlTableModel' object has no attribute 'setFont'
This doesn't throw an error, but doesn't show bold:
model.setData(model.index(tblRow, col), font, Qt.FontRole)
Any ideas?
You definitely need to be setting the font in an item. I'm not sure but
I think that your QFont() doesn't have any properties, so it doesn't do
anything. I found this bit in a page - it's in C++ instead of Python
but that doesn't really make a difference except for the exact syntax to
use -
https://forum.qt.io/topic/70016/qlistview-item-font-stylesheet-not-working/4
QVariant v = ModelBaseClass::data(index,role);
if( condition && role == Qt::FontRole )
{
    QFont font = v.value<QFont>();
    font.setBold( true );
    v = QVariant::fromValue( font );
}
IOW, you have to get the font from the item, then set it to bold, which
you would do with setFont(). Then you set that new font on the item. Of
course you would have to unset bold on it later. See
https://doc.qt.io/qt-6/qtablewidgetitem.html#font
Instead of "item", you might need to operate on "row". I didn't look
into that. Since a row probably doesn't have just one font (since it
can have more than one item), you'd still have to get the font from some
item in the row.
You might also be able to make the item bold using CSS, but I'm not sure.
Thanks
Internet searches are your friend for questions like this.
Before I posted I spent a couple hours looking online, reading the docs,
and trying different ways.
I found one person that said they did it but their syntax didn't work.
But it doesn't throw an error either.
model.setData(model.index(tblRow, col), font, Qt.FontRole)
When I'm done with my app (nearly 2K LOC) I'm going to put a summary out
there somewhere with a bunch of examples of easy ways to do things. For
one thing I wrote zero classes. Not one.
I've never
worked with a QTableView, so I had to start with some knowledge about
some other parts of QT. I found the first page searching for "qt set
qtableview row font", and the second searching for "qtablewidgetitem".
I used TableWidgets in 2 apps and no problems. In this app there's more
data and more sorting, and one of the TableWidgets took a while to load
35K rows (7 items per row). So I tried a TableView. Incredibly fast -
4x the speed - but it doesn't have the bolding in place yet. That could
slow it down.
As you know, a TableView is tied to the underlying datasource (in my
case via a QSqlTableModel), but it's much faster to show data than a
TableWidget, because with the widget you have to populate each cell with
setItem().
The Widget is slower but easier to work with. So it's a tradeoff.
And I think I found some bugs in the TableViews. The Views have
editStrategies() that control how data is updated (if the model supports
editing), but they don't work the way the docs say they do.
In my app, when I click on a row a flag field is changed from N to Y
onscreen (well, it's hidden but it's in the row).
model.setData(model.index(row,7), 'Y')
OnFieldChange : all changes to the model will be applied immediately to
the database.
model.setEditStrategy(QSqlTableModel.OnFieldChange)
Doesn't work right. The screen is updated the first row you click on,
but the db isn't updated until you reload the view.
OnRowChange: changes to a row will be applied when the user selects
a different row.
model.setEditStrategy(QSqlTableModel.OnRowChange)
Doesn't work right. The screen is updated the first row you click on,
but the db isn't updated until you reload the view.
OnManualSubmit : all changes will be cached in the model until either
submitAll() or revertAll() is called.
model.setEdi
What's tkinter doing in \Lib\site-packages\future\moves ?
3.9.13
Re: What's tkinter doing in \Lib\site-packages\future\moves ?
On 11/7/2022 10:48 PM, DFS wrote:
> 3.9.13

Never mind. User error - I didn't install it in the first place.
Re: Need max values in list of tuples, based on position
On 11/11/2022 12:49 PM, Dennis Lee Bieber wrote:
On Fri, 11 Nov 2022 02:22:34 -0500, DFS declaimed the
following:
[(0,11), (1,1), (2,1),
(0,1) , (1,41), (2,2),
(0,9) , (1,3), (2,12)]
The set of values in elements[0] is {0,1,2}
I want the set of max values in elements[1]: {11,41,12}
Do they have to be IN THAT ORDER?
Yes.
data = [(0,11), (1,1), (2,1), (0,1) , (1,41), (2,2), (0,9) , (1,3), (2,12)]
reshape = list(zip(*data))
result = sorted(reshape[1])[-3:]
result
[11, 12, 41]
Need max values in list of tuples, based on position
[(0,11), (1,1), (2,1),
(0,1) , (1,41), (2,2),
(0,9) , (1,3), (2,12)]
The set of values in elements[0] is {0,1,2}
I want the set of max values in elements[1]: {11,41,12}
Re: Need max values in list of tuples, based on position
On 11/11/2022 7:50 AM, Stefan Ram wrote:
Pancho writes:
def build_max_dict( tups):
    dict = {}
    for (a,b) in tups:
        if (a in dict):
            if (b>dict[a]):
                dict[a]=b
        else:
            dict[a]=b
    return(sorted(dict.values()))

Or,

import itertools
import operator

def build_max_dict( tups ):
    key = operator.itemgetter( 0 )
    groups = itertools.groupby( sorted( tups, key=key ), key )
    return set( map( lambda x: max( x[ 1 ])[ 1 ], groups ))
FYI, neither of those solutions works:
Pancho: 11, 12, 41
You : 41, 11, 12
The answer I'm looking for is 11,41,12
Maybe a tuple with the same info presented differently would be easier
to tackle:
orig:
[(0, 11), (1, 1), (2, 1),
(0, 1), (1, 41), (2, 2),
(0, 9), (1, 3), (2, 12)]
new: [(11,1,1),
(1,41,2),
(9,3,12)]
I'm still looking for the max value in each position across all elements
of the tuple, so the answer is still 11,41,12.
Edit: found a solution online:
-
x = [(11,1,1),(1,41,2),(9,3,12)]
maxvals = [0]*len(x[0])
for e in x:
    maxvals = [max(w,int(c)) for w,c in zip(maxvals,e)]
print(maxvals)
[11, 41, 12]
-
So now the challenge is making it a one-liner!
Thanks
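One possible one-liner, as a sketch: transposing the rows with zip(*x) groups the values by position, so max() can be applied to each column directly. It assumes every tuple has the same length.

```python
x = [(11, 1, 1), (1, 41, 2), (9, 3, 12)]

# zip(*x) yields the columns: (11, 1, 9), (1, 41, 3), (1, 2, 12)
maxvals = [max(col) for col in zip(*x)]
print(maxvals)
```

Unlike the loop that starts from [0]*len(x[0]), this version needs no initial value, so it also behaves sensibly if a column holds only negative numbers.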
Re: Need max values in list of tuples, based on position
On 11/11/2022 2:22 PM, Pancho wrote:
On 11/11/2022 18:53, DFS wrote:
On 11/11/2022 12:49 PM, Dennis Lee Bieber wrote:
On Fri, 11 Nov 2022 02:22:34 -0500, DFS declaimed the
following:
[(0,11), (1,1), (2,1),
(0,1) , (1,41), (2,2),
(0,9) , (1,3), (2,12)]
The set of values in elements[0] is {0,1,2}
I want the set of max values in elements[1]: {11,41,12}
Do they have to be IN THAT ORDER?
Yes.
Sets aren't ordered, which is why I gave my answer as a list. A wrongly
ordered list, but I thought it rude to point out my own error, as no one
else had. :-)
Assuming you want numeric order of element[0], rather than first
occurrence order of the element[0] in the original tuple list. In this
example, they are both the same.
Here is a corrected version
from collections import OrderedDict
def build_max_dict( tups):
    dict = OrderedDict()
    for (a,b) in tups:
        if (a in dict):
            if (b>dict[a]):
                dict[a]=b
        else:
            dict[a]=b
    return(dict.values())
This solution gives the answer as type odict_values. I'm not quite sure
what this type is, but it seems to be a sequence/iterable/enumerable
type, whatever the word is in Python.
Caveat: I know very little about Python.
Thanks for looking at it. I'm trying to determine the maximum length of
each column result in a SQL query. Normally you can use the 3rd value
of the cursor.description object (see the DB-API spec), but apparently
not with my dbms (SQLite). The 'display_size' column is None with
SQLite. So I had to resort to another way.
Re: Need max values in list of tuples, based on position
On 11/11/2022 7:04 PM, Dennis Lee Bieber wrote:
On Fri, 11 Nov 2022 15:03:49 -0500, DFS declaimed the
following:
Thanks for looking at it. I'm trying to determine the maximum length of
each column result in a SQL query. Normally you can use the 3rd value
of the cursor.description object (see the DB-API spec), but apparently
not with my dbms (SQLite). The 'display_size' column is None with
SQLite. So I had to resort to another way.
Not really a surprise. SQLite doesn't really have column widths --
As I understand it, the cursor.description doesn't look at the column
type - it goes by the data in the cursor.
since any column can store data of any type; affinities just drive it into
what may be the optimal storage for the column... That is, if a column is
"INT", SQLite will attempt to convert whatever the data is into an integer
-- but if the data is not representable as an integer, it will be stored as
the next best form.
Yeah, I don't know why cursor.description doesn't work with SQLite; all
their columns are basically varchars.
123 => stored as integer
"123" => converted and stored as integer
123.0 => probably converted to integer
123.5 => likely stored as numeric/double
"one two three" => can't convert, store it as a string
We've not seen the SQL query in question,
The query is literally any SELECT, any type of data, including SELECT *.
The reason it works with SELECT * is the cursor.description against
SQLite DOES give the column names:
select * from timezone;
print(cur.description)
(
('TIMEZONE', None, None, None, None, None, None),
('TIMEZONEDESC', None, None, None, None, None, None),
('UTC_OFFSET', None, None, None, None, None, None)
)
(I lined up the data)
Anyway, I got it working nicely, with the help of the solution I found
online and posted here earlier:
-
x = [(11,1,1),(1,41,2),(9,3,12)]
maxvals = [0]*len(x[0])
for e in x:
    #clp example using only ints
    maxvals = [max(w,int(c)) for w,c in zip(maxvals,e)] #clp example
    #real world - get the length of the data string, even if all numeric
    maxvals = [max(w,len(str(c))) for w,c in zip(maxvals,e)]
print(maxvals)
[11, 41, 12]
-
Applied to real data, the iterations might look like this:
[4, 40, 9]
[4, 40, 9]
[4, 40, 9]
[4, 40, 18]
[4, 40, 18]
[4, 40, 18]
[5, 40, 18]
[5, 40, 18]
[5, 40, 18]
[5, 69, 18]
[5, 69, 18]
[5, 69, 18]
The last row contains the max width of the data in each column.
Then I compare those data widths to the column name widths and take the
wider of the two, so [5,69,18] might change to [8,69,18] if a column
label is wider than the widest bit of data in the column.
Finally I convert those final widths into a print format string, and
everything fits well: each column is perfectly sized and it all looks
pleasing to the eye (and no external libs like tabulate used either).
https://imgur.com/UzO3Yhp
The 'downside' is you have to fully iterate the data twice: once to get
the widths, then again to print it.
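The two passes described above can be sketched roughly like this. It is not the code from the app, just one way to do it under the same idea; column_widths and format_table are hypothetical names, and the sample rows are made up.

```python
def column_widths(headers, rows):
    """Widest of the header and every value's string form, per column."""
    widths = [len(h) for h in headers]
    for row in rows:
        widths = [max(w, len(str(c))) for w, c in zip(widths, row)]
    return widths

def format_table(headers, rows):
    """Return the table as a list of left-aligned, space-padded lines."""
    widths = column_widths(headers, rows)          # first pass over the data
    fmt = "  ".join("{:<%d}" % w for w in widths)  # e.g. "{:<8}  {:<26}  {:<10}"
    lines = [fmt.format(*headers)]
    lines.append("  ".join("-" * w for w in widths))
    # second pass: render every row with the computed format string
    lines.extend(fmt.format(*(str(c) for c in row)) for row in rows)
    return lines

# Made-up sample data, only to exercise the functions.
headers = ["TIMEZONE", "TIMEZONEDESC", "UTC_OFFSET"]
rows = [("EST", "Eastern Standard Time", -5),
        ("UTC", "Coordinated Universal Time", 0)]
for line in format_table(headers, rows):
    print(line)
```

Starting the widths from the header lengths folds the "wider of data or label" comparison into the same loop.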
If I get a wild hair I might create a PostgreSQL clone of my db and see
if the cursor.description works with it. It would also have to iterate
the data to determine that 'display_size' value.
https://peps.python.org/pep-0249/#cursor-attributes
> but it might suffice to use a
> second (first?) SQL query with aggregate (untested)
>
>max(length(colname))
>
> for each column in the main SQL query.
Might be a pain to code dynamically.
"""
length(X)
For a string value X, the length(X) function returns the number of
characters (not bytes) in X prior to the first NUL character. Since SQLite
strings do not normally contain NUL characters, the length(X) function will
usually return the total number of characters in the string X. For a blob
value X, length(X) returns the number of bytes in the blob. If X is NULL
then length(X) is NULL. If X is numeric then length(X) returns the length
of a string representation of X.
"""
Note the last sentence for numerics.
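Building that aggregate query dynamically might look something like the sketch below, using the column names from cursor.description. It assumes the query is a single SELECT without a trailing semicolon and with unique column names, so that it can be wrapped as a subquery; max_lengths is a hypothetical helper and the table is made up.

```python
import sqlite3

def max_lengths(conn, query):
    """Sketch: ask SQLite for max(length(col)) of every column in `query`."""
    cur = conn.execute(query)
    names = [d[0] for d in cur.description]   # column names only
    # Double-quote identifiers; embedded quotes are doubled per SQL rules.
    quoted = ['"' + n.replace('"', '""') + '"' for n in names]
    aggregates = ", ".join(f"max(length({q}))" for q in quoted)
    row = conn.execute(f"SELECT {aggregates} FROM ({query})").fetchone()
    return dict(zip(names, row))

# Made-up table, only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE timezone (TIMEZONE TEXT, UTC_OFFSET INT)")
conn.executemany("INSERT INTO timezone VALUES (?, ?)",
                 [("EST", -5), ("UTC", 0), ("ACST", 9)])
print(max_lengths(conn, "select * from timezone"))
```

This lets the database scan the data instead of Python, at the cost of running the query twice. An empty result set gives None for every column, which the caller would have to handle.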
Thanks for looking at it.
Re: Need max values in list of tuples, based on position
On 11/13/2022 7:37 AM, Pancho wrote:
> On 11/11/2022 19:56, DFS wrote:
>> Edit: found a solution online:
>> -
>> x = [(11,1,1),(1,41,2),(9,3,12)]
>> maxvals = [0]*len(x[0])
>> for e in x:
>>     maxvals = [max(w,int(c)) for w,c in zip(maxvals,e)]
>> print(maxvals)
>> [11, 41, 12]
>> -
>>
>> So now the challenge is making it a one-liner!
>
> import functools
> x = [(11,1,1),(1,41,2),(9,3,12)]
> print(functools.reduce( lambda a,b : [max(w,c) for w,c in zip(a,b)], x, [0]*len(x[0])))

noice!
In code, list.clear doesn't throw error - it's just ignored
In code, list.clear is just ignored.
At the terminal, list.clear shows
<built-in method clear of list object at 0x...>

in code:
x = [1,2,3]
x.clear
print(len(x))
3

at terminal:
x = [1,2,3]
x.clear
<built-in method clear of list object at 0x...>
print(len(x))
3

Caused me an hour of frustration before I noticed list.clear() was what I needed.

x = [1,2,3]
x.clear()
print(len(x))
0
Re: In code, list.clear doesn't throw error - it's just ignored
On 11/13/2022 5:20 PM, Jon Ribbens wrote:
> On 2022-11-13, DFS wrote:
>> In code, list.clear is just ignored.
>> At the terminal, list.clear shows
>> <built-in method clear of list object at 0x...>
>>
>> in code:
>> x = [1,2,3]
>> x.clear
>> print(len(x))
>> 3
>>
>> at terminal:
>> x = [1,2,3]
>> x.clear
>> <built-in method clear of list object at 0x...>
>> print(len(x))
>> 3
>>
>> Caused me an hour of frustration before I noticed list.clear() was what I needed.
>>
>> x = [1,2,3]
>> x.clear()
>> print(len(x))
>> 0
>
> If you want to catch this sort of mistake automatically then you need a linter such as pylint:
>
>   $ cat test.py
>   """Create an array and print its length"""
>
>   array = [1, 2, 3]
>   array.clear
>   print(len(array))
>   $ pylint -s n test.py
>   ************* Module test
>   test.py:4:0: W0104: Statement seems to have no effect (pointless-statement)

Thanks, I should use linters more often.

But why is it allowed in the first place?

I stared at list.clear and surrounding code a dozen times and said "Looks right! Why isn't it clearing the list?!?!"

2 parens later and I'm golden!
Re: In code, list.clear doesn't throw error - it's just ignored
On 11/13/2022 9:11 PM, Chris Angelico wrote: On Mon, 14 Nov 2022 at 11:53, DFS wrote: On 11/13/2022 5:20 PM, Jon Ribbens wrote: On 2022-11-13, DFS wrote: In code, list.clear is just ignored. At the terminal, list.clear shows in code: x = [1,2,3] x.clear print(len(x)) 3 at terminal: x = [1,2,3] x.clear print(len(x)) 3 Caused me an hour of frustration before I noticed list.clear() was what I needed. x = [1,2,3] x.clear() print(len(x)) 0 If you want to catch this sort of mistake automatically then you need a linter such as pylint: $ cat test.py """Create an array and print its length""" array = [1, 2, 3] array.clear print(len(array)) $ pylint -s n test.py * Module test test.py:4:0: W0104: Statement seems to have no effect (pointless-statement) Thanks, I should use linters more often. But why is it allowed in the first place? I stared at list.clear and surrounding code a dozen times and said "Looks right! Why isn't it clearing the list?!?!" 2 parens later and I'm golden! No part of it is invalid, so nothing causes a problem. For instance, you can write this: If it wastes time like that it's invalid. This is an easy check for the interpreter to make. If I submit a suggestion to [email protected] will it just show up here? Or do the actual Python devs intercept it? 1 And you can write this: 1 + 2 And you can write this: print(1 + 2) But only one of those is useful in a script. Should the other two be errors? No. But linters WILL usually catch them, so if you have a good linter (especially built into your editor), you can notice these things. ran pylint against it and got 0.0/10. --disable= invalid-name multiple-statements bad-indentation line-too-long trailing-whitespace missing-module-docstring missing-function-docstring too-many-lines fixme and got 8.9/10. -- https://mail.python.org/mailman/listinfo/python-list
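To make the point about valid-but-useless statements concrete, here is a small sketch (variable names are mine). A bare `x.clear` is an expression that looks up the bound method and throws the result away, much like a bare `1 + 2`.

```python
x = [1, 2, 3]

method = x.clear          # looks up the bound method, calls nothing
print(callable(method))   # the attribute is a callable object
print(len(x))             # the list is untouched at this point

method()                  # calling it is what empties the list
print(len(x))
```

Since functions and methods are ordinary objects that can be passed around, the interpreter has no safe way to reject the bare reference; that is left to linters.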
Re: Python 3.11.0 installation and Tkinter does not work
On 11/21/2022 12:59 PM, [email protected] wrote:
Dear list,
I want to learn Python for 4 weeks and have problems installing Tkinter.
If I installed 3.11.0 for my windows 8.1 from python.org and type
>>> import _tkinter
Traceback (most recent call last):
File "", line 1, in
ImportError: DLL load failed while importing _tkinter: Das angegebene
Modul wurde nicht gefunden. [German: "The specified module could not be found."]
So it is a tkinter problem and I tried this:
>>> import _tkinter
Traceback (most recent call last):
File "", line 1, in
ImportError: DLL load failed while importing _tkinter: Das angegebene
Modul wurde nicht gefunden.
How can I fix this and make it work?
When installing Python 3.11.0 did you check the box "tcl/tk and IDLE"?
(it's an option on the Python Windows installer).
I made sure to do that, and then this worked:
import time
import tkinter
from tkinter import filedialog as fd
from tkinter.filedialog import askopenfilename
filename = fd.askopenfilename()
print(filename)
foldername = fd.askdirectory()
print(foldername)
time.sleep(3)
--
https://mail.python.org/mailman/listinfo/python-list
Re: Vb6 type to python
On 11/30/2022 6:56 AM, [email protected] wrote: Hello i have a byte file, that fill a vb6 type like: Type prog_real codice As String * 12'hsg denom As String * 24'oo codprof As String * 12 'ljio note As String * 100 programmer As String * 11 Out As Integer b_out As Byte'TRUE = Sec FALSE= mm asse_w As Byte '3.zo Asse --> 0=Z 1=W numpassi As Integer 'put len As Long 'leng p(250) As passo_pg vd(9) As Byte'vel. qUscita(9) As Integer'quote l_arco As Long 'reserved AxDin As Byte'dime End Type How i can convert to python You don't need to declare variable types in Python. I don't do Python OO so someone else can answer better, but a simple port of your VB type would be a python class definition: class prog_real: codice, denom, codprof, note, programmer AxDin, b_out, asse_w, vd, Out, numpassi, qUscita len, l_arco, p important: at some point you'll have trouble with a variable named 'len', which is a Python built-in function. For a visual aid you could label the variables by type and assign an initial value, if that helps you keep track in your mind. class prog_real: # strings codice, denom, codprof, note, programmer = '', '', '', '', '' # bytes AxDin, b_out, asse_w, vd = 0, 0, 0, 0 # ints Out, numpassi, qUscita = 0, 0, 0 # longs len, l_arco = 0, 0 # misc p = '' But it's not necessary. To restrict the range of values in the variables you would have to manually check them each time before or after they change, or otherwise force some kind of error/exception that occurs when the variable contains data you don't want. # assign values prog_real.codice = 'ABC' print('codice: ' + prog_real.codice) prog_real.codice = 'DEF' print('codice: ' + prog_real.codice) prog_real.codice = 123 print('codice: ' + str(prog_real.codice)) And as shown in the last 2 lines, a variable can accept any type of data, even after it's been initialized with a different type. 
b = 1 print(type(b)) b = 'ABC' print(type(b)) Python data types: https://www.digitalocean.com/community/tutorials/python-data-types A VB to python program: https://vb2py.sourceforge.net -- https://mail.python.org/mailman/listinfo/python-list
Re: Vb6 type to python
On 11/30/2022 1:07 PM, DFS wrote: On 11/30/2022 6:56 AM, [email protected] wrote: I don't do Python OO so someone else can answer better, but a simple port of your VB type would be a python class definition: class prog_real: codice, denom, codprof, note, programmer AxDin, b_out, asse_w, vd, Out, numpassi, qUscita len, l_arco, p Sorry for bad advice - that won't work. The other class definition that initializes the variables does work: class prog_real: # strings codice, denom, codprof, note, programmer = '', '', '', '', '' # bytes AxDin, b_out, asse_w, vd = 0, 0, 0, 0 # ints Out, numpassi, qUscita = 0, 0, 0 # longs len, l_arco = 0, 0 # misc p = '' -- https://mail.python.org/mailman/listinfo/python-list
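Since the original question was about a byte file that fills the VB6 type, the `struct` module is probably closer to what is needed than a plain class. Below is a minimal sketch covering only the leading fields (codice through len). It rests on assumptions I can't verify from the thread: that the file was written by VB6 `Put` with one byte per character in the fixed-length strings, 2-byte Integer, 4-byte Long, little-endian, and no padding between fields. The `p(250) As passo_pg` array and everything after it is left out because the layout of `passo_pg` was never posted. The names `HEADER`, `ProgReal` and `read_header` are mine, and `len` is renamed to `length` to avoid shadowing the built-in.

```python
import struct
from collections import namedtuple

# Assumed layout of the leading fields only, little-endian, no padding:
# five fixed-length strings, Integer (2 bytes), two Bytes, Integer, Long (4 bytes)
HEADER = struct.Struct("<12s24s12s100s11shBBhl")
ProgReal = namedtuple(
    "ProgReal",
    "codice denom codprof note programmer out b_out asse_w numpassi length")

def read_header(data):
    """Unpack the leading fields of one prog_real record from bytes."""
    fields = HEADER.unpack_from(data)
    # fixed-length VB strings are space padded; decode and strip them
    text = [f.decode("latin-1").rstrip(" \x00") for f in fields[:5]]
    return ProgReal(*text, *fields[5:])

# round trip on made-up data, just to show the mechanics
sample = HEADER.pack(b"ABC".ljust(12), b"demo".ljust(24), b"P1".ljust(12),
                     b"".ljust(100), b"me".ljust(11), 7, 1, 0, 3, 1234)
rec = read_header(sample)
print(rec.codice, rec.out, rec.length)
```

With a real file you would read `HEADER.size` bytes in binary mode and pass them to `read_header`; if the numbers come out as nonsense, the assumed layout is the first thing to check.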
Python is maybe the most widely used language, but clp gets 0 posts some days?
Usenet is dead. Long live Usenet. -- https://mail.python.org/mailman/listinfo/python-list
Re: New computer, new Python
On 12/9/2022 12:13 PM, [email protected] wrote: Hello. I've downloaded the new Python to my new Computer, and the new Python mystifies me. Instead of an editor, it looks like a Dos executable program. python.exe is a Windows executable. How can I write my own Python Functions and subroutines in the new Python? Open a text editor and write your own functions and subs. Save the file as prog.py. From the command line (not from inside the Python shell), type: $ python prog.py It is version 3.11 (64 bit). The latest and greatest. Significantly sped up vs 3.10. -- https://mail.python.org/mailman/listinfo/python-list
Re: Does one have to use curses to read single characters from keyboard?
On 12/11/2022 5:09 AM, Chris Green wrote:
Is the only way to read single characters from the keyboard to use
curses.cbreak() or curses.raw()? If so how do I then read characters,
it's not at all obvious from the curses documentation as that seems to
think I'm using a GUI in some shape or form.
All I actually want to do is get 'Y' or 'N' answers to questions on
the command line.
Searching for ways to do this produces what seem to me rather clumsy
ways of doing it.
resp = 'x'
while resp.lower() not in ('y', 'n'):
    resp = input("Did you say Y or did you say N?: ")
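If the question gets asked in more than one place, the loop can be wrapped in a small helper. This is only a sketch: the name `ask_yes_no` and the `read` parameter (there so the function can be exercised without a keyboard) are my own, and it still waits for Enter, so it does not read a single keypress.

```python
def ask_yes_no(prompt, read=input):
    """Keep asking until the reply starts with y or n; return True for yes."""
    while True:
        resp = read(prompt).strip().lower()
        # compare against a tuple: a substring test on 'yn' would
        # also accept an empty reply
        if resp[:1] in ("y", "n"):
            return resp[0] == "y"

# usage: if ask_yes_no("Continue? [y/n] "): ...
```

Checking membership in a tuple matters here, because `'' in 'yn'` is True in Python and an empty reply would otherwise slip through.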
--
https://mail.python.org/mailman/listinfo/python-list
Re: How to get the needed version of a dependency
On 12/14/2022 3:55 AM, Cecil Westerhof wrote:
If I want to know the dependencies for requests I use:
pip show requests
And one of the lines I get is:
Requires: certifi, charset-normalizer, idna, urllib3
But I want (in this case) to know which version of charset-normalizer
requests needs. How do I get that?
Check the METADATA file in the *dist-info package files usually found in
Lib\site-packages, e.g.
\Python\3.11.0\Lib\site-packages\pandas-1.5.2.dist-info
Look for config lines beginning with 'Requires':
Requires-Python: >=3.8
Requires-Dist: python-dateutil (>=2.8.1)
$ pip list
will show you which version of the package you have installed, so you
can search for the matching .dist-info file
--
https://mail.python.org/mailman/listinfo/python-list
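The same Requires-Dist lines can also be read from inside Python with the standard library's `importlib.metadata` (Python 3.8 and later), which saves hunting for the .dist-info directory. A small sketch, with `declared_requirements` being a helper name of my own:

```python
from importlib.metadata import PackageNotFoundError, requires

def declared_requirements(package):
    """Return the Requires-Dist strings of an installed package, or []."""
    try:
        # requires() returns None when the package declares no dependencies
        return requires(package) or []
    except PackageNotFoundError:
        return []

for line in declared_requirements("requests"):
    print(line)
```

Each returned string carries the version specifier as the package declared it, so for requests the charset-normalizer entry would show the accepted range. Nothing is printed if requests is not installed in the environment running the script.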
Re: Fwd: Installation hell
On 12/18/2022 6:50 AM, Jim Lewis wrote: I'm an occasional user of Python and have a degree in computer science. Almost every freaking time I use Python, I go through PSH (Python Setup Hell). Sometimes a wrong version is installed. Sometimes it's a path issue. Or exe naming confusion: python, python3, python311, etc. Or library compatibility issues - took an hour to find out that pygame does not work with the current version of python. Then the kludgy PIP app and using a DOS box under Windows with command prompts which is ridiculous. God only knows how many novice users of the language (or even intermediate users) were lost in the setup process. Why not clean the infrastructure up and make a modern environment or IDE or something better than it is now. Or at least good error messages that explain exactly what to do. Even getting this email to the list took numerous steps. -- A frustrated user Issues installing python and sending an email? Ask for a refund on your compsci degree. -- https://mail.python.org/mailman/listinfo/python-list
Connecting python to DB2 database
Having a problem with the DB2 connector test.py import ibm_db_dbi connectstring = 'DATABASE=xxx;HOSTNAME=localhost;PORT=5;PROTOCOL=TCPIP;UID=xxx;PWD=xxx;' conn = ibm_db_dbi.connect(connectstring,'','') curr = conn.cursor print(curr) cSQL = "SELECT * FROM TEST" curr.execute(cSQL) rows = curr.fetchall() print(len(rows)) $python test.py Traceback (most recent call last): File "temp.py", line 9, in curr.execute(cSQL) AttributeError: 'function' object has no attribute 'execute' The ibm_db_dbi library supposedly adheres to PEP 249 (DB-API Spec 2.0), but it ain't happening here. Googling got me nowhere. Any ideas? python 3.8.2 on Windows 10 pip install ibm_db -- https://mail.python.org/mailman/listinfo/python-list
Re: Connecting python to DB2 database
On 9/3/2021 1:47 AM, Chris Angelico wrote: On Fri, Sep 3, 2021 at 3:42 PM DFS wrote: Having a problem with the DB2 connector test.py import ibm_db_dbi connectstring = 'DATABASE=xxx;HOSTNAME=localhost;PORT=5;PROTOCOL=TCPIP;UID=xxx;PWD=xxx;' conn = ibm_db_dbi.connect(connectstring,'','') curr = conn.cursor print(curr) According to PEP 249, what you want is conn.cursor() not conn.cursor. I'm a bit surprised as to the repr of that function though, which seems to be this line from your output: I'd have expected it to say something like "method cursor of Connection object", which would have been an immediate clue as to what needs to be done. Not sure why the repr is so confusing, and that might be something to report upstream. ChrisA Thanks. I must've done it right, using conn.cursor(), 500x. Bleary-eyed from staring at code too long I guess. Now can you get DB2 to accept ; as a SQL statement terminator like the rest of the world? They call it "An unexpected token"... -- https://mail.python.org/mailman/listinfo/python-list
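The `conn.cursor` versus `conn.cursor()` slip can be reproduced with any PEP 249 driver. The sketch below uses the standard library's `sqlite3` as a stand-in, since I don't have DB2 at hand; the table name and rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

not_called = conn.cursor      # bound method, no cursor created yet
curr = conn.cursor()          # the actual cursor object

print(hasattr(not_called, "execute"))   # the method object has no execute
print(hasattr(curr, "execute"))         # the cursor does

curr.execute("CREATE TABLE test (id INTEGER)")
curr.execute("INSERT INTO test VALUES (1), (2)")
curr.execute("SELECT * FROM test")
rows = curr.fetchall()
print(len(rows))
conn.close()
```

Printing the object before using it, as the original post did, is a reasonable habit: a repr mentioning a function or method instead of a cursor is the giveaway.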
Help me split a string into elements
Typical cases:
lines = [('one\ntwo\nthree\n')]
print(str(lines[0]).splitlines())
['one', 'two', 'three']
lines = [('one two three\n')]
print(str(lines[0]).split())
['one', 'two', 'three']
That's the result I'm wanting, but I get data in a slightly different
format:
lines = [('one\ntwo\nthree\n',)]
Note the comma after the string data, but inside the paren.
splitlines() doesn't work on it:
print(str(lines[0]).splitlines())
["('one\\ntwo\\nthree\\n',)"]
I've banged my head enough - can someone spot an easy fix?
Thanks
--
https://mail.python.org/mailman/listinfo/python-list
Re: Help me split a string into elements
On 9/4/2021 5:55 PM, DFS wrote:
Typical cases:
lines = [('one\ntwo\nthree\n')]
print(str(lines[0]).splitlines())
['one', 'two', 'three']
lines = [('one two three\n')]
print(str(lines[0]).split())
['one', 'two', 'three']
That's the result I'm wanting, but I get data in a slightly different
format:
lines = [('one\ntwo\nthree\n',)]
Note the comma after the string data, but inside the paren. splitlines()
doesn't work on it:
print(str(lines[0]).splitlines())
["('one\\ntwo\\nthree\\n',)"]
I've banged my head enough - can someone spot an easy fix?
Thanks
I got it:
lines = [('one\ntwo\nthree\n',)]
print(str(lines[0][0]).splitlines())
['one', 'two', 'three']
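A short sketch of why the extra index is needed: the trailing comma makes each item a 1-tuple, and `str()` on a tuple gives its repr as one line of text, which is why `splitlines()` returned a single element. The `str()` call is not needed once the string is pulled out of the tuple. Names other than `lines` are mine.

```python
lines = [('one\ntwo\nthree\n',)]

row = lines[0]            # a 1-tuple, because of the trailing comma
text = row[0]             # the string inside it
print(type(row).__name__, type(text).__name__)
print(text.splitlines())

# the same thing for every row in the list
all_parts = [field.splitlines() for (field,) in lines]
print(all_parts)
```

A list of 1-tuples is the usual shape of a single-column result from a database `fetchall()`, if that is where the data is coming from.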
--
https://mail.python.org/mailman/listinfo/python-list
Re: Connecting python to DB2 database
On 9/3/2021 9:50 AM, Chris Angelico wrote: On Fri, Sep 3, 2021 at 11:37 PM DFS wrote: On 9/3/2021 1:47 AM, Chris Angelico wrote: On Fri, Sep 3, 2021 at 3:42 PM DFS wrote: Having a problem with the DB2 connector test.py import ibm_db_dbi connectstring = 'DATABASE=xxx;HOSTNAME=localhost;PORT=5;PROTOCOL=TCPIP;UID=xxx;PWD=xxx;' conn = ibm_db_dbi.connect(connectstring,'','') curr = conn.cursor print(curr) According to PEP 249, what you want is conn.cursor() not conn.cursor. I'm a bit surprised as to the repr of that function though, which seems to be this line from your output: I'd have expected it to say something like "method cursor of Connection object", which would have been an immediate clue as to what needs to be done. Not sure why the repr is so confusing, and that might be something to report upstream. ChrisA Thanks. I must've done it right, using conn.cursor(), 500x. Bleary-eyed from staring at code too long I guess. Cool cool! Glad that's working. Now can you get DB2 to accept ; as a SQL statement terminator like the rest of the world? They call it "An unexpected token"... Hmm, I don't know that the execute() method guarantees to allow semicolons. Some implementations will strip a trailing semi, but they usually won't allow interior ones, because that's a good way to worsen SQL injection vulnerabilities. It's entirely possible - and within the PEP 249 spec, I believe - for semicolons to be simply rejected. The default in the DB2 'Command Line Plus' tool is semicolons aren't "allowed". db2 => connect to SAMPLE db2 => SELECT COUNT(*) FROM STAFF; SQL0104N An unexpected token ";" was found following "COUNT(*) FROM STAFF". Expected tokens may include: "END-OF-STATEMENT". SQLSTATE=42601 db2 => SELECT COUNT(*) FROM STAFF 1 --- 35 1 record(s) selected. 
But I should've known you can set the terminator value: https://www.ibm.com/docs/en/db2/11.1?topic=clp-options Option : -t Description: This option tells the command line processor to use a semicolon (;) as the statement termination character. Default: OFF $ db2 -t turns it on in CommandLinePlus - and the setting applies to the DB-API code too. -- https://mail.python.org/mailman/listinfo/python-list
Re: ANN: Dogelog Runtime, Prolog to the Moon (2021)
On 9/15/2021 12:23 PM, Mostowski Collapse wrote:
I really wonder why my Python implementation
is a factor 40 slower than my JavaScript implementation.
Structurally its the same code.
You can check yourself:
Python Version:
https://github.com/jburse/dogelog-moon/blob/main/devel/runtimepy/machine.py
JavaScript Version:
https://github.com/jburse/dogelog-moon/blob/main/devel/runtime/machine.js
Its the same while, if-then-else, etc.. its the same
classes Variable, Compound etc.. Maybe I could speed
it up by some details. For example to create an array
of length n, I use in Python:
temp = [NotImplemented] * code[pos]
pos += 1
Whereas in JavaScript I use, also
in exec_build2():
temp = new Array(code[pos++]);
So I hear Guido doesn't like ++. So in Python I use +=
and a separate statement as a workaround. But otherwise,
what about the creation of an array,
is the the idiom [_] * _ slow? I am assuming its
compiled away. Or does it really first create an
array of size 1 and then enlarge it?
I'm sure you know you can put in timing statements to find bottlenecks.
import time
startTime = time.perf_counter()
[code block]
print("%.2f" % (time.perf_counter() - startTime))
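Applied to the list-creation idiom from the question, the timing could look like the sketch below. The size `n = 1000` is an arbitrary value of mine standing in for `code[pos]`, and `timeit` is added alongside the manual timer because a single measurement of something this small is mostly noise.

```python
import time
import timeit

n = 1000

# manual timing with perf_counter, one run
start = time.perf_counter()
temp = [NotImplemented] * n
elapsed = time.perf_counter() - start
print("%.6f" % elapsed)

# timeit repeats the statement to smooth out noise
per_call = timeit.timeit(lambda: [NotImplemented] * n, number=10000) / 10000
print("%.9f" % per_call)
```

For finding where a whole program spends its time, the standard `cProfile` module is usually a better first step than hand-placed timers.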
--
https://mail.python.org/mailman/listinfo/python-list
Re: ANN: Dogelog Runtime, Prolog to the Moon (2021)
On 9/15/2021 5:10 PM, Mostowski Collapse wrote:
And how do you only iterate over n-1 elements? I don't need a loop over
all elements. With array slicing? Something like:
for item in items[0:len(items)-2]:
    print(item)
Or with negative slicing indexes? Problem is my length can be equal to
one. And when I have length equal to one, the slice might not do the
right thing? LoL
From the python command prompt:
items = [1,2,3,4]
for itm in items: print(itm)
1
2
3
4
for itm in items[:-2]: print(itm)
1
2
for itm in items[:-3]: print(itm)
1
for itm in items[:-4]: print(itm)
(no result, no error thrown)
for itm in items[:-5]: print(itm)
(no result, no error thrown)
--
https://mail.python.org/mailman/listinfo/python-list
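For the n-1 case specifically, the slice is `items[:-1]`; the `len(items)-2` form in the question drops two elements, not one. A quick sketch (the helper name is mine) showing that lists of length one and zero are handled without any special casing:

```python
def all_but_last(items):
    """Slice off the final element; safe for short and empty lists."""
    return items[:-1]

print(all_but_last([1, 2, 3, 4]))
print(all_but_last([1]))
print(all_but_last([]))
```

Slices never raise IndexError for out-of-range bounds; they just return whatever part of the range exists, which may be an empty list.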
Re: Question again
On 9/16/2021 1:50 AM, af kh wrote:
Hello,
I was doing some coding on a website called replit then I extracted the file,
and opened it in Python. For some reason, after answering 'no' or 'yes' after
the last sentence I wrote, the Python window shut off, in replit I added one
more sentence, but it isn't shown on Python, it just shuts off. Why is that?
please reply to me soon since I need to submit it as an assignment for my class.
Code on replit:
#Title: Week 2: Chatbot with personality
#Author: Afnan Khan
#Date:9/15/21
#Description: Ask at least 3 questions to the user
#and create a creative topic
#Gives greetings to the user
import random
greetings = ["Hello, I'm Mr. ChatBot!", "Hi, I'm Mr. ChatBot!", "Hey~, I'm Mr. ChatBot!"]
comment = random.choice(greetings)
print(comment)
#Learn about the user's Name: First question
name = input("What is your name? ")
#Greet the User
print("Nice to meet you, " + name)
#Ask the user about their day: Second question
print("How is your day going? ")
#The user replies
reply = input()
#If user says 'amazing', reply with 'I am glad!'
if reply == "amazing" :
    print("I am glad!")
#If user says 'alright', reply with 'that's good'
elif reply == "alright" :
    print("that's good")
#If user says 'bad', reply with 'Do not worry things will get better'
elif reply == "bad" :
    print("Do not worry things will get better")
#Else than that type 'I see'
else :
    print("I see!")
#Ask to pick between numbers 1~10 to see if you will get lucky today: Third question
number = input("Please pick between numbers 1~10 to see your luck for today: ")
#From number 1~3 and an answer
if number == "1" or number == "2" or number == "3" :
    print("You're in great luck today!")
#From number 4~6 and an answer
elif number == "4" or number == "5" or number == "6" :
    print("damn, bad luck is coming your way")
#From number 7~10 and an answer
elif number == "7" or number == "8" or number == "9" or number == "10" :
    print("I cannot sense any luck today, try again next time")
#Add a statement and question: Fourth question
print("That will be all for today's chitchat, woohooo! would you like to exit the chat?")
#User says 'yes'
reply = input()
#If user says 'yes' reply 'wait hold on! are you really leaving??': Fifth question
if reply == "yes" :
    print("Wait hold on! are you really leaving??")
    #User answers
    answer = input()
    #If user says 'yes' again, reply 'fine! bye then!'
    if answer == "yes" :
        print("Fine! bye then!")
    #Other than that if user says 'no', reply 'just kidding we're done here haha'
    elif answer == "no" :
        print("just kidding we're done here haha")
Regards,
Aya
I don't understand your issue, but this code runs fine for me.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Free OCR package in Python and selecting appropriate widget for the GUI
On 9/21/2021 4:36 AM, Mohsen Owzar wrote: Hi Guys Long time ago I've written a program in Malab a GUI for solving Sudoku puzzles, which worked not so bad. Now I try to write this GUI with Python with PyQt5 or TKinter. First question is: Is there any free OCR software, packages or code in Python, which I can use to recognize the given digits and their positions in the puzzle square. Second: Because, I can not attach a picture to this post, I try to describe my picture of my GUI. Draw your GUI in PyQt designer or other graphics tool, then upload a screenshot of it to imgur, then post the link to the picture. -- https://mail.python.org/mailman/listinfo/python-list
Re: Free OCR package in Python and selecting appropriate widget for the GUI
On 9/21/2021 10:38 PM, Mohsen Owzar wrote: DFS schrieb am Dienstag, 21. September 2021 um 15:45:38 UTC+2: On 9/21/2021 4:36 AM, Mohsen Owzar wrote: Hi Guys Long time ago I've written a program in Malab a GUI for solving Sudoku puzzles, which worked not so bad. Now I try to write this GUI with Python with PyQt5 or TKinter. First question is: Is there any free OCR software, packages or code in Python, which I can use to recognize the given digits and their positions in the puzzle square. Second: Because, I can not attach a picture to this post, I try to describe my picture of my GUI. Draw your GUI in PyQt designer or other graphics tool, then upload a screenshot of it to imgur, then post the link to the picture. Thanks, for your answer. But, what is "imgur"? I'm not so familiar with handling of pictures in this group. How can I call "imgur" or how can I get there? Regards Mohsen www.imgur.com It's a website you can upload image files or screenshots to. Then you can copy a link to your picture and post the link here. -- https://mail.python.org/mailman/listinfo/python-list
Re: Free OCR package in Python and selecting appropriate widget for the GUI
On 9/22/2021 1:54 AM, Mohsen Owzar wrote:
DFS schrieb am Mittwoch, 22. September 2021 um 05:10:30 UTC+2:
On 9/21/2021 10:38 PM, Mohsen Owzar wrote:
DFS schrieb am Dienstag, 21. September 2021 um 15:45:38 UTC+2:
On 9/21/2021 4:36 AM, Mohsen Owzar wrote:
Hi Guys
Long time ago I've written a program in Malab a GUI for solving Sudoku puzzles,
which worked not so bad.
Now I try to write this GUI with Python with PyQt5 or TKinter.
First question is:
Is there any free OCR software, packages or code in Python, which I can use to
recognize the given digits and their positions in the puzzle square.
Second:
Because, I can not attach a picture to this post, I try to describe my picture
of my GUI.
Draw your GUI in PyQt designer or other graphics tool, then upload a
screenshot of it to imgur, then post the link to the picture.
Thanks, for your answer.
But, what is "imgur"?
I'm not so familiar with handling of pictures in this group.
How can I call "imgur" or how can I get there?
Regards
Mohsen
www.imgur.com
It's a website you can upload image files or screenshots to. Then you
can copy a link to your picture and post the link here.
I have already posted the link, but I can not see it anywhere.
Now, I post it again:
https://imgur.com/a/Vh8P2TE
I hope that you can see my two images.
Regards
Mohsen
Got it.
I haven't used tkinter. In PyQt5 designer I think you should use one
QTextEdit control for each square.
Each square with the small black font can be initially populated with
1 2 3
4 5 6
7 8 9
https://imgur.com/lTcEiML
some starter python code (maybe save as sudoku.py)
=
from PyQt5 import Qt, QtCore, QtGui, QtWidgets, uic
from PyQt5.Qt import *
from PyQt5.QtCore import *
from PyQt5.QtGui import *
from PyQt5.QtWidgets import *
#objects
app = QtWidgets.QApplication([])
frm = uic.loadUi("sudoku.ui")
#grid = a collection of squares
grids = 1
#squares = number of squares per grid
squares = 9
#fill the squares with 1-9
def populateSquares():
    for i in range(grids,grids+1):
        for j in range(1,squares+1):
            widget = frm.findChild(QtWidgets.QTextEdit,
                                   "txt{}_{}".format(i,j))
            widget.setText("1 2 3 4 5 6 7 8 9")
#read data from squares
def readSquares():
    for i in range(grids,grids+1):
        for j in range(1,squares+1):
            print("txt%d_%d contains: %s" %
                  (i,j,frm.findChild(QtWidgets.QTextEdit,
                                     "txt{}_{}".format(i,j)).toPlainText()))
#connect pushbuttons to code
frm.btnPopulate.clicked.connect(populateSquares)
frm.btnReadContents.clicked.connect(readSquares)
#show main form
frm.show()
#initiate application
app.exec()
=
.ui file (ie save as sudoku.ui)
=
[sudoku.ui: the XML markup did not survive in this archive copy, only the property values. What can be recovered: a MainWindow of 325 x 288 titled "Sudoku"; text boxes of 83 x 65 at (32,22), (114,22), (196,22), (32,86), (114,86) and (196,86); each box uses font Courier 12, stylesheet "color: rgb(0, 0, 127); background-color: rgb(255, 255, 127);", frame QFrame::StyledPanel / QFrame::Sunken and Qt::ScrollBarAlwaysOff. The listing breaks off partway through the sixth box.]
Re: Flush / update GUIs in PyQt5 during debugging in PyCharm
On 9/24/2021 12:46 AM, Mohsen Owzar wrote: Hi Guys I've written a GUI using PyQt5 and in there I use StyleSheets (css) for the buttons and labels to change their background- and foreground-colors and their states as well. Because my program doesn't function correctly, I try to debug it in my IDE (PyCharm). The problem is that during debugging, when I change some attributes of a button or label, let say its background-color, I can not see this modification of the color until the whole method or function is completed. I believe that I have seen somewhere during my searches and googling that one can flush or update the GUI after each step/action is done. But until now I couldn't manage it and I don't know where I have to invoke flush/update command in PyCharm. If anyone has done this before and knows about it, I would very appreciate seeing his solution. Regards Mohsen screen: form.repaint() individual widgets: form.widget.repaint() -- https://mail.python.org/mailman/listinfo/python-list
Use pyodbc to count and list tables, columns, indexes, etc
import pyodbc
dbName = r"D:\test_data.mdb"
conn = pyodbc.connect('DRIVER={Microsoft Access Driver (*.mdb)};DBQ='+dbName)
cursor = conn.cursor()
#COUNT TABLES, LIST COLUMNS
tblCount = 0
for rows in cursor.tables():
    if rows.table_type == "TABLE": #LOCAL TABLES ONLY
        tblCount += 1
        print rows.table_name
        for fld in cursor.columns(rows.table_name):
            print(fld.table_name, fld.column_name)
print tblCount,"tables"
Problem is, the 'for rows' loop executes only once if the 'for fld' loop
is in place. So even if I have 50 tables, the output is like:
DATA_TYPES
(u'DATA_TYPES', u'FLD_TEXT', -9, u'VARCHAR')
(u'DATA_TYPES', u'FLD_MEMO', -10, u'LONGCHAR')
(u'DATA_TYPES', u'FLD_NBR_BYTE', -6, u'BYTE')
1 tables
And no errors are thrown.
If I comment out the 2 'for fld' lines, it counts and lists all 50
tables correctly.
Any ideas?
Thanks!
--
https://mail.python.org/mailman/listinfo/python-list
Re: Use pyodbc to count and list tables, columns, indexes, etc
On 3/31/2016 11:44 PM, DFS wrote:
import pyodbc
dbName = r"D:\test_data.mdb"
conn = pyodbc.connect('DRIVER={Microsoft Access Driver (*.mdb)};DBQ='+dbName)
cursor = conn.cursor()
#COUNT TABLES, LIST COLUMNS
tblCount = 0
for rows in cursor.tables():
    if rows.table_type == "TABLE": #LOCAL TABLES ONLY
        tblCount += 1
        print rows.table_name
        for fld in cursor.columns(rows.table_name):
            print(fld.table_name, fld.column_name)
print tblCount,"tables"
Problem is, the 'for rows' loop executes only once if the 'for fld' loop
is in place. So even if I have 50 tables, the output is like:
DATA_TYPES
(u'DATA_TYPES', u'FLD_TEXT', -9, u'VARCHAR')
(u'DATA_TYPES', u'FLD_MEMO', -10, u'LONGCHAR')
(u'DATA_TYPES', u'FLD_NBR_BYTE', -6, u'BYTE')
1 tables
And no errors are thrown.
If I comment out the 2 'for fld' lines, it counts and lists all 50
tables correctly.
Any ideas?
Thanks!
Never mind! I discovered I just needed a 2nd cursor object for the columns.
---
cursor1 = conn.cursor()
cursor2 = conn.cursor()
tblCount = 0
for rows in cursor1.tables():
    if rows.table_type == "TABLE":
        tblCount += 1
        print rows.table_name
        for fld in cursor2.columns(rows.table_name):
            print(fld.table_name, fld.column_name)
---
Works splendiferously.
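For anyone wondering why the second cursor matters: a cursor holds one result set at a time, so starting another query on it replaces the rows the outer loop was still walking through. The sketch below reproduces the effect with the standard library's `sqlite3` in Python 3, used here as a stand-in because it needs no ODBC driver; I have not checked pyodbc's internals, so treat the parallel as an assumption. The table and the `count_outer` helper are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (name TEXT)")
conn.executemany("INSERT INTO names VALUES (?)", [("a",), ("b",), ("c",)])

def count_outer(same_cursor):
    """Count outer-loop iterations when an inner query runs on each pass."""
    outer = conn.cursor()
    inner = outer if same_cursor else conn.cursor()
    seen = 0
    for _row in outer.execute("SELECT name FROM names"):
        seen += 1
        # a second query; on a shared cursor this replaces the outer result set
        inner.execute("SELECT name FROM names").fetchall()
    return seen

print(count_outer(same_cursor=True))
print(count_outer(same_cursor=False))
```

With the shared cursor the outer loop stops early and no error is raised, which matches the symptom in the original post.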
--
https://mail.python.org/mailman/listinfo/python-list
Re: extract rar
On 4/1/2016 5:01 PM, Jianling Fan wrote: Thanks, but the problem is that I am not allowed to install any software in my office PC, even free software. Normally, I use zip files but this time I need to extract a rar file. I don't like to go to IT guys because it takes time. That's why I am looking for an alternative way without installing other software. Thanks, On 1 April 2016 at 13:37, Albert-Jan Roskam wrote: Date: Fri, 1 Apr 2016 13:22:12 -0600 Subject: extract rar From: [email protected] To: [email protected] Hello everyone, I am wondering is there any way to extract rar files by python without WinRAR software? I tried Archive() and patool, but seems they required the WinRAR software. Perhaps 7-zip in a Python subprocess: http://superuser.com/questions/458643/unzip-rar-from-command-line-with-7-zip/464128 I'm not experienced with Python, but I found this: "pip install patool import patoolib patoolib.extract_archive("foo_bar.rar", outdir=".") Works on Windows and linux without any other libraries needed." http://stackoverflow.com/questions/17614467/how-can-unrar-a-file-with-python -- https://mail.python.org/mailman/listinfo/python-list
Re: Sorting a list
On 4/3/2016 2:30 PM, DFS wrote:
cntText = 60
cntBool = 20
cntNbrs = 30
cntDate = 20
cntBins = 20
strText = " text: "
strBool = " boolean: "
strNbrs = " numeric: "
strDate = " date-time:"
strBins = " binary: "
colCounts = [(cntText,strText) , (cntBool,strBool), (cntNbrs,strNbrs) ,
(cntDate,strDate) , (cntBins,strBins)]
# sort by alpha, then by column type count descending
colCounts.sort(key=lambda x: x[1])
colCounts.sort(key=lambda x: x[0], reverse=True)
for key in colCounts: print key[1], key[0]
-
Output (which is exactly what I want):
text: 60
numeric: 30
binary:20
boolean: 20
date-time: 20
-
But, is there a 1-line way to sort and print?
Meant to include this example:
print {i:os.strerror(i) for i in sorted(errno.errorcode)}
Thanks!
--
https://mail.python.org/mailman/listinfo/python-list
Sorting a list
cntText = 60
cntBool = 20
cntNbrs = 30
cntDate = 20
cntBins = 20
strText = " text: "
strBool = " boolean: "
strNbrs = " numeric: "
strDate = " date-time:"
strBins = " binary: "
colCounts = [(cntText,strText) , (cntBool,strBool), (cntNbrs,strNbrs) ,
(cntDate,strDate) , (cntBins,strBins)]
# sort by alpha, then by column type count descending
colCounts.sort(key=lambda x: x[1])
colCounts.sort(key=lambda x: x[0], reverse=True)
for key in colCounts: print key[1], key[0]
-
Output (which is exactly what I want):
text: 60
numeric: 30
binary:20
boolean: 20
date-time: 20
-
But, is there a 1-line way to sort and print?
Thanks!
--
https://mail.python.org/mailman/listinfo/python-list
Re: Sorting a list
On 4/3/2016 3:31 PM, Peter Otten wrote:
DFS wrote:
cntText = 60
cntBool = 20
cntNbrs = 30
cntDate = 20
cntBins = 20
strText = " text: "
strBool = " boolean: "
strNbrs = " numeric: "
strDate = " date-time:"
strBins = " binary: "
colCounts = [(cntText,strText) , (cntBool,strBool), (cntNbrs,strNbrs) ,
(cntDate,strDate) , (cntBins,strBins)]
# sort by alpha, then by column type count descending
colCounts.sort(key=lambda x: x[1])
colCounts.sort(key=lambda x: x[0], reverse=True)
for key in colCounts: print key[1], key[0]
-
Output (which is exactly what I want):
text: 60
numeric: 30
binary:20
boolean: 20
date-time: 20
-
But, is there a 1-line way to sort and print?
Yes, but I would not recommend it. You can replace the sort() method
invocations with nested calls of sorted() and instead of
for item in items:
    print convert_to_str(item)
use
print "\n".join(convert_to_str(item) for item in items)
Putting it together:
from operator import itemgetter as get
print "\n".join("{1} {0}".format(*p) for p in sorted(
    sorted(colCounts, key=get(1)), key=get(0), reverse=True))
Kind of clunky looking. Is that why you don't recommend it?
text: 60
numeric: 30
binary:20
boolean: 20
date-time: 20
You could also cheat and use
lambda v: (-v[0], v[1])
and a single sorted().
That works well. Why is it 'cheating'?
Thanks for the reply.
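For anyone reading along, the single-sorted() idea above can be sketched as follows. This is only an illustration, written in Python 3 syntax (the thread itself uses the Python 2 print statement); the names col_counts, ordered and lines are made up for the example, and the labels are written without the original padding.

```python
# (count, label) pairs, same values as in the post
col_counts = [(60, "text:"), (20, "boolean:"), (30, "numeric:"),
              (20, "date-time:"), (20, "binary:")]

# Negating the count sorts it descending, while the label stays ascending,
# so one sorted() call replaces the two-pass sort.
ordered = sorted(col_counts, key=lambda v: (-v[0], v[1]))

lines = ["{1} {0}".format(*p) for p in ordered]
print("\n".join(lines))
```

The negation trick only works because the first field is numeric; for two string keys in opposite directions the two-pass stable sort is still the simpler route.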
--
https://mail.python.org/mailman/listinfo/python-list
OT: Anyone here use the ConEmu console app?
I turned on the Quake-style option (and auto-hide when it loses focus) and it disappeared and I can't figure out how to get it back onscreen. I think there's a keystroke combo (like Win+key) but I don't know what it is.
It shows in the Task Manager Processes, but not in the Alt+Tab list.
Uninstalled and reinstalled and now it launches Quake-style and hidden.
Looked everywhere (\Users\AppData\Local, Registry) for a leftover settings file but couldn't find it.
Here's the screen where you make the Quake-style setting.
https://conemu.github.io/en/SettingsAppearance.html
Thanks
--
https://mail.python.org/mailman/listinfo/python-list
Re: OT: Anyone here use the ConEmu console app?
On 4/11/2016 6:04 PM, 20/20 Lab wrote:
win+alt+space does not work? ctrl+alt+win+space?
http://conemu.github.io/en/KeyboardShortcuts.html
Says those are not configurable, so they should work.
Neither of those worked, but Ctrl+~ did.
Thankyouthankyouthankyou
On 04/11/2016 02:49 PM, DFS wrote:
I turned on the Quake-style option (and auto-hide when it loses focus) and it disappeared and I can't figure out how to get it back onscreen. I think there's a keystroke combo (like Win+key) but I don't know what it is.
It shows in the Task Manager Processes, but not in the Alt+Tab list.
Uninstalled and reinstalled and now it launches Quake-style and hidden.
Looked everywhere (\Users\AppData\Local, Registry) for a leftover settings file but couldn't find it.
Here's the screen where you make the Quake-style setting.
https://conemu.github.io/en/SettingsAppearance.html
Thanks
--
https://mail.python.org/mailman/listinfo/python-list
You gotta love a 2-line python solution
To save a webpage to a file:
-
1. import urllib
2. urllib.urlretrieve("http://econpy.pythonanywhere.com/ex/001.html","D:\file.html")
-
That's it!
Coming from VB/A background, some of the stuff you can do with python -
with ease - is amazing.
VBScript version
--
1. Option Explicit
2. Dim xmlHTTP, fso, fOut
3. Set xmlHTTP = CreateObject("MSXML2.serverXMLHTTP")
4. xmlHTTP.Open "GET", "http://econpy.pythonanywhere.com/ex/001.html"
5. xmlHTTP.Send
6. Set fso = CreateObject("Scripting.FileSystemObject")
7. Set fOut = fso.CreateTextFile("D:\file.html", True)
8. fOut.WriteLine xmlHTTP.ResponseText
9. fOut.Close
10. Set fOut = Nothing
11. Set fso = Nothing
12. Set xmlHTTP = Nothing
--
Technically, that VBS will run with just lines 3-9, but that's still 6
lines of code vs 2 for python.
--
https://mail.python.org/mailman/listinfo/python-list
Fastest way to retrieve and write html contents to file
I posted a little while ago about how short the python code was:
-
1. import urllib
2. urllib.urlretrieve(webpage, filename)
-
Which is very sweet compared to the VBScript version:
--
1. Option Explicit
2. Dim xmlHTTP, fso, fOut
3. Set xmlHTTP = CreateObject("MSXML2.serverXMLHTTP")
4. xmlHTTP.Open "GET", webpage
5. xmlHTTP.Send
6. Set fso = CreateObject("Scripting.FileSystemObject")
7. Set fOut = fso.CreateTextFile(filename, True)
8. fOut.WriteLine xmlHTTP.ResponseText
9. fOut.Close
10. Set fOut = Nothing
11. Set fso = Nothing
12. Set xmlHTTP = Nothing
--
Then I tested them in loops - the VBScript is MUCH faster: 0.44 for 10
iterations, vs 0.88 for python.
webpage = 'http://econpy.pythonanywhere.com/ex/001.html'
So I tried:
---
import urllib2
r = urllib2.urlopen(webpage)
f = open(filename,"w")
f.write(r.read())
f.close
---
and
---
import requests
r = requests.get(webpage)
f = open(filename,"w")
f.write(r.text)
f.close
---
and
-
import pycurl
with open(filename, 'wb') as f:
c = pycurl.Curl()
c.setopt(c.URL, webpage)
c.setopt(c.WRITEDATA, f)
c.perform()
c.close()
-
urllib2 and requests were about the same speed as urllib.urlretrieve,
while pycurl was significantly slower (1.2 seconds).
I'm running Win 8.1. python 2.7.11 32-bit.
I know it's asking a lot, but is there a really fast AND really short
python solution for this simple thing?
Thanks!
--
https://mail.python.org/mailman/listinfo/python-list
Re: Fastest way to retrieve and write html contents to file
On 5/2/2016 12:40 AM, Chris Angelico wrote:
On Mon, May 2, 2016 at 2:34 PM, Stephen Hansen wrote:
On Sun, May 1, 2016, at 09:06 PM, DFS wrote:
Then I tested them in loops - the VBScript is MUCH faster: 0.44 for 10
iterations, vs 0.88 for python.
...
I know it's asking a lot, but is there a really fast AND really short
python solution for this simple thing?
0.88 is not fast enough for you? That's less than a second.
Also, this is timings of network and disk operations. Unless something
pathological is happening, the language used won't make any difference.
ChrisA
Unfortunately, the VBScript is twice as fast as any python method.
--
https://mail.python.org/mailman/listinfo/python-list
Re: You gotta love a 2-line python solution
On 5/2/2016 12:31 AM, Stephen Hansen wrote:
On Sun, May 1, 2016, at 08:39 PM, DFS wrote:
To save a webpage to a file:
-
1. import urllib
2. urllib.urlretrieve("http://econpy.pythonanywhere.com/ex/001.html","D:\file.html")
-
Note, for paths on windows you really want to use a rawstring. Ie,
r"D:\file.html".
Thanks.
I actually use "D:\\file.html" in my code.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Fastest way to retrieve and write html contents to file
On 5/2/2016 12:49 AM, Ben Finney wrote:
DFS writes:
Then I tested them in loops - the VBScript is MUCH faster: 0.44 for 10
iterations, vs 0.88 for python.
[…]
urllib2 and requests were about the same speed as urllib.urlretrieve,
while pycurl was significantly slower (1.2 seconds).
Network access is notoriously erratic in its timing. The program, and
the machine on which it runs, is subject to a great many external
effects once the request is sent - effects which will significantly
alter the delay before a response is completed.
How have you controlled for the wide variability in the duration, for
even a given request by the *same code on the same machine*, at
different points in time?
One simple way to do that: Run the exact same test many times (say,
10 000 or so) on the same machine, and then compute the average of all
the durations.
Do the same for each different program, and then you may have more
meaningfully comparable measurements.
I tried the 10-loop test several times with all versions.
The results were 100% consistent: VBScript xmlHTTP was always 2x faster
than any python method.
--
https://mail.python.org/mailman/listinfo/python-list
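The repeat-and-average suggestion above could look roughly like the sketch below. It is only an outline under stated assumptions: time_many and fake_download are hypothetical names, the stand-in workload does no network access at all, and the code is Python 3 (time.perf_counter does not exist in 2.7). Swap the stand-in for whichever download call is being measured.

```python
import statistics
import time

def time_many(func, repeats=20):
    """Call func() repeatedly and return the list of individual durations."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations

def fake_download():
    # Stand-in workload (an assumption for this sketch): replace with the
    # real "fetch page and write file" step under test.
    sum(range(10000))

durations = time_many(fake_download)
print("mean   %.6f s" % statistics.mean(durations))
print("median %.6f s" % statistics.median(durations))
print("stdev  %.6f s" % statistics.stdev(durations))
```

Looking at the spread (and the median) alongside the mean makes it easier to tell a real difference between two programs from ordinary network jitter.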
Re: Fastest way to retrieve and write html contents to file
On 5/2/2016 1:00 AM, Stephen Hansen wrote:
On Sun, May 1, 2016, at 09:50 PM, DFS wrote:
On 5/2/2016 12:40 AM, Chris Angelico wrote:
On Mon, May 2, 2016 at 2:34 PM, Stephen Hansen wrote:
On Sun, May 1, 2016, at 09:06 PM, DFS wrote:
Then I tested them in loops - the VBScript is MUCH faster: 0.44 for 10
iterations, vs 0.88 for python.
...
I know it's asking a lot, but is there a really fast AND really short
python solution for this simple thing?
0.88 is not fast enough for you? That's less than a second.
Also, this is timings of network and disk operations. Unless something
pathological is happening, the language used won't make any difference.
ChrisA
Unfortunately, the VBScript is twice as fast as any python method.
And 0.2 is twice as fast as 0.1. When you have two small numbers, 'twice
as fast' isn't particularly meaningful as a metric.
0.2 is half as fast as 0.1, here.
And two small numbers turn into bigger numbers when the webpage is big,
and soon the download time differences are measured in minutes, not half
a second.
So, any ideas?
--
https://mail.python.org/mailman/listinfo/python-list
Re: You gotta love a 2-line python solution
On 5/2/2016 1:02 AM, Stephen Hansen wrote:
On Sun, May 1, 2016, at 09:51 PM, DFS wrote:
On 5/2/2016 12:31 AM, Stephen Hansen wrote:
On Sun, May 1, 2016, at 08:39 PM, DFS wrote:
To save a webpage to a file:
-
1. import urllib
2. urllib.urlretrieve("http://econpy.pythonanywhere.com/ex/001.html","D:\file.html")
-
Note, for paths on windows you really want to use a rawstring. Ie,
r"D:\file.html".
Thanks.
I actually use "D:\\file.html" in my code.
Or you can do that. But the whole point of raw strings is not having to
escape slashes :)
Nice. Where/how else is 'r' used?
I'm new to python, but I learned that one the hard way.
I was using "D\testfile.txt" for something, and my code kept failing.
Took me a while to figure it out. I tried various letters after the
slash. I finally stumbled across the escape slashes in the docs somewhere.
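To make the escape-versus-raw-string point concrete, here is a small sketch (Python 3 syntax; the variable names are just for illustration). It also shows the other place the r prefix turns up all the time, regular expressions.

```python
import re

plain = "D:\testfile.txt"     # "\t" is read as a single tab character
escaped = "D:\\testfile.txt"  # doubled backslash gives one literal backslash
raw = r"D:\testfile.txt"      # raw string: backslashes are kept as typed

print(repr(plain))            # repr makes the hidden tab visible
print(escaped == raw)         # the two correct spellings are the same string

# Raw strings are also the usual way to write regex patterns, so that
# sequences like \d reach the re module untouched.
digits = re.findall(r"\d+", "econpy001.html")
print(digits)
```

Note that the prefix is part of the literal's syntax in the source code; it cannot be added to an existing string at run time.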
--
https://mail.python.org/mailman/listinfo/python-list
Re: You gotta love a 2-line python solution
On 5/2/2016 1:02 AM, Stephen Hansen wrote:
On Sun, May 1, 2016, at 09:51 PM, DFS wrote:
On 5/2/2016 12:31 AM, Stephen Hansen wrote:
On Sun, May 1, 2016, at 08:39 PM, DFS wrote:
To save a webpage to a file:
-
1. import urllib
2. urllib.urlretrieve("http://econpy.pythonanywhere.com/ex/001.html","D:\file.html")
-
Note, for paths on windows you really want to use a rawstring. Ie,
r"D:\file.html".
Thanks.
I actually use "D:\\file.html" in my code.
Or you can do that. But the whole point of raw strings is not having to
escape slashes :)
Trying the rawstring thing (say it fast 3x):
webpage = "http://econpy.pythonanywhere.com/ex/001.html"
webfile = "D:\\econpy001.html"
urllib.urlretrieve(webpage,webfile) WORKS
webfile = "rD:\econpy001.html"
urllib.urlretrieve(webpage,webfile) FAILS
webfile = "D:\econpy001.html"
urllib.urlretrieve(webpage,"r" + webfile) FAILS
webfile = "D:\econpy001.html"
urllib.urlretrieve(webpage,"r" + "" + webfile + "") FAILS
The FAILs throw:
Traceback (most recent call last):
  File "webscraper.py", line 54, in <module>
    urllib.urlretrieve(webpage,webfile)
  File "D:\development\python\python_2.7.11\lib\urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "D:\development\python\python_2.7.11\lib\urllib.py", line 249, in retrieve
    tfp = open(filename, 'wb')
IOError: [Errno 22] invalid mode ('wb') or filename: 'rD:\\econpy001.html'
What am I doing wrong?
--
https://mail.python.org/mailman/listinfo/python-list
Re: Fastest way to retrieve and write html contents to file
On 5/2/2016 1:15 AM, Stephen Hansen wrote:
On Sun, May 1, 2016, at 10:00 PM, DFS wrote:
I tried the 10-loop test several times with all versions.
Also how, _exactly_, are you testing this?
C:\Python27>python -m timeit "filename='C:\\test.txt'; webpage='http://econpy.pythonanywhere.com/ex/001.html'; import urllib2; r = urllib2.urlopen(webpage); f = open(filename, 'w'); f.write(r.read()); f.close();"
10 loops, best of 3: 175 msec per loop
That's a whole lot less than 0.88secs.
Indeed.
-
import requests, urllib, urllib2, pycurl
import time
webpage = "http://econpy.pythonanywhere.com/ex/001.html"
webfile = "D:\\econpy001.html"
loops = 10
startTime = time.clock()
for i in range(loops):
    urllib.urlretrieve(webpage,webfile)
endTime = time.clock()
print "Finished urllib in %.2g seconds" %(endTime-startTime)
startTime = time.clock()
for i in range(loops):
    r = urllib2.urlopen(webpage)
    f = open(webfile,"w")
    f.write(r.read())
    f.close
endTime = time.clock()
print "Finished urllib2 in %.2g seconds" %(endTime-startTime)
startTime = time.clock()
for i in range(loops):
    r = requests.get(webpage)
    f = open(webfile,"w")
    f.write(r.text)
    f.close
endTime = time.clock()
print "Finished requests in %.2g seconds" %(endTime-startTime)
startTime = time.clock()
for i in range(loops):
    with open(webfile + str(i) + ".txt", 'wb') as f:
        c = pycurl.Curl()
        c.setopt(c.URL, webpage)
        c.setopt(c.WRITEDATA, f)
        c.perform()
        c.close()
endTime = time.clock()
print "Finished pycurl in %.2g seconds" %(endTime-startTime)
-
$ python getHTML.py
Finished urllib in 0.88 seconds
Finished urllib2 in 0.83 seconds
Finished requests in 0.89 seconds
Finished pycurl in 1.1 seconds
Those results are consistent. They go up or down a little, but never
below 0.82 seconds (for urllib2), or above 1.2 seconds (for pycurl).
VBScript is consistently 0.44 to 0.48
--
https://mail.python.org/mailman/listinfo/python-list
Re: You gotta love a 2-line python solution
On 5/2/2016 1:37 AM, Stephen Hansen wrote:
On Sun, May 1, 2016, at 10:23 PM, DFS wrote:
Trying the rawstring thing (say it fast 3x):
webpage = "http://econpy.pythonanywhere.com/ex/001.html"
webfile = "D:\\econpy001.html"
urllib.urlretrieve(webpage,webfile) WORKS
webfile = "rD:\econpy001.html"
The r is *outside* the string. It's:
r"D:\econpy001.html"
Got it. Thanks.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Fastest way to retrieve and write html contents to file
On 5/2/2016 2:05 AM, Steven D'Aprano wrote:
On Monday 02 May 2016 15:00, DFS wrote:
I tried the 10-loop test several times with all versions.
The results were 100% consistent: VBSCript xmlHTTP was always 2x faster
than any python method.
Are you absolutely sure you're comparing the same job in two languages?
As near as I can tell. In VBScript I'm actually dereferencing various
objects (that adds to the time), but I don't do that in python. I don't
know enough to even know if it's necessary, or good practice, or what.
Is VB using a local web cache, and Python not?
I'm not specifying a local web cache with either (wouldn't know how or
where to look). If you have Windows, you can try it.
---
Option Explicit
Dim xmlHTTP, fso, fOut, startTime, endTime, webpage, webfile,i
webpage = "http://econpy.pythonanywhere.com/ex/001.html"
webfile = "D:\econpy001.html"
startTime = Timer
For i = 1 to 10
Set xmlHTTP = CreateObject("MSXML2.serverXMLHTTP")
xmlHTTP.Open "GET", webpage
xmlHTTP.Send
Set fso = CreateObject("Scripting.FileSystemObject")
Set fOut = fso.CreateTextFile(webfile, True)
fOut.WriteLine xmlHTTP.ResponseText
fOut.Close
Set fOut= Nothing
Set fso = Nothing
Set xmlHTTP = Nothing
Next
endTime = Timer
wscript.echo "Finished VBScript in " & FormatNumber(endTime - startTime,3) & " seconds"
---
save it to a .vbs file and run it like this:
$cscript /nologo filename.vbs
Are you saving files with both
tests? To the same local drive? (To ensure you aren't measuring the
difference between "write this file to a slow IDE hard disk, write that file
to a fast SSD".)
Identical functionality (retrieve webpage, write html to file). Same
webpage, written to the same folder on the same hard drive (not SSD).
The 10 file writes (open/write/close) don't make a meaningful difference
at all:
VBScript 0.0156 seconds
urllib2 0.0034 seconds
This file is 3.55K.
Once you are sure that you are comparing the same task in two languages,
then make sure the measurement is meaningful. If you change from a (let's
say) 1 KB file to a 100 KB file, do you see the same 2 x difference? What if
you increase it to a 1 KB file?
Do you know a webpage I can hit 10x repeatedly to download a good size
file? I'm always paranoid they'll block me thinking I'm a
"professional" web scraper or something.
Thanks
--
https://mail.python.org/mailman/listinfo/python-list
Re: Fastest way to retrieve and write html contents to file
On 5/2/2016 2:27 AM, Stephen Hansen wrote:
On Sun, May 1, 2016, at 10:59 PM, DFS wrote:
startTime = time.clock()
for i in range(loops):
    r = urllib2.urlopen(webpage)
    f = open(webfile,"w")
    f.write(r.read())
    f.close
endTime = time.clock()
print "Finished urllib2 in %.2g seconds" %(endTime-startTime)
Yeah on my system I get 1.8 out of this, amounting to 0.18s.
You get 1.8 seconds total for the 10 loops? That's less than half as
fast as my results. Surprising.
I'm again going back to the point of: it's fast enough. When comparing
two small numbers, "twice as slow" is meaningless.
Speed is always meaningful.
I know python is relatively slow, but it's a cool, concise, powerful
language. I'm extremely impressed by how tight the code can get.
You have an assumption you haven't answered, that downloading a 10 meg
file will be twice as slow as downloading this tiny file. You haven't
proven that at all.
True. And it has been my assumption - tho not with 10MB file.
I suspect you have a constant overhead of X, and in this toy example,
that makes it seem twice as slow. But when downloading a file of size,
you'll have the same constant factor, at which point the difference is
irrelevant.
Good point. Test below.
If you believe otherwise, demonstrate it.
http://www.usdirectory.com/ypr.aspx?fromform=qsearch&qs=ga&wqhqn=2&qc=Atlanta&rg=30&qhqn=restaurant&sb=zipdisc&ap=2
It's a 58854 byte file when saved to disk (smaller file was 3546 bytes),
so this is 16.6x larger. So I would expect python to linearly run in
16.6 * 0.88 = 14.6 seconds.
10 loops per run
1st run
$ python timeGetHTML.py
Finished urllib in 8.5 seconds
Finished urllib2 in 5.6 seconds
Finished requests in 7.8 seconds
Finished pycurl in 6.5 seconds
wait a couple minutes, then 2nd run
$ python timeGetHTML.py
Finished urllib in 5.6 seconds
Finished urllib2 in 5.7 seconds
Finished requests in 5.2 seconds
Finished pycurl in 6.4 seconds
It's a little more than 1/3 of my estimate - so good news.
(when I was doing these tests, some of the python results were 0.75
seconds - way too fast, so I checked and no data was written to file,
and I couldn't even open the webpage with a browser. Looks like I had
been temporarily blocked from the site. After a couple minutes, I was
able to access it again).
I noticed urllib and curl returned the html as is, but urllib2 and
requests added enhancements that should make the data easier to parse.
Based on speed and functionality and documentation, I believe I'll be
using the requests HTTP library (I will actually be doing a small amount
of web scraping).
VBScript
1st run: 7.70 seconds
2nd run: 5.38
3rd run: 7.71
So python matches or beats VBScript at this much larger file. Kewl.
--
https://mail.python.org/mailman/listinfo/python-list
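The "constant overhead of X" argument can be checked against the figures quoted in this message with a little arithmetic. The sketch below (Python 3) assumes a simple model, time per request = overhead + per_byte * size, and fits it to just two of the reported measurements (0.88 s and 5.6 s per 10 loops). Two noisy data points are far too few for a reliable fit, so treat the result as a rough illustration only; all variable names are invented for the example.

```python
loops = 10
small_bytes, small_total = 3546, 0.88    # small page, seconds per 10 loops
large_bytes, large_total = 58854, 5.6    # large page, seconds per 10 loops

# Naive estimate from the post: scale total time by the size ratio.
naive_estimate = large_bytes / small_bytes * small_total
print("naive linear estimate: %.1f s" % naive_estimate)

# Two-point fit of: time_per_request = overhead + per_byte * size
small_each = small_total / loops
large_each = large_total / loops
per_byte = (large_each - small_each) / (large_bytes - small_bytes)
overhead = small_each - per_byte * small_bytes

print("fixed overhead per request: %.3f s" % overhead)
print("transfer cost per KB:       %.4f s" % (per_byte * 1024))
print("observed / naive estimate:  %.2f" % (large_total / naive_estimate))
```

Under this model a sizeable share of each small-page request is fixed cost, which is consistent with the large page taking well under 16.6 times as long.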
Re: You gotta love a 2-line python solution
On 5/2/2016 5:26 AM, BartC wrote:
On 02/05/2016 04:39, DFS wrote:
To save a webpage to a file:
-
1. import urllib
2. urllib.urlretrieve("http://econpy.pythonanywhere.com/ex/001.html","D:\file.html")
-
That's it!
Coming from VB/A background, some of the stuff you can do with python -
with ease - is amazing.
VBScript version
--
1. Option Explicit
2. Dim xmlHTTP, fso, fOut
3. Set xmlHTTP = CreateObject("MSXML2.serverXMLHTTP")
4. xmlHTTP.Open "GET", "http://econpy.pythonanywhere.com/ex/001.html"
5. xmlHTTP.Send
6. Set fso = CreateObject("Scripting.FileSystemObject")
7. Set fOut = fso.CreateTextFile("D:\file.html", True)
8. fOut.WriteLine xmlHTTP.ResponseText
9. fOut.Close
10. Set fOut = Nothing
11. Set fso = Nothing
12. Set xmlHTTP = Nothing
--
Technically, that VBS will run with just lines 3-9, but that's still 6
lines of code vs 2 for python.
It seems Python provides a higher level solution compared with VBS.
Python presumably also has to do those Opens and Sends, but they are
hidden away inside urllib.urlretrieve.
You can do the same with VB just by wrapping up these lines in a
subroutine. As you would if this had to be executed in a dozen different
places for example. Then you could just write:
getfile("http://econpy.pythonanywhere.com/ex/001.html", "D:/file.html")
in VBS too. (The forward slash in the file name ought to work.)
Of course. Taken to its extreme, I could eventually replace you with
one line of code :)
But python does it for me. That would save me 8 lines...
(I don't know VBS; I assume it does /have/ subroutines? What I haven't
factored in here is error handling which might yet require more coding
in VBS compared with Python)
Yeah, VBS has subs and functions. And strange, limited error handling.
And a single data type, called Variant. But it's installed with Windows
so it's easy to get going with.
--
https://mail.python.org/mailman/listinfo/python-list
Best way to clean up list items?
Have: list1 = ['\r\n Item 1 ',' Item 2 ','\r\n ']
Want: list1 = ['Item 1','Item 2']
I wrote this, which works fine, but maybe it can be tidier?
1. list2 = [t.replace("\r\n", "") for t in list1] #remove \r\n
2. list3 = [t.strip(' ') for t in list2]#trim whitespace
3. list1 = filter(None, list3) #remove empty items
After each step:
1. list2 = [' Item 1 ',' Item 2 ',' '] #remove \r\n
2. list3 = ['Item 1','Item 2',''] #trim whitespace
3. list1 = ['Item 1','Item 2'] #remove empty items
Thanks!
--
https://mail.python.org/mailman/listinfo/python-list
Re: Python3 html scraper that supports javascript
On 5/2/2016 11:33 AM, [email protected] wrote:
I tried to use the following code:
from bs4 import BeautifulSoup
from selenium import webdriver
PHANTOMJS_PATH = 'C:\\Users\\Zoran\\Downloads\\Obrisi\\phantomjs-2.1.1-windows\\bin\\phantomjs.exe'
url = 'https://hrti.hrt.hr/#/video/show/2203605/trebizat-prica-o-jednoj-vodi-i-jednom-narodu-dokumentarni-film'
browser = webdriver.PhantomJS(PHANTOMJS_PATH)
browser.get(url)
soup = BeautifulSoup(browser.page_source, "html.parser")
x = soup.prettify()
print(x)
When I print x variable, I would expect to see something like this:
https://hrti.hrt.hr/2e9e9c45-aa23-4d08-9055-cd2d7f2c4d58"; id="vjs_video_3_html5_api" class="vjs-tech" preload="none">https://prd-hrt.spectar.tv/player/get_smil/id/2203605/video_id/2203605/token/Cny6ga5VEQSJ2uZaD2G8pg/token_expiration/1462043309/asset_type/Movie/playlist_template/nginx/channel_name/trebiat__pria_o_jednoj_vodi_i_jednom_narodu_dokumentarni_film/playlist.m3u8?foo=bar";>
but I can't come to that point.
Regards.
I was doing something similar recently. Try this:
f = open(somefilename)
soup = BeautifulSoup.BeautifulSoup(f)
f.close()
print soup.prettify()
--
https://mail.python.org/mailman/listinfo/python-list
Re: Best way to clean up list items?
On 5/2/2016 1:25 PM, Stephen Hansen wrote:
On Mon, May 2, 2016, at 09:33 AM, DFS wrote:
Have: list1 = ['\r\n Item 1 ',' Item 2 ','\r\n ']
I'm curious how you got to this point, it seems like you can solve the
problem in how this is generated.
from lxml import html
import requests
webpage = "http://www.usdirectory.com/ypr.aspx?fromform=qsearch&qs=TN&wqhqn=2&qc=Nashville&rg=30&qhqn=restaurant&sb=zipdisc&ap=2"
page = requests.get(webpage)
tree = html.fromstring(page.content)
addr1 = tree.xpath('//span[@class="text3"]/text()')
print 'Addresses: ', addr1
I'd prefer to get clean data in the first place, but I don't know a
better way to extract it from the HTML.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Best way to clean up list items?
On 5/2/2016 12:57 PM, Jussi Piitulainen wrote:
DFS writes:
Have: list1 = ['\r\n Item 1 ',' Item 2 ','\r\n ']
Want: list1 = ['Item 1','Item 2']
I wrote this, which works fine, but maybe it can be tidier?
1. list2 = [t.replace("\r\n", "") for t in list1] #remove \r\n
2. list3 = [t.strip(' ') for t in list2]#trim whitespace
3. list1 = filter(None, list3) #remove empty items
After each step:
1. list2 = [' Item 1 ',' Item 2 ',' '] #remove \r\n
2. list3 = ['Item 1','Item 2',''] #trim whitespace
3. list1 = ['Item 1','Item 2'] #remove empty items
Try filter(None, (t.strip() for t in list1)). The default.
Works and drops a line of code. Thx.
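Spelled out as a runnable snippet, the one-pass cleanup looks like this. It is a sketch in Python 3, where filter() returns an iterator rather than a list, so a list comprehension is used instead; the name cleaned is illustrative.

```python
list1 = ['\r\n Item 1 ', ' Item 2 ', '\r\n ']

# strip() with no argument removes all leading/trailing whitespace,
# including "\r\n", so the separate replace() step is not needed.
# The "if s" test then drops items that ended up empty.
cleaned = [s for s in (t.strip() for t in list1) if s]
print(cleaned)

# Same result using filter(), wrapped in list() for Python 3.
assert cleaned == list(filter(None, (t.strip() for t in list1)))
```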
Funny-looking data you have.
I know - sadly, it's actual data:
from lxml import html
import requests
webpage = "http://www.usdirectory.com/ypr.aspx?fromform=qsearch&qs=TN&wqhqn=2&qc=Nashville&rg=30&qhqn=restaurant&sb=zipdisc&ap=2"
page = requests.get(webpage)
tree = html.fromstring(page.content)
addr1 = tree.xpath('//span[@class="text3"]/text()')
print 'Addresses: ', addr1
I couldn't figure out a better way to extract it from the HTML (maybe
XML and DOM?)
--
https://mail.python.org/mailman/listinfo/python-list
Re: Best way to clean up list items?
On 5/2/2016 2:27 PM, Jussi Piitulainen wrote:
DFS writes:
On 5/2/2016 12:57 PM, Jussi Piitulainen wrote:
DFS writes:
Have: list1 = ['\r\n Item 1 ',' Item 2 ','\r\n ']
Want: list1 = ['Item 1','Item 2']
. .
Funny-looking data you have.
I know - sadly, it's actual data:
from lxml import html
import requests
webpage = "http://www.usdirectory.com/ypr.aspx?fromform=qsearch&qs=TN&wqhqn=2&qc=Nashville&rg=30&qhqn=restaurant&sb=zipdisc&ap=2"
page = requests.get(webpage)
tree = html.fromstring(page.content)
addr1 = tree.xpath('//span[@class="text3"]/text()')
print 'Addresses: ', addr1
I couldn't figure out a better way to extract it from the HTML (maybe
XML and DOM?)
I should have guessed :) But now I'm a bit worried about those spaces
inside your items. Can it happen that item text is split into strings in
the middle?
Meaning split by me, or comes 'malformed' from the data source?
Then the above sanitation does the wrong thing.
If someone has the right solution, I'm watching, too.
Here's the raw data as stored in the tree:
---
1st page
['\r\n', '\r\n1918 W End
Ave, Nashville, TN 37203', '\r\n
', '\r\n1806 Hayes St, Nashville,
TN 37203', '\r\n', '\r\n
1701 Broadway, Nashville, TN 37203', '\r\n', '\r\n
209 10th Ave S, Nashville, TN 37203', '\r\n
', '\r\n907 20th Ave S, Nashville, TN
37212', '\r\n', '\r\n911
20th Ave S, Nashville, TN 37212', '\r\n', '\r\n
1722 W End Ave, Nashville, TN 37203', '\r\n
', '\r\n1905 Hayes St,
Nashville, TN 37203', '\r\n
', '\r\n2000 W End Ave,
Nashville, TN 37203']
---
Next page
['\r\n', '\r\n120 19th
Ave N, Nashville, TN 37203', '\r\n
', '\r\n1719 W End Ave Ste 101,
Nashville, TN 37203', '\r\n
', '\r\n1922 W End Ave, Nashville, TN
37203', '\r\n', '\r\n
909 20th Ave S, Nashville, TN 37212', '\r\n
', '\r\n
1807 Church St, Nashville, TN 37203', '\r\n
', '\r\n1721 Church St, Nashville, TN 37203',
'\r\n', '\r\n718
Division St, Nashville, TN 37203', '\r\n', '\r\n
907 12th Ave S, Nashville, TN 37203', '\r\n
', '\r\n204 21st Ave S,
Nashville, TN 37203', '\r\n
', '\r\n1811 Division St, Nashville,
TN 37203', '\r\n', '\r\n
903 Gleaves St, Nashville, TN 37203', '\r\n', '\r\n
1720 W End Ave Ste 530, Nashville, TN 37203', '\r\n
', '\r\n
1200 Division St Ste 100-A, Nashville, TN 37203', '\r\n
', '\r\n
422 7th Ave S, Nashville, TN 37203', '\r\n',
'\r\n605 8th Ave S, Nashville, TN 37203']
and so on
---
I've checked a couple hundred addresses visually, and so far I've only
seen 2 formats:
1. '\r\n'
2. '\r\n address '
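If whitespace or line breaks can also show up in the middle of an address, one hedged option is to collapse every run of whitespace to a single space rather than only trimming the ends. The sample list below is made up to resemble the two formats described above (the exact spacing is an assumption), and the code is Python 3.

```python
# Hypothetical sample shaped like the scraped data: separator-only items
# plus addresses with stray whitespace around and inside them.
raw = ['\r\n', '\r\n   1918 W End\r\n   Ave, Nashville, TN 37203   ',
       '\r\n   ', '\r\n 1806 Hayes St, Nashville,\r\n TN 37203']

# split() with no argument splits on any whitespace run and discards
# empty pieces; joining with one space normalizes the whole string.
addresses = [" ".join(t.split()) for t in raw]
addresses = [a for a in addresses if a]   # drop the separator-only items

for a in addresses:
    print(a)
```

This does not help if a single address arrives split across two separate list items; that case would have to be handled where the items are extracted.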
--
https://mail.python.org/mailman/listinfo/python-list
Re: You gotta love a 2-line python solution
On 5/2/2016 8:45 PM, [email protected] wrote:
DFS at 2016/5/2 UTC+8 11:39:33AM wrote:
To save a webpage to a file:
-
1. import urllib
2. urllib.urlretrieve("http://econpy.pythonanywhere.com/ex/001.html","D:\file.html")
-
That's it!
Why my system can't do it?
Python 3.4.4 (v3.4.4:737efcadf5a6, Dec 20 2015, 19:28:18) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from urllib import urlretrieve
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'urlretrieve'
try
from urllib.request import urlretrieve
http://stackoverflow.com/questions/21171718/urllib-urlretrieve-file-python-3-3
I'm running python 2.7.11 (32-bit)
--
https://mail.python.org/mailman/listinfo/python-list
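For a script that has to run on both interpreters, the usual pattern is to try the Python 3 location first and fall back to the Python 2 one. A minimal sketch:

```python
try:
    from urllib.request import urlretrieve  # Python 3 location
except ImportError:
    from urllib import urlretrieve          # Python 2 location

# From here on the same name works on either version, e.g.
# urlretrieve(url, filename) with whatever URL and path are needed.
print(urlretrieve.__name__)
```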
Re: Fastest way to retrieve and write html contents to file
On 5/2/2016 4:42 AM, Peter Otten wrote:
DFS wrote:
Is VB using a local web cache, and Python not?
I'm not specifying a local web cache with either (wouldn't know how or
where to look). If you have Windows, you can try it.
I don't have Windows, but if I'm to believe
http://stackoverflow.com/questions/5235464/how-to-make-microsoft-xmlhttprequest-honor-cache-control-directive
the page is indeed cached and you can disable caching with
Option Explicit
Dim xmlHTTP, fso, fOut, startTime, endTime, webpage, webfile,i
webpage = "http://econpy.pythonanywhere.com/ex/001.html"
webfile = "D:\econpy001.html"
startTime = Timer
For i = 1 to 10
Set xmlHTTP = CreateObject("MSXML2.serverXMLHTTP")
xmlHTTP.Open "GET", webpage
xmlHTTP.setRequestHeader "Cache-Control", "max-age=0"
Tried that, and from later on that stackoverflow page:
xmlHTTP.setRequestHeader "Cache-Control", "private"
Neither made a difference. In fact, I saw faster times than ever - as
low as 0.41 for 10 loops.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Fastest way to retrieve and write html contents to file
On 5/2/2016 3:19 AM, Chris Angelico wrote:
There's an easier way to test if there's caching happening. Just crank
the iterations up from 10 to 100 and see what happens to the times. If
your numbers are perfectly fair, they should be perfectly linear in
the iteration count; eg a 1.8 second ten-iteration loop should become
an 18 second hundred-iteration loop. Obviously they won't be exactly
that, but I would expect them to be reasonably close (eg 17-19
seconds, but not 2 seconds).
100 loops
Finished VBScript in 3.953 seconds
Finished VBScript in 3.608 seconds
Finished VBScript in 3.610 seconds
Bit of a per-loop speedup going from 10 to 100.
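As a quick sanity check of the linearity test, the reported totals can be turned into per-loop figures. The sketch below (Python 3) uses only numbers quoted in this thread: about 0.44 seconds for 10 loops and the three 100-loop runs above. The variable names are illustrative.

```python
ten_loop_total = 0.44                      # seconds for 10 loops
hundred_loop_totals = [3.953, 3.608, 3.610]

per_loop_10 = ten_loop_total / 10
expected_100 = per_loop_10 * 100           # what perfect linearity predicts
print("expected for 100 loops if linear: %.2f s" % expected_100)

for total in hundred_loop_totals:
    ratio = total / expected_100
    print("measured %.3f s -> %.4f s per loop (%.0f%% of linear)"
          % (total, total / 100, ratio * 100))
```

The measured runs land at roughly 80 to 90 percent of the linear prediction, which is the "bit of a per-loop speedup" rather than the order-of-magnitude drop that heavy caching would produce.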
Then the next thing to test would be to create a deliberately-slow web
server, and connect to that. Put a two-second delay into it, to
simulate a distant or overloaded server, and see if your logs show the
correct result. Something like this:
import time
try:
    import http.server as BaseHTTPServer # Python 3
except ImportError:
    import BaseHTTPServer # Python 2
class SlowHTTP(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-type","text/html")
        self.end_headers()
        self.wfile.write(b"Hello, ")
        time.sleep(2)
        self.wfile.write(b"world!")
server = BaseHTTPServer.HTTPServer(("", 1234), SlowHTTP)
server.serve_forever()
---
Test that with a web browser or command-line downloader (go to
http://127.0.0.1:1234/), and make sure that (a) it produces "Hello,
world!", and (b) it takes two seconds. Then set your test scripts to
downloading that URL. (Be sure to set them back to low iteration
counts first!) If the times are true and fair, they should all come
out pretty much the same - ten iterations, twenty seconds. And since
all that's changed is the server, this will be an accurate
demonstration of what happens in the real world: network requests
aren't always fast. Incidentally, you can also watch the server's log
to see if it's getting the appropriate number of requests.
It may turn out that changing the web server actually materially
changes your numbers. Comment out the sleep call and try it again -
you might find that your numbers come closer together, because this
naive server doesn't send back 304 NOT MODIFIED responses or anything.
Again, though, this would prove that you're not actually measuring
language performance, because the tests are more dependent on the
server than the client.
Even if the files themselves aren't being cached, you might find that
DNS is. So if you truly want to eliminate variables, replace the name
in your URL with an IP address. It's another thing that might mess
with your timings, without actually being a language feature.
Networking has about four billion variables in it. You're messing with
one of the least significant: the programming language :)
ChrisA
Thanks for the good feedback.
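For anyone who wants to try the slow-server experiment in a single script, here is a rough self-contained sketch. It is Python 3 only, runs the server in a background thread on an OS-chosen port, and uses a 0.2 second delay with 3 loops to keep the run short; DELAY, SlowHandler and the other names are choices made for this example, not part of the code posted above.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

DELAY = 0.2   # seconds of artificial server-side delay (shortened for the demo)

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(b"Hello, ")
        time.sleep(DELAY)
        self.wfile.write(b"world!")

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), SlowHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# An empty ProxyHandler bypasses any proxy configured in the environment.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

loops = 3
start = time.perf_counter()
for _ in range(loops):
    with opener.open("http://127.0.0.1:%d/" % port) as response:
        body = response.read()
elapsed = time.perf_counter() - start

server.shutdown()
server.server_close()
print("%d requests took %.2f s (server delay alone: %.2f s)"
      % (loops, elapsed, loops * DELAY))
```

Nearly all of the elapsed time comes from the server's sleep, which is the point of the exercise: the client language contributes very little to the total.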
--
https://mail.python.org/mailman/listinfo/python-list
Re: Fastest way to retrieve and write html contents to file
On 5/2/2016 10:00 PM, Chris Angelico wrote:
On Tue, May 3, 2016 at 11:51 AM, DFS wrote:
On 5/2/2016 3:19 AM, Chris Angelico wrote:
There's an easier way to test if there's caching happening. Just crank
the iterations up from 10 to 100 and see what happens to the times. If
your numbers are perfectly fair, they should be perfectly linear in
the iteration count; eg a 1.8 second ten-iteration loop should become
an 18 second hundred-iteration loop. Obviously they won't be exactly
that, but I would expect them to be reasonably close (eg 17-19
seconds, but not 2 seconds).
100 loops
Finished VBScript in 3.953 seconds
Finished VBScript in 3.608 seconds
Finished VBScript in 3.610 seconds
Bit of a per-loop speedup going from 10 to 100.
How many seconds was it for 10 loops?
ChrisA
~0.44
--
https://mail.python.org/mailman/listinfo/python-list
Re: You gotta love a 2-line python solution
On 5/2/2016 11:27 PM, [email protected] wrote:
DFS at 2016/5/3 9:12:24AM wrote:
try
from urllib.request import urlretrieve
http://stackoverflow.com/questions/21171718/urllib-urlretrieve-file-python-3-3
I'm running python 2.7.11 (32-bit)
Alright, it works...someway.
I try to get a zip file. It works, the file can be unzipped correctly.
from urllib.request import urlretrieve
urlretrieve("http://www.caprilion.com.tw/fed.zip", "d:\\temp\\temp.zip")
('d:\\temp\\temp.zip', )
But when I try to get this forum page, it does get an html file, but it can't be viewed normally.
urlretrieve("https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A", "d:\\temp\\temp.html")
('d:\\temp\\temp.html', )
I suppose the html is a much more complex situation where more processing needs to be done before it can be opened by a web browser:-)
Who knows what Google has done... it won't open in Opera. The tab title shows up, but after 20-30 seconds the screen just stays blank and the cursor quits loading.
It's a mess - try running it thru BeautifulSoup.prettify() and it looks better.
# python 2.7: urlretrieve lives in urllib, not urllib.request
import urllib
import BeautifulSoup
webfile = "D:\\afile.html"
urllib.urlretrieve("https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A", webfile)
f = open(webfile)
soup = BeautifulSoup.BeautifulSoup(f)
f.close()
print soup.prettify()
--
https://mail.python.org/mailman/listinfo/python-list
Re: Fastest way to retrieve and write html contents to file
On 5/3/2016 12:06 AM, Michael Torrie wrote: Now if you want to talk about processing the data once you have it, there we can talk about speeds and optimization. Be glad to. Helps me learn python, so bring whatever challenge you want and I'll try to keep up. One small comparison I was able to make was VBA vs python/pyodbc to summarize an Access database. Not quite a fair test, but interesting nonetheless. --- Access 2003 file Access 2003 VBA code 2,099,101 rows 114 tables (max row = 600288) 971 columns text: 503 boolean: 4 numeric: 351 date-time: 108 binary:5 309 indexes (25 foreign keys) 333,549,568 bytes on disk Time: 0.18 seconds --- same Access 2003 file 32-bit python 2.7.11 + 32-bit pyodbc 3.0.6 2,099,101 rows 114 tables (max row = 600288) 971 columns text: 503 numeric: 351 date-time: 108 binary:5 boolean: 4 309 indexes (foreign keys na via ODBC*) 333,549,568 bytes on disk Time: 0.49 seconds * the Access ODBC driver doesn't support the SQLForeignKeys function --- -- https://mail.python.org/mailman/listinfo/python-list
Re: Not x.islower() has different output than x.isupper() in list output...
On 5/3/2016 8:00 AM, Chris Angelico wrote:
On Tue, May 3, 2016 at 9:25 PM, Jussi Piitulainen
wrote:
Chris Angelico writes:
This assumes, of course, that there is a function swapcase which can
return a string with case inverted. I'm not sure such a function
exists.
str.swapcase("foO")
'FOo'
I suppose for this discussion it doesn't matter if it's imperfect.
What was imperfect?
--
https://mail.python.org/mailman/listinfo/python-list
Re: Not x.islower() has different output than x.isupper() in list output...
On 5/3/2016 9:13 AM, Chris Angelico wrote:
On Tue, May 3, 2016 at 11:01 PM, DFS wrote:
On 5/3/2016 8:00 AM, Chris Angelico wrote:
On Tue, May 3, 2016 at 9:25 PM, Jussi Piitulainen
wrote:
Chris Angelico writes:
This assumes, of course, that there is a function swapcase which can
return a string with case inverted. I'm not sure such a function
exists.
str.swapcase("foO")
'FOo'
I suppose for this discussion it doesn't matter if it's imperfect.
What was imperfect?
It doesn't invert, the way numeric negation does.
What do you mean by 'case inverted'?
It looks like it swaps the case correctly between upper and lower.
And if you try to
define exactly what it does, you'll come right back to
isupper()/islower(), so it's not much help in defining those.
ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: Saving Consol outputs in a python script
On 5/3/2016 8:14 AM, [email protected] wrote:
Hello,
I'm new to python and have a question.
I'm running a c++ file with a python script like:
import os
import subprocess
subprocess.call(["~/caffe/build/examples/cpp_classification/classification", "deploy.prototxt", "this.caffemodel", "mean.binaryproto", "labels.txt", "Bild2.jpg"])
and it runs fine. On the console it gives me the output:
~/Desktop/Downloader/Sym+$ python Run_C.py
-- Prediction for Bild2.jpg --
0.9753 - "Class 1"
0.0247 - "Class 2"
What I need are the 2 values for the 2 classes saved in a variable in the .py script, so that I can write them into a text file.
Would be super nice if someone could help me!
This looks like the ticket:
http://eli.thegreenplace.net/2015/redirecting-all-kinds-of-stdout-in-python/
have a nice day!
Steffen
--
https://mail.python.org/mailman/listinfo/python-list
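For the question in the post above, one possible approach (not the one from the linked article) is to capture the program's output with subprocess.check_output and pick the numbers out with a regex. This is only a sketch: run_and_capture and parse_predictions are made-up helper names, and the pattern assumes the output looks exactly like the two lines shown in the post.

```python
import os
import re
import subprocess

def run_and_capture(args):
    """Run a command and return everything it wrote to stdout, as text."""
    # "~" is only expanded by a shell, so expand it explicitly here.
    args = [os.path.expanduser(args[0])] + list(args[1:])
    return subprocess.check_output(args, universal_newlines=True)

# Assumed line format, taken from the post:  0.9753 - "Class 1"
PREDICTION = re.compile(r'^\s*(\d+(?:\.\d+)?)\s+-\s+"([^"]*)"\s*$')

def parse_predictions(text):
    """Return a list of (score, label) pairs found in the captured output."""
    results = []
    for line in text.splitlines():
        match = PREDICTION.match(line)
        if match:
            results.append((float(match.group(1)), match.group(2)))
    return results
```

The text returned by run_and_capture goes straight into parse_predictions, and the resulting pairs can then be written to a text file with an ordinary open(..., "w").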
Re: Not x.islower() has different output than x.isupper() in list output...
On 5/3/2016 10:49 AM, Jussi Piitulainen wrote:
DFS writes:
On 5/3/2016 9:13 AM, Chris Angelico wrote:
It doesn't invert, the way numeric negation does.
What do you mean by 'case inverted'?
It looks like it swaps the case correctly between upper and lower.
There's letters that do not come in exact pairs of upper and lower case,
so _some_ swaps are not invertible: you swap twice and end up somewhere
else than your starting point.
The "\N{ANGSTROM SIGN}" looks like the Swedish upper-case
a-with-ring-above but isn't the same character, yet Python swaps its
case to the actual lower-case a-with-ring above. It can't go back to
_both_ the Angstrom sign and the actual upper case letter.
(Not sure why the sign is considered a cased letter at all.)
Thanks for the explanation.
Does that mean:
lower(Å) != å ?
and
upper(å) != Å ?
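A quick way to check this in Python 3 (a sketch; on Python 2 the literals would need a u prefix) is to lower-case the Angstrom sign, upper-case the result, and compare by character name rather than by appearance:

```python
import unicodedata

angstrom = "\N{ANGSTROM SIGN}"                         # U+212B
a_ring = "\N{LATIN CAPITAL LETTER A WITH RING ABOVE}"  # U+00C5

lowered = angstrom.lower()      # the ordinary lower-case a-with-ring
round_trip = lowered.upper()    # upper-casing cannot recover the sign

print(unicodedata.name(lowered))
print(unicodedata.name(round_trip))
print(round_trip == angstrom)   # False
print(round_trip == a_ring)     # True
```

So both characters lower-case to the same å, but upper-casing å gives back only the letter, never the sign.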
--
https://mail.python.org/mailman/listinfo/python-list
Re: Fastest way to retrieve and write html contents to file
On 5/3/2016 11:28 AM, Tim Chase wrote: On 2016-05-03 00:24, DFS wrote: One small comparison I was able to make was VBA vs python/pyodbc to summarize an Access database. Not quite a fair test, but interesting nonetheless. Access 2003 file Access 2003 VBA code Time: 0.18 seconds same Access 2003 file 32-bit python 2.7.11 + 32-bit pyodbc 3.0.6 Time: 0.49 seconds Curious whether you're forcing Access VBA to talk over ODBC or whether Access is using native access/file-handling (and thus bypassing the ODBC overhead)? The latter, which is why I said "not quite a fair test". -- https://mail.python.org/mailman/listinfo/python-list
Re: How to become more motivated to learn Python
On 5/3/2016 10:12 PM, Christopher Reimer wrote: When I realized that I wasn't learning enough about the Python language from translating BASIC games, I started coding a chess engine. If you ever look at the academic literature for chess programming from the last 50+ years, you can spend a lifetime solving the programming challenges from implementing the game of kings. We can have a good thread on python chess engines some time. I'm also going to write a chess engine in python - follow the UCI protocol and all. You're way ahead of me, I'm sure, but I did already look into algebraic notation, game recording, FEN and all that. pyChess is a nice little game: www.pychess.org The one thing I'm not going to do is review anyone else's code until I put out v1.0 of my own. My goal with v1.0 is for the pieces to make valid moves. That's it. Following that, I'll work in getting the game recording right. No 'strategy' at first. Maybe later I can load a library of well-known openings and try to utilize them. How far along are you in your engine development? Getting the code for en passant and castling right looks to be a bit of an obstacle. What's nice is the strongest engine (Stockfish) is totally open source. -- https://mail.python.org/mailman/listinfo/python-list
python chess engines
On 5/3/2016 8:00 PM, DFS wrote: How far along are you in your engine development? I can display a text-based chess board on the console (looks better with a mono font). 8 BR BN BB BQ BK BB BN BR 7 BP BP BP BP BP BP BP BP 6 __ __ __ __ __ __ __ __ 5 __ __ __ __ __ __ __ __ 4 __ __ __ __ __ __ __ __ 3 __ __ __ __ __ __ __ __ 2 WP WP WP WP WP WP WP WP 1 WR WN WB WQ WK WB WN WR A B C D E F G H With feedback from this list, I had to break a lot of bad Java habits to make the code more Pythonic. Right now I'm going back and forth between writing documentation and unit tests. Once I finalized the code in its current state, I'll post it up on GitHub under the MIT license. Future updates will have a fuller console interface and moves for individual pieces implemented. Thank you, Chris R. Wanted to start a new thread, rather than use the 'motivated' thread. Can you play your game at the console? The way I think about a chess engine is it doesn't even display a board. It accepts a move as input, records the move, analyzes the positions after the move, and returns the next move. Here's the UCI protocol. http://download.shredderchess.com/div/uci.zip -- https://mail.python.org/mailman/listinfo/python-list
Re: Fastest way to retrieve and write html contents to file
On 5/3/2016 2:41 PM, Tim Chase wrote: On 2016-05-03 13:00, DFS wrote: On 5/3/2016 11:28 AM, Tim Chase wrote: On 2016-05-03 00:24, DFS wrote: One small comparison I was able to make was VBA vs python/pyodbc to summarize an Access database. Not quite a fair test, but interesting nonetheless. Access 2003 file Access 2003 VBA code Time: 0.18 seconds same Access 2003 file 32-bit python 2.7.11 + 32-bit pyodbc 3.0.6 Time: 0.49 seconds Curious whether you're forcing Access VBA to talk over ODBC or whether Access is using native access/file-handling (and thus bypassing the ODBC overhead)? The latter, which is why I said "not quite a fair test". Can you try the same tests, getting Access/VBA to use ODBC instead to see how much overhead ODBC entails? -tkc Done. I dropped a few extraneous tables from the database (was 114 tables): Access 2003 .mdb file 2,009,164 rows 97 tables (max row = 600288) 725 columns text: 389 boolean: 4 numeric: 261 date-time: 69 binary:2 264 indexes (25 foreign keys)* 299,167,744 bytes on disk 1. DAO Time: 0.15 seconds 2. ADODB, Access ODBC driver, OpenSchema method** Time: 0.26 seconds 3. python, pyodbc, Access ODBC driver Time: 0.42 seconds * despite being written by Microsoft, the Access ODBC driver doesn't support the ODBC SQLForeignKeys function, so the python code doesn't show a count of foreign keys ** the Access ODBC driver doesn't support the adSchemaIndexes or adSchemaForeignKeys query types, so I used DAO code to count indexes and foreign keys. -- https://mail.python.org/mailman/listinfo/python-list
Re: Not x.islower() has different output than x.isupper() in list output...
On 5/3/2016 11:28 PM, Steven D'Aprano wrote:
On Wed, 4 May 2016 12:49 am, Jussi Piitulainen wrote:
DFS writes:
On 5/3/2016 9:13 AM, Chris Angelico wrote:
It doesn't invert, the way numeric negation does.
What do you mean by 'case inverted'?
It looks like it swaps the case correctly between upper and lower.
There's letters that do not come in exact pairs of upper and lower case,
Languages with two distinct lettercases, like English, are called bicameral.
The two cases are technically called majuscule and minuscule, but
colloquially known as uppercase and lowercase since movable type printers
traditionally used to keep the majuscule letters in a drawer above the
minuscule letters.
Many alphabets are unicameral, that is, they only have a single lettercase.
Examples include Hebrew, Arabic, Hangul, and many others. Georgian is an
interesting example, as it is the only known written alphabet that started
as a bicameral script and then became unicameral.
Consequently, many letters are neither upper nor lower case, and have
Unicode category "Letter other":
py> c = u'\N{ARABIC LETTER FEH}'
py> unicodedata.category(c)
'Lo'
py> c.isalpha()
True
py> c.isupper()
False
py> c.islower()
False
Even among bicameral alphabets, there are a few anomalies. The three most
obvious ones are Greek sigma, German Eszett (or "sharp S") and Turkish I.
(1) The Greek sigma is usually written as Σ or σ in uppercase and lowercase
respectively, but at the end of a word, lowercase sigma is written as ς.
(This final sigma is sometimes called "stigma", but should not be confused
with the archaic Greek letter stigma, which has two cases Ϛ ϛ, at least
when it is not being written as digamma Ϝϝ -- and if you're confused, so
are the Greeks :-)
Python 3.3 correctly handles the sigma/final sigma when upper- and
lowercasing:
py> 'ΘΠΣΤΣ'.lower()
'θπστς'
py> 'ΘΠΣΤΣ'.lower().upper()
'ΘΠΣΤΣ'
(2) The German Eszett ß traditionally existed only in lowercase form. An
uppercase form has existed since at least the 19th century, but it was left
out when the Germans moved away from blackletter to Roman-style letters. In
recent years, printers in Germany have started to reintroduce an uppercase
version, and the German government has standardized on its use for
placenames, but not other words.
(Aside: in Germany, ß is not considered a distinct letter of the alphabet,
but a ligature of ss; historically it derived from a ligature of ſs, ſz or
ſʒ. The funny characters you may or may not be able to see are the long-S
and round-Z.)
Python follows common, but not universal, German practice for eszett:
py> 'ẞ'.lower()
'ß'
py> 'ß'.upper()
'SS'
Note that this is lossy: given a name like "STRASSER", it is impossible to
tell whether it should be title-cased to "Strasser" or "Straßer". It also
means that uppercasing a string can make it longer.
For more on the uppercase eszett, see:
https://typography.guru/journal/germanys-new-character/
https://typography.guru/journal/how-to-draw-a-capital-sharp-s-r18/
(3) In most Latin alphabets, the lowercase i and j have a "tittle" diacritic
on them, but not the uppercase forms I and J. Turkish and a few other
languages have both I-with-tittle and I-without-tittle.
(As far as I know, there is no language with a dotless J.)
So in Turkish, the correct uppercase to lowercase and back again should go:
Dotless I: I -> ı -> I
Dotted I: İ -> i -> İ
Python does not quite manage to handle this correctly for Turkish
applications, since it loses the dotted/dotless distinction:
py> 'ı'.upper()
'I'
py> 'İ'.lower()
'i'
and further case conversions follow the non-Turkish rules.
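The lossy cases above can be spotted with a small round-trip check. This is just an illustrative sketch for Python 3, and survives_round_trip is a made-up helper name:

```python
def survives_round_trip(text):
    """True if upper-casing and then lower-casing gives back the original."""
    return text.upper().lower() == text

# Lower-case samples: eszett, dotless i, final sigma, plain ASCII.
for sample in ["straße", "ısı", "σας", "python"]:
    print(sample, survives_round_trip(sample))
```

Strings containing ß or dotless ı come back changed ("strasse", "isi"), which is the same information loss described for STRASSER and for the Turkish I.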
Note that sometimes getting this wrong can have serious consequences:
http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail
Linguist much?
--
https://mail.python.org/mailman/listinfo/python-list
Re: Not x.islower() has different output than x.isupper() in list output...
On 5/4/2016 11:37 AM, Steven D'Aprano wrote: On Thu, 5 May 2016 12:09 am, DFS wrote: On 5/3/2016 11:28 PM, Steven D'Aprano wrote: Languages with two distinct lettercases, like English, are called bicameral. [...] Linguist much? Possibly even a cunning one. I see you as more of a Colonel Angus. -- https://mail.python.org/mailman/listinfo/python-list
No SQLite newsgroup, so I'll ask here about SQLite, python and MS Access
Both of the following python commands successfully create a SQLite3
datafile which crashes Access 2003 immediately upon trying to open it
(via an ODBC linked table).
import sqlite3
conn = sqlite3.connect("dfile.db")
import pyodbc
conn = pyodbc.connect('Driver={SQLite3 ODBC Driver};Database=dfile.db')
The file is created, a table is added, I add rows to the table in code,
etc., and it can be read by 'DB Browser for SQLite' so it's a valid
SQLite3 database, but Access won't read it. I can create and store a
link to the table - using that ODBC driver - but as soon as I try to
open it: "Microsoft Access has stopped working"
On the other hand, a SQLite3 file created in VBScript, using the same
ODBC driver, /is/ readable with Access 2003:
Set conn = CreateObject("ADODB.Connection")
conn.Open "Driver={SQLite3 ODBC Driver};Database=dfile.db;"
python 2.7.11, pyodbc 3.0.6, ODBC driver, and Access 2003: all 32-bit
OS is Win8.1Pro 64-bit.
I can't find anything on the web.
Any ideas?
Thanks
--
https://mail.python.org/mailman/listinfo/python-list
Whittle it on down
Want to whittle a list like this: [u'Espa\xf1ol', 'Health & Fitness Clubs (36)', 'Health Clubs & Gymnasiums (42)', 'Health Fitness Clubs', 'Name', 'Atlanta city guide', 'edit address', 'Tweet', 'PHYSICAL FITNESS CONSULTANTS & TRAINERS', 'HEALTH CLUBS & GYMNASIUMS', 'HEALTH CLUBS & GYMNASIUMS', 'www.custombuiltpt.com/', 'RACQUETBALL COURTS PRIVATE', 'www.lafitness.com', 'GYMNASIUMS', 'HEALTH & FITNESS CLUBS', 'www.lafitness.com', 'HEALTH & FITNESS CLUBS', 'www.lafitness.com', 'PERSONAL FITNESS TRAINERS', 'HEALTH CLUBS & GYMNASIUMS', 'EXERCISE & PHYSICAL FITNESS PROGRAMS', 'FITNESS CENTERS', 'HEALTH CLUBS & GYMNASIUMS', 'HEALTH CLUBS & GYMNASIUMS', 'PERSONAL FITNESS TRAINERS', '5', '4', '3', '2', '1', 'Yellow Pages', 'About Us', 'Contact Us', 'Support', 'Terms of Use', 'Privacy Policy', 'Advertise With Us', 'Add/Update Listing', 'Business Profile Login', 'F.A.Q.'] down to ['PHYSICAL FITNESS CONSULTANTS & TRAINERS', 'HEALTH CLUBS & GYMNASIUMS', 'HEALTH CLUBS & GYMNASIUMS', 'RACQUETBALL COURTS PRIVATE', 'GYMNASIUMS', 'HEALTH & FITNESS CLUBS', 'HEALTH & FITNESS CLUBS', 'PERSONAL FITNESS TRAINERS', 'HEALTH CLUBS & GYMNASIUMS', 'EXERCISE & PHYSICAL FITNESS PROGRAMS', 'FITNESS CENTERS', 'HEALTH CLUBS & GYMNASIUMS', 'HEALTH CLUBS & GYMNASIUMS', 'PERSONAL FITNESS TRAINERS'] Want to keep all elements containing only upper case letters or upper case letters and ampersand (where ampersand is surrounded by spaces) Is it easier to extract elements meeting those conditions, or remove elements meeting the following conditions: * elements with a lower-case letter in them * elements with a number in them * elements with a period in them ? So far all I figured out is remove items with a period: newlist = [ x for x in oldlist if "." not in x ] Thanks for help, python gurus. -- https://mail.python.org/mailman/listinfo/python-list
Re: No SQLite newsgroup, so I'll ask here about SQLite, python and MS Access
On 5/4/2016 10:02 PM, Stephen Hansen wrote: On Wed, May 4, 2016, at 03:46 PM, DFS wrote: I can't find anything on the web. Have you tried: http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users If you really must access it over a newsgroup, you can use the Gmane mirror: http://gmane.org/info.php?group=gmane.comp.db.sqlite.general Thanks Any ideas? Sorry, I don't use Access. -- https://mail.python.org/mailman/listinfo/python-list
Re: Whittle it on down
On 5/5/2016 2:04 AM, Steven D'Aprano wrote:
On Thursday 05 May 2016 14:58, DFS wrote:
Want to whittle a list like this:
[...]
Want to keep all elements containing only upper case letters or upper
case letters and ampersand (where ampersand is surrounded by spaces)
Start by writing a function or a regex that will distinguish strings that
match your conditions from those that don't. A regex might be faster, but
here's a function version.
def isupperalpha(string):
    return string.isalpha() and string.isupper()
def check(string):
    if isupperalpha(string):
        return True
    parts = string.split("&")
    if len(parts) < 2:
        return False
    # Don't strip leading spaces from the start of the string.
    parts[0] = parts[0].rstrip(" ")
    # Or trailing spaces from the end of the string.
    parts[-1] = parts[-1].lstrip(" ")
    # But strip leading and trailing spaces from the middle parts
    # (if any).
    for i in range(1, len(parts)-1):
        parts[i] = parts[i].strip(" ")
    return all(isupperalpha(part) for part in parts)
Now you have two ways of filtering this. The obvious way is to extract
elements which meet the condition. Here are two ways:
# List comprehension.
newlist = [item for item in oldlist if check(item)]
# Filter, Python 2 version
newlist = filter(check, oldlist)
# Filter, Python 3 version
newlist = list(filter(check, oldlist))
In practice, this is the best (fastest, simplest) way. But if you fear that
you will run out of memory dealing with absolutely humongous lists with
hundreds of millions or billions of strings, you can remove items in place:
def remove(func, alist):
    for i in range(len(alist)-1, -1, -1):
        if not func(alist[i]):
            del alist[i]
Note the magic incantation to iterate from the end of the list towards the
front. If you do it the other way, Bad Things happen. Note that this will
use less memory than extracting the items, but it will be much slower.
You can combine the best of both worlds. Here is a version that uses a
temporary list to modify the original in place:
# works in both Python 2 and 3
def remove(func, alist):
    # Modify list in place, the fast way.
    alist[:] = filter(func, alist)
You are out of your mind.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Whittle it on down
On 5/5/2016 1:39 AM, Stephen Hansen wrote:
pattern = re.compile(r"^[A-Z\s&]+$")
output = [x for x in input if pattern.match(x)]
Holy Shr"^[A-Z\s&]+$"
One line of parsing! I was figuring a few list comprehensions would do it - this is better.
(note: the reason I specified 'spaces around ampersand' is so it would remove 'Q&A' if that ever came up - but some people write 'Q & A', so I'll live with that exception, or try to tweak it myself.)
You're the man, man. Thank you!
--
https://mail.python.org/mailman/listinfo/python-list
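If the 'spaces around ampersand' rule from the post above ever needs to be enforced, one possible tweak is a stricter pattern that only accepts capital-letter words joined by a single space or by " & ". This is a sketch, not Hansen's regex; TITLE and keep_titles are made-up names.

```python
import re

# One or more words of capital letters, joined by " " or " & ".
TITLE = re.compile(r"^[A-Z]+(?: (?:& )?[A-Z]+)*$")

def keep_titles(items):
    """Return only the items that look like upper-case section titles."""
    return [item for item in items if TITLE.match(item)]

sample = ['HEALTH CLUBS & GYMNASIUMS', 'Q&A', 'Q & A',
          'GYMNASIUMS', 'F.A.Q.', 'Tweet', '5']
print(keep_titles(sample))
```

With this pattern 'Q&A' is dropped while 'Q & A' is kept, at the cost of a regex that is harder to read than the one-line character class.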
Re: Whittle it on down
On 5/5/2016 1:53 AM, Jussi Piitulainen wrote: Either way is easy to approximate with a regex: import re upper = re.compile(r'[A-Z &]+') lower = re.compile(r'[^A-Z &]') print([datum for datum in data if upper.fullmatch(datum)]) print([datum for datum in data if not lower.search(datum)]) This is similar to Hansen's solution. I've skipped testing that the ampersand is between spaces, and I've skipped the period. Adjust. Will do. This considers only ASCII upper case letters. You can add individual letters that matter to you, or you can reach for the documentation to find if there is some generic notation for all upper case letters. The newer regex package on PyPI supports POSIX character classes like [:upper:], I think, and there may or may not be notation for Unicode character categories in re or regex - LU would be Letter, Uppercase. Thanks. -- https://mail.python.org/mailman/listinfo/python-list
Re: Whittle it on down
On 5/5/2016 9:32 AM, Stephen Hansen wrote: On Thu, May 5, 2016, at 12:36 AM, Steven D'Aprano wrote: Oh, a further thought... On Thursday 05 May 2016 16:46, Stephen Hansen wrote: I don't even care about faster: Its overly complicated. Sometimes a regular expression really is the clearest way to solve a problem. Putting non-ASCII letters aside for the moment, how would you match these specs as a regular expression? I don't know, but mostly because I wouldn't even try. The requirements are over-specified. If you look at the OP's data (and based on previous conversation), he's doing web scraping and trying to pull out good data. There's no absolutely perfect way to do that because the system he's scraping isn't meant for data processing. The data isn't cleanly articulated. Instead, he wants a heuristic to pull out what look like section titles. Assigned by a company named localeze, apparently. http://www.usdirectory.com/cat/g0 https://www.neustarlocaleze.biz/welcome/ The OP looked at the data and came up with a simple set of rules that identify these section titles: Want to keep all elements containing only upper case letters or upper case letters and ampersand (where ampersand is surrounded by spaces) This translates naturally into a simple regular expression: an uppercase string with spaces and &'s. Now, that expression doesn't 100% encode every detail of that rule-- it allows both Q&A and Q & A-- but on my own looking at the data, I suspect its good enough. The titles are clearly separate from the other data scraped by their being upper cased. We just need to expand our allowed character range into spaces and &'s. Nothing in the OP's request demands the kind of rigorous matching that your scenario does. Its a practical problem with a simple, practical answer. Yes. And simplicity + practicality = successfulality. 
And I do a sanity check before using the data anyway: after parse and cleanup and regex matching, I make sure all lists have the same number of elements:
lenData = [len(title),len(names),len(addr),len(street),len(city),len(state),len(zip)]
if len(set(lenData)) != 1:
    alert the media
--
https://mail.python.org/mailman/listinfo/python-list
Re: Whittle it on down
On 5/5/2016 1:54 PM, Steven D'Aprano wrote:
On Thu, 5 May 2016 10:31 pm, DFS wrote:
You are out of your mind.
That's twice you've tried to put me down, first by dismissing my comments
about text processing with "Linguist much", and now an outright insult. The
first time I laughed it off and made a joke about it. I won't do that
again.
>
You asked whether it was better to extract the matching strings into a new
list, or remove them in place in the existing list. I not only showed you
how to do both, but I tried to give you the mental tools to understand when
you should pick one answer over the other. And your response is to insult
me and question my sanity.
Well, DFS, I might be crazy, but I'm not stupid. If that's really how you
feel about my answers, I won't make the mistake of wasting my time
answering your questions in the future.
Over to you now.
heh! Relax, pal.
I was just trying to be funny - no insult intended either time, of
course. Look for similar responses from me in the future. Usenet
brings out the smart-aleck in me.
Actually, you should've accepted the 'Linguist much?' as a compliment,
because I seriously thought you were.
But you ARE out of your mind if you prefer that convoluted "function"
method over a simple 1-line regex method (as per S. Hansen).
def isupperalpha(string):
    return string.isalpha() and string.isupper()
def check(string):
    if isupperalpha(string):
        return True
    parts = string.split("&")
    if len(parts) < 2:
        return False
    parts[0] = parts[0].rstrip(" ")
    parts[-1] = parts[-1].lstrip(" ")
    for i in range(1, len(parts)-1):
        parts[i] = parts[i].strip(" ")
    return all(isupperalpha(part) for part in parts)
I'm sure it does the job well, but that style brings back [bad] memories
of the VBA I used to write. I expected something very concise and
'pythonic' (which I'm learning is everyone's favorite mantra here in
python-land).
Anyway, I appreciate ALL replies to my queries. So thank you for taking
the time.
Whenever I'm able, I'll try to contribute to clp as well.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Whittle it on down
On 5/5/2016 2:56 PM, Stephen Hansen wrote: On Thu, May 5, 2016, at 05:31 AM, DFS wrote: You are out of your mind. Whoa, now. I might disagree with Steven D'Aprano about how to approach this problem, but there's no need to be rude. Seriously not trying to be rude - more smart-alecky than anything. Hope D'Aprano doesn't stay butthurt... Everyone's trying to help you, after all. Yes, and I do appreciate it. I've only been working with python for about a month, but I feel like I'm making good progress. clp is a great resource, and I'll be hanging around for a long time, and will contribute when possible. Thanks for your help. -- https://mail.python.org/mailman/listinfo/python-list
Re: Whittle it on down
On 5/5/2016 1:39 AM, Stephen Hansen wrote:
Given:
input = [u'Espa\xf1ol', 'Health & Fitness Clubs (36)', 'Health Clubs & Gymnasiums (42)', 'Health Fitness Clubs', 'Name', 'Atlanta city guide', 'edit address', 'Tweet', 'PHYSICAL FITNESS CONSULTANTS & TRAINERS', 'HEALTH CLUBS & GYMNASIUMS', 'HEALTH CLUBS & GYMNASIUMS', 'www.custombuiltpt.com/', 'RACQUETBALL COURTS PRIVATE', 'www.lafitness.com', 'GYMNASIUMS', 'HEALTH & FITNESS CLUBS', 'www.lafitness.com', 'HEALTH & FITNESS CLUBS', 'www.lafitness.com', 'PERSONAL FITNESS TRAINERS', 'HEALTH CLUBS & GYMNASIUMS', 'EXERCISE & PHYSICAL FITNESS PROGRAMS', 'FITNESS CENTERS', 'HEALTH CLUBS & GYMNASIUMS', 'HEALTH CLUBS & GYMNASIUMS', 'PERSONAL FITNESS TRAINERS', '5', '4', '3', '2', '1', 'Yellow Pages', 'About Us', 'Contact Us', 'Support', 'Terms of Use', 'Privacy Policy', 'Advertise With Us', 'Add/Update Listing', 'Business Profile Login', 'F.A.Q.']
Then:
pattern = re.compile(r"^[A-Z\s&]+$")
output = [x for x in input if pattern.match(x)]
output
['PHYSICAL FITNESS CONSULTANTS & TRAINERS', 'HEALTH CLUBS & GYMNASIUMS', 'HEALTH CLUBS & GYMNASIUMS', 'RACQUETBALL COURTS PRIVATE', 'GYMNASIUMS', 'HEALTH & FITNESS CLUBS', 'HEALTH & FITNESS CLUBS', 'PERSONAL FITNESS TRAINERS', 'HEALTH CLUBS & GYMNASIUMS', 'EXERCISE & PHYSICAL FITNESS PROGRAMS', 'FITNESS CENTERS', 'HEALTH CLUBS & GYMNASIUMS', 'HEALTH CLUBS & GYMNASIUMS', 'PERSONAL FITNESS TRAINERS']
Should've looked earlier. Their master list of categories http://www.usdirectory.com/cat/g0 shows a few commas, a bunch of dashes, and the ampersands we talked about.
"OFFICE SERVICES, SUPPLIES & EQUIPMENT" gets removed because of the comma.
"AUTOMOBILE - DEALERS" gets removed because of the dash.
I updated your regex and it seems to have fixed it.
orig: (r"^[A-Z\s&]+$")
new : (r"^[A-Z\s&,-]+$")
Thanks again.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Whittle it on down
On 5/6/2016 3:45 AM, Peter Otten wrote:
DFS wrote:
Should've looked earlier. Their master list of categories
http://www.usdirectory.com/cat/g0 shows a few commas, a bunch of dashes,
and the ampersands we talked about.
"OFFICE SERVICES, SUPPLIES & EQUIPMENT" gets removed because of the comma.
"AUTOMOBILE - DEALERS" gets removed because of the dash.
I updated your regex and it seems to have fixed it.
orig: (r"^[A-Z\s&]+$")
new : (r"^[A-Z\s&,-]+$")
Thanks again.
If there is a "master list" compare your candidates against it instead of
using a heuristic, i. e.
categories = set(master_list)
output = [category for category in input if category in categories]
You can find the categories with
import urllib.request
import bs4
soup = bs4.BeautifulSoup(urllib.request.urlopen("http://www.usdirectory.com/cat/g0").read())
categories = set()
for li in soup.find_all("li"):
    assert li.parent.parent["class"][0].startswith("category_items")
    categories.add(li.text)
print("\n".join(sorted(categories)[:10]))
"import urllib.request
ImportError: No module named request"
I'm on python 2.7.11
Accounting & Bookkeeping Services
Adoption Services
Adult Entertainment
Advertising
Agricultural Equipment & Supplies
Agricultural Production
Agricultural Services
Aids Resources
Aircraft Charters & Rentals
Aircraft Dealers & Services
Yeah, I actually did something like that last night. Was trying to get
their full tree structure, which goes 4 levels deep: ie
Arts & Entertainment
  Newspapers
    News Dealers
      Prepress Services
What I referred to as their 'master list' is actually just 2 levels
deep. My bad.
So far I haven't come across one that had anything in it but letters,
dashes, commas or ampersands.
Thanks
--
https://mail.python.org/mailman/listinfo/python-list
Re: Whittle it on down
On 5/6/2016 9:58 AM, DFS wrote:
On 5/6/2016 3:45 AM, Peter Otten wrote:
DFS wrote:
Should've looked earlier. Their master list of categories
http://www.usdirectory.com/cat/g0 shows a few commas, a bunch of dashes,
and the ampersands we talked about.
"OFFICE SERVICES, SUPPLIES & EQUIPMENT" gets removed because of the
comma.
"AUTOMOBILE - DEALERS" gets removed because of the dash.
I updated your regex and it seems to have fixed it.
orig: (r"^[A-Z\s&]+$")
new : (r"^[A-Z\s&,-]+$")
Thanks again.
If there is a "master list" compare your candidates against it instead of
using a heuristic, i. e.
categories = set(master_list)
output = [category for category in input if category in categories]
You can find the categories with
import urllib.request
import bs4
soup = bs4.BeautifulSoup(urllib.request.urlopen("http://www.usdirectory.com/cat/g0").read())
categories = set()
for li in soup.find_all("li"):
    assert li.parent.parent["class"][0].startswith("category_items")
    categories.add(li.text)
print("\n".join(sorted(categories)[:10]))
"import urllib.request
ImportError: No module named request"
Figured it out using urllib2. Your code returns 411 categories from
that first page.
There are up to 4 levels of categorization:
Level 1: Arts & Entertainment
Level 2: Newspapers
Level 3: Newspaper Brokers
Level 3: Newspaper Dealers Back Number
Level 3: Newspaper Delivery
Level 3: Newspaper Distributors
Level 3: Newsracks
Level 3: Printers Newspapers
Level 3: Newspaper Dealers
Level 3: News Dealers
Level 4: News Dealers Wholesale
Level 4: Shoppers News Publications
Level 3: News Service
Level 4: Newspaper Feature Syndicates
Level 4: Prepress Services
http://www.usdirectory.com/cat/g0 shows 21 Level 1 categories, and 390
Level 2. To get the Level 3 and 4 you have to drill-down using the
hyperlinks.
How to do it in python code is beyond my skills at this point. Get the
hrefs and load them and parse, then get the next level and load them and
parse, etc.?
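The "get the hrefs, load them, parse, repeat" idea can be sketched as a small recursive function. This is a rough Python 3 sketch, not tested against the real site: it assumes every category is an <a> inside an <li> and that each page lists exactly one level, which is a simplification. LinkParser, fetch_url and crawl are names invented for this example, and canned pages on a made-up host stand in for the site so it runs offline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect (href, text) for each <a> found inside an <li>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._li_depth = 0
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._li_depth += 1
        elif tag == "a" and self._li_depth:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_endtag(self, tag):
        if tag == "li" and self._li_depth:
            self._li_depth -= 1
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)


def fetch_url(url):
    """Download a page and return its HTML as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(url, fetch=fetch_url, level=1, max_level=4, seen=None):
    """Return [(level, name), ...] by following category links depth-first."""
    seen = set() if seen is None else seen
    if level > max_level or url in seen:
        return []
    seen.add(url)
    parser = LinkParser()
    parser.feed(fetch(url))
    parser.close()
    found = []
    for href, name in parser.links:
        found.append((level, name))
        found.extend(crawl(urljoin(url, href), fetch, level + 1, max_level, seen))
    return found


# Canned pages (hypothetical host and paths) so the sketch runs without network access.
pages = {
    "http://example.test/cat/g0": '<ul><li><a href="/cat/1">Arts &amp; Entertainment</a></li></ul>',
    "http://example.test/cat/1": '<ul><li><a href="/cat/2">Newspapers</a></li></ul>',
    "http://example.test/cat/2": '<ul><li><a href="/cat/3">News Dealers</a></li></ul>',
    "http://example.test/cat/3": '<ul><li><a href="/cat/4">News Dealers Wholesale</a></li></ul>',
}
tree = crawl("http://example.test/cat/g0", fetch=pages.__getitem__)
for level, name in tree:
    print("Level %d: %s" % (level, name))
```

Passing the fetch function in as an argument keeps the recursion separate from the downloading, so the same crawl() could be pointed at the live site with fetch_url once the real page structure has been checked. The seen set stops the crawl from loading the same page twice.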
A fun python CLI program for all to enjoy!
getAddresses.py
Scrapes addresses from www.usdirectory.com and stores them in a SQLite
database, or writes them to text files for mailing labels, etc
Now, just by typing 'fast food Taco Bell <city or zip> <state> 10 db all' you can find
out how many Taco Bells are within 10 miles of you, and store all the
addresses in your own address database.
No more convoluted Googling, or hitting the 'Next Page' button, or
fumbling with the Yellow Pages...
Note: the db structure is flat on purpose, and the .csv files aren't
quote delimited.
Put the program in its own directory. It creates the SQLite database
there, and writes files there, too.
Reviews of code, bug reports, criticisms, suggestions for improvement,
etc are all welcome.
Enjoy!
#getAddresses.py
import os, sys, requests, time, datetime
from lxml import html
import pyodbc, sqlite3, re
#show values of variables, HTML content, etc
#set it to False for short/concise program output
verbose = False
if verbose == True:
    print "The verbose setting is turned On."
    print ""
#check if address is unique
addrCheck = []
def addrUnique(addr):
    if addr not in addrCheck:
        x = True
        addrCheck.append(addr)
    else: x = False
    return x
#validate and parse command line
def showHelp():
    print ""
    print " Enter search word(s), city or zip, state, miles to search, txt or csv or db, # addresses to save (no commas)"
    print ""
    print " eg: restaurant Knoxville TN 10 txt 50"
    print " search for restaurants within 10 miles of Knoxville TN, and write"
    print " the first 50 address to a txt file"
    print ""
    print " eg: furniture 30303 GA 20 csv all"
    print " search for furniture within 20 miles of zip 30303 GA,"
    print " and write all results to a csv file"
    print ""
    print " eg: boxing gyms Detroit MI 10 db 5"
    print " search for boxing gyms within 10 miles of Detroit MI, and store"
    print " the first 5 results in a database"
    print ""
    print " All entries are case-insensitive (ie TX or tx are acceptable)"
    exit(0)
argCnt = len(sys.argv)
if argCnt < 7: showHelp()
if verbose == True:
    print ""
    print str(argCnt) + " arguments"
keyw = ""  #eg restaurant, boxing gym
if argCnt == 7: keyw = sys.argv[1]  #one search word
if argCnt > 7:  #multiple search words
    for i in range(1,argCnt-5):
        keyw = keyw + sys.argv[i] + "+"
    keyw = keyw[:-1]  #drop trailing + sign
cityzip = sys.argv[argCnt-5] #eg Atlanta or 30339
state= sys.argv[argCnt-4] #eg GA
miles= sys.argv[argCnt-3] #eg 5,10,20,30,50 (website allows max 30)
store= sys.argv[argCnt-2] #write address to file or database
addrWant = sys.argv[argCnt-1] #eg save All or number >0
if addrWant.lower() != "all":  #how many addresses to save
    if addrWant.isdigit() == False: showHelp()
    if addrWant == "0": showHelp()
    addrWant = int(addrWant)
elif addrWant.lower() == "all": addrWant = addrWant.lower()
else: addrWant = int(addrWant)
if store != "csv" and store != "txt" and store != "db": showHelp()
#begin timing the code
startTime = time.clock()
#website, SQLite db, search string, current date/time for use with db
datasrc = "www.usdirectory.com"
dbName = "addresses.sqlite"
search = keyw + " " + str(cityzip) + " " + state + " " + str(miles) + " " + str(addrWant)
loaddt = datetime.datetime.now()
#write addresses to file
#each time the same search is done, the file is deleted and recreated
if store == "csv" or store == "txt":
    #csv will write in .csv format - header and 1 line per address
    #txt will write out 3 lines per address, then blank before next address
    webfile = "usdirectory.com_"+keyw+"_"+cityzip+"_"+state+"."+store
    f = open(webfile,"w")
    if store == "csv": f.write("Name,Address,CityStateZip\n")
    f.close
#store addresses in database
cSQL = ""
if store == "db":
    #creates a SQLite database that Access 2003 can't read
    #conn = sqlite3.connect(dbName)
    #also creates a SQLite database that Access 2003 can't read
    conn = pyodbc.connect('Driver={SQLite3 ODBC Driver};Database=' + dbName)
    db = conn.cursor()
    cSQL = "CREATE TABLE If Not Exists ADDRESSES "
    cSQL += "(datasrc, search, category, name, street, city, state, zip, loaddt, "
    cSQL += "PRIMARY KEY (datasrc, search, name, street));"
    db.execute(cSQL)
# cSQL = "CREATE TABLE If Not Exists CATEGORIES "
# cSQL += "(catID INTEGER PRIMARY KEY, catDesc);"
# db.execute(cSQL)
# db.execute("CREATE UNIQUE INDEX If Not Exists UIDX_CATDESC ON
CA
Re: Whittle it on down
On 5/6/2016 11:44 AM, Peter Otten wrote:
DFS wrote:
There are up to 4 levels of categorization:
http://www.usdirectory.com/cat/g0 shows 21 Level 1 categories, and 390
Level 2. To get the Level 3 and 4 you have to drill-down using the
hyperlinks.
How to do it in python code is beyond my skills at this point. Get the
hrefs and load them and parse, then get the next level and load them and
parse, etc.?
Yes, that should work ;)
How about you do it, and I'll tell you if you did it right?
ha!
Re: A fun python CLI program for all to enjoy!
On 5/6/2016 4:30 PM, MRAB wrote:
On 2016-05-06 20:10, DFS wrote:
getAddresses.py
Scrapes addresses from www.usdirectory.com and stores them in a SQLite
database, or writes them to text files for mailing labels, etc
Now, just by typing 'fast food Taco Bell <city or zip> <state> 10 db all' you can find
out how many Taco Bells are within 10 miles of you, and store all the
addresses in your own address database.
No more convoluted Googling, or hitting the 'Next Page' button, or
fumbling with the Yellow Pages...
Note: the db structure is flat on purpose, and the .csv files aren't
quote delimited.
Put the program in its own directory. It creates the SQLite database
there, and writes files there, too.
Reviews of code, bug reports, criticisms, suggestions for improvement,
etc are all welcome.
OK, you asked for it... :-)
1. It's shorter and clearer not to compare with True or False:
if verbose:
and:
if not dupeRow:
Done. It will take some getting used to, though. I like that it's
shorter, but I could do the same in VBA and almost always chose not to.
2. You can print a blank line with an empty print statement:
print
Done. I actually like the way print looks better than print ""
3. When looking for unique items, a set is a better choice than a list:
addrCheck = set()
def addrUnique(addr):
    if addr not in addrCheck:
        x = True
        addrCheck.add(addr)
    else:
        x = False
    return x
Done.
I researched this just now on StackOverflow:
"Sets are significantly faster when it comes to determining if an object
is present in the set"
and
"lists are very nice to sort and have order while sets are nice to use
when you don't want duplicates and don't care about order."
The speed difference won't matter here in my little app, but it's better
to use the right construct for the job.
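For illustration, the same check can be written a little more compactly. This is just a sketch in Python 3 with made-up names and sample addresses, not the code from the program: a set answers "have I seen this?" with a hash lookup, whereas a list has to be scanned item by item.

```python
# Addresses already seen; membership tests on a set are hash lookups.
seen_addresses = set()


def addr_unique(addr):
    """Return True the first time addr is seen, False on any repeat."""
    if addr in seen_addresses:
        return False
    seen_addresses.add(addr)
    return True


# The third address repeats the first, so it is reported as not unique.
results = [addr_unique(a) for a in ["1 Main St", "2 Oak Ave", "1 Main St"]]
print(results)
```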
4. Try string formatting instead of multiple concatenation:
print "%s arguments" % argCnt
You're referring to this line:
print str(argCnt) + " arguments"
Is there a real benefit of using string formatting here? (other than
the required str() conversion)
5. Strings have a .join method, and when you combine it with string
slicing:
keyw = "+".join(sys.argv[1 : argCnt - 5])
Slick. Works a treat, and saved 2 lines of code. String handling is
another area in which python shines compared to VB.
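To make the slicing concrete, here is a small sketch using a stand-in list instead of the real sys.argv. It is written for Python 3, and the command line shown is one of the examples from the program's help text.

```python
# Stand-in for sys.argv: getAddresses.py boxing gyms Detroit MI 10 db 5
argv = ["getAddresses.py", "boxing", "gyms", "Detroit", "MI", "10", "db", "5"]
argCnt = len(argv)

# Everything between the script name and the last five arguments
# is a search word; join the words with "+".
keyw = "+".join(argv[1 : argCnt - 5])

# The five fixed arguments can be unpacked from the end of the list.
cityzip, state, miles, store, addrWant = argv[-5:]

print(keyw)
print(cityzip, state, miles, store, addrWant)
```

With a single search word the slice holds one item and join() returns it unchanged, so the one-word and many-word cases no longer need separate branches.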
6. Another example of string formatting:
search = "%s %s %s %s %s" % (keyw, cityzip, state, miles, addrWant)
Done. It's shorter, and doesn't require the str() conversion I had to
do on several of the items.
If I can remember to use it, it should eliminate these:
"TypeError: cannot concatenate 'str' and 'int' objects"
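A short sketch of the difference, written for Python 3 (where the wording of the TypeError message differs from the 2.7 one quoted above). The values are made up for the example.

```python
argCnt = 8

# Concatenating a str and an int raises TypeError unless str() is called first.
try:
    message = "arguments: " + argCnt
except TypeError:
    message = None

# The % operator converts each value for its %s placeholder.
formatted = "%s arguments" % argCnt
search = "%s %s %s %s %s" % ("boxing+gyms", "Detroit", "MI", 10, "all")

print(message)
print(formatted)
print(search)
```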
7. It's recommended to use the 'with' statement when handling files:
with open(webfile, "w") as f:
    if store == "csv":
        f.write("Name,Address,CityStateZip\n")
Done. I read that using 'with' means Python closes the file even if an
exception occurs. So a definite benefit.
If you don't want to use the 'with' statement, note that closing the
file is:
f.close()
It needs the "()"!
I used close() in 1 place, but close without parens in 2 other places.
So it works either way. Good catch.
(it's moot now: all 'f.open()/f.close()' replaced by 'with open()')
8. When using SQL, you shouldn't try to insert the values yourself; you
should use parametrised queries:
cSQL = "DELETE FROM addresses WHERE datasrc = ? AND search = ?;"
if verbose:
    print cSQL
db.execute(cSQL, (datasrc, search))
conn.commit()
It'll insert the values where the "?" are and will do any necessary
quoting itself. (Actually, some drivers use "?", others use "%s", so if
it doesn't work with one, try the other.)
The way you wrote it, it would fail if a value contained a "'". It's
that kind of thing that leads to SQL injection attacks.
Fixed.
You'll notice later on in the code I used the parameterized method for
INSERTS. I hate the look of that method, but it does make dealing with
apostrophes easier, and makes it safer as you say.
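The apostrophe case is easy to try with the standard sqlite3 module and an in-memory database. This is a Python 3 sketch with a cut-down table and an invented sample row; sqlite3 uses "?" placeholders, and other drivers may differ, as noted above.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
db = conn.cursor()
db.execute("CREATE TABLE addresses (datasrc, search, name, street)")

# Invented sample row; the name contains an apostrophe, which would
# break an SQL string built by concatenation.
row = ("www.usdirectory.com", "bar Springfield XX 10 all", "Moe's Tavern", "1 Walnut St")
db.execute("INSERT INTO addresses VALUES (?, ?, ?, ?)", row)
conn.commit()

# The same placeholders work for SELECT and DELETE.
db.execute("SELECT name FROM addresses WHERE datasrc = ? AND search = ?", row[:2])
names = [r[0] for r in db.fetchall()]

db.execute("DELETE FROM addresses WHERE datasrc = ? AND search = ?;", row[:2])
conn.commit()
db.execute("SELECT COUNT(*) FROM addresses")
remaining = db.fetchone()[0]
conn.close()

print(names, remaining)
```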
Thanks for the code review, MRAB. Good improvements.
Re: A fun python CLI program for all to enjoy!
On 5/6/2016 7:29 PM, Ethan Furman wrote:
On 05/06/2016 04:12 PM, DFS wrote:
On 5/6/2016 4:30 PM, MRAB wrote:
If you don't want to use the 'with' statement, note that closing the
file is:
f.close()
It needs the "()"!
I used close() in 1 place, but close without parens in 2 other places.
So it works either way. Good catch.
No, it doesn't. `f.close` simply returns the close function, it doesn't
call it. The "it works" was simply because Python closed the files for
you later.
Not a big deal in a small program like this, but still a mistake.
Yes.
Check out the answer by 'unutbu' here:
http://stackoverflow.com/questions/1832528/is-close-necessary-when-using-iterator-on-a-python-file-object
He says "I...checked /proc/PID/fd for when the file descriptor was
closed. It appears that when you break out of the for loop, the file is
closed for you."
Improper f.close didn't seem to affect any of the files my program
wrote - and I checked a lot of them when I was writing the code. Maybe
it worked because the last time the file was written to was in a for
loop, so I got lucky and the files weren't truncated? Don't know.
Did you notice any other gotchas in the program?
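The difference between f.close and f.close() can be checked directly with the file object's closed attribute. A small Python 3 sketch, writing to a temporary directory; the file name is made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.csv")

f = open(path, "w")
f.write("Name,Address,CityStateZip\n")
f.close                          # only looks up the method; nothing is called
closed_after_lookup = f.closed   # the file is still open here
f.close()                        # the parentheses make it an actual call
closed_after_call = f.closed

# 'with' closes the file when the block ends, even if an exception is raised.
with open(path, "w") as g:
    g.write("Name,Address,CityStateZip\n")
closed_after_with = g.closed

print(closed_after_lookup, closed_after_call, closed_after_with)
```

When the bare f.close form appears to work, it is because the interpreter closes the file later, for example when the file object is garbage collected or the program exits.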
