Re: [Tutor] Help error 504

2015-08-26 Thread Peter Otten
Danny Yoo wrote:

> Unit tests that depend on external dependencies can be "flaky": they
> might fail for reasons that you don't anticipate.  I'd recommend not
> depending on an external web site like this: it puts load on someone
> else, which they might not appreciate.

If you have a well-defined error like an exception raised in a specific 
function, by all means write a unit test that ensures that your code can 
handle it. Whether you use unittest.mock or the technique you describe below 
is a matter of taste.

Even if your ultimate goal is a comprehensive set of unit tests, a tool like 
httpbin has its merits, as it may help you find out what actually happens in 
real-life situations. 

Example: Will a server error raise an exception in urlopen() or is it 
sometimes raised in the request.read() method?

Also, mocking can give you a false sense of security. coverage.py may report
100% coverage for a function like

def process(urlopen=urllib.request.urlopen):
    result = []
    for url in [1, 2]:
        try:
            r = urlopen(url)
            result.append(r.read())
        except urllib.error.HTTPError:
            pass
    return result

which will obviously fail once you release it into the wild.
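For completeness, here is a minimal sketch of the unittest.mock technique mentioned above, applied to a function of the same shape as process() (parameterised for testing; the URL and error details are invented). Note how the mock cheerfully accepts the bogus URLs 1 and 2 — that is exactly the false sense of security described: coverage looks complete, but the real urlopen() would choke.

```python
import urllib.error
from unittest import mock

def process(urlopen):
    # same shape as the example above, parameterised for testing
    result = []
    for url in [1, 2]:
        try:
            r = urlopen(url)
            result.append(r.read())
        except urllib.error.HTTPError:
            pass
    return result

# a fake urlopen that always raises, the way a 504 might
fake_urlopen = mock.Mock(side_effect=urllib.error.HTTPError(
    "http://example.com", 504, "Gateway Timeout", {}, None))

print(process(fake_urlopen))  # errors are swallowed: []
```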

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


[Tutor] value range checker

2015-08-26 Thread Albert-Jan Roskam
Hello,


I have written a function that checks the validity of values. The ranges of 
valid values are stored in a database table.

Such a table contains three columns: category, min and max. One record of such 
a table specifies the range for a certain category, but a category may be 
spread out over multiple records.


For example, the category-min-max tuples

("cat a", 1, 1), 
("cat a", 3, 3), 
("cat a", 6, 10),

correspond to a range for category A of 1-1, 3-3, 6-10, which is the same as 
the values 1, 3, 6, 7, 8, 9, 10.


The code below does exactly what I want:


import collections
import bisect
import math
import pprint


def get_valid_value_lookup(records):
    """
    Translates value range information (from a database table)
    into a dictionary of the form {<category>: [<valid values>]}
    """
    Boundaries = collections.namedtuple("Boundaries", "category min max")
    records = [Boundaries(*record) for record in records]
    boundaries = collections.defaultdict(list)
    crap = [boundaries[record.category].__iadd__(range(record.min, record.max + 1))
            for record in records]
    return dict(boundaries)


def is_valid(lookup, category, value):
    """Return True if value is member of a list of a given category,
    False otherwise."""
    try:
        return value in lookup[category]
    except KeyError:
        raise KeyError("Invalid category: %r" % category)


def is_valid2(lookup, category, value):
    """Return True if value is member of a list of a given category,
    False otherwise."""
    # this version also knows how to deal with floats.
    try:
        L = lookup[category]
    except KeyError:
        raise KeyError("Invalid category: %r" % category)

    adjusted_value = value if int(value) in (L[0], 0, L[-1]) else math.ceil(value)
    try:
        chopfunc = bisect.bisect_right if value < L[0] else bisect.bisect_left
        return L[chopfunc(L, value)] == adjusted_value
    except IndexError:
        return False


if __name__ == '__main__':
    L = [("cat a", 1, 1), ("cat a", 3, 3), ("cat a", 6, 10),
         ("cat b", 1, 9), ("cat c", 1, 2), ("cat c", 5, 9)]
    lookup = get_valid_value_lookup(L)
    assert not is_valid(lookup, "cat a", 999)  # value 999 is invalid for "cat a"
    assert is_valid(lookup, "cat a", 10)
    assert not is_valid2(lookup, "cat a", 0.1)
    assert not is_valid2(lookup, "cat a", -1)
    assert is_valid2(lookup, "cat a", 6.1)

    L2 = [(1, -5, 1), (1, 3, 3), (1, 6, 10), (2, 1, 9), (3, 1, 2), (3, 5, 9)]
    lookup = get_valid_value_lookup(L2)
    assert is_valid2(lookup, 1, -4.99)
    assert is_valid2(lookup, 1, -5)


My questions:

[1] @ is_valid: is there a better way to do this? I mostly don't like the use 
of the __iadd__ dunder method. 
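(For what it's worth, one way to drop the __iadd__ trick is a plain loop with list.extend() — a sketch that should build the same lookup as the function above:)

```python
import collections

def get_valid_value_lookup(records):
    """Build {category: [valid values]} without abusing __iadd__."""
    boundaries = collections.defaultdict(list)
    for category, low, high in records:
        boundaries[category].extend(range(low, high + 1))
    return dict(boundaries)

lookup = get_valid_value_lookup(
    [("cat a", 1, 1), ("cat a", 3, 3), ("cat a", 6, 10)])
print(lookup)  # {'cat a': [1, 3, 6, 7, 8, 9, 10]}
```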

[2] @ is_valid2: Perhaps an answer to my previous question. Is this a better 
approach? 
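(An alternative sketch that avoids materializing the value lists entirely: keep the (min, max) pairs per category sorted and bisect into them. Like is_valid2, it treats a float inside a range, e.g. 6.1 in 6-10, as valid; the helper names here are invented:)

```python
import bisect

def build_spans(records):
    """Map each category to a sorted list of (min, max) pairs."""
    spans = {}
    for category, low, high in records:
        spans.setdefault(category, []).append((low, high))
    for pairs in spans.values():
        pairs.sort()
    return spans

def in_range(spans, category, value):
    try:
        pairs = spans[category]
    except KeyError:
        raise KeyError("Invalid category: %r" % category)
    # rightmost pair whose lower bound is <= value
    i = bisect.bisect_right(pairs, (value, float("inf"))) - 1
    return i >= 0 and pairs[i][0] <= value <= pairs[i][1]

spans = build_spans([("cat a", 1, 1), ("cat a", 3, 3), ("cat a", 6, 10)])
print(in_range(spans, "cat a", 6.1))   # True
print(in_range(spans, "cat a", 999))   # False
```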

[3] I am inheriting this system. It feels a bit strange that these range 
check values are stored in a database. 

Would yaml be a better choice? Some of the tables are close to 200 records.


Thank you in advance!


Albert-Jan


Re: [Tutor] value range checker

2015-08-26 Thread Alan Gauld

On 26/08/15 14:19, Albert-Jan Roskam wrote:


> I have written a function that checks the validity of values.
> The ranges of valid values are stored in a database table.

That's an unusual choice because:

1) using a database normally only makes sense in the case
where you are already using the database to store the
other data. But in that case you would normally get
validation done using a database constraint.

2) For small amounts of data the database introduces
a significant overhead. Databases are good for handling
large amounts of data.

3) A database is rather inflexible since you need to
initialise it, create it, etc. Which limits the number
of environments where it can be used.


> Such a table contains three columns: category, min and max. ...
> a category may be spread out over multiple records.


And searching multiple rows is even less efficient.


> Would yaml be a better choice? Some of the tables are close to 200 records.


Mostly I wouldn't use a data format per se (except for
persistence between sessions). I'd load the limits into
a Python set and let the validation be a simple member-of check.
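(As a sketch — the limits here are invented for illustration — the member-of check is about as small as validation gets:)

```python
# hypothetical limits, loaded once from wherever they persist
valid_codes = {1, 3} | set(range(6, 11))   # i.e. 1, 3, 6-10

def is_valid(value):
    return value in valid_codes

print(is_valid(7), is_valid(2))  # True False
```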

Unless you are dealing with large ranges rather than sets
of small ranges. Even with complex options I'd still
opt for a two tier data structure. But mostly I'd query
any design that requires a lot of standalone data validation.
(Unless its function is to be a bulk data loader or similar.)
I'd probably be looking to having the data stored as
objects that did their own validation at creation/modification
time.

If I was doing a bulk data loader/checker I'd probably create
a validation function for each category and add it to a
dictionary. So I'd write a make_validator() function that
took the validation data and created a specific validator
function for that category. Very simple example:

def make_validator(min, max, *values):
    def validate(value):
        return (min <= value <= max) or value in values
    return validate
...
for category in categories:
    lookup[category] = make_validator(min, max, *valueList)
...
if lookup[category](my_value):
    # process valid value
else:
    raise ValueError
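Filled out into runnable form (category names and limits invented for illustration), the closure approach looks like:

```python
def make_validator(low, high, *values):
    """Build a validator closing over one category's limits."""
    def validate(value):
        return low <= value <= high or value in values
    return validate

# one validator per category, keyed for dispatch
lookup = {
    "cat a": make_validator(6, 10, 1, 3),   # range 6-10 plus the odd values 1 and 3
    "cat b": make_validator(1, 9),
}

print(lookup["cat a"](7))  # True
print(lookup["cat a"](2))  # False
```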

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




[Tutor] How should my code handle db connections? Should my db manager module use OOP?

2015-08-26 Thread boB Stepp
My ongoing project will be centered around an SQLite db.  Since almost
all data needed by the program will be stored in this db, my thought
is that I should create a connection to this db shortly after program
startup and keep this connection open until program closure.  I am
assuming that opening and closing a db connection has enough overhead
that I should only do this once.  But I do not *know* that this is
true.  Is it?  If not, then the alternative would make more sense,
i.e., open and close the db as needed.

In the first iteration of my project, my intent is to create and
populate the db with tables external to the program.  The program will
only add entries to tables, query the db, etc.  That is, the structure
of the db will be pre-set outside of the program, and the program will
only deal with data interactions with the db.  My intent is to make
the overall design of the program OO, but I am wondering how to handle
the db manager module.  Should I go OO here as well?  With each
pertinent method handling a very specific means of interacting with
the db?  Or go a procedural route with functions similar to the
aforementioned methods?  It is not clear to me that OOP provides a
real benefit here, but, then again, I am learning how to OOP during
this project as well, so I don't have enough knowledge yet to
realistically answer this question.

TIA!
-- 
boB


Re: [Tutor] How should my code handle db connections? Should my db manager module use OOP?

2015-08-26 Thread Steven D'Aprano
On Wed, Aug 26, 2015 at 07:11:42PM -0500, boB Stepp wrote:

> My ongoing project will be centered around an SQLite db.  Since almost
> all data needed by the program will be stored in this db, my thought
> is that I should create a connection to this db shortly after program
> startup and keep this connection open until program closure.

If you do this, you will (I believe) hit at least three problems:

- Now only one program can access the DB at a time. Until the first 
  program closes, nobody else can open it.

- Your database itself is vulnerable to corruption. SQLite is an easy to 
  use database, but it doesn't entirely meet the ACID requirements of a 
  real DB.

- If your database lives on a NTFS partition, which is very common for
  Linux/Unix users, then if your program dies, the database will very 
  likely be broken.

I don't have enough experience with SQLite directly to be absolutely 
sure of these things, but Firefox uses SQLite for a bunch of things that 
(in my opinion) don't need to be in a database, and it suffers from 
these issues, especially on Linux when using NTFS. For example, if 
Firefox dies, when you restart you may lose all your bookmarks, history, 
and most bizarrely of all, the back button stops working.


-- 
Steve


Re: [Tutor] How should my code handle db connections? Should my db manager module use OOP?

2015-08-26 Thread Zachary Ware
On Aug 26, 2015 9:03 PM, "Steven D'Aprano"  wrote:
> - If your database lives on a NTFS partition, which is very common for
>   Linux/Unix users

> these issues, especially on Linux when using NTFS.

Surely you mean NFS, as in Network FileSystem, rather than NTFS as in New
Technology FileSystem? :)

--
Zach
(On a phone)


Re: [Tutor] How should my code handle db connections? Should my db manager module use OOP?

2015-08-26 Thread Martin A. Brown


Hi there,


> My ongoing project will be centered around an SQLite db.


Not a bad way to start.  There are many possible ways to access SQL 
DBs.  I'll talk about one of my favorites, since I'm a big fan of 
sqlalchemy [0], which provides a broad, useful toolkit and 
abstraction layer for dealing with SQL DBs.


To start, often the question is: why use any such abstraction tool, 
given the additional complexity of a module, a.k.a. another layer of code?


Briefly, my main two reasons:

  A) abstraction of data model from SQL implementation for the
     Python program (allows switching from SQLite to another DBAPI,
     e.g. postgres, later with minimum effort)
  B) somebody has already implemented the tricky bits, such as ORMs
     (see below), failover, connection pooling (see below) and
     other DB-specific features

> Since almost all data needed by the program will be stored in this 
> db, my thought is that I should create a connection to this db 
> shortly after program startup and keep this connection open until 
> program closure.


That is one possible approach.  But, consider using a "connection 
pooling" technique that somebody else has already implemented and 
tested.  This saves your time for working on the logic of your 
program.


There are many different pooling strategies, which include things 
like "Use only one connection at a time." or "Connect on demand." or 
"Hold a bunch of connections open and let me use one when I need 
one, and I'll release it when I'm done." and even "When the 
connection fails, retry quietly in the background until a successful 
connection can be re-established."
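A toy version of the "hold a bunch open" strategy can be sketched with the standard library alone (class name invented; real pools, e.g. sqlalchemy's, also handle timeouts, broken connections and reconnects):

```python
import queue
import sqlite3

class ToyPool:
    """Keep a few connections open; hand one out on request."""
    def __init__(self, path, size=3):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(sqlite3.connect(path, check_same_thread=False))

    def acquire(self):
        return self._idle.get()   # blocks if all connections are in use

    def release(self, conn):
        self._idle.put(conn)

pool = ToyPool(":memory:")
conn = pool.acquire()
print(conn.execute("SELECT 1").fetchone())  # (1,)
pool.release(conn)
```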


> I am assuming that opening and closing a db connection has enough 
> overhead that I should only do this once.  But I do not *know* 
> that this is true.  Is it?  If not, then the alternative would 
> make more sense, i.e., open and close the db as needed.


Measure, measure, measure.  Profile it before coming to such a 
conclusion.  You may be correct, but it behooves you to measure. 
(My take on an old computing adage: premature optimization can lead 
you down unnecessarily painful or time-consuming paths.)
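In that spirit, a throwaway measurement sketch (the file name is invented) comparing open-per-query against a reused connection:

```python
import os
import sqlite3
import tempfile
import timeit

# throwaway database file, just for the measurement
path = os.path.join(tempfile.mkdtemp(), "bench.db")
sqlite3.connect(path).close()  # create the file

def open_query_close():
    conn = sqlite3.connect(path)
    conn.execute("SELECT 1")
    conn.close()

per_open = timeit.timeit(open_query_close, number=200) / 200

conn = sqlite3.connect(path)
per_reuse = timeit.timeit(lambda: conn.execute("SELECT 1"), number=200) / 200
conn.close()

print("open/close per query: %.6fs" % per_open)
print("reused connection:    %.6fs" % per_reuse)
```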


N.B.  Only you (or your development cohort) can anticipate the load 
on the DB, the growth of records (i.e. data set size), the growth of 
the complexity of the project, or the user count.  So, even if the 
measurements tell you one thing, be sure to consider the longer-term 
plan for the data and application.


Also, see Steven D'Aprano's comments about concurrency and other 
ACIDic concerns.


> In the first iteration of my project, my intent is to create and 
> populate the db with tables external to the program.  The program 
> will only add entries to tables, query the db, etc.  That is, the 
> structure of the db will be pre-set outside of the program, and 
> the program will only deal with data interactions with the db.


If the structure of the DB is determined outside the program, 
this sounds like a great reason to use an Object Relational 
Modeler (ORM).  An ORM which supports reflection (sqlalchemy 
does) can create Pythonic objects for you.


> My intent is to make the overall design of the program OO, but I 
> am wondering how to handle the db manager module.  Should I go OO 
> here as well?  With each pertinent method handling a very specific 
> means of interacting with the db?  Or go a procedural route with 
> functions similar to the aforementioned methods?  It is not clear 
> to me that OOP provides a real benefit here, but, then again, I am 
> learning how to OOP during this project as well, so I don't have 
> enough knowledge yet to realistically answer this question.


I'm not sure I can weigh in intelligently here (OOP v. procedural), 
but I'd guess that you could get that Object-Oriented feel by taking 
advantage of an ORM, rather than writing one yourself.  Getting used 
to the idea of an ORM can be tricky, but if you can get reflection 
working [1], I think you will be surprised at how quickly your 
application logic (at the business layer) comes together and you can 
(mostly) stop worrying about things like connection logic and SQL 
statements executing from your Python program [2].


There probably are a few people on this list who have used 
sqlalchemy and are competent to answer it, but if you have questions 
specifically about sqlalchemy, you might find better answers on 
their mailing list [3].


Now, back to the beginnings...a SQLite DB is a fine place to start 
if you have only one thread/user/program accessing the data at any 
time.  Don't host it on a network(ed) file system if you have the 
choice.  If your application grows so much in usage or volume that 
it needs a new and different DB, consider it all a success and 
migrate accordingly.


Best of luck,

-Martin

 [0] http://www.sqlalchemy.org/
 [1] http://docs.sqlalchemy.org/en/rel_1_0/core/reflection.html
 [2] Here, naturally, I'm assuming that you know your way around
 SQL, since you are asserting that the DB already exists, is
 mai