Re: Coding help

2022-02-24 Thread Jack Dangler



On 2/23/22 17:02, Richard Pullin via Python-list wrote:

I know next to nothing about computer coding or Python.

However, I am working on a mathematical challenge in which coding is
required to calculate and generate different potential solutions.

Can anyone help? If so, please private message me and we can discuss in
more detail.

Many thanks,
Richard


Sure

What do you have in mind?

--
https://mail.python.org/mailman/listinfo/python-list


Threading question .. am I doing this right?

2022-02-24 Thread Robert Latest via Python-list
I have a multi-threaded application (a web service) where several threads need
data from an external database. That data is quite a lot, but it is almost
always the same. Between incoming requests, timestamped records get added to
the DB.

So I decided to keep an in-memory cache of the DB records that gets only
"topped up" with the most recent records on each request:


from threading import Lock, Thread


class MyCache():
    def __init__(self):
        self.cache = None
        self.cache_lock = Lock()

    def _update(self):
        new_records = query_external_database()
        if self.cache is None:
            self.cache = new_records
        else:
            self.cache.extend(new_records)

    def get_data(self):
        with self.cache_lock:
            self._update()
        return self.cache

my_cache = MyCache()  # module level


This works, but even those "small" queries can sometimes hang for a long time,
causing incoming requests to pile up at the "with self.cache_lock" block.

Since it is better to quickly serve the client with slightly outdated data than
not at all, I came up with the "impatient" solution below. The idea is that an
incoming request triggers an update query in another thread, waits for a short
timeout for that thread to finish and then returns either updated or old data.

class MyCache():
    def __init__(self):
        self.cache = None
        self.thread_lock = Lock()
        self.update_thread = None

    def _update(self):
        new_records = query_external_database()
        if self.cache is None:
            self.cache = new_records
        else:
            self.cache.extend(new_records)

    def get_data(self):
        if self.cache is None:
            timeout = 10  # allow more time to get initial batch of data
        else:
            timeout = 0.5
        with self.thread_lock:
            if self.update_thread is None or not self.update_thread.is_alive():
                self.update_thread = Thread(target=self._update)
                self.update_thread.start()
            self.update_thread.join(timeout)
        return self.cache

my_cache = MyCache()

My question is: Is this a solid approach? Am I forgetting something? For
instance, I believe that I don't need another lock to guard self.cache.extend()
because _update() can only ever run in one thread at a time. But maybe I'm
overlooking something.



Extract a specific range of data from netCDF files using Python

2022-02-24 Thread Smital Fulzele
Hi. I am learning Python and working with some netCDF files. Suppose I have
temperature data from 1950-2020 and I want data for only 1960-2015. How should
I extract it?


Re: Threading question .. am I doing this right?

2022-02-24 Thread Chris Angelico
On Fri, 25 Feb 2022 at 06:54, Robert Latest via Python-list wrote:
>
> I have a multi-threaded application (a web service) where several threads need
> data from an external database. That data is quite a lot, but it is almost
> always the same. Between incoming requests, timestamped records get added to
> the DB.
>
> So I decided to keep an in-memory cache of the DB records that gets only
> "topped up" with the most recent records on each request:

Depending on your database, this might be counter-productive. A
PostgreSQL database running on localhost, for instance, has its own
caching, and data transfers between two apps running on the same
computer can be pretty fast. The complexity you add in order to do
your own caching might be giving you negligible benefit, or even a
penalty. I would strongly recommend benchmarking the naive "keep going
back to the database" approach first, as a baseline, and only testing
these alternatives when you've confirmed that the database really is a
bottleneck.
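That baseline can be measured with a few lines of code. A rough sketch, where query_external_database is a stand-in that merely sleeps ~5 ms to mimic a database round trip:

```python
import time

def query_external_database():
    # Stand-in for the real query; pretend each round trip costs ~5 ms.
    time.sleep(0.005)
    return list(range(100))

def per_call_seconds(fn, n=50):
    """Average wall-clock time of n calls to fn."""
    t0 = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - t0) / n

direct = per_call_seconds(query_external_database)

cache = query_external_database()        # fetch once, then reuse
cached = per_call_seconds(lambda: cache)

print(f"direct: {direct * 1000:.2f} ms/call, cached: {cached * 1000:.4f} ms/call")
```

Only if the measured gap is large enough to matter under real load does the extra caching machinery pay for itself.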

> Since it is better to quickly serve the client with slightly outdated data than
> not at all, I came up with the "impatient" solution below. The idea is that an
> incoming request triggers an update query in another thread, waits for a short
> timeout for that thread to finish and then returns either updated or old data.
>
> class MyCache():
>     def __init__(self):
>         self.cache = None
>         self.thread_lock = Lock()
>         self.update_thread = None
>
>     def _update(self):
>         new_records = query_external_database()
>         if self.cache is None:
>             self.cache = new_records
>         else:
>             self.cache.extend(new_records)
>
>     def get_data(self):
>         if self.cache is None:
>             timeout = 10  # allow more time to get initial batch of data
>         else:
>             timeout = 0.5
>         with self.thread_lock:
>             if self.update_thread is None or not self.update_thread.is_alive():
>                 self.update_thread = Thread(target=self._update)
>                 self.update_thread.start()
>             self.update_thread.join(timeout)
>
>         return self.cache
>
> my_cache = MyCache()
>
> My question is: Is this a solid approach? Am I forgetting something? For
> instance, I believe that I don't need another lock to guard self.cache.extend()
> because _update() can only ever run in one thread at a time. But maybe I'm
> overlooking something.

Hmm, it's complicated. There is another approach, and that's to
completely invert your thinking: instead of "request wants data, so
let's get data", have a thread that periodically updates your cache
from the database; all requests are then served straight from the
cache and never touch the database themselves. Downside: it'll be
querying fairly frequently. Upside: very simple, very easy, no
difficulties debugging.

How many requests per second does your service process? (By
"requests", I mean things that require this particular database
lookup.) What's average throughput, what's peak throughput? And
importantly, what sorts of idle times do you have? For instance, if
you might have to handle 100 requests/second, but there could be
hours-long periods with no requests at all (eg if your clients are all
in the same timezone and don't operate at night), that's a very
different workload from 10 r/s constantly throughout the day.

ChrisA


Re: Extract a specific range of data from netCDF files using Python

2022-02-24 Thread David Lowry-Duda
> Hi. I am learning Python and working with some netCDF files.
> Suppose I have temperature data from 1950-2020 and I want data for
> only 1960-2015. How should I extract it?

Alternatively, use https://unidata.github.io/netcdf4-python/ or gdal.

It might also be worth reading
https://stackoverflow.com/questions/14035148/import-netcdf-file-to-pandas-dataframe
and using pandas DataFrames to organize your data, if desired.
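If the data holds one value per year, selecting 1960-2015 comes down to index arithmetic once the variable is loaded. A plain-Python sketch of that arithmetic; the netCDF4 calls are shown only as comments, and the file and variable names there are assumptions:

```python
# With a real file you would load the variable via the netCDF4 package, e.g.:
#   from netCDF4 import Dataset
#   ds = Dataset("temps.nc")                 # hypothetical file name
#   temps = ds.variables["temperature"][:]   # variable name is an assumption
# Here we fabricate a yearly series instead, one value per year, 1950-2020:
first_year = 1950
temps = [15.0 + 0.01 * i for i in range(2020 - first_year + 1)]  # dummy data

start, end = 1960, 2015
subset = temps[start - first_year : end - first_year + 1]

print(len(subset))  # 56 values: 1960 through 2015 inclusive
```

Real netCDF files often store time as an offset coordinate instead of plain years, in which case netCDF4's date handling (or pandas/xarray label-based selection) is the safer route than manual index math.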

Good luck!

- DLD