Re: Coding help
On 2/23/22 17:02, Richard Pullin via Python-list wrote:
> I know next to nothing about computer coding nor Python. However, I am
> working on a mathematical challenge in which coding is required to
> calculate and generate different potential solutions. Can anyone help? If
> so, please private message me and we can discuss in more detail.
>
> Many thanks,
> Richard

Sure. What do you have in mind?
--
https://mail.python.org/mailman/listinfo/python-list
Threading question .. am I doing this right?
I have a multi-threaded application (a web service) where several threads need
data from an external database. That data is quite a lot, but it is almost
always the same. Between incoming requests, timestamped records get added to
the DB.

So I decided to keep an in-memory cache of the DB records that gets only
"topped up" with the most recent records on each request:

    from threading import Lock, Thread

    class MyCache():
        def __init__(self):
            self.cache = None
            self.cache_lock = Lock()

        def _update(self):
            new_records = query_external_database()
            if self.cache is None:
                self.cache = new_records
            else:
                self.cache.extend(new_records)

        def get_data(self):
            with self.cache_lock:
                self._update()
            return self.cache

    my_cache = MyCache()  # module level

This works, but even those "small" queries can sometimes hang for a long time,
causing incoming requests to pile up at the "with self.cache_lock" block.

Since it is better to quickly serve the client with slightly outdated data than
not at all, I came up with the "impatient" solution below. The idea is that an
incoming request triggers an update query in another thread, waits for a short
timeout for that thread to finish, and then returns either updated or old data.

    class MyCache():
        def __init__(self):
            self.cache = None
            self.thread_lock = Lock()
            self.update_thread = None

        def _update(self):
            new_records = query_external_database()
            if self.cache is None:
                self.cache = new_records
            else:
                self.cache.extend(new_records)

        def get_data(self):
            if self.cache is None:
                timeout = 10  # allow more time to get initial batch of data
            else:
                timeout = 0.5
            with self.thread_lock:
                if self.update_thread is None or not self.update_thread.is_alive():
                    self.update_thread = Thread(target=self._update)
                    self.update_thread.start()
                self.update_thread.join(timeout)
            return self.cache

    my_cache = MyCache()

My question is: Is this a solid approach? Am I forgetting something? For
instance, I believe that I don't need another lock to guard self.cache.extend()
because _update() can only ever run in one thread at a time. But maybe I'm
overlooking something.
--
https://mail.python.org/mailman/listinfo/python-list
Extract a specific range of data from netCDF files using Python
Hi. I am learning Python and I am working with some netCDF files. Suppose I
have temperature data from 1950-2020 and I want data for only 1960-2015. How
should I extract it?
--
https://mail.python.org/mailman/listinfo/python-list
Re: Threading question .. am I doing this right?
On Fri, 25 Feb 2022 at 06:54, Robert Latest via Python-list wrote:
>
> I have a multi-threaded application (a web service) where several threads
> need data from an external database. That data is quite a lot, but it is
> almost always the same. Between incoming requests, timestamped records get
> added to the DB.
>
> So I decided to keep an in-memory cache of the DB records that gets only
> "topped up" with the most recent records on each request:

Depending on your database, this might be counter-productive. A PostgreSQL
database running on localhost, for instance, has its own caching, and data
transfers between two apps running on the same computer can be pretty fast.
The complexity you add in order to do your own caching might be giving you
negligible benefit, or even a penalty. I would strongly recommend
benchmarking the naive "keep going back to the database" approach first, as
a baseline, and only testing these alternatives when you've confirmed that
the database really is a bottleneck.

> Since it is better to quickly serve the client with slightly outdated data
> than not at all, I came up with the "impatient" solution below. The idea is
> that an incoming request triggers an update query in another thread, waits
> for a short timeout for that thread to finish, and then returns either
> updated or old data.
>
>     class MyCache():
>         def __init__(self):
>             self.cache = None
>             self.thread_lock = Lock()
>             self.update_thread = None
>
>         def _update(self):
>             new_records = query_external_database()
>             if self.cache is None:
>                 self.cache = new_records
>             else:
>                 self.cache.extend(new_records)
>
>         def get_data(self):
>             if self.cache is None:
>                 timeout = 10  # allow more time to get initial batch of data
>             else:
>                 timeout = 0.5
>             with self.thread_lock:
>                 if self.update_thread is None or not self.update_thread.is_alive():
>                     self.update_thread = Thread(target=self._update)
>                     self.update_thread.start()
>                 self.update_thread.join(timeout)
>             return self.cache
>
>     my_cache = MyCache()
>
> My question is: Is this a solid approach? Am I forgetting something? For
> instance, I believe that I don't need another lock to guard
> self.cache.extend() because _update() can only ever run in one thread at a
> time. But maybe I'm overlooking something.

Hmm, it's complicated. There is another approach, and that's to completely
invert your thinking: instead of "request wants data, so let's get data",
have a thread that periodically updates your cache from the database, and
then all requests return from the cache, without the request ever having to
touch the database. Downside: It'll be querying the database fairly
frequently. Upside: Very simple, very easy, no difficulties debugging.

How many requests per second does your service process? (By "requests", I
mean things that require this particular database lookup.) What's average
throughput, what's peak throughput? And importantly, what sorts of idle
times do you have? For instance, if you might have to handle 100
requests/second, but there could be hours-long periods with no requests at
all (eg if your clients are all in the same timezone and don't operate at
night), that's a very different workload from 10 r/s constantly throughout
the day.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
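[A minimal sketch of the periodic-refresh idea described in the message above,
not code from the thread itself: it assumes a fixed refresh interval and uses a
stub in place of the poster's query_external_database() helper.]

    import threading
    import time

    def query_external_database():
        # Placeholder for the poster's real query; assumed to return only
        # the records added since the last call.
        return []

    class PeriodicCache:
        def __init__(self, interval=30):
            self.interval = interval            # seconds between refreshes (assumed value)
            self.cache = []
            self.cache_lock = threading.Lock()
            # Daemon thread so the refresher never blocks interpreter shutdown.
            threading.Thread(target=self._refresh_loop, daemon=True).start()

        def _refresh_loop(self):
            while True:
                new_records = query_external_database()
                with self.cache_lock:
                    self.cache.extend(new_records)
                time.sleep(self.interval)

        def get_data(self):
            # Requests only ever read the cache; they never wait on the database.
            with self.cache_lock:
                return list(self.cache)

    my_cache = PeriodicCache()

Returning a copy under the lock keeps callers from seeing the list mutate while
they iterate; whether that copy is affordable depends on how large the cached
record set gets.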
Re: Extract a specific range of data from netCDF files using Python
> Hi. I am learning Python and I am working with some netCDF files. Suppose I
> have temperature data from 1950-2020 and I want data for only 1960-2015.
> How should I extract it?

Alternatively, use https://unidata.github.io/netcdf4-python/ or gdal. It might
also be possible to read
https://stackoverflow.com/questions/14035148/import-netcdf-file-to-pandas-dataframe
and use pandas dataframes to organize your data, if desired.

Good luck!

- DLD
--
https://mail.python.org/mailman/listinfo/python-list
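[A minimal sketch of the year-range extraction with the netcdf4-python package
linked above, assuming the file has a CF-style "time" variable with a units
attribute and a "temperature" variable whose first dimension is time; the file
and variable names here are made up.]

    import numpy as np
    from netCDF4 import Dataset, num2date

    ds = Dataset("temperature_1950_2020.nc")      # hypothetical file name
    time_var = ds.variables["time"]
    dates = num2date(time_var[:], units=time_var.units,
                     calendar=getattr(time_var, "calendar", "standard"))

    # Indices of all timesteps whose year falls in 1960-2015 (inclusive).
    years = np.array([d.year for d in dates])
    idx = np.where((years >= 1960) & (years <= 2015))[0]

    # Assuming the time axis is stored in order, a plain slice covers the range.
    subset = ds.variables["temperature"][idx[0]:idx[-1] + 1, ...]
    print(subset.shape)
    ds.close()

If labeled indexing appeals, xarray can open the same file and select the range
with ds.sel(time=slice("1960-01-01", "2015-12-31")), though that is a separate
dependency.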
