best way to write large data-streams quickly?
Everyone,

We are trying to architect a new system which will have to take several large data streams (a total of ~200,000 parsed files per second) and place them in a database. I am trying to figure out the best way to import that sort of data into Postgres.

I keep thinking I can't be the first to have this problem, and that there are common solutions, but I can't find any. Does anyone know of some method, third-party program, etc., that can accept data from a number of different sources and push it into Postgres as fast as possible?

Thanks in advance,

Mark Moellering
Re: best way to write large data-streams quickly?
On Mon, Apr 9, 2018 at 12:01 PM, Steve Atkins wrote:

> On Apr 9, 2018, at 8:49 AM, Mark Moellering wrote:
>
>> Everyone,
>>
>> We are trying to architect a new system which will have to take several large data streams (a total of ~200,000 parsed files per second) and place them in a database. I am trying to figure out the best way to import that sort of data into Postgres.
>>
>> I keep thinking I can't be the first to have this problem, and that there are common solutions, but I can't find any. Does anyone know of some method, third-party program, etc., that can accept data from a number of different sources and push it into Postgres as fast as possible?
>
> Take a look at http://ossc-db.github.io/pg_bulkload/index.html. Check the benchmarks for different situations compared to COPY.
>
> Depending on what you're doing, using custom code to parse your data and then doing multiple binary COPYs in parallel may be better.
>
> Cheers,
> Steve

(fighting Google slightly to keep from top-posting...)

Thanks! How long can you run COPY? I have been looking at it more closely. In some ways, it would be simple just to take data from stdin and send it to Postgres, but can I do that literally 24/7? I am monitoring data feeds that will never stop, and I don't know if that is how COPY is meant to be used, or if I have to let it finish and start another one at some point.

Thanks for everyone's help and input!

Mark Moellering
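For what it's worth, a common pattern for never-ending feeds is not to hold a single COPY open forever, but to batch the stream and issue a fresh COPY per batch. Below is a minimal sketch in Python with psycopg2; the events(ts, payload) table, the DSN, and the batch size are all made-up placeholders, and the real feed would replace sys.stdin:

    import io
    import sys
    import psycopg2

    BATCH_SIZE = 10_000  # arbitrary; flush after this many rows

    conn = psycopg2.connect("dbname=streams")  # placeholder DSN

    def flush(rows):
        """Send one batch with a single COPY, then commit."""
        if not rows:
            return
        buf = io.StringIO("".join(rows))  # tab-separated lines, COPY text format
        with conn.cursor() as cur:
            cur.copy_expert("COPY events (ts, payload) FROM STDIN", buf)
        conn.commit()

    rows = []
    for line in sys.stdin:  # stand-in for the actual data feed
        rows.append(line)
        if len(rows) >= BATCH_SIZE:
            flush(rows)
            rows = []
    flush(rows)  # final partial batch

Each COPY is its own transaction, so a crash loses at most one in-flight batch, and you can run several such loops in parallel (one per feed), along the lines of the multiple-parallel-COPYs suggestion above.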
db-connections (application architecture)
So, I am working on some system designs for a web application, and I wonder if there is any definitive answer on how best to connect to a Postgres database.

I could have it so that each time a query, or set of queries, for a particular request needs to be run, a new connection is opened, the queries are run, and then the connection is closed/dropped.

OR, I could create a persistent connection that remains open as long as a user is logged in, and run any queries against that open connection.

I can see how, for only a few (hundreds to thousands of) users, the latter might make more sense, but if I need to scale up to millions, I might not want all of those connections open.

Any idea how much time/overhead is added by opening and closing a connection every time?

Any and all information is welcome.

Thanks in advance

-- Mark M
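One way to put a number on the open/close overhead for your own setup is simply to time it. A rough sketch with psycopg2; the DSN is a placeholder (note that each PostgreSQL connection forks a server backend, so the cost is typically milliseconds, not microseconds):

    import time
    import psycopg2

    DSN = "dbname=app"  # placeholder; use your real connection string
    N = 100

    start = time.perf_counter()
    for _ in range(N):
        conn = psycopg2.connect(DSN)  # full connection handshake
        with conn.cursor() as cur:
            cur.execute("SELECT 1")   # trivial round-trip query
            cur.fetchone()
        conn.close()                  # teardown
    elapsed = time.perf_counter() - start
    print(f"{elapsed / N * 1000:.1f} ms per connect/query/close cycle")

Comparing that figure against the cost of the queries themselves shows how much of each request would be spent just on connection setup.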
Re: db-connections (application architecture)
Oh, excellent. I knew I was about to reinvent the wheel. Sometimes there are just too many new things to keep up on. Thank you so much!

On Thu, Nov 15, 2018 at 10:16 AM Adrian Klaver wrote:

> On 11/15/18 7:09 AM, Mark Moellering wrote:
>> So, I am working on some system designs for a web application, and I wonder if there is any definitive answer on how best to connect to a Postgres database.
>>
>> I could have it so that each time a query, or set of queries, for a particular request needs to be run, a new connection is opened, the queries are run, and then the connection is closed/dropped.
>>
>> OR, I could create a persistent connection that remains open as long as a user is logged in, and run any queries against that open connection.
>>
>> I can see how, for only a few (hundreds to thousands of) users, the latter might make more sense, but if I need to scale up to millions, I might not want all of those connections open.
>>
>> Any idea how much time/overhead is added by opening and closing a connection every time?
>>
>> Any and all information is welcome.
>
> Connection pooling?
>
> In no particular order:
>
> https://pgbouncer.github.io/
>
> http://www.pgpool.net/mediawiki/index.php/Main_Page
>
>> Thanks in advance
>>
>> -- Mark M
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
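Besides a standalone pooler like PgBouncer or Pgpool-II, pooling can also live inside the application. A minimal sketch using psycopg2's built-in thread-safe pool; the pool sizes and DSN are placeholders to tune for your worker count:

    from psycopg2.pool import ThreadedConnectionPool

    # placeholder sizes and DSN
    pool = ThreadedConnectionPool(minconn=2, maxconn=20, dsn="dbname=app")

    def handle_request():
        conn = pool.getconn()      # borrow an already-open connection
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT now()")
                return cur.fetchone()
        finally:
            pool.putconn(conn)     # return it to the pool instead of closing

This keeps a small, fixed set of server connections open while many requests share them, which is the same idea PgBouncer applies at the network level for clients that can't pool themselves.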
Re: Where **not** to use PostgreSQL?
I wish more people would ask this question; to me, it is the true mark of experience.

In general, I think of PostgreSQL as the leading relational database. The farther you get away from relational data and relational queries, the more I would say you should look for other products or solutions. But if you want to store relational data and then run queries over it, stick with PostgreSQL.

My 2 cents...

Mark

On Thu, Feb 28, 2019 at 8:28 AM Nicolas Grilly wrote:

> On Thu, Feb 28, 2019 at 2:12 PM Chris Travers wrote:
>
>> Where I did this on the TB scale, we had some sort of ranking, but it was not based on ts_rank.
>>
>> On the PB-scale systems I work on now, it is distributed, and we don't order in PostgreSQL (or anywhere else, though if someone wants to write to disk and sort, they can do this, I guess).
>
> Thanks!