best way to write large data-streams quickly?

2018-04-09 Thread Mark Moellering
Everyone,

We are trying to architect a new system, which will have to take several
large datastreams (total of ~200,000 parsed files per second) and place
them in a database.  I am trying to figure out the best way to import that
sort of data into Postgres.

I keep thinking I can't be the first to have this problem and there are
common solutions, but I can't find any.  Does anyone know of some sort of
method, third-party program, etc., that can accept data from a number of
different sources and push it into Postgres as fast as possible?

Thanks in advance,

Mark Moellering


Re: best way to write large data-streams quickly?

2018-04-10 Thread Mark Moellering
On Mon, Apr 9, 2018 at 12:01 PM, Steve Atkins  wrote:

>
> > On Apr 9, 2018, at 8:49 AM, Mark Moellering wrote:
> >
> > Everyone,
> >
> > We are trying to architect a new system, which will have to take several
> large datastreams (total of ~200,000 parsed files per second) and place
> them in a database.  I am trying to figure out the best way to import that
> sort of data into Postgres.
> >
> > I keep thinking I can't be the first to have this problem and there are
> common solutions, but I can't find any.  Does anyone know of some sort of
> method, third-party program, etc., that can accept data from a number of
> different sources and push it into Postgres as fast as possible?
>
> Take a look at http://ossc-db.github.io/pg_bulkload/index.html. Check the
> benchmarks for different situations compared to COPY.
>
> Depending on what you're doing, using custom code to parse your data and
> then do multiple binary COPYs in parallel may be better.
>
> Cheers,
>   Steve
>
>
>
(fighting google slightly to keep from top-posting...)

Thanks!

How long can you run COPY?  I have been looking at it more closely.  In
some ways, it would be simple just to take data from stdin and send it to
Postgres, but can I do that literally 24/7?  I am monitoring data feeds that
will never stop, and I don't know if that is how COPY is meant to be used or
whether I have to let it finish and start another one at some point.
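
One way this is often handled, rather than keeping a single COPY open
forever, is to cut the stream into batches and issue a fresh COPY per batch.
A minimal sketch of that loop, assuming psycopg2; the table name, columns,
and batch size are placeholders, and the feed is assumed to yield
already-parsed, tab-separated rows:

import io
import psycopg2

BATCH_ROWS = 10_000  # hypothetical batch size; tune against the feed rate

def stream_to_postgres(feed, dsn):
    """feed: an iterator yielding parsed, tab-separated rows."""
    conn = psycopg2.connect(dsn)
    try:
        while True:
            # Collect up to BATCH_ROWS lines from the never-ending feed.
            buf = io.StringIO()
            rows = 0
            for line in feed:
                buf.write(line.rstrip("\n") + "\n")
                rows += 1
                if rows >= BATCH_ROWS:
                    break
            if rows == 0:
                break  # feed closed; in practice, sleep and retry instead
            buf.seek(0)
            with conn.cursor() as cur:
                # Each COPY is an ordinary, short-lived statement; the next
                # batch simply starts a new one, so nothing runs "24/7".
                cur.copy_expert(
                    "COPY events (received_at, payload) FROM STDIN", buf)
            conn.commit()
    finally:
        conn.close()

Running several such workers, each on its own connection, is roughly the
"multiple binary COPYs in parallel" idea from the reply quoted above (here
in text rather than binary format).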

Thanks for everyone's help and input!

Mark Moellering


db-connections (application architecture)

2018-11-15 Thread Mark Moellering
So, I am working on some system designs for a web application, and I wonder
if there is any definitive answer on how to best connect to a postgres
database.

I could have it so that each time a query, or set of queries, for a
particular request needs to be run, a new connection is opened, the queries
are run, and then the connection is closed/dropped.

OR, I could create a persistent connection that will remain open as long as
a user is logged in and then any queries are run against the open
connection.

I can see how, for only a few (hundreds to thousands) of users, the latter
might make more sense but if I need to scale up to millions, I might not
want all of those connections open.

Any idea how much time/overhead is added by opening and closing a
connection every time?

Any and all information is welcome.

Thanks in advance

-- Mark M
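
For a ballpark answer to the overhead question, it is easy enough to measure
directly.  A minimal sketch, assuming psycopg2 and a placeholder DSN; the
numbers vary a lot with network latency, TLS, and the authentication method:

import time
import psycopg2

DSN = "dbname=test"   # placeholder connection string
N = 100

# Open a brand-new connection for every query.
start = time.perf_counter()
for _ in range(N):
    conn = psycopg2.connect(DSN)      # full connect/authenticate each time
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        cur.fetchone()
    conn.close()
per_new_conn = (time.perf_counter() - start) / N

# Reuse one persistent connection for the same work.
conn = psycopg2.connect(DSN)
start = time.perf_counter()
for _ in range(N):
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        cur.fetchone()
per_reused = (time.perf_counter() - start) / N
conn.close()

print(f"new connection per query: {per_new_conn * 1000:.2f} ms")
print(f"reused connection:        {per_reused * 1000:.2f} ms")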


Re: db-connections (application architecture)

2018-11-15 Thread Mark Moellering
Oh, excellent.  I knew I was about to reinvent the wheel.
Sometimes, there are just too many new things to keep up on.

Thank you so much!

On Thu, Nov 15, 2018 at 10:16 AM Adrian Klaver 
wrote:

> On 11/15/18 7:09 AM, Mark Moellering wrote:
> > So, I am working on some system designs for a web application, and I
> > wonder if there is any definitive answer on how to best connect to a
> > postgres database.
> >
> > I could have it so that each time a query, or set of queries, for a
> > particular request needs to be run, a new connection is opened, the
> > queries are run, and then the connection is closed/dropped.
> >
> > OR, I could create a persistent connection that will remain open as long
> > as a user is logged in and then any queries are run against the open
> > connection.
> >
> > I can see how, for only a few (hundreds to thousands) of users, the
> > latter might make more sense but if I need to scale up to millions, I
> > might not want all of those connections open.
> >
> > Any idea how much time/overhead is added by opening and closing a
> > connection every time?
> >
> > Any and all information is welcome.
>
> Connection pooling?
>
> In no particular order:
>
> https://pgbouncer.github.io/
>
> http://www.pgpool.net/mediawiki/index.php/Main_Page
>
> >
> > Thanks in advance
> >
> > -- Mark M
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
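
To make the pooling suggestion concrete: besides a server-side pooler such as
PgBouncer or pgpool, a client-side pool gives much of the same benefit within
a single application process.  A minimal sketch using psycopg2's built-in
pool; the pool sizes, DSN, and the users table are placeholders:

from psycopg2.pool import ThreadedConnectionPool

# Connections are opened once and then borrowed/returned per request,
# instead of reconnecting for every query.
pool = ThreadedConnectionPool(minconn=2, maxconn=20, dsn="dbname=test")

def handle_request(user_id):
    conn = pool.getconn()             # borrow an already-open connection
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT name FROM users WHERE id = %s", (user_id,))
            return cur.fetchone()
    finally:
        pool.putconn(conn)            # return it to the pool, don't close it

# pool.closeall() at application shutdown.

At the "millions of users" end, a server-side pooler is usually the better
fit, since it keeps the number of actual backend connections small no matter
how many clients connect.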


Re: Where **not** to use PostgreSQL?

2019-02-28 Thread Mark Moellering
I wish more people would ask this question; to me, it is the true mark of
experience.  In general, I think of PostgreSQL as the leading relational
database.  The farther you get from relational data and relational queries,
the more I would say you should look at other products or solutions.  But if
you want to store relational data and then run queries over it, stick with
PostgreSQL.

My 2 cents...

Mark

On Thu, Feb 28, 2019 at 8:28 AM Nicolas Grilly 
wrote:

> On Thu, Feb 28, 2019 at 2:12 PM Chris Travers 
> wrote:
>
>> Where I did this on the TB scale, we had some sort of ranking but it was
>> not based on ts_rank.
>>
>> On the PB scale systems I work on now, it is distributed, and we don't
>> order in PostgreSQL (or anywhere else, though if someone wants to write to
>> disk and sort, they can do this I guess)
>>
>
> Thanks!
>
>>