> > Nope. a large OFFSET of N will read through N rows, regardless index > coverage. see http://www.postgresql.org/docs/9.1/static/queries-limit.html >
That's simple not true. If you define a Order By with a well indexes query, the database will only do a bitmap scan. This wiki isn't well explained. Take this: > https://wiki.postgresql.org/images/3/35/Pagination_Done_the_PostgreSQL_Way.pdf The problem relies on the django paginator, that builds OFFSET queries that > are unefficient by design. There are alternatives such as ordering by > primary key and doing SELECT * from auth_user WHERE id > latest_id_returned > . that "latest_id_returned" has to be passed back and forth to the client > to keep the pagination hapening. Too bad you can't do "jump to page 15" > like the django admin does. > Paginator needs to use OFFSET, which is slower than your seek method, but SEEK has certain limitations. As you already said. Index and Order By to the Rescue, which makes OFFSET as fast as possible. WITH INDEX, ORDER BY and maybe some READ Slaves or even if you Shard your data the result is as good as the SEEK method. read the relevant theread I posted before. There are huge memory allocation > issues even if you just iterate the first two items of a query that > potentially returns a million rows: all the rows will be fetched and stored > in memory by psycopg and django ORM > Thats only happening since django will built something like "SELECT "bla", "bla", "bla" FROM table LIMIT 10 OFFSET 100; There is no ORDER BY clause, which is of course really slow, even with only 1000 entries inside a table. ...except when the cache is bigger than your entire VM. > > In this case if your dataset is , say, 10 TB, you will need roughly 10 TB > of memory and 10 TB of data transferred between your django app and your > postgres server. It is not a matter of optimization: is mere survival. > Okai just as I said, If you deal with that amount of data u certainly will use READ Slaves AND / OR partition your data. > Let's first understand what's needed, than we can decide if it has a story > inside the django core > The only thing that could be improved is that we should force a ORDER BY id clause here and there. The index on the automatic id is already there. I don't know why we should enforce something just because somebody has a huge amount of data and can't deal with it. That wouldn't follow the clean design pattern. Most people rely on the "Page to" part of the Paginator, if we patch django to use a seek method as a default it wouldn't be that good for a lot of developers. The most people that have a performance problem could easily "fix" that by themself or throwing more hardware at it. -- You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscr...@googlegroups.com. To post to this group, send email to django-developers@googlegroups.com. Visit this group at http://groups.google.com/group/django-developers. To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAPDLAU5HNtE%2BDfSxR7FcgdtAN6U3bSG7AVtGYhT7LLZUdY0Fjw%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.