Partial index creation always scans the entire table

2020-02-15 Thread MingJu Wu
Hello,

When creating partial indexes, can postgres utilize another index for
figuring which rows should be included in the partial index, without
performing a full table scan?

My scenario is that I have a table with 50M rows that are categorized into
10K categories. I need to create a partial index for each category. I have
created a index on the category column, hoping that postgres can use this
information when creating the partial indexes. However, postgres always
performs full table scan.

I've tested with PostgreSQL 12.2. Below is an example setup showing the
problem.

TEST1 shows that building a full index covering all rows takes 18 seconds.

TEST2 shows that creating a partial index for one of the category1
(category1=1) takes 3 seconds. This means that for creating 10K partial
indexes for each category, it will take over 8 hours. Compared to just 18
seconds in TEST1, it is much longer due to repeated full table scans.

TEST3 shows that even with another index (index_category2 created in SETUP)
covering category2, creating a partial index for one of the category2
(category2=1) still takes 3 seconds. I think postgres is still doing a full
table scan here.

My question is: can postgres utilize index_category2 is TEST3?

Thank you.

-
-- SETUP
-

CREATE TABLE test_data (
id bigint PRIMARY KEY,
category1 bigint,
category2 bigint
);


INSERT INTO test_data(id, category1, category2)
SELECT id, category, category FROM (
SELECT
generate_series(1, 5000) AS id,
(random()*1)::bigint AS category
) q;
--  Query returned successfully in 1 min 47 secs.

CREATE INDEX index_category2 ON test_data(category2);
-- Query returned successfully in 32 secs 347 msec.


--
-- TEST1: CREATE FULL INDEX
--

CREATE INDEX index_full ON test_data(id);
-- Query returned successfully in 18 secs 713 msec.


--
-- TEST2: CREATE PARTIAL INDEX, using category1
--

CREATE INDEX index_partial_1 ON test_data(id) WHERE category1=1;
-- Query returned successfully in 3 secs 523 msec.


--
-- TEST3: CREATE PARTIAL INDEX, using category2
--

CREATE INDEX index_partial_2 ON test_data(id) WHERE category2=1;
-- Query returned successfully in 3 secs 651 msec.


--- END ---


Re: Partial index creation always scans the entire table

2020-02-15 Thread Sergei Kornilov
Hello

> When creating partial indexes, can postgres utilize another index for 
> figuring which rows should be included in the partial index, without 
> performing a full table scan?

No.
create index always perform a seqscan on table. And two full table scan for 
create index concurrently.

regards, Sergei




Re: Partial index creation always scans the entire table

2020-02-15 Thread Justin Pryzby
On Sat, Feb 15, 2020 at 07:04:48PM +0800, MingJu Wu wrote:
> Hello,
> 
> When creating partial indexes, can postgres utilize another index for
> figuring which rows should be included in the partial index, without
> performing a full table scan?
> 
> My scenario is that I have a table with 50M rows that are categorized into
> 10K categories. I need to create a partial index for each category. I have
> created a index on the category column, hoping that postgres can use this
> information when creating the partial indexes. However, postgres always
> performs full table scan.
> 
> I've tested with PostgreSQL 12.2. Below is an example setup showing the

I don't think it's possible, and an index scan wouldn't necessarily be faster,
anyway, since the reads might be unordered rather than sequantial, and might
hit large fractions of the table even though only returning a fraction of its
tuples.

But have you thought about partitioning on category rather than partial
indexes?  Possibly hash partition of (category).  If your queries usually
include category_id=X, that might be a win for performance anyway, since tables
can now be read sequentially rather than scannned by index (again, probably out
of order).

-- 
Justin




Re: Partial index creation always scans the entire table

2020-02-15 Thread Laurenz Albe
On Sat, 2020-02-15 at 19:04 +0800, MingJu Wu wrote:
> When creating partial indexes, can postgres utilize another index for 
> figuring which rows
> should be included in the partial index, without performing a full table scan?

No; it has to be a full sequential scan.

> My scenario is that I have a table with 50M rows that are categorized into 
> 10K categories.
> I need to create a partial index for each category. I have created a index on 
> the category
> column, hoping that postgres can use this information when creating the 
> partial indexes.
> However, postgres always performs full table scan.

There is your problem.

You don't need a partial index per category, you need a single index that 
*contains* the category.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com