Partial index creation always scans the entire table
Hello, When creating partial indexes, can postgres utilize another index for figuring which rows should be included in the partial index, without performing a full table scan? My scenario is that I have a table with 50M rows that are categorized into 10K categories. I need to create a partial index for each category. I have created a index on the category column, hoping that postgres can use this information when creating the partial indexes. However, postgres always performs full table scan. I've tested with PostgreSQL 12.2. Below is an example setup showing the problem. TEST1 shows that building a full index covering all rows takes 18 seconds. TEST2 shows that creating a partial index for one of the category1 (category1=1) takes 3 seconds. This means that for creating 10K partial indexes for each category, it will take over 8 hours. Compared to just 18 seconds in TEST1, it is much longer due to repeated full table scans. TEST3 shows that even with another index (index_category2 created in SETUP) covering category2, creating a partial index for one of the category2 (category2=1) still takes 3 seconds. I think postgres is still doing a full table scan here. My question is: can postgres utilize index_category2 is TEST3? Thank you. - -- SETUP - CREATE TABLE test_data ( id bigint PRIMARY KEY, category1 bigint, category2 bigint ); INSERT INTO test_data(id, category1, category2) SELECT id, category, category FROM ( SELECT generate_series(1, 5000) AS id, (random()*1)::bigint AS category ) q; -- Query returned successfully in 1 min 47 secs. CREATE INDEX index_category2 ON test_data(category2); -- Query returned successfully in 32 secs 347 msec. -- -- TEST1: CREATE FULL INDEX -- CREATE INDEX index_full ON test_data(id); -- Query returned successfully in 18 secs 713 msec. -- -- TEST2: CREATE PARTIAL INDEX, using category1 -- CREATE INDEX index_partial_1 ON test_data(id) WHERE category1=1; -- Query returned successfully in 3 secs 523 msec. -- -- TEST3: CREATE PARTIAL INDEX, using category2 -- CREATE INDEX index_partial_2 ON test_data(id) WHERE category2=1; -- Query returned successfully in 3 secs 651 msec. --- END ---
Re: Partial index creation always scans the entire table
Hello > When creating partial indexes, can postgres utilize another index for > figuring which rows should be included in the partial index, without > performing a full table scan? No. create index always perform a seqscan on table. And two full table scan for create index concurrently. regards, Sergei
Re: Partial index creation always scans the entire table
On Sat, Feb 15, 2020 at 07:04:48PM +0800, MingJu Wu wrote: > Hello, > > When creating partial indexes, can postgres utilize another index for > figuring which rows should be included in the partial index, without > performing a full table scan? > > My scenario is that I have a table with 50M rows that are categorized into > 10K categories. I need to create a partial index for each category. I have > created a index on the category column, hoping that postgres can use this > information when creating the partial indexes. However, postgres always > performs full table scan. > > I've tested with PostgreSQL 12.2. Below is an example setup showing the I don't think it's possible, and an index scan wouldn't necessarily be faster, anyway, since the reads might be unordered rather than sequantial, and might hit large fractions of the table even though only returning a fraction of its tuples. But have you thought about partitioning on category rather than partial indexes? Possibly hash partition of (category). If your queries usually include category_id=X, that might be a win for performance anyway, since tables can now be read sequentially rather than scannned by index (again, probably out of order). -- Justin
Re: Partial index creation always scans the entire table
On Sat, 2020-02-15 at 19:04 +0800, MingJu Wu wrote: > When creating partial indexes, can postgres utilize another index for > figuring which rows > should be included in the partial index, without performing a full table scan? No; it has to be a full sequential scan. > My scenario is that I have a table with 50M rows that are categorized into > 10K categories. > I need to create a partial index for each category. I have created a index on > the category > column, hoping that postgres can use this information when creating the > partial indexes. > However, postgres always performs full table scan. There is your problem. You don't need a partial index per category, you need a single index that *contains* the category. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
