Slow index scan backward.

2018-03-05 Thread John van Breda
I have a single table with 45 columns and 6.5 million records with a roughly
random distribution of data of a variety of types. I am trying to implement
a data table with pagination in a web user interface, where the data table
can be filtered in a very flexible way on pretty much any combination of
columns. I have indexes covering the most frequently filtered columns. The
data table shows 30 records at a time, sorted to put the most recent records
first. The database version is 9.3.5.

 

The problem occurs if I have a filter which results in less than 30 records,
or where the 30 records that are returned are distributed through the
dataset (it's OK if the page of 30 records are found in the relatively
recent records). Basically, because of the ORDER BY id DESC LIMIT 30 the
query planner is opting to use an index scan backward on the primary key,
applying a filter to each record, until it finds 30 records. If it finds
these relatively quickly then all is good. However sometimes the filter
results in < 30 records in the final result set, in which case the index
scan runs through the whole table and takes several minutes. A better plan
in these cases would be to use the indexes available on the other fields to
limit the results set, then filter, sort and limit. But, the planner is
presumably not able to work this out because the statistics aren't detailed
enough. 

 

Here's the table schema:

 

-- Table: cache_occurrences_functional

 

-- DROP TABLE cache_occurrences_functional;

 

CREATE TABLE cache_occurrences_functional

(

  id integer NOT NULL,

  sample_id integer,

  website_id integer,

  survey_id integer,

  input_form character varying,

  location_id integer,

  location_name character varying,

  public_geom geometry(Geometry,900913),

  map_sq_1km_id integer,

  map_sq_2km_id integer,

  map_sq_10km_id integer,

  date_start date,

  date_end date,

  date_type character varying(2),

  created_on timestamp without time zone,

  updated_on timestamp without time zone,

  verified_on timestamp without time zone,

  created_by_id integer,

  group_id integer,

  taxa_taxon_list_id integer,

  preferred_taxa_taxon_list_id integer,

  taxon_meaning_id integer,

  taxa_taxon_list_external_key character varying(50),

  family_taxa_taxon_list_id integer,

  taxon_group_id integer,

  taxon_rank_sort_order integer,

  record_status character(1),

  record_substatus smallint,

  certainty character(1),

  query character(1),

  sensitive boolean,

  release_status character(1),

  marine_flag boolean,

  data_cleaner_result boolean,

  media_count integer DEFAULT 0,

  training boolean NOT NULL DEFAULT false,

  zero_abundance boolean,

  licence_id integer,

  location_id_vice_county integer,

  location_id_lrc_boundary integer,

  location_id_country integer,

  identification_difficulty integer, -- Identification difficulty assigned
by the data_cleaner module, on a scale from 1 (easy) to 5 (difficult)

  import_guid character varying, -- Globally unique identifier of the import
batch.

  confidential boolean DEFAULT false,

  external_key character varying,

  CONSTRAINT pk_cache_occurrences_functional PRIMARY KEY (id)

)

WITH (

  OIDS=FALSE

);

ALTER TABLE cache_occurrences_functional

  OWNER TO indicia_user;

GRANT ALL ON TABLE cache_occurrences_functional TO indicia_user;

GRANT SELECT ON TABLE cache_occurrences_functional TO indicia_report_user;

GRANT SELECT ON TABLE cache_occurrences_functional TO naturespot;

GRANT SELECT ON TABLE cache_occurrences_functional TO brc_read_only;

COMMENT ON COLUMN cache_occurrences_functional.identification_difficulty IS
'Identification difficulty assigned by the data_cleaner module, on a scale
from 1 (easy) to 5 (difficult)';

COMMENT ON COLUMN cache_occurrences_functional.import_guid IS 'Globally
unique identifier of the import batch.';

 

 

-- Index: ix_cache_occurrences_functional_created_by_id

 

-- DROP INDEX ix_cache_occurrences_functional_created_by_id;

 

CREATE INDEX ix_cache_occurrences_functional_created_by_id

  ON cache_occurrences_functional

  USING btree

  (created_by_id);

 

-- Index: ix_cache_occurrences_functional_date_end

 

-- DROP INDEX ix_cache_occurrences_functional_date_end;

 

CREATE INDEX ix_cache_occurrences_functional_date_end

  ON cache_occurrences_functional

  USING btree

  (date_end);

 

-- Index: ix_cache_occurrences_functional_date_start

 

-- DROP INDEX ix_cache_occurrences_functional_date_start;

 

CREATE INDEX ix_cache_occurrences_functional_date_start

  ON cache_occurrences_functional

  USING btree

  (date_start);

 

-- Index: ix_cache_occurrences_functional_family_taxa_taxon_list_id

 

-- DROP INDEX ix_cache_occurrences_functional_family_taxa_taxon_list_id;

 

CREATE INDEX ix_cache_occurrences_functional_family_taxa_taxon_list_id

  ON cache_occurrences_functional

  USING btree

  (family_taxa_taxon_list_id);

 

-- Index: ix_cache_occurrences_functional_group_id

 

-- DROP INDEX ix_cache_occu

GIST index (polygon, point)

2018-03-05 Thread ghiureai

Hi List,

I have a short description bellow from Dev team regarding the behaviour 
of gist index on the polygon column, looking to get some  feedback  from 
you:


" I was expecting the <@(point,polygon) and @>(polygon,point) to be 
indexable but they are not. see bellow query output ,
the column is a polygon and the index is a gist index on the polygon 
column; my understanding of the above query is that it says which 
operators would cause that index to be used


This SQL shows which operators are indexable:SELECT
 pg_get_indexdef(ss.indexrelid, (ss.iopc).n, TRUE) AS index_col,
 amop.amopopr::regoperator AS indexable_operator
FROM pg_opclass opc, pg_amop amop,
 (SELECT indexrelid, information_schema._pg_expandarray(indclass) AS iopc
  FROM pg_index
  WHERE indexrelid = 'caom2.Plane_energy_ib'::regclass) ss
WHERE amop.amopfamily = opc.opcfamily AND opc.oid = (ss.iopc).x
ORDER BY (ss.iopc).n, indexable_operator;

We run  the SQL  in PG 9.5.3 and PG 10.2 we  the same result: only 
polygon vs polygon is indexable (except the last entry which is distance 
operator).


The work around for us was to change interval-contains-value from 
polygon-contains-point (@> or <@ operator) to 
polygn-intersects-really-small-polygon (&&) in order to use the index, 
but I was quite surprised that contains operators are not indexable!


Note that this is using the built in polygon and not pgsphere (spoly)"


thank you

Isabella