Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

2021-09-02 Thread balasubramanian c r
HI Tom/Ninad My bad I didn't explain my use case properly. The use case is to find the best string similarity for a given address against the list of addresses in the table. Initially I tried a similarity function provided by the pg_trgm extension. But the similarity scores were not satisfactory. L

Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

2021-09-02 Thread Tom Lane
Michael Lewis writes: > This is showing many false positives from the index scan that get removed > when the actual values are examined. With such a long search parameter, > that does not seem surprising. I would expect a search on "raj nagar > ghaziabad 201017" or something like that to yield far

Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

2021-09-02 Thread Ninad Shah
I see an issue with the operator. GIN index is capable of performing index scan and index-only scan. Regards, Ninad Shah On Thu, 2 Sept 2021 at 20:57, Michael Lewis wrote: > This is showing many false positives from the index scan that get removed > when the actual values are examined. With su

Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

2021-09-02 Thread Ninad Shah
Try the pg_tgrm extension. It is a rich set of operators. Regards, Ninad Shah On Thu, 2 Sept 2021 at 23:39, balasubramanian c r wrote: > HI Ninad > > Thanks for your reply. > If bitmap index should not be used. Do i need to disable it for the time > being and carry out the test. > > The docume

Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

2021-09-02 Thread balasubramanian c r
HI Ninad Thanks for your reply. If bitmap index should not be used. Do i need to disable it for the time being and carry out the test. The documentation in pg_similarity shows that index can be created on text column using gin with gin_similarity_ops. The same way the index is created like CREATE

Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

2021-09-02 Thread Michael Lewis
This is showing many false positives from the index scan that get removed when the actual values are examined. With such a long search parameter, that does not seem surprising. I would expect a search on "raj nagar ghaziabad 201017" or something like that to yield far fewer results from the index s

Re: Querying a table with jaccard similarity with 1.6 million records take 12 seconds

2021-09-02 Thread Ninad Shah
Hi Bala, Are your statistics updated? Also, have you used the gin operator(gin_similarity_ops) correctly? It is fetching just 6 records out of a million, hence, it should not go for bitmap index scan. As bitmap index scan loads a complete index, and access relevant pages from the table later by b