Why the index is not used ?
Hi I would like to submit the following problem to the PostgreSQL community. In my company, we have data encryption needs. So I decided to use the following procedure : (1)Creating a table with a bytea type column to store the encrypted data CREATE TABLE cartedecredit(card_id SERIAL PRIMARY KEY, username VARCHAR(100), cc bytea); (2)inserting encrypted data INSERT INTO cartedecredit(username,cc) SELECT 'individu ' || x.id, pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2, cipher-algo=aes256') FROM generate_series(1,10) AS x(id); (3)Querying the table SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE pgp_sym_decrypt(cc, 'motdepasse')='test value 32'; pgp_sym_decrypt - test value 32 (1 row) Time: 115735.035 ms (01:55.735) -> the execution time is very long. So, I decide to create an index (4)Creating an index on encrypted data CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(cc); (5)Querying the table again SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE pgp_sym_decrypt(cc, 'motdepasse')='test value 32'; pgp_sym_decrypt - test value 32 (1 row) Time: 118558.485 ms (01:58.558) -> almost 2 minutes !! postgres=# explain analyze SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE pgp_sym_decrypt(cc, 'motdepasse')='test value 32'; QUERY PLAN -- Seq Scan on cartedecredit (cost=0.00..3647.25 rows=500 width=32) (actual time=60711.787..102920.509 rows=1 loops=1) Filter: (pgp_sym_decrypt(cc, 'motdepasse'::text) = 'test value 32'::text) Rows Removed by Filter: 9 Planning time: 0.112 ms Execution time: 102920.585 ms (5 rows) ? the index is not used in the execution plan. maybe because of the use of a function in the WHERE clause. I decide to modify the SQL query (6)Querying the table SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE cc=pgp_sym_encrypt('test value 32', 'motdepasse'); pgp_sym_decrypt - (0 rows) Time: 52659.571 ms (00:52.660) ? The execution time is very long and I get no result (!?) QUERY PLAN --- Seq Scan on cartedecredit (cost=0.00..3646.00 rows=1 width=32) (actual time=61219.989..61219.989 rows=0 loops=1) Filter: (cc = pgp_sym_encrypt('test value 32'::text, 'motdepasse'::text)) Rows Removed by Filter: 10 Planning time: 0.157 ms Execution time: 61220.035 ms (5 rows) ? My index is not used. QUESTIONS : - why I get no result ? -why the index is not used? Thanks in advance Best Regards Didier [cid:image002.png@01D14E0E.8515EB90] Didier ROS Expertise SGBD DS IT/IT DMA/Solutions Groupe EDF/Expertise Applicative - SGBD Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme à sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse. Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message. Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus. This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval. If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message. E-mail communication cannot be guaranteed to be timely secure, error or virus-free.
Re: Why the index is not used ?
so 6. 10. 2018 v 11:57 odesílatel ROS Didier napsal: > Hi > > I would like to submit the following problem to the PostgreSQL community. > In my company, we have data encryption needs. > So I decided to use the following procedure : > > > > (1)Creating a table with a bytea type column to store the encrypted > data > CREATE TABLE cartedecredit(card_id SERIAL PRIMARY KEY, username > VARCHAR(100), cc bytea); > > > > (2)inserting encrypted data > INSERT INTO cartedecredit(username,cc) SELECT 'individu ' || x.id, > pgp_sym_encrypt('test value ' || x.id, 'motdepasse','compress-algo=2, > cipher-algo=aes256') FROM generate_series(1,10) AS x(id); > > > > (3)Querying the table > SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE > pgp_sym_decrypt(cc, 'motdepasse')='test value 32'; > > pgp_sym_decrypt > > - > > test value 32 > > (1 row) > > > > Time: 115735.035 ms (01:55.735) > -> the execution time is very long. So, I decide to create an index > > > > (4)Creating an index on encrypted data > CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(cc); > this index cannot to help. but functional index can cartedecredit(pgp_sym_decrypt(cc, 'motdepasse'). Unfortunately index file will be decrypted in this case. CREATE INDEX ON > > > (5)Querying the table again > > SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE > pgp_sym_decrypt(cc, 'motdepasse')='test value 32'; > pgp_sym_decrypt > > - > > test value 32 > > (1 row) > > > > Time: 118558.485 ms (01:58.558) -> almost 2 minutes !! > postgres=# explain analyze SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM > cartedecredit WHERE pgp_sym_decrypt(cc, 'motdepasse')='test value 32'; > > QUERY PLAN > > > -- > > Seq Scan on cartedecredit (cost=0.00..3647.25 rows=500 width=32) (actual > time=60711.787..102920.509 rows=1 loops=1) > >Filter: (pgp_sym_decrypt(cc, 'motdepasse'::text) = 'test value > 32'::text) > >Rows Removed by Filter: 9 > > Planning time: 0.112 ms > > Execution time: 102920.585 ms > > (5 rows) > > > > è the index is not used in the execution plan. maybe because of the use > of a function in the WHERE clause. I decide to modify the SQL query > > > > (6)Querying the table > SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit WHERE > *cc*=pgp_sym_encrypt('test > value 32', 'motdepasse'); > it is strange - this should to use index, when there is usual index over cc column. What is result of explain analyze when you penalize seq scan by set enable_seqscan to off > pgp_sym_decrypt > > - > > (0 rows) > > > > Time: 52659.571 ms (00:52.660) > > è The execution time is very long and I get no result (!?) > > QUERY PLAN > > > --- > > Seq Scan on cartedecredit (cost=0.00..3646.00 rows=1 width=32) (actual > time=61219.989..61219.989 rows=0 loops=1) > >Filter: (cc = pgp_sym_encrypt('test value 32'::text, > 'motdepasse'::text)) > >Rows Removed by Filter: 10 > > Planning time: 0.157 ms > > Execution time: 61220.035 ms > > (5 rows) > > > > è My index is not used. > > > QUESTIONS : > - why I get no result ? > > -why the index is not used? > > Thanks in advance > > > > Best Regards > Didier > > > > > > [image: cid:image002.png@01D14E0E.8515EB90] > > > * Didier ROS* > * Expertise SGBD* > > > *DS IT/IT DMA/Solutions Groupe EDF/Expertise Applicative - SGBD * > > > > > > > Ce message et toutes les pièces jointes (ci-après le 'Message') sont > établis à l'intention exclusive des destinataires et les informations qui y > figurent sont strictement confidentielles. Toute utilisation de ce Message > non conforme à sa destination, toute diffusion ou toute publication totale > ou partielle, est interdite sauf autorisation expresse. > > Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de > le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou > partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de > votre système, ainsi que toutes ses copies, et de n'en garder aucune trace > sur quelque support que ce soit. Nous vous remercions également d'en > avertir immédiatement l'expéditeur par retour du message. > > Il est impossible de garantir que les communications par messagerie > électronique arrivent en temps utile, sont sécurisées ou dénuées de toute > erreur ou virus. > > > This message and any attachments (the 'Message') are intended solely for > the addressees. The information contained in this Message is confidential. > Any use of information contained in this Message not in accord with its > pur
Re: Why the index is not used ?
Hello Didier, (3), (5) to find the match, you decrypt the whole table, apparently this take quite a long time. Index cannot help here because indexes work on exact match of type and value, but you compare mapped value, not indexed. Functional index should help, but like it was said, it against the idea of encrypted storage. (6) I never used pgp_sym_encrypt() but I see that in INSERT INTO you supplied additional parameter 'compress-algo=2, cipher-algo=aes256' while in (6) you did not. Probably this is the reason. In general matching indexed bytea column should use index, you can ensure in this populating the column unencrypted and using 'test value 32'::bytea for match. In you case I believe pgp_sym_encrypt() is not marked as STABLE or IMMUTABLE that's why it will be evaluated for each row (very inefficient) and cannot use index. From documentation: "Since an index scan will evaluate the comparison value only once, not once at each row, it is not valid to use a VOLATILE function in an index scan condition." https://www.postgresql.org/docs/10/static/xfunc-volatility.html If you cannot add STABLE/IMMUTABLE to pgp_sym_encrypt() (which apparently should be there), you can encrypt searched value as a separate operation and then search in the table using basic value match. Vlad
Text-indexing UTF-8 bytea, convert_from() immutability, null bytes...
Dear Experts, I have a table that contains bytea data which sometimes is UTF-8 text. When it is - and that is indicated by another column - I want to text-search index it. Something like this: db=> create index ix on tbl using gin (to_tsvector('english',body)) where is_utf8_text; (Note my client encoding is UTF-8.) That doesn't work because to_tsvector doesn't take bytea. So I tried to_tsvector('english',body::text) ; that almost works, but CRLFs in the data become \015\012 (i.e. 6 characters) in the text and the tsvector contains things like '12hello'. Next I tried to_tsvector('english',convert_from(body::text,'UTF-8')). That doesn't work because convert_from is not immutable. (This is 9.6; maybe that has changed.) Is there a good reason for that? Maybe because I might change the client encoding? As a hack I tried ALTER FUNCTION to make it immutable, and now I get: ERROR: invalid byte sequence for encoding "UTF8": 0x00 Hmm, as far as I'm aware 0x00 is fine in UTF-8; what's that mean? But actually I'd be more than happy to ignore invalid UTF-8 here, since I'm only using it for text search; there may be some truly invalid UTF-8 in the data. Is there a "permissive" mode for charset conversion? (That error also suggests that the convert_from is not optimising the conversion from UTF-8 to UTF-8 to a no-op.) Anyway: given the problem of creating a text search index over bytea data that contains UTF-8 text, which may include oddities like null bytes, what would you do? Thanks for any suggestions! Phil.
Re: Why the index is not used ?
I haven’t looked up what pgp_sym_encrypt() does but assuming it does encryption the way you should be for credit card data then it will be using a random salt and the same input value won’t encrypt to the same output value so WHERE cc=pgp_sym_encrypt('test value 32', 'motdepasse'); wouldn’t work because the value generated by the function when you are searching on isn’t the same value as when you stored it. Paul > On 6 Oct 2018, at 19:57, ROS Didier wrote: > > WHERE cc=pgp_sym_encrypt('test value 32', 'motdepasse');
Re: Text-indexing UTF-8 bytea, convert_from() immutability, null bytes...
> "Phil" == Phil Endecott writes: Phil> Next I tried Phil> to_tsvector('english',convert_from(body::text,'UTF-8')). That Phil> doesn't work because convert_from is not immutable. (This is 9.6; Phil> maybe that has changed.) Is there a good reason for that? I would guess because conversions are controlled by the pg_conversion table which can be modified by create/drop conversion. Phil> Maybe because I might change the client encoding? No, because convert_from converts from the specified encoding to the server_encoding, not the client_encoding, and the server_encoding can't be changed except at db creation time. Phil> As a hack I tried ALTER FUNCTION to make it immutable, A better approach is to wrap it in a function of your own which is declared immutable, rather than hacking the catalogs: create function from_utf8(bytea) returns text language plpgsql immutable as $$ begin return convert_from($1, 'UTF8'); end; $$; Phil> and now I get: Phil> ERROR: invalid byte sequence for encoding "UTF8": 0x00 Phil> Hmm, as far as I'm aware 0x00 is fine in UTF-8; what's that mean? PG doesn't allow 0x00 in text values regardless of encoding. Phil> But actually I'd be more than happy to ignore invalid UTF-8 here, Phil> since I'm only using it for text search; there may be some truly Phil> invalid UTF-8 in the data. Is there a "permissive" mode for Phil> charset conversion? Unfortunately not. Phil> (That error also suggests that the convert_from is not optimising Phil> the conversion from UTF-8 to UTF-8 to a no-op.) Indeed not, because it must validate that the data really is UTF-8 before treating it as such. Phil> Anyway: given the problem of creating a text search index over Phil> bytea data that contains UTF-8 text, which may include oddities Phil> like null bytes, what would you do? You can search for 0x00 in a bytea using position() or LIKE. What do you want to do with values that contain null bytes? or values which you think are supposed to be valid utf8 text but are not? -- Andrew (irc:RhodiumToad)