Re: Recommended way to copy database files on Windows OS (to perform file system level backup)

2018-05-14 Thread Allan Kamau
On Mon, May 14, 2018 at 2:22 PM, Yashwanth Govinda Setty <
ygovindase...@commvault.com> wrote:

> Hi all,
>
>
>
> We are facing this problem while performing file system level backup of
> database files:
>
> As each database will form a directory inside Base directory which
> consists of files representing the tables, when some tables are dropped
> during backup, *We get error while copying since the files do not exist
> anymore*.
>
>
>
> So we would like to know how you recommend copying PostgreSQL database
> files in Windows OS to perform file system level backups.
>
> (For Example – The recommended way in Linux is to use tar format.)
>
>
>
> Thanks,
>
> *Yashwanth*
>
>
> ***Legal Disclaimer***
> "This communication may contain confidential and privileged material for
> the
> sole use of the intended recipient. Any unauthorized review, use or
> distribution
> by others is strictly prohibited. If you have received the message by
> mistake,
> please advise the sender by reply email and delete the message. Thank you."
> **
>

An easy way to back up your data is to use PostgreSQL's own pg_dump utility.
It reads from a single snapshot and takes an ACCESS SHARE lock on every table
it dumps, so tables being dropped while the backup runs do not break it the
way a file-level copy does.
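
For example, on Windows something along these lines (install path, backup
path, database name and credentials are all placeholders) writes a single
consistent archive that can later be restored with pg_restore:

"C:\Program Files\PostgreSQL\10\bin\pg_dump.exe" -h localhost -p 5432 -U postgres -F c -f C:\backup\mydb.dump mydb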


Re: Lock Postgres account after X number of failed logins?

2020-05-05 Thread Allan Kamau
On Tue, May 5, 2020 at 5:28 PM Adrian Klaver 
wrote:

> On 5/5/20 7:13 AM, Wolff, Ken L wrote:
> > Hi, everyone.  Wondering if there’s a way in PostgreSQL to automatically
> > lock accounts after a number of failed logins (a security requirement
> > for my organization).  I’ve been investigating this for a while and the
> > only reference I’ve found is to write a hook:
> > https://wiki.postgresql.org/images/e/e3/Hooks_in_postgresql.pdf , which
> > is a little more involved than I’d hoped.  Was hoping there was
> > something native available within PostgreSQL.
>
> There is not.
>
> You might want to take a look at this thread:
>
>
> https://www.postgresql.org/message-id/OF010D9AFE.7D96A308-ON85257AB6.00746957-85257AB6.0074746B%40us.ibm.com
>
>
> >
> > Locking accounts after X number of failed logins is an excellent way to
> > defeat brute force attacks, so I’m just wondering if there’s a way to do
> > this, other than the aforementioned hook.
> >
> > This is my first time using this mail list so apologies in advance if
> > I’m not following etiquette or doing something incorrectly.
> >
> > Thanks in advance.
> >
> > Ken W
> >
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com




You can configure PostgreSQL to authenticate against your organisation's LDAP
server or Active Directory, and thereby benefit from the account-locking
mechanism your organisation already enforces.
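
As a rough sketch (the server name, base DN and search attribute are
placeholders for your environment), a single pg_hba.conf entry is enough to
delegate password checking to Active Directory via LDAP search+bind:

# pg_hba.conf: authenticate connections from this subnet against the directory server
host    all    all    10.0.0.0/24    ldap ldapserver=ad.example.org ldapbasedn="dc=example,dc=org" ldapsearchattribute=sAMAccountName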

Just in case it proves difficult to find or implement login locking as you
have requested, here is a plan B.

Ideally, access to PostgreSQL would go through authorized applications,
possibly developed in-house, unless your users are required to connect with
psql and issue SQL queries directly.
In that case you would only need to secure the applications themselves,
restricting their use to logins authenticated against your institutional
account manager, such as Active Directory.

If your applications are web applications, or are deployed on a small pool of
known computers that host the applications needing database access, you can
restrict access to the port PostgreSQL listens on by configuring pg_hba.conf.
In this file you permit only connections originating from the IP addresses of
the computers that host your applications, together with the user names and
the databases they may connect to.
Then you can also configure firewall rules in the OS of the server hosting
PostgreSQL to reject connections from all IP addresses except those of the
hosts on which your database-accessing applications run.
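
For illustration (addresses, database and role names are placeholders),
pg_hba.conf entries of this shape admit only the application hosts and turn
everything else away; pg_hba.conf is evaluated top to bottom and the first
matching line wins:

# pg_hba.conf: only the two application servers may reach appdb as app_user
hostssl    appdb    app_user    192.0.2.10/32    scram-sha-256
hostssl    appdb    app_user    192.0.2.11/32    scram-sha-256
# anything not matched above is refused
host       all      all         0.0.0.0/0        reject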

If this fails to meet your requirement then, depending on the size of your
workload, you can deploy all your database-accessing applications on the same
computer on which PostgreSQL is deployed.
Then disallow all TCP/IP connections to your database other than from
localhost.
This way the only way for anyone to log in to PostgreSQL is from an SSH or
interactive session on the computer hosting PostgreSQL.
Here you would only need to enforce account locking at the OS level for
failed attempts; this is seamless if logins to your nodes already go through
your organisation's user-authentication machinery, which will already lock
accounts after N failed attempts.

Allan.


Re: Querying PostgreSQL / PostGIS Databases in Python

2020-07-31 Thread Allan Kamau
You may write stored procedures using PL/pgSQL; alternatively, you may write
your queries in Python.
You may use psycopg2 to query the DB from Python.
You may also have a mix of the two; it will depend on your preference.
Ideally you may not want your users running queries against the data by
connecting to the database directly with tools such as psql, pgAdmin 3 or
pgAdmin 4.
This means that having a database-access application written in Python to
restrict and encapsulate data access may be advisable.
In this case you may place all the DML statements in Python and execute them
there, or you may put much of the data-access logic into several PL/pgSQL
functions and call these functions from Python.


Below is Python code illustrating the use of psycopg2. It is meant as a
sketch, so adapt the connection details, schema, table and column names to
your environment.
Here I am executing an SQL query on a table; you may modify this code to
execute a PL/pgSQL function instead.


import psycopg2
from psycopg2 import sql
import psycopg2.extras


def query_table(
        dataset_name,
        some_value_2,
        db__pg_conn,
):
    """Query my_schema.my_table for rows whose dataset_name matches."""
    _sql_query1a = """
        SELECT {}::TEXT AS some_string, a.id AS table__id, a.*, clock_timestamp()
        FROM {}.{} a
        WHERE a.dataset_name = {}::TEXT
        ;
    """
    # Compose the statement: Identifier() safely quotes the schema and table
    # names, Placeholder() leaves named bind parameters for the values.
    sqlSQL1a = sql.SQL(_sql_query1a).format(
        sql.Placeholder('some_value_1'),
        sql.Identifier("my_schema.my_table".split(".")[0]),
        sql.Identifier("my_schema.my_table".split(".")[1]),
        sql.Placeholder('some_value_2'),
    )
    pg_cursor1a = db__pg_conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    # The values are sent as bind parameters, so no manual quoting is needed.
    pg_cursor1a.execute(
        sqlSQL1a,
        {
            'some_value_1': 'ABC',
            'some_value_2': dataset_name,
        },
    )
    rows = pg_cursor1a.fetchall()
    for row in rows:
        table__id = row["table__id"]  # do something with table__id
    # do something with rows
    db__pg_conn.commit()
    pg_cursor1a.close()
    return rows


db__pg_conn__str = ("host='host_name_of_computer_hosting_pg_db_service' "
                    "port=5432 dbname='your_pg_db_name' user='your_username' "
                    "password='user_password'")
db__pg_conn = psycopg2.connect(db__pg_conn__str)

dataset_name = 'some_dataset_name'   # example values
some_value_2 = 'some_value_2'        # kept for parity with the function signature
query_table(
    dataset_name,
    some_value_2,
    db__pg_conn,
)



On Fri, Jul 31, 2020 at 12:30 PM Shaozhong SHI 
wrote:

> Hi,
>
> What is the advantage of querying in Python?
>
> Has anyone got much experience?
>
> What not just use standard query?
>
> What is the rationale for querying in Python?
>
> Would the performance be better?
>
> Regards,
>
> Shao
>


Re: Does the work made by vaccum in the current pass is lost when interrupted?

2020-10-09 Thread Allan Kamau
On Thu, Oct 8, 2020 at 10:16 PM Jean-Marc Lessard <
jean-marc.less...@ultra-ft.com> wrote:

> I have a large table (billions of records) which has not been vacuum and
> bloated.
>
> Vacuum scale factor was left at the default.
>
>
>
> I ran a vacuum on a DEV system and it makes several passes (scanning
> heap/vacuuming indexes/vacumming heap) which take more than an hour each.
>
> On a PROD system, I may have to kill the job midway.
>
> Should I reduce the autovacuum_work_mem of my session? Currently 1GB
>
> Does the work made in the current pass is lost when interrupted?
>
> Correct me if I am not right, but vacuumed indexes and heap made by the
> current pass will go to disk, but the heap should be rescan on the next
> vacuum.
>
> I guess that the visibility map is updated only at the end of each pass.
>
>
>
> My best regards
>
>
>
> Thank you
>
>
>
>
> *Jean-Marc Lessard*
> Administrateur de base de données / Database Administrator
>
>
>
> Ultra Electronics Forensic Technology Inc.
> T +1 514 489 4247 x4164
> www.ultra-forensictechnology.com
>
>
>




A suggestion: considering that you have a table with billions of tuples, you
could look into table partitioning, see
"https://www.postgresql.org/docs/current/ddl-partitioning.html".

First you need to identify a criterion by which to partition your data, maybe
the record population timestamp binned at month resolution.


On a test environment, you may construct a table similar to the one currently
holding the many records, but with a different name.

Then write code that constructs the child tables, and run the SQL it produces.
If your child tables are based on the record date, construct child tables for
the timestamps of the records you already have as well as for several months
or timepoints in the future. A cron job can keep creating tables for future
timepoints.
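
For illustration, the declarative DDL for such a layout might look roughly
like the following; the table and column names are placeholders, and the
documentation linked above covers the details.

-- parent table, partitioned by the record population timestamp
CREATE TABLE big_table_new (
    id          bigint      NOT NULL,
    payload     text,
    created_at  timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

-- one child table per month; a cron job can keep adding future months
CREATE TABLE big_table_new_2020_10 PARTITION OF big_table_new
    FOR VALUES FROM ('2020-10-01') TO ('2020-11-01');
CREATE TABLE big_table_new_2020_11 PARTITION OF big_table_new
    FOR VALUES FROM ('2020-11-01') TO ('2020-12-01');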

Then write code to populate the new tables piecemeal, using WHERE clauses to
select the data from the current production table.

You may choose to have this code use COPY or INSERT to populate the specific
partition table.


Clone your current application and modify the code so that it inserts
directly into the specific child table, or leave the writing to the specific
child table to an on-insert trigger (with declarative partitioning, rows are
routed to the right partition automatically).

Test the data population via the application to see whether the child tables
are being populated as expected.

If all is well, schedule downtime during which you can implement these changes
in your production environment.



Allan.


Re: protect data of postgres database

2020-12-16 Thread Allan Kamau
A good place to start would be here
"https://www.postgresql.org/docs/13/user-manag.html" and
"https://www.postgresql.org/docs/13/client-authentication.html" and
"https://www.postgresql.org/docs/current/ssl-tcp.html".

Additionally:
1) You may need to look into securing access to the physical server which
hosts your PostgreSQL database.
2) Look into OS security and securing your OS (Operating System) users.
3) Look into network security and network firewalls; you may also configure
and activate firewall rules on your OS.
4) You may need to ensure your client applications are not prone to SQL
injection. SQL injection can be mitigated by the use of bind parameters when
supplying values to your database queries.
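
As a minimal sketch (the connection string, table and column names are
placeholders), passing values as bind parameters with psycopg2 keeps user
input out of the SQL text entirely:

import psycopg2

conn = psycopg2.connect("host=localhost dbname=your_pg_db_name user=your_username")
cur = conn.cursor()
user_supplied_email = "alice@example.org'; DROP TABLE customer; --"  # hostile input
# The driver sends the value separately from the statement text, so it is
# treated purely as data and cannot change the shape of the query.
cur.execute("SELECT id, name FROM customer WHERE email = %s", (user_supplied_email,))
rows = cur.fetchall()
cur.close()
conn.close()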

Allan.



On Wed, Dec 16, 2020 at 12:35 PM Atul Kumar  wrote:

> Hi,
>
> How can I protect my data by using any security on my postgres database.
>
>
> Please help me.
>
>
> Regards,
> Atul
>
>
>


Re: Index corruption revealed after upgrade to 11.17, could date back to at least 11.12

2022-10-27 Thread Allan Kamau
On Thu, Oct 27, 2022 at 10:20 AM Kristjan Mustkivi 
wrote:

> Dear community,
>
> Right after upgrading our postgres servers from 11.15 to 11.17 we
> started to encounter problems with data. Namely, when the query hit
> the index, it returned a single row; when the query hit a relation
> directly, it returned more than one row. Attempt to REINDEX revealed
> the underlying data had duplicates (unique index reindexing failed).
>
> Version facts:
> we started out with 11.12
> jan 2022 upgraded to 11.14
> mar 2022 to 11.15
> oct 2022 to 11.17
>
> We are not sure when this corruption actually happened. Could it be
> related to the indexing bugs reported in
> https://www.postgresql.org/docs/release/11.14/? And the condition only
> became known to us after 11.17 rollout which can perhaps be explained
> by the following: while 11.17 does not have any outstanding index
> related fixes, then https://www.postgresql.org/docs/release/11.15/
> mentions fix for index-only scans and so does
> https://www.postgresql.org/docs/release/11.16/.
>
> The bottom line is we would like to understand if the index corruption
> and its manifestation is explained by the above release fixes or is
> there something else that should be investigated further here with the
> help from the community.
>
> With best regards,
> --
> Kristjan Mustkivi
>
> Email: kristjan.mustk...@gmail.com
>
>
Hi Kristjan,
What if you select the row id and the column covered by the problematic index
into a new table, then run queries on that table to test whether the column
the index was reported on really is unique?
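
Concretely (problem_table and problem_col are placeholders for your table and
indexed column, with ctid standing in for a row identifier), something along
these lines would show whether duplicates really exist in the underlying data:

-- copy the indexed column plus a row identifier into a scratch table
CREATE TABLE dup_check AS
    SELECT ctid AS row_id, problem_col
    FROM problem_table;

-- any value returned here is a duplicate the unique index should have prevented
SELECT problem_col, count(*)
FROM dup_check
GROUP BY problem_col
HAVING count(*) > 1;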

Allan.


Re: Index corruption revealed after upgrade to 11.17, could date back to at least 11.12

2022-10-27 Thread Allan Kamau
On Thu, Oct 27, 2022 at 10:26 AM Allan Kamau  wrote:

>
>
> On Thu, Oct 27, 2022 at 10:20 AM Kristjan Mustkivi 
> wrote:
>
>> Dear community,
>>
>> Right after upgrading our postgres servers from 11.15 to 11.17 we
>> started to encounter problems with data. Namely, when the query hit
>> the index, it returned a single row; when the query hit a relation
>> directly, it returned more than one row. Attempt to REINDEX revealed
>> the underlying data had duplicates (unique index reindexing failed).
>>
>> Version facts:
>> we started out with 11.12
>> jan 2022 upgraded to 11.14
>> mar 2022 to 11.15
>> oct 2022 to 11.17
>>
>> We are not sure when this corruption actually happened. Could it be
>> related to the indexing bugs reported in
>> https://www.postgresql.org/docs/release/11.14/? And the condition only
>> became known to us after 11.17 rollout which can perhaps be explained
>> by the following: while 11.17 does not have any outstanding index
>> related fixes, then https://www.postgresql.org/docs/release/11.15/
>> mentions fix for index-only scans and so does
>> https://www.postgresql.org/docs/release/11.16/.
>>
>> The bottom line is we would like to understand if the index corruption
>> and its manifestation is explained by the above release fixes or is
>> there something else that should be investigated further here with the
>> help from the community.
>>
>> With best regards,
>> --
>> Kristjan Mustkivi
>>
>> Email: kristjan.mustk...@gmail.com
>>
>>
> Hi Kristjan,
> What if you select the row id and the column covered by the problematic
> index into a new table, then run queries on that table to test whether the
> column the index was reported on really is unique?
>
> Allan.
>

How was the data "transferred" between upgrades? Was it by dump and restore?
If you still have the 11.15 instance running with the data, you may select
the row id and the specific column the index is based on into a new table and
run queries on it to determine whether the values therein are unique.
Likewise do the same for the 11.17 version.

Is it possible to build and install PG 15 from source in a different
directory (using --prefix), perform a pg_dump using the binaries of this
installation into a directory, and restore it into a PG 15 cluster configured
to listen on a different TCP/IP port from the one you are currently using
with the 11.17 instance? Once started, test to see if the index anomaly is
present in the PG 15 instance. Alternatively you may use the PG 15 Docker
image and start a PG 15 container for your tests instead of having to build
and install PG 15.

-Allan


Re: Index corruption revealed after upgrade to 11.17, could date back to at least 11.12

2022-10-27 Thread Allan Kamau
On Thu, Oct 27, 2022 at 12:18 PM Peter J. Holzer  wrote:

> On 2022-10-27 10:55:31 +0300, Kristjan Mustkivi wrote:
> > We use dockerized postgres.
>
> So that means you aren't just replacing PostgreSQL, but your complete OS
> (except the kernel). What is the source of your docker images? Do they
> all use the same base OS distribution? Are the locale definitions the
> same?
>
> (Just trying to rule other other possible error sources.)
>
>
> > I do apologize, but I do not understand the value of doing that select
> > juggling.
>
> I think Allan may have misread your mail.
>
> hp
>
> --
>_  | Peter J. Holzer| Story must make more sense than reality.
> |_|_) ||
> | |   | h...@hjp.at |-- Charles Stross, "Creative writing
> __/   | http://www.hjp.at/ |   challenge!"
>



Hi Peter,

It was a long shot. I was trying to suggest determining whether the data does
actually contain duplicate values in the mentioned columns, and then testing
whether index construction on the new table succeeds on both versions, to see
if the error is reproducible.

-Allan.


Geographic coordinate values format conversion to DD (Decimal Degrees) format

2018-12-04 Thread Allan Kamau
I have data that contains geographic coordinate values in various
geographic coordinate system formats such as "degrees minutes seconds" and
"degrees decimal minutes".
I would like to convert these geographic coordinate values into decimal
degrees format.
Does PostgreSQL (more specifically PostGIS) have functions for these types
of conversions?

Below are examples of the geographic coordinate values I have, coupled with
the resulting decimal-degrees values.
39.529053 N 107.772406 W=39.5290530°, -107.7724060°
27.485973 S 153.190699 E=-27.4859730°, 153.1906990°
30°32’39” N, 91°07’36” E=30.5441667°, 091.127°
27.485973 S 153.190699 E=-27.4859730°, 153.1906990°
1¡20'1N 103¡45'15E=01.3336111°, 103.7541667°
S 20 10.8035165 W 176 36.074496=-20.1800586°, -176.6012416°

The "°" sign in the results is optional.

Allan.


Re: Geographic coordinate values format conversion to DD (Decimal Degrees) format

2018-12-05 Thread Allan Kamau
Thank you very much Paul. Your suggestions and input have spared me many
hours of trying to identify applications and functions to perform this
transformation.
I am now installing PostGIS.

Allan.


On Wed, Dec 5, 2018 at 1:25 AM Paul Ramsey 
wrote:

>
> On Dec 4, 2018, at 12:36 PM, Allan Kamau  wrote:
>
> Does PostgreSQL (more specifically PostGIS) have functions for these types
> of conversions.
>
> Below are examples of the geographic coordinates values I have coupled
> with the resulting decimal degrees values.
> 39.529053 N 107.772406 W=39.5290530°, -107.7724060°
> 27.485973 S 153.190699 E=-27.4859730°, 153.1906990°
> 30°32’39” N, 91°07’36” E=30.5441667°, 091.127°
> 27.485973 S 153.190699 E=-27.4859730°, 153.1906990°
> 1¡20'1N 103¡45'15E=01.3336111°, 103.7541667°
>
> SELECT st_astext(txt2geometry('S 20 10.8035165 W 176 36.074496'));
>
> =-20.1800586°, -176.6012416°
>
> The "°" sign in the results is optional.
>
>
> Nope, you’ve got a big ugly pattern matching problem there, unfortunately,
> and probably are going to have to regex your way out of the bag. PostGIS
> will help you output forms like that, but it doesn’t have any general
> handling of arbitrary DMS strings.
>
> http://postgis.net/docs/manual-2.5/ST_AsLatLonText.html
>
> Here’s a PLPGSQL example that does half of your cases.
>
> CREATE OR REPLACE FUNCTION txt2geometry(textcoord text)
> RETURNS geometry AS
> $$
> DECLARE
> textarr text[];
> sep text;
> lon float8;
> lat float8;
> BEGIN
> textarr := regexp_matches(textcoord, '(\d+)(\D?)(\d{2})\D?([\d\.]+)\D?
> ([NS]),? (\d+)\D?(\d{2})\D?(\d+)\D? ([EW])');
> sep := textarr[2];
> RAISE NOTICE '%', textarr;
> -- DD.DD
> IF sep = '.' THEN
> lat := int4(textarr[1]) + int4(textarr[3]) / 100.0 + float8(textarr[4]) /
> pow(10, length(textarr[4])) / 100;
> lon := int4(textarr[6]) + int4(textarr[7]) / 100.0 + float8(textarr[8]) /
> pow(10, length(textarr[8])) / 100;
> -- DD.MM'SS"
> ELSE
> lat := int4(textarr[1]) + int4(textarr[3]) / 60.0 + float8(textarr[4]) /
> pow(10, length(textarr[4])) / 36;
> lon := int4(textarr[6]) + int4(textarr[7]) / 60.0 + float8(textarr[8]) /
> pow(10, length(textarr[8])) / 36;
> END IF;
> IF textarr[5] = 'S' THEN
> lat := -1 * lat;
> END IF;
> IF textarr[9] = 'W' THEN
> lon := -1 * lon;
> END IF;
> RETURN ST_SetSRID(ST_MakePoint(lon, lat), 4326);
> END;
> $$
> LANGUAGE 'plpgsql' IMMUTABLE
> COST 100;
>
>
>
>
>


Re: Capacity Planning

2019-01-19 Thread Allan Kamau
Some of the aspects you may want to consider are:
1) Per unit of time (day or week), how much data will be persisted to the
database, in number of records and total size (a worked example follows the
list).
2) How much of this data will be updated, and how frequently, in a given
time unit (day or week).
3) Will the databases hosted on the server be used for OLTP (OnLine
Transactional Processing), OLAP (OnLine Analytical Processing) or a
combination of the two?
4) In case of hardware failure, electrical outage or outage due to other
factors including human activity, how much loss of data (and time) can be
tolerated? See "https://www.postgresql.org/docs/11/backup.html" and
"https://www.postgresql.org/docs/11/high-availability.html".
5) The size of the budget for hardware and manpower.
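
For point 1, a rough back-of-the-envelope calculation (the figures are purely
illustrative) already gives a first sizing estimate:

  2,000,000 rows/day x 200 bytes/row  =  ~400 MB/day of raw table data
  400 MB/day x 30 days                =  ~12 GB/month, before indexes, WAL and bloat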

Allan.

On Sat, Jan 19, 2019 at 1:38 PM bhargav kamineni  wrote:

> Hi Team,
>
>
> What are the check lists for doing capacity planning for a postgresql
> server ?
>
> Regards,
> BK.
>


building a singularity image from docker hub postgres image

2024-01-29 Thread Allan Kamau
I am trying to build a singularity image from postgres docker image.
I am issuing the command below.

$ singularity build /local/data/some/postgres.16.1.sif
docker://postgres/postgres:16.1

INFO:Starting build...
INFO:Fetching OCI image...
FATAL:   While performing build: conveyor failed to get: GET
https://index.docker.io/v2/postgres/postgres/manifests/16.1: UNAUTHORIZED:
authentication required; [map[Action:pull Class: Name:postgres/postgres
Type:repository]]


What is the url I should use?

-Allan.


Re: building a singularity image from docker hub postgres image

2024-02-01 Thread Allan Kamau
On Tue, Jan 30, 2024 at 12:27 PM Justin Clift  wrote:

> On 2024-01-30 15:52, Allan Kamau wrote:
> > I am trying to build a singularity image from postgres docker image.
> > I am issuing the command below.
> >
> > $ singularity build /local/data/some/postgres.16.1.sif
> > docker://postgres/postgres:16.1
> >
> > INFO:Starting build...
> > INFO:Fetching OCI image...
> > FATAL:   While performing build: conveyor failed to get: GET
> > https://index.docker.io/v2/postgres/postgres/manifests/16.1:
> > UNAUTHORIZED:
> > authentication required; [map[Action:pull Class: Name:postgres/postgres
> > Type:repository]]
> >
> > What is the url I should use?
>
> Not personally familiar with Singularity, but this post looks like it
> should help:
>
>
> https://www.linuxwave.info/2022/04/running-postgresql-database-using.html
>
> The example command they have there is:
>
>singularity pull docker://postgres:14.2-alpine3.15
>
> So the format of the url *seems* like it should be:
>
>docker://postgres:16.1
>
> That just pure guess work though. :)
>
> Regards and best wishes,
>
> Justin Clift
>


Thanks Justin for the corrections and resources.
I have now put together the command below that does what I was looking for.

$ cd /tmp/;singularity pull postgres_16.1-alpine3.19.sif
docker://postgres:16.1-alpine3.19;

I am using the singularity container for the postgresql binaries instead
of having to install the binaries on the computer. For example I am using
pg_dump to take backups.


This could be of help to others: below is an example of using the PostgreSQL
binaries in the Singularity image built from the PostgreSQL Docker image.

##construct the singularity image
singularity_image_dfn="/local/data/bcl-sdv/singularity/pg";
mkdir -p "${singularity_image_dfn}";
date;time singularity build "${singularity_image_dfn}"/postgres_16.1-alpine3.19.sif docker://postgres:16.1-alpine3.19;date;

##Example usage of the singularity image
date;time {
singularity_image_fn="/local/data/bcl-sdv/singularity/pg/postgres_16.1-alpine3.19.sif";
date;time singularity exec \
    --workdir /tmp \
    "${singularity_image_fn}" \
    "psql" \
    "postgresql://someusername:somepassword@somehostname:5423/some_db_name" \
    "-c" \
    "SELECT version();" \
;date;
};date;


Allan.


Re: More than one UNIQUE key when matching items..

2021-03-20 Thread Allan Kamau
On Sat, Mar 20, 2021 at 6:52 PM Ron Clarke  wrote:

> /*
> I'm trying to port a system from SQL server, and at the same time better
> learn postgreSQL.
>
> I've come across a problem that is easily solved in that world, but I am
> struggling to find an approach in postgres that works.
>
> We have 2 sets of events A and B (sets), they have a shared number
> (ncode), both have unique Id's
>
> We want to link items of set A to those of set B, but each item of each
> set can only be linked once.That is we do not want to link all set 'A'
> items to all set 'B' Items with the same code.
>
> In SQL Server this is easy, we insert the records into a temporary table
> with separate Unique indexes on the id for set a and the ids for set b and
> put the 'ignore_dup_key' on which tells SQL Server to ignore duplicate rows
> and carry on.
>
> The nearest to this at first appears to be the ON CONFLICT ON CONSTRAINT
> IGNORE in Postgres. But this only works with a single constraint, at a time
> i.e. we can't set the ON CONFLICT ON CONSTRAINT IGNORE to work with
> multiple UNIQUE indexes.
>
> To show the problem:
>
> I'm using PostgreSQL version 11.
>
> */
> -- source data
> WITH sd AS (
> SELECT iid, s, ncode FROM (
> VALUES (1, 'A', 10),
>(2, 'A', 30),
>(3, 'A', 10),
>(4, 'B', 10),
>(5, 'B', 20),
>(6, 'B', 10)
> )
> AS tx (iid, s, ncode))
> SELECT iid, s, ncode FROM sd
>
> /* The target result would be :
>
>   id:1, A, 10 this matches id:4, B, 10
>   id:3, A, 10 this matches id:6, B, 10
> */
>
> --  Example to get the *wrong *answer, i.e. both sets of links
>
> WITH
> sd (i, s, n ) AS (
> SELECT iid, s, ncode FROM (
> VALUES (1, 'A', 10),
>   (2, 'A', 30),
>   (3, 'A', 10),
>   (4, 'B', 10),
>   (5, 'B', 20),
>   (6, 'B', 10)
> )
> AS tx (iid, s, ncode))
> ,
> x AS ( SELECT
>
> ax.i as ia,
> ax.s as sa,
> ax.n as na,
> bx.i as ib,
> bx.s as sb,
> bx.n as nb,
> ROW_NUMBER () OVER (
>
> PARTITION BY bx.i
>
> ORDER BY
>
> ax.i ) as rx
>
> FROM sd AS ax
> INNER JOIN sd AS bx ON ax.n = bx.n and ax.i != bx.i and bx.s = 'B'
> WHERE ax.s = 'A'
> )
> SELECT ia,ib, na, rx FROM x
> ;
>
>
> /*   I've tried using a recursive CTE where I'm trying to exclude results
> from the result set that have already been identified, but I can't get an
> allowed syntax.
>   Doesn't seem to allow joins to the recursive term to exclude results.
> */
>
>
> /*   I've tried Unique and Exclusion constraints on temporary table, e.g */
> -- similar Example to get the wrong answer, i.e. both sets of links
>
> DROP TABLE IF EXISTS links ;
>
> CREATE TEMPORARY TABLE links
> (mid serial ,
> ia int ,
> -- ia int UNIQUE,
> ib int ,
> -- ib int UNIQUE,
> EXCLUDE USING gist (ia WITH =, ib WITH =)
>
>   ) ;
>
> WITH
> sd (i, s, n ) AS (
> SELECT iid, side, ncode FROM (
> VALUES (1, 'A', 10),
>   (2, 'A', 30),
>   (3, 'A', 10),
>   (4, 'B', 10),
>   (5, 'B', 20),
>   (6, 'B', 10)
> )
> AS tx (iid, side, ncode))
> ,
> x AS (
> SELECT
> ax.i as ia,
> ax.s as sa,
> ax.n as na,
> bx.i as ib,
> bx.s as sb,
> bx.n as nb,
> ROW_NUMBER () OVER (
> PARTITION BY bx.i
> ORDER BY
> ax.i
> ) as rx
> FROM sd AS ax
> INNER JOIN sd AS bx ON ax.n = bx.n and ax.i != bx.i and bx.s = 'B'
> WHERE ax.s = 'A'
> )
> -- SELECT * FROM x
> INSERT INTO links(ia,ib)
> SELECT ia, ib FROM x
> ON CONFLICT ON CONSTRAINT links_ia_ib_excl DO NOTHING;
>
>   --
> SELECT * from links;
>
> /*   I've also tried and failed to use array(ia,ib) within or as computed
> column of an Exclusion constraint of && s on temporary table, e.g
>   but can't find any syntax that doesn't result in an error
>   */
>
>
> DROP TABLE IF EXISTS links ;
>
> CREATE TEMPORARY TABLE links
> (mid serial ,
> ia int ,
> -- ia int UNIQUE,
> ib int ,
> -- ib int UNIQUE,
> ix int[],
>   EXCLUDE USING gist (ix WITH &&)
>   ) ;
>
> -- This gives me:
> -- ERROR:  data type integer[] has no default operator class for access
> method "gist"
>
> -- I have the btree_gist extension installed
>
>
> /*
>
> I appreciate I could create a cursor from a list of proposed links and
> step through each one, checking if the id value has been "used up"
> but I am trying to keep this as a set based operation to give me the
> results in one statement.
>
> There are some similar questions w.r.t. duplicate detection, but these
> again seem to be solved by evaluating each proposed record individually.
> If that's just what I have to do then so be it. There is probably a
> simple 'postgreSQL' freindly approach I'm still yet to discover having spent
> too long in Sybase and SQL Server worlds.
>
> Thanks for looking at this
>
> */
>
>
>



Hi Ron,

How about the code below.
It may require testing with more data.

WITH _sd AS (
    SELECT iid, s, ncode FROM (
        VALUES (1, 'A', 10), (2, 'A', 30), (3, 'A', 10),
               (4, 'B', 10), (5, 'B', 20), (6, 'B', 10)
    ) AS tx (iid, s, ncode)
),
-- number the 'A' rows and the 'B' rows independently within each ncode,
-- then pair the Nth 'A' with the Nth 'B' so each id is linked at most once
_a AS (SELECT iid, ncode, ROW_NUMBER() OVER (PARTITION BY ncode ORDER BY iid) AS rn
       FROM _sd WHERE s = 'A'),
_b AS (SELECT iid, ncode, ROW_NUMBER() OVER (PARTITION BY ncode ORDER BY iid) AS rn
       FROM _sd WHERE s = 'B')
SELECT _a.iid AS ia, _b.iid AS ib, _a.ncode
FROM _a JOIN _b ON _a.ncode = _b.ncode AND _a.rn = _b.rn;

Re: ignore tablespace in schema definition queries

2021-04-03 Thread Allan Kamau
On Sat, Apr 3, 2021 at 1:59 PM Joao Miguel Ferreira <
joao.miguel.c.ferre...@gmail.com> wrote:

> Hello all,
>
> I have a big set of migration queries (that I do not control) that I must
> run on my automatic test database, in order to set ip up and run tests.
> These queries create all sorts of things like indexes, tables, and so. But
> they also include the specification of the tablespace they expect to use
> (tablespace_000, tablespace_001, up to tablespace_999). This would require
> me to setup hundreds of tablespaces before I can start the migration
> process, and run the tests.
>
> Is there a way to tell the postgres server to ignore that part and just
> use some default tablespace? My present situation is that I can not bring
> the test database to a usable point because many migration queries fail due
> to the tablespace they need has not been created. My problem is that I
> would like to avoid creating them.
>
> Thanks
> João
>
>
Hi João,

Below are some thoughts, not a solution.

Since the migration queries seem to be part of a procedural process you must
follow, it may be advisable to run them as they are, so as to avoid probable
complications later on.
There may be a good reason the migration queries construct tablespaces; doing
so also provides the opportunity to specify the directory where each
tablespace's data is stored.

The SQL for the 1000 tablespaces (tablespace_000 through tablespace_999) can
be generated with generate_series, either directly in psql or from Python via
psycopg2; note that CREATE TABLESPACE cannot be executed inside a transaction
block, so run each statement in its own (autocommitted) transaction rather
than inside a single PL/pgSQL block.
Here you could construct the tablespaces before you run the migration
queries.
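
For example, from psql (the tablespace locations below are placeholders; the
directories must already exist and be owned by the postgres OS user) the
statements can be generated and executed one by one with \gexec:

-- generate CREATE TABLESPACE statements for tablespace_000 .. tablespace_999
-- and run each returned statement individually (psql autocommits them)
SELECT format('CREATE TABLESPACE tablespace_%s LOCATION %L',
              lpad(i::text, 3, '0'),
              '/pg_tblspc/tablespace_' || lpad(i::text, 3, '0'))
FROM generate_series(0, 999) AS i
\gexec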


Allan.


GIN indexed unique constraint?

2021-06-27 Thread Allan Kamau
Hi,

Is it possible to declare a UNIQUE constraint that uses GIN indexing?
If so, what would the definition look like?

Allan.


Re: GIN indexed unique constraint?

2021-06-27 Thread Allan Kamau
On Sun, Jun 27, 2021 at 9:41 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Sun, Jun 27, 2021 at 11:18 AM Allan Kamau 
> wrote:
> >> Is it possible to declare a UNIQUE constraint that uses GIN indexing?
>
> > Doesn't seem to be possible.  The btree_gin extension would provide the
> > necessary code but it states explicitly that:
> > "... and they lack one major feature of the standard B-tree code: the
> > ability to enforce uniqueness."
>
> GIN stores all the component elements of its input values separately.
> It'd be tremendously hard even to identify which inputs share exactly
> the same component elements; let alone whether inputs sharing the
> same elements should be considered "equal".  For example, ARRAY[1,2]
> and ARRAY[2,1] would give rise to identical sets of index entries in
> a GIN array_ops index.
>
> In short, no, this isn't something you do with a GIN index.
>
> regards, tom lane
>

Thank you David and Tom for your speedy and informative responses.

Allan.


Formulating SQL queries via Retrieval Augmented Generation (RAG).

2024-12-10 Thread Allan Kamau
I am not sure if this is the right place to post a PostgreSQL + Retrieval
Augmented Generation (RAG) question, but I have tried other sites without
receiving any response.

I have managed to use RAG to query data in a PostgreSQL table using Python
and the langchain_community module.

But I have been struggling for weeks to find a way of using RAG to generate
WITH RECURSIVE CTE queries over two tables (modelling node and edge graph
data).
The SQL queries generated by RAG for this graph data model are error-prone.
Is there an example somewhere where RAG has been used to successfully
generate WITH RECURSIVE CTEs over two-table graph data?

Below is the main code section I am using to perform RAG on PostgreSQL.

llm = ChatGroq(model="mixtral-8x7b-32768", temperature=0);

chain = (
RunnablePassthrough.assign(query=sql_chain).assign(
schema=lambda _: db.get_table_info(),
response=lambda vars: db.run(vars["query"]),
)
| prompt
| llm
)
response_obj: langchain_core.messages.ai.AIMessage=chain.invoke({
"question": user_query_str,
"chat_history": chat_history_list,
});



Any pointers are welcome.

-Allan


Re: Ideas about presenting data coming from sensors

2025-02-14 Thread Allan Kamau
On Thu, Feb 13, 2025 at 9:44 PM Thiemo Kellner 
wrote:

> 13.02.2025 10:54:05 Achilleas Mantzios - cloud <
> a.mantz...@cloud.gatewaynet.com>:
>
> > If we followed a strict normalized approach then we would create
> additionally 11 tables each tag of type c) . And we are not guaranteed that
> the same tags would have the same structure over the whole
> fleet/manufacturers. So we are thinking of putting all semi-structured data
> of tags of type c) into one table with a single col of type jsonb . From
> what I read timescaledb plays nice with jsonb (or at least not bad).
> >
> > Do you ppl see any gotcha with this approach ?
> >
> > For starters we will not convert yet to timescaledb, but store them and
> handle them like normal tables. At least until we grasp the ins and outs of
> this.
>
> I have not come to see the real advantage of putting data into something
> like Jason or XML columns. Sue, you make life easier initially saving the
> time to put them into columns one by one, on the other end you have the
> hassle of dissecting the JSON, XML you name it when you retrieve/select the
> data, every query. While the query might stay stable the  computational
> effort dissection will have to be done with every query execution.
>
> For c) you could go to 6th normal form, or what number that is, by a table
> like TAG(id, tag_name, tag_value). You would have to convert the values to
> string to enter them. It is a pain in the butt, imho, retrieving them
> again. You can alleviate by having a specific column/table per data type,
> e.g. TAG(id, tag_name,tag_value_varray,tag_value_date,...).
>
> What is the downside of having each a table for the special tags? More
> effort in setting up.
> The upside is less effort in retrieval, and a much more understandable
> model.
>
> If your tag structure is volatile, you might have generic column names on
> a table mapping them in a view to speaking names. Taking this further does
> anchor modelling https://www.anchormodeling.com/
> .
>
>
>
Though you haven't asked for comments on database design, I have a
suggestion: have you considered table partitioning based on a well-defined
date interval, maybe monthly?
Here your DB data-population application would determine the name of the
target partition from the current date, check whether that table exists,
create it if it does not, and then proceed to populate it accordingly.
This may prove useful in the future.
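
A minimal sketch of the "create the partition if missing" step, assuming
declarative partitioning and an illustrative parent table sensor_reading
partitioned by RANGE on a reading_time column:

-- run by the data-population application before inserting this month's rows
DO $$
DECLARE
    part_name   text        := format('sensor_reading_%s', to_char(now(), 'YYYYMM'));
    month_start timestamptz := date_trunc('month', now());
BEGIN
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF sensor_reading
             FOR VALUES FROM (%L) TO (%L)',
        part_name, month_start, month_start + interval '1 month');
END
$$;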

-Allan.