GitHub user saulfrr added a comment to the discussion: Issues Creating a Dataset with a Firebird Database (No Schemas)

Hope this helps

# Firebird → MySQL ETL Automation

## 1. Overview

This AWS Systems Manager Automation executes on EC2 instances tagged 
`Name=TAG_HERE` to:

1.  **Start** the Firebird 3.0 service
    
2.  **Download** the latest Firebird backup (`.fbk`) from S3
    
3.  **Restore** the backup into a local Firebird database
    
4.  **Extract** DDL and **convert** it to MySQL syntax
    
5.  **Export** each Firebird table to CSV
    
6.  **Create** the target MySQL schema
    
7.  **Import** the CSV data into MySQL
    
8.  **Download** & **restore** the latest MySQL dump from S3 (different Superset DBs, not Firebird related)
    
9.  **Create** MySQL views (e.g. for Superset)
    
10.  **Stop** Firebird
    
11.  **Cleanup** temporary files and old binlogs
    
Logs for _all_ steps stream to CloudWatch Logs group `/aws/ssm/automation-logs`.

![933254E3-E116-44A2-A9D7-6DC68EF690AE](https://github.com/user-attachments/assets/b017aba4-e9dd-43fb-ba4a-6e1ef1c13e8d)

----------

## 2. Prerequisites

-   **EC2 Instance**
    
    -   Tagged `Name=TAG_HERE`
        
    -   SSM Agent installed
        
    -   IAM role granting SSM Automation + S3 read access
        
-   **Software**
    
    -   Firebird 3.0 server
        
    -   MySQL client/server
        
    -   Python 3 with `fdb` driver
        
    -   AWS CLI configured
        
-   **S3 Buckets**
    
    -   Firebird backups: `s3://BUCKET_NAME_HERE/`
        
    -   MySQL dumps: `s3://BUCKET_NAME_HERE/databases/`
        

----------

## 3. Configuration

### 3.1. Directory Layout

All scripts live under a single folder, parameterized as `ScriptsDirectory` 
(e.g. `/home/ubuntu/etl-scripts`). Inside you have:

```bash
get_latest_phdb_backup.sh
restore_phdb_firebird.sh
convert_firebird_to_mysql.py
export_tables_ec2.py
create_schema.sh
import_tables_ec2_csv.sh
get_latest_db_backup.sh
restore_latest_mysql_backup.sh
create_views.sql

tables.txt            ← generated by restore_phdb_firebird.sh
db_ddl.sql            ← extracted DDL
converted_schema.sql  ← output of converter
phdb.fbk              ← downloaded FB backup
phdb_csv_tables/      ← CSV exports
logs/                 ← step logs
latest_backup.zip     ← downloaded MySQL dump
db-dumps/             ← unzipped SQL file(s)
```

Some of the scripts are written in Python because Firebird 3.0 runs on an **ARM64** instance while **FBExport** is compiled for **AMD64**. Running it under emulation is possible, but extremely slow.

### 3.2. MySQL Client (`my.cnf`)

Place a MySQL client config at `~/.my.cnf` with credentials:

```ini
[client]
user=admin
password=ADMIN_PASSWORD_HERE    # <-- replace with secure secret
host=localhost
```
## 4. Script Summaries

### 4.1. `get_latest_phdb_backup.sh`

-   **Purpose**:  
    Fetch the newest Firebird `.fbk` backup from S3.
    
-   **Key steps**:
    
    1.  `aws s3 ls s3://BUCKET_NAME_HERE/${PREFIX}`
        
    2.  Filter by `phdb_*.fbk`, pick the last (most recent)
        
    3.  `aws s3 cp … ./phdb.fbk`

**get_latest_phdb_backup.sh**
```bash
#!/usr/bin/env bash
# Variables
BUCKET="BUCKET_NAME_HERE"
PREFIX=""   # If there's a prefix inside the bucket e.g. "some/directory/"
LOCAL_DEST="./phdb.fbk"

# 1. List all files in the bucket, filtered by the prefix if needed
# 2. Filter down to only 'phdb_...fbk' files
# 3. Sort and take the last entry: `aws s3 ls` prints in ascending date order,
#    and the date-stamped filenames sort the same way lexicographically,
#    so the last one should be the newest.

LATEST_FILE=$(aws s3 ls "s3://$BUCKET/$PREFIX" | grep 'phdb_.*\.fbk' | sort | tail -n 1 | awk '{print $4}')

if [ -z "$LATEST_FILE" ]; then
    echo "No phdb_*.fbk files found in s3://$BUCKET/$PREFIX"
    exit 1
fi

echo "The latest phdb backup file is: $LATEST_FILE"

# Now copy this latest file
aws s3 cp "s3://$BUCKET/$PREFIX$LATEST_FILE" "$LOCAL_DEST"

# Check exit status
if [ $? -eq 0 ]; then
    echo "Successfully copied $LATEST_FILE to $LOCAL_DEST"
else
    echo "Failed to copy $LATEST_FILE"
    exit 1
fi
```
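Since the write-up notes that some steps ended up in Python anyway (ARM64 host vs AMD64 FBExport), the latest-file selection above can also be sketched in Python. This is a hypothetical helper, not part of the pipeline; it relies on the same assumption as the shell version, namely that the date-stamped filenames sort lexicographically in chronological order.

```python
import re

def latest_phdb_backup(listing):
    """Pick the newest phdb_*.fbk filename from `aws s3 ls` text output."""
    names = []
    for line in listing.splitlines():
        parts = line.split()
        # `aws s3 ls` lines are: date, time, size, key
        if len(parts) >= 4 and re.fullmatch(r'phdb_.*\.fbk', parts[3]):
            names.append(parts[3])
    return max(names, default=None)  # lexicographic max == newest

if __name__ == "__main__":
    sample = (
        "2025-02-14 02:30:01   1048576 phdb_2025-02-14.fbk\n"
        "2025-02-15 02:30:01   1048576 phdb_2025-02-15.fbk\n"
        "2025-02-15 02:31:00       512 notes.txt\n"
    )
    print(latest_phdb_backup(sample))  # -> phdb_2025-02-15.fbk
```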
----------

### 4.2. `restore_phdb_firebird.sh`

-   **Purpose**:  
    Restore the downloaded `phdb.fbk` into `/firebird/data/phdb.fdb`.
    
-   **Key steps**:
    
    1.  Unzip if zipped
        
    2.  Run `gbak` with `SYSDBA/FIREBIRD_PASSWORD`
        
    3.  Extract DDL:

```bash
isql -x /firebird/data/phdb.fdb -user SYSDBA -password FIREBIRD_PASSWORD \
  > db_ddl.sql
grep "CREATE TABLE" db_ddl.sql | awk '{print $3}' > tables.txt
```
**restore_phdb_firebird.sh**
```bash
#!/usr/bin/env bash
# Variables
DB_BACKUP_FILE="./phdb.fbk"   # Path to the backup file downloaded earlier
DB_FILE="/firebird/data/phdb.fdb"
SYSDBA_USER="SYSDBA"
SYSDBA_PASSWORD="FIREBIRD_PASSWORD_HERE"
FIREBIRD_SERVICE="/etc/init.d/firebird3.0"

# Restore the database from the backup
echo "Restoring database from $DB_BACKUP_FILE..."
if [ ! -f "$DB_BACKUP_FILE" ]; then
    echo "Backup file $DB_BACKUP_FILE not found. Aborting."
    exit 1
fi

# Remove old database file if it exists
if [ -f "$DB_FILE" ]; then
    echo "Removing old database file $DB_FILE..."
    rm -f "$DB_FILE"
fi

# Run gbak restore
# /opt/firebird/bin/gbak -c -v -user $SYSDBA_USER -password $SYSDBA_PASSWORD -REP "$DB_BACKUP_FILE" "$DB_FILE"
gbak -c -v -user "$SYSDBA_USER" -password "$SYSDBA_PASSWORD" -REP "$DB_BACKUP_FILE" "$DB_FILE"
RESTORE_STATUS=$?

if [ $RESTORE_STATUS -ne 0 ]; then
    echo "Database restore failed with exit code $RESTORE_STATUS."
    exit 1
fi

echo "Database restored successfully to $DB_FILE."

sudo chown firebird:ubuntu "$DB_FILE"

# Run the commands after restore
echo "Extracting database DDL..."
# /opt/firebird/bin/isql -u $SYSDBA_USER -p $SYSDBA_PASSWORD $DB_FILE -x > db_ddl.sql
isql-fb -u "$SYSDBA_USER" -p "$SYSDBA_PASSWORD" "$DB_FILE" -x > db_ddl.sql

if [ $? -ne 0 ]; then
    echo "Failed to extract DDL using isql."
    exit 1
fi

echo "Extracting table names..."
grep "CREATE TABLE" db_ddl.sql | awk '{print $3}' > tables.txt

if [ $? -eq 0 ]; then
    echo "Table names extracted to tables.txt"
else
    echo "Failed to extract table names."
    exit 1
fi

echo "All steps completed successfully."
```
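The `grep`/`awk` table-name extraction above is sensitive to how `isql` formats the DDL. The same step can be sketched in Python for robustness; `extract_table_names` is an illustrative helper, not part of the pipeline:

```python
import re

def extract_table_names(ddl_text):
    """Python version of: grep "CREATE TABLE" db_ddl.sql | awk '{print $3}'
    Returns the table names from CREATE TABLE statements in Firebird DDL."""
    return re.findall(r'^\s*CREATE TABLE\s+("?[A-Za-z0-9_$]+"?)',
                      ddl_text, flags=re.IGNORECASE | re.MULTILINE)

if __name__ == "__main__":
    ddl = ('CREATE TABLE CUSTOMERS (\n  ID INTEGER\n);\n'
           'CREATE TABLE ORDERS (\n  ID INTEGER\n);\n')
    print(extract_table_names(ddl))  # -> ['CUSTOMERS', 'ORDERS']
```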

----------

### 4.3. `convert_firebird_to_mysql.py`

-   **Purpose**:  
    Translate Firebird DDL → MySQL DDL.
    
-   **I/O**:
    
    -   **Input**: `db_ddl.sql`
        
    -   **Output**: `converted_schema.sql`
        
-   **Highlights**:
    
    -   Maps types (e.g. `INTEGER`→`INT`, `BLOB`→`BLOB`, `BOOLEAN`→`TINYINT(1)`)
        
    -   Adjusts quoting, `AUTOINCREMENT`, etc.
        
**convert_firebird_to_mysql.py**
```python
#!/usr/bin/env python3
import re
# Configuration
INPUT_FILE = "db_ddl.sql"
OUTPUT_FILE = "converted_schema.sql"

# Mapping Firebird data types to MySQL data types
DATA_TYPE_MAPPING = {
    'INTEGER': 'INT',
    'SMALLINT': 'SMALLINT',
    'DOUBLE PRECISION': 'DOUBLE',
    'BOOLEAN': 'TINYINT(1)',
    'BLOB SUB_TYPE 0 SEGMENT SIZE 80': 'BLOB',
    'BLOB': 'BLOB',
    'VARCHAR': 'VARCHAR',  # Handled with size
    'CHAR': 'CHAR',
    'DATE': 'DATE',
    'TIME': 'TIME',
    'TIMESTAMP': 'DATETIME',
    # Add more mappings as needed
}

def convert_data_type(firebird_type):
    """
    Convert Firebird data type to MySQL data type.
    """
    # Handle data types with parameters, e.g., VARCHAR(20)
    varchar_match = re.match(r'VARCHAR\((\d+)\)', firebird_type, re.IGNORECASE)
    char_match = re.match(r'CHAR\((\d+)\)', firebird_type, re.IGNORECASE)
    
    if varchar_match:
        size = varchar_match.group(1)
        return f"VARCHAR({size})"
    elif char_match:
        size = char_match.group(1)
        return f"CHAR({size})"
    elif firebird_type.upper() in DATA_TYPE_MAPPING:
        return DATA_TYPE_MAPPING[firebird_type.upper()]
    else:
        # Default to TEXT if unknown type
        print(f"Warning: Unmapped data type '{firebird_type}'. Defaulting to TEXT.")
        return "TEXT"

def process_create_table(statement):
    """
    Convert a single Firebird CREATE TABLE statement to MySQL.
    Adds DROP TABLE IF EXISTS before creating the table.
    """
    # Extract table name
    table_match = re.search(r'CREATE TABLE\s+([A-Za-z0-9_]+)\s*\(', statement, re.IGNORECASE)
    if not table_match:
        print("Error: Could not find table name in statement.")
        return ""
    table_name = table_match.group(1)
    
    # Extract column definitions and constraints
    columns_section = statement.split('(', 1)[1].rsplit(')', 1)[0]
    lines = [line.strip() for line in columns_section.split(',\n')]

    columns = []
    constraints = []
    for line in lines:
        # Check for constraint
        constraint_match = re.match(r'CONSTRAINT\s+PK_[A-Za-z0-9_]+\s+PRIMARY KEY\s*\(([^)]+)\)', line, re.IGNORECASE)
        if constraint_match:
            pk_columns = constraint_match.group(1).strip()
            # Enclose column names in backticks
            pk_columns = ', '.join([f"`{col.strip()}`" for col in pk_columns.split(',')])
            constraints.append(f"PRIMARY KEY ({pk_columns})")
            continue
        
        # Otherwise, it's a column definition
        # Split by first space to separate column name and type
        parts = line.split(None, 2)  # column name, data type, and possibly constraints
        if len(parts) < 2:
            print(f"Warning: Unable to parse column definition '{line}'. Skipping.")
            continue
        column_name = parts[0].strip('"`')  # Remove any existing quotes
        data_type = parts[1].strip()
        
        # Handle data types with multiple words (e.g., DOUBLE PRECISION)
        if data_type.upper() in ['DOUBLE']:
            if len(parts) > 2 and parts[2].strip().upper().startswith('PRECISION'):
                data_type += ' ' + parts[2].strip().split()[0]  # Append 'PRECISION'
        
        mysql_data_type = convert_data_type(data_type)
        
        # Check for NOT NULL
        not_null = 'NOT NULL' if 'NOT NULL' in line.upper() else ''
        
        column_def = f"  `{column_name}` {mysql_data_type} {not_null}".strip()
        columns.append(column_def)
    
    # Assemble CREATE TABLE statement
    # Insert DROP TABLE IF EXISTS line before CREATE TABLE
    create_stmt = f"DROP TABLE IF EXISTS `{table_name}`;\n"
    create_stmt += f"CREATE TABLE `{table_name}` (\n"
    create_stmt += ",\n".join(columns)
    if constraints:
        create_stmt += ",\n" + ",\n".join([f"  {c}" for c in constraints])
    create_stmt += "\n);\n"
    
    return create_stmt

def main():
    try:
        with open(INPUT_FILE, 'r', encoding='utf-8') as infile:
            ddl_content = infile.read()
    except FileNotFoundError:
        print(f"Error: Input file '{INPUT_FILE}' not found.")
        return

    # Split the content into individual CREATE TABLE statements
    # Assumes each CREATE TABLE ends with ');'
    create_table_statements = re.findall(r'/\* Table: [^*]+\*/\s*CREATE TABLE[^;]+;\s*', ddl_content, re.DOTALL | re.IGNORECASE)

    if not create_table_statements:
        print("Error: No CREATE TABLE statements found in the input file.")
        return

    converted_statements = []
    for stmt in create_table_statements:
        mysql_stmt = process_create_table(stmt)
        if mysql_stmt:
            converted_statements.append(mysql_stmt)

    # Write to output file
    with open(OUTPUT_FILE, 'w', encoding='utf-8') as outfile:
        outfile.write("\n".join(converted_statements))
    
    print(f"Conversion completed successfully. Output saved to '{OUTPUT_FILE}'.")

if __name__ == "__main__":
    main()
```

----------
### 4.4. `export_tables_ec2.py`

-   **Purpose**:  
    Dump every table from the restored Firebird database into UTF-8 CSV.
    
-   **Config**:
    
    -   `DB_PATH=/firebird/data/phdb.fdb`
        
    -   `CSV_PATH=./phdb_csv_tables`
        
    -   `TABLES_FILE=tables.txt`
        
-   **Output**:
    
    -   `phdb_csv_tables/{table}.csv`
        
    -   `logs/export_log_*.log`
 
**export_tables_ec2.py**
```python
#!/usr/bin/env python3
import csv
import os
import sys
from datetime import datetime
import fdb  # Using fdb instead of firebird_driver

# Configuration
DB_PATH = "/firebird/data/phdb.fdb"
DB_USER = "SYSDBA"
DB_PASS = "FIREBIRD_PASSWORD_HERE"
TABLES_FILE = "tables.txt"
CSV_PATH = "./phdb_csv_tables"
LOGFILE = f"logs/export_log_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"

# Ensure CSV output directory exists
os.makedirs(CSV_PATH, exist_ok=True)
os.makedirs(os.path.dirname(LOGFILE), exist_ok=True)

def log(message):
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    with open(LOGFILE, "a") as lf:
        lf.write(f"{timestamp} - {message}\n")
    print(f"{timestamp} - {message}")

def export_table(conn, table):
    try:
        cur = conn.cursor()
        cur.execute(f"SELECT * FROM {table}")
        # Get column names from cursor description
        col_names = [desc[0] for desc in cur.description]
        csv_file = os.path.join(CSV_PATH, f"{table}_utf8_clean.csv")
        with open(csv_file, mode="w", encoding="utf-8", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(col_names)
            row_count = 0
            for row in cur:
                writer.writerow(row)
                row_count += 1
        log(f"Successfully exported table '{table}' with {row_count} rows to {csv_file}")
    except Exception as e:
        log(f"FAILED to export table '{table}': {e}")

def main():
    # Connect to the Firebird database using fdb
    try:
        conn = fdb.connect(
            dsn=f"localhost:{DB_PATH}",  # reuse the configured path
            user=DB_USER,
            password=DB_PASS,
            charset="ISO8859_1"  # Adjust if needed
        )
        log("Connected to the database successfully.")
    except Exception as e:
        log(f"ERROR connecting to the database: {e}")
        sys.exit(1)

    # Read list of tables
    if not os.path.isfile(TABLES_FILE):
        log(f"ERROR: {TABLES_FILE} not found.")
        sys.exit(1)

    with open(TABLES_FILE, "r") as tf:
        tables = [line.strip() for line in tf if line.strip()]

    # Export each table
    for table in tables:
        log(f"Exporting table: {table}")
        export_table(conn, table)

    conn.close()
    log("Export completed.")

if __name__ == "__main__":
    main()
```
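One detail worth knowing about the export above: `fdb` returns SQL NULLs as Python `None`, and `csv.writer` serializes `None` as an empty field, which MySQL's `LOAD DATA` will later read as an empty string (not NULL). A minimal standalone demonstration of the CSV side:

```python
import csv
import io

# csv.writer turns None (a SQL NULL coming from fdb) into an empty field.
buf = io.StringIO()
csv.writer(buf).writerow([1, None, "abc"])
print(repr(buf.getvalue()))  # -> '1,,abc\r\n'
```

If NULLs must survive the round-trip, the export would need to emit `\N` for `None` values and the `LOAD DATA` statement would need to match.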
----------
### 4.5. `create_schema.sh`

```bash
mysql --default-character-set=utf8mb4 YOUR_SUPERSET_DB < converted_schema.sql
```

-   Creates the empty MySQL schema in database `YOUR_SUPERSET_DB`.
    
----------

### 4.6. `import_tables_ec2_csv.sh`

-   **Purpose**:  
    Bulk-load CSVs into MySQL.
    
-   **Key logic**:
    
    1.  `SET FOREIGN_KEY_CHECKS=0;`
        
    2.  For each `*_utf8_clean.csv` in `phdb_csv_tables/`:
        
        ```bash
        mysql --local-infile=1 -D YOUR_SUPERSET_DB \
          -e "LOAD DATA LOCAL INFILE '.../${table}.csv' INTO TABLE ${table}
              FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n';"
        ```
        
    3.  `SET FOREIGN_KEY_CHECKS=1;`
        
-   Logs to `logs/import_data_*.log`.
    
**import_tables_ec2_csv.sh**
```bash
#!/bin/bash
# Here you restore the data from Firebird
# You need "local-infile" enabled for this script to run
MYSQL_DB="YOUR_SUPERSET_DB"         # The MySQL DB you use as a source in Superset
CSV_DIR="./phdb_csv_tables"         # Directory containing the *_utf8_clean.csv files
LOGFILE="logs/import_data_$(date +%Y%m%d_%H%M%S).log"
mkdir -p "$(dirname "$LOGFILE")"

# Function to log messages
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOGFILE"
}

log "Starting import of CSV files into database '$MYSQL_DB'."

# Disable foreign key checks for bulk import
log "Disabling foreign key checks."
mysql --local-infile=1 -D "$MYSQL_DB" -e "SET FOREIGN_KEY_CHECKS=0;" >> "$LOGFILE" 2>&1

# Import each CSV file
for csv_file in "$CSV_DIR"/*_utf8_clean.csv; do
    if [ ! -f "$csv_file" ]; then
        log "No CSV files found in $CSV_DIR. Skipping."
        break
    fi

    # Extract table name from the filename, assuming the pattern {TABLE}_utf8_clean.csv
    TABLE_NAME=$(basename "$csv_file" | sed 's/_utf8_clean.csv$//')
    log "Importing $csv_file into table $TABLE_NAME..."

    mysql --local-infile=1 -D "$MYSQL_DB" -e "
        LOAD DATA LOCAL INFILE '$(realpath "$csv_file")'
        INTO TABLE $TABLE_NAME
        FIELDS TERMINATED BY ','
        OPTIONALLY ENCLOSED BY '\"'
        LINES TERMINATED BY '\n'
        IGNORE 1 LINES;
    " >> "$LOGFILE" 2>&1

    if [ $? -eq 0 ]; then
        log "Successfully imported $csv_file into $TABLE_NAME."
    else
        log "Error importing $csv_file. Check the log for details. Exiting."
        mysql -D "$MYSQL_DB" -e "SET FOREIGN_KEY_CHECKS=1;" >> "$LOGFILE" 2>&1
        exit 1
    fi
done

# Re-enable foreign key checks
log "Re-enabling foreign key checks."
mysql -D "$MYSQL_DB" -e "SET FOREIGN_KEY_CHECKS=1;" >> "$LOGFILE" 2>&1

log "All CSV files have been imported successfully."
```
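The table-name derivation in the loop above (strip the `_utf8_clean.csv` suffix from the basename) can be checked in isolation; `table_from_csv` is a hypothetical helper mirroring the `sed` expression:

```python
import os

def table_from_csv(path):
    """Mirror of: basename "$csv_file" | sed 's/_utf8_clean.csv$//'"""
    name = os.path.basename(path)
    suffix = "_utf8_clean.csv"
    return name[:-len(suffix)] if name.endswith(suffix) else name

print(table_from_csv("./phdb_csv_tables/CUSTOMERS_utf8_clean.csv"))  # -> CUSTOMERS
```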
----------

### 4.7. `get_latest_db_backup.sh`

-   **Purpose**:  
    Download the newest MySQL dump ZIP from `s3://BUCKET_NAME/databases/`.
    
-   **Output**:
    
    -   `latest_backup.zip`
 
This script will get the latest backup for another MySQL DB that is also used in Superset as a data source.
 
**get_latest_db_backup.sh**
```bash
#!/usr/bin/env bash
# Variables
BUCKET="BUCKET_NAME"
PREFIX="databases/"   # The folder inside the bucket
LOCAL_DEST="./latest_backup.zip"

# 1. List all files in the bucket within the specified prefix.
# 2. Filter down to only files that match the date-based .zip filename pattern,
#    e.g. 2025-02-15-02-30-01.zip
# 3. Sort the list and select the last one, which should be the most recent.
LATEST_FILE=$(aws s3 ls "s3://$BUCKET/$PREFIX" | \
              grep -E '[0-9]{4}(-[0-9]{2}){5}\.zip' | \
              sort | tail -n 1 | awk '{print $4}')

if [ -z "$LATEST_FILE" ]; then
    echo "No backup .zip files found in s3://$BUCKET/$PREFIX"
    exit 1
fi

echo "The latest backup file is: $LATEST_FILE"

# Now copy this latest file to the local destination
aws s3 cp "s3://$BUCKET/$PREFIX$LATEST_FILE" "$LOCAL_DEST"

# Check exit status of the copy command
if [ $? -eq 0 ]; then
    echo "Successfully copied $LATEST_FILE to $LOCAL_DEST"
else
    echo "Failed to copy $LATEST_FILE"
    exit 1
fi
```
----------

### 4.8. `restore_latest_mysql_backup.sh`

-   **Purpose**:  
    Unzip & import into `your_db`:

This script will restore the latest backup for another MySQL DB that is also used in Superset as a data source.

**restore_latest_mysql_backup.sh**
```bash
#!/usr/bin/env bash
set -e

# Variables
ZIP_FILE="latest_backup.zip"
SQL_FILE="db-dumps/mysql-your_db.sql"
DATABASE="your_db"

# Check if the zip file exists
if [ ! -f "$ZIP_FILE" ]; then
    echo "Error: $ZIP_FILE not found. Please ensure the latest backup exists."
    exit 1
fi

# Unzip the backup archive, overwriting files if they exist
echo "Extracting $ZIP_FILE..."
unzip -o "$ZIP_FILE"

# Check if the specific SQL backup exists after extraction
if [ ! -f "$SQL_FILE" ]; then
    echo "Error: $SQL_FILE not found in the extracted contents."
    exit 1
fi

# Restore the database using the SQL dump
echo "Restoring database $DATABASE from $SQL_FILE..."
mysql "$DATABASE" < "$SQL_FILE"

echo "Database $DATABASE restored successfully."
```

----------
### 4.9. `create_views.sql`

Contains your MySQL `CREATE VIEW …` statements (for Superset or BI). Loaded via:
```bash
mysql < /home/ubuntu/create_views.sql
```
Here we create the MySQL views used in Superset.

----------
### 4.10. Cleanup (`stepCleanup`)

```bash
./cleanup_mysql_binlogs.sh
rm db_ddl.sql converted_schema.sql phdb.fbk tables.txt phdb_csv_tables/* logs/* || true
rm -rf db-dumps/ latest_backup.zip mysql-your_db.sql || true
```
**cleanup_mysql_binlogs.sh**
```bash
#!/usr/bin/env bash
set -e

# Number of days to keep binary logs
DAYS_TO_KEEP=1

echo "Purging MySQL binary logs older than ${DAYS_TO_KEEP} days..."

# Execute the purge command.
# This command connects to MySQL (using credentials available in ~/.my.cnf, for example)
# and purges all binary logs older than the current time minus the specified interval.
mysql -e "PURGE BINARY LOGS BEFORE NOW() - INTERVAL ${DAYS_TO_KEEP} DAY;"

echo "Purge complete."
```
----------

## 5. Logging & Error Handling

-   **CloudWatch**
    
    -   Log Group: `/aws/ssm/automation-logs`
        
    -   Each `aws:runCommand` step streams both stdout and stderr.
        
-   **onFailure: Continue**
    
    -   Any step failure sends the run to the final “End” state.
        
    -   Individual scripts exit non-zero on errors and record errors in their own log files.


GitHub link: https://github.com/apache/superset/discussions/31328#discussioncomment-13210013
