GitHub user saulfrr added a comment to the discussion: Issues Creating a
Dataset with a Firebird Database (No Schemas)
Hope this helps
# Firebird → MySQL ETL Automation
## 1. Overview
This AWS Systems Manager Automation executes on EC2 instances tagged
`Name=TAG_HERE` to:
1. **Start** the Firebird 3.0 service
2. **Download** the latest Firebird backup (`.fbk`) from S3
3. **Restore** the backup into a local Firebird database
4. **Extract** DDL and **convert** it to MySQL syntax
5. **Export** each Firebird table to CSV
6. **Create** the target MySQL schema
7. **Import** the CSV data into MySQL
8. **Download** & **restore** the latest MySQL dump from S3 (a separate Superset source database, not Firebird related)
9. **Create** MySQL views (e.g. for Superset)
10. **Stop** Firebird
11. **Cleanup** temporary files and old binlogs
Logs for _all_ steps stream to CloudWatch Logs group `/aws/ssm/automation-logs`.

----------
## 2. Prerequisites
- **EC2 Instance**
- Tagged `Name=TAG_HERE`
- SSM Agent installed
- IAM role granting SSM Automation + S3 read access
- **Software**
- Firebird 3.0 server
- MySQL client/server
- Python 3 with `fdb` driver
- AWS CLI configured
- **S3 Buckets**
- Firebird backups: `s3://BUCKET_NAME_HERE/`
- MySQL dumps: `s3://BUCKET_NAME_HERE/databases/`
----------
## 3. Configuration
### 3.1. Directory Layout
All scripts live under a single folder, parameterized as `ScriptsDirectory`
(e.g. `/home/ubuntu/etl-scripts`). Inside you have:
```bash
get_latest_phdb_backup.sh
restore_phdb_firebird.sh
convert_firebird_to_mysql.py
export_tables_ec2.py
create_schema.sh
import_tables_ec2_csv.sh
get_latest_db_backup.sh
restore_latest_mysql_backup.sh
create_views.sql
tables.txt ← generated by restore_phdb_firebird.sh
db_ddl.sql ← extracted DDL
converted_schema.sql ← output of converter
phdb.fbk ← downloaded FB backup
phdb_csv_tables/ ← CSV exports
logs/ ← step logs
latest_backup.zip ← downloaded MySQL dump
db-dumps/ ← unzipped SQL file(s)
```
Some steps are implemented in Python because Firebird 3.0 runs on an **ARM64** instance while **FBExport** is compiled for **AMD64**. FBExport can run under emulation, but it is extremely slow.
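If in doubt about which path applies on a given instance, the architecture check is trivial. A minimal sketch (the branch messages are illustrative, not from the automation):

```bash
#!/usr/bin/env bash
# Pick the export path based on the host architecture.
ARCH=$(uname -m)
case "$ARCH" in
    x86_64)  echo "amd64 host: the native FBExport binary is an option" ;;
    aarch64) echo "arm64 host: use the Python/fdb export path" ;;
    *)       echo "arch '$ARCH': use the Python/fdb export path" ;;
esac
```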
### 3.2. MySQL Client (`my.cnf`)
Place a `my.cnf` at `~/.my.cnf` with credentials:
```ini
[client]
user=admin
password=ADMIN_PASSWORD_HERE # <-- replace with secure secret
host=localhost
```
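Since `~/.my.cnf` holds a plaintext password, it should be readable only by its owner. A minimal sketch, demonstrated on a scratch file so it can run anywhere (in practice the target is `~/.my.cnf`):

```bash
#!/usr/bin/env bash
# Restrict a credentials file to owner read/write only.
CNF=$(mktemp)
printf '[client]\nuser=admin\n' > "$CNF"
chmod 600 "$CNF"
stat -c '%a' "$CNF"   # prints: 600
rm -f "$CNF"
```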
## 4. Script Summaries
### 4.1. `get_latest_phdb_backup.sh`
- **Purpose**:
Fetch the newest Firebird `.fbk` backup from S3.
- **Key steps**:
1. `aws s3 ls s3://BUCKET_NAME_HERE/${PREFIX}`
2. Filter by `phdb_*.fbk`, pick the last (most recent)
3. `aws s3 cp … ./phdb.fbk`
**get_latest_phdb_backup.sh**
```bash
#!/usr/bin/env bash
# Variables
BUCKET="BUCKET_NAME_HERE"
PREFIX=""                # Set if the backups live under a prefix, e.g. "some/directory/"
LOCAL_DEST="./phdb.fbk"

# 1. List all files in the bucket (under the prefix, if any).
# 2. Filter down to only 'phdb_*.fbk' files.
# 3. Sort and take the last entry. `aws s3 ls` prints in ascending
#    date order, but sorting explicitly makes this robust.
LATEST_FILE=$(aws s3 ls "s3://$BUCKET/$PREFIX" | grep 'phdb_.*\.fbk' | sort | tail -n 1 | awk '{print $4}')

if [ -z "$LATEST_FILE" ]; then
    echo "No phdb_*.fbk files found in s3://$BUCKET/$PREFIX"
    exit 1
fi

echo "The latest phdb backup file is: $LATEST_FILE"

# Copy the latest file locally
aws s3 cp "s3://$BUCKET/$PREFIX$LATEST_FILE" "$LOCAL_DEST"

# Check exit status
if [ $? -eq 0 ]; then
    echo "Successfully copied $LATEST_FILE to $LOCAL_DEST"
else
    echo "Failed to copy $LATEST_FILE"
    exit 1
fi
```
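The selection pipeline can be sanity-checked offline by feeding it sample `aws s3 ls`-style output (the filenames below are made up):

```bash
#!/usr/bin/env bash
# Simulated `aws s3 ls` output: date, time, size, key.
LISTING='2025-02-14 02:30:01  104857600 phdb_2025-02-14.fbk
2025-02-15 02:30:01  104857600 phdb_2025-02-15.fbk
2025-02-15 02:30:02        512 notes.txt'
LATEST_FILE=$(printf '%s\n' "$LISTING" | grep 'phdb_.*\.fbk' | sort | tail -n 1 | awk '{print $4}')
echo "$LATEST_FILE"   # prints: phdb_2025-02-15.fbk
```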
----------
### 4.2. `restore_phdb_firebird.sh`
- **Purpose**:
Restore the downloaded `phdb.fbk` into `/firebird/data/phdb.fdb`.
- **Key steps**:
1. Unzip if zipped
2. Run `gbak` with `SYSDBA/FIREBIRD_PASSWORD`
3. Extract DDL:
```bash
isql -x /firebird/data/phdb.fdb -user SYSDBA -password FIREBIRD_PASSWORD \
> db_ddl.sql
grep "CREATE TABLE" db_ddl.sql | awk '{print $3}' > tables.txt
```
**restore_phdb_firebird.sh**
```bash
#!/usr/bin/env bash
# Variables
DB_BACKUP_FILE="./phdb.fbk"   # Path to the backup file downloaded earlier
DB_FILE="/firebird/data/phdb.fdb"
SYSDBA_USER="SYSDBA"
SYSDBA_PASSWORD="FIREBIRD_PASSWORD_HERE"
FIREBIRD_SERVICE="/etc/init.d/firebird3.0"

# Restore the database from the backup
echo "Restoring database from $DB_BACKUP_FILE..."
if [ ! -f "$DB_BACKUP_FILE" ]; then
    echo "Backup file $DB_BACKUP_FILE not found. Aborting."
    exit 1
fi

# Remove the old database file if it exists
if [ -f "$DB_FILE" ]; then
    echo "Removing old database file $DB_FILE..."
    rm -f "$DB_FILE"
fi

# Run the gbak restore
# (use /opt/firebird/bin/gbak instead if gbak is not on the PATH)
gbak -c -v -user "$SYSDBA_USER" -password "$SYSDBA_PASSWORD" -REP "$DB_BACKUP_FILE" "$DB_FILE"
RESTORE_STATUS=$?
if [ $RESTORE_STATUS -ne 0 ]; then
    echo "Database restore failed with exit code $RESTORE_STATUS."
    exit 1
fi
echo "Database restored successfully to $DB_FILE."
sudo chown firebird:ubuntu "$DB_FILE"

# Post-restore: extract the DDL
echo "Extracting database DDL..."
# (use /opt/firebird/bin/isql instead if isql-fb is not on the PATH)
isql-fb -u "$SYSDBA_USER" -p "$SYSDBA_PASSWORD" "$DB_FILE" -x > db_ddl.sql
if [ $? -ne 0 ]; then
    echo "Failed to extract DDL using isql."
    exit 1
fi

echo "Extracting table names..."
grep "CREATE TABLE" db_ddl.sql | awk '{print $3}' > tables.txt
if [ $? -eq 0 ]; then
    echo "Table names extracted to tables.txt"
else
    echo "Failed to extract table names."
    exit 1
fi

echo "All steps completed successfully."
```
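The `grep`/`awk` extraction assumes isql writes each `CREATE TABLE <NAME> (` on one line, with the table name as the third field. A quick offline check with made-up DDL:

```bash
#!/usr/bin/env bash
DDL='/* Table: CUSTOMERS */
CREATE TABLE CUSTOMERS (
    ID INTEGER NOT NULL
);
/* Table: ORDERS */
CREATE TABLE ORDERS (
    ID INTEGER NOT NULL
);'
TABLES=$(printf '%s\n' "$DDL" | grep "CREATE TABLE" | awk '{print $3}')
echo "$TABLES"   # prints CUSTOMERS and ORDERS, one per line
```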
----------
### 4.3. `convert_firebird_to_mysql.py`
- **Purpose**:
Translate Firebird DDL → MySQL DDL.
- **I/O**:
- **Input**: `db_ddl.sql`
- **Output**: `converted_schema.sql`
- **Highlights**:
- Maps types (e.g. `INTEGER`→`INT`, `BLOB`→`BLOB`, `BOOLEAN`→`TINYINT(1)`)
- Adjusts quoting, `AUTOINCREMENT`, etc.
**convert_firebird_to_mysql.py**
```python
#!/usr/bin/env python3
import re

# Configuration
INPUT_FILE = "db_ddl.sql"
OUTPUT_FILE = "converted_schema.sql"

# Mapping Firebird data types to MySQL data types
DATA_TYPE_MAPPING = {
    'INTEGER': 'INT',
    'SMALLINT': 'SMALLINT',
    'DOUBLE PRECISION': 'DOUBLE',
    'BOOLEAN': 'TINYINT(1)',
    'BLOB SUB_TYPE 0 SEGMENT SIZE 80': 'BLOB',
    'BLOB': 'BLOB',
    'VARCHAR': 'VARCHAR',  # Handled with size
    'CHAR': 'CHAR',
    'DATE': 'DATE',
    'TIME': 'TIME',
    'TIMESTAMP': 'DATETIME',
    # Add more mappings as needed
}

def convert_data_type(firebird_type):
    """Convert a Firebird data type to a MySQL data type."""
    # Handle data types with parameters, e.g., VARCHAR(20)
    varchar_match = re.match(r'VARCHAR\((\d+)\)', firebird_type, re.IGNORECASE)
    char_match = re.match(r'CHAR\((\d+)\)', firebird_type, re.IGNORECASE)
    if varchar_match:
        return f"VARCHAR({varchar_match.group(1)})"
    elif char_match:
        return f"CHAR({char_match.group(1)})"
    elif firebird_type.upper() in DATA_TYPE_MAPPING:
        return DATA_TYPE_MAPPING[firebird_type.upper()]
    else:
        # Default to TEXT if the type is unknown
        print(f"Warning: Unmapped data type '{firebird_type}'. Defaulting to TEXT.")
        return "TEXT"

def process_create_table(statement):
    """
    Convert a single Firebird CREATE TABLE statement to MySQL.
    Adds DROP TABLE IF EXISTS before creating the table.
    """
    # Extract the table name
    table_match = re.search(r'CREATE TABLE\s+([A-Za-z0-9_]+)\s*\(', statement,
                            re.IGNORECASE)
    if not table_match:
        print("Error: Could not find table name in statement.")
        return ""
    table_name = table_match.group(1)

    # Extract column definitions and constraints
    columns_section = statement.split('(', 1)[1].rsplit(')', 1)[0]
    lines = [line.strip() for line in columns_section.split(',\n')]

    columns = []
    constraints = []
    for line in lines:
        # Check for a primary-key constraint
        constraint_match = re.match(
            r'CONSTRAINT\s+PK_[A-Za-z0-9_]+\s+PRIMARY KEY\s*\(([^)]+)\)',
            line, re.IGNORECASE)
        if constraint_match:
            pk_columns = constraint_match.group(1).strip()
            # Enclose column names in backticks
            pk_columns = ', '.join(f"`{col.strip()}`" for col in pk_columns.split(','))
            constraints.append(f"PRIMARY KEY ({pk_columns})")
            continue
        # Otherwise, it's a column definition.
        # Split into column name, data type, and any trailing constraints.
        parts = line.split(None, 2)
        if len(parts) < 2:
            print(f"Warning: Unable to parse column definition '{line}'. Skipping.")
            continue
        column_name = parts[0].strip('"`')  # Remove any existing quotes
        data_type = parts[1].strip()
        # Handle data types with multiple words (e.g., DOUBLE PRECISION)
        if data_type.upper() == 'DOUBLE':
            if len(parts) > 2 and parts[2].strip().upper().startswith('PRECISION'):
                data_type += ' ' + parts[2].strip().split()[0]  # Append 'PRECISION'
        mysql_data_type = convert_data_type(data_type)
        # Check for NOT NULL
        not_null = 'NOT NULL' if 'NOT NULL' in line.upper() else ''
        column_def = f"  `{column_name}` {mysql_data_type} {not_null}".strip()
        columns.append(column_def)

    # Assemble the CREATE TABLE statement,
    # preceded by DROP TABLE IF EXISTS
    create_stmt = f"DROP TABLE IF EXISTS `{table_name}`;\n"
    create_stmt += f"CREATE TABLE `{table_name}` (\n"
    create_stmt += ",\n".join(columns)
    if constraints:
        create_stmt += ",\n" + ",\n".join(f"  {c}" for c in constraints)
    create_stmt += "\n);\n"
    return create_stmt

def main():
    try:
        with open(INPUT_FILE, 'r', encoding='utf-8') as infile:
            ddl_content = infile.read()
    except FileNotFoundError:
        print(f"Error: Input file '{INPUT_FILE}' not found.")
        return

    # Split the content into individual CREATE TABLE statements.
    # Assumes isql emits a '/* Table: ... */' comment before each one
    # and that each statement ends with ';'.
    create_table_statements = re.findall(
        r'/\* Table: [^*]+\*/\s*CREATE TABLE[^;]+;\s*',
        ddl_content, re.DOTALL | re.IGNORECASE)
    if not create_table_statements:
        print("Error: No CREATE TABLE statements found in the input file.")
        return

    converted_statements = []
    for stmt in create_table_statements:
        mysql_stmt = process_create_table(stmt)
        if mysql_stmt:
            converted_statements.append(mysql_stmt)

    # Write to the output file
    with open(OUTPUT_FILE, 'w', encoding='utf-8') as outfile:
        outfile.write("\n".join(converted_statements))
    print(f"Conversion completed successfully. Output saved to '{OUTPUT_FILE}'.")

if __name__ == "__main__":
    main()
```
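The mapping rules are easy to exercise in isolation. A reduced stand-alone sketch of the same logic (it does not import the script itself, and covers only a few of the mappings above):

```python
import re

# A few of the mappings from the converter above.
MAPPING = {'INTEGER': 'INT', 'TIMESTAMP': 'DATETIME', 'BOOLEAN': 'TINYINT(1)'}

def to_mysql(fb_type: str) -> str:
    # Sized types keep their size, e.g. VARCHAR(20) -> VARCHAR(20)
    sized = re.match(r'(VARCHAR|CHAR)\((\d+)\)', fb_type, re.IGNORECASE)
    if sized:
        return f"{sized.group(1).upper()}({sized.group(2)})"
    # Unknown types fall back to TEXT, as in the converter
    return MAPPING.get(fb_type.upper(), 'TEXT')

print(to_mysql('VARCHAR(20)'))  # VARCHAR(20)
print(to_mysql('TIMESTAMP'))    # DATETIME
print(to_mysql('GEOMETRY'))     # TEXT (unmapped)
```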
----------
### 4.4. `export_tables_ec2.py`
- **Purpose**:
Dump every table from the restored Firebird database into UTF-8 CSV.
- **Config**:
- `DB_PATH=/firebird/data/phdb.fdb`
- `CSV_PATH=./phdb_csv_tables`
- `TABLES_FILE=tables.txt`
- **Output**:
- `phdb_csv_tables/{table}.csv`
- `logs/export_log_*.log`
**export_tables_ec2.py**
```python
#!/usr/bin/env python3
import csv
import os
import sys
from datetime import datetime

import fdb  # Using fdb instead of firebird_driver

# Configuration
DB_PATH = "/firebird/data/phdb.fdb"
DB_USER = "SYSDBA"
DB_PASS = "FIREBIRD_PASSWORD_HERE"
TABLES_FILE = "tables.txt"
CSV_PATH = "./phdb_csv_tables"
LOGFILE = f"logs/export_log_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"

# Ensure the CSV and log output directories exist
os.makedirs(CSV_PATH, exist_ok=True)
os.makedirs(os.path.dirname(LOGFILE), exist_ok=True)

def log(message):
    timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    with open(LOGFILE, "a") as lf:
        lf.write(f"{timestamp} - {message}\n")
    print(f"{timestamp} - {message}")

def export_table(conn, table):
    try:
        cur = conn.cursor()
        cur.execute(f"SELECT * FROM {table}")
        # Get column names from the cursor description
        col_names = [desc[0] for desc in cur.description]
        csv_file = os.path.join(CSV_PATH, f"{table}_utf8_clean.csv")
        with open(csv_file, mode="w", encoding="utf-8", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(col_names)
            row_count = 0
            for row in cur:
                writer.writerow(row)
                row_count += 1
        log(f"Successfully exported table '{table}' with {row_count} rows to {csv_file}")
    except Exception as e:
        log(f"FAILED to export table '{table}': {e}")

def main():
    # Connect to the Firebird database using fdb
    try:
        conn = fdb.connect(
            dsn=f"localhost:{DB_PATH}",
            user=DB_USER,
            password=DB_PASS,
            charset="ISO8859_1"  # Adjust if needed
        )
        log("Connected to the database successfully.")
    except Exception as e:
        log(f"ERROR connecting to the database: {e}")
        sys.exit(1)

    # Read the list of tables
    if not os.path.isfile(TABLES_FILE):
        log(f"ERROR: {TABLES_FILE} not found.")
        sys.exit(1)
    with open(TABLES_FILE, "r") as tf:
        tables = [line.strip() for line in tf if line.strip()]

    # Export each table
    for table in tables:
        log(f"Exporting table: {table}")
        export_table(conn, table)

    conn.close()
    log("Export completed.")

if __name__ == "__main__":
    main()
```
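The export pattern itself — a header row taken from `cursor.description`, then the streamed rows — is plain DB-API and works the same way with `fdb`. A self-contained sketch using stdlib `sqlite3` as a stand-in for the Firebird connection (the table and data are made up):

```python
import csv
import io
import sqlite3

# In-memory stand-in for the restored Firebird database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?)", [(1, "Ada"), (2, "Linus")])

cur = conn.cursor()
cur.execute("SELECT * FROM CUSTOMERS")
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow([desc[0] for desc in cur.description])  # header row
for row in cur:
    writer.writerow(row)

print(buf.getvalue(), end="")
# ID,NAME
# 1,Ada
# 2,Linus
```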
----------
### 4.5. `create_schema.sh`
```bash
mysql --default-character-set=utf8mb4 YOUR_SUPERSET_DB < converted_schema.sql
```
- Drops and recreates the (empty) tables in database `YOUR_SUPERSET_DB` from `converted_schema.sql`.
----------
### 4.6. `import_tables_ec2_csv.sh`
- **Purpose**:
Bulk-load CSVs into MySQL.
- **Key logic**:
1. `SET FOREIGN_KEY_CHECKS=0;`
2. For each `*_utf8_clean.csv` in `phdb_csv_tables/`:
```bash
mysql --local-infile=1 -D YOUR_SUPERSET_DB \
  -e "LOAD DATA LOCAL INFILE '.../${table}.csv' INTO TABLE ${table}
      FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n';"
```
3. `SET FOREIGN_KEY_CHECKS=1;`
- Logs to `logs/import_data_*.log`.
**import_tables_ec2_csv.sh**
```bash
#!/bin/bash
# Restore the data exported from Firebird.
# The MySQL server/client must allow "local_infile" for this script to run.
MYSQL_DB="YOUR_SUPERSET_DB"   # The MySQL DB you use as a source in Superset
CSV_DIR="./phdb_csv_tables"   # Directory containing the *_utf8_clean.csv files
LOGFILE="logs/import_data_$(date +%Y%m%d_%H%M%S).log"

# Function to log messages
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOGFILE"
}

log "Starting import of CSV files into database '$MYSQL_DB'."

# Disable foreign key checks for the bulk import
log "Disabling foreign key checks."
mysql --local-infile=1 -D "$MYSQL_DB" -e "SET FOREIGN_KEY_CHECKS=0;" >> "$LOGFILE" 2>&1

# Import each CSV file
for csv_file in "$CSV_DIR"/*_utf8_clean.csv; do
    if [ ! -f "$csv_file" ]; then
        log "No CSV files found in $CSV_DIR. Skipping."
        break
    fi
    # Extract the table name from a filename of the form {TABLE}_utf8_clean.csv
    TABLE_NAME=$(basename "$csv_file" | sed 's/_utf8_clean.csv$//')
    log "Importing $csv_file into table $TABLE_NAME..."
    mysql --local-infile=1 -D "$MYSQL_DB" -e "
        LOAD DATA LOCAL INFILE '$(realpath "$csv_file")'
        INTO TABLE $TABLE_NAME
        FIELDS TERMINATED BY ','
        OPTIONALLY ENCLOSED BY '\"'
        LINES TERMINATED BY '\n'
        IGNORE 1 LINES;
    " >> "$LOGFILE" 2>&1
    if [ $? -eq 0 ]; then
        log "Successfully imported $csv_file into $TABLE_NAME."
    else
        log "Error importing $csv_file. Check the log for details. Exiting."
        mysql -D "$MYSQL_DB" -e "SET FOREIGN_KEY_CHECKS=1;" >> "$LOGFILE" 2>&1
        exit 1
    fi
done

# Re-enable foreign key checks
log "Re-enabling foreign key checks."
mysql -D "$MYSQL_DB" -e "SET FOREIGN_KEY_CHECKS=1;" >> "$LOGFILE" 2>&1

log "All CSV files have been imported successfully."
```
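The table name is derived purely from the CSV filename, so that step can be verified on its own (the filename below is made up):

```bash
#!/usr/bin/env bash
csv_file="./phdb_csv_tables/CUSTOMERS_utf8_clean.csv"
TABLE_NAME=$(basename "$csv_file" | sed 's/_utf8_clean\.csv$//')
echo "$TABLE_NAME"   # prints: CUSTOMERS
```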
----------
### 4.7. `get_latest_db_backup.sh`
- **Purpose**:
Download the newest MySQL dump ZIP from `s3://BUCKET_NAME/databases/`.
- **Output**:
- `latest_backup.zip`
This script will get the latest backup for another MySQL db that is also used
in Superset as a data source.
**get_latest_db_backup.sh**
```bash
#!/usr/bin/env bash
# Variables
BUCKET="BUCKET_NAME"
PREFIX="databases/"              # The folder inside the bucket
LOCAL_DEST="./latest_backup.zip"

# 1. List all files in the bucket under the specified prefix.
# 2. Filter down to the .zip dump files (named like 2025-02-15-02-30-01.zip).
# 3. Sort the list and select the last one, which should be the most recent.
LATEST_FILE=$(aws s3 ls "s3://$BUCKET/$PREFIX" | grep '\.zip$' | \
    sort | tail -n 1 | awk '{print $4}')

if [ -z "$LATEST_FILE" ]; then
    echo "No backup .zip files found in s3://$BUCKET/$PREFIX"
    exit 1
fi

echo "The latest backup file is: $LATEST_FILE"

# Copy the latest file to the local destination
aws s3 cp "s3://$BUCKET/$PREFIX$LATEST_FILE" "$LOCAL_DEST"

# Check exit status of the copy command
if [ $? -eq 0 ]; then
    echo "Successfully copied $LATEST_FILE to $LOCAL_DEST"
else
    echo "Failed to copy $LATEST_FILE"
    exit 1
fi
```
----------
### 4.8. `restore_latest_mysql_backup.sh`
- **Purpose**:
Unzip the latest MySQL dump and import it into `your_db`.
This script restores the latest backup for another MySQL DB that is also used in Superset as a data source.
**restore_latest_mysql_backup.sh**
```bash
#!/usr/bin/env bash
set -e

# Variables
ZIP_FILE="latest_backup.zip"
SQL_FILE="db-dumps/mysql-your_db.sql"
DATABASE="your_db"

# Check that the zip file exists
if [ ! -f "$ZIP_FILE" ]; then
    echo "Error: $ZIP_FILE not found. Please ensure the latest backup exists."
    exit 1
fi

# Unzip the backup archive, overwriting files if they exist
echo "Extracting $ZIP_FILE..."
unzip -o "$ZIP_FILE"

# Check that the expected SQL dump exists after extraction
if [ ! -f "$SQL_FILE" ]; then
    echo "Error: $SQL_FILE not found in the extracted contents."
    exit 1
fi

# Restore the database from the SQL dump
echo "Restoring database $DATABASE from $SQL_FILE..."
mysql "$DATABASE" < "$SQL_FILE"
echo "Database $DATABASE restored successfully."
```
----------
### 4.9. `create_views.sql`
Contains your MySQL `CREATE VIEW …` statements (for Superset or BI). Loaded via:
```bash
mysql < /home/ubuntu/create_views.sql
```
Here we create the MySQL views used in Superset.
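For reference, an entry in `create_views.sql` might look like this — the table and column names below are hypothetical, not from the actual schema:

```sql
-- Example only: aggregate the imported Firebird data for Superset.
CREATE OR REPLACE VIEW v_customer_orders AS
SELECT c.ID        AS customer_id,
       c.NAME      AS customer_name,
       COUNT(o.ID) AS order_count
FROM CUSTOMERS c
LEFT JOIN ORDERS o ON o.CUSTOMER_ID = c.ID
GROUP BY c.ID, c.NAME;
```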
----------
### 4.10. Cleanup (`stepCleanup`)
```bash
./cleanup_mysql_binlogs.sh
rm db_ddl.sql converted_schema.sql phdb.fbk tables.txt phdb_csv_tables/* logs/* || true
rm -rf db-dumps/ latest_backup.zip mysql-your_db.sql || true
```
**cleanup_mysql_binlogs.sh**
```bash
#!/usr/bin/env bash
set -e
# Number of days of binary logs to keep
DAYS_TO_KEEP=1

echo "Purging MySQL binary logs older than ${DAYS_TO_KEEP} days..."

# Execute the purge. This connects to MySQL (using the credentials in
# ~/.my.cnf, for example) and purges all binary logs older than the
# current time minus the specified interval.
mysql -e "PURGE BINARY LOGS BEFORE NOW() - INTERVAL ${DAYS_TO_KEEP} DAY;"

echo "Purge complete."
```
----------
## 5. Logging & Error Handling
- **CloudWatch**
- Log Group: `/aws/ssm/automation-logs`
- Each `aws:runCommand` step streams both stdout/stderr.
- **onFailure: Continue**
- Any step failure sends the run to the final “End” state.
- Individual scripts exit non-zero on errors and record errors in their own log files.
GitHub link:
https://github.com/apache/superset/discussions/31328#discussioncomment-13210013