GitHub user edespino added a comment to the discussion: gpinitsystem mirror
create error
# gpinitsystem Log Message Issues
This document outlines three issues discovered in `gpinitsystem` that generate
confusing or misleading log messages, causing users to believe there are
problems when the system is actually working correctly.
## Base Issue: Confusing Mirror Registration Success Messages (Original Reported Issue)
**Description:** During mirror registration, contradictory log messages are
generated that combine "Successfully completed" with "failed to register
mirror".
**Location:** `REGISTER_MIRRORS` function in
`/Users/eespino/workspace/Apache/cloudberry/gpMgmt/bin/gpinitsystem`
**Log Evidence:**
```
20250925:02:07:24:556404 gpinitsystem:cdw:cbadmin-[INFO]:-Successfully completed failed to register mirror for contentid=0
20250925:02:07:24:556404 gpinitsystem:cdw:cbadmin-[INFO]:-Successfully completed failed to register mirror for contentid=1
20250925:02:07:24:556404 gpinitsystem:cdw:cbadmin-[INFO]:-Successfully completed failed to register mirror for contentid=2
20250925:02:07:24:556404 gpinitsystem:cdw:cbadmin-[INFO]:-Successfully completed failed to register mirror for contentid=3
```
**Status:** **Confirmed** - This behavior has been reproduced and observed
during testing.
**Impact:** Creates confusion when reviewing logs - messages appear to indicate
both success and failure simultaneously. Despite these messages, mirrors
register successfully and function correctly.
## Issue 1: Misleading WARN Messages for Default Configurations
**Description:** Normal default configuration usage is logged as WARN instead
of INFO level, triggering end-of-installation warnings.
**Location:** `CHK_PARAMS` function in
`/Users/eespino/workspace/Apache/cloudberry/gpMgmt/bin/gpinitsystem` (lines
316, 326)
**Code:**
```bash
if [ x"$ETCD_HOST_CONFIG" = x"" ] || [ x"$CLUSTER_BOOT_MODE" = x"DEMO" ]; then
    LOG_MSG "[WARN]:-No ETCD cluster host config provided, use default configuration."
else
    # ... validation logic ...
fi
if [ x"$FTS_HOST_CONFIG" = x"" ] || [ x"$CLUSTER_BOOT_MODE" = x"DEMO" ]; then
    LOG_MSG "[WARN]:-No FTS cluster host config provided, use default configuration."
else
    # ... validation logic ...
fi
```
**Log Evidence:**
```
20250924:02:06:56:556404 gpinitsystem:cdw:cbadmin-[WARN]:-No ETCD cluster host config provided, use default configuration.
20250924:02:06:56:556404 gpinitsystem:cdw:cbadmin-[WARN]:-No FTS cluster host config provided, use default configuration.
```
**Impact:**
- Causes `gpinitsystem` to report warnings at completion
- Users are instructed to review logs for "issues" when using normal default
configurations
- Particularly problematic for single-node and demo setups where defaults are
expected
**Fix:** Change `[WARN]` to `[INFO]` in both log messages.
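
A minimal sketch of the proposed change, assuming `LOG_MSG` takes the message as its first argument as in the excerpt above. The `LOG_MSG` stub and the wrapper name `LOG_DEFAULT_HOST_CONFIGS` are hypothetical and exist only so the snippet runs on its own; the real `LOG_MSG` in `gpinitsystem` also handles timestamps and the log file.

```bash
# Hypothetical stand-in for gpinitsystem's LOG_MSG; it only echoes the message.
LOG_MSG () {
    echo "$1"
}

# Hypothetical wrapper around the two checks from CHK_PARAMS, with the log
# level changed from WARN to INFO. The validation branches are elided.
LOG_DEFAULT_HOST_CONFIGS () {
    if [ x"$ETCD_HOST_CONFIG" = x"" ] || [ x"$CLUSTER_BOOT_MODE" = x"DEMO" ]; then
        LOG_MSG "[INFO]:-No ETCD cluster host config provided, use default configuration."
    else
        : # ... validation logic ...
    fi
    if [ x"$FTS_HOST_CONFIG" = x"" ] || [ x"$CLUSTER_BOOT_MODE" = x"DEMO" ]; then
        LOG_MSG "[INFO]:-No FTS cluster host config provided, use default configuration."
    else
        : # ... validation logic ...
    fi
}
```

With this change, a run that relies on the defaults would no longer add WARN-level lines to the log for these two checks.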
## Issue 2: Insufficient Timeout for Mirror Synchronization
**Description:** The `FORCE_FTS_PROBE()` function uses a 60-iteration loop
without sleep delays, which completes too quickly for mirrors to transition
from "down" to "up" state.
**Location:** `FORCE_FTS_PROBE()` function in
`/Users/eespino/workspace/Apache/cloudberry/gpMgmt/bin/gpinitsystem`
**Code:**
```bash
FORCE_FTS_PROBE () {
    # loop on gp_segment_configuration to make sure primary/mirror pairs are up and in sync
    for i in {1..60}; do
        if [ $i == 60 ]; then
            # Log segment status and generate warnings
            LOG_MSG "[WARN]:" 1
            LOG_MSG "[WARN]:-Failed to start Cloudberry instance; please review gpinitsystem log to determine failure." 1
            break;
        fi
        RESULT=$( $PSQL -p $GP_PORT -d "$DEFAULTDB" -X -A -t -c "select count(*) > 0 from gp_segment_configuration where (mode = 'n' or status = 'd') and content != -1;" )
        if [ x"$RESULT" == x"f" ]; then
            break
        fi
    done
}
```
**Timeline Evidence:**
```
20250924:02:07:28:556404 gpinitsystem:cdw:cbadmin-[INFO]:-Total processes marked as completed = 4
20250924:02:07:29:556404 gpinitsystem:cdw:cbadmin-[WARN]:-Failed to start Cloudberry instance; please review gpinitsystem log to determine failure.
```
**Testing:** Adding a 5-second sleep in each iteration shows mirrors
successfully sync after ~12 cycles (60 seconds), confirming that mirrors need
more time to transition from "down" to "up" state than the current 60-iteration
loop provides.
**Impact:**
- False "Failed to start Cloudberry instance" warnings
- Users believe initialization failed when it actually succeeded
- Triggers end-of-installation warning summary
**Root Cause:** The 60-iteration loop completes in under 1 second, which is
not enough time for the normal mirror synchronization process to finish.
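
The change described under **Testing** could look roughly like the sketch below. This is not the project's fix, only an illustration of a bounded poll with a delay between probes. The names `PROBE_ATTEMPTS`, `PROBE_INTERVAL`, `SEGMENTS_NOT_READY`, and `WAIT_FOR_MIRRORS` are hypothetical; `$PSQL`, `$GP_PORT`, and `$DEFAULTDB` are assumed to be set as in the excerpt above.

```bash
# Hypothetical knobs; 5 seconds matches the sleep used in the test above.
PROBE_ATTEMPTS=${PROBE_ATTEMPTS:-60}
PROBE_INTERVAL=${PROBE_INTERVAL:-5}

# Hypothetical wrapper around the query from FORCE_FTS_PROBE. Prints "f" once
# no segment is down or out of sync, "t" otherwise.
SEGMENTS_NOT_READY () {
    $PSQL -p $GP_PORT -d "$DEFAULTDB" -X -A -t -c "select count(*) > 0 from gp_segment_configuration where (mode = 'n' or status = 'd') and content != -1;"
}

# Poll until the segments are ready, sleeping between probes.
# Returns 0 on success and 1 if the attempts run out, so the caller can
# decide whether to log the "Failed to start" warning.
WAIT_FOR_MIRRORS () {
    i=1
    while [ "$i" -le "$PROBE_ATTEMPTS" ]; do
        result=$(SEGMENTS_NOT_READY)
        if [ x"$result" = x"f" ]; then
            return 0
        fi
        sleep "$PROBE_INTERVAL"
        i=$((i + 1))
    done
    return 1
}
```

With these defaults the wait is bounded at roughly 60 x 5 = 300 seconds, which leaves headroom over the ~60 seconds observed in testing. Returning a status instead of logging inside the loop keeps the warning tied to an actual timeout.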
## Overall Impact
All three issues contribute to a poor user experience where `gpinitsystem`
appears to have significant problems during normal, successful operations.
Users receive multiple warnings and are repeatedly told to review logs for
"failures" and "issues" when the cluster initialization has actually completed
successfully.
**End-User Experience:** At the completion of `gpinitsystem`, users see this
alarming output:
```
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[INFO]:-Scanning utility log file for any warning messages
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[WARN]:-*******************************************************
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[WARN]:-Scan of log file indicates that some warnings or errors
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[WARN]:-were generated during the array creation
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[INFO]:-Please review contents of log file
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[INFO]:-/home/cbadmin/gpAdminLogs/gpinitsystem_20250924.log
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[INFO]:-To determine level of criticality
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[WARN]:-*******************************************************
```
This creates unnecessary alarm and confusion, leading users to believe their
installation may have failed when it has actually succeeded.
This is particularly problematic for:
- New users who may think their installation failed
- Automated deployment scripts that check for warning patterns
- Support scenarios where users report "failures" that are actually successful
operations
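
To illustrate why even one benign WARN line matters, here is a rough sketch of a log scan. It assumes the end-of-run check amounts to looking for WARN-level markers in the utility log; the actual scan in `gpinitsystem` is not shown in this report, and `COUNT_WARNINGS` is a hypothetical helper.

```bash
# Hypothetical helper: prints the number of WARN-level lines in a log file.
# -F matches the marker as a fixed string; "|| true" keeps the exit status 0
# when grep finds no matches (it still prints 0).
COUNT_WARNINGS () {
    grep -c -F -- '-[WARN]:-' "$1" || true
}
```

Any check of this shape, whether in `gpinitsystem` itself or in a deployment script, reports a non-zero count for a run that only used default ETCD and FTS settings.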
## Recommendation
These logging issues should be addressed to provide accurate feedback about
system status and reduce false alarms during normal operations.
GitHub link:
https://github.com/apache/cloudberry/discussions/1370#discussioncomment-14496626