GitHub user edespino added a comment to the discussion: gpinitsystem mirror 
create error

# gpinitsystem Log Message Issues

This document outlines three issues discovered in `gpinitsystem` that generate 
confusing or misleading log messages, causing users to believe there are 
problems when the system is actually working correctly.

## Base Issue: Confusing Mirror Registration Success Messages (Original Reported Issue)

**Description:** During mirror registration, contradictory log messages are 
generated that combine "Successfully completed" with "failed to register 
mirror".

**Location:** `REGISTER_MIRRORS` function in 
`/Users/eespino/workspace/Apache/cloudberry/gpMgmt/bin/gpinitsystem`

**Log Evidence:**
```
20250925:02:07:24:556404 gpinitsystem:cdw:cbadmin-[INFO]:-Successfully completed failed to register mirror for contentid=0
20250925:02:07:24:556404 gpinitsystem:cdw:cbadmin-[INFO]:-Successfully completed failed to register mirror for contentid=1
20250925:02:07:24:556404 gpinitsystem:cdw:cbadmin-[INFO]:-Successfully completed failed to register mirror for contentid=2
20250925:02:07:24:556404 gpinitsystem:cdw:cbadmin-[INFO]:-Successfully completed failed to register mirror for contentid=3
```

**Status:** **Confirmed** - This behavior has been reproduced and observed 
during testing.

**Impact:** Creates confusion when reviewing logs - messages appear to indicate 
both success and failure simultaneously. Despite these messages, mirrors 
register successfully and function correctly.

## Issue 1: Misleading WARN Messages for Default Configurations

**Description:** Normal default configuration usage is logged as WARN instead 
of INFO level, triggering end-of-installation warnings.

**Location:** `CHK_PARAMS` function in 
`/Users/eespino/workspace/Apache/cloudberry/gpMgmt/bin/gpinitsystem` (lines 
316, 326)

**Code:**
```bash
if [ x"$ETCD_HOST_CONFIG" = x"" ] || [ x"$CLUSTER_BOOT_MODE" = x"DEMO" ]; then
    LOG_MSG "[WARN]:-No ETCD cluster host config provided, use default configuration."
else
    # ... validation logic ...
fi

if [ x"$FTS_HOST_CONFIG" = x"" ] || [ x"$CLUSTER_BOOT_MODE" = x"DEMO" ]; then
    LOG_MSG "[WARN]:-No FTS cluster host config provided, use default configuration."
else
    # ... validation logic ...
fi
```

**Log Evidence:**
```
20250924:02:06:56:556404 gpinitsystem:cdw:cbadmin-[WARN]:-No ETCD cluster host config provided, use default configuration.
20250924:02:06:56:556404 gpinitsystem:cdw:cbadmin-[WARN]:-No FTS cluster host config provided, use default configuration.
```

**Impact:**
- Causes `gpinitsystem` to report warnings at completion
- Users are instructed to review logs for "issues" when using normal default 
configurations
- Particularly problematic for single-node and demo setups where defaults are 
expected

**Fix:** Change `[WARN]` to `[INFO]` in both log messages.
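
A minimal sketch of the proposed change, assuming the conditions stay exactly as shown above. The `LOG_MSG` stub is a stand-in so the snippet runs on its own (the real function in `gpinitsystem` does more, such as adding timestamps), and `check_default_configs` is a hypothetical wrapper name, not a function in the script. The `else` branches with the validation logic are omitted here.

```bash
#!/usr/bin/env bash
# Stand-in for gpinitsystem's LOG_MSG (assumption: the real function also
# adds timestamps and writes to the log file); this stub only echoes.
LOG_MSG () {
    echo "$1"
}

# Hypothetical wrapper around the two default-config checks from CHK_PARAMS.
check_default_configs () {
    if [ x"$ETCD_HOST_CONFIG" = x"" ] || [ x"$CLUSTER_BOOT_MODE" = x"DEMO" ]; then
        # Default configuration is expected here, so log at INFO, not WARN.
        LOG_MSG "[INFO]:-No ETCD cluster host config provided, use default configuration."
    fi

    if [ x"$FTS_HOST_CONFIG" = x"" ] || [ x"$CLUSTER_BOOT_MODE" = x"DEMO" ]; then
        LOG_MSG "[INFO]:-No FTS cluster host config provided, use default configuration."
    fi
}
```

With this change, a run that relies on the defaults no longer contributes `[WARN]` lines to the log from these two checks.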

## Issue 2: Insufficient Timeout for Mirror Synchronization

**Description:** The `FORCE_FTS_PROBE()` function uses a 60-iteration loop 
without sleep delays, which completes too quickly for mirrors to transition 
from "down" to "up" state.

**Location:** `FORCE_FTS_PROBE()` function in 
`/Users/eespino/workspace/Apache/cloudberry/gpMgmt/bin/gpinitsystem`

**Code:**
```bash
FORCE_FTS_PROBE () {
    # loop on gp_segment_configuration to make sure primary/mirror pairs are up and in sync
    for i in {1..60}; do
        if [ $i == 60 ]; then
            # Log segment status and generate warnings
            LOG_MSG "[WARN]:" 1
            LOG_MSG "[WARN]:-Failed to start Cloudberry instance; please review gpinitsystem log to determine failure." 1
            break;
        fi

        RESULT=$( $PSQL -p $GP_PORT -d "$DEFAULTDB" -X -A -t -c "select count(*) > 0 from gp_segment_configuration where (mode = 'n' or status = 'd') and content != -1;" )
        if [ x"$RESULT" == x"f" ]; then
            break
        fi
    done
}
```

**Timeline Evidence:**
```
20250924:02:07:28:556404 gpinitsystem:cdw:cbadmin-[INFO]:-Total processes marked as completed = 4
20250924:02:07:29:556404 gpinitsystem:cdw:cbadmin-[WARN]:-Failed to start Cloudberry instance; please review gpinitsystem log to determine failure.
```

**Testing:** Adding a 5-second sleep in each iteration shows mirrors 
successfully sync after ~12 cycles (60 seconds), confirming that mirrors need 
more time to transition from "down" to "up" state than the current 60-iteration 
loop provides.

**Impact:**
- False "Failed to start Cloudberry instance" warnings
- Users believe initialization failed when it actually succeeded
- Triggers end-of-installation warning summary

**Root Cause:** The 60-iteration loop has no delay between probes and completes in under 1 second, which is not enough time for the normal mirror synchronization process.
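
One possible shape for a fix is a polling loop that sleeps between probes. The sketch below is an illustration, not the project's patch: `WAIT_FOR_MIRRORS` and its parameters are hypothetical names, and the check command is passed in so the snippet stays self-contained.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a check command until it succeeds or the
# attempts run out.
#   $1   = maximum number of attempts
#   $2   = seconds to sleep between attempts
#   rest = command that returns 0 once all primary/mirror pairs are up and in sync
WAIT_FOR_MIRRORS () {
    local max_attempts=$1 interval=$2
    shift 2
    local i
    for (( i = 1; i <= max_attempts; i++ )); do
        if "$@"; then
            return 0            # segments are up and in sync
        fi
        if (( i < max_attempts )); then
            sleep "$interval"   # give mirrors time before the next probe
        fi
    done
    return 1                    # still not in sync after the full wait
}
```

In `gpinitsystem`, the check command would wrap the existing `gp_segment_configuration` query and succeed when the result is `f`; the warning would be logged only when the helper returns non-zero. With 60 attempts and a 5-second interval the loop waits up to about five minutes, well beyond the roughly 60 seconds observed in testing.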

## Overall Impact

All three issues contribute to a poor user experience where `gpinitsystem` 
appears to have significant problems during normal, successful operations. 
Users receive multiple warnings and are repeatedly told to review logs for 
"failures" and "issues" when the cluster initialization has actually completed 
successfully.

**End-User Experience:** At the completion of `gpinitsystem`, users see this 
alarming output:

```
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[INFO]:-Scanning utility log file for any warning messages
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[WARN]:-*******************************************************
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[WARN]:-Scan of log file indicates that some warnings or errors
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[WARN]:-were generated during the array creation
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[INFO]:-Please review contents of log file
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[INFO]:-/home/cbadmin/gpAdminLogs/gpinitsystem_20250924.log
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[INFO]:-To determine level of criticality
20250924:02:22:43:584534 gpinitsystem:cdw:cbadmin-[WARN]:-*******************************************************
```

This creates unnecessary alarm and confusion, leading users to believe their 
installation may have failed when it has actually succeeded.

This is particularly problematic for:
- New users who may think their installation failed
- Automated deployment scripts that check for warning patterns
- Support scenarios where users report "failures" that are actually successful 
operations
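
To illustrate why benign messages trip such checks, here is a minimal sketch of a warning scan, assuming the check is a plain pattern match on the `-[WARN]:` tag seen in the log lines above. The name `count_warnings` is hypothetical, and this is not how `gpinitsystem` implements its own scan.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of lines tagged -[WARN]: in a log file.
# -F treats the pattern as a fixed string; -- keeps the leading dash from
# being parsed as an option.
count_warnings () {
    grep -c -F -- '-[WARN]:' "$1"
}
```

A script that treats any non-zero count as a failure would reject a run whose only warnings are the two default-configuration messages from Issue 1.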

## Recommendation

These logging issues should be addressed to provide accurate feedback about 
system status and reduce false alarms during normal operations.

GitHub link: 
https://github.com/apache/cloudberry/discussions/1370#discussioncomment-14496626
