Hello!

I figured it out: it was a disk space issue. I thought I had already checked
this. Please disregard! Thank you!
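
For anyone who hits the same "Lost connection to MySQL server during query" error during a conversion, here is a minimal sketch of a pre-flight free-space check. It assumes that the large ALTER TABLE rebuilds the table and therefore needs roughly the table's on-disk size in free space, which is a general rule of thumb and not something confirmed in this thread. The function names, the 1.5x safety factor, and any paths passed in are hypothetical.

```python
import os
import shutil


def enough_space(free_bytes: int, table_bytes: int, safety_factor: float = 1.5) -> bool:
    """Return True if free space covers a full copy of the table plus headroom.

    The safety factor is an assumed margin, not a documented requirement.
    """
    return free_bytes >= table_bytes * safety_factor


def check_table_copy_space(datadir: str, table_file: str, safety_factor: float = 1.5) -> bool:
    """Compare free space on the filesystem holding datadir with one table file.

    datadir    -- database data directory (assumption: the table rebuild
                  happens on this same filesystem)
    table_file -- path of the table's data file relative to datadir
                  (assumption: one file per table)
    """
    free_bytes = shutil.disk_usage(datadir).free
    table_bytes = os.path.getsize(os.path.join(datadir, table_file))
    return enough_space(free_bytes, table_bytes, safety_factor)
```

As a usage example, on a host where the data directory is /var/lib/mysql and tables are stored one file per table (both assumptions), check_table_copy_space("/var/lib/mysql", "slurm_acct_db/monsoon_step_table.ibd") would return False when the disk cannot hold a second copy of the step table with headroom.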

Best,
Chris
 
-- 
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
 
 

On 2/4/22, 11:41 AM, "slurm-users on behalf of Christopher Benjamin Coffey" 
<slurm-users-boun...@lists.schedmd.com on behalf of chris.cof...@nau.edu> wrote:

    Hello!

    I'm trying to test an upgrade of our production Slurm database on a test
    cluster. Specifically, I'm trying to verify an update from 20.11.7 to
    21.08.4. I have a dump of the production database, which I imported as
    normal, and I then started slurmdbd to perform the conversion. I've
    verified everything I can think of, but I suspect I'm missing a
    timeout-related MariaDB tweak, or something similar, to prevent the
    database from "going away" during the conversion. See the slurmdbd log
    below. I've tried the upgrade both ways: via the systemd start script, and
    by starting slurmdbd manually. Has anyone run into this before?

    Here are my innodb.conf settings:

    [mysqld]
    innodb_buffer_pool_size=10000M
    innodb_log_file_size=64M
    innodb_lock_wait_timeout=10000
    max_allowed_packet=16M
    net_read_timeout=10000
    connect_timeout=10000

    ===
    [root@storm mariadb]# time slurmdbd -D -vvv
    slurmdbd: WARNING: MessageTimeout is too high for effective fault-tolerance
    slurmdbd: debug:  Log file re-opened
    slurmdbd: pidfile not locked, assuming no running daemon
    slurmdbd: debug:  auth/munge: init: Munge authentication plugin loaded
    slurmdbd: debug2: accounting_storage/as_mysql: init: mysql_connect() called 
for db slurm_acct_db
    slurmdbd: debug2: Attempting to connect to localhost:3306
    slurmdbd: accounting_storage/as_mysql: _check_mysql_concat_is_sane: MySQL 
server version is: 5.5.5-10.3.28-MariaDB
    slurmdbd: debug2: accounting_storage/as_mysql: _check_database_variables: 
innodb_buffer_pool_size: 10737418240
    slurmdbd: debug2: accounting_storage/as_mysql: _check_database_variables: 
innodb_log_file_size: 67108864
    slurmdbd: debug2: accounting_storage/as_mysql: _check_database_variables: 
innodb_lock_wait_timeout: 10000
    slurmdbd: accounting_storage/as_mysql: as_mysql_convert_tables_pre_create: 
pre-converting usage table for monsoon
    slurmdbd: error: mysql_query failed: 1054 Unknown column 'resv_secs' in 
'monsoon_usage_day_table'
    alter table "monsoon_usage_day_table" change resv_secs plan_secs bigint 
unsigned default 0 not null;
    slurmdbd: accounting_storage/as_mysql: as_mysql_convert_alter_query: The 
database appears to have been altered by a previous upgrade attempt, continuing 
with upgrade.
    slurmdbd: error: mysql_query failed: 1054 Unknown column 'resv_secs' in 
'monsoon_usage_hour_table'
    alter table "monsoon_usage_hour_table" change resv_secs plan_secs bigint 
unsigned default 0 not null;
    slurmdbd: accounting_storage/as_mysql: as_mysql_convert_alter_query: The 
database appears to have been altered by a previous upgrade attempt, continuing 
with upgrade.
    slurmdbd: error: mysql_query failed: 1054 Unknown column 'resv_secs' in 
'monsoon_usage_month_table'
    alter table "monsoon_usage_month_table" change resv_secs plan_secs bigint 
unsigned default 0 not null;
    slurmdbd: accounting_storage/as_mysql: as_mysql_convert_alter_query: The 
database appears to have been altered by a previous upgrade attempt, continuing 
with upgrade.
    slurmdbd: accounting_storage/as_mysql: as_mysql_convert_tables_pre_create: 
pre-converting job table for monsoon
    slurmdbd: adding column container after consumed_energy in table 
"monsoon_step_table"
    slurmdbd: adding column submit_line after req_cpufreq_gov in table 
"monsoon_step_table"
    slurmdbd: debug:  Table "monsoon_step_table" has changed.  Updating...
    slurmdbd: error: mysql_query failed: 2013 Lost connection to MySQL server 
during query
    alter table "monsoon_step_table" modify `job_db_inx` bigint unsigned not 
null, modify `deleted` tinyint default 0 not null, modify `exit_code` int 
default 0 not null, modify `id_step` int not null, modify `step_het_comp` int 
unsigned default 0xfffffffe not null, modify `kill_requid` int default -1 not 
null, modify `nodelist` text not null, modify `nodes_alloc` int unsigned not 
null, modify `node_inx` text, modify `state` smallint unsigned not null, modify 
`step_name` text not null, modify `task_cnt` int unsigned not null, modify 
`task_dist` int default 0 not null, modify `time_start` bigint unsigned default 
0 not null, modify `time_end` bigint unsigned default 0 not null, modify 
`time_suspended` bigint unsigned default 0 not null, modify `user_sec` bigint 
unsigned default 0 not null, modify `user_usec` int unsigned default 0 not 
null, modify `sys_sec` bigint unsigned default 0 not null, modify `sys_usec` 
int unsigned default 0 not null, modify `act_cpufreq` double unsigned default 
0.0 not null, modify `consumed_energy` bigint unsigned default 0 not null, add 
`container` text after consumed_energy, modify `req_cpufreq_min` int unsigned 
default 0 not null, modify `req_cpufreq` int unsigned default 0 not null, 
modify `req_cpufreq_gov` int unsigned default 0 not null, add `submit_line` 
text after req_cpufreq_gov, modify `tres_alloc` text not null default '', 
modify `tres_usage_in_ave` text not null default '', modify `tres_usage_in_max` 
text not null default '', modify `tres_usage_in_max_taskid` text not null 
default '', modify `tres_usage_in_max_nodeid` text not null default '', modify 
`tres_usage_in_min` text not null default '', modify `tres_usage_in_min_taskid` 
text not null default '', modify `tres_usage_in_min_nodeid` text not null 
default '', modify `tres_usage_in_tot` text not null default '', modify 
`tres_usage_out_ave` text not null default '', modify `tres_usage_out_max` text 
not null default '', modify `tres_usage_out_max_taskid` text not null default 
'', modify `tres_usage_out_max_nodeid` text not null default '', modify 
`tres_usage_out_min` text not null default '', modify 
`tres_usage_out_min_taskid` text not null default '', modify 
`tres_usage_out_min_nodeid` text not null default '', modify 
`tres_usage_out_tot` text not null default '', drop primary key, add primary 
key (job_db_inx, id_step, step_het_comp), drop key no_step_comp, add key 
no_step_comp (job_db_inx, id_step);
    slurmdbd: accounting_storage/as_mysql: init: Accounting storage MYSQL 
plugin failed
    slurmdbd: error: mysql_commit failed: 2006 MySQL server has gone away
    slurmdbd: error: rollback failed
    slurmdbd: error: Couldn't load specified plugin name for 
accounting_storage/mysql: Plugin init() callback failed
    slurmdbd: error: cannot create accounting_storage context for 
accounting_storage/mysql
    slurmdbd: fatal: Unable to initialize accounting_storage/mysql accounting 
storage plugin

    real        11m52.197s
    user        0m0.009s
    sys         0m0.001s
    ===

    Thank you for any ideas!

    Best,
    Chris

    -- 
    Christopher Coffey
    High-Performance Computing
    Northern Arizona University
    928-523-1167



