Hello! I'm trying to test an upgrade of our production Slurm database on a test cluster. Specifically, I'm trying to verify an upgrade from 20.11.7 to 21.08.4. I have a dump of the production database, imported it as normal, and then started slurmdbd to perform the conversion. I've verified everything I can think of, but I suspect I'm missing a timeout-related MariaDB tweak, or something similar, that would keep the database from "going away" during the conversion. See the slurmdbd log below. I've tried the upgrade both ways: via the systemd start script and by starting slurmdbd manually by hand. Has anyone run into this before?
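Because the question is whether a MariaDB timeout is cutting the conversion short, one quick check is to ask the running server which values it is actually using, rather than relying on the config file alone. This is a minimal sketch that assumes the `mysql` command-line client and root access on the test host. The function names `timeout_query` and `show_timeouts` are hypothetical, and the variable list is a guess at the relevant settings, not a confirmed cause (`max_statement_time` is MariaDB-specific):

```shell
#!/bin/sh
# Sketch: list server-side timeout and packet-size variables that could
# plausibly interrupt a long ALTER TABLE. The variable list is an assumption.

timeout_query() {
  # Build one SHOW statement covering all variables of interest.
  vars="'wait_timeout','interactive_timeout','net_read_timeout','net_write_timeout','connect_timeout','max_allowed_packet','innodb_lock_wait_timeout','max_statement_time'"
  printf "SHOW GLOBAL VARIABLES WHERE Variable_name IN (%s);\n" "$vars"
}

show_timeouts() {
  # Run the query against the local server (prompts for the root password).
  mysql -u root -p -e "$(timeout_query)"
}
```

Calling `show_timeouts` should print one row per variable that exists on the server. These are the GLOBAL values. A client such as slurmdbd may set its own session values when it connects, so treat this as a sanity check rather than proof of what the conversion session saw.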
Here are my innodb.conf settings:

[mysqld]
innodb_buffer_pool_size=10000M
innodb_log_file_size=64M
innodb_lock_wait_timeout=10000
max_allowed_packet=16M
net_read_timeout=10000
connect_timeout=10000

===
[root@storm mariadb]# time slurmdbd -D -vvv
slurmdbd: WARNING: MessageTimeout is too high for effective fault-tolerance
slurmdbd: debug: Log file re-opened
slurmdbd: pidfile not locked, assuming no running daemon
slurmdbd: debug: auth/munge: init: Munge authentication plugin loaded
slurmdbd: debug2: accounting_storage/as_mysql: init: mysql_connect() called for db slurm_acct_db
slurmdbd: debug2: Attempting to connect to localhost:3306
slurmdbd: accounting_storage/as_mysql: _check_mysql_concat_is_sane: MySQL server version is: 5.5.5-10.3.28-MariaDB
slurmdbd: debug2: accounting_storage/as_mysql: _check_database_variables: innodb_buffer_pool_size: 10737418240
slurmdbd: debug2: accounting_storage/as_mysql: _check_database_variables: innodb_log_file_size: 67108864
slurmdbd: debug2: accounting_storage/as_mysql: _check_database_variables: innodb_lock_wait_timeout: 10000
slurmdbd: accounting_storage/as_mysql: as_mysql_convert_tables_pre_create: pre-converting usage table for monsoon
slurmdbd: error: mysql_query failed: 1054 Unknown column 'resv_secs' in 'monsoon_usage_day_table'
alter table "monsoon_usage_day_table" change resv_secs plan_secs bigint unsigned default 0 not null;
slurmdbd: accounting_storage/as_mysql: as_mysql_convert_alter_query: The database appears to have been altered by a previous upgrade attempt, continuing with upgrade.
slurmdbd: error: mysql_query failed: 1054 Unknown column 'resv_secs' in 'monsoon_usage_hour_table'
alter table "monsoon_usage_hour_table" change resv_secs plan_secs bigint unsigned default 0 not null;
slurmdbd: accounting_storage/as_mysql: as_mysql_convert_alter_query: The database appears to have been altered by a previous upgrade attempt, continuing with upgrade.
slurmdbd: error: mysql_query failed: 1054 Unknown column 'resv_secs' in 'monsoon_usage_month_table'
alter table "monsoon_usage_month_table" change resv_secs plan_secs bigint unsigned default 0 not null;
slurmdbd: accounting_storage/as_mysql: as_mysql_convert_alter_query: The database appears to have been altered by a previous upgrade attempt, continuing with upgrade.
slurmdbd: accounting_storage/as_mysql: as_mysql_convert_tables_pre_create: pre-converting job table for monsoon
slurmdbd: adding column container after consumed_energy in table "monsoon_step_table"
slurmdbd: adding column submit_line after req_cpufreq_gov in table "monsoon_step_table"
slurmdbd: debug: Table "monsoon_step_table" has changed. Updating...
slurmdbd: error: mysql_query failed: 2013 Lost connection to MySQL server during query
alter table "monsoon_step_table" modify `job_db_inx` bigint unsigned not null, modify `deleted` tinyint default 0 not null, modify `exit_code` int default 0 not null, modify `id_step` int not null, modify `step_het_comp` int unsigned default 0xfffffffe not null, modify `kill_requid` int default -1 not null, modify `nodelist` text not null, modify `nodes_alloc` int unsigned not null, modify `node_inx` text, modify `state` smallint unsigned not null, modify `step_name` text not null, modify `task_cnt` int unsigned not null, modify `task_dist` int default 0 not null, modify `time_start` bigint unsigned default 0 not null, modify `time_end` bigint unsigned default 0 not null, modify `time_suspended` bigint unsigned default 0 not null, modify `user_sec` bigint unsigned default 0 not null, modify `user_usec` int unsigned default 0 not null, modify `sys_sec` bigint unsigned default 0 not null, modify `sys_usec` int unsigned default 0 not null, modify `act_cpufreq` double unsigned default 0.0 not null, modify `consumed_energy` bigint unsigned default 0 not null, add `container` text after consumed_energy, modify `req_cpufreq_min` int unsigned default 0 not null, modify `req_cpufreq` int unsigned default 0 not null, modify `req_cpufreq_gov` int unsigned default 0 not null, add `submit_line` text after req_cpufreq_gov, modify `tres_alloc` text not null default '', modify `tres_usage_in_ave` text not null default '', modify `tres_usage_in_max` text not null default '', modify `tres_usage_in_max_taskid` text not null default '', modify `tres_usage_in_max_nodeid` text not null default '', modify `tres_usage_in_min` text not null default '', modify `tres_usage_in_min_taskid` text not null default '', modify `tres_usage_in_min_nodeid` text not null default '', modify `tres_usage_in_tot` text not null default '', modify `tres_usage_out_ave` text not null default '', modify `tres_usage_out_max` text not null default '', modify `tres_usage_out_max_taskid` text not null default '', modify `tres_usage_out_max_nodeid` text not null default '', modify `tres_usage_out_min` text not null default '', modify `tres_usage_out_min_taskid` text not null default '', modify `tres_usage_out_min_nodeid` text not null default '', modify `tres_usage_out_tot` text not null default '', drop primary key, add primary key (job_db_inx, id_step, step_het_comp), drop key no_step_comp, add key no_step_comp (job_db_inx, id_step);
slurmdbd: accounting_storage/as_mysql: init: Accounting storage MYSQL plugin failed
slurmdbd: error: mysql_commit failed: 2006 MySQL server has gone away
slurmdbd: error: rollback failed
slurmdbd: error: Couldn't load specified plugin name for accounting_storage/mysql: Plugin init() callback failed
slurmdbd: error: cannot create accounting_storage context for accounting_storage/mysql
slurmdbd: fatal: Unable to initialize accounting_storage/mysql accounting storage plugin

real 11m52.197s
user 0m0.009s
sys 0m0.001s
===

Thank you for any ideas!

Best,
Chris

--
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167