Troubleshooting
Problem
When you upgrade the Spark service on Cloud Pak for Data to version 4.7.0, the EnterpriseDB (EDB) cluster crashes, causing the Spark upgrade to fail.
Symptom
The upgrade of the Spark service on Cloud Pak for Data fails.
Cause
The EnterpriseDB (EDB) cluster can crash for several reasons, such as pods restarting, a storage class issue, or pods entering a CrashLoopBackOff state. To resolve the issue, a clean EDB cluster (CR) must be set up.
Resolving The Problem
To resolve the issue, back up the EDB database at the Spark service level and then recreate the EDB cluster.
Prerequisites
- You must have access to the OpenShift Container Platform cluster. Set the Kubernetes context to the current OpenShift Container Platform cluster by using the following command:
oc login api.test-cluster.cp.fyre.ibm.com -u kubeadmin -p MySecurePassword
- Switch to the existing project that has a CPD instance by using the following command:
oc project cpd-instance
- Set the following environment variables:
export CPD_INSTANCE="cpd-instance"
export CPD_OPERATOR="cpd-operator"
- Download and save the following files to your working directory:
  - iae-edb-utils.sh
  - iae-maintenance-mode.sh
- Make the scripts executable:
chmod a+x iae-edb-utils.sh
chmod a+x iae-maintenance-mode.sh
- Create a persistent volume to back up and restore the database. If you already have a persistent volume and a PVC handy, you can skip this step.
  a. Create a new storage volume from Cloud Pak for Data > Administration > Storage volumes.
  b. Get the PVC name for the volume from the volume details.
  c. Set the PVC environment variable:
export BACKUP_PVC="volumes-spark-backup-pvc"
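Before starting the backup, it can help to confirm that the PVC is actually bound. A minimal sketch follows; the function name and messages are illustrative and not part of IBM's scripts, and the phase value would normally come from `oc get pvc "${BACKUP_PVC}" -o jsonpath='{.status.phase}'`:

```shell
# Sketch: confirm the backup PVC is Bound before proceeding.
# On a live cluster you would obtain the phase with:
#   phase=$(oc get pvc "${BACKUP_PVC}" -n "${CPD_INSTANCE}" -o jsonpath='{.status.phase}')
check_pvc_phase() {
  local phase="$1"
  if [ "$phase" = "Bound" ]; then
    echo "PVC is Bound; safe to run the backup"
  else
    echo "PVC is not ready (phase: ${phase:-unknown}); fix storage before backing up" >&2
    return 1
  fi
}
```

A PVC stuck in Pending usually points back to the storage class issue named in the Cause section, so it is worth catching here rather than after a failed backup pod.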
Enabling maintenance mode for Spark
To restrict any active operations on EDB, enable maintenance mode for Spark by running the following command:
bash iae-maintenance-mode.sh enable ${CPD_OPERATOR} ${CPD_INSTANCE}
Expected output:
deployment.apps/ibm-cpd-ae-operator scaled
cronjob.batch/spark-hb-job-cleanup-cron patched
cronjob.batch/spark-hb-kernel-cleanup-cron patched
deployment.apps/spark-hb-register-hb-dataplane scaled
deployment.apps/spark-hb-control-plane scaled
Back up the Spark database from EDB
To back up the Spark database, run the following command:
bash iae-edb-utils.sh backup ${CPD_OPERATOR} ${CPD_INSTANCE} ${BACKUP_PVC}
Expected output:
Geting Zen utils image
Deploying Backup Pod
Warning: would violate PodSecurity "restricted:v1.24": seccompProfile (pod or container "backup" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
pod/spark-edb-backup created
Sleeping for 5 seconds before the check
Check 1 of 30
Current Status: Succeeded
Pod is in Completed state
EDB database was successfully backed up
pod "spark-edb-backup" deleted
Verify the files in your PVC; you should see the backup folders `zen-metastoredb-backup` and `spark-edb-backup`.
Recreate the EDB cluster
After you back up the EDB cluster, delete and re-create the cluster by running the following command:
bash iae-edb-utils.sh recreate ${CPD_OPERATOR} ${CPD_INSTANCE} ${BACKUP_PVC}
Expected output:
WARNING! Make sure database is backed up as this will result in loss of database
Are you sure, you want to continue? (Y/N) Y
EDB Cluster YAML is stored at edb_cluster.yaml.bk, this is required when recreating
Removing EDB Cluster!
cluster.postgresql.k8s.enterprisedb.io "spark-hb-cloud-native-postgresql" deleted
Sleeping for 10 seconds
Re-creating EDB cluster
cluster.postgresql.k8s.enterprisedb.io/spark-hb-cloud-native-postgresql created
Sleeping for 20 seconds before the check
Check 1 of 30
Current Phase: Waiting for the instances to become active
Current Ready Instances:
The install is not yet complete
Sleeping for 20 seconds before the check
Check 2 of 30
Current Phase: Creating a new replica
Current Ready Instances: 1
The install is not yet complete
Sleeping for 20 seconds before the check
Check 3 of 30
Current Phase: Cluster in healthy state
Current Ready Instances: 2
Spark edb cluster install is complete
EDB successfully recreated
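The "Check N of 30" lines in the output above come from a readiness loop that polls the EDB cluster phase until it reports healthy. The loop can be sketched roughly as follows; `get_phase` is a stand-in (on a real cluster the phase would be read with something like `oc get cluster spark-hb-cloud-native-postgresql -n "${CPD_INSTANCE}" -o jsonpath='{.status.phase}'`), and the function name is illustrative, not IBM's:

```shell
# Illustrative sketch of the "Check N of 30" readiness loop.
# get_phase must be defined by the caller and return the current cluster phase.
wait_for_edb() {
  local max_checks=30 i
  for i in $(seq 1 "$max_checks"); do
    echo "Check $i of $max_checks"
    if [ "$(get_phase)" = "Cluster in healthy state" ]; then
      echo "Spark edb cluster install is complete"
      return 0
    fi
    echo "The install is not yet complete"
    # sleep 20  # the real script sleeps 20 seconds between checks
  done
  echo "EDB cluster did not become healthy after $max_checks checks" >&2
  return 1
}
```

If the loop exhausts all 30 checks, the cluster is stuck; inspect the EDB pods and events before retrying rather than rerunning the recreate step blindly.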
Reset the Spark database
You must also recreate the Spark database and reset `pg`. Run the following command:
bash iae-edb-utils.sh reset ${CPD_OPERATOR} ${CPD_INSTANCE} ${BACKUP_PVC}
Expected output:
Restore Spark database to EDB from PVC
Resetting database
Removing all active connections
 pg_terminate_backend
----------------------
(0 rows)
 datid | datname | pid | leader_pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query_id | query | backend_type
-------+---------+-----+------------+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+-----------------+------------+-------+-------------+--------------+----------+-------+--------------
(0 rows)
Dropping and recreating db
DROP DATABASE
DROP ROLE
CREATE DATABASE
CREATE ROLE
GRANT
ALTER DATABASE
GRANT
Restore the Spark database
To restore the Spark database from the PVC, run the following command:
bash iae-edb-utils.sh restore ${CPD_OPERATOR} ${CPD_INSTANCE} ${BACKUP_PVC}
Expected output:
Geting Zen utils image
Deploying Restore Pod
Warning: would violate PodSecurity "restricted:v1.24": seccompProfile (pod or container "restore" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
pod/spark-edb-restore created
Sleeping for 5 seconds before the check
Check 1 of 30
Current Status: Succeeded
Pod is in Completed state
EDB database was successfully restored
pod "spark-edb-restore" deleted
The Spark database in the EDB cluster is now restored.
Disable maintenance mode for Spark
To disable maintenance mode for Spark and let the installation reconcile to a stable state, run the following command:
bash iae-maintenance-mode.sh disable ${CPD_OPERATOR} ${CPD_INSTANCE}
Expected output:
deployment.apps/ibm-cpd-ae-operator scaled
cronjob.batch/spark-hb-job-cleanup-cron patched
cronjob.batch/spark-hb-kernel-cleanup-cron patched
deployment.apps/spark-hb-register-hb-dataplane scaled
deployment.apps/spark-hb-control-plane scaled
Wait for Spark to reconcile by watching the custom resource:
oc get ae -w -n ${CPD_INSTANCE}
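For scripted checks, it can be more convenient to test the reconciliation status once than to watch interactively. A small sketch follows; the status values below (`Completed`, `InProgress`) are assumptions based on common Cloud Pak for Data operator conventions, so confirm the actual field and values on your cluster with `oc get ae -n "${CPD_INSTANCE}" -o yaml` before relying on them:

```shell
# Sketch: evaluate a status value read from the AnalyticsEngine CR.
# The status string would come from the CR, e.g. (field name is an assumption):
#   status=$(oc get ae -n "${CPD_INSTANCE}" -o jsonpath='{.items[0].status.analyticsengineStatus}')
check_ae_status() {
  local status="$1"
  case "$status" in
    Completed)  echo "Spark reconciliation finished"; return 0 ;;
    InProgress) echo "Spark reconciliation still running"; return 1 ;;
    *)          echo "Unexpected status: ${status:-<empty>}" >&2; return 1 ;;
  esac
}
```

A non-`Completed` status after a long wait suggests the operator is still reconciling or has hit an error; check the operator pod logs in that case.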
Conclusion
After you complete these steps, the Spark database is restored on a fresh EDB cluster.
Document Location
Worldwide
Document Information
More support for:
IBM Cloud Pak for Data
Component:
Audit events
Software version:
All Versions
Document number:
6980961
Modified date:
15 March 2024
UID
ibm16980961