IBM Support

Unable to upgrade Spark due to Enterprise database issues

Troubleshooting


Problem

When you upgrade the Spark service on Cloud Pak for Data to version 4.7.0, the EnterpriseDB (EDB) database crashes, which causes the Spark upgrade to fail.

Symptom

Upgrade of Spark service on Cloud Pak for Data fails.

Cause

The EnterpriseDB (EDB) cluster can crash for several reasons, such as pod restarts, storage class issues, or pods entering the CrashLoopBackOff state. To resolve the issue, a clean EDB cluster custom resource (CR) must be set up.

Resolving The Problem

To resolve the issue, back up the EDB database at the Spark service level, clean up the EDB cluster, and restore the database.

Prerequisites

  • You must have access to the OpenShift Container Platform cluster. Set the Kubernetes context to the current OpenShift Container Platform cluster by using the following command: 
    oc login api.test-cluster.cp.fyre.ibm.com -u kubeadmin -p MySecurePassword

  •  Switch to the existing project that has a CPD instance by using the following command:
     
    oc project cpd-instance

  • Set the following environment variables:
    export CPD_INSTANCE="cpd-instance"
    export CPD_OPERATOR="cpd-operator"
  • Download and save the `iae-edb-utils.sh` and `iae-maintenance-mode.sh` scripts to your working directory.
  • Make the scripts executable by using the following commands:
    chmod a+x iae-edb-utils.sh
    chmod a+x iae-maintenance-mode.sh
  • Create a persistent volume to back up or restore the database. If you already have a persistent volume and a PVC handy, you can skip this step.
    a. Create a new storage volume from Cloud Pak for Data > Administration > Storage volumes.
    b. Get the PVC name for the volume from the volume details.
    c. Set the PVC environment variable:
    export BACKUP_PVC="volumes-spark-backup-pvc"
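Before running the scripts, it can help to confirm that all three environment variables are set. The following is a hypothetical pre-flight check, not part of the IBM-provided scripts:

```shell
# Hypothetical pre-flight check (not part of the IBM-provided scripts):
# warn about any required environment variable that is unset, so the
# backup/restore commands below do not run with empty arguments.
missing=0
for v in CPD_INSTANCE CPD_OPERATOR BACKUP_PVC; do
  if [ -z "${!v:-}" ]; then
    echo "WARNING: $v is not set"
    missing=1
  fi
done
if [ "$missing" -eq 0 ]; then
  echo "All required variables are set"
fi
```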

Enable maintenance mode for Spark

    To restrict any active operations on EDB and enable maintenance mode for Spark, run the following command:
    bash iae-maintenance-mode.sh enable ${CPD_OPERATOR} ${CPD_INSTANCE}
    Expected output:
    deployment.apps/ibm-cpd-ae-operator scaled
    cronjob.batch/spark-hb-job-cleanup-cron patched
    cronjob.batch/spark-hb-kernel-cleanup-cron patched
    deployment.apps/spark-hb-register-hb-dataplane scaled
    deployment.apps/spark-hb-control-plane scaled
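To confirm that maintenance mode took effect, you can check that the deployments named in the output above were scaled down to 0 replicas. This is an optional check, guarded so it skips cleanly where `oc` is unavailable:

```shell
# Optional check: confirm the Spark deployments were scaled to 0 replicas
# by the maintenance-mode script. Skips cleanly when oc is not usable.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  oc get deployment spark-hb-control-plane spark-hb-register-hb-dataplane \
    -n "${CPD_INSTANCE:-cpd-instance}" \
    -o custom-columns=NAME:.metadata.name,DESIRED:.spec.replicas
else
  echo "Skipping check: oc CLI not available or not logged in"
fi
```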
     
Back up the Spark database from EDB

    To back up the Spark database, run the following command:
    bash iae-edb-utils.sh backup ${CPD_OPERATOR} ${CPD_INSTANCE} ${BACKUP_PVC}
     
    Expected output:
    Geting Zen utils image
    Deploying Backup Pod
    Warning: would violate PodSecurity "restricted:v1.24": seccompProfile (pod or container "backup" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
    pod/spark-edb-backup created
    Sleeping for 5 seconds before the check
     
    Check 1 of 30
    Current Status: Succeeded
     
    Pod is in Completed state
    EDB database was successfully backed up
    pod "spark-edb-backup" deleted
     
    Verify the files in your PVC; you must see the `zen-metastoredb-backup` and `spark-edb-backup` backup folders.
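One way to inspect the PVC contents is to mount it in a short-lived pod and list the files. The following is a hedged sketch; the pod name `pvc-inspect` and the UBI image are illustrative choices, and the check skips cleanly where `oc` is unavailable:

```shell
# Hedged sketch: mount the backup PVC in a throwaway pod and list its
# contents to confirm the zen-metastoredb-backup and spark-edb-backup
# folders exist. Pod name and image are illustrative.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  PVC="${BACKUP_PVC:-volumes-spark-backup-pvc}"
  oc run pvc-inspect --rm -i --restart=Never \
    --image=registry.access.redhat.com/ubi8/ubi-minimal \
    -n "${CPD_INSTANCE:-cpd-instance}" \
    --overrides="{\"spec\":{\"containers\":[{\"name\":\"pvc-inspect\",\"image\":\"registry.access.redhat.com/ubi8/ubi-minimal\",\"command\":[\"ls\",\"-l\",\"/mnt\"],\"volumeMounts\":[{\"name\":\"backup\",\"mountPath\":\"/mnt\"}]}],\"volumes\":[{\"name\":\"backup\",\"persistentVolumeClaim\":{\"claimName\":\"${PVC}\"}}]}}"
else
  echo "Skipping check: oc CLI not available or not logged in"
fi
```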
     
Recreate the EDB cluster

    After you back up the EDB cluster, delete the cluster and re-create it by using the following command:
    bash iae-edb-utils.sh recreate ${CPD_OPERATOR} ${CPD_INSTANCE} ${BACKUP_PVC}
    Expected output:
    WARNING! Make sure database is backed up as this will result in loss of database
    Are you sure, you want to continue? (Y/N) Y
    EDB Cluster YAML is stored at edb_cluster.yaml.bk, this is required when recreating
    Removing EDB Cluster!
    cluster.postgresql.k8s.enterprisedb.io "spark-hb-cloud-native-postgresql" deleted
    Sleeping for 10 seconds
    Re-creating EDB cluster
    cluster.postgresql.k8s.enterprisedb.io/spark-hb-cloud-native-postgresql created
    Sleeping for 20 seconds before the check
     
    Check 1 of 30
    Current Phase: Waiting for the instances to become active
    Current Ready Instances:
     
    The install is not yet complete
    Sleeping for 20 seconds before the check
     
    Check 2 of 30
    Current Phase: Creating a new replica
    Current Ready Instances: 1
     
    The install is not yet complete
    Sleeping for 20 seconds before the check
     
    Check 3 of 30
    Current Phase: Cluster in healthy state
    Current Ready Instances: 2
     
    Spark edb cluster install is complete
    EDB successfully recreated
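The "Check N of 30" messages above come from a polling loop. Its logic can be sketched in plain bash as follows; `get_phase` is stubbed here for illustration, and the `oc` query shown in the comment is an assumption about how the real script reads the cluster phase:

```shell
# Sketch of the polling pattern the script output above suggests: check the
# cluster phase up to MAX_CHECKS times, sleeping between checks, until the
# cluster reports a healthy state. get_phase is stubbed for illustration;
# the real script would query the EDB cluster CR, for example with:
#   oc get clusters.postgresql.k8s.enterprisedb.io \
#     spark-hb-cloud-native-postgresql -o jsonpath='{.status.phase}'
MAX_CHECKS=30
SLEEP_SECONDS=20
get_phase() {
  echo "Cluster in healthy state"   # stub so the sketch terminates at once
}
for i in $(seq 1 "$MAX_CHECKS"); do
  echo "Check $i of $MAX_CHECKS"
  phase="$(get_phase)"
  echo "Current Phase: $phase"
  if [ "$phase" = "Cluster in healthy state" ]; then
    echo "Spark edb cluster install is complete"
    break
  fi
  echo "Sleeping for $SLEEP_SECONDS seconds before the check"
  sleep "$SLEEP_SECONDS"
done
```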
Reset the Spark database

    You must also recreate the Spark database and reset `pg`. Run the following command:
    bash iae-edb-utils.sh reset ${CPD_OPERATOR} ${CPD_INSTANCE} ${BACKUP_PVC}
    Expected output:
    Resetting database
    Removing all active connections
    pg_terminate_backend
    ----------------------
    (0 rows)
     
    datid | datname | pid | leader_pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query_id | query | backend_type
    -------+---------+-----+------------+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+-----------------+------------+-------+-------------+--------------+----------+-------+--------------
    (0 rows)
     
    Dropping and recreating db
    DROP DATABASE
    DROP ROLE
    CREATE DATABASE
    CREATE ROLE
    GRANT
    ALTER DATABASE
    GRANT
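After the reset, the recreated database can be verified by listing databases from inside a Postgres pod. The pod name `spark-hb-cloud-native-postgresql-1` is an assumption derived from the cluster name above; adjust it to a pod that exists in your instance:

```shell
# Optional check: list databases to confirm the Spark database was
# recreated. The pod name is an assumption based on the cluster name
# spark-hb-cloud-native-postgresql. Skips cleanly when oc is not usable.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  oc exec spark-hb-cloud-native-postgresql-1 \
    -n "${CPD_INSTANCE:-cpd-instance}" -- psql -U postgres -l
else
  echo "Skipping check: oc CLI not available or not logged in"
fi
```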

Restore the Spark database to EDB from the PVC

    To restore the Spark database from the PVC, run the following command:
    bash iae-edb-utils.sh restore ${CPD_OPERATOR} ${CPD_INSTANCE} ${BACKUP_PVC}
    Expected output:
    Geting Zen utils image
    Deploying Restore Pod
    Warning: would violate PodSecurity "restricted:v1.24": seccompProfile (pod or container "restore" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
    pod/spark-edb-restore created
    Sleeping for 5 seconds before the check
     
    Check 1 of 30
    Current Status: Succeeded
     
    Pod is in Completed state
    EDB database was successfully restored
    pod "spark-edb-restore" deleted
    The Spark database in the EDB cluster is restored successfully.
     
Disable maintenance mode for Spark

    To disable maintenance mode for Spark and let the installation reconcile to a stable state, run the following command:
    bash iae-maintenance-mode.sh disable ${CPD_OPERATOR} ${CPD_INSTANCE}
    Expected output:
    deployment.apps/ibm-cpd-ae-operator scaled
    cronjob.batch/spark-hb-job-cleanup-cron patched
    cronjob.batch/spark-hb-kernel-cleanup-cron patched
    deployment.apps/spark-hb-register-hb-dataplane scaled
    deployment.apps/spark-hb-control-plane scaled
    Wait for Spark to reconcile, and watch the reconciliation status by running the following command:
    oc get ae -w -n ${CPD_INSTANCE}
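Rather than watching interactively with `-w`, the reconcile status can also be polled until it completes. The jsonpath field `analyticsengineStatus` and the `Completed` value are assumptions; confirm them against `oc describe ae` on your cluster:

```shell
# Hedged sketch: poll the AnalyticsEngine CR until reconciliation finishes
# instead of watching interactively. The jsonpath field and the Completed
# value are assumptions. Skips cleanly when oc is not usable.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  for i in $(seq 1 30); do
    status="$(oc get ae -n "${CPD_INSTANCE:-cpd-instance}" \
      -o jsonpath='{.items[0].status.analyticsengineStatus}' 2>/dev/null)"
    echo "Check $i of 30: status=${status:-unknown}"
    if [ "$status" = "Completed" ]; then
      break
    fi
    sleep 20
  done
else
  echo "Skipping check: oc CLI not available or not logged in"
fi
```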
Conclusion

    After you complete these steps, the Spark database is restored into a fresh EDB cluster.
     


Document Location

Worldwide



Document Information

More support for:
IBM Cloud Pak for Data

Component:
Audit events

Software version:
All Versions

Document number:
6980961

Modified date:
15 March 2024

UID

ibm16980961
