IBM Support

Unable to upgrade Spark due to Enterprise database issues

Troubleshooting


Problem

When you upgrade the Spark service on Cloud Pak for Data to version 4.7.0, the EnterpriseDB (EDB) database crashes, which causes the Spark upgrade to fail.

Symptom

Upgrade of Spark service on Cloud Pak for Data fails.

Cause

The EnterpriseDB (EDB) cluster can crash for several reasons, such as pod restarts, storage class issues, or pods entering the CrashLoopBackOff state. To resolve the issue, a clean EDB cluster custom resource (CR) must be set up.

Resolving The Problem

To resolve the issue, back up the EDB database at the Spark service level, clean up the EDB cluster, and restore the database.

Prerequisites

  • You must have access to the OpenShift Container Platform cluster. Set the Kubernetes context to the current OpenShift Container Platform cluster by using the following command: 
    oc login api.test-cluster.cp.fyre.ibm.com -u kubeadmin -p MySecurePassword

  •  Switch to the existing project that has a CPD instance by using the following command:
     
    oc project cpd-instance

  • Set the following environment variables:
    export CPD_INSTANCE="cpd-instance"
    export CPD_OPERATOR="cpd-operator"
  • Download and save the `iae-edb-utils.sh` and `iae-maintenance-mode.sh` scripts to your working directory.
  • Make the scripts executable by using the following commands:
    chmod a+x iae-edb-utils.sh
    chmod a+x iae-maintenance-mode.sh
  • Create a persistent volume to back up or restore the database. If you already have a persistent volume and a PVC handy, you can skip this step.
    a. Create a new storage volume from Cloud Pak for Data > Administration > Storage volumes.
    b. Get the PVC name for the volume from the volume details.
    c. Set the PVC environment variable:
    export BACKUP_PVC="volumes-spark-backup-pvc"
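Before running the scripts, it can help to confirm that all three environment variables are set. The following is a hypothetical pre-flight check, not part of the IBM-provided scripts:

```shell
# Hypothetical pre-flight check (not part of the IBM-provided scripts):
# warn about any required environment variable that is unset, so the
# backup/restore commands below do not run with empty arguments.
missing=0
for v in CPD_INSTANCE CPD_OPERATOR BACKUP_PVC; do
  if [ -z "${!v:-}" ]; then
    echo "WARNING: $v is not set"
    missing=1
  fi
done
if [ "$missing" -eq 0 ]; then
  echo "All required variables are set"
fi
```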

Enable maintenance mode for Spark

    To restrict any active operations on EDB and enable maintenance mode for Spark, run the following command:
    bash iae-maintenance-mode.sh enable ${CPD_OPERATOR} ${CPD_INSTANCE}
    Expected output:
    deployment.apps/ibm-cpd-ae-operator scaled
    cronjob.batch/spark-hb-job-cleanup-cron patched
    cronjob.batch/spark-hb-kernel-cleanup-cron patched
    deployment.apps/spark-hb-register-hb-dataplane scaled
    deployment.apps/spark-hb-control-plane scaled
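To confirm that maintenance mode took effect, you can check that the deployments named in the output above were scaled down to 0 replicas. This is an optional check, guarded so it skips cleanly where `oc` is unavailable:

```shell
# Optional check: confirm the Spark deployments were scaled to 0 replicas
# by the maintenance-mode script. Skips cleanly when oc is not usable.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  oc get deployment spark-hb-control-plane spark-hb-register-hb-dataplane \
    -n "${CPD_INSTANCE:-cpd-instance}" \
    -o custom-columns=NAME:.metadata.name,DESIRED:.spec.replicas
else
  echo "Skipping check: oc CLI not available or not logged in"
fi
```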
     
Back up the Spark database from EDB

    To back up the Spark database, run the following command:
    bash iae-edb-utils.sh backup ${CPD_OPERATOR} ${CPD_INSTANCE} ${BACKUP_PVC}
     
    Expected output:
    Geting Zen utils image
    Deploying Backup Pod
    Warning: would violate PodSecurity "restricted:v1.24": seccompProfile (pod or container "backup" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
    pod/spark-edb-backup created
    Sleeping for 5 seconds before the check
     
    Check 1 of 30
    Current Status: Succeeded
     
    Pod is in Completed state
    EDB database was successfully backed up
    pod "spark-edb-backup" deleted
     
    Verify the files in your PVC; you must see the `zen-metastoredb-backup` and `spark-edb-backup` backup folders.
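One way to inspect the PVC contents is to mount it in a short-lived pod and list the files. The following is a hedged sketch; the pod name `pvc-inspect` and the UBI image are illustrative choices, and the check skips cleanly where `oc` is unavailable:

```shell
# Hedged sketch: mount the backup PVC in a throwaway pod and list its
# contents to confirm the zen-metastoredb-backup and spark-edb-backup
# folders exist. Pod name and image are illustrative.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  PVC="${BACKUP_PVC:-volumes-spark-backup-pvc}"
  oc run pvc-inspect --rm -i --restart=Never \
    --image=registry.access.redhat.com/ubi8/ubi-minimal \
    -n "${CPD_INSTANCE:-cpd-instance}" \
    --overrides="{\"spec\":{\"containers\":[{\"name\":\"pvc-inspect\",\"image\":\"registry.access.redhat.com/ubi8/ubi-minimal\",\"command\":[\"ls\",\"-l\",\"/mnt\"],\"volumeMounts\":[{\"name\":\"backup\",\"mountPath\":\"/mnt\"}]}],\"volumes\":[{\"name\":\"backup\",\"persistentVolumeClaim\":{\"claimName\":\"${PVC}\"}}]}}"
else
  echo "Skipping check: oc CLI not available or not logged in"
fi
```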
     
Recreate the EDB cluster

    After you back up the EDB cluster, delete the cluster and re-create it by using the following command:
    bash iae-edb-utils.sh recreate ${CPD_OPERATOR} ${CPD_INSTANCE} ${BACKUP_PVC}
    Expected output:
    WARNING! Make sure database is backed up as this will result in loss of database
    Are you sure, you want to continue? (Y/N) Y
    EDB Cluster YAML is stored at edb_cluster.yaml.bk, this is required when recreating
    Removing EDB Cluster!
    cluster.postgresql.k8s.enterprisedb.io "spark-hb-cloud-native-postgresql" deleted
    Sleeping for 10 seconds
    Re-creating EDB cluster
    cluster.postgresql.k8s.enterprisedb.io/spark-hb-cloud-native-postgresql created
    Sleeping for 20 seconds before the check
     
    Check 1 of 30
    Current Phase: Waiting for the instances to become active
    Current Ready Instances:
     
    The install is not yet complete
    Sleeping for 20 seconds before the check
     
    Check 2 of 30
    Current Phase: Creating a new replica
    Current Ready Instances: 1
     
    The install is not yet complete
    Sleeping for 20 seconds before the check
     
    Check 3 of 30
    Current Phase: Cluster in healthy state
    Current Ready Instances: 2
     
    Spark edb cluster install is complete
    EDB successfully recreated
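The "Check N of 30" messages above come from a polling loop. Its logic can be sketched in plain bash as follows; `get_phase` is stubbed here for illustration, and the `oc` query shown in the comment is an assumption about how the real script reads the cluster phase:

```shell
# Sketch of the polling pattern the script output above suggests: check the
# cluster phase up to MAX_CHECKS times, sleeping between checks, until the
# cluster reports a healthy state. get_phase is stubbed for illustration;
# the real script would query the EDB cluster CR, for example with:
#   oc get clusters.postgresql.k8s.enterprisedb.io \
#     spark-hb-cloud-native-postgresql -o jsonpath='{.status.phase}'
MAX_CHECKS=30
SLEEP_SECONDS=20
get_phase() {
  echo "Cluster in healthy state"   # stub so the sketch terminates at once
}
for i in $(seq 1 "$MAX_CHECKS"); do
  echo "Check $i of $MAX_CHECKS"
  phase="$(get_phase)"
  echo "Current Phase: $phase"
  if [ "$phase" = "Cluster in healthy state" ]; then
    echo "Spark edb cluster install is complete"
    break
  fi
  echo "Sleeping for $SLEEP_SECONDS seconds before the check"
  sleep "$SLEEP_SECONDS"
done
```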
Reset the Spark database

    You must also recreate the Spark database and reset `pg`. Run the following command:
    bash iae-edb-utils.sh reset ${CPD_OPERATOR} ${CPD_INSTANCE} ${BACKUP_PVC}
    Expected output:
    Resetting database
    Removing all active connections
    pg_terminate_backend
    ----------------------
    (0 rows)
     
    datid | datname | pid | leader_pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | wait_event_type | wait_event | state | backend_xid | backend_xmin | query_id | query | backend_type
    -------+---------+-----+------------+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+--------------+-----------------+------------+-------+-------------+--------------+----------+-------+--------------
    (0 rows)
     
    Dropping and recreating db
    DROP DATABASE
    DROP ROLE
    CREATE DATABASE
    CREATE ROLE
    GRANT
    ALTER DATABASE
    GRANT
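After the reset, the recreated database can be verified by listing databases from inside a Postgres pod. The pod name `spark-hb-cloud-native-postgresql-1` is an assumption derived from the cluster name above; adjust it to a pod that exists in your instance:

```shell
# Optional check: list databases to confirm the Spark database was
# recreated. The pod name is an assumption based on the cluster name
# spark-hb-cloud-native-postgresql. Skips cleanly when oc is not usable.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  oc exec spark-hb-cloud-native-postgresql-1 \
    -n "${CPD_INSTANCE:-cpd-instance}" -- psql -U postgres -l
else
  echo "Skipping check: oc CLI not available or not logged in"
fi
```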

Restore the Spark database to EDB from the PVC

    To restore the Spark database from the PVC, run the following command:
    bash iae-edb-utils.sh restore ${CPD_OPERATOR} ${CPD_INSTANCE} ${BACKUP_PVC}
    Expected output:
    Geting Zen utils image
    Deploying Restore Pod
    Warning: would violate PodSecurity "restricted:v1.24": seccompProfile (pod or container "restore" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
    pod/spark-edb-restore created
    Sleeping for 5 seconds before the check
     
    Check 1 of 30
    Current Status: Succeeded
     
    Pod is in Completed state
    EDB database was successfully restored
    pod "spark-edb-restore" deleted
    The Spark database in the EDB cluster is restored successfully.
     
Disable maintenance mode for Spark

    To disable maintenance mode for Spark and let the installation reconcile to a stable state, run the following command:
    bash iae-maintenance-mode.sh disable ${CPD_OPERATOR} ${CPD_INSTANCE}
    Expected output:
    deployment.apps/ibm-cpd-ae-operator scaled
    cronjob.batch/spark-hb-job-cleanup-cron patched
    cronjob.batch/spark-hb-kernel-cleanup-cron patched
    deployment.apps/spark-hb-register-hb-dataplane scaled
    deployment.apps/spark-hb-control-plane scaled
    Wait for Spark to reconcile, and watch the reconciliation status by running the following command:
    oc get ae -w -n ${CPD_INSTANCE}
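Rather than watching interactively with `-w`, the reconcile status can also be polled until it completes. The jsonpath field `analyticsengineStatus` and the `Completed` value are assumptions; confirm them against `oc describe ae` on your cluster:

```shell
# Hedged sketch: poll the AnalyticsEngine CR until reconciliation finishes
# instead of watching interactively. The jsonpath field and the Completed
# value are assumptions. Skips cleanly when oc is not usable.
if command -v oc >/dev/null 2>&1 && oc whoami >/dev/null 2>&1; then
  for i in $(seq 1 30); do
    status="$(oc get ae -n "${CPD_INSTANCE:-cpd-instance}" \
      -o jsonpath='{.items[0].status.analyticsengineStatus}' 2>/dev/null)"
    echo "Check $i of 30: status=${status:-unknown}"
    if [ "$status" = "Completed" ]; then
      break
    fi
    sleep 20
  done
else
  echo "Skipping check: oc CLI not available or not logged in"
fi
```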
Conclusion

    After you complete these steps, the Spark database is restored into a fresh EDB cluster.
     


Document Location

Worldwide



Document Information

More support for:
IBM Cloud Pak for Data

Component:
Audit events

Software version:
All Versions

Document number:
6980961

Modified date:
15 March 2024

UID

ibm16980961
