Exadata Cloud Service (ExaCS) on OCI is essentially a regular Exadata server with some additional agent processes and tools that automate most of the admin tasks. This works well most of the time, but many of these tools are built on the assumption that customers will not change the default configuration of the server. So whenever we change any of the defaults to suit our specific needs, we risk running into issues and errors while patching ExaCS with the Oracle-provided patching tools.
In this post I will share two occasions where I hit issues patching ExaCS databases because of custom configuration changes.
Issue 1: When I added a new ACFS mount to store database log files, GRID patching failed
To work well with the scripts and tools we had for our on-premises databases, we had created a new ACFS mount and pointed all databases to use it as their diagnostic_dest. This caused an issue during GRID patching.
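For context, the change itself is straightforward. A minimal sketch, assuming a hypothetical volume named acfslogs on a DATAC1 disk group and a mount point /acfs_logs (all example names, not the actual ones used here):

asmcmd volcreate -G DATAC1 -s 200G acfslogs            # create an ADVM volume in ASM
mkfs -t acfs /dev/asm/acfslogs-123                     # format the volume device as ACFS
mount -t acfs /dev/asm/acfslogs-123 /acfs_logs         # mount it on every node

-- then, inside each database, redirect the diagnostic destination
ALTER SYSTEM SET diagnostic_dest='/acfs_logs' SCOPE=BOTH SID='*';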
# dbaascli patch db apply --patchid 29708703-GI --dbnames GRID
This command failed because it could not bring down the CRS stack for patching. The solution? I had to manually shut down all the database instances, unmount the custom ACFS file system, and restart the patching.
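The manual workaround went roughly like this, shown as a sketch with the same example mount point /acfs_logs and a placeholder database name ORCL (repeat the stop for every database that writes to the ACFS mount):

srvctl stop database -d ORCL                                     # stop each database using the custom ACFS
umount /acfs_logs                                                # run on every node, or stop the ACFS resource instead
dbaascli patch db apply --patchid 29708703-GI --dbnames GRID     # re-run the GRID patching
# once the patch completes, remount /acfs_logs and start the databases again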
While I can forgive Oracle for overlooking that scenario, the next issue was more or less caused by lazy scripting.
Issue 2: When I added custom SSL wallets, database patching failed
All our existing database listeners are configured with a TCPS endpoint on a separate port, because the client required a secure connection between the app servers and the DB as well.
On ExaCS, the default configuration ships with its own GRID_WALLET and DB_WALLET. The GRID wallet is used to encrypt even non-SSL connections to the listener using native network encryption, and the DB wallet is used to encrypt the datafiles. So technically, SSL connections are not needed on ExaCS unless client verification is required.
In our case, since all our existing systems are configured to use SSL, I had to bypass the native encryption and enable SSL-based connections. I also had to replace the GRID wallet with the wallet files we use internally, so that all existing clients could still connect to the DB. This is what caused the patching process to fail: the patch binary downloaded as part of the automatic patching process is encrypted, and decrypting it requires the default wallet file to be present at the default location /u02/app/oracle/admin/grid/grid_wallet.
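For reference, the change on our side was along these lines. This is only a sketch; the VIP name, port 2484, and the source path of the custom wallet are made-up example values, and the exact entries on ExaCS may look different:

# copy the internally used SSL wallet over the default GRID wallet (this is what later broke patching)
cp /tmp/internal_wallet/cwallet.sso /u02/app/oracle/admin/grid/grid_wallet/cwallet.sso

# listener.ora: add a TCPS endpoint on a separate port
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = exa-node1-vip)(PORT = 1521)))
    (DESCRIPTION = (ADDRESS = (PROTOCOL = TCPS)(HOST = exa-node1-vip)(PORT = 2484)))
  )

# sqlnet.ora: switch off native encryption, since SSL now secures the connection
SQLNET.ENCRYPTION_SERVER = REJECTED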
The weird thing here was that dbaascli happily reported the patching as successful, even though the actual patching had not happened at all. When I checked the exapatch log file /var/opt/oracle/log/exadbcpatch/exadbcpatch.log, I found the lines below:
2020-04-19 23:48:29.600770 - INFO: unencrypting db11204_jan20.tar.gz.gpg bits
2020-04-19 23:48:29.612148 - INFO: grid wallet location is being set to: /u02/app/oracle/admin/grid/grid_wallet
2020-04-19 23:48:29.612294 - INFO: Bits are encrypted, beginning to unencrypt them
2020-04-19 23:48:30.351007 - ERROR: Validation failed for the required key at /var/opt/oracle/perl_lib/DBAAS/logger.pm line 495.
        logger::logerr('logger=HASH(0x406b558)', 'ERROR: Validation failed for the required key\x{a}') called at /var/opt/oracle/perl_lib/DBAAS/pwallet.pm line 901
        pwallet::_do_getkey('pwallet=HASH(0x411cdc0)', 'exacs_patch_key', '/u02/app/oracle/admin/grid/grid_wallet') called at /var/opt/oracle/perl_lib/DBAAS/pwallet.pm line 476
        pwallet::getkey('pwallet=HASH(0x411cdc0)', 'exacs_patch_key', '/u02/app/oracle/admin/grid/grid_wallet') called at /var/opt/oracle/exapatch/commonApis.pm line 743
        commonApis::unencrypt_bits('commonApis=HASH(0x41fe740)', '/u02/app_acfs/exapatch/30501894', 'db11204_jan20.tar.gz.gpg', 'ExaCsPatching$', 1, 30501894, 11.2.0.4.190716) called at /var/opt/oracle/exapatch/prepare.pm line 266
        prepare::prepare_home('prepare=HASH(0x40cb1c8)', '/u02/app/oracle/product/11.2.0/dbhome_5', 'OraHome104_11204_dbpe190716_0_pip', 'HASH(0x3dbecf8)') called at /var/opt/oracle/exapatch/exadbcpatch line 2371
        eval {…} called at /var/opt/oracle/exapatch/exadbcpatch line 2321
        main::setup() called at /var/opt/oracle/exapatch/exadbcpatch line 1033
        eval {…} called at /var/opt/oracle/exapatch/exadbcpatch line 805
2020-04-19 23:48:30.351277 - ERROR: pwallet.pm: No entry found for the requested alias
Once I restored the original cwallet.sso file to /u02/app/oracle/admin/grid/grid_wallet on both nodes, the patching went through fine. The downside is that while patching was in progress, SSL connections to this cluster were not working because of the certificate mismatch.
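If you want to be sure the right wallet is back in place before re-running the patch, one quick sanity check is to list the wallet entries and confirm that the exacs_patch_key alias from the error above is present. This is just an illustrative check, not a documented Oracle procedure, and mkstore may prompt for the wallet password:

mkstore -wrl /u02/app/oracle/admin/grid/grid_wallet -list     # the output should include the exacs_patch_key alias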
There is a workaround for this issue: keep the custom wallet files in a different location and point the listener.ora to that location. The original wallet then stays at its default location, and the patching process can decrypt the patch files properly.
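A sketch of what that looks like in listener.ora, using a hypothetical /u02/app/oracle/admin/custom_ssl_wallet directory for the custom wallet files:

# listener.ora (and sqlnet.ora, if outbound connections also need SSL)
WALLET_LOCATION =
  (SOURCE =
    (METHOD = FILE)
    (METHOD_DATA =
      (DIRECTORY = /u02/app/oracle/admin/custom_ssl_wallet)
    )
  )
# the default /u02/app/oracle/admin/grid/grid_wallet stays untouched for the patching tooling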
The takeaway from this experience is that Exadata Cloud Service is really sensitive to custom configuration. Whenever patching or an upgrade fails, the first thing to check, apart from updating the tooling, is whether any configuration change is causing the issue.