It started like a routine DR testing request.
The application team wanted a safe environment on the standby side to run validations without impacting the primary. The standby database was part of a 2-node Oracle RAC setup running on Oracle Base Database Service in Oracle Cloud Infrastructure (OCI). By default, the standby was in mount mode, so the usual debate began: switchover or snapshot standby.
Switchover was immediately ruled out. Too heavy for a simple test cycle, too risky given the timing constraints. Snapshot standby looked like the obvious choice.
That's where things should have been straightforward.
But what followed was not a feature issue, not a DR limitation — it was a silent configuration drift that only surfaced when services started disappearing from the listener the moment the PDB was opened.
No ORA errors screaming in logs. No obvious broker failure. Just services registering in mount state and quietly vanishing after activation.
It took two days, multiple toggles between snapshot and ADG mode, and repeated listener inspections before the root cause surfaced: inconsistent remote_listener values between CDB root and PDB levels on the standby site.
A small mismatch. A big operational disruption.
Snapshot Standby and Why It Looks Safer Than It Is
Snapshot standby in Oracle Data Guard is often treated as a “safe testing sandbox.” Under the hood, it is more than a mode switch: the conversion stops redo apply (redo is still received from the primary, just not applied) and opens the standby read-write for testing.
In this case, the conversion was done through Oracle Data Guard Broker because the OCI console did not expose a direct snapshot option.
Broker handled the conversion cleanly. Redo apply was stopped. The standby transitioned as expected.
At database level, everything looked correct:
SELECT database_role, open_mode FROM v$database;
Result: PHYSICAL STANDBY → SNAPSHOT STANDBY
So far, so good.
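On a 2-node cluster, a slightly broader check makes the post-conversion state easier to read. The queries below are a sketch of what such a check might look like, not the exact commands used during the incident. They look at the role on each instance, the open mode of every PDB, and the guaranteed restore point that a snapshot standby conversion creates implicitly.

```sql
-- Role and open mode of the CDB, as seen from each RAC instance.
SELECT inst_id, db_unique_name, database_role, open_mode
FROM   gv$database
ORDER  BY inst_id;

-- Open mode of every PDB on every instance.
SELECT inst_id, con_id, name, open_mode
FROM   gv$pdbs
ORDER  BY inst_id, con_id;

-- Guaranteed restore point created by the conversion; it is what the
-- database is flashed back to when it becomes a physical standby again.
SELECT name, guarantee_flashback_database, time
FROM   v$restore_point;
```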
But RAC introduces another layer most people underestimate — service registration behavior across instances and listeners. And that’s where the problem started hiding.
The First Symptom: Services Register, Then Vanish
When the standby was still in MOUNT state:
- PDB services were visible in the listener
- SCAN listener showed healthy registration
- Instance-level services were stable
The moment the PDB was opened:
- Services disappeared from lsnrctl status
- No crash in alert log
- No ORA- error pointing to service failure
- Application connections failed intermittently
Checking listener manually:
lsnrctl status
And then querying service registration:
SELECT name, network_name, con_id FROM cdb_services ORDER BY con_id;
What made it confusing was that services were not failing to start - they were actively unregistering.
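One way to see this distinction is to compare what the instances consider running with what the listener reports. The query below is a minimal sketch, not taken from the incident notes. If a PDB service appears in gv$active_services on both instances but is missing from lsnrctl status, the service did start and the problem lies in registration.

```sql
-- Services each instance currently considers active, per container.
-- con_id 1 is CDB$ROOT; con_id 3 and above are user PDBs.
SELECT inst_id, con_id, name, network_name
FROM   gv$active_services
ORDER  BY inst_id, con_id, name;
```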
Even more interesting: the same behavior repeated in Active Data Guard mode testing. That immediately ruled out snapshot mode as the root cause.
At that point, the problem was clearly structural, not functional.
The Real Issue: remote_listener Drift Between CDB and PDB
The breakthrough came after comparing parameter values across containers.
SHOW PARAMETER remote_listener;
On primary: clean and consistent.
On standby:
- CDB$ROOT had one value
- PDB had a different inherited or overwritten value
And RAC service registration does not tolerate this inconsistency.
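SHOW PARAMETER only reports the value for the container the session is connected to, so a comparison needs a per-container view. As a sketch, assuming it is run from CDB$ROOT by a common user who can see all containers, gv$system_parameter exposes a con_id column. Rows with con_id 0 typically carry the CDB-wide value, while a row with a PDB's con_id indicates a value set at that PDB's level.

```sql
-- Per-instance, per-container view of the listener parameters.
SELECT inst_id, con_id, name, value, ispdb_modifiable
FROM   gv$system_parameter
WHERE  name IN ('remote_listener', 'local_listener')
ORDER  BY name, inst_id, con_id;

-- Drift check: returns a row only if remote_listener has more than one
-- distinct non-NULL value across instances and containers.
-- COUNT(DISTINCT ...) ignores NULLs, so an unset value next to a set one
-- is not flagged here; the first query covers that case.
SELECT name, COUNT(DISTINCT value) AS distinct_values
FROM   gv$system_parameter
WHERE  name = 'remote_listener'
GROUP  BY name
HAVING COUNT(DISTINCT value) > 1;
```

The drift check is limited to remote_listener on purpose: local_listener normally differs per instance because each node registers with its own local endpoint.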
In Oracle RAC, service registration depends on:
- LOCAL_LISTENER for instance-level registration
- REMOTE_LISTENER for SCAN and cross-node coordination
When remote_listener differs between root and PDB, services behave unpredictably during PDB state transitions.
What actually happens:
- PDB opens
- Service attempts to register using PDB-level parameter
- Listener receives mismatched endpoint context
- Registration gets dropped silently
- Instance remains healthy, but services disappear
No ORA error. Just operational confusion.
Fixing the Drift
The resolution was straightforward once identified, but painful to validate because the system otherwise looked healthy.
We aligned remote_listener across CDB$ROOT and all PDBs:
ALTER SYSTEM SET remote_listener='scan-host:1521' SCOPE=BOTH;
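ALTER SYSTEM acts on the container the session is currently in, so aligning every container means repeating the statement per container. The following is a sketch only: PDB1 is a hypothetical PDB name, 'scan-host:1521' is the placeholder value used above, and whether SCOPE=BOTH is accepted depends on the state of the database and the PDB at the time.

```sql
-- Root container first.
ALTER SESSION SET CONTAINER = CDB$ROOT;
ALTER SYSTEM SET remote_listener = 'scan-host:1521' SCOPE = BOTH;

-- Then each PDB that carries its own value (PDB1 is a hypothetical name).
ALTER SESSION SET CONTAINER = PDB1;
ALTER SYSTEM SET remote_listener = 'scan-host:1521' SCOPE = BOTH;

-- Back to the root for follow-up checks.
ALTER SESSION SET CONTAINER = CDB$ROOT;
```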
Then validated using:
SHOW PARAMETER remote_listener;
and
SELECT inst_id, name, value FROM gv$parameter WHERE name = 'remote_listener';
A listener restart was not required; services re-registered after running:
ALTER SYSTEM REGISTER;
Immediately:
- Services stabilized
- PDB open state no longer triggered deregistration
- Both snapshot and ADG modes behaved consistently
The key takeaway: this was never a snapshot standby issue. It was a configuration divergence introduced during automated standby provisioning in OCI.
Why This Happens in OCI Environments
On platforms like Oracle Base Database Service on OCI, standby creation is automated.
That automation abstracts:
- Listener configuration
- Broker setup
- SCAN integration
- PDB-level parameter inheritance
And this is where subtle drift can occur:
- Primary and standby are not always cloned with identical system-level parameter propagation
- PDB-level parameters may inherit differently depending on provisioning workflow
- Console automation sometimes prioritizes connectivity over strict parameter parity
No manual intervention was done in this case — which actually made diagnosis harder.
Because DBA instinct usually assumes drift = human change. Not always true in cloud-managed provisioning.
Case Study from a Recent Incident
Symptoms: Services disappear from listener after PDB open in snapshot standby.
Initial assumptions:
- Snapshot standby bug
- Data Guard broker issue
- OCI console limitation
Diagnosis path:
- Verified Data Guard role and mode
- Tested Active Data Guard behavior
- Compared listener registration patterns
- Inspected CDB vs PDB parameters
Root cause: Mismatch in remote_listener between CDB$ROOT and PDB on standby system.
Fix: Aligned parameter values across all containers and re-registered services.
Lesson: Service registration issues in RAC are often parameter-driven, not mode-driven.
Conclusion
What looked like a snapshot standby limitation turned out to be a subtle RAC service registration inconsistency amplified by multi-tenant architecture and cloud automation.
In modern Oracle RAC + Data Guard environments, especially on Oracle Cloud Infrastructure, the hardest problems are not failures - they are silent misconfigurations that behave like transient issues.
Snapshot standby is stable. Data Guard is predictable. RAC service registration is deterministic, but only when parameter consistency is maintained across CDB and PDB layers.
Once that breaks, everything else becomes noise.
Some FAQs
Q1. Why did services disappear without ORA errors?
Because the listener rejected the registration due to a parameter mismatch, not because of a database failure.
Q2. Does snapshot standby affect listener behavior?
No. Snapshot mode was not the cause here.
Q3. How do you quickly validate this issue?
SELECT inst_id, name, value FROM gv$parameter WHERE name = 'remote_listener';
Q4. Why did OCI automation contribute to the issue?
Standby provisioning did not enforce strict parity of listener-related parameters across containers.
Q5. What is the first check in similar issues?
Compare remote_listener and local_listener across CDB and PDB immediately.