Snapshot Standby in OCI RAC: When Listener Registration Breaks Quietly

Friday, 31 October 2025


It started as a routine DR testing request.

The application team wanted a safe environment on the standby side to run validations without impacting the primary. The standby database was part of a 2-node Oracle RAC setup running on Oracle Cloud Infrastructure Database Base Service. By default, the standby was in MOUNT mode, so the usual debate began: switchover or snapshot standby?

Switchover was ruled out immediately: too heavy for a simple test cycle and too risky given the timing constraints. Snapshot standby looked like the obvious choice.

That's where things should have been straightforward.

But what followed was not a feature issue or a DR limitation. It was a silent configuration drift that only surfaced when services started disappearing from the listener the moment the PDB was opened.

No ORA errors screaming in logs. No obvious broker failure. Just services registering in mount state and quietly vanishing after activation.

It took two days, multiple toggles between snapshot standby and Active Data Guard (ADG) modes, and repeated listener inspections before the root cause surfaced: inconsistent remote_listener values between the CDB root and PDB levels on the standby site.

A small mismatch. A big operational disruption.

Snapshot Standby and Why It Looks Safer Than It Is

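Snapshot standby in Oracle Data Guard is often treated as a “safe testing sandbox.” Under the hood, it is more than a mode switch: redo apply is stopped, a guaranteed restore point is created, and the standby is opened read-write for testing. Redo from the primary is still received and archived but not applied until the database is converted back, at which point it is flashed back to the restore point and all local changes are discarded.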

In this case, the conversion was done through Oracle Data Guard Broker because the OCI console did not expose a direct snapshot option.

Broker handled the conversion cleanly. Redo apply was stopped. The standby transitioned as expected.
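The lifecycle the broker manages can be summarized as a small state table. This is a simplified, illustrative sketch: the step text paraphrases documented Data Guard behavior, while `TRANSITIONS` and `conversion_steps` are hypothetical names that are not part of any Oracle tool. In DGMGRL itself the conversion is requested with `CONVERT DATABASE '<db_unique_name>' TO SNAPSHOT STANDBY;` (and `TO PHYSICAL STANDBY` to go back).

```python
# Simplified model of the Data Guard snapshot standby lifecycle.
# Step descriptions paraphrase documented behavior; all names are illustrative.
TRANSITIONS = {
    ("PHYSICAL STANDBY", "SNAPSHOT STANDBY"): [
        "stop redo apply",
        "create a guaranteed restore point",
        "open the database read-write",
        "keep receiving and archiving redo from the primary (not applied)",
    ],
    ("SNAPSHOT STANDBY", "PHYSICAL STANDBY"): [
        "flash back to the guaranteed restore point (local changes are discarded)",
        "drop the guaranteed restore point",
        "restart in the physical standby role and resume redo apply",
    ],
}


def conversion_steps(current_role: str, target_role: str) -> list[str]:
    """Return the ordered steps for a role conversion, or raise if unsupported."""
    key = (current_role.strip().upper(), target_role.strip().upper())
    if key not in TRANSITIONS:
        raise ValueError(f"unsupported conversion: {current_role} -> {target_role}")
    return list(TRANSITIONS[key])


if __name__ == "__main__":
    for step in conversion_steps("physical standby", "snapshot standby"):
        print("-", step)
```

The point of modeling it this way is that every step in the table concerns redo and restore points; none of them touches listener or service configuration.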

At database level, everything looked correct:

SELECT database_role, open_mode FROM v$database;

Result: PHYSICAL STANDBY → SNAPSHOT STANDBY
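On a two-node RAC standby it is worth confirming that every instance agrees, for example by running the same query against GV$DATABASE (which adds INST_ID). The sketch below assumes the rows have already been fetched into `(inst_id, database_role, open_mode)` tuples; `instances_not_in_state` and the sample rows are hypothetical.

```python
from typing import Iterable

Row = tuple[int, str, str]  # (inst_id, database_role, open_mode) as fetched from GV$DATABASE


def instances_not_in_state(rows: Iterable[Row], role: str, mode: str) -> list[int]:
    """Return the inst_ids whose role or open mode differs from the expected pair."""
    want = (role.strip().upper(), mode.strip().upper())
    return sorted(
        inst_id
        for inst_id, r, m in rows
        if (r.strip().upper(), m.strip().upper()) != want
    )


rows = [(1, "SNAPSHOT STANDBY", "READ WRITE"), (2, "SNAPSHOT STANDBY", "MOUNTED")]
print(instances_not_in_state(rows, "snapshot standby", "read write"))  # [2]
```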

So far, so good.

But RAC introduces another layer most people underestimate — service registration behavior across instances and listeners. And that’s where the problem started hiding.

The First Symptom: Services Register, Then Vanish

When the standby was still in MOUNT state:

PDB services were visible in the listener
SCAN listener showed healthy registration
Instance-level services were stable

The moment the PDB was opened:

Services disappeared from the lsnrctl status output
No crash in alert log
No ORA- error pointing to service failure
Application connections failed intermittently
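One way to make "services disappeared" concrete is to capture `lsnrctl status` before and after opening the PDB and diff the service lists. A minimal sketch, assuming the standard `Service "<name>" has N instance(s).` summary lines; the helper names, the service `pdb1_svc`, and the instance `STBY1` are hypothetical, and the sample output is abbreviated.

```python
import re

# Matches summary lines such as: Service "pdb1_svc" has 1 instance(s).
SERVICE_LINE = re.compile(r'^Service "([^"]+)" has \d+ instance\(s\)\.', re.MULTILINE)


def registered_services(lsnrctl_output: str) -> set[str]:
    """Extract service names from the Services Summary of `lsnrctl status`."""
    return set(SERVICE_LINE.findall(lsnrctl_output))


def vanished_services(before: str, after: str) -> set[str]:
    """Services present in the first capture but missing from the second."""
    return registered_services(before) - registered_services(after)


BEFORE = """Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "pdb1_svc" has 1 instance(s).
  Instance "STBY1", status READY, has 1 handler(s) for this service...
The command completed successfully"""

AFTER = """Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
The command completed successfully"""

print(sorted(vanished_services(BEFORE, AFTER)))  # ['pdb1_svc']
```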

Checking the listener manually:

lsnrctl status

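Then listing the services defined in each container (CDB_SERVICES shows defined services; GV$ACTIVE_SERVICES shows which ones are actually running on each instance):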

SELECT name, network_name, con_id FROM cdb_services ORDER BY con_id;

What made it confusing was that services were not failing to start; they were actively unregistering.

Even more interesting, the same behavior repeated during Active Data Guard testing, which immediately ruled out snapshot standby mode as the root cause.
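Cross-checking what the database defines against what the listener actually shows narrows the problem to specific containers. A minimal sketch, assuming rows shaped like the `cdb_services` query above and a set of service names already extracted from `lsnrctl status`; `missing_from_listener` and the sample names are hypothetical, and names are compared case-insensitively on the assumption that any `db_domain` suffix has already been normalized.

```python
from typing import Iterable, Optional

ServiceRow = tuple[str, Optional[str], int]  # (name, network_name, con_id)


def missing_from_listener(defined: Iterable[ServiceRow],
                          registered: Iterable[str]) -> list[tuple[int, str]]:
    """Return (con_id, network_name) for defined services the listener does not show.

    Rows with an empty network name are skipped, since there is no name to look
    for in the listener output.
    """
    seen = {name.lower() for name in registered}
    missing = []
    for _name, network_name, con_id in defined:
        if network_name and network_name.lower() not in seen:
            missing.append((con_id, network_name))
    return sorted(missing)


defined = [("SYS$USERS", None, 1), ("pdb1_svc", "pdb1_svc", 3), ("pdb1_ro", "pdb1_ro", 3)]
print(missing_from_listener(defined, {"+ASM", "PDB1_SVC"}))  # [(3, 'pdb1_ro')]
```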

At that point, the problem was clearly structural, not functional.

The Real Issue: remote_listener Drift Between CDB and PDB

The breakthrough came after comparing parameter values across containers.

SHOW PARAMETER remote_listener;

On the primary: clean and consistent.

On the standby:

CDB$ROOT had one value
The PDB had a different value, whether inherited or overwritten

And RAC service registration does not tolerate this inconsistency.
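The comparison that exposed the drift can be scripted once the values have been captured, for example by running `SHOW PARAMETER remote_listener` in each container. A minimal sketch; `normalize_listener`, `find_container_drift`, and the container names are hypothetical, and the normalization (split on commas, trim, lowercase, sort) is an assumption that treats case, order, and whitespace as insignificant. A TNS alias and an explicit address that resolve to the same SCAN would still be reported as drift.

```python
def normalize_listener(value: str) -> tuple[str, ...]:
    """Normalize a listener parameter: split on commas, trim, lowercase, sort."""
    return tuple(sorted(part.strip().lower() for part in value.split(",") if part.strip()))


def find_container_drift(values: dict[str, str], root: str = "CDB$ROOT") -> dict[str, str]:
    """Return containers whose remote_listener differs from the root container's value."""
    if root not in values:
        raise KeyError(f"no value captured for {root}")
    expected = normalize_listener(values[root])
    return {
        con: val
        for con, val in values.items()
        if con != root and normalize_listener(val) != expected
    }


captured = {
    "CDB$ROOT": "scan-host:1521",
    "PDB1": " SCAN-HOST:1521",
    "PDB2": "old-scan-host:1521",
}
print(find_container_drift(captured))  # {'PDB2': 'old-scan-host:1521'}
```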

In Oracle RAC, service registration depends on:

LOCAL_LISTENER for instance-level registration
REMOTE_LISTENER for SCAN and cross-node coordination

When remote_listener differs between root and PDB, services behave unpredictably during PDB state transitions.

What actually happens:

PDB opens
Service attempts to register using PDB-level parameter
Listener receives mismatched endpoint context
Registration gets dropped silently
Instance remains healthy, but services disappear

No ORA error. Just operational confusion.

Fixing the Drift

The resolution was straightforward once identified, but painful to validate because the system otherwise looked healthy.

We aligned remote_listener across CDB$ROOT and all PDBs:

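-- SID='*' applies the value to every RAC instance; scan-host is a placeholder for the standby cluster's SCAN name
ALTER SYSTEM SET remote_listener='scan-host:1521' SCOPE=BOTH SID='*';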

Then validated using:

SHOW PARAMETER remote_listener;

SELECT inst_id, name, value FROM gv$parameter WHERE name = 'remote_listener';
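The GV$PARAMETER output can be checked mechanically for instance-level disagreement. A minimal sketch, assuming the rows have been fetched as `(inst_id, name, value)` tuples; `inconsistent_across_instances` is a hypothetical helper. By default it only checks remote_listener, because local_listener normally differs per node (each instance points at its own node's listener) and would otherwise produce false alarms.

```python
from collections import defaultdict
from typing import Iterable, Optional


def inconsistent_across_instances(
    rows: Iterable[tuple[int, str, Optional[str]]],
    uniform: Iterable[str] = ("remote_listener",),
) -> dict[str, dict[int, Optional[str]]]:
    """Report parameters that should be identical on every instance but are not."""
    wanted = {name.lower() for name in uniform}
    by_name: dict[str, dict[int, Optional[str]]] = defaultdict(dict)
    for inst_id, name, value in rows:
        if name.lower() in wanted:
            by_name[name.lower()][inst_id] = value

    def norm(v: Optional[str]) -> str:
        # Treat NULL as empty and ignore case/surrounding whitespace.
        return (v or "").strip().lower()

    return {
        name: per_inst
        for name, per_inst in by_name.items()
        if len({norm(v) for v in per_inst.values()}) > 1
    }


rows = [
    (1, "remote_listener", "scan-host:1521"),
    (2, "remote_listener", "other-scan:1521"),
    (1, "local_listener", "vip1"),
    (2, "local_listener", "vip2"),
]
print(inconsistent_across_instances(rows))
```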

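A listener restart was not required; services re-registered after forcing registration (ALTER SYSTEM REGISTER acts on the instance where it is run, so on RAC it is run on each instance):

ALTER SYSTEM REGISTER;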
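After forcing registration, a short poll confirms that the expected services actually show up rather than relying on a single `lsnrctl status` glance. A minimal sketch under stated assumptions: `fetch` is any caller-supplied function returning the currently registered service names (for example, one that runs `lsnrctl status` and parses its Service lines), `wait_for_services` is a hypothetical helper, and the 120-second timeout and 5-second interval are arbitrary defaults. `sleep` and `clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Iterable


def wait_for_services(
    fetch: Callable[[], set[str]],
    expected: Iterable[str],
    timeout: float = 120.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every expected service is registered, or give up after `timeout`."""
    wanted = {s.lower() for s in expected}
    deadline = clock() + timeout
    while True:
        registered = {s.lower() for s in fetch()}
        if wanted <= registered:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Polling with a deadline matters here because the failure mode in this incident was services registering and then quietly dropping away; a check that passes once and stops looking could miss it.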

Immediately:

Services stabilized
PDB open state no longer triggered deregistration
Both snapshot and ADG modes behaved consistently

The key takeaway: this was never a snapshot standby issue. It was a configuration divergence introduced during automated standby provisioning in OCI.

Why This Happens in OCI Environments

On platforms like Oracle Cloud Infrastructure Database Base Service, standby creation is automated.

That automation abstracts:

Listener configuration
Broker setup
SCAN integration
PDB-level parameter inheritance

And this is where subtle drift can occur:

System-level parameters are not always propagated identically from primary to standby
PDB-level parameters may inherit differently depending on provisioning workflow
Console automation sometimes prioritizes connectivity over strict parameter parity
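A periodic parity report between sites helps catch provisioning drift before a DR test does. A minimal sketch, assuming parameter name/value pairs have been exported from each site (it can be run per container as well); `parity_report` is hypothetical and the `SITE_SPECIFIC` ignore list is an assumption to adjust. remote_listener is on that list because each site normally has its own SCAN, which is exactly why the CDB-versus-PDB drift in this incident has to be checked within the standby rather than across sites.

```python
from typing import Iterable, Optional

# Parameters normally expected to differ between sites (assumed list; adjust as needed).
SITE_SPECIFIC = frozenset({"db_unique_name", "fal_server", "local_listener", "remote_listener"})


def parity_report(
    primary: dict[str, str],
    standby: dict[str, str],
    ignore: Iterable[str] = SITE_SPECIFIC,
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {parameter: (primary_value, standby_value)} for unexpected differences.

    A parameter present on only one side is reported with None for the other side.
    Values are compared verbatim.
    """
    skip = {name.lower() for name in ignore}
    p = {k.lower(): v for k, v in primary.items()}
    s = {k.lower(): v for k, v in standby.items()}
    report = {}
    for name in sorted((p.keys() | s.keys()) - skip):
        if p.get(name) != s.get(name):
            report[name] = (p.get(name), s.get(name))
    return report


primary = {"db_unique_name": "prod_a", "open_cursors": "300",
           "remote_listener": "prod-scan:1521", "processes": "800"}
standby = {"db_unique_name": "prod_b", "open_cursors": "1000",
           "remote_listener": "dr-scan:1521"}
print(parity_report(primary, standby))
```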

No manual intervention had been made in this case, which actually made diagnosis harder.

DBA instinct usually equates drift with human change. In cloud-managed provisioning, that is not always true.

Case Study from a Recent Incident

Symptoms: Services disappear from listener after PDB open in snapshot standby.

Initial assumptions:

Snapshot standby bug
Data Guard broker issue
OCI console limitation

Diagnosis path:

Verified Data Guard role and mode
Tested Active Data Guard behavior
Compared listener registration patterns
Inspected CDB vs PDB parameters

Root cause: mismatch in remote_listener between CDB$ROOT and the PDB on the standby system.

Fix: Aligned parameter values across all containers and re-registered services.

Lesson: Service registration issues in RAC are often parameter-driven, not mode-driven.

Conclusion

What looked like a snapshot standby limitation turned out to be a subtle RAC service registration inconsistency amplified by multi-tenant architecture and cloud automation.

In modern Oracle RAC + Data Guard environments, especially on Oracle Cloud Infrastructure, the hardest problems are not outright failures; they are silent misconfigurations that behave like transient issues.

Snapshot standby is stable. Data Guard is predictable. RAC service registration is deterministic, but only when parameter consistency is maintained across CDB and PDB layers.

Once that breaks, everything else becomes noise.

Some FAQs

Q1. Why did services disappear without ORA errors?
Because the listener rejected the registrations due to the parameter mismatch; the database itself never failed.

Q2. Does snapshot standby affect listener behavior?
No. Snapshot mode was not the cause here.

Q3. How do you quickly validate this issue?

SELECT inst_id, name, value
FROM gv$parameter
WHERE name = 'remote_listener';

Q4. Why did OCI automation contribute to the issue?
Standby provisioning did not enforce strict parity of listener-related parameters across containers.

Q5. What is the first check in similar issues?
Compare remote_listener and local_listener across CDB$ROOT and every PDB immediately.

Learn DBA : A Life Long Learning Experience
