Sunday, 26 October 2025

Why Oracle Execution Plans Suddenly Change

 A query that normally runs in under a second suddenly starts taking 20 seconds after midnight maintenance. CPU spikes. Application teams report slowness. Storage looks healthy. Nothing obvious changed in the SQL itself.

In many Oracle production incidents, the real culprit is not the query. It is the execution plan.

Oracle's optimizer constantly evaluates how SQL should execute based on statistics, metadata, bind values, memory conditions, and optimizer behaviour. The same SQL can behave very differently depending on what Oracle believes is the cheapest execution path at parse time.

That intelligence is one of Oracle's biggest strengths at enterprise scale. But it also means execution plans can shift unexpectedly if DBAs are not monitoring optimizer behavior carefully.

This article explains the most common reasons execution plans change, what usually breaks in production, and how experienced DBAs detect plan instability before it becomes an outage.


1. Statistics Changes

The most common trigger is fresh optimizer statistics.

After DBMS_STATS runs, Oracle may suddenly decide that a different access path is cheaper.

For example:

EXEC DBMS_STATS.GATHER_TABLE_STATS(
ownname=>'APP',
tabname=>'ORDERS',
cascade=>TRUE);
/

Looks harmless. But if table growth, skewed data, or histograms changed significantly, the optimizer may stop using indexes and switch to full table scans or hash joins.

On OLTP systems, that can become painful very quickly.

I have seen systems where nightly stats collection caused:

  • TEMP growth spikes
  • sudden CPU increase
  • excessive buffer cache churn
  • longer RAC interconnect waits
  • application timeout storms

The query itself never changed. Oracle simply recalculated the cost differently.

One thing many DBAs miss is that elapsed time alone is not enough when validating plan changes. Buffer gets usually expose the problem earlier.

SELECT sql_id,
       plan_hash_value,
       executions,
       buffer_gets/executions avg_gets
FROM v$sql
WHERE sql_id='&sql_id';

If logical reads suddenly jump 20x after stats collection, the optimizer likely changed strategy.


2. Bind Peeking and Data Skew

Bind variables improve scalability, but they also create unstable execution behavior when data distribution is uneven.

Oracle performs bind peeking during hard parse. The optimizer looks at the first bind value and chooses a plan accordingly.

That becomes dangerous when bind values vary heavily.

Consider this:

SELECT * FROM transactions WHERE region_id=:b1;

If the first execution returns only 5 rows, Oracle may choose an index range scan.

Later, another execution using a region containing millions of rows may still reuse the same index-based plan. Now the database performs massive random IO and starts flooding the buffer cache.

This is one of the most misunderstood performance problems in Oracle environments.

DBAs usually notice symptoms like:

  • inconsistent query runtime
  • multiple child cursors
  • sudden CPU spikes
  • high db file sequential read waits

A quick check:

SELECT sql_id,
       child_number,
       plan_hash_value,
       is_bind_sensitive
FROM v$sql
WHERE sql_id='&sql_id';
If child cursors keep growing, bind sensitivity is often involved.


3. Schema Changes and New Indexes

Adding an index does not always improve performance.

In fact, many execution plan regressions start immediately after deployment weekends because Oracle suddenly prefers a newly created index.

The optimizer may become overly optimistic and choose index access paths that generate excessive table lookups.

This becomes especially ugly on RAC systems where bad index access creates additional global cache traffic between nodes.

Partition changes can also affect execution plans badly.

After:

  • partition splits
  • interval partition growth
  • global index rebuilds
  • partition exchanges

Oracle may stop pruning partitions efficiently.

That usually turns a small targeted scan into a full partition sweep across terabytes of data.


4. Optimizer Changes After Patching or Upgrades

This one catches even experienced DBAs.

The database upgrade succeeds. Everything appears healthy initially. Then application performance slowly starts degrading over the next few days.

Very often the root cause is optimizer behavior changes.

One parameter always worth checking:

SHOW PARAMETER optimizer_features_enable;

Oracle continuously improves optimizer intelligence between releases. Most of the time this helps performance enormously. But older applications sometimes depend heavily on legacy optimizer behavior.

After upgrades, common symptoms include:

  • join order changes
  • nested loops becoming hash joins
  • adaptive plan shifts
  • different cardinality estimates
  • unstable execution behavior between RAC nodes

The difficult part is that the failures are often intermittent. Certain bind values work perfectly while others suddenly become extremely expensive.

Those are some of the hardest Oracle incidents to troubleshoot because infrastructure itself appears healthy.


Monitoring Execution Plan Changes

Most monitoring systems focus only on CPU, memory, or storage metrics.

That is not enough.

Execution plan volatility deserves dedicated monitoring.

AWR is usually the fastest way to identify historical plan changes:

SELECT sql_id,
       plan_hash_value,
       snap_id,
       elapsed_time_delta
FROM dba_hist_sqlstat
WHERE sql_id='&sql_id'
ORDER BY snap_id;

This immediately shows:

  • when plans changed
  • whether performance regressed
  • how frequently the optimizer shifted plans

Monitoring plan_hash_value changes proactively can save hours during production incidents.


SQL Plan Baselines

For highly critical SQL, SQL Plan Baselines can stabilize execution behavior.

DECLARE
    l_plans_loaded PLS_INTEGER;
        BEGIN
            l_plans_loaded := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
            sql_id=>'&sql_id');
        END;
/

Baselines are extremely useful for:

  • unstable vendor SQL
  • high-frequency OLTP queries
  • post-upgrade stabilization
  • critical billing workloads

But they are not magic.

Locking a bad plan permanently is just as dangerous as uncontrolled optimizer changes. Plan management still requires proper validation and performance testing.


DBA Insights

A few operational realities many teams learn too late:

Auto statistics gathering is not always safe on large volatile systems.

A theoretically better plan is not always operationally safer. Most enterprise monitoring completely misses execution plan instability until users report slowness.

And finally, many DR tests validate backups successfully but never validate SQL performance after recovery. Different optimizer states between environments can produce very different runtime behavior.

Stable execution plans are often what separate a healthy Oracle system from a midnight escalation bridge.


Conclusion

Execution plan changes are one of the biggest hidden causes of sudden Oracle performance degradation.

The optimizer is incredibly powerful, especially on large enterprise workloads. Oracle continuously adapts execution strategies based on changing workload conditions, statistics, and metadata. That flexibility is one of the reasons Oracle scales so effectively across OLTP, RAC, Exadata, and large DW environments.

But optimizer intelligence also requires disciplined DBA oversight.

Good Oracle DBAs do not just monitor infrastructure. They monitor execution plan stability, cardinality behavior, and optimizer drift before small regressions become production outages.



No comments:

Post a Comment