Sunday 12 August 2018

MRP terminated with ORA-00600: internal error code, arguments: [3020] | For Standby database


Today, one of our databases was lagging: the MRP process on the physical standby kept terminating with the internal error ORA-00600, arguments: [3020].

I checked the standby database and found that the gap was increasing rapidly.

SQL:hostname_standby01:(MYPROD):PHYSICAL STANDBY> SELECT ARCH.THREAD# "Thread", ARCH.SEQUENCE# "Last Sequence Received", APPL.SEQUENCE# "Last Sequence Applied", (ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference"
  2  FROM (SELECT THREAD# ,SEQUENCE# FROM V$ARCHIVED_LOG WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#)) ARCH, (SELECT THREAD# ,SEQUENCE# FROM V$LOG_HISTORY WHERE (THREAD#,FIRST_TIME )
  3  IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$LOG_HISTORY GROUP BY THREAD#)) APPL WHERE ARCH.THREAD# = APPL.THREAD# ;

 

    Thread Last Sequence Received Last Sequence Applied Difference
---------- ---------------------- --------------------- ----------
         1                  18223                 17969        254

 

SQL:hostname_standby01:(MYPROD):PHYSICAL STANDBY>



I wanted to know what went wrong and why the MRP process kept terminating, so I went through the alert log and found the details below.


hostname_standby01(oracle):MYPROD:trace$ tail -400f alert_MYPROD.log



Errors in file /app/ora/local/admin/MYPROD/diag/rdbms/myprod_hostname_129/MYPROD/trace/MYPROD_pr0s_3151989.trc:

ORA-00600: internal error code, arguments: [3020], [2], [16431], [8405039], [], [], [], [], [], [], [], []

ORA-10567: Redo is inconsistent with data block (file# 2, block# 16431, file offset is 134602752 bytes)
ORA-10564: tablespace SYSAUX

ORA-01110: data file 2: '+DATA01/myprod_hostname_129/datafile/sysaux.256.914736089'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 6478
Errors in file /app/ora/local/admin/MYPROD/diag/rdbms/myprod_hostname_129/MYPROD/trace/MYPROD_mrp0_3151683.trc  (incident=17881):
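
The ORA-10561 line gives a data object#, and ORA-10567 gives the file# and block#. As a rough sketch, these can be mapped to a segment name with the dictionary views below. I am assuming the queries are run on the primary (the DBA_ views are not available on a mounted standby), and the literal values are simply the ones from the alert log above.

```sql
-- Which object does data object# 6478 (from ORA-10561) belong to?
SELECT owner, object_name, subobject_name, object_type
FROM   dba_objects
WHERE  data_object_id = 6478;

-- Cross-check using file# 2 and block# 16431 (from ORA-10567)
SELECT owner, segment_name, segment_type, tablespace_name
FROM   dba_extents
WHERE  file_id = 2
AND    16431 BETWEEN block_id AND block_id + blocks - 1;
```

Knowing the segment is useful mainly for the record and for any service request; the fix that follows works at datafile level regardless of the object.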


Log in to the primary database and back up the affected datafile (file# 2, as reported by ORA-01110). We will then restore that datafile on the standby database.



RMAN> backup format '/db/dump01/backup_stdby/sysaux.256.914736089' datafile 2 ;


Starting backup at 19-AUG-17
using target database control file instead of recovery catalog

allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=156 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00002 name=+DATA01/myprod_hostname_129/datafile/sysaux.257.914670317
channel ORA_DISK_1: starting piece 1 at 19-AUG-17
channel ORA_DISK_1: finished piece 1 at 19-AUG-17
piece handle=/db/files/backup_stdby/sysaux.256.914736089 tag=TAG20170219T103456 comment=NONE

channel ORA_DISK_1: backup set complete, elapsed time: 00:00:07

Finished backup at 19-AUG-17
Starting Control File and SPFILE Autobackup at 19-AUG-17

piece handle=/app/ora/local/admin/MYPROD/files/PRIMARY_MYPROD_c-218898855-20170219-01.ctl comment=NONE

Finished Control File and SPFILE Autobackup at 19-AUG-17

RMAN> exit


Now transfer the backup piece to the standby server and perform the restore:

Once the file has been copied to the standby server, log in to the standby database and restore the datafile to remediate the issue.
First, catalog the backup piece using RMAN on the standby database.


hostname_ standby01 (oracle):MYPROD:backup_stdby$ rman target /

RMAN> catalog start with '/db/files/backup_stdby' ;

using target database control file instead of recovery catalog

searching for all files that match the pattern /db/files/backup_stdby

List of Files Unknown to the Database

=====================================

File Name: /db/files/backup_stdby/sysaux.256.914736089

Do you really want to catalog the above files (enter YES or NO)? YES

cataloging files...
cataloging done

List of Cataloged Files

=======================

File Name: /db/files/backup_stdby/sysaux.256.914736089

RMAN> exit



SQL: hostname_ standby01:(MYPROD):PHYSICAL STANDBY> shut immediate ;

ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.

SQL: hostname_ standby01:(MYPROD):PHYSICAL STANDBY> startup mount;

ORACLE instance started.

Total System Global Area 1068937216 bytes

Fixed Size                  2235208 bytes
Variable Size             494929080 bytes
Database Buffers          566231040 bytes
Redo Buffers                5541888 bytes
Database mounted.

SQL: hostname_ standby01:(MYPROD):PHYSICAL STANDBY> !rman target /

Recovery Manager: Release 11.2.0.3.0 - Production on Sun AUG 19 10:46:20 2017

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: MYPROD(DBID=218895632, not open)

RMAN> restore datafile 2 ;

Starting restore at 19-AUG-17
using target database control file instead of recovery catalog

allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=78 device type=DISK
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00002 to +DATA01/myprod_files/datafile/sysaux.256.914736089
channel ORA_DISK_1: reading from backup piece /db/dump01/backup_stdby/sysaux.256.914736089
channel ORA_DISK_1: piece handle=/db/files/backup_stdby/sysaux.256.914736089 tag=TAG20170219T103456
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:01

Finished restore at 19-AUG-17



RMAN> exit

Recovery Manager complete.


Once the RMAN restore has completed, restart the MRP process and check its behaviour.




SQL: hostname_ standby01:(MYPROD):PHYSICAL STANDBY> alter database recover managed standby database cancel ;

Database altered.

SQL: hostname_ standby01:(MYPROD):PHYSICAL STANDBY> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT;

Database altered.

Check whether MRP is running now. All looks good! 😊



SQL: hostname_ standby01:(MYPROD):PHYSICAL STANDBY>  !ps -ef|grep mrp

oracle   3500966       1  0 10:47 ?        00:00:00 ora_mrp0_MYPROD

oracle   3501928 3456846  0 10:48 pts/10   00:00:00 /bin/ksh -c ps -ef|grep mrp

oracle   3501930 3501928  0 10:48 pts/10   00:00:00 grep mrp
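
The same check can also be made from inside the database. A minimal sketch, assuming an 11.2 standby like this one, where V$MANAGED_STANDBY is available (later releases deprecate it in favour of V$DATAGUARD_PROCESS):

```sql
-- Run on the standby: shows the MRP process, what it is doing
-- and which sequence it is working on
SELECT process, status, thread#, sequence#, block#
FROM   v$managed_standby
WHERE  process LIKE 'MRP%';
```

A STATUS of APPLYING_LOG or WAIT_FOR_LOG means recovery is moving; no rows at all means MRP is not running.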


Check whether the lag is reducing and the standby is getting in sync with the primary database:


SQL: hostname_ standby01:( MYPROD):PRIMARY> archive log list ;

Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            /app/ora/local/admin/myprod/arch1
Oldest online log sequence     18233
Next log sequence to archive   18235                
Current log sequence           18235                


SQL: hostname_ standby01:( MYPROD):PHYSICAL STANDBY>

SELECT ARCH.THREAD# "Thread", ARCH.SEQUENCE# "Last Sequence Received", APPL.SEQUENCE# "Last Sequence Applied", (ARCH.SEQUENCE# - APPL.SEQUENCE#) "Difference"

FROM (SELECT THREAD# ,SEQUENCE# FROM V$ARCHIVED_LOG WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$ARCHIVED_LOG GROUP BY THREAD#)) ARCH, (SELECT THREAD# ,SEQUENCE# FROM V$LOG_HISTORY WHERE (THREAD#,FIRST_TIME )

IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V$LOG_HISTORY GROUP BY THREAD#)) APPL WHERE ARCH.THREAD# = APPL.THREAD# ;



SQL:xstm6551bor:( MYPROD):PHYSICAL STANDBY> /

    Thread Last Sequence Received Last Sequence Applied Difference

---------- ---------------------- --------------------- ----------

         1                  18234                 18215         19



SQL:hostname_standby01:( MYPROD):PHYSICAL STANDBY> /

    Thread Last Sequence Received Last Sequence Applied Difference

---------- ---------------------- --------------------- ----------

         1                  18234                 18228          6



SQL:hostname_standby01:( MYPROD):PHYSICAL STANDBY> /

    Thread Last Sequence Received Last Sequence Applied Difference

---------- ---------------------- --------------------- ----------

         1                  18234                 18234          0



SQL: hostname_standby01:( MYPROD):PHYSICAL STANDBY>
SQL: hostname_ standby01:(MYPROD):PHYSICAL STANDBY>




The standby is now in sync with the primary database.
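
As a complement to counting sequences, the lag can also be read as a time interval. This is only a sketch, assuming V$DATAGUARD_STATS is populated on this standby (it is queried on the standby, not on the primary):

```sql
-- Run on the standby: transport and apply lag as day-to-second intervals
SELECT name, value, unit, time_computed
FROM   v$dataguard_stats
WHERE  name IN ('transport lag', 'apply lag');
```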



"Do something (anything). If you don't do anything, you won't get anywhere. Make it your hobby, not a chore, but above all have fun!"




Sunday 10 June 2018

Troubleshooting Issues with Undo Tablespace


The most commonly seen problems with the undo tablespace are the following errors:

• ORA-01555: snapshot too old
• ORA-30036: unable to extend segment by ... in undo tablespace 'UNDO1'

These errors can be caused by many different issues, such as incorrect sizing of the undo tablespace or poorly written SQL or PL/SQL code.

Causes :


Frequent commits can be the cause of ORA-01555. It is all about read consistency. When you start a query, Oracle fixes the point in time the query must see, so that its result is not altered by DML that takes place in the meantime (your big transaction). To return values as they were at that point, Oracle reads the before images of changed data from the rollback (undo) segments. By committing in the middle of your big transaction you tell Oracle that the rollback data of that transaction can be overwritten. If your query then needs data from the rollback segments that has been overwritten, you get this error. The less often you commit, the smaller the chance that the rollback data you need has been overwritten. Typically this occurs when users execute PL/SQL procedures whose code commits inside a cursor loop.

Actions :


    1. Check if Undo Is Correctly Sized:

The query below checks for issues that have occurred within the last day:

select to_char(begin_time,'MM-DD-YYYY HH24:MI') begin_time
,ssolderrcnt ORA_01555_cnt, nospaceerrcnt no_space_cnt
,txncount max_num_txns, maxquerylen max_query_len
,expiredblks blck_in_expired
from v$undostat where begin_time > sysdate - 1 order by begin_time; 

Output :

BEGIN_TIME         ORA_01555_CNT   NO_SPACE_CNT   MAX_NUM_TXNS   BLCK_IN_EXPIRED
----------------   -------------   ------------   ------------   ---------------
06-10-2018 14:52               0              0             42                 0
02-10-2018 07:24               0              0              0                 0


If the ORA_01555_CNT column reports a non-zero value, the database hit ORA-01555 during that interval and you need to take one or more of the actions below.

The most effective one is usually to increase the UNDO_RETENTION initialization parameter.

2. Other resolutions that can be applied:

• Commit less often; ideally commit only once, at the end of the transaction.
• Ensure that code does not contain COMMIT statements within cursor loops.
• Reschedule long-running queries to off-peak hours, when the system has less DML load.
• Identify the SQL statements that consume the most undo, and tune the statements that throw the errors.
• Finally, you may add extra rollback segments (undo logs) to make more transaction slots available.
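
To make the point about commits inside cursor loops concrete, here is a small PL/SQL sketch. The table orders(order_id, status) is hypothetical and exists only for illustration; the pattern is what matters.

```sql
-- Pattern to avoid: the COMMIT inside the loop lets Oracle overwrite undo
-- that the still-open cursor may need for read consistency
-- (the classic "fetch across commit").
BEGIN
  FOR r IN (SELECT order_id FROM orders WHERE status = 'NEW') LOOP
    UPDATE orders SET status = 'PROCESSED' WHERE order_id = r.order_id;
    COMMIT;
  END LOOP;
END;
/

-- Safer: do all the work first and commit once at the end.
BEGIN
  FOR r IN (SELECT order_id FROM orders WHERE status = 'NEW') LOOP
    UPDATE orders SET status = 'PROCESSED' WHERE order_id = r.order_id;
  END LOOP;
  COMMIT;
END;
/

-- Often simplest: a single set-based statement, no loop at all.
UPDATE orders SET status = 'PROCESSED' WHERE status = 'NEW';
COMMIT;
```

The single-commit versions need more undo space for the duration of the transaction, so they go hand in hand with a correctly sized undo tablespace.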


NOTE : A maximum of 4 days’ worth of information is stored in the V$UNDOSTAT view. The statistics are gathered every 10 minutes, for a maximum of 576 rows in the table. If you’ve stopped and started your database within the last 4 days, this view will only contain information from the time you last started your database.


The following query displays the current undo size and the size the Undo Advisor recommends for a given retention period, passed in seconds:

select sum(bytes)/1024/1024 cur_mb_size,
dbms_undo_adv.required_undo_size(900) req_mb_size
from dba_data_files
where tablespace_name = (select
upper(value) from v$parameter where name = 'undo_tablespace');



Output:

CUR_MB_SIZE   REQ_MB_SIZE
-----------   -----------

51200         35840


The output shows that the undo tablespace currently has 50 GB allocated to it.
In the prior query, you used 900 seconds as the amount of time to retain information in the undo tablespace. To retain undo information for 900 seconds, the Oracle Undo Advisor estimates that the undo tablespace should be around 35 GB. In this example the undo tablespace is sized adequately. If it were not, you would have to either add space to an existing data file or add a data file to the undo tablespace.
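
If the tablespace did turn out to be too small, the two options just mentioned could look roughly like this. The tablespace name UNDO1 is taken from the ORA-30036 example earlier; the file paths and sizes are hypothetical and must be replaced with your own.

```sql
-- List the data files of the undo tablespace and their current size
SELECT file_name, bytes/1024/1024 AS size_mb, autoextensible
FROM   dba_data_files
WHERE  tablespace_name = 'UNDO1';

-- Option 1: add space to an existing data file (hypothetical path and size)
ALTER DATABASE DATAFILE '/u01/oradata/MYPROD/undo1_01.dbf' RESIZE 60G;

-- Option 2: add a data file to the undo tablespace (hypothetical path and size)
ALTER TABLESPACE undo1 ADD DATAFILE '/u01/oradata/MYPROD/undo1_02.dbf' SIZE 10G;
```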

Here is a handy query from the Akadia site that returns the current undo retention and the optimal undo retention:

SELECT d.undo_size/(1024*1024) "ACTUAL UNDO SIZE [MByte]",
       SUBSTR(e.value,1,25) "UNDO RETENTION [Sec]",
       ROUND((d.undo_size / (to_number(f.value) *
       g.undo_block_per_sec))) "OPTIMAL UNDO RETENTION [Sec]"
  FROM ( SELECT SUM(a.bytes) undo_size FROM v$datafile a,
  v$tablespace b, dba_tablespaces c
         WHERE c.contents = 'UNDO' AND c.status = 'ONLINE' AND 
b.name = c.tablespace_name AND a.ts# = b.ts#) d,
v$parameter e, v$parameter f, (
SELECT MAX(undoblks/((end_time-begin_time)*3600*24)) undo_block_per_sec FROM v$undostat) g
WHERE e.name = 'undo_retention' AND f.name = 'db_block_size'
/


Output :

ACTUAL UNDO SIZE [MByte]
------------------------
51200

UNDO RETENTION [Sec]
--------------------
10800

OPTIMAL UNDO RETENTION [Sec]
----------------------------

14580
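
In the output above the optimal retention (14580 seconds) is higher than the configured one (10800 seconds), so the tablespace has room for a longer retention. If ORA-01555 errors persist, the parameter can be raised online. A minimal sketch, using the figure from this output purely as an illustration:

```sql
-- Current undo settings
SELECT name, value
FROM   v$parameter
WHERE  name IN ('undo_management', 'undo_tablespace', 'undo_retention');

-- Raise the retention target (illustrative value, in seconds)
ALTER SYSTEM SET undo_retention = 14580;
```

Keep in mind that UNDO_RETENTION is a target: unless the undo tablespace has RETENTION GUARANTEE set, unexpired undo can still be overwritten when space runs short.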


Use the v$session and v$transaction views to find the sessions that are consuming undo segments:


select s.sid, s.serial#, s.osuser, s.logon_time ,s.status, s.machine
,t.used_ublk, t.used_ublk*16384/1024/1024 undo_usage_mb
from v$session s ,v$transaction t where t.addr = s.taddr;
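
Note that the query above hard-codes 16384, which is only right for a 16 KB block size. A variant that reads the block size from the instance instead might look like the sketch below; I am assuming here that the undo tablespace uses the database default block size.

```sql
-- Undo usage per session, with the block size taken from db_block_size
SELECT s.sid, s.serial#, s.username, s.status, t.used_ublk,
       ROUND(t.used_ublk * p.block_size / 1024 / 1024, 2) AS undo_usage_mb
FROM   v$session s
       JOIN v$transaction t ON t.addr = s.taddr
       CROSS JOIN (SELECT TO_NUMBER(value) AS block_size
                   FROM   v$parameter
                   WHERE  name = 'db_block_size') p
ORDER  BY t.used_ublk DESC;
```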




You can also join to the v$sql view to get the SQL statement associated with a session that is consuming undo space:

select s.sid, s.serial#, s.osuser, s.logon_time, s.status ,s.machine, t.used_ublk ,
t.used_ublk*16384/1024/1024 undo_usage_mb ,q.sql_text from v$session s,
v$transaction t ,v$sql q where t.addr = s.taddr and s.sql_id = q.sql_id;







Monday 26 March 2018

Wait Events : checkpoint busy waits or archiver busy waits



When such wait events show up in the alert log, you should consider tuning the archiver.


1. Check the number of online redo log groups and members, and the size of the online redo logs.
Increasing the size and the number of online redo log groups gives the archiver more time to catch up. However, adding more online logs does not help when the archiver simply cannot keep up with the LGWR process.
It does help when redo is generated in bursts, since it gives ARCH more time to average out its processing rate.


2. In such cases you can add multiple archiver (ARCn) processes.
It may be required to create scripts that run 'alter system archive log all' at some fixed interval. The processes this spawns assist the archiver in archiving any unarchived logs in that thread of redo. Once that is completed, the temporary processes go away.


3. Evaluate checkpoint interval and frequency
Several actions are possible, including adding DBWR processes, increasing db_block_checkpoint_batch and reducing db_block_buffers. Turning on or allowing async I/O capabilities definitely helps alleviate most DBWR inefficiencies.

4. Check OS support for asynchronous I/O
Async reads should help tremendously. Async writes may help if the OS supports asynchronous I/O on file systems.
Check with your vendor whether the current version of your operating system supports async I/O to file systems (ufs).

5. Check for system or I/O contention.
Check CPU waits and usage, and disk-level bottlenecks. Also check your operating system manuals for the appropriate commands to monitor system performance.
For example, you can use UNIX commands such as "sar 5 5", "sar -d" or "iostat" to identify disk bottlenecks.
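
On the database side, the checks from steps 1 to 4 can be sketched with a handful of queries. These are only a starting point: the value 4 for log_archive_max_processes is an illustrative assumption, not a recommendation, and the parameters listed at the end are the current-release counterparts I would look at for DBWR and async I/O.

```sql
-- Step 1: size, member count and state of each online redo log group
SELECT group#, thread#, bytes/1024/1024 AS size_mb, members, archived, status
FROM   v$log
ORDER  BY thread#, group#;

-- Step 1: log switches per hour over the last day (bursts show up as spikes)
SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS switch_hour,
       COUNT(*)                               AS log_switches
FROM   v$log_history
WHERE  first_time > SYSDATE - 1
GROUP  BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER  BY switch_hour;

-- Step 2: archiver processes that are currently active
SELECT process, status, log_sequence, state
FROM   v$archive_processes
WHERE  status = 'ACTIVE';

-- Step 2: archive all full, unarchived online logs now,
-- and allow more archiver processes (illustrative value)
ALTER SYSTEM ARCHIVE LOG ALL;
ALTER SYSTEM SET log_archive_max_processes = 4;

-- Step 3: how often sessions waited on log switches
-- (checkpoint incomplete / archiving needed)
SELECT event, total_waits, time_waited
FROM   v$system_event
WHERE  event LIKE 'log file switch%';

-- Steps 3 and 4: DBWR and async I/O related settings
SELECT name, value
FROM   v$parameter
WHERE  name IN ('db_writer_processes', 'disk_asynch_io', 'filesystemio_options');
```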




Sunday 11 March 2018

Accessing a schema without knowing the password



Quite often you need to log on as a database user / schema owner to do some emergency maintenance, but you don't know the password.

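One alternative is to switch the default schema of your own session with alter session set current_schema = <schema_name>; This only changes how unqualified object names are resolved; you keep your own privileges.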
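
A minimal sketch of that alternative, using the NIKHIL schema from the example further down as a placeholder:

```sql
-- Resolve unqualified object names against NIKHIL from now on
ALTER SESSION SET CURRENT_SCHEMA = nikhil;

-- Confirm: the current schema changes, the logged-on user does not
SELECT SYS_CONTEXT('USERENV', 'CURRENT_SCHEMA') AS current_schema,
       SYS_CONTEXT('USERENV', 'SESSION_USER')   AS logged_on_as
FROM   dual;
```

This is enough when you only need to run DDL or DML against the schema's objects with your own DBA privileges; it does not help when something must genuinely run as that user.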

Other than this, you can use the process below: record the current password hash, change the password, log on, do your maintenance, and then restore the original hash.

Let’s try an example here
      

Create an account:

SQL> conn / as sysdba

Connected.

 

SQL>  create user nikhil identified by mypass1 ;

User created.

SQL> grant connect , resource to nikhil ;

Grant succeeded.

 




Now, if I want to change the password as this user, I have to know the old password:


SQL> conn nikhil/mypass1 ;
Connected.


SQL> password 

Changing password for NIKHIL

Old password:

Oops..!



In such cases you can query the sys.user$ table as the SYS user. It gives us the stored password hash, so that we can preserve the old password and restore it later.
Let's see.


SQL> conn / as sysdba

Connected.



SQL> select name,'alter user '||name||' identified by values '''||spare4||';'||password||''';' command from sys.user$ where name = 'NIKHIL';



NAME

------------------------------

COMMAND

--------------------------------------------------------------------------------

NIKHIL

alter user NIKHIL identified by values 'S:2549CDA4335FCEF7814FD9832AD653A937AAE8

7105AB962A873A59A576E8;FD135DE4875002BA';


Now I will set a temporary password for the account to perform the activity. Later I will set the old password back using these hash values.


SQL> alter user nikhil identified by demopassword ;

User altered.


SQL> conn nikhil/demopassword ;

Connected.



Now connect as sysdba and revert the password using the saved hash values:


SQL> conn / as sysdba

Connected.



SQL> alter user NIKHIL identified by values 'S:2549CDA4335FCEF7814FD9832AD653A937AAE87105AB962A873A59A576E8;FD135DE4875002BA';

User altered.



SQL> conn nikhil/mypass1 ;

Connected.

SQL>

Alternatively, we can use the DBMS_METADATA package to get the password hash:



SQL> set long 10000

SQL> select dbms_metadata.get_ddl('USER','NIKHIL') command from dual;



COMMAND

--------------------------------------------------------------------------------



   CREATE USER "NIKHIL" IDENTIFIED BY VALUES 'S:2549CDA4335FCEF7814FD9832AD653A9

37AAE87105AB962A873A59A576E8;FD135DE4875002BA'

      DEFAULT TABLESPACE "USERS"

      TEMPORARY TABLESPACE "TEMP"