root.sh fails with "ORA-12547: TNS:lost contact" when installing 11.2.0.4 RAC on SUSE 12.4

Posted by lhrbest on 2020-05-07


Installing 11.2.0.4 RAC on SUSE 12.4: root.sh fails with "ORA-12547: TNS:lost contact" (__lll_unlock_elision)


Operating system: SLES_SAP-release-12.4-1.131.x86_64



Running root.sh on node 1 fails: /u01/app/11.2.0/grid/root.sh

Performing root user operation for Oracle 11g 
The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/11.2.0/grid
   Copying dbhome to /usr/local/bin ...
   Copying oraenv to /usr/local/bin ...
   Copying coraenv to /usr/local/bin ...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
Adding Clusterware entries to inittab
CRS-2672: Attempting to start 'ora.mdnsd' on 'v3erpzgd01'
CRS-2676: Start of 'ora.mdnsd' on 'v3erpzgd01' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'v3erpzgd01'
CRS-2676: Start of 'ora.gpnpd' on 'v3erpzgd01' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'v3erpzgd01'
CRS-2672: Attempting to start 'ora.gipcd' on 'v3erpzgd01'
CRS-2676: Start of 'ora.cssdmonitor' on 'v3erpzgd01' succeeded
CRS-2676: Start of 'ora.gipcd' on 'v3erpzgd01' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'v3erpzgd01'
CRS-2672: Attempting to start 'ora.diskmon' on 'v3erpzgd01'
CRS-2676: Start of 'ora.diskmon' on 'v3erpzgd01' succeeded
CRS-2676: Start of 'ora.cssd' on 'v3erpzgd01' succeeded
ASM failed to start. Check /u01/app/grid/cfgtoollogs/asmca/asmca-200507PM121627.log for details.
Configuration of ASM ... failed
see asmca logs at /u01/app/grid/cfgtoollogs/asmca for details
Did not succssfully configure and start ASM at /u01/app/11.2.0/grid/crs/install/crsconfig_lib.pm line 6912.
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed

The file /u01/app/grid/cfgtoollogs/asmca/asmca-200507PM121627.log shows the following error:

[main] [ 2020-05-07 09:50:51.837 CST ] [UsmcaLogger.logException:174]  ORA-12547: TNS:lost contact
[main] [ 2020-05-07 09:50:51.837 CST ] [UsmcaLogger.logException:175]  oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-12547: TNS:lost contact
oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeImpl(SQLEngine.java:1658)
oracle.sysman.assistants.util.sqlEngine.SQLEngine.connect(SQLEngine.java:981)
oracle.sysman.assistants.usmca.backend.USMInstance.connectToASM(USMInstance.java:626)
oracle.sysman.assistants.usmca.backend.USMInstance.configureLocalASM(USMInstance.java:3016)
oracle.sysman.assistants.usmca.service.UsmcaService.configureLocalASM(UsmcaService.java:1049)
oracle.sysman.assistants.usmca.model.UsmcaModel.performConfigureLocalASM(UsmcaModel.java:944)
oracle.sysman.assistants.usmca.model.UsmcaModel.performOperation(UsmcaModel.java:797)
oracle.sysman.assistants.usmca.Usmca.execute(Usmca.java:174)
oracle.sysman.assistants.usmca.Usmca.main(Usmca.java:369)
[main] [ 2020-05-07 09:50:51.837 CST ] [UsmcaLogger.logInfo:143]  ASM failed to start. Check /u01/app/grid/cfgtoollogs/asmca/asmca-200507AM095050.log for details.
[main] [ 2020-05-07 09:50:51.837 CST ] [UsmcaLogger.logInfo:143]  Instance running false
[main] [ 2020-05-07 09:50:51.837 CST ] [UsmcaLogger.logInfo:143]  ASM failed to start. Check /u01/app/grid/cfgtoollogs/asmca/asmca-200507AM095050.log for details.

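This does not reveal much beyond the fact that the ASM instance cannot be created. Checking the alert log of the ASM instance shows the following errors: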

System parameters with non-default values:
  large_pool_size          = 12M
  instance_type            = "asm"
  remote_login_passwordfile= "EXCLUSIVE"
  asm_diskstring           = "/dev/oracleasm/asm-*"
  asm_power_limit          = 1
  diagnostic_dest          = "/u01/app/grid"
Cluster communication is configured to use the following interface(s) for this instance
  10.206.110.3
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
Process PMON died, see its trace file
USER (ospid: 14306): terminating the instance due to error 443
Instance terminated by USER, pid = 14306
Thu May 07 12:16:34 2020
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x7F8962019640, __lll_unlock_elision()+48] [flags: 0x0, count: 1]
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.


The key error is __lll_unlock_elision. Searching for this keyword turns up mostly bug reports.
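To check whether another system shows the same signature, the diagnostic directory can be searched for the symbol. This is a minimal sketch, assuming the diagnostic_dest of /u01/app/grid shown in the alert log above; find_elision_crashes is a name made up for illustration and is not an Oracle tool.

```shell
#!/bin/bash
# List files under a diagnostic directory that mention the HLE crash symbol.
# find_elision_crashes is a hypothetical helper, not part of any Oracle tooling.
find_elision_crashes() {
    local diag_dir="$1"
    # -r: recurse, -l: print file names only, -F: treat the pattern as a fixed string
    grep -rlF "__lll_unlock_elision" "$diag_dir" 2>/dev/null | sort
}
```

It could be called as find_elision_crashes /u01/app/grid/diag, which prints one matching alert or trace file per line.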


The detailed log of the root.sh run is at $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_<hostname>.log


Solution:


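# Temporary: put the noelision libraries first on the library path of the current shell
export LD_LIBRARY_PATH=/lib64/noelision/:$LD_LIBRARY_PATH
# Permanent: register the directory with the dynamic linker and rebuild the cache
echo "/lib64/noelision" > /etc/ld.so.conf.d/noelision.conf
ldconfig
# Edit /etc/ld.so.conf and add the line /lib64/noelision (rerun ldconfig after saving)
vi /etc/ld.so.conf
# Link the noelision libpthread into the Grid home
ln -s /lib64/noelision/libpthread-2.22.so /u01/app/11.2.0/grid/lib/libpthread.so.0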
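One way to confirm that the workaround took effect is to check where libpthread.so.0 resolves for the Oracle binary. The following is a rough sketch, assuming the Grid home path used above; pthread_from_ldd and is_noelision are names invented here.

```shell
#!/bin/bash
# Read `ldd` output on stdin and print the path that libpthread.so.0 resolves to.
pthread_from_ldd() {
    awk '$1 == "libpthread.so.0" && $2 == "=>" { print $3; exit }'
}

# Succeed only if the given path, with symlinks resolved, lies in a noelision directory.
is_noelision() {
    case "$(readlink -f "$1")" in
        */noelision/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: is_noelision "$(ldd /u01/app/11.2.0/grid/bin/oracle | pthread_from_ldd)" && echo "noelision libpthread in use". Resolving symlinks matters because the link created in the Grid home points into /lib64/noelision.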



 



How to disable Hardware Lock Elision

This document (7022289) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 12 (SLES 12)
SUSE Linux Enterprise Server 12 Service Pack 1 (SLES 12 SP1)
SUSE Linux Enterprise Server 12 Service Pack 2 (SLES 12 SP2)
SUSE Linux Enterprise Server 12 Service Pack 3 (SLES 12 SP3)
SUSE Linux Enterprise Server 12 Service Pack 4 (SLES 12 SP4)
SUSE Linux Enterprise Server 12 Service Pack 5 (SLES 12 SP5)
SUSE Linux Enterprise Server 15 (SLES 15)
SUSE Linux Enterprise Server 15 Service Pack 1 (SLES 15 SP1)
 

Situation

Some third-party applications installed on SUSE Linux Enterprise Server may require that the 'Hardware Lock Elision' functionality be disabled.

Resolution

Temporary solution: Add the appropriate noelision path to the beginning of the library load path (LD_LIBRARY_PATH) so that the noelision libraries are used in preference to any others appearing later in the path, e.g.

export LD_LIBRARY_PATH=/lib64/noelision/:$LD_LIBRARY_PATH

On server reboot, this change will be lost.

Permanent solution: Create the file /etc/ld.so.conf.d/noelision.conf and add the appropriate line, e.g.

/lib64/noelision

After saving the noelision.conf changes, run `ldconfig` to rebuild the caches.
 

Cause

Some Intel CPUs have a bug that causes Hardware Lock Elision to be handled incorrectly.

Additional Information

Oracle grid software is susceptible to this CPU bug. If elision locking is not disabled on servers with such faulty CPUs, Oracle may crash during initialization.
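Whether a server is even a candidate for this problem can be narrowed down by looking at the CPU flags. The sketch below assumes the Linux /proc/cpuinfo layout, where Intel TSX support appears as the flags hle and rtm; tsx_flags is an illustrative name. The flags only show that the CPU advertises the feature, not whether it is one of the faulty models.

```shell
#!/bin/bash
# Print which TSX-related flags (hle, rtm) appear in a cpuinfo-style file.
# Defaults to /proc/cpuinfo; tsx_flags is an illustrative name only.
tsx_flags() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    # Take the first "flags" line, split it into one token per line,
    # keep exact matches for hle/rtm, and join them with single spaces.
    grep -m1 '^flags' "$cpuinfo" | tr ' \t' '\n\n' \
        | grep -x -e hle -e rtm | sort | tr '\n' ' ' | sed 's/ $//'
}
```

An empty result means neither flag is advertised, in which case lock elision is unlikely to be the cause of a crash.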

The 'noelision' 32-bit and 64-bit libraries are found here:

       /lib/noelision
       /lib64/noelision



NOTE: Make sure that /etc/ld.so.conf includes the /etc/ld.so.conf.d/ directory (the default), or add the directory you want to be included (the directory where you placed the noelision.conf file). It is recommended to keep everything under /etc/ld.so.conf.d/.

NOTE: The files in the directories listed in /etc/ld.so.conf are applied in alphanumerical file name order. You need to be aware of what is in each file, in each of the included directories, to make sure that the desired settings are being applied and not 'overwritten' or ignored.
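To see where the noelision directory ends up relative to the other entries, the configuration can be flattened into a single ordered list. This is only a sketch: list_ld_dirs is a hypothetical helper, it handles absolute include paths only, does not follow nested includes, and relies on the shell expanding the glob in sorted order, which approximates the ordering described in the note.

```shell
#!/bin/bash
# Print the library directories from an ld.so.conf-style file in the order listed,
# expanding "include" lines. The shell expands globs in sorted file name order.
# list_ld_dirs is a hypothetical helper; nested includes are not followed.
list_ld_dirs() {
    local conf="$1" line f
    while IFS= read -r line; do
        case "$line" in
            ''|'#'*) ;;                              # skip blank lines and comments
            include\ *)
                for f in ${line#include }; do        # unquoted on purpose: expand the glob
                    [ -f "$f" ] && grep -v -e '^#' -e '^$' "$f"
                done ;;
            *) printf '%s\n' "$line" ;;
        esac
    done < "$conf"
}
```

Running list_ld_dirs /etc/ld.so.conf | grep -n noelision then shows the position of the noelision entry in the combined list.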


 

Disclaimer

This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.



SLES 12: CLSRSC-366: Failed to import credentials for ASM During root.sh Execution When Adding New Node (Doc ID 2420338.1)



APPLIES TO:

Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Database Backup Service - Version N/A and later
Information in this document applies to any platform.

SYMPTOMS

On : 12.2.0.1 version, Clusterware

The following error is reported during root.sh execution while adding a new node to a 12.2 cluster environment:

#/u01/app/12.2.0.1/grid # ./root.sh

Performing root user operation.

The following environment variables are set as:

ORACLE_OWNER= grid

ORACLE_HOME= /u01/app/12.2.0.1/grid

.

.

CRS-4133: Oracle High Availability Services has been stopped.

CRS-4123: Oracle High Availability Services has been started.

2018/06/25 17:34:46  CLSRSC-366: Failed to import credentials for ASM

Died at /u01/app/12.2.0.1/grid/crs/install/crsutils.pm line 8461.
The command '/u01/app/12.2.0.1/grid/perl/bin/perl -I/u01/app/12.2.0.1/grid/perl/lib -I/u01/app/12.2.0.1/grid/crs/install /u01/app/12.2.0.1/grid/crs/install/rootcrs.pl ' execution failed

The root.sh execution log indicates an error when importing ASM credentials, and a core dump is created.

<ORACLE_BASE>/crsdata/node3/crsconfig/rootcrs_node3_2018-06-25_05-33-24PM.log:

2018-06-25 17:34:45: It is an add node scenario
2018-06-25 17:34:45: Importing asm credentials
2018-06-25 17:34:45: Executing cmd: /u01/app/12.2.0.1/grid/bin/crsctl add credmaint -path ASM -local
2018-06-25 17:34:45: Command output:
> CRS-10405: (:CLSCRED0006:)Credential domain already exists.
> CRS-4000: Command Add failed, or completed with errors.
>End Command output
Jun 25 17:34:46. 20181657:2018-06-25 17:34:45: Executing cmd: /u01/app/12.2.0.1/grid/bin/crsctl setperm credmaint -o grid -path ASM -local
2018-06-25 17:34:46: Running as user grid: /u01/app/12.2.0.1/grid/bin/kfod op=credimport wrap=/u01/app/12.2.0.1/grid/gpnp/seed/asm/credentials.xml olr=TRUE force=TRUE
2018-06-25 17:34:46: s_run_as_user2: Running /bin/su grid -c ' echo CLSRSC_START; /u01/app/12.2.0.1/grid/bin/ kfod op=credimport wrap=/u01/app/12.2.0.1/grid/gpnp/seed/asm/credentials.xml olr=TRUE force=TRUE
'
2018-06-25 17:34:46: Removing file /tmp/xsFtxcsJX0
2018-06-25 17:34:46: Successfully removed file: /tmp/xsFtxcsJX0
2018-06-25 17:34:46: pipe exit code: 35584
2018-06-25 17:34:46: /bin/su exited with rc=139
2018-06-25 17:34:46: kfod op=credimport rc: 139 
2018-06-25 17:34:46: Failed to enable flex ASM on local node, error: bash: line 1: 29674  Segmentation fault (core dumped) /u01/app/12.2.0.1/grid/bin/kfod op=credimport wrap=/u01/app/12.2.0.1/grid/gpnp/seed/asm/credentials.xml olr=TRUE force=TRUE 

2018-06-25 17:34:46: Executing cmd: /u01/app/12.2.0.1/grid/bin/clsecho -p has -f clsrsc -m 366 
2018-06-25 17:34:46: Command output: 
CLSRSC-366: Failed to import credentials for ASM
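The two numbers in the log describe the same event: 35584 is 139 shifted left by eight bits, and 139 is 128 + 11, the shell's way of reporting a child killed by signal 11 (SIGSEGV). The sketch below works through that arithmetic, assuming the "pipe exit code" is a raw wait status with the exit code in the high byte; decode_status is an illustrative name, not part of the Oracle scripts.

```shell
#!/bin/bash
# Decode a raw wait status as logged by rootcrs.pl ("pipe exit code").
# decode_status is an illustrative name, not part of the Oracle scripts.
decode_status() {
    local raw="$1"
    local rc=$(( raw >> 8 ))            # the exit code lives in the high byte
    if [ "$rc" -gt 128 ]; then
        # Shells report "killed by signal N" as exit code 128 + N
        printf 'rc=%d signal=%d (%s)\n' "$rc" $(( rc - 128 )) "$(kill -l $(( rc - 128 )))"
    else
        printf 'rc=%d\n' "$rc"
    fi
}
```

Calling decode_status 35584 yields rc=139 and signal 11, which matches the "Segmentation fault (core dumped)" message in the log.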

Generate a stack trace from the core file using Doc ID 1812.1:

Script started on Mon 25 Jun 2018 05:42:46 PM CEST
[grid@rac3/+ASM3 ~]> gdb /u01/app/12.2.0.1/grid/bin/kfod.bin /var/local/dumps/core/core.kfod.bin.27988
GNU gdb (GDB; SUSE Linux Enterprise 12) 8.0
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:

Find the GDB manual and other documentation resources online at:

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /u01/app/12.2.0.1/grid/bin/kfod.bin...done.
[New LWP 27988]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by 'kfod.bin 12.2.0.1/grid/bin/kfod.bin op=credimport
wrap=/u01/app/12.2.0.1/grid/g'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007fdbc171a9db in raise () from /lib64/libpthread.so.0
Missing separate debuginfos, use: zypper install
glibc-debuginfo-2.22-61.3.x86_64 libaio1-debuginfo-0.3.109-17.15.x86_64
libgcc_s1-debuginfo-6.2.1+r239768-2.4.x86_64
libnuma1-debuginfo-2.0.9-9.1.x86_64
(gdb) bt
#0 0x00007fdbc171a9db in raise () from /lib64/libpthread.so.0
#1 0x00007fdbc6cc0f12 in skgesigOSCrash () from
/u01/app/12.2.0.1/grid/lib/libclntsh.so.12.1
#2 0x00007fdbc72e2295 in kpeDbgSignalHandler () from
/u01/app/12.2.0.1/grid/lib/libclntsh.so.12.1
#3 0x00007fdbc6cc1250 in skgesig_sigactionHandler () from
/u01/app/12.2.0.1/grid/lib/libclntsh.so.12.1
#4 <signal handler called>
#5 0x00007fdbc171c4a0  in __lll_unlock_elision () from /lib64/libpthread.so.0 <<<<<<<<<<<<<<<<<<<<<<<<
#6 0x00007fdbc2a9b794 in  scls_iddb_is_a_privgrp_member_by_id (ctx=0x105b988, <<<<<<<<<<<<<<<<<<<<<<<<
ose=0x7ffe45412214, flags=256, u_id=0x105ba08, g_id=0x105be48,
result=0x7ffe45412204) at scls.c:3669
#7 0x00007fdbc25c5626 in proa_is_valid_user_group (ocrctx=0x1055e20,
sec_attr=0x10620a8, flag=0) at proa.c:15016
#8 0x00007fdbc25b1a31 in proa_write (ocrctx=0x1055e20, keyhandle=0x10618e8,
dtype=procr_dtype_NOVALUE, sec_attr=0x10620a8, invalue=0x1062134 "",
inout_size=0, write_option=10, flags=33554432)
at proa.c:5049
#9 0x00007fdbc25ba742 in proa_batch_execute (ocrctx=0x1055e20,
batch=0x105bc48, flags=33554432) at proa.c:9520
#10 0x00007fdbc25d9061 in procr_batch_execute (ocrctx=0x1055e20,
batch=0x105bc48, flags=0) at procr.c:8859
#11 0x00007fdbc33b3901 in clsCredOcrBatchExec (pOcrCtx=0x1055e20,
pBatch=0x105bc48, ocrFlags=0, pDruid=0x7fdbc3998a74 "(:CLSCRED0138:)",
msgId=10401, pErr=0xf0c4f0) at clsCredUtils.c:5463
#12 0x00007fdbc33b383b in clsCredStoreBatchExec (pDom=0x1025240,
pBatch=0x7ffe45417b60, pDruid=0x7fdbc3998a74 "(:CLSCRED0138:)",
pErr=0xf0c4f0) at clsCredUtils.c:5443

 

CHANGES

 

CAUSE

This issue was investigated in Bug 28268288 - ADDNODE FAIL WITH KFOD OP=CREDIMPORT CORE DUMP and was found to be a duplicate of Bug 26399297 - CLUSTER STARTUP FAILS IF HLE IS ENABLED.

The issue is caused by Hardware Lock Elision being set to TRUE.
 

SOLUTION

Bug 26399297 is fixed in a future release. Apply interim patch 26399297, if available for your platform and Oracle version, and re-run root.sh. If no patch exists for your version, please contact Oracle Support for a backport request.

The workaround is to disable Hardware Lock Elision and re-run root.sh.






Oracle Grid Infrastructure Install Fails on SuSE 12 when Running root.sh (Doc ID 2253054.1)



APPLIES TO:

Oracle Database - Enterprise Edition - Version 12.1.0.1 and later
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Linux x86-64

SYMPTOMS

 This problem can be encountered when installing Oracle Restart (SIHA) or Grid Infrastructure for a Cluster.  The root.sh script fails with:

Using configuration parameter file: ./crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'oracle', privgrp 'dba'..
Operation successful.
LOCAL ONLY MODE
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4664: Node ldwebi101 successfully pinned.
2017/03/24 15:52:51 CLSRSC-330: Adding Clusterware entries to file 'oracle-ohasd.service'

PRCR-1006 : Failed to add resource ora.ons for ons
PRCR-1115 : Failed to find entities of type resource type that match filters
(TYPE_NAME ends .type) and contain attributes
CRS-0184 : Cannot communicate with the CRS daemon.
2017/03/24 15:53:03 CLSRSC-180: An error occurred while executing the command
'srvctl add ons' (error code 0)

 Review of the OHASD Trace ($DIAG_DEST/crs/<hostname>/crs/trace/ohasd.trc) will show OHASD crashing with Signal 11:

2017-03-29 08:09:27.014827 :OHASDMAIN:1063044736: OHASD Daemon Starting. Command string :restart
2017-03-29 08:09:27.014839 :OHASDMAIN:1063044736: OHASD params []
2017-03-29 08:09:27.015156 :OHASDMAIN:1063044736: Initializing OLR
. . .
2017-03-29 08:11:09.927125 : default:121607936: clsvactversion:4: Retrieving
Active Version from local storage.
2017-03-29 08:11:09.947361 :UiServer:121607936: {0:0:54} Container [ Name:
UI_REGISTER_TYPE
API_HDR_VER:
TextMessage[3]
ATTR_LIST:
TextMessage[BASE_TYPE=ora.local_asm.typeBASE_TYPE=_STRINGBASE_TYPE=_READONL
Y TYPE_NAME=ora.asm.typeTYPE_NAME=_STRINGTYPE_NAME=_READONLY ]
CLIENT:
TextMessage[]
CLIENT_NAME:
TextMessage[/opt/oracle/product/grid_12102/bin/crsctl.bin]
CLIENT_PID:
TextMessage[31662]
CLIENT_PRIMARY_GROUP:
TextMessage[dba]
LOCALE:
TextMessage[AMERICAN_AMERICA.AL32UTF8]
QUEUE_TAG:
TextMessage[1]
RESOURCE:
TextMessage[ora.asm.type]
]
Trace file /opt/oracle/diag/crs/<hostname>/crs/trace/ohasd.trc
Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.
DDE: Flood control is not active
2017-03-29 08:11:09.947455 :UiServer:121607936: {0:0:54} Sending to PE. ctx=0x7f66e8038030, ClientPID=31662
2017-03-29 08:11:09.947666 : CRSPE:123709184: {0:0:54} Cmd : 0x7f66e41e1c60 : flags: QUEUE_TAG
2017-03-29 08:11:09.947703 : CRSPE:123709184: {0:0:54} Processing PE command id=108 origin:ldwebi101. Description: [Register Type : : 0x7f66e41e1c60]
2017-03-29 08:11:09.948247 : CRSSEC:123709184: {0:0:54} Allow all users to register resources in the new engine.
CLSB:123709184: Oracle Clusterware infrastructure error in OHASD (OS PID 30985): Fatal signal 11 has occurred in program ohasd thread 123709184; nested signal count is 1
Incident 265 created, dump file:
/opt/oracle/diag/crs/<hostname>/crs/incident/incdir_265/ohasd_i265.trc
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
Trace file /opt/oracle/diag/crs/<hostname>/crs/trace/ohasd.trc
Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.

 Review of the incident file generated from the OHASD crash ($DIAG_DEST/crs/<hostname>/crs/incident/incdir_<nnn>/ohasd_i<nnn>.trc) shows the signaling function in the crash to be  "__lll_unlock_elision":

----- Incident Context Dump -----
Address: 0x7f67075d9ac0
Incident ID: 265
Problem Key: CRS 8503
Error: CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
[00]: dbgexProcessError [diag_dde]
[01]: dbgeExecuteForError [diag_dde]
[02]: dbgePostErrorDirect [diag_dde]
[03]: clsdAdrPostError []<-- Signaling
[04]: clsbSigErrCB []
[05]: skgesig_sigactionHandler []
[06]: __sighandler []
[07]: __lll_unlock_elision []
[08]: sltsmnr []
[09]: scls_iddb_is_a_privgrp_member_by_id []
[10]: _ZNK3CAA8Identity14belongsToGroupESs []
[11]: _ZN3CAA17PrimaryGroupEntry3hasERKNS_8IdentityERKSs []
[12]: _ZN3CAA3Acl3hasERKNS_8IdentityERKSs []
[13]: _ZN3CAA10Authorizer15checkPermissionERKNS_8IdentityERKSs []
[14]: _ZNK8cls_pe1210Authorizer15checkPermissionERKN3CAA8IdentityERNS1_10AuthorizerEj []
[15]: _ZNK8cls_pe1213ManagedObject17verifyPermissionsERKN3CAA8IdentityEj []
[16]: _ZN8cls_pe1229RegisterResourceTypeOperation17validateOperationEv []
[17]: _ZN8cls_pe1229RegisterResourceTypeOperation10initializeEv []
[18]: _ZN8cls_pe1229RegisterResourceTypeUiCommand15createOperationERKSsS2_ []
[19]: _ZN8cls_pe1229RegisterResourceTypeUiCommand19createIceOperationsEv []
[20]: _ZN8cls_pe127Command7processEv []
[21]: _ZN8cls_pe1218PolicyEngineModule15invokeUiCommandEPNS_7CommandE []
[22]: _ZN8cls_pe1218PolicyEngineModule23processUiCommandHandlerEPNS_14PeUiCmdMessageE []
[23]: _ZN3cls11ThreadModel12processQueueEP7sltstid []
[24]: _ZN3cls11ThreadModel5runTMEPv []
[25]: _ZN13CLS_Threading13CLSthreadMain8cppStartEPv []
[26]: start_thread []
MD [00]: 'Client ProcId'='ohasd.bin@ldwebi101.30985_140080482068224' (0x0)

Other symptoms of this issue have been seen during oracssdagent startup, where a similar call stack in the incident file ($DIAG_DEST/crs/<hostname>/crs/incident/incdir_<nnn>/ohasd_cssdagent_root_i<nnn>.trc) shows the signaling function in the crash to be "__lll_unlock_elision":

Call Stack
-----------
Dump continued from file:
/u01/app/grid/diag/crs/<hostname>/crs/trace/ohasd_cssdagent_root.trc
[TOC00001]
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
[TOC00001-END]
[TOC00002]
========= Dump for incident 9 (CRS 8503) ========
Starting a Diag Context default dump (level=3)

Problem Key: CRS 8503
Error: CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
[00]: dbgexProcessError [diag_dde]
[01]: dbgeExecuteForError [diag_dde]
[02]: dbgePostErrorDirect [diag_dde]
[03]: clsdAdrPostError []<-- Signaling
[04]: clsbSigErrCB []
[05]: skgesig_sigactionHandler []
[06]: __sighandler []
[07]: __lll_unlock_elision []
[08]: sltsmnr []
[09]: scls_iddb_is_a_privgrp_member_by_id []
[10]: scls_canexec []
[11]: scls_process_spawn []
[12]: clsncssd_cssdstart []
[13]: _ZN8cls_agfw3Cmd7executeEv []
[14]: _ZN8cls_agfw5CmdEx10executeCmdEPN3cls7MessageE []
[15]: _ZN8cls_agfw5CmdEx14clsRequestHdlrEPN3cls7MessageE []
[16]: _ZN3cls11ThreadModel12processQueueEP7sltstid []
[17]: _ZN3cls11ThreadModel5runTMEPv []
[18]: _ZN13CLS_Threading13CLSthreadMain8cppStartEPv []
[19]: start_thread []

 

CHANGES

 New Installation of Oracle GI 12.1 and above on SuSE 12.

CAUSE

glibc in SuSE 12 makes use of Hardware Lock Elision (HLE), available in newer Intel processors. This exposes a bug that causes Clusterware processes (OHASD, CSSDAGENT, etc.) to crash on startup.

SOLUTION

The fix for this bug MUST be applied prior to running root.sh, as described in MOS Note 1410202.1. The bug will be fixed in an upcoming GI PSU for 12.1.0.2 and RU for 12.2.0.1.

 

If for some reason you are unable to apply the fix for this bug prior to running root.sh, the following workaround may be implemented to allow the installation/upgrade to succeed:

1.  Assuming root.sh has already failed, deconfigure the failed install as the ROOT user:

Note: Do NOT close the OUI window; it is needed to complete the installation after root.sh succeeds.
 
# $GI_HOME/crs/install/roothas.pl -deconfig -force

2.  Modify the /etc/ld.so.conf adding /lib64/noelision as the FIRST entry.  It should look similar to the following:

/lib64/noelision
/usr/local/lib64
/usr/local/lib
include /etc/ld.so.conf.d/*.conf
# /lib64, /lib, /usr/lib64 and /usr/lib gets added
# automatically by ldconfig after parsing this file.
# So, they do not need to be listed.

3.  Create a link in $GI_HOME/lib for the noelision version of the libpthread library:

# ln -s /lib64/noelision/libpthread-2.19.so $GI_HOME/lib/libpthread.so.0

4.  Rerun the root.sh script and complete the installation via the OUI once root.sh has successfully completed.
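Step 2 above can be scripted so that it is safe to run more than once. The following is a sketch rather than an Oracle-provided procedure: ensure_first_entry is a hypothetical helper that operates on whatever file path it is given, and the real /etc/ld.so.conf should be backed up first.

```shell
#!/bin/bash
# Make /lib64/noelision the first line of an ld.so.conf-style file, idempotently.
# ensure_first_entry is a hypothetical helper; back up the real file before using it.
ensure_first_entry() {
    local conf="$1" entry="${2:-/lib64/noelision}" tmp
    # Nothing to do if the entry is already the first line
    [ "$(head -n 1 "$conf")" = "$entry" ] && return 0
    tmp=$(mktemp) || return 1
    # Write the entry first, then the old content minus any later copy of the entry
    { printf '%s\n' "$entry"; grep -vxF "$entry" "$conf"; } > "$tmp"
    cat "$tmp" > "$conf"        # overwrite in place to keep owner and permissions
    rm -f "$tmp"
}
```

As root, this would be run as ensure_first_entry /etc/ld.so.conf, followed by ldconfig to rebuild the cache.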





"sqlplus / as sysdba" reports ORA-12547 on SUSE 12 (Doc ID 2297117.1)




APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.4 and later
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Linux x86-64

SYMPTOMS

On : 11.2.0.4 version, RDBMS, SUSE 12 platform:

1. "sqlplus / as sysdba" reports ORA-12547:

$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.4.0 Production on Fri Aug 4 23:02:08 2017
Copyright (c) 1982, 2013, Oracle. All rights reserved.

ERROR:
ORA-12547: TNS:lost contact

2. strace shows the following error:

15069 0.000019 munmap(0x7ff21311f000, 268435456) = 0 <0.000010>
15069 0.000024 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
15068 0.000293 <... read resumed> "", 64) = 0 <0.010658>
15069 0.000024 +++ killed by SIGSEGV +++
15068 0.000005 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=15069, si_uid=300, si_status=SIGSEGV, si_utime=0, si_stime=0} ---

3. ORA-7445 [__lll_unlock_elision] is raised when attempting to start up the database:

Alert log:

Process PMON died, see its trace file
USER (ospid: 3075): terminating the instance due to error 443
Instance terminated by USER, pid = 3075
Tue Aug 08 01:43:12 2017
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x7F11DF21C490, __lll_unlock_elision()+48] [flags: 0x0, count: 1]

Trace File:

*** 2017-08-08 01:43:12.268
Exception [type: SIGSEGV, SI_KERNEL(general_protection)] [ADDR:0x0] [PC:0x7F11DF21C490, __lll_unlock_elision()+48] [flags: 0x0, count: 1]
DDE: Flood control is not active
========= Dump for critical error (no incident) (ORA 7445 [__lll_unlock_elision()+48]) ========
Registers:
%rax: 0x0000000000000000 %rbx: 0x0000000000048006 %rcx: 0x000000000000001e
%rdx: 0x000000000c1c2e88 %rdi: 0x000000000c1c2e88 %rsi: 0x0000000000000000
%rsp: 0x00007ffc48934788 %rbp: 0x00007ffc48934790 %r8: 0x0000000000000000
%r9: 0x00000000bfffffff %r10: 0x0000000008000000 %r11: 0x0000000000000202
%r12: 0x000000000c195340 %r13: 0x00007ffc48934a38 %r14: 0x0000000010000000
%r15: 0x00007f11de6015e8 %rip: 0x00007f11df21c490 %efl: 0x0000000000010246
__lll_unlock_elision()+29 (0x7f11df21c47d) add $0x80,%rsp
__lll_unlock_elision()+36 (0x7f11df21c484) xor %eax,%eax
__lll_unlock_elision()+38 (0x7f11df21c486) ret
__lll_unlock_elision()+39 (0x7f11df21c487) nopw 0x0(%rax,%rax)
> __lll_unlock_elision()+48 (0x7f11df21c490) lgdt %bp
__lll_unlock_elision()+51 (0x7f11df21c493) xor %eax,%eax
__lll_unlock_elision()+53 (0x7f11df21c495) ret
__lll_unlock_elision()+54 (0x7f11df21c496) cs: nopw 0x0(%rax,%rax)
__lll_unlock_elision()+64 (0x7f11df21c4a0) movzwl (%rsi),%eax

----- Call Stack Trace -----

skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp
<- ssexhd <- sighandler <- lll_unlock_elisio <- sltsimr <- sskgmdt
<- skgmdtmany <- skgmdetach0 <- skgmdetach <- ksmdsgi <- ksmdsg
<- ksuabt <- opistr_real <- opistr <- opiodr <- ttcpip
<- opitsk <- opiino <- opiodr <- opidrv <- sou2o
<- opimai_real <- ssthrdmain <- main <- libc_start_main <- start

 4. "dbca" will also fail with the above error.
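A trace like the one in symptom 2 can be captured with strace and then scanned for the crashing process. The note does not state which options were used; something like strace -f -r -T -o /tmp/sqlplus.strace sqlplus / as sysdba would produce a comparable layout (the output path is made up). The sketch below, with the illustrative name segv_pids, pulls out the PIDs that died from SIGSEGV.

```shell
#!/bin/bash
# Print the PIDs that strace recorded as killed by SIGSEGV.
# Expects the "pid timestamp event" layout shown in the strace excerpt.
segv_pids() {
    awk '/\+\+\+ killed by SIGSEGV/ { print $1 }' "$1" | sort -u
}
```

If the listed PID belongs to the Oracle server process spawned by sqlplus, that matches the ORA-12547 symptom: the client loses contact because the server process crashed.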

CHANGES

 This is a new installation of 11.2.0.4 on SUSE 12.

CAUSE

glibc in SuSE 12 makes use of Hardware Lock Elision (HLE), available in newer Intel processors.
This can cause a process crash with "__lll_unlock_elision" on the call stack.

SOLUTION

1. Modify the "/etc/ld.so.conf" adding "/lib64/noelision" as the FIRST entry. It should look similar to the following:

/lib64/noelision
/usr/local/lib64
/usr/local/lib
include /etc/ld.so.conf.d/*.conf
# /lib64, /lib, /usr/lib64 and /usr/lib gets added
# automatically by ldconfig after parsing this file.
# So, they do not need to be listed.


2. Create a link in $ORACLE_HOME/lib for the noelision version of the libpthread library (replace <x.xx> with the version present on your system):

su - oracle
ln -s /lib64/noelision/libpthread-<x.xx>.so $ORACLE_HOME/lib/libpthread.so.0


3. Restart the host, log in again as oracle, and check whether sqlplus works.

su - oracle
ldd $ORACLE_HOME/bin/sqlplus
ldd $ORACLE_HOME/bin/oracle
sqlplus / as sysdba
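Because the libpthread file name carries a version (2.19 in one note, 2.22 on the SLES 12.4 host above), the link in step 2 can be created by discovering the file instead of hard-coding it. This is a sketch under the assumption that exactly one libpthread-*.so exists in the noelision directory; both function names are invented for illustration.

```shell
#!/bin/bash
# Print the versioned libpthread file found in a noelision directory.
# Fails unless there is exactly one candidate. Function names are illustrative.
find_noelision_pthread() {
    local dir="${1:-/lib64/noelision}" matches
    matches=$(ls "$dir"/libpthread-*.so 2>/dev/null)
    [ -n "$matches" ] && [ "$(printf '%s\n' "$matches" | wc -l)" -eq 1 ] || return 1
    printf '%s\n' "$matches"
}

# Create <home>/lib/libpthread.so.0 pointing at that file.
# ln -s is used without -f, so this fails if the name already exists.
link_noelision_pthread() {
    local home="$1" dir="${2:-/lib64/noelision}" target
    target=$(find_noelision_pthread "$dir") || return 1
    ln -s "$target" "$home/lib/libpthread.so.0"
}
```

As the software owner, link_noelision_pthread "$ORACLE_HOME" would create the link; the ldd commands in step 3 then show whether it is picked up.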

 

Note: The solution can also be applied to the GRID/ASM home if ORA-12547 is reported on SUSE 12 while connecting to the ASM instance with sqlplus. Please also refer to Note 2253054.1 for more details.

REFERENCES

NOTE:2253054.1  - Oracle Grid Infrastructure Install Fails on SuSE 12 when Running root.sh






All in all, this was exhausting: it took a day and a half.






● Author: 小麥苗. Some content is compiled from the Internet; in case of infringement, please contact 小麥苗 to have it removed.


 

 



From the ITPUB blog, link: http://blog.itpub.net/26736162/viewspace-2690327/. If you reproduce this article, please credit the source.
