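Notes on a Secure Cluster Accessing a Non-Secure Cluster

Posted by 邢為棟 on 2020-12-03

This post describes a problem encountered when a secure cluster accesses a non-secure cluster, along with an analysis of it.

Case

A Hive table is mapped to a Phoenix table. The Hive service runs in a cluster with Kerberos enabled, while Phoenix runs in a separate cluster that does not have Kerberos enabled.

Error and analysis

HUE returns the following error: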

Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Failed after attempts=11, exceptions: Thu Dec 03 08:52:11 CST 2020, RpcRetryingCaller{globalStartTime=1606956693172, pause=100, maxAttempts=11}, org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=11, exceptions: Thu Dec 03 08:51:33 CST 2020, RpcRetryingCaller{globalStartTime=1606956693172, pause=100, maxAttempts=11}, javax.security.sasl.SaslException: Call to transfer01.bigdata.zxxk.com/10.111.118.166:16020 failed on local exception: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)] [Caused by javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]] Thu Dec 03 08:51:33 CST 2020, RpcRetryingCaller{globalStartTime=1606956693172, pause=100, maxAttempts=11}, java.io.IOException: Call to transfer01.bigdata.zxxk.com/10.111.118.166:16020 failed on local exception: java.io.IOException: Can not send request because relogin is in progress. 
......

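Analysis:

This is clearly a Kerberos authentication problem. The key message is: Server not found in Kerberos database

However, the error does not say which server that is.

Checking /var/log/krb5kdc.log on the KDC shows the following errors: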

Dec 03 09:12:58 utility1.bigdata.zxxk.com krb5kdc[2740](info): AS_REQ (4 etypes {18 17 16 23}) 10.111.116.226: ISSUE: authtime 1606957978, etypes {rep=18 tkt=18 ses=18}, hive/gateway01.bigdata.zxxk.com@BIGDATA.ZXXK.COM for krbtgt/BIGDATA.ZXXK.COM@BIGDATA.ZXXK.COM
Dec 03 09:12:59 utility1.bigdata.zxxk.com krb5kdc[2739](info): TGS_REQ (4 etypes {18 17 16 23}) 10.111.116.226: LOOKING_UP_SERVER: authtime 0,  hive/gateway01.bigdata.zxxk.com@BIGDATA.ZXXK.COM for hbase/transfer01.bigdata.zxxk.com@BIGDATA.ZXXK.COM, Server not found in Kerberos database
Dec 03 09:12:59 utility1.bigdata.zxxk.com krb5kdc[2739](info): TGS_REQ (4 etypes {18 17 16 23}) 10.111.116.226: LOOKING_UP_SERVER: authtime 0,  hive/gateway01.bigdata.zxxk.com@BIGDATA.ZXXK.COM for hbase/transfer01.bigdata.zxxk.com@BIGDATA.ZXXK.COM, Server not found in Kerberos database
......

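Analysis:

The principal hbase/transfer01.bigdata.zxxk.com@BIGDATA.ZXXK.COM does not exist in the Kerberos database, and transfer01.bigdata.zxxk.com is a host in the cluster that does not have Kerberos enabled. In other words, the Hive client, authenticated as hive/gateway01.bigdata.zxxk.com@BIGDATA.ZXXK.COM, requested a service ticket for the HBase service on that host, and the KDC could not issue one.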
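When the log is long, the same lookup can be scripted. The following is a minimal sketch, assuming the KDC writes TGS_REQ failures in the format shown above; the function name and the regular expression are illustrative and not part of any Kerberos tooling.

```python
import re

# Matches krb5kdc TGS_REQ lines that failed with LOOKING_UP_SERVER and
# captures the requesting client principal and the unknown service principal.
# The pattern is an assumption based on the log lines quoted in this post.
TGS_FAILURE = re.compile(
    r"TGS_REQ .*?LOOKING_UP_SERVER: authtime \d+,\s+"
    r"(?P<client>\S+) for (?P<server>\S+), "
    r"Server not found in Kerberos database"
)


def missing_service_principals(lines):
    """Return the set of (client, server) pairs the KDC could not resolve."""
    missing = set()
    for line in lines:
        match = TGS_FAILURE.search(line)
        if match:
            missing.add((match.group("client"), match.group("server")))
    return missing


# One of the log lines from this post, used as sample input.
sample = [
    "Dec 03 09:12:59 utility1.bigdata.zxxk.com krb5kdc[2739](info): "
    "TGS_REQ (4 etypes {18 17 16 23}) 10.111.116.226: LOOKING_UP_SERVER: "
    "authtime 0,  hive/gateway01.bigdata.zxxk.com@BIGDATA.ZXXK.COM for "
    "hbase/transfer01.bigdata.zxxk.com@BIGDATA.ZXXK.COM, "
    "Server not found in Kerberos database"
]

for client, server in sorted(missing_service_principals(sample)):
    print(f"{client} requested unknown service {server}")
```

To scan the real log, pass an open file handle for /var/log/krb5kdc.log instead of the sample list. Repeated retries collapse into a single entry because the results are collected in a set.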

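Possible solution (unverified)

This approach has not been verified. Do not try it in a production environment.

Because of how Kerberos works, a Kerberized client cannot establish a connection with hosts and services that are not registered for authentication. To resolve the problem above, the target host would therefore need to be brought under Kerberos management, with a matching service principal created for it in the Kerberos database.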
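As a rough illustration of what creating the missing principal might involve, the sketch below only builds and prints the kadmin.local commands; it does not run them. It is as unverified as the idea itself. The helper name and the keytab path are hypothetical, and the sketch assumes an MIT Kerberos KDC where addprinc and ktadd are available. The HBase service on the target cluster would also have to be configured to use the resulting keytab, which is not covered here.

```python
import shlex


def kadmin_commands(service, host, realm, keytab_path):
    """Build kadmin.local invocations for one service principal.

    Returns argv lists only; nothing is executed here.
    """
    principal = f"{service}/{host}@{realm}"
    return [
        # Create the principal with a random key.
        ["kadmin.local", "-q", f"addprinc -randkey {principal}"],
        # Export the key to a keytab for the service to use.
        ["kadmin.local", "-q", f"ktadd -k {keytab_path} {principal}"],
    ]


commands = kadmin_commands(
    service="hbase",
    host="transfer01.bigdata.zxxk.com",
    realm="BIGDATA.ZXXK.COM",
    keytab_path="/tmp/hbase.transfer01.keytab",  # hypothetical path
)

for argv in commands:
    # Print a shell-quoted form for review before running anything by hand.
    print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere and leaves the decision to apply them, on a test KDC first, to the operator.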
