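Problem description
While deploying a Java application that connects to Hive, I hit the Kerberos error below. It took a full day to track down, so I am writing it up here.
Symptoms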
- Kerberos GSS initiate failed
- No valid credentials provided (Mechanism level: Attempt to obtain new INITIATE credentials failed! (null))
- Cannot read from System.in
javax.security.sasl.SaslException: GSS initiate failed
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[na:1.8.0_351]
at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) [hive-exec-1.1.0-cdh5.12.1-slankka.jar:1.1.0-cdh5.12.1]
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) ~[hive-exec-1.1.0-cdh5.12.1-slankka.jar:1.1.0-cdh5.12.1]
at .....
at java.lang.Thread.run(Thread.java:750) ~[na:1.8.0_351]
Caused by: org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Attempt to obtain new INITIATE credentials failed! (null))
at sun.security.jgss.krb5.Krb5InitCredential.getTgt(Krb5InitCredential.java:386) ~[na:1.8.0_351]
... 44 common frames omitted
Caused by: javax.security.auth.login.LoginException: Cannot read from System.in
at com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:871) ~[na:1.8.0_351]
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:710) ~[na:1.8.0_351]
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) ~[na:1.8.0_351]
Troubleshooting
Enable Kerberos debug output with this JVM option:
-Dsun.security.krb5.debug=true
Key information
>>>KinitOptions cache name is /opt/userdata/krb5cache/0/krb5cc_0.jSzu-aKO
>>>KinitOptions cache name is /opt/userdata/krb5cache/0/krb5cc_0.jSzu-aKO
>>>KinitOptions cache name is /opt/userdata/krb5cache/0/krb5cc_0.jSzu-aKO
>>>KinitOptions cache name is /opt/userdata/krb5cache/0/krb5cc_0.jSzu-aKO
>>>KinitOptions cache name is /opt/userdata/krb5cache/0/krb5cc_0.jSzu-aKO
>>>KinitOptions cache name is /opt/userdata/krb5cache/0/krb5cc_0.jSzu-aKO
Web search
https://bugs.openjdk.org/browse/JDK-6832353
https://community.spiceworks.com/t/pam-keeps-setting-the-krb5ccname-env-variable/940232
https://linux.die.net/man/5/pam_krb5
Direct cause
The KRB5CCNAME environment variable had been changed, so the value the process saw no longer matched the actual KRB5CCNAME.
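A quick way to confirm this kind of mismatch is to check whether the cache that KRB5CCNAME points at is actually readable by the current user. The following is a minimal sketch: check_ccache is a hypothetical helper, not part of any Kerberos tool, and it only handles plain file caches (an optional FILE: prefix), not other cache types such as KEYRING: or DIR:.

```shell
#!/bin/bash
# check_ccache: hypothetical helper, assumes a file-based credential cache.
check_ccache() {
  local cc="${1:-}"
  if [ -z "$cc" ]; then
    echo "unset"
    return 1
  fi
  # Strip the optional FILE: prefix to get the path on disk.
  local path="${cc#FILE:}"
  if [ -r "$path" ]; then
    echo "readable: $path"
  else
    echo "missing or unreadable: $path"
    return 1
  fi
}

# Check whatever the current environment provides; never abort the caller.
check_ccache "${KRB5CCNAME:-}" || true
```

Running this as the user that launches the application, in the same way the application is launched, shows which cache the JVM will be told to use.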
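Finding the root cause
The startup script launches the application with su - hue -c "springboot-app.jar start".
I had been bitten by a related pitfall before: su - hue
runs the shell with hue's environment variables, while su hue
does not load hue's environment.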
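The difference can be illustrated without root access. This is only a stand-in, not the real su: env -i plays the role of the clean environment that su - starts from, a plain child shell plays the role of su without the dash, and /tmp/krb5cc_demo is a made-up cache path.

```shell
#!/bin/bash
# Simulation only: env -i approximates the environment reset done by "su -".
export KRB5CCNAME=/tmp/krb5cc_demo   # hypothetical cache path owned by the caller

# Like "su hue": the child inherits the caller's KRB5CCNAME.
inherited=$("$BASH" -c 'echo "${KRB5CCNAME:-unset}"')

# Like "su - hue": the caller's KRB5CCNAME is dropped before the shell starts.
clean=$(env -i "$BASH" -c 'echo "${KRB5CCNAME:-unset}"')

echo "inherited: $inherited"
echo "clean:     $clean"
```

With the real su - hue, the login shell's profile scripts or PAM modules may then set KRB5CCNAME again for hue, which this simulation does not reproduce.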
Final cause
The startup script itself was wrong:
#!/bin/bash
CURRENT_USER=$(whoami)
COMMAND='HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop /apps/springboot-app.jar start'
if [ $CURRENT_USER=='hue' ]; then
echo "1. executing as $CURRENT_USER"
bash -c "$COMMAND"
elif [ $CURRENT_USER=='root' ]; then
echo "2. executing as hue... from $CURRENT_USER"
su - hue -c "$COMMAND"
else
echo "permission denied."
fi
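It turned out that the if [ ] expression was wrong. Without spaces around ==, [ $CURRENT_USER=='hue' ] is a single non-empty word, which the test always treats as true, so the first branch ran no matter who started the script.
The output was: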
1. executing as hue
So the script actually ran bash -c "$COMMAND"
instead of su - hue -c "$COMMAND".
After the fix, it correctly prints:
2. executing as hue... from root
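The post does not show the corrected script, so the following is a sketch of one possible fix rather than the exact change that was made. It isolates only the branch selection: buggy_branch and fixed_branch are hypothetical function names, and nothing is launched. The corrected test quotes the variable, puts spaces around the operator, and uses the POSIX = comparison.

```shell
#!/bin/bash
# Hypothetical helpers that print which branch would be taken for a given user.

# Original form: no spaces around ==, so [ ] sees one non-empty word
# (for example root==hue) and is always true.
buggy_branch() {
  local user="$1"
  if [ $user=='hue' ]; then
    echo "direct"      # bash -c "$COMMAND"
  elif [ $user=='root' ]; then
    echo "su"          # su - hue -c "$COMMAND"
  else
    echo "denied"
  fi
}

# Corrected form: three separate arguments to [ ], with the variable quoted.
fixed_branch() {
  local user="$1"
  if [ "$user" = "hue" ]; then
    echo "direct"
  elif [ "$user" = "root" ]; then
    echo "su"
  else
    echo "denied"
  fi
}

buggy_branch root   # prints: direct
fixed_branch root   # prints: su
```

A linter such as ShellCheck is also worth running on copied snippets like this one, since spacing and quoting mistakes inside [ ] are easy to miss by eye.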
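Conclusion
At one point I suspected that su might not support Kerberos authentication, but that turned out not to be the problem.
As long as the KRB5CCNAME variable is set correctly, everything works.
The real trap was Linux shell syntax. Be especially careful with snippets copied from ChatGPT.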