Experiment environment
- minio-8.0.10 http://192.168.137.100:32000/minio/bigdata/
- spark-operator-1.1.26
- spark-history-server 3.2.2 http://192.168.137.100:32627/
Test case
Case hudi-spark-test001
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: hudi-spark-test001
  namespace: spark
spec:
  type: Scala
  mode: cluster
  image: "umr/spark:3.2.2_v2"
  imagePullPolicy: IfNotPresent
  mainClass: cc.hudi.HoodieSparkQuickstart
  mainApplicationFile: "s3a://bigdatas/jars/bigdataDemo-1.0-SNAPSHOT.jar"
  sparkVersion: "3.2.2"
  timeToLiveSeconds: 259200
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.2.2
    serviceAccount: spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.2.2
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  sparkConf:
    spark.ui.port: "4045"
    spark.eventLog.enabled: "true"
    spark.eventLog.dir: "s3a://sparklogs/all"
    spark.hadoop.fs.s3a.access.key: "minio"
    spark.hadoop.fs.s3a.secret.key: "minio123"
    spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
    spark.hadoop.fs.s3a.endpoint: "http://10.19.64.205:32000"
    spark.hadoop.fs.s3a.connection.ssl.enabled: "false"
    spark.hadoop.fs.s3a.path.style.access: "true"
    spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
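The case can be submitted and checked with kubectl; a minimal sketch, assuming the manifest above is saved as hudi-spark-test001.yaml:

# submit the SparkApplication and watch its state
kubectl apply -f hudi-spark-test001.yaml
kubectl -n spark get sparkapplication hudi-spark-test001 -w

# follow the driver log once the operator has created the driver pod
kubectl -n spark logs -f hudi-spark-test001-driver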
Errors:
1. Insufficient ClusterRole permissions. The ClusterRole used by the spark-operator service account is missing the persistentvolumeclaims permission:
3/07/07 06:26:54 ERROR Utils: Uncaught exception in thread main
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/spark-operator/persistentvolumeclaims?labelSelector=spark-app-selector%3Dspark-a9d7e8f78bc6459c9282db57a02815d9. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. persistentvolumeclaims is forbidden: User "system:serviceaccount:spark-operator:spark-operator" cannot list resource "persistentvolumeclaims" in API group "" in the namespace "spark-operator".
Fix the permissions:
# partial example
rules:
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - '*'
  - apiGroups:
      - ""
    resources:
      - persistentvolumeclaims
    verbs:
      - '*'
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - '*'
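These rules belong to the ClusterRole bound to the spark-operator service account; one way to apply the change, assuming the role is named spark-operator (as created by the spark-operator Helm chart):

# check what the role currently grants, then add the persistentvolumeclaims rule
kubectl get clusterrole spark-operator -o yaml
kubectl edit clusterrole spark-operator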
2. Case hudi-spark-test001 error:
23/07/10 01:20:41 WARN SparkSession: Cannot use org.apache.spark.sql.hudi.HoodieSparkSessionExtension to configure session extensions.
java.lang.ClassNotFoundException: org.apache.spark.sql.hudi.HoodieSparkSessionExtension
at java.base/java.net.URLClassLoader.findClass(Unknown Source)
at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
at java.base/java.lang.ClassLoader.loadClass(Unknown Source)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Unknown Source)
at org.apache.spark.util.Utils$.classForName(Utils.scala:216)
at org.apache.spark.sql.SparkSession$.$anonfun$applyExtensions$1(SparkSession.scala:1194)
at org.apache.spark.sql.SparkSession$.$anonfun$applyExtensions$1$adapted(SparkSession.scala:1192)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$applyExtensions(SparkSession.scala:1192)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:956)
at cc.utils.HoodieExampleSparkUtils.buildSparkSession(HoodieExampleSparkUtils.java:60)
at cc.utils.HoodieExampleSparkUtils.defaultSparkSession(HoodieExampleSparkUtils.java:53)
at cc.hudi.HoodieSparkQuickstart.main(HoodieSparkQuickstart.java:39)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
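The ClassNotFoundException means the Hudi Spark bundle jar is neither inside the umr/spark:3.2.2_v2 image nor shaded into bigdataDemo-1.0-SNAPSHOT.jar, so the driver cannot load HoodieSparkSessionExtension. One way to fix it is to ship the bundle with the application via spec.deps; a sketch, where the bundle path and version are assumptions (the jar must be uploaded to MinIO first):

spec:
  deps:
    jars:
      # hypothetical location: the Hudi Spark 3.2 bundle uploaded next to the application jar
      - "s3a://bigdatas/jars/hudi-spark3.2-bundle_2.12-0.12.3.jar"

Alternatively, the bundle can be baked into the Spark image under $SPARK_HOME/jars so it is on the classpath for every application.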
3. Error building the Hudi source from GitHub:
[ERROR] Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:0.14.0-SNAPSHOT: The following artifacts could not be resolved: io.confluent:kafka-avro-serializer:jar:5.3.4, io.confluent:common-config:jar:5.3.4, io.confluent:common-utils:jar:5.3.4, io.confluent:kafka-schema-registry-client:jar:5.3.4: io.confluent:kafka-avro-serializer:jar:5.3.4 was not found in http://10.41.31.10:9081/repository/maven-public/ during a previous attempt. This failure was cached in the local repository and resolution is not reattempted until the update interval of chinaunicom has elapsed or updates are forced -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <args> -rf :hudi-utilities_2.12
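The io.confluent artifacts are only published to Confluent's own Maven repository, which the internal maven-public mirror does not proxy, and the failed lookup was then cached locally. A sketch for ~/.m2/settings.xml, assuming the internal mirror is the chinaunicom entry seen in the log: stop it from intercepting the confluent repository and declare that repository explicitly:

<!-- settings.xml sketch: exclude the confluent repository from the internal mirror -->
<mirrors>
  <mirror>
    <id>chinaunicom</id>
    <url>http://10.41.31.10:9081/repository/maven-public/</url>
    <mirrorOf>*,!confluent</mirrorOf>
  </mirror>
</mirrors>
<profiles>
  <profile>
    <id>confluent-repo</id>
    <repositories>
      <repository>
        <id>confluent</id>
        <url>https://packages.confluent.io/maven/</url>
      </repository>
    </repositories>
  </profile>
</profiles>
<activeProfiles>
  <activeProfile>confluent-repo</activeProfile>
</activeProfiles>

Because the failure was cached, force re-resolution with -U, e.g. mvn clean package -DskipTests -U -rf :hudi-utilities_2.12 (or have the Nexus at 10.41.31.10 proxy https://packages.confluent.io/maven/ instead).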
References:
Hudi source build guide on GitHub:
Developer Setup | Apache Hudi
Searching issues in the community:
Apache Hudi - ASF JIRA