【Azure 儲存服務】Hadoop叢集中使用ADLS(Azure Data Lake Storage)過程中遇見執行PUT操作報錯

路邊兩盞燈發表於2022-07-13

問題描述

在Hadoop集中中,使用ADLS 作為資料來源,在執行PUT操作(上傳檔案到ADLS中),遇見 400錯誤【put: Operation failed: "An HTTP header that's mandatory for this request is not specified.", 400】

啟用Debug輸出詳細日誌:

【Azure 儲存服務】Hadoop叢集中使用ADLS(Azure Data Lake Storage)過程中遇見執行PUT操作報錯

錯誤訊息文字內容:

【Azure 儲存服務】Hadoop叢集中使用ADLS(Azure Data Lake Storage)過程中遇見執行PUT操作報錯
[hdfs@hadoop001 ~]$ hadoop fs -put a.txt abfs://adsl@xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/test/a.txt
22/07/13 15:46:05 DEBUG util.Shell: setsid exited with exit code 0
22/07/13 15:46:05 DEBUG conf.Configuration: parsing URL jar:file:/usr/hdp/3.1.4.0-315/hadoop/hadoop-common-3.1.1.3.1.4.0-315.jar!/core-default.xml
22/07/13 15:46:05 DEBUG conf.Configuration: parsing input stream sun.net.www.protocol.jar.JarURLConnection$JarURLInputStream@4fe3c938
22/07/13 15:46:05 DEBUG conf.Configuration: parsing URL file:/etc/hadoop/3.1.4.0-315/0/core-site.xml
22/07/13 15:46:05 DEBUG conf.Configuration: parsing input stream java.io.BufferedInputStream@467aecef
22/07/13 15:46:05 DEBUG security.SecurityUtil: Setting hadoop.security.token.service.use_ip to true
22/07/13 15:46:05 DEBUG security.Groups:  Creating new Groups object
22/07/13 15:46:05 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
22/07/13 15:46:05 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
22/07/13 15:46:05 DEBUG security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
22/07/13 15:46:05 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping
22/07/13 15:46:05 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
22/07/13 15:46:06 DEBUG core.Tracer: sampler.classes = ; loaded no samplers
22/07/13 15:46:06 DEBUG core.Tracer: span.receiver.classes = ; loaded no span receivers
22/07/13 15:46:06 DEBUG security.UserGroupInformation: hadoop login
22/07/13 15:46:06 DEBUG security.UserGroupInformation: hadoop login commit
22/07/13 15:46:06 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: hdfs
22/07/13 15:46:06 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: hdfs" with name hdfs
22/07/13 15:46:06 DEBUG security.UserGroupInformation: User entry: "hdfs"
22/07/13 15:46:06 DEBUG security.UserGroupInformation: UGI loginUser:hdfs (auth:SIMPLE)
22/07/13 15:46:06 DEBUG core.Tracer: sampler.classes = ; loaded no samplers
22/07/13 15:46:06 DEBUG core.Tracer: span.receiver.classes = ; loaded no span receivers
22/07/13 15:46:06 DEBUG fs.FileSystem: Loading filesystems
22/07/13 15:46:06 DEBUG fs.FileSystem: file:// = class org.apache.hadoop.fs.LocalFileSystem from /usr/hdp/3.1.4.0-315/hadoop/hadoop-common-3.1.1.3.1.4.0-315.jar
22/07/13 15:46:06 DEBUG fs.FileSystem: viewfs:// = class org.apache.hadoop.fs.viewfs.ViewFileSystem from /usr/hdp/3.1.4.0-315/hadoop/hadoop-common-3.1.1.3.1.4.0-315.jar
22/07/13 15:46:06 DEBUG fs.FileSystem: har:// = class org.apache.hadoop.fs.HarFileSystem from /usr/hdp/3.1.4.0-315/hadoop/hadoop-common-3.1.1.3.1.4.0-315.jar
22/07/13 15:46:06 DEBUG fs.FileSystem: http:// = class org.apache.hadoop.fs.http.HttpFileSystem from /usr/hdp/3.1.4.0-315/hadoop/hadoop-common-3.1.1.3.1.4.0-315.jar
22/07/13 15:46:06 DEBUG fs.FileSystem: https:// = class org.apache.hadoop.fs.http.HttpsFileSystem from /usr/hdp/3.1.4.0-315/hadoop/hadoop-common-3.1.1.3.1.4.0-315.jar
22/07/13 15:46:06 DEBUG fs.FileSystem: hdfs:// = class org.apache.hadoop.hdfs.DistributedFileSystem from /usr/hdp/3.1.4.0-315/hadoop-hdfs/hadoop-hdfs-client-3.1.1.3.1.4.0-315.jar
22/07/13 15:46:06 DEBUG fs.FileSystem: webhdfs:// = class org.apache.hadoop.hdfs.web.WebHdfsFileSystem from /usr/hdp/3.1.4.0-315/hadoop-hdfs/hadoop-hdfs-client-3.1.1.3.1.4.0-315.jar
22/07/13 15:46:06 DEBUG fs.FileSystem: swebhdfs:// = class org.apache.hadoop.hdfs.web.SWebHdfsFileSystem from /usr/hdp/3.1.4.0-315/hadoop-hdfs/hadoop-hdfs-client-3.1.1.3.1.4.0-315.jar
22/07/13 15:46:06 DEBUG fs.FileSystem: gs:// = class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem from /usr/hdp/3.1.4.0-315/hadoop-mapreduce/gcs-connector-1.9.10.3.1.4.0-315-shaded.jar
22/07/13 15:46:06 DEBUG fs.FileSystem: s3n:// = class org.apache.hadoop.fs.s3native.NativeS3FileSystem from /usr/hdp/3.1.4.0-315/hadoop-mapreduce/hadoop-aws-3.1.1.3.1.4.0-315.jar
22/07/13 15:46:06 DEBUG fs.FileSystem: Looking for FS supporting abfs
22/07/13 15:46:06 DEBUG fs.FileSystem: looking for configuration option fs.abfs.impl
22/07/13 15:46:06 DEBUG fs.FileSystem: Filesystem abfs defined in configuration option
22/07/13 15:46:06 DEBUG fs.FileSystem: FS for abfs is class org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem
22/07/13 15:46:06 DEBUG azurebfs.AzureBlobFileSystem: Initializing AzureBlobFileSystem for abfs://adsl@xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/test/a.txt
22/07/13 15:46:06 DEBUG security.Groups: GroupCacheLoader - load.
22/07/13 15:46:06 WARN utils.SSLSocketFactoryEx: Failed to load OpenSSL. Falling back to the JSSE default.
22/07/13 15:46:06 DEBUG utils.SSLSocketFactoryEx: Removed Cipher - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
22/07/13 15:46:06 DEBUG utils.SSLSocketFactoryEx: Removed Cipher - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
22/07/13 15:46:06 DEBUG utils.SSLSocketFactoryEx: Removed Cipher - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
22/07/13 15:46:06 DEBUG utils.SSLSocketFactoryEx: Removed Cipher - TLS_RSA_WITH_AES_256_GCM_SHA384
22/07/13 15:46:06 DEBUG utils.SSLSocketFactoryEx: Removed Cipher - TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384
22/07/13 15:46:06 DEBUG utils.SSLSocketFactoryEx: Removed Cipher - TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384
22/07/13 15:46:06 DEBUG utils.SSLSocketFactoryEx: Removed Cipher - TLS_DHE_RSA_WITH_AES_256_GCM_SHA384
22/07/13 15:46:06 DEBUG utils.SSLSocketFactoryEx: Removed Cipher - TLS_DHE_DSS_WITH_AES_256_GCM_SHA384
22/07/13 15:46:06 DEBUG utils.SSLSocketFactoryEx: Removed Cipher - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
22/07/13 15:46:06 DEBUG utils.SSLSocketFactoryEx: Removed Cipher - TLS_RSA_WITH_AES_128_GCM_SHA256
22/07/13 15:46:06 DEBUG utils.SSLSocketFactoryEx: Removed Cipher - TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256
22/07/13 15:46:06 DEBUG utils.SSLSocketFactoryEx: Removed Cipher - TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256
22/07/13 15:46:06 DEBUG utils.SSLSocketFactoryEx: Removed Cipher - TLS_DHE_RSA_WITH_AES_128_GCM_SHA256
22/07/13 15:46:06 DEBUG utils.SSLSocketFactoryEx: Removed Cipher - TLS_DHE_DSS_WITH_AES_128_GCM_SHA256
22/07/13 15:46:06 DEBUG services.AbfsClientThrottlingIntercept: Client-side throttling is enabled for the ABFS file system.
22/07/13 15:46:06 DEBUG azurebfs.AzureBlobFileSystem: AzureBlobFileSystem.getFileStatus path: abfs://adsl@xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/test/a.txt
22/07/13 15:46:06 DEBUG azurebfs.AzureBlobFileSystem: ABFS authorizer is not initialized. No authorization check will be performed.
22/07/13 15:46:06 DEBUG azurebfs.AzureBlobFileSystemStore: Get root ACL status
22/07/13 15:46:06 DEBUG oauth2.AccessTokenProvider: AADToken: no token. Returning expiring=true
22/07/13 15:46:06 DEBUG oauth2.AccessTokenProvider: AAD Token is missing or expired: Calling refresh-token from abstract base class
22/07/13 15:46:06 DEBUG oauth2.AccessTokenProvider: AADToken: refreshing client-credential based token
22/07/13 15:46:06 DEBUG oauth2.AzureADAuthenticator: AADToken: starting to fetch token using client creds for client ID 0392543e-5eab-4de2-881b-9bd8a9fe9deb
22/07/13 15:46:06 DEBUG oauth2.AzureADAuthenticator: Requesting an OAuth token by POST to https://login.partner.microsoftonline.cn/fc54511d-de79-4bae-bfc9-3a42945d1b27/oauth2/token
22/07/13 15:46:06 DEBUG services.AbfsIoUtils: Request Headers
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   Connection=close
22/07/13 15:46:06 DEBUG oauth2.AzureADAuthenticator: Response 200
22/07/13 15:46:06 DEBUG services.AbfsIoUtils: Response Headers
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   HTTP Response=HTTP/1.1 200 OK
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   x-ms-ests-server=2.1.13156.10 - CNN2LR1 ProdSlices
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   X-Content-Type-Options=nosniff
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   Connection=close
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   Pragma=no-cache
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   P3P=CP="DSP CUR OTPi IND OTRi ONL FIN"
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   Date=Wed, 13 Jul 2022 07:46:06 GMT
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   Strict-Transport-Security=max-age=31536000; includeSubDomains
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   Cache-Control=no-store, no-cache
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   Set-Cookie=*cookie info*
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   Expires=-1
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   Content-Length=1427
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   X-XSS-Protection=0
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   x-ms-request-id=b63779e3-ec7a-4d78-a950-fc5cd47b2f01
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   Content-Type=application/json; charset=utf-8
22/07/13 15:46:06 DEBUG oauth2.AzureADAuthenticator: AADToken: fetched token with expiry Wed Jul 13 16:46:05 CST 2022
22/07/13 15:46:06 DEBUG services.AbfsClient: Authenticating request with OAuth2 access token
22/07/13 15:46:06 DEBUG services.AbfsIoUtils: Request Headers
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   Accept-Charset=utf-8
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   x-ms-version=2018-11-09
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   Accept=application/json, application/octet-stream
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   User-Agent=Azure Blob FS/3.1.1.3.1.4.0-315 (JavaJRE 1.8.0_232; Linux 3.10.0-862.el7.x86_64; SunJSSE-1.8) User-Agent: APN/1.0 Hortonworks/1.0 HDP/3.1.4.0-315
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   x-ms-client-request-id=14467eed-4c13-4e36-9a5d-35603fe87d0a
22/07/13 15:46:06 DEBUG services.AbfsIoUtils:   Content-Type=
22/07/13 15:46:07 DEBUG services.AbfsIoUtils: Response Headers
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   HTTP Response=HTTP/1.1 200 OK
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-lease-status=unlocked
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-version=2018-11-09
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-lease-state=available
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Last-Modified=Mon, 11 Jul 2022 08:15:08 GMT
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Date=Wed, 13 Jul 2022 07:46:06 GMT
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-blob-type=BlockBlob
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Accept-Ranges=bytes
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-server-encrypted=true
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-access-tier-inferred=true
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-meta-hdi_isfolder=true
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-access-tier=Hot
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   ETag="0x8DA631578216222"
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-creation-time=Mon, 11 Jul 2022 08:15:08 GMT
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Content-Length=0
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-request-id=4a211e55-f01e-0058-388c-9679fc000000
22/07/13 15:46:07 DEBUG services.AbfsClient: HttpRequest: 200,,cid=14467eed-4c13-4e36-9a5d-35603fe87d0a,rid=4a211e55-f01e-0058-388c-9679fc000000,sent=0,recv=0,HEAD,https://xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/adsl//?upn=false&action=getAccessControl&timeout=90
22/07/13 15:46:07 DEBUG azurebfs.AzureBlobFileSystemStore: getFileStatus filesystem: adsl path: abfs://adsl@xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/test/a.txt isNamespaceEnabled: true
22/07/13 15:46:07 DEBUG services.AbfsClient: Authenticating request with OAuth2 access token
22/07/13 15:46:07 DEBUG services.AbfsIoUtils: Request Headers
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Accept-Charset=utf-8
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-version=2018-11-09
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Accept=application/json, application/octet-stream
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   User-Agent=Azure Blob FS/3.1.1.3.1.4.0-315 (JavaJRE 1.8.0_232; Linux 3.10.0-862.el7.x86_64; SunJSSE-1.8) User-Agent: APN/1.0 Hortonworks/1.0 HDP/3.1.4.0-315
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-client-request-id=756615aa-bed9-4487-88ae-b69f859f0b51
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Content-Type=
22/07/13 15:46:07 DEBUG services.AbfsIoUtils: Response Headers
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Transfer-Encoding=chunked
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   HTTP Response=HTTP/1.1 404 The specified blob does not exist.
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-version=2018-11-09
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-error-code=BlobNotFound
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-request-id=4a211e7e-f01e-0058-5d8c-9679fc000000
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Date=Wed, 13 Jul 2022 07:46:06 GMT
22/07/13 15:46:07 DEBUG services.AbfsClient: HttpRequest: 404,,cid=756615aa-bed9-4487-88ae-b69f859f0b51,rid=4a211e7e-f01e-0058-5d8c-9679fc000000,sent=0,recv=0,HEAD,https://xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/adsl/test/a.txt?upn=false&timeout=90
22/07/13 15:46:07 DEBUG fs.FileSystem: Looking for FS supporting file
22/07/13 15:46:07 DEBUG fs.FileSystem: looking for configuration option fs.file.impl
22/07/13 15:46:07 DEBUG fs.FileSystem: Looking in service filesystems for implementation class
22/07/13 15:46:07 DEBUG fs.FileSystem: FS for file is class org.apache.hadoop.fs.LocalFileSystem
22/07/13 15:46:07 DEBUG azurebfs.AzureBlobFileSystem: AzureBlobFileSystem.getFileStatus path: abfs://adsl@xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/test
22/07/13 15:46:07 DEBUG azurebfs.AzureBlobFileSystem: ABFS authorizer is not initialized. No authorization check will be performed.
22/07/13 15:46:07 DEBUG azurebfs.AzureBlobFileSystemStore: getFileStatus filesystem: adsl path: abfs://adsl@xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/test isNamespaceEnabled: true
22/07/13 15:46:07 DEBUG services.AbfsClient: Authenticating request with OAuth2 access token
22/07/13 15:46:07 DEBUG services.AbfsIoUtils: Request Headers
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Accept-Charset=utf-8
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-version=2018-11-09
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Accept=application/json, application/octet-stream
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   User-Agent=Azure Blob FS/3.1.1.3.1.4.0-315 (JavaJRE 1.8.0_232; Linux 3.10.0-862.el7.x86_64; SunJSSE-1.8) User-Agent: APN/1.0 Hortonworks/1.0 HDP/3.1.4.0-315
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-client-request-id=b48f18e8-ba8e-4a44-956f-5ef889b828e5
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Content-Type=
22/07/13 15:46:07 DEBUG services.AbfsIoUtils: Response Headers
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   HTTP Response=HTTP/1.1 200 OK
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-lease-status=unlocked
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-version=2018-11-09
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-lease-state=available
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Last-Modified=Tue, 12 Jul 2022 10:03:44 GMT
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Date=Wed, 13 Jul 2022 07:46:06 GMT
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-blob-type=BlockBlob
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Accept-Ranges=bytes
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-server-encrypted=true
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-access-tier-inferred=true
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-meta-hdi_isfolder=true
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-access-tier=Hot
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Cache-Control=max-age=0
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   ETag="0x8DA63EDCE5D3F3C"
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-creation-time=Tue, 12 Jul 2022 10:03:44 GMT
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Content-Length=0
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-request-id=4a211e93-f01e-0058-718c-9679fc000000
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Content-Type=application/octet-stream
22/07/13 15:46:07 DEBUG services.AbfsClient: HttpRequest: 200,,cid=b48f18e8-ba8e-4a44-956f-5ef889b828e5,rid=4a211e93-f01e-0058-718c-9679fc000000,sent=0,recv=0,HEAD,https://xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/adsl/test?upn=false&timeout=90
22/07/13 15:46:07 DEBUG azurebfs.AzureBlobFileSystem: AzureBlobFileSystem.getFileStatus path: abfs://adsl@xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/test/a.txt._COPYING_
22/07/13 15:46:07 DEBUG azurebfs.AzureBlobFileSystem: ABFS authorizer is not initialized. No authorization check will be performed.
22/07/13 15:46:07 DEBUG azurebfs.AzureBlobFileSystemStore: getFileStatus filesystem: adsl path: abfs://adsl@xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/test/a.txt._COPYING_ isNamespaceEnabled: true
22/07/13 15:46:07 DEBUG services.AbfsClient: Authenticating request with OAuth2 access token
22/07/13 15:46:07 DEBUG services.AbfsIoUtils: Request Headers
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Accept-Charset=utf-8
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-version=2018-11-09
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Accept=application/json, application/octet-stream
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   User-Agent=Azure Blob FS/3.1.1.3.1.4.0-315 (JavaJRE 1.8.0_232; Linux 3.10.0-862.el7.x86_64; SunJSSE-1.8) User-Agent: APN/1.0 Hortonworks/1.0 HDP/3.1.4.0-315
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-client-request-id=8a6491b6-e13e-4d4e-be3b-8d183c727442
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Content-Type=
22/07/13 15:46:07 DEBUG services.AbfsIoUtils: Response Headers
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Transfer-Encoding=chunked
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   HTTP Response=HTTP/1.1 404 The specified blob does not exist.
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-version=2018-11-09
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-error-code=BlobNotFound
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-request-id=4a211ea5-f01e-0058-7f8c-9679fc000000
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Date=Wed, 13 Jul 2022 07:46:06 GMT
22/07/13 15:46:07 DEBUG services.AbfsClient: HttpRequest: 404,,cid=8a6491b6-e13e-4d4e-be3b-8d183c727442,rid=4a211ea5-f01e-0058-7f8c-9679fc000000,sent=0,recv=0,HEAD,https://xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/adsl/test/a.txt._COPYING_?upn=false&timeout=90
22/07/13 15:46:07 DEBUG azurebfs.AzureBlobFileSystem: AzureBlobFileSystem.create path: abfs://adsl@xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/test/a.txt._COPYING_ permission: { masked: rw-r--r--, unmasked: rw-rw-rw- } overwrite: true bufferSize: 33554432
22/07/13 15:46:07 DEBUG azurebfs.AzureBlobFileSystem: ABFS authorizer is not initialized. No authorization check will be performed.
22/07/13 15:46:07 DEBUG azurebfs.AzureBlobFileSystemStore: createFile filesystem: adsl path: abfs://adsl@xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/test/a.txt._COPYING_ overwrite: true permission: { masked: rw-r--r--, unmasked: rw-rw-rw- } umask: ----w--w- isNamespaceEnabled: true
22/07/13 15:46:07 DEBUG services.AbfsClient: Authenticating request with OAuth2 access token
22/07/13 15:46:07 DEBUG services.AbfsIoUtils: Request Headers
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-umask=0022
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Accept-Charset=utf-8
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-version=2018-11-09
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Accept=application/json, application/octet-stream
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   User-Agent=Azure Blob FS/3.1.1.3.1.4.0-315 (JavaJRE 1.8.0_232; Linux 3.10.0-862.el7.x86_64; SunJSSE-1.8) User-Agent: APN/1.0 Hortonworks/1.0 HDP/3.1.4.0-315
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-permissions=0644
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-client-request-id=bf71e98f-886d-4529-b62b-7898c655fadf
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Content-Type=
22/07/13 15:46:07 DEBUG services.AbfsIoUtils: Response Headers
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   HTTP Response=HTTP/1.1 400 An HTTP header that's mandatory for this request is not specified.
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-version=2018-11-09
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-error-code=MissingRequiredHeader
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Content-Length=301
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-request-id=4a211eae-f01e-0058-088c-9679fc000000
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Date=Wed, 13 Jul 2022 07:46:06 GMT
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Content-Type=application/xml
22/07/13 15:46:07 DEBUG services.AbfsHttpOperation: ExpectedError: 
org.codehaus.jackson.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: sun.net.www.protocol.http.HttpURLConnection$HttpInputStream@38145825; line: 1, column: 5]
	at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
	at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
	at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:442)
	at org.codehaus.jackson.impl.Utf8StreamParser._handleUnexpectedValue(Utf8StreamParser.java:2090)
	at org.codehaus.jackson.impl.Utf8StreamParser._nextTokenNotInObject(Utf8StreamParser.java:606)
	at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:492)
	at org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processStorageErrorResponse(AbfsHttpOperation.java:379)
	at org.apache.hadoop.fs.azurebfs.services.AbfsHttpOperation.processResponse(AbfsHttpOperation.java:285)
	at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.executeHttpOperation(AbfsRestOperation.java:172)
	at org.apache.hadoop.fs.azurebfs.services.AbfsRestOperation.execute(AbfsRestOperation.java:125)
	at org.apache.hadoop.fs.azurebfs.services.AbfsClient.createPath(AbfsClient.java:254)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.createFile(AzureBlobFileSystemStore.java:342)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.create(AzureBlobFileSystem.java:189)
	at org.apache.hadoop.fs.FilterFileSystem.create(FilterFileSystem.java:193)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
	at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.create(CommandWithDestination.java:521)
	at org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:485)
	at org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:408)
	at org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:343)
	at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:278)
	at org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:263)
	at org.apache.hadoop.fs.shell.Command.processPathInternal(Command.java:367)
	at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:331)
	at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:304)
	at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:258)
	at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:286)
	at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:270)
	at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:229)
	at org.apache.hadoop.fs.shell.CopyCommands$Put.processArguments(CopyCommands.java:295)
	at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:120)
	at org.apache.hadoop.fs.shell.Command.run(Command.java:177)
	at org.apache.hadoop.fs.FsShell.run(FsShell.java:328)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
	at org.apache.hadoop.fs.FsShell.main(FsShell.java:391)
22/07/13 15:46:07 DEBUG services.AbfsClient: HttpRequest: 400,,cid=bf71e98f-886d-4529-b62b-7898c655fadf,rid=4a211eae-f01e-0058-088c-9679fc000000,sent=0,recv=301,PUT,https://xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/adsl/test/a.txt._COPYING_?resource=file&timeout=90
22/07/13 15:46:07 DEBUG azurebfs.AzureBlobFileSystem: AzureBlobFileSystem.getFileStatus path: abfs://adsl@xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/test/a.txt._COPYING_
22/07/13 15:46:07 DEBUG azurebfs.AzureBlobFileSystem: ABFS authorizer is not initialized. No authorization check will be performed.
22/07/13 15:46:07 DEBUG azurebfs.AzureBlobFileSystemStore: getFileStatus filesystem: adsl path: abfs://adsl@xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/test/a.txt._COPYING_ isNamespaceEnabled: true
22/07/13 15:46:07 DEBUG services.AbfsClient: Authenticating request with OAuth2 access token
22/07/13 15:46:07 DEBUG services.AbfsIoUtils: Request Headers
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Accept-Charset=utf-8
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-version=2018-11-09
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Accept=application/json, application/octet-stream
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   User-Agent=Azure Blob FS/3.1.1.3.1.4.0-315 (JavaJRE 1.8.0_232; Linux 3.10.0-862.el7.x86_64; SunJSSE-1.8) User-Agent: APN/1.0 Hortonworks/1.0 HDP/3.1.4.0-315
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-client-request-id=4aeb3b46-587d-489c-a246-5d6d668a84a5
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Content-Type=
22/07/13 15:46:07 DEBUG services.AbfsIoUtils: Response Headers
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Transfer-Encoding=chunked
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   HTTP Response=HTTP/1.1 404 The specified blob does not exist.
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-version=2018-11-09
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Server=Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-error-code=BlobNotFound
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   x-ms-request-id=4a211eb7-f01e-0058-108c-9679fc000000
22/07/13 15:46:07 DEBUG services.AbfsIoUtils:   Date=Wed, 13 Jul 2022 07:46:06 GMT
22/07/13 15:46:07 DEBUG services.AbfsClient: HttpRequest: 404,,cid=4aeb3b46-587d-489c-a246-5d6d668a84a5,rid=4a211eb7-f01e-0058-108c-9679fc000000,sent=0,recv=0,HEAD,https://xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/adsl/test/a.txt._COPYING_?upn=false&timeout=90
put: Operation failed: "An HTTP header that's mandatory for this request is not specified.", 400, PUT, https://xxxxxxxxxxxxx.blob.core.chinacloudapi.cn/adsl/test/a.txt._COPYING_?resource=file&timeout=90, , ""
22/07/13 15:46:07 DEBUG azurebfs.AzureBlobFileSystem: AzureBlobFileSystem.close
22/07/13 15:46:07 DEBUG util.ShutdownHookManager: Completed shutdown in 0.004 seconds; Timeouts: 0
22/07/13 15:46:07 DEBUG util.ShutdownHookManager: ShutdownHookManger completed shutdown.
View Code

 

問題解答

雖然在Hadoop 中執行的 PUT指令如下:

./hadoop fs -put a.txt abfs://yourcontainername@youradlsname.blob.core.chinacloudapi.cn/test.txt

但實質上,也時傳送的REST API來操作ADLS資源。 所以參考PUT Blob的介面文件:https://docs.microsoft.com/en-us/rest/api/storageservices/put-blob#request-headers-all-blob-types

它必須的Header引數有:x-ms-version,x-ms-blob-type,x-ms-lease-id,Authorization,x-ms-date,Content-Length等。但是在Hadoop的日誌中,我們只發現了 x-ms-version為 2018-11-09,缺少了x-ms-blob-type。

基於這一發現,我們通過Postman復現了同樣的錯誤:

【Azure 儲存服務】Hadoop叢集中使用ADLS(Azure Data Lake Storage)過程中遇見執行PUT操作報錯

雖然找到了發生問題的根源,但是在Hadoop中,如何來解決呢? 為什麼使用 -put , -ls 等指令都會出現 HTTP Header miss 的問題呢?  按照Hadoop + ADLS 組合設計分析,不可能出現這樣的嚴重錯誤而不進行修復。

 

回想 ADLS Gen 2專為大資料操作而設計。並且還特別啟用了新的終結點(常規Blob操作終結點為:youradlsname.blob.core.chinacloudapi.cn , ADLS操作的終結點為:youradlsname.dfs.core.chinacloudapi.cn)

是否時我們在指令中使用了錯誤的終結點呢?

對比REST API 文件中,常規Blob的PUT操作和ADLS Create File的PUT操作,發現 ADLS PUT操作根本就不需要 x-ms-version,x-ms-blob-type 這兩個Header 為必須。

【Azure 儲存服務】Hadoop叢集中使用ADLS(Azure Data Lake Storage)過程中遇見執行PUT操作報錯

根據以上發現,在Hadoop put指令中修改 blob 為 dfs 測試。 問題完美解決!

【Azure 儲存服務】Hadoop叢集中使用ADLS(Azure Data Lake Storage)過程中遇見執行PUT操作報錯

以此次的錯誤,得出一個深刻的教訓:當使用ADLS進行大資料相關操作時(如hadoop,databricks)一定一定要使用ADLS專用終結點:

xxxxxxx.dfs.core.chinacloudapi.cn

 

 

參考資料

Filesystem - Create:https://docs.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/filesystem/create

Put Blob: https://docs.microsoft.com/en-us/rest/api/storageservices/put-blob#request-headers-all-blob-types

[END]

 

相關文章