Spark exception: Trying to write more fields than contained in row

Posted by 破棉襖 on 2016-06-13


Convert each JSON record to a Row and save the result as Parquet:
  for type_name in types.value:
      print(type_name)
      # Keep only the records of the current type
      type_data_set = lines.filter(lambda line: line['type'] == type_name)
      # Turn each record (a dict) into a Row; the dict keys become the fields
      type_row = type_data_set.map(lambda line: Row(**line))
      # No schema is passed here, so it is inferred from the data
      schema_row = self.sqlContext.createDataFrame(type_row)

      schema_row.write.mode('overwrite').parquet(
          'hdfs://ip:port/parquet/%s/year=%s/month=%s/day=%s/hour=%s' %
          (type_name, self.year, self.month, self.day, self.hour)
      )

Exception:

  Caused by: java.lang.IndexOutOfBoundsException: Trying to write more fields than contained in row (15 > 12)
          at org.apache.spark.sql.execution.datasources.parquet.MutableRowWriteSupport.write(ParquetTableSupport.scala:261)
          at org.apache.spark.sql.execution.datasources.parquet.MutableRowWriteSupport.write(ParquetTableSupport.scala:257)
          at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
          at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
          at org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
          at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.writeInternal(ParquetRelation.scala:99)
          at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:242)
          ... 8 more
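
The "(15 > 12)" in the message suggests that the writer expected 15 fields but met a row holding only 12. One way to check for such a mismatch is to count how many records have each number of fields. The following is a minimal sketch in plain Python, assuming each record is a dict; the helper name field_count_histogram and the sample data are hypothetical and only for illustration.

```python
from collections import Counter


def field_count_histogram(records):
    """Count how many records have each distinct number of fields."""
    return Counter(len(record) for record in records)


# Hypothetical, shortened sample: two records of one type with different fields
sample = [
    {"type": "zi", "channel": 3, "containerId": "16"},
    {"type": "zi", "channel": 0},
]

histogram = field_count_histogram(sample)
# More than one key in the histogram means the records do not share one layout
print(dict(histogram))
```

On the filtered RDD from the listing above, roughly the same check could probably be written as type_data_set.map(len).countByValue().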
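Resolution:
Two records of the same type, zi, have different numbers of fields: the message of the first record below has 15 fields, while that of the second has 12. The schema evidently ended up with 15 fields, so writing the 12-field row fails. Records of one type need to carry the same fields (for example, by filling in the missing ones) before they are converted to Rows.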
  {"time":"2016-06-06 17:25:14","message":{"channel":3,"containerId":"16","sendUserId":"2611","objectName":"RC:TxtMsg","count":49,"type":"zi","uuid":"-1","appId":"100000","nodeId":"GRM_NODE_0","userId":"2611","time":1465205114814,"ipAddress":"0","sdkVersion":"2.6.2","osName":"Android","deviceId":"0"}}
  {"time":"2016-06-06 17:41:31","message":{"channel":0,"count":0,"type":"zi","uuid":"","appId":"100000","nodeId":"MSG_NODE_2","userId":"2626","time":1465206091272,"ipAddress":"0","sdkVersion":"2.6.1","osName":"0","deviceId":"1"}}
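
One possible way to give both records the same fields is to pad each one with the keys it lacks. This is a minimal sketch in plain Python, not necessarily the fix the author applied. The helper name pad_to_common_fields and the choice of None as the fill value are assumptions; the two dicts are the message parts of the records above.

```python
def pad_to_common_fields(records, fill=None):
    """Return copies of the records that all carry the same set of keys."""
    # Union of every key seen in any record, sorted for a stable field order
    all_fields = sorted(set().union(*(r.keys() for r in records)))
    # Missing keys are filled with `fill` (None here, an assumed default)
    return [{f: r.get(f, fill) for f in all_fields} for r in records]


messages = [
    {"channel": 3, "containerId": "16", "sendUserId": "2611",
     "objectName": "RC:TxtMsg", "count": 49, "type": "zi", "uuid": "-1",
     "appId": "100000", "nodeId": "GRM_NODE_0", "userId": "2611",
     "time": 1465205114814, "ipAddress": "0", "sdkVersion": "2.6.2",
     "osName": "Android", "deviceId": "0"},
    {"channel": 0, "count": 0, "type": "zi", "uuid": "", "appId": "100000",
     "nodeId": "MSG_NODE_2", "userId": "2626", "time": 1465206091272,
     "ipAddress": "0", "sdkVersion": "2.6.1", "osName": "0", "deviceId": "1"},
]

padded = pad_to_common_fields(messages)
# After padding, every record has the same number of fields
print([len(m) for m in padded])
```

Each padded dict could then be turned into a Row with Row(**line) as in the original listing. Because the padded values are None, inferring column types from the data may not work for those columns, so passing an explicit schema to createDataFrame is probably the safer choice.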



From the ITPUB blog, link: http://blog.itpub.net/29754888/viewspace-2119617/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
