Enabling LZO Compression in Spark
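References:
For installing and configuring LZO on CentOS 7, see: http://blog.itpub.net/31511218/viewspace-2151945/
For configuring LZO in Hadoop, see: http://blog.itpub.net/31511218/viewspace-2151946/
To enable LZO compression in Spark, follow the steps below.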
I. spark-env.sh configuration
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/app/hadoop-2.6.0-cdh5.7.0/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/app/hadoop-2.6.0-cdh5.7.0/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/yarn/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/yarn/lib/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/lib/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/hdfs/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/hdfs/lib/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce/lib/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/tools/lib/*:/app/spark-2.2.0-bin-2.6.0-cdh5.7.0/jars/*
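The native directory configured above only helps if it actually contains the LZO libraries. A minimal sketch of a pre-flight check follows. The function name `check_lzo_native` is hypothetical, and the library name prefixes (`libgplcompression` for the hadoop-lzo JNI bindings, `liblzo2` for the LZO library itself) are assumptions about a typical build, so adjust them to match what your build produced. If liblzo2 is installed in a system directory instead, the second check will report it as missing here.

```shell
# Report whether a directory contains files matching the assumed LZO
# native library name prefixes. Returns 0 only if both are present.
check_lzo_native() {
    dir=$1
    status=0
    for prefix in libgplcompression liblzo2; do
        # ls exits non-zero when the glob matches nothing
        if ls "$dir"/"$prefix"* >/dev/null 2>&1; then
            echo "found: $prefix"
        else
            echo "missing: $prefix"
            status=1
        fi
    done
    return $status
}
```

With the path used in this article, it could be called as `check_lzo_native /app/hadoop-2.6.0-cdh5.7.0/lib/native`.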
II. spark-defaults.conf configuration
spark.driver.extraClassPath /app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/hadoop-lzo-0.4.19.jar
spark.executor.extraClassPath /app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/hadoop-lzo-0.4.19.jar
Note: both entries point to the hadoop-lzo jar produced by the build.
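Before pointing Spark at the jar, it can help to confirm that the jar really contains the codec class. A small sketch, assuming a jar listing (for example from `jar tf` or `unzip -l`) is piped in on standard input. The function name `has_lzop_codec` is made up for this illustration.

```shell
# Succeeds if the jar listing read from stdin contains the LzopCodec class file.
has_lzop_codec() {
    grep -q 'com/hadoop/compression/lzo/LzopCodec\.class'
}
```

A possible invocation, using the jar path from this article: `jar tf /app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/hadoop-lzo-0.4.19.jar | has_lzop_codec && echo "LzopCodec present"`.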
III. Testing
1. Reading an LZO file
spark-shell --master local[2]
scala> import com.hadoop.compression.lzo.LzopCodec
scala> val page_views = sc.textFile("/user/hive/warehouse/page_views_lzo/page_views.dat.lzo")
2. Writing an LZO file
spark-shell --master local[2]
scala> import com.hadoop.compression.lzo.LzopCodec
scala> val lzoTest = sc.parallelize(1 to 10)
scala> lzoTest.saveAsTextFile("/input/test_lzo", classOf[LzopCodec])
Result:
[hadoop@spark220 common]$ hdfs dfs -ls /input/test_lzo
Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2018-03-16 23:24 /input/test_lzo/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 60 2018-03-16 23:24 /input/test_lzo/part-00000.lzo
-rw-r--r-- 1 hadoop supergroup 61 2018-03-16 23:24 /input/test_lzo/part-00001.lzo
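The `.lzo` suffix on each part file in the listing is a quick sign that the codec was applied. A sketch that counts such files from `hdfs dfs -ls` output supplied on standard input follows. The function name `count_lzo_parts` is hypothetical, and it assumes the path is the last whitespace-separated field on each line.

```shell
# Count part files ending in .lzo in an `hdfs dfs -ls` listing read from stdin.
count_lzo_parts() {
    awk '$NF ~ /\/part-[0-9]+\.lzo$/ { n++ } END { print n + 0 }'
}
```

Run as `hdfs dfs -ls /input/test_lzo | count_lzo_parts`, this prints 2 for the listing shown above.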
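This completes the configuration and testing.
IV. Problems encountered during configuration and testing
1. Native library cannot be loaded because LD_LIBRARY_PATH is missing
1.1 Error message: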
Caused by: java.lang.RuntimeException: native-lzo library not available
at com.hadoop.compression.lzo.LzopCodec.getDecompressorType(LzopCodec.java:120)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:178)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:111)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:246)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:245)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:203)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:94)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
1.2 Solution: add the following to spark-env.sh in Spark's conf directory:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/app/hadoop-2.6.0-cdh5.7.0/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/app/hadoop-2.6.0-cdh5.7.0/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/yarn/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/yarn/lib/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/lib/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/hdfs/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/hdfs/lib/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/mapreduce/lib/*:/app/hadoop-2.6.0-cdh5.7.0/share/hadoop/tools/lib/*:/app/spark-2.2.0-bin-2.6.0-cdh5.7.0/jars/*
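Since this error comes down to the native directory not being on the library path, one way to confirm the setting is to inspect the variable, for example after sourcing conf/spark-env.sh in the current shell. A minimal sketch follows. `path_has_dir` is a hypothetical helper, not part of Spark or Hadoop.

```shell
# Succeeds if the colon-separated list in $1 contains the directory $2
# as an exact entry.
path_has_dir() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance: `path_has_dir "$LD_LIBRARY_PATH" /app/hadoop-2.6.0-cdh5.7.0/lib/native || echo "native dir missing from LD_LIBRARY_PATH"`.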
2. LzopCodec class cannot be found
2.1 Error message:
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzopCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzopCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128)
2.2 Solution: add the following to spark-defaults.conf in Spark's conf directory:
spark.driver.extraClassPath /app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/hadoop-lzo-0.4.19.jar
spark.executor.extraClassPath /app/hadoop-2.6.0-cdh5.7.0/share/hadoop/common/hadoop-lzo-0.4.19.jar
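Both the driver and the executor entries are listed in the fix, and it is easy to end up with only one of them set. A sketch that reports which of the two keys are present in a given spark-defaults.conf follows. The function name `check_extra_classpath` is an illustrative assumption, as is the use of `$SPARK_HOME` in the usage line.

```shell
# For each extraClassPath key, report whether the given properties file
# sets it, ignoring commented-out lines. Returns 0 only if both are set.
check_extra_classpath() {
    conf=$1
    status=0
    for key in spark.driver.extraClassPath spark.executor.extraClassPath; do
        # Drop comment lines first, then match the key as a literal string
        if grep -v '^[[:space:]]*#' "$conf" | grep -qF "$key"; then
            echo "set: $key"
        else
            echo "not set: $key"
            status=1
        fi
    done
    return $status
}
```

A typical call would be `check_extra_classpath "$SPARK_HOME/conf/spark-defaults.conf"`.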
From @若澤大資料
Source: ITPUB blog, http://blog.itpub.net/31511218/viewspace-2151948/. If you repost this article, please credit the source.