Using Scala and Java UDFs in Spark SQL

Posted by 破棉襖 on 2016-05-04

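Java:

// Register the UDF (this line is Scala syntax; StringType comes from org.apache.spark.sql.types)
// From Java the equivalent call is: sqlContext.udf().register("getImei", new GetImei(), DataTypes.StringType);

sqlContext.udf.register("getImei", new GetImei, StringType)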


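    import org.apache.spark.sql.api.java.UDF1;

    // UDF1<String, String>: argument type and return type
    public class GetImei implements UDF1<String, String> {

        @Override
        public String call(String jsonStr) {
            try {
                if (jsonStr.contains("imei")) {            // Android record
                    int startNum = jsonStr.indexOf("imei");
                    // assumes the key is followed by ":" so the value starts 7 characters in; an IMEI is 15 characters
                    return jsonStr.substring(startNum + 7, startNum + 22);
                } else if (jsonStr.contains("idfv")) {     // iOS record
                    int startNum = jsonStr.indexOf("idfv");
                    // same 7-character skip; an IDFV is 36 characters
                    return jsonStr.substring(startNum + 7, startNum + 43);
                } else {                                   // no value; has to be traced back from earlier data
                    return "None";
                }
            } catch (Exception e) {
                // null or truncated input ends up here
                return "None";
            }
        }
    }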
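The offsets +7/+22 and +7/+43 in the UDF above are easier to follow on a concrete string. The following is a minimal plain-Java sketch (no Spark dependency) that repeats the same logic in a static method. The class name `ImeiOffsets` and the sample records are hypothetical, and the records are assumed to be compact JSON with no whitespace around the colon.

```java
// Plain-Java sketch of the offset arithmetic used by GetImei; no Spark needed.
class ImeiOffsets {

    // Same logic as GetImei.call, as a static method so it can run standalone.
    static String extract(String jsonStr) {
        try {
            if (jsonStr.contains("imei")) {
                int startNum = jsonStr.indexOf("imei");
                // imei":" is 7 characters, then a 15-character IMEI
                return jsonStr.substring(startNum + 7, startNum + 22);
            } else if (jsonStr.contains("idfv")) {
                int startNum = jsonStr.indexOf("idfv");
                // idfv":" is 7 characters, then a 36-character IDFV
                return jsonStr.substring(startNum + 7, startNum + 43);
            } else {
                return "None";
            }
        } catch (Exception e) {
            return "None";
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample records (assumed compact JSON, no spaces).
        String android = "{\"imei\":\"123456789012345\",\"os\":\"android\"}";
        String ios = "{\"idfv\":\"ABCDEF12-3456-7890-ABCD-EF1234567890\"}";

        System.out.println(extract(android));               // 123456789012345
        System.out.println(extract(ios));                   // ABCDEF12-3456-7890-ABCD-EF1234567890
        System.out.println(extract("{}"));                  // None
        System.out.println(extract("{\"imei\":\"123\"}"));  // None (substring out of range is caught)
        System.out.println(extract(null));                  // None (NullPointerException is caught)
    }
}
```

Because the positions are fixed, a record written as `"imei": "..."` (with a space) would shift the result by one character, so this approach only fits data whose layout is known in advance.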

Scala:

    // One-argument UDF: length of a string
    def getLen(text: String): Int = text.length()
    sqlContext.udf.register("getLen", getLen _)
    sqlContext.sql("select getLen(appId) from wzx limit 10").collect().foreach(println)

    // Two-argument UDF: is the string longer than len?
    def getLenBool(text: String, len: Integer): Boolean = text.length() > len
    sqlContext.udf.register("getLenBool", getLenBool _)
    sqlContext.sql("select getLenBool(appId,10) from wzx limit 10").collect().foreach(println)
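One thing to watch with the two functions above: a SQL NULL in a string column reaches the function as `null`, so `text.length()` would throw a NullPointerException. The following is a minimal plain-Java sketch (no Spark dependency) of null-safe versions of the same logic. The class name `LenUdfs` is hypothetical, and the choice to return `null` or `false` for a NULL input is an assumption, not something the original post specifies.

```java
// Plain-Java sketch of null-safe counterparts to getLen / getLenBool.
class LenUdfs {

    // Length of the string; a null input yields null instead of throwing.
    static Integer getLen(String text) {
        return text == null ? null : Integer.valueOf(text.length());
    }

    // True only when the string is present and longer than len (assumed policy).
    static boolean getLenBool(String text, int len) {
        return text != null && text.length() > len;
    }

    public static void main(String[] args) {
        System.out.println(getLen("com.example.app"));          // 15
        System.out.println(getLen(null));                       // null
        System.out.println(getLenBool("com.example.app", 10));  // true
        System.out.println(getLenBool("short", 10));            // false
        System.out.println(getLenBool(null, 10));               // false
    }
}
```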

A custom timestamp conversion function, similar to FROM_UNIXTIME

    import java.text.SimpleDateFormat
    import java.util.Date
    import java.util.TimeZone

    // seal: Unix timestamp in seconds
    def getTimeBySeal(seal: Long): String = {
        val format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
        format.setTimeZone(TimeZone.getTimeZone("GMT+8"))   // output is fixed to GMT+8
        val time = new Date(seal * 1000L)                    // Date expects milliseconds
        format.format(time)
    }

    sqlContext.udf.register("convertTime", getTimeBySeal _)
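To check what the conversion above returns for a known timestamp, here is a minimal plain-Java sketch of the same calculation. It swaps `SimpleDateFormat` for `java.time`, which is not what the original post uses; the class name `TimeConvert` is hypothetical, and the input is assumed to be in seconds, as in the Scala version.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Plain-Java sketch: Unix seconds -> "yyyy-MM-dd HH:mm:ss" at a fixed +08:00 offset.
class TimeConvert {

    // DateTimeFormatter is immutable, so one shared instance is safe.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static String getTimeBySeal(long seal) {
        return Instant.ofEpochSecond(seal)       // seconds since the epoch
                .atOffset(ZoneOffset.ofHours(8)) // same fixed offset as GMT+8
                .format(FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(getTimeBySeal(0L));          // 1970-01-01 08:00:00
        System.out.println(getTimeBySeal(1462320000L)); // 2016-05-04 08:00:00
    }
}
```

Unlike FROM_UNIXTIME, which formats in the session or system time zone, both this sketch and the Scala function always produce GMT+8 output regardless of where the job runs.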


From the ITPUB blog, link: http://blog.itpub.net/29754888/viewspace-2093226/. If you repost this article, please credit the source; otherwise legal action may be taken.
