Apache Calcite 動態資料管理框架整合 csv 實戰筆記

發表於2024-03-01

序言

我們在 Apache Calcite 動態資料管理框架介紹 介紹了 calcite 的基本功能,本文一起來看一下如何實現一個 csv 的 sql 查詢。

入門例子

依賴

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>calcite-learn</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>pom</packaging>
    <modules>
        <module>calcite-learn-basic</module>
    </modules>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <calcite.version>1.20.0</calcite.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.calcite</groupId>
            <artifactId>calcite-core</artifactId>
            <version>${calcite.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.calcite</groupId>
            <artifactId>calcite-example-csv</artifactId>
            <version>${calcite.version}</version>
        </dependency>
        <!-- Add other dependencies, e.g., database driver -->
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.1</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

測試 csv

建立資料夾:

D:\github\calcite-learn\calcite-learn-basic\src\main\resources\csv

下面防對應的測試 csv 檔案:

  • depts.csv
EMPNO:long,NAME:string,DEPTNO:int,GENDER:string,CITY:string,EMPID:int,AGE:int,SLACKER:boolean,MANAGER:boolean,JOINEDAT:date
100,"Fred",10,,,30,25,true,false,"1996-08-03"
110,"Eric",20,"M","San Francisco",3,80,,false,"2001-01-01"
110,"John",40,"M","Vancouver",2,,false,true,"2002-05-03"
120,"Wilma",20,"F",,1,5,,true,"2005-09-07"
130,"Alice",40,"F","Vancouver",2,,false,true,"2007-01-01"

測試類

package com.github.houbb.calcite.learn.basic;

import org.apache.calcite.adapter.csv.CsvSchema;
import org.apache.calcite.adapter.csv.CsvTable;
import org.apache.calcite.jdbc.CalciteConnection;
import org.apache.calcite.schema.SchemaPlus;

import java.io.File;
import java.sql.*;
import java.util.Properties;

public class CsvDemo {

    public static void main(String[] args) throws Exception {
        // 0.獲取csv檔案的路徑,注意獲取到檔案所在上層路徑就可以了
        String path = "D:\\github\\calcite-learn\\calcite-learn-basic\\src\\main\\resources\\csv\\";

        // 1.構建CsvSchema物件,在Calcite中,不同資料來源對應不同Schema,比如CsvSchema、DruidSchema、ElasticsearchSchema等
        CsvSchema csvSchema = new CsvSchema(new File(path), CsvTable.Flavor.SCANNABLE);

        // 2.構建Connection
        // 2.1 設定連線引數
        Properties info = new Properties();
        // 不區分sql大小寫
        info.setProperty("caseSensitive", "false");
        // 2.2 獲取標準的JDBC Connection
        Connection connection = DriverManager.getConnection("jdbc:calcite:", info);
        // 2.3 獲取Calcite封裝的Connection
        CalciteConnection calciteConnection = connection.unwrap(CalciteConnection.class);

        // 3.構建RootSchema,在Calcite中,RootSchema是所有資料來源schema的parent,多個不同資料來源schema可以掛在同一個RootSchema下
        // 以實現查詢不同資料來源的目的
        SchemaPlus rootSchema = calciteConnection.getRootSchema();

        // 4.將不同資料來源schema掛載到RootSchema,這裡新增CsvSchema
        rootSchema.add("csv", csvSchema);

        // 5.執行SQL查詢,透過SQL方式訪問csv檔案
        String sql = "select * from csv.depts";
        Statement statement = calciteConnection.createStatement();
        ResultSet resultSet = statement.executeQuery(sql);

        // 6.遍歷列印查詢結果集
        printResultSet(resultSet);
    }

    public static void printResultSet(ResultSet resultSet) throws SQLException {
        // 獲取 ResultSet 後設資料
        ResultSetMetaData metaData = resultSet.getMetaData();

        // 獲取列數
        int columnCount = metaData.getColumnCount();
        System.out.println("Number of columns: " + columnCount);

        // 遍歷 ResultSet 並列印結果
        while (resultSet.next()) {
            // 遍歷每一列並列印
            for (int i = 1; i <= columnCount; i++) {
                String columnName = metaData.getColumnName(i);
                String columnValue = resultSet.getString(i);
                System.out.println(columnName + ": " + columnValue);
            }
            System.out.println(); // 換行
        }
    }

}

測試效果

Number of columns: 10
EMPNO: 100
NAME: Fred
DEPTNO: 10
GENDER: 
CITY: 
EMPID: 30
AGE: 25
SLACKER: true
MANAGER: false
JOINEDAT: 1996-08-03

EMPNO: 110
NAME: Eric
DEPTNO: 20
GENDER: M
CITY: San Francisco
EMPID: 3
AGE: 80
SLACKER: null
MANAGER: false
JOINEDAT: 2001-01-01

EMPNO: 110
NAME: John
DEPTNO: 40
GENDER: M
CITY: Vancouver
EMPID: 2
AGE: null
SLACKER: false
MANAGER: true
JOINEDAT: 2002-05-03

EMPNO: 120
NAME: Wilma
DEPTNO: 20
GENDER: F
CITY: 
EMPID: 1
AGE: 5
SLACKER: null
MANAGER: true
JOINEDAT: 2005-09-07

EMPNO: 130
NAME: Alice
DEPTNO: 40
GENDER: F
CITY: Vancouver
EMPID: 2
AGE: null
SLACKER: false
MANAGER: true
JOINEDAT: 2007-01-01

參考資料

Apache Calcite 快速入門指南

Apache Calcite精簡入門與學習指導

本文由部落格一文多發平臺 OpenWrite 釋出!

相關文章