Understanding multipart/form-data in the HTTP Protocol

Posted by throwable on 2021-12-28

Background

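While writing a general-purpose HTTP component, I ran into the problem of encapsulating the media type multipart/form-data. This article gives a brief introduction to the definition, usage, and a simple implementation of the multipart/form-data media type in the HTTP protocol.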

Definition of multipart/form-data

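The media type multipart/form-data follows the multipart MIME data stream definition (see Section 5.1 of RFC 2046). In short, a multipart/form-data body consists of several parts, and these parts are separated by a fixed boundary value (Boundary).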

Layout of a multipart/form-data request body

The layout of a multipart/form-data request body is as follows:

# Request header - required: Content-Type must be multipart/form-data and must specify a unique boundary value
Content-Type: multipart/form-data; boundary=${Boundary}

# Request body
--${Boundary}
Content-Disposition: form-data; name="name of file"
Content-Type: application/octet-stream

bytes of file
--${Boundary}
Content-Disposition: form-data; name="name of pdf"; filename="pdf-file.pdf"
Content-Type: application/octet-stream

bytes of pdf file
--${Boundary}
Content-Disposition: form-data; name="key"
Content-Type: text/plain;charset=UTF-8

text encoded in UTF-8
--${Boundary}--

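Compared with other media types such as application/x-www-form-urlencoded, the most obvious differences of multipart/form-data are:

  • Besides being set to multipart/form-data, the Content-Type request header must also define the boundary parameter
  • The request body is made up of multiple parts, and the delimiter --${Boundary}, built from the boundary parameter value, separates the individual parts
  • Every part must carry the header Content-Disposition: form-data; name="${PART_NAME}", where ${PART_NAME} needs to be URL-encoded. The filename field may also be used to indicate the name of the file, but it is less binding than the name attribute (because there is no guarantee that the local file is available or that its name is unambiguous)
  • Each part can define its own Content-Type and its own data body
  • The request body ends with the closing delimiter --${Boundary}--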
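
The layout rules above can be sketched in a few lines of Java. This is only a minimal illustration, not the implementation used later in this article: the class name MultipartLayoutSketch, the boundary demoBoundary, and the escapeName helper are made up for the example. The escaping shown (CR, LF, and the double quote replaced by %0D, %0A, and %22) follows what the HTML form-submission algorithm does for field names, and it is assumed here to be sufficient.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: lays out text fields as a multipart/form-data body.
final class MultipartLayoutSketch {

    private static final String CRLF = "\r\n";

    // Assumption: escape only CR, LF and the double quote, as HTML form submission does.
    static String escapeName(String name) {
        return name.replace("\r", "%0D").replace("\n", "%0A").replace("\"", "%22");
    }

    // Builds the body bytes: one part per field, then the closing delimiter.
    static byte[] build(String boundary, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Delimiter line that opens the part
            sb.append("--").append(boundary).append(CRLF);
            // Part headers
            sb.append("Content-Disposition: form-data; name=\"")
                    .append(escapeName(field.getKey())).append("\"").append(CRLF);
            sb.append("Content-Type: text/plain;charset=UTF-8").append(CRLF);
            // Empty line between the part headers and the part body
            sb.append(CRLF);
            sb.append(field.getValue()).append(CRLF);
        }
        // Closing delimiter: --boundary--
        sb.append("--").append(boundary).append("--").append(CRLF);
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("key", "text encoded in UTF-8");
        System.out.print(new String(build("demoBoundary", fields), StandardCharsets.UTF_8));
    }
}
```

The matching request header for this body would be Content-Type: multipart/form-data; boundary=demoBoundary.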

{% note warning flat %}
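RFC 7578 mentions two deprecated ways of using multipart/form-data. The first is the Content-Transfer-Encoding header, whose usage is not covered here. The second is transmitting multiple binary files for a single form field (one "name" mapped to several binary files) through a nested multipart/mixed part; instead, each file should be sent as a separate part, with all of those parts using the same name.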
{% endnote %}

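Special cases:

  • If the content of a part is text, its Content-Type is text/plain and a character set can be specified, for example Content-Type: text/plain;charset=UTF-8
  • A default character set can be specified through the _charset_ field, used as follows:
--ABCDE
Content-Disposition: form-data; name="_charset_"

UTF-8
--ABCDE
Content-Disposition: form-data; name="field"

...text encoded in UTF-8...
--ABCDE--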
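
On the receiving side, the two charset rules above boil down to a small lookup. The following is a rough sketch under stated assumptions: the class PartCharsetSketch and its resolve method are hypothetical names, and falling back to UTF-8 when neither a charset parameter nor a _charset_ field is present is a choice made for this example, not something the text above prescribes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: decide which charset to use when decoding one text part.
final class PartCharsetSketch {

    // partContentType: the part's Content-Type header value, or null if absent.
    // formCharset: the value of the _charset_ field, or null if the form has none.
    // Note: Charset.forName throws an unchecked exception for unknown charset names.
    static Charset resolve(String partContentType, String formCharset) {
        if (partContentType != null) {
            for (String param : partContentType.split(";")) {
                String trimmed = param.trim();
                // Case-insensitive match on the "charset=" parameter
                if (trimmed.regionMatches(true, 0, "charset=", 0, 8)) {
                    return Charset.forName(stripQuotes(trimmed.substring(8).trim()));
                }
            }
        }
        if (formCharset != null && !formCharset.trim().isEmpty()) {
            return Charset.forName(formCharset.trim());
        }
        // Assumption for this sketch: default to UTF-8
        return StandardCharsets.UTF_8;
    }

    private static String stripQuotes(String value) {
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/plain;charset=UTF-8", null));
        System.out.println(resolve("text/plain", "ISO-8859-1"));
    }
}
```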

Rules for the boundary parameter value

The boundary parameter value is subject to the following rules:

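  • Every delimiter line in the body must start with two hyphens -- followed by the boundary value; this -- is called the leading hyphens and is not part of the boundary parameter value itself
  • The boundary value, not counting the leading hyphens, must not exceed 70 characters
  • The boundary value should avoid characters that carry special meaning in HTTP headers or URLs, such as the colon :; RFC 2046 permits a colon only if the value is quoted in the Content-Type header
  • Each --${Boundary} must be preceded by a CRLF, and that CRLF belongs to the delimiter rather than to the preceding part. If the body of a text part itself ends with a CRLF, two CRLFs must explicitly appear in the binary form of the request body; if the part body does not end with a CRLF, a single CRLF is enough. These two cases are called the explicit and the implicit form of the delimiter. This is rather abstract, so see the example below: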
# Request header
Content-type: multipart/form-data; boundary="abcdefg"

--abcdefg
Content-Disposition: form-data; name="x"
Content-type: text/plain; charset=ascii

It does NOT end with a linebreak # <=== no CRLF here, implicit form
--abcdefg
Content-Disposition: form-data; name="y"
Content-type: text/plain; charset=ascii

It DOES end with a linebreak # <=== there is a CRLF here, explicit form

--abcdefg--

## The implicit form of the CRLF, spelled out
It does NOT end with a linebreak CRLF --abcdefg

## The explicit form of the CRLF, spelled out
It DOES end with a linebreak CRLF CRLF --abcdefg--
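
To make the rules and the CRLF example more concrete, here is a small Java sketch. It assumes the boundary grammar of RFC 2046 Section 5.1.1 (1 to 70 characters from a limited set, not ending with a space). The class BoundaryRulesSketch and its methods are hypothetical names, and partBodies is a deliberately naive splitter for illustration, not a full multipart parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class BoundaryRulesSketch {

    // Characters allowed in a boundary besides the space (bcharsnospace in RFC 2046)
    private static final String BCHARS_NO_SPACE =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'()+_,-./:=?";
    // Boundary characters that are not valid in an unquoted header parameter value
    private static final String NEEDS_QUOTES = "(),/:=? ";

    static boolean isValidBoundary(String value) {
        if (value == null || value.isEmpty() || value.length() > 70) return false;
        if (value.charAt(value.length() - 1) == ' ') return false;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c != ' ' && BCHARS_NO_SPACE.indexOf(c) < 0) return false;
        }
        return true;
    }

    // True if the value has to be written as boundary="..." in the Content-Type header
    static boolean needsQuoting(String value) {
        for (int i = 0; i < value.length(); i++) {
            if (NEEDS_QUOTES.indexOf(value.charAt(i)) >= 0) return true;
        }
        return false;
    }

    // Splits on CRLF + "--" + boundary, so the CRLF before a delimiter is NOT part of the body.
    static List<String> partBodies(String body, String boundary) {
        String delimiter = "\r\n--" + boundary;
        // Prepend a CRLF so that the very first delimiter line is matched as well
        String[] segments = ("\r\n" + body).split(Pattern.quote(delimiter), -1);
        List<String> bodies = new ArrayList<>();
        for (int i = 1; i < segments.length; i++) {
            if (segments[i].startsWith("--")) break; // closing delimiter reached
            int headerEnd = segments[i].indexOf("\r\n\r\n");
            if (headerEnd >= 0) bodies.add(segments[i].substring(headerEnd + 4));
        }
        return bodies;
    }

    public static void main(String[] args) {
        String body = "--abcdefg\r\n"
                + "Content-Disposition: form-data; name=\"x\"\r\n\r\n"
                + "It does NOT end with a linebreak\r\n"
                + "--abcdefg\r\n"
                + "Content-Disposition: form-data; name=\"y\"\r\n\r\n"
                + "It DOES end with a linebreak\r\n\r\n"
                + "--abcdefg--\r\n";
        for (String part : partBodies(body, "abcdefg")) {
            System.out.println("[" + part.replace("\r\n", "<CRLF>") + "]");
        }
    }
}
```

With the sample body in main, the first part comes back without a trailing CRLF and the second part keeps exactly one, which is the implicit versus explicit distinction described above.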

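Implementing a POST request with the multipart/form-data media type

Here an HTTP client for multipart/form-data POST requests is written only for HttpURLConnection, available in older JDK versions, and for the HttpClient built into newer JDK versions; other approaches, such as a custom Socket implementation, can follow a similar line of thought. First, bring in org.springframework.boot:spring-boot-starter-web:2.6.0 and write a simple controller method: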

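import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartHttpServletRequest;

@RestController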
public class TestController {

    @PostMapping(path = "/test")
    public ResponseEntity<?> test(MultipartHttpServletRequest request) {
        return ResponseEntity.ok("ok");
    }
}

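The request as simulated in Postman:

[Figure: simulated request in Postman]

The request parameters received by the backend controller:

[Figure: request parameters received by the backend controller]

The clients written below can call this endpoint directly for debugging.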

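A module that converts the request body into a byte container

All boundaries here use the explicit form, and the boundary value is simply generated from a fixed prefix plus a UUID. The simple implementation makes a few simplifications:

  • Only text form data and binary (file) form data are considered
  • Following from the previous point, every part explicitly specifies the Content-Type header
  • Text encoding is fixed to UTF-8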
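
The "fixed prefix plus UUID" idea can be sketched on its own as follows. The prefix FormBoundary and the class BoundarySketch are made-up names for this example. The extra check that the candidate does not occur inside any payload is an optional safeguard added here as an assumption; a UUID-based boundary is very unlikely to collide in practice.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class BoundarySketch {

    // Hypothetical fixed prefix: 12 characters + 32 hex digits stays below the 70-character limit
    private static final String PREFIX = "FormBoundary";

    // Generates a boundary and retries if it happens to occur in one of the payloads.
    static String newBoundary(byte[]... payloads) {
        while (true) {
            String candidate = PREFIX + UUID.randomUUID().toString().replace("-", "");
            byte[] needle = candidate.getBytes(StandardCharsets.US_ASCII);
            boolean clash = false;
            for (byte[] payload : payloads) {
                if (indexOf(payload, needle) >= 0) {
                    clash = true;
                    break;
                }
            }
            if (!clash) return candidate;
        }
    }

    // Naive byte search: index of the first occurrence of needle in haystack, or -1.
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] payload = "some file content".getBytes(StandardCharsets.UTF_8);
        System.out.println(newBoundary(payload));
    }
}
```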

Write a MultipartWriter:

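import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;
import java.util.function.BiConsumer;

import lombok.RequiredArgsConstructor;

public class MultipartWriter {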

    private static final Charset DEFAULT_CHARSET = StandardCharsets.UTF_8;
    private static final byte[] FIELD_SEP = ": ".getBytes(StandardCharsets.ISO_8859_1);
    private static final byte[] CR_LF = "\r\n".getBytes(StandardCharsets.ISO_8859_1);
    private static final String TWO_HYPHENS_TEXT = "--";
    private static final byte[] TWO_HYPHENS = TWO_HYPHENS_TEXT.getBytes(StandardCharsets.ISO_8859_1);
    private static final String CONTENT_DISPOSITION_KEY = "Content-Disposition";
    private static final String CONTENT_TYPE_KEY = "Content-Type";
    private static final String DEFAULT_CONTENT_TYPE = "multipart/form-data; boundary=";
    private static final String DEFAULT_BINARY_CONTENT_TYPE = "application/octet-stream";
    private static final String DEFAULT_TEXT_CONTENT_TYPE = "text/plain;charset=UTF-8";
    private static final String DEFAULT_CONTENT_DISPOSITION_VALUE = "form-data; name=\"%s\"";
    private static final String FILE_CONTENT_DISPOSITION_VALUE = "form-data; name=\"%s\"; filename=\"%s\"";

    private final Map<String, String> headers = new HashMap<>(8);
    private final List<AbstractMultipartPart> parts = new ArrayList<>();
    private final String boundary;

    private MultipartWriter(String boundary) {
        this.boundary = Objects.isNull(boundary) ? TWO_HYPHENS_TEXT +
                UUID.randomUUID().toString().replace("-", "") : boundary;
        this.headers.put(CONTENT_TYPE_KEY, DEFAULT_CONTENT_TYPE + this.boundary);
    }

    public static MultipartWriter newMultipartWriter(String boundary) {
        return new MultipartWriter(boundary);
    }

    public static MultipartWriter newMultipartWriter() {
        return new MultipartWriter(null);
    }

    public MultipartWriter addHeader(String key, String value) {
        if (!CONTENT_TYPE_KEY.equalsIgnoreCase(key)) {
            headers.put(key, value);
        }
        return this;
    }

    public MultipartWriter addTextPart(String name, String text) {
        parts.add(new TextPart(String.format(DEFAULT_CONTENT_DISPOSITION_VALUE, name), DEFAULT_TEXT_CONTENT_TYPE, this.boundary, text));
        return this;
    }

    public MultipartWriter addBinaryPart(String name, byte[] bytes) {
        parts.add(new BinaryPart(String.format(DEFAULT_CONTENT_DISPOSITION_VALUE, name), DEFAULT_BINARY_CONTENT_TYPE, this.boundary, bytes));
        return this;
    }

    public MultipartWriter addFilePart(String name, File file) {
        parts.add(new FilePart(String.format(FILE_CONTENT_DISPOSITION_VALUE, name, file.getName()), DEFAULT_BINARY_CONTENT_TYPE, this.boundary, file));
        return this;
    }

    private static void writeHeader(String key, String value, OutputStream out) throws IOException {
        writeBytes(key, out);
        writeBytes(FIELD_SEP, out);
        writeBytes(value, out);
        writeBytes(CR_LF, out);
    }

    private static void writeBytes(String text, OutputStream out) throws IOException {
        out.write(text.getBytes(DEFAULT_CHARSET));
    }

    private static void writeBytes(byte[] bytes, OutputStream out) throws IOException {
        out.write(bytes);
    }

    interface MultipartPart {

        void writeBody(OutputStream os) throws IOException;
    }

    @RequiredArgsConstructor
    public static abstract class AbstractMultipartPart implements MultipartPart {

        protected final String contentDispositionValue;
        protected final String contentTypeValue;
        protected final String boundary;

        protected String getContentDispositionValue() {
            return contentDispositionValue;
        }

        protected String getContentTypeValue() {
            return contentTypeValue;
        }

        protected String getBoundary() {
            return boundary;
        }

        public final void write(OutputStream out) throws IOException {
            writeBytes(TWO_HYPHENS, out);
            writeBytes(getBoundary(), out);
            writeBytes(CR_LF, out);
            writeHeader(CONTENT_DISPOSITION_KEY, getContentDispositionValue(), out);
            writeHeader(CONTENT_TYPE_KEY, getContentTypeValue(), out);
            writeBytes(CR_LF, out);
            writeBody(out);
            writeBytes(CR_LF, out);
        }
    }

    public static class TextPart extends AbstractMultipartPart {

        private final String text;

        public TextPart(String contentDispositionValue,
                        String contentTypeValue,
                        String boundary,
                        String text) {
            super(contentDispositionValue, contentTypeValue, boundary);
            this.text = text;
        }

        @Override
        public void writeBody(OutputStream os) throws IOException {
            os.write(text.getBytes(DEFAULT_CHARSET));
        }
    }

    public static class BinaryPart extends AbstractMultipartPart {

        private final byte[] content;

        public BinaryPart(String contentDispositionValue,
                          String contentTypeValue,
                          String boundary,
                          byte[] content) {
            super(contentDispositionValue, contentTypeValue, boundary);
            this.content = content;
        }

        @Override
        public void writeBody(OutputStream out) throws IOException {
            out.write(content);
        }
    }

    public static class FilePart extends AbstractMultipartPart {

        private final File file;

        public FilePart(String contentDispositionValue,
                        String contentTypeValue,
                        String boundary,
                        File file) {
            super(contentDispositionValue, contentTypeValue, boundary);
            this.file = file;
        }

        @Override
        public void writeBody(OutputStream out) throws IOException {
            try (InputStream in = new FileInputStream(file)) {
                final byte[] buffer = new byte[4096];
                int l;
                while ((l = in.read(buffer)) != -1) {
                    out.write(buffer, 0, l);
                }
                out.flush();
            }
        }
    }

    public void forEachHeader(BiConsumer<String, String> consumer) {
        headers.forEach(consumer);
    }

    public void write(OutputStream out) throws IOException {
        if (!parts.isEmpty()) {
            for (AbstractMultipartPart part : parts) {
                part.write(out);
            }
        }
        writeBytes(TWO_HYPHENS, out);
        writeBytes(this.boundary, out);
        writeBytes(TWO_HYPHENS, out);
        writeBytes(CR_LF, out);
    }
}

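This class already wraps three different kinds of part implementations. The forEachHeader() method iterates over the request headers, and the final write() method writes the request body to an OutputStream.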

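HttpURLConnection implementation

The implementation is as follows (a bare-minimum version that does not consider fault tolerance or exception handling):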

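import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.File;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

public class HttpURLConnectionApp {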

    private static final String URL = "http://localhost:9099/test";

    public static void main(String[] args) throws Exception {
        MultipartWriter writer = MultipartWriter.newMultipartWriter();
        writer.addTextPart("name", "throwable")
                .addTextPart("domain", "vlts.cn")
                .addFilePart("ico", new File("I:\\doge_favicon.ico"));
        DataOutputStream requestPrinter = new DataOutputStream(System.out);
        writer.write(requestPrinter);
        HttpURLConnection connection = (HttpURLConnection) new java.net.URL(URL).openConnection();
        connection.setRequestMethod("POST");
        connection.addRequestProperty("Connection", "Keep-Alive");
        // Set the request headers
        writer.forEachHeader(connection::addRequestProperty);
        connection.setDoInput(true);
        connection.setDoOutput(true);
        connection.setConnectTimeout(10000);
        connection.setReadTimeout(10000);
        DataOutputStream out = new DataOutputStream(connection.getOutputStream());
        // Write the request body
        writer.write(out);
        StringBuilder builder = new StringBuilder();
        BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8));
        String line;
        while (Objects.nonNull(line = reader.readLine())) {
            builder.append(line);
        }
        int responseCode = connection.getResponseCode();
        reader.close();
        out.close();
        connection.disconnect();
        System.out.printf("Response code: %d, response body: %s\n", responseCode, builder);
    }
}

Response from a run:

Response code: 200, response body: ok
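
Since the version above skips fault tolerance, here is one possible way to tighten it up, offered as a sketch rather than a definitive implementation. It takes an already encoded body (for example the bytes produced by a writer like the one above), closes streams with try-with-resources, uses fixed-length streaming, and reads the error stream for 4xx/5xx responses. The class SafeMultipartPost, its post method, and the boundary demo are hypothetical names; the main method assumes the /test controller from this article is listening on port 9099.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class SafeMultipartPost {

    // Posts an already encoded multipart body and returns "<status>:<response text>".
    static String post(String url, String boundary, byte[] body) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            connection.setRequestMethod("POST");
            connection.setDoOutput(true);
            connection.setConnectTimeout(10_000);
            connection.setReadTimeout(10_000);
            connection.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);
            // The length is known up front, so the connection does not need to buffer the body
            connection.setFixedLengthStreamingMode(body.length);
            try (OutputStream out = connection.getOutputStream()) {
                out.write(body);
            }
            int status = connection.getResponseCode();
            // For 4xx/5xx getInputStream() throws; the payload (if any) is on the error stream
            InputStream stream = status >= 400 ? connection.getErrorStream() : connection.getInputStream();
            return status + ":" + readAll(stream);
        } finally {
            connection.disconnect();
        }
    }

    private static String readAll(InputStream in) throws IOException {
        if (in == null) return "";
        try (InputStream stream = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int read;
            while ((read = stream.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] body = ("--demo\r\n"
                + "Content-Disposition: form-data; name=\"name\"\r\n"
                + "Content-Type: text/plain;charset=UTF-8\r\n"
                + "\r\n"
                + "throwable\r\n"
                + "--demo--\r\n").getBytes(StandardCharsets.UTF_8);
        // Assumes the /test endpoint from this article is running locally on port 9099
        System.out.println(post("http://localhost:9099/test", "demo", body));
    }
}
```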

The request body can be printed with two extra lines of code (the last two lines below):

MultipartWriter writer = MultipartWriter.newMultipartWriter();
writer.addTextPart("name", "throwable")
        .addTextPart("domain", "vlts.cn")
        .addFilePart("ico", new File("I:\\doge_favicon.ico"));
DataOutputStream requestPrinter = new DataOutputStream(System.out);
writer.write(requestPrinter);

The console output is as follows:

[Figure: console output of the printed request body]

Implementation with the JDK built-in HttpClient

JDK 11+ ships a built-in HTTP client whose entry point is java.net.http.HttpClient. The implementation is as follows:

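import java.io.ByteArrayOutputStream;
import java.io.File;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.temporal.ChronoUnit;

public class HttpClientApp {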

    private static final String URL = "http://localhost:9099/test";

    public static void main(String[] args) throws Exception {
        HttpClient httpClient = HttpClient.newBuilder()
                .connectTimeout(Duration.of(10, ChronoUnit.SECONDS))
                .build();
        MultipartWriter writer = MultipartWriter.newMultipartWriter();
        writer.addTextPart("name", "throwable")
                .addTextPart("domain", "vlts.cn")
                .addFilePart("ico", new File("I:\\doge_favicon.ico"));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writer.write(out);
        HttpRequest.Builder requestBuilder = HttpRequest.newBuilder();
        writer.forEachHeader(requestBuilder::header);
        HttpRequest request = requestBuilder.uri(URI.create(URL))
                .method("POST", HttpRequest.BodyPublishers.ofByteArray(out.toByteArray()))
                .build();
        HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.printf("Response code: %d, response body: %s\n", response.statusCode(), response.body());
    }
}

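The built-in HTTP client is built almost entirely on a reactive programming model, and the API it exposes is relatively low level: flexible, but not particularly easy to use.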
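
To give a flavor of the asynchronous side of that API, the following sketch sends a multipart body with sendAsync and maps the response through a CompletableFuture. The class AsyncMultipartSketch, its methods, and the boundary demo are hypothetical names for this example, and the body is assumed to be encoded already; main again assumes the /test endpoint on port 9099.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

final class AsyncMultipartSketch {

    // Builds a POST request around an already encoded multipart body.
    static HttpRequest buildRequest(String url, String boundary, byte[] body) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    // Sends without blocking; the future completes with "<status>:<response text>".
    static CompletableFuture<String> sendAsync(HttpClient client, HttpRequest request) {
        return client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(response -> response.statusCode() + ":" + response.body());
    }

    public static void main(String[] args) {
        byte[] body = ("--demo\r\n"
                + "Content-Disposition: form-data; name=\"name\"\r\n"
                + "Content-Type: text/plain;charset=UTF-8\r\n"
                + "\r\n"
                + "throwable\r\n"
                + "--demo--\r\n").getBytes(StandardCharsets.UTF_8);
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        // Assumes the /test endpoint from this article is running locally on port 9099
        HttpRequest request = buildRequest("http://localhost:9099/test", "demo", body);
        // join() blocks only here, at the very end, to print the result
        System.out.println(sendAsync(client, request).join());
    }
}
```

Blocking with join() at the end is only for the demo; in a real reactive pipeline the future would normally be composed further instead.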

Summary

The media type multipart/form-data is commonly used in HTTP requests with the POST method; it is comparatively rare in HTTP responses.

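(End of article, c-1-d e-a-20211226. After finishing, I noticed that extra leading hyphens had been added to the boundary; Postman's requests also add quite a few of them, so I did not bother to change it.)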
