What is htmlq?

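htmlq is like jq, but for HTML. It uses CSS selectors to extract parts of an HTML file. In CSS, selectors target the HTML elements on a web page that we want to style; htmlq reuses the same syntax to pick the elements to output. For example, we can use this tool to easily extract image URLs or other links.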

Installing htmlq

First install cargo on the system, then use cargo to install htmlq:

[root@localhost ~]# yum -y install cargo
[root@localhost ~]# cargo install htmlq

Setting the executable path

Make sure $HOME/.cargo/bin is added to the PATH variable with export, so that the installed binaries can be run:

[root@localhost ~]# echo 'export PATH="$PATH:$HOME/.cargo/bin"' >> ~/.bash_profile 
[root@localhost ~]# . ~/.bash_profile
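
Running the echo command above more than once appends the same line to ~/.bash_profile each time. As a minimal sketch, a small helper (the name add_to_path is hypothetical, not part of cargo or htmlq) can add the directory to PATH for the current shell only when it is missing:

```shell
# Hypothetical helper: append a directory to PATH only if it is not already there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, do nothing
        *) PATH="$PATH:$1" ;;     # append once
    esac
}

add_to_path "$HOME/.cargo/bin"
add_to_path "$HOME/.cargo/bin"    # second call is a no-op
export PATH

# Show where the shell finds htmlq, or print a note if it is not installed.
command -v htmlq || echo "htmlq not found on PATH"
```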

How do you use htmlq to extract content from an HTML file?

The general pattern with curl and htmlq looks like this:

curl -s url | htmlq '#css-selector'
curl -s url2 | htmlq '.css-selector'
curl -s url | htmlq --pretty '#content' | more
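
The same ID and class selectors can be tried without network access. The sketch below assumes a made-up sample page written to a temporary file, and reads it with the --filename option listed in the help output instead of piping from curl:

```shell
# Create a small sample page (contents are made up for illustration).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<html>
  <body>
    <div id="content">
      <h1 class="title">Hello htmlq</h1>
      <a href="/about">About</a>
      <a href="https://example.com/docs">Docs</a>
    </div>
  </body>
</html>
EOF

if command -v htmlq >/dev/null 2>&1; then
    # Select by ID, reading from the file instead of stdin.
    htmlq --filename "$sample" '#content'
    # Select by class and print only the text inside the matched element.
    htmlq --filename "$sample" --text '.title'
else
    echo "htmlq is not installed; skipping" >&2
fi
```

The temporary file is left in place so it can be reused; remove it with rm "$sample" when done.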

Let's find all the links on a page. For example:

[root@localhost ~]# curl -s url | htmlq --attribute href a
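
Because htmlq prints one attribute value per line, its output combines naturally with standard text tools. A minimal sketch, assuming an inline HTML snippet invented for illustration in place of a downloaded page, that lists each distinct link once:

```shell
# Illustrative HTML; in practice this would come from curl.
page='<ul>
  <li><a href="/about">About</a></li>
  <li><a href="https://example.com/docs">Docs</a></li>
  <li><a href="/about">About again</a></li>
</ul>'

if command -v htmlq >/dev/null 2>&1; then
    # Extract every href, then sort and drop duplicates.
    links=$(printf '%s\n' "$page" | htmlq --attribute href a | sort -u)
    printf '%s\n' "$links"
else
    links=""
    echo "htmlq is not installed; skipping" >&2
fi
```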

Pretty-print the HTML:

[root@localhost ~]# curl --silent url | htmlq --pretty '#posts'


Help page

Use the following command to view the help page:

[root@localhost ~]# htmlq --help
htmlq 0.3.0
Michael Maclean <michael@mgdm.net>
Runs CSS selectors on HTML
USAGE:
    htmlq [FLAGS] [OPTIONS] [selector]...
FLAGS:
    -B, --detect-base          Try to detect the base URL from the <base> tag in the document. If not found, default to
                               the value of --base, if supplied
    -h, --help                 Prints help information
    -w, --ignore-whitespace    When printing text nodes, ignore those that consist entirely of whitespace
    -p, --pretty               Pretty-print the serialised output
    -t, --text                 Output only the contents of text nodes inside selected elements
    -V, --version              Prints version information
OPTIONS:
    -a, --attribute <attribute>    Only return this attribute (if present) from selected elements
    -b, --base <base>              Use this URL as the base for links
    -f, --filename <FILE>          The input file. Defaults to stdin
    -o, --output <FILE>            The output file. Defaults to stdout
ARGS:
    <selector>...    The CSS expression to select [default: html]
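
Several of the flags listed above can be combined in one call. The sketch below assumes an illustrative menu snippet and temporary file names chosen for the example; it reads from a file, keeps only text nodes, skips whitespace-only nodes, and writes the result to a file:

```shell
# Sample input written to a temporary file (contents are illustrative only).
in_file=$(mktemp)
out_file=$(mktemp)
cat > "$in_file" <<'EOF'
<ul id="menu">
  <li>Home</li>
  <li>About</li>
</ul>
EOF

if command -v htmlq >/dev/null 2>&1; then
    # -f: read from a file, -t: text only, -w: drop whitespace-only text nodes,
    # -o: write the result to a file instead of stdout.
    htmlq -f "$in_file" -t -w -o "$out_file" '#menu li'
    cat "$out_file"
else
    echo "htmlq is not installed; skipping" >&2
fi
```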


Summary

htmlq brings sed- and grep-style processing to HTML data. We can use it to search, slice, and filter HTML with CSS selectors.
