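I've had some free time lately and plan to take my annual leave together with the Lunar New Year holiday, which comes to about 20 days at a conservative estimate. I'd like to use the time to travel, but I don't know where to go or what the weather will be like there. So I wrote a crawler that scrapes historical weather data. The goal is for it to automatically collect the weather a destination had on the same dates in previous years and store it in a database for detailed analysis.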


#include <iostream>
#include <string>
#include <curl/curl.h>

using namespace std;

// Constants
const char* url = "<weather forecast page URL>";            // placeholder: the real target page is not given in this post
const char* proxy_url = "jshk.com.cn/mb/reg.asp?kefu=xjy&";   // address for fetching proxy IPs (not used below)
const char* proxy_host = "duoip";                             // proxy host
const long proxy_port = 8000;                                 // proxy port (libcurl expects a long)
const char* username = "your_username";                       // proxy user name
const char* password = "your_password";                       // proxy password

// Struct holding the proxy credentials
struct proxy_auth {
    string user;
    string pass;
};

// Configure the proxy and its credentials on a curl handle.
// CURLOPT_PROXYUSERPWD expects ONE C string of the form "user:password"
// (not a curl_slist). libcurl copies the string, so a local std::string is fine.
void set_proxy_auth(CURL* curl, const proxy_auth& auth) {
    const string userpwd = auth.user + ":" + auth.pass;
    curl_easy_setopt(curl, CURLOPT_PROXY, proxy_host);
    curl_easy_setopt(curl, CURLOPT_PROXYPORT, proxy_port);
    curl_easy_setopt(curl, CURLOPT_PROXYAUTH, CURLAUTH_BASIC);
    curl_easy_setopt(curl, CURLOPT_PROXYUSERPWD, userpwd.c_str());
}

// Write callback: libcurl calls it for each chunk of the response body.
// It must be declared before main(), where it is registered.
size_t write_data(void* ptr, size_t size, size_t nmemb, void* userdata) {
    string* data = static_cast<string*>(userdata);
    data->append(static_cast<char*>(ptr), size * nmemb);
    return size * nmemb;  // any other value makes libcurl abort the transfer
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);

    proxy_auth auth = {username, password};  // proxy credentials
    string data;                             // buffer that receives the response body
    CURL* curl = curl_easy_init();           // initialize CURL

    if (curl) {
        set_proxy_auth(curl, auth);                                 // set proxy and credentials
        curl_easy_setopt(curl, CURLOPT_URL, url);                   // set the URL
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);         // follow redirects
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_data);  // register the callback
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &data);           // argument passed to the callback

        CURLcode res = curl_easy_perform(curl);                     // perform the request
        if (res != CURLE_OK) {
            cerr << "curl_easy_perform() failed: " << curl_easy_strerror(res) << endl;
        } else {
            cout << data << endl;  // raw page content; parsing is not implemented here
        }

        curl_easy_cleanup(curl);  // close CURL
    }

    curl_global_cleanup();
    return 0;
}

Code explanation:

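1. First, we define the URL of the page the crawler should fetch.

2. Then we define the proxy's host name and port, as well as the proxy user name and password.

3. Next, we define a struct to hold the proxy credentials.

4. Then we define a function that sets the proxy and its credentials on the curl handle.

5. In main, we first initialize CURL and then apply the proxy credentials.

6. Next, we set the URL and tell curl to follow redirects.

7. Then we register a callback function that handles the data returned by the request.

8. Finally, we perform the request, print an error message if it fails, and close CURL.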
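Step 7 is the easiest part to get wrong. libcurl may call the write callback several times, each time with one piece of the response body. If the callback returns any count other than `size * nmemb`, libcurl aborts the transfer with `CURLE_WRITE_ERROR`. The sketch below shows that contract without touching the network. `simulate_transfer` is a hypothetical helper that only mimics chunked delivery; it is not part of libcurl.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Same contract as a CURLOPT_WRITEFUNCTION callback: append size * nmemb
// bytes to the std::string passed via CURLOPT_WRITEDATA, return bytes handled.
std::size_t write_data(void* ptr, std::size_t size, std::size_t nmemb, void* userdata) {
    assert(userdata != nullptr);
    std::string* data = static_cast<std::string*>(userdata);
    data->append(static_cast<const char*>(ptr), size * nmemb);
    return size * nmemb;
}

// Hypothetical helper: feed `body` to the callback in pieces of `chunk` bytes,
// roughly the way libcurl delivers a response over several calls.
std::string simulate_transfer(const std::string& body, std::size_t chunk) {
    std::string data;
    for (std::size_t pos = 0; pos < body.size(); pos += chunk) {
        std::string piece = body.substr(pos, chunk);  // never empty inside this loop
        std::size_t handled = write_data(&piece[0], 1, piece.size(), &data);
        if (handled != piece.size()) break;  // libcurl would abort the transfer here
    }
    return data;
}
```

Because the callback only appends, the full page ends up in the buffer no matter how libcurl splits it. This is why the page has to be parsed after `curl_easy_perform` returns, not inside the callback.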

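Note: this program is only a simple example. A real crawler has to handle many more details, such as error handling, retries, and multithreading. The program also does not parse the page content; a real crawler needs to parse the page and extract the useful information.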
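The note lists retries as one of the missing pieces. Below is a minimal sketch of one common approach, exponential backoff. It assumes a failed request is worth repeating after a short wait. `retry_with_backoff` is a hypothetical helper, not a libcurl function, and the attempt count and delays are arbitrary.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: call `attempt` until it returns true or `max_attempts`
// is reached, doubling the wait after each failure (exponential backoff).
// Returns the attempt number that succeeded, or 0 if every attempt failed.
int retry_with_backoff(const std::function<bool()>& attempt,
                       int max_attempts,
                       std::chrono::milliseconds initial_delay) {
    assert(max_attempts > 0);
    std::chrono::milliseconds delay = initial_delay;
    for (int i = 1; i <= max_attempts; ++i) {
        if (attempt()) return i;
        if (i < max_attempts) {          // no pointless wait after the last try
            std::this_thread::sleep_for(delay);
            delay *= 2;
        }
    }
    return 0;
}
```

In the program above, the request could be wrapped roughly as `retry_with_backoff([&] { data.clear(); return curl_easy_perform(curl) == CURLE_OK; }, 3, std::chrono::milliseconds(500));`. Clearing `data` first keeps a half-received body from an earlier failed attempt out of the result.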

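Writing a good crawler is not easy, and getting one to run flawlessly is a real achievement. I'm only half-trained myself, but luckily I met a mentor at work: a highly skilled engineer who, when he had time, taught me things you won't learn elsewhere. I also like tagging along with him on product testing, and over time my crawling skills have kept improving. If anything about crawler proxy IPs is unclear, feel free to leave a comment below.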

Nothing better to do! Scraping weather forecasts with C++
