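Project repository: LeetCodeCrawler
Overview
Most of us practice on LeetCode to some degree and then push our solutions to GitHub, and I am no exception: my solutions live in my Algorithm repository. Re-editing README.md by hand after every solved problem is tedious, so I wrote a tool for it, LeetCodeCrawler. It crawls LeetCode problem statements and your accepted (AC) submissions, and it can generate a matching README.md file to tidy up the README of your LeetCode repository.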
Usage
Download LeetCodeCrawler.jar to your machine.
Create a config.json file like the one below (you can simply edit the config.json in the repo). config.json must be placed in the same directory as LeetCodeCrawler.jar:
{
"username": "leetcode@leetcode",
"password": "leetcode",
"language": ["cpp", "java"],
"outputDir": "."
}
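username and password are your LeetCode account name and password.
language lists the programming languages you use on LeetCode; several may be given. The accepted values are listed below (use these identifiers exactly as written):
- cpp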
- java
- c
- csharp
- javascript
- python
- python3
- ruby
- swift
- golang
- scala
- kotlin
outputDir is the directory where the source files should be stored. It defaults to ., the current directory.
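Because the language values must match the identifiers above exactly, a small check before crawling can catch typos early. The following is a minimal sketch, not code from LeetCodeCrawler: the class name LanguageCheck and its method are hypothetical, and it assumes the configured languages have already been read from config.json into a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of LeetCodeCrawler. */
final class LanguageCheck {
    // The language identifiers listed in the article, in the same order.
    static final Set<String> SUPPORTED = new LinkedHashSet<>(Arrays.asList(
            "cpp", "java", "c", "csharp", "javascript", "python",
            "python3", "ruby", "swift", "golang", "scala", "kotlin"));

    /** Returns the configured entries that are not supported (empty if all are valid). */
    static List<String> unsupported(List<String> configured) {
        List<String> bad = new ArrayList<>();
        for (String lang : configured) {
            if (!SUPPORTED.contains(lang)) {
                bad.add(lang);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println(unsupported(Arrays.asList("cpp", "java"))); // prints []
        System.out.println(unsupported(Arrays.asList("C++", "go")));   // prints [C++, go]
    }
}
```

The comparison is deliberately case-sensitive, since the identifiers are matched literally against the lang field of each submission.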
Run java -jar LeetCodeCrawler.jar
Result
How the crawling works
Relevant APIs
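The data we need comes from two sources:
1. RESTful API
2. GraphQL
The following APIs proved useful while crawling:
- Information about all problems:
https://leetcode.com/api/problems/all/
The data looks roughly like this: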
{
"user_name": "",
"num_solved": 0,
"num_total": 949,
"ac_easy": 0,
"ac_medium": 0,
"ac_hard": 0,
"stat_status_pairs": [
{
"stat":
{
"question_id": 993,
"question__article__live": true,
"question__article__slug": "tallest-billboard",
"question__title": "Tallest Billboard",
"question__title_slug": "tallest-billboard",
"question__hide": false,
"total_acs": 1361,
"total_submitted": 4295,
"frontend_question_id": 956,
"is_new_question": false
},
"status": null,
"difficulty":
{
"level": 3
},
"paid_only": false,
"is_favor": false,
"frequency": 0,
"progress": 0
},
... (omitted)
],
"frequency_high": 0,
"frequency_mid": 0,
"category_slug": "all"
}
- Submissions for a given problem:
https://leetcode.com/api/submissions/two-sum/?offset=0&limit=10&lastkey=
The list of submissions may span more than one page, so paging logic is needed. The data looks roughly like this:
{
"submissions_dump": [
{
"id": xxx,
"lang": "java",
"time": "2 weeks, 5 days",
"timestamp": 154****320,
"status_display": "Accepted",
"runtime": "4 ms",
"url": "/submissions/detail/19****359/",
"is_pending": "Not Pending",
"title": ""
},
... (omitted)
],
"has_next": true,
"last_key": "xxx"
}
- GraphQL:
https://leetcode.com/graphql
Send a query request to this endpoint to fetch the data we need.
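To show how the fields of one stat_status_pairs entry might be used, here is a minimal sketch that turns them into a Markdown table row. The class ProblemRow and the row layout are hypothetical and are not the README format LeetCodeCrawler actually produces. It also assumes that difficulty levels 1, 2 and 3 mean Easy, Medium and Hard, which is consistent with the sample entry above (Tallest Billboard, level 3), and it builds the problem URL from the title slug.

```java
import java.util.Locale;

/** Hypothetical helper; the row layout is illustrative only. */
final class ProblemRow {
    /** Assumption: levels 1, 2, 3 correspond to Easy, Medium, Hard. */
    static String difficultyLabel(int level) {
        switch (level) {
            case 1: return "Easy";
            case 2: return "Medium";
            case 3: return "Hard";
            default: return "Unknown";
        }
    }

    /** Acceptance rate as a percentage with one decimal place. */
    static String acceptance(long totalAcs, long totalSubmitted) {
        if (totalSubmitted <= 0) {
            return "n/a";
        }
        return String.format(Locale.ROOT, "%.1f%%", 100.0 * totalAcs / totalSubmitted);
    }

    /** Builds one Markdown table row from the fields of a "stat" object. */
    static String markdownRow(int frontendId, String title, String titleSlug,
                              int level, long totalAcs, long totalSubmitted) {
        return String.format(Locale.ROOT,
                "| %d | [%s](https://leetcode.com/problems/%s/) | %s | %s |",
                frontendId, title, titleSlug,
                difficultyLabel(level), acceptance(totalAcs, totalSubmitted));
    }

    public static void main(String[] args) {
        // Values taken from the sample response shown in the article.
        System.out.println(markdownRow(956, "Tallest Billboard", "tallest-billboard", 3, 1361, 4295));
    }
}
```

Note that the row uses frontend_question_id (956) rather than question_id (993), since the former is the number shown on the site.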
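Simulating login
I wrote an earlier post explaining how to simulate the login, "Simulating a LeetCode login with OkHttp", which has the details; here is a brief summary. Packet capture shows the following:
So all we need is to build a request whose Content-Type is multipart/form-data and send along the Cookie values returned when the login page is first opened; that completes the simulated login.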
/**
* Simulates logging in to LeetCode. For an analysis of the login flow, see https://www.cnblogs.com/ZhaoxiCheung/p/9302510.html
*/
public boolean doLogin() throws IOException {
boolean success;
Connection.Response response = Jsoup.connect(URL.LOGIN)
.method(Connection.Method.GET)
.execute();
csrftoken = response.cookie("csrftoken");
__cfduid = response.cookie("__cfduid");
OkHttpClient client = new OkHttpClient.Builder()
.followRedirects(false)
.followSslRedirects(false)
.cookieJar(new MyCookieJar())
.connectTimeout(30, TimeUnit.SECONDS)
.readTimeout(30, TimeUnit.SECONDS)
.writeTimeout(30, TimeUnit.SECONDS)
.build();
String form_data = "--" + boundary + "\r\n"
+ "Content-Disposition: form-data; name=\"csrfmiddlewaretoken\"" + "\r\n\r\n"
+ csrftoken + "\r\n"
+ "--" + boundary + "\r\n"
+ "Content-Disposition: form-data; name=\"login\"" + "\r\n\r\n"
+ usrname + "\r\n"
+ "--" + boundary + "\r\n"
+ "Content-Disposition: form-data; name=\"password\"" + "\r\n\r\n"
+ passwd + "\r\n"
+ "--" + boundary + "\r\n"
+ "Content-Disposition: form-data; name=\"next\"" + "\r\n\r\n"
+ "/problems" + "\r\n"
+ "--" + boundary + "--";
RequestBody requestBody = RequestBody.create(MULTIPART, form_data);
Request request = new Request.Builder()
.addHeader("Content-Type", "multipart/form-data; boundary=" + boundary)
.addHeader("Connection", "keep-alive")
.addHeader("Accept", "*/*")
.addHeader("Origin", "https://leetcode.com")
.addHeader("Referer", URL.LOGIN)
.addHeader("Cookie", "__cfduid=" + __cfduid + ";" + "csrftoken=" + csrftoken)
.post(requestBody)
.url(URL.LOGIN)
.build();
Response loginResponse = client.newCall(request).execute();
if (Main.isDebug) out.println(loginResponse.message());
Headers headers = loginResponse.headers();
List<String> cookies = headers.values("Set-Cookie");
for (String cookie : cookies) {
int found = cookie.indexOf("LEETCODE_SESSION");
if (found > -1) {
if (Main.isDebug) out.println(cookie);
int last = cookie.indexOf(";");
LEETCODE_SESSION = cookie.substring("LEETCODE_SESSION".length() + 1, last);
if (Main.isDebug) out.println(LEETCODE_SESSION);
}
}
if (LEETCODE_SESSION != null) {
success = true;
out.println("Login Successfully");
} else {
success = false;
out.println("Login Unsuccessfully");
}
loginResponse.close();
return success;
}
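The cookie loop in doLogin assumes that the Set-Cookie value starts with LEETCODE_SESSION and contains a semicolon; if there is no semicolon, indexOf returns -1 and substring throws. A slightly more defensive extraction might look like the sketch below. The class CookieValue is hypothetical and the header values in main are made-up examples, not real LeetCode responses.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper for illustration; not part of LeetCodeCrawler. */
final class CookieValue {
    /** Returns the value of the named cookie from a list of Set-Cookie header values. */
    static Optional<String> find(List<String> setCookieHeaders, String name) {
        String prefix = name + "=";
        for (String header : setCookieHeaders) {
            // In a Set-Cookie header the name=value pair is the first segment;
            // attributes such as Path or HttpOnly follow after semicolons.
            String first = header.split(";", 2)[0].trim();
            if (first.startsWith(prefix)) {
                return Optional.of(first.substring(prefix.length()));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> headers = Arrays.asList(
                "csrftoken=abc123; Path=/; Secure",
                "LEETCODE_SESSION=xyz.789; HttpOnly; Path=/");
        System.out.println(find(headers, "LEETCODE_SESSION").orElse("(missing)")); // prints xyz.789
    }
}
```

Returning an Optional makes the "no session cookie" case explicit, which maps directly onto the success flag that doLogin returns.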
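Fetching data with GraphQL
Not all data is available through the RESTful API; LeetCode serves some of it, such as a problem's description, through GraphQL. I also wrote an earlier post on using GraphQL to fetch LeetCode data, "Crawling LeetCode problems: how to send a GraphQL query to fetch data", which has more detail. The main question here is how to find out which query to send. In Chrome, press F12 to open DevTools and inspect a request under the Network tab: in its Headers, the Request Payload contains a query field, which is the key piece of information for constructing our GraphQL query, as shown in the figure below: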
Other
Fetching a problem's description
public String getProblemDescription(String problemTitle) throws IOException {
String problemDescriptionString = "";
String postBody = "query{question(titleSlug:\"" + problemTitle + "\") {content}}\n";
RequestBody requestBody = RequestBody.create(MediaType.parse("application/graphql; charset=utf-8"), postBody);
Headers headers = new Headers.Builder()
.add("Content-Type", "application/graphql")
.add("Referer", "https://leetcode.com/problems/" + problemTitle)
.add("Cookie", "__cfduid=" + Login.__cfduid + ";" + "csrftoken=" + Login.csrftoken + ";" + "LEETCODE_SESSION=" + Login.LEETCODE_SESSION)
.add("x-csrftoken", Login.csrftoken)
.build();
Response graphqlResponse = okHttpHelper.post(URL.GRAPHQL, requestBody, headers);
if (graphqlResponse != null) {
ProblemContentBean problemContentBean = okHttpHelper.fromJson(graphqlResponse.body().string(), ProblemContentBean.class);
problemDescriptionString = problemContentBean.getData().getQuestion().getContent();
graphqlResponse.close();
} else {
//TODO print an error message
}
return problemDescriptionString;
}
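The query string in getProblemDescription is built by concatenating the title slug directly into the GraphQL text. The sketch below isolates that step and adds a guard on the slug. The class QuestionQuery is hypothetical, the slug pattern is an assumption (lowercase letters, digits and hyphens, as in two-sum and tallest-billboard), and the JSON-wrapped variant assumes the endpoint also accepts a body with a query field, as the Request Payload seen in the browser suggests.

```java
/** Hypothetical helper for illustration; not part of LeetCodeCrawler. */
final class QuestionQuery {
    /** Builds the raw GraphQL query, as sent with Content-Type application/graphql. */
    static String contentQuery(String titleSlug) {
        // Assumption: slugs contain only lowercase letters, digits and hyphens.
        // Rejecting anything else keeps quotes or braces out of the query text.
        if (!titleSlug.matches("[a-z0-9-]+")) {
            throw new IllegalArgumentException("unexpected title slug: " + titleSlug);
        }
        return "query{question(titleSlug:\"" + titleSlug + "\") {content}}";
    }

    /** Wraps the query in a JSON object with a "query" field (assumed alternative form). */
    static String jsonBody(String titleSlug) {
        String escaped = contentQuery(titleSlug)
                .replace("\\", "\\\\")
                .replace("\"", "\\\"");
        return "{\"query\":\"" + escaped + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(contentQuery("two-sum"));
        System.out.println(jsonBody("two-sum"));
    }
}
```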
Fetching the code submitted for a problem in a given language
public String getSubmissionCode(String submissionUrl) throws IOException {
String url = URL.LEETCODE + submissionUrl;
if (Main.isDebug) out.println(url);
String codeString = null;
Headers headers = new Headers.Builder()
.add("Cookie", "__cfduid=" + Login.__cfduid + ";" + "csrftoken=" + Login.csrftoken + ";" + "LEETCODE_SESSION=" + Login.LEETCODE_SESSION)
.build();
Response response = okHttpHelper.get(url, headers);
if (response != null) {
String htmlString = response.body().string();
Document document = Jsoup.parse(htmlString);
Elements elements = document.getElementsByTag("script");
for (Element element : elements) {
int indexStart = element.toString().indexOf("submissionCode: '");
if (indexStart > -1) {
int indexTo = element.toString().indexOf("editCodeUrl");
codeString = element.toString().substring(indexStart + ("submissionCode: '").length(), indexTo - 5);
break;
}
}
response.close();
} else {
//TODO handle the error
}
codeString = encode(codeString);
return codeString;
}
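The encode helper called at the end of getSubmissionCode is not shown in this article. Assuming the submissionCode value embedded in the page is a JavaScript string literal in which characters such as newlines and semicolons appear as \uXXXX escapes, a decoder could look like the following sketch. The class UnicodeUnescape is hypothetical and may differ from what the real helper does.

```java
/** Hypothetical decoder; the article's own encode helper is not shown. */
final class UnicodeUnescape {
    /** Replaces each backslash-u escape followed by four hex digits with its character. */
    static String decode(String escaped) {
        if (escaped == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(escaped.length());
        int i = 0;
        while (i < escaped.length()) {
            char c = escaped.charAt(i);
            if (c == '\\' && i + 6 <= escaped.length() && escaped.charAt(i + 1) == 'u') {
                String hex = escaped.substring(i + 2, i + 6);
                if (hex.matches("[0-9A-Fa-f]{4}")) {
                    out.append((char) Integer.parseInt(hex, 16));
                    i += 6;
                    continue;
                }
            }
            // Not a complete escape: copy the character unchanged.
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample input; decodes to two lines of source code.
        System.out.println(decode("int a \\u003D 1\\u003B\\u000Areturn a\\u003B"));
    }
}
```

Treating a null input as an empty string also covers the case where no script element contained the submission code.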
Fetching the code submitted in the languages specified in the config file
public synchronized Map<String, String> getSubmissions(String problemTitle, ResultBean resultBean) throws IOException {
if (Main.isDebug) out.println("pre problemTitle = " + problemTitle);
//Submitted code, keyed by language
Map<String, String> submissionMap = new HashMap<>();
int offset = 0;
int limit = 10;
boolean hasNext = true;
String lastKey = "";
List<String> languageList = Config.getSingleton().getLanguageList();
//Languages whose code is already stored locally
List<String> savedLanguageList = resultBean != null ? resultBean.getLanguage() : new ArrayList<>(0);
//Tracks whether the code for each language has been fetched yet
Map<String, Boolean> languageMap = new HashMap<>();
for (int i = 0; i < languageList.size(); i++) {
boolean hasExist = false;
//The data set is small, so a brute-force search is fine
for (int j = 0; j < savedLanguageList.size(); j++) {
if (languageList.get(i).equals(savedLanguageList.get(j))) {
hasExist = true;
break;
}
}
if (!hasExist) languageMap.put(languageList.get(i), false);
}
//The code for every requested language is already stored locally
if (languageMap.size() == 0) return submissionMap;
while(hasNext) {
String submissionsUrl = String.format(URL.SUBMISSIONS_FORMAT, problemTitle, offset, limit, lastKey);
Headers headers = new Headers.Builder()
.add("Cookie", "__cfduid=" + Login.__cfduid + ";" + "csrftoken=" + Login.csrftoken + ";" + "LEETCODE_SESSION=" + Login.LEETCODE_SESSION)
.build();
Response response = okHttpHelper.get(submissionsUrl, headers);
if (response != null) {
String responseData = response.body().string();
SubmissionBean submissionBean = okHttpHelper.fromJson(responseData, SubmissionBean.class);
List<SubmissionBean.SubmissionsDumpBean> submissionsDumpList = submissionBean.getSubmissions_dump();
if (submissionsDumpList == null) {
if (Main.isDebug) {
out.println("submissionsUrl = " + submissionsUrl);
out.println("problemTitle = " + problemTitle);
out.println("responseData = " + responseData);
out.println("status message = " + response.message());
out.println("message code = " + response.code());
}
continue;
}
for (int i = 0; i < submissionsDumpList.size(); i++) {
SubmissionBean.SubmissionsDumpBean submission = submissionsDumpList.get(i);
String language = submission.getLang();
if (languageMap.containsKey(language) && languageMap.get(language) == false && submission.getStatus_display().equals("Accepted")) {
submissionMap.put(language, getSubmissionCode(submission.getUrl()));
languageMap.put(language, true);
}
}
//Paging logic
hasNext = submissionBean.isHas_next();
//Advance by one page; (++offset) * limit would give 10, 110, 1110, ... instead of 10, 20, 30, ...
offset += limit;
lastKey = submissionBean.getLast_key();
response.close();
} else {
//TODO
}
}
return submissionMap;
}
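The paging part of getSubmissions can be looked at in isolation. The sketch below walks pages by increasing offset by limit and carrying last_key forward until has_next is false. The class SubmissionPaging, the PageFetcher interface and the maxPages safety cap are hypothetical additions for illustration, and the fetcher in main is an in-memory stand-in rather than a real HTTP call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of offset/last_key paging; not part of LeetCodeCrawler. */
final class SubmissionPaging {
    /** One page of results, mirroring has_next and last_key in the response. */
    static final class Page {
        final List<String> items;
        final boolean hasNext;
        final String lastKey;

        Page(List<String> items, boolean hasNext, String lastKey) {
            this.items = items;
            this.hasNext = hasNext;
            this.lastKey = lastKey;
        }
    }

    /** Stand-in for the HTTP request that loads one page. */
    interface PageFetcher {
        Page fetch(int offset, String lastKey);
    }

    /** Builds the submissions URL in the form shown earlier in the article. */
    static String url(String titleSlug, int offset, int limit, String lastKey) {
        return String.format(Locale.ROOT,
                "https://leetcode.com/api/submissions/%s/?offset=%d&limit=%d&lastkey=%s",
                titleSlug, offset, limit, lastKey);
    }

    /** Collects items from all pages, stopping after maxPages as a safety cap. */
    static List<String> fetchAll(PageFetcher fetcher, int limit, int maxPages) {
        List<String> all = new ArrayList<>();
        int offset = 0;
        String lastKey = "";
        for (int page = 0; page < maxPages; page++) {
            Page current = fetcher.fetch(offset, lastKey);
            all.addAll(current.items);
            if (!current.hasNext) {
                break;
            }
            offset += limit;           // 0, 10, 20, ... for limit = 10
            lastKey = current.lastKey; // carried into the next request
        }
        return all;
    }

    public static void main(String[] args) {
        final int total = 25;
        final int limit = 10;
        // In-memory fetcher that pretends there are 25 submissions.
        PageFetcher fake = (offset, lastKey) -> {
            List<String> items = new ArrayList<>();
            for (int i = offset; i < Math.min(offset + limit, total); i++) {
                items.add("submission-" + i);
            }
            return new Page(items, offset + limit < total, "key-" + (offset + limit));
        };
        System.out.println(url("two-sum", 0, limit, ""));
        System.out.println(fetchAll(fake, limit, 100).size()); // prints 25
    }
}
```

The page cap is a design choice for the sketch: it guarantees the loop ends even if a response keeps reporting has_next as true.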
More detailed code is available on GitHub: LeetCodeCrawler