Python Jobs: The Top 10 Cities. See Whether Yours Is Among Them

Published 2017-07-14

Introduction

After scraping job listings from Zhaopin (智聯招聘), the next question is how to analyze them and extract useful information.

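This post builds on the data scraped in the previous article, "Master Scraping Zhaopin and Storing the Data in MongoDB in 5 Minutes". It analyzes the listings collected for the keyword "python" and produces, among other results, the Top 10 cities nationwide by number of Python job postings.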

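I. Main analysis steps

Reading the data
Cleaning the data
Analyzing how the postings are distributed across major cities in China
Analyzing monthly salaries nationwide
Building a word cloud from the job requirement descriptions to find the most frequent keywords
Picking two cities and, for each, analyzing the salary distribution and the requirement word cloud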

II. Detailed analysis

import pymongo
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Jupyter/IPython magic: render figures inside the notebook
%matplotlib inline
plt.style.use('ggplot')

# Let matplotlib render Chinese characters
plt.rcParams['font.sans-serif'] = ['SimHei']  # default font (SimHei must be installed)
plt.rcParams['axes.unicode_minus'] = False  # keep the minus sign '-' from showing as a box in saved figures

1 Reading the data

client = pymongo.MongoClient('localhost')
db = client['zhilian']
table = db['python']
# Fields saved by the scraper
columns = ['zwmc',       # job title
           'gsmc',       # company name
           'zwyx',       # monthly salary
           'gbsj',       # posting date
           'gzdd',       # work location
           'fkl',        # feedback rate
           'brief',      # job description / requirements
           'zw_link',    # link to the job page
           '_id',        # MongoDB document id
           'save_date']  # date the record was saved
# url_set = set(records['zw_link'] for records in table.find())
# print(url_set)
df = pd.DataFrame(list(table.find()), columns=columns)
print('Total rows: {}'.format(df.shape[0]))
df.head(2)

The result is shown in Figure 1:

[Figure 1: output of the cell above]

2 Cleaning the data

2.1 Converting the save_date strings to datetime

df['save_date'] = pd.to_datetime(df['save_date'])
print(df['save_date'].dtype)
# df['save_date']

datetime64[ns]
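
If any scraped save_date string were malformed, pd.to_datetime would raise an error by default. A minimal sketch on made-up strings (not the real data) showing errors='coerce', which turns unparseable values into NaT so they can be counted or dropped:

```python
import pandas as pd

# Hypothetical save_date values; the last one cannot be parsed
dates = pd.Series(['2017-07-10', '2017-07-11', 'not a date'])

# errors='coerce' converts unparseable strings to NaT instead of raising
parsed = pd.to_datetime(dates, errors='coerce')
print(parsed)
print('unparseable values:', parsed.isna().sum())
```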

2.2 Keeping only salaries in the “XXXX-XXXX” format

df_clean = df[['zwmc',
               'gsmc',
               'zwyx',
               'gbsj',
               'gzdd',
               'fkl',
               'brief',
               'zw_link',
               'save_date']]
# Keep only salaries of the form "XXXX-XXXX" to simplify the analysis below;
# na=False treats missing salaries as non-matching
df_clean = df_clean[df_clean['zwyx'].str.contains(r'\d+-\d+', regex=True, na=False)]
print('Total rows: {}'.format(df_clean.shape[0]))
# df_clean.head()

Total rows: 22605
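
str.contains(r'\d+-\d+') accepts a value if the pattern appears anywhere in it, so a salary string with extra text around the range would pass the filter and then fail in pd.to_numeric. A sketch on hypothetical salary strings comparing it with the stricter str.fullmatch (available since pandas 1.1):

```python
import pandas as pd

# Hypothetical salary strings; '面議' means "negotiable"
zwyx = pd.Series(['8001-10000', '面議', '10001-15000元/月', None])

# contains: the pattern may occur anywhere; na=False treats missing values as non-matching
loose = zwyx[zwyx.str.contains(r'\d+-\d+', na=False)]
# fullmatch: the entire value must be digits, a hyphen, digits
strict = zwyx[zwyx.str.fullmatch(r'\d+-\d+', na=False)]

print(loose.tolist())
print(strict.tolist())
```

Whether the stricter filter would change the 22605 count depends on what the real zwyx values look like.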

2.3 Splitting the salary field into its lower and upper bounds

# http://stackoverflow.com/questions/14745022/pandas-dataframe-how-do-i-split-a-column-into-two
# http://stackoverflow.com/questions/20602947/append-column-to-pandas-dataframe
# Split "XXXX-XXXX" at the first '-'. expand=True returns a two-column DataFrame that keeps
# df_clean's index. Unpacking .str, as older code did, is not supported in recent pandas,
# and assigning new columns straight into df_clean would trigger a SettingWithCopyWarning
salary = df_clean['zwyx'].str.split('-', n=1, expand=True)
salary.columns = ['zwyx_min', 'zwyx_max']
# Convert both bounds from str to numbers
salary = salary.apply(pd.to_numeric)
df_clean_concat = pd.concat([df_clean, salary], axis=1)
print(df_clean_concat.dtypes)
df_clean_concat.head(2)

The output is shown in Figure 2:

[Figure 2: output of the cell above]
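
A detail worth seeing on a small example: pd.concat(..., axis=1) aligns rows by index label, and df_clean's index has gaps after the filtering step. The split result keeps the same labels, so the bounds land on the right rows. A sketch with made-up values and a deliberately non-contiguous index:

```python
import pandas as pd

# Made-up salaries; the index has gaps, as it does after filtering df_clean
df_demo = pd.DataFrame({'zwyx': ['6001-8000', '10001-15000', '4000-6000']},
                       index=[0, 2, 5])

# expand=True returns a DataFrame that reuses df_demo's index labels
bounds = df_demo['zwyx'].str.split('-', n=1, expand=True)
bounds.columns = ['zwyx_min', 'zwyx_max']
bounds = bounds.apply(pd.to_numeric)  # str -> integer

result = pd.concat([df_demo, bounds], axis=1)  # rows matched by index label
print(result)
```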

Sort the records by the lower bound of the monthly salary:

df_clean_concat.sort_values('zwyx_min',inplace=True)
# df_clean_concat.tail()

Check whether the scraped data contains duplicates:

# Look for duplicate listings, i.e. rows sharing the same job link
print(df_clean_concat[df_clean_concat.duplicated('zw_link')])

Empty DataFrame
Columns: [zwmc, gsmc, zwyx, gbsj, gzdd, fkl, brief, zw_link, save_date, zwyx_min, zwyx_max]
Index: []

The empty result shows that the data contains no duplicate listings (judged by zw_link).
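
The check came back empty here. Had it not, the usual next step would be DataFrame.drop_duplicates on the same key. A sketch on made-up rows with hypothetical example.com links:

```python
import pandas as pd

jobs = pd.DataFrame({
    'zwmc': ['Python developer', 'Python developer', 'Data analyst'],
    'zw_link': ['http://example.com/job/1', 'http://example.com/job/1',
                'http://example.com/job/2'],
})

# duplicated() flags every repeat after the first occurrence of a link
dup_mask = jobs.duplicated('zw_link')
print(jobs[dup_mask])

# drop_duplicates keeps the first row for each link
deduped = jobs.drop_duplicates(subset='zw_link', keep='first')
print(len(deduped))
```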

3 Nationwide analysis

3.1 Distribution of job postings across major cities

# from IPython.core.display import display, HTML
ADDRESS = [ '北京', '上海', '廣州', '深圳',
           '天津', '武漢', '西安', '成都', '大連',
           '長春', '瀋陽', '南京', '濟南', '青島',
           '杭州', '蘇州', '無錫', '寧波', '重慶',
           '鄭州', '長沙', '福州', '廈門', '哈爾濱',
           '石家莊', '合肥', '惠州', '太原', '昆明',
           '煙臺', '佛山', '南昌', '貴陽', '南寧']
df_city = df_clean_concat.copy()
# Many locations include a district, e.g. 北京-朝陽區, so use pandas' replace()
# to turn the city name and everything after it into the bare city name
for city in ADDRESS:
    df_city['gzdd'] = df_city['gzdd'].replace(city + '.*', city, regex=True)
# Analyze the major cities only
df_city_main = df_city[df_city['gzdd'].isin(ADDRESS)]
# Count postings per city; recent pandas requires a list of columns ([[...]]) here
df_city_main_count = df_city_main.groupby('gzdd')[['zwmc', 'gsmc']].count()
# Turn the second count column into each city's share of all major-city postings
df_city_main_count['gsmc'] = df_city_main_count['gsmc'] / df_city_main_count['gsmc'].sum()
df_city_main_count.columns = ['number', 'percentage']
# Sort by number of postings
df_city_main_count.sort_values(by='number', ascending=False, inplace=True)
# Helper column such as "北京 32%", used as legend labels when plotting
df_city_main_count['label'] = (df_city_main_count.index + ' '
                               + (df_city_main_count['percentage'] * 100).round().astype(int).astype(str)
                               + '%')
print(type(df_city_main_count))
# Top 10 cities by number of postings
print(df_city_main_count.head(10))

<class 'pandas.core.frame.DataFrame'>
      number  percentage   label
gzdd                            
北京      6936    0.315948  北京 32%
上海      3213    0.146358  上海 15%
深圳      1908    0.086913   深圳 9%
成都      1290    0.058762   成都 6%
杭州      1174    0.053478   杭州 5%
廣州      1167    0.053159   廣州 5%
南京       826    0.037626   南京 4%
鄭州       741    0.033754   鄭州 3%
武漢       552    0.025145   武漢 3%
西安       473    0.021546   西安 2%
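
The replace loop plus groupby can also be written with str.extract and value_counts. This is an alternative sketch, not the code that produced the table above; the shortened city list and the location strings are made up. Anchoring the pattern with ^ keeps only locations that start with a listed city, and value_counts(normalize=True) gives the shares directly:

```python
import re

import pandas as pd

ADDRESS = ['北京', '上海', '深圳']  # shortened list for the demo
gzdd = pd.Series(['北京-朝陽區', '北京', '上海-浦東新區', '深圳-南山區', '拉薩'])

# Capture a listed city at the start of the string; rows without one become NaN
pattern = '^(' + '|'.join(map(re.escape, ADDRESS)) + ')'
city = gzdd.str.extract(pattern, expand=False)

counts = city.value_counts()                # postings per city (NaN dropped)
shares = city.value_counts(normalize=True)  # share among the matched rows only
print(counts)
print((shares * 100).round().astype(int).astype(str) + '%')
```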

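Note that percentage is each city's share of the postings in these 34 major cities, not of all postings nationwide. Plotting the result: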

from matplotlib import cm
label = df_city_main_count['label']
sizes = df_city_main_count['number']
# Figure size; the left axes holds the pie, the right one the legend
fig, axes = plt.subplots(figsize=(10,6), ncols=2)
ax1, ax2 = axes.ravel()
colors = cm.PiYG(np.arange(len(sizes))/len(sizes))  # other colormaps: Paired, autumn, rainbow, gray, spring, Dark2
# With this many cities, the pie shows neither labels nor percentages
patches, texts = ax1.pie(sizes, labels=None, shadow=False, startangle=0, colors=colors)
ax1.axis('equal')
ax1.set_title('Job postings by city', loc='center')
# ax2 shows only the legend
ax2.axis('off')
ax2.legend(patches, label, loc='center left', fontsize=9)
plt.savefig('job_distribute.jpg')
plt.show()

The resulting pie chart:

[Figure: share of job postings by city, saved as job_distribute.jpg]
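
Hiding the labels works around the many thin slices. Another common option is to keep the largest few cities and merge the rest into one "Other" slice. A sketch with a hypothetical helper, top_n_with_other, fed with five of the counts from the table above:

```python
import pandas as pd

def top_n_with_other(counts, n, other_label='Other'):
    """Keep the n largest entries and add one entry summing the rest (hypothetical helper)."""
    ordered = counts.sort_values(ascending=False)
    head = ordered.iloc[:n]
    rest = ordered.iloc[n:].sum()
    if rest > 0:
        head = pd.concat([head, pd.Series({other_label: rest})])
    return head

counts = pd.Series({'北京': 6936, '上海': 3213, '深圳': 1908, '成都': 1290, '西安': 473})
grouped = top_n_with_other(counts, 3)
print(grouped)
# The result could then be drawn with visible labels, e.g.
# ax1.pie(grouped, labels=grouped.index, autopct='%1.0f%%')
```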

3.2 Monthly salary distribution (nationwide)

fig, (ax1, ax2) = plt.subplots(figsize=(10,8), nrows=2)
x_pos = list(range(df_clean_concat.shape[0]))
y1 = df_clean_concat['zwyx_min']
# Rows were sorted by zwyx_min earlier, so this "trend" is the sorted salary curve
ax1.plot(x_pos, y1)
ax1.set_title('Trend of min monthly salary in China', size=14)
ax1.tick_params(labelbottom=False)  # row positions carry no meaning, so hide x tick labels
ax1.set_ylabel('min monthly salary(RMB)')
bins = [3000, 6000, 9000, 12000, 15000, 18000, 21000, 24000, 100000]
# Plot raw counts: normed= was removed from matplotlib, and with unequal bin widths
# density values would distort the percentages computed below
counts, bins, patches = ax2.hist(y1, bins, histtype='bar', facecolor='g', rwidth=0.8)
ax2.set_title('Hist of min monthly salary in China', size=14)
ax2.tick_params(labelleft=False)  # hide y tick labels
# ax2.set_xlabel('min monthly salary(RMB)')
# http://stackoverflow.com/questions/6352740/matplotlib-label-each-bin
ax2.set_xticks(bins)  # use the bin edges as x ticks
ax2.set_xticklabels([int(b) for b in bins], rotation=-90)  # rotate the tick labels
# Label the raw counts and the percentages below the x-axis...
bin_centers = 0.5 * np.diff(bins) + bins[:-1]
for count, x in zip(counts, bin_centers):
    # # Label the raw counts
    # ax2.annotate(str(count), xy=(x, 0), xycoords=('data', 'axes fraction'),
    #     xytext=(0, -70), textcoords='offset points', va='top', ha='center', rotation=-90)
    # Label the percentages
    percent = '%0.0f%%' % (100 * float(count) / counts.sum())
    ax2.annotate(percent, xy=(x, 0), xycoords=('data', 'axes fraction'),
        xytext=(0, -40), textcoords='offset points', va='top', ha='center', rotation=-90, color='b', size=14)
fig.savefig('salary_quanguo_min.jpg')

The result:

[Figure: sorted minimum monthly salaries (top) and their histogram (bottom), nationwide; saved as salary_quanguo_min.jpg]
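
Why percentage labels should come from raw counts: with unequal bin widths (the last nationwide bin spans 24000 to 100000), density values are divided by the bin width, so count / counts.sum() no longer equals each bin's share of postings. A small numpy check on made-up salaries:

```python
import numpy as np

# Hypothetical minimum salaries and bins whose last bin is much wider
salaries = np.array([4000, 5000, 7000, 30000, 50000])
bins = [3000, 6000, 9000, 100000]

raw, _ = np.histogram(salaries, bins=bins)                 # counts per bin
dens, _ = np.histogram(salaries, bins=bins, density=True)  # count / (total * bin width)

pct_raw = 100 * raw / raw.sum()     # each bin's share of postings: 40, 20, 40
pct_dens = 100 * dens / dens.sum()  # skewed toward the narrow bins
print(pct_raw)
print(pct_dens.round(1))
```

With the density values the first bin would be labeled about 65% instead of 40%, and the wide last bin about 2% instead of 40%.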

Salary distribution after leaving out extreme values (minimum monthly salary above 20000 RMB)

# Keep only listings whose minimum monthly salary is at most 20000
df_zwyx_adjust = df_clean_concat[df_clean_concat['zwyx_min']<=20000]
fig, (ax1, ax2) = plt.subplots(figsize=(10,8), nrows=2)
x_pos = list(range(df_zwyx_adjust.shape[0]))
y1 = df_zwyx_adjust['zwyx_min']
ax1.plot(x_pos, y1)
ax1.set_title('Trend of min monthly salary in China (adjust)', size=14)
ax1.tick_params(labelbottom=False)  # hide x tick labels
ax1.set_ylabel('min monthly salary(RMB)')
bins = [3000, 6000, 9000, 12000, 15000, 18000, 21000]
# Raw counts (normed= was removed from matplotlib); percentages are computed below
counts, bins, patches = ax2.hist(y1, bins, histtype='bar', facecolor='g', rwidth=0.8)
ax2.set_title('Hist of min monthly salary in China (adjust)', size=14)
ax2.tick_params(labelleft=False)  # hide y tick labels
# ax2.set_xlabel('min monthly salary(RMB)')
# http://stackoverflow.com/questions/6352740/matplotlib-label-each-bin
ax2.set_xticks(bins)  # use the bin edges as x ticks
ax2.set_xticklabels([int(b) for b in bins], rotation=-90)  # rotate the tick labels
# Label the raw counts and the percentages below the x-axis...
bin_centers = 0.5 * np.diff(bins) + bins[:-1]
for count, x in zip(counts, bin_centers):
    # # Label the raw counts
    # ax2.annotate(str(count), xy=(x, 0), xycoords=('data', 'axes fraction'),
    #     xytext=(0, -70), textcoords='offset points', va='top', ha='center', rotation=-90)
    # Label the percentages
    percent = '%0.0f%%' % (100 * float(count) / counts.sum())
    ax2.annotate(percent, xy=(x, 0), xycoords=('data', 'axes fraction'),
        xytext=(0, -40), textcoords='offset points', va='top', ha='center', rotation=-90, color='b', size=14)
fig.savefig('salary_quanguo_min_adjust.jpg')

The result:

[Figure: the same charts restricted to minimum salaries of at most 20000; saved as salary_quanguo_min_adjust.jpg]
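
The 20000 cutoff is picked by hand. A data-driven alternative, which is a different technique from the one used above, is to drop everything above a high quantile of the column, such as the 99th percentile. A sketch on synthetic data with a hypothetical helper, trim_upper_quantile:

```python
import pandas as pd

def trim_upper_quantile(s, q=0.99):
    """Drop values above the q-th quantile of s (hypothetical helper)."""
    cutoff = s.quantile(q)
    return s[s <= cutoff]

# Synthetic minimum salaries: 100 ordinary values plus one extreme outlier
zwyx_min = pd.Series(list(range(4000, 14000, 100)) + [100000])
trimmed = trim_upper_quantile(zwyx_min)
print(len(zwyx_min), len(trimmed), trimmed.max())
```

The trade-off: a quantile cutoff always removes roughly the top 1% even when there are no real outliers, while a fixed threshold needs some domain knowledge to choose.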

3.3 Required skills

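# Concatenate all job descriptions into one string (missing descriptions count as '')
brief_list = list(df_clean_concat['brief'].fillna(''))
brief_str = ''.join(brief_list)
print(type(brief_str))
# print(brief_str)
# Save the text; the word-cloud script below reads brief_quanguo.txt
with open('brief_quanguo.txt', 'w', encoding='utf-8') as f:
    f.write(brief_str)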

<class 'str'>

The collected job requirement text is turned into a word cloud with the following script (it reads brief_quanguo.txt):

# -*- coding: utf-8 -*-
"""
Created on Wed May 17 2017
@author: lemon
"""
import os

import jieba
import matplotlib.pyplot as plt
import numpy as np
import PIL.Image as Image
from wordcloud import WordCloud, ImageColorGenerator

with open('brief_quanguo.txt', encoding='utf-8') as f:  # read the saved job descriptions
    text = f.read()
# Segment the Chinese text with the jieba tokenizer first
# cut_all=True is full mode; cut_all=False is accurate mode
wordlist = jieba.cut(text, cut_all=False)
wordlist_space_split = ' '.join(wordlist)
d = os.path.dirname(os.path.abspath(__file__))  # folder containing this script
# colors.png serves as both the mask (shape) and the color source
alice_coloring = np.array(Image.open(os.path.join(d, 'colors.png')))
# font_path must point to a font with Chinese glyphs; 'simhei.ttf' is an assumed file name
my_wordcloud = WordCloud(font_path='simhei.ttf', background_color='#F0F8FF', max_words=100,
                         mask=alice_coloring, max_font_size=300,
                         random_state=42).generate(wordlist_space_split)
image_colors = ImageColorGenerator(alice_coloring)
my_wordcloud.recolor(color_func=image_colors)  # recolor in place with the image's colors
plt.imshow(my_wordcloud, interpolation='bilinear')  # display the word cloud as an image
plt.axis('off')  # hide the axes
plt.show()
my_wordcloud.to_file(os.path.join(d, 'brief_quanguo_colors_cloud.png'))

The result:

[Figure: nationwide word cloud of job requirements; saved as brief_quanguo_colors_cloud.png]
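
The word cloud shows frequent terms visually, and the analysis steps also call for the most frequent keywords as a list. A minimal sketch with collections.Counter, assuming the text has already been segmented (for example with jieba.lcut(brief_str)); the tokens and the small stopword set here are made up, and dropping single-character tokens is a crude heuristic:

```python
from collections import Counter

def top_keywords(tokens, stopwords=frozenset(), k=10):
    """Return the k most common tokens, skipping stopwords and tokens shorter than 2 chars."""
    counter = Counter(
        t for t in tokens
        if len(t.strip()) > 1 and t not in stopwords
    )
    return counter.most_common(k)

# Stand-in for jieba output; '熟悉' means "familiar with", '的' is a particle
tokens = ['熟悉', 'Python', '，', 'Django', '熟悉', 'Linux', 'Python', '的', 'Python', ' ']
result = top_keywords(tokens, stopwords={'的'}, k=3)
print(result)
```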

4 Beijing

4.1 Monthly salary distribution

# Listings whose location contains 北京 (Beijing), including districts such as 北京-朝陽區
df_beijing = df_clean_concat[df_clean_concat['gzdd'].str.contains('北京', na=False)]
df_beijing.to_excel('zhilian_kw_python_bj.xlsx')  # writing .xlsx needs an Excel engine such as openpyxl
print('Total rows: {}'.format(df_beijing.shape[0]))
# df_beijing.head()

Total rows: 6936

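Using the same plotting code as in the nationwide analysis, Beijing's salary distribution looks like this:

[Figure: monthly salary distribution in Beijing]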
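
Since the city charts reuse the nationwide plotting code, that code can be wrapped in a function. A sketch; the helper name plot_min_salary_hist and its parameters are hypothetical, and it returns the per-bin percentages so they can be checked or printed:

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_min_salary_hist(zwyx_min, bins, title, filename=None):
    """Histogram of minimum monthly salaries with a percentage label under each bin.

    Returns the figure and the per-bin percentages (hypothetical helper).
    """
    fig, ax = plt.subplots(figsize=(10, 4))
    # Raw counts, so count / total is each bin's share of postings
    counts, edges, _ = ax.hist(zwyx_min, bins, histtype='bar', facecolor='g', rwidth=0.8)
    ax.set_title(title, size=14)
    ax.set_xticks(edges)
    ax.set_xticklabels([int(e) for e in edges], rotation=-90)
    ax.tick_params(labelleft=False)
    percents = 100 * counts / counts.sum()
    centers = 0.5 * np.diff(edges) + edges[:-1]
    for pct, x in zip(percents, centers):
        ax.annotate('%0.0f%%' % pct, xy=(x, 0), xycoords=('data', 'axes fraction'),
                    xytext=(0, -40), textcoords='offset points',
                    va='top', ha='center', rotation=-90, color='b', size=14)
    if filename:
        fig.savefig(filename)
    return fig, percents
```

For Beijing this might be called as plot_min_salary_hist(df_beijing['zwyx_min'], [3000, 6000, 9000, 12000, 15000, 18000, 21000], 'Hist of min monthly salary in Beijing', 'salary_bj_min.jpg'), where the file name is only an example.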

4.2 Required skills

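# Concatenate Beijing's job descriptions (missing descriptions count as '')
brief_list_bj = list(df_beijing['brief'].fillna(''))
brief_str_bj = ''.join(brief_list_bj)
print(type(brief_str_bj))
# print(brief_str_bj)
# Save the text; run the word-cloud script on this file instead of brief_quanguo.txt
with open('brief_beijing.txt', 'w', encoding='utf-8') as f:
    f.write(brief_str_bj)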

<class 'str'>

The word cloud:

[Figure: word cloud of job requirements in Beijing]

5 Changsha

5.1 Monthly salary distribution

# Listings whose location contains 長沙 (Changsha)
df_changsha = df_clean_concat[df_clean_concat['gzdd'].str.contains('長沙', na=False)]
# df_changsha = df_changsha.reset_index(drop=True)
df_changsha.to_excel('zhilian_kw_python_cs.xlsx')
print('Total rows: {}'.format(df_changsha.shape[0]))
# df_changsha.tail()

Total rows: 280

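Using the same plotting code as in the nationwide analysis, Changsha's salary distribution looks like this:

[Figure: monthly salary distribution in Changsha]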
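
The two cities are compared here only through charts. A small numeric summary puts them side by side. A sketch using groupby(...).agg on made-up stand-ins for df_beijing and df_changsha; with the real frames, swap the stand-ins out:

```python
import pandas as pd

# Made-up stand-ins for df_beijing and df_changsha
df_bj = pd.DataFrame({'zwyx_min': [10001, 15001, 8001]})
df_cs = pd.DataFrame({'zwyx_min': [6001, 4001]})

# Tag each subset with its city, stack them, then summarize per city
both = pd.concat([df_bj.assign(city='北京'), df_cs.assign(city='長沙')], ignore_index=True)
summary = both.groupby('city')['zwyx_min'].agg(['count', 'median', 'mean'])
print(summary)
```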

5.2 Required skills

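# Concatenate Changsha's job descriptions (missing descriptions count as '')
brief_list_cs = list(df_changsha['brief'].fillna(''))
brief_str_cs = ''.join(brief_list_cs)
print(type(brief_str_cs))
# print(brief_str_cs)
# Save the text; run the word-cloud script on this file instead of brief_quanguo.txt
with open('brief_changsha.txt', 'w', encoding='utf-8') as f:
    f.write(brief_str_cs)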

<class 'str'>

The word cloud:

[Figure: word cloud of job requirements in Changsha]
