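About the author: Xu Lin works at the Product Technology Center of Vipshop (唯品會) in Shanghai. Trained in statistics at Columbia University and a self-described "data dog", he works on data mining and analysis and likes using R and Python to play with unusual data.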

Personal WeChat official account: 資料森麟 (ID: shujusenlin). He also writes a Zhihu column under the same name.

Preface:

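The Internet industry keeps growing, and it has attracted more and more talented people. Many others are eager to try their luck and make their mark in the Internet wave. In this article we use data from Kanzhun (看準網) to give you a broad overview of the major Internet companies.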

Data source

Lifting the veil on Internet companies: a data look at the companies that slay the whole industry

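Kanzhun hosts a large number of employee reviews of companies. From these reviews we extract the data we need:

- the overall rating
- interview difficulty
- the recommendation rate
- the outlook (how optimistic employees are about the company's prospects)
- the CEO approval rate

The code is as follows: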


```python
## Fetch company information
import re
import time
from urllib import request
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Assumes a global Selenium WebDriver named `driver` already exists
# (for example created with selenium's webdriver.Chrome()), and that the two
# image folders used below already exist.


def leading_number(s):
    """Return the number at the start of a string such as '5%', '85%' or '100%' as a float."""
    return float(re.match(r'\d+(?:\.\d+)?', s).group())


def get_company_info(num, headers):
    ## Review data
    # The prefix of this URL is missing from the source article; fill it in before running.
    url = '...' + str(num) + '.html?ka=com-blocker1-review'
    js = 'window.open("' + url + '")'
    driver.execute_script(js)                          # open the page in a new window
    time.sleep(5)                                      # give it time to load
    driver.close()                                     # close the previous window
    driver.switch_to.window(driver.window_handles[0])  # switch to the remaining (new) window
    bsObj = BeautifulSoup(driver.page_source, "html.parser")

    # Review tags: drop tabs/newlines, turn parentheses into spaces, split on spaces,
    # then pair up alternating entries into a dictionary (as written, the final pair is not included)
    tag = (bsObj.find('div', attrs={'class': 'all_item'}).text
           .replace('\t', '').replace('\n', '')
           .replace('(', ' ').replace(')', ' ')
           .split(' '))
    tag = tag[:-1]
    this_tag = {tag[i * 2]: tag[i * 2 + 1] for i in range(len(tag) // 2 - 1)}

    this_name = bsObj.find('div', attrs={'class': 'co_name t_center'}).text
    this_overal = float(bsObj.find('div', attrs={'class': 'res_box_star f_right'}).find('em').text)

    # Recommendation rate, outlook and CEO approval are percentages; rescale them to 0-5
    points = bsObj.find('ul', attrs={'class': 'score_rate clearfix'}).text.replace('\n', ' ').split()
    this_recommend = leading_number(points[0]) / 100 * 5
    this_future = leading_number(points[2]) / 100 * 5
    this_ceo = leading_number(points[4]) / 100 * 5

    ## CEO portrait and company logo
    ceo_info = bsObj.find('div', attrs={'class': 'ceo_info'})
    ceo_pic = ceo_info.find('div').find('img').attrs['src']
    ceo_name = ceo_info.find('p').text
    head_logo = bsObj.find('div', attrs={'class': 'com_logo f_left'}).find('img').attrs['src']
    head_loc = 'D:/爬蟲/看準/公司logo/' + this_name + '.jpg'
    ceo_loc = 'D:/爬蟲/看準/CEOlogo/' + this_name + '.jpg'
    request.urlretrieve(head_logo, head_loc)
    request.urlretrieve(ceo_pic, ceo_loc)

    ## Interview difficulty
    # The prefix of this URL is also missing from the source article.
    url = '...' + str(num) + '.html?ka=com-floater-interview'
    js = 'window.open("' + url + '")'
    driver.execute_script(js)
    time.sleep(5)
    driver.close()
    driver.switch_to.window(driver.window_handles[0])
    # The interview page is parsed from a separate urllib request that sends `headers`
    req = request.Request(url, headers=headers)
    html = urlopen(req)
    bsObj = BeautifulSoup(html.read(), "html.parser")

    this_difficulty = float(bsObj.find('section', attrs={'class': 'interview_feel'}).find('em').text)
    # Three interview-experience percentages, weighted 5 / 3 / 1
    feeling = bsObj.find('ul', attrs={'class': 'score_list'}).find_all('span', attrs={'class': 'percent'})
    feeling = [float(k.text.replace('%', '')) for k in feeling]
    this_feeling = (feeling[0] * 5 + feeling[1] * 3 + feeling[2] * 1) / 100

    ## Combine everything into a dictionary
    this_company = {'name': this_name, 'overal': this_overal, 'comments': tag[1],
                    'recommend': this_recommend, 'future': this_future, 'ceo': this_ceo,
                    'difficulty': this_difficulty, 'feeling': this_feeling}
    return this_company, this_tag, this_name
```
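Two pieces of arithmetic in `get_company_info` are easy to miss:

- Each percentage (recommendation rate, outlook, CEO approval) is rescaled to the same 0 to 5 range as the overall rating, using p / 100 × 5.
- The three interview-experience percentages are combined with weights 5, 3 and 1.

The minimal sketch below repeats both calculations on plain numbers. The helper names and sample inputs are hypothetical and only illustrate the formulas.

```python
# Hypothetical helpers that mirror the formulas used in get_company_info


def percent_to_five_point(p):
    """Rescale a percentage in [0, 100] to the 0-5 range: p / 100 * 5."""
    return p / 100 * 5


def weighted_feeling(percents):
    """Weight three interview-experience percentages by 5, 3 and 1.

    Same as (f[0] * 5 + f[1] * 3 + f[2] * 1) / 100; if the three shares sum
    to 100, the result lies between 1 and 5.
    """
    weights = (5, 3, 1)
    return sum(w * p for w, p in zip(weights, percents)) / 100


# Hypothetical sample values
print(percent_to_five_point(85))       # 4.25
print(weighted_feeling([60, 30, 10]))  # 4.0
```

If the three percentages add up to 100, the weighted score behaves like an average on a 1 to 5 scale. This keeps it roughly comparable with the other ratings.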

Overall comparison

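We finally selected 50 Internet companies as our sample. The choice was based mainly on the 2018 ranking of the top 100 Internet companies, combined with the actual number of reviews each company has on Kanzhun. A mosaic of the selected companies' logos is shown below; Part 4 explains how the images were stitched together: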

[Figure: mosaic of the 50 selected company logos]

First, we compare the TOP15 companies on each rating metric:

[Figure: TOP15 companies for each rating metric]

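BAT (Baidu, Alibaba and Tencent) rank near the top on every metric. NetEase also takes leading positions in several rankings, and Tencent is No. 1 in all of them.

Next, interview difficulty: we took the 15 companies with the highest and the 15 with the lowest interview-difficulty scores (TOP15 and BOTTOM15). Treat these figures as a rough reference only. In my experience, difficulty also varies a great deal between departments and positions within the same company.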

[Figure: TOP15 and BOTTOM15 companies by interview difficulty]

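Many of the companies with lower interview difficulty are very good employers. Again, the data is only a reference: what really decides an interview is the candidate's actual ability. As the saying goes, nothing is hard for those who know, and those who find it hard don't know. A true expert can slay the room however tough the interview.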



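```python
## Bar chart of the TOP15 by overall rating
# Written against the pyecharts 0.5.x API (Bar.add with keyword options);
# pyecharts 1.x uses a different, chained API.
import pandas as pd
from pyecharts import Bar

company = pd.read_excel('company_info.xlsx')
company_overal = company.sort_values('overal', ascending=False)[0:15]
attr = company_overal['name']
v1 = round(company_overal['overal'], 2)
bar = Bar("Overall rating TOP15", title_pos='center')
bar.use_theme('essos')
bar.add("", attr, v1, is_stack=False, xaxis_rotate=30, yaxis_min=3.7, is_label_show=True,
        xaxis_interval=0,  # show every company name on the x axis
        is_splitline_show=False)
bar.render('整體評分TOP15.html')
```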
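The chart code above covers only the overall-rating TOP15. The interview-difficulty TOP15 and BOTTOM15 discussed earlier can be picked out the same way.

The sketch below uses pandas `DataFrame.nlargest` and `DataFrame.nsmallest`. When there are no ties, these should give the same rows as sorting and slicing. The DataFrame is made up (only the column names follow `company_info.xlsx`), and `n = 2` stands in for 15.

```python
import pandas as pd

# Made-up sample with the same column names as company_info.xlsx
company = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "overal": [4.1, 3.6, 3.9, 4.3, 3.4],
    "difficulty": [3.2, 2.5, 2.9, 3.0, 2.4],
})

n = 2  # the article uses 15
hardest = company.nlargest(n, "difficulty")   # TOP n by interview difficulty
easiest = company.nsmallest(n, "difficulty")  # BOTTOM n by interview difficulty

print(hardest[["name", "difficulty"]])
print(easiest[["name", "difficulty"]])
```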

Radar charts

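So far we have compared companies with one another. Next we look at individual companies across the different dimensions, using BAT and TMD as examples; other companies can be compared in the same way. First, BAT: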

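BAT truly live up to their status as industry benchmarks, slaying the whole industry on every metric. Next, the three TMD companies (Toutiao, Meituan and Didi), which had been growing rapidly: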

[Figure: radar chart of the three TMD companies against the industry average]

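Compared with the industry as a whole, the three TMD companies are also ahead, which reflects their strong momentum. Finally, here are three companies closely tied to me; friends who know me can surely guess which three: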

[Figure: radar chart of the three companies closest to the author]
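The radar code below selects the five dimensions by column position (`iloc[:, [1, 3, 4, 5, 6]]`). The article does not show how `company_info.xlsx` was written.

Assume it was built from the dictionaries returned by `get_company_info`, with columns in the dictionaries' key order. Recent pandas versions preserve that order when building a DataFrame from a list of dicts. Under that assumption, those positions land exactly on the five radar axes. The quick check below uses one hypothetical record.

```python
import pandas as pd

# One hypothetical record with the same keys, in the same order, as `this_company`
record = {"name": "X", "overal": 4.0, "comments": "100", "recommend": 4.5,
          "future": 4.0, "ceo": 4.6, "difficulty": 3.0, "feeling": 4.2}
company = pd.DataFrame([record])

radar_cols = list(company.columns[[1, 3, 4, 5, 6]])
print(radar_cols)  # ['overal', 'recommend', 'future', 'ceo', 'difficulty']

# Selecting the same values by column name does not depend on column order
values = list(company.loc[0, radar_cols])
print(values)
```

Selecting by name, as in the last lines, would keep the radar correct even if the spreadsheet's column order changed.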


```python
## Radar chart: three companies against the industry average (pyecharts 0.5.x API)
from pyecharts import Radar

# `company` is the DataFrame read from company_info.xlsx above.
# Columns 1, 3, 4, 5, 6 are overal, recommend, future, ceo and difficulty,
# matching the five c_schema axes in order.
cols = [1, 3, 4, 5, 6]
value_avg = [list(company.iloc[:, cols].mean())]
value_company0 = [list(company.iloc[0, cols])]
value_company1 = [list(company.iloc[1, cols])]
value_company2 = [list(company.iloc[2, cols])]

c_schema = [{"name": "Overall rating", "max": 4.4, "min": 3.2},
            {"name": "Recommendation", "max": 4.75, "min": 2.4},
            {"name": "Outlook", "max": 4.25, "min": 1},
            {"name": "CEO/Chairman approval", "max": 4.8, "min": 3},
            {"name": "Interview difficulty", "max": 3.4, "min": 2.3}]

radar = Radar()
radar.use_theme('essos')
radar.config(c_schema=c_schema, shape='circle')
radar.add(company['name'][0], value_company0, item_color="blue", symbol=None, linewidth=5)
radar.add(company['name'][1], value_company1, item_color="orange", symbol=None, linewidth=5)
radar.add(company['name'][2], value_company2, item_color="red", symbol=None, linewidth=5)
radar.add("Industry average", value_avg, item_color="purple", symbol=None, linewidth=5,
          legend_selectedmode='multiple')
radar.render('bat.html')
```
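The axis ranges in `c_schema` above are hard-coded numbers. To reuse the chart with other companies, one option is to derive each axis range from the data and widen it slightly so no company touches the edge.

This is a sketch of that alternative, not the article's method. The helper `build_schema`, the 5% margin and the sample values are all hypothetical.

```python
import pandas as pd


def build_schema(df, columns, labels, margin=0.05):
    """Hypothetical helper: build a radar schema as a list of {name, max, min}
    dicts, the same shape as c_schema above.

    Each axis spans the observed range of its column, widened on both sides
    by `margin` times that range.
    """
    schema = []
    for col, label in zip(columns, labels):
        lo, hi = float(df[col].min()), float(df[col].max())
        pad = (hi - lo) * margin
        schema.append({"name": label, "max": hi + pad, "min": lo - pad})
    return schema


# Hypothetical sample data
sample = pd.DataFrame({"overal": [3.4, 4.4, 3.9], "difficulty": [2.3, 3.3, 2.8]})
schema = build_schema(sample, ["overal", "difficulty"],
                      ["Overall rating", "Interview difficulty"])
for axis in schema:
    print(axis["name"], round(axis["min"], 3), round(axis["max"], 3))
```

The resulting list has the same keys as `c_schema`, so it could be passed to `radar.config(c_schema=..., shape='circle')` in place of the hard-coded one.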

Image stitching

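Kanzhun also provides each company's logo and portraits of its executives, so we took the liberty of using them for some simple image stitching to build one large composite picture.

The underlying idea is array concatenation with numpy. A color image can be treated as a three-dimensional array (height × width × color channels), and a grayscale image as a two-dimensional array (height × width). Stitching images together therefore only takes numpy's array-joining functions.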
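The principle can be checked on tiny synthetic arrays before touching real images. In the sketch below, each "image" is a small array filled with its own index.

- `np.hstack` joins arrays side by side (along the width, axis 1).
- `np.vstack` stacks them top to bottom (along the height, axis 0).

A grid is therefore a `vstack` of `hstack`ed rows. Color and grayscale arrays behave the same way. The tile and grid sizes are arbitrary.

```python
import numpy as np

h, w = 4, 5  # arbitrary tile size
# Six synthetic "color images" of shape (h, w, 3), each filled with its own index
tiles = [np.full((h, w, 3), k, dtype=np.uint8) for k in range(6)]

row0 = np.hstack(tiles[0:3])    # side by side: shape (h, 3*w, 3)
row1 = np.hstack(tiles[3:6])
grid = np.vstack((row0, row1))  # rows stacked: shape (2*h, 3*w, 3)
print(grid.shape)  # (8, 15, 3)

# Grayscale images are 2-D arrays and stack the same way
gray_row = np.hstack((np.zeros((h, w)), np.ones((h, w))))  # shape (h, 2*w)
gray = np.vstack((gray_row, gray_row))                       # shape (2*h, 2*w)
print(gray.shape)  # (8, 10)
```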

The code is as follows:



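```python
## Stitch the company logos into a 5*10 mosaic
import os

import cv2
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import numpy as np

logo_dir = "D:/爬蟲/看準/公司logo/"  # folder that get_company_info saves the logos to
for i, filename in enumerate(sorted(os.listdir(logo_dir))):  # sorted for a stable order
    img = mpimg.imread(os.path.join(logo_dir, filename))[:, :, 0:3]  # keep RGB, drop alpha if present
    img = cv2.resize(img, (180, 180), interpolation=cv2.INTER_AREA)
    if i % 10 == 0:          # first logo of a row
        row_img = img
    elif i == 9:             # last logo of the first row: start the mosaic
        row_img = np.hstack((row_img, img))
        all_img = row_img
    elif i % 10 == 9:        # last logo of a later row: append the row below
        row_img = np.hstack((row_img, img))
        all_img = np.vstack((all_img, row_img))
    else:                    # inside a row
        row_img = np.hstack((row_img, img))
plt.imshow(all_img)
plt.axis('off')
plt.show()

## Stitch the executive portraits into a 7*7 mosaic
ceo_dir = "D:/爬蟲/看準/CEOlogo/"
for i, filename in enumerate(sorted(os.listdir(ceo_dir))):
    img = mpimg.imread(os.path.join(ceo_dir, filename))[:, :, 0:3]
    img = cv2.resize(img, (500, 500), interpolation=cv2.INTER_CUBIC)
    if i % 7 == 0:
        row_img = img
    elif i == 6:
        row_img = np.hstack((row_img, img))
        all_img = row_img
    elif i % 7 == 6:
        row_img = np.hstack((row_img, img))
        all_img = np.vstack((all_img, row_img))
    else:
        row_img = np.hstack((row_img, img))
plt.imshow(all_img)
plt.axis('off')
plt.show()
```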
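Both loops above rely on index bookkeeping and assume the folder contains a whole number of rows. Images in an unfinished last row are never `vstack`ed, so they are silently left out. With fewer images than one row, `all_img` is never created at all.

A possible generalization is sketched below with a hypothetical helper `make_grid`. It slices the tile list into rows and pads the last row with blank tiles. It expects tiles that already have the same shape, for example the output of `cv2.resize` in the loops above. Here it is exercised on synthetic arrays.

```python
import numpy as np


def make_grid(tiles, ncols, fill=255):
    """Hypothetical helper: arrange equally shaped image arrays into a grid.

    Each row holds `ncols` tiles. If the number of tiles is not a multiple of
    `ncols`, the last row is padded with blank tiles (all pixels = `fill`;
    use fill=1.0 for float images scaled to [0, 1]) instead of being dropped.
    """
    if not tiles:
        raise ValueError("no tiles to arrange")
    blank = np.full_like(tiles[0], fill)
    padded = list(tiles) + [blank] * (-len(tiles) % ncols)
    rows = [np.hstack(padded[i:i + ncols]) for i in range(0, len(padded), ncols)]
    return np.vstack(rows)


# Synthetic example: 5 tiles of 2x2 RGB pixels in a 3-column grid
tiles = [np.full((2, 2, 3), k, dtype=np.uint8) for k in range(5)]
grid = make_grid(tiles, ncols=3)
print(grid.shape)  # (4, 6, 3)
```

With the loops above, one would append each resized image to a list and then call `make_grid(tiles, 10)` for the logos or `make_grid(tiles, 7)` for the portraits.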

Here is the result. Can you identify every logo at a glance?

[Figure: the stitched 5 x 10 logo mosaic]

Finally, the executives' mosaic. Which big shot did you notice first? Perhaps that person will be your future boss.

[Figure: the stitched 7 x 7 mosaic of executive portraits]

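You are welcome to join the conversation in the comments: tell us about Internet companies you know, or the company you would most like to work for. If you would rather not reveal your name, send us a private message through the account and we will publish it anonymously. Everyone is welcome to share.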

