Columbia University Public Figures Face Dataset (PubFig)

Posted by 不務正業的猿 on 2020-11-07

Original text:

Introduction

The PubFig database is a large, real-world face dataset consisting of 58,797 images of 200 people collected from the internet. Unlike most other existing face datasets, these images are taken in completely uncontrolled situations with non-cooperative subjects. Thus, there is large variation in pose, lighting, expression, scene, camera, imaging conditions and parameters, etc. The PubFig dataset is similar in spirit to the Labeled Faces in the Wild (LFW) dataset created at UMass-Amherst, although there are some significant differences between the two:

  • LFW contains 13,233 images of 5,749 people, and is thus much broader than PubFig. However, it's also smaller and much shallower (many fewer images per person on average).

  • LFW is derived from the Names and Faces in the News work of T. Berg, et al. These images were originally collected using news sources online. For many people, there are often several images taken at the same event, with the person wearing similar clothing and in the same environment. Our paper at ICCV 2009 showed that this can often be exploited by algorithms to give unrealistic boosts in performance.

  • Of course, the PubFig dataset no doubt has biases of its own, and we welcome any attempts to categorize these.
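To make the breadth-versus-depth comparison concrete, the short calculation below derives the average number of images per person from the counts quoted above. It is only an illustration; the function name `images_per_person` is ours and is not part of either dataset's tooling.

```python
def images_per_person(num_images: int, num_people: int) -> float:
    """Average number of images available per identity."""
    return num_images / num_people


# Counts quoted in the text above.
pubfig = images_per_person(58_797, 200)
lfw = images_per_person(13_233, 5_749)

print(f"PubFig: {pubfig:.1f} images per person")
print(f"LFW:    {lfw:.1f} images per person")
```

With these figures PubFig averages roughly 294 images per person against about 2.3 for LFW, which is what "shallower" refers to.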

We have created a face verification benchmark on this dataset that tests the ability of algorithms to classify a pair of images as being of the same person or not. Importantly, these two people should have never been seen by the algorithm during training. In the future, we hope to create recognition benchmarks as well.
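The requirement that test identities are never seen during training can be sketched as follows. This is a minimal illustration, not the benchmark's actual protocol: the function names, the toy `images_by_person` mapping, and the split ratio are all assumptions made for the example.

```python
import itertools
import random


def split_identities(people, test_fraction=0.5, seed=0):
    """Split identities (not images) so train and test people never overlap."""
    shuffled = sorted(people)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return set(shuffled[n_test:]), set(shuffled[:n_test])


def make_pairs(images_by_person, people):
    """Return (image_a, image_b, same_person) tuples restricted to `people`."""
    labelled = [(img, p) for p in sorted(people) for img in images_by_person[p]]
    return [
        (img_a, img_b, person_a == person_b)
        for (img_a, person_a), (img_b, person_b)
        in itertools.combinations(labelled, 2)
    ]


# Hypothetical toy data: person name -> list of image filenames.
images_by_person = {
    "person_a": ["a1.jpg", "a2.jpg"],
    "person_b": ["b1.jpg", "b2.jpg"],
    "person_c": ["c1.jpg", "c2.jpg"],
    "person_d": ["d1.jpg", "d2.jpg"],
}

train_people, test_people = split_identities(images_by_person)
test_pairs = make_pairs(images_by_person, test_people)
```

Because the split is made over people rather than images, every pair in `test_pairs` involves only identities that the training side never saw.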

Citation

The database is made available only for non-commercial use. If you use this dataset, please cite the following paper:

"Attribute and Simile Classifiers for Face Verification,"

Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, and Shree K. Nayar,

International Conference on Computer Vision (ICCV), 2009.

 

News

  • December 23, 2010: Updated PubFig to v1.2. The changes are as follows:

    • We added md5 checksums for all images in the datafiles on the download page.

  • September 10, 2010: Updated PubFig to v1.1. The major changes are as follows:

    • We recomputed attribute values using updated classifiers, expanding to 73 attributes.

    • Attribute values now exist for the development set as well as the evaluation set (previously only the evaluation set had attribute values).

    • We updated the face rectangles for faces to be much tighter around the face, as opposed to the rather loose boundaries given before.

    • We removed 679 bad images, including non-jpegs, images with non-standard colorspaces, corrupted images, and images with very poor alignment.

    • We generated a new cross-validation set, taking into account these deleted images. We ran our algorithm with our new attribute classifiers on this set, obtaining a new curve.

    • We removed the verification subsets by pose, lighting, and expression, as they were not being used. Instead, we created a single datafile which contains the manual labels for these parameters.

    • Some of the datafile formats have changed slightly, to be more consistent with the others.

    • We added the Python script used to generate the output ROC curves.

    • We updated this website to be cleaner and easier to read.

  • December 21, 2009: Added face locations to dataset

  • December 2, 2009: Created website and publicly released v1.0 of dataset
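The md5 checksums added in v1.2 can be used to confirm that a downloaded image matches the one listed in the datafiles. The sketch below uses only Python's standard `hashlib`; the function names and the way the expected checksum is supplied are assumptions, since the datafile layout is not described here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """True if the file's md5 equals the expected value (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Reading in chunks keeps memory use flat regardless of image size, and a mismatch is a cheap signal that a file was truncated or altered during download.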

Related Projects

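  • Attribute and Simile Classifiers for Face Verification (Columbia)

  • FaceTracer: A Search Engine for Large Collections of Images with Faces (Columbia)

  • Labeled Faces in the Wild (UMass-Amherst)

  • Names and Faces (SUNY Stony Brook)

You can download the dataset from the official website; I have also shared a copy on Baidu Netdisk.

Link: Get the dataset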

 
