The PubFig database is a large, real-world face dataset consisting of 58,797 images of 200 people collected from the internet. Unlike most other existing face datasets, these images are taken in completely uncontrolled situations with non-cooperative subjects. Thus, there is large variation in pose, lighting, expression, scene, camera, imaging conditions and parameters, etc. The PubFig dataset is similar in spirit to the Labeled Faces in the Wild (LFW) dataset created at UMass-Amherst, although there are some significant differences between the two:
LFW contains 13,233 images of 5,749 people, and is thus much broader than PubFig. However, it's also smaller and much shallower (many fewer images per person on average).
LFW is derived from the Names and Faces in the News work of T. Berg, et al. These images were originally collected using news sources online. For many people, there are often several images taken at the same event, with the person wearing similar clothing and in the same environment. Our paper at ICCV 2009 showed that this can often be exploited by algorithms to give unrealistic boosts in performance.
Of course, the PubFig dataset no doubt has biases of its own, and we welcome any attempts to categorize these.
We have created a face verification benchmark on this dataset that tests the ability of algorithms to classify a pair of images as being of the same person or not. Importantly, the people in a test pair should never have been seen by the algorithm during training. In the future, we hope to create recognition benchmarks as well.
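As a rough illustration of how a verification benchmark of this kind can be scored, the sketch below computes ROC points from per-pair similarity scores and checks that training and test identities do not overlap. It is not the script distributed with the dataset. The function names, the score convention (higher means "same person"), and the label encoding (1 for same, 0 for different) are assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false_positive_rate, true_positive_rate) points, one per distinct score.

    scores: higher means "more likely the same person" (assumed convention).
    labels: 1 for a same-person pair, 0 for a different-person pair (assumed).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one same pair and one different pair")

    # Rank pairs from most to least confident "same person".
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every pair tied at this score so ties yield a single point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def disjoint_identities(train_people, test_people) -> bool:
    """True if no person appears in both the training and the test set."""
    return set(train_people).isdisjoint(test_people)
```

Calling `disjoint_identities` before evaluation is a cheap guard for the requirement that test identities are unseen during training. `roc_points` can then be applied to whatever scores the verification algorithm assigns to the benchmark pairs.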
The database is made available only for non-commercial use. If you use this dataset, please cite the following paper:
"Attribute and Simile Classifiers for Face Verification,"
Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, and Shree K. Nayar,
International Conference on Computer Vision (ICCV), 2009.
[bibtex] [pdf] [webpage]
News
December 23, 2010: Updated PubFig to v1.2. The changes are as follows:
We added md5 checksums for all images in the datafiles on the download page.
September 10, 2010: Updated PubFig to v1.1. The major changes are as follows:
We recomputed attribute values using updated classifiers, expanding to 73 attributes.
Attribute values now exist for the development set as well as the evaluation set (previously only the evaluation set had attribute values).
We updated the face rectangles for faces to be much tighter around the face, as opposed to the rather loose boundaries given before.
We removed 679 bad images, including non-jpegs, images with non-standard colorspaces, corrupted images, and images with very poor alignment.
We generated a new cross-validation set, taking into account these deleted images. We ran our algorithm with our new attribute classifiers on this set, obtaining a new curve.
We removed the verification subsets by pose, lighting, and expression, as they were not being used. Instead, we created a single datafile which contains the manual labels for these parameters.
Some of the datafile formats have changed slightly, to be more consistent with the others.
We added the Python script used to generate the output ROC curves.
We updated this website to be cleaner and easier to read.
December 21, 2009: Added face locations to dataset
December 2, 2009: Created website and publicly released v1.0 of dataset
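The md5 checksums added in v1.2 can be used to confirm that a downloaded image matches the one listed in the datafiles. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names are hypothetical, and it assumes you have already read the expected checksum for an image out of the datafile, whose exact layout is not described here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20) -> str:
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large images need not fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5: str) -> bool:
    """Compare a file's md5 to an expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

An image whose checksum does not match may have changed at its source since the dataset was compiled, or may have been corrupted in transfer, so it is reasonable to exclude it from evaluation.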
Related Projects
Attribute and Simile Classifiers for Face Verification (Columbia)
FaceTracer: A Search Engine for Large Collections of Images with Faces (Columbia)
Labeled Faces in the Wild (UMass-Amherst)
Names and Faces (SUNY-Stonybrook)