License: CC BY-NC-ND 3.0


A Dataset With Over 100,000 Face Images of 530 People

Large face datasets are important for advancing face recognition research, but they are tedious to build because a lot of work has to go into cleaning the huge amount of raw data. To facilitate this task, we developed an approach to building face datasets that detects faces in images returned from Internet searches for public figures and then automatically discards the faces that do not belong to the queried person.
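
The automatic discarding step can be illustrated with a minimal sketch. It is not the FaceScrub authors' algorithm. It swaps in a simple robust-center heuristic and assumes that each detected face has already been mapped to a numeric feature vector by some face descriptor (the descriptor is an assumption). Faces whose vector lies farther than a threshold from the per-dimension median of all faces returned for the same query are treated as other people and dropped. The function name filter_query_faces and the threshold parameter are hypothetical.

```python
import math
from statistics import median


def filter_query_faces(features, threshold):
    """Keep faces close to the consensus of one person's search results.

    Hypothetical helper, not the FaceScrub method.
    features:  list of equal-length feature vectors, one per detected face
               (assumes some face descriptor produced them).
    threshold: assumed maximum Euclidean distance from the median center.
    Returns the indices of the faces that are kept.
    """
    if not features:
        return []
    dims = len(features[0])
    # The per-dimension median is robust to a minority of faces that
    # belong to people other than the queried one.
    center = [median(f[d] for f in features) for d in range(dims)]
    # Keep only faces within the threshold distance of the center.
    return [i for i, f in enumerate(features) if math.dist(f, center) <= threshold]


# Three similar faces and one outlier returned for the same query.
faces = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [5.0, 5.0]]
print(filter_query_faces(faces, threshold=1.0))  # [0, 1, 2]
```

The median is used instead of the mean so that a few wrong faces in the search results do not pull the center toward themselves; a real pipeline would still need manual checking, as was done for FaceScrub.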

The FaceScrub dataset was created using this approach, followed by manual checking and cleaning of the results. It comprises a total of 106,863 face images* of 530 male and female celebrities, with about 200 images per person. As such, it is one of the largest public face databases.

The images were retrieved from the Internet and were taken in real-world situations (uncontrolled conditions). Name and gender annotations of the faces are included.

FaceScrub     Male      Female    Total
# people      265       265       530
# images      55,306    51,557    106,863

Data Summary
Provided by
Vintage (Vision & InterAction Group)
Led by Dr. Stefan Winkler, Vintage focuses its research on perception-based image/video processing and analysis. Vintage is now part of the School of Computing at the National University of Singapore (NUS).