MS-Celeb-1M
2D Box
Classification
Face
License: Unknown

Overview

We select one million celebrities: real people who receive, or have received, public attention. The selection steps are described in detail in the following paragraphs. First, we select a subset of entities from a knowledge base called Freebase [11], based on the information within Freebase. In Freebase, each entity is identified by a unique key (called a machine identifier, or mid, in [11]) and is associated with rich properties. More specifically, we select the entities whose properties satisfy all three of the following conditions.

The object type of the entity is defined as “people.person” in Freebase. This condition means that we select entities that Freebase claims are real people. We don’t include movie characters, since their appearance is not strictly defined, especially when a classic movie is remade.

The entities are required to have at least one property specific to human beings, such as “person’s name”, “place of birth”, “date of birth”, or “person’s professions”. This condition removes entities whose information is too sparse for us to collect and label images. It also helps us remove some entities whose object type is mislabeled as “people.person” in Freebase.

If the date of birth is available for a given entity in Freebase, the entity is not selected if he or she was born before the mid-nineteenth century. The reason for this condition is as follows. The first specialized roll-film camera, the “Kodak”, was invented in 1888 [20] and became popular in the late nineteenth century, so people born much earlier are unlikely to have been photographed. We cannot rely on drawings or sculptures to recognize people’s faces, since whether they visually resemble the actual person is subjective and debatable. An interesting example is the statue of John Harvard at Harvard University, which is claimed to have been modeled on a Harvard student, Sherman Hoar, rather than on Harvard himself, since no one knew what John Harvard looked like [21].
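
The three conditions above can be read as a filter over Freebase entities. The following is a minimal Python sketch, not the authors’ implementation: the `Entity` record, the property keys in `HUMAN_PROPERTIES`, storing the date of birth as a bare year, and the cutoff year 1850 (the text only says “mid-nineteenth century”) are all hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical cutoff: the text only says "mid-nineteenth century".
BIRTH_YEAR_CUTOFF = 1850

# Hypothetical keys standing in for Freebase's human-specific properties.
HUMAN_PROPERTIES = ("name", "place_of_birth", "date_of_birth", "profession")


@dataclass
class Entity:
    mid: str          # Freebase machine identifier
    object_type: str  # e.g. "people.person"
    properties: Dict[str, Any] = field(default_factory=dict)


def is_candidate(e: Entity) -> bool:
    # Condition 1: Freebase types the entity as a real person.
    if e.object_type != "people.person":
        return False
    # Condition 2: at least one human-specific property is present.
    if not any(p in e.properties for p in HUMAN_PROPERTIES):
        return False
    # Condition 3: a known birth year (simplified to an int here)
    # must not predate the cutoff; unknown birth dates pass.
    year = e.properties.get("date_of_birth")
    if year is not None and year < BIRTH_YEAR_CUTOFF:
        return False
    return True


def select_candidates(entities: List[Entity]) -> List[Entity]:
    # Keep only entities that satisfy all three conditions.
    return [e for e in entities if is_candidate(e)]


if __name__ == "__main__":
    demo = [
        Entity("m.001", "people.person", {"name": "A", "date_of_birth": 1960}),
        Entity("m.002", "people.person", {"name": "B", "date_of_birth": 1700}),
        Entity("m.003", "fictional_universe.fictional_character", {"name": "C"}),
        Entity("m.004", "people.person"),
    ]
    print([e.mid for e in select_candidates(demo)])  # ['m.001']
```

Note that an entity with no recorded birth date passes the third check, matching the “if the date of birth is available” wording.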

In the second step, we rank all entities in the subset above by how frequently they occur on the web. We then select the top one million entities to form our one-million-celebrity list and provide their Freebase entity keys (mids). The occurrence frequency of a given entity is obtained by counting how many documents contain it in a large corpus of billions of web documents.
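
The second step amounts to a document-frequency ranking followed by a top-k cut. Below is a minimal Python sketch under stated assumptions: each document is reduced to the set of mids it mentions (how mentions are detected in the corpus is not described here), ties are broken by mid, and `document_frequency` and `top_k_entities` are hypothetical helper names.

```python
from collections import Counter
from typing import Iterable, List, Set


def document_frequency(docs: Iterable[Set[str]], candidates: Set[str]) -> Counter:
    """Count how many documents mention each candidate mid.

    Each document is represented as the set of entity mids found in it,
    a hypothetical simplification of the real mention-detection step.
    """
    df: Counter = Counter()
    for doc_entities in docs:
        # Intersect with the candidate set so each entity counts at most
        # once per document and non-candidates are ignored.
        df.update(set(doc_entities) & candidates)
    return df


def top_k_entities(docs: Iterable[Set[str]], candidates: Set[str],
                   k: int = 1_000_000) -> List[str]:
    df = document_frequency(docs, candidates)
    # Rank by descending document frequency; ties are broken by mid so
    # the output is deterministic (the tie rule is an assumption).
    ranked = sorted(candidates, key=lambda mid: (-df[mid], mid))
    return ranked[:k]


if __name__ == "__main__":
    corpus = [{"m.a", "m.b"}, {"m.a"}, {"m.a", "m.c"}, {"m.b", "m.x"}]
    print(top_k_entities(corpus, {"m.a", "m.b", "m.c"}, k=2))  # ['m.a', 'm.b']
```

At the scale described (billions of documents), the counting would in practice be distributed, and `heapq.nlargest` could replace the full sort when only the top k entries are needed.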

Data Summary
Type: Image
Amount: 10000K (10 million)
Size: --
Provided by: Microsoft
Microsoft Corporation (/ˈmaɪkroʊsɒft/) is an American multinational technology company with headquarters in Redmond, Washington. It develops, manufactures, licenses, supports, and sells computer software, consumer electronics, personal computers, and related services. Its best known software products are the Microsoft Windows line of operating systems, the Microsoft Office suite, and the Internet Explorer and Edge web browsers. Its flagship hardware products are the Xbox video game consoles and the Microsoft Surface lineup of touchscreen personal computers.