MS-Celeb-1M
2D Box
Classification
Face
License: Unknown

Overview

We select one million celebrities: real people who have, or have had, public attention.
The selection steps are described in detail in the following paragraphs. First, we select
a subset of entities from a knowledge base called Freebase [11], based on the information within
Freebase. In Freebase, each entity is identified by a unique key (called the machine identifier,
mid in [11]) and is associated with rich properties. More specifically, we select the entities
whose properties satisfy all three of the following conditions.

The object type of the entity is defined as “people.person” in Freebase. This condition means
that we select entities that are claimed (by Freebase) to be real persons in the world. We
don’t include movie characters, since their appearance is not strictly defined, especially
when a classic movie is remade.

The entities are required to have at least one of the properties unique to human beings,
such as “person’s name”, “place of birth”, “date of birth”, “person’s professions”.
This condition removes entities whose information is too sparse for us to collect and
label images. It also helps us remove some entities whose object type is mislabeled as
“people.person” in Freebase.

If the date of birth is available for a given entity in Freebase, the entity cannot be
selected if he or she was born before the mid-nineteenth century. The reason for this
condition is as follows. The first camera designed specifically for roll film, the “Kodak”,
was invented in 1888 [20] and started to become popular in the late nineteenth century.
We cannot rely on drawings or sculptures to recognize people’s faces, since whether
they are visually similar to the actual person could be subjective and arguable. An interesting
example is that the sculpture of John Harvard at Harvard University is claimed to be inspired
by a Harvard student, Sherman Hoar, rather than Harvard himself, since no one knew what John
Harvard had looked like [21].
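
The three conditions above can be read as a single filter over entities. The following is a minimal sketch of that reading, not the authors’ implementation: the `Entity` record, the property names in `PERSON_PROPERTIES`, and the cutoff year 1850 (the text only says “mid-nineteenth century”) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date

# Assumption: the text says "mid-nineteenth century" without naming a year.
BIRTH_YEAR_CUTOFF = 1850

# Assumption: hypothetical keys standing in for the person-specific properties.
PERSON_PROPERTIES = ("name", "place_of_birth", "date_of_birth", "profession")


@dataclass
class Entity:
    """Hypothetical stand-in for a knowledge-base entity."""
    mid: str                                        # machine identifier
    types: set = field(default_factory=set)         # object types
    properties: dict = field(default_factory=dict)  # property name -> value


def is_candidate(entity: Entity) -> bool:
    """Return True if the entity passes all three selection conditions."""
    # Condition 1: the object type must be "people.person".
    if "people.person" not in entity.types:
        return False
    # Condition 2: at least one person-specific property must be present.
    if not any(entity.properties.get(p) for p in PERSON_PROPERTIES):
        return False
    # Condition 3: reject only when a date of birth is known and too early.
    born = entity.properties.get("date_of_birth")
    if born is not None and born.year < BIRTH_YEAR_CUTOFF:
        return False
    return True


# Made-up entities, one per outcome.
modern = Entity("m.1", {"people.person"}, {"name": "A", "date_of_birth": date(1960, 1, 1)})
character = Entity("m.2", {"film.character"}, {"name": "B"})
sparse = Entity("m.3", {"people.person"}, {})
too_early = Entity("m.4", {"people.person"}, {"name": "C", "date_of_birth": date(1700, 1, 1)})
no_birth_date = Entity("m.5", {"people.person"}, {"profession": "actor"})

selected = [e.mid for e in (modern, character, sparse, too_early, no_birth_date)
            if is_candidate(e)]
```

Note that an entity with no recorded date of birth still passes the third condition, since the text applies the cutoff only when the date is available.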

In the second step, we rank all the entities in the above subset according to the frequency
of their occurrence on the web. Then, we select the top one million entities to form our
one-million celebrity list and provide their entity keys (mid) in Freebase. The occurrence
frequency of a given entity is obtained by counting how many documents contain this entity
in a large corpus of billions of documents from the web.
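
The ranking step amounts to computing a document frequency per entity and keeping the top N. The sketch below illustrates this under stated assumptions: `mentions_of` is a hypothetical function returning the entity keys found in a document (the text does not say how entities are detected), and ties are broken by key only to make the order deterministic.

```python
from collections import Counter


def document_frequency(documents, mentions_of):
    """Count, for each entity key, how many documents contain it."""
    df = Counter()
    for doc in documents:
        # set() ensures a document counts once per entity,
        # however many times the entity is mentioned in it.
        df.update(set(mentions_of(doc)))
    return df


def top_entities(df, candidates, n):
    """Return the n candidate keys with the highest document frequency."""
    # Assumption: ties are broken by key; the text does not specify this.
    ranked = sorted(candidates, key=lambda mid: (-df[mid], mid))
    return ranked[:n]


# Toy corpus: each "document" is already a list of entity keys.
docs = [["m.a", "m.b", "m.a"], ["m.a"], ["m.b", "m.c"], ["m.a"]]
df = document_frequency(docs, lambda doc: doc)
top_two = top_entities(df, ["m.a", "m.b", "m.c", "m.d"], 2)
```

In the setting described here, `n` would be one million and `candidates` would be the entities that passed the three selection conditions.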

Data Summary
Type: Image
Amount: 10000K
Size: --
Provided by: Microsoft
Microsoft Corporation (/ˈmaɪkroʊsɒft/) is an American multinational technology company with headquarters in Redmond, Washington. It develops, manufactures, licenses, supports, and sells computer software, consumer electronics, personal computers, and related services. Its best known software products are the Microsoft Windows line of operating systems, the Microsoft Office suite, and the Internet Explorer and Edge web browsers. Its flagship hardware products are the Xbox video game consoles and the Microsoft Surface lineup of touchscreen personal computers.

Copyright@Graviti