Dutch researchers estimate age and gender of Twitter users based on content shared
Researchers at a Dutch university have developed an online programme that is able to estimate the age and gender of users based purely on the content they post on the social network Twitter.
Using data from almost 3,000 Twitter users, who “tweet” messages of 140 characters or fewer, researchers at Twente University near the eastern city of Enschede have compiled lists of words and word sequences corresponding with different ages and specific genders.
Dong Nguyen, a doctoral student in computer sciences, said users simply had to enter their username into the online programme, which then calculated age and gender by comparing their last 200 tweets with the words and phrases in its database.
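The comparison Nguyen describes can be pictured with a minimal sketch. It is not the researchers' actual model, which the article does not detail: the word lists, the weights, the starting age and the function names below are all invented for illustration, and English words stand in for the Dutch vocabulary the real programme uses.

```python
import re
from collections import Counter

# Hypothetical word weights, invented purely for illustration.
# Negative gender weights lean male, positive lean female.
GENDER_WEIGHTS = {"football": -1.0, "nails": 1.0}
# Age weights shift the estimate down (younger) or up (older), in years.
AGE_WEIGHTS = {"school": -5.0, "homework": -4.0, "mortgage": 8.0, "colleague": 6.0}
BASE_AGE = 30.0  # assumed starting point, not a figure from the study


def tokenize(text):
    """Lower-case a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def estimate(tweets, max_tweets=200):
    """Guess gender and age from a user's most recent tweets."""
    recent = tweets[-max_tweets:]  # only the last 200 tweets are compared
    counts = Counter(tok for tweet in recent for tok in tokenize(tweet))

    # Sum the gender weights of every matched word.
    gender_score = sum(GENDER_WEIGHTS.get(w, 0.0) * c for w, c in counts.items())
    if gender_score < 0:
        gender = "male"
    elif gender_score > 0:
        gender = "female"
    else:
        gender = "unknown"

    # Average the age weights over the matched words only.
    matched = sum(c for w, c in counts.items() if w in AGE_WEIGHTS)
    shift = sum(AGE_WEIGHTS.get(w, 0.0) * c for w, c in counts.items())
    age = BASE_AGE + (shift / matched if matched else 0.0)
    return gender, age


sample = ["I love football", "Football tonight with a colleague"]
print(estimate(sample))
```

A real system would presumably learn its weights from the labelled tweets of the roughly 3,000 users rather than rely on a hand-written table like the one assumed here.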
“The distinction between men and women is actually very stereotypical,” said Nguyen, one of the participants in the project: “Men talk about football and women about their nails.”
“In terms of age, younger users talk about themselves a lot more and use a lot of emoticons while older people use longer words and sentences,” she told AFP.
Emoticons are icons, such as a “smiley face”, used to portray the writer’s mood or facial expression in a tweet.
The programme, which for now only analyses tweets in Dutch, has a margin of error of four years, which shrinks for younger users and grows for older ones.
“We note that users use more uniform language from about 35 years and older. There are larger differences between users aged 15 and 20 than there are between users aged 45 and 55,” Nguyen said.
Researchers at Twente, in collaboration with the Meertens Dutch language and culture institute, were looking at updating the programme for other languages and adapting it for popular networking sites such as Facebook.