The penetration and use of social media services differ from city to city. This paper investigates the social dynamics of Twitter usage in three ethnically diverse cities: London, Paris, and New York City. We present a spatial analysis of tweeting activity in the three cities, broken down by ethnicity and gender. We model the ethnic identity of Twitter users using their paired forenames and surnames. The geo–tagged Tweets provide an insight into the geography of their activity patterns across the three cities. The gender of each Twitter user is identified through classification of forenames, suggesting that, irrespective of ethnic identity, the majority of Twitter users are male. Taken together, the results present a window on the activity patterns of different ethnic groups.
3. Areas of tweeting activity
4. Hourly and daily Twitter activity
5. Name and ethnicity analysis of Twitter users
6. Geography of tweets of different ethnic groups
7. Gender analysis
8. Conclusion and future work
Microblogging services such as Twitter have become an important part of daily activities of millions of users in most countries around the world. These services are used not only for communicating with friends, family, and colleagues, but also for real–time news feeds and content sharing venues (Pennacchiotti and Popescu, 2011).
According to recent figures, the Twitter service has more than 200 million active users around the world (Twitter, 2012a). Its major user base is in European countries: in the context of the present paper, usage in the cities of London, New York, and Paris is the third, fifth, and seventh highest in the world, respectively (Bennett, 2012). Twitter users generate a huge quantity of data every day, and our motivation here is to investigate what these data can tell us about the gendered and ethnically diverse activity patterns of users. There are potential applications of this analysis in cyber–security and cyber–marketing.
Analysis of the social dynamics of a city through social media is a promising research area. Pennacchiotti and Popescu (2011) used a machine learning approach to infer political orientation, gender, and ethnicity of Twitter users. However, the classification was limited to two ethnic groups. Other related work includes: the use of a network metric to compare the social dynamics of Twitter usage with those of physical communities (Quercia, et al., 2012); use of Foursquare social media data and a clustering model to study the composition of a city (Cranshaw, et al., 2012); and the application of a latent attribute inference method to infer the age, gender, and political affiliation of Twitter users (Al Zamal, et al., 2012). However, the investigation of the ethnic diversity of users of social media services is an area which has remained unexplored.
Our research has focused on the use of names to classify users to different social and ethnic groups. The study of names has deep roots in different fields, including epidemiology, linguistics, geography, and genetics (Colantonio, et al., 2003; Mateos and Tucker, 2008). In our own research, the names of Twitter users (particularly their surnames) are used to provide an indication of the probable cultural, ethnic and linguistic characteristics of individuals.
We use the Onomap classification (Mateos, et al., 2011) to assign Twitter users to different cultural, ethnic and linguistic groups, using their forenames and surnames. Onomap uses name frequency datasets derived from electoral registers and telephone directories. The classification used here was created from a database representative of nearly a billion individuals drawn from 26 different countries of the world (Worldnames, 2013): as such, it offers global coverage in its names classification, based upon analysis of longstanding and recent migrant populations from around the world.
Twitter users produce a huge amount of data every day. Twitter provides the facility for programmers and developers to download live Twitter data using its public streaming API (Twitter, 2012b). The API provides a one percent sample of live geo–tagged Tweets within a user specified bounding rectangle, at any point in time.
For this paper, the Twitter Streaming API was used to download geo–tagged Tweets for London, Paris, and New York City over the period 10 September–20 December 2012 (102 days). The numbers of geo–tagged Tweets for each city were:
New York City: 646,053
The data downloaded from the API included the ‘User Name’, ‘Default Profile Language’, ‘Latitude of the Tweet’, ‘Longitude of the Tweet’, and ‘Tweet Message’. For the analysis reported in this paper, we used ‘User Name’, ‘Latitude of the Tweet’, and ‘Longitude of the Tweet’ fields.
In every city, some users send more Tweets than others. The numbers of unique users who sent the Tweets counted above are listed below:
London: 140,919 users
Paris: 42,729 users
New York City: 59,272 users
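As a minimal sketch of how such per–user counts could be derived (this is not the software used for the paper), the following Python snippet counts Tweets per ‘User Name’. The record keys `user_name`, `lat`, and `lon` and the sample rows are hypothetical; the New York City totals are the ones reported above.

```python
from collections import Counter


def tweets_per_user(tweets):
    """Return a Counter mapping each user name to its number of Tweets."""
    return Counter(t["user_name"] for t in tweets)


# Hypothetical sample records; the keys are illustrative, not the API's field names.
sample = [
    {"user_name": "Alice Smith", "lat": 40.71, "lon": -74.00},
    {"user_name": "Alice Smith", "lat": 40.73, "lon": -73.99},
    {"user_name": "Bob Jones", "lat": 40.65, "lon": -73.95},
]

counts = tweets_per_user(sample)
unique_users = len(counts)  # number of distinct user names
mean_per_user = sum(counts.values()) / unique_users

# Average for New York City, using the totals reported in the text.
nyc_mean = 646_053 / 59_272
print(unique_users, mean_per_user, round(nyc_mean, 1))
```

With the reported totals, New York City users sent about 10.9 geo–tagged Tweets each on average over the 102 days, although the per–user counts are unevenly distributed.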
3. Areas of tweeting activity
This section discusses the areas of tweeting activity in London, Paris, and New York City. Areas of relative incidence of tweeting activity are shown in Figures 1–3. The maps were created using the following procedure:
a) In the first step, a grid map of 155 rows and 155 columns was created for every city. This resulted in 24,025 individual grid squares for each grid map.
b) Using the Twitter data (‘latitude of the Tweet’ and ‘longitude of the Tweet’) for every city, a point in polygon operation was performed to count the number of Tweets sent from the area of each grid cell.
c) Finally, the individual grid cells were given a color based on the number of Tweets sent from them.
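The three steps above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: on a regular grid the point in polygon operation reduces to index arithmetic, the grid is assumed to be regular in latitude and longitude, and the bounding rectangle, the function names, and the class breaks (apart from the value 25, which echoes the Paris discussion) are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def grid_cell(lat, lon, bbox, n_rows, n_cols):
    """Return the (row, col) of the cell containing a point, or None if outside.

    bbox is (min_lat, min_lon, max_lat, max_lon).
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
        return None
    # Scale the offset into [0, n) and clamp points lying on the upper edge.
    row = min(int((lat - min_lat) / (max_lat - min_lat) * n_rows), n_rows - 1)
    col = min(int((lon - min_lon) / (max_lon - min_lon) * n_cols), n_cols - 1)
    return row, col


def count_tweets_per_cell(points, bbox, n_rows, n_cols):
    """Step b): count the Tweets falling in each grid cell."""
    counts = Counter()
    for lat, lon in points:
        cell = grid_cell(lat, lon, bbox, n_rows, n_cols)
        if cell is not None:
            counts[cell] += 1
    return counts


def colour_class(count, breaks=(0, 25, 100)):
    """Step c): map a count to a class index (0 = no Tweets, 1 = 1 to 25, ...)."""
    return bisect_left(breaks, count)


# Hypothetical bounding rectangle and a small 4 x 4 grid for illustration.
bbox = (51.0, -1.0, 52.0, 1.0)
points = [(51.5, -0.1), (51.5, -0.1), (53.0, 0.0)]  # the last point lies outside
cell_counts = count_tweets_per_cell(points, bbox, 4, 4)
print({cell: (n, colour_class(n)) for cell, n in cell_counts.items()})
```

A grid that is regular in degrees does not have equal–area cells, so a projected coordinate system may be preferable when comparing densities between cities at different latitudes.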
In London (Figure 1), the users in the central part of the city sent more tweets than users in the rest of the city. This may be because of the high number of Twitter users travelling to the center of the city for work, shopping, and visiting purposes. The outskirts of London have low Twitter usage. However, in London, there are few areas where users do not send any tweets.
Figure 1: Tweet density map of London.
In New York City (Figure 2), Twitter users in certain areas are more active than others. For example, there is high Twitter usage in lower and midtown Manhattan, West Bronx, and in certain areas of Brooklyn. Tweeting activity in some areas (Queens and Staten Island) is very low.
Figure 2: Tweet density map of New York City.
In Paris (Figure 3), Twitter usage is more homogeneous than in the other two cities. Twenty–five or fewer tweets were sent from many grid squares. The central area of Paris accounts for more tweeting activity than the rest of the city. In Paris, as in London, there are few areas from which users do not send any tweets.
Figure 3: Tweet density map of Paris.
4. Hourly and daily Twitter activity
The previous section discussed the areas of tweeting activity in London, Paris, and New York City. In this section we break down this overall picture by time of day and day of the week. Figure 4 shows the hourly Twitter activity in London, Paris, and New York City. This figure was created by aggregating, by hour of the day, the number of tweets sent between 10 September and 20 December 2012.
Figure 4: Hourly Twitter usage in London, Paris and New York City.
Tweeting activity in London remains low between 2 AM and 5 AM. It starts to increase from 5 AM, which is earlier than in the other two cities. Tweeting activity in London reaches its peak between 10 AM and 11 AM. At night, Twitter usage is very high between 7 PM and 11 PM.
Tweeting activity in Paris remains low between 2 AM and 6 AM. It starts increasing from 6 AM. There is high tweeting activity between 10 AM and 1 PM, a longer daytime peak than in London. At night, Twitter usage is very high between 7 PM and 11 PM.
Similar to Paris, tweeting activity in New York City remains low between 2 AM and 6 AM. It starts increasing from 6 AM. There is high tweeting activity between 10 AM and 2 PM. At night, Twitter usage is very high between 7 PM and 11 PM.
Figure 5 shows the daily Twitter activity patterns of London, Paris, and New York City. All three cities have higher numbers of tweets on Wednesday and Thursday. The number of tweets sent on Sunday is less than the number of tweets sent on any other day of the week.
Figure 5: Weekly Twitter usage.
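The aggregations behind Figures 4 and 5 can be sketched as below. This is a minimal illustration rather than the code used for the paper; it assumes that each Tweet's timestamp has already been converted to the local time of its city, and the function name and sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def activity_profiles(timestamps):
    """Count Tweets by hour of day (0 to 23) and by day of the week."""
    by_hour = Counter()
    by_weekday = Counter()
    for ts in timestamps:
        by_hour[ts.hour] += 1
        by_weekday[WEEKDAYS[ts.weekday()]] += 1  # weekday(): Monday is 0
    return by_hour, by_weekday


# Hypothetical local-time timestamps within the study period.
sample = [
    datetime(2012, 9, 10, 10, 30),  # a Monday morning
    datetime(2012, 9, 10, 22, 5),   # the same Monday, late evening
    datetime(2012, 9, 16, 3, 0),    # a Sunday, early hours
]
by_hour, by_weekday = activity_profiles(sample)
print(sorted(by_hour.items()), dict(by_weekday))
```

If the raw timestamps are in UTC, the conversion to local time matters for comparing the three cities, since New York City is several hours behind London and Paris.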
5. Name and ethnicity analysis of Twitter users
A name is a statement of a bearer’s cultural, ethnic, and linguistic identity (Mateos, et al., 2011). Knowing the name of the person can thus reveal other useful information about that person. Name analysis of Twitter users can give us an insight into the ethnic identity of individuals, which could be useful in identifying the different types of users who use this social media service.
When registering to use Twitter, users are required to enter their name or other identifying data in the ‘User Name’ field. In many cases, tokens other than given and family names are entered, as in ‘Justins_Home’, ‘What is Love’, etc. However, many registrants fill this field with their real forename–surname pairs.
For the three cities, we developed our own software to divide the ‘User Name’ field into separate ‘forename’ and ‘surname’ fields. Table 1 shows the result of our text analytics work, where recognisable forename–surname pairs were found in the ‘User Name’ field.
Table 1: Result of text analytics work.
City             Total number of users    Number of users where forename–surname pairs found
London           140,919                  106,917
Paris            42,729                   28,775
New York City    59,272                   42,674
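The software used for this step is not described in detail here. As a rough illustration only, a simple heuristic might accept a ‘User Name’ when it consists of exactly two alphabetic tokens; the pattern and function name below are hypothetical, and a production system would also need to check tokens against name dictionaries, since a phrase of two ordinary words would pass this test.

```python
import re

# One name token: letters, optionally joined by an apostrophe or hyphen
# (e.g. O'Sullivan, Ana-Maria). Digits and underscores are rejected.
NAME_TOKEN = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def split_forename_surname(user_name):
    """Return (forename, surname) if the field looks like a name pair, else None."""
    tokens = user_name.strip().split()
    if len(tokens) != 2:
        return None
    if not all(NAME_TOKEN.fullmatch(t) for t in tokens):
        return None
    return tokens[0], tokens[1]


for raw in ["Justins_Home", "What is Love", "Paul Longley", "  Ana-Maria Popescu "]:
    print(repr(raw), "->", split_forename_surname(raw))
```

The two non–name examples quoted above are rejected because they do not split into exactly two alphabetic tokens.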
In the next step, ethnicity analysis was performed on the users where forename–surname pairs were found.
Onomap (Mateos, et al., 2011) is software which may be used to assign a predicted ethnic group to a forename–surname pairing. Onomap was run on the forename–surname pairs of Twitter users. The following table shows the number of users for each city, where an ethnic group was successfully assigned to the forename–surname pairs.
Table 2: Ethnicity analysis of Twitter users in three cities.
City             Number of users    Number of users where an ethnic group was found
London           106,917            99,974
Paris            28,775             23,450
New York City    42,674             39,110
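To make the attrition between Tables 1 and 2 explicit, the short calculation below derives the retention rate at each stage. The counts are those reported in the two tables; the dictionary layout and function name are illustrative.

```python
# city: (all users, users with a forename-surname pair, users assigned an ethnic group)
FUNNEL = {
    "London": (140_919, 106_917, 99_974),
    "Paris": (42_729, 28_775, 23_450),
    "New York City": (59_272, 42_674, 39_110),
}


def coverage(total, named, classified):
    """Return the percentage of users retained at each stage."""
    return {
        "named_pct": 100 * named / total,
        "classified_of_named_pct": 100 * classified / named,
        "classified_of_all_pct": 100 * classified / total,
    }


rates = {city: coverage(*row) for city, row in FUNNEL.items()}
for city, r in rates.items():
    print(f"{city}: " + ", ".join(f"{k}={v:.1f}" for k, v in r.items()))
```

On these figures, roughly 76 percent of London users, 67 percent of Paris users, and 72 percent of New York City users supplied a recognisable forename–surname pair, and an ethnic group was assigned to the large majority of those pairs in each city.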
For London, 69 ethnic groups were found. Figure 6 shows the top–10 ethnic groups of Twitter users in London.
Figure 6: Top–10 ethnic groups of Twitter users in London.
In London, ‘ENGLISH’ Twitter users form by far the largest group (51 percent). The second most common group (‘IRISH’) accounts for four percent. The graph gives an indication of the long–established and recent immigrant groups living in London, as shown by the considerable numbers of Twitter users from the ‘SPANISH’ (2.25 percent), ‘PAKISTANI’ (1.9 percent), ‘INDIAN’ (1.56 percent), ‘PORTUGUESE’ (1.34 percent), and ‘TURKISH’ (1.024 percent) ethnic groups.
For New York City, 67 ethnic groups were found. Figure 7 shows the top–10 ethnic groups of Twitter users in New York City.
Figure 7: Top–10 ethnic groups of Twitter users in New York City.
In New York City, the largest number of Twitter users are classified as ‘ENGLISH’ (39 percent). Figure 7 indicates the prevalence of tweeting amongst different European ethnic groups in New York. Another notable ethnic group is ‘JEWISH’, which ranks eighth in terms of Twitter usage.
For Paris, 69 ethnic groups were found. Figure 8 shows the top–10 ethnic groups of Twitter users in Paris.
Figure 8: Top–10 ethnic groups of Twitter users in Paris.
In Paris, the largest group of Twitter users is ‘FRENCH’ (22 percent). However, there is also a high number of ‘ENGLISH’ users (13 percent). This could be because of a high number of ‘ENGLISH’ visitors tweeting in Paris during the study period (10 September–20 December 2012). It may also reflect the dominance of ‘ENGLISH’ users on Twitter. Other large groups of Twitter users in Paris are ‘SPANISH’ (4.5 percent), ‘ITALIAN’ (4.5 percent), ‘IRISH’ (3.2 percent), and ‘PORTUGUESE’ (3.1 percent).
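Rankings such as those in Figures 6–8 can be produced from a list of per–user group labels, as in the sketch below. The function name and the sample labels are hypothetical, and the labels are assumed to come from a classifier such as Onomap, whose interface is not shown here.

```python
from collections import Counter


def top_groups(user_groups, k=10):
    """Return the k most common groups as (group, users, percent of users)."""
    counts = Counter(user_groups)
    total = sum(counts.values())
    return [(g, n, 100.0 * n / total) for g, n in counts.most_common(k)]


# Hypothetical labels, one per classified user.
labels = ["ENGLISH"] * 5 + ["IRISH"] * 2 + ["SPANISH"]
for group, n, pct in top_groups(labels, k=2):
    print(f"{group}: {n} users ({pct:.1f} percent)")
```

The percentages are taken over classified users only, so they depend on how many users could be assigned a group in each city.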
6. Geography of tweets of different ethnic groups
In this section, for each of the three cities, we present the geography of Tweets of the top 10 ethnic groups. We undertook a geographic analysis of the tweets sent by the users of different ethnic groups, as shown in Tables 3 (for London), 4 (for New York City), and 5 (for Paris).
Table 3: Geography of tweets by different ethnic groups (London). Map panels: English, Irish, Scottish, Italian, Welsh, Spanish, Pakistani, Indian, Portuguese, Turkish.
In London, the British (English, Welsh, and Scottish) and Irish users are most active on Twitter, but they also make up the largest resident population groups in London. There is a concentration of tweets in Central London for almost every ethnic group, which is a likely indicator of travel to Central London for work, leisure, or tourism purposes. However, for particular ethnic groups, distinctive areas of high concentration of tweets can be found elsewhere in the city. For example, East London has a high concentration of Pakistani users, North West London hosts many Indian users, and North London hosts many Turkish users. The tweets of Spanish, Italian, and Portuguese users are concentrated in and around Central and North London.
Table 4: Geography of tweets by different ethnic groups (New York City). Map panels: English, Spanish, Irish, Italian, Scottish, Portuguese, Chinese, Jewish, Welsh, German.
In New York City, English and Spanish users are most active on Twitter. There is a concentration of tweets in Manhattan for almost every ethnic group, which is a likely indicator of travel to Manhattan for work or tourism purposes. However, for particular ethnic groups, distinctive areas of high concentration of tweets can be found elsewhere in the city. For example, Brooklyn has a high concentration of Scottish, Italian, Jewish, and Chinese users, and the Bronx hosts many Irish, Italian, and Portuguese users. The tweets of English and Spanish users are spread across the whole city.
Table 5: Geography of tweets by different ethnic groups (Paris). Map panels: French, English, Spanish, Italian, Irish, Portuguese, Scottish, Turkish, Polish, German.
The maps in Table 5 show a dominance of French and English users in Paris. The high number of English users tweeting in Paris could be explained by the high number of visitors from England. In Paris, the tweets of different ethnic groups are evenly distributed throughout the city. Different areas of high tweeting incidence could be identified, but the tweeting activity in Paris differs from that in London and New York City, where we were able to identify distinctive areas of tweeting activity for each ethnic group.
The maps in Tables 3–5 provide a novel insight into the ethnic diversity of Twitter usage in London, Paris, and New York City. Although users live in the same city and use the same social media service, the geography of tweeting varies markedly from one ethnic group to another.
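One way to put a number on the central concentration described above is to compute, for each ethnic group, the share of its tweets falling inside a central bounding rectangle. This is a hypothetical summary measure that we sketch for illustration; it was not part of the analysis reported here, and the rectangle, function name, and sample points are assumptions.

```python
from collections import defaultdict


def central_share_by_group(tweets, central_bbox):
    """Return, per group, the fraction of tweets inside central_bbox.

    tweets is an iterable of (group, lat, lon);
    central_bbox is (min_lat, min_lon, max_lat, max_lon).
    """
    min_lat, min_lon, max_lat, max_lon = central_bbox
    totals = defaultdict(int)
    inside = defaultdict(int)
    for group, lat, lon in tweets:
        totals[group] += 1
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            inside[group] += 1
    return {g: inside[g] / totals[g] for g in totals}


# Hypothetical rectangle and points, for illustration only.
central = (51.48, -0.20, 51.54, -0.05)
tweets = [
    ("ENGLISH", 51.51, -0.12),
    ("ENGLISH", 51.60, -0.30),
    ("TURKISH", 51.59, -0.07),
    ("TURKISH", 51.50, -0.10),
    ("TURKISH", 51.58, -0.08),
]
shares = central_share_by_group(tweets, central)
print(shares)
```

A low central share for a group would be consistent with a distinctive cluster elsewhere in the city, although the value depends on how the central rectangle is drawn.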
7. Gender analysis
Forenames can also be used to determine the gender of a person. GenderChecker (http://genderchecker.com) is a database which allows checking whether a forename is male, female, or unisex. The forenames of Twitter users, retrieved earlier, were assigned a gender using the GenderChecker database. Users were assigned a ‘Male’, ‘Female’, or ‘Unisex’ gender. ‘Not Found’ was assigned when a forename was not found in the database. The result of the analysis is shown in Figure 9.
Figure 9: Gender analysis of the three cities.
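The four–way assignment described above amounts to a dictionary lookup with a fallback category. The sketch below uses a three–entry toy dictionary as a stand–in; it does not contain GenderChecker data and does not reproduce that database's interface, and the function name is hypothetical.

```python
from collections import Counter

# Toy stand-in for a forename database; NOT GenderChecker data.
TOY_GENDER_DB = {"paul": "Male", "sonia": "Female", "alex": "Unisex"}


def assign_gender(forename, db=TOY_GENDER_DB):
    """Look up a forename case-insensitively; fall back to 'Not Found'."""
    return db.get(forename.strip().lower(), "Not Found")


forenames = ["Paul", "Sonia", "Alex", "Zxqv", "paul"]
tally = Counter(assign_gender(f) for f in forenames)
print(dict(tally))
```

Normalising case and surrounding whitespace before the lookup avoids treating ‘Paul’ and ‘paul’ as different forenames.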
Table 6 lists the absolute numbers of different genders used in this analysis.
Table 6: Gender analysis of the three cities.
Gender       London     Paris     New York
Male         53,872     12,779    20,116
Female       36,180     9,175     14,541
Unisex       10,101     1,939     4,435
Not found    6,764      4,882     3,582
Total        106,917    28,775    42,674
Figure 9 illustrates that male users outnumber female users in all three cities: by the counts in Table 6, roughly 44 to 50 percent of users are male and roughly 32 to 34 percent are female. This shows a dominance of male users on Twitter. In all three cities, about 10 percent of users or fewer were assigned to the ‘Unisex’ category.
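The percentage shares can be recomputed directly from the counts in Table 6, as in the following sketch. The counts are those reported in the table; the dictionary layout and function name are illustrative.

```python
# Counts copied from Table 6.
TABLE_6 = {
    "London": {"Male": 53_872, "Female": 36_180, "Unisex": 10_101, "Not found": 6_764},
    "Paris": {"Male": 12_779, "Female": 9_175, "Unisex": 1_939, "Not found": 4_882},
    "New York": {"Male": 20_116, "Female": 14_541, "Unisex": 4_435, "Not found": 3_582},
}


def gender_shares(counts):
    """Convert absolute counts to percentages of the city total."""
    total = sum(counts.values())
    return {gender: 100 * n / total for gender, n in counts.items()}


shares = {city: gender_shares(counts) for city, counts in TABLE_6.items()}
for city, s in shares.items():
    print(city, {gender: round(pct, 1) for gender, pct in s.items()})
```

Because the ‘Not found’ category is included in the denominator, the male and female shares would both be somewhat higher if computed over users with a recognised forename only.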
8. Conclusion and future work
This paper has presented an analysis of the geo–spatial characteristics of tweeting activity, name and ethnicity, and gender of Twitter users in London, Paris, and New York City. The analysis suggests that the majority of Twitter users in London and New York City are male and have Anglo–Saxon roots. In Paris, the majority consists of male French users. Our geo–spatial analysis of the tweets from different ethnic groups provides an insight into the ethnic diversity of Twitter users in the three cities. Although users live in the same city and use the same social media service, the geography of tweeting varies from one ethnic group to another.
While we can draw some conclusions about the geography of different ethnic groups, it must be remembered that the sources and operation of bias in our dataset are unknown: by no means does everyone use Twitter, and there is no a priori reason to assume that tweeters who disclose their locations are representative of those that do not.
Yet this is a promising research area that can be extended in different ways. One possible extension is a detailed geo–spatial analysis of different ethnic groups to better understand tweeting patterns in different urban areas. We also aim to extend this analysis to investigate the observed differences between conventional night geographies of different ethnic groups and their patterns during the day, as an investigation of likely work and leisure activity patterns at different times of the day and week.
About the authors
Muhammad Adnan is a Postdoctoral Research Associate in the Department of Geography at University College, London. His research focus is on social media analysis, data mining, and visualization of large spatio–temporal databases.
E–mail: m [dot] adnan [at] ucl [dot] ac [dot] uk
Paul A. Longley is a Professor of Geographic Information Science at University College, London. His publications include 14 books and more than 125 refereed journal articles and book chapters. He is a co–editor of the journal Environment and Planning B and a member of five other editorial boards. He has held 10 externally funded visiting appointments and given over 150 conference presentations and external seminars.
E–mail: p [dot] longley [at] ucl [dot] ac [dot] uk
Shariq M. Khan is a Ph.D. student in the Department of Electronic and Computer Engineering, Brunel University. His research interests include routing in mobile and vehicular ad hoc networks.
E–mail: shariq [dot] khan [at] brunel [dot] ac [dot] uk
This work was completed as part of the EPSRC research grant “The Uncertainty of Identity: Linking Spatiotemporal Information in the Real and Virtual Worlds” (EP/J005266/1).
Faiyaz Al Zamal, Wendy Liu, and Derek Ruths, 2012. “Homophily and latent attribute inference: Inferring latent attributes of Twitter users from neighbors,” Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media, at https://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4713, accessed 1 May 2014.
Shea Bennett, 2012. “REVEALED: The top 20 countries and cities of Twitter [STATS]” (13 August), at http://www.mediabistro.com/alltwitter/twitter-top-countries_b26726, accessed 31 December 2012.
Sonia E. Colantonio, Gabriel W. Lasker, Bernice A. Kaplan, and Vicente Fuster, 2003. “Use of surname models in human population biology: A review of recent developments,” Human Biology, volume 75, number 6, pp. 785–807, and at http://digitalcommons.wayne.edu/humbiol/vol75/iss6/1, accessed 1 May 2014.
Justin Cranshaw, Raz Schwartz, Jason Hong, and Norman Sadeh, 2012. “The Livehoods Project: Utilizing social media to understand the dynamics of a city,” Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media, at https://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4682, accessed 1 May 2014.
GenderChecker, 2012. “GenderChecker,” at http://genderchecker.com, accessed 22 January 2012.
Pablo Mateos and Ken Tucker, 2008. “Forenames and surnames in Spain in 2004,” Names, volume 56, number 3, pp. 165–184.
doi: http://dx.doi.org/10.1179/175622708X332860, accessed 1 May 2014.
Pablo Mateos, Paul A. Longley, and David O’Sullivan, 2011. “Ethnicity and population structure in personal naming networks,” PLoS ONE, volume 6, number 9, e22943.
doi: http://dx.doi.org/10.1371/journal.pone.0022943, accessed 1 May 2014.
Marco Pennacchiotti and Ana–Maria Popescu, 2011. “A machine learning approach to Twitter user classification,” Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, at https://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/view/2886, accessed 1 May 2014.
Daniele Quercia, Licia Capra, and Jon Crowcroft, 2012. “The social world of Twitter: Topics, geography, and emotions,” Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media, at https://www.aaai.org/ocs/index.php/ICWSM/ICWSM12/paper/view/4612, accessed 1 May 2014.
Twitter, 2012a. “What is Twitter?” at https://business.twitter.com/basics/what-is-twitter/, accessed 31 December 2012.
Twitter, 2012b. “The streaming APIs,” at https://dev.twitter.com/docs/streaming-apis, accessed 22 January 2012.
Worldnames, 2013. “World family names,” at http://worldnames.publicprofiler.org/, accessed 5 February 2013.
Received 18 August 2013; accepted 30 September 2013.
Copyright © 2014, First Monday.
Copyright © 2014, Muhammad Adnan, Paul A. Longley, and Shariq M. Khan.
Social dynamics of Twitter usage in London, Paris, and New York City
by Muhammad Adnan, Paul A. Longley, and Shariq M. Khan.
First Monday, Volume 19, Number 5 - 5 May 2014