Effects of gender and tie strength on Twitter interactions
First Monday

Effects of gender and tie strength on Twitter interactions by Funda Kivran-Swaine, Samuel Brody, and Mor Naaman

We examine the connection between language, gender, and social relationships, as manifested through communication patterns in social media. Building on an analysis of 78,000 Twitter messages exchanged between 1,753 gender–coded couples, we quantitatively study how the gender composition of conversing users influences the linguistic style apparent in the messages. Using Twitter data, we also model and control for the strength of ties between conversing users. Our findings show that, in line with existing theories, women use more intensifier adverbs, pronouns, and emoticons, especially when communicating with other women. Our results extend the understanding of gender–driven language use in the semi–public settings of social media services, and suggest implications for theory and insights for sociolinguistics.


1. Introduction
2. Theoretical background and hypotheses
3. Study: Gender and language in Twitter interactions
4. Discussion
5. Conclusion



1. Introduction

Popular social media platforms, like Facebook and Twitter, are home to a significant amount of interpersonal interaction among their users. Many of these systems match the communication model of social awareness streams (SAS): one–to–many communication channels such as Twitter or Facebook’s “News Feed” (Naaman, et al., 2010). SAS expose various types of communication acts, such as sharing of information, creating new relationships, and managing existing ones. When analyzed in aggregate, SAS data can help us study human behavior in naturalistic settings and at scale.

Twitter, a widely used social network site, provides an exceptional opportunity to observe and analyze interpersonal communication patterns in their intended social surroundings. In Twitter, there are a number of communication conventions, such as replies, mentions, and re–tweets, which allow users of the platform to frame their messages with respect to their social connections. Replies, the communication convention we investigate in the current paper, facilitate directed conversations between individuals, where one user publicly “replies to” a message from another. The “reply” interactions on Twitter are acts of communication that (1) can be observed in their natural settings, not in an environment controlled for the purposes of research; and, (2) are semi–public in nature, bringing with them a potential “audience effect”, which may differentiate this communication convention from other computer–mediated communication (CMC) or non–mediated settings. In addition, data about users and their relationships, available from Twitter, offers the prospect of discovering connections between communication patterns, individual traits, and social relationships.

In this work, we look at how the gender of interacting users affects language use on Twitter. Gender is one construct that has been widely studied in relation to communication. A number of previous studies looked at the gender of the communicator as one of the main factors influencing content as well as the style of communication (Brownlow, et al., 2003; Eckert, 1996; Labov, 1990; Lakoff, 1975; Mulac, et al., 1988). In this work, our goal is to further understand the relationship between gender and communication style, looking at exchanges between dyads of different gender compositions. While inspecting communication style, we build on the concept of linguistic style, the ways in which individuals use language. Linguistic style reveals attributes of the individuals, as well as their relationships with others (Pennebaker, et al., 2002). We look at 1,753 dyadic relationships between users of Twitter, and 78,000 directed semi–public Twitter replies exchanged between the users in these dyads. We address the following research question: “How does the gender composition of interacting dyads relate to linguistic styles in online conversations?”

When looking at language used in dyadic communications, it is beneficial to control for the strength of the connection between the interacting users. The intimacy and the intensity of the relationship between communicating parties have been shown to influence communication style (Bergs, 2006). The strength of ties between individuals can therefore significantly influence the linguistic style of the conversations that take place between them. Thus, in this inquiry on understanding the relationship between gender and communication, we take tie strength into account.

Studying the association between gender, communication, and relationships can help us better understand communication theories, which currently are largely based on settings that are either non–mediated or tightly controlled. We use Twitter data to reveal how these theories apply to new communication environments within SAS, and at scale. Moreover, by examining gender and interpersonal interactions in their intended natural environments, we can better understand communication behavior in SAS settings.

1.1. Social interaction in Twitter

Twitter is a highly popular social media service, used for many purposes including personal communication, information sharing, business communication, and marketing. User activity in Twitter is primarily focused around “streams” of content. Twitter allows users to post short messages (tweets) up to 140 characters long. Users in Twitter are connected to others via asymmetric “follow” relationships (e.g., if Jeremy follows Devon, it does not imply that Devon is following Jeremy). A follow relationship on Twitter means that when a user logs into Twitter, she will be shown posts from those she follows, in reverse–chronological order. Unless users decide to make their accounts private, all messages in Twitter are publicly available to view. In this work, we report solely on publicly available data.

Twitter has a number of communication conventions such as mentions, replies, and re–tweets, allowing for different modes of interaction. For our study, we focus on the “reply” convention within Twitter. In Twitter, a user can send a reply message to another user, by initiating the message with an “@” immediately followed by the username of the correspondent. By default, reply messages are shown publicly on the sender’s Twitter profile page. Reply messages also are displayed in the timeline of all users following both the sender and the recipient of the message. For example, if Eddy follows both Jeremy and Devon, replies from Jeremy to Devon (or vice versa) will be displayed in Eddy’s timeline.



2. Theoretical background and hypotheses

We base our work on theories and prior work in language and gender. Grounded in theory, we develop a set of hypotheses to be tested on a dataset of replies from Twitter.

2.1. Linguistic style

In CMC environments, language often is the primary if not the only form of communication, and people utilize language to form and maintain online relationships (Baym, 2002). Linguistic style can be defined as how individuals or groups of individuals use language in communication. Linguistic style is a way of language use, specific to a community or a sub–population, supporting the construction of identification for speakers of a language (Eckert, 1996). In other words, style is how a person uses language in relation to other people.

Linguistic style, especially in the new communication environments of SNS, has the potential to reveal not only intentions (Searle, 1969) but also identity and social group membership (Tagliomente, 2006). As individuals alter their linguistic style for their audiences to achieve an expected effect (Bell, 1997), an examination of linguistic style can uncover rather complex social dynamics between gender, language, and social relationships, which may be hard or impossible to untangle otherwise.

A common approach to studying linguistic style has been observing the use of individual words, or categories of words, including various parts of speech (e.g., adverbs, pronouns, prepositions). Even though words as units, by themselves, do not carry or reveal the meaning behind speech acts, studying frequency or type of use of categories of words divorces the utterance from the context, and helps analysis focus on style (Pennebaker, et al., 2002).

2.2. Language, linguistic style, and gender

A significant number of research studies have examined gender differences with respect to language use. Past studies have illuminated differences in linguistic style between men and women, revealing patterns that emerge in language when men and women interact with one another (Brownlow, et al., 2003; Cegala, 1989; Mulac, et al., 1988; Savicki, et al., 1997; Sillars, et al., 1997). In our study, we concentrate on analyzing use of three types of linguistic style markers (personal pronouns, intensifier adverbs, and emoticons), the most widely studied features in dyadic settings.

We next report on previous work examining the aforementioned markers that focused on the gender of the communicating person, and explain our motivation for studying them in SAS. We then report on the literature on the effect of dyadic gender composition on communication.

2.2.1. Personal pronouns

People use personal pronouns to reference others or themselves. A recipient of a message containing personal pronouns should be able to infer whom the pronouns refer to, in order to comprehend the full meaning of the message. Thus, frequent use of personal pronouns can imply that speakers make assumptions about their audiences’ ability to follow references (Pennebaker, et al., 2002), presupposing a level of intimacy and common ground. Previous studies have shown that women, when compared to men, use first person singular (FPS) pronouns significantly more often (Brownlow, et al., 2003; Mulac, 2006; Pennebaker, et al., 2002; Savicki, et al., 1997). However, there are conflicting findings regarding how the use of first person plural pronouns (FPP) differs between men and women (Brownlow, et al., 2003; Mulac, 2006). Personal pronoun use in relation to gender in SAS may indicate how speakers situate themselves within the social context of interactions (Mühlhäusler and Harré, 1990), and reflect on their social surroundings. For example, according to the cognitive grammar framework (Langacker, 1987), use of the first person singular pronoun “I” increases the prominence of the speaker radically, maximally objectifying the speaker’s self. On the other hand, use of first person plural pronouns such as “we” indicates a sense of “oneness” and “communality”, which generally is associated with socially constructed gender roles for women (Eagly and Wood, 1991). Gender differences in the use of pronouns can expose how men and women may feel and act differently about identities that they construct online, as well as relationships they maintain in these platforms.

2.2.2. Adverbs

Adverbs are parts of speech that carry very little meaning by themselves, but significantly change the style as well as the strength of the utterances that they appear in. Simply, adverbs modify other words in a sentence. Intensifier adverbs (intensifiers), such as “really”, are used to strengthen, diminish, or otherwise change the meaning of the word they precede. People are motivated to use intensifiers to capture the attention of their audiences (Peters, 1994). A higher level of adverb use increases the narrative characteristic of the communication act, making utterances more embellished, and possibly alluring for listeners. The use of adverbs is a linguistic style feature that has previously been shown to be used more by women (Lakoff, 1975; Mulac, 2006). In fact, the dominance of adverb use by women has been noted by scholars since the mid–eighteenth century (Partington, 1993). An early (and simplistic) interpretation from the beginning of the twentieth century for women’s increased adverb use was women’s inherent fondness for hyperbole in expressions (Stoffel, 1901). An examination of adverb use and how it changes between men and women in today’s technologically advanced communication platforms is valuable, as differences in adverb use can reveal how men and women decide to make their statements more captivating to their audiences in the performative spaces of SAS. For example, we may discover under what circumstances men and women choose to increase adverb use, thus coloring their expressions to influence their audience’s perceptions of specific messages.

2.2.3. Emoticons

Emoticons are series of symbols that represent non–verbal cues such as smiling, frowning, or winking, and are frequently used in CMC channels, where non–verbal cues often are impossible to put forth in text–based settings (Wolf, 2000). The manner in which (e.g., type, frequency) emoticons are used in language can also be considered a linguistic style choice. Former work that researched emoticon use in various CMC channels such as discussion boards, chat rooms, or blogs found significant gender differences in trends of emoticon use. Previously, women were found to use emoticons more than men (Witmer and Katzman, 1997; Wolf, 2000). However, in a study of blogs authored by teenagers, boys were shown to use emoticons significantly more than girls (Huffaker and Calvert, 2005). Thus, age and context can be confounding factors in the relationship between gender and emoticon use. Emoticons, like pronouns, suggest intimacy between conversing parties. Looking at emoticon use in SAS and whether or how men and women use emoticons differently can help us better understand circumstances in which men and women feel at ease to include non–verbal cues in their semi–public communications. Moreover, by studying emoticons in relation to gender, we can uncover how men and women enrich the meaning of their interactions differently by portraying traces of their sentiment and mood in rather casual ways.

2.2.4. Gender and language in dyadic interactions

While the studies described earlier mostly considered the gender of the person communicating, researchers have also examined the interaction between linguistic style and the gender composition of the conversing pair. For example, language accommodation (Giles, et al., 1991), known to occur on Twitter (Danescu–Niculescu–Mizil, et al., 2011), tends to accentuate gender–specific language differences in same–gender dyads, as speakers converge toward each other’s style (Mulac, et al., 1998). Other effects of gender composition have been shown; for example, women use more pronouns (a typical female language feature) when interacting with other women (Brownlow, et al., 2003). Moreover, it was observed that in mixed–gender groups of interaction, men’s level of emoticon use rose to the level of women’s (Wolf, 2000).

2.3. Language and social networks

A person’s language use depends on whom the person is talking with. The speech community, a group sharing common language specifics in which an individual participates, plays an important role in the linguistic style of the individual. Online social networks are contemporary virtual speech communities (Paolillo, 1999) and manifest fundamental attributes of speech communities in traditional settings, such as language accommodation (Danescu–Niculescu–Mizil, et al., 2011).

Both the audience design framework from sociolinguistics and accommodation theory from communication were used previously to explain changes in one’s language in relation to social environments where a speech act takes place. In the audience design framework, Bell (1997) first defines “style” as what a person does with language in relation to others, and then builds the framework around the main idea that speakers design their styles based on their audiences. Very similar to the audience design framework is the communication accommodation theory (Giles, et al., 1991). This theory states that speakers adjust their speech to seek social attractiveness or to increase efficiency of communication.

The association between language and networks of social relationships has also been a subject of inquiry in CMC. Previous work found features in e–mail text to detect power relationships (Bramsen, et al., 2011), as well as roles in corporate settings (McCallum, et al., 2007). Speech accommodation was observed in small groups, where word count, pronoun, and tense use related to a group’s cohesiveness (Gonzales, et al., 2010). Most recently, it was shown that linguistic accommodation is observable in social media, namely interactions in Twitter (Danescu–Niculescu–Mizil, et al., 2011).

Overall, previous work indicates that people shape their language to their audiences, and this phenomenon takes place in SAS as well. Interactions in Twitter have well–defined audiences, properties of which may influence the linguistic style of interactions, beyond the gender composition of the participants. The size of an audience in Twitter interactions directly correlates with the number of contacts shared between the conversing parties. While a larger audience in traditional settings may bring with it qualms about disclosure, for Twitter replies, a large audience is indicative of a large number of shared contacts, therefore possibly a closer and more intimate relationship. Accordingly, in our analysis of gender and language, we account for the size of the audience of interactions, which may be a proxy for the strength of ties between interacting parties.

2.4. Hypotheses

Building on previous work, we develop the following hypotheses about gender composition and linguistic style in Twitter replies:

  • H1a: Women use personal pronouns (first person singular, and first person plural) more than men.
  • H1b: Women use personal pronouns more when interacting with women, than they do when interacting with men.
  • H2a: Women use intensifiers more than men.
  • H2b: Women use intensifiers more when interacting with women than they do when interacting with men.
  • H3a: Women use emoticons (positive and negative) more than men.
  • H3b: Women use emoticons more frequently when interacting with women than they do when interacting with men.



3. Study: Gender and language in Twitter interactions

We begin reporting our study by describing the dataset we used for our analysis. We then describe in detail the samples we selected from our dataset, and our motivations for doing so. We follow up by explaining how we calculated the variables to be used in our analyses. Finally, we go through the analyses we performed to understand the relationship between the gender composition of conversing individuals and the linguistic style they assume in Twitter interactions.

3.1. Dataset

We extracted a dataset of Twitter interactions, consisting of replies between Twitter dyads, as well as the genders of participating users, and the Twitter contact network (followers and followees) around the users in each dyad.

We began forming our dataset by identifying 715 Twitter users, whose data was available from a previous study (omitted for anonymity). This set, S, includes Twitter users that were randomly selected and manually identified as active, personal users of Twitter, not representing commercial entities or celebrities. The users followed, and were followed by, fewer than 5,000 users. For each user s∈S, the dataset included the tweets s posted over a period of four months, including replies by s to other users, as well as replies posted by other users that targeted s. Using this initial dataset of Twitter posts, we identified all dyads (s, f) in the data such that: 1) s replied to f at least twice, and 2) f replied to s at least once. Selecting users in this way ensures that a) the seed interacts with the follower in a non–trivial manner; and, b) both sides are engaged in the exchange and have a meaningful interaction.
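
The dyad selection rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the `replies` list of (sender, recipient) pairs, the `seeds` set, and the function name `select_dyads` are all hypothetical.

```python
from collections import Counter


def select_dyads(replies, seeds):
    """Return dyads (s, f) where seed s replied to f at least twice
    and f replied to s at least once.

    `replies` is a hypothetical list of (sender, recipient) pairs and
    `seeds` is the set of seed users (the set S in the text).
    """
    counts = Counter(replies)  # (sender, recipient) -> number of replies
    dyads = set()
    for (sender, recipient), n in counts.items():
        # A missing key in a Counter reads as 0 and is not inserted.
        if sender in seeds and n >= 2 and counts[(recipient, sender)] >= 1:
            dyads.add((sender, recipient))
    return dyads
```

For example, a seed who wrote to a user twice and received one reply forms a dyad with that user, while a seed who wrote twice and never heard back does not.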

We further filtered the dataset to only include dyads where we could obtain gender labels for both users. We used the Amazon Mechanical Turk (AMT) crowd sourcing platform to code the gender for all users in our initial dataset. For each user, we asked the online worker to categorize the profile as one of the following four categories: male, female, undetermined sex, or not a person. We asked the workers to click through to the user’s Twitter page and to consider the photo of the user, their name, screen name, and any other information they provide in their profile description text (e.g., “I’m a mother” in the profile text indicates the user is likely to identify as a woman). To assess the accuracy of the resultant AMT codes, one of the authors manually coded a sample of the dataset (2.5 percent), creating ground truth data. The coders agreed with our ground truth 89.6 percent of the time, and 93.4 percent if we hold out users that were labeled as “undetermined”, since these are more ambiguous and difficult, and were not used for our analysis. This rate corresponds to a Scott’s Pi reliability measure of 0.83, considered “excellent”.
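
For readers unfamiliar with the reliability measure mentioned above, the following sketch shows how raw agreement and Scott's Pi can be computed for two sets of labels. The function name and the toy labels are assumptions for illustration; the study's actual label data is not reproduced here.

```python
from collections import Counter


def scotts_pi(labels_a, labels_b):
    """Scott's Pi for two coders who labeled the same items.

    Expected agreement uses the pooled label distribution of both coders.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    pooled = Counter(labels_a) + Counter(labels_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if expected == 1:
        raise ValueError("Pi is undefined when only one label is ever used")
    return (observed - expected) / (1 - expected)


# Toy labels (made up): ground-truth coder vs. crowd coder
truth = ["male", "male", "female", "female"]
crowd = ["male", "male", "female", "male"]
pi = scotts_pi(truth, crowd)
```

Because Pi discounts the agreement expected by chance, it is lower than the raw percentage of matching labels.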

The resultant dataset of interacting dyads included 1,753 pairs of Twitter users where for each dyad (x,y) we have a set Ix,y of reply messages exchanged between the two users. We refer to the subset of replies in this dyad that were directed from x to y as Ix»y ⊆ Ix,y. The content dataset resulted in a total of 77,989 replies, an average of 44.5 interactions per dyad (i.e., mean size of Ix,y across all dyads; these sets show a skewed distribution, with median=20, SD=92.6). Some seeds were represented more than others in the dataset as they participated in more dyadic conversations (the mean number of conversations per seed x is 6.0; median=4, SD=6.99). For our analysis, we provide some control for the skewed distributions by looking at the subsamples of these exchanges at the level of exchanges and at the level of individual tweets, as described below.

For each dyad, we computed the set of common neighbors for the dyad’s users in the Twitter social network. We retrieved all users z such that z is either followed by or following one of the users in the dyad. We used that network to compute the common neighbors variable described below. To retrieve this network information, we used the Twitter social network snapshot that was collected at the same time as the content dataset we use here, available from Kwak, et al. (2010).

3.1.1. Tweet sample

The first sample we drew from the dataset, the tweet sample, consists of individual tweets as units. We used the tweet sample to examine whether linguistic style markers existed in each tweet. In this sample, each tweet is assigned to one of the four categories of gender composition, according to the “direction” of the communication: Male to Male (shorthand: M»M), Male to Female (M»F), Female to Male (F»M), or Female to Female (F»F). In this sample, we included at most 10 messages from each person in a dyad, to minimize bias that may be created by users who participate in longer exchanges. In other words, if one side of a dyadic exchange had more than 10 messages (|Ix»y|>10) we selected 10 of these tweets at random to use for this sample. This sample was then used to look at the directed use of different linguistic markers at the dyadic level, e.g., whether a tweet directed from a man to a woman included any intensifier adverbs. The final tweet sample consists of 25,641 tweets from 1,753 dyads (4,248 F»F, 5,261 F»M, 5,482 M»F, and 10,650 M»M tweets). The average number of tweets per dyad was 14.6.
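
The capping step for the tweet sample can be illustrated as follows. This is a minimal sketch assuming each direction of a dyad is held as a list of tweets; the function name `sample_replies` and its parameters are hypothetical.

```python
import random


def sample_replies(directed_replies, cap=10, rng=None):
    """Keep at most `cap` replies from one direction of a dyad.

    Directions with `cap` or fewer replies are kept whole; longer ones
    are reduced to `cap` replies drawn at random without replacement.
    """
    rng = rng or random.Random()
    replies = list(directed_replies)
    if len(replies) <= cap:
        return replies
    return rng.sample(replies, cap)


# Usage: a seeded generator makes the draw repeatable
picked = sample_replies([f"tweet {i}" for i in range(25)], rng=random.Random(0))
```

Applying the function once per direction gives each dyad at most 20 tweets in the sample, which limits the weight of unusually talkative pairs.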

3.1.2. Exchange sample

To capture language use in more substantial exchanges between two individuals, we prepared a sample of “exchanges”. To ensure a significant body of interaction between users in each dyad, in the Exchange Sample, each unit is an interaction that included at least 10 messages between the users in the dyad (|Ix»y|≥10). This sample was then used to look at the magnitude of use of different linguistic markers at the dyadic level, e.g., the proportion of pronouns included in an exchange. We assigned each exchange to a gender composition category using a three–level categorical variable, as each exchange can be characterized as Male–Male (MM), Male–Female mixed (MF), or Female–Female (FF). This process resulted in an Exchange Sample consisting of 1,343 dyads (222 FF, 565 MF, and 556 MM) that exchanged an average of 56.2 messages (median=29, SD=102.9) each.

3.2. Computed properties

Below, we summarize how the variables capturing linguistic style and social network properties were computed.

3.2.1. Linguistic markers

For our analysis, we computed language use variables for both tweet and exchange samples, mostly by using the “Linguistic Inquiry and Word Count” (LIWC) dictionary, a widely used and studied language analysis system. We also derived descriptive network variables for each dyad in our dataset, described below.

We used the LIWC dictionary to generate tweet–level variables for each tweet in the tweet sample, and exchange–level variables for each exchange in the exchange sample. For the tweet sample, as tweets are too short for generating distinguishing and meaningful counts or proportions of words, we used the LIWC dictionary to code each tweet as 1 (includes at least one word in the linguistic category) or 0 (none). For the exchange sample, for each linguistic category, we calculated ratioC, the ratio of the number of category words (identified by the LIWC dictionary) in each conversation C, to the total number of words in the conversation.
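
The two kinds of variables can be sketched as below. The LIWC dictionary itself is not reproduced here, so `FPS_WORDS` is a small hypothetical stand-in for one category, and the tokenizer is a simplification chosen for the example.

```python
import re

# Hypothetical stand-in for one dictionary category (not the LIWC word list)
FPS_WORDS = {"i", "me", "my", "mine", "myself"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def has_category(tweet, words):
    """Tweet-level variable: 1 if any token is in the category, else 0."""
    return int(any(token in words for token in tokenize(tweet)))


def category_ratio(messages, words):
    """Exchange-level variable: category tokens divided by all tokens."""
    tokens = [token for message in messages for token in tokenize(message)]
    if not tokens:
        return 0.0
    return sum(token in words for token in tokens) / len(tokens)
```

With these definitions, `has_category("I love my dog", FPS_WORDS)` yields 1, and an exchange made of "I like it" and "my dog" has a ratio of 2 category words out of 5.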

In this study we focused on use of pronouns (First Person Singular and First Person Plural), intensifier adverbs, and emoticons as linguistic style markers. For pronouns and adverbs, we used the LIWC dictionaries to compute the values in tweet and exchange samples. We calculated variables capturing the use of emoticons by tokenizing tweets into unigrams, and using a regular expression that identified 228 distinct “faces” in our data.
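
Emoticon detection with a regular expression might look like the sketch below. The pattern is a deliberately small assumption covering a handful of Western-style faces; it is not the expression used in the study, which identified 228 distinct faces.

```python
import re

# Hypothetical pattern: eyes, optional nose, mouth; not glued to a word
EMOTICON = re.compile(r"(?<!\w)[:;=]-?[)(DPp](?!\w)")


def find_emoticons(tweet):
    """Return every emoticon the simplified pattern finds in the tweet."""
    return EMOTICON.findall(tweet)


def has_emoticon(tweet):
    """Tweet-level variable: 1 if at least one emoticon is present."""
    return int(bool(find_emoticons(tweet)))
```

The lookbehind and lookahead keep the pattern from firing inside URLs or words, so "http://t.co/abc" yields no match while "great :)" does.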

3.2.2. Structural network properties

While we are interested in conversations between dyads of different gender compositions, other variables may offer an alternative explanation for the content and style of conversation. Most prominently, the type and strength of connection between individuals in a dyad may play a role. The strength of ties (Granovetter, 1973) between people is known to affect linguistic style used in conversations. To account for the effect the strength of ties between users may have on interactions, we calculated the dyad’s number of common neighbors, a number that has been previously shown to be associated with tie strength in social media (Kivran–Swaine, et al., 2011).

The number of common neighbors is the number of Twitter connections shared by the members of the dyad. Formally, if the neighbors of node z are defined as Nz={w|w→z or w←z} then the number of common neighbors for a dyad (x,y) is |Nx∩Ny|. The number of common neighbors showed a log–normal distribution; we used the log–normalized number of common neighbors variable in all the tests reported below.
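
The definition above translates directly into set operations. In this sketch the `follows` edge set and the function names are hypothetical. The base-10 logarithm and the +1 offset for dyads without shared contacts are assumptions as well; the text does not say how zero counts were handled.

```python
import math


def neighbors(user, follows):
    """N_z: everyone who follows `user` or is followed by `user`.

    `follows` is a hypothetical set of (follower, followee) pairs.
    """
    result = set()
    for follower, followee in follows:
        if follower == user:
            result.add(followee)
        elif followee == user:
            result.add(follower)
    return result


def common_neighbors(x, y, follows):
    """|N_x ∩ N_y| for the dyad (x, y)."""
    return len(neighbors(x, follows) & neighbors(y, follows))


def log_common_neighbors(x, y, follows):
    """Log-transformed count; log10 and the +1 offset are assumptions."""
    return math.log10(common_neighbors(x, y, follows) + 1)
```

Note that x and y never count as their own common neighbors: x can appear in N_y but not in N_x, so the intersection excludes both members of the dyad.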

3.3. Analysis

The goal of the analysis was to capture the differences between dyads in different gender compositions, in their levels of use in each linguistic category we examined. For each linguistic category we examine, we test the influence of gender composition on two dependent variables, one computed from the tweet sample, and one from the exchange sample, as described next.

As previously mentioned, the unit of analysis in the tweet sample is a single tweet, and the dependent variable is the existence (0–1) of a linguistic category word in the tweet. We begin our analysis by constructing a binary logistic regression model for each language category, with the existence of the category word in the tweet as a two–level response variable. Each model uses two independent variables (IV): (1) the gender composition of the dyad interacting; and, (2) the number of common neighbors shared by the dyad’s members (see Figure 1). The gender composition IV is a four–level categorical variable representing the possible gender compositions for the dyad responsible for that tweet (M»M, M»F, F»M, F»F). The common neighbors IV is continuous, and log–normalized. We thus verify whether the existence of category words in replies can be explained by the dyad’s gender composition, beyond the effect of the number of common neighbors shared by the dyad.


Figure 1: Regression model for the tweet sample.


An additional Pearson’s chi–square test, using the gender variable and the linguistic variable, helps us inquire further about nuanced differences between gender compositions. For a categorical variable like gender composition, the logistic regression model only allows us to reason about the difference between all levels and a single reference level (we used F»F as the reference level in our model, as shown in Figure 1). To supplement our statistical analysis we used the chi–square test and tested the null hypotheses about differences between gender compositions.
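
As a reminder of what the Pearson chi–square test computes, the sketch below derives the statistic and its degrees of freedom from a contingency table. The function name and the toy counts are made up for illustration; converting the statistic into a p-value requires a chi–square distribution table or a statistics library, which is left out here.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom.

    `table` is a list of rows of observed counts, e.g. one row per gender
    composition and one column per outcome (marker present / absent).
    Every row and column total must be positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Toy 2x2 table (made-up counts)
stat, dof = chi_square([[10, 20], [20, 10]])
```

A table with four gender compositions and two outcomes has (4−1)×(2−1)=3 degrees of freedom.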

In the exchange sample, the unit of analysis is a dyad’s group of exchanges, and the dependent variable is based on the ratioC value, capturing the proportion of words that belong to the linguistic category in exchanges. The distribution of the ratio score is not normal, log–normal, or standardized. Therefore, we turn the ratio value for each dyad into a use level {low, medium, high}. To do that, for each linguistic category, we calculate the ratio’s mean (M) and standard deviation (SD). Then, we label the use level of a dyad’s use of the linguistic category in a conversation as “low” if ratioC<M–SD; “medium” if M–SD<ratioC<M+SD; and, “high” if ratioC>M+SD.
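
The binning into use levels can be sketched as follows. Two choices here are assumptions, since the text does not specify them: the sample standard deviation is used, and ratios that fall exactly on a cut-off are labeled “medium”.

```python
from statistics import mean, stdev


def use_levels(ratios):
    """Label each ratio as low, medium, or high relative to M ± SD."""
    m = mean(ratios)
    sd = stdev(ratios)  # sample SD; the choice of estimator is an assumption
    levels = []
    for ratio in ratios:
        if ratio < m - sd:
            levels.append("low")
        elif ratio > m + sd:
            levels.append("high")
        else:
            levels.append("medium")  # includes exact boundary values
    return levels


# Toy ratios (made up), one per dyad
levels = use_levels([0.0, 0.1, 0.1, 0.1, 0.2])
```

Because the cut-offs are computed per linguistic category, a “high” level always means high relative to the other dyads for that same category.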

Similar to the analysis of the tweet sample, we start our analysis by constructing a multinomial logistic regression model for each category, to explain use level of the category words in the conversation (a three–level response variable). Each model uses two independent variables (IV): (1) the dyad’s gender composition; and, (2) the number of common neighbors shared by the dyad’s members. The gender composition IV for the exchange sample is a three–level categorical variable representing the possible gender compositions for the dyad (MM, MF, FF). The common neighbors IV is the same variable we used in the tweet sample. For the purpose of capturing group differences between gender composition categories, a Pearson chi–square test was performed to test the null hypothesis (i.e., that no differences between gender compositions will be observed). The test looks at the relationship between two variables: the low–medium–high language use levels, and gender composition.

Since our hypotheses involved multiple tests using the same variables (i.e., the gender composition), we controlled for the higher likelihood of false positive results by using the Bonferroni correction, which asks for a significance level of α/n when conducting n tests at once. Thus, for the chi–square tests for both samples, when reporting the results, we point out those that are significant within the Bonferroni correction (p<.01, given the number of tests we perform). As regression models as a whole do not have associated significance values, we cannot perform this sort of correction for our regression models.
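
The correction itself is simple arithmetic, sketched below with hypothetical function names. The text does not state the number of tests n; a family-wise α of .05 spread over five tests would give the .01 threshold quoted above, but that pairing is an assumption.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_correction(p_values, alpha=0.05):
    """Flag each p-value that clears the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Made-up p-values for five tests
flags = significant_after_correction([0.001, 0.02, 0.2, 0.009, 0.5])
```

In this toy run only the first and fourth tests remain significant, even though the second would pass an uncorrected .05 level.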

3.4. Results

Before looking at the relationship between gender composition of dyads and language in communication, we investigated the relationship between the number of common neighbors and gender composition. The results of the ANOVA test suggest, as expected, that there are significant differences between gender groups in relation to number of common neighbors (for tweet sample, F(3,23818)=87.66, p<.001). Post–hoc analysis in both cases revealed that MM dyads had significantly more common neighbors than FF dyads, who had significantly more common neighbors than mixed–gender dyads. These homophilous tendencies suggest that stronger ties exist between individuals of the same gender, and further demonstrate the need to control for the number of common neighbors in our analyses.

3.4.1. Linguistic style

We now present the results for the use of each linguistic style feature by dyads of different gender compositions. For the regression models, we report odds ratios (OR) and only report the significant contributing factors (with p<.05).

First person singular pronouns

The use of first person singular pronouns (FPS) is significantly affected by gender composition, even when controlling for the effect of the common neighbors variable, with FPS use more likely between all–female dyads and less likely between all–male dyads, supporting H1a and H1b. For the tweet sample, the binary logistic regression model predicting existence of FPS in replies showed that an increase in number of common neighbors makes the existence of FPS in replies slightly less likely (OR=0.94, p<.001, meaning that for each ten–fold increase in number of common neighbors, the odds of FPS in replies are multiplied by 0.94; the further OR is from 1, the stronger the effect of the variable). But even beyond the common neighbors effect, the gender compositions M»F (OR=0.81, p<.001), F»M (OR=0.86, p<.001), and M»M (OR=0.78, p<.001) make the existence of FPS in tweets less likely compared to the F»F reference category. In this case, OR reflects the ratio of the odds of FPS in a M»F tweet (for example) to the odds of FPS in the reference F»F tweet. In the exchange sample, the multinomial logistic regression results showed that MM gender composition (OR=0.96, p<.005) makes high–level FPS use slightly less likely than low–level.
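For readers less familiar with odds ratios, the sketch below converts two proportions into odds and takes their ratio. It uses the tweet–sample proportions of replies containing FPS that the chi–square analysis reports (46.7 percent for M»M and 52.4 percent for F»F). The function names are illustrative. The result is an unadjusted odds ratio, so it should be expected to be close to, but not identical with, the regression estimate, which also accounts for common neighbors.

```python
def odds(probability):
    """Convert a probability in (0, 1) to odds."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return probability / (1.0 - probability)


def odds_ratio(p_group, p_reference):
    """Odds of the outcome in a group relative to the reference group."""
    return odds(p_group) / odds(p_reference)


# Proportions of tweet-sample replies containing FPS, taken from the
# chi-square results: 46.7 percent for M»M and 52.4 percent for F»F.
unadjusted = odds_ratio(0.467, 0.524)
print(f"unadjusted OR, M»M vs. F»F: {unadjusted:.2f}")
```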

The chi–square tests reveal more nuanced information about group differences in FPS use (here, without controlling for the number of common neighbors). The results for the tweet sample indicated that a significantly higher proportion of replies by F»F dyads (52.4 percent) and a lower proportion of replies by M»M dyads (46.7 percent) contained FPS compared to M»F and F»M replies (48.4 percent between them). The exchange sample also showed similar group differences with respect to gender composition of dyads: a significantly higher proportion of FF dyads (20.3 percent) and a significantly lower proportion of MM dyads (9.7 percent) exhibited high levels of FPS use compared to MF dyads (13.1 percent). The test results were significant for both the tweet sample, χ² (3, N=25641)=41.02, p<.001, and the exchange sample, χ² (4, N=1343)=16.55, p<.005.

First person plural pronouns

There are significant differences between gender groups with regard to their use of first person plural pronouns (FPP), even when controlling for the effect of the common neighbors variable: FPP use is more likely between all–female dyads, supporting H1a and H1b.

The binary logistic model on the tweet sample showed that the number of common neighbors had a significant positive effect on FPP existence (OR=1.26, p<.001) (i.e., the more common neighbors a dyad has, the more likely they are to use words like “we” in a tweet). The model also revealed that beyond the effect of common neighbors, the M»F (OR=0.79, p<.005) and F»M (OR=0.84, p<.05) gender compositions make the existence of FPP in replies less likely compared to the F»F reference. However, the regression model on the exchange sample did not yield significant results.

The chi–square test results from the tweet sample showed that a significantly higher proportion of replies from F»F dyads (7.8 percent) contained at least one FPP, compared to the other groups (6.6 percent between them). The analysis of exchanges showed a similar, significant, yet less conclusive outcome, with a higher proportion of FF dyads (9.9 percent) using high levels of FPP compared to MF (7.3 percent) and MM (8.8 percent) dyads. However, for both samples, differences were not significant according to the Bonferroni correction requirements, with the tweet sample results at χ² (3, N=25641)=11.01, p=.01, and the exchange sample at χ² (4, N=1343)=9.88, p<.05.

Intensifiers

The use of intensifier adverbs is significantly affected by gender composition, even after controlling for the number of common neighbors. Intensifiers are more likely to be used between all–female dyads, and less likely to be used when a woman is interacting with a man, lending support to H2b and partial support to H2a.

In the binary regression model on the tweet sample, the contribution of the common neighbors variable to the overall model was not significant. Nevertheless, the existence of intensifiers was more likely in tweets by F»F dyads than F»M (OR=0.79, p<.001), M»F (OR=0.85, p<.001), or M»M (OR=0.84, p<.001) dyads. On the other hand, no significant effects were seen in the regression model for the exchange sample.

Chi–square test results from the tweet sample show that women used intensifiers more frequently when interacting with women, but less frequently when interacting with men. A significantly higher proportion of replies by F»F dyads (42.6 percent) and a significantly lower proportion of replies by F»M dyads (37 percent) contained at least one intensifier, compared to the replies by M»F and M»M dyads (38.5 percent between them). Similarly, in the exchange sample, a significantly higher proportion of conversations in FF dyads (90.5 percent) exhibited high or medium levels of intensifier use, when compared to the conversations in MF (83 percent) and MM (83.5 percent) dyads. The test results for the tweet sample were significant, χ² (3, N=25641)=33.53, p<.001. The exchange sample results were not significant according to the Bonferroni correction requirements, χ² (4, N=1343)=11.96, p<.05.

Emoticons

The use of positive emoticons (e.g., “(:”) in Twitter interactions was influenced by the gender composition of the dyad, controlling for the number of common neighbors. Overall, women exhibited more frequent and higher levels of positive emoticon use when compared to men, and men used emoticons more when talking to women than when talking to men, supporting hypotheses H3a and H3b. We must note that neither of our samples included sufficient numbers of negative emoticons to be used in analysis and statistical tests.

The regression model on the tweet sample illustrated that even when accounting for the number of common neighbors, M»F (OR=0.81, p<.005) and especially M»M (OR=0.57, p<.001) dyads make the existence of positive emoticons in replies less likely. The number of common neighbors did not contribute significantly to the model in the tweet sample, but did show an effect in the exchange sample, with higher values of common neighbors making the high–level use of positive emoticons more likely (OR=2.29, p<.05).

The chi–square test results on the tweet sample analysis suggested that a significantly higher proportion of replies by F»F (11 percent) and F»M (11 percent) dyads, and a significantly lower proportion of replies by M»M dyads (7 percent) included a positive emoticon compared to the proportion of M»F dyads (9 percent). Similarly, the exchange sample analysis shows, for example, that a significantly higher proportion of MM interactions (60 percent) exhibited low levels of positive emoticon use, when compared to other groups (47.1 percent). Group differences between gender compositions in their use of positive emoticons were significant in both the tweet sample, χ² (3, N=25641)=126.14, p<.001, and the exchange sample, χ² (4, N=1343)=23.76, p<.001.




4. Discussion

Overall, our findings highlight key gender differences in linguistic style, even after controlling for tie strength between conversing users. Gender differences revealed in our analysis have mostly confirmed observations in traditional settings: women use higher levels of FPP, FPS, intensifiers, and emoticons in their speech, with levels escalating even more when they converse with other women, hinting at accommodation.

Linguistic style differences may be exhibited not only through shifts in levels of use, but also through how certain linguistic features are used. For instance, use of different kinds of intensifiers (e.g., “totally” vs. “absolutely”) may signify different linguistic styles. Therefore, following our initial inquiry, we performed secondary analyses at the token level, to find words in each language category that are the best predictors of each gender composition. Since our language categories are based on dictionaries, where each category (e.g., intensifiers) is defined by a set of words, we can measure how the use of individual words from each category differs between dyads of different gender compositions. To this end, we used the tweet sample to perform an analysis of word use for each category.

For our token–level analysis, we looked at the degree of “predictiveness” of each word in each category with regard to the gender composition of a message. Specifically, each word in the vocabulary that was used by a number of users above a predefined threshold (100 in our analysis) was scored with regard to each gender composition according to the following function: pred(t,c)=f(t|c)/f(t), where f(t|c) is the fraction of tweets with the gender composition c that contain the token t, and f(t) is the fraction of tweets containing the token t in our dataset as a whole. We can then examine the high–scoring tokens for each gender composition, i.e., words which are much more likely to occur in that gender composition compared to others.
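The scoring function can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function name, the `(user_id, composition, tokens)` input layout, and the toy messages are all hypothetical, and each token is counted at most once per tweet.

```python
from collections import Counter, defaultdict


def predictiveness(tweets, min_users=1):
    """Score tokens as pred(t, c) = f(t|c) / f(t).

    `tweets` is an iterable of (user_id, composition, tokens) triples.
    Tokens used by fewer than `min_users` distinct users are skipped.
    Returns {composition: {token: score}}.
    """
    tweets = [(u, c, set(toks)) for u, c, toks in tweets]
    total = len(tweets)
    per_comp_total = Counter(c for _, c, _ in tweets)
    token_count = Counter()                 # tweets containing t, overall
    token_comp_count = defaultdict(Counter) # tweets containing t, per c
    token_users = defaultdict(set)          # distinct users of t
    for user, comp, tokens in tweets:
        for tok in tokens:
            token_count[tok] += 1
            token_comp_count[comp][tok] += 1
            token_users[tok].add(user)

    scores = defaultdict(dict)
    for comp, n_comp in per_comp_total.items():
        for tok, n_tok_comp in token_comp_count[comp].items():
            if len(token_users[tok]) < min_users:
                continue
            f_t_given_c = n_tok_comp / n_comp
            f_t = token_count[tok] / total
            scores[comp][tok] = f_t_given_c / f_t
    return dict(scores)


# Toy messages, invented for illustration only.
toy = [
    ("u1", "F»F", ["me", "too"]),
    ("u2", "F»F", ["so", "too"]),
    ("u3", "M»M", ["actually", "so"]),
    ("u4", "M»M", ["dude"]),
]
scores = predictiveness(toy)
print(scores["F»F"])
```

A score above 1 means the token appears in a larger share of that composition’s tweets than in the dataset as a whole; a score of 1.18, for example, corresponds to “18 percent more frequently”. Setting `min_users=100` would mirror the threshold mentioned in the text.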

Next, we report insights we gained from the results in each category, along with further findings at the token level, to inform future hypotheses and directions of research.

4.1. Presupposed familiarity and talking about “us”

In our models, the number of common neighbors partially explained FPP and FPS use (i.e., the more people a dyad knows in common, the more likely they are to use the word “we”, and the less likely to use “me”). But even after accounting for the effect of the number of common neighbors, the use of personal pronouns was found to be associated with gender composition, with female–to–female exchanges much more likely to contain both FPS (“I”) and FPP (“we”) words, supporting hypothesis H1b and providing partial support to H1a.

We performed token–level analysis on FPS and FPP use across gender compositions. Terms that stand out in their predictive power for each gender composition can be seen in Table 1. For example, the table indicates that F»F tweets use the FPS “my” 18 percent more frequently than it is used in the dataset as a whole, whereas tweets containing “mine” are 25 percent more frequent among M»F messages than they are overall. Token–level analysis for FPP use showed that the FPP “we” can be a significant marker for distinguishing replies by females (F»M and F»F messages; see Table 1).


Table 1: Most predictive terms for each language feature in each gender composition category.


These results suggest that, in their Twitter interactions, women tend to reference both themselves and others more than men do. Moreover, the finding that the FPP “we” is a strong predictor of an utterance by a woman may imply that women in fact make communal references more frequently in their speech. In other words, it is likely that in their interactions, women refer to their partner in communication in unity with themselves, or they speak on behalf of others in the social context of a given communication. The increased FPS use by women also suggests that women might tend to make themselves the primary subject of their speech significantly more often than men. In general, the female linguistic style that was manifested in our study is more socially aware than the linguistic style exhibited by men. This may be due to the fact that even when conversing with those they feel close to, in Twitter, women’s interactions are more about people and social happenings, whereas men prefer a style that is less personal. While our findings are not conclusive and do not explain why these differences exist, future studies on social and behavioral antecedents of this particular linguistic difference could be valuable.

4.2. Embellished language and cues of discourse

The use of intensifiers was indeed shown to be more common with and between women, supporting the hypotheses H2a and H2b. Consistent with previous work, increased use of intensifiers is a marker of “female–style language”. This effect is heightened even more when the recipient of the message, as well as the sender, is a woman, suggesting communication accommodation.

It is possible that in their social media interactions, especially when the interactions are with a familiar audience, women, more so than men, aim to make their messages more captivating, influencing, and interesting. Intensifier adverbs can also be perceived as powerful tools to economically add detail and color to utterances in Twitter, where the length of utterances is strictly limited.

When we investigated, through token–level analysis, whether intensifiers were used differently between men and women, we discovered that intensifiers might in fact be strong linguistic features for further investigation of predicting gender composition of conversations. We found that for each gender composition group, there was at least one distinct intensifier that distinguished the group from the others (for example, M»M dyads used “actually” more often than other groups; see Table 1).

While intensifiers alone do not bear significant meanings, they can set the tone of the message that they accompany, and clarify the intention of the speaker that uses them. A look at the most distinctive intensifiers for gender compositions exposed a potential trend: while the adverb “too” was the strongest predictor of an interaction by a woman directed to another woman, “actually” was the strongest predictor of an interaction from a man directed to a man. This finding puts forward the possibility that in their interactions, especially with other women, women aim to emphasize their intentions of compliance and social harmony (e.g., “I like it too”, “You can do it too”). However, in man–to–man messages, more so than in other gender compositions, the tone is more argumentative, as the distinctive use of “actually”, an adverb used for clarification or correction purposes, suggests.

4.3. Men (still) don’t put on happy faces :)

The analysis of use of emoticons followed the same trend, supporting H3a; women used significantly higher levels of emoticons than men. As hypothesized (H3b), men used emoticons more frequently when interacting with women. We do not include a token–level analysis for individual emoticons, as the most predictive emoticons did not appear frequently enough in the dataset for us to treat them as sufficient indicators.

We noted that the use of emoticons exhibited a drastic drop in conversations between men. About 60 percent of the MM interactions had a low level of emoticon use. As emoticons essentially are symbolic representations of non–verbal cues, we look at literature in sex roles and communication to explain the trend that we observed. Stereotypically, women have been believed to be more emotionally expressive, verbally and non–verbally, than men. These perceptions have also been empirically observed in previous studies (Briton and Hall, 1995). In most Western cultures, it is the expected norm for men to suppress any emotional expression. Brody and Hall (1993) conclude that this training of suppression results in men perceiving non–verbal communication as irrelevant and unimportant, and accordingly giving less emphasis to non–verbally appending their communicative acts. Our results suggest that, even in CMC channels, where individuals supposedly are liberated in selecting their communication styles, they conform to established gender norms, knowingly or unknowingly.

4.4. “Love” versus “dude”

Finally, for each of the language categories we could point out distinctive tokens (words) used by different gender compositions, even when the use of that language category overall was not different between gender compositions. Indeed, the special characteristics of Twitter and our dataset motivated us to explore other stylistic categories, which may be associated with gender. To this end, we further examined the most predictive words for each gender class out of all the terms that appear in the data. The predictiveness of terms was measured similarly to the token–level analysis described earlier in the Analysis section. Here, we again required a support threshold of 100 occurrences per word (we only include terms that occur at least 100 times in the conversations of one of the gender compositions).

The results are shown in Table 2, with the top 10 most predictive words for each gender composition, along with their predictive value. Among the words in the table, we can see several which belong to the categories we examined (e.g., positive emotion words such as “good” and “love”, and intensifiers such as “so” and “too”). We also see several other categories of potential interest, such as words used to address the recipient (“u”, “dude”, “man”), third person pronouns (“her”, “him”, “she”), and question words (“when”, “how”, “will” and “what’s”). This last category is especially of interest, since it pertains to the way that users express interest and involvement in their conversational partner’s feelings and emotional status (“what’s up?”, “how are you?”). These features can suggest additional differences in style of interaction between gender compositions, within and outside Twitter.


Table 2: Most predictive terms in each gender composition category.




5. Conclusion

The Internet (and social media) has often been lauded as a liberative space in which individuals can express themselves in any way they choose. However, our findings show that large majorities of people continue to engage in familiar patterns, namely gendered communication. We show that gendered communication in directed semi–public responses on Twitter confirms known language and expression tendencies, and largely follows what is known from other settings. Our results are reminiscent of Nakamura’s findings (2002) about race online, particularly that structured issues that affect our non–digital lives follow our digital lives as well. However, we are cautious not to reduce the findings of this study to a simple conclusion of “Men talk like this, and women talk like that.” It is possibly the case that the majority presence of gender normative users in Twitter is drowning out others. As such, particularly in an era of machine learning, classifiers, and big data, we believe that future research should focus on ways of detecting more subtle variations of gender performance.

Our study exhibits a number of key limitations. Focusing our research on Twitter, we acknowledge that there is a significant bias in terms of the people using this service, and further, participating in publicly directed correspondence in it. Indeed, there is opportunity to extend and further verify our results. Are there additional variables that can explain language variation? A survey method, or further coding of profiles or relationships for additional characteristics (e.g., geographic location) can help refine sociolinguistic elements that are in play.

Social media services can be an exciting laboratory for studying human language, where language and its variation can be studied in an environment where it naturally occurs. Social media thus provides a significant opportunity for research, to extend and develop an understanding of language use in social and cultural groups, and relate language style and variation to other forms of social processes like relationship formation, status, emotional well–being, and more.


About the authors

Funda Kivran–Swaine is a Ph.D. candidate in the School of Communication & Information at Rutgers University.
Web: http://www.fundakivranswaine.com
E–mail: funda [at] rutgers [dot] edu

Samuel Brody is a software engineer at Google.
E–mail: sdbrody [at] gmail [dot] com

Mor Naaman is an Associate Professor at the Jacobs Technion–Cornell Innovation Institute at Cornell NYC Tech.
E–mail: mor [dot] naaman [at] cornell [dot] edu



All authors were affiliated with Rutgers University at the time of the research.



References

Nancy Baym, 2002. “Interpersonal life online,” In: Leah A. Lievrouw and Sonia Livingstone (editors). The handbook of new media: Social shaping and consequences of ICTs. Thousand Oaks, Calif.: Sage, pp. 35–55.

Alexander Bergs, 2006. “Analyzing online communication from a social network point of view: Questions, problems, perspectives,” Language@Internet, volume 3, at http://www.languageatinternet.org/articles/2006/371, accessed 26 August 2013.

Allan Bell, 1997. “Language style as audience design,” In: Nikolas Coupland and Adam Jaworski (editors). Sociolinguistics: A reader. New York: St. Martin’s Press, pp. 240–250.

Philip Bramsen, Martha Escobar–Molano, Ami Patel, and Rafael Alonso, 2011. “Extracting social power relationships from natural language,” Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 773–782, and at http://www.aclweb.org/anthology-new/P/P11/P11-1078.pdf, accessed 26 August 2013.

Nancy J. Briton and Judith A. Hall, 1995. “Beliefs about female and male nonverbal communication,” Sex Roles, volume 32, numbers 1–2, pp. 79–90.
http://dx.doi.org/10.1007/BF01544758, accessed 26 August 2013.

Leslie R. Brody and Judith A. Hall, 1993. “Gender and emotion,” In: Michael Lewis and Jeannette M. Haviland (editors). Handbook of emotions. New York: Guilford Press, pp. 447–460.

Sheila Brownlow, Julia A. Rosamond, and Jennifer A. Parker, 2003. “Gender–linked linguistic behavior in television interviews,” Sex Roles, volume 49, numbers 3–4, pp. 121–132.
http://dx.doi.org/10.1023/A:1024404812972, accessed 26 August 2013.

Donald J. Cegala, 1989. “A study of selected linguistic components of involvement in interaction,” Western Journal of Speech Communication, volume 53, number 3, pp. 311–326.
http://dx.doi.org/10.1080/10570318909374309, accessed 26 August 2013.

Cristian Danescu–Niculescu–Mizil, Michael Gamon, and Susan Dumais, 2011. “Mark my words: Linguistic accommodation in social media,” WWW ’11: Proceedings of the 20th International Conference on World Wide Web, pp. 745–754.
http://dx.doi.org/10.1145/1963405.1963509, accessed 26 August 2013.

Penelope Eckert, 1996. “Vowels and nail polish: The emergence of linguistic style in the preadolescent heterosexual marketplace,” In: Natasha Warner, Jocelyn Ahlers, Leela Bilmes, Monica Oliver, Suzanne Wertheim and Melinda Chen (editors). Gender and belief systems. Berkeley, Calif.: Berkeley Women and Language Group, pp. 183–190; version at http://www.stanford.edu/~eckert/PDF/nailpolish.pdf, accessed 26 August 2013.

Alice H. Eagly and Wendy Wood, 1991. “Explaining sex differences in social behavior: A meta–analytic perspective,” Personality and Social Psychology Bulletin, volume 17, number 3, pp. 306–315.
http://dx.doi.org/10.1177/0146167291173011, accessed 26 August 2013.

Howard Giles, Justine Coupland, and Nikolas Coupland (editors), 1991. Contexts of accommodation: Developments in applied sociolinguistics. Cambridge: Cambridge University Press.

Amy L. Gonzales, Jeffrey T. Hancock, and James W. Pennebaker, 2010. “Language style matching as a predictor of social dynamics in small groups,” Communication Research, volume 37, number 1, pp. 3–19.
http://dx.doi.org/10.1177/0093650209351468, accessed 26 August 2013.

Mark S. Granovetter, 1973. “The strength of weak ties,” American Journal of Sociology, volume 78, number 6, pp. 1,360–1,380.

David A. Huffaker and Sandra L. Calvert, 2005. “Gender, identity, and language use in teenage blogs,” Journal of Computer–Mediated Communication, volume 10, number 2, at http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.2005.tb00238.x/full, accessed 26 August 2013.
http://dx.doi.org/10.1111/j.1083-6101.2005.tb00238.x, accessed 26 August 2013.

Funda Kivran–Swaine, Priya Govindan, and Mor Naaman, 2011. “The impact of network structure on breaking ties in online social networks: Unfollowing on Twitter,” CHI ’11: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1,101–1,104.
http://dx.doi.org/10.1145/1978942.1979105, accessed 26 August 2013.

Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon, 2010. “What is Twitter, a social network or a news media?” WWW ’10: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600.
http://dx.doi.org/10.1145/1772690.1772751, accessed 26 August 2013.

William Labov, 1990. “The intersection of sex and social class in the course of linguistic change,” Language Variation and Change, volume 2, number 2, pp. 205–254.
http://dx.doi.org/10.1017/S0954394500000338, accessed 26 August 2013.

Robin T. Lakoff, 1975. Language and woman’s place. New York: Harper & Row.

Ronald W. Langacker, 1987. Foundations of cognitive grammar. Volume 1: Theoretical prerequisites. Stanford, Calif.: Stanford University Press.

LIWC, Inc., 2007. “The LIWC2007 application,” at http://www.liwc.net/liwcdescription.php, accessed 18 March 2013.

Andrew McCallum, Xuerui Wang, and Andrés Corrada–Emmanuel, 2007. “Topic and role discovery in social networks with experiments on Enron and academic e–mail,” Journal of Artificial Intelligence Research, volume 30, number 1, pp. 249–272.
http://dx.doi.org/10.1613/jair.2229, accessed 26 August 2013.

Peter Mühlhäusler and Rom Harré, 1990. Pronouns and people: The linguistic construction of social and personal identity. Oxford: Basil Blackwell.

Anthony Mulac, 2006. “The gender–linked language effect: Do language differences really make a difference?” In: Kathryn Dindia and Daniel J. Canary (editors). Sex differences and similarities in communication. Second edition. Mahwah, N.J.: Lawrence Erlbaum Associates, pp. 211–231.

Anthony Mulac, John M. Wiemann, Sally J. Wiedemann, and Toni W. Gibson, 1988. “Male/female language differences and effects in same–sex and mixed–sex dyads: The gender–linked language effect,” Communication Monographs, volume 55, number 4, pp. 315–335.
http://dx.doi.org/10.1080/03637758809376175, accessed 26 August 2013.

Mor Naaman, Jeffrey Boase, and Chih–Hui Lai, 2010. “Is it really about me? Message content in social awareness streams,” CSCW ’10: Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, pp. 189–192.
http://dx.doi.org/10.1145/1718918.1718953, accessed 26 August 2013.

Lisa Nakamura, 2002. Cybertypes: Race, ethnicity, and identity on the Internet. New York: Routledge.

John C. Paolillo, 1999. “The virtual speech community: Social network and language variation in IRC,” Journal of Computer–Mediated Communication, volume 4, number 4, at http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.1999.tb00109.x/full, accessed 26 August 2013.
http://dx.doi.org/10.1111/j.1083-6101.1999.tb00109.x, accessed 26 August 2013.

Victor Savicki, Merle Kelley, and Erica Oesterreich, 1997. “Effects of instructions on computer–mediated communication in single or mixed–gender small task groups,” Computers in Human Behavior, volume 14, number 1, pp. 163–180.
http://dx.doi.org/10.1016/S0747-5632(97)00038-1, accessed 26 August 2013.

John R. Searle, 1969. Speech acts: An essay in the philosophy of language. Cambridge: Cambridge University Press.

Alan Sillars, Wesley Shellen, Anne McIntosh, and Maryann Pomegranate, 1997. “Relational characteristics of language: Elaboration and differentiation in marital conversations,” Western Journal of Communication, volume 61, number 4, pp. 403–422.
http://dx.doi.org/10.1080/10570319709374587, accessed 26 August 2013.

Cornelis Stoffel, 1901. Intensives and down–toners; A study in English adverbs. Heidelberg: C. Winter’s Universitätsbuchhandlung.

Diane F. Witmer and Sandra Lee Katzman, 1997. “On–line smiles: Does gender make a difference in the use of graphic accents?” Journal of Computer–Mediated Communication, volume 2, number 4, at http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.1997.tb00192.x/full, accessed 26 August 2013.
http://dx.doi.org/10.1111/j.1083-6101.1997.tb00192.x, accessed 26 August 2013.

Alecia Wolf, 2000. “Emotional expression online: Gender differences in emoticon use,” CyberPsychology & Behavior, volume 3, number 5, pp. 827–833.
http://dx.doi.org/10.1089/10949310050191809, accessed 26 August 2013.


Editorial history

Received 22 March 2013; accepted 21 August 2013.

Creative Commons License
“Effects of gender and tie strength on Twitter interactions” by Funda Kivran–Swaine, Samuel Brody, and Mor Naaman is licensed under a Creative Commons Attribution–NonCommercial–NoDerivs 3.0 Unported License.

Effects of gender and tie strength on Twitter interactions
by Funda Kivran–Swaine, Samuel Brody, and Mor Naaman.
First Monday, Volume 18, Number 9 - 2 September 2013
