By examining “who connects with whom” in an online community using social network analysis, this study tests the social drivers that shape the collaboration dynamics among a group of participants from SourceForge, the largest open source community on the Web. The formation of the online social network was explored by testing two distinct network attachment logics: strategic selection and homophily. Both logics received some support. Taken together, the results are suggestive of a “performance–based clustering” phenomenon within the OSS online community in which most collaborations involve accomplished developers, and novice developers tend to partner with less accomplished and less experienced peers.
The development of the Internet and social media technologies has created new opportunities for people to share ideas, interact, and create content in online communities. One of the defining characteristics of these online social systems in the era of Web 2.0 lies in the ability to harness the power of collective intelligence and social network effects (O’Reilly, 2005). Prominent sites include Wikipedia, YouTube, Flickr, as well as the open source software (OSS) development communities such as SourceForge, which typify the decentralized and distributed model of “peer production of information, knowledge and culture” (Benkler, 2006).
Two issues are central to our understanding of the social and collaborative dynamics in online communities. First, what are the motivations for users to participate in online communities, in the form of posting messages, sharing videos and pictures, editing articles, developing free and open software, among others? Online communities, regardless of their goals and purposes, largely rely on members’ voluntary participation to survive and succeed (Lakhani and von Hippel, 2003; Ling, et al., 2005; Olson, 1965; Yuan, et al., 2009). As most communities do not provide direct economic rewards to encourage participation, understanding the motivations as well as barriers to voluntary participation online has thus become an important first step in determining the mechanisms and processes necessary to foster successful communities.
The second issue concerns the ways in which user collaborations are structured and distributed. Empirical research has shown that the level of participation in online communities can be extremely uneven. For example, a study of Wikipedia participation found that the top five percent of the total contributors made 44 percent of the total edits during the 28–day study period, while 10 percent of the contributors did not make any edits at all (Yuan, et al., 2009). Similarly, Franke and von Hippel (2003) found that in an online Apache forum, one percent of the developers contributed 20 percent of the messages, and the top 20 percent contributed 61 percent of the postings. Studies analyzing large numbers of projects on SourceForge, the largest OSS community Web site on the Internet (Sourceforge, 2011), reported that the distribution of projects is severely skewed on various activity measures (Crowston and Howison, 2006; Madey, et al., 2002), with only a tiny fraction of projects showing strong collaborative activities that are assumed to characterize the whole community (Healy and Schussman, 2003; Raymond, 2001), suggesting that the patterns of OSS activities may resemble a Zipf (or power–law) distribution (Madey, et al., 2004). Clearly, participation disparity is a widely observed phenomenon even in the most successful online communities. Understanding the sources of the disparity would not only inform the design and maintenance of online communities, but may also shed important light on the fundamental processes of human communication and collaboration.
This study approaches these two issues from a social network perspective. Specifically, it conceptualizes online communities as networks of participants in which the formation of social ties among participants is guided by distinct network attachment logics. As individuals in the online community actively choose other participants with whom to interact, their choices of network partners reveal the motivations and processes through which networks of collective intelligence are created and maintained. Therefore, by examining “who connects with whom” in the social networks comprising online communities, this study aims to test the social drivers that structure the collaboration dynamics among community participants.
Open source software (OSS) communities on the Web make excellent sites for online community research because they are emblematic of a participatory culture and the emergent peer production model in contemporary networked media environments (Benkler, 2006; Weber, 2004). This study analyzes the social network among a group of participants from SourceForge, the largest open source community on the Web (Sourceforge, 2011). The formation of social and collaborative dynamics in this OSS community was explored by examining two distinct attachment logics: (1) strategic selection, which predicts that OSS developers tend to connect with prominent developers in the community (Barabási, 2002; Kuk, 2006), and (2) homophily, which predicts that developers with similar traits tend to collaborate with each other (McPherson, et al., 2001; Ruef, et al., 2003). In the following sections, we first introduce the background of OSS online communities and review the literature on the two network attachment logics. We then present an empirical test of the research hypotheses. The paper ends with a discussion of the implications and directions for future research.
Open source software communities on the Web
Open source software (OSS) generally refers to computer software products which users are allowed to freely “use, modify, and redistribute” (Feller, et al., 2005). Although the term OSS literally describes only the license of the software, OSS is often characterized by toolsets and software development processes that are perceived to be radically different from the traditional and proprietary model of software development. One of the prominent distinctions between OSS and the traditional model is how they motivate developers and coordinate production. Software products developed under proprietary models are usually supported by private investment and are developed in traditional hierarchical organizations. OSS products are typically created and maintained through Internet–based communities of geographically distributed software developers who voluntarily contribute to software products (von Hippel and von Krogh, 2003).
In OSS online communities, developers establish social relations by collaborating in software project teams (Grewal, et al., 2006). Developers who are working or have worked on the same project are linked to each other, thereby creating collaboration networks. OSS developers typically do not receive monetary reward for their contributions, nor are they bound by employment relationships. They retain the freedom to start, join or leave a project of choice without penalty (Lee and Cole, 2003). A unique example is “code forking,” a term which describes the initiation of a new project or a new branch of a software product on the basis of existing code. The possibility of code forking ensures that all involved have the opportunity to lead a project and make their own decisions within the limits of their willingness and capacity to contribute. Developers work on a certain project only if they want to, “because it is always possible to exit and fork” (Weber, 2004).
As OSS communities can be conceptualized as networks of developers, the patterns of partner selection among developers provide a unique perspective on the social drivers of collaboration. Specifically, the social network of OSS developers should resemble a random network unless one or more network attachment logics, which favor some developers more than others, are operative. In the following, two distinct network attachment logics that apply to the OSS developer network are discussed: strategic selection and homophily. Both attachment logics are found to be particularly robust in explaining the structural dynamics of large–scale networks (Powell, et al., 2005) and could potentially guide the decision of partner selection in smaller scale OSS communities.
The term “strategic selection” refers to an attachment logic, which describes the “rich–get–richer” phenomenon that is widely observed in social worlds (Barabási, 2002; Powell, et al., 2005). For example, novice actors usually start their careers by collaborating with famous movie stars; in academia, new publications often cite well–known papers. Merton (1968) referred to this general phenomenon as the “Matthew Effect” or cumulative advantage, in which the early achievers are disproportionally favored by the reward system. In OSS communities where software products are often the result of collective contributions, decisions to join certain projects are far from random (Weber, 2004). For example, Kuk (2006) found that OSS developers are more likely to interact strategically with expert developers who have specific and relevant knowledge that matters to software development. Research on collective action (Oliver and Marwell, 1988) indicated that individuals are quite strategic in selecting potential partners and projects, and they are most likely to interact with highly resourceful individuals and to join potentially successful projects.
In OSS communities where individuals have an essentially inexhaustible number of projects and potential partners to choose from, developers have to make a strategic choice as to which project(s) to join and which collaborator(s) to partner with. Researchers have proposed a set of related coordination mechanisms, including concepts like trust (Adler, 2001), reputation (Powell, 1990), status and legitimacy (Podolny and Page, 1998). In Benkler’s (2006) terms, OSS communities often display a “meritocratic hierarchy”, which does not hinge on employment authority as in traditional firms but on professional reputation within the community. In particular, online communities usually provide fewer signals of status than off–line communities do. When common social attributes such as educational background are largely absent or obscured, individuals could resort to the following two visible signals when they try to identify and choose a potential strategic partner in OSS communities: project leadership and reputation.
Project leaders usually initiate projects by articulating the goals, writing some code, inviting others to join in the work, and coordinating software production processes. Project leadership is usually synonymous with ownership in small projects, but leaders become more like moderators in larger projects (Weber, 2004). Studies of OSS largely acknowledge the critical role of project leaders. Lerner and Tirole (2002) argued that an important determinant of OSS project success is the nature of its leadership, and Healy and Schussman (2003) claimed that the key role of project leaders is very much under–theorized. One recent study found that project leaders’ social capital, conceptualized as their network embeddedness, has an impact on project success, assuming that project leaders, compared to normal developers, are in an advantageous position in accessing relational resources (Grewal, et al., 2006).
As OSS developers have the freedom to fork and become their own project leader, project leaders may become “more dependent on the followers than the other way around” (Weber, 2004). Therefore, the responsibility may devolve to the leader to attract followers — often with distributed interests and knowledge — to work on a project collectively. Well–connected developers, compared to marginal and isolated developers, are more likely to succeed in such a demanding role. Additionally, project leaders are more likely to be approached for collaboration opportunities as well as expertise and experience for software development and project management. Therefore, it is reasonable to propose that project leadership may contribute to more collaboration opportunities in an OSS developer network:
H1: Project leaders tend to have more ties in the network than non–leaders.
As OSS developers are typically not paid, reputation in the community has been found to be one of the dominant motivators for contribution (Lerner and Tirole, 2002; Raymond, 2000). Reputation motivates voluntary participation in several important ways. First, good reputation among peers contributes to enhanced self–efficacy and ego boosting, which is arguably one of the most important reasons why OSS communities lure some of the most talented and creative developers in the world (Weber, 2004). Second, good reputation attracts attention and cooperation from others, making collaborative work much easier to accomplish (Raymond, 2000). Third, for the developers who seek potential employment opportunities, good reputation in OSS communities simply translates into a larger possibility of getting better–paid jobs and privileged access to resources like venture capital. A quick way to become reputable is to connect with highly reputable people in the community. Also, since reputable developers tend to be more knowledgeable and experienced, collaboration with them could in turn generate learning benefits and high–quality software, thus further enhancing the reputation of all the team members.
Reputation in the community, however, is not the only motivator. Several large–scale surveys show consistently that a large percentage of developers are motivated by their own need for software (Raymond, 2001), with other major motivators including the enjoyment of the creative process (Lakhani and Wolf, 2005), learning benefits (Lakhani and von Hippel, 2003), and gift–giving intentions (Zeitlyn, 2003). Ghosh’s (2005) large–scale survey of OSS developers showed that the majority felt that their returns outweighed their contributions. These findings buttress the “private–collective” model, originally proposed by von Hippel and von Krogh (2003), which postulates that OSS combines both collective action and individual rational choice by which people invest in group projects to derive private rewards such as knowledge, specific software required, monetary rewards and career benefits.
Even if developers are not solely, or even mainly, reputation–oriented, reputation may still function in OSS communities as a signal of developers’ level of knowledge and ability to produce good quality software. Reputation provides an indirect way to assess the anticipated return, such as knowledge, learning experience, or the software itself, on developers’ private investment of time and effort. When uncertainty is high and information about the quality of work is not readily available, people use conventional signals, such as educational credentials, to make judgments about quality (Spence, 1973). This is difficult to achieve in the virtual world of OSS communities, where normal credentials are hardly verifiable. People then tend to make evaluations on the basis of developers’ reputations, which are the only reliable signals of underlying quality. Observations of OSS communities show that developers do not care who one is in the real world; degrees and certificates no longer matter. Instead, people value what software one writes in OSS communities (Benkler, 2006; Weber, 2004).
Therefore, regardless of developers’ motivations, reputation is an important factor in the strategic selection process. Specifically, reputation can be measured on two dimensions. The first is past performance: since OSS communities such as SourceForge normally do not provide any directly visible metric of developers’ reputations, past performance could serve as a proxy. Developers who have contributed to well–received products are sought after for collaboration and knowledge sharing, resulting in more connected positions in the network. A second metric of reputation comes from seniority. The longer a programmer has been involved with OSS communities, the more opportunities there are to accumulate expertise regarding OSS, most of which is tacit and experiential know–how, transferable only through practice (Nonaka, 1994). As a result, longer involvement in OSS signals experience and knowledge, which attract strategic affiliations.
H2: Developers who have better performance are likely to have more ties.
H3: Developers who have longer experience are likely to have more ties.
In contrast with strategic selection, homophily is an alternative attachment logic that has received consistent support from empirical studies (McPherson, et al., 2001). The word “homophily” was first coined by Lazarsfeld and Merton (1954); it refers to a tendency for people to be attracted to others who have similar attitudes, beliefs, and personal characteristics. Monge and Contractor (2003) summarized two lines of theoretical underpinnings of homophily: the similarity–attraction hypothesis and the theory of self–categorization. The similarity–attraction hypothesis postulates that people are more likely to interact with those who have similar traits (Byrne, 1971). Self–categorization theory argues that people tend to self–categorize with regard to race, gender, socio–economic status, and so on and they differentiate between similar and dissimilar others based on these attributes (Abrams and Hogg, 1999; Turner, 1987). Simply put, homophily is well illustrated by the old saying “birds of a feather flock together.”
Homophily has received strong support from empirical research, in terms of gender (Ibarra, 1992), race (Mollica, et al., 2003), and status (McPherson and Smith–Lovin, 1987). Homophily, especially with regard to gender, ethnicity, and occupation, has been found to be a critical factor in relationship formation in social networks in general (McPherson, et al., 2001). Recent research on online social networks suggests that homophily is a strong predictor of relationship formation even when people interact through computer–mediated communication (Adamic, et al., 2003; Yuan and Gay, 2006).
OSS project teams are not assembled in the same manner as those in traditional hierarchical firms. Instead, they emerge naturally as individuals choose which teams to join or which developers to associate with. As a result, attachment processes are likely to resemble the basic patterns of association in the social worlds within which the developers are embedded. Ruef, et al. (2003) studied different group composition mechanisms in founding entrepreneurial teams, a situation where strategic rationality is expected to prevail. Yet, they found that group composition was largely based on similarity, rather than functionality or competence considerations. Another study on the choice of work–group partners also found that homophily was an important driver for partnership, suggesting that predictability was a sought–after quality in future work partners (Hinds, et al., 2000).
In OSS communities where distributed project teams largely rely on asynchronous computer–mediated communication to coordinate collaboration, familiarity or predictability between developers may exert a stronger influence on the production processes (Hahn, et al., 2008). One important distinction between traditional work teams and Internet–based development communities is that social attributes such as race, gender, and education, upon which traditional homophilous ties are usually established, become much less, if at all, visible in cyberspace. Therefore, in OSS communities homophily is more likely to operate on other important constituents of identity, such as leadership roles, performance, and seniority, as represented by the following predictions:
H4: Project leaders are likely to connect with other project leaders.
H5: Developers are likely to connect with those who have similar levels of performance.
H6: Developers are likely to connect with those who have similar levels of experience.
In sum, two distinct network attachment logics, strategic selection and homophily, may guide the selection of partners in OSS online communities. Decisions to connect with other software developers could be made based on project leadership and reputation (as measured by performance and experience) of potential partners. It should be noted, however, that the two attachment logics are not meant to be mutually exclusive. On the contrary, past studies have found that network dynamics are often shaped by more than one attachment logic (Powell, et al., 2005). In the current case, we were interested in exploring the extent to which strategic selection and homophily each contributed to the social and collaborative dynamics in OSS communities.
SourceForge (www.SourceForge.net) is the largest OSS community on the Web, with more than 230,000 projects and over two million registered developers (Sourceforge, 2011). It provides hosting services for OSS project developers, allowing them to manage their source code, communicate with one another, track different versions of their work, and make their products available for download. SourceForge provides server logs to the academic community, with monthly snapshots containing both static (e.g., the original date when a user joined SourceForge) and dynamic (e.g., rank of a project at a particular month) statistics on projects and registered users’ activities.
This study selected a sub–community of developers from SourceForge who worked on BitTorrent–related software. BitTorrent is a peer–to–peer communications protocol for file sharing. It was first put into use in July 2001, with the earliest OSS BitTorrent project hosted on SourceForge registered in 2003. This sub–community satisfies two criteria: 1) the social network size is manageable; and, 2) the network is relatively exclusive and complete. This sub–community was extracted from the March 2007 SourceForge server logs from the data archive hosted at the University of Notre Dame, by searching for projects with the word “BitTorrent” and its variations in the project descriptions. It should be noted that this approach did not distinguish between active and inactive projects. Developers form collaborative relations through participation in the same projects. Even though some projects had become inactive by the time of the study, developer relations established as a result of these projects still constitute an important part of the developer network. By including all projects regardless of their level of activity, it was possible to reconstruct a relatively complete network of collaboration, rather than biasing the analysis towards only the “successful” efforts.
Based on this set of projects, a list of developers (nodes) was retrieved for each project. Only OSS developers explicitly listed on the project Web page were included. Besides those explicitly listed developers, OSS projects may attract a large base of “peripheral” contributors, who make occasional contributions, such as testing various versions of the software and reporting bugs (Crowston, et al., 2006). Although peripheral members constitute an important component of the OSS community, their contributions tend to be quite small and sporadic. Therefore, this study included only those core developers who were listed as members of the official team according to the project Web page.
A 271 × 271 undirected developer–developer adjacency matrix was created by treating co–participation in one project as a tie between two developers. For example, if developers A and B worked together on Project X while developers A and C worked together on Project Y, a tie was created between A and B as well as between A and C, but not between B and C. This network was sparse (n = 271, network density = .021), with a large proportion of isolate developers (29.9 percent, n = 81). This was not surprising, as previous OSS research reported the so–called “caveman” phenomenon, suggesting that a considerable portion of OSS developers tend to initiate projects and work on their own, with minimal collaborative activities (Ghosh, 2005; Krishnamurthy, 2002). Since the aim of this research is to examine the attachment logics of the OSS developer network, it is conceptually more focused and analytically more efficient to exclude isolates and study only the linked developers. Therefore, a second network was extracted from the original network by removing all isolates (n = 190, network density = .04, mean degree = 7.79, SD of degree distribution = 6.52). All further analyses and the results presented below are based on this second network.
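The construction described above, in which co–participation in a project becomes an undirected tie and isolates are then dropped, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run on the SourceForge archive: the roster dictionary, the developer labels and the function names are all hypothetical.

```python
from itertools import combinations


def project_to_developer_network(project_members):
    """Turn project rosters into an undirected developer-developer tie set.

    project_members maps a project name to the list of developers listed
    on that project (hypothetical input format).
    """
    developers = set()
    ties = set()
    for members in project_members.values():
        developers.update(members)
        # Every pair of developers on the same project shares one tie.
        # Sorting makes (A, B) and (B, A) the same undirected tie.
        for a, b in combinations(sorted(set(members)), 2):
            ties.add((a, b))
    return developers, ties


def drop_isolates(ties):
    """Return only the developers that appear in at least one tie."""
    return {developer for tie in ties for developer in tie}


def density(n_nodes, n_ties):
    """Density of an undirected network: observed ties / possible ties."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_ties / (n_nodes * (n_nodes - 1))


# Hypothetical rosters mirroring the A/B/C example in the text,
# plus one lone developer D who works on a project alone.
rosters = {"Project X": ["A", "B"], "Project Y": ["A", "C"], "Project Z": ["D"]}
developers, ties = project_to_developer_network(rosters)
connected = drop_isolates(ties)
```

In this toy example B and C are not tied because they never share a project, and D is removed as an isolate, which is the same logic that reduces the 271–developer network to the 190 linked developers.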
Measures of leadership roles, performance and experience were obtained as nodal level attributes from the SourceForge data archive. The binary variable, leadership role, was determined by whether a developer was listed as a project administrator for at least one project at SourceForge. SourceForge does not provide a consistent performance measure for individual developers, so a developer’s performance was measured as the sum of the project–level performance scores (each project’s software rank expressed as a percentile) of all the projects the developer had ever contributed to, either as a project leader or as an explicitly listed member of a project team. Developer experience, measured in months, was obtained by calculating the difference between the original date the developer joined SourceForge (as early as the emergence of SourceForge in 1999) and the time of the dataset (March 2007). Leadership roles, performance and experience were all visible from either the project Web page or the developer Web page. Performance and experience were continuous variables and were transformed into z–scores before analysis. Another binary variable was also included, which measured whether a developer had participated in only one project. This variable was used in the exponential random graph modeling process to control for the inflated homophily effect where two or more developers collaborate on their only project.
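As an illustration of how the two continuous attributes could be derived, the sketch below computes experience in whole months up to the March 2007 snapshot, sums project percentile ranks into a performance score, and standardizes a variable into z–scores. The join dates and percentile values are invented, the function names are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which form was used.

```python
from datetime import date
from statistics import mean, pstdev


def months_between(start, end):
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def summed_performance(project_percentiles):
    """Sum of percentile ranks over all projects a developer contributed to."""
    return sum(project_percentiles)


def z_scores(values):
    """Standardize values to mean 0 and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD, not sample SD
    if sigma == 0:
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


SNAPSHOT = date(2007, 3, 1)  # time of the dataset stated in the text

# Hypothetical join dates and project percentile ranks for three developers.
join_dates = [date(2005, 3, 1), date(2003, 3, 1), date(2001, 3, 1)]
experience = [months_between(joined, SNAPSHOT) for joined in join_dates]
performance = [summed_performance(p) for p in ([90.0], [40.0, 60.0], [10.0])]

experience_z = z_scores(experience)
performance_z = z_scores(performance)
```

Because performance is a sum rather than an average, a developer who joins many projects can score higher than one who contributes to a single highly ranked project, which is worth keeping in mind when reading the performance effects.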
Correlation analyses were performed on developer attributes, as highly correlated nodal attributes might interfere with model estimation. There were modest correlations between leadership and performance (Pearson correlation = .29, p < .001), leadership and experience (Pearson correlation = .31, p < .001), and performance and experience (Pearson correlation = .30, p < .001). These were not large enough to create problems for model estimation.
Exponential random graph (p*) models
The analyses to test the hypotheses were based on the p* family of exponential random graph models for social networks originally presented by Wasserman and Pattison (1996) and more recently extended by Pattison, Robins and associates (Robins, et al., 2007a; Robins, et al., 2007b). Just like logistic regression, ERG models estimate the likelihood for a particular network configuration to exist in a network more (or less) frequently than would be expected by chance alone, conditional upon other structural configurations of the network. The coefficients of ERG models are analogous to those in multiple logistic regression models. A positive and significant coefficient in an ERG model indicates that a specific structure exists in the network and the value of the estimate corresponds to the intensity of the effect. Yet, unlike logistic models, which assume that the probability of any tie does not depend on the value of any other tie, ERG models account for the interdependence of network structural characteristics. The current social network of developers was analyzed using the Statnet package (Handcock, et al., 2003) in R. Statnet is a specialized program for statistical network analysis that implements recent advances in modeling exponential random graphs (Handcock, et al., 2008).
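For readers less familiar with this model family, the general form of an exponential random graph model and its link to logistic regression can be written as below. The notation is a standard textbook convention and is an assumption here, not taken from the study: y is an observed network, g(y) a vector of network statistics (for example, counts of configurations or attribute–based terms), θ the coefficient vector, and δ_ij(y) the change in g(y) when the tie between i and j is switched from absent to present.

```latex
% General ERG (p*) model: probability of observing network y
P_{\theta}(Y = y) = \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y'} \exp\{\theta^{\top} g(y')\}

% Conditional log-odds of a single tie, given the rest of the network
\operatorname{logit} P_{\theta}\left(Y_{ij} = 1 \mid Y_{ij}^{c} = y_{ij}^{c}\right)
  = \theta^{\top} \delta_{ij}(y),
\qquad
\delta_{ij}(y) = g\left(y_{ij}^{+}\right) - g\left(y_{ij}^{-}\right)
```

The second expression is why the coefficients read like those of a logistic regression: each one is the change in the log–odds of a tie per unit change in the corresponding statistic, holding the rest of the network fixed. The dependence on the rest of the network through δ_ij(y) is what distinguishes the model from an ordinary logistic regression with independent observations.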
For the current study, three converged models were estimated with Markov Chain Monte Carlo maximum likelihood estimation methods. All three models were estimated with the edges fixed and included baseline terms, terms for node attribute main effects to test strategic selection hypotheses, and terms for node attribute based mixing to test homophily hypotheses.
The baseline terms included two parameters. The first one was Alternating K–Star, a higher–order structural parameter (Snijders, et al., 2006). The second was an attribute–based “nodematch” parameter to capture the tie probability between developers who participated in only one project. This term was included to control for an inflated homophily effect that can occur when two developers are members of the same project.
To test strategic selection hypotheses (H1–H3), a series of terms representing node attribute main effects were included in the estimation. Significant and positive estimates for these terms would indicate that developers who were project leaders, who had better performance, and who had longer experience were likely to have more connections in the OSS community.
To test homophily hypotheses (H4–H6), a series of terms representing node attribute–based mixing were included in the estimation. A significant and positive “nodematch” parameter for project leadership would suggest that a tie was more likely to exist between two project leaders. Significant and negative “difference” parameters for performance and experience would suggest that a tie was more likely to exist between two developers who had similar levels of performance and experience.
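To make the two blocks of terms concrete, the sketch below computes, for a single pair of developers, the statistics that main–effect and mixing terms of this kind typically contribute. It is an illustration under stated assumptions rather than the Statnet specification used in the study: the main effect is taken as the sum of the two developers' attribute values, the leadership match as 1 only when both are leaders, and the difference terms as absolute differences of z–scores. All names are hypothetical, and the baseline terms are deliberately left out, so the resulting number is only the attribute–based part of the log–odds, not a tie probability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Developer:
    is_leader: bool
    performance: float  # z-score
    experience: float   # z-score


def dyad_statistics(a, b):
    """Attribute-based statistics contributed by a tie between a and b."""
    return {
        # Main effects (strategic selection, H1-H3): summed over both ends.
        "leader_main": int(a.is_leader) + int(b.is_leader),
        "performance_main": a.performance + b.performance,
        "experience_main": a.experience + b.experience,
        # Mixing terms (homophily, H4-H6).
        "leader_match": int(a.is_leader and b.is_leader),
        "performance_diff": abs(a.performance - b.performance),
        "experience_diff": abs(a.experience - b.experience),
    }


def attribute_log_odds(stats, coefficients):
    """Weighted sum of the dyad statistics (baseline terms excluded)."""
    return sum(coefficients[name] * value for name, value in stats.items())


# Hypothetical pair: a high-performing leader and a low-performing non-leader.
leader = Developer(is_leader=True, performance=1.0, experience=0.5)
novice = Developer(is_leader=False, performance=-1.0, experience=0.5)
stats = dyad_statistics(leader, novice)
```

Under this reading, a negative coefficient on a difference statistic lowers the log–odds of a tie as the gap between two developers widens, which is why negative "difference" estimates are interpreted as evidence of homophily.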
Table 1 summarizes the estimation results of the three models. In addition to baseline parameters, Model 1 considered only strategic selection effects, while Model 2 considered only homophily effects. Model 3 included all three blocks (baseline parameters, strategic selection and homophily effects). The likelihood–based measures of AIC and likelihood–ratio tests indicate that the full model (Model 3) fits best by a large margin, so we consider its coefficients the best estimates for the true magnitude of strategic selection and homophily effects. In the following, the results of hypotheses tests are reported based on Model 3.
Table 1: Summary of estimation models.

| H | Parameters | Model 1 Coefficients | Model 1 S.E. | Model 2 Coefficients | Model 2 S.E. | Model 3 Coefficients | Model 3 S.E. |
| --- | --- | --- | --- | --- | --- | --- | --- |
| | Baseline parameters | | | | | | |
| | Alternating K–star | -1.18 | 3.04 | -1.40 | 4.91 | -0.95 | 2.37 |
| | Node match (one project only–one project only) | -0.15 | 0.10 | 0.08 | 0.09 | -0.18 | 0.10 |
| | Strategic selection (nodal attribute main effect) | | | | | | |
| H1 | Project leadership | -0.69*** | 0.06 | | | -0.71*** | 0.06 |
| H2 | Developer performance | 0.10*** | 0.03 | | | 0.44*** | 0.03 |
| H3 | Developer experience | -0.11*** | 0.01 | | | -0.13*** | 0.01 |
| | Homophily (nodal attribute based mixing) | | | | | | |
| H4 | Node match | | | -0.77*** | 0.09 | -0.06 | 0.04 |
| H5 | Difference of performance | | | -0.01 | 0.04 | -0.43*** | 0.02 |
| H6 | Difference of experience | | | -0.13*** | 0.02 | -0.10*** | 0.02 |
| | Likelihood | -2981.2 | | -3016.1 | | -2959.1 | |
| | AIC | 5972.4 | | 6042.1 | | 5934.1 | |

Note: *** p<.001; ** p<.01; * p<.05.
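The fit statistics in Table 1 can be cross–checked with a few lines of arithmetic. The sketch below recomputes AIC as 2k minus twice the log–likelihood and forms likelihood–ratio statistics for the full model against each reduced model. The parameter counts (5, 5 and 8) are an assumption inferred from the number of coefficient rows each model fills in the table, and the critical value 16.27 is the standard chi–square cutoff for 3 degrees of freedom at the .001 level; neither is stated in the text.

```python
def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_parameters - 2 * log_likelihood


def likelihood_ratio_statistic(loglik_reduced, loglik_full):
    """Twice the log-likelihood gain of the full model over a nested one."""
    return 2 * (loglik_full - loglik_reduced)


# Log-likelihoods from Table 1; parameter counts are assumed from the
# number of coefficients reported for each model.
models = {
    "Model 1": {"loglik": -2981.2, "k": 5},  # baseline + strategic selection
    "Model 2": {"loglik": -3016.1, "k": 5},  # baseline + homophily
    "Model 3": {"loglik": -2959.1, "k": 8},  # all three blocks
}

recomputed_aic = {name: aic(m["loglik"], m["k"]) for name, m in models.items()}

full = models["Model 3"]
lr_vs_model_1 = likelihood_ratio_statistic(models["Model 1"]["loglik"], full["loglik"])
lr_vs_model_2 = likelihood_ratio_statistic(models["Model 2"]["loglik"], full["loglik"])

# Chi-square critical value for 3 degrees of freedom at alpha = .001.
CRITICAL_CHI2_DF3_P001 = 16.27
full_model_preferred = min(lr_vs_model_1, lr_vs_model_2) > CRITICAL_CHI2_DF3_P001
```

Under these assumptions the recomputed AIC values agree with the reported ones to within rounding of the log–likelihoods, and both likelihood–ratio statistics lie well above the cutoff, consistent with the statement that the full model fits best.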
H1 states that project leaders tend to have more connections in general. Contrary to the prediction, the results show that project leadership has a negative and significant impact on tie formation (coefficient = -0.71, p < .001). In other words, being a project leader decreases, rather than increases, one’s probability of having collaborative ties with other developers. H2 states that developers who have better performance tend to attract more ties. The parameter estimate strongly supports this hypothesis, showing a positive and significant performance main effect (coefficient = 0.44, p < .001). H3 predicts that developers with longer experience tend to attract more ties. Contrary to our prediction, developers with longer experience in the OSS community tend to have fewer collaborative ties (coefficient = -0.13, p < .001).
H4 states that project leaders are more likely to connect with other project leaders. This hypothesis is not supported, given that the parameter estimate of “nodematch” between project leaders is not significant (coefficient = -0.06, n.s.). H5 states that developers tend to connect with those who have similar levels of performance. The estimated effect of difference of performance is negative and significant (coefficient = -0.43, p < .001); therefore, H5 is supported. H6 states that developers tend to connect with those who have similar levels of experience. This hypothesis is also supported, as the estimated effect of difference of experience is negative and significant (coefficient = -0.10, p < .001).
This study explored the social and collaborative dynamics in an open source community by examining two network attachment logics: strategic selection and homophily. A 190 × 190 undirected matrix of the developer network was analyzed, along with developers’ project leadership and reputation (measured by performance and experience with SourceForge). Results provide partial empirical support for both strategic selection and homophily, and the effects were particularly strong for developer performance.
H1 and H4 concern the role of project leaders in an OSS community. Although previous studies largely acknowledged the significance of leadership (Lerner and Tirole, 2002), results show that the assumption that leaders are privileged in obtaining a structural edge (Grewal, et al., 2006) is questionable, at least in this particular OSS network. Being a project leader did not confer any advantage in connecting with developers in general; the leadership main effect was in fact negative. Similarly, results also show that project leaders are not especially likely to connect with other leaders. These findings are noteworthy because they hold true even when all the lone developers (isolates), who are essentially disconnected project leaders, are removed from the network. Therefore, these findings could not be easily accounted for by the “caveman” phenomenon (Krishnamurthy, 2002). Weber (2004) noticed a role reversal in OSS project teams in that leaders are “more dependent on the followers” to collaborate on creative productions because every developer is free to move from project to project. In other words, every developer has the right to fork and start on their own. Precisely because of this ubiquitous freedom, project leadership may cease to be an important factor in developers’ strategic decision–making processes about choosing potential collaborators.
Developer performance was found to influence network formation through both strategic selection and homophily processes. Taken together, we found that accomplished developers tend to connect with other accomplished developers, essentially forming an elitist circle in the OSS community. By contrast, it is more difficult for less successful developers to establish collaborative relations, and even if they do, they tend to connect with others who have a similarly low level of performance and experience. In other words, developers separated by a large “performance gap” are unlikely to team up. The findings are suggestive of a “performance–based clustering” phenomenon within the online community in which most collaborative connections tend to be dominated by the good and reputable, while the rest usually occur between less established developers. The findings also indicate that the merit–based system of peer review for OSS project development described by Benkler (2006) may work for another important process in the community: collaborator selection. One possible explanation for this phenomenon is that a collaborative relation often requires serious commitment from both parties, which is quite different from less substantial relations such as citations or hyperlinks, where network attachment does not require substantive effort. Therefore, even if novice developers are motivated to strategically attach to the good and reputable in the community, they are unlikely to attain knowledge or status benefits quickly by linking with accomplished developers, because novices themselves are also strategically scrutinized by their potential partners. The general pathways for aspiring novices are not easy. They are unlikely to connect with accomplished and famous developers until they establish their own reputations, typically through collaborating with other novices.
This paper makes several contributions to the field of online community research. By conceptualizing an online community as a network of participants and examining the formation of social ties, this research demonstrates that social network analysis can be a useful approach to studying the dynamics of online social systems. Specifically, this research provides empirical evidence on how OSS developers’ leadership, performance and experience affect the creation of connections in the online social network. This study tests two different network attachment logics: strategic selection and homophily. Both mechanisms received some support, which indicates that social network formation in peer production communities is a complex process that involves multiple mechanisms. Furthermore, this research also extends the theory of homophily in that it tested homophilous tie formation in distributed communities. Past research focused mostly on demographic variables (Monge and Contractor, 2003), using exact matches of those attributes to construct homophily matrices for analysis. This study is among the few studies (Huang, et al., 2009; Yuan and Gay, 2006) that examine homophily effects in distributed virtual communities in which demographic information is less visible, if visible at all. Other factors, such as reputation in the community, may instead have an impact on tie formation.
This study supplements existing empirical research by analyzing a relatively stable and complete social network of OSS developers. Case studies of a few successful OSS projects characterized a large proportion of early OSS research, thus systematically sampling the “successful” rather than the “failing” OSS projects. Larger and more comprehensive samples have been used, but the selection scheme has often skewed the sample toward those with certain project metrics (e.g., top 100 projects as in Krishnamurthy, 2002). Because of the highly stratified nature of the OSS community, the “sampling the successful” approach does not provide a representative picture. Social network analysis, which usually demands a complete population rather than a selective sample, provides a better analytical lens to study the organizational dynamics.
This study has a number of limitations that warrant future research. First, as the current study analyzed collaborative relations based on a static snapshot of a dynamic network, the findings are not sufficient to establish the direction of causality. Although the prevalence of certain network properties provides some support for both strategic selection and homophily mechanisms, it does not necessarily preclude alternative explanations. For instance, although the results suggest that developers tend to connect with those who have similar levels of reputation, the homophily phenomenon observed here could be the outcome, rather than the origin, of a dynamic evolutionary network process. Future studies with longitudinal designs are needed to uncover the co–evolutionary dynamics of collaboration and developers’ attributes such as performance. One novel and informative approach is to visualize the software development process by dynamically plotting developers and their code commits (changes to the code or documentation) over time (Ogawa and Ma, 2009).
Second, this study analyzed a network of BitTorrent developers, which might not capture network ties established through projects outside of this particular domain (e.g., two unconnected people in the BitTorrent network may have joint membership in a project outside of BitTorrent or even outside of SourceForge). Our study imposes an arbitrary cut–off point (BitTorrent projects in SourceForge) to limit the size of the network, which inevitably overlooks developer connections as well as developer reputation or experience established elsewhere. Future studies are encouraged to conduct similar analyses on a larger scale and, if possible, to integrate SourceForge with other OSS communities such as GitHub (www.github.com) and Bitbucket (bitbucket.org) to reconstruct a more complete network landscape and developer profiles.
Third, although ERG models allow us to examine a particular network configuration dependent upon other structural and nodal properties, currently most ERG models focus on the estimation of dichotomous social network data. In other words, all the collaborative relations in the network are assumed to be homogeneous regardless of the number or intensity of actual interactions (Goodreau, et al., 2009). Strictly speaking, even the number of joint projects is not a precise measure of tie strength, because developers could have very different levels of involvement in the same project. Specifically, participation in the same project can result in a positive relationship (when two developers collaborate), a negative relationship (when they have conflicts) or a null relationship (when they do not interact at all). Indeed, a more nuanced measure is needed to capture the intensity of collaborative and social interactions between community participants. Such a measure could also consider the multiplex interactions between developers through various communication channels, such as collaboration in the same project team (as considered in the present study), communication in project forums, participation in mailing lists, and so on. With further development of ERG models for valued networks (Robins, et al., 1999), future research is encouraged to take into account not only the existence, but also the strength and multiplexity, of developer connections.
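The information lost by dichotomizing can be illustrated with a small sketch. It assumes that ties are derived from shared project membership, counts the number of joint projects for every pair of developers, and then collapses those counts to presence or absence. The project and developer names are hypothetical, and the functions are illustrative rather than the procedure used in this study.

```python
from itertools import combinations

# Hypothetical affiliation data: project -> set of developers on its team.
projects = {
    "p1": {"ann", "bo", "cy"},
    "p2": {"ann", "bo"},
    "p3": {"cy", "di"},
}


def joint_project_counts(projects):
    """Count shared projects for every pair of developers (valued ties)."""
    counts = {}
    for team in projects.values():
        # Sort so that each undirected pair always has the same key.
        for pair in combinations(sorted(team), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def dichotomize(valued_ties):
    """Collapse valued ties to presence/absence, as a binary ERG model requires."""
    return {pair: 1 for pair, count in valued_ties.items() if count > 0}


valued = joint_project_counts(projects)
binary = dichotomize(valued)

print(valued)
print(binary)
```

In this toy example the pair that shares two projects and the pairs that share one become indistinguishable after dichotomizing, which is the homogeneity assumption described above.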
Finally, the OSS community represents only one of the many types of distributed peer production online communities, and it is not clear whether the two network attachment mechanisms examined in this study are equally applicable elsewhere. Scholarly efforts have been made to uncover the network formation processes in other domains, including expertise networks (Zhang, et al., 2007), the blogosphere (Adar and Adamic, 2005), and Massively Multiplayer Online Games (Huang, et al., 2009). Although these communities differ from each other on various dimensions, collectively they provide interesting cases by which to study emergent and innovative organizational forms through network relations. A promising future research direction is then to compare the social network formation processes across different types of online communities, which should provide more nuanced insights into the interplay among network mechanisms, individual motivations and system characteristics.
Online communities have become important sites for voluntary peer production on the Web. By examining “who connects with whom” in a community using social network analysis, this study tests the social drivers that structure the collaboration dynamics among a group of participants from SourceForge, the largest open source community on the Web (Sourceforge, 2011). The formation of the online social network was explored by examining two distinct network attachment logics: strategic selection and homophily. Both logics received some support. In general, reputable developers with good performance tended to have more collaborative connections, and developers also tended to connect with people with similar levels of performance and experience. Contrary to our expectation, project leadership did not confer any advantage in network formation. Taken together, the results are suggestive of a “performance–based clustering” phenomenon within the OSS online community in which most collaborations involve accomplished developers, and novice developers tend to partner with less accomplished and less experienced peers. This study shows the promise of social network analysis as a useful approach for online community research. Testing and comparing network formation mechanisms in online social networks across different domains will open new avenues for understanding the social and collaborative dynamics in contemporary networked media environments.
About the authors
Cuihua Shen (Ph.D., University of Southern California) is an Assistant Professor in the Emerging Media and Communication Program, University of Texas at Dallas.
Direct comments to shencuihua [at] gmail [dot] com
Peter Monge (Ph.D., Michigan State University) is a Professor at the Annenberg School for Communication and Journalism and the Marshall School of Business, University of Southern California.
The preparation of this article was supported by a grant from the National Science Foundation (IIS–0838548) and by funding to the Annenberg Networks Network from the Annenberg School for Communication and Journalism. The authors are grateful to Greg Madey for access to the SourceForge Research Data Archive hosted at University of Notre Dame. They would also like to thank Tim Berners–Lee, Karim Lakhani, Seungyoon Lee, Drew Margolin, Garry Robins, Kimberlie Stephens, Chunke Su, Peng Wang, Matthew Weber, and members of the Annenberg Networks Network for help and comments.
1. SourceForge provides a user–generated “reputation” statistic for every developer, which is designed to facilitate transactions among service buyers and sellers. We did not use this statistic to measure reputation because it focuses on transactions and because this statistic is usually left blank for most SourceForge developers.
2. In addition to traditional measures of fit such as AIC and likelihood–ratio, Statnet also provides graphical goodness–of–fit examinations of simulated random networks based on estimated models. All three models estimated in this study display a moderately good fit with the observed network on degrees, edgewise shared partners and geodesic distance (Hunter, et al., 2008). Detailed goodness–of–fit graphs can be obtained from the authors.
3. Weber, 2004, p. 167.
Dominic Abrams and Michael A. Hogg (editors), 1999. Social identity and social cognition. Malden, Mass.: Blackwell.
Lada Adamic, Orkut Buyukkokten, and Eytan Adar, 2003. “A social network caught in the Web,” First Monday, volume 8, number 6, and at http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1057/977, accessed 1 May 2011.
Eytan Adar and Lada Adamic, 2005. “Tracking information epidemics in blogspace,” WI ’05: Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, pp. 207–214.
Paul S. Adler, 2001. “Market, hierarchy, and trust: The knowledge economy and the future of capitalism,” Organization Science, volume 12, number 2, pp. 215–234.
Albert–László Barabási, 2002. Linked: The new science of networks. Cambridge, Mass.: Perseus.
Yochai Benkler, 2006. The wealth of networks: How social production transforms markets and freedom. New Haven, Conn.: Yale University Press.
Donn Byrne, 1971. The attraction paradigm. New York: Academic Press.
Kevin Crowston and James Howison, 2006. “Hierarchy and centralization in free and open source software team communications,” Knowledge, Technology, and Policy, volume 18, number 4, pp. 65–85.
Kevin Crowston, Kangning Wei, Qing Li, and James Howison, 2006. “Core and periphery in free/libre and open source software team communications,” Proceedings of the Hawaii International Conference on System Sciences, pp. 118–124.
Joseph Feller, Brian Fitzgerald, Scott A. Hissam, and Karim R. Lakhani, 2005. “Introduction,” In: Joseph Feller, Brian Fitzgerald, Scott A. Hissam, and Karim R. Lakhani (editors). Perspectives on free and open source software. Cambridge, Mass.: MIT Press, pp. xvii–xxxi.
Nikolaus Franke and Eric von Hippel, 2003. “Satisfying heterogeneous user needs via innovation toolkits: The case of Apache security software,” Research Policy, volume 32, number 7, pp. 1,199–1,215.
Rishab Aiyer Ghosh, 2005. “Understanding free software developers: Findings from the FLOSS study,” In: Joseph Feller, Brian Fitzgerald, Scott A. Hissam, and Karim R. Lakhani (editors). Perspectives on free and open source software. Cambridge, Mass.: MIT Press, pp. 24–45.
Steven M. Goodreau, James A. Kitts, and Martina Morris, 2009. “Birds of a feather, or friend of a friend? Using exponential random graph models to investigate adolescent social networks,” Demography, volume 46, number 1, pp. 103–125.
Rajdeep Grewal, Gary L. Lilien, and Girish Mallapragada, 2006. “Location, location, location: How network embeddedness affects project success in open source systems,” Management Science, volume 52, number 7, pp. 1,043–1,056.
Jungpil Hahn, Jae Yun Moon, and Chen Zhang, 2008. “Emergence of new project teams from open source software developer networks: Impact of prior collaboration ties,” Information Systems Research, volume 19, number 3, pp. 369–391.
Mark S. Handcock, David R. Hunter, Carter T. Butts, Steven M. Goodreau, and Martina Morris, 2008. “statnet: Software tools for the representation, visualization, analysis and simulation of network data,” Journal of Statistical Software, volume 24, number 1, at http://www.jstatsoft.org/v24/i01/.
Mark S. Handcock, David R. Hunter, Carter T. Butts, Steven M. Goodreau, and Martina Morris, 2003. “statnet: Software tools for the statistical modeling of network data,” version 2.0. Seattle, Wash.
Kieran Healy and Alan Schussman, 2003. “The ecology of open–source software development,” at http://www.kieranhealy.org/files/drafts/oss-activity.pdf, accessed 1 May 2011.
Pamela J. Hinds, Kathleen M. Carley, David Krackhardt, and Doug Wholey, 2000. “Choosing work group members: Balancing similarity, competence, and familiarity,” Organizational Behavior and Human Decision Processes, volume 81, number 2, pp. 226–251.
Yun Huang, Cuihua Shen, Dmitri Williams, and Noshir Contractor, 2009. “Virtually there: The role of proximity and homophily in virtual world networks,” Proceedings of the IEEE SocialCom 2009 Symposium on Social Intelligence and Networking (Vancouver, Canada), at http://dmitriwilliams.com/proximity.pdf, accessed 31 May 2011.
David R. Hunter, Steven M. Goodreau, and Mark S. Handcock, 2008. “Goodness of fit for social network models,” Journal of the American Statistical Association, volume 103, number 481, pp. 248–258.
Herminia Ibarra, 1992. “Homophily and differential returns: Sex differences in network structure and access in an advertising firm,” Administrative Science Quarterly, volume 37, number 3, pp. 422–447.
Sandeep Krishnamurthy, 2002. “Cave or community? An empirical examination of 100 mature open source projects,” First Monday, volume 7, number 6, at http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/960/881, accessed 1 May 2011.
George Kuk, 2006. “Strategic interaction and knowledge sharing in the KDE developer mailing list,” Management Science, volume 52, number 7, pp. 1,031–1,042.
Karim R. Lakhani and Robert G. Wolf, 2005. “Why hackers do what they do: Understanding motivation and effort in free/open source software projects,” In: Joseph Feller, Brian Fitzgerald, Scott A. Hissam, and Karim R. Lakhani (editors). Perspectives on free and open source software. Cambridge, Mass.: MIT Press, pp. 3–21.
Karim R. Lakhani and Eric von Hippel, 2003. “How open source software works: ‘Free’ user–to–user assistance,” Research Policy, volume 32, number 6, pp. 923–943.
Paul F. Lazarsfeld and Robert K. Merton, 1954. “Friendship as a social process: A substantive and methodological analysis,” In: Morroe Berger, Theodore Abel, and Charles H. Page (editors). Freedom and control in modern society. New York: Van Nostrand, pp. 18–66.
Gwendolyn K. Lee and Robert E. Cole, 2003. “From a firm–based to a community–based model of knowledge creation: The case of the Linux Kernel development,” Organization Science, volume 14, number 6, pp. 633–649.
Josh Lerner and Jean Tirole, 2002. “Some simple economics of open source,” Journal of Industrial Economics, volume 50, number 2, pp. 197–234.
Kimberly Ling, Gerard Beenen, Pamela Ludford, Xiaoqing Wang, Klarissa Chang, Xin Li, Dan Cosley, Dan Frankowski, Loren Terveen, Al Mamunur Rashid, Paul Resnick, and Robert Kraut, 2005. “Using social psychology to motivate contributions to online communities,” Journal of Computer–Mediated Communication, volume 10, number 4, pp. 1–32, and at http://jcmc.indiana.edu/vol10/issue4/ling.html, accessed 1 May 2011.
Gregory Madey, Vincent Freeh, and Renee Tynan, 2004. “Modeling the F/OSS community: A quantitative investigation,” In: Stefan Koch (editor). Free/open source software development. Hershey, Pa.: Idea Group Publishing, pp. 203–220.
Gregory Madey, Vincent Freeh, and Renee Tynan, 2002. “The open source software development phenomenon: An analysis based on social network theory,” Proceedings of the Americas Conference on Information Systems (Dallas, Tex.), pp. 1,806–1,813.
Miller McPherson and Lynn Smith–Lovin, 1987. “Homophily in voluntary organizations: Status distance and the composition of face–to–face groups,” American Sociological Review, volume 52, number 3, pp. 370–379.
Miller McPherson, Lynn Smith–Lovin, and James M. Cook, 2001. “Birds of a feather: Homophily in social networks,” Annual Review of Sociology, volume 27, pp. 415–444.
Robert K. Merton, 1968. “The Matthew Effect in science,” Science, volume 159, number 3810, pp. 56–63.
Kelly A. Mollica, Barbara Gray, and Linda K. Trevino, 2003. “Racial homophily and its persistence in newcomers’ social networks,” Organization Science, volume 14, number 2, pp. 123–136.
Peter R. Monge and Noshir S. Contractor, 2003. Theories of communication networks. New York: Oxford University Press.
Ikujiro Nonaka, 1994. “A dynamic theory of organizational knowledge creation,” Organization Science, volume 5, number 1, pp. 14–35.
Tim O’Reilly, 2005. “What is Web 2.0,” at http://oreilly.com/web2/archive/what-is-web-20.html, accessed 1 May 2011.
Michael Ogawa and Kwan–Liu Ma, 2009. “code_swarm: A design study in organic software visualization,” IEEE Transactions on Visualization and Computer Graphics, volume 15, number 6, pp. 1,097–1,104.
Pamela E. Oliver and Gerald Marwell, 1988. “The paradox of group size in collective action: A theory of the critical mass. II,” American Sociological Review, volume 53, number 1, pp. 1–8.
Mancur Olson, 1965. The logic of collective action: Public goods and the theory of groups. Cambridge, Mass.: Harvard University Press.
Joel M. Podolny and Karen L. Page, 1998. “Network forms of organization,” Annual Review of Sociology, volume 24, pp. 57–76.
Walter W. Powell, 1990. “Neither market nor hierarchy: Network forms of organization,” Research in Organizational Behavior, volume 12, pp. 295–336.
Walter W. Powell, Douglas R. White, Kenneth W. Koput, and Jason Owen–Smith, 2005. “Network dynamics and field evolution: The growth of interorganizational collaboration in the life sciences,” American Journal of Sociology, volume 110, number 4, pp. 1,132–1,205.
Eric S. Raymond, 2001. The cathedral and the bazaar: Musings on Linux and open source by an accidental revolutionary. Cambridge, Mass.: O’Reilly.
Eric S. Raymond, 2000. “Homesteading the noosphere,” at http://catb.org/~esr/writings/homesteading/homesteading/, accessed 28 May 2011; earlier version in First Monday, volume 3, number 10, at http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/621/542, accessed 31 May 2011.
Garry Robins, Philippa Pattison, and Stanley Wasserman, 1999. “Logit models and logistic regressions for social networks: III. Valued relations,” Psychometrika, volume 64, number 3, pp. 371–394.
Garry Robins, Pip Pattison, Yuval Kalish, and Dean Lusher, 2007a. “An introduction to exponential random graph (p*) models for social networks,” Social Networks, volume 29, number 2, pp. 173–191.
Garry Robins, Tom Snijders, Peng Wang, Mark Handcock, and Philippa Pattison, 2007b. “Recent developments in exponential random graph (p*) models for social networks,” Social Networks, volume 29, number 2, pp. 192–215.
Martin Ruef, Howard E. Aldrich, and Nancy M. Carter, 2003. “The structure of founding teams: Homophily, strong ties, and isolation among U.S. entrepreneurs,” American Sociological Review, volume 68, number 2, pp. 195–222.
Tom A.B. Snijders, Philippa E. Pattison, Garry L. Robins, and Mark S. Handcock, 2006. “New specifications for exponential random graph models,” Sociological Methodology, volume 36, number 1, pp. 99–153.
Sourceforge, 2011. “Homepage,” at http://www.sourceforge.net, accessed 1 May 2011.
Michael Spence, 1973. “Job market signaling,” Quarterly Journal of Economics, volume 87, number 3, pp. 355–374.
John C. Turner, 1987. Rediscovering the social group: Self–categorization theory. New York: Blackwell.
Eric von Hippel and Georg von Krogh, 2003. “Open source software and the ‘private–collective’ innovation model: Issues for organization science,” Organization Science, volume 14, number 2, pp. 209–223.
Stanley Wasserman and Philippa Pattison, 1996. “Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*,” Psychometrika, volume 61, number 3, pp. 401–425.
Steve Weber, 2004. The success of open source. Cambridge, Mass.: Harvard University Press.
Y. Connie Yuan and G. Gay, 2006. “Homophily of network ties and bonding and bridging social capital in computer–mediated distributed teams,” Journal of Computer–Mediated Communication, volume 11, number 4, at http://jcmc.indiana.edu/vol11/issue4/yuan.html, accessed 1 May 2011.
Y. Connie Yuan, Dan Cosley, Howard T. Welser, Ling Xia, and Geri Gay, 2009. “The diffusion of a task recommendation system to facilitate contributions to an online community,” Journal of Computer–Mediated Communication, volume 15, number 1, pp. 32–59, and at http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.2009.01491.x/abstract, accessed 1 May 2011.
David Zeitlyn, 2003. “Gift economies in the development of open source software: Anthropological reflections,” Research Policy, volume 32, number 7, pp. 1,287–1,291.
Jun Zhang, Mark S. Ackerman, and Lada Adamic, 2007. “Expertise networks in online communities: Structure and algorithms,” Proceedings of the WWW2007 (Banff, Canada), at http://www2007.org/papers/paper516.pdf, accessed 1 May 2011.
Received 4 May 2011; revised 29 May 2011; accepted 30 May 2011.
“Who connects with whom? A social network analysis of an online open source software community” by Cuihua Shen and Peter Monge is licensed under a Creative Commons Attribution–NonCommercial–NoDerivs 3.0 Unported License.
Who connects with whom? A social network analysis of an online open source software community
by Cuihua Shen and Peter Monge.
First Monday, Volume 16, Number 6 - 6 June 2011