The Price-Pareto growth model of networks with community structure
Brzozowski, Gagolewski, Siudem et al.
We introduce a new analytical framework for modelling degree sequences in individual communities of real-world networks, e.g., citations to papers in different fields. Our work is inspired by Price's model and its recent generalisation called 3DSI (three dimensions of scientific impact), which assumes that citations are gained partly accidentally and partly preferentially. Our generalisation is motivated by existing research indicating significant differences in how various scientific disciplines grow, namely in their growth ratios, average reference list lengths, and preferential citing tendencies. Extending the 3DSI model to heterogeneous networks with a community structure allows us to devise new analytical formulas for, e.g., citation number inequality and preferentiality measures. We show that the distribution of citations in a community tends to a Pareto type II distribution. We also present analytical formulas for estimating its parameters and Gini's index. The new model is validated on real citation networks.
This paper proposes a novel analytical framework for modeling degree sequences of individual communities in real-world networks, such as citation patterns across different research fields. The work is inspired by the Price model and its recent generalization, the 3DSI (Three Dimensions of Scientific Impact) model, which assumes that citations are acquired partly randomly and partly through preferential attachment. The research motivation stems from existing evidence showing significant differences across scientific disciplines in growth patterns, including varying growth rates, average reference list lengths, and preferential citation tendencies. The 3DSI model is extended to heterogeneous networks with community structure, enabling the design of new analytical formulas to calculate citation inequality and preferentiality measures. The study demonstrates that citation distributions within communities tend toward Pareto II distributions and provides analytical formulas for estimating their parameters and Gini coefficients.
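The mix of accidental and preferential citing described above can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not the paper's model: the function name simulate_mixed_growth, the per-citation coin flip with probability rho, the single seed paper, and the fact that a paper may be cited more than once by the same newcomer are all choices made here for illustration.

```python
import random


def simulate_mixed_growth(n_papers, m, rho, seed=0):
    """Grow a toy citation network and return the citation count of each paper.

    Each new paper hands out m citations. Every citation is preferential
    (target chosen proportionally to its current citation count) with
    probability rho, and accidental (target chosen uniformly) otherwise.
    Assumption: repeated citations to the same target are allowed.
    """
    rng = random.Random(seed)
    citations = [0]   # start from a single uncited seed paper
    endpoints = []    # paper index repeated once per citation it has received
    for new in range(1, n_papers):
        for _ in range(m):
            if endpoints and rng.random() < rho:
                # Preferential: sampling an endpoint is proportional to citations.
                target = rng.choice(endpoints)
            else:
                # Accidental: any existing paper is equally likely.
                target = rng.randrange(new)
            citations[target] += 1
            endpoints.append(target)
        citations.append(0)  # the new paper joins with no citations yet
    return citations


if __name__ == "__main__":
    counts = simulate_mixed_growth(n_papers=2000, m=5, rho=0.6, seed=1)
    print("most cited paper has", max(counts), "citations")
```

Raising rho in this sketch concentrates citations on fewer papers, which is the qualitative effect that the preferentiality parameter is meant to capture.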
This research addresses a limitation of existing citation network models: they do not handle community structure well. Traditional network growth models such as the Barabási-Albert model and the Price model can explain scale-free properties, but they assume a relatively homogeneous network and therefore cannot capture local variability, in particular differences between communities.
Disciplinary Differences: Different scientific disciplines exhibit significant variations in network growth patterns, including growth rates, average reference list lengths, and preferential citation tendencies
Ubiquity of Community Structure: Community structure plays important roles in biological, urban, and social networks but is frequently overlooked in modern citation network modeling
Missing Analytical Tools: Lack of analytical tools that simultaneously provide theoretical insights and handle community structure
Proposes Price-Pareto Growth Model: Extends the 3DSI model to heterogeneous networks with community structure, allowing different communities to have different parameters
Theoretical Analysis: Proves that citation distributions within communities converge to Pareto II distributions and derives related analytical formulas
Gini Coefficient Formula: Provides exact analytical formulas for computing Gini coefficients within communities and for the entire network
Parameter Estimation Methods: Develops multiple parameter estimation methods, particularly estimators based on Gini coefficients
Empirical Validation: Validates model effectiveness on CORA and DBLP datasets
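To make the link between the Gini coefficient and a Pareto type II shape parameter concrete, the sketch below uses the textbook relation for a Pareto type II (Lomax) distribution with shape alpha > 1, G = alpha / (2 alpha - 1), and its inverse. This is an assumption-labelled illustration: the paper's own estimators are stated for its model parameters and may take a different form, and the function names are hypothetical.

```python
def gini(values):
    """Sample Gini index of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # G = 2 * sum_i(i * x_(i)) / (n * total) - (n + 1) / n, ranks i = 1..n ascending
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def lomax_gini(alpha):
    """Gini index of a Pareto type II (Lomax) distribution with shape alpha > 1."""
    if alpha <= 1:
        raise ValueError("the mean, and hence the Gini index, requires alpha > 1")
    return alpha / (2.0 * alpha - 1.0)


def alpha_from_gini(g):
    """Invert lomax_gini: shape parameter implied by a Gini index in (0.5, 1)."""
    if not 0.5 < g < 1.0:
        raise ValueError("a Lomax distribution has a Gini index between 0.5 and 1")
    return g / (2.0 * g - 1.0)


if __name__ == "__main__":
    print(gini([0, 0, 0, 1]))            # highly concentrated toy sample
    print(alpha_from_gini(lomax_gini(2.5)))
```

A Gini-based estimate of this kind needs only the sorted citation counts of a community, which is one reason such estimators are convenient in practice.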
Input: Citation networks with community structure
Output: Degree sequence models for each community and their parameters
Objective: Accurately model citation distribution characteristics within each community
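The input and output described above can be sketched as a small preprocessing step. The representation is an assumption made here: citations as (citing, cited) pairs and a dictionary community_of that maps each paper to its community label; neither the names nor the data layout come from the paper.

```python
from collections import defaultdict


def community_degree_sequences(edges, community_of):
    """Return the sorted in-degree (citation count) sequence of each community.

    edges: iterable of (citing, cited) pairs.
    community_of: dict mapping every paper to its community label.
    Citations to papers without a community label are ignored.
    """
    indegree = {paper: 0 for paper in community_of}
    for _citing, cited in edges:
        if cited in indegree:
            indegree[cited] += 1

    by_community = defaultdict(list)
    for paper, k in indegree.items():
        by_community[community_of[paper]].append(k)

    # Descending order is the usual convention for citation vectors.
    return {c: sorted(ks, reverse=True) for c, ks in by_community.items()}


if __name__ == "__main__":
    edges = [("b", "a"), ("c", "a"), ("c", "b"), ("d", "c")]
    labels = {"a": "ml", "b": "ml", "c": "db", "d": "db"}
    print(community_degree_sequences(edges, labels))
```

Each resulting sequence is what a per-community model would then be fitted to.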
Local Time Concept: Introduces local time relative to community size, enabling handling of communities with different growth rates
Mixed Distribution Handling: Models the randomness of network growth through a negative binomial distribution, which allows the accidental income (citations gained non-preferentially) to be calculated exactly
Effective Parameters: Introduces ν_i as an "effective" version of ρ in the standard 3DSI model, simplifying analysis
Asymptotic Analysis: Proves degree distribution convergence to Pareto II distribution, establishing connections between Price model and Pareto distribution
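For reference, the limiting family named above can be written in its standard form. The symbols alpha (shape) and sigma (scale) are the usual textbook parameterisation and are an assumption here; how they map onto the model's own parameters is not stated in this summary.

```latex
\[
  S(x) \;=\; \Pr(X > x) \;=\; \left(1 + \frac{x}{\sigma}\right)^{-\alpha},
  \qquad x \ge 0,\ \alpha > 0,\ \sigma > 0,
\]
\[
  S(x) \;\sim\; \left(\frac{x}{\sigma}\right)^{-\alpha} \quad \text{as } x \to \infty,
  \qquad
  \mathbb{E}[X] \;=\; \frac{\sigma}{\alpha - 1} \quad (\alpha > 1).
\]
```

The first line is the survival function of a Pareto type II (Lomax) distribution; the second shows its power-law tail, which is what connects it to the scale-free behaviour of Price-type models.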
Parameter Heterogeneity: Significant variation in ρ̂ values across different disciplines within the same network, confirming that different disciplines have different ratios of accidental to preferential citations
Tail Fitting Advantage: The model fits the tails of the distributions particularly well, which matters for describing highly cited papers
Global Consistency: Weighted averages of the community models are highly consistent with the global 3DSI model
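One natural way to read the weighted average of community models is as a finite mixture. In the sketch below the weights w_i are taken to be the community shares N_i / N; this choice, and the symbols themselves, are assumptions, since the summary does not specify the weighting.

```latex
\[
  S_{\text{global}}(x) \;=\; \sum_{i=1}^{c} w_i \, S_i(x),
  \qquad
  w_i \;=\; \frac{N_i}{N},
  \qquad
  \sum_{i=1}^{c} w_i \;=\; 1,
\]
```

where S_i is the survival function fitted to community i, N_i is the number of papers in that community, and N is the total number of papers.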
Global Gini Coefficient: Expressed as an integral over the mixture distribution that involves hypergeometric functions; practical approximation formulas are also provided.
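As a numerical cross-check for a network-wide Gini coefficient, the sketch below avoids hypergeometric functions altogether. It assumes each community follows a Pareto type II (Lomax) distribution with shape alpha_i > 1 and scale sigma_i, mixes them with weights w_i, and uses the general identity G = 1 - (1/mu) * integral of S(x)^2 over x >= 0 for non-negative variables. This is a swapped-in numerical approach, not the paper's closed-form or approximation formulas; the function name and the parameterisation are hypothetical.

```python
def mixture_gini(components, steps=100_000):
    """Gini index of a mixture of Lomax distributions by numerical integration.

    components: list of (weight, alpha, sigma) with alpha > 1, sigma > 0,
    and weights summing to 1.
    """
    if abs(sum(w for w, _, _ in components) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if any(a <= 1.0 or s <= 0.0 for _, a, s in components):
        raise ValueError("each component needs alpha > 1 and sigma > 0")

    # Mixture mean: weighted sum of the Lomax means sigma / (alpha - 1).
    mean = sum(w * s / (a - 1.0) for w, a, s in components)

    def survival(x):
        return sum(w * (1.0 + x / s) ** (-a) for w, a, s in components)

    # Integrate S(x)^2 over [0, inf) via x = t / (1 - t), midpoint rule on (0, 1).
    h = 1.0 / steps
    integral = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        x = t / (1.0 - t)
        integral += survival(x) ** 2 / (1.0 - t) ** 2
    integral *= h

    return 1.0 - integral / mean


if __name__ == "__main__":
    # Two hypothetical communities with different tail heaviness.
    print(mixture_gini([(0.7, 3.0, 2.0), (0.3, 1.8, 1.0)]))
```

With a single component the result reduces to alpha / (2 alpha - 1), which gives a quick sanity check on the integration.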
Price (1965): Networks of scientific papers - Original Price model
Siudem et al. (2020): Three dimensions of scientific impact - 3DSI model
Albert & Barabási (2002): Statistical mechanics of complex networks - BA model
Fortunato (2010): Community detection in graphs - Community detection survey
Holland et al. (1983): Stochastic blockmodels - Stochastic block model
This paper makes important contributions at the intersection of network science and scientometrics, providing new theoretical tools for understanding network growth with community structure through rigorous mathematical analysis and empirical validation.