Abstract for: Benchmarking Estimation of Dynamic Models in Social and Behavioral Sciences

Reliable parameter estimation is crucial for impactful simulation models in management, sociology, and behavioral sciences. However, universally applicable methods for estimating complex nonlinear models with process and measurement noise are lacking. In their absence, the ability to objectively assess estimation quality is central to advancing model estimation practices and to strengthening confidence in dynamic simulation models across research communities and practical applications. We introduce a novel benchmarking framework comprising canonical dynamic simulation models from diverse management research communities, methods for generating synthetic datasets, and comprehensive performance metrics for evaluating inference quality. Our approach includes standardized, aggregated measures that enable comparisons across multiple inference tasks at different scales. Using Neural Posterior Estimation (NPE), we identified common learning trajectories as well as distinct levels of inference-task complexity across models. Our benchmarking experiments show that NPE is robust, performing effectively across diverse models when sufficient training data is available. Inference quality was generally stable despite increasing parameter dimensionality; however, accurately estimating process noise parameters remained challenging. The experiments also demonstrated the feasibility of amortized inference within reasonable computational timeframes, which highlights the potential for further efficiency gains through sequential neural posterior estimation (SNPE) methods. Our research identifies clear pathways for future improvements. Benchmarking could be expanded to include additional estimation techniques, such as Neural Likelihood Estimation (NLE) and traditional methods, and to optimize estimation hyperparameters. Introducing models with tractable likelihood functions or hierarchical Bayesian frameworks could provide theoretical performance insights. We hope our benchmarking framework serves as a foundational step toward developing rigorous and efficient parameter estimation methods for dynamic simulation models.
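
To make the amortized NPE workflow described above concrete, the following is a minimal sketch using the open-source sbi Python package. The toy simulator (noisy exponential decay), the prior bounds, and the simulation budget are illustrative assumptions for exposition only, not the benchmark models or settings from the paper; one round of SNPE training yields an amortized NPE posterior.

    # Minimal sketch of single-round (amortized) NPE with the `sbi` package.
    # Simulator, prior, and budget below are illustrative assumptions.
    import torch
    from sbi.inference import SNPE  # one training round == amortized NPE
    from sbi.utils import BoxUniform

    def simulator(theta: torch.Tensor) -> torch.Tensor:
        """Toy dynamic model: exponential decay observed at 10 time steps,
        with additive noise standing in for process/measurement noise."""
        t = torch.linspace(0.0, 1.0, 10)
        decay = theta[:, :1] * torch.exp(-theta[:, 1:2] * t)  # (batch, 10)
        return decay + 0.05 * torch.randn_like(decay)

    # Prior over the two parameters (initial level, decay rate).
    prior = BoxUniform(low=torch.tensor([0.5, 0.1]),
                       high=torch.tensor([2.0, 3.0]))

    theta = prior.sample((5_000,))  # parameters drawn from the prior
    x = simulator(theta)            # corresponding synthetic datasets

    inference = SNPE(prior=prior)
    density_estimator = inference.append_simulations(theta, x).train()
    posterior = inference.build_posterior(density_estimator)

    # Amortization: the trained network is reused for any new observation
    # x_o without further simulation or retraining.
    x_o = simulator(torch.tensor([[1.2, 1.5]]))
    samples = posterior.sample((1_000,), x=x_o)
    print(samples.mean(dim=0))  # posterior mean estimate of theta

In a benchmarking setting such as the one the abstract describes, the posterior samples for many such synthetic observations would then be scored against the known ground-truth parameters using the framework's performance metrics.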