Towards a Model-Based Approach: Applications to Historical Demography and Palaeodemography

There is a large variety of kinds of models. We think, however, that they all have in common that they represent something beyond themselves: they are representations of parts of the world. As scientists, we are driven to select only a few aspects of the phenomena studied, but these aspects will be characterised with great precision. This selection explains why models consider only parts of the world. We will first give a historical presentation of models to show their usefulness in the past. We will then discuss agent-based models, which are the most widely used in demography, including historical demography, and show their incompleteness, which some other kinds of simulation models are equally unable to overcome. Finally, we will show how a deeper philosophical approach to these problems may permit a true scientific treatment of explanation in demography.


Introduction
I will first try to show here that modelling does not really constitute a new approach in the history of demography, but has been practised throughout its past. However, some more recent methods, which we will present later, may lead to further developments in the understanding of the behaviour of past and contemporary populations. This conference is dedicated to past populations, for which data are sparser than for contemporary ones. For these past populations, then, modelling will often appear necessary: for example, to reconstruct missing data or to understand past behaviours. My presentation will nevertheless take a more general view of modelling, in order to make its usefulness clearer.

Let us now define what we consider as models.
There is a large variety of different kinds of models, and a number of ways in which they function in the service of science.
We think, however, that they all have in common that they represent something beyond themselves: as suggested by Ronald Giere (1988), they are representations of parts of the world through a relationship of similarity.
First, as scientists we are driven to select only a few aspects of the phenomena studied. These aspects, however, will be characterised with great precision, in order to give a good representation. This explains why models consider only parts of the world. Second, what do we consider as a relationship of similarity? In social science, and indeed in every scientific activity, we cannot deduce facts from empirical "laws", but only from a formal structure, which generates explanation within its own boundaries. Such a formal structure is only similar to the observations of the real world, but it represents their functional architecture, as Robert Franck says in his book The Explanatory Power of Models (2002). We will come back to this notion of function in the last part of this presentation.
We will first give, in this talk, a historical presentation of models to show their usefulness in the past.
We will then turn to agent-based models, which are the most widely used in demography. Eric Silverman's book Methodological Investigations in Agent-Based Modelling (2018) gives a good discussion of the problems they raise, and we will try here to show their incompleteness. Other kinds of simulation models will also be briefly presented. Again, some of them are not able to overcome this incompleteness. However, models using recursive Bayesian networks escape this criticism.
Finally, we will show how a deeper philosophical approach to these problems may permit a true scientific treatment of explanation in demography, and more generally in the social sciences. This approach is already followed in the natural sciences and in biology, and I hope that the social sciences will follow it.

History of models
As we previously said, modelling has a long history in demography.
The first researcher to use a cross-sectional approach was Euler, who in 1760 wrote a memoir on the multiplication of the human species. He made interesting hypotheses about the evolution of human populations, which led to a precise model of stable populations. His first hypothesis is based on the 'vitality or power of life that is specific to humans'. It leads him to equate this vitality with the probability of dying at each age, assumed identical for all persons of the same age.
His second hypothesis rests on 'the principle of propagation, which depends on marriage and fertility'. Again, he identifies this principle with the fact that 'the number of children born every year is always proportional to the number of all living persons'. Even if this definition of a fertility rate may now be considered very rough, it was quite interesting in his time.
His third hypothesis is that 'the two principles of mortality and propagation are independent of each other'. He therefore does not need to take into account possible interactions between these two probabilities. Again, this hypothesis would today call for revision.
From these three principles he was able to reconstruct everything that can be said about such populations, whatever they are. To do this, he used a large corpus of observations made by Süssmilch, 'that seem adequate to settle most of the questions arising in this research'. Such an approach, even if it was in many senses approximate, produced the first model of population evolution through time. In fact, we could speak here not of an axiomatization of demography in the full sense, but of a proto-axiomatization of population sciences.
We had to wait until Lotka, in 1939, for this work to be continued, with what he called Malthusian populations. These populations keep the same mortality and the same age distribution through time. Such a population becomes a stable population when its fertility function also remains constant through time. As he showed, if these formerly variable rates become constant at a given moment, the stable population is not reached immediately, but only as a limit.
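In modern notation, Lotka's result can be stated compactly (the following is the standard textbook formulation, not Lotka's original notation). With p(a) the probability of surviving to age a and m(a) the fertility rate at age a, the intrinsic growth rate r of the stable population is the unique real root of the characteristic equation, and the stable age distribution c(a) follows from it:

\[ \int_0^{\omega} e^{-ra}\, p(a)\, m(a)\, da = 1, \qquad c(a) = b\, e^{-ra}\, p(a), \]

where b is the crude birth rate and omega the highest age attained. Whatever its initial age distribution, a population with constant p(a) and m(a) converges towards c(a), but only in the limit.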
We can draw a parallel between these results and Newton's first law of motion, stated in 1687. This inertia principle says: 'Every body perseveres in its state of rest, or of uniform motion in a right line, unless it is compelled to change that state by forces impress'd thereon.' Similarly, any population whose fertility and mortality are assumed to become constant from a given instant will tend towards the stable population that meets these conditions. But whereas the physical body immediately acquires its uniform motion when no force acts upon it, the stable population is not attained at once, since it is a limit.
However, Bourgeois-Pichat showed in 1994 that, in reality, many populations observed at that time attained the stable state without any delay, provided their age distribution remained constant through time: he called them semi-stable populations. This is less true today, with the downtrend in fertility in these countries.
It is also easy to flesh out this model of population change with age-specific emigration and immigration rates, expressed as net migration rates. This yields a basic relationship between age structure, mortality, fertility, and migration at a given point in time, as Samuel Preston and Ansley Coale showed in 1982.
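In modern notation, this relationship can be sketched as follows (a standard variable-r formulation; the notation here is mine, not Preston and Coale's). Writing mu(x,t) for the death rate, i(x,t) for the net migration rate, and r(x,t) for the growth rate at age x and time t, the proportion c(a,t) of the population aged a satisfies

\[ c(a,t) = c(0,t)\, \exp\!\left( -\int_0^a \big[ r(x,t) + \mu(x,t) - i(x,t) \big]\, dx \right), \]

so that age structure, mortality, migration and fertility (through c(0,t), the crude birth rate) are linked at every point in time.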
However, such a model does not introduce the social, economic, religious, political, and other characteristics of the society or group in which these events occur, which we will now consider.
To do that, we will jump directly to more recent approaches, such as the models introduced by Francesco Billari and Alexia Prskawetz in 2003: agent-based modelling.

Agent-based models and others
Agent-based models are derived from simulation analyses used by mathematicians and physicists. The economist Schelling suggested in 1971 their use to study segregation processes, while at the same time the ecologists Botkin et al. proposed a computer model to predict changes in forest growth. In the nineties, these models spread to several social sciences.
Those who introduced them often took care not to consider each science separately, but to view them as a whole, incorporating the whole spectrum of social processes: demographic, economic, sociological, political, and so on. Rather than modelling specific data, this approach models theoretical ideas and is based on computer simulation. Its aim is to understand how the behaviour of biological, social, or more complex systems arises from the characteristics of the individuals, or more general agents, composing these systems.
In demography, Billari and Prskawetz said clearly: 'Different to the approach of experimental economics and other fields of behavioral science that aim to understand why specific rules are applied by humans, agent-based computational models pre-suppose rules of behavior and verify whether these micro based rules can explain macroscopic regularities.' This is therefore mainly a bottom-up approach, with population-level behaviour emerging from the rules of behaviour of autonomous individuals. There is no need to introduce other levels to understand demography: the individual rules of behaviour will explain macroscopic regularities.
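As a concrete illustration of this bottom-up logic, here is a minimal Python sketch of a Schelling-type segregation model of the kind cited above. It is an illustrative toy, with parameter values chosen by me, and not the model of Billari and Prskawetz: agents of two types follow a single micro rule, moving when the share of like neighbours falls below a tolerance threshold, and macroscopic clustering emerges from it.

import random

SIZE, EMPTY_SHARE, TOLERANCE, STEPS = 20, 0.1, 0.5, 60

# Grid cells hold 0 (empty) or an agent type (1 or 2).
grid = [[0 if random.random() < EMPTY_SHARE else random.choice((1, 2))
         for _ in range(SIZE)] for _ in range(SIZE)]

def like_share(r, c):
    """Share of occupied Moore neighbours (on a torus) with the same type as (r, c)."""
    me, like, occupied = grid[r][c], 0, 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == dc == 0:
                continue
            n = grid[(r + dr) % SIZE][(c + dc) % SIZE]
            if n:
                occupied += 1
                like += (n == me)
    return like / occupied if occupied else 1.0

for _ in range(STEPS):
    # Micro rule: an agent is unhappy when its like_share falls below TOLERANCE.
    movers = [(r, c) for r in range(SIZE) for c in range(SIZE)
              if grid[r][c] and like_share(r, c) < TOLERANCE]
    empties = [(r, c) for r in range(SIZE) for c in range(SIZE) if not grid[r][c]]
    random.shuffle(movers)
    for r, c in movers:  # each unhappy agent jumps to a random empty cell
        if not empties:
            break
        nr, nc = empties.pop(random.randrange(len(empties)))
        grid[nr][nc], grid[r][c] = grid[r][c], 0
        empties.append((r, c))

# Macro outcome: the mean share of like neighbours ends well above TOLERANCE,
# i.e. segregation emerges from a rule that no individual agent contains.
occupied = [(r, c) for r in range(SIZE) for c in range(SIZE) if grid[r][c]]
print(sum(like_share(r, c) for r, c in occupied) / len(occupied))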
As I have already said, such an approach is presented in many papers at this workshop as a way to introduce models into historical demography and palaeodemography.
This approach is very interesting, as it eliminates the need for empirical data on personal and individual characteristics, which are so difficult to obtain in historical demography. It is based on simple decision rules followed by individuals, which can explain some macro phenomena observed from the rare data available in these historical sciences. A theoretical model of this kind cannot be validated in the same way as an empirical model. Frequentist inference is not applicable to such models, as the probability that an unknown parameter lies in a given interval has no meaning in such cases. In Franck's words: 'one had ceased to credit deduction with the power of explaining phenomena. Explaining phenomena means discovering principles which are implied by the phenomena.' As the agent-based approach focuses on the mechanisms driving the actions of individuals as agents, it will simulate the evolution of such a population from simple rules of behaviour. It may thus use game theory, complex systems theory, evolutionary programming, and, to introduce randomness, Monte Carlo methods. This workshop will show us the preferred methods for historical demography.
It may also use survey data, not to explain the studied phenomenon, but only to verify whether the parameters used in the simulation lead to behaviour similar to that observed in the survey.
I will briefly present here an application of such a model in palaeodemography, from a paper by Robert Axtell et al. in 2002 on the collapse of the Anasazi population, observed from the ninth to the beginning of the fourteenth century. They showed that by using simple household rules for choosing farm locations, and by introducing eight adjustable parameters for agents and landscape heterogeneity, they were able to simulate the population's evolution during this period. In order to choose the best model, they used the cumulated absolute values of the differences between the observed and simulated populations. In this case, simulated population levels closely follow the historical trajectory. The model analysis also shows that the abandonment of the valley at the beginning of the fourteenth century cannot be explained solely by environmental variations. This model is one of the iconic models of the agent-based community.
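The fit criterion just mentioned is easy to state in code. The following Python lines are a sketch of such a comparison, with invented illustrative numbers rather than Axtell et al.'s data: for each candidate parameter set one computes the cumulated absolute deviation between the observed and simulated series, and retains the parameter set that minimises it.

def cumulated_abs_error(observed, simulated):
    """Sum of absolute yearly differences between two population series."""
    return sum(abs(o - s) for o, s in zip(observed, simulated))

# Illustrative use: pick the best of several simulated trajectories.
observed = [120, 135, 160, 150, 90, 40]                 # hypothetical yearly counts
runs = {"params_A": [118, 140, 155, 148, 95, 50],
        "params_B": [100, 100, 100, 100, 100, 100]}
best = min(runs, key=lambda k: cumulated_abs_error(observed, runs[k]))
print(best, cumulated_abs_error(observed, runs[best]))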
However, Marco Janssen showed in 2009 that this model does not provide much information beyond a comparatively simple model based on two parameters that adjust the carrying capacity of the valley where this population lived: 'The reason for this is that the model acts as a smoothing function of the input data and has limited endogenous dynamics that contribute to the aggregated population data.' This last point leads us to some problems raised by agent-based models.
The first problem is that these models are intended to represent the import and impact of individual actions on the macro-level patterns observed in a complex system. This implies that a phenomenon emerging at the aggregate level can be entirely explained by individual behaviour. John Holland in 2012, while recognizing that agent-based models have been a major tool for studying complex adaptive systems over the last twenty years, insisted on these limitations. He said that they 'include a little amount of relevant mathematics and so far little provision for agent conglomerates that provide building blocks and behaviour at higher level of organization.' I can give an example of this in demography from my books of 2002 and 2007 on multilevel analysis, in which I study the effect of working in the agricultural sector on the migration behaviour of a Norwegian cohort, using different approaches.
We have already spoken of the cross-sectional analysis, but not of the way it introduces aggregate-level characteristics in order to explain a phenomenon. This is done by introducing these characteristics into a regression model: the estimated migration rates then increase with the proportion of farmers present in each Norwegian region. An analysis at the individual level gives the opposite result: the probability of migrating for farmers is now more than a third lower than for the other occupations.
In order to explain these contradictory results it is necessary to use a multilevel model, which introduces the individual behaviour and the aggregate one simultaneously. Such a model makes it possible to understand the contradictory results obtained with the two previous models: being a farmer still strongly reduces the probability of migrating, while living in a region with a large percentage of farmers increases the probability of migration only for non-farmers.
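In formula terms, the multilevel model that reconciles the two results can be sketched as follows (a simplified version, in my own notation, of the models discussed in the books cited above). For individual i in region j, with x_ij equal to 1 for farmers and 0 otherwise, and xbar_j the proportion of farmers in region j, a multilevel logistic model reads

\[ \operatorname{logit} p_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 \bar{x}_j + \beta_3\, x_{ij}\, \bar{x}_j + u_j , \]

where u_j is a regional random effect. A negative beta_1 captures the lower migration of farmers, while a positive beta_2, offset for farmers by a negative interaction beta_3, captures the fact that living among many farmers raises migration for non-farmers only.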
In conclusion, if one considers only the individual level, as in an agent-based model, it seems hard to explain a contradictory result obtained at the macro level. I think that aggregate-level rules often cannot be modelled with purely micro-level rules, for they transcend the behaviours of the component agents.
The second problem lies in the search for a good explanation of the observed phenomenon. There are no clear guides for finding rules, and their choice often appears arbitrary. Conte et al., in their Manifesto of Computational Social Science in 2012, said clearly about agent-based models: 'First, how to find out the simple local rules? How to avoid ad hoc and arbitrary explanations? As already observed, one criterion has often been used, i.e., choose the conditions that are sufficient to generate a given effect. However, this leads to a great deal of alternative options, all of which are to some extent arbitrary. The construction of plausible generative models is a challenge for the new computational social science.' For example, without factoring in the influence of networks on individual behaviour, we can hardly explain a macro behaviour only by aggregating individual behaviours.
In order to obtain more satisfactory models, we must introduce, for example, explicit decision-making theories, but also representations, attitudes, strategies, motivations, and so on. Unfortunately, the choice of a good theory is influenced by the researcher's discipline and can produce highly different results for the same phenomenon under study. For example, how should one choose between the theory of utility maximization, mainly used by economists, and the theory of planned behaviour, mainly used by sociologists?
The third problem lies in the validation of an agent-based model. We have already said that the usual validation tests for statistical analysis are not applicable to these models. How, then, can we say that the model explains the observed phenomenon?
Günter Küppers and Johannes Lenhard said in 2005, in a paper on the validation of simulations: 'It is consensus in literature that validation constitutes one of the central epistemological problems of computer simulation methods. Especially in the case of simulations in the social sciences the answers given by many authors are not satisfactory.' For example, in the case of the Anasazi population, the authors give different ways to compare the simulated and measured population numbers, but they were unable to say which was the best one. They were not even able to see that another model, using only two parameters, replicates the data equally well.
Generally, then, one starts with some data to be explained, and the simulation is said to be successful if some interaction rules lead to an approximate reproduction of some structural characteristics of the data.
There is no clear way to say that an agent-based model is structurally accurate. We can conclude that there are no clear verification and validation procedures for agent-based models in the social sciences.
We need a more clearly designed modelling practice in order to promote a true scientific approach.
As we have previously said, even if agent-based models are prominent in the social sciences, other kinds of simulation models exist. We restrict our investigation, however, to simulation models used in demography.
Macro-simulation and micro-simulation are the other main alternative methods used in demography for making statements about the future. A very complete presentation of them can be found in a 1998 paper by Evert van Imhoff and Wendy Post, Microsimulation methods for population projections. They are subject to the first problem above, as macro-simulation works on the aggregate level only, and micro-simulation on the individual level only. The validation problem also affects these simulations, as population projections generally lead to incorrect results: projection errors increase systematically the further ahead they look.
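To make the contrast concrete, here is a minimal cohort-component projection step in Python, working at the aggregate level only (a generic textbook sketch with invented rates, not Van Imhoff and Post's models): the female population by age group is advanced with survival proportions and age-specific fertility, and no individual-level entity appears anywhere in the computation.

# One projection step for a three-age-group female population (in thousands).
# survival[a]: proportion surviving from group a to a+1 over one period;
# fertility[a]: daughters per woman in group a per period (illustrative values).
pop = [100.0, 80.0, 60.0]
survival = [0.95, 0.90]
fertility = [0.00, 0.40, 0.10]

def project(pop):
    births = sum(f * p for f, p in zip(fertility, pop))
    return [births] + [s * p for s, p in zip(survival, pop[:-1])]

for _ in range(3):  # three projection steps
    pop = project(pop)
    print([round(p, 1) for p in pop])

A micro-simulation would instead draw births, deaths, and moves at random for each simulated individual, which is why its results carry Monte Carlo variability on top of the projection errors mentioned above.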
More recently, models based on recursive Bayesian networks have appeared. These models can be applied to modelling the hierarchical structure of observed phenomena, and they thus avoid the first, one-level problem above. Lorenzo Casini et al. in 2013 applied such a model to cancer: the higher level of their model contains variables at the clinical level, while the lower level maps the structure of the cell's mechanism for apoptosis. By introducing mechanisms, as we will see further on, this approach also avoids the validation problem. For it to be applicable to demography, however, a mechanistic approach to demography still needs to be established.
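To convey the two-level idea in miniature, here is a deliberately toy Python sketch (my own illustration of the hierarchical principle, not Casini et al.'s formalism or their cancer application): the distribution of a higher-level variable is obtained by marginalising over a lower-level network that models the mechanism producing it.

# Lower level: a two-node network, P(signal) and P(response | signal).
p_signal = 0.3
p_response_given = {True: 0.9, False: 0.2}

def p_response():
    """Marginalise the lower-level network to get P(response)."""
    return p_signal * p_response_given[True] + (1 - p_signal) * p_response_given[False]

# Higher level: an aggregate outcome conditioned on the lower-level result.
p_outcome_given_response = {True: 0.7, False: 0.1}

def p_outcome():
    pr = p_response()
    return pr * p_outcome_given_response[True] + (1 - pr) * p_outcome_given_response[False]

print(round(p_response(), 3), round(p_outcome(), 3))  # 0.41 0.346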
However, in view of these problems with most of the models used in demography, we must not throw the baby out with the bathwater! We now have to show why this baby, "simulation", is an important one, and how to save it.

Towards a model-based synthesis
I have already cited the 2002 volume by Franck, in which you will find a clear explanation of the usefulness of simulation models. We will summarize this explanation here and give its application to demography, which you will find in a paper published later, in 2017, Model-based demography: towards a research agenda, which I wrote with him and two demographers, Jakub Bijak and Eric Silverman.
First, a semantic approach to theories was developed, for example by Frederick Suppe in his 1989 book The Semantic Conception of Theories and Scientific Realism. This book goes some way towards offering a satisfactory epistemological basis for differentiating formal explanatory models from empirical explanatory models.
He begins by saying that 'a science does not deal with phenomena in all their complexity; rather it is concerned with certain kinds of phenomena only insofar as their behaviour is determined by, or characteristic of, a small number of parameters abstracted from those phenomena'.
Thus, a formal explanatory model is like a filter that retains only a small number of parameters, chosen as the object of research. For demography, the concept of the statistical individual follows this approach perfectly, and the parameters will be the functions of fertility, mortality, and migration, which channel the investigation.
As I said in Probability and Social Science, in 2012: 'Under this scenario, two observed individuals, with some identical characteristics, will certainly have different chances of experiencing a given event, for they will have an infinity of other characteristics that can influence the outcome. By contrast, two statistical individuals, seen as units of a repeated random draw, subjected to the same sampling conditions and possessing the same characteristics, will have the same probability of experiencing the event.' Empirical explanatory models, on the contrary, try to seek explanations of social facts from empirical regularities. Suppe showed that this view is wrong, as it obscures much of the epistemic importance that other analyses can reveal.
Following the semantic approach, a theory is a formal system, empty of any empirical content. The explanation of empirical facts thus consists of deducing them from a formal system, and not from empirical laws. Such a formal structure is elaborated by the researcher without any consideration of the empirical data, and its components are conceptual or mathematical.
As Burch said in 2002, such an explanation involves: '(a) creation of a logical structure of variables and their relationships, a structure which logically implies or entails the event; (b) demonstration that there is correspondence or "isomorphism" between the logical structure and the real-world context in which the event is embedded.' The notion of "isomorphism", however, raises some problems for social scientists, whereas it is possible to apply it to a theoretical physical model. In the social sciences we can only observe real populations, which are not under idealized circumstances as in physical science. We may therefore have some doubts about this notion and say that there is no easy way to demonstrate isomorphism. As Burch says: 'This problem of how to assess the relationship between complex simulation models and empirical data has plagued the practice of computer modelling from the beginning and has yet to be adequately resolved.' The main question is therefore the following: how would one identify the relationship between the theoretical model and the empirical observations, and test the fit of a simulation model? As Burch says: 'correct prediction can result from a model with incorrect assumptions and inputs'.
In order to go further and enrich this approach, we will rely on model-based science, also known as the mechanistic view. We will follow here the presentation given by Franck for the social sciences in 2002, and its application to demography that we gave with him, Bijak, and Silverman in 2017.
First, such a model makes it possible to generalize the results obtained by a semantic model to real observations, which are not made under idealized circumstances. To do that, it introduces some new concepts, as well as some older concepts that had lost their original sense. Let us see them in more detail.
For the pioneers of modern science in the seventeenth century (Bacon, for example, in 1620 in the Novum Organum), induction consists of discovering the principle of a phenomenon from the study of its properties. Later, however, from the eighteenth century onwards, philosophers gave it its usual sense of generalization (Hume, Mill, or Popper, for example). Deduction, which had often been considered the main scientific instrument, has ceased to be credited with the power of explaining phenomena. Harold Jeffreys wrote in his Theory of Probability, in 1939, of 'the tendency to claim that scientific method can be reduced in some way to deductive logic, which is the most fundamental fallacy of all…' This research by induction will permit us to find the theoretical explanation, which consists of discovering the combination of concepts without which the observed properties of a phenomenon would be inconceivable or impossible.
Simultaneously, it will give us an empirical demonstration using the factors operative in a given society.
The theoretical explanation represents the conceptual structure of the phenomenon studied. The empirical explanation represents the social factors which give rise to it.
The notion of mechanism has been defined in different ways according to the science considered, but we prefer the more general definition given by Franck, which fits the previous definitions well: 'The formal (conceptual) model is the form of the social mechanism, and the social mechanism is the matter of the formal model.' Such a form will be constructed by an axiomatization of the discipline.
We are now able to give the general method proposed by Franck for arriving at a mechanistic explanation: '(1) Beginning with the systematic observation of certain properties of a given social system, (2) we infer the formal (conceptual) structure which is implied by these properties.
(3) This formal structure, in turn, guides our study of the social mechanism which generates the observed properties. (4) The mechanism, once identified, either confirms the advanced formal structure, or indicates that we need to revise it.' Such a process has already been followed for the study of probability, which is an important tool for demography: from its conception by Pascal in 1654 to its formal axiomatization by Kolmogorov in 1933, a great number of attempts failed.
Similarly, demography was conceived with Graunt in 1662, and we have already seen a proto-axiomatization by Euler in 1760. But a full axiomatization, as proposed by Franck, has not yet been accomplished.

Conclusion
For the moment, four paradigms have been followed during the history of demography: first, a cross-sectional paradigm, from 1662 until the end of the Second World War; second, a longitudinal one; third, the full-fledged event-history approach into which the longitudinal one developed at the beginning of the eighties; and fourth, a multilevel one beginning in the middle of the eighties. Each paradigm did not cancel the previous one, but gave a new point of view for demography.
The model-based approach provides us with the means to expand the range of benefits already provided by the four previous paradigms. We gain deeper insight into the interactions between various population systems, and we also gain the capacity to explore the parameter space of the simulations by generating "what-if" scenarios. Simulation parameters, once they result from the functional-mechanistic approach, govern the way in which the complex, interacting social processes in the model work.