RECORDED ON NOVEMBER 12th 2025.
Dr. Marina Dubova is an Omidyar postdoctoral fellow at the Santa Fe Institute. Her research aims to reveal and inform the cognitive mechanisms of discovery. She develops formal (e.g., computational modeling) and empirical (e.g., cognitive experiments with scientists) methods to put the foundations of the scientific method to rigorous tests. She uses insights from cognitive science to learn how theories and data can be integrated and lead to better understandings of the world.
In this episode, we first talk about the cognitive mechanisms of discovery. We discuss the cognitive foundations of the scientific method. We talk about experimentation, and how it can be randomized. We discuss concept-laden evidence, and the importance of cultural and cognitive diversity in science. Finally, we talk about parsimony and complexity.
Time Links:
Intro
The cognitive mechanisms of discovery
The cognitive foundations of the scientific method
Experimentation
Randomizing experimentation
Concept-laden evidence
Parsimony and complexity
Follow Dr. Dubova’s work!
Transcripts are automatically generated and may contain errors
Ricardo Lopes: Hello everyone. Welcome to a new episode of The Dissenter. I'm your host, as always, Ricardo Lopes, and today I'm joined by Dr. Marina Dubova. She's an Omidyar postdoctoral fellow at the Santa Fe Institute, and today we're talking about her work related to the cognitive mechanisms of discovery and also the cognitive foundations of the scientific method. So, Dr. Dubova, welcome to the show. It's a pleasure to have you on.
Marina Dubova: Thanks so much for inviting me.
Ricardo Lopes: OK, so, uh, first of all, what is discovery and how do you approach such a topic?
Marina Dubova: Yeah, that's a great question. I typically think of discovery in a cognitive way. So, I think of science in general as a cognitive activity, of how we as humans try to make sense of the world. In that way, science is a cognitive process, and it's characterized by many subprocesses that we often distinguish in any cognitive system. For example, that would include how we observe the world, how we experiment or actively investigate the world around us, how we represent the world based on our observations, how we categorize the world or parse the world into multiple parts, how we communicate with others, how we know who to learn from in the community, and how we use flexible analogies to guide our thinking and to notice similarities between different observations. So all of these subprocesses, I believe, guide the scientific process just as much as they guide human learning or animal learning or other systems learning in the world.
Ricardo Lopes: Yes, I mean, I invited you because, although I've had other discussions about the cognitive foundations of science on the show, it's always interesting to discuss this a little bit further. I've also had people on the show to talk about the sociology of science, the philosophy of science, and all of that because, I mean, if we want to be rigorous about science as an institution, we have to approach it as a human activity, with all of our biases, our cognitive limitations, the fact that it operates within a particular cultural context, and all of that. I mean, it's not just some pure, unbiased activity whose role is discovering the truth with a capital T, or something like that, right?
Marina Dubova: Absolutely, and, as you mentioned, there are a lot of people who have recognized this aspect of science, the fact that all our knowledge is human knowledge. It's limited, it's subject to very similar biases to the ones we have in general as humans. It's not a magical activity that can lead us to objective knowledge about the world no matter what we do. Even if we follow exactly the methods that science prescribes to us, we still cannot transcend the inherent limitations of science as a human activity as we know it.
Ricardo Lopes: Yeah, so what are then the cognitive mechanisms of discovery?
Marina Dubova: Yeah, as I mentioned, these are all the different mechanisms that we use to try to understand the world around us. This includes trying to observe the world, categorizing the world into parts or trying to parse the world in some other ways, representing the world with different types of models or theories, and things like analogies, making flexible connections between observations or models or different ways of understanding the world, and interacting with other people. All of these are different cognitive mechanisms of discovery, and they have been appreciated by philosophers and sociologists of science to some extent, but now we can also start to make the connection to cognitive science, which has studied these exact same processes in more detail, exactly how humans do these things in their everyday life.
Ricardo Lopes: So then, I mean, adding a cognitive science perspective to all of this, we can include it in a sort of multidisciplinary approach to understand how science as an institution really works and how we produce scientific knowledge, correct?
Marina Dubova: Yeah, and ideally, at least that's my hope, we can also think about how we could improve science by thinking about it more concretely in terms of some of these processes and how they work in certain contexts. And we can start applying cognitive methods, the methods we have traditionally used to study, for example, human learning, to better understand learning in the context of science. For example, we can use more cognitive experiments with scientists, where we ask scientists to learn about a certain toy system, where we can set up exactly what is there for them to learn, and see what kind of strategies they actually use. We can scale this approach to study many different scientists at different career stages, what kind of approaches they take, and whether those approaches help them learn about the toy system. We can also start modeling the process in the same way we model learning in general, maybe with some modifications, but the general approach is very similar: we can apply computational models to basically probe which kinds of strategies really help agents, or scientists, learn about the world in different conditions. That way we can hopefully open up the possibility of finding strategies different from the ones that have already been considered in science and learning, strategies that could help us learn about the world. That's the hope, at least.
Ricardo Lopes: Yes, and I mean, I don't know if you will agree exactly with what I'm going to say, but from my conversations, as I said, with sociologists of science, philosophers of science, and cognitive scientists, what I took from all of that is that we should think about scientific knowledge as a human construct. I mean, through science we're not really arriving at objective truth or objective reality. It's always ultimately a construct, correct?
Marina Dubova: Absolutely, I completely agree with that, and again, the cognitive perspective just adds to that a little bit. But yes, science is an inherently human activity. Of course, we are making active progress in learning useful knowledge about the world, but it doesn't mean we are arriving at one true model or one true theory about how the world actually works. I completely agree with that, and I think some of my favorite philosophy literature is the literature that makes this point very clear.
Ricardo Lopes: So what would you say are then the cognitive foundations of the scientific method?
Marina Dubova: I mean, these, I would say, would be the strategies we typically tend to prescribe or use when we go about the scientific process. This could include some strategies or hunches or methodological foundations that we prescribe to students when they try to design their experiments, or some specific preferences that we suggest students or scientists should have when they're designing new models or new representations of the world. So these would be all the foundations or the strategies that we tend to follow when we do science. And of course, these are very versatile, very diverse. Nobody is really following any one of these prescriptions perfectly well. But we can start analyzing at least what are the typical strategies that we as human scientists at least try to adhere to when we learn about the world.
Ricardo Lopes: Yeah, that's very interesting, because this is also something that I've talked about with philosophers of science. When you approach the cognitive foundations of the scientific method, are you approaching the scientific method itself as one single thing, or do you think there are multiple scientific methods out there, varying depending on the scientific discipline we are approaching?
Marina Dubova: Absolutely. I think there's a lot of diversity in scientific methods, and different disciplines could definitely have different perspectives on how they are most effectively learning about the world, what would be the best approaches. Actually, being here at the Santa Fe Institute really gives me an opportunity to observe this firsthand, because I can interact with physicists and philosophers and sociologists and anthropologists, and this really highlights the diversity of approaches that one may have to learning about the world. And I would say, again, there's so much diversity, but there are also some things that tend to recur in at least some sciences, where scientists do tend to have an intuition about what kind of model is a good, illuminating model, for example, or what might ideally be a good experiment. And for some of these, we can trace, or potentially speculate about, the history of these preferences. For example, in the social sciences or in psychology, we often look up to sciences like physics, where we may inadvertently try to adopt methods that may not be perfectly suited for the subject matter. So I think there's both a lot of diversity and also some convergence among scientists on what things we should do.
Ricardo Lopes: So what do we know, because this is something you've studied, uh, what do we know about how scientists go about choosing new experiments to perform?
Marina Dubova: Yeah, this is a fascinating question to me. I really love thinking about it. I think we know a lot about what really happened in specific historical cases, why specific scientists may have chosen a specific experiment to perform or to pay attention to. We also know, on a large scale, what kind of large-scale patterns we can detect in, for example, the publications of scientists, how they go about exploring this knowledge landscape and what things they could add on to. But something we were interested in is not only reiterating the existence of some of these strategies, but also testing when, and in what situations, they actually help the learning of agents about the world, just as in science. So basically, what we did in this study specifically is we took some of the canonical examples of scientific experimentation strategies that have been discussed either in the methodology of science, the philosophy and history of science, or scientometrics. These are not always the exact strategies that scientists follow, but at least some of the strategies that we know as gold standards and that we sometimes try to perform. They include the idea of, for example, falsification-driven experimentation: the idea that a good experiment is really a rigorous test of the dominant theory in the field, where we design the experiment specifically to aim to falsify the theory that we currently have. That's a very prominent idea in many sciences. The other experimentation strategy is driven by the conception of crucial experimentation, as discussed by Lakatos. In this case, the idea is that a good experiment is one that tries to resolve a disagreement between the theories in a given field. So, for example, if there are multiple dominant theories in the field, we need to design an experiment.
The ideal experiment is one that sits in the space where two different theories give divergent predictions. That's a really informative experiment, compared to running experiments where maybe the two theories are giving very similar predictions. These are just examples of canonical strategies that we sometimes think about as scientists when we are designing our experiments, because they seem like a really good approach. It has also been recorded, or hypothesized, that many scientists sometimes do exactly the opposite, where we actually design an experiment in such a way that makes it more likely that our favorite theory ends up being proven correct again and again. So that's more of a confirmation-driven strategy, and we know that also happens in the practice of science.
Ricardo Lopes: But I mean, should some of those ways of approaching experimentation be ditched? For example, do you think that when people, with their own biases, design an experiment in a way that tries to basically validate their ideas or their hypotheses, that that's bad methodology, or can that also be valuable in some way?
Marina Dubova: I think both are true. I think it depends on the context. Interestingly, in the methodology of science literature, at least in my field of cognitive psychology, it's often considered that the confirmation-driven approach intuitively feels bad, whereas falsification-driven, crucial, or disagreement-driven experimentation seems like the most rigorous way we could do science, and that's what many cognitive psychologists are ideally striving for. Something we have been interested in is actually testing, in a model, which one of these strategies would lead to more successful learning. Of course, we can define more successful learning in different ways, and of course we had to make certain assumptions when designing the model, but that helped us learn which strategy actually leads to more successful learning about the world, at least in the context we studied.
Ricardo Lopes: Right. What does it mean for experimentation to be theory motivated, and should it be theory motivated?
Marina Dubova: Yeah, I mean, all of our experimentation is theory driven in some way. We construct even the space of possible experiments, we conceive of the space of possible experiments we could do, based on our theories or conceptualizations of the world. However, even among the strategies I just mentioned that we are currently discussing in science, there are ones that are more guided by theory, where we are deliberately trying to use our theories to guide the specific selection of experimental conditions, and other strategies that may be less theory motivated, or more exploratory. For example, the ideas of constructing an experiment to confirm one's theory or falsify one's theory, or to resolve a theoretical disagreement, could be considered deliberately theory-motivated strategies, where on top of everything else that theory provides for us to even conceive of an experiment, we're also trying to pick the particular experimental conditions that are guided, driven, and motivated by the specific predictions that a given theory is producing. But a different, more exploratory strategy could be something like simply choosing an experimental manipulation at random. So once we have conceived of the space of possible experiments, it doesn't matter what the theory says, it doesn't matter where two theories disagree; it's more important to just choose some experiment, which we can do at random, or to try some kind of novelty-driven sampling, where we try to design experiments that are as different as possible from other experiments we've done. In that case, theory itself plays a slightly lesser role in our choice of the design of an experiment, and that's what we call less theory-driven experimentation in the simulation that we have mentioned.
What we found in that study is the following. We basically ran this model of agents who are trying to learn about the world, who are designing experiments based on one of these strategies, and who are trying to create representations of the world based on their experimental observations. So the goal for these agents is to create a good theory, and good here means either a theory that can predict the world, predict the ground truth, very well, or a theory that can account for as much information in the world as possible. These would be our two metrics of success in this case. We basically found that the agents who were driven by more exploratory strategies, such as simply randomly choosing what kind of experiment to do, or using some kind of novelty considerations, were the agents who ended up learning the most predictive and informative theories about the world. Whereas the agents who were following the more theory-guided strategies, such as falsification or crucial experimentation, were the ones that inadvertently narrowed themselves down to a certain aspect of the world, and they thought that they were developing very good theories precisely because they had limited the range of observations they were learning from. So, in that case, it turned out that there's this very interesting illusion of theoretical success that emerges. If the experimentation strategy is guiding you to narrow down the set of observations that you're learning from, you may have less information to even explain, leading you to potentially conceive of your theory as being more successful, just because when we evaluate scientific success, we can only look at the things that we currently know.
For example, if I try to evaluate the success of psychology right now, I would look mostly at the current observations that we know as scientists and how well our theories are accounting for these observations. So, from that perspective, the agents who were very theory-driven, even the ones who were trying to falsify theories with their experiments, were the ones collecting more limited data for themselves to learn from, leading to this illusion of having successfully explained the world when, in fact, you cannot actually generalize to the ground truth that you're trying to learn about.
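The dynamic described here, where a theory-guided agent narrows its own evidence base and then looks deceptively successful, can be illustrated with a deliberately simple toy model. This is my own minimal sketch, not the actual simulation from the study: the "world" is an arbitrary sine function, an agent's "theory" is a cubic polynomial fit to its observations, and restricting one agent's experiments to a corner of the design space stands in for theory-guided experiment selection.

```python
import numpy as np

rng = np.random.default_rng(0)

def world(x):
    # Hidden ground truth the agents are trying to learn.
    return np.sin(2 * np.pi * x)

def run_agent(sample_experiments, n_experiments=40, degree=3):
    """Fit a polynomial 'theory' to the experiments an agent chose,
    then score it both on its own data and against the whole world."""
    x = sample_experiments(n_experiments)
    y = world(x) + rng.normal(0.0, 0.05, size=x.shape)  # noisy observations
    theory = np.polynomial.Polynomial.fit(x, y, degree)
    grid = np.linspace(0.0, 1.0, 500)
    own_data_mse = float(np.mean((theory(x) - y) ** 2))
    ground_truth_mse = float(np.mean((theory(grid) - world(grid)) ** 2))
    return own_data_mse, ground_truth_mse

# Exploratory agent: experiments spread randomly over the design space.
explore_in, explore_gt = run_agent(lambda n: rng.uniform(0.0, 1.0, n))
# Narrow agent: experiments confined to the region its theory favors.
narrow_in, narrow_gt = run_agent(lambda n: rng.uniform(0.0, 0.25, n))

print(f"exploratory: own-data MSE {explore_in:.4f}, ground-truth MSE {explore_gt:.4f}")
print(f"narrow:      own-data MSE {narrow_in:.4f}, ground-truth MSE {narrow_gt:.4f}")
```

The narrow agent's error on its own observations is tiny, so by its own lights its theory looks excellent, but its error against the ground truth is much larger because the polynomial extrapolates badly outside the region it sampled. That is the "illusion of theoretical success" in miniature.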
Ricardo Lopes: OK, but how would randomization work in experimentation? I mean, how would experimenters or researchers randomize experimentation?
Marina Dubova: I mean, that's a great question, and I think that goes into this other aspect I mentioned: we tried all of these idealized strategies, but obviously nobody can ever execute them precisely. Even perfect falsification is impossible. We would always be swayed by different other considerations, or by our inability to perfectly assess what would be the ideal falsification case for a theory, and things like that. In the same way, a perfectly random strategy is never possible. We are always guided by all kinds of theoretical considerations that already shape our experimentation, but we are also guided by costs and other practical things that sometimes make it not even beneficial to conduct these more exploratory experiments, since we as human beings may have more trouble, for example, connecting such findings to other things that we know, which makes more exploratory experiments cognitively costly. What I would suggest is a more nuanced position, where there are different trade-offs associated with different experimentation strategies, and we should be aware of the possible vicious cycle which may be introduced if we are following theory-driven experimentation. And I believe any increase in how much exploration we are allowing in our experiments would make us more robust to the kinds of vicious traps that we have uncovered in our simulation.
So, for example, in a given situation, if I have a choice to either deliberately pre-plan all of the variables and how they're going to behave in my experiment, or to let it be a more exploratory choice, where I don't perfectly pre-plan everything, I believe the more exploratory choice would be more on the side of those agents who were randomly selecting their experiments. So, I believe that's one minor way in which we can alter our experimentation day to day, by making some of the small choices that we make potentially more exploratory. It doesn't mean we should be dissatisfied that none of our experiments are ever perfectly random, and I don't know if we even want that, but there are always choices that we make that could be made more or less exploratory, and this study speaks for the value of more exploratory studies.
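One concrete way to make an experimental choice "more exploratory", in the spirit of the novelty-driven sampling mentioned above, is to pick the next experiment to be maximally distant from everything already tried. The following is a hypothetical illustration of one such heuristic, with the design space reduced to a single continuous parameter; it is not a method from the study itself.

```python
import numpy as np

def novelty_driven_choice(done, candidates):
    """Pick the candidate experiment farthest (in design space) from
    everything already tried, one simple notion of 'novelty'."""
    done = np.asarray(done, dtype=float)
    scores = [np.min(np.abs(done - c)) for c in candidates]
    return float(candidates[int(np.argmax(scores))])

# Design space: one experimental parameter, discretized over [0, 1].
candidates = np.linspace(0.0, 1.0, 101)
chosen = [0.5]  # start from whichever experiment happened to be run first
for _ in range(4):
    chosen.append(novelty_driven_choice(chosen, candidates))
print(chosen)  # each new experiment lands far from all previous ones
```

Each pick maximizes the minimum distance to all earlier experiments, so successive experiments spread out across the design space instead of clustering where a favored theory points.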
Ricardo Lopes: So another topic I would like to ask you about, what is concept-laden evidence?
Marina Dubova: Yeah, that's a very related topic, of course. This is something we thought about, again inspired by the philosophy literature that points out the theory-ladenness of scientific observation, and how all of the data that we are collecting is actually not neutral or pure or theory-free. It's always ingrained with the ways in which we are conceptualizing the world. So, with my advisor, Rob Goldstone, we tried to think about the ways in which concepts, specifically the ways we carve the world into multiple parts, shape evidence. For example, in human life, it could be carving the continuous color spectrum into discrete categories such as red and orange and blue in one culture, or some other categorization in another culture. In science, it would be creating scientific taxonomies, where we carve our phenomena into multiple parts. An example would be how, in human cognitive psychology, we conceive of the human mind as consisting of multiple processes, such as attention, perception, and memory. Memory is divided into long-term memory, short-term memory, working memory, and things like that. All of these are conceptualizations that we impose onto our phenomena of interest. So, we suggest in this paper that the evidence we're collecting is influenced by the concepts or taxonomies that we have when we approach the subject matter. Specifically, we build on something already noticed in human cognitive psychology: our concepts of the world often influence our perception of the world. For example, having different color categorizations may influence how we perceptually discriminate different colors if we come from different cultures.
In the same way, we're saying that some of the same mechanisms, the concepts into which we divide the world, may affect scientific experimentation, scientific theory building, and scientific reassessment of the evidence, in such a way that may sometimes re-instantiate these categories even when we're trying to reassess them. I can talk more about this, but that's the big picture.
Ricardo Lopes: I mean, would that be one of the reasons why cognitive and cultural diversity in science is important? Because, I mean, people from different cultural and cognitive backgrounds perhaps have something to add, and if we only had, for example, people from countries where the dominant psychology is, to use Joseph Henrich's term, WEIRD psychology, that would be limiting in a way.
Marina Dubova: Absolutely, I think there are so many ways in which cultural diversity, and in general any kind of epistemic or theoretical diversity in science, should be encouraged. The main reason for this is that we bring all of this historical and cultural baggage with us when we construct our conceptualizations of the world, even as scientists, and science is not a magical way to transcend all of our human cognitive limitations and the human conceptualizations that have been ingrained in us over the course of our lives. And actually, what many people have noticed is that a lot of scientific taxonomies have their traces in the folk taxonomies of a given culture, often Western cultures. So, for example, the ways in which we think about human cognitive processes, a lot of these concepts trace their roots to Western conceptualizations of how humans work, from even before psychology was created. In fact, I think it's not just about the concepts, but also about the ways in which we approach the world and represent the world, and it has been documented in cross-cultural psychology that there's a lot of variation in how people from different cultures might even approach this activity of learning about the world.
For example, Douglas Medin and Megan Bang and others have done a lot of beautiful work on how Native peoples' conceptualizations of nature are sometimes more consistent with the complexity view, where you consider all things to be interrelated with each other, as compared to the Western way of thinking about the world, where it sometimes seems like we're trying to consider every variable in isolation, asking which one is really affecting the target phenomenon. These are different high-level strategies with which we can even approach learning about the world, and some of them may or may not be more useful, or all of them are useful, and we need to really see where each approach would lead us.
Ricardo Lopes: Yeah, I definitely have had several conversations on the show about that. For example, I've interviewed Dr. Richard Nisbett about the cognitive differences between Westerners and East Asians, and East Asians, for example, tend to be more holistic in their thinking, more relational, and approaches to science in the East more easily favor, for example, a process ontology instead of a substance ontology. And then there's the work of people that unfortunately is not very mainstream in the cognitive sciences and psychology yet, but there are people applying complexity science and dynamical systems theory specifically to psychology and cognitive science. I mean, I've interviewed people like Naomi de Ruiter and Paul van Geert from the Netherlands, and they are doing fantastic work in that domain. And, yeah, I mean, it's just that there are ways of thinking that perhaps are harder for people who were brought up in Western culture.
Marina Dubova: Absolutely, and I think what this cognitive approach could add is that we can say, oh, these are the specific differences we have noticed in how people approach learning about the world in these different countries. For example, that could include more holistic and relational thinking versus more reductionist thinking. And of course, we are not making the point that one of them is better than another, but it's very important to understand where some aspects of our existing science are coming from: whether it's really the world speaking to us, being isomorphic to the structures that we typically use to think about it, or whether it's really our cognitive limitations and cognitive predispositions that we are imposing onto the world. I think both could be possible, and it's very important to document some of these possible differences, and to think about how we could assess which aspect was in play when a certain piece of evidence was gathered. Of course, that can never be achieved in perfect form, but I think just recognizing the differences is the first step.
Ricardo Lopes: I mean, at least in the Western tradition, I think we've known at the very least since Kant that the pure empiricist approach, the assumption that we just receive pure information from the world without any cognitive filters and process it, just doesn't work and isn't true, right?
Marina Dubova: Absolutely. I think in philosophy it has been recognized for a very long time, and I think many scientists are aware of this. But when it comes to the very specific day-to-day parts of doing science, at least in cognitive psychology, I feel like I personally have been trained with this kind of mindset, and others have been too. Even though we know there are these biases that affect our cognition, we still think that the ideal is to find the one theory that really explains the phenomenon, and we often have these fights over whether attention and memory, or attention and perception, are really the modules of cognition, based on looking at some of the evidence, without really going into depth about how this evidence was collected in the first place. The evidence that we're using to justify, again and again, that perception and attention are different processes might have been collected with these conceptualizations in mind, since there are different cognitive psychology communities studying these different processes. And it's really, really hard to backtrack and say, oh, let's just look at the data and see if perception and attention re-emerge as joints of the mind, when we have used those categories for our entire activity. It's kind of hard to appreciate that in the moment sometimes.
Ricardo Lopes: OK, so let me ask you about another topic that you've explored in your work. Many times, and I still hear this from many scientists, we assume that parsimony is always better, and that we should always apply the parsimony principle, or something like Occam's razor, to science: that the simplest explanation, with the fewest assumptions, is always the better one. That's the assumption. I mean, is the parsimony principle always the best to apply in science?
Marina Dubova: I think it's not always the best; it depends on the context. In fact, something I'm very interested in is mapping out when, and in which situations, parsimony is a useful principle for a learning agent trying to learn about the world, and when it is not. That's where cognitive studies of other types of learners come in: for example, human learners learning about the world, where we can assess their progress, or machines, artificial learners trying to learn from a set of data, where we can directly manipulate the degree of parsimony bias they have when learning a representation of the data. And we have some really interesting recent results from computer science and statistical learning that speak against at least the most naive intuition that trying to compress the data into the smallest, or at least a relatively small, number of components is always best. One of these results is called the double descent of the generalization error. When we think about statistical representation learning, say we're trying to fit a linear model with multiple parameters to a set of data, or to train a neural autoencoder, where there is input data of a given dimensionality and output data of a given dimensionality, and the goal is to construct a representation that captures as much of the structure in the data as possible. In these systems, we can vary the degree of parsimony they are originally created with.
So in this case, the neural autoencoder could have a very tight bottleneck, where all the information passes through one single neuron, or the linear model could have just one parameter to account for all the variance in the data. The other option would be to increase the number of neurons capturing the information in the data, or to increase the number of parameters of the regression model. Traditionally, we have thought that increasing the number of parameters increases our risk of overfitting the data, of overfitting the noise in the data. The idea is that by compressing the data, we find the things that truly matter and rule out all the noise. None of our everyday observations are exactly the same, so when constructing a representation it's useful to rule out as much of this noise as possible and capture only the important elements. That's one of the arguments for parsimony: we aim right away to rule out the noise in the data and try not to overfit it in our representations, because an overfitted representation would not be useful or generalizable; we could not reuse it to guide our behavior in new situations. But some of the literature I find really fascinating, the statistical learning literature on double descent, finds that this picture is not as straightforward as we thought. It turns out you can keep increasing, for example, the number of parameters in the linear model or the number of neurons in the autoencoder, well beyond the point where the model can perfectly account for literally everything in the data, where the error of the model on the data is basically zero.
And now we keep increasing the number of parameters. That would mean, for example, learning a polynomial regression with 1,000 parameters to account for 20 observations, or learning an autoencoder with 1,000 inner neurons to account for 10-dimensional data. This sounds like a really ridiculous idea: why would we construct a map that is potentially higher dimensional than the data itself? But what people tend to find in the statistical learning literature is that these often end up being the representations that generalize the best, which from some perspectives is very counterintuitive: you're creating a representation that is richer than the data itself, one that already memorizes every single data point and has many more degrees of freedom than needed to capture the data, and yet this representation ends up generalizing very well. Of course, there is a lot of very interesting technical literature trying to explain why this happens, and whether it may go back to the traditional trade-off between parsimony and complexity, whether some notion of parsimony re-emerges in that regime. But what we're interested in is what this mode of learning in general could teach us about the universal preference for parsimony. At the very least, it shows that there are modes of learning that seem at least partially incompatible with our traditional notions of how parsimony should be imposed on learning about the world, and that nevertheless lead one to learn a very successful representation of the world, very useful for generalizing and predicting. We can then ask whether that representation is useful for the other epistemic goals we may have.
Ricardo Lopes: OK, so one last question slash topic I would like to explore here, and it is very much related to the previous one, because of course in science we can't represent reality itself, or at least I think that's not possible; we always have to model reality. I think this is almost the reverse of my previous question: are more complex models better?
Marina Dubova: I think it depends on the context. It depends on what observations we are learning from. There are goals for which complex models can serve us really well, for example in initially exploring the many, many patterns that may be in the data, which we could eventually distill into a couple of intuitive principles that we can comprehend. These approaches have been explored, again, in the computer science literature, where this is usually called knowledge distillation. There is often a stage of learning in which constructing a very complex, very high-dimensional representation is actually useful, even if the goal eventually is to come up with a parsimonious model with just two variables. If you learn the parsimonious model directly from the data, it often ends up not being as successful as a model of the same data learned from a higher-dimensional representation that we construct first. That hints at the possibility of using more complex models at a different stage of the scientific process: even if our goal is eventually a very intuitively understandable representation of the phenomenon, a useful first stage might be constructing a higher-dimensional representation of it, one that might be more complex than the data itself, or at least than the phenomenon. So, back to your question, I think it's context dependent; it depends on what we're trying to achieve in science. But I do feel there are contexts in which complex models can be usefully utilized, and it is a limitation of science if we don't make use of this mode of learning.
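[Editor's note] The two-stage idea described here, usually called knowledge distillation in the machine learning literature, can be sketched as follows. This is a hypothetical numpy toy, not any specific published pipeline: a "complex" random-features teacher is fit to noisy data, and then a simple two-parameter student (slope and intercept) is fit to the teacher's predictions on a dense grid, alongside a baseline student fit directly to the raw observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy observations of an increasing phenomenon.
n = 50
x = np.sort(rng.uniform(-1.0, 1.0, n))
y = np.tanh(4 * x) + 0.3 * rng.standard_normal(n)

# Stage 1: a "complex" teacher -- high-dimensional random-feature ridge
# regression, lightly regularized so it smooths rather than memorizes noise.
p = 300
fr = np.random.default_rng(1)
W, B = fr.standard_normal(p), fr.uniform(-1.0, 1.0, p)
phi = lambda t: np.maximum(0.0, np.outer(t, W) + B)
Phi = phi(x)
beta = np.linalg.solve(Phi.T @ Phi + 0.1 * np.eye(p), Phi.T @ y)
teacher = lambda t: phi(t) @ beta

# Stage 2: distill into a simple two-parameter student, fit to the teacher's
# predictions on a dense grid rather than to the raw data.
grid = np.linspace(-1.0, 1.0, 500)
slope_distilled, intercept_distilled = np.polyfit(grid, teacher(grid), 1)

# Baseline: the same simple student fit directly to the noisy observations.
slope_direct, intercept_direct = np.polyfit(x, y, 1)

print(f"distilled student: y = {slope_distilled:.2f}x + {intercept_distilled:.2f}")
print(f"direct student:    y = {slope_direct:.2f}x + {intercept_direct:.2f}")
```

Whether the distilled student actually beats the direct one depends on the data and on how the teacher is regularized; the sketch only illustrates the structure of the two-stage process Dubova describes.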
Ricardo Lopes: OK, so Doctor Dubova, just before we go, would you like to tell people where they can find you and your work on the internet?
Marina Dubova: Yeah, totally. People can go to my Google Scholar profile or my website, MDubova.com. There are a lot of summaries of my research and some of the papers, and of course I'm always happy to answer any questions that somebody may have over email, for example.
Ricardo Lopes: OK, great. So thank you so much for taking the time to come on the show and for the fascinating conversation.
Marina Dubova: Thank you so much for inviting me.
Ricardo Lopes: Hi guys, thank you for watching this interview until the end. If you liked it, please share it, leave a like and hit the subscription button. The show is brought to you by Enlights Learning and Development done differently. Check their website at enlights.com and also please consider supporting the show on Patreon or PayPal. I would also like to give a huge thank you to my main patrons and PayPal supporters, Perergo Larsson, Jerry Muller, Frederick Sundo, Bernard Seyaz Olaf, Alex, Adam Cassel, Matthew Whittingberrd, Arnaud Wolff, Tim Hollis, Eric Elena, John Connors, Philip Forst Connolly. Then Dmitri Robert Windegerru Inai Zu Mark Nevs, Colin Holbrookfield, Governor, Michel Stormir, Samuel Andrea, Francis Forti Agnun, Svergoo, and Hal Herzognun, Machael Jonathan Labran, John Yardston, and Samuel Curric Hines, Mark Smith, John Ware, Tom Hammel, Sardusran, David Sloan Wilson, Yasilla Dezaraujo Romain Roach, Diego Londono Correa. Yannik Punteran Ruzmani, Charlotte Blis Nicole Barbaro, Adam Hunt, Pavlostazevski, Alekbaka Madison, Gary G. Alman, Semov, Zal Adrian Yei Poltontin, John Barboza, Julian Price, Edward Hall, Edin Bronner, Douglas Fry, Franco Bartolati, Gabriel Pancortez or Suliliski, Scott Zachary Fish, Tim Duffy, Sony Smith, and Wisman. Daniel Friedman, William Buckner, Paul Georg Jarno, Luke Lovai, Georgios Theophannus, Chris Williamson, Peter Wolozin, David Williams, Dio Costa, Anton Ericsson, Charles Murray, Alex Shaw, Marie Martinez, Coralli Chevalier, Bangalore atheists, Larry D. Lee Junior. Old Eringbon. Esterri, Michael Bailey, then Spurber, Robert Grassy, Zigoren, Jeff McMahon, Jake Zul, Barnabas Raddix, Mark Kempel, Thomas Dovner, Luke Neeson, Chris Story, Kimberly Johnson, Benjamin Galbert, Jessica Nowicki, Linda Brendan, Nicholas Carlson, Ismael Bensleyman. 
George Ekoriati, Valentine Steinmann, Per Crawley, Kate Van Goler, Alexander Obert, Liam Dunaway, BR, Massoud Ali Mohammadi, Perpendicular, Jannes Hetner, Ursula Guinov, Gregory Hastings, David Pinsov, Sean Nelson, Mike Levin, and Jos Necht. A special thanks to my producers Iar Webb, Jim Frank Lucas Stink, Tom Vanneden, Bernardine Curtis Dixon, Benedict Mueller, Thomas Trumbull, Catherine and Patrick Tobin, John Carlo Montenegro, Al Nick Cortiz, and Nick Golden, and to my executive producers, Matthew Lavender, Sergio Quadrian, Bogdan Kanis, and Rosie. Thank you for all.