Episode 1

March 01, 2023

00:45:07

Brain Science is Data Science

UVA Data Points

Show Notes

This episode explores the intersection of neuroscience and data science with three experts in the field, Drs. John Darrell van Horn, Tanya Evans, and Teague Henry. As we know, the brain is complicated. People have been charting paths through the brain for decades, making breakthroughs and discoveries that have changed the world. In recent years though, new methodologies in brain research have made significant impacts. Advances in computing power, as well as techniques like machine learning, neural networks, and computer vision, have allowed researchers to ask questions and make discoveries that were not possible even ten years ago. Given these new approaches to studying the world’s most complicated organ, one could say that brain science is data science. Our guests make a compelling case.


Episode Transcript

Monica Manney 00:01 Welcome back to UVA Data Points. I'm your host, Monica Manney, and in today's episode we're exploring the intersection of neuroscience and data science with our guests, Dr. Jack Van Horn, Dr. Tanya Evans, and Dr. Teague Henry. All three of our guests are experts in the field of neuroscience, and you'll hear more about their backgrounds and research at the beginning of the conversation. As you can probably guess, the brain is complicated. And so in order to study it, you have to find ways to navigate through this complexity. Of course, people have been charting paths through the brain for decades, making breakthroughs and discoveries that have changed the world. But in recent years, new methodologies in brain research have shifted the field. Advances in computing power, as well as techniques like machine learning, neural networks, and computer vision, have allowed researchers to ask questions and make discoveries that weren't possible even 10 years ago. So you might say, given these new approaches to studying the world's most complicated organ, that brain science is data science. And our guests today make a compelling case for this view. So with that, here's Jack Van Horn, Tanya Evans, and Teague Henry.
Jack Van Horn 01:05 Good day, everyone. I'm Jack Van Horn from the University of Virginia, where I'm a professor in the Department of Psychology, as well as in the School of Data Science. And I'm joined here by my colleagues, Tanya Evans and Teague Henry. Tanya, can you say a few words of introduction?
Tanya Evans 01:22 Absolutely. So I'm faculty here in the School of Education and Human Development at UVA. I also have courtesy appointments in psychology and neurology, and I am co-director of the Center for healthy brain development.
Teague Henry 01:38 Yeah, I am faculty in the Department of Psychology and the department of data science.
Jack Van Horn 01:44 Tanya, can you tell us a little bit about your research background and some of the activities that you've undertaken?
Tanya Evans 01:52 Sure, absolutely. Um, so my undergraduate training is actually in chemical engineering, so I come to this from a fairly quantitative perspective. I did interdisciplinary training in neuroscience as a PhD student, and then postdoctoral training in developmental cognitive neuroscience. Ultimately, my research program seeks to characterize pediatric brain development and how it relates to school readiness skills. And I define that quite broadly, in the domains of reading, math, and social cognition. And I bring to bear multiple neuroimaging methodologies on those questions.
Jack Van Horn 02:28 And I know a fun fact about you is that you spend a lot of time in Wales.
Tanya Evans 02:33 Well, I spend a lot of time in the United Kingdom. My in-laws live in England, five miles away from the Welsh border. But they are not Welsh, they're not, yes. But yeah, I spent a little bit of time there as a graduate student as well.
Jack Van Horn 02:50 That's fantastic. Teague, tell us a little bit about your kind of upbringing, science, and, yeah.
Teague Henry 02:55 Of course. So I did my undergraduate degree in psychology. Originally I went to my undergraduate university to do bioengineering. But, probably for good reason, they didn't let freshmen engage in research in bioengineering labs. With what I know now, I think that was probably a good idea. But in psychology, they did. So I went over to psychology, I got a degree there. And then I went to graduate school for quantitative psychology, which is a strange subfield that is basically statistics for psychologists. I did a postdoc in clinical neuroscience, and then developmental psychopathology more generally. And now I'm here. And my work really revolves around kind of two features.
I've got one side of my research, which is neuroimaging, and I'm interested in how do we build better software and tools for neuroimagers to use, just very generally, from almost a user design standpoint, because our software, unfortunately, it works, but it's not well designed from a user standpoint. And I'm also interested in network neuroscience, understanding connectivity in the brain, multiple modalities in the brain, and building new statistical tools for that. The other side of my research, that is less brain related but very intermingled, is I'm very interested in behavioral dynamics and modeling behavior, especially in terms of psychopathology. I'm interested in just-in-time interventions for psychopathology based on behavioral data, and have developed, and I'm actively working on developing, some new methods for that.
Jack Van Horn 04:33 Fantastic. And I guess I'll tell you a little bit about myself. I earned my bachelor's degree in psychology, and then immediately went into a PhD program in England, as it turns out, in London, and then took a postdoctoral fellowship working at the National Institutes of Health. And while I was there, I didn't have enough to do, so I went and did a master's degree in electrical and computer engineering. And so I was doing that at night while I was doing my postdoctoral research during the day, and then had opportunities to join the faculty at Dartmouth College. And I went over to the West Coast for a while and was at UCLA and USC before I came here to UVA. And a lot of my work is largely dealing with some of these large-scale datasets from neuroimaging. Again, I've seen enormous growth in the size and the scale and the number of subjects that we involve in these types of studies. This has involved a lot of magnetic resonance imaging, positron emission tomography, those two different modalities, looking at different elements of kind of brain structure and function.
But also, in more recent years, looking a lot at brain connectivity using diffusion-weighted imaging modalities, looking at networks, mapping out pathways in the brain, looking at functional correlates of these networks with behavioral outcomes. I've done this in traumatic brain injury, Alzheimer's disease, autism spectrum disorder, schizophrenia, and Parkinson's disease, and probably a number of other disorders which I've forgotten about. But it's been fascinating, absolutely fascinating. The field changes so fast that if you turn your head for a moment and look back, you will have missed a lot, because there is just so much excitement and enthusiasm. And I think, as we were talking about things like team science, it's very exciting because you really get to work with a lot of different levels of expertise. It's not like we're all in the same kind of scientific bucket, and we're all talking to each other in an echo chamber. It's so many perspectives, from clinical medicine, to neurology, to radiology, to physics, to computer science, engineering, and psychology. It really is an exciting area to be in. And I think data science is really a brain science as well. And so in so much as data science can play a role, I think it's absolutely vital. I'm really glad we have a chance to get together today, because this comes after a lot of excitement around here at UVA about the neuroscience Grand Challenges. Even now there's all manner of initiatives going on around grounds, at the same time as we're on the eve of Brain Awareness Week, which is a national effort to make sure that people are aware of brain diseases and developments and what it means to have a healthy brain, and all that a lot of our research really impacts. And I wanted to have a conversation with the both of you about the brain as a data science and the amounts of data that we collect on it.
But a lot of people don't really appreciate the fact that a lot of the work that we do is really data rich. And Tanya, I know that from your experiences, you're collecting data across multiple modalities. Just how much has that grown in your career? And what have you seen? And why is this important for people to know?
Tanya Evans 08:13 Yeah, absolutely. Um, so in terms of complexity of data, you know, I currently collect structural imaging data, functional imaging data, and I'm now collecting EEG data as well, across both individual participants and dyads. And ultimately, my goal is to connect brain to behavior, to make kind of meaningful contributions about what we're actually looking at. And even, you know, the behavioral data is rich as well. And so making connections across those, it's just, you know, myriads of matrices that need to be analyzed in some complex fashion.
Jack Van Horn 08:51 Yeah, absolutely. And Teague, you know one or two things about matrices. These data are from multiple different spatial and temporal scales, and integrating them all and looking at relationships amongst different brain regions really ends up being very complicated.
Teague Henry 09:08 Oh, yeah, absolutely. I mean, I'm always just struck by the complexity of a single modality of brain imaging, because not only do you have possibly the brain moving across time, but then if you think about the brain as a three-dimensional array, then you've got hundreds of thousands of data points for every individual from one modality, and multiply that across multiple modalities. That's a lot of data.
Jack Van Horn 09:31 Absolutely. When I was a newly minted postdoctoral fellow, for the laboratory that I was in, I was asked to go and buy the hard disk that we were going to use to store all of our data. So I went out with, I can't remember how much money they allowed me to have, but I went out and bought the biggest freestanding hard disk that you could get at that time.
And it was four gigabytes, which was like infinity in 1993, right? And now four gigabytes you could eat for breakfast. It's really not that big of a dataset. And the types of data that we're collecting today in about an hour's worth of, you know, EEG or MRI, in the future that will be considered cute, just like four gigabytes is now, I can imagine. So, no doubt. And as we are linking these different data types together with full genome sequencing, with some of that deep phenotyping you're talking about, these things are the sort of computational challenge that really requires data science thinking.
Teague Henry 10:33 Absolutely, absolutely. Working with neuroimaging data, or just any sort of brain data, really kind of puts almost a physical quality to the size of the data, where it takes effort to move it from one computational platform to another. You get a sense of the bandwidth needed, of the processing capacity needed. It becomes a mass, a huge, huge mass. It has a physicality to it. At least that's how I think of it when I'm working with large-scale neuroimaging data.
Jack Van Horn 11:05 I remember, earlier in my career, a lot of the data that people would get, if you think about like bench work, which is still true, is, you know, you're basically doing gels, or you're doing something and you create a mylar thing, or it's in a lab notebook, exactly. And now a lot of the data is just born digitally. So you take your sample, you put it in a machine, the machine spins, does whatever it does, and spits you out a data file that is then the subject of your analysis for the next two years. This has been true in pretty much any modality, and certainly for these human-centered modalities it's definitely true. What are some of the areas where you've really seen this happen?
Tanya Evans 11:51 Yeah, I mean, there's coding at every level.
I mean, for instance, we watch videos of participants interacting and code that behaviorally. We also utilize a methodology, we collaborate with Steven Boker, and do motion energy analysis to quantify pixelation and movement from that behavioral perspective. There's phenotypic data and neuropsychiatric measures, all of them with multiple subtests, and to be able to, you know, parse out and say which bit of this behavior is driving these changes in brain development, it gets very complex very quickly.
Jack Van Horn 12:33 How do you make inferences on this, Teague? How do you make that linkage, that a change in this brain signal is related to this change in cognitive performance, for example?
Teague Henry 12:49 Well, of course, with causal inference being a difficult thing, it's difficult to draw direct relations between those. But generally we just have to be very careful about using the data and thinking about it. And then once you understand what you're looking at when it comes to neuroimaging data, and you have adjusted for the features of neuroimaging data, most of the time it can be reduced down to a classical model, right? Like, the dominant method in a lot of functional neuroimaging work is something called the general linear model, right, the GLM. And all that is, is it's just repeated multivariate linear regression over and over and over and over again, hundreds of thousands of times, which is complicated, but it's not like neural networks or deep learning or what have you. But the magic comes in then adjusting everything that you're doing for the fact that there's spatial relations in the brain, for the fact that you've got slight differences in the brain across people. And that doesn't even get into the processing pipelines that are needed. Because the remarkable thing about brains is that they do look similar, but they're different enough to make computers not be able to equate them very well.
So an entire field of data processing is registering these images together.
Jack Van Horn 14:07 What's the role of the computing? These are not the kinds of analyses you can do on your laptop? They require something special?
Teague Henry 14:16 Well, you could do them on your laptop. You wouldn't want to do them on the laptop, because that's going to probably, for the most part, kind of keep your laptop running very, very hot for days, right? It's just a matter of the sheer amount of data. Can your laptop hold hundreds of gigabytes of information in memory, and then be throwing these giant matrices around? Again, it's not computationally complex, but there's a lot of it, a lot to do with it. So it's really a matter of scale. And I've worked with neuroimaging data on my personal computers, and I've deployed it to very, very large clusters. And there's nothing inherent to the analyses that requires a very, very powerful machine. You just don't want to have to wait for months for things to be done.
Jack Van Horn 15:05 Yeah, you need to be able to do it at scale.
Tanya Evans 15:08 Yeah, I'd like to just throw in that, you know, one can take data-driven approaches, but it also is incredibly important to be theoretically grounded and conduct hypothesis-driven research. You know, all of this data is meaningless unless we can make sense of it, and really approaching our experiments with hypotheses that we're then able to rigorously test utilizing these methods and these datasets.
Jack Van Horn 15:35 For a while there was this sort of banal attitude that if we just collect enough data, we won't have to test hypotheses anymore, because we'll just know everything. It's just a matter of ferreting it out of all the data, right? And I've never subscribed to that view. It's helpful for us to generate new hypotheses, which we can then go and specifically test with new data collection.
But as a replacement for hypothesis-driven research, absolutely not. You know, we'd be foolish to think so.
Teague Henry 16:05 And just the very nature of brain data makes data-driven approaches to, like, hypothesis building almost dangerous, because there's so much signal and so many different ways of slicing this data up that if you're going data driven, you're going to find something, right? And because of how, you know, modeling works, and how, basically, our psychology influences how we think of models, you're probably going to find something that you're looking for when it comes to the data, but it might not actually be the best way of analyzing it. So yeah, I'm a big advocate for hypothesis-driven testing for this type of data. I think there's a place for data-driven approaches, but you have to bring in prior theory, you have to bring in knowledge about the data type.
Jack Van Horn 16:54 And, Tanya, you do a lot of your work in developmental neuroscience. And so you're looking at patterns of how brain form, function, and connectivity are changing over the early lifespan, from birth to young adulthood. What are some of the kind of specific breakthroughs you might have seen or be aware of, in your own work or in the work of others, that these types of data have really given rise to?
Tanya Evans 17:24 Yeah, and so I think, you know, looking at developmental data, if you want to break it down quite simplistically, if you want to study something changing over time, you would have a cross-sectional dataset or a longitudinal dataset. You know, a cross-sectional dataset is the quickest way to collect developmental data, but we really can't understand individual differences if we utilize that method. And so we're really leaning into these longitudinal datasets, which take time to collect. And we deal with attrition.
So we're not seeing the same subjects staying in the sample. And we're really leaning into what's been found behaviorally, trying to replicate that with more neuroimaging types of methodologies. And so the implementation of longitudinal models has come into play. And I think that an area that has been incredibly exciting is looking at a way in which we can use brain imaging data to predict either trajectories or outcomes. So take, you know, a resting-state scan, even at birth, or a structural scan, even at birth, and be able to say, you know, with this image, we're able to reliably predict what the trajectory of growth in one particular skill might be for this individual. I think that's pretty exciting.
Jack Van Horn 18:49 That's super exciting. Teague, do you have any examples you can think of that, really, I don't know, were born out of these large-scale data collection efforts?
Teague Henry 18:59 Oh, yeah. No, I use these sort of really large-scale data collection efforts as sources of data for methodological work, right? So there's a huge dataset that's public use, it's called the ABCD, Adolescent Brain Cognitive Development. Beautiful dataset: 10,000 children, I think, starting at age, what, nine to 10. They're going to follow them up over 10 years, five different scan occasions. I think they're in their second follow-up scan now, right? And there's 6,600, and I think 66, scans available in the resting state that are all viable. It's a beautiful dataset for just pulling down and, when you're developing a method, running something on it, because you really want that huge sample size to be able to play around with.
Jack Van Horn 19:55 Yeah, I think that's one of the things which I've seen over a career in doing this, is just the size of the number of subjects, at the same time as improvements in the technology to be able to get more data per unit of space, per unit of time.
So just the size of the samples. When we started doing this stuff ages ago, the sample sizes were, you know, 10 subjects. The sample size was relatively small. Now you have these enormous resources of, you know, what is it, 6,500 subjects in ABCD. You have the Human Connectome Project dataset, which is many thousands of subjects. The UK Biobank is a ridiculous resource, hundreds of thousands of individuals, all contributing MRI scanning, all contributing neuropsych and demographic information, regionally specific stuff, you know, in this case throughout England. And there's a number of other efforts going on in the US currently, as well as ones that are kind of under development, that are going to be amazing resources. And just the scale of that has really changed in the last 10 to 15 years.
Tanya Evans 21:10 But the collaborative effort required to undertake those studies, it's just remarkable.
Jack Van Horn 21:15 No single institution, no single investigator could undertake it. It's a team science effort. It really requires people in medicine, in biology, neurobiology, psychology, education, statistics, quantitative psychology, and data science writ large to be able to do these things at all. It does take a village for those types of studies.
Tanya Evans 21:39 To coordinate the data collection, have it be identical across sites, oh yeah, maintain the data, store the data, provide access to others that are interested in using it throughout the community.
Teague Henry 21:51 Absolutely. And it provides a really nice blueprint for more targeted data collection efforts. Because one of the difficulties with clinical neuroimaging is collecting certain types of data. I've worked with a dataset on medication-naive children with ADHD. Not only are those children very rare, because most children who are diagnosed with ADHD go on medication, but also they move a lot in the scanner, which means they're very hard to collect data from.
So ultimately, we had about 18 subjects' worth of data for this randomized controlled trial. And it just got me thinking, like, you know, if there were multiple sites that had shared protocols, or a standardized protocol for tasks, then we could do a bunch of equating across sites. And I know that there are multiple teams of investigators who are interested in these questions: sharing the study design, sharing the protocol, and getting the sample size we need to do really good inference. That is a consortium effort. That's a big collaborative effort.
Jack Van Horn 22:50 Do you think that UVA, as an example institution, could help to lead efforts like this?
Tanya Evans 22:55 I believe it can. I've worked for a number of academic institutions, and I can honestly say that UVA is really investing resources and taking strides to break down administrative barriers and really provide resources to incentivize collaborative work.
Jack Van Horn 23:15 So I think it would be wonderful to see UVA, with this investment in the neuroscience Grand Challenges and the hirings that have happened here with our faculty, be able to position ourselves to be leaders in some of this large-scale, neuroscience-related work. It would be just so exciting, and there'd be so many opportunities for scholarship and opportunities for students to get engaged. It'd be super exciting. Oh, yeah. Oh, yeah. Is there any specific stuff that is kind of necessary to undertake those types of things, computationally or administratively or what? What do we have? And what might we be missing?
Tanya Evans 23:55 Yeah, I mean, just the infrastructure, the human resources required to administratively handle the kind of funding that goes into those types of resources. And having that be centralized, having that be a process that can easily occur across schools and across departments.
Jack Van Horn 24:18 I think that's super important, making sure that things aren't so siloed that you can't talk to other people who are at your same institution because the structure makes it really difficult to communicate across different schools and different departments.
Tanya Evans 24:33 I think having the Brain Institute as a centralized institute that is kind of connecting this community together is a fantastic resource. I think that having computational resources that could support our work is important.
Jack Van Horn 24:50 Yeah, I think that's gonna end up being very important. There's much made of computing in the cloud, right? It's as if this is just this easy solution: we'll all just move to the cloud and all things will be wonderful. I still believe there is a very fundamental role for on-premises, high-performance computing at our institution, for a number of reasons. You want to know where your data are being processed, because you want to know it's uniformly done on a common platform. You want to know that it's a resource that you can make statements about when you're applying for grants or you're writing up your methods section. If you just say we did it in the cloud, it's sort of, well, what did you do in the cloud? And that's a little different statement than saying we actually have hardware purposefully deployed to handle these kinds of data types. And so I think that's really important.
Tanya Evans 25:46 For an institution to kind of take the lead with something like that, you need kind of proof of concept that that institution is going to be able to handle all of these moving pieces, in order to garner trust from funding agencies as well as collaborative investigators across other schools.
Jack Van Horn 26:03 Oh, absolutely. Teague, with all this data and kind of computational approaches, which are, you know, under development here and elsewhere, where's neuroscience going to be in 10 years?
Or 20?
Teague Henry 26:16 That's a great question. Oh, I get to make some prophecies about neuroimaging. Yeah, let me think. I can put together some safe predictions, and maybe some more risky predictions. I think that multiple modalities of imaging, and what I mean by that is functional, structural, white matter tractography, but also multiple different types of scanners, PET, MRI, MEG, yeah, there's a lot of different types. That sort of multimodality study is going to become the most important way of doing this. I also think that increasingly, people are going to be interested in neuroimaging combined with intensive longitudinal behavioral data, just because we need good pictures of the brain, and we need to understand the dynamics of the behavior. And so we're gonna get better data on the dynamics of the behavior. So I think that a lot of the areas around brain science are going to grow and connect with brain science. I do think that we are going to move, I hope, and this is my more risky prediction, I hope that we're going to move as a field more towards a standardization. And what I mean by that is, every field within neuroimaging seems to have its own language, its own set of systems that they're interested in. To give an example, I do network neuroscience. I'm familiar with talking about different networks of the brain. But if you go to somebody who's interested in circuits of the brain, then they might be talking about, like, you know, the cortico-thalamo-cortical loop structure, which is very much related to some of the networks that I'm interested in. It's the same tissue, right? So I hope that as a scientific community, we're going to be able to have a more standardized approach for talking about the brain within the next 10 years, as data continues to
Jack Van Horn 28:20 Give certainly some kind of a thesaurus. Translation: it also means the following.
Tanya and I have been working on kind of this project together. And as we were developing the project, I was thinking about what are some of the computational kind of ideas that one wants to put into place over this next 10 years or so. And it sort of speaks to the notion of bringing in theory with the computation, because you kind of do machine learning now, and you pour your data into the machine and it goes and does something and says eureka, but I don't know what it did. Yeah. And I want to be able to take advantage of that. On the other hand, I do want there to be some sort of theoretical underpinning. And the idea I came up with was, like, a spectrum from Maxwell's equations to machine learning. Maxwell's equations are very fundamental to how we understand electromagnetic energy and transmission of information. It's one of the reasons why our phones work, and a number of other things. At the same time, there's that spectrum of everywhere in between, to where we're just letting the machines kind of tell us things. Is there any place in that spectrum where we should put particular emphasis? Is there anything in network neuroscience that we want to bring?
Teague Henry 29:39 Yeah, yeah, leverage in there. I would say that I don't think that it's necessarily a, you know, single-line spectrum. I think that there are ways of making machine learning approaches, these kind of black-box approaches, explainable, right? There's the entire field of explainable AI, and I think this is going to be increasingly important to this. Because, yeah, I see people apply deep learning models to brain data all the time.
And to be a little flippant, and to be a little straw man, sometimes I kind of think that all those projects can be described as "we have discovered that the brain is related to behavior," which, okay, I mean, the Greeks didn't really realize that back in the old days, but I think we have a pretty good handle on that. So I would like to see more methods for interrogating these black-box methods, right? If you, you know, build these deep learning methods, you show, hey, they predict well. Well, what in the brain predicts this? Where is the data going in these black boxes? And that's a question that goes outside of neuroscience. It's a methodological data science question.
Jack Van Horn 30:52 Yeah. And I think this is super important when you're talking about these machine learning methodologies, which are really good at classifying things from a clinical point of view, where you're looking at autism relative to, you know, typically developing children, or you're looking at, you know, a range of different individuals, individual differences, a range of different clinical disorders. What do those things mean? And what's the thing that is telling you the difference between someone who has ADHD versus somebody who has autism spectrum disorder? Tanya, what things would you hope that you'd see out of something like that?
Tanya Evans 31:29 Yeah, I think the most meaningful discoveries that would come about would be really if we could identify the periods of time during development which are best to intervene, and what those interventions should look like in order to optimize outcomes, developmental trajectories.
Jack Van Horn 31:50 Yeah, I think your comment right there really touches back on the notion of all this as a team science, right? We can do a lot of computation and say, oh, look, this classifier does something. But to somebody who's in education, for example, well, what does that mean? How do I use that?
And it really means we need to have more conversations, you know, like this one, to bring people together to talk about these.
Tanya Evans 32:15 Yeah. I mean, it's about translating the science in a meaningful way to, you know, the consumers, that, you know, in this case it's parents or children with developmental disabilities in some of this work. And I really think that the investigators, the scientists that are going to succeed from now and 10 years from now, are going to be the ones that are most fluent in team science, the ones that can sit in the middle and talk to all of the disciplines and be able to translate between those.
Jack Van Horn 32:49 It's really interesting, because I think that, and I'm probably speaking for you guys too, in our careers we're almost sort of socialized to specialize, specialize, specialize and become one thing. And I've always resisted that, because I've always worried that I would look up one day and no one cares about what it is I'm studying. And there's so much more interesting stuff at the intersections of these different fields. And it really does necessitate a team science.
Teague Henry 33:17 Yeah. And that's, I mean, that's exactly why I love being in data science and having the background in methodological development that I do, because I'm very interested in developmental neuroscience, I'm very interested in clinical neuroscience, but I don't run my own studies in that. Instead, I get to collaborate with people, I get to play with all sorts of different data, and I can really learn a lot about a bunch of different fields. And that's made me, I hope, a better communicator to the consumers of what I'm researching, which are the scientists who are using the methods that I'm developing. I find it's something that you have to learn, how to be able to say, I have developed this new approach for analyzing your data, and here is how you can use it.
Slash, you also always ask your collaborators: what are your needs? What do you need from a method here? Jack Van Horn 34:08 Yeah, that's super important. It's very much a two-way street. I've seen it where, you know, a group of software developers is told, "we need this," and they rush out, they develop something and they show it back to the people who asked for it, who go, "Yeah, we would never use it. That's not what we want." It happens all the time. It really does need to be an interactive process. I'm curious: we're talking about the future here a little bit, the next 10 or 15 years. Do you think that the national funding agencies get that this team science approach is really the way we're going to move these data-rich approaches forward? Do they understand that? Tanya Evans 34:54 I think they're getting there. I think there's certainly some movement. It could get better. Some limitations that I see are that the length of most funding opportunities caps at five years. And so to have a large-scale collaborative project begin and end in that timeframe doesn't always seem feasible. Jack Van Horn 35:17 Do you think private foundations might be better, you know, maybe a little more nimble and a little more forward-thinking? Tanya Evans 35:23 I think so. I think that's a great space to think about this work fitting into. Jack Van Horn 35:27 Yeah. And philanthropy, you know, yeah, certainly, they can really be helpful in paying for the infrastructure and kind of the brick-and-mortar elements, whereas NIH and other agencies can help pay for, you know, the actual hypothesis-driven research or whatever it is that is going to utilize those resources. Teague Henry 35:51 I feel like so much of the issue is in infrastructure, is in actually getting the data, handling the data, running the data through. 
And to do work to streamline that is not only going to result in better science, but much, much faster science, even by today's standards. Jack Van Horn 36:10 Being able to, for example, develop workflow technologies that allow you to take the data in its raw form and really automatize the process of getting it prepared. There are various cleaning things you have to do just to kind of get it into shape so you can do those analyses. But if you can streamline that process, we can get to answers, potential cures for things, so much faster. But you've got to have it in place. Teague Henry 36:34 Yeah, yeah. I mean, the last neuroimaging study that I was actively doing all the processing for, it took me six months to do all the processing. Not because I didn't know what to do with processing; I had my own pipeline ready to go. It was just the infrastructure that I was on: there were issues in the specific formats of the data, in just having things run. And if there was a cleaner setup, a more standardized setting (this is getting back to what I said earlier about standards in terms of the data in neuroimaging and neuroscience), I think that would be a major, major benefit to researchers all over the place. Jack Van Horn 37:12 So much of the brain is visually compelling. It's very photogenic. Who knew? Which really kind of underscores the role of data visualization and the presentation of results. And, you know, we're brain imagers, we're neuroscientists; these photos convey information, they convey the subject under study, they convey the results that we've obtained through our experiments. 
And I know that certainly 3D graphical representations of these things that are interactive are particularly compelling. I really enjoy that. I know, Teague, you've played around with this a little bit. And to be able to run some very complicated algorithms, turn them into something that we can hand to Tanya and say, "Look, here is a result based on your data," and hopefully it allows you to see it in an abstracted but possibly information-rich and meaningful way, is something that we don't really, I don't know if we would say that we don't put an emphasis on it. It's just that very often, that's kind of the result we need to get to, but it's something that's so, so important. Tanya Evans 38:28 It's kind of the last thing that we think about, right? Jack Van Horn 38:32 Yeah, we do all the stuff, magic happens, and then we're, oh, wow, we actually have to present this now. So it is kind of interesting to be able to play with some of the methodologies that are used in the film industry, for example, to create CGI kinds of things. I know it's all kind of, at least from a Hollywood point of view, kind of fake. But on the other hand, it is a way to render and represent your data in a way that is very compelling. And you can tell a very interesting story with it. Absolutely. Tanya, are there any things, from a kind of clinical point of view, over this next decade or so, where you would like to see data science play an important role? Tanya Evans 39:21 I mean, I began speaking to it earlier, but I can elaborate a bit more in terms of really defining what interventions should look like and when they should happen. 
And so if we could, again getting to visualization, visualize developmental brain trajectories in, you know, a typically developing child versus the heterogeneous scale of development going into clinical populations, and really isolate areas of susceptibility in time, areas of susceptibility in brain networks, because those ultimately will support certain skills that aren't quite set up at the right time. And can we intervene and make that happen in a better way, so they don't manifest with the actual disorder? I think that integrating with the education system, particularly public education, where there is a lot of data on what kids look like in preschool and in early school years, and if we can get to them early and make sure that they're ready for the classroom, from the, you know, reading, math, social cognition, all of these facets. Jack Van Horn 40:40 There must be an important role for health disparities. Tanya Evans 40:44 Absolutely, absolutely. I mean, we have a fantastic School of Medicine and School of Education here. And if we can kind of not miss out on this, like, age two to age five, you know, kids are born, you follow them very closely with the pediatrician while they're young, and then they go off somewhere, and then they come back to school. But if something could be done in between to make sure that they're doing well, then I think that finding ways to do that from a neuroscience perspective would be really interesting. Jack Van Horn 41:18 Yeah, it's very, very important. Teague, any closing thoughts you want to share with us while we're thinking about wrapping up here? Teague Henry 41:30 Hmm. Oh, just, you know, if you're interested in data science and brain science, I think UVA has a lot of expertise in that. So we're here. 
And, yeah, I think that brain science as a very data-centric science is the way of the future, generally speaking, both in terms of funding and how we need to work with large-scale data. But I do want to wrap back to the point that, particularly when it comes to clinical work, intervention work, that sort of thing, we have to be theoretically informed. I worry about the dangers of purely data-driven intervention work or data-driven inference work, where we are then developing treatments or developing interventions or that sort of thing. And I think that we have to have a good standard of care that we take as we work through these problems. Jack Van Horn 42:28 Certainly. Best practices, how we are going to comport ourselves with this stuff, absolutely. And as an editor of a scientific journal, when I see articles come through where they've just applied some sort of pure machine learning thing to do classifications, I'm not very enthusiastic about those. And I will tend to not move those forward in the publication process, because they're often telling me something more about the machine learning algorithms applied than about the brain that they were utilizing data from. And that to me is so one-sided. It didn't take me any closer to understanding the brain; I knew a lot more about what you could apply this machine learning algorithm to, but that isn't really helpful. Tanya Evans 43:17 I think we should, you know, as a university, think about our trainees and our students even at the undergraduate level. Like, should every undergraduate at UVA take a course in data science? Jack Van Horn 43:30 Yeah. You should. And, in fact, we've got plenty of brain data for them to apply some of these methodologies to. I think it's more than just data science, however. Really, they should learn a programming language. 
Or two. They should become familiar with how to manage and maintain the literally thousands and thousands of files that you're having to manage. Being able to understand it at that level is helpful, because if you understand how the study is conducted, and you understand how the data is organized, that gives you a lot of insight into how to analyze it. And then you can apply these machine learning methodologies and other technologies. So yeah, I would completely agree that we should start as early as possible. Well, I want to thank you both for taking the time to have a conversation today. So again, Tanya Evans from the School of Education here at UVA, Teague Henry from the Department of Psychology and the School of Data Science, and myself, Jack Van Horn, from psychology and data science. We're really glad to have had this opportunity to chat with you both today. Yeah. Thank you. Thank you. Thanks, Jack. Monica Manney 44:46 Thanks for checking out this week's episode. We'll be back with a new episode next month. If you're enjoying UVA Data Points, give us a rating and review wherever you listen to podcasts. And if you'd like to contact us, you can send us an email at UVA data points at virginia.edu. We'll see you next time.
