Episode 7

November 21, 2022

00:36:42

WikiProject Biography

UVA Data Points

Show Notes

This bonus episode features a conversation between Lane Rasberry, Wikimedian-In-Residence at the UVA School of Data Science, and Lloyd Sy, a Ph.D. candidate in the UVA Department of English. In this conversation, Lane and Lloyd take a deep dive into the expansive world of Wikidata and ask the existential question, "What makes a person a person?" Or, more specifically, what data points make up a person? To help answer this question, Lloyd developed a large-scale data model of the biographical data contained within the Wikidata platform. This project serves as the foundation for their conversation. They also take a wide view of biographical data as it pertains to research and academia, including the process of gathering the data, the ethics of utilizing the data, personal ownership of the data, and much more. Anyone interested in these concepts should find this discussion valuable.

Links:

WikiProject Biography

Music:

"Screen Saver" Kevin MacLeod (incompetech.com)
Licensed under Creative Commons: By Attribution 4.0 License
http://creativecommons.org/licenses/by/4.0/

Episode Transcript

Lloyd Sy: I think one of the things I've learned on this is how immense this dataset really is. The Library of Alexandria is a common metaphor for projects of this type. I think it's a bad metaphor, because there's no way the Library of Alexandria was this extensive. I think that this thing shrinks that compendium of knowledge.

Monica Manney: Welcome back to UVA Data Points. I'm your host, Monica Manney. In today's bonus episode, we're featuring a conversation between Lane Rasberry, the Wikimedian-in-Residence at the UVA School of Data Science, and Lloyd Sy, a PhD candidate in the Department of English here at UVA. In this conversation, Lane and Lloyd take a deep dive into the expansive world of Wikidata and ask the existential question: what makes a person a person? Or, more specifically, what data points make up a person? To help answer this question, Lloyd developed a large-scale data model of the biographical data contained within the Wikidata platform. It's this project that serves as the foundation for their conversation. Additionally, they take a wide view of biographical data as it pertains to research and academia. This includes the process of gathering this data, the ethics of utilizing this data, personal ownership of the data, and much more. So the conversation is not limited to only this project, or even to the Wikidata platform. Anyone interested in these concepts should find this discussion valuable. And so with that, here's Lane and Lloyd.

Lloyd Sy: Hi, my name is Lloyd Sy. I'm a PhD candidate in the Department of English here at the University of Virginia. I'm in my sixth year, and I'm completing a dissertation. It's almost done. The dissertation is about deforestation and Indigenous literature from around 1820 to 1930.
That work doesn't have much to do with my Wikipedia work, which I've been working on since, I believe, January of this year. We've been studying various things about Wikipedia, primarily related to the topic of demographic profiling and demographic ethics. Other than that, I don't know, I'm from northern Illinois, and I like baseball.

Lane Rasberry: Thanks, Lloyd, you are perfect for this project. I'm Lane Rasberry. I'm Wikimedian-in-Residence here at the School of Data Science at the University of Virginia. Lloyd, tell me about this project, WikiProject Biography on Wikidata. What was this about, and what did you do?

Lloyd Sy: So WikiProject Biography, as far as I can tell, is about assembling information about how we might describe a human being through data. What I worked on was the quote-unquote data model for WikiProject Biography, which is a list of properties that describe the human being. To think of it another way, if you have a human being and you want to describe them, which properties do you use? There's obviously an infinite number of ways to describe them, but some properties may be more salient or more universal than others. And I think those are the ones that we were trying to list, identify, and provide examples of in this model.

Lane Rasberry: I love how you said, "as far as I can tell, this is what the project is about," and then you actually set up this data model. But that's the way we did this. That's the way Wikipedia works, of course. So you went into this, and you checked out profiles or records of humans in Wikidata. What did you see? And then how did you use that to compile this model in the project?

Lloyd Sy: Well, like I said, what we were trying to do was come up with a list of properties that human beings might be described with. Certain properties every human being can be described with.

Lane Rasberry: Like what, for instance?

Lloyd Sy: Well, I'll start with something contentious: gender, right?
I don't know if it's actually appropriate to say that everybody has a gender identity, but let's just say the vast majority of people have a gender identity, and with most people it's readily identifiable. Race is something too. But to start with a less contentious thing, everybody has a height, right? This is not always well known, but theoretically every single person has a height. So one of the first properties I listed was height. I found where that property was described on Wikidata, I put it in our data model, and I attached a particular person with their particular height to that property. So I began very general. I found things like height, or occupation, or race, or gender, all these things that are usually well known about anyone in the public eye. Then there were more particular things. For instance, if you're a member of the military, one way to describe you might be your military rank. I don't have a military rank, being a civilian, but a soldier definitely has a rank, and that's almost always publicly available. If you're an athlete, your particular sport or competition is a way you can be described.

Lane Rasberry: You might have a height also.

Lloyd Sy: You likely have a height.

Lane Rasberry: Height would be less important for a non-athlete, perhaps.

Lloyd Sy: Sure, exactly. And so the data model attempted to account for these more specific kinds of human beings. Over the course of a couple of weeks, I just thought about all the ways that people can be described.

Lane Rasberry: Okay, let me back up. Why is Wikidata compiling this kind of information? Why did Wikidata already have these records associated with humans? And now that you've compiled this list, this data model of all these different properties that Wikidata tracks, what can a person do with this list?

Lloyd Sy: It's a good question.
I think the place to begin is to describe what Wikidata is.

Lane Rasberry: Let's do it.

Lloyd Sy: And I believe that you could probably come up with a better description than me, but I'm assuming you're going to want me to describe it.

Lane Rasberry: Give it a go.

Lloyd Sy: I think Wikidata, from my perspective, is the single largest and most comprehensive database ever compiled, or at least ever compiled publicly. By that I mean other databases focus on a particular field or a particular topic. You might have a database of baseball players. Wikidata is a database of everything.

Lane Rasberry: What does everything mean? What's in Wikidata?

Lloyd Sy: Anything you could think of, right? There's an entry in Wikidata for things that might reasonably lend themselves to description through data. There's an entry on Barack Obama, for instance. There's an entry on the City of London, for instance. But, to take an example that's meaningful to us, there are also entries on particular scholarly papers that maybe no one has read. There are entries on the smallest mountain in Uruguay.

Lane Rasberry: Every mountain.

Lloyd Sy: Every mountain, probably. On lighthouses, on obscure novels.

Lane Rasberry: Buildings, novels, publications, people, cities.

Lloyd Sy: Literally everything, and things that are abstract too: joy, or angst, or impressionism, or a particular sub-movement of impressionism. Everything that you can think of is described in Wikidata, and all under a single umbrella. It's all within one dataset.

Lane Rasberry: The Wikidata dataset.

Lloyd Sy: The Wikidata dataset.

Lane Rasberry: Okay.
What does a person see if they look up one of these concepts in Wikidata?

Lloyd Sy: You'll see something that I believe is intuitive, but maybe it's not intuitive if you're not used to looking at data. You'll see a list of, basically, boxes, and the boxes will generally have two things: the property, and the way in which this entity fulfills that property.

Lane Rasberry: Can you give me an example?

Lloyd Sy: Sure. I don't know, let's pick a person that I like.

Lane Rasberry: You mentioned Barack Obama.

Lloyd Sy: Okay, we can go with Barack Obama. You go onto Barack Obama's Wikidata entry, and you will see gender: male. Race: African American. Height: he is like six foot three or something. Alma mater: Occidental College and Columbia University and Harvard Law School. Sometimes you're going to have several fulfillments of the same property. And so on and so on. Most ways you can think about describing Barack Obama will be listed on the Wikidata entry for him.

Lane Rasberry: All right, what's the point of publishing information in this way? What good is it to have this information in Wikidata, or to know that this information is there?

Lloyd Sy: Wikidata is queryable, which means that you can easily ask the database questions, and it pops out answers.

Lane Rasberry: What kinds of questions?

Lloyd Sy: Well, if we're sticking with Obama, you might ask the question: who are all the African American politicians who came to office before 2012? Obama would be one of the people you get in this dataset, but you'd also see a whole list of other politicians. Maybe you get a sense of a particular demographic.

Lane Rasberry: Can you go through that again? So that query, what information has to be in Wikidata for that to work?
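
The "boxes" described above, each pairing a property with one or more values, can be pictured as a small data structure. The following is a minimal sketch in Python, not Wikidata's actual storage format: the `Item` class and its method names are hypothetical, and the values are the ones mentioned in the conversation.

```python
from collections import defaultdict


class Item:
    """Toy stand-in for a Wikidata entry (hypothetical class, for illustration only)."""

    def __init__(self, label):
        self.label = label
        # Each property maps to a list, because one property can hold several values.
        self.statements = defaultdict(list)

    def add(self, prop, value):
        """Attach one more value to a property and return the item for chaining."""
        self.statements[prop].append(value)
        return self

    def values(self, prop):
        """Return all values for a property, or an empty list if none is recorded."""
        return list(self.statements.get(prop, []))


obama = Item("Barack Obama")
obama.add("gender", "male")
obama.add("race", "African American")
# Several "fulfillments" of the same property, as in the alma mater example.
obama.add("alma mater", "Occidental College")
obama.add("alma mater", "Columbia University")
obama.add("alma mater", "Harvard Law School")

print(obama.values("alma mater"))
print(obama.values("military rank"))  # nothing recorded, so the list is empty
```

A property that was never recorded simply has no values, which matters later in the conversation when incomplete data comes up.
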
Lloyd Sy: Okay, so for instance, take the query: who are all the African American politicians who came to office in the United States before 2012 or something? In order for Wikidata to be able to answer this question, the database needs to have certain features. It has to have entries on the politicians. It needs to know their race. It needs to know the fact that they're politicians. It needs to know when they came to office. It needs to know where they came to office: in America. It needs to know all those things to send this back at you.

Lane Rasberry: And so if there are records for all these people, and it has this data for them in Wikidata, and then someone presents the question to Wikidata, what does Wikidata do in response?

Lloyd Sy: Well, the way you ask this question is you assemble something called a query. Queries are written in a particular, I think you'd call it a scripting language. Maybe it's not a scripting language but a query language. Ours is called SPARQL, I believe. It looks technical and complicated, but once you get the hang of it, I think it's a fairly easy thing to manipulate. You submit a query to Wikidata, and it pops back out at you a dataset, a table, the list of the people who meet that criteria. Those criteria?

Lane Rasberry: Yes. All right. Is it just for people? Or what else could you do?

Lloyd Sy: I mean, you love to talk about the largest city with a female mayor, right? You could ask all sorts of questions. But let's try something really esoteric. Could Wikidata give me a list of all of the people at a university who identify as women, who are not white, who went to a university located in a Pacific Northwest state, and who were born after the year 1950?
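
As a rough illustration of what "assembling a query" can look like, here is a minimal Python sketch that builds a SPARQL query string and a request URL for the public Wikidata Query Service. It covers only two of the conditions listed above (ethnic group and occupation) and leaves out when and where someone took office. The identifiers used (`P31`, `Q5`, `P172`, `P106`, `Q49085`, `Q82955`) are the editor's reading of Wikidata and should be checked before use; the function names are hypothetical. No network request is made.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(ethnic_group, occupation, limit=50):
    """Return a SPARQL query for humans with the given ethnic group and occupation."""
    return f"""
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P31 wd:Q5 ;                 # instance of: human
          wdt:P172 wd:{ethnic_group} ;    # ethnic group
          wdt:P106 wd:{occupation} .      # occupation
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


def request_url(query):
    """Encode the query as a GET URL that asks for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Assumed identifiers: Q49085 (African Americans), Q82955 (politician).
query = build_query("Q49085", "Q82955")
url = request_url(query)
print(query)
```

Each line inside the `WHERE` block corresponds to one thing the database "needs to know"; a person missing any of those statements would not appear in the results.
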
I guess we're still talking about people, but...

Lane Rasberry: Can Wikidata answer a question like that?

Lloyd Sy: It can. Well, it can, and it's dependent on, I suppose, several things. It's dependent on the dataset being complete. For instance, you couldn't answer the question I just provided if we don't have any information about the alma maters of professors at the university. So it can answer the question so long as the data is there.

Lane Rasberry: Who puts the data in Wikidata? Where does this data come from? How does a person know if the data is actually in Wikidata?

Lloyd Sy: It's a good question. A lot of the people who contribute to Wikidata are people that I don't personally know. It's people who, for some reason or another, decide to take a dataset and have it undergo a process called curation, which is the cleaning up, and they somehow input it into Wikidata. I don't actually know how one does that at a large scale. But like Wikipedia, famously, Wikidata can be edited by anyone. So if I go onto an athlete's page, and I notice that the page does not list their height, I can, I think without even registering for an account, edit the page, edit the entity's descriptors, and add their height. So anyone can edit it, and anyone can upload a curated dataset. I imagine much of the data on Wikidata comes from people who have worked on these large-scale datasets.

Lane Rasberry: Right, right.
Like in the case of athletes, there are so many people with sports statistics. The Olympic Committee or different sports teams put out datasets about the athletes, and those can come into Wikidata. You're talking about schools, alma maters, where people graduated from. Universities themselves put out lists of people, their faculty, their profiles. This stuff comes into Wikidata. So back to WikiProject Biography and Wikidata. You set up a data model there. Tell me again, what is a data model? What did you set up, and what does this look like?

Lloyd Sy: A data model is a list of all of the qualities with which we might describe human beings. Now, again, that is an impossible ask. There is an endless number of ways that you could describe a human being. Especially once you get down to the nitty-gritty of different occupational descriptors, it's impossible to list them all. We tried to list, say, a representative sample. We tried to show, basically, that you can break down human beings into a list of different properties.

Lane Rasberry: All right, tell me, what properties did you find? Let's go with some general categories. Can you break these down into categories?

Lloyd Sy: Yeah, sure. I think one category I had was biographical descriptors. When was a person born? When did the person die? Where was the person born? Where did the person die? Where was the person buried? How tall are they? How much do they weigh? Who was their father? Who was their mother?

Lane Rasberry: Okay. So family, siblings, things like that as well.

Lloyd Sy: Yeah. So those are all things under biographical descriptors. You might get other things too.

Lane Rasberry: We haven't talked about religion. Tell me something about that.

Lloyd Sy: So obviously, one thing you can list about a person is what religion they practice.
But I think there's a separate property for, maybe, what worldview they hold, like an ideology or philosophy. I believe vegetarianism might be under one of these sorts of systems of thought or practices. But you see, even with religion, once you get down into the more specific things about how people are described, take Catholics: you can describe them by the particular order of friars that they belong to. Perhaps you can even describe them based on the rank that they hold within that order of friars. You might describe them based on the places that they visited in the past. So it's truly endless, and we're just trying to pick a representative sample.

Lane Rasberry: What happens if there's a Wikidata record for a person and you don't know their religion? What happens to that property?

Lloyd Sy: Well, it's just not listed for the person. And if you query based on that particular property, then they won't show up. Maybe this is what I was talking about with datasets being incomplete. Let's say I wanted to find a list of all the Muslim faculty at UVA. I'm assuming that's not publicly available for a lot of people. So if I asked that question to Wikidata, I might not get more than one or two, if any.

Lane Rasberry: Depends on how many people have publicly declared this.

Lloyd Sy: Have publicly declared it, and whether that publicly available information has actually been put into Wikidata, which is not a guarantee.

Lane Rasberry: About how many properties did you identify? And how did you identify any of these properties to put in this list that you created?

Lloyd Sy: I think we identified about 100. You told me to start with, I think, 30, and then I just had fun, so I kept on going and got to 100. At the beginning it was fairly easy.
I was like, well, I know everyone's going to be described via their mother or father, height, all these things. But once you start getting further into it, for me it was a matter of looking up people and seeing what properties they were described with that I hadn't put in my data model. For instance, when I started thinking about military people, I went onto General Douglas MacArthur's page, and I saw that there were all these Wikidata properties I hadn't expected, like campaigns served on, or military award received. So then I added those to the model.

Lane Rasberry: So if you're not a military person, those things just wouldn't be on your profile.

Lloyd Sy: Yeah. I mean, most civilians haven't won a Purple Heart.

Lane Rasberry: Other surprises? What other people did you find that had unusual properties tagged to them?

Lloyd Sy: You'd never guess the stuff that Wikidata editors come up with. What's a good one? Do you have the thing up? I mean, hair color is one that I was thinking about. Hair color as applied to fictional humans.

Lane Rasberry: So this isn't just real-life humans. This is also fictional humans?

Lloyd Sy: Well, this is more of a philosophical debate, right? But arguably, fictional characters are also bound by the rules of biographies. They have identifiable aspects.

Lane Rasberry: Okay. So they might have a gender, they may have an occupation, they may have served in a military campaign.

Lloyd Sy: Yes. In the case of hair color, Tintin has red hair, and that's listed. Who would think that there's a dataset that has this stuff? I'm looking through other sort of surprising things. Okay, ancestral home. So one of the categories that I worked on was race, ethnicity, and nationality.
And so that's trying to describe various properties that describe where a person was born and where they currently live. When I started looking at a lot of Chinese people, I saw something recurrently, which is ancestral home. What does this mean? It is, according to the descriptor on Wikidata, the place of origin for ancestors of the subject. So for instance, Sun Yat-sen's ancestral home is in a province called Dongguan. That's a way with which you can describe someone that I'd never even heard of.

Lane Rasberry: So this could mean where somebody is from, but this has a particular meaning in Chinese culture, and many Chinese people want this as part of their record.

Lloyd Sy: Sure. Or, I don't know if they want it, but it is publicly available in any case. So it's worth pointing out enough to be publicly available.

Lane Rasberry: Anything else that was specific to a community or culture that you saw?

Lloyd Sy: Many things. Take academics: doctoral advisor, Erdős number. Are you familiar with this concept?

Lane Rasberry: Yeah, explain it. What is it?

Lloyd Sy: There was a, I believe, Hungarian mathematician named Paul Erdős, and he was very prolific. He wrote tons of papers, so many papers that he collaborated with tons and tons of people. Your Erdős number is a way of describing how many degrees of separation you are from Paul Erdős. I think he's dead now, so it's unlikely that a scholar beginning work in mathematics today has an Erdős number of one. But it's likely that their advisor worked with Erdős, or maybe their advisor's advisor worked with Erdős. And so that's a descriptor that's entirely confined to the specific community of mathematicians.

Lane Rasberry: But something that's important to this community, and it's publicly remarked upon, yes?
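
The "degrees of separation" idea behind the Erdős number is a shortest-path distance in a graph of co-authorships, which a breadth-first search computes directly. The sketch below is a minimal Python illustration under that standard definition; the co-author names other than Paul Erdős are hypothetical placeholders, and nothing here reflects how Wikidata itself stores or derives the value.

```python
from collections import deque


def erdos_numbers(coauthors, source="Paul Erdős"):
    """Breadth-first search: distance from `source` to each reachable author."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for other in coauthors.get(person, ()):
            if other not in distance:
                # First time we reach `other`, so this is the shortest chain.
                distance[other] = distance[person] + 1
                queue.append(other)
    return distance


# Hypothetical co-authorship graph: an advisor who wrote with Erdős,
# a student who wrote with that advisor, and someone with no link at all.
coauthors = {
    "Paul Erdős": ["Advisor A"],
    "Advisor A": ["Paul Erdős", "Student B"],
    "Student B": ["Advisor A"],
    "Outsider C": [],
}

numbers = erdos_numbers(coauthors)
print(numbers)
```

In this toy graph the advisor gets 1 and the student 2, matching the advisor and advisor's-advisor chain described above; the unconnected author gets no number at all.
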
Lloyd Sy: And it is on Wikidata for most mathematicians.

Lane Rasberry: Other stories like this? Very interesting.

Lloyd Sy: Yeah. Let me look through this. Blood type: apparently you can find the blood type of various people. Let's see. Okay, this is one that you might not think is described. Criminals are often described on Wikidata.

Lane Rasberry: In what way?

Lloyd Sy: Okay, so here are properties under my section called criminality: number of victims of killer, investigated by, their specific criminal charge, what they were convicted of.

Lane Rasberry: Of interest to people who follow true crime stories?

Lloyd Sy: Well, I hope so. I hope there's not a community of serial killers out there. But yeah, for instance, this guy named Yoo Young-chul, who I don't really know, killed 20 people. On his Wikidata entry, you can find a property called number of victims of killer, and for him it's 20. So presumably, one of the questions you could ask Wikidata is: give me a list of all the people who have killed at least 30 people.

Lane Rasberry: All right, yeah. Of interest to some people. How does this relate to Wikipedia? Could you say something about the relationship between Wikidata and Wikipedia?

Lloyd Sy: Yeah. I think formally, and you know this better than I do, they both fall under this thing called the Wikimedia Foundation, which runs or organizes a series of various sister projects: Wikipedia, Wikidata, Wikimedia. I think I use Wikisource. I love Wiktionary, which is, I think, the best dictionary in the world. As I understand it, Wikimedia is just devoted to making knowledge free and open to the public, wherever you are in the world, as long as you have an internet connection.
Wikipedia is by far the most publicly known and famous of these projects. It's an encyclopedia of various things. Wikidata is a way to take all of that encyclopedically described information and break it down into structured data. So the relationship is, sometimes you see something on Wikipedia that's not on Wikidata, and you can create a data element on Wikidata based on what's said in the Wikipedia page. So in a way, Wikipedia information can filter down to Wikidata. I imagine it might also go the other way sometimes, where information first appears on Wikidata and then finds its way to Wikipedia somehow. I'm less familiar with that process.

Lane Rasberry: So it's a hot topic, still being determined here, but there's some interplay between the two. Now that you've compiled this data model, and it's possible to query a number of people by their crimes, number of artworks from an artist, number of publications from an academic, all these other things, how would you want anybody to respond to this? What would you want anyone to do in reaction to this? Answer it either way: what would you want the Wikipedia editing community to know about this? And what about non-Wikipedians, people on the outside?

Lloyd Sy: I think the Wikipedia community, the Wikimedia community, needs to really consider how ethical this is. They already are considering this. It's not like I'm the first person to say there are going to be ethics questions. But it's publicly available information, right? I don't know if everybody wants to be described. Certainly nobody wants to be described completely. There might be a Wikidata entry for the mistress of a particular person, and I imagine people don't want their affairs coming out in public. That's just an extreme case. But there are private people who don't want any of their information to be available on Wikipedia.
And especially available on Wikidata, because if it's on Wikidata, then you become sort of obviously and manifestly a collection of organized aspects. So I hope, and I know that it's already going on, that Wikimedia community members really ask themselves: what are the ethics of this, and when is it okay? What are the boundaries for us to list data and to put data up on the internet? Because this is all, again, public and available. And I hope that the public takes it seriously, because it's an enormous project, and it's a project of great significance. Maybe this is the greatest collection of data ever assembled in one place. So the public should join in this conversation. Basically, the public should be on Wikipedia and Wikimedia also, talking to people. I think I'd want, in particular, those who have privacy concerns to talk about what specific privacy concerns they have.

Lane Rasberry: If somebody were to enter Wikipedia and want to join one of these talks about privacy or social issues or ethics, what would that look like? How does the wiki community talk about these things among themselves? And what does it mean for someone who's not in the wiki community to go into the website and then join one of these conversations?

Lloyd Sy: It's incredibly easy, and also it can be daunting. It's incredibly easy because anyone can make a Wikipedia profile and start editing. I mean, you don't even need to have an account to start editing. I never actually told you this, but I first created a Wikipedia page when I was in seventh grade.

Lane Rasberry: Oh, tell me about it.

Lloyd Sy: Yeah. Well, I was a seventh grader on the internet, and as I said, I love baseball. I found somehow, on Wikipedia, a list of baseball player pages that had yet to be created. It's clear someone was doing this alphabetically.
And so a lot of the Z's were not yet taken. There was a player for, I believe, the Boston Americans, which is a team that is now called the Boston Red Sox, in the very first decade of the 20th century. His name was Guy Zinn. As a seventh grader, I thought, well, it'd be kind of cool if I knew how to make this page. And that page is still up.

Lane Rasberry: Wow.

Lloyd Sy: And it's mostly still my prose after 15 years. It's just a list of, like, how many RBIs. There's nothing special, but it's still up there. The point is that a seventh grader could do this. I made a page, I started editing Wikipedia, and I created something that's still knowledge to this day. To actually discuss more philosophical matters of the kind that we're talking about here, it's important to note that every single Wikipedia and Wikidata page has a talk page attached to it. If you're looking it up on the desktop and you see the tabs at the very top of the page, you'll see a talk page tab right at the very top. If you click on that, you'll see various Wikipedia users who care, discussing, debating, arguing, contentiously or otherwise, about things on the page. There are discussions that go nowhere. There are discussions that haven't been answered in years. But potentially every single article is the starting point for some discourse. Beyond that, with WikiProjects like this, which sort of, I suppose, live outside of actual pages, you can discuss more specific topics. Any example would do. Let's say there's a WikiProject Canada. I'm sure there is. You can find it very quickly. That is a community of Wikipedians who care about topics and articles relating to Canada.
And maybe they have a discussion going on right now about how we can better edit the pages of Canadian, I don't know, policemen or something, and you can join in that discussion. I say it's daunting because sometimes these discussions are about things you've never thought about before. I wouldn't know how to contribute to WikiProject Canada. Even something I do care about, like WikiProject Baseball: I'm sure they're talking about topics, and they have very strong opinions on them, but I don't yet.

Lane Rasberry: Still invited.

Lloyd Sy: So I am still invited, yeah. But not every party that you get invited to is one where you can talk. So it's both incredibly easy to join and maybe difficult to feel as though you're making an impact. That's my perception as a kind of incipient Wikipedian.

Lane Rasberry: Tell me what it was like putting together this WikiProject. How did it feel to do this? What does it mean to compile this list and then put it out there for the wiki community? Is it intimidating? Were you comfortable? Were you shy? What did it feel like?

Lloyd Sy: Well, you got me involved in this. I think you started this, right? And from the beginning, you told me that people hadn't really touched this topic because it's way too difficult to try to encapsulate the human being through a list of properties. I didn't want to touch it. Well, then you had me do it. And I think the part that we were most concerned about was that with every single property, we had to list an example. So I gave the example of Sun Yat-sen, I think, whose ancestral home is Dongguan. So how did I pick Sun Yat-sen? I had to ask that question for every single property. If I'm saying that there's a property called father, and it's exemplified by this thing, how do I pick this thing?
Lane Rasberry: I should have asked you earlier, where did these properties come from? Who writes these properties?

Lloyd Sy: These properties, I imagine, were created at the beginning of Wikidata. Some of them were, anyway. When the Wikidata project started, people were probably asking themselves, how do we describe things? How do we describe people? And some of the more obvious ones came to mind immediately: we're probably going to need a property for height, we're probably going to need a property for sibling or date of birth. And I imagine people started creating entries. I don't know anything about the history of Wikidata.

Lane Rasberry: That's how it happened. Yeah, everyone just made it up.

Lloyd Sy: Right, right. But then, to take the ancestral home example, I'm assuming at some point some dedicated set of Chinese Wikipedians, or people who know about China, were like, ancestral home is a very important quality, so we should have a property called ancestral home. And yeah, it grew organically.

Lane Rasberry: Yeah, you listed a lot of properties. Of course, I don't know the story of all of them, but that's how they're made. Some dedicated group of people says, we need a property for such and such. They talk it through, the property gets created, and then it gets applied to people or concepts, or whatever the case may be. And they all have a story. It's on one of those talk pages. You can go and see the people talking about why this property is so important, and then it's right there. So you assigned people to these properties, or gave examples. How did you choose other examples for these properties? And what kind of example are you talking about?
Lloyd Sy So with every property, we have several. So the WikiProject data model is like a table. Yes, right. And for the column headings on this table, you know, first I list the property's name, then I list the property's alphanumeric identifier on Wikidata, because they all have alphanumeric identifiers. Then I list the description of that property. For instance, you might not know what an ancestral home is, so there's a more detailed description in the third column. The fourth column holds an example of this property. Lane Rasberry And how did you choose the examples? You said about Sun Yat-sen, but what about the other ones? Lloyd Sy Largely, I would describe the process as controlled chaos, right? I wanted to get representation of people. You identified the problem for me, which is that when we talk about data, we often use mainstream examples so that everyone knows, right? It just so happens that often those mainstream examples are white people, or they're more often men than women. Right? Everyone knows that there are underrepresented groups in every aspect of life. So my goal was really to use different examples for every property to achieve diversity in many, many ways: racial, sexual orientation, gender, nationality, fictional versus nonfictional, right? Contemporary, dead, alive. You know, I just wanted to get a full, you know, a roster, a representative roster of human beings from every aspect of life. It's a massive undertaking. It's also one I certainly failed at, you know. Like, I'm sure I missed out on something. Like, I don't know if I had any Argentinians. Right. But I tried, you know, and that was just drawing on my own sort of knowledge about people from different places. And, you know, sometimes I was like, well, I haven't had a woman in a while. I wonder, like, what woman would be a good descriptor for this property? Right?
Or I haven't had anyone from Australia here. Can I think of any Australians, you know? Lane Rasberry What if somebody sees your list and they feel left out? Like, I'm sure there's people who do feel left out. You know, what should they do about that? Lloyd Sy Well, they should complain to me, you know. They should, and that's the point of the talk page, right? Like, let's say I didn't list anyone who was trans, right? Let's say I didn't list anyone who was from South America. A trans person or a South American should note that, and then we can replace one of the people with an example that's more representative. Hmm, I think that's a good practice. You know, Lane Rasberry That's how wiki works. Yeah, yeah. Lloyd Sy It should. I mean, a project as encompassing and diverse as Wikipedia should seek to be diverse and representative in all of its facets. Yeah, I think Lane Rasberry That's great. Anything else you have to say about WikiProject Biography and Wikidata? Lloyd Sy You know, I think one of the things I've learned through the various things I've worked on with you is how immense this dataset really is, you know. The Library of Alexandria is a common metaphor for projects of this type. I think it's a bad metaphor, because there's no way the Library of Alexandria was this extensive. I think that this thing shrinks that compendium of knowledge. I'm someone who cares a lot about knowledge, and I care about knowledge being publicly available. This is just about the best project there is for that. And so it's inspiring constantly. Yeah. Lane Rasberry You were really the right person to put this together. Lloyd, thanks so much for your contribution. Thank you. And thanks for talking with us. Yes. Monica Manney Thanks for checking out this week's episode. We'll have a link to Lloyd's WikiProject in the show notes.
Music for this episode was created by Kevin MacLeod, and links to his music can also be found in the show notes. We'll be back on December 1 with a conversation about sports analytics. We'll see you next time.
