Creating a Biased-Free Job Board with Matthew Thomas

<intro music>
Monica Manney: Alright. You have 20 seconds and two sentences to give me a definition of data science, whatever your definition of data science is.
Matthew Thomas: Two sentences, okay, let's see. Data science is... I think other people have said something along these lines before, but basically it means that you're better at programming than statisticians, but you're better at math than programmers. And if you hit that sweet spot, you're a data scientist.
Monica Manney: Welcome back to UVA Data Points. I'm your host, Monica Manney. Today, we have a bonus episode featuring a conversation with Matthew Thomas, a data scientist at Inclusively and an alumnus of the UVA School of Data Science's online MSDS program. In this episode, Matthew discusses how he works with his colleagues at Inclusively to create and maintain a job board specifically designed for job seekers with disabilities. Through this conversation, Matthew explains how typical job boards come with many built-in biases that can screen out qualified individuals, oftentimes without them even knowing. Matthew discusses the challenges of removing biases from algorithms, and he expresses the importance of honesty and self-criticism when examining a data science project. As Cathy O'Neil stated in the previous episode of Data Points, we should always be asking the question, "For whom can this fail?" Matthew's work is a good illustration of this sentiment in practice. In addition to discussing his work, Matthew also delivers solid advice for anyone seeking a similar career path in data science. So, here's our conversation with Matthew Thomas. Alrighty, so we're gonna have you first introduce yourself and just give us kind of your background.
Matthew Thomas: Okay. Yeah, so my name is Matt Thomas. I live in Fredericksburg, Virginia, and I work for a company called Inclusively, which is based out of Richmond, but our employees are kind of all over the country.
And I got my master's in data science in the UVA School of Data Science online program. I was in the first cohort, so I graduated in December of 2020. I've been working with Inclusively since then; my job title is data scientist. Most of the data science work I do at Inclusively is either business analytics, where I'm managing their data analytics, or machine learning around NLP, like text stuff, because we're a job board, so most of what I do is dealing with text. And I've been doing data science stuff probably since about 2018. Before that, I was a small business owner, so it's kind of a career change for me.
Monica Manney: Awesome. That sounds like a lot of our online students, kind of that career change. That's really exciting. So I want to ask you just for an overview of Inclusively. Just talk to us about what you all do there, what your mission is, just the background of Inclusively, please.
Matthew Thomas: Sure. Inclusively is a job board, and it's sort of in transition: we're looking to become more of a community space also. But currently it's geared towards job seekers with disabilities, and so it differs from other job boards in a number of ways. One is how we do our job matching, which I guess I'll talk about more in a moment. It also has a few other differences. One is that our clients are employers, so we don't just post any job; we post jobs specifically from employers who are looking to diversify their workforce, particularly when it comes to job seekers with disabilities. So they're already coming into it with that mission in mind. We have several large employers, larger and smaller ones, but Accenture is on our platform, UnitedHealth Group, Comcast and some others too. It's inclusively.com. It also comes with features. For example, we make it much easier for job seekers to request what are typically called accommodations; our website calls them success enablers.
But for example, if you need something in particular, let's say you can only work remotely because of mobility issues, you can state that. It's just part of the overall profile. The employers know it going in, and the employers know which accommodations they can offer for which jobs. It streamlines that whole process, which makes it a lot easier for those job seekers. Most of the big job boards don't have anything like that. The only opportunity you get to discuss it with them is if you're invited to an interview, if you make it that far. So we make that part of the process. And then, of course, there are other features too. Like I said, there's a community board where people can go and share their experiences, and we're looking to expand that. We also have a talent team that helps job seekers, so there are actual people that you can contact if you need assistance with anything, and they help employers as well. And, it's kind of a constant work in progress, but we have our own job matching AI that we use. We don't outsource that part, and that's another one of our big differentiators.
Monica Manney: So my other question is kind of going backward a little bit. You talked about a lot of text and machine learning, but what does the data scientist do in this environment specifically?
Matthew Thomas: Okay, so do you mean the machine learning part of it?
Monica Manney: I want it all.
Matthew Thomas: Okay. Well, we're a small company, so I wear several hats that a data scientist in a bigger company might not. One of the big things I do is manage the data analytics for the company. We have business intelligence tools and dashboards that we create so that the business stakeholders, the higher-ups, can make wiser business decisions. A lot of that work is not machine learning.
It's mainly data engineering. There's a database that lives in the cloud, and I do ETL (extract, transform, load) functions, I guess is the word I'm looking for, to make it easier to use on our business intelligence platform. So that's all data analytics stuff. And I actually see that as an area of big growth in data science, kind of making that stuff more advanced. I'm realizing as I'm doing it that there's a lot of potential there. In terms of the machine learning, what we do mainly is a lot of information extraction related to text. For example, the jobs are posted, and what a lot of my machine learning models do is extract the information from those job postings that we need in order to find a good match for that job. That could be things like hard skills. If it's a technical job, it could be extracting programming languages, just as an example. It could be extracting degrees: whether a job requires a higher-level degree or not. So it's a lot of algorithms like named entity recognition, where the idea is that instead of using a rules-based approach to say, "Okay, if you find one of these skills, extract it," the model will recognize a skill even if it's never seen it before. That's where the machine learning part comes in. Then, when we extract that, we do something similar with the candidates: we extract their skills and so on from the resumes that they upload. And then the algorithms try to find a good match based on that. That's kind of the short version of it. In that sense, it's similar to other job matching AI; they all kind of work like that. It's in the particulars where you differentiate. So most of the machine learning I do is that or text classification. For example, when someone posts a job, we have to classify it into a category to make it easier for our users.
Is this a computer, IT-type job? Is this an administrative-type job, legal, whatever? So yeah, almost all the machine learning I do is text-based in some way.
Monica Manney: Very, very cool. Thank you for sharing that. So another question for you: how does the Inclusively algorithm differ from other job boards? What aspects of the job postings, resumes, et cetera, does your platform need to address that other platforms don't?
Matthew Thomas: Well, a lot of it has to do with what information you extract, and then how you use that information. One of our differentiators is this: a lot of job matching AI offers resume-to-resume matching, which from a pure machine learning standpoint is kind of easier, because you're matching like to like. In other words, let's say you're hiring for a particular position. You may have a resume that you hold up as kind of an ideal: "I want resumes that look like this resume, because people with this resume have been successful in our company in the past." We don't do that at all. Even though, again, from a machine learning standpoint it's probably easier to do it that way, it is, in our view, basically impossible to avoid perpetuating biases by doing it this way, even if you're doing it inadvertently. These algorithms are deep learning algorithms, and deep learning can sometimes find stuff or make connections that you don't necessarily want it to make. So what can end up happening if you take that approach is that you end up with a workforce that looks like the workforce you already have, which is probably not your goal if what you're trying to do is, for example, be more inclusive of people with disabilities. So we don't do that at all. The other thing that differentiates us is that we're very careful about how we extract and use soft skills. Hard skills are a little bit easier.
If an employer has a job for a software engineer and you're mainly programming in Java, then you extract that skill and look for resumes that have that skill. So that's a little more straightforward. But a lot of jobs will post soft skills, which we may or may not want to extract. If the job is a customer service-related job, we may extract that and then look for resumes where someone has had some sort of customer service experience. That could be a lot of different things; it could be in hospitality, or whatever. And we may use that to match. But when jobs say things like "excellent interpersonal and communication skills," we won't extract that or look for it, because, number one, it's too vague, and number two, those things can sometimes scare off, or discourage, I should say, people with some types of disabilities from applying, even if it's not really that crucial for the job. A lot of jobs will just include this kind of boilerplate language. So that's one example; there are a lot of others. Where I look for that is in the data labeling process. In order to create my machine learning models, I don't have ready-made data; I have to create my own. So I go through jobs, and I'll label different skills and what type of skill each one is. That's where I'm most conscientious about this. With some skills I have to think about it, and sometimes I'll consult with other people: is this something that we want to extract and use, or is this something that is just going to have a negative effect? So that's another big differentiator. The other thing, and this is, I think, the most crucial point: most job matching AI is used to make HR people's lives easier.
And so, in the words of data science, it gives the data scientists who create this AI more of an incentive to care about false positives than about false negatives. What I mean by that is, they care very much about not passing along candidates that are not suited to the job; they don't care as much about filtering out candidates that may also have been suited for the job. If you're posting a job, and this job matching AI presents the HR people with 10 pretty good candidates, then they might say, "Okay, well, that's job done for us. The algorithm works really well; it didn't give us anyone who wasn't really suited for this job, so we're good." They might not care so much that there were another five who also would have been well suited for the job. That's what I mean: they care more about not presenting someone who's just not a fit than the other way around. We don't take that approach. Our approach is to start by assuming that all the candidates could potentially be a match, and then lift some up, depending on whether they have a hard skill or experience in the area, whatever it is, to push them up to the top. So if anything, we lean a little bit more in the other direction: we would rather not have false negatives, and maybe a few false positives may get through. Obviously, we try to tighten up the algorithm as we go. But that's a fundamentally different approach.
Monica Manney: Absolutely. Thank you so much. So one of the things that you said was the word "fit." And I think that when we talk about fit in the work that you do, versus the work that companies who do not use your work do, fit can be something different, right? So I want to ask, what is the difference between applying as a person who is able-bodied and applying as a person who is not able-bodied?
In the application process? I know you've walked us through some examples, but what does that look like? What are some of those differences?
Matthew Thomas: Well, one of the main differences, I think, is in how you communicate accommodations. This is something that some of my bosses spend more time on than I do. But a lot of people with disabilities don't feel comfortable self-disclosing, and as a result they don't feel comfortable requesting accommodations that they may need in order to be successful in their jobs. Or, if it's something that they absolutely need, a screen reader, for example (there are lots of technologies out there), they may have to request it, but depending on the company, they may be reluctant to do that, and therefore more reluctant to apply for the job in the first place. What we do is make that process up front. When someone's onboarded onto our site, they can choose which success enablers they're going to need, both for work and for the interview process, because they can be different. They can do it right up front, or they can add it to their profile later; that can be changed at any time. It makes it a lot easier for both themselves and the employer. When someone connects to a job or applies for a job, the employer can see what the success enablers are for this particular candidate. And if there's one they're simply not able to offer because of the nature of the job, they can communicate that at that time. But we obviously encourage employers to offer them; they really should be offering any success enabler they can. So that's fundamentally the difference: it streamlines that and makes it a lot easier as part of the platform. And you don't have to self-disclose a disability, by the way. Of course, you can, and sometimes people might share that on our community boards and things like that.
But it's not necessary to disclose a disability to request success enablers. So that's a fundamentally different part of the process. If you go to just a regular job board, on almost all of them you won't be able to do that; you would have to apply for the job first. And then, assuming you don't get screened out, maybe then you talk to someone, and maybe then you could bring up the issue of accommodations. So it's much harder for those candidates to do it that way.
Monica Manney: That makes a lot of sense to me. And I think, following this same pathway that we're going down, a question that our listeners might have is, does it disqualify you to disclose disabilities?
Matthew Thomas: Well, it certainly shouldn't, and it's generally illegal to even do that anyway. I don't know that most companies would. I think just about any company, if you asked them, would say, "No, no, no, of course that doesn't disqualify anybody for a job." But I think where it becomes more of a problem is not so much that kind of overt discrimination, but more problems like I described, where people are reluctant to even request the accommodations in the first place. For whatever reason, maybe they're worried about self-disclosure. There can be a lot of worries there. They don't know what the company's commitment to this is in the first place, whether they're doing just the legal minimum or whether they're actually more committed to being more inclusive. So I think that's where the more pernicious effects can happen, just kind of the unknown, whereas we try to build it into the platform itself. So there is no unknown; it's kind of its reason for being.
Monica Manney: That's awesome. Thank you so much. Let's get back into the data science aspect of it. Thank you for answering all those questions.
What are the technical challenges for building and maintaining this platform?
Matthew Thomas: Well, from a data science perspective, it's hard. As I said, I have to label my own data. That's actually one of the most challenging aspects, both in terms of, as I talked about before, avoiding biases, and just having enough data to have a reliable machine learning model. The models that I use, except for my text classification, are all deep learning models, and in particular they use transfer learning. I take models that are trained on a very large data set and then fine-tune them to my purposes, which is very common in NLP, and in image-based data science as well. So having enough data to do that is always a challenge. There's also model drift. That's usually when your data becomes a little bit stale. What happens is that the data that I use to train the models is based on jobs that have already been posted on the site. That works great for a while, but as we get more employers who have more different types of jobs, I have to include those. So it's not one of those things where I can just create the model, put it out there and go, "Okay, job done." Every once in a while, I have to label more data and then fine-tune the model again. And that's never going to stop; that's going to be a constant, ongoing process. Another technical challenge, and this is common in text applications, is model size. Because I use a transfer learning approach, these models are quite large. Even the smaller ones are 500 megabytes plus, and they can get a lot bigger than that. That's a challenge from a computational aspect: deploying these models in production can be computationally expensive, and financially expensive also, when you're using platforms like AWS.
So I try to minimize that. I try to use the smallest models that I can and still be effective. Especially with text-based models, you get diminishing returns: a model that's 10 times the size of another one is not 10 times better. The difference in accuracy might be 1%, so it's not necessarily worth all that extra expense. I also try to use the same model for different tasks. That's multitask learning, something that's becoming more popular in natural language processing for basically the same reason. You might use the same model for named entity recognition, which is basically token classification (if any of the listeners have done any work in the space, you're classifying words or phrases, basically, instead of documents), as you do for part-of-speech tagging or something else. The library I use for that, if any of your listeners are curious, is spaCy. spaCy is a great Python library. It's very good at this multitask learning, and it does pipelines where you can deploy multiple models at once, and that avoids some of that overhead. But yeah, model size is a problem.
Monica Manney: Thank you so much. You talked about having to talk to your colleagues and do some introspective work on the ethics of the language that you find in job postings. Talk to me about any advice that you have, or just your response to this: what is the data scientist's responsibility to ethics?
Matthew Thomas: Well, I think you have to think very hard about the impact if you're designing an algorithm that's going to go to production and that people are going to use. In our case, employers are using the algorithm in the sense that they're the ones looking at the recommendations, but then the candidates are themselves being recommended. So your algorithm has a real effect on people.
And especially for something like this, where you're talking about finding jobs, you have to think really hard about how your algorithm works. That's number one. And I think you have to think really hard about how competing algorithms might be falling short in the space. As a data scientist, obviously you want to think about things like model accuracy: what approach is most likely to yield the most accurate model, what data and what tools should I use. But you want to resist the temptation to look at it only that way, because what's perhaps easiest to do from a data science point of view may or may not be the best thing to do from an ethical point of view. So you may have to be willing to make it a bit harder on yourself in order to do better in terms of things like avoiding perpetuating biases, which is probably the biggest area, ethics-wise, that people who design AI, especially in this space, have to think about. So for a data scientist, you just have to be really honest with yourself. You have to do some research into the space: what have people written about it, what resources are out there, that kind of thing, and also how are most of the algorithms working? And then think about how you can do better. The specific application of that is going to differ depending on your use case. But from a data scientist's point of view, I think that's the main thing: thinking about the impact that your algorithms are going to have on real people, and taking that into account in your model design and in the technical decisions that you make day to day.
Monica Manney: I love that answer. My last question is, what has the response to this platform been?
Matthew Thomas: I think most of the employers we have dealt with have been pretty enthusiastic about using the platform.
Generally, I think the clients that we already have, in particular, have realized that they need to do better in this space. They want to diversify their workforce, particularly when it comes to people with disabilities, so they're enthusiastic about the technology in the first place. From candidates, of course, we've had a lot of positive feedback. In particular, they like the success enablers I talked a lot about before. That's an area where we get a lot of positive feedback, because there's not a lot out there that helps people with that. We're still pretty young. It's still a small company, and so obviously we're in the process of expanding and building out our platform more and more. So I think, as we continue to add features, we're going to become better and better known, I guess. But we've certainly helped quite a few people find roles. And we do find that the clients that engage on our platform the most, check in the most, and look at the most recommendations definitely see the best results. They get the most hires; they get the most interested candidates. So it's headed in the direction that we want it to be headed.
Monica Manney: Now, we've asked the questions that we want to ask. What else do you want to add? What's something that we didn't ask you, but you want to talk about?
Matthew Thomas: I guess I'll say one thing for our data science listeners, who are either data science practitioners or data science students. I'll talk a little bit about my experience with NLP, if that's alright. It's not necessarily particular to a job board. But I would say, if you're thinking that NLP is a space that you want to get into, I'll say a couple of things about it. One is, there's a lot of demand for that skill within data science.
If you can do NLP well, I think you're going to find that a lot of companies want to hire you. I will say it's more programming-focused than it is mathematics-focused. The algorithms that underlie a lot of what I do, the token classification or text classification, are all mathematics-based, but a lot of the best practices are already established. So what you end up having to do is understand the fundamentals of language very well (parts of speech, sentence modifiers, all of these things) and then learn how to do the programming in order to create these models. So it's a little more on the software engineering end of the data science spectrum than on the math and statistics end. Realistically, I don't think that's true in academia, but I'm doing it in commercial applications, and in commercial applications I think that's definitely true. So if you're an aspiring data scientist and you are more interested in the math side of it, I would say NLP is probably not for you. But if you're more of a programmer, it's definitely something to look into.
Monica Manney: All right. That's all we have for you. That was great. Thank you so much.
Matthew Thomas: No problem. Thanks a lot.
Monica Manney: Thanks for checking out this week's episode. We'll be back on October 1 with a conversation between Raf Alvarado and Allison Bigelow. As always, be sure to rate us, review and subscribe wherever you listen to podcasts. We'll see you next time.
<outro music>
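
The matching approach Matthew describes in the conversation (start by assuming every candidate could be a match, then lift some up for hard skills or relevant experience, rather than screening people out) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Inclusively's actual algorithm: the `Candidate` class, the `rank_candidates` function, the boost weights, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Candidate:
    """Hypothetical candidate record; the fields are illustrative only."""
    name: str
    skills: Set[str] = field(default_factory=set)
    experience_areas: Set[str] = field(default_factory=set)


def rank_candidates(
    candidates: List[Candidate],
    job_skills: Set[str],
    job_areas: Set[str],
    skill_boost: float = 2.0,  # assumed weight, not from the episode
    area_boost: float = 1.0,   # assumed weight, not from the episode
) -> List[Tuple[float, Candidate]]:
    """Score and sort every candidate without filtering anyone out."""
    scored = []
    for candidate in candidates:
        score = 1.0  # everyone starts as a potential match
        score += skill_boost * len(candidate.skills & job_skills)
        score += area_boost * len(candidate.experience_areas & job_areas)
        scored.append((score, candidate))
    # Highest score first; candidates with equal scores keep input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up sample data for a job that asks for Java and SQL.
candidates = [
    Candidate("A", {"java"}, {"software"}),
    Candidate("B", set(), {"hospitality"}),
    Candidate("C", {"java", "sql"}, set()),
]
ranked = rank_candidates(candidates, job_skills={"java", "sql"}, job_areas={"software"})
for score, candidate in ranked:
    print(candidate.name, score)
```

In this sketch, candidate B matches none of the extracted skills yet still appears in the results, just lower down. That is the trade-off described in the episode: accept a few possible false positives in exchange for fewer false negatives.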
