Announcer: Welcome to the Ogletree Deakins podcast, where we provide listeners with brief discussions about important workplace legal issues. Our podcasts are for informational purposes only and should not be construed as legal advice. You can subscribe through your favorite podcast service. Please consider rating this podcast so we can get your feedback and improve our programs. Please enjoy the podcast.
Scott Kelly: Hello everyone and welcome to Defensible Decisions. I’m Scott Kelly, an employment lawyer at Ogletree Deakins. For the last 25 years, I’ve helped employers design and defend fair, effective workforce systems across hiring, pay, promotion, retention, and now artificial intelligence systems, using legal insights, analytics, and audit-ready documentation.
Each episode is going to blend employment law developments with rigorous workforce analytics, so your decisions are defensible, compliant, and effective. We translate evolving enforcement priorities and regulations into practical steps you can apply before the lawsuit or the investigation arrives. You’re going to hear from attorneys, other experts such as data scientists, labor economists, social scientists, and hopefully some governmental officials. Together we’ll cover recruiting and selection, pay equity, systemic discrimination, DEI compliance post-Students for Fair Admissions, artificial intelligence bias and audits, and federal contracting and reporting. Subscribe to Defensible Decisions. This podcast is for informational purposes only and does not constitute legal advice. Listening does not create an attorney-client relationship. The opinions expressed are those of the speakers and do not necessarily reflect the views of their employers or clients.
I’m excited because we’re having Lauren Hicks, a shareholder in our Indianapolis office, join us again today to pick up where we left off on the last episode dealing with AI bias audits. We went through a lot of information last time, and today we’re going to really focus in on questions around documentation of these audits, walk through a hypothetical or two to demonstrate what it might look like to conduct one, and talk about some practical takeaways. That should do us.
So, Lauren, as I said, is a shareholder in Indianapolis, but she is really leading our team of lawyers and data analytics professionals here at Ogletree to assist our clients with privileged AI bias audits, and she also has a lot of experience with AI governance and defensibility. So, Lauren, thank you again for joining us today.
Lauren Hicks: Thanks so much for having me back.
Scott Kelly: All right, so as promised, we said we were going to talk about documentation. After practicing in this space of employment law for the last 25 years, I know how important documentation can be to help prove legitimate, nondiscriminatory reasons and to mitigate employers’ legal risk, but it can feel like a chore. Let’s just be honest. What should be captured by employers that are using AI and looking at conducting any type of AI bias audit?
Lauren Hicks: I agree that documentation can often feel like a chore. And when we are working with technology systems, it can even feel overwhelming and unwieldy because of the sheer volume of data. But preserving contemporaneous records showing job-relatedness and business necessity can be useful for many defensive purposes. And where adverse impact does appear, an employer might need this type of documentation to be able to follow up, evaluate less discriminatory alternatives, or do further investigation into the mechanics of that technology.
So, we want to think about recording the who, what, and why of human reviews, including reasons for following or overriding AI and technology outputs. We want to maintain clear, plain-language descriptions of how each tool works, what factors it considers, and how the employer itself relates those factors to job success. We also want to make sure we’re capturing the full life cycle: testing plans, results, remediation decisions, retests where some type of problem or concern has been identified, things of that nature. When regulators ask whether a system is lawful and fair, or if a plaintiff challenges outcomes, that record truly becomes the backbone of the employer’s defense.
Scott Kelly: So, sounds like if you didn’t write it down, it didn’t happen.
Lauren Hicks: You got it. That’s just right.
Scott Kelly: Let’s try running through a hypothetical or a couple of quick scenarios here. One common practice that I’m seeing is organizations using an AI résumé screener that assigns scores and only allows candidates over a cutoff to make it to recruiter review. What’s the risk in that approach, and is there a potential mitigation or fix that employers could consider?
Lauren Hicks: I like this hypothetical because, for what it’s worth, the example you came up with happens to be the most common scenario in which we see mass use of AI: assigning a score or stack-ranking candidates, things like that. Now, your hypothetical also noted that it has a cutoff. So, if there is a cutoff score, meaning someone has to minimally meet some specific percentage or grade, that can definitely be a red flag.
So, we want to test whether that threshold systematically excludes protected groups at higher rates; a minimal sketch of that test follows below. If it does, you would evaluate whether the features driving those scores are truly job related, and whether there could be a less discriminatory alternative, such as removing or reweighting features, using banded cutoffs, or incorporating additional structured human review, that achieves the same business goal. And to be clear, a cutoff score does create a significant red flag, but that is not to suggest that systems without one are risk-free. Many do not use a cutoff, and frankly it is becoming more common for these systems to use just a generic scoring system without one. Those, to be clear, can be just as problematic or concerning. So, one should not step back and think, oh, we don’t have a cutoff, therefore there is no risk. It’s just risk that we would look at a little bit differently. If there’s a cutoff score, we have clear and defined next-step obligations under established guidance, the Uniform Guidelines on Employee Selection Procedures, which set a series of requirements. If there’s not a cutoff score, we have the same potential bias risks without that secondary step of evaluating under those guidelines. So, you might consider remediation efforts, Scott, like moving the tool earlier in the process, or having it serve as assistive, meaning it provides advice or guidance rather than acting as a gate with a cutoff score.
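Here is a minimal sketch of the threshold test Lauren describes, the four-fifths rule of thumb from the Uniform Guidelines, assuming a hypothetical pandas DataFrame with illustrative `group` and `selected` columns:

```python
import pandas as pd

# Hypothetical screening data: one row per candidate, with a
# demographic group label and whether the AI cutoff passed them.
df = pd.DataFrame({
    "group":    ["A"] * 100 + ["B"] * 100,
    "selected": [1] * 60 + [0] * 40 + [1] * 40 + [0] * 60,
})

# Selection rate per group: candidates passed / candidates screened.
rates = df.groupby("group")["selected"].mean()

# Adverse impact ratio: each group's rate divided by the most
# favored group's rate. Under the four-fifths rule of thumb, a
# ratio below 0.80 flags potential adverse impact for follow-up.
impact_ratios = rates / rates.max()
print(impact_ratios)                        # A: 1.00, B: 0.67
print(impact_ratios[impact_ratios < 0.80])  # B is flagged
```

This is a screening heuristic only; a flagged ratio is the start of the follow-up analysis Lauren outlines, not a legal conclusion.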
Scott Kelly: Goodness, lots to think about. How about a different scenario? I ran across this the other day and, to be honest, I was surprised this was even a tool: a promotion model, one that would predict “leadership potential,” in quotes there, where it might be over-selecting internal employees from the certain physical locations where they’ve been assigned, or based on their prior work experience. Any issues there?
Lauren Hicks: That’s a really good question also, and I would second your thoughts on that, that in the past, this technology really popped up primarily in the space of recruiting, and sort of applicant flow and selections. And so that technology, while still developing and changing at times, is much more established in the market. You are right, Scott, that we are seeing now vendors moving more into looking at performance type measures and promotional type measures to sort of capture potential and other workforce analytics. So, stay tuned to see what issues might arise here.
But the scenario you described suggests that if there’s over-selection from certain campuses and prior employers, there are likely proxy features at work, features serving as potential correlates of protected traits. Those can be difficult to identify, frankly. They’re not always as logical as we might think. Proxies can really hide in strange places. But you might interrogate the features to identify those proxies; one way to do that is sketched below. And if they’re identified, you can re-engineer or remove them, and then validate the model against structured performance criteria that are in fact tied to job success.
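One hedged way to interrogate features for proxies, assuming hypothetical categorical data with the illustrative column names `campus`, `prior_employer`, and `group`, is to measure how strongly each feature is associated with protected group membership, for example with Cramér’s V:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Association strength between two categorical variables (0 to 1)."""
    table = pd.crosstab(x, y)
    chi2, _, _, _ = chi2_contingency(table)
    n = table.to_numpy().sum()
    return float(np.sqrt(chi2 / (n * (min(table.shape) - 1))))

# Hypothetical model inputs plus a group label used only for auditing.
df = pd.DataFrame({
    "campus":         ["X"] * 40 + ["Y"] * 60,
    "prior_employer": ["P", "Q"] * 50,
    "group":          ["A"] * 50 + ["B"] * 50,
})

# Features strongly associated with the group label are proxy
# candidates that deserve scrutiny, re-engineering, or removal.
for feature in ["campus", "prior_employer"]:
    print(feature, round(cramers_v(df[feature], df["group"]), 2))
```

A high score does not prove a feature is an unlawful proxy, only that it warrants the interrogation and re-engineering Lauren describes.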
Or you may be able to test the model’s stability across teams and locations, and add a formal justification step where managers have to provide independent, structured reasons to adopt or override model recommendations. And we probably sound like a broken record here, Scott, but monitor, monitor, monitor. What you don’t want to do is keep using these systems after they produce some type of biased results and let that go uncorrected. So, I think particularly with these newer performance or leadership-potential systems, it is even more critical to stay on top of short-term and routine governance and bias monitoring.
Scott Kelly: All right. One last quick hypothetical or scenario that I thought about, Lauren. I know a lot of times employers are looking at the talent they have and wanting to move high-potential talent as part of succession planning, or even just to make sure that we have the right people in the right roles and that we’re advancing the objectives of the organization. I hear that a lot from clients, and they want to make sure there aren’t discrimination risks in how they’re doing that. And I recently learned of . . . I think there’s an AI model that can be used to flag folks as a retention risk, these kinds of flight risks. And I guess the corollary is that if someone is a flight risk, you might not give them as high-profile assignments, or I suppose you could try other kinds of intervention methodologies. But are there any risks in using what I’ll call a retention-risk AI model?
Lauren Hicks: Yeah, I would bucket these a little bit with that second scenario you gave, in that they are much newer and not nearly as common in usage as what we see for applicant tracking, which is very common. To your point, we don’t see it as often because it’s new and still-developing software, but employers are very incentivized right now. Boards and executives are quite interested in adopting as much AI as possible. And so we are seeing this creep into newer areas like identifying alleged tenure or flight-risk candidates.
And so this is sort of a classic feedback loop. You verify whether the model is accurate across groups, and you may constrain the tool so that it triggers supportive interventions, like career conversations, rather than having opportunities withheld. Or you can closely monitor downstream outcomes to ensure the tool isn’t creating a self-fulfilling prophecy that harms protected groups. So, again, monitor, monitor, monitor is the key. A sketch of that per-group accuracy check follows below.
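A minimal sketch of checking model accuracy across groups, assuming a hypothetical audit frame with illustrative `group`, `predicted`, and `actual` columns:

```python
import pandas as pd

# Hypothetical audit data: the model's flight-risk flag, the
# observed outcome, and a demographic group label.
df = pd.DataFrame({
    "group":     ["A"] * 4 + ["B"] * 4,
    "predicted": [1, 0, 1, 0, 1, 1, 0, 0],
    "actual":    [1, 0, 1, 0, 0, 1, 1, 0],
})

# Accuracy of the retention-risk flag within each group. Large
# gaps suggest the model is less reliable for some groups and
# should not gate opportunities for their members.
per_group_accuracy = (
    df.assign(correct=df["predicted"] == df["actual"])
      .groupby("group")["correct"]
      .mean()
)
print(per_group_accuracy)  # A: 1.00, B: 0.50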
And that may include holding conversations with business executives to create awareness of the risk. And if the tool continues in use, certainly implement a more frequent auditing mechanism, perhaps monthly or quarterly depending on the volume, to really monitor these risks. It’s important to understand that there’s just less testing, less history, and less known predictive outcome with these newer types of AI usage.
Scott Kelly: All right. So much to think about here. Can you give any tips on the mistakes you’re seeing most often, things that might be an impediment to moving forward in a careful way?
Lauren Hicks: Yeah, that’s a really good question. One that I hear frequently is overreliance on human-in-the-loop as a defense. Meaning, well, we’re not relying exclusively on the AI; a human just takes it into account. But the AI could narrow the pool or influence the human, and in that case, to be clear, we still have significant legal risk. You can still have a biased system, and all your standard employment law risks under Title VII, the ADEA, and the like remain in place. It’s not a shield that there’s a human in the loop. So yes, it’s important to have human oversight, but a sign-off alone is not a shield.
Scott, sometimes we also see cutoffs created without validation. One of your hypos, the first one, addressed that question about whether there could be a cutoff score. And though cutoffs are becoming less frequent, as I noted, with vendors shifting to selling their technology as assistive rather than determinative, cutoffs without validation are very risky. If you’re using a hard threshold, it can certainly magnify impact risks unless you test it, document business necessity, and evaluate less discriminatory alternatives.
Another one that I hear pretty frequently, Scott, is relying on unexamined vendor claims, or being uncritical of them. Again, technology is not something to be feared. It can be very helpful, and it does help with efficiencies and many business needs. But vendor representations that something is bias-free or validated are not something you want to rely on from a legal defensibility standpoint. You need to do your own diligence. Employers should be running these tools against their own real data to ensure there’s no bias, not relying on generic representations from the vendor. Vendors are looking at things from a broader standpoint, across the whole community they serve, and you as the employer need to be worried about your population.
Another one that comes up very frequently, and we talked about this in the prior podcast you and I recorded, is a failure to have a complete inventory. You want to make sure that you have an accurate and complete inventory of every type of technology and tool that you’re using: where it’s used, what version you’re using, how it’s being implemented, and across what populations. This is a common gap, right? Employers often don’t have technology centralized; maybe something is purchased by recruiting in one instance, by campus recruiting in another, and maybe HR manages some other functionalities. So oftentimes we don’t find a centralized repository, and that can be a significant gap. It comes down to understanding and identifying what technology is in use, and not falling into the common misperception that if something isn’t generative AI, it’s not a legal risk.
Another common example would be equating speed and efficiency with improvement. These tools often do provide faster screening, but faster can be less accurate, and it can create legal risks with bias. We talked on our prior podcast about running bias analysis and effectiveness analysis in tandem. In other words, you’re paying for a tool, so let’s make sure it’s doing what it purports to do, but also that we’re not paying for something cheaper but biased. And to that point, Scott, we just want to evaluate them together. We want to make sure the tools work for job-related reasons, and we want to make sure that accuracy and impact are stable over time. Monitor, monitor, monitor.
Scott Kelly: You sound like a broken record. I’ll give you a hard time for that. But as far as the whole adage of we’ll just test once and move on, that’s probably not going to age very well. Correct?
Lauren Hicks: You’re right. Just thinking from a pure regulatory compliance standpoint alone: if you’re within certain jurisdictions, you may have ongoing regulatory monitoring requirements. But aside from that, bias can appear at any time. Models drift, and environments change because your pool of candidates, or whatever the tool is impacting, is continuously shifting. So continuous monitoring is going to be part of the bargain when you automate employment decisions; a sketch of a simple recurring check appears below.
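A minimal sketch of the kind of recurring check that continuous monitoring implies, reusing the four-fifths threshold from the earlier sketch; `check_batch` and the monthly data are hypothetical:

```python
import pandas as pd

THRESHOLD = 0.80  # four-fifths rule of thumb

def check_batch(batch: pd.DataFrame) -> list:
    """Return the groups whose adverse impact ratio falls below THRESHOLD."""
    rates = batch.groupby("group")["selected"].mean()
    ratios = rates / rates.max()
    return sorted(ratios[ratios < THRESHOLD].index)

# In practice this would run on a monthly or quarterly cadence
# against the employer's real decision data, with flagged results
# routed into a documented remediation playbook.
monthly_batch = pd.DataFrame({
    "group":    ["A"] * 50 + ["B"] * 50,
    "selected": [1] * 30 + [0] * 20 + [1] * 20 + [0] * 30,
})
flagged = check_batch(monthly_batch)
if flagged:
    print(f"Adverse impact flagged this period for groups: {flagged}")
```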
Scott Kelly: Lots to think about, as I keep saying. Level with me, I suppose: if our listeners today have heard all of this and are wondering what’s a short list of things they can do, maybe in the next quarter, or the first quarter of 2026, what should they be thinking of? What should they be doing? Again, a short list. Do you have one?
Lauren Hicks: I would say, first, stand up or refresh your AI governance council. That should include legal, HR, and IT at the core, and it may certainly include other stakeholders. Second, build or update that governance inventory we’ve talked about, so you have a comprehensive list of all your AI and algorithmic tools across the talent life cycle. Third, you want to consider privileged bias and effectiveness testing for your tools, especially anything that ranks, scores, screens, recommends promotions, or otherwise has a material impact on the employment process.
Fourth, you want to align the policies and the vendor contracts to ensure audit rights, data access, explainability documentation, and remediation obligations. And fifth, design a monitoring cadence with clear thresholds and a remediation playbook, and pair that with disciplined documentation practices. Taken together, these are steps that may help reduce regulatory exposure and position your company to benefit from AI responsibly.
Scott Kelly: Okay. So, I think the through line I’m hearing here is privilege, inventory, testing, monitoring and documentation, and contract or supply-chain governance to a certain extent. Is that covering all the bases we need to cover here?
Lauren Hicks: Yeah, I would agree, Scott, that’s a really good start. So, wrap those in transparency, meaningful human review, and strong data governance, and hopefully you’re working your way toward a defensible and adaptable framework. Adaptability will be critical as new laws and obligations are constantly arising. And so having a robust monitoring system already in place is frankly likely to be a significant business advantage, if not a necessity.
Scott Kelly: All right. So maybe a parting thought. What do you think? Artificial intelligence and HR, is this an opportunity or a risk?
Lauren Hicks: That’s a great question. My take is that it’s really both, and the balance ultimately depends on your governance. With privileged audits, really robust policies, continuous monitoring, and clear documentation, employers can capture the efficiency and consistency gains they want from these systems while reducing the risk of discriminatory outcomes and enforcement actions. That’s really the goal. But without robust monitoring and discipline, those same tools can create systemic inequities that are hard to detect and even harder to defend.
Scott Kelly: I think that’s a perfect place to land. Lauren, thanks for walking us through this practical, defensible approach to how to handle artificial intelligence in the talent life cycle.
Lauren Hicks: Thank you for having me, Scott.
Scott Kelly: Absolutely. And for our listeners, thanks for tuning in to Defensible Decisions. For more on employment law and workforce analytics, follow this podcast and share this episode. You can find us at Ogletree Deakins and wherever you subscribe to podcasts. This podcast is, of course, for informational purposes only and does not constitute legal advice.
Announcer: Thank you for joining us on the Ogletree Deakins podcast. You can subscribe to our podcast on Apple Podcasts or through your favorite podcast service. Please consider rating and reviewing so that we may continue to provide the content that covers your needs. And remember, the information in this podcast is for informational purposes only and is not to be construed as legal advice.