Colorado DOI weighs in on how to prevent algorithmic discrimination in life insurance
Life insurance is one of the oldest and most carefully regulated industries in America, and it is one of many in the midst of upheaval due to “big data” and advances in machine learning. These changes have sparked concerns about algorithmic discrimination, and rightly so, considering the industry’s sordid history.
In 1896, a statistician at a life insurance company wrote a highly influential 330-page manifesto of “statistics, eugenic theory, observation, and speculation” arguing that African Americans were uninsurable. Before the Civil Rights Act was passed, it was common practice for insurers to charge Black Americans higher premiums. Even today, the protected classes that insurers are actually legally prohibited from considering vary widely across states.
This context, combined with rising awareness of the potential for algorithms to exhibit or amplify bias against groups of people, has made regulators wary of the industry trend of replacing traditional processes with algorithms built on new sources of data.
And this is where our story begins. Recently, the Colorado Division of Insurance released a draft of its Algorithm and Predictive Model Governance Regulation. These rules, currently specific to life insurance, are meant to enforce Colorado’s Senate Bill SB21-169, signed into law in 2021, which “protects Colorado consumers from insurance practices that result in unfair discrimination” based on a number of protected class characteristics. These rules were developed after a significant period of stakeholder engagement, as well as a contract with Cathy O’Neil’s algorithmic auditing company, ORCAA. Commentators have rightly referred to the release as a “watershed” in AI regulation for its specific guidelines for corporations and developers.
In this post, we’ll explain which practices this regulation targets and what’s at stake, share our impressions of what these rules get right, and lay out our lingering questions about enforcement.
Setting the scene: algorithmic underwriting
The basic principle behind life insurance is that customers with a higher estimated mortality risk must pay more for their policy.
Traditionally, applicants who wish to purchase a policy must undergo a comprehensive health and lifestyle examination, including lab work. A trained underwriter for the company will review this information and either reject the application for being too high-risk, or accept the applicant into one of a number of risk classes.
The premiums charged for the policy ultimately depend on statistically derived estimates of mortality risk, which are based on risk class as well as demographic characteristics that have robust statistical relationships with mortality, such as age and gender (women pay less because they tend to live longer). These models are developed and maintained by actuaries, specialized risk management professionals who must pass a series of exams and join professional associations in order to be credentialed.
Since actuaries must follow strict industry standards when they estimate mortality risk, it is the underwriting process that lends itself most naturally to being transformed by machine learning. Could the human-in-the-loop who manually analyzes health and behavior be replaced by an algorithm that can predict mortality based on historical application information? Or, instead of forcing every applicant to get blood drawn before becoming a potential customer, what if that algorithm could be based on readily available data, like prescription history, credit history, or even social media behavior?
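To make this concrete, here is a minimal, purely illustrative sketch of what such an “accelerated underwriting” model might look like: a classifier trained to reproduce the risk classes that human underwriters assigned to past applicants, using readily available features in place of lab work. The features, values, and model choice here are all hypothetical, chosen only for illustration; no actual insurer’s system is being described.

```python
# Illustrative sketch only: a toy "accelerated underwriting" model that learns to
# reproduce human underwriters' risk-class decisions from readily available data.
# Feature names, values, and the model choice are hypothetical.
from sklearn.linear_model import LogisticRegression

# Historical applications: [prescription_count, credit_score, moving_violations]
X_train = [
    [0, 780, 0],
    [2, 640, 1],
    [6, 590, 3],
    [1, 720, 0],
    [4, 610, 2],
    [0, 800, 1],
]
# Risk class a human underwriter assigned: 0 = preferred, 1 = standard, 2 = substandard
y_train = [0, 1, 2, 0, 2, 0]

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# A new applicant is slotted into a risk class without lab work or a manual review.
new_applicant = [[3, 650, 1]]
print("Predicted risk class:", model.predict(new_applicant)[0])
```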
The Colorado guidelines and the law they are meant to implement are concerned with models which use “external consumer data and information sources,” or ECDIS. (Notably, the guidelines do not seem to apply to the use of machine learning for underwriting with “traditional” data.) According to a 2019 Society of Actuaries survey of 28 companies which use ECDIS to underwrite some or all of their policies, inputs to these algorithms include motor vehicle records, prescription histories, FCRA-approved data, “consumer data,” and “vendor model risk factors.”
While it is generally considered acceptable for insurance rates to favor or disfavor a certain group if the difference in risk is actuarially justified, that is, based on “real” differences in mortality risk, regulators worry that ECDIS has no direct logical relationship to mortality and thus may not be a valid basis for differences in outcomes. The role of structural racism in credit history, policing, and other potential sources of data is of particular concern.
Interestingly, while 23 of the companies that responded to the SOA survey reported that actuaries were among the resources “involved in” developing their accelerated underwriting algorithm itself, nearly half of the companies also listed internal data scientists, and several listed external vendors and consultants, who are notably not bound by actuarial professional standards. These draft guidelines are thus poised to fill a potential gap in the oversight of risk model development.
The good: an emphasis on documentation
The guidelines are meant to provide specific requirements for an insurer’s governance and risk management framework to ensure their ECDIS systems do not result in discrimination or unfairness. In particular, insurers are required to report their guiding principles and values, put their methods for model development and evaluation in writing, and have plans in place for correcting for any issues that arise.
The guidelines also require thorough documentation of any in-house or third-party ECDIS system. Companies must track and describe their data and models, thoroughly delineating their properties, limitations, and purpose. The guidelines also require tracking of the decisions made, who made them, and why.
The documentation “must be easily accessible to appropriate insurer personnel and available upon request by the Division [of Insurance]”, and insurers must submit regular reports to the Division of Insurance detailing how they are complying with the regulations. Mandating documentation and reporting is one of the ways in which this draft regulation echoes key elements of the Blueprint for an AI Bill of Rights issued by the White House in October 2022.
In essence, the guidelines require insurers to be as explicit and transparent as possible about how they develop—and plan to manage—their ECDIS systems. This is clearly a response to a call from sociotechnical researchers that has been echoing in the academic community for years.
It may seem odd that only one of the guidelines refers directly to discrimination or unfairness (we’ll get to that later). But many of the most notorious cases of algorithmic “discrimination” in modern parlance can be traced to particular errors or decisions–such as choosing a target variable that is a poor proxy for the problem actually being solved. The hope, then, is that by forcing insurers to document the data processing and modeling choices that could lead to bias, poor judgment can be identified before it is acted upon–and if this fails, a paper trail exists to hold the right person responsible. For instance, since the racial gap in access to credit is relatively well known, an insurer should feel compelled to explain the safeguards or tests in place to validate any credit data they choose to use, which regulators can then evaluate.
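The regulation does not prescribe a particular format for this documentation, but to give a flavor of what it asks for, here is one hypothetical way an insurer might structure a record for an ECDIS-based model; every field name and value below is invented for illustration.

```python
# Hypothetical sketch of the kind of record the draft regulation's documentation
# requirements point toward; the regulation does not prescribe any particular format.
from dataclasses import dataclass, field, asdict
from typing import List
import json


@dataclass
class DecisionLogEntry:
    date: str
    decision: str      # what was decided, e.g. "include credit utilization as a feature"
    made_by: str       # who made the call
    rationale: str     # why, including known limitations or risks


@dataclass
class EcdisModelRecord:
    model_name: str
    purpose: str
    data_sources: List[str]       # ECDIS inputs, e.g. prescription history
    known_limitations: List[str]
    fairness_tests: List[str]     # methodology, assumptions, and results
    decision_log: List[DecisionLogEntry] = field(default_factory=list)


record = EcdisModelRecord(
    model_name="accelerated_underwriting_v2",
    purpose="Assign a risk class without lab work for smaller policies",
    data_sources=["prescription history", "motor vehicle records", "credit attributes"],
    known_limitations=["credit attributes correlate with race via structural factors"],
    fairness_tests=["adverse-decision rate comparison across estimated race groups"],
    decision_log=[DecisionLogEntry(
        date="2023-04-01",
        decision="exclude ZIP-code-level features",
        made_by="model risk committee",
        rationale="proxy risk for race/ethnicity outweighs predictive gain",
    )],
)

# "Available upon request by the Division": serialize the record for reporting.
print(json.dumps(asdict(record), indent=2))
```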
The open questions: enforcement and defining discrimination
Notably absent from the guidelines are instructions on how to test models or data for discrimination. Even the definition of discrimination in the legislation that the guidelines are meant to implement is some serious word salad:
‘Unfairly discriminate’ and ‘unfair discrimination’ include the use of one or more external consumer data and information sources, as well as algorithms or predictive models using external consumer data and information sources, that have a correlation to race, color, national or ethnic origin, religion, sex, sexual orientation, disability, gender identity, or gender expression, and that use results in a disproportionately negative outcome for such classification or classifications, which negative outcome exceeds the reasonable correlation to the underlying insurance practice, including losses and costs for underwriting.
Interestingly, though, the guideline asking insurers to report how they audit their own algorithms specifically asks them to describe “testing conducted to detect unfair discrimination in insurance practices … including the methodology, assumptions, results, and steps taken to address disproportionate negative outcomes,” which are separately defined as
a result or effect that has been found to have a detrimental impact on a group as defined by race, color, national or ethnic origin, religion, sex, sexual orientation, disability, gender identity, or gender expression, and that impact is material even after accounting for factors that define similarly situated consumers.
Scholars of fairness know that a number of metrics with very different properties could all meet this description, potentially resulting in very different analyses—this is clearly acknowledged by the guideline, which asks insurers to articulate their methodology and assumptions.
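To see why this matters, here is a toy illustration, with entirely made-up applicants, of two plausible readings of “disproportionately negative outcome” that can disagree on the same decisions: a comparison of raw adverse-decision rates, and a comparison restricted to “similarly situated” applicants.

```python
# Illustrative only: two plausible readings of "disproportionately negative outcome,"
# computed on made-up data, can point in different directions. Group labels, decisions,
# and risk scores are all synthetic.

# Each applicant: (group, denied?, risk score used to define "similarly situated")
applicants = [
    ("A", False, 1), ("A", False, 1), ("A", True, 3), ("A", False, 2),
    ("B", True, 3), ("B", False, 2), ("B", True, 3), ("B", False, 1),
]

def denial_rate(group, risk=None):
    denials = [d for g, d, r in applicants if g == group and (risk is None or r == risk)]
    return sum(denials) / len(denials)

# Reading 1: compare raw denial rates (a demographic-parity-style test).
print("Raw denial rates:     A =", denial_rate("A"), " B =", denial_rate("B"))

# Reading 2: compare denial rates among "similarly situated" applicants,
# here crudely defined as applicants with the same risk score.
print("Denial rate, risk=3:  A =", denial_rate("A", 3), " B =", denial_rate("B", 3))
```

In this made-up example, the first reading flags a disparity between the groups while the second does not; which reading an insurer adopts can determine whether a model looks fair at all.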
Of course, left to their own devices, insurers are likely to interpret the directive in whatever way favors their bottom line when designing their own efforts to measure or mitigate “unfairness.” Whether or not this regulation has teeth, then, partly comes down to how the DOI will enforce the guidelines. Will they just enforce the requirement for insurers to "describe" their mitigation and evaluation methods? Or will they be willing to come out and say that the particular methods or metrics the insurer used are insufficient, if they are?
A clue to the answer of this question may lie in the legislation:
Nothing in this section … Requires an insurer to collect from an applicant or policyholder the race, color, national or ethnic origin, religion, sex, sexual orientation, disability, gender identity, or gender expression of an individual.
This gives insurers a bit of an out: in theory, they could cite their lack of access to protected class information to explain the absence of empirical fairness testing in their protocols. However, by estimating protected class attributes or turning to external data brokers, a common practice in other industries, insurers could still attempt fairness testing of their algorithms and data. How hard will the DOI expect them to try?
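One approach used in fair-lending analysis, for instance, is Bayesian Improved Surname Geocoding (BISG), which infers race and ethnicity probabilities from surname and geography. Below is a heavily simplified sketch of the idea, with invented probability tables standing in for the Census data a real implementation would use.

```python
# Heavily simplified sketch in the spirit of BISG (Bayesian Improved Surname Geocoding),
# a proxy method used in consumer-finance fairness analysis. The probability tables are
# invented placeholders; a real implementation would use Census surname and geography data.

# P(race | surname) and P(race | ZIP) -- both made up for illustration.
P_SURNAME = {"garcia": {"hispanic": 0.9, "white": 0.05, "black": 0.05}}
P_ZIP = {"80202": {"hispanic": 0.3, "white": 0.5, "black": 0.2}}

def proxy_probabilities(surname, zip_code):
    """Naively combine the two conditionals (assuming independence) and renormalize."""
    s = P_SURNAME[surname.lower()]
    z = P_ZIP[zip_code]
    joint = {race: s[race] * z[race] for race in s}
    total = sum(joint.values())
    return {race: p / total for race, p in joint.items()}

print(proxy_probabilities("Garcia", "80202"))
```

Proxy estimates like these carry real error and raise fairness questions of their own, but they are what would make any empirical testing possible when protected attributes are never collected.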
Even though it remains unclear exactly how, if at all, regulators will use these guidelines to hold companies liable for discrimination, this ambiguity could itself incentivize responsible modeling decisions on the part of insurers. Companies in this already strictly regulated industry tend to err on the side of caution when it comes to the letter of the law, and it is in their best interest to show regulators that they are taking these rules seriously by reporting reasonable, thorough testing practices. When the line between “acceptable” and “unacceptable” practices is a little blurry, insurers may try harder to stay well on the right side of it.
Ultimately, the power of this regulation to penalize companies for building discriminatory models may only be as effective as the agency tasked with enforcing it. But hopefully, the requirements about transparency, documentation, and governance will at the very least have a preventative effect on the kinds of harm that stem from irresponsible or careless business practices.
About the Author: Lizzie Kumar is a CNTR affiliate and PhD candidate in Computer Science at Brown.