Today's Fair Lending Guidelines Support AI Underwriting. Let's Not Weaken Them.

Strong fair lending guidelines support more responsible lending and can encourage the development of fair and transparent AI in underwriting. A proposed rule change by HUD endangers critical legal tools for fighting lending and other forms of discrimination, and may also lead to the wrong kind of AI outcomes. A few weeks ago Zest AI submitted this comment letter explaining why, and urging HUD to support fairer and more inclusive lending.

________

Zest AI appreciates the opportunity to comment on the Department of Housing and Urban Development’s proposed rule titled Implementation of the Fair Housing Act’s Disparate Impact Standard. Zest agrees with HUD that algorithmic models, if used responsibly, can extend access to credit to otherwise underserved communities. Our own experience developing machine learning underwriting models bears this out. However, we are concerned that HUD’s proposal—in particular the defenses for allegations based on discriminatory effects caused by models—risks encouraging the use of opaque and flawed models in ways that would threaten consumers and unnecessarily perpetuate discrimination.

Zest AI is a financial services technology company that helps lenders develop machine learning underwriting models for a wide range of credit products, including auto finance and consumer and mortgage loans. Zest’s tools allow lenders to approve more creditworthy borrowers while maintaining the institutions’ risk profiles. Lenders use our software and modeling capabilities to increase loan approval rates, lower defaults, and make their lending fairer. Importantly, our tools allow lenders to explain, validate, interpret, and document the reasoning behind their credit decisions, all of which are critical to the responsible use of models.

We agree with HUD’s twin observations that models can “be an invaluable tool in extending access to credit and other services to otherwise underserved communities”; and that “disparate impact provides an important tool to root out factors that may cause these models to produce discriminatory outputs.”

In our view, however, HUD’s primary emphasis on “substitutes or close proxies” for protected classes does not adequately account for the discrimination risks raised by models, especially sophisticated machine learning models. The terms “substitutes” and “proxy” are undefined and likely to cause confusion. More importantly, it is inappropriate to focus solely on variables in isolation, because seemingly benign variables can combine in sophisticated models to generate significant and unnecessary disparities. Similarly, providing immunity for reliance on models developed by third parties—without assessing the transparency, validity, or fair lending testing of those models—will encourage reliance on opaque, flawed, and discriminatory third-party models.

There is no need to provide a defense specific only to models. Models should be assessed under the same framework applied to other policies: Does the model cause a disparate impact? If so, is the model justified? If so, is there a less discriminatory alternative? Lenders can and do undertake that traditional inquiry today, even when using sophisticated machine learning models. At Zest AI, we have developed a methodology that allows lenders to easily and quickly identify whether their models result in adverse disparities, and whether less discriminatory alternative models exist that would decrease those disparities while still serving the lenders’ business interests. This methodology does not impose material burdens on lenders; rather, it enables them to eliminate unnecessary disparities while making, in the words of the Supreme Court, "the practical business choices and profit-related decisions that sustain a vibrant and dynamic free-enterprise system."

Background on Models and Machine Learning

Machine learning models are the future of credit underwriting, offering benefits to lenders and consumers.

Using models and algorithms for credit scoring is not new. As early as 2004, the FTC reported that automated underwriting models for assessing applicants for loans had already become “particularly important in the mortgage market.” Even in 2003, about 75% of loans were made through automated underwriting, and 94% of lenders had implemented at least one automated underwriting system. Using models for credit scoring is ubiquitous in today’s market, especially for mortgage loans.

What is new is the advent and increasing prevalence of artificial intelligence models in the financial industry for assessing creditworthiness, marketing, and other key decisions. According to a 2017 survey, 79% of bankers agree that artificial intelligence “will revolutionize the way they gain information from and interact with customers.” Seventy-six percent believe that “in the next three years, the majority of organizations in the banking industry will deploy [artificial intelligence] interfaces as their primary point for interacting with customers.” In the words of one observer: “Algorithms rule the world.”

Machine learning is a particularly powerful type of artificial intelligence that discovers relationships between many variables in a dataset to make better predictions. These models can leverage very large amounts of data, meaning the models can consider and assess a broader and more diverse set of variables than standard statistical models traditionally used for credit underwriting. Because machine learning-powered credit models substantially outperform traditional credit models, companies are increasingly using them to make more accurate decisions.

For example, customers using Zest’s machine learning underwriting tools to predict creditworthiness have seen a 10% approval rate increase for credit card applications, 15% approval rate increase for auto loans, and a 51% approval rate increase for personal loans—each with no increase in defaults.

This innovation is good news for lenders and consumers and it should be encouraged. Machine learning increases access to credit, especially for low-income and minority borrowers with thin or no credit files. In particular, a test we ran with a major mortgage originator found that machine learning modeling techniques could responsibly expand access to mortgages for the thousands of American families that have traditionally been unnecessarily excluded from these markets.

Machine learning models can raise risks and must be developed and deployed responsibly.

At the same time, machine learning models can raise serious risks for institutions and consumers. Machine learning models can be opaque, and they can be biased. Because of that opacity, they are sometimes referred to as “black boxes.” Even the human who programmed the model may not be able to discern how variables were combined or considered, or how those combinations were weighted to yield the model’s predictions. Moreover, like any model, machine learning models are developed by humans who make various decisions during development and implementation, including what datasets should be used to train the model, what assumptions should be made when calibrating the model, what variables should be assessed by the model during implementation, and what cut scores, thresholds, and segmentations should be used. Each of these decisions, along with many others, offers an opportunity for bias or discrimination to affect the model.

For good reason, Congress, federal agencies, academics, and civil rights groups have expressed concerns that these new forms of models and the large amounts of new data they can process can be discriminatory, biased, and unfair, and can perpetuate inequalities. For example, algorithms trained on real-world data may reflect existing discriminatory patterns or biases, which can unnecessarily perpetuate inequalities and unconscious prejudices. In one Princeton study, an AI algorithm linked white-sounding names with “pleasant” and black-sounding names with “unpleasant.” African American-sounding names are also more likely to generate advertisements related to arrest records than names typically associated with white Americans. And criminal sentencing algorithms have been criticized for relying on data that contains racial bias, reflecting, for example, a higher incidence of minorities with criminal records in minority areas where police focus their efforts.

These very serious risks exist to the same extent in models used for underwriting credit. As the OCC has explained:

Bank management should be aware of the potential fair lending risk with the use of [artificial intelligence] or alternative data in their efforts to increase efficiencies and effectiveness of underwriting. It is important to understand and monitor underwriting and pricing models to identify potential disparate impact and other fair lending issues. New technology and systems for evaluating and determining creditworthiness, such as machine learning, may add complexity while limiting transparency. Bank management should be able to explain and defend underwriting and modeling decisions.

Without understanding why a model made a decision, bad outcomes can occur. For example, a used-car lender we work with had two seemingly benign signals in its model. One signal was that higher mileage cars tend to yield higher risk loans. Another was that borrowers from a particular state were slightly riskier than those from other states. Neither of these signals appeared to be an obvious proxy for a protected class under fair lending laws. However, our machine-learning tools noted that, taken together, these signals predicted a borrower to be African-American and more likely to be denied. Without visibility into how seemingly fair signals interact in a model to hide bias, lenders risk making decisions that tend to adversely affect minority borrowers.
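The interaction effect in the used-car example can be illustrated with a small sketch. The counts below are entirely hypothetical (they are not the lender's data), and the names CELL_COUNTS, build_records, and protected_share are ours. The point is only that each signal, viewed alone, can shift the share of protected-class borrowers slightly, while the two signals together shift it a great deal.

```python
# Hypothetical counts, invented for illustration only.
# Key: (high_mileage, in_state); value: (protected borrowers, other borrowers).
CELL_COUNTS = {
    (1, 1): (80, 20),
    (1, 0): (30, 70),
    (0, 1): (30, 70),
    (0, 0): (60, 40),
}


def build_records(cell_counts):
    """Expand the count table into one (high_mileage, in_state, protected) row per borrower."""
    records = []
    for (high_mileage, in_state), (n_protected, n_other) in cell_counts.items():
        records += [(high_mileage, in_state, 1)] * n_protected
        records += [(high_mileage, in_state, 0)] * n_other
    return records


def protected_share(records, predicate):
    """Share of protected-class borrowers among the rows matching the predicate."""
    selected = [row for row in records if predicate(row)]
    return sum(row[2] for row in selected) / len(selected)


records = build_records(CELL_COUNTS)

share_overall = protected_share(records, lambda row: True)
share_high_mileage = protected_share(records, lambda row: row[0] == 1)
share_in_state = protected_share(records, lambda row: row[1] == 1)
share_both = protected_share(records, lambda row: row[0] == 1 and row[1] == 1)

print(f"overall:              {share_overall:.2f}")
print(f"high mileage only:    {share_high_mileage:.2f}")
print(f"particular state:     {share_in_state:.2f}")
print(f"both signals at once: {share_both:.2f}")
```

In this toy table each signal alone moves the protected share only a few points from the overall share, while the combination moves it far more. A review that inspects variables one at a time would miss that pattern.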

Lenders put themselves, consumers, and the safety and soundness of our financial system at risk if they do not use transparent models and appropriately validate and monitor those models, including testing for disparate impact risk.

Zest AI Innovations

Zest has spent the last decade becoming the leader in machine learning models for credit. Our customers regularly see double-digit increases in approval rates while also reducing losses by double digits. At the same time, and just as importantly, all of our models are completely explainable. This means that, no matter the complexity of the model, Zest can reveal the contributions or influence of each input feature towards the model prediction.
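This letter does not describe how those per-feature contributions are computed. As a general illustration only, and not a description of Zest's software, one well-known way to attribute a prediction to its inputs is the Shapley value: average each feature's marginal effect over every order in which features could be switched from a baseline to the applicant's values. The toy model, its weights, and all names below are hypothetical.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley attributions by brute force over all feature orderings.

    Only practical for a handful of features; shown here to make the idea concrete.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start from the baseline applicant
        previous_score = model(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to the real value
            score = model(current)
            totals[name] += score - previous_score
            previous_score = score
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk_score(x):
    """Hypothetical model: two small main effects plus a larger interaction term."""
    return (0.10 * x["high_mileage"]
            + 0.05 * x["in_state"]
            + 0.20 * x["high_mileage"] * x["in_state"])


applicant = {"high_mileage": 1, "in_state": 1}
reference = {"high_mileage": 0, "in_state": 0}

contributions = shapley_values(toy_risk_score, applicant, reference)
for name, value in contributions.items():
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the applicant's score and the baseline score, so the interaction term is shared between the two features instead of being hidden.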

Moreover, Zest’s tools allow lenders to assess and mitigate adverse impacts of their models, such as different rates of selection across protected classes like race and national origin. Relying on the transparency tools built into Zest’s software, a lender can identify adverse impacts and easily modify a model to reduce those disparities without meaningfully affecting the model’s performance. This allows lenders to quickly identify the availability of less discriminatory alternative models that retain the power of the machine learning models. In this way, Zest’s tools decrease disparate impacts across protected groups and ensure that the use of machine learning-based underwriting models mitigates, rather than exacerbates, discrimination in lending. This process is fast and efficient; it allows lenders to pick and deploy better, less-discriminatory models without imposing meaningful burdens.

While these tools were developed to solve the problem of identifying and resolving disparate impact risks raised by sophisticated machine learning models, they can be used just as effectively and efficiently on the types of traditional statistical models widely used for credit underwriting.

Concerns Regarding HUD’s Proposal

HUD’s proposal would allow a defendant to defeat an allegation of discriminatory effect caused by a model, such as a “risk assessment algorithm,” in one of three ways. The first and third defenses put undue emphasis on identifying substitute or proxy variables in isolation. The second defense would immunize the use of models produced by undefined third parties, potentially encouraging the use of flawed models.

To defeat an allegation of discriminatory effect caused by a model, a defendant may:

  1. Provide the material factors that make up the inputs used in the challenged model and show that these factors do not rely in any material part on factors that are substitutes or close proxies for protected classes under the Fair Housing Act and that the model is predictive of credit risk or other similar valid objective;
  2. Show that the challenged model is produced, maintained, or distributed by a recognized third party that determines industry standards, the inputs and methods within the model are not determined by the defendant, and the defendant is using the model as intended by the third party; or
  3. Show that the model has been subjected to critical review and has been validated by an objective and unbiased neutral third party that has analyzed the challenged model and found that the model was empirically derived and is a demonstrably and statistically sound algorithm that accurately predicts risk or other valid objectives, and that none of the factors used in the algorithm rely in any material part on factors that are substitutes or close proxies for protected classes under the Fair Housing Act.

We agree with HUD that it is important for lenders to carefully consider the inputs used in their models. With limited exceptions, the use of a prohibited basis as a variable in a credit scoring model is intentionally discriminatory and violates the Fair Housing Act and ECOA. HUD’s proposal rightly indicates that it would be illegal to use “substitutes or close proxies” for protected classes. This implication makes sense: a variable such as “subscribes to Ebony magazine” is so intuitively related to a protected class that it should be treated no differently than the protected class itself.

Unfortunately, however, these defenses would introduce serious problems that threaten the stability of disparate impact as applied to models. For this reason, we urge HUD to abandon them.

First, the term “substitute or close proxy” is undefined and so is likely to inject confusion and uncertainty into lenders’ fair lending analyses. Our Ebony magazine example is an easy one, but many other variables will not be so straightforward. Thus, emphasizing this analysis above all else will leave institutions overly focused on, and second-guessing, whether variables that may be correlated with protected characteristics are permissible. This uncertainty will hinder innovation. Determining whether variables in isolation are proxies or close substitutes can be useful for assessing disparate treatment risk. But it should not be the sole focus of a disparate impact analysis of models, which should take into account the actual impacts of the model, not just its inputs.

Second, the defense would be available as long as the “model is predictive of credit risk or other similar valid objective,” without an analysis of the predictiveness of potentially problematic variables. That formulation could mean that a variable that drives disparate impact caused by a model would be immunized, even if that variable was not itself predictive. There is no statistical or legal basis for the use of variables that drive disparate impact and that do not contribute to model performance.

Third, these defenses inappropriately ignore the actual effects or impacts of models on applicants, which have always been the primary focus of disparate impact law. A defense based only on proxies or substitutes would be inconsistent with disparate impact case law, as well as agency regulatory materials (including other parts of HUD’s own proposal) confirming that the first step of a disparate impact analysis looks to adverse impacts.

It is not the case that the existence of proxies or substitutes for a protected class will determine the effects of a model on applicants. In fact, an emphasis on whether a model uses any factors that are proxies or substitutes for protected classes misunderstands how models, particularly machine learning models, operate. A model that contains a proxy may not have disproportionately adverse impacts on protected classes, for example if the proxy carries little weight, if its effects are offset by other variables, or if it works to the advantage of the protected class. Similarly, a model that does not contain proxies may well have adverse impacts on protected classes. Machine learning models, for example, generate countless combinations of variables (and combinations of combinations of variables) to generate predictions. How these variables interact is often unintuitive. As the example we provide above about high-mileage cars and residing in a particular state demonstrates, seemingly benign variables (meaning variables that are not substitutes or proxies in isolation) can, and often do, interact in ways likely to cause disparate adverse impacts on protected classes. Other stages in model development can also contribute to adverse impacts, including unrepresentative training data, unfavorable outcome definitions, and many other factors.

For this reason, identifying whether the model itself produces an adverse impact—for example, denial-disparities that disproportionately disadvantage members of protected classes—is essential. Whether a model includes proxies does not answer the question whether the model has a disparate impact on protected classes. Focusing on that end-result is the same analysis used in other contexts; there is no basis for treating allegations based on models differently. HUD’s proposal explains that despite this special exception for disparate impact as applied to models, it is not providing an exemption. Instead, a focus on these variables allows a defendant to demonstrate “lack of a robust causal link between the defendant’s use of the model and the alleged disparate impact.” But this is not how models work. As described above, an absence of proxies or substitutes does not mean an absence of disparate impact.
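To make the outcome-focused test concrete, here is a minimal sketch of one common summary statistic, the adverse impact ratio: the approval rate for a protected group divided by the approval rate for a comparison group. The decision counts are invented, the function names are ours, and this letter does not prescribe any particular statistic or threshold.

```python
def approval_rate(decisions):
    """Fraction of applicants approved; decisions is a list of 1 (approve) / 0 (deny)."""
    if not decisions:
        raise ValueError("no decisions supplied")
    return sum(decisions) / len(decisions)


def adverse_impact_ratio(protected_decisions, comparison_decisions):
    """Protected-group approval rate divided by comparison-group approval rate.

    A value of 1.0 means equal approval rates; values below 1.0 mean the
    protected group is approved less often.
    """
    comparison_rate = approval_rate(comparison_decisions)
    if comparison_rate == 0:
        raise ValueError("comparison group has no approvals; ratio is undefined")
    return approval_rate(protected_decisions) / comparison_rate


# Hypothetical model outputs: 30 of 100 protected-class applicants approved,
# 50 of 100 comparison-group applicants approved.
protected = [1] * 30 + [0] * 70
comparison = [1] * 50 + [0] * 50

air = adverse_impact_ratio(protected, comparison)
print(f"adverse impact ratio: {air:.2f}")
```

The calculation uses only the model's decisions and group membership. It never inspects the model's inputs, which is why it can surface a disparity even when no individual variable looks like a proxy.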

Fourth, focusing only on proxies or substitutes for protected classes eliminates any inquiry into whether less discriminatory alternative models exist. This new standard would eliminate what the Supreme Court in Texas Department of Housing & Community Affairs v. Inclusive Communities Project, Inc., 135 S. Ct. 2507 (2015), recognized is the defendant’s burden: “prov[ing] [a policy] is necessary to achieve a valid interest.” If a less discriminatory alternative exists—for example a model that would achieve the business interest with less adverse impact on protected classes—then the policy cannot be “necessary.”

Lenders routinely analyze the adverse impact of credit models by evaluating whether traditional credit models result in an adverse impact and, if so, assessing whether alternative predictive variables or other changes to the model are available that have less of a disparate effect and do not significantly degrade the predictive power of the model.
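The search for a less discriminatory alternative described above can be sketched as a simple filter over candidate models. Everything here is an assumption made for illustration: the candidate names, the performance and disparity figures, the use of AUC as the performance measure, and the tolerance for how much performance a lender is willing to give up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    auc: float  # predictive performance; higher is better
    air: float  # adverse impact ratio; closer to 1.0 means less disparity


def less_discriminatory_alternatives(baseline, candidates, max_auc_drop):
    """Candidates with less disparity than the baseline and nearly the same performance.

    Results are ordered from least to most disparity.
    """
    viable = [
        c for c in candidates
        if c.air > baseline.air and baseline.auc - c.auc <= max_auc_drop
    ]
    return sorted(viable, key=lambda c: c.air, reverse=True)


# Hypothetical figures for a baseline model and four alternatives.
baseline = Candidate("baseline", auc=0.780, air=0.70)
candidates = [
    Candidate("model_a", auc=0.778, air=0.78),
    Candidate("model_b", auc=0.775, air=0.85),
    Candidate("model_c", auc=0.740, air=0.92),  # fairer, but performance drops too far
    Candidate("model_d", auc=0.781, air=0.69),  # slightly more accurate, but more disparity
]

alternatives = less_discriminatory_alternatives(baseline, candidates, max_auc_drop=0.01)
for c in alternatives:
    print(f"{c.name}: AUC {c.auc:.3f}, AIR {c.air:.2f}")
```

With these made-up numbers, model_b and model_a both reduce the disparity while staying within the chosen performance tolerance, so either would be worth evaluating as an alternative to the baseline.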

Although this type of analysis can be complicated when using machine learning models, we have developed tools that are up to the task. Relying on the transparency tools built into Zest’s software, lenders can identify adverse impacts and easily modify a model to reduce those disparities without meaningfully affecting the model’s performance. This tool allows lenders to assess the availability of less-discriminatory alternative models that retain the power of the machine learning models.

This solution is a win-win: It is fast, efficient, and easy for lenders to use. It is also consistent with years of disparate impact law and HUD’s existing disparate impact rule.

Fifth, HUD’s defense for models “produced, maintained, or distributed by a recognized third party that determines industry standards,” is unclear and could immunize seriously problematic models. The proposal does not explain what would qualify as a “recognized third party that determines industry standards.” Unlike in some other contexts where there may be generally accepted professional standards for developing tests, and for which developers have professional obligations to provide evidence of validity, no such industry standards exist in credit underwriting. This defense will increase uncertainty for entities that use third-party models and for entities that develop these models. It will also put entities that do not qualify as ones that “determine industry standards”—however that term is understood—at a competitive disadvantage, without any clear basis for doing so. Providing immunity for reliance on third parties that may have incentives to sell scores rather than support responsible nondiscriminatory underwriting—and that may themselves be immunized from liability because of causation or other issues—is a recipe for encouraging dangerous and discriminatory lending and will increase costs and uncertainty.

For this reason, lenders should not rely on models developed by third parties absent transparency into development, validation, and fair lending testing of those models.

Conclusion

Zest has proven that developing transparent and fair machine learning models for credit underwriting is not only possible, but that it can increase access to credit to otherwise underserved communities. The responsible use of such models should be encouraged. At the same time, we agree with HUD that disparate impact provides an important tool for rooting out discriminatory models and encouraging lenders to use responsible and fair models. We urge HUD not to adopt its proposed defense for allegations related to models because it will increase uncertainty and risks encouraging the use of flawed and unnecessarily discriminatory models.

 

Image by Sebastian Wagner from Pixabay.