From A to Zest AI — a glossary of machine learning terms for credit underwriting know-how
Here are some words we toss around every day with clients and our fellow Zestys.
An adverse action is a negative action reported to an individual or business which generally pertains to the denial of credit, employment, insurance, or other benefits. An adverse action notice can be issued by a lender, business, or government based on specific information found in credit reports or public records.
Adversarial attacks are inputs to machine learning models intentionally designed to cause the model to return an incorrect value. Malicious individuals often use these to steal information or money from an AI-based system.
AI-automated credit underwriting
Using machine learning algorithms to provide an instant lending decision. The machine learning algorithm can automatically and accurately predict credit risk, allowing the lending decision to take seconds.
Artificial Intelligence (AI)
A multidisciplinary field that includes psychology, computer science, engineering, and philosophy, whose aim is to create intelligent machines and to understand the mechanisms of intelligence better. Machine learning is a sub-field of AI.
A subset of information technology involved in the processing of large amounts of data. Big data sets are often bursty, varied, and unstructured (such as voice, video, and free text entry). Big data technologies provide the storage, networking, and processing infrastructure necessary to apply machine learning to the massive amounts of data collected today to create predictive models.
Examining the behavior of a machine learning model using data on which it has not been trained. Blindspot analysis is a step in model validation and performance analysis.
A credit score is a number that lenders use to evaluate whether an individual will repay their debts. The higher the score, the lower the risk to the lender. The score is derived from a formula based on variables that can include payment history, number of accounts, and amounts owed. Your credit score may affect the interest rate you pay to a lender and can make the difference between a loan being approved or declined.
A document or repository that comprehensively describes the variables used to create a model, how the data was gathered, the data sources involved, and how combined values are computed. A data dictionary may, for example, contain an entry describing a variable that represents the number of delinquent payments on revolving lines of credit for an applicant as reported by a credit bureau.
A piece of information that can be measured.
A person employed to analyze and interpret complex digital data to assist a business in its decision-making. A data scientist is often a combination of a software engineer with the programming abilities to build software to scrape, combine, and manage data from various sources and a statistician with the knowledge to derive insights from the information.
Decision trees are a widely used type of machine learning model. They fit a sequence of decisions to a data set. Used alone, they tend to perform poorly. However, when used in combination with other trees or ensembled with different tree-based model types such as gradient boosted classifiers or random forests, they can achieve excellent performance.
Deep neural networks
Deep neural networks are a type of machine learning model that has proven strong at solving problems in computer vision, autonomous navigation, and credit and insurance underwriting.
Equal Credit Opportunity Act (ECOA)
The Equal Credit Opportunity Act of 1974 governs fair lending in the U.S. and was meant to address discriminatory loan practices. The law requires lenders to identify disparities in approval rate, pricing, and other terms, and recognize drivers of discrepancies so that variables that seem fair or “facially-neutral” can be assessed for their business impact and impact on bias.
An ensemble model combines two or more related but different analytical models, whose results are then synthesized into a single score to improve the accuracy of predictive analytics and data mining applications.
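As a minimal sketch of how results might be synthesized into a single score, the example below assumes the simplest possible combiner: a weighted average of each model's predicted default probability. The function name, weights, and sample scores are hypothetical, and real ensembles often use more elaborate blending.

```python
def ensemble_score(scores, weights=None):
    """Blend per-model default probabilities into one score (weighted mean)."""
    if weights is None:
        weights = [1.0] * len(scores)  # assumption: equal weights by default
    if len(weights) != len(scores):
        raise ValueError("need one weight per model score")
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total


# Hypothetical outputs of three different models for one applicant.
model_scores = [0.10, 0.20, 0.30]
blended = ensemble_score(model_scores)                      # plain average
weighted = ensemble_score(model_scores, weights=[2, 1, 1])  # trust model 1 more
```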
The ability to explain a model’s results, for instance, why a member was approved or denied a loan. This is especially important when using AI-automated underwriting, as multiple, interrelated factors contribute to the decision.
Exploratory data analysis
The process of evaluating data in advance of modeling to determine the character of the population overall and critical segments to the business (e.g., approved applicants vs. denied segments). Exploratory data analysis may look at correlations between variables and targets and compute descriptive statistics (mean, median, etc.) disaggregated by segment and over time. Shifts in distributions over time can indicate problems with specific variables that may make them less suitable for modeling. Thus, exploratory data analysis is a best practice initial step in any modeling project.
Fair Credit Reporting Act (FCRA)
Congress passed FCRA in 1970 to promote the accuracy, fairness, and privacy of the information held by consumer reporting agencies. The FCRA requires lenders to send adverse action notices that describe what the consumer could do to improve the likelihood of having their applications get approved.
A set of regulations that govern the extension of credit to an individual or a business such that it is proven to be free of bias or discrimination on the basis of race, gender, sex, or other protected categories. The primary legislation that governs fair lending and credit in the U.S. includes the Fair Housing Act and Consumer Credit Protection Act of 1968, the Fair Credit Reporting Act of 1970, the Equal Credit Opportunity Act of 1974, and the Fair and Accurate Credit Transactions Act of 2003.
FICO, or the Fair Isaac Corporation, was the first company to offer credit-risk scoring, now commonly called a FICO Score. The scores are computed using logistic regression, a predictive modeling approach. FICO scores replaced judgmental underwriting with an objective number that could more accurately predict outcomes. This increased certainty enabled the rapid expansion of credit across the planet.
The process of transforming raw data into variables used to create a clearer picture of an applicant’s financial situation during credit risk modeling. A classic example of feature engineering is the development of ratios such as debt to income (DTI). DTI is intended to capture how easy it would be for a borrower to repay a loan given their income. Another example of feature engineering is creating rates, such as the rate of credit utilization over time. By considering whether the rate of credit utilization is increasing or decreasing — and how quickly or slowly — a model can more accurately predict the likelihood of default on an additional credit facility.
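The two examples above, a ratio and a rate of change, can be sketched in a few lines. This is an illustration only: the function names, the handling of non-positive income, and the sample figures are assumptions, not a description of any production feature set.

```python
def debt_to_income(monthly_debt, monthly_income):
    """DTI ratio: monthly debt payments divided by monthly income."""
    if monthly_income <= 0:
        return None  # assumption: treat DTI as missing when income is not positive
    return monthly_debt / monthly_income


def utilization_trend(balances, limits):
    """Average month-over-month change in credit utilization (balance / limit)."""
    utilization = [bal / lim for bal, lim in zip(balances, limits)]
    changes = [later - earlier for earlier, later in zip(utilization, utilization[1:])]
    return sum(changes) / len(changes)


# Made-up applicant data: three monthly snapshots of one revolving account.
dti = debt_to_income(monthly_debt=1500, monthly_income=5000)
trend = utilization_trend(balances=[1000, 2000, 3000],
                          limits=[10000, 10000, 10000])
```

A positive `trend` means utilization is rising; the size of the value indicates how quickly.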
Generative adversarial networks
A widely used machine learning technique that produces results by using one network to generate outputs and another to judge the results. The algorithm repeats the judging process until it reaches a decent result.
Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.
Data reserved for testing a model after it was built to see how it will perform on data never seen before.
In machine learning, a hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters are derived via training. Different model training algorithms require different hyperparameters, and some simple algorithms require none.
An algorithm used to attribute the prediction of a deep neural network to its input features. Zest AI has patented an attribution method called Generalized Integrated Gradients that extends this explainability to combinations of neural networks and tree-based models.
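The sketch below illustrates standard integrated gradients, not the Generalized Integrated Gradients method, on a toy one-neuron model. Each feature's attribution is its distance from a baseline input multiplied by the average gradient along the straight path from the baseline to the input. The model, weights, baseline, and step count are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, weights, bias):
    """Toy one-neuron 'network': sigmoid of a weighted sum."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def gradient(x, weights, bias):
    """Analytic gradient of the toy model with respect to each input."""
    p = model(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Approximate the path integral of gradients with a midpoint Riemann sum."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point, weights, bias)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


weights, bias = [2.0, -1.0], 0.5        # hypothetical model parameters
x, baseline = [1.0, 3.0], [0.0, 0.0]    # hypothetical input and all-zero baseline
attributions = integrated_gradients(x, baseline, weights, bias)
```

A useful sanity check is that the attributions sum (approximately) to the difference between the model's output at the input and at the baseline.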
A variable used in training data for a machine-learning algorithm that happens to include information the model is attempting to predict or is not available at the time of prediction.
Linear regression analysis
A statistical method that models the relationship between a dependent variable and one or more independent variables as a straight line (or, with several inputs, a weighted sum).
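For the single-input case, the fitted line can be computed directly with ordinary least squares. The sketch below is a bare-bones illustration with made-up data; the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```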
Logistic regression is a type of regression analysis used to make predictions from a data set by explaining the relationship between one dependent binary variable (e.g., present or absent) and one or more nominal, ordinal, interval, or ratio-level independent variables.
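Once fitted, a logistic regression turns a weighted sum of the independent variables into a probability through the logistic (sigmoid) function. The sketch below shows only that prediction step; the coefficients, intercept, and feature meanings are invented for illustration and are not taken from any real scorecard.

```python
import math


def predict_probability(features, coefficients, intercept):
    """Probability of the positive class: sigmoid of the linear log-odds."""
    log_odds = intercept + sum(c * f for c, f in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients for [debt-to-income, years of credit history].
coefficients = [3.0, -0.5]
intercept = -2.0
p_default = predict_probability([0.4, 2.0], coefficients, intercept)
```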
A sub-field of AI that aims to create intelligent machines that learn from data. Many problems are becoming too complex to solve via traditional software engineering methods, and data is increasingly becoming essential to guide problem-solving. Machine learning enables computers to acquire human-like abilities by processing vast quantities of data and finding subtle, non-intuitive patterns in data. Machine learning differs from other approaches to AI because it learns from data instead of being programmed from scratch.
A program that has been trained to recognize patterns and make predictions based on data.
There are a few contexts for the use of the word bias in machine learning. Statistical bias is the amount by which an expected model prediction differs from the true value of the training data. The most common interpretation of bias is the bias-variance tradeoff, in which bias is a source of error in your model that causes it to over-generalize and underfit your data. In contrast, variance is sensitivity to noise in the data that causes your model to overfit. The layman’s view of bias is that ML models take on unconscious or unintentional discrimination created by their programmers, such as instances of ML models misclassifying humans as gorillas.
A model that is more or less likely to produce a given outcome based on specific applicant attributes. For example, a model that approves fewer African-Americans is biased against African-Americans.
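One simple way to quantify this, offered here as an assumption rather than a prescribed test, is to compare approval rates between groups. The sketch below divides a protected group's approval rate by a control group's rate; the decisions are made-up data, and a ratio well below 1 would be a prompt for further review, not proof of bias on its own.

```python
def approval_rate(decisions):
    """Share of applications approved; decisions is a list of booleans."""
    return sum(decisions) / len(decisions)


def approval_rate_ratio(protected_decisions, control_decisions):
    """Protected-group approval rate divided by control-group approval rate."""
    return approval_rate(protected_decisions) / approval_rate(control_decisions)


# Made-up model decisions for two groups of applicants.
protected = [True, False, False, True, False]   # 2 of 5 approved
control = [True, True, False, True, True]       # 4 of 5 approved
ratio = approval_rate_ratio(protected, control)
```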
Model Risk Management (MRM)
The comprehensive process of managing model risk as outlined in the OCC and Federal Reserve Supervisory Guidance on Model Risk Management, 2011-12. Model risk management involves documenting the modeling approach and the theoretical basis for and limitations of the modeling methods used. Robust MRM also requires performing and recording model validation and verification steps and ensuring the safe operation of models in production.
The process of determining whether the model will behave as expected on new data. For example, back-testing validates model predictions against known outcomes by holding out a sample of applications for loans that later defaulted and evaluating whether a model would have accurately predicted defaults. Model sensitivity testing allows validators to understand how a model would behave under various conditions. More advanced approaches enable efficient computation of model outcomes across a range of variables without requiring exhaustive search. One best practice: Analyze the swap-in population, the ones who will be newly approved by a new model, to see how different they are from those approved in the past. If they look the same, you know there’s less risk in using the new model.
A model may perform well on training data but fail to generalize on test data held out for evaluation. When a model is highly tuned to the training data but fails on data from the real world, the model is ‘overfit.’ Ensuring a model will generalize and operate reliably over time is a critical consideration in any modeling project.
This term can mean a few things, including but not limited to:
Input distribution monitoring: Comparing recent model input data with similar training data to determine, for example, whether variable distributions of incoming credit applications are significantly different from model training data. Systems for monitoring model inputs should send alerts when they spot anomalies or shifts that exceed safe bounds.
Missing input data monitoring: Comprehensive model monitoring should include monitoring for missing data. A complete model monitoring program should monitor and provide alerts when the rate of missing data, and its impact on model outputs and downstream business outcomes, exceed desired thresholds.
Output distribution monitoring: Monitoring systems should compute statistics that establish the degree to which the score distribution has shifted from the scores generated by the model in prior periods, such as those contained in training and validation data sets.
Execution failure monitoring: Error and warning alerts generated during model execution can indicate flaws in model code that may affect model outputs.
Latency monitoring: Ensures that model execution code and infrastructure meet the latency requirements of applications and workflows that rely on model outputs.
Economic performance monitoring: Lets analysts configure alerts on key performance indicators such as default rate, approval rate, and volumes. Substantial changes in these indicators can signal operational issues with model execution that need to be investigated and understood to manage risk.
Reason code stability: Reason codes explain the key drivers of a model’s score. You should monitor their distribution in case material changes indicate a significant shift in the character of the applicant population.
Fair lending analysis: It is essential to monitor loan approvals and default rates across protected classes to ensure fair treatment. Historically, this monitoring has been done far after the fact. Because of their potential for incidental bias, ML models should be monitored in real-time.
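The input and output distribution checks in the list above need some statistic that measures how far a recent distribution has moved from the training distribution. One widely used choice, assumed here for illustration, is the population stability index (PSI). The bin edges, the sample scores, and the alert threshold in this sketch are all hypothetical.

```python
import math


def bin_shares(values, edges):
    """Fraction of values in each bin [edges[i], edges[i+1]); last bin is closed.

    Values outside the outer edges are not counted in any bin.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    return [c / len(values) for c in counts]


def population_stability_index(expected, actual, edges, floor=1e-6):
    """PSI = sum over bins of (a - e) * ln(a / e); floor avoids log(0)."""
    psi = 0.0
    for e, a in zip(bin_shares(expected, edges), bin_shares(actual, edges)):
        e, a = max(e, floor), max(a, floor)
        psi += (a - e) * math.log(a / e)
    return psi


edges = [0.0, 0.5, 1.0]                                    # hypothetical score bins
training_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
recent_scores = [0.1, 0.2, 0.55, 0.6, 0.65, 0.7, 0.8, 0.9]

psi_same = population_stability_index(training_scores, training_scores, edges)
psi_shift = population_stability_index(training_scores, recent_scores, edges)

ALERT_THRESHOLD = 0.25   # assumption: a rule-of-thumb cutoff, tune per portfolio
should_alert = psi_shift > ALERT_THRESHOLD
```

The same function can be applied to an individual input variable or to the model's output scores; only the bin edges change.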
Population drift occurs when the underlying population scored by a machine learning model diverges from the population used to train the model. Credit analysts using ML have to use a given population to generate a predictive model. Sometimes, however, the character of that population changes, for example, when a bank expands into a new state, launches a new marketing program, or creates a digital offering. The character of the populations drawn from these new channels may differ from the population the model was trained to predict. Population drift can diminish a model’s predictive accuracy, making it necessary to monitor model inputs more frequently to identify and mitigate population drift.
A proxy model is a mathematically or statistically defined function that replicates the simulation model output for selected input parameters. Proxy models are widely applied in different areas of science for numeric modeling approximation.
Python is an interpreted, object-oriented, high-level programming language popular with developers and data scientists because of its clear syntax and readability.
The process of estimating what would have happened if a model had approved a rejected applicant. In model development, outcomes data are only available for approved loan applicants. This means new models are blind to rejected applicants. Thus, if a model determines a new set of applicants should be approved, it is impossible to know whether those approvals were a good idea because outcomes data is not available. Some data providers (credit bureaus) provide sources that engineers can use to develop outcome ‘proxies’ to evaluate the soundness of approval decisions on previously rejected applicants.
The act of apportioning uncertainty in the output of a mathematical model to the sources of uncertainty in its inputs. Typical implementations of sensitivity analysis include exploring all combinations of inputs and permuting these inputs one by one to understand the influence of each variable (or a combination thereof) on model scores.
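A minimal sketch of the one-variable-at-a-time approach follows. Note that it nudges each input by a fixed amount rather than permuting it across records, which is a simplification. The scoring function, feature names, and step sizes are hypothetical.

```python
def one_at_a_time_sensitivity(score_fn, applicant, deltas):
    """Change in score when each input is perturbed alone, others held fixed."""
    base = score_fn(applicant)
    effects = {}
    for name, delta in deltas.items():
        perturbed = dict(applicant)   # copy so the original record is untouched
        perturbed[name] += delta
        effects[name] = score_fn(perturbed) - base
    return effects


def toy_score(a):
    # Hypothetical scoring function, used only for illustration.
    return 0.5 * a["dti"] + 0.01 * a["inquiries"]


applicant = {"dti": 0.30, "inquiries": 2}
effects = one_at_a_time_sensitivity(
    toy_score, applicant, {"dti": 0.10, "inquiries": 1}
)
```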
A swap-set analysis captures the change in consumer distribution across a score range in a predictive model. In a credit underwriting context, swap-sets help a modeler visualize the selected segments that get moved from approved to rejected (swap-outs) and from declined to approved (swap-ins). You compare the swap sets with the groups that would have been denied and approved in both existing and challenger models. The objective is to replace bad borrowers with good ones.
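In code, the swap sets fall out of simple set differences between the applicants each model would approve. The sketch below assumes each applicant has a unique identifier; the function name and IDs are made up.

```python
def swap_sets(incumbent_approved, challenger_approved):
    """Split applicants by how the challenger's decision differs from the incumbent's."""
    incumbent = set(incumbent_approved)
    challenger = set(challenger_approved)
    return {
        "swap_ins": challenger - incumbent,     # newly approved by the challenger
        "swap_outs": incumbent - challenger,    # approved before, now declined
        "approved_by_both": incumbent & challenger,
    }


# Hypothetical applicant IDs approved by each model.
result = swap_sets(
    incumbent_approved=["a1", "a2", "a3"],
    challenger_approved=["a2", "a3", "a4"],
)
```

Comparing the later performance (or estimated performance) of the swap-ins against the swap-outs indicates whether the challenger is in fact trading bad borrowers for good ones.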
Test set and training set
The training set is the original dataset used to “teach” a model to make predictions or classifications. The test set is a separate dataset used to measure how well the trained model performs at making those predictions or classifications. The test set must be different from the training set. Otherwise, you’d get a uselessly excellent score. Training sets and test sets are at the crux of machine learning. Without a test set, there is no way to know whether the model overfits or underfits the training data. Either scenario indicates that modelers did not correctly tune the algorithm to learn from the training data.
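A minimal sketch of splitting one dataset into non-overlapping training and test sets is shown below. The 80/20 proportion, the fixed random seed, and the function name are assumptions; in credit modeling a time-based split is also common.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows, then carve off a test set; returns (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))   # stand-in for ten loan records
train, test = split_rows(rows, test_fraction=0.2, seed=42)
```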
The process of evaluating each variable for stability and suitability for modeling.