From A to Zest AI — a glossary of machine learning terms for credit underwriting know-how

Zest AI

April 20, 2021

This glossary provides definitions of key terms used in machine learning for credit underwriting. These terms come up often in discussions of advanced analytics, especially credit risk assessment and decision-making.

Updated: 6/16/2025

_____________________________

Adverse action

An adverse action is any of the following: a refusal to grant credit in the amount or on the terms the applicant requested; a termination of an account or an unfavorable change in its terms (e.g., lowering a credit limit or raising the APR); or a refusal to increase the amount of credit at the applicant’s request.

If a creditor takes an adverse action against an applicant, it must provide the written denial in the form of a “notice of adverse action.” The notice includes the specific reasons for the action taken (or a statement of the applicant’s right to receive them).

Adversarial attacks

Adversarial attacks are inputs to machine learning models intentionally designed to cause the model to return an incorrect value. Malicious individuals can use adversarial attacks to steal information or money from an AI-based system.

AI-automated credit underwriting

Lenders can use machine learning algorithms to offer borrowers immediate loan decisions, significantly cutting down on manual review for the majority of applications. A machine learning algorithm can more accurately predict credit risk and enhance automation, reducing a lending decision to mere seconds.

Artificial Intelligence (AI)

Artificial Intelligence (AI) is a field of computer science that focuses on creating machines and algorithms that can think and solve problems like humans. It involves creating systems that can analyze data, identify patterns, and make predictions or choices based on the large datasets provided to a machine learning model during training (see Dataset below).

Big data

“Big data” refers to a subset of information technology that is involved in the processing of large amounts of data. Big data sets are often sporadic, varied, and unstructured (such as voice, video, and free text entry). Big data technologies provide the storage, networking, and processing infrastructure necessary to apply machine learning to the massive amounts of data collected today to create predictive models.

Blindspot analysis

Examining the behavior of a machine learning model using data on which it has not been trained. Blindspot analysis is a step in model validation and performance analysis.

Credit score

A credit score is a number that lenders use to evaluate how likely an individual is to repay their debts. Credit scores can be considered custom and/or proprietary to a lender (based on that lender’s private scoring or underwriting model) or, as defined in the Fair Credit Reporting Act (FCRA), FCRA-reportable when they are based solely on information found in a consumer’s credit report maintained by a national consumer reporting agency. Typically, the higher the score, the lower the risk to the lender. Custom credit scoring models or national credit scores (such as FICO) are often derived from a proprietary formula based on variables in your credit history like payment history, the number of accounts, and amounts owed. Your credit score may affect the interest rate you pay to a lender and can make the difference between a loan being approved or declined.

Data dictionary

A data dictionary is a document or repository that comprehensively describes the variables used to create a model, how the data was gathered, the data sources involved, and how combined values are computed. A data dictionary may, for example, contain an entry describing a variable that represents the number of delinquent payments on revolving lines of credit for an applicant as reported by a credit bureau.

Data feature

A data feature is a measurable piece of information about a record, such as an applicant’s income or number of open accounts.

Data scientist

A data scientist is a person employed to analyze and interpret complex digital data to assist a business in its decision-making. A data scientist merges software engineering skills to gather and manage data from various sources with statistical expertise to derive insights from that information.

Dataset

A dataset is a group of individual records with defined data features. For example, a credit underwriting model could use a dataset of 1,000,000 application records to learn how to predict credit default based on the individual data points, or features, in each applicant record. Training datasets are used to “teach” a model to make these predictions or classifications. The test set is a separate dataset used to measure how well the trained model predicts or classifies records, and it is usually kept distinct from the training set. Training sets and test sets are central to machine learning. Without a test set, there is no way to know whether the model overfits or underfits the training data or does a good job of predicting its target outcome. Overfitting or underfitting indicates that modelers did not correctly tune the algorithm to learn from the training data.
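
As a rough illustration of keeping training and test sets distinct, the sketch below shuffles a small set of made-up application records and splits them 80/20. The record layout, the `train_test_split` helper, and the split fraction are all assumptions chosen for the example, not a prescribed method.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle records and split them into disjoint training and test sets."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(records)       # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical application records: (application_id, debt_to_income, defaulted)
records = [(i, round(0.10 + 0.005 * i, 3), i % 7 == 0) for i in range(100)]

train, test = train_test_split(records)
train_ids = {record[0] for record in train}
test_ids = {record[0] for record in test}
print(len(train), len(test), train_ids.isdisjoint(test_ids))
```

A model would be fit on `train` only and then evaluated on `test`, so that the evaluation reflects records the model has not seen.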

Decision trees

Decision trees are a widely used type of machine learning model. They fit a sequence of decisions to a data set. Used alone, they tend to perform poorly, but when combined with other trees in ensembles such as gradient-boosted classifiers or random forests, they can achieve excellent performance.

Deep neural networks

Deep neural networks are a powerful type of machine learning model known for their ability to solve complex problems in areas like computer vision (enabling computers to “see” and understand visual information), autonomous navigation, and credit and insurance underwriting. These models are built on artificial neural networks with multiple layers—often many layers—allowing them to analyze intricate data patterns and make sophisticated predictions.

Disparate impact

A facially neutral practice that has a disproportionately negative impact on a prohibited basis. Disparate impact liability does not require showing that a creditor had an intent to discriminate. It arises when a facially-neutral credit policy, credit characteristic, or group of interrelated characteristics disproportionately negatively impacts protected class borrowers.

Disparate treatment

Disparate treatment is a form of intentional discrimination in which a company or individual treats individuals differently based on protected characteristics like race, gender, or religion. Unlike disparate impact, which focuses on neutral policies with disproportionate effects, disparate treatment requires proof of intentional discrimination. In AI model risk assessment, we look at model design and variable choices as possible sources of disparate treatment risk.

Equal Credit Opportunity Act (ECOA)

The Equal Credit Opportunity Act (ECOA) is a federal law that prohibits discrimination in lending based on race, color, religion, national origin, sex, marital status, age, or receipt of public assistance.

Ensemble model

An ensemble model combines two or more related but different analytical models, and their results are then synthesized into a single model output or score to improve the accuracy of predictive analytics and data mining applications.

Explainability

The ability to explain a model’s results, such as why an applicant was approved or denied a loan. This is especially important when using AI-automated underwriting, as multiple interrelated factors contribute to the decision.

Exploratory data analysis

Exploratory data analysis is the process of evaluating data in advance of modeling to determine the character of the overall population and of segments critical to the business (e.g., approved vs. denied applicants). It may look at correlations between variables and targets and compute descriptive statistics (mean, median, etc.) disaggregated by segment and over time. Shifts in distributions over time can indicate problems with specific variables that may make them less suitable for modeling. Thus, exploratory data analysis is a best-practice first step in any modeling project.

Fair Credit Reporting Act (FCRA)

Congress passed FCRA in 1970 to promote the accuracy, fairness, and privacy of the information held by consumer reporting agencies (also referred to as credit bureaus). Among other requirements, FCRA requires the disclosure by creditors when they obtain or use credit information or scores from outside sources like consumer reporting agencies/bureaus, so that applicants may contact these sources and make corrections in the case of errors that may be negatively impacting their credit history or resulting credit score.

Fair lending

Fair lending refers to a set of laws and regulations that govern the extension of credit to an individual or a business such that it is proven to be free of bias or discrimination on the basis of race, gender, sex, or other protected categories. Federal legislation governing fair lending and credit in the U.S. includes the Fair Housing Act and the Equal Credit Opportunity Act of 1974.

Feature engineering

Feature engineering is the process of transforming raw data into variables that give a clearer picture of an applicant’s financial situation during credit risk modeling. A classic example is the development of ratios such as debt-to-income (DTI). DTI is intended to capture how easy it would be for a borrower to repay a loan given their income. Another example is creating rates, such as the rate of change in credit utilization over time. By considering whether credit utilization is increasing or decreasing, and how quickly, a model can more accurately predict the likelihood of default on an additional credit facility.
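
The two examples above can be sketched in a few lines. The function names, the monthly-payment definition of DTI, and the sample figures are illustrative assumptions. Real feature definitions vary by lender and data source.

```python
def debt_to_income(monthly_debt_payments, gross_monthly_income):
    """Return the debt-to-income ratio, or None when income is missing or zero."""
    if not gross_monthly_income:
        return None
    return monthly_debt_payments / gross_monthly_income


def utilization_trend(balances, limits):
    """Average month-over-month change in credit utilization (balance / limit)."""
    utilization = [b / l for b, l in zip(balances, limits) if l > 0]
    if len(utilization) < 2:
        return None
    changes = [later - earlier for earlier, later in zip(utilization, utilization[1:])]
    return sum(changes) / len(changes)


# Hypothetical applicant: $1,500 of monthly debt payments on $5,000 of income,
# with revolving balances rising over four months against a $5,000 limit.
dti = debt_to_income(1500, 5000)
trend = utilization_trend([1000, 1500, 2000, 2500], [5000, 5000, 5000, 5000])
print(dti, trend)
```

A positive `trend` means utilization is rising. A model can use that signal alongside the utilization level itself.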

Generative adversarial networks

A widely used machine learning technique that produces results by using one network to generate outputs and another to judge the results. The algorithm repeats the judging process until it reaches a satisfactory result.

Gradient boosting

Gradient boosting is a machine learning technique for regression and classification problems that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees.
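
To make the idea of an ensemble of weak learners concrete, here is a minimal sketch of gradient boosting for regression with squared-error loss, where each weak learner is a one-split decision tree (a "stump") fit to the current residuals. The function names, data, tree count, and learning rate are assumptions for illustration. Production libraries add many refinements that this toy version omits.

```python
def predict(base, stumps, x):
    """Sum the base value and the (already shrunken) contribution of each stump."""
    return base + sum(left if x < threshold else right
                      for threshold, left, right in stumps)


def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) for the best one-split tree."""
    best = None
    for threshold in sorted(set(xs))[1:]:  # skip the minimum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    return best[1:]


def fit_boosted_stumps(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared-error loss, the negative gradient is simply the residual.
        residuals = [y - predict(base, stumps, x) for x, y in zip(xs, ys)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, learning_rate * left_value, learning_rate * right_value))
    return base, stumps


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Hypothetical one-feature data with a nonlinear relationship.
xs = [i / 10 for i in range(30)]
ys = [(x - 1.5) ** 2 for x in xs]

base, stumps = fit_boosted_stumps(xs, ys)
baseline_mse = mean_squared_error(ys, [base] * len(ys))
boosted_mse = mean_squared_error(ys, [predict(base, stumps, x) for x in xs])
print(baseline_mse, boosted_mse)
```

No single stump fits the curve well, but the error on the training data falls as stumps accumulate.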

Holdout set

A holdout set is data reserved for testing a model after it has been built, to see how the model will perform on data it has never seen before.

Hyperparameter

In machine learning, a hyperparameter is a parameter whose value is set before the learning process begins. By contrast, the values of other parameters are derived via training. Different model training algorithms require different hyperparameters, and some simple algorithms require none.

Integrated gradient

Integrated gradients is an algorithm used to attribute the prediction of a deep neural network to its input features. Zest AI has patented an attribution method called Generalized Integrated Gradients that extends this explainability to a combination of neural networks and tree-based models.
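
The sketch below illustrates the standard integrated gradients calculation on a tiny differentiable function. It does not implement Zest AI’s Generalized Integrated Gradients. The model, its weights, the all-zero baseline, and the number of integration steps are assumptions made for the example. The attribution for each feature is the input-minus-baseline difference multiplied by the average gradient along the straight path from the baseline to the input.

```python
import math

WEIGHTS = [0.8, -1.2, 0.5]   # hypothetical coefficients of a toy model
BIAS = -0.1


def model(x):
    """Toy differentiable model: a sigmoid over a weighted sum of the inputs."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x):
    """Analytic gradient of the toy model with respect to each input."""
    p = model(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


def integrated_gradients(x, baseline, steps=200):
    """Approximate the path integral of the gradient with a midpoint Riemann sum."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


x = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)

# Completeness check: attributions should sum to model(x) - model(baseline).
print(attributions)
print(sum(attributions), model(x) - model(baseline))
```

The two numbers on the last line should agree closely, which is the property that makes the per-feature attributions add up to the change in the model’s output.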

Leaky variable

A leaky variable is a variable in the training data for a machine learning algorithm that includes information the model is attempting to predict, or information that will not be available at the time of prediction.

Linear regression analysis

Linear regression analysis is a statistical method that models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the data.
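
As a small illustration, the sketch below fits a straight line to made-up data using ordinary least squares with a single predictor. The `fit_line` helper and the data points are assumptions for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The fitted slope describes how much the dependent variable changes, on average, for each one-unit change in the independent variable.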

Logistic regression

Logistic regression is a type of regression analysis used to make predictions from a data set by explaining the relationship between one dependent binary variable (e.g., present or absent) and one or more nominal, ordinal, interval, or ratio-level independent variables.
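
A minimal sketch of the idea, assuming a single independent variable and a plain batch gradient descent fit: the model estimates the probability of the binary outcome as a sigmoid of a linear function of the input. The data, learning rate, and epoch count are hypothetical. Statistical packages use more robust fitting procedures.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data: debt-to-income ratio and whether the loan defaulted (1) or not (0).
dti = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
defaulted = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(dti, defaulted)
p_low = sigmoid(w * 0.10 + b)    # estimated default probability at a low DTI
p_high = sigmoid(w * 0.55 + b)   # estimated default probability at a high DTI
print(w, b, p_low, p_high)
```

In this toy data, the estimated probability of default rises with DTI, so `p_high` comes out above `p_low`.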

Machine Learning

Machine learning (ML) is a subfield of AI that aims to create intelligent machines that learn from data. Many problems are becoming too complex to solve via traditional software engineering methods, and data is increasingly becoming essential to guide problem-solving. Machine learning enables computers to acquire human-like abilities by processing vast quantities of data and finding subtle, non-intuitive patterns in data to predict an outcome.

Model

A model is essentially a program that has been trained to recognize patterns and make predictions based on data.

Model bias

There are a few contexts for the use of the word “bias” in machine learning. Statistical bias is the amount by which an expected model prediction differs from the true value of the training data. The most common interpretation of bias is the bias-variance tradeoff, in which bias is a source of error in your model that causes it to over-generalize and underfit your data. In contrast, variance is sensitivity to noise in the data that causes your model to overfit. The layman’s view of bias is that machine learning (ML) models take on unconscious or unintentional discrimination created by their programmers, such as instances of ML models misidentifying people because they were not trained on sufficiently diverse or representative training records of people’s faces.

Model drift

Model drift is a decline in a machine learning model’s accuracy or performance over time. This happens when the data the model encounters drifts away from the data it was trained on. As a result, the model’s predictions become less reliable.

Model refit

Model refit takes a deployed model and refits the same model by training it on the latest data.

Model Risk Management (MRM)

Model risk management is the comprehensive process of managing model risk as outlined in the OCC and Federal Reserve Supervisory Guidance on Model Risk Management (OCC Bulletin 2011-12, Federal Reserve SR 11-7) and the NIST AI Risk Management Framework (AI RMF). Model risk management involves documenting the modeling approach and the theoretical basis for and limitations of the modeling methods used. Robust MRM also requires performing and recording model validation and verification steps and ensuring the safe operation of models in production.

Model validation

Model validation is the process of determining whether the model will behave as expected on new data or over time. For example, back-testing validates model predictions against known outcomes by holding out a sample of applications for loans that later defaulted and evaluating whether a model would have accurately predicted defaults. Model sensitivity testing allows validators to understand how a model would behave under various conditions. More advanced approaches enable efficient computation of model outcomes across a range of variables without requiring exhaustive search. One best practice: Analyze the swap-in population, the ones who will be newly approved by a new model, to see how different they are from those approved in the past. If they look the same, you know there’s less risk in using the new model.

Overfitting

A model may perform well on training data but fail to generalize on test data held out for evaluation. When a model is highly tuned to the training data but fails on data from the real world, the model is ‘overfit.’ Ensuring a model will generalize and operate reliably over time is a critical consideration in any modeling project.

Performance monitoring

This term can mean a few things, including but not limited to:

Input distribution monitoring: Comparing recent model input data with similar training data to determine, for example, whether variable distributions of incoming credit applications are significantly different from model training data. Systems for monitoring model inputs should send alerts when they spot anomalies or shifts that exceed safe bounds.

Missing input data monitoring: Comprehensive model monitoring should include monitoring for missing data. A complete model monitoring program should monitor and provide alerts when the rate of missing data, and its impact on model outputs and downstream business outcomes, exceed desired thresholds.

Output distribution monitoring: Monitoring systems should compute statistics that establish the degree to which the score distribution has shifted from the scores generated by the model in prior periods, such as those contained in training and validation datasets.

Execution failure monitoring: Error and warning alerts generated during model execution can indicate flaws in model code that may affect model outputs.

Latency monitoring: Ensures that model execution code and infrastructure meet the latency requirements of applications and workflows that rely on model outputs.

Economic performance monitoring: A solution to enable analysts to configure alerts on key performance indicators such as default rate, approval rate, and volumes. Substantial changes in these indicators can signal operational issues with model execution that need to be investigated and understood to manage risk.

Reason code stability: Reason codes explain the key drivers of a model’s score. You should monitor their distribution in case material changes indicate a significant shift in the character of the applicant population.

Fair lending analysis: It is essential to monitor loan approvals and default rates across protected classes to ensure fair treatment. Historically, this monitoring has been done long after the fact. Machine learning models can be monitored periodically to address bias risks that may arise unexpectedly.
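
As one illustration of input or output distribution monitoring, the sketch below computes a population stability index (PSI), a statistic often used to quantify how far a recent distribution has shifted from a reference distribution. PSI is not named in the definitions above. It is used here only as an example, and the bin edges, sample scores, and alert threshold are all assumptions.

```python
import math


def population_stability_index(expected, actual, bin_edges, floor=1e-6):
    """PSI between two samples, using shared bin edges.

    Each bin contributes (actual_share - expected_share) * ln(actual_share / expected_share).
    Shares are floored to avoid taking the log of zero for empty bins.
    """
    def bin_shares(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            index = sum(v >= edge for edge in bin_edges)  # number of edges at or below v
            counts[index] += 1
        return [max(c / len(values), floor) for c in counts]

    expected_shares = bin_shares(expected)
    actual_shares = bin_shares(actual)
    return sum((a - e) * math.log(a / e)
               for e, a in zip(expected_shares, actual_shares))


# Hypothetical scores: the recent population scores 40 points lower than training.
training_scores = list(range(500, 800, 3))
recent_scores = [s - 40 for s in training_scores]
edges = [560, 620, 680, 740]

psi_same = population_stability_index(training_scores, training_scores, edges)
psi_shifted = population_stability_index(training_scores, recent_scores, edges)

ALERT_THRESHOLD = 0.1  # hypothetical; each institution sets its own bounds
print(psi_same, psi_shifted, psi_shifted > ALERT_THRESHOLD)
```

The same function can be applied to a single input variable or to model scores, with an alert raised when the statistic exceeds whatever bound the monitoring program defines.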

Population drift

Population drift occurs when the underlying population scored by a machine learning model diverges from the population used to train the model. Credit analysts have to use a given population to build a predictive model, but the character of that population sometimes changes, for example, when a bank expands into a new state, launches a new marketing program, or creates a digital offering. The applicants drawn from these new channels may differ from the population the model was trained on. Population drift can diminish a model’s predictive accuracy, so model inputs should be monitored frequently enough to identify and mitigate it.

Proxy models

A proxy model is a mathematically or statistically defined function that replicates the simulation model output for selected input parameters. Proxy models are widely applied in different areas of science for numeric modeling approximation.

Python

Python is an interpreted, object-oriented, high-level programming language popular with developers and data scientists because of its clear syntax and readability.

Sensitivity analysis

Sensitivity analysis is the act of apportioning uncertainty in the output of a mathematical model to the sources of uncertainty in its inputs. Typical implementations of sensitivity analysis include exploring all combinations of inputs and permuting these inputs one by one to understand the influence of each variable (or a combination thereof) on model scores.
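
A simple version of the one-by-one approach is sketched below: each input is changed in isolation while the others are held fixed, and the resulting change in score is recorded. The scoring function is a hypothetical stand-in for a trained model, and the feature names and perturbation sizes are assumptions for the example.

```python
def score(applicant):
    """Hypothetical scoring function standing in for a trained model."""
    return (0.5
            - 0.6 * applicant["dti"]
            + 0.002 * applicant["months_on_file"]
            - 0.1 * applicant["delinquencies"])


def one_at_a_time_sensitivity(model, applicant, deltas):
    """Change one input at a time and report the change in the model's output."""
    base = model(applicant)
    effects = {}
    for name, delta in deltas.items():
        perturbed = dict(applicant)   # copy so the other inputs stay fixed
        perturbed[name] += delta
        effects[name] = model(perturbed) - base
    return effects


applicant = {"dti": 0.35, "months_on_file": 48, "delinquencies": 1}
deltas = {"dti": 0.05, "months_on_file": 12, "delinquencies": 1}

effects = one_at_a_time_sensitivity(score, applicant, deltas)
print(effects)
```

Because this toy model is linear, each effect is just the coefficient times the perturbation. With a nonlinear model, the effects depend on the starting point, which is why exploring combinations of inputs can also matter.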

Swap-set analysis

A swap-set analysis captures the change in consumer distribution across a score range when one predictive model replaces another. In a credit underwriting context, swap-sets help a modeler visualize the segments that move from approved to declined (swap-outs) and from declined to approved (swap-ins). You compare the swap-sets with the groups that would have been approved or declined by both the existing and challenger models. The objective is to replace bad borrowers with good ones.
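
The sketch below shows one way to identify swap-ins and swap-outs, assuming each applicant has a score from both the existing (incumbent) model and the challenger model and that approval is a simple score cutoff. The field names, scores, and cutoffs are hypothetical.

```python
def swap_sets(applicants, incumbent_cutoff, challenger_cutoff):
    """Return the IDs of swap-ins and swap-outs when the challenger replaces the incumbent."""
    swap_ins, swap_outs = [], []
    for applicant in applicants:
        approved_before = applicant["incumbent_score"] >= incumbent_cutoff
        approved_after = applicant["challenger_score"] >= challenger_cutoff
        if approved_after and not approved_before:
            swap_ins.append(applicant["id"])    # declined before, approved now
        elif approved_before and not approved_after:
            swap_outs.append(applicant["id"])   # approved before, declined now
    return swap_ins, swap_outs


# Hypothetical applicants scored by both models.
applicants = [
    {"id": "A", "incumbent_score": 720, "challenger_score": 710},
    {"id": "B", "incumbent_score": 640, "challenger_score": 690},
    {"id": "C", "incumbent_score": 700, "challenger_score": 630},
    {"id": "D", "incumbent_score": 600, "challenger_score": 610},
]

swap_ins, swap_outs = swap_sets(applicants, incumbent_cutoff=680, challenger_cutoff=680)
print(swap_ins, swap_outs)
```

Applicants A and D are treated the same by both models, so the analysis focuses on B and C, the applicants whose decision changes.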

Supervised training

Supervised training is a machine learning method where you train a model using data that’s already labeled. This means each piece of information you feed the model comes with the correct answer. The model learns by comparing its guesses to these known answers, then it tweaks itself to reduce mistakes and become more accurate. The goal is for the model to be able to make good predictions on new information it hasn’t seen before.

Variable analysis

Variable analysis is the process of evaluating each variable for stability and suitability for modeling.