Machine learning (ML) is used throughout the financial services industry to perform a wide variety of tasks, such as fraud detection, market surveillance, portfolio optimization, loan solvency prediction, direct marketing, and many others. This breadth of use cases has created a need for lines of business to quickly generate high-quality and performant models that can be produced with little to no code. This reduces the long cycles for taking use cases from concept to production and generates business value. In this post, we explore how to use Amazon SageMaker Autopilot for some common use cases in the financial services industry.
Autopilot automatically generates pipelines, trains and tunes the best ML models for classification or regression tasks on tabular data, while allowing you to maintain full control and visibility. Autopilot enables automatic creation of ML models without requiring any ML experience. Autopilot automatically analyzes the dataset, processes the data into features, and trains multiple optimized ML models.
Data scientists in financial services often work on tasks where the datasets are highly imbalanced (heavily skewed towards examples of one class). Examples of such tasks include credit card fraud (where a very small fraction of the transactions are actually fraudulent) or bankruptcy (only few corporations file for bankruptcy). We demonstrate how Autopilot automatically handles class imbalance without requiring any additional inputs from the user.
Autopilot recently announced the ability to tune models using the Area Under a Curve (AUC) metric in addition to F1 as the objective metric (which is the default objective for binary classification tasks), more specifically the area under the Receiver Operating Characteristic (ROC) curve. In this post, we show how using the AUC as the model evaluation metric for highly imbalanced data allows Autopilot to generate high-quality models.
Our first use case is to detect credit card fraud based on various anonymized attributes. The dataset is highly imbalanced, with over 99% of the transactions being non-fraudulent. Our second use case is to predict bankruptcy of Polish companies . Here, bankruptcy is similarly a binary response variable (will bankrupt = 1, will not bankrupt = 0), with 96% of the companies not becoming bankrupt.
To reproduce these steps in your own environment, you must complete the following prerequisites:
- Create an AWS Identity and Access Management (IAM) role that allows the Amazon SageMaker notebook access to Amazon Simple Storage Service (Amazon S3) for storing data.
- Create an Amazon SageMaker notebook instance.
- Create an S3 bucket to store the outputs of your machine learning models and any data.
Credit card fraud detection
Source - Continue Reading: https://aws.amazon.com/blogs/machine-learning/creating-high-quality-machine-learning-models-for-financial-services-using-amazon-sagemaker-autopilot/