First Project - Example

For a complete, runnable example of integrating the Seclea Platform into your data science workflow, use the following Google Colaboratory project. Copy the project into your own account and run it from there.

A step-by-step guide comes first, followed by the Google Colaboratory project.

We will run through a sample project showing how to use Seclea's tools to record your data science work and explore the results in the Seclea Platform.

Set up the Project

Head to platform.seclea.com and log in.

Please create a new project by clicking the "+ Create" button and giving it a name and description.

Creating a New Project on the Seclea Platform

After you have created your project, it will appear in the All Projects section, as shown below:

All Projects Section After Project Creation

Click the project name in the All Projects section to open the project's dashboard on the Seclea Platform.

Project Dashboard on the Seclea Platform

Clicking the settings option in the left panel takes you to that project's settings.

Individual Project Setting Page on the Seclea Platform

In the project settings, you can select, modify or set new templates for compliance, risk management and any internal standard or policy you must comply with. The compliance, risk management, and internal settings are each described in their respective sections.

To add team members to the project, use the Access section of the project settings.

These settings are optional; you can set or modify them later in the project.

Integrating with Seclea-AI

You can install the seclea-ai package with pip.

When you initialise the SecleaAI object, you will be prompted to log in if you haven't already done so. You must use the same Project Name you used earlier and the Organization name provided with your credentials.

Handling the Data

If you are working in Colab, or otherwise without a copy of the repo, you can download the data for this tutorial. It is an insurance claims dataset with various features and 1000 samples.

Now we can upload the initial data to the Seclea Platform.

The upload should include whatever we know about the dataset as metadata. For now, there are only two metadata keys to set: outputs and continuous_features.

You can leave out outputs if you haven't yet decided what you will be predicting, but you should know, or be able to find, the continuous features at this point.

You can also update these when uploading datasets during/after pre-processing.
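The metadata described above can be sketched as a plain Python dictionary. This is only an illustration: the column names are hypothetical, and the use of lists of column names as values is an assumption, so check docs.seclea.com for the exact format the platform expects.

```python
# Hypothetical column names, chosen only for illustration; use the columns
# of your own dataset. Value types (lists of names) are an assumption.
dataset_metadata = {
    # Column(s) the model will predict. May be left out if not yet decided.
    "outputs": ["fraud_reported"],
    # Columns that hold continuous (numeric, non-categorical) values.
    "continuous_features": [
        "total_claim_amount",
        "policy_annual_premium",
        "age",
    ],
}

# Metadata can be revised later, for example after a column is dropped
# during pre-processing.
updated_metadata = {
    **dataset_metadata,
    "continuous_features": [
        name for name in dataset_metadata["continuous_features"] if name != "age"
    ],
}
```

Building the revised metadata from the original keeps the two versions consistent and leaves the original dictionary untouched.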

Evaluating the Dataset

After running the above section, head back to the Seclea Platform to take a closer look at our dataset. To do so, navigate to the Datasets section under the Prepare tab.

YouTube video showing how to access the dataset evaluation page on the Seclea Platform.

Personally Identifiable Information (PII) and Format Check

YouTube video showing how to access the dataset PII and format check functionality on the Seclea Platform.

Data Bias Check

Dataset bias evaluation functionality on the Seclea Platform.

Transformations

When using Seclea to record your data science work, you need to take care in how you handle transformations of the data.

We require that all transformations are encapsulated in a function that takes the data and returns the transformed data. There are a few things to be aware of, so please see the docs for more.
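As a rough sketch of what "a function that takes the data and returns the transformed data" can look like, the example below uses pandas. The function name, its parameter, and the sample columns are hypothetical and are not taken from the Seclea docs; the docs describe the additional requirements that apply.

```python
import pandas as pd


def drop_sparse_columns(data: pd.DataFrame, max_missing_fraction: float) -> pd.DataFrame:
    """Return a copy of `data` without columns that are mostly missing."""
    # Fraction of missing values in each column.
    missing_fraction = data.isna().mean()
    columns_to_keep = missing_fraction[missing_fraction <= max_missing_fraction].index
    # Return a new DataFrame so the input is never modified in place.
    return data.loc[:, columns_to_keep].copy()


# Small made-up sample, for illustration only.
raw = pd.DataFrame(
    {
        "age": [34, 51, 29, 45],
        "vehicle_colour": [None, None, None, "red"],
        "claim_amount": [1200.0, 560.5, None, 980.0],
    }
)

cleaned = drop_sparse_columns(raw, max_missing_fraction=0.5)
print(list(cleaned.columns))
```

Returning a new object rather than mutating the input keeps each step self-contained, which makes it easier to describe the step as a single transformation.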

Data Cleaning

We will carry out some pre-processing and generate a few different datasets to see how to track these on the platform. This also means we can train our models on different data and see how that affects performance.

Upload Intermediate Dataset

Before balancing the datasets, we will upload them to the Seclea Platform.

  • We define the metadata for the dataset. If anything has changed since the original dataset, we need to record it here; otherwise, we can reuse the original metadata. In this case, we have dropped some of the continuous feature columns, so we need to redefine continuous_features.

  • We define the transformations that took place between the last state we uploaded and this dataset. This is a list of functions and arguments. Please have a look at docs.seclea.com for more details on the correct formatting.
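The two steps above can be sketched in plain pandas as follows. This is not the Seclea upload format: the (function, keyword arguments) pairs, the helper functions, and the column names are all hypothetical, and only illustrate the idea of an ordered list of functions with their arguments plus metadata redefined after columns are dropped. See docs.seclea.com for the formatting the platform actually requires.

```python
import pandas as pd


def drop_columns(data: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of `data` without the named columns."""
    return data.drop(columns=columns)


def fill_missing_with_median(data: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of `data` with gaps in `columns` filled by the column median."""
    filled = data.copy()
    for column in columns:
        filled[column] = filled[column].fillna(filled[column].median())
    return filled


# Ordered (function, keyword arguments) pairs. Illustrative structure only.
steps = [
    (drop_columns, {"columns": ["policy_annual_premium"]}),
    (fill_missing_with_median, {"columns": ["total_claim_amount"]}),
]

# Made-up stand-in for the last dataset state that was uploaded.
original = pd.DataFrame(
    {
        "total_claim_amount": [1000.0, None, 3000.0],
        "policy_annual_premium": [1200.0, 1100.0, 1300.0],
        "fraud_reported": [0, 1, 0],
    }
)
original_metadata = {
    "outputs": ["fraud_reported"],
    "continuous_features": ["total_claim_amount", "policy_annual_premium"],
}

# Apply each step in order to obtain the intermediate dataset.
intermediate = original
for function, kwargs in steps:
    intermediate = function(intermediate, **kwargs)

# Redefine the metadata: keep only continuous features that still exist.
intermediate_metadata = {
    **original_metadata,
    "continuous_features": [
        name
        for name in original_metadata["continuous_features"]
        if name in intermediate.columns
    ],
}
```

Deriving the new metadata from the columns that survive the transformations avoids listing a feature that is no longer in the dataset.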

Evaluating the Transformations

Now head to platform.seclea.com again to take another look at the Datasets section. You will see that there is much more to look at now.

You can see here how the transformations are used to show you the history of the data and how it arrived in its final state.

Evaluating the dataset transformations over the course of a project.

Modelling

Now we get started with the modelling. We will run the same models over each of our datasets to explore how the different processing of the data has affected our results.

We will use three classifiers from sklearn for this: DecisionTreeClassifier, RandomForestClassifier and GradientBoostingClassifier.

Training
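A minimal sketch of running the same three sklearn classifiers over several datasets is shown below. It is not the tutorial's own training code: the datasets are synthetic stand-ins generated with make_classification, the dataset names are hypothetical, and the step that records each training run on the Seclea Platform is left out.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-ins for the differently processed datasets (1000 samples each).
datasets = {}
for name, seed in [("dataset_a", 0), ("dataset_b", 1)]:
    X, y = make_classification(n_samples=1000, n_features=10, random_state=seed)
    # Each entry holds (X_train, X_test, y_train, y_test).
    datasets[name] = train_test_split(X, y, test_size=0.2, random_state=42)

model_classes = {
    "DecisionTree": DecisionTreeClassifier,
    "RandomForest": RandomForestClassifier,
    "GradientBoosting": GradientBoostingClassifier,
}

# Train every model on every dataset and keep the held-out accuracy.
accuracy = {}
for dataset_name, (X_train, X_test, y_train, y_test) in datasets.items():
    for model_name, model_class in model_classes.items():
        model = model_class(random_state=42)
        model.fit(X_train, y_train)
        accuracy[(dataset_name, model_name)] = model.score(X_test, y_test)

for key, value in sorted(accuracy.items()):
    print(key, round(value, 3))
```

Keeping the models in a dictionary and looping over datasets makes it straightforward to compare how each processing choice affects each model.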

Analysis

Let's head back to platform.seclea.com to analyse our models.

Analysis of the trained models on the Seclea Platform.
