First Project - Example
Please use the following Google Colaboratory project for a complete, worked example of integrating the Seclea Platform into your data science workflow. Copy the Google Colaboratory project into your own account and run it from there.
Below is a step-by-step guide, followed by the Google Colaboratory project itself.
We will run through a sample project showing how to use Seclea's tools to record your data science work and explore the results in the Seclea Platform.
Set up the Project
Head to platform.seclea.com and log in.
Create a new project by clicking the "+ Create" button and giving it a name and description.

After you have created your project, it will appear in the All Projects section, as shown below:

Click the project name in the All Projects section to go to the project settings page on the Seclea Platform.

Clicking the settings option in the left panel will take you to the project's settings.

In the project settings, you can select, modify, or create templates for compliance, risk management, and any internal standard/policy you must comply with. How to use the compliance, risk management, and internal settings is covered in their respective sections.
If you want to include additional team members in the project, you can add them in the Access section of the settings.
These are optional settings; you can set/modify them later in the project.
Integrating with Seclea-AI
You can install the seclea-ai package from pip.
When you initialise the SecleaAI object, you will be prompted to log in if you haven't already done so. You must use the same Project Name you used earlier and the Organization name provided with your credentials.
Handling the Data
You can download the data for this tutorial if you are working in Colab or without access to the repo. It is an Insurance Claims dataset with various features and 1,000 samples.
Now we can upload the initial data to the Seclea Platform.
This should include whatever information we know about the dataset as metadata. There are only two keys to add to the metadata for now: outputs and continuous_features.
You can leave out outputs if you haven't decided what you will be predicting yet, but you should know, or be able to find, the continuous features at this point.
You can also update these when uploading datasets during/after pre-processing.
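As a sketch of what this metadata looks like, it is just a dictionary with those two keys. The column names below are hypothetical stand-ins for the Insurance Claims columns; use your dataset's actual column names:

```python
# Metadata recorded alongside the initial dataset upload.
# Column names here are placeholders for illustration only.
metadata = {
    "outputs": ["fraud_reported"],      # target column(s); omit if undecided
    "continuous_features": [            # numeric, non-categorical features
        "policy_annual_premium",
        "total_claim_amount",
    ],
}
```

This dictionary is then passed along with the dataset when uploading to the platform.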
Evaluating the Dataset
After running the above section, head back to the Seclea Platform to take a closer look at our Dataset. To do so, navigate to the Datasets section under the Prepare tab.
Personal Identifiable Information (PII) and Format Check
Data Bias Check
Transformations
When using Seclea to record your data science work, you will need to take care with how you handle transformations of the data.
We require that all transformations are encapsulated in a function that takes the data and returns the transformed data. There are a few things to be aware of, so please see the docs for more.
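A minimal sketch of that pattern: each transformation is a plain function that takes the data and returns the transformed data, without mutating its input. The tiny DataFrame here is made up for illustration:

```python
import pandas as pd

def encode_categoricals(df: pd.DataFrame) -> pd.DataFrame:
    """Transformation in the required shape: DataFrame in, DataFrame out."""
    out = df.copy()  # work on a copy so the input is left untouched
    for col in out.select_dtypes(include="object").columns:
        out[col] = out[col].astype("category").cat.codes
    return out

raw = pd.DataFrame({"state": ["OH", "IN", "OH"], "claim": [1200.0, 800.0, 450.0]})
clean = encode_categoricals(raw)
```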
Data Cleaning
We will carry out some pre-processing and generate a few different datasets to see how to track these on the platform. This also means we can train our models on different data and see how that affects performance.
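For example, one common way to produce different datasets from the same source is to handle missing values differently in each variant. The toy data below stands in for the insurance dataset:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [34, np.nan, 51, 29],
    "premium": [1200.0, 980.0, np.nan, 450.0],
    "fraud_reported": [0, 1, 0, 1],
})

# Variant 1: drop any rows with missing values.
dropped = df.dropna().reset_index(drop=True)

# Variant 2: impute missing values with the column mean instead.
imputed = df.fillna(df.mean(numeric_only=True))
```

Training the same models on each variant later lets us compare how the choice affects performance.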
Upload Intermediate Dataset
Before balancing the datasets, we will upload them to the Seclea Platform.
We define the metadata for the dataset. If anything has changed since the original dataset, we need to record that here; otherwise, we can reuse the original metadata. In this case, we have dropped some of the continuous feature columns, so we will need to redefine continuous_features.
We define the transformations that took place between the last state we uploaded and this dataset. This is a list of functions and arguments. Please have a look at docs.seclea.com for more details on the correct formatting.
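As a rough sketch of the idea (see docs.seclea.com for the exact format the client expects), the record is an ordered list pairing each transformation function with the arguments it was called with:

```python
import pandas as pd

def drop_unused_columns(df, columns):
    return df.drop(columns=columns)

def fill_missing(df, value):
    return df.fillna(value)

# Ordered list of (function, kwargs) pairs describing how this dataset
# was derived from the last state uploaded. Column names are placeholders.
transformations = [
    (drop_unused_columns, {"columns": ["policy_number"]}),
    (fill_missing, {"value": 0}),
]

# Replaying the list reproduces the uploaded dataset from the previous one.
df = pd.DataFrame({"policy_number": [1, 2], "claim": [100.0, None]})
for func, kwargs in transformations:
    df = func(df, **kwargs)
```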
Evaluating the Transformations
Now head to platform.seclea.com again to take another look at the Datasets section. You will see that there is much more to look at now.
You can see here how the transformations are used to show you the history of the data and how it arrived in its final state.
Modelling
Now we get started with the modelling. We will run the same models over each of our datasets to explore how the different processing of the data has affected our results.
We will use three models from sklearn for this: the DecisionTree, RandomForest, and GradientBoosting classifiers.
Training
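A runnable sketch of this training loop, with synthetic data standing in for the insurance dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for one of the prepared datasets.
X, y = make_classification(n_samples=200, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "RandomForest": RandomForestClassifier(random_state=42),
    "GradientBoosting": GradientBoostingClassifier(random_state=42),
}

# Fit each model and record its held-out accuracy.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
```

In the full tutorial, each fitted model would also be recorded with the SecleaAI client so the runs appear on the platform; see docs.seclea.com for those calls.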
Analysis
Let's head back to platform.seclea.com to analyse our models.