Portfolio

🤖 Computer Vision

POC: Face Recognition & People Counting for Productivity Measurement

The goal is to identify key productivity regions using face recognition, people counting and duration (the time each person spends inside the designated regions).

Dwell time for each detected person
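As a rough illustration of how dwell time could be computed, the sketch below assumes a hypothetical tracker that emits `(person_id, timestamp_s, region)` detections. Time between consecutive sightings of the same person in the same region counts as dwell time, and a gap longer than `max_gap` seconds is treated as the person having left. The function name, input format and gap rule are assumptions, not the POC's actual implementation.

```python
from collections import defaultdict


def dwell_times(detections, max_gap=2.0):
    """Return total seconds each (person_id, region) pair was observed.

    detections: iterable of (person_id, timestamp_s, region) tuples from a
    hypothetical tracker. The time between two consecutive sightings of the
    same person in the same region is counted as dwell time, unless the gap
    exceeds `max_gap` seconds (treated as the person having left).
    """
    last_seen = {}                 # person_id -> (timestamp, region)
    totals = defaultdict(float)    # (person_id, region) -> seconds
    # Process detections in time order so each person's sightings are sequential.
    for person_id, ts, region in sorted(detections, key=lambda d: d[1]):
        prev = last_seen.get(person_id)
        if prev is not None:
            prev_ts, prev_region = prev
            if prev_region == region and ts - prev_ts <= max_gap:
                totals[(person_id, region)] += ts - prev_ts
        last_seen[person_id] = (ts, region)
    return dict(totals)
```

For example, sightings of person "A" at the desk at 0 s, 1 s, 2 s and 10 s give 2.0 s of dwell time, because the 8 s gap exceeds `max_gap`.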

Read the full project details here: GitHub Repository

Face Recognition in Java

This project highlights a face recognition technique that detects and recognizes individuals' faces captured from a web camera. The subjects are the candidates and trainers of CERTIFAI Online CDLE. Each individual's images are categorized into one of two groups, trainers or candidates, and saved in a face database. The model should then be able to analyze facial features, match them against the entries in the database, and identify who each person is.
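The matching step can be sketched as a nearest-neighbour search over face embeddings. The snippet below is in Python rather than the project's Java, and it assumes the faces have already been converted to numeric embedding vectors by some face encoder (not shown); the database layout, the `identify` helper and the 0.8 cosine-similarity threshold are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(query, database, threshold=0.8):
    """Return (name, role) of the closest enrolled face, or None if no match.

    database: hypothetical list of (name, role, embedding) entries, where
    role is "trainer" or "candidate".
    """
    best, best_score = None, threshold
    for name, role, embedding in database:
        score = cosine_similarity(query, embedding)
        if score >= best_score:          # keep the most similar face so far
            best, best_score = (name, role), score
    return best
```

Returning `None` below the threshold lets an unknown face be reported as unrecognised instead of being forced onto the closest trainer or candidate.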

📖 Natural Language Processing (NLP)

Tokenization with spaCy

The image below shows how tokenization takes place in the spaCy NLP pipeline. Image credit: spaCy.
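spaCy's tokenizer first splits the text on whitespace and then, for each substring, checks tokenizer exception rules (special cases) and tries to split off prefixes, suffixes and infixes. The toy function below imitates only the special-case, prefix and suffix steps with small hand-picked rule lists; it is a simplified illustration, not spaCy's implementation, and the rule lists are hypothetical.

```python
# Hypothetical rule sets for illustration only (spaCy ships far richer ones).
SPECIAL_CASES = {"Let's": ["Let", "'s"], "N.Y.": ["N.Y."]}
PREFIXES = ('"', "(")
SUFFIXES = ('"', ")", "!", ".", ",")


def toy_tokenize(text):
    """Whitespace split, then peel off special cases, prefixes and suffixes."""
    tokens = []
    for chunk in text.split():
        head, tail = [], []          # tokens from the front / back of the chunk
        while chunk:
            if chunk in SPECIAL_CASES:                   # exception rule wins
                head.extend(SPECIAL_CASES[chunk])
                chunk = ""
            elif len(chunk) > 1 and chunk[0] in PREFIXES:
                head.append(chunk[0])
                chunk = chunk[1:]
            elif len(chunk) > 1 and chunk[-1] in SUFFIXES:
                tail.insert(0, chunk[-1])
                chunk = chunk[:-1]
            else:                                        # nothing left to split
                head.append(chunk)
                chunk = ""
        tokens.extend(head + tail)
    return tokens
```

`toy_tokenize("Let's go to N.Y.!")` returns `["Let", "'s", "go", "to", "N.Y.", "!"]`: the exception rule keeps "N.Y." whole once the trailing "!" has been split off as a suffix.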

RiseHill Data Analysis's SaaS Product: Integrated Data Intelligence (IDI).

The image above shows how my application of NLP (cleaned texts) enables instant browsing and retrieval of documents, with full traceability back to the original documents.
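The internals of IDI are not described here, so the following is only a hypothetical sketch of the general idea: index the cleaned text, but keep every index entry pointing back to the original document ID, so a search over cleaned tokens can always return the untouched source text. The `clean` rule and the in-memory inverted index are assumptions.

```python
import re
from collections import defaultdict


def clean(text):
    """Lower-case and keep alphanumeric word tokens (a simple stand-in cleaner)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each cleaned token to the IDs of the original documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in clean(text):
            index[token].add(doc_id)
    return index


def search(query, index, documents):
    """Return (doc_id, original_text) for documents containing every query token."""
    tokens = clean(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    # Traceability: results carry the original, uncleaned text.
    return [(doc_id, documents[doc_id]) for doc_id in sorted(hits)]
```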

For more details, please check out my GitHub Repository

Named Entity Recognition (NER)

This notebook demonstrates the entire named entity recognition (NER) process with spaCy, from start to end. It also explores pattern matching as an alternative to NER when the entities come from a known, small set of fixed values.

It is a complete end-to-end demonstration that covers both data labelling and model training.

In this notebook, I train a model to detect oil- and petrol-related entities in this public dataset of emails from the oil industry. This is an oversimplification, because in practice we would want more generic entities, but it shows why pattern matching is a better alternative to NER in this case. In summary, we extract oil-related entities from email messages.
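For a small, fixed set of values, pattern matching can be as simple as a case-insensitive regular expression over a known vocabulary (spaCy offers the same idea through its `PhraseMatcher` and `EntityRuler` components). The stdlib sketch below uses a hypothetical list of oil-related terms, not the notebook's actual list, and returns character spans with a hypothetical `OIL` label.

```python
import re

# Hypothetical fixed vocabulary of oil-related terms.
OIL_TERMS = ["crude oil", "brent", "petrol", "gasoline", "barrel"]


def build_pattern(terms):
    """Compile one case-insensitive, word-bounded alternation of all terms."""
    # Longest terms first, so a longer term beats a shorter prefix of it.
    alternatives = [re.escape(t) for t in sorted(terms, key=len, reverse=True)]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def match_entities(text, pattern, label="OIL"):
    """Return (start, end, matched_text, label) for every term occurrence."""
    return [(m.start(), m.end(), m.group(0), label) for m in pattern.finditer(text)]
```

The word boundaries (`\b`) stop "petrol" from matching inside "petroleum", and sorting longer terms first means a term like "crude oil" would still be matched whole if a shorter prefix such as "crude" were also listed.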

For more details, please check out my GitHub Repository

📈 Data Science

Fraud Detection in Insurance Claims

This predictive model uses an auto insurance dataset to predict whether each claim is fraudulent or not. It is a binary classification task, and I demonstrate a few AutoML models on the Dataiku DSS platform, such as Logistic Regression and Random Forest.

Based on the predictions, the model can estimate the total amount of the predicted fraudulent claims and break down the characteristics of these frauds, for example by looking at the fraud count per insured hobby.
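The aggregation described above can be sketched in a few lines. The sketch assumes scored claims are available as records with a claim amount, the insured's hobby and a 0/1 fraud prediction; the field names are hypothetical and this is not the Dataiku flow itself.

```python
from collections import Counter


def summarise_fraud(predictions):
    """Total predicted-fraud amount and fraud count per insured hobby.

    predictions: hypothetical list of dicts with keys "claim_amount",
    "insured_hobbies" and "predicted_fraud" (0 or 1).
    """
    frauds = [p for p in predictions if p["predicted_fraud"] == 1]
    total_amount = sum(p["claim_amount"] for p in frauds)
    count_by_hobby = Counter(p["insured_hobbies"] for p in frauds)
    return total_amount, count_by_hobby
```

`count_by_hobby.most_common()` then lists the hobbies with the most predicted frauds first.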

Impact of the Fraud Detection ML Model:

Detect and prevent fraud before claims are paid
Increase the acceptance rate for further investigation, resulting in fewer false positives
Increase the identification of suspicious claims
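One way to make these impact statements measurable is through standard binary-classification counts, with 1 meaning fraud: fewer false positives shows up as higher precision among flagged claims, and identifying more suspicious claims shows up as higher recall. The helpers below are a minimal, hypothetical sketch of those metrics, not the project's evaluation code.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fp, fn, tn) for binary labels where 1 = fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    """Precision and recall, returning 0.0 when a denominator is zero."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0  # flagged claims that are truly fraud
    recall = tp / (tp + fn) if tp + fn else 0.0     # fraudulent claims that get flagged
    return precision, recall
```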

To read more about how I built this ML model, please check out my GitHub Repository

Fraud Detection in Customer Transaction

In this project, a machine learning model predicts the probability that an online transaction is fraudulent, as indicated by the binary target isFraud.

The data is divided into two files, identification and transaction, which are linked by TransactionID. Not all transactions have a matching identification record.
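The two files can be combined with a left join on `TransactionID`, so every transaction is kept and identification fields are attached only where a matching row exists. With pandas this would typically be `transactions.merge(identity, on="TransactionID", how="left")`; the dependency-free sketch below shows the same idea on lists of dicts, and the `left_join` helper and sample rows are hypothetical.

```python
def left_join(transactions, identities, key="TransactionID"):
    """Attach identity fields to each transaction row when a match exists.

    transactions / identities: lists of dicts that share `key`. Transactions
    without an identity row keep only their own fields (a left join).
    """
    identity_by_id = {row[key]: row for row in identities}
    joined = []
    for tx in transactions:
        merged = dict(tx)                               # copy; don't mutate input
        merged.update(identity_by_id.get(tx[key], {}))  # no-op when unmatched
        joined.append(merged)
    return joined
```

Unlike pandas, which fills missing identity columns with NaN, this sketch simply leaves those keys absent for unmatched transactions.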

This ML model was developed end-to-end on the Dataiku DSS platform.

The goals of this ML model:

To build machine learning models on a challenging, large-scale e-commerce transaction dataset
To help businesses reduce fraud losses and increase revenue
To provide the best solutions for fraud prevention

Learn more about this project by visiting my GitHub repo. Click here!

Sulaiha Subi