dataninjato

    Harnessing Practical Ecommerce Data Scenarios with OpenAI API and ChromaDB Integration

    OpenAI ChromaDB Python E-commerce Product Recommendation Semantic Search Sentiment Analysis

    Important Ecommerce applications use textual data either to supplement other Machine Learning models with additional data fields or to quickly produce applications on their own, on the fly. Let's have a quick look at the latter and at how textual embeddings alone can benefit Ecommerce businesses by quickly ramping up AI application use cases. For this purpose let's ...

    16 April 2024

    Realtime streaming data analytics with PyFlink, Apache Kafka, Elasticsearch & Kibana

    PyFlink Apache Kafka Elasticsearch Kibana Python Data Analytics Streaming Data

    Analyse and monitor business data and payment amounts across ten retail branches in real time with a Kafka/Elasticsearch/Kibana setup.

    13 February 2024

    Logistics Time-Driven Activity-Based Costing (TDABC) - Applied Python in Costing Logistic Services

    Activity-Based Costing Time-Driven Activity-Based Costing Python Supply Chain Logistics

    Implementing Time-Driven Activity-Based Costing (TDABC) with Python and extending it with storage costs, thus enhancing logistics decision-making by providing valuable insights and improving various facets of operational efficiency and profitability management.

    04 February 2024

    Dockerize Python market data scraping app with postgresql db

    Docker Python SQL Postgres Scraping Server Power BI

    Data Analytics requires consistent, error-free data generation. Our Python scraping app therefore runs in a Docker container environment, which is isolated from host issues and retains the data between runs for later analysis.

    23 January 2024

    Auto ingest data to Snowflake with Snowpark Python API and RSA key based authentication

    Data Engineering Snowflake Data Warehouse data ingestion Key based authentication Python SQL

    Automate the ingestion of local or cloud-based data files into the cloud-based data warehouse Snowflake as part of our Data Engineering task, with the help of the Snowpark Python libraries and API to process SnowSQL commands.

    26 October 2023

    Applied Multi-Criteria Supplier Selection with Data Envelopment Analysis & Game Cross-Efficiency Model

    supplier selection data envelopment analysis DEA R linear programming data scaling

    Demonstrating supplier evaluation and selection with Data Envelopment Analysis and its competition-compatible Game Cross-Efficiency Model.

    08 October 2023

    Monte Carlo Analysis of a Discrete Event Simulation of a non deterministic manufacturing process with multi resources using SimPy

    SimPy Discrete Event Simulation (DES) Process Management non deterministic processes Monte Carlo Simulation Python

    Implementing a non-deterministic manufacturing process as a Discrete Event Simulation with the Python SimPy package and analysing the simulations statistically through multiple Monte Carlo runs.

    05 July 2023

    Modelling Order Fulfillment Business Process with BPMN 2.0

    Business Process Modelling Process Management Order fulfillment BPMN 2.0 AWS/MWS ERP SQL XML

    Documenting and demonstrating my process modelling and management knowledge, and the skills to document business processes with BPMN 2.0, using my own Order Fulfillment Business Process, conceived and implemented in PHP code.

    12 May 2023

    Statistical Process Control with automatically generated Control Charts in Python

    Statistical Process Control Process Management Lean Six Sigma Control Charts continuous improvement Python

    Review of the Python package pyspc for automatically generating Control Charts, e.g. Xbar-S charts, to monitor repetitive processes in both manufacturing and services.

    26 January 2023

    Supply Chain Inventory Reorder Policy Comparison - Monte Carlo Simulation, Bayesian Optimization & Tuning

    Supply Chain Inventory management Monte Carlo Simulation Bayesian Optimization 3D Visualization Parallelized Processing Python

    Having worked with interesting product demand data and the implementation of the product's reorder policies and related costs, I felt driven to update and improve the Jupyter notebook by the author Mehul, who featured it in his TDS article "Inventory Management — Dealing with unpredictable demand".

    18 December 2022

    Belgium Train Delay Analysis with PySpark on Azure Databricks Cluster

    Python PySpark Azure Databricks EDA

    I got my hands on a Belgium train delay data set, which is to be analyzed with PySpark in an Azure Databricks cluster notebook session. The goal of the analysis is to find features that correlate with train delays. For that to happen, this bare-bones data set will have to be complemented with further data in the future, e.g. data from business systems, IoT (Internet of Things) sensor data from trains and stations, and external data sources such as local/regional weather data.

    12 November 2022

    Azure Machine Learning Python SDK Deployment of Airline Delay Data Processing & Classification Modelling Pipeline

    Python Azure Azure Machine Learning Azure ML Python SDK Pipeline Machine Learning Classification SHAP

    Complete Pipeline workflow: registering the data set on Azure and uploading files into the datastore, converting the former notebook into a Python script for the preprocessing Pipeline step, and adding the compact ML script to complete the Pipeline before running it. Finally, a quick look at which features are of high importance in predicting delays.

    06 October 2022

    Process Analytics & Process Mining with pm4py ~ Case study of an 'Order to Cash' process

    Process Analytics Process Mining Business Processes Petri Nets BPMN pm4py Python bupaR

    I have a look at the pm4py Python package from the Fraunhofer Institute, which, analogous to R's bupaR package, can mine and analyse processes through their log data. Analysing event data is an iterative process of three steps: extraction, processing and analysis ...

    27 September 2022

    Survival Analysis for C-MAPSS turbofan jet engines ~ predictive maintenance

    Survival Analysis Python Time Series predictive maintenance

    The NASA C-MAPSS multivariate time series data set describes a set of identical turbofan jet engines, all showing different degrees of initial wear and manufacturing variation as well as different operational settings. We apply the statistical methods of Survival Analysis to analyze the length of time until the undesired failure event is observed.

    18 September 2022

    Bayes Theorem applied in Python incl. A/B Testing example - Notes

    Statistics Python A/B testing Inference Notes

    Notes on the DataCamp course 'Bayesian Data Analysis in Python', working through e.g. A/B testing the Bayesian way, using credible intervals instead of confidence intervals.

    11 May 2022

    Non contractual Customer Lifetime Value estimated probabilistically with the Beta Geometric/Negative Binomial Distribution (BG/NBD) Model

    Customer Lifetime Value Python Beta Geometric/Negative Binomial Distribution (BG/NBD) Customer Penetration Purchase Frequency

    We assume an online service business where customers/clients continuously purchase our services. For such a service business we generate our customer transactions ourselves instead of once again using one of the few available public datasets. We pick a Gamma distribution for the tenure of our ...

    31 March 2022

    OpenSea API and Scrape Explorer Streamlit Automated Dashboard App

    Streamlit Python Dashboard API Scraping Analysis Tool Heroku

    Self-service demo of an app that provides real added value and high reusability for more than a single user. A Data Analyst can support and reach a wide audience of users who would like to self-serve and may need a piece of information ad hoc, right away.

    18 March 2022

    Propensity Score Matching on AirAsia Insurance Upsell Data (Causal Inference)

    Causal Inference Propensity Score Matching Propensity Modelling Python Kaggle

    Propensity Score Matching as a method of descriptive data science modelling, exemplified with the AirAsia bookings dataset from Kaggle.

    15 March 2022

    Non Fungible Token (NFT) Rarity Analysis with Jaccard Distance

    Python sample set dissimilarity Jaccard Distance Jaccard Similarity Scraping NFT Blockchain Multiprocessing

    NFT trait set distance calculation with Jaccard Similarity/Distance for rarity measurement of individual tokens in collections

    23 February 2022

    Are You Ready for the Zombie Apocalypse - Datacamp R project in Statsmodels (Python)

    Statsmodels Python Logistic Regression Datacamp CDC Zombies

    The CDC survival factor analysis of the Zombie Apocalypse, originally in R, now implemented with Statsmodels

    15 January 2022

    Mobile Games AB Testing with Cookie Cats

    AB-testing

    Where should the gates be placed? Initially the first gate was placed at level 30, but in this notebook we're going to analyze an AB-test where we moved the first gate in Cookie Cats from level 30 to level 40. In particular, we will look at the impact on player retention.

    20 December 2021

    A/B testing of website landing pages - ETL with awswrangler, AWS Python SDK and AWS Cloud Services

    ab-testing python ETL awswrangler aws python SDK aws glue aws athena aws s3

    Data ingestion with awswrangler, AWS Python SDK and AWS S3, Glue & Athena services

    18 December 2021

    Logistic Regressions Assumption

    Logistic Regression Algorithm Assumptions

    I worked through this notebook created by Kenneth Leung, updating and correcting certain code parts; he merged my pull request into his original.

    07 December 2021