Hello, I'm

Arsh Sahay

Transforming raw data into actionable insights through engineering excellence

About Me

I'm a passionate Data Engineer with expertise in building robust data pipelines and infrastructure. My background in Python, Data Science, and Machine Learning allows me to bridge the gap between raw data and meaningful business solutions.

Currently working as a Data Engineer, I specialize in designing scalable data architectures and implementing efficient ETL processes that make data accessible and actionable. With hands-on experience across AWS and Azure cloud platforms, I build and deploy end-to-end data solutions in the cloud.

Skills & Expertise

Python

Advanced proficiency in Python for data engineering, automation, and building scalable applications

Data Science

Statistical analysis, data visualization, and extracting insights from complex datasets

Machine Learning

Building and deploying ML models for predictive analytics and pattern recognition

Pandas

Expert data manipulation and analysis using Pandas for efficient data processing

NumPy

Numerical computing and array operations for high-performance data processing

Data Engineering

Designing and maintaining data pipelines, ETL processes, and data infrastructure

Experience

Data Engineer

Jan '25 - Present

Digihumans Technologies Private Limited

  • Architected cloud-based data workflows on AWS and Azure, leveraging Cosmos DB and Blob Storage for scalable data management
  • Built a web scraper using Crawl4AI to extract and process full website content into structured markdown
  • Implemented Redis message queue and AWS SQS to handle 100+ async tasks/hour for pipeline orchestration
  • Developed infrastructure automation using Terraform, Ansible, and Jenkins for CI/CD and deployment pipelines
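The queueing pattern behind the Redis/SQS orchestration above can be sketched with an in-memory stand-in. This is an illustrative sketch, not the production code: the deque plays the role of a Redis list (LPUSH/BRPOP) or an SQS queue, and the task names are made up.

```python
import json
from collections import deque

# In production this deque would be a Redis list or an AWS SQS queue;
# the surrounding enqueue/dequeue logic stays the same.
class TaskQueue:
    def __init__(self):
        self._q = deque()

    def enqueue(self, task_type: str, payload: dict) -> None:
        # Serialize so the message survives crossing process boundaries.
        self._q.append(json.dumps({"type": task_type, "payload": payload}))

    def dequeue(self):
        # FIFO pop; a real worker would block here (BRPOP / SQS long polling).
        if not self._q:
            return None
        return json.loads(self._q.popleft())

queue = TaskQueue()
queue.enqueue("scrape", {"url": "https://example.com"})
queue.enqueue("transform", {"doc_id": 42})
first = queue.dequeue()
```

Serializing each task to JSON keeps producers and consumers decoupled, which is what makes swapping the in-memory queue for Redis or SQS a drop-in change.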
ETL Engineer Intern

Previous

Digihumans.ai

  • Built ETL pipelines to extract data from multiple document formats (PDF, DOCX, PPTX)
  • Implemented transformation logic to convert diverse file types into unified markdown format
  • Loaded processed data to Azure Blob Storage and Cosmos DB
  • Automated data workflows using Python for document processing and web scraping
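The format-unification step above can be sketched as a converter dispatch table. Assumptions are labeled in the code: only a plain-text converter is implemented here, while the real PDF/DOCX/PPTX converters (e.g. pdfminer, python-docx, python-pptx) would plug into the same table.

```python
from pathlib import Path

def txt_to_markdown(text: str) -> str:
    # Plain text: promote the first non-empty line to a heading.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ""
    return "# " + lines[0] + "\n\n" + "\n".join(lines[1:])

# Real converters for .pdf/.docx/.pptx would register here; this sketch
# implements only the plain-text case.
CONVERTERS = {".txt": txt_to_markdown}

def to_markdown(path: Path, raw: str) -> str:
    converter = CONVERTERS.get(path.suffix.lower())
    if converter is None:
        raise ValueError(f"unsupported format: {path.suffix}")
    return converter(raw)
```

Keying converters by file extension keeps the load step uniform: every document reaches Blob Storage as markdown regardless of its source format.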

Projects

📊

Customer Service Request Analysis

Exploratory data analysis and preprocessing pipeline for NYC 311 service requests dataset. Features data cleaning, feature engineering, and visualization to analyze complaint patterns and resolution times across boroughs.

Python · Pandas · Jupyter · Data Visualization
  • Analyzed missing values and dropped columns exceeding a 50% missing-value threshold
  • Computed resolution times for complaint type comparison
  • Created borough-level geographic visualizations
  • Built EDA pipeline for city-wise complaint analysis
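The cleaning and resolution-time steps above can be sketched in pandas on a toy stand-in for the 311 data (column names follow the real schema; the values are invented for illustration):

```python
import math
import pandas as pd

# Toy stand-in for the NYC 311 dataset.
df = pd.DataFrame({
    "Complaint Type": ["Noise", "Noise", "Heating"],
    "Created Date": pd.to_datetime(
        ["2024-01-01 10:00", "2024-01-02 09:00", "2024-01-03 08:00"]),
    "Closed Date": pd.to_datetime(
        ["2024-01-01 14:00", "2024-01-02 15:00", "2024-01-04 08:00"]),
    "Mostly Empty": [None, None, 1.0],
})

# Drop columns with more than 50% missing values: keep a column only if
# at least half of its rows are non-null.
df = df.dropna(axis=1, thresh=math.ceil(len(df) * 0.5))

# Resolution time per request, then the mean per complaint type.
df["Resolution Hours"] = (
    df["Closed Date"] - df["Created Date"]).dt.total_seconds() / 3600
mean_hours = df.groupby("Complaint Type")["Resolution Hours"].mean()
```

`dropna(axis=1, thresh=...)` expresses the 50% rule directly: `thresh` is the minimum count of non-null values a column needs to survive.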
🌤️

Weather Agent

An LLM-powered agentic weather assistant that interprets natural language queries and calls a weather API to fetch real-time weather data. Built with an agentic workflow where the model decides when and how to invoke the API based on user intent. Backend self-hosted on a Raspberry Pi 5 inside a Docker container. Frontend built entirely using Kiro.

Python · HTML/CSS/JS · LLM · Agentic AI · Weather API · Kiro · Cloudflare · Docker · Raspberry Pi
  • LLM autonomously decides when to call the weather API based on query
  • Real-time weather data retrieval via agentic tool calling
  • Backend self-hosted on a Raspberry Pi 5 in Docker, with Cloudflare for HTTPS and secure tunneling
  • Custom-built frontend using HTML, CSS, and JavaScript with Kiro
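The agentic decide-then-call loop above can be sketched with stubs. Everything here is illustrative: `decide` is a keyword stand-in for the LLM's structured tool-call decision, and `fetch_weather` stands in for the real API request.

```python
def fetch_weather(city: str) -> dict:
    # Stand-in for the real weather API call (an HTTP request in production).
    return {"city": city, "temp_c": 21}

def decide(query: str):
    # Toy stand-in for the LLM's decision step: a real agent asks the model
    # to emit either a structured tool call or a plain answer.
    q = query.lower()
    if "weather" in q or "temperature" in q:
        city = query.rstrip("?").split()[-1]  # naive city extraction
        return {"tool": "fetch_weather", "args": {"city": city}}
    return {"answer": "I can only help with weather questions."}

def run_agent(query: str) -> str:
    decision = decide(query)
    if decision.get("tool") == "fetch_weather":
        result = fetch_weather(**decision["args"])
        return f"It is {result['temp_c']}°C in {result['city']}."
    return decision["answer"]
```

The key property is that the tool call is optional: the model (here, the keyword check) decides per query whether the API is needed, rather than the pipeline calling it unconditionally.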

Get In Touch

Interested in working together? Feel free to reach out!
