How I Reduced Inference Latency by 43%
Optimizing a BERT sentiment model from 85ms to 48ms using ONNX Runtime, quantization, and request batching. Includes benchmarks and deployment considerations.
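The full benchmarks live in the post; as a flavor of the quantization step, here is a minimal sketch using ONNX Runtime's dynamic quantization API. The file paths and input tensor names are illustrative, not the ones from the post.

```python
# Minimal sketch: dynamically quantize an exported BERT ONNX model and run
# batched inference. Paths and tensor names are illustrative.
import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize fp32 weights to int8 (activations stay fp32, scaled per batch).
quantize_dynamic(
    model_input="bert_sentiment.onnx",
    model_output="bert_sentiment.int8.onnx",
    weight_type=QuantType.QInt8,
)

session = ort.InferenceSession(
    "bert_sentiment.int8.onnx",
    providers=["CPUExecutionProvider"],
)

# Dummy batch of 8 requests padded to length 128; batching amortizes
# per-call overhead across concurrent callers.
batch = {
    "input_ids": np.ones((8, 128), dtype=np.int64),
    "attention_mask": np.ones((8, 128), dtype=np.int64),
}
logits = session.run(None, batch)[0]
print(logits.shape)  # (8, num_labels)
```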
ML Engineer Instance
Neural network topology of a production ML engineer
| Layer | Description | Params |
|---|---|---|
| INPUT (Foundation) | Computer Science & Mathematics | ∞ |
| Dense | Machine Learning Fundamentals | 512 |
| Conv2D | Deep Learning & Neural Networks | 1024 |
| Attention | NLP & Computer Vision | 2048 |
| Transformer | LLMs & Generative AI | 4096 |
| OPTIM (AdamW) | Real-world Deployment & MLOps | lr=0.001 |
| OUTPUT (Softmax) | Production-ready ML Systems | ∞ |
- 7,680+ · Hours of Learning
- Multi-task · Parallel Processing
- Minimizing · Continuously Optimizing
- Real-time · Fast Deployment

Production models shipped with measurable impact.
A production-ready Flask web application for predicting dengue risk in Dhaka, Bangladesh. It combines XGBoost, scikit-learn, and advanced feature engineering to deliver individual risk predictions and public-health insights.
A lightweight Flutter mobile app for research paper discovery and management with built-in ML capabilities for automated categorization, clustering, semantic search, and smart recommendations.
Model training logs — from initialization to production
Interactive machine learning models running in your browser — proof that I deploy, not just code.
Type anything → Get real-time sentiment prediction with confidence scores (see the sketch after this list).
Chat with my portfolio AI — ask about skills, projects, or experience.
Paste resume text → Predicts if the candidate fits an ML Engineer role.
Input house features → Get instant price prediction from trained model.
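Under the hood these demos are ordinary predict functions behind HTTP endpoints. Here is a minimal sketch of the sentiment demo's shape, assuming a scikit-learn text pipeline saved with joblib; the file name and labels are illustrative.

```python
# Minimal sketch of the sentiment demo's serving logic, assuming a
# scikit-learn pipeline (vectorizer + classifier) saved with joblib.
import joblib

model = joblib.load("sentiment_pipeline.joblib")  # e.g. TF-IDF + LogisticRegression

def predict_sentiment(text: str) -> dict:
    """Return the predicted label plus per-class confidence scores."""
    proba = model.predict_proba([text])[0]
    scores = dict(zip(model.classes_, proba.round(3)))
    label = max(scores, key=scores.get)
    return {"label": label, "confidence": scores[label], "scores": scores}

print(predict_sentiment("This portfolio is fast and well built!"))
```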
Lessons learned from building and deploying ML systems
From single-container deployments to multi-service architectures. Docker Compose for ML, health checks, resource limits, and graceful shutdown patterns.
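The post itself works in Compose files; as a Python-side companion, here is a minimal sketch of the graceful-shutdown pattern it mentions, trapping the SIGTERM that `docker stop` sends. The worker loop is an illustrative stand-in for a real server.

```python
# Minimal sketch of graceful shutdown for a model worker: Docker sends
# SIGTERM on `docker stop`; finish in-flight work, then exit cleanly.
import signal
import time

shutting_down = False

def handle_sigterm(signum, frame):
    global shutting_down
    shutting_down = True  # stop accepting new work; let in-flight work finish

signal.signal(signal.SIGTERM, handle_sigterm)
signal.signal(signal.SIGINT, handle_sigterm)

while not shutting_down:
    # placeholder for: pull a request, run inference, return the result
    time.sleep(0.1)

print("drained in-flight requests, exiting cleanly")
```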
A decision framework for choosing between traditional ML and deep learning based on data, interpretability, latency, and maintenance overhead.
Chunking strategies, embedding model selection, re-ranking, and evaluation metrics that correlate with user satisfaction in production RAG pipelines.
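For a taste of the chunking discussion, here is a minimal fixed-size chunker with overlap, one common baseline strategy; the sizes are illustrative, not a recommendation from the post.

```python
# Minimal sketch of fixed-size chunking with overlap. Overlap keeps
# sentences cut at a chunk boundary alive in the next chunk.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into ~chunk_size-character chunks overlapping by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

doc = "RAG quality depends heavily on how documents are split. " * 40
print(len(chunk_text(doc)), "chunks")
```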
What I learned shipping my first 5 ML models. Error handling, input validation, logging, model versioning, and the unglamorous work that makes systems reliable.
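In that spirit, here is a minimal sketch of the validation-and-logging shape for a Flask predict endpoint; the route, field names, and limits are illustrative, not from a specific project.

```python
# Minimal sketch of the unglamorous parts: validate input, log, and fail
# with a clear error instead of a stack trace.
import logging
from flask import Flask, jsonify, request

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("predict")

@app.post("/predict")
def predict():
    payload = request.get_json(silent=True) or {}
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        log.warning("rejected request: missing or empty 'text'")
        return jsonify(error="'text' must be a non-empty string"), 400
    if len(text) > 10_000:
        return jsonify(error="'text' exceeds 10,000 characters"), 413
    log.info("scoring %d chars", len(text))
    # placeholder for: result = model.predict(text)
    return jsonify(label="positive", confidence=0.97)
```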
You don't need Feast or Tecton to start. Practical patterns for feature pipelines with DVC, SQLite, and simple Python scripts that scale.
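Here is a minimal sketch of the SQLite piece of that idea: an entity-keyed feature table written by a batch job and read back at serving time. The schema and names are illustrative.

```python
# Minimal sketch of a SQLite-backed feature table: a batch pipeline writes
# features, the serving path reads them back by entity key.
import sqlite3

conn = sqlite3.connect("features.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS user_features (
        user_id    TEXT,
        feature    TEXT,
        value      REAL,
        updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (user_id, feature)
    )
""")
conn.commit()

def write_feature(user_id: str, feature: str, value: float) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO user_features (user_id, feature, value) VALUES (?, ?, ?)",
        (user_id, feature, value),
    )
    conn.commit()

def read_features(user_id: str) -> dict[str, float]:
    rows = conn.execute(
        "SELECT feature, value FROM user_features WHERE user_id = ?", (user_id,)
    ).fetchall()
    return dict(rows)

write_feature("u42", "avg_order_value", 31.5)
print(read_features("u42"))  # {'avg_order_value': 31.5}
```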
Got a problem worth solving? I turn messy data into production systems. Reach out — I reply fast.
{
"name": "Shahin",
"role": "ML Engineer",
"email": "shahinok1912@gmail.com",
"github": "github.com/shahin5646",
"linkedin": "linkedin.com/in/shahin-5646-diu",
"status": "Open to opportunities",
"response_time": "< 24 hours",
"interests": [
"Production ML Systems",
"NLP & LLM Applications",
"Computer Vision Pipelines",
"MLOps & Infrastructure"
]
}
Have an idea, a project, or just want to say hi? Drop a note below.