Senior Data Scientist

Naveen Kumar Challa

Machine Learning · NLP · LLMs · Healthcare AI

5+ years building production-grade ML, deep learning, and LLM solutions in healthcare and IT analytics. Turning complex data into measurable clinical and business impact.

// At a Glance
Experience
5+
Years in Production ML & AI

Risk Stratification Accuracy
+15%
Improvement @ Humana

Manual Review Time
−30%
via LLM Fine-tuning

HIPAA Compliant · MLOps · AWS · Azure · GCP · Healthcare AI

Who I Am

A data scientist driven by the challenge of translating raw data into real-world healthcare impact.

Hi, I'm Naveen — a Senior Data Scientist currently at Humana, where I architect and deploy machine learning and LLM-powered systems on datasets spanning millions of healthcare records. My work sits at the intersection of AI engineering and clinical impact.

I specialize in NLP, predictive modeling, and MLOps, with a strong commitment to building AI that is explainable, reproducible, and HIPAA-compliant. I care deeply about the downstream effect of every model I ship.

Previously at Mphasis, I built end-to-end text pipelines, classification systems, and ETL infrastructure for enterprise healthcare analytics. I hold an M.S. in Computer Science from Auburn University at Montgomery.

5+
Years Experience
5M+
Records Modeled
90%+
Model Accuracy
3
Cloud Platforms
📍
Location
Open to Work — All Locations in the United States
🏥
Current Role
Senior Data Scientist — Humana
🎓
Education
M.S. in Computer Science, Auburn University at Montgomery
🛡️
Compliance
HIPAA-Compliant AI Development

Core Technical Skills

A battle-tested toolkit built across healthcare, enterprise, and cloud environments.

// ML & AI
Deep Learning: 93%
LLM Fine-tuning: 90%
Predictive Modeling: 95%
RAG Systems: 88%
// NLP
Text Classification: 92%
Named Entity Recognition: 89%
Semantic Search: 90%
Text Summarization: 91%
// Programming
Python: 97%
SQL (T-SQL / PL/SQL): 91%
R: 84%
Shell Scripting: 78%
// MLOps
MLflow / Kubeflow: 86%
Docker / Kubernetes: 83%
CI/CD Pipelines: 88%
AWS SageMaker: 85%
// Cloud Platforms
AWS: 88%
Azure (Synapse, ML Studio): 82%
GCP (BigQuery, Cloud AI): 80%
Snowflake: 79%
// Data & Visualization
Tableau / Power BI: 87%
Apache Spark: 84%
Databricks: 82%
A/B Testing & Stats: 90%

Professional Journey

Building intelligent systems that make healthcare smarter and safer.

Jun 2024 – Present
Senior Data Scientist
// Humana · USA
  • Developed and deployed ML models on datasets exceeding 5 million patient records, improving risk stratification prediction accuracy by 15%.
  • Fine-tuned GPT-4 and Llama-2 to automate extraction and summarization of unstructured clinical notes, reducing manual review time by 30%.
  • Designed clinical decision support models using advanced ML and LLMs, enabling timely interventions and reducing hospital readmission rates by 12%.
  • Built intelligent clinical chatbots integrated with vector databases (Pinecone, Weaviate) for semantic search and knowledge retrieval across multi-modal healthcare data.
  • Automated end-to-end data pipelines, feature engineering, and model retraining using Python, R, SQL, and cloud platforms — cutting manual effort by 20%.
  • Implemented real-time model performance monitoring via AWS CloudWatch and Datadog to proactively detect data drift and model decay.
  • Created interactive Tableau and Power BI dashboards to visualize patient trends and KPIs for clinical stakeholders.
Jul 2019 – Dec 2022
Data Scientist
// Mphasis · India
  • Designed and productionized text summarization and Q&A pipelines using GPT-3/4 and vector databases, enhancing semantic search on unstructured healthcare data.
  • Built and deployed classification, regression, and clustering models with Python, R, Scikit-learn, and Spark MLlib — achieving up to 88% accuracy in pilots.
  • Automated ETL and feature engineering pipelines using Airflow, Databricks, and Docker, reducing data processing time by 25%.
  • Applied LLM-powered data augmentation for NLP datasets, boosting annotation speed by 25% and improving model training throughput.
  • Developed custom NLP solutions using spaCy, NLTK, and Hugging Face Transformers for clinical text analytics.
  • Tuned models with grid search, cross-validation, and Optuna hyperparameter optimization, reducing overfitting and maximizing generalization.
  • Conducted root cause analysis on operational inefficiencies, delivering process improvements that saved 15% in time and resources.

Key Achievements

Measurable outcomes delivered at scale in real healthcare environments.

🏥
−12%
Reduced hospital readmission rates through clinical decision support models at Humana, enabling timely patient interventions.
📊
+15%
Improved patient risk stratification accuracy using ML models trained on 5M+ healthcare records at Humana.
−30%
Cut manual clinical note review time via LLM fine-tuning and automated summarization pipelines using GPT-4 and Llama-2.
🤖
+25%
Accelerated NLP dataset annotation speed using LLM-powered data augmentation at Mphasis, improving training throughput.
🔄
−20%
Reduced manual data processing effort through automated ETL pipelines and cloud infrastructure automation.
🎯
90%+
Maintained post-deployment model accuracy above 90% through SHAP- and LIME-based validation and HIPAA-compliant monitoring pipelines.

Full Technology Stack

Every tool I've used in production across data science, MLOps, and cloud.

Languages
Python · R · SQL (T-SQL) · PL/SQL · MATLAB · Shell Scripting
ML & Deep Learning
Scikit-learn · TensorFlow · PyTorch · Keras · XGBoost · LightGBM · Hugging Face · OpenAI API · spaCy · NLTK
LLMs & NLP
GPT-4 · Llama-2 · RAG · Pinecone · Weaviate · Semantic Search · NER · Topic Modeling · Text Summarization
MLOps & Data Engineering
MLflow · Apache Airflow · Apache Spark · Databricks · Docker · Kubernetes · Kubeflow · Vertex AI · SageMaker · Seldon Core · Feast · Great Expectations
Cloud & Databases
AWS (S3, EC2, Lambda, Glue) · Azure Synapse · GCP BigQuery · Snowflake · CloudWatch
Visualization & Stats
Tableau · Power BI · Matplotlib · Seaborn · Plotly · Dash · SHAP · LIME · Optuna · A/B Testing
Tools & Platforms
Jupyter · VS Code · Git / GitHub · Jira · Confluence · Weights & Biases · Neptune.ai · ClearML · Datadog

Let's Connect

Open to new opportunities, collaborations, or just a conversation about ML and AI in healthcare.

Whether you have a project in mind, want to discuss a role, or are curious about any of my work — feel free to reach out. I typically respond within 24 hours.