Senior Data Scientist

Naveen Kumar Challa

Machine Learning · NLP · LLMs · Healthcare AI

5+ years building production-grade ML, deep learning, and LLM solutions in healthcare and IT analytics. Turning complex data into measurable clinical and business impact.


// At a Glance

Experience

5+

Years in Production ML & AI

Risk Stratification Accuracy

+15%

Improvement @ Humana

Manual Review Time

−30%

via LLM Fine-tuning

HIPAA Compliant · MLOps · AWS · Azure · GCP · Healthcare AI

01 — About

Who I Am

A data scientist driven by the challenge of translating raw data into real-world healthcare impact.

Hi, I'm Naveen — a Senior Data Scientist currently at Humana, where I architect and deploy machine learning and LLM-powered systems on datasets spanning millions of healthcare records. My work sits at the intersection of AI engineering and clinical impact.

I specialize in NLP, predictive modeling, and MLOps, with a strong commitment to building AI that is explainable, reproducible, and HIPAA-compliant. I care deeply about the downstream effect of every model I ship.

Previously at Mphasis, I built end-to-end text pipelines, classification systems, and ETL infrastructure for enterprise healthcare analytics. I hold an M.S. in Computer Science from Auburn University, Montgomery.

5+

Years Experience

5M+

Records Modeled

90%+

Model Accuracy

Cloud Platforms

📍

Location

Open to Work — All Locations in the United States

🏥

Current Role

Senior Data Scientist — Humana

🎓

Education

M.S. Computer Science, Auburn University

🔗

LinkedIn

linkedin.com/in/naveen-kumar-challa7

🛡️

Compliance

HIPAA-Compliant AI Development

02 — Expertise

Core Technical Skills

A battle-tested toolkit built across healthcare, enterprise, and cloud environments.

// ML & AI

Deep Learning: 93%

LLM Fine-tuning: 90%

Predictive Modeling: 95%

RAG Systems: 88%

// NLP

Text Classification: 92%

Named Entity Recognition: 89%

Semantic Search: 90%

Text Summarization: 91%

// Programming

Python: 97%

SQL (T-SQL / PL-SQL): 91%

R: 84%

Shell Scripting: 78%

// MLOps

MLflow / Kubeflow: 86%

Docker / Kubernetes: 83%

CI/CD Pipelines: 88%

AWS SageMaker: 85%

// Cloud Platforms

AWS: 88%

Azure (Synapse, ML Studio): 82%

GCP (BigQuery, Cloud AI): 80%

Snowflake: 79%

// Data & Visualization

Tableau / Power BI: 87%

Apache Spark: 84%

Databricks: 82%

A/B Testing & Stats: 90%

03 — Experience

Professional Journey

Building intelligent systems that make healthcare smarter and safer.

Jun 2024 – Present

Senior Data Scientist

// Humana · USA

Developed and deployed ML models on datasets exceeding 5 million patient records, improving risk stratification prediction accuracy by 15%.
Fine-tuned GPT-4 and Llama-2 to automate extraction and summarization of unstructured clinical notes, reducing manual review time by 30%.
Designed clinical decision support models using advanced ML and LLMs, enabling timely interventions and reducing hospital readmission rates by 12%.
Built intelligent clinical chatbots integrated with vector databases (Pinecone, Weaviate) for semantic search and knowledge retrieval across multi-modal healthcare data.
Automated end-to-end data pipelines, feature engineering, and model retraining using Python, R, SQL, and cloud platforms — cutting manual effort by 20%.
Implemented real-time model performance monitoring via AWS CloudWatch and Datadog to proactively detect data drift and model decay.
Created interactive Tableau and Power BI dashboards to visualize patient trends and KPIs for clinical stakeholders.

Jul 2019 – Dec 2022

Data Scientist

// Mphasis · India

Designed and productionized text summarization and Q&A pipelines using GPT-3/4 and vector databases, enhancing semantic search on unstructured healthcare data.
Built and deployed classification, regression, and clustering models with Python, R, Scikit-learn, and Spark MLlib — achieving up to 88% accuracy in pilots.
Automated ETL and feature engineering pipelines using Airflow, Databricks, and Docker, reducing data processing time by 25%.
Applied LLM-powered data augmentation for NLP datasets, boosting annotation speed by 25% and improving model training throughput.
Developed custom NLP solutions using spaCy, NLTK, and Hugging Face Transformers for clinical text analytics.
Tuned models with grid search, cross-validation, and Optuna hyperparameter optimization to reduce overfitting and improve generalization.
Conducted root cause analysis on operational inefficiencies, delivering process improvements that saved 15% in time and resources.

04 — Impact

Key Achievements

Measurable outcomes delivered at scale in real healthcare environments.

🏥

−12%

Reduced hospital readmission rates through clinical decision support models at Humana, enabling timely patient interventions.

📊

+15%

Improved patient risk stratification accuracy using ML models trained on 5M+ healthcare records at Humana.

⚡

−30%

Cut manual clinical note review time via LLM fine-tuning and automated summarization pipelines using GPT-4 and Llama-2.

🤖

+25%

Accelerated NLP dataset annotation speed using LLM-powered data augmentation at Mphasis, improving training throughput.

🔄

−20%

Reduced manual data processing effort through automated ETL pipelines and cloud infrastructure automation.

🎯

90%+

Maintained post-deployment model accuracy above 90% through SHAP- and LIME-based validation and HIPAA-compliant monitoring pipelines.

05 — Stack

Full Technology Stack

Every tool I've used in production across data science, MLOps, and cloud.

Languages

Python · R · SQL (T-SQL) · PL/SQL · MATLAB · Shell Scripting

ML & Deep Learning

Scikit-learn · TensorFlow · PyTorch · Keras · XGBoost · LightGBM · Hugging Face · OpenAI API · spaCy · NLTK

LLMs & NLP

GPT-4 · Llama-2 · RAG · Pinecone · Weaviate · Semantic Search · NER · Topic Modeling · Text Summarization

MLOps & Data Engineering

MLflow · Apache Airflow · Apache Spark · Databricks · Docker · Kubernetes · Kubeflow · Vertex AI · SageMaker · Seldon Core · Feast · Great Expectations

Cloud & Databases

AWS (S3, EC2, Lambda, Glue) · Azure Synapse · GCP BigQuery · Snowflake · CloudWatch

Visualization & Stats

Tableau · Power BI · Matplotlib · Seaborn · Plotly · Dash · SHAP · LIME · Optuna · A/B Testing

Tools & Platforms

Jupyter · VS Code · Git / GitHub · Jira · Confluence · Weights & Biases · Neptune.ai · ClearML · Datadog

06 — Contact

Let's Connect

Open to new opportunities, collaborations, or just a conversation about ML and AI in healthcare.

Whether you have a project in mind, want to discuss a role, or are curious about any of my work — feel free to reach out. I typically respond within 24 hours.

📧

naveenkc@careerattainment.com

🔗

linkedin.com/in/naveen-kumar-challa7
