Shikha Bordia

Professional Background

I am a Machine Learning Engineer with extensive experience developing scalable NLP, information extraction, and generative AI systems for production. My work focuses on architecting high-performance pipelines, designing evaluation frameworks, and driving reliability across end-to-end ML workflows. I hold multiple U.S. patents and have published in leading conferences, including NAACL, EMNLP, and SDP. My research in debiasing, summarization, and language model analysis has contributed to major responsible AI toolkits from Microsoft, Google, Stanford, and CVS Health.

At Verisk Analytics, I have worked on Discovery Navigator, a leading medical record review platform that processes large volumes of clinical documents. My work spans medical record understanding, legal concept search, and multi-hop fact retrieval. I collaborate closely with clinical experts, product teams, and ML Ops to build systems that significantly reduce manual effort and enhance decision quality. I previously earned my MS in Computer Science at NYU’s Courant Institute, working with the ML^2 Group Lab on bias, linguistic analysis, and model interpretability.

Beyond my professional work, I am deeply passionate about yoga, dancing, and biking—practices that bring balance, energy, and creativity to my life. I am also a proud mom to a toddler whose curiosity and joy shape the way I approach learning, resilience, and the meaningful impact of my work.

Publications/Projects

Bonafide at LegalLens 2024 Shared Task: Using Lightweight DeBERTa Based Encoder For Legal Violation Detection and Resolution
Shikha Bordia*
[Paper]

HoVer: A Dataset for Many-Hop Fact Extraction And Claim Verification
Yichen Jiang*, Shikha Bordia*, Zheng Zhong, Charles Dognin, Maneesh Singh, Mohit Bansal
Findings of EMNLP 2020
[Paper][HoVer Leaderboard]

Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs
Alex Warstadt*, Yu Cao*, Ioana Grosu*, Wei Peng*, Hagen Blix*, Yining Nie*, Anna Alsop*, Shikha Bordia*, Haokun Liu*, Alicia Parrish*, Sheng-Fu Wang*, Jason Phang*, Anhad Mohananey*, Phu Mon Htut*, Paloma Jeretic*, Samuel R. Bowman
Proceedings of EMNLP 2019
[Paper][Slides][Talk]

Identifying and Reducing Gender Bias in Word-Level Language Models
Shikha Bordia and Samuel R. Bowman
NAACL Student Research Workshop, 2019
[Paper][Slides][Poster][Talk]

On Measuring Social Biases in Sentence Encoders
Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, Rachel Rudinger
NAACL 2019
[Paper][Talk]

Do Attention Heads in BERT Track Syntactic Dependencies?
Phu Mon Htut*, Jason Phang*, Shikha Bordia*, Samuel R. Bowman
Natural Language, Dialog and Speech (NDS) Symposium, The New York Academy of Sciences, 2019 (Extended Abstract)
[Paper][Poster][Blog]

Contributed to jiant
jiant is a work-in-progress software toolkit for natural language processing research, designed to facilitate work on multitask learning and transfer learning for sentence understanding tasks.

Patents

Machine learning systems and methods for interactive concept searching using attention scoring (US 11550782, 2023)

Machine learning systems and methods for many-hop fact extraction and claim verification (US 12406150, 2025)

Systems and Methods for Machine Learning From Medical Records (Accepted)