Welcome to my homepage! I am Kai Sun (孙锴). I am currently a research scientist at Meta Reality Labs. I obtained my Ph.D. in Computer Science from Cornell University in 2021. My advisor was Claire Cardie. Before that, I did my undergraduate studies in Computer Science, in the ACM Honors Class 2011 at Shanghai Jiao Tong University.
My research interests lie broadly in artificial intelligence (AI), especially natural language processing (NLP) and AI in board games.
(*: equal contribution)
Shicheng Liu, Kai Sun, Lisheng Fu, Xilun Chen, Xinyuan Zhang, Zhaojiang Lin, Rulin Shao, Yue Liu, Anuj Kumar, Wen-tau Yih, and Xin Luna Dong. SCRIBES: Web-Scale Script-Based Semi-Structured Data Extraction with Reinforcement Learning. The Fourteenth International Conference on Learning Representations (ICLR). 2026. arXiv
Zhaojiang Lin*, Yong Xu*, Kai Sun*, Jing Zheng, Yin Huang, Surya Teja Appini, Krish Narang, Renjie Tao, Ishan Kapil Jain, Siddhant Arora, Ruizhi Li, Yiteng Huang, Kaushik Patnaik, Wenfang Xu, Suwon Shon, Yue Liu, Ahmed A Aly, Anuj Kumar, Florian Metze, and Xin Luna Dong. WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables. The Fourteenth International Conference on Learning Representations (ICLR). 2026. arXiv
Kai Zhang, Xinyuan Zhang, Ejaz Ahmed, Hongda Jiang, Caleb Kumar, Kai Sun, Zhaojiang Lin, Sanat Sharma, Shereen Oraby, Aaron Colak, Ahmed Aly, Anuj Kumar, Xiaozhong Liu, and Xin Luna Dong. AssoMem: Scalable Memory QA with Multi-Signal Associative Retrieval. The Fourteenth International Conference on Learning Representations (ICLR). 2026. arXiv
Kai Sun, Yin Huang, Srishti Mehra, Mohammad Kachuee, Xilun Chen, Renjie Tao, Zhaojiang Lin, Andrea Jessee, Nirav Shah, Alex Betty, Yue Liu, Anuj Kumar, Wen-tau Yih, and Xin Luna Dong. Knowledge Extraction on Semi-Structured Content: Does It Remain Relevant for Question Answering in the Era of LLMs? The 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL). 2026. arXiv, data
Wang Bill Zhu, Deqing Fu, Kai Sun, Yi Lu, Zhaojiang Lin, Seungwhan Moon, Kanika Narang, Mustafa Canim, Yue Liu, Anuj Kumar, and Xin Luna Dong. VisualLens: Personalization through Task-Agnostic Visual History. The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS). 2025. arXiv
Yushi Sun, Kai Sun, Yifan Ethan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, and Lei Chen. KERAG: Knowledge-Enhanced Retrieval-Augmented Generation for Advanced Question Answering. Findings of the Association for Computational Linguistics: EMNLP 2025 (EMNLP Findings). 2025. arXiv
Mohammad Kachuee, Teja Gollapudi, Minseok Kim, Yin Huang, Kai Sun, Xiao Yang, Jiaqi Wang, Nirav Shah, Yue Liu, Aaron Colak, Anuj Kumar, Wen-tau Yih, and Xin Luna Dong. PrismRAG: Boosting RAG Factuality with Distractor Resilience and Strategized Reasoning. The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP) Industry Track. 2025. arXiv
Yushi Sun, Kai Sun, Xiao Yang, and Nan Tang. Knowledge Internalized in LLMs. In Handbook on Neurosymbolic AI and Knowledge Graphs, IOS Press, 2025.
Xiao Yang*, Yifan Ethan Xu*, Kai Sun*, Jiaqi Wang*, Lingkun Kong, Wen-tau Yih, and Xin Luna Dong. KDD Cup CRAG competition: Systems, Findings and Learnings. IEEE Data Engineering Bulletin. 2024.
Xiao Yang*, Kai Sun*, Hao Xin*, Yushi Sun*, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, Wen-tau Yih, and Xin Luna Dong. CRAG -- Comprehensive RAG Benchmark. 38th Conference on Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks. 2024. arXiv, data & code
Yushi Sun, Hao Xin, Kai Sun, Yifan Ethan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, and Lei Chen. Are Large Language Models a Good Replacement of Taxonomies? The 50th International Conference on Very Large Data Bases (VLDB). 2024. arXiv, data & code
Kai Sun, Yifan Ethan Xu, Hanwen Zha, Yue Liu, and Xin Luna Dong. Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs? Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2024. arXiv, code
Kai Sun. Digital Asset Valuation: A Study on Domain Names, Email Addresses, and NFTs. 2022. arXiv, data & code
Kai Sun*, Dian Yu*, Jianshu Chen, Dong Yu, and Claire Cardie. Improving Machine Reading Comprehension with Contextualized Commonsense Knowledge. Annual Meeting of the Association for Computational Linguistics (ACL). 2022. arXiv, code
Kai Sun. Machine Reading Comprehension: Challenges and Approaches. Ph.D. Thesis. 2021. pdf
Dian Yu, Kai Sun, Dong Yu, and Claire Cardie. Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data. Findings of the Association for Computational Linguistics: EMNLP 2021 (EMNLP Findings). 2021. arXiv, data
Kai Sun, Seungwhan Moon, Paul Crook, Stephen Roller, Becka Silvert, Bing Liu, Zhiguang Wang, Honglei Liu, Eunjoon Cho, and Claire Cardie. Adding Chit-Chat to Enhance Task-Oriented Dialogues. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2021. arXiv, data & code
Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong, Weitang Liu, Bo Shi, Yiming Cui, Junyi Li, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Zhe Zhao, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Kyle Richardson, and Zhenzhong Lan. CLUE: A Chinese Language Understanding Evaluation Benchmark. The 28th International Conference on Computational Linguistics (COLING). 2020. arXiv
Dian Yu*, Kai Sun*, Claire Cardie, and Dong Yu. Dialogue-Based Relation Extraction. Annual Meeting of the Association for Computational Linguistics (ACL). 2020. arXiv, data & code, project page
Kai Sun, Dian Yu, Dong Yu, and Claire Cardie. Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension. Transactions of the Association for Computational Linguistics (TACL). 2020. arXiv, data & code, project page
Xiaoman Pan*, Kai Sun*, Dian Yu, Jianshu Chen, Heng Ji, Claire Cardie, and Dong Yu. Improving Question Answering with External Knowledge. EMNLP Workshop on Machine Reading for Question Answering (MRQA). 2019. arXiv, resource
Hai Wang, Dian Yu, Kai Sun, Jianshu Chen, Dong Yu, David McAllester, and Dan Roth. Evidence Sentence Extraction for Machine Reading Comprehension. The SIGNLL Conference on Computational Natural Language Learning (CoNLL). 2019. arXiv, resource
Hai Wang, Dian Yu, Kai Sun, Jianshu Chen, and Dong Yu. Improving Pre-Trained Multilingual Model with Vocabulary Expansion. The SIGNLL Conference on Computational Natural Language Learning (CoNLL). 2019. arXiv
Kai Sun, Dian Yu, Dong Yu, and Claire Cardie. Improving Machine Reading Comprehension with General Reading Strategies. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). 2019. arXiv, code
Kai Sun, Dian Yu, Jianshu Chen, Dong Yu, Yejin Choi, and Claire Cardie. DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension. Transactions of the Association for Computational Linguistics (TACL). 2019. arXiv, data & code, leaderboard
Kai Sun and Claire Cardie. Cornell Belief and Sentiment System at TAC 2017. Text Analysis Conference (TAC). 2017.
Mohamed Al-Badrashiny, Jason Bolton, Arun Tejasvi Chaganty, Kevin Clark, Craig Harman, Lifu Huang, Matthew Lamm, Jinhao Lei, Di Lu, Xiaoman Pan, Ashwin Paranjape, Ellie Pavlick, Haoruo Peng, Peng Qi, Pushpendre Rastogi, Abigail See, Kai Sun, Max Thomas, Chen-Tse Tsai, Hao Wu, Boliang Zhang, Chris Callison-Burch, Claire Cardie, Heng Ji, Christopher Manning, Smaranda Muresan, Owen C. Rambow, Dan Roth, Mark Sammons, and Benjamin Van Durme. TinkerBell: Cross-lingual Cold-Start Knowledge Base Construction. Text Analysis Conference (TAC). 2017.
Vlad Niculae, Kai Sun, Xilun Chen, Yao Cheng, Xinya Du, Esin Durmus, Arzoo Katiyar, and Claire Cardie. Cornell Belief and Sentiment System at TAC 2016. Text Analysis Conference (TAC). 2016.
Kai Sun, Su Zhu, Lu Chen, Siqiu Yao, Xueyang Wu, and Kai Yu. Hybrid Dialogue State Tracking for Real World Human-to-Human Dialogues. Conference of the International Speech Communication Association (Interspeech). 2016. pdf, bib
Kai Sun, Qizhe Xie, and Kai Yu. Recurrent Polynomial Network for Dialogue State Tracking. Dialogue and Discourse (D&D). 2016. pdf, bib
Kai Yu, Lu Chen, Kai Sun, Su Zhu, and Qizhe Xie. Evolvable Dialogue State Tracking for Statistical Dialogue Management. Frontiers of Computer Science. 2015. pdf, bib
Kai Yu, Kai Sun, Lu Chen, and Su Zhu. Constrained Markov Bayesian Polynomial for Efficient Dialogue State Tracking. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP). 2015. pdf, bib
Qizhe Xie, Kai Sun, Su Zhu, Lu Chen, and Kai Yu. Recurrent Polynomial Network for Dialogue State Tracking with Mismatched Semantic Parsers. 16th Annual SIGdial Meeting on Discourse and Dialogue (SIGdial). 2015. pdf, bib
Kai Yu, Lu Chen, Bo Chen, Kai Sun, and Su Zhu. Cognitive Technology in Task-Oriented Dialogue Systems -- Concepts, Advances and Future. Chinese Journal of Computers. 2014. pdf, bib
Su Zhu, Lu Chen, Kai Sun, Da Zheng, and Kai Yu. Semantic Parser Enhancement for Dialogue Domain Extension with Little Data. IEEE Spoken Language Technology Workshop (SLT). 2014. pdf, bib
Kai Sun, Lu Chen, Su Zhu, and Kai Yu. A Generalized Rule Based Tracker for Dialogue State Tracking. IEEE Spoken Language Technology Workshop (SLT). 2014. pdf, bib
Kai Sun, Lu Chen, Su Zhu, and Kai Yu. The SJTU System for Dialog State Tracking Challenge 2. 15th Annual SIGdial Meeting on Discourse and Dialogue (SIGdial). 2014. pdf, bib
I designed Yixin, an AI program that plays Gomoku and Renju. It was the world champion AI, winning the 13th, 14th, 15th, 16th, 17th, 18th, and 19th Gomocup.
Yixin was the first Gomoku and Renju AI able to compete at the human champion level. It beat Taiwan's Meijin title holder Lin Shu-Hsuan and the world Gomoku champion Rudolf Dupszki in 2017, and drew with the world Renju champion Qi Guan in 2018.
I have been managing Gomocup (with Tianyi Hao) since the 17th Gomocup in 2016.
Email: ks985 [at] cornell [dot] edu
Last updated: Jan 2026