Structure-Aware Multimodal Reasoning
Retrieval- and generation-oriented representations that preserve layout, section hierarchy, table–figure relations, cross-page dependencies, and provenance.
Multimodal Senior Researcher at Human-Inspired AI Research; Founder & Lead of KUDoc
I research structure-aware multimodal reasoning: methods that recover document structure (layout, tables, figures, hierarchy, cross-page dependencies, and provenance) and turn complex documents into auditable evidence for reliable retrieval, grounded generation, and long-context reasoning. My first- and co-first-author work spans hierarchical retrieval, multimodal dependency parsing, structure-preserving chunking, and domain-specific RAG. My current manuscripts extend this agenda toward evidence auditing, multimodal QA evaluation, and evidence-centric memory for reliable multimodal agents.
I completed my B.S. in Computer Science and Engineering (advised by Prof. Jeongu Kim) and my M.S. in Artificial Intelligence (advised by Prof. Hyuk-Chul Kwon) at Pusan National University, where I began NLP research in Prof. Kwon's AI Research Lab in 2020. I then joined Korea University's Human-Inspired AI Research Group, co-advised by Prof. Jaehyung Seo and Prof. Heuiseok Lim. There I founded and now lead KUDoc, and I was promoted to Senior Researcher.
With my collaborators, I have produced 13 publications and have 13 more manuscripts under review at top-tier venues, including 4 first-author papers at ACL, CVPR, and EMNLP. This work has also yielded 5 patents, 4 industry projects with 3 technology transfers, and 7 awards.
I am seeking Ph.D. opportunities with advisors working on multimodal reasoning, structured memory, world models, and planning for future agents. If these directions align with your research, please feel free to reach out via email or LinkedIn.
HiKEY was accepted at ACL 2026 Main as an Oral (First Author).
M3DocDep was accepted at CVPR 2026 Main (First Author).
MultiDocFusion was accepted at EMNLP 2025 Main (First Author).
Appointed to the National Representative K-AI Research Team (with NC AI).
StyleDFS was accepted at EMNLP 2024 Industry (Co-First Author).
Retrieval- and generation-oriented representations that preserve layout, section hierarchy, table–figure relations, cross-page dependencies, and provenance.
LVLM-based parsing and dependency modeling for long, noisy, multi-page documents.
Claim-to-evidence linking, evidence coverage analysis, and support-sensitive QA evaluation.
Across ACL, CVPR, and EMNLP (2).
Translated research ideas into protected and deployable AI methods.
Across multimodal reasoning, foundation models, and applied AI collaborations.
Recognized across research, industry collaboration, and applied AI competitions.
Structure-aware retrieval, multimodal RAG, and production-facing document workflows.
Korean and multilingual foundation-model adaptation, data pipelines, post-training, and evaluation.
Applied AI collaborations spanning Korean speech, education, and geoscience.