About

Why I research structured evidence and reliable reasoning.

01 Origin

From the Wrestling Mat to Computer Science

As a former ssireum (Korean wrestling) athlete, I made a late but determined leap into computer science at Pusan National University, driven by a desire to create technology that serves real industries and everyday life — technology I could call my own.

Yet the undergraduate curriculum, centered on theory from dated textbooks, couldn't quench that thirst. I threw myself into internships and startups, where hands-on projects exposed the limits of my knowledge and made the need for deeper learning painfully clear.

02 Foundation

The Philosophy of Parsing

Searching for answers, I joined the lab of Professor Hyuk-Chul Kwon — one of Korea's first AI Ph.D.s and an authority on parsing — first as an undergraduate researcher and then as a master's student. Over two and a half intensive years, I immersed myself in the foundations of natural language processing and the philosophy that understanding the intrinsic structure hidden within unstructured data is the key to trustworthy AI.[1][2]

This was the defining period that taught me a crucial lesson: for technology to be trusted and used in real life, it must go beyond surface-level data manipulation and be grounded in a deep, logical understanding of structure.

03 Present

Trustworthy Structuring at Scale

Building on this foundation, I joined the Human-Inspired AI Research Group at Korea University, where I was co-advised by Prof. Jaehyung Seo and Prof. Heuiseok Lim as I broadened my scope from sentence-level parsing to document-level structure understanding. Over the past three years, the core of my work has been expanding the scope of Trustworthy Structuring — from text and table QA[3][4] to industrial-document-level structure recovery[5], hierarchical multimodal retrieval[6], and retrieval-augmented generation[7].

Today I focus on unifying heterogeneous unstructured inputs — text, tables, images — into Structured Evidence that enables reliable long-context reasoning[8]. Beyond short-term metric gains, I am developing fundamental mechanisms that allow AI to audit its own reasoning process, demonstrating the explainability of multimodal document AI.

To tackle real-world challenges beyond what a single researcher could address, I founded KUDoc — the Document AI research group at Korea University — on my own initiative. I now lead and advise KUDoc, and together we have published 12 top-tier papers (including 4 as first author at ACL, CVPR, and EMNLP) and completed 4 major industry–academia projects[9] with 3 technology transfers, ensuring that academic results translate directly into practical value. This leadership earned me a promotion to Senior Researcher.

04 Vision

Beyond Documents — Toward World Structure

I aim to pursue a Ph.D. to extend my research horizon beyond document structure toward Structured Memory, World Models, and Planning for Future Agents. Just as I extracted structured evidence from unstructured data, I now seek to model the structure of a dynamic world beyond static documents.

With the tenacity of an athlete who never yields on the wrestling mat, I will build solid structures from sparse data and grow into a scholar leading world-class AI research.

References

  1. Neural Symbolic Models for Overcoming Deep Learning Limitations and Korean Dependency Parsing. Joongmin Shin*, Hyuk-Chul Kwon. KIISE 2023.
  2. A Dependency Parsing Model Applying Enhanced Dominant-Dependent Constraint Rules. Joongmin Shin*, Hyuk-Chul Kwon. KSC 2022.
  3. Multi-Paragraph Machine Reading Comprehension with Hybrid Reader over Tables and Text. Sanghyun Cho*, SeongReol Park, et al. Applied AI Journal, 2024.
  4. Evaluation of Korean Machine Reading Comprehension Generalization Performance. Joongmin Shin*, Sanghyun Cho, et al. KSC 2022.
  5. M3DocDep: Multi-modal, Multi-page, Multi-document Dependency Chunking with Large Vision-Language Models. Joongmin Shin*, Jeongbae Park, et al. CVPR 2026.
  6. HiKEY: Hierarchical Multimodal Retrieval for Open-Domain Document Question Answering. Joongmin Shin*, Gyuho Shim, et al. ACL 2026.
  7. MultiDocFusion: Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents. Joongmin Shin*, Chanjun Park, et al. EMNLP 2025.
  8. Intelligent Predictive Maintenance RAG Framework for Power Plants. Seongtae Hong*, Joongmin Shin*, et al. EMNLP 2024.
  9. DocGraph Copilot: Multimodal Document Chunking for Industrial RAG. Industry Project, Korea University.