HiKEY: Hierarchical Multimodal Retrieval for Open-Domain Document Question Answering
My publications focus on transforming heterogeneous unstructured inputs into structured evidence, on reliable long-context reasoning, and on robustness diagnostics. Accepted and published work appears first; anonymous manuscripts under review and future-facing research directions are separated below.
Hierarchical retrieval for multimodal document QA with structured evidence assembly.
LVLM-based dependency chunking that reconstructs cross-page structure for long-document retrieval and QA.
A hierarchical multimodal chunking pipeline that preserves layout and improves evidence composition in industrial RAG.
Domain-specific RAG framework for scientific and industrial QA; led to two technology transfers.
Graph-aware lexical embedding model that injects structured knowledge into vector representations.
Hybrid reader model that jointly processes text and tables for multi-paragraph machine reading comprehension.
Comparative analysis of zero-shot prompting vs. RAG for GPT models, demonstrating the benefits of evidence-grounded generation.
Investigated language-specific influences on LLM performance across Korean benchmarks.
QA-pair passage construction method for Korean RAG chatbots, reducing hallucination in domain-specific settings.
Neural-symbolic parser integrating linguistic constraints to overcome deep learning limitations in dependency parsing.
Proposed encoder-decoder embedding architecture for T5, improving upon encoder-only approaches.
Identified domain generalization limitations in tabular MRC models through cross-validation analysis.
Expanded neural-symbolic constraint rules from 2 to 24, achieving state-of-the-art dependency parsing accuracy.
Transformer-augmented dependency parser with rule-based probability control for improved parsing accuracy.
Applied continual learning to Korean MRC, addressing catastrophic forgetting in sequential model training.
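To illustrate the retrieval idea behind the QA-pair passage construction work above, here is a minimal sketch (not the published method): passages are indexed by synthetic questions, the user query is matched against those questions, and the paired answer passage is returned as grounded evidence. The toy Jaccard similarity and all example passages are assumptions for illustration; a real system would use a learned embedding model.

```python
# Illustrative sketch of QA-pair passage retrieval for RAG
# (hypothetical example data; toy lexical similarity, not the paper's model).
from dataclasses import dataclass

@dataclass
class QAPassage:
    question: str  # synthetic question generated for the passage
    answer: str    # the passage itself, returned as evidence

def tokenize(text: str) -> set[str]:
    return set(text.lower().split())

def overlap_score(query: str, question: str) -> float:
    """Toy lexical similarity: Jaccard overlap of token sets."""
    q, p = tokenize(query), tokenize(question)
    return len(q & p) / len(q | p) if q | p else 0.0

def retrieve(query: str, index: list[QAPassage], top_k: int = 1) -> list[str]:
    """Rank QA passages by query-question similarity; return top answers."""
    ranked = sorted(index, key=lambda p: overlap_score(query, p.question),
                    reverse=True)
    return [p.answer for p in ranked[:top_k]]

index = [
    QAPassage("What is the warranty period for the product?",
              "The product is covered by a two-year limited warranty."),
    QAPassage("How do I reset the device to factory settings?",
              "Hold the power button for ten seconds to reset the device."),
]

print(retrieve("what is the warranty period", index))
```

Matching queries against questions rather than raw passages narrows the lexical gap between how users ask and how documents state facts, which is one way such construction can reduce hallucination in domain-specific chatbots.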
Entity-linked assertion graphs unify text, tables, and figures into a single evidence representation for open-domain multimodal QA.
A validated intermediate representation maps instructions into executable operations, task-critical arguments, and prerequisite/dataflow structure.
A matched-reader audit framework separates real evidence use from evaluation illusions in multimodal QA.
A matched-intervention diagnostic framework analyzes how upstream parsing uncertainty propagates across representation families.
Current research directions include actionable ontologies for multimodal agent memory, video-structured memory for long-horizon reasoning, and planning over structured world representations.