HiKEY: Hierarchical Multimodal Retrieval for Open-Domain Document Question Answering
Top Conferences
My publications focus on three themes: transforming heterogeneous unstructured inputs into structured evidence, reliable long-context reasoning, and robustness diagnostics. Accepted and published work appears first; anonymous manuscripts under review and future-facing research directions are separated below.
Hierarchical retrieval for multimodal document QA with structured evidence assembly.
LVLM-based dependency chunking that reconstructs cross-page structure for long-document retrieval and QA.
A hierarchical multimodal chunking pipeline that preserves layout and improves evidence composition in industrial RAG.
Domain-specific RAG framework for scientific and industrial QA; led to two technology transfers.
Graph-aware lexical embedding model that injects structured knowledge into vector representations.
Hybrid reader model that jointly processes text and tables for multi-paragraph machine reading comprehension.
Comparative analysis of zero-shot vs. RAG for GPT models, demonstrating benefits of evidence-grounded generation.
Investigated language-specific influences on LLM performance across Korean benchmarks.
QA-pair passage construction method for Korean RAG chatbots, reducing hallucination in domain-specific settings.
Neural-symbolic parser integrating linguistic constraints to overcome deep learning limitations in dependency parsing.
Proposed encoder-decoder embedding architecture for T5, improving upon encoder-only approaches.
Identified domain generalization limitations in tabular MRC models through cross-validation analysis.
Expanded neural-symbolic constraint rules from 2 to 24, achieving state-of-the-art dependency parsing.
Transformer-augmented dependency parser with rule-based probability control for improved parsing accuracy.
Applied continual learning to Korean MRC, addressing catastrophic forgetting in sequential model training.
Nine first-author and four co-author manuscripts are currently under peer review at top-tier venues (titles anonymized per double-blind submission policy).
Entity-linked assertion graphs unify text, tables, and figures into a single evidence representation for open-domain multimodal QA.
Joint measurement of retrieval relevance, evidence breadth, faithfulness, latency, memory, and cost for fair chunker comparison.
Demonstrates that final-answer correctness alone is not evidence of grounding; proposes diagnostics on what an agent commits to memory.
Quantitative framing, boundary measurement, and practical mitigation tools for agentic AI memory-write vulnerabilities.
Shifts video QA/RAG evaluation beyond final-answer accuracy by auditing which timestamped segments were actually consumed.
A matched-reader audit framework separates real evidence use from evaluation illusions in multimodal QA.
A matched-intervention diagnostic framework analyzing how upstream parsing uncertainty propagates across PDF-to-RAG representation families.
Controllable repair, recalibration, and materialization of retrieved candidates before reader-context packing.
Attribution analysis of how OCR quality, evidence placement, answer policy, and reader family interact to affect answer-quality gains.
A multimodal multi-document benchmark with page-level annotations for open-domain document QA.
Comprehensive survey of VQA methods, benchmarks, and evaluation paradigms.
Survey and unified audit framework for the reliability and safety of multimodal agent systems.
Retrieval- and reasoning-guidance methods for reasoning-efficient agentic RAG systems.
Current research directions include actionable ontologies for multimodal agent memory, video-structured memory for long-horizon reasoning, and planning over structured world representations.