Dynamic-KGQA
A Dynamic Knowledge Graph Question Answering Framework
About Dynamic-KGQA
In this work, we introduce Dynamic-KGQA, a scalable framework for generating adaptive QA datasets from knowledge graphs (KGs), designed to mitigate memorization risks while maintaining statistical consistency across iterations. Unlike fixed benchmarks, Dynamic-KGQA generates a new dataset variant on every run while preserving the underlying distribution, enabling fair and reproducible evaluations.
Furthermore, our framework provides fine-grained control over dataset characteristics, supporting domain-specific and topic-focused QA dataset generation.
Additionally, Dynamic-KGQA produces compact, semantically coherent subgraphs that facilitate both training and evaluation of KGQA models, enhancing their ability to leverage structured knowledge effectively.
Citation
If you use the Dynamic-KGQA dataset or codebase in your work, please cite the following paper:
@article{2025dynamickgqa,
title={Dynamic-KGQA: A Dynamic Knowledge Graph Question Answering Framework},
author={Preetam Prabhu Srikar Dammu and Himanshu Naidu and Chirag Shah},
journal={arXiv preprint arXiv:2109.03893},
year={2025}
}
You can also cite the dataset directly using its DOI.
Knowledge Graph Hosting
While any knowledge graph can be used with the Dynamic-KGQA framework, we built our dataset on the YAGO knowledge graph. YAGO is available for download from the YAGO website.
There are multiple ways to host the YAGO knowledge graph for use with Dynamic-KGQA. Instructions for hosting it in a Blazegraph Docker container are provided in our GitHub repository.
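Once a Blazegraph container is running, the KG can be queried over SPARQL. The sketch below, using only the Python standard library, builds such a request; the port and the `/blazegraph/namespace/kb/sparql` path are Blazegraph's defaults and may need adjusting to match how you actually deployed the container.

```python
# Sketch: preparing a SPARQL query against a locally hosted Blazegraph
# instance. The endpoint URL below is an assumption (Blazegraph's default
# namespace path); adjust it to your own deployment.
from urllib.parse import urlencode
from urllib.request import Request

BLAZEGRAPH_ENDPOINT = "http://localhost:9999/blazegraph/namespace/kb/sparql"

def build_sparql_request(query: str) -> Request:
    """Build a POST request asking the endpoint for JSON-formatted results."""
    data = urlencode({"query": query}).encode("utf-8")
    return Request(
        BLAZEGRAPH_ENDPOINT,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )

# Example: fetch a handful of triples to confirm the KG loaded correctly.
req = build_sparql_request("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5")
# from urllib.request import urlopen
# with urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

The request body and `Accept` header follow the standard SPARQL 1.1 Protocol, so the same snippet should work against other SPARQL endpoints by swapping the URL.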
Dataset Format
The public dataset created by the Dynamic-KGQA framework is available in the Hugging Face Datasets format. You can load it using the following code snippet:
from datasets import load_dataset
dataset = load_dataset("preetam7/dynamic_kgqa")
The dataset contains the following columns:
- id: The unique identifier for the QA pair.
- question: The input question text.
- answer: The answer text (typically a YAGO entity).
- answer_readable: The human-readable answer text.
- answer_uri: The YAGO URI of the answer entity.
- supporting_facts: The supporting YAGO triples for the answer, in the form of knowledge graph triple labels.
- supporting_facts_uri: The URIs of the supporting YAGO triples for the answer.
- subgraph: The subgraph used to generate the QA pair.
- subgraph_size: The size of the subgraph used to generate the QA pair.
- logical_structure_flag_n: Flag by the nth LLM-as-judge, indicating whether the QA pair has a logical structure.
- logical_structure_reasoning_n: Explanation of the logical structure flag by the nth LLM-as-judge.
- redundancy_flag_n: Flag by the nth LLM-as-judge, indicating whether the QA pair is trivial or the question includes the answer.
- redundancy_reasoning_n: Explanation of the redundancy flag by the nth LLM-as-judge.
- answer_support_flag_n: Flag by the nth LLM-as-judge, indicating whether the supporting YAGO triples substantiate the answer.
- answer_support_reasoning_n: Explanation of the answer support flag by the nth LLM-as-judge.
- answer_adequacy_flag_n: Flag by the nth LLM-as-judge, indicating whether the answer is adequate for the question.
- answer_adequacy_reasoning_n: Explanation of the answer adequacy flag by the nth LLM-as-judge.
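As an illustration of how the per-judge flag columns might be consumed, the sketch below collects every judge flag from a single record. The sample record and its flag values are fabricated for demonstration only; the actual flag encoding in the dataset (e.g. booleans vs. strings) may differ.

```python
# Illustrative sketch of working with the column schema above.
# The sample record below is made up; field values are not from the dataset.
def judge_flags(record: dict) -> list:
    """Collect all per-judge flag values from a QA-pair record."""
    prefixes = (
        "logical_structure_flag_",
        "redundancy_flag_",
        "answer_support_flag_",
        "answer_adequacy_flag_",
    )
    return [value for key, value in record.items() if key.startswith(prefixes)]

sample = {
    "id": "qa-0001",                       # fabricated identifier
    "question": "Which country was the composer of 'Peer Gynt' born in?",
    "answer_readable": "Norway",
    "subgraph_size": 12,
    "logical_structure_flag_1": True,
    "redundancy_flag_1": False,            # False: not trivial/redundant
    "answer_support_flag_1": True,
    "answer_adequacy_flag_1": True,
}

print(judge_flags(sample))  # one entry per judge flag column
```

Note that a "passing" QA pair is one where the redundancy flag is negative while the other three flags are positive, so a quality filter should not simply require all flags to be true.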