PyGraft: Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips
arXiv (2023)
Abstract
Knowledge graphs (KGs) have emerged as a prominent data representation and
management paradigm. Being usually underpinned by a schema (e.g., an ontology),
KGs capture not only factual information but also contextual knowledge. In some
tasks, a few KGs have established themselves as standard benchmarks. However, recent
work points out that relying on a limited collection of datasets is not
sufficient to assess the generalization capability of an approach. In some
data-sensitive fields such as education or medicine, access to public datasets
is even more limited. To remedy the aforementioned issues, we release PyGraft,
a Python-based tool that generates highly customized, domain-agnostic schemas
and KGs. The synthesized schemas encompass various RDFS and OWL constructs,
while the synthesized KGs emulate the characteristics and scale of real-world
KGs. Logical consistency of the generated resources is ultimately ensured by
running a description logic (DL) reasoner. By generating both a schema and a
KG in a single pipeline, PyGraft aims to support the creation of a more
diverse array of KGs for benchmarking novel approaches in areas such as
graph-based machine learning (ML) or, more generally, KG processing. In
graph-based ML in particular, this should foster a more holistic
evaluation of model performance and generalization capability, thereby going
beyond the limited collection of available benchmarks. PyGraft is available at:
https://github.com/nicolas-hbt/pygraft.
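
The schema-then-KG pipeline described above can be illustrated with a small, self-contained sketch. This is not PyGraft's API or implementation: the function names (`generate_schema`, `generate_kg`, `is_consistent`), the parameters, and the toy consistency check are all assumptions made for illustration. A real DL reasoner handles far more constructs than the single domain/range and disjointness rule shown here.

```python
import random


def generate_schema(num_classes=4, num_relations=3, seed=0):
    """Build a toy schema: classes, one disjointness axiom, typed relations."""
    rng = random.Random(seed)
    classes = [f"C{i}" for i in range(num_classes)]
    # Declare one pair of distinct classes disjoint (an OWL-style constraint).
    disjoint = {frozenset(rng.sample(classes, 2))}
    # Give each relation an RDFS-style domain and range.
    relations = {
        f"R{j}": {"domain": rng.choice(classes), "range": rng.choice(classes)}
        for j in range(num_relations)
    }
    return {"classes": classes, "disjoint": disjoint, "relations": relations}


def generate_kg(schema, num_entities=20, num_triples=30, seed=0):
    """Sample typed entities, then triples whose head/tail match domain/range."""
    rng = random.Random(seed)
    types = {f"e{i}": rng.choice(schema["classes"]) for i in range(num_entities)}
    by_class = {}
    for entity, cls in types.items():
        by_class.setdefault(cls, []).append(entity)
    # Only relations whose domain and range both have instances can be used.
    usable = [
        (name, spec)
        for name, spec in schema["relations"].items()
        if spec["domain"] in by_class and spec["range"] in by_class
    ]
    triples = set()
    attempts = 0
    while usable and len(triples) < num_triples and attempts < 100 * num_triples:
        attempts += 1
        name, spec = rng.choice(usable)
        head = rng.choice(by_class[spec["domain"]])
        tail = rng.choice(by_class[spec["range"]])
        triples.add((head, name, tail))
    return types, triples


def is_consistent(schema, types, triples):
    """Toy stand-in for a reasoner: infer types via domain/range, then
    reject the KG if any entity belongs to two disjoint classes."""
    inferred = {entity: {cls} for entity, cls in types.items()}
    for head, rel, tail in triples:
        spec = schema["relations"][rel]
        inferred.setdefault(head, set()).add(spec["domain"])
        inferred.setdefault(tail, set()).add(spec["range"])
    return not any(
        pair <= classes
        for classes in inferred.values()
        for pair in schema["disjoint"]
    )


schema = generate_schema(seed=1)
types, triples = generate_kg(schema, seed=1)
print(len(triples), "triples; consistent:", is_consistent(schema, types, triples))
```

Because the sketch samples heads and tails from the declared domain and range, the generated KG passes the check by construction. The check becomes informative when triples come from elsewhere, for example a triple whose head is typed with a class disjoint from the relation's domain.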