besskge.dataset.KGDataset
- class besskge.dataset.KGDataset(n_entity, n_relation_type, triples, original_triple_ids, entity_dict=None, relation_dict=None, type_offsets=None, neg_heads=None, neg_tails=None)
Represents a complete knowledge graph dataset of (head, relation, tail) triples.
- classmethod build_openbiolink(root)
Build the high-quality version of the OpenBioLink2020 dataset [ASAM20].
- classmethod build_yago310(root)
Build the YAGO3-10 dataset. This is the subgraph of the YAGO3 knowledge graph [MBS15] containing only entities that have at least 10 relations associated with them. First used in [DPPR18].
- classmethod from_dataframe(df, head_column, relation_column, tail_column, entity_types=None, split=(0.7, 0.15, 0.15), seed=1234)
Build a KGDataset from a pandas DataFrame of labeled (h,r,t) triples. IDs for entities and relations are automatically assigned based on labels in such a way that entities of the same type have contiguous IDs.
- Parameters:
df (Union[DataFrame, Dict[str, DataFrame]]) – Pandas DataFrame of all triples in the knowledge graph dataset, or dictionary of DataFrames of triples for each part of the dataset split.
head_column (Union[int, str]) – Name of the DataFrame column storing head entities.
relation_column (Union[int, str]) – Name of the DataFrame column storing relations.
tail_column (Union[int, str]) – Name of the DataFrame column storing tail entities.
entity_types (Union[Series, Dict[str, str], None]) – If entities have types, dictionary or pandas Series of mappings entity label -> entity type (as strings).
split (Tuple[float, float, float]) – Tuple to set the train/validation/test split. Only used if no pre-defined dataset split is specified, i.e. if df is not a dictionary.
seed (int) – Random seed for the train/validation/test split. Only used if no pre-defined dataset split is specified, i.e. if df is not a dictionary.
- Return type:
KGDataset
- Returns:
Instance of the KGDataset class.
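The contiguous-ID requirement described above can be illustrated with a small, library-independent sketch. It is not besskge's implementation: the helper name assign_entity_ids, the example labels, and the choice to order types and labels alphabetically are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_entity_ids(triples, entity_types):
    """Assign integer IDs so that entities of the same type are contiguous.

    Hypothetical helper, not part of besskge. Returns the label -> ID
    mapping and the first ID used by each entity type.
    """
    # Collect every entity label that appears as a head or a tail.
    labels = set()
    for head, _, tail in triples:
        labels.add(head)
        labels.add(tail)

    # Group labels by their entity type.
    by_type = defaultdict(list)
    for label in labels:
        by_type[entity_types[label]].append(label)

    entity_to_id = {}
    type_offsets = {}
    # Alphabetical ordering is an arbitrary, deterministic choice.
    for ent_type in sorted(by_type):
        type_offsets[ent_type] = len(entity_to_id)
        for label in sorted(by_type[ent_type]):
            entity_to_id[label] = len(entity_to_id)
    return entity_to_id, type_offsets


# Illustrative labeled triples and entity types.
triples = [
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "treats", "fever"),
    ("aspirin", "interacts_with", "ibuprofen"),
]
entity_types = {
    "aspirin": "drug",
    "ibuprofen": "drug",
    "headache": "disease",
    "fever": "disease",
}

entity_to_id, type_offsets = assign_entity_ids(triples, entity_types)
print(entity_to_id)
print(type_offsets)
```

In this sketch each type occupies one unbroken ID range, so a type is fully described by the first ID it uses.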
- classmethod from_triples(data, split=(0.7, 0.15, 0.15), seed=1234, entity_dict=None, relation_dict=None, type_offsets=None)
Build a dataset from an array of triples, where IDs for entities and relations have already been assigned. Note that, if entities have types, entities of the same type need to have contiguous IDs. Triples are randomly split into train/validation/test sets. The attribute KGDataset.original_triple_ids stores the IDs of the triples in each split with respect to the original ordering in data.
If a pre-defined train/validation/test split is wanted, the KGDataset class should be instantiated manually.
- Parameters:
data (ndarray[Any, dtype[int32]]) – Numpy array of triples [head_id, relation_id, tail_id]. Shape (num_triples, 3).
split (Tuple[float, float, float]) – Tuple to set the train/validation/test split.
seed (int) – Random seed for the train/validation/test split.
entity_dict (Optional[List[str]]) – Optional entity labels by ID.
relation_dict (Optional[List[str]]) – Optional relation labels by ID.
type_offsets (Optional[Dict[str, int]]) – Offset of entity types.
- Return type:
KGDataset
- Returns:
Instance of the KGDataset class.
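The relationship between a random split and the recorded original triple IDs can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the helper split_triples, the part names, and the use of Python's random module (rather than NumPy) are hypothetical choices, so the resulting split will not match what from_triples produces for the same seed.

```python
import random


def split_triples(triples, split=(0.7, 0.15, 0.15), seed=1234):
    """Randomly split triples, remembering each triple's original position.

    Hypothetical helper, not part of besskge.
    """
    n = len(triples)
    ids = list(range(n))
    # Seeded shuffle so that the split is reproducible.
    random.Random(seed).shuffle(ids)

    n_train = int(n * split[0])
    n_valid = int(n * split[1])
    # The part names below are illustrative.
    original_ids = {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_valid],
        "test": ids[n_train + n_valid:],
    }
    parts = {
        part: [triples[i] for i in idx] for part, idx in original_ids.items()
    }
    return parts, original_ids


# Twenty made-up triples of the form (head_id, relation_id, tail_id).
data = [(i, i % 3, (i + 1) % 20) for i in range(20)]
parts, original_ids = split_triples(data)

# Row k of a part is the triple at position original_ids[part][k] in data.
for part, rows in parts.items():
    for k, triple in enumerate(rows):
        assert triple == data[original_ids[part][k]]
```

Keeping the index arrays alongside the split makes it possible to join per-triple results back to the source array after training or evaluation.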
- property ht_types: Dict[str, ndarray[Any, dtype[int32]]] | None
If entities have types, type IDs of triples’ heads/tails; {part: int32[n_triple, {h_type, t_type}]}
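Because entities of the same type have contiguous IDs, the type of a head or tail can be recovered from its entity ID and the per-type offsets. A minimal sketch follows, assuming that a type's ID is its rank when types are sorted by offset; the helper head_tail_types and that numbering are assumptions, not confirmed besskge behaviour.

```python
from bisect import bisect_right


def head_tail_types(triples, type_offsets):
    """Return [h_type, t_type] for each (head_id, relation_id, tail_id).

    Hypothetical helper, not part of besskge. Type IDs are assumed to be
    the rank of each type when types are sorted by their first entity ID.
    """
    offsets = sorted(type_offsets.values())

    def type_id(entity_id):
        # Index of the last offset that is <= entity_id.
        return bisect_right(offsets, entity_id) - 1

    return [[type_id(h), type_id(t)] for h, _, t in triples]


# Entities 0-1 are diseases, entities 2-3 are drugs (illustrative).
type_offsets = {"disease": 0, "drug": 2}
triples = [(2, 0, 1), (3, 0, 0), (2, 1, 3)]

ht = head_tail_types(triples, type_offsets)
print(ht)  # [[1, 0], [1, 0], [1, 1]]
```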
- classmethod load(path)
Load a KGDataset object saved with KGDataset.save().
- neg_heads: Optional[Dict[str, ndarray[Any, dtype[int32]]]] = None
IDs of (possibly triple-specific) negative heads; {part: int32[n_triple or 1, n_neg_heads]}
- neg_tails: Optional[Dict[str, ndarray[Any, dtype[int32]]]] = None
IDs of (possibly triple-specific) negative tails; {part: int32[n_triple or 1, n_neg_tails]}
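The "n_triple or 1" first dimension suggests that negatives are either given per triple or given once and shared by every triple in the part. The lookup below sketches that reading with nested lists in place of NumPy arrays; the helper negatives_for_triple and the example IDs are hypothetical.

```python
def negatives_for_triple(neg, triple_idx):
    """Pick the negative candidates for one triple.

    Hypothetical helper, not part of besskge. `neg` is a nested list of
    shape [n_triple or 1][n_neg]: a single row means the same negatives
    are shared by all triples.
    """
    row = 0 if len(neg) == 1 else triple_idx
    return neg[row]


# One shared row of negatives for every triple in the part.
shared = [[5, 7, 9]]
# One row of negatives per triple (three triples, two negatives each).
specific = [[5, 7], [1, 2], [8, 3]]

print(negatives_for_triple(shared, 2))    # [5, 7, 9]
print(negatives_for_triple(specific, 2))  # [8, 3]
```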
- original_triple_ids: Dict[str, ndarray[Any, dtype[int32]]]
IDs of the triples in KGDataset.triples with respect to the ordering in the original array/dataframe from which the triples originate.