besskge.sharding.PartitionedTripleSet

class besskge.sharding.PartitionedTripleSet(sharding, inverse_triples, partition_mode, dummy, triples, triple_counts, triple_offsets, triple_sort_idx, types, neg_heads, neg_tails)

A partitioned collection of triples. If partition_mode = 'h_shard', each triple is assigned to one of n_shard partitions based on the shard where the head entity is stored. Similarly, if partition_mode = 't_shard', each triple is assigned to one of n_shard partitions based on the shard where the tail entity is stored.

If partition_mode = 'ht_shardpair', each triple is assigned to one of n_shard^2 partitions based on the shard-pair (shard_h, shard_t). Shard-pairs are ordered as: (0,0), (0,1), …, (0, n_shard-1), (1,0), …, (n_shard-1, n_shard-1).
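The three partition modes can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the function partition_index and the array entity_to_shard (mapping each global entity ID to its shard) are hypothetical names introduced here for illustration. The only behaviour taken from the description above is the ordering of shard-pairs, which corresponds to the row-major index shard_h * n_shard + shard_t.

```python
import numpy as np


def partition_index(heads, tails, entity_to_shard, n_shard, partition_mode):
    """Return the partition index of each triple (illustrative sketch only)."""
    shard_h = entity_to_shard[heads]
    shard_t = entity_to_shard[tails]
    if partition_mode == "h_shard":
        return shard_h
    if partition_mode == "t_shard":
        return shard_t
    if partition_mode == "ht_shardpair":
        # Row-major order: (0,0), (0,1), ..., (0,n_shard-1), (1,0), ...
        return shard_h * n_shard + shard_t
    raise ValueError(f"Unknown partition_mode: {partition_mode}")


# Hypothetical toy sharding: 6 entities spread over 2 shards.
entity_to_shard = np.array([0, 1, 0, 1, 0, 1])
heads = np.array([0, 1, 2, 3])
tails = np.array([1, 2, 4, 5])

head_idx = partition_index(heads, tails, entity_to_shard, 2, "h_shard")
pair_idx = partition_index(heads, tails, entity_to_shard, 2, "ht_shardpair")
```

With n_shard = 2, a triple whose head is on shard 1 and whose tail is on shard 0 falls into partition 2 of the 4 shard-pair partitions.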

Parameters:

sharding (Sharding) –
inverse_triples (bool) –
partition_mode (str) –
dummy (str | None) –
triples (ndarray[Any, dtype[int32]]) –
triple_counts (ndarray[Any, dtype[int64]]) –
triple_offsets (ndarray[Any, dtype[int64]]) –
triple_sort_idx (ndarray[Any, dtype[int64]]) –
types (ndarray[Any, dtype[int32]] | None) –
neg_heads (ndarray[Any, dtype[int32]] | None) –
neg_tails (ndarray[Any, dtype[int32]] | None) –

classmethod create_from_dataset(dataset, part, sharding, partition_mode='ht_shardpair', add_inverse_triples=False)

Create a partitioned triple set from a KGDataset part.

Parameters:

dataset (KGDataset) – Knowledge graph dataset.
part (str) – The dataset part to shard.
sharding (Sharding) – The entity sharding to use.
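partition_mode (str) – The triple partition mode. One of “h_shard”, “t_shard” or “ht_shardpair”. Default: “ht_shardpair”.
add_inverse_triples (bool) – Whether to add the inverse triple (t,r_inv,h) for each regular triple (h,r,t). Default: False.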

Return type:

PartitionedTripleSet

Returns:

Partitioned set of triples.

classmethod create_from_queries(dataset, sharding, queries, query_mode, ground_truth=None, negative=None, negative_type=None)

Create a partitioned triple set from a set of (h,r,?) or (?,r,t) queries. Pairs are completed to triples by adding dummy entities.

Parameters:

dataset (KGDataset) – Knowledge graph dataset.
sharding (Sharding) – The entity sharding to use.
queries (ndarray[Any, dtype[int32]]) – shape: (n_query, 2). The set of (h, r) or (r, t) queries. Global IDs for entities/relations.
query_mode (str) – “hr” for (h,r,?) queries, “rt” for (?,r,t) queries.
ground_truth (Optional[ndarray[Any, dtype[int32]]]) – shape: (n_query,). If known, the global ID of the ground truth tail/head.
negative (Optional[ndarray[Any, dtype[int32]]]) – shape: (N, n_negative). Global IDs of negative entities to score against each query. This can be query-specific (N=n_query) or the same for all queries (N=1). Default: None (score each query against all entities in the graph).
negative_type (Optional[str]) – Score each query only against entities of a specific type. Default: None (score each query against entities of any type).

Return type:

PartitionedTripleSet

Returns:

Partitioned set of queries (with dummy h/t completion).
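The dummy completion described for create_from_queries can be pictured with the following NumPy sketch. It only mirrors the idea of turning (n_query, 2) pairs into (n_query, 3) triples; the helper complete_queries and the placeholder value DUMMY_ENTITY are assumptions made for this example, and the value the library actually uses for dummy entities is not stated here.

```python
import numpy as np

DUMMY_ENTITY = 0  # hypothetical placeholder ID, not taken from the library


def complete_queries(queries, query_mode):
    """Turn (n_query, 2) queries into (n_query, 3) h/r/t triples (sketch)."""
    queries = np.asarray(queries, dtype=np.int32)
    filler = np.full((queries.shape[0], 1), DUMMY_ENTITY, dtype=np.int32)
    if query_mode == "hr":
        # (h, r, ?) -> (h, r, dummy tail)
        return np.concatenate([queries, filler], axis=1), "tail"
    if query_mode == "rt":
        # (?, r, t) -> (dummy head, r, t)
        return np.concatenate([filler, queries], axis=1), "head"
    raise ValueError(f"Unknown query_mode: {query_mode}")


hr_triples, hr_dummy = complete_queries([[5, 2], [7, 1]], "hr")
rt_triples, rt_dummy = complete_queries([[2, 5], [1, 7]], "rt")
```

The second return value plays the role of the dummy attribute: it records which column holds placeholder entities and should therefore be ignored when reading the triples.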

dummy: Optional[str]: If the set is constructed from (h,r,?) (resp. (?,r,t)) queries, dummy tails (resp. heads) are added to complete the pairs into triples. Indicates which entities are dummies: “head”, “tail” or “none”.

inverse_triples: bool: Whether the collection contains inverse triples (t,r_inv,h) for each regular triple (h,r,t)

neg_heads: Optional[ndarray[Any, dtype[int32]]]: Global IDs of (possibly triple-specific) negative heads; int32[n_triple or 1, n_neg_heads]

neg_tails: Optional[ndarray[Any, dtype[int32]]]: Global IDs of (possibly triple-specific) negative tails; int32[n_triple or 1, n_neg_tails]
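The shape convention int32[n_triple or 1, n_neg] for neg_heads and neg_tails means the negatives are either specific to each triple or shared by all of them. As a rough sketch of how a consumer might treat both cases uniformly (the helper expand_negatives is hypothetical and not part of the library), NumPy broadcasting can expand the shared case to one row per triple without copying data:

```python
import numpy as np


def expand_negatives(negatives, n_triple):
    """Broadcast shared negatives (first dim 1) to one row per triple."""
    negatives = np.asarray(negatives, dtype=np.int32)
    if negatives.shape[0] not in (1, n_triple):
        raise ValueError("First dimension must be 1 or n_triple")
    # broadcast_to returns a read-only view; no data is copied.
    return np.broadcast_to(negatives, (n_triple, negatives.shape[1]))


shared = expand_negatives([[3, 4, 5]], n_triple=2)
specific = expand_negatives([[3, 4, 5], [6, 7, 8]], n_triple=2)
```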

partition_mode: str: Partitioning criterion for triples; “h_shard”, “t_shard”, “ht_shardpair”

sharding: Sharding: Sharding of entities

triple_counts: ndarray[Any, dtype[int64]]: Number of triples in each partition; int64[n_shard] or int64[n_shard, n_shard]

triple_offsets: ndarray[Any, dtype[int64]]: Delimiting indices of ordered partitions; int64[n_shard] or int64[n_shard, n_shard]

triple_sort_idx: ndarray[Any, dtype[int64]]: Sorting indices to order triples by partition; int64[n_triple]
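The relationship between triple_counts, triple_offsets and triple_sort_idx can be sketched as follows for the one-dimensional case (“h_shard” or “t_shard”). This is an illustration under stated assumptions, not the library's code: sort_by_partition is a hypothetical helper, and it assumes that offsets[i] is the index of the first triple of partition i in the sorted order.

```python
import numpy as np


def sort_by_partition(partition_idx, n_partition):
    """Compute counts, offsets and sorting indices for a partition assignment."""
    partition_idx = np.asarray(partition_idx)
    # Stable sort keeps the original relative order inside each partition.
    sort_idx = np.argsort(partition_idx, kind="stable").astype(np.int64)
    counts = np.bincount(partition_idx, minlength=n_partition).astype(np.int64)
    # Assumed convention: offsets[i] = start of partition i after sorting.
    offsets = np.concatenate([[0], np.cumsum(counts)[:-1]]).astype(np.int64)
    return counts, offsets, sort_idx


# Hypothetical assignment of 5 triples to 2 partitions.
partition_idx = np.array([1, 0, 1, 0, 0])
counts, offsets, sort_idx = sort_by_partition(partition_idx, 2)

# Triples of partition 1, in sorted order, are found by slicing.
start, stop = offsets[1], offsets[1] + counts[1]
members_of_1 = sort_idx[start:stop]
```

Under this convention, the triples of partition i occupy rows offsets[i] to offsets[i] + counts[i] of the sorted triples array.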

triples: ndarray[Any, dtype[int32]]: h/r/t IDs for triples ordered by partition. Local IDs for heads (resp. tails) and global IDs for tails (resp. heads) if partition_mode = “h_shard” (resp. “t_shard”); local IDs for heads and tails if partition_mode = “ht_shardpair”; int32[n_triple, {h,r,t}]
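The distinction between local and global IDs can be made concrete with a small sketch. It assumes, purely for illustration, that an entity's local ID is its rank among the entities of the same shard when ordered by global ID; the actual mapping is defined by the Sharding object and may differ. The names local_ids and entity_to_shard are hypothetical.

```python
import numpy as np


def local_ids(entity_to_shard, n_shard):
    """Map each global entity ID to its position inside its own shard."""
    entity_to_shard = np.asarray(entity_to_shard)
    local = np.empty(entity_to_shard.shape[0], dtype=np.int32)
    for shard in range(n_shard):
        members = np.flatnonzero(entity_to_shard == shard)
        local[members] = np.arange(members.shape[0], dtype=np.int32)
    return local


# Hypothetical toy sharding: 6 entities spread over 2 shards.
entity_to_shard = np.array([0, 1, 0, 1, 0, 1])
entity_to_local = local_ids(entity_to_shard, 2)

# Global (h, r, t) triples.
triples = np.array([[0, 0, 3], [4, 1, 5]], dtype=np.int32)

# "h_shard" style: heads become local, tails stay global.
h_shard_triples = triples.copy()
h_shard_triples[:, 0] = entity_to_local[triples[:, 0]]

# "ht_shardpair" style: both heads and tails become local.
pair_triples = h_shard_triples.copy()
pair_triples[:, 2] = entity_to_local[triples[:, 2]]
```

Relation IDs in the middle column are left untouched in both cases, since only entities are sharded.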

types: Optional[ndarray[Any, dtype[int32]]]: Entity type IDs of triple head/tail; int32[n_triple, {h_type, t_type}]