besskge.sharding.Sharding

class besskge.sharding.Sharding(n_shard, entity_to_shard, entity_to_idx, shard_and_idx_to_entity, shard_counts, entity_type_counts, entity_type_offsets)[source]

A mapping of entities to shards (and back again).

classmethod create(n_entity, n_shard, seed, type_offsets=None)[source]

Construct a random, balanced sharding of entities.

Parameters:
  • n_entity (int) – Number of entities in the knowledge graph.

  • n_shard (int) – Number of shards.

  • seed (int) – Seed for random sharding.

  • type_offsets (Optional[ndarray[Any, dtype[int64]]]) – shape: (n_types,) Global offsets of entity types. Default: None.

Return type:

Sharding

Returns:

Random sharding of n_entity entities in n_shard shards.
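To make "random, balanced" concrete, the following is a minimal NumPy sketch of one way such a sharding could be built. It is not the library's implementation: the function name `random_balanced_sharding` is hypothetical, and the round-robin assignment over a random permutation is an assumed strategy chosen only because it keeps shard sizes within one entity of each other.

```python
import numpy as np


def random_balanced_sharding(n_entity, n_shard, seed):
    """Hypothetical helper: assign entities to shards at random, evenly."""
    rng = np.random.default_rng(seed)
    # Random order in which entities are dealt out to shards.
    perm = rng.permutation(n_entity)
    position = np.arange(n_entity)

    entity_to_shard = np.empty(n_entity, dtype=np.int32)
    entity_to_idx = np.empty(n_entity, dtype=np.int32)
    # The k-th entity in the random order goes to shard k % n_shard
    # and takes the next free local slot, k // n_shard.
    entity_to_shard[perm] = position % n_shard
    entity_to_idx[perm] = position // n_shard

    shard_counts = np.bincount(entity_to_shard, minlength=n_shard).astype(np.int64)
    return entity_to_shard, entity_to_idx, shard_counts


entity_to_shard, entity_to_idx, shard_counts = random_balanced_sharding(
    n_entity=10, n_shard=4, seed=0
)
```

With 10 entities and 4 shards, the shard sizes in this sketch differ by at most one, and every (shard, local ID) pair is used by exactly one entity.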

entity_to_idx: ndarray[Any, dtype[int32]]

Entity local ID on shard by global ID; int32[n_entity]

entity_to_shard: ndarray[Any, dtype[int32]]

Entity shard by global ID; int32[n_entity]

entity_type_counts: Optional[ndarray[Any, dtype[int64]]]

Number of entities of each type on each shard; int64[n_shard, n_types]

entity_type_offsets: Optional[ndarray[Any, dtype[int64]]]

Offsets for entities of the same type on each shard (entities also remain clustered by type locally); int64[n_shard, n_types]
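The relationship between global type offsets, per-shard type counts and per-shard type offsets can be illustrated with a small NumPy sketch. It assumes that the local offsets are the exclusive cumulative sum of the per-shard counts, which is one natural reading of "clustered by type locally"; the shard assignment here is hand-made rather than random, and none of this is taken from the library's source.

```python
import numpy as np

# Global offsets of 3 entity types: IDs 0-3, 4-6 and 7-9 (illustrative data).
type_offsets = np.array([0, 4, 7], dtype=np.int64)
n_entity, n_shard = 10, 2
n_types = type_offsets.shape[0]

# Hand-made shard assignment for the example (even IDs -> shard 0, odd -> shard 1).
entity_to_shard = (np.arange(n_entity) % n_shard).astype(np.int32)

# Type of each entity: index of the last offset that is <= the entity ID.
entity_type = np.searchsorted(type_offsets, np.arange(n_entity), side="right") - 1

# Count the entities of each type on each shard.
entity_type_counts = np.zeros((n_shard, n_types), dtype=np.int64)
np.add.at(entity_type_counts, (entity_to_shard, entity_type), 1)

# Assumed definition: local start position of each type block on each shard.
entity_type_offsets = np.cumsum(entity_type_counts, axis=1) - entity_type_counts

print(entity_type_counts)   # [[2 2 1], [2 1 2]]
print(entity_type_offsets)  # [[0 2 4], [0 2 3]]
```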

classmethod load(path)[source]

Load a Sharding object saved with Sharding.save().

Parameters:

path (Path) – Path to saved Sharding object.

Return type:

Sharding

Returns:

The saved Sharding object.

property max_entity_per_shard: int

Number of entities in each shard after applying padding (the size of the second dimension of shard_and_idx_to_entity).

property n_entity: int

Number of entities in the knowledge graph.

n_shard: int

Number of shards.

save(out_file)[source]

Save sharding to .npz file.

Parameters:

out_file (Path) – Path to output file.

Return type:

None
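Since the sharding is stored as an .npz file, a save/load round trip can be sketched with plain NumPy. The key names below simply mirror the attribute names on this page and are an assumption; they are not claimed to match the layout actually written by Sharding.save() or expected by Sharding.load().

```python
import tempfile
from pathlib import Path

import numpy as np

# Illustrative arrays; key names are assumptions, not the library's file format.
arrays = {
    "entity_to_shard": np.array([0, 1, 0, 1], dtype=np.int32),
    "entity_to_idx": np.array([0, 0, 1, 1], dtype=np.int32),
    "shard_counts": np.array([2, 2], dtype=np.int64),
}

with tempfile.TemporaryDirectory() as tmp:
    out_file = Path(tmp) / "sharding.npz"
    # Write all arrays into a single .npz archive.
    np.savez(out_file, **arrays)
    # Read them back while the archive is open, copying into a plain dict.
    with np.load(out_file) as data:
        restored = {name: data[name] for name in data.files}
```

In this sketch, dtypes and shapes survive the round trip unchanged, which is what a loader needs to rebuild the mapping.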

shard_and_idx_to_entity: ndarray[Any, dtype[int32]]

Entity global ID by (shard, local_ID); int32[n_shard, max_entity_per_shard]

shard_counts: ndarray[Any, dtype[int64]]

Number of true entities (excluding padding) in each shard; int64[n_shard]
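The forward maps (entity_to_shard, entity_to_idx) and the inverse map (shard_and_idx_to_entity) describe the same assignment from two directions. The sketch below builds the inverse table from hand-made forward maps and checks the round trip. The padding value `PAD = -1` is a hypothetical choice for illustration; this page does not state what the library stores in padded slots, nor exactly how it derives the padded shard size.

```python
import numpy as np

# Hand-made forward maps for 6 entities on 3 shards (illustrative data).
entity_to_shard = np.array([0, 1, 0, 2, 1, 0], dtype=np.int32)
entity_to_idx = np.array([0, 0, 1, 0, 1, 2], dtype=np.int32)
n_entity = entity_to_shard.shape[0]
n_shard = 3

# True (unpadded) number of entities in each shard.
shard_counts = np.bincount(entity_to_shard, minlength=n_shard).astype(np.int64)
# Assumed here: every shard is padded up to the size of the largest one.
max_entity_per_shard = int(shard_counts.max())

PAD = -1  # hypothetical padding marker, not taken from the library
shard_and_idx_to_entity = np.full(
    (n_shard, max_entity_per_shard), PAD, dtype=np.int32
)
# Scatter each global ID into its (shard, local ID) slot.
shard_and_idx_to_entity[entity_to_shard, entity_to_idx] = np.arange(
    n_entity, dtype=np.int32
)

# Round trip: global ID -> (shard, local ID) -> global ID.
recovered = shard_and_idx_to_entity[entity_to_shard, entity_to_idx]

print(shard_and_idx_to_entity)  # [[0 2 5], [1 4 -1], [3 -1 -1]]
```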