besskge.pipeline.AllScoresPipeline

class besskge.pipeline.AllScoresPipeline(batch_sampler, corruption_scheme, score_fn, evaluation=None, filter_triples=None, candidate_ents=None, return_scores=False, return_topk=False, k=10, window_size=1000, use_ipu_model=False)

Pipeline to compute scores of (h, r, ?) / (?, r, t) queries against all entities in the KG (or a given subset of entities), and related prediction metrics. It supports filtering out, for each query, the scores of specific completions that appear in a given set of triples.

To be used in combination with a batch sampler based on a “h_shard”/“t_shard”-partitioned triple set.

Initialize pipeline.

Parameters:
  • batch_sampler (ShardedBatchSampler) – Batch sampler, based on a “h_shard”/“t_shard”-partitioned triple set.

  • corruption_scheme (str) – Set to “t” to score (h, r, ?) completions, or to “h” to score (?, r, t) completions.

  • score_fn (BaseScoreFunction) – The trained scoring function.

  • evaluation (Optional[Evaluation]) – Evaluation module, for computing metrics. Default: None.

  • filter_triples (Optional[List[Union[Tensor, ndarray[Any, dtype[int32]]]]]) – The set of all triples whose scores need to be filtered. The triples passed here must have GLOBAL IDs for head/tail entities. Default: None.

  • candidate_ents (Union[Tensor, ndarray[Any, dtype[int32]], None]) – If specified, score queries only against a given set of entities. This array needs to contain the global IDs of the candidate entities to be used for completion. All other entities will then be ignored when scoring queries. Default: None (i.e. score queries against all entities).

  • return_scores (bool) – If True, store and return the scores of all queries’ completions (with filters applied, if specified). For a large number of queries/entities, this can cause the host to run out of memory (OOM). Default: False.

  • return_topk (bool) – If True, return for each query the global IDs of the most likely completions, after filtering out the scores of filter_triples. Default: False.

  • k (int) – If return_topk is set to True, for each query return the top-k most likely predictions (after filtering). Default: 10.

  • window_size (int) – Size of the sliding window, namely the number of negative entities scored against each query at each step on IPU and returned to host. Should be decreased with large batch sizes, to avoid an OOM error. Default: 1000.

  • use_ipu_model (bool) – Run pipeline on IPU Model instead of actual hardware. Default: False.
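
The interplay of filter_triples, return_topk, k and window_size can be illustrated with a small host-side NumPy sketch. This is not the besskge implementation: the function filtered_topk, the dot-product score (a stand-in for score_fn) and the toy data are all assumptions made for illustration. It scores each query against the entities one window at a time, masks the filtered completions, and keeps the k best entity IDs per query.

```python
import numpy as np


def filtered_topk(query_emb, entity_emb, filter_ids, k=10, window_size=1000):
    """Hypothetical helper: windowed scoring, filtering and top-k selection.

    query_emb:  (n_query, d) array, one embedding per query.
    entity_emb: (n_entity, d) array, one embedding per candidate entity.
    filter_ids: list of integer arrays; filter_ids[i] holds the entity IDs
                whose scores must be filtered out for query i.
    """
    n_query, n_entity = query_emb.shape[0], entity_emb.shape[0]
    scores = np.empty((n_query, n_entity), dtype=np.float32)

    # Sliding window: only `window_size` entities are scored per step,
    # which bounds the size of the intermediate result.
    for start in range(0, n_entity, window_size):
        stop = min(start + window_size, n_entity)
        # Dot product used as an assumed, illustrative score function.
        scores[:, start:stop] = query_emb @ entity_emb[start:stop].T

    # Filtering: completions listed for a query are pushed to -inf,
    # so they can never appear among the top predictions.
    for i, ids in enumerate(filter_ids):
        scores[i, ids] = -np.inf

    # Top-k: entity IDs sorted by decreasing score, truncated to k.
    k = min(k, n_entity)
    topk = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    return scores, topk


# Toy data (assumed): 25 entities, 3 queries, embedding size 4.
rng = np.random.default_rng(0)
entity_emb = rng.normal(size=(25, 4)).astype(np.float32)
query_emb = rng.normal(size=(3, 4)).astype(np.float32)
filter_ids = [np.array([0, 5]), np.array([], dtype=np.int64), np.array([24])]

scores, topk = filtered_topk(query_emb, entity_emb, filter_ids, k=5, window_size=10)
```

In this sketch a smaller window_size lowers the memory needed per step at the cost of more steps, which mirrors the trade-off described for the window_size parameter.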

forward()

Compute scores of all completions and (possibly) metrics.
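
As a rough illustration of the kind of rank-based metrics that can be derived from a matrix of scores, the sketch below computes the mean reciprocal rank and hits@k with NumPy. It is a sketch under stated assumptions, not the Evaluation module: the function rank_metrics, the metric names and the choice of optimistic ranking (ties resolved in favour of the ground truth) are all hypothetical.

```python
import numpy as np


def rank_metrics(scores, true_ids, k=10):
    """Hypothetical helper: MRR and hits@k from a (n_query, n_entity) score matrix.

    scores:   higher score means a more likely completion.
    true_ids: (n_query,) ID of the ground-truth entity for each query.
    """
    true_scores = scores[np.arange(scores.shape[0]), true_ids]
    # Optimistic rank: 1 + number of entities scoring strictly higher
    # than the ground truth (assumed tie-breaking convention).
    ranks = 1 + np.sum(scores > true_scores[:, None], axis=1)
    return {
        "mrr": float(np.mean(1.0 / ranks)),
        f"hits@{k}": float(np.mean(ranks <= k)),
    }


# Toy data (assumed): 2 queries scored against 3 entities.
scores = np.array([[0.1, 0.9, 0.3],
                   [0.8, 0.2, 0.5]])
true_ids = np.array([1, 2])
metrics = rank_metrics(scores, true_ids, k=1)
```

Here the first query ranks its ground truth first and the second ranks it second, so the sketch yields a mean reciprocal rank of 0.75 and hits@1 of 0.5.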

Return type:

Dict[str, Any]

Returns:

Scores, metrics, and (if provided in the batch sampler) the IDs of the inference triples (with respect to partitioned_triple_set.triples), which can be used to order the results.
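
Since the returned triple IDs are meant for ordering results, a minimal NumPy sketch of that reordering step may help. The array names triple_ids and topk_in_output_order and their values are assumptions for illustration; the actual keys of the returned dictionary are not specified here and should be checked against the library.

```python
import numpy as np

# Assumed example: row i of the output refers to the triple at position
# triple_ids[i] in the original triple array.
triple_ids = np.array([2, 0, 3, 1])
topk_in_output_order = np.array([[7, 1],
                                 [4, 9],
                                 [3, 3],
                                 [8, 0]])

# Sorting by triple ID brings the rows back to the order of the
# original triple array.
order = np.argsort(triple_ids)
aligned = topk_in_output_order[order]
```

After this step, aligned[j] holds the predictions for the j-th original triple, provided every triple ID appears exactly once.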