This research proposes a theoretical framework to determine the optimal height partition granularity in hierarchical indexing structures for achieving optimal retrieval accuracy. It establishes a logarithmic relationship between optimal partition depth and dataset size, balancing search efficiency and coverage probability. The study derives closed-form expressions for the granularity-recall trade-off and proves the existence of a critical partition height under general metric space assumptions.
Key findings
Optimal partition depth follows a logarithmic relationship with dataset size.
Derivation of closed-form expressions for the granularity-recall trade-off.
Proof of a critical partition height that maximizes expected recall under general metric space assumptions.
Limitations & open questions
The theoretical framework assumes general metric space and data distribution, which may not hold for all datasets.