This paper investigates self-supervised approaches for improving XBRL tag recommendation, focusing on domain-specific fine-tuning and zero-shot reranking strategies. We implement and evaluate multiple baseline methods and propose novel self-supervised frameworks that combine semantic retrieval with lightweight reranking mechanisms. Experiments on the FNXL dataset demonstrate the difficulty of this extreme classification task and reveal opportunities for self-supervised methods to improve performance, particularly for rare tags in the long tail of the label distribution.
Key findings
The paper proposes self-supervised frameworks combining semantic retrieval with lightweight reranking mechanisms for XBRL tag recommendation.
Experiments on the FNXL dataset show the potential of self-supervised methods to improve performance, especially for rare tags.
Analysis provides insights into the challenges of financial numeral labeling and establishes a foundation for future work in this domain.
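Illustrative sketch
The retrieve-then-rerank idea named in the findings above can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's implementation: the tag descriptions are invented for illustration, a bag-of-words vector stands in for a learned sentence encoder, and a token-overlap score stands in for the lightweight reranker. All function names (embed, retrieve, rerank, recommend) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical tag inventory: tag name -> short description (illustrative only).
TAGS = {
    "Revenues": "total revenue recognized from sales of goods and services",
    "NetIncomeLoss": "net income or loss attributable to the company",
    "InterestExpense": "interest expense incurred on debt and borrowings",
    "OperatingLeasePayments": "cash payments made for operating lease obligations",
}


def embed(text):
    """Toy stand-in for a sentence encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(sentence, k=3):
    """Stage 1: return the k tags whose descriptions are most similar to the sentence."""
    query = embed(sentence)
    scored = [(cosine(query, embed(desc)), tag) for tag, desc in TAGS.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tag for _, tag in scored[:k]]


def rerank(sentence, candidates):
    """Stage 2: reorder candidates by the fraction of description tokens found in the sentence."""
    words = set(sentence.lower().split())

    def overlap(tag):
        desc = set(TAGS[tag].lower().split())
        return len(words & desc) / len(desc)

    return sorted(candidates, key=lambda tag: (-overlap(tag), tag))


def recommend(sentence, k=3):
    """Retrieve k candidate tags, then rerank them."""
    return rerank(sentence, retrieve(sentence, k))


if __name__ == "__main__":
    print(recommend("the company paid interest expense of 12 million on its debt"))
```

In a realistic setting the first stage would search a label space of thousands of tags with dense embeddings, and the second stage would only rescore the short candidate list, which is what keeps the reranker lightweight.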
Limitations & open questions
The paper does not compare its methods comprehensively against other extreme classification methods.
The scalability and computational efficiency of the proposed methods in production deployment are not fully explored.