Software

I am committed to efficient software production: my software is available on GitHub.

Data Structures

Efficient C++ implementations of the following data structures (see also related publications).

  • Compacted and Colored de Bruijn Graphs (RECOMB2023, WABI2023, ALGOMB2024, RECOMB2024)
  • K-Mer Dictionaries (ISMB2022, WABI2022, GBIO2023, ALGOMB2023)
  • Minimal Perfect Hash Functions (SIGIR2021, ISMB2023, TKDE2023)
  • Segment-Trees and Fenwick-Trees (SPE2021)
  • Mutable Bitmaps with Rank/Select (INFOSYS2021)
  • Compressed Bitmaps (DCC2021)
  • Tries (SIGIR2017, TOIS2019, TKDE2021)
  • Inverted Indexes (TOIS2017, WSDM2019, BIC2019, TKDE2020, SIGIR2020, CSUR2021)

Fulgor – A fast and compact k-mer index for large-scale matching and color queries.
Reference publications: WABI2023, ALGOMB2024, RECOMB2024.

LPHash – Fast and compact locality-preserving minimal perfect hashing for k-mer sets.
Reference publications: ISMB2023.

SSHash – A compressed, weighted, associative, and exact dictionary for k-mers. A membership-only version of SSHash is SSHash-Lite.
Reference publications: ISMB2022, WABI2022, GBIO2023, ALGOMB2023.

PTHash – Fast and compact minimal perfect hash functions.
Reference publications: SIGIR2021, TKDE2023.

Mutable Rank and Select Queries – Mutable bitmaps with support for Rank and Select queries.
Reference publication: INFOSYS2021.

Prefix-Sum Queries – A range of tree-shaped data structures for maintaining prefix-sums, including: binary Segment-Tree (top-down and bottom-up), b-ary Segment-Tree, Fenwick-Tree, b-ary Fenwick-Tree, blocked Fenwick-Tree, truncated Fenwick-Tree.
Reference publication: SPE2021.

Query Auto-Completion – Efficienct and effective autocompletion framework, based on forward/inverted indexes, succinct RMQ, and string dictionaries (Front-Coding and tries).
Reference publication: SIGIR2020.

Inverted Indexes Benchmark – A benchmarking suite for inverted index data structures, featuring the following compressors: Elias-Fano and partitioned Elias-Fano, Opt-PFor-Delta, Binary Interpolative, QMX, Simple, Variable-Byte and Opt-VByte, Gamma, Delta, Rice, Zeta, DINT.
Reference publication: CSUR2021.

Interpolative Coding – An efficient implementation of the Binary Interpolative Coding algorithm.
Reference publication: BIC2019.

Sliced Indexes – Compressed bitmap indexes that support fast intersection and union.
Reference publication: DCC2021.

DINT – A Fast and compact dictionary-based decoder for inverted lists.
Reference publication: WSDM2019.

Opt-VByte – Optimal partitioning of inverted lists compressed using binary vectors and point-wise encoders, like Variable-Byte.
Reference publication: TKDE2020.

Indexes for RDF – Trie-based indexes for semantic data like RDF triples.
Reference publication: TKDE2021.

Tongrams – Fast language model queries and estimation in compressed space.
Reference publications: SIGIR2017, TOIS2019.

Clustered Elias-Fano Indexes – Clustered Elias-Fano inverted indexes.
Reference publication: TOIS2017.

Miscellanea

C++ utilities that I used as sub-modules for larger projects.

essentials – A C++ library providing essential core utilities for data structure design and benchmarking. More precisely:

  • benchmarking facilities, including: messages displaying local time, configurable timer class, function to prevent code elision by compiler, simple creation and printing of json documents;
  • functions to serialize-to and load-from disk data structures;
  • functions to compute the number of bytes consumed by data structures;
  • support for creating, removing, and iterate inside directories;
  • transparent support for contiguous memory allocation.

cmd_line_parser – Command line parser for C++17. It offers all handy features in just 150 lines of code.

mm_file – A self-contained, header-only, implementation of memory-mapped files in C++ for both reading and writing.