6Gen

6Gen starts with one cluster per seed address, repeatedly merges the most promising pairs, and grows the resulting cluster patterns until the budget (see Configuration) is exhausted. The algorithm prefers expansions that increase density while keeping each generated pattern as specific as possible.

  • Reference: Placeholder citation (coming soon).
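
The growth step described above can be sketched in a few lines of Python. This is an illustration only, not rmap's implementation: the nybble-wildcard pattern representation, the function names (`to_nybbles`, `grow`, `size`, `density`, `best_growth`), and the tie-breaking rule are all assumptions made for the example.

```python
# Illustrative sketch only; names and representation are hypothetical,
# not taken from rmap's source.
import ipaddress
from typing import Iterable, Optional

WILDCARD = "?"  # assumed marker for a free (any-value) nybble


def to_nybbles(addr: str) -> str:
    """Expand an IPv6 address into its 32 hexadecimal nybbles."""
    return ipaddress.IPv6Address(addr).exploded.replace(":", "")


def matches(pattern: str, seed: str) -> bool:
    """True if the seed is covered by the pattern."""
    return all(p == WILDCARD or p == s for p, s in zip(pattern, seed))


def grow(pattern: str, seed: str) -> str:
    """Widen the pattern just enough to cover the seed as well."""
    return "".join(p if p == s else WILDCARD for p, s in zip(pattern, seed))


def size(pattern: str) -> int:
    """Number of addresses covered: 16 per wildcard nybble."""
    return 16 ** pattern.count(WILDCARD)


def density(pattern: str, seeds: Iterable[str]) -> float:
    """Fraction of the pattern's addresses that are known seeds."""
    return sum(matches(pattern, s) for s in seeds) / size(pattern)


def best_growth(pattern: str, seeds: list) -> Optional[str]:
    """Pick the densest growth; break ties in favour of the smaller
    (more specific) pattern. Returns None if every seed is covered."""
    candidates = {grow(pattern, s) for s in seeds if not matches(pattern, s)}
    if not candidates:
        return None
    return max(candidates, key=lambda c: (density(c, seeds), -size(c)))


seeds = [to_nybbles(a) for a in
         ["2001:db8::1", "2001:db8::2", "2001:db8::3", "2001:db8:ffff::1"]]
best = best_growth(seeds[0], seeds)
print(best, size(best))  # last nybble becomes a wildcard; size is 16
```

In this example, widening the first seed to cover `2001:db8::2` costs 16 addresses and captures three seeds, whereas widening it to cover `2001:db8:ffff::1` costs 65,536 addresses and captures only two, so the first growth wins.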

Train

rmap train --seeds seeds/hitlist.txt --output models/6gen.bin six-gen \
  --budget 1000000

Generate

rmap generate --model models/6gen.bin --count 200000 \
  --unique --output sixgen.txt

Configuration

  • --budget <u128> – maximum aggregate size of all generated cluster patterns (default 1_000_000). Increasing the budget allows more aggressive growth at the cost of runtime.
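
To make the budget concrete, the sketch below shows one plausible way to account for it. It assumes that a pattern's size is the number of addresses it covers (16 per wildcard nybble) and that a growth is rejected once the aggregate would exceed the budget; both are assumptions for illustration, and `pattern_size`, `aggregate_size`, and `can_accept` are hypothetical helpers, not rmap functions. The patterns are shortened to four nybbles for readability.

```python
# Illustrative sketch only; the accounting rule is an assumption.
DEFAULT_BUDGET = 1_000_000  # default value documented for --budget


def pattern_size(pattern: str) -> int:
    """Addresses covered by a pattern: 16 per '?' wildcard nybble."""
    return 16 ** pattern.count("?")


def aggregate_size(patterns: list) -> int:
    """Total size over all cluster patterns."""
    return sum(pattern_size(p) for p in patterns)


def can_accept(patterns: list, index: int, grown: str,
               budget: int = DEFAULT_BUDGET) -> bool:
    """Would replacing patterns[index] with `grown` stay within budget?"""
    new_total = (aggregate_size(patterns)
                 - pattern_size(patterns[index])
                 + pattern_size(grown))
    return new_total <= budget


clusters = ["ab??", "cd?0"]                          # sizes 256 and 16
print(aggregate_size(clusters))                      # 272
print(can_accept(clusters, 1, "cd??", budget=500))   # False: 512 > 500
print(can_accept(clusters, 1, "cd??"))               # True under the default
```

Under this accounting, a single pattern with five wildcard nybbles covers 16^5 = 1,048,576 addresses, which alone exceeds the default budget of 1,000,000.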

Model notes

  • The serialized model stores every discovered cluster pattern and its relative weight. Generation samples clusters in proportion to their size.
  • Provide a few thousand high-quality seeds to avoid early convergence on sparse regions; noisy seeds can cause the budget to be wasted on low-density patterns.
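
Size-proportional sampling, as mentioned in the first note, might look like the following sketch. It is not rmap's generator: the pattern format (32 nybbles with `?` wildcards), the function names, and the de-duplication behaviour (loosely modelled on what `--unique` presumably does) are assumptions for illustration.

```python
# Illustrative sketch only; not rmap's actual generator.
import ipaddress
import random

HEX_DIGITS = "0123456789abcdef"


def fill(pattern: str, rng: random.Random) -> str:
    """Replace every '?' wildcard with a random hex nybble."""
    return "".join(rng.choice(HEX_DIGITS) if c == "?" else c for c in pattern)


def generate_unique(patterns: list, count: int, rng: random.Random,
                    max_attempts: int = 0) -> list:
    """Draw up to `count` distinct addresses, choosing a cluster with
    probability proportional to its size, then filling its wildcards."""
    weights = [16 ** p.count("?") for p in patterns]
    attempts = max_attempts or 50 * count  # bound the loop if space is small
    seen = {}  # dict keeps insertion order and drops duplicates
    for _ in range(attempts):
        if len(seen) >= count:
            break
        pattern = rng.choices(patterns, weights=weights, k=1)[0]
        addr = ipaddress.IPv6Address(int(fill(pattern, rng), 16))
        seen[str(addr)] = None
    return list(seen)


patterns = [
    "20010db8" + "0" * 23 + "?",    # 16 addresses
    "20010db8" + "0" * 22 + "1?",   # 16 addresses
]
for addr in generate_unique(patterns, 5, random.Random(0)):
    print(addr)
```

Because larger clusters are drawn more often, a single oversized low-density pattern can dominate the output, which is one reason noisy seeds are costly.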