I went into this thinking dual-pivot quicksort was faster because it executes fewer instructions. When I read the paper I found out I was wrong: it actually executes more instructions (in my own tests, about twice as many). The real reason dual-pivot quicksort is faster is that it causes fewer cache misses, so its advantage shows up mainly on modern hardware, where sorting tends to be limited by memory access rather than by CPU speed. I didn't bother adding a pivot selection step; I could, and it would help a lot. So outside of Scratch, assuming I improved the pivot selection, there's a good chance it could beat quicksort v2.
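For reference, here is a minimal Python sketch of Yaroslavskiy-style dual-pivot partitioning (the scheme Java has used for sorting primitive arrays since Java 7). It is not the Scratch implementation described here. The pivot selection, which takes the 2nd and 4th of five evenly spaced samples (loosely modeled on Java 7's `DualPivotQuicksort`), is an assumption illustrating the kind of pivot selection mentioned above; pass `sample_pivots=False` to get naive first/last-element pivots instead. The names `dual_pivot_quicksort`, `_choose_pivots` and `sample_pivots` are hypothetical.

```python
import random


def _choose_pivots(a, lo, hi):
    """Assumed pivot selection: move the 2nd and 4th of five evenly
    spaced samples to a[lo] and a[hi]."""
    n = hi - lo + 1
    if n < 5:
        return  # too small to sample; fall back to the endpoints
    idx = [lo + i * (n - 1) // 4 for i in range(5)]  # idx[0] == lo, idx[4] == hi
    vals = sorted(a[i] for i in idx)
    for i, v in zip(idx, vals):
        a[i] = v  # the five sampled slots now hold their values in sorted order
    a[lo], a[idx[1]] = a[idx[1]], a[lo]
    a[hi], a[idx[3]] = a[idx[3]], a[hi]


def dual_pivot_quicksort(a, lo=0, hi=None, sample_pivots=True):
    """Sort list a in place using two pivots p <= q and a three-way split."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    if sample_pivots:
        _choose_pivots(a, lo, hi)
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]  # pivots, p <= q

    # Invariants while scanning:
    #   a[lo+1 : lt]  <  p
    #   a[lt : k]     in [p, q]
    #   a[gt+1 : hi]  >  q
    lt, k, gt = lo + 1, lo + 1, hi - 1
    while k <= gt:
        if a[k] < p:
            a[k], a[lt] = a[lt], a[k]
            lt += 1
        elif a[k] > q:
            # skip elements at the right end that are already > q
            while a[gt] > q and k < gt:
                gt -= 1
            a[k], a[gt] = a[gt], a[k]
            gt -= 1
            if a[k] < p:  # the element swapped in may belong on the left
                a[k], a[lt] = a[lt], a[k]
                lt += 1
        k += 1

    # Move both pivots into their final positions.
    lt -= 1
    gt += 1
    a[lo], a[lt] = a[lt], a[lo]
    a[hi], a[gt] = a[gt], a[hi]

    dual_pivot_quicksort(a, lo, lt - 1, sample_pivots)
    if p < q:  # if p == q, the middle part is all equal to p and already sorted
        dual_pivot_quicksort(a, lt + 1, gt - 1, sample_pivots)
    dual_pivot_quicksort(a, gt + 1, hi, sample_pivots)


if __name__ == "__main__":
    data = [random.randint(0, 99) for _ in range(1000)]
    expected = sorted(data)
    dual_pivot_quicksort(data)
    assert data == expected
    print("ok")
```

Each pass splits a range into three parts instead of two, so the data goes through fewer partitioning passes overall even though each pass does more work per element. That trade-off is roughly what the cache-miss argument rests on. With `sample_pivots=False`, already-sorted input puts almost everything in the middle part and recursion gets deep, which is one concrete reason better pivot selection helps.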
The implementation is mine, but I later adopted some optimizations and changes from quicksort v2 and from the original implementation and paper mentioned above: https://arxiv.org/pdf/1511.01138