||Spike sorting is a critical task in neural signal decoding because of its computational complexity. From this perspective, the research trend in the last years aimed at designing massively parallel hardware accelerators. However, for implantable system with a reduced number of channels, as could be those interfaced to the Peripheral Nervous Systems (PNS) for neural prostheses, the efficiency in terms of area and power is in contrast with such a parallelism exploitation. In this paper, a novel approach based on high-level dataflow description and automatic hardware generation is presented and evaluated on an on-line spike sorting algorithm for PNS signals. Results in the best case revealed a 71% of area saving compared to more traditional solutions, without any accuracy penalty. With respect to single kernels execution, better latency performance are achievable still minimizing the number of adopted resources.