Design and Implementation of a Parallel Pipelined Matrix Transposition Architecture Using Shift Registers
DOI:
https://doi.org/10.70135/seejph.vi.6751
Abstract
An efficient algorithm and register-based architecture for matrix transposition is proposed, enabling parallel processing with reduced latency and complexity. The design supports K-parallel transposition, where K is the level of parallelism, and achieves minimal latency and memory usage. It handles matrices whose dimensions are integer multiples of K, and K is not constrained to powers of two. The architecture consists of a cascade of uniform swap units; the activation of each stage is defined algorithmically and driven by counter-based control logic. As part of this study, 3-parallel and 4-parallel architectures are implemented for transposing a 12×24 matrix. A performance comparison shows that the 4-parallel architecture performs better in terms of processing efficiency and resource utilization. The results also offer insight into continuous-flow transposition of non-square matrices.
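To make the input and output ordering concrete, the following is a minimal behavioral sketch of K-parallel streaming transposition for the 12×24 case. It is not the swap-unit architecture described above: it buffers the whole matrix and therefore says nothing about the latency or register count of the proposed design. It assumes that K-parallel means K elements enter and leave per cycle in row-major order; the function names `to_stream` and `transpose_stream` are hypothetical and chosen only for illustration.

```python
def to_stream(matrix, k):
    """Flatten a matrix row by row and group it into k-element words (one word per cycle)."""
    flat = [x for row in matrix for x in row]
    assert len(flat) % k == 0, "element count must be a multiple of k"
    return [flat[i:i + k] for i in range(0, len(flat), k)]


def transpose_stream(words, rows, cols, k):
    """Reference model: return the k-wide word stream of the transposed matrix.

    Assumption: both dimensions are integer multiples of k, as the abstract requires.
    The whole matrix is buffered here, unlike a continuous-flow hardware design.
    """
    assert rows % k == 0 and cols % k == 0, "dimensions must be multiples of k"
    flat = [x for word in words for x in word]
    assert len(flat) == rows * cols
    # Element (r, c) of the input becomes element (c, r) of the output.
    transposed = [flat[r * cols + c] for c in range(cols) for r in range(rows)]
    return [transposed[i:i + k] for i in range(0, len(transposed), k)]


if __name__ == "__main__":
    ROWS, COLS = 12, 24
    # Label each element with its row-major index so the reordering is visible.
    matrix = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]
    for k in (3, 4):  # k need not be a power of two
        out = transpose_stream(to_stream(matrix, k), ROWS, COLS, k)
        print(f"K={k}: {len(out)} output words, first word {out[0]}")
```

With 288 elements in total, this model produces 96 words for K = 3 and 72 words for K = 4, and the first output word holds the top of the first input column. A model of this kind can serve as a golden reference when checking the output of a register-based implementation.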
License

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.