The only current way to do this is to implement your own matrix multiplication. The product requires 200000 * 4 * 4 = 3200000 multiplications, so I would not suggest a fully parallel approach. Instead, you could use the HDL multiply-accumulate block or the dot product block to compute one output value at a time, and implement selection logic that feeds this single-pixel compute core the correct set of vector inputs from your data source.
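To make the serial scheme concrete, here is a minimal behavioral sketch in Python (not HDL, and not the blocks' actual interface). It assumes the product is an N-by-4 data matrix times a 4-by-4 matrix, which is one reading of the 200000 * 4 * 4 count; the function name `serial_matmul` and its arguments are hypothetical. The two outer loops play the role of the selection logic, and the inner loop is the single multiply-accumulate unit.

```python
def serial_matmul(A, B):
    """Multiply A (N x K) by B (K x M) using one multiply-accumulate per step.

    Behavioral model only: each inner-loop iteration stands for one use of
    a single shared multiplier. Dimensions are assumed, not from the source.
    """
    n_rows, inner, n_cols = len(A), len(B), len(B[0])
    out = [[0] * n_cols for _ in range(n_rows)]
    mult_count = 0  # total multiplications performed

    for i in range(n_rows):        # selection logic: pick the input row
        for j in range(n_cols):    # selection logic: pick the column of B
            acc = 0                # accumulator is cleared for each output
            for k in range(inner):
                acc += A[i][k] * B[k][j]  # one multiply-accumulate step
                mult_count += 1
            out[i][j] = acc

    return out, mult_count


if __name__ == "__main__":
    A = [[1, 2, 3, 4], [5, 6, 7, 8]]
    B = [[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 2]]
    result, count = serial_matmul(A, B)
    print(result, count)
```

Under these assumptions the multiplication count is N * 4 * 4, so N = 200000 rows gives the 3200000 figure above. With a dot product block in place of the multiply-accumulate block, the inner loop would collapse into a single 4-element operation per output value.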