BF534 L1 memory write problem

Hello!

I've run into a problem with L1 memory writes: writing to L1 appears to be about a hundred times slower than reading from it.
Take a look at the code below. Adding "Filter_Buffer[j] = mul;" to the loop increases its execution time by 8 microseconds, which I can't afford. Don't L1 memory writes run at the core clock (CCLK)? Why is this happening? I need the writes to finish within 1 microsecond. What can I do to speed them up?
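To put the 1 microsecond target in perspective, here is a rough cycle-budget calculation. It is only a sketch: the 500 MHz core clock used in the example is an assumption (the post does not state CCLK), and the helper names are hypothetical. The loop bounds come from the posted code, for (j = 3; j < 254; j++), which gives 251 iterations.

```c
/* Rough cycle budget for the filter loop (a sketch).
   The CCLK value passed in is an assumption; the post does not state it. */

/* Loop bounds taken from the posted code: for (j = 3; j < 254; j++) */
#define FIRST_J 3
#define END_J   254   /* exclusive upper bound */

/* Number of loop iterations, i.e. output samples written. */
static int iterations(void)
{
    return END_J - FIRST_J;   /* 251 */
}

/* Core-clock cycles that elapse in a time window.
   A clock in MHz is exactly "cycles per microsecond". */
static double cycles_in(double microseconds, double cclk_mhz)
{
    return microseconds * cclk_mhz;
}

/* Cycles available (or spent) per loop iteration in that window. */
static double cycles_per_iteration(double microseconds, double cclk_mhz)
{
    return cycles_in(microseconds, cclk_mhz) / iterations();
}
```

With CCLK assumed to be 500 MHz, cycles_per_iteration(1.0, 500.0) is about 1.99 cycles per output sample, while the extra 8 microseconds corresponds to cycles_per_iteration(8.0, 500.0), roughly 15.9 cycles per iteration.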

code:

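#include <fract.h>   /* fract16, fract32, mult_fr1x32(), add_fr1x32() */

section("L1_data_a") fract16 sPPI_RxBuffer[512];
section("L1_data_b") fract16 Filter_Buffer[512];

extern fract16 GaussCoef[7];   /* 7-tap Gaussian coefficients, defined elsewhere */

/* Wrapper around the loop; the function name is illustrative only. */
void gauss_filter_line(void)
{
    int j;
    fract32 sum;
    fract16 mul;

    for (j = 3; j < 254; j++)
    {
        /* 7-tap convolution centred on sample j */
        sum = mult_fr1x32(sPPI_RxBuffer[j - 3], GaussCoef[0]);
        sum = add_fr1x32(sum, mult_fr1x32(sPPI_RxBuffer[j - 2], GaussCoef[1]));
        sum = add_fr1x32(sum, mult_fr1x32(sPPI_RxBuffer[j - 1], GaussCoef[2]));
        sum = add_fr1x32(sum, mult_fr1x32(sPPI_RxBuffer[j],     GaussCoef[3]));
        sum = add_fr1x32(sum, mult_fr1x32(sPPI_RxBuffer[j + 1], GaussCoef[4]));
        sum = add_fr1x32(sum, mult_fr1x32(sPPI_RxBuffer[j + 2], GaussCoef[5]));
        sum = add_fr1x32(sum, mult_fr1x32(sPPI_RxBuffer[j + 3], GaussCoef[6]));

        /* Scale the result. mult_fr1x32() takes fract16 operands,
           so the fract32 sum is narrowed to 16 bits in this call. */
        mul = (fract16)(mult_fr1x32(sum, 121) >> 19);

        Filter_Buffer[j] = mul;   /* adding this store costs about 8 us */
    }
}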
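To check the filter arithmetic off-target (for example on a PC) against results read back from the board, the fixed-point built-ins can be emulated in portable C. This is a minimal sketch under assumed semantics: fract16 as a Q15 value in int16_t, fract32 as a Q31 value in int32_t, mult_fr1x32() as a saturating Q15 × Q15 → Q31 multiply, and add_fr1x32() as a saturating 32-bit add. The emu_* names are hypothetical, and the model deliberately stops before the final scaling by 121 >> 19.

```c
#include <stdint.h>

/* Assumed semantics of mult_fr1x32(): Q15 x Q15 -> Q31, i.e. (a * b) << 1,
   saturating. The only overflow case is -1.0 * -1.0. */
static int32_t emu_mult_fr1x32(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* |p| <= 2^30, always fits */
    if (p == 0x40000000)                   /* only from -32768 * -32768 */
        return INT32_MAX;
    return p * 2;                          /* cannot overflow otherwise */
}

/* Assumed semantics of add_fr1x32(): saturating 32-bit addition. */
static int32_t emu_add_fr1x32(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + (int64_t)b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* 7-tap FIR over in[j-3 .. j+3] for j = 3 .. 253, mirroring the posted loop
   up to (but not including) the final scaling step. */
static void emu_gauss_sums(const int16_t *in, const int16_t coef[7], int32_t *out)
{
    for (int j = 3; j < 254; j++) {
        int32_t sum = emu_mult_fr1x32(in[j - 3], coef[0]);
        for (int k = 1; k < 7; k++)
            sum = emu_add_fr1x32(sum, emu_mult_fr1x32(in[j - 3 + k], coef[k]));
        out[j] = sum;
    }
}
```

For example, with every input sample at 0.5 (16384) and only the centre tap set to 0.5, each output sum is 0.25 in Q31 (0x20000000).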