is there any highly optimized crc32 algorithm for BF537?
i need a very fast implementation
regards chris
Would crc16 be good enough? If you put the internal vars as registers and if it is small enough, make sure the target data is in cashe it should be really quick.
i need a CRC32 - no CRC16 possible
Chris,
As a starting point you can use Yaniv’s “Efficient CRC calculation with minimal memory footprint”
, Embedded.com, 03/16/2008 article that shows how to implement any CRC up to 32bit (included) in two cycles per bit.
http://www.embedded.com/design/206901030?printable=true
Thanks,
Ystein
or check the sources of the BF54x boot code which come with your VisualDSP++ installation
We use a basic CRC check on the Blackfin. To send a N word array, we add on a single 32-bit so that when everything is xor'ed together, you end up with zero. This is good for catching any single word error. It's fast as well. The code snippet is:
{
DWORD crc=0;
for(i=0; i<nLen; i++)
{
crc^=g_PacketBuffer[i];
}
if (crc != 0)
goto Packet_Error;
}
Paul,
The algorithm that you are presenting called longitudinal parity check, which breaks the data into "words" with a fixed number n of bits, and then computes the XOR of all those words. The result is appended to the message as an extra word. To check the integrity of a message, the receiver computes the exclusive or of all its words, including the checksum; if the result is not a word with n zeros, the receiver knows that a transmission error occurred.
With this checksum, any transmission error that flips a single bit of the message, or an odd number of bits, will be detected as an incorrect checksum. However, an error that affects two bits will not be detected if those bits lie at the same position in two distinct words.
On the other hand:
CRC is along division operation in which the quotient is discarded and the reminder becomes the result, with the important distinction that the arithmetic used is the carry-less arithmetic of a finite field GF(2). Typically, an n-bit CRC, applied to a data block of arbitrary length, will detect any single error burst not longer than n bits (in other words, any single alteration that spans no more than n bits of the data).
Thanks,
Yosi
Just ran into something in the Blackfin Programming Reference, remembered this thread and decided it might be worth bringing it up. There's a couple of instructions, BXOR and BXORSHIFT, which have a Type I CRC as an example of their usage. You would need to make a couple of adjustments (CRC_LEN would have to be 32, and you'd have to alter the code to be callable from C - receiving a ptr to the message as a parameter etc), however it might at the very least give you ideas. I've copied the text from the PRM below:
The BXOR and BXORSHIFT instructions let you calculate a Type I
CRC at a rate of two cycles per input bit, as in the following example
program:
// _CRC_BXOR - calculate CRC value of a message polynomial
// for a given generator polynomial.
#define MSG_LEN 32 // bits
#define CRC_LEN 16 // bits
_CRC_BXOR:
a1 = a0 = 0;
r1 = 0x8408 (z); // LFSR polynomial, reversed:
// x^16 + x^12 + x^5 + 1
a1.w = r1; // initialize LFSR mask
r2.h = 0xd065; // r2 = message
r2.l = 0x86c9;
p1 = MSG_LEN (z);
loop _MSG_loop lc0 = p1;
loop_begin _MSG_loop;
r2 = rot r2 by 1;
a0 = bxorshift(a0, a1, cc);
loop_end _MSG_loop;
r0 = 0; // initialize CRC
r2.l = cc = bxor(a0, r1);
r0 = rot r0 by 1;
p1 = CRC_LEN-1 (z);
loop _CRC_loop lc0 = p1;
loop_begin _CRC_loop;
r2.l = cc = bxorshift(a0, r1);
r0 = rot r0 by 1;
loop_end _CRC_loop;
// r0.l now contains the CRC
_CRC_BXOR.end:
Hi,
Using 1 kB data memory, we can implement CRC32 on Blackfin with 1 cyc/bit.
-mahana
The code above, from Sean's post, is similar to the article Yosi pointed to earlier. In the article one can find detailed explanation on how to adjust this kind of code to specific needs.
Mahana,
I guess it would be nice to post a code sample.
Heh, I should really have looked at the embedded.com article you posted as it's pretty much the same as the PRM text.
LOL! I wonder why... ![]()
Hi Yaniv,
WIth LUT, it could work like this -
/* algorithm:
new CRC (updated each incoming byte) = (old CRC << 8) ^ (CRC_update_table[input byte ^ MSByte of old CRC] )
*/
// ... proper loading the loop, then:
// CRC lives in R0
CRC32Start:
r3 = r2 << 2 || r6 = [p1]; // r6 is 32-bit CRC corresponding to input byte^MSbyte of CRC
r3 = r7 + r3 (ns) || r1 = b[p0++]; // r7 - base for CRC update table, r1 - input byte
p1 = r3; // ptr into CRC update table, size 256*4 bytes
r5 = r0 >> 24; // 8 MSB of CRC interacting with input
r2 = r1 ^ r5;
r5 = r0 << 8; // 24 LSB of old CRC interacting with CRC update
CRC32End:
r0 = r5 ^ r6;
// 7 cycles per 8 bits = 0.875 c/bit
Boris,
Do you agree that the two LSB of R7 are "0" (32bit aligned!)
You can save a cycle by:
Mine R7 == yours >>2!!!
// CRC lives in R0
CRC32Start:
r3 = (r7+r2) << 2 || r6 =[p1]; // r7 = CRC table base , r6 is 32-bit CRC corresponding to input byte^MSbyte of CRC
p1 = r3; // ptr into CRC update table, size 256*4 bytes
r5 = r0 >> 24 || r1 = b[p0++]; // r1 - input byte, 8 MSB of CRC interacting with input
r2 = r1 ^ r5;
r5 = r0 << 8; // 24 LSB of old CRC interacting with CRC update
CRC32End:
r0 = r5 ^ r6;
and now you have 6 cyc/8bit == > 0.75cyc/bit
do you agree?
Thanks,
Yosi