JPEG Processing - 4:2:2 v 4:2:0?

My JPEG saga continues...

I am using a single core of a BF561 to compress a 640x480 YUV 4:2:2 image into JPEG format with the ADI JPEG Library tools. I am getting very poor performance compared with the reference results from the Spec Sheet. The Spec Sheet says that a colorful 512x512 YUV 4:2:0 image with a quality factor of 20 will compress to 27 KB in 10*10^6 cycles. I have a boring 640x480 YUV 4:2:2 image that produces a 45 KB image in 60*10^6 cycles (120 ms at a 510 MHz clock). The input and output buffers are in separate banks of SDRAM. All other factors are defaults for Sequential JPEG encoding.

Any idea why it's so slow?  Even with twice the pixel data, it shouldn't be 5-6 times slower.
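
To put numbers on that comparison, here is a rough sketch that uses only the figures quoted above. The helper name `samples_per_frame` is made up for illustration, and the calculation assumes 8-bit samples and ignores image content, which also affects encoding time.

```python
# Compare the spec-sheet benchmark with the measured result, using only
# the figures quoted in the post. Assumes 8-bit samples throughout.

def samples_per_frame(width, height, chroma):
    """Total number of Y + Cb + Cr samples in one frame."""
    luma = width * height
    # 4:2:0: two chroma planes at quarter resolution (0.5x luma in total).
    # 4:2:2: two chroma planes at half resolution (1.0x luma in total).
    chroma_factor = {"4:2:0": 0.5, "4:2:2": 1.0}[chroma]
    return int(luma * (1 + chroma_factor))

spec_samples = samples_per_frame(512, 512, "4:2:0")
my_samples = samples_per_frame(640, 480, "4:2:2")

spec_cycles = 10e6   # spec sheet figure
my_cycles = 60e6     # measured figure

data_ratio = my_samples / spec_samples          # how much more input data
cycle_ratio = my_cycles / spec_cycles           # how much more time
per_sample_slowdown = cycle_ratio / data_ratio  # slowdown after normalising

encode_ms = my_cycles / 510e6 * 1e3             # at the quoted 510 MHz clock

print(f"input data ratio:      {data_ratio:.2f}x")
print(f"cycle ratio:           {cycle_ratio:.2f}x")
print(f"per-sample slowdown:   {per_sample_slowdown:.2f}x")
print(f"spec cycles/sample:    {spec_cycles / spec_samples:.1f}")
print(f"measured cycles/sample:{my_cycles / my_samples:.1f}")
print(f"encode time:           {encode_ms:.1f} ms")
```

By this count the 4:2:2 frame carries about 1.56 times the samples of the benchmark frame, so the measured encode is roughly 3.8 times slower per sample than the spec-sheet figure.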



  • Analog Employees on Jan 11, 2011 10:10 AM

    Hi Dan,

    There are a few other variables we can look at. First, what is the SCLK on your setup? Are you using write-through or write-back cache? Do you have interrupts running in the background?



  • Wow, I'm sorry I didn't see this sooner.  My project got side-tracked, but now I'm back on it.

    I am running DMA from the PPI to external SDRAM in the background while processing the previous image through the JPEG encoder. The SDRAM is arranged as 4 internal banks as suggested by the ADI notes. My CCLK is 600 MHz and SCLK is 120 MHz (or 510/128). I have a very small 1 ms timer IRQ/ISR running, but nothing else. I believe I'm using write-through cache by default (I don't see any cache config code anywhere).

    Some follow-up questions:

    - I am running my project in debug mode.  Some of the JPEG files are recompiled, but I'm assuming the libraries are optimized.  Correct?

    - I also assume that the JPEG library does not use DMA to retrieve the image blocks. Do the core fetches contend with the DMA activity and slow each other down?



  • Analog Employees on Jun 27, 2011 4:46 PM

    Hi Dan,

    Just saw your post. Here are some quick comments:

    - The JPEG libraries are optimised, and do not use DMA.

    - You should check the cache setup directly by analysing the processor registers (L1 Data Memory registers and L1 Code Memory registers) via the VisualDSP IDE.

    - Sharing of the processor buses between DMA and the core is described in the Blackfin Hardware Reference Manual. Note that it is possible to adjust the arbitration priority settings.
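
As a rough aid for the register check suggested above, the sketch below decodes values copied by hand from the debugger's register window. It runs on a host PC, not on the Blackfin. The bit masks are the ones commonly listed for the Blackfin L1 data memory control register (DMEM_CONTROL) and the data CPLB descriptors (DCPLB_DATAx); treat them as assumptions and confirm them against the BF561 Hardware Reference Manual. The function names and the example values are hypothetical.

```python
# Host-side decoder for register values read out of the debugger.
# ASSUMPTION: bit masks follow the usual Blackfin layout; verify them
# against the BF561 Hardware Reference Manual before relying on this.

ENDCPLB = 0x0002       # DMEM_CONTROL: data CPLBs enabled
DMC_MASK = 0x000C      # DMEM_CONTROL: L1 data bank A/B configuration
DMC_MODES = {
    0x0: "banks A and B are SRAM (data cache off)",
    0x8: "bank A is cache, bank B is SRAM",
    0xC: "banks A and B are cache",
}

CPLB_VALID = 0x0001    # DCPLB_DATAx: descriptor is valid
CPLB_L1_CHBL = 0x1000  # DCPLB_DATAx: page is cacheable in L1
CPLB_WT = 0x4000       # DCPLB_DATAx: write-through (clear = write-back)

def decode_dmem_control(value):
    """Summarise the cache-related fields of a DMEM_CONTROL value."""
    return {
        "cplb_enabled": bool(value & ENDCPLB),
        "bank_mode": DMC_MODES.get(value & DMC_MASK, "reserved"),
    }

def decode_dcplb_data(value):
    """Classify one DCPLB_DATAx descriptor by its caching policy."""
    if not value & CPLB_VALID:
        return "invalid"
    if not value & CPLB_L1_CHBL:
        return "non-cacheable"
    return "write-through" if value & CPLB_WT else "write-back"

# Hypothetical values, for illustration only.
print(decode_dmem_control(0x0000100E))
print(decode_dcplb_data(0x00005001))
```

If DMEM_CONTROL shows both banks as SRAM, or the descriptor covering the SDRAM image buffers is non-cacheable, the data cache is not in use for those buffers regardless of the write policy.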



  • Hi Steve;

    Thanks for this response (and the internal one).  I've read all the words on the cache modes and SDRAM bank organization, but I'm still not clear on which is the best mode for what I'm doing.  Can you either explain the benefits further, or just give me a recommendation for the best performance for the JPEG Encoder engine?


  • Analog Employees on Jul 6, 2011 5:51 AM

    This issue is now being dealt with via direct support, but for the benefit of other forum users, here is some information that should help.

    "For basic information about how cache memories are used on Blackfin processors, we suggest you refer to EE-271, which discusses cache memory management for Blackfin processors. It introduces popular cache schemes and then discusses the Blackfin instruction cache and data cache in detail.

    You may also want to refer to EE-326, which discusses how to increase the SDRAM performance of your system.

    Please go through these documents, in addition to the Processor Hardware Reference Manual (HRM) and Programming Reference Manual (PRM)."