ADRV9009-SOM - clocking issues and now boot issues


I am running into some strange issues with the Rev. A ADRV9009-ZU11EG SOM that have only started appearing in the last few weeks (have been using this specific SOM board for development for ~1yr now without issue). These problems may or may not be related - I am stumped at the moment.

Quick Summary: My custom PL design started failing in the lab after a recent update, and an ILA probe showed what I thought were unreported timing errors. After reworking my design a few times and seeing a known-good older design also fail, I started looking for other causes and found that the SOM's HMC7044 PLL2 isn't locking (most likely the cause of the issues I observed). Then, a week and a half later, the IIO devices failed to populate. A day after that, the SOM stopped getting past the first stage boot loader.

I have a modified HDL reference design (2019_r1 release) that I have been incrementally adding custom IP to. Up until the start of December, I hadn't run into any major issues with my modifications to the reference design. I noticed some data wasn't coming out correctly from a newly added PL-PS interface, so I probed the interface with an ILA and saw data bits and control signals changing randomly from one clock cycle to the next, only occasionally holding the correct values. Thinking this was a timing issue in my HDL design that Vivado had not reported, I reworked the data path to be better pipelined and optimized, but I was still seeing these random bit flips in the ILA. I also checked an older, previously working design, and the same issues were present there as well, just on different signals.

Thinking now that my PL design is not the root issue, I started looking into other areas this problem could be originating from, such as the clock distribution path. I checked the SOM's HMC7044 with iio_attr and noticed that PLL2 was not locking. I then took a full diagnostic report using the ADI Diagnostic Report Tool on 12/24/2020 (the date will be relevant later).

Note: Anything I tried in my modified design, I also tried using the ADI default files for Rev. A from the ADRV9009-SOM Quick Start wiki and had the same result.

The snippet I am referring to is:

	iio:device3: hmc7044
		10 channels found:
. Channel info
		2 debug attributes found:
				debug attr  0: status value: 
--- PLL1 ---
Status:	Locked
Using:	CLKIN1 @ 122880000 Hz
PFD:	7680 kHz
--- PLL2 ---
Status:	Unlocked (Synchronized)
Frequency:	2949120000 Hz
SYSREF Status:	Invalid
SYNC Status:	Unsynchronized
Lock Status:	Unlocked
				debug attr  1: direct_reg_access value: 0x0
		No trigger on this device

I compared this to the output of the same HMC7044 iio_attr query as shown in the Quick Start wiki:

root@analog:~# iio_attr -q -D hmc7044 status
--- PLL1 ---
Status: Locked
Using:  CLKIN1 @ 122880000 Hz
PFD:    7680 kHz
--- PLL2 ---
Status: Locked (Unsynchronized)
Frequency:      2949120000 Hz
SYSREF Status:  Valid & Locked
SYNC Status:    Unsynchronized
Lock Status:    PLL1 & PLL2 Locked

These outputs are obviously different, and the SOM's HMC7044 lock status being unlocked would explain why the PL (clocked to core_clk_b from the HMC7044) is failing, assuming the clock distribution path outputs would have a high jitter due to the unlocked status.
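To make the difference between the two dumps explicit, here is a minimal Python sketch that parses both status blocks and lists only the fields that disagree. The two strings are copied from the outputs above; the names `parse_status`, `diff_status`, `BOARD_STATUS` and `WIKI_STATUS` are made up for this illustration and are not part of any ADI tool.

```python
# Status text copied from the outputs above (board vs. Quick Start wiki).
BOARD_STATUS = """--- PLL1 ---
Status: Locked
Using: CLKIN1 @ 122880000 Hz
PFD: 7680 kHz
--- PLL2 ---
Status: Unlocked (Synchronized)
Frequency: 2949120000 Hz
SYSREF Status: Invalid
SYNC Status: Unsynchronized
Lock Status: Unlocked
"""

WIKI_STATUS = """--- PLL1 ---
Status: Locked
Using: CLKIN1 @ 122880000 Hz
PFD: 7680 kHz
--- PLL2 ---
Status: Locked (Unsynchronized)
Frequency: 2949120000 Hz
SYSREF Status: Valid & Locked
SYNC Status: Unsynchronized
Lock Status: PLL1 & PLL2 Locked
"""


def parse_status(text):
    """Return {(section, field): value} for every 'field: value' line."""
    fields = {}
    section = ""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("---"):
            # Section header such as '--- PLL2 ---'
            section = line.strip("- ")
            continue
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[(section, key.strip())] = value.strip()
    return fields


def diff_status(observed, expected):
    """Return {(section, field): (observed, expected)} for mismatching fields."""
    obs, exp = parse_status(observed), parse_status(expected)
    return {k: (obs.get(k), exp[k]) for k in exp if obs.get(k) != exp[k]}


for (section, field), (got, want) in diff_status(BOARD_STATUS, WIKI_STATUS).items():
    print(f"{section} / {field}: board={got!r} wiki={want!r}")
```

With the two dumps above, only PLL2 fields differ (Status, SYSREF Status and Lock Status); all PLL1 fields, the PLL2 frequency and the SYNC status match.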

Then yesterday, as I was trying to continue figuring out what's going on, the IIO devices stopped populating. I ran an "iio_info | grep iio:device" query, and this was the output:

root@analog:~# iio_info | grep iio:device
        iio:device0: ams
        iio:device1: hmc7044-car
        iio:device2: adm1177

Again comparing to the Quick Start wiki of the same query:

root@analog:~# iio_info | grep iio:device
        iio:device0: ams
        iio:device1: hmc7044-car
        iio:device2: adm1177
        iio:device3: hmc7044
        iio:device4: adrv9009-phy
        iio:device5: adrv9009-phy-b
        iio:device6: axi-adrv9009-rx-obs-hpc (buffer capable)
        iio:device7: axi-adrv9009-tx-hpc (buffer capable)
        iio:device8: axi-adrv9009-rx-hpc (buffer capable)
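The two listings can be compared the same way. The sketch below is only an illustration in Python: it extracts the device names from each listing and reports which ones from the wiki are absent on the board. The listings are copied from above; the helper names `device_names` and `missing_devices` are hypothetical.

```python
import re

# Listings copied from the two "iio_info | grep iio:device" outputs above.
BOARD_DEVICES = """
        iio:device0: ams
        iio:device1: hmc7044-car
        iio:device2: adm1177
"""

WIKI_DEVICES = """
        iio:device0: ams
        iio:device1: hmc7044-car
        iio:device2: adm1177
        iio:device3: hmc7044
        iio:device4: adrv9009-phy
        iio:device5: adrv9009-phy-b
        iio:device6: axi-adrv9009-rx-obs-hpc (buffer capable)
        iio:device7: axi-adrv9009-tx-hpc (buffer capable)
        iio:device8: axi-adrv9009-rx-hpc (buffer capable)
"""


def device_names(listing):
    """Extract the device name that follows each 'iio:deviceN:' prefix."""
    return [m.group(1) for m in re.finditer(r"iio:device\d+:\s*(\S+)", listing)]


def missing_devices(observed, expected):
    """Return expected device names that do not appear in the observed listing."""
    seen = set(device_names(observed))
    return [name for name in device_names(expected) if name not in seen]


for name in missing_devices(BOARD_DEVICES, WIKI_DEVICES):
    print("missing:", name)
```

For the listings above, this reports hmc7044, adrv9009-phy, adrv9009-phy-b and the three axi-adrv9009 devices as missing, while ams, hmc7044-car and adm1177 are still present.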

Fast forward to today: using the exact same files that booted successfully yesterday, the SOM does not get past the boot loader. I tried both my modified files and the ADI files from the Quick Start wiki (both Rev. A and Rev. B, just for good measure). In all cases, the PS_ERR_OUT LED on the carrier eventually comes on. I also noticed that the DS2 LED on the SOM board (PS_DONE) never turns on (which makes sense). Both DS3 (PG_SOM) and DS4 (PG_ALL) light up on the carrier and SOM.

I have double-checked that the SD card is selected for boot and that the boot switches are set up.

So sometime around the start of December, the SOM's HMC7044 PLL2 stopped locking. The last full diagnostic report I have is from 12/24/2020. Then yesterday, the IIO devices stopped populating, and today the board doesn't boot with those same files.

My questions are:

1) The HMC7044 PLL2 not locking would cause a dirty input to be fed to the clock distribution path, correct? Then all the output clocks would have poor performance (i.e. high jitter) as a result?

2) What are possible causes of the SOM not getting past the boot loader? I am at a loss, since the exact same setup was booting just yesterday...

Any insight is greatly appreciated!



Addendum: I tried a brand new Rev. C SOM on my Rev. B carrier board (with updated boot files from the Quick Start guide), and I am seeing the same boot issue as with the Rev. A SOM on the same Rev. B carrier board.

Added info on Rev. C SOM experiment
[edited by: Samual at 6:25 PM (GMT -5) on 6 Jan 2021]
