Programmable Modulus Implementation on AD9915

I have a system using the AD9915 that requires precise, adjustable control of phase, amplitude and frequency.  Currently, we clock the DDS at 2.5 GHz, which sets the system clock at 156.25 MHz (1/16 of the DDS clock).  We set the function pins to 0010 to load the FTW using the 32 parallel pins, then to 0100 for POW+AMP.  We then assert IO UPDATE.  The sequence is as follows:

Cycle 0: FTW (0010)

Cycle 1: POW+AMP (0100)

Cycle 2: IO UPDATE

It takes us 3 cycles to update phase/frequency/amplitude, which allows an update every 3/156.25 MHz = 19.2 ns.
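For reference, here is the timing arithmetic as a small Python sketch. The names (SYSCLK_HZ, SYNC_DIVIDER, update_period_ns) are just illustrative, and it assumes one parallel-port word or one IO UPDATE per 156.25 MHz clock cycle, as described above.

```python
# Illustrative timing arithmetic; all names here are hypothetical.
SYSCLK_HZ = 2.5e9      # DDS clock from the post
SYNC_DIVIDER = 16      # 2.5 GHz / 16 = 156.25 MHz


def update_period_ns(cycles, sysclk_hz=SYSCLK_HZ, divider=SYNC_DIVIDER):
    """Time in nanoseconds for an update that occupies `cycles` clock cycles."""
    parallel_clock_hz = sysclk_hz / divider
    return cycles / parallel_clock_hz * 1e9


period_ns = update_period_ns(3)   # FTW, POW+AMP, IO UPDATE
print(f"{period_ns:.1f} ns per update")
```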

I would like to improve the output frequency stepping resolution by using the programmable modulus mode, but I have some questions about its implementation.  My understanding is that, in addition to sending the FTW, we also need to set the 32-bit A and B registers.  My question is: what is the fastest way to set frequency, phase and amplitude?  Do we need to set the A and B values using the 16-bit parallel programming mode?  Could we, for example, do something like this:
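To make the A and B values concrete, here is a rough Python sketch of how I understand the three words would be derived. It assumes the relation f_out/f_sys = (FTW + A/B) / 2^32, which is my reading of the datasheet and should be checked against it. The function name modulus_words is hypothetical.

```python
from fractions import Fraction


def modulus_words(f_out_hz, f_sys_hz):
    """Return (FTW, A, B) for an exact f_out/f_sys ratio.

    Assumes f_out/f_sys = (FTW + A/B) / 2**32 (an assumption to verify
    against the AD9915 datasheet).
    """
    ratio = Fraction(f_out_hz) / Fraction(f_sys_hz)
    if not 0 < ratio < Fraction(1, 2):
        raise ValueError("f_out must lie between 0 and f_sys/2")
    scaled = ratio * 2**32
    ftw = scaled.numerator // scaled.denominator   # integer part
    remainder = scaled - ftw                       # fractional part, already in lowest terms
    a, b = remainder.numerator, remainder.denominator
    if b >= 2**32:
        raise ValueError("B does not fit in 32 bits; ratio cannot be exact")
    return ftw, a, b


# Example: 250 MHz out of a 2.5 GHz clock is a ratio of 1/10,
# which a plain 32-bit FTW cannot represent exactly.
ftw, a, b = modulus_words(250_000_000, 2_500_000_000)
print(ftw, a, b)
```

With these inputs the integer part is 429496729 and the leftover 0.6 becomes A/B = 3/5.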

Cycle 0: POW+AMP (0100)

Cycle 1: FTW (0010)

Cycle 2: A[31:16] (0000)

Cycle 3: A[15:0] (0000)

Cycle 4: B[31:16] (0000)

Cycle 5: B[15:0] (0000)

Cycle 6: IO UPDATE

In this configuration it takes us 7 cycles to update phase/frequency/amplitude, which allows an update every 7/156.25 MHz = 44.8 ns.
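The proposed sequence can be written out as a short Python sketch, mainly to show the 32-bit A and B values being split into 16-bit halves. The function-pin codes are the ones from my list above. Whether 0000 can actually be used this way is part of what I am asking, so treat that mapping as unconfirmed. All names (split_32, build_sequence) are hypothetical.

```python
from fractions import Fraction

PARALLEL_CLOCK_HZ = Fraction(2_500_000_000, 16)   # 156.25 MHz


def split_32(word):
    """Split a 32-bit value into (upper 16 bits, lower 16 bits)."""
    if not 0 <= word < 2**32:
        raise ValueError("value does not fit in 32 bits")
    return (word >> 16) & 0xFFFF, word & 0xFFFF


def build_sequence(ftw, pow_amp, a, b):
    """One entry per cycle: (label, function pins, data word)."""
    a_hi, a_lo = split_32(a)
    b_hi, b_lo = split_32(b)
    return [
        ("POW+AMP", "0100", pow_amp),
        ("FTW", "0010", ftw),
        ("A[31:16]", "0000", a_hi),   # unconfirmed use of 0000
        ("A[15:0]", "0000", a_lo),
        ("B[31:16]", "0000", b_hi),
        ("B[15:0]", "0000", b_lo),
        ("IO UPDATE", None, None),    # no data word in this cycle
    ]


seq = build_sequence(ftw=429496729, pow_amp=0, a=3, b=5)
period_ns = float(len(seq) / PARALLEL_CLOCK_HZ * 10**9)
print(len(seq), "cycles,", period_ns, "ns per update")
```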

Is there a better/faster way?