-- BarryGreen - 27 Jul 2007 -- BarryGreen - 17 May 2007

Calice Off Detector Receiver User Guide (Quad Channel)

This is version 4 of the documentation. The main changes to the ODR are to automate the transfer of data to the host by offloading functions to the ODR. The target is to remove the necessity for the ODR to be polled. Initialisation requires the allocation of a host memory page for the DMA done FIFO and a list of available host pages to be maintained on the ODR for the data to be transferred to.

PCI Address Space

Base Address 0

Size 4k bytes 32-bit, non-prefetchable

Address Map
Register Name 64 bit Address Byte Address
Unused 0x00 0x0000
DMA1LocalReg 0x01 0x0008
DMA1PCIAddressReg 0x02 0x0010
DMA1AbortReg 0x03 0x0018
Unused 0x04 0x0020
DMA1Status 0x05 0x0028
Unused 0x06 0x0030
DMA2LocalReg 0x07 0x0038
DMA2PCIAddressReg 0x08 0x0040
DMA2AbortReg 0x09 0x0048
Unused 0x0A 0x0050
DMA2Status 0x0B 0x0058
Unused 0x0C 0x0060
DMA3LocalReg 0x0D 0x0068
DMA3PCIAddressReg 0x0E 0x0070
DMA3AbortReg 0x0F 0x0078
Unused 0x10 0x0080
DMA3Status 0x11 0x0088
GlobalControlReg 0x12 0x0090
Global Status 0x13 0x0098
Unused 0x14 0x00A0
Unused 0x15 0x00A8
FreePageQueueFIFO 0x16 0x00B0
FDWriteIndex 0x17 0x00B8
FDWriteIndexLocation 0x18 0x00C0
WPDuplicatorTimeout 0x19 0x00C8
DMADoneFIFO 0x1A 0x00D0
FreePageFIFO 0x1B 0x00D8
FDFIFOLocation 0x1C 0x00E0
FDReadIndex 0x1D 0x00E8
FDTimeCount 0x1E 0x00F0
WindexTestReg 0x1F 0x00F8
EthxTXFIFO 0x20/30/40/50 0x0100/180/200/280
DataGeneratorx 0x21/31/41/51 0x0108/188/208/288
DataGeneratorxStatus 0x22/32/42/52 0x0110/190/210/290
BPMxHeaderReg 0x23/33/43/53 0x0118/198/218/298
BPMxReadPointerReg 0x24/34/44/54 0x0120/1A0/220/2A0
BPMxFIFO_NU 0x25/35/45/55 0x0128/1A8/228/2A8
BPMxStatus 0x26/36/46/56 0x0130/1B0/230/2B0
BPMxFIFO 0x27/37/47/57 0x0138/1B8/238/2B8
MemoryControllerxReg 0x28/38/48/58 0x0140/1C0/240/2C0
MemoryControllerxStatus 0x29/39/49/59 0x0148/1C8/248/2C8
EthxStatus 0x2A/3A/4A/5A 0x0150/1D0/250/2D0
ChannelxControlReg 0x2B/3B/4B/5B 0x0158/1D8/258/2D8
BPMxFIFOAccessCount 0x2C/3C/4C/5C 0x0160/1E0/260/2E0
ChannelxStatus 0x2D/3D/4D/5D 0x0168/1E8/268/2E8
EthMACxStatus 0x2E/3E/4E/5E 0x0170/1F0/270/2F0
TwoToThirtyTwoCount 0x2F/3F/4F/5F 0x0178/1F8/278/2F8
     
DMA4Parameters 0xA0 0x0500
DMA4LocalReg 0xA1 0x0508
DMA4PCIAddressReg 0xA2 0x0510
DMA4AbortReg 0xA3 0x0518
Unused 0xA4 0x0520
DMA4Status 0xA5 0x0528
     
FPFTestReg 0xB0 0x0580
PostArbBPMFIFO 0xB1 0x0588
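
The per-channel rows of the address map follow a regular pattern that can be computed rather than tabulated. The sketch below is illustrative only: the names `odr_channel_reg` and `odr_channel_reg_offset` are hypothetical, and the base (0x20) and stride (0x10) are simply read off the table above.

```c
#include <stdint.h>

/* Per-channel registers start at 64-bit address 0x20 and each channel
 * occupies 0x10 register slots (values read off the address map). */
#define ODR_CHANNEL_BASE   0x20u
#define ODR_CHANNEL_STRIDE 0x10u

/* A few offsets within a channel block (hypothetical enum names). */
enum odr_channel_reg {
    ODR_ETH_TX_FIFO     = 0x0,
    ODR_DATA_GENERATOR  = 0x1,
    ODR_BPM_FIFO        = 0x7,
    ODR_ETH_STATUS      = 0xA,
    ODR_CHANNEL_CONTROL = 0xB
};

/* Byte offset from Base Address 0 of a per-channel register. */
static inline uint32_t odr_channel_reg_offset(unsigned channel,
                                              enum odr_channel_reg reg)
{
    uint32_t addr64 = ODR_CHANNEL_BASE + ODR_CHANNEL_STRIDE * channel
                    + (uint32_t)reg;
    return addr64 * 8u;   /* each register is 8 bytes wide */
}
```

For example, channel 3 with `ODR_BPM_FIFO` gives byte address 0x02B8, matching the BPMxFIFO row.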

Base Address 1

Size 128M bytes 32-bit, non-prefetchable
Memory Map
Memory Area 128M Address Byte Address
BufferMemory0 0 0x0000000
BufferMemory1 1 0x4000000

Base Address 2

Size 16k bytes 32 bit, non-prefetchable
Registers Long Word Address Byte Address
MAC 0 Host Bus 0x000 0x0000
MAC 1 Host Bus 0x080 0x0400
MAC 2 Host Bus 0x100 0x0800
MAC 3 Host Bus 0x180 0x0C00

Offset Function
  _Configuration_
0x0200 Receiver (Word 0)
0x0240 Receiver (Word 1)
0x0280 Transmitter
0x02C0 Flow Control
0x0300 Ethernet MAC Mode
0x0340 RGMII/SGMII
0x0380 Management
  _Address Filter_
0x0380 Unicast Address (Word 0)
0x0384 Unicast Address (Word 1)
0x0388 Multicast Address Table Access (Word 0)
0x038C Multicast Address Table Access (Word 1)
0x0390 Mode

See Xilinx EMAC User Guide UG074, pages 69-75 for Configuration and 75-78 for Address Filter register details.

Register Descriptions

#Unused00

(0x00, 0x0000) Unused

Was DMA1Parameters

(0x01, 0x0008) DMA1LocalReg

This shows the address of the source of the transfer and the size of the transfer. See the PCIExpress Xpress Lite Reference Manual. Read only.
Bits Function
63:32 Local Address
31:0 Transfer Size

(0x02, 0x0010) DMA1PCIAddressReg

This shows the PCI source/destination address. See the PCIExpress Xpress Lite Reference Manual. Read only.

(0x03, 0x0018) DMA1AbortReg

Accessing this register will abort the current DMA. See the PCIExpress Xpress Lite Reference Manual.

#Unused04

(0x04, 0x0020) Unused

(0x05, 0x0028) DMA1Status

See the PCIExpress Xpress Lite Reference Manual. Read only.

Bits 3:0 State Meaning
0000 Idle Last Transfer ended successfully
0001 Idle Last Transfer was stopped by backend
0010 Idle Last Transfer ended because of CPL timeout
0011 Idle Last Transfer ended because of CPL UR error
0100 Idle Last Transfer ended because of CPL CA error
0xxx Idle Reserved
1000 Busy Channel Busy Processing
1001 Busy Requesting Transfer
1010 Busy Waiting for completion
1011 Busy Waiting for backend to provide/accept data
1xxx Busy Reserved
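
As a rough illustration of the table above, the status nibble could be decoded as follows. The function names are hypothetical; only bits 3:0 are interpreted, and bit 3 is taken to distinguish Busy from Idle as the table suggests.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit 3 of the status nibble separates the Busy states from the Idle ones. */
static inline bool dma_is_busy(uint64_t status)
{
    return (status & 0x8u) != 0;
}

/* Human-readable text for bits 3:0 of a DMAxStatus register. */
static inline const char *dma_status_text(uint64_t status)
{
    switch (status & 0xFu) {
    case 0x0: return "idle: last transfer ended successfully";
    case 0x1: return "idle: last transfer was stopped by backend";
    case 0x2: return "idle: last transfer ended because of CPL timeout";
    case 0x3: return "idle: last transfer ended because of CPL UR error";
    case 0x4: return "idle: last transfer ended because of CPL CA error";
    case 0x8: return "busy: channel busy processing";
    case 0x9: return "busy: requesting transfer";
    case 0xA: return "busy: waiting for completion";
    case 0xB: return "busy: waiting for backend to provide/accept data";
    default:  return (status & 0x8u) ? "busy: reserved" : "idle: reserved";
    }
}
```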

#Unused06

(0x06, 0x0030) Unused

Was DMA2Parameters

(0x07, 0x0038) DMA2LocalReg

Same as DMA1LocalReg

(0x08, 0x0040) DMA2PCIAddressReg

Same as DMA1PCIAddressReg

(0x09, 0x0048) DMA2AbortReg

Same as DMA1AbortReg

(0x0A, 0x0050) Unused

(0x0B, 0x0058) DMA2Status

Same as DMA1Status

(0x0C, 0x0060) Unused

Was DMA3Parameters

(0x0D, 0x0068) DMA3LocalReg

Same as DMA1LocalReg

(0x0E, 0x0070) DMA3PCIAddressReg

Same as DMA1PCIAddressReg

(0x0F, 0x0078) DMA3AbortReg

Same as DMA1AbortReg

#Unused10

(0x10, 0x0080) Unused

(0x11, 0x0088) DMA3Status

Same as DMA1Status

(0x12, 0x0090) GlobalControlReg

Bits must be modified on a read-modify-write basis. Resets are not self-clearing, so each must be set to 1 and then cleared to 0. Some parts can be disabled by holding them in reset. For example, the Used FIFO Arbiter takes data from the Used FIFO; if you are reading the Used FIFO directly, the Used FIFO Arbiter should be held in reset to stop it removing words from the FIFO.

Bit Function Operation
0 Channel 0 Reset 1= Reset
1 Channel 1 Reset 1= Reset
2 Channel 2 Reset 1= Reset
3 Channel 3 Reset 1= Reset
8 ReadAddressDecoderReset 1= Reset
9 ReadDataMuxReset 1= Reset
10 WriteAddressDecoderReset 1= Reset
11 SlaveAddressDecoderReset 1= Reset
13 RDSAAccessCounterReset 1= Reset
14 DMA3QueueReset 1= Reset
15 EthernetInterfaceReset 1= Reset
16 PrimaryDMAControlReset 1= Reset
17 FDReset 1= Reset
18 UsedFIFOArbReset 1= Reset
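
The read-modify-write rule and the non-self-clearing resets might be handled as sketched below. This assumes Base Address 0 has been mapped as an array of 64-bit registers (`bar0`) and that 64-bit accesses are permitted; the helper and macro names are hypothetical.

```c
#include <stdint.h>

#define ODR_GLOBAL_CONTROL_INDEX 0x12u          /* 64-bit register address */
#define ODR_RESET_CHANNEL(n)     (1ull << (n))  /* bits 0..3 */
#define ODR_RESET_USED_FIFO_ARB  (1ull << 18)

/* Set reset bits and leave them set (e.g. to keep a block disabled). */
static inline void odr_hold_reset(volatile uint64_t *bar0, uint64_t mask)
{
    uint64_t v = bar0[ODR_GLOBAL_CONTROL_INDEX];   /* read   */
    bar0[ODR_GLOBAL_CONTROL_INDEX] = v | mask;     /* modify + write */
}

/* Clear reset bits without disturbing the other bits. */
static inline void odr_release_reset(volatile uint64_t *bar0, uint64_t mask)
{
    uint64_t v = bar0[ODR_GLOBAL_CONTROL_INDEX];
    bar0[ODR_GLOBAL_CONTROL_INDEX] = v & ~mask;
}

/* Resets are not self-clearing: set to 1, then clear to 0. */
static inline void odr_pulse_reset(volatile uint64_t *bar0, uint64_t mask)
{
    odr_hold_reset(bar0, mask);
    odr_release_reset(bar0, mask);
}
```

Holding `ODR_RESET_USED_FIFO_ARB` with `odr_hold_reset` corresponds to the case described above where the Used FIFO is read directly.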

(0x13, 0x0098) GlobalStatus

Bits Mapping Description
63:48 ReadDMASetupAccessCount(15:0) Debugging counter to count read operations

#Unused14

(0x14, 0x00A0) Unused

#Unused15

(0x15, 0x00A8) Unused

(0x16, 0x00B0) FreePageQueueFIFO

The physical addresses of the host memory pages are written here, shifted right by 8. This FIFO can hold 1024 entries; if the number of pages in the pool is less than the FIFO size, the FIFO does not have to be checked for free space. This currently assumes that address bits above 43 are zero. If they are not, this will not work and will have to be changed.
FIFO bits Page Address
35 downto 0 43 downto 8
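
A minimal sketch of forming a FIFO entry from a physical address, following the bit mapping in the table above (address bits 43..8 become FIFO bits 35..0). The function name is hypothetical, and rejecting addresses with non-zero low bits is an extra safety check, not something the firmware is known to require.

```c
#include <stdint.h>
#include <stdbool.h>

#define FREE_PAGE_SHIFT     8    /* shift taken from the table above */
#define FREE_PAGE_ADDR_BITS 44   /* address bits 43..0 are representable */

/* Converts a physical page address into a FreePageQueueFIFO entry.
 * Returns false if the address cannot be represented. */
static inline bool free_page_entry(uint64_t phys_addr, uint64_t *entry)
{
    if (phys_addr >> FREE_PAGE_ADDR_BITS)
        return false;                       /* bits above 43 are set */
    if (phys_addr & ((1ull << FREE_PAGE_SHIFT) - 1))
        return false;                       /* low bits would be lost */
    *entry = phys_addr >> FREE_PAGE_SHIFT;
    return true;
}
```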

(0x17,0x00B8) FDWriteIndex

(0x18,0x00C0) FDWriteIndexLocation

(0x19,0x00C8) WPDuplicatorTimeout

(0x1A,0x00D0) DMADoneFIFO

(0x1B,0x00D8) FreePageFIFO

#FDFIFOLocation

(0x1C,0x00E0) FDFIFOLocation

(0x1D,0x00E8) FDReadIndex

(0x1E,0x00F0) FDTimeCount

(0x1F,0x00F8) WindexTestReg

(0x20/30/40/50, 0x0100/180/200/280) EthxTXFIFO

The signals are the same as for the RX, in reverse: data, sof, eof, src_rdy, dst_rdy.

FIFOs are already in place (as for the RX). The description below is taken from the source (note that the individual clocks have been commoned up in our implementation):

This is a transmitter side local link fifo implementation for the design example of the Virtex-4 Ethernet MAC Wrapper core. The transmit FIFO is created from 2 Block RAMs of size 2048 words of 8-bits per word, giving a total frame memory capacity of 4096 bytes.

Valid frame data received from the local link interface is written into the Block RAM on the write_clock. The FIFO will store frames up to 4 kbytes in length. If larger frames are written to the FIFO, the local-link interface will accept the rest of the frame, but that frame will be dropped by the FIFO and the overflow signal will be asserted.

The FIFO is designed to work with a minimum frame length of 14 bytes.

When there is at least one complete frame in the FIFO, the MAC transmitter client interface will be driven to request frame transmission by placing the first byte of the frame onto tx_data[7:0] and by asserting tx_data_valid. The MAC will later respond by asserting tx_ack. At this point the remaining frame data is read out of the FIFO in a continuous burst. Data is read out of the FIFO on the rd_clk.

If the generic FULL_DUPLEX_ONLY is set to false, the FIFO will requeue and retransmit frames as requested by the MAC. Once a frame has been transmitted by the FIFO it is stored until the possible retransmit window for that frame has expired.

The FIFO has been designed to operate with different clocks on the write and read sides. The write clock (locallink clock) can be an equal or faster frequency than the read clock (client clock). The minimum write clock frequency is the read clock frequency divided by 4.

The FIFO memory size can be increased by expanding the rd_addr and wr_addr signal widths, to address further BRAMs.

Requirements:
   * minimum frame size is 14 bytes
   * tx ack is never asserted at intervals closer than 64 clocks
   * write clock is always greater than a quarter of the read clock frequency
   * tx retransmit is never closer than 16 clocks together

(0x21/31/41/51, 0x0108/188/208/288) DataGeneratorx

The Data Generator creates a dummy data stream which by default is the input data stream (see Global Control Register). The number written to this register sets the size of the data. The generator will free run once started but will be flow controlled by the destination. 0 is written to the generator to stop it.

Bits Function
15 downto 0 Event Fragment Size
31 downto 16 Run Number
55 downto 32 Fragments per event

Event fragment size sets the size of each packet of data.

Run Number is inserted into the run number field of the data packet.

Fragments per event sets the number of data packets in each spill. i.e. the number of packets before the spill number increments.
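
The three fields could be packed into the register value as sketched here. The function name is hypothetical, and the units of the fragment size are not stated in this guide, so the value is passed through unchanged.

```c
#include <stdint.h>

/* Packs the DataGeneratorx fields:
 *   bits 15..0   event fragment size
 *   bits 31..16  run number
 *   bits 55..32  fragments per event (24 bits; higher bits are masked off)
 * Writing 0 to the register stops the generator. */
static inline uint64_t data_generator_value(uint16_t fragment_size,
                                            uint16_t run_number,
                                            uint32_t fragments_per_event)
{
    return (uint64_t)fragment_size
         | ((uint64_t)run_number << 16)
         | ((uint64_t)(fragments_per_event & 0xFFFFFFu) << 32);
}
```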

Data Generator Event packet format
Word Content Description
0 BOF Beginning Marker
1 ee1234ee hdr marker
2 00000009 hdr size
3 03000000 format code
4 abcdacbd source id
5 run number run number
6 Event ID Event ID
7 Spill number Spill number
8 b0b0b0b0 Trigger type
9 0d0d0d0d Event type
last - 3 00000001
last - 2 Data Length
last - 1 00000001
last EOF End marker

(0x22/32/42/52, 0x0110/190/210/290) DataGeneratorxStatus

(0x23/33/43/53, 0x0118/198/218/298) BPMxHeaderReg

The Buffer Pointer Manager Header Register is an 8-byte-wide register which can be accessed as bytes, 16-bit words, 32-bit words or a single 64-bit word. The value of each byte causes the word at that position to be extracted from the data stream and put into the BPM FIFO. By default all the bytes are set to 0x06. This will cause the 7th word of the stream (position 6) to be copied into the BPM FIFO. Duplicates have no additional effect.
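
For a single 64-bit write, the eight selector bytes might be assembled as below. This is only a sketch: the function name is hypothetical, and placing selector 0 in the least significant byte is an assumption (the order does not matter for the extraction itself, since each byte independently selects a position).

```c
#include <stdint.h>

/* Packs eight stream positions into one 64-bit BPMxHeaderReg value.
 * positions[0] goes into the least significant byte (assumed ordering). */
static inline uint64_t bpm_header_value(const uint8_t positions[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)positions[i] << (8 * i);
    return v;
}
```

Passing eight copies of 0x06 reproduces the default value described above.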

(0x24/34/44/54, 0x0120/1A0/220/2A0) BPMxReadPointerReg

The Buffer Pointer Manager Read Pointer Register allows the Read Pointer for the Buffer Manager to be changed. The Buffer Manager will only write data up to (Read Pointer) - 1. As the data is read from the buffer under off-card control, the read pointer can be incremented.

(0x25/35/45/55, 0x0128/1A8/228/2A8) BPMxFIFO_NU

This register is the same as the BPMxFIFO except it doesn't cause the data to be removed from the FIFO

(0x26/36/46/56, 0x0130/1B0/230/2B0) BPMxStatus

Bits Function
63 MemoryFull
62 WriteDataReadEnable
61 AnyTagMatch
60:Memory word bits+32 0
memory word bits +31 : 32 write pointer
31:10 0
9:0 BPMFIFOFill(9:0)

(0x27/37/47/57, 0x0138/1B8/238/2B8) BPMxFIFO

Buffer Pointer Manager FIFO. Also referred to as the Used FIFO and the Extracts FIFO. The input data stream is tagged with Start, End and Error markers. When these markers are encountered, the data word is extracted into the BPM FIFO. Additional words can be extracted based on their position in the stream relative to the start tag. The additional word to extract can be set in the BufferPointerManagerReg.
Bits Function
31:0 ReceiveData
MemoryWordBits + 31:32 WritePointer(MemoryWordBits -1:0)
57:MemoryWordBits + 32 0
58 ReceiveStartOfPacket
59 ReceiveErrorPacket
60 ReceiveEndOfPacket
61 HeaderTrigger
62 0
63 FIFO Empty
Care must be taken when reading this FIFO. The first read must check the FIFO empty bit. Subsequent reads must only be done if the FIFO is not empty. The FIFO is updated when the lowest byte is read. If all bytes are read every time then it is possible for the FIFO status to change during the read and can result in loss of information.
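
Once a whole 64-bit word has been obtained in one access, it could be unpacked as sketched below. The type and function names are hypothetical, and `memory_word_bits` is the MemoryWordBits value from the table (assumed here to be at most 26, since the write pointer must fit in bits 57:32).

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool     empty;            /* bit 63: FIFO was empty, other fields invalid */
    bool     header_trigger;   /* bit 61 */
    bool     end_of_packet;    /* bit 60 */
    bool     error_packet;     /* bit 59 */
    bool     start_of_packet;  /* bit 58 */
    uint32_t write_pointer;    /* bits MemoryWordBits+31 .. 32 */
    uint32_t receive_data;     /* bits 31 .. 0 */
} bpm_fifo_word;

/* Decodes one 64-bit word read from BPMxFIFO (or BPMxFIFO_NU). */
static inline bpm_fifo_word bpm_fifo_decode(uint64_t raw,
                                            unsigned memory_word_bits)
{
    bpm_fifo_word w;
    w.empty           = (raw >> 63) & 1u;
    w.header_trigger  = (raw >> 61) & 1u;
    w.end_of_packet   = (raw >> 60) & 1u;
    w.error_packet    = (raw >> 59) & 1u;
    w.start_of_packet = (raw >> 58) & 1u;
    w.write_pointer   = (uint32_t)((raw >> 32)
                                   & ((1ull << memory_word_bits) - 1));
    w.receive_data    = (uint32_t)raw;
    return w;
}
```

The caller should check `empty` before using any other field, in line with the caution above.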

(0x28/38/48/58, 0x0140/1C0/240/2C0) MemoryControllerxReg

Currently not implemented

(0x29/39/49/59, 0x0148/1C8/248/2C8) MemoryControllerxStatus

Currently not implemented

(0x2A/3A/4A/5A, 0x0150/1D0/250/2D0) EthxStatus

Dual Ethernet Interface Status. This shows the overall status of the 2 MAC+LocalLink FIFO modules that make up the Ethernet Interface. The register is split symmetrically into the lower 32 bits for MAC0 and the upper 32 for MAC1. Details of the individual MAC frame errors etc. are in 2 additional registers, 1 per MAC (see EthernetInterfaceMacxStatus).

The various Count values are generated from the rising edges of their corresponding signals. NOTE: New and changed values are text in italics. (1/8/08).

Bits Function
63:32 RX Byte Count
31:28 TX Overflow Count
27:24 TX FIFO Used - 256 byte units
23:20 SFP LOS Count
19 TX Overflow
18 Unused = 0
17 SFP Present
16 TX SFP Fault
15:12 RX Overflow Count
11:8 RX FIFO Used - 256 byte units
7:4 RX Framedrop Count
3 RX Overflow
2 RX Frame Drop
1 RX SFP LOS
0 Transceiver Sync Acquired
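
The bit table above might be decoded on the host as follows. This is a sketch with hypothetical names; the FIFO fill fields are converted to bytes using the 256-byte unit given in the table.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t rx_byte_count;        /* bits 63:32 */
    unsigned tx_overflow_count;    /* bits 31:28 */
    unsigned tx_fifo_used_bytes;   /* bits 27:24, in 256 byte units */
    unsigned sfp_los_count;        /* bits 23:20 */
    bool     tx_overflow;          /* bit 19 */
    bool     sfp_present;          /* bit 17 */
    bool     tx_sfp_fault;         /* bit 16 */
    unsigned rx_overflow_count;    /* bits 15:12 */
    unsigned rx_fifo_used_bytes;   /* bits 11:8, in 256 byte units */
    unsigned rx_framedrop_count;   /* bits 7:4 */
    bool     rx_overflow;          /* bit 3 */
    bool     rx_frame_drop;        /* bit 2 */
    bool     rx_sfp_los;           /* bit 1 */
    bool     sync_acquired;        /* bit 0 */
} eth_status;

static inline eth_status eth_status_decode(uint64_t raw)
{
    eth_status s;
    s.rx_byte_count      = (uint32_t)(raw >> 32);
    s.tx_overflow_count  = (unsigned)((raw >> 28) & 0xFu);
    s.tx_fifo_used_bytes = (unsigned)((raw >> 24) & 0xFu) * 256u;
    s.sfp_los_count      = (unsigned)((raw >> 20) & 0xFu);
    s.tx_overflow        = (raw >> 19) & 1u;
    s.sfp_present        = (raw >> 17) & 1u;
    s.tx_sfp_fault       = (raw >> 16) & 1u;
    s.rx_overflow_count  = (unsigned)((raw >> 12) & 0xFu);
    s.rx_fifo_used_bytes = (unsigned)((raw >> 8) & 0xFu) * 256u;
    s.rx_framedrop_count = (unsigned)((raw >> 4) & 0xFu);
    s.rx_overflow        = (raw >> 3) & 1u;
    s.rx_frame_drop      = (raw >> 2) & 1u;
    s.rx_sfp_los         = (raw >> 1) & 1u;
    s.sync_acquired      = raw & 1u;
    return s;
}
```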

(0x2B/3B/4B/5B, 0x0158/1D8/258/2D8) ChannelxControlReg

Channel reset from the global register will reset all bits in the register and will cause all resets in this register to be asserted.
Bits Function comment
0 MemoryControllerReset
1 BPMAccessCounterReset
2 BufferPointerManagerReset
3 DataGeneratorReset
4 MemorySizeRegReset
63 DataStreamSelect 1 = EthernetInterface, 0 = DataGenerator (Default)

(0x2C/3C/4C/5C, 0x0160/1E0/260/2E0) BufferPointerManagerxReg

This register allows the Buffer Pointer Manager Read Pointer to be changed. The Buffer Pointer Manager deals with 32 bit words. After data has been copied from the Buffer memory the read pointer should be incremented to allow this memory to be reused.

(0x2D/3D/4D/5D, 0x0168/1E8/268/2E8) ChannelxStatus

Bits Name
1 Data Valid from source Mux
0 Read Enable from BPM

(0x2E/3E/4E/5E, 0x0170/1F0/270/2F0) EthMACxStatus

Mapped directly onto Xilinx EMAC status registers. 27:0 RX, 63:32 TX. These are detailed in Xilinx EMAC User Guide UG074 (http://www.xilinx.com/bvdocs/userguides/ug074.pdf), pages 62-66. Briefly:
Bits Name
TX STATISTICS VECTOR
63 PAUSE_FRAME_TRANSMITTED
62 Reserved
61 Reserved
60:57 TX_ATTEMPTS[3:0]
56 Reserved
55 EXCESSIVE_COLLISION
54 LATE_COLLISION
53 EXCESSIVE_DEFERRAL
52 TX_DEFERRED
51 VLAN_FRAME
50:37 FRAME_LENGTH_COUNT
36 CONTROL_FRAME
35 UNDERRUN_FRAME
34 MULTICAST_FRAME
33 BROADCAST_FRAME
32 SUCCESSFUL_FRAME
RX STATISTICS VECTOR
26 ALIGNMENT_ERROR
25 Length/Type Out of Range
24 BAD_OPCODE
23 FLOW_CONTROL_FRAME
22 Reserved
21 VLAN_FRAME
20 OUT_OF_BOUNDS
19 CONTROL_FRAME
18:5 FRAME_LENGTH_COUNT
4 MULTICAST_FRAME
3 BROADCAST_FRAME
2 FCS_ERROR
1 BAD_FRAME*
0 GOOD_FRAME*
*If the length/type field error checks are disabled, then a frame containing this type of error is marked as a GOOD_FRAME, providing no additional errors were detected.
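
A few of the more commonly needed fields could be pulled out of the register as sketched here. The function names are hypothetical; the bit positions are those listed in the table above.

```c
#include <stdint.h>
#include <stdbool.h>

/* RX statistics vector (lower half of EthMACxStatus). */
static inline bool mac_rx_good_frame(uint64_t raw) { return raw & 1u; }
static inline bool mac_rx_bad_frame(uint64_t raw)  { return (raw >> 1) & 1u; }
static inline bool mac_rx_fcs_error(uint64_t raw)  { return (raw >> 2) & 1u; }

/* FRAME_LENGTH_COUNT, bits 18:5 (14 bits). */
static inline unsigned mac_rx_frame_length(uint64_t raw)
{
    return (unsigned)((raw >> 5) & 0x3FFFu);
}

/* TX statistics vector (upper half of EthMACxStatus). */
static inline bool mac_tx_successful(uint64_t raw) { return (raw >> 32) & 1u; }

/* FRAME_LENGTH_COUNT, bits 50:37 (14 bits). */
static inline unsigned mac_tx_frame_length(uint64_t raw)
{
    return (unsigned)((raw >> 37) & 0x3FFFu);
}
```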

(0x2F/3F/4F/5F, 0x0178/1F8/278/2F8) TwoToThirtyTwox

Currently not implemented

(unallocated) BufferMemoryx

The Buffer Memory space is twice the size of the actual memory size. The memory appears twice in this area. This should simplify wrap-around. If a DMA has to read from near the end of the memory and continue at the beginning then it can be allowed to read into the beginning of the mirror. Programming: The start address should be masked such that it is in the first region but the end address is the start plus the size and can be allowed to run into the mirror region.
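
The programming rule above (mask the start, let the end run into the mirror) can be sketched as follows. The names are hypothetical, and the actual memory size is assumed to be a power of two so that a simple mask works.

```c
#include <stdint.h>

typedef struct {
    uint64_t start;  /* always inside the first (real) region */
    uint64_t end;    /* start + length, may lie in the mirror region */
} dma_range;

/* memory_size is the real memory size in the same units as the offsets
 * and is assumed to be a power of two. length must not exceed it. */
static inline dma_range buffer_dma_range(uint64_t read_offset,
                                         uint64_t length,
                                         uint64_t memory_size)
{
    dma_range r;
    r.start = read_offset & (memory_size - 1);  /* fold into first region */
    r.end   = r.start + length;                 /* no wrap-around needed */
    return r;
}
```

With a memory size of 0x1000, a read of 0x200 units starting at 0xF00 yields the range 0xF00 to 0x1100, the last 0x100 of which is served from the mirror.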

(unallocated) MemoryxSizeReg

Currently this register can only be read to find the number of memory address bits. e.g. it would read 15 for a 32k word memory. This register is intended to allow a memory size to be set which is smaller than the physical memory size. This would enable the effect on performance of different memory sizes to be measured. This feature is currently not implemented.

Polling Read Avoidance

Three DMAs are used to avoid polling reads of the ODR across the PCI bus. These are the Data DMA, the DMA done DMA and the Write Pointer Duplicator. The registers associated with these functions are the free page list, the DMA done FIFO page, the write pointer duplicate location and the write pointer duplicate update timer.

Data DMA

The data DMA is the DMA which transfers the main data stream to the host memory. The destination location is taken from the free page list. The free page list is a fifo which is written to by the host driver and contains the physical/PCI address of pages of memory available on the host. When a transfer has completed information about the transfer is written to the DMADone fifo.

DMA done DMA

The DMA done DMA's function is to transfer the list of completed Data DMAs to a single page of memory allocated on the host. This page functions as a circular buffer (FIFO) and is read by the driver. The read and write indexes for this FIFO are duplicated to avoid polling reads across the PCI bus. The primary read pointer is stored as a host variable. When it is changed, it must also be changed in the Read Pointer duplicate on the ODR card.

Write Pointer Duplicator

The Write Pointer Duplicator is a DMA which has the sole function of duplicating the write index of the DMA done circular buffer to a location in the host memory. It compares the last duplicated write pointer with the current one and updates it if they are different. A timer sets a minimum time between updates to avoid inundating the bus with these small, inefficient transfers.

How to use the DMA offload

The purpose of the DMA offload is to duplicate data from the ODR into the host memory and avoid reads (polling) across the PCIe bus. There are 3 stages to the DMA Offload.

1. Primary DMA. This copies the main block of data to preassigned pages in the host memory. When a block of data has been transferred its location and ID is written to the DMA Done FIFO.

2. FIFO Duplicator. To avoid the host application reading the DMA done FIFO, it is duplicated in the host memory. As the ODR writes to the FIFO, the write pointer is updated.

3. Write Pointer Duplicator. To avoid the host application having to read the write pointer it is duplicated at a preassigned location in the host memory.
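
On the host side, the three stages above reduce to reading a circular buffer in local memory. The sketch below is hedged on several assumptions not confirmed by this guide: the buffer holds 512 64-bit entries (one 4k byte page), the duplicated write index counts entries and points at the next slot to be written, and all names are hypothetical.

```c
#include <stdint.h>

#define FD_FIFO_ENTRIES 512u   /* 4096 byte page / 8 bytes per entry */

typedef struct {
    const volatile uint64_t *ring;         /* host copy of the DMA done FIFO */
    const volatile uint64_t *write_index;  /* host copy of the write index */
    uint32_t read_index;                   /* primary read pointer (host) */
} fd_consumer;

/* Copies up to max new entries into out and advances the read index.
 * Returns the number of entries copied; no PCIe read is involved. */
static inline unsigned fd_consume(fd_consumer *c, uint64_t *out, unsigned max)
{
    uint32_t wr = (uint32_t)(*c->write_index % FD_FIFO_ENTRIES);
    unsigned n = 0;
    while (c->read_index != wr && n < max) {
        out[n++] = c->ring[c->read_index];
        c->read_index = (c->read_index + 1u) % FD_FIFO_ENTRIES;
    }
    return n;
}
```

After a call that returns a non-zero count, the new `read_index` would also need to be written to the read pointer duplicate on the ODR, which this sketch leaves to the caller.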

Ethernet MAC Address Map
Each MAC is accessed via its own address space (details TBA). Each MAC has 11 bits of address, with the MSb selecting the future MDIO mode.
Address bits Register
addr(10) MDIO mode select (not yet implemented)
addr(9:0) MAC/MDIO register address

Registers are further divided into Configuration and Address Filter groups. All registers use only the lower 32 bits of a data access, except that some filter registers return 64-bit data.

Initialisation Sequence

1. Primary DMA. A number of 4k byte pages have to be allocated in the host memory. The virtual address used by the program has to be translated to a physical address for use by the hardware. The physical address is then shifted right by 12 to remove the trailing 0s. The pages must be on 4k byte boundaries. Each 4k page could be subdivided into smaller pages, but minor changes to the firmware would have to be made to accommodate this.

2. FIFO Duplicator. The DMADone FIFO is duplicated (or copied) into a single 4k byte page in the host memory. The host application needs to allocate a page in memory, convert its address to a physical address and shift it right by 12. This is then written to the FDFIFOLocation register.

3. Write Pointer Duplicator. The host application must create an 8 byte global volatile variable. The pointer to this is the virtual address. This virtual address needs to be converted to a physical address. The physical address has to be written to the register.
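
The three steps could be combined roughly as shown below. This is a sketch under stated assumptions: physical addresses have already been obtained by some OS-specific means, Base Address 0 is mapped as an array of 64-bit registers, the register indexes come from the address map, and FDWriteIndexLocation is assumed to be the register meant in step 3. The shift of 12 follows this section; the FreePageQueueFIFO description gives a shift of 8, so the value should be checked against the firmware in use. All names are hypothetical.

```c
#include <stdint.h>

enum {
    ODR_FREE_PAGE_QUEUE_FIFO    = 0x16,
    ODR_FD_WRITE_INDEX_LOCATION = 0x18,  /* assumed target of step 3 */
    ODR_FD_FIFO_LOCATION        = 0x1C
};

#define ODR_PAGE_SHIFT 12   /* 4k byte pages, per the steps above */

/* Returns 0 on success, -1 if any page is not on a 4k byte boundary. */
static inline int odr_init_offload(volatile uint64_t *bar0,
                                   const uint64_t *data_pages,
                                   unsigned n_pages,
                                   uint64_t fd_fifo_page,
                                   uint64_t write_index_addr)
{
    const uint64_t page_mask = (1ull << ODR_PAGE_SHIFT) - 1;
    unsigned i;

    if (fd_fifo_page & page_mask)
        return -1;
    for (i = 0; i < n_pages; i++)
        if (data_pages[i] & page_mask)
            return -1;

    /* 1. Primary DMA: hand the free pages to the ODR. */
    for (i = 0; i < n_pages; i++)
        bar0[ODR_FREE_PAGE_QUEUE_FIFO] = data_pages[i] >> ODR_PAGE_SHIFT;

    /* 2. FIFO Duplicator: page that mirrors the DMADone FIFO. */
    bar0[ODR_FD_FIFO_LOCATION] = fd_fifo_page >> ODR_PAGE_SHIFT;

    /* 3. Write Pointer Duplicator: address of the 8 byte variable. */
    bar0[ODR_FD_WRITE_INDEX_LOCATION] = write_index_addr;
    return 0;
}
```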

C Structures

FIFO Duplicate structure

#include <stdint.h>

/* Each FIFO element is one 64-bit word, so the fields are declared as
   uint64_t bit-fields. Bit-fields wider than int and their bit order are
   implementation-defined in C; GCC and Clang accept this form. */

typedef struct {uint64_t WordNo       : 3;
                uint64_t dummy1       : 4;
                uint64_t StartAddress : 26;
                uint64_t ChannelNo    : 5;
                uint64_t dummy2       : 12;
                uint64_t Length       : 13;
                uint64_t dummy3       : 1;
               } Word1Struct;

typedef struct {uint64_t WordNo       : 3;
                uint64_t dummy1       : 2;
                uint64_t EndAddress   : 26;
                uint64_t DataExtract  : 32;
                uint64_t Dummy2       : 1;
               } Word2Struct;

typedef struct {uint64_t WordNo     : 3;
                uint64_t Dummy1     : 8;
                uint64_t PageNo     : 52;
                uint64_t Dummy2     : 1;
               } Word3Struct;

typedef struct {uint64_t WordNo            : 3;
                uint64_t Dummy1            : 18;
                uint64_t FreePageFIFOFill  : 11;
                uint64_t Dummy2            : 1;
                uint64_t DMDDoneFIFOFill   : 10;
                uint64_t DMATimeCount      : 20;
                uint64_t Dummy3            : 1;
               } Word4Struct;

/* One FIFO element viewed as raw 16-bit words or as any of the four layouts. */
typedef union u_tag { uint16_t    ShortWords[4];
                      Word1Struct Word1;
                      Word2Struct Word2;
                      Word3Struct Word3;
                      Word4Struct Word4;
                    } FDFIFOElement;

FDFIFOElement FDFIFO[512];
                   

-- BarryGreen - 24 Aug 2007 -- BarryGreen - 16 Jan 2008

-- BarryGreen - 09-Nov-2009

Topic revision: r1 - 2009-11-09 - BarryGreen
 