Changes between Version 1 and Version 2 of DPDKNotes


Timestamp: 08/07/13 13:19:52
Author: rjs51
Comment:

Add in documentation for the advance settings of the DPDK format.

= Notes on Libtrace Intel Data Plane Development Kit (DPDK) Support =

'''This format is considered experimental and has limitations that should be understood before using it.'''

The format supports most Intel NICs; see the DPDK release notes PDF for details.

Documentation and source code for the Intel DPDK can be downloaded from [http://www.intel.com/go/DPDK]; the links are in a box at the bottom of the page.

= System Requirements =
* gettimeofday() and/or clock_gettime() must be implemented as virtual system calls by your Linux kernel. They are called for every packet received, so the advantage of using DPDK is lost if a real system call still has to be made each time.
* DPDK is a polling format, so it is highly recommended to use a multicore system, leaving the remaining cores free for other processes.
* For better performance, the CPU core that DPDK is bound to should run only DPDK; interrupts could also be disabled on this core.
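
One rough way to check the first requirement is to time the calls themselves. The sketch below is plain C, not part of libtrace, and the function names are hypothetical; it measures the average cost of gettimeofday() and clock_gettime(CLOCK_REALTIME) over many calls. As a rule of thumb, a vsyscall/vDSO-backed call usually costs in the order of tens of nanoseconds, while a figure in the hundreds of nanoseconds or more suggests a real system call is being made. Treat this as a heuristic, not a definitive test.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds, used only to time the loops. */
static uint64_t mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Average cost, in ns, of one gettimeofday() call over `iters` (> 0) calls. */
static double gettimeofday_cost_ns(long iters)
{
    struct timeval tv;
    uint64_t start = mono_ns();
    for (long i = 0; i < iters; i++)
        gettimeofday(&tv, NULL);
    return (double)(mono_ns() - start) / (double)iters;
}

/* Average cost, in ns, of one clock_gettime(CLOCK_REALTIME) call over `iters` (> 0) calls. */
static double clock_gettime_cost_ns(long iters)
{
    struct timespec ts;
    uint64_t start = mono_ns();
    for (long i = 0; i < iters; i++)
        clock_gettime(CLOCK_REALTIME, &ts);
    return (double)(mono_ns() - start) / (double)iters;
}
```

Calling gettimeofday_cost_ns(10000000) on the capture machine and printing the result gives a quick indication before building DPDK.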

= Libtrace application requirements =
* The same thread must be used to create and start the trace, to read/write packets, and for all other calls to libtrace format-dependent functions.
* Minimal processing should be done on the thread interacting with libtrace and the DPDK format, for two main reasons:
  1. Packets will be dropped when queues fill up (this applies to all formats).
  2. DPDK packets are timestamped (using gettimeofday()) when trace_read_packet() is called, so the longer packet processing takes, the less accurate the timestamps are, unless hardware timestamping is being used.
* When using the DPDK format, the system should remain on at all times; don't put it to sleep or into hibernation.
* A limitation of the DPDK format is that only one trace can be created at any given time. This means only a single interface can be reading or writing (but not both) using the DPDK format at a given time. This does not stop other libtrace formats from being used.


= Basic Setup Guide for libtrace with Intel DPDK =
'''It is strongly recommended that you build and test Intel DPDK with its included samples and verify that they are functioning correctly before attempting to build libtrace with DPDK.'''

1. Read the DPDK Getting Started Guide and make sure the prerequisites, such as hugepages, are met.
2. Download DPDK from the Intel website [http://www.intel.com/go/DPDK].
3. Extract the archive:

{{{
…
}}}

4. Apply the patch named DPDK_libtrace.patch included within libtrace/Intel DPDK Patches, assuming it has been copied into the newly created IntelDPDK folder. This is '''required''' to allow libtrace to create shared libraries; otherwise building libtrace will fail.

{{{
…
}}}

5. Apply any optional patches (for a specific card, hardware timestamping, etc.), making sure corresponding changes are also made to the libtrace defines where needed.[[BR]]

6. Make the DPDK library

{{{
…
}}}

7. Export RTE_SDK and RTE_TARGET

{{{
…
    ~sudo make install
}}}

= Advanced Settings (Defines at the top of libtrace/lib/dpdk.c) =
''This is based upon testing with Intel DPDK 1.3.1_7 and an Intel 82580 based Ethernet controller. Some of these settings are not supported by all controllers.''

== NB_RX_MBUF - Number of memory buffers, i.e. the number of packets in the ring buffer ==

Patch included: libtrace/Intel DPDK Patches/larger_ring.patch

NB_RX_MBUF controls the maximum number of packets the DPDK format can buffer at one time. In general, the larger this is, the lower the packet drop rate (ideally this becomes 0).

The PMD driver limits NB_RX_MBUF to 4k per RX ring. For the IGB driver this limit is controlled by a define in IntelDPDK/lib/librte_pmd_e1000/igb_rxtx.c, line 1063:

{{{
#define IGB_MAX_RING_DESC
}}}

It appears this can be increased without any side effects (other than more memory usage). There is a hard limit of 65535 because DPDK uses a uint16_t to represent this size; exceeding it would require multiple queues, which libtrace does not support. NOTE: 65535 itself cannot be used directly because of the alignment size; however, 65536 - ''alignment'' (such as 128) can be used.
If you want to use this setting on your Intel NIC, check its documentation to make sure there isn't a hardware limit on this value.
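
The size constraints described above can be summarized in a small check. This is a hypothetical helper, not code from libtrace or DPDK: driver_max stands in for whatever IGB_MAX_RING_DESC is set to, and it assumes, as the note about 65535 implies, that the ring size must be a multiple of the alignment (taken to be a power of two).

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest ring size that fits in DPDK's uint16_t size field while staying
 * a multiple of `align`. Assumes `align` is a power of two <= 32768. */
static uint32_t max_ring_size(uint32_t align)
{
    return 65536u - align;
}

/* Hypothetical sanity check for a requested NB_RX_MBUF value.
 * driver_max: the value IGB_MAX_RING_DESC is set to in igb_rxtx.c. */
static bool ring_size_ok(uint32_t n, uint32_t align, uint32_t driver_max)
{
    return n > 0
        && n <= UINT16_MAX      /* DPDK stores the size in a uint16_t */
        && n <= driver_max      /* PMD driver limit */
        && n % align == 0;      /* assumed alignment requirement */
}
```

For example, with the 4k driver limit, ring_size_ok(8192, 128, 4096) is false until IGB_MAX_RING_DESC is raised.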

== Capturing Bad Packets - Those with an Ethernet checksum mismatch ==

A minor change can be made to the PMD driver (IntelDPDK/lib/librte_pmd_e1000/igb_rxtx.c) so that it keeps packets with bad Ethernet checksums, which would otherwise be dropped by default.
Simply change {{{rctl &= ~E1000_RCTL_SBP;}}} to {{{rctl |= E1000_RCTL_SBP;}}}
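
The one-line change sets, rather than clears, the Store Bad Packets bit in the receive control register value. The sketch below isolates that bit manipulation; the value 0x00000004 is the one we believe the e1000 headers use for E1000_RCTL_SBP, but in the real driver you should use the define from your DPDK tree rather than this local copy. The two function names are hypothetical.

```c
#include <stdint.h>

/* Local copy for illustration only; the driver uses the e1000 header define. */
#ifndef E1000_RCTL_SBP
#define E1000_RCTL_SBP 0x00000004u   /* Store Bad Packets */
#endif

/* Default driver behaviour: clear SBP so frames with a bad CRC are dropped. */
static uint32_t rctl_drop_bad_packets(uint32_t rctl)
{
    rctl &= ~E1000_RCTL_SBP;
    return rctl;
}

/* Modified behaviour: set SBP so frames with a bad CRC are kept. */
static uint32_t rctl_keep_bad_packets(uint32_t rctl)
{
    rctl |= E1000_RCTL_SBP;
    return rctl;
}
```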

NOTE: Bad packets don't appear to get timestamped, so this will cause problems when combined with hardware timestamping: there is no way of knowing whether a packet is bad, and therefore whether a timestamp header sits in front of it.

== HAS_HW_TIMESTAMPS_82580 - Hardware Timestamping Packets (Implemented for Intel 82580 based NICs) ==

To get a hardware timestamp from the Intel DPDK, a change must be made to the PMD driver.
I've made a patch for Intel 82580 based NICs; see libtrace/Intel DPDK Patches/hardware_timestamp.patch.
This must first be applied to DPDK, and then the HAS_HW_TIMESTAMPS_82580 define in dpdk.c must be set to 1. Once this is done, the libtrace DPDK format can only be used with Intel 82580 controllers.
Packets must be read by calling trace_read_packet() within half of the hardware clock's wraparound time, which for the Intel 82580 controller is 18/2 = 9 seconds.

In order to use timestamping, the Intel NIC must support Receive Packet Timestamp in Buffer. This means the NIC places the timestamp in a header before the packet data. Libtrace then needs to interpret this header correctly; things that need to be considered are:
* Clock resolution - convert the timestamp to nanoseconds.
* Synchronizing with the current time - record the time of the first packet received and add this offset to all packets after it.
* Timer wraparound - compare the system time with that of the last packet received, estimate how many times the timer has (possibly) wrapped around, then pick the value that makes sense.
* Pausing the device - timestamps need to be restarted after a pause, because the clock is reset when the device is started again.
The current implementation gets a system timestamp (hopefully via a vsyscall) every time a packet is received. On a system that doesn't implement vsyscalls this could be done differently: start a background thread that increments a counter (i.e. does what estimated_wraps does) every 18 seconds, when the clock is expected to wrap around. At each wakeup the thread should also read the system time to stay correctly in sync, and base the next sleep on the difference.
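
The synchronization and wraparound steps listed above can be sketched as follows. This is not libtrace's implementation: struct hw_clock, hw_to_wall_ns() and hw_clock_reset() are hypothetical names, the raw hardware value is assumed to have already been converted to nanoseconds, and the wrap period is passed in rather than hard-coded. The number of wraps is estimated by rounding to the nearest whole period, which only gives the right answer if the packet is read within half a wrap period; this is the same reason trace_read_packet() must be called within 18/2 seconds on the 82580.

```c
#include <stdint.h>

/* Hypothetical per-device state for converting hardware timestamps. */
struct hw_clock {
    int64_t wrap_ns;  /* period after which the hardware counter wraps */
    int64_t base_ns;  /* system time minus hardware counter at the first packet */
    int     synced;   /* 0 until the first packet has been seen */
};

/* Call after the device is paused/restarted: the hardware clock is reset. */
static void hw_clock_reset(struct hw_clock *c)
{
    c->synced = 0;
}

/* raw_ns: hardware timestamp in ns, in [0, wrap_ns).
 * sys_ns: system time in ns, read when the packet is read.
 * Returns the packet timestamp in ns on the system clock's timescale. */
static int64_t hw_to_wall_ns(struct hw_clock *c, int64_t raw_ns, int64_t sys_ns)
{
    if (!c->synced) {
        /* First packet: record the offset between the two clocks. */
        c->base_ns = sys_ns - raw_ns;
        c->synced = 1;
        return sys_ns;
    }
    /* Roughly wraps * wrap_ns, plus the delay between arrival and reading. */
    int64_t delta = sys_ns - c->base_ns - raw_ns;
    int64_t half = c->wrap_ns / 2;
    /* Round to the nearest whole number of wraps. */
    int64_t wraps = delta >= 0 ? (delta + half) / c->wrap_ns
                               : -((-delta + half) / c->wrap_ns);
    return c->base_ns + wraps * c->wrap_ns + raw_ns;
}
```

Because the first packet's timestamp is taken from the system clock, any error in that first reading is carried into every later timestamp until the next reset.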

== GET_MAC_CRC_CHECKSUM ==

This option can be turned on by setting the define GET_MAC_CRC_CHECKSUM to 1, which captures the full packet including the checksum. It is safe to turn on; note, however, that when writing to native interfaces such as int: and ring:, it is assumed that the packet contains no checksum.
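
With GET_MAC_CRC_CHECKSUM enabled, the last four bytes of each captured frame are the Ethernet FCS. The sketch below is plain C, independent of libtrace, with hypothetical function names; it checks that trailer with the IEEE 802.3 CRC-32 (assuming the FCS bytes are stored least significant byte first, as is usual in memory) and returns the length without it, which is the length to use before writing the frame to an interface that expects no checksum.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_FCS_LEN 4

/* Bitwise IEEE 802.3 CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Returns the frame length without the FCS if the trailing FCS matches,
 * or 0 if the frame is too short or the FCS is wrong. */
static size_t eth_strip_fcs(const uint8_t *frame, size_t len)
{
    if (len < ETH_FCS_LEN)
        return 0;
    size_t n = len - ETH_FCS_LEN;
    uint32_t fcs = (uint32_t)frame[n]
                 | ((uint32_t)frame[n + 1] << 8)
                 | ((uint32_t)frame[n + 2] << 16)
                 | ((uint32_t)frame[n + 3] << 24);
    return crc32_ieee(frame, n) == fcs ? n : 0;
}
```

Returning 0 for a mismatch keeps the sketch short; a real caller would want to distinguish a bad FCS from an empty frame.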

== USE_CLOCK_GETTIME ==

Use clock_gettime() instead of gettimeofday() (nanosecond rather than microsecond resolution). This should only be considered if clock_gettime() is a virtual system call on your system. Remember that this timestamp is added by libtrace when trace_read_packet() is called, so it is unlikely to be accurate enough to make nanosecond resolution meaningful anyway. If you require timestamps accurate to the nanosecond, hardware timestamping is the only way to truly achieve this.

NOTE: This setting has no effect if hardware timestamping is already being used.
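
The difference between the two settings comes down to which call supplies the software timestamp and at what resolution. A minimal sketch of such a compile-time switch, reusing the define's name but not reproducing libtrace's actual code (software_timestamp_ns() is a hypothetical name), converting both results to nanoseconds since the Unix epoch:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

#ifndef USE_CLOCK_GETTIME
#define USE_CLOCK_GETTIME 1
#endif

/* Software timestamp in nanoseconds since the Unix epoch. */
static uint64_t software_timestamp_ns(void)
{
#if USE_CLOCK_GETTIME
    /* Nanosecond resolution (accuracy is still limited by when this is called). */
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#else
    /* Microsecond resolution: the last three decimal digits are always zero. */
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000000ULL + (uint64_t)tv.tv_usec * 1000ULL;
#endif
}
```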

== Capturing Jumbo Frames ==

Jumbo frames can be captured by setting TRACE_OPTION_SNAPLEN using trace_config(). The size specified here excludes the checksum and is limited to around 9k by most Intel NICs.
TRACE_OPTION_SNAPLEN may be set lower than the maximum standard Ethernet frame size of 1514 bytes; however, any packet larger than the snaplen will then be dropped.
For example, if the snaplen is set to 100, any packet over 100 bytes + 4 bytes (Ethernet CRC) will be dropped automatically by the NIC.
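
The drop rule in this example can be expressed as a simple predicate. This is a hypothetical helper that only restates the behaviour described above; the NIC applies the rule in hardware. In an application, the snaplen itself would be set before trace_start() with something like trace_config(trace, TRACE_OPTION_SNAPLEN, &snaplen), where snaplen is an int.

```c
#include <stdbool.h>
#include <stddef.h>

#define ETH_CRC_LEN 4  /* Ethernet FCS, not counted in the snaplen */

/* True if a frame of frame_len bytes (including the 4-byte CRC) would be
 * accepted by the NIC for the given snaplen, per the rule described above. */
static bool nic_accepts_frame(size_t frame_len, size_t snaplen)
{
    return frame_len <= snaplen + ETH_CRC_LEN;
}
```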