Changes between Version 4 and Version 5 of DPDKNotes


Timestamp: 09/05/14 17:38:31 (6 years ago)
Author: rjs51
Comment: Moved to github

= Notes on Libtrace Intel Data Plane Development Kit (DPDK) Support =

'''This format is considered experimental and has limitations that should be understood before using it.'''

The Intel Data Plane Development Kit format allows packets to be captured in a truly zero-copy manner and provides direct access to every packet with almost zero overhead. This leaves more CPU time for your application to process the packets. Libtrace's Intel DPDK capture format works in a very similar way to the DAG capture format.
The format supports most Intel NICs; see the DPDK release notes PDF for the supported devices.

Documentation and source code for the Intel DPDK can be downloaded from [http://www.intel.com/go/DPDK]; the links are in a box at the bottom of that page.

= System Requirements =
* gettimeofday() and/or clock_gettime() must be implemented as virtual system calls (vsyscall/vDSO) by your Linux kernel. The timestamp call is made for every packet received, so the advantage of using DPDK is lost if a real system call still has to be made.
* DPDK is a polling format, so a multicore system is highly recommended; other processes can then run on the remaining cores.
* For better performance, the CPU core that DPDK is bound to should run only DPDK; for example, interrupts could be disabled on that core.
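One rough way to check the first requirement is to time the calls from user space. The following is a minimal sketch, not part of libtrace: the helper names avg_clock_gettime_ns() and avg_gettimeofday_ns() are hypothetical, and the interpretation in the note after it is only a rule of thumb.

{{{#!c
/* Feature-test macros so that clock_gettime() and gettimeofday() are declared. */
#define _POSIX_C_SOURCE 200809L
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Current monotonic time in nanoseconds, used only to time the loops below. */
static double monotonic_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Hypothetical helper: average cost of one clock_gettime(CLOCK_REALTIME) call, in ns. */
double avg_clock_gettime_ns(int iterations)
{
    struct timespec ts;
    assert(iterations > 0);
    double start = monotonic_ns();
    for (int i = 0; i < iterations; i++)
        clock_gettime(CLOCK_REALTIME, &ts);
    return (monotonic_ns() - start) / iterations;
}

/* Hypothetical helper: average cost of one gettimeofday() call, in ns. */
double avg_gettimeofday_ns(int iterations)
{
    struct timeval tv;
    assert(iterations > 0);
    double start = monotonic_ns();
    for (int i = 0; i < iterations; i++)
        gettimeofday(&tv, NULL);
    return (monotonic_ns() - start) / iterations;
}
}}}

Calling each helper with, say, 1000000 iterations gives an estimate of the per-call cost. As a rule of thumb only, tens of nanoseconds per call suggests the call is served without entering the kernel, while several hundred nanoseconds or more suggests a real system call; exact figures depend on the CPU and kernel. On glibc versions before 2.17 the program must be linked with -lrt for clock_gettime().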

= Libtrace application requirements =
* DPDK v1.5 or newer is required.
* The same thread must be used to create and start the trace, to read/write packets, and for all other calls to libtrace format-dependent functions.
* Only minimal processing should be done on the thread interacting with libtrace and the DPDK format, for two main reasons:
  1. Packets will be dropped when the queues fill up (this applies to all formats).
  2. DPDK packets are timestamped when trace_read_packet() is called (using gettimeofday()), so the longer packet processing takes, the less accurate the timestamps are, unless hardware timestamping is being used.
* When using the DPDK format the system should remain on at all times; do not put it to sleep or into hibernation.
* A limitation of the DPDK format is that only one trace can be created at any given time. This means only a single interface can be reading or writing (but not both) using the DPDK format at a given time. This does not prevent other libtrace formats from being used.
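Point 2 can be illustrated with a toy model (an assumption for illustration, not libtrace code): packets arrive at a fixed interval, the application spends a fixed time processing each packet, and the software timestamp is only taken when the packet is read. The hypothetical function read_delay_ns() returns how far packet n's timestamp lags its real arrival. The model ignores the finite queue; in practice, once the queue fills, packets also start being dropped (point 1).

{{{#!c
#include <assert.h>
#include <stdint.h>

/* Toy model: packet i arrives at i * interarrival_ns and can only be read
 * once it has arrived and the previous packet has been processed.
 * Returns (read time - arrival time) for packet n, i.e. its timestamp error. */
uint64_t read_delay_ns(uint64_t n, uint64_t interarrival_ns, uint64_t processing_ns)
{
    uint64_t read_at = 0; /* packet 0 arrives and is read at t = 0 */
    assert(interarrival_ns > 0);
    for (uint64_t i = 1; i <= n; i++) {
        uint64_t arrival = i * interarrival_ns;
        uint64_t ready = read_at + processing_ns; /* previous packet finished */
        read_at = arrival > ready ? arrival : ready;
    }
    return read_at - n * interarrival_ns;
}
}}}

For example, with a packet every 1000 ns and 2000 ns of processing per packet, the 10th packet is timestamped 10000 ns late and the error keeps growing; with 500 ns of processing per packet it stays at zero.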


= Basic Setup Guide for libtrace with Intel DPDK =
'''It is strongly recommended that you build and test Intel DPDK with its included samples, and verify they are functioning correctly, before attempting to build libtrace with DPDK.'''

1. Read the DPDK Getting Started Guide and make sure its prerequisites, such as hugepages, are met.
2. Download DPDK from the Intel website [http://www.intel.com/go/DPDK] or dpdk.org [http://www.dpdk.org/]
3. Extract the archive:

{{{
    $ unzip DPDK-1.6.0-18 -d IntelDPDK
    $ cd IntelDPDK/DPDK-1.6.0
}}}

4. Apply any optional patches (for example, hardware timestamping for a specific card). If you do, make sure the corresponding defines in libtrace are also changed where needed.[[BR]]

5. Build the DPDK library with CONFIG_RTE_BUILD_COMBINE_LIBS=y and EXTRA_CFLAGS="-fPIC" added. This should create the static library x86_64-default-linuxapp-gcc/libs/libintel_dpdk.a required by libtrace. Note that CONFIG_RTE_BUILD_COMBINE_LIBS is not supported prior to DPDK v1.5, so older versions will not create this library.

{{{
    $ make install T=x86_64-default-linuxapp-gcc CONFIG_RTE_BUILD_COMBINE_LIBS=y EXTRA_CFLAGS="-fPIC"
}}}

6. From the DPDK source directory, export RTE_SDK and RTE_TARGET:

{{{
    $ export RTE_SDK=`pwd`
    $ export RTE_TARGET=x86_64-default-linuxapp-gcc
}}}

7. If you have applied patches, set any advanced options they require within libtrace (defines at the top of ./lib/format_dpdk.c).
8. Configure and build libtrace. RTE_SDK and RTE_TARGET must be set in the environment for Intel DPDK to be detected.

{{{
    $ cd ../../libtrace-svn/
    $ ./configure
    $ make
    $ sudo make install
}}}

9. Load the DPDK kernel modules:

{{{
    $ cd $RTE_SDK/$RTE_TARGET/kmod
    $ sudo modprobe uio
    $ sudo insmod ./igb_uio.ko
}}}

10. Use the pci_unbind.py tool (found in the tools/ directory of the DPDK source tree) to bind the port you want to use to the igb_uio driver:

{{{
    $ cd $RTE_SDK/tools
    $ sudo ./pci_unbind.py --status
    Network devices using IGB_UIO driver
    ====================================
    <none>

    Network devices using kernel driver
    ===================================
    0000:01:00.0 '82580 Gigabit Network Connection' if=eth1 drv=igb unused=igb_uio
    0000:01:00.1 '82580 Gigabit Network Connection' if=eth2 drv=igb unused=igb_uio
    0000:03:00.0 'NetXtreme BCM5754 Gigabit Ethernet PCI Express' if=eth0 drv=tg3 unused=<none> *Active*

    Other network devices
    =====================
    <none>
    $ sudo ./pci_unbind.py -b igb_uio 0000:01:00.0
    $ sudo ./pci_unbind.py --status
    Network devices using IGB_UIO driver
    ====================================
    0000:01:00.0 '82580 Gigabit Network Connection' drv=igb_uio unused=

    Network devices using kernel driver
    ===================================
    0000:01:00.1 '82580 Gigabit Network Connection' if=eth2 drv=igb unused=igb_uio
    0000:03:00.0 'NetXtreme BCM5754 Gigabit Ethernet PCI Express' if=eth0 drv=tg3 unused=<none> *Active*

    Other network devices
    =====================
    <none>

}}}

11. Test with a libtrace tool, using the PCI address of the bound port as reported by pci_unbind.py:

{{{
    $ tracesummary dpdk:0000:01:00.0
}}}
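Before running step 11 it may be worth confirming that the hugepages from step 1 are actually reserved, which /proc/meminfo reports in its HugePages_Total and HugePages_Free fields. The sketch below uses hypothetical helpers that are not part of DPDK or libtrace; meminfo_value() parses /proc/meminfo-style text so that the parsing can be tested separately from the file read in free_hugepages().

{{{#!c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns the numeric value of the "key:" line in /proc/meminfo-style text,
 * or -1 if the key is not present. */
long meminfo_value(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;
    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return strtol(line + klen + 1, NULL, 10); /* strtol skips the padding spaces */
        line = strchr(line, '\n');
        if (line != NULL)
            line++; /* move to the start of the next line */
    }
    return -1;
}

/* Reads /proc/meminfo and returns the number of free hugepages, or -1 on error. */
long free_hugepages(void)
{
    char buf[16384];
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_value(buf, "HugePages_Free");
}
}}}

A result of 0 from free_hugepages() suggests the hugepage setup described in the DPDK Getting Started Guide has not been done; -1 means the field or the file could not be read.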


= Advanced Settings (Defines at the top of libtrace/lib/format_dpdk.c) =
''This is based upon testing using Intel DPDK 1.3.1_7 (no longer supported by libtrace) and an Intel 82580 based Ethernet controller. Some of these settings are not supported by all controllers.''

== NB_RX_MBUF - Number of memory buffers, i.e. number of packets in the ring buffer ==

Patch included: libtrace/Intel DPDK Patches/larger_ring.patch

NB_RX_MBUF controls the maximum number of packets the DPDK format can buffer at one time. In general, the larger this value, the lower the packet drop rate (ideally zero).

The PMD (poll mode driver) limits NB_RX_MBUF to 4k per RX ring. For the IGB driver this limit is controlled by a define located in IntelDPDK/lib/librte_pmd_e1000/igb_rxtx.c, line 1063:

{{{
#define IGB_MAX_RING_DESC 4096
}}}

It appears this can be increased without any side effects (other than more memory usage). There is an upper limit of 65535 because DPDK uses a uint16_t to represent this size; exceeding it would require multiple queues, which libtrace does not support. NOTE: 65535 itself cannot be used directly because of the alignment size; however, 65536 - ''alignment'' (such as 65536 - 128) can be used.
If you want to use this setting on your Intel NIC, check its documentation to make sure there is no hardware limit on this value.
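The arithmetic behind the 65536 - ''alignment'' figure: if the ring size must be a multiple of the alignment (an assumption based on the note above; check the driver for its actual rule), the largest usable value is the largest multiple of the alignment that still fits in a uint16_t. max_ring_desc() is a hypothetical helper for illustration only.

{{{#!c
#include <assert.h>
#include <stdint.h>

/* Largest ring size that fits in a uint16_t and is a multiple of align. */
uint32_t max_ring_desc(uint32_t align)
{
    assert(align > 0 && align <= UINT16_MAX);
    return (UINT16_MAX / align) * align;
}
}}}

With an alignment of 128 this gives 65408, i.e. 65536 - 128; for any power-of-two alignment the result is exactly 65536 - alignment.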

== Capturing Bad Packets - Those with an Ethernet checksum mismatch ==

A minor change can be made to the PMD in IntelDPDK/lib/librte_pmd_e1000/igb_rxtx.c so that packets with bad Ethernet checksums, which are dropped by default, are kept.
Simply change {{{rctl &= ~E1000_RCTL_SBP;}}} to {{{rctl |= E1000_RCTL_SBP;}}}

NOTE: Bad packets don’t appear to get timestamped, so this will cause problems if used with hardware timestamping: there is no way of knowing whether a packet is bad, and therefore whether a timestamp is sitting in front of it.
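The change only flips one bit of the receive control (RCTL) register value: clearing the Store Bad Packets bit makes the NIC drop frames with a bad CRC, while setting it keeps them and leaves every other bit untouched. The sketch below assumes E1000_RCTL_SBP is 0x00000004, as in the e1000 register definitions; check the definition in your DPDK tree. The function names are hypothetical.

{{{#!c
#include <assert.h>
#include <stdint.h>

/* Store Bad Packets bit in RCTL (assumed value; see e1000_defines.h). */
#define E1000_RCTL_SBP 0x00000004u

/* Stock driver behaviour: clear SBP so bad packets are dropped. */
uint32_t rctl_drop_bad(uint32_t rctl)
{
    return rctl & ~E1000_RCTL_SBP;
}

/* Patched behaviour: set SBP so bad packets are stored. */
uint32_t rctl_store_bad(uint32_t rctl)
{
    return rctl | E1000_RCTL_SBP;
}
}}}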

== HAS_HW_TIMESTAMPS_82580 - Hardware Timestamping Packets (Implemented for Intel 82580 based NICs) ==

To get a hardware timestamp from the Intel DPDK, a change must be made to the PMD.
I’ve made a patch for Intel 82580 based NICs; see libtrace/Intel DPDK Patches/hardware_timestamp.patch.
This patch must first be applied to DPDK; then set the HAS_HW_TIMESTAMPS_82580 define in format_dpdk.c to 1. Once this is done, the libtrace DPDK format can only be used with Intel 82580 controllers.
Packets must be read by calling trace_read_packet() within half of the hardware clock's wrap-around time, which for the Intel 82580 controller is 18/2 = 9 seconds.

To use timestamping, the Intel NIC must support Receive Packet Timestamp in Buffer, which means the NIC places the timestamp in a header before the packet data. Libtrace then needs to interpret this header correctly. Things that need to be considered are:
* Clock resolution - convert the raw value to nanoseconds.
* Synchronizing with the current time - record the system time of the first packet received and use it as the offset for all packets after it.
* Timer wrap-around - compare the system time with that of the last packet received, estimate how many times the timer has (possibly) wrapped around, then pick the value that makes sense.
* Pausing the device - timestamps need to be restarted, because the clock is reset when the device is started again.
The current implementation gets a system timestamp (hopefully via vsyscall) every time a packet is received. On a system without vsyscalls this could be done differently: start a background thread that increments a counter (i.e. does what estimated_wraps does) every 18 seconds, when the clock is expected to wrap around. At that point the thread should also read the system time to stay correctly in sync, and base the length of its next sleep on the difference.
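The synchronization and wrap-around steps listed above can be sketched as follows. This is a hypothetical illustration, not the code in format_dpdk.c: it assumes the raw NIC timestamp has already been converted to nanoseconds (the clock resolution step) and that the wrap period wrap_ns is known (about 18 seconds for the 82580, per the text above). The names hwts_state, hwts_reset() and hwts_packet_time() are made up for this example.

{{{#!c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unwrapping state for a NIC clock that wraps every wrap_ns. */
typedef struct {
    int64_t wrap_ns;     /* wrap-around period of the NIC clock, in ns (assumed known) */
    int     started;     /* 0 until the first packet after a (re)start */
    int64_t sys0_ns;     /* system time when the first packet was read */
    int64_t hw0_ns;      /* unwrapped NIC time of the first packet */
    int64_t last_hw_ns;  /* unwrapped NIC time of the previous packet */
    int64_t last_sys_ns; /* system time when the previous packet was read */
} hwts_state;

/* Call before capturing and again after every pause, since the NIC clock is reset. */
void hwts_reset(hwts_state *s, int64_t wrap_ns)
{
    s->wrap_ns = wrap_ns;
    s->started = 0;
}

/* Floor division for a positive divisor b. */
static int64_t floor_div(int64_t a, int64_t b)
{
    int64_t q = a / b;
    return (a % b != 0 && a < 0) ? q - 1 : q;
}

/* raw_ns: NIC timestamp in ns, in [0, wrap_ns); sys_ns: system time at read.
 * Returns the packet time in ns on the system clock. */
int64_t hwts_packet_time(hwts_state *s, int64_t raw_ns, int64_t sys_ns)
{
    assert(raw_ns >= 0 && raw_ns < s->wrap_ns);
    if (!s->started) {
        /* Synchronize: the first packet is anchored to the system clock. */
        s->started = 1;
        s->sys0_ns = s->last_sys_ns = sys_ns;
        s->hw0_ns = s->last_hw_ns = raw_ns;
        return sys_ns;
    }
    /* Where the NIC clock would be now if it had never wrapped. */
    int64_t expected = s->last_hw_ns + (sys_ns - s->last_sys_ns);
    /* Choose the wrap count that puts raw_ns nearest to that estimate. */
    int64_t k = floor_div(expected - raw_ns + s->wrap_ns / 2, s->wrap_ns);
    int64_t hw = k * s->wrap_ns + raw_ns;
    s->last_hw_ns = hw;
    s->last_sys_ns = sys_ns;
    return s->sys0_ns + (hw - s->hw0_ns);
}
}}}

Because the wrap count is chosen by rounding to the nearest period, the result is only unambiguous if each packet is read less than half a wrap period after it arrived, which is where the 18/2 second requirement comes from.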

== GET_MAC_CRC_CHECKSUM ==

This option is turned on by setting the define GET_MAC_CRC_CHECKSUM to 1, which captures the full packet including the Ethernet checksum (CRC). It is safe to turn on; however, note that when writing to native interfaces such as int: and ring:, it is assumed that the packet contains no checksum.
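With GET_MAC_CRC_CHECKSUM enabled, the last 4 bytes of each captured frame are the Ethernet FCS, a CRC-32 over the rest of the frame. The sketch below uses hypothetical helpers, not libtrace functions, to check that trailer and to compute the length to use when the FCS has to be stripped, for example before writing to int: or ring:. It assumes the FCS bytes appear least-significant byte first in the captured buffer, which is the usual layout for Ethernet.

{{{#!c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_FCS_LEN 4 /* Ethernet CRC (FCS) length in bytes */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), the CRC used for the Ethernet FCS. */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Returns 1 if the trailing 4 bytes match the CRC of the preceding bytes. */
int fcs_matches(const uint8_t *frame, size_t len_with_fcs)
{
    if (len_with_fcs < ETH_FCS_LEN)
        return 0;
    size_t n = len_with_fcs - ETH_FCS_LEN;
    uint32_t stored = (uint32_t)frame[n] | ((uint32_t)frame[n + 1] << 8) |
                      ((uint32_t)frame[n + 2] << 16) | ((uint32_t)frame[n + 3] << 24);
    return crc32_ieee(frame, n) == stored;
}

/* Length to use for an output that expects frames without an FCS. */
size_t len_without_fcs(size_t len_with_fcs)
{
    return len_with_fcs >= ETH_FCS_LEN ? len_with_fcs - ETH_FCS_LEN : 0;
}
}}}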

== USE_CLOCK_GETTIME ==

Use clock_gettime() instead of gettimeofday() (nanosecond vs microsecond resolution). This should only be considered if clock_gettime() is a virtual system call on your system. Remember that this timestamp is added by libtrace when trace_read_packet() is called, so it is unlikely to be close enough to the packet's arrival to be accurate to the nanosecond anyway. If you require nanosecond-accurate timestamps, hardware timestamping is the only way to truly achieve this.

NOTE: This setting has no effect if hardware timestamping is already being used.
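The resolution difference shows up when both results are converted to nanoseconds: a struct timeval from gettimeofday() only carries microseconds, so its value is always a multiple of 1000 ns, whereas a struct timespec from clock_gettime() carries nanoseconds. The conversion helpers below are hypothetical and only illustrate the units.

{{{#!c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* gettimeofday() result: seconds + microseconds. */
uint64_t timeval_to_ns(struct timeval tv)
{
    return (uint64_t)tv.tv_sec * 1000000000u + (uint64_t)tv.tv_usec * 1000u;
}

/* clock_gettime() result: seconds + nanoseconds. */
uint64_t timespec_to_ns(struct timespec ts)
{
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
}}}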

== Capturing Jumbo Frames ==

Jumbo frames can be captured by setting TRACE_OPTION_SNAPLEN using trace_config(). The size specified excludes the checksum and is limited to around 9k by most Intel NICs.
TRACE_OPTION_SNAPLEN may also be set below the standard maximum Ethernet packet size of 1514 bytes; however, any packet larger than the snaplen is then dropped rather than truncated.
For example, if the snaplen is set to 100, any packet larger than 100 bytes + 4 bytes (Ethernet CRC) will be dropped automatically by the NIC.
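The size rule above fits in a one-line check. nic_accepts_frame() is a hypothetical helper that models the behaviour described in this section; it is not a libtrace or DPDK function. A frame is kept only if its length including the 4-byte CRC is at most snaplen + 4.

{{{#!c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ETH_CRC_LEN 4 /* Ethernet CRC (FCS) length in bytes */

/* True if a frame of wire_len bytes (CRC included) would be kept for this snaplen. */
bool nic_accepts_frame(size_t wire_len, size_t snaplen)
{
    return wire_len <= snaplen + ETH_CRC_LEN;
}
}}}

With a snaplen of 100, a 104-byte frame is kept and a 105-byte frame is dropped; with a snaplen of 1514, a maximum-size 1518-byte standard frame is kept.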
This page is no longer maintained. The libtrace DPDK Notes can now be found at [https://github.com/wanduow/libtrace/wiki/DPDK-Notes---Experimental our GitHub page]