Hello,

I get a lot of PCIe port errors. They are corrected by the bus protocol (error correction), but they still make me nervous. See the logs below.

I already disassembled the Bolt and tested with the default BIOS/UEFI settings and with no devices connected to the NGFF/M.2 slots, booting only from the network (grml) or from eMMC (focal).

The device referred to is 0000:00:01.6, which according to lspci is:
00:01.6 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]

I have installed Ubuntu Focal on NVMe and eMMC, and when network booting I boot grml.

Is anyone else seeing this too? What might be the cause? Any hints? Might this be a potential support case?

Best regards

Logs:
Aug 02 12:46:59 bolt kernel: pcieport 0000:00:01.6: AER: Multiple Corrected error received: 0000:00:01.0
Aug 02 12:46:59 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Aug 02 12:46:59 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000040/00006000
Aug 02 12:46:59 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP
Aug 02 12:47:02 bolt kernel: pcieport 0000:00:01.6: AER: Corrected error received: 0000:00:01.0
Aug 02 12:47:02 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Aug 02 12:47:02 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000040/00006000
Aug 02 12:47:02 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP
Aug 02 12:47:03 bolt kernel: pcieport 0000:00:01.6: AER: Corrected error received: 0000:00:01.0
Aug 02 12:47:03 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Aug 02 12:47:03 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000040/00006000
Aug 02 12:47:03 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP
Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: Multiple Corrected error received: 0000:00:01.0
Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000040/00006000
Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP
Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: Multiple Corrected error received: 0000:00:01.0
Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000080/00006000
Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: [ 7] BadDLLP
Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: Multiple Corrected error received: 0000:00:01.0
Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=000000c0/00006000
Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP
Aug 02 12:47:04 bolt kernel: pcieport 0000:00:01.6: AER: [ 7] BadDLLP
Aug 02 12:47:06 bolt kernel: pcieport 0000:00:01.6: AER: Corrected error received: 0000:00:01.0
Aug 02 12:47:06 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Aug 02 12:47:06 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000040/00006000
Aug 02 12:47:06 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP
Aug 02 12:47:07 bolt kernel: pcieport 0000:00:01.6: AER: Multiple Corrected error received: 0000:00:01.0
Aug 02 12:47:07 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Aug 02 12:47:07 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000040/00006000
Aug 02 12:47:07 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP
Aug 02 12:47:10 bolt kernel: pcieport 0000:00:01.6: AER: Corrected error received: 0000:00:01.0
Aug 02 12:47:10 bolt kernel: pcieport 0000:00:01.6: AER: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
Aug 02 12:47:10 bolt kernel: pcieport 0000:00:01.6: AER: device [1022:15d3] error status/mask=00000040/00006000
Aug 02 12:47:10 bolt kernel: pcieport 0000:00:01.6: AER: [ 6] BadTLP

Connected devices when fully assembled:
$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Root Complex
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.6 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:01.7 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus A
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Internal PCIe GPP Bridge 0 to Bus B
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 61)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Raven/Raven2 Device 24: Function 7
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
03:00.0 Network controller: Intel Corporation Dual Band Wireless-AC 3168NGW [Stone Peak] (rev 10)
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] (rev 83)
05:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio Controller
05:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
05:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
05:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raven USB 3.1
05:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/FireFlight/Renoir Audio Processor
05:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) HD Audio Controller
05:00.7 Non-VGA unclassified device: Advanced Micro Devices, Inc. [AMD] Raven/Raven2/Renoir Non-Sensor Fusion Hub KMDF driver
06:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 61)
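
For anyone wanting to tally logs like the ones above, here is a minimal Python sketch (not part of the original post) that decodes the "error status/mask" values and counts error types per reporting device. The bit table follows the PCIe AER Correctable Error Status register layout; the short names are the ones the kernel appears to print (bits 6 and 7 match the "[ 6] BadTLP" and "[ 7] BadDLLP" lines in the logs). The function names are made up for illustration.

```python
import re
from collections import Counter

# Bit positions of the PCIe AER Correctable Error Status register.
# Bits 6 and 7 are confirmed by the logs; the other short names are
# assumed to match what the Linux kernel prints.
CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}

# Matches the "device [vendor:device] error status/mask=..." log line.
STATUS_RE = re.compile(
    r"pcieport (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AER: "
    r".*error status/mask=(?P<status>[0-9a-f]{8})/(?P<mask>[0-9a-f]{8})"
)


def decode_bits(value):
    """Return the names of all known correctable-error bits set in value."""
    return [name for bit, name in sorted(CORRECTABLE_BITS.items())
            if value & (1 << bit)]


def summarize(lines):
    """Count decoded error types per reporting device (BDF address)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # not a status/mask line
        status = int(match["status"], 16)
        for name in decode_bits(status):
            counts[(match["bdf"], name)] += 1
    return counts
```

Feeding it the kernel log lines (for example, the saved output of the journal) gives a per-device count of BadTLP and BadDLLP events. Applied to the mask value 00006000 from the logs, decode_bits reports bits 13 and 14, so only those two error types are masked on this port.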
Please see the BUS 6:0 in this thread (about halfway through): https://www.udoo.org/forum/threads/some-prelim-v8-v1605b-findings.27176/
@ccs_hello thanks for pointing me to this link. Which post number do you mean specifically? I do not see a direct relation, as PCIe bus 6 is not mentioned in the logs above. The logs mention bus 0 with its PCIe bridges. According to #12 in the thread you mentioned, those PCIe bridges are most likely the connection points for some of the hardware attached at the M.2 connectors. I get those messages too when no hardware is connected to those M.2 ports. That's why I am confused by your answer.
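
One way to check which device actually sits behind the bridge named in the logs is to look at the bridge's directory in sysfs, where downstream devices show up as subdirectories named by their PCI address. A rough Python sketch, assuming a Linux system with the usual /sys/bus/pci/devices layout (the helper names are hypothetical):

```python
import re
from pathlib import Path

# Full PCI address: domain:bus:device.function, e.g. 0000:03:00.0
BDF_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def filter_child_devices(names):
    """Keep only entries that look like PCI addresses, sorted."""
    return sorted(name for name in names if BDF_RE.match(name))


def devices_below(bridge, sysfs_root="/sys/bus/pci/devices"):
    """List the PCI devices directly below a bridge, e.g. 0000:00:01.6."""
    bridge_dir = Path(sysfs_root) / bridge
    if not bridge_dir.is_dir():
        return []  # bridge not present, or not running on Linux
    return filter_child_devices(
        entry.name for entry in bridge_dir.iterdir() if entry.is_dir()
    )
```

Calling devices_below("0000:00:01.6") on the affected machine should name whatever is attached to that port, which can then be matched against the lspci listing.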
Oh, yes thanks, I wrote that in my initial post. Are you seeing this, too? Or do you have ideas how to fix it?
No, I do not have the PCIe bus error issues. As you can see in my earlier picture, that 6:0 has all three M.2 slots' PCIe buses as well as the Realtek Ethernet controller. BTW, do you see the same issues when using a different OS?
I cannot verify this with Windows; I do not see anything similar in the event log. But I do see those messages too with a network-booted grml (grml.org).
On Windows, it would show up in the Event Viewer. Perhaps you can try an Ubuntu LiveUSB instead (and follow the dmesg output while it runs). My point is that support for the AMD Ryzen CPU/APU SoC may not be current on certain OSes, so this may not be a H/W issue.
I investigated that a bit further: I disconnected all devices in the M.2 slots and tested them separately. The slots hold:

M.2 2280: NVMe Samsung Evo 970 (focal)
M.2 2260: Transcend SATA 512MB (Windows 10)
M.2 2240: the Intel BT+Wifi dongle from the Kickstarter

Current hypothesis: it is the BT+Wifi dongle that triggers these messages. That is why I did not see them when booted from the Live CD (grml): I did not use or test Bluetooth and Wifi there. When I unblock the dongle via rfkill and manually start BT scans or Wifi scans, I get those messages there, too. The same happens if I install Focal to the eMMC device or boot a Focal live installer and enable Wifi.

So it is either the M.2 2240 slot or the BT+Wifi M.2 card. Unfortunately I do not have another dongle here to test, but I will organize one. I would also like to test whether the same dongle causes errors on other PCs with an M.2 slot, but that is more difficult.

I am open to any other hints or ideas.
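
To put numbers on the hypothesis, the per-device AER counters can be compared before and after a scan. The sketch below assumes the kernel exposes an aer_dev_correctable file for the bridge under /sys/bus/pci/devices (newer kernels do, as far as I know, but whether it is present on this machine is an assumption). Counter names vary between kernel versions, so the parser just treats the last token on each line as the count. All function names are hypothetical.

```python
from pathlib import Path


def parse_aer_counters(text):
    """Parse 'name count' lines; the name may contain spaces."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.strip().rpartition(" ")
        if name and value.isdigit():
            counters[name] = int(value)
    return counters


def counter_delta(before, after):
    """Return only the counters that changed, with their increase."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


def read_counters(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Read correctable-error counters for one device (assumed sysfs file)."""
    path = Path(sysfs_root) / bdf / "aer_dev_correctable"
    return parse_aer_counters(path.read_text())
```

The idea is to call read_counters("0000:00:01.6") once, start a Wifi or BT scan, call it again, and pass both results to counter_delta. If the counters only move while the dongle is active, that supports the hypothesis; it still would not tell the slot and the card apart.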