element14 Community
Raspberry Pi Forum Help: Encountering CAN Bus Problem on SmartEdge IIoT Gateway
Help: Encountering CAN Bus Problem on SmartEdge IIoT Gateway

khz_321 over 5 years ago

Hi All,

 

I was wondering if anyone has had any experience using the CAN controller on the SmartEdge IIoT Gateway.

The box has a built-in MCP2515 CAN controller on board, which is great and is why I selected it for a project that requires CAN.

 

For the most part it works fine, but every so often I'm encountering "garbage" data coming from the chipset (at a rate of about 1 message every 10 minutes with a message cycle of 100 ms).

This is shown below, when simply monitoring using candump and nothing else.

 

image

The 67 EB shown in bytes 7 and 8 is the garbage data in question. Note that when I do get the garbage, it's always the same 2 bytes and the same values.

 

The sender device is broadcasting the same message repeatedly, so there should be no changes at all!

 

I was also able to confirm that it is not present on the physical bus by using a PCAN USB and datalogging with Pcan-View at the same time.

The data dump from Pcan-View shows no messages like the one highlighted!

 

It seems like this is happening on the controller hardware or driver implementation side on the IIoT Gateway.

Has anyone encountered this before, either on this product or on any other Raspberry Pi?

 

I've tried adjusting the bit timing, the sampling point, everything I could think of, but I'm still encountering a random error every so often.

 

Here are my CAN settings from /etc/network/interfaces:

 

auto can0
iface can0 inet manual
   pre-up /sbin/ip link set can0 type can bitrate 250000 sample-point 0.875 triple-sampling on
   up /sbin/ifconfig can0 up
   down /sbin/ifconfig can0 down

 

And here is the output of ip -details -statistics link show can0

ip -details -statistics link show can0
4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0 
    can <TRIPLE-SAMPLING> state ERROR-ACTIVE restart-ms 0 
  bitrate 250000 sample-point 0.875 
  tq 250 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
  mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
  clock 8000000
  re-started bus-errors arbit-lost error-warn error-pass bus-off
  0          0          0          0          0          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
    RX: bytes  packets  errors  dropped overrun mcast   
    17224264   2153033  0       1508    0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    16         2        0       0       0       0
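
As a sanity check on the bit timing reported above, the segment values can be turned back into a bitrate and a sample point. This is only a minimal Python sketch of that arithmetic: the function name `bit_timing` is made up for illustration, and `brp` is computed the way the Linux CAN layer counts it (time quantum multiplied by the reported clock).

```python
# Values copied from the `ip -details` output above.
CLOCK_HZ = 8_000_000   # "clock 8000000"
TQ_NS = 250            # "tq 250" (one time quantum, in nanoseconds)
PROP_SEG = 6           # "prop-seg 6"
PHASE_SEG1 = 7         # "phase-seg1 7"
PHASE_SEG2 = 2         # "phase-seg2 2"


def bit_timing(clock_hz, tq_ns, prop_seg, phase_seg1, phase_seg2, sync_seg=1):
    """Return (brp, bitrate, sample_point) implied by the segment lengths.

    Hypothetical helper, not part of any CAN library. The sync segment is
    always one time quantum in CAN bit timing.
    """
    tq_per_bit = sync_seg + prop_seg + phase_seg1 + phase_seg2
    bit_time_ns = tq_per_bit * tq_ns
    bitrate = 1_000_000_000 // bit_time_ns
    # The sample point sits at the end of phase segment 1.
    sample_point = (sync_seg + prop_seg + phase_seg1) / tq_per_bit
    # Prescaler: how many clock periods make up one time quantum.
    brp = clock_hz * tq_ns // 1_000_000_000
    return brp, bitrate, sample_point


if __name__ == "__main__":
    print(bit_timing(CLOCK_HZ, TQ_NS, PROP_SEG, PHASE_SEG1, PHASE_SEG2))
```

For the settings above this works out to a prescaler of 2, 250000 bit/s and a sample point of 0.875, which matches what `ip` reports, so the configured timing at least looks self-consistent.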

 

Note: Kernel Version 4.14.79-v7+, & Software Image: avnet_iot_demo_20190719a_shrink

(Also tried with latest release avtse-iiotg-v-11-20200326.img, which apparently updated the MCP2515 driver, but problem still persists)

 

Any advice? 

  • Jan Cumps
    0 Jan Cumps over 4 years ago

    I have never seen this happen.

    I have a smartedge on my desk here, attached to a single CAN device. I haven't seen bogus traffic at Linux level.

    I'll leave it running tonight with candump active. Let's see if I receive anything unexpected.

    I'm using image avnet_iot_demo_20190719a_shrink.

     

    Config:

    auto can0
    iface can0 inet manual
       pre-up /sbin/ip link set can0 type can bitrate 500000
       up /sbin/ifconfig can0 up
       down /sbin/ifconfig can0 down

     

     

    Log (rebooted to have a clean test start point):

    avnet@smartedge:~ $ ip -details -statistics link show can0
    4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
        link/can  promiscuity 0
        can state ERROR-ACTIVE restart-ms 0
              bitrate 500000 sample-point 0.875
              tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
              mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
              clock 8000000
              re-started bus-errors arbit-lost error-warn error-pass bus-off
              0          0          0          0          0          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
        RX: bytes  packets  errors  dropped overrun mcast
        0          0        0       0       0       0
        TX: bytes  packets  errors  dropped carrier collsns
        0          0        0       0       0       0

  • Jan Cumps
    0 Jan Cumps over 4 years ago in reply to Jan Cumps

    I didn't get any bogus traffic with a working circuit.

    I deliberately sent one command on the bus after waiting 12 hours, and it was reported by candump.

    Nothing else happened during the 12 hours.

     

    image

  • roryyorke
    0 roryyorke over 4 years ago in reply to Jan Cumps

    We've seen similar behaviour, in which the last two bytes of some 8 byte CAN data packets are corrupted.

     

    We have tested this using two SmartEdges connected via CAN, one transmitting with cansend, the other recording with candump.  Below are diffs between the send file and received file (stripped of timestamps) for two tests.

     

    There are a total of 9027 packets sent over 10s.  I can mail the files through if it will help.

     

    Any suggestions for a workaround or fix?

     

    Here's the result of one run:

      --- sender-nots.log     2021-06-28 15:34:39.658592972 +0200
      +++ msgs2.txt   2021-06-28 15:33:25.089640922 +0200
      @@ -3932,7 +3932,7 @@
       can0 376#24655A31785B795F
       can0 377#573C705D452B5875
       can0 379#7A00
      -can0 380#3132333435363738
      +can0 380#3132333435367000
       can0 381#6162636465666768
       can0 351#E80372064E048C07
       can0 355#60004A002013
    

    and here's the other:

      --- sender-nots.log     2021-06-28 15:34:39.658592972 +0200
      +++ msgs1.txt   2021-06-28 15:33:15.039784507 +0200
      @@ -4275,7 +4275,7 @@
       can0 356#2CAE96FE5802
       can0 35E#4672656564574F4E
       can0 35F#106144EA
      -can0 370#46726565646F6D20
      +can0 370#46726565646F6E00
       can0 371#4C69746500000000
       can0 372#0500040001000700
       can0 373#E709380B0E01F000
      @@ -8199,7 +8199,7 @@
       can0 371#4C69746500000000
       can0 372#0500000005000100
       can0 373#9F033B020E015101
      -can0 374#592575716D3B4541
      +can0 374#592575716D3B6E80
       can0 375#2A4D2E4C5F7D5277
       can0 376#5D6760413044613C
       can0 377#594C5B5043774429
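
A comparison like the one above can be reproduced with a few lines of Python. This is a rough sketch, not the script used for the diffs in this post: it assumes the logs are in the `candump -L` style, where each line starts with a `(seconds.microseconds)` timestamp, and the helper names `strip_timestamps` and `diff_logs` are invented for illustration.

```python
import difflib
import re

# Assumed log format: "(1624887279.658592) can0 380#3132333435363738"
_TIMESTAMP = re.compile(r"^\(\d+\.\d+\)\s+")


def strip_timestamps(lines):
    """Drop the leading timestamp and surrounding whitespace from each line."""
    return [_TIMESTAMP.sub("", line.strip()) for line in lines if line.strip()]


def diff_logs(sent_lines, received_lines):
    """Return a unified diff between the sent and received frame lists."""
    sent = strip_timestamps(sent_lines)
    received = strip_timestamps(received_lines)
    return list(
        difflib.unified_diff(
            sent, received, fromfile="sent", tofile="received", lineterm=""
        )
    )


if __name__ == "__main__":
    sent = [
        "(1624887279.000100) can0 379#7A00",
        "(1624887279.000200) can0 380#3132333435363738",
        "(1624887279.000300) can0 381#6162636465666768",
    ]
    received = [
        "(1624887205.000150) can0 379#7A00",
        "(1624887205.000250) can0 380#3132333435367000",
        "(1624887205.000350) can0 381#6162636465666768",
    ]
    print("\n".join(diff_logs(sent, received)))
```

Note that a plain diff only works when no frames are dropped or reordered on the receiving side; a dropped frame shows up as a removed line rather than as a corrupted one.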
    
  • michaelkellett
    0 michaelkellett over 4 years ago in reply to roryyorke

    Here's another thought, useful only depending on your level of desperation.

     

    The MCP2515 is connected to the micro via SPI. If you have a scope/logic analyzer which can decode and trigger on SPI you could look for the bad message in the data stream from MCP2515 to the micro.

    Or you could build an SPI sniffer using a different micro or even an FPGA to monitor the inter-chip data.

     

    It might be worth carefully looking at the SPI signals for integrity issues.

     

    I once had a problem between an FPGA and the SPI bus on an ST micro.

    Since I had good control of both ends I was able to get a scope trigger on the fault, and found that for about 1 byte in 10E7 the micro did not correctly capture the data that was on the bus.

    In that case my workaround was to packetise the data and add CRCs to the packets so I could re-try a bad packet - not an answer if your problem is between MCP2515 and processor.
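
The "packetise and add CRCs" idea can be sketched in Python as follows. This is not the code used in that project: the CRC choice (CRC-CCITT via `binascii.crc_hqx`), the two-byte trailer, the retry count and all function names are assumptions made for illustration.

```python
import binascii
import struct


def pack_with_crc(payload: bytes) -> bytes:
    """Append a big-endian 16-bit CRC-CCITT of the payload."""
    crc = binascii.crc_hqx(payload, 0xFFFF)
    return payload + struct.pack(">H", crc)


def unpack_with_crc(packet: bytes):
    """Return the payload if the CRC matches, otherwise None."""
    if len(packet) < 2:
        return None
    payload = packet[:-2]
    (crc,) = struct.unpack(">H", packet[-2:])
    if binascii.crc_hqx(payload, 0xFFFF) != crc:
        return None
    return payload


def read_with_retry(read_packet, attempts=3):
    """Call read_packet() until a packet passes the CRC check.

    read_packet is a hypothetical callable that fetches one packet from
    the other end (for example by re-requesting it over SPI).
    """
    for _ in range(attempts):
        payload = unpack_with_crc(read_packet())
        if payload is not None:
            return payload
    raise IOError(f"no valid packet after {attempts} attempts")
```

This only helps when both ends of the link can be changed to add and check the CRC, which is why it does not carry over to a fault between the MCP2515 and the processor.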

     

    If the code is open source you might be able to intercept the bad message in the driver code and make a trigger signal.

     

    Do Avnet have an answer or comment ?

     

    MK

  • Jan Cumps
    0 Jan Cumps over 4 years ago in reply to michaelkellett

    michaelkellett  wrote:

     

    ... If the code is open source you might be able to intercept the bad message in the drive code and make a trigger signal...

    It is. The Linux CAN library with the command line utilities (can-utils) is available, and so is Avnet's CAN implementation on top of Raspberry Pi OS:

     

    raspberrypi-industrial-gateway

    Source code used for Avnet Raspberry Pi Industrial Gateway

    Changes from initial release:

    Updates to CAN mcp251x driver fixing a problem after every other reset.

  • roryyorke
    0 roryyorke over 4 years ago in reply to Jan Cumps

    Thanks for the suggestions on looking at the SPI, and on the pointers to the kernel code.

     

    We were prompted to investigate using cansend and candump by seeing similar corruption (final two bytes of 8-byte data packets) when using the python-can library, so the bug is unlikely to be in can-utils.  In this (more realistic) case, the packet rate is low, and the corruption is repeatable, much like the case reported by the original poster.

     

    I would imagine SPI signal integrity errors would give more non-deterministic errors, i.e. corruption not always in the final two bytes of a data packet?

     

    Unfortunately it's not that easy to detect this particular error from the data; the corruption results in a possibly valid signal.

     

    We are now testing a workaround of unloading and reloading the mcp251x and can_dev kernel modules, to see if that resets driver state which might let us avoid the bug.

     

    For what it's worth, we have to unload and reload the modules *twice*, due to the "CAN doesn't start every other boot" bug; something like this might be a (crude!) workaround for anyone not wanting to update to the latest image.

  • michaelkellett
    0 michaelkellett over 4 years ago in reply to Jan Cumps

    Thanks Jan Cumps ,

     

    I had a quick look at the MCP251x driver.

    Interesting - not the kind of code you would expect to find in a car or plane.

     

    eg:

     

    do {
        mdelay(MCP251X_OST_DELAY_MS);
        reg = mcp251x_read_reg(spi, CANSTAT);
    } while (!reg && retries--);

     

    I haven't got time now but I'll run that loop through PCLint and see how many whinges it gives.

     

    @ roryyorke

     

    I thought maybe you could look for a repeating pattern, like the OP had seen with the 67 EB.

     

    MK

  • roryyorke
    0 roryyorke over 4 years ago in reply to michaelkellett

    The kernel module unload-load workaround didn't help.

     

    We have found something interesting; when a packet is corrupted, the final two bytes are a bit-shuffled variant of the packet arbitration ID.

     

    Specifically, the last two bytes become, interpreted as a little-endian number, `((arbitration_id & 0x7) << 13) + (arbitration_id >> 3)`.

     

    We're using 11-bit (standard) arbitration IDs; I don't know if there's similar determinism with extended IDs.

     

    We've verified the above with simple Python 3 scripts; I don't see an "attach" option here, but I'm happy to mail the code to anyone who wants it.  You will need two SmartEdges to use these scripts "as-is".

     

    Latest result I have from running these scripts is, of 1401502 packets received, 1888 were bad, and all 1888 bad packets match the bit-shuffle pattern above.
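
The pattern described above can be written as a small check. This is a minimal sketch and not the script from the tests in this post: the function names are invented, and it only covers 11-bit standard IDs, since that is all the pattern has been observed with.

```python
import struct


def expected_corruption(arbitration_id: int) -> bytes:
    """Return the two bytes a corrupted packet ends with, per the pattern."""
    if not 0 <= arbitration_id <= 0x7FF:
        raise ValueError("pattern only observed for 11-bit standard IDs")
    value = ((arbitration_id & 0x7) << 13) + (arbitration_id >> 3)
    # The value is stored little-endian in bytes 7 and 8 of the payload.
    return struct.pack("<H", value)


def matches_pattern(arbitration_id: int, data: bytes) -> bool:
    """True if an 8-byte payload ends in the bit-shuffled arbitration ID."""
    return len(data) == 8 and data[6:8] == expected_corruption(arbitration_id)


if __name__ == "__main__":
    # Corrupted frames taken from the diffs earlier in this thread.
    for can_id, payload in [
        (0x380, "3132333435367000"),
        (0x370, "46726565646F6E00"),
        (0x374, "592575716D3B6E80"),
    ]:
        print(hex(can_id), matches_pattern(can_id, bytes.fromhex(payload)))
```

All three corrupted frames from the earlier diffs satisfy the check. A genuine payload that happens to end in the same two bytes would also match, so this is only useful when the expected data is known. In terms of rates, 1888 bad packets out of 1401502 is roughly 1 in 740, compared with about 1 in 6000 for the original poster's one bad message per 10 minutes at a 100 ms cycle. The two bytes also coincide with how the MCP2515 splits a standard ID across its RXBnSIDH and RXBnSIDL registers (upper eight ID bits, then the lower three ID bits in the top of the next byte), which may be a useful lead, though nothing in this thread confirms the cause.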

  • Jan Cumps
    0 Jan Cumps over 4 years ago in reply to roryyorke

    roryyorke  wrote:

    ... I don't see an "attach" option here, but I'm happy to mail the code to anyone who wants it.  You will need two SmartEdges to use these scripts "as-is"...

     

    You can't attach code to comments or to someone else's post. You can upload them to gist.github.com, and link here.

     

    I have 2 SmartEdges and should be able to replicate the test.

  • roryyorke
    0 roryyorke over 4 years ago in reply to Jan Cumps

    I've put the code at https://github.com/roryatario/smartedge-can-tester

     

    The scripts need the python3-can package ("apt install ...").

     

    Run "python3 sendsimple.py" on one SmartEdge, and "python3 recvsimple.py" on the other; recvsimple will print number of total, bad, and bad-matching-pattern packets once a second.

     

    You might need to increase the transmit queue on the sender (`ip link set can0 txqueuelen 50`).

     

    FWIW, I've just confirmed that I see this bug in a candump recording from packets sent from a completely different device, so the error occurs on the receiving side.

  • Jan Cumps
    0 Jan Cumps over 4 years ago in reply to roryyorke

    I've done the hardware setup.

    image

     

    The Microchip analyzer only snoops traffic. It doesn't interfere.

    I'll now install the Python libs and check the test scripts.

    image

  • michaelkellett
    0 michaelkellett over 4 years ago in reply to Jan Cumps

    Lucky you have all the right gear to replicate the problem.

     

    Good hunting.

     

    MK

  • Jan Cumps
    0 Jan Cumps over 4 years ago in reply to Jan Cumps

    Enabled the interface on both - tested with and without changing the txqueuelen:

     

    sudo ip link set can0 up type can bitrate 125000

    sudo ifconfig can0 up

    sudo ip link set can0 txqueuelen 50

     

    Then the first box sends and the second box receives:

     

    python3 sendsimple.py

    python3 recvsimple.py

     

     

    without tx queue change

    image

     

    with buffer length 50:

    image

     

    The CAN analyzer does not detect any traffic errors. The data on the bus seems OK.

    image

  • roryyorke
    0 roryyorke over 4 years ago in reply to Jan Cumps

    Thanks for repeating the test.

     

    The txqueuelen change is only needed if, on the sender side, you get "buffer full" errors; I may have used it when I didn't have the `time.sleep(0.0001)` call in the sender loop.

     

    In your first run (without the txqueuelen change) we can see a replication of the behaviour I described: there's one bad packet (at 1595 packets), and, from the second "1" on that line, it matches the corruption pattern.

  • Jan Cumps
    0 Jan Cumps over 4 years ago in reply to roryyorke

    You could log an issue on GitHub, in the repository where the CAN code sits.

  • roryyorke
    0 roryyorke over 4 years ago in reply to roryyorke

    Thanks for all the help.  Issue at https://github.com/Avnet/smartedge-iiot-gateway/issues/12  (I tagged you in it, hope you don't mind.)
