A bit of Linux play. How to turn an active unstructured log file into structured data.
image source: nightcafe
Like many programs, my The Things Network Gateway generates a verbose log file. A lot of information. Only every so many lines (more than 1000) is there data that I want to track: when a Thing sends data to my Gateway. I want to use that data while the log is actively being written to by the Gateway.
Line of interest:
I want to capture lines with that structure. The rest of the log (the vast majority) is not what I want to retrieve and process.
This is what I want to get out of that active log. Both on the console and in a .csv file:
rules
- I only want to track Things that contact my gateway.
I can filter the relevant lines because they all contain this unique text: ": INFO: Received pkt from mote:"
Jan 20 19:54:22 raspberryttn lora_pkt_fwd[23528]: INFO: Received pkt from mote: 260BCF0F (fcnt=0)
From such a line I am interested in three parts: date and time, Thing identifier and message counter (the example after this list shows the record that should come out of it).
- These values should be written to a .csv file, with ';' as separator. And to the console.
- Everything else should be ignored
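For the sample line above, the structured record that should come out (with ';' as the separator) looks like this:

Jan;20;19:54:22;260BCF0F;0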
Here is a Linux command line that will do this. I will break it apart in this post.
Where you see BAR, it is actually var. The e14 forum does not accept slash var slash.
This is one single command. The \ characters allow me to break it over six lines and make it fit on screen.
stdbuf -o0 tail -f /BAR/log/lora_pkt_fwd.log 2> /dev/null | \
stdbuf -i0 -o0 awk '/: INFO: Received pkt from mote:/ {print $1, $2, $3, $11, $12}' | \
stdbuf -i0 -o0 cut -d ' ' -f 1,2,3,4,5 --output-delimiter=';' | \
stdbuf -i0 -o0 sed 's/(fcnt=//g' | \
stdbuf -i0 -o0 sed 's/)//g' | \
stdbuf -i0 -o0 tee output.csv
It's a pipeline:
- tail -f keeps reading the log file as it is written. It streams every new line that's added to the log. The 2> /dev/null part makes sure that only log file content is forwarded: warnings and errors (e.g. when the log is rotated by Linux every week) are ignored.
- awk finds the relevant lines and retrieves the fields that we need (date, time, identifier, counter). Any line that doesn't match the pattern is ignored. A worked example of these steps follows after this list.
- cut changes the separator from a space to a semicolon
- sed is called twice, to strip the fixed text that I don't want from the counter field (first the "(fcnt=" prefix, then the closing ")")
- tee sends the stream both to standard out (the console) and to a csv file.
- stdbuf -i0 -o0 (or stdbuf -o0) disables buffering. It is used before every command in the pipeline, because I want to extract events in real time.
If I were processing a megabyte log file in one go, I would not need these. But I want to get every event as it happens.
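To make this concrete, here is the sample log line from above traced through each stage (a sketch of what I expect each command to emit, following from the field positions in my log format):

after awk (fields 1, 2, 3, 11, 12):  Jan 20 19:54:22 260BCF0F (fcnt=0)
after cut (';' as output delimiter): Jan;20;19:54:22;260BCF0F;(fcnt=0)
after the first sed (drop "(fcnt="): Jan;20;19:54:22;260BCF0F;0)
after the second sed (drop ")"):     Jan;20;19:54:22;260BCF0F;0

tee then prints that last line on the console and appends it to output.csv.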
The output is a structured (character separated values) file.
Jan;20;21:34:47;D3FB2208;4938
Jan;20;21:35:52;0030D318;475
Jan;20;21:45:52;0030D318;476
Jan;20;21:46:52;A6BC7955;29136
Jan;20;21:54:52;260BCF0F;1
Jan;20;21:54:52;260BCF0F;2
Jan;20;21:56:22;0030D318;477
Jan;20;21:57:22;7D42C57D;20388
Jan;20;21:58:48;0048BAB4;802
Jan;20;21:58:48;2601643B;53122
Jan;20;22:05:34;0030D318;478
Jan;20;22:16:44;0030D318;479
Jan;20;22:16:44;FF007FF8;84
Jan;20;22:21:53;179BEEF2;50279
Jan;20;22:26:23;0030D318;480
Jan;20;22:35:53;0030D318;481
This output can be piped into a processor script running as a different process. That could be a database writer, an MQTT publisher, or a program that flashes an LED when an upload is received.
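As a minimal sketch of such a processor (the script name, the field names and the echo action are placeholders I made up; they are not part of my setup), the records can be read from standard input like this:

# processor.sh - hypothetical consumer, reads one structured record per line from stdin
while IFS=';' read -r month day time mote fcnt; do
  # replace this echo with a database insert, an mqtt publish, or a GPIO toggle
  echo "upload from $mote (message $fcnt) at $month $day $time"
done

It would be hooked on by appending one more stage to the pipeline above: ... | stdbuf -i0 -o0 tee output.csv | bash processor.sh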
This is performant. I'm running it on a Pi 3. And it uses virtually no resources.
If I replace the tail command (which reads only what the Gateway appends to the log) with a cat command (which dumps the full 40 MB log into the pipeline), it finishes in a second.
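For reference, that batch variant is the same pipeline with cat instead of tail, and without the stdbuf prefixes (a sketch; BAR again stands for var):

cat /BAR/log/lora_pkt_fwd.log | \
awk '/: INFO: Received pkt from mote:/ {print $1, $2, $3, $11, $12}' | \
cut -d ' ' -f 1,2,3,4,5 --output-delimiter=';' | \
sed 's/(fcnt=//g' | \
sed 's/)//g' | \
tee output.csv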
why this way?
- non-intrusive: every command is standard Linux.
- non-intrusive: the command accepts the log file of the Gateway as-is. No software changes.
- non-intrusive: it does not require additional installs.
- non-intrusive: this isn't a program or a shell script. It's just a command.
- non-intrusive: it can be run by a generic user. No sudo needed.
- the output file is also streamable by other processes. It can be consumed (e.g. by another tail command) while this command is writing to it.
image: one process parses the raw Gateway log. A second process monitors the structured csv file in parallel.
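A minimal sketch of that second, parallel process (the per-Thing message counter is just an illustration I added, not something from my setup): it tails the csv file while the pipeline above keeps appending to it:

tail -f output.csv | awk -F';' '{count[$4]++; print $4 " sent fcnt " $5 ", " count[$4] " uploads seen so far"}'

Because output.csv only ever grows, tail -f picks up each new record the moment tee writes it.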