Blog #5: Torturing Thermal Switches – Temperature Measurement Complexities, Switching Deviations, Data Analysis & Interim Results

13 Feb 2022

Having applied the thermal switches to a simple project and various short torture experiments, it is now time to revisit the main event – the cycle torture experiment. In order to understand the data better, I will have to delve into some complexities with temperature measurements, siphon off some data from a live experiment, write some code to summarise this data and do some interim analysis, explaining the reason for observed switching temperature deviations.

The Problem with Temperature
Grabbing Data “On-the-Fly”
Dealing with a Data Deluge
Interim Results
- OHD1-50B
- M-TRS5-60B
Switching Temperature Deviation
Conclusion
[The Torturing Thermal Switches Blog Series Index]

The Problem with Temperature

On the face of it, temperature seems like an ordinary scalar quantity that can be measured directly. But in reality, there are a number of fine nuances with regard to temperature measurement that are important to grasp, as you may not be measuring what you think you are measuring!

Before we start, perhaps a quick Physics 101 recap – my apologies if the terminology is not perfect, but it should suffice for this explanation. Temperature is a measure of the thermal energy in an object, and heat flows from regions of higher temperature to regions of lower temperature. It does so by three methods – conduction (e.g. going through a metal bar and heating it up), convection (e.g. heating up the air which creates currents that carry the heat away) and radiation (e.g. emitting long-wave infrared). How quickly something heats up or cools down depends on a few factors: its thermal mass (how much energy is needed to raise the temperature), its geometry (which affects its surface area), the airflow (which affects the efficiency of convection), its thermal resistance (how much it slows down the passage of heat going through the material) and its emissivity (ability to radiate heat). As a result, it can already be deduced that temperature change is not instantaneous (e.g. thermal mass can be modelled as an electrical capacitance, thermal resistance can be modelled as an electrical resistance) and is the result of many competing processes by which thermal energy is flowing into or out of an object.

A quick sketch of my testing set-up may serve to clarify these points. In the heating regime (to the left), the resistor is developing heat which is diffusing out towards the thermocouple through the thermal switch. The direction of heat-flow is mostly from left to right, with heat being lost to the surroundings continuously. At the point of switching, say 50°C, the switch cuts power to the resistor as it “feels” a temperature of exactly 50°C. But because there is a thermal resistance from the heat source to the switch and a thermal mass of the switch which serves to slow down the temperature change, the temperature of the switch is not exactly the temperature of the heat source which may have overshot slightly to 51°C. Similarly, the thermocouple which is on the other side may actually feel less than 50°C as the heat has not made its way through the thermal switch evenly to the other side yet, and there is a thermal resistance from the surface of the switch to the thermocouple. As a result, the temperature that is measured may not be the actual temperature experienced by the component.

This situation in the case of cooling is slightly different. The heat flow when in the cooling phase radiates outwards in mostly all directions, possibly with a slight bias to the heat source (not shown) which may have a slightly higher initial temperature. As it cools down, because the thermal switch is in the “middle” of the bundle, the heat escaping has to pass thermal resistances in part through the heat source and thermocouple, as well as the surrounding heat-shrink to the outside space. As a result, its core temperature may be higher than the measured temperature of the thermocouple.

The speed at which temperature changes is an important factor, as the faster the heating or cooling occurs, the bigger the effect of the thermal resistance and thermal mass becomes, likely resulting in increasing error in measurement as the sensor is not directly co-located inside the package. This can also be complicated by the intrinsic response time of the thermal switch and Thermolite material which may not be able to cycle extremely quickly.
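
The effect of heating rate on measurement lag can be illustrated with a minimal first-order sketch. It assumes the sensor is coupled to the heated part through a single lumped thermal resistance and thermal mass, giving one time constant (tau = R × C). The 5-second time constant, the 25°C starting point and the ramp rates are illustrative assumptions, not values measured from this set-up.

```python
def sensor_lag(ramp_rate, tau=5.0, duration=120.0, dt=0.01):
    """Return how far a first-order sensor trails a linearly rising temperature.

    ramp_rate: true heating rate in degC per second (assumed constant)
    tau:       thermal time constant R*C in seconds (illustrative value)
    """
    true_t = 25.0    # true temperature of the heated part, degC
    sensed_t = 25.0  # temperature reported by the lagging sensor, degC
    for _ in range(int(duration / dt)):
        true_t += ramp_rate * dt
        # Heat flows into the sensor in proportion to the temperature difference
        sensed_t += (true_t - sensed_t) * dt / tau
    return true_t - sensed_t


for rate in (0.1, 0.5, 1.0):
    print(f"{rate:.1f} degC/s ramp -> sensor reads {sensor_lag(rate):.2f} degC low")
```

In this simple model the lag settles at roughly the ramp rate multiplied by the time constant, so halving the heating rate roughly halves the error. The real bundle has several resistances and masses in series, so this is only a qualitative guide.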

This is why maintaining a reasonable cycle time and flooding the assembly with thermal compound to improve the thermal conductivity while potentially increasing the thermal mass is helpful. It should even out the temperature by providing more paths for heat to enter and leave the package but also serve to dampen severe changes in temperature (be they caused externally or by the heating resistors).

Finally, it is worth considering the absolute accuracy of the measurement. Thermocouples, for example, develop a very small voltage in proportion to the temperature at the junction. This process is usually quite accurate, but deviations can occur due to material impurities which can grow worse over time as the material properties can change through continued exposure to high temperatures. Errors can also creep in from the acquisition hardware that measures these small voltages. The problem is that the measured voltage doesn’t directly correspond to the temperature – it must be scaled but also compensated by a reference junction as the thermocouple voltage actually measures temperature difference between the junction and the terminals. When all is considered, it is possible to find about 1°C error in the thermocouple material and about 1.5°C error in the acquisition and reference junction. As a result, getting readings within about 2°C on an absolute temperature basis is probably the limit of many affordable thermocouple systems, with relative accuracy being much better although thermocouple material-induced drift may still be a concern at very high temperatures. For the purposes of this experiment, I felt this was an appropriate technology to use and had the best compromise.
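
As a rough check on how those two error figures combine, here is a small sketch. It assumes the material error and the acquisition/reference junction error are independent, in which case they are commonly combined as a root-sum-square; the straight sum is shown as a pessimistic bound. The variable names are my own.

```python
import math

material_error = 1.0     # degC, thermocouple material (figure from the text)
acquisition_error = 1.5  # degC, acquisition and reference junction (from the text)

# Assumption: the two error sources are independent, so they add in quadrature
combined_rss = math.hypot(material_error, acquisition_error)
# Pessimistic bound: both errors at their limits with the same sign
combined_worst = material_error + acquisition_error

print(f"Root-sum-square: {combined_rss:.2f} degC, worst case: {combined_worst:.2f} degC")
```

The root-sum-square figure comes out a little under 2°C, which is in line with the "within about 2°C" estimate above.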

There are other options, of course. Generally speaking, simple bead-type thermistors are not highly accurate, typically good to within a degree or two, with a fairly limited range and a very non-linear response. They are very inexpensive, but are not usually used in precision applications.

Resistance Temperature Detectors (RTDs) are usually the go-to choice. Being made out of platinum, they are linear and more accurate, down to an error of just 0.05°C for the highest-grade sensors. With higher accuracy comes higher cost, with basic RTDs starting at upwards of US$20 a piece. These are physically larger, usually come in a cylindrical or planar shape and have more thermal mass than thermocouples, which can affect their response time. Using them properly requires making a three or four-wire connection and also understanding that the measurement of an RTD actually creates a small amount of heat inside the RTD itself, which could affect the results of a highly sensitive experiment. While I would have liked to use these, the cost would have been prohibitive.

Perhaps some may consider the use of infrared thermometers or more costly thermal imaging cameras for such applications, but these are potentially even more problematic. They do not measure temperature directly, but measure the infrared radiation emitted from the surface of the device. Without a highly accurate emissivity value, which is influenced by the type of material and finish, the displayed temperature can be significantly in error. Even when a good emissivity value is used, the accuracy is still not as high as an RTD, but does have the advantage of being able to spatially resolve the temperature distribution. Unfortunately, supply chain shortages and high demand due to COVID-19 have made such devices difficult to obtain.

Who said measuring temperature was easy?

Grabbing Data “On-the-Fly”

The experiment is controlled by a pyvisa-based Python program I wrote in Blog #3, which runs on a Harting MICA Industrial Linux Computer under Debian Linux. The MICA is equipped with a microSDXC card for external storage, where the results of the experiments are currently being logged.

Thanks to how Linux performs file-locking, it is possible to read a file as it is still open for writing. This means it is possible to “grab” the logged data while the experiment is still progressing without stopping the experiment or corrupting the written data.

Ordinarily, you could probably just copy the file to a new file using the cp command and then operate on the copy safely. That will usually make a copy without locking the file in the process. Opening in an editor, however, would not be safe as that will probably request an exclusive lock on the file, thus preventing writes to the file. The safety of using SCP to copy the test file to a remote host over SSH is not established.

Not wanting to risk any of the progress I have already made, I decided against using a file copy. This is because the large data log will result in a rather large interleaved read/write request to the microSDXC card which is likely to cause a long “wait” for I/O operations. This could mean that my experiments will lose data as they wait for the card to be available for a write. In the case of this experiment, I deliberately avoided calling flush() after each sample so that writes can be cached and more efficiently dispatched to the card, preferring to lose a small chunk of data in case power is lost or the experiment is interrupted in a non-graceful manner. The other disadvantage is that such a copy would also make writes to the microSDXC card which will wear it out.

I could try SCP or SMB; however, my concern was the encryption overhead and the potential for file-locking. The MICA is limited in its capabilities, having just a single 1GHz ARMv7 CPU, which has almost 40% of its capability used up in running two experiments in parallel. Adding an encryption burden to this would probably result in the experimental processes becoming CPU-starved, reducing their sample rates and affecting experimental results.

Instead, I decided to go to a crude but tried-and-true method – netcat. Think of this as opening a raw TCP listener on the target computer which dumps all the received data into a file, with a corresponding raw TCP sender on the MICA which has the experimental data piped into it by cat. No encryption, no file copies, minimal CPU overhead.

This can be achieved by first starting up the listener with:

nc -l <port> > <outfile>

On the sender, you execute the following:

cat <infile> | nc <dst-addr> <port>

The command will not show anything while transfer is occurring, but will return to the prompt once completed.
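
For anyone without netcat on both ends, the same raw-TCP idea can be sketched in Python using only the standard socket module. This is not what I ran on the MICA, and the function names (make_listener, receive_to_file, send_file) are hypothetical helpers made up for this illustration.

```python
import socket


def make_listener(host="0.0.0.0", port=0):
    """Open a listening TCP socket; port 0 lets the OS choose a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def receive_to_file(srv, outfile):
    """Accept one connection and write everything received to outfile."""
    conn, _addr = srv.accept()
    with conn, open(outfile, "wb") as out:
        while True:
            chunk = conn.recv(65536)
            if not chunk:  # Empty read means the sender closed the connection
                break
            out.write(chunk)
    srv.close()


def send_file(infile, host, port):
    """Stream infile to a listening receiver in fixed-size chunks."""
    with socket.create_connection((host, port)) as sock, open(infile, "rb") as src:
        while True:
            chunk = src.read(65536)
            if not chunk:
                break
            sock.sendall(chunk)
```

The receiving computer would call make_listener() with a chosen port followed by receive_to_file(), and the sender would then call send_file() with the receiver's address. As with netcat, nothing is encrypted, so this only makes sense on a trusted network.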

Running this command on the morning of the 11th February 2022 resulted in a file of 746MiB being transferred. This corresponds to a data size of about 78.236MiB per day, slightly less than the 85MiB that I previously estimated. Before you ask why this tally was not done “on the day of the blog post” – I tend to prepare parts of my blog posts in advance and post them once they are ready, hence the data extract and interim analysis were done a few days prior to the blog posting.

Dealing with a Data Deluge

The log-file consists of comma-separated values where each line consists of a UNIX timestamp, the temperature of the first unit, the voltage across the first unit, the temperature of the second unit and the voltage across the second unit. A line is recorded whenever any of the four readings changes. As the Modbus-TCP protocol does not offer a way to obtain only changes, my script polls continuously to detect them, which does consume quite a bit of CPU time. On average, a line is recorded every 96ms (about 10 lines per second), so the file grows large enough that many ordinary tools will run out of memory attempting to load it.

In the previous section, I have identified that the values of interest are the temperature at the switching point and the minimum and maximum temperature of a cycle as those are most illustrative of the behaviour of the switch. Extracting per-cycle time is also useful as it allows for extrapolation of how many cycles may be completed in a given amount of time.

As a result, I wrote a simple Python program that processes the data log line-by-line, adding them into a temporary array and identifying the relevant data points, writing output files that contain just this summarised data in the form of cycle number, cycle time, up-switching temperature, maximum temperature, down-switching temperature, minimum temperature. It has fixed output file names, no real error checking, assumptions about data formats and could probably be more efficient as it just goes through the file twice – once for each device. But the most important outcome is that it works and is accurate.

# Data Log Analysis Program for Experimenting with Thermal Switches
# by Gough Lui - February 2022
# Intended only for use with CSV data files generated by my experiment!
# Very little in the way of error checking - use at own risk!

fn = input("File to Analyse? ")
for ch in range (0,2) : # Iterate through file twice!
  f = open(fn,"r")
  g = open("output-"+str(ch)+".csv","a") # Fixed output filenames!
  cline = f.readline()
  if len(cline) == 0 :
    print("Zero Length File!") # In case wrong file selected
    exit()
  rdgt = [float(cline.split(",")[0])]
  ch1t = [float(cline.split(",")[1])]
  ch1v = [float(cline.split(",")[2])]
  ch2t = [float(cline.split(",")[3])]
  ch2v = [float(cline.split(",")[4])]
  cphase = 0
  scycles = 0
  cyclet = 0
  initt = float(cline.split(",")[0])
  supt = 0
  sdnt = 0
  mupt = 0
  mdnt = 0
  g.write("Cycle,CycleT,SwitchTUp,MaxTUp,SwitchTDown,MinTDown\n")
  while True:
    cline = f.readline()
    if len(cline) == 0 :
      break
    crdgt = float(cline.split(",")[0])
    cch1t = float(cline.split(",")[1])
    cch1v = float(cline.split(",")[2])
    cch2t = float(cline.split(",")[3])
    cch2v = float(cline.split(",")[4])
    # If no change of state has occurred, add the values onto the end of the list
    if (cphase == 0 and cch1v < 2.5 and ch == 0) or (cphase ==1 and cch1v >= 2.5 and ch == 0) \
        or (cphase == 0 and cch2v < 2.5 and ch == 1) or (cphase ==1 and cch2v >= 2.5 and ch == 1) :
      rdgt.append(crdgt)
      ch1t.append(cch1t)
      ch1v.append(cch1v)
      ch2t.append(cch2t)
      ch2v.append(cch2v)
    else : # Change of state has occurred, collect data and (if appropriate) print result to file
      if cphase == 0 :
        if ch == 0 :
          supt = cch1t
          mdnt = min(ch1t)
        else :
          supt = cch2t
          mdnt = min(ch2t)
        cphase = 1
        if scycles > 0 :
          g.write(str(scycles)+","+str(cyclet)+","+str(supt)+","+str(mupt)+","+str(sdnt)+","+str(mdnt)+"\n")
        scycles = scycles + 1
      else :
        if ch == 0 :
          sdnt = cch1t
          mupt = max(ch1t)
        else :
          sdnt = cch2t
          mupt = max(ch2t)
        cphase = 0
        cyclet = crdgt-initt
        initt = crdgt
      # Clean-up lists to start a new cycle
      rdgt=[crdgt]
      ch1t=[cch1t]
      ch1v=[cch1v]
      ch2t=[cch2t]
      ch2v=[cch2v]
  f.close()
  g.close() # Close the output file too, so buffered results are written out
print("All Done!")

Of course, this is far from the final version of the program, but it serves to at least turn the large data dumps into summarised values which can be more easily handled by ordinary tools like Microsoft Excel. I anticipate the major change I will make for the final analysis is to also gather the average switch-closed voltage which may indicate trends in contact resistance over cycles, assuming the B&K Precision DAS240-BAT Multi-Channel Recorder is both sensitive and stable enough. If it is not, I still intend to deconstruct the test set-up and re-measure the contact resistance with the B&K Precision BA6010 Battery Analyser using the same settings that were used to establish the baseline contact resistance in Blog #2.
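
A minimal sketch of how that switch-closed voltage averaging might look is below. It is not part of the program above. It assumes the same five-column log format and treats readings below the 2.5V threshold used in the program as "switch closed". The function name closed_voltage_means and the sample lines are made up for illustration.

```python
def closed_voltage_means(lines, vcol=2, threshold=2.5):
    """Yield the mean voltage of each completed switch-closed interval.

    lines:     iterable of CSV log lines (timestamp,t1,v1,t2,v2)
    vcol:      column index of the voltage to analyse (2 = first unit)
    threshold: readings below this are treated as switch-closed (assumption)
    """
    total = 0.0
    count = 0
    for line in lines:
        volts = float(line.split(",")[vcol])
        if volts < threshold:
            # Running sum keeps memory use constant regardless of file size
            total += volts
            count += 1
        elif count:
            # Switch has opened: report the interval that just finished
            yield total / count
            total, count = 0.0, 0


# Made-up sample lines, not real experimental data
sample = [
    "1.0,45.0,0.010,55.0,5.0",
    "1.1,45.2,0.012,55.0,5.0",
    "1.2,45.9,5.000,55.0,5.0",
    "1.3,45.5,0.020,55.0,5.0",
    "1.4,45.1,5.000,55.0,5.0",
]
print(list(closed_voltage_means(sample)))
```

Because it is a generator that only keeps a running sum, an open file object can be passed in directly as lines without loading the whole log into memory. A closed interval still in progress at the end of the file is not reported.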

Interim Results

The data dump from the morning of the 11th February 2022 revealed that the tested OHD1-50B had accumulated 29,518 cycles while the tested M-TRS5-60B had accumulated 11,313 cycles in approximately nine-and-a-half days of continuous test time. In both cases, this is more cycles than an ordinary power relay would usually have in terms of electrical cycle rating (10,000) and represents good progress towards my goal of stressing the switches to a high number of cycles.

A typical cycle looks like the above when viewed up close. In the figure, I have highlighted the points where the switch transitions from one state to another and the measured temperature from the thermocouple. A few things become obvious – the thermal lag that I mentioned earlier is clearly seen and the length of the thermal lag is different on the heating and cool-down, as predicted. As a result, the “simple” measurement of the temperature at the point the thermal switch changes state is really not all that instructive, and in fact, results in a negative computed temperature differential (Td) or hysteresis. The actual switching temperature is somewhere in-between the temperature at the state change and the maximum or minimum for the cycle, although perhaps closer to the latter than the former. As a result, the data analysis will probably focus on the cycle temperature range (maximum minus minimum) as a proxy measurement for the temperature hysteresis.
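
To make the two differentials concrete, here is a small sketch that computes both from rows in the summary format written by the analysis program (the column names come from its header line). The two data rows are made-up illustrative values, not readings from the experiment, and taking the naive differential as up-switching minus down-switching temperature is my assumption for this example.

```python
import csv
import io

# Illustrative rows in the summary format written by the analysis program;
# the numbers are made up for demonstration, not taken from the experiment
summary = io.StringIO(
    "Cycle,CycleT,SwitchTUp,MaxTUp,SwitchTDown,MinTDown\n"
    "1,28.1,45.8,46.4,46.2,45.6\n"
    "2,27.9,45.7,46.4,46.1,45.6\n"
)


def differentials(row):
    """Return (naive switching-point differential, cycle range) for one row."""
    naive = float(row["SwitchTUp"]) - float(row["SwitchTDown"])
    cycle_range = float(row["MaxTUp"]) - float(row["MinTDown"])
    return naive, cycle_range


for row in csv.DictReader(summary):
    naive, cycle_range = differentials(row)
    print(f"Cycle {row['Cycle']}: naive Td = {naive:+.2f}, range = {cycle_range:.2f}")
```

With values shaped like these, the thermal lag makes the thermocouple read lower at the up-switching point than at the down-switching point, so the naive figure goes negative while the cycle range stays positive.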

OHD1-50B

The per-cycle temperature switching points and maximum/minimum temperatures are plotted in the above graph. Daily diurnal temperature variation is visible in the trend, which is not unexpected as that will influence the rate of heat loss from the experiment, which will most significantly impact the rate of cooling. There are also some fast temperature variations, especially in the earlier days, as I did not realise that the open window in the room allowed air to directly blow past the test setup, causing rapid convective cooling. Each of the temperatures measured appears quite stable, not varying more than about 1°C. For a switch with an absolute rating of 50°C, the measured switching at about 45-46°C seems a little low, but that is likely due to thermal lag. The measured cycle maximum ranged between about 46-47°C which is more reasonable, but this is still perhaps a little low because of a reason that is presented in the next subheading.

This is most clearly illustrated in the temperature differential plot. Both the naive switching-point and more representative cycle-range values are calculated and, despite the messier movements in the previous plot, this shows very stable performance, with variation that seems to be down at the level of the measurement noise. The readings were within about 0.25°C over time and remain stable, suggesting the switching point is very well controlled. The cycle range value is 0.8°C, which implies a very small hysteresis for this 50°C-rated switch in this application. This may be because of the relatively small body of the switch which probably does not have much in the way of thermal mass or thermal resistance from package to Thermolite material slowing down the response.

Cycle times averaged about 28 seconds. This is a reasonable amount of time and should ensure valid results. When testing relays, it is customary to have, at most, a cycle every two seconds.

M-TRS5-60B

The 60°C switch saw a very similar trend in results showing diurnal variation but overall stability in the long term with a relatively tight window for each type of reading remaining within 1°C or so. This is fascinating, especially considering the absolute accuracy of the thermocouple set-up is expected to be within about 1.5°C in part due to accuracy limitations of the thermocouple, measurement device and reference junction. The direct switching threshold was recorded at about 54-55°C, again a bit low for the switch, but the cycle maximum was a more reasonable 55-57°C. An additional reason is alluded to in the next sub-heading of this post.

Extracting the cycle temperature differential again makes it clear that the thermal switch is very consistent. Most values were within 0.25°C of the adjacent readings in the short term, with a cycle-range of about 2.8°C which is a larger hysteresis than the 50°C switch. This may be because of the larger thermal resistance and thermal mass from the larger package.

Cycle times averaged 73 seconds, which is noticeably slower. This is in part because the heating power is determined by the current limit, with the higher temperatures required also increasing the heat-loss rate, thus resulting in slower heat-up. The larger switch and more numerous resistors also potentially increase thermal mass, slowing temperature changes in both directions. As a result, the experiment is expected to have fewer cycles on this switch due to the slower cycle time in this configuration.
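
The extrapolation from cycle time to cycle count is simple arithmetic, sketched below using the two average cycle times reported above (28 seconds for the OHD1-50B and 73 seconds for the M-TRS5-60B). It assumes the cycle time stays steady, which the diurnal variation shows is only approximately true. The 30-day horizon is an arbitrary example, not a statement of the remaining test duration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def cycles_in(days, cycle_time_s):
    """Estimate cycles completed in a number of days at a steady cycle time."""
    return days * SECONDS_PER_DAY / cycle_time_s


# Average cycle times reported in the interim results
for name, cycle_time in (("OHD1-50B", 28.0), ("M-TRS5-60B", 73.0)):
    per_day = cycles_in(1, cycle_time)
    print(f"{name}: about {per_day:.0f} cycles/day, {cycles_in(30, cycle_time):.0f} in 30 days")
```

As a sanity check, nine-and-a-half days at these rates works out to within about 1% of the 29,518 and 11,313 cycles actually counted.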

Switching Temperature Deviation

One thing that may be immediately apparent is that the switching temperature seems a little lower than expected. Aside from the complications with temperature measurement noted in the first part of this blog, there is a second potential complication which I failed to spot until after the experiment had begun.

Remember when I replaced the tinned copper wire, which was not strong enough to bind the apparatus together, with steel fencing wire? It had entirely slipped my mind that the reed switch uses magnets, so having ferromagnetic material nearby will distort the magnetic field and affect the switching temperature.

The only reference I saw was in the TRS screw type datasheet, which I’ve adapted above, but it indicates that a 3.3mm thick sheet with zero distance is expected to pull the break-type sensor threshold temperature down by approximately 4°C. In my case, I may not be using such a thick sheet, but I do “wrap” the wire around the body of the switch three times, so a similar effect may be expected.

Unfortunately, having already set up the experiment, I did not feel it would be good to change the experiment now as that would severely limit the number of cycles achievable. While copper is not strong enough, I fear that plastic zip-ties would be affected by the heat, and I don’t have any other readily available options. Perhaps an aluminium clip or non-magnetic stainless-steel clip would be the best option if I were to redesign the experiment. However, if the alignment of the fencing wire bands is maintained perfectly through the thermal cycles, the effect on the switching temperature should be a constant. This cannot be guaranteed, as the thermal cycling also causes material expansion and contraction, which could lead to fatigue and a slight shift over time. I should still be able to achieve a result on durability, although the switching threshold stability may exhibit some variation due to confounding factors.

Conclusion

It seems that the infrastructure set up in Blog #3, while not perfect because of the unaccounted interference from the steel wire used to bind the set-up together, is working well to allow for the switches to be tortured without user intervention, logging data around the clock. This post explores the technicalities with interpreting this data and manages to “siphon” some of this data without disturbing the collection process. By doing so, I can demonstrate a simple bit of code that I use to deal with the deluge of data and summarise it into the most important values – the temperature at each switching transition, the maximum/minimum cycle temperatures, and cycle time. The output of this code is used to provide the interim results which already demonstrate a significant number of cycles have been accumulated without failure. The program may still be modified to summarise more data – for example, the switch-closed voltage which may indicate trends in contact resistance.

From here on in, the experiment will continue to run unattended, collecting more data for the final blog which will arrive just before the deadline to provide the maximum amount of cycle data for this design challenge. As a result, unless I think of other interesting experiments to conduct with the remaining (undamaged) thermal switches, there will probably not be any blogs posted in the following few weeks. This would be perfect timing for me as well, as I will be embarking on a major RoadTest review at around this time.

Until the next blog, feel free to leave a comment in case you have any questions or ideas!
(or read the blogs from other Experimenting with Thermal Switches Design Challenge participants)

Hope to see you in the final blog in mid-March where I will present the full results of the torture test and a summary of all the learnings accrued throughout this design challenge.

[The Torturing Thermal Switches Blog Series Index]

Top Comments

DAB over 3 years ago +1

Good test data. Given these are cut off switches, I think the characteristics are about what you should expect. I look forward to your torture data.
