### STP Bench Number 3 - HeartMath Data Update

This article flows from the initial Introduction to HeartMath’s initiatives, at Bench Number 3 in the STP (virtual) Lab.

It concerns data carefully obtained by a private individual, using his HeartMath emWave2. His goal was to see if it could be used to reduce his overanxious state.

An additional goal was to see if use of this device over time would be associated with an increase in "Average Coherence," a term defined by HeartMath, and the source of data considered here.

The source of the data is given here, as gathered and presented by its original author on his page.

I referred to him in the other article as "John Q. Public." While his actual name is still unknown to me, he presents himself as Drew (which may in fact be his school), and that is the name we will use here.

I have extracted Drew's data from his article and placed them (hopefully accurately) in an Excel spreadsheet (*.xlsx). You can download this Excel spreadsheet of Drew's data by clicking the link. The goal here is to analyze the data from, and for, Drew (he didn't ask for that, but this outcome of his work has been gratefully shared with him).

#### As I introduced in the other article ...

... just how much of his collection of gathered measurements do we need to analyze?

**All of it.** I say that because one is tempted to select "a little bit" of the data for analysis. Perhaps the first and last weeks of his 6-week daily trial of the HeartMath emWave2, to see if his "coherence" is improving. But would those weeks be a good choice? "How would you know?"

He does two trials per day for 6 weeks: 2 x 7 x 6 = 84 data points. He deletes one, a low value that he labels "dismal," leaving 83 data points. He had also started the trial over once, because his earpiece sensor stopped working and he had to buy another, and he discarded that first week's data. One learns never to discard data. It always has something to teach. If one views it as "a failure" and tosses it, one may be throwing away the most important part of what the data are trying to get through one's thick skull. So since he still has this data, I include it. (I make this data available in the article, as an Excel spreadsheet that you can download.) So n = 83 + 8 = 91 trials of the emWave2. Perhaps without knowing it, Drew's method also defined a hypothesis and a methods protocol.
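For those who like to check the arithmetic, the tally above can be reproduced in a few lines. This is purely a bookkeeping sketch of the counts described in the text:

```python
# Tally of Drew's emWave2 trials, as described above.
sessions_per_day = 2
days_per_week = 7
planned_weeks = 6

planned = sessions_per_day * days_per_week * planned_weeks  # 2 x 7 x 6 = 84
after_deletion = planned - 1                                # one "dismal" value dropped -> 83
week0_restored = 8                                          # 4 days x 2 sessions on the old sensor
n = after_deletion + week0_restored                         # 91 trials in all

print(planned, after_deletion, n)
```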

#### "Why do this?"

Drew was trying to define the effectiveness of a HeartMath device that he aimed to use to reduce anxiety. He did the first part: he gathered data. What he did miss was a usual part of the investigative process related to hypothesis testing: extracting from his observations what the numbers are trying to teach him.

Even without any analysis of his numbers, Drew marched boldly ahead to a few conclusions :

- "For me my best coherence came when I focused intensely on my heart, adjusted my breathing, and then added some sort of positive thinking and/or emotion. It was difficult to add the positive emotion while maintaining focus on your heart. Best results came when I had nearly 80% of my focus on my heart and a 20% (subtle) focus on positive thinking – which created positive emotion." Since John Q. did no data analysis, this is of course simply parroting back what HeartMath writes and advertises. "How would you know?"
- " I'm definitely going to keep using the emWave2 because it does help me relax and I have noticed bursts of creativity during my sessions. This may have something to do with my baseline level of arousal as well as brain waves…"
- "Funny enough I was getting more stressed by not being able to increase my HRV… go figure." - Does the HeartMath device, like a full eMailbox, become just another source of stress for some? Does the emphasis on gaining "awards" contribute to this?

So let's state the null hypothesis for Drew's study : **There is no difference in Heart Rate Variability as represented by a calculated "Average Coherence" using an emWave2 device, over 91 consecutive observations, performed with the intention of seeing "Average Coherence" improve with time.**

The complete data set is in the mentioned spreadsheet. Here's a sample of data entry and some summary results:

Two daily sessions provide values for "Average Coherence" (AC) from the emWave2's associated graphic software. These are averaged to give a daily Mean AC value, with standard deviation. Drew had planned a 6-week study, to which the initial Week 0 values were added back, because you don't throw away data.

As Drew explains on his site, after 8 sessions over 4 days, the emWave2's earpiece pulse sensor failed. He ordered a new one and started over. There was no difference in AC between the old sensor ("Week 0") and the new one when used in "Week 1." **"How would you know?"** Briefly, because the means and standard deviations for AC in these two weeks were not significantly different. That looks like this :

What does it mean? It says that the apparent difference (0.3) between a lower AC with the new sensor (1.9±0.624) and the old sensor (Mean AC = 2.2±0.866) may have appeared simply by chance. "How would you know?" Because a test called Student's t-test (two-tailed), used to compare these Means, says so! The probability of seeing an apparent difference this large by chance alone = 0.3572. Or, about 35 chances out of 100 that the apparent difference is due to chance. That high a value doesn't fly in medical research circles, where the accepted standard is 5 chances out of 100, or less (p<0.05). That level of significance can at times make one dismiss real events, but here is not the place to get into that. Conclusion: before the old sensor stopped working, it was probably working just fine, and not fading gradually towards its demise. In fact its values were the higher of the two sensors. Enough.
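As a sketch of how such a comparison can be run, here is the same style of test in Python, using SciPy's summary-statistics form of Student's t-test. The means and SDs are taken from the text above, but the per-week session counts (8 on the old sensor, 14 in a full week on the new one) are my assumptions, not stated values:

```python
from scipy.stats import ttest_ind_from_stats

# Means/SDs from the article; the session counts (nobs) are assumptions:
# 8 sessions on the old sensor ("Week 0"), 14 in a full week on the new one.
t, p = ttest_ind_from_stats(
    mean1=2.2, std1=0.866, nobs1=8,    # old sensor ("Week 0")
    mean2=1.9, std2=0.624, nobs2=14,   # new sensor ("Week 1")
    equal_var=True,                    # classic pooled-variance Student's t-test
)
print(f"t = {t:.3f}, two-tailed p = {p:.4f}")
```

With these assumed counts, p comes out well above 0.05, in the same neighborhood as the 0.3572 quoted above: no significant difference between the two sensors.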

#### Let's look at a quick plot of the data :

Which tells us what? The daily average for the two sessions shows a range for AC between about 0.9 and 3.4 during these 45 or so days. It appears that these values increase a bit as time passes. That R² value of 0.1191 suggests that about 11.9% of the variation seen in AC (again, that's "Average Coherence") is explained by the passage of time. That leaves 1 - 0.1191 = 0.8809, or 88%, of the apparent trend in need of explanation from some source or sources other than just the passage of time.
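For readers who want to see where such an R² comes from, here is a sketch using SciPy's linear regression. The daily values below are hypothetical stand-ins for Drew's actual series (which lives in the spreadsheet):

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical daily Mean AC values over a run of days, for illustration only;
# Drew's real numbers are in the downloadable spreadsheet.
days = np.arange(1, 16)
daily_mean_ac = np.array([1.2, 1.0, 1.5, 1.8, 1.4, 2.0, 1.7, 2.2,
                          1.9, 2.4, 2.1, 2.6, 2.3, 2.8, 2.5])

fit = linregress(days, daily_mean_ac)
r_squared = fit.rvalue ** 2  # fraction of AC variation "explained" by time
print(f"slope = {fit.slope:.3f} AC units/day, R² = {r_squared:.4f}")
```

A positive slope with a modest R², as here, is exactly the pattern described above: values creeping up, with most of the variation still unexplained by time alone.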

Let's move on to a more useful summarizing look :

What is this? Well, over 7 weeks (I'm counting the 4 days with the old sensor as "Week 0"), Mean "Average Coherence" (red) values seem to rise. This is especially true from Week 1 sessions to those of Week 5, with a drop-off at Week 6.

Below, the green bars are Standard Deviations. This just indicates how close all the values fall to the average or (better term) mean value. This value seems to become smaller with time in this case. This implies less variability in what Drew and his emWave2, working together, are doing. Here, the R² value at top right of 0.5015 informs us that 50% of the changes seen in the Weekly Mean AC values are explained by the passage of time from one week to the next.

Drew also recorded some (but not all; he got a bit lazy at the end) session durations. Overall, these look like this:

This lets one quickly and confidently say that sessions ranged from about 6 to 8 minutes. Two sessions of about 10 minutes are noted. Session length seems to decrease a bit with time. Here are more specific summary data.

Out of 91 sessions (or AC trials if you prefer that word), Drew measured the duration of 55.

They averaged 7.2 ± 1.1 minutes in length. That's 6.6 hours hooked up to his emWave2. It's actually more, since there were sessions where he did not specifically measure duration. (He mentions on his site that the later ones were longer, about 10 minutes each, since he thought that seemed to result in better (= higher) values for "Average Coherence.") Was Drew right? "How would you know?" Before answering, we also note that an estimated total time for his 91 sessions would be 10.9 hours hooked up to his emWave2.
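The time bookkeeping above is easy to verify with a trivial sketch:

```python
# Total time on the emWave2, from the summary figures in the text.
measured_sessions = 55
mean_duration_min = 7.2
total_sessions = 91

measured_hours = measured_sessions * mean_duration_min / 60   # 55 timed sessions
estimated_hours = total_sessions * mean_duration_min / 60     # all 91, assuming the same mean
print(round(measured_hours, 1), round(estimated_hours, 1))
```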

Here's a look at duration of one of Drew's emWave2 sessions, and the "Average Coherence" level obtained. This for the 55 sessions with duration data :

What is this suggesting? Well, Drew was dead wrong in his assumption. Longer sessions for Drew did not provide higher AC values. Let's look at the graph to extract its information. At the bottom (X-axis) : "Average Coherence"; at left (the Y-axis), Session Duration in Minutes. Look at the pattern of blue dots (data points). They do not form a shapeless cloud; they seem to cluster around a line. That suggests the two variables have a relationship with one another. Perhaps a correlation, which does not necessarily mean causation.

OK. So? Well, the higher the dots go in duration (left side of the blue dot cluster), the lower the "Coherence" obtained. And vice versa, of course. This means that during this experiment, at least for these 55 sessions, Drew had better "Average Coherence" with shorter sessions.

There is a dotted line and a red curve passing through the dots. These are trend lines (or curves). The dotted line has an equation associated with it (upper right), but rather than explaining that fully here, just notice that R² = 0.6109. As before, this suggests that 61% of the changes seen in "Average Coherence" are explained by changes in the duration of the session. That also suggests a need to look around for other variables that can explain the other 39%.

What about that Red Curve? Well, if the dotted line has an equation to express this relationship of Duration and "Average Coherence" obtained, that's just saying that there is a mathematical relationship that describes it. But could there be a better mathematical relationship than that defining a straight line? You bet. And in this case there is. It might be a logarithm, or a power curve, or an exponential curve, or, as in this case, a polynomial equation of order 2. Wow! Drew, just imagine that! The equation is given down below on the right in the graph. How do we know if this mathematical expression of the relationship is better than that for a straight line? Because R² is larger. R² for the same data now = 0.6683: 67% of the variation in the relationship between Session Duration and "Average Coherence" obtained is explained. Saying it differently, 67% of the variation seen in "Average Coherence" is explained by changes in Duration of Session in minutes. Wow!
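To illustrate the straight-line-versus-curve comparison, here is a sketch using NumPy's least-squares polynomial fit on hypothetical (duration, AC) pairs that mimic the direction of Drew's result (longer sessions, lower AC). Note that an order-2 polynomial can never have a lower R² than the straight line fitted to the same data, since the line is a special case of the curve:

```python
import numpy as np

# Hypothetical (duration, AC) pairs, standing in for Drew's 55 measured sessions.
duration = np.array([5.0, 5.5, 6.0, 6.2, 6.5, 7.0, 7.2, 7.5, 8.0, 8.5, 9.0, 10.0])
avg_coherence = np.array([3.2, 3.1, 2.9, 2.8, 2.6, 2.3, 2.2, 2.0, 1.8, 1.5, 1.3, 1.0])

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

# Degree-1 (straight line) and degree-2 (polynomial order 2) least-squares fits.
lin_pred = np.polyval(np.polyfit(duration, avg_coherence, 1), duration)
quad_pred = np.polyval(np.polyfit(duration, avg_coherence, 2), duration)

print(f"linear  R² = {r_squared(avg_coherence, lin_pred):.4f}")
print(f"order-2 R² = {r_squared(avg_coherence, quad_pred):.4f}")
```

On any data set, the order-2 R² will be at least as large as the linear one; the interesting question, as in Drew's case, is whether the improvement is big enough to matter.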

Did this relationship vary from Session 1 to Session 2 each day? Yup!

Here are Session 1 and Session 2 data separated:

Summary? : Session 2 data (blue lozenges) are a bit more scattered than those of Session 1, where a "tighter" relationship exists between Duration and "Average Coherence." Both sessions demonstrate that longer session times, at least in Drew's case (the only subject here), are associated with lower "Average Coherence." Again, the relationship is better captured by a curve than by a straight line. For Session 1, 74% of changes in "Average Coherence" are explained by changes in Duration; only 52% for Session 2, even with the polynomial equation.

Let's look again at weekly averages. Data presented in graph form can sometimes fool the eye into believing that something is more or less important.

Statistical methods help here.

First, as above :

Well? Less variability in the process as time passes (Standard Deviations going down), and "Average Coherence" going up from week 1 to 5, then a drop off. Or, ... not? "How would you know?"

A nice way to compare Means is with Student's t-test (not really invented by a guy named Student; "Student" was a pen name. Google it. It's an interesting story in humility).

Here are some results of this analysis :

#### "OK. What's it mean?"

Well, this is a way of answering the question: "How truly different are the Mean values for "Average Coherence" from one week to the next?"

Not all the weeks have been compared here, just those where the chance of finding such a difference seemed the highest (see graph above).

The difference between Week 0 (very beginning) and Week 6 (very end) has 76 chances out of 100 of having arisen by chance. Such a result would lead to the conclusion that there is no actual progression going on during the training. Maybe it was the Old Pulse Sensor after all.

Nevertheless, between Weeks 0 and 5, one is getting closer to our magical p<0.05 value that everybody uses as a cutoff of "significance." p=0.1673 means a 17% chance that it's happening by chance.

Let's do it all with the same New sensor that Drew purchased to replace the one that stopped working. OK. Between Week 1 (lowest) and Week 5 (highest), the difference in Mean "Average Coherence" has an associated p value = 0.0022. That's to say that there are about 2 chances in 1,000 that an apparent improvement in "Average Coherence" happened by chance. Meaning, it's probably real.

That does not mean that things can't change. By Week 6, compared with Week 1, the difference between the Weekly Means has dropped off. The chance that that difference appeared by chance is p = 0.1081, or an 11% chance that it's all a coincidence.

Now depending on one's style, training and setting, one could choose to react as follows:

- First way: "p=0.1081 is greater than 0.05, so we have to throw this one out. No significant difference in Means between Weeks 1 and 6. If anything was going on in Week 5, it seems to have disappeared by Week 6."
- Second way: "For a small study like this, a p value of 0.1 is trying to tell us something. Throwing it out as 'insignificant' may get us thinking down the wrong path."

I like the second way. It helps one avoid dismissing important truths.

#### Should Drew be disappointed?

He voiced concern when his "Average Coherence" level during one session was so low, that he just wrote "dismal," omitting the actual value!

But what is a "good" value? And for whom? Can one compare one's values with another's? Should one? What does that gain?

Coming back to Drew's data: his overall "Average Coherence" level in these 91 sessions was 2.323±0.69074.

Here are data from a second individual, an experienced Mindfulness Meditator. The more profoundly he meditates, the lower his scores go!

His values: in 29 sessions, an "Average Coherence" value of 1.95±0.60274.

Are the values different? How would you know?

Student's t-test: t=2.607, df=118, p=0.0103, a statistically significant difference. But how is it important?
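The summary statistics above are enough to reproduce this test. Here is a sketch using SciPy's summary-statistics form of the pooled-variance Student's t-test, plugging in the two means, SDs, and session counts from the text:

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics as given in the article.
t, p = ttest_ind_from_stats(
    mean1=2.323, std1=0.69074, nobs1=91,   # Drew, 91 sessions
    mean2=1.95,  std2=0.60274, nobs2=29,   # experienced meditator, 29 sessions
    equal_var=True,                        # pooled variance; df = 91 + 29 - 2 = 118
)
print(f"t = {t:.3f}, df = 118, two-tailed p = {p:.4f}")
```

Running this reproduces the t value quoted above (t ≈ 2.607), with p below the 0.05 cutoff.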

What could be different in two individuals that would have an effect on Heart Rate Variability (which is what is being measured here, after all)? Lots of things could be different. One that readily comes to mind is age. HRV decreases as age increases.

Drew mentioned that he had a hard time getting to and maintaining the "green light" state during sessions. (The device has small diodes that light red, blue, or green, merging from one color to the next. Red implies autonomic nervous system imbalance or sympathetic predominance (let's say, "stressed"), green implies a "coherent" state with more parasympathetic tone (let's say, "calm and coherent"), and blue is in between.)

So the Mindfulness Meditator stayed in the green "High Coherence Ratio" level 70±17.4% of the time, yet his "Average Coherence" value is lower than Drew's. And to add to the mystery, there is a strong correlation between "Average Coherence" and "High Coherence (green light) Ratio" levels, as seen below.
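For readers curious how such a correlation is quantified, here is a sketch using SciPy's Pearson correlation on hypothetical paired values (not the Meditator's actual sessions, which are not reproduced here):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical session pairs, for illustration: High Coherence (green light)
# Ratio (%) versus "Average Coherence" for the same sessions.
green_ratio = np.array([40, 45, 50, 55, 60, 65, 70, 75, 80, 85])
avg_coherence = np.array([1.1, 1.3, 1.4, 1.7, 1.8, 2.0, 2.1, 2.4, 2.5, 2.8])

r, p = pearsonr(green_ratio, avg_coherence)
print(f"Pearson r = {r:.3f}, p = {p:.4f}")
```

An r near 1 with a small p, as this made-up series produces, is what a "strong correlation" between the two measures looks like numerically.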

We'll leave that one for another day.

#### CONCLUSIONS :

- Even in Drew's small study, an evolution over time (6+ weeks) in the "Average Coherence" levels obtained seems in evidence, and these levels increased. The data support rejecting the null hypothesis as presented at the top of this page.
- It is interesting that in this investigation, longer session durations generated lower Mean "Average Coherence" values. Boredom?
- Why did levels drop off at Week 6? Is Week 6 actually different from Week 5? The table above shows a p value = 0.1677 for that difference between Week 5 and Week 6. For the reasons just given, I would say that, with only a 17% chance to the contrary, this drop-off is probably real. Boredom?
- Most importantly! **The "Conclusions" that Drew had drawn himself, based on seeing his raw data and on, perhaps, "experience" or "intuition," were dead wrong.** This is not to beat on Drew. But re-reading his "Conclusions" as he formulated them on his webpage adds interest. Drew's thoughts reflect what might be those of many others. They are not driven by basic analysis of his raw data. As he continues to use his emWave2, he will probably increase "Average Coherence" (whatever that actually represents is not perfectly clear, but the model of autonomic nervous system balance is frequently offered as an explanation). This AC increase may then drop off after a point. Perhaps a break in the action from time to time may be indicated, but these data don't clearly test that hypothesis.
- Clearly, if Drew got 350,000 of his friends to do some emWave2 as well, the value of "n" in these calculations, the number contributing to Mean values and Standard Deviations, would suddenly render most if not all of the differences observed by Drew "significant." Then again, those raw data values would no longer be identical in such a large study. An assumption, but a pretty safe one.
- The usual "end of paper" recommendations typically include changes to the study protocol to make its next "go 'round" more effective. Seeing what would have happened over 8 weeks, for instance. And of course, if one starts off measuring session durations, one doesn't quit halfway through. Quitting halfway through happens when one assumes one already knows what is going on. Usually wrong! "If we knew what we were doing, it wouldn't be research..."

So what could explain this? One explanation: Boredom. It is possible that longer sessions became boring, and that punching through to the end of the last week of Drew's investigation had the same effect, with "Average Coherence" dropping off. The reader is certainly free to suggest other explanations based on the data, or suggestions for further study.

#### One of Drew's conclusions, given at the top of this page, bears repeating :

- "Funny enough I was getting more stressed by not being able to increase my HRV… go figure." - Does the HeartMath device, like a full eMailbox, become just another source of stress for some, rather than an antidote?
- Does HeartMath's emphasis on gaining "awards," contribute to this? Does it foster unnecessary sentiments of competition? Yet, some in our culture, even when thinking they are doing something to "relax," may still believe they need this stimulus. That's quite cultural.
- "How would you know?" (to ask Deming's question as though he was still there to look over these data...).
- Drew is to be commended for his move from questioning to measuring. But who should be doing this measuring? The public? The buyer? (Not very practical if one has to buy a device to, in addition, learn whether it works as advertised or not.) Who should be presenting these types of results?

High "Achievement" levels, (seen below) result mostly from longer durations. I.e., more time "hooked up" to one's emWave2.... High "Achievement", and a mediocre level of "Coherence," is quite possible as one can see below.

"But what does it all mean?" - we will have to wait for another Lab Bench Update! Hopefully, today's Update contributed to shedding light as well as heat.

#### Thanks Drew !


- The STP Lab