It looks like you're new here. If you want to get involved, click one of these buttons!
Here we will pick apart the following paper:
Note: this was originally in the Climate networks thread, but I'm moving it here into its own space.
For any $f(t)$, denote the moving average over the past year by:
$$\langle f(t) \rangle = \frac{1}{365} \sum_{d = 0}^{364} f(t - d) $$ Let $i$ be a node in the El Niño basin, and $j$ be a node outside of it.
Let $t$ range over every tenth day in the time span from 1950 to 2011.
Let $T_k(t)$ be the daily atmospheric temperature anomalies (actual temperature value minus climatological average for each calendar day).
Define the time-delayed cross-covariance function by:
$$ C_{i,j}^{t}(-\tau) = \langle T_i(t) T_j(t - \tau) \rangle - \langle T_i(t) \rangle \langle T_j(t - \tau) \rangle $$ $$ C_{i,j}^{t}(\tau) = \langle T_i(t - \tau) T_j(t) \rangle - \langle T_i(t - \tau) \rangle \langle T_j(t) \rangle $$ They consider time lags $\tau$ between 0 and 200 d, where "a reliable estime of the backround noise level can be guaranteed."
Divide the cross-covariances by the standard deviations of $T_i$ and $T_j$ to obtain the cross-correlations.
Only temperature data from the past are considered when estimating the cross-correlation function at day $t$.
Comments
Next, for nodes $i$ and $j$, and for each time point $t$, the maximum, them mean and the SD around the mean are determined for $|C_{i,j}^t|$, as $\tau$ varies across its range.
Define the link strength $S_{i j}(t)$ as the difference between this maximum and mean value, divided by the SD.
They say:
Next, let the mean strength $S(t)$ of the dynamical teleconnections in the climate network be the average over all individual link strengths.
$$ C_{i,j}^{t}(\tau) \stackrel{?}{\neq}\langle (T_i(t - \tau) - \langle T_i(t - \tau) \rangle )(T_j(t) - \langle T_j(t) \rangle )\rangle$$ This first glance might be wrong but if this is a different definition on purpose, this might be mentioned in the text.
nad, I think it is delibrate: they don't want to use future data (which won't be known when making a prediction). Fig 4 in the supplemetary info shows a typical result.
Do you mean that they write:
but mean the cross-covariance?
I guess thats rather clearly just a typo. Unfortunately I am not so sure though in the above formula for the cross covariance wether this is a typo. It might be that they used what looks to me like "another" form of correlation. Whatever this means for the result.
nad, I thought you were concerned about the sign of tau, so I probably just confused you.
No. I interpreted this as a typo, i.e. instead of $$C_{i,j}^t(-\tau) $$ as David mentioned above they probably meant:
$$ C_{i,j}^{t-\tau}(-\tau)$$ I am concerned about their interpretation of correlation, it might not be what they originally wanted and not be what they think it represents.
I've got tau as just |tau| ie. abs but I haven't thought about it yet.
Thanks for moving this discussion here, David. I think this paper is simpler than the one you'd originally begun discussing, and "better" in the sense that it makes a prediction lots of people are interested in. We should improve on their methodology - ideally before September, when the El Niño may start!
David did not define $T_i(t)$ before saying:
I'm guessing $T_i(t)$ is the temperature at time $t$ at site $i$. Of course, I could look at the paper and remind myself, but maybe David or I should add the definition to his first post.
Why my confusion?
In our earlier discussion of Yamasaki et al's paper, we used a notation resembling $T_i(t)$ to denote the temperature at time $t$ at site $i$ minus the average temperature at that site on that day of the year, where we take an average over all years.
It seems both concepts can be useful.
It's the adjusted temperature. I added this quote from the paper the to the first message above:
Okay, great. So this paper and Yamasaki's seem to both use $T$ to mean 'adjusted' temperature rather than the actual temperature. The old paper used $\tilde{T}$ to mean the actual temperature, so perhaps we should follow suit, even though it galls me to use something other than plain old $T$ for temperature. The adjusted temperature seems more important here, so we want a nice short name for it.
I'm not sure what Nad is wondering about, but here's a guess.
She might be puzzled that we're calling
$$ \langle A B \rangle - \langle A \rangle \langle B \rangle $$ the covariance between $A$ and $B$. She might be more used to
$$ \langle A - \langle A \rangle \rangle \; \langle B - \langle B \rangle \rangle $$ However, these are equal! (Fun exercise.)
(Edit: there was a typo here.)
I thought they were called 'anomalies'. http://en.wikipedia.org/wiki/Instrumental_temperature_record
John wrote:
Sure, let's proceed along these lines.
Recap: $S(t)$ was defined as the mean of all the $S_{ij}(t)$, where $i$ is a node inside the El Niño basin, and $j$ is outside of it.
Hence, $S(t)$ is a measure of the "cooperativity" between the El Niño basin and its complement.
The claim made by the paper is that when $S(t)$ rises from below to cross a certain threshold, then that is a predictive signal that an El Niño will occur next year.
To determine the threshold, the divide the historical data into two periods: first a learning phase, and then a prediction phase.
The learning phase is used to find that value of the threshold which gives the "best" performance in terms of minimizing false alarm rate and maximizing the hit rate.
(Note these are two conflicting goals, so I there must be some subjective judgement about what gives the best performance, no?)
This threshold is then used in the prediction phase.
The following paper, by the same authors, uses this methodology on a larger data set, and states specific numerical results:
With the threshhold that they obtained during the learning phase, they report that in the testing phase, alarms were correct in 76% and the nonalarms in 86% if all cases.
Furthermore, based on current values of $S(t)$ between September 7 and September 17 of last year, their threshold predicts the return of El Niño in 2014 with a 3-in-4 likelihood.
They state that this is in contrast to the CPC/IRI consensus probabilistic ENSO forces yielding a 1-in-5 likelihood for an ENSO event next year, which increased to a 1-in-3 likelihood by November 2013.
Well, if it makes a testable prediction, which is borne out by empirical evidence, then it's worthy of investigation and further development.
Here are my reservations about it:
There's no trace of a plausible explanation for why cooperativity would foreshadow El Niño the next year. Of course, if it's a real empirical discovery, then the explanation could come later.
The result could be fluke resulting from an arbitrary conjecture whose parameters were optimized using too small a dataset. The training and predictions periods are each about thirty years long. I saw it stated that during the training period there were ten El Niño events. Can statistics be applied here, to help address this concern? (Though we'd have to view statistical reassurances also with a grain of salt.)
In other words, I'm concerned that this is like applying machine learning to the predictions of a set of Tarot cards.
No I think they are not equal. That is I haven't performed the calculation but I am pretty sure that they are not equal - I think they are only equal if you can treat the average as a number with respect to averaging, but given the above definition of an average, together with the time dependence this doesn't seem to be the case.
I don't think it's a fun exercise to keep track of summands. Moreover as said it's not fully clear which definition of an average of an average they use and maybe this equality is more or less just a typo and the computer programm uses the usual definition of covariance anyway so apart from the typo there would be no big problem, regardless whether the equality holds or not. Or they wanted to have a different definition of a correlation. Finally alone typing all steps of the calculation would probably take me at least half an hour. I don't have that much time for these kind of things and I mostly made a comment that I see a problem here because it could eventually be the case that they want to use their predictions for real life applications. But then I don't really know what happens in case of El Nino warnings.
You're not the only one feeling like that. See John's comment. A quote from the paper "To statistically validate our method, we have divided this time interval into two equal parts..." though I don't see any formal test of the success rate. The problem with any formal test is that you don't know how much they tweaked the method, consciously or subconsciously.
As I said in the other thread, it might be better to attempt continuous predictions of an El Niño index like NINO3.4. If one could get decent predictions of that at all times, not just during El Niño events, it would be much more convincing.
I don't think it is quite as bad as Tarot cards. The finding that S(t) tends to decrease during El Niño events seems pretty well supported, and while there is no explicit mechansim, it seems plausible that such a mechansim exists. El Niño events certainly stir things up. If S(t) decreases during El Niño events it must increase between them (assuming no long term trend in S(t)), so any measure of increase in S(t) (such as crossing a threshold) is likely to be positively correlated with future El Niño events. It might be like predicting an avalanche next week if the snow reaches a certain depth this week, in an area where avalanches occur randomly and about once every 4 weeks on average.
Since $ \langle (A+a) - \langle (A+a) \rangle \rangle = \langle A - \langle A \rangle \rangle $ do we care if $A$ is a temperature or a temperature anomaly?
Numerically, I think $ \langle A B \rangle - \langle A \rangle \langle B \rangle $ is a bad way to the computation if $A$ and $B$ tend to be big.
I wrote:
Nad wrote:
Whoops, you're right - I made a typo.
Obviously
$$ \langle A - \langle A \rangle \rangle \; \langle B - \langle B \rangle \rangle = 0$$ since
$$ \langle A - \langle A \rangle \rangle = \langle A \rangle - \langle A \rangle = 0 $$ using the principle that $\langle \langle X \rangle \rangle = \langle X \rangle$ - you can pull a number out of a mean.
I meant to write
$$ \langle (A - \langle A \rangle ) \; (B - \langle B \rangle) \rangle $$ This is what equals
$$ \langle A B \rangle - \langle A \rangle \langle B \rangle $$ Showing this really is a fun exercise:
$$ \langle (A - \langle A \rangle ) \; (B - \langle B \rangle) \rangle = \langle A B - \langle A \rangle B - A \langle B \rangle + \langle A \rangle \langle B \rangle \rangle $$ and $ \langle \langle X \rangle Y \rangle = \langle X \rangle \langle Y \rangle $ (you can pull a number out of mean) so two terms cancel and we're left with
$$ \langle A B \rangle - \langle A \rangle \langle B \rangle $$
I wrote:
Graham wrote:
I've seen 'anomaly' used for the mean temperature during a given year (or some such period) minus the mean temperature over a longer historical period.
The 'adjustment' we're talking about here is the mean temperature during a given day minus the mean of temperatures on the same day of the year over a longer historical period.
Maybe this is called an anomaly too - I don't know.
David wrote:
I agree that this is very much worth thinking about. But we shouldn't necessarily expect the explanation would appear in this particular paper.
It seems quite plausible to me that correlations between different locations increase as we approach a widespread event like an El Niño. In statistical mechanics we think a lot about 2-point functions - covariances between the value of some field $F$ at one point $i$ and another point $j$:
$$ C_{i,j} = \langle F_i F_j \rangle - \langle F_i \rangle \langle F_j \rangle $$ 2-point functions typically decay exponentially as the distance between the points $i$ and $j$ increases. However, as our system approaches a phase transition, e.g. as a solid approaches its melting point, its 2-point functions decay more slowly, and right at the phase transition they often show power-law decay.
In other words: when something dramatic is on the brink of happening, the system displays a lot of correlation between distant locations.
Does the start of an El Niño act in this way? That seems like a good question.
The network guys seem to be hoping something like this is true. But unlike people in stat mech, they're studying the 2-point function by drawing a graph where there's an edge between any two sites where the 2-point function is "big" (according to some specific criterion). They're hoping this graph will change character right before the start of an El Niño.
A fun alternative would be to directly study the 2-point function, see how it decays with distance, and see if the approach of an El Niño changes the decay rate.
Now by 2-point function I either mean the time-delayed cross-covariance function David mentioned:
$$ C_{i,j}^{t}(-\tau) = \langle T_i(t) T_j(t - \tau) \rangle - \langle T_i(t) \rangle \langle T_j(t - \tau) \rangle $$ or else, more simply, the version with no time delay:
$$ C_{i,j}^{t} = \langle T_i(t) T_j(t) \rangle - \langle T_i(t) \rangle \langle T_j(t) \rangle $$
John said
From the paper:
It seems to be a word that is used quire freely (and ambiguously).
Okay - anomaly it is!
This has been done in
I mentioned this in the climate network thread, which I think is a better palce to take this further. But briefly, it seems like the weather always follows a power law.
Graham wrote:
Yes, that would be very good. Note that since this index is defined in terms of sea surface temperatures, we can think of the challenge this way: use current sea surface temperatures to predict aspects of future sea surface temperatures. Of course there's no need to restrict ourselves to using current sea surface temperatures - we could use the Dow Jones industrial average if that helped. But using just sea surface temperatures makes it into an interesting self-contained game: "how much do sea surface temperatures now know about future sea surface temperatures"?
John wrote:
Yep, but I was assuming that you meant:
$$ \langle (A - \langle A \rangle ) \; (B - \langle B \rangle) \rangle $$ since you where talking about covariance. And so - I still think that it seems that for this case here:
$$ \langle (A - \langle A \rangle ) \; (B - \langle B \rangle) \rangle \neq \langle A B \rangle - \langle A \rangle \langle B \rangle $$
As I wrote above, exactly this principle doesn't hold if you follow the definitions in the article and thats why I strongly suspect that the two terms are not equal for this case here or that the definition is different. (?)
David wrote:
I agree the dataset seem too small for statistics. But it might be in principle be sufficient for determining a threshold, like as in the case that if you notice that if your car starts to sideslip in a certain curve at a certain velocity than you may not want to wait for a sample rate of 1000. In particular I have the slight suspicion - but this is still rather speculative - that the El Nino may be at least partially due to some kind of regular biannual temperature change, which is due to celestial mechanics. (Where I have to say that I have no idea where the biannual cycle should come from (something moon orbit and sun orbit ?) and that this suspicion is sofar quite on shaky grounds, in particular I looked only at a rather short temperature sample.) So it would be some kind of stochastic resonance/more-or-less-regular-reoccurring thing.
John said
Hmmm. Now I'm beginning to go off the way the authors use this word. Ref 33 is also their work. I think "seasonally adjusted" would be a better term for what they doing: "actual temperature value minus climatological average for each calendar day".
There is a positive correlation between temperatures in Turkey during March to May in 1960 and those in New Zealand during September to November in 1990. I haven't checked, but I know, because it was spring in both cases. Obviously, this is not of interest for predicting El Nino events. At the moment I'm leaning towards using absolute temperatures (in Kelvin, no messing about) and talking about "seasonally adjusted correlations/covariances".
Okay, seasonally adjusted temperature seems a much clearer term for "actual temperature at some location and some day value minus the average over years of the temperature at that location on that day of the year."
John said
It was in this spirit that I made this image earlier today.
The box is roughly the area for defining the NINO3.4 index. It is not hard to see that 1958 is warmer in the box than 1957. The tricky question is: can you see it coming? You could use more data: further back into 1956 or earlier; you could look at days, not ten-day averages; and you could use finer spatial resolution (the NOAA data is one point per 2.5 degrees, these images and the Ludescher paper use 7.5).
John wrote:
Wouldn't you need some more homogenous image for that? Unfortunately El Nino doesn't look very scale invariant.
I think their description is better. Finally seasonally adjusted could e.g. also mean "actual temperature at some location and some day value minus the average over years of the temperature at that location on that season of the year.”
Did you read my comment ?
Graham wrote:
Is this black frame for NINO3.4 index ? By the way are the red dots their sample points?
I find actually the region right in front of the coast of South America (middle right side of image) rather significant.
My black frame is supposed to be like their red frame, but I'm not sure I got it exactly right. The red dots are what they call the "El Nino basin". They use all the blue circles as sample points, but divide into those in and outside the "basin".
Nad wrote:
Right.
For those who don't know what Nad and I are talking about: in a 2nd-order phase transition (like the point where the difference between the liquid and gas phases of water ceases to exist), a physical system becomes scale-invariant. This is reflected in the fact that its 2-point functions obey a power law. Physicists love to think about the situations, mainly because the math is tractable.
I don't expect that the sea surface temperature becomes scale-invariant at the onset of El Niño, or that the 2-point functions of the sea surface temperature obeys a power law. I merely expect that the 2-point function decays more slowly at the onset of El Niño than at other times. I would like us to take a look and see if this is true. I'll propose a specific programming challenge for this.
I wrote:
Nad wrote:
Misunderstanding. I meant "seasonally adjusted temperature" is much clearer than what Graham had initially proposed: "temperature anomaly". Obviously “actual temperature at some location and some day value minus the average over years of the temperature at that location on that day of the year" is even clearer, but it's too long to say over and over. We need a shorter term, which we will precisely define at the beginning of any article about this stuff.
Nad wrote:
Yes. I agree that subtleties arise in situations where we have two concepts of mean (like mean over years and mean over days in the year), or one concept of mean that no longer obeys $\langle \langle X \rangle Y \rangle = \langle X \rangle \langle Y \rangle $ (like "mean over the last 365 days"). In any work we do, we'll have to be careful about these issues. You are right about that.
I was merely asserting that
$$ \langle (A - \langle A \rangle ) (B - \langle B \rangle ) \rangle = \langle A B \rangle - \langle A \rangle \langle B \rangle $$ in the usual context of a mean that obeys
$$\langle \langle X \rangle Y \rangle = \langle X \rangle \langle Y \rangle $$
Some ref. I can't immediately find says the ONI3.4 criterion, where I've just discoverd that 3.4 is the area of the pacific, defines its average wrt the 0.5 deg threshold as a standard 3-month moving average over 2 moves ie. 5 months. Correction: the ONI say for "5 overlapping seasons".
Good, we should learn this stuff. This webpage:
says:
The Niño 3.4 region is the rectangular region surrounded by thick lines here:
See the publications of Tim DelSole for some literature on the predictability of sea surface temperature; his "predictable component analysis" may be of some interest. You may also be interested in this review paper on ENSO predictability and this paper on information-theoretic measures of ENSO predictability. There is a fairly substantial literature on this subject applied to the predictability of particular modes of SST variability, not just ENSO.
John said:
I've been reading about the Walker circulation, where it says "When the Walker circulation weakens or reverses, an El Niño results". Does that sort of change - a change in the number of convection cells in a volume of fluid - count as a 2nd-order phase transition?
Graham wrote:
Right! When this westward-blowing wind slows down, hot water that has piled up on the west side of the Pacific starts moving east, in blobs. Apparently if enough of these blobs move east we get a El Niño. You can see several of them moving east this year:
Everyone should watch this nice short movie!
I don't know. I'd be a bit surprised if it did. 2-point functions go wild during phase transitions, but during a 2nd-order phase transition they become scale-invariant.
The melting of ice is a first-order phase transition: right at the brink of melting we expect long-range correlations between whether ice is melted here and whether it's melted somewhere else. The point where the difference between the liquid and gas phases ceases to exist is a second-order phase transition. Near this point we see droplets of water floating in water vapor, and the situation becomes closer and closer to scale-invariant near the critical point, meaning that we see droplets of all sizes. This is visible as critical opalescence.
If a second-order phase transition were occurring near the onset of an El Niño, we'd see some scale-invariant phenomena. For example: blobs of warm water of all sizes, big and small, starting to move east. Or something like that. 2-point functions would obey power laws.
This would be fun to look for, but I'm not betting on it. I'm just hoping that that some sort of longer-range correlations arise, as the whole system "comes to a consensus" about whether there will be an El Nño.
Actually you gave me a fun idea, Graham! In a sandpile when the sand is at the critical angle of repose, as steep as possible, small landslides occur... and at least in theoretical models, these landslides are roughly scale-invariant: there are small ones and big ones and bigger ones, with the frequency of a landslide of size $x$ being $\propto x^{-p}$ for some power $p$. Under some conditions sand naturally organizes itself into dunes that are near the critical angle of repose: this is called self-organized criticality. The idea is that this system naturally has a second-order phase transition as some sort of attractor.
Maybe Pacific warm water that's just about ready to slosh back east is a bit like a sandpile at its critical angle of repose! If so, there might be a second-order phase transition here.
I feel this idea is a overly naive, but it might have some merit, or lead to some better ideas.
I mentioned avalanches in comment 23, and I did (vaguely) know about the power law thing. If David Tweed was still around he might calm us down. Also I liked the paper Nathan pointed to: a review paper on ENSO predictability.
Thanks for reminding me about that review paper.
I'm less interested in joining the power law hype than in seeing how 2-point functions change with time... and also location! To be really scientific, it would be good at some point to compare how things work in and near the El Niño basin to how they work somewhere else. But you're doing so much great stuff that at this point I merely need to catch up, not invent new sub-projects.
I was thinking along those lines too. One important thing that stands out in the "Covariance maps 1951-1979" image is that green turns into red about 3 pixels away from the basin. I don't know if that is special to that region. If ENSO is like a see-saw, then maybe the green-red border is like the pivot point. But it could be a general thing that happens over oceanic regions, or eveywhere.
I posted this to the discussion on El Nino blog post 6.
Ludescher et al 2014 now claims their method works 1 year in advance. Can somebody get this paper?
I don’t think I’ve seen:
Ludescher, J., Gozolchiani, A., Bogachev, M. I., Bunde, A., Havlin, S., and Schellnhuber, H. J. (2014). Very Early Warning of Next El Niño, PNAS 111, 2064 (doi/10.1073/pnas.1323058111)
http://www.pnas.org/content/111/6/2064.abstract
Abstract The most important driver of climate variability is the El Niño Southern Oscillation, which can trigger disasters in various parts of the globe. Despite its importance, conventional forecasting is still limited to 6 mo ahead. Recently, we developed an approach based on network analysis, which allows projection of an El Niño event about 1 y ahead. Here we show that our method correctly predicted the absence of El Niño events in 2012 and 2013 and now announce that our approach indicated (in September 2013 already) the return of El Niño in late 2014 with a 3-in-4 likelihood. We also discuss the relevance of the next El Niño to the question of global warming and the present hiatus in the global mean surface temperature.
I see no reason not to email one of the authors (who include H.J.Schnellhuber, founding director of PIK) with any questions or criticisms before publishing.
Jim, we've been discussing both papers by Ludescher et al for a while. Seehttp://johncarlosbaez.wordpress.com/2014/07/01/el-nino-project-part-3/
My bad. I'd never noticed they were forecasting a year ahead so this wasn't a new paper. Sorry for the noise. I'll go back to the beginning.
Here are some background, nearly all linked, papers and abstracts related to the Ludescher paper from Yamasaki, Gozolchiani and some others in their group. Whether any of these papers should be cited on the Azimuth wiki or blog I don't know, I'm only just reading them.
Yamasaki publications not online:
The Cooperation of two Time-scale's Adaptation in Dynamical Environment, Proceedings of the First International Conference on Information pp.71-72 , K.Yamasaki、M.Sekiguchi , 2000
Kazuo Yamasaki has also written a number of papers is vegetation land cover es timation using neural networks.