#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

# Experiments in Markov chains

I have not been following the blogs on network theory carefully. However I just saw that in Experiments in Markov chains John is doing something which looks (from a suitable distance) very like Bayesian phylogenetic analysis using a Markov Chain Monte Carlo (MCMC) technique.

There are 15 binary trees (with labelled tips) which represent the possible evolutionary relationships between 4 species. The MCMC technique uses (a deliberately designed) Markov Chain which visits these. One possible set of transitions is given by 'nearest neighbour interchanges'. I have added a picture at the bottom of the page showing the resulting graph of transitions.

• Options
1.
edited April 2011

Another vaguely connected thing is that the, for a directed acyclic graph (which occurs when the nodes come from a partial order, and probably just needs to be a preorder) the diagram of subsets (as shown on that page) is the Hasse diagram. The diagram tends to be more interesting when some of the elements aren't linked by edges so there's more structure in the Hasse diagram. There are some interesting algorithms one can do on the Hasse diagram, although the size blows up to the unusable very quickly. One area where this comes up is in "incomplete preferences" and to a lesser extent task execution order planning.

Comment Source:Another vaguely connected thing is that the, for a directed acyclic graph (which occurs when the nodes come from a _partial order_, and probably just needs to be a _preorder_) the diagram of subsets (as shown on that page) is the [Hasse diagram](http://en.wikipedia.org/wiki/Hasse_diagram). The diagram tends to be more interesting when some of the elements aren't linked by edges so there's more structure in the Hasse diagram. There are some interesting algorithms one can do on the Hasse diagram, although the size blows up to the unusable very quickly. One area where this comes up is in "incomplete preferences" and to a lesser extent task execution order planning.
• Options
2.
edited April 2011

Right now I'm fascinated by a specific Markov chain describing the ways a molecule with this geometry can rearrange itself:

The transition graph is highly symmetric:

This will eventually become a blog post in my networks series, just a fun example of the general theory.

I'm definitely looking for more practical examples from a wide range of subjects - so thanks, Graham!

David wrote:

Another vaguely connected thing is that the, for a directed acyclic graph (which occurs when the nodes come from a partial order, and probably just needs to be a preorder) the diagram of subsets (as shown on that page) is the Hasse diagram.

True! And the Desargues graph is the Hasse diagram coming from just the 2- and 3-element subsets of a 5-order set, partially ordered by inclusion. You can also see it as a Kneser graph.

There's an absurd amount of beautiful mathematical structure in this example - it's even related to grand unified theories in particle physics. I've been trying to get away from overly cute stuff like this, trying to plunge myself into the messy muck of reality, but I'm having some trouble: abstract math keeps chasing after me. In fact, the graph Graham uploaded is related to the 'pentagon identity' in category theory... pretty scary.

Comment Source:Right now I'm fascinated by a specific Markov chain describing the ways a molecule with this geometry can rearrange itself: <img src = "http://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Trigonal-bipyramidal-3D-balls.png/200px-Trigonal-bipyramidal-3D-balls.png" alt = ""/> The transition graph is highly symmetric: <img src = "http://upload.wikimedia.org/wikipedia/commons/thumb/2/2e/DesarguesGraph.svg/300px-DesarguesGraph.svg.png" alt = ""/> This will eventually become a blog post in my networks series, just a fun example of the general theory. I'm definitely looking for more practical examples from a wide range of subjects - so thanks, Graham! David wrote: > Another vaguely connected thing is that the, for a directed acyclic graph (which occurs when the nodes come from a partial order, and probably just needs to be a preorder) the diagram of subsets (as shown on that page) is the Hasse diagram. True! And the Desargues graph is the Hasse diagram coming from just the 2- and 3-element subsets of a 5-order set, partially ordered by inclusion. You can also see it as a [Kneser graph](http://en.wikipedia.org/wiki/Kneser_graph). There's an absurd amount of beautiful mathematical structure in this example - it's even related to grand unified theories in particle physics. I've been trying to get away from overly cute stuff like this, trying to plunge myself into the messy muck of reality, but I'm having some trouble: abstract math keeps chasing after me. In fact, the graph Graham uploaded is related to the 'pentagon identity' in category theory... pretty scary. <img width = "300" src = "http://www.londonmet.ac.uk/library/x86175_8.jpg" alt = ""/>
• Options
3.
edited April 2011

By the way, Graham, I don't know why there are just 15 trees in your picture.

If I keep the 4 species at the tips of the trees in the same order, I get just 5 trees - those are the corners of the pentagon shown above. This is an example of an 'associahedron' . Mathematicians also study it when we replace the number 4 with some other number. Here's the associahedron for trees with 5 tips:

If I allow the 4 species at the tips of the trees to be permuted arbitrarily, I get 5 × 4! = 120 trees. These are the corners of a shape that's an example of a 'permutoassociahedron'.

But you're just allowing 3 permutations, for a total of 5 × 3 = 15 trees.

Oh, I should probably be able to guess this - you're not really working with planar trees, for one thing.

Do you know if biologists have pondered the general situation where the 4 species are replaced by a larger number of species? For mathematicians, this would be a lot of fun: associahedra and permutoassociahedra and their ilk are really fascinating.

Comment Source:By the way, Graham, I don't know why there are just 15 trees in your picture. If I keep the 4 species at the tips of the trees in the same order, I get just 5 trees - those are the corners of the pentagon shown above. This is an example of an 'associahedron' . Mathematicians also study it when we replace the number 4 with some other number. Here's the associahedron for trees with 5 tips: <img src = "http://faculty.tnstate.edu/sforcey/tamari_sm.gif" alt = ""/> If I allow the 4 species at the tips of the trees to be permuted arbitrarily, I get 5 &times; 4! = 120 trees. These are the corners of a shape that's an example of a 'permutoassociahedron'. But you're just allowing 3 permutations, for a total of 5 &times; 3 = 15 trees. Oh, I should probably be able to guess this - you're not really working with planar trees, for one thing. Do you know if biologists have pondered the general situation where the 4 species are replaced by a larger number of species? For mathematicians, this would be a lot of fun: associahedra and permutoassociahedra and their ilk are really fascinating.
• Options
4.
edited April 2011

I presume the biologically interesting event is the "thing speciates into X and Y", and in particular I'd imagine there's no difference between that and the event "thing speciates into Y and X" (ie, the internal nodes are "commutative") so some of the 4!=24 permutations are equivalent. I can't immediately think of the way to count those for the 4 element case.

Ok, here's an attempt. There are three "fully symmetrical" trees, which up to equivalences can be generated by "what thing is the closest relative of the tomato" (3 choices), since the unordred pair that goes on the other tips is then determined. Then there's the unbalanced trees, which can be generated by specifying what goes on the two "longest" branches (4 times 3 = 12 choices) since the unordered pair that goes on the final split is then determined. So that gives 15 possibilities.

Comment Source:I presume the biologically interesting event is the "thing speciates into X and Y", and in particular I'd imagine there's no difference between that and the event "thing speciates into Y and X" (ie, the internal nodes are "commutative") so some of the 4!=24 permutations are equivalent. I can't immediately think of the way to count those for the 4 element case. Ok, here's an attempt. There are three "fully symmetrical" trees, which up to equivalences can be generated by "what thing is the closest relative of the tomato" (3 choices), since the unordred pair that goes on the other tips is then determined. Then there's the unbalanced trees, which can be generated by specifying what goes on the two "longest" branches (4 times 3 = 12 choices) since the unordered pair that goes on the final split is then determined. So that gives 15 possibilities.
• Options
5.
edited April 2011

There are a lot of different ways of labelling (and therefore counting) binary trees. For biologists, what matters is the evolutionary relationships. So they don't care about left-right flips of the two subtrees at any node. There are internal three nodes in a tree with 4 tips, so by orienting the tree (giving left, right labels to the subtrees at all nodes) you could turn my 15 trees into $15*2*2*2 = 120$.

Another way of counting trees the biologists' way is to count the number of ways you can combine n things using a binary operation which is commutative but not associative. You can express evolutionary relationships in text like this:

A,(B,(C,D)) = A,(B,(D,C)) = (B,(D,C)),A = ...

(AB),(CD) = (DC),(BA) = ...

Biologists usually work with much more than 4 species. 50 might be typical. I can tell you the number of trees on n tips as biologists count them: $3*5*7*...*(2n-3)$

I can tell you that the nearest neighbour interchange allows you to get from any tree to any other for any n. I don't remember reading about associahedra and permutoassociahedra (or know what they are) but probably someone has done work on them.

I hope I've got all that right, it is easy to get confused. This is a nice article on the maths of biological trees, which I consult when I feel confused:

http://arxiv.org/abs/0803.0153

I also recommend this book: "Inferring phylogenies" by Felsenstein.

Comment Source:There are a lot of different ways of labelling (and therefore counting) binary trees. For biologists, what matters is the evolutionary relationships. So they don't care about left-right flips of the two subtrees at any node. There are internal three nodes in a tree with 4 tips, so by orienting the tree (giving left, right labels to the subtrees at all nodes) you could turn my 15 trees into $15*2*2*2 = 120$. Another way of counting trees the biologists' way is to count the number of ways you can combine n things using a binary operation which is commutative but not associative. You can express evolutionary relationships in text like this: A,(B,(C,D)) = A,(B,(D,C)) = (B,(D,C)),A = ... (AB),(CD) = (DC),(BA) = ... Biologists usually work with much more than 4 species. 50 might be typical. I can tell you the number of trees on n tips as biologists count them: $3*5*7*...*(2n-3)$ I can tell you that the nearest neighbour interchange allows you to get from any tree to any other for any n. I don't remember reading about associahedra and permutoassociahedra (or know what they are) but probably someone has done work on them. I hope I've got all that right, it is easy to get confused. This is a nice article on the maths of biological trees, which I consult when I feel confused: [http://arxiv.org/abs/0803.0153](http://arxiv.org/abs/0803.0153) I also recommend this book: "Inferring phylogenies" by Felsenstein.
• Options
6.

David,

Glad you noticed the balanced and unbalanced distinction! Your counting looks OK too.

Among the 15 trees here, 12 are unbalanced (3:1 or 1:3 split at the root, where I've now started distinguishing left and right), 3 are balanced (2:2 split at the root). So 80% are unbalanced. If you generate trees using a Yule or standard birth-death process, you find that you get 66.7% unbalanced. 3:1, 2:2, 1:3 are all equally likely.

What about real phylogenetic trees? Turns out to be somewhere between, maybe around 75%. Finding a nice mathematical model which make the right amount of unbalance is a problem that has been bothering me for the last couple if years...

Comment Source:David, Glad you noticed the balanced and unbalanced distinction! Your counting looks OK too. Among the 15 trees here, 12 are unbalanced (3:1 or 1:3 split at the root, where I've now started distinguishing left and right), 3 are balanced (2:2 split at the root). So 80% are unbalanced. If you generate trees using a Yule or standard birth-death process, you find that you get 66.7% unbalanced. 3:1, 2:2, 1:3 are all equally likely. What about real phylogenetic trees? Turns out to be somewhere between, maybe around 75%. Finding a nice mathematical model which make the right amount of unbalance is a problem that has been bothering me for the last couple if years...
• Options
7.
edited April 2011

Thanks, David and Graham!

Another way of counting trees the biologists' way is to count the number of ways you can combine n things using a binary operation which is commutative but not associative.

Okay, got it. Makes perfect sense. I'm sure pure mathematicians must have pondered this, but I mainly know other variants, which have been incredibly important in my work...

I don't remember reading about associahedra ...

The vertices of the $n$th associahedron are the ways of combining $n$ things written out in order using an operation that's not associative. (And not commutative, but that doesn't even come into play.)

So, the 4th associahedron has 5 vertices:

$A(B(C D)), A((B C)D), (A B)(C D), (A(B C))D, ((A B)C)D$

which correspond to the trees at the corners of this pentagon:

Ignore the trees at the edges on the edges and tree in the middle, unless you're interested in species that can speciate 3 or 4 at at time! These fancier trees are very important in the associahedron game, but I'm only telling you about how the vertices work.

The 5th associahedron has 14 vertices, like this:

The next one has 42, and so on - these are the "Catalan numbers", which show up all over the place in math. They're great, and I could talk your ear off about them...

But anyway, this stuff is largely irrelevant to phylogenetic trees, except that they're both special case of some bigger theory I only partially understand.

I'll try to learn a bit about the case that matters here.

Comment Source:Thanks, David and Graham! > Another way of counting trees the biologists' way is to count the number of ways you can combine n things using a binary operation which is commutative but not associative. Okay, got it. Makes perfect sense. I'm sure pure mathematicians must have pondered this, but I mainly know other variants, which have been incredibly important in my work... > I don't remember reading about associahedra ... The vertices of the $n$th associahedron are the ways of combining $n$ things _written out in order_ using an operation that's not associative. (And not commutative, but that doesn't even come into play.) So, the 4th associahedron has 5 vertices: $A(B(C D)), A((B C)D), (A B)(C D), (A(B C))D, ((A B)C)D$ which correspond to the trees at the corners of this pentagon: <img width = "400" src = "http://www.londonmet.ac.uk/library/x86175_8.jpg" alt = ""/> Ignore the trees at the edges on the edges and tree in the middle, unless you're interested in species that can speciate 3 or 4 at at time! These fancier trees are very important in the associahedron game, but I'm only telling you about how the vertices work. The 5th associahedron has 14 vertices, like this: <img width = "400" src = "http://faculty.tnstate.edu/sforcey/tamari_sm.gif" alt = ""/> The next one has 42, and so on - these are the "[Catalan numbers](http://en.wikipedia.org/wiki/Catalan_number)", which show up all over the place in math. They're great, and I could talk your ear off about them... But anyway, this stuff is largely irrelevant to phylogenetic trees, except that they're both special case of some bigger theory I only partially understand. I'll try to learn a bit about the case that matters here.
• Options
8.

I have bumped into the Catalan numbers several times. The ways of counting trees are related like

$3.5.7 ... (2n-3) 2^{n-1} = C_{n-1} n!$

"evolutionary trees when oriented = associahedron vertices when tip labels are permuted."

Comment Source:I have bumped into the Catalan numbers several times. The ways of counting trees are related like $3.5.7 ... (2n-3) 2^{n-1} = C_{n-1} n!$ "evolutionary trees when oriented = associahedron vertices when tip labels are permuted."
• Options
9.
edited April 2011

That's very helpful. I'm not sure what "oriented" means here, though.

Comment Source:That's very helpful. I'm not sure what "oriented" means here, though.
• Options
10.

An evolutionary tree like A,(B,(C,D)) is the same thing when reflected about each comma, so

A,(B,(C,D)) = A,(B,(D,C)) = (B,(D,C)),A = ...

'Orienting' means specifying a particular choice for each reflection. n species means n-1 commas so $2^{n-1}$ choices.

Maybe it is useful to think of evolutionary trees as trees in the sense of graph theory: an evolutionary tree is a connected acyclic graph in which every vertex has degree one or three, the vertices of degree one are labelled, and one of them is distinguished as the root.

Are you going to say 'forgetful functor' soon?

Comment Source:An evolutionary tree like A,(B,(C,D)) is the same thing when reflected about each comma, so A,(B,(C,D)) = A,(B,(D,C)) = (B,(D,C)),A = ... 'Orienting' means specifying a particular choice for each reflection. n species means n-1 commas so $2^{n-1}$ choices. Maybe it is useful to think of evolutionary trees as trees in the sense of graph theory: an evolutionary tree is a connected acyclic graph in which every vertex has degree one or three, the vertices of degree one are labelled, and one of them is distinguished as the root. Are you going to say 'forgetful functor' soon?
• Options
11.

I will try to restrain myself.

Thanks, I think I almost get it... but now I can't think out loud without breaking my pledge.

Comment Source:I will try to restrain myself. Thanks, I think I almost get it... but now I can't think out loud without breaking my pledge.
• Options
12.

There is a Sage graph tutorial which shows the breath of Sage support for both graphs and what you can do with them

I intended my next Sage tutorial to be about graphs and I've started it, but if you want me to address some particular topic more in the tutorial ill listen

Comment Source:There is a [Sage graph tutorial](http://www-sop.inria.fr/members/Nathann.Cohen/tut/Graphs/) which shows the breath of Sage support for both graphs and what you can do with them I intended my next Sage tutorial to be about graphs and I've started it, but if you want me to address some particular topic more in the tutorial ill listen
• Options
13.

Staffan,

The R code below makes a 15 by 15 matrix which is the adjacency matrix of the graph of evolutionary trees. Perhaps you can make it into a graph in Sage, and do something with it, eg use g.automorphism_group.

z <- matrix(0, nrow=5, ncol=5)
a <- z
b <- z
for (r in 1:5) {
for ( c in 1:5) {
d <- (r-c) %% 5
if (d == 1) a[r,c] <- 1
if (d == 2) b[r,c] <- 1
if (d == 3) b[r,c] <- 1
if (d == 4) a[r,c] <- 1
}
}
abz <- cbind(a,b,z)
baz <- cbind(b,z,a)
zab <- cbind(z,a,b)
M <- rbind(abz, baz, zab)
M

Comment Source:Staffan, The R code below makes a 15 by 15 matrix which is the adjacency matrix of the graph of evolutionary trees. Perhaps you can make it into a graph in Sage, and do something with it, eg use g.automorphism_group. ~~~~ z <- matrix(0, nrow=5, ncol=5) a <- z b <- z for (r in 1:5) { for ( c in 1:5) { d <- (r-c) %% 5 if (d == 1) a[r,c] <- 1 if (d == 2) b[r,c] <- 1 if (d == 3) b[r,c] <- 1 if (d == 4) a[r,c] <- 1 } } abz <- cbind(a,b,z) baz <- cbind(b,z,a) zab <- cbind(z,a,b) M <- rbind(abz, baz, zab) M ~~~~
• Options
14.

Ive added the code to the currently public worksheet, which Ill use as a base when I do the tutorial during this week. There are Desargues graphs,n-cubes, random graphs and After your code I added a KneserGraph(5,2)

Comment Source:Ive added the [code to the currently public worksheet](), which Ill use as a base when I do the tutorial during this week. There are Desargues graphs,n-cubes, random graphs and After your code I added a KneserGraph(5,2)