Skip to 0 minutes and 10 seconds To build our phylogenetic tree, we go back to the influenza virus database query window that we were at before. And again, we look for influenza type A. We’re only looking for sequences infecting humans. We’re looking for haemagglutinin proteins. And we’re looking for subtype H1N1. Again, I’ve clicked Full Length Plus and Collapse Identical Sequences. However, what I’m going to do now is I’m going to do collection date from 1918, which of course, is the year of the Spanish flu, H1N1 pandemic. And we’re going to click up until about 1980. So 1980 was more or less the year when H1N1, the Russian flu pandemic, had settled down, and H1N1 descended from Russian flu had become a seasonal H1N1.
Skip to 1 minute and 1 second So if I now click Add Query, we’ll get many more hit sequences. In fact, we get 73 here. So I’m going to show results, clicking on this button. And now we have a list of hits. And there are 73 of them. I can click on date here to sort them by date. If I do that, we can see that the oldest one is in 1918. And this is the one from the 1918 pandemic. South Carolina strains, South Carolina 1. And then, of course, there’s a gap until 1933. The reason for this is because we haven’t been able to get any clinical material from the 1920s from which we’ve been able to isolate flu sequences.
Skip to 1 minute and 45 seconds 1933 is the year the flu virus was first isolated in the laboratory. So from 1933 onwards, we have the sequences of flu stocks that have been isolated in the laboratory. And our cut off date was 1980. And you can see here, if we look in the middle of the range here, that there’s actually a big gap in between 1957 and 1976 when H1N1 wasn’t seen. 1957, of course, was the H2N2 pandemic year. And antigenic shift then removed the descendants of Spanish flu H1N1 from the seasonal flu seen at that time. Now, there are some sequences here from 1976, which are not Russian flu sequences. These are all American sequences.
Skip to 2 minutes and 29 seconds And they’re the sequence of an H1N1 swine flu, which infected several people at that time. So I’m going to exclude them from this analysis, because they’re not really relevant to what we’re looking at. We’re only really interested in flus that are circulating in humans. We’re not interested in swine flus that have passed into humans. So by unchecking those boxes, I’m going to grey out those particular samples. So having made the selection, I’m going to click Do Multiple Alignment here. And what that does is it makes the flu sequences align against each other so that they’re in a position of maximum similarity, and we can best study their evolutionary relationships from one to another.
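The selection made so far (human hosts, collection dates from 1918 to 1980, with the 1976 swine-origin sequences unchecked) can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not how the database works internally: the `Record` class, the `swine_origin` flag, and the example records are all assumptions made up for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified stand-in for one database hit."""
    strain: str
    year: int
    host: str
    swine_origin: bool = False  # assumption: set by hand, like unchecking a box


def select_records(records, first_year=1918, last_year=1980):
    """Keep human sequences in the date range, drop swine-origin ones, sort by date."""
    kept = (
        r for r in records
        if r.host == "Human"
        and first_year <= r.year <= last_year
        and not r.swine_origin
    )
    return sorted(kept, key=lambda r: r.year)


# Illustrative records only; a real search returns many more hits.
records = [
    Record("A/USSR/90/1977", 1977, "Human"),
    Record("A/South Carolina/1/1918", 1918, "Human"),
    Record("A/New Jersey/8/1976", 1976, "Human", swine_origin=True),
    Record("A/California/07/2009", 2009, "Human"),
]

selected = select_records(records)
for r in selected:
    print(r.year, r.strain)
```

The 1976 record is removed by the hand-set flag and the 2009 record by the date range, which mirrors the two separate steps in the demonstration: the date filter in the query window and the unchecked boxes in the results list.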
Skip to 3 minutes and 8 seconds So this is what the multiple sequence alignment looks like. You can see that where the sequences are similar, it has lined them up.
Skip to 3 minutes and 18 seconds So where there are dots in this alignment, it means that there are no deviations from what we call the consensus sequence here. So in this particular position, every amino acid is a serine. Whereas in the next position, every amino acid is a serine, except in this strain here and that strain there. It’s a threonine represented by a T. So there’s our sequence alignment for our selected strains. And now I’m going to click on Build a Tree. So a phylogenetic tree is quite like a family tree. So things that are most closely related in the family tree, for instance, you might have brothers and sisters, they’re going to be close together on the phylogenetic tree.
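The dot notation described here can be reproduced with a short Python sketch. It assumes the sequences are already aligned to the same length and takes the consensus as the most common residue in each column; the four toy sequences are invented for illustration, and the website may compute its consensus differently.

```python
from collections import Counter


def consensus(aligned):
    """Most common residue in each column of equal-length aligned sequences."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def dot_view(aligned):
    """Return the consensus plus each sequence with matching residues shown as dots."""
    cons = consensus(aligned)
    rows = [
        "".join("." if residue == c else residue for residue, c in zip(seq, cons))
        for seq in aligned
    ]
    return cons, rows


# Toy alignment (made up): the last column is serine (S) everywhere
# except in one strain, where it is threonine (T).
aligned = ["MKSS", "MKST", "MKSS", "MKSS"]

cons, rows = dot_view(aligned)
print(cons)
for row in rows:
    print(row)
```

Only the one threonine shows up as a letter in the dot view, which is why differences between strains are so easy to spot by eye in this kind of display.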
Skip to 4 minutes and 2 seconds I’m just clicking the standard way of doing it here. I’m not going to change any of the options.
Skip to 4 minutes and 9 seconds So this has produced our phylogenetic tree. So flu sequences that are most closely related are together in the tree. And flu sequences that are more distantly related are further apart in the tree. So the first thing you might notice is down here near the root we have the 1918 pandemic flu sequence. And then it’s quite a big gap between that and the first sequences that were discovered in the lab in the early 1930s. And this shows us 20 odd years worth of evolution– not quite 20 years, but 15 years worth of evolution from 1918 to 1933. Now, where is our Russian flu 1977 sequence? It’s actually up here on this branch here.
Skip to 4 minutes and 48 seconds These are all of the sequences from the Russian flu pandemic, and from seasonal flu circulating shortly afterwards, like 1980 and 1979, and so on. So to look at that in slightly more detail, I’m going to click on that branch there. And then I’m going to click down here. You see that the branch has turned red to show it’s been highlighted. And then I’m going to click at the bottom here. And it shows it expands all that branch. So we have Russian flu from 1918 here. We have the early seasonal flus from the 1930s. And then we have some from the 1940s. We don’t have any samples collected– there’s only one sample collected during the Second World War here.
Skip to 5 minutes and 26 seconds There’s three of them, actually, 1942, 1943 here. But then after the war ends, we have quite a lot more sequences available. And these are ones that were collected in seasonal flus of the 1950s, H1N1 seasonal flus of the ’50s. And there are more seasonal flus here from the ’40s and ’50s. And then we meet flu from 1977. So you can see that, in fact, the flu from 1977 is quite close to the flu from the 1950s. We see here that there isn’t very much distance there at all between Roma in 1949 and 1977. And there isn’t very much distance from the 1951 to 1977.
Skip to 6 minutes and 6 seconds And indeed here, Leningrad 1954 is so close to the 1977 strain, that it’s actually clustered inside with it as if it’s very, very closely related, which it is. It’s almost identical. So this is part of the evidence that we use for believing that the 1977 strain, pandemic strain, was actually probably a laboratory escapee, because the distance between 1954 and 1977, which is 23 years, is very small indeed.
Skip to 6 minutes and 39 seconds Whereas here, we see the distance between 1953 and 1954, which is actually slightly less than– it’s 19 years less than 20 years– is really very large indeed. So this phylogenetic tree doesn’t make sense if the flu had been circulating seasonally between 1950s and 1970s, that somehow, the flu from the 1950s came back again in the 1970s as Russian flu. And the only way that we can explain that is probably that something was accidentally released from a laboratory.
Skip to 7 minutes and 18 seconds And it might have been something quite similar to this Leningrad 1954 strain.
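The reasoning above compares how much sequence change has built up with how many years have passed. A minimal Python sketch of that comparison follows, using the p-distance (the proportion of aligned positions that differ) as a simple stand-in for branch length. The three sequences are invented toy data, not real haemagglutinin sequences, and the tree-building tool on the website may use a different distance measure.

```python
def p_distance(a, b):
    """Proportion of positions that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Toy sequences (made up for illustration), keyed by collection year.
toy = {
    1933: "MKAKLLVLLCTFTATYADTI",
    1954: "MKARLLILFCAFAATNADTI",
    1977: "MKARLLILFCAFAATNADTV",
}


def change_per_year(year_a, year_b):
    """Average sequence change per year between two toy strains."""
    return p_distance(toy[year_a], toy[year_b]) / (year_b - year_a)


seasonal = change_per_year(1933, 1954)  # continuous circulation
gap = change_per_year(1954, 1977)       # the years when H1N1 was not seen

print(f"1933 to 1954: {p_distance(toy[1933], toy[1954]):.2f} over 21 years")
print(f"1954 to 1977: {p_distance(toy[1954], toy[1977]):.2f} over 23 years")
print(f"rate ratio: {seasonal / gap:.1f}")
```

In the toy data, the two intervals are about the same length, yet far less change has built up across the second one. That mismatch is the pattern the tree shows for the real sequences, and it is what is hard to explain if the virus had been evolving in people throughout those years.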
Building a Phylogenetic Tree: Decoding the mystery of the Russian flu outbreak
In this video, we’re going to continue the demonstration of why many virologists believe that the 1977 Russian Flu must have been a laboratory strain from the 1950s which then became an escapee. In this second step, we’ll be using the sequences we retrieved from GenBank in the first step to make a phylogenetic tree. Then we’ll be analysing the relationships between the 1977 H1N1 strain and others.
You don’t need to download any special software to do this task - everything you need is freely available on the GenBank website. After you’ve watched the video, go to the website and try to replicate what has been demonstrated.
If you feel confident enough, perhaps you could try looking at a few other problems. For instance, can you find the origins of the PB1, PB2, PA, NA, M, NS1 and NS2 genes of the current 2015 H1N1 seasonal flu? Please discuss your findings on the message boards.
Hint: just repeat what you have seen demonstrated for the H segment, but retrieve the sequences for one of the other segments instead and repeat the tree building process. You’ll need to adjust the dates appropriately, as we are starting with a 2015 H1N1, not 1977. Also, you might want to select “pig” as well as “human” (it isn’t giving the game away too much to suggest that “swine flu” had some origins in flu infecting pigs).
Don’t worry if you find this a bit of a challenge. My undergraduates do too. If you can manage even a bit of this, you’re doing very well.
I must also apologise for 2 slips of the tongue in this video.
- At position 5:14, I say “Russian Flu”, but of course 1918 was “Spanish Flu”
- At position 6:40, I say “1953”, but really mean “1933”
For those who are really keen, a link to a phylogenetics paper on the same subject we have been studying is below.
© Lancaster University