David Henty

I have been working with supercomputers for over 25 years, and teaching people how to use them for almost as long. I joined EPCC after doing research in computational theoretical physics.
Location EPCC, The University of Edinburgh, Scotland, UK.
Activity
-
David Henty replied to Betty Kibirige
Yes - the OS is constantly juggling dozens of different processes and threads, trying to ensure they all get their fair share of CPU time. The threads that OpenMP creates are just thrown into the mix with all the others. For HPC applications we usually make sure that a minimum of other tasks are running so the OpenMP threads will run almost continuously on the...
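For anyone curious, a minimal OpenMP example in C (my own sketch, not course material) shows the threads that the OS has to schedule alongside everything else; on an HPC node each thread is normally pinned to its own core so it runs almost uninterrupted:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* OpenMP creates a team of threads here; the OS decides when and
           where each one actually runs, alongside all the other processes. */
        #pragma omp parallel
        {
            int id = omp_get_thread_num();
            int nthreads = omp_get_num_threads();
            printf("Thread %d of %d is running\n", id, nthreads);
        }
        return 0;
    }

Compile with OpenMP enabled (e.g. gcc -fopenmp) and the number of threads can be set with the OMP_NUM_THREADS environment variable.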
-
David Henty replied to Tom Couch
It is possible to reuse the heat but, until recently, the outlet water was not hot enough to be of much use. However, modern machines run much hotter, which makes the heat carried away by the water much easier to use, e.g. in heating other buildings - see "Energy Efficiency by Warm Water cooling" at...
-
@IstvanF The layout of all the cabinets is typically fixed to minimise cable lengths. Connecting all the cables is a huge job and normally done by dedicated experts.
-
That is a very good point - for large simulations on supercomputers, the limiting factor (the slowest part) is usually reading and writing memory and not the clock speed of the CPUs.
-
Yes - in a typical cellular automaton model you need to know the state of all the neighbouring cells. In 1D this is 2 neighbours (left and right), in 2D it is 4 neighbours (up and down as well), in 3D it is 6 neighbours ... In general, it's 2xD neighbours for D dimensions. If you include diagonals then the numbers of neighbours for 1D, 2D and 3D are 2, 8 and 26. In...
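Purely as an illustration (my own quick sketch in C, not part of the course), you can check those two counts directly - 2xD face neighbours, and 3^D - 1 once diagonals are included:

    #include <stdio.h>

    int main(void)
    {
        for (int d = 1; d <= 3; d++)
        {
            int face = 2 * d;                      /* neighbours sharing a face */
            int all = 1;
            for (int i = 0; i < d; i++) all *= 3;
            all -= 1;                              /* 3^d - 1 includes diagonals */
            printf("%dD: %d face neighbours, %d including diagonals\n",
                   d, face, all);
        }
        return 0;
    }

This prints 2 and 2 for 1D, 4 and 8 for 2D, and 6 and 26 for 3D.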
-
There are a number of parallel packages that can do Molecular Dynamics on parallel supercomputers, e.g. NAMD, GROMACS, LAMMPS, AMBER, ... EPCC recently ran an online LAMMPS tutorial - see https://www.epcc.ed.ac.uk/blog/2019/online-lammps-training-archer
-
-
@AndrewMatthew Up until the early 2000s, each manufacturer had their own version of Unix, e.g. Unicos (Cray), Tru64 (DEC/Compaq), Irix (SGI), Solaris (Sun), AIX (IBM), ... The advantage was that each OS was tailored for a particular architecture, but the development cost of maintaining their own OS was too much for most companies so they gradually moved to...
-
David Henty replied to Istvan F
You can argue that more powerful CPUs enable software to be written more easily as you can concentrate on functionality and elegance rather than having to worry about performance (since a fast CPU can still run less efficient software at an acceptable speed). Another view is that fast CPUs just encourage poorly written, bloated software!
-
Power consumption and heat are real issues for mobile devices - you want to maximise battery life and, as you point out, they are not well designed for getting rid of heat. This is why multicore technology is so attractive even if it makes the software more complicated - two cores each running at 1GHz use less power than one core running at 2GHz.
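To see roughly why, here is my own back-of-the-envelope sketch, using the common approximation that dynamic power scales as voltage squared times frequency and that the voltage can be reduced roughly in line with the frequency, so power per core goes as frequency cubed:

    #include <stdio.h>

    int main(void)
    {
        /* Crude model: power per core ~ V^2 * f, and assume V scales with f,
           so power per core ~ f^3 (arbitrary units). */
        double one_core_2ghz  = 2.0 * 2.0 * 2.0;         /* 8 units */
        double two_cores_1ghz = 2.0 * (1.0 * 1.0 * 1.0); /* 2 units */

        printf("One core at 2 GHz : %.1f units\n", one_core_2ghz);
        printf("Two cores at 1 GHz: %.1f units\n", two_cores_1ghz);
        return 0;
    }

The exact numbers depend on the chip, but the trend is why two slower cores win on power.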
-
David Henty replied to Istvan F
In practice, different cores will all be running at different speeds. Modern CPUs vary clock frequency dynamically based on load (e.g. turn it down if the processor is getting hot, crank it up if there aren't that many cores running and there is spare power). Even if they operated at the same clock speed, they would run at very different speeds in practice as...
-
The Game of Life is a very good example in terms of parallelising a real program. In practice, the strategy is identical to the traffic model - at each step, you update each cell based on the state of its nearest neighbours. In the 1D traffic model that just comprised the cells up and down the road. For the 2D Game of Life, it's the eight nearest neighbours...
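Purely as a sketch (my own C code, not course material), one update step looks like this - every interior cell counts its eight nearest neighbours and the new state depends only on that count:

    #include <stdio.h>

    #define N 8   /* tiny grid purely for illustration */

    int main(void)
    {
        int grid[N][N] = {0}, next[N][N] = {0};
        grid[3][3] = grid[3][4] = grid[3][5] = 1;   /* a "blinker" pattern */

        /* One Game of Life step: each interior cell looks at its eight
           nearest neighbours, just like the traffic model does in 1D. */
        for (int i = 1; i < N - 1; i++)
        {
            for (int j = 1; j < N - 1; j++)
            {
                int live = 0;
                for (int di = -1; di <= 1; di++)
                    for (int dj = -1; dj <= 1; dj++)
                        if (di != 0 || dj != 0)
                            live += grid[i + di][j + dj];

                if (grid[i][j] == 1)
                    next[i][j] = (live == 2 || live == 3);
                else
                    next[i][j] = (live == 3);
            }
        }

        printf("Centre cell after one step: %d\n", next[3][4]);
        return 0;
    }

Parallelising it means giving each process a block of the grid and swapping the cells along the block edges with the neighbouring processes at every step.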
-
That's exactly correct - Message Passing is harder to implement, but less prone to subtle bugs. Most importantly for supercomputing, it is the only way to run on multiple nodes as Shared Memory is limited to a single node. Although this is a fine way to use all the cores on your laptop, on ARCHER this would limit you to running on only 24 cores of the total...
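If you want to see what the message-passing style looks like, here is a bare-bones sketch in C using MPI (my own example, run with something like mpirun -np 2): each process has its own memory, and data only moves when it is explicitly sent and received, which is what allows the processes to live on different nodes.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)
        {
            value = 42;
            /* Explicitly send the data to process 1 - it may be on another node */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        }
        else if (rank == 1)
        {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Process 1 of %d received %d\n", size, value);
        }

        MPI_Finalize();
        return 0;
    }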
-
Virtualisation / containerisation is becoming more common in supercomputing as it allows you to develop on a local system (e.g. your laptop) and deploy on a larger machine (e.g. ARCHER). However, this can cause significant slowdowns for parallel programs. The whole point of virtualisation is to insulate operating systems from each other and from the hardware. In a...
-
David Henty replied to Alex Wardle
That's correct - fans blow air over the blades, so it's cool air in and hot air out. The air is then cooled by large chillers which transfer that heat from the air to water, and at the ACF we can normally cool the water back down again using "ambient cooling" since the weather in Scotland is not normally very hot! See...
-
David Henty replied to Andrew Matthew
I commented on a similar point someone made in a different step and I think it's relevant here too:
"This was tried in the early days of parallel computing and was called "metacomputing" - a single program running across separate computers distributed all over the globe. The problems are reliability (one of the machines could crash) and speed (it takes a...
-
This was tried in the early days of parallel computing and was called "metacomputing" - a single program running across separate computers distributed all over the globe. The problems are reliability (one of the machines could crash) and speed (it takes a long time for a computer in Europe to communicate with one in Japan). However, the model is used in...
-
David Henty replied to Istvan F
My understanding is that it is using the appropriate precision for storing floating-point numbers rather than always using the highest precision available. For example, at the start of a calculation (where you may be a long way from the correct answer) there may be no need to use double-precision numbers - maybe single precision is enough. Later on, as you're...
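Purely to illustrate the idea (my own sketch, not how any particular machine actually does it), here is a Newton iteration for a square root that does the early, rough iterations in single precision and only switches to double precision to refine the answer:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double target = 2.0;

        /* Early iterations: single precision is plenty while we are
           still far from the answer. */
        float xf = 1.0f;
        for (int i = 0; i < 5; i++)
            xf = 0.5f * (xf + (float) target / xf);

        /* Final iterations: switch to double precision to refine. */
        double x = (double) xf;
        for (int i = 0; i < 3; i++)
            x = 0.5 * (x + target / x);

        printf("sqrt(2) approx = %.15f (library: %.15f)\n", x, sqrt(2.0));
        return 0;
    }

The single-precision arithmetic is cheaper and moves half as much data, which is where the savings come from.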
-
@DavidFischak I first started working in HPC back in 1990 and you're right that there was a lot more diversity in the market: lots of competing processors and different flavours of Unix from numerous manufacturers. This changed and for quite some time we've had an almost complete monopoly of Intel x86 CPUs and Linux. However, things are changing again and, as...
-
@BernatMolero Monte Carlo simulation typically refers to any computation where random numbers are used. For example, if I wanted to simulate people evacuating from a building then I might use lots of random numbers to decide if someone turns left or right at the end of a corridor on their way out. This leads to lots of different simulations where people take...
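As a toy example (my own sketch, with made-up numbers): put a person in the middle of a corridor with exits at both ends, let them turn left or right at random each step, and average the escape time over many random trials - that averaging over random samples is the essence of Monte Carlo:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const int length = 10, start = 5, ntrials = 100000;
        long total_steps = 0;

        srand(12345);

        for (int t = 0; t < ntrials; t++)
        {
            int pos = start, steps = 0;
            while (pos > 0 && pos < length)
            {
                pos += (rand() % 2 == 0) ? -1 : +1;   /* random left or right */
                steps++;
            }
            total_steps += steps;
        }

        printf("Average steps to escape: %.1f\n",
               (double) total_steps / ntrials);
        return 0;
    }

Because every trial is independent, this kind of calculation parallelises almost perfectly - each core just runs its own share of the trials.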
-
Production-line manufacturing is a very good analogy. As you point out, there is parallelism within a single production line (e.g. different workers build different sections of a car as it passes down the line). The amount of parallelism might be limited, e.g. if there are 20 steps then you can't make use of more than 20 workers. The solution, as you've...
-
David Henty made a comment
Hi - I'm David Henty and I work at EPCC at the University of Edinburgh, Scotland, UK. I co-developed the MOOC with Weronika and colleagues from SURFsara in the Netherlands.
-
As Jane points out, the number one machine has a performance profile that isn't necessarily representative of the majority of the world's supercomputers. However, another factor is that Moore's law is relevant for the performance of a single CPU. A supercomputer has many thousands of CPUs, so the total performance can outstrip Moore's law if we also increase...
-
David Henty replied to Catherine Yorkshire
Exactly - even a "null" message actually contains data such as the headers so they do clog up the network.
-
David Henty replied to Graham Brown
We could, but I think the issue has always been that processor speeds have increased more rapidly than memory systems so we're fighting a losing battle.
-
A very good point! Over the years, computing has swung between "thin client" models like your "dumb terminal" example (processing done remotely) and "thick client" models like powerful desktops (processing done locally). We seem to be in a "thin client" phase where many of our devices are just used as access points for remote processing systems such as...
-
These cycles are observed in real predator / prey data, see e.g. https://theglyptodon.wordpress.com/2011/05/02/the-fur-trades-records/
-
@TonyMcCafferty I don't know if it's exactly what you were thinking of, but people do something called "autotuning" to optimise performance. If there are lots of possible parameters to adjust for a computation, you can simply run thousands of copies with different settings and find out experimentally what the best settings are. This takes huge amounts of...
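In miniature, autotuning looks something like this (my own sketch in C, with a blocked matrix transpose standing in for the real computation): time the same kernel for each candidate setting and keep whichever ran fastest.

    #include <stdio.h>
    #include <time.h>

    #define N 1024

    static double a[N][N], b[N][N];

    /* Transpose b into a using a given block size - the performance
       depends on how well each block fits into the cache. */
    static void transpose(int block)
    {
        for (int ii = 0; ii < N; ii += block)
            for (int jj = 0; jj < N; jj += block)
                for (int i = ii; i < ii + block; i++)
                    for (int j = jj; j < jj + block; j++)
                        a[i][j] = b[j][i];
    }

    int main(void)
    {
        int candidates[] = {8, 16, 32, 64, 128, 256};
        int best = 0;
        double best_time = 1.0e30;

        /* Autotuning in miniature: try every setting, measure, keep the best. */
        for (int c = 0; c < 6; c++)
        {
            clock_t start = clock();
            transpose(candidates[c]);
            double elapsed = (double) (clock() - start) / CLOCKS_PER_SEC;

            printf("Block size %4d: %.4f s\n", candidates[c], elapsed);
            if (elapsed < best_time)
            {
                best_time = elapsed;
                best = candidates[c];
            }
        }

        printf("Best block size found: %d\n", best);
        return 0;
    }

Real autotuners explore thousands of parameter combinations, often running the trials in parallel, which is where the huge amounts of computer time go.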
-
David Henty replied to Stephen Marsh
If the batch system is doing a good job then the system should be reasonably full up all the time. People do build machines specifically to mine bitcoins, but it wouldn't be a cost-effective use of a supercomputer as you would not be using the capabilities of the high performance network.
-
David Henty replied to Bart Wauters
Thanks for putting that link in!
-
David Henty replied to Fumi I
@FrancescoMaroso You're correct that we could have had a GPU portion. However, we might have effectively ended up with two smaller systems - one with GPUs and one with CPUs - rather than one large system. The main focus of ARCHER was to enable very large simulations that could not be done on any other academic system in the UK so the decision was to have the...
-
F1 designers definitely use supercomputers to model their cars. However, to ensure a level playing field between teams, the amount of computer time they can use is severely limited, e.g. I found this discussion on an F1 fan site: https://www.f1technical.net/forum/viewtopic.php?t=13311
-
David Henty replied to Simon Hennessey
@HarryTerkanian We have a few simple parallel programs written in MPI plus C or Fortran that we use on training courses - see for example the exercise material at http://www.archer.ac.uk/training/course-material/2017/12/intro-ati/index.php - which cover image processing, fluid dynamics and fractals. These should be relatively easy to port to a Raspberry Pi...
-
David Henty replied to Jason Polyik
People have been looking at using FPGAs for HPC for several years. Despite the potential for very good performance relative to power consumption, the problem has generally been programming them. It is very difficult to get good performance on an FPGA from large, numerically intensive programs written in C, C++ or Fortran.
-
@GillianC That's a good point - if a problem has a very complicated geometry, such as simulating the air flow around an entire car, then it is not easy to split the calculation up into equal-sized chunks. In situations like this, the approach is exactly as you describe - an important part of the pre-processing stage is "mesh partitioning" where...
-
David Henty replied to Gillian C
@HarryTerkanian As ever, problems in computing have very good analogies in everyday life and "The Mythical Man Month" is an excellent analogy to the problem of just throwing more CPU-cores at a calculation. The real killer is that as you add more CPU-cores, each core is working on a smaller piece of the problem and the overhead of communication becomes greater.
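A toy model of my own shows how quickly that overhead grows: split an N x N grid over P cores, so each core computes on N*N/P cells but has to exchange a halo of roughly 4*N/sqrt(P) cells with its neighbours - the ratio of communication to computation then rises steadily with P.

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* Toy halo-exchange model: an N x N grid split over P cores. */
        double n = 1000.0;

        for (int p = 1; p <= 4096; p *= 4)
        {
            double compute = n * n / p;                  /* cells per core      */
            double comm    = 4.0 * n / sqrt((double) p); /* halo cells per core */
            printf("P = %5d  comm/compute = %.4f\n", p, comm / compute);
        }
        return 0;
    }

Eventually the cores spend more time talking to each other than calculating, which is the parallel-computing version of the Mythical Man Month.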
-
David Henty replied to Chris Cussen
The problem is to do with power consumption and heat production. Although we could produce a CPU with twice the speed, it would be so power hungry that it would be too expensive to run. It would also not be suitable for consumer devices as you would need expensive additional cooling to stop it overheating - your laptop can only really accommodate a small fan....
-
That's a very good point - on ARCHER the nodes are packaged so that there are four on a physical "blade". This means that these four nodes can actually communicate with each other much more quickly than with nodes on a different blade.
-
David Henty replied to stan chell
I'm glad you found them useful - we significantly expanded the "Towards the Future" section after the first run last year as it was clearly an area that people were interested in.
-
That's correct, but it's important to note that this comes from the use of accelerators (in the case of Piz Daint, NVIDIA GPUs) rather than traditional multicore CPUs. Since GPUs have a very different architecture to CPUs, it's not immediately clear how many "cores" a GPU has, but the top500 list appears to count the number of "Streaming Multiprocessors". The...
-
David Henty replied to Stephen Marsh
That's an interesting observation, but in supercomputer networks it turns out that the major overhead is getting the data onto and off of the network infrastructure. Once data is on the network it travels very fast, so the cable length doesn't have such a big effect on the end-to-end transfer time.
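A toy model (my own, with made-up but plausible numbers) makes the point: treat the transfer time as a fixed start-up latency, plus the message size divided by bandwidth, plus a time-of-flight term for the cable - for typical messages the start-up cost dominates and the cable length barely registers.

    #include <stdio.h>

    int main(void)
    {
        /* time = start-up latency + message size / bandwidth + cable delay */
        double latency   = 1.5e-6;       /* 1.5 microseconds to get on/off the network */
        double bandwidth = 10.0e9;       /* 10 GB/s link                               */
        double cable     = 5.0 / 2.0e8;  /* 5 m of cable at ~2e8 m/s signal speed      */

        double sizes[] = {8.0, 8.0e3, 8.0e6};   /* 8 B, 8 kB and 8 MB messages */

        for (int i = 0; i < 3; i++)
        {
            double t = latency + sizes[i] / bandwidth + cable;
            printf("%10.0f bytes: %8.2f microseconds (cable part: %.3f)\n",
                   sizes[i], t * 1.0e6, cable * 1.0e6);
        }
        return 0;
    }

Here the cable contributes a few hundredths of a microsecond, compared with over a microsecond just to get the data onto the network.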
-
David Henty replied to Tony McCafferty
I was always sceptical about whether driverless cars would take off as, even if they reduce risks at a statistical level (i.e. fewer accidents across thousands of drivers), an individual driver will always think that they would have done better than the robot in each particular accident. However, I read an article that made the point that for driving there is a...
-
My understanding is that the complexity comes from simulating two materials of very different viscosities at the same time - oil is very thick and gas is very "runny" in the sense that it flows very easily. I'll see if I can find a more definitive answer ...
-
David Henty replied to Stephen Marsh
I don't think hard-wiring the OS would be a good idea as any errors could never be fixed, e.g. you could not patch the system when yet another security hole was discovered! I have talked about caches in terms of data, but in fact instructions are also cached, so the performance of the operating system is usually very good as all commonly executed pieces will...
-
David Henty replied to Peter Rogers
Although individual packets of data may be retransmitted, if there is a serious network failure then it will typically bring the whole system down. We spend lots of money on supercomputer networking for both speed and reliability. If you are doing calculations across widely distributed computers, such as done by Amazon and Google, you build resilience into the...
-
David Henty replied to Sandra Passchier
@SandraPasschier It depends. On ARCHER, you do your visualisation on a separate (smaller) system called the Data Analytic Cluster, although it is connected to the same disk storage as ARCHER so you don't have to copy your data around. If the visualisation is very computationally expensive, or needs such huge amounts of data that you can't afford to write it...
-
My understanding is that TPUs are designed for very fast calculation but at low precision. This is OK for many artificial intelligence applications but probably not OK for traditional computer simulations - I touched on this a bit in a previous answer https://www.futurelearn.com/courses/supercomputing/3/comments/25575992
-
David Henty replied to Harry Terkanian
I didn't notice you'd already answered Anton's question before I posted my own answer in https://www.futurelearn.com/courses/supercomputing/3/comments/25988184
-
David Henty replied to Tony McCafferty
@TonyMcCafferty A very good point - log graphs can be deceptive and hide the enormous increase in the data values by collapsing them together. We touch briefly on quantum computing here, which some believe is the next step.