David Henty

I have been working with supercomputers for over 25 years, and teaching people how to use them for almost as long. I joined EPCC after doing research in computational theoretical physics.

Location EPCC, The University of Edinburgh, Scotland, UK.

Activity

  • Yes - the OS is constantly juggling dozens of different processes and threads, trying to ensure they all get their fair share of CPU time. The threads that OpenMP creates are just thrown into the mix with all the others. For HPC applications we usually make sure that a minimum of other tasks are running so the OpenMP threads will run almost continuously on the...

  • It is possible to reuse the heat but, until recently, the outlet water was not hot enough to be of much use. However, modern machines run much hotter, which makes the heat carried away by the water much easier to use, e.g. in heating other buildings - see "Energy Efficiency by Warm Water cooling" at...

  • @IstvanF The layout of all the cabinets is typically fixed to minimise cable lengths. Connecting all the cables is a huge job and normally done by dedicated experts.

  • That is a very good point - for large simulations on supercomputers, the limiting factor (the slowest part) is usually reading and writing memory and not the clock speed of the CPUs.

  • Yes - in a typical cellular automaton model you need to know the state of all the neighbouring cells. In 1D this is 2 neighbours (left and right), 2D is 4 neighbours (up and down as well), 3D is 6 neighbours ... In general, it's 2xD neighbours for D dimensions. If you include diagonals then the numbers of neighbours for 1D, 2D and 3D are 2, 8 and 26. In...
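
The neighbour counts above can be checked with a short sketch (plain Python, not from the course material): orthogonal neighbours number 2 per dimension, while including diagonals gives every cell of the surrounding 3x3x... block except the centre, i.e. 3^D - 1.

```python
def neighbours(d, diagonals=False):
    # Orthogonal neighbours: 2 per dimension (left/right, up/down, ...).
    # With diagonals: every cell in the surrounding 3x3x... block except
    # the centre cell itself, i.e. 3**d - 1.
    return 3 ** d - 1 if diagonals else 2 * d

for d in (1, 2, 3):
    print(d, neighbours(d), neighbours(d, diagonals=True))
```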

  • There are a number of parallel packages that can do Molecular Dynamics on parallel supercomputers, e.g. NAMD, GROMACS, LAMMPS, AMBER, ... EPCC recently ran an online LAMMPS tutorial - see https://www.epcc.ed.ac.uk/blog/2019/online-lammps-training-archer

  • @AndrewMatthew Up until the early 2000s, each manufacturer had their own version of Unix, e.g. Unicos (Cray), Tru64 (DEC/Compaq), Irix (SGI), Solaris (Sun), AIX (IBM), ... The advantages were that each OS was tailored for a particular architecture, but the development cost of maintaining their own OS was too much for most companies so they gradually moved to...

  • You can argue that more powerful CPUs enable software to be written more easily as you can concentrate on functionality and elegance rather than having to worry about performance (since a fast CPU can still run less efficient software at an acceptable speed). Another view is that fast CPUs just encourage poorly written, bloated software!

  • Power consumption and heat are real issues for mobile devices - you want to maximise battery life and, as you point out, they are not well designed for getting rid of heat. This is why multicore technology is so attractive even if it makes the software more complicated - two cores each running at 1GHz use less power than one core running at 2GHz.
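
A rough way to see why this works: dynamic CPU power scales roughly as voltage squared times frequency, and the voltage needed itself rises roughly with frequency, so power grows roughly as the cube of the clock speed. This toy model (illustrative only - real chips are more complicated) makes the comparison concrete:

```python
def relative_power(freq_ghz, cores=1):
    # Crude model: power per core grows roughly as frequency cubed
    # (P ~ V^2 * f, with V rising roughly in proportion to f).
    return cores * freq_ghz ** 3

one_fast = relative_power(2.0)            # one core at 2 GHz
two_slow = relative_power(1.0, cores=2)   # two cores at 1 GHz
print(one_fast, two_slow)  # same nominal throughput, very different power
```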

  • In practice, different cores will all be running at different speeds. Modern CPUs vary clock frequency dynamically based on load (e.g. turn it down if the processor is getting hot, crank it up if there aren't that many cores running and there is spare power). Even if they operated at the same clock speed, they would run at very different speeds in practice as...

  • The Game of Life is a very good example in terms of parallelising a real program. In practice, the strategy is identical to the traffic model - at each step, you update each cell based on the state of its nearest neighbours. In the 1D traffic model that just comprised the cells up and down in the road. For the 2D Game of Life, it's the eight nearest neighbours...
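
A minimal serial sketch of one Game of Life update step (my own illustration, with non-periodic edges for simplicity): each cell looks at its eight nearest neighbours, exactly as described above.

```python
def step(grid):
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Count live cells among the eight nearest neighbours.
            live = sum(grid[i + di][j + dj]
                       for di in (-1, 0, 1) for dj in (-1, 0, 1)
                       if (di, dj) != (0, 0)
                       and 0 <= i + di < rows and 0 <= j + dj < cols)
            # Standard rules: born with exactly 3 live neighbours,
            # survive with 2 or 3, otherwise die.
            new[i][j] = 1 if live == 3 or (grid[i][j] and live == 2) else 0
    return new

# A "blinker" oscillates between a horizontal and a vertical bar.
blinker = [[0, 0, 0],
           [1, 1, 1],
           [0, 0, 0]]
print(step(blinker))  # the horizontal bar becomes a vertical one
```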

  • That's exactly correct - Message Passing is harder to implement, but less prone to subtle bugs. Most importantly for supercomputing, it is the only way to run on multiple nodes as Shared Memory is limited to a single node. Although this is a fine way to use all the cores on your laptop, on ARCHER this would limit you to running on only 24 cores of the total...

  • Virtualisation / containerisation is becoming more common in supercomputing as it allows you to develop on a local system (e.g. your laptop) and deploy on a larger machine (e.g. ARCHER). However, this can cause significant slowdowns for parallel programs. The whole point of virtualisation is to insulate operating systems from each other and from the hardware. In a...

  • That's correct - fans blow air over the blades, so it's cool air in and hot air out. The air is then cooled by large chillers which transfer that heat from the air to water, and at the ACF we can normally cool the water back down again using "ambient cooling" since the weather in Scotland is not normally very hot! See...

  • I commented on a similar point someone made in a different step and I think it's relevant here too:

    "This was tried in the early days of parallel computing and was called "metacomputing" - a single program running across separate computers distributed all over the globe. The problems are reliability (one of the machines could crash) and speed (it takes a...

  • This was tried in the early days of parallel computing and was called "metacomputing" - a single program running across separate computers distributed all over the globe. The problems are reliability (one of the machines could crash) and speed (it takes a long time for a computer in Europe to communicate with one in Japan). However, the model is used in...

  • My understanding is that it is using the appropriate precision for storing floating-point numbers rather than always using the highest precision available. For example, at the start of a calculation (where you may be a long way from the correct answer) there may be no need to use double-precision numbers - maybe single precision is enough. Later on, as you're...
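
The precision difference is easy to demonstrate (my own sketch, using the standard struct module to round a Python double down to single precision): single precision carries roughly 7 decimal digits, double precision roughly 16.

```python
import struct

def to_single(x):
    # Round a Python float (double precision) to single precision by
    # packing it as a 4-byte IEEE float and unpacking it again.
    return struct.unpack('f', struct.pack('f', x))[0]

x = 1.0 + 1e-8
print(x == 1.0)             # False: double precision keeps the small term
print(to_single(x) == 1.0)  # True: single precision loses it
```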

  • @DavidFischak I first started working in HPC back in 1990 and you're right that there was a lot more diversity in the market: lots of competing processors and different flavours of Unix from numerous manufacturers. This changed and for quite some time we've had an almost complete monopoly of Intel x86 CPUs and Linux. However, things are changing again and, as...

  • @BernatMolero Monte Carlo simulation typically refers to any computation where random numbers are used. For example, if I wanted to simulate people evacuating from a building then I might use lots of random numbers to decide if someone turns left or right at the end of a corridor on their way out. This leads to lots of different simulations where people take...
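
A toy version of that evacuation example (my own sketch - the corridor times of 30 s and 50 s are invented for illustration): each simulated person makes a random left/right choice, and averaging over many runs gives a stable estimate.

```python
import random

def evacuation_time(rng):
    # One simulated person: a coin flip at the junction, where the
    # left corridor happens to be the quicker route.
    return 30.0 if rng.random() < 0.5 else 50.0

rng = random.Random(42)  # fixed seed so the sketch is repeatable
times = [evacuation_time(rng) for _ in range(10_000)]
average = sum(times) / len(times)
print(average)           # settles near 40 s over many runs
```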

  • Production-line manufacturing is a very good analogy. As you point out, there is parallelism within a single production line (e.g. different workers build different sections of a car as it passes down the line). The amount of parallelism might be limited, e.g. if there are 20 steps then you can't make use of more than 20 workers. The solution, as you've...

  • David Henty made a comment

    Hi - I'm David Henty and I work at EPCC at the University of Edinburgh, Scotland, UK. I co-developed the MOOC with Weronika and colleagues from SURFsara in the Netherlands.

  • As Jane points out, the number one machine has a performance profile that isn't necessarily representative of the majority of the world's supercomputers. However, another factor is that Moore's law is relevant for the performance of a single CPU. A supercomputer has many thousands of CPUs, so the total performance can outstrip Moore's law if we also increase...

  • Exactly - even a "null" message actually contains data such as the headers so they do clog up the network.

  • We could, but I think the issue has always been that processor speeds have increased more rapidly than memory systems so we're fighting a losing battle.

  • A very good point! Over the years, computing has swung between "thin client" models like your "dumb terminal" example (processing done remotely) and "thick client" models like powerful desktops (processing done locally). We seem to be in a "thin client" phase where many of our devices are just used as access points for remote processing systems such as...

  • These cycles are observed in real predator / prey data, see e.g. https://theglyptodon.wordpress.com/2011/05/02/the-fur-trades-records/
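
These cycles come out of the classic Lotka-Volterra predator-prey equations, which can be sketched with a simple Euler integration (parameter values below are illustrative, not fitted to the fur-trade data):

```python
def lotka_volterra(prey, pred, steps=10000, dt=0.001,
                   a=1.0, b=0.1, c=1.5, d=0.075):
    history = []
    for _ in range(steps):
        dprey = a * prey - b * prey * pred   # prey breed, get eaten
        dpred = d * prey * pred - c * pred   # predators feed, die off
        prey += dprey * dt
        pred += dpred * dt
        history.append((prey, pred))
    return history

hist = lotka_volterra(prey=10.0, pred=5.0)
# The two populations chase each other in cycles rather than
# settling down to fixed values.
```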

  • @TonyMcCafferty I don't know if it's exactly what you were thinking of, but people do something called "autotuning" to optimise performance. If there are lots of possible parameters to adjust for a computation, you can simply run thousands of copies with different settings and find out experimentally what the best settings are. This takes huge amounts of...
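
In its simplest form, autotuning is just an exhaustive search over the tuning parameters. This sketch uses a made-up cost function standing in for an actual timed run (real autotuning measures the program's runtime for each setting):

```python
def cost(block_size, unroll):
    # Hypothetical runtime model: fastest at block_size=64, unroll=4.
    # A real autotuner would run and time the program instead.
    return abs(block_size - 64) + abs(unroll - 4) + 1.0

# Try every combination of candidate settings and keep the best.
candidates = [(b, u) for b in (16, 32, 64, 128) for u in (1, 2, 4, 8)]
best = min(candidates, key=lambda p: cost(*p))
print(best)  # (64, 4)
```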

  • If the batch system is doing a good job then the system should be reasonably full up all the time. People do build machines specifically to mine bitcoins, but it wouldn't be a cost-effective use of a supercomputer as you would not be using the capabilities of the high performance network.

  • Thanks for putting that link in!

  • @FrancescoMaroso You're correct that we could have had a GPU portion. However, we might have effectively ended up with two smaller systems - one with GPUs and one with CPUs - rather than one large system. The main focus of ARCHER was to enable very large simulations that could not be done on any other academic system in the UK so the decision was to have the...

  • F1 designers definitely use supercomputers to model their cars. However, to ensure a level playing field between teams, the amount of computer time they can use is severely limited, e.g. I found this discussion on an F1 fan site: https://www.f1technical.net/forum/viewtopic.php?t=13311

  • @HarryTerkanian We have a few simple parallel programs written in MPI plus C or Fortran that we use on training courses - see for example the exercise material at http://www.archer.ac.uk/training/course-material/2017/12/intro-ati/index.php - which cover image processing, fluid dynamics and fractals. These should be relatively easy to port to a Raspberry Pi...

  • @SimonHennessey We have a few simple parallel programs written in MPI plus C or Fortran that we use on training courses - see for example the exercise material at http://www.archer.ac.uk/training/course-material/2017/12/intro-ati/index.php - which cover image processing, fluid dynamics and fractals. These should be relatively easy to port to a Raspberry Pi...

  • People have been looking at using FPGAs for HPC for several years. Despite the potential for very good performance compared to power consumption, the problem has generally been programming them. It is very difficult to get good performance from large, numerically intensive programs written in C, C++ or Fortran.

  • @GillianC That's a good point - if a problem has a very complicated geometry such as if you wanted to simulate the air flow round an entire car then it is not easy to split the calculation up into equal-sized chunks. In situations like this then the approach is exactly as you describe - an important part of the pre-processing stage is "mesh partitioning" where...

  • @HarryTerkanian As ever, problems in computing have very good analogies in everyday life and "The Mythical Man Month" is an excellent analogy to the problem of just throwing more CPU-cores at a calculation. The real killer is that as you add more CPU-cores, each core is working on a smaller piece of the problem and the overhead of communication becomes greater.
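
A toy model of that trade-off (illustrative numbers, not measurements from any real machine): the compute time shrinks as cores are added, but a fixed per-step communication cost does not, so parallel efficiency falls away.

```python
def runtime(cores, work=100.0, comms=1.0):
    # Perfectly divided compute work plus a fixed communication overhead.
    return work / cores + comms

t1 = runtime(1)
for cores in (1, 10, 100, 1000):
    t = runtime(cores)
    efficiency = t1 / (t * cores)  # speedup divided by core count
    print(cores, t, efficiency)
```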

  • The problem is to do with power consumption and heat production. Although we could produce a CPU with twice the speed, it would be so power hungry that it would be too expensive to run. It would also not be suitable for consumer devices as you would need expensive additional cooling to stop it overheating - your laptop can only really accommodate a small fan....

  • That's a very good point - on ARCHER the nodes are packaged so that there are four on a physical "blade". This means that these four nodes can actually communicate with each other much more quickly than with nodes on a different blade.

  • I'm glad you found them useful - we significantly expanded the "Towards the Future" section after the first run last year as it was clearly an area that people were interested in.

  • That's correct, but it's important to note that this comes from the use of accelerators (in the case of Piz Daint, NVIDIA GPUs) rather than traditional multicore CPUs. Since GPUs have a very different architecture to CPUs, it's not immediately clear how many "cores" a GPU has, but the top500 list appears to count the number of "Streaming Multiprocessors". The...

  • That's an interesting observation, but in supercomputer networks it turns out that the major overhead is getting the data onto and off of the network infrastructure. Once data is on the network it travels very fast, so the cable length doesn't have such a big effect on the end-to-end transfer time.

  • I was always sceptical about whether driverless cars would take off as, even if they reduce risks at a statistical level (i.e. fewer accidents across thousands of drivers) an individual driver will always think that they would have done better than the robot in each particular accident. However, I read an article that made the point that for driving there is a...