Keichi Takahashi

I am a computer scientist at Nara Institute of Science and Technology (NAIST) working on high-performance computing.

Location: Japan

Activity

  • Pros: MPI is a highly optimized library with a long history. It's hard to beat MPI's performance and functionality with your own implementation using sockets, verbs, etc.

    Cons: MPI is too low-level. The fact that MPI gives you full control over what data is communicated between which processes also means that you need to describe every...

  • I wonder why Isendrecv (a non-blocking Sendrecv) does not exist?

  • Maybe it's worth noting that the same code does not hang if the message is small enough (< 2 KB in my environment), since small messages are sent with the eager protocol.

  • Cython is a good choice if you have an existing Python code base that you need to make "moderately" faster. However, if you want to squeeze out every last bit of performance, using CFFI or f2py makes more sense.

    To maximize performance with Cython, you need to statically type variables, disable bounds checking, etc. In the end, you lose all the nice...

  • Cython: 0.032s
    CFFI: 0.015s
    f2py: 0.019s

    Even though f2py was slightly slower than CFFI, it's probably the easiest of the three, requiring minimal code changes.

  • Cython: 0.032s
    CFFI: 0.015s

    ~2x speedup using CFFI. Considering that the C library is built with -O3, Cython is actually not bad!

  • Pure Python: 17.34 s
    Cythonized and optimized: 0.03 s

    It took me a while to figure out I had to cdef the loop counters...

  • Here's an iterative implementation that I came up with: https://gist.github.com/keichi/4cb14484ab68c685ec729a6cf8232530

  • Loops with dependencies (reductions, scans, etc.) are difficult to vectorize.

  • dx=0.1 S=0.9296501041259554
    dx=0.01 S=0.9992078366500485
    dx=0.001 S=0.9992037149227544
    dx=0.0001 S=0.9999036736218789
    dx=1e-05 S=0.9999936732092701
    dx=1e-06 S=0.9999996732051449

  • Total runtime was 16.721 s on my Ubuntu VM (running on a MacBook Pro, Intel Core i7-8559U). 16.631 s was spent in evolve().

  • I couldn't find heat_simple.py in the repository. Maybe heat_main.py?
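
The remark above that loops with dependencies (reductions, scans, etc.) are difficult to vectorize can be illustrated with a minimal pure-Python sketch. The function names `scale` and `prefix_sum` are hypothetical and chosen only for this illustration; they do not come from any of the discussions listed here. The first loop has independent iterations, while the second carries a running total from one iteration to the next.

```python
from itertools import accumulate


def scale(xs, a):
    # Each output element depends only on the matching input element,
    # so the iterations are independent of each other.
    return [a * x for x in xs]


def prefix_sum(xs):
    # Each output element depends on the running total from the previous
    # iteration (a loop-carried dependency), so the iterations must be
    # evaluated in order unless the algorithm is restructured.
    out = []
    total = 0
    for x in xs:
        total += x
        out.append(total)
    return out


data = [1, 2, 3, 4]
print(scale(data, 2))
print(prefix_sum(data))
# itertools.accumulate computes the same running sum.
print(list(accumulate(data)))
```

The independent loop maps directly onto SIMD lanes, whereas the scan generally needs a different formulation (for example, a tree-based parallel prefix sum) before it can be vectorized.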