Keichi Takahashi
I am a computer scientist at Nara Institute of Science and Technology (NAIST) working on high-performance computing.
Location: Japan
Activity
-
Keichi Takahashi made a comment
Pros: MPI is a highly optimized library with a long history. It's hard to beat MPI's performance and functionality with your own implementation on top of sockets, RDMA verbs, etc.
Cons: MPI is too low-level. The fact that MPI gives you full control over what data is communicated between which processes also means that you need to describe every...
-
Keichi Takahashi made a comment
I wonder why there is no Isendrecv (a non-blocking Sendrecv).
-
Keichi Takahashi made a comment
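It may be worth noting that the same code does not hang if the message is small enough (< 2 KB in my environment). Small messages are sent with the eager protocol: the data is sent immediately and buffered by the MPI library, so the send can complete before the matching receive is posted.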
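The eager-versus-rendezvous behaviour can be sketched with a toy model in plain Python (this is not MPI). Two threads stand in for two ranks that each send first and then receive, which is an assumption about what "the same code" looks like. `ToyChannel`, `EAGER_LIMIT`, and `exchange` are hypothetical names; the 2048-byte threshold mirrors the one observed above, whereas real eager limits are implementation- and configuration-specific. Instead of hanging forever, the rendezvous send gives up after a timeout so the example terminates.

```python
import queue
import threading

# Hypothetical eager threshold in bytes, mirroring the < 2 KB observed above;
# real MPI libraries use implementation- and configuration-specific limits.
EAGER_LIMIT = 2048


class ToyChannel:
    """Toy one-way link between two 'ranks' (threads), not a real MPI API.

    Small messages take an eager-like path: the data is buffered and send()
    returns at once. Large messages take a rendezvous-like path: send() waits
    until the receiver has taken the message (or the timeout expires).
    """

    def __init__(self):
        self._queue = queue.Queue()

    def send(self, msg, timeout=None):
        delivered = threading.Event()
        self._queue.put((msg, delivered))
        if len(msg) < EAGER_LIMIT:
            return True  # eager: data is buffered, the sender moves on
        return delivered.wait(timeout)  # rendezvous: wait for the receiver

    def recv(self):
        msg, delivered = self._queue.get()
        delivered.set()  # lets a waiting rendezvous sender complete
        return msg


def exchange(size, timeout=0.5):
    """Both ranks send first, then receive (assumed shape of 'the same code')."""
    to_1, to_0 = ToyChannel(), ToyChannel()
    results = {}

    def rank(me, out_channel, in_channel):
        if not out_channel.send(b"x" * size, timeout=timeout):
            results[me] = "stuck in send"  # would hang forever without the timeout
            return
        results[me] = len(in_channel.recv())

    threads = [
        threading.Thread(target=rank, args=(0, to_1, to_0)),
        threading.Thread(target=rank, args=(1, to_0, to_1)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(exchange(1024))  # small message: both ranks finish
    print(exchange(4096))  # large message: both ranks block in send
```

In this model, reordering one rank to receive first, or using a combined exchange such as Sendrecv, removes the dependence on the eager threshold.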
-
Keichi Takahashi made a comment
Cython is a good choice if you have an existing Python code base that you need to make "moderately" faster. However, if you want to squeeze out every last bit of performance, using CFFI or f2py makes more sense.
To maximize performance with Cython, you need to statically type variables, disable bounds checking, etc. In the end, you lose all the nice...
-
Keichi Takahashi made a comment
Cython: 0.032 s
CFFI: 0.015 s
f2py: 0.019 s
Even though f2py was slightly slower than CFFI, it's probably the easiest of the three, requiring minimal code changes.
-
Keichi Takahashi made a comment
Cython: 0.032 s
CFFI: 0.015 s
~2x speedup using CFFI. Considering that the C library is built with -O3, Cython is actually not bad!
-
Keichi Takahashi made a comment
Pure Python: 17.34 s
Cythonized and optimized: 0.03 s
It took me a while to figure out I had to cdef the loop counters...
-
Keichi Takahashi made a comment
Here's an iterative implementation that I came up with: https://gist.github.com/keichi/4cb14484ab68c685ec729a6cf8232530
-
Keichi Takahashi made a comment
Loops with dependencies (reductions, scans, etc.) are difficult to vectorize.
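To make the dependency concrete, here is a plain-Python sketch; all function names are hypothetical, and Python itself does not vectorize these loops, so the point is only the shape of the dependencies. An elementwise loop has independent iterations, while a scan (and a naive reduction) chains every iteration to the previous one. The last two functions show common restructurings: several partial sums for a reduction, and a blocked scan. Both expose independent work at the cost of extra additions and, for floating-point data, a different rounding order.

```python
def elementwise_add(a, b):
    # Each iteration is independent: c[i] depends only on a[i] and b[i],
    # so iterations can run in any order (e.g. several per SIMD register).
    return [x + y for x, y in zip(a, b)]


def inclusive_scan(a):
    # Loop-carried dependency: out[i] needs out[i - 1], so iteration i
    # cannot start until iteration i - 1 has finished.
    out = []
    running = 0
    for x in a:
        running += x
        out.append(running)
    return out


def reduce_with_partial_sums(a, lanes=4):
    # A reduction into one accumulator has the same chain of dependencies.
    # Keeping `lanes` independent partial sums (one per SIMD lane) breaks it;
    # for floats this changes the order of additions and hence the rounding.
    partial = [0] * lanes
    for i, x in enumerate(a):
        partial[i % lanes] += x
    return sum(partial)


def blocked_scan(a, block=4):
    """Blocked scan: local scans per block, then add each block's offset."""
    blocks = [a[i:i + block] for i in range(0, len(a), block)]
    # Step 1: local scans; blocks are independent of each other
    local = [inclusive_scan(b) for b in blocks]
    # Step 2: short sequential scan over block totals gives each block's offset
    offsets = [0] + inclusive_scan([s[-1] for s in local])[:-1]
    # Step 3: add the offsets; independent per element
    result = []
    for offset, s in zip(offsets, local):
        result.extend(v + offset for v in s)
    return result


if __name__ == "__main__":
    data = list(range(1, 11))
    print(inclusive_scan(data))
    print(blocked_scan(data, block=3))
```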
-
Keichi Takahashi made a comment
dx=0.1 S=0.9296501041259554
dx=0.01 S=0.9992078366500485
dx=0.001 S=0.9992037149227544
dx=0.0001 S=0.9999036736218789
dx=1e-05 S=0.9999936732092701
dx=1e-06 S=0.9999996732051449
-
Total runtime was 16.721 s on my Ubuntu VM (running on a MacBook Pro, Intel Core i7-8559U). Of that, 16.631 s was spent in evolve().
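A breakdown like this (total time versus time inside evolve()) can be obtained with Python's built-in cProfile module. The sketch assumes nothing about the actual heat-equation code: `evolve`, `iterate`, and `profile_run` are hypothetical toy stand-ins (a pure-Python explicit 2D heat step on nested lists), and the measured times will differ from the numbers above.

```python
import cProfile
import pstats


def evolve(u, a, dt, dx2, dy2):
    """Hypothetical toy stand-in for evolve(): one explicit heat-equation
    step on a 2D list-of-lists grid, keeping the boundary values fixed."""
    nx, ny = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            new[i][j] = u[i][j] + a * dt * (
                (u[i + 1][j] - 2 * u[i][j] + u[i - 1][j]) / dx2
                + (u[i][j + 1] - 2 * u[i][j] + u[i][j - 1]) / dy2
            )
    return new


def iterate(u, nsteps, a=0.5, dx=0.01, dy=0.01):
    dx2, dy2 = dx * dx, dy * dy
    # Largest stable time step for the explicit 2D scheme
    dt = dx2 * dy2 / (2 * a * (dx2 + dy2))
    for _ in range(nsteps):
        u = evolve(u, a, dt, dx2, dy2)
    return u


def profile_run(nsteps=20, n=50):
    u = [[0.0] * n for _ in range(n)]
    u[n // 2][n // 2] = 1.0  # point heat source in the middle
    profiler = cProfile.Profile()
    profiler.enable()
    iterate(u, nsteps)
    profiler.disable()
    return pstats.Stats(profiler)


if __name__ == "__main__":
    stats = profile_run()
    # Sort by cumulative time so iterate() and evolve() appear near the top
    stats.sort_stats("cumulative").print_stats(5)
```

The same report for an existing script can also be produced from the command line with `python -m cProfile -s cumulative script.py`.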
-
I couldn't find heat_simple.py in the repository. Maybe heat_main.py?