Department Seminar by Dr. Douglas Thain
Updated on Wed, 10/12/2011 - 1:59pm
High Throughput Scientific Computing with Condor: Computer Science Challenges in Large Scale Parallelism
Speaker: Dr. Douglas Thain
When: Thu, 10/27/2011 - 11:00am - 12:15pm
Room: CH 430
New discoveries in many fields of science and engineering are limited by aggregate computing throughput. How many results can one compute in total over the course of a week, month, or year? By deploying the Condor distributed computing system, one can harness all of the computing resources available at an institution to deliver sustained, high throughput computing on thousands of cores for a large community of users. This presents an opportunity and a challenge: one now has easy access to thousands of cores, but how can one easily write fault-tolerant programs with thousand-way parallelism?
To address these problems, my research group at Notre Dame has developed a variety of application frameworks for large scale computing. These frameworks allow the end user to specify very large applications with massive parallelism, and then execute them on Condor as well as other kinds of clusters, clouds, and grids. Internally, these frameworks present a number of interesting CS challenges related to distributed systems, compilers, and databases. I will explain our experience in using these frameworks to enable seamless computing in fields such as bioinformatics, economics, image processing, high energy physics, and molecular dynamics.
This talk will be accessible to both computer scientists and experts in other domains that rely upon large scale computing.
Douglas Thain is an Associate Professor in the Department of Computer Science and Engineering at the University of Notre Dame, where he directs the Cooperative Computing Lab. Douglas received the B.S. in Physics from the University of Minnesota and the M.S. and Ph.D. in Computer Sciences from the University of Wisconsin, where he contributed to the Condor distributed computing system. His research team at Notre Dame creates software systems that are used around the world to attack large scale data intensive problems in science and engineering.