Projects
From CCLwiki
UABgrid
UAB is continually building UABgrid, a campus-wide computing environment that connects HPC resources across campus and offers access to regional resources via SURAgrid, TeraGrid, and beyond. UABgrid leverages Globus technologies for system interconnectivity and Shibboleth for federated identity management. It enables the construction of automated research workflows by providing consistent user identities and system interfaces across all HPC resources on campus. Through ongoing research, UAB is unifying access to available systems and deploying applications in a manner that integrates grid-related technologies with the tools that applied scientists are accustomed to using. Our research in metaschedulers and workflow systems aims to provide seamless access to bioinformatics applications such as BLAST and NAMD. To maximize resource utilization and reduce application execution times for users, much work has been done on job scheduling. Through innovative scheduling methods, cluster applications have been successfully transitioned to grid nodes in a way that exploits the heterogeneity of the available hardware and improves overall application performance.
Dynamic BLAST
Dynamic BLAST is a master-worker, grid-enabled application that executes BLAST searches on available resources. In a distributed system such as the Grid, where numerous heterogeneous resources are available, we can do more than simply run the searches on whatever resources happen to be free. By recognizing and accommodating the features of the available resources, we can perform BLAST-specific resource selection that reduces total execution time. This is accomplished through locally developed metascheduling ideas that complement resource selection with algorithm matching. Resources are ranked so that the most appropriate ones are selected first and, depending on the type and size of each resource, the most appropriate BLAST algorithm is used as well (e.g., mpiBLAST vs. sequential BLAST). When these ideas are combined with sufficiently small query segments (determined automatically by Dynamic BLAST from the user's input data), load balancing can be done at a finer granularity, resulting in more accurate turnaround time predictions. The Dynamic BLAST project is currently being made available on UABgrid as the key application used for BLAST runs by bioinformaticians at UAB.
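The selection steps described above (rank resources, match an algorithm to each resource, split the query set into small segments) can be illustrated with a minimal Python sketch. This is not the Dynamic BLAST implementation. The Resource fields, the capacity score, the 8-core threshold for choosing mpiBLAST, and the greedy assignment rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical description of a grid resource (fields are assumptions)."""
    name: str
    free_cores: int        # assumed to be at least 1
    relative_speed: float  # 1.0 = baseline CPU


def capacity(resource):
    """Assumed capacity score: free cores weighted by relative CPU speed."""
    return resource.free_cores * resource.relative_speed


def rank_resources(resources):
    """Order resources by capacity score, best first."""
    return sorted(resources, key=capacity, reverse=True)


def choose_algorithm(resource, mpi_threshold=8):
    """Pick a BLAST variant by resource size; the threshold is an assumed value."""
    return "mpiBLAST" if resource.free_cores >= mpi_threshold else "sequential BLAST"


def split_queries(queries, segment_size):
    """Cut the query list into small segments so load can be balanced finely."""
    return [queries[i:i + segment_size] for i in range(0, len(queries), segment_size)]


def plan(resources, queries, segment_size=2):
    """Greedily give each segment to the resource expected to finish it earliest."""
    ranked = rank_resources(resources)
    load = {r.name: 0.0 for r in ranked}  # estimated busy time per resource
    assignments = []
    for segment in split_queries(queries, segment_size):
        cost = lambda r: len(segment) / capacity(r)
        best = min(ranked, key=lambda r: load[r.name] + cost(r))
        load[best.name] += cost(best)
        assignments.append((best.name, choose_algorithm(best), segment))
    return assignments
```

Smaller segments give the greedy step more decision points, which is one way to picture why finer granularity helps load balancing and turnaround prediction.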
Adaptive Parallel Genetic Algorithms
Genetic algorithms are a widely used technique for search and optimization problems and belong to the family of evolutionary algorithms. One of their recent applications has been in image clustering. Serial implementations of the algorithm, customized for image clustering, suffer from slow execution. In addition, a genetic algorithm run for only a small number of generations tends to get stuck in local minima and therefore requires a larger number of generations to achieve optimal results. These two factors motivated a parallel implementation of the algorithm. We identified the concurrency in the application and partitioned it so that I/O and computation were performed in parallel. Both the “Master-Worker” paradigm and the “All-Worker” paradigm were implemented in C and MPI. Distributing the computation-intensive task among multiple processors resulted in a linear speedup and a vastly reduced execution time. While the “Master-Worker” model is better suited to the grid environment, the “All-Worker” model performs better on dedicated clusters. We modified the “Master-Worker” implementation further so that it runs successfully and intelligently in a grid environment, where resources are heterogeneous, distributed across disparate locations, and can appear and disappear dynamically. This eventually turned out to be a scheduling problem that can be solved using operations research techniques.
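The “Master-Worker” structure can be sketched in a few lines of Python. This is only an illustration of the paradigm, not the project's C and MPI code: a thread pool stands in for MPI worker processes, the fitness function is a toy objective (counting 1 bits) rather than an image clustering measure, and the population size, mutation scheme, and other parameters are assumed values.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fitness(individual):
    """Toy objective (an assumption): the number of 1 bits in the genome."""
    return sum(individual)


def evolve(pop_size=20, genome_len=16, generations=30, workers=4, seed=0):
    """Master loop: workers score individuals, the master breeds the next generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    best_history = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Master farms out the expensive step (fitness evaluation) and gathers scores.
            scores = list(pool.map(fitness, population))
            best_history.append(max(scores))
            ranked = [ind for _, ind in sorted(zip(scores, population),
                                               key=lambda pair: pair[0],
                                               reverse=True)]
            # Master keeps the top half unchanged, then refills the population.
            parents = ranked[:pop_size // 2]
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, genome_len)
                child = a[:cut] + b[cut:]           # one-point crossover
                child[rng.randrange(genome_len)] ^= 1  # single-bit mutation
                children.append(child)
            population = parents + children
    return max(population, key=fitness), best_history
```

Because only the master touches the random number generator and the population, workers can be added or removed between generations without changing the result, which hints at why this model tolerates the dynamic resources of a grid better than a design in which every process holds shared state.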
BLAST performance analysis
The choice of algorithm within an application, the application and resource parameters, and the input data all affect the performance of a job. Thus, when submitting a job, a tested and established selection of job parameters can greatly influence application productivity on a single resource, and even more so across a heterogeneous resource pool. The goal of this project is to collect and analyze the parameters and performance variations observed when executing BLAST jobs using various algorithms, parameters, parameter values, and resources. Through a set of examples and benchmarks, we are working toward a set of observations about which parameters most influence execution time and the associated resource cost.
Application Specification Language (ASL)
Application Specification Language (ASL) is a new grid language that we have developed for application developers and end users to describe the details of a given application. ASL allows an individual application to be represented in the heterogeneous world of the grid by capturing its purpose, functionality, and options. Through ASL, application descriptions can be made available for immediate use, or for further development, by tools such as application deployers, automated interface generators, job schedulers, and application-specific on-demand help provisioning. ASL can also be used to describe how an application compares with, or can be combined with, other matching services and software. This ability to specify the composition of services can facilitate the creation of new and added functionality, and it enables further advancement of existing tools that can take advantage of the provided information.
Application Performance Database (AppDB)
The performance of an application is usually closely tied to the hardware and software characteristics of the resource it runs on, as well as to the application parameters used when the job is instantiated. As a result, executing applications and the associated user jobs in heterogeneous environments yields heterogeneous performance. Users perceive this variability as inconsistent job execution times and costs for identical or similar jobs run on different resources. To remedy this drawback, we are developing the Application Performance Database (AppDB), a tool that aims to give the community a way to store and later retrieve historical application performance information on a global scale. This enables the extraction of detailed relationships between an application and a resource, such as application architecture preference, CPU speed vs. CPU count preference, resource software stack preference, etc.
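The store-and-retrieve idea can be sketched with Python's built-in sqlite3 module. This is a hedged illustration only and does not reflect the actual AppDB design: the table layout, column names, and helper functions are hypothetical.

```python
import sqlite3


def open_db():
    """Create an in-memory store; the schema is a hypothetical illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE runs (
               application  TEXT,
               architecture TEXT,
               cpu_ghz      REAL,
               cpu_count    INTEGER,
               parameters   TEXT,
               runtime_s    REAL)"""
    )
    return conn


def record_run(conn, application, architecture, cpu_ghz, cpu_count,
               parameters, runtime_s):
    """Store one historical job execution."""
    conn.execute(
        "INSERT INTO runs VALUES (?, ?, ?, ?, ?, ?)",
        (application, architecture, cpu_ghz, cpu_count, parameters, runtime_s),
    )


def mean_runtime_by_architecture(conn, application):
    """Return (architecture, mean runtime) pairs, fastest architecture first."""
    cursor = conn.execute(
        "SELECT architecture, AVG(runtime_s) FROM runs "
        "WHERE application = ? "
        "GROUP BY architecture ORDER BY AVG(runtime_s)",
        (application,),
    )
    return cursor.fetchall()
```

A query like the last one is one way an "application architecture preference" could be read out of accumulated history. Similar GROUP BY queries over cpu_ghz and cpu_count would address the CPU speed vs. CPU count question.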
