04 :: COMPUTE MODULES
Projects
Budgeted Maximum Weight Clique Solver
High-performance distributed C++ solver for the Budgeted Maximum Weight Clique (BMWC) problem. It runs a parallel branch-and-bound search across MPI processes and ships as a Docker container that supports both serial and parallel execution.
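As a rough illustration of the branch-and-bound idea, here is a minimal serial sketch. It assumes the common BMWC formulation in which every vertex has a non-negative weight and a cost, and the goal is the heaviest clique whose total cost fits the budget. The `Instance` layout, the function names, and the simple sum-of-weights bound are hypothetical choices for this sketch, not the project's actual code, and MPI distribution is left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical instance layout, for illustration only.
struct Instance {
    std::vector<std::vector<bool>> adj; // adj[u][v] is true when u and v are adjacent
    std::vector<int> weight;            // non-negative weight of each vertex
    std::vector<int> cost;              // cost of each vertex
    int budget = 0;                     // maximum total cost of the clique
};

// Grow the current clique using `cand`, whose members are all adjacent
// to every vertex already chosen. `best` holds the incumbent weight.
void branch(const Instance& g, const std::vector<int>& cand,
            int curWeight, int curCost, int& best) {
    if (curWeight > best) best = curWeight;

    // Bound: taking every remaining candidate is an optimistic estimate.
    // If even that cannot beat the incumbent, prune this subtree.
    int upper = curWeight;
    for (int v : cand) upper += g.weight[v];
    if (upper <= best) return;

    for (std::size_t i = 0; i < cand.size(); ++i) {
        const int v = cand[i];
        if (curCost + g.cost[v] > g.budget) continue; // would exceed the budget

        // Keep only later candidates adjacent to v, so each clique is
        // enumerated once and the clique property is preserved.
        std::vector<int> next;
        for (std::size_t j = i + 1; j < cand.size(); ++j)
            if (g.adj[v][cand[j]]) next.push_back(cand[j]);

        branch(g, next, curWeight + g.weight[v], curCost + g.cost[v], best);
    }
}

// Returns the best clique weight found within the budget (0 for the empty clique).
int solveBmwc(const Instance& g) {
    std::vector<int> all(g.weight.size());
    for (std::size_t i = 0; i < all.size(); ++i) all[i] = static_cast<int>(i);
    int best = 0;
    branch(g, all, 0, 0, best);
    return best;
}
```

In a distributed variant, one plausible design is to hand each process a different set of top-level candidates and periodically share the incumbent `best`, so that every process prunes against the tightest known bound.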
Accelerating Compute-Intensive Kernels
Accelerates compute-intensive kernels by exploiting x86 hardware features: AVX2 and AVX-512 SIMD, NUMA topology, and the multi-core hierarchy. Includes optimized GEMM and Smith-Waterman sequence alignment kernels, each with measured speedups.
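For context, the recurrence that a vectorized Smith-Waterman kernel has to reproduce can be written as a scalar reference. The sketch below is a plain two-row dynamic program with a linear gap penalty. The function name and the scoring values (match +2, mismatch -1, gap -1) are assumptions made for illustration, and no SIMD is used here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Scalar reference: best local alignment score of a and b.
// Scoring parameters are illustrative defaults, not the project's values.
int smithWatermanScore(const std::string& a, const std::string& b,
                       int match = 2, int mismatch = -1, int gap = -1) {
    // Only two rows of the DP matrix are live at any time.
    std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    int best = 0;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = 0;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            const int up   = prev[j] + gap;     // gap in b
            const int left = cur[j - 1] + gap;  // gap in a
            // Local alignment: a cell never drops below zero.
            cur[j] = std::max({0, diag, up, left});
            best = std::max(best, cur[j]);
        }
        std::swap(prev, cur);
    }
    return best;
}
```

The dependency of each cell on its left neighbour is what makes row-wise vectorization awkward. SIMD implementations typically reorganize the computation, for example along anti-diagonals, so that independent cells can be computed in one vector instruction.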
gem5 Architecture Enhancements
Implements advanced cache replacement policies (RRIP, SHiP) and branch predictors (TAGE, Perceptron) in the gem5 full-system simulator, with IPC benchmarked across SPEC workloads.
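To show what an RRIP policy does, here is a minimal standalone sketch of static RRIP (SRRIP) with hit promotion for a single cache set, using 2-bit re-reference prediction values (RRPVs). The `SrripSet` class and its interface are hypothetical and do not use gem5's replacement-policy API. They only model the victim selection and RRPV updates.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kMaxRrpv = 3; // 2-bit RRPV: 3 means "re-referenced in the distant future"

// One cache set managed with SRRIP (hypothetical, simulator-independent model).
class SrripSet {
public:
    explicit SrripSet(std::size_t ways) : lines_(ways) {}

    // Returns true on a hit, false on a miss (the tag is then inserted).
    bool access(std::uint64_t tag) {
        for (Line& l : lines_) {
            if (l.valid && l.tag == tag) {
                l.rrpv = 0; // hit promotion: predict near-immediate re-reference
                return true;
            }
        }
        Line& victim = findVictim();
        victim.tag = tag;
        victim.valid = true;
        victim.rrpv = kMaxRrpv - 1; // insert with a "long" re-reference prediction
        return false;
    }

private:
    struct Line {
        std::uint64_t tag = 0;
        std::uint8_t rrpv = kMaxRrpv;
        bool valid = false;
    };

    Line& findVictim() {
        for (Line& l : lines_)
            if (!l.valid) return l; // fill empty ways first
        for (;;) {
            for (Line& l : lines_)
                if (l.rrpv == kMaxRrpv) return l;
            // No line is predicted distant yet: age every line and retry.
            for (Line& l : lines_) ++l.rrpv;
        }
    }

    std::vector<Line> lines_;
};
```

Inserting new lines at `kMaxRrpv - 1` rather than 0 is what gives SRRIP its scan resistance: a line that is never reused ages out quickly, while a line that hits once is promoted and protected.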
Lorenz Attractor Visualization
High-performance visualization of the Lorenz chaotic system, combining OpenMP task parallelism, MPI inter-node communication, and CUDA kernel execution with real-time OpenGL rendering. Benchmarked for strong and weak scaling.
VLIW Processor Simulator
Simulation of a 32-bit, 5-stage pipelined VLIW processor with a custom assembly parser, register file monitoring, and pipeline hazard detection, implemented in Python and Verilog.