Projects

04 :: COMPUTE MODULES

Budgeted Maximum Weight Clique Solver

High-performance, distributed C++ solver for the Budgeted Maximum Weight Clique (BMWC) problem. Leverages a parallelized Branch and Bound algorithm distributed across MPI processes, containerized via Docker for seamless serial and parallel execution.

C++ · MPI · Docker · HPC · Algorithms

Source code ↗
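For readers unfamiliar with the technique, the following is a minimal serial sketch of branch and bound on a clique problem. It is not taken from the project's source. It assumes one common reading of BMWC: each vertex carries a non-negative weight and a cost, and the goal is a clique of maximum total weight whose total cost stays within a budget. The `Instance` layout and the names `expand` and `solveBmwc` are hypothetical, and the MPI distribution of subproblems is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instance layout (an assumption, not the project's data model).
struct Instance {
    std::vector<std::vector<bool>> adj;  // adj[u][v] is true when u and v share an edge
    std::vector<int> weight;             // weight[v] >= 0: value gained by picking v
    std::vector<int> cost;               // cost[v]: budget consumed by picking v
    int budget;                          // total cost of the clique must not exceed this
};

// Recursive search. `cand` holds the vertices adjacent to every vertex picked so far.
void expand(const Instance& g, const std::vector<int>& cand,
            int curWeight, int curCost, int& best) {
    if (curWeight > best) best = curWeight;  // new incumbent

    // Bound: even taking every remaining candidate cannot beat the incumbent.
    int remaining = 0;
    for (int v : cand) remaining += g.weight[v];
    if (curWeight + remaining <= best) return;

    for (std::size_t i = 0; i < cand.size(); ++i) {
        const int v = cand[i];
        if (curCost + g.cost[v] > g.budget) continue;  // v would break the budget

        // Keep only later candidates adjacent to v, so each clique is visited once.
        std::vector<int> next;
        for (std::size_t j = i + 1; j < cand.size(); ++j)
            if (g.adj[v][cand[j]]) next.push_back(cand[j]);

        expand(g, next, curWeight + g.weight[v], curCost + g.cost[v], best);
    }
}

// Returns the best total weight found; 0 corresponds to the empty clique.
int solveBmwc(const Instance& g) {
    assert(g.adj.size() == g.weight.size() && g.weight.size() == g.cost.size());
    std::vector<int> all(g.adj.size());
    for (std::size_t v = 0; v < all.size(); ++v) all[v] = static_cast<int>(v);
    int best = 0;
    expand(g, all, 0, 0, best);
    return best;
}
```

The bound used here (sum of all remaining candidate weights) is deliberately simple; tighter bounds such as coloring-based ones prune more but are beyond this sketch.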

Accelerating Compute-Intensive Kernels

Accelerating kernels using x86 architecture features — AVX2, AVX-512, NUMA topology, and multi-core hierarchies. Includes production-grade GEMM and Smith-Waterman sequence alignment optimizations with measured speedups.

> Collaborators: Yosep (Joseph) Ro ↗

C++ · HPC · AVX-512 · OpenMP · NUMA

GEMM ↗ Smith-Waterman ↗
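As context for the Smith-Waterman work, here is a scalar reference version of the local-alignment score with a linear gap penalty, the kind of baseline that vectorized kernels are typically measured against. It is an illustrative sketch, not the project's code: the `Scoring` struct, its default values, and the function name `smithWatermanScore` are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative scoring parameters (assumed values, not the project's).
struct Scoring {
    int match = 2;
    int mismatch = -1;
    int gap = -1;  // linear gap penalty, expected to be <= 0
};

// Returns the best local-alignment score between a and b.
// Only two rows of the DP matrix are kept, so memory is O(|b|).
int smithWatermanScore(const std::string& a, const std::string& b,
                       const Scoring& s = Scoring()) {
    assert(s.gap <= 0);
    std::vector<int> prev(b.size() + 1, 0), curr(b.size() + 1, 0);
    int best = 0;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = 0;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? s.match : s.mismatch);
            const int up = prev[j] + s.gap;        // gap in b
            const int left = curr[j - 1] + s.gap;  // gap in a
            curr[j] = std::max({0, diag, up, left});  // clamp at 0: alignment may restart
            best = std::max(best, curr[j]);
        }
        std::swap(prev, curr);
    }
    return best;
}
```

The dependency of each cell on its left, upper, and diagonal neighbours is what makes this loop hard to vectorize directly; SIMD implementations usually reorganize the traversal (for example along anti-diagonals or in striped layouts) to expose independent lanes.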

gem5 Architecture Enhancements

Implementation of advanced cache replacement policies (RRIP, SHiP) and branch predictors (TAGE, Perceptron) in the gem5 full-system simulator, with IPC benchmarking across SPEC workloads.

C++ · Python · gem5 · Microarchitecture

Cache policies ↗
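To illustrate the idea behind RRIP, the sketch below models a single cache set under static RRIP (SRRIP) with hit-priority promotion: new lines are inserted with a "long" re-reference prediction, hits reset the prediction to "near-immediate", and the victim is any line predicted "distant". This is a standalone model written for this page; it does not use gem5's replacement-policy interfaces, and the class name `SrripSet` and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One cache set managed with SRRIP. Each line keeps a small re-reference
// prediction value (RRPV): 0 means "reused soon", maxRrpv_ means "distant".
class SrripSet {
    struct Line {
        std::uint64_t tag = 0;
        unsigned rrpv = 0;
        bool valid = false;
    };

public:
    explicit SrripSet(std::size_t ways, unsigned rrpvBits = 2)
        : maxRrpv_((1u << rrpvBits) - 1), lines_(ways) {
        assert(ways > 0 && rrpvBits >= 1 && rrpvBits < 8);
    }

    // Returns true on a hit. On a miss, evicts a victim and fills the line.
    bool access(std::uint64_t tag) {
        for (Line& l : lines_) {
            if (l.valid && l.tag == tag) {
                l.rrpv = 0;  // hit promotion: predict near-immediate reuse
                return true;
            }
        }
        Line& victim = pickVictim();
        victim.tag = tag;
        victim.rrpv = maxRrpv_ - 1;  // insert with a "long" prediction
        victim.valid = true;
        return false;
    }

private:
    Line& pickVictim() {
        for (Line& l : lines_)
            if (!l.valid) return l;  // use empty ways first
        for (;;) {
            for (Line& l : lines_)
                if (l.rrpv == maxRrpv_) return l;  // first "distant" line
            for (Line& l : lines_) ++l.rrpv;       // none found: age every line
        }
    }

    unsigned maxRrpv_;
    std::vector<Line> lines_;
};
```

In a 2-way set, the access sequence 1, 2, 1, 3 evicts tag 2 rather than tag 1, because the hit on tag 1 lowered its RRPV before the miss on tag 3 forced an eviction.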

Lorenz Attractor Visualization

High-performance visualization of a chaotic system, combining OpenMP task parallelism, MPI inter-node communication, and CUDA kernel execution with real-time OpenGL rendering. Benchmarked for strong and weak scaling.

CUDA · MPI · OpenMP · OpenGL · C

Source code ↗
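The numerical core of any Lorenz visualization is integrating the three coupled equations dx/dt = σ(y − x), dy/dt = x(ρ − z) − y, dz/dt = xy − βz. The sketch below shows one way to advance a single trajectory with a fourth-order Runge-Kutta step on the CPU. It is not the project's code: the choice of RK4, the classic parameter values (σ = 10, ρ = 28, β = 8/3), and the names `State`, `LorenzParams`, `derivative`, and `rk4Step` are all assumptions for illustration, and the OpenMP, MPI, and CUDA layers are omitted.

```cpp
#include <cassert>
#include <cmath>

struct State {
    double x, y, z;
};

// Classic parameter values, assumed here for illustration.
struct LorenzParams {
    double sigma = 10.0;
    double rho = 28.0;
    double beta = 8.0 / 3.0;
};

// Right-hand side of the Lorenz system evaluated at s.
State derivative(const State& s, const LorenzParams& p) {
    State d;
    d.x = p.sigma * (s.y - s.x);
    d.y = s.x * (p.rho - s.z) - s.y;
    d.z = s.x * s.y - p.beta * s.z;
    return d;
}

// Helper: returns s + h * d, component by component.
State advance(const State& s, const State& d, double h) {
    State r;
    r.x = s.x + h * d.x;
    r.y = s.y + h * d.y;
    r.z = s.z + h * d.z;
    return r;
}

// One classical RK4 step of size dt.
State rk4Step(const State& s, double dt, const LorenzParams& p = LorenzParams()) {
    assert(std::isfinite(dt) && dt > 0.0);
    const State k1 = derivative(s, p);
    const State k2 = derivative(advance(s, k1, dt / 2.0), p);
    const State k3 = derivative(advance(s, k2, dt / 2.0), p);
    const State k4 = derivative(advance(s, k3, dt), p);
    State out;
    out.x = s.x + dt / 6.0 * (k1.x + 2.0 * k2.x + 2.0 * k3.x + k4.x);
    out.y = s.y + dt / 6.0 * (k1.y + 2.0 * k2.y + 2.0 * k3.y + k4.y);
    out.z = s.z + dt / 6.0 * (k1.z + 2.0 * k2.z + 2.0 * k3.z + k4.z);
    return out;
}
```

Because each trajectory depends only on its own previous state, many initial conditions can be stepped independently, which is what makes the problem a natural fit for one-thread-per-trajectory GPU kernels.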

VLIW Processor Simulator

32-bit 5-stage pipelined VLIW processor simulation with a custom assembly parser, register file monitoring, and pipeline hazard detection — implemented in Python/Verilog.

Python · Verilog · Computer Architecture

Source code ↗