海角社区 CSE Professor Awarded $300,000 NSF Grant
June 04, 2025
海角社区 Division of Computer Science and Engineering Professor Rahul Shah was recently
awarded a $300,315 National Science Foundation grant for his research involving parameterized
pattern matching and compressed data structures. Shah鈥檚 research will affect how text
data is stored and will balance compression with fast query performance.
鈥淐omputing has ushered us into a whole new era of possibilities and discoveries as more and more data and computing resources are available,鈥 Shah said. 鈥淲hile throwing computational resources at a problem has been a successful initial approach, eventually, the efficiency of resources becomes important. Theoretical computer science rigorously studies the difficulties of problems based on resource limitations like time, energy, space, compressibility, input/output, etc. For the text data, space resources and compressibility become important factors.鈥
Shah says that even though there is a huge amount of data generated, there is a lot of redundancy in data.
鈥淭his makes data highly compressible,鈥 he said. 鈥淚n the field of compressed data structures, the goal is to develop data structures which take space nearly optimal with respect to the best compression while at the same time can answer the queries as fast as the best uncompressed data structure can answer. This project considers parameterized pattern matching problem and constructing compressed index for it. The problem has direct applicability in software plagiarism detection, cloning and versioning systems. Many of the results and components considered in this project are likely good inclusion for course projects and homework problems.鈥
In parameterized pattern matching, the text consists of parameterized characters and static characters. Two strings are parameterized match if there is a bijective mapping between parameterized alphabets of the two strings, which when applied to the characters of one of the strings, it becomes equal to the other. For example, two strings consisting of parameterized alphabets 鈥渁bcdaab鈥 and 鈥渨xyzwwx鈥 are considered the same because a maps to w, b maps to x, c maps to y, and d maps to z, while no such consistent mapping will exist if the strings were 鈥渁aba鈥 and 鈥測yzw.鈥 In the context of comparing two computer programs, these parameterized symbols represent the variable names.
Shah, along with his previous graduate students, introduced succinct and compact indexes based on what they call parameterized Burrows-Wheeler Transform (pBWT). In this project, Shah aims to define entropy-based compression measures for parameterized text and achieve an index with such space occupancies; explore the construction aspects of parameterized index, including implementation issues like memory bottleneck and external memory construction; and obtain best space-time tradeoff results. Apart from these, the project will also focus on various related problems including compressed indexing for two-dimensional pattern matching.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
鈥淲e are incredibly proud of Dr. Rahul Shah for receiving this prestigious NSF grant,鈥 海角社区 Division of Computer Science and Engineering Chair Ibrahim 鈥淎be鈥 Bagilli said. 鈥淗is work in parameterized pattern matching is advancing the boundaries of what鈥檚 possible in data processing and algorithms. At the 海角社区 Division of Computer Science and Engineering, we鈥檙e committed to supporting impactful research like this that drives innovation and strengthens 海角社区鈥檚 role as a leader in computing.鈥