Sample of Current Research Projects

Tools and Analysis for Biopharmaceutical Operations

Biopharmaceuticals play a significant role in the US economy and health care system, and major pharmaceutical fi rms are increasingly acquiring biopharmaceutical firms, perceiving that biotechnology is the future of medicine in this country and the world. However, although the science of biotechnology is advancing rapidly, the manufacturing and logistics of biotechnology is not necessarily keeping pace. There is a tremendous opportunity to help the biopharmaceutical industry achieve its potential, by more efficiently designing, operating, and managing its production, logistics, and supply chain processes. The promise of biopharmaceuticals to play a key role in the healthcare system of the future requires systems that can produce and deliver products safely, reliably, and cost effectively to patients, while allowing biopharmaceutical fi rms to successfully navigate the many risks inherent in the industry. The goal of this research project is to develop tools, techniques, and approaches that make this possible. In collaboration with a variety of local biopharmaceutical firms, we are exploring strategic, tactical, and operational questions around capacity expansion, production planning, risk management, inventory planning, scheduling, and supply chain management. For example, in a recent preprint, we explore novel approaches to capacity expansion in biopharmaceutical firms: Production Capacity Investment with Data Updates

Energy-Aware Scheduling on Heterogeneous Processors

Energy cost has become the critical cost factor for server farms, and there is also increasing concern over the heat and carbon emissions generated. No longer is faster necessarily better; both speed and energy consumption must be considered in the operation of these systems, and often faster servers require more energy. Hence the emphasis is on providing good service while maintaining energy efficiency. We are doing research on dynamically scheduling jobs on servers that are heterogeneous in terms of both their speeds and energy usages in order to minimize both congestion and energy costs.

Resilient networks

Networks are vulnerable to different types of risk ranging from variable operating performance of the components, to random failures, to natural or human disturbances. We would like our networks to be resilient against variability, failures, and disturbances so that they may continue to function effectively. Therefore, we build redundancy into our network systems as a safeguard or insurance to ensure an adequate service level. Many fundamental questions arise in this context. How does one measure resilience? What is the right level of resilience? How does one achieve a desired level of resilience in an optimal way? Whereas random behavior of a component or its response to an outside disturbance may be easy to understand, it is the interconnection among possibly thousands of components that makes a network's vulnerability di cult to analyze. In particular, the combinatorial nature of the decisions makes this analysis very challenging as the corresponding mathematical representations become non-convex problems requiring an excessive computational effort. Our research project undertakes a systematic study of fundamental problems related to networks and risk with the goal of developing new theories, models, and effective computational methods that are not only useful for identifying critical vulnerabilities in our networks, but also for efficient allocation of scarce resources to mitigate those vulnerabilities. 
Link to a recent paper.

Disambiguation and co-authorship networks of the U.S. Patent Inventor Database (1975-2010)

Abstract: Research into invention, innovation policy, and technology strategy can greatly benefit from an accurate understanding of inventor careers. The United States Patent and Trademark Office does not provide unique inventor identifiers, however, making large-scale studies challenging. Many scholars of innovation have implemented ad-hoc disambiguation methods based on string similarity thresholds and string comparison matching; such methods have been shown to be vulnerable to a number of problems that can adversely affect research results. The authors address this issue contributing 1) an application of the authority disambiguation approach (Torvik, Smalheiser, et al., 2005; 2009) to the US utility patent database, 2) a new iterative blocking scheme that expands the match space of this algorithm while maintaining scalability, 3) a public posting of the algorithm and code, and 4) a public posting of the results of the algorithm in the form of a database of inventors and their associated patents. The paper provides an overview of the disambiguation method, assesses its accuracy, and calculates network measures based on co-authorship and collaboration variables. It illustrates the potential for large-scale innovation studies across time and space with visualizations of inventor mobility across the United States. The complete input and results data from the original disambiguation are available at (; revised data described here are at (; original and revised code is available at (

An Algorithm for Computing Customized 3D Printed Implants with Curvature Constrained Channels for Enhancing Intracavitary Brachytherapy Radiation Delivery

Abstract— Brachytherapy is a widely-used treatment modality for cancer in many sites in the body. In brachytherapy, small radioactive sources are positioned proximal to cancerous tumors. An ongoing challenge is to accurately place sources on a set of dwell positions to sufficiently irradiate the tumors while limiting radiation damage to healthy organs and tissues. In current practice, standardized applicators with internal channels are inserted into body cavities to guide the sources. These standardized implants are one-size-fits-all and are prone to shifting inside the body, resulting in suboptimal dosages. We propose a new approach that builds on recent results in 3D printing and steerable needle motion planning to create customized implants containing customized curvature-constrained internal channels that fit securely, minimize air gaps, and precisely guide radioactive sources through printed channels. When compared with standardized implants, customized implants also have the potential to provide better coverage: more potential source dwell positions proximal to tumors. We present an algorithm for computing curvature-constrained channels based on rapidly-expanding randomized trees (RRT). We consider a prototypical case of OB/GYN cervical and vaginal cancer with three treatment options: standardized ring implant (current practice), customized implant with linear channels, and customized implant with curved channels. Results with a two-parameter coverage metric suggest that customized implants with curved channels can offer significant improvement over current practice.


A WWW-based interactive repository and testbed for optimization algorithms. We propose to create a world wide web (www) site containing a collection of software for optimization. In addition to serving the research community, the site will also serve end users in government, industry and commerce with a readily available and easy to understand interface.

A recent project explores a model for clustering which combines two criteria: Given a collection of objects with pairwise similarity measure, the problem is to find a cluster that is as dissimilar as possible from the complement, while having as much similarity as possible within the cluster. The two objectives are combined either as a ratio or with linear weights. The ratio problem, and its linear weighted version, are solved by a combinatorial algorithm within the complexity of a single minimum s,t-cut algorithm. We call this problem "the normalized cut prime" (NC') as it is closed related to the NP-hard problem of normalized cut. The relationship of NC' to normalized cut is generalized to a problem we call "q-normalized cut". It is shown that the spectral method that solves for the Fielder eigenvector of a related matrix is a continuous relaxation of the problem. In contrast, the generalization of the combinatorial algorithm solves a discrete problem resulting from a relaxation of a single sum constraint. We study the relationship between these two relaxations and demonstrate a number of advantages for the combinatorial algorithm. These advantages include a better approximation, in practice, of the normalized cut objective for image segmentation benchmark problems, and better complexity. On image segmentation benchmark, the NC' delivers a superior performance to that of the spectral method. NC' was furthermore generalized to apply, as a supervised machine learning technique, to data mining instances. The comparison of NC' to leading machine learning techniques on datasets selected from the UCI data mining benchmark. demonstrate that NC' has the highest average accuracy. Dorit S. Hochbaum. A polynomial time algorithm for Rayleigh ratio on discrete variables: Replacing spectral techniques for expander ratio, normalized cut and Cheeger constant. Operations Research , Vol. 61, No. 1, January-February 2013, pp. 184-198. A preprint is available.