
My research addresses challenges at the interface of operations and finance arising in the energy industry using reinforcement learning and optimization. I am particularly interested in the operations of commodity and energy conversion assets (e.g., storage, transport, and production), their valuation and risk management, the role of corporations in the clean energy transition, and the social impact of this transition.
Influenced by my energy research and work with industry, I am passionate about making reinforcement learning and optimization methods easier to adopt by reducing the effort needed to deploy them effectively across different applications. To this end, I develop methods that embed intelligence into the solution process and can adapt over time and to instance-specific information within a problem class.
Submitted Journal Papers
Energy and the Environment
 Decarbonizing buildings via energy
demand response and deep reinforcement learning: The deployment value of
supervisory planning and guardrails.
(with D. Jang, L. Spangher, C. Spanos). [pdf]
Energy demand response is projected to be critically important in
decarbonizing the grids of the future. We propose an agent to communicate
``artificial'' price signals (i.e., prices for each hour) to workers as part of an office building demand response program, where workers react to this price signal by shifting their energy consumption to off-peak and less carbon-intensive hours of the day. The efficacy of this agent depends on its ability to simultaneously learn how workers respond to price signals and choose them such that the building's energy costs with respect to a time-of-use utility rate are reduced. High energy costs can hinder deployment if the agent's price signals
are suboptimal when learning energy consumption. We assess the value of deep
reinforcement learning (RL) technology in mitigating this deployment hurdle.
Specifically, we explore the value of combining: (i) a model-free deep RL method
that can learn by posting price signals to actual workers, (ii) a supervisory
``planning model'' that provides a synthetic learning environment, and (iii) a
guardrail method that determines whether a price should be posted to real
workers or the planning environment for feedback. This combination results in a
risk-aware deep RL controller that uses the planning model for learning when it thinks that a price is very likely to incur high energy costs in the real environment. In a medium-sized office building, our risk-aware controller
eliminates $175,000 in initial investment, decreases energy costs by 30%, and reduces emissions by 32% (106,000 tons of CO2) compared to using the time-of-use rate as the price signal. In contrast, existing model-free and model-based deep RL algorithms are unable to overcome initial learning costs. Our results bode well for risk-aware deep RL facilitating the deployment of building demand response.
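The guardrail in (iii) can be viewed as a simple gate between the two feedback environments. A minimal sketch, where the risk model, threshold, and all names are hypothetical illustrations rather than the paper's method:

```python
# Hypothetical guardrail: post a candidate price to real workers only when
# its estimated cost risk is below a threshold; otherwise query the
# synthetic planning model for feedback instead.
def guardrail(price, risk_model, threshold):
    return "real" if risk_model(price) <= threshold else "planning"

# Toy risk model: prices far from a nominal level are deemed riskier.
risk_model = lambda price: abs(price - 0.25) / 0.25

print(guardrail(0.28, risk_model, threshold=0.5))  # -> real
print(guardrail(0.60, risk_model, threshold=0.5))  # -> planning
```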
 Real options in energy: A guided
analysis of the operations literature.
(with N. Secomandi). [pdf; Invited review article; Under
revision for second round review at European Journal of Operational Research]
Real option models maximize the estimated market value of operational
assets, exploiting the flexibility that decision makers have in managing these
assets. Inspired by the valuation of financial options, their natural domains of
application feature high levels of market or operational risk. The existence of
financial markets that trade instruments associated with the inputs or outputs
of the underlying processes facilitates the use of these models. The energy
industry has thus received a substantial amount of attention within the real
option and operations (management/research) communities. The extant literature
lacks an appraisal of operations work on energy real options. We present a
concise guide to this topic and apply it to investigate this literature,
analyzing seventy papers published in fourteen journals. This body of research
has established that adapting managerial decisions to the unfolding of
uncertainty has the potential to add substantial value in energy compared to the
static discounted cash flow approach. Its research impact is considerable,
accounting for about six thousand Google Scholar citations. A few, yet
consequential, documented implementations of real option techniques have
informed managerial decisions in practice. We anticipate that the operations
literature on energy real options will continue to exhibit a vibrant stream of
contributions, emphasizing renewable energy as it relates to climate change and
environmental sustainability, as well as associated modeling and methodological
innovations.
 Least squares Monte Carlo and
pathwise optimization for merchant energy production.
(with B. Yang, N. Secomandi). [ssrn;
Under revision for third round review at Operations Research]
We study merchant energy production modeled as a compound switching and
timing option. The resulting Markov decision process is intractable. Least
squares Monte Carlo combined with information relaxation and duality is a
state-of-the-art reinforcement learning methodology to obtain operating policies
and optimality gaps for related models. Pathwise optimization is a competing
technique developed for optimal stopping settings, in which it typically
provides superior results compared to this approach, albeit with a larger
computational effort. We apply these procedures to merchant energy production.
Employing pathwise optimization requires methodological extensions. We use
principal component analysis and block coordinate descent in novel ways to
respectively precondition and solve the ensuing ill-conditioned and large-scale linear program, which even a cutting-edge commercial solver is unable to handle directly. Both techniques yield near-optimal operating policies on realistic
ethanol production instances. However, at the cost of both considerably longer
run times and greater memory usage, pathwise optimization leads to substantially
tighter dual bounds compared to least squares Monte Carlo, even when specified
in a simple fashion, complementing it in this case. Thus, it plays a critical
role in obtaining small optimality gaps. Our numerical observations on the
magnitudes of these bound improvements differ from what is currently known. This
research has potential relevance for other commodity merchant operations
contexts and motivates additional algorithmic work in the area of pathwise
optimization.
Reinforcement Learning and Optimization
 Self-adapting network relaxations
for weakly coupled Markov decision processes.
(with A. Cire). [pdf;
Under
review at Management Science]
Weakly coupled Markov decision processes (WDPs) arise in dynamic
decision-making and reinforcement learning. These
models are often high dimensional but decompose into smaller component MDPs when
coupling constraints are relaxed.
Lagrangian relaxations of WDPs that dualize linking constraints are commonly
used to compute heuristic policies and
(optimistic) bounds. While computationally appealing, this Lagrangian approach
averages away combinatorial information
embedded in the linking constraints. We present a class of network relaxations,
dubbed feasibility network relaxations
(FNRs), that embed an exact network encoding of the linking constraints into a
linear programming flow model. We develop
a procedure to obtain the minimally sized such relaxation, which we refer to as self-adapting FNR, as its size
automatically adjusts to the structure of the linking constraints. We analyze
this model to inform the approximation of
WDPs: (i) self-adapting FNR provides (weakly) stronger bounds than standard
Lagrangian relaxation and is polynomially
sized for instances where the linking constraints admit a tractable network
representation; and (ii) self-adapting FNR
provides bounds and policies that match the approximate linear programming (ALP)
approach but is substantially smaller
in size, even polynomially sized when existing ALP formulations are
exponentially large. We perform numerical
experiments on constrained dynamic assortment and preemptive maintenance
applications. Our results show that
self-adapting FNR significantly improves on standard Lagrangian relaxation in terms of policy performance and/or bounds, albeit requiring more time, while being an order of magnitude faster than
models based on alternative Lagrangian
approaches and ALP, which are unsolvable in several instances.
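The standard Lagrangian relaxation referenced above dualizes the linking constraints, so the resulting bound separates across components. In generic notation (not necessarily the paper's), for components $i=1,\dots,N$ with rewards $r_i$, linking-constraint consumption $g_i$, right-hand side $b$, and discount factor $\gamma$:

```latex
\min_{\lambda \ge 0} \; \left\{ \frac{\lambda^{\top} b}{1-\gamma}
  + \sum_{i=1}^{N} V_i^{\lambda}(s_i) \right\},
\quad \text{where } V_i^{\lambda}(s_i) = \max_{a_i} \;
  \Big[ r_i(s_i,a_i) - \lambda^{\top} g_i(s_i,a_i)
  + \gamma \, \mathbb{E}\big[ V_i^{\lambda}(s_i') \big] \Big].
```

Each $V_i^{\lambda}$ solves a small component MDP, which is what makes the relaxation computationally appealing but also averages away the combinatorial structure of the linking constraints.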
 Self-adapting robustness in demand
learning.
(with B. Chen, P. Pakiman, S. Jasin). [pdf;
Under revision for resubmission to Operations Research]
We study dynamic pricing over a finite number of periods in the presence of
demand model ambiguity. Departing from the typical noregret learning
environment, where price changes are allowed at any time, pricing decisions are
made at prespecified points in time and each price can be applied to a large
number of arrivals. In this environment, which arises in retailing, a pricing
decision based on an incorrect demand model can significantly impact cumulative
revenue. We develop an adaptively-robust-learning (ARL) pricing policy that
learns the true model parameters from the data while actively managing demand
model ambiguity. It optimizes an objective that is robust with respect to a
self-adapting set of demand models, where a given model is included in this set
only if the sales data revealed from prior pricing decisions makes it
``probable''. As a result, it gracefully transitions from being robust when
demand model ambiguity is high to minimizing regret when this ambiguity
diminishes upon receiving more data. We characterize the stochastic behavior of
ARL's self-adapting ambiguity sets and derive a regret bound that highlights the
link between the scale of revenue loss and the customer arrival pattern. We also
show that ARL, by being conscious of both model ambiguity and revenue, bridges
the gap between a distributionally robust policy and a follow-the-leader policy,
which focus on model ambiguity and revenue, respectively. We numerically find
that the ARL policy, or an extension thereof, exhibits superior performance compared to distributionally robust, follow-the-leader, and upper-confidence-bound policies in terms of expected revenue and/or value at risk.
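One way to picture a self-adapting ambiguity set is to keep only the demand models that remain statistically ``probable'' given observed sales, e.g., via a log-likelihood cutoff. A toy sketch with a hypothetical Poisson demand family; the paper's construction differs in its details:

```python
import math

# Toy self-adapting ambiguity set: keep a candidate demand model only if its
# log-likelihood on observed sales is within `slack` of the best candidate's.
# Poisson demand is an illustrative choice, not the paper's model.
def log_likelihood(rate, sales):
    return sum(k * math.log(rate) - rate - math.lgamma(k + 1) for k in sales)

def ambiguity_set(candidates, sales, slack):
    ll = {r: log_likelihood(r, sales) for r in candidates}
    best = max(ll.values())
    return [r for r in candidates if ll[r] >= best - slack]

sales = [9, 11, 10, 12, 8]            # sales revealed by prior prices
candidates = [5.0, 10.0, 15.0, 20.0]  # candidate mean-demand rates
print(ambiguity_set(candidates, sales, slack=3.0))  # -> [10.0]
```

As more sales data accumulate, implausible models drop out and the set shrinks, mirroring the transition from robustness to regret minimization described above.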
 Self-guided approximate linear programs.
(with P. Pakiman, N. Soheili, Q. Lin). [pdf;
Under revision for third round review at Management Science]
Approximate linear programs (ALPs) are wellknown models based on value
function approximations (VFAs) to obtain heuristic policies and lower bounds on
the optimal policy cost of Markov decision processes (MDPs). The ALP VFA is a
linear combination of predefined basis functions that are chosen using domain
knowledge and updated heuristically if the ALP optimality gap is large. We
sidestep the need for such basis function engineering in ALP, an implementation bottleneck, by proposing a sequence of ALPs that embed
increasing numbers of random basis functions obtained via inexpensive sampling.
We provide a sampling guarantee and show that the VFAs from this sequence of
models converge to the exact value function. Nevertheless, the performance of
the ALP policy can fluctuate significantly as more basis functions are sampled.
To mitigate these fluctuations, we ``self-guide'' our convergent sequence of
ALPs using past VFA information such that a worstcase measure of policy
performance is improved. We perform numerical experiments on perishable
inventory control and generalized joint replenishment applications, which,
respectively, give rise to challenging discounted-cost MDPs and average-cost semi-MDPs. We find that self-guided ALPs (i) significantly reduce policy cost
fluctuations and improve the optimality gaps from an ALP approach that employs
basis functions tailored to the former application, and (ii) deliver optimality
gaps that are comparable to a known adaptive basis function generation approach
targeting the latter application. More broadly, our methodology provides
applicationagnostic policies and lower bounds to benchmark approaches that
exploit application structure.
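The random basis functions can be pictured as sampled cosine features whose linear-combination weights are then fitted; in self-guided ALPs the weights would come from a linear program, whereas this hypothetical sketch uses a least-squares fit to a stand-in value function:

```python
import numpy as np

rng = np.random.default_rng(1)

# Randomly sampled cosine basis functions: each is cos(w.s + b) with (w, b)
# drawn from an inexpensive sampling distribution (an illustrative family).
def sample_basis(n_basis, dim):
    W = rng.normal(size=(n_basis, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_basis)
    return lambda S: np.cos(S @ W.T + b)  # S: (n_points, dim) -> features

# Fit a linear combination of the sampled basis to a stand-in value function;
# in self-guided ALPs the weights instead solve a guided linear program.
S = rng.uniform(-1.0, 1.0, size=(200, 2))
v_true = np.sin(3.0 * S[:, 0]) + S[:, 1] ** 2
phi = sample_basis(100, 2)
weights, *_ = np.linalg.lstsq(phi(S), v_true, rcond=None)
approx = phi(S) @ weights
mse = float(np.mean((approx - v_true) ** 2))
```

No domain knowledge enters the basis: sampling more features tightens the fit, which is the property the convergence guarantee formalizes.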
Accepted/Published Journal Papers
Energy and the Environment
 Data-driven storage operations: Cross-commodity backtest and structured policies.
(with C. Mandl, S. Minner, N. Gavirneni). [pdf;
Forthcoming at Production and Operations Management]
Storage assets are critical for temporal trading of commodities under
volatile prices. State-of-the-art methods for managing storage, such as the reoptimization heuristic (RH), which are part of commercial software,
approximate a Markov decision process (MDP) assuming full information regarding
the state and the stochastic commodity price process and hence suffer from
informational inconsistencies with observed price data and structural
inconsistencies with the true optimal policy, which are both components of
generalization error. Based on extensive backtests, we find that this error can
lead to significantly suboptimal RH policies and qualitatively different
performance compared to the known near-optimality and behavior of RH in the
full-information setting. We develop a forward-looking data-driven approach (DDA) to learn policies and overcome generalization error. This approach extends standard (backward-looking) DDA in two ways: (i) it uses financial-market
features and estimates of future profits as part of the training objective,
which typically includes past profits alone; and (ii) it enforces structural
properties of the optimal policy. To elaborate, DDA trains parameters of
bang-bang and base-stock policies, respectively, by solving linear and mixed-integer programs, thereby extending known DDAs that parameterize decisions
as functions of features without enforcing policy structure. We backtest the
performance of DDA and RH on six major commodities from 2000 to 2017 with
features constructed using Thomson Reuters and Bloomberg data. DDA significantly improves on RH on real data, with the base-stock structure needed to realize this improvement. Our research advances the state-of-the-art for storage operations
and suggests modifications to commercial software to handle generalization
error.
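A base-stock policy of the kind trained by DDA can be pictured as a feature-driven target inventory level toward which the asset trades, subject to capacity and rate limits. All names and numbers below are illustrative, not the paper's specification:

```python
# Illustrative feature-driven base-stock policy for a storage asset: trade
# toward a target inventory that is a clipped linear function of features.
def base_stock_target(theta, features, capacity):
    level = sum(t * f for t, f in zip(theta, features))
    return max(0.0, min(capacity, level))

def trade(inventory, target, max_rate):
    # Positive action = buy/inject, negative = sell/withdraw, rate-limited.
    return max(-max_rate, min(max_rate, target - inventory))

theta = [0.5, 20.0]      # trained weights (made up for illustration)
features = [80.0, 0.3]   # e.g., a futures price and a spread signal
capacity, max_rate = 100.0, 30.0
action = trade(50.0, base_stock_target(theta, features, capacity), max_rate)
print(action)  # -> -4.0 (sell 4 units toward the target of 46)
```

Training then amounts to choosing theta to maximize backtested (and, in the forward-looking extension, estimated future) profits while the policy structure is enforced by construction.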
 Meeting corporate renewable
power
targets.
(with A. Trivella, D. MohseniTaheri). [pdf; video;
Received the 2021 Commodity and Energy Markets Association Best Paper Award and
the 2020 INFORMS ENRE Young Researcher Prize; Forthcoming at Management
Science]
Prominent companies have committed to procuring a percentage of their power
demand from renewable sources by a future date in the face of uncertain power
demand and stochastic power and renewable energy certificate (REC) prices. We
study procurement portfolios based on two dominant strategies to achieve this
target: longterm procurement of power and RECs at a fixed price using corporate
power purchase agreements (CPPAs) and shortterm purchases at volatile prices.
We analyze a twostage model to understand the behavior of procurement costs
when using financial and physical CPPA variants employed in practice, which
informs the structuring of these contracts. We subsequently formulate a Markov
decision process (MDP) that optimizes the multistage procurement of power to
reach and sustain a renewable procurement target. Our MDP is intractable because
its action space is nonconvex and its state space has high-dimensional endogenous and exogenous components. Although approximate methods to solve this MDP are limited, a procurement policy can be obtained using an easy-to-implement
``primal'' reoptimization strategy, which solves a deterministic model with
stochastic quantities in the MDP replaced by forecasts. This approach does not,
however, provide a lower bound on the optimal policy value. We propose a novel
``dual'' reoptimization heuristic which computes both procurement decisions and
a lower bound while retaining the desirable implementation properties of primal
reoptimization. On realistic instances, the dual reoptimization policy is
near-optimal and outperforms policies from primal reoptimization and other
heuristics. Our numerical results also highlight the benefits of using CPPA
contracts to meet a renewable target.
 Socially responsible merchant
operations: Comparison of shutdown-averse CVaR and anticipated regret policies.
(with A. Trivella). Operations Research Letters, 49(4), 2021. [pdf]
Commodity and energy production assets are managed as real options on
market uncertainties. Social impacts of plant shutdowns incentivize balancing
asset value with shutdown probability. We propose new shutdown-averse policies based on the popular dynamic conditional value-at-risk (CVaR). We analytically and numerically compare these policies to known shutdown-averse policies based on anticipated regret (AR). Our findings support the use of AR over CVaR to embed shutdown aversion and the consideration of hybrid policies that are partially time-consistent but easily interpretable.
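For intuition, the CVaR of a loss distribution at level alpha is the expected loss over the worst (1 - alpha) fraction of outcomes. A simple sample-based estimator (a discrete approximation, not the paper's dynamic formulation):

```python
# Sample-based CVaR at level alpha: the average loss over the worst
# (1 - alpha) fraction of scenarios (a discrete approximation).
def cvar(losses, alpha):
    k = max(1, int(round((1 - alpha) * len(losses))))
    return sum(sorted(losses, reverse=True)[:k]) / k

losses = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(cvar(losses, alpha=0.8))  # -> 9.5 (mean of the two worst losses)
```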
 Managing shutdown decisions in
merchant commodity and energy production: A social commerce perspective
(with A. Trivella, S.E. Fleten, D. Mazieres, D. Pisinger). Manufacturing and
Service Operations Management, 23 (2), 2021. [pdf]
Merchant commodity and energy production assets operate in markets with
volatile prices and exchange rates.
Producers can choose in each period between production, production suspension,
and permanent shutdown. Plant closures,
however, adversely affect societal entities beyond the specific plant being shut down, such as the parent company and the local community. Motivated by an aluminum producer, we study whether mitigating these hard-to-assess broader
impacts of a shutdown is financially viable using the plant's operating
flexibility. Our social commerce perspective
towards managing shutdown decisions deviates from the commonly used asset value
maximization objective in merchant
operations. We formulate a high-dimensional constrained Markov decision process
to manage shutdown decisions. We
approximate this intractable model using unconstrained stochastic dynamic
programs and define operating policies
that account for preferences to delay and reduce shutdowns. Our first policy
leverages anticipated regret theory
in behavioral psychology, while the second policy uses machine learning to generalize production margin heuristics used in practice. We compute the former and latter policies using a least
squares Monte Carlo method and combining
this method with binary classification, respectively. We also propose a
reoptimization heuristic to simplify the
anticipated-regret policy. We show that anticipated-regret policies possess
desirable asymptotic properties absent
in classification- and reoptimization-based policies. On instances created using real data, anticipated-regret and classification-based policies outperform practice-based production margin
strategies and, to a lesser extent,
reoptimization. Specifically, the former policies decrease the shutdown
probability by 25% and, in addition, delay
shutdown decisions by an average of 4 years for a 4% asset value loss. Our
operating policies show that unaccounted
social costs amounting to a few percent of the maximum asset value can justify
delaying or avoiding the use of
a plant's shutdown option by adapting its operating flexibility in our
application. Thus, taking a social commerce
perspective towards managing a plant's operating flexibility appears financially
viable.
 Merchant energy trading in a
network
(with N. Secomandi). Operations Research, 66(5), 2018. [pdf]
We formulate the merchant trading of energy in a network of storage and
transport assets as a Markov decision
process with uncertain energy prices, generalizing known models. Because of the
intractability of our model, we
develop heuristics and both lower and dual (upper) bounds on the optimal policy
value estimated within Monte Carlo
simulation. We achieve tractability using linear optimization, extending near-optimal approximate dynamic programming
techniques for the case of a single storage asset, versions of two of which are
commercially available. We propose
(i) a generalization of a deterministic reoptimization heuristic, (ii) an
iterative version of the least squares
Monte Carlo approach, and (iii) a perfect information dual bound. We apply our
methods to a set of realistic natural
gas instances. The combination of our reoptimization heuristic and dual bound
emerges as a practical approach to
nearly optimally solve our model. Our iterative least squares Monte Carlo
heuristic is also close to optimal. Compared
to our other heuristic, it exhibits slightly larger optimality gaps and requires
some tuning, but is faster to
execute in some cases. Our methods could enhance single energy storage asset
software and have potential relevance
beyond our specific application.
 Comparison of
least squares Monte Carlo methods with applications to energy
real options (with F. Margot and N. Secomandi). European Journal of
Operational Research, 256(1), 2017. [pdf]
Least squares Monte Carlo (LSM) is a state-of-the-art approximate dynamic programming approach used in financial
engineering and real options to value and manage options with early or multiple
exercise opportunities. It is also
applicable to capacity investment and inventory/production management problems
with demand/supply forecast updates
arising in operations and hydropower-reservoir management. LSM has two variants, referred to as regress-now/later (LSMN/L), which compute continuation/value function approximations (C/VFAs). We
provide novel numerical evidence
for the relative performance of these methods applied to energy swing and
storage options, two typical real options,
using a common price evolution model. LSMN/L estimate C/VFAs that yield equally accurate (near-optimal) and precise
lower and dual (upper) bounds on the value of these real options. Estimating the
LSMN/L C/VFAs and their associated
lower bounds takes similar computational effort. In contrast, the estimation of
a dual bound using the LSML VFA
instead of the LSMN CFA takes seconds rather than minutes or hours. This finding
suggests the use of LSML in lieu
of LSMN when estimating dual bounds on the value of early or multiple exercise
options, as well as of related capacity
investment and inventory/production policies.
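The regress-now recursion at the heart of LSM can be illustrated on a Bermudan put in the Longstaff-Schwartz style; the parameters and quadratic basis below are illustrative and unrelated to the paper's instances:

```python
import numpy as np

# Illustrative Longstaff-Schwartz (regress-now) valuation of a Bermudan put.
rng = np.random.default_rng(0)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n_steps, n_paths = 50, 20000
dt = T / n_steps
disc = np.exp(-r * dt)

# Geometric Brownian motion price paths at the exercise dates
z = rng.standard_normal((n_paths, n_steps))
S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt
                          + sigma * np.sqrt(dt) * z, axis=1))

cash = np.maximum(K - S[:, -1], 0.0)  # exercise value at maturity
for t in range(n_steps - 2, -1, -1):
    cash *= disc                      # discount realized payoffs one step
    itm = K - S[:, t] > 0             # regress only on in-the-money paths
    X = S[itm, t]
    basis = np.column_stack([np.ones(X.size), X, X**2])  # quadratic basis
    coef, *_ = np.linalg.lstsq(basis, cash[itm], rcond=None)
    exercise = (K - X) > basis @ coef  # immediate payoff beats continuation
    cash[itm] = np.where(exercise, K - X, cash[itm])

price = disc * cash.mean()
```

The regress-later variant would instead regress the value function itself on basis functions of the next-period state; the abstract's point is that the resulting VFA makes dual (upper) bound estimation far cheaper.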

Relaxations
of approximate linear programs for the real option management
of commodity storage (with F. Margot and N. Secomandi).
Management Science, 61(12), 2015.
[pdf]
The real option management of commodity conversion assets gives rise to
intractable Markov decision processes
(MDPs), in part due to the use of high-dimensional models of commodity forward
curve evolution, as commonly done
in practice. Focusing on commodity storage, we identify a deficiency of
approximate linear programming, which we
address by developing a novel approach to derive relaxations of approximate
linear programs (ALPs). We apply our
approach to obtain a class of tractable ALP relaxations, also subsuming an
existing method. We provide theoretical
support for the use of these ALP relaxations rather than their corresponding
ALPs. Applied to existing natural
gas storage instances, our ALP relaxations significantly outperform their
corresponding ALPs. Our best ALP relaxation
is both near-optimal and competitive with, albeit slower than, state-of-the-art
methods for computing heuristic
policies and lower bounds on the value of commodity storage, but is more
directly applicable for dual upper bound
estimation than these methods. Our approach is potentially relevant for the
approximate solution of MDPs that arise
in the real option management of other commodity conversion assets, as well as
the valuation of real and financial
options that depend on forward curve dynamics.
Reinforcement Learning and Optimization
 A data efficient and feasible level set method for stochastic convex optimization with expectation constraints.
(with Q. Lin, N. Soheili, T. Yang). Journal of Machine Learning Research,
21(143), 2020. [pdf]
Stochastic convex optimization problems with expectation constraints
(SOECs) are encountered in statistics and machine learning, business, and
engineering. In datarich environments, the SOEC objective and constraints
contain expectations defined with respect to large datasets. Therefore,
efficient algorithms for solving such SOECs need to limit the fraction of data
points that they use, which we refer to as algorithmic data complexity. Recent
stochastic first order methods exhibit low data complexity when handling SOECs
but guarantee near-feasibility and near-optimality only at convergence. These
methods may thus return highly infeasible solutions when heuristically
terminated, as is often the case, due to theoretical convergence criteria being
highly conservative. This issue limits the use of first order methods in several
applications where the SOEC constraints encode implementation requirements. We
design a stochastic feasible level set method (SFLS) for SOECs that has low data
complexity and emphasizes feasibility before convergence. Specifically, our level-set method solves a root-finding problem by calling a novel first order oracle that computes a stochastic upper bound on the level-set function by
extending mirror descent and online validation techniques. We establish that
SFLS maintains a high-probability feasible solution at each root-finding iteration and exhibits favorable iteration complexity compared to state-of-the-art deterministic feasible level-set and stochastic subgradient
methods. Numerical experiments on three diverse applications validate the low
data complexity of SFLS relative to the former approach and highlight how SFLS
finds feasible solutions with small optimality gaps significantly faster than
the latter method.
 Networkbased approximate linear
programming for discrete optimization
(with A. Cire). Operations Research, 68(6), 2020. [pdf]
We present relaxations for discrete optimization problems using approximate
linear programs (ALPs) defined on multiple networks that represent different
state-space aggregations. Our network ALP leverages information across these networks using a piecewise-constant value function approximation and its
optimistic bound is theoretically guaranteed to weakly improve upon the bounds
from the individual networks used in its construction. Solving network ALPs is challenging due to their large number of constraints, a well-known issue when
applying approximate linear programming. We sidestep this issue by using a
clique-based graph representation to design a network ALP restriction that
admits a polynomialtime solvable extended formulation, which we show to also
deliver a weakly better bound than individual networks. We execute a
computational study on a challenging bilinear problem arising in marketing
analytics and a routing application encountered in the preemptive maintenance of
energy or city-owned assets. When used within a branch-and-bound scheme, the network ALP restriction significantly outperforms, in both solution quality and time, the following benchmarks: a state-of-the-art mathematical programming
solver, a generic aggregation/disaggregation scheme applied to a single
network, and a heuristic that chooses the best bound among individual networks.
 Partial hyperplane activation for
generalized intersection cuts
(with A. Kazachkov, E. Balas, and F. Margot). Mathematical Programming
Computation, 12, 2020. [pdf;
Received the Tepper Egon Balas PhD student
paper award]
The generalized intersection cut (GIC) paradigm is a recent framework for
generating cutting planes in mixed-integer programming with attractive theoretical properties. We investigate this
computationally unexplored paradigm
and observe that a key hyperplane activation procedure embedded in it is not
computationally viable. To overcome
this issue, we develop a novel replacement to this procedure called partial
hyperplane activation (PHA), introduce
a variant of PHA based on a notion of hyperplane tilting, and prove the validity
of both algorithms. We propose
several implementation strategies and parameter choices for our PHA algorithms
and provide supporting theoretical
results. We computationally evaluate these ideas in the COIN-OR framework on MIPLIB instances. Our findings shed light on the strengths of the PHA approach as well as identify properties
related to strong cuts that can be
targeted in the future.
 Revisiting approximate linear
programming: Constraintviolation learning with applications to inventory
control and energy storage
(with Q. Lin and N. Soheili). Management Science, 66(4), 2020. [pdf]
Approximate linear programs (ALPs) are wellknown models for computing
value function approximations (VFAs) of intractable Markov decision processes
(MDPs). VFAs from ALPs have desirable theoretical properties, define an
operating policy, and provide a lower bound on the optimal policy cost. However,
solving ALPs near-optimally remains challenging, for example, when approximating
MDPs with nonlinear cost functions and transition dynamics or when rich basis
functions are required to obtain a good VFA. We address this tension between
theory and solvability by proposing a convex saddle-point reformulation of an
ALP that includes as primal and dual variables, respectively, a vector of basis
function weights and a constraint violation density function over the
state-action space. To solve this reformulation, we develop a proximal
stochastic mirror descent (PSMD) method that learns regions of high ALP
constraint violation via its dual update. We establish that PSMD returns a
nearoptimal ALP solution and a lower bound on the optimal policy cost in a
finite number of iterations with high probability. We numerically compare PSMD
with several benchmarks on inventory control and energy storage applications. We
find that the PSMD lower bound is tighter than a perfect information bound. In
contrast, the constraint sampling approach to solve ALPs may not provide a lower
bound and applying row generation to tackle ALPs is not computationally viable.
PSMD policies outperform problem-specific heuristics and are comparable to or better than the policies obtained using constraint sampling. Overall, our ALP
reformulation and solution approach broadens the applicability of approximate
linear programming.
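For reference, a generic discounted-cost ALP with basis functions $\phi_k$ and state-relevance weights $\nu$ takes the following standard form; its one-constraint-per-state-action structure is what the saddle-point reformulation targets:

```latex
\max_{\beta} \;\; \sum_{s} \nu(s) \sum_{k} \beta_k \phi_k(s)
\quad \text{s.t.} \quad
\sum_{k} \beta_k \phi_k(s) \;\le\; c(s,a)
  + \gamma \sum_{s'} p(s' \mid s,a) \sum_{k} \beta_k \phi_k(s'),
\;\; \forall (s,a).
```

Because the constraint set ranges over all state-action pairs, exact solution is typically impossible, motivating sampling, row generation, or, as above, a reformulation whose dual variable is a constraint violation density.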
 A levelset method for convex
optimization with a feasible solution path
(with Q. Lin and N. Soheili). SIAM Journal on Optimization, 28(4), 2018.
[pdf]
Large-scale constrained convex optimization problems arise in several application domains. First-order methods are good candidates to tackle such problems due to their low iteration complexity and memory requirements. The level-set framework extends the applicability of first-order methods to tackle problems with complicated convex objectives and constraint sets. Current methods based on this framework either rely on the
solution of challenging subproblems
or do not guarantee a feasible solution, especially if the procedure is
terminated before convergence. We develop
a level-set method that finds an $\epsilon$-relative optimal and feasible
solution to a constrained convex optimization
problem with a fairly general objective function and set of constraints,
maintains a feasible solution at each
iteration, and only relies on calls to first-order oracles. We establish the
iteration complexity of our approach,
also accounting for the smoothness and strong convexity of the objective
function and constraints when these properties
hold. The dependence of our complexity on $\epsilon$ is similar to the analogous
dependence in the unconstrained
setting, which is not known to be true for level-set methods in the literature.
Nevertheless, ensuring feasibility
is not free. The iteration complexity of our method depends on a condition
number, while existing level-set methods
that do not guarantee feasibility can avoid such dependence. We numerically
validate the usefulness
of ensuring a feasible solution path by comparing our approach with an existing
level-set method on a Neyman-Pearson
classification problem.
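The level-set idea in this abstract can be illustrated with a toy sketch: the optimal value $t^*$ is the root of the decreasing level-set value function $v(t) = \min_x \max(f(x) - t, g(x))$, which bisection then locates. The problem instance (minimize $x^2$ subject to $x \ge 1$), the closed-form inner minimizer used in place of a first-order subroutine, and all function names below are illustrative assumptions, not the paper's algorithm.

```python
import math

def level_value(t):
    """Level-set value v(t) = min_x max(f(x) - t, g(x)) for the toy problem
    f(x) = x^2, g(x) = 1 - x (i.e., minimize x^2 subject to x >= 1).
    For this instance the inner minimum sits at the crossover of the two
    pieces, which we compute in closed form in place of the first-order
    subroutine the paper analyzes."""
    x = (-1.0 + math.sqrt(5.0 + 4.0 * t)) / 2.0  # solves x^2 - t = 1 - x
    return 1.0 - x

def levelset_root(lo=0.0, hi=4.0, tol=1e-6):
    """Bisection on the level t: v is decreasing and v(t*) = 0 at the
    optimal objective value t* (here t* = 1, attained at x = 1)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if level_value(mid) > 0.0:
            lo = mid  # level below the optimum: v still positive
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Running `levelset_root()` returns a value close to 1.0, the optimal objective of the toy problem; the paper's contribution is, in addition, keeping every trial point feasible along the way.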
 Relationship between least squares
Monte Carlo and approximate linear programming
(with N. Secomandi). Operations Research Letters, 45(5), 2017. [pdf]
Least squares Monte Carlo (LSM) is commonly used to manage and value
intractable early- and multiple-exercise
financial and real options. Recent research in this area has started applying
approximate linear programming (ALP)
and its relaxations, which aim at addressing a possible ALP drawback. We show
that LSM is itself an ALP relaxation
that potentially corrects this ALP shortcoming. Our analysis consolidates two
separate streams of research and
suggests using LSM rather than ALP for the considered class of problems.
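For readers unfamiliar with LSM, a minimal Longstaff-Schwartz sketch for a Bermudan put illustrates the regression step the abstract refers to; the GBM dynamics, the quadratic basis, and every parameter value are illustrative choices, not anything from the paper.

```python
import math
import random

def _polyfit2(xs, ys):
    """Ordinary least squares fit of y ~ 1 + x + x^2 via normal equations,
    solved by Gaussian elimination with partial pivoting."""
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for x, y in zip(xs, ys):
        phi = [1.0, x, x * x]
        for i in range(3):
            b[i] += phi[i] * y
            for j in range(3):
                a[i][j] += phi[i] * phi[j]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 3):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * 3
    for i in range(2, -1, -1):
        beta[i] = (b[i] - sum(a[i][j] * beta[j] for j in range(i + 1, 3))) / a[i][i]
    return beta

def lsm_put_value(s0=36.0, k=40.0, r=0.06, sigma=0.2, t=1.0,
                  n_steps=10, n_paths=4000, seed=0):
    """Value a Bermudan put by least squares Monte Carlo:
    simulate GBM paths, then step backward, regressing discounted
    continuation cash flows on the current price for in-the-money paths."""
    rng = random.Random(seed)
    dt = t / n_steps
    disc = math.exp(-r * dt)
    paths = [[s0] for _ in range(n_paths)]
    for _ in range(n_steps):
        for p in paths:
            z = rng.gauss(0.0, 1.0)
            p.append(p[-1] * math.exp((r - 0.5 * sigma ** 2) * dt
                                      + sigma * math.sqrt(dt) * z))
    cash = [max(k - p[-1], 0.0) for p in paths]        # terminal payoff
    for step in range(n_steps - 1, 0, -1):
        cash = [disc * c for c in cash]                # discount one step
        itm = [i for i in range(n_paths) if k - paths[i][step] > 0.0]
        if len(itm) < 3:
            continue
        xs = [paths[i][step] for i in itm]
        beta = _polyfit2(xs, [cash[i] for i in itm])   # continuation model
        for i, x in zip(itm, xs):
            cont = beta[0] + beta[1] * x + beta[2] * x * x
            if k - x > cont:                           # exercise beats holding
                cash[i] = k - x
    return disc * sum(cash) / n_paths
```

With the defaults above the estimate lands near the well-known American put benchmark of roughly 4.4-4.5 for these parameters; the ALP connection in the paper concerns the regress-later variant of this same regression step.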
 Less-Than-Truckload carrier collaboration
problem: modeling framework and solution approach (with J. H. Bookbinder).
Journal of Heuristics, 19(6),
2013. [pdf]
Less-Than-Truckload (LTL) carriers generally serve geographical regions
that are more localized than the intercity
linehauls served by truckload carriers. That localization can lead to urban
freight transportation routes that
overlap. If trucks are traveling with less than full loads, there typically
exist opportunities for carriers to
collaborate over such routes. We introduce a two-stage framework for LTL carrier
collaboration. Our first stage
involves collaboration between multiple carriers at the entrance to the city and
can be formulated as a vehicle
routing problem with time windows (VRPTW). We employ guided local search for
solving this VRPTW. The second stage
involves collaboration between carriers at transshipment facilities while
executing their routes identified in
phase one. For solving the second stage problem, we develop novel local search
heuristics, one of which leverages
integer programming to efficiently explore the union of neighborhoods defined by
new problem-specific move operators.
Our computational results indicate that integrating integer programming with
local search results in at least an
order of magnitude speed up in the second stage problem. We also perform
sensitivity analysis to assess the benefits
from collaboration. Our results indicate that distance savings of 7–15%
can be achieved by collaborating
at the entrance to the city. Carriers involved in intracity collaboration can
further save 3–15% in total
distance traveled, and also reduce their overall route times.
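The guided local search used in the first stage can be sketched, in much-simplified form, on a plain travelling salesman problem with no time windows or capacities: a 2-opt descent is repeatedly restarted on a penalized cost matrix, with a penalty added to the highest-utility edge of each local optimum. The instance, parameter values, and function names are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def tour_length(tour, dist):
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def two_opt(tour, cost):
    """2-opt descent to a local optimum of the (possibly penalized) cost."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue  # would reverse the entire tour
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                if cost[a][c] + cost[b][d] < cost[a][b] + cost[c][d] - 1e-9:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

def guided_local_search(dist, rounds=25, lam=0.2):
    """Guided local search: escape 2-opt local optima by penalizing the
    tour edge with the highest utility dist/(1 + penalty)."""
    n = len(dist)
    penalty = [[0] * n for _ in range(n)]
    tour = list(range(n))
    best = tour[:]
    for _ in range(rounds):
        cost = [[dist[i][j] + lam * penalty[i][j] for j in range(n)]
                for i in range(n)]
        tour = two_opt(tour, cost)
        if tour_length(tour, dist) < tour_length(best, dist):
            best = tour[:]
        u, v = max(((tour[i], tour[(i + 1) % n]) for i in range(n)),
                   key=lambda e: dist[e[0]][e[1]] / (1 + penalty[e[0]][e[1]]))
        penalty[u][v] += 1
        penalty[v][u] += 1
    return best
```

The VRPTW in the paper adds time-window and vehicle constraints on top of this skeleton; the penalty mechanism is what lets the search leave the local optima that plain 2-opt gets stuck in.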
Conference Proceedings and Workshop Papers

Offline-online reinforcement learning for energy pricing in office demand
response: Lowering energy and data costs (with D. Jang, L. Spangher, T.
Srivistava, M. Khattar, U. Agwan, and C. Spanos). Proceedings of the 8th ACM
International
Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation
(BuildSys '21), 2021. [pdf]
Our team is proposing to run a full-scale energy demand response experiment in
an office building. Although this is an exciting endeavor which will provide
value to the community, collecting training data for the reinforcement learning
agent is costly and will be limited. In this work, we examine how offline
training can be leveraged to minimize data costs (accelerate convergence) and
program implementation costs. We present two approaches to doing so: pre-training
our model to warm-start the experiment with simulated tasks, and using a
planning model trained to simulate the real world's rewards to the agent. We
present results that demonstrate the utility of offline reinforcement learning
to efficient price-setting in the energy demand response problem.

A machine learning approach to methane emissions mitigation in the oil and gas
industry (with Jiayang Wang, Jingfan Wang, and P. Ravikumar). Tackling climate
change with machine learning workshop, NeurIPS 2020. [pdf;
Selected for a Spotlight Talk and Overall Best Paper. Research covered by
Fortune magazine in its Eye on A.I.
newsletter.]
Reducing methane emissions from the oil and gas sector is a key component
of climate policy in the United States. Methane leaks across the supply chain
are stochastic and intermittent, with a small number of sites (‘super-emitters’)
responsible for a majority of emissions. Thus, cost-effective emissions
reduction critically relies on effectively identifying the super-emitters from
thousands
of well sites and millions of miles of pipelines. Conventional approaches such
as
walking surveys using optical gas imaging technology are slow and
time-consuming. In
addition, several variables contribute to the formation of leaks such as
infrastructure age, production, weather conditions, and maintenance practices.
Here, we develop
a machine learning algorithm to predict high-emitting sites that can be
prioritized for follow-up repair. Such prioritization can significantly reduce
the cost of
surveys and increase emissions reductions compared to conventional approaches.
Our
results show that logistic regression performs best among the several
algorithms considered. The model achieved 70% accuracy, 57% recall,
and 66% balanced accuracy. Compared to the conventional approach, the
machine learning model reduced the time to achieve a 50% emissions mitigation
target by 42%. Correspondingly, the mitigation cost decreased from $85/t CO2e to
$49/t CO2e.
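As a rough illustration of this modeling pipeline (not the paper's data or code), the sketch below fits a logistic regression by batch gradient descent on synthetic site features and computes the recall and balanced-accuracy metrics quoted above; the feature names and the data-generating process are invented for illustration.

```python
import math
import random

def train_logreg(xs, ys, lr=0.5, epochs=1500):
    """Fit logistic regression by batch gradient descent.
    Returns weights w (one per feature) and an intercept b."""
    d = len(xs[0])
    n = len(xs)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y   # predicted prob minus label
            for i in range(d):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * gwi / n for wi, gwi in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def classification_metrics(w, b, xs, ys):
    """Recall on the high-emitter class and balanced accuracy
    (recall averaged over both classes)."""
    tp = fn = tn = fp = 0
    for x, y in zip(xs, ys):
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        if y == 1:
            tp, fn = tp + (pred == 1), fn + (pred == 0)
        else:
            tn, fp = tn + (pred == 0), fp + (pred == 1)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return recall, 0.5 * (recall + specificity)
```

Balanced accuracy is the natural metric here because super-emitters are a minority class; plain accuracy can look good while missing most of them.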

Self-guided approximate linear programs (with P. Pakiman, N. Soheili, and Q.
Lin). Self-supervised learning workshop, NeurIPS 2020.
[pdf]
Approximate linear programs (ALPs) are well-known models based on value function
approximations (VFAs) to obtain heuristic policies and lower bounds on
the optimal policy cost of Markov decision processes (MDPs). The ALP VFA is a
linear combination of predefined basis functions that are chosen using domain
knowledge and updated heuristically if the ALP optimality gap is large. We
sidestep the need for such basis function engineering in ALP, an
implementation bottleneck, by proposing a sequence of ALPs that embed
increasing numbers of random basis functions obtained via inexpensive sampling.
We provide a sampling guarantee and show that the VFAs from this sequence of
models converge to the exact value function. We also show under mild conditions
that the ALP policy cost is near-optimal when the number of sampled random bases
is sufficiently large. Nevertheless, the performance of the ALP policy can
fluctuate significantly as more basis functions are sampled. To mitigate these
fluctuations, we “self-guide” our convergent sequence of ALPs using past VFA
information such that a worstcase measure of policy performance is improved.
Moreover, our method provides application-agnostic policies and bounds to
benchmark approaches that exploit application structure.
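The idea of growing a VFA by sampling inexpensive random basis functions can be sketched in a plain supervised setting: fit a linear VFA over an increasing number of random cosine features and watch the fit improve. The target function, feature distribution, ridge term, and all names below are illustrative assumptions, not the paper's construction or guarantees.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x

def random_basis(num, rng):
    """Sample random cosine bases phi(s) = cos(w*s + c), an inexpensive
    stand-in for the random basis functions sampled in the paper."""
    return [(rng.gauss(0.0, 3.0), rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(num)]

def features(s, basis):
    return [1.0] + [math.cos(w * s + c) for (w, c) in basis]

def fit_vfa(states, values, basis, ridge=1e-8):
    """Least squares fit of a linear VFA over the sampled bases
    (tiny ridge term only for numerical stability)."""
    d = len(basis) + 1
    a = [[ridge * (i == j) for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for s, v in zip(states, values):
        phi = features(s, basis)
        for i in range(d):
            b[i] += phi[i] * v
            for j in range(d):
                a[i][j] += phi[i] * phi[j]
    return solve(a, b)

def vfa_mse(theta, basis, states, values):
    return sum((sum(t * p for t, p in zip(theta, features(s, basis))) - v) ** 2
               for s, v in zip(states, values)) / len(states)
```

Because the larger basis nests the smaller one, the least squares error can only decrease as more random features are sampled; the paper's harder questions are how fast the corresponding ALP bound and policy improve, and how to stabilize the policy along the way.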
 Least squares Monte Carlo and approximate linear programming: Error bounds and
energy real option application (with N. Secomandi). Advances in supply chain
finance and FinTech innovations, Foundations and Trends in Technology,
Information and Operations Management, Now Publishers, 2020. [pdf]
Least squares Monte Carlo (LSM) is an approximate dynamic programming (ADP)
technique commonly used for the valuation of
high-dimensional financial and real options, but has broader applicability. It
is known that the regress-later version
of this method is an approximate linear programming (ALP) relaxation that
implicitly provides a potential solution to a
familiar ALP deficiency. We provide numerical backing for the usefulness of this
solution through a study
dealing with merchant ethanol production, an energy real option application,
based on an ALP heuristic that we propose.
When both methodologies are applicable, our research supports the use of
regress-later LSM rather than this ALP
technique to approximately solve intractable Markov decision processes. Our
findings motivate additional research to
obtain even better methods than the regress-later version of LSM.
 Interpretable user models via decision-rule Gaussian processes (with D.
Mohseni-Taheri, T. Tulabandhula). Advances in Approximate Bayesian Inference
workshop, NeurIPS 2019. [pdf]
Models of user behavior are critical inputs in many prescriptive
settings and can be viewed as decision rules that
transform state information available to the user into actions.
Gaussian processes (GPs), as well as nonlinear extensions
thereof, provide a flexible framework to learn user models in
conjunction with approximate Bayesian inference. However,
the resulting models may not be interpretable in general. We
propose decision-rule GPs (DRGPs) that apply GPs in a transformed
space defined by decision rules that have immediate
interpretability to practitioners. We illustrate this modeling
tool on a real application and show that structural variational
inference techniques can be used with DRGPs. We find that
DRGPs outperform the direct use of GPs in terms of both
out-of-sample performance and the quality of optimized decisions.
These performance advantages continue to hold when
DRGPs are combined with transfer learning.
 Robust demand learning (with B. Chen, S. Jasin). Workshop on Safety and
Robustness in Decision Making, NeurIPS 2019.
We consider a common revenue management problem where a seller needs to
determine the pricing decisions for a product with fixed initial inventory over
T time periods. Some parameters of the demand distribution (as a function of
price) are not known, and the firm needs to learn them from historical sales
data
while simultaneously trying to maximize expected total revenue. Departing from
the typical no-regret learning environment, where T can be scaled to infinity,
the number of time periods is pre-specified and fixed in our setting. We propose
a new robust demand learning (RDL) framework that combines robust Markov
decision processes (MDPs) and no-regret learning such that the seller is
“hedged”
during initial periods of the selling horizon when sales may be low (i.e., high
model ambiguity), while simultaneously leveraging learning to minimize regret
when sufficient sales data is available. RDL relies on the construction of
dynamic
uncertainty sets that are consistent with the goal of regret minimization. We
provide
theory showing that the RDL pricing policy has the same regret guarantee as pure
learning but has lower risk exposure to model ambiguity. In addition, RDL is
less
conservative than a pure robust MDP approach based on static uncertainty sets.
 SMOILE: Shopper marketing
optimization and inverse learning engine (with A. Chenreddy, P. Pakiman, R.
Chandrasekaran, R. Abens).
Proceedings of the 25th ACM SIGKDD conference on knowledge discovery and data
mining, Anchorage, Alaska, 2019. (accepted for oral presentation; acceptance
rate 6.4%) [pdf]
Product brands employ shopper marketing (SM) strategies to convert shoppers
along the path to purchase. Traditional marketing mix models (MMMs), which
leverage regression techniques and historical data, can be used to predict the
component of sales lift due to SM tactics. The resulting predictive model is a
critical input to plan future SM strategies. The implementation of traditional
MMMs, however, requires significant ad hoc manual intervention due to their
limited flexibility in (i) explicitly capturing the temporal link between
decisions; (ii) accounting for the interaction between business rules and past
(sales and decision) data during the attribution of lift to SM; and (iii)
ensuring that future decisions adhere to business rules. These issues
necessitate MMMs with tailored structures for specific products and retailers,
each requiring significant hand-engineering to achieve satisfactory performance,
a major implementation challenge.
We propose an SM Optimization and Inverse Learning Engine (SMOILE) that combines
optimization and inverse reinforcement learning to streamline implementation.
SMOILE learns a model of lift by viewing SM tactic choice as a sequential
process, leverages inverse reinforcement learning to explicitly couple sales and
decision data, and employs an optimization approach to handle a wide array of
business rules. Using a unique dataset containing sales and SM spend information
across retailers and products, we illustrate how SMOILE standardizes the use of
data to prescribe future SM decisions. We also track an industry benchmark to
showcase the importance of encoding SM lift and decision structures to mitigate
spurious results when uncovering the impact of SM decisions.
 A three-echelon integrated production-distribution system (with J. H. Bookbinder).
International Conference on Decision
Sciences and Technology for Globalization, Decision Sciences Institute, Ghaziabad,
India. 2008.
 Enhancing transportation efficiencies through carrier collaboration (with J. H.
Bookbinder). BPC World Conference, Mumbai, India. 2007.
 Nondestructive evaluation by low pulsing acoustic tap technique: Spectral
relationships (with T. S. Niranjan and A. V. Varun). Flight 2006, National
Aerospace Symposium, Chennai, India. 2006.
Book Chapters
 Real option management of hydrocarbon cracking operations (with N. Secomandi,
G. Sowers, and J. Wassick). Real Options in Energy and Commodity Markets, Now
Publishers, 2017. [pdf]
Commodity conversion assets play important economic roles. It is well known
that the market value of these
assets can be maximized by managing them as real options on the prices of their
inputs and/or outputs. In
particular,
when futures on these inputs and outputs are traded, managing such real options,
that is, valuing, hedging, and
exercising them, is analogous to managing options on such futures, using risk
neutral valuation and delta
hedging
methods. This statement holds because dynamically trading portfolios of these
futures and a riskless bond can
replicate the cash flows of these assets. This basic principle is not always
appreciated by managers of
commodity
conversion assets. Moreover, determining the optimal operational cash flows of
such an asset requires optimizing
the asset operating policy. This issue complicates the real option management of
commodity conversion assets.
This
chapter illustrates the application of this approach to manage a hydrocarbon
cracker, a specific commodity
conversion
asset, using linear programming and Monte Carlo simulation. The discussion is
based on a simplified
representation
of the operations of this asset. However, the material presented here has
potential applicability to the real
option
management of more realistic models of hydrocarbon cracking assets, as well as
other energy and commodity
conversion
assets.
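The risk-neutral valuation principle described in this chapter can be illustrated with a one-period Monte Carlo sketch of a spread option, the simplest caricature of a cracking asset's per-unit payoff (buy input, sell output, pay a conversion cost). The lognormal futures model and every number below are illustrative assumptions, not figures from the chapter.

```python
import math
import random

def crack_spread_option_value(f_in=60.0, f_out=75.0, cost=5.0,
                              sigma_in=0.3, sigma_out=0.35, rho=0.4,
                              r=0.05, t=0.5, n=20000, seed=0):
    """Risk-neutral Monte Carlo value of the option to convert one unit of
    input into output at maturity t: payoff max(S_out - S_in - cost, 0).
    Terminal prices follow correlated driftless lognormals centered on the
    current futures quotes, so each simulated price is a martingale under
    the risk-neutral measure."""
    rng = random.Random(seed)
    total = 0.0
    sq_t = math.sqrt(t)
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        s_in = f_in * math.exp(-0.5 * sigma_in ** 2 * t + sigma_in * sq_t * z1)
        s_out = f_out * math.exp(-0.5 * sigma_out ** 2 * t + sigma_out * sq_t * z2)
        total += max(s_out - s_in - cost, 0.0)
    return math.exp(-r * t) * total / n
```

By Jensen's inequality the simulated value exceeds the discounted intrinsic spread (here 75 - 60 - 5 = 10 before discounting); a real cracking asset adds operating constraints, which is exactly where the linear programming layer of the chapter comes in.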
Technical Reports
 Least squares Monte Carlo and approximate linear programming: Error bounds and
energy real option application (with N. Secomandi). [pdf; extended version of
the conference proceedings with the same title.]
Least squares Monte Carlo (LSM) is an approximate dynamic programming (ADP)
technique commonly used for the valuation of
high-dimensional financial and real options, but has broader applicability. It
is known that the regress-later version
of this method is an approximate linear programming (ALP) relaxation that
implicitly provides a potential solution to a
familiar ALP deficiency. Focusing on a generic finite horizon Markov decision
process, we provide both theoretical and
numerical backing for the usefulness of this solution, respectively using a
worstcase error bound analysis and a
numerical study dealing with merchant ethanol production, an energy real option
application, based on an ALP heuristic
that we propose. When both methodologies are applicable, our research supports
the use of regress-later LSM rather than
this ALP technique to approximately solve intractable Markov decision processes.
Our numerical findings motivate
additional research to obtain even better methods than the regress-later version
of LSM.
 Dynamic pricing for hotel rooms when customers request multiple-day stays
(with Y. F. Lim, Q. Ding). [pdf]
Prominent hotel chains quote a booking price for a particular room type
on each day and dynamically update these prices over time. We present a novel
Markov decision process (MDP) formulation that determines the optimal booking
price for a single room type under this strategy, while considering the
availability of rooms throughout the multiple-day stays requested by customers.
We analyze special cases of our MDP to highlight the importance of modeling
multiple-day stays and provide guidelines to potentially simplify the
implementation of pricing policies around peak-demand events such as public
holidays and conferences. Since computing an optimal policy to our MDP is
intractable in general, we develop heuristics based on a fluid approximation and
approximate linear programming (ALP). We numerically benchmark our heuristics
against a single-day decomposition approach (SDD) and an adaptation of a
fixed-price heuristic. The ALP-based heuristic (i) outperforms the other
methods; (ii) generates up to 7% and 6% more revenue than the SDD and the
fixed-price heuristic, respectively; and (iii) incurs a revenue loss of less
than 1% when using our pricing structure around peak-demand events, which
supports the use of this simple pricing profile. Our findings are potentially
relevant beyond the hotel domain for applications involving the dynamic pricing
of capacitated resources.
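The single-day baseline that this MDP extends can be sketched as a small finite-horizon dynamic program over remaining rooms and remaining periods; the sketch ignores multiple-day stays (the paper's focus), assumes at most one booking request per period, and uses invented prices and acceptance probabilities.

```python
def hotel_pricing_dp(horizon=14, capacity=10,
                     prices=(100.0, 150.0, 200.0),
                     accept_prob=None):
    """Finite-horizon DP for single-night booking prices. V[t][c] is the
    maximum expected revenue with c rooms left and t periods to go; in each
    period at most one request arrives and books with a price-dependent
    probability. All parameters are illustrative."""
    if accept_prob is None:
        accept_prob = {100.0: 0.8, 150.0: 0.5, 200.0: 0.2}
    V = [[0.0] * (capacity + 1) for _ in range(horizon + 1)]
    policy = [[None] * (capacity + 1) for _ in range(horizon + 1)]
    for t in range(1, horizon + 1):
        for c in range(1, capacity + 1):
            best, best_p = -1.0, None
            for p in prices:
                q = accept_prob[p]
                # book (collect p, lose a room) or not, then continue
                val = q * (p + V[t - 1][c - 1]) + (1 - q) * V[t - 1][c]
                if val > best:
                    best, best_p = val, p
            V[t][c] = best
            policy[t][c] = best_p
    return V, policy
```

The resulting value function is monotone in both remaining rooms and remaining time; the paper's MDP replaces the single-night booking event with multiple-day stay requests that consume capacity on several consecutive nights, which is what makes the exact DP intractable.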
