My research addresses
challenges at the interface of operations and finance arising in the energy industry using
reinforcement learning and optimization. I am particularly interested in the operations of
commodity and energy conversion assets (e.g., storage, transport, and production), their
valuation and risk management, the role of corporations in the clean energy transition, and
the social impact of this transition.
Influenced by my energy research and work with industry, I am passionate about making
reinforcement learning and optimization methods easier to adopt by reducing the effort
needed to deploy them effectively across different applications. To this end, I develop
methods that embed intelligence in the solution process and are capable of adapting over
time and to instance-specific information within a problem class.
Working Papers
- Hierarchical planning for hydropower
capacity upgrade: Exploiting structure in reoptimization and investment policies
(with A. Kleiven and S. E. Fleten). [pdf; Under review at Manufacturing and Service Operations Management]
Problem definition: Upgrading the capacity of existing hydropower plants is
an important lever to achieve a
carbon-neutral power grid. Inspired by hierarchical planning that underpins
advanced planning systems in supply chains,
we investigate a hierarchical perspective towards capacity upgrade at a
hydropower plant based on work with a European
hydropower producer. Methodology/results: We develop a nested Markov decision
process model that captures both the
timing flexibility to delay upgrades and hourly-production flexibility. The
former is typically not considered in
practice, while the latter is not accurately captured in dynamic upgrade models
in academia. Our work bridges this gap.
We combine policy and price model structure to show that dynamic upgrade
policies that accurately value
hourly-production flexibility can be computed efficiently, thus adding to
decision intelligence in the hydropower
industry. We perform extensive numerical experiments on instances calibrated to
real market and operational data. We
find that our approach allows a hydropower producer to disentangle the value of
optimizing the timing of a capacity
upgrade versus the value of accurately capturing hourly-production flexibility.
Managerial implications: Accurately
valuing hourly-production flexibility results in more capacity addition,
regardless of whether the upgrade timing is
optimized. It is thus relevant to NPV models that are popular in practice.
Dynamic upgrade policies are viewed as a
means of increasing plant value by optimizing upgrade timing. We discover that a
significant alternative benefit of
optimized upgrade timing lies in recovering the upgrade investment cost
substantially sooner, even when the increase in
market value is marginal.
- Decision intelligence for
healthcare decarbonization (with S. Dziemian). [pdf; Under review at Foundations and
Trends in Technology, Information and Operations Management; Invited article]
Decarbonization goals in healthcare have increased significantly. Their
achievement requires aligning environmental
objectives with improvements in financial performance and health outcomes. Among
a wide range of options, the healthcare
sector can (i) adopt successful decarbonization strategies from other
energy-intensive sectors and (ii)
enhance its patient-focused activities with decarbonization potential.
This chapter provides examples of this
adopt-and-enhance approach, spotlighting the enabling role of decision
intelligence (DI), that is, the decision-centered
use of data and technology. First, we discuss DI for managing risks in adopting
renewables on site in a hospital.
Second, we highlight the challenge posed by access inequity in decarbonizing
tuberculosis care delivery and propose a DI
research framework to study this challenge. These examples highlight the
potential for DI to accelerate decarbonization.
- Physical vs. virtual corporate power
purchase agreements: Meeting renewable targets amid demand and price
uncertainty.
(with D. Mohseni-Taheri, A. Trivella). [pdf; Under revision for second-round review at European Journal of Operational Research]
Power purchase agreements (PPAs) have become an important corporate
procurement vehicle for renewable power, especially among companies that have
committed to targets requiring that a certain fraction of their power demand be met
by renewables. PPAs are long-term contracts that provide renewable energy
certificates (RECs) to the corporate buyer and take two main forms: physical and
virtual. Physical PPAs deliver power in addition to RECs, while virtual PPAs are
financial contracts that hedge (at least partially) power price uncertainty. We
compare procurement portfolios that sign physical PPAs with ones that sign
virtual PPAs, focusing on fixed-volume contracts and emphasizing uncertainties
in power demand and the prices of power and RECs. In particular, we first
analyze a two-stage stochastic model to understand the behavior of procurement
quantities and costs when using physical and virtual PPAs as well as variants
that limit risk. We subsequently formulate a Markov decision process (MDP) that
optimizes the multi-stage procurement of power to reach and sustain a renewable
procurement target. By leveraging state-of-the-art reoptimization techniques, we
solve this MDP on realistic instances to near optimality, and highlight the
relative benefits of using PPA types to meet a renewable target.
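The two-stage trade-off between a fixed-volume PPA and spot exposure can be illustrated with a stylized scenario model. This is a minimal sketch with hypothetical numbers and function names; it ignores RECs, virtual-PPA settlement, surplus sales, and the multi-stage MDP in the paper:

```python
import statistics

def expected_cost(q, scenarios, ppa_price):
    """Two-stage cost of signing q MWh of a fixed-volume PPA up front:
    pay the PPA price on q, then buy any shortfall at the realized spot price."""
    costs = []
    for demand, spot in scenarios:
        shortfall = max(demand - q, 0.0)
        costs.append(q * ppa_price + shortfall * spot)
    return statistics.mean(costs)

def best_ppa_volume(scenarios, ppa_price, grid):
    """First-stage decision: pick the PPA volume minimizing expected cost."""
    return min(grid, key=lambda q: expected_cost(q, scenarios, ppa_price))
```

With equally likely demand/spot scenarios, the first-stage volume balances over-contracting at the PPA price against shortfall purchases at volatile spot prices.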
- A parameter-free and projection-free restarting level set method for adaptive constrained convex optimization under the error bound condition.
(with Q. Lin, R. Ma, and N. Soheili). [pdf; Under review at Journal of Machine Learning Research]
Recent efforts to accelerate first-order methods have focused on convex
optimization problems that satisfy a geometric property known as error-bound
condition,
which covers a broad class of problems, including piecewise linear programs and
strongly convex programs. Parameter-free first-order methods that employ
projection-free
updates have the potential to broaden the benefit of acceleration. Such a method
has been developed for unconstrained convex optimization but is lacking for
general
constrained convex optimization. We propose a parameter-free level-set method
for
the latter constrained case based on projection-free subgradient descent that
exhibits
accelerated convergence for problems that satisfy an error-bound condition. Our
method maintains a separate copy of the level-set sub-problem for each level
parameter
value and restarts the computation of these copies based on objective function
progress. Applying such a restarting scheme in a level-set context is novel and
results
in an algorithm that dynamically adapts the precision of each copy. This
property is
key to extending prior restarting methods based on static precision that have
been
proposed for unconstrained convex optimization to handle constraints. We report
promising numerical performance relative to benchmark methods.
- Self-adapting robustness in demand learning.
(with B. Chen, P. Pakiman, S. Jasin). [pdf; Under revision for resubmission to Operations Research]
We study dynamic pricing over a finite number of periods in the presence of
demand model ambiguity. Departing from the typical no-regret learning
environment, where price changes are allowed at any time, pricing decisions are
made at pre-specified points in time and each price can be applied to a large
number of arrivals. In this environment, which arises in retailing, a pricing
decision based on an incorrect demand model can significantly impact cumulative
revenue. We develop an adaptively-robust-learning (ARL) pricing policy that
learns the true model parameters from the data while actively managing demand
model ambiguity. It optimizes an objective that is robust with respect to a
self-adapting set of demand models, where a given model is included in this set
only if the sales data revealed from prior pricing decisions makes it
``probable''. As a result, it gracefully transitions from being robust when
demand model ambiguity is high to minimizing regret when this ambiguity
diminishes upon receiving more data. We characterize the stochastic behavior of
ARL's self-adapting ambiguity sets and derive a regret bound that highlights the
link between the scale of revenue loss and the customer arrival pattern. We also
show that ARL, by being conscious of both model ambiguity and revenue, bridges
the gap between a distributionally robust policy and a follow-the-leader policy,
which focus on model ambiguity and revenue, respectively. We numerically find
that the ARL policy, or an extension thereof, exhibits superior performance
compared to distributionally robust, follow-the-leader, and
upper-confidence-bound policies in terms of expected revenue and/or value at
risk.
- Self-adapting least squares Monte Carlo for managing financial and real options.
(with P. Pakiman).
Accepted/Published Journal Papers
- Self-guided approximate
linear programs: Randomized multi-shot approximation of discounted Markov
decision processes.
(with P. Pakiman, N. Soheili, Q. Lin), Forthcoming at Management Science.
[pdf]
Approximate linear programs (ALPs) are well-known models based on value
function approximations (VFAs) to obtain heuristic policies and lower bounds on
the optimal policy cost of Markov decision processes (MDPs). The ALP VFA is a
linear combination of predefined basis functions that are chosen using domain
knowledge and updated heuristically if the ALP optimality gap is large. We
side-step the need for such basis function engineering in ALP -- an
implementation bottleneck -- by proposing a sequence of ALPs that embed
increasing numbers of random basis functions obtained via inexpensive sampling.
We provide a sampling guarantee and show that the VFAs from this sequence of
models converge to the exact value function. Nevertheless, the performance of
the ALP policy can fluctuate significantly as more basis functions are sampled.
To mitigate these fluctuations, we ``self-guide'' our convergent sequence of
ALPs using past VFA information such that a worst-case measure of policy
performance is improved. We perform numerical experiments on perishable
inventory control and generalized joint replenishment applications, which,
respectively, give rise to challenging discounted-cost MDPs and average-cost
semi-MDPs. We find that self-guided ALPs (i) significantly reduce policy cost
fluctuations and improve the optimality gaps from an ALP approach that employs
basis functions tailored to the former application, and (ii) deliver optimality
gaps that are comparable to a known adaptive basis function generation approach
targeting the latter application. More broadly, our methodology provides
application-agnostic policies and lower bounds to benchmark approaches that
exploit application structure.
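The random-basis idea, replacing engineered basis functions with cheaply sampled ones, can be sketched as follows. This is a minimal least-squares illustration on a stand-in value function, not the paper's ALP machinery; the feature counts, bandwidths, and target function are all illustrative:

```python
import numpy as np

def random_basis(dim, n_feats, rng):
    """Sample random cosine basis functions phi(s) = cos(w . s + b)."""
    W = rng.normal(size=(n_feats, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_feats)
    return lambda S: np.cos(S @ W.T + b)

rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, size=(500, 2))   # sampled states
v = np.sin(3.0 * S[:, 0]) + S[:, 1] ** 2    # stand-in "value function"
phi = random_basis(2, 300, rng)             # 300 inexpensively sampled features
w, *_ = np.linalg.lstsq(phi(S), v, rcond=None)  # fit VFA weights
rmse = np.sqrt(np.mean((phi(S) @ w - v) ** 2))
```

No domain knowledge enters the basis: fit quality is driven purely by how many random features are sampled, which is the lever the self-guided ALP sequence increases.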
- Self-adapting network relaxations
for weakly coupled Markov decision processes.
(with A. Cire), Forthcoming at Management Science. [pdf; video]
Weakly coupled Markov decision processes (WDPs) arise in dynamic
decision-making and reinforcement learning. These
models are often high dimensional but decompose into smaller component MDPs when
coupling constraints are relaxed.
Lagrangian relaxations of WDPs that dualize linking constraints are commonly
used to compute heuristic policies and
(optimistic) bounds. While computationally appealing, this Lagrangian approach
averages away combinatorial information
embedded in the linking constraints. We present a class of network relaxations,
dubbed feasibility network relaxations
(FNRs), that embed an exact network encoding of the linking constraints into a
linear programming flow model. We develop
a procedure to obtain the minimally sized such relaxation, which we refer to as
self-adapting FNR, as its size
automatically adjusts to the structure of the linking constraints. We analyze
this model to inform the approximation of
WDPs: (i) self-adapting FNR provides (weakly) stronger bounds than standard
Lagrangian relaxation and is polynomially
sized for instances where the linking constraints admit a tractable network
representation; and (ii) self-adapting FNR
provides bounds and policies that match the approximate linear programming (ALP)
approach but is substantially smaller
in size, even polynomially sized when existing ALP formulations are
exponentially large. We perform numerical
experiments on constrained dynamic assortment and preemptive maintenance
applications. Our results show that
self-adapting FNR significantly improves on standard Lagrangian relaxation in
terms of policy performance and/or bounds,
although requiring more time, while being an order of magnitude faster than
models based on alternative Lagrangian
approaches and ALP, which are unsolvable in several instances.
- Corporate renewable procurement
analytics.
Foundations and Trends in Technology, Information and Operations Management, 16
(3-4),
2023. [pdf]
Corporate decarbonization goals have increased rapidly in the last few
years. The procurement of renewable power is a core strategy used by companies
to meet these goals, increasingly in a dynamic manner that addresses the risks
associated with uncertain prices and supply intermittency, among others. This
section discusses the interplay between data and decision analytics in this
rapidly evolving area by considering the construction of a dynamic portfolio of
power purchase agreements, which are popular long-term contracts signed by
corporations, to meet a future renewable procurement target. It analyzes a
stylized setting to provide insight into the effect on decisions of the joint
evolution of uncertainties. It also discusses how forecasts, stochastic
processes, and deterministic models can be used to obtain procurement policies
in practical settings. These elements have a rich history in operations
management but have received limited attention for renewable power procurement.
Emphasis is placed on how a traditional rolling planning model based on
forecasts can be adapted to this procurement setting, as well as where a recent
rolling planning technique based on information relaxations can add value.
- Least squares Monte Carlo and
pathwise optimization for merchant energy production.
(with B. Yang, N. Secomandi), Forthcoming in Operations Research,
2023. [pdf]
We study merchant energy production modeled as a compound switching and
timing option. The resulting Markov decision process is intractable. Least
squares Monte Carlo combined with information relaxation and duality is a
state-of-the-art reinforcement learning methodology to obtain operating policies
and optimality gaps for related models. Pathwise optimization is a competing
technique developed for optimal stopping settings, in which it typically
provides superior results compared with this approach, albeit with a larger
computational effort. We apply these procedures to merchant energy production.
Using pathwise optimization requires methodological extensions. We use principal
component analysis and block coordinate descent in novel ways to respectively
precondition and solve the ensuing ill-conditioned and large-scale linear
program, which even a cutting-edge commercial solver is unable to handle
directly. Both techniques yield near-optimal operating policies on realistic
ethanol production instances. However, pathwise optimization, even when specified
in a simple fashion, leads to substantially tighter dual bounds than least
squares Monte Carlo, complementing it in this case, albeit at the cost of
considerably longer run times and greater memory usage, which limits the number
of stages it can handle. Thus, it plays a critical
role in obtaining small optimality gaps. Our numerical observations on the
magnitudes of these bound improvements differ from what is currently known. This
research has potential relevance for other commodity merchant operations
contexts and motivates additional algorithmic work in the area of pathwise
optimization.
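For context, the least squares Monte Carlo recursion in its classic optimal stopping form (a Longstaff-Schwartz-style sketch for a Bermudan put with illustrative parameters, not the compound switching-and-timing model studied in the paper) looks like:

```python
import numpy as np

def lsm_bermudan_put(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
                     steps=50, paths=20000, seed=0):
    """Value a Bermudan put by least squares Monte Carlo (illustrative parameters)."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    # Simulate geometric Brownian motion price paths.
    z = rng.standard_normal((paths, steps))
    S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt
                              + sigma * np.sqrt(dt) * z, axis=1))
    # Cash flows if held to maturity.
    cash = np.maximum(K - S[:, -1], 0.0)
    # Backward induction: regress discounted continuation values on a basis.
    for t in range(steps - 2, -1, -1):
        cash *= np.exp(-r * dt)
        payoff = np.maximum(K - S[:, t], 0.0)
        itm = payoff > 0
        if itm.sum() > 0:
            A = np.vander(S[itm, t], 3)  # polynomial basis: x^2, x, 1
            beta, *_ = np.linalg.lstsq(A, cash[itm], rcond=None)
            exercise = payoff[itm] > A @ beta
            idx = np.where(itm)[0][exercise]
            cash[idx] = payoff[itm][exercise]
    return np.exp(-r * dt) * cash.mean()
```

Continuation values are estimated by regressing discounted future cash flows on basis functions over in-the-money paths; the stopping rule compares the immediate payoff against this estimate.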
- Deep reinforcement learning with
planning guardrails for building energy demand response.
(with D. Jang, L. Spangher, C. Spanos), Energy and AI, 11, 2023. [pdf]
Building energy demand response is projected to be important in
decarbonizing energy use. A demand response program that communicates
``artificial'' hourly price signals to workers as part of a social game has the
potential to elicit energy consumption changes that simultaneously reduce energy
costs and emissions. The efficacy of such a program depends on the pricing
agent's ability to learn how workers respond to prices and mitigate the risk of
high energy costs during this learning process. We assess the value of deep
reinforcement learning (RL) for mitigating this risk. Specifically, we explore
the value of combining: (i) a model-free RL method that can learn by posting
price signals to workers, (ii) a supervisory ``planning model'' that provides a
synthetic learning environment, and (iii) a guardrail method that determines
whether a price should be posted to real workers or the planning environment for
feedback. In a simulated medium-sized office building, we compare our pricing
agent against existing model-free and model-based deep RL agents, and the
simpler strategy of passing on the time-of-use price signal to workers. We find
that our controller eliminates 175,000 US dollars in initial investment,
decreases energy cost by 30%, and curbs emissions by 32% compared to
energy consumption under the time-of-use rate. In contrast, the model-free and
model-based deep RL benchmarks are unable to overcome initial learning costs.
Our results bode well for risk-aware deep RL facilitating the deployment of
building demand response.
- A review of the operations
literature on real options in energy.
(with N. Secomandi), European Journal of Operational Research, 309 (2), 2023.
[pdf;
xlsx table]
Real option models maximize the estimated market value of operational
assets, exploiting the flexibility that decision makers have in managing these
assets. Inspired by the valuation of financial options, their natural domains of
application feature high levels of market or operational risk. The existence of
financial markets that trade instruments associated with the inputs or outputs
of the underlying processes facilitates the use of these models. The energy
industry has thus received a substantial amount of attention within the real
option and operations (management/research) communities. Yet a review of the
operations literature on energy real options has been lacking. We present a
synthesis of this literature considering review categories that pertain to the
phenomena it studies and the tools it uses to conduct its analysis. Further, we
outline potential research directions.
- Meeting corporate renewable power targets.
(with A. Trivella, D. Mohseni-Taheri), Management Science, 69 (1), 2023. [pdf; video;
Received the 2021 Commodity and Energy Markets Association Best Paper Award and
the 2020 INFORMS ENRE Young Researcher Prize]
Prominent companies have committed to procuring a percentage of their power
demand from renewable sources by a future date in the face of uncertain power
demand and stochastic power and renewable energy certificate (REC) prices. We
study procurement portfolios based on two dominant strategies to achieve this
target: long-term procurement of power and RECs at a fixed price using corporate
power purchase agreements (CPPAs) and short-term purchases at volatile prices.
We analyze a two-stage model to understand the behavior of procurement costs
when using financial and physical CPPA variants employed in practice, which
informs the structuring of these contracts. We subsequently formulate a Markov
decision process (MDP) that optimizes the multi-stage procurement of power to
reach and sustain a renewable procurement target. Our MDP is intractable because
its action space is non-convex and its state space has high-dimensional
endogenous and exogenous components. Although approximate methods to solve this
MDP are limited, a procurement policy can be obtained using an easy-to-implement
``primal'' reoptimization strategy, which solves a deterministic model with
stochastic quantities in the MDP replaced by forecasts. This approach does not,
however, provide a lower bound on the optimal policy value. We propose a novel
``dual'' reoptimization heuristic which computes both procurement decisions and
a lower bound while retaining the desirable implementation properties of primal
reoptimization. On realistic instances, the dual reoptimization policy is
near-optimal and outperforms policies from primal reoptimization and other
heuristics. Our numerical results also highlight the benefits of using CPPA
contracts to meet a renewable target.
- Data-driven storage
operations: Cross-commodity backtest and structured policies.
(with C. Mandl, S. Minner, N. Gavirneni), Production and Operations Management,
31(6), 2022. [pdf]
Storage assets are critical for temporal trading of commodities under
volatile prices. State-of-the-art methods for managing storage such as the
reoptimization heuristic (RH), which are part of commercial software,
approximate a Markov decision process (MDP) assuming full information regarding
the state and the stochastic commodity price process and hence suffer from
informational inconsistencies with observed price data and structural
inconsistencies with the true optimal policy, which are both components of
generalization error. Based on extensive backtests, we find that this error can
lead to significantly suboptimal RH policies and qualitatively different
performance compared to the known near-optimality and behavior of RH in the
full-information setting. We develop a forward-looking data-driven approach
(DDA) to learn policies and overcome generalization error. This approach extends
standard (backward-looking) DDA in two ways: (i) it uses financial-market
features and estimates of future profits as part of the training objective,
which typically includes past profits alone; and (ii) it enforces structural
properties of the optimal policy. To elaborate, DDA trains parameters of
bang-bang and base-stock policies, respectively, by solving linear and
mixed-integer programs, thereby extending known DDAs that parameterize decisions
as functions of features without enforcing policy structure. We backtest the
performance of DDA and RH on six major commodities from 2000 to 2017 with
features constructed using Thomson Reuters and Bloomberg data. DDA significantly
improves RH on real data, with base-stock structure needed to realize this
improvement. Our research advances the state-of-the-art for storage operations
and suggests modifications to commercial software to handle generalization
error.
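A stripped-down version of learning a structured storage policy directly from price data might look as follows. This is a grid search over bang-bang buy/sell thresholds rather than the paper's feature-based linear and mixed-integer training programs; the capacity, rate, and threshold grid are hypothetical:

```python
import itertools

def backtest_basestock(prices, low, high, capacity=1.0, rate=1.0):
    """Profit of a bang-bang policy: buy when price < low, sell when price > high."""
    inv, profit = 0.0, 0.0
    for p in prices:
        if p < low and inv < capacity:
            q = min(rate, capacity - inv)   # inject up to capacity
            inv += q
            profit -= q * p
        elif p > high and inv > 0:
            q = min(rate, inv)              # withdraw what is stored
            inv -= q
            profit += q * p
    return profit

def fit_thresholds(train_prices, grid):
    """Data-driven policy search: thresholds maximizing backtested training profit."""
    best = max((backtest_basestock(train_prices, lo, hi), lo, hi)
               for lo, hi in itertools.product(grid, grid) if lo < hi)
    return best[1], best[2]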
- Socially responsible merchant
operations: Comparison of shutdown-averse CVaR and anticipated regret policies.
(with A. Trivella). Operations Research Letters, 49(4), 2021. [pdf]
Commodity and energy production assets are managed as real options on
market uncertainties. Social impacts of plant shutdowns incentivize balancing
asset value with shutdown probability. We propose new shutdown-averse policies
based on the popular dynamic conditional value-at-risk (CVaR). We analytically
and numerically compare these policies to known shutdown-averse policies based
on anticipated regret (AR). Our findings support the use of AR over CVaR to
embed shutdown-aversion and the consideration of hybrid policies that are
partially time-consistent but easily interpretable.
- Managing shutdown decisions in
merchant commodity and energy production: A social commerce perspective
(with A. Trivella, S.E. Fleten, D. Mazieres, D. Pisinger). Manufacturing and
Service Operations Management, 23 (2), 2021. [pdf]
Merchant commodity and energy production assets operate in markets with
volatile prices and exchange rates.
Producers can choose in each period between production, production suspension,
and permanent shutdown. Plant closures,
however, adversely affect societal entities beyond the specific plant being
shut down, such as the parent company
and the local community. Motivated by an aluminum producer, we study if
mitigating these hard-to-assess broader
impacts of a shutdown is financially viable using the plant's operating
flexibility. Our social commerce perspective
towards managing shutdown decisions deviates from the commonly used asset value
maximization objective in merchant
operations. We formulate a high-dimensional constrained Markov decision process
to manage shutdown decisions. We
approximate this intractable model using unconstrained stochastic dynamic
programs and define operating policies
that account for preferences to delay and reduce shutdowns. Our first policy
leverages anticipated regret theory
in behavioral psychology while the second policy generalizes production margin
heuristics used in practice using
machine learning. We compute the former and latter policies using a least
squares Monte Carlo method and combining
this method with binary classification, respectively. We also propose a
reoptimization heuristic to simplify the
anticipated-regret policy. We show that anticipated-regret policies possess
desirable asymptotic properties absent
in classification- and reoptimization-based policies. On instances created using
real data, anticipated-regret
and classification-based policies outperform practice-based production margin
strategies and, to a lesser extent,
reoptimization. Specifically, the former policies decrease the shutdown
probability by 25% and, in addition, delay
shutdown decisions by an average of 4 years for a 4% asset value loss. Our
operating policies show that unaccounted
social costs amounting to a few percent of the maximum asset value can justify
delaying or avoiding the use of
a plant's shutdown option by adapting its operating flexibility in our
application. Thus, taking a social commerce
perspective towards managing a plant's operating flexibility appears financially
viable.
- A data efficient and feasible level set method for stochastic convex optimization with expectation constraints.
(with Q. Lin, N. Soheili, T. Yang). Journal of Machine Learning Research, 21(143), 2020. [pdf]
Stochastic convex optimization problems with expectation constraints
(SOECs) are encountered in statistics and machine learning, business, and
engineering. In data-rich environments, the SOEC objective and constraints
contain expectations defined with respect to large datasets. Therefore,
efficient algorithms for solving such SOECs need to limit the fraction of data
points that they use, which we refer to as algorithmic data complexity. Recent
stochastic first-order methods exhibit low data complexity when handling SOECs
but guarantee near-feasibility and near-optimality only at convergence. These
methods may thus return highly infeasible solutions when heuristically
terminated, as is often the case, due to theoretical convergence criteria being
highly conservative. This issue limits the use of first-order methods in several
applications where the SOEC constraints encode implementation requirements. We
design a stochastic feasible level set method (SFLS) for SOECs that has low data
complexity and emphasizes feasibility before convergence. Specifically, our
level-set method solves a root-finding problem by calling a novel first-order
oracle that computes a stochastic upper bound on the level-set function by
extending mirror descent and online validation techniques. We establish that
SFLS maintains a high-probability feasible solution at each root-finding
iteration and exhibits favorable iteration complexity compared to
state-of-the-art deterministic feasible level set and stochastic subgradient
methods. Numerical experiments on three diverse applications validate the low
data complexity of SFLS relative to the former approach and highlight how SFLS
finds feasible solutions with small optimality gaps significantly faster than
the latter method.
- Network-based approximate linear
programming for discrete optimization
(with A. Cire). Operations Research, 68(6), 2020. [pdf]
We present relaxations for discrete optimization problems using approximate
linear programs (ALPs) defined on multiple networks that represent different
state-space aggregations. Our network ALP leverages information across these
networks using a piecewise-constant value function approximation and its
optimistic bound is theoretically guaranteed to weakly improve upon the bounds
from the individual networks used in its construction. Solving a network ALP is
challenging due to its large number of constraints -- a well-known issue when
applying approximate linear programming. We side-step this issue by using a
clique-based graph representation to design a network ALP restriction that
admits a polynomial-time solvable extended formulation, which we show to also
deliver a weakly better bound than individual networks. We execute a
computational study on a challenging bilinear problem arising in marketing
analytics and a routing application encountered in the preemptive maintenance of
energy or city-owned assets. When used within a branch-and-bound scheme, the
network ALP restriction significantly outperforms in both solution quality and
time the following benchmarks: a state-of-the-art mathematical programming
solver, a generic aggregation/disaggregation scheme applied to a single
network, and a heuristic that chooses the best bound among individual networks.
- Partial hyperplane activation for
generalized intersection cuts
(with A. Kazachkov, E. Balas, and F. Margot). Mathematical Programming
Computation, 12, 2020. [pdf;
Received the Tepper Egon Balas PhD student
paper award]
The generalized intersection cut (GIC) paradigm is a recent framework for
generating cutting planes in mixed
integer programming with attractive theoretical properties. We investigate this
computationally unexplored paradigm
and observe that a key hyperplane activation procedure embedded in it is not
computationally viable. To overcome
this issue, we develop a novel replacement to this procedure called partial
hyperplane activation (PHA), introduce
a variant of PHA based on a notion of hyperplane tilting, and prove the validity
of both algorithms. We propose
several implementation strategies and parameter choices for our PHA algorithms
and provide supporting theoretical
results. We computationally evaluate these ideas in the COIN-OR framework on
MIPLIB instances. Our findings shed
light on the strengths of the PHA approach as well as identify properties
related to strong cuts that can be
targeted in the future.
- Revisiting approximate linear
programming: Constraint-violation learning with applications to inventory
control and energy storage
(with Q. Lin and N. Soheili). Management Science, 66(4), 2020. [pdf]
Approximate linear programs (ALPs) are well-known models for computing
value function approximations (VFAs) of intractable Markov decision processes
(MDPs). VFAs from ALPs have desirable theoretical properties, define an
operating policy, and provide a lower bound on the optimal policy cost. However,
solving ALPs near-optimally remains challenging, for example, when approximating
MDPs with nonlinear cost functions and transition dynamics or when rich basis
functions are required to obtain a good VFA. We address this tension between
theory and solvability by proposing a convex saddle-point reformulation of an
ALP that includes as primal and dual variables, respectively, a vector of basis
function weights and a constraint violation density function over the
state-action space. To solve this reformulation, we develop a proximal
stochastic mirror descent (PSMD) method that learns regions of high ALP
constraint violation via its dual update. We establish that PSMD returns a
near-optimal ALP solution and a lower bound on the optimal policy cost in a
finite number of iterations with high probability. We numerically compare PSMD
with several benchmarks on inventory control and energy storage applications. We
find that the PSMD lower bound is tighter than a perfect information bound. In
contrast, the constraint sampling approach to solve ALPs may not provide a lower
bound and applying row generation to tackle ALPs is not computationally viable.
PSMD policies outperform problem-specific heuristics and are comparable or
better than the policies obtained using constraint sampling. Overall, our ALP
reformulation and solution approach broadens the applicability of approximate
linear programming.
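The constraint-violation learning idea above can be conveyed with a toy sketch (my own construction, not the paper's PSMD method): on a tiny tabular MDP, a penalized stochastic subgradient method samples state-action pairs and lets violated ALP constraints drive the update. The MDP, the penalty parameter rho, and the step sizes are all illustrative assumptions.

```python
import numpy as np

# Toy sketch of constraint-violation learning (not the paper's PSMD):
# solve the ALP of a tiny 3-state, 2-action MDP by stochastic subgradient
# ascent on a penalized objective, where sampled violated Bellman
# constraints drive the update. All numbers below are made up.

gamma = 0.5
n_s, n_a = 3, 2
cost = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])  # cost[s, a]
nxt = np.array([[0, 1], [1, 2], [2, 0]])               # deterministic next state

# Exact value function via value iteration, for reference.
V = np.zeros(n_s)
for _ in range(200):
    V = np.min(cost + gamma * V[nxt], axis=1)

# Penalized ALP: maximize mean(w) - rho * E[(w_s - c(s,a) - gamma*w_s')_+]
# over uniformly sampled state-action pairs (tabular basis functions).
rng = np.random.default_rng(0)
w = np.zeros(n_s)
w_avg = np.zeros(n_s)
rho, T = 9.0, 200_000
for t in range(1, T + 1):
    s, a = rng.integers(n_s), rng.integers(n_a)
    g = np.full(n_s, 1.0 / n_s)                   # gradient of the LP objective
    if w[s] > cost[s, a] + gamma * w[nxt[s, a]]:  # sampled constraint violated
        g[s] -= rho
        g[nxt[s, a]] += rho * gamma
    w += (0.2 / np.sqrt(t)) * g
    w_avg += (w - w_avg) / t                      # running average of iterates

print(V, w_avg)  # the averaged iterate approaches the exact values
```

With a large enough penalty, the penalized problem is an exact reformulation of the ALP, so the averaged iterate tracks the exact value function here because the tabular basis is rich enough.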
- Least squares Monte Carlo and
approximate linear programming: Error bounds and energy real option application
(with N.
Secomandi). Advances in supply chain finance and FinTech innovations,
Foundations and Trends in Technology,
Information and Operations Management, 14 (1-2), 2020.
[pdf]
Least squares Monte Carlo (LSM) is an approximate dynamic programming (ADP)
technique commonly used for the valuation of
high dimensional financial and real options, but it has broader applicability. It
is known that the regress-later version
of this method is an approximate linear programming (ALP) relaxation that
implicitly provides a potential solution to a
familiar ALP deficiency. We provide numerical backing for the usefulness of this
solution using a numerical study
dealing with merchant ethanol production, an energy real option application,
based on an ALP heuristic that we propose.
When both methodologies are applicable, our research supports the use of
regress-later LSM rather than this ALP
technique to approximately solve intractable Markov decision processes. Our
findings motivate additional research to
obtain even better methods than the regress-later version of LSM.
- Merchant energy trading in a
network
(with N. Secomandi). Operations Research, 66(5), 2018. [pdf]
We formulate the merchant trading of energy in a network of storage and
transport assets as a Markov decision
process with uncertain energy prices, generalizing known models. Because of the
intractability of our model, we
develop heuristics and both lower and dual (upper) bounds on the optimal policy
value estimated within Monte Carlo
simulation. We achieve tractability using linear optimization, extending
near-optimal approximate dynamic programming
techniques for the case of a single storage asset, versions of two of which are
commercially available. We propose
(i) a generalization of a deterministic reoptimization heuristic, (ii) an
iterative version of the least squares
Monte Carlo approach, and (iii) a perfect information dual bound. We apply our
methods to a set of realistic natural
gas instances. The combination of our reoptimization heuristic and dual bound
emerges as a practical approach to
nearly optimally solve our model. Our iterative least squares Monte Carlo
heuristic is also close to optimal. Compared
to our other heuristic, it exhibits slightly larger optimality gaps and requires
some tuning, but is faster to
execute in some cases. Our methods could enhance single energy storage asset
software and have potential relevance
beyond our specific application.
- A level-set method for convex
optimization with a feasible solution path
(with Q. Lin and N. Soheili). SIAM Journal on Optimization, 28(4), 2018.
[pdf]
Large-scale constrained convex optimization problems arise in several
application domains. First-order methods
are good candidates to tackle such problems due to their low iteration
complexity and memory requirement. The level-set
framework extends the applicability of first-order methods to tackle problems
with complicated convex objectives
and constraint sets. Current methods based on this framework either rely on the
solution of challenging subproblems
or do not guarantee a feasible solution, especially if the procedure is
terminated before convergence. We develop
a level-set method that finds an $\epsilon$-relative optimal and feasible
solution to a constrained convex optimization
problem with a fairly general objective function and set of constraints,
maintains a feasible solution at each
iteration, and only relies on calls to first-order oracles. We establish the
iteration complexity of our approach,
also accounting for the smoothness and strong convexity of the objective
function and constraints when these properties
hold. The dependence of our complexity on $\epsilon$ is similar to the analogous
dependence in the unconstrained
setting, which is not known to be true for level-set methods in the literature.
Nevertheless, ensuring feasibility
is not free. The iteration complexity of our method depends on a condition
number, while existing level-set methods
that do not guarantee feasibility can avoid such dependence. We numerically
validate the usefulness
of ensuring a feasible solution path by comparing our approach with an existing
level-set method on a Neyman-Pearson
classification problem.
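The level-set framework reduces a constrained problem to root-finding on a level parameter, with each evaluation an unconstrained minimization. A one-dimensional toy sketch (an illustration of the framework, not the paper's algorithm; the problem and solvers are illustrative choices):

```python
# Level-set idea in one dimension: for min f(x) s.t. g(x) <= 0, define
# H(tau) = min_x max(f(x) - tau, g(x)). H is decreasing and H(tau*) = 0
# at the optimal value tau*, so the constrained problem becomes a root
# find on tau. Here f(x) = x^2 and g(x) = 1 - x (i.e., x >= 1), so the
# optimum is x* = 1 with value tau* = 1.

def f(x): return x * x
def g(x): return 1.0 - x

def inner_min(tau, lo=-2.0, hi=3.0, iters=100):
    """Minimize the convex function max(f(x) - tau, g(x)) by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if max(f(m1) - tau, g(m1)) < max(f(m2) - tau, g(m2)):
            hi = m2
        else:
            lo = m1
    x = (lo + hi) / 2
    return max(f(x) - tau, g(x))

# Bisection on tau: H(tau) > 0 means tau is below the optimal value.
lo_t, hi_t = 0.0, 5.0
for _ in range(60):
    mid = (lo_t + hi_t) / 2
    if inner_min(mid) > 0:
        lo_t = mid
    else:
        hi_t = mid
tau_star = (lo_t + hi_t) / 2
print(tau_star)  # ~1.0
```

The paper's contribution concerns maintaining feasibility of the iterates and bounding the number of first-order oracle calls; this sketch only shows the outer root-finding structure.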
- Relationship between least squares
Monte Carlo and approximate linear programming
(with N. Secomandi). Operations Research Letters, 45(5), 2017. [pdf]
Least squares Monte Carlo (LSM) is commonly used to manage and value
intractable early and multiple exercise
financial and real options. Recent research in this area has started applying
approximate linear programming (ALP)
and its relaxations, which aim at addressing a possible ALP drawback. We show
that LSM is itself an ALP relaxation
that potentially corrects this ALP shortcoming. Our analysis consolidates two
separate streams of research and
suggests using LSM rather than ALP for the considered class of problems.
- Comparison of
least squares Monte Carlo methods with applications to energy
real options (with F. Margot and N. Secomandi). European Journal of
Operational Research, 256(1), 2017. [pdf]
Least squares Monte Carlo (LSM) is a state-of-the-art approximate dynamic
programming approach used in financial
engineering and real options to value and manage options with early or multiple
exercise opportunities. It is also
applicable to capacity investment and inventory/production management problems
with demand/supply forecast updates
arising in operations and hydropower-reservoir management. LSM has two variants,
referred to as regress-now/later
(LSMN/L), which compute continuation/value function approximations (C/VFAs). We
provide novel numerical evidence
for the relative performance of these methods applied to energy swing and
storage options, two typical real options,
using a common price evolution model. LSMN/L estimate C/VFAs that yield equally
accurate (near optimal) and precise
lower and dual (upper) bounds on the value of these real options. Estimating the
LSMN/L C/VFAs and their associated
lower bounds takes similar computational effort. In contrast, the estimation of
a dual bound using the LSML VFA
instead of the LSMN CFA takes seconds rather than minutes or hours. This finding
suggests the use of LSML in lieu
of LSMN when estimating dual bounds on the value of early or multiple exercise
options, as well as of related capacity
investment and inventory/production policies.
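A minimal regress-now LSM (Longstaff-Schwartz) sketch for a Bermudan put conveys the mechanics of the methods compared above; the regress-later variant would instead regress realized values on basis functions of the time-(t+1) state. All market parameters here are illustrative assumptions.

```python
import numpy as np

# Minimal regress-now LSM for a Bermudan put (illustrative parameters).

rng = np.random.default_rng(1)
S0, K, r, sigma, T, n_ex, n_paths = 100.0, 100.0, 0.05, 0.2, 1.0, 4, 20_000
dt = T / n_ex
disc = np.exp(-r * dt)

# Simulate geometric Brownian motion paths at the exercise dates.
z = rng.standard_normal((n_paths, n_ex))
S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt
                          + sigma * np.sqrt(dt) * z, axis=1))

payoff = np.maximum(K - S[:, -1], 0.0)  # value if held to the last date
for t in range(n_ex - 2, -1, -1):
    payoff *= disc
    exercise = np.maximum(K - S[:, t], 0.0)
    itm = exercise > 0
    # Regress discounted continuation values on a quadratic in the current
    # price, using in-the-money paths only (the regress-"now" step).
    X = np.vander(S[itm, t], 3)
    beta, *_ = np.linalg.lstsq(X, payoff[itm], rcond=None)
    cont = X @ beta
    ex_now = itm.copy()
    ex_now[itm] = exercise[itm] > cont
    payoff[ex_now] = exercise[ex_now]

price = disc * payoff.mean()
print(round(price, 2))
```

With these parameters the estimate should land between the European put value (about 5.6 by Black-Scholes) and the continuously exercisable American put value (about 6.1).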
-
Relaxations
of approximate linear programs for the real option management
of commodity storage (with F. Margot and N. Secomandi).
Management Science, 61(12), 2015.
[pdf]
The real option management of commodity conversion assets gives rise to
intractable Markov decision processes
(MDPs), in part due to the use of high dimensional models of commodity forward
curve evolution, as commonly done
in practice. Focusing on commodity storage, we identify a deficiency of
approximate linear programming, which we
address by developing a novel approach to derive relaxations of approximate
linear programs (ALPs). We apply our
approach to obtain a class of tractable ALP relaxations, also subsuming an
existing method. We provide theoretical
support for the use of these ALP relaxations rather than their corresponding
ALPs. Applied to existing natural
gas storage instances, our ALP relaxations significantly outperform their
corresponding ALPs. Our best ALP relaxation
is both near optimal and competitive with, albeit slower than, state-of-the-art
methods for computing heuristic
policies and lower bounds on the value of commodity storage, but is more
directly applicable for dual upper bound
estimation than these methods. Our approach is potentially relevant for the
approximate solution of MDPs that arise
in the real option management of other commodity conversion assets, as well as
the valuation of real and financial
options that depend on forward curve dynamics.
- Less-Than-Truckload carrier collaboration
problem: modeling framework and solution approach (with J. H. Bookbinder).
Journal of Heuristics, 19(6),
2013. [pdf]
Less-Than-Truckload (LTL) carriers generally serve geographical regions
that are more localized than the inter-city
line-hauls served by truckload carriers. That localization can lead to urban
freight transportation routes that
overlap. If trucks are traveling with less than full loads, there typically
exist opportunities for carriers to
collaborate over such routes. We introduce a two-stage framework for LTL carrier
collaboration. Our first stage
involves collaboration between multiple carriers at the entrance to the city and
can be formulated as a vehicle
routing problem with time windows (VRPTW). We employ guided local search for
solving this VRPTW. The second stage
involves collaboration between carriers at transshipment facilities while
executing their routes identified in
phase one. For solving the second stage problem, we develop novel local search
heuristics, one of which leverages
integer programming to efficiently explore the union of neighborhoods defined by
new problem-specific move operators.
Our computational results indicate that integrating integer programming with
local search results in at least an
order of magnitude speed up in the second stage problem. We also perform
sensitivity analysis to assess the benefits
from collaboration. Our results indicate that distance savings of 7–15%
can be achieved by collaborating
at the entrance to the city. Carriers involved in intra-city collaboration can
further save 3–15% in total
distance traveled, and also reduce their overall route times.
Conference Proceedings and Workshop Papers
-
Offline-online reinforcement learning for energy pricing in office demand
response: Lowering energy and data costs (with D. Jang, L. Spangher, T.
Srivistava, M. Khattar, U. Agwan, and C. Spanos). Proceedings of the 8th ACM
International
Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation
(BuildSys '21), 2021. [pdf]
Our team is proposing to run a full-scale energy demand response experiment in
an office building. Although this is an exciting endeavor which will provide
value to the community, collecting training data for the reinforcement learning
agent is costly and will be limited. In this work, we examine how offline
training can be leveraged to minimize data costs (accelerate convergence) and
program implementation costs. We present two approaches to doing so: pretraining
our model to warm start the experiment with simulated tasks, and using a
planning model trained to simulate the real world's rewards to the agent. We
present results that demonstrate the utility of offline reinforcement learning
to efficient price-setting in the energy demand response problem.
-
A machine learning approach to methane emissions mitigation in the oil and gas
industry (with Jiayang Wang, Jingfan Wang, and P. Ravikumar). Tackling climate
change with machine learning workshop, NeurIPS 2020. [pdf;
Selected for a Spotlight Talk and Overall Best Paper. Research covered by
Fortune magazine in its Eye on A.I.
newsletter.]
Reducing methane emissions from the oil and gas sector is a key component
of climate policy in the United States. Methane leaks across the supply chain
are stochastic and intermittent, with a small number of sites (‘super-emitters’)
responsible for a majority of emissions. Thus, cost-effective emissions
reduction critically relies on effectively identifying the super-emitters from
thousands
of well-sites and millions of miles of pipelines. Conventional approaches such
as
walking surveys using optical gas imaging technology are slow and
time-consuming. In
addition, several variables contribute to the formation of leaks such as
infrastructure age, production, weather conditions, and maintenance practices.
Here, we develop
a machine learning algorithm to predict high-emitting sites that can be
prioritized for follow-up repair. Such prioritization can significantly reduce
the cost of
surveys and increase emissions reductions compared to conventional approaches.
Our
results show that a logistic regression model performs best among several
algorithms tested, achieving a 70% accuracy rate with a 57% recall
and a 66% balanced accuracy rate. Compared to the conventional approach, the
machine learning model reduced the time to achieve a 50% emissions mitigation
target by 42%. Correspondingly, the mitigation cost fell from $85/t CO2e to
$49/t CO2e.
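A synthetic sketch of the kind of leak-prediction classifier described above (the paper's data and features are not reproduced here; the covariates and label model are made up): logistic regression fit by gradient descent, evaluated with recall and balanced accuracy for an imbalanced "high-emitter" label.

```python
import numpy as np

# Synthetic stand-in for high-emitter prediction: logistic regression
# fit by batch gradient descent; recall and balanced accuracy computed
# by hand. Data-generating process is entirely made up.

rng = np.random.default_rng(7)
n = 2000
X = rng.standard_normal((n, 3))                # stand-ins for site covariates
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1] - 1.2   # offset makes positives rarer
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(float)

Xb = np.hstack([X, np.ones((n, 1))])           # add intercept column
w = np.zeros(4)
for _ in range(2000):                          # plain batch gradient descent
    p = 1 / (1 + np.exp(-Xb @ w))
    w -= 0.1 * Xb.T @ (p - y) / n

pred = (1 / (1 + np.exp(-Xb @ w)) > 0.5).astype(float)
tp = np.sum((pred == 1) & (y == 1)); fn = np.sum((pred == 0) & (y == 1))
tn = np.sum((pred == 0) & (y == 0)); fp = np.sum((pred == 1) & (y == 0))
recall = tp / (tp + fn)
balanced_acc = 0.5 * (recall + tn / (tn + fp))
print(round(recall, 2), round(balanced_acc, 2))
```

Balanced accuracy averages recall on the two classes, which is why it is the more informative metric when super-emitters are a small fraction of sites.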
-
Self-guided approximate linear programs (with P. Pakiman, N. Soheili, and Q.
Lin). Self-supervised learning workshop, NeurIPS 2020.
[pdf]
Approximate linear programs (ALPs) are well-known models based on value function
approximations (VFAs) to obtain heuristic policies and lower bounds on
the optimal policy cost of Markov decision processes (MDPs). The ALP VFA is a
linear combination of predefined basis functions that are chosen using domain
knowledge and updated heuristically if the ALP optimality gap is large. We
side-step the need for such basis function engineering in ALP – an
implementation bottleneck – by proposing a sequence of ALPs that embed
increasing numbers of random basis functions obtained via inexpensive sampling.
We provide a sampling guarantee and show that the VFAs from this sequence of
models converge to the exact value function. We also show under mild conditions
that the ALP policy cost is near-optimal when the number of sampled random bases
is sufficiently large. Nevertheless, the performance of the ALP policy can
fluctuate significantly as more basis functions are sampled. To mitigate these
fluctuations, we “self-guide” our convergent sequence of ALPs using past VFA
information such that a worst-case measure of policy performance is improved.
Moreover, our method provides application-agnostic policies and bounds to
benchmark approaches that exploit application structure.
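A generic illustration of the random-basis idea behind this paper (not its self-guided ALP): approximate a nonlinear "value function" with an increasing number of randomly sampled cosine features fit by least squares, side-stepping hand-engineered basis functions. The target function, feature distribution, and counts are illustrative.

```python
import numpy as np

# Random basis functions via inexpensive sampling: fit a stand-in value
# function on [0, 1] with randomly sampled cosine features and least
# squares, and watch the worst-case fit error shrink with more bases.

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 200)
v = np.exp(-3 * x) + 0.5 * np.sin(6 * x)    # stand-in for a value function

def random_basis_fit(n_feat):
    freq = rng.normal(0.0, 8.0, n_feat)      # sampled frequencies
    phase = rng.uniform(0.0, 2 * np.pi, n_feat)
    Phi = np.cos(np.outer(x, freq) + phase)  # random cosine features
    coef, *_ = np.linalg.lstsq(Phi, v, rcond=None)
    return np.max(np.abs(Phi @ coef - v))    # worst-case fit error

errs = [random_basis_fit(m) for m in (5, 20, 80)]
print(errs)
```

The paper's "self-guiding" constraints address the separate issue that the induced policy can fluctuate as bases are added; this sketch only shows why sampled bases remove the basis-engineering bottleneck.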
- Interpretable user
models
via decision-rule Gaussian processes (with D. Mohseni-Taheri and T. Tulabandhula).
Advances in Approximate Bayesian Inference workshop, NeurIPS 2019. [pdf]
Models of user behavior are critical inputs in many prescriptive
settings and can be viewed as decision rules that
transform state information available to the user into actions.
Gaussian processes (GPs), as well as nonlinear extensions
thereof, provide a flexible framework to learn user models in
conjunction with approximate Bayesian inference. However,
the resulting models may not be interpretable in general. We
propose decision-rule GPs (DRGPs) that apply GPs in a transformed
space defined by decision rules that have immediate
interpretability to practitioners. We illustrate this modeling
tool on a real application and show that structural variational
inference techniques can be used with DRGPs. We find that
DRGPs outperform the direct use of GPs in terms of both
out-of-sample performance and the quality of optimized decisions.
These performance advantages continue to hold when
DRGPs are combined with transfer learning.
- Robust
demand
learning (with B. Chen and S. Jasin).
Workshop on Safety and Robustness in Decision Making, NeurIPS 2019.
We consider a common revenue management problem where a seller needs to
determine the pricing decisions for a product with fixed initial inventory over
T time periods. Some parameters of the demand distribution (as a function of
price) are not known, and the firm needs to learn them from historical sales
data
while simultaneously trying to maximize expected total revenue. Departing from
the typical no-regret learning environment, where T can be scaled to infinity,
the
number of time periods is pre-specified and fixed in our setting. We propose
a new robust demand learning (RDL) framework that combines robust Markov
decision processes (MDPs) and no-regret learning such that the seller is
“hedged”
during initial periods of the selling horizon when sales may be low (i.e., high
model ambiguity), while simultaneously leveraging learning to minimize regret
when sufficient sales data is available. RDL relies on the construction of
dynamic
uncertainty sets that are consistent with the goal of regret minimization. We
provide
theory showing that the RDL pricing policy has the same regret guarantee as pure
learning but has lower risk exposure to model ambiguity. In addition, RDL is
less
conservative than a pure robust MDP approach based on static uncertainty sets.
- SMOILE: Shopper marketing
optimization and inverse learning engine (with A. Chenreddy, P. Pakiman, R.
Chandrasekaran, and R. Abens).
Proceedings of the 25th ACM SIGKDD conference on knowledge discovery and data
mining, Anchorage, Alaska, 2019. (accepted for oral presentation; acceptance
rate 6.4%) [pdf]
Product brands employ shopper marketing (SM) strategies to convert shoppers
along the path to purchase. Traditional marketing mix models (MMMs), which
leverage regression techniques and historical data, can be used to predict the
component of sales lift due to SM tactics. The resulting predictive model is a
critical input to plan future SM strategies. The implementation of traditional
MMMs, however, requires significant ad-hoc manual intervention due to their
limited flexibility in (i) explicitly capturing the temporal link between
decisions; (ii) accounting for the interaction between business rules and past
(sales and decision) data during the attribution of lift to SM; and (iii)
ensuring that future decisions adhere to business rules. These issues
necessitate MMMs with tailored structures for specific products and retailers,
each requiring significant hand-engineering to achieve satisfactory performance
-- a major implementation challenge.
We propose an SM Optimization and Inverse Learning Engine (SMOILE) that combines
optimization and inverse reinforcement learning to streamline implementation.
SMOILE learns a model of lift by viewing SM tactic choice as a sequential
process, leverages inverse reinforcement learning to explicitly couple sales and
decision data, and employs an optimization approach to handle a wide array of
business rules. Using a unique dataset containing sales and SM spend information
across retailers and products, we illustrate how SMOILE standardizes the use of
data to prescribe future SM decisions. We also track an industry benchmark to
showcase the importance of encoding SM lift and decision structures to mitigate
spurious results when uncovering the impact of SM decisions.
- A three-echelon integrated production-distribution system (with J. H. Bookbinder).
International Conference on Decision
Sciences and Technology for Globalization, Decision Sciences Institute, Ghaziabad,
India. 2008.
- Enhancing transportation efficiencies through carrier collaboration (with J. H.
Bookbinder). BPC World Conference, Mumbai, India. 2007.
- Non-destructive evaluation by low pulsing acoustic tap technique: Spectral
relationships (with T. S. Niranjan and
A. V. Varun). Flight 2006, National Aerospace Symposium, Chennai, India. 2006.
Book Chapters
- Real option
management of hydrocarbon cracking operations (with N.
Secomandi, G. Sowers, and J. Wassick). Real
Options in Energy and Commodity Markets, World Scientific-Now Publishers, 2017.
[pdf]
Commodity conversion assets play important economic roles. It is well known
that the market value of these
assets can be maximized by managing them as real options on the prices of their
inputs and/or outputs. In
particular,
when futures on these inputs and outputs are traded, managing such real options,
that is, valuing, hedging, and
exercising them, is analogous to managing options on such futures, using risk
neutral valuation and delta
hedging
methods. This statement holds because dynamically trading portfolios of these
futures and a riskless bond can
replicate the cash flows of these assets. This basic principle is not always
appreciated by managers of
commodity
conversion assets. Moreover, determining the optimal operational cash flows of
such an asset requires optimizing
the asset operating policy. This issue complicates the real option management of
commodity conversion assets.
This
chapter illustrates the application of this approach to manage a hydrocarbon
cracker, a specific commodity
conversion
asset, using linear programming and Monte Carlo simulation. The discussion is
based on a simplified
representation
of the operations of this asset. However, the material presented here has
potential applicability to the real
option
management of more realistic models of hydrocarbon cracking assets, as well as
other energy and commodity
conversion
assets.
Technical Reports
- Least
squares Monte Carlo and approximate linear programming: Error bounds and energy
real option application
(with N. Secomandi). [pdf; extended version of the
conference proceedings with the same title.]
Least squares Monte Carlo (LSM) is an approximate dynamic programming (ADP)
technique commonly used for the valuation of
high dimensional financial and real options, but it has broader applicability. It
is known that the regress-later version
of this method is an approximate linear programming (ALP) relaxation that
implicitly provides a potential solution to a
familiar ALP deficiency. Focusing on a generic finite horizon Markov decision
process, we provide both theoretical and
numerical backing for the usefulness of this solution, respectively using a
worst-case error bound analysis and a
numerical study dealing with merchant ethanol production, an energy real option
application, based on an ALP heuristic
that we propose. When both methodologies are applicable, our research supports
the use of regress-later LSM rather than
this ALP technique to approximately solve intractable Markov decision processes.
Our numerical findings motivate
additional research to obtain even better methods than the regress-later version
of LSM.
- Dynamic
pricing for
hotel
rooms when
customers request multiple-day stays
(with Y. F. Lim and Q. Ding). [pdf]
Prominent hotel chains quote a booking price for a particular room type
on each day and dynamically update these prices over time. We present a novel
Markov decision process (MDP) formulation that determines the optimal booking
price for a single room type under this strategy, while considering the
availability of rooms throughout the multiple-day stays requested by customers.
We analyze special cases of our MDP to highlight the importance of modeling
multiple-day stays and provide guidelines to potentially simplify the
implementation of pricing policies around peak-demand events such as public
holidays and conferences. Since computing an optimal policy to our MDP is
intractable in general, we develop heuristics based on a fluid approximation and
approximate linear programming (ALP). We numerically benchmark our heuristics
against a single-day decomposition approach (SDD) and an adaptation of a
fixed-price heuristic. The ALP-based heuristic (i) outperforms the other
methods; (ii) generates up to 7% and 6% more revenue than the SDD and the
fixed-price heuristic, respectively; and (iii) incurs a revenue loss of less
than 1% when using our pricing structure around peak-demand events, which
supports the use of this simple pricing profile. Our findings are potentially
relevant beyond the hotel domain for applications involving the dynamic pricing
of capacitated resources.