I have been building an Adaptive Large Neighborhood Search (ALNS) solver for a vehicle routing competition. The core loop is simple: destroy part of your solution, repair it differently, keep it if it improves. But the interesting part is choosing which destroy and repair operators to use.
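To make that loop concrete, here is a minimal sketch in Python. The names destroy_ops, repair_ops, and cost are stand-ins for whatever your problem defines, not the actual functions from my solver, and operator selection is still uniform here; the adaptive part comes next.

```python
import random

def alns(initial_solution, destroy_ops, repair_ops, cost, iterations=10_000):
    """Bare-bones ALNS loop: destroy, repair, keep the result if it improves.

    destroy_ops / repair_ops are lists of callables that take a solution and
    return a new one; cost maps a solution to a number we want to minimize.
    """
    best = current = initial_solution
    for _ in range(iterations):
        destroy = random.choice(destroy_ops)   # uniform for now
        repair = random.choice(repair_ops)
        candidate = repair(destroy(current))
        if cost(candidate) < cost(current):    # keep only improvements
            current = candidate
            if cost(current) < cost(best):
                best = current
    return best
```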
The naive approach: pick uniformly at random. Better approach: track which operators have been producing improvements recently, and favor them. This is textbook multi-armed bandit territory — explore new operators vs. exploit known good ones.
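One common way to implement the "favor recent winners" part is roulette-wheel selection over per-operator scores. This is a sketch under assumptions: the scores dict is maintained elsewhere (see the decay update further down), and the floor parameter is just an illustrative way to keep every operator selectable so exploration never dies out.

```python
import random

def select_operator(operators, scores, floor=0.1):
    """Pick an operator with probability proportional to its recent score.

    The floor keeps every operator's weight above zero, so even operators
    that have performed poorly lately still get occasional tries.
    """
    weights = [max(scores[op], floor) for op in operators]
    return random.choices(operators, weights=weights, k=1)[0]
```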
Here is what I learned that generalizes beyond optimization:
1. Operator performance is non-stationary.
An operator that works brilliantly in early search (when the solution is far from optimal) becomes useless later (when you need fine-grained local moves). If you lock in on early winners, you plateau. This is exactly the problem with naive reinforcement: rewarding what worked before rather than what works now.
The fix is exponential decay on operator scores — recent performance matters more than historical. Same principle applies to any adaptive system: your beliefs about what works need a half-life.
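In code, that half-life is a one-line exponentially weighted update. The decay value and the reward scheme below are illustrative choices, not tuned values from my solver; the usual ALNS convention is to hand out larger rewards for new global bests than for mere improvements.

```python
def update_score(old_score, reward, decay=0.8):
    """Exponentially weighted score update.

    New evidence counts for (1 - decay) of the score each segment, so old
    evidence fades geometrically: with decay=0.8, an observation loses half
    its influence after roughly three updates.
    """
    return decay * old_score + (1 - decay) * reward
```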
2. Random restarts are not admitting defeat.
Periodically blowing up a good solution and rebuilding from scratch feels wasteful. But the fitness landscape of hard combinatorial problems has deep local optima separated by wide basins. No amount of local improvement crosses those basins. You need the courage to destroy something good to find something better.
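A simple way to wire this in is a stall counter that triggers a rebuild when the best solution has not improved for a while. The rebuild function and stall_limit here are placeholders for whatever reconstruction heuristic and patience threshold you choose.

```python
def maybe_restart(current, best, stall_count, rebuild, stall_limit=500):
    """If the search has stalled for stall_limit iterations, abandon the
    current solution and rebuild a fresh one (seeded from the best so far,
    or from scratch, depending on what rebuild does)."""
    if stall_count >= stall_limit:
        return rebuild(best), 0   # fresh solution, reset the stall counter
    return current, stall_count
```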