Enlightment

Shu,
Back

You can't connect the dots looking forward; you can only connect them looking backwards.

A while ago, I had an enlightening experience that has lingered in my mind for months, and I feel compelled to put it into words.

There exists a fascinating category of algorithmic problems known as offline problems. Offline problems are those where you can gather all the necessary information, process it, and provide the answer all at once. In contrast, online problems require processing information as it arrives and outputting intermediate results before all the information is available.

Range Query

Imagine a scenario where you are presented with a series of queries that involve determining the number of unique items within specific ranges. For instance, given a list of numbers [1, 3, 1, 2, 5], you may want to know how many unique numbers fall within the ranges [0, 3] and [1, 4] (with index starting from 0). The answers would be 3 and 4, respectively:

1 3 1 2 5
^ ^ ^ ^   => 3 unique numbers: 1, 2, 3
  ^ ^ ^ ^ => 4 unique numbers: 1, 2, 3, 5

The online version of this problem requires answering the queries one by one after obtaining the list. In the example mentioned, you would first need to respond to the query [0, 3] and then [1, 4]. On the other hand, the offline version provides all the queries upfront, and you must answer them collectively.

Let's begin with the online version. The naive solution involves iterating through the list and counting the number of unique items within each range. This approach has a time complexity of O(n*m), where n is the length of the list and m is the number of queries. This is because for each query, you need to iterate through the list to count the number of unique items within the specified range:

function countUniqueItems(list, range) {
  const [start, end] = range;
  const occurrences = {};
  let count = 0;
  
  for (let i = start; i <= end; i++) {
    if (!occurrences[list[i]]) {
      occurrences[list[i]] = true;
      count++;
    }
  }
  
  return count;
}

const list = inputList();
for (let i = 0; i < numberOfQueries; i++) {
  const range = inputRange();
  const count = countUniqueItems(list, range);
  console.log(count);
}

While there are various ways to optimize this solution, let's explore the intriguing aspects of the offline version.

“Offline”

In the offline version, we receive the list and all the queries simultaneously. To begin, let's take a step to sort the queries based on their left boundary:

const list = inputList();
const ranges = inputRanges();

const queries = ranges.map((range, index) => ({
  left: range[0],
  right: range[1],
  index
}));

// This sorts the queries based on their left boundary
queries.sort((a, b) => a.left - b.left);

Next, we can iterate through the sorted queries while maintaining the count of unique numbers within the range. For example, when transitioning from the range [0, 3] to [1, 4], we can simply subtract the count of list0 and add the count of number list4. All the other numbers remain unchanged.

const occurrences = {};
for (let i = 0; i < N; i++) {
  occurrences[i] = 0;
}

let count = 0;
let left = 0;
let right = -1;
const result = [];

for (let i = 0; i < queries.length; i++) {
  const query = queries[i];

  // Move the left boundary to the left of the new query
  while (left < query.left) {
    if (occurrences[list[left]] === 1) {
      count--;
    }
    occurrences[list[left]]--;
    left++;
  }

  // Move the right boundary to the right of the new query
  // Note that the right boundary isn't sorted, so we could
  // move in both directions
  while (right < query.right) {
    right++;
    if (occurrences[list[right]] === 0) {
      count++;
    }
    occurrences[list[right]]++;
  }
  while (right > query.right) {
    if (occurrences[list[right]] === 1) {
      count--;
    }
    occurrences[list[right]]--;
    right--;
  }

  // Record the result of that specific query
  result[query.index] = count;
}

// Print the results in the original order of the queries
for (let i = 0; i < result.length; i++) {
  console.log(result[i]);
}

Although this solution may appear intricate, it generally performs better when the queries are randomly distributed. In this generated test involving 10,000 items and 1,000 queries, the offline solution proved to be approximately 15 times faster than the online counterpart. Further optimizations are possible, as outlined in other resourceful approaches, and we can analyze its time complexity as well as examine the “worst-case” scenario. However, let us step back and consider what makes this solution truly remarkable.

Imagine having two ranges with significant overlap, such as [10, 1000] and [11, 1001]. The naive online solution redundantly iterates through the shared range [11, 1000] twice. In contrast, the offline solution adjusts the left and right boundaries only once, effectively reusing the information from the previous query.

Why can't the online solution achieve the same efficiency? Simply put, it lacks knowledge of the future. In a series of queries like [10, 1000], [0, 9], [11, 1001], how can you determine that the most efficient approach is to solve the ranges [10, 1000], [11, 1001], and then [0, 9] when you can only see the first query?

This distinction is key.

The Enlightment

After knowing this clever technique for a long time, earlier this year, I was playing a game called Immortality. It’s a game that involves solving a mystery by watching a series of video clips from three movies. The clips are presented in a non-chronological order, and you must piece together the story by yourself. It is a typical nonlinear narrative. What’s interesting is that the actual shooting of these clips are not in the same order as the story inside.

This made me think that in real life, a film crew embarks on a production armed with a meticulously planned film script, complete with scenes, props, acts, and everything in between. The script is typically sorted by time, ensuring a smooth chronological flow of events.

But when shooting the film, the crew organizes their work by sorting elements based on scenes. They gather all the necessary elements, such as actors, props, and set designs, for each scene before shooting it. By focusing on one scene at a time, the crew can make minute adjustments while transitioning from one scene to the next, ensuring consistency in the overall setup and minimizing the need for repeated setup and adjustments. It's a fascinating parallel to the offline range query technique of sorting query ranges and adapting as we move along.

Isn't it beautiful? If you think about it, the offline solution is essentially a “timeless” approach. When time is no longer a constraint, you have the superpower.

There must be something more.

So you have to trust that the dots will somehow connect in your future. You have to trust in something — your gut, destiny, life, karma, whatever. This approach has never let me down, and it has made all the difference in my life.


Twitter · GitHub · Instagram · g@shud.inCC BY-NC 4.0 © Shu Ding.RSS