Parallel Download Challenge

A few weeks ago, I had a job interview, and one of the challenges was to download files in parallel. I found it very interesting, so I decided to write an article about it. I was given a pooledDownload function to implement. It receives a connect function that opens and returns a connection, a save function that stores a downloaded file's content, the list of URLs to download and save, and the maximum number of connections allowed. The downloads must run in parallel.

Acceptance Criteria

  1. All files must be downloaded and saved.
  2. The file content must be saved as soon as the file is downloaded.
  3. Downloads must be evenly distributed across connections.
  4. Each connection can only download one file at a time. To download more files in parallel, create more connections.
  5. Any open connection must be closed.
  6. If it is not possible to open a connection to the server, you must reject with an Error whose message is "connection failed".
  7. If an error occurs during the download, you must reject with the same error.
  8. Sometimes the server may not have enough slots for the number of simultaneous connections. In that case, you must stop opening new connections as soon as the server reaches its limit.

The starting code I was given:

const pooledDownload = async (connect, save, downloadList, maxConcurrency) => {
  // Implement the function here
};
module.exports = pooledDownload;
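The connect function itself is not shown here, only how the solution uses it: connect() resolves to a connection with download(url), which resolves to the file content, and close(). To try the solution locally, a fake server built on that assumption can help. The sketch below is hypothetical: createFakeServer and its options (slots, files, failOn, delayMs) are made up for illustration and are not part of the challenge.

```javascript
// Hypothetical in-memory stand-in for the challenge's server. createFakeServer and
// its options are assumptions for local testing, inferred from how the solution
// calls connect(), download(url) and close().
const createFakeServer = ({ slots, files, failOn = null, delayMs = 5 }) => {
  let openCount = 0;
  const stats = { opened: 0, closed: 0 };

  const connect = async () => {
    if (openCount >= slots) {
      // No free slot: refuse the connection (the situation in criterion 8).
      throw new Error("no slots available");
    }
    openCount++;
    stats.opened++;
    let busy = false;
    let closed = false;

    return {
      download: (url) =>
        new Promise((resolve, reject) => {
          if (closed) return reject(new Error("connection closed"));
          // Criterion 4: a connection may only download one file at a time.
          if (busy) return reject(new Error("connection busy"));
          busy = true;
          setTimeout(() => {
            busy = false;
            if (url === failOn) reject(new Error(`download failed: ${url}`));
            else resolve(files[url]);
          }, delayMs);
        }),
      close: () => {
        // Closing twice is harmless; each connection frees its slot once.
        if (!closed) {
          closed = true;
          openCount--;
          stats.closed++;
        }
      },
    };
  };

  return { connect, stats };
};
```

For example, createFakeServer({ slots: 2, files: { "a.txt": "A" } }) gives a connect that refuses a third simultaneous connection, which exercises criterion 8 when both maxConcurrency and the number of files are higher than 2. A save such as (content) => saved.push(content) records the results, failOn exercises criterion 7, and stats shows whether every opened connection was closed (criterion 5).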

Solution

The first step I took was to create the getConnections function. It receives the connect function and the maximum number of allowed connections, and returns an array of open connections.

const getConnections = async (connect, maxConcurrency) => {
  const connections = [];
  for (let i = 0; i < maxConcurrency; i++) {
    try {
      connections.push(await connect());
    } catch (e) {
      break;
    }
  }

  return connections;
};

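Breaking out of the loop in the catch block swallows the error from connect, which can look like bad practice, but here it is intentional: it is a simple way to satisfy criterion 8. As soon as the server refuses a connection, we stop trying to open new ones and keep the connections we already have. Opening the connections one at a time with await, instead of all at once, is what makes it possible to stop at the first refusal.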
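To see that behavior, here is a small self-contained check. It repeats getConnections from above and pairs it with makeConnect, a hypothetical helper (not part of the challenge) that simulates a server accepting a fixed number of connections and counts how many times connect is called.

```javascript
// getConnections as defined above: open connections one at a time and stop
// at the first refusal.
const getConnections = async (connect, maxConcurrency) => {
  const connections = [];
  for (let i = 0; i < maxConcurrency; i++) {
    try {
      connections.push(await connect());
    } catch (e) {
      break;
    }
  }
  return connections;
};

// Hypothetical connect() factory: the "server" accepts `slots` connections,
// then refuses; counter.calls counts every attempt, successful or not.
const makeConnect = (slots) => {
  const counter = { calls: 0 };
  const connect = async () => {
    counter.calls++;
    if (counter.calls > slots) throw new Error("no slots available");
    return { id: counter.calls, close: () => {} };
  };
  return { connect, counter };
};

const { connect, counter } = makeConnect(2);
getConnections(connect, 5).then((connections) => {
  // Two connections were opened and connect() was called three times:
  // the third call was refused, so the loop stopped instead of trying five times.
  console.log(connections.length, counter.calls); // 2 3
});
```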

Now that we have the open connections, we can implement part of the pooledDownload function.

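const pooledDownload = async (connect, save, downloadList, maxConcurrency) => {
  const filesToDownload = downloadList.slice(0);
  // Nothing to download: resolve without opening any connection.
  if (filesToDownload.length === 0) return;

  const maxConnectionNeeded = Math.min(maxConcurrency, filesToDownload.length);
  const connections = await getConnections(connect, maxConnectionNeeded);

  // getConnections always returns an array; an empty one means no connection could be opened.
  if (connections.length === 0) {
    throw new Error("connection failed");
  }
};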

The first thing I did was copy the list of files to download, so that removing items from it later does not modify the caller's array. Then I take the minimum of the number of allowed connections and the number of files, and open that many connections. If no connection can be opened, I reject with an Error whose message is "connection failed"; since pooledDownload is an async function, throwing inside it rejects the promise it returns.

You might wonder why I use the minimum of the number of allowed connections and the number of files. It ensures we never open more connections than necessary: downloading two files with a limit of ten connections needs only two.

The next step is to start the downloads. The loop below starts one execute call per connection needed, and Promise.all waits for all of them to finish.

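const promise = [];
// Start one execute worker per connection needed. If the server accepted fewer
// connections, the surplus workers find no idle connection and resolve immediately.
for (let i = 0; i < maxConnectionNeeded; i++) {
  promise.push(execute(connections, filesToDownload, save));
}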

try {
  await Promise.all(promise);
} finally {
  // close connections
}

The execute function is responsible for downloading the files, and this is where the magic happens.

const execute = (connections, downloadList, save) => {
  // Nothing left to download, or no idle connection: this worker is done.
  if (downloadList.length === 0 || connections.length === 0) return Promise.resolve();

  const currentDownload = downloadList.shift();
  const connection = connections.pop();

  return connection
    .download(currentDownload)
    .then((result) => {
      save(result);
      // Return the connection to the pool and move on to the next file.
      connections.unshift(connection);
      return execute(connections, downloadList, save);
    })
    .catch((e) => {
      // Empty the shared list so the other workers stop starting new downloads,
      // then reject with the same error.
      downloadList.length = 0;
      return Promise.reject(e);
    });
};

Each call to execute acts as a worker. It first checks whether there are still files to download and an idle connection in the pool; if not, the worker is done and returns a resolved promise. Otherwise it takes the next file from the front of the list and the connection at the end of the connections array, downloads the file, saves its content right away, puts the connection back at the beginning of the array, and calls execute again. Because every worker pulls from the same shared list, a connection gets the next file as soon as it is free, which spreads the downloads across the connections. If an error occurs, the worker empties the shared list of remaining files, so the other workers stop picking up new downloads, and rejects with the same error. Note that the failed connection is not returned to the pool.
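The recursive promise chain is one way to write a worker. The same idea can also be written as an async loop; the sketch below is that alternative formulation (not the code from the solution), run against hypothetical fake connections created by makeConnection, whose downloads all take 10 ms. With equal download times and a shared list, each of the three connections ends up downloading three of the nine files.

```javascript
// Alternative sketch of execute() as an async loop; not the code from the solution.
// Each worker keeps taking the next file and an idle connection until either runs out.
const worker = async (connections, downloadList, save) => {
  while (downloadList.length > 0 && connections.length > 0) {
    const url = downloadList.shift();
    const connection = connections.pop();
    try {
      save(await connection.download(url));
    } catch (e) {
      downloadList.length = 0; // stop the other workers from starting new downloads
      throw e; // reject with the same error
    }
    connections.unshift(connection); // return the connection to the pool
  }
};

// Hypothetical fake connection: every download takes 10 ms and is tallied per connection.
const makeConnection = (name, tally) => ({
  download: (url) =>
    new Promise((resolve) => {
      tally[name] = (tally[name] || 0) + 1;
      setTimeout(() => resolve(`content of ${url}`), 10);
    }),
});

const tally = {};
const connections = ["c1", "c2", "c3"].map((name) => makeConnection(name, tally));
const files = Array.from({ length: 9 }, (_, i) => `file-${i}`);
const saved = [];

// Start one worker per connection. The count is fixed up front because the
// workers remove connections from the array as soon as they start.
const workers = Array.from({ length: 3 }, () =>
  worker(connections, files, (content) => saved.push(content))
);

Promise.all(workers).then(() => {
  // Each connection downloaded 3 files, and all 9 contents were saved.
  console.log(tally, saved.length);
});
```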

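Finally, I close the connections. The connections array alone is not enough for this: a connection in use by a worker is not in the array, a failed download never returns its connection, and Promise.all rejects as soon as the first worker fails, while other workers may still be downloading. So I keep a copy of every opened connection, taken right after getConnections and before the workers start, and close all of them:

// In pooledDownload, right after getConnections and before the loop that starts the workers:
const openConnections = connections.slice(0);

try {
  await Promise.all(promise);
} finally {
  openConnections.forEach((connection) => connection.close());
}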
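Why closing connections needs care is easy to check in isolation: Promise.all rejects as soon as one promise rejects, without waiting for the rest. In this standalone script (the "workers" are just timers, not the challenge's API), the 10 ms failure makes Promise.all reject while the 50 ms worker is still running, so the finally block runs before that worker finishes.

```javascript
// Standalone timing check; the "workers" here are plain timers, not the challenge's API.
const events = [];
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// One worker fails after 10 ms; the other finishes after 50 ms.
const failingWorker = sleep(10).then(() => {
  throw new Error("download failed");
});
const slowWorker = sleep(50).then(() => {
  events.push("slow worker done");
});

const run = async () => {
  try {
    await Promise.all([failingWorker, slowWorker]);
  } catch (e) {
    events.push(`rejected: ${e.message}`);
  } finally {
    // In pooledDownload, this is where the connections are closed.
    events.push("finally (cleanup) runs");
  }
};

run().then(() => {
  // Logs the rejection and the cleanup; "slow worker done" is not there yet.
  console.log(events);
});
```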

Conclusion

This was a very interesting challenge. I really enjoyed working on it, learned a lot, and I hope you learned something too. If you have any questions, suggestions, or feedback, please send me a message. I would love to hear what you have to say.

The final result looks like this:

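const getConnections = async (connect, maxConcurrency) => {
  const connections = [];
  // Open connections one at a time so we can stop at the first refusal (criterion 8).
  for (let i = 0; i < maxConcurrency; i++) {
    try {
      connections.push(await connect());
    } catch (e) {
      // The server has no free slot: keep the connections we already have.
      break;
    }
  }

  return connections;
};

const execute = (connections, downloadList, save) => {
  // Nothing left to download, or no idle connection: this worker is done.
  if (downloadList.length === 0 || connections.length === 0) return Promise.resolve();

  const currentDownload = downloadList.shift();
  const connection = connections.pop();

  return connection
    .download(currentDownload)
    .then((result) => {
      // Save as soon as the file is downloaded (criterion 2).
      save(result);
      // Return the connection to the pool and move on to the next file.
      connections.unshift(connection);
      return execute(connections, downloadList, save);
    })
    .catch((e) => {
      // Empty the shared list so the other workers stop starting new downloads,
      // then reject with the same error (criterion 7).
      downloadList.length = 0;
      return Promise.reject(e);
    });
};

const pooledDownload = async (connect, save, downloadList, maxConcurrency) => {
  // Work on a copy so the caller's array is not modified.
  const filesToDownload = downloadList.slice(0);
  // Nothing to download: resolve without opening any connection.
  if (filesToDownload.length === 0) return;

  const maxConnectionNeeded = Math.min(maxConcurrency, filesToDownload.length);
  const connections = await getConnections(connect, maxConnectionNeeded);

  if (connections.length === 0) {
    throw new Error("connection failed");
  }

  // execute() removes a connection from `connections` while it is in use, and a
  // failed download never returns it, so keep a separate list of every opened
  // connection for the cleanup (criterion 5).
  const openConnections = connections.slice(0);

  const promise = [];
  // One worker per connection needed; if the server accepted fewer connections,
  // the surplus workers find no idle connection and resolve immediately.
  for (let i = 0; i < maxConnectionNeeded; i++) {
    promise.push(execute(connections, filesToDownload, save));
  }

  try {
    await Promise.all(promise);
  } finally {
    openConnections.forEach((connection) => connection.close());
  }
};

module.exports = pooledDownload;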