Quantitative Trader

HackerRank University CodeSprint 2

February 20, 2017, 11:32 am

I realized that I haven't posted anything here since 2014, and it looks bad to have a "blog" with nothing on it. I really enjoyed the problems in HackerRank's University CodeSprint 2 this weekend, so I figured I would write down the ideas from my solutions here.

Problems A (Breaking the Records) and B (Separate the Numbers) were straightforward implementation, and C (Game of Two Stacks) was a basic binary search/two pointers problem. Below are my thoughts on the other three problems I solved.

D: The Story of a Tree

We are given an (undirected, unrooted) tree and a set of queries, each in the form of a directed edge. We are asked to consider rooting the tree at each of its \(N\) vertices, and in each case to count the number of queries that are "correct"--that is, queries \((u, v)\) in which \(u\) is the parent of \(v\). (It is guaranteed that there is an edge between \(u\) and \(v\).)

The obvious solution is \(O(N^2)\): for every root, DFS on the tree and count the number of correct queries. This will be too slow for this problem, but it's not hard to bring it down to \(O(N)\).

First, we will run the naive solution for an arbitrary root. Then note that moving the root to an adjacent node flips the orientation of exactly one edge--the edge between the two roots--so the answers for the two roots can differ by at most \(1\), and only if that edge appears as a query. So we will run another DFS starting at the same node as before (because we know the answer for this root), and every time we traverse an edge, we check whether that edge is a query in either direction, \((u, v)\) or \((v, u)\), and update our current answer accordingly. Each of the two DFSs is linear, so the solution as a whole is linear.
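Here is a minimal Python sketch of this rerooting idea (the function name and 1-indexed node labels are my own; the actual problem goes on to aggregate these per-root counts, which I omit):

```python
from collections import defaultdict, deque

def counts_per_root(n, edges, queries):
    """For each root 1..n, count queries (u, v) where u is the parent of v."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    qset = set(queries)  # directed pairs (u, v)

    # First DFS from root 1: edge (parent[u], u) points downward, so a
    # query is correct exactly when it matches such a downward edge.
    parent = [0] * (n + 1)
    order = []
    stack = [(1, 0)]
    while stack:
        u, p = stack.pop()
        parent[u] = p
        order.append(u)
        for w in adj[u]:
            if w != p:
                stack.append((w, u))
    base = sum(1 for u in order[1:] if (parent[u], u) in qset)

    # Second pass (rerooting): moving the root from u to a neighbor w
    # flips only the edge between them, so adjust by its queries.
    ans = [0] * (n + 1)
    ans[1] = base
    seen = [False] * (n + 1)
    seen[1] = True
    dq = deque([1])
    while dq:
        u = dq.popleft()
        for w in adj[u]:
            if not seen[w]:
                seen[w] = True
                ans[w] = ans[u] - ((u, w) in qset) + ((w, u) in qset)
                dq.append(w)
    return ans[1:]
```

For example, on the path \(1 - 2 - 3\) with queries \((1, 2)\) and \((3, 2)\), the counts for roots \(1, 2, 3\) come out to \(1, 0, 1\).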

E: Sherlock's Array Merging Algorithm

We are given the output of a strange array-merging procedure, and we are asked the number of distinct inputs that could have produced this output. The procedure is as follows: we start with a collection of arrays \(V\), and for each iteration, we remove the first item from each array, and insert these \(|V|\) items into our final array in sorted order. Not all the starting arrays have to be the same length (we just remove empty arrays after we remove their last item).

The first observation to make is that each iteration of the merging procedure produces a sorted subarray in the final array. We then want to count the number of ways to partition the final array into sorted subarrays of non-increasing length... sort of. If we do that, we will be undercounting: some partitions correspond to more than one initial collection. For example, partitioning \(123\) into \(12|3\) corresponds to both \(V = [13, 2]\) and \(V = [1, 23]\). Instead of saying \(DP[i][j]\) is the number of ways to partition the first \(i\) elements with the last partition having size \(j\), we will call it the number of "permuted partitions". (I'm not sure if there is a formal name for this.) Intuitively, we want to multiply by the number of ways to match up this segment with the previous one.

Then it is clear that \(DP[i][j] = \sum_{k \geq j} (DP[i-j][k] \frac{k!}{(k-j)!})\), provided the last \(j\) elements form a sorted run. These transitions are \(O(N)\), and there are \(O(N^2)\) states, for \(O(N^3)\) overall. With constraints of \(N \leq 1200\), I expected this to time out, but as it turns out, because of the small constant factor this is fast enough. I suspect there is a way to reduce the time to \(O(N^2)\), but I'm not sure how.
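A small Python sketch of the recurrence (helper names are mine; it adds the base case \(DP[j][j] = 1\) for a sorted prefix, and checks that each segment is actually sorted):

```python
from math import factorial

def count_inputs(a):
    """Count permuted partitions of a into sorted segments
    of non-increasing length."""
    n = len(a)

    def is_sorted(i, j):
        # True if a[i-j:i] is strictly increasing
        return all(a[t] < a[t + 1] for t in range(i - j, i - 1))

    # dp[i][j]: permuted partitions of the first i elements,
    # last segment of size j
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            if not is_sorted(i, j):
                continue
            if i == j:
                dp[i][j] = 1  # base case: a single first segment
            else:
                # previous segment had size k >= j; match the j continuing
                # items to j of the k arrays in k!/(k-j)! ways
                dp[i][j] = sum(dp[i - j][k] * factorial(k) // factorial(k - j)
                               for k in range(j, i - j + 1))
    return sum(dp[n][j] for j in range(1, n + 1))
```

On the \(123\) example from above this gives \(4\): the partitions \(1|2|3\), \(123\), and \(12|3\), the last counted twice. (The real problem asks for the count modulo a prime, so the factorials and DP values would be taken mod that prime.)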

G: Minimum MST Graph

This problem took me all day Saturday, largely because I had an off-by-one bug in one of my for loops but I was sure the error was elsewhere.

We are given \(N\), \(M\), and \(S\)--the number of nodes, the number of edges, and the MST weight--of an unknown graph. Our goal is to find the minimum possible sum of edge weights over all graphs satisfying these three criteria.

First, instead of giving each edge a weight, we will give each node a weight. Each edge's weight is then the maximum of the weights of its two endpoints. We will assume that node \(0\) has the smallest weight, node \(1\) has the second smallest, etc., and furthermore that node \(0\) has weight \(0\) (every other node will have weight at least \(1\)). We will define an array \(B\) based on these values. That is, \(B[i]\) is the weight of node \(i\).

Now we will define a second array \(A\) which contains information about the structure of the graph we are constructing. In particular, \(A[i]\) is the number of "backward edges" from node \(i\). If an edge connects nodes \(i\) and \(j\) where \(i < j\), we say that edge is a "backward edge" from node \(j\). This is useful because of the ordering of the nodes: the weight of node \(i\) is no more than the weight of node \(j\), and so the weight of an edge is the same as the weight of the node for which it is a backward edge.

So we have \(B[0] = 0\), and \(B[i] \geq 1\) for \(i > 0\). We also have \(A[0] = 0\), and \(1 \leq A[i] \leq i\) for \(i > 0\). (If \(A[i] = 0\), then our graph may not be connected, and even if it is, we can make our answer no worse by making all \(A[i] \geq 1\); if \(A[i] > i\), then it must have multiple edges or self loops. I encourage you to verify this for yourself.) For simplicity, we will stop considering node \(0\) from now on, and assume that our arrays are only defined on indices \(1 \leq i \leq N - 1\).

Note that \(\sum A[i] = M\) and \(\sum B[i] = S\). Subject to all these constraints, we want to minimize \(\sum A[i] B[i]\).

This is a much more manageable and comprehensible problem, but it still seems hard. First, let's suppose \(B\) is fixed and build an optimal \(A\). To do this, recall that \(B\) is increasing (albeit not strictly). Now, we can set \(A[i] = 1\) for \(1 \leq i \leq N - 1\). We will add the remaining \(M - (N - 1)\) edges to our graph greedily: for each edge, we increment \(A[i]\) for the smallest \(i\) that is not yet "full"--that is, for which \(A[i] < i\). This leads to an array \(A\) with some prefix of elements set to their index (\(A[i] = i\)), potentially followed by at most one partially filled element, followed by a run of \(1\)s. So it turns out that \(A\) does not depend on \(B\) at all; that is, the structure of our optimal graph depends only on \(N\) and \(M\)!
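A sketch of this greedy construction of \(A\) in Python (the function name is mine):

```python
def build_a(n, m):
    """A[i] = number of backward edges from node i; A[0] is unused and 0."""
    a = [0] + [1] * (n - 1)   # one backward edge per node keeps the graph connected
    extra = m - (n - 1)       # edges beyond the spanning n - 1
    for i in range(1, n):
        take = min(i - a[i], extra)  # A[i] <= i: no multi-edges or self-loops
        a[i] += take
        extra -= take
        if extra == 0:
            break
    return a
```

For instance, `build_a(4, 6)` gives `[0, 1, 2, 3]`--the complete graph on four nodes--while `build_a(4, 5)` gives `[0, 1, 2, 2]`: a full prefix, then \(1\)s, with at most one partial element between them.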

Having found \(A\), we only have to find the array \(B\) that minimizes \(A \cdot B\). There is a greedy algorithm for this, too: let \(C[i]\) be the "cost" of incrementing the last \(i\) elements of \(B\). It is equal to the sum of the last \(i\) elements of \(A\), divided by \(i\). Intuitively, \(C[i]\) is the amount by which our answer increases per unit of weight we allocate. Then if we currently have \(X\) weight to distribute, we find the \(i \leq X\) for which \(C[i]\) is minimum, and we increment the last \(i\) elements of \(B\). When we have no more weight to distribute, we are done. Simply computing the desired dot product yields the answer.

With reasonable implementation, this algorithm has time complexity \(O(N + S)\), which will pass all but the last subtask (getting 70% score). The last subtask isn't as interesting: it just takes some careful reasoning and algebra to find a closed-form answer that can be computed in \(O(1)\) time.

Well, thanks for reading, and maybe I'll post some more competitive programming-related stuff on here in the future if I again find myself trying to avoid doing homework.