Monthly Archives: April 2017

LeetCode Verify Preorder Serialization of a Binary Tree

LeetCode Verify Preorder Serialization of a Binary Tree One way to serialize a binary tree is to use pre-order traversal. When we encounter a non-null node, we record the node’s value. If it is a null node, we record using a sentinel value such as #.

     _9_
    /   \
   3     2
  / \   / \
 4   1  #  6
/ \ / \   / \
# # # #   # #

For example, the above binary tree can be serialized to the string "9,3,4,#,#,1,#,#,2,#,6,#,#", where # represents a null node. Given a string of comma separated values, verify whether it is a correct preorder traversal serialization of a binary tree. Find an algorithm without reconstructing the tree. Each comma separated value in the string must be either an integer or a character '#' representing null pointer. You may assume that the input format is always valid, for example it could never contain two consecutive commas such as "1,,3". Example 1: "9,3,4,#,#,1,#,#,2,#,6,#,#" Return true Example 2: "1,#" Return false Example 3: "9,#,#,1" Return false

根据树的先序遍历结果，在不重构树的情况下，判断该先序遍历是否合法。解法1，使用栈。因为先序遍历的顺序是根左右，如果我们遇到序列x,#,#，其中x不是#时，说明x的左右孩子都是#，此时，我们将x,#,#替换为#，表示将以x为根的子树替换为一个#。如此循环，如果栈中最终只剩下一个#，说明该序列合法。比如样例”9,3,4,#,#,1,#,#,2,#,6,#,#”：

9,3,4,#,# → 9,3,#
9,3,#,1,#,# → 9,3,#,# → 9,#
9,#,2,#,6,#,# → 9,#,2,#,# → 9,#,# → #

不需要担心遇到x,#,#,#的情况，因为在遍历到第二个#时，就符合x,#,#的pattern了，会把x,#,#变成#。代码如下： [cpp] class Solution { public: bool isValidSerialization(string preorder) { stack<string> stk; int i = 0, j = 0, n = preorder.size(); while (j < n) { while (j < n&&preorder[j] != ‘,’)++j; string node = preorder.substr(i, j – i); if (node == "#") { while (!stk.empty() && stk.top() == "#") { stk.pop(); if (stk.empty())return false; stk.pop(); } } stk.push(node); i = ++j; } return stk.size() == 1 && stk.top() == "#"; } }; [/cpp] 本代码提交AC，用时6MS。解法2，利用度的信息。一棵二叉树通过题目中的加#策略之后，变成了一棵满二叉树。这里的满是指一个节点要么有两个孩子，要么没有孩子，不存在有一个孩子的情况。（把#也看成孩子）那么如果是叶子节点，则只贡献一个入度；如果是非叶子节点，则贡献一个入度和两个出度。所以diff=outdegree-indegree不可能是负数。所以当来了一个节点，先–diff，因为这个节点肯定会贡献一个入度。然后判断这个节点如果是非叶子节点，则diff+=2，因为非叶子节点贡献两个出度。程序结束时，如果diff==0，则说明序列合法，否则不合法。因为就完整的树来说，入度肯定等于出度的，一条边对于父亲节点来说贡献出度，但对于孩子节点来说，贡献入度，所以总的入度和出度是相等的。初始时，因为根节点没有入度，但是在外层while循环中，我们把根节点也看成普通的节点了。普通节点假设是有入度的，所以一进来就会–diff。所以初始时给根节点也补上一个入度，即diff=1，这样进入while之后，–diff不至于小于0。代码如下： [cpp] class Solution { public: bool isValidSerialization(string preorder) { int i = 0, j = 0, n = preorder.size(), diff = 1; while (j < n) { while (j < n&&preorder[j] != ‘,’)++j; string node = preorder.substr(i, j – i); if (–diff < 0)return false; if (node != "#")diff += 2; i = ++j; } return diff == 0; } }; [/cpp] 本代码提交AC，用时3MS。解法3，也是利用度的信息。前面说了，通过题目中的策略，一棵普通二叉树已经变成了一棵满二叉树，满二叉树的特点是叶子节点的数目=非叶子节点的数目+1。简单证明如下：假设叶子节点数为leaves，非叶子节点数为nonleaves。则总节点数为leaves+nonleaves，总边数就是leaves+nonleaves-1。一条边贡献两个度，所以总度数就是2(leaves+nonleaves-1)。从另一个角度，一个叶子节点贡献一个度，一个非叶子节点贡献3个度，根节点也是非叶子节点，但它只贡献两个读，所以总度数加起来就是leaves+3*nonleaves-1。令2(leaves+nonleaves-1)=leaves+3*nonleaves-1，就解出了leaves=nonleaves+1。因为先序遍历是根左右，所以我们找一个满足leaves=nonleaves+1的最短前缀，如果这确实是一个合法的序列，则这个最短前缀就应该是序列本身。因为一棵满二叉树，在遍历的过程中是不会满足leaves=nonleaves+1的，只有当整棵树都遍历完之后才会满足这个条件。

     _9_
    /   \
   3     2
  / \   / \
 4   1  #  6
/ \ / \   / \
# # # #   # #

上面这棵树，虽然在遍历到节点2时，如果不再往下遍历了，看起来也是满二叉树，但是2是有孩子的，所以会被判断为nonleaves，所以还是不满足leaves=nonleaves+1。代码如下： [cpp] class Solution3 { public: bool isValidSerialization(string preorder) { int leaves = 0, nonleaves = 0; int i = 0, j = 0, n = preorder.size(); while (j < n) { while (j < n&&preorder[j] != ‘,’)++j; string node = preorder.substr(i, j – i); if (node == "#")++leaves; else ++nonleaves; i = ++j; if (leaves – nonleaves == 1 && j < n)return false; } return leaves – nonleaves == 1; } }; [/cpp] 本代码提交AC，用时6MS。]]>

LeetCode Longest Absolute File Path

Leave a reply

LeetCode Longest Absolute File Path Suppose we abstract our file system by a string in the following manner: The string "dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext" represents:

dir
    subdir1
    subdir2
        file.ext

The directory dir contains an empty sub-directory subdir1 and a sub-directory subdir2 containing a file file.ext. The string "dir\n\tsubdir1\n\t\tfile1.ext\n\t\tsubsubdir1\n\tsubdir2\n\t\tsubsubdir2\n\t\t\tfile2.ext" represents:

dir
    subdir1
        file1.ext
        subsubdir1
    subdir2
        subsubdir2
            file2.ext

The directory dir contains two sub-directories subdir1 and subdir2. subdir1 contains a file file1.ext and an empty second-level sub-directory subsubdir1. subdir2contains a second-level sub-directory subsubdir2 containing a file file2.ext. We are interested in finding the longest (number of characters) absolute path to a file within our file system. For example, in the second example above, the longest absolute path is "dir/subdir2/subsubdir2/file2.ext", and its length is 32 (not including the double quotes). Given a string representing the file system in the above format, return the length of the longest absolute path to file in the abstracted file system. If there is no file in the system, return 0. Note:

The name of a file contains at least a . and an extension.
The name of a directory or sub-directory will not contain a ..

Time complexity required: O(n) where n is the size of the input string. Notice that a/aa/aaa/file1.txt is not the longest file path, if there is another path aaaaaaaaaaaaaaaaaaaaa/sth.png.

给定一个文件系统的树形结构，要求以O(n)的复杂度找出最长路径的文件的路径长度，这里的长度包括分隔符’/’。一开始我没有清晰的思路，后来在暗时间会拿出来思考，今天早上刷牙的时候想到了大概的解法。遍历一次字符串，用一个vector保存当前看到的文件系统的最深层次，用level保存当前进入到的文件系统的深度。vector初始为空，level初始为0。每次我们找’\n’，因为’\n’表示一层文件系统的结束。每层文件系统level都从0开始。比如第一个例子（如下），看到dir时，vector还是空的，所以vector.push_back(dir)。回车之后是’\t’，此时的level为0，且vector[0]有东西，说明要进入vector[0]表示的文件夹的下一层，level++。又遇到subdir1，此时的level=1了，已经大于等于vector的长度，说明目前进入到的深度已经超过了vector的表示范围，所以vector.push_back(subdir1)。遇到’\n’又重新开始了，level为0。遇到’\t’，vector有，进入vector[0]，level++。遇到subdir2，此时level=1，但vector[1]有东西，此时需要把vector[1]覆盖为subdir2，因为目前看到的第1层已经是新的了。遇到’\n’又重新开始，level为0。后面连遇到两个’\t’，且vector中都有对应，说明是在之前的路径里面，即dir/subdir2/。最后遇到file.ext文件，就能算到该文件的路径长度了。

dir
    subdir1
    subdir2
        file.ext

代码如下。需要注意一点，如果一个string为”dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext”，里面的’\n’和’\t’虽然看起来是两个字符\加n或者t，但是实际是一个char为’\n’或’\t’，内存中只用一个字符char来表示，所以直接判断input[i]==’\n’即可。 [cpp] class Solution { public: int lengthLongestPath(string input) { vector<string> paths; int maxLen = 0, n = input.size(), i = 0, j = 0; while (j < n) { bool isfile = false; int curLen = 0, level = 0; while (j < n&&input[j] != ‘\n’) { if (input[j] == ‘.’)isfile = true; else if (input[j] == ‘\t’) { if (paths.size() <= level)paths.push_back(input.substr(i, j – i)); curLen += paths[level++].size() + 1; // +1指分隔符’/’ i = j + 1; } ++j; } if (paths.size() <= level)paths.push_back(input.substr(i, j – i)); else paths[level] = input.substr(i, j – i); curLen += paths[level++].size(); // 末尾不需要分隔符’/’ i = ++j; if (isfile)maxLen = max(maxLen, curLen); } return maxLen; } }; [/cpp] 本代码提交AC，用时3MS。]]>

LeetCode H-Index II

Leave a reply

275. H-Index II

Given an array of citations sorted in ascending order (each citation is a non-negative integer) of a researcher, write a function to compute the researcher’s h-index.

According to the definition of h-index on Wikipedia: “A scientist has index h if h of his/her N papers have at least h citations each, and the other N − h papers have no more than h citations each.”

Example:

Input: citations = [0,1,3,5,6]
Output: 3 
Explanation: [0,1,3,5,6] means the researcher has 5 papers in total and each of them had 
             received 0, 1, 3, 5, 6 citations respectively. 
             Since the researcher has 3 papers with at least 3 citations each and the remaining 
             two with no more than 3 citations each, her h-index is 3.

Note:

If there are several possible values for h, the maximum one is taken as the h-index.

Follow up:

This is a follow up problem to H-Index, where citations is now guaranteed to be sorted in ascending order.
Could you solve it in logarithmic time complexity? 275. H-Index II

是LeetCode H-Index的延伸，如果citations是按升序排列的，怎样把算法优化到O(logn)。看到O(logn)，马上想到二分查找。首先左右边界是0和n-1，判断中点的citations[m]和n-m，如果正好相等，则说明正好有n-m篇大于等于n-m引用的文章，所以H-index为n-m。如果citations[m]>n-m，说明引用很大，h-index还能往上涨，所以中心点需要左移，即r=m-1。反之则中心点右移l=m+1。
代码如下：

class Solution {
public:
    int hIndex(vector<int>& citations)
    {
        int l = 0, n = citations.size(), r = n – 1;
        while (l <= r) {
            int m = l + (r – l) / 2;
            if (n – m == citations[m])
                return n – m;
            else if (citations[m] > n – m)
                r = m – 1;
            else
                l = m + 1;
        }
        return n – l;
    }
};

本代码提交AC，用时9MS。
我一开始的代码是，如果citations[m]<n-m，中心点确实右移了，即l=m+1。但是当citations[m]>n-m时，我没想到用r=m-1来左移中心点，误以为r一直都是n-1是固定的，所以这个时候使用了一个while循环来线性查找正确的m，理论上性能要比上面的纯二分差。
代码如下：

class Solution {
public:
    int hIndex(vector<int>& citations)
    {
        int l = 0, r = citations.size() – 1, n = citations.size();
        if (n == 0)
            return 0;
        if (citations[0] >= n)
            return n;
        while (l < r) {
            int m = l + (r – l) / 2;
            if (n – m > citations[m])
                l = m + 1;
            else {
                while (m > 0 && n – m <= citations[m])
                    –m;
                return n – m – 1;
            }
        }
        if (citations[r] >= 1)
            return 1;
        else
            return 0;
    }
};

本代码提交AC，用时也是9MS。

二刷。更好理解的二分，因为是从小到大排序的，选定中点m后，中点及其右边共有n-m篇paper，如果paper数和citation相等，正好；如果citations[m]更大，则还可以增加paper数，所以r=m-1区间左移，右边合法的paper区间增大。因为如果合法的话，区间会一直左移，r会一直m-1，直到不满足要求，退出while循环，所以最后一个合法的下标r+1，则右边的paper数为n-(r+1)。

class Solution {
public:
	int hIndex(vector<int>& citations) {
		int n = citations.size();
		int l = 0, r = n - 1;
		while (l <= r) {
			int m = l + (r - l) / 2;
			int papers = n - m;
			if (citations[m] == papers) {
				return papers;
			}
			else if (citations[m] >= papers) {
				r = m - 1;
			}
			else {
				l = m + 1;
			}
		}
		return n - (r + 1);
	}
};

bitJoy > code

Keep calm and carry on coding

Monthly Archives: April 2017

LeetCode Verify Preorder Serialization of a Binary Tree

LeetCode Longest Absolute File Path

LeetCode H-Index II