Monthly Archives: June 2017

LeetCode Find Duplicate File in System

LeetCode Find Duplicate File in System Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths. A group of duplicate files consists of at least two files that have exactly the same content. A single directory info string in the input list has the following format: "root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)" It means there are n files (f1.txt, f2.txtfn.txt with content f1_content, f2_contentfn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory. The output is a list of group of duplicate file paths. For each group, it contains all the file paths of the files that have the same content. A file path is a string that has the following format: "directory_path/file_name.txt" Example 1:

Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]
Note:
  1. No order is required for the final output.
  2. You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
  3. The number of files given is in the range of [1,20000].
  4. You may assume no files or directories share the same name in a same directory.
  5. You may assume each given directory info represents a unique directory. Directory path and file infos are separated by a single blank space.
Follow up beyond contest:
  1. Imagine you are given a real file system, how will you search files? DFS or BFS ?
  2. If the file content is very large (GB level), how will you modify your solution?
  3. If you can only read the file by 1kb each time, how will you modify your solution?
  4. What is the time complexity of your modified solution? What is the most time consuming part and memory consuming part of it? How to optimize?
  5. How to make sure the duplicated files you find are not false positive?

给定一个文件系统,要找出所有文件内容相同的集合,分group输出。 也是简单题,对所有文件根据content Hash,Hash到同一个桶内的文件内容相同,把它们group起来就好。 需要注意是只有当一个桶内的文件数目多于1个时,才输出该桶内所有文件。 代码如下: [cpp] class Solution { private: struct MyFile { string name, directory; MyFile(string n_, string d_) :name(n_), directory(d_) {}; }; public: vector<vector<string>> findDuplicate(vector<string>& paths) { unordered_map<string, vector<MyFile>> ump; for (int i = 0; i < paths.size(); ++i) { string line = paths[i] + " "; int pos_blank = line.find(" "); string directory = line.substr(0, pos_blank); int start = pos_blank + 1, end = line.find(" ", start); while (end != string::npos) { string one_file = line.substr(start, end – start); int pos_parenthesis = one_file.find("("); string name = one_file.substr(0, pos_parenthesis); string content = one_file.substr(pos_parenthesis + 1, one_file.size() – pos_parenthesis – 2); ump[content].push_back(MyFile(name, directory)); start = end + 1; end = line.find(" ", start); } } vector<vector<string>> ans; for (const auto &it : ump) { if (it.second.size() > 1) { vector<string> group; for (const auto &f : it.second) { group.push_back(f.directory + "/" + f.name); } ans.push_back(group); } } return ans; } }; [/cpp] 本代码提交AC,用时122MS。]]>

LeetCode Construct String from Binary Tree

LeetCode Construct String from Binary Tree You need to construct a string consists of parenthesis and integers from a binary tree with the preorder traversing way. The null node needs to be represented by empty parenthesis pair “()”. And you need to omit all the empty parenthesis pairs that don’t affect the one-to-one mapping relationship between the string and the original binary tree. Example 1:

Input: Binary tree: [1,2,3,4]
       1
     /   \
    2     3
   /
  4
Output: "1(2(4))(3)"
Explanation: Originallay it needs to be "1(2(4)())(3()())",
but you need to omit all the unnecessary empty parenthesis pairs.
And it will be "1(2(4))(3)".
Example 2:
Input: Binary tree: [1,2,3,null,4]
       1
     /   \
    2     3
     \
      4
Output: "1(2()(4))(3)"
Explanation: Almost the same as the first example,
except we can't omit the first parenthesis pair to break the one-to-one mapping relationship between the input and the output.

本题要求把一棵二叉树转换为一个字符串,使用先序遍历的方法,并且左右子树需要用括号括起来,但是要保证从该字符串能唯一恢复回原来的二叉树,而且要去掉不必要的空括号。 首先明确先序遍历:根左右,不难。然后要删掉不必要的空括号,观察第二个样例,发现如果左孩子为空,则左孩子的空括号不能省,但是如果右孩子或者左右孩子都为空,则他们的空括号可以省略。所以分几种情况递归求解即可。 代码如下: [cpp] class Solution { public: string tree2str(TreeNode* t) { if (!t)return ""; if (!t->left && !t->right)return to_string(t->val); if (!t->left)return to_string(t->val) + "()(" + tree2str(t->right) + ")"; if (!t->right)return to_string(t->val) + "(" + tree2str(t->left) + ")"; return to_string(t->val) + "(" + tree2str(t->left) + ")(" + tree2str(t->right) + ")"; } }; [/cpp] 本代码提交AC,用时19MS。]]>

LeetCode Can Place Flowers

LeetCode Can Place Flowers Suppose you have a long flowerbed in which some of the plots are planted and some are not. However, flowers cannot be planted in adjacent plots – they would compete for water and both would die. Given a flowerbed (represented as an array containing 0 and 1, where 0 means empty and 1 means not empty), and a number n, return if n new flowers can be planted in it without violating the no-adjacent-flowers rule. Example 1:

Input: flowerbed = [1,0,0,0,1], n = 1
Output: True
Example 2:
Input: flowerbed = [1,0,0,0,1], n = 2
Output: False
Note:
  1. The input array won’t violate no-adjacent-flowers rule.
  2. The input array size is in the range of [1, 20000].
  3. n is a non-negative integer which won’t exceed the input array size.

今天第一次做Leetcode的weekly contest,AC了三题,最后一题看起来好繁琐,没A。 第一题,简单贪心即可,从第0个位置开始,检查每个位置的前后是否为1,如果不为1,说明该位置可以种花,填上1,最后统计新填上的1的个数和n的关系。 代码如下: [cpp] class Solution { public: bool canPlaceFlowers(vector<int>& flowerbed, int n) { int m = flowerbed.size(); for (int i = 0; i < m; ++i) { if (flowerbed[i] == 0) { if (i – 1 >= 0 && flowerbed[i – 1] == 1)continue; if (i + 1 < m&&flowerbed[i + 1] == 1)continue; flowerbed[i] = 1; –n; } } return n <= 0; } }; [/cpp] 本代码提交AC,用时26MS。]]>

HDOJ 1159-Common Subsequence

HDOJ 1159-Common Subsequence

Common Subsequence

Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 65536/32768 K (Java/Others)
Total Submission(s): 59402    Accepted Submission(s): 27503

Problem DescriptionA subsequence of a given sequence is the given sequence with some elements (possible none) left out. Given a sequence X = <x1, x2, …, xm> another sequence Z = <z1, z2, …, zk> is a subsequence of X if there exists a strictly increasing sequence <i1, i2, …, ik> of indices of X such that for all j = 1,2,…,k, xij = zj. For example, Z = <a, b, f, c> is a subsequence of X = <a, b, c, f, b, c> with index sequence <1, 2, 4, 6>. Given two sequences X and Y the problem is to find the length of the maximum-length common subsequence of X and Y.
The program input is from a text file. Each data set in the file contains two strings representing the given sequences. The sequences are separated by any number of white spaces. The input data are correct. For each set of data the program prints on the standard output the length of the maximum-length common subsequence from the beginning of a separate line.

Sample Input

abcfbc abfcab
programming contest 
abcd mnp

Sample Output

4
2
0

SourceSoutheastern Europe 2003
RecommendIgnatius   |   We have carefully selected several similar problems for you:  10871176100310581069


求最长公共子序列。DP模板题,使用一个dp数组即可,用pre记录原来左上角的值。 代码如下,如果要求出一个lcs,参考《算法导论》P225,使用一个二维数组direction记录每个格子最大值来自哪个方向,重构lcs时从右下角不停的往前找。

#include <string>
#include <cstdio>
#include <vector>
#include <algorithm>
#include <iostream>
using namespace std;
int lcs(const string& a, const string& b)
{
    int n = a.size(), m = b.size();
    if (n == 0 || m == 0)
        return 0;
    vector<int> dp(m + 1, 0);
    for (int i = 1; i <= n; ++i) {
        int pre = 0;
        for (int j = 1; j <= m; ++j) {
            int cur = 0;
            if (a[i – 1] == b[j – 1])
                cur = pre + 1;
            else
                cur = max(dp[j – 1], dp[j]);
            pre = dp[j];
            dp[j] = cur;
        }
    }
    return dp[m]; // 最大值就是右下角的值
}
void construct_lcs(vector<vector<int> >& direction, const string& a, int i, int j, string& lcs)
{
    if (i == 0 || j == 0)
        return;
    if (direction[i][j] == 3) {
        construct_lcs(direction, a, i – 1, j – 1, lcs);
        lcs += string(1, a[i – 1]); // direction下标1对应a下标0,所以i-1
    }
    else if (direction[i][j] == 2) {
        construct_lcs(direction, a, i – 1, j, lcs);
    }
    else {
        construct_lcs(direction, a, i, j – 1, lcs);
    }
}
string lcs_string(const string& a, const string& b)
{
    int n = a.size(), m = b.size();
    if (n == 0 || m == 0)
        return 0;
    vector<int> dp(m + 1, 0);
    vector<vector<int> > direction(n + 1, vector<int>(m + 1, 0)); // 1:left,2:up,3:diagonal
    for (int i = 1; i <= n; ++i) {
        int pre = 0;
        for (int j = 1; j <= m; ++j) {
            int cur = 0;
            if (a[i – 1] == b[j – 1]) {
                cur = pre + 1;
                direction[i][j] = 3;
            }
            else {
                if (dp[j – 1] > dp[j]) {
                    cur = dp[j – 1];
                    direction[i][j] = 1;
                }
                else {
                    cur = dp[j];
                    direction[i][j] = 2;
                }
            }
            pre = dp[j];
            dp[j] = cur;
        }
    }
    string ans;
    construct_lcs(direction, a, n, m, ans); // 从右下角递归往前找,LCRS P225
    return ans;
}
int main()
{
    //freopen("input.txt", "r", stdin);
    string a, b;
    while (cin >> a >> b) {
        cout << lcs(a, b) << endl;
        //cout << lcs(a, b) << "\t" << lcs_string(a, b) << endl;;
    }
    return 0;
}

本代码提交AC,用时46MS。