Tuesday 19 January 2016

Design a Transaction System (e.g. Paypal)

ACID



Implementations

Tuesday 12 January 2016

Error Type

Dynamic Programming

  • [N+1] for string
  • A.charAt(N) ERROR
  • Return not dp[N] or dp[N][M], but some MAX or MIN

LinkedList
  • Remove Node p (setting p.next = null)
  • break link (setting the left end's next = null)

Comparator
  • private Comparator<ListNode> listNodeComparator = new Comparator<ListNode>() {
  •         public int compare(ListNode left, ListNode right) {
  •             if (left == null) {
  •                 return 1;
  •             } else if (right == null) {
  •                 return -1
  •             }
  •             
  •             return left.val - right.val;
  •         }
  •     };

Classic Data Structures and Functions


1. ArrayList
  • void add(int index, Object element)
  • boolean add(Object o)
  • boolean addAll(Collection c)
  • boolean addAll(int index, Collection c)
  • void clear()
  • Object clone(): Returns a shallow copy of this ArrayList.
  • boolean contains(Object o)
  • void ensureCapacity(int minCapacity)
  • Object get(int index)
  • int indexOf(Object o)
  • int lastIndexOf(Object o)
  • Object remove(int index)
  • protected void removeRange(int fromIndex, int toIndex)
  • Object set(int index, Object element)
  • int size()
  • Object[] toArray()
  • Object[] toArray(Object[] a)
  • void trimToSize()
2. HashMap

  • void clear()
  • Object clone(): Returns a shallow copy of this HashMap instance.
  • boolean containsKey(Object key)
  • boolean containsValue(Object value)
  • Set entrySet()
  • Object get(Object key)
  • boolean isEmpty()
  • Set keySet()
  • Object put(Object key, Object value)
  • putAll(Map m)
  • Object remove(Object key)
  • int size()
  • Collection values()


2.2 HashSet
  • add()
  • clear()
  • clone()
  • contains()
  • isEmpty()
  • remove()
  • size()

3. Queue / LinkedList

  • boolean add(E e)
  • void add(int index, E element)
  • boolean addAll(Collection<? extends E> c)
  • void clear()
  • Object clone()
  • boolean contains(Object o)
  • E get(int index)
  • E peek()
  • E poll()
  • int size()
  • Object[] toArray()
  • public boolean isEmpty()
4. Stack
  • boolean empty()
  • Object peek( )
  • Object pop( )
  • Object push(Object element)
  • int search(Object element)
5. StringBuilder 

  • StringBuilder append(char c)
  • StringBuilder append(char[] str)
  • StringBuilder append(String s)
  • StringBuilder delete(int start, int end)
  • StringBuilder deleteCharAt(int index)
  • void setCharAt(int index, char c)
  • void insert(int index, char/string s)
  • StringBuilder reverse()
  • String toString()
  • int length()
6. String

  • char charAt(int index)
  • boolean endsWith(String suffix)
  • byte getBytes()
  • int hashCode()
  • int length()
  • boolean matches(String regex)
  • String replace(char oldChar, char newChar)
  • String replaceAll(String regex, String replacement)
  • String replaceFirst(String regex, String replacement)
  • String[] split(String regex)
  • boolean startsWith(String prefix)
  • String substring(int beginIndex)
  • String substring(int beginIndex, int endIndex)
  • char[] toCharArray()
  • String toLowerCase()
  • String toUpperCase()
  • String trim()
7. Heap / Priority Queue

  • boolean add(E e)
  • void clear()
  • boolean contains(Object o)
  • Iterator<E> iterator()
  • boolean offer(E e)
  • E peek()
  • E poll()
  • boolean remove(Object o)
  • int size()
  • Object[] toArray()
  • boolean isEmpty()

Thursday 7 January 2016

Design '秒杀'

1 Cases:

Case 0: A buyer searches product list.
Case 1: A buyer makes an order by clicking the "Order" button.
Case 2: A buyer makes an order by frequently clicking the "Order" button.
Case 2: A buyer makes an order by sending HTTP requests using software.

2 Constraint:

Assuming the number of active buyers in the day is 100 Million Buyers.
Assuming the number of buyers during a peak second is: 100 Million Buyers / (12 * 3600) * 10 = 23150 Buyers. Assuming each buyer requests for 1 product in the peak second, there are 23150 requests per second.

Assuming that there are 10 Million products.

3. Data

Product
(Database: MySQL)

Transaction
(Database: MySQL)

4. System


Wednesday 6 January 2016

Design Google Analytics

1: Scenario

1. A user register/remove/log in/out Google Analytics.
2. A user register/remove a website in Google Analytics.
3. A user access a report in Google Analytics.
4. A visitor visits a web page and send some data to Google Analytics.

2: Constraints:

Assumed No. of Uses: 0.5 Million.
Assumed No. of websites: 1 Million.
Assumed No. of webpages: 1 Million * 100 = 100 Million.
Assumed No. of new websites per day: 1 Million * 0.1% = 1 K
Assumed No. of new websites per a peak second: 1 Million * 0.1% / (24 * 3600) * 10 = 0.1
Assumed No. of webpages access per day: 1 Million * 100 * 0.1 = 10 Million;
Assumed No. of webpages access in a peak second: 10 Million / (24 * 3600) * 10 = 1,157 visits;
Assumed No. of report checks per day: 1 Million
Assumed No. of report checks in a peak second: 1 Million / (24 * 3600) * 10 = 116 visits;
Assumed Traffic in a peak second: 116 * 1Mb/s + 1157 * 100kb/s = 232 Mb/s
Assumed Memory in a peak second (in memory report): 1 Million * 100 Kb = 100 Gb

3. Data

User Account
Re-use Google Account

Website Account & Webpage Access Record
Database (e.g. NoSQL)

Website Analytics Report
Database (e.g. NoSQL) + In Memory Database

4. Key Algorithm
Real-Time Report Generation

5. All In One

#In this framework, the write operations are very heavy and the read operations can allow some delays. So, I decide to split the read and write database for efficiency.

Tuesday 5 January 2016

Design a distributed crawl system

1. Constraint

Assuming 1 trillion web pages
Assuming each page has 10 links to other web pages on average
Assuming each web page is updated every 7 days on average

So, the task is to crawl 1 trillion web pages in 7 days, which is 1,653,439 web pages per second.
So, the peak hour task is 1,653,439 * 5 = 8,267,195 web pages per second.

Assuming that each machine can process 10 web pages per second. It requires 826,720 machines.
Assuming that each data center has 10,000 machines. It requires 83 data centers.

Assuming that each master machine can manage 1000 slave machines, it needs 830 master machines (each data center needs 10 master machines). A partitioning method is required between master machines. (e.g. prefix)

Assuming there are 1 million key words and each keyword is associated with 100,000,000 web pages. The data size requirement is 1,000,000 * ( [20 Bytes] + 100,000,000 * [20 Bytes]) = 
2,000,000,000,000,000 Byte = 2,000 TB.

Assuming that each machine has 1TB space, there is 2,000 machines.

2. Framework


Design User System

Scenario:

  • Log in/out
  • Register/Remove
  • Update
  • Forgot Password

Necessity:

  • No of Users
  • No of Peak users
  • Frequency of operators


Database (User Table):

  • long userID; P_KEY & Index
  • string userName; //max length as 50
  • string hiddenPass; //Not revertible and not easy to break using a brute force way (e.g. Hash).
  •                               //On-side encryption is not a good idea.
  •                               //Encryption with private key is still used (e.g. SSH).
  • int status; //activated? ban?


In Memory:

  • class User {
    • long userID; //P_KEY
    • string userName;
    • string hiddenPass;
    • int status;
  • }

  • class Session {
    • long sessionID; //P_KEY
    • long userID;
    • int devideType;
    • long timeOut; //ms
  • }

Design PayPal

1. Cases:

Case 1: A user open PayPal from some websites (e.g. shopping)
Case 2: A user login PayPal
Case 3: A user pay money using PayPal
Case 4: A user search payment history

Case 5: A seller login PayPal
Case 6: A seller configure payment when user access PayPal
Case 7: A seller search payment history

2. Constraints

Users: 100 Million users
Daily Active user: (assuming 1%) 1 million users
Average Transaction Per Second: 1 million users * 1.5 / (3600 * 24) = 17.36
Peak Transaction Per Second: 17.36 * 10 = 173.10 Tran/Sec

Assuming one machine's capacity is 10 trans/sec, requiring 18 machines. 

Daily new user (assuming 0.1%): 100,000 users
Average new user Per Second: 100,000 / (3600 * 24) = 1.57
Peak new user Per Second: 1.57 * 10 = 15.7 User/Sec

Assuming one machine's capacity is 10 users/sec, requiring 2 machines. 

3. Service:

User Service
(Register, Update, Remove, Login, Logout)

Transaction Service
(TranferFromOtherSystemInPaypal, TransferFromPaypalOut, TransferBetweenPaypals)

Record Service
(Insert, Delete, Search)

4. Data

User Account Data
(Database: MySQL)

Transaction Records
(Database: MySQL (double check functions))

Transaction Log
(Files)

5. Core Algorithms

1. Information Hashing
     
2. Transaction ACID
     Atomicity: Log first (Commit or Rollback)
     Consistency: Consistency checker
     Isolation: Independent Transactions
     Durability: Redundancy

6. All in One

Privacy Manager's Input: user name and password
Privacy Manager's Output: Hashed strings


Monday 4 January 2016

Design UberX

1. Scenarios:

Case 1: A driver starts its service and drive around.
             The driver should see his/her locations moving on map.
Case 2: A driver accepts/denies a trip request.
Case 3: A driver takes a rider to the destination.
             The fee should be updated frequently.
Case 4: A driver dhows the fee and print the receipt for the rider.

Case 5: A rider searches nearby cabs.
Case 6: A rider posts a request and read the response.
Case 7: A rider find the cab.
Case 8: A rider takes the cab to the destination.
Case 9: A rider pay the fee and get the receipt.

2. Constraints:

Assumed number of full-time Drivers: 1 Million
Assumed number of part-time Drivers: 1 Million
Assumed number of Drivers during a peak hour: 1 Million * 80% + 1 Million * 20% = 1 Million

Assumed Average Trip Length: 20 minutes
Assumed Number of Trips during Peak hour: 2 Million trips

Assumed Peak Trips in a second: 2 Million trips / 3600 * 2 = 1111 trip/second

Assumed Memory Requirements: 2 Million * 20 / 60 * 1 Kb = 667 Mb
Assumed Traffic Requirements: 2 Million * 20 / 60 * 10 Kb/s = 6.67 Gb/s

Assumed Data Storage Requirements Per peak hour: 2 Million trips * 1kb = 2 Gb

3. Service:
  • 1. UberX Monitoring Service (both Demand and Supply)
  • 2. Application Service
  • 3. Dispatch Service
  • 4. Business Logic Service (Fee & Receipt)
  • 5. Payment Service (3th Party)
  • 6. User Service
4. Data

Demand and Supply (including gps updating)
No SQL

Business Logic Service
No SQL

Dispatch
No SQL

User Service
Database (MySQL)

Payment Histrory
Database (MySQL)

5. All Together


6. Key Algorithms / Methods
  • Dispatching Algorithm  (Minimizing Estimated Arrive Time)
  • Arrive Time Estimation Algorithm


Sunday 3 January 2016

Design Netflix

1. Cases

Case 1: A user loges in to Netflix
    case sensitive? verification code? forgot password? first time log in?
Case 2: A user searches Netflix
    search by keywords, search by time, search by players, search by catalog.
Case 3: A user plays a movie
    pause, faster, skip the beginning, full screen.
Case 4: A user writes a comment
    comment format? remove comments?
Case 5: A user pays his/her subscription fees
    security

2. Constraints

2015 Oct: 69.17 million subscribers ~ 70 million

Active Users in Peak Time:
70 million * 30 minutes / (24 * 60) * 10 [Peak] = 14.6 million

Active Traffic Requirement in Peak Time:
14.6 million * 200,000 Byte/s = 2.92 TB/s

Storage (minimum):
Movie: 500,000 * 1 GB = 500 TB
Comments: 70 million * 1000 * 1000 Byte: 70 TB

3. Service

User Account Service

Movie & Catalog Service

Comment & Recommendation Service

4. Data

User Account Data
MySQL or PostgreSQL

Movie & Catalog Service
Files & Mongodb (Flexible Schema)

Comment & Recommendation Service
Mongodb (Large Data Size)

5. All Together

Friday 1 January 2016

九章的系统设计班 http://www.jiuzhang.com/course/2

系统设计班

你是否已经刷过很多算法,能顺利写出基本算法,但Dream company的面试,却倒在高难度的System Design问题?你是否有了一定工作基础,对基本题已经失去兴趣,但仰望进入更有挑战更NB的公司?九章算法系统设计班,会为你提供一线情报和实战训练,全面提升你的综合素质,我们会竭力帮助你临门一脚,改变你的人生!

课程优势

在线视频授课。无需东奔西走求学,浏览器,手机端,无需预装任何软件即可上课。课堂实时互动。学生可以向老师提问,及时解决课堂上的疑问。
谁来讲:张无忌老师。北京大学博士。前阿里巴巴高级专家(P8);组建50人团队,研发千万级用户平台,实现千万级收入。

讲什么:通过实战面试真题,揭秘数据系统设计、web系统设计、爬虫系统设计、消息系统设计、统计系统设计的面试技巧。助你临门一脚,拿到更好的offer!课程的题目不仅是面试真题,更是在实际工程中会遇到的真实问题。因此不仅有利于面试,更有利于同学的多快好省地完成实际工作。

适合谁:无需任何算法基础和系统设计基础。适合北美和国内的应届毕业生,非应届毕业生,特别是已经有工作经验准备跳槽的工程师。课程也适合于想要在公司内快速晋升的工程师,将所学应用在日常的工作和晋升答辩之中。

本课程每节课2小时,共7节课

授课内容

1

走进系统设计 Introduce System Design 【免费试听】


本节大纲
  • 什么是系统设计?
  • 如何高效并有效的备考面试?《结构化面试指南》
  • 如何回答系统设计题?精讲一道真题让学员轻松pass一半以上的系统结构面试——如何设计Uber

2

数据系统设计 Database System Design


本节大纲
  • 实战真题
        1)Design data with class and inheritance
        2)Design user system
        3)Design payment system
  • 掌握数据系统设计的知识和方法论    
  • 关键词:database, primary/foreign key, table, row, attribute, index, transaction, log, lock, lifecycle graph, binary search tree, B+Tree, atomicity, consistency, isolation, durability, session

3

爬虫系统设计 Web Crawler System Design


本节大纲
  • 实战真题
        1)Design a crawler
        2)Design thread-safe consumer and producer
        3)Design Tiny URL
  • 掌握爬虫系统设计的知识和方法论
  • 关键词:crawler , HTTP, socket, HTML , TCP, thread, lock, conditional variable, semaphore, tiny URL, insert, lookup, encoding, producer&consumer

4

网站应用系统设计 Web Application System Design


本节大纲
  • 实战真题
        1)What happens when you visit google?
        2)How to increase the visiting speed of a page?
        3)Design rate limiter
  • 掌握Web系统设计的知识和方法论
  • 关键词:failure rate, DNS, web server, file server, timeout, content delivery network, cookie, HTTP, divide and conquer, Internet service provider, hosts, hijack, retention rate, cache, lazy load, rate limiter, QPS, counter , expire, request list, token bucket

5

分布式系统设计 Distributed System Design


本节大纲
  • 实战真题
        1)What is GFS, BigTable, MapReduce?
        2)Design distributed file system
        3)Design distributed database
        4)Calculate with MapReduce
  • 掌握分布式系统设计的知识和方法论
  • 关键词:Google File System, BigTable, MapReduce, bloom-filter, index, log, hot spot, replication, read/write, chunk, traffic, key/value

6

订阅和消息系统设计 Feed & Message System Design


本节大纲
  • 实战真题:
        1)Design feed system
        2)Design chat system
  • 掌握订阅和消息系统设计的知识和方法论
  • 关键词:feed, pull, push, search, chat, SQL, NoSQL, memory database, broadcast, batch, compression, delay, WebSocket, online, timeline

7

面向对象系统设计 OOD System Design


本节大纲
  • 实战真题:
        1)Design typeahead
        2)Design elevator
        3)Design parking lot
        4)Design blackjack
        5)Design Achievement system
  • 掌握面向对象设计的知识和方法论
  • 关键词:typeahead, instant search, binary search tree, trie tree, ternary search tree, like, cache, friend of friend, log, OOD, class, know/do/is, elevator, parking lot, card game, achievement system