ACID
Implementations
Tuesday, 19 January 2016
Tuesday, 12 January 2016
Error Type
Dynamic Programming
- [N+1] for string
- A.charAt(N) ERROR
- Return not dp[N] or dp[N][M], but some MAX or MIN
LinkedList
- Remove Node p (setting p.next = null)
- break link (setting the left end's next = null)
Comparator
- private Comparator<ListNode> listNodeComparator = new Comparator<ListNode>() {
- public int compare(ListNode left, ListNode right) {
- if (left == null) {
- return 1;
- } else if (right == null) {
- return -1
- }
- return left.val - right.val;
- }
- };
Classic Data Structures and Functions
1. ArrayList
- void add(int index, Object element)
- boolean add(Object o)
- boolean addAll(Collection c)
- boolean addAll(int index, Collection c)
- void clear()
- Object clone(): Returns a shallow copy of this ArrayList.
- boolean contains(Object o)
- void ensureCapacity(int minCapacity)
- Object get(int index)
- int indexOf(Object o)
- int lastIndexOf(Object o)
- Object remove(int index)
- protected void removeRange(int fromIndex, int toIndex)
- Object set(int index, Object element)
- int size()
- Object[] toArray()
- Object[] toArray(Object[] a)
- void trimToSize()
2. HashMap
- void clear()
- Object clone(): Returns a shallow copy of this HashMap instance.
- boolean containsKey(Object key)
- boolean containsValue(Object value)
- Set entrySet()
- Object get(Object key)
- boolean isEmpty()
- Set keySet()
- Object put(Object key, Object value)
- putAll(Map m)
- Object remove(Object key)
- int size()
- Collection values()
- add()
- clear()
- clone()
- contains()
- isEmpty()
- remove()
- size()
3. Queue / LinkedList
- boolean add(E e)
- void add(int index, E element)
- boolean addAll(Collection<? extends E> c)
- void clear()
- Object clone()
- boolean contains(Object o)
- E get(int index)
- E peek()
- E poll()
- int size()
- Object[] toArray()
- public boolean isEmpty()
4. Stack
- boolean empty()
- Object peek( )
- Object pop( )
- Object push(Object element)
- int search(Object element)
5. StringBuilder
- StringBuilder append(char c)
- StringBuilder append(char[] str)
- StringBuilder append(String s)
- StringBuilder delete(int start, int end)
- StringBuilder deleteCharAt(int index)
- void setCharAt(int index, char c)
- void insert(int index, char/string s)
- StringBuilder reverse()
- String toString()
- int length()
- char charAt(int index)
- boolean endsWith(String suffix)
- byte getBytes()
- int hashCode()
- int length()
- boolean matches(String regex)
- String replace(char oldChar, char newChar)
- String replaceAll(String regex, String replacement)
- String replaceFirst(String regex, String replacement)
- String[] split(String regex)
- boolean startsWith(String prefix)
- String substring(int beginIndex)
- String substring(int beginIndex, int endIndex)
- char[] toCharArray()
- String toLowerCase()
- String toUpperCase()
- String trim()
- boolean add(E e)
- void clear()
- boolean contains(Object o)
- Iterator<E> iterator()
- boolean offer(E e)
- E peek()
- E poll()
- boolean remove(Object o)
- int size()
- Object[] toArray()
boolean isEmpty()
Thursday, 7 January 2016
Design '秒杀'
1 Cases:
Case 0: A buyer searches product list.
Case 1: A buyer makes an order by clicking the "Order" button.
Case 2: A buyer makes an order by frequently clicking the "Order" button.
Case 2: A buyer makes an order by sending HTTP requests using software.
2 Constraint:
Assuming the number of active buyers in the day is 100 Million Buyers.
Assuming the number of buyers during a peak second is: 100 Million Buyers / (12 * 3600) * 10 = 23150 Buyers. Assuming each buyer requests for 1 product in the peak second, there are 23150 requests per second.
Assuming that there are 10 Million products.
3. Data
Product
(Database: MySQL)
Transaction
(Database: MySQL)
4. System
Case 0: A buyer searches product list.
Case 1: A buyer makes an order by clicking the "Order" button.
Case 2: A buyer makes an order by frequently clicking the "Order" button.
Case 2: A buyer makes an order by sending HTTP requests using software.
2 Constraint:
Assuming the number of active buyers in the day is 100 Million Buyers.
Assuming the number of buyers during a peak second is: 100 Million Buyers / (12 * 3600) * 10 = 23150 Buyers. Assuming each buyer requests for 1 product in the peak second, there are 23150 requests per second.
Assuming that there are 10 Million products.
3. Data
Product
(Database: MySQL)
Transaction
(Database: MySQL)
4. System
Wednesday, 6 January 2016
Design Google Analytics
1: Scenario
1. A user register/remove/log in/out Google Analytics.
2. A user register/remove a website in Google Analytics.
3. A user access a report in Google Analytics.
4. A visitor visits a web page and send some data to Google Analytics.
2: Constraints:
Assumed No. of Uses: 0.5 Million.
Assumed No. of websites: 1 Million.
Assumed No. of webpages: 1 Million * 100 = 100 Million.
Assumed No. of new websites per day: 1 Million * 0.1% = 1 K
Assumed No. of new websites per a peak second: 1 Million * 0.1% / (24 * 3600) * 10 = 0.1
Assumed No. of webpages access per day: 1 Million * 100 * 0.1 = 10 Million;
Assumed No. of webpages access in a peak second: 10 Million / (24 * 3600) * 10 = 1,157 visits;
Assumed No. of report checks per day: 1 Million
Assumed No. of report checks in a peak second: 1 Million / (24 * 3600) * 10 = 116 visits;
Assumed Traffic in a peak second: 116 * 1Mb/s + 1157 * 100kb/s = 232 Mb/s
Assumed Memory in a peak second (in memory report): 1 Million * 100 Kb = 100 Gb
3. Data
User Account
Re-use Google Account
Website Account & Webpage Access Record
Database (e.g. NoSQL)
Website Analytics Report
Database (e.g. NoSQL) + In Memory Database
4. Key Algorithm
Real-Time Report Generation
5. All In One
#In this framework, the write operations are very heavy and the read operations can allow some delays. So, I decide to split the read and write database for efficiency.
1. A user register/remove/log in/out Google Analytics.
2. A user register/remove a website in Google Analytics.
3. A user access a report in Google Analytics.
4. A visitor visits a web page and send some data to Google Analytics.
2: Constraints:
Assumed No. of Uses: 0.5 Million.
Assumed No. of websites: 1 Million.
Assumed No. of webpages: 1 Million * 100 = 100 Million.
Assumed No. of new websites per day: 1 Million * 0.1% = 1 K
Assumed No. of new websites per a peak second: 1 Million * 0.1% / (24 * 3600) * 10 = 0.1
Assumed No. of webpages access per day: 1 Million * 100 * 0.1 = 10 Million;
Assumed No. of webpages access in a peak second: 10 Million / (24 * 3600) * 10 = 1,157 visits;
Assumed No. of report checks per day: 1 Million
Assumed No. of report checks in a peak second: 1 Million / (24 * 3600) * 10 = 116 visits;
Assumed Traffic in a peak second: 116 * 1Mb/s + 1157 * 100kb/s = 232 Mb/s
Assumed Memory in a peak second (in memory report): 1 Million * 100 Kb = 100 Gb
3. Data
User Account
Re-use Google Account
Website Account & Webpage Access Record
Database (e.g. NoSQL)
Website Analytics Report
Database (e.g. NoSQL) + In Memory Database
4. Key Algorithm
Real-Time Report Generation
5. All In One
#In this framework, the write operations are very heavy and the read operations can allow some delays. So, I decide to split the read and write database for efficiency.
Tuesday, 5 January 2016
Design a distributed crawl system
1. Constraint
Assuming 1 trillion web pages
Assuming each page has 10 links to other web pages on average
Assuming each web page is updated every 7 days on average
So, the task is to crawl 1 trillion web pages in 7 days, which is 1,653,439 web pages per second.
So, the peak hour task is 1,653,439 * 5 = 8,267,195 web pages per second.
Assuming that each machine can process 10 web pages per second. It requires 826,720 machines.
Assuming that each data center has 10,000 machines. It requires 83 data centers.
Assuming that each master machine can manage 1000 slave machines, it needs 830 master machines (each data center needs 10 master machines). A partitioning method is required between master machines. (e.g. prefix)
Assuming there are 1 million key words and each keyword is associated with 100,000,000 web pages. The data size requirement is 1,000,000 * ( [20 Bytes] + 100,000,000 * [20 Bytes]) =
2,000,000,000,000,000 Byte = 2,000 TB.
Assuming that each machine has 1TB space, there is 2,000 machines.
2. Framework
Design User System
Scenario:
Necessity:
Database (User Table):
In Memory:
- Log in/out
- Register/Remove
- Update
- Forgot Password
Necessity:
- No of Users
- No of Peak users
- Frequency of operators
Database (User Table):
- long userID; P_KEY & Index
- string userName; //max length as 50
- string hiddenPass; //Not revertible and not easy to break using a brute force way (e.g. Hash).
- //On-side encryption is not a good idea.
- //Encryption with private key is still used (e.g. SSH).
- int status; //activated? ban?
In Memory:
- class User {
- long userID; //P_KEY
- string userName;
- string hiddenPass;
- int status;
- }
- class Session {
- long sessionID; //P_KEY
- long userID;
- int devideType;
- long timeOut; //ms
- }
Design PayPal
1. Cases:
Case 1: A user open PayPal from some websites (e.g. shopping)
Case 2: A user login PayPal
Case 3: A user pay money using PayPal
Case 4: A user search payment history
Case 5: A seller login PayPal
Case 6: A seller configure payment when user access PayPal
Case 7: A seller search payment history
2. Constraints
Users: 100 Million users
Daily Active user: (assuming 1%) 1 million users
Average Transaction Per Second: 1 million users * 1.5 / (3600 * 24) = 17.36
Peak Transaction Per Second: 17.36 * 10 = 173.10 Tran/Sec
Assuming one machine's capacity is 10 trans/sec, requiring 18 machines.
Daily new user (assuming 0.1%): 100,000 users
Average new user Per Second: 100,000 / (3600 * 24) = 1.57
Peak new user Per Second: 1.57 * 10 = 15.7 User/Sec
Assuming one machine's capacity is 10 users/sec, requiring 2 machines.
3. Service:
User Service
(Register, Update, Remove, Login, Logout)
Transaction Service
(TranferFromOtherSystemInPaypal, TransferFromPaypalOut, TransferBetweenPaypals)
Record Service
(Insert, Delete, Search)
4. Data
User Account Data
(Database: MySQL)
Transaction Records
(Database: MySQL (double check functions))
Transaction Log
(Files)
5. Core Algorithms
1. Information Hashing
2. Transaction ACID
Atomicity: Log first (Commit or Rollback)
Consistency: Consistency checker
Isolation: Independent Transactions
Durability: Redundancy
6. All in One
Privacy Manager's Input: user name and password
Privacy Manager's Output: Hashed strings
Case 1: A user open PayPal from some websites (e.g. shopping)
Case 2: A user login PayPal
Case 3: A user pay money using PayPal
Case 4: A user search payment history
Case 5: A seller login PayPal
Case 6: A seller configure payment when user access PayPal
Case 7: A seller search payment history
2. Constraints
Users: 100 Million users
Daily Active user: (assuming 1%) 1 million users
Average Transaction Per Second: 1 million users * 1.5 / (3600 * 24) = 17.36
Peak Transaction Per Second: 17.36 * 10 = 173.10 Tran/Sec
Assuming one machine's capacity is 10 trans/sec, requiring 18 machines.
Daily new user (assuming 0.1%): 100,000 users
Average new user Per Second: 100,000 / (3600 * 24) = 1.57
Peak new user Per Second: 1.57 * 10 = 15.7 User/Sec
Assuming one machine's capacity is 10 users/sec, requiring 2 machines.
3. Service:
User Service
(Register, Update, Remove, Login, Logout)
Transaction Service
(TranferFromOtherSystemInPaypal, TransferFromPaypalOut, TransferBetweenPaypals)
Record Service
(Insert, Delete, Search)
4. Data
User Account Data
(Database: MySQL)
Transaction Records
(Database: MySQL (double check functions))
Transaction Log
(Files)
5. Core Algorithms
1. Information Hashing
2. Transaction ACID
Atomicity: Log first (Commit or Rollback)
Consistency: Consistency checker
Isolation: Independent Transactions
Durability: Redundancy
6. All in One
Privacy Manager's Input: user name and password
Privacy Manager's Output: Hashed strings
Monday, 4 January 2016
Design UberX
1. Scenarios:
Case 1: A driver starts its service and drive around.
The driver should see his/her locations moving on map.
Case 2: A driver accepts/denies a trip request.
Case 3: A driver takes a rider to the destination.
The fee should be updated frequently.
Case 4: A driver dhows the fee and print the receipt for the rider.
Case 5: A rider searches nearby cabs.
Case 6: A rider posts a request and read the response.
Case 7: A rider find the cab.
Case 8: A rider takes the cab to the destination.
Case 9: A rider pay the fee and get the receipt.
2. Constraints:
Assumed number of full-time Drivers: 1 Million
Assumed number of part-time Drivers: 1 Million
Assumed number of Drivers during a peak hour: 1 Million * 80% + 1 Million * 20% = 1 Million
Assumed Average Trip Length: 20 minutes
Assumed Number of Trips during Peak hour: 2 Million trips
Assumed Peak Trips in a second: 2 Million trips / 3600 * 2 = 1111 trip/second
Assumed Memory Requirements: 2 Million * 20 / 60 * 1 Kb = 667 Mb
Assumed Traffic Requirements: 2 Million * 20 / 60 * 10 Kb/s = 6.67 Gb/s
Assumed Data Storage Requirements Per peak hour: 2 Million trips * 1kb = 2 Gb
3. Service:
Case 1: A driver starts its service and drive around.
The driver should see his/her locations moving on map.
Case 2: A driver accepts/denies a trip request.
Case 3: A driver takes a rider to the destination.
The fee should be updated frequently.
Case 4: A driver dhows the fee and print the receipt for the rider.
Case 5: A rider searches nearby cabs.
Case 6: A rider posts a request and read the response.
Case 7: A rider find the cab.
Case 8: A rider takes the cab to the destination.
Case 9: A rider pay the fee and get the receipt.
2. Constraints:
Assumed number of full-time Drivers: 1 Million
Assumed number of part-time Drivers: 1 Million
Assumed number of Drivers during a peak hour: 1 Million * 80% + 1 Million * 20% = 1 Million
Assumed Average Trip Length: 20 minutes
Assumed Number of Trips during Peak hour: 2 Million trips
Assumed Peak Trips in a second: 2 Million trips / 3600 * 2 = 1111 trip/second
Assumed Memory Requirements: 2 Million * 20 / 60 * 1 Kb = 667 Mb
Assumed Traffic Requirements: 2 Million * 20 / 60 * 10 Kb/s = 6.67 Gb/s
Assumed Data Storage Requirements Per peak hour: 2 Million trips * 1kb = 2 Gb
3. Service:
- 1. UberX Monitoring Service (both Demand and Supply)
- 2. Application Service
- 3. Dispatch Service
- 4. Business Logic Service (Fee & Receipt)
- 5. Payment Service (3th Party)
- 6. User Service
4. Data
Demand and Supply (including gps updating)
No SQL
Business Logic Service
No SQL
Dispatch
No SQL
User Service
Database (MySQL)
Payment Histrory
Database (MySQL)
Payment Histrory
Database (MySQL)
5. All Together
Sunday, 3 January 2016
Design Netflix
1. Cases
Case 1: A user loges in to Netflix
case sensitive? verification code? forgot password? first time log in?
Case 2: A user searches Netflix
search by keywords, search by time, search by players, search by catalog.
Case 3: A user plays a movie
pause, faster, skip the beginning, full screen.
Case 4: A user writes a comment
comment format? remove comments?
Case 5: A user pays his/her subscription fees
security
2. Constraints
2015 Oct: 69.17 million subscribers ~ 70 million
Active Users in Peak Time:
70 million * 30 minutes / (24 * 60) * 10 [Peak] = 14.6 million
Active Traffic Requirement in Peak Time:
14.6 million * 200,000 Byte/s = 2.92 TB/s
Storage (minimum):
Movie: 500,000 * 1 GB = 500 TB
Comments: 70 million * 1000 * 1000 Byte: 70 TB
3. Service
User Account Service
Movie & Catalog Service
Comment & Recommendation Service
4. Data
User Account Data
MySQL or PostgreSQL
Movie & Catalog Service
Files & Mongodb (Flexible Schema)
Comment & Recommendation Service
Mongodb (Large Data Size)
5. All Together
Case 1: A user loges in to Netflix
case sensitive? verification code? forgot password? first time log in?
Case 2: A user searches Netflix
search by keywords, search by time, search by players, search by catalog.
Case 3: A user plays a movie
pause, faster, skip the beginning, full screen.
Case 4: A user writes a comment
comment format? remove comments?
Case 5: A user pays his/her subscription fees
security
2. Constraints
2015 Oct: 69.17 million subscribers ~ 70 million
Active Users in Peak Time:
70 million * 30 minutes / (24 * 60) * 10 [Peak] = 14.6 million
Active Traffic Requirement in Peak Time:
14.6 million * 200,000 Byte/s = 2.92 TB/s
Storage (minimum):
Movie: 500,000 * 1 GB = 500 TB
Comments: 70 million * 1000 * 1000 Byte: 70 TB
3. Service
User Account Service
Movie & Catalog Service
Comment & Recommendation Service
4. Data
User Account Data
MySQL or PostgreSQL
Movie & Catalog Service
Files & Mongodb (Flexible Schema)
Comment & Recommendation Service
Mongodb (Large Data Size)
5. All Together
Friday, 1 January 2016
九章的系统设计班 http://www.jiuzhang.com/course/2
系统设计班
你是否已经刷过很多算法,能顺利写出基本算法,但Dream company的面试,却倒在高难度的System Design问题?你是否有了一定工作基础,对基本题已经失去兴趣,但仰望进入更有挑战更NB的公司?九章算法系统设计班,会为你提供一线情报和实战训练,全面提升你的综合素质,我们会竭力帮助你临门一脚,改变你的人生!
课程优势
在线视频授课。无需东奔西走求学,浏览器,手机端,无需预装任何软件即可上课。课堂实时互动。学生可以向老师提问,及时解决课堂上的疑问。
谁来讲:张无忌老师。北京大学博士。前阿里巴巴高级专家(P8);组建50人团队,研发千万级用户平台,实现千万级收入。
讲什么:通过实战面试真题,揭秘数据系统设计、web系统设计、爬虫系统设计、消息系统设计、统计系统设计的面试技巧。助你临门一脚,拿到更好的offer!课程的题目不仅是面试真题,更是在实际工程中会遇到的真实问题。因此不仅有利于面试,更有利于同学的多快好省地完成实际工作。
适合谁:无需任何算法基础和系统设计基础。适合北美和国内的应届毕业生,非应届毕业生,特别是已经有工作经验准备跳槽的工程师。课程也适合于想要在公司内快速晋升的工程师,将所学应用在日常的工作和晋升答辩之中。
本课程每节课2小时,共7节课
授课内容
第1章 | 走进系统设计 Introduce System Design 【免费试听】 | ||
本节大纲 |
|
第2章 | 数据系统设计 Database System Design | ||
本节大纲 |
2)Design user system 3)Design payment system
|
第3章 | 爬虫系统设计 Web Crawler System Design | ||
本节大纲 |
2)Design thread-safe consumer and producer 3)Design Tiny URL
|
第4章 | 网站应用系统设计 Web Application System Design | ||
本节大纲 |
2)How to increase the visiting speed of a page? 3)Design rate limiter
|
第5章 | 分布式系统设计 Distributed System Design | ||
本节大纲 |
2)Design distributed file system 3)Design distributed database 4)Calculate with MapReduce
|
第6章 | 订阅和消息系统设计 Feed & Message System Design | ||
本节大纲 |
2)Design chat system
|
第7章 | 面向对象系统设计 OOD System Design | ||
本节大纲 |
1)Design typeahead
2)Design elevator
3)Design parking lot
4)Design blackjack
5)Design Achievement system
|
Subscribe to:
Posts (Atom)