Wednesday, 6 January 2016

Design Google Analytics

1: Scenario

1. A user registers, removes an account, or logs in/out of Google Analytics.
2. A user registers/removes a website in Google Analytics.
3. A user accesses a report in Google Analytics.
4. A visitor visits a web page, and the page sends some data to Google Analytics.

2: Constraints

Assumed No. of users: 0.5 Million.
Assumed No. of websites: 1 Million.
Assumed No. of webpages: 1 Million * 100 = 100 Million.
Assumed No. of new websites per day: 1 Million * 0.1% = 1 K
Assumed No. of new websites in a peak second (peak = 10x the average second): 1 Million * 0.1% / (24 * 3600) * 10 ≈ 0.1
Assumed No. of webpage visits per day: 1 Million * 100 * 0.1 = 10 Million
Assumed No. of webpage visits in a peak second: 10 Million / (24 * 3600) * 10 ≈ 1,157 visits
Assumed No. of report checks per day: 1 Million
Assumed No. of report checks in a peak second: 1 Million / (24 * 3600) * 10 ≈ 116 checks
Assumed traffic in a peak second (1 Mb per report, 100 Kb per visit): 116 * 1 Mb + 1,157 * 100 Kb ≈ 232 Mb/s
Assumed memory for in-memory reports (100 Kb per website): 1 Million * 100 Kb = 100 Gb
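To double-check the back-of-envelope numbers, here is a minimal sketch that recomputes them. It assumes the same inputs as above (100 pages per website, 0.1 visits per page per day, a peak factor of 10, 1 Mb per report and 100 Kb per visit) and uses decimal units (1 Gb = 1,000,000 Kb); the function names are only for illustration.

```python
SECONDS_PER_DAY = 24 * 3600
PEAK_FACTOR = 10  # peak second assumed to carry 10x the average per-second load


def peak_per_second(per_day: float) -> float:
    """Average per-second rate scaled up by the peak factor."""
    return per_day / SECONDS_PER_DAY * PEAK_FACTOR


def estimate() -> dict:
    websites = 1_000_000
    webpages = websites * 100               # 100 pages per website
    new_sites_per_day = websites // 1000    # 0.1% of websites register per day
    visits_per_day = webpages // 10         # 0.1 visits per page per day
    report_checks_per_day = 1_000_000

    # Round the peak rates the same way the estimates above do.
    visits_peak = round(peak_per_second(visits_per_day))          # 1157
    reports_peak = round(peak_per_second(report_checks_per_day))  # 116

    # Assumed payload sizes: 1 Mb per report, 100 Kb (0.1 Mb) per visit.
    traffic_mb_per_s = reports_peak * 1.0 + visits_peak * 0.1

    # Assumed 100 Kb of report data per website; 1 Gb = 1,000,000 Kb (decimal).
    report_memory_gb = websites * 100 // 1_000_000

    return {
        "webpages": webpages,
        "new_sites_per_day": new_sites_per_day,
        "new_sites_peak": peak_per_second(new_sites_per_day),
        "visits_per_day": visits_per_day,
        "visits_peak": visits_peak,
        "reports_peak": reports_peak,
        "traffic_mb_per_s": round(traffic_mb_per_s),
        "report_memory_gb": report_memory_gb,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value}")
```

Rounding the peak rates before multiplying is what gives 232 Mb/s; with unrounded rates the traffic comes out closer to 231 Mb/s, which makes no practical difference at this level of estimation.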

3: Data

User Account
Re-use Google Account

Website Account & Webpage Access Record
Database (e.g. NoSQL)

Website Analytics Report
Database (e.g. NoSQL) + In-Memory Database
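One way the website accounts and webpage access records could be laid out in a NoSQL store is sketched below. This is only an assumption for illustration: the field names, the hourly "website_id#bucket" partition key and AccessRecordStore are hypothetical, and a plain dict stands in for the real database. Grouping one website's hits by hour means a report for a given hour reads a single partition instead of scanning every record.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WebsiteAccount:
    website_id: str  # hypothetical tracking ID issued when the site is registered
    owner_id: str    # ID of the owner's (re-used) Google Account
    domain: str


@dataclass(frozen=True)
class PageAccess:
    website_id: str
    page_url: str
    visitor_id: str  # hypothetical anonymous visitor identifier
    timestamp: int   # Unix time in seconds


def partition_key(rec: PageAccess, bucket_seconds: int = 3600) -> str:
    """Group one website's hits into hourly partitions, e.g. "site-1#7200"."""
    bucket = rec.timestamp - rec.timestamp % bucket_seconds
    return f"{rec.website_id}#{bucket}"


class AccessRecordStore:
    """Dict-backed stand-in for a NoSQL table keyed by partition_key."""

    def __init__(self) -> None:
        self.rows = defaultdict(list)  # partition key -> list of PageAccess

    def put(self, rec: PageAccess) -> None:
        self.rows[partition_key(rec)].append(rec)

    def hits_in_hour(self, website_id: str, hour_start: int) -> list:
        # hour_start must be a multiple of 3600 (the default bucket size).
        # Single-partition read: no scan over other websites or hours.
        return list(self.rows.get(f"{website_id}#{hour_start}", []))
```

The hour-sized bucket is a tunable guess: smaller buckets keep partitions small for very busy sites, larger ones reduce the number of reads for long report ranges.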

4: Key Algorithm
Real-Time Report Generation
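The post does not spell out the algorithm, so here is one possible approach rather than the definitive one: keep per-website, per-minute page-view counters in memory and update them incrementally on every hit. A real-time report then just sums the few buckets inside the requested window, and old buckets are evicted to keep memory within the budget estimated above. RealTimeReport and its methods are hypothetical names.

```python
from collections import defaultdict


class RealTimeReport:
    """Per-website, per-minute page-view counters kept in memory.

    Each hit bumps one counter, so a report over the last few minutes only sums
    a handful of buckets instead of scanning raw access records.
    """

    def __init__(self, bucket_seconds: int = 60) -> None:
        self.bucket_seconds = bucket_seconds
        # website_id -> bucket start time -> page_url -> view count
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def record_hit(self, website_id: str, page_url: str, timestamp: int) -> None:
        bucket = timestamp - timestamp % self.bucket_seconds
        self.counts[website_id][bucket][page_url] += 1

    def page_views(self, website_id: str, now: int, window_seconds: int) -> dict:
        """Views per URL in buckets starting within [now - window_seconds, now]."""
        totals = defaultdict(int)
        for bucket, pages in self.counts.get(website_id, {}).items():
            if now - window_seconds <= bucket <= now:
                for url, n in pages.items():
                    totals[url] += n
        return dict(totals)

    def evict_before(self, cutoff: int) -> None:
        """Drop buckets older than `cutoff` to keep memory bounded."""
        for buckets in self.counts.values():
            for bucket in [b for b in buckets if b < cutoff]:
                del buckets[bucket]
```

For example, after hits on "/a" at t=60 and t=65 and on "/b" at t=125, page_views("s", now=130, window_seconds=120) returns {"/a": 2, "/b": 1}. Longer-range reports would be served from the database side instead of these short-lived counters.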

5: All In One

In this framework, write operations are very heavy, while read operations can tolerate some delay. So I decided to split the read and write databases for efficiency.
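To make the read/write split concrete, here is a minimal sketch assuming the write database is an append-only log of raw hits and the read database holds precomputed totals. WriteStore, ReadStore and sync are hypothetical names, and in-process lists and dicts stand in for real databases. Readers only ever touch the read store, so heavy write traffic does not slow down report queries.

```python
from collections import Counter


class WriteStore:
    """Write-optimized side: raw hits are only appended, never updated in place."""

    def __init__(self) -> None:
        self.log = []  # list of (website_id, page_url) tuples

    def append(self, website_id: str, page_url: str) -> None:
        self.log.append((website_id, page_url))


class ReadStore:
    """Read-optimized side: precomputed page-view totals per website."""

    def __init__(self) -> None:
        self.totals = {}  # website_id -> Counter(page_url -> views)

    def report(self, website_id: str) -> dict:
        return dict(self.totals.get(website_id, Counter()))


def sync(write_store: WriteStore, read_store: ReadStore, offset: int) -> int:
    """Fold log entries from `offset` onward into the read store.

    Meant to run periodically; returns the offset to resume from next time.
    Reports are therefore stale by at most one sync interval.
    """
    end = len(write_store.log)
    for website_id, page_url in write_store.log[offset:end]:
        read_store.totals.setdefault(website_id, Counter())[page_url] += 1
    return end
```

The sync interval is the knob that trades report freshness against load on the read database, which matches the assumption that reads can allow some delay.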
