xuyan.nus: April 2014

Wednesday, 30 April 2014

Hash Algorithms

Hash (Design Skills) should be considered firstly in case of Insert and Query.

Instead of scanning to find a element, in Hash, a Hash function is designed to map the key directly to an integer (or an address to find the element), so that the cost turns to be O(1).

The three key features:
1) Hash Function
2) Collision Process
3) Dynamic Internal Array Size

Basic:
1) Hash Function (Based on Uniform Distribution)
    1.1: distribute the keys uniformly into the slots of the table
    1.2: Example: h(k) = k mod m.
           Pick m to be a prime not too close to a power of 2 or 10.
   Purpose: keys are likely even dirtributed.
   Example m: 77813

    1.3: Example string hash function:

        int address = key[0] + 31 * key[1] + 137 * key[2] + 1571 * key[3] + 
              11047 * key[4] + 77813 * key[5];
        return address % kNumBuckets;

2) Collision Process
    2.1: Open Address:
          Key Idea:
          Two parameters (key and index)
          Problem:
          Deletion
Hash Load factor:
Given an open-addressed hash table with load factor α = n/m < 1, the
expected number of probes in an unsuccessful search is at most 1/(1–α).

    2.2: Linked List

3) Dynamic Array Size
if the internal array (s) is too small, you will have to do n/s operations for a lookup, which is still O(n). So you need to dynamically expand that internal array as the number of items in the hash table grows in order to keep the O(1) performance.

Implementation:
1: C++ std::map:
Very wired, the underlying data structure is a binary search tree. (a red-black tree).
WHY? ordered iterator. Do we need that? In most cases, no!!!

2: unordered_map:
The underlying data structure is a Hash Function.
unordered_map will be faster on insert and delete than a map.

Tuesday, 29 April 2014

Bash Shell Basic

Bash Shell is universally used in Linux Programming, especially in Testing and Evaluation.

Basically, Bash Shell can be used in 3 different ways:
1) interactive shell
2) shell scripting
3) interactive session

Key concepts in Shell
1) Pipeline A | B : the output of A turns to be the input of B.
2) Redirection > and >>: Use ">" to overwrite any existing file, use ">>" to append to any existing file. In both cases, if no file exists, one is created.
3) command processor: #!/bin/bash
It is the path of the executable bash.
4) execute mode: chmod 755 ***.sh
5) Branching: if []/then-else-fi
6) Loops and Repetition: for [] do XXX done; while [] do XXX done;
7) Value $ : get the value of a variable.
8) read & select
9) string operations
10) array operations
11) function with parameters
function test {
$1
exit;
}
12) critical commands:
file, find, grep, pwd, sort, whereis, date, ls, echo,

Thursday, 24 April 2014

Sorting Algorithms

It is only for reviewing, not for learning.
-----------------------------------------------
-----------------------------------------------
Algorithm 0: Sorting Lower Bounds
-----------------------------------------------
-----------------------------------------------

The hight of the tree is log(n^2) = n * log(n)

-----------------------------------------------
-----------------------------------------------
Algorithm 1: Insertion Sort
-----------------------------------------------
-----------------------------------------------
For the i th value, swap it with its previous until the previous is smaller than it.
* Best: O(n)
* Average/Worst: O(n^2)
* Space: O(1)
* Stable and Online
* Idea:

-----------------------------------------------
-----------------------------------------------
Algorithm 2: Merge Sort
-----------------------------------------------
-----------------------------------------------
1: Divide and Conquer
2: Recursive Algorithm
3: Bottom-Up Sorting
4: Best/Average/Worst: O(nlog(n))
5: Space: O(n) for merging
6: Stable
7: can use for external sorting
Idea:

And the key is in merge:

-----------------------------------------------
-----------------------------------------------
Algorithm 3: Quick Sort (Straightforward Version)
-----------------------------------------------
-----------------------------------------------
1: Divide and Conquer
2: Recursive Algorithm
3: Top-Down Sorting
4: Best/Average: O(nlog(n))
5: Worst: O(n^2) (sort a reverse ordered list)
6: Space: O(log n)
7: Not Stable
Idea:

Pseudo-Code

-----------------------------------------------
-----------------------------------------------
Algorithm 5: Heap Sort
-----------------------------------------------
-----------------------------------------------
Idea:
1: Build a Maximum Heap;
2: Repeatedly Pick the Max Value in the Heap and Rebuild the left Maximum heap
3:Also useful for pick the Top k values.
4: Best/Average/Worst: O(nlog(n))
5:The amazing of Heap Tree is that it is always balanced.
6: Space: O(1)
7: Not Stable

father: N (root with index = 0)
then left child: N*2+1 and Right child: N*2+2;

-----------------------------------------------
-----------------------------------------------
Algorithm 6: Counting Sort
-----------------------------------------------
-----------------------------------------------
1: Comparison is not always necessary.
2: Time Complexity: O(n+k)
3: space: O(n+k)
4: Stable
5: Pseudo-Code

-----------------------------------------------
-----------------------------------------------
Algorithm 7: Radix Sort
-----------------------------------------------
-----------------------------------------------
1: Sort on the least-significant digit first with the Counting Sort.
2: Data Access is not local, thus cache performance is bad, compared to quick sort.
3: Time Complexity: O(n+k)

Shortest Path Algorithm

It is only for reviewing, not for learning.
-----------------------------------------------
-----------------------------------------------
Algorithm 1: DIJKSTRA(G, w, s)
Greedy Algorithm
1: Overlapping subproblems
2: Optimal substructure
3: Greedy-choice property
-----------------------------------------------
-----------------------------------------------

----------------------------------------------
Cost:

---------------------------------------------
Notes:
1) DIJKSTRA is a type of BFS (Breadth-First-Search).
2) If each edge has the same cost, the algorithm can be improved by using FIFO queue instead of heap and get O(V+E).
Nature:
Smaller weights edge always pushed in the queue earlier. Or
v comes after u in Q implies that d[v] = d[u] or d[v] = d[u] + 1.

-----------------------------------------------
-----------------------------------------------
Algorithm 2: DFS
-----------------------------------------------
----------------------------------------------- Recursive Algorithm

DFS(Node) {
   Color(Node);
   Loop (n in Adjacent Nodes) {
           DFS(n);
     }
}

Non-Recursive Algorithm (Using LIFO Stack)

list nodes_to_visit = {root};
while( nodes_to_visit isn't empty ) {
  currentnode = nodes_to_visit.first();  // nodes_to_visit is a LIFO stack
  nodes_to_visit.prepend( currentnode.children );
  //do something
}

-----------------------------------------------
-----------------------------------------------
Algorithm 3: BFS
-----------------------------------------------
-----------------------------------------------
Non-Recursive Algorithm (Using FIFO Queue)

list nodes_to_visit = {root};
while( nodes_to_visit isn't empty ) {
  currentnode = nodes_to_visit.first();  // nodes_to_visit is a FIFO queue
  nodes_to_visit.prepend( currentnode.children );
  //do something
}

Recursive Algorithm // surprisingly, it is not that easy as DFS.

BFS(Queue) {
   if(Queue is empty) return;
     Queue.pop();
   //do something
     Queue.push(Children);
   BFS(Queue);
}

-----------------------------------------------
-----------------------------------------------
Algorithm 4: A* Algorithm
-----------------------------------------------
-----------------------------------------------
1) DIJKSTRA uses local optimal in the greedy algorithm. However, local optimal is not a good enough choice in many many cases.
2) A* Algorithm uses estimated global optimal in the greedy algorithm.
How to estimate? That's the MAGIC.

It uses a knowledge-plus-heuristic cost function of node x (usually denoted f(x)) to determine the order in which the search visits nodes in the tree.

The h(x) part of the f(x) function must be an admissible heuristic; that is, it must not overestimate the distance to the goal.It is a very string requirement, since the true cost is unknown.

But it does exist in many cases. And, h(x) part of the f(x) function must be very near to the distance to the goal. Otherwise, it does not help much on the algorithm efficiency.

An example of an A star (A*) algorithm in action where nodes are cities connected with roads and h(x) is the straight-line distance to target point.

The contribution of A* Algorithm is that:
A* Algorithm gives a different type of algorithm design methodology, which can be used to make searching more efficient, if localized searching is much worse than optimal.

The time complexity of A* depends on the heuristic. The big-O time complexity may be the same with DIJKSTRA. But the number of checking nodes can be much smaller.

DIJKSTRA Algorithm Search Space

A* Algorithm Search Space

Saturday, 19 April 2014

Financial Industry Categories

1. Market

New York Stock Exchange

an auction market and uses specialists to trade securities.
specialists: keeping the market in equilibrium, they are required to execute all customer orders ahead of their own
SuperDOT system is an electronic system used to place orders for stocks on the NYSE.

NASDAQ (National Association of Security Dealers Automatic Quote System)

an OTC market where trading is facilitated through market makers.
Market makers: ensure there is a buyer for every sell order and a seller for every buy order at any time. The major risk is the time lapse between the two transactions; the faster he or she can make the spread the more money the market maker has the potential to make.
ECNs network major brokerages and traders, so that they can trade between themselves without involving a middleman.
SOES is an automatic order execution for individual traders with orders less than or equal to 1,000 shares.
There are three levels on the Nasdaq that vary on the amount of information and access they provide to investors. (Not Fair)

American Stock Exchange (AMEX)
Chicago Board of Trade (CBOT)
Commodity Exchange of New York

2. Regulation and control

Federal Reserve
Securities and Exchange Commission (SEC)
Federal Deposit Insurance Corporation
Commodity Futures Trading Commission
Department of the Treasury

3. Market Player and Roles

Companies & Investors
Hedge Fund Sector

investment advisers, investment managers, fund administrators, buy-side analysts, fund custodians, legal advisers, auditors, registrars, research analysts, transfer agents

Mutual Fund Sector

investment advisers, investment managers, fund administrators, buy-side analysts, fund custodians, legal advisers, auditors, registrars, research analysts, transfer agents

Private-Equity Sector

financial analysts and associates, portfolio managers, research analysts, accountants, and lawyers

Venture Capital Sector

general partners, junior partners, financial analysts, financial associates, research analysts, and accountants. Many funds have only partners and support staff

Banking Industry Categories

Categories of Bank
1. Commercial Bank
1.1 Retail bank (consumer banking or personal banking)
Customer: Individual, make them rich.

Checking and savings accounts
Mortgages
Credit cards
Automobile financing
Foreign currency and remittance services
Insurance
Private banking
Wealth management
Trust service
Payment services

1.2 Corporate bank (business banking)
Customer: company, make them rich.

Loans
cash management services
Equipment lending
Commercial real estate
Employer services
Trust service
Payment services

2. Investment Bank

Traditional investment banking

initial public offerings (IPOs), secondary offerings, brokerage, and mergers and acquisitions.

Asset management

(insurance companies, pension funds, corporations etc.) or private investors (both directly via investment contracts and more commonly via collective investment schemes e.g., mutual funds)

Trading and principal investments

"dealer" transactions (MAJOR PROFIT)
"broker" transactions

CS Industry Categories

Introduction

Google, Yahoo, Microsoft, Twitter, Amazon, Alibaba ...
When I started to look for a job in IT industry, these are the companies I am targeting for. But what is the difference between them? What do I really want to go for my career? There are some Terminologies, which are very basic but very important.

1. Information Technology
Information Technology is basically everything related with computers. It includes three broad categories: hardware, software, and the Internet.

Hardware refers to the physical equipment of a computer, such as motherboards, memory chips, microprocessors, router, mobile phone and tablet.

Software includes the programs that tell the hardware exactly what to do and how to do it, such as OS, compiler, library, applications and games.

The Internet is composed of numerous global networks that are connected to each other, such as social network, chatting, searching, blog and WWW.

2. Computer Hardware Structure
Products:

PC, tablet and cell phones
semiconductor
navigational, measuring, electromedical
peripheral manufacturing, such as USB flash drives, wireless Internet cards, speakers, scanners, web cameras

Roles:

Computer scientists
design engineers
system analysts
computer production engineers
Quality assurance engineers
Computer sales representatives
Technical writers
Computer support specialists

*When I am looking for the job, my skill set is not in Hardware.

3. Computer Software Structure
It includes three categories:

corporate information services (IS) departments.

Roles: Programmers, Database/data warehouse administrators, Systems analysts, Directors and managers , Unix administrators, Network engineers, operators, Chief information officers

software vendors.

software architects or software engineers, UI design analysts, testers, and quality-assurance technicians, The sales staff, project managers and trainers,Directors of product planning, Vice presidents or directors of development

consultants.

software professionals, project managers, consultants

Cloud Platform and Mobile Platform are big recent changes, which may move a lot of jobs from

IT Departments in various industries, to consultants companies.

4. The Internet
the Internet is one of the most dynamic and evolving sectors of the U.S., and the world. So, there is no clear definition of the internet industry. But, the internet includes:

Internet service providers (ISP)
Search engine
E-commerce
Internet security
Online gaming
social network and chatting
Online video
Online advertising
Cloud computing
Mobile networking

Besides, there are a larger number of jobs and opportunities that are not listed in this paper, like Jobs in Research Centres, Governments, etc.

Remember:
The nature of Computer is to fire people who does not work professionally, especially when AI-robots are created.