Wednesday 30 April 2014

Hash Algorithms

Hash (Design Skills) should be considered firstly in case of Insert and Query.

Instead of scanning to find a element, in Hash, a Hash function is designed to map the key directly to an integer (or an address to find the element), so that the cost turns to be O(1).

The three key features:
1) Hash Function
2) Collision Process
3) Dynamic Internal Array Size

Basic:
1) Hash Function (Based on Uniform Distribution)
    1.1: distribute the keys uniformly into the slots of the table
    1.2: Example: h(k) = k mod m.
           Pick m to be a prime not too close to a power of 2 or 10.
           Purpose:  keys are likely even dirtributed.
           Example m: 77813

    1.3: Example string hash function:
          
        int address = key[0] + 31 * key[1] + 137 * key[2] + 1571 * key[3] + 
              11047 * key[4] + 77813 * key[5];
        return address % kNumBuckets;
 
2) Collision Process
    2.1: Open Address:
          Key Idea:
          Two parameters (key and index)
          Problem:
          Deletion
          Hash Load factor:
          Given an open-addressed hash table with load factor α = n/m < 1, the
expected number of probes in an unsuccessful search is at most 1/(1–α).

    2.2: Linked List

3) Dynamic Array Size
if the internal array (s) is too small, you will have to do n/s operations for a lookup, which is still O(n). So you need to dynamically expand that internal array as the number of items in the hash table grows in order to keep the O(1) performance.

Implementation:
1: C++ std::map:
Very wired, the underlying data structure is a binary search tree. (a red-black tree).
WHY? ordered iterator. Do we need that? In most cases, no!!!

2: unordered_map:
The underlying data structure is a Hash Function.
unordered_map will be faster on insert and delete than a map.


         

Tuesday 29 April 2014

Bash Shell Basic

Bash Shell is universally used in Linux Programming, especially in Testing and Evaluation.

Basically, Bash Shell can be used in 3 different ways:
1) interactive shell
2) shell scripting
3) interactive session

Key concepts in Shell
1) Pipeline A | B : the output of A turns to be the input of B.
2) Redirection > and >>: Use ">" to overwrite any existing file, use ">>" to append to any existing file. In both cases, if no file exists, one is created.
3) command processor: #!/bin/bash
   It is the path of the executable bash. 
4) execute mode: chmod 755 ***.sh
5) Branching: if []/then-else-fi
6) Loops and Repetition:  for [] do XXX done; while [] do XXX done;
7) Value $ : get the value of a variable.
8) read & select
9) string operations
10) array operations
11) function with parameters
    function test {
      $1
      exit;
    }
12) critical commands:
   file, find, grep, pwd, sort, whereis, date, ls, echo, 

Thursday 24 April 2014

Sorting Algorithms

It is only for reviewing, not for learning.
-----------------------------------------------
-----------------------------------------------
Algorithm 0: Sorting Lower Bounds
-----------------------------------------------
-----------------------------------------------














The hight of the tree is log(n^2) = n * log(n)

-----------------------------------------------
-----------------------------------------------
Algorithm 1: Insertion Sort
-----------------------------------------------
-----------------------------------------------
For the i th value, swap it with its previous until the previous is smaller than it.
* Best: O(n)
* Average/Worst: O(n^2)
* Space: O(1)
* Stable and Online
* Idea:












-----------------------------------------------
-----------------------------------------------
Algorithm 2: Merge Sort
-----------------------------------------------
-----------------------------------------------
1: Divide and Conquer
2: Recursive Algorithm
3: Bottom-Up Sorting
4: Best/Average/Worst: O(nlog(n))
5: Space: O(n) for merging
6: Stable
7: can use for external sorting
Idea:
 










And the key is in merge:







-----------------------------------------------
-----------------------------------------------
Algorithm 3: Quick Sort (Straightforward Version)
-----------------------------------------------
-----------------------------------------------
1: Divide and Conquer
2: Recursive Algorithm
3: Top-Down Sorting
4: Best/Average: O(nlog(n))
5: Worst: O(n^2) (sort a reverse ordered list)
6: Space: O(log n)
7: Not Stable
Idea:











Pseudo-Code












-----------------------------------------------
-----------------------------------------------
Algorithm 5: Heap Sort
-----------------------------------------------
-----------------------------------------------
Idea:
1: Build a Maximum Heap;
2: Repeatedly Pick the Max Value in the Heap and Rebuild the left Maximum heap
3:Also useful for pick the Top k values.
4: Best/Average/Worst: O(nlog(n))
5:The amazing of Heap Tree is that it is always balanced.
6: Space: O(1)
7: Not Stable

father: N (root with index = 0)
then left child: N*2+1 and Right child: N*2+2;

-----------------------------------------------
-----------------------------------------------
Algorithm 6: Counting Sort
-----------------------------------------------
-----------------------------------------------
1: Comparison is not always necessary.
2: Time Complexity: O(n+k)
3: space: O(n+k)
4: Stable
5: Pseudo-Code












-----------------------------------------------
-----------------------------------------------
Algorithm 7: Radix Sort
-----------------------------------------------
-----------------------------------------------
1: Sort on the least-significant digit first with the Counting Sort.
2: Data Access is not local, thus cache performance is bad, compared to quick sort.
3: Time Complexity: O(n+k)




Shortest Path Algorithm

It is only for reviewing, not for learning.
-----------------------------------------------
-----------------------------------------------
Algorithm 1: DIJKSTRA(G, w, s)
Greedy Algorithm
1: Overlapping subproblems
2: Optimal substructure
3: Greedy-choice property
-----------------------------------------------
-----------------------------------------------















----------------------------------------------
Cost:














--------------------------------------------- 
Notes:
1) DIJKSTRA is a type of BFS (Breadth-First-Search).
2) If each edge has the same cost, the algorithm can be improved by using FIFO queue instead of heap and get O(V+E).
Nature:
Smaller weights edge always pushed in the queue earlier. Or
v comes after u in Q implies that d[v] = d[u] or d[v] = d[u] + 1.



-----------------------------------------------
-----------------------------------------------
Algorithm 2: DFS
-----------------------------------------------
----------------------------------------------- Recursive Algorithm

DFS(Node) {
     Color(Node);
     Loop (n in Adjacent Nodes) {
           DFS(n);
     }
}

Non-Recursive Algorithm (Using LIFO Stack)

list nodes_to_visit = {root};
while( nodes_to_visit isn't empty ) {
  currentnode = nodes_to_visit.first();  // nodes_to_visit is a LIFO stack
  nodes_to_visit.prepend( currentnode.children );
  //do something
}


-----------------------------------------------
-----------------------------------------------
Algorithm 3: BFS
-----------------------------------------------
-----------------------------------------------
Non-Recursive Algorithm (Using FIFO Queue)

list nodes_to_visit = {root};
while( nodes_to_visit isn't empty ) {
  currentnode = nodes_to_visit.first();  // nodes_to_visit is a FIFO queue
  nodes_to_visit.prepend( currentnode.children );
  //do something
}
 
Recursive Algorithm // surprisingly, it is not that easy as DFS.

BFS(Queue) {
     if(Queue is empty) return;
     Queue.pop();
     //do something
     Queue.push(Children);
     BFS(Queue);
}
 

-----------------------------------------------
-----------------------------------------------
Algorithm 4: A* Algorithm
-----------------------------------------------
-----------------------------------------------
1) DIJKSTRA uses local optimal in the greedy algorithm. However, local optimal is not a good enough choice in many many cases.
2) A* Algorithm uses estimated global optimal in the greedy algorithm.
How to estimate? That's the MAGIC.

It uses a knowledge-plus-heuristic cost function of node x (usually denoted f(x)) to determine the order in which the search visits nodes in the tree.

The h(x) part of the f(x) function must be an admissible heuristic; that is, it must not overestimate the distance to the goal.It is a very string requirement, since the true cost is unknown.

But it does exist in many cases. And, h(x) part of the f(x) function must be very near to the distance to the goal. Otherwise, it does not help much on the algorithm efficiency.

An example of an A star (A*) algorithm in action where nodes are cities connected with roads and h(x) is the straight-line distance to target point.

The contribution of A* Algorithm is that:
A* Algorithm gives a different type of algorithm design methodology, which can be used to make searching more efficient, if localized searching is much worse than optimal.

The time complexity of A* depends on the heuristic. The big-O time complexity may be the same with DIJKSTRA. But the number of checking nodes can be much smaller.

                                      DIJKSTRA Algorithm Search Space

                                                 A* Algorithm Search Space

Saturday 19 April 2014

Financial Industry Categories

1. Market

  • New York Stock Exchange
    • an auction market and uses specialists to trade securities.
    • specialists: keeping the market in equilibrium, they are required to execute all customer orders ahead of their own
    • SuperDOT system is an electronic system used to place orders for stocks on the NYSE.  

    • NASDAQ (National Association of Security Dealers Automatic Quote System)
      • an OTC market where trading is facilitated through market makers.
      • Market makers:  ensure there is a buyer for every sell order and a seller for every buy order at any time. The major risk is the time lapse between the two transactions; the faster he or she can make the spread the more money the market maker has the potential to make. 
      • ECNs network major brokerages and traders, so that they can trade between themselves without involving a middleman.
      • SOES is an automatic order execution for individual traders with orders less than or equal to 1,000 shares.
      • There are three levels on the Nasdaq that vary on the amount of information and access they provide to investors. (Not Fair)
    • American Stock Exchange (AMEX)
    • Chicago Board of Trade (CBOT) 
    • Commodity Exchange of New York

    2. Regulation and control

    • Federal Reserve
    • Securities and Exchange Commission (SEC) 
    • Federal Deposit Insurance Corporation
    • Commodity Futures Trading Commission
    • Department of the Treasury


    3. Market Player and Roles

    • Companies & Investors
    • Hedge Fund Sector
      • investment advisers, investment managers, fund administrators, buy-side analysts, fund custodians, legal advisers, auditors, registrars, research analysts, transfer agents
    • Mutual Fund Sector
      • investment advisers, investment managers, fund administrators, buy-side analysts, fund custodians, legal advisers, auditors, registrars, research analysts, transfer agents
    • Private-Equity Sector
      • financial analysts and associates, portfolio managers, research analysts, accountants, and lawyers
    • Venture Capital Sector
      • general partners, junior partners, financial analysts, financial associates, research analysts, and accountants. Many funds have only partners and support staff

    Banking Industry Categories

    Categories of Bank
    1. Commercial Bank
    1.1 Retail bank (consumer banking or personal banking)
        Customer: Individual, make them rich.

    • Checking and savings accounts
    • Mortgages
    • Credit cards
    • Automobile financing
    • Foreign currency and remittance services
    • Insurance
    • Private banking 
    • Wealth management
    • Trust service
    • Payment services
    1.2 Corporate bank (business banking)
         Customer: company, make them rich.
    • Loans 
    • cash management services
    • Equipment lending
    • Commercial real estate
    • Employer services
    • Trust service
    • Payment services
    2. Investment Bank

    • Traditional investment banking 
    • Asset management
      • (insurance companies, pension funds, corporations etc.) or private investors (both directly via investment contracts and more commonly via collective investment schemes e.g., mutual funds)
    • Trading and principal investments
      • "dealer" transactions (MAJOR PROFIT)
      • "broker" transactions

    CS Industry Categories

    Introduction

    Google, Yahoo, Microsoft, Twitter, Amazon, Alibaba ...

    When I started to look for a job in IT industry, these are the companies I am targeting for. But what is the difference between them? What do I really want to go for my career? There are some Terminologies, which are very basic but very important.

    1. Information Technology

    Information Technology is basically everything related with computers. It includes three broad categories: hardware, software, and the Internet. 
    • Hardware refers to the physical equipment of a computer, such as motherboards, memory chips, microprocessors, routermobile phone and tablet.
    • Software includes the programs that tell the hardware exactly what to do and how to do it, such as OS, compiler, library, applications and games. 
    • The Internet is composed of numerous global networks that are connected to each other, such as social network, chatting, searching, blog and WWW.
    2. Computer Hardware Structure
    Products: 
    • PC, tablet and cell phones
    • semiconductor 
    • navigational, measuring, electromedical
    • peripheral manufacturing, such as USB flash drives, wireless Internet cards, speakers, scanners, web cameras
    Roles:
    • Computer scientists
    • design engineers
    • system analysts
    • computer production engineers
    • Quality assurance engineers
    • Computer sales representatives
    • Technical writers
    • Computer support specialists

    *When I am looking for the job, my skill set is not in Hardware.


    3. Computer Software Structure

    It includes three categories:
    • corporate information services (IS) departments.
      • Roles: Programmers, Database/data warehouse administrators, Systems analysts, Directors and managers , Unix administrators, Network engineers, operators, Chief information officers
    • software vendors.
      • software architects or software engineers, UI design analyststesters, and quality-assurance technicians, The sales staff, project managers and trainers,Directors of product planning, Vice presidents or directors of development
    • consultants.
      • software professionals, project managers, consultants

    Cloud Platform and Mobile Platform are big recent changes, which may move a lot of jobs from 

    IT Departments in various industries, to consultants companies.



    4. The Internet 
    the Internet is one of the most dynamic and evolving sectors of the U.S., and the world. So, there is no clear definition of the internet industry. But, the internet includes:
    • Internet service providers (ISP)
    • Search engine 
    • E-commerce
    • Internet security 
    • Online gaming 
    • social network and chatting
    • Online video
    • Online advertising
    • Cloud computing
    • Mobile networking
    Besides, there are a larger number of jobs and opportunities that are not listed in this paper, like Jobs in Research Centres, Governments, etc. 
      
    Remember: 
    The nature of Computer is to fire people who does not work professionally, especially when AI-robots are created.