Saturday 24 May 2014

Data In Memory (C++) [Contains Chinese]

I found a better source:
http://help.topcoder.com/data-science/competing-in-algorithm-challenges/algorithm-tutorials/representation-of-integers-and-reals/

1. Int (32 bits)
Range: [-2^31, 2^31 - 1]
Positive: [0] [XXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX]
Negative: stored as two's complement.
Given: -10
1) abs(-10) = 10 -> XXX 00001010 (X: 0)
2) invert all bits (one's complement): -> XXX 11110101 (X: 1)
3) add 1: -> XXX 11110110 (X: 1)

2. Unsigned Int
There is no sign bit, so all 32 bits hold the magnitude: range [0, 2^32 - 1].

3. short int (16 bits)
Range: -32768 ~ 32767
-32768: [1] [0000000 00000000]
32767:  [0] [1111111 11111111]

4: float
In binary scientific notation, a value S = M * 2^N has three parts: a sign bit, an exponent (N), and a mantissa (M).
A float is 32 bits: 1 sign bit, 8 exponent bits, 23 mantissa bits.
A double is 64 bits: 1 sign bit, 11 exponent bits, 52 mantissa bits.
-------------------------------------
float:   sign bit 31,  exponent bits 30-23,  mantissa bits 22-0
double:  sign bit 63,  exponent bits 62-52,  mantissa bits 51-0
------------------------------------------
Sign bit: 0 means positive, 1 means negative.
Exponent: stored with a bias. For float the bias is 127 (8 bits can express exponents from -128 to 127); for double the bias is 1023 (range -1024 to 1023).
Mantissa: the significant digits, i.e. the binary digits after the radix point. Because the integer part of M is always 1 by convention, that leading 1 is not stored.
------------
for example:
Converting the float value 125.5 to the standard floating-point format:
125 in binary is 1111101 and the fractional part .5 is 1, so 125.5 in binary is 1111101.1. Since the integer part of the mantissa must be 1, this is written as 1.1111011 * 2^6. The exponent is 6; adding the bias 127 gives 133 (the bias avoids having to store negative exponents), which is 10000101 in binary. For the mantissa, the leading 1 is dropped, leaving 1111011, which is padded with zeros to 23 bits: 11110110000000000000000.
So the full binary representation is
0 10000101 11110110000000000000000, which is laid out in memory (little-endian) as:
00000000   low address
00000000
11111011
01000010   high address
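
A small C++ sketch to verify this byte layout (it assumes a little-endian machine such as x86; the expected values are for 125.5f):
----------------------
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    float f = 125.5f;

    // Reinterpret the 32 bits of the float as an unsigned integer.
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    std::printf("bit pattern: 0x%08X\n", (unsigned)bits);   // expected: 0x42FB0000

    // Print the bytes from low address to high address.
    unsigned char bytes[sizeof(f)];
    std::memcpy(bytes, &f, sizeof(f));
    for (size_t i = 0; i < sizeof(f); ++i)
        std::printf("byte %zu: 0x%02X\n", i, bytes[i]);     // 00 00 FB 42 on little-endian
    return 0;
}
----------------------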

Tuesday 20 May 2014

Algorithm Mistakes I made

  1. Built a BST with parent pointers, but forgot to initialize the parent pointers.
  2. Confused pointers and references, leading to wrong assignments.
    (A reference is better for passing arguments into functions; a pointer is better as a stored variable.)
  3. Private Access Attributes in Class.
  4. Forgot to design reusable functions instead of copying code.
  5. Forgot to initialize all parameters, especially pointers.
  6. Check whether a pointer is NULL before dereferencing it.
  7. Sorting:
    1. Merge Sort: stop_condition if(end-start<=0) return; [not 1]
    2. Quick Sort: Partition Function has return value, the middle point.
    3. Heap Sort: be careful on the boundary: < or <=.
  8. Class & Struct ending with semicolon.
    1. struct {}; 
    2. class {};
  9. Be careful when copying code.
  10. A pointer is passed into a function by value, so reassigning it inside the callee does not move the caller's pointer (see the sketch after this list).
    1. example function:
    2. movePointers(Pointer* p); reassigning p inside will not affect the original pointer!!
  11. Did not test the idea before implementation.
  12. Strictly follow the algorithm flow chart.
  13. Be careful with recursive algorithms; naive recursion tends to be exponential and slow. Analyze the performance cost first.
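
A minimal sketch of mistake 10, using a hypothetical Node type. The first function receives a copy of the pointer; the second takes the pointer by reference, so the caller sees the change:
----------------------
#include <cstdio>

struct Node { Node* next = nullptr; };

// Receives a copy of the pointer: reassigning p does NOT affect the caller.
void moveByValue(Node* p)  { p = p->next; }

// Receives a reference to the pointer: the caller's pointer is updated.
void moveByRef(Node*& p)   { p = p->next; }

int main() {
    Node b, a;
    a.next = &b;

    Node* cur = &a;
    moveByValue(cur);
    std::printf("after moveByValue: %s\n", cur == &a ? "still at a" : "moved");

    moveByRef(cur);
    std::printf("after moveByRef:   %s\n", cur == &b ? "now at b" : "unchanged");
    return 0;
}
----------------------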

Monday 19 May 2014

C++ Data Structures I: what does the memory look like?

1: Vector
Like arrays, vectors use contiguous storage locations for their elements, but unlike arrays, their size can change dynamically, with their storage being handled automatically by the container. A vector may need to be reallocated in order to grow when new elements are inserted, which implies allocating a new array and moving all elements to it. This is a relatively expensive task in terms of processing time, but in any case reallocations should only happen at logarithmically growing intervals of size, so that the insertion of individual elements at the end of the vector can be provided with amortized constant time complexity (see push_back).

Functions: reserve(), push_back(), capacity(), empty(), operator[], insert()
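
A small sketch of how the capacity grows with push_back (the exact growth factor is implementation-defined):
----------------------
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(4);  // pre-allocate to avoid the first few reallocations

    for (int i = 0; i < 20; ++i) {
        v.push_back(i);
        // capacity() grows geometrically, so push_back is amortized O(1).
        std::printf("size=%zu capacity=%zu\n", v.size(), v.capacity());
    }
    return 0;
}
----------------------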

2: List
Lists are sequence containers that allow constant time insert and erase operations anywhere within the sequence, and iteration in both directions. List containers are implemented as doubly-linked lists.
The main design difference between a forward_list container and a list container is that the former keeps internally only a link to the next element, while the latter keeps two links per element: one pointing to the next element and one to the preceding one, allowing efficient iteration in both directions, but consuming additional storage per element and with slightly higher time overhead when inserting and removing elements.

Functions: push_front, push_back, pop_back(), insert(), back(), front();

3: Deque
double-ended queue. Double-ended queues are sequence containers with dynamic sizes that can be expanded or contracted on both ends (either its front or its back).
Both vectors and deques provide a very similar interface and can be used for similar purposes, but internally both work in quite different ways: While vectors use a single array that needs to be occasionally reallocated for growth, the elements of a deque can be scattered in different chunks of storage.

4: Set
Sets are containers that store unique elements following a specific order. The value of the elements in a set cannot be modified once in the container.
Sets are typically implemented as binary search trees.
Hash: unordered_set

5: Multisets
Multisets are containers that store elements following a specific order, and where multiple elements can have equivalent values.
Multisets are typically implemented as binary search trees.
Hash: unordered_multiset

6: Map
Maps are associative containers that store elements formed by a combination of a key value and a mapped value, following a specific order.
The mapped values in a map can be accessed directly by their corresponding key using the bracket operator (operator[]).
Maps are typically implemented as binary search trees.
Function: insert ( std::pair<char,int>('a',100) ); find(), erase();
Hash: unordered_map
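
A short sketch of the std::map functions listed above:
----------------------
#include <cstdio>
#include <map>

int main() {
    std::map<char, int> m;
    m.insert(std::pair<char, int>('a', 100));
    m['b'] = 200;                        // operator[] inserts or overwrites

    std::map<char, int>::iterator it = m.find('a');
    if (it != m.end())
        std::printf("a -> %d\n", it->second);

    m.erase('b');                        // erase by key
    std::printf("size = %zu\n", m.size());
    return 0;
}
----------------------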

7: Multimap
Multimaps are associative containers that store elements formed by a combination of a key value and a mapped value, following a specific order, and where multiple elements can have equivalent keys.
Hash: unordered_multimap

8: Stack
Stacks are a type of container adaptor, specifically designed to operate in a LIFO context (last-in first-out), where elements are inserted and extracted only from one end of the container.
By default, if no container class is specified for a particular stack class instantiation, the standard container deque is used.

Functions: push(), top(), pop()

Sunday 11 May 2014

Theory of Computing

First, compare how quickly the common complexity classes grow.

(source: http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=importance_of_algorithms)

P Problem:
A decision problem that can be solved in polynomial time. That is, given an instance of the problem, the answer yes or no can be decided in polynomial time.

NP Problem:
A decision problem where instances of the problem for which the answer is yes have proofs that can be verified in polynomial time. This means that if someone gives us an instance of the problem and a certificate (sometimes called a witness) to the answer being yes, we can check that it is correct in polynomial time.

Whether P = NP (i.e., whether exhaustive search is ever truly necessary) is unknown.

NP-complete:
An NP problem X for which it is possible to reduce any other NP problem Y to X in polynomial time. Intuitively this means that we can solve Y quickly if we know how to solve X quickly. Precisely, Y is reducible to X if there is a polynomial time algorithm f to transform instances y of Y to instances x = f(y) of X in polynomial time with the property that the answer to y is yes if and only if the answer to f(y) is yes.

NP-hard: Intuitively these are the problems that are even harder than the NP-complete problems. Note that NP-hard problems do not have to be in NP (they do not have to be decision problems). The precise definition here is that a problem X is NP-hard if there is an NP-complete problem Y such that Y is reducible to X in polynomial time.

Wednesday 7 May 2014

How to do System Design

Step 1: System Clarify
(UDF)
User: who is the user? how does the user use the system? frequency? expected results? malicious?
Data: what data content? what format? size? redundancy?
Function: Insert? Delete? Query? Scale? Major limitations?

Step 2: Concept Design
Components (or Modules)
Entities (or Classes)
Interactions (or Interfaces)

Step 3: Design
Data Structures
Algorithms
Database Schema (Key and Index)
Class Diagram

Step 4: Analysis
Complexity Analysis (Time & Space)
Scalability Analysis
Memory Limitation
Flexibility or Extensions

Monday 5 May 2014

Graph Algorithm Basic

1. Graph Representation in Memory

  1. Adjacency matrix:
  2. Sorted Edge List (minor): every edge is stored as a tuple (vi, vj) in a sorted list.
  3. Adjacency List: node array + edge list;
  4. Objects and Pointers: (similar to (2), but not sorted)

1) The optimal representation depends on the type of graph. For instance, a matrix wastes a lot of memory for a sparse graph, but it has optimal edge-lookup time for any type of graph.
2) Objects and pointers: very easy to insert and delete.
3) A matrix cell only needs a 0 or 1, whereas an adjacency list has to store node indices as integers.
4) With a matrix, you have to scan a whole row of N entries to enumerate a node's adjacent nodes.

Graph Traversal:
DFS: fast to find one solution, but may need to scan the whole search space.

  1. explicit stack-based method: more complex to write, but uses heap memory (safer)
  2. implicit recursion: simpler to code, but beware of stack overflow

BFS: does not need to search the whole space; especially suited to questions like "find the shortest path to ..."
  1. explicit queue-based method: more complex to write, but uses heap memory (safer)
Notice the similarity to depth-first search: only the data structure differs, and we mark a vertex visited as we push it into the queue, not as we pop it.
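
A minimal BFS sketch over an adjacency-list graph, computing shortest path lengths (in edges) from node 0; the visited mark is set when a vertex is pushed, exactly as described above:
----------------------
#include <cstdio>
#include <queue>
#include <vector>

int main() {
    // Adjacency list of a small undirected graph with 5 nodes.
    std::vector<std::vector<int> > adj(5);
    adj[0].push_back(1); adj[0].push_back(2);
    adj[1].push_back(0); adj[1].push_back(3);
    adj[2].push_back(0); adj[2].push_back(3);
    adj[3].push_back(1); adj[3].push_back(2); adj[3].push_back(4);
    adj[4].push_back(3);

    std::vector<int> dist(adj.size(), -1);   // -1 means "not visited yet"
    std::queue<int> q;

    dist[0] = 0;            // mark visited when pushing, not when popping
    q.push(0);

    while (!q.empty()) {
        int u = q.front(); q.pop();
        for (size_t i = 0; i < adj[u].size(); ++i) {
            int v = adj[u][i];
            if (dist[v] == -1) {
                dist[v] = dist[u] + 1;
                q.push(v);
            }
        }
    }

    for (size_t i = 0; i < dist.size(); ++i)
        std::printf("shortest path 0 -> %zu: %d edges\n", i, dist[i]);
    return 0;
}
----------------------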

2. Shortest Path Problems

2.1 Single source shortest path problem 
Dijkstra’s algorithm
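
A minimal priority-queue sketch of Dijkstra's algorithm (non-negative edge weights, roughly O((V + E) log V)); the small graph below is a made-up example:
----------------------
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

int main() {
    // Directed weighted graph: adj[u] holds (neighbour, weight) pairs.
    std::vector<std::vector<std::pair<int, int> > > adj(4);
    adj[0].push_back(std::make_pair(1, 4));
    adj[0].push_back(std::make_pair(2, 1));
    adj[2].push_back(std::make_pair(1, 2));
    adj[1].push_back(std::make_pair(3, 1));
    adj[2].push_back(std::make_pair(3, 5));

    const int INF = 1 << 30;
    std::vector<int> dist(adj.size(), INF);

    // Min-heap ordered by (distance, node).
    std::priority_queue<std::pair<int, int>,
                        std::vector<std::pair<int, int> >,
                        std::greater<std::pair<int, int> > > pq;
    dist[0] = 0;
    pq.push(std::make_pair(0, 0));

    while (!pq.empty()) {
        int d = pq.top().first, u = pq.top().second;
        pq.pop();
        if (d > dist[u]) continue;                   // stale queue entry, skip
        for (size_t i = 0; i < adj[u].size(); ++i) {
            int v = adj[u][i].first, w = adj[u][i].second;
            if (dist[u] + w < dist[v]) {             // relax edge u -> v
                dist[v] = dist[u] + w;
                pq.push(std::make_pair(dist[v], v));
            }
        }
    }

    for (size_t i = 0; i < dist.size(); ++i)
        std::printf("dist(0 -> %zu) = %d\n", i, dist[i]);
    return 0;
}
----------------------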

  • A* Algorithm tends to be faster, but has the same big-O notation.
  • Bellman-Ford algorithm can deal with negative weights.

2.2 All pairs shortest path problem

Floyd-Warshall Algorithm:
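
A minimal O(V^3) sketch of the Floyd-Warshall recurrence dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]), on a small made-up graph:
----------------------
#include <cstdio>

int main() {
    const int INF = 1 << 29;
    const int n = 4;
    // dist[i][j] starts as the direct edge weight (INF when there is no edge).
    int dist[n][n] = {
        {0,   3,   INF, 7  },
        {8,   0,   2,   INF},
        {5,   INF, 0,   1  },
        {2,   INF, INF, 0  }
    };

    // Allow each vertex k in turn to act as an intermediate point.
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];

    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            std::printf("dist[%d][%d] = %d\n", i, j, dist[i][j]);
    return 0;
}
----------------------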

Friday 2 May 2014

Design Patterns

(This post is mostly copied from: http://www.sourcetricks.com/2008/06/about-design-patterns.html)

1: Definition

Design patterns can be considered as a standardization of commonly agreed best practices to solve specific design problems.

2: Classification
Depending on the design problem they address, design patterns can be classified in different categories, of which the main categories are:

///////////////////////////////
3. Creational Patterns
///////////////////////////////

Creational design patterns deal with object creation mechanisms, trying to create objects in a manner suitable to the situation.

3.1: Singleton
(a single, globally unique object)
  • Singleton is a creational design pattern.
  • A design pattern to provide one and only one instance of an object.
  • Make the constructors of the class private.
  • Store the object created privately.
  • Provide access to get the instance through a public method.
  • Can be extended to create a pool of objects. 
C++ Code:
---------------------- 
class Singleton {
private:
    static Singleton* instance;   // the single stored instance

    Singleton() {}                // private constructor

public:
    static Singleton* GetInstance() {
        if (instance == NULL)
            instance = new Singleton();
        return instance;
    }
};

Singleton* Singleton::instance = NULL;
---------------------- 

3.2: Factory Pattern
(centralized control that hides the messy details of creating polymorphic classes)

  • Factory pattern is a creational design pattern.
  • The idea of the factory pattern is to localize the object-creation code.
  • This prevents the introduction of a new type from disturbing the entire system.
  • Typically, when a new type is introduced, the change is in one place only: where the object is created, deciding which constructor to use.
  • The simplest factory is a static method in the base class itself, which creates the required object based on the type.

  • Some Good Comments:
    -----------
    • The factory pattern is useful if you need to encapsulate create-time polymorphism from consumers, that is, you want to provide a transparent point from which new instances of a polymorphic type are created.
    • If the type in question is not polymorphic, the factory pattern is pointless.
    • If a single point of creation doesn't make sense, neither does the factory pattern.
    • If it is undesirable to hide the polymorphism details away, the factory pattern is probably inappropriate, too.
    • As with anything that adds complexity, you should default to not using it, but spot the point at which it is beneficial early on and apply it before it is too late.
    ---------

    C++ Code:
    ----------------------
    class Shape {
    public:
      virtual void Draw() = 0;
      static Shape* Create(string type);
    };


    class Circle: public Shape {
    public:
      void Draw() {
        printf("I am a circle\n");
      }
      friend class Shape;
    };


    Shape* Shape::Create(string type) {
      if (type == "Circle") return new Circle();   // string comparison selects the concrete type
      return NULL;
    }

    ----------------- 

    Another version is named "Abstract Factory", which moves the creation functions into a separate class.
    Explain:
    Provide an interface for creating families of related or dependent objects without specifying their concrete classes.
    ---------------

    // Abstract Factory returning a mobile
    class MobileFactory {
        public:
           Mobile* GetMobile(string type);
    };
    
    Mobile* MobileFactory::GetMobile(string type) {
        if ( type == "Low-End" ) return new LowEndMobile();
        if ( type == "High-End" ) return new HighEndMobile();
        return NULL;
    }
    ------------------- 

    3.3: Object Pool Pattern
    (a wrapper that manages and hands out costly resources)

    The object pool pattern is a software creational design pattern that uses a set of initialized objects kept ready to use – a "pool" – rather than allocating and destroying them on demand. A client of the pool will request an object from the pool and perform operations on the returned object. When the client has finished, it returns the object to the pool rather than destroying it; this can be done manually or automatically.

    Avoid expensive acquisition and release of resources by recycling objects that are no longer in use. Can be considered a generalisation of connection pool (database and networking) and thread pool patterns.

    Object pools employ one of three strategies to handle a request when there are no spare objects in the pool.
    1. Fail to provide an object (and return an error to the client).
    2. Allocate a new object, thus increasing the size of the pool.
    3. In a multithreaded environment, a pool may block the client until another thread returns an object to the pool.

    3.4: Builder Pattern
    (a polymorphic wrapper around the construction of complex objects)
    Separate the construction of a complex object from its representation, allowing the same construction process to create various representations.

    Builder class:
    Defines the virtual functions to be overridden by sub-classes.
    From the Director's point of view, the Builder stands in for the Product: the Director drives the Builder, which assembles the Product.

    ///////////////////////////////
    4. Structural patterns 
    ///////////////////////////////

    4.1: Adapter Pattern (Wrapper Pattern)
    (an interface converter)

    An adapter helps two incompatible interfaces to work together. This is the real world definition for an adapter. The adapter design pattern is used when you want two different classes with incompatible interfaces to work together. Interfaces may be incompatible but the inner functionality should suit the need. The Adapter pattern allows otherwise incompatible classes to work together by converting the interface of one class into an interface expected by the clients.



  • The adapter translates calls to its interface into calls to the original interface.
  • The adapter is also responsible for transforming data into appropriate forms.
  • The adapter is created by implementing or inheriting both the interface that is expected and the interface that is pre-existing.  

  • C++ Code
    ------------------------
    class Adapter : public PublicInterface {
    private:
      ServiceObject* object;   // the pre-existing (adaptee) object

    public:
      Adapter(ServiceObject* object) { this->object = object; }

      // Translate a call on the expected interface into a call on the adaptee,
      // transforming the data into the form the adaptee expects.
      void PublicInterfaceFunction(AAA input) {
        BBB converted = transform(input);
        object->SomeOtherFunction(converted);
      }
    };
    ------------------------

    4.2: Composite Pattern
    (a tree-shaped structure of objects)

    Composite pattern is useful when you want to treat a group of objects in the same manner as a single instance of the object. The grouped objects may themselves be composites, arranged in a tree fashion.


    Compose objects into tree structures to represent part-whole hierarchies. Composite lets clients treat individual objects and compositions of objects uniformly.

    There are two types of components:
    1) working components: Leaf
    2) composite components: a group of components


    4.3: Decorator Pattern 
    (enhancing an object's functionality)

    Useful, when you want to add capabilities (statically or dynamically) to an object without sub-classing the concrete object's class as well as not affecting other objects of the same concrete class.


    The key is that the Decorator class extends the behaviour of the Component without sub-classing the concrete component (has-a replaces is-a).


    4.4: Facade Pattern
    (a unified user-facing interface at the module level)

    • Facade pattern is a structural design pattern.
    • Makes an existing complex software library easier to use by providing a simpler interface for common tasks.
    • Keeps the applications/clients that use the library de-coupled from the inner workings of a complex library.


    ///////////////////////////////
    5. Behavioral patterns
    ///////////////////////////////

    5.1: Template Algorithm (or Method) Design Pattern
    (an abstract algorithm skeleton, like pseudocode)

    Define the skeleton of an algorithm in an operation, deferring some steps to subclasses. Template method lets subclasses redefine certain steps of an algorithm without changing the algorithm's structure.

    • Template pattern is a behavioral design pattern.
    • This has nothing to do with C++ templates as such.
    • The template pattern is a common idiom in object-oriented programming. Having an abstract base class (with one or more pure virtual functions) is a simple example of the template design pattern.
    • In the template pattern, the well-defined part of the program, such as the overall algorithm, is implemented as a concrete method in the base class. The implementation specifics are left to the derived classes by declaring those steps as abstract (pure virtual) methods in the base class.
    • The method which implements the algorithm is also referred to as the template method, and the class which implements it as the template class.


    C++ Example: 
    ----------------------------
    // Base class
    // Template class
    class Account {
        public:
            // Abstract Methods
            virtual void Start() = 0;
            virtual void Do() = 0;
            virtual void End() = 0;
    
            // Template Method
            void Withdraw(int amount) {
                Start();
                Do();
                End();  
            }
    };
    ---------------------------- 


    5.2: Chain of Responsibility
    (a relay chain of handlers, each passing the request on if it cannot handle it)


    • Chain of Responsibility is a behavioral design pattern.
    • The basic idea behind this pattern is that a request or a command passes through a chain of objects until it is handled.
    • Objects involved in this pattern are of two types. Processing Objects and Command Objects.
    • Processing objects handle the commands given by command objects.
    • Each processing object knows what it can handle and passes the command to the next member in the chain if not handled.
    • A common example to understand this pattern better is the C++ exception handling mechanism. Unhandled exceptions are passed up the call stack until somebody acts upon it.

    C++ code:
    ---------------------------- 
    
    
    class ErrorHandle {
    
      protected:
        ErrorStates state;
        ErrorHandle* successor;
          
      public: 
        //This handle is only responsible for an error state 
        ErrorHandle(ErrorStates aState) { state = aState; }
    
        void SetSuccessor  (ErrorHandle* handle) {
            this->successor = handle;
        } 
        //if cannot handle, then pass to the next.
        //It is multiple functional calls in the stack. 
        virtual void ProcessError(ErrorReport& report) = 0;
    };
    ----------------------------  

    5.3: Command Pattern
    (encapsulating user commands and keeping their history)

    Encapsulate a request as an object, thereby letting you parameterize clients with different requests, queue or log requests, and support undoable operations.

    • Command pattern is a behavioral design pattern.
    • Encapsulates a command/ request. The command itself is treated as an object.
    • Classes participating in a command pattern include:-
      • Command:- An abstract interface defining the execute method.
      • Concrete Commands:- Extend the Command interface and implements the execute method. Invokes the command on the receiver object.
      • Receiver:- Knows how to perform the command action.
      • Invoker:- Asks the command object to carry out the request.
      • Client:- Creates the commands and associates with the receiver.
    • Some examples:- 
      • Used when history of requests is needed. (Stock orders executed for today)
      • Asynchronous processing. Commands need to be executed at variant times.
      • Installation wizards.


    Example C++ code:
    --------------

    #include <iostream>
    #include <vector>
    using namespace std;

    // Command interface
    class Command
    {
    public:
        virtual void execute() = 0;
    };
    
    // Receiver
    class StockTrade
    {
    public:
        StockTrade() {}
        void buy() { cout << "Buy stock" << endl; }
        void sell() { cout << "Sell stock" << endl; }
    };
    
    // Concrete command 1
    class BuyOrder : public Command
    {
        StockTrade* stock;
    public:
        BuyOrder(StockTrade* stock)
        {
            this->stock = stock;
        }
    
        void execute()
        {
            stock->buy();
        }
    };
    
    // Concrete command 2
    class SellOrder : public Command
    {
        StockTrade* stock;
    public:
        SellOrder(StockTrade* stock)
        {
            this->stock = stock;
        }
    
        void execute()
        {
            stock->sell();
        }
    };
    
    // Invoker
    class StockAgent
    {
    public:
        StockAgent() {}
        void order( Command* command )
        {
            commandList.push_back(command);
            command->execute();
        }
    private:
        // Looking at this command list gives
        // this order history
        vector<Command*> commandList;
    };
    --------------------------- 


    5.4: Visitor Pattern 
    (separating data structures from the algorithms that operate on them)

    The visitor design pattern is a way of separating an algorithm from an object structure on which it operates. A practical result of this separation is the ability to add new operations to existing object structures without modifying those structures. It is one way to follow the open/closed principle.

    • The visitor pattern is a behavioral design pattern. 
    • Visitor pattern allows to separate the data structures and the algorithms to be applied on the data.
    • Both data structure objects and algorithm objects can evolve separately. Makes development and changes easier.
    • Data structure (element) objects have an "accept" method which take in a visitor (algorithmic) object.
    • Algorithmic objects have a "visit" method which take in a data structure object.

    Example C++ Code
    ----------------------

    #include <iostream>
    #include <string>
    using namespace std;

    // Forwards
    class VisitorIntf;
    
    // Abstract interface for Element objects
    class ElementIntf
    {
    public:
        virtual string name() = 0;
        virtual void accept(VisitorIntf* object) = 0;
    };
    
    // Abstract interface for Visitor objects
    class VisitorIntf
    {
    public:
        virtual void visit(ElementIntf* object) = 0;
    };
    
    // Concrete element object
    class ConcreteElement1 : public ElementIntf
    {
    public:
        string name()
        {
            return "ConcreteElement1";
        }
    
        void accept(VisitorIntf *object)
        {
            object->visit(this);
        }
    };
    
    // Concrete element object
    class ConcreteElement2 : public ElementIntf
    {
    public:
        string name()
        {
            return "ConcreteElement2";
        }
    
        void accept(VisitorIntf *object)
        {
            object->visit(this);
        }
    };
    
    // Visitor logic 1
    class ConcreteVisitor1 : public VisitorIntf
    {
    public:
        void visit(ElementIntf *object)
        {
            cout << "Visited " << object->name() <<
                    " using ConcreteVisitor1." << endl;
        }
    };
    
    // Visitor logic 2
    class ConcreteVisitor2 : public VisitorIntf
    {
    public:
        void visit(ElementIntf *object)
        {
            cout << "Visited " << object->name() <<
                 " using ConcreteVisitor2." << endl;
        }
    };
    ---------------------- 


    5.5: Strategy pattern
    (switching between interchangeable algorithms)

    In computer programming, the strategy pattern (also known as the policy pattern) is a software design pattern that enables an algorithm's behavior to be selected at runtime. The strategy pattern
    • defines a family of algorithms,
    • encapsulates each algorithm, and
    • makes the algorithms interchangeable within that family.
    It is a straightforward design; a minimal sketch is given below.
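
    A minimal sketch, assuming a hypothetical compression example (the CompressionStrategy interface and the concrete strategies are made-up names):
    ----------------------
    #include <cstdio>

    // Strategy interface: a family of interchangeable algorithms.
    class CompressionStrategy {
    public:
        virtual ~CompressionStrategy() {}
        virtual void Compress(const char* file) = 0;
    };

    class ZipStrategy : public CompressionStrategy {
    public:
        void Compress(const char* file) { std::printf("zip %s\n", file); }
    };

    class RarStrategy : public CompressionStrategy {
    public:
        void Compress(const char* file) { std::printf("rar %s\n", file); }
    };

    // Context: holds a strategy and can swap it at runtime.
    class Archiver {
        CompressionStrategy* strategy;
    public:
        Archiver(CompressionStrategy* s) : strategy(s) {}
        void SetStrategy(CompressionStrategy* s) { strategy = s; }
        void Archive(const char* file) { strategy->Compress(file); }
    };

    int main() {
        ZipStrategy zip;
        RarStrategy rar;
        Archiver archiver(&zip);
        archiver.Archive("report.txt");   // compresses with zip
        archiver.SetStrategy(&rar);
        archiver.Archive("report.txt");   // now compresses with rar
        return 0;
    }
    ----------------------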

    ///////////////////////////////
    6. System Design Patterns
    ///////////////////////////////

    6.0: SOLID (object-oriented design)
    (design bible I)

    It is part of an overall strategy of agile and adaptive programming.

    1) S: Single responsibility principle
        a class should have only a single responsibility and that responsibility should be entirely encapsulated by the class.
    2) O: Open/closed principle
        software entities … should be open for extension, but closed for modification.
    3) L: Liskov substitution principle
        objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program
    4) I: Interface segregation principle
        many client-specific interfaces are better than one general-purpose interface.
    5) D: Dependency inversion principle
        one should “Depend upon Abstractions. Do not depend upon concretions.”

    6.0: KISS principle
    (design bible II)

    Keep it simple, stupid.

    6.1:Publish–subscribe Pattern
    (topic-based message delivery where senders and receivers do not know about each other)

    publish–subscribe is a messaging pattern where senders of messages, called publishers, do not program the messages to be sent directly to specific receivers, called subscribers. Instead, published messages are characterized into classes, without knowledge of what, if any, subscribers there may be.

    Similarly, subscribers express interest in one or more classes, and only receive messages that are of interest, without knowledge of what, if any, publishers there are.

    Benefits:
    Loose coupling
    Scalable 

    Disadvantages:
    No guarantee for delivery

    Possible delay
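
    A minimal in-process sketch of the idea (the Broker class and topic names below are made up; real systems use a message bus or broker service):
    ----------------------
    #include <cstdio>
    #include <functional>
    #include <map>
    #include <string>
    #include <vector>

    // Publishers and subscribers only know topic names, never each other.
    class Broker {
        std::map<std::string,
                 std::vector<std::function<void(const std::string&)> > > topics;
    public:
        void Subscribe(const std::string& topic,
                       std::function<void(const std::string&)> handler) {
            topics[topic].push_back(handler);
        }
        void Publish(const std::string& topic, const std::string& message) {
            // Delivery here is best-effort: real systems add no guarantee / possible delay.
            for (size_t i = 0; i < topics[topic].size(); ++i)
                topics[topic][i](message);
        }
    };

    int main() {
        Broker broker;
        broker.Subscribe("stock.goog", [](const std::string& m) {
            std::printf("trader received: %s\n", m.c_str());
        });
        broker.Subscribe("stock.goog", [](const std::string& m) {
            std::printf("logger received: %s\n", m.c_str());
        });
        broker.Publish("stock.goog", "price = 550");
        return 0;
    }
    ----------------------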

    6.2: Model–view–controller
    (the classic layering of a web application's UI)

    Model–view–controller (MVC) is a software architectural pattern for implementing user interfaces. It divides a given software application into three interconnected parts, so as to separate internal representations of information from the ways that information is presented to or accepted from the user.[1][2] The central component, the model, consists of application data, business rules, logic and functions. A view can be any output representation of information, such as a chart or a diagram. Multiple views of the same information are possible, such as a bar chart for management and a tabular view for accountants. The third part, the controller, accepts input and converts it to commands for the model or view.



    In addition to dividing the application into three kinds of components, the model–view–controller design defines the interactions between them.[4]
    • A controller can send commands to the model to update the model's state (e.g., editing a document). It can also send commands to its associated view to change the view's presentation of the model (e.g., by scrolling through a document).
    • A model notifies its associated views and controllers when there has been a change in its state. This notification allows the views to produce updated output, and the controllers to change the available set of commands. In some cases an MVC implementation might instead be "passive," so that other components must poll the model for updates rather than being notified.
    • A view requests information from the model that it needs for generating an output representation to the user.

     


    6.3: Internet Layers Design Pattern
    (the classic layered design of the Internet)
      
    5-Layer:
    • 5: Application
    • 4: Transport (TCP / UDP)
    • 3: Network / Internet (IP)
    • 2: Data Link (MAC)
    • 1: Physical
    7-Layer (OSI): Application, Presentation, Session, Transport, Network, Data Link, Physical.

     6.4: Event-based Design Pattern (or Event-driven architecture)
     (an event-response mechanism)
    There are four types of objects: 
    • Events  
    • Event List
    • Event Scheduler
    • Events Handler
    It is widely used in Windows OS, UI, Java, etc.

    6.5: Peer-to-peer Design (or Peer-to-peer architecture)

     (a purely distributed system)

    The most common type of structured P2P networks implement a distributed hash table (DHT),[23][24] in which a variant of consistent hashing is used to assign ownership of each file to a particular peer.[25][26] This enables peers to search for resources on the network using a hash table: that is, (key, value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given key.

    Example: Chord

    Chord is a protocol and algorithm for a peer-to-peer distributed hash table. A distributed hash table stores key-value pairs by assigning keys to different computers (known as "nodes"); a node will store the values for all the keys for which it is responsible. Chord specifies how keys are assigned to nodes, and how a node can discover the value for a given key by first locating the node responsible for that key.

    Step 1:
    IDs and keys are assigned an m-bit identifier using consistent hashing. The SHA-1 algorithm is the base hashing function for consistent hashing.

    Step 2:
    Each node has a successor and a predecessor. The successor to a node (or key) is the next node in the identifier circle in a clockwise direction. The predecessor is counter-clockwise. 

    Step 3:
    A logical ring with positions numbered 0 to 2^m - 1 is formed among nodes. Key k is assigned to node successor(k), which is the node whose identifier is equal to or follows the identifier of k. If there are N nodes and K keys, then each node is responsible for roughly K/N keys.

    Step 4:
    With high probability, Chord contacts O(log N) nodes to find a successor in an N-node network.
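
    A toy sketch of the ring lookup idea, ignoring finger tables (which are what give Chord its O(log N) hop count); here successor(k) is just a scan over a sorted set of node IDs:
    ----------------------
    #include <cstdio>
    #include <set>

    int main() {
        const int RING = 64;                 // identifier space of size 2^m, m = 6
        std::set<int> nodes;                 // node IDs currently on the ring
        int ids[] = {1, 8, 14, 21, 32, 38, 42, 48, 51, 56};
        for (int i = 0; i < 10; ++i) nodes.insert(ids[i]);

        int key = 54;                        // hashed key to locate
        // successor(k): the first node clockwise from k (wrap around if needed).
        std::set<int>::iterator it = nodes.lower_bound(key % RING);
        int owner = (it != nodes.end()) ? *it : *nodes.begin();

        std::printf("key %d is stored on node %d\n", key, owner);
        return 0;
    }
    ----------------------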

    Example: Chord Design

    Google Products

    1995:
    Page Rank Algorithm

    1998:
    Google Homepage Launches

    1999:
    Google Logo Design

    2000:
    AdWords
    Keywords and Ads (Right Side of Window)
    Pay only when they click and visit the website.

    2000:
    Eric Schmidt joins Google as chairman

    2001:
    Google Japan opens, the first international office

    2003:
    Blogger

    AdSense
    For Ads: Put Ads on tons of content website that talks about a topic.
    For Website: your website makes money when people clicks the Ads.

    2004:
    Gmail with 1 GB storage

    2005: (BIG YEAR)
    Google Maps goes live
    Google Talk
    Google Analytics

    2006:
    YouTube joins Google
    Google Apps & Docs offer cloud-based computing

    2007:
    Street View

    2008:
    (10 Years Celebration)
    Chrome Launches

    2009:
    Google Translate
    Google Ocean
    Google Voice
    Google Latitude
    Google Sky Map

    2010
    Google Phone
    Nexus

    2011
    Google Crisis Response
    Google Android Tablet
    Google Art Online





    Thursday 1 May 2014

    What is the most important thing in non-technical interviews?

    1: Details
    At least prepare the questions.

    2: Five Points
    : Focus on the Impacts.
    : Focus on the Actions.
    : Concise
    : Well-Structured
    : Prioritize


    What is important in a technical interview?

    Technical Questions:

    "The way the interviewer approach the problem is more important than the skills and experiences"

    1: Process
    Take in the problem.
    Understand the problem.
    Design basic test cases.
    Form a conceptual idea.
    Mathematically check its correctness.
    Design the data structure and algorithm.
    Code, then review the code.
    Test the solution. (Functional -> Boundary -> Wrong inputs)

    2:
    Communicate with the interviewer.
    Do you have suggestions?
    What do you think?

    3: Skillset
    1) Writing good code (production code)
    2) Algorithms and Data Structure (big O notation)
    3) Analytical skills (questioning the problem and think through the solution step by step and mathematically check the solution is correct)
    4) Sound Design (Testing)

    4. Tips
    1) Do not optimize before correctness.
    2) Time/Space trade-offs
    3) Assumptions
    4) Ask if cannot remember APIs
    5) Must Test (boundary conditions, etc)


    Tree Structures

    Definition of Tree:


    Some related definitions:
    • Root - the top most node in a tree.
    • Parent - node that has a child.
    • Siblings - nodes with the same parent.
    • Leaves - nodes with no children.
    • Internal nodes - nodes with at least one child.
    • Degree - number of sub trees of a node.
    • Edge - connection between one node to another.
    • Level - The level of a node is defined by 1 + the number of connections between the node and the root
    • Height - The height of a node is the length of the longest downward path between the node and a leaf
    • Forest - A forest is a set of n ≥ 0 disjoint trees.
    Summary:
    1. Binary Tree: each node has at most two children
    2. Binary Search Tree: the left sub-tree contains only nodes with keys less than the parent node, and the right sub-tree contains only nodes with keys greater than the parent node.
    3. AVL Tree: the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property.
    4. Red–black tree: every root-to-leaf path contains the same number of black nodes.
    5. B-tree/B+-tree: nodes can have many children (e.g. around 100), are kept at least 50% full, and the tree stays balanced.
    6. R-tree: a multi-dimensional generalization of the B+-tree; splits and merges are more complex.
    7. Binary heap: backed by an array underneath.

    1. Binary Tree
    A tree that each node has at most two children (referred to as the left child and the right child)

    Example:
    binary search trees
    binary heaps (Maximum Heap Tree or Minimum Heap Tree)

    2. Binary Search Tree
    • Each node has no more than two child nodes. 
    • Each child must either be a leaf node or the root of another binary search tree. 
    • The left sub-tree contains only nodes with keys less than the parent node
    • The right sub-tree contains only nodes with keys greater than the parent node.
    Operations:
    • Insert
    • Delete
    • Query (Note that there is no guarantee for performance)
    • Sort (preorder or postorder)
    • Find Successor
    • Find predecessor
    2.1: Find Successor
    1). The node has a right subtree.
    If the given node has a right subtree then by the BST property the next larger key must be in the right subtree. Since all keys in a right subtree are larger than the key of the given node, the successor must be the smallest of all those keys in the right subtree.

    2). The node does not have a right subtree.
    In this case we will have to look up the tree, since that's the only place we might find the next larger key. There is no point looking at the left subtree, as all keys in the left subtree are guaranteed to be smaller than the key of the given node.


    3). Otherwise (the node holds the largest key in the tree), there is no successor.
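
    A minimal sketch of findSuccessor for a BST whose nodes carry parent pointers (the Node struct below is a made-up example):
    ----------------------
    #include <cstdio>

    struct Node {
        int key;
        Node* left;
        Node* right;
        Node* parent;
        Node(int k) : key(k), left(NULL), right(NULL), parent(NULL) {}
    };

    // Case 1: a right subtree exists -> the successor is its leftmost node.
    // Case 2: otherwise walk up until we arrive from a left child.
    // Case 3: no such ancestor -> no successor (returns NULL).
    Node* findSuccessor(Node* x) {
        if (x->right) {
            Node* cur = x->right;
            while (cur->left) cur = cur->left;
            return cur;
        }
        Node* p = x->parent;
        while (p && x == p->right) {
            x = p;
            p = p->parent;
        }
        return p;
    }

    int main() {
        // tree: 2 with left child 1 and right child 3
        Node n1(1), n2(2), n3(3);
        n2.left = &n1;  n2.right = &n3;
        n1.parent = &n2; n3.parent = &n2;

        Node* s = findSuccessor(&n1);
        std::printf("successor of 1 is %d\n", s ? s->key : -1);      // 2
        s = findSuccessor(&n3);
        std::printf("successor of 3: %s\n", s ? "exists" : "none");  // none
        return 0;
    }
    ----------------------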

    2.2: Tree traversal
    1. Depth-first and 
    2. Breadth-first
    There are three types of depth-first traversal: pre-order,[1] in-order,[1] and post-order.[1] For a binary tree, they are defined as operations recursively at each node, starting with the root node as follows: 

    Pre-order
    1. Visit the root.
    2. Traverse the left subtree.
    3. Traverse the right subtree

    In-order (symmetric) (It is also the sorting order)

    1. Traverse the left subtree.
    2. Visit the root.
    3. Traverse the right subtree.

    Post-order

    1. Traverse the left subtree.
    2. Traverse the right subtree.
    3. Visit the root.
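
    A minimal recursive sketch of the three depth-first orders (Node is a made-up struct with left/right pointers):
    ----------------------
    #include <cstdio>

    struct Node {
        int value;
        Node* left;
        Node* right;
        Node(int v) : value(v), left(NULL), right(NULL) {}
    };

    void preOrder(Node* n)  { if (!n) return; std::printf("%d ", n->value); preOrder(n->left); preOrder(n->right); }
    void inOrder(Node* n)   { if (!n) return; inOrder(n->left); std::printf("%d ", n->value); inOrder(n->right); }
    void postOrder(Node* n) { if (!n) return; postOrder(n->left); postOrder(n->right); std::printf("%d ", n->value); }

    int main() {
        // tree: 2 with left child 1 and right child 3
        Node a(1), b(2), c(3);
        b.left = &a; b.right = &c;

        preOrder(&b);  std::printf("(pre)\n");    // 2 1 3
        inOrder(&b);   std::printf("(in)\n");     // 1 2 3  -- sorted order for a BST
        postOrder(&b); std::printf("(post)\n");   // 1 3 2
        return 0;
    }
    ----------------------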

    3: Self-balancing binary search tree
    (This is perhaps the most important family of tree structures; it is efficient for both point queries and range queries.)

    Automatically keeps its height (maximal number of levels below the root) small in the face of arbitrary item insertions and deletions.

    Note that:
    any Self-balancing binary search tree can be used to support:
    1) Sorting
    2) Insert and Delete
    3) Find the next larger or the previous smaller. 

    3.1: AVL Tree

    the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property.

    Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.

    It is still an ordered (binary search) tree, of course.


    After an insertion or deletion, the AVL property may be violated, so rotation is required. There are 4 rotation cases: LL, LR, RL and RR.

    3.2: Red–black tree


    The balancing of the tree is not perfect but it is good enough to allow it to guarantee searching in O(log n) time, where n is the total number of elements in the tree. The insertion and deletion operations, along with the tree rearrangement and recoloring, are also performed in O(log n) time.

    4. Other Popular Trees:
    4.1: B-tree
    B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children (Comer 1979, p. 123). Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data. It is commonly used in databases and filesystems.

    The B-tree uses all of the ideas described above. In particular, a B-tree:
    • keeps keys in sorted order for sequential traversing
    • uses a hierarchical index to minimize the number of disk reads
    • uses partially full blocks to speed insertions and deletions
    • keeps the index balanced with an elegant recursive algorithm
    In addition, a B-tree minimizes waste by making sure the interior nodes are at least half full. A B-tree can handle an arbitrary number of insertions and deletions.


    Insertion


    Deletion

    4.2: R-Tree
    Expanded from B-Tree, R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons.

    4.3: binary heap (for Heap Tree)
    Note that it is not a BST.
       
    It has a very nice property (for a node stored at 0-based array index x): the left child is at 2 * x + 1 and the right child is at 2 * x + 2, as the sketch below shows.
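
    A short sketch of that index arithmetic on an array-backed max-heap:
    ----------------------
    #include <cstdio>

    int main() {
        // Max-heap stored in an array: heap[x] >= heap[2x+1] and heap[x] >= heap[2x+2].
        int heap[] = {9, 7, 8, 3, 5, 6};
        int n = sizeof(heap) / sizeof(heap[0]);

        for (int x = 0; x < n; ++x) {
            int left  = 2 * x + 1;      // left child index
            int right = 2 * x + 2;      // right child index
            if (left < n)
                std::printf("heap[%d]=%d >= left  child heap[%d]=%d\n", x, heap[x], left, heap[left]);
            if (right < n)
                std::printf("heap[%d]=%d >= right child heap[%d]=%d\n", x, heap[x], right, heap[right]);
        }
        return 0;
    }
    ----------------------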

     


    4.4: Suffix Trie
    Build a suffix trie for a text T; it then supports, for a query string S of length m:
    1) Sub-string search
    2) Suffix Search
    3) How many times the sub-string occurs?

    Space Cost: O(T^2)
    Time Cost of build: O(T^2)
    Time Cost of search: O(m)

    4.5: Suffix Tree (Compressed Suffix Trie)
    (Nodes) Space cost: O(2*T)
    (Total, if edge labels are stored as explicit substrings) Space cost: still O(T^2)

    The label size is still O(T^2), so the next change is:

    Store each edge label as an (offset, length) pair into T instead of the substring itself.
    Another point is that each leaf node stores a number indicating the starting position of its suffix.

    Note:
    It can be used for other operations:
    1) longest common sub-string.

    4.6: Generalized suffix tree:

    Combine different strings together and build a suffix tree.