Saturday, 7 September 2013

Check Sim-Tree Paper

1: Introduction
explain the problem to non-experts,
give an example,
why is it important?
the contributions.

2:Motivations
clear?
make sense?
what-if-not?

3:To Do List
Check Query Lost in a small case.



Tuesday, 27 August 2013

Gary's suggestions

1:Indent paragraphs by two spaces
2:decides -> determines
3:going on -> ongoing
4:Singular vs. plural
5:fitness -> fit
6:What SimMobility is.
7:To be exactly -> To be exact
8:nearby other agents -> nearby agents
9:is like -> is
10:one -> a/an
11 : did -> do
12 : proposed -> propose
13 : in -> during (for times)
14 : experience -> experiment
15 : It -> This
16 : the Figure -> Figure
17 : Bugis -> the Bugis & Singapore -> the Singapore
18 : speed-up -> speedup (check all Google)
19 : prove -> show
20 : 

Monday, 26 August 2013

My own tips for writing an A+ research paper

1: Ask myself: who is the main reader? Who is the likely reviewer?

1+: Point out the exact problem explicitly.

1++: Point out exactly why the problem is important.

2: Point out the motivation & main idea of the paper explicitly.

3: List the contributions in the Introduction, and say where each contribution is located in the paper.

4: Point out the weak points myself at the end. Do not let other people find them.

5: Put related work at the end.

6: Be careful with the title, because it is the most-read part of the paper.

7: Listen to reviewers.

8: Use examples to explain the problem and the idea.

9: Do I believe the conclusions based on the evidence I give? 

10: 

Monday, 29 July 2013

TCMalloc-based NUMA Heap: Source Code View

1: Function definitions
void* operator new(size_t size)
    ATTRIBUTE_SECTION(google_malloc);
void operator delete(void* p)
    __THROW ATTRIBUTE_SECTION(google_malloc);
void* operator new[](size_t size)
    ATTRIBUTE_SECTION(google_malloc);
void operator delete[](void* p)
    __THROW ATTRIBUTE_SECTION(google_malloc);

2: Calls
static inline void* cpp_alloc(size_t size, bool nothrow)

3: Calls
static inline void* do_malloc(size_t size)
The logic of this function:

4: TCMalloc_ThreadCache is the entry-point class; all the other classes are transparent.
4.1 TCMalloc_ThreadCache* heap = TCMalloc_ThreadCache::GetCache();
-- GetCache() is a static function;
-- On first use, it calls the static function InitModule();
-- Otherwise the thread has already used the allocator, so it calls the static function GetThreadHeap();
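The lazy-initialization pattern described in 4.1 can be sketched roughly as follows. This is a simplified model, not the TCMalloc source: ThreadCacheSketch, module_inited, init_calls and CreateCacheIfNecessary are hypothetical names, and the locking that a real allocator would need around the global initialization is left out.

```cpp
#include <cassert>

// Hypothetical stand-in for TCMalloc_ThreadCache; only the lazy-init path is modeled.
struct ThreadCacheSketch {
    static bool module_inited;   // set once by InitModule()
    static int init_calls;       // counts global initializations, for illustration only
    static thread_local ThreadCacheSketch* thread_heap;  // this thread's cache, if any

    // Global, one-time setup. A real allocator would guard this with a lock.
    static void InitModule() {
        if (!module_inited) {
            module_inited = true;
            ++init_calls;
        }
    }

    // Returns this thread's cache, or nullptr if the thread has none yet.
    static ThreadCacheSketch* GetThreadHeap() { return thread_heap; }

    // Creates the per-thread cache on first use (assumed helper).
    static ThreadCacheSketch* CreateCacheIfNecessary() {
        static thread_local ThreadCacheSketch heap;
        thread_heap = &heap;
        return thread_heap;
    }

    // Entry point: initialize the module on first use, otherwise reuse the thread heap.
    static ThreadCacheSketch* GetCache() {
        ThreadCacheSketch* ptr = nullptr;
        if (!module_inited) {
            InitModule();
        } else {
            ptr = GetThreadHeap();
        }
        if (ptr == nullptr) ptr = CreateCacheIfNecessary();
        assert(ptr != nullptr);
        return ptr;
    }
};

bool ThreadCacheSketch::module_inited = false;
int ThreadCacheSketch::init_calls = 0;
thread_local ThreadCacheSketch* ThreadCacheSketch::thread_heap = nullptr;
```

Calling ThreadCacheSketch::GetCache() twice from the same thread returns the same pointer, and the global initialization runs only once.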
4.2 First, InitModule()
This function performs the global initialization, including the global variables:
-- static PageHeapAllocator<TCMalloc_ThreadCache> threadheap_allocator;
-- static PageHeapAllocator<Span> span_allocator;
-- static PageHeapAllocator<StackTrace> stacktrace_allocator;
-- static TCMalloc_Central_FreeList central_cache[kNumClasses];
-- TCMalloc_PageHeap
4.3 PageHeapAllocator
This class is a pool-style object manager.
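To make "pool-style object manager" concrete, here is a minimal sketch of a fixed-size object pool with a free list. It is an illustration under my own assumptions, not the PageHeapAllocator code: PoolSketch, Slot and kSlotsPerChunk are hypothetical, and chunks come from plain new[] rather than from the allocator's own metadata memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size pool: hands out raw storage for objects of type T
// and recycles freed slots through an intrusive free list.
template <typename T>
class PoolSketch {
    union Slot {
        Slot* next;                                   // link while the slot is free
        alignas(T) unsigned char storage[sizeof(T)];  // payload while in use
    };
    static constexpr std::size_t kSlotsPerChunk = 64;  // assumed chunk size

    std::vector<Slot*> chunks_;                     // owned chunks, freed in the destructor
    Slot* free_list_ = nullptr;                     // recycled slots
    std::size_t next_in_chunk_ = kSlotsPerChunk;    // forces a chunk on the first New()
    std::size_t inuse_ = 0;

public:
    PoolSketch() = default;
    PoolSketch(const PoolSketch&) = delete;
    PoolSketch& operator=(const PoolSketch&) = delete;
    ~PoolSketch() {
        for (Slot* chunk : chunks_) delete[] chunk;
    }

    // Returns uninitialized storage for one T, preferring recycled slots.
    void* New() {
        ++inuse_;
        if (free_list_ != nullptr) {
            Slot* slot = free_list_;
            free_list_ = slot->next;
            return slot;
        }
        if (next_in_chunk_ == kSlotsPerChunk) {
            chunks_.push_back(new Slot[kSlotsPerChunk]);
            next_in_chunk_ = 0;
        }
        return &chunks_.back()[next_in_chunk_++];
    }

    // Puts a slot obtained from New() back on the free list.
    void Delete(void* p) {
        assert(p != nullptr && inuse_ > 0);
        Slot* slot = static_cast<Slot*>(p);
        slot->next = free_list_;
        free_list_ = slot;
        --inuse_;
    }

    std::size_t inuse() const { return inuse_; }
};
```

The point of the design is that allocating allocator metadata (thread caches, Spans, stack traces) never has to go back through malloc itself.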

5: After obtaining TCMalloc_ThreadCache* heap:
if size <= kMaxSize: CheckedMallocResult(heap->Allocate(size));
else: do_malloc_pages(pages(size));

6: First, heap->Allocate(size):
If the most suitable FreeList has a free object, return it.
If not, go to the central list and fetch one.

7:  CheckedMallocResult(Point*)
This function checks whether the returned pointer's location meets the requirements.

8:pages(int);
Converts a byte count into a page count.
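The small/large dispatch in step 5 and the byte-to-page conversion in step 8 can be sketched together. The constants are assumptions (4 KiB pages, a 32 KiB small-object limit); the actual values depend on the TCMalloc version and build. All names ending in Sketch, plus Path, ChoosePath and LargeRequestPages, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed constants, for illustration only.
constexpr std::size_t kPageShiftSketch = 12;  // 4 KiB pages
constexpr std::size_t kPageSizeSketch = std::size_t{1} << kPageShiftSketch;
constexpr std::size_t kMaxSizeSketch = 32 * 1024;  // small/large threshold

// Rounds a byte count up to a whole number of pages.
constexpr std::size_t PagesSketch(std::size_t bytes) {
    return (bytes >> kPageShiftSketch) +
           ((bytes & (kPageSizeSketch - 1)) != 0 ? 1 : 0);
}

// Which allocation path a request of the given size would take.
enum class Path { ThreadCache, PageHeap };

constexpr Path ChoosePath(std::size_t size) {
    return size <= kMaxSizeSketch ? Path::ThreadCache : Path::PageHeap;
}

// Number of pages a large request needs; only meaningful on the large path.
inline std::size_t LargeRequestPages(std::size_t size) {
    assert(ChoosePath(size) == Path::PageHeap);
    return PagesSketch(size);
}
```

Under these assumptions a 40000-byte request takes the page-heap path and needs 10 pages, while a request of exactly 32 KiB still goes through the thread cache.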

9:do_malloc_pages();

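10: Back to Allocate(): if no suitable space is found, it calls FetchFromCentralCache to get space from the central cache.
The number of objects fetched depends on the value in num_objects_to_move[].
Right now, each transfer between the thread cache and the central cache moves approximately 64 KB.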

11: FetchFromCentralCache
central_cache[cl].RemoveRange
The class of central_cache is TCMalloc_Central_FreeList.
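Steps 6, 10 and 11 describe one path: Allocate() tries the thread-local free list, and on a miss pulls a batch from the central list. A toy model of that path is sketched below. Object ids stand in for memory blocks, all type names are hypothetical, and the batching rule (about 64 KiB per transfer, clamped to between 2 and 32 objects) is an assumption for illustration rather than the real num_objects_to_move[] computation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; none of these are the real TCMalloc classes.
constexpr int kNumClassesSketch = 3;
constexpr std::size_t kClassSize[kNumClassesSketch] = {64, 4096, 65536};

// Assumed batching rule: about 64 KiB per transfer, clamped to [2, 32] objects.
inline int NumToMoveSketch(std::size_t object_size) {
    assert(object_size > 0);
    const std::size_t n = (64 * 1024) / object_size;
    return static_cast<int>(
        std::min<std::size_t>(32, std::max<std::size_t>(2, n)));
}

// Models a central free list that hands out batches of object ids.
struct CentralListSketch {
    int next_id = 0;
    std::vector<int> RemoveRange(int n) {
        std::vector<int> batch;
        for (int i = 0; i < n; ++i) batch.push_back(next_id++);
        return batch;
    }
};

// Models the per-thread cache: one free list per size class.
struct ThreadCacheModel {
    CentralListSketch central[kNumClassesSketch];
    std::vector<int> free_list[kNumClassesSketch];
    int central_fetches = 0;  // how often the slow path was taken

    // Slow path: refill the local free list with one batch from the central list.
    void FetchFromCentralCache(int cl) {
        ++central_fetches;
        free_list[cl] = central[cl].RemoveRange(NumToMoveSketch(kClassSize[cl]));
    }

    // Fast path: pop from the local free list, refilling it only when empty.
    int Allocate(int cl) {
        assert(cl >= 0 && cl < kNumClassesSketch);
        if (free_list[cl].empty()) FetchFromCentralCache(cl);
        const int obj = free_list[cl].back();
        free_list[cl].pop_back();
        return obj;
    }
};
```

In this model, 32 allocations of the 64-byte class cost a single central fetch; the 33rd triggers the second one. That amortization is why the central cache is rarely on the hot path.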

////////////////////
// Main data structures:
//////////////////

1: TCMalloc_PageHeap
Provides the lowest-level service; keeps one TCMalloc_PageHeap_NodeData for each node.
Main interfaces provided:
Span* New(Length n, int *node);
Carve(int node, Span* span, Length n, bool released);  // carves up a Span
void Delete(Span* span);

2: TCMalloc_PageHeap_NodeData
This class is fairly simple; it holds a set of containers of Spans.


////////////////////
// Experimental data:
//////////////////
1: Getting memory from the global space is not the dominant problem.
    It takes only about 1000 ms;
2:
1928104    ,2590838    ,662734    ,__tls_get_addr
5522090    ,6820971    ,1298881    ,operator::delete(void*)
2128626    ,2760420    ,631794    ,__strlen_sse42
2761045    ,3711965    ,950920    ,memcpy
7149410    ,9600234    ,2450824    ,operator::new(unsigned
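3: Analyze new first.
Why does the cost of new increase?
The number of malloc calls has not changed, so I conclude that the time spent in do_malloc has increased.
do_malloc has only two cases: small size (no larger than kMaxSize) or large size.
In the small-size case:
FetchFromCentralCache takes very little time, which narrows the problem down a lot;
it means that most of the time is spent in Allocate().
In the large-size case:
do_malloc_pages takes very little time, which also narrows the problem down a lot;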

XXX
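Since the body of Allocate is very simple, the main cause I found is that TCMalloc_ThreadCache is located in Node 0's memory region;

Fixing this did not solve the problem; it only improved things slightly.
So I started looking at the other statements in malloc:
SizeClass(size)
class_array[ClassIndex(size)];
These are all global static variables, which cause remote accesses, so I came up with the idea of making them node-local;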
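One way to picture the node-local idea is to keep one replica of each read-only lookup table per NUMA node, so that every thread reads the copy belonging to its own node. The sketch below only models the indexing: PerNodeTables, SizeTable and kTableSize are hypothetical, and each copy is placed with ordinary heap allocation so the code runs anywhere. A real version would have to put each copy in that node's memory, for example through libnuma's node-specific allocation calls.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

constexpr std::size_t kTableSize = 8;  // hypothetical; the real class_array is much larger
using SizeTable = std::array<unsigned char, kTableSize>;

// Holds one private copy of a read-only lookup table per NUMA node.
class PerNodeTables {
public:
    PerNodeTables(int num_nodes, const SizeTable& master) {
        assert(num_nodes > 0);
        for (int n = 0; n < num_nodes; ++n) {
            // Assumption: plain heap allocation stands in for node-local
            // placement, which this sketch does not perform.
            copies_.push_back(std::make_unique<SizeTable>(master));
        }
    }

    // Returns the replica that threads running on `node` should read.
    const SizeTable& ForNode(int node) const {
        assert(node >= 0 && node < static_cast<int>(copies_.size()));
        return *copies_[node];
    }

private:
    std::vector<std::unique_ptr<SizeTable>> copies_;
};
```

Because the tables are written once during initialization and only read afterwards, replicating them costs some memory per node but needs no synchronization.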

4: Now analyze delete:
static inline void do_free(void* ptr)
I found a similar problem:
most of the time is spent in



Tuesday, 18 June 2013

NUMA study notes

1:
Before any other calls in this library can be used, numa_available() must be called. If it returns -1, all other functions in this library are undefined.