Big Data Engineer

2016 salary range: $129,500 to $183,500

2017 salary range: $135,000 to $196,000

Projected salary growth: 5.8%
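The 5.8% figure matches the change in the midpoints of the two ranges: (129,500 + 183,500) / 2 = 156,500 and (135,000 + 196,000) / 2 = 165,500, a rise of about 5.75%. The source does not say how the figure was derived, so reading it as midpoint growth is an assumption. Here is a minimal C++ sketch of that check. `SalaryRange`, `midpoint`, and `midpoint_growth_percent` are illustrative names.

```cpp
#include <cassert>

// A published salary range: lower and upper bound in US dollars.
struct SalaryRange {
    double low;
    double high;
};

// Midpoint of a range; the bounds must be ordered.
double midpoint(const SalaryRange& r) {
    assert(r.low <= r.high);
    return (r.low + r.high) / 2.0;
}

// Assumption: the published growth figure compares range midpoints.
// Returns the year-over-year change of the midpoints as a percentage.
double midpoint_growth_percent(const SalaryRange& before, const SalaryRange& after) {
    const double m0 = midpoint(before);
    const double m1 = midpoint(after);
    return (m1 - m0) / m0 * 100.0;
}
```

For example, `midpoint_growth_percent({129500.0, 183500.0}, {135000.0, 196000.0})` returns roughly 5.75, which rounds to 5.8.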

These pros design and implement solutions to big data challenges. They gather business objectives from users and data scientists and translate those objectives into data-processing workflows. They need strong knowledge of statistics, programming experience in Python or Java, and the ability to design and implement big data solutions; experience with NoSQL is preferred. They should also have experience with data mining, processing large amounts of raw data, and maintaining relational databases for storage and data acquisition. They typically hold a bachelor's degree in a related field and have four to six years of experience.

C++ big data

1. MapReduce

Thrill (http://project-thrill.org)
HCE (Hadoop C++ Extension)
Google MR4C (https://github.com/google/mr4c)

2. distributed file system

QFS, Quantcast File System (https://github.com/quantcast/qfs)

3. Bigtable-style database

Hypertable (http://www.hypertable.org)
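The frameworks under item 1 all follow the MapReduce model. A map step emits key/value pairs, a shuffle groups them by key, and a reduce step folds each group. The sketch below runs word count through those three phases in a single process. It illustrates the model only and does not use the API of Thrill, HCE, or MR4C; all function names are hypothetical. It needs C++17 for structured bindings.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, int>;

// Map phase (hypothetical helper): emit (word, 1) for each whitespace-separated word.
std::vector<KV> map_words(const std::string& record) {
    std::vector<KV> out;
    std::istringstream in(record);
    std::string word;
    while (in >> word) out.emplace_back(word, 1);
    return out;
}

// Shuffle phase: group intermediate values by key. std::map keeps keys sorted,
// standing in for the sort-by-key step a framework performs before reducing.
std::map<std::string, std::vector<int>> shuffle(const std::vector<KV>& pairs) {
    std::map<std::string, std::vector<int>> groups;
    for (const auto& [key, value] : pairs) groups[key].push_back(value);
    return groups;
}

// Reduce phase: sum the counts collected for one key.
int reduce_sum(const std::vector<int>& values) {
    assert(!values.empty());  // every shuffled key has at least one value
    int total = 0;
    for (int v : values) total += v;
    return total;
}

// Driver: map every record, shuffle all pairs, reduce each group (in memory).
std::map<std::string, int> word_count(const std::vector<std::string>& records) {
    std::vector<KV> intermediate;
    for (const auto& r : records) {
        auto emitted = map_words(r);
        intermediate.insert(intermediate.end(), emitted.begin(), emitted.end());
    }
    std::map<std::string, int> result;
    for (const auto& [key, values] : shuffle(intermediate)) result[key] = reduce_sum(values);
    return result;
}
```

For example, `word_count({"the quick fox", "the lazy dog", "the fox"})` counts `the` three times, `fox` twice, and each other word once. A real framework distributes the map and reduce tasks across machines and moves the shuffled data over the network. Here, the in-memory grouping only models that step.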

Knowledge framework

1. C++
C++ Primer, Fifth Edition

2. Design patterns
Design Patterns: Elements of Reusable Object-Oriented Software

3. multithreading
C++ Concurrency in Action: Practical Multithreading

4. distributed algorithms
Distributed Systems: An Algorithmic Approach

5. parallel algorithms
Introduction to Parallel Computing, Second Edition

sheepdog join sequences

$ sheep -n  /var/lib/sheepdog,/mnt/sheep/0 -c zookeeper:192.168.2.45:2181,192.168.2.46:2181,192.168.2.47:2181

join sequence (3 steps)
========================================
1. slave node: send EVENT_JOIN to zookeeper
#0  add_join_event (msg=0x6708d0 <__sys+112>, msglen=491568) at cluster/zookeeper.c:799
#1  0x0000000000435a31 in zk_join (myself=0x670870 <__sys+16>, opaque=0x6708d0 <__sys+112>, opaque_len=491568) at cluster/zookeeper.c:1027
#2  0x000000000040abc6 in send_join_request () at group.c:1094
#3  0x000000000040bbd9 in create_cluster (port=7000, zone=-1, nr_vnodes=128, explicit_addr=false) at group.c:1435
#4  0x0000000000406356 in main (argc=6, argv=0x7fffffffe628) at sheep.c:963

2. master node: handle EVENT_JOIN, send EVENT_ACCEPT to zookeeper
#0  push_join_response (ev=0x7fffffefd2c0) at cluster/zookeeper.c:560
#1  0x0000000000435ccd in zk_handle_join (ev=0x7fffffefd2c0) at cluster/zookeeper.c:1078
#2  0x0000000000436af3 in zk_event_handler (listen_fd=16, events=1, data=0x0) at cluster/zookeeper.c:1310
#3  0x0000000000437f26 in do_event_loop (timeout=-1, sort_with_prio=false) at event.c:220
#4  0x0000000000437f5e in event_loop (timeout=-1) at event.c:230
#5  0x00000000004065ca in main (argc=6, argv=0x7fffffffe628) at sheep.c:1039

3. slave node: handle EVENT_ACCEPT, create path:/member/IPv4 ip:10.0.3.46 port:7000
#0  zk_handle_accept (ev=0x7fffffefd2c0) at cluster/zookeeper.c:1113
#1  0x0000000000436af3 in zk_event_handler (listen_fd=16, events=1, data=0x0) at cluster/zookeeper.c:1310
#2  0x0000000000437f26 in do_event_loop (timeout=-1, sort_with_prio=false) at event.c:220
#3  0x0000000000437f5e in event_loop (timeout=-1) at event.c:230
#4  0x00000000004065ca in main (argc=6, argv=0x7fffffffe628) at sheep.c:1039
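The three steps above amount to a small request/response protocol carried over ZooKeeper. The joining node queues EVENT_JOIN, the master answers with EVENT_ACCEPT, and the joining node then registers itself as a member. The toy C++ model below replays that order in one process, assuming every node reads the same event queue in the same order. `ToyZooKeeper`, `send_join`, `run_event_loop`, and the simplified `/member/<ip:port>` path format are hypothetical. Only the function names quoted in the comments come from the backtraces, and this is not sheepdog's implementation.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <string>
#include <vector>

// Event kinds mirroring EVENT_JOIN and EVENT_ACCEPT from the backtraces.
enum class EventType { Join, Accept };

struct Event {
    EventType type;
    std::string node;  // "ip:port" of the node that wants to join
};

// Hypothetical stand-in for ZooKeeper: one totally ordered event queue plus
// the set of member paths. Not sheepdog's data structures.
struct ToyZooKeeper {
    std::deque<Event> queue;
    std::set<std::string> members;
};

// Step 1, joining node: queue a join request (compare add_join_event()).
void send_join(ToyZooKeeper& zk, const std::string& node) {
    zk.queue.push_back({EventType::Join, node});
}

// Steps 2 and 3: drain the queue in order and record what each side does.
std::vector<std::string> run_event_loop(ToyZooKeeper& zk) {
    std::vector<std::string> trace;
    while (!zk.queue.empty()) {
        Event ev = zk.queue.front();
        zk.queue.pop_front();
        if (ev.type == EventType::Join) {
            // Step 2, master: answer with an accept (compare push_join_response()).
            assert(!zk.members.empty() && "a master must already be registered");
            zk.queue.push_back({EventType::Accept, ev.node});
            trace.push_back("master: accept " + ev.node);
        } else {
            // Step 3, joining node: register as a member (compare zk_handle_accept()).
            zk.members.insert("/member/" + ev.node);
            trace.push_back("slave: create /member/" + ev.node);
        }
    }
    return trace;
}
```

With a hypothetical master `10.0.3.45:7000` already in `members`, calling `send_join(zk, "10.0.3.46:7000")` and then `run_event_loop(zk)` yields two trace lines, the master's accept followed by the new member's path.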

docker usage

$ docker run --name test -d -it debian
275c44472aebd77c926d4527885bb09f2f6db21d878c75f0a1c212c03d3bcfab
$ docker attach test

docker commit 255efb733f97 minggr/nvmeof

docker login -u <user> -p <password>

docker push minggr/nvmeof
docker pull minggr/nvmeof