Using Erasure Coding System

Note

For a quick experience, please refer to Single-Deploy Mode.

Compilation and Building

$ git clone https://github.com/cubefs/cubefs.git
$ source build/cgo_env.sh
$ make blobstore

After successful building, the following executable files will be generated in the build/bin/blobstore directory:

├── build/bin/blobstore
│   ├── access
│   ├── clustermgr
│   ├── proxy
│   ├── scheduler
│   ├── blobnode
│   └── blobstore-cli

Cluster Deployment

Due to the interdependence of the modules, the deployment should be carried out in the following order to avoid deployment failure caused by service dependencies.

Basic Environment

Supported Platforms
Linux
Dependent Components
Kafkaopen in new window
Consulopen in new window
Language Environment
Goopen in new window (1.17.x)

Install Clustermgr

Note

Deploying Clustermgr requires at least three nodes to ensure service availability.

The node startup example is as follows. The node startup requires changing the corresponding configuration file and ensuring that the associated configuration between the cluster nodes is consistent.

Startup (three-node cluster)

# Example, it is recommended to use process management to start and stop services, the same below
nohup ./clustermgr -f clustermgr.conf
nohup ./clustermgr -f clustermgr1.conf
nohup ./clustermgr -f clustermgr2.conf

Cluster configuration of the three nodes, example node 1: clustermgr.conf

{
     "bind_addr":":9998",
     "cluster_id":1,
     "idc":["z0"],
     "chunk_size": 16777216,
     "log": {
         "level": "info",
         "filename": "./run/logs/clustermgr.log"
      },
     "auth": {
         "enable_auth": false,
         "secret": "testsecret"
     },
     "region": "test-region",
     "db_path":"./run/db0",
     "code_mode_policies": [
         {"mode_name":"EC3P3","min_size":0,"max_size":50331648,"size_ratio":1,"enable":true}
     ],
     "raft_config": {
         "server_config": {
             "nodeId": 1,
             "listen_port": 10110,
             "raft_wal_dir": "./run/raftwal0"
         },
         "raft_node_config":{
             "node_protocol": "http://",
             "members": [
                     {"id":1, "host":"127.0.0.1:10110", "learner": false, "node_host":"127.0.0.1:9998"},
                     {"id":2, "host":"127.0.0.1:10111", "learner": false, "node_host":"127.0.0.1:9999"},
                     {"id":3, "host":"127.0.0.1:10112", "learner": false, "node_host":"127.0.0.1:10000"}]
         }
     },
     "volume_mgr_config": {
         "allocatable_size": 10485760
     },
     "disk_mgr_config": {
         "refresh_interval_s": 10,
         "rack_aware":false,
         "host_aware":false
     }
}

Set the initial value of background task according to the mentioned in ClusterMgr Manage

#example balance
$> curl -X POST http://127.0.0.1:9998/config/set -d '{"key":"balance","value":"false"}' --header 'Content-Type: application/json'

Install BlobNode

Create the relevant directories in the compiled blobnode binary directory.

# This directory corresponds to the path in the configuration file
mkdir -p ./run/disks/disk{1..8} # Each directory needs to mount a disk to ensure data collection accuracy

Start the service.

nohup ./blobnode -f blobnode.conf

Example blobnode.conf:

{
   "bind_addr": ":8899",
   "cluster_id": 1,
   "idc": "z0",
   "rack": "testrack",
   "host": "http://127.0.0.1:8899",
   "dropped_bid_record": {
     "dir": "./run/logs/blobnode_dropped"
   },
   "disks": [
     {
       "path": "./run/disks/disk1",
       "auto_format": true,
       "disable_sync": true
     },
     {
       "path": "./run/disks/disk2",
       "auto_format": true,
       "disable_sync": true
     },
     {
       "path": "./run/disks/disk3",
       "auto_format": true,
       "disable_sync": true
     },
     {
       "path": "./run/disks/disk4",
       "auto_format": true,
       "disable_sync": true
     },
     {
       "path": "./run/disks/disk5",
       "auto_format": true,
       "disable_sync": true
     },
     {
       "path": "./run/disks/disk6",
       "auto_format": true,
       "disable_sync": true
     },
     {
       "path": "./run/disks/disk7",
       "auto_format": true,
       "disable_sync": true
     },
     {
       "path": "./run/disks/disk8",
       "auto_format": true,
       "disable_sync": true
     }
   ],
   "clustermgr": {
     "hosts": [
       "http://127.0.0.1:9998",
       "http://127.0.0.1:9999",
       "http://127.0.0.1:10000"
     ]
   },
   "disk_config":{
     "set_default_switch": true,
     "must_mount_point": true,
     "data_qos": {
       "read_mbps": 100,
       "write_mbps": 60,
       "background_mbps": 20
     }
   },
   "log": {
     "level": "info",
     "filename": "./run/logs/blobnode.log"
   },
   "auditlog":{
      "logdir":"./run/auditlog/blobnode"
   }
}

Install Proxy

proxy depends on the Kafka component and requires the creation of corresponding topics for blob_delete_topic, shard_repair_topic, and shard_repair_priority_topic in advance.

Note

Kafka can also use other topic names, but it is necessary to ensure that the Kafka of the Proxy and Scheduler service modules are consistent.

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic blob_delete shard_repair shard_repair_priority

Start the service. To ensure availability, at least one proxy node needs to be deployed in each IDC.

nohup ./proxy -f proxy.conf &

Example proxy.conf:

{
   "bind_addr": ":9600",
   "host": "http://127.0.0.1:9600",
   "idc": "z0",
   "cluster_id": 1,
   "clustermgr": {
     "hosts": [
       "http://127.0.0.1:9998",
       "http://127.0.0.1:9999",
       "http://127.0.0.1:10000"
       ]
   },
   "auth": {
       "enable_auth": false,
       "secret": "test"
   },
   "mq": {
     "blob_delete_topic": "blob_delete",
     "shard_repair_topic": "shard_repair",
     "shard_repair_priority_topic": "shard_repair_prior",
     "msg_sender": {
       "broker_list": ["127.0.0.1:9092"]
     }
   },
   "log": {
     "level": "info",
     "filename": "./run/logs/proxy.log"
   }
}

Install Scheduler

Start the service.

nohup ./scheduler -f scheduler.conf &

Example scheduler.conf: Note that the Scheduler module is deployed on a single node.

{
   "bind_addr": ":9800",
   "cluster_id": 1,
   "services": {
     "leader": 1,
     "node_id": 1,
     "members": {"1": "127.0.0.1:9800"}
   },
   "service_register": {
     "host": "http://127.0.0.1:9800",
     "idc": "z0"
   },
   "clustermgr": {
     "hosts": ["http://127.0.0.1:9998", "http://127.0.0.1:9999", "http://127.0.0.1:10000"]
   },
   "kafka": {
     "broker_list": ["127.0.0.1:9092"]
   },
   "blob_delete": {
     "max_batch_size": 10,
     "batch_interval_s": 2,
     "delete_log": {
       "dir": "./run/delete_log"
     }
   },
   "shard_repair": {
     "orphan_shard_log": {
       "dir": "./run/orphan_shard_log"
     }
   },
   "log": {
     "level": "info",
     "filename": "./run/logs/scheduler.log"
   },
   "task_log": {
     "dir": "./run/task_log"
   }
}

Install Access

Note

The Access module is a stateless service node and can be deployed on multiple nodes.

Start the service.

nohup ./access -f access.conf

Example access.conf:

{
     "bind_addr": ":9500",
     "log": {
         "level": "info",
         "filename": "./run/logs/access.log"
      },
     "stream": {
         "idc": "z0",
         "cluster_config": {
             "region": "test-region",
             "clusters":[
                 {"cluster_id":1,"hosts":["http://127.0.0.1:9998","http://127.0.0.1:9999","http://127.0.0.1:10000"]}]
         }
     }
}

Configuration Instructions

Deployment Tips

After the deployment of Clustermgr and BlobNode fails, residual data needs to be cleaned up before redeployment to avoid registration disk failure or data display errors. The command is as follows:

# Example for blobnode
rm -f -r ./run/disks/disk*/.*
rm -f -r ./run/disks/disk*/*

# Example for clustermgr
rm -f -r /tmp/raftdb0
rm -f -r /tmp/volumedb0
rm -f -r /tmp/clustermgr
rm -f -r /tmp/normaldb0
rm -f -r /tmp/normalwal0

Clustermgr adds learner nodes.

Note

Learner nodes are generally used for data backup and fault recovery.

Enable the Clustermgr service on the new node and add the member information of the current node to the configuration of the new service.

Call the member addition interface to add the newly started learner node to the cluster.

curl -X POST --header 'Content-Type: application/json' -d '{"peer_id": 4, "host": "127.0.0.1:10113","node_host": "127.0.0.1:10001", "member_type": 1}' "http://127.0.0.1:9998/member/add"

After the addition is successful, the data will be automatically synchronized.

The reference configuration is as follows: clustermgr-learner.conf:

{
     "bind_addr":":10001",
     "cluster_id":1,
     "idc":["z0"],
     "chunk_size": 16777216,
     "log": {
         "level": "info",
         "filename": "./run/logs/clustermgr3.log"
      },
     "auth": {
         "enable_auth": false,
         "secret": "testsecret"
     },
     "region": "test-region",
     "db_path":"./run/db3",
     "code_mode_policies": [
         {"mode_name":"EC3P3","min_size":0,"max_size":50331648,"size_ratio":1,"enable":true}
     ],
     "raft_config": {
         "server_config": {
             "nodeId": 4,
             "listen_port": 10113,
             "raft_wal_dir": "./run/raftwal3"
         },
         "raft_node_config":{
             "node_protocol": "http://",
             "members": [
                     {"id":1, "host":"127.0.0.1:10110", "learner": false, "node_host":"127.0.0.1:9998"},
                     {"id":2, "host":"127.0.0.1:10111", "learner": false, "node_host":"127.0.0.1:9999"},
                     {"id":3, "host":"127.0.0.1:10112", "learner": false, "node_host":"127.0.0.1:10000"},
                     {"id":4, "host":"127.0.0.1:10113", "learner": true, "node_host": "127.0.0.1:10001"}]
         }
     },
     "disk_mgr_config": {
         "refresh_interval_s": 10,
         "rack_aware":false,
         "host_aware":false
     }
}

After all modules are deployed successfully, the upload verification needs to be delayed for a period of time to wait for the successful creation of the volume.

Upload Testing

Refer to Quick Use for details.

Modifying Master Configuration

Modify the ebsAddr configuration item in the Master configuration file (more configuration references) to the Consul address registered by the Access node.

Creating Erasure-coded Volume

Refer to Creating a Volume.

Appendix

Encoding strategy: commonly used strategy table

Category	Description
EC12P4	{N: 12, M: 04, L: 0, AZCount: 1, PutQuorum: 15, GetQuorum: 0, MinShardSize: 2048}
EC3P3	{N: 3, M: 3, L: 0, AZCount: 1, PutQuorum: 5, GetQuorum: 0, MinShardSize: 2048}
EC16P20L2	{N: 16, M: 20, L: 2, AZCount: 2, PutQuorum: 34, GetQuorum: 0, MinShardSize: 2048}
EC6P10L2	{N: 6, M: 10, L: 2, AZCount: 2, PutQuorum: 14, GetQuorum: 0, MinShardSize: 2048}
EC12P9	{N: 12, M: 9, L: 0, AZCount: 3, PutQuorum: 20, GetQuorum: 0, MinShardSize: 2048}
EC15P12	{N: 15, M: 12, L: 0, AZCount: 3, PutQuorum: 24, GetQuorum: 0, MinShardSize: 2048}
EC6P6	{N: 6, M: 6, L: 0, AZCount: 3, PutQuorum: 11, GetQuorum: 0, MinShardSize: 2048}

Where

N: number of data blocks, M: number of check blocks, L: number of local check blocks, AZCount: number of AZs
PutQuorum: (N + M) / AZCount + N \<= PutQuorum \<= M + N
MinShardSize: minimum shard size, the data is continuously filled into the 0-N shards. If the data size is less than MinShardSize*N, it is aligned with zero bytes. See codeopen in new window for details.

Edit on GitHub

Quick Start

Operation Guide

FAQ

Using Erasure Coding System

Compilation and Building

Cluster Deployment

Basic Environment

Install Clustermgr

Install BlobNode

Install Proxy

Install Scheduler

Install Access

Configuration Instructions

Deployment Tips

Upload Testing

Modifying Master Configuration

Creating Erasure-coded Volume

Appendix

# Using Erasure Coding System

# Compilation and Building

# Cluster Deployment

# Basic Environment

# Install Clustermgr

# Install BlobNode

# Install Proxy

# Install Scheduler

# Install Access

# Configuration Instructions

# Deployment Tips

# Upload Testing

# Modifying Master Configuration

# Creating Erasure-coded Volume

# Appendix

Using Erasure Coding System

Compilation and Building

Cluster Deployment

Basic Environment

Install Clustermgr

Install BlobNode

Install Proxy

Install Scheduler

Install Access

Configuration Instructions

Deployment Tips

Upload Testing

Modifying Master Configuration

Creating Erasure-coded Volume

Appendix