Anyone following Ethereum has come across horror stories about how difficult it is to sync a node to the mainnet. We have all heard of nodes stuck in the syncing
process for days, of problems finding peers, and of the horrible consumption of limited SSD space!
You never know how accurate these stories are. Still, for a long time they were enough to keep me from trying.
My development needs were largely satisfied by running private Ethereum blockchains using Geth.
When it came to live testing and deployment, Infura would bridge the gap.
Earlier this year I finally set out to find out for myself what it takes to run a node. I wanted to answer three simple questions:
- How long would an initial sync take?
- How much storage space would it consume?
- How long would it take for me to resync a node that has been idle for some time?
The third question is in fact the most important of the three. Working on various projects, for different blockchains, means that I cannot really keep nodes
running all the time. A more realistic scenario for me is to sync to the blockchain on a fixed schedule, so that when needed, I only have to wait
a "short while" for the sync to complete.
My Machine Specs
To put everything into context, here are the node machine's basic specs:
Machine: Dell OptiPlex 3070 MT, Core i5-9500, 8GB RAM
SSD: Samsung 860 EVO SATA 2.5" SSD 1TB
OS: Windows 10
Geth Version: 1.9.11
Download Speed: 75Mbps
Geth Command-line
Geth can be downloaded from:
https://geth.ethereum.org/downloads/
Geth is being run with these command-line parameters:
geth
--syncmode fast
--datadir <Data Path>
--rpc
--rpcapi "debug,eth,net,web3,personal,admin,miner,txpool"
--rpcport 8545
--cache 2048
--etherbase <address>
For brevity, I won't discuss the parameters in detail. Just a few points to note:
--syncmode fast
- Fast syncing is the default. I am including it anyway to highlight the mode in use.
Light mode is not enough for my needs and full is too slow. Later, we will get an indication of how slow full sync mode is.
--datadir <Data Path>
- Identifies the node storage location. We will look at this directory size to measure storage consumption.
--rpcapi "debug,eth,..."
- Lists the interfaces to be exposed by this node. As a developer, I want to play
with many interfaces, hence the long list. However, be careful about exposing many interfaces through rpcapi. Each one opens up
access to our node, including interfaces we probably don't want others to reach.
Initial Sync
Running the above command, we start the sync process.
The first obvious question is: how far behind are we? If you go to an Ethereum blockchain explorer, you can get the highest block number, which at the time of writing was 9986931.
Once the node connects to a few peers and starts pulling the first blocks, look for logs starting with "Imported new block receipts".
Here are my filtered logs, showing the first few entries of this type:
Imported new block receipts count=2 elapsed=20.000ms number=2 hash=b495a1…4698c9 age=4y9mo3w size=1.69KiB
Imported new block receipts count=4 elapsed=19.049ms number=6 hash=1f1aed…6b326e age=4y9mo3w size=3.30KiB
Imported new block receipts count=570 elapsed=125.964ms number=576 hash=41a746…6a8b38 age=4y9mo3w size=407.94KiB
Imported new block receipts count=1 elapsed=88.031ms number=577 hash=c4cee3…93da3f age=4y9mo3w size=578.00B
Imported new block receipts count=481 elapsed=81.999ms number=1058 hash=2af79b…35d557 age=4y9mo3w size=349.01KiB
The number field shows the block number our node has reached. We can see it increment with every newly received batch of blocks.
The age field shows how far behind our node is in terms of time. Starting a new node now gives me 4 years, 9 months and 3 weeks.
That's how much time has passed since the first Ethereum block was mined. As the sync progresses, we will observe number going up and age going down.
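If you would rather track these two fields programmatically than watch them scroll past, a minimal Python sketch like the one below can pull them out of the log. It assumes each line contains the message text followed by space-separated key=value pairs, as in the filtered lines above; parse_log_fields and block_progress are hypothetical helper names, not part of Geth.

```python
import re

# key=value pairs as they appear in the filtered Geth log lines above, e.g. "number=576".
# The values in these lines contain no spaces, so \S+ is enough to capture them.
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_log_fields(line: str) -> dict:
    """Return the key=value fields of one log line as a dict of strings."""
    return dict(FIELD_RE.findall(line))


def block_progress(lines):
    """Yield (block number, age) for every 'Imported new block receipts' line."""
    for line in lines:
        if "Imported new block receipts" not in line:
            continue  # skip state entries and any other log messages
        fields = parse_log_fields(line)
        yield int(fields["number"]), fields["age"]


if __name__ == "__main__":
    sample = [
        "Imported new block receipts count=2 elapsed=20.000ms number=2 hash=b495a1…4698c9 age=4y9mo3w size=1.69KiB",
        "Imported new block receipts count=570 elapsed=125.964ms number=576 hash=41a746…6a8b38 age=4y9mo3w size=407.94KiB",
    ]
    for number, age in block_progress(sample):
        print(number, age)
```

Fed a real log, you would expect the number values to climb and the age strings to shrink towards minutes as the node approaches the chain tail.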
In addition to the blocks, our node will also be syncing state and generating log entries of the type "Imported new state entries". Here is a
snippet from my node:
Imported new state entries count=1920 elapsed=8.998ms processed=116260 pending=24675 retry=0 duplicate=0 unexpected=0
Imported new state entries count=1530 elapsed=4.000ms processed=117790 pending=24419 retry=0 duplicate=0 unexpected=0
Imported new state entries count=1920 elapsed=5.998ms processed=119710 pending=24828 retry=0 duplicate=0 unexpected=0
Imported new state entries count=2304 elapsed=8.033ms processed=122014 pending=22966 retry=0 duplicate=0 unexpected=0
Imported new state entries count=1536 elapsed=4.999ms processed=123550 pending=21957 retry=0 duplicate=0 unexpected=0
Again, processed will go up as the sync progresses. Indeed, state syncing is what consumes most of the time in a fast sync. At some
point our node will look as if it has almost reached the chain tail, but it will continue syncing state entries for many hours.
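In the snippet above, processed behaves like a running total: each value equals the previous one plus that line's count (116260 + 1530 = 117790, and so on). The Python sketch below checks this for the five lines shown. It is an observation from this snippet rather than something I have confirmed in Geth's source, and the helper names are hypothetical.

```python
import re

FIELD_RE = re.compile(r"(\w+)=(\S+)")  # space-separated key=value pairs


def state_entries(lines):
    """Yield (count, processed, pending) from 'Imported new state entries' lines."""
    for line in lines:
        if "Imported new state entries" in line:
            fields = dict(FIELD_RE.findall(line))
            yield int(fields["count"]), int(fields["processed"]), int(fields["pending"])


def is_running_total(lines) -> bool:
    """True if each processed value equals the previous processed plus this line's count."""
    rows = list(state_entries(lines))
    return all(prev[1] + cur[0] == cur[1] for prev, cur in zip(rows, rows[1:]))


snippet = [
    "Imported new state entries count=1920 elapsed=8.998ms processed=116260 pending=24675 retry=0 duplicate=0 unexpected=0",
    "Imported new state entries count=1530 elapsed=4.000ms processed=117790 pending=24419 retry=0 duplicate=0 unexpected=0",
    "Imported new state entries count=1920 elapsed=5.998ms processed=119710 pending=24828 retry=0 duplicate=0 unexpected=0",
    "Imported new state entries count=2304 elapsed=8.033ms processed=122014 pending=22966 retry=0 duplicate=0 unexpected=0",
    "Imported new state entries count=1536 elapsed=4.999ms processed=123550 pending=21957 retry=0 duplicate=0 unexpected=0",
]
print(is_running_total(snippet))
```

Note that pending does not shrink steadily in these lines, which fits the later observation that the number of outstanding state entries is hard to judge from the node alone.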
Monitoring the Sync Progress
Instead of looking at fast scrolling logs, we can grab the salient syncing counters by attaching a second Geth instance to the node. From a second console run:
geth attach http://localhost:8545
From there, query the syncing state using:
eth.syncing
currentBlock and pulledStates conveniently show the two counters of interest: the number of blocks and the number of state entries our node has retrieved.
highestBlock is the current highest block number, showing us how far behind the node is in terms of blocks.
knownStates shows the highest state count our node is aware of. This is not very useful, since it doesn't really give us an indication of how many state entries are outstanding.
startingBlock shows the block number from which our sync has started. Of course, a fresh sync starts from zero.
We can stop Geth at any time by pressing CTRL-C. Re-running the same Geth command line, syncing continues from where it left off.
startingBlock would now show the block number from which the new sync started.
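The same counters can also be read without a console, by calling the standard eth_syncing JSON-RPC method over HTTP. Below is a minimal Python sketch using only the standard library. It assumes the node was started as above (--rpc, --rpcport 8545, with eth in --rpcapi) and that every field in the result is a hex quantity, as in Geth 1.9; rpc_call, decode_syncing and print_status are hypothetical names.

```python
import json
import urllib.request

# Assumption: node started with --rpc --rpcport 8545 and "eth" listed in --rpcapi.
NODE_URL = "http://localhost:8545"


def rpc_call(method, params=None, url=NODE_URL):
    """Send a single JSON-RPC 2.0 request over HTTP and return its "result"."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": method, "params": params or []}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]


def decode_syncing(result):
    """Turn eth_syncing's hex-encoded counters into ints; False means "not syncing"."""
    if result is False:
        return False
    return {name: int(value, 16) for name, value in result.items()}


def print_status(url=NODE_URL):
    """Print roughly the counters that eth.syncing shows in the Geth console."""
    status = decode_syncing(rpc_call("eth_syncing", url=url))
    if status is False:
        print("eth_syncing returned false")
        return
    behind = status["highestBlock"] - status["currentBlock"]
    print(f"currentBlock={status['currentBlock']} ({behind} blocks behind), "
          f"pulledStates={status['pulledStates']}, knownStates={status['knownStates']}")
```

Calling print_status() from a second console while the node syncs should give the same picture as eth.syncing, with the hex values already converted to decimal.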
State Syncing is the Key
The highest block number can easily be determined from a blockchain explorer or the highestBlock value. But what about the highest state entry count? This is not readily available
from any blockchain explorer. Instead, there is a GitHub thread where people regularly post the highest observed state count:
https://github.com/ethereum/go-ethereum/issues/14647
The last post in the thread gives a count that our node will definitely have to reach and exceed.
At the time of writing, the last reported state count is dated 14th April 2020 and has the value of 487,040,102.
Without getting into any complicated Maths, we can pick a couple of values and get a better approximation, assuming the state count increases at a constant rate:
17 Mar 2020 - 466,184,476
14 Apr 2020 - 487,040,102 (+28 days @ 744,843/day)
So at the time of writing (2nd May 2020), 18 days after the last post, I know that I would need to wait for at least:
487,040,102 + 18 * 744,843 = 500,447,276
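The same back-of-the-envelope extrapolation can be written as a few lines of Python. Like the estimate above, it assumes a constant daily increase and rounds the daily rate down to a whole number; the function names are illustrative only.

```python
from datetime import date


def daily_rate(d1, count1, d2, count2):
    """Average state entries added per day between two observations (rounded down)."""
    return (count2 - count1) // (d2 - d1).days


def estimate_state_count(d_known, count_known, rate, d_target):
    """Linearly extrapolate the state count from a known observation to d_target."""
    return count_known + (d_target - d_known).days * rate


def percent_done(pulled_states, target):
    """Rough sync progress, comparing the node's pulledStates with the estimated target."""
    return 100 * pulled_states / target


rate = daily_rate(date(2020, 3, 17), 466_184_476, date(2020, 4, 14), 487_040_102)
target = estimate_state_count(date(2020, 4, 14), 487_040_102, rate, date(2020, 5, 2))
print(rate, target)  # 744843 500447276
```

Feeding the node's current pulledStates into percent_done gives a rough progress figure, which is only as good as the linear assumption behind target.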
Of course, this is a very rough estimate. However, at some point our block count will start looking
as if it is perpetually stuck, just short of reaching the chain tail. At that point we might as well
ignore the block count and look exclusively at the state count. Estimating our target state count
helps us stay cool during this phase.
With patience, we should finally reach the synced state. Running eth.syncing will now simply return false.
Blockchain Resyncing
My first sync operation completed on 2nd March 2020 after syncing for 2 days and 18 hours (all the stats are summarized at the end).
However, the most interesting data was collected in the weeks that followed. As already pointed out, I am very interested in the resyncing time.
So, following the initial sync, I resynced the node approximately every 7 days and timed the process.
An important fact to observe is that once the first sync has completed, fast sync mode is no longer available, and resyncs run in full sync mode. The log
will mostly show "Imported new chain segment" entries. Measured in blocks per second, the resync rate is more than an order of magnitude slower than that of the first fast sync.
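Since resync time is what I most wanted to measure, here is a minimal Python sketch of how that timing could be automated against the node's HTTP RPC endpoint. I am not claiming the figures below were collected this way. It assumes the RPC settings from the command line above; eth_syncing and time_resync are hypothetical helper names. Because eth_syncing also returns false before a sync has started (while the node is still finding peers, for instance), the sketch first waits for a non-false result and then for the next false one.

```python
import json
import time
import urllib.request

NODE_URL = "http://localhost:8545"  # assumption: --rpc with --rpcport 8545, as above


def eth_syncing(url=NODE_URL):
    """Return the raw eth_syncing result: False, or a dict of hex-encoded counters."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": "eth_syncing", "params": []}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["result"]


def time_resync(poll=eth_syncing, interval=60.0, sleep=time.sleep, clock=time.monotonic):
    """Measure how long the node reports that it is syncing.

    Returns (seconds, start block, end block). The timing resolution is `interval`,
    and the end block is the last highestBlock seen before eth_syncing went false.
    """
    status = poll()
    while status is False:  # sync has not started yet
        sleep(interval)
        status = poll()
    started = clock()
    start_block = int(status["startingBlock"], 16)
    last = status
    while status is not False:  # still syncing
        last = status
        sleep(interval)
        status = poll()
    end_block = int(last["highestBlock"], 16)
    return clock() - started, start_block, end_block
```

The poll, sleep and clock parameters exist so the logic can be exercised without a live node. A single false reading is treated as the end of the sync; a more careful version might require several in a row. Stopping Geth is still best left to CTRL-C, so that it shuts down cleanly.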
Initial Sync Data
| Metric | Value |
|---|---|
| First sync total time | 2 days 18 hours |
| First sync completion date | 2nd March 2020 |
| First sync block height | 9591728 |
| First sync blockchain age (approx.) | 4 years 7 months 3 weeks |
| Last sync completion date | 1st May 2020 |
| Last sync block height | 9982207 |
| Last sync storage size | 268GB |
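Two derived numbers help put this table in perspective: the average block rate of the initial fast sync, and how quickly the chain grew while the node was kept in sync. The short Python calculation below uses only the figures in the table; the variable names are mine.

```python
from datetime import date

# Figures from the Initial Sync Data table.
first_sync_seconds = (2 * 24 + 18) * 3600  # 2 days 18 hours
first_sync_height = 9_591_728
last_sync_height = 9_982_207
days_between = (date(2020, 5, 1) - date(2020, 3, 2)).days

# Blocks only: this average hides the fact that most of the time went into state sync.
fast_sync_rate = first_sync_height / first_sync_seconds
# Average chain growth between the first and the last sync.
blocks_per_day = (last_sync_height - first_sync_height) / days_between

print(f"fast sync: {fast_sync_rate:.1f} blocks/sec")
print(f"chain growth: {blocks_per_day:.0f} blocks/day over {days_between} days")
```

That works out to roughly 40 blocks per second for the initial fast sync and about 6,500 new blocks per day, which is useful context for the resync rates below.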
Resyncing Data
| Sync Date | Time Start | Time End | Time Taken (h:mm:ss) | Time Taken (sec) | Start Block | End Block | Total Blocks | Block Rate (blk/sec) | Age at Start |
|---|---|---|---|---|---|---|---|---|---|
| 2-Apr | 14:31:04 | 20:41:57 | 6:10:53 | 22253 | 9742655 | 9794210 | 51555 | 2.32 | 1w15h51m |
| 11-Apr | 13:42:22 | 21:21:54 | 7:39:32 | 27572 | 9795215 | 9852805 | 57590 | 2.09 | 1w1d13h |
| 16-Apr | 12:33:59 | 16:22:53 | 3:48:54 | 13734 | 9853639 | 9884032 | 30393 | 2.21 | 4d12h6m |
| 24-Apr | 13:02:32 | 20:20:15 | 7:17:43 | 26263 | 9885743 | 9936790 | 51047 | 1.94 | 1w14h18m |
| 1-May | 14:19:11 | 21:31:11 | 7:12:00 | 25920 | 9937598 | 9982200 | 44602 | 1.72 | 6d15h1m |
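The derived columns (time taken, total blocks, block rate) can be recomputed from the raw start and end times and block numbers. A small Python check, with the data typed in from the table; derive is a hypothetical helper, and it assumes each resync started and finished on the same day, as all of these did.

```python
from datetime import datetime

# (sync date, time start, time end, start block, end block) from the Resyncing Data table.
RESYNCS = [
    ("2-Apr", "14:31:04", "20:41:57", 9742655, 9794210),
    ("11-Apr", "13:42:22", "21:21:54", 9795215, 9852805),
    ("16-Apr", "12:33:59", "16:22:53", 9853639, 9884032),
    ("24-Apr", "13:02:32", "20:20:15", 9885743, 9936790),
    ("1-May", "14:19:11", "21:31:11", 9937598, 9982200),
]


def derive(start, end, start_block, end_block):
    """Return (seconds taken, total blocks, blocks per second) for one same-day resync."""
    fmt = "%H:%M:%S"
    seconds = int((datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds())
    blocks = end_block - start_block
    return seconds, blocks, blocks / seconds


total_s = total_b = 0
for day, start, end, b0, b1 in RESYNCS:
    s, b, rate = derive(start, end, b0, b1)
    total_s += s
    total_b += b
    print(f"{day:>6}: {s:6d} s {b:6d} blocks {rate:.2f} blk/sec")
print(f"overall: {total_b / total_s:.2f} blk/sec")
```

The recomputed values match the table, and across all five resyncs the node averaged about 2 blocks per second in full sync mode.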
From the resyncing data, it looks like we have to allocate approximately 1 hour of syncing for every day our
node is lagging behind. Note the huge difference between the fast sync mode used in the initial sync and the
full sync mode used when resyncing. Remember, fast sync caught up with over 4 years 7 months of data in under 3 days.
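To check the "one hour per day" rule of thumb against the table, the Python sketch below divides each resync's duration by how far behind the node was at the start (the Age at Start column). The age parser is a hypothetical helper that only understands the w/d/h/m units used in the table.

```python
import re

# (time taken in seconds, age at start) from the Resyncing Data table.
RESYNCS = [
    (22253, "1w15h51m"),
    (27572, "1w1d13h"),
    (13734, "4d12h6m"),
    (26263, "1w14h18m"),
    (25920, "6d15h1m"),
]

UNIT_HOURS = {"w": 168, "d": 24, "h": 1, "m": 1 / 60}  # weeks, days, hours, minutes


def age_in_days(age: str) -> float:
    """Convert an age string such as '1w15h51m' into days."""
    hours = sum(int(n) * UNIT_HOURS[u] for n, u in re.findall(r"(\d+)([wdhm])", age))
    return hours / 24


def hours_per_day_of_lag(seconds: int, age: str) -> float:
    """Hours spent resyncing for each day the node was behind."""
    return (seconds / 3600) / age_in_days(age)


ratios = [hours_per_day_of_lag(s, a) for s, a in RESYNCS]
print([round(r, 2) for r in ratios])
print(f"mean: {sum(ratios) / len(ratios):.2f} hours per day of lag")
```

For these five runs the ratio falls between roughly 0.8 and 1.1 hours per day. That is consistent with the chain growing by about 6,500 blocks a day (the block heights in the Initial Sync Data table rise by about 390,000 over 60 days) and full sync replaying them at about 2 blocks per second: 6,500 / 2 is about 3,250 seconds, a little under an hour.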
Concluding Remarks
My Ethereum node syncing went fairly smoothly. I consider the results to be more than reasonable. With their regular updates, the Ethereum core team are clearly delivering many improvements. If in the past your experience was less pleasant, you may want to give it another shot.
Useful Links
Go Ethereum
What is the upper bound of "imported new state entries"?
Geth progress when switching to trie download