Let’s take a look at the third component of the blockchain database: how data are sent and received over the network. Say you use a client application like a Bitcoin wallet to generate some transaction data. If you are using a traditional database to record it, this process is easy. You just send it over to the web address of the database server and it’ll be quickly received and processed. If you’re using a relatively centralized blockchain with a limited number of nodes like Ripple, this process is also easy. You just broadcast the data to all the nodes simultaneously and they’ll receive and process the data in a relatively short timeframe.
Things get a little bit more complicated if you’re using a distributed blockchain like Bitcoin. Instead of a few central parties, you get a network with hundreds of thousands to millions of nodes. Because of network latency, uploading the transaction to all the nodes yourself would take a really long time. Therefore, in a distributed blockchain, some compromise has to be implemented to make this process more time-efficient. The setting will look familiar to you if you have used peer-to-peer file sharing apps like BitTorrent. If not, don’t worry. Things will be quite clear after this video.
Say you have a network of nodes carrying the blockchain. Let’s use Bitcoin as an example, where the nodes are also called miners. Suppose you want to send some Bitcoin that you have to another person. This means getting the transaction record uploaded onto the Bitcoin blockchain. So you’ll use your wallet app to generate the transaction data, which, as we’ll see shortly, is just a programming script. But instead of sending it to the entire network, you’re going to broadcast the transaction data to a few miners that are closest to you in terms of network latency. The miners will propagate the data on the network using what’s called a gossip protocol.
As its name suggests, the protocol is similar to how gossip spreads among a network of friends. You wouldn’t shout the gossip aloud to everyone, but instead you whisper it to the people sitting next to you. Same process here. The miners will propagate the transaction to the miners that are closest to them in terms of network latency. This process goes on and on until the entire network has received the data. Graph theory suggests that this form of information propagation is more efficient than the setting where you broadcast the information to everybody yourself. In practice, though, this has two limitations.
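To make the whispering idea concrete, here is a minimal sketch of gossip-style propagation. It is a simplified flooding model, not Bitcoin’s actual relay logic: the `peers` map, the node names, and the `gossip_rounds` function are all hypothetical, and every hop is assumed to take one "round".

```python
from collections import deque


def gossip_rounds(peers, origin):
    """Return {node: round in which it first heard the message}.

    Each node only forwards the message to its own direct peers,
    and a node ignores a message it has already heard.
    """
    heard = {origin: 0}
    frontier = deque([origin])
    while frontier:
        node = frontier.popleft()
        for neighbor in peers[node]:
            if neighbor not in heard:
                heard[neighbor] = heard[node] + 1
                frontier.append(neighbor)
    return heard


# Hypothetical 6-node network: each node only knows a few nearby peers.
peers = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}

rounds = gossip_rounds(peers, "A")
print(rounds)  # node F, the farthest from A, first hears the message in round 4
```

Notice that node A only ever talks to B and C, yet everyone ends up with the message. The cost is that distant nodes hear it later, which is exactly the delay discussed next.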
The obvious one is that, despite being more efficient, it will still take some time for the information to go through. Particularly if the network is large or the data is large in size, it can still take minutes or even hours. Therefore, most blockchain variations implement some sort of time cut-off. For instance, in Bitcoin, a block is cut roughly every 10 minutes. If your data didn’t make it into the block, you’ll have to wait for it to get into the next block about 10 minutes later. This process, however, introduces a second problem. Sometimes the 10-minute cutoff is not enough time for the data to be propagated across the entire network.
So at the end of every interval, some nodes would have received it and some wouldn’t. Each node would have a different set of transaction information, received from the wallet apps closest to it. This conflict has to be resolved, and this is the reason for the fourth blockchain component, the consensus mechanism, which we’ll discuss in the next video. Now that we know how the data is propagated across the network, let’s take a closer look at the data themselves and see how they are generated and received. Again, as I said before, most blockchains use a scripting language. So the data broadcast and stored on the blockchain are essentially some programming code and the associated inputs and outputs.
This is an example of the Bitcoin transaction data that any wallet could generate. Now with the knowledge from the previous couple of videos, we can almost make sense of it. It’s a simple script with some inputs and outputs. The input has a hash pointer containing the hash of a previous transaction, proving that you have received the Bitcoin. Next, scriptSig is just a script that uses your private key to sign the transaction. As an output, you’re going to put the amount that you want to send and the receiver’s “shipping address”, which is just a hash of their public key.
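If it helps to see the pieces laid out, here is a rough sketch of that structure as a Python dictionary. This is an illustration only: real Bitcoin transactions use a binary serialization, and real addresses are derived with RIPEMD-160 over SHA-256 plus an encoding step. The field names, the JSON-based `tx_id`, the SHA-256-only `address_from_pubkey`, and the placeholder keys are all assumptions made for the example.

```python
import hashlib
import json


def sha256d(data: bytes) -> str:
    """Double SHA-256, returned as a hex string."""
    return hashlib.sha256(hashlib.sha256(data).digest()).hexdigest()


def address_from_pubkey(pubkey: bytes) -> str:
    """Toy address: a hash of the public key (simplified, see note above)."""
    return hashlib.sha256(pubkey).hexdigest()[:40]


def tx_id(tx: dict) -> str:
    """Toy transaction hash over a JSON serialization (an assumption)."""
    return sha256d(json.dumps(tx, sort_keys=True).encode())


# An earlier transaction that paid 50 units to Alice.
prev_tx = {
    "inputs": [],
    "outputs": [{"amount": 50, "address": address_from_pubkey(b"alice-pubkey")}],
}

# Alice's new transaction, sending those 50 units on to Bob.
tx = {
    "inputs": [
        {
            "prev_tx": tx_id(prev_tx),  # hash pointer to where the coins came from
            "output_index": 0,          # which output of that transaction
            "script_sig": "<signature> <alice-pubkey>",  # placeholder, not a real signature
        }
    ],
    "outputs": [
        {"amount": 50, "address": address_from_pubkey(b"bob-pubkey")},
    ],
}

print(tx_id(tx))
```

The key point is the `prev_tx` field: the new transaction does not say "Alice has 50 coins", it points at the earlier transaction that gave them to her.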
Finally, there are a couple of scripts for the miners to execute to check the validity of the transaction, including checking the signature to make sure that it matches your public key and also checking the transaction amount to make sure that you’re not spending more than you have. That’s essentially how your data is processed in a distributed blockchain like Bitcoin. Let’s take another 30,000-foot view. The transaction data is generated as a script using the client app and signed using your private key. When the miners receive the script, they execute it to check that the transaction has a valid signature and a valid amount, then propagate it to the other miners.
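Here is a small sketch of the checks a miner might run before passing a transaction along. It is a simplification under stated assumptions: the `validate` function, the `utxos` lookup table, and the field names are hypothetical, and the signature check is passed in as a function because real Bitcoin uses ECDSA, which is not reproduced here. The `toy_verify` stand-in below is NOT real cryptography.

```python
import hashlib


def pubkey_hash(pubkey: bytes) -> str:
    """Toy address derivation: SHA-256 of the public key (simplified)."""
    return hashlib.sha256(pubkey).hexdigest()


def validate(tx, utxos, verify_signature):
    """Return True if every input is spendable by its signer and
    the outputs do not exceed the inputs."""
    total_in = 0
    for inp in tx["inputs"]:
        key = (inp["prev_tx"], inp["index"])
        if key not in utxos:
            return False  # unknown or already-spent coins
        prev_out = utxos[key]
        if pubkey_hash(inp["pubkey"]) != prev_out["address"]:
            return False  # public key does not match the address that was paid
        if not verify_signature(inp["pubkey"], inp["signature"], tx["outputs"]):
            return False  # signature does not match the public key
        total_in += prev_out["amount"]
    total_out = sum(out["amount"] for out in tx["outputs"])
    return total_out <= total_in  # not spending more than you have


def toy_verify(pubkey, signature, outputs):
    """Stand-in for a real signature check. NOT real cryptography."""
    return signature == b"signed-by:" + pubkey


# Unspent outputs the miner knows about: 50 units locked to Alice's key.
utxos = {("aa" * 32, 0): {"amount": 50, "address": pubkey_hash(b"alice-pubkey")}}

good_tx = {
    "inputs": [{"prev_tx": "aa" * 32, "index": 0,
                "pubkey": b"alice-pubkey",
                "signature": b"signed-by:alice-pubkey"}],
    "outputs": [{"amount": 30, "address": pubkey_hash(b"bob-pubkey")}],
}

print(validate(good_tx, utxos, toy_verify))  # True
```

If you change the output amount to 60, or sign with a different key, the same function returns False and the miner would simply drop the transaction instead of propagating it.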
Once it officially makes it onto the blockchain after consensus is reached, the receiver can then repeat the process and send the coin that they just received elsewhere. The final piece of the blockchain data processing puzzle is the notion of blocks. As we saw a couple of minutes ago, blocks are there simply to serve as a “batch processing” mechanism to enhance efficiency and enforce a time cut-off for data to be broadcast across the network. In this respect, the blockchain network is similar to the ACH network that we talked about before rather than the credit card network. Sure, you can use each transaction as a block and process them in real time just like a credit card transaction.
But usually it’s more efficient to do at least some batch processing like an ACH transaction. Different blockchains use different parameters for block size and time interval, and there’s a lot of flexibility there. The interval ranges from around 10 minutes in Bitcoin to between 10 and 20 seconds in Ethereum. During this interval, the nodes receive the data as usual. Instead of sending each one through in real time, they’re going to group them into a pending block. Again, because of network latency or even attack attempts, this pending block could be different for each node, as each pending block will contain a different set of transactions that the node has received so far.
At the end of each interval, the nodes are going to reconcile their pending blocks using some consensus algorithm. So at the end, only one block makes it to the chain and is downloaded by all the nodes. This process then repeats. Within each block, the data is usually organized by the nodes themselves using, for instance, Merkle trees. Here, the important part is that, because of the decentralized nature of the network, there’s no hard and fast rule that requires the nodes to organize the data strictly on a first-come-first-served basis. In fact, the nodes have complete discretion on whether to receive the data and how to organize it within the block.
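Since Merkle trees came up, here is a minimal sketch of how a node could condense all the transactions in a block into a single root hash. It follows the Bitcoin-style convention of double SHA-256 and of duplicating the last hash when a level has an odd number of entries. The function names and the byte-string "transactions" are placeholders for illustration.

```python
import hashlib


def sha256d(data: bytes) -> bytes:
    """Double SHA-256, returned as raw bytes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions):
    """Hash the transactions pairwise, level by level, down to one root."""
    if not transactions:
        raise ValueError("a block needs at least one transaction")
    level = [sha256d(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


block_a = [b"tx1", b"tx2", b"tx3"]
block_b = [b"tx2", b"tx1", b"tx3"]  # same transactions, different order

print(merkle_root(block_a).hex())
print(merkle_root(block_a) == merkle_root(block_b))  # False: order changes the root
```

The second print shows why the ordering chosen by the node matters: two nodes holding the same transactions in a different order end up with different roots, and therefore different pending blocks.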
In many cases, you might have to pay to get your data received and broadcast. As we’ll see in the module on cryptocurrencies, this could serve as an important incentive for the nodes to participate in the network.
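To illustrate that discretion, here is a sketch of one policy a node might use when filling its pending block: take the transactions that pay the most per unit of size first, until the block is full. This is just one plausible policy, not a rule of any particular blockchain. The `build_pending_block` function, the `fee` and `size` fields, and the numbers are all made up for the example.

```python
def build_pending_block(waiting, max_size):
    """Greedily pick transactions by fee per unit of size, highest first."""
    ranked = sorted(waiting, key=lambda tx: tx["fee"] / tx["size"], reverse=True)
    block, used = [], 0
    for tx in ranked:
        if used + tx["size"] <= max_size:
            block.append(tx["id"])
            used += tx["size"]
    return block


# Transactions listed in the order they arrived at the node.
waiting = [
    {"id": "t1", "fee": 10, "size": 100},  # fee rate 0.10
    {"id": "t2", "fee": 50, "size": 200},  # fee rate 0.25
    {"id": "t3", "fee": 5, "size": 250},   # fee rate 0.02
    {"id": "t4", "fee": 30, "size": 100},  # fee rate 0.30
]

print(build_pending_block(waiting, max_size=400))  # ['t4', 't2', 't1']
```

Transaction t1 arrived first but is placed last, and t3 is left waiting for a later block because it pays the least for the space it takes up.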