Principles and Analogies of Error Correction Codes

I use a very abstract, visual way to explain different error correction algorithms. Here I will over-simplify every algorithm and use visualizations instead, just to get across the principle of how each of them works, and why.

Linear Block Code

In a linear block code, every codeword has n - k bits of redundant checking part and k bits of message.

Linear Block Code

The generator matrix looks like the one below. It contains an identity matrix; the remaining (parity) part defines the parity-check matrix that is later used to compute a syndrome and identify errors.

Linear Block Code Table
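
To make this concrete, here is a minimal sketch (in Python with NumPy, my own illustration rather than the exact matrices in the figures) of encoding with a systematic generator matrix G = [I | P] and detecting an error through the syndrome:

```python
import numpy as np

# A small (n=7, k=4) systematic linear block code: G = [I_k | P].
# This particular P is just a convenient example (it is the parity part
# of the classic (7,4) Hamming code).
P = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])
k, r = P.shape                                  # k message bits, r = n - k check bits
G = np.hstack([np.eye(k, dtype=int), P])        # generator matrix
H = np.hstack([P.T, np.eye(r, dtype=int)])      # parity-check matrix

msg = np.array([1, 0, 1, 1])
codeword = msg @ G % 2                          # encode: k bits -> n bits

received = codeword.copy()
received[2] ^= 1                                # flip one bit to simulate a channel error

syndrome = received @ H.T % 2                   # non-zero syndrome => error detected
print("codeword :", codeword)
print("received :", received)
print("syndrome :", syndrome)                   # equals the column of H at the error position
```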

Hamming Code

The picture below shows how to identify an error with even parity bits. The bits in black are the message, the bits in green are the parity bits that make the number of 1s in each circle even, and the bit in red is the error, which can later be identified because the overlapping circle areas show where the error occurred.

Hamming Code Diagram

The Hamming Code works as shown in the table below. The parity bits (unlike in the previous method) are inserted at certain positions (2^0, 2^1, 2^2 and 2^3) so that every message bit is covered by a distinct combination of parity bits, and the error can later be tracked down.

Hamming Code Table
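
Here is a minimal sketch (my own illustration, using the small (7,4) version of the code) of this positional scheme: parity bits sit at the power-of-two positions, and the syndrome directly spells out the position of a single error.

```python
def hamming_encode(data_bits):
    """Place data bits at the non-power-of-two positions, then fill in parity."""
    n = 7                                   # (7,4) code: 4 data bits + 3 parity bits
    code = [0] * (n + 1)                    # index 0 unused; positions 1..7
    data = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):                 # not a power of two -> data position
            code[pos] = next(data)
    for p in (1, 2, 4):                     # even parity over the positions each p covers
        code[p] = sum(code[pos] for pos in range(1, n + 1)
                      if pos & p and pos != p) % 2
    return code[1:]

def hamming_find_error(codeword):
    """Return the 1-based position of a single-bit error, or 0 if none."""
    code = [0] + list(codeword)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[pos] for pos in range(1, 8) if pos & p) % 2:
            syndrome += p
    return syndrome

cw = hamming_encode([1, 0, 1, 1])
cw[4] ^= 1                                  # flip the bit at position 5
print(hamming_find_error(cw))               # -> 5
```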

Cyclic Code

This is my favorite. This method uses the properties of polynomials over GF(2), so identifying errors can be done by dividing the codeword by the generator polynomial g(x): if the remainder is not 0, there is an error. Another thing that I like about this method is that every cyclic shift of a codeword is again a codeword (that's why the method is called "cyclic"), so the implementation is very easy, just shifting here and there (using an LFSR), which in a hardware implementation is practically "costless".

Cyclic Code Table
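
A minimal sketch (my own illustration, using the (7,4) cyclic code with g(x) = x^3 + x + 1) of this divide-and-check idea:

```python
def gf2_mod(dividend, divisor):
    """Remainder of polynomial division over GF(2), with bits packed into integers."""
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        shift = dividend.bit_length() - dlen
        dividend ^= divisor << shift        # "subtract" (XOR) a shifted copy of g(x)
    return dividend

g = 0b1011                                  # g(x) = x^3 + x + 1, generator of the (7,4) cyclic code
msg = 0b1101                                # message polynomial m(x)

# Systematic encoding: append the remainder of m(x) * x^3 divided by g(x).
codeword = (msg << 3) | gf2_mod(msg << 3, g)

print(bin(codeword))                        # 0b1101110
print(gf2_mod(codeword, g))                 # 0 -> divisible by g(x), no error
print(gf2_mod(codeword ^ (1 << 4), g))      # non-zero -> error detected
```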

BCH Code

This method is a little bit complicated, but the point is to build a parity-check matrix H that screens every bit (that's why there are two rows that are linearly redundant).

BCH Code
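
A rough sketch (my own illustration, not the exact matrix in the figure) of how such a parity-check matrix H is built: each block of rows comes from a power of a primitive element alpha of GF(2^4) evaluated at alpha^0 .. alpha^14, and the blocks for even powers are the linearly redundant ones mentioned above.

```python
# Powers of alpha in GF(16), primitive polynomial x^4 + x + 1.
alpha_pow, x = [], 1
for _ in range(15):
    alpha_pow.append(x)
    x <<= 1
    if x & 0b10000:
        x ^= 0b10011                        # reduce modulo x^4 + x + 1

def gf16_bits(value):
    """Expand a GF(16) element into its 4 binary coordinates."""
    return [(value >> i) & 1 for i in range(4)]

# For a double-error-correcting BCH code of length 15, H conceptually stacks
# GF(16) rows built from alpha^i, alpha^(2i), alpha^(3i), alpha^(4i), i = 0..14.
# The alpha^(2i) and alpha^(4i) blocks are GF(2)-linear combinations of the
# earlier blocks (squaring is linear over GF(2)), which is the redundancy above.
H = []
for power in (1, 2, 3, 4):
    for bit in range(4):
        H.append([gf16_bits(alpha_pow[(power * i) % 15])[bit] for i in range(15)])

for row in H:
    print(row)
```
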
Reed-Muller Code

This one is interesting. This method has orders that determine how many errors can be identified. Look at the matrix below. R(1,3) means that the order is 1 and there are 3 variables (so the codewords are 2^3 = 8 bits long). In Reed-Muller the pattern is obvious: each vector in the matrix (x1, x2, x3) is used to narrow down the location of an error, and each row directs the algorithm to a smaller space of possible error locations.
Reed Muller 1
For R(2,3) we have the same three variables, but this time in order 2. This means that the matrix is expanded and thus more errors can be detected. The part of the matrix in yellow holds the "basic" vectors, while the green part shows the "additional" vectors that help locate more errors.

Now look at R(3,3). More rows of the matrix, and more errors to be detected.


Reed Muller 3
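
A minimal sketch (my own illustration) of how these generator matrices are built: the rows are the all-ones vector, the variable vectors x1..xm, and, for higher orders, their products.

```python
from itertools import combinations

def reed_muller_generator(r, m):
    """Rows = evaluations of all monomials of degree <= r at the 2^m points."""
    n = 2 ** m
    xs = [[(point >> i) & 1 for point in range(n)] for i in range(m)]   # x1 .. xm
    rows = [[1] * n]                                # the constant (order-0) row
    for degree in range(1, r + 1):
        for combo in combinations(range(m), degree):
            row = [1] * n
            for i in combo:                         # product of the chosen variables
                row = [a & b for a, b in zip(row, xs[i])]
            rows.append(row)
    return rows

for label, rows in (("R(1,3)", reed_muller_generator(1, 3)),
                    ("R(2,3)", reed_muller_generator(2, 3)),
                    ("R(3,3)", reed_muller_generator(3, 3))):
    print(label, "has", len(rows), "rows")
    for row in rows:
        print("  ", row)
```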

So, oversimplifying, I can say that an error correction code is, in principle, a way to encode a message (into a codeword) so that it can get through a channel reliably. Error correction codes belong to channel coding, while data compression belongs to source coding; in source coding the target is not reliability but efficiency. Reliability is achieved by making sure that we can locate errors and fix them. The way to decode is like the quadtree illustration below: the methods generate matrices, equations, or mathematical structures (polynomial groups, etc.) that help narrow down the search space.

Quadtree

Hopefully this article is useful. I will teach Reed-Solomon codes next week; let's see if I can add more things here or publish a new blog post.


Intel’s foray into Personal Data

So this is getting very interesting: The world’s largest chip maker wants to see a new kind of economy bloom around personal data (article here).

It looks like Intel is entering the personal data & big data narrative. Given that Intel owns a considerable chunk of the motherboard & SoC real-estate (think processors, discrete TPMs, AMT, etc.), they do indeed have access to the plumbing of my machine.

One question is whether hardware and chipset providers will begin to require end-users to agree to Terms of Service (allowing them to access data bits moving around the board). Such a move would complicate the user's life. A typical person would then be forced to accept TOS and EULAs at four layers (at least):

  • The hardware layer
  • The OS layer (think EULAs)
  • The application layer (think EULAs when installing Office productivity tools)
  • The Services Provider (SP) and IdP layer (think click-through agreements when signing up for accounts)


UMA Presentation from IIW#16

Eve Maler kindly prepared an excellent set of slides for me to present at IIW#16 in Mountain View, CA in late April: UMA_for_IIW16_2013-05

After discussions during the presentation, I believe one of the technical issues that still causes confusion is the fact that UMA uses three (3) distinct OAuth2.0 Tokens:

  • AAT Tokens: Authorization API Tokens — this OAuth2.0 token is used by the Client to prove (to the Authorization Server) that it has authorization to access the APIs at the Authorization Server.
  • PAT Tokens: Protection API Tokens — this OAuth2.0 token is used by the Resource Server to prove (to the Authorization Server) that the Resource Owner (e.g. Alice) has provided it (the Resource Server) with authorization to register Alice’s resource at the Authorization Server.
  • RPT Tokens: Requesting Party Tokens — this OAuth2.0 token provides authorization for the Requesting Party to access resources at the Resource Server.

Here are the key take-aways:

  1. All three tokens are OAuth2.0 tokens.
  2. All three tokens are issued by the Authorization Server (or what used to be called the Authorization Manager in UMA).
  3. All three tokens should ideally be used in conjunction with the relevant parts of the UMA Binding Obligations (BO) spec. The BO spec tells the parties involved what their legal obligations will be.
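
To make the three token roles concrete, here is a rough sketch in Python (with hypothetical endpoint URLs, field names and placeholder token values; not the normative UMA flow) of which party presents which token to whom:

```python
import requests

AS = "https://as.example.com"            # Authorization Server
RS = "https://rs.example.com"            # Resource Server

PAT = "example-pat"                      # placeholder; held by the Resource Server
AAT = "example-aat"                      # placeholder; held by the Client
permission_ticket = "example-ticket"     # placeholder; obtained earlier from the AS

# 1. The Resource Server uses its PAT to register Alice's resource at the AS.
requests.post(f"{AS}/resource_registration",
              headers={"Authorization": "Bearer " + PAT},
              json={"name": "alice-photo-album"})

# 2. The Client uses its AAT at the AS's authorization API to obtain an RPT.
rpt = requests.post(f"{AS}/rpt_request",
                    headers={"Authorization": "Bearer " + AAT},
                    json={"ticket": permission_ticket}).json()["rpt"]

# 3. The Requesting Party's client presents the RPT to the Resource Server.
requests.get(f"{RS}/alice/photos",
             headers={"Authorization": "Bearer " + rpt})
```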