## Thursday, July 31, 2008

### Large Linear Codes

The problem of size for binary codes has been easily fixed in terms of their length-- the original codes used machine words for codewords, and the new ones use bitsets instead. This will allow for arbitrary length codes, however dimension is still an issue, for much deeper reasons. The current approach is to think of the 2k words as additional points of the underlying bipartite graph being examined, and to run the usual algorithm. However, this seems inefficient, since the permutation group we're looking for is one that acts on the columns only.

In Leon's original paper on backtracking algorithms, he describes his method of working with a linear binary [n,k] code C. The idea is to produce a set of vectors V in F2n such that
1. V is invariant under Aut(C),
2. V is a relatively small set, and
3. [Aut(V):Aut(C)] is very small (the runtime rises rapidly as a function of this index).
He then describes several ways of trying to find a suitable such V, such as looking at the set of codewords of C of minimal weight, or those of the orthogonal complement of C. He also gives a way to pare down the invariant set by defining Vc,h = {v in V : wt(v+w) = c for exactly h elements w of V}. Then you use the refinement process coming from the nonlinear code V, checking at the bottom of the partition stacks whether the permutations are automorphisms of C.

Another approach to refinement is to never store such large sets in memory. We want to avoid o(2k) memory usage, so we only ever store a basis for the code, and we only keep track of the column partitions. This way memory usage will be around O(nk). However, this will require a different approach to refinement, which will either be effective in refining partitions and run in time O(n2k), or won't be very effective but will run in time O(nk). The idea for the prior is to generalize the notion of degree between a point p and a cell: each word which has a 1 at position p has a certain number w of 1's in the cell. Then the degree is an (n+1)-tuple (w_0, ..., w_n), where w_i is the number of words which has a 1 in position p and i 1's in the cell. This requires a gray code traversal of every word in the code, so time must be O(n2k), but the constants will be very nice: for each word, we need a few operations for the gray code traversal, a few operations for mask and looking up hamming weight, and the n is actually divided by the radix.

So we have three approaches:

1. Make a gigantic bipartite graph and analyze that.
2. Use refinements of a small (often difficult to find) invariant set whose symmetries are not much bigger than those of our code.
3. Use a small memory profile and submit to more frequent gray code traversals of the code.

None of these really solves the dimension problem (*), however the second and particularly the third make it feasible to at least ask the question of a large code. Given the potential of these algorithms for parallelization, the third option becomes particularly appealing, since the overhead of copying data from worker to worker would be small, and the underlying time-consuming operations could at least be cut down by the brute force of many processors. (*)- The dimension problem seems to be necessary in all of this, since the crucial difference between an effective and an ineffective refinement process is the ability to tell the weight of words in the code (loosely). The difficulty of problems reducing to weights of words in codes is well known...