Thursday, July 31, 2008

Large Linear Codes

The problem of size for binary codes has been easily fixed in terms of their length-- the original codes used machine words for codewords, and the new ones use bitsets instead. This will allow for arbitrary length codes, however dimension is still an issue, for much deeper reasons. The current approach is to think of the 2k words as additional points of the underlying bipartite graph being examined, and to run the usual algorithm. However, this seems inefficient, since the permutation group we're looking for is one that acts on the columns only.

In Leon's original paper on backtracking algorithms, he describes his method of working with a linear binary [n,k] code C. The idea is to produce a set of vectors V in F2n such that
  1. V is invariant under Aut(C),
  2. V is a relatively small set, and
  3. [Aut(V):Aut(C)] is very small (the runtime rises rapidly as a function of this index).
He then describes several ways of trying to find a suitable such V, such as looking at the set of codewords of C of minimal weight, or those of the orthogonal complement of C. He also gives a way to pare down the invariant set by defining Vc,h = {v in V : wt(v+w) = c for exactly h elements w of V}. Then you use the refinement process coming from the nonlinear code V, checking at the bottom of the partition stacks whether the permutations are automorphisms of C.

Another approach to refinement is to never store such large sets in memory. We want to avoid o(2k) memory usage, so we only ever store a basis for the code, and we only keep track of the column partitions. This way memory usage will be around O(nk). However, this will require a different approach to refinement, which will either be effective in refining partitions and run in time O(n2k), or won't be very effective but will run in time O(nk). The idea for the prior is to generalize the notion of degree between a point p and a cell: each word which has a 1 at position p has a certain number w of 1's in the cell. Then the degree is an (n+1)-tuple (w_0, ..., w_n), where w_i is the number of words which has a 1 in position p and i 1's in the cell. This requires a gray code traversal of every word in the code, so time must be O(n2k), but the constants will be very nice: for each word, we need a few operations for the gray code traversal, a few operations for mask and looking up hamming weight, and the n is actually divided by the radix.

So we have three approaches:

  1. Make a gigantic bipartite graph and analyze that.
  2. Use refinements of a small (often difficult to find) invariant set whose symmetries are not much bigger than those of our code.
  3. Use a small memory profile and submit to more frequent gray code traversals of the code.

None of these really solves the dimension problem (*), however the second and particularly the third make it feasible to at least ask the question of a large code. Given the potential of these algorithms for parallelization, the third option becomes particularly appealing, since the overhead of copying data from worker to worker would be small, and the underlying time-consuming operations could at least be cut down by the brute force of many processors. (*)- The dimension problem seems to be necessary in all of this, since the crucial difference between an effective and an ineffective refinement process is the ability to tell the weight of words in the code (loosely). The difficulty of problems reducing to weights of words in codes is well known...

Tuesday, July 29, 2008

Binary Codes

I'm currently working on refactoring the binary code part of sage into the new framework, and I'm trying to get my head clear on exactly what is to be done. I'll be rewriting a binary code class/struct where degree is arbitrary (the underlying implementation will use bitsets for the basis). In order to retain focus, I won't be tying this in to Sage's general linear codes until after my work for Google is done, keeping my work in the newly created sage/groups/perm_gps/partn_ref module.

To adapt incidence structures to the algorithm in general, the standard procedure is to produce a bipartite graph whose vertices are divided into two sets: points and lines of the incidence structure. However, the automorphism group of an incidence structure is a permutation group acting on the points of the structure, which induces an action on the lines. Binary codes are an example of incidence structures, with additional linear structure.

However, the bipartite graph approach can be more expensive-- this is in essence the solution to the riddle (item 4 here) of the projective planes of order 16 (which I will confirm experimentally later, once the machinery is there). Instead, the refinement tree only needs to be partitions of the points. Although it will be handy to keep the partition of the words of the code around, this need not be in the main algorithm, since it will only be used on refinement.

Tuesday, July 22, 2008


Adinkras are visual symbols ("originally created by the Akan of Ghana and the Gyaman of Cote d'Ivoire in West Africa" - wikipedia) representing different concepts. In the paper by Faux and Gates, the term "Adinkra" was used to refer to an edge-colored bipartite graph with height assignments on the vertices which contains all the data of a certain type of representation of the supersymmetry algebra. They look like this:

In classifying the off-shell representations of the supersymmetry algebra, Adinkras are a key tool. I have spent the last few days applying the new Google-funded code to classify (undashed) Adinkras, for a paper in preparation. On the first day, other than writing up a rather naive implementation of a classification algorithm, I spent most of my time hunting for this bug. The second day, today, was spent optimizing the algorithm (from 173.54 secs to 88.74 secs for all N=5 Adinkras).

Here is how to compute the canonical label and automorphism group of an undashed Adinkra. An isomorphism of Adinkras is a map of the vertices which leaves the heights unchanged, which is a graph isomorphism on the underlying graph with the option of permuting the colors which label the edges. Thus we form a partition of the vertices by height. Next, an edge u,v with color c becomes a triple of edges. One new vertex is inserted to represent the edge, and two edges come from this vertex to the vertices incident with the original edge. The third edge goes from a vertex representing the original edge to a vertex representing the color c. The partition then gets a cell consisting of the edges, and a cell consisting of the colors. Thus the edges may permute amongst themselves and the colors may permute amongst themselves.

Friday, July 18, 2008

Patch #1

I submitted a patch to trac today. In it, there is a new module, sage.groups.perm_gps.partn_ref, which has a module for generic partition backtrack, tree_traversal, and a module for graph isomorphism/automorphism groups, refinement_graphs. The function search_tree in refinement_graphs is meant to ultimately replace the function of the same name in graph_isom, which should eventually become obsolete and removed. All tests that the original passes, the replacement now also passes. This is my first Google sponsored patch!

I'd like to add two notes. There were two places in graph_isom.pyx where I wasn't exactly sure what was going on, and I now have explanations for both of them:

# TODO: investigate the following line
if nu.k == -1: nu.k = 0 # not in BDM,
# broke at G = Graph({0:[], 1:[]}),
# Pi = [[0,1]], lab=False

Resolved! In fact, k should never be -1 here, and this was a false fix for a different bug (since squished). I've verified in the new version that this never ever happens.

# TODO: investigate why, in practice, the
# same does not seem to be true for hzf < k...
# BDM had !=, not <, and this broke at
# G = Graph({0:[],1:[],2:[]}), Pi = [[0,1,2]]
if nu.k < hzf: state = 8; continue

Resolved! This is similar to the last "fix"- the comments in graph_isom.pyx are wrong. hzf is the length for which indicator values line up, so for there to be an automorphism, nu must line up with zeta all the way, exactly. I've verified in the new version that != here instead does work.

Wednesday, July 16, 2008

Graph Isomorphism

I have just reproduced some of the graph isomorphism doctests with the new partition refinement code. Of the two files meant to replace graph_isom.pyx, one is over a thousand lines and one is around 300 lines. The graph-specific parts become the 300 lines, and the generic stuff is most of it. I'm currently scraping graph_isom.pyx for useful tests of this new software. In a day or two, I should have a complete replacement for the current graph isomorphism code, which will show up on trac as a patch. I'm particularly excited to demonstrate some nontrivial calculations using the new software, such as the sizes of the automorphism groups of the dodecahedron and the cube (search_tree_old is a function with the same signature as the function in graph_isom.pyx which uses the new architecture):

sage: import sage.groups.perm_gps.partn_ref.refinement_graphs
sage: st = sage.groups.perm_gps.partn_ref.refinement_graphs.search_tree_old

sage: G = graphs.DodecahedralGraph()
sage: Pi = [range(20)]
sage: st(G, Pi, order=True)[2]

sage: G = graphs.CubeGraph(3)
sage: G.relabel()
sage: Pi = [G.vertices()]
sage: st(G, Pi, order=True)[2]

Friday, July 11, 2008

Phase 1 Complete

Again, grateful thanks to Google for funding my summer work.

"Phase 1," by which I mean rewriting the common parts of sage.graphs.graph_isom and sage.coding.binary_code, is now finished. All of the variable names are logically labeled, so that in fact many of the comments in the original sage.graphs.graph_isom code have been simply removed. Now, reading the code itself is much like reading the comments in the original.

I have also tested the code written so far with valgrind, and fixed the errors that were found. For each of the problem specific functions, I have implemented a trivial version, which essentially does nothing. The first, which refines partitions, returns the partition given. The second, which allows for determining that all the nodes below the current position, always returns false. The third, which compares structures under a permutation, always returns true, i.e. that the permutation is an isomorphism of the structure in question. Basically, this means that given a partition with blocks of size n(1), ..., n(k), the algorithm just computes Sn(1) x ... x Sn(k).

In theory, this should be ready for arbitrary refinement functions and structures, so Phase 2 of my project will be to implement as many specific such things as possible. I'd like to try graphs, block designs (including binary codes), computing the automorphism group of a matrix (permuting columns such that the set of rows is preserved), reflexive polytopes (perhaps using some of the functionality of PALP, perhaps not...), Adinkras, and certain kinds of combinatorial species.

On a side note, I've said a few times to people that it's nice to be getting funding for "what I'd be doing anyway," but I have to admit that it has been making me much more productive in my work than I might have been otherwise. Thanks again, Google!

Wednesday, July 9, 2008

Partition Refinement Snapshot

I've just passed the thousand line mark on the partition backtrack code I'm writing this summer (thanks to the support of Google!), so I thought I'd take a snapshot of the work done so far and post it. As you can see, I've taken care of the datastructures, supporting functions, memory management, and I'm through the main loop itself. The naming conventions are better, the code is much more self-explanatory, and the loop is structured in a logical way, instead of a mazelike circuit of GOTOs. You can continue to watch my progress through the guts of NICE here, as I am trying to keep it current with what I am doing.

Sunday, July 6, 2008

Variable names, people!

Once again thanks go to Google for funding my summer work!

If you've ever been frustrated with unhelpful variable names, perhaps you'll appreciate the planned changes for the partition backtrack algorithm I'm working on currently:

Theta -> orbits_of_subgroup
index -> subgroup_primary_orbit_size
size -> subgroup_size
L -> max_len_of_fp_and_mcr
l -> index_in_fp_and_mcr
Phi -> fixed_points_of_generators
Omega -> minimal_cell_reps_of_generators
W -> vertices_to_split
nu -> current_ps
zeta -> first_ps
rho -> label_ps
h -> first_meets_current
hb -> label_meets_current
hh -> current_kids_are_same
ht -> first_kids_are_same
Lambda -> current_indicators
zf -> first_indicators
zb -> label_indicators
hzf -> first_and_current_indicator_same
hzb -> label_and_current_indicator_same