Sunday, September 14, 2008

Double cosets implemented

I have finally implemented the "missing link" for doing canonical augmentation right: the computation of double-coset type problems. Suppose you have two properties P_L and P_R such that the elements of a permutation group G satisfying each form a subgroup, and a third property P. Further suppose that if any permutation in G satisfies P, then the set of permutations satisfying P forms both a left coset of P_L and a right coset of P_R. This is the kind of problem the double-coset approach tackles.
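
To make the coset structure concrete, here is a tiny brute-force check in pure Python (an illustration only, not the actual implementation): take P to be the property "g maps a graph X onto a graph Y", so that P_L is Aut(X) and P_R is Aut(Y), under the composition convention (g*h)(x) = g(h(x)).

    from itertools import permutations

    def image(edges, g):
        return {frozenset((g[u], g[v])) for u, v in edges}

    def mult(g, h):                    # composition: (g*h)(x) = g(h(x))
        return tuple(g[h[x]] for x in range(len(g)))

    X = {frozenset(e) for e in [(0, 1), (1, 2)]}   # the path 0-1-2
    Y = {frozenset(e) for e in [(1, 0), (0, 2)]}   # the same path, relabeled

    G   = list(permutations(range(3)))             # ambient group: S_3
    P_L = [g for g in G if image(X, g) == X]       # Aut(X)
    P_R = [g for g in G if image(Y, g) == Y]       # Aut(Y)
    I   = [g for g in G if image(X, g) == Y]       # permutations satisfying P

    g0 = I[0]
    assert set(I) == {mult(g0, b) for b in P_L}    # a left coset of P_L
    assert set(I) == {mult(a, g0) for a in P_R}    # a right coset of P_R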

In particular, this gives a more efficient isomorphism test than the usual "canonical label times two" approach. Essentially, only one tree structure is traversed instead of two, and in the worst case this single traversal takes about as long as the two traversals together. So the best case will be a little more than half the time of the other approach.
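
For contrast, here is a minimal sketch of the "canonical label times two" approach in pure Python, with an exhaustive minimization standing in for the two backtrack tree traversals:

    from itertools import permutations

    def canon(edges, n):
        """Brute-force canonical form: the lexicographically smallest
        relabeling of the edge set (a stand-in for one tree traversal)."""
        best = None
        for p in permutations(range(n)):
            e = tuple(sorted(tuple(sorted((p[u], p[v]))) for u, v in edges))
            if best is None or e < best:
                best = e
        return best

    X = [(0, 1), (1, 2), (2, 3), (3, 0)]     # a 4-cycle
    Y = [(0, 2), (2, 1), (1, 3), (3, 0)]     # the same cycle, relabeled
    print(canon(X, 4) == canon(Y, 4))        # True: isomorphic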

So far I have implemented randomized testing of the new code for graphs only, but everything now seems to be working, and what I have is posted here.

This was the last obstacle before canonical augmentation itself could be tackled. Currently, in the code which implements augmentation for binary codes, the step which should be accomplished by a double-coset approach is farmed out to GAP, in an inefficient way. Essentially, GAP computes a group intersection and a coset traversal in order to decide whether (C,P) ~ (C,M). Instead, using refinements for each structure in the pair, we can adapt the new double coset program (flexible, thanks to the generalized framework) to compute this very efficiently. In practice, GAP can take up to 70% of CPU time when used in this way, whereas the new approach is expected to account for a much smaller fraction of the computation time. This should speed up the classification under way here.
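
As a rough illustration of the question GAP is answering, here is a brute-force sketch in pure Python; the real computation uses a group intersection and a coset traversal rather than enumeration, and the encoding of P and M as point sets is just for illustration.

    def group_closure(gens, n):
        """All elements of the group generated by gens (permutations of
        range(n) as tuples), computed by breadth-first closure."""
        elems = {tuple(range(n))}
        frontier = list(elems)
        while frontier:
            new = []
            for g in frontier:
                for s in gens:
                    h = tuple(s[g[x]] for x in range(n))
                    if h not in elems:
                        elems.add(h)
                        new.append(h)
            frontier = new
        return elems

    def equivalent(P, M, aut_gens, n):
        """Does some element of the group generated by aut_gens (think
        Aut(C)) carry the point set P to the point set M?"""
        P, M = frozenset(P), frozenset(M)
        return any(frozenset(g[x] for x in P) == M
                   for g in group_closure(aut_gens, n))

    gens = [(1, 0, 2, 3), (1, 2, 3, 0)]          # these generate S_4
    print(equivalent({0, 1}, {2, 3}, gens, 4))   # True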

Tuesday, September 9, 2008

Matrix Automorphism Groups

Suppose you are given an arbitrary matrix, and you wish to know the set of permutations of the columns such that the (unordered) set of rows remains the same. This is often called the automorphism group of the matrix, and it can be computed as follows, using the ability to compute automorphism groups of hypergraphs.

First, suppose there are N distinct symbols in your matrix. Construct N matrices of the same size as your original matrix, one for each symbol: put a 1 in each position which contains the corresponding symbol in the original and a 0 everywhere else, and you get a collection of (0,1)-matrices. Think of the rows of these N matrices as the blocks of N hypergraphs. Each permutation in the matrix automorphism group must also be an automorphism of each of these hypergraphs. Note that you can freely delete any zero rows from these matrices without harm.
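
Here is a small sketch of this construction, assuming the matrix is given as a list of rows:

    def symbol_hypergraphs(M):
        """For each distinct symbol s in the matrix M, return the hypergraph
        whose blocks are the sets of column indices where a row carries s;
        empty blocks (zero rows) are dropped."""
        symbols = {x for row in M for x in row}
        hypergraphs = {}
        for s in symbols:
            blocks = []
            for row in M:
                b = frozenset(j for j, x in enumerate(row) if x == s)
                if b:                  # delete zero rows freely
                    blocks.append(b)
            hypergraphs[s] = blocks
        return hypergraphs

    M = [['a', 'b', 'b'],
         ['b', 'a', 'a']]
    print(symbol_hypergraphs(M))   # 'a': blocks {0}, {1,2}; 'b': {1,2}, {0}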

Finally, if you have refinement methods for computing automorphism groups of hypergraphs, you can easily construct a refinement method for the matrix automorphism group by repeatedly refining against these hypergraphs until the partition stabilizes. The other two functions needed for a specific implementation are very easy to write for matrix automorphism, and there you have it.
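
Schematically, the driver loop looks something like the following sketch, where refine_against is a hypothetical stand-in for the hypergraph refinement procedure:

    def refine_until_stable(partition, hypergraphs, refine_against):
        """refine_against(partition, H) is assumed to return a partition at
        least as fine as its input; since partitions can only get finer,
        this fixed-point loop terminates."""
        while True:
            new = partition
            for H in hypergraphs:
                new = refine_against(new, H)
            if new == partition:
                return partition
            partition = new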

Hypergraphs are due to appear sometime tomorrow (this is the plan, anyway) in Sage 3.1.2, and matrix automorphism groups, built on top of them, are already coded. They should appear in Sage 3.1.3. Thanks once again to Google for funding this work.

Tuesday, September 2, 2008

Hypergraphs

Also known as incidence structures, nonlinear binary codes, block designs and (0,1)-matrices... The object is a collection of points P and a collection of blocks, which are subsets of P. The action of S_P on points induces an action on subsets of P, and the subgroup of S_P which takes blocks to blocks is the automorphism group of the incidence structure.
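
For example, the automorphism group of the Fano plane can be computed by brute force (the backtrack methods exist precisely to avoid this kind of enumeration):

    from itertools import permutations

    # The Fano plane: 7 points and 7 blocks (lines) of size 3.
    fano = [frozenset(b) for b in
            [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
             (1, 4, 6), (2, 3, 6), (2, 4, 5)]]
    blocks = set(fano)
    aut = [p for p in permutations(range(7))
           if {frozenset(p[x] for x in b) for b in fano} == blocks]
    print(len(aut))   # 168, the order of PGL(3,2)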

This is the latest application of the generalized partition backtrack methods for computing automorphism groups and canonical labels. It is implemented with surprising similarity to the linear case (for example, the exact same refinement procedure is used). In fact, one of the optimizations Leon uses is to take some subset of the words of a linear binary code and use them for refinement. With this new code in place, it will be trivial to implement such an optimization. However, choosing which subset of words, and then finding them all, is not a simple problem -- his approach is to simply take the set of words of minimum weight. This does substantially reduce the time taken to refine each partition, but it may have the adverse side effect of enlarging the search tree, depending on the specific code.
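
Purely for illustration, here is a naive way to collect the minimum-weight words of a binary linear code from a generator matrix over GF(2), given as a list of rows (Leon's actual selection is of course part of a far more careful implementation):

    from itertools import product

    def min_weight_words(G):
        """All nonzero codewords of minimum weight, by enumerating every
        linear combination of the rows of G over GF(2)."""
        k = len(G)
        words = set()
        for coeffs in product((0, 1), repeat=k):
            w = tuple(sum(c * gi for c, gi in zip(coeffs, col)) % 2
                      for col in zip(*G))
            if any(w):
                words.add(w)
        m = min(sum(w) for w in words)
        return [w for w in words if sum(w) == m]

    G = [[1, 0, 0, 0, 0, 1, 1],   # a generator matrix of the
         [0, 1, 0, 0, 1, 0, 1],   # [7,4] Hamming code
         [0, 0, 1, 0, 1, 1, 0],
         [0, 0, 0, 1, 1, 1, 1]]
    print(len(min_weight_words(G)))   # 7 words of weight 3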

Currently the code is valgrinding, to find the usual segfaults and other human errata. A patch should then appear on trac soon. Thanks again to Google for funding all this!