06/12/08

01:03:53 pm, Categories: Configuration management, Augeas

Augeas on linux.com

linux.com has a very nice article on Augeas; it's a good overview and introduction.

As embarrassed as I am by the bug the author ran into (blank lines in /etc/hosts threw Augeas' parser off), I am glad to say that it's fixed in the most recent version, 0.2.0. The bug and its fix underscore a bigger point, though: by basing your config-mangling script on Augeas rather than parsing files yourself, you benefit from others finding and fixing bugs in Augeas' parsing logic.

Some of the comments on the article are a little confused about Augeas' purpose. It's not meant for situations where you'd be perfectly happy to use vi; its main purpose is to make scripting configuration changes easier.

05/16/08

01:31:49 pm, Categories: Programming, Configuration management, Augeas

Augeas 0.1.1

I just released Augeas 0.1.1. I hadn't really planned it, but the last two weeks turned out to be mostly spent fixing bugs (besides the regular expression enhancement I blogged about previously, even though the real reason for doing that was that the typechecker had a serious bug, and subtraction of regular languages is needed to make the fixed typechecker usable).

The reference counting code in the interpreter had some serious leaks. I had known about them for a while, but never tried to track them down systematically, partly because I thought it would be way too hairy. As it turned out, they weren't that hard to track down; the key ingredient in squashing them was writing little test scripts that only exercised a small number of operations, like

  let l = key /a/

and then running Valgrind a lot, and gdb a little. Of course, the real trick is to figure out what little toy scripts to write ...

Besides memory leaks, I also realized, using Valgrind's massif tool, that compiled regular expressions are huge, and I was hanging on to them for way too long.

With all that, Augeas 0.1.1 has no known memory leaks and uses a reasonable amount of memory. Most of the credit for that goes to Valgrind, which is an amazingly useful tool.

05/13/08

06:14:30 pm, Categories: Programming, Augeas

Fun with regular languages

For Augeas, I wanted to support subtraction of regular expressions, so that you can say

  let key_re = /[A-Za-z]+/ - /(Allow|Deny)(Groups|Users)/

which would make key_re match all words made up of lowercase and uppercase letters except for AllowGroups, AllowUsers, DenyGroups and DenyUsers. The reason is that those four special cases are handled differently from "generic" keys.

Since the - operator can't be expressed directly in regular expression notation, key_re has to be constructed by compiling its two operands into finite automata, subtracting the second automaton from the first, and then converting the resulting automaton back into a regular expression. All these operations, except for the conversion from automaton to regular expression, were already supported by libfa.

Implementing the conversion was quite a bit of fun, and the implementation follows almost literally the proof [pdf] that every language recognized by a finite automaton is regular. For some reason, these graph algorithms are always fun to implement, especially when they wind up working ;)

05/05/08

10:42:24 am, Categories: Configuration management, Augeas

Augeas - a configuration API

A while ago I had what would have been a hallway conversation with Mark if we worked in the same office (or country, for that matter). Something he said got me thinking that it might be possible to get a better handle on the mess of file formats in /etc. It could be done in a way that hides much of the pain those different formats inflict when config files need to be changed programmatically. It's actually nice that config data is stored in text files for interactive use (yes, that means vi), but it's a smoldering trainwreck when changes need to be scripted.

Editorial Note: we apologize for the length of this entry. If you don't want to read all this slipslop, feel free to go straight to the Augeas website. Just tell lutter how much fun you had reading his blog.

It's a commonplace that the colorful variety of files and file formats used to configure the average Linux (or Unixy) system keeps us from having any sort of API to modify config data, and that any attempt to change that is doomed. Almost exactly a year ago, I argued precisely that point (convincingly, I thought). I said the best we can hope for is a few better tools for each service to modify its configuration. Maybe we can even build something on top of those tools, but that's about as far as any such attempt could ever go in practice.

After that non-hallway conversation, it dawned on me that the various attempts to deal with this situation boiled down to three different approaches:

  1. Bear it and smile: if you are unfortunate enough to have to make config changes, fiddle with sed or awk or the equivalent regexp functionality in your favorite scripting language long enough to make those changes. Then keep your fingers crossed that nobody with an "unusual" file will ever run your script; unusual might be as simple as whitespace at the end of a line.
  2. Propose a real API with a real data store. The individual approaches vary, but it usually boils down to exposing a tree through the API, and storing config data in LDAP/a relational DB/XML/anywhere else anybody has ever stored data. Once that's implemented, all we need is for every program that reads config data from one of those files in /etc to use that really good API.
  3. Use templating; expose some form of API that in the backend just fills values into some sort of template and writes that into the right place in /etc.

All these have been tried, and they all have serious limitations:

  1. The sed approach is the most widely used, and its problems are pretty well understood. It works reasonably well for simple file formats, but changing dhcpd.conf that way is not for the faint of heart. The bigger problem is that it's just no way to build an API, and the same "solution" for editing config file X gets reinvented again and again, for a variety of reasons. Maybe that excellent script to change X is written in Python, but Perl is needed. Maybe the code is impossible to find, buried deep in the guts of something else. Or, most likely, config editing is a pain and nothing can be done about it, so people suck it up and get away from it, quick.
  2. The unified API approach generally starts with a lot of good thought and a good list of goals, usually way beyond just editing files. In practice, these efforts go nowhere. If they don't collapse under the weight of all the really good things you can do with an API on top of editing, reality comes to kill them. Upstream generally doesn't jump up and down at the opportunity to change their code, and for a very valid reason: the API is completely unproven, and there's no guarantee that it will ever be widely accepted. Very strong negative network effects kill any config API that requires upstream changes to be useful.
  3. Templating works in some situations, but has the huge drawback that the templating mechanism is the only one that will ever be allowed to touch the "real" config files. Besides, coming up with templating schemes that work well for a wide variety of uses and a reasonable set of config files is hard.

With all this in mind, my list of requirements for Augeas roughly looked like this:

  • Make programmatic edits of config data easy and reliable, and build a simple API around that. Edits should lead to intuitively "minimal" diffs; in particular preserve comments and formatting details.
  • Do as little beyond that as possible. In systems management, premature modeling is the root of all evil.
  • Make it reasonably easy to describe config file formats and how data should be exposed through the API. Ideally, these descriptions can be improved incrementally as their use turns up inevitable flaws in the descriptions.
  • Augeas must be useful without any changes to upstream code.
  • No additional data store. The only data that Augeas can use is what's in the config files, together with the description of file formats.

After banging my head against the above for a while, and learning most, if not all, of the ways in which not to achieve it, I came across some work by the programming languages group at Penn, in particular Harmony and Boomerang. That work sent me down the right path. Because of the nice theoretical foundation laid by Harmony and Boomerang, Augeas checks descriptions statically (i.e., before they are ever used to modify a single file) to guard against a whole host of possible problems. Some of these problems are quite subtle, and are much easier for a computer to detect than for a human.

The Augeas website contains an introductory tour showing how the API is used, and more details on how file formats are described.

03/07/08

10:18:11 am, Categories: Programming

Finite Automata in C

Recently, I needed a finite automaton library written in C. (For those of you who don't remember your formal language classes too well, finite automata are the theoretical underpinning of regular expressions.) In a nutshell, a finite automaton represents the set of all strings matching a regular expression.

Such a library is a little different from regular expression matchers like GNU regex, of which there are plenty. It supports operations that are impossible with regular expressions alone, such as intersection and the more exotic task of deciding ambiguity.

Unfortunately, there isn't a well-maintained open-source C library to do that. Luckily for me, there is a very well-written Java library, dk.brics.automaton by Anders Møller. Based on that, I just finished implementing libfa. It is built as a separate DSO, but it's not yet distributed separately from Augeas. If you need an FA library and libfa seems useful, drop me a line and I'll split it out; it's mostly a matter of wrestling with autotools.

If you're curious what you can do with finite automata that you can't do with regular expressions alone, deciding whether a concatenation is ambiguous is a good example. The concatenation r1r2 of two regular expressions is ambiguous if there is a string upv, with p non-empty, such that both u and up match r1 and both pv and v match r2. Such a string can be split in two different ways, as u followed by pv or as up followed by v.
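Here is the same definition as a formula, with Σ standing for the alphabet. The string p must be non-empty so that the two splits actually differ. The second line gives the equivalent statement of unambiguity.

```latex
r_1 r_2 \text{ is ambiguous} \iff \exists\, u, p, v \in \Sigma^{*},\ p \neq \varepsilon :\quad u,\ up \in L(r_1) \ \wedge\ pv,\ v \in L(r_2)

r_1 r_2 \text{ is unambiguous} \iff \forall\, u, u' \in L(r_1),\ v, v' \in L(r_2) :\quad uv = u'v' \implies u = u' \wedge v = v'
```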

Ambiguity matters if you attach actions to r1 and r2. For example, you might delete strings matching r1 from the output and convert everything matching r2 to uppercase. When you are confronted with an ambiguous string upv, you don't know what to do with p. Should you delete it (splitting upv into up and v) or upcase it (splitting upv into u and pv)? What's worse, when you match r1r2 against upv, you'll never learn that there is ambiguity. How the string gets split largely depends on minute details of the regular expression matcher's implementation.
