ongoing by Tim Bray

ongoing fragmented essay by Tim Bray

Unbackslash 22 Sep 2024, 7:00 pm

Old software joke: “After the apocalypse, all that’ll be left will be cockroaches, Keith Richards, and markup characters that have been escaped (or unescaped) one too many (or few) times.” I’m working on a programming problem where escaping is a major pain in the ass, specifically “\”. So, for reasons that seem good to me, I want to replace it. What with?

The problem

My Quamina project is all about matching patterns (not going into any further details here, I’ve written this one to death). Recently, I implemented a “wildcard” pattern, that works just like a shell glob, so you can match things like *.xlsx or invoice-*.pdf. The only metacharacter is *, so it has basic escaping, just \* and \\.
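To make the escaping rules concrete, here is a minimal sketch of a matcher where * is the only metacharacter and the only escapes are \* and \\. It is an illustration only, not Quamina's implementation; the names splitPattern and wildcardMatch are hypothetical, and rejecting any other use of \ is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitPattern breaks a wildcard pattern into literal segments separated by
// unescaped '*'. The escapes \* and \\ yield a literal '*' and '\'.
// Assumption: any other use of '\' is treated as an error.
func splitPattern(pattern string) ([]string, error) {
	segments := []string{}
	var current strings.Builder
	for i := 0; i < len(pattern); i++ {
		c := pattern[i]
		switch c {
		case '\\':
			i++
			if i >= len(pattern) || (pattern[i] != '*' && pattern[i] != '\\') {
				return nil, errors.New(`\ must be followed by * or \`)
			}
			current.WriteByte(pattern[i])
		case '*':
			segments = append(segments, current.String())
			current.Reset()
		default:
			current.WriteByte(c)
		}
	}
	return append(segments, current.String()), nil
}

// wildcardMatch reports whether text matches pattern (hypothetical helper).
func wildcardMatch(pattern, text string) (bool, error) {
	segs, err := splitPattern(pattern)
	if err != nil {
		return false, err
	}
	if len(segs) == 1 { // no '*' at all: must match exactly
		return text == segs[0], nil
	}
	first, last := segs[0], segs[len(segs)-1]
	if !strings.HasPrefix(text, first) {
		return false, nil
	}
	text = text[len(first):]
	if !strings.HasSuffix(text, last) {
		return false, nil
	}
	text = text[:len(text)-len(last)]
	// Middle segments must appear in order; take the leftmost hit each time.
	for _, seg := range segs[1 : len(segs)-1] {
		idx := strings.Index(text, seg)
		if idx < 0 {
			return false, nil
		}
		text = text[idx+len(seg):]
	}
	return true, nil
}

func main() {
	cases := []struct{ pattern, text string }{
		{"*.xlsx", "budget.xlsx"},
		{"invoice-*.pdf", "invoice-17.txt"},
		{`a\*b`, "a*b"},
	}
	for _, c := range cases {
		ok, err := wildcardMatch(c.pattern, c.text)
		fmt.Println(c.pattern, c.text, ok, err)
	}
}
```

The patterns in main are written as Go raw strings, which keeps the backslash count at its minimum.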

It wasn’t hard to write the code, but the unit tests were a freaking nightmare, because \. Specifically, because Quamina’s patterns are wrapped in JSON, which also uses \ for escaping, and I’m coding in Go, which does too, differently for strings delimited by " and `. In the worst case, to test whether \\ was handled properly, I’d have \\\\\\\\ in my test code.
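Here is a small sketch of how the layers pile up, using only Go's standard encoding/json package. The helper name decodeJSONString is made up for this illustration. The wildcard pattern we want is the two characters \\ (an escaped backslash); JSON doubles that to four, and a Go "-delimited literal doubles it again to eight.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONString strips the JSON layer of escaping from one JSON string
// literal (hypothetical helper for this illustration).
func decodeJSONString(jsonText string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(jsonText), &s)
	return s, err
}

func main() {
	// The same JSON text written two ways in Go source.
	interpreted := "\"\\\\\\\\\"" // "-delimited: eight backslashes
	raw := `"\\\\"`               // `-delimited: four backslashes

	fmt.Println("same JSON text:", interpreted == raw)

	pattern, err := decodeJSONString(raw)
	if err != nil {
		panic(err)
	}
	// What the pattern-matching code finally receives: two backslashes,
	// which the wildcard syntax reads as one literal backslash.
	fmt.Println("pattern:", pattern, "length:", len(pattern))
}
```

Raw string literals remove one layer, but the JSON layer remains either way.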

It got to the point that when a test was failing, I had to go into the debugger to figure out what eventually got passed to the library code I was working on. One of the cats jumped up on my keyboard while I was beset with \\\\ and found itself trying to tread air. (It was a short drop onto a soft carpet. But did I ever get glared at.)

Regular expressions ouch

That’s the Quamina feature I’ve just started working on. And as everyone knows, they use \ promiscuously. Dear Reader, I’m going to spare you the “Sickening Regexps I Have Known” war stories. I’m sure you have your own. And I bet they include lots of \’s.

(The particular dialect of regexps I’m writing is I-Regexp.)

I’ve never implemented a regular-expression processor myself, so I expect to find it a bit challenging. And I expect to have really a lot of unit tests. And the prospect of wrangling the \’s in those tests is making me nauseous.

I was telling myself to suck it up when a little voice in the back of my head piped up “But the people who use this library will be writing Go code to generate and test patterns that are JSON-wrapped, so they’re going to suffer just like you are now.”

Crazy idea

So I tried to adopt the worldview of a weary developer trying to unit-test their patterns and simultaneously fighting JSON and Go about what \\ might mean. And I thought “What if I used some other character for escaping in the regexp? One that didn’t have special meanings to multiple layers of software?”

“But that’s crazy,” said the other half of my brain. “Everyone has been writing things like \S+\.txt and [^{}[\]]+ for years and just thinks that way. Also, the Spanish Inquisition.”

Whatever; like Prince said, let’s go crazy.

The new backslash

We need something that’s visually distinctive, relatively unlikely to appear in common regular expressions, and not too hard for a programmer to enter. Here are some candidates, in no particular order.

For each, we’ll take a simple harmless regexp that matches a pair of parentheses containing no line breaks, like so:

Original: \([^\n\r)]*\)

And replace its \’s with the candidate to see what it looks like:

Left guillemet: «

This is commonly used as open-quotation in non-English languages, in particular French. “Open quotation” has a good semantic feel; after all, \ sort of “quotes” the following character. It’s visually pretty distinctive. But it’s hard to type on keyboards not located in Europe. Speaking of developers sitting behind those keyboards, they’re more likely to want to use « in a regexp. Hmm.

Sample: «([^«n«r)]*«)

Em dash: —

Speaking of characters used to begin quotes, Em dash seems visually identical to U+2015 HORIZONTAL BAR (also known as “quotation dash”), which I’ve often seen as a quotation start in English-language fiction. Em dash is reasonably easy to type, unlikely to appear much in real life. Visually compelling.

Sample: —([^—n—r)]*—)

Left double quotation mark: “

(AKA left smart quote.) So if we like something that suggests an opening quote, why not just use an opening quote? There’s a key combo to generate it on most people’s keyboards. It’s not that likely to appear in developers’ regular expressions. Visually strong enough?

Sample: “([^“n“r)]*“)

Pilcrow: ¶

Usually used to mark a paragraph, so no semantic linkage. But, it’s visually strong (maybe too strong?) and has combos on many keyboards. Unlikely to appear in a regular expression.

Sample: ¶([^¶n¶r)]*¶)

Section sign: §

Once again, visually (maybe too) strong, accessible from many keyboards, not commonly found in regexps.

Sample: §([^§n§r)]*§)

Tilde: ~

Why not? I’ve never seen one in a regexp.

Sample: ~([^~n~r)]*~)

Escaping

Suppose we used tilde to replace backslash. We’d need a way to escape tilde when we wanted it to mean itself. I think just doubling the magic character works fine. So suppose you wanted to match anything beginning with . in my home directory: ~~timbray/~..*
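As a sketch of how little machinery this needs, here is a hypothetical translator from the tilde dialect back to conventional backslash syntax. The function name unTilde is invented for this example, and it assumes that \ becomes an ordinary literal character once ~ has taken over its job; nothing in this post settles that.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// unTilde rewrites a regexp that uses '~' as its escape character into the
// conventional backslash form. "~~" stands for a literal tilde.
// Assumption: a bare '\' in the input is an ordinary character, so it is
// escaped on the way out.
func unTilde(re string) (string, error) {
	var out strings.Builder
	runes := []rune(re)
	for i := 0; i < len(runes); i++ {
		r := runes[i]
		switch r {
		case '~':
			i++
			if i >= len(runes) {
				return "", errors.New("pattern ends with a dangling ~")
			}
			if runes[i] == '~' {
				out.WriteRune('~') // doubled escape: literal tilde
			} else {
				out.WriteRune('\\') // ~x becomes \x
				out.WriteRune(runes[i])
			}
		case '\\':
			out.WriteString(`\\`) // literal backslash in the tilde dialect
		default:
			out.WriteRune(r)
		}
	}
	return out.String(), nil
}

func main() {
	for _, re := range []string{"~([^~n~r)]*~)", "~~timbray/~..*"} {
		converted, err := unTilde(re)
		fmt.Println(re, "=>", converted, err)
	}
}
```

The first sample turns back into \([^\n\r)]*\), and the home-directory example becomes ~timbray/\..*, with no Go or JSON escaping needed on the input side.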

“But wait,” you cry, “why are any of these better than \?” Because there aren’t other layers of software fighting to interpret them as an escape; they’re all yours.

You can vote!

I’m going to run a series of polls on Mastodon. Get yourself an account anywhere in the Fediverse and follow the #unbackslash hashtag. Polls will occur on Friday September 27, in reasonable Pacific times. Of course, one of the options will be “Don’t do this crazy thing, stick with good ol’ \!”

New Amplification 9 Sep 2024, 7:00 pm

The less interesting part of the story is that my big home stereo has new amplification: Tiny Class-D Monoblocks! (Terminology explained below.) More interesting, another audiophile tenet has been holed below the waterline by Moore’s Law. This is a good thing, both for people who just want good sound to be cheaper, and for deranged audiophiles like me.

Tl;dr

This was going to be a short piece, but it got out of control. So, here’s the deal: Audiophiles who love good sound and are willing to throw money at the problem should now throw almost all of it at the pure-analog pieces:

  1. Speakers.

  2. Listening room setup.

  3. Phono cartridge (and maybe turntable) (if you do LPs).

What’s new and different is that amplification technology has joined D-to-A conversion as a domain where small, cheap, semiconductors offer performance that’s close enough to perfect to not matter. The rest of this piece is an overly-long discussion of what amplification is and of the new technology.

Fosi V3 Mono

The future of amplifiers looks something like this; more below.

What’s an “amp”?

A stereo system can have lots of pieces: Record players, cartridges, DACs, volume and tone controls, input selectors, and speakers. But in every system the last step before the speakers is the “power amplifier”; let’s just say “amp”. Upstream, music is routed round the system not with electrical currents but by a voltage signal that we call “line level”. That is to say, the voltage vibrates back and forth, usually between +/-1V, the vibration pattern being that of the music, i.e. that of the sound-wave vibrations you want the speakers to produce in the air between them and your ears.

Now, it takes a lot more than +/-1V to make sound come out of speakers. You need actual electrical current and multiple watts of power to vibrate the electromagnets in your speakers and generate sound by pushing air around, which will push your eardrums around, which sends data to your brain that results in the experience of pleasure. If you have a big room and not-terribly-efficient speakers and are trying to play a Mahler symphony really loud, it can get into hundreds of watts.

So what an amp does is take the line-level voltage signal and turn it into a corresponding electric-current signal with enough oomph behind it to emulate the hundred or so musicians required for that Mahler.

Some speakers (subwoofers, sound bars) come with amps built in, so you just have to send them the line-level signal and they take care of the rest. But in a serious audiophile system, your speakers are typically passive unpowered devices driven through speaker wires by an amp.

Historically, high-end amps have often been large, heavy, expensive, impressive-looking devices. The power can come either from vacuum tubes or “solid-state” circuits (basically, transistors and capacitors). Vacuum tubes are old technology and prone to distortion when driven too hard; electric-guitar amps do this deliberately to produce cool snarly sounds. But there are audiophiles who love tube amps and plenty are sold.

Amps come in pairs, one for each speaker, usually together in a box called a “stereo amplifier”. Sometimes the box also has volume and tone controls and so on, in which case it’s called an “integrated amplifier”.

So, what’s new?

TI TPA3255

TPA3255

This thing, made by Texas Instruments, is described as a “315-W stereo, 600-W mono, 18 to 53.5V supply, analog input Class-D audio amplifier”. It’s tiny: 14x6.1mm! It sort of blows my mind that this little sliver of semiconductor can serve as the engine for the class of amps that used to weigh 20kg and be the size of a small suitcase. Also that it can deliver hundreds of watts of power without vanishing in a puff of smoke.

Also, it costs less than $20, quantity one.

It’s not that new; it was released in 2016. It would be wrong to have expected products built around it to arrive right away. I said above that the chip is the engine of an amplifier, and just like a car, once you have an engine there’s still lots to be built. You have to route the signal and power to the chip, and this particular chip needs a lot of power. You have to route the chip output to the speaker connection, and you have to deal with the fact that speakers’ impedances (impedance is resistance, except for alternating rather than direct current) vary with audio frequency in complicated ways.

Anyhow, to make a long story short, in the last couple of years there have started to be TPA3255-based amps that are aimed at audiophiles, claiming a combination of high power, high accuracy, small size, and low price. And technically-sophisticated reviewers have started to do serious measurements on them and… wow. The results seem to show that the power is as advertised, and that any distortion or nonlinearity is way down below the sensitivity of human hearing. Which is to say, more or less perfect.

For example, check out the work of Archimago, an extremely technical high-end audio blogger, who’s been digging in deep on TPA3255-based amps. If you want to look at a lot of graphs most of which will be incomprehensible unless you’ve got a university education in the subject, check out his reviews of the AIYIMA A08 Pro, Fosi Audio TB10D, and Aoshida A7.

Or, actually, don’t. Below I’ll link to the measurements of the one I bought, and discuss why it’s especially interesting. (Well, maybe do have a quick look, because some of these little beasties come with a charming steampunk aesthetic.)

PWM

That stands for pulse-width modulation, the technique that makes Class-D amps work. It’s remarkably clever. You have the line-level audio input, and you also bring in a triangle-wave signal (straight lines up then back down) at a higher frequency. You take samples at another higher frequency, and if the audio voltage is higher than the triangle-wave voltage, you turn the power on, and if lower, you turn it off. So the effect is that the higher the audio voltage, the higher the proportion of time the power is on. So you get current output that is shaped like the voltage input, only with lots of little square corners that look like high-frequency noise; an appropriate circuit filters out the high frequencies and reproduces the shape of the input wave with high accuracy.
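For the software-minded, here is a toy simulation of that comparison step. It is only a sketch of the idea, not a model of the TPA3255 or any real amp; the function names and the sample and carrier frequencies are all invented for the example, and the output filter is left out.

```go
package main

import "fmt"

// triangle returns a triangle wave between -1 and +1 at time t (seconds).
func triangle(t, freq float64) float64 {
	cycles := t * freq
	phase := cycles - float64(int64(cycles)) // position in current cycle, 0..1 (t >= 0)
	if phase < 0.5 {
		return 4*phase - 1 // rising edge: -1 up to +1
	}
	return 3 - 4*phase // falling edge: +1 back down to -1
}

// pwm compares each input sample with the triangle carrier and records
// whether the power would be switched on (true) or off (false).
func pwm(input []float64, sampleRate, carrierFreq float64) []bool {
	out := make([]bool, len(input))
	for i, v := range input {
		t := float64(i) / sampleRate
		out[i] = v > triangle(t, carrierFreq)
	}
	return out
}

// dutyCycle is the fraction of samples during which the power is on.
func dutyCycle(states []bool) float64 {
	if len(states) == 0 {
		return 0
	}
	on := 0
	for _, s := range states {
		if s {
			on++
		}
	}
	return float64(on) / float64(len(states))
}

func main() {
	const sampleRate, carrierFreq = 1e6, 1000.0 // assumed values for the demo
	for _, level := range []float64{-0.5, 0, 0.5} {
		input := make([]float64, 100000) // 100 carrier cycles of a steady voltage
		for i := range input {
			input[i] = level
		}
		fmt.Printf("input %+.1f V -> power on %.3f of the time\n",
			level, dutyCycle(pwm(input, sampleRate, carrierFreq)))
	}
}
```

With a steady input, the on-fraction should come out near (level+1)/2, so zero volts sits at about half the time and higher voltages push it toward always-on.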

If that didn’t make sense, here’s a decent YouTube explainer.

The explanation, which my understanding of practical electronics doesn’t go deep enough to validate, is that because the power is only ever on or off, no intermediate states are necessary and the circuit is super efficient and therefore cheap.

Monoblocks

Most amps are “stereo amplifiers”, i.e. two amps in a box. They have to solve the problem of keeping the two stereo signals from affecting each other. It turns out the TPA3255 does this right on the chip. So the people who measure and evaluate these devices pay a lot of attention to “channel separation” and “crosstalk”. This has led to high-end audiophiles liking “monoblock” amps, where you have two separate boxes, one for each speaker. Poof! crosstalk is no longer an issue.

Enter Fosi

You may have noticed that you didn’t recognize any of the brand names in that list of reviews above. I didn’t either. This is because mainstream brands from North America, Europe, and Japan are not exactly eager to start replacing their big impressive high-end amps costing thousands of dollars with small, cheap TPA3255-based products at a tenth the price.

Shenzhen ain’t got time for that. Near as I can tell, all these outfits shipping little cheap amps are down some back street off a back street in the Shenzhen-Guangzhou megalopolis. One of them is Fosi Audio.

They have a decent web site but are definitely a back-street Shenzhen operation. What caught my attention was Archimago’s 2-part review (1, 2) of Fosi’s V3 Mono.

This is a monoblock power amp with some ludicrously high power rating that you can buy as a pair with a shared power supply for a super-reasonable price. They launched with a Kickstarter.

I recommend reading either or both of Archimago’s reviews to feel the flavor of the quantitative-audio approach and also for the general coolness of these products.

I’m stealing one of Archimago’s pictures here, to reveal how insanely small the chip is; it’s the little black/grey rectangle at the middle of the board.

Internals of Fosi V3 Mono

And here is my own pair of V3 Monos to the right of the record player.

Fosi V3 Mono amplifiers beside a Rega turntable

My own experience

My previous amp (an Ayre Acoustics V-5xe) was just fine albeit kinda ugly, but we’re moving to a new place and it’s just not gonna fit into the setup there. I was wrestling with this problem when Archimago published those Fosi write-ups and I was sold, so there they are.

They’re actually a little bit difficult to set up because they’re so small and the power supply is heavier than both amps put together. So I had a little trouble getting all the wires plugged in and arranged. As Archimago suggests, I used the balanced rather than RCA connectors.

Having said all that, once they were set up, they vanished, as in, if it weren’t for the space between the speakers where the old amp used to be, I wouldn’t know the difference. They didn’t cost much. They fit in. They sound good.

One concern: These little suckers get hot when I pump music through them for an extended time. I think I’m going to want to arrange them side-by-side rather than stacked, just to reduce the chances of them cooking themselves.

Also, a mild disappointment: They have an AUX setting where they turn themselves on when music starts and off again after a few minutes of silence. Works great. But, per Archimago’s measurements, they’re drawing 10 watts in that mode, which seems like way too much to me, and they remain warm to the touch. So, nice feature, but I guess I’ll have to flick their switches from OFF to ON like a savage when I want to listen to music.

The lesson

Maybe you love really good sound. Most of you don’t know because you’ve probably never heard it. I’m totally OK with Sonos or car-audio levels of quality when it’s background music for cooking or cleaning or driving. But sitting down facing a quality high-end system is really a different sort of thing. Not for everyone, but for some people, strongly habit-forming.

If it turns out that you’re one of those people, it’s now smart to invest all your money in your speakers, and in fiddling with the room where they are to get the best sound out of them. For amplification and the digital parts of the chain, buy cheap close-enough-to-perfect semiconductor products.

And of course, listen to good music. Which, to be fair, is not always that well-produced or well-recorded. But at least the limiting factor won’t be what’s in the room with you.

Standing on High Ground 8 Sep 2024, 7:00 pm

That’s the title of a book coming out October 29th that has my name on the cover. The subtitle is “Civil Disobedience on Burnaby Mountain”. It’s an anthology; I’m both an author and co-editor. The other authors are people who, like me, were arrested resisting the awful “TMX” Trans Mountain pipeline project.

Cover of “Standing on High Ground”

Pulling together a book with 25 contributing authors is a lot of work! One of the contributions started out as a 45-minute phone conversation, transcribed by me. The others manifested in a remarkable melange of styles, structures, and formats.

Which is what makes it fun. Five of our authors are Indigenous people. Another is Elizabeth May, leader of Canada’s Green party. There is a sprinkling of university professors and faith leaders. There are two young Tyrannosauri Rex (no, really). And then there’s me, the Internet geek.

As I wrote then, my brush with the law was very soft; arrested on the very first day of a protest sequence, I got off with a fine. Since fines weren’t stopping the protest, eventually the arrestees started getting jail time. Some of the best writing in the book is the prison narratives, all from people previously unacquainted with the pointy end of our justice system.

Quoting from my own contribution:

Let me break the fourth wall here and speak as a co-editor of the book you are now reading. As I work on the jail-time narratives from other arrestees, alternately graceful, funny, and terrifying, I am consumed with rage at the judicial system. It is apparently content to allow itself to be used as a hammer to beat down resistance to stupid and toxic rent-seeking behaviour, oblivious to issues of the greater good. At no point has anyone in the judiciary looked in the mirror as they jailed yet another group of self-sacrificing people trying to throw themselves between TMX’s engine of destruction and the earth that sustains us, and asked themselves, Are we on the right side here?

Of necessity, the law is constructed of formalisms. But life is constructed on a basis of the oceans and the atmosphere and the mesh of interdependent ecosystems they sustain. At some point, the formalisms need to find the flexibility to favour life, not death. It seems little to ask.

We asked each contributor for a brief bio, a narrative of their experience, and the statement they made to the judge at the time of their sentencing. Our contributors being what they are, sometimes we instead got poems and music-theory disquisitions and discourse on Allodial title. Cartoons too!

Which, once again, is what makes it fun. Well, when it’s not rage-inducing. After all, we lost; they built the pipeline and it’s now doing its bit to worsen the onrushing climate catastrophe, meanwhile endangering Vancouver’s civic waters and shipping economy.

Supportive quote from Bill McKibben

We got endorsements! Lots more on the Web site and book cover.

The effort was worthwhile, though. There is reason to hope that our work helped raise the political and public-image cost of this kind of bone-stupid anti-survival project to the point that few or no more will ever be built.

Along with transcribing and editing, my contribution to the book included a couple of photos and three maps. Making the maps was massively fun, so I’m going to share them here just because I can. (Warning: These are large images.)

The first appears as a two-page spread, occupying all of the left page and the top third or so of the right.

Route of the TMX pipeline

Then there’s a map of Vancouver and the Lower Mainland, highlighting the locations where much of the book’s action took place.

The Vancouver region, highlighting TMX resistance locations

Finally, here’s a close-up of Burnaby Mountain, where TMX meets the sea, and where most of the arrests happened.

TMX resistance sites around Burnaby Mountain

The credits say “Maps by Tim Bray, based on data from Google Maps, OpenStreetMap, and TMX regulatory filings.”

I suspect that if you’re the kind of person who finds yourself reading this blog from time to time, you’d probably enjoy reading Standing on High Ground. The buy-this-book link is here. If you end up buying a copy — please do — the money will go in part to our publisher Between The Lines, who seem a decent lot and were extremely supportive and competent in getting this job done. The rest gets distributed equally among all the contributors. Each contributor is given the option of declining their share, which makes sense, since some of us are highly privileged and the money wouldn’t make any difference; others can really use the dough.

What’s next?

We’re going to have a launch event sometime this autumn. I’ll announce it here and everywhere else I have a presence. There will be music and food and drink; please come!

What’s really next is the next big harebrained scheme to pad oil companies’ shareholders’ pockets by building destructive infrastructure through irreplaceable wilderness, unceded Indigenous land, and along fragile waterways. Then we’ll have to go out and get arrested again and make it more trouble than it’s worth. It wouldn’t take that many people, and it’d be nice if you were one of them.

I put in years of effort to stop the pipeline. Based on existing laws, I concluded that the pipeline was illegal and presented those arguments to the National Energy Board review panel. When we got to the moment on Burnaby Mountain when the RCMP advanced to read out the injunction to us, I was still acting in the public interest. The true lawbreakers were elsewhere.

[From Elizabeth May’s contribution.]

Thanks!

Chiefly, to our contributors, generous with their words and time, tolerant of our nit-picky editing. From me personally, to my co-editors Rosemary Cornell and Adrienne Drobnies; we didn’t always agree on everything but the considerable work of getting this thing done left nobody with hard feelings. And, as the book’s dedication says, to all those who went out and got arrested to try to convince the powers that be to do the right thing.

I’m going to close with a picture which appears in the book. It shows Kwekwecnewtxw (“Kwe-kwek-new-tukh”), the Watch House built by the Tsleil-Waututh Nation to oversee the enemy’s work, that work also visible in the background. If you want to know what a Watch House is, you’ll need to read the very first contribution in the book, which begins “Jim Leyden is my adopted name—my spirit name is Stehm Mekoch Kanim, which means Blackbear Warrior.”

Kwekwecnewtxw, the TMX Watch House

0 dependencies! 4 Sep 2024, 7:00 pm

Here’s a tiny little done-in-a-couple-hours project consisting of a single static Web page and a cute little badge you can slap on your GitHub project.

0 dependencies!

The Web site is at 0dependencies.dev. The badge is visible on my current open-source projects, for example check out Topfew (you have to scroll down a bit).

Zero, you say?

In recent months I keep seeing these eruptions of geek angst about the fulminating masses of dependencies squirming under the surface of just about any software anyone uses for anything. The most recent, and what precipitated this, was Mike Perham’s Kill Your Dependencies.

It’s not just that dependencies are a fertile field for CVEs (*cough* xz *cough*) and tech debt, they’re also an enemy of predictable performance.

Also, they’re unavoidable. When you take a dependency, often you’re standing on the shoulders of giants. (Unfortunately, sometimes you’re standing in the shoes of clowns.) Software is accretive and it’s a good thing that that’s OK because it’s also inevitable.

In particular, don’t write your own crypto, etc. Because in software, as in life, you’re gonna have to take some dependencies. But… how about we take less? And how about, sometimes we strive for zero?

The lower you go

… the closer you are to zero. So, suppose you’re writing library code. Consider these criteria:

  • It’s low-level, might be useful to a lot of apps aimed at entirely different goals.

  • Good performance is important. Actually, let me revise that: predictably good performance is important.

  • Security is important.

If you touch all three of these bases, I respectfully suggest that you try to earn this badge: ⓿ dependencies! (By the way, it’s cool that I can toss a chunk of SVG into my HTML and it Just Works. And, you can click on it.)

How to?

First, whatever programming language you’re in, try to stay within the bounds of what comes with the language. In Go, where I live these days, that means your go.sum file is empty. Good for you!
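If you want to check that mechanically, here is a minimal sketch. The helper name countGoSumEntries is hypothetical, and treating a missing go.sum the same as an empty one is my assumption for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// countGoSumEntries counts the non-blank lines in the text of a go.sum file.
func countGoSumEntries(content string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		if strings.TrimSpace(sc.Text()) != "" {
			n++
		}
	}
	return n
}

func main() {
	data, err := os.ReadFile("go.sum")
	if err != nil && !os.IsNotExist(err) {
		fmt.Fprintln(os.Stderr, "cannot read go.sum:", err)
		os.Exit(1)
	}
	// Assumption: a missing go.sum counts the same as an empty one.
	if n := countGoSumEntries(string(data)); n == 0 {
		fmt.Println("0 dependencies!")
	} else {
		fmt.Printf("%d go.sum entries; not zero yet\n", n)
	}
}
```

Run it from the root of the module you want to check.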

Second, be aggressive. For example, Go’s JSON support is known to be kind of slow and memory-hungry. That’s OK because there are better open-source options. For Quamina, I rejected the alternatives and wrote my own JSON parser for the hot code path. Which, to be honest, is a pretty low bar: JSON’s grammar could be inscribed on a grain of rice, or you can rely on Doug Crockford’s JSON.org.

So, get your dependencies to zero and display the badge proudly. Or if you can’t, think about each of your dependencies. Does each of them add enough value, compared to you writing the code yourself? In particular, taking a dependency on a huge general-purpose library for one small simple function is an antipattern.

What are you going to do, Tim?

I’m not trying to start a movement or anything. I just made a badge, a one-page website, and a blog post.

If I were fanatically dedicated, 0dependencies.dev would be database-backed with a React front-end and multiple Kubernetes pods, to track bearers of the badge. Uh, no.

But, I’ll keep my eyes open. And if any particularly visible projects that you know about want to claim the badge, let me know and maybe I’ll start a 0dependency hall of fame.

Long Links 2 Sep 2024, 7:00 pm

It’s been a while. Between 2020 and mid-2023, I wrote pretty regular “Long Links” posts, curating links to long-form pieces that I thought were good and I had time to read all of because, unlike my readers, I was lightly employed. Well, then along came my Uncle Sam gig, then fun Open Source with Topfew and Quamina, then personal turmoil, and I’ve got really a lot of browser tabs that I thought I’d share one day. That day is today.

Which is to say that some of these are pretty old. But still worth a look I think.

True North Indexed

Let’s start with Canadian stuff; how about a poem? No, really, check out Emergency Exit, by Kayla Czaga; touched me and made me smile.

Then there’s Canada Modern, from which comes the title of this section. It’s an endless scroll of 20th-century Canadian design statements. Go take it for a spin, it’s gentle wholesome stuff.

Renaissance prof

Uses This has had a pretty good run since 2009; I quote: “Uses This is a collection of nerdy interviews asking people from all walks of life what they use to get the job done.” Older readers may find that my own May 2010 appearance offers a nostalgic glow.

Anyhow, a recent entry covers “Robert W Gehl, Professor (Communication and Media Studies)”, and reading it fills me with envy at Prof. Gehl’s ability to get along on the most pristine free-software diet imaginable. I mean, I know the answer: I’m addicted to Adobe graphics software and to Apple’s Keynote. No, wait, I don’t give that many conference talks any more and when I do, I rely on a preloaded set of browser tabs that my audience can visit and follow along.

If it weren’t for that damn photo-editing software. Anyhow, major hat-tip in Prof. Gehl’s direction. Some of you should try to be more like him. I should too.

Now for some tech culture.

Consensus

The IETF does most of the work of nailing down the design of the Internet in sufficient detail that programmers can read the design docs and write code that interoperates. It’s all done without voting, by consensus. Consensus, you say? What does that mean? Mark Nottingham (Mnot for short) has the details. Consensus in Internet Standards doesn’t limit its discussion to the IETF. You probably don’t need to know this unless you’re planning to join a standards committee (in which case you really do) but I think many people would be interested in how Internet-standards morlocks work.

More Mnot

Check out his Centralization, Decentralization, and Internet Standards. The Internet’s design is radically decentralized. Contemporary late-capitalist business structures are inherently centralized. I know which I prefer. But the tension won’t go away, and Mnot goes way deep on the nature of the problem and what we might be able to do about it.

For what it’s worth, I think “The Fediverse” is a good answer to several of Mnot’s questions.

More IETF

From last year, Reflections on Ten Years Past the Snowden Revelations is a solid piece of work. Ed Snowden changed the Internet, made it safer for everyone, by giving us a picture of what adversaries did and knew. It took a lot of work. I hope Snowden gets to come home someday.

Polling Palestinians

We hear lots of stern-toned denunciations of the Middle East’s murderers — both flavors, Zionist and Palestinian — and quite a variety of voices from inside Israel. But the only Palestinians who get quoted are officials from Hamas or the PLA; neither organization has earned the privilege of your attention. So why not go out and use modern polling methodology to find out what actual Palestinians think? The project got a write-up in the New Yorker: What It Takes to Give Palestinians a Voice. And then here’s the actual poll, conducted by the “Palestinian Center for Policy and Survey Research”, of which I know nothing. Raw random data about one of the world’s hardest problems.

Music rage

Like music? Feel like a blast of pure white-hot cleansing rage? Got what you need: Same Old Song: Private Equity Is Destroying Our Music Ecosystem. I mean, stories whose titles begin “Private equity is destroying…” are getting into “There was a Tuesday in last week” territory. But this one hit me particularly hard. I mean, take the ship up and nuke the site from orbit. It’s the only way to be sure.

Movies too

Existentially threatened by late capitalism, I mean. Hollywood’s Slo-Mo Self-Sabotage has the organizational details about how the biz is eating its seed corn in the name of “efficiency”.

I’m increasingly convinced that the whole notion of streaming is irremediably broken; these articles speak to the specifics and if they’re right, we may get to try out new approaches after the streamers self-immolate.

A target for luck

I’ve mostly not been a fan of Paul Graham. Like many, I was impressed by his early essays, then saddened as he veered into a conventional right-wing flavor that was reactionary, boring, and mostly wrong. So these days, I hesitate to recommend his writing. Having said that, here’s an outtake from How To Do Great Work:

When you read biographies of people who've done great work, it's remarkable how much luck is involved. They discover what to work on as a result of a chance meeting, or by reading a book they happen to pick up. So you need to make yourself a big target for luck, and the way to do that is to be curious. Try lots of things, meet lots of people, read lots of books, ask lots of questions.

Amen. And the humility — recognition that good outcomes need more than brains and energy — is not exactly typical of the Bay-Aryan elite, and is welcome. And there’s other thought-provoking stuff in there too, but the tone will put many off; the wisdom is dispensed with an entire absence of humility, or really any supporting evidence. And that title is a little cringey. Could have been shorter, too.

“readable, writerly web layouts”

Jeffrey Zeldman asks who will design them. It’s mostly a list of links to plausible candidates for that design role. Year-old links, now, too. But still worth grazing on if you care about this stuff, which most of us probably should.

Speaking of which, consider heather buchel’s Just normal web things. I suspect that basically 100% of the people who find their way here will be muttering FUCK YEAH! at every paragraph.

Enshittification stanzas

(My oldest tabs, I think.) I’m talking about Ellis Hamburger’s Social media is doomed to die and Cat Valente’s Stop Talking to Each Other and Start Buying Things: Three Decades of Survival in the Desert of Social Media, which say many of the same things that Cory is saying. But with more personal from-the-inside flavor. And not without streaks of optimism.

Billionaires

It’s amazing how fast this word has become shorthand for the problem that an increasing number of people believe is at the center of the most important social pathologies: The absurd level of inequality that has grown tumorously under modern capitalism. American billionaires are a policy failure doesn’t really focus on the injustice, but rather does the numbers, presenting a compelling argument that a society having billionaires yields little to no benefit to that society, and precious little to the billionaires. It’s sobering, enlightening stuff.

Gotta talk about AI I guess

The “T” in GPT stands for “Transformer”. From Was Linguistic A.I. Created by Accident? comes this quote:

It’s fitting that the architecture outlined in “Attention Is All You Need” is called the transformer only because Uszkoreit liked the sound of that word. (“I never really understood the name,” Gomez told me. “It sounds cool, though.”)

Which is to say, this piece casts an interesting sidelight on the LLM origin story, starting in the spring of 2017. If you’ve put any study into the field this probably won’t teach you anything you don’t know. But I knew relatively little of this early history.

Visual falsehood

Everyone who’s taken a serious look at the intersection of AI and photography offered by the Pixel 9 has reacted intensely. The terms applied have ranged from “cool” to “terrifying”. I particularly like Sarah Jeong’s No one’s ready for this, from which a few soundbites:
“These photographs are extraordinarily convincing, and they are all extremely fucking fake.”
“…the easiest, breeziest user interface for top-tier lies…”
“…the default assumption about a photo is about to become that it’s faked…”
“A photo, in this world, stops being a supplement to fallible human recollection, but instead a mirror of it.”
We are fucked.

And that’s just the words; the pictures accompanying the article are a stomach-churning stanza of visual lies.

Fortunately, I’m not convinced we’re fucked. But Google needs to get its shit together and force this AI voodoo to leave tracks, be transparent, disclose what it’s doing. We’re starting to have the tools, in particular a thing called C2PA on which I’ve had plenty to say.

Specifically, what Google needs to do is, when someone applies an AI technique to produce an image of something that didn’t happen, write a notification that this is the case into the picture’s EXIF and include that in the C2PA-signed manifest. And help create a culture where anything that doesn’t have a verifiable C2PA-signed provenance trail should be presumed a lie and neither forwarded nor reposted nor otherwise allowed to continue on its lying path.

Fade out

Here’s some beautifully performed and recorded music that has melody and integrity and grace: The Raconteurs feat. Ricky Skaggs and Ashley Monroe - Old Enough.

I wish things were a little less hectic. Because I miss having the time for Long Links.

Let’s all do the best we can with what we have.

Q Numbers Redux Explained 31 Aug 2024, 7:00 pm

[The first guest post in some years. Welcome, Arne Hormann!]

Hi there, I'm Arne. In July, I stumbled on a lobste.rs link (thanks, Carlana) that led me to Tim’s article on Q Numbers. As I'm regularly working with both floating point numbers and Go, the post made me curious; I was sure I could improve the compression scheme. I posted comments. Tim liked my idea and challenged me to create a PR. I did the encoding but felt overwhelmed with the integration. Tim took over and merged it. And since v1.4.0 was released on August 28th, it's in Quamina.

In Tim’s post about that change, he wrote “Arne explained to me how it works in some chat that I can’t find, and to be honest I can’t quite look at this and remember the explanation”. Well, you're reading it right now.

Float64

Let's first talk about the data we are dealing with here. Quamina operates on JSON. While no JSON specification limits numbers to IEEE 754 floating point (Go calls them float64), RFC 8259 recommends treating them as such.

They look like this:

  • 1 bit sign; 0 is positive, 1 is negative.

  • 11 bits exponent; It uses a bias of 1023 ((1<<10)-1), and all exponent bits set means Infinity or Not a Number (NaN)

  • 1 mantissa high bit (implicit, never stored: 0 if all exponent bits are 0, otherwise 1). This is required so there's exactly one way to store each number.

  • 52 explicit mantissa bits. The 53 mantissa bits are the binary digits of the number itself.
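To make the layout above concrete, here is a minimal sketch that pulls the three stored fields out of a float64. The helper name parts is made up for this illustration and is not part of Quamina.

```go
package main

import (
	"fmt"
	"math"
)

// parts splits a float64 into its stored fields: 1 sign bit,
// 11 exponent bits (still biased by 1023) and 52 explicit mantissa bits.
// The name "parts" is hypothetical, chosen for this sketch only.
func parts(f float64) (sign, exponent, mantissa uint64) {
	u := math.Float64bits(f)
	sign = u >> 63
	exponent = (u >> 52) & 0x7FF
	mantissa = u & (1<<52 - 1)
	return sign, exponent, mantissa
}

func main() {
	for _, f := range []float64{1, -2, 0.5, 0.1} {
		s, e, m := parts(f)
		fmt.Printf("%v: sign=%d exponent=%d (unbiased %d) mantissa=%#x\n",
			f, s, e, int(e)-1023, m)
	}
}
```

Note that 1, -2 and 0.5 all have an explicit mantissa of zero; only the implicit high bit is set.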

Both 0 and -0 exist and are equal but are represented differently. We have to normalize -0 to 0 for comparability, but according to Tim’s tests, -0 cannot occur in Quamina because JSON decoding silently converts -0 to 0.

With an exponent with all bits set, Infinity has a mantissa of 0. Any other mantissa is NaN. But both sign values are used. There are a lot of different NaNs; 1<<53 - 2 different ones!

In JSON, Infinity and NaN are not representable as numbers, so we don't have to concern ourselves with them.

Finally, keep in mind that these are binary numbers, not decimals. Decimal 0.1 cannot be accurately encoded. But 0.5 can. And all integers up to 1<<53, too.
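A small sketch of what "binary, not decimal" means in practice. The helper add exists only to force the arithmetic to happen at run time in float64, since Go evaluates constant expressions exactly and would otherwise hide the rounding.

```go
package main

import "fmt"

// add performs the addition at run time in float64 arithmetic.
// It is a helper for this sketch only.
func add(a, b float64) float64 { return a + b }

func main() {
	// 0.1 and 0.2 are not exactly representable, so the sum is slightly off.
	fmt.Println(add(0.1, 0.2) == 0.3) // false

	// 0.5 and 0.25 are powers of two and are stored exactly.
	fmt.Println(add(0.5, 0.25) == 0.75) // true

	// Integers are exact up to 1<<53; one step beyond that is lost.
	limit := float64(1 << 53)
	fmt.Println(add(limit-1, 1) == limit) // true
	fmt.Println(add(limit, 1) == limit)   // true: limit+1 rounds back to limit
}
```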

Adding numbers to Quamina

Quamina operates on UTF-8 strings and compares them byte by byte. To add numbers to it, they have to be bytewise-comparable and all bytes have to be valid in UTF-8.

Given these constraints, let's consider the problem of comparability, first.

Sortable bits

We can use math.Float64bits() to convert a float64 into its individual bits (stored as uint64).

Positive numbers are already perfectly sorted. But they are smaller than negative numbers. To fix that, we always flip the sign bit so it's 1 for positive and 0 for negative values.

Negative values are sorted exactly the wrong way around. To totally reverse their order, we have to flip all exponent and mantissa bits in addition to the sign bits.

A simple implementation would look like:


func numbitsFromFloat64(f float64) numbits {
	u := math.Float64bits(f)
	if f < 0 {
		return numbits(^u)
	}
	return numbits(u) | (1 << 63)
}

Now let's look at the actual code:


func numbitsFromFloat64(f float64) numbits {
	u := math.Float64bits(f)
	mask := (u>>63)*^uint64(0) | (1 << 63)
	return numbits(u ^ mask)
}

The mask line can be a bit of a headscratcher...

  1. u>>63 moves the sign bit of the number to the lowest bit

  2. the result will be 0 for positive values and 1 for negative values

  3. that's multiplied with ^0 - all 64 bits set to true

  4. we now have 0 for positive numbers and all bits set for negative numbers

  5. regardless, always set the sign bit with | (1 << 63)

By xoring mask to the original bits, we get our transformation for lexically ordered numbers.

The code is a bit hard to parse because it avoids branches for performance reasons.
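As a sanity check, the two versions can be put side by side and run over an ascending list of floats. This is only a sketch: the names numbitsSimple and numbitsBranchless are invented here so both variants can live in one file, and the sample values are arbitrary.

```go
package main

import (
	"fmt"
	"math"
)

type numbits uint64

// numbitsSimple is the branching version from the text.
func numbitsSimple(f float64) numbits {
	u := math.Float64bits(f)
	if f < 0 {
		return numbits(^u)
	}
	return numbits(u) | (1 << 63)
}

// numbitsBranchless is the mask-based version from the text.
func numbitsBranchless(f float64) numbits {
	u := math.Float64bits(f)
	mask := (u>>63)*^uint64(0) | (1 << 63)
	return numbits(u ^ mask)
}

// ascending is a hand-picked, strictly increasing list of floats.
var ascending = []float64{
	-math.MaxFloat64, -1e10, -2, -1, -0.5, -math.SmallestNonzeroFloat64,
	0, math.SmallestNonzeroFloat64, 0.5, 1, 2, 1e10, math.MaxFloat64,
}

func main() {
	for i, f := range ascending {
		a, b := numbitsSimple(f), numbitsBranchless(f)
		fmt.Printf("%-24v %016x same=%v\n", f, uint64(b), a == b)
		if i > 0 && numbitsBranchless(ascending[i-1]) >= b {
			fmt.Println("ordering violated at", f)
		}
	}
}
```

The one input where the two differ is -0: f < 0 is false for it, so the branching version leaves its bits unchanged, while the mask version inverts them. As noted above, -0 should not reach this code in Quamina.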

Sortable bytes

If we were to use the uint64 now, it would work out great. But it has to be split into its individual bytes. And on little endian systems - most of the frequently used ones - the byte order will be wrong and has to be reversed. That's done by storing the bytes in big endian encoding.

UTF-8 bytes

Our next problem is the required UTF-8 encoding. Axel suggested to only use the lower 7 bits of each byte. That means we need up to 10 bytes instead of 8 (7*10 = 70 bits is enough to hold 8*8 = 64).

Compress

The final insight was that trailing 0 bytes do not influence the comparison at all and can be dropped. Which compresses positive power-of-two numbers and integers even further. Not negative values, though - the required bit inversion messes it up.

And that's all on this topic concerning Quamina — even if there's so, so much more about floats.

Q Numbers Redux 28 Aug 2024, 7:00 pm

Back in July I wrote about Q numbers, which make it possible to compare numeric values using a finite automaton. It represented a subset of numbers as 14-hex-digit strings. In a remarkable instance of BDD (Blog-Driven Development, obviously) Arne Hormann and Axel Wagner figured out a way to represent all 64-bit floats in at most ten bytes of UTF-8 and often fewer. This feels nearly miraculous to me; read on for heroic bit-twiddling.

Numbits

Arne Hormann worked out how to rearrange the sign, exponent and mantissa that make up a float’s 64 bits into a big-endian integer that you probably couldn’t do math with but you can compare for equality and ordering. Turn that into sixteen hex digits and you’ve got automaton fuel which covers all the floats at the cost of being a little bigger.

If you want to admire Arne’s awesome bit-twiddling skills, look at numbits.go. He explained to me how it works in some chat that I can’t find, and to be honest I can’t quite look at this and remember the explanation.

    u := math.Float64bits(f)
    // transform without branching
    // if high bit is 0, xor with sign bit 1 << 63,
    // else negate (xor with ^0)
    mask := (u>>63)*^uint64(0) | (1 << 63)
    return numbits(u ^ mask)

[Update: Arne wrote it up! See Q Numbers Redux Explained.]

Even when I was puzzled, I wasn’t worried because the unit tests are good; it works.

Arne called these “numbits” and wrote a nice complete API for them, although Quamina just needs .fromFloat64() and .toUTF8(). I and Arne both thought he’d invented this, but then he discovered that the same trick was being used in the DB2 on-disk data format years and years ago. Still, damn clever, and I’ve urged him to make numbits into a standalone library.

We want less!

We care about size; Among other things, the time an automaton takes to match a value is linear (sometimes worse) in its length. So the growth from 14 to 16 bytes made us unhappy. But, no problemo! Axel Wagner pointed out that if you use base-128, you can squeeze those 64 bits into ten usable bytes of UTF-8. So now we’re shorter than the previous iteration of Q numbers while handling all the float64 values…

But wait, there’s more! Arne noticed that for purposes of equality and comparison, trailing zeroes (0x0, not ‘0’) in those 10-byte strings are entirely insignificant and can just be discarded. The final digit only has 1/128 chance of being zero, so maybe no big deal. But it turns out that you do get dramatic trailing-0 sequences in positive integers, especially small ones, which in my experience are the kinds of numbers you most often want to match. Here’s a chart of the lengths of the numbits-based Q numbers for the integers zero through 100,000 inclusive.

Length  Count
1       1
2       1
3       115
4       7590
5       92294

They’re all shorter than 5 until you get to 1,000.
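For the curious, a length tally like this one can be produced with a few lines of Go. This is a sketch, not Quamina's code: qLength and histogram are made-up names, and it assumes the ten 7-bit digits are laid out most significant first, so that trailing zero digits are simply the low-order 7-bit groups of the numbits value.

```go
package main

import (
	"fmt"
	"math"
)

// qLength returns how many 7-bit digits remain after trailing zero
// digits are dropped from the 10-digit encoding of f.
// Hypothetical helper for this sketch only.
func qLength(f float64) int {
	u := math.Float64bits(f)
	n := u ^ ((u>>63)*^uint64(0) | (1 << 63))
	length := 10
	for length > 1 && n&0x7f == 0 {
		n >>= 7
		length--
	}
	return length
}

// histogram counts encoded lengths for the integers 0 through max.
func histogram(max int) map[int]int {
	counts := make(map[int]int)
	for i := 0; i <= max; i++ {
		counts[qLength(float64(i))]++
	}
	return counts
}

func main() {
	counts := histogram(100000)
	for length := 1; length <= 10; length++ {
		if c := counts[length]; c > 0 {
			fmt.Printf("%d\t%d\n", length, c)
		}
	}
}
```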

Unfortunately, none of my benchmarks prove any performance increase because they focus on corner cases and extreme numbers; the benefits here are to the world’s most boring numbers, namely small non-negative integers.

Here I am, well past retirement age, still getting my jollies from open-source bit-banging. I hope other people manage to preserve their professional passions into later life.

Mozart Requiem 25 Aug 2024, 7:00 pm

Vancouver has many choirs, with differing proficiency levels and repertoire choices. Most gather fall-to-spring and take the summer off. Thus, Summerchor, which aggregates a couple of hundred singers from many choirs to tackle one of the Really Big choral pieces each August. This year it was the Mozart Requiem. Mozart died while writing this work and there are many “completions” by other composers. Consider just the Modern completions; this performance was of Robert Levin’s.

Mozart’s Requiem performed in St. Andrews-Wesley

Summerchor performs Mozart’s Requiem in
St. Andrews-Wesley United Church, August 24, 2024.

The 200 singers were assisted by four soloists, a piano, a trombone, the church’s excellent pipe organ, and finally, of course, by the towering arched space.

The combined power of Mozart’s music, the force of massed voices, and the loveliness of the great room yielded an entirely overwhelming torrent of beauty. I felt like my whole body was being squeezed.

Obviously, in an hour-long work, some parts are stronger than others. For me, the opening Requiem Aeternam and Kyrie hit hard, then the absolutely wonderful ascending line opening the Domine Jesu totally crushed me. But for every one of the three-thousand-plus seconds, I was left in no doubt that I was experiencing about as much beauty as a human being can.

God?

We were in a house of worship. The words we were listening to were liturgical, excerpted from Scripture. What, then, of the fact that I am actively hostile to religion? Yeah, I freely acknowledge that all this beauty I’m soaking up is founded on faith. Other people’s faith, of other times. The proportion of people who profess that (or any) faith is monotonically declining, maybe not everywhere, but certainly where I live.

Should I feel sad about that? Not really; The fact that architects and musicians worked for the Church is related to the fact that the Church was willing to pay. Musicians can manage without God, generally, as long as they’re getting paid.

The sound

We’ve been going out to concerts quite a lot recently so of course I’ve been writing about it, and usually discussing the sound quality too.

The sound at the Requiem was beyond awesome. If you look at the picture above you can see there’s a soundboard and a guy sitting at it, but I’m pretty sure the only boost was on the piano, which had to compete with 200 singers and the organ. So, this was the usual classical-music scenario: If you want dynamic range, or to hear soloists, or to blend parts, you do that with musical skill and human throats and fingers and a whole lot of practice. There’s no knob to twirl.

I mean, I love well-executed electric sound, but large-scale classical, done well, is on a whole other level.

Above, I mentioned the rising line at the opening of the Domine Jesu; the pulses, and the space between them, rose up in the endless vertical space as they rose up the scale, and yet were clearly clipped at start and end, because you don’t get that much reverb when the pews are full of soft human flesh and the roof is made of old wood, no matter how big the church is. I just don’t have words for how wonderful it sounded.

Classical?

Obviously, this is what is conventionally called “classical” music. But I’m getting a little less comfortable with that term, especially the connotation that it’s “music for old people” (even though I am one of those). Because so is rootsy rock music and bluegrass and Americana and GoGo Penguin and Guns N’ Roses.

2024 Pollscrolling 20 Aug 2024, 7:00 pm

The 2024 US election has, in the last few weeks, become the most interesting one I can recall. I’m pretty old, so that’s a strong statement; I can recall a lot of US elections. The Internet makes it way too easy to obsess over a story that’s this big and has this many people sharing opinions. Here is my opinion, not on who’s winning, but on how, with only a very moderate expenditure of time and money, you can be as well-informed as anybody in the world as to how it’s going.

Disclosures: I’m not American, got no vote on this one. Am left of the US Democratic party, but at this point they represent an infinitely better option for America than does anything connected to Donald Trump. Thus, the following remarks should be assumed to have a strong Harris/Walz bias.

I claim that using the following sites should, in 5-15 minutes a day, give you an understanding of the state of the race that is very close to that of the big-name prognosticators writing in the big-name publications.

Polls generally

They’ve been wrong a lot recently. The industry is still trying to complete the transition off landlines, which few people have any more; those who still do are an entirely unrepresentative sample. I’ve seen a few smart people guessing that pollers may be getting better, this cycle. We’ll see, won’t we?

Prognosticators generally

They suck. The arm-waving, gut-feel bullshit, and undisclosed bias is disgusting. In the pages of the really big properties like the NYT and WaPo, the opinion columnists are mostly partisan hacks looking for reasons to explain why their favored party is doing just fine and the other is failing. I’ve pretty well given up reading them.

OK, now let’s get to our sources, in no particular order.

Talking Points Memo

TPM is the home of Josh Marshall, the ur-blogger on US Politics, and still as good as anyone. He’s hired a bunch of other clear and clear-eyed writers, who obsess about elections all day every day. They have a strong and acknowledged pro-Democratic and anti-Trump bias, but in my opinion don’t let it clutter their analyses. If you’re reading the polling sites that I’m recommending below, TPM will have smart interpretive pieces about what they might mean.

A lot of their stuff is free, but the best isn’t. Unfortunately, they only offer annual subscriptions at US$70; I’ve subscribed for many years. It depends on how fierce the election monkey on your back is, but in my personal experience, there is no other site that offers more depth and clarity on US politics. Maybe worth your money for one year, this year.

Silver Bulletin

Nate Silver

The former 538 guy is now at the Silver Bulletin. Nate is not everyone’s cup of tea, but he retained the IP rights to the 538 election model, which is updated on a daily basis. He admits to being moderately pro-Harris this time around but argues convincingly that he doesn’t let this cloud his analysis, which I generally find pretty cogent. You probably need a little bit of statistical literacy to fully appreciate this stuff.

As of today, August 20, 2024, his probability-of-win numbers are Harris 53.6%, Trump 45.7%.

The Silver Bulletin has some free stuff, but the best parts, including the model updates, are paywalled. The price is currently $14/month, but on September 1st they’re going up to $20, just until the election is over. Because a lot of people like me signed up with every expectation of unsubscribing in three months.

If you happen to care about professional sports there’s lots of brain candy there too.

538 Poll Report

538

Their Latest Polls page is now part of ABC news but seems to retain many of the virtues that made 538 the flavor-of-the-month a few years ago.

The crucial thing, if you’re visiting once a day, is to find the “Sort by Date” widget and click on “Added”, which brings the most recent stuff that you probably haven’t seen yet to the top.

As I write this, their national polling average has Harris 46.6%, Trump 43.8%. This is quite different from Silver’s “probability of winning”. 538’s main virtue is that they get the polls up about as fast as anyone else. It’s free.

RealClearPolitics

RCP

I kind of hate to mention RealClearPolitics because it is at least in part a hive of filthy MAGA-friendly screamers. Their front page is all links, MAGA-dominated but including a sprinkling of analysis from more even-handed or openly-progressive sources. Anyhow, the problem with that stuff isn’t the bias, it’s the fact that they’re just a bunch of low-value prognosticators. I wouldn’t waste much time on the front page, I’d start by clicking on “Polls”, near the top left.

Their aggregation of poll results will contain about what you’ll see at 538 (sometimes important polls get to one place first, sometimes the other). But RCP offers other useful resources. There is the RCP Pollster Scorecard, which offers data on the accuracy and bias of most of the pollers whose results they report. Since some of those pollers are super extra biased, this can be a useful sanity check.

What I really like is the Electoral College Map, which as I write is predicting a Trump victory, 287-251 in EC votes. You can click on each state and see the polls they used to compute their prediction for that state.

I think their MAGA bias shows in the predictions, but that’s OK, because there’s a “Create Your Own Map” link, where you can disagree with them and explore each side’s path to victory or defeat. Looking at today’s map, my conclusion is that if Harris can flip Pennsylvania from red to blue she probably wins, and if she can bring along either or both of Arizona and North Carolina, Trump is roadkill.

CNN

No, really. It’s cheesy and overhyped but feels to me like it’s speaking to a pretty big constituency that I don’t know anybody from, there’s a bit of zeitgeist in the flow. To my eye it leans a little more Dem than GOP (that’s a surprise) and is not actually terrible.

When?

My advice is to wait until at least mid-afternoon, when the polls of the day have been published and ingested, then put in your pollscrolling time. Won’t take too long and you’ll know what the allegedly-smart people know.

Basic Infrastructure 13 Aug 2024, 7:00 pm

Recently, I was looking at the infrastructure bills for our CoSocial co-op member-owned Mastodon instance, mostly Digital Ocean and a bit of AWS. They seemed too high for what we’re getting. Which makes me think about the kind of infrastructure that a decentralized social network needs, and how to get it.

I worked at AWS for 5½ years and part of my job was explaining why public-cloud infrastructure is a good idea. I had no trouble doing that because, for the people who are using it, it was (and is) a good idea. The public cloud offers a quality of service, measured by performance, security, and durability, that most customers couldn’t build by themselves. One way to put it is this: If you experience problems in those areas, they are much more likely to be problems in your software than in the cloud infrastructure.

Of course, providing this level of service costs billions in capex and salaries for thousands of expensive senior engineers. So you can expect your monthly cloud-services bill to be substantial.

But what if…

What if you don’t need that quality of service? What if an hour of downtime now and then was an irritant but not an existential problem? What if you were OK with occasionally needing to restore data from backup? What if everything on your server was public data and not interesting to bad actors?

Put another way, what if you were running a small-to-medium Fediverse instance?

If it goes offline occasionally, nobody’s life is damaged much. And, while I grant that this is not well-understood, at this point in time everything on Fedi should be considered public, and I don’t think that’ll change even when we get end-to-end encryption because that data of course isn’t plain text. Here is what you care about:

  1. Members’ posts don’t get permanently lost.

  2. You don’t want bad people hijacking your members’ accounts and posting damaging stuff.

  3. You don’t want to provision and monitor a relational database.

“Basic”?

So, what I want for decentralized social media is computers and storage “in the cloud”, as in I don’t want to have to visit them physically. But I don’t need them to be very fast or to be any more reliable than modern server and disk hardware generally are. I do need some sort of effective backup/restore facility, and I want good solid modern authentication.

And, of course, I want this to be a whole lot cheaper than the “enterprise”-facing public cloud. Because I’m not an enterprise.

(I think I still need a CDN. But that’s OK because they’re commoditized and competitive these days.)

I know this is achievable. What I don’t know is who might want to offer this kind of infrastructure. I think some of it is already out there, but you have to be pretty savvy about knowing who the vendors are and their product ranges and strengths and weaknesses.

Maybe we don’t need any new products, just a new product category, so people like me know which products to look at.

How about “Basic Infrastructure”?

Countrywomen 11 Aug 2024, 7:00 pm

In the last couple of weeks I’ve been at shows by Molly Tuttle and Sierra Ferrell (I recommend clicking both those links just for the front-page portraits). Herewith thoughts on the genres, performances, and sound quality.

Tuttle is post-bluegrass and Ferrell is, um, well, Wikipedia says “folk, bluegrass, gypsy jazz, and Latin styles” which, OK, I guess, but it doesn’t mention pure old-fashioned country, her strongest flavor. These days, “Americana” is used to describe both these artists. The notion that Americana implies “by white people” is just wrong, check out Rhiannon Giddens’ charming video on the origin of the banjo. (Ms Giddens is a goddess; if you don’t know about her, check her out.)

Both bands (for brevity, just Molly and Sierra) feature mandolin, fiddle, stand-up bass, and acoustic guitar. Molly adds banjo, Sierra drums and occasional electric guitar. Both offer flashy instrumental displays; Molly adds big group meltdowns, veering into jam-band territory. Both women sing divinely and the bands regularly contribute lovely multi-part harmonies.

I think that Americana just now is one of the most interesting living musical directions. These artists are young, are standing on firm foundations, and are pushing into new territory. Judging by the crowds these days I’m not alone, so for those who agree, I’ll offer a few words on each performance.

Molly Tuttle and Golden Highway at the Hollywood Theatre

Interestingly, this is the same venue where I first saw Sierra, back in March of 2022. It’s intimate and nice-looking and has decent sound.

The crowd was pretty grey-haired; the previous week we’d taken in an Early Music Vancouver concert dedicated to Gabrieli (1557-1612) and the age demographic wasn’t that different, except that Molly’s fans wear jeans and leather and, frequently, hippie accoutrements. It dawns on me that bluegrass is in some respects a “classical” genre; it has lots of rules and formalisms and an absolute insistence on virtuosic skill.

She played a generous selection of favorites (El Dorado, Dooley’s Farm, Crooked Tree) and exceptionally tasty covers (Dire Wolf, She’s a Rainbow). The band was awesomely tight and Molly was in fine form.

Molly Tuttle and Golden Highway

In most pictures of Molly she has hair, but during her concerts she usually tells the story of how as a young child she had Alopecia universalis (total all-body hair loss) and, particularly if the concert venue is warm, whips off her wig. At this show she talked about how, on behalf of a support organization, she’d visited a little Vancouver girl with Alopecia, and how sad she was that the kid couldn’t come to the show since the Hollywood is also a bar. It was touching; good on her.

Molly, a fine singer and songwriter, is also a virtuoso bluegrass guitar flat-picker and her band are all right up there, so the playing on balance was probably a little finer than Sierra’s posse offered. And as I mentioned, they do the occasional jam-band rave-up, which I really enjoyed.

But their sound guy needs to be fired. I was at the show alone and thus found a corner to prop myself up that happened to be right behind this bozo’s desk. He had a couple of devices that I didn’t recognize, with plenty of sliders, physical and on-screen, and he was hard at work from end to end “enhancing” the sound. He threw oceans of echo on Molly’s voice then yanked it out, injected big rumble on song climaxes, brightened up the banjo and mandolin so they sounded like someone driving nails into metal, and slammed the balance back and forth to create fake stereo when licks were being traded. This sort of worked when they were doing the extended-jam thing, but damaged every song that relied on sonic truth or subtlety, which was most of them. Feaugh. Concert sound people should get out of the fucking way and reproduce what the musicians are playing. I guess Molly must like this or she wouldn’t have hired him? I wish she could come out and hear what it sounds like though.

Anyhow, it’s a good band that plays good songs with astonishing skill. If you’re open to this kind of music you’d enjoy their show.

The last encore was Helpless. I’m not 100% sure that Molly knew what she was in for. Every grey-haired Canadian knows that tune and every word of its lyrics. So as soon as she was three words in, the whole audience was booming along heartily, having a fine time. Quite a few grizzled cheeks were wet with tears, but I thought Molly looked a little taken aback. She went with it, and it was lovely.

Sierra Ferrell at the Orpheum

This hall is one of Vancouver’s two big venues where the symphony plays, operas are presented, and so on. It opened in 1927 and the decor is lavish, tastefully over-the-top, but ignore the execrable ceiling art.

Sierra Ferrell

Sierra Ferrell, singing Whispering Waltz.

On this picture, my usually-trusty Pixel 7 failed me. The focus is unacceptably bad but I’m running it anyhow to share Sierra’s outfit, which is as always fabulous. She kicked up her heels once or twice, revealing big tall Barbie-pink boots under that dress.

The audience had plenty of greybeards but on balance was way younger than Molly’s, with a high proportion of women dressed to the nines in Western-wear finery and some of the prettiest dresses I’ve seen in years. It was really a lot of fun just to look around and enjoy the shapes that Sierra’s influence takes.

Sierra is a wonderful singer but those songs, wow, I’m sure some of them will be loved and shared long after I and she are in the grave. Her set didn’t leave out any of the favorites. There were a few covers, notably Me and Bobby McGee, which was heartbreaking and then rousing. Before starting Sierra acknowledged her debt to Janis Joplin, whom I never saw, but I felt Janis there in spirit.

Everybody is going to have a few favorites among her songs. The three-song sequence, Lighthouse, The Sea, and Far Away Across the Sea, was so beautiful it left me feeling emptied. They turned Far Away into a rocker with a bit of extended jamming and it was just wonderful.

But the thing about a Sierra Ferrell show isn’t just the songs or the singing or the playing, it’s her million watts of charisma, and the connection with the crowd. People kept bringing her floral garlands and, after “Garden”, someone ran up to the stage with a little potted plant. There are some people who, when they get up on the stage, you just can’t take your eyes off them, and she’s one of those. I’m pretty confident that if she keeps holding it together and writing those songs, she’s headed for Dolly Parton territory in terms of fame and fortune.

Any complaints? Yes, this was the first stop on a new tour and the sound was initially pretty rough, but they got it fixed up so that’s forgivable. There’s still a problem: When Sierra leans into a really big note she overloads whatever mike they’re using; not sure what the cure is for that.

Another gripe: Sierra used to have a part of the set where the band gathered around an old-school radio mike with acoustic instruments and played in a very traditional style. I think she shouldn’t leave that out.

Finally, one more problem: Vancouver loves Sierra just a little too much. Every little vocal flourish, every cool little instrumental break, every one of those got a huge roar of approval from the crowd, which, fine, but some of those songs take the level way down and then back up again in a very artful way, and I wished the crowd would shut up, let Sierra drive, and clap at the end of the song.

Americana

Like I said, this is where some of the most interesting living artists are digging in and doing great work. Highly recommended.

Invisible Attackers 30 Jul 2024, 7:00 pm

In the last few days we’ve had an outburst of painful, intelligent, useful conversation about racism and abuse in the world of Mastodon and the Fediverse. I certainly learned things I hadn’t known, and I’m going to walk you through the recent drama and toss in ideas on how to improve safety.

For me, the story started back in early 2023 when Timnit Gebru (the person fired by Google for questioning the LLM-is-great orthodoxy, co-author of Stochastic Parrots) shouted loudly and eloquently that her arrival on Mastodon was greeted by a volley of racist abuse. This shocked a lot of hyperoverprivileged people like me who don’t experience that stuff. As the months went on after that, my perception was that the Mastodon community had pulled up its moderation socks and things were getting better.

July 2024

Then, just this week, Kim Crayton issued a passionate invitation to the “White Dudes for Kamala Harris” event, followed immediately by examples of the racist trolling that she saw in response. With a content warning that this is not pretty stuff, here are two of her posts: 1, 2.

Let me quote Ms Crayton:

The racist attacks you’ve witnessed directed at me since Friday, particularly by instances with 1 or 2 individuals, SHOULD cause you to ask “why?” and here’s the part “good white folx” often miss…these attacks are about YOU…these attacks are INTENDED to keep me from putting a mirror in your faces and showing you that YOU TOO are harmed by white supremacy and anti-Blackness…these attacks are no different than banning books…they’re INTENDED to keep you IGNORANT about the fact that you’re COMPLICIT

She quite appropriately shouted at the community generally and the Mastodon developers specifically. Her voice was reinforced by many others, some of whom sharpened the criticism by calling the Mastodon team whiteness-afflicted at best and racist at worst.

People asked a lot of questions and we learned a few things. First of all, it turns out that some attackers came from instances that are known to be toxic and should long-since have been defederated by Ms Crayton’s. Defederation is the Fediverse’s nuclear weapon, our best tool for keeping even the sloppiest admins on their toes. To the extent our tools work at all, they’re useless if they’re not applied.

But on the other hand it’s cheap and fast to spin up a single-user Mastodon instance that won’t get defederated until the slime-thrower has thrown slime.

Invisibility

What I’ve only now come to understand is that Mastodon helps griefers hide. Suppose you’re on instance A and looking at a post from instance B, which has a comment from an account on instance C. Whether or not you can see that comment… is complicated. But lots of times, you can’t. Let me excerpt a couple of remarks from someone who wishes to remain anonymous.

Thinking about how mastodon works in the context of all the poc i follow who complain constantly about racist harassment and how often i look at their mentions and how I’ve literally never seen an example of the abuse they’re experiencing despite actively looking for it.

It must be maddening to have lots of people saying horrible things to you while nobody who’d be willing to defend you can see anyone doing anything to you.

But also it really does breed suspicion in allies. I believe it when people say they’re being harassed, but when I’m looking for evidence of it on two separate instances and not ever seeing it? I have to step hard on the part of me that’s like … really?

Take-away 1

This is a problem that the Masto/Fedi community can’t ignore. We can honestly say that up till now, we didn’t realize how serious it was. Now we know.

Take-away 2

Let’s try to cut the Mastodon developers some slack. Here’s a quote from one, in a private chat:

I must admit that my mentions today are making me rethink my involvement in Mastodon

I am burning myself out for this project for a long time, not getting much in return, and now I am a racist because I dont fix racism.

I think it is entirely reasonable to disagree with the team, which is tiny and underfunded, on their development priorities. Especially after these last few days, it looks like a lot of people — me, for sure — failed to dive deep into the narrated experience of racist abuse. In the team’s defense, they’re getting yelled at all the time by many people, all of whom have strong opinions about their feature that needs to ship right now!

Conversations

One of the Black Fedi voices that most influences me is Mekka Okereke, who weighed in intelligently, from which this, on the subject of Ms Crayton:

  • She should not have to experience this

  • It should be easier for admins at DAIR, and across the whole Fediverse, to prevent this

Mekka has set up a meeting with the Mastodon team and says Ms Crayton will be coming along. I hope that turns out to be useful.

More good input

Let’s start with Marco Rogers, also known as @polotek@social.polotek.net. I followed Marco for ages on Twitter, not always agreeing with his strong opinions on Web/Cloud technology, but always enjoying them. He’s been on Mastodon in recent months and, as usual, offers long-form opinions that are worth reading.

He waded into the furore around our abuse problem, starting here, from which a few highlights.

I see a lot of the drama that is happening between people of color on the platform and the mastodon dev team. I feel like I need to help.

If people of color still find ourselves dependent on a small team of white devs to get what we want, that is a failure of the principles of the fediverse.

I want to know how I can find and support people that are aligned with my values. I want to enable those people to work on a platform that I can use. And we don't need permission from the mastodon team to do so. They're not in charge.

Mekka, previously mentioned, re-entered the fray:

If you run a Mastodon instance, and you don't block at least the minimum list of known terrible instances, and you have Black users, it's just a matter of time before your users face a hate brigade.

That's the only reason these awful instances exist. That's all they do.

Telling users "Just move to a better server!" is supremely unhelpful. It doesn't help the mods, and it doesn't help the users.

It needs to be easier. It's currently too hard to block them and keep up with the new ones.

And more; this is from Jerry Bell, one of the longest-lasting Fediverse builders (and I think the only person I’m quoting here who doesn’t present as Black). These are short excerpts from a long and excellent piece.

I am writing this because I'm tired of watching the cycle repeat itself, I'm tired of watching good people get harassed, and I'm tired of the same trove of responses that inevitably follows.

… About this time, the sea lions show up in replies to the victim, accusing them of embracing the victim role, trying to cause racial drama, and so on.

A major factor in your experience on the fediverse has to do with the instance you sign up to. Despite what the folks on /r/mastodon will tell you, you won't get the same experience on every instance.

What next?

I don’t know. But I feel a buzz of energy, and smart people getting their teeth into the meat of the problem.

Now I have thoughts to offer about moving forward.

Who are the enemy?

They fall into two baskets: Professional and amateur. I think the current Mastodon attackers are mostly amateurs. These are lonely Nazis, incels, channers, your basic scummy online assholes. Their organization is loose at best (“He’s pointing at her, so I will too”), and they’re typically not well-funded nor are they deep technical experts.

Then there are the pros, people doing this as their day job. I suspect most of those are working for nation states, and yes, we all know which nation states those probably are. They have sophisticated automation to help them launch armies of bots.

Here are some suggestions about potential fight-backs, mostly aimed at amateurs.

Countermeasure: Money

There’s this nonprofit called IFTAS which is working on tools and support structures for moderation. How about they start offering a curated allowlist of servers that it’s safe to federate with? How do you get on that list? Pay $50 to IFTAS, which will add you to the allowlist, and also to a service scanning your members’ posts for abusive stuff during your first month or so of operation.

Cue the howls of outrage saying “Many oppressed people can’t afford $50, you’re discriminating against the victims!” I suppose, but they can still get online at any of the (many) free-to-use instances. I think it’s totally reasonable to throw a $50 roadblock in the process of setting up a server.

In this world, what happens? Joe Incel sets up an instance at ownthelibs.nazi or wherever, pays his $50, and starts throwing slime. This gets reported and pretty soon, he’s defederated. Sure, he can do it again. But how many times is this basement-dweller willing to spend $50, leaving a paper trail each time just in case he says something that’s illegal to say in the jurisdiction where he lives? Not that many, I think?

Countermeasure: Steal from Pleroma

It turns out Mastodon isn’t the only Fediverse software. One of the competitors is Pleroma. Unfortunately, it seems to be the server of choice for our attackers, because it’s easy and cheap to set up. Having said that, its moderation facilities are generally regarded as superior to Mastodon’s, notably a subsystem called Message Rewrite Facility (MRF) which I haven’t been near but is frequently brought up as something that would be useful.

Countermeasure: Make reporting better

I report abusive posts sometimes, and, as a moderator for CoSocial.ca I see reports too. I think the “Report post” interface on many clients is weak, asking you unnecessary questions.

And when I get a report, it seems like half the time none of the abusive material is attached, and it takes me multiple clicks to look at the reported account’s feed, which feels like a pretty essential step.

Here’s how I’d like reporting to work.

  1. There’s a single button labeled “Report this post”. When you click it, a popup says “Reported, thanks” and you’re done. Maybe it could query whether you want to block the user or instance, but it’s super important that the process be lightweight.

  2. The software should pull together a report package including the reported post’s text and graphics. (Not just the URLs, because the attackers like to cover their tracks.) Also the attacker’s profile page. No report should ever be filed without evidence.

Countermeasure: Rules of thumb

Lauren offered this: Suppose a reply or mention comes in for someone on Instance A from someone on Instance B. Suppose Instance A could check whether anyone else on A follows anyone on B. If not, reject the incoming message. This would have to be a per-user not global setting, and I see it as a placeholder for a whole class of heuristics that could usefully get in the attackers’ way.
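To make the heuristic concrete, here is a minimal sketch in Go. Everything in it is hypothetical: the localInstance type, its fields, and the "user@host" account format are assumptions for illustration, not anything Mastodon or any other Fediverse server actually exposes.

```go
package main

import (
	"fmt"
	"strings"
)

// localInstance is a hypothetical stand-in for what Instance A knows.
type localInstance struct {
	// follows maps each local account to the remote accounts ("user@host") it follows.
	follows map[string][]string
	// strict records which local accounts have opted in to the heuristic.
	strict map[string]bool
}

// hostOf returns the part of "user@host" after the last "@".
func hostOf(account string) string {
	return account[strings.LastIndex(account, "@")+1:]
}

// anyoneFollows reports whether any local account follows anyone on host.
func (a *localInstance) anyoneFollows(host string) bool {
	for _, followed := range a.follows {
		for _, remote := range followed {
			if hostOf(remote) == host {
				return true
			}
		}
	}
	return false
}

// accept decides whether a reply or mention from sender should reach recipient.
func (a *localInstance) accept(recipient, sender string) bool {
	if !a.strict[recipient] {
		return true // per-user setting: this recipient hasn't opted in
	}
	return a.anyoneFollows(hostOf(sender))
}

func main() {
	a := &localInstance{
		follows: map[string][]string{"alice": {"bob@friendly.example"}},
		strict:  map[string]bool{"alice": true},
	}
	fmt.Println(a.accept("alice", "carol@friendly.example")) // true
	fmt.Println(a.accept("alice", "troll@oneuser.example"))  // false
}
```

A freshly spun-up single-user instance fails this check by construction, since nobody on Instance A follows anyone there yet.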

Wish us luck

Obviously I’m not claiming that any of these ideas are the magic bullet that’s going to slay the online-abuse monster. But we do need ideas to work with, because it’s not a monster that we can afford to ignore.

I care intensely about this, because I think decentralization is an essential ingredient of online conversation, and online conversation is valuable, and if we can’t make it safe we won’t have it.

Union of Finite Automata 28 Jul 2024, 7:00 pm

In building Quamina, I needed to compute the union of two finite automata (FAs). I remembered from some university course 100 years ago that this was possible in theory, so I went looking for the algorithm, but was left unhappy. The descriptions I found tended to be hyper-academic, loaded with mathematical notation that I found unhelpful, and didn’t describe an approach that I thought a reasonable programmer would reasonably take. The purpose of this ongoing entry is to present a programmer-friendly description of the problem and of the algorithm I adopted, with the hope that some future developer, facing the same problem, will have a more satisfying search experience.

There is very little math in this discussion (a few subscripts), and no circles-and-arrows pictures. But it does have working Go code.

Finite automata?

I’m not going to rehash the theory of FAs (often called state machines). In practice the purpose of an FA is to match (or fail to match) some input against some pattern. What the software does when the input matches the pattern (or doesn’t) isn’t relevant to our discussion today. Usually the inputs are strings and the patterns are regular expressions or equivalent. In practice, you compile a pattern into an FA, and then you go through the input, character by character, trying to traverse the FA to find out whether it matches the input.

An FA has a bunch of states, and for each state there can be a list of input symbols that lead to transitions to other states. What exactly I mean by “input symbol” turns out to be interesting and affects your choice of algorithm, but let’s ignore that for now.

The following statements apply:

  1. One state is designated as the “start state” because, well, that’s where you start.

  2. Some states are called “final”, and reaching them means you’ve matched one or more patterns. In Quamina’s FAs, each state has an extra field (usually empty) saying “if you got here you matched P*, yay!”, where P* is a list of labels for the (possibly more than one) patterns you matched.

  3. It is possible that you’re in a state and for some particular input, you transition to more than one other state. If this is true, your FA is nondeterministic, abbreviated NFA.

  4. It is possible that a state can have one or more “epsilon transitions”, ones that you can just take any time, not requiring any particular input. (I wrote about this in Epsilon Love.) Once again, if this is true, you’ve got an NFA. If neither this statement nor the previous one is true, it’s a deterministic finite automaton, DFA.

The discussion here works for NFAs, but lots of interesting problems can be solved with DFAs, which are simpler and faster, and this algorithm works there too.
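For readers who prefer code to definitions, here is a minimal sketch of the four statements above. It is not Quamina's data structure: the state type, its fields, and the helper names are all made up for illustration, and it reports only the patterns matched by the states that are active once the input is used up.

```go
package main

import (
	"fmt"
	"sort"
)

// state is a hypothetical, simplified NFA state.
type state struct {
	next    map[byte][]*state // more than one target for a byte means nondeterministic
	epsilon []*state          // transitions that need no input
	matches []string          // labels of patterns matched on arrival (usually empty)
}

func newState() *state { return &state{next: map[byte][]*state{}} }

// addString hangs a chain of states off start that spells out word,
// and labels the last state in the chain.
func addString(start *state, word, label string) {
	s := start
	for i := 0; i < len(word); i++ {
		n := newState()
		s.next[word[i]] = append(s.next[word[i]], n)
		s = n
	}
	s.matches = append(s.matches, label)
}

// closure adds s, and everything reachable from it by epsilon transitions, to set.
func closure(s *state, set map[*state]bool) {
	if set[s] {
		return
	}
	set[s] = true
	for _, e := range s.epsilon {
		closure(e, set)
	}
}

// traverse feeds input through the automaton a byte at a time and returns
// the sorted labels of the states that are active at the end.
func traverse(start *state, input []byte) []string {
	current := map[*state]bool{}
	closure(start, current)
	for _, b := range input {
		following := map[*state]bool{}
		for s := range current {
			for _, n := range s.next[b] {
				closure(n, following)
			}
		}
		current = following
	}
	var labels []string
	for s := range current {
		labels = append(labels, s.matches...)
	}
	sort.Strings(labels)
	return labels
}

// demo builds a small NFA: two transitions on 'f' from the start state,
// plus an epsilon transition to a second sub-automaton.
func demo() *state {
	start := newState()
	addString(start, "foo", "P1")
	addString(start, "for", "P2")
	other := newState()
	addString(other, "bar", "P3")
	start.epsilon = append(start.epsilon, other)
	return start
}

func main() {
	fa := demo()
	for _, in := range []string{"foo", "for", "bar", "baz"} {
		fmt.Println(in, traverse(fa, []byte(in)))
	}
}
```

Drop the epsilon slice and allow only one target per byte, and the same traversal collapses to following a single pointer at each step, which is the DFA case.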

Union?

If I have FA1 that matches “foo” and FA2 that matches “bar”, then their union, FA1 ∪ FA2, matches both “foo” and “bar”. In practice Quamina often computes the union of a large number of FAs, but it does so a pair at a time, so we’re only going to worry about the union of two FAs.

The academic approach

There are plenty of Web pages and YouTubes covering this. Most of them are full of Greek characters and math symbols. They go like this:

  1. You have two FAs, call them A and B. A has states A1, … AmaxA, B has B1, … BmaxB

  2. The union contains all the states in A, all the states in B, and the “product” of A and B, which is to say states you could call A1B1, A1B2, A2B1, A2B2, … AmaxABmaxB.

  3. For each state AXBY, you work out its transitions by looking at the transitions of the two states being combined. For some input symbol, if AX has a transition to AXX but BY has no transition, then the combined state just has the A transition. The reverse for an input where BY has a transition but AX doesn’t. And if AX transitions to AXX and BY transitions to BYY, then the transition is to AXXBYY.

  4. Now you’ll have a lot of states, and it usually turns out that many of them aren’t reachable. But there are plenty of algorithms to filter those out. You’re done, you’ve computed the union and A1B1 is its start state!

Programmer-think

If you’re like me, the idea of computing all the states, then throwing out the unreachable ones, feels wrong. So here’s what I suggest, and what has worked well in practice for Quamina:

  1. First, merge A1 and B1 to make your new start state A1B1. Here’s how:

  2. If an input symbol causes no transitions in either A1 or B1, it also doesn’t cause any in A1B1.

  3. If an input symbol causes a transition in A1 to AX but no transition in B1, then you adopt AX into the union, and any other A states it points to, and any they point to, and so on.

  4. And of course if B1 has a transition to BY but A1 doesn’t transition, you flip it the other way, adopting BY and its descendants.

  5. And if A1 transitions to AX and B1 transitions to BY, then you adopt a new state AXBY, which you compute recursively the way you just did for A1B1. So you’ll never compute anything that’s not reachable.
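The steps above can be sketched in a few dozen lines for the deterministic case. This is a simplified illustration, not Quamina's code: the dfaState type, the map-based transitions, and the helper names are assumptions made to keep the sketch short.

```go
package main

import "fmt"

// dfaState is a hypothetical, simplified deterministic state.
type dfaState struct {
	next    map[byte]*dfaState
	matches []string
}

// statePair identifies a pair of states that has already been merged.
type statePair struct{ a, b *dfaState }

// fromString builds a DFA that matches exactly word and reports label.
func fromString(word, label string) *dfaState {
	start := &dfaState{next: map[byte]*dfaState{}}
	s := start
	for i := 0; i < len(word); i++ {
		n := &dfaState{next: map[byte]*dfaState{}}
		s.next[word[i]] = n
		s = n
	}
	s.matches = []string{label}
	return start
}

// union merges states a and b, recursing only into pairs that are reachable.
func union(a, b *dfaState, memo map[statePair]*dfaState) *dfaState {
	key := statePair{a, b}
	if merged, ok := memo[key]; ok {
		return merged
	}
	merged := &dfaState{next: map[byte]*dfaState{}}
	merged.matches = append(append([]string{}, a.matches...), b.matches...)
	memo[key] = merged
	for sym, an := range a.next {
		if bn, ok := b.next[sym]; ok {
			merged.next[sym] = union(an, bn, memo) // both transition: new state AXBY
		} else {
			merged.next[sym] = an // only A transitions: adopt AX and its descendants
		}
	}
	for sym, bn := range b.next {
		if _, ok := a.next[sym]; !ok {
			merged.next[sym] = bn // only B transitions: adopt BY and its descendants
		}
	}
	return merged
}

// run walks input through the DFA and returns the labels of the final state, if any.
func run(start *dfaState, input string) []string {
	s := start
	for i := 0; i < len(input); i++ {
		s = s.next[input[i]]
		if s == nil {
			return nil
		}
	}
	return s.matches
}

func main() {
	u := union(fromString("foo", "P1"), fromString("fob", "P2"), map[statePair]*dfaState{})
	fmt.Println(run(u, "foo"), run(u, "fob"), run(u, "fox")) // [P1] [P2] []
}
```

Adopted states are shared with the input automata rather than copied, so this sketch assumes the inputs aren't modified afterwards.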

I could stop there. I think that’s enough for a competent developer to get the idea? But it turns out there are a few details, some of them interesting. So, let’s dig in.

“Input symbol”?

The academic discussion of FAs is very abstract on this subject, which is fair enough, because when you’re talking about how to build, or traverse, or compute the union of FAs, the algorithm doesn’t depend very much on what the symbols actually are. But when you’re writing code, it turns out to matter a lot.

In practice, I’ve done a lot of work with FAs over the years, and I’ve only ever seen four things used as input symbols to drive them. They are:

  • Unicode “characters” represented by code points, integers in the range 0…1,114,111 inclusive.

  • UTF-8 bytes, which have values in the range 0…244 inclusive.

  • UTF-16 values, unsigned 16-bit integers. I’ve only ever seen this used in Java programs because that’s what its native char type is. You probably don’t want to do this.

  • Enum values, small integers with names, which tend to come in small collections.

As I said, this is all I’ve seen, but 100% of the FAs that I’ve seen automatically generated and subject to set-arithmetic operations like Union are based on UTF-8. And that’s what Quamina uses, so that’s what I’m going to use in the rest of this discussion.
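To see the difference between the first two options, here is a small sketch using only the Go standard library; the helper names codePoints and utf8Bytes are made up for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codePoints returns the Unicode code points in s.
func codePoints(s string) []rune { return []rune(s) }

// utf8Bytes returns the UTF-8 encoding of s, which is what a byte-driven FA sees.
func utf8Bytes(s string) []byte { return []byte(s) }

func main() {
	s := "n\u00e9" // "n" followed by e-acute
	fmt.Println(codePoints(s)) // [110 233]
	fmt.Println(utf8Bytes(s))  // [110 195 169]

	// The largest code point, U+10FFFF, encodes with a lead byte of 0xF4 (244),
	// the highest byte value that can appear in UTF-8.
	fmt.Printf("%d % X\n", utf8.MaxRune, utf8Bytes(string(rune(utf8.MaxRune)))) // 1114111 F4 8F BF BF
}
```

An FA driven by bytes never needs more than a couple of hundred outgoing transitions per state, while one driven by code points has over a million possible symbols to think about.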

Code starts here

This comes from Quamina’s nfa.go. We’re going to look at the function mergeFAStates, which implements the merge-two-states logic described above.

Lesson: This process can lead to a lot of wasteful work. Particularly if either or both of the states transition on ranges of values like 0…9 or a…z. So we only want to do the work merging any pair of states once, and we want there only to be one merged value. Thus we start with a straightforward memo-ization.

func mergeFAStates(state1, state2 *faState, keyMemo map[faStepKey]*faState) *faState {
    // try to memo-ize
    mKey := faStepKey{state1, state2}
    combined, ok := keyMemo[mKey]
    if ok {
        return combined
    }

Now some housekeeping. Remember, I noted above that any state might contain a signal saying that arriving here means you’ve matched pattern(s). This is called fieldTransitions, and the merged state obviously has to match all the things that either of the merged states match. Of course, in the vast majority of cases neither merged state matched anything and so this is a no-op.

    fieldTransitions := append(state1.fieldTransitions, state2.fieldTransitions...)

Since our memo-ization attempt came up empty, we have to allocate an empty structure for the new merged state, and add it to the memo-izer.

    combined = &faState{table: newSmallTable(), fieldTransitions: fieldTransitions}
    keyMemo[mKey] = combined

Here’s where it gets interesting. The algorithm talks about looking at the inputs that cause transitions in the states we’re merging. How do you find them? Well, in the case where you’re transitioning on UTF-8 bytes, since there are only 245 possible values (0 through 244), why not do the simplest thing that could possibly work and just check each byte value?

Every Quamina state contains a table that encodes the byte transitions, which operates like the Go construct map[byte]state. Those tables are implemented in a compact data structure optimized for fast traversal. But for doing this kind of work, it’s easy to “unpack” them into a fixed-sized table; in Go, [245]state. Let’s do that for the states we’re merging and for the new table we’re building.

    u1 := unpackTable(state1.table)
    u2 := unpackTable(state2.table)
    var uComb unpackedTable

uComb is where we’ll fill in the merged transitions.

Now we’ll run through all the possible input values; i is the byte value, next1 and next2 are the transitions on that value. In practice, next1 and next2 are going to be null most of the time.

    for i, next1 := range u1 {
        next2 := u2[i]

Here’s where we start building up the new transitions in the unpacked array uComb.

For many values of i, you can avoid actually merging the states to create a new one: when the transition is the same in both input FAs, when either of them is null, or when the transitions for this value of i are the same as for the last value. This is all about avoiding unnecessary work, and the switch/case structure is the result of a bunch of profiling and optimization.

        switch {
        case next1 == next2: // no need to merge
            uComb[i] = next1
        case next2 == nil: // next1 must be non-nil
            uComb[i] = next1
        case next1 == nil: // next2 must be non-nil
            uComb[i] = next2
        case i > 0 && next1 == u1[i-1] && next2 == u2[i-1]: // dupe of previous step - happens a lot
            uComb[i] = uComb[i-1]

If none of these work, we haven’t been able to avoid merging the two states. We do that by a recursive call to invoke all the logic we just discussed.

There is a complication. The automaton might be nondeterministic, which means that there might be more than one transition for some byte value. So the data structure actually behaves like map[byte]*faNext, where faNext is a wrapper for a list of states you can transition to.

So here we’ve got a nested loop to recurse for each possible combination of transitioned-to states that can occur on this byte value. In a high proportion of cases the FA is deterministic, so there’s only one state from each FA being merged and this nested loop collapses to a single recursive call.

        default: // have to recurse & merge
            var comboNext []*faState
            for _, nextStep1 := range next1.states {
                for _, nextStep2 := range next2.states {
                    comboNext = append(comboNext, mergeFAStates(nextStep1, nextStep2, keyMemo))
                }
            }
            uComb[i] = &faNext{states: comboNext}
        }
    }

We’ve filled up the unpacked state-transition table, so we’re almost done. First, we have to compress it into its optimized-for-traversal form.

    combined.table.pack(&uComb)

Remember, if the FA is nondeterministic, each state can have “epsilon” transitions which you can follow any time without requiring any particular input. The merged state needs to contain all the epsilon transitions from each input state.

    combined.table.epsilon = append(state1.table.epsilon, state2.table.epsilon...)

    return combined
}

And, we’re done. I mean, we are once all those recursive calls have finished crawling through the states being merged.

Is that efficient?

As I said above, this is an example of a “simplest thing that could possibly work” design. Both the recursion and the unpack/pack sequence are kind of code smells, suggesting that this could be a pool of performance quicksand.

But apparently not. I ran a benchmark where I added 4,000 patterns synthesized from the Wordle word-list; each of them looked like this:

{"allis": { "biggy": [ "ceils", "daisy", "elpee", "fumet", "junta", … (195 more).

This produced a huge deterministic FA with about 4.4 million states, with the addition of these hideous worst-case patterns running at 500/second. Good enough for rock ’n’ roll.

How about nondeterministic FAs? I went back to that Wordle source and, for each of its 12,959 words, added a pattern with a random wildcard; here are three of them:

{"x": [ {"shellstyle": "f*ouls" } ] }
{"x": [ {"shellstyle": "pa*sta" } ] }
{"x": [ {"shellstyle": "utter*" } ] }

This produced an NFA with 46K states, and the addition process ran at 70K patterns/second.

Sometimes the simplest thing that could possibly work, works.

Terse Directions 19 Jul 2024, 7:00 pm

This post describes a service I want from my online-map provider. I’d use it all the time. Summary: When I’m navigating an area I already know about, don’t give me turn-by-turn, just give me a short list of the streets to take.

I’ve been living in Vancouver for decades and, driving or cycling, know how to get almost anywhere. It helps that, like most North American cities, we have a fairly regular north-south-east-west grid. These days, when I’m going any distance by car, I get directions from Google Maps because it knows where the traffic is bad, and the traffic is usually bad somewhere. But it gives me way more directions than I need. Let’s look at a concrete example.

Part of central and east Vancouver

To follow the narrative, you’ll probably have to click to expand, and it’s pretty big. It may be easier just to open the map in another window.
Map credit: OpenStreetMap.

My place is near the bottom edge of this map off to the west, between Cambie and Main streets. I occasionally attend a meetup at New Brighton Park, which is the little splodge of green at the very top right corner of the map, across the highway from Hastings Racecourse. It’s on McGill street. To get there, I have to go quite a distance both east and north. Candidates for north/south travel include Main, Clark, Nanaimo, and Renfrew streets. Candidates for east/west include 12th, Broadway, 1st, and Hastings.

Right now, Google Maps insists on turn-by-turn, with three warnings for each turn. It’s dumb and annoying and interrupts whatever music or show I’m listening to.

What I want is to get in the car and say “Short directions to New Brighton Park” and have it say “Take Main to 12th to Nanaimo to 1st to Renfrew to McGill.” Then when I’m driving, I’d get one vocal warning a block out from each turn, like “Next left on Nanaimo” or some such.
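The summarizing part of this is not hard, assuming the router already produces a turn-by-turn list. A minimal sketch, where the leg type and its single field are hypothetical and not any map provider's API:

```go
package main

import (
	"fmt"
	"strings"
)

// leg is one turn-by-turn instruction; only the street name matters here.
type leg struct {
	street string
}

// terse collapses consecutive legs on the same street and joins the rest
// into a single spoken sentence.
func terse(legs []leg) string {
	var streets []string
	for _, l := range legs {
		if len(streets) == 0 || streets[len(streets)-1] != l.street {
			streets = append(streets, l.street)
		}
	}
	if len(streets) == 0 {
		return ""
	}
	return "Take " + strings.Join(streets, " to ") + "."
}

func main() {
	route := []leg{{"Main"}, {"Main"}, {"12th"}, {"Nanaimo"}, {"1st"}, {"Renfrew"}, {"McGill"}}
	fmt.Println(terse(route)) // Take Main to 12th to Nanaimo to 1st to Renfrew to McGill.
}
```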

Of course, when I’m navigating in a strange place, I’d want the traditional turn-by-turn. Don’t know about you, but the bulk of my navigation is in territory I know and mostly about avoiding traffic.

The OpenStreetMap data is public and good. Traffic data is… a problem. But for anyone who has it, you can have me for a customer. Just learn to be terse.

2009 Ranger 12 Jul 2024, 7:00 pm

This week we’re vacationing at the family cabin on an island; the nearest town is Gibsons. Mid-week, we hit town to pick up groceries and hardware. Unfortunately, it’s a really demanding walk from the waterfront to the mall, particularly with a load to carry, and there’s little public transit. Fortunately, there’s Coast Car Co-op, a competent and friendly little five-car outfit. We booked a couple of hours and the closest vehicle was a 2009 Ford Ranger, described as a “compact pickup” or “minitruck”. It made me think.

2009 Ford Ranger

Think back fifteen years

I got in the Ranger and tried to adjust the seat, but that was as far back as it went. It didn’t go up or down. There were no cameras to help me back up. There was nowhere to plug my phone in. It had a gearshift on the steering column that moved a little red needle in a PRNDL tucked under the speedometer. There was no storage except for the truck bed. It wasn’t very fast. The radio was just a radio. It was smaller than almost anything on the road. I had to manipulate a physical “key” thing to make it go. To open and close the window, you had to turn a crank. I bet there were hardly any CPUs under the hood.

And, it was… perfectly OK.

The rear-view mirrors were big and showed me what I needed. It was dead easy to park, I could see all four of its corners. There was enough space in the back to carry all our stuff with plenty of room to spare. You wouldn’t want to drive fast in a small tourist town with lots of steep hills, blind corners, and distracted pedestrians. It wasn’t tracking my trips and selling the info. The seats were comfy enough.

Car companies: Dare to do less

I couldn’t possibly walk away from our time in the Ranger without thinking about the absolutely insane amounts of money and resources and carbon loading we could save by building smaller, simpler, cheaper, dumber, automobiles.

Q Numbers 9 Jul 2024, 7:00 pm

This ongoing fragment describes how to match and compare numbers using a finite automaton, which involves transforming them into strings with the right lexical properties. My hope is that there are at least twelve people in the world who are interested in the intersection of numeric representation and finite automata.
[Note: This whole piece, except for the description of the problem, has been obsoleted by Q Numbers Redux.]

Background

(Feel free to skip this part if you already know about Quamina.)

This is yet another entry in the Quamina Diary series of blog posts. Quamina is a Go-language library that allows you to compile a bunch of “Patterns” together and, when presented with “events”, i.e. JSON data blobs, informs you which (if any) of the Patterns match each event, at a speed which is high (often millions/second) and only weakly related to the number of Patterns in any Quamina instance.

Quamina is based on AWS Event Ruler (“Ruler” for short), a package I helped develop while at AWS that has since been open-sourced. (Thanks, AWS!) By “based on” I mean “does a subset of the same things compatibly, with a design that is quite different, in interesting ways”. Quamina is also a fruitful source of software geekery for me to write about here, which I enjoy.

The problem

Suppose you want to match records for boxes whose height is 20cm. A sample of such a record, with most fields removed:

{
  "box": {
    "dimensions": {
      "width": 100,
      "height": 20,

(Much omitted.)

A Quamina Pattern designed to match those records would look like this:

{
  "box": {
    "dimensions": { 
      "height": [ 20 ]
    }
  } 
}

All good so far. But what if, due to some upstream computer program or inventory admin, a message showed up like so?

{
  "box": {
    "dimensions": {
      "width": 100.0,
      "height": 20.0,

Up until my last PR landed, Quamina didn’t know that “20” and “20.0” and “2.0e1” were the same quantity; it knew how to compare strings to other strings and that was all. Which was unsatisfactory. And a problem which had been solved years ago (partly by me) in Ruler.

Question

Pause a moment and ask yourself: How would you write a finite automaton which would Do The Right Thing with numbers? I’m not going to claim that the way Ruler and Quamina do it is optimal, but it’s worked pretty well for years and processed trillions of events with no breakages I know of.

Our answer: normalize the numbers into fixed-sized strings whose lexical ordering is that of the numbers they represent. Code first:

func qNumFromFloat(f float64) (qNumber, error) {
	if f < -FiveBillion || f > FiveBillion {
		return nil, errors.New("value must be between -5e9 and +5e9 inclusive")
	}
	value := uint64(TenE6 * (FiveBillion + f))
	return toHexStringSkippingFirstByte(value), nil
}
    

Constraints

Quamina requires, for numeric matching to work properly, that:

  1. The numbers be between -/+5×10⁹, inclusive.

  2. The numbers have no more than five digits to the right of the decimal point.

You’ll notice that the code above enforces the first condition but not the second. We’ll get to that.

Effects

So, what that code is doing is:

  1. Adding 5.0e9 to the number so it’s in the range 0 … 10.0e9.

  2. Multiplying by 10⁶ to push the five-digit fractional part to the left of the decimal point, preserving its sixth digit (if any) so the rounding in the next step works.

  3. Converting it from a float64 into a uint64.

  4. Turning that into a big-endian 14-byte hex string.

So any number that meets the constraints above is represented as 14 hex digits whose lexical order is consistent with the underlying numbers. “20”, “20.0” and “2.0e1” are all “11C3793911AD00”. Which means that this Pattern will do what reasonable people expect:

{"box": { "dimensions": { "height": [ 20 ] } } }
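
The four steps listed under "Effects" can be tried out in a few lines of Go. This is a minimal sketch, not Quamina's code: the name qNumSketch is made up, the constants are lower-cased stand-ins for the ones in the function above (with the multiplier taken to be 10^6, as the text says), and fmt.Sprintf with %014X is an assumed substitute for toHexStringSkippingFirstByte, which isn't shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for the constants used by the function quoted in the article.
const (
	fiveBillion = 5e9
	tenE6       = 1e6 // assumption: the multiplier is 10^6, as the prose says
)

// qNumSketch is a hypothetical re-creation of the conversion. It returns the
// Q number as a 14-digit upper-case hex string.
func qNumSketch(f float64) (string, error) {
	if f < -fiveBillion || f > fiveBillion {
		return "", errors.New("value must be between -5e9 and +5e9 inclusive")
	}
	// Steps 1 to 3: shift into 0 … 1e10, scale, convert to an integer.
	value := uint64(tenE6 * (fiveBillion + f))
	// Step 4: fixed-width hex. %014X zero-pads to 14 digits, so string
	// comparison orders the results the same way the integers are ordered.
	return fmt.Sprintf("%014X", value), nil
}

func main() {
	// Three spellings of the same quantity all reach Go as the float64 20.
	for _, f := range []float64{20, 20.0, 2.0e1} {
		q, _ := qNumSketch(f)
		fmt.Println(q) // 11C3793911AD00 each time
	}
	if _, err := qNumSketch(6e9); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Fourteen digits are enough because the largest value, 10^10 × 10^6 = 10^16, is less than 2^56.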

More formally

There are 10^15 numbers that meet the constraints described above. This process maps them into hex strings. The first three and their mappings are:

-5,000,000,000, -4,999,999,999.99999, -4,999,999,999.99998
00000000000000,       00000000000009,       00000000000014

And the last three:

4,999,999,999.99998, 4,999,999,999.99999,  5,000,000,000
     2386F26FC0FFEC,      2386F26FC0FFF6, 2386F26FC10000

Less formally

This includes “most” numbers that are used in practice, including prices, occurrence counts, size measurements, and so on.

Examples of numbers that do not meet these criteria include AWS account numbers, some telephone numbers, and cryptographic keys/signatures. For those, Quamina just preserves the digits, whatever they may be, and in fact, this also usually ends up doing what people expect.

Could we do better?

I think so. To start with, hex digits are an inefficient way to represent bits; there are many other options.

The current hex approach hasn’t changed since a very early version of Ruler because it’s never been a pain point.

Speaking of Ruler, they recently landed a PR that lets them have 6 fractional digits as opposed to Quamina’s 5, simply by using decimal rather than binary arithmetic. It’s fast, too! This was made easier by the fact that Java has BigDecimal built in, while Go doesn’t. There are good open-source options out there, but I am extremely reluctant to accept dependencies in a package as low-level as Quamina. I don’t think matching one more fractional digit justifies the cost of a dependency.

“Q numbers?”

In Ruler, the data type is ComparableNumber. In Quamina I toyed with comparableNumber and canonicalNumber but neither is quite right and both produced excessively long and ugly variable and function names. So I decided to call them “Q numbers”, where Q is for Quamina. The code became noticeably more readable.

While the set of Rationals is called “Q”, the only other significant use of the phrase “Q number” is some weird old thing out of Texas Instruments.

Practicalities

The code to enforce the constraints and do the conversion isn’t that cheap. When I first naively dropped it in, I saw a nasty performance regression. Code optimization helped, but I realized then that it’s really important not to convert an incoming field that happens to be a JSON number into a Q number unless you know, first, that it meets the constraints and second, that the finite automaton we’re trying to match has a Pattern with a numerical match that also met the Q number constraints.

The one moderately clever stroke here relies on the fact that Quamina has its own JSON parser, because Reasons. The parser obviously has to step its way through numbers, and it’s easy enough there to notice where the syntax characters like “.” and “e” are and cheaply figure out if the decimal is the right size.
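
To make the "notice where the syntax characters are" idea concrete, here is a rough sketch of a single pass over the text of a JSON number. Everything in it is hypothetical: numberShape, scanNumber and cheapCandidate are illustrative names, not part of Quamina, and treating any exponent as "not a cheap candidate" is a simplifying assumption (how Quamina's parser deals with forms like 2.0e1 isn't described here).

```go
package main

import "fmt"

// numberShape is a hypothetical record of what one pass over the text of a
// JSON number noticed. It is not a type from Quamina.
type numberShape struct {
	fracDigits  int  // digits between "." and the exponent (or the end)
	hasExponent bool // true if an "e" or "E" was seen
}

// scanNumber walks the bytes of an already-validated JSON number once.
func scanNumber(num []byte) numberShape {
	var shape numberShape
	inFraction := false
	for _, b := range num {
		switch {
		case b == '.':
			inFraction = true
		case b == 'e' || b == 'E':
			shape.hasExponent = true
			return shape // nothing after the exponent marker matters here
		case inFraction && b >= '0' && b <= '9':
			shape.fracDigits++
		}
	}
	return shape
}

// cheapCandidate reports whether the text plainly fits the five-digit limit,
// so the more expensive conversion is only attempted when it can succeed.
func cheapCandidate(num []byte) bool {
	shape := scanNumber(num)
	return !shape.hasExponent && shape.fracDigits <= 5
}

func main() {
	for _, text := range []string{"20", "20.0", "3.14159", "3.141592", "2.0e1"} {
		fmt.Printf("%-9s candidate=%v\n", text, cheapCandidate([]byte(text)))
	}
}
```

A real parser would gather the same facts while it is already stepping through the number, rather than in a second loop. The range check on the value is a separate test.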

Conclusion

Quamina now knows numbers. It’s a little slower to match a 14-digit Q than a string like “20”, but finite automata are fast and anyhow, being right matters.

Lounge Penguin 23 Jun 2024, 7:00 pm

Lounge, as in a jazz club. Penguin, as in GoGo Penguin, a piano/bass/drums trio. We caught their show at Jazz Alley in Seattle last week. Maybe you should go hit a jazz lounge sometime.

What happened was

My daughter turned eighteen and graduated high school. She had heard that Car Seat Headrest was playing Seattle’s Woodland Park Zoo, and could tickets and a road trip (me buying and driving) be her present? Seemed reasonable, and she found a friend to take along. I wouldn’t mind seeing the Headrests (decent indie rock stuff) but her party, her friend. I noticed that GoGo Penguin was playing Seattle’s Jazz Alley, and Lauren was agreeable to coming along for the ride and the show.

I only know about GoGo Penguin because YouTube Music drops them into my default stream now and then. I’d thought “sounds good, maybe a little abstract”, couldn’t have named a song, but hey.

The “Jazz Club” concept

You’ve seen it in a million old movies, and the Vic Fontaine episodes of ST:DS9. The lights are low, the audience is sitting at tables with little lamps on them, the band’s on a thrust stage among the tables, there’s expected to be a soft background of clinking glasses and conversation. Some people are focusing in tight on the music, others are socializing at a respectfully low volume.

Of course, usually a gunfight breaks out or an alien materializes on stage… no wait, that’s just on-screen not real-life.

All jazz clubs serve alcohol — fancy cocktails, natch — and many will sell you dinner too. Dimitriou’s Jazz Alley in Seattle is a fine example.

GoGo Penguin at Dimitriou’s Jazz Alley in Seattle

GoGo Penguin at Jazz Alley; June 20th, 2024.
Our table was in the balcony.

We had a decent if conventional Pacific-Northwest dinner (crab and halibut), with a good bottle of local white. They’ve got things set up so most people have finished eating by the time the music starts. The seats were comfy. The decor was pleasing. The service was impeccable. I felt very grown-up.

GoGo Penguin

They’re three youngish guys from Manchester. Their Web site says they’re an “emotive, cinematic break-beat trio”. OK then. Piano/bass/drums is the canonical minimal jazz ensemble. Only they’re not minimal and it’s not jazz. I guess if you redefined “jazz” as complex rhythmically-sophisticated music featuring virtuoso soloing skills, well yeah. Damn, those guys can play. But their music is heavily composed, not a lot of opportunities for anyone to stretch out and ride the groove.

And it ain’t got that swing; can it still mean a thing?

I guess so, because I enjoyed myself. There wasn’t a microsecond that was boring, plus the arrangements were super intelligent and kept surprising me.

But most of all, the bass. Nick Blacka hit me harder than any bassist since I saw (and blogged!) Robbie Shakespeare of Sly and Robbie in 2004.

It’s really something special. It may be a stand-up acoustic bass, but it’s wired up so he can dominate the band’s sound when he reaches back for it (which he does neither too little nor too much). Plus the instrument’s acoustic texture roars out entirely unmarred, you can feel those strings and wood in your gut. He moves between bowing and plucking and banging and you hardly even notice because it’s always the right thing.

I don’t wanna diss Chris Illingworth on piano or Jon Scott on drums; both of them made me catch my breath. But it’s Blacka’s bass explosions that I took home with me.

That swing?

These days my musical obsessions are Americana (i.e. bluegrass with pretensions) and old blues. The first of which also features instrumental complexity and virtuosity. And, if I’m being honest, both offer a whole lot more soul than Penguins.

I respect what they’re doing. I’ll go see them again. But I wish they’d get the hell out from behind those diamond-bright razor-sharp arrangements and just get down sometimes.

Next?

Lauren and I had real fun and left feeling a bit guilty that we’ve been ignoring Vancouver’s own jazz clubs. Not that I’m going to stop going to metal or post-punk or baroque concerts. But jazz clubs are a good grown-up option.

Epsilon Love 17 Jun 2024, 7:00 pm

Quamina was for a time my favorite among all my software contributions. But then it stalled after I shipped 1.0 in January of 2023. First of all, I got busy with the expert witness for Uncle Sam gig and second, there was a horrible problem in there that I couldn’t fix. Except for now I have! And I haven’t done much codeblogging recently. So, here are notes on nondeterministic finite automata, epsilon transitions, Ken Thompson, Golang generics, and prettyprinting. If some subset of those things interests you, you’ll probably like this.

(Warning: if you’ve already had your hands on the theory and practice of finite automata, this may all be old hat.)

[Update: This is kind of embarrassing. It looks like what this post refers to as an “epsilon” is not the same epsilon that features in the theory of finite automata. I mean, it still works well for where I’m using it, but I obviously need to dig in harder and deeper.]

Sidebar: What’s a Quamina?

I don’t think there’s much to be gained by duplicating Quamina’s README but in brief: “A fast pattern-matching library in Go with a large and growing pattern vocabulary and no dependencies outside Go’s standard libraries.” If you want much, much more, this Quamina Diary blog series has it.

The problem

Combining too many patterns with wild-cards in them caused Quamina 1.0’s data structures to explode in size with a growth rate not far off the terrifying O(2^N), which meant that once you’d added much more than 20 patterns you couldn’t add any more, because the add-pattern code’s runtime was O(2^N) too.

Those structures are state machines generally, “nondeterministic finite automata” (NFA’s) in particular. Which offer good solutions to many software problems, but when they get to be any size at all, are really hard to fit into a human mind. So when I was looking at Quamina’s unreasonably-big automata and trying to figure out how they got that way, my brain was screaming “Stop the pain!”

Lesson: Prettyprint!

At the point I stalled on Quamina, I’d started a refactor based on the theory that the NFAs were huge because of a failure to deduplicate state transitions. But the code I’d written based on that theory was utterly broken; it failed simple unit tests and I couldn’t see why.

During the months when I was ignoring the problem, I privately despaired because I wasn’t sure I could ever crack it, and I couldn’t stomach more struggling with ad-hoc Printf and debugger output. So I decided to generate human-readable renditions of my automata. Given that, if I still couldn’t figure out what was going on, I’d have to admit I wasn’t smart enough for this shit and walk away from the problem.

Which turned out to be a good call. Generating an information-dense but readable display was hard, and I decided to be ruthless about getting the spaces and punctuation in the right places. Because I didn’t want to walk away.

Back in the day, we used to call this “prettyprinting”.

It worked! First of all, my prettyprinter showed me that the automata emitted based on my deduplication theory were just wrong, and what was wrong about them, and I found that code and fixed it.

Bad news: My deduplication theory was also just wrong. Good news: My prettyprinter provided unavoidable proof of the wrongness and made me go back to first principles.

And I just landed a PR that cleanly removed the state explosion.

Free advice

I’ll show off the prettyprinter output below where I dig into the state-explosion fix. But for the moment, a recommendation: If you have a data structure that’s not Working As Intended and is hard to grok, go hide for a couple of days and write yourself a prettyprinter. Prettyprinting is an intelligence amplifier. Your Future Self will thank you heartily.

“Back to first principles”?

The single best write-up on NFA and regex basics that I’ve ever encountered is Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...) by Russ Cox. It’s a discussion of, and reflection on, the regular expression library constructed by Ken Thompson in the mid-Sixties, before he got mixed up in Unix.

What’s annoying is that I had read this before I started wiring NFAs into Quamina, but ignored most of its important lessons due to a combination of not understanding them and thinking that my existing code could do what Cox described. A couple of weeks ago I went back and read it again, and it all made perfect sense and showed me the way forward. So I guess the lesson is that if you’re not Ken Thompson, you’re going to have trouble understanding what he did until you’ve tried and failed yourself?

So, major thanks to Ken for this (and Unix and other things too) and to Russ for the write-up.

Epsilon transitions

These are the magic bullet that makes NFA’s work. Quamina didn’t have them, now it does. There are other bits and pieces but that’s the core of the thing.

I think the easiest way to explain is by showing you an NFA as displayed by Quamina’s new prettyprinter. It matches the regular expression "x.*9" — note that the " delimiters are part of the pattern:

 758[START HERE] '"' → 910[on "]
 910[on "] 'x' → 821[gS]
 821[gS] ε → 821[gS] / '9' → 551[gX on 9]
 551[gX on 9] '"' → 937[on "]
 937[on "] 'ℵ' → 820[last step]
 820[last step]  [1 transition(s)]
  • There’s an API to attach labels to states as you build automata, which as a side-effect gives each a random 3-digit number too. This is done in a way that can be turned into a no-op at production time.

  • 758: The start state; the only character that does anything is the opening " delimiter which transitions to state 910.

  • 910: You get here when you see the " and the only exit is if you see an x, which moves to 821.

  • 821: This state is the “glob” * operator. gS in its label stands for “glob spin”. It has an "epsilon" (ε) transition to itself. In Computer-Science theory, they claim that the epsilon transition can occur at any time, spontaneously, la-di-da. In programming practice, you take an epsilon transition for every input character. 821 also has an ordinary transition on 9 to state 551.

    This possibility of having multiple transitions out of a state on the same input symbol, and the existence of epsilon transitions, are the defining characteristics that make NFAs “nondeterministic”.

  • 551: Its label includes gX for “glob exit”. The only transition is on the closing " delimiter, to 937.

  • 937 has only one transition, on ℵ (which stands for the reserved value Quamina inserts to signal the end of input), to 820.

  • 820 doesn’t do anything, but the [1 transition(s)] label means that if you reach here you’ve matched this field’s value and can transition to working on the next field.
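
For a feel of what such a labeling-and-printing API could look like, here is a small sketch. It is an assumption-laden toy, not Quamina's prettyprinter: the state and printer types are invented for illustration, the serial numbers are sequential rather than random, and making the label call a no-op through a nil receiver is just one possible way to switch it off in production.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// state is a hypothetical automaton node.
type state struct {
	next    map[byte]*state // ordinary transitions, keyed by input byte
	epsilon []*state        // ε transitions
}

// printer remembers a label and a serial number for each state.
type printer struct {
	labels  map[*state]string
	serials map[*state]int
}

func newPrinter() *printer {
	return &printer{labels: map[*state]string{}, serials: map[*state]int{}}
}

// label attaches text to a state. Called on a nil *printer it does nothing,
// so automaton-building code can call it unconditionally.
func (p *printer) label(s *state, text string) {
	if p == nil {
		return
	}
	p.labels[s] = text
	p.serials[s] = 100 + len(p.serials) // sequential here, for repeatable output
}

func (p *printer) name(s *state) string {
	return fmt.Sprintf("%d[%s]", p.serials[s], p.labels[s])
}

// line renders one state and its transitions, ε first, then bytes in order.
func (p *printer) line(s *state) string {
	var parts []string
	for _, e := range s.epsilon {
		parts = append(parts, "ε → "+p.name(e))
	}
	keys := make([]int, 0, len(s.next))
	for b := range s.next {
		keys = append(keys, int(b))
	}
	sort.Ints(keys) // map iteration order is random, so sort for stable output
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("'%c' → %s", k, p.name(s.next[byte(k)])))
	}
	out := p.name(s)
	if len(parts) > 0 {
		out += " " + strings.Join(parts, " / ")
	}
	return out
}

func main() {
	spin, exit := &state{}, &state{}
	spin.epsilon = []*state{spin}
	spin.next = map[byte]*state{'9': exit}

	p := newPrinter()
	p.label(spin, "gS")
	p.label(exit, "gX on 9")
	fmt.Println(p.line(spin)) // 100[gS] ε → 100[gS] / '9' → 101[gX on 9]
	fmt.Println(p.line(exit)) // 101[gX on 9]
}
```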

Now I’m going to display the prettyprint again so you can look at it as you read the next paragraph.

 758[START HERE] '"' → 910[on "]
 910[on "] 'x' → 821[gS]
 821[gS] ε → 821[gS] / '9' → 551[gX on 9]
 551[gX on 9] '"' → 937[on "]
 937[on "] 'ℵ' → 820[last step]
 820[last step]  [1 transition(s)]

A little thought shows how the epsilon-transition magic works. Suppose the input string is "xyz909". The code will match the leading " then x and hit state 821. When it sees y and z, the only thing that happens is that the epsilon transition loops back to 821 every time. When it hits the first 9, it’ll advance to 551 but then stall out because the following character is 0 which doesn’t match the only path forward through ". But the epsilon transition keeps looping and when the second 9 comes along it’ll proceed smoothly through 551, 937, and 820, signaling a match. Yay!
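
The walk-through above can be replayed in code. What follows is a toy under stated assumptions, not Quamina's implementation: the state type, the spin flag standing in for the ε self-loop, the matches function, and the 0xFF end-of-value byte are all invented for illustration. It follows the "take the epsilon transition for every input character" practice described in the bullet for state 821.

```go
package main

import "fmt"

// endOfValue is an arbitrary stand-in for the reserved end-of-input value.
const endOfValue byte = 0xFF

// state is a hypothetical NFA node.
type state struct {
	next  map[byte]*state // ordinary transitions, keyed by input byte
	spin  bool            // true if the state has an ε transition to itself
	match bool            // true if reaching this state means the value matched
}

// buildExample wires up the six states shown in the prettyprint.
func buildExample() *state {
	s820 := &state{match: true}
	s937 := &state{next: map[byte]*state{endOfValue: s820}}
	s551 := &state{next: map[byte]*state{'"': s937}}
	s821 := &state{spin: true, next: map[byte]*state{'9': s551}}
	s910 := &state{next: map[byte]*state{'x': s821}}
	return &state{next: map[byte]*state{'"': s910}} // 758, the start state
}

// matches feeds input, followed by the end marker, through the automaton,
// tracking every state the NFA could be in after each byte.
func matches(start *state, input string) bool {
	current := []*state{start}
	for _, b := range append([]byte(input), endOfValue) {
		var next []*state
		for _, s := range current {
			if s.spin {
				next = append(next, s) // ε: the spin state survives every byte
			}
			if t, ok := s.next[b]; ok {
				next = append(next, t)
			}
		}
		current = next
	}
	for _, s := range current {
		if s.match {
			return true
		}
	}
	return false
}

func main() {
	start := buildExample()
	fmt.Println(matches(start, `"xyz909"`)) // true
	fmt.Println(matches(start, `"xyz90"`))  // false
}
```

After the first 9 the set of live states is {821, 551}; the 0 kills 551 but 821 survives, which is exactly the "stall out, keep looping" behavior described above.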

So now, I have a fuzz test which adds a pattern for each of about thirteen thousand 5-letter words, with one * embedded in each at a random offset, including the leading and trailing positions. The add-pattern code hardly slows down at all. The matching code slows down a lot, to below 10,000/second, in stark contrast to most Quamina instances, which can achieve millions of matches/second.

I’m sort of OK with this trade-off; after all, it’s matching 10K-plus patterns! I’m going to work on optimizing it, but I have to accept that the math, as in finite-automata theory, might be against me. But almost certainly there are some optimizations to be had. There are possibilities suggested by Cox’s description of Thompson’s methods. And the search for paths forward will likely be good blog fodder. Yay!

Ken again

When I re-read Russ Cox’s piece, I was looking at the pictures and narrative, mostly ignoring the C code. When everything was working, I went back and was irrationally thrilled that my bottom-level function for one state traversal had the same name as Ken Thompson’s: step().

Also, when you process an NFA, you can be in multiple states at once; see the "xyz909" example above. When you’re in multiple states and you process an input symbol, you might end up in zero, one, or many new states. Russ writes, of Ken Thompson’s code, “To avoid allocating on every iteration of the loop, match uses two preallocated lists l1 and l2 as clist and nlist, swapping the two after each step.”

Me too! Only mine are called currentStates and nextStates because it’s 2024.
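
The swap trick is easy to show in isolation. This is a sketch only: the nfa type, its transition-table layout, and the capacity of 16 are assumptions made for the example, and only the names step, currentStates and nextStates come from the text above.

```go
package main

import "fmt"

// nfa is a hypothetical automaton: table[s][b] lists every state reachable
// from state s on input byte b.
type nfa struct {
	table []map[byte][]int
}

// exampleNFA accepts any run of a's followed by "ab": state 0 on 'a' can stay
// in 0 or move to 1, and state 1 on 'b' moves to 2.
func exampleNFA() *nfa {
	return &nfa{table: []map[byte][]int{
		{'a': {0, 1}},
		{'b': {2}},
		{},
	}}
}

// step appends to nextStates every state reachable from currentStates on b.
// The result may hold zero, one, or many states.
func (n *nfa) step(currentStates []int, b byte, nextStates []int) []int {
	for _, s := range currentStates {
		nextStates = append(nextStates, n.table[s][b]...)
	}
	return nextStates
}

// run traverses the input using two preallocated lists, swapping them after
// each step so the loop does not allocate while capacity suffices.
func (n *nfa) run(input []byte) []int {
	currentStates := make([]int, 0, 16)
	nextStates := make([]int, 0, 16)
	currentStates = append(currentStates, 0) // start in state 0
	for _, b := range input {
		// nextStates[:0] keeps the backing array but resets the length.
		nextStates = n.step(currentStates, b, nextStates[:0])
		currentStates, nextStates = nextStates, currentStates
	}
	return currentStates
}

func main() {
	n := exampleNFA()
	fmt.Println(n.run([]byte("aa")))  // [0 1]
	fmt.Println(n.run([]byte("aab"))) // [2]
	fmt.Println(n.run([]byte("ba")))  // []
}
```

If a step ever produces more than 16 states, append quietly allocates a bigger backing array, so the no-allocation property holds only while the lists stay within their initial capacity.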

And thereby hangs a blog or maybe more than one. Because traversing the NFA is at Quamina’s white-hot center. You really REALLY don’t want to be allocating memory in that code path. Which should be straightforward. But it’s not, for interesting reasons that raise optimization problems I’m just starting to think about, but you’ll probably hear all about it when I do.

Un-generic

In the process of moving Quamina from DFAs to mixed DFA/NFA to pure-NFA I adopted and then abandoned Go’s generics. They hate me. Or I’m not smart enough. Or something. I wrote about the experience back in 2022 and while that piece ended inconclusively, I am personally much happier with generics-free Go code. Maybe they make other people happy.

Hard to understand

And then finally, there’s this one function I wrote in June 2022, doesn’t matter what it does. It has a comment at the top that begins: “Spookeh. The idea is that…” and goes on for a long paragraph which, well, I can’t understand. Then I look at the code and think “that can’t work.” I keep thinking of sequences that should send it off the rails and write the unit tests and they fail to fail, and I use the prettyprinter and the NFA it generates is ruthlessly correct. I go back and look at it every few days and end up shaking my head. This is making me grumpy.

But after all, I did write, in a previous Quamina Diary episode: “The observation that computer programmers can build executable abstractions that work but they then have trouble understanding is not new and not surprising. Lots of our code is smarter than we are.”

But I’ll figure it out. And it’s nice to have interesting computer-programming stuff to blog about.

Wikipedia Pain 15 Jun 2024, 7:00 pm

There are voices — some loud and well-respected — who argue that Wikipedia is deeply flawed, a hellscape of psychotic editors and contempt for expertise. I mostly disagree, but those voices deserve, at least, to be heard.

[Note: There’s a companion blog post, Sex Edit War!, about my own experience in a Wikipedia Edit War. (I won! It was fun!) I hope it’ll make some of this narrative more concrete.]

Background

If you look at this post’s Reference Publishing topic, you’ll see a lot of Wikipedia-related material. I was one of its early defenders against the early-days waves of attackers who compared it to a public toilet and its editors to the Khmer Rouge.

I should also disclose that, over the years, I have made some 2,300 Wikipedia edits, created seven articles, and (what makes me happiest) contributed 49 images which have been used, in aggregate, 228 times.

I say all this to acknowledge that I am probably predisposed to defend Wikipedia.

What happened was…

Somebody spoke up on the Fediverse, saying “I wonder if reporters know that Wikipedia hallucinates too??” I’m not giving that a link, since they followed up with a post asserting that ChatGPT is better than Wikipedia. Life’s too short for that.

Anyhow, I replied “The difference is, errors in Wikipedia tend to get systematically fixed. Sometimes it takes more work than it should, but the vast majority of articles are moving in the right direction a vast majority of the time.” Much discussion ensued; follow the threads.

Shortly thereafter, the redoubtable JWZ complained about an edit to his page and I spoke up noting that the edit had been reversed, as bad edits (in my experience) usually are. That conversation branched out vigorously, dozens of contributions. Feel free to trawl through the Fediverse threads, but you don’t have to, I’ll summarize.

Gripe: Bad editors

This kept coming back.

I dunno. I don’t want to gaslight those people; if that’s the experience they had, that’s the experience they had. My own experience is different: The editors I’ve interacted with have generally been friendly and supportive, and often exceptionally skilled at digging up quality citations. But I think that these reports are something Wikipedia should worry about.

Gripe: Disrespect of expertise

By number and volume of complaints, this was the #1 issue that came up in those threads:

I generally disagree with these takes. Wikipedia not only respects but requires expert support for its content. However, it uses a very specific definition of “expert”: Someone who can get their assertions published in one or more Reliable Sources.

I think that if you’re about to have an opinion about Wikipedia and expertise and citations, you should give that Reliable-Sources article a careful read first. Here’s why: It is at the white-hot center of any conversation about what Wikipedia should and should not say. Since Wikipedia is commonly the top result for a Web search, and since a couple of generations of students have been taught to consult but not cite it, the article is central to what literate people consider to be true.

Let’s consider the complaints above. Mr Dear literally Wrote the Book. But, I dunno. I went and looked at the PLATO article and subjects linked to it, and, well, it looks good to me? It cites Mr Dear’s book but just once. Maybe the editors didn’t think Mr Dear’s book was very good? Maybe Dear says controversial things that you wouldn’t want to publish without independent evidence? The picture is inconclusive.

As for Mr O’Neill’s complaint, no sympathy. Given the social structure of capitalism, the employees and leadership of a company are the last people who should be considered Reliable Sources on that company. Particularly on anything that’s remotely controversial.

Mr Zawinski is upset that the person who chooses citations from Reliable Sources “knows nothing”, which I take to be an abbreviation for “is not a subject-matter expert”. There’s some truth here.

When it comes to bald statements of fact, you don’t need to be an expert; if more than one quality magazine or academic journal says that the company was incorporated in 1989, you don’t need to know anything about the company or its products to allow “founded in 1989” into an article.

On the other hand, I think we can all agree that people who make significant changes on articles concerning complex subjects should know the turf. My impression is that, for academic subjects, that condition is generally met.

Mr Rosenberg, once again, is upset that his personal expertise about the PS3 is being disregarded in favor of material sourced from a gamer blog. I’d have to know the details, but the best possible outcome would be Mr Rosenberg establishing his expertise by publishing his narrative in a Reliable Source.

Bad Pattern

There’s a pattern I’ve seen a few times where a person sees something in Wikipedia in an area where they think they’re knowledgeable, thinks it’s wrong, and decides “I’ll just fix that.” Then their edits get bounced because they don’t include citations. Even though they’re an “expert”. Then that person stomps away fuming publicly that Wikipedia is crap. That’s unfortunate, and maybe Wikipedia should change its tag-line from “anyone can edit” to “anyone who’s willing to provide citations can edit.”

Implications

This policy concerning expertise has some consequences:

  1. The decision on who is and isn’t an expert is by and large outsourced to the editorial staff of Reliable Sources.

  2. There are ferocious debates among editors about which sources are Reliable and which are not, in the context of some specific article. Which is perfectly appropriate and necessary. For example, last time I checked, Fox News is considered entirely Reliable on the finer points of NFL football, but not at all on US politics.

  3. There are many things which people know to be true but aren’t in Wikipedia and likely never will be, because no Reliable Source has ever discussed the matter. For example, I created the East Van Cross article, and subsequently learned the story of the cross’s origin. I found it entirely convincing but it was from a guy I met at a friend’s party who was a student at the high school where and when the graphic was first dreamed up. I looked around but found no Reliable Sources saying anything on the subject. I doubt it’ll ever be in Wikipedia.

What do you think of those trade-offs? I think they’re pretty well OK.

The notion that anyone should be allowed to add uncited assertions to Wikipedia because they think they’re an expert strikes me as simultaneously ridiculous and dangerous.

Real problems

Obviously, Wikipedia isn’t perfect. There are two problems in particular that bother me all the time, one small, one big.

Small first: The editor culture is a thicket of acronyms and it’s hard to keep them straight. I have considered, in some future not-too-fierce editorial debate, saying “Wait, WP:Potrezebie says you can’t say that!” Then see if anyone calls me on it.

The big problem: The community of editors is heavily male-dominated, and there have repeatedly been credible accusations of misogyny. I have direct experience: I created the article for Sarah Smarsh, because we read her excellent book Heartland in my book club, then I was shocked to find no entry. Despite the existence of that mainstream-published and well-reviewed book, and the fact that she had published in The Guardian and the Columbia Journalism Review, some other editor decreed that that was insufficient notability.

At the time, I reacted by gradually accumulating more and more citations and updating the draft. Eventually she published another book and the argument was over. These days, in that situation I would raise holy hell and escalate the obstruction up the Wikipedia stack.

To Wikipedia’s credit, its leadership knows about this problem and gives the appearance of trying to improve it. I don’t know the details of what they’re trying and whether they’re moving the needle at all. But it’s clearly still a problem.

Once again…

I stand by what I said in December 2004: Wikipedia dwarfs its critics.
