Updating the Arecibo message for 2025

In 1974, a group of scientists used the Arecibo telescope to broadcast a message towards the globular cluster Messier 13. The message was crafted so that any aliens receiving it could potentially understand some basic information about humanity. However, it contains some information that we now know to be inaccurate. What would a modern version look like?

The original Arecibo message

The components

The Wikipedia article is great and goes into detail about how the components are meant to be interpreted, but I'll briefly summarize their purpose here (from top to bottom, with colors matching the image):

1. The numbers one through ten
2. Chemical elements in DNA
3. DNA backbone with two base pairs
4. The estimated number of base pairs in the human genome
5. A picture of a double helix
6. (left) A ruler to indicate the human's height
7. (left) The number 14 (when multiplied by the wavelength of the message, it gives the height of the average human male)
8. (center) A picture of a human
9. (right) The estimated population of humans in 1974
10. The major bodies of the solar system, with Earth raised
11. A picture of the telescope
12. Unclear - maybe this indicates the ground?
13. The number 2,430 (when multiplied by the wavelength, it gives the diameter of the telescope)
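As a quick sanity check on items 7 and 13, here is a small sketch of the arithmetic. It assumes the commonly cited transmission frequency of 2,380 MHz, which isn't stated above, and derives the wavelength from it; the variable names are just for illustration.

```python
# Assumption: the message was transmitted at 2,380 MHz (not stated in this post).
SPEED_OF_LIGHT_M_PER_S = 299_792_458
FREQUENCY_HZ = 2_380_000_000

# Wavelength = speed of light / frequency, roughly 126 mm.
wavelength_m = SPEED_OF_LIGHT_M_PER_S / FREQUENCY_HZ

# The two multipliers encoded in the message.
human_height_m = 14 * wavelength_m
telescope_diameter_m = 2430 * wavelength_m

print(f"wavelength: {wavelength_m * 1000:.1f} mm")
print(f"human height: {human_height_m:.2f} m")
print(f"telescope diameter: {telescope_diameter_m:.0f} m")
```

Under that assumption, the numbers come out to roughly 1.76 m for the human and about 306 m for the dish.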

The problem

The technology of the time made it impossible to accurately measure the number of base pairs in the human genome, and the value they used turns out to have overestimated it by over 37%. It was also coincidentally extremely close to the then-estimated population, which I can only imagine would send our alien interlocutors down a rabbit hole. The full human reference genome sequence was (perhaps surprisingly) only finally determined in August 2023 (see this nice overview that covers how the reference genome has improved over the last few decades).
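To put numbers on that, here is a rough check. The two 1974 values are the decodings I've seen commonly cited for the original message and should be treated as assumptions; the modern genome size is the figure used later in this post.

```python
# Assumptions: commonly cited decodings of the original 1974 message.
ORIGINAL_GENOME_SIZE = 4_294_441_822
ORIGINAL_POPULATION = 4_292_853_750

# From this post: the base pair count used for the updated message.
MODERN_GENOME_SIZE = 3_117_275_501

# How much the 1974 genome estimate overshoots the modern value.
overestimate = ORIGINAL_GENOME_SIZE / MODERN_GENOME_SIZE - 1

# How close the 1974 genome estimate was to the 1974 population estimate.
relative_gap = abs(ORIGINAL_GENOME_SIZE - ORIGINAL_POPULATION) / ORIGINAL_POPULATION

print(f"genome overestimate: {overestimate:.1%}")
print(f"genome vs. population gap: {relative_gap:.3%}")
```

If those decodings are right, the two original numbers differ by well under a tenth of a percent, which is what makes the coincidence so confusing.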

Updating the Arecibo message

I wrote a CLI tool to generate the message with user-provided values for the genome size as well as the population, which has obviously changed. On the left, I've highlighted the components being updated (blue is the genome size, red is the population), and on the right, we see the message if it were being sent today, with 3,117,275,501 base pairs and 8,098,171,861 humans (the estimated population by the US Census at the time of this writing):

Left: The components we're updating. Right: The updated Arecibo message.

Buffer overflows and Pluto

The human population is represented by a binary number read from left to right, top to bottom, with the least-significant bit coming first. The largest value this could ever hold (see image on the left) without expanding into the depiction of the solar system would be 281,474,976,710,655, which is 2^48 - 1 (unless we destroy every planet after Jupiter, which would give us an additional five rows of six bits, for a total of 302,231,454,903,657,293,676,543, which is 2^78 - 1) - either way, we'll be able to continue using this format to alert aliens to our presence for many years to come. To accommodate Pluto's reclassification as a dwarf planet in 2006, the CLI tool also has a --pluto-is-not-a-planet flag that disables Pluto (right).

Left: The Arecibo message with a population of 281 trillion humans. Right: The updated Arecibo message but without Pluto.
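The limits above are just the largest values that fit in 48 and 78 bits. Here's a minimal sketch of that arithmetic, plus a toy encoder. It assumes the field is a grid of six-bit rows filled left to right, top to bottom, least-significant bit first, as described above; the function names are made up for illustration and this is not the code from the CLI tool.

```python
def encode_lsb_first(value, n_rows, row_width=6):
    """Lay out `value` as rows of bits, least-significant bit first."""
    n_bits = n_rows * row_width
    if not 0 <= value < (1 << n_bits):
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    bits = [(value >> i) & 1 for i in range(n_bits)]
    return [bits[r * row_width:(r + 1) * row_width] for r in range(n_rows)]


def decode_lsb_first(rows):
    """Inverse of encode_lsb_first."""
    flat = [bit for row in rows for bit in row]
    return sum(bit << i for i, bit in enumerate(flat))


# Eight rows of six bits, and eight plus five more rows.
max_with_solar_system = (1 << 48) - 1
max_without_outer_planets = (1 << 78) - 1

population = 8_098_171_861
grid = encode_lsb_first(population, n_rows=8)
for row in grid:
    print("".join(str(bit) for bit in row))
```

Today's population needs only 33 of the 48 available bits, so most of the rows come out as zeros.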

The CLI

Install: cargo install modern-arecibo
Repo: https://github.com/jimrybarski/modern-arecibo
Crate: https://crates.io/crates/modern-arecibo

2025-02-01

My experience using mutation testing in production

What is mutation testing?

Mutation testing is a technique for improving the correctness of your software, answering the question: do my unit tests actually cover every branch of my code? Mutation testing tools are external programs that make syntactically-legal alterations to your codebase, run your test suite (which is left unaltered), and check whether any of your tests fail. Effectively, it's like deliberately adding bugs to your code in a systematic way and seeing if you're already testing for those bugs (and, of course, reverting the bugs after testing is complete). For example, take this toy function:

def filter_records(records):
    for record in records:
        if record.quality < 30:
            continue
        yield record

The mutation testing library will flip the < to >, yield record to yield None, 30 to a bunch of different numbers, and so forth. It's mostly limited to operators, strings, and constants, and the details vary by tool and language. Only one of these so-called mutants is tested at a time - while it would be possible to create multiple mutants for a single test run, practical experience shows that this doesn't add much value, and the run time would explode exponentially.

If none of your tests fail even after adding these "bugs", it shows that your tests are incomplete - adding bugs should cause failing tests! Your job is then to either write more unit tests or refactor such that the things being mutated cease to exist. This is complementary to code coverage tools, which can only show whether a line was executed during the run of a test suite. It's still possible - likely even - that you can have 100% line coverage while still missing some behavior. Take this example:

if x and y:
    launch_rocket()
if z:
    load_cargo_onto_rocket()

Suppose you have one test where x and y evaluate to True but z is False, and another test where x or y is False and z is True. In one test, launch_rocket() will run, in the other, load_cargo_onto_rocket() will run, but you still haven't exercised the scenario where x, y, and z are all True, which would reveal that this code is going to launch an empty rocket and then try to load cargo into a vehicle that is no longer there. A test coverage tool will correctly inform you that all four lines are tested, but the most critical behavior is ignored.
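Here is a runnable sketch of that scenario. The run function and its event list are stand-ins I've added so the order of operations can be observed; they aren't part of the original snippet.

```python
def run(x, y, z):
    """Same branching as above, but recording what happens instead of doing it."""
    events = []
    if x and y:
        events.append("launch_rocket")
    if z:
        events.append("load_cargo_onto_rocket")
    return events


# These two tests execute every line, so line coverage reports 100%.
assert run(True, True, False) == ["launch_rocket"]
assert run(True, False, True) == ["load_cargo_onto_rocket"]

# The untested combination: the rocket launches before the cargo is loaded.
print(run(True, True, True))  # ['launch_rocket', 'load_cargo_onto_rocket']
```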

My experience with mutmut

I had a greenfield project at work and I decided it was a great opportunity to give mutation testing a whirl. This was for a Python application that, in broad terms, took raw sequencing data and determined the error rate of an enzyme. I looked at Cosmic Ray, Mutatest, and mutmut. I ended up choosing mutmut as it just seemed the most polished at the time. I never did any rigorous comparison so I don't want to endorse it over the others, but I was mostly pleased with it.

Overall, I'm super happy with mutation testing, but not because it caught many bugs. In fact, I think it really only found 1 or 2 true positives, and they were relatively minor. The real benefit was that it deeply impacted the design of the codebase such that it was the most testable piece of software I've ever written. This happened because each time I added some new feature, I had to immediately consider whether I wanted to write dozens of unit tests to eliminate the mutants, or whether I wanted to refactor in a way that made it easier to test. Having a bunch of new mutants show up is often a sign of unnecessary complexity.

Towards the end of the project, I had been thinking that it was probably not worth it to test the main entrypoint function in the tool as I'd need to essentially simulate an entire run of the application starting from raw sequencing data, but as I was so close I decided to spend a few hours writing it, and I'm glad I did. Having the entire codebase killing all its mutants not only gave me confidence that the code was correct, it also let me make changes fearlessly.

mutmut has a few flaws

Some mutation testing tools do everything without altering the source on disk - the code is loaded, altered and tested in memory (which is apparently possible!). Because each mutant is independent of every other, this is embarrassingly parallelizable. mutmut, on the other hand, writes each change to disk and then runs the test suite, one mutant at a time. This is certainly a much simpler design, but in addition to being slower, if you cancel the run partway through, the mutant that was being tested at the time will persist on disk! This isn't an issue if you committed your source just before the run since it would make reverting it trivial, but I typically want to see if my tests pass before I commit, so I had to sift through all of the deliberate changes I had made in order to find a single-character alteration.

Advice on adopting mutation testing

  1. Fast test suites are essential

If you can run your entire test suite in one second, and you have 300 mutants to test, then adding mutmut to your workflow means it now takes five minutes to run your tests. There are a couple things you can do to optimize this, fortunately.

First, the key is to observe that the vast majority of the time, mutants will in fact cause tests to fail, so you want to optimize for failing fast. If you have any property-based tests or slow-running tests in general, you can mark them as slow and have pytest run them last (pytest won't reorder tests by marker on its own, so this takes a small collection hook or an ordering plugin). Often, mutants will be killed by several different tests, so if you can kill them with fast tests, you can shave off meaningful amounts of time.
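One way to get the "slow tests last" ordering is a collection hook in conftest.py. This is a minimal sketch rather than the setup I actually used: it assumes slow tests are decorated with a custom slow marker (which you'd also want to register in your pytest configuration), and the helper name is made up.

```python
# conftest.py

def slow_tests_last(items):
    """Return the tests with anything marked `slow` moved to the end.

    sorted() is stable, so the relative order within the fast group and
    within the slow group is unchanged.
    """
    return sorted(items, key=lambda item: item.get_closest_marker("slow") is not None)


def pytest_collection_modifyitems(config, items):
    # pytest expects this hook to reorder the list in place.
    items[:] = slow_tests_last(items)
```

Combined with stopping at the first failure (pytest's -x flag), most mutants get killed by a fast test before any slow test has a chance to run.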

Second, while a bit obvious, is to not run mutmut until you're ready to commit, or to only run it in CI. Since it will flag a number of false positives in any new code, any time spent resolving those is wasted if you end up changing your design. I found that just running unit tests while developing, and then only running property-based and mutation tests once I thought I had something worthwhile, ended up being a good compromise. The iteration time on finding out if I had architectural flaws was still fast enough that I never had to do any major refactors.

  2. Start early

After my initial success, I tried bolting on mutation testing to an existing project and it was a nightmare - there were hundreds of surviving mutants, and resolving them would require several refactors and just tons and tons of work that I simply couldn't justify. Having the tool constrain the design from the start really is critical. It's not impossible to adopt it later, but it does require a non-trivial investment.

  3. Skip plotting functions and other "untestable" code, but here there be dragons

mutmut works on an opt-in basis, so it will only modify files you explicitly tell it to. My tool generated a bunch of figures with matplotlib, which is not practically testable, and for those functions I just kept them in a separate module. I do have the habit of doing some light data wrangling in such code (e.g. something like getting all the values from a dictionary and plotting a histogram, instead of just passing in a list of values directly). Although it's trivial, "this can't possibly be wrong" is the thing that everyone writing a bug is telling themselves, so I tried to move as much of that code as possible to the modules where mutmut could evaluate them.

  4. Mutation testing tools are still somewhat limited

Only being able to modify operators, strings, and constants is powerful but doesn't provide complete coverage. Notably lacking from all of the tools I looked at is the ability to alter method calls. For example, they can't remove .strip() from a string variable, or swap .isupper() with .islower(). To do so would require type inference, and the libraries for doing that in Python don't seem like they could easily integrate with a mutation testing tool, if they would even work at all. I have high hopes for Ruff, which is implementing a type inference engine, and once that matures I think there would be a great opportunity to merge that into existing tools or to design one around it. Until then, it's still a great technique, but it's important to recognize this limitation.
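To illustrate what a method-call mutation might look like, here is a sketch built on Python's ast module. It is a hypothetical operator, not something any of the tools above provide, and it is purely syntactic: it drops every argument-less .strip() call it sees, whether or not the receiver is actually a string, which is exactly the gap type inference would need to fill.

```python
import ast


class DropStripCalls(ast.NodeTransformer):
    """Hypothetical mutation operator: rewrite `<expr>.strip()` to `<expr>`."""

    def visit_Call(self, node):
        self.generic_visit(node)  # handle nested calls first
        func = node.func
        is_bare_strip = (
            isinstance(func, ast.Attribute)
            and func.attr == "strip"
            and not node.args
            and not node.keywords
        )
        return func.value if is_bare_strip else node


def make_mutant(source):
    tree = DropStripCalls().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))  # needs Python 3.9+


def load_function(source, name):
    namespace = {}
    exec(source, namespace)
    return namespace[name]


original_source = "def clean(line):\n    return line.strip().upper()\n"
mutant_source = make_mutant(original_source)

clean = load_function(original_source, "clean")
mutant_clean = load_function(mutant_source, "clean")

print(mutant_source)
print(clean("acgt") == mutant_clean("acgt"))        # True: this input can't kill the mutant
print(clean(" acgt\n") == mutant_clean(" acgt\n"))  # False: whitespace exposes it
```

A test suite that only ever feeds clean() tidy input would let this mutant survive, which is the kind of gap the current tools can't point out.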

  5. Mutation testing won't catch all bugs

While the combination of unit tests, property-based tests and mutation tests did basically result in functions that were all correct (I mean, as far as I could tell), I still encountered a bug that was more strategic in error. In essence, I had told my tool to do the Wrong Thing, and then had tests that ensured the Wrong Thing was being done exactly as instructed. This kind of error will never be caught by any of these techniques, so we'll always need a human or human-level intelligence to review the code and think critically about it.

2025-01-19

Some initial thoughts on developing Lua plugins for Neovim

I wrote a bioinformatics Neovim plugin recently. It's nothing super special - it just provides some sequence manipulation functionality like generating or searching for a reverse complement, performing a pairwise alignment, etc. It really surprised me how often I found myself using it in day-to-day work right after I installed it.

This was my first Lua codebase and it definitely left me underwhelmed with the language. I mean, it's fine - it's a very small language and you can get proficient pretty quickly. First-class functions are easy and it feels like you're being guided by the language towards using them. But overall, it just feels unpolished and I simply don't understand the enthusiasm for it. Some issues I had:

  • The only compound data type is a table (an associative array) that returns nil when unmapped keys are accessed. So more or less the same API as defaultdict in Python. This can make debugging a nightmare, as a typo in a field name where nil is an expected value makes it seem like the value just never gets set. Maybe this is less of a problem with experience, but it's a completely unforced error. Other languages simply don't have this problem because they have data types that make this error impossible.
  • Variables are scoped globally by default, with opt-in local scoping. I just can't imagine a scenario where this is preferable.
  • The lack of a built-in unit testing framework is disappointing. I had trouble getting Vusted (a tool for unit testing Neovim plugins) installed because I didn't run the installer as root. What is the deal with modern language-specific package managers requiring root (thinking of the node ecosystem in particular here)? My only guess is that this is easier for people who don't understand how to set their shell's path and this is a strategy for not discouraging less-experienced programmers. But that sort of person is just copying-and-pasting anyway, and you can just do what Rustup does: add something to their .bashrc and print a little note about how they need to restart their terminal to be able to immediately use their new program.
  • The vim object in Neovim plugins can't be accessed by an LSP, so you're unable to view or reference the internal Neovim API, and it just appears to the LSP like an undefined variable. It's possible to disable those lints on a per-variable basis, but this is still pretty suboptimal. There does seem to be a plugin for this so maybe this is just growing pains.
  • Unintuitive naming for built-in functions, like #variable_name to get the length of a string or gsub() to do replacement. What does that "g" stand for? It might be "global", but I haven't found a source for this that wasn't just speculation.

That said, once you figure out the idioms and boilerplate, plugin development becomes super easy. The language provides so few abstractions that there's really only one way to do anything, which I appreciate.

2025-01-11

DS9 viewing guide

Star Trek: Deep Space Nine is one of my favorite shows, but the quality of episodes is all over the place, especially in the first two seasons. I made this list so that people I recommend the show to can skip the bad episodes and also follow the longer-term story arcs. Episodes colored in yellow have important implications for subsequent episodes, introduce a new character, or change the nature of some important relationship. This currently only goes through season 5; I will add to this over time.

Season | Episode | Title | Rating | Comments
1 | 1 | Emissary, Part 1 | Fine | Series debut. Some context: the reason Sisko doesn’t like Picard is because in an episode of TNG, Picard was kidnapped by cyborgs (the Borg), turned into a cyborg, and he became their leader and went on a killing spree (which killed Sisko’s wife) before ultimately being saved and reverted to a human.
1 | 2 | Emissary, Part 2 | Fine |
1 | 3 | Past Prologue | Fine | Introduces Garak, the best character
1 | 4 | A Man Alone | Meh |
1 | 5 | Babel | Meh |
1 | 6 | Captive Pursuit | Meh |
1 | 7 | Q-Less | Skip | The only Q episode, which just grates against the entire theme of the show (even more so than in TNG)
1 | 8 | Dax | Fine | Explains the Trill species
1 | 9 | The Passenger | Meh |
1 | 10 | Move Along Home | Unwatchable |
1 | 11 | The Nagus | Poor |
1 | 12 | Vortex | Meh | Changeling backstory
1 | 13 | Battle Lines | Fine | Ehrmantraut from Breaking Bad in space
1 | 14 | The Storyteller | Unwatchable |
1 | 15 | Progress | Fine | Occupation worldbuilding. The old man is a bit much
1 | 16 | If Wishes Were Horses | Unwatchable |
1 | 17 | The Forsaken | Very meh | Mostly bad, but has a sweet ending.
1 | 18 | Dramatis Personae | Skip |
1 | 19 | Duet | Excellent | Incredible performances, great story
1 | 20 | In the Hands of the Prophets | Meh |
2 | 1 | The Homecoming | Good |
2 | 2 | The Circle | Fine |
2 | 3 | The Siege | Good |
2 | 4 | Invasive Procedures | Frustrating | I would skip because a character does something really unforgivable and it’s better to think of this as non-canon. The rest of the characters certainly do
2 | 5 | Cardassians | Great | Garak absolutely annihilates this episode
2 | 6 | Melora | Bad |
2 | 7 | Rules of Acquisition | Very meh | The Ferengi go to the Gamma Quadrant. Some important foreshadowing but nothing critical happens.
2 | 8 | Necessary Evil | Decent |
2 | 9 | Second Sight | Bad |
2 | 10 | Sanctuary | Bad |
2 | 11 | Rivals | Bad |
2 | 12 | The Alternate | Meh |
2 | 13 | Armageddon Game | Decent | O’Brien must suffer
2 | 14 | Whispers | Good | O’Brien must suffer
2 | 15 | Paradise | Fine | O’Brien must suffer
2 | 16 | Shadowplay | Fine |
2 | 17 | Playing God | Meh | Trill background
2 | 18 | Profit and Loss | Fine |
2 | 19 | Blood Oath | Fine | A scene in this episode is often referenced by trans rights activists
2 | 20 | The Maquis, Part 1 | Decent | Wikipedia: Maquis (World War II)
2 | 21 | The Maquis, Part 2 | Decent |
2 | 22 | The Wire | Good | Iconic Garak episode. His philosophy on the truth had a big effect on me.
2 | 23 | Crossover | Skip | I personally hate Mirror Universe episodes
2 | 24 | The Collaborator | Good |
2 | 25 | Tribunal | Fine | O’Brien must suffer
2 | 26 | The Jem’Hadar | Decent | Nog is very annoying but it’s otherwise a good episode
3 | 1 | The Search, Part 1 | Good | Scepter
3 | 2 | The Search, Part 2 | Good |
3 | 3 | The House of Quark | Fine, goofy | Some great moments, but you have to buy into the ridiculousness of Klingon culture to enjoy this
3 | 4 | Equilibrium | Meh |
3 | 5 | Second Skin | Decent | Garak
3 | 6 | The Abandoned | Fine | Quark makes a hilarious purchase
3 | 7 | Civil Defense | Good |
3 | 8 | Meridian | Meh |
3 | 9 | Defiant | Good |
3 | 10 | Fascination | Bad |
3 | 11 | Past Tense, Part 1 | Meh | I’m not actually that into these episodes, but here is a quote from Wikipedia about it: “as this episode was finishing production an article appeared in the Los Angeles Times describing a proposal by the mayor to create fenced-in "havens" for the city's homeless, to make downtown Los Angeles more desirable for business. The cast and crew were shocked that this was essentially the same scenario that Past Tense warned might happen in three decades, but was now being seriously proposed in the present.”
3 | 12 | Past Tense, Part 2 | Meh |
3 | 13 | Life Support | Fine |
3 | 14 | Heart of Stone | Mixed | A story: bad, B story: great
3 | 15 | Destiny | Fine |
3 | 16 | Prophet Motive | Meh |
3 | 17 | Visionary | Good | O’Brien must suffer
3 | 18 | Distant Voices | Bad |
3 | 19 | Through the Looking Glass | Skip | Mirror Universe episode
3 | 20 | Improbable Cause | Good |
3 | 21 | The Die is Cast | Good |
3 | 22 | Explorers | Meh |
3 | 23 | Family Business | Bad | Two recurring characters are introduced, but man this episode has some bad takes
3 | 24 | Shakaar | Fine |
3 | 25 | Facets | Mixed | A story is lame, B story with Nog is decent
3 | 26 | The Adversary | Good |
4 | 1 | The Way of the Warrior, Part 1 | Good |
4 | 2 | The Way of the Warrior, Part 2 | Good |
4 | 3 | The Visitor | Good |
4 | 4 | Hippocratic Oath | Good | In case you forgot this was made in the 90s, the episode opens with a very unfortunate gay panic
4 | 5 | Indiscretion | Decent |
4 | 6 | Rejoined | Decent | Gay rights allegory in space. “Hippocratic Oath” must have had a completely different production team
4 | 7 | Starship Down | Decent |
4 | 8 | Little Green Men | Fine, goofy |
4 | 9 | The Sword of Kahless | Fine |
4 | 10 | Our Man Bashir | Good, extremely goofy | Garak’s emphatic defense of cowardice left a big impression on me as a youth
4 | 11 | Homefront | Decent | This episode aired almost six years before 9/11
4 | 12 | Paradise Lost | Decent |
4 | 13 | Crossfire | Meh | Odo becomes an incel
4 | 14 | Return to Grace | Fine |
4 | 15 | Sons of Mogh | Decent |
4 | 16 | Bar Association | Decent | “He was more than a hero - he was a union man!”
4 | 17 | Accession | Fine | I think this is the episode that introduces the weird way that Bajorans clap. They clap weird. Look at their hands when they clap.
4 | 18 | Rules of Engagement | Decent |
4 | 19 | Hard Time | Good | The epitome of the “O’Brien must suffer” episodes
4 | 20 | Shattered Mirror | Skip | Mirror Universe episode
4 | 21 | The Muse | Bad |
4 | 22 | For the Cause | Good |
4 | 23 | To the Death | Good |
4 | 24 | The Quickening | Decent |
4 | 25 | Body Parts | Decent |
4 | 26 | Broken Link | Good |
5 | 1 | Apocalypse Rising | Good |
5 | 2 | The Ship | Good |
5 | 3 | Looking for par'Mach in All the Wrong Places | Fine |
5 | 4 | ...Nor the Battle to the Strong | Good | You’d think this title was a Bible quote, but nope
5 | 5 | The Assignment | Good | O’Brien must suffer. Nana Visitor (who plays Kira Nerys) was written out of this episode as she went into labor on set. Alexander Siddig (Julian Bashir) was the father.
5 | 6 | Trials and Tribble-ations | Goofy fun | Basically one continuous reference to a famous Original Series episode
5 | 7 | Let He Who Is Without Sin… | Bad |
5 | 8 | Things Past | Meh | Odo occupation backstory
5 | 9 | The Ascent | Decent |
5 | 10 | Rapture | Decent |
5 | 11 | The Darkness and the Light | Decent |
5 | 12 | The Begotten | Meh |
5 | 13 | For the Uniform | Decent |
5 | 14 | In Purgatory’s Shadow | Good | Garak’s trolling of Worf was another formative interaction
5 | 15 | By Inferno’s Light | Great |
5 | 16 | Dr. Bashir, I Presume? | Fine |
5 | 17 | A Simple Investigation | Bad | Odo loses his virginity but somehow remains an incel
5 | 18 | Business as Usual | Decent |
5 | 19 | Ties of Blood and Water | Decent | References a character from S3E05 “Second Skin”
5 | 20 | Ferengi Love Songs | Meh | Quark and Rom have to decide if misogyny is good or bad
5 | 21 | Soldiers of the Empire | Meh | They do Martok’s character dirty here. He’s so much better in every other episode he’s in. It is also revealed that the infirmary has carpet that’s difficult to get blood out of.
5 | 22 | Children of Time | Good |
5 | 23 | Blaze of Glory | Decent |
5 | 24 | Empok Nor | Good |
5 | 25 | In the Cards | Goofy, meh |
5 | 26 | Call to Arms | Good |

1999-06-02