On the Mozilla Layoffs and the Future of the Web in General

Mozilla, the confusing combination of foundation and company behind the web browser Firefox, recently laid off a substantial number of workers, citing economic reasons. As far as I know, the full extent of the damage to the organisation is still unknown, but the layoffs seem to have wiped out virtually every team working on the actual future of Firefox, including the next-generation web engine Servo, the incident response team, the much-beloved documentation team, and several core Rust developers. It seems clear that these layoffs come on the heels of some rather serious mismanagement, bordering on sabotage.

What Mozilla is doing is getting rid of absolutely everyone they need to ensure that Firefox has a future as an independent browser. This is bad news for the Web. If we lose Firefox, there will be exactly two browsers left: Google’s Chrome and Apple’s Safari. That means the future of our most important medium of communication will be in the hands of two very large corporations, one of which is a rent-seeking monopolist, and the other of which is much the same, and also laid the foundations of surveillance capitalism. Hardly the types of companies you’d want responsible for a lemonade stand, let alone the Web.

Mozilla’s predicament is an indication of a larger problem. The Web started its life as a document format and has since been extended. A lot. These days it’s an all-singing, all-dancing application platform. The way we got there was by shoe-horning absolutely every piece of functionality you could think of into what was the equivalent of Word documents. There are literally thousands of standards documents, totalling over a hundred million words, describing how a Web browser works. Drew DeVault did a quick study:

If you added the combined word counts of the C11, C++17, UEFI, USB 3.2, and POSIX specifications, all 8,754 published RFCs, and the combined word counts of everything on Wikipedia’s list of longest novels, you would be 12 million words short of the W3C specifications

Web browsers are probably the most complicated pieces of software we have. They are more complicated than the rest of our operating systems, and probably more complicated than computers themselves. Let me turn the subtext into text here: this is not sustainable.

This problem is further exacerbated by the fact that Web browsers are impossible to monetise, as they don’t really do anything on their own. Nobody wants the browser, they want the web pages the browsers display. The browser itself is infrastructure. Which is one of the reasons why Mozilla is in financial trouble; in contrast to Apple and Google, they only make a browser. Which, as I mentioned just before, is so hard it is practically impossible, requiring large teams of very skilled workers. And nobody wants to pay for it. From what I gather, most of Mozilla’s actual income came from Google paying to be the default search engine. Which they probably did in part to avoid antitrust legislation.

As some pointed out, this spells trouble for other large, public-goods type projects like the Matrix chat protocol. Historically, we have addressed these types of situations by collecting taxes. And indeed Matrix, for example, has received government sponsorship, as did the initial work that made the Web possible.

However, taxing the Web, or tech companies in general, is a bit more complicated, given their flighty and international nature. There are projects underway within the OECD to tax the tech sector, and they may even be a very good idea, in particular for righting some global inequalities (podcast link). Then again, taxation might not be a good idea, given that democratic control of states seems to be waning.

Another way to address the problem is the growing movement towards smaller technologies (“smolnets”), that would be cheaper to implement, more “human scale”, and less conducive to mass surveillance. And also, possibly more fun. Such projects stretch from “fantasy consoles” like the PICO-8 (games), to retro-networking like the older Gopher protocol, as well as its reimagining in the slightly larger Gemini protocol, and the resurgence of the experimental operating system Plan 9 from Bell Labs through projects like 9front and the ANTS public grid.

A Review of the reMarkable

A reMarkable

The reMarkable is very much an early adopter technology. It still has some rough edges, though the integration of an electronic pencil on an e-ink screen works remarkably well. Like most young technologies, its supporting technology, in particular the system software and the companion apps, is not yet there in terms of design, coherence, and stability. Navigating PDFs for non-sequential reading is arduous, and I found the pencil-imitating brushes to be more annoying than useful. If you really need a tablet to review (approximately) A4-sized text on, and can work around a bit of a hassle, the reMarkable is probably great for you. Otherwise, I’d recommend waiting for more mature—and less pricey—iterations, whether they are from reMarkable or not.

Overall Experience

Overall, my impression is that the interface feels slightly half-baked; many things (most notably PDF navigation) are hard to reach, it is not always apparent what the buttons (neither software nor hardware) do, etc. This is perhaps to be expected, and a number of these shortcomings were addressed in a recent software upgrade (which, by the way, downloaded and installed without hassle for me, showing that at least the upgrade chain works as expected).

I have also had the entire UI freeze multiple times, most notably in cold weather, with forced restarts being the only way of getting back to usability.

The synchronisation apps are awful; perhaps most notably, the way to add a document to a folder is to select the document, click the move button, go into the target directory, and press “move here”. It is as if drag and drop was never invented. And worse yet, dragging documents into the app will upload them in the root directory, rather than (as you would expect) in the currently viewed directory.

Frankly, I don’t really understand why they had to go with their own app. Ideally, I’d have my reMarkable just connect to a cloud provider (in my case Dropbox, but they should support all of them) and download my files from it. Having a separate client application to synchronise files provides no benefit that I can identify.

Finally, I encountered a weird bug in the PDF export facility of both the macOS app and the reMarkable email export. I had a document where I had previously drawn some illustrations. On both exports, there were erased pencil lines that were not shown on the reMarkable or in the preview app, much like when one scans a paper with pencil traces in a normal scanner. That might be going a bit too far in emulating the paper experience.

Reviewing Documents and Reading

My primary intended use case for the reMarkable was reading and annotating papers and other PDF documents, and for that it has worked quite nicely; highlighting and making notes in the margin work as you would expect, and it has rendered every PDF I have given it reasonably well. Getting the documents onto the reMarkable is OK (though organising them is, as I have previously stated, a bit of a hassle). Exporting annotated PDFs works exactly as you would expect: you get highlighter lines where you used the highlighter, and pen-like notes where you used a pen.

An annotated PDF document on the reMarkable

It does read EPUB books, which is a bit awkward due to the width of the device, but it works as one would expect. Where it really falls short, though, is navigating PDFs. Page numbers are offsets from the start of the file, not the page numbers printed on the actual pages (as some PDF readers manage). No PDF links work, and there is no way to get a table of contents, even if one exists in the metadata. There are basically two options for navigation: tapping on a page in a very spatially inefficient overview of the PDF’s contents, or entering a page number (that is, an offset from the beginning) manually. This is particularly annoying because the table-of-contents data is there, in machine-readable form, as are the page numbers. And you can’t even use the document’s own table of contents, because you can’t (as far as I can tell) click the links on it!

A reMarkable browsing the pages of a PDF

This is entirely a software problem, though, and I’d expect this to be fixed in a future release of the system software.

Note-taking and Drawing

A note made with the reMarkable

Note-taking works surprisingly smoothly; the pen feels a lot like using a pen on paper, and more pages can be continuously added to a notebook by pressing the right button (as long as the current page is not empty; a nice detail that keeps you from creating unnecessary empty pages).

A drawing made with the reMarkable

My only complaint is that the eraser is quite inaccessible. A future iteration of the hardware should preferably have an eraser on the back of the pen or similar, to facilitate moving between erasing and writing.

Testing a Small Query Language in Python with Hypothesis

An experimental setup with three beakers, illustration from a book

This entry was intended to be cross-posted to the CERN Databases blog, but is currently pending review. Consider it a pre-release version.

Hypothesis is an implementation of property-based testing for Python, similar to QuickCheck in Haskell/Erlang and test.check in Clojure (among others). Basically, it allows the programmer to formulate invariants about their programs, and have an automated system attempt to generate counter-examples that invalidate them.

A Small Query Language

During my internship at CERN, I am developing a small (partial) two-way monitoring system to propagate alerts from filers to CERN’s incident management system. In the course of developing this monitor, I decided to invent a very minimal query/filtering language for logging events. It maps directly onto Python objects using regular expressions (basically: “does object x have a property y matching regex z?”). The following is its grammar (written for the Grako parser generator):

start = expression ;

expression
        =
        '(' expression ')' binary_operator '(' expression ')'
        | unary_operator '(' expression ')'
        | statement
       ;

binary_operator = 'AND' | 'OR';
unary_operator = 'NOT';
statement = field ':' "'" regex "'";
field =
      /[0-9A-Za-z_]+/
      ;

regex
    = /([^'])*/
    ;

An example (from a test configuration file) could be event_type:'disk.failed' (disk failures) or (source_type:'(?i)Aggregate') AND (NOT(source_name:'aggr0')) (log events from aggregates, but not aggr0).
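To make those semantics concrete, here is a minimal sketch of how a single statement could be matched against an object. This is purely illustrative; the function and class names are made up, not the monitor’s actual implementation:

```python
import re

# Hypothetical sketch of "does object x have a property y matching
# regex z?" -- not the actual code from the monitor.
def matches_statement(obj, field, regex):
    value = getattr(obj, field, None)
    if value is None:
        return False
    return re.search(regex, value) is not None

# A small stand-in for a logging event object.
class Event(object):
    def __init__(self, **fields):
        self.__dict__.update(fields)

event = Event(event_type='disk.failed', source_name='aggr0')
assert matches_statement(event, 'event_type', 'disk.failed')
assert not matches_statement(event, 'source_name', '(?i)Aggregate')
```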

The following invariants should hold, where q is any valid query:

  • NOT(NOT(q)) ≡ q
  • (q) AND (q) ≡ q
  • (q) OR (q) ≡ q
  • (q) OR (NOT(q)) is always True

In addition, the following properties should also hold:

  • key:'value' matches every object containing a property key with exact value value for any valid values of key and value (that is, valid Python variable names for key and more or less any string for value)
  • key:'' matches every object that has an attribute key regardless of its value
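The second property falls out of how Python’s re module treats the empty pattern: it matches a zero-length portion of any string, which is what lets key:'' act as a pure existence check. A quick sanity check:

```python
import re

# The empty pattern matches (a zero-length part of) any string,
# including the empty string itself.
assert re.search(u'', u'') is not None
assert re.search(u'', u'any value at all') is not None
```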

Generating Examples

There are several types of inputs we need to generate to test the system. Let’s break them down:

  • objects with various fields
  • regular expressions
  • valid statements
  • valid queries

Let’s start from the top. As Python is a dynamic language, we can do crazy things, like dynamically generating objects from dictionaries. The following is a fairly common hack:

class objectview(object):
    def __init__(self, d):
        self.__dict__ = d

    def __repr__(self):
        return str(self.__dict__)

    def __str__(self):
        return str(self.__dict__)

This allows us to instantiate an object with (almost) arbitrary fields:

>>> cat = objectview({'colour': 'red', 'fav_food_barcode': '1941230190'})
>>> cat.colour
'red'
>>> cat.fav_food_barcode
'1941230190'

Given this, we can just generate valid objects using the @composite decorator in Hypothesis:

@composite
def objects(draw):
    ds = draw(dictionaries(keys=valid_properties,
                           values=valid_values,
                           min_size=1))

    return objectview(ds)

Generating valid values is much simpler:

valid_values = text()

Any text string is a valid string value. Of course! Properties are a bit trickier though:

valid_properties = (characters(max_codepoint=91,
                               whitelist_categories=["Ll", "Lu", "Nd"])
                    .filter(lambda s: not s[0].isdigit()))

Variable names can’t start with a number, and have to be basically ASCII, so we slightly constrain and filter the characters strategy.

Statements can be generated in much the same way, using composite strategies:

@composite
def statements(draw):
    # any valid key followed by a valid regex
    key = draw(valid_properties)
    regex = draw(regexes)

    return u"{key}:'{regex}'".format(key=key, regex=regex)

However, how do we produce regular expressions? Let’s start with some valid candidates:

regex_string_candidates = characters(blacklist_characters=[u'?', u'\\', u"'"])

Then we can generate regular expressions using Hypothesis’ back-tracking functionality through assume(), which causes it to discard bad examples (in this instance is_valid_regex() simply tries to compile the string as a Python regular expression, and returns False if it fails):

@composite
def regex_strings(draw):
    maybe_regex = draw(regex_string_candidates)
    assume(is_valid_regex(maybe_regex))
    return maybe_regex
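The post does not show is_valid_regex() itself, but per the description above it could be as simple as attempting to compile the candidate:

```python
import re

# is_valid_regex as described above: try to compile the string as a
# Python regular expression, returning False instead of raising.
def is_valid_regex(candidate):
    try:
        re.compile(candidate)
        return True
    except re.error:
        return False

assert is_valid_regex(u'(a)+.*')
assert not is_valid_regex(u'(unbalanced')
```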

But we can also use recursive generation strategies to produce more complex regular expressions:

regexes = recursive(regex_strings(), lambda subexps:
                    # match one or more
                    subexps.map(lambda re: u"({re})+".format(re=re)) |

                    # match zero or more
                    subexps.map(lambda re: u"({re})*".format(re=re)) |

                    # Append "match any following"
                    subexps.map(lambda re: u"{re}.*".format(re=re)) |

                    # Prepend "match any following"
                    subexps.map(lambda re: u".*{re}".format(re=re)) |

                    # Prepend start of string
                    subexps.map(lambda re: u"^{re}".format(re=re)) |

                    # Append end of string
                    subexps.map(lambda re: u"{re}$".format(re=re)) |

                    # Append escaped backslash
                    subexps.map(lambda re: u"{re}\\\\".format(re=re)) |

                    # Append escaped parenthesis
                    subexps.map(lambda re: u"{re}\(".format(re=re)) |

                    # Append dot
                    subexps.map(lambda re: u"{re}.".format(re=re)) |

                    # Match zero or one
                    subexps.map(lambda re: u"({re})?".format(re=re)) |

                    # Match five to six occurrences
                    subexps.map(lambda re: (u"({re})"
                                            .format(re=re)) + u"{5,6}") |

                    # concatenate two regexes
                    tuples(subexps, subexps).map(lambda res: u"%s%s" % res) |

                    # OR two regexes
                    tuples(subexps, subexps).map(lambda res: u"%s|%s" % res))

The same strategy also works for the highly recursive structure of the query language:

queries = recursive(statements(),
                    lambda subqueries:
                    subqueries.map(negated_query) |
                    tuples(subqueries, subqueries).map(ored_queries) |
                    tuples(subqueries, subqueries).map(anded_queries))

Read as: “a valid query is any statement, or any valid query negated, or two valid queries AND:ed or OR:ed”.

Making Assertions

Finally, we assert properties much as we would in normal unit tests. For example, let’s verify that the empty regular expression matches anything:

@given(target=objects(), key=valid_properties)
def test_query_for_empty_regex_always_matches(target, key):
    q = "{key}:''".format(key=key)
    assert query.matches_object(q, target)

Hypothesis immediately finds a counter-example:

>       assert query.matches_object(q, target)
E       assert False
E        +  where False = <function matches_object at 0x7fa76dc6f5f0>("A:''", {u'B': u''})
E        +    where <function matches_object at 0x7fa76dc6f5f0> = query.matches_object

key        = 'A'
q          = "A:''"
target     = {u'B': u''}

syncd/eql/test/test_hypothesis.py:188: AssertionError
---------- Hypothesis ---------
Falsifying example: test_query_for_empty_regex_always_matches(target={u'B': u''}, key=u'A')

An object which doesn’t have the specified property will not match the query, even if the query is looking for the empty string. Whether that counts as a bug depends on how we want to treat this edge case. If we really did want the empty regular expression to match even objects which do not have the key, this would be a proper bug in the implementation. However, it makes more sense to require the object to have the property being checked for, and so this is a bad counter-example. We can exclude it by adding assume(hasattr(target, key)) to the test, causing Hypothesis to back-track on any examples where the target object does not have the key:

@given(target=objects(), key=valid_properties)
def test_query_for_empty_regex_always_matches(target, key):
    assume(hasattr(target, key))

    q = "{key}:''".format(key=key)

    assert query.matches_object(q, target)

And now, the test passes.

The image is from “Chemistry: general, medical and pharmaceutical…” from 1894, courtesy of the Internet Archive Book Images

Shipping Out: One Month Later

Entrance to the CERN Computer Centre

This September, I started a one-year internship at CERN’s Database department (more specifically the IT-DB-IMS department, as they are clearly very fond of hierarchies). My assignment is related to logging and storage monitoring, and my primary task will be to provide a solution for automatic propagation and reporting of storage-related errors (think dead hard drives, power failures etc).

Shipping Out

Early morning August the 27th, I started my journey. Split between a suitcase and two backpacks, I brought the following with me on the plane:

  • Soft-shell jacket and rain gear
  • Clothes for at least ten days, vacuum packed (*)
  • Micro-fibre towel
  • Sleeping bag + ultra-light sleeping pad
  • Various toiletries
  • Computer with charger
  • 6 USB chargers with cables
  • Spare AA cells
  • Bluetooth keyboard
  • Bluetooth headphones
  • Running gear for at least two weeks
  • Swimming trunks and goggles
  • 3 single-board computers (two CHIPs, one Raspberry Pi)
  • Stuffed tiger, vacuum packed
  • USB microphone
  • A few USB thumb drives
  • External disk drive (for back-ups)
  • Various tools (including lock-picks, various special-purpose screwdrivers and a side cutter)
  • Thermos mug
  • Pyjamas
  • Mountaineering/hiking boots

(*) this turned out to be largely false – perhaps ten normal days, but not ten really hot days.

View from my office

After an uneventful (and very WiFi-less) flight of about two hours, I landed in Geneva, found my luggage after walking an impressive number of hallways, each one plastered with advertisements for luxury watches, various banking services, perfumes etc, and proceeded to board the bus to Ferney-Voltaire, the small French village near the Swiss border where I would be staying.

The process involved much frustration and about one hour of pacing the arrivals hall trying to find someone who could help me buy a ticket from the vending machine. It turned out the Right Thing To Do was to buy a ticket in the machine for a different company than the one who operated the bus I ended up taking. From there, the rest of the trip was mostly uncomplicated, as I had practised walking most of the way from the bus stop to the place I would be staying at on Google Maps in advance. Something which, I suppose, is evidence of the same combination of absolute neuroticism and ruthless pragmatism as vacuum-packing one’s stuffed animal for the journey.

During the first month of my stay, I was living without a mobile data plan. The experience reminds me of Cory Doctorow’s short story After the Siege (available online as part of Overclocked). Everything works mostly as normal (only worse) – until I leave the apartment. Then everything immediately stops working. Accidentally closed an email attachment when trying to read the email that contained it? Sorry, it’s now unavailable. The song I just added to my playlist? Gone. Too bad the EU hasn’t succeeded in outlawing roaming fees yet.

On the Betrayal of Things

The pensioned synchrotron from the official CERN Tour

Living in another country for any longer duration is a lot like being at the wrong end of a set of really evil unit tests. In life in general, as in programming, at any given point in time one holds a lot of preconceptions about how the world works – unreflected habits and assumptions. Living in another country immediately invalidates a non-empty, randomly selected set of these assumptions. It is precisely the experience that Foucault describes in the preface to Les mots et les choses, of the “laugh that shatters”: at first one laughs at the three different positions of dried beans in the supermarket, the consistent placement of the potato crisps next to the hard liquor, or the three different, non-consecutive candy aisles – until one realises that categories such as “baking supplies”, “candy”, etc. are entirely arbitrary, and only took on an air of ontological truth by virtue of being widely accepted conventions. There are, after all, several ways to skin a cat.

Ils sont fous, ces Gaulois

The culture shock I experience with French culture is admittedly a very mild one, but there are definitely subtle yet unsettling differences. Milk typically comes high-temperature-pasteurised, which gives it a different flavour from fresh milk. And everything is perfumed and/or tinted pink: toilet paper (!), trash bags – everything. Loose-leaf tea is not available anywhere, except in gift tins without any option for refills. Muesli is hard to come by, and the “healthy” granola/muesli mixes “without added sugar” contain little chocolate drops. Not to mention the entire pastries-for-breakfast thing. I’m still not over that.

If anything, French administrative culture is a confusing jumble of obstinate strictness and unregulated chaos. Applying for a bank account? Sign 200 papers and prove your residence with a phone or utilities bill, which is apparently the standard way of doing it. Going for a swim? Well, certain types of swimming trunks are forbidden but there is no standardised vocabulary for describing swimming trunks so you just have to figure out which ones are ok through trial and error. But at least they gave me the rebate for residents after I dutifully showed my phone bill at the reception (the de facto standard proof of residence). I may also have slammed my swimwear at the desk and asked “CECI SONT INTERDITS?!”. Lucky me, they weren’t.

How’s Work?

A fiber switch in the CERN Computer centre

I have never seen anyone suffer as badly from NIH syndrome as CERN does. For almost every given task, they have their own solution, subtly different from the industry standard: Mattermost instead of Slack, Terrible Exchange Web Mail That Probably Escaped From 2007 instead of Google Apps (the only instance where I wish they’d had more NIH and just gone with their own solution), self-hosted GitLab instead of GitHub, and a customised variant of Dropbox called “CERNBox”. Oh, and they have their own GNU/Linux distros. Two of them. And they are both RPM-based.

From an economic standpoint this makes sense. They are already running operations on huge server farms; adding a few extra services is probably very cheap, both in terms of labour and other resources, especially compared to paying for a service offered by someone else. And in the long run, their involvement in several FLOSS replacements for common industry applications is most likely improving them greatly. I just wish they would put more effort into the UX of some of these apps, so that they didn’t all suffer from flakiness and the general “a cheaper copy of X” look and feel (well known from Libre/OpenOffice, which somehow manages to look even worse than its proprietary counterparts – all things considered, no small feat). Frankly, I like my FLOSS implementations as I like my tech products in general: either at least as good as the things they are replacing, or nonexistent.

Other than that, the work is interesting, if a bit mundane. I learn a lot every day, but it is really hard to muster any real enthusiasm for a small Python daemon whose task is basically carting data from one API to another. The only real challenges I have found so far are implementing a query language and testing things really, really well, which I might go into in some detail in the future.

Scheduling Periodic Cleaning in macOS

On most of my machines, I treat the default Downloads folder as a staging area: it’s where things arrive before being filed away wherever they belong. Most files, however, are only useful for a very brief period of time and belong nowhere really. Therefore, the Downloads folders on my systems tend to accumulate all sorts of meaningless gunk, as sorting and taking out the digital trash is precisely the sort of menial task we built computers to avoid having to do. Or, they did before I automated the process of taking out the trash. Here’s how.

Begin by setting up an Automator workflow to do the actual cleaning. You might be tempted to choose a workflow along the lines of “if file creation date is not within…”, but that will – for some reason – exclude files created today. The proper way of doing it is to set up a NOT-AND criterion: if a file is not created today and not created within the last 60 days, move it to Trash as seen below.
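The same NOT-AND criterion can be sketched in Python; this is just an illustration of the logic, not what the Automator workflow actually runs, and the names here are made up:

```python
import datetime
import os

# Trash a file only if it was NOT created today AND NOT created within
# the last 60 days. The `today` parameter exists only to make the logic
# easy to exercise; by default the real current date is used.
def should_trash(path, max_age_days=60, today=None):
    today = today or datetime.date.today()
    created = datetime.date.fromtimestamp(os.path.getctime(path))
    not_created_today = created != today
    not_recent = (today - created).days > max_age_days
    return not_created_today and not_recent
```

A file created just now is kept, while the same file evaluated as if 90 days had passed would be moved to the Trash.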

Screenshot of an Automator workflow for moving files in Downloads to Trash

The next step is to schedule the Automator workflow periodically using launchd. If your computer is always on at a given time, you could use cron, but launchd has more advanced trigger options, and will also make sure to reschedule your task should your computer have been off or asleep at the time it should have run. In this instance that is not very important, as the script runs every hour, but if you were to – say – clean your Downloads folder every first Monday of the month, it suddenly becomes more important. In the script below, the workflow is called when it is first loaded (e.g. on login) as well as periodically every hour (on the 0th minute), which may or may not be excessive for your use case (it probably is for mine).

Change the path to your Automator workflow file below (mine is in ~/Documents and is called clean-downloads.workflow). It may be a good idea to avoid spaces in the file name. Save the launchd configuration file as ~/Library/LaunchAgents/com.orgname.scriptname.plist (as you can see below, I used org.amanda and cleandownloads for the organisation name and script/agent name respectively).

Once you are done, you may or may not need to load the script using launchctl load <path-to-script>, e.g. launchctl load ~/Library/LaunchAgents/org.amanda.cleandownloads.plist.

If you want to do more in-depth editing of the launchd script, I’d recommend using LaunchControl. See also this Stack Overflow thread on creating launchd tasks and where to place them; it also covers other ways of scheduling periodic tasks under macOS.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Label</key>
	<string>org.amanda.cleandownloads</string>
	<key>ProgramArguments</key>
	<array>
		<string>automator</string>
		<string>/Users/amanda/Documents/clean-downloads.workflow</string>
	</array>
	<key>RunAtLoad</key>
	<true/>
	<key>StartCalendarInterval</key>
	<dict>
		<key>Minute</key>
		<integer>0</integer>
	</dict>
</dict>
</plist>

Please note that this script (for safety reasons) doesn’t actually delete anything; it just moves files to the Trash. But it pairs excellently with macOS Sierra’s feature for automatically purging old files from the Trash!