Publications & Projects
Publications
Black Ostrich: Web Application Scanning with String Solvers (2023) [BibTeX entry]
by Benjamin Eriksson, Amanda Stjerna, Riccardo De Masellis, Philipp Rümmer, Andrei Sabelfeld.
Abstract
Securing web applications remains a pressing challenge. Unfortunately, the state of the art in web crawling and security scanning still falls short of deep crawling. A major roadblock is the crawlers’ limited ability to pass input validation checks when web applications require data of a certain format, such as email, phone number, or zip code. This paper develops Black Ostrich, a principled approach to deep web crawling and scanning. The key idea is to equip web crawling with string constraint solving capabilities to dynamically infer suitable inputs from regular expression patterns in web applications and thereby pass input validation checks. To enable this use of constraint solvers, we develop new automata-based techniques to process JavaScript regular expressions. We implement our approach by extending and combining the Ostrich constraint solver with the Black Widow web crawler. We evaluate Black Ostrich on a set of 8,820 unique validation patterns gathered from 21,667,978 forms from a combination of the July 2021 Common Crawl and Tranco top 100K. For these forms and reconstructions of input elements corresponding to the patterns, we demonstrate that Black Ostrich achieves a 99% coverage of the form validations compared to an average of 36% for the state-of-the-art scanners. Moreover, out of the 66,377 domains using these patterns, we solve all patterns on 66,309 (99%), while the combined efforts of the other scanners cover 52,632 (79%). We further show that our approach can boost coverage by evaluating it on three open-source applications. Our empirical studies include a study of email validation patterns, where we find that 213 (26%) of the 825 email validation patterns found liberally admit XSS injection payloads.
Resources available on the website.
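The core idea of deriving an input that passes a validation pattern can be illustrated at toy scale. The sketch below is not Black Ostrich or the Ostrich solver, which handle real JavaScript regular expressions; it assumes a tiny regex subset, compiles it to an NFA with Thompson's construction, and runs a 0-1 breadth-first search for a shortest accepted string. The whole input must match, as with an HTML `pattern` attribute. `Parser` and `shortest_witness` are hypothetical names.

```python
import string
from collections import deque


class Parser:
    """Thompson construction for a toy regex subset: literals, backslash-escaped literals,
    non-empty [a-z]-style classes, ".", groups, |, *, + and ?. No anchors, counters,
    negated classes or lookarounds. "." is under-approximated by ASCII alphanumerics."""

    def __init__(self, pattern):
        self.tokens = list(reversed(pattern))  # stack of unread characters
        self.edges = []  # edges[state] = [(label, target)]; label None = epsilon

    def state(self):
        self.edges.append([])
        return len(self.edges) - 1

    def peek(self):
        return self.tokens[-1] if self.tokens else None

    def alternation(self):  # concatenation ("|" concatenation)*
        start, end = self.concatenation()
        while self.peek() == "|":
            self.tokens.pop()
            s2, e2 = self.concatenation()
            s, e = self.state(), self.state()
            for src, dst in ((s, start), (s, s2), (end, e), (e2, e)):
                self.edges[src].append((None, dst))
            start, end = s, e
        return start, end

    def concatenation(self):  # repetition*
        start = end = self.state()
        while self.peek() not in (None, "|", ")"):
            s2, e2 = self.repetition()
            self.edges[end].append((None, s2))
            end = e2
        return start, end

    def repetition(self):  # atom ("*" | "+" | "?")*
        start, end = self.atom()
        while self.peek() in ("*", "+", "?"):
            op = self.tokens.pop()
            s, e = self.state(), self.state()
            self.edges[s].append((None, start))
            self.edges[end].append((None, e))
            if op in "*?":
                self.edges[s].append((None, e))  # zero occurrences allowed
            if op in "*+":
                self.edges[end].append((None, start))  # loop for more occurrences
            start, end = s, e
        return start, end

    def atom(self):
        ch = self.tokens.pop()
        if ch == "(":
            fragment = self.alternation()
            self.tokens.pop()  # consume ")"
            return fragment
        if ch == "[":
            chars = set()
            while self.peek() != "]":
                lo = self.tokens.pop()
                if self.peek() == "-" and self.tokens[-2] != "]":
                    self.tokens.pop()  # consume "-"
                    hi = self.tokens.pop()
                    chars.update(chr(c) for c in range(ord(lo), ord(hi) + 1))
                else:
                    chars.add(lo)
            self.tokens.pop()  # consume "]"
            label = frozenset(chars)
        elif ch == "\\":
            label = frozenset(self.tokens.pop())  # escaped literal such as \. or \+
        elif ch == ".":
            label = frozenset(string.ascii_letters + string.digits)
        else:
            label = frozenset(ch)
        s, e = self.state(), self.state()
        self.edges[s].append((label, e))
        return s, e


def shortest_witness(pattern):
    """Return a shortest string the whole pattern accepts, or None if acceptance is unreachable."""
    parser = Parser(pattern)
    start, accept = parser.alternation()
    best = {start: ""}
    queue = deque([start])  # 0-1 BFS: epsilon moves cost 0, character moves cost 1
    while queue:
        state = queue.popleft()
        if state == accept:
            return best[state]
        for label, target in parser.edges[state]:
            word = best[state] + ("" if label is None else min(label))  # smallest char
            if target not in best or len(word) < len(best[target]):
                best[target] = word
                (queue.appendleft if label is None else queue.append)(target)
    return None
```

For example, `shortest_witness(r"[0-9][0-9][0-9][0-9][0-9]")` yields a zip-code-like `"00000"`. Any witness can be cross-checked with Python's `re.fullmatch`, which is also a quick way to see the kind of problem reported for email validation: a liberal (hypothetical) pattern such as `.+@.+` accepts a payload like `<img src=x onerror=alert(1)>@x`.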
Theses under my previous name
Emma Goldman: Mot en postindividualistisk teori om gemenskaper [Emma Goldman: Towards a Post-Individualist Theory of Communities] (2010) [BibTeX entry]
by Stjerna, Albin.
Abstract
This thesis examines the feminist agitator and anarchist Emma Goldman in order to identify a radical theory of cooperation and community that permits individual freedom without conflict with the social. In particular, it focuses on Goldman's experiences of the Soviet Union and the Spanish Civil War, and on her idea of the "constructive revolution", which she develops in interaction with these. The investigation proceeds in four parts with four different foci: the individual, the masses, the revolution, and the organisation, in that order. The primary material is Goldman's two books on the Soviet Union, with a focus on the afterword of the latter book, together with the collection Vision On Fire, which gathers Goldman's texts on the Spanish Civil War and the anarchist organising there. The thesis concentrates mainly on the problems of the individual's dependence on the surrounding world, the role of anarchism in the revolution, the individual's relation to the masses (both in a purely ontological sense and in terms of gender and power), and how society ought to be organised. Goldman's constructive revolution is above all about social organisation, and the thesis traces the tensions in her theories between the actually existing individual and the potential state of individuality, in which the free individual is not the starting point but the goal of social organisation, opening up a potentially gender-transcending individuality. Resolving this tension gives the social organisation of the constructive revolution a compensatory character: it is not only about organising a better society, but also about compensating for, for example, the economic or gender-based power relations of pre-revolutionary society. The individual-in-becoming that Goldman discusses is also relevant to the feminist critique of how women, historically and today, have been denied individuality through institutions such as the family and marriage, among others, and it points to the need for other processes of individualisation.
The Order² of Books: A Foucauldian Archaeology of the early Swedish Library knowledge between 1912 and 1939 (2014) [BibTeX entry]
by Stjerna, Albin.
Abstract
This thesis investigates the early field of library knowledge in Sweden between circa 1912 and 1939 through the lens of Foucault’s archaeology, using a number of official documents (reports, bills, and statutes) as well as a number of articles and speeches published in the journal Biblioteksbladet (founded 1916). It seeks to answer the questions of how it became possible to form a field of knowledge, which external relations structured and enabled the field to exist, and which internal relations of power and authority made it possible for librarians, state officials, ministers of education, and other experts to agree and disagree on the proper management of public libraries during the period.
Medium Data on Big Data: Predicting Disk Failures in CERN's NetApp-based Data Storage System (2017) [BibTeX entry]
by Stjerna, Albin.
Abstract
I describe in this report an experimental system that uses classification and regression trees to predict disk failures in a NetApp-based storage system at the European Organisation for Nuclear Research (CERN), based on a mixture of SMART data, system logs, and low-level system performance data particular to NetApp's storage solutions. Additionally, I attempt to profile the system's built-in failure prediction method and compile statistics on historical complete-disk failures and on the development of bad blocks. Finally, I experiment with various parameters for producing classification trees and end up with two candidate models: one with a true-positive rate of 86% and a false-alarm rate of 4%, and one with a true-positive rate of 71% and a false-alarm rate of 0.9%. This suggests that classification trees might be a viable method for predicting real-life disk failures in CERN's storage systems.
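The two candidate models trade detection against false alarms. Assuming "false-alarm rate" means the false-positive rate over healthy cases (the abstract does not define it), and writing TP and FN for failing cases flagged and missed, and FP and TN for healthy cases flagged and not flagged, the two reported quantities are:

```latex
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad
\mathrm{FAR} = \frac{FP}{FP + TN}
```

Under that reading, the first model catches about 86 of every 100 failing cases at the cost of about 40 false alarms per 1,000 healthy ones, while the second catches about 71 with about 9 false alarms.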
Modelling Rust’s Reference Ownership Analysis Declaratively in Datalog (2020) [BibTeX entry]
by Stjerna, Albin.
Abstract
Rust is a modern systems programming language that offers improved memory safety over traditional languages like C or C++, as well as automatic memory management without introducing garbage collection. In particular, it guarantees that well-typed programs are free from data races caused by memory aliasing, use-after-frees, and accesses to deinitialised or uninitialised memory. At the heart of Rust's memory safety guarantees lies a system of memory ownership, verified statically in the compiler by a process called the borrow check. However, the current implementation of the borrow check is not expressive enough to prove several desirable programs safe, even though they are. This report introduces an improved borrow check called Polonius, which increases the resolution of the analysis to reason at the program statement level, and enables a more expressive formulation of the borrow check itself through the use of a domain-specific language, Datalog. To the best of our knowledge, Polonius is the first use of Datalog for type verification in the compiler of a production language. Specifically, this thesis extends Polonius with initialisation and liveness computations for variables, and constitutes the first complete description of Polonius in text. Finally, it describes an exploratory study of input data for Polonius generated by analysing circa 12,000 popular Git repositories found on GitHub and the Crates.io Rust package index. Some central findings from the study are that deallocations are uncommon relative to other variable uses, and that a weaker (and therefore faster) analysis than Polonius is often sufficient to prove a program correct. Indeed, many functions (circa 64%) do not create any references at all, and therefore do not involve the reference-analysis part of the borrow check.
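To give a flavour of a Datalog-style liveness computation, the Python sketch below evaluates two simplified rules by naive bottom-up iteration until a fixpoint is reached. It is not Polonius's rule set or its Rust implementation; the relation names `var_used_at`, `var_defined_at` and `cfg_edge` and the helper `var_live_on_entry` are illustrative assumptions. The negated atom is safe because `var_defined_at` is an input relation.

```python
def var_live_on_entry(var_used_at, var_defined_at, cfg_edge):
    """Naive bottom-up evaluation of two illustrative Datalog rules:

        live(V, P) :- var_used_at(V, P).
        live(V, P) :- live(V, Q), cfg_edge(P, Q), !var_defined_at(V, P).

    Arguments are sets of tuples; the result is the least fixpoint of live."""
    live = set(var_used_at)  # the first rule copies the input facts
    while True:
        # Apply the second rule once to every fact derived so far.
        derived = {
            (v, p)
            for (v, q) in live
            for (p, successor) in cfg_edge
            if successor == q and (v, p) not in var_defined_at
        }
        if derived <= live:  # nothing new: fixpoint reached
            return live
        live |= derived


if __name__ == "__main__":
    # Hypothetical control-flow graph: 0 -> 1 -> 2 -> 3 with a loop 2 -> 1.
    edges = {(0, 1), (1, 2), (2, 1), (2, 3)}
    uses = {("x", 2), ("y", 3)}
    defs = {("x", 0), ("y", 1)}
    print(sorted(var_live_on_entry(uses, defs, edges)))
    # [('x', 1), ('x', 2), ('y', 2), ('y', 3)]
```

Production Datalog engines typically use semi-naive evaluation instead, joining only the facts that were new in the previous round rather than rederiving everything each iteration.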
Projects
Audio Locker: an everything-to-podcast web application for a single user, written in Rust using the Actix web framework. It converts audio from sources that are not podcasts into podcast audio and publishes it to a private, non-indexed feed for later consumption. Think of it as a read-it-later service, but for listening. It is currently not public and supports only YouTube videos, but the plan is to also handle at least some web pages through text-to-speech.
MacSE Weather Display Network: a network of ESP32-based temperature sensors reporting back to a display hosted inside the shell of an old Macintosh SE, which shows the current outdoor temperature and a weather forecast.
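Audio Locker's private feed is the kind of document podcast clients consume: an RSS 2.0 feed in which each item carries an `<enclosure>` pointing at an audio file. Audio Locker itself is written in Rust with Actix and is not public, so the following is only a Python sketch of that feed format using the standard library's `xml.etree.ElementTree`, not its code. `build_feed`, the episode dictionary keys, and all URLs are hypothetical. Serving such a feed at an unadvertised URL is one simple way to keep it private; that too is an assumption, not a description of the project.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET


def build_feed(title, link, episodes):
    """Build a minimal RSS 2.0 podcast feed.

    episodes: iterable of dicts with the hypothetical keys "title", "url",
    "length" (bytes), "mime" and "published" (a timezone-aware datetime)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Private listen-later feed"
    for episode in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = episode["title"]
        ET.SubElement(item, "guid").text = episode["url"]
        # RFC 822-style date, as RSS 2.0 expects, e.g. "Mon, 01 Jan 2024 00:00:00 +0000"
        ET.SubElement(item, "pubDate").text = format_datetime(episode["published"])
        # The enclosure is what podcast clients download: URL, size in bytes, MIME type.
        ET.SubElement(
            item,
            "enclosure",
            url=episode["url"],
            length=str(episode["length"]),
            type=episode["mime"],
        )
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_feed(
        "Audio Locker",
        "https://example.org/feed",  # hypothetical URL
        [{"title": "Converted video", "url": "https://example.org/ep1.mp3",
          "length": 1234567, "mime": "audio/mpeg",
          "published": datetime(2024, 1, 1, tzinfo=timezone.utc)}],
    ))
```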