Memories at risk: TU Braunschweig discovers security gaps in web archives
Websites are changed, deleted or restructured every second. Web archives store snapshots of these pages and thus preserve parts of our digital memory. But how reliable and trustworthy are these archives? Are they really an unadulterated and unalterable reflection of websites at an earlier point in time? Scientists at Technische Universität Braunschweig have investigated the security of web archives by deliberately attacking them.
In February 2025, thousands of US government websites were taken offline. For example, the US government’s official help page on LGBTQ bullying is now blocked. Web archives are a well-known solution to this problem. They create so-called ‘snapshots’ of a website at a specific point in time, helping us to document our past. The website on bullying can still be viewed via the ‘Internet Archive’. Billions of web pages are now archived there.
Reliable sources
“Web archives, such as the Wayback Machine, are pillars of truthful information preservation,” says Professor Martin Johns, head of the Institute for Application Security at TU Braunschweig. Researchers use the archives to measure the development of JavaScript tracking over time. Courts rely on archived websites as evidence in legal cases. Wikipedia editors refer to archives as reliable sources, and users can use the archives to track unannounced changes to online articles.
Users of the archives implicitly assume that a snapshot is an ‘unadulterated’ and ‘unchangeable’ image of the website at an earlier point in time. But do these ‘time machines’ really live up to this expectation?
Two new attack models against web archives
Doctoral student Robin Kirchner from the Institute for Application Security, in collaboration with Professor Nick Nikiforakis from Stony Brook University, New York (USA), has discovered that these archives are more fragile than previously thought.
The scientists developed two new attack models and investigated whether eight well-known archives could withstand these attacks. They published their findings in a paper and presented them at the Conference on Computer and Communications Security (CCS) in Taiwan. “The security vulnerabilities we uncovered effectively allow archived web content to be falsified retroactively, i.e. to lie about the past,” said Professor Johns.
For example, the ‘Evasive Adversary’ developed by the scientists attempts to hide its own website from archiving. This web attacker can detect in real time when an archive is attempting to archive its website. As a result, it can hide from the archives or deliberately provide them with falsified information.
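The idea of serving different content to an archive than to regular visitors can be illustrated with a small toy model. This is only a sketch under stated assumptions: the article does not say which signals the researchers used to recognise archives, so the User-Agent markers, the function names and the similarity threshold below are all hypothetical. The second function shows one conceivable check, comparing the responses that two different clients receive for the same page.

```python
from difflib import SequenceMatcher

# Hypothetical marker strings, chosen for illustration only. Real archives
# may identify themselves differently, or not at all.
ARCHIVER_MARKERS = ("archive", "crawler")


def toy_evasive_site(user_agent: str) -> str:
    """Toy model of an evasive page: suspected archivers get other content."""
    if any(marker in user_agent.lower() for marker in ARCHIVER_MARKERS):
        return "<html><body>Nothing to see here.</body></html>"
    return "<html><body>Original article text shown to regular visitors.</body></html>"


def looks_cloaked(body_a: str, body_b: str, threshold: float = 0.9) -> bool:
    """Flag two responses for the same page if they differ substantially.

    The threshold is an arbitrary assumption; dynamic pages differ slightly
    between requests even without any evasion.
    """
    return SequenceMatcher(None, body_a, body_b).ratio() < threshold


if __name__ == "__main__":
    as_visitor = toy_evasive_site("Mozilla/5.0")
    as_archiver = toy_evasive_site("example-archive-crawler/1.0")
    print(looks_cloaked(as_visitor, as_archiver))
```

Such a comparison only helps if the page cannot tell the two requests apart by other means, which is exactly what an evasive adversary tries to do.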
The ‘Anachronistic Adversary’, on the other hand, manages to retain control over its archived websites and modify them retrospectively at any time. The archive itself is not attacked in the process. The code responsible for the attack remains in the snapshot after archiving and, whenever the snapshot is viewed, loads current content from a source controlled by the attacker. This allows the attacker to change the content displayed by the snapshot after the fact without modifying the archived snapshot itself.
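Seen from the defender's side, the mechanism described above suggests a simple check: look for resources in a snapshot that would still be fetched from outside the archive when the snapshot is viewed. The following is a minimal sketch, assuming the archive rewrites the URLs of archived resources to its own host. The host names `archive.example` and `attacker.example` and the function names are hypothetical, and this is not the method used in the paper.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LiveResourceFinder(HTMLParser):
    """Collects resource URLs that point to a host other than the archive."""

    def __init__(self, archive_host: str):
        super().__init__()
        self.archive_host = archive_host
        self.live_urls = []

    def handle_starttag(self, tag, attrs):
        # Only tags that typically load external resources are inspected.
        if tag not in ("script", "iframe", "img", "link"):
            return
        for name, value in attrs:
            if name in ("src", "href") and value:
                host = urlparse(value).netloc
                # Relative URLs have no host and stay within the archive.
                if host and host != self.archive_host:
                    self.live_urls.append(value)


def find_live_resources(snapshot_html: str, archive_host: str) -> list:
    """Return URLs in the snapshot that would be loaded from the live web."""
    finder = LiveResourceFinder(archive_host)
    finder.feed(snapshot_html)
    return finder.live_urls


if __name__ == "__main__":
    snapshot = (
        '<script src="https://archive.example/web/2025/https://site.example/app.js"></script>'
        '<script src="https://attacker.example/payload.js"></script>'
    )
    print(find_live_resources(snapshot, "archive.example"))
```

A static scan like this only sees URLs written directly into the markup. URLs that a script assembles at runtime would not be found this way.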
Eight archives under the microscope
The researchers examined eight well-known internet archives. These include the best-known archive, the Wayback Machine, and Harvard University’s paid service, Perma.cc. They also took a close look at Archive.today, FreezePage, Arquivo, Megalodon, GhostArchive and Conifer.
The result: all archives are vulnerable to at least one of the two attack models. According to the scientists’ observations, they were able to hide their attacks from all archives. Even more alarming for Professor Johns: “Most archives were also vulnerable to our anachronistic attacks. This means that we were able to subsequently modify the archive copies of our websites in seven out of eight archives at any time via web attacks.”
Links:
https://www.nytimes.com/2025/02/02/upshot/trump-government-websites-missing-pages.html, Archive Link: https://archive.ph/mxScx
https://www.stopbullying.gov/bullying/lgbtq
https://web.archive.org/web/20250121064232/https://www.stopbullying.gov/bullying/lgbtq