When the Taliban breached the presidential palace in Kabul, Afghanistan, on August 15, 2021, Liladhar R. Pendse knew he had to do something. Pendse, librarian for the East European and Central Asian collection at University of California, Berkeley Library, initiated a project that same day to archive web content at risk of being taken down under Taliban rule. This included websites, social media posts, and news clips by and about artists, journalists, social activists, and others based in the country. With the help of campus colleagues, Pendse created the At-Risk Afghanistan Website Archiving Project (ARAWA), a seven-week-long project with the goal of preserving and archiving digital cultural content that could be permanently lost.
In mid-August, I watched Al Jazeera in dismay as news footage depicted Taliban representatives sitting at the desk of then–Afghan President Ashraf Ghani in the presidential palace in Kabul. Those images, together with the evacuation of the last US ground troops, signaled to me the potential for change in all aspects of Afghan civic life.
Despite the US occupation, a semblance of civil society had emerged in major urban centers like Kabul and Herat over the past 20 years. After the Taliban was removed from power in 2001, the ensuing government restored women’s rights. Journalism thrived. Higher education institutions filled with students. Government departments began to have a web presence.
Between 2001 and 2021, the country generated thousands of digital artifacts (policy documents, working papers, tweets, media clips), all serving as essential markers of recent Afghan history that were suddenly in danger of being lost forever. The return to power of the religiously conservative movement signaled the potential erasure, sanctioning, or alteration of these websites to suit the Taliban’s ideological views.
In light of this threat, ARAWA’s goal was to selectively crawl and preserve content from the websites most at risk of being taken down.
I envisioned this as a collaborative project. I reached out to several key faculty members and doctoral students specializing in Afghanistan and invited our curator and cataloger for South Asian studies, Adnan Malik, to partner with me. Professors from other institutions, including Stanford University and James Madison University, were also enthusiastic supporters and helped initiate the project. UC Berkeley Library covered the cost of web archiving, and we used the Internet Archive’s web archiving platform.
Because of the seriousness of the situation on the ground in Afghanistan, the ARAWA project has kept several social media crawls and individual tweets of certain activists and archivists private for now. Some scholars, students, artists, and other Afghan citizens have also requested privacy or an embargo of their content from public view.
We were concerned initially about whether including the names of faculty members at higher education institutions in the web crawls would put them in danger. But because the information is publicly available, we decided to archive the Afghan university sites and make them viewable.
We’ve found that this project, like others, offers new lessons in the scope and limitations of the software used to archive websites. In some cases, the crawls can be unsuccessful because of a website’s settings or required permissions. In other situations, it has been difficult to reach a site’s creators: The transition to the Taliban government has been fluid, and the whereabouts of some artists or social activists are unknown. Emails we send to solicit permissions to crawl can go unanswered.
However, it is crucial to keep in mind the larger goal of preserving these websites. Because this was an archival project geared toward documenting select governmental and nongovernmental sites that were bound to change under the Taliban, it was vital to set a firm end date. From the beginning, the project was scheduled to sunset on September 30, 2021.
As of early October, the project had archived 83 websites with more than 100 GB of data, including 846,111 files in Dari (Afghan Persian), Pashto, and English.
Collaboration is the mantra for any successful project. The partnerships with faculty, graduate students, library colleagues, and administration have been essential for preventing erasure in the virtual realm, making this project worthwhile and successful.