Free downloadable collection for research purposes


The Internet Memory Foundation gives access to a LivingKnowledge subcollection.


In the framework of LivingKnowledge, a three-year European project (No. 231126, 2009-2012) funded by the European Commission through FP7, the Internet Memory Foundation built an annotated collection of news and blogs.

This data set is already used for a call for participation organized in the framework of NTCIR Temporal Information Access (Temporalia). The results will be presented at the NTCIR-12 conference at NII, Tokyo, Japan.

If other researchers need access to this collection, feel free to contact us.

Detailed information:
The collection is approximately 20GB uncompressed and over 5GB zipped.
It spans from May 2011 to March 2013 and contains around 3.8M documents collected from about 1,500 different blogs and news sources.
The data is split into 970 files, each named after the day it covers plus some information about its sources (there might be more than one file per day).
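
For planning storage and processing, the figures above can be turned into rough per-file and per-document averages. The sketch below is only back-of-the-envelope arithmetic on the approximate numbers quoted in this announcement; it assumes decimal gigabytes and treats the zipped size as exactly 5GB, although the text says "over 5GB". The function name is hypothetical.

```python
# Approximate figures quoted in the collection description.
TOTAL_DOCUMENTS = 3_800_000       # "around 3.8M documents"
TOTAL_FILES = 970                 # "split into 970 files"
TOTAL_SOURCES = 1_500             # "about 1,500 different blogs and news sources"
UNCOMPRESSED_BYTES = 20 * 10**9   # "approximately 20GB" (decimal GB assumed)
COMPRESSED_BYTES = 5 * 10**9      # "over 5GB zipped" (treated as exactly 5GB)


def collection_averages():
    """Return rough averages derived from the quoted totals."""
    return {
        "documents_per_file": TOTAL_DOCUMENTS / TOTAL_FILES,
        "documents_per_source": TOTAL_DOCUMENTS / TOTAL_SOURCES,
        "bytes_per_document": UNCOMPRESSED_BYTES / TOTAL_DOCUMENTS,
        "megabytes_per_file": UNCOMPRESSED_BYTES / TOTAL_FILES / 10**6,
        # Upper bound, because the zipped size is "over" 5GB.
        "max_compression_ratio": UNCOMPRESSED_BYTES / COMPRESSED_BYTES,
    }


if __name__ == "__main__":
    for name, value in collection_averages().items():
        print(f"{name}: {value:,.1f}")
```

On these assumptions a file holds a few thousand documents and about 20MB of uncompressed data on average, so individual files can be processed comfortably in memory.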



IMF at Archiving 2014, May 13-16, in Berlin


« Building Scalable Web Archives » will be presented in the Technical Program on Thursday May 15, 2014, at Arsenal Cinema (Berlin)

The Internet Memory Foundation is glad to participate in the Archiving Conference organized by the Society for Imaging Science and Technology (IS&T).

Our presentation, Building Scalable Web Archives, introduces the Internet Memory Foundation platform, built on its distributed infrastructure, together with the associated tools and workflows that facilitate data management and preservation actions at large scale.

IMF’s main concern over the past years has been related to scalability issues in terms of crawling, indexing, preserving and accessing content. To answer these issues, the foundation developed its own crawler and built a new infrastructure.

This presentation will outline the difficulty of analyzing content stored in (W)ARCs and the solution applied within the IMF platform. We will also describe our automated quality assurance workflow and the results obtained through our new approach.

The Preliminary Program is now online and registration is open.



Successful end of LAWA project


The LAWA project ended successfully in October 2013.
LAWA (Longitudinal Analytics of Web Archive) conducted ambitious, focused research and development on Big Data analytics for very large Web collections.

In the LAWA project, particular emphasis was put on temporal aspects, including change analysis, trend detection, and data aggregation over time periods.
Internet Memory participated in the project as the main data provider and software integrator.

The project's outcomes consist of many top-level research contributions, published in major conferences of the field (SIGMOD, VLDB, ICDE to name a few), as well as industrial achievements now incorporated in Internet Memory software. MemoryBot, the Internet Memory crawler, was designed and implemented as part of the LAWA effort. The LAWA architecture, based on the Hadoop/HBase suite and featuring a high-level analytic framework, is largely inspired by IM R&D activity, and has been adopted for the Mignify platform, provided by Internet Memory Research.
Overall, LAWA, whose results have been deemed “excellent” by the European Commission, is a representative example of fruitful cooperation between high-level research groups and innovative SMEs.

We thank the European Commission for its support, and hope that the LAWA components integrated in our products will demonstrate their effectiveness.



Online Web Archiving Session by ADBS


On September 25th, 2012, the ADBS organized a session in Paris focused on Web archiving. Internet Memory was glad to participate and to present its vision and projects. Watch the online session!

The ADBS (Association des professionnels de l’information et de la documentation) is the professional association for librarians and information professionals in France. With 5,000 members, ADBS is a leading association of information professionals in Europe. The association aims to promote the profession in its many forms, improve the professionalism of its members and lobby for the interests of the profession.

The video “Archivage du web : quelle mise en oeuvre?” has been cut into three parts:
- Overview and definition
- Use cases and Research
- Discussion
We wish you a nice viewing!

Would you like to discuss your Web archiving project?
Contact us to receive more information.
contact /at/



Participation in the WebArchivists Camp in Paris on November 22nd


Internet Memory is excited to be part of this first BarCamp on Web archiving, organized in Paris by the WebArchivists association.
You are welcome to join!


What is WebArchivists, and what is the WebArchivists Camp?

WebArchivists is a non-profit organization founded in 2009, based mostly in Paris. Its goals are:
- To increase mainstream media and public interest in web archives,
- To gather the news and projects gravitating around this subject,
- To invite people from different horizons to meet, think and imagine what the future of web archiving will look like.

At this WebArchivists Camp, you will have the opportunity to meet Julien Masanès, director of Internet Memory, and the team working on the French Legal Deposit at the National Library of France, but also hackers, developers, traditional archivists, graphic designers, net-artists and everyone who wants to jump in!
Based on the Barcamp system, the evening will be organized around different workshops, thinking about problems to solve, ideas, projects… At the end, results of the different workshops will be discussed.

A live stream of the event, as well as a translation of the main ideas and speeches, should be put in place by the WebArchivists team. You can also follow them on Twitter @webarchivists (hashtag #WACamp).

More info

WebArchivists Website
Event page
La Cantine, 12 Galerie Montmartre
75002 Paris
Thursday, 22nd November 2012
From 6.30 pm to 9.30 pm



How to fit in? Integrating a web archiving program in your organization


Internet Memory will be part of an IIPC-sponsored workshop held at the Bibliothèque nationale de France, Paris, on Friday, November 30, 2012.

Ten national libraries from all over the world will attend the workshop and will have the opportunity, on Friday 30th November, to learn more about Internet Memory activities:
- Our Partnerships with heritage institutions and research centers
- Web archiving services: Production, Quality Assurance and tools we developed to improve crawl, access, and usage
- Research projects which enable Internet Memory to collaborate on innovative projects with prestigious labs.

List of participants:
Bibliotheca Alexandrina
British Library
Library of Congress
National and University Library of Slovenia
National Library and Archives of Québec
National Library of Estonia
National Library of Germany
National Library of the Netherlands
National Library of Singapore
National Library of Spain

More information



Workshop on Big-Data Analytics for the Temporal Web, Paris, November 13, 2012


The LAWA project organizes an International Workshop on Big-Data Analytics for the Temporal Web, Paris, November 13, 2012.
Keynotes by Yahoo! Research, Barcelona (R. Baeza-Yates) and L3S Research Center, Hanover (W. Nejdl).


The LAWA project organizes a one-day workshop with researchers using (or planning to use) the Web as a corpus for their studies.

The focus is on methods, tools, and platforms for big-data analytics, including requirements on and experiences with such technologies.
Topics of interest include but are not limited to:
- Web dynamics, history, and archives,
- Text mining and content classification,
- Temporal/longitudinal studies,
- Scalable methods (e.g., cloud-based map-reduce),
- Large-scale data storage,
- Community detection and evolution.

The workshop will have presentations by participating researchers and big-data users, including the LAWA project team.

Keynotes by:
- Ricardo Baeza-Yates from Yahoo! Research, Barcelona
- Wolfgang Nejdl from L3S Research Center, Hanover

Emphasis will be on experience-sharing and discussing mutual interests in big-data analytics for the temporal Web.

The workshop is free of charge and open to the public, but registration by email is compulsory.

You are welcome!

More information about the LAWA project



Web Archiving Workshop at FIAT/IFTA 2012


For several years, Internet Memory has enjoyed attending FIAT/IFTA conferences to meet and exchange ideas with audiovisual and broadcasting archivists, who are increasingly engaged in the Web archiving field.

This year, the annual meeting of FIAT/IFTA takes place at the British Library in London from Friday 29th September to Monday 1st October 2012.

We are pleased to participate in the Web archiving Workshop led by the INA (Institut National de l’Audiovisuel) with the British Library on Sunday, 30th September 2012.
Some of the questions that will be addressed:
• How do broadcasters decide on web content?
• How can the public contribute?
• Archiving user-generated content

See you there!
If you cannot attend this event, you can stay tuned by following on Twitter:
#webarchiving, @FIAT_IFTA and @InternetMemory.




Web archiving pilot with local authorities


In collaboration with the UK National Archives, the Internet Memory Foundation participated in a Web archiving pilot with seven local authority archive services in the UK. Great success!

For several years, Internet Memory has collaborated with the UK National Archives to collect and preserve Web content from UK governmental websites. In this framework, terabytes of online documents are archived every year, generating millions of hits per week on the UK National Web Archives.

In order to encourage local archive services to create their own Web archive, The National Archives decided one year ago to organize a Web archiving pilot with 7 local authority archive services, representing 20 local authorities.
Internet Memory was glad to be part of this pilot for the training and operational steps including:
- training session about processes, tools, challenges
- selection of websites: each service had to select 3 websites to archive
- crawls in January 2012, monitoring and QA involving each service.

At the end of the pilot, participants were satisfied with the experience and results.
They are now considering future options to develop their own Web archives.
To be continued…

Results of Web archiving Pilot by local archive services

Greater Manchester Archives Group

- Manchester International Film Festival  
- Football Club United of Manchester
- Greater Manchester Coalition of Disabled People

North Yorkshire County Record Office

- Taylors of Harrogate
- Northallerton Town Football Club
- UNISON North Yorkshire Local Government


- Nick Clegg, MP for Sheffield Hallam
- Sheffield Pride
- South Yorkshire Housing Association   


- Diocese of Lichfield
- Stoke on Trent, Pottery and Ceramics
- Staffordshire Hoard


- Hambledon
- Painshill Park
- Surrey Wildlife Trust

West Yorkshire Archives Service

- Wakefield Anglican Diocese
- Incredible Edible Todmorden
- The Culture Vulture

Dorset History Centre

- Bournemouth Holidays and Tourist Information
- Visit Dorset
- Poole Tourism

The UK National Archives links
- News story
- Press Release




LIBER 41st annual conference in Estonia


Internet Memory strives to be present at international conferences to promote Web archiving. Thus our institution is glad to attend the LIBER 41st annual conference, which takes place this year in Tartu (Estonia).

Web archiving, or preserving a precious heritage

The LIBER conference is a key event for research libraries to share and collaborate on their own issues, including collection and preservation. For its part, Internet Memory's main aim is to share and inform about its new technical know-how in the world of digital collection and Web archiving.

On Wednesday, June 27th, Internet Memory, in partnership with the National Library of Ireland, presents a paper describing a use case for setting up Web archiving campaigns.

How to Face the Challenges of Web Archiving? The Experiences of a Small Library on the Edge

Both speakers will discuss the different steps of putting a web archiving project in place: project definition, selection, permission, crawl, quality assurance and access to web archive collections.

This case illustrates the mission of Internet Memory: to develop new collaborations and partnerships to expand Web preservation initiatives.

For the National Library of Ireland, our mission was to collect, preserve and provide access to high value Web content (political data around several elections in Ireland).



Focus on the first SCAPE project year


One year of progress in the SCAPE project with IMF.

The Internet Memory Foundation is an active partner of the SCAPE project. In this major project, our team helps implement the solutions and innovations needed to address SCAPE challenges.

The cornerstone of the SCAPE project is now viable. Engineers and researchers at the Internet Memory Foundation have participated in designing the architecture of the scalable preservation platform. The IMF has also provided expertise in the design of the platform’s testbeds, preservation scenarios and data provision. Within IMF, a first iteration of the platform is deployed as a central instance available to all other project partners.

In just one year, the SCAPE project has already produced six deliverables (five of them public), delivered within the European Commission deadlines. They have all been accepted by the EC following the first-year review. The website dedicated to the SCAPE project now offers three new reports and 15 scientific publications, focusing on recent results developed in the project and published in journals and at conferences.

This first year has been prolific in highlights, including:
- An experimental cluster for development work on the preservation platform has been deployed.
- Several applications and components have been released (e.g. a Prototype for command-line execution of Hadoop applications, the SCAPE Action Catalogue, the Akubra HDFS adaptor).
- 22 SCAPE Scenarios have been developed. Here Datasets, Issues, and Solutions from SCAPE content providers are documented.
- Numerous experimental Taverna Workflows have been developed and tested.
- 52 action services are online and available for these Taverna test workflows.

To read more about the project SCAPE please sign up to follow the news or visit the project Newsletter already online.



Workshop at the IIPC 2012 General Assembly: Leveraging Web Archives Research


Internet Memory developed a new infrastructure with the ambition to reach “Web-scale” in terms of Web documents acquisition and computable data storage.

Internet Research requires the ability to store and analyse large portions of the Web as a foundational block for most content-centric studies.

For this, a combination of Web archives and a distributed infrastructure supporting extended analytical tools is necessary. With such an infrastructure, large-scale measurements, topological information and trends at Internet scale can be brought to the scrutiny of researchers and information professionals.

Internet Memory developed a new infrastructure with the ambition to reach “Web-scale” in terms of Web document acquisition (billions of resources crawled per week) and computable data storage (petabytes of data). This platform, partly supported by several EU projects including LAWA (Longitudinal Analytics of Web Archive data), includes:

- A new crawler, entirely implemented in Erlang, to support the retrieval of billions of pages in days. Thanks to its innovative frontier and seen-URL data structure, it sustains throughput for weeks while enabling Web-scale exploration.
- A new Web Archive repository for content and metadata, based on HBase. It offers a perfect storage layer for Web archives, as it is functionally isomorphic to WARC but abstracts much of the underlying data management (replication, index creation, etc.) while exposing analytics-friendly APIs.
- Filters and extractors to distil relevant information and create processing chains in a distributed execution environment.
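
To illustrate the two crawler concepts named in the list above, here is a minimal sketch of a crawl frontier paired with a seen-URL structure. It is not the Internet Memory crawler: that one is written in Erlang and its data structures are not described here. The sketch assumes a simple in-memory FIFO queue and a Python set, which would not scale to billions of URLs; the names `Frontier`, `crawl` and `fetch_links` are hypothetical.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin


class Frontier:
    """FIFO crawl frontier that never schedules the same URL twice."""

    def __init__(self, seeds):
        self._queue = deque()   # URLs waiting to be fetched
        self._seen = set()      # every URL ever scheduled
        for url in seeds:
            self.add(url)

    def add(self, url, base=None):
        """Normalize a URL and enqueue it unless it was already seen."""
        if base is not None:
            url = urljoin(base, url)       # resolve relative links
        url, _fragment = urldefrag(url)    # "#fragment" is not a new page
        if url in self._seen:
            return False
        self._seen.add(url)
        self._queue.append(url)
        return True

    def next_url(self):
        """Return the next URL to fetch, or None when the frontier is empty."""
        return self._queue.popleft() if self._queue else None


def crawl(seeds, fetch_links, max_pages=1000):
    """Breadth-first crawl; fetch_links(url) returns the links found at url."""
    frontier = Frontier(seeds)
    visited = []
    while len(visited) < max_pages:
        url = frontier.next_url()
        if url is None:
            break
        visited.append(url)
        for link in fetch_links(url):
            frontier.add(link, base=url)
    return visited
```

Passing the link extraction in as a function keeps the sketch free of network code. A production crawler would also need politeness delays, URL prioritization and a disk-backed or probabilistic seen-URL store, none of which is shown here.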

This presentation will offer an overview of this platform and discuss the next steps of its development.
International Internet Preservation Consortium (IIPC) 2012 General Assembly
Library of Congress, Washington DC
Tuesday, May 1, 2012, 2:30 pm - 3:45 pm (Members only)
Presented by Leïla Medjkoune




HBaseCon 2012: Mignify, A Big Data Refinery Built on HBase


In the framework of the LAWA project, IMF will present the progress of the design and development of a Big Data platform at HBaseCon 2012, on May 22, 2012 in San Francisco.

Mignify: A Big Data Refinery Built on HBase

HBaseCon 2012
Tuesday, May 22, 2012, 2:20pm – 3:00pm, InterContinental San Francisco Hotel
Presented by Stanislav Barton

This platform is partly supported by several EU projects among which LAWA (Longitudinal Analytics of Web Archive data).

Mignify is a platform for collecting, storing and analyzing Big Data harvested from the web. It aims to provide easy access to focused and structured information extracted from Web data flows. It consists of a distributed crawler, a resource-oriented storage based on HDFS and HBase, and an extraction framework that produces filtered, enriched, and aggregated data from large document collections, including the temporal aspect. The whole system is deployed on an innovative hardware architecture comprising a high number of small (low-consumption) nodes. This talk will cover the decisions made during the design and development of the platform, from both a technical and a functional perspective. It will introduce the cloud infrastructure, the ETL-like ingestion of the crawler output into HBase/HDFS, and the triggering mechanism of analytics based on a declarative filter/extraction specification. The design choices will be illustrated with a pilot application targeting Daily Web Monitoring in the context of a national domain.
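
The idea of a declarative filter/extraction specification can be sketched as follows. This is an illustration only, not the Mignify specification language, which is not described in this announcement. It assumes that resources are plain dictionaries and that a rule is a dictionary with hypothetical keys `name`, `where` (field values a resource must match) and `select` (fields to extract).

```python
def matches(resource, where):
    """Filter step: True when every field in `where` has the required value."""
    return all(resource.get(field) == value for field, value in where.items())


def apply_spec(resources, spec):
    """Run every rule of the spec over the resources.

    Returns a dict mapping each rule name to the list of extracted records.
    """
    results = {rule["name"]: [] for rule in spec}
    for resource in resources:
        for rule in spec:
            if matches(resource, rule["where"]):
                # Extraction step: keep only the requested fields.
                record = {field: resource.get(field) for field in rule["select"]}
                results[rule["name"]].append(record)
    return results


# Hypothetical specification: which resources to keep, and which fields to extract.
SPEC = [
    {"name": "html_titles", "where": {"mime": "text/html"}, "select": ["url", "title"]},
    {"name": "images", "where": {"mime": "image/png"}, "select": ["url"]},
]

if __name__ == "__main__":
    crawled = [
        {"url": "http://a.example/", "mime": "text/html", "title": "Home"},
        {"url": "http://a.example/logo.png", "mime": "image/png"},
    ]
    print(apply_spec(crawled, SPEC))
```

Because the rules are data rather than code, new analytics can be added by extending the specification without touching the processing loop, which is the property a declarative approach is meant to provide.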

HBaseCon 2012 is the first industry conference for Apache HBase users, contributors, administrators and application developers, and we are glad to present there.




Web Archiving at the Collège de France


On March 28th, at 11.00 am, Julien Masanès gives a Web archiving seminar

At the Collège de France, Chair of Information Technology and Digital Sciences

Information technology has revolutionized our lives. Computers are traditionally seen as computing machines, although their main purpose is now to manage data. This course will cover essential aspects of data management, including its close relationship with mathematical logic and complexity theory. The Web can be seen as a huge distributed database: its most exciting aspects will also be studied, such as its scale or the challenges of distributed computing and the Semantic Web.

Wednesday, March 28th, from 10.00 am to 12.00 pm: Semantic Web, Open Data and Web Archiving

Serge Abiteboul opens the conference with a lecture about the Semantic Web and invites François Bancilhon, Director of DataPublica, and Julien Masanès, Director of the Internet Memory Foundation, to talk about Open Data and Web archiving.

Feel free to join!

Amphithéâtre Maurice Halbwachs
Collège de France
11, place Marcelin Berthelot
75231 Paris Cedex 05



Open source version of the LivingKnowledge testbed publicly released on SourceForge


Since its public release on SourceForge in August 2011 under the name Diversity Engine, the testbed has been downloaded many times, and some of its components will be reused in other FP7 research projects such as TrendMiner.

LivingKnowledge Project

The LivingKnowledge project (LK) enhances the state of the art of retrieving information from the Web by formalizing the notions of bias and diversity, creating tools that analyze, summarize and visualize bias in textual and image documents and finally, by creating applications that exploit this technology.

LivingKnowledge Testbed

The testbed integrates the following components, all of which contribute to diversity- and bias-aware search:
- document collections chosen to reflect a diversity of document types and content,
- image and text analysis tools supporting the analysis of diversity in text and image documents,
- indexing and search tools supporting bias- and diversity-aware search, including novel visualization methods.

The testbed processing starts with document collections that are available upon request from the Internet Memory Foundation, including 280 News sites and 750 blogs.
Furthermore, the testbed supports a number of collection formats allowing users to incorporate their own collections.

A hands-on session with over 30 participants (Symposium on Bias and Diversity) was held during the 8th International Summer School on Information Retrieval (ESSIR), which took place in Koblenz (Germany) in August/September 2011.

More info

Living Knowledge Project
Diversity Engine
Symposium on Bias and Diversity in IR (ESSIR 2011)



Temporal Web Analytics Workshop (TempWeb02) at WWW2012 in Lyon on April 17


TempWeb02 will take place on April 17th, 2012 in conjunction with the International World Wide Web Conference in Lyon, France.

As PC-Chair and Organizer, the Internet Memory Foundation informs you that the submission deadline for papers is February 24, 2012.


The objective of this workshop is to provide a venue for researchers of all domains (IE/IR, Web mining etc.) where the temporal dimension opens up an entirely new range of challenges and possibilities. The workshop's ambition is to help shape a community of interest around the research challenges and possibilities resulting from the introduction of the time dimension in Web analysis.

TempWeb focuses on temporal data analysis along the time dimension for Web data that has been collected over extended time periods. A major challenge in this regard is the sheer size of the data it exposes and the ability to make sense of it in a useful and meaningful manner for its users. Web scale data analytics therefore needs to develop infrastructures and extended analytical tools to make sense of these.

Workshop topics

• Web scale data analytics
• Temporal Web analytics
• Distributed data analytics
• Web science
• Web dynamics
• Data quality metrics
• Web spam
• Knowledge evolution on the Web
• Systematic exploitation of Web archives
• Large scale data storage
• Large scale data processing
• Data aggregation
• Web trends
• Topic mining
• Terminology evolution
• Community detection and evolution

Important Dates

• Paper submission deadline: February 24, 2012
• Notification of acceptance: March 5, 2012
• Camera ready copy deadline: March 16, 2012
• Workshop: April 17, 2012

Please post your submission (up to 8 pages) using the ACM template:

Note that the workshop proceedings will be published in ACM DL (ISBN 978-1-4503-1188-5)


This workshop is organized with the support of the EU 7th Framework ICT STREP on Longitudinal Analytics of Web Archive data (LAWA) under contract no. 258105.

Workshop Officials


PC-Chairs and Organizers:

Ricardo Baeza-Yates (Yahoo! Research, Spain)
Julien Masanès (Internet Memory Foundation, France and Netherlands)
Marc Spaniol (Max Planck Institute for Informatics, Germany)

Program Committee:

Eytan Adar (University of Michigan, USA)
Omar Alonso (Microsoft Bing, USA)
Srikanta Bedathur (IIIT-Delhi, India)
Andras Benczur (Hungarian Academy of Science)
Klaus Berberich (Max Planck Institute for Informatics, Germany)
Roi Blanco (Yahoo! Research, Spain)
Adam Jatowt (Kyoto University, Japan)
Scott Kirkpatrick (Hebrew University Jerusalem, Israel)
Christian König (Microsoft Research, USA)
Frank McCown (Harding University, USA)
Michael Nelson (Old Dominion University, USA)
Nikos Ntarmos (University of Patras, Greece)
Kjetil Norvag (Norwegian University of Science and Technology, Norway)
Philippe Rigaux (Internet Memory Foundation, France and Netherlands)
Thomas Risse (L3S Research Center, Germany)
Pierre Senellart (Télécom ParisTech, France)
Torsten Suel (NYU Polytechnic, USA)
Masashi Toyoda (Tokyo University, Japan)
Peter Triantafillou (University of Patras, Greece)
Michalis Vazirgiannis (Athens University of Economics and Business & École Polytechnique)
Gerhard Weikum (Max Planck Institute for Informatics, Germany)



TV Show: « La mémoire de toile » (net memory) and Web archiving challenges


Report on Web archiving by France24


The Internet has become one of the most productive media for information and news. There is therefore an absolute need to preserve web content and promote Web archiving at large scale, and this is becoming one of the great challenges of the Web.
The media are already interested in the subject, and France24, the French international news channel, is broadcasting a video report on web harvesting in France (in connection with the French legal deposit), on Web archiving in general and on giving access to these Web archive collections.

This video gives a quick overview of French initiatives and Web archiving technologies, thanks to the participation of the National Library of France, the National Audiovisual Institute of France and the Internet Memory Foundation (interview of Julien Masanès by Natalia Gallois in our offices in Paris).

To view the video and discover the challenges of Web archiving click here (in French).
TV Show: “Web News”, News seen on the Web and about the web.



Happy New Year 2012!


We send you our best wishes for the New Year 2012!

2012 will be a year full of projects and developments, so follow us on Twitter and save our RSS feed!



November 7-8th, Kick-Off of a new R&D project: TrendMiner


We are glad to announce the kick-off of the European research project TrendMiner, on Large-scale, Cross-lingual Trend Mining and Summarization of Real-time Media Streams.

The TrendMiner project (Large-scale, Cross-lingual Trend Mining and Summarization of Real-time Media Streams) starts today in Luxembourg. It is a three-year European project funded by the European Commission through the Seventh Research Framework Program (FP7-ICT) under Project No 287863.

Besides the Internet Memory Foundation, the following partners are involved:
- Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (Germany) as Coordinator,
- The University of Sheffield (United Kingdom),
- Ontotext AD (Bulgaria),
- University of Southampton (UK),
- Eurokleis S.R.L. (Italy),
- Sora Ogris & Hofinger GmbH (Austria),
- and Hardik Fintrade Pvt Ltd. (India).

This project aims at delivering innovative, portable open-source real-time methods for cross-lingual mining and summarization of large-scale stream media.

IMF will contribute to the Platform for Real Time Media collection, Analysis and storage by:
- providing scalable infrastructure to partners, with support for integration and experiments,
- designing and developing an application-aware crawler mechanism for social media.

For more information on TrendMiner, please visit the Project website (under construction).



Interview with France Lasfargues after IFTA 2011


France Lasfargues, project manager for the foundation, manages two research projects on web archiving and a portfolio of Internet Memory partners. She talks about the results of her participation in the conference of the International Federation of Television Archives (IFTA) in Turin in September 2011, where she led a workshop on web archiving and audiovisual archives with two partners: SWR (German Television) and Beeld en Geluid (Netherlands Institute for Sound and Vision).


Was this your first participation in the FIAT?

France Lasfargues: Personally, yes. But it is not the first participation of the Internet Memory Foundation, which is an associate member of IFTA. Last year, Chloe Martin, Business Developer at the Foundation, presented a poster on our web archiving platform (ATN) and on issues related to collecting and giving access to videos that are broadcast on the Web.

Is it easy for Internet Memory to participate in this international conference?

FL: To take part in this conference, we must first answer the call for participation, which is issued at least three months beforehand. We then decide how to present our activities, which participants we would like to involve and the form of the presentation (poster, workshop, plenary session, ...). We submit our proposal and wait for a response from IFTA. So, we decided to focus on issues that bring together the expectations and needs of audiovisual archives and our skills and areas of development. The workshop format seemed the most appropriate, also because it opens a space for dialogue and exchange with the audience.

This brings us to discuss in more detail the reason for Internet Memory's presence at the IFTA.

FL: Our goal is simple: to communicate about the need for web archiving in audiovisual archives and, thereby, to share our expertise in this area. Internet Memory wants to drive projects and motivate institutions to engage in web archiving projects now, in order to stop the loss of relevant, high-value content.

What is the angle chosen by Internet Memory for this workshop?

FL: The workshop was mainly an opportunity to invite audiovisual archives to share their current experience and problems with web archiving, and to bring them up to date on the capture and access solutions that we have developed. We must say that we have strong arguments on the matter. This gave us the opportunity to communicate on all of our projects (LIWA, LK, LAWA, SCAPE and, especially, ARCOMEM), which are large-scale European projects and an excellent reference to show the extent of our technologies and skills. In detail, and because the attendees came from audiovisual archives, we focused on the technical challenges of capturing video on web sites (LIWA) and, equally important, on the social web and the challenges it poses for archivists (ARCOMEM). We also talked about the various tools that we develop (including Application Aware Crawling, API Crawls, etc.) to solve archiving problems and improve archival collections.

How many participants were present at this international conference? Was your workshop appreciated?

FL: The conference brought together over 300 audiovisual archivists.
As for the workshop that we held, the room was full, with more than 120 participants. I admit that we did not expect such a success. Last year, the workshop on web archiving attracted at most 40 people! Moreover, the organizers of the conference highlighted our “score of attendance”. This shows that many more archivists are interested in web archiving and audiovisual archives. Internet Memory services could be developed in the near future, and we are always ready to repeat the experience at the IFTA.



Workshop on web archiving for audiovisual archives at FIAT 2011 in Turin


IMF is glad to participate in the workshop Web Archiving for Audiovisual Archives 'No content without context' with SWR and the Netherlands Institute for Sound and Vision on Friday, 30 September (2.30 pm).

This workshop will present our platform (AtN) and several concrete use cases, which will help raise awareness and interest among audiovisual archivists.
1/ SWR use case: Rock am Ring 2011
2/ Multimedia Web archiving following the recently completed Living Web Archives project (LiWA).

We hope many of you will attend!

Find all details on FIAT website



Temporal Web Analytics Workshop (TWAW 2011) Proceedings


The Proceedings of the 1st International Temporal Web Analytics Workshop (TWAW 2011) are online now.

The Proceedings of the 1st International Temporal Web Analytics Workshop (TWAW 2011), held in conjunction with the 20th International World Wide Web Conference (WWW2011) in Hyderabad, India on March 28, 2011, are online.
The workshop was co-organized by the LAWA project and chaired by R. Baeza-Yates (Yahoo! Research Barcelona), J. Masanès (Internet Memory Foundation) and M. Spaniol (Max-Planck-Institut für Informatik).



ARCOMEM Meeting in Paris on May 9-10-11


The ARCOMEM Consortium will come together to discuss and set the next milestones of the system architecture work packages. This meeting will be hosted by Télécom ParisTech, a member of the ARCOMEM Consortium.

The core topics of this meeting in Paris will be the system architecture of the different modules (e.g. content crawling, social web analysis, archive enrichment, storage module…).
After the meeting, ARCOMEM will publish the results on the project website.



LAWA Presentation at the FIRE Research Workshop


The LAWA project will be presented at the FIRE Research Workshop in Budapest (Hungary) on May 16th 2011.

The FIRE Research Workshop will be held on May 16th 2011 in Budapest (Hungary). This event is part of the Future Internet Week 2011. LAWA is proud to give a presentation in the Future Internet, Living Labs and Web analytics session.

Presentation Abstract
Organizations like the Internet Archive have been capturing Web contents over decades. This time-versioned content is a gold mine for analysts, focusing on longitudinal studies. An application example is tracking and analyzing a politician’s public appearances over a decade. The LAWA project develops methods and tools for time-travel indexing and querying, entity detection and tracking along the time axis, and advanced analyses and knowledge discovery. For scalability, we pursue Hadoop-based distributed computations. We also prepare reference data and will provide analytics services. We will offer a user workshop in late 2011 to disseminate these opportunities and explore interesting use cases.



LivingKnowledge partners attend FET11 in Budapest


The LivingKnowledge project is currently part of the exhibition, showing the project's diversity-aware technologies.
This event is organised by the “ICT forever yours” community.

LK partners are present at FET 2011 in Budapest, Hungary, on May 4th-6th, 2011

Looking for a search engine to find images or web pages about David Beckham arranged in terms of the various clubs he has played for?
A 15-minute demonstration of LivingKnowledge showcases technologies enabling bias-aware, diversity-aware and evolution-aware information access, including diversity-aware search within texts and images, analysis of future predictions, as well as fact-and-opinion extraction.



SCAPE Project on the Web


We are pleased to announce that the official SCAPE website is now online!

What is it about?
The SCAPE project will develop scalable services for planning and execution of institutional preservation strategies on an open source platform that orchestrates semi-automated workflows for large-scale, heterogeneous collections of complex digital objects. SCAPE will enhance the state of the art of digital preservation in three ways: by developing infrastructure and tools for scalable preservation actions; by providing a framework for automated, quality-assured preservation workflows and by integrating these components with a policy-based preservation planning and watch system. These concrete project results will be validated within three large-scale Testbeds from diverse application areas.

SCAPE also has a Twitter account: @SCAPEproject
Tweets with the hashtag #SCAPEproject (or addressed to @SCAPEproject) will be re-tweeted and appear on the website’s Twitter feed.



Museums and the Web


Internet Memory will participate in the Museums & the Web conference in Philadelphia, April 6-9 2011.

In a one-hour workshop, the main aspects of Web archiving will be presented.
If you can’t join, follow the #mw2011 tag on Twitter.



1st LAWA Newsletter


Read the first newsletter of the Longitudinal Analytics of Web Archive Data (LAWA) project!

This newsletter presents an overview and the main research areas of the LAWA project.

At this stage, IM focuses on the specifications of a distributed architecture (p.2)

Enjoy reading!



Brewster Kahle in Paris


The Foundation is organizing a conference with Brewster Kahle on Wednesday, March 16 at La Cantine (Paris)

“Towards Universal Access to Human Knowledge” Brewster Kahle

The Internet Memory Foundation, in partnership with La Cantine, invites Brewster Kahle, co-founder of the Internet Archive, the Open Access Foundation and the Internet Memory Foundation, to talk about open and universal access to data and knowledge, mass digitization and the Internet Archive.

This conference will be held at La Cantine, the first collaborative network workspace, on Wednesday, March 16 from 7.30 pm.
It will be followed by a discussion with the participants, moderated by Julien Masanès, director of the Internet Memory Foundation.

The Internet Memory Foundation (formerly European Archive) is, like the Internet Archive, a nonprofit institution which, since 2005, has actively supported the preservation of the Internet as a new medium.

7:00 p.m. Welcome by Internet Memory Foundation
7:30 p.m. Brewster Kahle: “Towards Universal Access to Human Knowledge”
8:15 p.m. Julien Masanès, moderator, discussions

Thank you for informing us of your participation by sending an email to event / at /

Wednesday, March 16, 2011, from 7.00 pm
La Cantine, 12 Galerie Montmartre (151 rue Montmartre) - 75002 Paris



SCAPE Kick-off Meeting


The SCAPE Kick-off Meeting is currently taking place in Vienna, Austria.

SCAPE (SCAlable Preservation Environments) is a three-year European research project funded by the European Commission under FP7. Learn more



Suggest a website: Arab Spring


We are currently crawling websites about the Arab Spring.

We invite you to suggest websites dealing with this topic.
Please feel free to fill in this form



Job position: Development engineer


We are currently hiring an experienced Erlang developer. See the job post, and do not hesitate to forward this announcement. Thanks!



LivingKnowledge Project


General Meeting in Bangalore

Internet Memory is attending the General Meeting of the LivingKnowledge project in Bangalore from January 31st to February 2nd 2011. Learn more!





The kick-off of the European project ARCOMEM took place at the University of Sheffield on January 24-26th, 2011. Great participants and future developments!

ARCOMEM (Collect-All ARchives to COmmunity MEMories) is a three-year European project funded by the European Commission through the Seventh Research Framework Programme. Learn more!



Happy New Year 2011!


We send you our best wishes for the New Year 2011!



Living Web Archives Survey


Participate in the LiWA survey on Web archiving!

In the context of the Living Web Archives (LiWA) project funded by the European Commission (Project no. 216267), the Foundation is carrying out a survey of Web archiving in European and international institutions (archives, libraries, institutions or departments with a vocation to preserve cultural heritage).



A new website for the Internet Memory Foundation


We are pleased to announce the launch of our new website!
Enjoy it and follow us on Twitter!


