Internet Memory Foundation gives access to a LivingKnowledge subcollection.
In the framework of LivingKnowledge, a three-year European project (No. 231126, 2009-2012) funded by the European Commission through FP7, Internet Memory Foundation built an annotated collection of news and blogs.
This data set is already being used in a call for participation organized in the framework of NTCIR Temporal Information Access (Temporalia). The results will be presented at the NTCIR-12 conference at NII, Tokyo, Japan.
If other researchers need access to this collection, feel free to contact us.
The collection is approximately 20 GB uncompressed and over 5 GB zipped.
It spans May 2011 to March 2013 and contains around 3.8M documents collected from about 1,500 different blogs and news sources.
The data is split into 970 files, each named after the date it covers plus some information about its sources (there may be more than one file per day).
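For readers planning storage or processing, the rounded figures above can be turned into rough per-file and per-document averages. The sketch below is only a back-of-envelope calculation: the variable names are illustrative, it assumes decimal gigabytes, and because every input is approximate the outputs are estimates rather than properties of the actual files.

```python
# Back-of-envelope averages derived from the approximate figures quoted above.
# All inputs are rounded, so the results are rough estimates only.
TOTAL_BYTES = 20 * 10**9     # ~20 GB uncompressed (assumption: decimal gigabytes)
NUM_DOCUMENTS = 3_800_000    # ~3.8M documents
NUM_FILES = 970              # number of files in the collection
NUM_SOURCES = 1_500          # ~1,500 blogs and news sources

docs_per_file = NUM_DOCUMENTS / NUM_FILES
bytes_per_doc = TOTAL_BYTES / NUM_DOCUMENTS
mb_per_file = TOTAL_BYTES / NUM_FILES / 10**6
docs_per_source = NUM_DOCUMENTS / NUM_SOURCES

print(f"documents per file:   {docs_per_file:,.0f}")
print(f"bytes per document:   {bytes_per_doc:,.0f}")
print(f"MB per file:          {mb_per_file:.1f}")
print(f"documents per source: {docs_per_source:,.0f}")
```

Under these assumptions, an average file holds roughly 3,900 documents and about 20 MB of uncompressed data, and an average document is around 5 KB. Actual files will vary, since daily volume and the number of files per day are not uniform.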