<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom">

    <channel>
    
    <title><![CDATA[Internet Memory Foundation]]></title>
    <link>http://internetmemory.org/en</link>
    <description>The Internet Memory Foundation is a European non-profit institution dedicated to web archiving.</description>
    <dc:language>en</dc:language>
    <dc:creator>http://internetmemory.org/en</dc:creator>
    <dc:rights>Copyright 2012</dc:rights>
    <pubDate>Wed, 07 Nov 2012 10:46:23 GMT</pubDate>
    <atom:link href="http://internetmemory.org/en/index.php/RSS" rel="self" type="application/rss+xml" />

    

    <item>
      <title>Women’s equality at work: how are we doing at Internet Memory?</title>
      <link>http://internetmemory.org/en/index.php/Memoranda/women_equality_at_work_how_are_we_doing_at_internet_memory</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/Memoranda/women_equality_at_work_how_are_we_doing_at_internet_memory#id:238#date:00:18</guid>
      <description><![CDATA[<p>It&#8217;s International Women&#8217;s Day today, and we hear a lot of well-intentioned talk, but we all know that women&#8217;s equality has not been achieved and that progress is, overall, very slow. So we thought it would be useful and healthy to check how we are doing at Internet Memory.</p>

<p>To this end, we have compiled figures and ratios on important issues such as wage equality and management positions held by women. We also benchmarked ourselves against the IT sector in France, for which comparative figures are available.</p>

<p>As the benchmark shows, the situation for women in IT is, overall, not very good. This may be one of the reasons why women, although very successful in higher education, seem on the whole to find IT less attractive. At IM, we believe this is not inevitable, as we hope the results of our small study show. We will let you judge!</p>

<p><br />
<img src="http://internetmemory.org/images/uploads/8march-facts.jpg" alt="8 March  Facts" width="630" style="border: 0;" /></p>]]></description>
      <dc:subject><![CDATA[English,]]></dc:subject>
      <pubDate>Sat, 09 Mar 2013 00:18 GMT</pubDate>
    </item>

    <item>
      <title>Reducing energy consumption for large web archives</title>
      <link>http://internetmemory.org/en/index.php/Memoranda/reducing_energy_consumption_for_large_web_archives</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/Memoranda/reducing_energy_consumption_for_large_web_archives#id:227#date:10:58</guid>
      <description><![CDATA[As it hosts hundreds of terabytes of Web data, Internet Memory considers its energy footprint an important challenge to address. In this post, we focus on the green and innovative solutions we decided to implement in the Internet Memory infrastructure. <h1>Hosting infrastructure, a strategic question for a Web archive</h1>

<p>On the one hand, IM is aware that the Web archiving field (like ICT in general) is part of the problem, because of the resources and energy it consumes (as do most datacenters). On the other hand, we believe that the Web deserves a memory, because this medium is pervasive in our society and is certainly one of its most important representations today. As we now store data in the petabyte range, we had to do something about this.</p>

<h1>Green IT for a Web archive? Yes, it is possible.</h1>

<p>Since its inception in 2005, Internet Memory (then called <a href="http://europarchive.org">European Archive</a>) has been working to reduce its energy footprint by using servers built from low-power components (the so-called red boxes, also used by <a href="http://archive.org">The Internet Archive</a>). Although they are housed in a traditional datacenter in Amsterdam, they contribute significantly to its overall energy efficiency. This already puts the first IM datacenter above the industry standard in this regard.</p>

<p>But IM wanted to go one step further, and this required leaving behind traditional datacenters, which are, by design, heavy users of energy and cooling resources. In collaboration with <a href="http://www.no-rack.com/">No Rack</a>, a company specializing in Green IT, we adopted a new generation of servers and infrastructure dedicated to massive storage, with a highly scalable architecture, very low power consumption and… no active cooling.<br />
Today, this new infrastructure is operational in our Paris ‘datacenter’ and can hold up to 1.2 petabytes of data.</p>

<h1>New Internet Memory Datacenter</h1>

<p>This is the result of improvements at several levels, including a new cylindrical ‘rack’ design, which enables free cooling and lower energy consumption at every level (servers, disks and motherboards).</p>

<p>Free cooling is made possible by very low heat dissipation (for 72 nodes, the IM datacenter draws between 5,300 W and 6,300 W, depending on the server-class configuration) and by an innovative design that enables natural heat extraction.</p>

<p>Here is a comparison between a regular datacenter and the IM datacenter:<br />
<img src="http://internetmemory.org/images/uploads/energy_thumb.png" alt="Energy" width="600" height="155"  style="border: 0;" /><br />
These figures show the savings in kW, which translate into a carbon footprint about 8 times smaller (22,000 kg CO2 instead of 180,000 kg CO2).</p>
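<p>As a quick sanity check on the figures above, the Python sketch below derives the average power per node from the 5,300 W to 6,300 W range quoted for 72 nodes, and the ratio between the two carbon footprints. It reuses only numbers stated in this post; the function and constant names are illustrative, and the period over which the CO2 figures are measured is not stated here.</p>

```python
# Figures quoted in the post: 72 nodes drawing 5,300 W to 6,300 W in total,
# and 22,000 kg vs 180,000 kg CO2. Names below are illustrative only.
NODES = 72
POWER_RANGE_W = (5300, 6300)
CO2_REGULAR_KG = 180_000
CO2_IM_KG = 22_000


def watts_per_node(total_watts, nodes=NODES):
    """Average power draw per node, in watts."""
    return total_watts / nodes


def footprint_ratio(regular_kg=CO2_REGULAR_KG, im_kg=CO2_IM_KG):
    """How many times smaller the IM footprint is than the regular one."""
    return regular_kg / im_kg


if __name__ == "__main__":
    for total in POWER_RANGE_W:
        print(f"{total} W over {NODES} nodes: {watts_per_node(total):.1f} W per node")
    print(f"Footprint ratio: {footprint_ratio():.2f}x")
```

<p>This gives roughly 74 W to 88 W per node and a ratio of about 8.2, consistent with the “8 times” figure.</p>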

<h1>Internet Memory Architecture and process</h1>

<p>Internet Memory has implemented an efficient distributed architecture that enables virtualization, better performance and faster processing.<br />
As a result, all archive users share the Internet Memory infrastructure and applications, which maximizes storage utilization and reduces the number of devices required, saving both energy and costs.</p>

<p><strong>If you would like to know more, drop us a line, or come by and we will organize a visit for you!</strong></p>

]]></description>
      <dc:subject><![CDATA[English, French,]]></dc:subject>
      <pubDate>Mon, 12 Nov 2012 10:58 GMT</pubDate>
    </item>

    <item>
      <title>Participation in the WebArchivists Camp in Paris on November 22nd</title>
      <link>http://internetmemory.org/en/index.php/News/participation_to_webarchivists_camp_in_paris_on_november_22th</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/News/participation_to_webarchivists_camp_in_paris_on_november_22th#id:226#date:10:46</guid>
      <description><![CDATA[Internet Memory is excited to be part of this first BarCamp on Web archiving, organized in Paris by the WebArchivists association.<br />
You are welcome to join!<p><img src="http://internetmemory.org/images/uploads/WACamp_thumb.png" alt="WACamp" width="438" height="264"  style="border: 0;" /></p><h3>What are WebArchivists and the WebArchivists Camp?</h3>

<p><a href="http://www.webarchivists.org/" target="new">WebArchivists</a> is a non-profit organization founded in 2009 and based mostly in Paris. Its goals include: <br />
- To increase mainstream media and public interest in web archives; <br />
- To gather news and projects related to this subject; <br />
- To invite people from different backgrounds to meet, think and imagine what the future of web archiving will look like.</p>

<p>At this WebArchivists Camp, you will have the opportunity to meet Julien Masanès, director of <a href="http://internetmemory.org/en">Internet Memory</a>, and the team working on French legal deposit at the National Library of France, as well as hackers, developers, traditional archivists, graphic designers, net artists and anyone else who wants to jump in!<br />
Following the BarCamp format, the evening will be organized around several workshops on problems to solve, ideas, projects&#8230; At the end, the results of the workshops will be discussed.</p>

<p>The WebArchivists team plans to provide a live stream of the event, as well as a translation of the main ideas and speeches. You can also follow them on Twitter at @webarchivists (hashtag #WACamp).</p>

<h3>More info</h3>
<p><a href="http://www.webarchivists.org/2012/10/rejoignez-le-premier-webarchivists-camp-le-22-novembre-a-paris/" target="new">WebArchivists Website</a> <br />
<a href="http://webarchivistscamp.eventbrite.fr/#english" target="new">Event page</a> <br />
La Cantine, 12 Galerie Montmartre<br />
75002 Paris<br />
Thursday 22nd November 2012 <br />
From 6.30 pm to 9.30 pm</p>]]></description>
      <dc:subject><![CDATA[English,]]></dc:subject>
      <pubDate>Wed, 07 Nov 2012 10:46 GMT</pubDate>
    </item>

    <item>
      <title>How to fit in? Integrating a web archiving program in your organization</title>
      <link>http://internetmemory.org/en/index.php/News/how_to_fit_in_integrating_a_web_archiving_program_in_your_organization</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/News/how_to_fit_in_integrating_a_web_archiving_program_in_your_organization#id:225#date:12:59</guid>
      <description><![CDATA[Internet Memory will take part in an IIPC-sponsored workshop held at the Bibliothèque nationale de France, Paris, on Friday, November 30, 2012.<p><img src="http://internetmemory.org/images/uploads/IIPC_new_Logo_thumb.png" alt="IIPC New Logo" width="321" height="103"  style="border: 0;" /><br />
Ten national libraries from around the world will attend the <a href="http://netpreserve.org/events/how-fit-integrating-web-archiving-program-your-organization">workshop</a> and will have the opportunity, on Friday 30th November, to learn more about Internet Memory’s activities:<br />
- Our partnerships with heritage institutions and research centers;<br />
- Web archiving services: production, quality assurance, and the tools we have developed to improve crawling, access and usage;<br />
- Research projects that enable Internet Memory to collaborate on innovative projects with prestigious labs.</p>

<p><strong>List of participants:</strong><br />
Bibliotheca Alexandrina<br />
British Library<br />
Library of Congress<br />
National and University Library of Slovenia<br />
National Library and Archives of Québec<br />
National Library of Estonia<br />
National Library of Germany<br />
National Library of the Netherlands <br />
National Library of Singapore<br />
National Library of Spain</p>

<p><a href="http://netpreserve.org/events/how-fit-integrating-web-archiving-program-your-organization">More information</a></p>]]></description>
      <dc:subject><![CDATA[English, French,]]></dc:subject>
      <pubDate>Wed, 31 Oct 2012 12:59 GMT</pubDate>
    </item>

    <item>
      <title>Workshop on Big-Data Analytics for the Temporal Web, Paris, November 13, 2012</title>
      <link>http://internetmemory.org/en/index.php/News/workshop_on_big_data_analytics_for_the_temporal_web_paris_november_13_2012</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/News/workshop_on_big_data_analytics_for_the_temporal_web_paris_november_13_2012#id:220#date:12:04</guid>
      <description><![CDATA[The <a href="http://www.lawa-project.eu/">LAWA project</a> is organizing an International Workshop on Big-Data Analytics for the Temporal Web, Paris, November 13, 2012.<br />
Keynotes by Yahoo! Research, Barcelona (R. Baeza-Yates) and L3S Research Center, Hanover (W. Nejdl).<p><img src="http://internetmemory.org/images/uploads/LAWA_Logo.png" alt="Lawa Logo" width="160" height="60" style="border: 0;" /></p>

<p>The <a href="http://www.lawa-project.eu/">LAWA project</a> is organizing a one-day workshop for researchers who use (or plan to use) the Web as a corpus for their studies.</p>

<p>The focus is on methods, tools, and platforms for big-data analytics, including requirements for, and experience with, such technologies.<br />
Topics of interest include, but are not limited to:<br />
- Web dynamics, history, and archives;<br />
- Text mining and content classification;<br />
- Temporal/longitudinal studies;<br />
- Scalable methods (e.g., cloud-based MapReduce);<br />
- Large-scale data storage;<br />
- Community detection and evolution.</p>

<p>The workshop will have presentations by participating researchers and big-data users, including the LAWA project team. </p>

<p>Keynotes by: <br />
- Ricardo Baeza-Yates from Yahoo! Research, Barcelona<br />
- Wolfgang Nejdl from L3S Research Center, Hanover</p>

<p>Emphasis will be on experience-sharing and discussing mutual interests in big-data analytics for the temporal Web. </p>

<p>The workshop is free of charge and open to the public, but registration is compulsory; to register, send an email to:<br />
<strong>lawa@mpi-inf.mpg.de</strong></p>

<p>Everyone is welcome!</p>

<p><a href="http://internetmemory.org/en/index.php/projects/lawa">More Information about LAWA project</a></p>]]></description>
      <dc:subject><![CDATA[English, French,]]></dc:subject>
      <pubDate>Wed, 10 Oct 2012 12:04 GMT</pubDate>
    </item>

    <item>
      <title>Web Archiving Workshop at FIAT/IFTA 2012</title>
      <link>http://internetmemory.org/en/index.php/News/web_archiving_workshop_at_fiat_ifta_2012</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/News/web_archiving_workshop_at_fiat_ifta_2012#id:212#date:08:51</guid>
      <description><![CDATA[For several years, Internet Memory has enjoyed attending FIAT/IFTA conferences to meet and exchange with audiovisual and broadcasting archivists, who are increasingly engaged in the Web archiving field.<p><img src="http://internetmemory.org/images/uploads/Fiat_Ifta.png" alt="Fiat2012" width="254" height="288"  style="border: 0;" /><br />
This year, the annual meeting of <a href="http://www.fiatifta.org/index.php/conference-2012/">FIAT/IFTA</a> takes place at the British Library in London from Saturday 29th September to Monday 1st October 2012.</p>

<p>We are pleased to participate in the Web archiving workshop led by INA (Institut National de l’Audiovisuel) with the British Library on Sunday, 30th September 2012.<br />
Some of the questions to be addressed:<br />
•	How do broadcasters decide on web content? <br />
•	How can the public contribute? <br />
•	Archiving user-generated content</p>

<p>See you there!<br />
If you cannot attend this event, you can stay tuned on Twitter via:<br />
#webarchiving, @FIAT_IFTA and @InternetMemory.</p>



]]></description>
      <dc:subject><![CDATA[English,]]></dc:subject>
      <pubDate>Thu, 27 Sep 2012 08:51 GMT</pubDate>
    </item>

    <item>
      <title>Web archiving pilot with local authorities</title>
      <link>http://internetmemory.org/en/index.php/News/web_archiving_pilot_with_local_authorities</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/News/web_archiving_pilot_with_local_authorities#id:194#date:11:41</guid>
      <description><![CDATA[The Internet Memory Foundation, in collaboration with the UK National Archives, took part in a Web archiving pilot with seven local authority archive services in the UK. A great success! <p>For several years, Internet Memory has collaborated with the UK National Archives to collect and preserve Web content from UK government websites. Within this framework, terabytes of online documents are archived every year, generating millions of hits per week on the UK National Web Archives.</p>

<p>To encourage local archive services to create their own Web archives, The National Archives decided a year ago to organize a Web archiving pilot with seven local authority archive services, representing 20 local authorities.<br />
Internet Memory was glad to take part in the training and operational phases of this pilot, including:<br />
- a training session on processes, tools and challenges;<br />
- website selection: each service selected three websites to archive;<br />
- crawls in January 2012, with monitoring and QA involving each service.</p>

<p>At the end of the pilot, participants were satisfied with the experience and the results.<br />
They are now considering options for developing their own Web archives.<br />
To be continued&#8230;</p>

<h2>Results of Web archiving Pilot by local archive services</h2>

<h3>Greater Manchester Archives Group</h3>

<p>- <a href="http://webarchive.nationalarchives.gov.uk/20120126111754/http://mif.co.uk/ ">Manchester International Festival</a> <br />
- <a href="http://webarchive.nationalarchives.gov.uk/20120126115910/http://fc-utd.co.uk/">Football Club United of Manchester</a> <br />
- <a href="http://webarchive.nationalarchives.gov.uk/20120126113204/http://www.gmcdp.com/about.html">Greater Manchester Coalition of Disabled People</a> </p>

<h3>North Yorkshire County Record Office</h3>

<p>- <a href="http://webarchive.nationalarchives.gov.uk/20120126114451/taylorsofharrogate.co.uk">Taylors of Harrogate</a> <br />
- <a href="http://webarchive.nationalarchives.gov.uk/20120126135617/http://www.northallertontownfc.net/home/home.asp">Northallerton Town Football Club</a> <br />
- <a href="http://webarchive.nationalarchives.gov.uk/20120126085641/http://www.northyorks-unison.org.uk/">UNISON North Yorkshire Local Government</a></p>

<h3>Sheffield</h3>

<p>- <a href="http://webarchive.nationalarchives.gov.uk/20120126140628/http://www.nickclegg.org.uk">Nick Clegg, MP for Sheffield Hallam</a> <br />
- <a href="http://webarchive.nationalarchives.gov.uk/20120126145547/http://www.sheffieldpride.org.uk/">Sheffield Pride</a> <br />
- <a href="http://webarchive.nationalarchives.gov.uk/20120125174104/http://www.syha.co.uk">South Yorkshire Housing Association</a></p>

<h3>Staffordshire</h3>

<p>- <a href="http://webarchive.nationalarchives.gov.uk/20120126151447/http://www.lichfield.anglican.org/">Diocese of Lichfield</a> <br />
- <a href="http://webarchive.nationalarchives.gov.uk/20120126160758/http://www.thepotteries.org/index.html">Stoke on Trent, Pottery and Ceramics</a> <br />
- <a href="http://webarchive.nationalarchives.gov.uk/20120126162001/http://www.staffordshirehoard.org.uk/">Staffordshire Hoard </a></p>

<h3>Surrey</h3>

<p>- <a href="http://webarchive.nationalarchives.gov.uk/20120126101656/http://www.hambledonsurrey.co.uk">Hambledon</a> <br />
- <a href="http://webarchive.nationalarchives.gov.uk/20120126135144/http://www.painshill.co.uk">Painshill Park</a> <br />
- <a href="http://webarchive.nationalarchives.gov.uk/20120126092333/http://www.surreywildlifetrust.org/">Surrey Wildlife Trust</a> </p>

<h3>West Yorkshire Archives Service</h3>

<p>- <a href="http://webarchive.nationalarchives.gov.uk/20120126085231/http://www.wakefield.anglican.org">Wakefield Anglican Diocese</a> <br />
- <a href="http://webarchive.nationalarchives.gov.uk/20120125174026/http://www.incredible-edible-todmorden.co.uk/">Incredible Edible Todmorden </a><br />
- <a href="http://webarchive.nationalarchives.gov.uk/20120125180150/http://theculturevulture.co.uk/">The Culture Vulture </a></p>

<h3>Dorset History Centre</h3>

<p>- <a href="http://webarchive.nationalarchives.gov.uk/20120126090607/http://www.bournemouth.co.uk">Bournemouth Holidays and Tourist Information</a> <br />
- <a href="http://webarchive.nationalarchives.gov.uk/20120126091858/http://www.visit-dorset.com/">Visit Dorset </a><br />
- <a href="http://webarchive.nationalarchives.gov.uk/20120126113947/http://www.pooletourism.com/">Poole Tourism </a></p>

<p><strong>The UK National Archives links</strong><br />
- <a href="http://www.nationalarchives.gov.uk/news/734.htm">News story</a><br />
- <a href="http://www.nationalarchives.gov.uk/documents/web-archiving-final.pdf">Press Release</a></p>

]]></description>
      <dc:subject><![CDATA[English,]]></dc:subject>
      <pubDate>Thu, 05 Jul 2012 11:41 GMT</pubDate>
    </item>

    <item>
      <title>LIBER 41st annual conference in Estonia</title>
      <link>http://internetmemory.org/en/index.php/News/liber_41st_annual_conference_in_estonia</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/News/liber_41st_annual_conference_in_estonia#id:172#date:13:37</guid>
      <description><![CDATA[Internet Memory strives to be present at international conferences to promote Web archiving, and is therefore glad to attend the <a href="http://www.utlib.ee/liber2012/index.php">LIBER 41st annual conference</a>, which takes place this year in Tartu, Estonia.<h2>Web archiving, or preserving a precious heritage</h2>

<p><img src="http://internetmemory.org/images/uploads/LIBER.png" alt="" width="162" height="106" style="border: 0;" /></p>

<p>The <a href="http://www.utlib.ee/liber2012/index.php" title="">LIBER conference</a> is a key event where research libraries share and collaborate on their own issues, including collection and preservation. For its part, Internet Memory’s main aim is to share information about its new technical know-how in digital collection and Web archiving.</p>

<p>On <a href="http://www.utlib.ee/liber2012/index.php?id=prog_main">Wednesday, June 27th</a>, Internet Memory, in partnership with the <a href="http://www.nli.ie/" title="">National Library of Ireland</a>, presents a paper describing a use case for setting up Web archiving campaigns.</p>

<p><strong>How to Face the Challenges of Web Archiving? The Experiences of a Small Library on the Edge</strong></p>

<p>Both speakers will discuss the different steps of setting up a web archiving project: project definition, selection, permission, crawl, quality assurance, and access to <a href="http://internetmemory.org/en/index.php/about/collections1" title="">web archive collections</a>.</p>

<p>This case illustrates the mission of Internet Memory: to develop new collaborations and partnerships to expand Web preservation initiatives.</p>

<p>For the National Library of Ireland, our mission was to collect, preserve and provide access to high-value Web content (political content related to several elections in Ireland).</p>]]></description>
      <dc:subject><![CDATA[English,]]></dc:subject>
      <pubDate>Mon, 25 Jun 2012 13:37 GMT</pubDate>
    </item>

    <item>
      <title>Focus on the first SCAPE project year</title>
      <link>http://internetmemory.org/en/index.php/News/focus_on_the_first_scape_project_year</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/News/focus_on_the_first_scape_project_year#id:160#date:12:27</guid>
      <description><![CDATA[The progress of the SCAPE project in its first year, with IMF.<p><img src="http://internetmemory.org/images/uploads/SCAPE_logo_thumb.jpg" alt="" width="598" height="243"  style="border: 0;" /></p>

<p>The Internet Memory Foundation is an active partner of the <a href="http://internetmemory.org/fr/index.php/projects/scape1">SCAPE project</a>. In this major project, our team helps implement the solutions and innovations needed to address SCAPE’s challenges.</p>

<p>The cornerstone of the SCAPE project is now viable. Engineers and researchers at the Internet Memory Foundation have helped design the architecture of the scalable preservation platform. IMF has also provided expertise in the design of the platform’s testbeds, preservation scenarios and data provision. A first iteration of the platform is deployed at IMF as a central instance available to all other project partners.</p>

<p>In just one year, the SCAPE project has already produced six <a href="http://www.scape-project.eu/category/deliverable">deliverables</a> (five of them public), all delivered within the European Commission deadlines. All were accepted by the EC following the first-year review. The SCAPE project website now features <a href="http://www.scape-project.eu/category/report" title="">three new reports</a> and <a href="http://www.scape-project.eu/category/publication" title="">15 scientific publications</a> on recent results of the project, published in journals and at conferences.</p>

<p>This first year has been rich in highlights, including:<br />
 - An experimental cluster for development work on the preservation platform has been deployed.<br />
 - Several applications and components have been released (e.g. a <a href="https://github.com/openplanets/scape/tree/master/pt-mapred">Prototype for command-line execution of Hadoop applications</a>, the <a href="http://catalogue.scape-project.eu/">SCAPE Action Catalogue</a>, the <a href="https://github.com/fasseg/akubra-hdfs" title="">Akubra HDFS adaptor</a>).<br />
 - 22 <a href="http://wiki.opf-labs.org/display/SP/SCAPE+Scenarios+-+Datasets,+Issues+and+Solutions">SCAPE Scenarios</a> have been developed, documenting datasets, issues and solutions from SCAPE content providers.<br />
 - Numerous experimental <a href="http://www.myexperiment.org/search?query=SCAPE&amp;type=all&amp;commit=Search">Taverna Workflows</a> have been developed and tested.<br />
 - 52 <a href="http://scape.keep.pt/" title="">action services</a> are online and available for these Taverna test workflows. </p>

<p>To read more about the SCAPE project, please <a href="http://scape-project.us4.list-manage1.com/subscribe?u=20cef0f757e3840df2769745b&amp;id=a9d1929cac" title="">sign up</a> to follow the news, or read the project <a href="http://www.scape-project.eu/news/scape-newsletter-1" title="">Newsletter</a>, which is already online.</p>]]></description>
      <dc:subject><![CDATA[English,]]></dc:subject>
      <pubDate>Fri, 01 Jun 2012 12:27 GMT</pubDate>
    </item>

    <item>
      <title>Workshop at the IIPC 2012 General Assembly: Leveraging Web Archives Research</title>
      <link>http://internetmemory.org/en/index.php/News/workshop_at_the_iipc_2012_general_assembly_leveraging_web_archives_research</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/News/workshop_at_the_iipc_2012_general_assembly_leveraging_web_archives_research#id:153#date:09:54</guid>
      <description><![CDATA[Internet Memory has developed a new infrastructure with the ambition of reaching “Web scale” in terms of Web document acquisition and computable data storage.<p>Internet research requires the ability to store and analyse large portions of the Web, a foundational building block for most content-centric studies.</p>

<p>This calls for combining Web archives with a distributed infrastructure that supports extended analytical tools. With such an infrastructure, large-scale measurements, topological information and Internet-scale trends can be brought under the scrutiny of researchers and information professionals.</p>

<p>Internet Memory has developed a new infrastructure with the ambition of reaching “Web scale” in terms of Web document acquisition (billions of resources crawled per week) and computable data storage (petabytes of data). This platform, partly supported by several EU projects including LAWA (<a href="http://www.lawa-project.eu/">Longitudinal Analytics of Web Archive data</a>), comprises:</p>

<p>-	<strong>A new crawler</strong>, implemented entirely in Erlang, which can retrieve billions of pages in a matter of days. Thanks to its innovative frontier and seen-URL data structures, it sustains its throughput for weeks while enabling Web-scale exploration.<br />
-	<strong>A new Web archive repository</strong> for content and metadata, based on HBase. It is an ideal storage layer for Web archives: it is functionally isomorphic to WARC, but abstracts away much of the underlying data management (replication, index creation, etc.) while exposing analytics-friendly APIs.<br />
-	<strong>Filters and extractors</strong> that distil relevant information and build processing chains in a distributed execution environment.</p>
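<p>The post does not describe how the Erlang crawler’s frontier and seen-URL structures are built. As a generic illustration of the idea only, the Python sketch below keeps a FIFO frontier of URLs waiting to be fetched and a seen set guaranteeing that each URL is enqueued at most once. The in-memory link graph and the names LINKS and crawl_order are hypothetical stand-ins for real fetching and link extraction. At Web scale a plain in-memory set would not fit, which is precisely why a dedicated seen-URL structure matters.</p>

```python
from collections import deque

# Hypothetical link graph standing in for fetched pages and extracted links.
LINKS = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://a.example/", "http://d.example/"],
    "http://c.example/": ["http://d.example/"],
    "http://d.example/": [],
}


def crawl_order(seeds, links, max_pages=None):
    """Breadth-first crawl: return URLs in the order they would be fetched.

    frontier: FIFO queue of URLs waiting to be fetched.
    seen: every URL ever enqueued, so no URL is fetched twice.
    """
    frontier = deque()
    seen = set()
    for url in seeds:
        if url not in seen:
            seen.add(url)
            frontier.append(url)

    fetched = []
    while frontier:
        if max_pages is not None and len(fetched) == max_pages:
            break
        url = frontier.popleft()
        fetched.append(url)  # a real crawler would download the page here
        for out in links.get(url, []):
            if out not in seen:  # seen-URL test that prevents re-crawling
                seen.add(out)
                frontier.append(out)
    return fetched


if __name__ == "__main__":
    print(crawl_order(["http://a.example/"], LINKS))
```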

<p>This presentation will offer an overview of this platform and discuss the next steps of its development.<br />
<a href="http://netpreserve.org/events/2012ga.php">International Internet Preservation Consortium (IIPC) 2012 General Assembly</a><br />
Library of Congress, Washington DC<br />
Tuesday May 1, 2012, 2:30 pm – 3:45 pm (Members only)<br />
Presented by Leïla Medjkoune</p>



]]></description>
      <dc:subject><![CDATA[English,]]></dc:subject>
      <pubDate>Fri, 27 Apr 2012 09:54 GMT</pubDate>
    </item>

    <item>
      <title>HBaseCon 2012: Mignify, a Big Data Refinery Built on HBase</title>
      <link>http://internetmemory.org/en/index.php/News/hbase_con2012_mignify_a_big_data_refinery_built_on_hbase</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/News/hbase_con2012_mignify_a_big_data_refinery_built_on_hbase#id:149#date:08:46</guid>
      <description><![CDATA[As part of the <a href="http://www.lawa-project.eu/">LAWA project</a>, IMF will present its progress on the design and development of a Big Data platform at <a href="http://www.hbasecon.com/">HBasecon 2012</a>, on May 22, 2012, in San Francisco.<h1>Mignify: A Big Data Refinery Built on HBase</h1>

<p><a href="http://www.hbasecon.com/sessions/mignify-a-big-data-refinery-built-on-hbase/">HBasecon 2012</a><br />
Tuesday, May 22, 2012, 2:20pm – 3:00pm, InterContinental San Francisco Hotel<br />
Presented by Stanislav Barton</p>

<p>This platform is partly supported by several EU projects, including LAWA (<a href="http://www.lawa-project.eu/">Longitudinal Analytics of Web Archive data</a>).</p>

<p><a href="http://mignify.com">Mignify</a> is a platform for collecting, storing and analyzing Big Data harvested from the Web. It aims to provide easy access to focused, structured information extracted from Web data flows. It consists of a distributed crawler, resource-oriented storage based on HDFS and HBase, and an extraction framework that produces filtered, enriched and aggregated data from large document collections, including their temporal dimension. The whole system is deployed on an innovative hardware architecture made up of a large number of small (low-consumption) nodes. This talk will address the decisions made during the design and development of the platform, from both a technical and a functional perspective. It will introduce the cloud infrastructure, the LTE-like ingestion of the crawler output into HBase/HDFS, and the mechanism that triggers analytics based on a declarative filter/extraction specification. The design choices will be illustrated with a pilot application targeting daily Web monitoring of a national domain.</p>
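<p>The abstract mentions analytics triggered by a declarative filter/extraction specification but does not give its syntax. The Python sketch below is a hypothetical illustration of the general pattern, not Mignify’s actual format: the specification is plain data (filter conditions on record fields plus the fields to extract), and a small engine applies it to each ingested record, emitting a projected result only when the filter matches. The names SPEC, matches and run_spec, and the record fields, are assumptions made for the example.</p>

```python
# Hypothetical declarative spec: plain data, not code. Field names are illustrative.
SPEC = {
    "filter": {"mime": "text/html", "status": 200},  # all conditions must match
    "extract": ["url", "crawl_date", "title"],       # fields to emit
}


def matches(record, conditions):
    """True if every field/value pair in the filter equals the record's value."""
    return all(record.get(field) == value for field, value in conditions.items())


def run_spec(spec, records):
    """Apply the spec to a stream of records, yielding projected results."""
    for record in records:
        if matches(record, spec["filter"]):
            yield {field: record.get(field) for field in spec["extract"]}


if __name__ == "__main__":
    crawled = [
        {"url": "http://a.example/", "mime": "text/html", "status": 200,
         "crawl_date": "2012-04-20", "title": "A"},
        {"url": "http://a.example/logo.png", "mime": "image/png", "status": 200,
         "crawl_date": "2012-04-20", "title": None},
        {"url": "http://b.example/", "mime": "text/html", "status": 404,
         "crawl_date": "2012-04-21", "title": "Not found"},
    ]
    for row in run_spec(SPEC, crawled):
        print(row)  # only the first record passes the filter
```

<p>Keeping the specification as data rather than code is what makes it declarative: new monitoring tasks can be defined without changing the engine that evaluates them.</p>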

<p><a href="http://www.hbasecon.com/">HBasecon 2012</a> is the first industry conference for Apache HBase users, contributors, administrators and application developers, and we are glad to be presenting there.</p>

]]></description>
      <dc:subject><![CDATA[English,]]></dc:subject>
      <pubDate>Fri, 27 Apr 2012 08:46 GMT</pubDate>
    </item>

    <item>
      <title>Web Archiving at the College de France</title>
      <link>http://internetmemory.org/en/index.php/News/web_archiving_at_the_college_de_france</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/News/web_archiving_at_the_college_de_france#id:139#date:17:06</guid>
      <description><![CDATA[On March 28th, at 11.00 am, Julien Masanès will give a <a href="http://www.college-de-france.fr/site/serge-abiteboul/ouverture-des-donnees-publiques-archivage-du-web-.htm">Web archiving seminar</a>.<h3>At the College de France, Chair of Information Technology and Digital Sciences</h3>

<p>Information technology has revolutionized our lives. Computers are traditionally seen as computing machines, although their main purpose is now to manage data. This course will cover essential aspects of data management, including its close relationship with mathematical logic and complexity theory. The Web can be seen as a huge distributed database: its most exciting aspects will also be studied, such as its scale or the challenges of distributed computing and the Semantic Web.</p>

<h3>Wednesday, March 28th, from 10.00 am to 12.00 noon: Semantic Web, Open Data and Web Archiving</h3>
<p><a href="http://www.college-de-france.fr/site/en-serge-abiteboul/index.htm">Serge Abiteboul</a> opens the conference with a lecture about the Semantic Web and invites François Bancilhon, Director of DataPublica, and Julien Masanès, Director of the Internet Memory Foundation, to talk about Open Data and Web archiving.</p>

<h3>Feel free to join!</h3>
<p>Address:<br />
Amphithéâtre Maurice Halbwachs <br />
Collège de France<br />
11, place Marcelin Berthelot<br />
75231 Paris Cedex 05<br />
France</p>]]></description>
      <dc:subject><![CDATA[English, French,]]></dc:subject>
      <pubDate>Tue, 27 Mar 2012 17:06 GMT</pubDate>
    </item>

    <item>
      <title>On the Power of HBase Filters</title>
      <link>http://internetmemory.org/en/index.php/Synapse/on_the_power_of_hbase_filters</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/Synapse/on_the_power_of_hbase_filters#id:134#date:06:57</guid>
      <description><![CDATA[Filters are a powerful HBase feature that delegates the selection of rows to the servers rather than moving rows to the Client. We present the filtering mechanism as an illustration of the general data locality principle and compare it to the traditional select-and-project data access pattern.<p>Dealing with massive amounts of data changes the way you think about data processing tasks. In a standard business application context, people use a Relational Database Management System (RDBMS) and consider it as a service in charge of providing data to the client application. How this data is processed, manipulated, and shown to the user is considered the full responsibility of the application. In other words, the role of the data server is restricted to what it does best: efficient, safe and consistent storage and access.</p>

<p>In a distributed setting, the naive approach of fetching all the required data to the Client and processing it locally should be limited to trivial tasks operating on a tiny subset. There are two fundamental reasons for this. First, it generates a lot of network traffic, needlessly consuming resources and sometimes leading to unacceptable response times. Second, centralizing all the information and then processing it simply forgoes all the advantages of a powerful cluster of hundreds or even thousands of machines. The lesson is simply:</p>

<center><em>When you deal with BigData, the data center is your computer.</em></center><p> </p>

<p>It is fair to acknowledge that server-side computation is not limited to Hadoop-like frameworks; it is also possible with relational systems, in the form of procedural SQL dialects (e.g., PL/SQL or Transact-SQL) that are executed in the server. Still, moving the computation to the data server, instead of moving the data to the client, becomes essential in a BigData context. This principle is most often termed <em>data locality</em>. </p>

<h2>Server-side filtering</h2>

<p>Let us examine one important HBase feature that helps put this principle into action: <em>filters</em>. For concreteness, consider the canonical example of a program that scans a collection of Web documents and applies some analysis method to RSS feeds (typical of the daily tasks performed at Internet Memory). The algorithm is trivial: we need to access each document, check whether it is indeed an RSS document, and run the analytics. We can implement this algorithm as a program running on a Client node, using a scanner on some HBase table and a filter on the documents&#8217; type applied on the client side. The program retrieves the documents from the distributed system (input data flows) and performs the computation locally.</p>

<p><img src="http://internetmemory.org/images/uploads/nondistr-computing.png" alt="Non-distributed mode" width="312" height="182" style="border: 0;" /></p>

<p>The disadvantage is obvious. For very large data sets, the computing and storage resources of the Client machine are likely to be quickly overwhelmed, creating a bottleneck. Most of the time will be spent transferring documents that do not contribute to the result. HBase filters enable a different scenario, illustrated in the Figure below, where the selection of RSS documents occurs at each server. In our example, this is likely to drastically reduce communication by applying local data processing as much as possible. In such a setting, the Client plays the role of a coordinator that sends pieces of code to each server, initiating and possibly coordinating a fully decentralized computation.</p>

<p><img src="http://internetmemory.org/images/uploads/filters.png" alt="Filters in HBase" width="312" height="182" style="border: 0;" /></p>
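<p>The two scenarios can be made concrete with a small, self-contained Java model. It is a sketch under stated assumptions rather than HBase code: the <code>Doc</code> class, the per-region lists and the document sizes are all hypothetical, and the predicate simply stands for what a filter on the document type would do. The only point is to count how many bytes travel to the Client in each case.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Stdlib-only model of the two figures above. All names are hypothetical;
// this is not the HBase client API. A "region" is the list of documents
// held by one server, and we count the bytes sent to the Client.
class LocalitySketch {

    static final class Doc {
        final String mimeType;
        final int sizeBytes;

        Doc(String mimeType, int sizeBytes) {
            this.mimeType = mimeType;
            this.sizeBytes = sizeBytes;
        }
    }

    // Runs on the server: only documents accepted by the predicate leave the region.
    static List<Doc> scanRegion(List<Doc> region, Predicate<Doc> serverSideFilter) {
        List<Doc> shipped = new ArrayList<>();
        for (Doc d : region) {
            if (serverSideFilter.test(d)) {
                shipped.add(d);
            }
        }
        return shipped;
    }

    // Total bytes that cross the network when every region is scanned.
    static long bytesShipped(List<List<Doc>> regions, Predicate<Doc> serverSideFilter) {
        long total = 0;
        for (List<Doc> region : regions) {
            for (Doc d : scanRegion(region, serverSideFilter)) {
                total += d.sizeBytes;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<Doc>> regions = List.of(
                List.of(new Doc("text/html", 40_000), new Doc("application/rss+xml", 8_000)),
                List.of(new Doc("image/jpeg", 120_000), new Doc("application/rss+xml", 6_000)));

        // Client-side filtering: servers ship everything, the Client discards non-RSS documents.
        long clientSide = bytesShipped(regions, d -> true);
        // Server-side filtering: the predicate runs where the data lives.
        long serverSide = bytesShipped(regions, d -> d.mimeType.equals("application/rss+xml"));

        System.out.println(clientSide + " bytes vs " + serverSide + " bytes");
    }
}
```

<p>With the made-up sizes in <code>main()</code>, the program prints <code>174000 bytes vs 14000 bytes</code>; the ratio, not the absolute numbers, is what matters. In real HBase, the server-side predicate would be a filter attached to the scan, for instance a <code>SingleColumnValueFilter</code> on whichever column holds the document type in your schema.</p>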

<h2>Filters in HBase</h2>

<p>Filters in HBase are objects implementing the &#8220;Filter&#8221; interface, equipped with a Boolean &#8220;filterRow()&#8221; method which tells whether a row passes the filter or not. The semantics is that rows are filtered <em>out</em> by a filter, which means that you specify the rows that you want to ignore (the opposite of what you express in an SQL &#8220;WHERE&#8221; clause, if you are familiar with it). Scanners can incorporate filters with the &#8220;setFilter()&#8221; method.</p><center>
<p><code>scan.setFilter(myFilter);</code></p>
</center>
<p>Among other consequences, this means that you can use filters in MapReduce jobs that take their input from an HBase scanner. We will not cover the full Filter functionality in this post (see the HBase Wiki or the Javadoc instead), but will briefly touch on a few of its main features.</p><p>HBase comes with a list of predefined filters, including:</p><ul>
  <li>&#8220;RowFilter&#8221;: filters rows based on their row key values;</li>
  <li>&#8220;FamilyFilter&#8221;: filters out column families based on their <em>names</em>;</li>
  <li>&#8220;QualifierFilter&#8221;: filters out columns (qualifiers) based on their <em>names</em>;</li>
  <li>&#8220;ValueFilter&#8221;: filters out cells based on their <em>values</em>;</li>
  <li>&#8220;TimestampsFilter&#8221;: keeps only the cell versions whose timestamps belong to a given list.</li>
</ul>

<p>As these examples suggest, filters are much more powerful than a simple all-or-nothing semantics applied to HBase rows. You can choose to filter out a row as a whole based on some qualifier value, but you can also filter out part of a row, namely a family and/or a column (qualifier) in a family, etc. In other words, you can apply filters to the schema (family and column names) as well as to the values, a recurring feature of semi-structured data models.</p>

<p>Compare this with the well-known SQL world. When you express a SELECT-FROM-WHERE query, you restrict the number of rows (with the &#8220;WHERE&#8221; clause) and the number of columns for each row (with the &#8220;SELECT&#8221; clause). Filters in HBase let you do both: fully ignore some rows, and, for the rows that pass, restrict the families, columns, or timestamps. This relates to the underlying motivation: limiting as much as possible the network bandwidth used to communicate with the client application.</p>
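<p>The following stdlib-only Java sketch mimics this double restriction. Rows are maps keyed by hypothetical &#8220;family:qualifier&#8221; strings; one predicate drops whole rows (the WHERE part) and another trims the columns of the surviving rows (the SELECT part). In HBase, the same roles could be played by filters such as &#8220;SingleColumnValueFilter&#8221; and &#8220;QualifierFilter&#8221; or &#8220;FamilyFilter&#8221;, but none of the names below belong to the HBase API.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Stdlib-only model, hypothetical names (not the HBase API): a filter can
// drop whole rows, like a WHERE clause, and trim the columns of the rows it
// keeps, like a SELECT list. Column keys follow the "family:qualifier" form.
class SelectProjectSketch {

    static List<Map<String, String>> scan(List<Map<String, String>> rows,
                                          Predicate<Map<String, String>> keepRow,
                                          Predicate<String> keepColumn) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> row : rows) {
            if (!keepRow.test(row)) {
                continue; // the whole row is filtered out (WHERE)
            }
            Map<String, String> projected = new LinkedHashMap<>();
            for (Map.Entry<String, String> cell : row.entrySet()) {
                if (keepColumn.test(cell.getKey())) {
                    projected.put(cell.getKey(), cell.getValue()); // column kept (SELECT)
                }
            }
            result.add(projected);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> feed = new LinkedHashMap<>();
        feed.put("meta:type", "rss");
        feed.put("meta:url", "http://example.org/feed");
        feed.put("content:body", "(feed body)");

        Map<String, String> page = new LinkedHashMap<>();
        page.put("meta:type", "html");
        page.put("content:body", "(page body)");

        // Roughly: SELECT meta:* FROM docs WHERE meta:type = 'rss'
        List<Map<String, String>> out = scan(List.of(feed, page),
                row -> "rss".equals(row.get("meta:type")),
                column -> column.startsWith("meta:"));
        System.out.println(out);
    }
}
```

<p>In this example only the RSS row survives, and only its two &#8220;meta:&#8221; columns are returned; the bulky &#8220;content:&#8221; column never leaves the (simulated) server.</p>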

<p>There are many other filters, some of which implement utilities like pagination. Again, refer to the documentation.</p>

<h2>Combining filters: &#8220;FilterList&#8221;</h2>

<p>Even though HBase comes with many filter types, their expressive power would remain limited without the ability to combine them with Boolean connectors. This is the purpose of the &#8220;FilterList&#8221; class. A &#8220;FilterList&#8221; is defined by a connector (&#8220;MUST_PASS_ALL&#8221; for and, &#8220;MUST_PASS_ONE&#8221; for or) and a list of component filters (which can be of any type). The main constructor is:</p><center>
<p><code>FilterList(Operator operator, List&lt;Filter&gt; filters)</code></p>
</center>
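<p>Because a filter list is itself a filter, composition follows the classic Composite pattern. The Java sketch below models it with the standard library only; &#8220;ToyFilter&#8221;, &#8220;ColumnEquals&#8221; and &#8220;ToyFilterList&#8221; are hypothetical names, not HBase classes. The operator names mirror the real &#8220;FilterList.Operator&#8221; values, and, as in HBase, &#8220;filterRow()&#8221; returns true to exclude a row.</p>

```java
import java.util.List;
import java.util.Map;

// Stdlib-only model of FilterList (hypothetical names, not the HBase API).
// filterRow() returning true means "exclude this row".
interface ToyFilter {
    boolean filterRow(Map<String, String> row);
}

// Excludes rows whose column does not hold the expected value.
final class ColumnEquals implements ToyFilter {
    private final String column;
    private final String expected;

    ColumnEquals(String column, String expected) {
        this.column = column;
        this.expected = expected;
    }

    public boolean filterRow(Map<String, String> row) {
        return !expected.equals(row.get(column));
    }
}

// A list of filters combined with AND or OR; itself a filter, so lists nest.
final class ToyFilterList implements ToyFilter {
    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    private final Operator op;
    private final List<ToyFilter> filters;

    ToyFilterList(Operator op, List<ToyFilter> filters) {
        this.op = op;
        this.filters = filters;
    }

    public boolean filterRow(Map<String, String> row) {
        if (op == Operator.MUST_PASS_ALL) {
            // AND: excluded as soon as one component excludes the row.
            for (ToyFilter f : filters) {
                if (f.filterRow(row)) return true;
            }
            return false;
        }
        // OR: kept as soon as one component keeps the row.
        for (ToyFilter f : filters) {
            if (!f.filterRow(row)) return false;
        }
        return true;
    }
}

class FilterListDemo {
    public static void main(String[] args) {
        // (type = rss OR type = atom) AND lang = fr
        ToyFilter rssOrAtom = new ToyFilterList(ToyFilterList.Operator.MUST_PASS_ONE,
                List.of(new ColumnEquals("meta:type", "rss"), new ColumnEquals("meta:type", "atom")));
        ToyFilter frenchFeeds = new ToyFilterList(ToyFilterList.Operator.MUST_PASS_ALL,
                List.of(rssOrAtom, new ColumnEquals("meta:lang", "fr")));

        Map<String, String> row = Map.of("meta:type", "atom", "meta:lang", "fr");
        System.out.println(frenchFeeds.filterRow(row) ? "filtered out" : "kept");
    }
}
```

<p>The demo keeps RSS or Atom feeds written in French: the inner list implements the or, the outer list the and, and nesting them needs no extra code.</p>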

<p>Since a filter list is itself a &#8220;Filter&#8221; instance, we can build hierarchies of filters representing nested Boolean combinations, and thus obtain arbitrarily complex filters.</p>
<h2>Building custom filters</h2>
<p>Finally, it is worth mentioning that you can write your own filters in case the existing ones are not sufficient. This amounts to writing a Java class that extends the &#8220;FilterBase&#8221; abstract class and overrides a few methods which operate, in the Region servers, on the local rows. The downside of custom filters is that they must be deployed across the cluster before they can be executed, which makes the setup of the whole system a bit more complicated.</p>
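<p>The structure of a custom filter can be sketched with the standard library alone. In this hypothetical model, an abstract base class provides &#8220;keep everything&#8221; defaults and a subclass overrides only the hook it needs, which is the general idea behind extending &#8220;FilterBase&#8221;. The class names, the String-based signatures and the reversed-domain row keys (e.g. &#8220;fr.inria.www/&#8221;) are all assumptions; the real HBase hooks, such as &#8220;filterRowKey()&#8221;, have signatures that differ across HBase versions.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stdlib-only model of a custom filter, hypothetical names (not the HBase
// API). The base class keeps everything by default; true means filter out.
abstract class ToyFilterBase {
    // Checked first, on the row key alone; true skips the row.
    boolean filterRowKey(String rowKey) {
        return false;
    }

    // Checked on the whole row (column -> value); true drops the row.
    boolean filterRow(Map<String, String> cells) {
        return false;
    }
}

// Hypothetical custom filter: keeps rows whose key starts with a prefix,
// assuming row keys are reversed host names such as "fr.inria.www/".
final class DomainPrefixFilter extends ToyFilterBase {
    private final String prefix;

    DomainPrefixFilter(String prefix) {
        this.prefix = prefix;
    }

    @Override
    boolean filterRowKey(String rowKey) {
        return !rowKey.startsWith(prefix);
    }
}

// What a region server would do with the filter on its local rows.
final class ToyRegionScan {
    static List<String> scan(Map<String, Map<String, String>> localRows, ToyFilterBase filter) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : localRows.entrySet()) {
            if (filter.filterRowKey(e.getKey())) {
                continue; // rejected on the key: cells are never examined
            }
            if (filter.filterRow(e.getValue())) {
                continue; // rejected after looking at the cells
            }
            kept.add(e.getKey());
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> rows = new TreeMap<>();
        rows.put("fr.inria.www/", Map.of("meta:type", "html"));
        rows.put("org.example.www/feed", Map.of("meta:type", "rss"));
        System.out.println(scan(rows, new DomainPrefixFilter("fr.")));
    }
}
```

<p>In this sketch, checking the key first lets a region skip a row without looking at its cells, which is why the subclass only needs to override that one hook.</p>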

<h2>Summary: let HBase do the data selection job for you!</h2>

<p>The bottom line of what precedes is: never burden your application with row filtering! HBase can do it for you much more effectively, at scale. This also encourages us to think of our computing machinery no longer as our laptop, but as a whole set of servers interconnected by a network. Consider the resources and limitations of such a system, particularly regarding network bandwidth, and adopt measures that avoid overwhelming these resources. HBase filters (and other features to be covered next) are built just for that.</p>]]></description>
      <dc:subject><![CDATA[English, Big Data, Hadoop, Hbase,]]></dc:subject>
      <pubDate>Fri, 02 Mar 2012 06:57 GMT</pubDate>
    </item>

    <item>
      <title>Preserving Research Projects&#8217; websites</title>
      <link>http://internetmemory.org/en/index.php/Memoranda/preserving_research_projects_websites</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/Memoranda/preserving_research_projects_websites#id:131#date:13:04</guid>
      <description><![CDATA[Good research project management often requires creating and maintaining a project website to make new developments and results available. But what happens to such a website when the project and its funding end?<h3>Inside Installations use case</h3>

<p><img src="http://internetmemory.org/images/uploads/InsideInstallation_thumb.png" alt="InsideInstallation" width="600" height="339"  style="border: 0;" /></p>

<p>A few months ago, the <a href="http://www.cultureelerfgoed.nl/"><strong>Cultural Heritage Agency of the Netherlands</strong></a> (RCE) contacted us to explain its situation:</p>

<p><strong>Inside Installations Project</strong>, Preservation and Presentation of Installation Art, was a research project (2004-2007) into the management and conservation of installations and was supported by the European Commission’s Culture 2000 programme. <br />
Rapid obsolescence of media technologies, interactivity and, for instance, the site-specific character of many installations challenge prevailing views about long-term conservation, documentation and presentation. Thirty complex installations (many of them multimedia) were re-installed, investigated and documented. <br />
By sharing their experience, the partners worked together to develop guidelines for the conservation, re-installation and documentation of installation art. </p>

<p>The Cultural Heritage Agency of the Netherlands was the coordinator of the project, which was co-organised by: <br />
- <a href="http://www.tate.org.uk/">Tate</a>, London; <br />
- <a href="http://www.duesseldorf.de/restaurierungszentrum/index.shtml">Restaurierungszentrum</a>, Düsseldorf; <br />
- <a href="http://www.smak.be/">Stedelijk Museum for Modern Art (S.M.A.K.)</a>, Ghent; <br />
- <a href="http://www.museoreinasofia.es/portada/portada.php">Museo Nacional Centro de Arte Reina Sofia</a>, Madrid <br />
- and the <a href="http://www.sbmk.nl/">Foundation for the Conservation of Modern Art (SBMK)</a> in The Netherlands.</p>

<p>In this framework, they developed a <a href="http://www.inside-installations.org/">content-rich website</a> (online version; <a href="http://collections.europarchive.org/rce/20120208162002/http://www.inside-installations.org/"><em>archived version</em></a>).</p>

<p>More than four years after the end of the project, maintaining this website meant a recurring annual expense for the coordinator, which had no specific funding for it. <br />
What alternatives did it have? <br />
- To continue funding the website itself, or to ask other institutions for contributions,<br />
- To close the website, remove all content and make it unavailable,<br />
- Or to archive it and ensure open access to its Web archive.</p>

<h3>Internet Memory proposes solutions</h3>

<p>The consortium decided to follow the initiative of the Cultural Heritage Agency of the Netherlands (RCE): to pay, once and for all, for the archiving of the project website “www.inside-installations.org” and thus <strong>to preserve the results of the European project</strong> Inside Installations. <br />
The Web archiving and preservation process was delegated to the Internet Memory Foundation. </p>

<p>See <a href="http://collections.europarchive.org/rce/20120208162002/http://www.inside-installations.org/">archived version</a> captured in February 2012.</p>

<h3>Results of such Web archiving initiatives</h3>

<p><strong>* Websites are preserved and can therefore remain part of the cultural heritage for decades.<br />
* They are publicly available <a href="http://internetmemory.org/en/index.php/about/collections1">online</a>.<br />
* This solution is less expensive than maintaining websites that are no longer updated.</strong></p>

<h6><em><strong>Web archiving as an efficient solution to offer a second life to your project websites!</strong></em></h6>

<p>Internet Memory offers solutions to archive and preserve high-quality websites, such as research project websites, thanks to its automated Web archiving platform, <a href="http://archivethe.net"><strong>ArchivetheNet</strong></a>.</p>

]]></description>
      <dc:subject><![CDATA[English, French,]]></dc:subject>
      <pubDate>Mon, 20 Feb 2012 13:04 GMT</pubDate>
    </item>

    <item>
      <title>Using Hadoop for Video Streaming</title>
      <link>http://internetmemory.org/en/index.php/Synapse/using_hadoop_for_video_streaming</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/Synapse/using_hadoop_for_video_streaming#id:130#date:08:52</guid>
      <description><![CDATA[Internet Memory provides a service to browse archived Web pages, including multimedia content. We use Hadoop, HDFS and HBase to store and index our data, and associate this storage with a Web server that lets users navigate through the archive and retrieve documents. In the present post, we focus on <i>videos</i> and detail the solution adopted to serve true streaming from HDFS storage. <h2>Basics</h2>
<p>
Many video formats are found on the Web, including Windows Media (.wmv), RealMedia (.rm), QuickTime (.mov), MPEG, Adobe Flash (.flv), etc. In order to display a video, we need a <em>player</em>, which can be embedded in the Web browser. The player depends on the specific video format, but most browsers are able to detect the format and choose the appropriate player. Firefox, for instance, supports many plugins, which can quickly be added when a video in a specific format has to be displayed.
</p>

<p>
There are basically two ways to play a video. The simplest one is a two-step process:
first the whole file is downloaded from the Web server to the user&#8217;s computer, and then the player plays the local copy. Its disadvantage is that the download step may take a while if the file is big (hundreds of megabytes are not uncommon).
The second one uses (true) <em>streaming</em>: the video file is split into fragments which are sent from the Web server to the player, giving the illusion of a continuous stream. From the user&#8217;s point of view, it looks as if a window is swept over the video content, which avoids a full initial download of the file.
</p>
<p>
Obviously, streaming is a more involved method because it requires a strong coordination between the components involved in the process, namely the player, the Web server, and the file system from which the video is retrieved. We examine this technical issue in the context of a Hadoop system where files are stored in HDFS, a file system dedicated to large distributed storage. 
</p>
<p>
<img src="http://internetmemory.org/images/uploads/components.png" alt="" width="405" height="93" style="border: 0;" />
</p>

<h2>File seeking with HDFS</h2>

<p>
As explained above, streaming requires a strong coordination between the Web server and the file system. The former
produces requests to access chunks of the video file (think of what happens when the user suddenly jumps to a specific part of the video), whereas the latter must be able to seek in the file to position the cursor at a specific location. When using HDFS, enabling such a close cooperation turns out to be a problem, because HDFS can in principle only be accessed through a Hadoop client, which the standard Apache server is not. We investigated two possible solutions: Hoop, the Hadoop web server, and Apache/FUSE.
</p>
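<p>On the Web server side, one common way for a player to ask for a chunk is an HTTP &#8220;Range&#8221; header. The hypothetical helper below is not taken from Apache or from the modules discussed later; it only sketches, in plain Java, how a single range such as &#8220;bytes=1000-1999&#8221; becomes the byte offsets that the file system must then seek to. Multi-range requests are deliberately rejected.</p>

```java
// Hypothetical helper (not from Apache or any module named in this post):
// turns a single "Range: bytes=..." header value into an inclusive
// [start, end] interval of a file whose length is known.
final class ByteRange {
    final long start;
    final long end; // inclusive

    private ByteRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    // Returns null when the header is malformed or the range is unsatisfiable.
    static ByteRange parse(String header, long fileLength) {
        if (header == null || !header.startsWith("bytes=") || header.contains(",")) {
            return null; // other units and multi-range requests are not handled
        }
        String spec = header.substring("bytes=".length()).trim();
        int dash = spec.indexOf('-');
        if (dash < 0) return null;
        String first = spec.substring(0, dash).trim();
        String last = spec.substring(dash + 1).trim();
        try {
            if (first.isEmpty()) {
                // "bytes=-N": the last N bytes of the file.
                long n = Long.parseLong(last);
                if (n <= 0 || fileLength == 0) return null;
                return new ByteRange(Math.max(0, fileLength - n), fileLength - 1);
            }
            long start = Long.parseLong(first);
            // "bytes=S-" runs to the end of the file; "bytes=S-E" is clamped to it.
            long end = last.isEmpty() ? fileLength - 1 : Math.min(Long.parseLong(last), fileLength - 1);
            if (start >= fileLength || end < start) return null;
            return new ByteRange(start, end);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    long length() {
        return end - start + 1;
    }

    public static void main(String[] args) {
        ByteRange r = parse("bytes=1000-1999", 10_000);
        System.out.println(r.start + "-" + r.end + " (" + r.length() + " bytes)");
    }
}
```

<p>The resulting <code>start</code> offset is exactly where the file system has to position its cursor, which is the operation that any HDFS access path must support for seeking to work.</p>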
<p>
Hoop (see <a href="http://cloudera.github.com/hoop/">http://cloudera.github.com/hoop/</a>) is an HTTP-to-HDFS connector: it allows the HDFS file system to be accessed via HTTP.
We developed a working local prototype using JW Player and a large video file.
Streaming works, but seeking into an unbuffered part stops the playback.
It seems that the Hoop API does not support seeking in a file, so we had to give up this approach.
</p>
<p>
The second solution is based on HDFS/FUSE. FUSE (Filesystem in Userspace) is an API that captures file system operations and allows them to be implemented with ad-hoc functions running in the user&#8217;s process space (thereby avoiding changes to the operating system kernel, a tricky and dangerous option). FUSE is provided in Hadoop as a component named &#8220;Mountable HDFS&#8221; (see <a href="http://wiki.apache.org/hadoop/MountableHDFS">http://wiki.apache.org/hadoop/MountableHDFS</a>). It lets any standard user or program see the HDFS namespace as a locally mounted directory. All file system operations, including directory browsing, file opening and content access, are enabled over HDFS content through the FUSE interface. 
</p>
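<p>Once HDFS is mounted, the seek itself needs nothing HDFS-specific. The Java sketch below reads one chunk of a video through an ordinary &#8220;RandomAccessFile&#8221;; the mount point &#8220;/mnt/hdfs&#8221; and the file name are hypothetical, and the same code works on any local file.</p>

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch: on a FUSE mount of HDFS, a video is an ordinary path, so jumping
// to an offset is a plain seek() followed by a read.
final class FuseSeekSketch {

    // Reads up to maxBytes starting at offset; returns fewer bytes near the end of the file.
    static byte[] readChunk(String path, long offset, int maxBytes) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long available = Math.max(0, file.length() - offset);
            int toRead = (int) Math.min(maxBytes, available);
            byte[] buffer = new byte[toRead];
            file.seek(offset);      // sets the position used by the next read
            file.readFully(buffer); // on a FUSE mount, this read is forwarded to HDFS
            return buffer;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical mount point and file name.
        byte[] chunk = readChunk("/mnt/hdfs/videos/example.mp4", 50L * 1024 * 1024, 64 * 1024);
        System.out.println("read " + chunk.length + " bytes");
    }
}
```

<p>This is the operation that failed with Hoop in the prototype described above; through the FUSE mount, a jump to an arbitrary offset is an ordinary file system call.</p>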
<h2>Apache server configuration</h2>
<p>
What remained was to configure Apache to access the mounted FUSE file system and load content from video files.
How this is done depends on the video format. So far, we have tested and validated
<i>.mp4</i> files and Flash video files. For the first format we use the H264 Streaming Module (see <a href="http://h264.code-shop.com/trac">http://h264.code-shop.com/trac</a>), an Apache plugin that enables adaptive streaming. For FLV, we use a pseudo-streaming module for Apache named &#8220;mod_flv&#8221;. Both behave nicely and work with the mountable HDFS without any problem.
</p>
<h2>Conclusion</h2>

<p>The solution based on Apache + Mountable HDFS (FUSE) turned out to be reliable, functionally adequate (seeking is well supported) and efficient. The architecture is simple and easy to set up, and makes it possible to combine the benefits of HDFS for very large repositories with standard Web server streaming solutions. Although we chose to adopt Apache plugins in our current service, nothing prevents you from using a more powerful streaming server, since the FUSE approach (virtually) moves all the HDFS content into the standard file system scope. 
</p>
<p>
Hoop remains a potential option for the future, but it did not appear mature enough when we tested it, at least for the complex operation (seeking to a specific offset in a file) required by video streaming.
</p>

]]></description>
      <dc:subject><![CDATA[English, Hadoop, Hbase, Video Streaming,]]></dc:subject>
      <pubDate>Fri, 03 Feb 2012 08:52 GMT</pubDate>
    </item>

    <item>
      <title>Open source version of the LivingKnowledge testbed publicly released on SourceForge</title>
      <link>http://internetmemory.org/en/index.php/News/open_source_version_of_the_livingknowledge_testbed</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/News/open_source_version_of_the_livingknowledge_testbed#id:129#date:17:49</guid>
      <description><![CDATA[Since its public release on <a href="http://sourceforge.net/p/diversityengine/wiki/Home/">SourceForge</a> in August 2011 under the name Diversity Engine, the testbed has been downloaded many times, and some of its components will be reused in other FP7 research projects such as <a href="http://internetmemory.org/en/index.php/projects/trendminer">TrendMiner</a>.<h2>LivingKnowledge Project</h2>

<p>The <a href="http://livingknowledge.europarchive.org/">LivingKnowledge</a> project (LK) enhances the state of the art of retrieving information from the Web by formalizing the notions of bias and diversity, creating tools that analyze, summarize and visualize bias in textual and image documents and finally, by creating applications that exploit this technology.</p>

<h2>LivingKnowledge Testbed</h2>

<p>The testbed integrates the following components, all of which contribute to diversity- and bias-aware search:<br />
- <strong>document collections</strong> chosen to reflect a diversity of document types and content,<br />
- <strong>image and text analysis tools</strong> supporting the analysis of diversity in text and image documents,<br />
- <strong>indexing and search tools</strong> supporting bias- and diversity-aware search, including novel visualization methods.</p>

<p>The testbed processing starts with document collections that are available upon request from the <a href="http://internetmemory.org/en/index.php/projects/livingknowledge">Internet Memory Foundation</a>, including 280 News sites and 750 blogs.<br />
Furthermore, the testbed supports a number of collection formats allowing users to incorporate their own collections.</p>

<p>A hands-on session with over 30 participants (Symposium on Bias and Diversity) was held during the 8th <a href="http://essir.uni-koblenz.de/">International Summer School on Information Retrieval</a> (ESSIR), which took place in Koblenz (Germany) in August/September 2011.</p>

<h2>More info</h2>
<p><a href="http://livingknowledge.europarchive.org/">Living Knowledge Project</a> <br />
<a href="http://sourceforge.net/p/diversityengine/wiki/Home/">SourceForge</a><br />
<a href="http://www.diversityengine.org">Diversity Engine</a><br />
<a href="http://essir.uni-koblenz.de/">Symposium on Bias and Diversity in IR (ESSIR 2011) </a></p>]]></description>
      <dc:subject><![CDATA[English, French,]]></dc:subject>
      <pubDate>Thu, 02 Feb 2012 17:49 GMT</pubDate>
    </item>

    <item>
      <title>Temporal Web Analytics Workshop (TempWeb02) at WWW2012 in Lyon on April 17, 2012</title>
      <link>http://internetmemory.org/en/index.php/News/temporal_web_analytics_workshop</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/News/temporal_web_analytics_workshop#id:128#date:10:23</guid>
      <description><![CDATA[<a href="http://temporalweb.net/">TempWeb02</a> will take place on April 17th, 2012, in conjunction with the <a href="http://www2012.wwwconference.org/">International World Wide Web Conference</a> in Lyon, France. <br />
<p><strong>As PC-Chair and Organizer, the Internet Memory Foundation informs you that the paper submission deadline is February 24, 2012.</strong></p>

<h2>Objectives</h2>

<p>The objective of this workshop is to provide a venue for researchers from all domains (IE/IR, Web mining, etc.) in which the temporal dimension opens up an entirely new range of challenges and possibilities. The workshop&#8217;s ambition is to help shape a community of interest around the research challenges and possibilities resulting from the introduction of the time dimension in Web analysis.</p>

<p>TempWeb focuses on the analysis, along the time dimension, of Web data collected over extended periods. A major challenge in this regard is the sheer size of the data and the ability to make sense of it in a useful and meaningful manner for its users. Web-scale data analytics therefore needs to develop infrastructures and extended analytical tools to make sense of this data. </p>

<h2>Workshop topics</h2>

<p>• Web scale data analytics<br />
• Temporal Web analytics<br />
• Distributed data analytics<br />
• Web science<br />
• Web dynamics<br />
• Data quality metrics<br />
• Web spam<br />
• Knowledge evolution on the Web<br />
• Systematic exploitation of Web archives<br />
• Large scale data storage<br />
• Large scale data processing<br />
• Data aggregation<br />
• Web trends<br />
• Topic mining<br />
• Terminology evolution<br />
• Community detection and evolution</p>

<h2>Important Dates</h2>

<p>• Paper submission deadline: February 24, 2012<br />
• Notification of acceptance: March 5, 2012<br />
• Camera ready copy deadline: March 16, 2012<br />
• Workshop: April 17, 2012</p>

<p>Please post your submission (up to 8 pages) using the ACM template:<br />
<a href="http://www.acm.org/sigs/publications/proceedings-templates">http://www.acm.org/sigs/publications/proceedings-templates</a><br />
at:<br />
<a href="https://www.easychair.org/account/signin.cgi?conf=tempweb2012">https://www.easychair.org/account/signin.cgi?conf=tempweb2012</a></p>

<p>Note that the workshop proceedings will be published in ACM DL (ISBN 978-1-4503-1188-5)</p>

<h2>Support</h2>

<p>This workshop is organized with the support of the EU 7th Framework ICT STREP on Longitudinal Analytics of Web Archive data (<a href="http://www.lawa-project.eu/">LAWA</a>) under contract no. 258105.</p>

<h2>Workshop Officials</h2>

<p><strong>Chair:</strong></p>

<p><strong>PC-Chairs and Organizers:</strong></p>

<p>Ricardo Baeza-Yates (<a href="http://research.yahoo.com/Ricardo_Baeza-Yates">Yahoo! Research</a>, Spain)<br />
Julien Masanès (<a href="http://internetmemory.org/en/index.php/about/the_board">Internet Memory Foundation</a>, France and Netherlands)<br />
Marc Spaniol (<a href="http://www.mpi-inf.mpg.de/~mspaniol/">Max Planck Institute for Informatics</a>, Germany)</p>

<p><strong>Program Committee:</strong></p>

<p>Eytan Adar (University of Michigan, USA)<br />
Omar Alonso (Microsoft Bing, USA)<br />
Srikanta Bedathur (IIIT-Delhi, India)<br />
Andras Benczur (Hungarian Academy of Science)<br />
Klaus Berberich (Max Planck Institute for Informatics, Germany)<br />
Roi Blanco (Yahoo! Research, Spain)<br />
Adam Jatowt (Kyoto University, Japan)<br />
Scott Kirkpatrick (Hebrew University Jerusalem, Israel)<br />
Christian König (Microsoft Research, USA)<br />
Frank McCown (Harding University, USA)<br />
Michael Nelson (Old Dominion University, USA)<br />
Nikos Ntarmos (University of Patras, Greece)<br />
Kjetil Norvag (Norwegian University of Science and Technology, Norway)<br />
Philippe Rigaux (Internet Memory Foundation, France and Netherlands)<br />
Thomas Risse (L3S Research Center, Germany)<br />
Pierre Senellart (Télécom ParisTech, France)<br />
Torsten Suel (NYU Polytechnic, USA)<br />
Masashi Toyoda (Tokyo University, Japan)<br />
Peter Triantafillou (University of Patras, Greece)<br />
Michalis Vazirgiannis (Athens University of Economics and Business &amp; École Polytechnique)<br />
Gerhard Weikum (Max Planck Institute for Informatics, Germany)</p>]]></description>
      <dc:subject><![CDATA[English, French,]]></dc:subject>
      <pubDate>Thu, 02 Feb 2012 10:23 GMT</pubDate>
    </item>

    <item>
      <title>TV Show: « La mémoire de toile » (net memory) and Web archiving challenges</title>
      <link>http://internetmemory.org/en/index.php/News/tv_show_la_memoire_de_toile_the_net_memory_and_the_web_archiving_challenges</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/News/tv_show_la_memoire_de_toile_the_net_memory_and_the_web_archiving_challenges#id:124#date:16:38</guid>
      <description><![CDATA[A report on Web archiving by France24<p><img src="http://internetmemory.org/images/uploads/memoiredelatoile_thumb.png" alt="memoiredelatoile" width="300" height="228"  style="border: 0;" /></p>

<p>The Internet has become one of the most productive media for information and news. Thus, there&#8217;s an absolute need to preserve web content and promote Web archiving on a large scale. This is becoming one of the great challenges of the Web. <br />
The media are already taking an interest in the subject, and <a href="http://www.france24.com/en/" title="France 24">France24</a>, the French international news channel, is broadcasting a <a href="http://www.france24.com/fr/20111231-memoire-internet-archivage">video report</a> on web harvesting in France (under the French legal deposit), on Web archiving in general, and on giving access to these <a href="http://internetmemory.org/en/index.php/about/collections1">Web archive collections</a>. </p>

<p>This video gives a quick overview of French initiatives and <a href="http://internetmemory.org/en/index.php/IM/blogs" title="Blog InternetMemory">Web archiving technologies</a>, with the participation of the <a href="http://www.bnf.fr/fr/collections_et_services/livre_presse_medias/a.archives_internet.html">National Library of France</a>, the National Audiovisual Institute of France and the Internet Memory Foundation (interview of Julien Masanès by Natalia Gallois in our Paris offices).</p>

<p>To view the video and discover the challenges of Web archiving <a href="http://www.france24.com/fr/20111231-memoire-internet-archivage" title="France24">click here</a> (in French).<br />
TV Show: <a href="http://www.france24.com/fr/taxonomy/emission/16758">&#8220;Web News&#8221;</a>, News seen on the Web and about the web.</p>

]]></description>
      <dc:subject><![CDATA[English,]]></dc:subject>
      <pubDate>Tue, 03 Jan 2012 16:38 GMT</pubDate>
    </item>

    <item>
      <title>Happy New Year 2012!</title>
      <link>http://internetmemory.org/en/index.php/News/happy_new_year_2012</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/News/happy_new_year_2012#id:123#date:14:38</guid>
      <description><![CDATA[We send you our best wishes for the New Year 2012!<p><strong>2012</strong> will be a year full of projects and developments, so follow us on <a href="http://twitter.com/#!/InternetMemory">Twitter</a> and subscribe to our <a href="http://internetmemory.org/en/index.php/RSS">RSS feed</a>!</p>

]]></description>
      <dc:subject><![CDATA[English,]]></dc:subject>
      <pubDate>Fri, 30 Dec 2011 14:38 GMT</pubDate>
    </item>

    <item>
      <title>November 7-8th, Kick-Off of a new R&amp;D project: TrendMiner</title>
      <link>http://internetmemory.org/en/index.php/News/november_7_8th_kick_off_of_a_new_rd_project_trendminer</link>
      <guid isPermaLink="true">http://internetmemory.org/en/index.php/News/november_7_8th_kick_off_of_a_new_rd_project_trendminer#id:117#date:09:48</guid>
      <description><![CDATA[We are glad to announce the kick-off of TrendMiner, a European research project on Large-scale, Cross-lingual Trend Mining and Summarization of Real-time Media Streams<p>The TrendMiner project (Large-scale, Cross-lingual Trend Mining and Summarization of Real-time Media Streams) starts today in Luxembourg. It is a three-year European project funded by the European Commission through the Seventh Research Framework Program (FP7-ICT) under Project No 287863. </p>

<p>Besides the Internet Memory Foundation, the following partners are involved:<br />
- <a href="http://www.dfki.de/web/welcome?set_language=en&amp;cl=en" target="new">Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (Germany)</a> as Coordinator,<br />
- <a href="http://www.shef.ac.uk/" target="new">The University of Sheffield (United Kingdom)</a>, <br />
- <a href="http://www.ontotext.com/" target="new">Ontotext AD (Bulgaria)</a>, <br />
- <a href="http://www.soton.ac.uk/" target="new">University of Southampton (UK)</a>, <br />
- <a href="http://en.eurokleis.com/" target="new">Eurokleis S.R.L. (Italy)</a>, <br />
- <a href="http://www.sora.at/index.php?id=72&amp;L=1" target="new">Sora Ogris &amp; Hofinger GmbH (Austria)</a> <br />
- and <a href="http://hardikgroup.com/" target="new">Hardik Fintrade Pvt Ltd. (India)</a>.</p>

<p>This project aims at delivering innovative, portable open-source real-time methods for cross-lingual mining and summarization of large-scale stream media.</p>

<p>IMF will contribute to the Platform for Real Time Media collection, Analysis and storage by:<br />
- providing scalable infrastructure to partners, with support for integration and experiment.<br />
- designing and developing an application-aware crawler mechanism for social media.</p>

<p>For more information on TrendMiner, please visit the <a href="http://www.trendminer-project.eu/" target="new">Project website</a> (under construction).</p>

<p><img src="http://internetmemory.org/images/uploads/fp7logoban1.jpg" alt="" width="60" height="56" style="border: 0;" /> <img src="http://internetmemory.org/images/uploads/Eur-flag.jpg" alt="" width="63" height="44" style="border: 0;" /></p>]]></description>
      <dc:subject><![CDATA[English,]]></dc:subject>
      <pubDate>Mon, 07 Nov 2011 09:48 GMT</pubDate>
    </item>

    
    </channel>
</rss>