[Reprozip-users] Web Archiving

Rasa Bočytė rbocyte at beeldengeluid.nl
Fri May 11 09:54:26 EDT 2018


I was just wondering if you would be able to help me with testing out

I have my case study website set up on my laptop. It does not require any
special server-side software; I can run it with PHP's built-in server
or Lampp. At the moment I am doing it only for testing purposes, so I could
use any server software that would work with ReproZip.

A couple of questions:
- apart from tracing the server, do I need to trace any other processes to
ensure that I can reproduce the experiment?
- how do I ensure that my package includes all the website data? I tried
running PHP's built-in server (reprozip trace php -S localhost:8000 -t
path/to/the/website/files) and the finished package only captured data from
the web pages that I visited while running the experiment.
- can I somehow include the browser needed to run the website into the
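
On the second question, one possible workaround, as a minimal sketch: since the trace only seems to record files the server actually opens, a small script could request every file under the document root while the traced server is running. The helper names (urls_for_docroot, visit_all) are hypothetical, the base URL assumes the localhost:8000 server from the command above, and it assumes every file is reachable at a URL matching its path, which may not hold for files that are only included by other PHP scripts.

```python
import os
import urllib.request
from urllib.parse import quote


def urls_for_docroot(docroot, base_url="http://localhost:8000"):
    """Return one URL per file under docroot, sorted for a stable order."""
    urls = []
    for dirpath, _dirnames, filenames in os.walk(docroot):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), docroot)
            # Use forward slashes and percent-encode unsafe characters.
            path = quote(rel.replace(os.sep, "/"))
            urls.append(base_url.rstrip("/") + "/" + path)
    return sorted(urls)


def visit_all(docroot, base_url="http://localhost:8000"):
    """Request every URL once so the traced server opens each file."""
    for url in urls_for_docroot(docroot, base_url):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                resp.read()
        except OSError as exc:  # URLError and HTTPError subclass OSError
            print("failed:", url, exc)
```

Calling visit_all("path/to/the/website/files") from a second terminal while the trace is running should make the server touch each file before the package is built. Whether that is enough to capture the whole site is something I have not verified.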

Thank you for your help!


On 24 April 2018 at 15:37, Rasa Bočytė <rbocyte at beeldengeluid.nl> wrote:

> Hi both,
> I presented ReproZip to other researchers in my institution and everyone
> seems quite excited to see if it would work for us! I still need to discuss
> it with a couple of other colleagues but I think we will try to test it.
> One of the things that I am trying to figure out is how to include
> client-side software, i.e. web browser, into the equation. Would you have
> to create a separate container for that? Ideally, we would like to package
> everything, source files, server-side dependencies and client-side
> dependencies, into one place, but I don't know if that is feasible.
> Regards,
> Rasa
> On 18 April 2018 at 18:27, Vicky Steeves <vicky.steeves at nyu.edu> wrote:
>> Hi Rasa,
>> Apologies, we were traveling and just got back to the office. We are very
>> glad to be of help!
>> We let users who pack experiments edit the yml file before the
>> final packing step, and for those secondary users who unpack, we let them
>> download and view the yml file. We certainly *could* automatically
>> extract categories of information for the user. It bears more thinking
>> about, especially since there are a few ways that unpacking users interface
>> with ReproUnzip.
>> Best,
>> Vicky
>> Vicky Steeves
>> Research Data Management and Reproducibility Librarian
>> Phone: 1-212-992-6269
>> ORCID: orcid.org/0000-0003-4298-168X/
>> vickysteeves.com | @VickySteeves <https://twitter.com/VickySteeves>
>> NYU Libraries Data Services | NYU Center for Data Science
>> On Tue, Apr 10, 2018 at 4:46 AM, Rasa Bočytė <rbocyte at beeldengeluid.nl>
>> wrote:
>>> Hi Remi,
>>> In terms of migration, originally my institute planned to acquire files
>>> from the creators and then figure out what to do with them, most likely
>>> migrate individual files to updated versions when needed. Which I think is
>>> not a helpful approach since you need to start at the server and capture
>>> the environment and software that manipulates those files to create a
>>> website. Especially, if you want to be able to reproduce it.
>>> I am definitely leaning towards the idea that virtualisation of a web
>>> server would be the best approach for us. I will try to test out the
>>> examples that you have on your website and see if I can run some tests with
>>> my own case studies (of course, it depends if the creators will allow us to
>>> do it).
>>> I promise I won't bother you too much but my last question is about the
>>> metadata captured on the yml file. It is machine and human readable, but
>>> the question is what you do with it and how you present it once you have it
>>> so it becomes a valuable resource for those using the preserved object.
>>> Have you thought about automatically extracting some categories of
>>> information from that file in a user-friendly format or do you think it is
>>> enough as it is?
>>> Just wanted to say a massive thank you for your feedback. It has been
>>> incredibly helpful!
>>> Rasa
>>> On 6 April 2018 at 19:53, Rémi Rampin <remi.rampin at nyu.edu> wrote:
>>>> Rasa,
>>>> 2018-04-04 08:03 EDT, Rasa Bočytė <rbocyte at beeldengeluid.nl>:
>>>>> In our case, we are getting all the source files directly from content
>>>>> creators and we are looking for a way to record and store all the
>>>>> technical, administrative and descriptive metadata, and visualise
>>>>> dependencies on software, hardware, file formats, etc. (similar to what
>>>>> Binder does).
>>>> I didn't think Binder did that (this binder?
>>>> <https://github.com/jupyterhub/binderhub>). It is certainly a good
>>>> resource for reproducing environments already described as a Docker image
>>>> or Conda YAML, but I am not aware of ways to use it to track or visualize
>>>> dependencies or any metadata.
>>>>> We have been mostly considering migration as it is a more scalable
>>>>> approach and less technically demanding. Do you find that virtualisation is
>>>>> a better strategy for website preservation? At least from the archival
>>>>> community, we have heard some reservations about using Docker since it is
>>>>> not considered a stable platform.
>>>> When you talk of migration, do you mean to new hardware? What would you
>>>> be migrating to? Or do you mean upgrading underlying software/frameworks?
>>>> The way I see it, virtualization (sometimes referred to as "preserving
>>>> the mess") is definitely less technically demanding than migration. Could
>>>> you share a bit more about what you mean by this?
>>>> Thanks
>>>> PS: Please make sure you keep users at reprozip.org in the recipients
>>>> list.
>>>> --
>>>> Rémi Rampin
>>>> ReproZip Developer
>>>> Center for Data Science, New York University
>>> --
>>> *Rasa Bocyte*
>>> Web Archiving Intern
>>> *Netherlands Institute for Sound and Vision*
>>> *Media Parkboulevard 1, 1217 WE  Hilversum | Postbus 1060, 1200 BB  Hilversum | beeldengeluid.nl
>>> <http://www.beeldengeluid.nl/>*


*Rasa Bocyte*
Web Archiving Intern

*Netherlands Institute for Sound and Vision*
Media Parkboulevard 1, 1217 WE  Hilversum | Postbus 1060, 1200 BB  Hilversum | beeldengeluid.nl