#archive


🕸️ We are doing this conference at Media of Cooperation in June ... It is called THE DATAFIED WEB, and it will be a blast! Registration for #RESAW25 is now open. Please spread widely, and do not forget to check out the marvelous programme at datafiedweb.net!

Really looking forward to everything, especially the keynotes by @nthylstrup on "Vanishing Points: Technographies of Data Loss" and @jwyg on "Public Data Cultures"!

I've mirrored a relatively simple website (redsails.org; it's mostly text, some images) for posterity via #wget. However, I also wanted to grab snapshots of any outlinks (of which there are many, as citations/references). I couldn't figure out a configuration where wget would do that out of the box without endlessly, recursively spidering the whole internet. I ended up making a kind of poor man's #ArchiveBox instead:

while read -r url ; do
  # one directory per URL, named after the URL's SHA-256 hash
  dirname=$(echo "$url" | sha256sum | cut -d' ' -f 1)
  mkdir -p "$dirname"
  wget --span-hosts --page-requisites --convert-links --backup-converted \
    --adjust-extension --tries=5 --warc-file="$dirname/$dirname" \
    --execute robots=off --wait 1 --waitretry 5 --timeout 60 \
    -o "$dirname/wget-$dirname.log" --directory-prefix="$dirname/" "$url"
done < others.txt

Basically, there's a list of bookmarks^W URLs in others.txt that I grabbed from the initial mirror of the website with some #grep foo. I want to make as good a mirror/snapshot of each specific URL as I can, without spidering/mirroring endlessly all over. So, I hash the URL and kick off a dedicated wget job for it that will span hosts, but only to make that specific URL as usable locally/offline as possible. I know from experience that this isn't perfect. But... it'll be good enough for my purposes. I'm also stashing a WARC file. Probably a bit overkill, but I figure it might be nice to have.
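For the curious, the extraction step can be sketched roughly like this. It's a guess at the "#grep foo", not the exact command: it assumes the initial mirror lives in a redsails.org/ directory and treats any http(s) URL in the mirrored files as an outlink.

# rough sketch of the URL extraction; paths and filtering are assumptions
grep -rhoE "https?://[^\"' <>]+" redsails.org/ \
  | grep -v '//redsails\.org' \
  | sort -u > others.txt

The second grep drops same-site links, so only external URLs end up in others.txt.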

Here's a fun fact. You don't need to pay for the Twitter API if you know the post you want to retrieve.

The Twitter embedding service provides a fully inflated JSON response. It includes the parents of replies, quote posts, links to media, total numbers of likes, retweets, and replies, etc.

Here's an example of someone replying to me:

cdn.syndication.twimg.com/twee

You need to add a token to the end, but it can be any random string. No other restrictions.

Very useful for archiving.
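As a concrete illustration, fetching one post's JSON could look like the sketch below. The full path and parameter names (tweet-result, id, token) are assumptions on my part, since the URL above is truncated; per the post, the token value can be any string.

# sketch only: endpoint path and parameter names are assumed,
# and the tweet ID below is a made-up placeholder
tweet_id=1234567890123456789
curl -s "https://cdn.syndication.twimg.com/tweet-result?id=${tweet_id}&token=x" \
  -o "tweet-${tweet_id}.json"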

[Archive — 2023] LHDG07. GNU/Linux, c'est trop compliqué ? (Is GNU/Linux too complicated?)

When I see the delirious mess of proprietary software interfaces that people put up with, it makes me laugh to hear that free software is "complicated"…

▶️ Listen to this episode: grisebouille.net/lhdg07-gnulin
📗 The best-of book: editions.ptilouk.net/gb10ans
❤️ Support: ptilouk.net/#soutien

#archive #GriseBouille #humour #chronique #logicielLibre #GAFAM #UI #UX

[Archive — 2016] Le grimoire de l'éternité (The Grimoire of Eternity)

A fantasy short story written under imposed constraints. Did you know: one of the characters is an ancestor of Barne from "Sortilèges & Syndicats" (yes, it takes place in the same universe 😇).

▶️ Read this short story: grisebouille.net/le-grimoire-d
📗 The short story collection: editions.ptilouk.net/enfant
❤️ Support: ptilouk.net/#soutien