
:heart_parrot: HELLO AGAIN!! :heart_parrot:

I've made a really simple tool for searching your outbox.json (Mastodon archive) file. It's all browser-based and all data stays on your computer: it's not uploaded to the cloud!

There's lots of room for additional features, so feel free to suggest some! 💚

beta.observablehq.com/@nuklear
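To illustrate the "data stays on your computer" part, here is a minimal sketch (not the notebook's actual code) of reading a user-selected outbox.json purely in the browser. `Blob.text()` reads the file locally and nothing goes over the network. It assumes the file is an ActivityStreams collection with an `orderedItems` array, which is how Mastodon's export is laid out; `loadOutbox` is a hypothetical helper name.

```typescript
// Hypothetical shapes for the parts of outbox.json used here (assumption:
// an ActivityStreams OrderedCollection with an `orderedItems` array).
interface Activity {
  type: string;
  object?: unknown;
  published?: string;
}

interface Outbox {
  orderedItems: Activity[];
}

// Read and parse the file locally; Blob.text() never uploads anything.
async function loadOutbox(file: Blob): Promise<Outbox> {
  const text = await file.text();
  const parsed = JSON.parse(text) as Partial<Outbox>;
  if (!Array.isArray(parsed.orderedItems)) {
    throw new Error("Not an outbox.json: missing orderedItems array");
  }
  return parsed as Outbox;
}

// Browser usage: `input` is an <input type="file"> element (hypothetical).
// input.addEventListener("change", async () => {
//   const outbox = await loadOutbox(input.files![0]);
//   console.log(`${outbox.orderedItems.length} activities loaded`);
// });
```

A `File` from a file input is a `Blob`, so the same function works for drag-and-drop too.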

Holy hell, 16 boosts in 15 minutes?! No wonder my Chrome tab crashed!! 😂 💚💚💚

Update: I've made it so you can select the whole .tar.gz file now (instead of having to extract it first), and also made everything prettier. :party_heart:

:heart_parrot: Masto Archive Tool v4!!! :heart_parrot:

I've updated the Masto Archive Tool to give basic stats on your top-5 most liked/boosted/mentioned users. Try it out and let me know how it works! 💚

Currently it only really works for Masto users (sorry Pleroma users, I'll get to you next, promise!!!)

beta.observablehq.com/@nuklear
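As a rough idea of how the "most mentioned" stat could be computed, here is a sketch that counts Mention tags across your toots and returns the top five. It assumes Mastodon's ActivityStreams shape, where each toot is a `Create` activity whose `object.tag` array contains entries like `{ type: "Mention", name: "@user@host" }`. `topMentioned` is a hypothetical name, and liked or boosted counts would need other parts of the archive, so they aren't covered here.

```typescript
// Assumed shapes (ActivityStreams as exported by Mastodon).
interface Tag { type: string; name?: string; href?: string }
interface Note { type: string; tag?: Tag[] }
interface Activity { type: string; object?: Note | string }

// Count Mention tags across all "Create" activities and return the top n.
function topMentioned(items: Activity[], n = 5): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const item of items) {
    const obj = item.object;
    // Boosts ("Announce") usually carry only a URL string, so they are skipped.
    if (item.type !== "Create" || typeof obj !== "object") continue;
    for (const tag of obj.tag ?? []) {
      if (tag.type !== "Mention" || !tag.name) continue;
      counts.set(tag.name, (counts.get(tag.name) ?? 0) + 1);
    }
  }
  // Highest count first; ties broken alphabetically so the output is stable.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}
```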

Quick update: released v5, which now supports stats for both versions of the Masto archive format. 😅

@andi oh sweet, the ones i found for perusing my bofa archive were all p limited and bad. i'll have to check this out later

@anna It's suuuuuuuper basic, mainly just a tokenised search facility at this point. I cobbled it together in 15 minutes while waiting for stuff to compile at work, so feel free to suggest features; I'll add them once I'm home!
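For anyone curious what a "tokenised search" boils down to, here is a hand-rolled sketch of the general technique. The notebook itself uses an in-browser search library, so this is not its code. The idea is to build an inverted index (token to toot ids) once, then answer a query by intersecting the id sets for each query token. `tokenise`, `buildIndex` and `search` are illustrative names.

```typescript
// Lower-case runs of Unicode letters/digits; punctuation is dropped.
function tokenise(text: string): string[] {
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
}

// Inverted index: each token maps to the set of document ids containing it.
function buildIndex(docs: string[]): Map<string, Set<number>> {
  const index = new Map<string, Set<number>>();
  docs.forEach((doc, id) => {
    for (const token of tokenise(doc)) {
      const ids = index.get(token) ?? new Set<number>();
      ids.add(id);
      index.set(token, ids);
    }
  });
  return index;
}

// AND semantics: a document must contain every token in the query.
function search(index: Map<string, Set<number>>, query: string): number[] {
  const tokens = tokenise(query);
  if (tokens.length === 0) return [];
  let result = new Set<number>(index.get(tokens[0]) ?? []);
  for (const token of tokens.slice(1)) {
    const ids = index.get(token) ?? new Set<number>();
    result = new Set([...result].filter((id) => ids.has(id)));
  }
  return [...result].sort((a, b) => a - b);
}
```

Building the index is the expensive part. Once it exists, each keystroke only costs a few map lookups and set intersections.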

@andi
@anna
I'm glad you made it. Every time people want to search their archives, I want to recommend the tool I use, but installing an OCaml compiler and stuff is a bit much for most people

@Authoritimmy @anna Uy! I think I've gotten stuck on that before trying to get Flow to work... 😅 What's the other tool, out of curiosity?

@andi @anna
It was this

github.com/kit-ty-kate/mastodo

Worth noting, Kate was extremely responsive when I filed a bug report, and now it works pretty much as expected for me

@ben Yeah, I'm tempted to automatically extract outbox.json, but I worry that loading an 80 MB tar file into the browser might not be the best for stability... 🤷

@andi ehh, it's not being uploaded anywhere, so bandwidth isn't an issue, and I'm sure there are news websites with 80MB front pages at this point.

@ben Huh, true... I need to walk home (IN THE FREEZING RAIN 😭 ) but will do that once there!

@andi trying this out on the knzk archive i downloaded during our recent Times of Uncertainty and Woe and this whips ass

@fresh_newlook Glad you like it! It's really basic (but also easy to extend, since it's just a notebook), but I've put all of 15 minutes of effort into it and am happy to take feature requests. :heart_parrot:

@andi
Cool! I'll be sure to break it for you later 😈

@eject AHAHAHA wouldn't have it any other way!!! 😘 💚💚💚

@andi alright, first bug: doesn't work with non-ASCII chars, e.g. 🎡 (U+1F3B5)

They do appear in results if contained in a toot though

(don't you just love unicode? 😘 )

@eject Hmmm!!! That's probably a limitation of how the in-browser ElasticSearch implementation tokenises content; I'd guess you'd get the same result by entering just that one character?

Nothing stopping me from writing a UTF-16-aware tokeniser tho!! 💚
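Here is a sketch of what an emoji-aware tokeniser could look like, assuming the bug comes from a tokeniser that only keeps letters and digits. JavaScript strings are UTF-16, so 🎵 (U+1F3B5) is stored as a surrogate pair. The `u` flag makes the regex work on whole code points, and `\p{Extended_Pictographic}` lets each emoji become its own token. `TOKEN` and `tokenise` are illustrative names, not the notebook's code.

```typescript
// Words (Unicode letters/digits) or single pictographic code points.
// The `u` flag matches whole code points rather than lone UTF-16 surrogates.
const TOKEN = /[\p{L}\p{N}]+|\p{Extended_Pictographic}/gu;

function tokenise(text: string): string[] {
  // String.prototype.match with a global regex resets lastIndex itself,
  // so reusing the shared TOKEN constant is safe.
  return text.toLowerCase().match(TOKEN) ?? [];
}

// "🎵".length is 2 (two UTF-16 code units), but it is one token here:
// tokenise("Listening to 🎵music!") gives ["listening", "to", "🎵", "music"]
```

One caveat with this design: multi-code-point emoji (skin-tone modifiers, ZWJ sequences like 🤷‍♀️) still split into parts. `Intl.Segmenter` with `granularity: "grapheme"` would be the more thorough option if that matters.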

@andi if there are toots with that letter on its own (i.e. not part of a word) it will find them

that probably doesn't help much, but i thought it was worth pointing out

@eject Ah cheers! I think I also really need to debounce that input 😅
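Debouncing means the (possibly slow) search only reruns once the user has stopped typing for a short while, instead of on every keystroke. This is a generic sketch; `debounce` is written here rather than imported from a library, and the 250 ms delay in the usage comment is an arbitrary choice.

```typescript
// Return a wrapper that delays `fn` until `waitMs` ms have passed without
// another call; each new call cancels the previously scheduled one.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer); // drop the pending call
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

// Browser usage (searchInput and runSearch are hypothetical):
// searchInput.addEventListener(
//   "input",
//   debounce(() => runSearch(searchInput.value), 250),
// );
```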

@alana_is_tooting Thanks!! 💚💚💚 Do let me know if you have any feature ideas!! 😘

@root @andi

"Hey would it be cool if I made a tool that does X?"

"Uh, I guess? I'm not sure."

"Too late, I already made a tool that does X."

@lewdmood Wasted childhood? 🤷🏼‍♀️

@lewdmood Running on an average of 4h of sleep per night probably also has something to do with it, now that I think about it...

@andi I'm going to steal that lovely heart. Any objections/req's for credit if you're the author or know of them? :3

@ella_kane All of my emoji are totally open for stealing, go right ahead! Most of them are just standard emoji I put through this thing I made: beta.observablehq.com/@nuklear

(Though the heart one I painstakingly made in GIMP before I built that tool, lol. I think the colourisation turned out better on it, ngl)

:party_mushroom: :party_blobcatmelt:

@ella_kane No particular requirement for attribution, though I may submit the heart myself to the cultofthepartyparrot.com collection at some point. :heart_parrot:

@ella_kane No problem whatsoever!! Thanks for spreading the flashy psychedelic looooove!!!! 😊 :party_sparkles: :party_heart: :party_sparkles: :party_heart:

@jackofallEves The fact that even doing that via the main interface is so difficult is kinda hilarious, ngl.

@andi so i found it and it's even worse than i thought it'd be

@carbontwelve @Authoritimmy N.b., I have it filtering out everything that isn't a toot (mainly boosts), so you won't see boosts in the output! I may add an option in the future to change this. :heart_parrot:
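One way that kind of filter could work, sketched under the assumption that the archive follows ActivityStreams as Mastodon uses it: an original toot is a `Create` activity wrapping a `Note`, and a boost is an `Announce` activity whose `object` is usually just the URL of the boosted post. `originalToots` and `plainText` are hypothetical helpers. The tag stripping is deliberately crude and leaves HTML entities like `&amp;` untouched.

```typescript
// Assumed shapes for toots and activities in outbox.json.
interface Note { type: string; id: string; content: string; published?: string }
interface Activity { type: string; object?: Note | string }

// Keep only "Create" activities that wrap a Note; boosts ("Announce") are dropped.
function originalToots(items: Activity[]): Note[] {
  return items
    .filter((a) => a.type === "Create")
    .map((a) => a.object)
    .filter((o): o is Note => typeof o === "object" && o !== null && o.type === "Note");
}

// Toot content is HTML. Turn <br> into newlines and drop the remaining tags.
// (A rough regex approach; DOMParser would be more robust in the browser.)
function plainText(html: string): string {
  return html.replace(/<br\s*\/?>/gi, "\n").replace(/<[^>]+>/g, "").trim();
}
```

Making this configurable later would mostly mean letting `Announce` items through and displaying their URL instead of content.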

@andi
Does it filter out toots with media as well? I tried to find a toot with an attached image and did not see it
@carbontwelve

@Authoritimmy @carbontwelve It doesn't filter them out, it just doesn't display the media. I think? My knowledge of the ActivityPub API is pretty bad, ngl... 😅

Media is the next thing I'll try to get working in it. :heart_parrot:

@andi @carbontwelve
I think you might want to implement it using the compressed archive first, so you don't have to deal with the headache of finding where people extracted files. Usually the layout will be normal (a media_attachments folder in the same directory as outbox.json), but the common issue I see is someone extracting two archives into the same folder, which causes files to get renamed.

If you extract it to a temp directory it'll be more consistent

@Authoritimmy @carbontwelve It has actually consumed the entire tar.gz file directly since v2, so no extracting is necessary!
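Reading media straight from the archive avoids the renamed-files problem described above, because entry paths come from the tar itself. Here is a sketch of the lookup step under two assumptions that the thread does not confirm. First, each entry in a Note's `attachment` array has a `url` that is a path inside the archive, such as `/media_attachments/files/...`. Second, `files` maps archive entry names (without a leading `/`) to their bytes, for example from an untar step. `resolveAttachments` and `LocalMedia` are hypothetical names.

```typescript
// Assumed attachment shape (ActivityStreams `attachment` entries).
interface Attachment { type: string; mediaType?: string; url: string }
interface Note { attachment?: Attachment[] }

interface LocalMedia { path: string; mediaType: string; bytes: Uint8Array }

// Match each attachment URL to an entry in the in-memory archive.
function resolveAttachments(note: Note, files: Map<string, Uint8Array>): LocalMedia[] {
  const found: LocalMedia[] = [];
  for (const att of note.attachment ?? []) {
    const path = att.url.replace(/^\/+/, ""); // strip leading slashes to match entry names
    const bytes = files.get(path);
    if (bytes) {
      found.push({ path, mediaType: att.mediaType ?? "application/octet-stream", bytes });
    }
  }
  return found;
}

// Showing one in the browser, still without uploading anything:
// const blob = new Blob([media.bytes], { type: media.mediaType });
// img.src = URL.createObjectURL(blob);
```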

@codesections Ooh, interesting, hadn't seen that before!

This is mainly just an interface that uses the Web File API to untar and create local file blobs from a Masto archive tgz file, which then becomes searchable via a client-side version of ElasticSearch. It's super basic at the moment!

I definitely want to improve on it (possibly adding things like analytics, network diagrams, etc.), so please let me know if you have any suggestions!! :heart_parrot:
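For the "untar in the browser" step, here is a simplified sketch, not the notebook's implementation. `gunzip` uses the standard `DecompressionStream` API. `untar` walks the POSIX ustar layout: 512-byte headers with the file name in bytes 0 to 99, the size as octal text in bytes 124 to 135, the type flag at byte 156, the `ustar` magic at 257, and a path prefix in bytes 345 to 499. File data follows each header, padded to 512 bytes. Simplifications: regular files only, no GNU long-name or pax headers, no checksum check, and the whole archive is held in memory (the stability concern raised earlier).

```typescript
// Decompress a .tar.gz Blob in the browser with the standard streams API.
async function gunzip(file: Blob): Promise<Uint8Array> {
  const stream = file.stream().pipeThrough(new DecompressionStream("gzip"));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// Walk ustar headers and collect regular files as path -> bytes.
function untar(tar: Uint8Array): Map<string, Uint8Array> {
  const decoder = new TextDecoder();
  // Read a NUL-terminated text field from a header.
  const field = (start: number, length: number): string =>
    decoder.decode(tar.subarray(start, start + length)).replace(/\0[\s\S]*$/, "");
  const files = new Map<string, Uint8Array>();
  let offset = 0;
  while (offset + 512 <= tar.length) {
    const name = field(offset, 100);
    if (name === "") break; // a zero-filled block marks the end of the archive
    const size = parseInt(field(offset + 124, 12).trim(), 8);
    if (Number.isNaN(size)) throw new Error(`Bad size field for ${name}`);
    const typeflag = String.fromCharCode(tar[offset + 156]);
    // Only POSIX ustar headers ("ustar\0") carry a path prefix at byte 345.
    const isUstar = field(offset + 257, 6) === "ustar";
    const prefix = isUstar ? field(offset + 345, 155) : "";
    const path = prefix ? `${prefix}/${name}` : name;
    const dataStart = offset + 512;
    if (typeflag === "0" || typeflag === "\0") {
      // subarray() is a view, so file contents are not copied
      files.set(path, tar.subarray(dataStart, dataStart + size));
    }
    offset = dataStart + Math.ceil(size / 512) * 512;
  }
  return files;
}
```

Wired to a file input, `untar(await gunzip(file)).get("outbox.json")` would give the raw bytes of outbox.json (assuming it sits at the archive root), ready for `TextDecoder` and `JSON.parse`.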

@andi you've used named regex groups and those aren't supported in Firefox :blobsad:

It was quite easy to change your code to use unnamed groups though. I would send a pull request or a diff but it doesn't look like observable supports that kind of thing 🤷‍♀️

@andi scratch that, i've found the suggest button :blobnerd:
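Here is an illustration of the kind of rewrite described in this exchange, using a hypothetical pattern that splits a handle like `@user@example.social` into user and host (the notebook's real regex isn't shown here). Named groups `(?<name>...)` are the feature reported above as unsupported in Firefox. The numbered-group version behaves the same but reads captures by position.

```typescript
// Named capture groups: captures are read from m.groups by name.
const NAMED = /^@(?<user>[^@\s]+)@(?<host>[^@\s]+)$/;
// Equivalent numbered groups: captures are read from m[1], m[2] by position.
const UNNAMED = /^@([^@\s]+)@([^@\s]+)$/;

function parseHandleNamed(handle: string): { user: string; host: string } | null {
  const m = NAMED.exec(handle);
  return m?.groups ? { user: m.groups.user, host: m.groups.host } : null;
}

function parseHandleUnnamed(handle: string): { user: string; host: string } | null {
  const m = UNNAMED.exec(handle);
  // Group 1 is the first "(...)" in the pattern, group 2 the second.
  return m ? { user: m[1], host: m[2] } : null;
}
```

The trade-off is readability: with numbered groups, adding a new group earlier in the pattern shifts every index after it.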

Nuklear Family

This is the personal instance of Andi N. Fiziks. Love me or hate me it's still an obsession 😘