:heart_parrot: HELLO AGAIN!! :heart_parrot:

I've made a really simple tool for searching your outbox.json (Mastodon archive) file. It's all browser-based and all data stays on your computer — it's not uploaded to the cloud!

There's lots of room for additional features so feel free to suggest some! 💚

Holy hell, 16 boosts in 15 minutes?! No wonder my Chrome tab crashed!! 😂 💚💚💚

Update: I've made it so you can select the whole .tar.gz file now (instead of having to extract it first), and also made everything prettier. :party_heart:
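For anyone curious how a browser can open the .tar.gz directly: here is a minimal sketch. It assumes the Compression Streams API (`DecompressionStream`) is available, as it is in current browsers and in Node 18+. `gunzipFile` is a hypothetical name, and this isn't necessarily how the tool itself does it.

```typescript
// Hypothetical helper: decompress a gzip-compressed Blob/File entirely in memory.
// Assumes the Compression Streams API (DecompressionStream) is available.
async function gunzipFile(file: Blob): Promise<Uint8Array> {
  // Pipe the raw .tar.gz bytes through a gzip decompressor.
  const decompressed = file.stream().pipeThrough(new DecompressionStream("gzip"));
  // Collect the decompressed stream into a single ArrayBuffer.
  const buffer = await new Response(decompressed).arrayBuffer();
  return new Uint8Array(buffer);
}
```

The result is the raw tar stream, which still needs untarring before outbox.json can be read.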

:heart_parrot: Masto Archive Tool v4!!! :heart_parrot:

I've updated the Masto Archive Tool to give basic stats on your top-5 most liked/boosted/mentioned users. Try it out and let me know how it works! 💚

Currently it only really works for Masto users — sorry Pleroma users, I'll get to you next, promise!!!
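Here is a rough sketch of how a top-5 mentions count could work. It assumes Mastodon's ActivityPub layout, where each `Create` activity wraps a `Note` whose `tag` array includes `{ type: "Mention", name: "@user@domain" }` entries. The interfaces and `topMentioned` are simplified, hypothetical names, not the tool's code.

```typescript
// Simplified, hypothetical shapes for outbox entries; real archives carry many more fields.
interface MentionTag { type: string; name?: string; href?: string }
interface NoteObject { type: string; tag?: MentionTag[] }
interface OutboxItem { type: string; object: NoteObject | string }

// Count how often each account is mentioned across your own toots and return the top N.
function topMentioned(items: OutboxItem[], n = 5): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const item of items) {
    // Only Create activities wrap a full Note object; boosts (Announce) point at a URL string.
    if (item.type !== "Create" || typeof item.object === "string") continue;
    for (const tag of item.object.tag ?? []) {
      if (tag.type === "Mention" && tag.name) {
        counts.set(tag.name, (counts.get(tag.name) ?? 0) + 1);
      }
    }
  }
  // Sort by count (descending), breaking ties alphabetically so the output is stable.
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}
```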

Quick update: released v5, which now supports stats for both versions of the Masto archive format. 😅

@andi oh sweet, the ones i found for perusing my bofa archive were all p limited and bad. i'll have to check this out later

@anna It's suuuuuuuper basic, mainly just a tokenised search facility at this point. I cobbled it together in 15 minutes while waiting for stuff to compile at work, so feel free to suggest features; I'll add them once I'm home!
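The tool itself uses an in-browser ElasticSearch-style library. As a stand-in, here is a minimal inverted-index sketch of what "tokenised search" means: lowercase tokens, with a match required on every query term. `TootIndex` and `tokenize` are hypothetical names.

```typescript
// Split text into lowercase tokens. \w only matches ASCII letters, digits and "_",
// so emoji and accented characters are silently dropped by this tokeniser.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/\w+/g) ?? [];
}

// A minimal inverted index: maps each token to the set of toot ids containing it.
class TootIndex {
  private postings = new Map<string, Set<number>>();

  add(id: number, text: string): void {
    for (const token of tokenize(text)) {
      let ids = this.postings.get(token);
      if (!ids) {
        ids = new Set<number>();
        this.postings.set(token, ids);
      }
      ids.add(id);
    }
  }

  // Return ids of toots containing every query token (AND semantics), sorted ascending.
  search(query: string): number[] {
    const tokens = tokenize(query);
    if (tokens.length === 0) return [];
    const first = this.postings.get(tokens[0]);
    let matches: number[] = first ? Array.from(first) : [];
    // Intersect with the postings of each remaining token.
    for (const token of tokens.slice(1)) {
      const ids = this.postings.get(token);
      matches = ids ? matches.filter((id) => ids.has(id)) : [];
    }
    return matches.sort((a, b) => a - b);
  }
}
```

Note that a query consisting only of an emoji like 🎵 produces no tokens at all with this tokeniser, so it finds nothing.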

I'm glad you made it. Every time people want to search their archives I want to recommend the tool I use, but installing an OCaml compiler and stuff is a bit much for most people.

@Authoritimmy @anna Uy! I think I've gotten stuck on that before trying to get Flow to work... 😅 What's the other tool, out of curiosity?

@andi @anna
It was this

Worth noting, Kate was extremely responsive when I filed a bug report, and now it works pretty much as expected for me

@ben Yeah, I'm tempted to automatically extract outbox.json, but I worry that loading an 80MB tar file into the browser might not be the best for stability... 🤷

@andi ehh, it's not being uploaded anywhere, so bandwidth isn't an issue, and I'm sure there are news websites with 80MB front pages at this point.

@ben Huh, true... I need to walk home (IN THE FREEZING RAIN 😭 ) but will do that once there!

@andi trying this out on the knzk archive i downloaded during our recent Times of Uncertainty and Woe and this whips ass

@fresh_newlook Glad you like it! It's really basic (but also easy to extend — it's just a notebook) but I've put all of 15 minutes of effort into it and am happy to take feature requests. :heart_parrot:

Cool! I'll be sure to break it for you later 😈

@eject AHAHAHA wouldn't have it any other way!!! 😘 💚💚💚

@andi alright, first bug: doesn't work with non-ASCII chars, e.g. 🎵 (U+1F3B5)

They do appear in results if contained in a toot though

(don't you just love unicode? 😘 )

@eject Hmmm!!! That's probably a limitation of how the in-browser ElasticSearch implementation tokenises content; you'd probably get the same result by entering a single character?

Nothing stopping me from writing a UTF-16 aware tokeniser tho!! 💚
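One possible shape for a Unicode-aware tokeniser, offered as a sketch rather than the tool's actual fix. Iterating a string with `for...of` yields whole code points, so a surrogate pair like 🎵 (U+1F3B5, two UTF-16 code units) stays intact. Letters and digits from any script are grouped into words via the ES2018 `\p{L}`/`\p{N}` property escapes. Every other code point that isn't whitespace or punctuation becomes its own token. `tokenizeUnicode` is a hypothetical name.

```typescript
// Hypothetical tokeniser that keeps emoji and other symbols as searchable tokens.
// `for...of` over a string walks code points, so a surrogate pair such as
// U+1F3B5 (two UTF-16 code units) is handled as one character.
function tokenizeUnicode(text: string): string[] {
  const tokens: string[] = [];
  let word = "";
  for (const ch of text.toLowerCase()) {
    if (/[\p{L}\p{N}]/u.test(ch)) {
      word += ch; // letters and digits from any script build up a word
      continue;
    }
    if (word) {
      tokens.push(word);
      word = "";
    }
    // Anything that isn't whitespace or punctuation (emoji, other symbols) is its own token.
    if (!/[\s\p{P}]/u.test(ch)) tokens.push(ch);
  }
  if (word) tokens.push(word);
  return tokens;
}
```

Multi-code-point emoji (ZWJ sequences, skin tones, variation selectors) would still be split into pieces. `Intl.Segmenter` with `granularity: "grapheme"` is one way to keep those whole.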

@andi if there are toots with that letter on its own (i.e. not part of a word) it will find them

that probably doesn't help much, but i thought it was worth pointing out

@eject Ah cheers! I think I also really need to debounce that input 😅
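Here is a typical debounce helper, sketched under the assumption that the search currently runs on every input event. The wrapped function only fires once the user has stopped typing for `waitMs` milliseconds.

```typescript
// Generic debounce: `fn` only runs once `waitMs` has passed without another call,
// so a search fires after the user pauses typing rather than on every keystroke.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    // Cancel the pending call (if any) and restart the countdown with the latest arguments.
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}
```

Wiring it up might look like `input.addEventListener("input", debounce(() => runSearch(input.value), 250))`, where `runSearch` and the 250 ms delay are placeholders.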

@root @andi

"Hey would it be cool if I made a tool that does X?"

"Uh, I guess? I'm not sure."

"Too late, I already made a tool that does X."

@jackofallEves The fact even doing that via the main interface is so difficult is kinda hilarious, ngl.

@carbontwelve @Authoritimmy N.b., I have it filtering out everything that isn't a toot (so boosts mainly) so you won't see boosts in the output! I may add an option in the future to change this. :heart_parrot:
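For reference, here is a sketch of that kind of filter. In an ActivityPub outbox, `orderedItems` holds activities. Your own posts are `Create` activities wrapping a `Note`, while boosts are `Announce` activities whose `object` is typically just the URL of the boosted post. The types are simplified, and the `includeBoosts` flag is a hypothetical take on the future option mentioned above.

```typescript
// Simplified, hypothetical shapes; real outbox.json entries have many more fields.
interface OutboxNote { type: string; content?: string; published?: string }
interface OutboxActivity { type: string; object: OutboxNote | string }
interface Outbox { orderedItems: OutboxActivity[] }

// Parse outbox.json text and keep only your own toots (Create activities wrapping a Note).
// Boosts are Announce activities, so they are dropped unless includeBoosts is set.
function extractToots(json: string, includeBoosts = false): OutboxActivity[] {
  const outbox = JSON.parse(json) as Outbox;
  return outbox.orderedItems.filter((item) => {
    if (item.type === "Announce") return includeBoosts;
    return item.type === "Create" && typeof item.object === "object" && item.object.type === "Note";
  });
}
```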

Does it filter out toots with media as well? I tried to find a toot with an attached image and did not see it

@Authoritimmy @carbontwelve It doesn't filter them out, it just doesn't display the media. I think? My knowledge of the ActivityPub API is pretty bad, ngl... 😅

Media is the next thing I'll try to get working in it. :heart_parrot:

@andi @carbontwelve
I think you might want to implement it using the zipped archive first, so you don't have to deal with the headache of finding where people extracted files. The layout should usually be normal (a media_attachments folder in the same directory as outbox.json), but the common issue I see is someone extracting 2 archives into the same folder, which causes things to get renamed.

If you extract it to a temp directory it'll be more consistent
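Reading media straight from the archive sidesteps the renamed-folder problem. Here is a sketch that assumes two things: that a Note's attachment `url` is an archive-relative path starting with `/media_attachments/` (worth checking against a real export), and that the archive's files have already been read into a `Map` keyed by tar entry name. `resolveAttachment` is hypothetical.

```typescript
// Hypothetical lookup from a Note's attachment URL to a file read out of the archive.
// Assumes the URL is an archive-relative path such as "/media_attachments/files/...",
// and that `files` is keyed by tar entry names (which may or may not start with "./").
function resolveAttachment(url: string, files: Map<string, Uint8Array>): Uint8Array | undefined {
  const path = url.replace(/^\/+/, ""); // strip leading slashes
  return files.get(path) ?? files.get("./" + path);
}
```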

@Authoritimmy @carbontwelve It actually consumes the entire tar.gz file since V2, no extracting necessary!

@codesections Ooh, interesting, hadn't seen that before!

This is mainly just an interface that uses the Web File API to untar and create local file blobs from a Masto archive tgz file, which then becomes searchable via a client-side version of ElasticSearch. It's super basic at the moment!

I'm definitely wanting to improve upon it (possibly add some things like analytics, network diagrams, etc.) so please let me know if you have any suggestions!! :heart_parrot:
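The tool's own untar code isn't shown here, so this is a minimal sketch of the idea. A ustar archive is a sequence of 512-byte headers, each followed by the file's data padded to a 512-byte boundary, with zero blocks marking the end. In each header, the name sits at offset 0, the size is octal ASCII at offset 124, the type flag is at offset 156, and an optional POSIX path prefix is at offset 345. The sketch assumes the bytes are already gunzipped and ignores pax/GNU long-name extensions, links and checksums. `untar` and `readField` are hypothetical names.

```typescript
// Minimal ustar reader over an already-decompressed tar archive.
// Sketch only: ignores pax/GNU long-name extensions, links and checksums.
const BLOCK = 512;

// Read a NUL-terminated ASCII field from a header block.
function readField(buf: Uint8Array, start: number, length: number): string {
  let s = "";
  for (let i = start; i < start + length && buf[i] !== 0; i++) {
    s += String.fromCharCode(buf[i]);
  }
  return s;
}

function untar(tar: Uint8Array): Map<string, Uint8Array> {
  const files = new Map<string, Uint8Array>();
  let offset = 0;
  while (offset + BLOCK <= tar.length) {
    const header = tar.subarray(offset, offset + BLOCK);
    const name = readField(header, 0, 100);
    if (name === "") break; // an all-zero block marks the end of the archive
    const size = parseInt(readField(header, 124, 12).trim() || "0", 8); // size is octal ASCII
    const typeflag = header[156];
    // POSIX ustar headers ("ustar\0" magic) may split long paths into a prefix field.
    const prefix = readField(header, 257, 6) === "ustar" ? readField(header, 345, 155) : "";
    const fullName = prefix ? `${prefix}/${name}` : name;
    const dataStart = offset + BLOCK;
    // Typeflag "0" (byte 48) or NUL means a regular file; directories etc. are skipped.
    if (typeflag === 48 || typeflag === 0) {
      files.set(fullName, tar.subarray(dataStart, dataStart + size));
    }
    // File data is padded up to the next 512-byte boundary.
    offset = dataStart + Math.ceil(size / BLOCK) * BLOCK;
  }
  return files;
}
```

Each entry could then be wrapped with `new Blob([bytes])` and passed to `URL.createObjectURL` to get a local URL for display.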

@andi hello 👋 idk if you're still taking suggestions, but...

would it be possible to just browse a list of one's own toots? i don't necessarily want to perform a search, i'd like to just scroll through mine

also does the json data contain how many boosts/likes the toots got? it would be fun to see and sort by that data if possible

this is such an awesome tool btw. thank you so much for publishing it 👏 💯 🏆
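Browsing without a query could be as simple as sorting on each toot's `published` timestamp, which ActivityPub stores as an ISO 8601 string. Here is a sketch with a hypothetical, simplified `Toot` shape:

```typescript
// Simplified, hypothetical shape of a toot pulled out of outbox.json.
interface Toot { id: string; published: string; content: string }

// Return a new array of toots sorted newest-first by their ISO 8601 `published` timestamp.
// The input array is copied first, so the caller's order is left untouched.
function listNewestFirst(toots: Toot[]): Toot[] {
  return [...toots].sort((a, b) => Date.parse(b.published) - Date.parse(a.published));
}
```

Sorting by boosts would follow the same pattern with a numeric comparator, if the archive actually carries those counts.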

@red I'm working on a better version called — it will cost your firstborn to use* but will be amazing

* Will not actually require sacrificing firstborn to use

@red That's an excellent suggestion tho.

Boosts I have data on; favs are a joke in terms of the API, so I have no idea what those actually are.

@andi ooOOooh i am excited and intrigued! and thank you 🙏

before i found your app i was stressing about how to turn that mess of data into something useful, but you made it so easy. so convenient!!

@evan @andi

thank you for killing the least funny joke ever generated by the testes


mmmm... interesting. Thanks.

Makes me wonder why we don't have tools to explore our exported Mastodon archives.

A simple PHP or Python script could do the trick.

Nuklear Family

This is the personal instance of Andi N. Fiziks. Love me or hate me it's still an obsession 😘