I've made a really simple tool for searching your outbox.json (Mastodon archive) file. It's all browser-based and all data stays on your computer — it's not uploaded to the cloud!
There's lots of room for additional features so feel free to suggest some! 💚
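For anyone curious what searching outbox.json involves, here is how the file is laid out. It is an ActivityPub `OrderedCollection`. Your own toots are the `Create` activities in `orderedItems`, and each one's `object` is a `Note` with HTML `content`. Boosts are `Announce` activities whose `object` is just a URL.

Below is a minimal TypeScript sketch that pulls plain-text toots out of the file, assuming that layout. The `Toot` type and the `extractToots`/`stripHtml` names are made up for illustration. The HTML stripping is deliberately crude.

```typescript
// Hypothetical shape of one extracted toot; not the tool's actual data model.
interface Toot {
  id: string;
  published: string;
  text: string;
}

// Crude HTML-to-text for search purposes: line breaks for <br>/</p>,
// drop remaining tags, decode a few common entities (&amp; last).
function stripHtml(html: string): string {
  return html
    .replace(/<br\s*\/?>/gi, "\n")
    .replace(/<\/p>/gi, "\n")
    .replace(/<[^>]*>/g, "")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&")
    .trim();
}

// Assumption: outbox.json is an OrderedCollection whose orderedItems are
// Create activities (own toots, object = Note) and Announce activities
// (boosts, object = URL string).
function extractToots(outboxJson: string): Toot[] {
  const outbox = JSON.parse(outboxJson);
  const items: any[] = Array.isArray(outbox.orderedItems) ? outbox.orderedItems : [];
  const toots: Toot[] = [];
  for (const item of items) {
    const obj = item.object;
    // Skip boosts and anything whose object isn't an embedded Note.
    if (item.type !== "Create" || typeof obj !== "object" || obj === null) continue;
    if (obj.type !== "Note") continue;
    toots.push({
      id: String(obj.id),
      published: String(obj.published ?? item.published ?? ""),
      text: stripHtml(String(obj.content ?? "")),
    });
  }
  return toots;
}
```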
Holy hell, 16 boosts in 15 minutes?! No wonder my Chrome tab crashed!! 😂 💚💚💚
Update: I've made it so you can select the whole .tar.gz file now (instead of having to extract it first), and also made everything prettier.
Also now renders custom emoji
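Taking the .tar.gz directly means gunzipping it in the browser first. Below is a minimal sketch using the standard `DecompressionStream` API, which is available in current browsers and in Node 18+. It isn't necessarily how the notebook does it, and `gunzipBlob` is a made-up name.

```typescript
// Sketch only: decompress a user-selected .tar.gz (a File is a Blob) into raw
// tar bytes with the Compression Streams API. Not necessarily the notebook's approach.
async function gunzipBlob(blob: Blob): Promise<Uint8Array> {
  const tarStream = blob.stream().pipeThrough(new DecompressionStream("gzip"));
  // Wrapping the stream in a Response is a convenient way to collect it into one buffer.
  const buffer = await new Response(tarStream).arrayBuffer();
  return new Uint8Array(buffer);
}
```

In a page, `blob` would be the `File` from an `<input type="file">`. Note that this holds the entire decompressed tar in memory at once.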
Masto Archive Tool v4!!!
I've updated the Masto Archive Tool to give basic stats on your top-5 most liked/boosted/mentioned users. Try it out and let me know how it works! 💚
Currently it only really works for Masto users — sorry Pleroma users, I'll get to you next, promise!!!
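A "top 5" list is mostly a tally. Below is a sketch of the mentions half, assuming each toot's `Note` in outbox.json lists its mentions in a `tag` array as `{ type: "Mention", name: "@user@instance" }`. `topN` and `mentionedUsers` are hypothetical helpers. Boosted and liked users would each need their own extraction step before being fed to the same tally.

```typescript
// Count how often each key occurs and return the n most frequent,
// ties broken alphabetically so the output is deterministic.
function topN(keys: string[], n = 5): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const k of keys) counts.set(k, (counts.get(k) ?? 0) + 1);
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}

// Assumption: mentions live in each Note's `tag` array with type "Mention".
// Announce items (boosts) have a string object, so they contribute nothing here.
function mentionedUsers(orderedItems: any[]): string[] {
  const names: string[] = [];
  for (const item of orderedItems) {
    const tags = item?.object?.tag;
    if (!Array.isArray(tags)) continue;
    for (const t of tags) {
      if (t?.type === "Mention" && typeof t.name === "string") names.push(t.name);
    }
  }
  return names;
}
```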
Quick update: released v5, which now supports stats for both versions of the Masto archive format. 😅
@andi oh sweet, the ones i found for perusing my bofa archive were all p limited and bad. i'll have to check this out later
@anna It's suuuuuuuper basic, mainly just a tokenised search facility at this point. I cobbled it together in 15 minutes while waiting for stuff to compile at work, so feel free to suggest features and I'll add them once I'm home!
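Here is roughly what "tokenised search" means. Each toot is split into tokens, and an inverted index maps each token to the toots that contain it. A query is then answered by intersecting the index entries for its tokens. This is a from-scratch sketch of that idea, not the in-browser search library the tool actually uses, and all names are illustrative.

```typescript
// Naive tokeniser: lowercase, split on anything that isn't an ASCII letter/digit.
function tokenise(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Hypothetical inverted index: token -> set of indices of documents containing it.
function buildIndex(docs: string[]): Map<string, Set<number>> {
  const index = new Map<string, Set<number>>();
  docs.forEach((doc, i) => {
    for (const tok of tokenise(doc)) {
      let postings = index.get(tok);
      if (!postings) {
        postings = new Set<number>();
        index.set(tok, postings);
      }
      postings.add(i);
    }
  });
  return index;
}

// Return indices of docs containing every query token (AND semantics).
function search(index: Map<string, Set<number>>, query: string): number[] {
  const toks = tokenise(query);
  if (toks.length === 0) return [];
  const candidates = Array.from(index.get(toks[0]) ?? new Set<number>());
  return candidates
    .filter((i) => toks.every((t) => index.get(t)?.has(i) ?? false))
    .sort((a, b) => a - b);
}
```

Note the ASCII-only split. A query like 🎵 tokenises to nothing, so it returns no results at all.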
Worth noting, Kate was extremely responsive when I filed a bug report, and now it works pretty much as expected for me
@andi !!!!! wow! ✨
@ben Yeah, I'm tempted to automatically extract outbox.json, but I worry that loading an 80 MB tar file into the browser might not be the best for stability... 🤷
@andi ehh, it's not being uploaded anywhere, so bandwidth isn't an issue, and I'm sure there are news websites with 80MB front pages at this point.
@ben Huh, true... I need to walk home (IN THE FREEZING RAIN 😭 ) but will do that once there!
@ben Wooooo I did it 🎉
@andi trying this out on the knzk archive i downloaded during our recent Times of Uncertainty and Woe and this whips ass
@fresh_newlook Glad you like it! It's really basic (but also easy to extend, since it's just a notebook); I've put all of 15 minutes of effort into it and am happy to take feature requests.
Cool! I'll be sure to break it for you later 😈
@eject AHAHAHA wouldn't have it any other way!!! 😘 💚💚💚
@andi alright, first bug: doesn't work with non-ASCII chars, e.g. 🎵 (U+1F3B5)
They do appear in results if contained in a toot though
(don't you just love unicode? 😘 )
@eject Hmmm!!! That's probably a limitation of how the in-browser ElasticSearch implementation tokenises content; you'd probably get the same result by entering any single character?
Nothing stopping me from writing a UTF-16 aware tokeniser tho!! 💚
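A code-point-aware tokeniser is fairly short with Unicode property escapes. JavaScript strings are UTF-16, so 🎵 (U+1F3B5) is a surrogate pair with `.length` 2. With the `u` flag, the regex below matches whole code points rather than surrogate halves. It keeps letters and digits from any script together, and it emits each pictographic emoji as its own token. This is a hypothetical sketch, not what the notebook ships.

```typescript
// Hypothetical Unicode-aware tokeniser: runs of letters/digits (any script)
// form word tokens; each Extended_Pictographic code point is its own token.
// Built via the RegExp constructor so the `u`-flag syntax is checked at runtime.
const TOKEN_RE = new RegExp("[\\p{L}\\p{N}]+|\\p{Extended_Pictographic}", "gu");

function tokeniseUnicode(text: string): string[] {
  return text.toLowerCase().match(TOKEN_RE) ?? [];
}
```

Multi-code-point emoji still get split into pieces or lose their modifiers. This affects skin-tone variants and ZWJ sequences like 🏳️‍🌈. `Intl.Segmenter` with `granularity: "grapheme"` is the usual tool for keeping those whole.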
@andi if there are toots with that letter on its own (i.e. not part of a word) it will find them
that probably doesn't help much, but i thought it was worth pointing out
@eject Ah cheers! I think I also really need to debounce that input 😅
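Debouncing the search box means waiting until typing pauses before running the query, rather than searching on every keystroke. Below is a generic trailing-edge debounce as a sketch. `debounce` is a hypothetical helper here, not something from the notebook.

```typescript
// Trailing-edge debounce: `fn` runs once, `waitMs` after the most recent call.
// Each new call cancels the previously scheduled run.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}
```

Wiring it up would look something like `input.addEventListener("input", debounce(() => runSearch(input.value), 250))`. Here `input`, `runSearch`, and the 250 ms delay are all placeholders.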
@andi this is awesome!
@alana_is_tooting Thanks!! 💚💚💚 Do let me know if you have any feature ideas!! 😘
why are you so perfect
@root Hehe 😊 *blushes*
@andi How do you do so much
@lewdmood Wasted childhood? 🤷🏼‍♀️
@lewdmood Also, averaging 4h of sleep per night probably has something to do with it, now that I think about it...
@andi That’s what crossed my mind…
@andi oh sweet, this'll help me find my first ever post
@jackofallEves The fact that even doing that via the main interface is so difficult is kinda hilarious, ngl.
@andi so i found it and it's even worse than i thought it'd be
I think you might want to implement it using the zipped archive first, so you don't have to deal with the headache of finding where people extracted their files. The layout should usually be standard (a media_attachments folder in the same directory as outbox.json), but the common issue I see is someone extracting two archives into the same folder, which causes files to get renamed.
If you extract it to a temp directory it'll be more consistent
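One way to read that suggestion: if the tool reads files straight out of the archive, every attachment can be looked up by its path inside the archive. Renamed or oddly placed folders on disk then never come into play. Below is a sketch, assuming attachment URLs in outbox.json are archive-relative paths like `/media_attachments/...`. `normalisePath`, `indexArchive`, and `resolveAttachment` are made-up names.

```typescript
// Normalise an in-archive path: strip leading "./" and "/" segments.
function normalisePath(p: string): string {
  return p.replace(/^(\.?\/)+/, "");
}

// Map every archive entry by its normalised path, so lookups never depend on
// where (or whether) the user extracted anything on disk.
function indexArchive<T>(entries: Array<{ path: string; data: T }>): Map<string, T> {
  const byPath = new Map<string, T>();
  for (const e of entries) byPath.set(normalisePath(e.path), e.data);
  return byPath;
}

// Assumption: attachment URLs in outbox.json are archive-relative,
// e.g. "/media_attachments/files/.../original/photo.png".
function resolveAttachment<T>(byPath: Map<string, T>, url: string): T | undefined {
  return byPath.get(normalisePath(url));
}
```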
That looks really cool. Yay more tooling!
Possibly related prior art: https://alexschroeder.ch/cgit/mastodon-archive/about/
@codesections Ooh, interesting, hadn't seen that before!
This is mainly just an interface that uses the Web File API to untar a Masto archive tgz file into local file blobs, which then become searchable via a client-side version of ElasticSearch. It's super basic at the moment!
I definitely want to improve on it (possibly adding things like analytics, network diagrams, etc.), so please let me know if you have any suggestions!!
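The untar step is less magic than it sounds. A tar file is a sequence of 512-byte headers, each followed by that file's bytes padded to a multiple of 512. Below is a minimal reader for plain ustar archives, as a sketch only. It skips checksum verification, directories, and PAX/GNU long-name extensions. `untar` and `TarEntry` are illustrative names, not the notebook's code.

```typescript
// One regular file pulled out of a tar archive.
interface TarEntry {
  path: string;
  data: Uint8Array;
}

// Read a NUL-terminated ASCII field from a header block.
function readString(block: Uint8Array, offset: number, length: number): string {
  let end = offset;
  while (end < offset + length && block[end] !== 0) end++;
  return new TextDecoder().decode(block.subarray(offset, end));
}

// Numeric fields (like size) are octal ASCII, possibly space/NUL padded.
function readOctal(block: Uint8Array, offset: number, length: number): number {
  const s = readString(block, offset, length).trim();
  return s === "" ? 0 : parseInt(s, 8);
}

// Minimal ustar reader: regular files only, no checksum check,
// no PAX ('x') or GNU long-name ('L') support.
function untar(tar: Uint8Array): TarEntry[] {
  const entries: TarEntry[] = [];
  let pos = 0;
  while (pos + 512 <= tar.length) {
    const header = tar.subarray(pos, pos + 512);
    // An all-zero block marks the end of the archive.
    if (header.every((b) => b === 0)) break;
    const fileName = readString(header, 0, 100);
    const size = readOctal(header, 124, 12);
    const type = String.fromCharCode(header[156]);
    // The prefix field (offset 345) only means "path prefix" in POSIX ustar headers.
    const isPosixUstar = readString(header, 257, 6) === "ustar";
    const prefix = isPosixUstar ? readString(header, 345, 155) : "";
    const dataStart = pos + 512;
    if (type === "0" || type === "\0") {
      entries.push({
        path: prefix ? `${prefix}/${fileName}` : fileName,
        data: tar.slice(dataStart, dataStart + size),
      });
    }
    // File data is padded up to the next 512-byte boundary.
    pos = dataStart + Math.ceil(size / 512) * 512;
  }
  return entries;
}
```

Each entry's `data` can then be wrapped in a `Blob` and passed to `URL.createObjectURL` to get local file blobs.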
@andi hello 👋 idk if you're still taking suggestions, but...
would it be possible to just browse a list of one's own toots? i don't necessarily want to perform a search, i'd like to just scroll through mine
also does the json data contain how many boosts/likes the toots got? it would be fun to see and sort by that data if possible
this is such an awesome tool btw. thank you so much for publishing it 👏 💯 🏆
@red I'm working on a better version called tootz.app — it will cost your firstborn to use* but will be amazing
* Will not actually require sacrificing your firstborn to use
@red That's an excellent suggestion tho.
Boosts I have data on; favs are a joke in terms of the API, so no idea what those actually are.
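Both requests (scrolling through your own toots, and sorting by boosts) boil down to sorting the toot list. Below is a sketch. `TootSummary` and its `boosts` field are hypothetical stand-ins for whatever boost count can be derived from the archive, not a field name from outbox.json. Favourites are left out, since their numbers are unclear.

```typescript
// Hypothetical toot record; `boosts` is a placeholder, not an outbox.json field.
interface TootSummary {
  id: string;
  published: string; // ISO 8601 timestamp
  boosts: number;
}

type SortMode = "newest" | "oldest" | "boosts";

// Returns a sorted copy, leaving the caller's array untouched.
function sortToots(toots: TootSummary[], mode: SortMode): TootSummary[] {
  const byDate = (a: TootSummary, b: TootSummary) => Date.parse(a.published) - Date.parse(b.published);
  const copy = toots.slice();
  if (mode === "oldest") return copy.sort(byDate);
  if (mode === "newest") return copy.sort((a, b) => byDate(b, a));
  // Most-boosted first; ties broken newest first.
  return copy.sort((a, b) => b.boosts - a.boosts || byDate(b, a));
}
```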
@andi ooOOooh i am excited and intrigued! and thank you 🙏
before i found your app i was stressing about how to turn that mess of data into something useful, but you made it so easy. so convenient!!
@red Glad it's helpful! ☺️💚💚
@andi you've used named regex groups and those aren't supported in Firefox
It was quite easy to change your code to use unnamed groups, though. I would send a pull request or a diff, but it doesn't look like Observable supports that kind of thing 🤷‍♀️
@andi scratch that, i've found the suggest button
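The conversion from named to unnamed groups is mechanical. Drop the `?<name>` part of each group and read the matches by position instead. In engines without named-group support, a regex literal using them is a syntax error, so the whole script fails to load rather than just one match. Below is a hypothetical example; the mention pattern is illustrative, not the notebook's actual regex.

```typescript
// Named-group version (a SyntaxError in engines without named-group support):
//   /@(?<user>\w+)@(?<host>[\w.-]+)/
// Unnamed-group equivalent (illustrative pattern): same regex, groups read by position.
const MENTION_RE = /@(\w+)@([\w.-]+)/;

function parseMention(s: string): { user: string; host: string } | null {
  const m = MENTION_RE.exec(s);
  if (m === null) return null;
  // m[1] and m[2] take the place of the former `user` and `host` groups.
  return { user: m[1], host: m[2] };
}
```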
@eject Ooh awesome, thanks!!! 💚
hell yeah, now i can actually read these things
@carbontwelve Thank you!! 💚
@andi LOL I KILLED BOFA
This is the personal instance of Andi N. Fiziks. Love me or hate me it's still an obsession 😘