I've made a really simple tool for searching your outbox.json (Mastodon archive) file. It's all browser-based and all data stays on your computer — it's not uploaded to the cloud!
There's lots of room for additional features so feel free to suggest some! 💚
Masto Archive Tool v4!!!
I've updated the Masto Archive Tool to give basic stats on your top-5 most liked/boosted/mentioned users. Try it out and let me know how it works! 💚
Currently it only really works for Masto users — sorry Pleroma users, I'll get to you next, promise!!!
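The top-5 stats mostly come down to counting and ranking. Here's a minimal sketch for the "most mentioned" list. It assumes Mastodon's ActivityPub-style outbox.json, where each `Create` activity in `orderedItems` wraps a `Note` whose `tag` array holds entries like `{ type: "Mention", name: "@user@host" }`. `topMentions` and the trimmed-down interfaces are hypothetical names, not the tool's actual code.

```typescript
// Minimal shapes for the parts of outbox.json this sketch reads (assumed structure).
interface Tag {
  type: string;
  name?: string;
}

interface Activity {
  type: string;
  object?: unknown; // a Note object for "Create", a URL string for "Announce"
}

// Hypothetical helper: count Mention tags across all Create activities
// and return the n most-mentioned accounts, highest count first.
function topMentions(items: Activity[], n = 5): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const item of items) {
    if (item.type !== "Create" || typeof item.object !== "object" || item.object === null) continue;
    const tags = (item.object as { tag?: Tag[] }).tag ?? [];
    for (const tag of tags) {
      if (tag.type === "Mention" && tag.name) {
        counts.set(tag.name, (counts.get(tag.name) ?? 0) + 1);
      }
    }
  }
  // Sort by count descending; ties break alphabetically so the output is stable.
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}
```

You'd call it as `topMentions(outbox.orderedItems)` on the parsed JSON. Boosts would need a similar pass over `Announce` activities, whose `object` is just a status URL.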
@andi oh sweet, the ones i found for perusing my bofa archive were all p limited and bad. i'll have to check this out later
@anna It's suuuuuuuper basic, mainly just a tokenised search facility at this point. I cobbled it together in 15 minutes while waiting for stuff to compile at work, so feel free to suggest features; I'll add them once I'm home!
Worth noting, Kate was extremely responsive when I filed a bug report, and now it works pretty much as expected for me
@andi !!!!! wow! ✨
@ben Yeah, I'm tempted to automatically extract outbox.json, but I worry that loading an 80 MB tar file into the browser might not be the best for stability... 🤷
@andi ehh, it's not being uploaded anywhere, so bandwidth isn't an issue, and I'm sure there are news websites with 80MB front pages at this point.
@ben Huh, true... I need to walk home (IN THE FREEZING RAIN 😭 ) but will do that once there!
@ben Wooooo I did it 🎉
@andi trying this out on the knzk archive i downloaded during our recent Times of Uncertainty and Woe and this whips ass
@fresh_newlook Glad you like it! It's really basic (but also easy to extend — it's just a notebook) but I've put all of 15 minutes of effort into it and am happy to take feature requests.
Cool! I'll be sure to break it for you later 😈
@eject AHAHAHA wouldn't have it any other way!!! 😘 💚💚💚
@andi alright, first bug: doesn't work with non-ASCII chars, e.g. 🎵 (U+1F3B5)
They do appear in results if they're contained in a toot, though
(don't you just love unicode? 😘 )
@eject Hmmm!!! That's probably a limitation of how the in-browser ElasticSearch implementation tokenises content. You'd probably get the same result by entering just one character?
Nothing stopping me from writing a UTF-16 aware tokeniser tho!! 💚
@andi if there are toots with that letter on its own (i.e. not part of a word) it will find them
that probably doesn't help much, but i thought it was worth pointing out
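A Unicode-aware tokeniser along the lines mentioned above could look like the following minimal sketch. It is a swapped-in approach, not how the in-browser search library actually tokenises: it matches runs of letters, marks and digits as words, and treats each pictographic emoji as its own token. The `u` flag makes the regex step through code points instead of UTF-16 code units, so a surrogate pair like 🎵 (U+1F3B5) stays in one piece. `tokenise` is a hypothetical name, and the same function would need to run on both the toots and the search query.

```typescript
// Words are runs of letters, combining marks and digits; each emoji-like
// pictograph becomes a token of its own. The "u" flag makes the regex walk
// code points rather than UTF-16 code units, so surrogate pairs stay intact.
const TOKEN = /[\p{L}\p{M}\p{N}]+|\p{Extended_Pictographic}/gu;

// Hypothetical tokeniser: lowercase the text, then pull out every token.
function tokenise(text: string): string[] {
  return text.toLowerCase().match(TOKEN) ?? [];
}
```

Multi-part emoji (skin-tone modifiers, ZWJ sequences like 🤷‍♀️) still get split or partly dropped with this approach; a grapheme-aware splitter such as `Intl.Segmenter` would be the next step up. Unicode property escapes (`\p{...}`) also need a reasonably recent browser, older Firefox versions lacked them.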
@eject Ah cheers! I think I also really need to debounce that input 😅
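Debouncing means only running the search once typing has paused, instead of on every keystroke. A minimal generic sketch (`debounce` here is a hand-rolled helper, not a specific library's API):

```typescript
// Returns a wrapper that postpones calling `fn` until `waitMs` milliseconds
// have passed without another call; only the most recent arguments are used.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer); // cancel the pending call
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}
```

In the notebook, the search box's `input` listener would be wrapped in it, e.g. `debounce(() => runSearch(box.value), 250)`, where `runSearch`, `box` and the 250 ms delay are all hypothetical placeholders.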
@andi this is awesome!
@alana_is_tooting Thanks!! 💚💚💚 Do let me know if you have any feature ideas!! 😘
why are you so perfect
@root Hehe 😊 *blushes*
@andi How do you do so much
@lewdmood Wasted childhood? 🤷🏼‍♀️
@lewdmood Also, running on an average of 4h sleep per night probably has something to do with it, now that I think about it...
@andi That’s what crossed my mind…
@andi I'm going to steal that lovely heart. Any objections or credit requirements, if you're the author or know who is? :3
@ella_kane All of my emoji are totally open for stealing, go right ahead! Most of them are just standard emoji I put through this thing I made: https://beta.observablehq.com/@nuklearfiziks/partyizer-online
(Though the heart one I painstakingly made in GIMP before I built that tool, lol. I think the colourisation turned out better on it, ngl)
@ella_kane No particular requirement for attribution, though I may submit the heart myself to the cultofthepartyparrot.com collection at some point.
@andi that's awesome! Thank youuuuu!!! <3
@ella_kane No problem whatsoever!! Thanks for spreading the flashy psychedelic looooove!!!! 😊
@andi thanks for enabling it! :D
@andi oh sweet, this'll help me find my first ever post
@jackofallEves The fact that even doing that via the main interface is so difficult is kinda hilarious, ngl.
@andi so i found it and it's even worse than i thought it'd be
I think you might want to implement it using the compressed archive first, so you don't have to deal with the headache of finding where people extracted their files. Extracted archives should usually be laid out normally (a media_attachments folder in the same directory as outbox.json), but the common issue I see is someone extracting two archives into the same folder, which causes files to get renamed.
If you extract it to a temp directory, it'll be more consistent.
That looks really cool. Yay more tooling!
Possibly related prior art: https://alexschroeder.ch/cgit/mastodon-archive/about/
@codesections Ooh, interesting, hadn't seen that before!
This is mainly just an interface that uses the browser's File API to untar a Masto archive .tgz file into local file blobs, which then become searchable via a client-side version of ElasticSearch. It's super basic at the moment!
I definitely want to improve on it (possibly adding things like analytics, network diagrams, etc.), so please let me know if you have any suggestions!!
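The untar step described above can be sketched roughly as follows. This assumes the .tgz has already been gunzipped into a `Uint8Array` (modern browsers can do that with `DecompressionStream("gzip")`). It then walks the tar format's 512-byte headers looking for one file, such as outbox.json. `findTarEntry` and `field` are hypothetical names, and the sketch skips checksum verification and GNU/PAX long-name extensions.

```typescript
const BLOCK = 512;
const decoder = new TextDecoder();

// Read a NUL-terminated ASCII field from a tar header.
function field(header: Uint8Array, offset: number, length: number): string {
  const raw = decoder.decode(header.subarray(offset, offset + length));
  const nul = raw.indexOf("\0");
  return nul === -1 ? raw : raw.slice(0, nul);
}

// Walk the 512-byte tar headers and return the contents of the regular
// file called `wanted` (e.g. "outbox.json"), or undefined if it's absent.
// Simplifications: checksums aren't verified and GNU/PAX long-name
// extensions aren't handled.
function findTarEntry(tar: Uint8Array, wanted: string): Uint8Array | undefined {
  let offset = 0;
  while (offset + BLOCK <= tar.length) {
    const header = tar.subarray(offset, offset + BLOCK);
    let name = field(header, 0, 100);
    if (name === "") break; // an all-zero block marks the end of the archive
    // POSIX ustar headers may split long paths into a prefix field at byte 345.
    if (field(header, 257, 6) === "ustar") {
      const prefix = field(header, 345, 155);
      if (prefix) name = `${prefix}/${name}`;
    }
    const size = parseInt(field(header, 124, 12).trim(), 8) || 0; // size is stored in octal
    const type = header[156]; // "0" (48) or NUL (0) means a regular file
    const dataStart = offset + BLOCK;
    if ((type === 48 || type === 0) && name.replace(/^\.\//, "") === wanted) {
      return tar.subarray(dataStart, dataStart + size);
    }
    // File data is padded up to the next 512-byte boundary.
    offset = dataStart + Math.ceil(size / BLOCK) * BLOCK;
  }
  return undefined;
}
```

Returning a `subarray` view avoids copying the file out of the (possibly ~80 MB) archive buffer; decoding it with `TextDecoder` and `JSON.parse` then gives the outbox object.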
@andi you've used named regex groups and those aren't supported in Firefox
It was quite easy to change your code to use unnamed groups, though. I would send a pull request or a diff, but it doesn't look like Observable supports that kind of thing 🤷‍♀️
@andi scratch that, i've found the suggest button
@eject Ooh awesome, thanks!!! 💚
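The fix eject describes amounts to swapping named capture groups for positional ones and reading matches back by index. Firefox only gained named capture groups in version 78, so positional groups were the portable choice. The tool's actual regex isn't shown in the thread, so this sketch uses a hypothetical pattern for a Mastodon actor URL (`https://host/users/name`). `ACTOR_URL` and `parseActorUrl` are made-up names.

```typescript
// Named-group version (needs named capture group support):
//   /^https?:\/\/(?<host>[^/]+)\/users\/(?<user>[^/?#]+)$/
// Positional version: same pattern, groups read back by index instead.
const ACTOR_URL = /^https?:\/\/([^/]+)\/users\/([^/?#]+)$/;

// Hypothetical helper: split an actor URL into host and username.
function parseActorUrl(url: string): { host: string; user: string } | undefined {
  const match = ACTOR_URL.exec(url);
  if (match === null) return undefined;
  const [, host, user] = match; // group 1 = host, group 2 = user
  return { host, user };
}
```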
hell yeah, now i can actually read these things
@carbontwelve Thank you!! 💚
This is the personal instance of Andi N. Fiziks. Love me or hate me, it's still an obsession 😘