I've made a really simple tool for searching your outbox.json (Mastodon archive) file. It's all browser-based and all data stays on your computer — it's not uploaded to the cloud!
There's lots of room for additional features so feel free to suggest some! 💚
Masto Archive Tool v4!!!
I've updated the Masto Archive Tool to give basic stats on your top-5 most liked/boosted/mentioned users. Try it out and let me know how it works! 💚
Currently it only really works for Masto users — sorry Pleroma users, I'll get to you next, promise!!!
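For the curious, here's a rough Python sketch of how a top-5 "most mentioned" stat could be computed. It is not the tool's actual code (the tool runs in the browser). It assumes outbox.json is an ActivityStreams `OrderedCollection` whose `orderedItems` contain `Create` activities wrapping your posts, and that each post has a `tag` list of `Mention` objects. The function names `top_mentioned` and `load_outbox` are hypothetical.

```python
import json
from collections import Counter
from typing import Any, Dict, List, Tuple


def top_mentioned(outbox: Dict[str, Any], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n accounts you mention most often, with their counts."""
    counts = Counter()
    for item in outbox.get("orderedItems", []):
        # Assumption: only Create activities wrap your own posts; Announce (boost) items are skipped.
        if item.get("type") != "Create":
            continue
        post = item.get("object")
        if not isinstance(post, dict):
            continue
        for tag in post.get("tag", []):
            if tag.get("type") == "Mention":
                # Prefer the handle (e.g. "@someone@example.social"), else fall back to the actor URL.
                counts[tag.get("name") or tag.get("href", "")] += 1
    return counts.most_common(n)


def load_outbox(path: str) -> Dict[str, Any]:
    """Read an extracted outbox.json from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Usage would be something like `top_mentioned(load_outbox("outbox.json"))`. Most-boosted users could be tallied the same way over `Announce` items, but their `object` is usually just a URL, so attributing a boost to an account means parsing or resolving that URL first.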
@andi oh sweet, the ones i found for perusing my bofa archive were all p limited and bad. i'll have to check this out later
@anna It's suuuuuuuper basic, mainly just a tokenised search facility at this point. I cobbled it together in 15 minutes while waiting for stuff to compile at work, so feel free to suggest features and I'll add them once I'm home!
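Roughly, a tokenised search splits each toot into tokens, builds an inverted index from each token to the toots containing it, and intersects those sets for every query token. The tool itself leans on an in-browser ElasticSearch-style library; this Python sketch swaps in a hand-rolled index instead. `tokenise`, `build_index` and `search` are hypothetical names, and the HTML stripping is deliberately crude, on the assumption that toot content arrives as HTML.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TAG_RE = re.compile(r"<[^>]+>")  # crude: drop HTML tags from toot content
TOKEN_RE = re.compile(r"\w+")    # runs of Unicode word characters


def tokenise(text: str) -> List[str]:
    """Strip tags, lowercase, and split into word tokens."""
    return TOKEN_RE.findall(TAG_RE.sub(" ", text).lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: token -> IDs of the toots containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenise(text):
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """IDs of toots containing every query token (AND semantics)."""
    tokens = tokenise(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results
```

AND semantics narrows the results as you type more words; an OR variant would union the sets instead.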
Worth noting, Kate was extremely responsive when I filed a bug report, and now it works pretty much as expected for me
@ben Yeah, I'm tempted to extract outbox.json automatically, but I worry that loading an 80 MB tar file into the browser might not be great for stability... 🤷
@andi ehh, it's not being uploaded anywhere, so bandwidth isn't an issue, and I'm sure there are news websites with 80MB front pages at this point.
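On the 80 MB worry: a tar reader can walk the archive member by member and decode only outbox.json, leaving the media untouched. In the browser this would be JavaScript, so treat this Python version as an illustration of the idea, assuming the export is a (gzipped) tar containing a member named `outbox.json`. `read_outbox_from_archive` is a hypothetical name.

```python
import json
import tarfile
from typing import Any, Dict


def read_outbox_from_archive(archive_path: str) -> Dict[str, Any]:
    """Parse outbox.json straight out of a .tar/.tar.gz export without unpacking media."""
    # "r:*" lets tarfile detect the compression (gzip, bzip2, xz or none).
    with tarfile.open(archive_path, mode="r:*") as tar:
        for member in tar:
            # Match on the basename so "outbox.json" and "./outbox.json" both count.
            if member.isfile() and member.name.rsplit("/", 1)[-1] == "outbox.json":
                f = tar.extractfile(member)
                if f is None:
                    break
                with f:
                    # json.load accepts a binary file; the bytes are decoded as UTF-8.
                    return json.load(f)
    raise FileNotFoundError(f"no outbox.json in {archive_path}")
```

Gzip can't seek, so the archive is still decompressed sequentially up to that member, but only outbox.json itself is held in memory.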
@andi trying this out on the knzk archive i downloaded during our recent Times of Uncertainty and Woe and this whips ass
@fresh_newlook Glad you like it! It's really basic (though easy to extend, since it's just a notebook). I've put all of 15 minutes of effort into it and am happy to take feature requests.
@andi alright, first bug: searching for non-ASCII chars doesn't work, e.g. 🎵 (U+1F3B5)
They do appear in results if contained in a toot though
(don't you just love unicode? 😘 )
@eject Hmmm!!! That's probably a limitation of how the in-browser ElasticSearch implementation tokenises content. You'd probably get the same result by entering any single character?
Nothing stopping me from writing a UTF-16-aware tokeniser tho!! 💚
@andi if there are toots with that letter on its own (i.e. not part of a word) it will find them
that probably doesn't help much, but i thought it was worth pointing out
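One plausible culprit: a tokeniser that splits only on whitespace keeps "🎵" when it stands alone but buries it inside "song🎵" otherwise, which would match what @eject saw. On top of that, in JavaScript 🎵 (U+1F3B5) is two UTF-16 code units (a surrogate pair), which trips up naive per-character handling. Below is a sketch of a more Unicode-friendly tokeniser in Python, which indexes strings by code point: each non-word, non-space code point becomes its own token. `tokenise` and `utf16_units` are hypothetical helpers, not part of the tool.

```python
import re
from typing import List

# Words are runs of Unicode word characters; any other non-space code point
# (emoji, punctuation) becomes a token of its own, so "song🎵" -> ["song", "🎵"].
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenise(text: str) -> List[str]:
    # casefold() is a more aggressive lowercase, intended for caseless matching.
    return TOKEN_RE.findall(text.casefold())


def utf16_units(s: str) -> int:
    """Length as a JavaScript string would report it (UTF-16 code units)."""
    return len(s.encode("utf-16-le")) // 2
```

Multi-code-point emoji (ZWJ sequences, or ❤️ with its variation selector) would still be split into several tokens. Segmenting on grapheme clusters, for instance with `\X` in the third-party `regex` module, would handle those.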
@jackofallEves The fact that even doing that via the main interface is so difficult is kinda hilarious, ngl.
I think you might want to implement it using the zipped archive first, so you don't have to deal with the headache of finding where people extracted their files. The layout should usually be standard (a media_attachments folder sits in the same directory as outbox.json), but the common issue I see is someone extracting two archives into the same folder, which causes files to get renamed.
If you extract it to a temp directory, it'll be more consistent.
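Following that suggestion, here's a sketch of extracting the whole export into a scratch directory and locating outbox.json plus its sibling media_attachments folder. It assumes the layout described above. `extract_archive` and `_safe_members` are hypothetical names, and the member filter (skipping links and paths that would escape the destination) is a precaution added here, not something from the thread.

```python
import tarfile
from pathlib import Path
from typing import Iterator, Optional, Tuple


def _safe_members(tar: tarfile.TarFile, dest: Path) -> Iterator[tarfile.TarInfo]:
    """Yield only members that stay inside dest; skip links, '../x' and '/abs/x' paths."""
    root = dest.resolve()
    for member in tar.getmembers():
        if member.issym() or member.islnk():
            continue
        target = (dest / member.name).resolve()
        if target != root and root not in target.parents:
            continue
        yield member


def extract_archive(archive_path: str, dest: str) -> Tuple[Path, Optional[Path]]:
    """Extract into dest; return (path to outbox.json, media_attachments dir or None)."""
    dest_path = Path(dest)
    with tarfile.open(archive_path, mode="r:*") as tar:
        tar.extractall(dest_path, members=_safe_members(tar, dest_path))
    outbox = next(dest_path.rglob("outbox.json"), None)
    if outbox is None:
        raise FileNotFoundError("outbox.json not found in archive")
    # Assumption from the thread: media sits next to outbox.json.
    media = outbox.parent / "media_attachments"
    return outbox, (media if media.is_dir() else None)
```

Passing a fresh `tempfile.TemporaryDirectory()` as `dest` on each run avoids the renamed-files problem that comes from extracting two archives into one folder.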
That looks really cool. Yay more tooling!
Possibly related prior art: https://alexschroeder.ch/cgit/mastodon-archive/about/
@codesections Ooh, interesting, hadn't seen that before!
This is mainly just an interface that uses the Web File API to untar and create local file blobs from a Masto archive tgz file, which then becomes searchable via a client-side version of ElasticSearch. It's super basic at the moment!
I'm definitely wanting to improve upon it (possibly add some things like analytics, network diagrams, etc.) so please let me know if you have any suggestions!!
@andi hello 👋 idk if you're still taking suggestions, but...
would it be possible to just browse a list of one's own toots? i don't necessarily want to perform a search; i'd just like to scroll through mine
also does the json data contain how many boosts/likes the toots got? it would be fun to see and sort by that data if possible
this is such an awesome tool btw. thank you so much for publishing it 👏 💯 🏆
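Here's a sketch of both ideas: list your own toots newest-first for plain scrolling, and rank them by boosts. Whether and where boost counts appear in outbox.json depends on the Mastodon version, so the `shares.totalItems` lookup in `boost_count` is an assumption to check against your own export. All function names here are hypothetical.

```python
from typing import Any, Dict, List


def own_toots(outbox: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Your own posts (objects of Create activities), newest first."""
    posts = [
        item["object"]
        for item in outbox.get("orderedItems", [])
        if item.get("type") == "Create" and isinstance(item.get("object"), dict)
    ]
    # ISO 8601 timestamps in a consistent format sort correctly as plain strings.
    return sorted(posts, key=lambda p: p.get("published", ""), reverse=True)


def boost_count(post: Dict[str, Any]) -> int:
    """ASSUMPTION: boosts exposed as post["shares"]["totalItems"]; 0 when absent."""
    shares = post.get("shares")
    return shares.get("totalItems", 0) if isinstance(shares, dict) else 0


def most_boosted(outbox: Dict[str, Any], n: int = 10) -> List[Dict[str, Any]]:
    """The n most-boosted toots, highest count first."""
    return sorted(own_toots(outbox), key=boost_count, reverse=True)[:n]
```

Python's sort is stable, so toots with equal boost counts stay in newest-first order.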
@red I'm working on a better version called tootz.app — it will cost your firstborn to use* but will be amazing
* Will not actually require sacrificing first born to use
@red That's an excellent suggestion tho.
Boosts I have data on; favs are a joke in terms of the API, so I've no idea what those actually are.
@andi ooOOooh i am excited and intrigued! and thank you 🙏
before i found your app i was stressing about how to turn that mess of data into something useful, but you made it so easy. so convenient!!
mmmm... interesting. Thanks.
Makes me wonder why we don't have tools to explore our exported Mastodon archives.
A simple PHP or Python script could do the trick.
This is the personal instance of Andi N. Fiziks. Love me or hate me it's still an obsession 😘