I've made a really simple tool for searching your outbox.json (Mastodon archive) file. It's all browser-based and all data stays on your computer — it's not uploaded to the cloud!
There's lots of room for additional features so feel free to suggest some! 💚
@andi oh sweet the ones i found for perusing my bofa archive were all p limited and bad ill have to check this out later
@anna It's suuuuuuuper basic, mainly just a tokenised search facility at this point. I've cobbled it together in 15 minutes while waiting for stuff to compile at work so feel free to suggest features — will add them once home!
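For anyone wondering what a tokenised search over outbox.json might look like, here is a minimal sketch. It is not the notebook's actual code. It assumes the usual ActivityPub shape of a Mastodon export (an `orderedItems` array whose `Create` activities carry an HTML `content` field), and the names `tokenise`, `buildIndex` and `search` are made up for illustration. Note that this deliberately naive tokeniser keeps only ASCII letters and digits.

```typescript
// Minimal shape of the parts of outbox.json this sketch reads (assumed, not exhaustive).
interface Activity {
  type: string;
  // Boosts ("Announce") carry a plain URL string here; own toots carry an object.
  object: string | { content?: string };
}

interface Outbox {
  orderedItems: Activity[];
}

// Strip HTML tags, lowercase, and split into ASCII word tokens.
function tokenise(html: string): string[] {
  const text = html.replace(/<[^>]*>/g, " ").toLowerCase();
  return text.split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Build an inverted index: token -> positions in orderedItems that contain it.
function buildIndex(outbox: Outbox): Map<string, Set<number>> {
  const index = new Map<string, Set<number>>();
  outbox.orderedItems.forEach((activity, i) => {
    if (activity.type !== "Create" || typeof activity.object === "string") return;
    for (const token of tokenise(activity.object.content ?? "")) {
      if (!index.has(token)) index.set(token, new Set<number>());
      index.get(token)!.add(i);
    }
  });
  return index;
}

// Return the positions of toots that contain every token in the query.
function search(index: Map<string, Set<number>>, query: string): number[] {
  const tokens = tokenise(query);
  if (tokens.length === 0) return [];
  let hits = Array.from(index.get(tokens[0]) ?? new Set<number>());
  for (const token of tokens.slice(1)) {
    const postings = index.get(token) ?? new Set<number>();
    hits = hits.filter((i) => postings.has(i));
  }
  return hits.sort((a, b) => a - b);
}
```

Building the index once and intersecting posting sets per query keeps each search cheap even for a large archive.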
Worth noting, Kate was extremely responsive when I filed a bug report, and now it works pretty much as expected for me
@andi !!!!! wow! ✨
@andi you super cool, andi
@ben Yeah, I'm tempted to automatically extract outbox.json but I worry that loading an 80MB tar file into the browser might not be the best for stability... 🤷
@andi ehh, it's not being uploaded anywhere, so bandwidth isn't an issue, and I'm sure there are news websites with 80MB front pages at this point.
@ben Huh, true... I need to walk home (IN THE FREEZING RAIN 😭 ) but will do that once there!
@ben Wooooo I did it 🎉
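For the curious, pulling outbox.json out of the archive in the browser could look roughly like the sketch below. It is not necessarily how the tool does it. It assumes an already decompressed tar buffer (a gzipped archive would need gunzipping first, for instance with the browser's `DecompressionStream`), it ignores the ustar `prefix` field used for very long paths, and `findInTar` and `readField` are hypothetical helper names.

```typescript
const BLOCK = 512; // tar headers and data are laid out in 512-byte blocks

// Decode a NUL-terminated ASCII field from a tar header.
function readField(buf: Uint8Array, start: number, length: number): string {
  let out = "";
  for (let i = start; i < start + length && buf[i] !== 0; i++) {
    out += String.fromCharCode(buf[i]);
  }
  return out;
}

// Scan an uncompressed tar buffer and return the bytes of the first entry
// whose path is `wanted` or ends with "/" + wanted, or null if there is none.
function findInTar(tar: Uint8Array, wanted: string): Uint8Array | null {
  let offset = 0;
  while (offset + BLOCK <= tar.length) {
    const name = readField(tar, offset, 100); // file name: bytes 0..99
    if (name === "") break; // an all-zero block marks the end of the archive
    // File size: bytes 124..135, stored as an octal ASCII string.
    const size = parseInt(readField(tar, offset + 124, 12).trim(), 8) || 0;
    const dataStart = offset + BLOCK;
    if (name === wanted || name.endsWith("/" + wanted)) {
      return tar.subarray(dataStart, dataStart + size);
    }
    // File data is padded up to the next 512-byte boundary.
    offset = dataStart + Math.ceil(size / BLOCK) * BLOCK;
  }
  return null;
}
```

The returned bytes can then be turned into an object with `JSON.parse(new TextDecoder().decode(bytes))`. Since `subarray` returns a view rather than a copy, the 80MB buffer is not duplicated.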
@andi trying this out on the knzk archive i downloaded during our recent Times of Uncertainty and Woe and this whips ass
@fresh_newlook Glad you like it! It's really basic (but also easy to extend — it's just a notebook) but I've put all of 15 minutes of effort into it and am happy to take feature requests.
Cool! I'll be sure to break it for you later 😈
@eject AHAHAHA wouldn't have it any other way!!! 😘 💚💚💚
@andi alright, first bug: doesn't work with non-ASCII chars, e.g. 🎵 (U+1F3B5)
They do appear in results if contained in a toot though
(don't you just love unicode? 😘 )
@eject Hmmm!!! That's probably a limitation with how the in-browser ElasticSearch implementation tokenises content, you'd probably get the same result by entering one character?
Nothing stopping me from writing a UTF-16 aware tokeniser tho!! 💚
@andi if there are toots with that letter on its own (i.e. not part of a word) it will find them
that probably doesn't help much, but i thought it was worth pointing out
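The behaviour described above (an emoji is only found when it stands on its own) is what you'd expect from a tokeniser that splits on whitespace only. As a rough sketch of a code-point-aware alternative, assuming nothing about the library the tool actually uses, the hypothetical `tokeniseUnicode` below treats runs of letters or digits in any script as words and each pictographic symbol as its own token. Compound emoji (skin tones, ZWJ sequences) would still be split into their parts by this version.

```typescript
// Either a run of letters/digits from any script, or a single pictographic
// symbol such as an emoji. The "u" flag makes the regex work on code points,
// so characters outside the BMP like U+1F3B5 are not cut into surrogate halves.
// Built with the RegExp constructor so it does not depend on the compile target.
const TOKEN = new RegExp("[\\p{L}\\p{N}]+|\\p{Extended_Pictographic}", "gu");

function tokeniseUnicode(text: string): string[] {
  // match() with a global regex returns every match, or null if there are none.
  return text.toLowerCase().match(TOKEN) ?? [];
}
```

With this, a toot like `la🎵la` produces a separate `🎵` token, so searching for the emoji would also hit toots where it is glued to a word.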
@eject Ah cheers! I think I also really need to debounce that input 😅
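Debouncing the search box means waiting until typing pauses before running the query, instead of searching on every keystroke. A minimal sketch, where `debounce` is a hand-rolled illustrative helper and the wait time is an arbitrary choice:

```typescript
// Wrap `fn` so that it only runs once `waitMs` milliseconds have passed
// without another call; only the arguments of the latest call are used.
function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const debounced = (...args: A): void => {
    // A new call replaces any run that is still pending.
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };

  // Drop a pending run, e.g. when the search box is cleared or removed.
  debounced.cancel = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = undefined;
  };

  return debounced;
}
```

In a page this would wrap the handler passed to the input's `input` event listener, so typing "parrot" triggers one search rather than six.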
@andi this is awesome!
@alana_is_tooting Thanks!! 💚💚💚 Do let me know if you have any feature ideas!! 😘
why are you so perfect
@root Hehe 😊 *blushes*
@andi How do you do so much
@lewdmood Wasted childhood? 🤷🏼‍♀️
@lewdmood Running on an average of 4h sleep per night probably also has something to do with it, now that I think about it...
@andi That’s what crossed my mind…
@andi I'm going to steal that lovely heart. Any objections/req's for credit if you're the author or know of them? :3
@ella_kane All of my emoji are totally open for stealing, go right ahead! Most of them are just standard emoji I put through this thing I made: https://beta.observablehq.com/@nuklearfiziks/partyizer-online
(Though the heart one I painstakingly made in Gimp before I built that tool, lol. I think the colourisation turned out better on it, ngl)
@ella_kane No particular requirement for attribution, though I may submit the heart myself to the cultofthepartyparrot.com collection at some point.
@andi that's awesome! Thank youuuuu!!! <3
@ella_kane No problem whatsoever!! Thanks for spreading the flashy psychedelic looooove!!!! 😊
@andi thanks for enabling it! :D
@andi oh sweet thisll help me find my first ever post
@jackofallEves The fact even doing that via the main interface is so difficult is kinda hilarious, ngl.
@andi so i found it and its even worse than i thought itd be
I think you might want to implement it using the zipped archive first, so you don't have to deal with the headache of finding where people extracted the files. The layout should usually be the standard one (a media_attachments folder in the same directory as outbox.json), but the common issue I see is someone extracting 2 archives into the same folder, which causes things to get renamed.
If you extract it to a temp directory it'll be more consistent
This is the personal instance of Andi N. Fiziks. Love me or hate me it's still an obsession 😘