Subject: Spreadsheet archives are now updated. (nm)
Author:
Posted on: 2017-07-28 10:42:00 UTC
-
Altchives updated by
on 2017-07-20 05:20:00 UTC
Last night, I updated the Board altchives.
This year's update includes:
- The Board archive updated through last night (the Board currently goes back to somewhere in October 2015)
- All of the old Other Board
- All of T-Board from the beginning of 2016 (the post-New Year's Board outage) to now
- Fixes (applied retroactively) for the Cyrillic text problem (aka the one where, because of a typo, I declared the Board was often in Windows-1251 instead of the correct Windows-1252). This means that the mini-Missingno Pokйmon has gone off to live on a farm somewhere.
- Similar fixes to escape a few characters that T-Board considers formatting, like ^. This means some people's emoticons are no longer floating in mid-air.
- Very importantly, link maps! These .csv files (.csv because I haven't figured out a better place to put them yet; suggestions welcome!) allow you to convert between Board (or Other Board, and so on) links and archive links. However, this data can't be recovered for the older scrapes, so it is only available starting in October 2015.
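To make the Windows-1251 vs. Windows-1252 mix-up from the list above concrete, here is a minimal Ruby sketch. The helper names (decode_as, repair) are made up for this illustration, and the repair step is only one plausible way such a retroactive fix could work, not necessarily what was run against the archives.

```ruby
# Byte 0xE9 means "é" in Windows-1252 but "й" in Windows-1251.
raw = "Pok\xE9mon".b  # illustrative raw bytes, tagged as binary

# Hypothetical helper: label the bytes with an encoding, then transcode to UTF-8.
def decode_as(bytes, encoding_name)
  bytes.dup.force_encoding(encoding_name).encode("UTF-8")
end

# Hypothetical repair: turn mis-decoded text back into its original bytes,
# then decode those bytes with the encoding that should have been used.
def repair(text)
  text.encode("Windows-1251").force_encoding("Windows-1252").encode("UTF-8")
end

wrong = decode_as(raw, "Windows-1251")  # what the typo produced
right = decode_as(raw, "Windows-1252")  # what was intended

puts wrong          # the Cyrillic-flavoured version
puts right          # the intended text
puts repair(wrong)  # should match the intended text
```

A round trip like this only works when every character in the damaged text exists in Windows-1251 and every resulting byte is defined in Windows-1252; anything else would raise a conversion error and need handling by hand.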
Because of the updates and changes (especially the attempted data corruption fix), could anyone who's been keeping copies of the archives (or anyone who wants to start doing that) let me know how they want their new versions delivered? (Honestly, I've forgotten who's part of my distributed backup system, which is why I'm asking here.) Similarly, hS, how would you like to get the new versions of the spreadsheets?
- Tomash -
Oh, wonderful. by
on 2017-07-20 10:03:00 UTC
Um, have you put the Other Board and T-Board into the same list? As in, in-line with the Board? That seems like an odd decision to me.
I forget how you gave me the spreadsheets before - it must have been either straight email or by upload to Google Drive. But to be honest anything works - I'm sure you have a preferred file transfer setup. :) You have my email.
hS -
Yep, they're all interleaved by
on 2017-07-20 17:02:00 UTC
The main reason is the internal architecture of the archives and T-Board. Specifically, the way the archival process works is that various bits of scraping code (the script that hits a YWA board and the function that dumps segments of a T-Board) output the contents of their respective post-containing thing in a specific format (a Ruby Marshal dump of lists of hashes and lists). Then, there's another bit of code that takes such an archive file and posts it all to a T-Board instance. In order to make it easy to distinguish the archives from everything else, all archived posts are posted by the user "Archive Script". (This started as a workaround for needing the posts to be posted by some user account, and turned out to be useful.) Finally, there's a series of commands and scripts (which I really need to chain together in a more automated way) that'll get T-Board to output the posted archives into the format everyone sees (and a different, much smaller, set to generate the spreadsheets).
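A rough sketch of the dump-then-repost flow described above, assuming a made-up post structure: the field names (subject, author, body, replies) and the repost helper are hypothetical stand-ins, and the "posting" step here just collects rows instead of talking to a real T-Board instance.

```ruby
# Hypothetical scraped thread: a list of hashes, each holding a list of replies.
thread = [
  { subject: "Hello", author: "Alice", body: "First post", replies: [
    { subject: "Re: Hello", author: "Bob", body: "A reply", replies: [] }
  ] }
]

# Scraper side: serialize the structure. A real script would write this
# binary string to a file, e.g. with File.binwrite.
dumped = Marshal.dump(thread)

# Poster side: read the archive back. Marshal.load should only ever be
# used on data you produced yourself, never on untrusted input.
loaded = Marshal.load(dumped)

# Stand-in for posting to T-Board: walk the reply tree and record every
# post as authored by the given archive account, keeping the original
# author alongside it.
def repost(posts, poster, rows = [], depth = 0)
  posts.each do |post|
    rows << { posted_by: poster, original_author: post[:author],
              subject: post[:subject], depth: depth }
    repost(post[:replies], poster, rows, depth + 1)
  end
  rows
end

archive_user = "Archive Script"
rows = repost(loaded, archive_user)
rows.each { |row| puts "#{'  ' * row[:depth]}#{row[:subject]} (#{row[:posted_by]})" }
```

Because every row carries the same posted_by value, filtering the archive back out of a mixed board is a single equality check, which is the convenience the single-user approach buys.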
The reason there isn't anything more principled to store multiple archives is that the current system works and I'm lazy.
This system also has the advantage of making it easier to compute statistics over all our various boards. For example, here's some information about the number of posts per day we make:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.00   22.00   37.00   43.05   56.00  317.00
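For anyone wanting to reproduce a summary like this, here is a small Ruby sketch under some assumptions: the sample dates are invented, the helper names are hypothetical, and the quartiles use linear interpolation between neighbouring values (the default in R's quantile function, which may or may not be what produced the table above).

```ruby
require "date"

# Count posts for every calendar day between the first and last post,
# including days with no posts at all (these contribute zeros).
def posts_per_day(dates)
  counts = Hash.new(0)
  dates.each { |day| counts[day] += 1 }
  (dates.min..dates.max).map { |day| counts[day] }
end

# Quantile by linear interpolation between the two nearest sorted values.
def quantile(sorted, p)
  h = (sorted.length - 1) * p
  lo = h.floor
  hi = h.ceil
  sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo])
end

def summarize(counts)
  sorted = counts.sort
  {
    min: sorted.first,
    q1: quantile(sorted, 0.25),
    median: quantile(sorted, 0.5),
    mean: sorted.sum.to_f / sorted.length,
    q3: quantile(sorted, 0.75),
    max: sorted.last
  }
end

# Invented sample: three posts, one post, a silent day, then two posts.
sample_dates = %w[2017-07-01 2017-07-01 2017-07-01
                  2017-07-02 2017-07-04 2017-07-04].map { |s| Date.parse(s) }

daily = posts_per_day(sample_dates)
stats = summarize(daily)
puts stats
```

Filling in the silent days matters: a minimum of 0.00 only shows up if days without any posts are counted rather than skipped.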
- Tomash, who'd do more stats but needs to go job -
Spreadsheet archives are now updated. (nm) by
on 2017-07-28 10:42:00 UTC