LisaList2

Poll

How should we import the old mailing list?

Import all the old messages with fake user accounts and lock them
- 0 (0%)
Recreate all the users and allow them to change their password to reclaim them
- 3 (100%)

Total Members Voted: 3

Voting closed: February 02, 2019, 06:42:01 pm



Author Topic: How Should we import the old mailing list?  (Read 2154 times)

rayarachelian

  • Administrator
  • Sr. Member
  • *****
  • Karma: +14/-0
  • Offline
  • Posts: 324
  • "But what's puzzling you is the nature of my game"
    • LisaEm
How Should we import the old mailing list?
« on: January 19, 2019, 06:42:01 pm »

Unfortunately there's no existing software to import a mailing list archive into an SMF board, which means I have to write one. To import the posts, we first have to recreate the list's users on the board.

Do we want to recreate the old users using their actual email addresses, thus allowing them to reclaim their accounts (or, if they've already registered here, to have their old messages appear under their existing accounts)?

Or

Do we want to create fake invalid email addresses with real names and import the messages in that way?

Logged
Fate whispers to the warrior, 'You can not withstand the storm.'  The warrior whispers back, 'I am the storm.'

sigma7

  • Administrator
  • Full Member
  • *****
  • Karma: +22/-0
  • Offline
  • Posts: 27
  • Warning: Memory errors found. Verify comments.
Re: How Should we import the old mailing list?
« Reply #1 on: January 20, 2019, 02:59:08 am »

Another option is to make a single post for each subject thread, with individual messages separated therein (similar to a digest). This might be quicker to read/review compared to having to click through a bunch of individual postings, but would eliminate the potential for someone to claim their postings.

Presumably the archived messages won't be editable and will be locked to prevent replies, so the value in claiming them might be minimal.

If all messages from a specific email address were associated with a unique key (e.g. by obscuring some of the email address with a hash), then a user could put a comment in their profile saying that such-and-such a hash (perhaps more than one, if they used multiple email addresses) belongs to them.
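
A minimal sketch of the hashed-key idea in Python, assuming SHA-256 over the lowercased address and a 12-character key. The names `email_key` and `obscure` are hypothetical, not anything SMF provides.

```python
import hashlib


def email_key(address: str, length: int = 12) -> str:
    """Return a short, stable key derived from an email address."""
    # Normalize so "Foo@Example.com" and "foo@example.com" share a key.
    normalized = address.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return digest[:length]


def obscure(address: str) -> str:
    """Keep the first character and the domain, hide the rest behind the key."""
    local, _, domain = address.strip().partition("@")
    return f"{local[:1]}...{email_key(address)}@{domain}"
```

An unsalted hash of an email address can be guessed by hashing candidate addresses, so if the keys are shown publicly it would probably be worth mixing in a secret (for example with the standard `hmac` module).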
Logged
Warning: Memory errors found. ECC non-functional. Verify comments if accuracy is important to you.

rayarachelian

Re: How Should we import the old mailing list?
« Reply #2 on: January 20, 2019, 09:56:00 am »

Yes, that's the plan for the actual messages. This specific poll is about user names, however.

Breaking things up by thread is a problem with forum software because forums aren't threaded. You can't reply to a reply; you can only reply to the topic.

That is, a forum conversation is a chronological list rather than a tree, whereas on a mailing list it's a tree, and you can easily see whether a message is a reply to a specific reply or to the discussion as a whole.
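
To illustrate the difference, here is a small Python sketch that flattens a mailing-list reply tree into the chronological order a forum topic would show. The `Message` class and its fields are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    msg_id: str
    sent: int  # send time, e.g. a Unix timestamp
    replies: List["Message"] = field(default_factory=list)


def flatten(root: Message) -> List[Message]:
    """Walk the whole reply tree and return its messages oldest first."""
    found, stack = [], [root]
    while stack:
        msg = stack.pop()
        found.append(msg)
        stack.extend(msg.replies)
    return sorted(found, key=lambda m: m.sent)


# Example tree: "c" replies to "a", but was sent after "b".
c = Message("c", sent=4)
a = Message("a", sent=2, replies=[c])
b = Message("b", sent=3)
root = Message("root", sent=1, replies=[a, b])
print([m.msg_id for m in flatten(root)])
```

In the tree, "c" sits directly under "a"; in the flattened list it lands after "b", which is exactly the reply-to-a-reply context a forum topic loses.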
« Last Edit: January 20, 2019, 10:08:39 am by rayarachelian »
Logged
Fate whispers to the warrior, 'You can not withstand the storm.'  The warrior whispers back, 'I am the storm.'

D.Finni

  • Sr. Member
  • ****
  • Karma: +15/-0
  • Offline
  • Posts: 62
Re: How Should we import the old mailing list?
« Reply #3 on: January 20, 2019, 06:55:16 pm »

Quote
Do we want to create fake invalid email addresses with real names and import the messages in that way?
Since it's an email list, don't you already have their email addresses? Why invent fake addresses?

If we assign the messages to user accounts on SMF, that allows one to easily research post history; the list of all posts by a certain person.
Logged

rayarachelian

Re: How Should we import the old mailing list?
« Reply #4 on: January 20, 2019, 07:55:25 pm »

Quote
Since it's an email list, don't you already have their email addresses? Why invent fake addresses?

In order to import messages and insert posts by a specific user, that user has to be created in SMF, which means a user name and email address have to be pre-registered. Doing this before the user registers with SMF means that the user doesn't know the password to their own account. They'd need to use the Forgotten Password feature, which doesn't work yet but will soon.
It's possible that some users no longer control their old email addresses from the pre-Google Groups days because they switched ISPs, etc. So there is a bit of a security risk in case someone steals the old accounts, though it's unlikely: the attacker would need to control the domain, or register an email address that was previously used with an old ISP but no longer exists.
Maybe I'm being too cautious?
Quote
If we assign the messages to user accounts on SMF, that allows one to easily research post history; the list of all posts by a certain person.
Yeah, I agree with this. There's also the question of should we allow replies to the old messages or lock them down? Locking them down for historical purposes may be good, but for example, I have an issue with one of my Widgets that I'm trying to repair and there is an in-flight conversation on the old list that I'd like to continue.
So there are advantages and disadvantages to both paths. That's why I thought, let's set up a bunch of polls so the users themselves can decide, and we'll do whatever they pick.
Logged
Fate whispers to the warrior, 'You can not withstand the storm.'  The warrior whispers back, 'I am the storm.'

jamesdenton

  • Full Member
  • ***
  • Karma: +21/-0
  • Offline
  • Posts: 37
  • ArcaneByte
    • ArcaneByte
Re: How Should we import the old mailing list?
« Reply #5 on: January 24, 2019, 12:55:19 pm »

Quote
Do we want to create fake invalid email addresses with real names and import the messages in that way?
Since it's an email list, don't you already have their email addresses? Why invent fake addresses?

If we assign the messages to user accounts on SMF, that allows one to easily research post history; the list of all posts by a certain person.

Well, we don't really have the email addresses, as Google Groups obscures them in the headers. You can see this in Google Groups itself when clicking on 'Original' for a given message, and the same thing applies to the messages that were downloaded for archiving. The user names are not obscured, but the email addresses are. See:

From: James MacPhail <g...@sigmasevensystems.com>

JD
Logged

rayarachelian

Re: How Should we import the old mailing list?
« Reply #6 on: January 24, 2019, 02:13:41 pm »

Quote
we don't really have the email addresses, as Google Groups obscures them in the headers.
Actually, we do.

When you subscribe to a Google Group there's some option somewhere to have the messages mailed to you (or perhaps that's because I have Thunderbird pull emails from google over IMAP, etc.). If you subscribed on day one, you'll have all of them in your mailbox.
In my case, I have all of the Google Groups LisaList messages in a Thunderbird mailbox, which is a pretty standard format. Those have all the mail headers, including actual member email addresses and full names.

We won't have the email addresses of lurkers who never posted. Also, some email addresses may no longer be valid (domain change, moving from one ISP to another, unsubscribed and never resubscribed, etc.)
The mailbox can be parsed into a member name (from the full name) and an email address for each poster, and that can then be used to create an SMF account with a random password.
Here's the process:
  • extract all the email addresses and names, filtering out duplicates (e.g. | sort | uniq)
  • pre-create all the users if they don't already exist in SMF (via PHP) - first script
  • remove duplicate messages
  • optionally remove advertising signatures (the old LisaList had some signature that advertised various services)
  • for each thread, sort the mailbox by subject, find the original message that replies are attached to, create a new post - second script
  • add replies to the original message for each thread
It's mostly documented here: https://www.simplemachines.org/community/index.php?topic=564500.msg4002936#msg4002936
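
The address-extraction and thread-grouping steps above could be sketched in Python roughly like this. It assumes the messages come from something like the standard library's `mailbox.mbox(...)`, that addresses are compared case-insensitively, and that a thread is identified by its subject with any leading Re:/Fwd: prefixes stripped. The function names are hypothetical, and this is only an illustration, not the PHP scripts mentioned in the list.

```python
import re
from collections import defaultdict
from email.utils import parseaddr


def unique_senders(messages):
    """Map each lowercased address to the first display name seen for it."""
    senders = {}
    for msg in messages:
        name, addr = parseaddr(str(msg.get("From", "")))
        addr = addr.lower()
        if addr and addr not in senders:
            # Fall back to the local part when there is no display name.
            senders[addr] = name or addr.split("@")[0]
    return senders


# One or more leading "Re:", "Fw:" or "Fwd:" prefixes, in any letter case.
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def thread_key(subject):
    """Normalize a subject line so replies share a key with the original."""
    return _PREFIX.sub("", subject or "").strip().lower()


def group_by_subject(messages):
    """Collect messages into threads keyed by normalized subject."""
    threads = defaultdict(list)
    for msg in messages:
        threads[thread_key(str(msg.get("Subject", "")))].append(msg)
    return dict(threads)
```

Both functions accept any iterable of email message objects, so they can be tried on a handful of hand-built messages before being pointed at the full mailbox. Grouping purely by subject will merge unrelated threads that happen to share a title, so the result still needs a manual look.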
It's a bit more complicated than that, as the importer has to handle HTML posts (converting things like bold/italics, bullet lists, paragraphs, etc.), filter out anything that might be bad, and also parse any attachments and reattach them to the correct posts. The HTML stuff might be easily resolved by dumping the HTML to a file and then using lynx --dump file://filename.html to extract it as plaintext.
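
As an alternative to shelling out to lynx, a rough pure-Python sketch using the standard library's `html.parser` is below. It does not reproduce what lynx does; it is a simple stand-in that drops script/style content, turns block tags into line breaks and list items into "* " bullets. `TextExtractor` and `html_to_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    BLOCK_TAGS = {"p", "div", "br", "ul", "ol", "tr", "blockquote",
                  "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "li":
            self.parts.append("\n* ")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Convert an HTML fragment to plain text with simple line breaks."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Squeeze whitespace inside each line, then collapse runs of blank lines.
    lines = [" ".join(line.split()) for line in "".join(parser.parts).splitlines()]
    kept = []
    for line in lines:
        if line or (kept and kept[-1]):
            kept.append(line)
    return "\n".join(kept).strip()
```

This keeps everything inside Python, but it throws away bold/italics rather than converting them, so it only covers the "extract it as plaintext" part of the problem.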
« Last Edit: January 24, 2019, 02:37:02 pm by rayarachelian »
Logged
Fate whispers to the warrior, 'You can not withstand the storm.'  The warrior whispers back, 'I am the storm.'

jamesdenton

Re: How Should we import the old mailing list?
« Reply #7 on: January 24, 2019, 04:34:47 pm »

Quote
we don't really have the email addresses, as Google Groups obscures them in the headers.
Actually, we do.

When you subscribe to a Google Group there's some option somewhere to have the messages mailed to you (or perhaps that's because I have Thunderbird pull emails from google over IMAP, etc.). If you subscribed on day one, you'll have all of them in your mailbox.

Fair enough. I forgot to take you old timers into consideration. And we even talked about this at one point. My bad!
Logged

D.Finni

Re: How Should we import the old mailing list?
« Reply #8 on: January 29, 2019, 06:58:55 pm »

Quote
Yeah, I agree with this. There's also the question of should we allow replies to the old messages or lock them down? Locking them down for historical purposes may be good, but for example, I have an issue with one of my Widgets that I'm trying to repair and there is an in-flight conversation on the old list that I'd like to continue.
Agreed, I can see arguments for and against. Some threads are just old and dead forever. But others might stand being revived if new and relevant information comes to light. Perhaps this could be handled on a case-by-case basis?

Or perhaps there could be a warning message like, "This thread is over a year old, are you sure you want to reply?" when going to make a post to it.
Logged

rayarachelian

Re: How Should we import the old mailing list?
« Reply #9 on: January 30, 2019, 09:31:11 pm »

I'm not an expert on SMF, but I swear I saw an option that I can set here that expires a thread after a certain number of days, so yeah, we can prevent replies to really old threads. There's nothing stopping anyone from linking to old threads: you can just copy the permalink URL of an old post into a new message and say something like "I see in this thread that Woz said Blueboxes were really cool."  8)
We'll have to see what happens once I start to import these guys. One of my (few?) better decisions was to set all of this up in a bunch of Docker containers, so I can experiment on a simulacrum of the forum on my laptop before touching the live one. I hope I can set the date on the posts to the same date as the original message. If it sets it to the date of the import, that would suck. (And if it doesn't work right, I can trash the container and try again without messing up the live forum.)
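
On the post date: a sketch of turning a message's Date header into a Unix timestamp with Python's `email.utils`. It assumes the forum stores post times as Unix timestamps, which should be checked against the actual SMF schema; `poster_time` here is just the name of this hypothetical helper.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def poster_time(date_header, fallback=None):
    """Convert an RFC 2822 Date header to a Unix timestamp (seconds)."""
    try:
        dt = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Unparseable or missing dates: let the caller decide what to use.
        return fallback
    if dt.tzinfo is None:
        # A "-0000" zone yields a naive datetime; treat it as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())
```

Messages whose Date header can't be parsed come back as `fallback`, so they can be listed and fixed by hand instead of silently getting the import date.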
Also apologies for taking it slow on this. I was sick with the martian death flu most of last week, so didn't get anywhere near what I wanted to achieve.  :(
I did extract all the email addresses and looked over the mailbox, and it's quite the mess. I see a bunch of users on the old initial LisaList that re-signed up and posted with multiple accounts. So I'd need to manually find and replace the duplicates so that posts from the same user show up as from the same account and not from 3 or 4 different ones.

This will need edits to the mailbox file too (i.e. some folks had either their own domain or emails with their ISPs, then changed ISPs, and then switched yet again to another, so 3-4 accounts over 19 years). And there's a lot of craziness over UTF-8 stuff.
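
One way to handle both problems, sketched in Python: decode RFC 2047 encoded display names with `email.header`, and map every old address to one canonical address through a hand-maintained alias table. The addresses in `ALIASES` are placeholders, and `canonical_sender` is a hypothetical helper.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr

# Hypothetical alias table: old address -> the address the account should use.
ALIASES = {
    "someone@old-isp.example": "someone@current.example",
    "someone@older-isp.example": "someone@current.example",
}


def decode(value):
    """Decode an RFC 2047 encoded header value (e.g. =?utf-8?q?...?=)."""
    return str(make_header(decode_header(value)))


def canonical_sender(from_header, aliases=ALIASES):
    """Return (display name, canonical address) for a From header."""
    # Split first, then decode only the name, so decoded punctuation
    # cannot confuse the address parser.
    name, addr = parseaddr(from_header)
    addr = addr.lower()
    return decode(name), aliases.get(addr, addr)
```

With a table like this the mailbox file itself could stay untouched, and a wrong merge is fixed by editing one line of the table and re-running the import in the test container. Headers that declare an unknown or wrong character set may still fail to decode and would need separate handling.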
So while I can do it the cheap and quick way, I think that's a bad idea. I think this needs to be done right because otherwise we won't be able to undo it without having to write SQL statements.
Logged
Fate whispers to the warrior, 'You can not withstand the storm.'  The warrior whispers back, 'I am the storm.'