The World of Â

For technical issues, problems, bugs, suggestions on improving these forums, discussion of the rules, etc.
Mr. Oragahn
Admiral
Posts: 6865
Joined: Sun Dec 03, 2006 11:58 am
Location: Paradise Mountain

The World of Â

Post by Mr. Oragahn » Sat Mar 06, 2010 12:54 am

Yo. I've been rereading old topics these last few days and I noticed that, for some reason, many posts, probably predating the second-to-last update, have been butchered with those invasive Â and perhaps some other mystical symbols.
Isn't there a script somewhere on the Internet to clean up all posts automatically, something that would correct or remove those Â?
I suppose they can be safely removed, since I don't think any of us has used Â even once in our posts.

Jedi Master Spock
Site Admin
Posts: 2164
Joined: Mon Aug 14, 2006 8:26 pm

Re: The World of Â

Post by Jedi Master Spock » Tue May 04, 2010 6:57 pm

What happened is that the character conversion, when we went from PHPBB2 to PHPBB3, butchered some "special" characters and the stuff surrounding them. I tried to fix as much of it as possible and tried a few different conversion techniques, but regardless of which "fix" I used, some of the special characters the old board software supported got converted incorrectly.

In some of the older attempts at conversion, entire paragraphs were replaced by strings of special characters; what you see is the least damaging conversion I could find. I tried to spot and replace anything missing, but there really are too many of them for it to be worth the time.

If there's a specific post that you'd like to clean or restore, especially if you think some material has been lost, indicate it in this thread and I'll know what to look for in the old database backups.

If those characters are cropping up in posts made after the major 2=>3 update (that is to say, when the skins changed), that's a sign that the database is becoming corrupted, which would be a bad thing.

Mr. Oragahn
Admiral
Posts: 6865
Joined: Sun Dec 03, 2006 11:58 am
Location: Paradise Mountain

Re: The World of Â

Post by Mr. Oragahn » Tue May 04, 2010 9:25 pm

Bit late on that, s'rry.
Only the old posts are affected. It's very weird though: aside from ² or ³ I rarely used anything else, not even a good umlaut.

However it seems some " have been hit hard. It's quite a weird bug. I didn't know that all the text would be reinterpreted during an update. I thought text was stored in sort of immutable blocks and only the interface and functions would change.
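
For what it's worth, a stray Â next to ² or ³ is what typically shows up when text stored as UTF-8 is read back as Latin-1 (ISO-8859-1). Whether that is exactly what the PHPBB2 to PHPBB3 conversion did here is an assumption; the thread does not confirm it. A minimal Python sketch of the effect (the function name is made up for illustration):

```python
def misread_utf8_as_latin1(text):
    """Encode text as UTF-8, then decode those bytes with the wrong charset (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")


# "\u00a0" is a non-breaking space, which word processors often insert.
for sample in ["²", "³", "é", "\u00a0"]:
    # repr() makes invisible characters such as the non-breaking space visible.
    print(repr(sample), "->", repr(misread_utf8_as_latin1(sample)))
```

Each of these characters takes two bytes in UTF-8, and for ², ³ and the non-breaking space the first byte is 0xC2, which Latin-1 displays as Â. That would explain why the Â turns up even in posts whose authors never typed one.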

Mr. Oragahn
Admiral
Posts: 6865
Joined: Sun Dec 03, 2006 11:58 am
Location: Paradise Mountain

Re: The World of Â

Post by Mr. Oragahn » Sat May 08, 2010 9:44 pm

I have an example here:

http://www.starfleetjedi.net/forum/view ... 181#p14181

Here is the list of letters and symbols that haven't made it through (RIP):

é
'
-

The bugs might have occurred when those characters were found next to other letters.
It's really weird that those didn't get recognized during the update. Not all posts seem to have been affected, btw. If you can't even trust a template update, I can only begin to imagine what a nightmare it is to administer a huge website.

Here's another one:
http://www.starfleetjedi.net/forum/view ... 859#p14859

I don't know what WILGA used here, perhaps "

Something similar here:

http://www.starfleetjedi.net/forum/view ... 858#p14858

There may be some randomness in how often the bug appeared, but when it did, it followed a pattern: each group of weird letters can be deciphered back to the original character, and the same group always seems to map to the same result.
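
A consistent one-to-one pattern like that is what you would expect if the damage is a plain charset mix-up, and in that case it can be undone mechanically. The sketch below assumes the text was UTF-8 that got read as Latin-1, which is only a guess about this board; `repair_mojibake` is a hypothetical helper, not part of any forum software.

```python
def repair_mojibake(text):
    """Reverse a UTF-8-read-as-Latin-1 mix-up.

    If the text does not fit that pattern, return it unchanged
    rather than risk making it worse.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair_mojibake("Ã©"))            # a damaged "é"
print(repair_mojibake("Â²"))            # a damaged "²"
print(repair_mojibake("café"))          # already fine, left alone
```

Note that this works on a whole string at once, so a post mixing damaged and intact accented characters would be left untouched; such posts would need to be handled piece by piece.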
Last edited by Jedi Master Spock on Sun May 09, 2010 1:34 am, edited 1 time in total.
Reason: Removed SIDs from URLs

Jedi Master Spock
Site Admin
Posts: 2164
Joined: Mon Aug 14, 2006 8:26 pm

Re: The World of Â

Post by Jedi Master Spock » Sun May 09, 2010 1:45 am

(I've removed the SIDs from the URLs you posted. You generally don't want to post URLs including SIDs.)

One common element to many affected posts seemed to be that they were written in some other text editor, and then copy-pasted back onto the board. (One of the major offenders was a quotation symbol.) In some cases, the "normal" corresponding symbols made it through intact.

Doing a find-and-replace on the database can be a little bit of a pain, but it's doable if the symbol sequences don't overlap each other too much. IIRC, I did it for the ones I'd noticed back at the time of the conversion.
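
The overlap problem mentioned above can be sketched in Python: if one damaged sequence is a prefix of another, replacing the longer ones first keeps the shorter rule from chewing them up. The replacement table here is purely illustrative, not the list actually used on this board, and `clean_post` is a made-up name.

```python
# Hypothetical table: damaged sequence -> intended text.
REPLACEMENTS = {
    "Â": "",      # lone stray character, simply dropped
    "Â²": "²",
    "Â³": "³",
    "Ã©": "é",
}


def clean_post(text, table=REPLACEMENTS):
    """Apply every replacement, longest damaged sequence first."""
    # Sorting by length (descending) ensures "Â²" is fixed before the bare "Â" rule runs.
    for bad in sorted(table, key=len, reverse=True):
        text = text.replace(bad, table[bad])
    return text


print(clean_post("E = mcÂ² in a cafÃ©"))
```

Run against a copy of the post text first; a plain find-and-replace has no way of telling a stray Â from one somebody typed on purpose.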

Mr. Oragahn
Admiral
Posts: 6865
Joined: Sun Dec 03, 2006 11:58 am
Location: Paradise Mountain

Re: The World of Â

Post by Mr. Oragahn » Sun May 09, 2010 3:10 am

So an outside text editor does attach some invisible data to the text.
I thought the phpbb system automatically filtered all bizarre symbols and letters.
If there's no automatic process to clean up all the database's posts, don't bother. It doesn't make the posts unreadable.
