Unix and Linux Help, Resources and information for Unix/Linux, Mac OS X. Articles on blogging, web site mechanics, and self employment. Mostly techy, Unix/Linux related, but we don't really try to stay tightly focused. If you've never been here before, there's a lot to explore.
For the past week I've been playing Hold-em on Facebook. Let's make one thing perfectly clear immediately: this is nothing at all like playing poker with real money.
As proof, consider that I, a moderately (and only moderately) skillful player, began the week with a $1,000 play money bankroll. As I write this, my fake bankroll stands at $423,027. If you seriously think that anyone, never mind me, could accomplish that feat in the real world, well, I'd like to sit down and play poker with you. I'm sure we'd both enjoy it.
Anyway, even though Facebook Texas Hold-em is full of "Bingo" players (people who make ridiculous bets with any cards at all) there seem to be enough serious players there that you can (with a little effort) enjoy a good game now and then.
The path to a good game requires getting rid of the Bingo players. Interestingly, it's not that hard to do. For those who don't understand the game, in Texas Hold-em you are dealt two cards. Everyone bets based upon what they now have (or would like other players to think they have). In a real game, a typical bet at this point would usually be no more than three or four times whatever the minimum bet is. The reason is that you really don't know what you have - there are more cards to come. A pair of Aces in your hand is certainly good, but it's hardly a guaranteed win. In a tournament, a player might push in all their chips with Aces if the situation is ripe for that play - if it's late in the tourney, if your chips dominate the other players, if you are playing in "late position" (being the last or close to the last person to bet).
In a cash game (not a tournament), it would be extremely unusual to risk everything you have on one bet before seeing more cards. But that's just what the Bingo players do, and they don't wait for Aces to do it - they'll often just push in chips no matter what their cards are.
There's no point in complaining. I made a mildly sarcastic comment once and got the retort "It's called gambling, dickhead!". OK, yes, it is a type of gambling. It's not poker.
However, if there are at least a few of you at the table who would like to play poker as though the play money really meant something, there is a way to get rid of the Bingo players. You simply ignore them.
That is, if they bet their stack pre-flop (before the remaining cards are dealt), you simply fold. You fold even if you are holding Aces yourself, you fold even if you have already called smaller raises before the Bingo players jump in.
The other Bingo players will call the bets. It doesn't matter very much what cards they hold - they are in it for the excitement of a giant pot. They'll keep playing like this until they wipe each other out.
However, if you and just a few other players refuse to call ridiculous pre-flop bets, there is no excitement. As the other Bingo players are wiped out, there will be more serious players than Bingo players and the only time the Bingo folks can get any action is post-flop. Even then, you don't play to their silliness unless you have the "nuts" (the best possible hand) or close to it. Otherwise, you just fold - ignoring pot odds and implied odds and not caring at all about whatever money you are throwing away. Your purpose at this point is to get rid of the Bingo folks so you don't call their bets unless you are very sure to win.
This strategy works. Of course new players will join the table, and some will be more Bingo players, but if you all stick to the strategy, they'll soon either be bored or broke, leaving the rest of you to enjoy a good game of poker.
You'll also acquire a big pile of fake money earned from the Bingo players foolish enough to push all in with their post-flop pair of Kings against your made full house or Ace high flush. That's where my $400,000 came from - playing against the "real" players probably gave me much less - if anything at all. Those were the games that were fun, though, where skill, cunning and a little luck are what you and your opponents are using. With serious players, you can even bluff now and then - something you can never do against a Bingo player. At times, you might even forget that it isn't real money - well, until it's time to stand up and you realize you are not going home with several hundred thousand dollars. Oh well: I'm having fun - that's what matters.
I do wish they had a "post and fold" option so that you could leave your seat for a few minutes and not lose it. It would also be nice to be able to set your "Raise" button to some multiple of the minimum bet and have it stay there until you change it. There are other minor interface annoyances, but overall I do find this enjoyable.
/Misc/fbpoker.html copyright and reprint notice
The iPad is still not much more than pages on Apple's web site, but already some folks are telling us that it's unimportant, a bust, a no-show, insufficient, ill-conceived and all that. That some of those nay-sayers cast similar barbs at the iPhone could be amusing, but I wouldn't argue against most of the complaints: they are absolutely correct that Apple's new device has warts.
And they are absolutely wrong that it will be a failure.
The iPad is a game changer. The people carping about its defects are missing the bigger picture - devices like this will ultimately change the way we use our computers.
Consider the form factor for a moment. No, the iPad doesn't roll up to fit in your pocket (though some future device like this might). But at 9.7 inches diagonal and with 1024x768 resolution, it is big enough and sharp enough for pictures, TV and movies and, of course, books. Yeah, yeah, E-ink is "better" for books, but that misses the big picture - the iPad has books AND everything else.
It also has apps. The existing 140,000 iPhone apps and many more to come that will take advantage of the larger screen real estate. Among those apps are a few that implement some very important three letter acronyms: RDP, VNC and SSH.
Those three will make the iPad the perfect choice for technology workers. Set your iPad in its keyboard dock and connect to the company server. But what most everyone is missing is that you'll probably end up using this the same way at home. You may very well have a Mac or a Windows machine packed with ram and disk storage and you may be accustomed now to sitting down at its keyboard. One day you'll realize that you can use the iPad as a client. Maybe at first you only use it that way now and then - when you are sitting on the couch or enjoying your patio or porch. But as you realize how convenient that is, you might start doing it more and more.
By that time, the iPad will have probably insinuated itself into your life in other ways. You'll probably be using it to control your TV and associated devices - remote controls are horridly primitive, aren't they? You'll have started using Google Voice and Gizmo and you may be wondering if you still need a cell phone at all. The iPad has become your constant companion - you put it down on your bedside table before you nod off to sleep and it wakes you up in the morning. You may have even fumbled for it in the wee hours of the morning to record some Very Important Thought that raised you from your dreams.
Of course you use it for your calendar, your music, your movies, your books, probably your newspaper. It's your photo album, your email client, your gaming console, your web browser, your journal, your spreadsheets, your private movie theatre, your work.. your life.
It's not perfect. There are things you want: maybe a camera, maybe other things. Those will come, but you have so much now that those gripes seem almost unimportant.
Not perfect, but it IS magical. It IS a game changer. Microsoft already hates it and will be spreading all the negative FUD it can. Cell phone makers might join in as people start switching to Google and Gizmo. There will be people who insist they don't need it, don't want it. They'll be lying to themselves.
/MacOSX/ipad.html copyright and reprint notice
I was recently contracted to help another consultant sniff a customer's network for suspicious activity. The situation was that the customer had been put on blacklists because some internal machine had apparently been compromised and was sending out spam.
Obviously the first task was to find and clean up any infected machines. The consultant contracted that out to someone else who updated virus software and ran scans. Unfortunately, that person didn't provide details of his work - he just reported that he had found and fixed "some problems". This didn't leave anyone feeling confident that the problem had actually been dealt with.
I pointed out that, if possible, all machines other than the internal mailserver should be blocked from sending email (other than to the internal mailserver, of course). Ideally, they should be locked down to only whatever outgoing ports are absolutely necessary, but blocking 25 and 465 is a good start. That was done, but my contact still wanted to know how to sniff what is actually happening on the network.
I had him buy a DualComm port mirroring switch and arranged to meet him at the customer site. The DualComm is an inexpensive 5 port, USB powered switch that, by default, mirrors port 1 to port 5. It's small enough to keep in your laptop bag, cheap enough that you can leave it at a customer site and the USB power means one less outlet to hunt for. The default port mirroring makes this ideal for LAN sniffing.
Because the consultant wanted to use Windows, I brought a Windows laptop with Windump installed. Windump is just tcpdump so that makes it easy for me and it also means that he can search for tcpdump tutorials and learn more about its usage.
Both Linux and MacOSX users have tcpdump installed by default. Personally, I'd much rather carry a Mac or Linux laptop for this kind of work as there are many other tools that Windows doesn't bother to include. But this consultant was more comfortable with Windows, so that's what we did.
On site, I connected my laptop to port 5 of his DualComm, took the patch cord that went to the ISP's router and put it in port 2, and then ran port 1 back to the customer's switch where I had unplugged the router cable. I started up a CMD window and showed him that we could do things like
That showed traffic to and from the internal Kerio mailserver as we'd expect. I then stopped the mailserver and all Windump output ceased. We watched for a few minutes, saw nothing, and turned the mailserver back on. I showed him that the Kerio admin "Active Connections" under Status should match the IP's we were seeing in Windump. This made him feel more confident that the problem was indeed resolved. I did suggest that he might want to log some longer runs just to be certain, but as I confirmed that client machines were blocked, I don't expect to see this problem again. The sloppiness of the contractor who did the virus cleanup bothers me a bit, but otherwise this is under control.
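For the longer logging runs suggested above, a small script can check a saved capture for anything other than the mailserver talking to SMTP ports. This is only a sketch: it assumes the capture was saved as text from tcpdump or Windump run with -n (numeric addresses), and the mailserver address and sample lines below are made up for illustration.

```python
import re
from collections import Counter

# Matches the address part of a tcpdump/Windump text line captured with -n, e.g.
# 12:01:02.123456 IP 192.168.1.5.40000 > 203.0.113.5.25: Flags [S], ...
LINE_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)

# The two ports the article says were blocked for client machines.
SMTP_PORTS = {25, 465}


def unexpected_senders(lines, mailserver_ip):
    """Count packets sent to SMTP ports by any host other than the mailserver."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not an IPv4 packet line; ignore it
        src_ip, _src_port, _dst_ip, dst_port = match.groups()
        if int(dst_port) in SMTP_PORTS and src_ip != mailserver_ip:
            counts[src_ip] += 1
    return counts


# Hypothetical capture lines: 192.168.1.5 stands in for the mailserver.
sample = [
    "12:01:02.123456 IP 192.168.1.5.40000 > 203.0.113.5.25: Flags [S], seq 1, length 0",
    "12:01:03.000001 IP 192.168.1.77.40100 > 198.51.100.9.25: Flags [S], seq 1, length 0",
    "12:01:04.000002 IP 192.168.1.77.40101 > 198.51.100.9.80: Flags [S], seq 1, length 0",
]
suspects = unexpected_senders(sample, mailserver_ip="192.168.1.5")
```

An empty result after a long run is what you hope for; any address that does show up is a machine worth a closer look.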
/Security/dualcomm.html copyright and reprint notice
Google's Custom Search engine is generally accurate at finding the results I'd like it to find when searching this site. However, sometimes it doesn't find what really is the best result. Until now, there hasn't been much you could do about that.
The new "Promotions" feature gives us a way to help. Simply, you define keywords, a title and a link. If those keywords are used, your link appears above all the normal results.
You can define more than one promotion. For example, if you type "laserjet" into the search box at the top of this page, you'll see that I have added two promotions: one for "netcat" and another for escape sequences to select trays. I think those links are likely to be more important than what Google selects by itself.
This is new (I just found out about it this morning), so I haven't added very many of these yet. I don't know what limitations Google has on the total number of promotions or the number per keyword, but I'm sure I'll be using as many as they will allow. Better search results mean happier visitors, right?
/Web/cse-promotions.html copyright and reprint notice
As I mentioned at Internet Scrabble (the name Scrabble is a trademark of Hasbro, Inc. in the United States and Canada and of Mattel elsewhere), our family has been enjoying playing Scrabble on-line. Aside from the fun of playing, it gives us another reason to keep in touch with each other - the small interactions in the chat windows are part of our involvement in each other's lives.
I have also been playing with strangers. When I first started that, I was a little shocked by some of the word usage. In our family games, we had never allowed slang, abbreviations, or foreign words. If you had a Q and no U, you were stuck - there was no QI or QAT or QANAT in our games. Isn't that why a Q is worth 10 points? If all it takes is an I to play it, it shouldn't be worth much more than H, should it?
Well, that's not how Scrabble is played today. Foreign words, abbreviations, old English and slang have found their way into TWL (the Tournament Word List). It almost seems that you can toss down three or four random letters and have a good chance of being able to create a valid Scrabble word.
Of course you have to know the words. With the Facebook version, you can guess and let the dictionary correct you, but if you ever want to play face to face, that won't work. You need to learn more words.
Not just any words, though. While a large vocabulary isn't a bad thing to have for Scrabble, you can improve your scores by just concentrating on the two and three letter words plus a few other odd words that use the high scoring letters. Add to that the "aa" and "ii" words (for those times when you get a rack full of them), take away the obvious stuff that you already know, and you'll be left with fewer than 1,000 words, give or take. That's a fair pile to memorize, though not impossible.
If I were younger, I probably would just memorize those words. I think I probably still could, but instead I let my computer help me learn them. That's easier on my tired old brain.
I made a list of the words I want to learn. I added to it the words that I have a hard time accepting as legal plays like "AB", "SIM" and a few other abbreviations I just don't think of as words. I added a ":" and then a definition for each word - I find it much easier to remember words if I know what they mean. The final step was to write a program to present these to me randomly like flash cards.
My first effort was just a Perl script that randomly shuffles the list and then outputs each line to a terminal window. That was fine, but I had to have that window open to see the output. I wanted the words and definitions to be always visible to me - to appear on top of whatever else I happen to be doing, but to be translucent so that it wouldn't interfere too much. Hmmm.. sounds an awful lot like Growl.
I already use Growl for mail notification and the command line "growlnotify" was just what I needed. Here's what it looks like while running. The Growl notification is in the upper right corner showing "KAE" at that moment. The simple Perl code follows.
I call that "scramble" and leave it running as "while :;do scramble mywords;done" (that way I can add words while it is running or have it use different lists).
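As a rough illustration of the flash card idea described above, here is a minimal Python sketch. It is not the original Perl script; the function names, the 15 second delay and the sample definitions are all assumptions made for the example. It expects the "WORD: definition" line format described earlier and hands each card to growlnotify, using -t for the title and -m for the message.

```python
import random
import subprocess
import time


def load_cards(lines):
    """Parse 'WORD: definition' lines into (word, definition) pairs."""
    cards = []
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        word, definition = line.split(":", 1)
        cards.append((word.strip(), definition.strip()))
    return cards


def shuffled(cards, rng=random):
    """Return a new list with the cards in random order."""
    deck = list(cards)
    rng.shuffle(deck)
    return deck


def growl_command(word, definition):
    """Build the growlnotify command line: -t is the title, -m the message."""
    return ["growlnotify", "-t", word, "-m", definition]


def run_once(path, delay_seconds=15):
    """Show every card in the file once, in random order (delay is a guess)."""
    with open(path, encoding="utf-8") as handle:
        cards = load_cards(handle)
    for word, definition in shuffled(cards):
        subprocess.run(growl_command(word, definition), check=False)
        time.sleep(delay_seconds)


# Illustrative word list entries in the format described above.
sample = ["KAE: a jackdaw", "QI: vital life force", "", "AA: rough, cindery lava"]
cards = load_cards(sample)
```

Because run_once reads the file each time it is called, wrapping it in an outer loop picks up newly added words on the next pass, much like the shell loop above.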
/MacOSX/scrabble-growl.html copyright and reprint notice
I and other people here have mentioned TrueCrypt before. I thought (and perhaps you did too) that it was very simple and obvious to use but I've had several people write to me complaining that they downloaded and installed it, but have no idea what to do next.
OK, maybe the interface isn't all that user friendly. It really is simple, but after looking at it from an "ordinary person" perspective, I can agree that it could leave you staring at the screen saying "Huh?" So let's run through using this in plain English.
The most common fear I heard from people was that they were afraid TrueCrypt was going to encrypt their hard drive and that something would go wrong or they'd forget the password.
Yes, TrueCrypt can encrypt entire hard drives, and yes, things could go horribly wrong or you could forget your password. So, yes, you have reason to be concerned. I definitely would NOT advise using TrueCrypt for that purpose unless you completely understand what you are doing, what the risks are, and (perhaps most importantly) WHY you are doing it.
Most of us need to protect individual files. Maybe you have a text file with all your passwords in it. Maybe you handle sensitive documents for your clients. Whatever it is, you usually don't need to encrypt a whole drive. You just need to lock up those particular files.
This is the simplest and safest TrueCrypt operation. Start up TrueCrypt. You've never used it before, so what you want to do is click on Create Volume. You want to create an "Encrypted File Container" (that's the default). Click Next, and then select "Standard TrueCrypt Volume" (again that's the default). What happens next seems to confuse people: a file dialog comes up, which perhaps makes you think that you need to select some file.
No, it's looking for you to give the name and location of a NEW file. This file will be the "container" for the files you actually want to hide. It's going to eventually end up as another disk drive on your system, which is perhaps another reason this can confuse folks: it's a "volume", it's a "container", it's a disk drive. No wonder people are hesitant to proceed!
So click on "Select File", navigate to where you want to keep this, and give it a name. Remember, this is the "container". It's the box your secret files will hide in. You might call it "Secrets", "My Secret Stuff" or "Fred" - choose something that makes sense to you. IF YOU CHOOSE AN EXISTING FILE, IT WILL BE DELETED.
So, after choosing a name, click Next and the following screen asks what kind of encryption you want to use. For most of us, the default AES is fine. The TrueCrypt help file suggests reasons why you might choose one of the others:
If you store the backup volume in any location where an adversary can make a copy of the volume, consider encrypting the volume with a cascade of ciphers (for example, with AES-Twofish- Serpent). Otherwise, if the volume is encrypted only with a single encryption algorithm and the algorithm is later broken (for example, due to advances in cryptanalysis), the attacker might be able to decrypt his copies of the volume. The probability that three distinct encryption algorithms will be broken is significantly lower than the probability that only one of them will be broken.
You can take the default choice for the hash algorithm and click Next.
Now you need to choose the size of your container. Obviously it needs to be large enough to store the files you want to hide, but you may want to think about making more than one smaller container. For example, if you are going to store backups of this container (a good idea!), you might want to do that on a CD or DVD - obviously the container size has to be small enough to fit on the storage media. Or perhaps you plan on using one of the many free Internet storage sites - your choice of size may be limited by what they will give you for free.
There's also a minimum size - not because TrueCrypt really cares, but because your operating system can't create a disk drive (which is what this container ultimately becomes) smaller than a minimum size. Once you've decided how big or small this will be, you click Next and it's time to choose a password.
TrueCrypt isn't looking for "joe123" or even "P^%WErt45!@p.k". It's looking for a long sequence - they recommend at least 20 characters and you can use up to 64.
You could make up a long string of nonsense, but how are you going to remember "Ht^%f2HH(hpo&mnE$%d";q\n*^$sdf"? I suggest using a phrase - a sentence - that you can remember. It might be words from a song: "Memories are all I have to cling to - cling to!" or a string of names: "Thomas, Jonathan, Sarah and THEN William!". If you always keep the books on your shelf in the same order, maybe you could use their titles: "Programming Perl, Perl Cookbook, Linux Firewalls and Linux Cookbook". It is best if you can include some random punctuation, but if this password is never going to be written down and will live only in your head, it's better to be a little weaker than to risk forgetting it - once you've locked your files up with this, they are not coming back without that password!
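If you want to sanity-check a candidate phrase against the length limits mentioned above before committing to it, a few lines of Python will do. This is only a sketch: the function name and the wording of the results are made up, and it checks length only, not how guessable the phrase is.

```python
MIN_RECOMMENDED = 20  # the recommended minimum length mentioned above
MAX_ALLOWED = 64      # the maximum length mentioned above


def check_passphrase(passphrase):
    """Classify a candidate passphrase by length only."""
    length = len(passphrase)
    if length > MAX_ALLOWED:
        return "too long"
    if length < MIN_RECOMMENDED:
        return "shorter than recommended"
    return "ok"


result = check_passphrase("Memories are all I have to cling to - cling to!")
```

Counting characters this way is worth doing for long phrases in particular, since it is easy to run past the 64 character limit without noticing.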
A weaker password can be augmented with a "key" file or files. These are simply files that TrueCrypt takes the first 1024 kilobytes from and mixes into your encryption. You can use any file (or multiple files) on your disk as long as the first 1024 kilobytes of it will never change. You could use a file stored on a USB stick - if someone stole your computer but didn't get that USB drive, they can't open your TrueCrypt files even if they have the password. Of course, you can't either - you have to have the key file(s) available to get at your stuff.
Once you have decided on your password and any key files, it's time to actually create the container. You'll be asked to move your mouse randomly for a bit and then click Format. The purpose of the random mouse stuff is to generate better encryption, so just do it even though it sounds like someone might be pulling your leg. After you click Format and TrueCrypt says it is all done, you can exit back to the main screen.
You have now created a container. You haven't put anything in it yet and to do that, you need to mount it as a disk drive. You'd think you could just click "Mount" and TrueCrypt would ask you what you want to mount, but no, you need to first click "Select File", find your container, point at your key file(s) (if you used any), and then click "Mount". You can select what drive letter or (Mac) volume identifier to use and once it is mounted you can exit TrueCrypt - you can unmount the container using ordinary operating system methods if you wish.
While it is mounted, you can put files in it. I suggest keeping safe copies of your files until you feel completely comfortable with TrueCrypt - remember, if you can't recall the password or lose any required key files, you will have no access to your data.
That's it. After you have loaded the drive up with files, you unmount it and you're done - the encrypted container is protected by your password and any key files you specified. It was pretty simple, wasn't it?
/Basics/truecrypt.html copyright and reprint notice
Our family has recently discovered online Scrabble (the name Scrabble is a trademark of Hasbro, Inc. in the United States and Canada and of Mattel elsewhere) through Facebook. We've been a Scrabble playing family for many decades; we all love the game, but we are seldom all together to play and even when we are, we usually have other things to do.
Internet Scrabble changes all that. You can make one move per day or even less. More importantly, you can play more than one game at once: I have one game going with my wife, one with each of our daughters, and one with the four of us. I also have more games going with friends and a few with strangers. I can spend as much or as little time playing as I like - and so can all my opponents.
Playing Scrabble on-line is different in other ways. There's no arguing about words: the game enforces acceptable words. On the other hand, words our family never would have allowed in home games can be used; in one of my first games with a stranger, I was shocked to see "NGWEE" appear. That word is a Zambian unit of currency, but I'm sure you knew that. I'm sure you also know that "AA" is a rough form of lava and that "DUIT" is a Dutch coin - and that all of them are acceptable words in Facebook Scrabble.
Another issue, especially when playing with strangers, is the easy availability of on-line anagram solvers. In our family games, we allow checking the dictionary for spelling and minor word-hunting, but using an anagram program is a bit much. On the other hand, there is a fair amount of strategy and plain old luck to Scrabble, so the cheaters aren't guaranteed a win. Finally, if you really feel someone must be using an anagram program, you can fight back by doing the same yourself.
Speaking of anagram solvers, there are Scrabble robots at other sites. I haven't tried playing against one, so I can't say how good they are at strategy - I'm sure they are pretty good at finding the best scores, but that's not always the best play, so a human may still have a fighting chance.
So, if you are wondering what I'm doing in between working, eating, sleeping, breathing, and all that, well, I'm probably playing Scrabble. Maybe I'll see some of you on the other side of the board!
Another popular Scrabble site is Internet Scrabble Club. They have timed play there, but when I tried it out they were having server problems and that made playing very difficult. I don't know if that's usual or unusual.
/Web/facebook_scrabble.html copyright and reprint notice
This is a continuation of Detecting Comment Spam, Part 2
In the previous posts in this series, I've said that spammers' habits allow us to detect their attempts to leave inappropriate comments. I use these techniques here and am able to send most spam comments directly to the bit-bucket without ever having to examine them myself. When I suspect spam but am not sure, I just send the comment to moderation. In practice, very few spam attempts get by the automatic filters.
The code I use is a hodgepodge I developed over many years of fighting spam comments. I need to rewrite it and my thought is to move toward a scoring system similar to that used by SpamAssassin for mail spam: each "bad habit" increases an overall spam score and the final judgement is made based on the total score accumulated. To that end, I have been pulling out the various tests I use now and examining them. Some always cause the comment to be treated as spam; those would add enough points to ensure that would still happen. Other habits now cause moderation, but in my new design each of those will increase the spam score. Any spam score at all is cause for moderation, but if the post accumulates enough minor points, I can skip that and just throw it away.
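The scoring idea can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the threshold, the weights and the test names (which borrow from the habits discussed in this series) are not the values used on this site.

```python
# Hypothetical threshold: at or above this total, throw the comment away.
REJECT_AT = 10

# Hypothetical weights. Tests that should always mean spam carry enough
# points to reach REJECT_AT on their own; the rest need company.
WEIGHTS = {
    "known_spam_link": 10,
    "direct_post": 10,
    "high_link_ratio": 6,
    "typed_too_fast": 3,
    "posting_too_frequently": 3,
    "nonsense_words": 2,
}


def classify(triggered):
    """Sum the points for the triggered tests and decide what to do."""
    score = sum(WEIGHTS[name] for name in triggered)
    if score >= REJECT_AT:
        return score, "reject"
    if score > 0:
        return score, "moderate"  # any score at all means moderation
    return score, "accept"


verdict = classify(["high_link_ratio", "typed_too_fast", "nonsense_words"])
```

Keeping the weights in one table makes it easy to tune a test up or down after watching what actually lands in moderation.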
By the way, there is an effort to create a BlogBlogSpamAssassin.
I'll be reviewing things covered in the previous two posts and will introduce some new ideas. Items covered in the previous posts are marked with a "*".
* Known spam links: Reusing links we already know are spam. This would carry enough points to always be spam.
* High link to text ratio: Sometimes nothing but links. This could be legitimate, but it's a strong indicator. In my current code, this always treats the comment as spam, but I think I will change it so that at least one other spam point is required.
* Nonsense words: The higher the ratio of nonsense to total word count, the more likely this is spam.
* Many quick posts: Legitimate commenters may post more than once per day, but there will be some time delay between comments as they read the articles they are commenting on. I simply enforce a "posting too frequently" policy.
* Direct posts: Some spammers bypass your forms and send direct POST requests. That's definite spam.
* Typed too fast: Legitimate commenters MAY use cut and paste, but moving from form load to POST too quickly needs to carry some weight.
Multiple posts to same article: Legitimate commenters almost never post more than one comment without some other comment intervening. Sure, someone may have an afterthought and add a second comment, but more usually this indicates a spammer, so we should add points.
Same comment at another article: As always, spammers are lazy. They sometimes post exactly the same text to different articles. The posts may not come from the same IP address, but the content is the same. This should probably carry enough points for flat rejection. It does mean that you need to maintain a database, but you only need to store 100 characters or so (spam comments are usually short) and you can clean it out after 24 or 48 hours.
Text and link have same word: This is a minor spam indicator, but if the word "Horrivea" appears as "Buy Horrivea: http://someaddress/horivea", it may be spam.
Failure to reload: I require clicking on a link to reload the original page after submitting a comment. A legitimate poster will almost always click, a spammer almost never will. Again, this is not an absolute indicator, but is worth a point or two.
Akismet is a popular comment checker. I have found it to be less effective than my own tests, but it can't hurt to have another opinion, can it?
Captcha tests: See Creating a blog... with anti-spam measures for an example of incorporating Captcha. As this technology is annoying, you might consider adding it as a confirmation only when some other conditions have raised suspicion. Similar tests involve solving simple math or providing answers to obvious questions. Spammers are in a hurry; they are usually using bots and if not they just don't want to spend time thinking, however small that time is.
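The "same comment at another article" test above needs only a small store of recent comment text. Here is a minimal sketch, assuming an in-memory dictionary stands in for the database; the class name, the whitespace and case normalisation, and the article identifiers are invented for the example. It keeps the first 100 characters and forgets entries after 48 hours, as suggested above.

```python
import time


class RecentComments:
    """Remember the first characters of recent comments, keyed by article."""

    def __init__(self, max_age_seconds=48 * 3600, prefix_len=100):
        self.max_age = max_age_seconds
        self.prefix_len = prefix_len
        self._seen = {}  # fingerprint -> {article_id: last_seen_timestamp}

    def _fingerprint(self, text):
        # Collapse whitespace and case so trivial edits still match.
        return " ".join(text.lower().split())[: self.prefix_len]

    def purge(self, now):
        """Drop records older than max_age."""
        cutoff = now - self.max_age
        for key in list(self._seen):
            fresh = {a: t for a, t in self._seen[key].items() if t >= cutoff}
            if fresh:
                self._seen[key] = fresh
            else:
                del self._seen[key]

    def is_duplicate(self, article_id, text, now=None):
        """True if the same text was recently posted to a different article."""
        now = time.time() if now is None else now
        self.purge(now)
        articles = self._seen.setdefault(self._fingerprint(text), {})
        duplicate = any(other != article_id for other in articles)
        articles[article_id] = now  # record this post either way
        return duplicate


store = RecentComments(max_age_seconds=100)
first = store.is_duplicate("article-a", "Buy  Horrivea now", now=0)
second = store.is_duplicate("article-b", "buy horrivea NOW", now=10)
later = store.is_duplicate("article-c", "buy horrivea now", now=500)
```

Note that the check deliberately ignores the poster's IP address, since the repeats may come from different addresses.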
There are a few things that spammers don't usually do. Legitimate comments almost always use punctuation, so the presence of punctuation could decrease a spam count. Legitimate posters sometimes quote text from the article or from a previous comment; spammers almost never do that, so again we might decrease a spam score if we see this.
We can control spam comments. Not all of it, but we can limit the amount that has to be checked manually and we don't need to annoy our visitors with required registration or automatic "human reader" testing.
I'd love to hear your thoughts, comments and ideas.
/Web/detecting-comment-spam3.html copyright and reprint notice
This is a continuation of Detecting Comment Spam, Part 1
In part 1, I talked about code to read a list of spammish words from a file and look for those words in comment posts. Commenters pointed out that spammers will obfuscate words with dashes, spaces, bizarre spellings and so on, making it very difficult to catch these programmatically. That's true, but there's more to the story.
The spam list I use here has some of those common spam words, but most of it is taken up with web addresses. Links are much more difficult for spammers to mangle - they can use redirection at the destination site, but the site itself is static: if a spammer wants you to visit some page at Iamaspammer.com, either that name or its IP must be in a link. Most of my spam list is websites that have been the destination for spammers' links. Once the site is in my list, the spammer can never post anything with that link in it - no matter how they munge other words, they will never be allowed to post. A comment here that contains one of those links doesn't even go to moderation - it just gets flatly rejected.
Spammers do move on - jklljas.blogspot.com may be a spam link now, but it will get abandoned eventually. I trim the list every month to remove old entries.
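A link list like this is easy to sketch in Python. The class and function names are invented for the example, and one detail is an assumption rather than a description of the code used here: each entry remembers when it was last seen in a spam attempt, so that trimming removes only sites that have gone quiet.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def linked_hosts(comment):
    """Return the lower-cased hostnames of every http(s) link in the comment."""
    hosts = set()
    for url in URL_RE.findall(comment):
        try:
            host = urlparse(url).hostname
        except ValueError:
            continue  # malformed link; nothing usable to match on
        if host:
            hosts.add(host.rstrip(".").lower())
    return hosts


class SpamLinkList:
    def __init__(self):
        self._last_seen = {}  # host -> time it last appeared in spam

    def add(self, host, now):
        self._last_seen[host.lower()] = now

    def contains_spam_link(self, comment, now):
        """True if any link in the comment points at a listed host."""
        hits = {h for h in linked_hosts(comment) if h in self._last_seen}
        for host in hits:
            self._last_seen[host] = now  # assumption: a hit keeps the entry fresh
        return bool(hits)

    def trim(self, now, max_age_seconds):
        """Drop hosts not seen in spam within max_age_seconds."""
        cutoff = now - max_age_seconds
        self._last_seen = {
            h: t for h, t in self._last_seen.items() if t >= cutoff
        }


DAY = 24 * 3600
spam_links = SpamLinkList()
spam_links.add("Iamaspammer.com", now=0)
blocked = spam_links.contains_spam_link("Visit http://iamaspammer.com/deal today", now=5)
```

The match is on the exact hostname, so a blogspot.com subdomain used for spam is listed on its own and does not take the rest of blogspot.com with it.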
Shouldn't I just block the spammer's IP from leaving comments? Yes, but spammers' IPs change over time - their IP gets blocked everywhere so they move on. If you block a particular IP forever, it may end up being the IP of a legitimate user, so you probably don't want to do that. There is also the issue that your list of banned IPs could get very large over time.
For some websites, it makes sense to block by country of origin. I don't do that here, but if you only want U.S. visitors, you could certainly do that. See Blocking Unwanted Visitors.
You can do the blocking inside your script or use .htaccess (or your Apache configuration files). I prefer to block inside my comment posting script because if I'm wrong or if the IP has transferred to a non-spammer, I'm only blocking them from commenting, not from any site access. In extreme cases (such as a spammer attempting to use any CGI script it can find or guess) I will add them to the .htaccess file. However, whether in scripts or .htaccess, I don't keep the IP blocked for more than a week at a time.
For me, the need to block by IP is infrequent enough that I don't need to automate the removal of bans, but it isn't difficult to write such code if needed.
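For anyone who does want bans to expire on their own, something along these lines would do it. This is only a sketch: the `%banned_at` hash is an in-memory stand-in for whatever file or database a real script would use, and the sub names are made up for the example.

```perl
use strict;
use warnings;

use constant BAN_SECONDS => 7 * 24 * 60 * 60;    # one week

# Hypothetical in-memory store: IP address => time the ban was set (epoch seconds).
my %banned_at;

# Record a ban; $now is optional and defaults to the current time.
sub ban_ip {
    my ($ip, $now) = @_;
    $now = time unless defined $now;
    $banned_at{$ip} = $now;
}

# True if $ip is still banned. Expired bans are removed as a side effect,
# so the list never grows without limit.
sub is_banned {
    my ($ip, $now) = @_;
    $now = time unless defined $now;
    return 0 unless exists $banned_at{$ip};
    if ($now - $banned_at{$ip} >= BAN_SECONDS) {
        delete $banned_at{$ip};
        return 0;
    }
    return 1;
}
```

Because expiry is checked at lookup time, no separate cleanup job is needed for the comment script itself.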
Let's talk about moderation for a moment. Some sites moderate all comments, but that's annoying for both the moderator and the people leaving comments. Regular posters shouldn't have to wait for their comments to appear and most web site owners have better things to do than moderate comments all day long.
One solution is to require registration. If a user can provide a login and password that has been previously approved, their comment can be posted immediately. Many sites use that scheme, but there is still a degree of annoyance: the registration process is an extra step that annoys some people.
I do something similar here, but there's no registration process per se. If you have posted here legitimately in the past and are posting again from the same IP address and with the same username (actual username, not "anonymous") your post will not need to be moderated - assuming it passes all the other tests I'll talk about below!
But as alluded to when talking about destination links above, there's no need to moderate posts that are definitely spam - we just throw those away.
I'm going to talk frankly about all the things I do here to limit spam and what I plan to do in my next version of the commenting software. I suppose there is some small risk to this; I've been reluctant to discuss all of it before because I don't want to help spammers learn better ways to bypass systems like this, but I don't think spammers are likely to read this and if they do, oh well: the war goes on.
Spammers aren't generally going to spend a lot of time and effort on posts. The pros are using scripts, the inept are probably at least automated enough to use cut and paste. Even if they are typing comments in, they probably tend to reuse the same words and links.
Some habits of lazy spammers are fairly easy to block. One is a habit of only leaving links with little or no text. That's easy to detect: just count the total words in the post and divide by the number of http links: if the result is too small, the post is probably spam. I do that here now - if you leave a comment that is only a link, it won't get posted and it won't even go to moderation - whether the link is legitimate or not. That might be too draconian, but I think you should at least flag such posts for moderation. I do need to fix my present code to allow known user/IP posters to leave such short comments.
When lazy spammers do pad their word count, they often use nonsense words. Again, that's relatively easy to thwart if you have the luxury of time and disk space - look up each word in a dictionary and compare the words you don't find to the words you do - if the ratio of unfound words is too high, this is probably spam. But that requires a fair amount of work. I take a simpler approach:
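The words-per-link test might be sketched like this. The threshold of five words per link and the `too_linky` name are assumptions for the example, and this version leaves the links themselves out of the word count, which is a choice of the sketch rather than something stated above.

```perl
use strict;
use warnings;

# Hypothetical threshold: require at least this many words per link.
my $MIN_WORDS_PER_LINK = 5;

# True if the comment has too few words for the number of links it carries.
sub too_linky {
    my ($text) = @_;
    my $links = () = $text =~ m{https?://}gi;     # count the links
    return 0 if $links == 0;                      # no links, nothing to judge
    (my $plain = $text) =~ s{https?://\S+}{ }gi;  # drop the links themselves
    my @words = split ' ', $plain;
    my $ratio = @words / $links;
    return $ratio < $MIN_WORDS_PER_LINK ? 1 : 0;
}
```

A comment that is nothing but a link scores zero words per link and fails however the threshold is set.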
If $in is the text to be checked, the Perl expression
counts the number of times four or more consonants appear in a row. That is, it counts garbage like "fghk" or "hdfr" - the kind of junk you'll get from random banging on the keyboard. If that count is high when compared to the number of words in the input, you probably have spam. My code says if the ratio of nonsense to words is over 15% it's spam. This is much faster than checking input against a dictionary and is very effective.
Remember that "jklljas.blogspot.com"? If all he posts is something like "Xanax: http://jklljas.blogspot.com", he's got a 50% nonsense count ("http" and "jklljas" against 4 words) - that's enough to count as spam right there. Not all spammers use nonsense words in their sites, but quite a few do, and many are too lazy to type much more than that.
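One way such a count could be written is sketched below. This is a sketch of the idea, not necessarily the exact expression used here: leaving "y" out of the consonant class and counting words as whitespace-separated tokens are both assumptions, so the percentage it computes for a given post can differ from the figures quoted above.

```perl
use strict;
use warnings;

# Count runs of four or more consonants in a row.
# Leaving "y" out of the consonant class is an assumption of this sketch.
sub nonsense_runs {
    my ($in) = @_;
    my $count = () = $in =~ /[bcdfghjklmnpqrstvwxz]{4,}/gi;
    return $count;
}

# True if the runs exceed 15% of the word count. Words are counted here as
# whitespace-separated tokens, which is also an assumption.
sub looks_like_nonsense {
    my ($in) = @_;
    my @words = split ' ', $in;
    return 0 unless @words;
    my $ratio = nonsense_runs($in) / scalar(@words);
    return $ratio > 0.15 ? 1 : 0;
}

print nonsense_runs('Xanax: http://jklljas.blogspot.com'), "\n";    # "http" and "jkllj"
```

The `() =` in the middle forces the match into list context so that the assignment yields the number of matches rather than just true or false.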
This won't catch all nonsense: we've all seen random phrases from books used as a preface to a link. Nonetheless, this stops SOME spam, and every post we stop is one that doesn't annoy us or our readers.
Another thing spammers do is send direct POSTs. That is, their automated software examines your comment form once, picks out the fields it needs to supply, and then submits multiple POST requests. I don't allow that: when the comment form is first loaded, the commenter's IP address is stored in a database. When they actually POST, the IP is checked and immediately removed. If the IP doesn't exist (which it will not after the first POST), the comment is thrown away. The spammer can defeat this by requesting the form before doing each POST, but many of these folks are too lazy to bother.
After reading a link suggested in the comments, I realized that there ought to be a minimum thinking/typing time also. So along with the IP, I store the time the form was loaded. When the POST is made, I count the words and divide by 10 - if at least that many seconds haven't elapsed, I won't allow the post. Only a cut and paste spammer can type more than 10 words per second, so that's a fair limit - maybe even less is fair, but a legitimate poster might cut and paste some text.
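The load-then-POST check can be sketched in a few lines. Here a plain hash stands in for the database, and `form_served` and `post_allowed` are names invented for the example.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the database of pending forms.
my %form_loaded;    # IP address => time the form was served

# Call this when the comment form is sent to the browser.
sub form_served {
    my ($ip) = @_;
    $form_loaded{$ip} = time;
}

# Call this when a POST arrives. It returns true once per form load,
# because the entry is removed as it is checked.
sub post_allowed {
    my ($ip) = @_;
    my $loaded = delete $form_loaded{$ip};
    return defined $loaded ? 1 : 0;
}
```

Keeping the load time rather than a simple flag costs nothing and leaves room for timing checks on the same record.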
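The typing-time rule is simple arithmetic: words divided by 10 gives the minimum number of seconds. A sketch, with the `typed_too_fast` name and its arguments assumed for the example:

```perl
use strict;
use warnings;

# True if fewer than (words / 10) seconds passed between loading the form
# and posting it. Times are epoch seconds.
sub typed_too_fast {
    my ($text, $loaded_at, $posted_at) = @_;
    my @words       = split ' ', $text;
    my $min_seconds = @words / 10;
    my $elapsed     = $posted_at - $loaded_at;
    return $elapsed < $min_seconds ? 1 : 0;
}
```

With this rule a 50 word comment needs at least 5 seconds between form load and POST.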
Some spammers are greedy. They aren't content with putting one piece of graffiti on your site; they want to leave many spam posts. I use a timing algorithm to control that. It's simple enough: for each post from your IP, a counter is incremented. I use that counter to determine how long before you are allowed to post again. You can't make your second post until 15 seconds after your first, your third until 120 seconds after that, your fourth until 405 seconds and so on - it's the number of posts cubed times 15 seconds. This very effectively stops greedy spammers - they don't hang around. In the current version, these limits affect everyone (even me!) but I want to make them less restrictive (maybe posts cubed times 5 seconds) for known users in the next version.
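The timing rule works out like this in Perl. The sub names are assumptions for the sketch; the optional second argument to `wait_seconds` is there so a smaller multiplier (such as the 5 seconds mentioned for known users) could be passed in.

```perl
use strict;
use warnings;

# Seconds a poster must wait after making $posts_so_far posts:
# posts cubed times a base delay (15 seconds unless told otherwise).
sub wait_seconds {
    my ($posts_so_far, $base) = @_;
    $base = 15 unless defined $base;
    return $posts_so_far ** 3 * $base;
}

# True if enough time has passed since the last post. Times are epoch seconds.
sub may_post {
    my ($posts_so_far, $last_post_at, $now) = @_;
    return 1 if $posts_so_far == 0;
    my $elapsed = $now - $last_post_at;
    return $elapsed >= wait_seconds($posts_so_far) ? 1 : 0;
}

printf "after post %d: wait %d seconds\n", $_, wait_seconds($_) for 1 .. 4;
# after post 1: wait 15 seconds
# after post 2: wait 120 seconds
# after post 3: wait 405 seconds
# after post 4: wait 960 seconds
```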
So that's how I control spam comments here. Most obvious spam goes right to a black hole, users I know and trust get posted immediately, and everything else goes to moderation. I'm going to add an Akismet check in the next version - it never hurts to have a second opinion!
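Put together, the three outcomes amount to a small decision function. In this sketch the `%trusted` table, its key format and the sub names are all assumptions, and `$is_obvious_spam` stands in for the combined result of the content checks.

```perl
use strict;
use warnings;

# Hypothetical record of posters who have commented legitimately before,
# keyed by lowercased username and IP address.
my %trusted = ( 'tony|192.0.2.10' => 1 );

# Known means: a real username (not "anonymous") seen before from this IP.
sub is_known_poster {
    my ($name, $ip) = @_;
    return 0 if !defined($name) || $name eq '' || lc($name) eq 'anonymous';
    return $trusted{ lc($name) . '|' . $ip } ? 1 : 0;
}

# Decide what happens to a comment: discard, publish, or moderate.
sub triage {
    my ($name, $ip, $is_obvious_spam) = @_;
    return 'discard' if $is_obvious_spam;               # the black hole
    return 'publish' if is_known_poster($name, $ip);    # trusted poster
    return 'moderate';                                  # everything else
}
```

The spam test comes first on purpose, so a trusted name and IP never rescue a post that fails the content checks.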
Instead of moderation, you could also throw up CAPTCHA or arithmetic challenges, or require a random series of other answers or actions when a post is suspect. A spammer (especially an automated spammer) probably won't respond to even a simple "Click here to confirm your post" - I may add something like that to my next version. That's a minor annoyance for a first time poster or someone who insists upon using "anonymous", but it will stop most spammers dead.
Of course it is impossible to stop all spam. No matter what we do, we'll at least have to moderate a post now and then. However, with the spam controls I have here, I very seldom see spam - and if I do see it, I'm unlikely to see it again, because I'll adjust my code as necessary. I do have to moderate new visitors and frequent posters with new IP addresses, but that's not very onerous.
The war on spam never ends, but we can win most of the battles.
See Detecting Comment Spam, Part 3 for the continuation of this series.
/Web/detecting-comment-spam2.html copyright and reprint notice
Suppose you were writing a commenting system for a website and you wanted to check user input against a list of words that might indicate spam. You'd want the list of suspicious words in a file and you'd run through that list. An easy way to do that in Perl is to use Perl's "grep" function.
We'll start with a program that won't work. This will show a side effect of grep you need to be aware of:
You need this code and a "spamlist" file. You'd put the words you want to match in that file. For the purposes of this article, I'll assume that "fribble" is NOT in the list. Therefore, if you run this little script and type "fribble" and press Enter and then Ctrl-D, you'd expect no response - "fribble" isn't in the list of spam words.
But that's not what happens. When you run the program and type "fribble", it seems like "fribble" (or any other input) matches every word in the list. That can't be right, can it?
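The problem is that "grep" takes over $_ inside your loop: while grep is running, $_ is set to each word in the list in turn, so any test written in terms of $_ is looking at the list word, not at what you typed. That's simple to fix; we set a temporary variable: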
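To see the side effect in isolation, here is a small sketch. It is not the listing from this article: the two-word list stands in for the "spamlist" file, the sub names are invented, and the `\Q...\E` quoting is an extra precaution added for the example.

```perl
use strict;
use warnings;

# Broken: inside the grep block $_ is the list word, so each word is
# matched against itself and everything appears to match.
sub broken_matches {
    my ($input, @list) = @_;
    for ($input) {    # sets $_ to the input, the way while (<STDIN>) would
        return grep { /$_/ } @list;
    }
}

# Fixed: keep the input in its own variable and test it against each word.
sub fixed_matches {
    my ($input, @list) = @_;
    return grep { $input =~ /\Q$_\E/ } @list;
}

my @spam  = qw(ambien viagra);    # hypothetical stand-in for the spamlist file
my @wrong = broken_matches('fribble', @spam);
my @right = fixed_matches('fribble', @spam);
print 'broken: ', scalar(@wrong), " matches\n";    # broken: 2 matches
print 'fixed: ',  scalar(@right), " matches\n";    # fixed: 0 matches
```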
That's a bit better. Running it with "fribble" produces no output, but if you give it something in your list, it finds it. Great!
Not quite. Let's say you had "ambien" in your list because you want to stop common pharmaceutical spam. If you type "ambien" when running the program, yes, it finds it, but it will also find "ambient". That's not good - how do we fix that?
Well, we want "ambien" only when it's a word by itself. Your first thought might be to use a space or "\s":
But that fails if "ambien" is at the beginning of a line in @TST. It works if "ambien" is at the end because \s matches the newline at the end of the line as "space" in addition to real spaces, tabs and formfeeds.
OK, we could do this:
Fortunately, we don't need to. Perl has a better way:
That "\b" is for "word boundary" and it does exactly what we want: it matches "ambien" wherever it is in a line by itself. Note that this same syntax works with command line grep, but the alternative form, grep '\<word\>' files, only works with command line grep, not Perl.
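A quick way to convince yourself is to try the word boundary match against a few lines. This is a sketch: `has_word` is a name made up for the example, and `\Q...\E` is added so that a list entry containing regex characters is matched literally.

```perl
use strict;
use warnings;

# True if $word appears in $text as a whole word.
sub has_word {
    my ($text, $word) = @_;
    return $text =~ /\b\Q$word\E\b/ ? 1 : 0;
}

for my $text ('ambien', 'ambien first', 'try ambien.', 'soft ambient light') {
    printf "%-20s %s\n", $text, has_word($text, 'ambien') ? 'match' : 'no match';
}
# ambien               match
# ambien first         match
# try ambien.          match
# soft ambient light   no match
```

Unlike the "\s" version, this matches at the start of a line, at the end without a newline, and next to punctuation.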
So, finding "spam" words isn't too hard. The next question is what to do about them when you find them. A few thoughts come to mind:
None of these are ideal. In our next post, I'll dig into that a bit more deeply.
/Web/detecting-comment-spam.html copyright and reprint notice
This is a continuation of Unix and Linux startup scripts, Part 2
So, after being rudely interrupted by a server crash and a few days of website migration, we're ready to continue exploring Unix and Linux startup scripts. We looked at both System V and BSD methods; until fairly recently that would have been the end of the discussion: if you were running Unix/Linux, your system used one or the other of these. Not everyone was satisfied, though.
You can see the hints of unhappiness in the script directives that crept into BSD startup scripts. More fine-grained control was needed and neither System V inittab nor the BSD rc scripts provided enough.
One reason is boot speed. Even when scripts can be run in parallel as in SCO's prc_sync "P" scripts, running mostly serial shell scripts takes time; you and I and everyone else want our systems up NOW. But it's not just that.
If you step back and look at init objectively, it is just a program starter. That was certainly true with the original BSD /etc/rc and inittab only added a bit more complexity. Init is a "starter", but so are inetd/xinetd, at and cron. Why shouldn't init be able to take on some of that work?
There are "event driven" tasks. Today, there are many kinds of hardware we expect to be able to just plug into a running system and be recognized. Usually some program needs to be started up when that happens. Sometimes you want something special to happen when a new file appears in a certain directory, or when some other task finishes or fails - life can be complicated and init, (x)inetd and cron/at don't always match our needs.
Think of some of the things you might want to control for any given script or process:
It's not that you can't have all of that; it's that some of it can be done by inittab, some by (x)inetd, some by cron/at, some by device drivers and some you have to do inside your script or program. It's a hodgepodge of stuff, and that's why there are attempts to replace it all with something more powerful and complete. Take the man pages for inittab, cron, at, xinetd, udev and maybe a few others and glom them all together: THAT'S the ideal.
Implementing all that is something else entirely.
Aside from very real dependencies from programs that expect certain directories, certain processes and certain system calls, you also have inertia, developer resistance to seeing their projects replaced, user/administrator/programmer resistance to learning anything new and of course the inevitable arguments about how, how much and when. Given all that, it's surprising that much progress has been made at all. But it has. Fedora and Ubuntu now use Upstart. Mac OS X has launchd. The path to get there hasn't always been smooth: launchd was implemented in stages, first replacing cron, then xinetd, and the rc scripts only disappeared in Snow Leopard. Ubuntu 9.10 has replaced init with Upstart, but rc scripts still exist; xinetd is replaced, but not cron.
Be aware of this when reading Internet articles. For example, Mac OS X posts on launchd may not have been updated to reflect its current state on Snow Leopard or beyond. Even the Ubuntu Upstart FAQ has a section stating that Upstart probably won't replace xinetd - but it did.
You also need to be careful about the things you think you know. You might add something to /etc/rc.common on Snow Leopard, but it won't be run. On Ubuntu 9.10, scripts are still in /etc/init.d, but they get run by Upstart (see /etc/init for the scripts that start them). Fedora still has /etc/inittab, but it is only used to set the default run level - anything else added there will be ignored. There is plenty enough confusion to be had, especially for those of us who work on different systems and release levels.
Comments /Misc/fbpoker.html
Sun Feb 7 03:23:58 2010 TonyLawrence
Some of the worst players end up losing all their chips. If they are incapable of building up a bankroll through skill, they have to buy chips. That doesn't cost a lot: they can buy 400,000 chips for $20 in real money. From what I can see, that doesn't change their habits much: I have seen Bingo players wiped out only to return a few minutes later charged up with a full bankroll and still willing to bet everything on a pair of Queens or less.