APLawrence - Information and Resources for Unix and Linux Systems, Bloggers and the self-employed

Unix and Linux Help, Resources and information for Unix/Linux, Mac OS X. Articles on blogging, web site mechanics, and self employment. Mostly techy, Unix/Linux related, but we don't really try to stay tightly focused. If you've never been here before, there's a lot to explore.



Facebook Poker


2010/02/06

For the past week I've been playing Hold-em on Facebook. Let's make one thing perfectly clear immediately: this is nothing at all like playing poker with real money.

As proof, consider that I, a moderately (and only moderately) skillful player, began the week with a $1,000 play money bankroll. As I write this, my fake bankroll stands at $423,027. If you seriously think that anyone, never mind me, could accomplish that feat in the real world, well, I'd like to sit down and play poker with you. I'm sure we'd both enjoy it.

Anyway, even though Facebook Texas Hold-em is full of "Bingo" players (people who make ridiculous bets with any cards at all) there seem to be enough serious players there that you can (with a little effort) enjoy a good game now and then.

The path to a good game requires getting rid of the Bingo players. Interestingly, it's not that hard to do. For those who don't understand the game, in Texas Hold-em you are dealt two cards. Everyone bets based upon what they now have (or would like other players to think they have). In a real game, a typical bet at this point would usually be not more than three or four times whatever the minimum bet is. The reason is that you really don't know what you have - there are more cards to come. A pair of Aces in your hand is certainly good, but it's hardly a guaranteed win. In a tournament, a player might push in all their chips with Aces if the situation is ripe for that play - if it's late in the tourney, if their chips dominate the other players, if they are playing in "late position" (being the last or close to the last person to bet).

In a cash game (not a tournament), it would be extremely unusual to risk everything you have on one bet before seeing more cards. But that's just what the Bingo players do, and they don't wait for Aces to do it - they'll often just push in chips no matter what their cards are.

There's no point in complaining. I made a mildly sarcastic comment once and got the retort "It's called gambling, dickhead!". OK, yes, it is a type of gambling. It's not poker.

However, if there are at least a few of you at the table who would like to play poker as though the play money really meant something, there is a way to get rid of the Bingo players. You simply ignore them.

That is, if they bet their stack pre-flop (before the remaining cards are dealt), you simply fold. You fold even if you are holding Aces yourself, you fold even if you have already called smaller raises before the Bingo players jump in.

The other Bingo players will call the bets. It doesn't matter very much what cards they hold - they are in it for the excitement of a giant pot. They'll keep playing like this until they wipe each other out.

However, if you and just a few other players refuse to call ridiculous pre-flop bets, there is no excitement. As the other Bingo players are wiped out, there will be more serious players than Bingo players and the only time the Bingo folks can get any action is post-flop. Even then, you don't play to their silliness unless you have the "nuts" (the best possible hand) or close to it. Otherwise, you just fold - ignoring pot odds and implied odds and not caring at all about whatever money you are throwing away. Your purpose at this point is to get rid of the Bingo folks so you don't call their bets unless you are very sure to win.

This strategy works. Of course new players will join the table, and some will be more Bingo players, but if you all stick to the strategy, they'll soon either be bored or broke, leaving the rest of you to enjoy a good game of poker.

You'll also acquire a big pile of fake money earned from the Bingo players foolish enough to push all in with their post-flop pair of Kings against your made full house or Ace high flush. That's where my $400,000 came from - playing against the "real" players probably gave me much less - if anything at all. Those were the games that were fun, though, where skill, cunning and a little luck are what you and your opponents are using. With serious players, you can even bluff now and then - something you can never do against a Bingo player. At times, you might even forget that it isn't real money - well, until it's time to stand up and you realize you are not going home with several hundred thousand dollars. Oh well: I'm having fun - that's what matters.

I do wish they had a "post and fold" option so that you could leave your seat for a few minutes and not lose it. It would also be nice to be able to set your "Raise" button to some multiple of the minimum bet and have it stay there until you change it. There are other minor interface annoyances, but overall I do find this enjoyable.

/Misc/fbpoker.html copyright and reprint notice

Comments /Misc/fbpoker.html

Sun Feb 7 03:23:58 2010 TonyLawrence

Some of the worst players end up losing all their chips. If they are incapable of building up a bankroll through skill, they have to buy chips. That doesn't cost a lot: they can buy 400,000 chips for $20 in real money. From what I can see, that doesn't change their habits much: I have seen Bingo players wiped out only to return a few minutes later charged up with a full bankroll and still willing to bet everything on a pair of Queens or less.


Why the iPad IS important


2010/01/29

The iPad is still not much more than pages on Apple's web site, but already some folks are telling us that it's unimportant, a bust, a no-show, insufficient, ill-conceived and all that. That some of those nay-sayers cast similar barbs at the iPhone could be amusing, but I wouldn't argue against most of the complaints: they are absolutely correct that Apple's new device has warts.

And they are absolutely wrong that it will be a failure.

The iPad is a game changer. The people carping about its defects are missing the bigger picture - devices like this will ultimately change the way we use our computers.

Consider the form factor for a moment. No, the iPad doesn't roll up to fit in your pocket (though some future device like this might). But at 9.7 inches diagonal, with a 1024x768 display and support for 720p video, it is big enough and sharp enough for pictures, TV and movies and, of course, books. Yeah, yeah, E-ink is "better" for books, but that misses the big picture - the iPad has books AND everything else.

It also has apps: the existing 140,000 iPhone apps, and many more to come that will take advantage of the larger screen real estate. Among those apps are a few that implement some very important three letter acronyms: RDP, VNC and SSH.

Those three will make the iPad the perfect choice for technology workers. Set your iPad in its keyboard dock and connect to the company server. But what most everyone is missing is that you'll probably end up using this the same way at home. You may very well have a Mac or a Windows machine packed with RAM and disk storage and you may be accustomed now to sitting down at its keyboard. One day you'll realize that you can use the iPad as a client. Maybe at first you only use it that way now and then - when you are sitting on the couch or enjoying your patio or porch. But as you realize how convenient that is, you might start doing it more and more.

By that time, the iPad will have probably insinuated itself into your life in other ways. You'll probably be using it to control your TV and associated devices - remote controls are horridly primitive, aren't they? You'll have started using Google Voice and Gizmo and you may be wondering if you still need a cell phone at all. The iPad has become your constant companion - you put it down on your bedside table before you nod off to sleep and it wakes you up in the morning. You may have even fumbled for it in the wee hours of the morning to record some Very Important Thought that raised you from your dreams.

Of course you use it for your calendar, your music, your movies, your books, probably your newspaper. It's your photo album, your email client, your gaming console, your web browser, your journal, your spreadsheets, your private movie theatre, your work... your life.

It's not perfect. There are things you want: maybe a camera, maybe other things. Those will come, but you have so much now that those gripes seem almost unimportant.

Not perfect, but it IS magical. It IS a game changer. Microsoft already hates it and will be spreading all the negative FUD it can. Cell phone makers might join in as people start switching to Google and Gizmo. There will be people who insist they don't need it, don't want it. They'll be lying to themselves.

/MacOSX/ipad.html copyright and reprint notice

Comments /MacOSX/ipad.html

Fri Jan 29 13:57:01 2010 TonyLawrence

I suggest that those who still don't believe it go watch all of the introductory event: http://www.apple.com/ipad/#video

Fri Jan 29 14:35:15 2010 BrettLegree

http://6weeks.ca

It definitely is a game changer. If people can't see that, they are either jaded that they didn't get an invite to the event (!) or they are just very narrow minded and unimaginative.

When I saw it, I thought, "yeah, perfect for me - it does everything they say it does, movies music books email light surfing, plus I can VNC into my Linux box".

Later I thought, with a custom mount, and a 12 V and audio adapter, this thing is the perfect car computer. It does GPS (according to the Apple web site), lots of music with a groovy touch screen interface, and so on.

Plus, when I get to my destination, I just unplug it and take it with me.

I swear that the people who are down on it are being paid to sling mud.

Or very shortsighted.

Fri Jan 29 16:08:39 2010 MikeHostetler

http://squarepegsystems.com

http://www.themaninblue.com/writing/perspective/2010/01/30/ This just came up on my Twitter feed. Excellent thoughts.

Fri Jan 29 15:34:27 2010 MikeHostetler

http://squarepegsystems.com

You guys already know that I agree with you both. This is a game changer. It's a touch screen for everyone -- from those who want a portable entertainment unit to power users who need access to a machine. No one really has that in a usable way -- I mean, you can ssh from your iPhone, but how usable is that?



I think the most killer feature is the price. I was expecting a base price of $999, but $499? Really? That's the cost of a decent laptop and I think that the iPad is better!!



But it couldn't live up to the hype that the press had been rolling it in. I think people were expecting this to be their next boyfriend/girlfriend/friend with benefits. "What, I can read a book on it? It should read it to me while rubbing my back!"



I am interested in seeing how the iBooks app is. The Kindle is a wonderful ebook machine but it's only for books. The iPad (as we have said over and over again) is much more than that.

Fri Jan 29 16:49:02 2010 TonyLawrence

I missed the GPS spec on first read, but it is there on the 3G models.

Fri Jan 29 15:10:14 2010 jtimberman

Geeks are so jaded to new technology and think that the world of technology revolves around them.

When I described the iPad to my wife, she wanted one immediately. Four of my coworkers had the exact same reaction from their also-non-technical wives. They, and other non-geeks, really don't care about things like "no multitasking" or "it doesn't run a REAL Mac OS." They want to go to Email, Calendars, Facebook and Etsy. My wife's biggest complaint about the iPhone is it's too small, and the iPad totally cures that issue.

I imagine I'll be buying at least one, if not two (like I'll get a chance to use hers!). It's an intriguing device, and I think Tony is right - it changes the game.

Fri Jan 29 15:03:07 2010 Game changer-ish MarkBelanger

http://nemasket.net

The large screen size and Apple's panache and ability to generate buzz make it a self-fulfilling game changer even though others have made similar products before - notably the ArchOS 5 Internet tablet http://bit.ly/ajSt5L. With bluetooth, wifi, usb 2, and sd slots, the ArchOS internet tablet has much of the same potential as the iPad and maybe more considering the bluetooth. However, its small screen size makes it just a bit better than an iTouch and nowhere near as potentially useful as the iPad.

Given that it's been done before, the iPad loses points for originality, but like I said, Apple's je ne sais quoi will certainly propel the tablet PC into the mainstream. I'm excited about the potential for Android based devices competing with the iPad and driving better and cheaper incarnations.


Fri Jan 29 17:08:59 2010 TonyLawrence

ArchOS 5 Internet tablet

But that doesn't have 140,000 apps, doesn't have iTunes, iPhoto, iWorks, deals with book publishers, a no contract deal with AT&T ... I really do not think anyone can even begin to compete.

When I described the iPad to my wife, she wanted one immediately

My wife went to Apple and watched the video. Yeah, she wants one too. She also chastised me for not putting every dime we had into Apple stock :-)

I swear that the people who are down on it are being paid to sling mud.

Some, surely. But some honestly just don't get it or get side tracked by things like multitasking.


Fri Jan 29 17:19:25 2010 BrettLegree

http://6weeks.ca

@MikeHostetler,

You're right on both counts - the price is amazing, and as per your tweet this morning, it couldn't possibly have lived up to the hype. Perhaps that's okay though, since a lot of the folks who will buy these may not have been following the hype.

My mom wants one :) and so does my wife hee hee so that's how I'll get her into the Apple fold.

Neither of them followed the hype, so for them, it works.

@jtimberman,

That's been the thing that has bugged me the most about a lot of the Twitter and blog traffic, all the "self-proclaimed elite geeks" saying this or that about it.

News flash guys and gals, everything Apple (or anyone else for that matter) makes is not necessarily meant for you. The 30-something (20-something?) crowd does *not* hold the bulk of the spending dollars today.

Make something for the generation that does, on the other hand... instant gold.

@Tony,

Good points - they don't get it.

And multitasking? Meh.

To do two things at once is to do neither.

I have played around with an iPhone, it didn't bug me one bit.

In any case, a lot of what I would do on an iPad can all be done in a tabbed browser anyway.

Fri Jan 29 17:34:28 2010 TonyLawrence

And if you really need multitasking, you VNC, ssh or remote desktop to a real computer.

Fri Jan 29 17:37:07 2010 BrettLegree

http://6weeks.ca

Exactly :)

(I think about 5 minutes after I saw the specs, I searched and found several good VNC clients for the thing. Well, for the iPhone anyway, but of course they will work.)

Fri Jan 29 18:42:24 2010 TonyLawrence

Excellent post at http://stevenf.tumblr.com/post/359224392/i-need-to-talk-to-you-about-computers-ive-been

Quote:
The iPad as a particular device is not necessarily the future of computing. But as an ideology, I think it just might be. In hindsight, I think arguments over “why would I buy this if I already have a phone and a laptop?” are going to seem as silly as “why would I buy an iPod if it has less space than a Nomad?”


Exactly.

Fri Jan 29 19:04:39 2010 TonyLawrence

http://nemasket.net

I suggest that those who still don't believe it go watch all of the introductory event: http://www.apple.com/ipad/#video

It annoys me to no end that Apple has gotten so much from the Open Source community and can't be bothered to make QuickTime or iTunes for Linux. Which is one reason that I'm leaning toward the Android platform.

Fri Jan 29 19:05:52 2010 TonyLawrence

has much of the same potential as the iPad and maybe more considering the bluetooth

The iPad has Bluetooth 2.1 + EDR technology

Fri Jan 29 19:08:32 2010 TonyLawrence

It annoys me to no end that Apple has gotten so much from the Open Source

I know. I'm wracked by open source guilt: http://aplawrence.com/MacOSX/shame.html

But I'm still using Macs and I'm going to find some way to squeeze an iPad into our budget.

Fri Jan 29 19:35:56 2010 BrettLegree

http://6weeks.ca

I used to have that "it's not open source" guilt.

Then I decided I needed to get on with my life and get stuff done. I don't have time to think to myself "does this tool have the 'proper' ideology" when it is the best tool for the job I need to do.

Plus, I have four kids, so I don't have much time to build my own stuff and hack at things to make them work :) if I can buy something that works, I buy it.

Fri Jan 29 20:19:32 2010 Ferk

IMHO, the game was changed looooooooong ago..
If the iPad sells a lot, it will be because of the advertising and promotion, just because it's from Apple. Not because of being a "perfect choice".

There have been tablet PCs since a long time, and they are typically much more functional than the iPad. The only new thing that the iPad brings is the lack of standard input/output ports. I would much prefer one of those sleek laptops that can turn their screen 360º and be used as a tablet, while keeping all the functionality of a normal computer.

Fri Jan 29 21:27:32 2010 TonyLawrence

There have been tablet PCs since a long time, and they are typically much more functional than the iPad.

Nope. You are missing the big picture.

Fri Jan 29 22:10:11 2010 TonyLawrence

I'm also not going to agree with the "more functional" idea. I don't think anything has even approached this sort of interface and it's definite that the number of apps far exceeds anything other than PC's.

No, this isn't a tired rehash like something Microsoft would have done. This is exciting and new.

Sat Jan 30 11:47:08 2010 TonyLawrence

The Free Software Foundation is also on an anti-iPad rant.

Their gripe is more about politics than functionality. I tend to side with what this post opines: http://ostatic.com/blog/defective-by-design-is-defective

Sat Jan 30 12:46:14 2010 BrettLegree

http://6weeks.ca

The Defective By Design thing always seemed a bit childish to me, to be honest. It sounds like, "we can't bring the people the product they want because of our pseudo-political leanings, so let's piss on someone else's parade instead."

I know there are some smart people in the FSF, probably some good businesspeople too.

So stop complaining and make a freePad already. Make a freebook Pro, and a freebook. Take the best of BSD or Linux and make freeS X already.

Of course, I don't think they should really clone Apple. But there is a lot of really great FLOSS out there - I use it, you use it, lots of people use it.

Make it a bit better, a bit easier to use, market it properly, turn it into something people really want.

And heck, feel free to *charge* them for it. A lot of people still have a hard time believing something is good quality if you're giving it away.

Sat Jan 30 13:30:05 2010 MarkBelanger

http://nemasket.net

This gamers-eye-view takes issue - big issue - with the lack of multi-tasking - http://bit.ly/d7Gc7p.

I don't think the lack of multi-tasking will prevent Apple from selling zillions of iPads or prevent the iPad from having a major impact on non-desktop computing. That said, I think the lack will be a big negative in the long term. 6 months to a year from now, there'll be a half-dozen viable Android based alternatives that can multitask - easily. Even today the number of apps for Android, while still far fewer than for the iPhone/Touch, is plenty for the average Linus.

So short term, lack of multi-tasking is no big deal. Long term it is quite a limitation for a device that wants to replace a laptop in certain situations.

About the Open Source discussion. I love the principles but that's not my issue with Apple. My issue is that key Apple technologies - specifically QuickTime and iTunes - are not available for Linux. I think this is insanely shortsighted and sort of cheesy given how much Apple has taken from Open Source. I don't expect open source versions of any Apple technologies. IMO, making Linux a viable alternative helps Apple. I think a person is likely to go to Linux not from Apple but from Windows. That Linux user has potential to eventually move to Apple.

Sat Jan 30 13:59:59 2010 BrettLegree

http://6weeks.ca

@MarkBelanger,

Your point about multitasking - 6 months is a long time. Perhaps Apple will have a multitasking iPhone/iPad OS by then. I can see it, if people are asking for it.

If I were a betting man, I'd say that it already exists, and we'll see it at the next Apple event, for anything that runs a variant of the iPhone OS.

I know what people are saying about Apple and their apps and giving back to open source.

Truthfully, even if I could run iTunes and Quicktime and whatever else on a Linux machine natively, I would not. I don't use them on my Mac because there are other tools that do the job better for me.

Obviously, this isn't the case for everyone, so... but then again, if you're running a Linux machine (or BSD or whatever) and you (or the person who set it up for you) couldn't find suitable alternatives for iTunes and Quicktime, then I'd probably raise an eyebrow and wonder what you're doing...

Sat Jan 30 14:51:25 2010 TonyLawrence

On the other hand, limiting multitasking is a security feature too..

Sat Jan 30 14:56:42 2010 BrettLegree

http://6weeks.ca

That's an excellent point Tony, I never thought of that.

Sat Jan 30 15:07:59 2010 MarkBelanger

http://nemasket.net

Saying that the lack of multi-tasking is a security feature is like saying a car without an engine is a low-maintenance feature. I'd be OK with "we skipped multi-tasking for better time to market" or "we need more time for a secure implementation" but selling it as a security feature is weak at best.

Sat Jan 30 15:10:33 2010 TonyLawrence

I don't know that they are "selling" this as a feature. I'm just noting that security is easier without it.

Sat Jan 30 15:28:02 2010 BrettLegree

http://6weeks.ca

I'd tend to say it isn't so much a car without an engine, but maybe a car with a normal engine as opposed to a hybrid powerplant, or an amphibious car, or something like that.

I'd say most people (i.e. not the people who read this site, or use Twitter, or know what a blog is) tend to single task anyway.

I work at a nuclear company. I've watched the way most people work there - by and large, these are people who write documentation.

One app at a time. Don't even know what Alt-Tab does.

So for them, is lack of multitasking a problem on iPad?

Nope.

Remember - maybe Apple didn't design this thing for you and me as the primary target market. Maybe it's for your grandmother.

Sat Jan 30 16:35:04 2010 TonyLawrence

And again, if I need to, I ssh or vnc to my real computer.


LAN sniffing with a DualComm port mirroring switch and Windump


2010/01/19

I was recently contracted to help another consultant sniff a customer's network for suspicious activity. The situation was that the customer had been put on blacklists because some internal machine had apparently been compromised and was sending out spam.

Obviously the first task was to find and clean up any infected machines. The consultant contracted that out to someone else who updated virus software and ran scans. Unfortunately, that person didn't provide details of his work - he just reported that he had found and fixed "some problems". This didn't leave anyone feeling confident that the problem had actually been dealt with.

I pointed out that, if possible, all machines other than the internal mailserver should be blocked from sending email (other than to the internal mailserver, of course). Ideally, they should be locked down to only whatever outgoing ports are absolutely necessary, but blocking 25 and 465 is a good start. That was done, but my contact still wanted to know how to sniff what is actually happening on the network.
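How the blocking was actually done at this site isn't recorded here. As a rough sketch only, assuming a Linux gateway running iptables sits between the LAN and the ISP router (an assumption, as is the 192.168.1.10 mail server address), rules along these lines would reject outgoing mail from everything except the mailserver. The function just prints the rules so they can be reviewed before anyone applies them.

```shell
# Print (rather than apply) iptables rules that reject outbound SMTP (25)
# and SMTPS (465) from every host except the mail server.
# Assumptions: a Linux gateway forwards the LAN's traffic, and the
# mail server address passed in is a made-up example.
block_smtp_rules() {
    mailserver=$1
    for port in 25 465; do
        echo "iptables -A FORWARD -p tcp --dport $port ! -s $mailserver -j REJECT"
    done
}

block_smtp_rules 192.168.1.10
```

Clients talking to the internal mailserver on the same LAN segment never cross the gateway, so rules like these would not interfere with normal mail submission.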

I had him buy a DualComm port mirroring switch and arranged to meet him at the customer site. The DualComm is an inexpensive 5 port, USB powered switch that, by default, mirrors port 1 to port 5. It's small enough to keep in your laptop bag, cheap enough that you can leave it at a customer site and the USB power means one less outlet to hunt for. The default port mirroring makes this ideal for LAN sniffing.

Because the consultant wanted to use Windows, I brought a Windows laptop with Windump installed. Windump is just tcpdump so that makes it easy for me and it also means that he can search for tcpdump tutorials and learn more about its usage.

Mac OS X and most Linux distributions either install tcpdump by default or have it a package away. Personally, I'd much rather carry a Mac or Linux laptop for this kind of work as there are many other tools that Windows doesn't bother to include. But this consultant was more comfortable with Windows, so that's what we did.

On site, I connected my laptop to port 5 of his DualComm, took the patch cord that went to the ISP's router and put it in port 2, and then ran port 1 back to the customer's switch where I had unplugged the router cable. I started up a CMD window and showed him that we could do things like

windump "tcp port 25 or tcp port 465"

That showed traffic to and from the internal Kerio mailserver as we'd expect. I then stopped the mailserver and all Windump output ceased. We watched for a few minutes, saw nothing, and turned the mailserver back on. I showed him that the Kerio admin "Active Connections" under Status should match the IPs we were seeing in Windump. This made him feel more confident that the problem was indeed resolved. I did suggest that he might want to log some longer runs just to be certain, but as I confirmed that client machines were blocked, I don't expect to see this problem again. The sloppiness of the contractor who did the virus cleanup bothers me a bit, but otherwise this is under control.
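For the longer logging runs, one approach is to exclude the mailserver in the filter itself, so that anything captured is by definition suspicious, and to write the packets to a file for later review. The sketch below only builds and prints the command; the mailserver address and the output file name are made up for illustration. The -n (no name lookups) and -w (write raw packets to a file) options are standard tcpdump options, and Windump accepts the same filter syntax.

```shell
# Build a capture filter matching SMTP (25) and SMTPS (465) traffic that
# does NOT involve the mail server. The address used below is a made-up example.
smtp_filter() {
    printf '(tcp port 25 or tcp port 465) and not host %s' "$1"
}

MAILSERVER=192.168.1.10
FILTER=$(smtp_filter "$MAILSERVER")

# Print the command instead of running it: capturing needs administrator
# rights and a machine plugged into the mirror port.
echo "tcpdump -n -w smtp-suspects.pcap \"$FILTER\""
```

The saved file can be read back later with the -r option. If the client blocking is working, it should stay empty.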

/Security/dualcomm.html copyright and reprint notice

Comments /Security/dualcomm.html

Sat Jan 23 07:19:16 2010 Michiel

Have you tried WireShark? It's a great tool for sniffing network traffic and analyzing it. It can also visualize conversations between two machines. There exist versions for unix, windows and osx. http://www.wireshark.org

Sat Jan 23 21:07:25 2010 TonyLawrence

I haven't. I just downloaded the so-called Mac version (it isn't, it's an X app). I detest having to run stuff in X when I'm using a Mac, but I'll give it a whirl.


Google Custom Search Promotions


2010/01/14

Google's Custom Search engine is generally accurate at finding the results I'd like it to find when searching this site. However, sometimes it doesn't find what really is the best result. Until now, there hasn't been much you could do about that.

The new "Promotions" feature gives us a way to help. Simply, you define keywords, a title and a link. If those keywords are used, your link appears above all the normal results.

You can define more than one promotion. For example, if you type "laserjet" into the search box at the top of this page, you'll see that I have added two promotions: one for "netcat" and another for escape sequences to select trays. I think those links are likely to be more important than what Google selects by itself.

This is new (I just found out about it this morning), so I haven't added very many of these yet. I don't know what limitations Google has on the total number of promotions or the number per keyword, but I'm sure I'll be using as many as they will allow. Better search results mean happier visitors, right?

/Web/cse-promotions.html copyright and reprint notice

Comments /Web/cse-promotions.html


Mac OS X Scrabble Word Trainer with Growl


2010/01/11

As I mentioned at Internet Scrabble (the name Scrabble is a trademark of Hasbro, Inc. in the United States and Canada and of Mattel elsewhere), our family has been enjoying playing Scrabble on-line. Aside from the fun of playing, it gives us another reason to keep in touch with each other - the small interactions in the chat windows are part of our involvement in each other's lives.

I have also been playing with strangers. When I first started that, I was a little shocked by some of the word usage. In our family games, we had never allowed slang, abbreviations, or foreign words. If you had a Q and no U, you were stuck - there was no QI or QAT or QANAT in our games. Isn't that why a Q is worth 10 points? If all it takes is an I to play it, it shouldn't be worth much more than H, should it?

Well, that's not how Scrabble is played today. Foreign words, abbreviations, old English and slang have found their way into TWL (the Tournament Word List). It almost seems that you can toss down three or four random letters and have a good chance of being able to create a valid Scrabble word.

Of course you have to know the words. With the Facebook version, you can guess and let the dictionary correct you, but if you ever want to play face to face, that won't work. You need to learn more words.

Not just any words, though. While a large vocabulary isn't a bad thing to have for Scrabble, you can improve your scores by just concentrating on the two and three letter words plus a few other odd words that use the high scoring letters. Add to that the "aa" and "ii" words (for those times when you get a rack full of them), take away the obvious stuff that you already know, and you'll be left with fewer than 1,000 words, give or take. That's a fair pile to memorize, though not impossible.

If I were younger, I probably would just memorize those words. I think I probably still could, but instead I let my computer help me learn them. That's easier on my tired old brain.

I made a list of the words I want to learn. I added to it the words that I have a hard time accepting as legal plays like "AB", "SIM" and a few other abbreviations I just don't think of as words. I added a ":" and then a definition for each word - I find it much easier to remember words if I know what they mean. The final step was to write a program to present these to me randomly like flash cards.
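To make the list format concrete, here is a small sketch: a five-line sample file in the "word: definition" layout, plus two helper functions that pull out subsets worth drilling. The sample definitions and the function names are mine, for illustration only, not taken from the real list.

```shell
# Create a tiny sample list in the "word: definition" layout.
# The definitions here are illustrative, not the author's.
words=$(mktemp)
cat > "$words" <<'EOF'
qi: the vital life force in Chinese philosophy
qat: a shrub whose leaves are chewed as a stimulant
aa: rough, cindery lava
kae: a jackdaw
quod: a prison
EOF

# Words containing Q but no U (field 1 is everything before the first colon).
q_without_u() {
    awk -F: 'tolower($1) ~ /q/ && tolower($1) !~ /u/ { print $1 }' "$1"
}

# Two and three letter words only.
short_words() {
    awk -F: 'length($1) <= 3 { print $1 }' "$1"
}

q_without_u "$words"
short_words "$words"
```

Keeping everything in one flat text file means any line-oriented tool (grep, awk, sort) can slice it, and the flash card program only has to split each line at the colon.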

My first effort was just a Perl script that randomly shuffles the list and then outputs each line to a terminal window. That was fine, but I had to have that window open to see the output. I wanted the words and definitions to be always visible to me - to appear on top of whatever else I happen to be doing, but to be translucent so that it wouldn't interfere too much. Hmmm.. sounds an awful lot like Growl.

I already use Growl for mail notification and the command line "growlnotify" was just what I needed. Here's what it looks like while running (click on the picture for a larger view). The Growl notification is in the upper right corner showing "KAE" at that moment. The simple Perl code follows.

growlnotify running

Code

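#!/usr/bin/perl
use List::Util 'shuffle';

# Read every "word: definition" line from the files named on the
# command line (or from standard input) and shuffle them.
@s = <>;
@shuffled = shuffle(@s);
foreach (@shuffled) {
    $extra = "";
    @fields = split /:/;
    $word = uc($fields[0]);    # upper case version for the Growl title
    $lword = $fields[0];
    s/.*://;                   # strip everything up to the last colon, leaving the definition in $_
    $extra = "NO U! " if ($word =~ /Q/ and $word !~ /U/);
    print "\033[1m $word $extra \033[0m, $lword: $_\n";
    open(O, "|/usr/local/bin/growlnotify $word $extra");
    print O "$lword $_";
    close O;
    # if you increase the default display persistence in Growl preferences, increase this also
    sleep 7;
}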

I call that "scramble" and leave it running as "while :;do scramble mywords;done" (that way I can add words while it is running or have it use different lists).

/MacOSX/scrabble-growl.html copyright and reprint notice

Comments /MacOSX/scrabble-growl.html

Tue Jan 12 11:53:58 2010 TonyLawrence

You can also do things like:

grep ".*ii.*:" mywords | scramble
# study the "ii" words

grep "^..:" mywords | scramble
# two letter words

The growlnotify is useful for studying or just being reminded of any sort of list - things you want to do this week or this year, words you never spell correctly, or anything else.

Sat Jan 16 18:19:53 2010 TonyLawrence

You could easily bring this to Windows:

http://www.growlforwindows.com/gfw/ has "growlnotify" so the only other thing you need is Perl and the above script.




Basic TrueCrypt Usage


2010/01/07

I and other people here have mentioned TrueCrypt before. I thought (and perhaps you did too) that it was very simple and obvious to use but I've had several people write to me complaining that they downloaded and installed it, but have no idea what to do next.

OK, maybe the interface isn't all that user friendly. It really is simple, but after looking at it from an "ordinary person" perspective, I can agree that it could leave you staring at the screen saying "Huh?" So let's run through using this in plain English.

What you don't want to do

The most common fear I heard from people was that they were afraid TrueCrypt was going to encrypt their hard drive and that something would go wrong or they'd forget the password. Yes, TrueCrypt can encrypt entire hard drives, and yes, things could go horribly wrong or you could forget your password. So, yes, you have reason to be concerned. I definitely would NOT advise using TrueCrypt for that purpose unless you completely understand what you are doing, what the risks are, and (perhaps most importantly) WHY you are doing it.

Most of us need to protect individual files. Maybe you have a text file with all your passwords in it. Maybe you handle sensitive documents for your clients. Whatever it is, you usually don't need to encrypt a whole drive. You just need to lock up those particular files.

Protect a file or files

Click on Create Volume

This is the simplest and safest TrueCrypt operation. Start up TrueCrypt. You've never used it before, so what you want to do is click on Create Volume. You want to create an "Encrypted File Container" (that's the default). Click Next, and then select "Standard TrueCrypt volume" (again that's the default). What happens next seems to confuse people: a file dialog comes up, which perhaps makes you think that you need to select some file.

No, it's looking for you to give the name and location of a NEW file. This file will be the "container" for the files you actually want to hide. It's going to eventually end up as another disk drive on your system, which is perhaps another reason this can confuse folks: it's a "volume", it's a "container", it's a disk drive. No wonder people are hesitant to proceed!

So click on "Select File", navigate to where you want to keep this, and give it a name. Remember, this is the "container". It's the box your secret files will hide in. You might call it "Secrets", "My Secret Stuff" or "Fred" - choose something that makes sense to you. IF YOU CHOOSE AN EXISTING FILE, IT WILL BE DELETED.

So, after choosing a name, click Next and the following screen asks what kind of encryption you want to use. For most of us, the default AES is fine. The TrueCrypt help file suggests reasons why you might choose one of the others:

If you store the backup volume in any location where an adversary can make a copy of the volume, consider encrypting the volume with a cascade of ciphers (for example, with AES-Twofish-Serpent). Otherwise, if the volume is encrypted only with a single encryption algorithm and the algorithm is later broken (for example, due to advances in cryptanalysis), the attacker might be able to decrypt his copies of the volume. The probability that three distinct encryption algorithms will be broken is significantly lower than the probability that only one of them will be broken.

You can take the default choice for the hash algorithm and click Next.

Now you need to choose the size of your container. Obviously it needs to be large enough to store the files you want to hide, but you may want to think about making more than one smaller container. For example, if you are going to store backups of this container (a good idea!), you might want to do that on a CD or DVD - obviously the container size has to be small enough to fit on the storage media. Or perhaps you plan on using one of the many free Internet storage sites - your choice of size may be limited by what they will give you for free.

There's also a minimum size - not because TrueCrypt really cares, but because your operating system can't create a disk drive (which is what this container ultimately becomes) smaller than a minimum size. Once you've decided how big or small this will be, you click Next and it's time to choose a password.

Think of a sentence

TrueCrypt isn't looking for "joe123" or even "P^%WErt45!@p.k". It's looking for a long sequence - they recommend at least 20 characters and you can use up to 64.

You could make up a long string of nonsense, but how are you going to remember "Ht^%f2HH(hpo&mnE$%d";q\n*^$sdf"? I suggest using a phrase - a sentence - that you can remember. It might be words from a song: "Memories are all I have to cling to - cling to!" or a string of names: "Thomas, Jonathan, Sarah and THEN William!". If you always keep the books on your shelf in the same order, maybe you could use their titles: "Programming Perl, Perl Cookbook, Linux Firewalls and Linux Cookbook". It is best if you can include some random punctuation, but if this password is never going to be written down and will live only in your head, it's better to be a little weaker than to risk forgetting it - once you've locked your files up with this, they are not coming back without that password!
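If you want to sanity-check a candidate phrase against those length limits before you commit to it, a few lines of Python will do. This is only a sketch: the function name and the verdict strings are made up, and the 20 and 64 are simply the figures mentioned above.

```python
MIN_RECOMMENDED = 20   # recommended minimum length, per the text above
MAX_ALLOWED = 64       # longest password accepted, per the text above

def check_passphrase(phrase):
    """Return a short verdict on the length of a candidate passphrase."""
    length = len(phrase)
    if length > MAX_ALLOWED:
        return "too long"
    if length < MIN_RECOMMENDED:
        return "shorter than recommended"
    return "ok"
```

By my count the book-title example above runs to 67 characters, so check_passphrase reports it as too long; dropping one title brings it back under the limit.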

A weaker password can be augmented with a "key" file or files. These are simply files whose contents TrueCrypt mixes into your encryption; only the first 1,048,576 bytes (1 MB) of each file are used. You can use any file (or multiple files) on your disk as long as that first megabyte of it will never change. You could use a file stored on a USB stick - if someone stole your computer but didn't get that USB drive, they can't open your TrueCrypt files even if they have the password. Of course, you can't either - you have to have the key file(s) available to get at your stuff.

Once you have decided on your password and any key files, it's time to actually create the container. You'll be asked to move your mouse randomly for a bit and then click Format. The purpose of the random mouse stuff is to gather randomness for the encryption keys, so just do it even though it sounds like someone might be pulling your leg. After you click Format and TrueCrypt says it is all done, you can exit back to the main screen.

Adding Files

You have now created a container. You haven't put anything in it yet and to do that, you need to mount it as a disk drive. You'd think you could just click "Mount" and TrueCrypt would ask you what you want to mount, but no, you need to first click "Select File", find your container, point at your key file(s) (if you used any), and then click "Mount". You can select what drive letter or (Mac) volume identifier to use and once it is mounted you can exit TrueCrypt - you can unmount the container using ordinary operating system methods if you wish.

While it is mounted, you can put files in it. I suggest keeping safe copies of your files until you feel completely comfortable with TrueCrypt - remember, if you can't recall the password or lose any required key files, you will have no access to your data.

That's it. After you have loaded the drive up with files, you unmount it, and the encrypted container is protected by your password and any key files you specified. It was pretty simple, wasn't it?

/Basics/truecrypt.html copyright and reprint notice

Comments /Basics/truecrypt.html

Thu Jan 7 22:40:10 2010 BrettLegree

http://6weeks.ca

I have a few personal files from time to time on my work laptop.

I keep them on a TrueCrypt volume... and the keyfile is on a USB stick (I have the keyfile backed up at home).

(Yeah, I know, "bad Brett mixing personal with work" - but hey, *they* call me at *home* sometimes!)

Sat Jan 9 07:46:57 2010 sledge

Funny story, but only related to moving the mouse:
So I built my first Linux box running RedHat 5.1 and StarOffice 4. The purpose was to have internet access for the family. I researched getting the perfect modem because I had heard how hard it is to make things work under Linux. Got my hands on an ISA modem with jumpers so I could choose the IRQ for myself. I set it up using IRQ 9. Then I connected a serial mouse and fired everything up. I didn't understand why the modem worked better when I shook the mouse, but it did. So I hunted around the Internet (shaking like a cheap motel bed the whole time) and even posted questions to Usenet for the first time. I received several responses referring to 'setserial' but I didn't follow the logic.
To make a short story long, the serial port was using IRQ 2 (which cascades to 9) and the modem worked like gang-busters after I moved the jumper even without the wiggling.
I still have that machine in the shed - I need to dig it out, it has a file on it that I didn't save anywhere else.
PS I use TrueCrypt to encrypt an entire hard drive and now I don't worry about my "adversaries" any more. But the process of setting up the container file confused me the first time I did it. I understood the mouse's relationship to entropy from earlier encryption stuff so it didn't seem silly to me.

Mon Jan 11 13:17:26 2010 Anonymous

One thing I love about truecrypt is that it's pretty much platform independent and the encrypted volumes can be opened by truecrypt on Windows/OSx/Linux. However as a Mac user it's a bit of an overkill for a basic tool, when creating an encrypted Sparsebundle with a native OSx program will do the same thing and any nasty long passwords can be saved automatically using keychain.
However you can't beat Truecrypt for hidden encrypted volumes.. That's a sweet utility and gives some peace of mind.

Tue Jan 12 09:01:44 2010 anonymous

...so now we can safely hide all of that porn from our wives!

Tue Jan 12 11:31:38 2010 TonyLawrence

I suppose so, though you might want to seriously think about what kind of relationship includes lies and deception.

Tue Jan 12 13:18:32 2010 anonymous

Well I was more thinking of customer/personal sensitive info/docs/log/configurations and anything which might aid someone else in malicious activities targeted at either myself or one of my customers.

Don't worry about me.. My wife knows how to unlock my keychain....


Tue Jan 12 13:32:50 2010 TonyLawrence

Years ago a local company sold a document scanning system to another local firm. A few months later, they got a nasty call from the customer who insisted that they had mis-sized the storage because they were only 10% through scanning their docs and were already out of disk space.

I was sent in to investigate. What I found was gigabytes of porn - apparently an employee noticed all this available disk space and used it for his "collection".




Internet Scrabble


2010/01/02

Our family has recently discovered online Scrabble (the name Scrabble is a trademark of Hasbro, Inc. in the United States and Canada and of Mattel elsewhere) through Facebook. We've been a Scrabble playing family for many decades; we all love the game, but we are seldom all together to play and even when we are, we usually have other things to do.

Internet Scrabble changes all that. You can make one move per day or even less. More importantly, you can play more than one game at once: I have one game going with my wife, one with each of our daughters, and one with the four of us. I also have more games going with friends and a few with strangers. I can spend as much or as little time playing as I like - and so can all my opponents.

Playing Scrabble on-line is different in other ways. There's no arguing about words: the game enforces acceptable words. On the other hand, words our family never would have allowed in home games can be used; in one of my first games with a stranger, I was shocked to see "NGWEE" appear. That word is a Zambian unit of currency, but I'm sure you knew that. I'm sure you also know that "AA" is a rough form of lava and that "DUIT" is a Dutch coin - and that all of them are acceptable words in Facebook Scrabble.

Another issue, especially when playing with strangers, is the easy availability of on-line anagram solvers. In our family games, we allow checking the dictionary for spelling and minor word-hunting, but using an anagram program is a bit much. On the other hand, there is a fair amount of strategy and plain old luck to Scrabble, so the cheaters aren't guaranteed a win. Finally, if you really feel someone must be using an anagram program, you can fight back by doing the same yourself.

Speaking of anagram solvers, there are Scrabble robots at other sites. I haven't tried playing against one, so I can't say how good they are at strategy - I'm sure they are pretty good at finding the best scores, but that's not always the best play, so a human may still have a fighting chance.

So, if you are wondering what I'm doing in between working, eating, sleeping, breathing, and all that, well, I'm probably playing Scrabble. Maybe I'll see some of you on the other side of the board!

Another popular Scrabble site is Internet Scrabble Club. They have timed play there, but when I tried it out they were having server problems and that made playing very difficult. I don't know if that's usual or unusual.

/Web/facebook_scrabble.html copyright and reprint notice

Comments /Web/facebook_scrabble.html

Tue Jan 5 18:56:03 2010 MikeHostetler

http://squarepegsystems.com

When your family is actually together, you should try Bananagrams. It's sorta like Scrabble on Speed. You actually have your own little Scrabble board going and when you or someone else use all your letters you say "Peel" and you have to take another letter. Then your nicely-organized little group of words may have to be torn apart and re-built cuz you got a Q and it goes no where.
http://www.bananagrams-intl.com/instructions.asp

(btw, their site would fail most usability tests).

Wed Jan 6 15:30:50 2010 Cyr

I love playing Bananagrams with the family (although we have always called it "Speed Scrabble")
As for the lone "Q", I have always found the word "qat" invaluable.
(It is a shrub found in east Africa and Arabian peninsula whose leaves are chewed similarly to betel or tobacco to produce a stimulating effect)

Wed Jan 6 15:45:35 2010 TonyLawrence

Sounds like fun.

An alternate form of Scrabble that I think I invented is similar. In this variant, you can replace letters on the board that are part of the word you want to form. You get the tiles back, so you might pick up QU from QUICK and replace it with ST and add an S on the end for STICKS. You don't get the doubles or triples underneath again and you MUST use at least one letter without swapping.

It means the Q's and Z's and J's can get played many times - lots of fun!

Thu Jan 7 02:21:13 2010 BruceGarlock

My wife and I played "Bannanagrams" the other night. Not exactly scrabble, but essentially the same idea. It was fun, but made me realize what a poor vocabulary I have. I need to stop reading tech books, and read some more classics :-)

Thu Jan 7 15:36:20 2010 TonyLawrence

More Q without U words (legal in TWS Scrabble):

qadi : Islamic judge
qadis : plural of QADI
qaid : a Muslim tribal chief or senior official
qaids : plural of QAID
qanat : gently sloping underground tunnel for irrigation
qanats : plural of QANAT
qat : leaf of the shrub Catha edulis
qats : plural of QAT
qi : a circulating life energy in Chinese philosophy
qindar : Albanian currency
qindarka : plural of QINDAR
qindars : plural of QINDAR
qinta, quintas: a country estate in Portugal or Latin America
qintar : Albanian currency
qintars : plural of QINTAR
qoph : 19th letter of the Hebrew alphabet
qophs : plural of QOPH
qwerty : the traditional configuration of computer keyboard keys
qwertys : plural of QWERTY
sheqel : any of several ancient units of weight
sheqelim : plural of SHEQEL
suq, sooq, souq : commercial quarter
tranq : sedative
tranqs : plural of TRANQ
faqir : Muslim or Hindu monk
faqirs : plural of FAQIR





Wed Jan 13 06:04:22 2010 Speed Scrabble Peter

http://www.supernifty.com.au/

You can play Speed Scrabble Online at http://www.supernifty.com.au/speed_scrabble.php


Detecting Comment Spam, Part 3


2009/12/29

This is a continuation of Detecting Comment Spam, Part 2

In the previous posts in this series, I've said that spammers' habits allow us to detect their attempts to leave inappropriate comments. I use these techniques here and am able to send most spam comments directly to the bit-bucket without ever having to examine them myself. When I suspect spam but am not sure, I just send the comment to moderation. In practice, very few spam attempts get by the automatic filters.

The code I use is a hodgepodge I developed over many years of fighting spam comments. I need to rewrite it and my thought is to move toward a scoring system similar to that used by SpamAssassin for mail spam: each "bad habit" increases an overall spam score and the final judgement is made based on the total score accumulated. To that end, I have been pulling out the various tests I use now and examining them. Some always cause the comment to be treated as spam; those would add enough points to ensure that would still happen. Other habits now cause moderation, but in my new design each of those will increase the spam score. Any spam score at all is cause for moderation, but if the post accumulates enough minor points, I can skip that and just throw it away.
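To make the scoring idea concrete, here is a minimal Python sketch of the final judgement. It is an illustration of the design described above, not working code from this site: the function name, the threshold of 10 and the point values in the examples are all assumptions.

```python
DISCARD_AT = 10   # assumed threshold: this many points means certain spam

def judge(points):
    """Map a list of per-test spam points to 'post', 'moderate' or 'discard'."""
    total = sum(points)
    if total >= DISCARD_AT:
        return "discard"     # enough points: throw it away unseen
    if total > 0:
        return "moderate"    # any score at all: a human takes a look
    return "post"
```

A test that should always mean spam (a known spam link, say) would simply contribute the full 10 points on its own, while several minor habits such as 4 + 3 + 3 would reach the same threshold together.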

By the way, there is an effort to create a BlogSpamAssassin.

Seven (or more) habits of ineffective spammers

I'll be reviewing things covered in the previous two posts and will introduce some new ideas. Items covered in the previous posts are marked with a "*".

* Known spam links: Reusing links we already know are spam. This would carry enough points to always be spam.

* High link to text ratio: Sometimes nothing but links. This could be legitimate, but it's a strong indicator. In my current code, this always treats the comment as spam, but I think I will change it so that at least one other spam point is required.

* Nonsense words: The higher the ratio of nonsense to total word count, the more likely this is spam.

* Many quick posts: Legitimate commenters may post more than once per day, but there will be some time delay between comments as they read the articles they are commenting on. I simply enforce a "posting too frequently" policy.

* Direct posts: Some spammers bypass your forms and send direct POST requests. That's definite spam.

* Typed too fast: Legitimate commenters MAY use cut and paste, but moving from form load to POST too quickly needs to carry some weight.

Multiple posts to same article: Legitimate commenters almost never post more than one comment without some other comment intervening. Sure, someone may have an afterthought and add a second comment, but more usually this indicates a spammer, so we should add points.

Same comment at another article: As always, spammers are lazy. They sometimes post exactly the same text to different articles. The posts may not come from the same IP address, but the content is the same. This should probably carry enough points for flat rejection. It does mean that you need to maintain a database, but you only need to store 100 characters or so (spam comments are usually short) and you can clean it out after 24 or 48 hours.

Text and link have same word: This is a minor spam indicator, but if the word "Horrivea" appears as "Buy Horrivea: http://someaddress/horrivea", it may be spam.

Failure to reload: I require clicking on a link to reload the original page after submitting a comment. A legitimate poster will almost always click, a spammer almost never will. Again, this is not an absolute indicator, but is worth a point or two.
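The "same comment at another article" check needs only a small, short-lived store. Here is a rough Python sketch of the idea, not the code running here: an in-memory dictionary stands in for the database, and the class name, the 48-hour retention and the case/whitespace normalization are my assumptions for illustration.

```python
import time

KEEP_SECONDS = 48 * 3600   # forget stored comments after 48 hours
PREFIX_LEN = 100           # only the first 100 characters are kept

class RecentComments:
    """Remember the start of recent comments and where they were posted."""

    def __init__(self):
        self._seen = {}   # normalized prefix -> (article, time first seen)

    def seen_elsewhere(self, text, article, now=None):
        """True if the same text was recently posted to a different article."""
        now = time.time() if now is None else now
        # Clean out anything older than the retention period.
        self._seen = {k: v for k, v in self._seen.items()
                      if now - v[1] < KEEP_SECONDS}
        key = " ".join(text.lower().split())[:PREFIX_LEN]
        previous = self._seen.get(key)
        if previous is None:
            self._seen[key] = (article, now)
            return False
        return previous[0] != article
```

The check deliberately ignores the poster's IP address, since the same text often arrives from different addresses.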

External tests

Akismet is a popular comment checker. I have found it to be less effective than my own tests, but it can't hurt to have another opinion, can it?

Captcha tests: See Creating a blog... with anti-spam measures for an example of incorporating Captcha. As this technology is annoying, you might consider adding it as a confirmation only when some other conditions have raised suspicion. Similar tests involve solving simple math or providing answers to obvious questions. Spammers are in a hurry; they are usually using bots and if not they just don't want to spend time thinking, however small that time is.

Legitimate Commenters' Habits

There are a few things that spammers don't usually do. Legitimate comments almost always use punctuation, so the presence of punctuation could decrease a spam count. Legitimate posters sometimes quote text from the article or from a previous comment; spammers almost never do that, so again we might decrease a spam score if we see this.
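A sketch of how those two "good habits" might lower a score, in Python. Everything specific here is an assumption of mine: the function name, the point values (1 for punctuation, 2 for a quote) and the rule that five consecutive words copied from the article count as a quote. Links are stripped first so that the dots and colons inside a URL don't pass for punctuation.

```python
import re

URL = re.compile(r"https?://\S+", re.IGNORECASE)

def legit_discount(comment, article_text, quote_words=5):
    """Return zero or a negative number of points for legitimate-looking habits."""
    discount = 0
    prose = URL.sub(" ", comment)          # ignore punctuation inside links
    if re.search(r"[.,;!?]", prose):
        discount -= 1
    article = " ".join(article_text.split())
    words = prose.split()
    # Look for any run of quote_words consecutive words taken from the article.
    for i in range(len(words) - quote_words + 1):
        if " ".join(words[i:i + quote_words]) in article:
            discount -= 2
            break
    return discount
```

The same function could be pointed at earlier comments instead of the article text to catch people quoting each other.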

Conclusion

We can control spam comments. Not all of it, but we can limit the amount that has to be checked manually and we don't need to annoy our visitors with required registration or automatic "human reader" testing.

I'd love to hear your thoughts, comments and ideas.

/Web/detecting-comment-spam3.html copyright and reprint notice

Comments /Web/detecting-comment-spam3.html

Thu Dec 31 00:24:49 2009 Web Comment Spam DaleReagan

http://web-tech.ga-usa.com/

Greetings,

As usual some good info. :)

I would suggest adding web log checks for:

a) referral links (where is the BSpammer coming from) and/or
b) search links (how is the BSpammer finding you)

I am seeing patterns which leads me to create new rules Using Apache + Mod_Security (a WAF - web application firewall.) I also do some auto-scanning of web server logs to auto-create IP bans for bots... I then aggregate abuses and ban entire subnets based on my very low threshold for any type of SPAM. I am sure that Bot-abuse will vary depending upon the size of a given web site (as well as geo-location) and IP-banning may not be a good solution for all but it works well for me since I do not typically have business concerns for the locations from which Bot/Spammers seem to be operating from (i.e. currently mostly from Europe and Asia.)

This AM I had 43 'comment posts' (Blog SPAM, of course) that managed to get by the CAPTCHA tool - the IPs are now banned. My guess is that bots reported the CAPTCHA and a human returned to create the SPAM.

There are also additional benefits from using Mod_Security - worth exploring for anyone interested in keeping a web site online and relatively secure.

Of course, your mileage should vary. :)

Dale

Thu Dec 31 01:49:17 2009 TonyLawrence

I hadn't thought of looking back for referral/search - thanks!


Detecting Comment Spam, Part 2


2009/12/23

This is a continuation of Detecting Comment Spam, Part 1

In part 1, I talked about code to read a list of spammish words from a file and look for those words in comment posts. Commenters pointed out that spammers will obfuscate words with dashes, spaces, bizarre spellings and so on, making it very difficult to catch these programmatically. That's true, but there's more to the story.

The spam list I use here has some of those common spam words, but most of it is taken up with web addresses. Links are much more difficult for spammers to mangle - they can use redirection at the destination site, but the site itself is static: if a spammer wants you to visit some page at Iamaspammer.com, either that name or its IP must be in a link. Most of my spam list is websites that have been the destination for spammers' links. Once the site is in my list, the spammer can never post anything with that link in it - no matter how they mung other words, they will never be allowed to post. A comment here that contains one of those links doesn't even go to moderation - it just gets flatly rejected.

Spammers do move on - jklljas.blogspot.com may be a spam link now, but it will get abandoned eventually. I trim the list every month to remove old entries.
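In outline, the link check and the periodic trimming could look like the Python below. Treat it as a sketch under my own assumptions: the spam list is held as a dictionary of site name to the date it was added, the function names are invented, and "old" is taken to mean more than 30 days.

```python
from datetime import date, timedelta

def spam_links_in(comment, spam_list):
    """Return the spam-list entries that appear anywhere in the comment."""
    text = comment.lower()
    return [entry for entry in spam_list if entry.lower() in text]

def trim(spam_list, today, max_age_days=30):
    """Return a copy of the list without entries older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return {entry: added for entry, added in spam_list.items()
            if added >= cutoff}
```

Any non-empty result from spam_links_in means flat rejection. A plain substring match is used on purpose: it doesn't matter how the rest of the comment is disguised as long as the site name itself has to appear.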

IP Blocking

Shouldn't I just block the spammer's IP from leaving comments? Yes, but spammers' IPs change over time - their IP gets blocked everywhere so they move on. If you block a particular IP forever, it may end up being the IP of a legitimate user, so you probably don't want to do that. There is also the issue that your list of banned IPs could get very large over time.

For some websites, it makes sense to block by country of origin. I don't do that here, but if you only want U.S. visitors, you could certainly do that. See Blocking Unwanted Visitors.

You can do the blocking inside your script or use .htaccess (or your Apache configuration files). I prefer to block inside my comment posting script because if I'm wrong or if the IP has transferred to a non-spammer, I'm only blocking them from commenting, not from any site access. In extreme cases (such as a spammer attempting to use any cgi script it can find or guess) I will add them to the .htaccess file. However, whether in scripts or .htaccess, I don't keep the IP blocked for more than a week at a time.

For me, the need to block by IP is infrequent enough that I don't need to automate the removal of bans, but it isn't difficult to write such code if needed.
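If you did want the bans to lapse by themselves, the bookkeeping is small. A Python sketch, with an in-memory dictionary standing in for whatever storage you use; the class and method names are hypothetical and the one-week limit is the figure mentioned above.

```python
import time

BAN_SECONDS = 7 * 24 * 3600   # a ban lasts at most one week

class BanList:
    """Comment-posting bans that expire without manual cleanup."""

    def __init__(self):
        self._banned_at = {}   # ip -> time the ban started

    def ban(self, ip, now=None):
        self._banned_at[ip] = time.time() if now is None else now

    def is_banned(self, ip, now=None):
        now = time.time() if now is None else now
        started = self._banned_at.get(ip)
        if started is None:
            return False
        if now - started >= BAN_SECONDS:
            del self._banned_at[ip]   # the ban has lapsed; forget it
            return False
        return True
```

Because this lives in the comment script rather than .htaccess, a banned address can still read the site.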

Moderation

Let's talk about moderation for a moment. Some sites moderate all comments, but that's annoying for both the moderator and the people leaving comments. Regular posters shouldn't have to wait for their comments to appear and most web site owners have better things to do than moderate comments all day long.

One solution is to require registration. If a user can provide a login and password that has been previously approved, their comment can be posted immediately. Many sites use that scheme, but there is still a degree of annoyance: the registration process is an extra step that annoys some people.

I do something similar here, but there's no registration process per se. If you have posted here legitimately in the past and are posting again from the same IP address and with the same username (actual username, not "anonymous") your post will not need to be moderated - assuming it passes all the other tests I'll talk about below!
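Reduced to code, that rule is a lookup on the pair of IP address and username. A Python sketch, assuming the previously approved pairs are available as a set; the function name and the sample data are made up.

```python
def needs_moderation(ip, username, approved):
    """approved is a set of (ip, username) pairs from earlier legitimate posts."""
    if username.strip().lower() in ("", "anonymous"):
        return True               # anonymous posts never skip moderation
    return (ip, username) not in approved
```

A returning poster on a new IP address fails the lookup and goes back to moderation once, after which the new pair can be added to the set.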

But as alluded to when talking about destination links above, there's no need to moderate posts that are definitely spam - we just throw those away.

Other spam control

I'm going to talk frankly about all the things I do here to limit spam and what I plan to do in my next version of the commenting software. I suppose there is some small risk to this; I've been reluctant to discuss all of it before because I don't want to help spammers learn better ways to bypass systems like this, but I don't think spammers are likely to read this and if they do, oh well: the war goes on.

Lazy Spammers

Spammers aren't generally going to spend a lot of time and effort on posts. The pros are using scripts, the inept are probably at least automated enough to use cut and paste. Even if they are typing comments in, they probably tend to reuse the same words and links.

Not much text

Some habits of lazy spammers are fairly easy to block. One is a habit of only leaving links with little or no text. That's easy to detect: just count the total words in the post and divide by the number of http links: if the result is too small, the post is probably spam. I do that here now - if you leave a comment that is only a link, it won't get posted and it won't even go to moderation - whether the link is legitimate or not. That might be too draconian, but I think you should at least flag such posts for moderation. I do need to fix my present code to allow known user/ip posters to leave such short comments.
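The words-per-link calculation is only a few lines. Here is a Python sketch with two assumptions of my own: the links themselves are not counted as words, and fewer than 5 words per link is treated as "too small". The names are invented for the example.

```python
import re

LINK = re.compile(r"https?://\S+", re.IGNORECASE)

def words_per_link(comment):
    """Words of ordinary text per link, or None if there are no links."""
    links = LINK.findall(comment)
    if not links:
        return None
    words = LINK.sub(" ", comment).split()
    return len(words) / len(links)

def too_linky(comment, min_words_per_link=5):
    ratio = words_per_link(comment)
    return ratio is not None and ratio < min_words_per_link
```

A comment with no links at all is never flagged by this test, however short it is.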

Nonsense words

When lazy spammers do pad their word count, they often use nonsense words. Again, that's relatively easy to thwart if you have the luxury of time and disk space - look up each word in a dictionary and compare the words not found to the words found - if the ratio is too high, this is probably spam. But that requires a fair amount of work. I take a simpler approach:

If $in is the text to be checked, the Perl expression

$consonants++ while $in =~ /[qwrtpsdfghjklzxcvbnm]{4,}/ig;

counts the number of times four or more consonants appear in a row. That is, it counts garbage like "fghk" or "hdfr" - the kind of junk you'll get from random banging on the keyboard. If that count is high when compared to the number of words in the input, you probably have spam. My code says if the ratio of nonsense to words is over 15% it's spam. This is much faster than checking input against a dictionary and is very effective.

Remember that "jklljas.blogspot.com"? If all he posts is something like "Xanax: http://jklljas.blogspot.com", he's got a 50% nonsense count ("http" and "jklljas" against 4 words) - that's enough to count as spam right there. Not all spammers use nonsense words in their sites, but quite a few do, and many are too lazy to type much more than that.
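For anyone not working in Perl, the same test can be sketched in Python. This is an illustration under my own assumptions rather than the code used here: the function names are made up, and a "word" is taken to be any run of letters, so the exact percentage for a given comment depends on that choice.

```python
import re

# Four or more consonants in a row; "y" is left out, as in the Perl version.
CONSONANT_RUN = re.compile(r"[qwrtpsdfghjklzxcvbnm]{4,}", re.IGNORECASE)
WORD = re.compile(r"[A-Za-z]+")

def nonsense_ratio(text):
    """Consonant runs divided by the number of words in the text."""
    words = WORD.findall(text)
    if not words:
        return 0.0
    return len(CONSONANT_RUN.findall(text)) / len(words)

def looks_like_nonsense(text, limit=0.15):
    return nonsense_ratio(text) > limit
```

With that word-splitting, the Xanax example above has two consonant runs ("http" and "jklljas") against five runs of letters, which is still comfortably over the 15% limit.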

This won't catch all nonsense: we've all seen random phrases from books used as a preface to a link. Nonetheless, this stops SOME spam, and every post we stop is one that doesn't annoy us or our readers.

Direct POSTS

Another thing spammers do is send direct POSTS. That is, their automated software examines your comment form once, picks out the fields it needs to supply, and then submits multiple POST requests. I don't allow that: when the comment form is first loaded, the commenter's IP address is stored in a database. When they actually POST, the IP is checked and immediately removed. If the IP doesn't exist (which it will not after the first POST), the comment is thrown away. The spammer can defeat this by requesting the form before doing each POST, but many of these folks are too lazy to bother.
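The logic amounts to "one form load buys one POST". A minimal Python sketch, with a set in memory standing in for the database and hypothetical class and method names:

```python
class FormLoads:
    """Track which IP addresses have loaded the comment form and not yet posted."""

    def __init__(self):
        self._pending = set()

    def form_loaded(self, ip):
        self._pending.add(ip)

    def accept_post(self, ip):
        """Allow the POST only if this IP loaded the form since its last POST."""
        if ip in self._pending:
            self._pending.remove(ip)   # used up: the next POST needs a fresh load
            return True
        return False
```

A bot that scraped the form once gets its first POST through this particular test and has every later one thrown away.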

Typed too fast

After reading a link suggested in the comments, I realized that there ought to be a minimum thinking/typing time also. So along with the IP, I store the time the form was loaded. When the POST is made, I count the words and divide by 10 - if at least that many seconds haven't elapsed, I won't allow the post. Only a cut and paste spammer can type more than 10 words per second, so that's a fair limit - maybe even less is fair, but a legitimate poster might cut and paste some text.
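As arithmetic, that check looks like this in Python. The function name is invented, and the elapsed time is assumed to be measured by the caller from the stored form-load time.

```python
def typed_too_fast(comment, seconds_since_form_load, words_per_second=10):
    """True if the comment arrived faster than anyone could have typed it."""
    min_seconds = len(comment.split()) / words_per_second
    return seconds_since_form_load < min_seconds
```

A 100-word comment therefore needs at least 10 seconds between form load and POST; lowering words_per_second makes the test stricter.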

Excessive posting

Some spammers are greedy. They aren't content with putting one piece of graffiti on your site; they want to leave many spam posts. I use a timing algorithm to control that. It's simple enough: for each post from your IP, a counter is incremented. I use that counter to determine how long before you are allowed to post again. You can't make your second post until 15 seconds after your first, your third until 120 seconds after that, your fourth until 405 seconds and so on - it's the number of posts cubed times 15 seconds. This very effectively stops greedy spammers - they don't hang around. In the current version, these limits affect everyone (even me!) but I want to make them less restrictive (maybe posts cubed times 5 seconds) for known users in the next version.
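The delay formula is easy to get wrong by one, so here it is spelled out in Python. The function names are mine; the multiplier is left as a parameter so that the gentler 5-second version for known users is the same function with a different base.

```python
def seconds_until_next_post(posts_so_far, base=15):
    """Required wait after the most recent post: posts cubed times base seconds."""
    return posts_so_far ** 3 * base

def may_post(posts_so_far, seconds_since_last_post, base=15):
    return seconds_since_last_post >= seconds_until_next_post(posts_so_far, base)
```

With the default base the waits after the first three posts are 15, 120 and 405 seconds, matching the figures above; with a base of 5 they drop to 5, 40 and 135.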

The war is never over

So that's how I control spam comments here. Most obvious spam goes right to a black hole, users I know and trust get posted immediately, and everything else goes to moderation. I'm going to add an Akismet check in the next version - it never hurts to have a second opinion!

Instead of moderation, you could also throw up Captcha or arithmetic challenges, or require a random series of other answers or actions when a post is suspect. A spammer (especially an automated spammer) probably won't respond to even a simple "Click here to confirm your post" - I may add something like that to my next version. That's a minor annoyance for a first time poster or someone who insists upon using "anonymous", but it will stop most spammers dead.

Of course it is impossible to stop all spam. No matter what we do, we'll at least have to moderate a post now and then. However, with the spam controls I have here, I very seldom see spam - and if I do see it, I'm unlikely to see it again, because I'll adjust my code as necessary. I do have to moderate new visitors and frequent posters with new IP addresses, but that's not very onerous.

The war on spam never ends, but we can win most of the battles.

See Detecting Comment Spam, Part 3 for the continuation of this series.

/Web/detecting-comment-spam2.html copyright and reprint notice

Comments /Web/detecting-comment-spam2.html

Wed Dec 23 20:37:03 2009 MikeHostetler

http://squarepegsystems.com

I think you have a good plan here -- it's one thing for the spammers to know what you are doing to stop them, it's another for them to find a hole in your system. One thing about a good plan is that it has to be adaptable -- you may lose a battle here and there, but if you learn from your mistakes and tweak here and there, you will win many more than you will lose.

It would be interesting if you had a "minority report" bucket for comments -- ones that your system and Akismet have different opinions on. Maybe you can review them manually for a while -- perhaps you will learn a few things. Or maybe you will find out that you have a better system than they do!

Another interesting idea is to see if you can hook up SpamAssassin to help as well. I can't imagine the rules for email spam being tons different from the rules for blog spam. Or maybe they are. Some notes have been written on it, but I think that's about it:
http://wiki.apache.org/spamassassin/BlogSpamAssassin

Wed Dec 23 20:41:46 2009 TonyLawrence

You'll be amused by this, Mike: Akismet says your comment is spam :-)

Obviously I disagree. I'll take a look at that SpamAssassin link, thanks.

Wed Dec 23 20:51:38 2009 TonyLawrence

A comment at that SpamAssassin link gave me another idea:

When the form is loaded, I store the time against the user's IP. When they hit POST, it's allowed if the time lapse is less than 2 hours (that's a pretty generous time limit to compose your post). The time is reset to zero upon posting, so a new POST without a form load won't work.
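A rough sketch of that bookkeeping, in Python for illustration. The in-memory dictionary and the function names are assumptions; a real comment script would keep the load times in a file or database between requests.

```python
MAX_COMPOSE_SECONDS = 2 * 60 * 60  # the two hour window described above

form_loaded = {}  # hypothetical store: IP address -> time the form was loaded


def record_form_load(ip, now):
    """Remember when this IP last loaded the comment form."""
    form_loaded[ip] = now


def accept_post(ip, now):
    """Allow a POST only if this IP loaded the form within the window."""
    # pop() clears the entry, so a second POST without a fresh form load fails
    loaded_at = form_loaded.pop(ip, None)
    return loaded_at is not None and (now - loaded_at) < MAX_COMPOSE_SECONDS
```

Removing the entry on each POST plays the same role as resetting the stored time to zero: a script that replays the POST without reloading the form has nothing to match against.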

The SpamAssassin link suggested that a minimum time makes sense too - a real user presumably spends at least a few seconds composing a message. That makes sense - I'll be adding something to do that.


Wed Dec 23 20:57:51 2009 TonyLawrence

Yeah, that makes sense: count the words, divide by 10 (pretty fast typing). If it hasn't been that many seconds between loading and posting, disallow.

Wed Dec 23 21:49:22 2009 MikeHostetler

http://squarepegsystems.com

That's hilarious that they marked me as spam. Probably because I used a link. See -- evidence that you need a multi-prong attack!

Thu Dec 24 01:51:57 2009 TonyLawrence

I don't think it was the link - I tried running it again to see why, and it didn't hit that time. We'll see what happens over time.

Thu Dec 24 02:04:17 2009 TonyLawrence

Oops - I see part of the problem. I introduced a bug in my code so that Akismet got called twice. It was the first one that failed - but why would two calls with the same data return different results?

Thu Dec 24 02:06:22 2009 pcunixtony

Just testing with a different name to see the bug.

Thu Dec 24 03:12:01 2009 TonyLawrence

I found the problem. As I mentioned before, the old server sent cgi href's as GET's - this one always uses POST. My comment code was still expecting a GET on the first load - which led me to check Akismet twice, the first time without any data.

Fixed now (I hope).

Thu Dec 24 13:27:41 2009 TonyLawrence

Just as a general point of interest:

Last night this code caught 4 comments, three of which were pure spam and thrown away and one which I had to moderate (and subsequently toss because it was spam). Akismet didn't think that one was spam, and I don't send the pure junk to them for checking.

I don't track attempted comments thrown away because of posting too quickly but I can see in the logs that spammers do get caught by that.


Thu Dec 24 14:02:00 2009 TonyLawrence

Akismet just caught one. My consonant filter also caught it and it was tagged again for links to words ratio, so there wasn't much hope for that one to get through :-)

Fri Dec 25 02:58:24 2009 Michiel

Interesting, I'm going to follow these articles. It gives me some good ideas. In the meantime I have some stuff done, and reCAPTCHA is next. See http://www.michielovertoom.com/simpleblog/log.html for a short story on my proceedings.

Tue Dec 29 02:54:38 2009 Michiel

Yesterday I implemented reCAPTCHA in my prototype blog software. It was easier than expected: all you have to do is to sign up and receive some keys in the mail, then include a little bit of code in your script. Basically two functions do all the work, and they are provided in a ready-to-include script file.

I followed the suggestion made earlier to remember the username and IP address as a combination, and only require captcha validation once.

I also updated the webpage in which I describe the steps I took. See http://michielovertoom.com/simpleblog/log.html .

I'm looking forward to implementing some more antispam ideas!


Fri Feb 5 14:00:49 2010 TonyLawrence

I tried Akismet for a month or so. I found that it seldom helped me - that is, if it said something was spam, my code usually already knew that, and my code caught spam that it missed.

There were a few instances where it saw spam that I didn't. These were so few that I didn't feel it was worth paying the monthly fee (I have far too much traffic to qualify for the free version).

It's a good tool - if you can't write your own code, I would definitely recommend it.


Detecting Comment Spam, Part 1

2009/12/21

Suppose you were writing a commenting system for a website and you wanted to check user input against a list of words that might indicate spam. You'd want the list of suspicious words in a file and you'd run through that list. An easy way to do that in Perl is to use Perl's "grep" function.

We'll start with a program that won't work. This will show a side effect of grep you need to be aware of:

#!/usr/bin/perl
# Build the "comment"
while (<>) {
    push @TST, $_;
}
# Now test against our list
open(SPAM,"spamlist");
while (<SPAM>) {
    chomp;
    if (grep /$_/, @TST ) {
        print "found $_\n";
    }
}

You need this code and a "spamlist" file. You'd put the words you want to match in that file. For the purposes of this article, I'll assume that "fribble" is NOT in the list. Therefore, if you run this little script and type "fribble" and press Enter and then Ctrl-D, you'd expect no response - "fribble" isn't in the list of spam words.

But that's not what happens. When you run the program and type "fribble", it seems that "fribble" (or any other input) matches every word in the list. That can't be right, can it?

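The problem is that "grep" takes over $_ in your loop: while it runs, it sets $_ to each element of @TST in turn, so /$_/ ends up testing each comment line against itself rather than against the spam word. That's simple to fix; we set a temporary variable: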

#!/usr/bin/perl
# Build the "comment"
while (<>) {
    push @TST, $_;
}
# Now test against our list
open(SPAM,"spamlist");
while (<SPAM>) {
    chomp;
    # Save the spam word; grep will reuse $_ for each line of @TST
    $testing=$_;
    if (grep /$testing/, @TST ) {
        print "found $testing\n";
    }
}

That's a bit better. Running it with "fribble" produces no output, but if you give it something in your list, it finds it. Great!

Not quite. Let's say you had "ambien" in your list because you want to stop common pharmaceutical spam. If you type "ambien" when running the program, yes, it finds it, but it will also find "ambient". That's not good - how do we fix that?

Well, we want "ambien" only when it's a word by itself. Your first thought might be to use a space or "\s":

if (grep / $testing /, @TST ) ...
if (grep /\s$testing\s/, @TST )

But that fails if "ambien" is at the beginning of a line in @TST. It works if "ambien" is at the end of a line because we never chomped the input, and \s matches the trailing newline as "space" in addition to real spaces, tabs and formfeeds.

OK, we could do this:

if (grep /\s$testing\s/, @TST or grep /\s$testing$/, @TST or grep /^$testing\s/, @TST )

Fortunately, we don't need to. Perl has a better way:

if (grep /\b$testing\b/, @TST)

That "\b" is for "word boundary" and it does exactly what we want: it matches "ambien" anywhere in a line, but only when it stands as a word by itself. Note that this same syntax works with command line grep, but "grep "\<word\>" files" only works with command line grep, not Perl.
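For readers who don't use Perl, the same word boundary test looks like this in Python. It is a sketch only: the function name is invented, and it calls re.escape on each list entry so that the entries are matched literally, which differs from the Perl above where each entry is used as a pattern.

```python
import re


def find_spam_words(comment_lines, spam_words):
    """Return the listed words that occur as whole words in any comment line."""
    found = []
    for word in spam_words:
        # \b on both sides means "ambien" will not match inside "ambient"
        pattern = re.compile(r"\b" + re.escape(word) + r"\b")
        if any(pattern.search(line) for line in comment_lines):
            found.append(word)
    return found
```

Given the list ["ambien"], the line "ambient lighting" produces no match, while "ambien for sale" and "buy ambien" both do, whether the word is at the start, middle or end of the line.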

So, finding "spam" words isn't too hard. The next question is what to do about them when you find them. A few thoughts come to mind:

  • Refuse the comment flatly.
  • Refuse the comment but tell the user what word(s) triggered the rejection.
  • Remove the offending words from the comment
  • Increment a counter for each spam indicator found; refuse if the count exceeds some value (this is how SpamAssassin works).
  • Set the comment to require administrative approval before posting

None of these are ideal. In our next post, I'll dig into that a bit more deeply.
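To make the counter idea from the list above concrete, here is a small sketch in Python. The weights, the two thresholds and the function names are all invented for illustration; this is not SpamAssassin's actual scoring, only the general shape of counting indicators and acting on the total.

```python
import re


def spam_score(comment, weighted_words):
    """Add up the weight of every listed word that appears as a whole word."""
    score = 0
    for word, weight in weighted_words.items():
        if re.search(r"\b" + re.escape(word) + r"\b", comment):
            score += weight
    return score


def verdict(comment, weighted_words, moderate_at=1, refuse_at=3):
    """Map the score to one of the actions from the list above."""
    score = spam_score(comment, weighted_words)
    if score >= refuse_at:
        return "refuse"
    if score >= moderate_at:
        return "moderate"
    return "accept"
```

With weights of 2 for "ambien" and "viagra" and 1 for "cheap", a comment containing all three is refused, "cheap flights" is only held for approval, and "ambient music" is accepted. Combining the counter with moderation this way softens the cost of a false match.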

/Web/detecting-comment-spam.html copyright and reprint notice

Comments /Web/detecting-comment-spam.html

Mon Dec 21 21:18:25 2009 MikeHostetler

http://squarepegsystems.com

I've actually used Akismet in the past -- if you are a small user, it's free and it works well. It uses the same ideas that Tony is going to talk about, but it uses it for a much wider audience. More people to count means (ideally) better pattern detection.

http://akismet.com/

Mon Dec 21 21:31:23 2009 TonyLawrence

I see a lot of people use Akismet. I still prefer to roll my own - for many reasons.

Mon Dec 21 21:44:45 2009 TonyLawrence

I do notice that Akismet has a Perl module on CPAN:

http://search.cpan.org/~nikolay/Net-Akismet/lib/Net/Akismet.pm

With that, you could "roll your own" while still taking advantage of Akismet.

I need to rewrite my comments code; I might just do that... no harm in having another opinion, after all.

Tue Dec 22 11:33:08 2009 Michiel

I'm experimenting with anti-spam measures too, and I think this is what I will use: When my blog detects that a contribution is made from an unknown IP address, it'll present a 'ReCAPTCHA' (See http://recaptcha.net/ ) for verification. I'm not sure scanning for spam words will work as well, since spammers always find new ways to hide their message from automatic scanners.

Tue Dec 22 11:45:02 2009 TonyLawrence

I'm not sure scanning for spam words will work as well, since spammers always find new ways to hide their message from automatic scanners.

I'll be talking about this more in the next part, but "words" aren't just things like pharmaceutical names. One thing spammers can't obfuscate is their link destination - the whole point is to post a link. The destination is part of the spam list.

Tue Dec 22 12:55:34 2009 Ralph

http://linuxcoaching.eu

In recent times there have been blog comments that only consist of legitimate words, so that the spam checker would not find any cause for rejection. But I regard these comments as spam too, because they are totally unrelated to the blog posting and their only purpose is to place a link to a suspect page. These comments tend to be overwhelmingly "positive" (and slimey) but they're nevertheless junk and clutter the blog with nonsensical praise.

I fear, we cannot fight spammers with software.

Tue Dec 22 13:08:19 2009 TonyLawrence

their only purpose is to place a link to a suspect page

Exactly. That's why the page(s) go into my spam list.

we cannot fight spammers with software

We can. But like any war, we'll lose a few battles. We can't stop EVERY spam post with software, but we can stop most of them.


Tue Dec 22 13:43:58 2009 MikeHostetler

http://squarepegsystems.com

Michiel said, "When my blog detects that a contribution is made from an unknown IP address, it'll present a 'ReCAPTCHA' (See http://recaptcha.net/ ) for verification"

The problem with ReCAPTCHA is that it annoys legitimate users. But your use of only doing it for unknown IP addresses will help a little bit. But what if a legitimate user and a spammer are both behind the same firewall? Not to mention that sometimes figuring out what the Captcha is can be difficult (though ReCAPTCHA is better than most).

Tony said, "I need to rewrite my comments code; I might just do that... no harm in having another opinion, after all. "

I used Python's Akismet library to do some integration and it was very easy. You're right -- it's never bad to have another opinion.

Tue Dec 22 14:23:41 2009 TonyLawrence

But what if a legitimate user and a spammer are both behind the same firewall?

Perhaps similar to what I do to decide if the comment can be published immediately or needs moderation. I key on IP plus username (anonymous is always moderated). A spammer could only guess at usernames that would match a legitimate user's IP.

So Michiel could check both the IP and the username before throwing up a captcha - if the user has had that IP before, no captcha.
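Keying on the pair rather than on the IP alone might look like the following. This is a Python sketch under assumptions: the set of known pairs, the example address and the function name are all hypothetical.

```python
# Hypothetical store of (IP, username) pairs that have posted successfully before
known_pairs = {("203.0.113.7", "Michiel")}


def needs_challenge(ip, username):
    """Decide whether to show a captcha (or moderate) for this poster."""
    if username.lower() == "anonymous":
        return True  # anonymous is always challenged
    return (ip, username) not in known_pairs
```

A spammer behind the same firewall shares the IP but would still have to guess a username that has posted from it, so the shared address alone gets them nothing.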



Tue Dec 22 14:40:43 2009 TonyLawrence

By the way, I want to wish all who are reading today a happy holiday.

We're pretty busy this week getting ready to have family in over the weekend, so I don't plan on Part 2 of this until next week. In that post, I want to discuss more about some of the ideas raised here in the comments: the ideals and realities of controlling spam comments, possible methods, pros and cons and so on.



Tue Dec 22 15:52:00 2009 anonymous

Tony replied:

their only purpose is to place a link to a suspect page
Exactly. That's why the page(s) go into my spam list.

But if you want to build such a list in these cases you have to examine the link "by hand" to find out if it is suspect or not, I cannot imagine any software to do that. I've decided to delete an unrelated comment based on the lack of information in it without looking at the link.

Tue Dec 22 15:52:19 2009 Ralph

linuxcoaching.eu

Tony replied:

their only purpose is to place a link to a suspect page
Exactly. That's why the page(s) go into my spam list.

But if you want to build such a list in these cases you have to examine the link "by hand" to find out if it is suspect or not, I cannot imagine any software to do that. I've decided to delete an unrelated comment based on the lack of information in it without looking at the link.

Tue Dec 22 16:09:25 2009 TonyLawrence

But if you want to build such a list in these cases you have to examine the link "by hand" to find out if it is suspect or not

Not entirely. I'll be talking more about that in the next post.

The war on spam is never a matter of using one tool. It's a combination of approaches that will give you good control.

Tue Dec 22 17:25:00 2009 BigDumbDInosaur

http://bcstechnology.net

If qualifying words in text against a list of verboten words (e.g., Viagra) is to be a primary anti-spam tool, I'd be inclined to keep the verboten word list sorted and use a binary search on it. A binary search for a single object in an ordered list executes in, worst-case, O(logN) iterations (where N is the number of words in the list), whereas grep's linear search will always require N iterations to determine that a suspect word is not verboten. This aspect of grep could represent a significant amount of processing time on a busy site, especially with long-winded posts.

Also, it might be useful to develop some sort of mechanism that could automate the adding of bad words to the verboten list. Phrases might be good as well, as oftentimes phrases are spam whereas individual words used in a phrase may be benign.

Tue Dec 22 21:55:35 2009 TonyLawrence

If you are searching a large list, certainly. Here, it's less than 200 lines and 2K. But more importantly - we're not searching the list, we are searching the post for words in the list. You'd need to invert the search (take each word in the post and search in the list). I doubt this would be faster for the typical data sets, but it would be interesting to try some tests.
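For anyone who wants to run those tests, the inverted search can be sketched as below. It is Python for illustration, it uses a hash set rather than the binary search suggested above, and the function name is made up. It only handles single word entries; phrases and link destinations in the list would still need a pattern search over the post.

```python
import re


def listed_words_in_post(post, spam_words):
    """Split the post into words once, then look each one up in the spam list."""
    spam_set = set(spam_words)          # lookups in a set do not scan the list
    tokens = re.findall(r"\w+", post)   # every run of word characters in the post
    return sorted(set(tokens) & spam_set)
```

Because whole tokens are compared, "ambient" never matches "ambien" here, so the word boundary problem goes away as a side effect.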


Unix and Linux startup scripts, Part 3


2009/12/14

This is a continuation of Unix and Linux startup scripts, Part 2

So, after being rudely interrupted by a server crash and a few days of website migration, we're ready to continue exploring Unix and Linux startup scripts. We looked at both System V and BSD methods; until fairly recently that would have been the end of the discussion: if you were running Unix/Linux, your system used one or the other of these. Not everyone was satisfied, though.

You can see the hints of unhappiness in the script directives that crept in to BSD startup scripts. More fine grained control was needed and neither System V inittab nor the BSD rc scripts provided enough.

Why replace init?

One reason is boot speed. Even when scripts can be run in parallel as in SCO's prc_sync "P" scripts, running mostly serial shell scripts takes time; you and I and everyone else want our systems up NOW. But it's not just that.

If you step back and look at init objectively, it is just a program starter. That was certainly true with the original BSD /etc/rc and inittab only added a bit more complexity. Init is a "starter", but so are inetd/xinetd, at and cron. Why shouldn't init be able to take on some of that work?

There are "event driven" tasks. Today, there are many kinds of hardware we expect to be able to just plug into a running system and be recognized. Usually some program needs to be started up when that happens. Sometimes you want something special to happen when a new file appears in a certain directory, or when some other task finishes or fails - life can be complicated and init, (x)inetd and cron/at don't always match our needs.

Think of some of the things you might want to control for any given script or process:

  • Persistency - should this process be restarted when it ends? Does it matter whether it ends normally or abnormally?

  • Runlevel - should this run in single user mode? In the default runlevel?

  • Privilege - what id should this run in?

  • Dependencies - what other tasks must be running or not running?

  • Security - should this be run in a chroot jail? Should it be sandboxed in other ways (no network access, for example)? Should only certain users, remote or local, be allowed to use this service? Start it? Stop it?

  • Resource limits - how many concurrent instances of this process should run? How many times can it fork? How many clients can it have? How much disk space, ram, and so on?

  • Timing - should this be running after working hours?

  • Logging - did the task run and if not, why not?

  • Grouping - there may be several tasks to which we would like to apply identical constraints or privileges

It's not that you can't have all of that; it's that some of it can be done by inittab, some by (x)inetd, some by cron/at, some by device drivers and some you have to do inside your script or program. It's a hodgepodge of stuff, and that's why there are attempts to replace it all with something more powerful and complete. Take the man pages for inittab, cron, at, xinetd, udev and maybe a few others and glom them all together: THAT'S the ideal.

Implementing all that is something else entirely.

Aside from very real dependencies from programs that expect certain directories, certain processes and certain system calls, you also have inertia, developer resistance to seeing their projects replaced, user/administrator/programmer resistance to learning anything new and of course inevitable arguments about how, how much, and when. Given all that, it's surprising that much progress has been made at all. But it has. Fedora and Ubuntu now use Upstart. Mac OS X has launchd. The path to get there hasn't always been smooth - launchd was implemented in stages, first replacing cron, then xinetd, and the rc scripts only disappeared in Snow Leopard. Ubuntu 9.10 has replaced init with Upstart but rc scripts still exist. xinetd is replaced, but not cron.

Be aware of this when reading Internet articles. For example, Mac OS X posts on launchd may not have been updated to reflect its current state on Snow Leopard or beyond. Even the Ubuntu Upstart FAQ has a section stating that Upstart probably won't replace xinetd - but it did.

You also need to be careful about the things you think you know. You might add something to /etc/rc.common on Snow Leopard, but it won't be run. On Ubuntu 9.10, scripts are still in /etc/init.d, but they get run by Upstart (see /etc/init for the scripts that start them). Fedora still has /etc/inittab, but it is only used to set the default run level - anything else added there will be ignored. There is plenty enough confusion to be had, especially for those of us who work on different systems and release levels.

/Basics/unix-startup-scripts-3.html copyright and reprint notice

Comments /Basics/unix-startup-scripts-3.html
