I recently downloaded a CSV file with the intention of extracting some data to satisfy my curiosity about something. I wrote a little Perl script to slice and dice the data, and that would have been that - except I wanted to know something quickly from the original file, so I did something like "grep whatever 2003.csv".
I got back nothing.
That's odd, I thought. I know that "whatever" is in there. So I fired up vim and did "/whatever" and, sure enough, there it was.
So why couldn't I extract it with grep?
Hmm. Let's try "more". Oops! After warning me that "2003.csv" may be a binary file ("See it anyway?"), "more" showed me a mess.
Well, duh, that's why I couldn't grep the file - the darn thing is UTF-16!
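Here's why grep came up empty, sketched in Python. In UTF-16 each ASCII character takes two bytes, one of which is zero, so the eight bytes of "whatever" never sit side by side in the file; those zero bytes are also why "more" took it for binary. The sample row is made up, and the sketch assumes little-endian UTF-16 with a byte order mark, which is common for files written by Windows programs but not guaranteed.

```python
import codecs

# Hypothetical sample row; "whatever" stands in for the real search term.
row = "2003,whatever,42\r\n"

as_utf8 = row.encode("utf-8")
# Assumption: UTF-16 little-endian with a byte order mark (BOM).
as_utf16 = codecs.BOM_UTF16_LE + row.encode("utf-16-le")

needle = b"whatever"  # the bytes a plain grep is effectively looking for

print(needle in as_utf8)   # True
print(needle in as_utf16)  # False: every letter is followed by a zero byte
print("whatever".encode("utf-16-le"))  # b'w\x00h\x00a\x00t\x00e\x00v\x00e\x00r\x00'
print("whatever".encode("utf-16-le") in as_utf16)  # True
```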
So, what can you do when faced with this situation? You have a few choices. You could ask vim to rewrite it. That's easy:
:set fileencoding=utf-8
:w
Vim can do all sorts of file encoding rewriting; see "Using another encoding" in the Vim docs.
You could use Perl to rewrite the file, though Perl has some funny ideas about what utf8 means, plus some other oddities here and there.
At the Terminal command line, you can use "iconv":
iconv -f utf-16 -t utf-8 2003.csv | grep whatever
That gets old fast, though, so I just converted the file.
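For comparison, here is a rough Python sketch of both approaches: searching the UTF-16 file directly, and converting it once to UTF-8 so plain grep works from then on. The function names and the "2003-utf8.csv" output name are hypothetical. Python's "utf-16" codec reads the byte order mark to decide the byte order and drops it, so the UTF-8 copy comes out without one.

```python
import tempfile
from pathlib import Path


def grep_utf16(path, needle):
    """Return the lines of a UTF-16 file that contain needle.

    Rough counterpart of: iconv -f utf-16 -t utf-8 FILE | grep NEEDLE
    """
    with open(path, encoding="utf-16") as f:
        return [line.rstrip("\n") for line in f if needle in line]


def convert_to_utf8(src, dst):
    """Rewrite a UTF-16 file as UTF-8 so ordinary tools like grep work on it.

    newline="" on both sides turns off newline translation, so CRLF stays CRLF.
    """
    with open(src, encoding="utf-16", newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)


# Demo with a hypothetical stand-in for 2003.csv.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "2003.csv"
    dst = Path(tmp) / "2003-utf8.csv"  # hypothetical output name
    with open(src, "w", encoding="utf-16", newline="") as f:
        f.write("year,item\r\n2003,whatever\r\n2003,other\r\n")

    matches = grep_utf16(src, "whatever")
    convert_to_utf8(src, dst)
    converted = dst.read_bytes()

print(matches)                   # ['2003,whatever']
print(b"whatever" in converted)  # True
```

Converting once costs a second copy of the file, but every later grep, awk or Perl run then works unchanged.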
Wouldn't it have been nice if we had never had 7- or 8-bit encodings?
No such file or directory error message
Had a client on a Red Hat system complain that he was getting an error
from cron. A script in cron.weekly complained about "No such file or
directory", but the file was there - it made no immediate sense.
The error seemed rather definite:
/usr/bin/run-parts: /etc/cron.weekly/procmail-users: No such file or directory
I figured it was going to be a symlink problem or an incorrect shebang
line, but no, everything looked fine, and you'd get the same error
running it from the command line.
I kept looking and looking at this until I noticed that while I was editing it
in vi, a little "[dos]" appeared next to the file name at the bottom of the screen.
Aha. Running "file proc*" confirmed that this had CRLF line endings. But
normally I'd expect to see ^M's in vi; I didn't. That puzzles me a
little, but since the script was just a one-liner, I removed and
recreated it manually and now of course it works.
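The definite-sounding error most likely comes from the carriage return. On Linux the kernel takes the interpreter path from the #! line up to the newline and treats only spaces and tabs as separators, so with CRLF endings it goes looking for a program whose name ends in a carriage return, and exec fails with "No such file or directory" even though the script itself exists. The Python sketch below mimics that parsing and also classifies line endings roughly the way "file" reports them (it is not file's actual algorithm). The /bin/sh shebang and the script body are hypothetical, since the original one-liner isn't shown. As for the missing ^M's: the "[dos]" flag means vim read the file as DOS format, so it treated each CRLF as an ordinary line end and had no stray CRs to display.

```python
def line_ending_style(data: bytes) -> str:
    """Classify line endings as 'dos', 'mac', 'unix', 'mixed' or 'none'.

    The names follow vim's fileformat values: dos = CRLF, mac = CR, unix = LF.
    """
    crlf = data.count(b"\r\n")
    cr = data.count(b"\r") - crlf  # carriage returns not followed by LF
    lf = data.count(b"\n") - crlf  # line feeds not preceded by CR
    kinds = [name for name, n in (("dos", crlf), ("mac", cr), ("unix", lf)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"


def shebang_interpreter(script: bytes) -> bytes:
    """Return the interpreter path from a '#!' line, roughly as Linux parses it.

    Only spaces and tabs count as separators; a carriage return does not,
    so a CRLF line ending leaves it glued to the end of the path.
    """
    if not script.startswith(b"#!"):
        raise ValueError("no shebang line")
    first = script[2:].split(b"\n", 1)[0].strip(b" \t")
    return first.replace(b"\t", b" ").split(b" ", 1)[0]


# Hypothetical one-liner saved with CRLF endings, and the same script fixed.
dos_script = b"#!/bin/sh\r\nexec something\r\n"
unix_script = dos_script.replace(b"\r\n", b"\n")

print(line_ending_style(dos_script))     # dos
print(line_ending_style(unix_script))    # unix
print(shebang_interpreter(dos_script))   # b'/bin/sh\r'
print(shebang_interpreter(unix_script))  # b'/bin/sh'
```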
You can also do:
:set ff=unix
and then write the file; that converts DOS or Mac line endings to Unix.
Of course there's :set ff=dos and :set ff=mac too.
You can be more verbose if you wish:
:set fileformat=unix
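In the same spirit, a minimal Python sketch of the conversion that :set ff=unix followed by a write performs. It is not exactly what vim does: this version blindly turns every CRLF and every remaining lone CR into LF, which suits a script like the one above but would also alter a carriage return that legitimately belongs in the data. The function name is hypothetical.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Turn CRLF (DOS) and lone CR (old Mac) line endings into LF (Unix)."""
    # Order matters: replace CRLF first so its CR isn't turned into a second LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


print(to_unix_line_endings(b"#!/bin/sh\r\necho hi\r\n"))  # b'#!/bin/sh\necho hi\n'
print(to_unix_line_endings(b"line one\rline two\r"))      # b'line one\nline two\n'
```

To fix a file in place you could write Path(p).write_bytes(to_unix_line_endings(Path(p).read_bytes())) with pathlib's Path, where p is the (hypothetical) path to the script; keep a backup first.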
Nowadays, you may run into UTF-8 vs. UTF-16 problems too. See Converting File Encodings.
More Articles by Anthony Lawrence
© 2013-07-31 Anthony Lawrence