Showing posts with label programming. Show all posts
Showing posts with label programming. Show all posts

Friday, April 22, 2011

Finding the Last Occurrence of a Character in an Excel String: A User-Defined Function

I was looking for the last occurrence of a hyphen in a string of text data in a cell in Excel 2003.  I wasn't sure how many hyphens there would be, so I couldn't just do repeated searches for a hyphen from the beginning.  I vaguely recalled that there was a way to do this, but I wasn't sure what it was.  This post describes the process of answering that question.

This, I felt, was probably going to be close to the technique for counting the number of times a letter occurred in a cell.  I had searched for the answer to that one previously.  The answer here did involve the use of the SUBSTITUTE command too.  In that other case, the solution was to count the difference in length of the string before and after substituting nothing (i.e., "") for the occurrence of the character in question.

As pointed out by Aladin Akyurek, that would work here too, if you did the substitution, calculated the length difference, and then searched for the last occurrence, now that you knew how many times the character appeared in the string.   His solution appeared in a single formula, which I disfavored because it was easy to make mistakes in long formulas, hard to understand them, and hard to figure out where the mistake was.  But he did use SUBSTITUTE well, whereas I never used it, so I had something to learn there.

What I found more interesting, though, was his alternate suggestion, involving something else that I never did:  define a function and save it for future use.  I surely should have invested the time to learn Visual Basic, when it replaced the old-style BASIC that I had used for minor programming in the 1980s and somewhat thereafter, but then I guess VB changed again in the 2000s and threw a lot of its users off the trail.  Anyway, his function was one that I could actually understand, from the old BASIC days.  The steps to make it work were:

  • Close all Excel files other than the one you're working on.
  • Go into Tools > Macro > Visual Basic Editor > Insert > Module.
  • Copy and paste this into the window:
Function Reverse(Text As String) As String
Dim i As Integer
Dim StrNew As String
Dim StrOld As String
StrOld = Trim(Text)
For i = 1 To Len(StrOld)
StrNew = Mid(StrOld, i, 1) & StrNew
Next i
Reverse = StrNew
End Function
  • Go to File > Close and return to Microsoft Excel.
  • Use the new REVERSE function by specifying the text to be reversed.  For my present purposes, a good use would be like this:  =FIND("-",REVERSE(A1)).
I gave that a whirl.  It worked.  But some months later, after reinstalling Windows and Excel, it didn't work.  I was not sure what had changed.  I tried going into Tools > Macro > Security and changing the level to Medium; I played with alternatives; but I still got error messages.  As another approach, I installed Morefunc and then went into Tools > Morefunc and did what it required in order to install it, and then tried Morefunc's TEXTREVERSE command.  That didn't work either.  The message I was getting was #NAME, which apparently meant that Excel was not recognizing the function (i.e., TEXTREVERSE, or whatever it was called).  Something seemed to be interfering with the operation of the Visual Basic script.  I tried suspending my antivirus program, but that wasn't the solution.  Ultimately, the solution -- getting Morefunc's TEXTREVERSE to work -- seemed to involve a combination of killing and restarting Excel, embedding Morefunc into the worksheet until it finally said "Update successful," and typing my formula in all lowercase.  There was a new problem, though:  Morefunc's TEXTREVERSE function would crash Excel 2003 when dealing with cells containing more than 127 characters.  I tried again with the REVERSE code shown above.  Now that was working too, and no 127-character limit.  It worked on a cell containing more than 1,000 characters.

Thursday, April 21, 2011

Batch Renaming Text File Memos or Emails Using Their Contents to Create the Filenames

I was using Windows 7.  I had text files that contained old email messages, one message per file.  Each began with four or five lines:  FROM, TO, DATE, and SUBJECT, sometimes preceded by a blank line.  In that sense, they were very similar to memoranda.  At present, these emails had numerical names (e.g., 0045.txt) that said nothing about their contents.  I wanted to rename them to a Date - Time - From - To - Subject format, like this:

2010-04-13 13.45 Email from Joe to Nancy re Dinner.txt
In other words, their top several lines told me everything I needed to name them; I just didn't want to have to do it manually.  This post describes the steps I took to automate that naming process.

My knowledge of programming was more or less limited to DOS-style batch file scripting.  If I had been interested in entering single commands, I would have done that in a DOS box (a/k/a command window), which I could open in Windows 7 (customized) by going to Start > Run > cmd.  I wanted to run the relevant commands automatically, across a number of text files, so I would be saving those commands in a batch file, created in Notepad as a text file (ANSI format), saved as CommandFile.bat, and executed by double-clicking on it in Windows Explorer.  The question here was, what commands should go into that file?

Instead of creating a batch file, I could have used some other approach.  For instance, it seemed possible that IntelliGet or TextCrawler would do the job better, provided I were interested in spending the time to learn how to use them.  I did actually try with TextCrawler, but at this writing I had a question sitting in the TextCrawler forum for almost two days without an answer.  Same thing with AutoIT.  I wasn't sure how to proceed with those approaches, so at this point I was deciding to return to the batch file approach.  I had been using batch files on a very simple level for a long time, so I started there.

A search for batch command syntax information led me to the A-Z Index site, which I had used before.  I used that site to interpret what was happening in a two-line batch script that I found in another post:
@ECHO OFF
FOR /F "skip=2 delims=" %%i in (%1) DO >> lines.txt ECHO %%i & GOTO :eof
I've presented the DOS commands in capital letters to make them more visible.  ECHO would tell the computer what to make visible onscreen or in the output file (in this case, lines.txt).  The second line used a "FOR /F ... DO" loop to repeat the command.  (There were other FOR options.  For instance, FOR /D would operate on directories, not files.)  In plain English, the concept there was, "For each X, do Y."  The big idea was, apparently, that the "skip" part would tell it to skip two lines and then echo the third one to the lines.txt file, and then go to :eof (short for "end of file," i.e., end of this script).  In other words, this FOR-DO combination just said, skip a line, skip a line, output a line, then break out of the loop (i.e., quit).  So there would have to be another command that would call this script for each file in the directory.  But before getting to that, I wanted to understand how the "skip" part worked, because that's what I would need to modify for my purposes.  The syntax reference seemed to be saying that "skip=2" was a command to skip two lines, as I had just surmised.  Another source seemed to say that the "delims" part would specify what would count as the unit of analysis.  For instance, if there were a blank space between the equals sign (=) and the quotation mark, we would be looking for a blank space, not the end of a line, as a delimiter.  The loop would restart each time it found a blank space.  Apparently leaving no space between = and " meant that it would default to end-of-line as its delimiter.  I wasn't sure about that.  But what we seemed to be saying, so far, was that we were going to loop each time we encountered a line end, for a total of two loops.  Then, having arrived at the third line, we would proceed with the main attraction:  we would do something with "%%i in (%1)."  Specifically, we would export it, whatever it was, to lines.txt.  What we would be exporting would be %%i.  The percent signs, I knew, indicated a variable.  So we were going to export variable i, in %1, to lines.txt.  I guessed that (%1), being in parentheses, referred to a varible already supplied.  That is, the command that would call this script would indicate the name of a file to examine, and this %1 would represent that filename.  So, to recap:  having arrived at the third line of the file in question, let's export something (specifically, %%i) to lines.txt.  Since there wasn't any more information on what the %i variable (presented as %%i because it was occurring inside a batch file) was supposed to do, it seemed that it must just refer to the whole unit of analysis -- in this case, the whole line -- where we had ended up, having skipped two lines to get here.  In other words, this whole third line of the text file would be exported to lines.txt.

Now, how would all that work for me, where I wanted to output multiple lines?  I thought I might try to identify each of the lines in question with a separate variable, and then concatenate (i.e., combine) those variables in the output.  Basically, if variable L1 stands for line 1, and L2 for line 2, and so forth, then what I wanted was to output L1+L2+L3+L4 to lines.txt (though I preferred to call it output.txt).  (The double output signs, >>, meant "add this onto whatever already exists in the output file, as distinct from >, which would mean "replace whatever is in the output file with this new data.")  I wouldn't need a skip parameter -- I wanted to output all of the first four lines or, actually, the first five, since in some files the first line was a blank.

At this point, it occurred to me that I might have another option.  I made a backup of the folder containing the emails.  Then, at the DOS command prompt, I typed COPY *.* COMBINED.TXT.  This put all of their contents into one large text file.  I opened it and copied and pasted its entire contents into an Excel spreadsheet.  Then I parsed its lines to produce the desired filenames.  This was much more time-consuming than a scripted solution would have been, and it was not what I set out to find, but it did get the job done.