Saturday, July 25, 2009

How to Find a Large Number of Random Files on Your Computer

I had a list of more than 1,600 old files that used to exist on my computer. I wanted to find out if these files were still there and in good condition, or if they had gotten lost or accidentally deleted over the years.

I was using Windows XP. If it had been just a few files, I could have done a search manually, one file at a time. (I use WinXP in classic mode, so for me the sequence would be Start > Search > For Files or Folders.) But with this large number of files, I needed to automate the process. Here's the approach I used.
1. Spreadsheet. Put the list of files in Excel. If you don't have Excel, one free alternative is the spreadsheet in OpenOffice. Sort the list of files alphabetically and view it to see if you have any duplicates. Each duplicate will make your computer repeat an unnecessary search, and these searches can take a long time if you have a lot of files. If you know how, you can use the spreadsheet to filter the list down to unique records. In Excel 2003, that option is in Data > Filter > Advanced Filter. For me, this reduced my starting list of 1,600 file names to just a few hundred.
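If you would rather script the duplicate removal than do it in a spreadsheet, here is a minimal Python sketch of the same idea. It is not what I used; the function name unique_names and the sample list are made up for illustration. It treats names that differ only in capitalization as duplicates, on the assumption that Windows would treat them as the same file.

```python
def unique_names(names):
    """Return the names with duplicates removed, sorted alphabetically.

    Comparison ignores case and surrounding spaces (an assumption:
    Windows file names are normally case-insensitive).
    """
    seen = set()
    result = []
    for name in names:
        cleaned = name.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            result.append(cleaned)  # keep the first spelling encountered
    return sorted(result, key=str.lower)


if __name__ == "__main__":
    # Hypothetical sample list, not the real 1,600 names.
    sample = ["FILE2.DOC", "file1.doc", "FILE1.DOC", "Letter to Joe.doc", " FILE2.DOC "]
    for name in unique_names(sample):
        print(name)
```

The sorted, de-duplicated list can then be pasted back into column G of the spreadsheet.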
2. DOS. Write a DOS command for each file, in the spreadsheet column next to the filename column. The DOS command will search for the file. The command for this purpose is DIR. (Capital letters aren't necessary in DOS commands, but I'll use them here for clarity.) Most DOS commands come with additional options. In this case, I used the /S and /B options. To write the DOS command automatically for all of the files in my list, I figured out the command format in the first row of the spreadsheet and then just copied it down to all the other rows. Using my first row as an example, the command I wanted was DIR "FILE1.DOC" /S /B, with the quotation marks. Those options would (a) search all subdirectories for the file and (b) give me the location of the file on the same line as the filename. My list of files was in column G of my spreadsheet, so on line 2 of column H, under the column title (i.e., in cell H2), I wrote this:
="DIR "&CHAR(34)&G2&CHAR(34)&" /S /B"
with the quotation marks. CHAR() inserts a symbol; 34 is the number of the quotation mark symbol. That is, CHAR(34) allowed me to show a quotation mark in the result, without confusing Excel as to what that quotation mark was supposed to mean. Thus, when I was done entering those characters in cell H2, the cell looked like this:
DIR "FILE1.DOC" /S /B
These characters, if entered as a DOS command, would search for the file named FILE1.DOC and would tell me where it was, if it existed. The quotation marks were useful because some of my filenames contained spaces, such as "Letter to Joe.doc." Without the quotation marks, DOS would stop at the first space, assuming that the name of the file was simply "Letter," and the rest of the filename would confuse DOS and yield a failed search. When I had the formula for the search command I wanted to use in DOS, I copied it down to all rows in the spreadsheet. Thus, for instance, cell H3 now displayed a search for a file called FILE2.DOC, in this form:
DIR "FILE2.DOC" /S /B
3. Redirect. Indicate that you want to save the search results in a file. To do this, I needed to add a redirection sign to my commands. The redirection sign ">>" would create the output file if it didn't exist, and would append the results of the current command to the file if it did already exist. I went back to cell H2 and revised its formula to look like this:
="DIR "&CHAR(34)&G2&CHAR(34)&" /S /B >> "&CHAR(34)&"D:\File List.txt"&CHAR(34)
I needed the additional CHAR(34) items because my output file name, "File List.txt," contained a space. When I entered that revision in cell H2, the resulting command looked like this:
DIR "FILE1.DOC" /S /B >> "D:\File List.txt"
This would list all of the found files in an output file located at the root of my drive D. If I didn't know the exact filename I was searching for, I would use wildcards in my DIR command. For example, a search for FILEX*.* would give me all files whose names began with FILEX. Finally, copy this revised DIR command down to all rows in your filelist spreadsheet.
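For anyone who prefers a script to a spreadsheet formula, the command-building in steps 2 and 3 can be sketched in Python roughly as follows. This is an alternative to the Excel formula, not the method I used. The function name dir_command is hypothetical, and the doubling of percent signs is an added precaution (a literal % in a batch file has to be written as %%), not something my own list required.

```python
def dir_command(filename, output_file=r"D:\File List.txt"):
    """Build one DIR search line that appends its results to output_file.

    Both names are wrapped in quotation marks so that spaces survive.
    Percent signs are doubled because a batch file would otherwise
    treat them as the start of a variable reference.
    """
    safe_name = filename.replace("%", "%%")
    return f'DIR "{safe_name}" /S /B >> "{output_file}"'


if __name__ == "__main__":
    # Hypothetical sample names, including one with spaces.
    for name in ["FILE1.DOC", "FILE2.DOC", "Letter to Joe.doc"]:
        print(dir_command(name))
```

Each printed line has the same shape as the result of the revised formula in cell H2, so the output could be pasted straight into the batch file described in the next step.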
4. Batch. Copy all of these commands into a DOS batch file. To create a DOS batch file, open Windows Notepad and copy and paste from the spreadsheet to Notepad. In my case, the first two lines in Notepad now read as follows:
DIR "FILE1.DOC" /S /B >> "D:\File List.txt"
DIR "FILE2.DOC" /S /B >> "D:\File List.txt"
Save the DOS batch file, but don't close it yet. For simplicity, I saved mine to the same place as the output file (i.e., D:\), and I called it simply BATCH.BAT. The BAT extension is important; it makes the file executable. Without that, you've just got a text file that won't run. Also, in Notepad, be sure to save BATCH.BAT in ANSI format. You will find this option under Notepad's File > Save As > Encoding option.
My batch file was now sitting in D:\, but the place I wanted to search was on drive E. So I needed to add commands on the first two lines of BATCH.BAT. While I was still in Notepad, then, I went to the top of BATCH.BAT and inserted these two lines:
E:
CD \
These commands would tell DOS to go to drive E and then go to the root directory (which is also known, confusingly, as the "top" of the DOS subdirectory tree) before it starts its search. Now it was all set to run its searches across the entire drive E. Of course, I could go back into BATCH.BAT later and change its first line to refer to drive F instead, and then it would run the whole search again on drive F and would append the results to my D:\File List.txt file. (To edit a batch file, right-click on it and select Edit.) To automate the whole thing, of course, you could just repeat the contents of BATCH.BAT in a separate batch file for drives E, F, etc., and run them all at the same time (though you'd better do a global find-and-replace so they aren't all trying to write to File List.txt at the same time); or if you didn't want them all running at once, you could just copy and paste the list of commands again and again in BATCH.BAT, adding a line in front of each set to direct the computer to drive D, then E, then F, etc.
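The one-batch-file-per-drive idea can also be generated by a short script. The following Python sketch is only an illustration under some assumptions: the function name batch_for_drive and the per-drive output names (such as "D:\File List E.txt") are invented here, chosen so that batch files running at the same time do not all write to the same output file.

```python
def batch_for_drive(drive, filenames):
    """Return the text of a batch file that searches one drive.

    The first two lines switch to the drive and go to its root.
    Each remaining line is one DIR search that appends to an output
    file named after the drive (a hypothetical naming scheme).
    """
    output_file = f"D:\\File List {drive}.txt"
    lines = [f"{drive}:", "CD \\"]
    lines += [f'DIR "{name}" /S /B >> "{output_file}"' for name in filenames]
    # Join with carriage return + line feed, the usual Windows line ending.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Hypothetical drives and file names.
    for drive in ["E", "F"]:
        print(batch_for_drive(drive, ["FILE1.DOC", "Letter to Joe.doc"]))
```

To save the text from Python, something like open(path, "w", encoding="cp1252", newline="") should keep the line endings as written; cp1252 is one common "ANSI" code page, so treat that choice as an assumption about your system.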
If you would like to test any lines from BATCH.BAT at any time, you can copy and paste them to a DOS command box. You can open a DOS command box by going to WinXP's Start > Run option and typing CMD into the space (and then hit Enter). You can also test the batch file as a whole by saving some of its lines into another batch file (let's call it X.BAT) and running that.
Note: when you're done with these batch files, delete them or rename them without the BAT extension. For instance, you might rename BATCH.BAT as BATCH.TXT. You don't want executable batch files sitting around, ready to go off and do all kinds of strange things if you accidentally click on them. You can put explanatory comments into batch files, but be sure to precede each comment line with "::" or "REM" (short for "remark"). If you don't "comment out" your textual remarks in a batch file, DOS will try to run them as though they, too, were commands.
5. Run and Wait. Used to be known as "hurry up and wait," but then life sped up. (Kidding.) To run BATCH.BAT, save it with the changes just described, and then go to D:\ in Windows Explorer and double-click on it. The search can easily take hours if you are searching for a lot of files on several large hard drives.
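Once the batch file has finished, the output file can be compared against the original list to see which names turned up and which did not. Here is a minimal Python sketch of that comparison. It assumes the output file holds one full path per line, which is what DIR with /S and /B produces; the function name split_found_missing and the sample data are hypothetical.

```python
import ntpath  # handles Windows-style paths on any operating system


def split_found_missing(wanted, output_lines):
    """Split the wanted names into (found, missing).

    output_lines are lines from the search output file, each expected
    to be a full Windows path. Matching ignores case.
    """
    found_names = {
        ntpath.basename(line.strip()).lower()
        for line in output_lines
        if line.strip()
    }
    found = [name for name in wanted if name.lower() in found_names]
    missing = [name for name in wanted if name.lower() not in found_names]
    return found, missing


if __name__ == "__main__":
    # Hypothetical data standing in for the real list and output file.
    wanted = ["FILE1.DOC", "FILE2.DOC", "Letter to Joe.doc"]
    output_lines = ["E:\\Docs\\Old\\file1.doc\n", "E:\\Mail\\Letter to Joe.doc\n"]
    found, missing = split_found_missing(wanted, output_lines)
    print("Found:", found)
    print("Missing:", missing)
```

If wildcards were used in the searches, the names in the output will not match the search patterns exactly, so this simple comparison would need adjusting.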
6. Problem: Long Directory Names. As I watched the command run, I saw that I was getting an error message. It said, "The directory name [name of folder] is too long." It seemed to be doing this for one folder in particular. When I checked that folder in Windows Explorer, I found that it was buried so deep that I could not even view its contents. I killed the CMD window where BATCH.BAT was running, moved that deeply buried folder to a shallower location, deleted File List.txt (so that it would not contain repetitive information), and then re-ran the batch file. This fix told me that DOS commands in WinXP did not suffer from the much shorter file name limits found in original DOS. (If this hadn't worked, I was considering a revised batch file in which I would first have the system go to each subdirectory and then run its DIR command there. This approach might have involved printing out the directory tree and using the results as input to DIR.)
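One way to spot deeply buried folders before running a long search is to walk the drive and list any paths above a chosen length. The Python sketch below is an illustration, not something I ran at the time. The function name long_paths is hypothetical, and the default limit of 260 characters is an assumption based on the traditional Windows path limit; the limit that actually triggers the error may differ. The walk itself may also silently skip folders it cannot open, so an empty result is not proof that no such folders exist.

```python
import os


def long_paths(root, limit=260):
    """Yield every file or folder path under root that is at least
    limit characters long (limit is an assumed threshold)."""
    for folder, subfolders, files in os.walk(root):
        for name in subfolders + files:
            full_path = os.path.join(folder, name)
            if len(full_path) >= limit:
                yield full_path


if __name__ == "__main__":
    # Scans the current folder as a harmless demonstration.
    # On Windows you might pass a drive root such as "E:\\" instead.
    for path in long_paths(os.getcwd()):
        print(len(path), path)
```

Any folders it reports could then be moved to a shallower location before the batch file is run.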
7. Further Operations. The foregoing steps resulted in a list showing that some of the files I sought did indeed exist on my computer, and showing where they were. That was the purpose of this blog post, so I won't detail additional steps. Briefly, though, in my case I wanted to gather those files in one location, so that I could take a look at them. I created a folder (D:\Gather) for the purpose. I could have copied this list of files into an Excel spreadsheet, and (using the same kind of formula as above) could have developed, for each of these files, a new batch file that used the DOS MOVE command to achieve the move. Each line in that batch file would use basically this format:
MOVE /-Y "D:\Folder\Subfolder\Filename.doc" D:\Gather
The Excel spreadsheet would give me a count of how many files I was expecting to find in the Gather folder, and of course Windows Explorer would show me how many actually made it. Before doing that in Excel, though, I realized it would be easier, with search and replace, to add the necessary ingredients to each line using Microsoft Word. Since not all filenames ended with ".doc," I had to do at least one search and replace involving ^p (Word's code for a paragraph mark, i.e., the end of a line). Note that Word may default to using smart quotation marks, or may otherwise insert characters that won't run as plain text in a batch file, so be sure to do a final check in Notepad. If needed, run one or more search-and-replaces there: copy a sample smart quote from the text into the search box, and type a plain, Notepad-style quotation mark as the replacement. If you want to preserve the error messages (if any) produced by DOS in this process, you may want to run BATCH.BAT from the DOS command line instead of double-clicking on it in Windows Explorer.
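The two chores described here, wrapping each found path in a MOVE command and getting rid of smart quotation marks, can also be sketched in Python. This is offered as an alternative to the Word search-and-replace route, not as what I did. The function names straighten_quotes and move_command are hypothetical.

```python
# Curly quotation marks that word processors tend to insert,
# mapped to the plain ones a batch file needs.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def straighten_quotes(text):
    """Replace smart quotation marks with plain ones."""
    for curly, plain in SMART_QUOTES.items():
        text = text.replace(curly, plain)
    return text


def move_command(path, target=r"D:\Gather"):
    """Build one MOVE line in the format shown above."""
    return f'MOVE /-Y "{path}" {target}'


if __name__ == "__main__":
    # Hypothetical found-file paths, as they might appear in File List.txt.
    for path in [r"D:\Folder\Subfolder\Filename.doc", r"D:\Mail\Letter to Joe.doc"]:
        print(move_command(path))
    print(straighten_quotes("\u201cLetter to Joe.doc\u201d"))
```

Because the script only ever types plain quotation marks, the smart-quote cleanup matters mainly for text that has already passed through Word.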
As always, if you find this helpful, please enter a comment. Cheers!



Correction on that last part: the MOVE command works only on the same drive. When moving from one drive to another, try using XCOPY to copy the files and then DEL to delete them from the source directories. Another possibility is to use the MV command in Linux. You can run Linux, which is free, just by booting from a Linux CD. (Ubuntu is my preferred version.) I have also found Ubuntu's File Browser program (a Windows Explorer look-alike) very useful for file operations that cause Windows to crash, as sometimes happens when I am trying to move a large number of files.
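Yet another possibility, if Python happens to be installed, is to let a script do the gathering. The sketch below is an illustration rather than something I have used for this job. It relies on shutil.move, which falls back to copying and then deleting when a simple rename is not possible, as when the source and destination are on different drives. The function name gather is hypothetical, and skipping files whose names already exist in the target folder is a design choice made here to avoid overwriting anything.

```python
import os
import shutil


def gather(paths, target_dir):
    """Move each file in paths into target_dir and return the new paths.

    A file is skipped if target_dir already holds one with the same
    name, so nothing is overwritten.
    """
    os.makedirs(target_dir, exist_ok=True)
    moved = []
    for path in paths:
        destination = os.path.join(target_dir, os.path.basename(path))
        if os.path.exists(destination):
            continue  # leave the source in place rather than overwrite
        shutil.move(path, destination)
        moved.append(destination)
    return moved
```

Comparing the length of the returned list with the number of paths passed in gives the same kind of count check as comparing the spreadsheet total with what Windows Explorer shows in the Gather folder.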