Sunday, May 27, 2012

Batch Converting Multiple Word DOC Files to PDF in Scattered Folders

I had a large number of .doc files produced by Microsoft Word.  These files were in assorted folders.  I wanted to convert some or all of these files to PDF format.  This post describes the steps I took.

I had already tackled similar problems in several other posts.

This post does not detail all of the steps described in those other posts.  If a step described here is not clear, perhaps one of those posts expresses it more lucidly.

I started by getting a list of the DOC files to be converted.  For this, I opened a command window and typed "DIR /s /b /a-d > doclist.txt" (the final period is not part of the command).  It was OK if this DOC list included files that I did not want to convert:  I could go through the list manually at this point, deleting those that I did not want to convert, or I could do that in the next step.  The next step was to copy and paste the list of files from doclist.txt into Microsoft Excel or some other spreadsheet.  This gave me a list of file and path names that looked like this:
D:\Folder3\Subfolder 8\Filename Z.doc
Since some paths and/or filenames contained spaces, I tended to use quotation marks in commands relating to them, in both Excel and the command window.  In Excel, I used the REVERSE function and other spreadsheet commands to separate the path (e.g., "D:\Folder3\Subfolder 8\") from the filename (e.g., "Filename Z.doc").  So now I had separate columns showing the paths and the filenames for each entry in doclist.txt.  This would be a good point for using formulas to identify groups of DOC files that I did not presently wish to convert to PDF.
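For anyone who would rather script the DIR listing step than run it in a command window, here is a minimal Python sketch.  It is not what I used; the function name list_doc_files is made up for illustration, and unlike the DIR command quoted above it keeps only names ending in .doc.

```python
import os


def list_doc_files(root):
    """Return the full path of every .doc file under root (hypothetical helper).

    Similar in spirit to DIR /s /b /a-d, but limited to names ending in .doc.
    """
    found = []
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            # Case-insensitive match, since Windows filenames may be .DOC or .doc
            if name.lower().endswith(".doc"):
                found.append(os.path.join(folder, name))
    return sorted(found)
```

The returned list could be written to doclist.txt, one path per line, and pasted into the spreadsheet as before.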

The next step in the spreadsheet was to identify the filename without the extension, and to add PDF instead of DOC to that rump filename.  In other words, in this step I went from having Filename Z.doc to having Filename Z.pdf.  This gave me the essential ingredients for the batch commands that I would assemble on each line of the spreadsheet and would then paste into Notepad and save as a .bat file, so as to automate the conversion.
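The same splitting and renaming can be sketched in Python instead of spreadsheet formulas.  This is only an illustration under my own assumptions:  split_doc_path is a hypothetical name, and PureWindowsPath is used so that the backslash paths are parsed the Windows way on any system.

```python
from pathlib import PureWindowsPath


def split_doc_path(full_path):
    """Split one doclist.txt entry into (folder, DOC filename, PDF filename)."""
    p = PureWindowsPath(full_path)
    folder = str(p.parent)
    if not folder.endswith("\\"):
        folder += "\\"  # keep the trailing backslash, as in the spreadsheet column
    return folder, p.name, p.with_suffix(".pdf").name


print(split_doc_path(r"D:\Folder3\Subfolder 8\Filename Z.doc"))
```

Each tuple corresponds to one spreadsheet row:  the path column, the original filename, and the filename with PDF in place of DOC.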

There were two ways to proceed at this point.  One was to leave the DOCs in place, in their home folders, and do the conversion and replacement right there.  I didn't like that approach.  It was too hard to be sure of what had happened in all those scattered folders.  The approach I preferred was to bring all those .DOC files together in one central folder, do the conversion, and then use the spreadsheet to construct batch files that would put those PDFs back where they belonged and, optionally, delete the DOCs from which they had come.

Bringing the DOC files to a central folder could be done very easily with a search program like Everything, searching for *.doc.  It could also be done with batch commands constructed in the spreadsheet.  An Excel formula producing a command of the latter nature would be something like ="move /-y "&char(34)&[cell containing filename including .doc extension]&char(34)&" D:\CentralFolder".  It would be important not to take this step -- that is, not to move the files away from their home folders to the central folder -- until I already had a list of where the files came from originally.  Without that, I'd have a big collection of DOC files and no idea of where they belonged.  Note that files bearing identical names, coming from different folders into one, could require some advance manual renaming to avoid overwriting.  In that case, after renaming but before moving, it would probably be advisable to re-run DIR, so as to get the current filenames.
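A rough Python equivalent of that spreadsheet formula might look like the following.  It is a sketch, not the method I used:  build_move_commands is a hypothetical name, the destination D:\CentralFolder is just the example folder from the formula, and the duplicate check is an addition meant to flag the name collisions mentioned above before anything is moved.

```python
from collections import Counter
from pathlib import PureWindowsPath


def build_move_commands(doc_paths, central=r"D:\CentralFolder"):
    """Return (batch command lines, filenames that occur in more than one folder)."""
    # Windows filenames are case-insensitive, so compare lowercase names
    names = Counter(PureWindowsPath(p).name.lower() for p in doc_paths)
    clashes = sorted(n for n, count in names.items() if count > 1)
    # Quote both paths, since folders and filenames may contain spaces
    commands = [f'move /-y "{p}" "{central}"' for p in doc_paths]
    return commands, clashes
```

If the second list is not empty, those files would need renaming (and the list would need refreshing) before the commands are saved to a .bat file.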

Once the files were all in a central location (in this case, D:\Conversion), it was time to work up the batch conversion process.  For this, first, I set the General and Options tabs in Bullzip (my free PDF printer) so that it would operate without asking questions or opening PDFs, and would save the PDFs to a designated folder (D:\Conversion\PDFs).  Then I saved this command into a batch file that I called Converter.bat:
FOR /F "usebackq delims=" %%g IN (`dir /b "*.doc"`) DO "C:\Program Files (x86)\Microsoft Office\Office11\winword.exe" "%%g" /q /n /mFilePrintDefault /mFileExit && TASKKILL /f /im winword.exe
I saved Converter.bat in the folder containing the DOC files (in this case, D:\Conversion) and ran it.  It worked away for a while, at the speed of one document every few seconds, until it had produced one PDF for each of my DOC files.  Several times during the process, Word or Bullzip stalled with error messages (e.g., "Word cannot start the converter Rftdca32.cnv").  This seemed to result primarily from corrupted Word docs.  There seemed to be little alternative but to delete those files except where I could find a backup.

Now I had a set of DOCs and a set of PDFs.  One easy way to make sure that I had a PDF for each DOC was to view the folders using a Windows Explorer alternative like FreeCommander.  In FreeCommander, I could combine the DOCs and PDFs together, sort by file type, select all DOCs, re-sort by file name, and look for instances in which alternating lines were not regularly highlighted.  (In Windows 7, Windows Explorer had lost the ability to retain highlighting after files were re-sorted.)  At this point or later, one could then just delete all DOCs that did have a corresponding PDF.  DoubleKiller Pro would provide a similar approach.  Another method, more suitable for large numbers of files, was to use the DIR and spreadsheet approach outlined above, writing formulas to check for identical filenames (not counting extensions).  Of course, there was no need to actually delete the DOCs if I wanted to keep both the PDF and the DOC.
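The filename comparison (ignoring extensions) can also be sketched in a few lines of Python.  The function name unmatched_docs is hypothetical, and the sketch assumes that each PDF keeps the base name of its DOC.

```python
from pathlib import PureWindowsPath


def unmatched_docs(doc_names, pdf_names):
    """Return the DOC filenames that have no PDF with the same base name."""
    pdf_stems = {PureWindowsPath(n).stem.lower() for n in pdf_names}
    return sorted(
        n for n in doc_names
        if PureWindowsPath(n).stem.lower() not in pdf_stems
    )


print(unmatched_docs(["A.doc", "B.doc"], ["a.pdf"]))
```

Any name in the result is a DOC that did not convert, or whose PDF was saved under a different name.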

I postponed that step to verify, first, that I would not be needing any of the DOCs anymore.  I had previously worked on ways to check PDFs by converting them to JPGs and seeing which ones converted successfully.  In that previous effort, IrfanView (my preferred tool) had not behaved as expected, so I had grappled with other approaches.  This time, however, the quick IrfanView batch conversion went smoothly.  This gave me a JPG displaying the first page of each PDF.  My decision there was that, in the interests of speed (and to avoid having to go through every page of every PDF),  I was content to look just at the first page.  There could still be errors on later pages of a PDF, but that would be rare.  If the first page came through OK, I could be fairly confident that most docs converted successfully.  So now, using IrfanView, I flipped through those JPGs quickly.

With these steps out of the way -- PDFs checked, superannuated DOCs deleted -- I went back to my Excel spreadsheet and worked up batch commands to move the new PDFs back to where the DOCs had been.  I had changed a couple of names along the way, so I had to move those manually, but the rest went automatically.  Project done!
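The return trip can be scripted along the same lines.  The sketch below assumes the list of original DOC locations was saved before the files were moved, that the PDFs sit in D:\Conversion\PDFs, and that each PDF has the same base name as its DOC; build_return_commands is a made-up name.

```python
from pathlib import PureWindowsPath


def build_return_commands(original_doc_paths, pdf_folder=r"D:\Conversion\PDFs"):
    """Build one move command per PDF, sending it to its DOC's original folder."""
    commands = []
    for doc_path in original_doc_paths:
        doc = PureWindowsPath(doc_path)
        source = PureWindowsPath(pdf_folder) / doc.with_suffix(".pdf").name
        commands.append(f'move /-y "{source}" "{doc.parent}"')
    return commands
```

Files renamed along the way would not match the saved list and would still have to be moved by hand.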

Thursday, May 24, 2012

Videos from the Mid-2000s

The contents of this post have been merged into a new Best Videos Ever post.

Wednesday, May 23, 2012

Windows 7: Considering a RAID Array for Performance

It had been more than a year since I had last looked at adding a RAID array to improve performance.  I could see that my hard drive light was on very often, and the system slowed way down when that happened.  So it seemed advisable to take another shot at RAID.

I had a reasonably good backup arrangement.  As such, I wasn't too worried about the risk of losing data.  My motherboard would accommodate RAID 5 or 10, if I wanted to add more drives, but at this point hard drive supplies were still reeling from the floods in Thailand.  Prices were high on HDDs, and large solid-state devices (SSDs) were still prohibitively expensive.  For at least the near future, I was looking at a two-drive RAID 0.

The main problem with RAID 0 would be the amount of time required to restore my data from backup, and the increased likelihood that I would have to do so (because, in RAID 0, failure of either drive would mean failure of the entire array).  It made sense to make the RAID array as small as possible.  It would need to contain data that I used frequently, but there were a number of other things, less frequently accessed, that I could put onto a third drive that would not be part of the array.  A smaller RAID array would be quicker to restore and would also require smaller, less expensive drives -- which, in turn, would make it more feasible, sometime down the line, to add more drives and switch to a different RAID flavor.

The Moo0 System Monitor was informing me that, at present, my hard drive delays were coming from my existing data drive, not from drive C.  I had not been eager to reinstall Windows onto a new RAIDed drive C.  I was also not too keen on the prospect that Acronis backups and restores involving a RAIDed drive C could be complicated.  So there seemed to be some reason to put my programs partition (drive C) onto that standalone third drive as well.

While I was content with my backup arrangement on a day-to-day basis, I did not presently have an hourly backup.  If I decided to make that third drive a large one, I could use some of its space to store hourly backups.

The plan, then, was for the third drive to contain my drive C partition, my large and infrequently used data items, and perhaps an hourly backup partition.  It could also hold the partition where I stored my Acronis drive images, the partition containing my customized Start Menu and installation-related programs, and the partition that I used for various programs' cache folders and such.  The only partitions on this third drive that would need to be backed up would be drive C, the data partition, and the Start Menu partition.

At this point, there was a pause.  I found myself involved with a large spreadsheet task that was just dragging in its calculations. Those calculations were taking so long that I was forced to take considerable breaks from the project to do other things while Excel would grind away. The problem there was that, when I did come back to the spreadsheet, sometimes it would be several hours later. By that point, my head was no longer tuned in to the project; I was thinking about other things. As a result, I was forgetting and overlooking things that I would have handled much more efficiently if the computer had been able to keep up.

So before installing the RAID array, I upgraded my CPU.  To my surprise, although Moo0 had not seemed to indicate this, the faster CPU contributed noticeably to other tasks as well.  This performance boost took away some of the urgency behind the transition to RAID.

A second hardware upgrade made an even greater difference.  I had found that sometimes I would have to wait quite a while for the screen to keep up when I would switch from one program to another.  I had assumed that the hard drive was the reason.  As I could see from its activity light, it was keeping very busy, and Moo0 was indicating that it was the bottleneck.  Still, it occurred to me that some of the display issues might be due to limits in the motherboard's onboard graphics.  So I installed a 2GB video card.  Suddenly things were *much* better.  It seemed that the hard drive may somehow have been working overtime to compensate for inadequate video.

The speedup wrought by these two upgrades -- the faster CPU and the more capacious video card -- was so substantial as to eliminate, for now, the need for a RAID array.  In that sense, these other hardware upgrades paid for themselves.  I was pleased not to have to spend the time and endure the upheaval that would be involved in getting my data onto a RAID array, as well as the more-than-doubled risk that a single hard drive failure would wipe out all (not just some) of my data.

There were two other considerations favoring my decision to delay the RAID investment until a time of greater need.  First, as I could see, solid state drives (SSDs) were coming down in price.  Second, I was doing some housecleaning, deleting or compressing some materials and putting others onto separate partitions.  It seemed possible that, at some point, an SSD (or, conceivably, an SSD RAID array) containing a smaller set of files would be much more affordable.  Hence, for now, the RAID investment was once again on the back burner.

A Million-Day Calendar with Explicit Julian-Gregorian Comparison

I wanted to look up a historical date.  Specifically, I wanted to know which day of the week it occurred on.  As I was looking for an answer to that question, I gradually came to the impression that there did not exist a standard calendar.  I decided to build one.  This post describes that process.

Someone may already have created what I was looking for.  But I wasn't finding it.  What I was looking for was, simply, the Official Calendar.  Of the United States, of the Catholic Church, it didn't matter -- just an official calendar that some reputable body had actually committed to print (preferably with explanations, and without errors).

What I was finding, instead, was lots of rules about how to calculate an official calendar, as well as various tools that would assist in those calculations.  This was fine, as far as it went.  But we don't generally tell people who prefer the Celsius temperature scale to just use the Fahrenheit and convert it.  Instead, people living in places that use Celsius have thermometers that show them the literal answer, without the need for a manual conversion process.  I wanted something like that for calendar dates.

Ultimately, I created a calendar covering a million days, starting on January 1, 500 BC. I produced that calendar as a spreadsheet, printed it as a PDF, and made both available for download. I don't often revisit this post. As of an update in early 2023, these materials are available for download through my SourceForge project page or at Box.com or MediaFire. See also my download blog post.

The PDF is a 38MB, 10,000-page document.  I would not recommend printing more than necessary.

Assumptions and Calculations Built into the Million-Day Calendar

I chose Excel 2010 to develop the calendar because that version of Excel could accommodate somewhat more than a million rows.  I did not use Excel's built-in date arithmetic, though, because of its known limitations (for example, Excel treats 1900 as a leap year, and its date serial numbers do not extend back before 1900).  That is, I did not ask Excel to calculate the necessary dates automatically.  Instead, I calculated them in a semi-manual process.  The process was not entirely manual, because I did not calculate row by row, day by day, for each of the million days shown.  Instead, I developed formulas that would count forwards or backwards from a certain date, and I applied those formulas to the million rows, usually broken into several segments due to historical changes in calendar calculation.  There were some manual adjustments as well.

I found that a million days would cover approximately the period from January 1, 500 BC to the year 2238 AD.  This seemed like a good range for most purposes. For dates outside this range, there would still be the option of using a formula or calculator, or of adding another tab to extend the spreadsheet.

As shown in the preceding paragraph, I was inclined to use AD and BC to refer to calendar eras.  AD was short for Anno Domini (Latin for "in the year of the [or "our"] Lord"). AD and BC (short for "Before Christ") were thus based on an early medieval calculation of the number of years before or after the birth of Jesus. This religious origin was in addition to other religious origins in the calendar (e.g., "Thursday" deriving from "Thor's Day"). Instead of AD and BC, an apparent minority of non-Christians preferred to use CE (short for "Christian Era" or "Common Era" or "Current Era") and BCE.

Traditional chronology did not incorporate a year zero (i.e., 0 AD or 0 BC).  That is, the calendar went directly from 1 BC to 1 AD.  The original concept may have been that there was no need for a year zero, since Jesus was not born until the start of the first year of his life (incorrectly calculated as 1 AD).  This variation would make no practical difference in the AD era:  for example, the number 2012 represented the year in which this post was written.  It would lead to difficulties in the BC era, however.  For instance, the rule on leap years (involving division by 4) would produce a leap year in the year 4 AD and, before that, in the year 0; but since there was no year 0, the prior leap year was in 1 BC.  Hence, traditional BC dates did not fit exactly with the rule that leap years are evenly divisible by 4.
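The effect of the missing year zero on the divide-by-4 rule can be illustrated with a short Python sketch.  It describes the proleptic rule only, not the leap years actually observed in antiquity, and the function name and the era argument are my own inventions for illustration.

```python
def is_julian_leap(year, era="AD"):
    """Julian leap-year test using traditional numbering (no year zero)."""
    # Shift BC years onto a continuous count: 1 BC -> 0, 2 BC -> -1, 5 BC -> -4
    continuous = year if era == "AD" else 1 - year
    return continuous % 4 == 0


for year, era in [(4, "AD"), (1, "BC"), (4, "BC"), (5, "BC")]:
    print(year, era, is_julian_leap(year, era))
```

Under this rule 4 AD, 1 BC, and 5 BC come out as leap years, while 4 BC does not, which is why BC year numbers do not look evenly divisible by 4.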

The calendar in effect at the time of Jesus was the Julian calendar, introduced by Julius Caesar in 46 BC -- a year which, by decree, was 445 days long.  The Julian calendar was revised several times, finally stabilizing in 4 AD.  For present purposes, the key innovation of the Julian calendar was the decision to define the year as equal to 365.25 days, adjusted via leap years in every year evenly divisible by 4 (e.g., 2008, 2012, 2016).  The Julian calendar eliminated the leap month Mercedonius but did not otherwise significantly change the names or lengths of months.  For purposes of year numbering, epochs (i.e., reference years) in the early centuries of the Julian calendar commonly used regnal systems based on the current ruler or other officials (e.g., "January 1 in the second year of the reign of the Emperor Justinian"), but there was a semi-chaos of other epochs as well.  For instance, the Anno Mundi era started from calculations of the date on which the world was created, and the Ab urbe condita era started from the hypothesized date when Rome was founded.

The big change after the institution of the Julian calendar came in 1582 AD, when Pope Gregory XIII introduced the Gregorian calendar.  The Gregorian reform assumed the use of AD rather than regnal or other epoch systems; the AD epoch concept had been gradually spreading during the Middle Ages.  Gregory's principal contribution was to revise leap year calculations.  Over the centuries, the Julian calendar had become increasingly inaccurate with respect to the actual equinox.  That is, the calendar might say that it was March 21 -- the time for Easter -- and therefore daytime and nighttime should each be about 12 hours long; but in fact, according to the clock, that day would already have arrived more than a week earlier.

In other words, the Julian calendar was falling behind the real world because the calendar was inserting too many leap years.  The extra leap days were making the Julian calendar late:  it would say the date was only March 11, when it really should have been March 21.  Gregory thus removed ten days from the calendar for October 1582, to catch up, and also changed the leap year calculation slightly.  The Gregorian rule for leap years was that every year evenly divisible by 4 would still be a leap year, except that years evenly divisible by 100 would not be leap years unless they were also evenly divisible by 400.  So 1700, 1800, and 1900 would not be leap years, but 1600 and 2000 would be.
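The two leap year rules are easy to compare in code.  The following Python sketch (function names are mine, and it is meant for AD years only) also counts leap years over a 400-year cycle to show the average year length that each rule implies.

```python
from fractions import Fraction


def is_julian_leap(year):
    """Julian rule: every year evenly divisible by 4."""
    return year % 4 == 0


def is_gregorian_leap(year):
    """Gregorian rule: divisible by 4, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def mean_year_length(is_leap, cycle=400):
    """Average calendar year in days over one cycle of AD years."""
    leap_years = sum(1 for y in range(1, cycle + 1) if is_leap(y))
    return 365 + Fraction(leap_years, cycle)


print(float(mean_year_length(is_julian_leap)))     # 365.25
print(float(mean_year_length(is_gregorian_leap)))  # 365.2425
```

The Gregorian rule drops three leap days in every 400 years, which is the whole of the difference between the two averages.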

This adjustment was still not perfect, but because of gradual slowing in the Earth's rotation, it was apparently pretty close.  The slowing issue, which I did not explore, may have been related to the difference between the tropical year and the sidereal year.  The Julian and Gregorian calendars were apparently based on the tropical year, which was the amount of time that it took the Sun (as seen from Earth) to come back to the same place as it was on the previous vernal (spring) equinox.  The sidereal year was an alternative to the tropical year:  it was the amount of time that it took Earth to return to the same relative position as it had occupied a year earlier, as measured with reference to certain stars.

These findings about the Julian and Gregorian calendars called for some decisions, for purposes of constructing a million-day calendar.  One such problem had to do with the present day.  My computer might tell me that it was May 6, 2012.  This would be a date in the Gregorian calendar.  Its appearance on my computer, my wristwatch, and everywhere else would testify to Gregory's widespread success.  I knew, however, that there was also a Chinese New Year and a Jewish calendar and all sorts of other calendars that still had meaning for various cultural and religious purposes, as well as the similarly named but essentially unrelated Julian Year system used in astronomy.  Even the Julian calendar continued to be used in Eastern Orthodox churches.  I decided that the intended spreadsheet approach to the million-day calendar might enable others to add these alternative calendars as they wished.  Because of the size of the spreadsheet and the relative rarity and potential complexity of these other calendars, however, I decided that I would not try to build any of these alternatives into the calendar myself, but would instead focus on the Julian and Gregorian calendars that predominated in the West during the timeframe addressed in the million-day calendar.

Another problem had to do with adoption dates.  The Gregorian adjustment of October 1582 specified that the Julian calendar would end on October 4, 1582; the days of October 5 through October 14 (inclusive) would not exist; and the Gregorian calendar would begin on October 15, 1582.  This rule was adopted at very divergent rates:  immediately, in several Roman Catholic countries, but elsewhere with considerable delays and confusion continuing into the 20th century.  The problem here, then, was that October 5, 1582 did not exist in Spain, and yet someone in England could be staring at a letter dated October 5, 1582, and that would make perfect sense according to the Julian calendar, which would continue to be used in England until 1752 (at which point England would need to delete eleven days, not ten, to get in sync with the Gregorian reform).  During the transition period in England, people commonly used the terms "Old Style" (abbreviated as "O.S." in English, and as "st.v." in Latin) to refer to the Julian date, and "New Style" ("N.S." or "st.n.") to refer to the Gregorian date.

As just described, the Gregorian calendar officially began (and was officially implemented in some places) on October 15, 1582; the Julian calendar officially ended on the preceding day, which (according to the Julian) was October 4, 1582.  But one could also say that October 4, 1582 (Julian) was the same as October 14, 1582 (Gregorian).  This way of looking at the matter would require proleptic (i.e., anachronistic) calculations.  Specifically, there would be a proleptic Gregorian calendar for all dates before October 15, 1582 on the Gregorian calendar, and there would also be a proleptic Julian calendar for all dates before January 1, 4 AD on the Julian calendar.  October 13, 1582 (Gregorian) would be the same as October 3, 1582 (Julian); October 12 (G) would be the same as October 2 (J); and so forth, back in time.

Since the Gregorian calendar did not exist before 1582, the statement that the Battle of Hastings occurred on October 14, 1066 would imply that it was October 14 according to the Julian calendar, not the Gregorian.  While it could be confusing to cite proleptic Gregorian dates for events that were made part of history according to the Julian calendar, there seemed to be some applications for which a proleptic Gregorian calendar could be useful.  For example, someone might be interested in determining whether a certain event happened on the actual equinox, as distinct from the date represented as the equinox in the Julian calendar.  In developing the million-day calendar, I thought it would thus be useful to display Julian and Gregorian dates side-by-side, so as to confirm the accuracy of the calendar and/or of others' conversions between the two, as described more fully below.

To a much greater degree than the proleptic Gregorian calendar, it seemed that the proleptic Julian calendar could be useful for a variety of historical situations.  The concept here was, in essence, that one could work backwards to construct a Julian calendar for dates long before Julius Caesar, and could use that calendar to construct a list of standard dates when various historical events occurred.  Although sources rarely seemed to specify what calendar they were using, it appeared that the proleptic Julian calendar was in fact being used widely for this purpose.  There would certainly be scholarly disputes as to the conversion of ancient chronologies to Julian calendar terms (so as to interpret, for instance, a statement that a certain event occurred in the 245th year since the founding of Rome), but at least the calendar system itself would be consistent over centuries.

Developing and Testing the Million-Day Calendar

I added proleptic Julian calendar calculations to the million-day calendar. I started these calculations by adding a separate Julian Days table to the spreadsheet. The concept of the Julian Day was proposed by Joseph Scaliger in 1583. Julian Days were simply a count of days, beginning (for astronomical and historical reasons) with Day Zero at 12:00 noon on January 1, 4713 BC. (Julian Days could include decimal values for fractions of a day, such as 0.083 = 2 PM.) So, for instance, Julian Day 7 arrived at noon on January 8, 4713 BC.
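Because Julian Days begin at noon, the decimal part counts hours from noon rather than from midnight.  A small Python sketch (the function name is hypothetical, and it rounds to the nearest minute) makes the conversion concrete.

```python
def jd_fraction_to_clock(fraction):
    """Convert the decimal part of a Julian Day to a 24-hour clock time."""
    minutes = round(fraction * 24 * 60)   # minutes elapsed since noon
    hour = (12 + minutes // 60) % 24      # the day count starts at 12:00
    return f"{hour:02d}:{minutes % 60:02d}"


print(jd_fraction_to_clock(0.083))  # 14:00, i.e., about 2 PM
print(jd_fraction_to_clock(0.5))    # 00:00, i.e., midnight
```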

There were no years in the Julian Day system, but Julian Days could be used to calculate the proleptic Julian calendar, in which every fourth year would be treated as a leap year.  Because there was no Year Zero in the Julian calendar, Scaliger's first year of 4713 BC was a leap year.  (That is, in a system that had a Year Zero between 1 BC and 1 AD, 4713 BC would have been called 4712 BC.)  The resulting calculations produced Julian dates, in the spreadsheet, that were consistent with those reached by John Herschel in his Outlines of Astronomy (1849, p. 595).  Specifically, January 1, 4004 BC was Julian Day 258,963; the destruction of Solomon's Temple (which Herschel put on May 1, 1015 BC) was on Julian Day 1,350,815; and Rome's founding (which Herschel put at April 22, 753 BC) was on Julian Day 1,446,502.  Moving into the million-day period beginning on January 1, 500 BC (Julian Day 1,538,799), the spreadsheet matched Herschel's calculation that the Julian calendar reformation of January 1, 45 BC occurred on Julian Day 1,704,987; the Islamic Hijra calendar began on Julian Day 1,948,439 (July 15, 622 AD); and the official last day of the Julian calendar (October 4, 1582) was Julian Day 2,299,160.  It tentatively seemed that the spreadsheet's Julian calendar portion was accurate.
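These Julian Day values can be reproduced with a widely published integer formula for the Julian calendar.  The spreadsheet did not use this formula, so the Python sketch below is offered only as an independent cross-check; the function name and the era argument are my own.

```python
def julian_calendar_to_jdn(year, month, day, era="AD"):
    """Julian Day number (at noon) for a date on the proleptic Julian calendar."""
    if era == "BC":
        year = 1 - year              # continuous count: 1 BC -> 0, 4713 BC -> -4712
    a = (14 - month) // 12           # 1 for January and February, else 0
    y = year + 4800 - a              # years counted from March 1, 4801 BC
    m = month + 12 * a - 3           # March = 0, ..., February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


print(julian_calendar_to_jdn(1, 1, 500, "BC"))  # 1538799
print(julian_calendar_to_jdn(1582, 10, 4))      # 2299160
```

The arguments are in year, month, day order, so the first call is January 1, 500 BC.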

I also added Day of Week calculations to the spreadsheet, beginning with the common assertion that January 1, 4713 BC was a Monday (in, implicitly, the proleptic Julian calendar).  For the dates cited in the preceding paragraph, these calculations indicated that January 1, 4004 BC was a Saturday; May 1, 1015 BC was a Friday; April 22, 753 BC was a Tuesday; January 1, 500 BC was a Thursday; January 1, 45 BC was a Friday; July 15, 622 AD was a Thursday; and October 4, 1582 was a Thursday.  Further, I extended the Julian calendar beyond its official end to Thursday, November 7, 2238 AD (Julian Day 2,538,798).  According to the spreadsheet (and also the Julian Day arithmetic, i.e., Julian Day 2,538,798 minus Julian Day 1,538,799), that was the millionth day (inclusive) from Thursday, January 1, 500 BC.  These particular Julian Day numbers and day-of-the-week calculations matched the values produced by an online date calculator appearing on a NASA webpage.  It tentatively seemed that the spreadsheet's Julian Day calculations were corresponding accurately with Julian calendar dates.
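Given that Julian Day 0 was a Monday, the day of the week is just the remainder after dividing the Julian Day number by 7.  A minimal Python sketch (names are mine) reproduces the weekdays and the million-day count cited above.

```python
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_of(jdn):
    """Day of the week for a whole Julian Day number (Julian Day 0 = Monday)."""
    return WEEKDAYS[jdn % 7]


def inclusive_day_count(first_jdn, last_jdn):
    """Number of days from first to last, counting both endpoints."""
    return last_jdn - first_jdn + 1


print(weekday_of(1538799))                    # Thursday
print(inclusive_day_count(1538799, 2538798))  # 1000000
```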

Next, I produced a proleptic Gregorian calendar in the million-day calendar, adjacent to the Julian calculations.  The starting point for this calendar's calculations was its commonly recognized starting date of Friday, October 15, 1582.  As noted above, the preceding day of October 14 on the Gregorian calendar (G) (if such a date had officially existed on that calendar) would have been Thursday, October 4 on the Julian calendar (J).  So the spreadsheet's presentation of Julian and proleptic Gregorian dates had to match up on the row containing the values of October 14 (G) and October 4 (J).  That is, both had to have the same Julian Day value of 2,299,160.  From October 14, 1582, I extended the Gregorian calendar back to January 1, 500 BC.  I decided not to extend this proleptic Gregorian calendar back into the period before 500 BC, though there were situations in which such an extension might have been useful.

There were some interesting things in the relationship between the proleptic Gregorian calendar and the Julian calendar.  At the starting point in the 16th century AD, the Gregorian dates were later than the Julian.  As just noted, October 14, 1582 (G) was equivalent to October 4, 1582 (J).  The Gregorian allowed fewer leap years, so the difference between it and the Julian began to narrow with each additional century (except for those evenly divisible by 400), going back in time.  The ten-day difference of 1582 thus became a nine-day difference on the first previous day when the formulas for the two calendars differed:  there was no February 29, 1500 (G), but there was a February 29, 1500 (J).  By the time one arrived back at the third century AD, the difference between the two calendars vanished.  That is, as noted by Peter Meyer, the two calendars had exactly the same dates from March 1, 200 AD to February 28, 300 AD.  This was no coincidence.  Gregory had designed his reform so that Easter would occur at about the same time as it had occurred in 325 AD, when the Council of Nicea (also spelled Nicaea) discussed such matters.  So during the century ending on February 28, 300 AD (J), both calendars showed the same dates (e.g., February 1, 300 (J) = February 1, 300 (G), and both are Julian Day 1,830,664).  Before the third century, the Gregorian calendar predated the Julian by progressively larger amounts, until January 1, 500 BC (J) would be represented as December 27, 501 BC (G).  Going back still farther, dates on the Julian calendar would continue to fall three days later every 400 years, so that January 1, 4713 BC (J) would arrive a month earlier on the Gregorian, in late November 4714 BC.  On the other extreme, in the centuries following 1582 AD, the Gregorian dates became progressively later than those of the extended Julian, until November 7, 2238 AD (J) was equivalent to November 22, 2238 (G).
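The side-by-side comparison can be reproduced from Julian Day numbers alone.  The Python sketch below uses a commonly published integer algorithm (often credited to E. G. Richards) rather than the spreadsheet's own formulas, and it is valid only for non-negative Julian Day numbers.  It returns years on a continuous count that includes a year zero, so 0 means 1 BC and -500 means 501 BC; the function name is hypothetical.

```python
def jdn_to_date(jdn, gregorian=True):
    """Convert a Julian Day number to (year, month, day).

    The year is on a continuous count (0 = 1 BC, -500 = 501 BC).
    """
    f = jdn + 1401
    if gregorian:
        # Correction for the century leap days that the Gregorian rule drops
        f += (((4 * jdn + 274277) // 146097) * 3) // 4 - 38
    e = 4 * f + 3
    g = (e % 1461) // 4
    h = 5 * g + 2
    day = (h % 153) // 5 + 1
    month = (h // 153 + 2) % 12 + 1
    year = e // 1461 - 4716 + (14 - month) // 12
    return year, month, day


for jdn in (1538799, 1830664, 2299160, 2538798):
    print(jdn, jdn_to_date(jdn, gregorian=False), jdn_to_date(jdn))
```

For Julian Day 1,830,664 both columns show February 1, 300, while for 2,299,160 the Julian column shows October 4, 1582 and the Gregorian column shows October 14.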

I checked the foregoing dates and days of the week using another online calculator as well, produced by Fourmilab Switzerland.  I began by entering Julian Day numbers and then seeing what results this calculator would produce for Julian and Gregorian calendar dates.  This calculator took the approach of inserting a Year Zero in the proleptic Gregorian calendar, so its statement of BC dates differed from the values shown in the spreadsheet by one year.  For example, the Fourmilab calculator indicated that January 1, 45 BC (J) was equal to December 30, 45 BC (G), whereas the spreadsheet would put the latter as December 30, 46 BC (G).  Fourmilab's approach seemed incorrect in this regard.  For mathematical purposes (as in e.g., the ISO 8601 approach, below), there would need to be a Year Zero; but the historical reality seemed to be that proleptic calculations in both Julian and Gregorian calendars did not have a year zero.  Fourmilab was not alone here; the conflation of mathematical consistency with historical fact had evidently produced some confusion in other computing situations as well.  At any rate, after adjusting for that divergence in BC years, the results of the Fourmilab calculator did match up with those yielded by the spreadsheet and the NASA calculator.  This calculator and the spreadsheet also agreed that February 1, 200 AD (G) was Julian Day 1,794,140 and was also February 2, 200 AD (J).  (The NASA calculator did not do proleptic Gregorian calculations.)
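The one-year divergence comes down to a simple relabeling.  In a system with a Year Zero, the year before 1 AD is year 0, the year before that is -1, and so on; traditional numbering calls those years 1 BC and 2 BC.  A small Python sketch of the conversion follows (the function name is mine).

```python
def to_traditional_year(continuous_year):
    """Convert a year from a year-zero count to traditional AD/BC numbering."""
    if continuous_year > 0:
        return continuous_year, "AD"
    return 1 - continuous_year, "BC"   # 0 -> 1 BC, -45 -> 46 BC


print(to_traditional_year(-45))   # (46, 'BC')
print(to_traditional_year(0))     # (1, 'BC')
```

So a calculator that reports year -45 on a year-zero basis is describing the year that traditional chronology calls 46 BC.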

I looked at one other online calculator, produced by CSGNetwork.  I did not attempt a redundant comparison against all of the dates listed above.  Instead, I focused on the especially problematic period of the first two centuries AD.  In that timeframe, the CSGNetwork calculator seemed to be in error.  Specifically, a "Calendar Date Entry" of January 1, 1 AD yielded Julian Day 1,721,425.5.  The NASA and Fourmilab calculators and the spreadsheet agreed that January 1, 1 AD (J) should rather be Julian Day 1,721,423.5 or 1,721,424.  So if "Calendar Date Entry" in the CSGNetwork calendar was intended to refer to a Julian calendar date, its Julian Day output was incorrect.  It did not appear that the calendar intended to refer, rather, to a Gregorian calendar date of January 1, 1 AD, because it then stated that its Julian Day value of 1,721,425.5 was equivalent to January 3, 1 AD (G).  In that latter regard, it was correct.

To some unknown extent, online calculators presumably used formulas that had been devised to facilitate date calculations.  For example, Bill Jefferys presented a formula for converting Julian Days (and, perhaps, dates on the Julian calendar) to the proleptic Gregorian calendar, but indicated that it would be inaccurate before 1582, and especially for years before 400 AD.  Paul Dohrman offered a procedure for converting Julian to Gregorian, and J.R. Kambak offered one for conversions from Gregorian calendar dates to Julian Days.  Dohrman's approach, as I understood it, required these steps:
  1. Truncate to centuries (e.g., 622 AD becomes 6).  In the case of BC dates, treat them as negatives and start by subtracting a year first (e.g., 499 BC becomes -500, which becomes -5).  This calculation produces X.
  2. Calculate 0.75X minus 1.25.  So 622 AD » 6 » 3.25 (using » as shorthand for "becomes"), and 499 BC » -5 » -5.
  3. Truncate decimal points.  So 622 AD » 6 » 3.25 » 3.  This is the number of days to add to the Julian date to find the Gregorian.
This procedure produced some results consistent with the spreadsheet and the Fourmilab calculator, converting July 15, 622 AD (J) to July 18, 622 AD (G), and January 1, 500 BC (J) to December 27, 501 BC (G) (after Year Zero adjustment).  This procedure did not seem to work in the first two centuries AD, however.  For example, in the case of July 1, 1 AD (J), Dohrman's approach seemed to yield the incorrect value of June 30 (i.e., century 0 * 0.75 – 1.25) rather than June 29 (G).
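A small Python sketch of steps 2 and 3 shows where the first-century result goes astray. It takes the century number X from step 1 as given. The "floor" variant (rounding down rather than dropping the decimals) is an assumption added here for comparison; it is not part of Dohrman's procedure as described above.

```python
import math

def offset_truncated(x):
    """Steps 2 and 3 as described: 0.75X - 1.25, then drop the decimals."""
    return math.trunc((3 * x - 5) / 4)   # (3X - 5) / 4 equals 0.75X - 1.25

def offset_floored(x):
    """Assumed variant: same calculation, but always rounding down."""
    return (3 * x - 5) // 4

for x in (6, -5, 0):
    print(x, offset_truncated(x), offset_floored(x))
```

For X = 6 and X = -5 the two variants agree (3 and -5 days). For X = 0 truncation gives -1 day, which is the June 30 result, while rounding down gives -2 days, which matches June 29 (G).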

There also seemed to be a problem with Kambak's long formula for converting Gregorian dates to Julian Days.  It is possible that I did not copy or interpret that formula correctly.  The version that I tested was as follows, where Y = Gregorian year, M = Gregorian month, D = Gregorian day, and JD = Julian Day:
JD = 367Y – 7(Y+(M+9)/12)/4 – 3((Y+(M–9)/7)/100+1)/4 + 275M/9 + D + 1721029
As I translated this into an Excel formula (placed into cell D2), it read as follows (assuming the values of Y, M, and D were entered into cells A2 through C2, respectively):
=367*A2-7*(A2+(B2+9)/12)/4-3*((A2+(B2-9)/7)/100+1)/4+275*(B2/9)+C2+1721029
That formula's results varied from those produced by the Fourmilab calculator for certain dates checked above, such as July 1, 1 AD (G) and October 14, 1582 (G).  The variance in these instances was very small, however.  Specifically, the values for those two dates produced by the formula and the Fourmilab calculator were 1,721,606 vs. 1,721,606.5, respectively (for July 1, 1 AD (G)) and 2,299,159 vs. 2,299,159.5, respectively (for October 14, 1582 AD (G)).  That is, the Fourmilab calculator exceeded the formula's output by only 0.5 day in each case.  Unfortunately, this variation was not consistent.  For July 15, 622 AD (G), the Fourmilab calculator produced a value of 1,948,435.5, which was 0.5 day smaller than the Julian Day value of 1,948,436 produced by the formula.  Moreover, for November 22, 2238 (G), the Fourmilab calculator's output of 2,538,797.5 was 1.5 days larger than the figure of 2,538,796 produced by the formula.  In each of these several instances, the spreadsheet agreed, again, with the results produced by the Fourmilab calculator, after rounding the latter's 0.5-day output upward.  It appeared, in short, that this formula was very close but not entirely accurate.
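One possible explanation, offered as an assumption rather than something confirmed against Kambak's source, is that each division in the formula was meant to be truncated to a whole number, as is usual for published formulas of this shape. The Excel version above keeps the fractions. The Python sketch below applies truncation at every division; the names tdiv and kambak_jdn are illustrative, and only positive years were checked.

```python
def tdiv(a, b):
    """Integer division that truncates toward zero (b is assumed positive)."""
    return a // b if a >= 0 else -((-a) // b)

def kambak_jdn(year, month, day):
    """Gregorian date to Julian Day number, truncating every division."""
    term_year = tdiv(7 * (year + tdiv(month + 9, 12)), 4)
    century = tdiv(year + tdiv(month - 9, 7), 100)
    term_century = tdiv(3 * (century + 1), 4)
    term_month = tdiv(275 * month, 9)
    return 367 * year - term_year - term_century + term_month + day + 1721029

# The four Gregorian dates compared in the paragraph above.
for ymd in [(1, 7, 1), (1582, 10, 14), (622, 7, 15), (2238, 11, 22)]:
    print(ymd, kambak_jdn(*ymd))
```

Read this way, the formula returns 1,721,607, 2,299,160, 1,948,436, and 2,538,798 for those four dates, which are the Fourmilab values after rounding the 0.5 day upward.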

By this point, checking of the spreadsheet had begun to transition into critiques of the ways in which various calculators and other tools had interpreted and applied various sources (e.g., Tantzen, 1960). I took this as a preliminary indication of the potential usefulness of the million-day spreadsheet, at least where an explicit presentation of dates might facilitate visualization of calendar developments.  While further usage and testing would be helpful in identifying points at which errors might have crept into the spreadsheet, it did preliminarily appear that the spreadsheet could provide a useful tool for date calculations and conversions.

The ISO 8601 Refinement

I developed the Gregorian section of the spreadsheet in one additional way. The International Organization for Standardization (ISO) had produced a standard prescription (known as ISO 8601) for calculating dates.  This prescription appeared likely to be useful for a variety of purposes, so the spreadsheet contains a column devoted to it.

The ISO 8601 standard adopted Gregorian date numbers. One effect of the standard, for present purposes, was to prescribe standard ways of representing dates. There was a YYYY-DDD ordinal date option, which used the day of the year, where day 366 would have a value only in leap years (e.g., 2012-366 = December 31, 2012).  In the spreadsheet, I used the year-month-day format (e.g., 2012-05-06 = May 6, 2012). ISO year values were ordinarily displayed with four characters (e.g., padded with leading zeros in 0023 rather than 23) for consistency.
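As an illustration of those two representations, here is a minimal Python sketch using the standard datetime module. The helper names are illustrative. Note that datetime.date covers only years 1 through 9999 on the proleptic Gregorian calendar, so it cannot reach the BC portion of the spreadsheet.

```python
from datetime import date

def iso_calendar(d):
    """Year-month-day form, with the year padded to four digits."""
    return f"{d.year:04d}-{d.month:02d}-{d.day:02d}"

def iso_ordinal(d):
    """Year plus day-of-year form (001 through 365 or 366)."""
    return f"{d.year:04d}-{d.timetuple().tm_yday:03d}"

print(iso_calendar(date(2012, 5, 6)))    # 2012-05-06
print(iso_ordinal(date(2012, 12, 31)))   # 2012-366
print(iso_calendar(date(23, 1, 1)))      # 0023-01-01
```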

A second effect of ISO 8601 stemmed from its adoption of a Year Zero, with apparently the same effect as what was sometimes called astronomical year numbering.  In this approach, before the epoch of 1 AD, the absolute value of the ISO year was one less than the traditional year (e.g., ISO year 0000 = 1 BC; ISO year –0001 = 2 BC). So the million-day calendar started on ISO date -0500-12-27 (i.e., December 27, 501 BC (G)). The numerical approach of ISO 8601, using minus signs instead of "BC" and likewise dispensing with "AD," had the advantage of avoiding controversy regarding the use of those two traditional modifiers.  The Fourmilab calculator (above) appeared to be implementing an ISO 8601 approach in its calculation of BC dates.
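The year-numbering rule can be sketched in a few lines of Python. The function names are illustrative, and the output style (a leading minus sign with four digits) simply follows the examples in the preceding paragraph.

```python
def iso_year(year, era):
    """Traditional year (no year zero) to ISO / astronomical year."""
    if era == "BC":
        return 1 - year      # 1 BC -> 0, 2 BC -> -1, 501 BC -> -500
    return year

def iso_style_date(year, era, month, day):
    """Format a traditional date in the ISO style used in the spreadsheet."""
    y = iso_year(year, era)
    sign = "-" if y < 0 else ""
    return f"{sign}{abs(y):04d}-{month:02d}-{day:02d}"

print(iso_style_date(501, "BC", 12, 27))   # -0500-12-27
print(iso_style_date(1, "BC", 1, 1))       # 0000-01-01
print(iso_style_date(2, "BC", 1, 1))       # -0001-01-01
```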

With the Gregorian calendar presented in ISO format, it would have been possible to apply another kind of check to the spreadsheet's day-of-the-week column.  This check would have used what was known as the Doomsday technique.  That technique, useful for quickly calculating the day of the week for a given date, seemed unnecessarily complicated within the million-day calendar spreadsheet, where one could simply use the Julian Day.  That is, since Julian Day 0 occurred at noon on Monday, January 1, 4713 BC, every Julian Day evenly divisible by 7 would be a Monday.  This way of calculating the day of the week, for a given date on the Gregorian calendar, seemed to produce the same results as I had calculated by using a formula that copied, into each day-of-week cell, the name of the day that appeared in the 7th preceding row.
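The Julian Day approach to the day of the week amounts to taking a remainder. A minimal Python sketch, assuming whole Julian Day numbers counted at noon (the function name is illustrative):

```python
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def weekday_from_jdn(jdn):
    """Julian Day 0 was a Monday, so the remainder mod 7 indexes the week."""
    return DAY_NAMES[jdn % 7]

print(weekday_from_jdn(0))         # Monday
print(weekday_from_jdn(2299161))   # October 15, 1582 (G): Friday
```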

Official and Local Calendars

As previously noted, Gregory intended that the last day of the Julian calendar (October 4, 1582) would be followed by the first day of the Gregorian calendar (October 15, 1582).  That intention was followed in a number of countries and, at this writing, was implemented in various online calculators (e.g., those appearing on U.S. Naval Observatory and NASA webpages).  It appeared that 1582 was the most plausible candidate for the year in which the world converted from the Julian to the Gregorian calendar.  In short, this combination of proleptic Julian (to 4 AD), Julian (from 4 AD to 1582 AD), and Gregorian (since 1582 AD) appeared to form the most credible version of the world's official calendar.  The spreadsheet thus expresses what appears to be the Official Calendar that I had sought at the outset.

Some remarks appearing in preceding paragraphs have already acknowledged certain aspects of that de facto official calendar.  For one thing, the concept of the Julian Day was built from a starting date calculated according to Julian reckoning, but came to serve as a means of cross-reference between the Julian and the later Gregorian calendars.  So the spreadsheet column that presents the Julian Day number corresponding to a particular day on the Julian or Gregorian calendar does not belong solely within either the Julian or Gregorian sections of the calendar.  Rather, it seemed to be best presented in the spreadsheet's Official Calendar section.

Likewise, a given date would be a Monday, or a Tuesday, or some other day of the week, regardless of the date number given to it on the Julian or Gregorian calendars.  So it would have been redundant to present separate day-of-week columns in each of those calendars' parts of the spreadsheet.  Instead, the day of the week appears just once, in the Official Calendar section.

That section also presents the official date, in two different formats.  First is the traditional format, using BC or AD indicators of era.  These traditional dates are provided in the somewhat condensed but still recognizable YYYY-MM-DD form.  As such, their components (e.g., the number of the month) are accessible for further date calculations, as users may desire, with the aid of Excel text functions (e.g., MID, FIND).  The column presenting the Official Date in Traditional Format is thus the specific statement of the Official Calendar in approximately the form that now appears to be used by most people.

Second, the spreadsheet also presents the official date in ISO format -- specifically, with minus signs and a Year Zero, modifying the traditional presentation.  To emphasize, this is the official date.  It uses the Julian calendar for dates before October 15, 1582, and therefore is not the ISO 8601 date.  It is simply an indication of how the traditional, official date looks when stated in ISO style for purposes of numeric calculations.

As noted above, substantial portions of the world did not adopt the Gregorian reforms in 1582.  The spreadsheet is adaptable for purposes of developing localized versions that may accommodate reforms implemented in later years.  In the process of preparing this post, I also found a useful calendar with local customizations at TimeAndDate.com, though a brief look suggested the presence of inaccuracies like those identified in other calculators (above).

Uses of the Million-Day Calendar

This post has explained the creation of a million-day calendar covering the period from 500 BC to 2238 AD.  That calendar is provided in spreadsheet format, one row per day.

This spreadsheet format seems to have facilitated identification of potential errors in certain tools designed to assist in use of, and interactions between, the Julian and Gregorian calendars as well as the Julian Day and ISO 8601 date systems.  It may prove useful in other contexts calling for calculations, demonstrations, or cross-comparisons among calendars and systems, including some that users may add.

The spreadsheet presentation may also be useful in less technical, more data-oriented applications.  Within the limits of computing power and spreadsheet capacity, there may be tasks that call for an ability to add columns of information, to be filled at a rate of one item per day (or week, or other time period).  For instance, at this writing, I would like to find a database (if one exists) that would show something like the leading headline of the day -- the sort of thing that one might expect to find on the front page of the New York Times, for instance, if that newspaper had existed on the day of the Battle of Hastings.  If no such database exists, perhaps this spreadsheet, shared among a number of potential contributors, could help to bring about its existence.

Monday, May 21, 2012

Western Digital: Let the Buyer Beware

I bought a new Western Digital (WD) hard drive.  The drive's label indicated a date (of manufacture, presumably) of February 29, 2012.  It was supposed to have a five-year warranty.  And perhaps it did.  But when I went to WD's Warranty Check site and entered the drive's serial number, it indicated that the warranty would expire on July 11, 2012.  That would be a warranty of exactly five months, not five years.

I wanted to ask WD whether this was an error in their warranty check page, or whether perhaps the merchants selling such drives were deceived as to the actual duration of the warranty.  Unfortunately, WD offered no way to do so.  I spent 20 minutes screwing around in their website, trying various possibilities. 

For one thing, I had to create an account, which I was willing to do, though it seemed unnecessary.  Also, the Support link at the top of their webpage took me to a Service and Support webpage, where I tried several possibilities.  The Warranty & RMA Services link on that page led to an End User Customer page that, unfortunately, provided no way to make contact other than those appearing on the Service and Support webpage.  Specifically, the Contact WD link at the bottoms of these pages led to the same phone and email support webpages as were available on the Service and Support page.

Between those two, the email option led to a page that promised an opportunity to ask a question if I was just willing to Continue to WD Support Portal.  But that was false; there was no opportunity to ask a question on the resulting Manage Your Account page.  Meanwhile, the phone option led into a voice tree that provided no option for asking an actual question.

This was all very time-consuming and frustrating.  I appreciate that WD can make more money if it can force everyone to find answers to nonstandard questions somewhere else.  The exception I would point out is that WD will make less money if those nonstandard questions, or their handling, have to do with the purchase decision.  Specifically, (a) I expect to see a five-year warranty when I am promised one, and (b) if there is an error or falsification on that point, I expect to be able to find that out before, not after, buying a drive from WD.  Otherwise, at a certain point, I would fear a run-around and a possible nasty surprise.  No consumer wants that.

In my case, I don't have a defective drive.  So the standard RMA procedure is not applicable.  I have a drive that I am trying to sell.  I want to be able to assure my potential buyers that the drive is under warranty.  WD is not giving me that assurance.  Had I been aware of this sort of problem, I would not have bought this drive.  It may be great, and may last five years.  Or it may not.  In the latter case, I want warranty coverage, not a hassle.

I will have to discount my drive by some amount in order to overcome reasonable worries of potential buyers.  So in my case, it was a mistake to buy a Western Digital drive, counting on its resale value.  Two months from now -- by which time WD may have sorted out its warranty portal -- the situation may be different.  I'll consider that possibility if I feel like buying another drive from WD then.

Tuesday, May 8, 2012

Creating a Bootable Windows 7 USB Drive for Installation / System Repair / Recovery - First Cut

Normally, if I booted a computer from a Windows 7 installation DVD, I could get into System Recovery Options (e.g., Startup Repair, System Restore, Command Prompt) that would let me run various diagnostics.  Unfortunately, my laptop did not have a CD/DVD drive.  So if I wanted to see those Windows 7 startup repair options, it seemed that I would have to find a way to do so by booting the computer from a USB flash drive instead.  This post describes the steps I took to develop a USB drive that would give me those options.  It also incidentally describes how to make a bootable copy of the Windows 7 installation DVD on a USB drive.

One approach was to put the entire Windows 7 installation DVD on a USB drive.  The DVD contained about 3GB of material, so this would require a USB drive of 4GB or larger.  Another approach was to put just a Windows 7 System Repair or Recovery CD on a USB drive.  This would require only about 150MB, so I could use a smaller, older, cheaper, or otherwise unused USB flash drive.  The Recovery CD option might load faster than a full Windows installation DVD, but it would not be useful for installation or for recovering system files.

Either way, the first step was to get the necessary files.  The Windows installation files would traditionally be purchased on a DVD, but it was also possible to download them.  Similarly, the Windows 7 System Repair Disc was ordinarily a CD, but it could be copied or converted to files on a hard drive.

To get a System Repair Disc, I had to search my computer for "system repair disc."  That didn't work in my case -- I must have renamed the relevant shortcut -- so I searched for various combinations of "create," "system," "repair," and "recovery."  I could also have used Control Panel > Backup and Restore > Create a system repair disc.  The option of downloading the file(s) needed for a system repair CD was apparently disappearing.  In any case, eventually I found and used the link to a little Windows 7 program whose title bar read simply, "Create a system repair disc."  This created the recovery CD.

Next, the files that weren't already in ISO format needed to be converted to ISO.  The downloaded versions of Windows 7 evidently came in ISO format.  By contrast, the installation DVD and the recovery CD were not in ISO format.  To convert them to ISO, I started by using Magic ISO Maker.  It warned me that it would not create an ISO larger than 300MB, but this seemed to be a bluff to motivate an immediate purchase.  Format Factory would apparently have been one among many freeware alternatives.  When I remembered that ImgBurn would create ISOs from files or discs, however, I deleted the Magic ISO output and used ImgBurn instead, since it had worked well for me in other sorts of projects in the past.

Once I had an ISO, I had a choice between two different approaches to get it properly unpacked and operational on the USB drive.  A dedicated USB drive would focus solely on one version of Windows 7 (e.g., 32-bit vs. 64-bit, Home vs. Ultimate).  This dedicated approach seemed likely to be relatively simple and reliable, and would probably be all that most users would need.  By contrast, a multiboot USB drive would allow the user to install and/or run two or more different operating systems (potentially including e.g., Windows XP and Linux).  I decided to go with the dedicated, single-system approach.

I started with the Windows 7 system recovery CD, which ImgBurn had now converted to a file I called Win7SysRepair.iso.  There seemed to be several ways to put this ISO onto a bootable USB drive.  One approach involved using Grub4DOS.  Another was to use Microsoft's Windows 7 USB/DVD Download Tool.  I ran that Tool.  It called for a few simple steps.  First, I plugged in the little 512MB USB flash drive on which I was going to install the Windows 7 system recovery CD files.  Then I pointed the Download Tool toward the newly created Win7SysRepair.iso.  I clicked the USB Device button, and the Tool found the USB drive.  I clicked Begin Copying and confirmed that it was OK to erase the USB drive.  The tool said, "Creating bootable USB device."  The first time I tried it, it failed, with this error message:

We were unable to copy your files.  Please check your USB device and the selected ISO file and try again.
I assumed this was due to interference from AntiRun, which I was using to keep an eye on USB drives.  I shut down AntiRun and tried again.  But no, the Tool failed the second time too.  To troubleshoot this problem, I ran a search and saw that this was a rare error.

The problem seemed to be that the Tool was formatting the USB drive as NTFS.  I thought the solution would be to go to Start > Run > diskmgmt.msc and quick-reformat the USB drive with a FAT32 file system (using a volume label of no more than eight characters).  But I still got the same error.  Another source said the problem was that the Microsoft programs (diskmgmt.msc and also the Tool) failed to use the Clean command.  In other words, my USB stick had residual formatting from some previous use.

The advice was to fix this problem by opening a command window with Administrator rights and type "diskpart" at the prompt.  This started the DiskPart program, with its own DISKPART> prompt.  The next step was to type "list disk" to see what drives were connected to the computer.  This showed me that, as expected, the last disk was the smallest:  491MB.  That was surely my USB drive.  (It seemed pretty important not to be reformatting the wrong drive.)  That 491MB drive was Disk 2.  So I typed "select disk 2."  It informed me that Disk 2 was now selected.  I typed "list disk" again to check and, sure enough, there was an asterisk next to Disk 2.  So I was ready to type "clean."  It said, "DiskPart succeeded in cleaning the disk."  With that done, I could type these remaining commands in DiskPart, one at a time:
create partition primary
select partition 1
active
format quick fs=fat32
assign
exit
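For repeat use, the whole DiskPart sequence could be collected into a script file and run with "diskpart /s" from an Administrator command window. The Python sketch below only builds the text of such a script and runs nothing. The function name and the output filename are hypothetical. The disk number must first be confirmed with "list disk", because "clean" erases whatever disk is selected.

```python
def build_diskpart_script(disk_number):
    """Return the text of a DiskPart script that wipes and reformats one disk."""
    if not isinstance(disk_number, int) or disk_number < 0:
        raise ValueError("disk_number must be a non-negative integer")
    commands = [
        f"select disk {disk_number}",   # must be the USB drive, not a hard drive
        "clean",                        # removes residual partitioning
        "create partition primary",
        "select partition 1",
        "active",
        "format quick fs=fat32",
        "assign",
        "exit",
    ]
    return "\n".join(commands) + "\n"

# Disk 2 was the USB drive in this case; it may differ on another system.
script_text = build_diskpart_script(2)
print(script_text)
# To use it, save script_text to a file (e.g., usbprep.txt, a hypothetical name)
# and run:  diskpart /s usbprep.txt
```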
I exited the command window and tried Microsoft's Windows 7 USB/DVD Download Tool again.  It still failed.  I tried again, this time using a different USB drive.  This time was even worse:  previously, it had failed at the 99% mark, but with this drive the copying process didn't even start.  I tried using an ISO built from a System Recovery CD created on another computer, running a different version of Windows.  But the Windows Download Tool said this:
Invalid ISO File

The selected file is not a valid ISO file.  Please select a valid ISO file and try again.
I got that error twice, with ISOs created by ImgBurn and also by Magic ISO Maker.  It was time to give up on the Microsoft Download Tool, reformat the USB drive, and try another approach.

I went back to look at the Grub4DOS approach mentioned above.  I wouldn't be using it to install multiple bootable operating systems on my little 512MB USB flash drive, but it looked like a straightforward process anyway; I figured maybe the education would come in handy later.  For this approach, I needed to download and install MultibootISO.  I found what appeared to be a popular, current version of this program on a Pendrivelinux webpage.

On closer inspection, what I was downloading was now called YUMI (short for Your Universal Multiboot Installer).  YUMI was apparently a successor to both MultibootISO and Universal USB Installer.  YUMI was portable; no installation required.  YUMI didn't have a built-in option for installing Windows 7.  I got the feeling that YUMI was not going to replace MultibootISO for this particular task.  Nonetheless, I tried.  In YUMI, I selected "Try an Unlisted ISO."  YUMI didn't complain that the ISO was invalid.  It seemed to think it had succeeded.  Sadly, the USB drive wasn't bootable, at least not in the laptop where I tried it.  I tried again and, whoa, success!  Apparently I had just not hit Esc quickly enough to bring up my laptop's bootable USB drive menu when the laptop was first starting up, or maybe I had hit Esc too many times and escaped my way right out of that menu.  But now, on this second go, YUMI gave me the Windows 7 recovery CD functionality, running from my USB drive.

Well.  This YUMI thing was pretty cool.  When I started this post, I thought I would just be content with the Windows 7 installation DVD. For that purpose, my spare 4GB USB flash drive was sufficient.  But now I wanted to try YUMI with a large USB drive that would accommodate the Windows 7 installation DVD as well as other operating systems and other bootable CDs.  But this would have to await purchase of a 16GB or larger USB flash drive.

Monday, May 7, 2012

Blocking Unwanted Websites from Google Searches

I had been using the Optimize Google add-on in Firefox, and the Search Engine Blacklist extension in Chrome, to keep unwanted websites from clogging up my searches.  I especially liked the approach of Optimize Google:  it would still show the sites that I was blocking, but they would be greyed-out.  This made it easier to keep an eye on what I was blocking.  I didn't want to be accidentally preventing myself from seeing good sites.

Unfortunately -- in Firefox 8 and 10, as distinct from Firefox 3 -- OptimizeGoogle was messing up my Google search tools.  I noticed it particularly in my ability to select a date range for a search.  Meanwhile, in Chrome, I eventually found that Google offered an option to Manage Blocked Sites -- but in that option, as in Search Engine Blacklist, I had to enter each website manually, rather than just pasting in a list from a file, as Optimize Google would let me do.  Even if these extensions had worked perfectly and easily, they still didn't help with other browsers (e.g., Internet Explorer, Opera).

I found a Computrick webpage that told me I could block out websites for all browsers in one move.  The recommended approach was to open the Hosts file in C:\Windows\System32\drivers\etc. (elsewhere in earlier versions of Windows).  (It would apparently take administrator privileges to modify that file.)  The modification was to add lines like this to the end of that Hosts file:

127.0.0.1 localhost
127.0.0.1 www.badsite.com
127.0.0.1 badsite.com
In other words, each line would begin with "127.0.0.1" followed by at least one space, and then the brief URL of the unwanted website.  As that example shows, I was supposed to enter the site's URL both with and without the "www" prefix.  The first line, 127.0.0.1 localhost, was apparently required to make the process work.  It was already there in Hosts, but it was commented out (i.e., its line was prefixed with a # symbol, making it nonoperational).  So I removed the # symbol before that line to make it work, and likewise for the next line, the last one that was already there when I first opened Hosts:  ::1.

I still had the list of sites that I tended to paste into Optimize Google's filter list.  I pasted that list into Excel, worked up formulas to produce the two versions suggested above (i.e., with and without the "www"), pasted those into Hosts, and saved it.  As soon as I did that, Microsoft Security Essentials (my antivirus program) popped up a message warning of a potential security threat.  I told it to Allow this instance.
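The Excel step could also be done with a few lines of Python. This is a sketch only, not what I actually used: the function name is illustrative, and it assumes the input is a plain list of site names, one per entry, with or without the "www" prefix.

```python
def hosts_entries(domains, target="127.0.0.1"):
    """Build Hosts lines for each site, both without and with 'www.'."""
    entries = []
    seen = set()
    for raw in domains:
        name = raw.strip().lower()
        if not name:
            continue                      # skip blank lines in the pasted list
        bare = name[4:] if name.startswith("www.") else name
        for host in (bare, "www." + bare):
            if host not in seen:          # avoid duplicate lines
                seen.add(host)
                entries.append(f"{target} {host}")
    return entries

for line in hosts_entries(["www.badsite.com", "badsite.com"]):
    print(line)
```

The printed lines can then be pasted at the end of the Hosts file, as with the Excel output.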

I had decided to look into this website-blocking issue because I had just done a Google search that was plagued by sites I did not want to see.  I suspected that some of them were probably on that long list of unwanted sites that I had developed.  So now I hit F5 to refresh that search.  It didn't look like anything had changed.  I was looking particularly at one website that I knew I didn't want to see in my search results.  I made a point of adding it to Hosts, and I saved Hosts again.  Then I hit F5 to refresh the webpage again.  The unwanted site was still listed.

Further investigation led to an MVPS.org post that said Hosts was loaded at startup.  So apparently I would have to reboot the system for it to take effect.  This would be another drawback in comparison against a tool like Optimize Google, which could be fine-tuned on the fly to keep removing unwanted sites until my search results would be as pure as the driven snow.  OptimizeGoogle would also let me designate subparts of a website (e.g., www.GoodSite.org/CrappyDownloads).

I was thinking it would be great if someone had cooked up a list of suspicious or unwanted websites.  Presto!  Ask, and it shall be given unto you.  Someonewhocares posted a truly monumental Hosts file, free for the copying.  I was pretty sure that, if I pasted their Hosts file into mine, I would not be troubled by contacts from anyone anymore.  I was intrigued, but I decided that I would work into this thing more gradually, starting with my modest list of a hundred or so sites that I wanted to block.

While browsing this subject, I came across a long discussion that looked like it would have been interesting, if I had been interested.  One topic that did have potential had to do with sites like ad.doubleclick.net.  It sounded like my browsing might speed up if sites like that were not bogging down my system.  I went back to the Someonewhocares.org webpage and searched for ad.doubleclick.  They did have a few lines for websites like that.  But their comments indicated that blocking those sorts of ad sites could mess up some retailers' websites (e.g., Sears) and also Google itself.  I decided not to proceed further with that issue at this point.

I saved my newly revised Hosts file.  I also saved a copy of it to another drive, to be protected in case I had to reinstall Windows for some reason.  When I rebooted, I took another look at that most recent Google search.  No effect:  the unwanted sites were still there.  That was in Opera.  I tried the same search in Firefox, Internet Explorer, and Chrome.  Same result:  nothing had been filtered.  I looked again at C:\Windows\System32\drivers\etc\hosts.  It was still there, and it still contained the list of unwanted websites.

That was as far as I got with this project at this point.

Friday, May 4, 2012

Uploading Videos to YouTube without Letterboxes (Big Black Borders)

I was doing some occasional video editing, mostly using Adobe Premiere Elements.  Now and then, when I would upload a video to YouTube, it would have a letterbox.  That is, the video would be encased within a wide black rectangle.  This made the video smaller.  I didn't want it.  Getting rid of it was not easy.  This post presents a few notes in that direction.

One suggestion I encountered in a few different discussion threads was to add a certain tag to the Tags field available for each YouTube video.  The recommended tag was yt:crop=16:9.  I was supposed to type that into the Tags field, presumably separated from other tags by a comma, and that was supposed to expand the video.  I did find that this worked with one video, where I inserted that tag while the video was still uploading.  I found that it did not work with two other videos that were previously uploaded, where I added the tag after the fact.  An alternative was to use yt:stretch=16:9.  That, too, failed with the previously uploaded videos.

I came across some Adobe suggestions on adjusting pixel and frame aspect ratios.  They didn't look terribly technical, and at some point I realized I would probably benefit from understanding them.

In the meantime, however, I found that I could sometimes resolve the issue by producing my video, in Premiere Elements, as an AVI, and then importing that AVI back into a new project.  Sometimes the re-importation would provoke Premiere Elements to ask if I wanted to correct my aspect ratio.  I didn't seem to lose any quality from this step, it didn't seem to require much additional time, and the MP4 that I would then output as my final project seemed to upload and display on YouTube without any letterbox problems.