A List of 377 Common, Relatively Insignificant Words - First Cut
Any word can have significance. I needed a list of words that would be least likely to have much significance in most situations.
I assembled this list as an aid in identifying words that would be relatively unlikely to provide insight into the contents of a file, when used in the name of that file. I hoped that elimination of these words would make it easier to notice important emphases among the words that remained. This blog should have a post on that project at about the same time as this post.
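As a rough illustration of that idea, here is a minimal Python sketch, not the method actually used in the project. The function name `significant_words`, the small sample set `INSIGNIFICANT`, and the rule for dropping a file extension are all assumptions made for the example. In practice the set would hold the full list given in this post.

```python
import re

# Small sample only; the full list appears later in this post.
# Entries are lowercase because the comparison below lowercases each word.
INSIGNIFICANT = {"a", "an", "and", "for", "from", "in", "of", "the", "to", "with"}

def significant_words(filename, stopwords=INSIGNIFICANT):
    """Return the words of a filename that are not in the stopword set."""
    # Assumption: anything after a final dot, if purely alphanumeric,
    # is a file extension and carries no meaning here.
    stem = re.sub(r"\.[A-Za-z0-9]+$", "", filename)
    # Treat runs of letters, digits, and apostrophes as words, so that
    # spaces, underscores, and hyphens all act as separators.
    words = re.findall(r"[A-Za-z0-9']+", stem)
    return [w for w in words if w.lower() not in stopwords]

print(significant_words("Letter to the Bank from Smith.doc"))
```

With the sample set above, the filename "Letter to the Bank from Smith.doc" reduces to "Letter," "Bank," and "Smith," which are the words most likely to say something about the file's contents.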
I assembled this list from several sources. I started with the list of the 100 most common words in English, which, according to Wikipedia, was computed from an analysis of the billion-word Oxford English Corpus. From that list I removed about 20 words that, although common, seemed especially capable of having significance even in a filename or other brief communication (e.g., "person," "time," "know").
Among the remaining words on that list, I expanded some to include other forms in normal usage. For example, along with "be" as the second most common word in English, I added "is," "am," "are," "was," "were," and "been." Some such alternate forms were already on the list (e.g., "we" and "us"); others may have been much less frequently used. I added some common words that were substantially similar to, or that would arise from combinations of, words on the list (e.g., "whenever," "wouldn't"). I also expanded the several numerical words on the list to include all single-word cardinal and ordinal numbers up to "ninety" and "ninetieth," along with other counting adjectives (e.g., "many").
Having started with the 100 most common words, I then turned to specific types of words. I focused on commonly recognized parts of speech, particularly conjunctions (e.g., "but"), pronouns (e.g., "she"), conjunctive adverbs (e.g., "however"), and prepositions (e.g., "across"). There was a lot of overlap; lists of these kinds of words expanded my list but also tended to confirm what was already on it. I added some relatively generic adverbs (e.g., "actually") and adjectives (e.g., "actual"). I also drew from Wikipedia's lists of the first and second hundred English basic words, and threw in some other frequently used words of relatively minor significance (e.g., "evidently").
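The assembly process described above was done by hand, but it amounts to a few set operations. The following Python sketch is only a miniature of it: the function `assemble` and the tiny word sets are hypothetical, built from the examples mentioned in the text rather than from the actual source lists.

```python
# Hypothetical miniature inputs, using only examples mentioned in this post.
most_common = {"the", "be", "to", "we", "person", "time", "know"}
too_significant = {"person", "time", "know"}
alternate_forms = {"be": {"is", "am", "are", "was", "were", "been"}}
by_part_of_speech = {"but", "she", "however", "across", "we"}

def assemble(base, removed, forms, extras):
    """Combine word sources into one sorted list without duplicates."""
    # Start from the common words, minus those judged too significant.
    words = set(base) - set(removed)
    # Add alternate forms only for words that survived the removal.
    for head, variants in forms.items():
        if head in words:
            words |= variants
    # Merge in words gathered by part of speech; overlaps collapse.
    words |= set(extras)
    return sorted(words)

result = assemble(most_common, too_significant, alternate_forms, by_part_of_speech)
print(len(result), result)
```

Using sets means that overlap between sources (here, "we" appears in two of them) confirms an entry without duplicating it.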
These steps produced the following list of words that I intended to apply to my project.
a |
aboard |
about |
above |
according |
accordingly |
across |
actual |
actually |
additional |
additionally |
after |
again |
against |
all |
almost |
along |
alongside |
also |
although |
always |
am |
amid |
amidst |
among |
amongst |
an |
and |
another |
anti |
any |
anybody |
anyone |
anything |
anyway |
apparent |
apparently |
are |
around |
as |
astride |
at |
atop |
away |
barring |
be |
because |
been |
before |
behind |
below |
beneath |
beside |
besides |
between |
beyond |
both |
but |
by |
can |
cannot |
can't |
certain |
certainly |
circa |
clear |
clearly |
commonly |
comparable |
comparative |
comparatively |
concerning |
consequent |
consequently |
considering |
contrarily |
conversely |
could |
couldn't |
cum |
despite |
did |
didn't |
different |
do |
does |
doesn't |
done |
down |
during |
each |
eight |
eighteen |
eighteenth |
eighth |
eightieth |
eighty |
either |
eleven |
eleventh |
elsewhere |
equally |
especially |
even |
every |
everybody |
everyone |
everything |
evident |
evidently |
except |
excepting |
excluding |
few |
fifteen |
fifteenth |
fifth |
fiftieth |
fifty |
finally |
first |
five |
following |
for |
fortieth |
forty |
four |
fourteen |
fourteenth |
fourth |
from |
further |
furthermore |
generally |
get |
gets |
getting |
go |
going |
gone |
got |
had |
has |
have |
he |
hence |
henceforth |
her |
here |
hers |
herself |
him |
himself |
his |
honestly |
how |
however |
I |
if |
I'll |
I'm |
important |
in |
incidentally |
including |
inside |
instead |
into |
is |
isn't |
it |
its |
it's |
itself |
I've |
just |
less |
likely |
likewise |
little |
many |
may |
me |
meanwhile |
might |
mine |
minus |
more |
moreover |
most |
much |
must |
my |
myself |
namely |
near |
nearly |
neither |
never |
nevertheless |
next |
nine |
nineteen |
nineteenth |
ninetieth |
ninety |
ninth |
no |
nobody |
none |
nonetheless |
nor |
not |
nothing |
notwithstanding |
now |
of |
off |
often |
on |
once |
one |
only |
onto |
or |
other |
others |
otherwise |
our |
ours |
ourselves |
out |
outside |
over |
particular |
particularly |
per |
plus |
prior |
provided |
rather |
re |
really |
regard |
regarding |
regardless |
relatively |
same |
seem |
seemingly |
seems |
seven |
seventeen |
seventeenth |
seventh |
seventieth |
seventy |
several |
she |
should |
similar |
similarly |
since |
six |
sixteen |
sixteenth |
sixth |
sixtieth |
sixty |
small |
so |
some |
somebody |
someone |
something |
soon |
specific |
specific |
specifically |
still |
subsequent |
subsequently |
such |
ten |
tenth |
than |
that |
the |
their |
theirs |
them |
themselves |
then |
there |
thereafter |
therefore |
these |
they |
third |
thirteen |
thirteenth |
thirtieth |
thirty |
this |
those |
though |
three |
through |
throughout |
thru |
thus |
till |
to |
together |
too |
toward |
towards |
truly |
twelfth |
twelve |
twentieth |
twenty |
twice |
two |
ultimately |
under |
underneath |
undoubtedly |
unless |
unlike |
until |
up |
upon |
us |
versus |
very |
via |
vis-a-vis |
vs |
was |
way |
we |
well |
went |
were |
what |
whatever |
when |
whenever |
where |
whereas |
wherever |
whether |
which |
whichever |
while |
who |
whoever |
whom |
whomever |
whose |
why |
will |
with |
within |
without |
worse |
worst |
would |
wouldn't |
yes |
yet |
you |
your |
yours |
yourself |
yourselves |
2 comments:
Depending on the situation, it appeared that I could have added some other kinds of items to the list. One such category: numerical (as distinct from verbal) numbers (1, 2, ... and 1st, 2nd, ... ) up to some point. Another possibility: years and months. Also common terms for filenames (e.g., "spreadsheet" and variations on "email" and "doc").
The project in which I attempted to use this list was one that sought to identify multiword phrases.