|S,p.a,m| Saga

 

After starting this Weblog in January 2015 and as a new user of WordPress, I tried to customize the settings that would bring pertinent comments to this blog site and discourage trolling, tangential, self-serving, generalized, content-praising, off-topic messages.

In February, I published two posts, one at the beginning of February, and one on February 19. The first post had a Greek letter in the title. The second post had all English words.

Prior to February 19, what was merely a low single digit daily trickle of comments turned into a deluge starting on February 21. Comments were made not only to my blog posts – one of which was WordPress’ “Hello, World” – but also to my “About …” page and a Sample Page. I thought it amusing that the commentary on these pages praised with the same fervor as my actual blog posts.

From February 21 through February 25, 2015, I received nearly 800 comment |S,p.a,m| messages. I was curious to see if I could analyze these messages further beyond their individual hooks to communicate back.

 [1] WHEN: Date and Time

I noticed that they arrived in clumps at differing times each day. So I created an overlapping frequency distribution using 24 hourly intervals and superimposed 4 days, 7 hours of Comment Email notifications based on arrival time (Universal Time) for moderation. It looks like this [Click to remove vaseline and enlarge]:



Frequency Distributions
The following table shows some descriptive characteristics:

Characteristic  # Emails Date  Time Interval(s)
Maximum
13  2-21-2015 8-9PM
32 2-22-2015 5-6AM
13 2-23-2015 11AM-12PM
46 2-24-2015 1-2PM, 2-3PM
24 2-25-2015 1-2AM
Total Messages
65  2-21-2015 6AM-Midnight
212  2-22-2015
116  2-23-2015
312  2-24-2015
75 2-25-2015 Midnight-7AM
Peak Arrival(s)
46  2-21-2015 5-10PM
118  2-22-2015 4AM-12PM
61  2-23-2015 8AM-3PM
190  2-24-2015 10AM-5PM
143  2-24-2015 & 7PM-3AM
 2-25-2015

The Peak Arrival duration seemed to be between 7 and 8 hours, once it got started on 2-22-15, with somewhat increasingly higher numbers of Emails showing up. It felt like new UTC time zones were activated over the days. Moscow and the Middle East on 2-22-15, Western Europe on 2-23-15, Iceland, Brazil and into New York on 2-24-15 and Vladivostok on the evening of 2-24-15 and continuing into 2-25-15.

Since I was spending a successively greater amount of my day reading the comments (to make sure a genuine comment wasn’t lurking) and isolating these moderation requests with no expectation that it would taper off soon and with full expectation that my next blog post would produce even greater unwanted feedback. I decided to “pull the plug”.

I updated the comment rules to include all those wishing to comment to register first. This stopped all the comments completely by 7:30AM UT on 2-25-15. It appears that the comments I experienced were programmatically generated, once postings on my blog site was made available to these sources.

I was curious also about who and where these false commentators came from. At first, I thought I could capture salient characteristics of each comment in an Excel Spreadsheet, but it required me to do repeated copy and paste from my Browser to Excel. As the comments mounted, it appeared that I would invest a large amount of time chasing upwards of 780 messages.

So I decided to review the E-mail message stream sent to me that indicated that a comment just arrived and needed approval moderation. Because such a single E-mail message could be associated with more than one comment, there were 605 such E-mail notifications, still making for a fairly large sample.

In order to process the information in a few minutes rather than a few weeks, I decided to use my Unix/Linux skills on MacOSX via the underlying commandline application called terminal.

The Characteristics I sought were:

  • [1] Date and Time of arrival of a message
  • [2] Page or Post that was targeted
  • Author (with IP address)
  • [3] Author alone
  • [4] Whois IP address information and Corresponding Cities and Countries
  • Email addresses that were used
  • [5] Email domains on the right side of the @ sign
  • [6] URL with a segregation of https and http communication protocols
  • [7] Selected Key words or phrase characteristics used in messages

 
I diligently moved all inbox mail in mail.app sent from WordPress requesting moderation to a mail account |S,p.a,m| folder.

Next, I checked with Google about where the MacOSX mail.app keeps the directory of a mail account’s |S,p.a,m| folder.

Once this was known, I began to write a bash script to extract into separate files the information listed above (as well as produce more analyzed reports).

Here is the bash script:


#! /bin/bash
USAGE="Usage: spamscript.bash"
# Created by Robert 2/25/2015 to Analyze Blog Spam
#
DIR=~/library/Mail/IMAP-arkay@arkaye.org@imap.1and1.com/Spam.imapmbox/Messages
FILESUFFIX=emlx
START=300570
FINISH=302099
# Note: Missing Mesgs between START and FINISH values; 607 files total/799 spam
OUTFILE1=~/SpamRawData1.txt
OUTFILE2=~/SpamRawData2.txt
OUTFILE2A=~/SortedSpamData2A.txt
OUTFILE2B=~/SpamRawData2B.txt
OUTFILE2C=~/SortedSpamData2C.txt
OUTFILE2E=~/SpamRawData2E.txt
OUTFILE3=~/SpamRawData3.txt
OUTFILE3A=~/SpamRawData3A.txt
OUTFILE3B=~/SortedSpamData3B.txt
OUTFILE3C=~/SortedSpamData3C.txt
OUTFILE4=~/SpamRawData4.txt
OUTFILE4A=~/SortedSpamData4A.txt
OUTFILE5=~/SpamRawData5.txt
OUTFILE5A=~/SpamRawData5A.txt
OUTFILE5B=~/SortedSpamData5B.txt
OUTFILE6=~/SpamRawData6.txt
OUTFILE6A=~/SortedSpamData6A.txt
OUTFILE7=~/SpamRawData7.txt
OUTFILE7A=~/SortedSpamData7A.txt
OUTFILE7B=~/SortedSpamData7B.txt
KEYWORDS=~/Keywordsfile

# Data of interest: Date and Time [1] [Done]
# Target post [2] [Done] See SortedSpamData2.txt for counts
# Author + IP Addresses [3] [Done] See SpamRawData3A.txt for Author only
# whois for IP address (May be forged) [4] [Done]
# Email Address (Especially Email Domain) [5] [Done] also
# SpamRawData5A.txt for email domains only
# URL [6] [Done] Segregate https: and http:
# Key Words or Phrases (i.e. 2 words ending in !) [7] [Done]

#
# Extract Date and Time and store in $OUTFILE1
#

awk '/^Date: /' $DIR/*.$FILESUFFIX | sed -e 's/^Date: //' > $OUTFILE1

#
# Extract Target post (Use html line in body in case of duplicates)
#

awk '/^http:\/\/www.arkaye.com\/blog\/2015/' $DIR/*.$FILESUFFIX | sed -e 's#^http://www.arkaye.com/blog/2015/../##' | sed -e 's#/$##' > $OUTFILE2
cat $OUTFILE2 | sort | uniq -c | sort -rn > $OUTFILE2A

# -------------> below finds line in all messages
awk '/^A new comment on the post /' $DIR/*.$FILESUFFIX | sed -e 's/^A new comment on the post //' | sed -e 's/ is waiting for.*$//' > $OUTFILE2B
awk '/^A new pingback on the post /' $DIR/*.$FILESUFFIX | sed -e 's/^A new pingback on the post //' | sed -e 's/ is waiting for.*$//' >> $OUTFILE2B
awk '/^A new trackback on the post /' $DIR/*.$FILESUFFIX | sed -e 's/^A new trackback on the post //' | sed -e 's/ is waiting for.*$//' >> $OUTFILE2B
cat $OUTFILE2B | sort | uniq -c | sort -rn > $OUTFILE2C

#
# Extract Author Name/Link with IP information (Use Author: line in body)
#

awk '/^Author : /' $DIR/*.$FILESUFFIX | sed -e 's/^Author : //' > $OUTFILE3
awk '/^Author : /' $DIR/*.$FILESUFFIX | sed -e 's/^Author : //' | sed -e 's/(.*$//' > $OUTFILE3A
sort $OUTFILE3A | uniq -c | sort -rn > $OUTFILE3B
cut -d " " -f1 $OUTFILE3A | sort | uniq -c | sort -rn > $OUTFILE3C

#
# Extract IP Address from Whois : line
#

awk '/^Whois : /' $DIR/*.$FILESUFFIX | sed -e 's/^Whois : http:\/\/whois.arin.net\/rest\/ip\///' > $OUTFILE4
sort $OUTFILE4 | uniq -c | sort -rn > $OUTFILE4A

#
# Extract E-mail Address E-mail : line
#

awk '/^E-mail : /' $DIR/*.$FILESUFFIX | sed -e 's/^E-mail : //' > $OUTFILE5
awk '/^E-mail : /' $DIR/*.$FILESUFFIX | sed -e 's/^E-mail : .*@//' > $OUTFILE5A
sort $OUTFILE5A | uniq -c | sort -rn > $OUTFILE5B

#
# Extract URL : line
#

awk '/^URL : /' $DIR/*.$FILESUFFIX | sed -e 's/^URL : //' > $OUTFILE6
sort $OUTFILE6 | uniq -c | sort -rn > $OUTFILE6A

#
# Extract Keywords from Comment text
#
# Consider as key words:
# See KEYWORDS Variable above

for i in $KEYWORDS
do
awk '/^Comment:/,/^Approve it:/' $DIR/*.$FILESUFFIX | grep "$i"
done >> $OUTFILE7

while read i
do
echo $(grep -c "$i" $OUTFILE7) $i
done < $KEYWORDS >> $OUTFILE7A
cat $OUTFILE7A | sort -rn > $OUTFILE7B

# END OF spamscript.bash

The following is a partial sample E-mail notification instance to me that allows the script to work properly:


From: WordPress [Masked]
Subject: [Math-Linux Insights] Please moderate: "Permutations Count On Factorials"
Date: February 22, 2015 6:20:13 PM PST

A new comment on the post "Permutations Count On Factorials" is waiting for your approval

http://www.arkaye.com/blog/2015/02/permutations-count-on-factorials/

Author : Rodent Exterminator St Catherines (IP: 23.106.66.219 , 23.106.66.219.rdns.as15003.net)
E-mail : columbuseagle@gmail.com
URL : https://www.youtube.com/watch?v=KvPxEUBqmLY
Whois : http://whois.arin.net/rest/ip/23.106.66.219
Comment:
You need to take part in a contest for one of the greatest blogs on the web.

I am going to recommend this web site!

[2] WHICH: Post Targets

The following table shows the number of comment Email sent to the targeted post (with post date) for approval:

# Post Title Post Date
130 Permutations Count On Factorials 2-19-2015
101 Factorials For Fun 1-15-2015
96 π Places 1-21-2015
86 Sample Page 1-15-2015
77 About Math-Linux Insights 1-15-2015
71 Hello world! 1-15-2015
36 π GPS (Greater Precision Solutions) 2-4-2015
—–
598

It was interesting that the commenters’ “programs” failed to distinguish my original content from WordPress generated content (or perhaps it was deliberate). Normally, Moderator rejection is nearly certain if the comment’s context is inappropriate or misdirected.

The most recent post in February had the largest number of Comment E-mails associated during the 6 days it existed and unregistered comments were allowed. Posts with html characters (i.e. of the form: &xxx; ) had unexpectedly fewer comments, especially since the GPS post came after the places post.

[3] WHO: Author

Each Comment has associated with it an Author name. This can be a userid or a link to a web page, video or web page description or gibberish. From the script generated files, I manually summarized and counted a family of descriptors to a single “Author” or keyword name. This was recounted and displayed based on a reverse numerical sort, general keywords that are associated with an Author name two or more times are:

25 Pest Control
15 best
12 Atlas Chalet
12 travertine tile
10 Bed Bug
9 Manhattan
9 dental
9 roof
8 =D7=A9=D7=99=D7=A8=D7=95=D7=AA=D7=99
8 how to repair
8 personal injury
7 Pest
7 Richmond Virginia (VA) best personal injury
6 cleaning
6 cosmetic
5 commercial
5 cost of
5 criminal
5 hail
5 how much
5 plumber
5 residential
5 tile
4 Insect
4 abogados de accidente Miami
4 brooklyn
4 dentist
4 find
4 hvac
4 title loans
3 Ant
3 Carpenter Ant
3 Exterminator
3 Home
3 Rat
3 Wildlife
3 affordable
3 cash
3 dui
3 estimated
3 get
3 marble
3 paid
3 payday
3 replacing
3 sealing
3 teeth
2 24 hr emergency
2 Ants
2 Atlas Chalet Warranty Cobb
2 Brittany
2 Click [Hh]ere
2 Defective Atlas Chalet Shingles
2 Home Pest Control Service
2 Pesticide
2 Residential Pest Control
2 Richmond
2 Roach
2 Rodent
2 Roof
2 SEO
2 TX
2 air conditioner
2 appliance repair
2 atlanta
2 best music
2 best paid survey sites
2 best personal injury attorney Richmond Virginia
2 blitz brigade hack
2 cat toys
2 cheap
2 commercial appliance repair
2 cost to
2 credit repair
2 deer hunter 2014
2 family law
2 free
2 garage door
2 go
2 google
2 great site
2 herpes cure
2 how to find a
2 it
2 jetpack joyride
2 knights and dragons
2 lawyer
2 miami
2 nyc
2 paid survey sites
2 plumbing
2 pou
2 pozyczka
2 queens
2 redolex.com
2 replacement gas furnace
2 reviews
2 seo
2 seo plugin
2 shingles repair
2 solar panel
2 solar power
2 storm
2 surveys for money
2 teeter hang ups review
2 title
2 try this site
2 tucson dui help
2 zesp=C3=B3=C5=82 na wesele Krak=C3=B3w
 

[4] WHERE: Country, State, City

WordPress associates an IP (v4) address (e.g. of the form nnn.nnn.nnn.nnn, where each nnn ranges independently between 0 and 255) with every comment. This allows me to pinpoint how many times the same IP address is used to issue a comment. Also, there is a website, What Is My IP? that lets you enter an IP address and it returns, among other things, the Latitude, Longitude, City, State and Country and a map segment associated with that IP address location.

In a file, I extracted the counts of IP addresses associated with the comments and then manually augmented that file with the Country, State and City. The highest repeated IP address (108.177.146.70) was 24, from Phoenix, Arizona. I then added up all the counts from each of the same cities and produced the following table (minus the IP addresses):

# Location
465 Arizona Phoenix
22 New York Buffalo
15 France Paris
12 Sweden Stockholm
7 Germany Frankfort
5 Nevada Henderson
5 California Los Angeles
5 United Kingdom London
4 Texas Dallas
3 Indiana Zionsville
3 Canada Quebec Montreal
3 China Caizi Zhen
3 Delaware Dover
3 New York New York City
3 Netherlands Dronten
3 Romania Media
3 Netherlands Amsterdam
2 Colorado Fort Collins
2 Alabama Montgomery
2 Florida Miami
2 Russia Moscow

# Location
2 Switzerland Zurich
2 Romania Moldova
2 California Fresno
1 Lebanon Beirut
1 China Sichuan Mianzhu
1 Taiwan Taipei
1 China Shaanxi Xian
1 Poland Warsaw
1 Germany Berlin
1 Illinois Chicago
1 Illinois Lombard
1 Maryland Baltimore
1 Russia Dubna
1 Washington Spokane
1 Russia Saint Petersburg
1 Germany Munich
1 North Carolina Greensboro
1 Ukraine Uzhgorod
1 Poland Gdansk
1 Romania Lasl
1 Russia Murmansk Kovdor
1 Iowa Cloud
597  Total

Therefore `bb (465/597)` or nearly 78% of the |s,p.a,m| comes from 44 servers (with unique IP addresses) in Phoenix, Arizona. These 44 unique servers are counted using:

   cat SortedSpamData4C.txt | grep –c ‘Arizona Phoenix’

[5] REPLY: Email Address

Turning now to the Email addresses associated with the 597 comment Emails, there were 49 unique email addresses used. gmail.com had the preponderance of Email domains with 185; The primary domains .de (Germany) had 8 and .net had 11.

# Domain
185 gmail.com
34 arcor.de
28 yahoo.com
27 zoho.com
26 freenet.de
25 gawab.com
24 googlemail.com
23 web.de
23 inbox.com
22 bigstring.com
21 yahoo.de
21 t-online.de
21 aol.com
15 gmx.de
13 hotmail.com
12 gmx.net
9 live.com
7 live.de
6 hotmail.de
6 care2.com
4 wildmail.com
4 snail-mail.net
4 peacemail.com
3 animail.net
# Domain
2 yepmail.net
2 vegemail.com
2 msn.com
2 moose-mail.com
2 mailservice.ms
2 mailsent.net
2 mail.com
2 justemail.net
2 imap.cc
1 xvrqyw.com
1 whale-mail.com
1 swift-mail.com
1 ssl-mail.com
1 speedpost.net
1 ownmail.net
1 mailworks.org
1 mailup.net
1 mailnew.com
1 mailmight.com
1 mailc.net
1 mailas.com
1 fastmessaging.com
1 fastmail.net
1 fastem.com
1 fast-mail.org
597  Total

[6] WHY: Click On URL

Next, we consider the URLs associated with the Comments. In addition to the 597 comments were 6 trackbacks (generated by others) and 2 pingbacks (generated by me).

Protocols used: https = 375 http = 230 Total = 605

Specific Selected URL Domains (enclosed with /s):

# URL
366 www.youtube.com
69 youtu.be
24 www.facebook.com
17 .*.[Gg]oogle.com
7 redolex.com
4 www.yelp.ca
4 bitly.com
3 scapca1.net
2 www.pinterest.com
1 twitter.com

Keyword URL Domains (No right / )

# URL pattern
16 .*survey.*
8 .*porn.*

As can be seen, youtube and its variant represented 435 URLs that commenters used to represent themselves for viewing purposes.

[7] WHAT: Message Keywords

The following table shows the keywords I determined to be peculiar to |S,p.a,m| comments and their popularity, in descending order. Based on the script output file, the following keywords were manually summarized and commandline counted.

# Keyword
294 nice
271 won
228 ans
199 info
137 pleasant
114 off topic
80 fastidious
47 brussels
45 heads up
42 creative writing
39 useful information
38 convey
28 comeback
27 masterpiece
25 pals
24 subject matter
24 donate
23 whilst
22 you relied on the video
22 go after your heart
21 hyperlink
20 at the glance out
19 peer
19 onderful site
14 loading velocity
14 arena
13 such a lot
11 uncanny
9 vefy
9 bravery
8 must read
7 I say to you
6 un-ambiguity
6 take a signal
6 preserveness
6 energetic article
6 I have a mission
4 precious knowledge
1 what a information
1 unpredicted
1 killing my time
1 did not happened
1 Grrrr…

So there it is. Probably much more than you wanted to know about how comment
|S,p.a,m| can find you as a result of normal blog post publicity on WordPress’ part.

Perhaps this data can offer insights to others plagued with similar experiences.

We may have to go back to the old tradition, as when researchers wrote papers in the last century. They invited you to read their paper (and offer improvements). Nothing went viral for years, if ever.

The Internet’s ability as a conduit and enabler for many “agents” to visit sites, leave (non)messages or even crash websites through directed, overwhelming traffic are instances of unwanted popularity. It is analogous to Standup Comedians being heckled or Government Spokespeople/Invited Speakers being vocally protested or the Signal being accompanied by a significant amount of Noise.

The most elegant solution to this, to my mind, lies in creating a virtual lightning rod decoy to attract the unwanted (Linux has a facility called /dev/null .), while creating and sheltering a direct conduit for those who wish to offer dialogue via cogent questions, differing opinions or other helpful reactions.

π (pi) day 2015

A moment within today is special. It is π (pi) Day of the Century. March 14, 2015, at 9:26 a.m. and 53 seconds. This matches the 1st 10 digits of π (pi). Did you feel irrational at that time?

As you can surmise, this day won’t come again, and its pattern matching not for 100 years.

I recall the song by Kate Bush from her Aerial CD: The pi song. Listen to the first 6:16 of it on youtube.

Have an otherwise uneventful day.

Permutations Count On Factorials

 

In my previous blog post, Factorials For Fun, I wrote about factorial notation and its amenability to being calculated by iteration and recursion.

In this post, I want to relate the use of factorials in permutations, named by John H Conway as arrangement numbers. While there is a factorial-based formula for counting how many permutations, it takes much more discipline to enumerate them all correctly (boring too, if done by hand). This has become such an extensive review, so I will relate in a separate future post factorials with combinations (choice numbers, also named by John H. Conway).

A Fundamental Counting Principle

The Product Rule:
Suppose that a procedure can be split into a sequence of k tasks. If there are n1 ways to do the first task and for each of these n1 ways, there are n2 ways of doing the second task, and generally, for all of these n1, n2, …, nk-1 ways, there are nk ways of doing the kth task, then there are

                   `bb  n_1*n_2*…*n_k`

ways to do the entire procedure.

This principle is applied whenever you are choosing each object from its own group.

Suppose your camping wardrobe is made up of 2 pairs of hiking boots, 3 vests, 6 shirts, 1 hat and 3 jackets. How many different outfits are possible?

Since each item belongs to its own group, we have:  `bb  2*3*6*1*3 = 108`   different outfits.

Permutations

A permutation of n objects is an ordered arrangement of n distinct items (objects) in a row.

Consider 3 coins, such as a Nickel, Dime, and Quarter. The Nickel (N), Dime (D) and Quarter (Q), in that order, is a single permutation of the three coins. The entire list of all possible permutations of the three coins is here:

{N, D, Q},  {N, Q, D},  {Q, N, D},  {D, N, Q},  {D, Q, N},  {Q, D, N}                        [1]

This list of permutations has a count of 6. This is equivalent to 3⋅2⋅1 arrangements.

This is because the first coin can be any one of 3 denominations (3 ways). Once chosen, the second coin can be any one of the remaining 2 denominations (2 ways). So for the first two positions, there are 3⋅2 ways. Once the second coin is chosen, the third coin is whatever denomination remains and is chosen (1 way). So for the first three positions, there are 3⋅2⋅1 ways.

Because each position is filled from what is sequentially available (remaining) among the 3 coins, this is called selection without replacement.

Question: How many permutations (arrangements) of n distinct objects are possible (without replacement)?

Value of k:                  1        2           3                   k            (n-1)    n

Ways to choose k:     n    (n-1)     (n-2) . . .   (n – k + 1) . . . 2       1                                 [Table 1]

Row position:            —    ——-     ——         ————-        —      —

In the first position, we can choose any of the n objects. Choose one.

In the second position, since one object was already chosen, we can choose any of the
(n-1) objects. Choose one.

In the third position, since two objects have already been chosen, we can choose any of the
(n-2) objects. Choose one.

In this same way, in the kth position, we can choose (n – k + 1) objects since k objects have already been selected. Choose one.

Finally, for the last position, since all but one, namely (n-1) objects have been chosen, there is only 1 object left to choose and put in the last position. Choose it and complete a single arrangement (ordering) in a row.

Therefore the number of ways to choose (arrange) n objects out of a total of n objects is:

                        n⋅(n-1)(n-2)⋅…⋅2⋅1 = n!                                                                        [2]

And the number of ways (permutations) to choose k (unique) objects out of a total of n (unique) objects is:

                        n⋅(n-1)⋅(n-2)⋅…⋅(n-k+1) = `bb (n!)/((n-k)!)`  =  `bb   _nP_k`                                [3]

The fraction in the center in [3], is the product of all the numbers from 1 to n, (i.e. n!) except for those being a product from 1 to n – k ( which is (n – k)! ).

Because n! is a multiple of (n – k)! , this artificial fraction evaluates to a product of consecutive integers, below, the arrangement numbers:

                        n⋅(n-1)⋅(n-2)⋅…⋅(n-k+1)

This “fraction” in [3] has a variety of notational labels and all mean the same thing:

 Symbols            Spoken or Meaning                                                     When k = n

 P(n,k)                        a k-permutation                                                            P(n,n) = n!/0! = n!

pnk                                        a k-permutation arranged in a row                             pnn = n!/0! = n!

nPk                              the permutations of n things taken k at a time       nPn = n!/0! = n!

(n)k                              a k-list                                                                                  (n)n = n!/0! = n!

nk                                n to the k falling (falling power)                                     nn = n!/0! = n!

n factorial is also recursively written as:

                        n! = n⋅(n-1)!                                                                                     [4]

which is valid for all positive integers n ≥ 1.

It is also true that, by solving [4] for (n-1)! , we have:

                       (n – 1)! =  `bb  (n!)/n`

(n – 1)! also represents the number of circular arrangements (or necklaces) of n distinct objects. This is because two permutations (i.e. complete placements of people around a circular table) can be considered identical if one becomes the other by rotating the circle (or table).

There are n ways to rotate the circle (or table). So n divides the n!

In summary, n! is the count of the number of permutations (or linear arrangements) of n distinct objects in a row and (n-1)! is the count of the number of circular permutations (or arrangements).

Computation Aids: Online Calculators

The website calculatorsoup.com has these special permutation calculators available for those dealing with larger values of n. They are:

This calculator has examples to show how to use and Calculate nPk where 0 ≤ n, k ≤ 9999. See:
Permutations Calculator.

and

Circular Permutations Calculator: Calculates (n-1)! where 0 ≤ n ≤ 9999. See:
Circular Permutations Calculator

Other specialized calculators are interspersed further below.

5 Examples

(1) How many different ways can the letters in the word numerals be arranged?

Since different ways is a synonym for arrangements, and the 8 letters are distinct, the permutation formula [2] (or [3] with k=n] , expressed as n! , is to be applied:

                          8! = 87654321 = 40320 ways

(2) How many ways are there to select a first-prize winner, a second-prize winner and a third-prize winner from 50 different people who have entered a contest?

In this question, it is important to know which person wins which prize. So the number of ways to choose the three prize-winners is the number of ordered selections of 3 people (out of 50).

So we consider that the first prize can be won by any of the 50 people; then the second prize can be won by any of the remaining 49 people; and the third prize can be won by any of the remaining 48 people. This is reflected in formula [3], where n = 50 and k = 3.

          50P3 = `bb  (50!)/((50-3)!)  =  ( 50*49*48*…*1)/(47*46*45*…*1)` =  50⋅49⋅48 = 142100 ways.

(3) How many permutations of the letters STUVWXYZ contain the consecutive lettered string XYZ ?

Because the consecutive lettered (sub)string XYZ must all appear as an ordered group (first X, then Y, then Z) in the larger 8 lettered arrangement, we can think of the string XYZ as a single entity.

Then the number of arrangements of {S, T, U, V, W, XYZ} is based on 6 items rather than 8. So:

                          6! = 654321 = 720 ways

(4) In how many ways can 9 people be seated around a circular table?

Because this fits the conditions for a circular permutation (with arrangements that can be rotated being considered indistinguishable), we use the formula in [5] with n = 9:

                          (9 – 1)! = 8! = 87654321 = 40320 ways.

(5) Find the number of ways in which 5 people Al, Betty, Cam, Dana and Ed have place settings (seats) at a circular table, such that:
     (i) Dana and Ed must always sit together, and
     (ii) Al and Betty must not sit together.

Because this fits the conditions for a circular permutation (with rotations of any arrangement being considered equivalent), we will start with formula [5].

Consider the first restriction. Dana and Ed sitting together represent a couple, a single entity. So n = 4 {Al, Betty, Cam, Couple} or {a, b, c, de) and:

                   2⋅(4 – 1)! = 2⋅3! = 2321 = 26 = 12

In the expression evaluation, as there are two positions the couple can assume, the 6 is multiplied by 2 to get 12.

To show each circular arrangement, I visualized a clock with de at the Noon position, a at the 3:00, b at the 6:00 and c at the 9:00 position, respectively. By keeping de anchored at the Noon position, the rest of the people (n – 1) can be permuted (arranged) normally for counting purposes and rotations do not occur.


Arrangements Clock

Using initials, the Part (i) count of 12 arrangements enumerates as follows:

(abcde), (acbde), (cabde), (cbade), (bacde), (bcade),

(abced), (acbed), (cabed), (cbaed), (baced), (bcaed)

Now consider the second restriction: Al and Betty must not be neighbors.

This means that Al and Betty (a and b) must be separated on either side of a or b by both c and de. Notice that the four uncolorful arrangements are just the ones that are needed.

We can determine this more analytically by thinking about creating four quartets: adeb, bdea, aedb, beda, which separate a and b and which get permuted with c in only 1 way each (this is because cadeb and adebc are equivalent rotations). Since c must be on the outside of the quartet, c is also between b and a (or a and b).

Alternatively, we can delete from the part (i) enumeration of 12, the 8 arrangements where a and b are next to each other. So we must subtract from the list of 12, these 8 arrangements: (abcde), (cabde), (cbade), (bacde), (abced), (cabed), (cbaed) and (baced).

This leaves 4 circular arrangements (acbde), (bcade), (acbed) and (bcaed) that simultaneously fulfill the two restrictions.

Permutation Relaxations

So far, this discussion focused on Permutations that were distinct and had no repetitions and were counted without replacement. How does the counting change if we allow repetitions or if we allow replacement or both?

  • Permutations with repetitions
    The rule is that if there are zero or more repetitions of an object, because they are indistinguishable objects, their multiplicity factorial must divide the overall arrangement factorial to get the proper count. (with zero repetitions, we have a multiplicity of 1; i.e. the object is unique.)

         `bb  (n!)/(a!*b!*c!*…*k!)`  , where a+b+c+…+k = n                 [6]

    As an example, suppose we want to know how many different ways the distinct letters in the word bananas can be arranged?   We calculate:

               `bb  (7!)/(3!*2!) =  (7*6*5*4*3*2*1)/((3⋅2⋅1)⋅(2⋅1)) = 7*6*5*2` = 420 .

    This is because ‘a is repeated three times and ‘n is repeated twice; for each of these letters, the repetitions are indistinguishable, so each multiplicity factorial (3! and 2!) must divide the factorial of the 7 letter word to get the correct count.

  • Permutations with replacement

    If we are counting Permutations with replacement, then Table 1 changes and becomes:

    Value of k:                  1         2        3                 k               (n-1)     n
    Ways to choose k:     n        n        n      . . .      n        . . .     n        n          [Table 2]
    Row position:            —-     —-      —-              —-               —-      —-

    In the first position, we can choose any of the n objects. Choose one.
    In the second position, we can choose any of the n objects. Choose one.
    In the third position, we can choose any of the n objects. Choose one.

    In this same way, in the kth position, we can choose any of the n objects. Choose one.

    Finally, for the last position, we can choose any of the n objects. Choose one.

    In this way, we complete a single arrangement (ordering) in a row.

    Therefore the number of ways to choose (arrange) n objects out of a total of n
    objects (with replacement) is:

                          n of these
                    ` <———>`
                       n⋅n⋅n⋅…⋅n⋅n  =  nn                                                      [7]

    And the number of ways (permutations) to choose k (unique) objects out of a total of n (unique) objects (with replacement) is:

                          k of these
                    ` <———>`
                       n⋅n⋅n⋅…⋅n⋅n  =  nk                                                      [8]

    There is a Permutations With Replacement Calculator. It offers examples of how to use and Calculate nr where 0 ≤ n, r ≤ 9999. See:
    Permutations With Replacement Calculator

    An Example:

    U.S. zip (postal) codes consist of an ordering of five digits, hyphen, followed by four digits chosen from 0-9 with replacement (i.e. digits may be reused). How many zip codes are in the set of all possible zip codes?

    Since we are looking for arrangements with replacement, we use n, the number of digits to be 0-9, which is 10 digits and the number of positions of the zip code, which is 9.

    Using the formula [8], we calculate:

                109 = 1 Billion.

    Since the zip code 00000-0000 is not used, we subtract it from 1 Billion to get:

    999,999,999 different zip codes.

    If 00000-nnnn are also to be unused, we have instead:

                 109 – 104 = 9.9999 × 108 = 999,990,000 different zip codes.

  • Derangements (and subfactorials)
    A Derangement is a permutation of the elements of n objects or elements, such that no element appears in its original position. If an original permutation is {1, 2, 3, 4, 5}, one derangement would be {2, 3, 4, 5, 1}.The symbol !n (sometimes called a subfactorial), represents the number of derangements of n objects and its formula is:

         !n = n!`bb  sum_(k=0)^n ((-1)^k)/(k!)`      [9]

    In this formula, n ≥ 2. Here are the listings of !n up to n=4.

    !0 = 1   (the empty permutation)
    !1 = None
    !2 = 1   (21)
    !3 = 2   (231) (312)
    !4 = 9   (2143), (2341), (2413), (3142), (3412), (3421), (4123), (4312), (4321)

    Derangements with a single fixed point are another special permutation calculation. The number of permutations having at least one fixed point,  (e.g.: {2, 4, 3, 5, 1} thus not being (a complete) derangement, is given by:

         n! – !n = n! – n!`bb  sum_(k=0)^k ((-1)^n)/(k!) = n!sum_(k=1)^n ((-1)^k)/(k!)`      [10]

    Another Example:

    You have 6 balls in 6 different colors, and for every ball you have a box of the same color. How many derangements do you have, if no ball is in a box of the same color? If at least one ball is in a box of the same color?

    To apply [9] to this problem, we calculate:

    !6 = 6!`bb ((1 – 1 + 1/2 – 1/6 + 1/24 – 1/120 + 1/720)) =`
    6*5*4*3 – 5! + 6*5 – 6 + 1 =
    360 – 120 + 30 – 6 + 1 = 265

    For the second part of the question, keeping in mind that what is not a derangement is a “partial” derangement, we subtract the total number of derangements from the total number of permutations.

    6! – !6 = 720 – 265 = 455

    or evaluating the right side of [10] explicitly, we have

    6!`bb ((1 – 1/2 + 1/6 – 1/24 + 1/120 – 1/720)) =`
    720 – 6*5*4*3 + 5! – 6*5 + 6 – 1 =
    720 – 360 + 120 – 30 + 6 – 1 = 455

  • Permutation Cycles

    A permutation cycle is a subset of a permutation whose elements trade places with one another.

    For example, in the permutation group {4, 2, 1, 3}, (143) is a 3-cycle and (2) is a 1-cycle. Here, the notation (143) means that starting from the original ordering
    {1, 2, 3, 4}, the first element is replaced by the fourth, the fourth by the third, and the third by the first, i.e., `bb 1 rarr 4 rarr 3 rarr 1`.

    ( `1, 2, 3, 4` )   So `1 rarr 4 , 4 rarr 3, 3 rarr 1` creates the 3-cycle (143) and
    `darr darr darr darr`   `2 rarr 2` creates the fixed point or 1-cycle (2).
    ( `4, 2, 1, 3` )   So we have, (143)(2) making up the two cycles in the
    permutation group {4, 2, 1, 3}.

    ( See: Weisstein, Eric W. “Permutation Cycle.” From MathWorld–A Wolfram Web Resource. <mathworld.wolfram.com/PermutationCycle.html> )

  • Odd And Even Permutations

    So is {4, 2, 1, 3} an odd or even permutation?

    By definition, an odd permutation is a permutation obtainable from an odd number of two-element swaps, i.e., a permutation with permutation symbol (Levi-Civita signature symbol) equal to -1.
    For the initial set (1,2,3,4) the twelve odd permutations are those with one swap (1,2,4,3, 1,3,2,4, 1,4,3,2, 2,1,3,4, 3,2,1,4, 4,2,3,1) and those with three swaps (2,3,4,1, 2,4,1,3, 3,1,4,2, 3,4,2,1, 4,1,2,3, 4,3,1,2).
    For a set of n elements and n ≥ 2, there are `bb (n!)/2` odd permutations *, which is the same as the number of even permutations. For n =1, 2, …, the total odd permutations are given by 0, 1, 3, 12, 60, 360, 2520, 20160, 181440, … (OEIS A001710).
    (* D’Angelo, J. P. and West, D. B. Mathematical Thinking: Problem-Solving and Proofs, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 2000. (p.111) and
    Weisstein, Eric W. “Odd Permutation” From MathWorld–A Wolfram Web Resource. <mathworld.wolfram.com/OddPermutation.html> )

    By definition, an even permutation is a permutation obtainable from an even number of two-element swaps, i.e., a permutation with permutation symbol (Levi-Civita signature symbol) equal to +1.
    Similarly.
    For the same initial (1,2,3,4), the twelve even permutations are those with zero swaps: (1,2,3,4); and those with two swaps: (1,3,4,2, 1,4,2,3, 2,1,4,3, 2,3,1,4, 2,4,3,1, 3,1,2,4, 3,2,4,1, 3,4,1,2, 4,1,3,2, 4,2,1,3, 4,3,2,1).
    For a set of n elements and n ≥ 2, there are `bb (n!)/2` even permutations, which is the same as the number of odd permutations. For n=1, 2, …, the numbers are given by 0, 1, 3, 12, 60, 360, 2520, 20160, 181440, … as above. (OEIS A001710).
    ( See: Weisstein, Eric W. “Even Permutation” From MathWorld–A Wolfram Web Resource. <mathworld.wolfram.com/EvenPermutation.html> )

    An Odd Permutations Calculator may be used to Calculate `bb (n!)/2` where 2 ≤ n ≤ 100. See:
    Odd Permutations Calculator

    Also, An Even Permutations Calculator may be used to Calculate `bb (n!)/2` where 2 < n ≤ 9999. See:
    Even Permutations Calculator

    So {4, 2, 1, 3} is an even permutation and its 3-cycle (143) and 1-cycle (2) represent two even cycles.
    A rule of thumb is that the if a permutation cycle has an odd number of elements within, it is considered even, otherwise if an even number of elements within, it is considered odd.

    An analogy exists with odd and even numbers which is also true of permutation cycles:
    If two even numbers (cycles) are added together, the result is even.
    If two odd numbers (cycles) are added together, the result is also even.
    If one even and one odd number (cycles) are added together, the result is odd.
    Thus {4, 2, 1, 3} is an even permutation based on two even cycles (143) and (2).

Finally, the rosettacode.org site offers over 70 different computer language codes and scripts (although not any of the Unix/Linux Shells as yet). These programs deal with permutations with or without replacement (as well as that for combinations). See Enumerating Permutations Programs.

The site also has programmatic algorithms for
• permutations with repetitions,
• Finding the missing permutation: finding the missing enumerated permutation from the full set of permutations (47 programs and an interesting discussion tab) See: Find_the_missing_permutation and
• permutations and derangements: See: Derangement programs (26 programs)

I should mention that the most popular applications for permutations are for people that love Scrabble (at least two or more play), Anagrams and Jumble word and cartoon puzzles. Other applications are when a host is planning a formal a sit down, catered meal for at least 6 people that has seating preferences and constraints. As always, if you are unable to successfully seat everyone agreeably, make the problem bigger and invite more guests.

Book References:
Conway, John H., Guy, Richard K., (1995). The Book of Numbers. Springer-Verlag, New York, NY. (p. 66-67)

Knuth, Donald E. (1997). Art of Computer Programming, Volume 1: Fundamental Algorithms, 3rd Ed. Addison-Wesley, Boston. MA.(p. 45-50, 164-167)

Rosen, Kenneth H., Editor-In-Chief (2000). Handbook of Discrete and Combinatorial Mathematics. CRC Press, Boca Raton, FL (p. 84-91, 96-107)

Rosen, Kenneth H. (2012). Discrete Mathematics and its Applications, 7th Ed. McGraw-Hill, New York, NY (p. 385-443)

π GPS (Greater Precision Solutions)

In my previous blog post π Places, I reviewed special fractions that were very good approximations to the value of π to 2 and 6 decimal places: i.e. `22/7` and `355/113` respectively.

In this post, I would like to explore the more modern (since 1593) approximations based on infinite series.

Note: If the mathematics shown creates immediate “eye-glaze”, please skip to the Activities section, which is much more entertaining in comparison.

A Series sums up a finite or infinite collection of terms of a Sequence. If finite, there are a definite, bounded number of values that add up to a specific, measurable value. If infinite, there are an endless, unbounded number of values that add up to either a specific, measurable value (converge), or add up to an indefinite, unmeasurable value (diverge), which is denoted as infinity ().

Here are two examples of each:

 

Finite Series

 

First 10 Natural Numbers sum

`1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = sum_(n=1)^10 n=(10(10+1))/2=55`     [1]

π expansion to 7 decimal places:

`3 + 1/10 + 4/10^2 + 1/10^3 + 5/10^4 + 9/10^5 + 2/10^6 + 6/10^7 = 3.1415926`                 [2]

 

Infinite Series (Convergent)

 

π decimal expansion:

`3 + 1/10 + 4/10^2 + 1/10^3 + 5/10^4 + 9/10^5 + 2/10^6 + 6/10^7 + … = ` π                      [3]

Geometric Series:

`1 + 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + … + 1/2^(n-1) + … = sum_(n=1)^oo 1/2^(n-1)=2`            [4]

 

Infinite Series (Divergent)

 

Natural Number Series:

`1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + … = sum_(n=1)^oo n=oo`                        [5]

Harmonic Series:

`1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6 + 1/7 + … + 1/n + … = sum_(n=1)^oo 1/n=oo`                 [6]

For those interested in mathematically (in)finite series, look at the following
List of Mathematical Series. Summed general expressions are given for series organized by sums of powers, power series, binomial coefficients, trigonometric and rational functions. At the top, there are subscripted letters and Greek letter functions (such a necessary annoyance) that are separately named and linked.

 

Greater Decimal Places Approximators

 

Over the years, many mathematicians have tried to evaluate a finite number of terms in an infinite series to approximate π.

  • Francois Viete in 1593, using an infinite product, calculated π to 9 decimal digits, by applying the Archimedes method of exhaustion to a polygon with `bb 6 × 2^16 = 393216` sides and calculating the circumference (perimeter) when the diameter is 1. π =
    `2/[sqrt(1/2)*(sqrt[1/2+(1/2)*sqrt(1/2)])*(sqrt[1/2+(1/2)*sqrt{1/2+(1/2)*sqrt{1/2}]))*…` [7]

    Quite the eye-roller! Alternatively, this looks slightly less volatile, but is equivalent:

    π `~~ 2^k*sqrt(2–sqrt(2+sqrt(2+sqrt(2+sqrt(2+sqrt(2+sqrt(2+…))))))) ~~ 3.14157294`   [8]

  • James Gregory in 1671 published the arctangent (arctan) infinite series expansion and Gottfried W. Leibniz in 1682 published the specific case with x = 1 and based on

    arctangent`(1) = pi/4`                [9]
    Expanding this via the infinite series for arctangent`(x)`, where `bb | x | ≤ 1`, we get:


    arctangent`(x)= x – x^3/3 + x^5/5 – x^7/7 + … = sum_(n=0)^oo (-1)^n*(x^(2n+1))/(2n+1)`                [10]
  • John Machin in 1706, using arctangent and the Gregory-Leibniz Infinite Series expansion for arctangent, as shown below, calculated it to 100 decimal digits.

    π = 4∙[4∙arctangent`(1/5)` + arctangent`(1/239)`]                 [11]
    π =                                                                                                 [12]
    `4*{ 4*[ 1/5 – 1/(3*5^3) + 1/(5*5^5) – 1/(7*5^7)+… ] + [ 1/239 – 1/(3*239^3) + 1/(5*239^5) – 1/(7*239^7)+… ] }`
  • Srinivasa Ramanujan, was a renowned Indian mathematician who made novel contributions to mathematics and to determining π during his short life (1887-1920). The first formula below gives π to 11 decimal places.

    π = `root(4)[81+(19^2/22)] = 3.14159265262`                        [13]
    The following iterative formula was due to Jonathan and Peter Borwein based on Ramanujan’s formulas:


    π = `[9801/(2*sqrt(2))]*[1/(sum_(n=0)^oo {([(4n)!]/(n!)^4)*[(1103 + 26390n)/(4*99)^(4n)]}]]`        [14]
    Ramanujan’s integration formula for π is also quite innovative:


    `(pi/2)^3 = int_0^oo [(log x)^2/(1+x^2)]dx`        [15]
    which can easily be solved for π.
  • David and Gregory Chudnovsky in 1989 in New York City, created their own supercomputer (named m zero) to compute 1 billion decimal places of π from their home. It operated at 100 billion calculations per minute for almost a week.
    In 1997, they moved to Polytechnic Institute of Brooklyn (my Alma Mater and now part of New York University), creating the Institute for Mathematics and Advanced Supercomputing. By then, their computer worked for a week (with a better algorithm) to compute up to 8 billion decimal places for π.

    Their π calculation is based on:


    `1/pi = 12*sum_(n=0)^oo (-1)^n*{([(6n)!]/[(n!)^3*(3n)!])*[(13591409 + 545140134n)/(640320^(3n+3/2))]}`  [16]
  • Yasumasa Kanada holds the 2002 record for `bb 1.2411 × 10^12` decimal places for π. His computer programs (written in Fortran and C) were based on K. Takano’s 1982 arctangent formula [17] below and ran on a Hitachi SR8000/MPP with 144 nodes. The computations were carried out in hexadecimal (base 16) arithmetic and converted at the end to decimal for maximal efficiency. Further details about the computation are found on Yasumasa Kanada 2002 π Summary Page.

    π `~~`                               [17]
    `48*arctan(1/49) +128*arctan(1/57) – 20*arctan (1/239) + 48*arctan(1/110443)`
    This took 400 hours (hexadecimal computing) + `bb 23 1/3` hours (conversion).
    It was verified with F.C.M. Stoemer’s 1896 arctangent formula:


    π `~~`                               [18]
    `176*arctan(1/57) +28*arctan(1/239) – 48*arctan (1/682) + 96*arctan(1/12943)`
    The verification formula [18] took 157.067 hours (hexadecimal computing) + 21.53 hours (conversion) to evaluate and compare.
  • As of October 8, 2014, an anonymous mathematician named “houkouonchi” completed computing and verifying π to a total of `bb 1.33 × 10^13` decimal places using the Chudnovsky formula [16]. It took his program 208 days to compute and 182 hours to verify on an 2 x Xeon E5-4650L @ 2.6 GHz. See:
    Validation of π computation.

π Activities To Try:

 

Entertaining Websites:

Within the mathisfun.com website, there is an activity to guide you to find the approximate value of π. Look at π approximation

Many people are emotionally attached to the numbers in their life. Their social security number (e.g. 423456789), their birthday (02022015), their 7-digit telephone number (3234777).

There is a great website, called the Pi Search Page that lets you enter your special number and it reports where, in the 200 million decimal places of π that it is located, how many times it shows up (including not at all) and how long it took to search (typically in `bb 1/5` of a second).

Go to π Search Page and interact with it.

[Updated to add:] To see 10,000, 100,000 or 1 million digits of `bb pi` in all their glory in a downloaded file, a related site called digits of Pi; lets you view 10,000 places or access the download links.

Note that the probability that your birthday number string (8 digits long) will be embedded in the first 200 Million decimal places of π is equal to `bb (1 – 1/e^2)` which is just below 86.47%.

There is a World Ranking List of people who have memorized some number of digits of π and recited them. Please view π World Ranking List

Books worth reading:
  • Blatner, David (1997). The Joy of π . London, UK. Walker/Bloomsbury Books.The website Joy of π has a set of links to many π oriented pages including: π mysteries, music and π, memorizing π digits, having fun and enjoying weird aspects of π.

    David Blatner wrote about the statistical distribution of digits in the first million decimal places of π which are comprised of:

    99959  0's
    99758  1's
    100026 2's
    100229 3's
    100230 4's
    100359 5's
    99548  6's
    99800  7's
    99985  8's
    100106 9's

    For these data, I computed the following statistics: the mean (average) is 100000; the median (middle) is 100005.5; the standard deviation is 247.41 and the range of the data is 811, with the digit 5 being most frequent and the digit 6 being least frequent.

  • Beckmann, Petr (1971). A History of π (pi). New York, NY. St. Martin’s PressThis impressive source book traces the history of the constant and of the mathematicians that sought greater and greater π precision. There are many illustrations of artifacts and notations used to demonstrate the mathematics involved.

This completes the technical and non-technical tour of π. Please let me know if I’ve missed anything that could be further explored.

 

π Places






π (say ‘pie’) is the lower case Greek letter symbol that stands for a number defined as the ratio of the circumference (C) of a circle divided by its diameter (d).

Circumference means the distance around something, like a circle or a polygon. A synonym would be perimeter.

A diameter is the length of a line that is drawn starting from any point on the circle’s boundary and going through the center point of the circle and then continuing to the opposite boundary point. A radius (r) is the length of a line drawn from the center to any point on the circle’s boundary. It is half the diameter. It’s clearer in the diagram:


Parts Of A Circle

As a formula (no numbers, please), we write:

`pi = C/d `                                                  [1]

If d = 1 inch, then because π has a fixed, constant value, C must be equal to that value of π inches.

A closely related formula involving π involves the Area A, of a Circle and its radius r, (remember? – half of the diameter’s length). The formula is:

`pi = A/r^2 `                                                  [2]

Most descriptions of π state when it was first historically used and trace a series of numerical refinements in its approximated value over the last 4500 years.

A chronology of the computation of π from 2500 BC (calculated as `22/7` ) to the present is shown in the following link: π Chronology

It shows the date, the person, milestone or event and the decimal value of π (or the correct number of decimal digits after 3.) up through October 8, 2014, when π had been computed and verified (anonymously) to `1.33∙10^13` (over 13 Trillion) decimal digits.


Classifications

It turns out that π is an irrational number. (Johann Heinrich Lambert proved this in 1761.)

This means that we cannot discover a fraction that provides its value exactly. Writing its decimal digits never ends. So over the years, mathematicians have labored to compute π accurately to more and more decimal places using “good” fractions and infinite series where only the first “few” terms are evaluated and summed.

It is also true the π is a transcendental number. A transcendental number is an irrational number that is not an algebraic number. An algebraic number is one that satisfies a polynomial equation with a finite number of terms and with rational coefficients. So, for example, `sqrt(2)` is an irrational (non-repeating decimal number) and an algebraic number, since it is a solution to the algebraic equation: `x^2 = 2`

For non-algebraic equations, we have two equations from trigonometry, which use `pi/4` radians rather than 45 degrees:

tangent`(pi/4) = 1`                                                  [3]

and particularly its counterpart:

arctangent`(1) = pi/4`                                                [4]

is used as a starting point for determining π more precisely.


Rational (Fractional) Approximations

Methods of approximation started with Archimedes, who, around 250 BC, applied the method of exhaustion with inscribed and circumscribed regular polygons to enclose the lower and upper bounds of area and circumference of the circle. Shown below are inscribed and circumscribed pentagons, hexagons and octagons (n = 5, 6, 8). Starting with a hexagon, he doubled the sides for inner and outer n-gons until n = 96, finding the inner and outer perimeters at each value of n.


Pentagon, Hexagon, Octagon And Circle

Archimedes found bounds for π as:

`3 + 10/71 < pi < 3 + 10/70`      or      `223/71 < pi < 22/7`                   [5]

This gives the value of π to 2 decimal places, as 3.14.

A Cool Procedure And Shell Script For a better Fraction for π

Using the circumference formula [1], we can create a procedure based on one suggested by Rhett Allain (See:
Best Fraction For π to calculate a fraction that represents π for a given number of digits.

For reference, suppose πref is set to 3.14159265358979, which is to 14 decimal places. We use this 12 step procedure:

  1. Set πref = 3.14159265358979
  2. Set (1/πref) = 0.31830988618379
  3. Set loopcount = 500
  4. Set C = 22
  5. Set D = 7
  6. Set Percentage Error = 0.001
  7. Is loopcount = 0 ?
            Then output BestC, BestD, BestC/BestD, Error, BestLoop
             and stop

  8. Set πest = C/D
  9. Calculate Absolute Value of E = | (πref – πest) / πref |
    via E = | (πref – πest) ∙ (1/πref) |
  10. Is E < Error ?
               Then set BestC = C, set BestD=D, set Error = E,
               and set Bestloop = 500 – loopcount

  11. Is πest < πref ?
               Then add 1 to C and go to step 12
               Otherwise add 1 to D and go to step 12

  12. Subtract 1 from loopcount and go back to step 7 and compute new πest

Here is the bash shell script that I wrote to program this algorithm. It takes slightly under 7.5 seconds (what’s your hurry?) to finish 500 iterations (on a 2010 vintage 32-bit MacBook Pro) and 3.55 seconds on a 64-bit GNU/Linux Server.

The script successfully finds the fraction

`355/113` = 3.141592920353982                                              [6]

a close approximation to the reference π that is accurate to 6 decimal places. The percentage error using this fraction is 0.0000084913679%

In this script, the Linux utility bc is needed to offer/preserve 15 decimal place precision in the computations it performs. The shell facilities for operating with numbers using anything other than positive integers are minimal.


#! /bin/bash
USAGE="Usage: pifraction.bash"
# Version 1.0 by arkay on 1/17/2015
# Version 1.1 by arkay on 1/18/2015
#   Computed (1/Pi_ref) once and used it to multiply for 
#   % difference.

# Set Starting values for variables
Pi_ref="3.14159265358979"
Pi_ref_inv="0.31830988618379"
loopcount=500
C=22
D=7
Error="0.001"
# loop 500 times
until (( loopcount < 1 ))
do
# Calculate Pi.est
Pi_est=$(echo "scale=15; $C/$D" | bc -l)
# Calculate the percentage error E
E=$(echo "scale=15; ($Pi_ref - $Pi_est)*$Pi_ref_inv" | bc -l)
if [ "${E%%[0-9.]*}" == "-" ]  #Apply Absolute Value: 
# extract first character of E, either "–" or ""
then
	E=$(echo "scale=15; (-1)*$E" | bc -l)
fi
# Compare E with current minimum % Error
T=$( echo "scale=15; $E<$Error" | bc -l )
if [ "$T" -eq "1" ] # bc returns 1 if inequality true
then
	BestC=$C; BestD=$D; Error=$E; 
        BestLoop=$(expr 500 - $loopcount)
fi
# Compare Pi_est with Pi_ref
S=$( echo "scale=15; $Pi_est<$Pi_ref" | bc -l )
if [ "$S" -eq "1" ] # bc returns 1 if inequality true
then
	(( C += 1))
else
	(( D += 1))
fi
(( loopcount -= 1 ))
done	
# Produce the results
BestCoverD=$(echo "scale=15; $BestC/$BestD" | bc -l)
echo "BestC=$BestC BestD=$BestD BestCoverD=$BestCoverD"
echo "Pi_ref=$Pi_ref  Error=$Error  BestLoop=$BestLoop"
exit   # Normal stopping point

# End of pifraction.bash

Output results from this shell script are shown as:
BestC=355 BestD=113 BestCoverD=3.141592920353982
Pi_ref=3.14159265358979 Error=.000000084913679 BestLoop=439

In my next post about π, I will write about greater decimal place approximators and Interesting π activities to try.

Oh, one last thing…
This is a delightful Math Trick attributed to Martin Gardner:

Write all 26 letters of the alphabet, but start with the letter J as shown:

JKLMNOPQRSTUVWXYZABCDEFGHI

Then, remove all the letters that have vertical symmetry as shown:

JKL   N   PQRS              Z   BCDEFG

Now, count the letters that remain in each subset: 3 1 4 1 6.


Factorials For Fun

Start from the beginning. A factorial is a specific multiplication function applied to a positive integer value and all of its positive descendants. True but vague and tough to digest.

A factorial means to multiply a consecutive sequence of descending natural numbers.

Below, n is any positive integer (equal to a natural number), being multiplied:

            n! = n·(n–1)·(n–2)·…·3·2·1                                                                                             [1]

The symbol for a factorial is the exclamation point: ! . n! is pronounced ‘en’ factorial (math fans) or ‘en’ bang (programmers) or shriek out n (literalists and comedians).

Here are some examples:                      3! = 3·2·1 = 6

                                       10! = 10·9·8·7·6·5·4·3·2·1 = 3628800

{ Sudden question: In 1 second, how much would  9!  be? }

                                         1! = 1

To see the first 100 factorials, please admire the following table by N. J. A. Sloane:   First 100 Factorials

These integer results become huge as n increases. In a future post, I will write about how to determine the number of digits of n! without calculating the factorial and visually counting the digits.

Because mathematicians revere economy in writing, n! can be defined for any positive integer n by two rules:

            n! = n·(n–1)!
            and                                                                                                                              [2]
            1! = 1                                                                       

meaning, n! is the product of n and the factorial of the next lower value shown as (n – 1).

In the case of the second rule, this only works if we also have

            0! = 1                                                                                                               [3]

Since by rule 1 (and I know this may be a head-scratcher…)

           1! = 1·0! = 1                                                                                                      [4]

To find the value of n! , the first rule is applied repeatedly, as shown:

            n! = n·(n–1)! = n·(n–1)·(n-2)! = … =
            n
·(n–1)·(n-2)··3·2·1! =                                                                                  [5]
            n
·(n–1)·(n-2)··3·2·1

At this point, the original definition of n! in [1] has been recreated.

This [2] is called a recursive definition of n! . It has been said anonymously that, “in order to understand recursion, one must first understand recursion”.

In Computer Programming, n! can be determined via an iterative set of instructions, involving looping. Also, n! can be determined via a recursively called function or program, which calls itself repeatedly.

The Rosetta Code.org Wiki for factorial lists approximately 179 different programming languages with both iterative and recursive factorial code segments. It’s quite impressive – See:
Factorial Programs and Functions – Rosetta Code

Within this listing are programs that are part of the Unix/Linux facilities: awk, bash, bc, dc, JavaScript, m4, perl, php, python, R, ruby, and Unix shells (sh, ksh, zsh). These programs do not require a compiler. In future posts, these will be examined more closely.

There’s much more to factorials, which I will defer to further posts .

It has been suspected that mathematicians invented the factorial symbol (!) as a way of making mathematics look more exciting; after all, as the value of n increases, its factorial increases incredibly fast.