Mathematical modeling of fake news

I took the Mathematical Modelling Basics course over the last couple of months. It was produced by Delft University of Technology and offered for free on edX. Thanks TUDelft and edX!

It is a great introduction to mathematical modeling. I like the fact that this short course covers 3 important areas, with good practice in each: mathematics, computer programming, and technical writing.

  • The math involved is modeling with systems of ordinary differential equations. Both analytical and numerical solutions to such models are introduced and practiced with good exercises;
  • The programming part uses Python with NumPy to solve the system of ordinary differential equations numerically with Euler’s method. Plotting with Matplotlib is also introduced and practiced;
  • Finally, the course asks students to write a technical report using LaTeX.
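As a rough illustration of the programming part, here is a minimal sketch of Euler’s method for a system of ordinary differential equations. The rumor_model function and its beta and gamma values are made up for this example and are not the model from our report. I also use plain Python lists instead of NumPy so the sketch runs without any dependencies.

```python
def euler(f, y0, t0, t1, n_steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 using n_steps fixed Euler steps."""
    h = (t1 - t0) / n_steps
    t, y = t0, list(y0)
    history = [(t, list(y))]
    for _ in range(n_steps):
        dy = f(t, y)
        # Euler update: y_next = y + h * f(t, y), applied to each component
        y = [yi + h * dyi for yi, dyi in zip(y, dy)]
        t += h
        history.append((t, list(y)))
    return history


def rumor_model(t, y, beta=0.5, gamma=0.1):
    """Hypothetical three-compartment model (illustration only).

    y = [ignorant, spreader, stifler], expressed as population fractions.
    """
    ignorant, spreader, stifler = y
    new_spreaders = beta * ignorant * spreader  # ignorants who start spreading
    quitting = gamma * spreader                 # spreaders who lose interest
    return [-new_spreaders, new_spreaders - quitting, quitting]


history = euler(rumor_model, [0.99, 0.01, 0.0], 0.0, 50.0, 5000)
final_t, final_y = history[-1]
print("fractions at the end of the run:", final_y)
```

Because the three derivatives sum to zero, the total population fraction stays at 1 (up to rounding) throughout the run, which is a handy sanity check on the implementation.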

I also like the fact that the course encourages students to form teams and work together. I was very fortunate to be able to work with Zeus Garyulo, an Argentine currently working in Finland. Zeus is a wonderful teammate: very smart, with a much better grasp of the math involved than I have. He broke problems down into actionable items quickly and did the majority of the modeling, analysis, and validation work. Thanks Zeus!

The problem we chose to tackle is the spread of fake news. Without further ado, below is our report.

fail2ban installation and configuration notes

A couple of days ago a website I volunteer to manage came under a DDoS attack. I installed and configured fail2ban to protect us from similar attacks in the future. Here are some notes. The server is of the Red Hat/Fedora/CentOS variety, as you can tell from the commands listed below. Please translate them to your distro’s corresponding commands as needed.

  • Installation is easy:
    sudo yum install fail2ban

    To make fail2ban start automatically after a reboot, run this:

    sudo systemctl enable fail2ban

  • Configuration is relatively easy. It’s recommended that you create your own jail configuration file (jail.local), using the jail.conf from the installation as a starting point. Three things are noteworthy from my experience:
    1. Make sure that you provide the correct log file. A web server typically has one access log file and one error log file. Ensure that you feed the right log file to each filter;
    2. On this server, fail2ban didn’t properly expand the log file names when I put wildcard characters in them. I got around that by listing the files one by one;
    3. In the jail.conf file, no default banaction was defined. I added the following line:
    banaction = iptables-multiport
  • To write your own custom filter, make sure you put a sample log entry inside the filter file as a comment. Use the following command to debug your filter:
    sudo fail2ban-regex /path2testLogfile/test.log /etc/fail2ban/filter.d/my-filter.conf
    Here is a filter that I wrote:

    failregex = ^<HOST> -.*"POST \/component\/mailto\/\?tmpl=component\&link=aHR0cHM6.*"$

    ignoreregex =

  • After getting your jail.local ready, run the following command to debug any potential issues. I’ve found that if you have issues with your jail or filter files, “sudo systemctl start fail2ban” doesn’t always give you a good enough error message. Use this instead:

    sudo /usr/bin/fail2ban-client -x start

    You may need to start/stop a couple of times. To stop, run

    sudo /usr/bin/fail2ban-client -x stop

  • After debugging, before you finally start the fail2ban service, it’s a good idea to search the current access/error log and see if anything matches the filter you defined. If so, take note of the IP address and the last time it appears in the log file. Then start fail2ban by running
    sudo systemctl start fail2ban
  • To verify that it works, run iptables -S. If fail2ban has caught an offender and put it in jail, you should see its IP address in the output. Now go back to the access/error log and ensure there are no entries from that IP address after the last timestamp you noted.
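For the log check described above (find matching entries, then note each IP address and its last appearance), a small Python script can help. This is only a sketch: it assumes the filter pattern starts with fail2ban’s <HOST> tag, which I replace here with a simple (?P<host>\S+) group as a stand-in, and the sample log lines are invented. fail2ban-regex remains the real test of a filter.

```python
import re

# Stand-in for the fail2ban filter: fail2ban expands <HOST> itself, so the
# (?P<host>\S+) group below is a simplification made for this sketch.
FAILREGEX = re.compile(
    r'^(?P<host>\S+) -.*"POST /component/mailto/\?tmpl=component&link=aHR0cHM6.*"$'
)


def last_offending_lines(lines):
    """Map each matching host to the last log line in which it appeared."""
    last_seen = {}
    for line in lines:
        line = line.rstrip("\n")
        match = FAILREGEX.match(line)
        if match:
            last_seen[match.group("host")] = line
    return last_seen


# Invented access-log entries in the common "combined" format.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2020:13:55:36 +0000] "POST /component/mailto/?tmpl=component&link=aHR0cHM6Ly9leGFtcGxl HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '198.51.100.4 - - [10/Oct/2020:13:56:10 +0000] "GET /index.php HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2020:13:57:02 +0000] "POST /component/mailto/?tmpl=component&link=aHR0cHM6Ly9leGFtcGxl HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

for host, line in last_offending_lines(SAMPLE_LOG).items():
    print(host, "last seen in:", line)
```

To run it against a real log, pass an open file object instead of SAMPLE_LOG.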

Good luck in protecting your servers!


1. How to update Ubuntu Linux:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install build-essential
2. How to install Python packages:
sudo apt-get install python-pip
sudo apt-get install python-dev
sudo pip install numpy
sudo pip install ggplot…
3. How to install R and R packages:
sudo apt-get install r-base
sudo apt-get install openjdk-7-jdk
sudo R
4. How to download videos quickly and easily:
Firefox, with the DownThemAll add-on

Useful code snippet for Python list fold/reduce

Going through one of my MOOC courses, I came across a homework problem that I solved with any(), one of Python’s built-in functions for reducing a list to a single value, with an interesting functional programming twist thrown in. I am saving it here for my own reference. I won’t mention which course this is, lest I give away the answer to one of its many interesting assignments. But from what I have learned so far, it is a great course from Udacity.

For those interested, this code snippet also uses various Python data structures like lists and dictionaries, recursion, not to mention the nice comments on FSM (Finite State Machine), regular expression implementation, and the meaning of determinism. All in all, a pretty good example to illustrate various important concepts in computer science.

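Regarding the recursion in this code, I *think* it is reasonably efficient for short inputs: the recursion depth equals the length of the string (one frame per character consumed), and any() stops as soon as one path accepts. Python does not optimize tail calls, though, so a very long input string could hit the recursion limit.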

# Title: Simulating Non-Determinism

# Each regular expression can be converted to an equivalent finite state
# machine. This is how regular expressions are implemented in practice. 
# We saw how non-deterministic finite state machines can be converted to
# deterministic ones (often of a different size). It is also possible to
# simulate non-deterministic machines directly -- and we'll do that now!
# In a given state, a non-deterministic machine may have *multiple*
# outgoing edges labeled with the *same* character. 
# To handle this ambiguity, we say that a non-deterministic finite state
# machine accepts a string if there exists *any* path through the finite
# state machine that consumes exactly that string as input and ends in an
# accepting state. 
# Write a procedure nfsmsim that works just like the fsmsim we covered
# together, but handles also multiple outgoing edges and ambiguity. Do not
# consider epsilon transitions. 
# Formally, your procedure takes four arguments: a string, a starting
# state, the edges (encoded as a dictionary mapping), and a list of
# accepting states. 
# To encode this ambiguity, we will change "edges" so that each state-input
# pair maps to a *list* of destination states. 
# For example, the regular expression r"a+|(?:ab+c)" might be encoded like
# this:
edges = { (1, 'a') : [2, 3],
          (2, 'a') : [2],
          (3, 'b') : [4, 3],
          (4, 'c') : [5] }
accepting = [2, 5] 
# It accepts both "aaa" (visiting states 1 2 2 and finally 2) and "abbc"
# (visiting states 1 3 3 4 and finally 5).

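def nfsmsim(string, current, edges, accepting):
    # fill in your code here
    # Base case: all input consumed, so accept only if we ended in an accepting state.
    if string == "":
        return current in accepting
    letter = string[0]
    # Is there a valid edge?
    if (current, letter) in edges:
        # Accept if *any* of the possible destination states leads to acceptance.
        return any(nfsmsim(string[1:], i, edges, accepting) for i in edges[(current, letter)])
    return False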

# This problem includes some test cases to help you tell if you are on
# the right track. You may want to make your own additional tests as well.

print("Test case 1 passed: " + str(nfsmsim("abc", 1, edges, accepting) == True))
print("Test case 2 passed: " + str(nfsmsim("aaa", 1, edges, accepting) == True))
print("Test case 3 passed: " + str(nfsmsim("abbbc", 1, edges, accepting) == True))
print("Test case 4 passed: " + str(nfsmsim("aabc", 1, edges, accepting) == False))
print("Test case 5 passed: " + str(nfsmsim("", 1, edges, accepting) == False))
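For comparison, here is a sketch of a non-recursive alternative that is not part of the course material: instead of exploring one path at a time, it tracks the set of all states the machine could be in after each character. The name nfsmsim_iterative is my own, and the edges and accepting values are repeated so the snippet runs on its own.

```python
def nfsmsim_iterative(string, start, edges, accepting):
    """Simulate a non-deterministic FSM by tracking every reachable state."""
    current_states = {start}
    for letter in string:
        next_states = set()
        for state in current_states:
            # A missing (state, letter) pair means no outgoing edge on this path.
            next_states.update(edges.get((state, letter), []))
        if not next_states:
            return False  # every path is stuck, so the string is rejected
        current_states = next_states
    # Accept if at least one surviving path ended in an accepting state.
    return any(state in accepting for state in current_states)


edges = {(1, 'a'): [2, 3],
         (2, 'a'): [2],
         (3, 'b'): [4, 3],
         (4, 'c'): [5]}
accepting = [2, 5]

for sample in ["abc", "aaa", "abbbc", "aabc", ""]:
    print(repr(sample), nfsmsim_iterative(sample, 1, edges, accepting))
```

This version uses no recursion, so its memory use depends on the number of states rather than the length of the input.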

Managing Windows services with sysinternals tools and Python

Below is a copy of the README file from one of my GitHub repos. Feel free to download the code and play with it. If you have ideas and good Python Windows admin scripts, share them!

Working with Windows, one frequently needs to manage various services: finding their status, startup modes, and startup accounts, and stopping, starting, or restarting them. One can manage all this via a GUI tool by running “services.msc” manually, which is useful and handy. But the manual method does not scale and is error prone. For example, when it comes time to stop and patch a certain service on many servers, it is nice to have a proven, automated way of managing them.

For that, we need command line tools and automation. On the command line, Windows has the NET commands, sc.exe from the Resource Kit, and psservice.exe from Sysinternals. Those are the tools I’ve used. Of the three, I like psservice the best, because it supports managing servers remotely and is easy to install (just unzip it; no registry or directory pollution). Yes, I am aware of PowerShell, pywin32, etc., but they are not the focus of this project.

winServicePy combines Python and Sysinternals tools, psservice in particular, to manage Windows services easily across many servers in a company. All base functions have been tested with the unittest module. Tests are included in the tests folder.
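To give a feel for the approach, here is a minimal sketch of how Python can drive psservice. It is not the code from the repo: the function names and the server name web01 are hypothetical. It assumes psservice is in PATH and uses the `psservice \\computer <command> <service>` form of the command line.

```python
import subprocess

# psservice commands used in this sketch.
VALID_ACTIONS = ("query", "start", "stop", "restart")


def build_psservice_command(server, action, service):
    """Build the argument list for: psservice \\\\server action service."""
    if action not in VALID_ACTIONS:
        raise ValueError("unsupported action: %s" % action)
    return ["psservice", r"\\" + server, action, service]


def run_psservice(server, action, service):
    """Run psservice and return its raw output (requires Windows and psservice in PATH)."""
    command = build_psservice_command(server, action, service)
    return subprocess.check_output(command, stderr=subprocess.STDOUT)


# Show the command that would be run for a hypothetical server.
print(build_psservice_command("web01", "query", "Spooler"))
```

Passing the command as a list rather than a single string means Python handles the quoting of each argument.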

1. Python 2.7;
2. Sysinternals tools in PATH;
3. Access to an administrative login on the servers you are trying to manage, and run all utilities under that Administrative login account.
Initially I coded functions that accepted a login and password, but because a password can contain all sorts of characters that are very hard to escape, I gutted that functionality. The code that includes the login and password parameters is tagged, as you can see on GitHub.

This set of tools can do 5 things:
1. Get service startup account;
2. getServiceStartupType. Get service startup mode;
3. getServiceStatus. Get service running status;
4. Set service startup mode auto|manual|disabled
5. Set service status by issuing STOP|START|RESTART command

Running any of the scripts without parameters prints its usage information. All 5 scripts accept at least 3 switches:

-f is a boolean switch. Its presence indicates that the input is a file.
-i is the input for the script. If -f is present, this should be a file listing the servers you want to manage, one per line. If -f is absent, enter the server names on the command line, separated only by single commas.
-s is the service name. To find a service name, open services.msc and look at the service’s properties.
-t, only available for the script that sets the startup mode, indicates the startup mode you want the service to be in. Possible choices are auto, manual, and disabled.
-a, only available for the script that sets the service status, indicates whether you want to stop, start, or restart the service. Possible choices are stop, start, and restart.
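The switches above could be parsed along these lines with argparse. This is a sketch under my own assumptions, not the repo’s actual parsing code: build_parser and parse_servers are hypothetical names, and a single parser handles both -t and -a here even though each belongs to only one script.

```python
import argparse


def parse_servers(value, is_file):
    """Return server names from a file (one per line) or a comma-separated string."""
    if is_file:
        with open(value) as handle:
            return [line.strip() for line in handle if line.strip()]
    return [name for name in value.split(",") if name]


def build_parser():
    parser = argparse.ArgumentParser(description="Manage a Windows service on many servers")
    parser.add_argument("-f", dest="is_file", action="store_true",
                        help="treat the -i value as a file of server names")
    parser.add_argument("-i", dest="input", required=True,
                        help="comma-separated server names, or a file if -f is given")
    parser.add_argument("-s", dest="service", required=True, help="service name")
    parser.add_argument("-t", dest="startup", choices=["auto", "manual", "disabled"],
                        help="startup mode to set")
    parser.add_argument("-a", dest="action", choices=["stop", "start", "restart"],
                        help="action to perform on the service")
    return parser


# Example with made-up server names, parsed from an explicit argument list.
args = build_parser().parse_args(["-i", "web01,web02", "-s", "Spooler", "-a", "restart"])
print(parse_servers(args.input, args.is_file), args.service, args.action)
```

Using choices lets argparse reject an unknown startup mode or action before any server is touched.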