Monday, 15 March 2010

New Blog

I'm quite lazy... and to keep this blog clean, I decided to create a new one solely dedicated to security and programming. Please check it out:

Oh, by the way, a dakhma is a construction for the purification of dead bodies in the Zoroastrian religion, i.e. the bodies get cleaned by birds.

Last notice: I hope to post more language/linguistics-related news soon, though I'm currently working on quite a lot of projects.

Sunday, 13 December 2009

Zipf's law

Last Friday I gave a presentation on Zipf's law, primarily concerned with processing frequency lists and spectra in R with zipfR. The scripts contain two Python programs for extracting frequencies from NLTK's built-in Gutenberg selection corpus and section J of the ACL Anthology corpus. If you don't have access to the ACL corpus, I provide the processed TFL (type frequency list) and SPC (frequency spectrum) files for both corpora in the ZIP file.
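For readers who just want to see what a frequency list and a frequency spectrum are, here is a minimal Python sketch. It is not one of the scripts from the ZIP file; the function names and the toy sentence are my own assumptions, and it does not write zipfR's file formats.

```python
from collections import Counter


def frequency_list(tokens):
    """Return (type, frequency) pairs sorted by descending frequency."""
    counts = Counter(tokens)
    # Sort by frequency (descending), then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def frequency_spectrum(tokens):
    """Map each frequency class m to V_m, the number of types occurring m times."""
    counts = Counter(tokens)
    return dict(sorted(Counter(counts.values()).items()))


# Toy input, chosen only for illustration (not taken from either corpus).
tokens = "the cat saw the dog and the dog saw a bird".split()

for rank, (word, freq) in enumerate(frequency_list(tokens), start=1):
    print(rank, word, freq)

print(frequency_spectrum(tokens))
```

The frequency list is what you plot as rank against frequency for Zipf's law; the spectrum counts how many types occur once, twice, and so on.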


Sunday, 11 October 2009

Some NLP-related Python code

1. A program that computes the Flesch score of a text. Code here. I don't know if the syllables are counted correctly.

2. A program that searches a wordlist for minimal pairs. Code here. Example here. The format of the wordlist is restrictive and the minimal pairs are printed twice!

3. A program that obfuscates the input, which means that the first and last letter of each word stay in place but everything in between is shuffled. Code here.

4. A program that constructs a tree from a file and searches for the lowest common ancestor of two nodes. Code here. Example here.
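To give an idea of item 3 without following the link, here is a rough sketch of the obfuscation step. It is not the linked code; `scramble_word` and `obfuscate` are names I made up, and I assume that only runs of ASCII letters count as words.

```python
import random
import re


def scramble_word(word, rng=random):
    """Shuffle the inner letters of a word, keeping first and last in place."""
    if len(word) <= 3:
        return word  # at most one inner letter, so nothing to shuffle
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]


def obfuscate(text, rng=random):
    """Scramble every run of ASCII letters; punctuation and spacing are untouched."""
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


print(obfuscate("The minimal pairs are printed twice!"))
```

Passing a seeded `random.Random` instance as `rng` makes the output reproducible, which is handy for testing.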

Monday, 28 September 2009

Python-based RPN Evaluator

This program evaluates logical expressions from a text file written in Reverse Polish Notation (RPN) syntax.

Example world file:

wind and red have the value 1; sun and rain have the value 0, since they are prefixed by "/".
Here's the syntax to run the program: "python".
It quits when an empty expression occurs.

Example usage:
Logical Expression: rain sun &
Logical Expression: sun red |
Logical Expression: sun wind ^
Logical Expression: winter sun &
*** Error while evaluating: Bad name: 'winter'.
Logical Expression: sun red
*** Error while evaluating: Unbalanced expression: 'sun red'.
Logical Expression: sun red red |
*** Error while evaluating: Unbalanced expression: 'sun red red |'.
Logical Expression:


Find the source code here.
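If you only want the gist, the evaluation boils down to a stack. The following is a minimal sketch, not the linked source: the world-file format (whitespace-separated names, a leading "/" meaning 0), the `EvalError` class and the function names are all assumptions on my part. The error messages mimic the ones in the example usage.

```python
class EvalError(Exception):
    """Raised for unknown names or unbalanced expressions."""


# Binary operators on 0/1 truth values.
OPERATORS = {
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
}


def parse_world(text):
    """Assumed format: whitespace-separated names; a leading "/" marks a name as 0."""
    world = {}
    for token in text.split():
        if token.startswith("/"):
            world[token[1:]] = 0
        else:
            world[token] = 1
    return world


def evaluate(expression, world):
    """Evaluate an RPN expression with a stack and return 0 or 1."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            if len(stack) < 2:
                raise EvalError("Unbalanced expression: %r." % expression)
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        elif token in world:
            stack.append(world[token])
        else:
            raise EvalError("Bad name: %r." % token)
    # A well-formed expression leaves exactly one value behind.
    if len(stack) != 1:
        raise EvalError("Unbalanced expression: %r." % expression)
    return stack[0]


world = parse_world("wind red /sun /rain")
for expression in ["rain sun &", "sun red |", "sun wind ^", "winter sun &", "sun red"]:
    try:
        print(expression, "->", evaluate(expression, world))
    except EvalError as error:
        print("*** Error while evaluating:", error)
```

Operands are pushed, each operator pops two values and pushes the result, and anything other than exactly one value left at the end counts as unbalanced.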

Friday, 25 September 2009

Download all SMBC Comics

Simple regex-based bruteforce program to save all comics from You'll need and my source code.

Tuesday, 15 September 2009

Google Cheat Sheet 0.11

Wrote a Google Cheat Sheet:

It's simple and contains every working function in Google Search, Groups, News, Calculator. What's missing? Query suggestions...

Saturday, 15 August 2009

Poor Networks, Neurons and Lookaheads

Syntactic networks bear similarities to biological networks since both are scale-free, i.e. the degree distribution of their nodes follows a power law (e.g. social networks), and small-world, i.e. most nodes can be reached from any other node in a relatively small number of steps (e.g. social networks):

From Wikipedia [EN] [ES]
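Both properties can be measured on any graph. Here is a small sketch in plain Python that computes a degree distribution (what you would check against a power law) and the average shortest-path length (the small-world measure). The edge list is a made-up toy graph, not linguistic data, and the function names are mine.

```python
from collections import Counter, deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_distribution(graph):
    """Map each degree k to the number of nodes with exactly k neighbours."""
    return dict(sorted(Counter(len(nbrs) for nbrs in graph.values()).items()))


def average_path_length(graph):
    """Mean shortest-path length over all connected ordered node pairs (BFS per node)."""
    total = pairs = 0
    for source in graph:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
        pairs += len(distance) - 1
    return total / pairs


# Hypothetical toy graph: one hub with a short tail.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("d", "e")]
graph = build_graph(edges)

print(degree_distribution(graph))
print(average_path_length(graph))
```

A graph this small says nothing about power laws, of course; on a real network you would plot the degree distribution on log-log axes and compare the path length with that of a random graph of the same size.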

A group of researchers at the Institute of Applied Linguistics in Beijing, China tried to find similarities between semantic and syntactic networks using a statistical approach and a treebank annotated with semantic roles. Both networks are small-world and scale-free graphs, but they differ in hierarchical structure and k-nearest-neighbour correlation. Semantic networks also tend to create longer paths, which makes them a poorer hierarchy in comparison to syntactic networks: Statistical properties of Chinese semantic networks

Temporal fluctuations in speech are easily corrected by our brain. For decades this mechanism was a mystery. Two researchers at the Hebrew University of Jerusalem, Israel described how neurons adjust to decode distorted sound perfectly. Although I don't understand this very technical paper, it will perhaps provide new algorithms for speech processing: Time-Warp-Invariant Neuronal Processing

Another improvement for speech recognition and production was achieved by the Max Planck Society, which developed a new mathematical model. It's based on the look-ahead assumption, i.e. our brain tries to estimate the most probable sound sequence based on previous information, e.g. 'hot su...' = 'sun' > 'supper': Recognizing Sequences of Sequences
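To make the 'hot su...' example concrete, here is a toy sketch of the look-ahead idea. It is not the model from the paper: I have swapped in a plain conditional-frequency ranking, and the counts are invented purely for illustration.

```python
def rank_completions(context, prefix, counts):
    """Rank words starting with prefix by their relative frequency after context."""
    candidates = {
        word: n
        for word, n in counts.get(context, {}).items()
        if word.startswith(prefix)
    }
    total = sum(candidates.values())
    ranked = ((word, n / total) for word, n in candidates.items())
    return sorted(ranked, key=lambda pair: -pair[1])


# Hypothetical counts of which word followed "hot"; invented, not measured.
counts = {"hot": {"sun": 8, "supper": 2, "day": 5}}

for word, probability in rank_completions("hot", "su", counts):
    print(word, probability)
```

With these made-up numbers, 'sun' comes out ahead of 'supper' after 'hot', which is the kind of preference the look-ahead assumption describes.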