Monday 15 March 2010

New Blog

I'm quite lazy... and to keep this blog clean, I decided to create a new one solely dedicated to security and programming. Please check it out: dakhma.net

Oh, by the way, a dakhma is a construction for the purification of dead bodies in the Zoroastrian religion, i.e. the bodies get picked clean by birds.


Last notice: I hope to post more language- and linguistics-related news soon, though I'm currently working on quite a lot of projects.

Sunday 13 December 2009

Zipf's law

Last Friday I gave a presentation on Zipf's law, primarily concerned with processing frequency lists and frequency spectra in R with zipfR. The scripts contain two Python programs for extracting frequencies from NLTK's built-in Gutenberg selection corpus and from section J of the ACL Anthology corpus. In case you don't have access to the ACL corpus, the ZIP file includes the processed TFL and SPC files for both corpora.

Download:
[Slides]
[Scripts]
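
For readers who have not met the terms before, here is a minimal sketch of what a type frequency list and a frequency spectrum are. It is not the code from the ZIP file: the sample sentence, the crude tokenizer and the function names are my own assumptions for illustration, and it assumes Python 3.

```python
import re
from collections import Counter


def type_frequencies(text):
    """Count how often each word type occurs (a type frequency list)."""
    # Crude tokenizer (an assumption): lowercase runs of ASCII letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


def frequency_spectrum(frequencies):
    """Count how many types occur exactly m times, for each m."""
    return Counter(frequencies.values())


# Made-up sample text, only for demonstration.
text = "the cat saw the dog and the dog saw the bird"
tfl = type_frequencies(text)
spc = frequency_spectrum(tfl)

# Rank the types by frequency; Zipf's law is about how f falls with rank.
for rank, (word, f) in enumerate(tfl.most_common(), start=1):
    print(rank, word, f)

# Frequency spectrum: m, number of types with frequency m.
for m in sorted(spc):
    print(m, spc[m])
```

On a real corpus the rank/frequency pairs can then be plotted on log-log axes to see how closely they follow a straight line.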

Sunday 11 October 2009

Some NLP-related Python code

1. A program that computes the Flesch score of a text. Code here. I don't know whether the syllables are counted correctly.

2. A program that searches a wordlist for minimal pairs. Code here. Example here. The format of the wordlist is restrictive and the minimal pairs are printed twice!

3. A program that obfuscates the input, which means that the first and last letter of each word stay the same but everything in between is shuffled. Code here.

4. A program that constructs a tree from a file and searches for the lowest common ancestor of two nodes. Code here. Example here.
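
To give an idea of what the third program does, here is a minimal sketch of the scrambling idea. It is not the linked code: the function names, the rule that words of up to three letters are left alone, and the restriction to ASCII letters are my own assumptions.

```python
import random
import re


def scramble_word(word, rng=random):
    """Shuffle the inner letters of a word, keeping first and last in place."""
    if len(word) <= 3:
        # Nothing to shuffle: at most one inner letter.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def obfuscate(text, rng=random):
    """Scramble every run of ASCII letters; leave punctuation and spaces alone."""
    return re.sub(r"[A-Za-z]+", lambda m: scramble_word(m.group(0), rng), text)


print(obfuscate("Reading scrambled sentences is surprisingly easy."))
```

The output differs on every run because the shuffle is random. Passing a seeded `random.Random` instance as `rng` makes it repeatable.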

Monday 28 September 2009

Python-based RPN Evaluator

This program evaluates logical expressions written in Reverse Polish Notation (RPN) against truth values read from a text file.

Example world file:
wind
/sun
/rain
red

wind and red have the value 1; sun and rain have the value 0, since they are prefixed by "/".
Here's the syntax to run the program: "python log.py myworld.world".
It quits when an empty expression is entered.

Example usage:
C:\Python26>python log.py myworld.world
Logical Expression: rain sun &
0
Logical Expression: sun red |
1
Logical Expression: sun wind ^
True
Logical Expression: winter sun &
*** Error while evaluating: Bad name: 'winter'.
Logical Expression: sun red
*** Error while evaluating: Unbalanced expression: 'sun red'.
Logical Expression: sun red red |
*** Error while evaluating: Unbalanced expression: 'sun red red |'.
Logical Expression:

C:\Python26>



Find the source code here.
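
If you only want the gist, the evaluation can be sketched with a stack. This is a minimal sketch and not the linked log.py: the function names are made up, only the three operators from the example session (&, | and ^) are handled, and it always returns 0 or 1.

```python
def parse_world(lines):
    """Map each name to 1, or to 0 when the line is prefixed with '/'."""
    world = {}
    for line in lines:
        name = line.strip()
        if not name:
            continue  # skip blank lines
        if name.startswith("/"):
            world[name[1:]] = 0
        else:
            world[name] = 1
    return world


# Binary operators on 0/1 values (only those seen in the example session).
OPERATORS = {
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
}


def evaluate(expression, world):
    """Evaluate an RPN expression; raise ValueError on bad input."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            if len(stack) < 2:
                raise ValueError("Unbalanced expression: %r" % expression)
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        elif token in world:
            stack.append(world[token])
        else:
            raise ValueError("Bad name: %r" % token)
    # A well-formed expression leaves exactly one value behind.
    if len(stack) != 1:
        raise ValueError("Unbalanced expression: %r" % expression)
    return stack[0]


world = parse_world(["wind", "/sun", "/rain", "red"])
print(evaluate("rain sun &", world))
print(evaluate("sun red |", world))
print(evaluate("sun wind ^", world))
```

Names are pushed as values, and each operator pops two values and pushes the result. An expression is unbalanced when an operator finds fewer than two values on the stack, or when more than one value is left at the end.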

Friday 25 September 2009

Download all SMBC Comics

Simple regex-based bruteforce program to save all comics from http://www.smbc-comics.com/. You'll need http://commons.apache.org/io/ and my source code.

Tuesday 15 September 2009

Google Cheat Sheet 0.11

Wrote a Google Cheat Sheet: http://rapidshare.com/files/280485137/gcs.pdf

It's simple and contains every working function in Google Search, Groups, News, Calculator. What's missing? Query suggestions...

Saturday 15 August 2009

Poor Networks, Neurons and Lookaheads

Syntactic networks bear similarities to biological networks: both are scale-free, i.e. the number of edges per node follows a power law (as in social networks), and small-world, i.e. most nodes can be reached from any other node in a relatively small number of steps (again as in social networks):


From Wikipedia [EN] [ES]
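
The two properties can be measured on any graph. Here is a minimal sketch on a made-up toy graph, which is far too small to be scale-free or small-world in any meaningful sense; the graph, the function names and the choice of Python 3 are my own assumptions. The degree distribution is what you would check against a power law, and the average shortest path length is what stays small in a small-world network.

```python
from collections import Counter, deque


def degree_distribution(graph):
    """Map each degree k to the number of nodes that have k neighbours."""
    return Counter(len(neighbours) for neighbours in graph.values())


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered node pairs."""
    total = 0
    pairs = 0
    for source in graph:
        # Breadth-first search gives shortest distances in an unweighted graph.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total += sum(distance.values())
        pairs += len(distance) - 1  # exclude the source itself
    return total / float(pairs)


# Toy undirected graph as adjacency lists: a hub (0) plus one extra edge 1-2.
graph = {
    0: [1, 2, 3, 4],
    1: [0, 2],
    2: [0, 1],
    3: [0],
    4: [0],
}

print(degree_distribution(graph))
print(average_path_length(graph))
```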


A group of researchers at the Institute of Applied Linguistics in Beijing, China tried to find similarities between semantic and syntactic networks using a statistical approach and a treebank annotated with semantic roles. Both kinds of network are small-world and scale-free graphs, but they differ in hierarchical structure and in k-nearest-neighbour correlation. Semantic networks also tend to create longer paths, which makes them a poorer hierarchy in comparison to syntactic networks: Statistical properties of Chinese semantic networks


Temporal fluctuations in speech are easily corrected by our brain. For decades this mechanism was a mystery. Two researchers at the Hebrew University of Jerusalem, Israel described how neurons adjust to decode distorted sound perfectly. Although I don't understand this very technical paper, it will perhaps provide new algorithms for speech processing: Time-Warp-Invariant Neuronal Processing

Another improvement for speech recognition and production was achieved by the Max Planck Society, which developed a new mathematical model. It's based on the look-ahead assumption, i.e. our brain tries to estimate the most probable sound sequence based on previous information, e.g. 'hot su...' = 'sun' > 'supper': Recognizing Sequences of Sequences
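
To make the look-ahead idea concrete, here is a toy sketch that ranks the possible completions of a partial word by how often each candidate followed the previous word. This is not the model from the paper, just a plain bigram lookup; the counts are made up and the function name is my own.

```python
# Made-up bigram counts: how often each word followed the previous word.
BIGRAM_COUNTS = {
    "hot": {"sun": 8, "supper": 3, "soup": 5, "day": 6},
}


def rank_completions(previous_word, prefix, counts=BIGRAM_COUNTS):
    """Rank words starting with `prefix` by relative frequency after `previous_word`."""
    followers = counts.get(previous_word, {})
    candidates = {w: c for w, c in followers.items() if w.startswith(prefix)}
    total = float(sum(candidates.values()))
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    # Normalise over the remaining candidates only.
    return [(word, count / total) for word, count in ranked]


for word, probability in rank_completions("hot", "su"):
    print(word, round(probability, 2))
```

With these invented counts, 'sun' ranks above 'supper' after 'hot', which mirrors the example above.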