I'm quite lazy... and to keep this blog clean, I decided to create a new one solely dedicated to security and programming. Please check it out: dakhma.net
Oh, by the way, a dakhma is a structure for the purification of dead bodies in the Zoroastrian religion, i.e. the bodies get cleaned by birds.
Last notice: I hope to post more language- and linguistics-related news soon, though I'm currently working on quite a lot of projects.
Monday 15 March 2010
Sunday 13 December 2009
Zipf's law
Last Friday I gave a presentation on Zipf's law, primarily concerned with processing frequency lists and spectra in R with zipfR. The scripts contain two Python programs for extracting frequencies from NLTK's built-in Gutenberg selection corpus and from section J of the ACL Anthology corpus. If you don't have access to the ACL corpus, I provide the processed TFL (type frequency list) and SPC (frequency spectrum) files for both corpora in the ZIP file.
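As a rough illustration of what a type frequency list and a frequency spectrum contain, here is a minimal Python sketch. It is not one of the scripts from the ZIP file: the function names and the toy sentence are made up for this example, and it uses only the standard library rather than NLTK or zipfR.

```python
from collections import Counter


def type_frequency_list(tokens):
    """Return (type, frequency) pairs sorted by descending frequency."""
    counts = Counter(tokens)
    # Sort by frequency (descending), then alphabetically for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def frequency_spectrum(tokens):
    """Return (m, V_m) pairs: how many types occur exactly m times."""
    counts = Counter(tokens)
    spectrum = Counter(counts.values())
    return sorted(spectrum.items())


if __name__ == "__main__":
    # Hypothetical toy input; a real run would use a tokenized corpus.
    tokens = "the cat saw the dog and the dog saw the cat run".split()
    for rank, (word, freq) in enumerate(type_frequency_list(tokens), start=1):
        # Zipf's law predicts that rank * freq is roughly constant in large corpora.
        print(rank, word, freq, rank * freq)
    print(frequency_spectrum(tokens))
```

On a corpus this small the rank-frequency product is not informative; the point is only to show the two data structures.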
Sunday 11 October 2009
Some NLP-related Python code
1. A program that computes the Flesch score of a text. Code here. I don't know if the syllables are counted correctly.
2. A program that searches a wordlist for minimal pairs. Code here. Example here. The format of the wordlist is restrictive and the minimal pairs are printed twice!
3. A program that obfuscates the input, which means that the first and last letters stay the same but everything in between is shuffled. Code here.
4. A program that constructs a tree from a file and searches for the lowest common ancestor of two nodes. Code here. Example here.
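For the Flesch score in item 1, the following is a minimal sketch rather than the linked program. It uses the standard Flesch Reading Ease formula; the syllable counter is a crude vowel-group heuristic that I am assuming for illustration, so it will miscount some words. The function names are hypothetical.

```python
import re


def count_syllables(word):
    """Estimate syllables by counting groups of consecutive vowels (heuristic)."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing 'e' as silent ("make" -> 1), except in "-le" ("table" -> 2).
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text):
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    print(flesch_reading_ease("The cat sat. The dog ran."))
```

Because the syllable count is only an estimate, the resulting score should be read as approximate.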
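The minimal-pair search in item 2 could be sketched as follows. This is not the linked code. It assumes one character per sound segment and a plain list of words, and the function names are made up. Using itertools.combinations is one way to avoid printing every pair twice.

```python
from itertools import combinations


def is_minimal_pair(a, b):
    """True if a and b have equal length and differ in exactly one position."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def minimal_pairs(words):
    """Return each minimal pair once, as a list of (word1, word2) tuples."""
    unique = sorted(set(words))
    # combinations() yields every unordered pair exactly once, so (a, b)
    # is never repeated as (b, a).
    return [(a, b) for a, b in combinations(unique, 2) if is_minimal_pair(a, b)]


if __name__ == "__main__":
    # Hypothetical wordlist for illustration.
    for pair in minimal_pairs(["pat", "bat", "pit", "dog"]):
        print(pair)
```

The comparison is quadratic in the number of words, which is fine for small wordlists but slow for large ones.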
Labels: Computational Linguistics, Linguistics, Programming, Python
Monday 28 September 2009
Python-based RPN Evaluator
This program evaluates logical expressions written in Reverse Polish Notation (RPN), using truth values defined in a text file.
Example world file:
wind
/sun
/rain
red
Here wind and red have the value 1; sun and rain have the value 0 because they are prefixed with "/".
Here's the syntax to run the program: "python log.py myworld.world".
The program quits when an empty expression is entered.
Example usage:
C:\Python26>python log.py myworld.world
Logical Expression: rain sun &
0
Logical Expression: sun red |
1
Logical Expression: sun wind ^
True
Logical Expression: winter sun &
*** Error while evaluating: Bad name: 'winter'.
Logical Expression: sun red
*** Error while evaluating: Unbalanced expression: 'sun red'.
Logical Expression: sun red red |
*** Error while evaluating: Unbalanced expression: 'sun red red |'.
Logical Expression:
C:\Python26>
Find the source code here.
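For readers who want the idea without downloading anything, here is a minimal stack-based sketch of how such an evaluator might work. It is not the source of log.py. The names parse_world, evaluate_rpn and OPERATORS are hypothetical, and only the three operators from the example session (&, |, ^) are assumed.

```python
def parse_world(lines):
    """Map each name to 1, or to 0 if its line is prefixed with '/'."""
    world = {}
    for line in lines:
        name = line.strip()
        if not name:
            continue
        if name.startswith("/"):
            world[name[1:]] = 0
        else:
            world[name] = 1
    return world


# Binary operators on the integer truth values 0 and 1.
OPERATORS = {
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
}


def evaluate_rpn(expression, world):
    """Evaluate a whitespace-separated RPN expression to 0 or 1."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            # A binary operator needs two operands on the stack.
            if len(stack) < 2:
                raise ValueError("Unbalanced expression: %r" % expression)
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        elif token in world:
            stack.append(world[token])
        else:
            raise ValueError("Bad name: %r" % token)
    # Exactly one value must remain, otherwise operands were left over.
    if len(stack) != 1:
        raise ValueError("Unbalanced expression: %r" % expression)
    return stack[0]


if __name__ == "__main__":
    world = parse_world(["wind", "/sun", "/rain", "red"])
    print(evaluate_rpn("sun red |", world))
```

Unlike the session shown above, this sketch returns 1 rather than True for "sun wind ^", and a caller would have to check for an empty expression before evaluating in order to quit.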
Friday 25 September 2009
Download all SMBC Comics
A simple regex-based brute-force program that saves all comics from http://www.smbc-comics.com/. You'll need http://commons.apache.org/io/ and my source code.
Tuesday 15 September 2009
Google Cheat Sheet 0.11
I wrote a Google cheat sheet: http://rapidshare.com/files/280485137/gcs.pdf
It's simple and contains every working function in Google Search, Groups, News and Calculator. What's missing? Query suggestions...
Saturday 15 August 2009
Poor Networks, Neurons and Lookaheads
Syntactic networks bear similarities to biological networks since they are scale-free, i.e. the number of edges per node follows a power law, and small-world, i.e. most nodes can be reached from any other node in a relatively small number of steps (social networks are a familiar example of both).
A group of researchers at the Institute of Applied Linguistics in Beijing, China, tried to find similarities between semantic and syntactic networks via a statistical approach and a treebank with semantic roles. Both kinds of network are small-world and scale-free graphs, but they differ in hierarchical structure and k-nearest-neighbour correlation. Semantic networks also tend to have longer paths, which makes them poorer hierarchies than syntactic networks: Statistical properties of Chinese semantic networks
Temporal fluctuations in speech are easily corrected by our brain. For decades this mechanism was a mystery. Two researchers at the Hebrew University of Jerusalem, Israel, described how neurons adjust to decode distorted sound perfectly. Although I don't understand this very technical paper, it may provide new algorithms for speech processing: Time-Warp-Invariant Neuronal Processing
Another improvement for speech recognition and production was achieved by the Max Planck Society, which developed a new mathematical model. It's based on the look-ahead assumption, i.e. our brain tries to estimate the most probable sound sequence based on previous information, e.g. after 'hot su...', 'sun' is more probable than 'supper': Recognizing Sequences of Sequences