Saturday, April 19, 2008

Links 2008-04-19: Mahout, Maatkit, blog APIs, AWS, python


Monday, April 14, 2008

Links 2008-04-14: GTD, Rpy, Django, text analytics

  • Tracks rails app for GTD
  • Getting things done (simply) in Leopard The best Mac GTD app might be right under your nose (Dennis Best)
  • RPy is a very simple, yet robust, Python interface to the R Programming Language. It can manage all kinds of R objects and can execute arbitrary R functions (including the graphic functions). All errors from the R language are converted to Python exceptions. Any module installed for the R system can be used from within Python. See also An introduction to how we can interact with R from Python
  • Django People lists 2025 Django developers from around the world, in 85 different countries. The aim of the site is to help Django developers find like-minded souls near them, and hopefully kick-start some local meet-ups and user groups.
  • Django Pluggables Find reusable applications for your Django project
  • Open Computer: The Smart Alternative to an Apple Why spend $1999 to get the least expensive Apple computer with a decent video card when you can pay less than a fourth of that for an equivalent sleek and small form-factor desktop with the same hardware?
  • Orange is component-based data mining software. It includes a range of preprocessing, modelling and data exploration techniques. It is based on C++ components that are accessed either directly (not very common) or through Python scripts.
  • Flaptor Autotagger Demo TagAssist
  • Term Extraction Documentation for Yahoo! Search Web Services The Term Extraction Web Service provides a list of significant words or phrases extracted from a larger content.


Sunday, April 13, 2008

Links 2008-04-13: EC2, HBase, OASIS, Hypertable

  • Persistent Storage for Amazon EC2
  • X-Trace is a network diagnostic tool designed to provide users and network operators with better visibility into increasingly complex Internet applications. It does this by annotating network requests with metadata that can be used to reconstruct requests, even those that make use of multiple network layers. X-Trace “enabled” Internet sites make use of these identifiers to record the path that requests take through their network.
  • OASIS is a shared locality-aware server selection infrastructure. OASIS is organized as an infrastructure overlay, providing high availability and scalability.
  • HBase @ Rapleaf
  • Kosmix File System on SourceForge
  • Hypertable is an open source project based on published best practices and our own experience in solving large-scale data-intensive tasks. Our goal is to bring the benefits of new levels of both performance and scale to many data-driven businesses who are currently limited by previous-generation platforms. Our goal is nothing less than that Hypertable become one of the world’s most massively parallel high performance database platforms.
  • BIND 9 Manual
  • CodeIgniter MVC framework for PHP
  • Pligg CMS digg-like site framework (I think)
  • Yahoo! planning to index microformats
  • python-libmemcached


Saturday, April 12, 2008

Links 2008-04-12: Jaql, Hadoop, ThruDB, Python & AWS, DNS

  • PottyMouth transforms completely unstructured and untrusted text to valid, nice-looking, completely safe XHTML. PottyMouth is designed to handle input text from non-technical, potentially careless or malicious users. It produces HTML that is completely safe, programmatically and visually, to include on any web page. PottyMouth is ideal for displaying blog comments, text email bodies in a web mail application or mailing list web archive, or any text fields on any site with user input text, such as a social networking, dating, or community site. In short, any input which is displayed in HTML and is input as text by a non-technical and/or untrusted user.
  • Jaql - a query language for JSON Demonstrates an example of JSON data, describes the key features of Jaql and shows how it can be used to process JSON data in parallel using Hadoop's map/reduce framework.
  • Pig is a dataflow programming environment for processing very large files. Pig compiles these dataflow programs into (sequences of) map-reduce jobs and executes them using Hadoop. It is also possible to execute Pig Latin programs in a "local" mode (without Hadoop cluster), in which case all processing takes place in a single local JVM.
  • Hadoop On Demand is a system for provisioning and managing independent Hadoop MapReduce and HDFS instances on a shared cluster of nodes. HOD is a tool that makes it easy for administrators and users to quickly set up and use Hadoop.
  • Hadoop Streaming allows you to create and run map/reduce jobs with any executable or script as the mapper and/or the reducer. For example:
    $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
        -input myInputDirs \
        -output myOutputDir \
        -mapper /bin/cat \
        -reducer /bin/wc
    
  • Red Black Trees Tutorial
  • Jing Project - The concept of Jing is the always-ready program that instantly captures and shares images and video…from your computer to anywhere.
  • Historical Graphs for Mortgage Rates
  • The JasPer Project is an open-source initiative to provide a free software-based reference implementation of the codec specified in the JPEG-2000 Part-1 standard (i.e., ISO/IEC 15444-1).
  • DNS for Rocket Scientists Online guide about DNS and (mostly) BIND 9.x on Linux (Fedora Core) and the BSDs (FreeBSD, OpenBSD and NetBSD)
  • ThruDB Document Oriented Database Services
  • PyAWS
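
Since Hadoop Streaming accepts any executable as the mapper or reducer, a Python word count is a natural next step after the /bin/cat and /bin/wc example above. The following is only a rough sketch: the function names (map_lines, reduce_lines, simulate_job) are made up for illustration, and simulate_job merely imitates locally the sort that Hadoop performs between the two phases.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each key.

    Assumes the records arrive sorted by key, which is how a streaming
    reducer normally receives them.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def simulate_job(lines):
    """Local stand-in for a job (hypothetical helper): map, sort, reduce."""
    return list(reduce_lines(sorted(map_lines(lines))))


print(simulate_job(["the cat", "the hat"]))
```

To run this on a cluster, each function would presumably be wrapped in a small script that reads stdin and writes stdout, and those scripts would take the place of /bin/cat and /bin/wc in the -mapper and -reducer options.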


Monday, February 18, 2008

Links 2008-02-18: nginx, httplib2, Open Social

  • When data center cabling becomes art
  • Starling is a light-weight persistent queue server that speaks the MemCache protocol. It was built to drive Twitter's backend, and is in production across Twitter's cluster.
  • nginx [engine x] is an HTTP server and mail proxy server
  • Authoratory is a unique database of contact information, professional interests, social connections and funding of 630,570 leading scientists!
  • httplib2 is a comprehensive HTTP client library that supports many features left out of other HTTP libraries. (python)
  • OpenSocial Persistence Data API Developer's Guide: Protocol
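
Because Starling speaks the MemCache protocol, talking to it should need nothing more than the memcache text commands. The sketch below only builds and parses the raw bytes, with no server involved. It assumes, as I understand Starling, that a `set` on a queue name enqueues an item and a `get` dequeues one; the helper names are made up for illustration.

```python
def enqueue_command(queue, payload, flags=0, exptime=0):
    """Build a memcache text-protocol `set` (assumed to enqueue in Starling)."""
    return b"set %s %d %d %d\r\n%s\r\n" % (
        queue.encode(), flags, exptime, len(payload), payload)


def dequeue_command(queue):
    """Build a memcache text-protocol `get` (assumed to dequeue in Starling)."""
    return b"get %s\r\n" % queue.encode()


def parse_get_response(response):
    """Return the payload of a `get` reply, or None when nothing was returned."""
    if response == b"END\r\n":
        return None
    # Reply shape: VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n
    header, rest = response.split(b"\r\n", 1)
    _value, _key, _flags, size = header.split()
    return rest[: int(size)]


print(enqueue_command("jobs", b"hello"))
print(parse_get_response(b"VALUE jobs 0 5\r\nhello\r\nEND\r\n"))
```

In practice an existing memcache client library would send these commands for you; the point is only that the queue operations ride on the ordinary memcache wire format.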
