Open source grid computing takes off

This has been fun to watch. The Hadoop team at Yahoo! is moving quickly to push the technology toward its potential. They’ve now adopted it for one of the most important applications in the entire business, Yahoo! Search.

From the Hadoop Blog:

The Webmap build starts with every Web page crawled by Yahoo! and produces a database of all known Web pages and sites on the internet and a vast array of data about every page and site. This derived data feeds the Machine Learned Ranking algorithms at the heart of Yahoo! Search.

Some Webmap size data:

Number of links between pages in the index: roughly 1 trillion links

Size of output: over 300 TB, compressed!

Number of cores used to run a single Map-Reduce job: over 10,000

Raw disk used in the production cluster: over 5 Petabytes

I’m still trying to figure out what all this means, to be honest, but Jeremy Zawodny helps to break it down. In this interview, he gets some answers from Arnab Bhattacharjee (manager of the Yahoo! Webmap Team) and Sameer Paranjpye (manager of our Hadoop development):

The Hadoop project is opening up a really interesting discussion around computing scale. A few years ago I never would have imagined that the open source world would be contributing software solutions like this to the market. I don’t know why I had that perception, really. Perhaps all the positioning by enterprise software companies to discredit open source software started to sink in.

As Jeremy said, “It’s not just an experiment or research project. There’s real money on the line.”

For more background on what’s going on here, check out this article by Mark Chu-Carroll, “Databases are hammers; MapReduce is a screwdriver”.

This story is going to get bigger, I’m certain.
