This is a purely technical blog concerning topics such as Python, Ruby, Scala, Go, Linux, open source software, the Web, and lesser-known programming languages. Ad maiorem Dei gloriam inque hominum salutem.
Monday, June 25, 2007
Random Comments from Google Developer Day
I went to Google Developer Day. Yeah, yeah, I know, that was weeks ago, and I'm only finally blogging about it now. Better late than never! Here are some random, sparse comments:
Google Infrastructure Talk
Google was still at Stanford in '97. Their current server design goes back to not using cases, and they're still using low-end hardware. Note that GFS is not at the kernel level. They have 200+ clusters. MapReduce is not used for user search; it's more for heavy-duty tasks like indexing. BigTable is pretty amazing. It's a distributed, multi-dimensional, sparse map. They have fine-grained load balancing and fast recovery. They have distributed locks and a locking service. Their largest [BigTable?] is 3000TB on several thousand machines. I asked, and the speaker said that open sourcing GFS "isn't unthinkable".
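To get a feel for what a "multi-dimensional, sparse map" means, here is a tiny single-process sketch in Python. It is only a toy, not Google's implementation: the class name SparseTable and its methods are made up for illustration, and the choice of dimensions (row key, column key, timestamp) is my assumption based on how BigTable is described publicly, not something from my notes. The distributed part (tablets, load balancing, recovery, locking) is ignored entirely.

```python
from collections import defaultdict


class SparseTable:
    """Toy in-memory stand-in for a sparse, multi-dimensional map.

    Hypothetical names throughout; nothing here is a real BigTable API.
    """

    def __init__(self):
        # row key -> column key -> {timestamp: value}.
        # Only cells that were actually written take up space (sparse).
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the value at `timestamp`, or the newest one if omitted."""
        versions = self._rows.get(row, {}).get(column)
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def cell_count(self):
        """Number of stored (row, column, timestamp) cells."""
        return sum(
            len(versions)
            for columns in self._rows.values()
            for versions in columns.values()
        )


table = SparseTable()
table.put("com.example/index.html", "contents", 1, "<html>v1</html>")
table.put("com.example/index.html", "contents", 2, "<html>v2</html>")
table.put("com.example/about.html", "language", 1, "en")

print(table.get("com.example/index.html", "contents"))     # newest version
print(table.get("com.example/index.html", "contents", 1))  # older version
print(table.get("com.example/about.html", "contents"))     # never written
print(table.cell_count())
```

The nested dictionaries are what make it sparse: a row with one column costs one entry, no matter how many columns other rows have.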
Google Web Toolkit
Alex Martelli's Design Patterns in Python Talk
This is the third time I've seen this talk, and this time I was able to understand everything he said ;)
Theorizing the Data: Avoiding the Capital Mistake
This was a great talk about statistical approaches to linguistics. Probability stats papers were really big at the ACM in 2006. Everyone is fighting the spam problem. The speaker emphasized that more data leads to better results, which is why he went to Google. Lots of data results in good machine learning, which results in more usable language translations. In trying to do automated translations, nothing matters more than statistics. Getting hints from linguists wasn't all that helpful when they tried it. It would appear that humans may learn language by having a statistical understanding of patterns; after all, there are too many rules with too many exceptions.
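As a rough illustration of the "statistics instead of rules" idea, here is a minimal bigram sketch in Python. This is my own toy example, not anything shown in the talk: the function names and the three-sentence corpus are made up, and real translation systems are far more involved. It just counts which word follows which, and turns the counts into probabilities.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, how often each other word follows it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probability(counts, prev, nxt):
    """Estimate P(nxt | prev) from raw counts; 0.0 if prev was never seen."""
    followers = counts.get(prev)
    if not followers:
        return 0.0
    return followers[nxt] / sum(followers.values())


# Hypothetical toy corpus; more sentences would sharpen the estimates.
corpus = [
    "the cat sat",
    "the cat ran",
    "the dog sat",
]

model = train_bigrams(corpus)
print(next_word_probability(model, "the", "cat"))
print(next_word_probability(model, "cat", "sat"))
print(next_word_probability(model, "dog", "ran"))
```

No grammar rule is written down anywhere; whatever regularities (and exceptions) exist show up in the counts, and adding more text only changes the numbers, not the code.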