kd_tree_kNN classification

#!/usr/bin/env python
# -*- coding: utf-8 -*-
 
'''
file: kd_tree_kNN.py
author: xjump.me#at#gmail#dot#com

REF:
  http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.html
'''

import numpy as np
import operator
from scipy.spatial import cKDTree as KDTree

import sys
sys.setrecursionlimit(10000)

if __name__=="__main__":
  v0 = np.array([1,2,3,4,5,6])
  train_data_set = np.array([
    [1.2,3,6,7,3,2],
    [2,9,17,7,6,59],
    [1.2,44,6,3,3,23],
    [9,3,51,7,3,100],
    [18,4,39,7,3,21],
    [66,8,28,7,3,88],
    [3,1,2,7,3,33],
    [24,0.5,1,7,3,56],
    [22,99,7,7,3,0.6],
    [70,13,9,7,3,2],
  ])
  tree = KDTree(train_data_set)
  for p in range(1,10):
    print('p =', p)
    # query test sample v0 for its 3 nearest neighbours under the Minkowski p-norm
    # (p=1: Manhattan, p=2: Euclidean); returns (distances, indices)
    print(tree.query(v0, k=3, p=p))
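
The script above only retrieves the nearest neighbours of v0; turning that into kNN classification also needs a class label for every training row and a vote among the k neighbours. Below is a minimal sketch assuming plain majority voting and hypothetical labels (the original data has none); `knn_classify` is a name made up for illustration. It relies only on `cKDTree.query`, whose `p` argument selects the Minkowski norm.

```python
import numpy as np
from collections import Counter
from scipy.spatial import cKDTree


def knn_classify(tree, labels, x, k=3, p=2):
    """Hypothetical helper: predict a label for x by majority vote
    among its k nearest neighbours.

    p selects the Minkowski distance used by cKDTree.query
    (p=1: Manhattan, p=2: Euclidean).
    """
    _, idx = tree.query(x, k=k, p=p)
    idx = np.atleast_1d(idx)  # query returns a scalar index when k == 1
    votes = Counter(labels[i] for i in idx)
    # In Python 3.7+ Counter.most_common keeps first-seen order on ties, and
    # idx is sorted by distance, so among tied labels the nearer one wins.
    return votes.most_common(1)[0][0]


# Training points copied from the script above
train_data_set = np.array([
    [1.2, 3, 6, 7, 3, 2],
    [2, 9, 17, 7, 6, 59],
    [1.2, 44, 6, 3, 3, 23],
    [9, 3, 51, 7, 3, 100],
    [18, 4, 39, 7, 3, 21],
    [66, 8, 28, 7, 3, 88],
    [3, 1, 2, 7, 3, 33],
    [24, 0.5, 1, 7, 3, 56],
    [22, 99, 7, 7, 3, 0.6],
    [70, 13, 9, 7, 3, 2],
])
# Hypothetical class labels, one per training row (assumption, not in the original)
labels = ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'A', 'A', 'B']

tree = cKDTree(train_data_set)
v0 = np.array([1, 2, 3, 4, 5, 6])
print(knn_classify(tree, labels, v0, k=3, p=2))  # neighbours 0, 6, 4 -> 'A'
print(knn_classify(tree, labels, v0, k=3, p=1))  # neighbours 0, 6, 2 -> 'B'
```

With these made-up labels the choice of p changes the answer: under p=2 the three nearest rows are 0, 6 and 4, while under p=1 row 2 displaces row 4, which flips the vote.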

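The script treats cKDTree as a black box. To show the idea behind it, here is a minimal textbook kd-tree written from scratch for illustration: it splits at the median, cycles through the axes, and finds the single nearest neighbour under Euclidean distance, pruning a subtree when the splitting plane is farther away than the best match so far. It is not SciPy's implementation; the names `build` and `nearest` are hypothetical, and `math.dist` assumes Python 3.8+.

```python
import math


def build(points, depth=0):
    """Build a kd-tree; each node is (point, axis, left, right), or None."""
    if not points:
        return None
    axis = depth % len(points[0])              # cycle through the dimensions
    points = sorted(points, key=lambda pt: pt[axis])
    mid = len(points) // 2                     # the median point splits this axis
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return (distance, point) for the stored point closest to target (Euclidean)."""
    if node is None:
        return best
    point, axis, left, right = node
    d = math.dist(point, target)
    if best is None or d < best[0]:
        best = (d, point)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)         # search the side holding target first
    # The far subtree can only contain a closer point if the splitting
    # plane is nearer to target than the best distance found so far.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
print(nearest(tree, (9, 2)))   # nearest point is (8, 1), at distance sqrt(2)
```

The pruning test is what makes the search faster than brute force: every point on the far side differs from the target by at least `abs(diff)` along the splitting axis, so that side can be skipped once a closer match is known.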
An introduction to HBase technology (reposted)

HBase overview

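HBase (Hadoop Database) is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, large-scale structured storage clusters can be built on inexpensive commodity PC servers.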

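HBase is an open-source implementation of Google Bigtable. Just as Google Bigtable uses GFS as its file storage system, HBase uses Hadoop HDFS; Google runs MapReduce to process the massive data stored in Bigtable, and HBase likewise uses Hadoop MapReduce to process the massive data stored in HBase; Google Bigtable uses Chubby as its coordination service, and HBase uses ZooKeeper for the same role.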

HBase introduction (reposted)

1. Overview

History

  • Started by Chad Walters and Jim
  • 2006.11: Google publishes the Bigtable paper
  • 2007.2: initial HBase prototype created as a Hadoop contrib module
  • 2007.10: first usable HBase
  • 2008.1: Hadoop becomes an Apache top-level project and HBase becomes a Hadoop subproject
  • 2008.10: HBase 0.18 and 0.19 released

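HBase is an open-source clone of Bigtable. It is built on top of HDFS and provides a database system with high reliability, high performance, column-oriented storage, scalability, and real-time reads and writes.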
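The Bigtable paper describes its data model as a sparse, distributed, persistent multidimensional sorted map indexed by row key, column key and timestamp; HBase adopts the same model, grouping columns into column families that are declared when a table is created and keeping a configurable number of versions per cell. The sketch below models only that logical map, in memory, to make "column-oriented" and "real-time reads and writes" concrete. `ToyHTable` and its methods are hypothetical and are not the HBase client API; distribution, persistence in HDFS, regions and ZooKeeper coordination are all left out. It assumes ASCII string row keys, for which Python's string order matches HBase's byte-wise row ordering.

```python
import bisect
import time
from collections import defaultdict


class ToyHTable:
    """Hypothetical in-memory model of an HBase-style table (not the HBase API).

    Logically: row key -> "family:qualifier" -> {timestamp: value}.
    Row keys are kept sorted so that range scans come back in key order.
    """

    def __init__(self, families, max_versions=3):
        self.families = set(families)   # column families are fixed up front
        self.max_versions = max_versions
        self.rows = {}                  # row key -> {column -> {ts: value}}
        self.sorted_keys = []           # row keys in lexicographic order

    def put(self, row, column, value, ts=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError("unknown column family: " + family)
        if row not in self.rows:
            bisect.insort(self.sorted_keys, row)
            self.rows[row] = defaultdict(dict)
        ts = time.time_ns() if ts is None else ts
        cells = self.rows[row][column]
        cells[ts] = value
        # keep only the newest max_versions versions of this cell
        for old in sorted(cells)[:-self.max_versions]:
            del cells[old]

    def get(self, row, column, ts=None):
        """Return the newest value at or before ts (newest overall if ts is None)."""
        cells = self.rows.get(row, {}).get(column, {})
        candidates = [t for t in cells if ts is None or t <= ts]
        return cells[max(candidates)] if candidates else None

    def scan(self, start, stop):
        """Yield (row, {column: newest value}) for start <= row < stop, in key order."""
        lo = bisect.bisect_left(self.sorted_keys, start)
        hi = bisect.bisect_left(self.sorted_keys, stop)
        for row in self.sorted_keys[lo:hi]:
            yield row, {c: v[max(v)] for c, v in self.rows[row].items() if v}


table = ToyHTable(["info"], max_versions=2)
table.put("user1", "info:name", "Alice", ts=1)
table.put("user1", "info:name", "Alicia", ts=2)
print(table.get("user1", "info:name"))        # newest version: Alicia
print(table.get("user1", "info:name", ts=1))  # older version: Alice
table.put("user0", "info:email", "x@example.com")
print(list(table.scan("user0", "user2")))
```

Sparseness falls out of the nested dictionaries: a row stores only the columns that were actually written, so `user0` and `user1` above share a family but not a column.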