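Vector data ingestion
To import vector data into Lindorm (HBase), see Quick Start.
Raster data ingestion
- Pipeline technology
The Pipeline model is an ETL technology that DLA Ganos built on top of the open-source GeoTrellis project for fast loading, processing, and ingestion of raster data.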
The Pipeline model consists of a series of functional modules, such as reading data (Load), transforming it (Transform), and saving it (Save). A DLA Ganos pipeline is usually expressed in JSON: the pipeline is an array of the steps to execute, and each step is itself a JSON object, called a Stage Object. The entire DLA Ganos ingestion flow and all of its parameters are defined in this single JSON description. A simple JSON script looks like this:
[
  {
    "uri" : "OSS resource URI",
    "type" : "singleband.spatial.read.oss"
  },
  {
    "resample_method" : "nearest-neighbor",
    "type" : "singleband.spatial.transform.tile-to-layout"
  },
  {
    "crs" : "EPSG:3857",
    "scheme" : {
      "crs" : "epsg:3857",
      "tileSize" : 256,
      "resolutionThreshold" : 0.1
    },
    "resample_method" : "nearest-neighbor",
    "type" : "singleband.spatial.transform.buffered-reproject"
  },
  {
    "end_zoom" : 0,
    "resample_method" : "nearest-neighbor",
    "type" : "singleband.spatial.transform.pyramid"
  },
  {
    "name" : "mask",
    "uri" : "oss://geotrellis-test/colingw/pipeline/",
    "key_index_method" : {
      "type" : "zorder"
    },
    "scheme" : {
      "crs" : "epsg:3857",
      "tileSize" : 256,
      "resolutionThreshold" : 0.1
    },
    "type" : "singleband.spatial.write"
  }
]
- Ingestion workflow
- Import the required dependencies.
import geotrellis.layer._
import geotrellis.spark.pipeline._
import geotrellis.spark.pipeline.json._
import geotrellis.spark._
import geotrellis.spark.store.kryo.KryoRegistrator
import org.apache.spark.{SparkConf, SparkContext}
import scala.util.{Failure, Try}
- Initialize the Spark environment.
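// "local[*]" runs Spark locally on all available cores; point setMaster at your cluster for real workloads.
val conf = new SparkConf()
  .setMaster("local[*]")
  .setAppName("Spark Tiler")
  // Use Kryo serialization with the GeoTrellis class registrator.
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", classOf[KryoRegistrator].getName)
// Raise the Kryo buffer limit so large tiles can be serialized.
conf.set("spark.kryoserializer.buffer.max", "2047m")
implicit val sc = new SparkContext(conf)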
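- Define the pipeline JSON description. The following example is a simple pipeline model that defines these operations:
- The URI of the input files and the loading driver.
- The tiling mode (tile-to-layout).
- Data transformations such as reprojection.
- The write destination (Lindorm).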
val pipeline: String =
  """
    |[
    |  {
    |    "uri" : "OSS resource URI",
    |    "time_tag" : "TIFFTAG_DATETIME",
    |    "time_format" : "yyyy:MM:dd HH:mm:ss",
    |    "type" : "singleband.spatial.read.hadoop"
    |  },
    |  {
    |    "resample_method" : "nearest-neighbor",
    |    "type" : "singleband.spatial.transform.tile-to-layout"
    |  },
    |  {
    |    "crs" : "EPSG:3857",
    |    "scheme" : {
    |      "crs" : "EPSG:3857",
    |      "tileSize" : 256,
    |      "resolutionThreshold" : 0.1
    |    },
    |    "resample_method" : "nearest-neighbor",
    |    "type" : "singleband.spatial.transform.buffered-reproject"
    |  },
    |  {
    |    "end_zoom" : 0,
    |    "resample_method" : "nearest-neighbor",
    |    "type" : "singleband.spatial.transform.pyramid"
    |  },
    |  {
    |    "name" : "srtm",
    |    "uri" : "hbase://localhost:2181?master=localhost&attributes=attributes&layers=srtm-tms-layers",
    |    "pyramid" : true,
    |    "key_index_method" : {
    |      "type" : "zorder"
    |    },
    |    "scheme" : {
    |      "tileCols" : 256,
    |      "tileRows" : 256
    |    },
    |    "type" : "singleband.spatial.write"
    |  }
    |]
  """.stripMargin
- Run the pipeline model.
// First parse the JSON pipeline description into a list of pipeline expressions:
val list: List[PipelineExpr] = pipeline.pipelineExpr match {
  case Right(r) => r
  case Left(e)  => throw e
}
// Then evaluate the pipeline:
val erasedNode = list.erasedNode
Try {
  erasedNode.eval[Stream[(Int, TileLayerRDD[SpatialKey])]]
} match {
  case Failure(e) => println("Pipeline run failed"); throw e
  case _          =>
}
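The pipeline JSON above hard-codes the source URI, layer name, and HBase address. A minimal sketch, assuming you want to fill those values in from code: `PipelineTemplate.singlebandIngest` is a hypothetical helper (not a DLA Ganos or GeoTrellis API) that only substitutes parameters into a string with the same stages as the example. It does not validate the JSON and does not escape values, so the arguments must not contain double quotes.

```scala
// Hypothetical helper, not part of DLA Ganos or GeoTrellis: produces the same
// read -> tile-to-layout -> buffered-reproject -> pyramid -> write stages as
// the example pipeline, with the variable parts passed in as parameters.
object PipelineTemplate {
  def singlebandIngest(
      sourceUri: String,               // URI of the input rasters, e.g. on OSS
      layerName: String,               // name of the output layer
      targetUri: String,               // write destination, e.g. an hbase:// URI
      targetCrs: String = "EPSG:3857", // CRS used for reprojection and the scheme
      tileSize: Int = 256,             // tile width/height in pixels
      endZoom: Int = 0                 // last pyramid level to build
  ): String =
    s"""[
       |  { "uri" : "$sourceUri", "type" : "singleband.spatial.read.hadoop" },
       |  { "resample_method" : "nearest-neighbor",
       |    "type" : "singleband.spatial.transform.tile-to-layout" },
       |  { "crs" : "$targetCrs",
       |    "scheme" : { "crs" : "$targetCrs", "tileSize" : $tileSize, "resolutionThreshold" : 0.1 },
       |    "resample_method" : "nearest-neighbor",
       |    "type" : "singleband.spatial.transform.buffered-reproject" },
       |  { "end_zoom" : $endZoom, "resample_method" : "nearest-neighbor",
       |    "type" : "singleband.spatial.transform.pyramid" },
       |  { "name" : "$layerName", "uri" : "$targetUri", "pyramid" : true,
       |    "key_index_method" : { "type" : "zorder" },
       |    "scheme" : { "tileCols" : $tileSize, "tileRows" : $tileSize },
       |    "type" : "singleband.spatial.write" }
       |]""".stripMargin
}
```

The resulting string can be used in place of the literal `pipeline` value before the parsing step, for example `val pipeline = PipelineTemplate.singlebandIngest("oss://bucket/path/", "srtm", "hbase://...")`.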
Configuration reference
Data loading objects
{
"uri" : "{oss| file | hdfs | ...}://...",
"time_tag" : "TIFFTAG_DATETIME", // optional field
"time_format" : "yyyy:MM:dd HH:mm:ss", // optional field
"type" : "{singleband | multiband}.{spatial | temporal}.read.{oss | hadoop}"
}
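Parameters:
Key | Value |
---|---|
uri | URI of the raster data source |
time_tag | Name of the time tag in the dataset metadata (optional) |
time_format | Format of the time tag value (optional) |
type | Operation type |
Note: Only two types of readers are available: reading from OSS, or reading from a Hadoop-supported file system through the Hadoop API.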
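`time_tag` names the metadata field that holds the acquisition time (here the TIFF DateTime tag), and `time_format` is the pattern used to parse its value. A minimal sketch of that parsing step, assuming the pattern follows `java.time.format.DateTimeFormatter` syntax and that the value is interpreted as UTC; both are assumptions of this example, so check how your DLA Ganos version handles them. `TimeTag.toEpochMillis` is a hypothetical name.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Sketch: turn a TIFFTAG_DATETIME value into epoch milliseconds using the
// "time_format" pattern. Treating the value as UTC is an assumption here.
object TimeTag {
  def toEpochMillis(value: String, timeFormat: String): Long = {
    val formatter = DateTimeFormatter.ofPattern(timeFormat)
    LocalDateTime.parse(value, formatter).toInstant(ZoneOffset.UTC).toEpochMilli
  }
}
```

For example, `TimeTag.toEpochMillis("2020:01:01 00:00:00", "yyyy:MM:dd HH:mm:ss")` yields the instant 2020-01-01T00:00:00Z in milliseconds.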
Data writing objects
{
"name" : "layerName",
"uri" : "{oss| file | hdfs | ...}://...",
"key_index_method" : {
"type" : "{zorder | hilbert}",
"temporal_resolution": 1 // optional, if set - temporal index is used
},
"scheme" : {
"crs" : "epsg:3857",
"tileSize" : 256,
"resolutionThreshold" : 0.1
},
"type" : "{singleband | multiband}.{spatial | temporal}.write"
}
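Parameters:
Key | Value |
---|---|
uri | URI of the storage location to write to |
name | Layer name |
key_index_method | Key index method used to generate an index from the spatial key (SpatialKey) |
key_index_method.type | Space-filling curve type: zorder, row-major, hilbert |
key_index_method.temporal_resolution | Temporal resolution in milliseconds (optional; if set, a temporal index is used) |
scheme | Target layout scheme |
scheme.crs | CRS of the target scheme |
scheme.tileSize | Tile size of the layout scheme |
scheme.resolutionThreshold | Resolution threshold of the user-defined layout scheme (optional) |
Note: Only two types of writers are available: writing to OSS, or writing to a Hadoop-supported file system through the Hadoop API.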
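`key_index_method.type` decides how a tile key `(col, row)` becomes a one-dimensional index in the store, and `temporal_resolution` groups timestamps into time bins. The sketch below contrasts plain Morton (Z-order) bit interleaving with row-major order and shows time binning in milliseconds. It is a generic illustration under these assumptions, not the exact GeoTrellis or DLA Ganos key index, which also handles key bounds and space-time keys; `KeyIndexSketch` is a hypothetical name.

```scala
// Generic illustration of two key orders for a tile key (col, row).
object KeyIndexSketch {
  // Z-order (Morton): interleave the bits, col on even bits, row on odd bits.
  def zorder(col: Int, row: Int): Long = {
    var index = 0L
    for (bit <- 0 until 31) {
      index |= ((col.toLong >> bit) & 1L) << (2 * bit)
      index |= ((row.toLong >> bit) & 1L) << (2 * bit + 1)
    }
    index
  }

  // Row-major: number tiles row by row, left to right.
  def rowMajor(col: Int, row: Int, layoutCols: Int): Long =
    row.toLong * layoutCols + col

  // Timestamps inside the same temporal_resolution window share one bin.
  def timeBin(epochMillis: Long, temporalResolutionMs: Long): Long =
    epochMillis / temporalResolutionMs
}
```

With Z-order, the four neighbouring tiles (0,0), (1,0), (0,1), (1,1) receive the consecutive indexes 0 to 3, so spatially close tiles stay close in the key space; row-major order only keeps horizontal neighbours adjacent.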
Data transformation objects
- Tile To Layout
{ "resample_method" : "nearest-neighbor", "type" : "{singleband | multiband}.{spatial | temporal}.transform.tile-to-layout" }
Note: Converts RDD[({ProjectedExtent | TemporalProjectedExtent}, {Tile | MultibandTile})] into RDD[({SpatialKey | SpaceTimeKey}, {Tile | MultibandTile})]. Parameters:
Key | Options |
---|---|
resample_method | Resampling method: nearest-neighbor, bilinear, cubic-convolution, cubic-spline, lanczos |
- ReTile To Layout
{
  "layout_definition": {
    "extent": [0, 0, 1, 1],
    "tileLayout": {
      "layoutCols": 1,
      "layoutRows": 1,
      "tileCols": 1,
      "tileRows": 1
    }
  },
  "resample_method" : "nearest-neighbor",
  "type" : "{singleband | multiband}.{spatial | temporal}.transform.retile-to-layout"
}
Note: Retiles an RDD[({SpatialKey | SpaceTimeKey}, {Tile | MultibandTile})] according to the user-configured layout definition.
- Buffered Reproject
{
  "crs" : "EPSG:3857",
  "scheme" : {
    "crs" : "epsg:3857",
    "tileSize" : 256,
    "resolutionThreshold" : 0.1
  },
  "resample_method" : "nearest-neighbor",
  "type" : "{singleband | multiband}.{spatial | temporal}.transform.buffered-reproject"
}
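Note: Reprojects an RDD[({SpatialKey | SpaceTimeKey}, {Tile | MultibandTile})] into tiles in the target CRS according to the user-configured layout scheme. Parameters:
Key | Options |
---|---|
crs | Target CRS |
scheme.crs | CRS of the target layout scheme |
scheme.tileSize | Tile size of the layout scheme |
scheme.resolutionThreshold | Resolution threshold of the user-defined layout scheme (optional) |
resample_method | Resampling method: nearest-neighbor, bilinear, cubic-convolution, cubic-spline, lanczos |
- Per Tile Reproject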
{
  "crs" : "EPSG:3857",
  "scheme" : {
    "crs" : "epsg:3857",
    "tileSize" : 256,
    "resolutionThreshold" : 0.1
  },
  "resample_method" : "nearest-neighbor",
  "type" : "{singleband | multiband}.{spatial | temporal}.transform.per-tile-reproject"
}
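Note: Reprojects an RDD[({ProjectedExtent | TemporalProjectedExtent}, {Tile | MultibandTile})] into tiles in the target CRS according to the user-configured layout scheme. Parameters:
Key | Options |
---|---|
crs | Target CRS |
scheme | Target layout scheme |
scheme.crs | CRS of the target layout scheme |
scheme.tileSize | Tile size of the layout scheme |
scheme.resolutionThreshold | Resolution threshold of the user-defined layout scheme (optional) |
resample_method | Resampling method: nearest-neighbor, bilinear, cubic-convolution, cubic-spline, lanczos |
- Pyramid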
{ "end_zoom" : 0, "resample_method" : "nearest-neighbor", "type" : "{singleband | multiband}.{spatial | temporal}.transform.pyramid" }
Note: Builds a pyramid from an RDD[({SpatialKey | SpaceTimeKey}, {Tile | MultibandTile})], level by level up to the zoom level given by end_zoom, and returns Stream[RDD[({SpatialKey | SpaceTimeKey}, {Tile | MultibandTile})]].
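`tile-to-layout` and `retile-to-layout` both assign raster data to the grid cells of a layout definition (`extent` plus `tileLayout`). A minimal sketch, assuming an axis-aligned grid whose row 0 is the top row of the extent (the convention GeoTrellis uses for `SpatialKey`), of how a map coordinate maps to a `(col, row)` key and how large one pixel is. The case classes mirror the JSON fields and are hypothetical; this is not the DLA Ganos implementation.

```scala
// Hypothetical mirrors of the "layout_definition" fields in the JSON above.
final case class Extent(xmin: Double, ymin: Double, xmax: Double, ymax: Double)
final case class TileLayout(layoutCols: Int, layoutRows: Int, tileCols: Int, tileRows: Int)

object LayoutSketch {
  // (col, row) of the layout tile containing the point. Assumes the point lies
  // inside the extent; points on the max edges are clamped into the last
  // column/row (a choice made for this sketch).
  def keyFor(x: Double, y: Double, extent: Extent, layout: TileLayout): (Int, Int) = {
    val tileWidth  = (extent.xmax - extent.xmin) / layout.layoutCols
    val tileHeight = (extent.ymax - extent.ymin) / layout.layoutRows
    val col = math.floor((x - extent.xmin) / tileWidth).toInt
    val row = math.floor((extent.ymax - y) / tileHeight).toInt // row 0 at the top
    (math.min(col, layout.layoutCols - 1), math.min(row, layout.layoutRows - 1))
  }

  // Ground size of one pixel in x and y.
  def cellSize(extent: Extent, layout: TileLayout): (Double, Double) =
    ((extent.xmax - extent.xmin) / (layout.layoutCols * layout.tileCols),
     (extent.ymax - extent.ymin) / (layout.layoutRows * layout.tileRows))
}
```

For the retile-to-layout example above (extent `[0, 0, 1, 1]`, one 1 × 1 tile), every point maps to key `(0, 0)` and one pixel covers the whole extent.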
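About Layout Schemes
DLA Ganos supports two layout scheme modes:
- ZoomedLayoutScheme matches a TMS pyramid. Important: ZoomedLayoutScheme needs the world extent derived from the CRS in order to build the TMS pyramid layout. This may require resampling the input raster to match the resolution of a TMS level.
- FloatingLayoutScheme matches the native resolution of the input raster. Important: FloatingLayoutScheme discovers the native resolution and extent and partitions the raster by the given tile size, without resampling.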
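To see why ZoomedLayoutScheme may resample, compare a raster's native cell size with the fixed cell size of each TMS level. A minimal sketch for EPSG:3857, assuming a square world extent of ±20037508.342789244 m, 2^z × 2^z tiles at zoom z, and a "first zoom at least as fine as the input" selection rule; the real level selection also involves `resolutionThreshold`, which is not modeled here. It also lists the levels a `pyramid` stage would walk through down to `end_zoom`. `ZoomedSchemeSketch` is a hypothetical name.

```scala
object ZoomedSchemeSketch {
  // Half-width of the EPSG:3857 world extent, in metres.
  val WorldHalfWidth = 20037508.342789244

  // Cell size (m per pixel) at zoom z: world width / (2^z tiles * tileSize pixels).
  def resolution(zoom: Int, tileSize: Int = 256): Double =
    2 * WorldHalfWidth / ((1L << zoom) * tileSize)

  // Smallest zoom whose cell size is no coarser than the native cell size;
  // the input would be resampled to that zoom's resolution (assumed rule).
  def zoomFor(nativeCellSize: Double, tileSize: Int = 256, maxZoom: Int = 30): Int =
    (0 to maxZoom).find(z => resolution(z, tileSize) <= nativeCellSize).getOrElse(maxZoom)

  // Zoom levels visited when pyramiding from baseZoom down to endZoom.
  def pyramidLevels(baseZoom: Int, endZoom: Int): Seq[Int] =
    baseZoom to endZoom by -1
}
```

Under these assumptions, a raster with 30 m cells lands on zoom 13 (about 19.1 m per pixel), so it is resampled; FloatingLayoutScheme would instead keep the 30 m cells.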