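Accessing DBpedia file contents with Titan-hadoop
Environment: CentOS, Titan-0.5.0-Hadoop2
Titan-hadoop supports reading RDF in the N_TRIPLES format. Download an .nt file from DBpedia (for example: ) and write a properties file for reading it, as follows: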
[cloudera@localhost titan-0.5.0-hadoop2]$ vi conf/hadoop/rdf-input.properties
# input graph parameters
titan.hadoop.input.format=com.thinkaurelius.titan.hadoop.formats.edgelist.rdf.RDFInputFormat
titan.hadoop.input.location=examples/labels_en_uris_zh.nt
titan.hadoop.input.conf.format=N_TRIPLES
titan.hadoop.input.conf.as-properties=
titan.hadoop.input.conf.use-localname=true
titan.hadoop.input.conf.literal-as-property=true
# output data parameters
titan.hadoop.output.format=com.thinkaurelius.titan.hadoop.formats.graphson.GraphSONOutputFormat
titan.hadoop.sideeffect.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
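The following minimal Python sketch makes the `use-localname` and `literal-as-property` options more concrete by showing how N-Triples lines could be folded into per-vertex property maps. It is not Titan's `RDFInputFormat`. The regex, the helper names (`local_name`, `parse_triple`, `triples_to_vertices`), the rule that a local name is the text after the last `#` or `/`, and the `name`/`uri` keys (borrowed from the query output in this post) are all assumptions for illustration. The regex accepts only IRIs and plain, language-tagged, or typed literals. It does not handle blank nodes and does not decode `\uXXXX` escapes. The real job names the label property `label_`, and the sketch does not reproduce that renaming.

```python
import re

# Simplified N-Triples line: <subject> <predicate> (<object-IRI> | "literal"[@lang | ^^<datatype>]) .
# Assumptions: no blank nodes; \uXXXX escapes inside literals are left undecoded.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+'
    r'(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
    r'\s*\.\s*$'
)


def local_name(uri):
    """Text after the last '#' or '/', assumed to be what use-localname keeps."""
    return re.split(r'[#/]', uri)[-1]


def parse_triple(line):
    """Return (subject, predicate, object_iri, literal) or None for unsupported lines."""
    m = TRIPLE.match(line.strip())
    return m.groups() if m else None


def triples_to_vertices(lines):
    """Group triples by subject into property maps (hypothetical helper)."""
    vertices = {}
    for line in lines:
        parsed = parse_triple(line)
        if parsed is None:
            continue  # comments, blank lines, blank nodes, ...
        subj, pred, obj_iri, literal = parsed
        props = vertices.setdefault(subj, {"name": local_name(subj), "uri": subj})
        if literal is not None:
            # literal-as-property=true: the literal becomes a property on the
            # subject vertex, keyed by the predicate's local name
            props[local_name(pred)] = literal
        # An IRI object (obj_iri) would become an edge in the real input format;
        # edges are left out of this sketch.
    return vertices


# Hypothetical input line, reconstructed from one of the vertices printed in the query output.
sample = ('<http://dbpedia.org/resource/Lene_Lai> '
          '<http://www.w3.org/2000/01/rdf-schema#label> "賴琳恩"@zh .')
vertices = triples_to_vertices([sample, "# a comment line"])
print(sorted(vertices["http://dbpedia.org/resource/Lene_Lai"]))  # ['label', 'name', 'uri']
```

With `literal-as-property=true`, a labels-only file like this one yields vertices that carry properties but no edges. This matches the zero edge counters in the job log.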
Query the data:
[cloudera@localhost titan-0.5.0-hadoop2]$ gremlin.sh
gremlin> g = HadoopFactory.open("conf/hadoop/rdf-input.properties")
gremlin> g.V.map()
......
17:37:12 INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
17:37:12 INFO org.apache.hadoop.mapred.Task - Task 'attempt_local1370056218_0005_r_000000_0' done.
17:37:13 INFO org.apache.hadoop.mapreduce.Job - Job job_local1370056218_0005 completed successfully
17:37:13 INFO org.apache.hadoop.mapreduce.Job - Counters: 35
  File System Counters
    FILE: Number of bytes read=2911187173
    FILE: Number of bytes written=3038059762
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
  Map-Reduce Framework
    Map input records=405909
    Map output records=405909
    Map output bytes=65118176
    Map output materialized bytes=66297322
    Input split bytes=268
    Combine input records=405909
    Combine output records=405909
    Reduce input groups=405909
    Reduce shuffle bytes=0
    Reduce input records=405909
    Reduce output records=0
    Spilled Records=811818
    Shuffled Maps =0
    Failed Shuffles=0
    Merged Map outputs=0
    GC time elapsed (ms)=5136
    CPU time spent (ms)=0
    Physical memory (bytes) snapshot=0
    Virtual memory (bytes) snapshot=0
    Total committed heap usage (bytes)=2091909120
  com.thinkaurelius.titan.hadoop.formats.edgelist.EdgeListInputMapReduce$Counters
    IN_EDGES_CREATED=0
    OUT_EDGES_CREATED=0
    VERTEX_PROPERTIES_CREATED=1217727
    VERTICES_CREATED=405909
    VERTICES_EMITTED=405909
  com.thinkaurelius.titan.hadoop.mapreduce.transform.PropertyMapMap$Counters
    VERTICES_PROCESSED=405909
  com.thinkaurelius.titan.hadoop.mapreduce.transform.VerticesMap$Counters
    EDGES_PROCESSED=0
    VERTICES_PROCESSED=405909
  File Input Format Counters
    Bytes Read=54114517
  File Output Format Counters
    Bytes Written=0
==>47994559900176 {label_=[慾望], _id=[47994559900176], name=[Want], uri=[]}
==>60888991522182 {label_=[無機化學命名法], _id=[60888991522182], name=[IUPAC_nomenclature_of_inorganic_chemistry], uri=[]}
==>78841791384159 {label_=[諾伊斯塔特-格萊韋], _id=[78841791384159], name=[Neustadt-Glewe], uri=[]}
==>78961407639797 {label_=[打狗英國領事館文化園區], _id=[78961407639797], name=[Former_British_Consulate_at_Takao], uri=[]}
==>95522075072286 {label_=[賴琳恩], _id=[95522075072286], name=[Lene_Lai], uri=[]}
==>153451821264409 {label_=[唐古韭], _id=[153451821264409], name=[Allium_tanguticum], uri=[http://dbpedia.org/resource/Allium_tanguticum]}
==>154857715280524 {label_=[溫帶], _id=[154857715280524], name=[Temperate_climate], uri=[]}
==>166027168671115 {label_=[GSh-18手槍], _id=[166027168671115], name=[GSh-18], uri=[]}
==>166513572484984 {label_=[WMA], _id=[166513572484984], name=[WMA], uri=[]}
==>182078824443170 {label_=[保羅·納斯], _id=[182078824443170], name=[Paul_Nurse], uri=[]}
==>211356647821663 {label_=[克魯克斯頓 (明尼蘇達州)], _id=[211356647821663], name=[Crookston,_Minnesota], uri=[]}
==>222227245802710 {label_=[我的女友是九尾狐], _id=[222227245802710], name=[My_Girlfriend_Is_a_Nine-Tailed_Fox], uri=[]}
==>229972043766751 {label_=[李天榮], _id=[229972043766751], name=[Wilson_Lee_Flores], uri=[]}
==>247488956381743 {label_=[1,2-雙(二異丙基膦)乙烷], _id=[247488956381743], name=[1,2-Bis(diisopropylphosphino)ethane], uri=[(diisopropylphosphino)ethane]}
==>264200262547493 {label_=[欽迪龍屬], _id=[264200262547493], name=[Chindesaurus], uri=[]}
==>...
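The counters in the log can be cross-checked against the printed vertices. VERTEX_PROPERTIES_CREATED (1217727) is exactly 3 times VERTICES_CREATED (405909). This fits the three keys label_, name and uri on every printed row, where the _id value simply repeats the vertex id. Both IN_EDGES_CREATED and OUT_EDGES_CREATED are 0, which is consistent with the label literals being stored as properties. The Python sketch below parses one printed row and performs that arithmetic check. The regexes and the helper names `parse_result` and `created_property_keys` are hypothetical and are not part of Titan or Gremlin. The rule that `_id` is not counted as a created property is an inference from the arithmetic, not something the log states.

```python
import re

# Counter values copied from the job log above.
VERTICES_CREATED = 405909
VERTEX_PROPERTIES_CREATED = 1217727

# One line printed by g.V.map(): "==>ID {key=[value], key=[value], ...}".
# Assumption: values never contain ']' (true for every sample line shown).
RESULT = re.compile(r'^==>(\d+) \{(.*)\}$')
PAIR = re.compile(r'(\w+)=\[(.*?)\]')


def parse_result(line):
    """Return (vertex_id, {key: value}) or None if the line is not a result row."""
    m = RESULT.match(line.strip())
    if m is None:
        return None
    vid, body = m.groups()
    return int(vid), dict(PAIR.findall(body))


def created_property_keys(props):
    """Keys assumed to count as created properties; _id only repeats the vertex id."""
    return sorted(k for k in props if k != "_id")


row = "==>95522075072286 {label_=[賴琳恩], _id=[95522075072286], name=[Lene_Lai], uri=[]}"
vid, props = parse_result(row)
print(vid, created_property_keys(props))             # 95522075072286 ['label_', 'name', 'uri']
print(VERTEX_PROPERTIES_CREATED / VERTICES_CREATED)  # 3.0
```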
Source: ITPUB Blog, http://blog.itpub.net/16582684/viewspace-1283902/. Please credit the source when republishing; otherwise legal action may be taken.