The GSoC Diaries – facet

facet_loader.py

Next up, I'm looking at facet_loader.py, which runs the Elasticsearch facet manager.

How to run

# bin/facet_loader.py -s <source-URI> -d <destination-URI> -l <limit-num> -c <config-file> 
bin/facet_loader.py -s localhost:9300:jsonpedia_test_load:en -d localhost:9300:jsonpedia_test_facet:en -l 100 -c conf/faceting.properties
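The source and destination URIs above look like four colon-separated fields. Here is a minimal sketch of how such a URI could be split, assuming the fields are host, port, index name and a fourth component (I'm calling it "collection" here; that name and the helper itself are my own, not something taken from the JSONpedia code):

```python
from typing import NamedTuple


class StorageURI(NamedTuple):
    host: str
    port: int
    index: str
    collection: str  # assumed name for the fourth field ("en" in the example)


def parse_storage_uri(uri: str) -> StorageURI:
    """Split a 'host:port:index:collection' string into its parts.

    Hypothetical helper: the real parsing happens on the Java side and
    may differ from this.
    """
    parts = uri.split(":")
    if len(parts) != 4:
        raise ValueError("expected host:port:index:collection, got %r" % uri)
    host, port, index, collection = parts
    return StorageURI(host, int(port), index, collection)


if __name__ == "__main__":
    print(parse_storage_uri("localhost:9300:jsonpedia_test_load:en"))
```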

facet_loader.py is a straightforward script that calls:

MAVEN_OPTS='-Xms8g -Xmx8g -Dlog4j.configuration=file:conf/log4j.properties' mvn exec:java -Dexec.mainClass=com.machinelinking.cli.facetloader -Dexec.args='-s localhost:9300:jsonpedia_test_load:en -d localhost:9300:jsonpedia_test_facet:en -l 100 -c conf/faceting.properties'
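To make the wrapping explicit, here is a rough sketch of how a script like this could assemble that Maven call. I haven't copied this from facet_loader.py; the function name and structure are my own assumptions, and only the main class, the MAVEN_OPTS value and the flags come from the command above:

```python
import os

MAIN_CLASS = "com.machinelinking.cli.facetloader"
MAVEN_OPTS = "-Xms8g -Xmx8g -Dlog4j.configuration=file:conf/log4j.properties"


def build_maven_invocation(source, destination, limit, config):
    """Return (command, env) for running the facet loader through Maven.

    Hypothetical helper, not the actual script: it only mirrors the
    command line shown in this post.
    """
    exec_args = "-s {} -d {} -l {} -c {}".format(source, destination, limit, config)
    command = [
        "mvn",
        "exec:java",
        "-Dexec.mainClass=" + MAIN_CLASS,
        "-Dexec.args=" + exec_args,
    ]
    # Copy the current environment and add the JVM options for Maven.
    env = dict(os.environ, MAVEN_OPTS=MAVEN_OPTS)
    return command, env


if __name__ == "__main__":
    cmd, env = build_maven_invocation(
        "localhost:9300:jsonpedia_test_load:en",
        "localhost:9300:jsonpedia_test_facet:en",
        100,
        "conf/faceting.properties",
    )
    print(" ".join(cmd))
    # To actually run it: subprocess.check_call(cmd, env=env)
```

Passing the command as a list means the exec.args string needs no shell quoting.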

The facetloader class does the following:

Creates fromStorage and facetStorage instances of ElasticJSONStorage using the ElasticJSONStorageFactory.
Creates an instance of DefaultElasticFacetConfiguration and a DefaultElasticFacetManager that uses this configuration.
Calls the loadFacets method of the ElasticFacetManager, which converts each document from fromStorage using the provided EnrichedEntityFacetConverter and puts the result into the destination storage (facetStorage). The converter simply goes through each document and creates a new document out of each section of the original.
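The steps above can be sketched in a few lines of Python. This is only my mental model, not the Java implementation: plain lists stand in for the two storages, the input document shape (a "title" plus a list of "sections") is made up for illustration, and the output field names page, section and links are the ones the facet documents are described with in this post:

```python
def convert_to_facets(page_doc):
    """Yield one facet document per section of page_doc.

    Stand-in for what the converter does conceptually; the input
    layout is an assumption.
    """
    for section in page_doc.get("sections", []):
        yield {
            "page": page_doc["title"],
            "section": section["title"],
            "links": list(section.get("links", [])),
            "content": section.get("content", ""),
        }


def load_facets(from_storage, facet_storage, limit):
    """Convert up to `limit` source documents and store their facets.

    Returns the number of source documents processed.
    """
    processed = 0
    for page_doc in from_storage:
        if processed >= limit:
            break
        facet_storage.extend(convert_to_facets(page_doc))
        processed += 1
    return processed


if __name__ == "__main__":
    source = [
        {
            "title": "London",
            "sections": [
                {"title": "History", "links": ["Roman Britain"], "content": "..."},
                {"title": "Geography", "links": [], "content": "..."},
            ],
        }
    ]
    destination = []
    load_facets(source, destination, limit=100)
    for facet in destination:
        print(facet["page"], "/", facet["section"])
```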

So this means...

Now we have Elasticsearch documents for each section, with details such as page, section, links, content_stem, etc.

Next up, I'll be looking at the CSV Export workflow and deep-diving into the code.

Also, I need to start work on a couple of issues in the issue tracker, which is long overdue at this point.

Posted on: Fri 08 May 2015

Category: gsoc – Tags: gsoc, dbpedia