Next up, I'm looking at facet_loader.py, which runs the Elasticsearch facet manager.
How to run
# bin/facet_loader.py -s <source-URI> -d <destination-URI> -l <limit-num> -c <config-file>
bin/facet_loader.py -s localhost:9300:jsonpedia_test_load:en -d localhost:9300:jsonpedia_test_facet:en -l 100 -c conf/faceting.properties
facet_loader.py is a straightforward script that calls:
MAVEN_OPTS='-Xms8g -Xmx8g -Dlog4j.configuration=file:conf/log4j.properties' mvn exec:java -Dexec.mainClass=com.machinelinking.cli.facetloader -Dexec.args='-s localhost:9300:jsonpedia_test_load:en -d localhost:9300:jsonpedia_test_facet:en -l 100 -c conf/faceting.properties'
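To make the wrapping concrete, here is a minimal sketch of how a script like this could assemble that Maven call. It is not the actual source of facet_loader.py: the function name build_command and the two constants are hypothetical, and only the command-line strings are taken from the invocation above.

```python
import os

# Values copied from the command shown above.
MAIN_CLASS = "com.machinelinking.cli.facetloader"
MAVEN_OPTS = "-Xms8g -Xmx8g -Dlog4j.configuration=file:conf/log4j.properties"


def build_command(source, destination, limit, config):
    """Return (argv, env) for the Maven invocation. Nothing is executed here.

    build_command is a hypothetical helper, not a function from the real script.
    """
    # The four options are forwarded to the Java class as one exec.args string.
    exec_args = "-s {} -d {} -l {} -c {}".format(source, destination, limit, config)
    argv = [
        "mvn",
        "exec:java",
        "-Dexec.mainClass=" + MAIN_CLASS,
        "-Dexec.args=" + exec_args,
    ]
    # MAVEN_OPTS is passed through the environment, as in the shell version.
    env = dict(os.environ, MAVEN_OPTS=MAVEN_OPTS)
    return argv, env
```

Passing argv and env to subprocess.call(argv, env=env) would then start the JVM. Building the argument list instead of a single shell string avoids quoting problems with the nested exec.args value.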
The facetloader class does the following:
- Create fromStorage and facetStorage instances of ElasticJSONStorage using the ElasticJSONStorageFactory.
- Create an instance of DefaultElasticFacetConfiguration, and a DefaultElasticFacetManager using this configuration.
- The loadFacets method of the ElasticFacetManager is called, which converts each document from fromStorage using the provided EnrichedEntityFacetConverter and puts it into the destinationStorage. The converter simply goes through each document and creates documents out of each section of the original document.
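The conversion step can be sketched roughly as follows. This is a plain Python illustration of the one-document-per-section idea, not the Java code of EnrichedEntityFacetConverter. The input layout (a "sections" list with "title", "links" and "content" keys), both function names, and the way the limit is applied are assumptions, and lowercasing stands in for real stemming. Plain lists replace the Elasticsearch storages.

```python
def convert_to_facets(document):
    """Yield one facet document per section of a page document.

    The input keys used here are assumed for illustration only.
    """
    page = document["page"]
    for section in document.get("sections", []):
        yield {
            "page": page,
            "section": section["title"],
            "links": list(section.get("links", [])),
            # Placeholder: a real pipeline would apply a stemmer here.
            "content_stem": section.get("content", "").lower(),
        }


def load_facets(from_storage, facet_storage, limit=None):
    """Convert up to `limit` source documents and append the results.

    Both storages are plain Python lists in this sketch.
    Returns the number of source documents processed.
    """
    count = 0
    for document in from_storage:
        if limit is not None and count >= limit:
            break
        facet_storage.extend(convert_to_facets(document))
        count += 1
    return count
```

With this shape, a page with two sections produces two facet documents that share the same page value, which is what makes per-section faceting possible on the destination index.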
So this means...
Now we have Elasticsearch documents for each section, with fields such as page, section, links, content_stem, etc.
Next up, I'll be looking at the CSV Export workflow and deep-diving into the code.
Also, I need to start work on a couple of issues in the issue tracker (long overdue at this point).