[
https://issues.apache.org/jira/browse/ATLAS-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ashutosh Mestry updated ATLAS-1818:
-----------------------------------
Description:
h3. Background
An environment is set up with 100K hive_tables, each with 84 columns.
A basic search with the query parameter specified is executed. Results take 75
seconds to appear.
h3. Analysis & Findings
A similar test performed with a smaller data set (200 hive_tables, each with 81
columns) also showed less than ideal performance.
The Atlas Basic Search API uses _graph.indexQuery_ to perform the search, which in
turn uses _Solr_.
Two aspects affect performance:
* When no limit is specified, Solr's default maximum size for a returned result
set is 100K. In the test scenario, this returns the entire result set.
* Once the result set is returned, _EntityDiscoveryService.searchUsingBasicQuery_
does a sequential scan to filter the data relevant to the query. The cost of this
operation is proportional to the size of the result set.
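The cost of the sequential scan can be illustrated with a small sketch. This is not the Atlas implementation. The class, the method names, and the assumption that each hive_table hit arrives together with its column hits are made up for illustration. It only shows that the number of hits examined grows with the size of the returned result set, regardless of how few hits are kept.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; none of these names come from the Atlas code base.
class PostFilterSketch {

    // Scans every hit returned by the index and counts those of the wanted type.
    // The returned array holds {matches kept, hits examined}.
    static int[] scanAll(List<String> hitTypes, String wantedType) {
        int kept = 0;
        int examined = 0;
        for (String type : hitTypes) {
            examined++; // one unit of work per returned hit
            if (type.equals(wantedType)) {
                kept++;
            }
        }
        return new int[] {kept, examined};
    }

    // Builds a synthetic result set (an assumption for this sketch):
    // one hive_table hit followed by its hive_column hits.
    static List<String> syntheticHits(int tables, int columnsPerTable) {
        List<String> hits = new ArrayList<>();
        for (int t = 0; t < tables; t++) {
            hits.add("hive_table");
            for (int c = 0; c < columnsPerTable; c++) {
                hits.add("hive_column");
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        for (int tables : new int[] {10, 100, 1000}) {
            int[] r = scanAll(syntheticHits(tables, 84), "hive_table");
            System.out.println(tables + " tables: kept " + r[0] + ", examined " + r[1]);
        }
    }
}
```

Under these assumptions, every tenfold increase in tables means ten times as many hits examined, which is why capping the result set or narrowing the query before it reaches the index is expected to help.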
h3. Solution
The following changes will improve performance:
* Solr's maximum result set size is governed by
_atlas.graph.index.search.max-result-set-size_. It makes sense to set this
to a lower number.
* Modify Solr's configuration file, _solrconfig.xml_, to use _FastLRUCache_.
* Modify _EntityDiscoveryService.searchUsingBasicQuery_ to form a query that
takes additional parameters.
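A minimal sketch of the last item, assuming the additional parameters are a type name and a result limit. The class and method names are hypothetical, the use of a _typeName_ field called `__typeName` in the index is an assumption, and no query escaping is done. The actual change is in the attached patch and may look quite different.

```java
// Illustrative sketch only; not the code from ATLAS-1818.patch.
class NarrowedQuerySketch {

    // Combines an optional type clause and an optional free-text clause into
    // one Lucene-style query string, so the index does the filtering.
    // The field name "__typeName" is an assumption for this sketch.
    static String build(String typeName, String text) {
        StringBuilder q = new StringBuilder();
        if (typeName != null && !typeName.isEmpty()) {
            q.append("__typeName:(").append(typeName).append(")");
        }
        if (text != null && !text.isEmpty()) {
            if (q.length() > 0) {
                q.append(" AND ");
            }
            q.append("(").append(text).append(")");
        }
        return q.toString();
    }

    // Falls back to the configured maximum when the requested limit is
    // missing (<= 0) or larger than the maximum.
    static int clampLimit(int requested, int max) {
        return (requested <= 0 || requested > max) ? max : requested;
    }

    public static void main(String[] args) {
        System.out.println(build("hive_table", "sales*"));
        System.out.println(clampLimit(0, 100));
    }
}
```

With the type clause in the query itself, fewer hits come back from the index, so the sequential scan described under Analysis & Findings has less to examine.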
was:
h3. Background
An environment is set up with 100K hive_tables, each with 84 columns.
A basic search with the query parameter specified is executed. Results take 75
seconds to appear.
h3. Analysis & Findings
A similar test performed with a smaller data set (200 hive_tables, each with 81
columns) also showed less than ideal performance.
The Atlas Basic Search API uses _graph.indexQuery_ to perform the search, which in
turn uses _Solr_.
Two aspects affect performance:
* When no limit is specified, Solr's default maximum size for a returned result
set is 100K. In the test scenario, this returns the entire result set.
* Once the result set is returned, _EntityDiscoveryService.searchUsingBasicQuery_
does a sequential scan to weed out pertinent data. The cost of this operation is
proportional to the size of the result set.
h3. Solution
The following changes will improve performance:
* Solr's maximum result set size is governed by
_atlas.graph.index.search.max-result-set-size_. It makes sense to set this
to a lower number.
* Modify Solr's configuration file, _solrconfig.xml_, to use _FastLRUCache_.
* Modify _EntityDiscoveryService.searchUsingBasicQuery_ to form a query that
takes additional parameters.
> Performance of Basic Search that Uses indexQuery Takes Long Time to Fetch
> Results
> ---------------------------------------------------------------------------------
>
> Key: ATLAS-1818
> URL: https://issues.apache.org/jira/browse/ATLAS-1818
> Project: Atlas
> Issue Type: Bug
> Components: atlas-core, atlas-webui
> Affects Versions: trunk, 0.8-incubating
> Reporter: Ashutosh Mestry
> Assignee: Ashutosh Mestry
> Fix For: trunk, 0.8-incubating
>
> Attachments: ATLAS-1818.patch
>
> Original Estimate: 120h
> Time Spent: 96h
> Remaining Estimate: 24h