Ashutosh Mestry created ATLAS-1818:
--------------------------------------

             Summary: Performance of Basic Search that Uses indexQuery Takes 
Long Time to Fetch Results
                 Key: ATLAS-1818
                 URL: https://issues.apache.org/jira/browse/ATLAS-1818
             Project: Atlas
          Issue Type: Bug
          Components:  atlas-core, atlas-webui
    Affects Versions: trunk, 0.8-incubating
            Reporter: Ashutosh Mestry
            Assignee: Ashutosh Mestry
             Fix For: trunk, 0.8-incubating


h3. Background
The test environment is set up with 100K hive_tables, each with 84 columns.

A basic search with the query parameter specified is executed. Results take 75 
seconds to appear.

h3. Analysis & Findings
A similar test performed with a smaller data set (200 hive_tables, each with 81 
columns) also resulted in less-than-ideal performance.

The Atlas Basic Search API uses _graph.indexQuery_ to perform the search, which 
in turn delegates the search to _Solr_.

Two aspects affect performance:
* When no limit is specified, Solr's default maximum result set size is 100K. 
In the test scenario, this returns the entire result set.
* Once the result set is returned, 
_EntityDiscoveryService.searchUsingBasicQuery_ does a sequential scan to filter 
out data that is not pertinent. The cost of this operation is proportional to 
the size of the result set.
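
The effect of the second point can be illustrated with a minimal Java sketch. 
This is not the Atlas code: _PostFilterSketch_, _indexQuery_ and _keepMatching_ 
are hypothetical stand-ins, and the 1-in-10 ratio of tables to columns is made 
up for illustration. It only shows that a sequential post-filter visits every 
returned hit, so capping the result set size caps the scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class PostFilterSketch {
    // Hypothetical stand-in for an index query: returns at most maxResults
    // hits out of totalMatches documents that matched the query text.
    static List<String> indexQuery(int totalMatches, int maxResults) {
        int n = Math.min(totalMatches, maxResults);
        List<String> hits = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // Assumption for illustration: every 10th hit is a table,
            // the rest are columns.
            hits.add((i % 10 == 0 ? "hive_table:" : "hive_column:") + i);
        }
        return hits;
    }

    // Sequential scan: each hit is visited exactly once, so the cost grows
    // linearly with hits.size(), regardless of how many hits are kept.
    static List<String> keepMatching(List<String> hits, Predicate<String> pertinent) {
        List<String> kept = new ArrayList<>();
        for (String hit : hits) {
            if (pertinent.test(hit)) {
                kept.add(hit);
            }
        }
        return kept;
    }
}
```

With a cap of 100K the scan visits 100,000 hits. With a cap of 150 it visits 
150, at the price of possibly missing pertinent entries that fall beyond the 
cap.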

h3. Solution
The following changes will improve performance:
* Solr's maximum result set size is governed by 
_atlas.graph.index.search.max-result-set-size_. It makes sense to set this to 
a lower number.
* Modify Solr's configuration file _solrconfig.xml_ to use _FastLRUCache_.
* Modify _EntityDiscoveryService.searchUsingBasicQuery_ to form a query that 
takes additional parameters.
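
One way to read the last item is to push the filtering into the index query 
itself, so that fewer hits come back to be scanned. The following is only a 
sketch under that assumption, not the actual change: _IndexQuerySketch_, 
_build_ and the field name _typeName_ are hypothetical, and the Lucene-style 
"field:value" syntax joined with AND is assumed. No escaping of special 
characters is attempted.

```java
import java.util.Map;
import java.util.StringJoiner;

class IndexQuerySketch {
    // Combines the free-text query with extra field clauses so that the
    // index, rather than a later sequential scan, narrows the results.
    // Field names and values are used as-is (no escaping in this sketch).
    static String build(String freeText, Map<String, String> filters) {
        StringJoiner query = new StringJoiner(" AND ");
        query.add("(" + freeText + ")");
        for (Map.Entry<String, String> filter : filters.entrySet()) {
            query.add(filter.getKey() + ":" + filter.getValue());
        }
        return query.toString();
    }
}
```

For example, _build("sales", Map.of("typeName", "hive_table"))_ yields 
_(sales) AND typeName:hive_table_. With more than one filter, an ordered map 
such as _LinkedHashMap_ keeps the clause order stable.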



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
