Indexing data from multiple data sources (CSV, RDBMS)
Hi,
I am working on indexing data from multiple data sources into a single collection. I specified the data source information in the data-config file and also updated the managed schema by adding the fields from all the data sources, with a common unique key across all the sources.
Here is a sample config file:
url="jdbc:mysql://localhost/aaa" user="***" password="***" batchSize="1" />
driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost;databasename=aaa" user="***" password="**"/>
Error details:
Full Import failed:java.lang.RuntimeException:java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: Invalid type for data source: Jdbc-2 Processing Document #1
Thanks,
Shravan
Re: Indexing data from multiple data sources
Hi,
Greetings of the day! Unfortunately, we included our database source details in the Solr community post when sending our query to Solr support, as shown in the mail below. We found that it has been published at this link:
https://sematext.com/opensee/m/Solr/eHNlswSd1vD6AF?subj=RE+Indexing+data+from+multiple+data+sources
As it is open to the world, could you please remove that post as soon as possible, before it creates any security issues for us? Your help would be very much appreciated. FYI, I'm attaching a screenshot.
Thanks & Regards,
Ravikiran Moola
From: RaviKiran Moola
Sent: Friday, April 17, 2020 9:13 PM
To: solr-user@lucene.apache.org
Subject: RE: Indexing data from multiple data sources
Hi,
Greetings! We are working on indexing data from multiple data sources (MySQL & MSSQL) into a single collection. We specified the connection details and the required fields for both data sources in a single data config file, added the required fields to the managed schema, and fetch the same columns from both data sources using a common “unique key”. We are unable to index the data from the data sources using Solr. I'm attaching the data config file and a screenshot.
Data config file:
Thanks & Regards,
Ravikiran Moola
+91-9494924492
Index using CSV file
Hi,
I'm trying to import data from a CSV file through the Solr UI, and I am completely new to Solr. Please provide the configuration needed to achieve this.
Re: Index using CSV file
You don't do this via the Solr UI. You have several options, among others:
1) Write a client yourself that parses the CSV and posts it to the standard update handler: https://lucene.apache.org/solr/guide/8_4/uploading-data-with-index-handlers.html
2) Use the Solr post tool: https://lucene.apache.org/solr/guide/8_4/post-tool.html
3) Use an HTTP command-line client (e.g. curl) and post the data to the CSV update handler: https://lucene.apache.org/solr/guide/8_4/uploading-data-with-index-handlers.html
However, it would be useful to know what exactly you are trying to achieve, with more background on the project (which programming languages and frameworks you use or plan to use, etc.), so that we can give you a more targeted answer.
On 18.04.2020 at 17:13, Shravan Kumar Bolla wrote:
> Hi,
> I'm trying to import data from CSV file from Solr UI and I am completely new to Solr. Please provide the necessary configurations to achieve this.
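As a rough illustration of option 3 in the reply above (posting CSV to the update handler), here is a minimal Python sketch using only the standard library. The host `localhost:8983`, the collection name `my_collection`, and the sample columns are assumptions made for illustration; they do not come from the thread. The function only builds the request, so it can be inspected without a running Solr instance.

```python
import urllib.parse
import urllib.request


def build_csv_update_request(solr_base_url, collection, csv_bytes, commit=True):
    """Build (but do not send) a POST request for Solr's /update handler with CSV data."""
    # The collection name and base URL are supplied by the caller (hypothetical values below).
    url = f"{solr_base_url.rstrip('/')}/{urllib.parse.quote(collection)}/update"
    if commit:
        # commit=true makes the documents searchable right after the upload.
        url += "?" + urllib.parse.urlencode({"commit": "true"})
    return urllib.request.Request(
        url,
        data=csv_bytes,
        headers={"Content-Type": "application/csv"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical sample data: the first row holds the field names.
    sample = b"id,title\n1,First document\n2,Second document\n"
    req = build_csv_update_request("http://localhost:8983/solr", "my_collection", sample)
    print(req.get_method(), req.full_url)
    # To send it against a running Solr instance, uncomment:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

The column names in the CSV header need to match fields that exist in the collection's schema (or be covered by a dynamic field rule).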
Re: Index using CSV file
Please also do not forget to create a schema in the Solr collection, so that the data is indexed correctly and you get fast and correct query results. I usually recommend reading one of the many Solr books out there to get started. This will save you a lot of time.
On 18.04.2020 at 17:43, Jörn Franke wrote:
> This you don’t do via the Solr UI. You have many choices amongst others
> 1) write a client yourself that parses the csv and post it to the standard Update handler
> https://lucene.apache.org/solr/guide/8_4/uploading-data-with-index-handlers.html
> 2) use the Solr post tool
> https://lucene.apache.org/solr/guide/8_4/post-tool.html
> 3) use a http client command line tool (eg curl) and post the data to the CSV update handler:
> https://lucene.apache.org/solr/guide/8_4/uploading-data-with-index-handlers.html
>
> However, it would be useful to know what you exactly trying to achieve and give more background on the project, what programming languages and frameworks you (plan to) use etc to give you a more guided answer
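One way to define fields before loading data is Solr's Schema API, which accepts an `add-field` command as JSON on the collection's `/schema` endpoint. The sketch below only builds that JSON body. The field name `title` and the field type `text_general` are assumptions for illustration (the type exists in Solr's default configset, but your collection may use different types).

```python
import json


def build_add_field_payload(name, field_type, indexed=True, stored=True, multi_valued=False):
    """Return the JSON body for a Schema API 'add-field' command."""
    command = {
        "add-field": {
            "name": name,
            "type": field_type,
            "indexed": indexed,
            "stored": stored,
            "multiValued": multi_valued,
        }
    }
    return json.dumps(command)


if __name__ == "__main__":
    # Hypothetical field; POST this body to /solr/<collection>/schema
    # with the header Content-Type: application/json.
    print(build_add_field_payload("title", "text_general"))
```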
Solr performance using fq with multiple values
Hi,
We are seeing significant performance degradation on single queries that use fq with multiple values, as in:
fq=field1_name:(V1 V2 V3 ...)
If we use only one value in the fq (say only V1), we get QTime = T ms. As we increase the number of values, say to 5, QTime more than triples, even if the number of results is small. In my tests I made sure that caching was not an issue and that nothing else was using the CPU.
We commonly need to use fq with multiple values (on the same field name, which is normally a long).
Is this performance hit to be expected?
Is there a better way to do this?
We use Solr Cloud 8.3, and the field that we use fq on is defined as:
stored="false" required="false" multiValued="false" docValues="true" />
Thanks,
Reinaldo
Re: Solr performance using fq with multiple values
Hi Reinaldo,
Shouldn't the fields involved be indexed for better performance?
Sylvain
On Sat, 18 Apr 2020 at 18:46, Odysci wrote:
> Hi,
>
> We are seeing significant performance degradation on single queries that use fq with multiple values as in:
>
> fq=field1_name:(V1 V2 V3 ...)
>
> If we use only one value in the fq (say only V1) we get Qtime = T ms
> As we increase the number of values, say to 5 values, Qtime more than triples, even if the number of results is small. In my tests I made sure cache was not an issue and nothing else was using the cpu.
>
> We commonly need to use fq with multiple values (on the same field name, which is normally a long).
> Is this performance hit to be expected?
> Is there a better way to do this?
>
> We use Solr Cloud 8.3, and the field that we use fq on is defined as:
>
> stored="false" required="false" multiValued="false" docValues="true" />
>
> Thanks
>
> Reinaldo
Re: Solr performance using fq with multiple values
We don't use this field for general queries (q:*), only for fq and faceting. Do you think making it indexed="true" would make a difference in fq performance?
Thanks,
Reinaldo
On Sat, Apr 18, 2020 at 3:06 PM Sylvain James wrote:
> Hi Reinaldo,
>
> Involved fields should be indexed for better performance ?
>
> stored="false" required="false" multiValued="false" docValues="true" />
>
> Sylvain
Re: Solr performance using fq with multiple values
On 4/18/2020 12:20 PM, Odysci wrote:
> We don't used this field for general queries (q:*), only for fq and faceting. Do you think making it indexed="true" would make a difference in fq performance?
fq means "filter query". It's still a query. So yes, the field should be indexed. The query you're doing only works because docValues is true ... but queries using docValues have terrible performance.
Thanks,
Shawn
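For readers comparing query forms, here is a small Python sketch that builds the multi-value filter from the original question, plus an equivalent written with Solr's terms query parser (`{!terms f=...}`), which exists specifically for matching a list of values on one field. Nothing in this thread establishes that the terms parser is faster for this field; the advice given above is to index the field. The field name `field1_name` comes from the question, and the sample values are made up.

```python
import urllib.parse


def boolean_fq(field, values):
    """Filter in the form used in the question: field:(V1 V2 V3)."""
    return f"{field}:({' '.join(str(v) for v in values)})"


def terms_fq(field, values):
    """Same value list expressed with the terms query parser (comma-separated by default)."""
    return "{!terms f=%s}%s" % (field, ",".join(str(v) for v in values))


def to_query_string(fq, q="*:*"):
    """URL-encode q and fq so they can be appended to a /select request."""
    return urllib.parse.urlencode({"q": q, "fq": fq})


if __name__ == "__main__":
    values = [101, 102, 103]  # hypothetical long values
    print(boolean_fq("field1_name", values))
    print(terms_fq("field1_name", values))
    print(to_query_string(terms_fq("field1_name", values)))
```

Timing both forms before and after setting indexed="true" (followed by a full reindex) would show which change matters for a given index.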