Unstructured/Structured data for indexing
Hi, I am a Solr newbie and have a quick question. Solr is designed for querying unstructured data, so why do we have to send it in a structured form (JSON, XML) for indexing? Thanks & Regards, S Subin -- View this message in context: http://lucene.472066.n3.nabble.com/Unstructured-Structured-data-for-indexing-tp4244406.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Unstructured/Structured data for indexing
Thanks, Jürgen, for clarifying. :-)

On 9 Dec 2015 2:06 pm, "Jürgen Wagner (DVT) [via Lucene]" wrote:
> Subin,
> Only the envelope is structured. What's inside the individual fields of the structure may be single values (possibly considered structured meta-data) or unstructured (like free text or other fields with informal semantics).
>
> Even if you pass a 5-hour video as a major case of unstructured data to Solr or Elasticsearch, you will need to agree on how meta-data and individual aspects of that data object will be passed.
>
> Best regards,
> --Jürgen
>
> i.A. Jürgen Wagner
> Head of Competence Center "Intelligence" & Senior Cloud Consultant
> Devoteam GmbH, Industriestr. 3, 70565 Stuttgart, Germany
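The point about the envelope can be illustrated with a minimal sketch. It assumes hypothetical field names (`title`, `body`) and only builds the JSON payload; it does not contact a Solr server. The named fields are the structure, and the value of `body` is free text that is left to the analysis chain at index time.

```python
import json


def build_solr_doc(doc_id, title, body_text):
    """Wrap free text in a structured envelope of named fields.

    The field names here are assumptions for illustration; a real schema
    defines its own fields.
    """
    return {
        "id": doc_id,        # structured metadata: a single value
        "title": title,      # short value, still just text
        "body": body_text,   # unstructured free text
    }


def to_update_payload(docs):
    """Serialize documents as a JSON array, the shape sent to an update handler."""
    return json.dumps(docs, ensure_ascii=False)


doc = build_solr_doc(
    "doc-1",
    "Meeting notes",
    "We discussed the release, then lunch. No fixed structure in here.",
)
payload = to_update_payload([doc])
print(payload)
```

Only the outer shape (an array of objects with named fields) is agreed on in advance; nothing is required of the text inside `body`.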
Indexed columns not visible on schema browser in Solr Admin console
Hi All, I have indexed a JSON document with columns (fields) that are not defined in schema.xml, because I have a dynamicField defined to catch them. Indexing completes without error, and the index and tlog folders are created in the collection root. However, when I log in to the Solr Admin console, I cannot see the newly indexed field names in the Schema Browser for that collection. Why is that? Please help.
Re: Indexed columns not visible on schema browser in Solr Admin console
Thanks a lot, Alex. The issue was, as you said, that the field did not match the definition. :-)

On 4 Sep 2016 8:35 a.m., "Alexandre Rafalovitch [via Lucene]" wrote:
> It should show up in the Admin UI, even if it matches the dynamic field. So, I would focus on that. Perhaps you did not run commit. Or it did not match the definition the way you thought it did.
>
> I would do a manual record that matches that dynamic field and see if _that_ shows up.
>
> Regards,
> Alex.
> Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/
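The "did not match the definition" case can be checked offline before indexing. The sketch below is a rough approximation, not Solr's own matching code: it assumes dynamicField patterns with a single `*` at the start or the end (such as `*_s` or `attr_*`) and case-sensitive comparison. The patterns and the sample document are hypothetical.

```python
def matches_dynamic_field(field_name, pattern):
    """Return True if field_name matches a glob-style dynamicField pattern.

    Assumes one '*' at the start or the end of the pattern; matching is
    case-sensitive.
    """
    if pattern.startswith("*"):
        return field_name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return field_name.startswith(pattern[:-1])
    return field_name == pattern


def unmatched_fields(doc, explicit_fields, dynamic_patterns):
    """List field names covered by neither an explicit field nor a pattern."""
    return sorted(
        name
        for name in doc
        if name not in explicit_fields
        and not any(matches_dynamic_field(name, p) for p in dynamic_patterns)
    )


# Hypothetical document: "Colour-S" looks close to "*_s" but does not match it.
doc = {"id": "1", "color_s": "red", "price_f": 9.5, "Colour-S": "blue"}
print(unmatched_fields(doc, {"id"}, ["*_s", "*_f"]))
```

If this reports no unmatched names and the fields are still missing from the Schema Browser, the other suggestion in the reply (a missing commit) is the next thing to rule out.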
How to know if SOLR indexing is completed programmatically
Hi Guys, We are running back-to-back Solr indexing batch jobs. We need to make sure that the triggered batch indexing has completed before starting the next one. I know we can check the status on the 'Logging' and 'Core Admin' pages of the Solr Admin console. But we need to find this out programmatically and trigger the next indexing batch job based on it. Please help with this. :)
Re: How to know if SOLR indexing is completed programmatically
Thanks a lot, Christian. Let me explore that. :)
Re-Indexing 143 million rows
Hi Team, I have indexed data with 143 million rows (docs) into Solr. It takes around 3 hours to index. I used the CSV update handler and indexed the CSV file by remote streaming. Now, when I re-index the same CSV data, it still takes 3+ hours. Ideally, since there are no changes in the _id values, it should have finished quickly, right? Please provide some insights on this. Regards, Subin
Re: Re-Indexing 143 million rows
Thanks a lot, Erick. :-)

On 21 Nov 2016 20:09, "Erick Erickson [via Lucene]" wrote:
> In a word, "no". Resending the same document will
>
> 1> delete the old version (based on ID)
> 2> index the document just sent.
>
> When a document comes in, Solr can't assume that "nothing's changed". What if you changed your schema?
>
> So I'd expect the second run to take at least as long as the first.
>
> Best,
> Erick
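The delete-then-index behaviour described in the reply can be sketched with a toy model. This is not how Solr is implemented; `ToyIndex` and its whitespace "analysis" are stand-ins chosen only to show that an overwrite by ID does the same per-document work on the second run as on the first.

```python
class ToyIndex:
    """Toy model of overwrite-by-ID indexing (an illustration, not Solr)."""

    def __init__(self):
        self.docs = {}
        self.deletes = 0   # how many old versions were removed
        self.analyzed = 0  # how many documents went through analysis

    def add(self, doc):
        doc_id = doc["id"]
        # Step 1: delete the old version, if any, based on the ID alone.
        if doc_id in self.docs:
            del self.docs[doc_id]
            self.deletes += 1
        # Step 2: index the incoming document. Its content is never compared
        # with the old version, so the analysis work is always repeated.
        self.docs[doc_id] = {
            key: str(value).lower().split()
            for key, value in doc.items()
            if key != "id"
        }
        self.analyzed += 1


rows = [{"id": str(i), "text": f"Row number {i}"} for i in range(1000)]

index = ToyIndex()
for row in rows:
    index.add(row)
first_run = index.analyzed

for row in rows:  # resend exactly the same rows
    index.add(row)
second_run = index.analyzed - first_run

print(first_run, second_run, index.deletes, len(index.docs))
```

Both runs analyze every row, and the second run additionally performs one delete per row, which is consistent with the second run taking at least as long as the first.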
Implementing DIH - Using a non-datetime change tracking column to Identify delta
Hi Experts, Can we use a non-datetime column to identify delta rows in the deltaQuery of a DIH configuration? For example, in the deltaQuery below,

deltaQuery="select ID from category where last_modified > '${dih.last_index_time}'"

the delta rows are picked when the last_modified datetime is greater than the last index time. I want to pick the deltas if a column value differs from the corresponding column value in Solr:

deltaQuery="select ID from category where md5hashcode <> 'indexedmd5hashcode'"

Can we implement this?
Re: Implementing DIH - Using a non-datetime change tracking column to Identify delta
Thanks, Shawn! :-) I'll try this out.

On 6 Apr 2017 00:00, "Shawn Heisey-2 [via Lucene]" wrote:
> The only piece of information that DIH saves internally when it starts an import is the current timestamp. You can still do what you want, but you will need to be responsible for keeping track of the information necessary to determine what's new in your own program. Solr will not do it for you.
>
> When you start an import, you can provide any arbitrary information with URL parameters on the request that starts the import. Here's my full config for DIH from one of my Solr cores showing how to use these parameters:
>
> I am specifying many of the parts of the SQL query from URL parameters. For example, I will include a "dataView" parameter to choose at import time what view or table will be queried. The other parameters control what ID values will be returned. The query and deltaImportQuery attributes are identical.
>
> At one time, all my indexing was done with DIH, so I used these parameters to limit what was done by the delta-import runs. Currently, DIH is only used for full rebuilds; I have a SolrJ program for incremental changes.
>
> Thanks,
> Shawn
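One way to read the advice above is that the calling program works out which rows changed and passes that to DIH as request parameters. The sketch below assumes this reading. Only `dataView` comes from the reply; the `changedIds` parameter, the `/dataimport` handler path, the core name, and the stored-hash dictionary are hypothetical. Inside the DIH config such parameters are typically referenced as `${dih.request.dataView}`, which should be checked against the DIH documentation for the Solr version in use.

```python
import hashlib
from urllib.parse import urlencode


def row_hash(row):
    """MD5 over the row's columns in a stable (sorted) column order."""
    joined = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def changed_ids(rows, indexed_hashes):
    """IDs whose current hash differs from the hash recorded at the last import.

    indexed_hashes is bookkeeping kept by the calling program, not by Solr.
    """
    return [row["ID"] for row in rows if indexed_hashes.get(row["ID"]) != row_hash(row)]


def dih_import_url(base_url, core, command, **params):
    """Build the request URL that starts an import, with extra parameters."""
    query = urlencode({"command": command, **params})
    return f"{base_url}/{core}/dataimport?{query}"


rows = [
    {"ID": 1, "name": "books"},
    {"ID": 2, "name": "music (renamed)"},
    {"ID": 3, "name": "games"},
]
# Hashes remembered from the previous import; row 2 changed, row 3 is new.
indexed_hashes = {1: row_hash(rows[0]), 2: "stale-hash"}

ids = changed_ids(rows, indexed_hashes)
url = dih_import_url(
    "http://localhost:8983/solr",  # assumed local Solr
    "mycore",                      # hypothetical core name
    "delta-import",
    dataView="category",
    changedIds=",".join(str(i) for i in ids),  # hypothetical parameter
)
print(ids)
print(url)
```

After a successful import the program would store the new hashes itself, since DIH only records the timestamp of the last run.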