from:"subinalex"

Unstructured/Structured data for indexing

2015-12-09 Thread subinalex

Hi,

I am a Solr newbie with a quick question.

Solr is designed for querying unstructured data, so why do we have to send
it in a structured form (JSON, XML) for indexing?

Thanks & Regards,
Subin



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Unstructured-Structured-data-for-indexing-tp4244406.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Unstructured/Structured data for indexing

2015-12-09 Thread subinalex

Thanks, Jürgen, for clarifying. :-)
On 9 Dec 2015 2:06 pm, "Jürgen Wagner (DVT) [via Lucene]" <
ml-node+s472066n4244411...@n3.nabble.com> wrote:

> Subin,
>   Only the envelope is structured. What's inside the individual fields of
> the structure may be single values (possibly considered structured
> meta-data) or unstructured (like free text or other fields with informal
> semantics).
>
> Even if you pass a 5-hour video as a major case of unstructured data to
> Solr or Elasticsearch, you will need to agree on how meta-data and individual
> aspects of that data object will be passed.
>
> Best regards,
> --Jürgen
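
To make the "structured envelope, unstructured content" point concrete, here is a minimal Python sketch. The field names (id, title, duration_seconds, body_text) are hypothetical and would need to exist in, or be matched by, the collection's schema. Only the outer JSON shape is structured; body_text carries free text.

```python
import json

def build_envelope(doc_id, title, duration_seconds, body_text):
    """Wrap unstructured text in a structured document envelope.

    All field names here are illustrative assumptions, not a fixed Solr schema.
    """
    return {
        "id": doc_id,                          # single-valued, structured metadata
        "title": title,                        # short structured metadata
        "duration_seconds": duration_seconds,  # numeric metadata
        "body_text": body_text,                # unstructured free text
    }

doc = build_envelope("video-001", "Team offsite recording", 18000,
                     "Free-form transcript text goes here ...")
payload = json.dumps([doc])  # a JSON array of documents
print(payload)
```

A payload of this shape is what one would typically send to a collection's update handler with a JSON content type; the envelope tells the indexer which piece of content belongs to which field.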





Indexed columns not visible on schema browser in Solr Admin console

2016-09-03 Thread subinalex

Hi All,

I have indexed a JSON document with columns (fields) that are not defined in
schema.xml, because I have a dynamicField defined to catch them.

Indexing completes without error, and the index and tlog folders are created
in the collection root.

However, when I log in to the Solr admin console, I cannot see the newly
indexed field names in the schema browser for that collection.

Why is that?

Please help with this.




Re: Indexed columns not visible on schema browser in Solr Admin console

2016-09-04 Thread subinalex

Thanks a lot, Alex.
As you said, the issue was that the field did not match the definition.

:-)

On 4 Sep 2016 8:35 a.m., "Alexandre Rafalovitch [via Lucene]" <
ml-node+s472066n4294558...@n3.nabble.com> wrote:

It should show up in the Admin UI, even if it matches the dynamic
field. So, I would focus on that. Perhaps you did not run commit. Or
it did not match the definition the way you thought it did.

I would do a manual record that matches that dynamic field and see if
_that_ shows up.

Regards,
   Alex.
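
One way to check the "did not match the definition" case offline is to compare field names against the dynamicField patterns by hand. The sketch below is only an illustration: it assumes the usual rule that a dynamicField name carries a single * at the start or the end and that matching is case-sensitive. The patterns and field names are made up.

```python
def matches_dynamic_field(pattern, field_name):
    """Return True if field_name matches a dynamicField-style pattern.

    Assumes one '*' wildcard, either leading or trailing, and
    case-sensitive comparison.
    """
    if pattern.startswith("*"):
        return field_name.endswith(pattern[1:])
    if pattern.endswith("*"):
        return field_name.startswith(pattern[:-1])
    return pattern == field_name

# Hypothetical patterns and incoming JSON keys
patterns = ["*_s", "attr_*"]
for name in ["color_s", "attr_size", "color_str"]:
    hits = [p for p in patterns if matches_dynamic_field(p, name)]
    print(name, "->", hits or "no match")
```

A key such as color_str would not be caught by *_s, which is the kind of mismatch that is easy to overlook. Remember that a commit is also needed before newly indexed fields can show up.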






How to know if SOLR indexing is completed programmatically

2016-09-30 Thread subinalex

Hi Guys,

We are running back-to-back Solr indexing batch jobs. We need to make sure that
a triggered batch has finished indexing before starting the next one.

I know we can check the status on the 'Logging' and 'CoreAdmin' pages
of the Solr admin console.

But we need to find this out programmatically and, based on the result, trigger
the next Solr indexing batch job.


Please help with this.


:)
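
A generic way to gate one batch on the previous one is to poll a status source until it reports that indexing is idle. The Python sketch below is a minimal illustration under assumptions: fetch_status is a hypothetical callable that returns the string "idle" or "busy". How it obtains that value depends on how the batches are run; if they go through the DataImportHandler, for instance, its status command is one possible source.

```python
import time

def wait_until_idle(fetch_status, poll_seconds=5.0, timeout_seconds=3600.0):
    """Poll fetch_status() until it returns "idle".

    Returns True once idle is seen, False if the timeout expires first.
    fetch_status is a caller-supplied (hypothetical) function.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if fetch_status() == "idle":
            return True
        time.sleep(poll_seconds)
    return False

# Stand-in status source for demonstration: busy twice, then idle
_fake_statuses = iter(["busy", "busy", "idle"])
if wait_until_idle(lambda: next(_fake_statuses), poll_seconds=0.01, timeout_seconds=5):
    print("previous batch finished; start the next one")
```

Keeping the status lookup behind a callable means the polling loop stays the same whether the status comes from an HTTP call, a log marker, or a job scheduler.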




Re: How to know if SOLR indexing is completed programmatically

2016-09-30 Thread subinalex

Thanks a lot, Christian.
Let me explore that.


:)




Re-Indexing 143 million rows

2016-11-21 Thread subinalex

Hi Team,

I have indexed 143 million rows (docs) into Solr. It takes around 3 hours to
index. I used csvUpdateHandler and indexed the CSV file by remote streaming.
Now, when I re-index the same CSV data, it still takes 3+ hours.

Since there are no changes in the _id values, shouldn't it have finished
quickly?

Please provide some insights on this.

Regards,
Subin




Re: Re-Indexing 143 million rows

2016-11-22 Thread subinalex

Thanks a lot, Erick. :-)

On 21 Nov 2016 20:09, "Erick Erickson [via Lucene]" <
ml-node+s472066n4306659...@n3.nabble.com> wrote:

> In a word, "no". Resending the same document will
>
> 1> delete the old version (based on ID)
> 2> index the document just sent.
>
> When a document comes in, Solr can't assume that
> "nothing's changed". What if you changed your schema?
>
> So I'd expect the second run to take at least as long as the first.
>
> Best,
> Erick
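
The two steps described above can be sketched with a toy in-memory model. This is only an illustration of overwrite-by-ID semantics, not Solr's actual implementation; the class name, the "id" key, and the work counter are all hypothetical.

```python
class ToyIndex:
    """Toy model of overwrite-by-ID indexing (not how Solr is implemented)."""

    def __init__(self):
        self.docs = {}
        self.docs_processed = 0  # stands in for indexing work done

    def add(self, doc):
        # 1> delete the old version (based on ID), if there is one
        self.docs.pop(doc["id"], None)
        # 2> index the document just sent, whether or not it changed
        self.docs[doc["id"]] = doc
        self.docs_processed += 1

index = ToyIndex()
batch = [{"id": str(i), "value": i} for i in range(1000)]

# Send the same batch twice: a first run and a re-index
for _ in range(2):
    for doc in batch:
        index.add(doc)

print(len(index.docs), index.docs_processed)  # prints: 1000 2000
```

The document count stays the same after the second run, but the amount of work doubles, which is why the re-index takes about as long as the first pass.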





Implementing DIH - Using a non-datetime change tracking column to Identify delta

2017-04-04 Thread subinalex

Hi Experts,

Can we use a non-datetime column to identify delta rows in the deltaQuery of a
DIH configuration?
For example, in the deltaQuery below,

  deltaQuery="select ID from category where last_modified > '${dih.last_index_time}'"


the delta rows are picked up when the last_modified datetime is later than
the last index time.

I want to pick up the deltas when a column value differs from the corresponding
column value in Solr.

  deltaQuery="select ID from category where md5hashcode <> 'indexedmd5hashcode'"



Can we implement this?







Re: Implementing DIH - Using a non-datetime change tracking column to Identify delta

2017-04-07 Thread subinalex

Thanks, Shawn! :-)
I'll try this out.

On 6 Apr 2017 00:00, "Shawn Heisey-2 [via Lucene]" <
ml-node+s472066n4328519...@n3.nabble.com> wrote:

The only piece of information that DIH saves internally when it starts
an import is the current timestamp.

You can still do what you want, but you will need to be responsible for
keeping track of the information necessary to determine what's new in
your own program.  Solr will not do it for you.
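
As an illustration of that kind of bookkeeping (a sketch under assumptions, not the setup described in this reply), a client program could hash each source row and compare it with the hash it recorded at the last import. The row layout, the "ID" key, and the indexed_hashes store are hypothetical.

```python
import hashlib

def row_hash(row):
    """MD5 over the row's columns, joined in a fixed (sorted-by-name) order."""
    joined = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

def changed_ids(rows, indexed_hashes):
    """Return IDs of rows that are new or whose hash differs from the stored one."""
    return [row["ID"] for row in rows
            if indexed_hashes.get(row["ID"]) != row_hash(row)]

# Hypothetical source rows and hashes remembered from the previous import
rows = [{"ID": 1, "name": "books"}, {"ID": 2, "name": "music"}]
indexed_hashes = {1: row_hash(rows[0]), 2: "stale-hash-from-last-run"}
print(changed_ids(rows, indexed_hashes))
```

The resulting ID list is the piece of information the program would then hand to the import, since DIH itself only remembers the timestamp.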

When you start an import, you can provide any arbitrary information with
URL parameters on the request that starts the import.  Here's my full
config for DIH from one of my Solr cores showing how to use
these parameters:

I am specifying many of the parts of the SQL query from URL parameters.
For example, I will include a "dataView" parameter to choose at import
time what view or table will be queried.  The other parameters control
what ID values will be returned.
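
A minimal sketch of building such a request URL in Python follows. Only the dataView parameter name comes from this message; the host, core name, handler path (/dataimport is a common registration but is configurable), and the view name are assumptions. Inside the DIH config, a request parameter like this is typically referenced as ${dih.request.dataView}.

```python
from urllib.parse import urlencode

def build_dih_url(base_url, core, command, **params):
    """Build a DIH request URL carrying arbitrary extra URL parameters.

    The '/dataimport' handler path is an assumption; use whatever name
    the handler is registered under in solrconfig.xml.
    """
    query = urlencode({"command": command, **params})
    return f"{base_url}/solr/{core}/dataimport?{query}"

# Hypothetical host, core, and view name
url = build_dih_url("http://localhost:8983", "mycore", "full-import",
                    dataView="category_view")
print(url)
```

Passing the values at request time keeps the config file generic, so the same entity definition can serve different views or ID ranges.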

The query and deltaImportQuery attributes are identical.  At one time,
all my indexing was done with DIH, so I used these parameters to limit
what was done by the delta-import runs.  Currently, DIH is only used for
full rebuilds; I have a SolrJ program for incremental changes.

Thanks,
Shawn


