Hi,
I have a master Solr through which I query multiple Solr instances,
aggregate their responses, and respond back to the user.
Now the requirement is that when I get the data from querying the multiple
Solr instances, I want it sorted on some field.
Say I have 3 Slave Solrs
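(For reference: if the slaves are reachable as plain Solr cores, Solr's
built-in distributed search already does this merge-sort. A request like the
following - host and field names are hypothetical - returns one aggregated
result sorted across all three slaves:
http://master:8983/solr/select?q=*:*&shards=slave1:8983/solr,slave2:8983/solr,slave3:8983/solr&sort=somefield+asc )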
Hi,
I am trying to access XML files which are stored in our CMS. How do I pass a
username/password to DIH so I can get all the XML files?
It's throwing an exception:
java.io.IOException: Server returned HTTP response code: 401 for URL:
http://admin:admin...@cms1.zinio.com.com//articles/100850443.xml
Is ther
Kevin,
Those are some good query response times but they could be better. You've
configured the field type sub-optimally. Look again at
http://wiki.apache.org/solr/SpatialForTimeDurations and note in particular
maxDistErr. You've left it at the value that comes pre-configured with
Solr, 0.0
That sounds like a satisfactory solution for the time being -
I am assuming you dump the data from Solr in CSV format?
How did you implement the streaming processor? (What tool did you use for
this? I'm not familiar with that.)
You say it takes only a few minutes to dump the data - how long does it t
Hello Matt,
You can consider writing a batch processing handler which receives a query
and, instead of sending results back, writes them into a file that is then
available for streaming (it has its own UUID). I am dumping many GBs
of data from Solr in a few minutes - your query + streaming write
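A very rough client-side sketch of such a dump loop (plain SolrJ 4.x; host,
core and field names are illustrative - the real handler described above
streams on the server side instead):

  import java.io.FileWriter;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrDocumentList;

  public class BatchDump {
    public static void main(String[] args) throws Exception {
      HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
      FileWriter out = new FileWriter("dump-" + java.util.UUID.randomUUID() + ".csv");
      int rows = 10000;
      for (int start = 0; ; start += rows) {
        SolrQuery q = new SolrQuery("*:*").setStart(start).setRows(rows);
        SolrDocumentList page = solr.query(q).getResults();
        for (SolrDocument doc : page) {
          out.write(doc.getFieldValue("id") + "," + doc.getFieldValue("title") + "\n");
        }
        // beware: large start offsets get slow - see the deep-paging reply below
        if (start + rows >= page.getNumFound()) break;
      }
      out.close();
      solr.shutdown();
    }
  }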
Sorry for the late response. I needed to find the time to load a lot of
extra data (closer to what we're anticipating). I have an index with close
to 220,000 documents, each with at least two coordinate regions anywhere
between -10 billion and +10 billion, but could potentially have up to maybe
half
Hi Matt,
This feature is commonly known as deep paging, and Lucene and Solr have
issues with it ... take a look at
http://solr.pl/en/2011/07/18/deep-paging-problem/ as a potential
starting point; it uses filters to bucketize a result set into
smaller sub-result sets (see the sketch below).
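A small untested sketch of that bucketizing idea in SolrJ; the numeric field
"doc_id" and the bucket size are assumptions for illustration:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrDocumentList;

  public class BucketPager {
    public static void main(String[] args) throws Exception {
      HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
      long bucket = 100000;                        // docs per slice, tune to taste
      for (long lo = 0; lo < 10000000L; lo += bucket) {
        SolrQuery q = new SolrQuery("*:*");        // the original query is unchanged
        q.addFilterQuery("doc_id:[" + lo + " TO " + (lo + bucket - 1) + "]");
        q.setRows((int) bucket);                   // start stays 0; the filter does the paging
        SolrDocumentList page = solr.query(q).getResults();
        // process this bucket of results here
      }
      solr.shutdown();
    }
  }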
Cheers,
Tim
On Tue, Jul 23, 2013
Bah! I didn't notice that you'd used edismax, ignore
my comments.
Sorry for the confusion
Erick
On Tue, Jul 23, 2013 at 2:34 PM, Joe Zhang wrote:
> I'm not sure I understand, Erick. I don't have a "text" field in my schema;
> "title" and "content" are both legal fields.
>
>
> On Tue, Jul 23, 201
Right, issuing a commit after every document is not good
practice. Relying on the auto commit parameters in
solrconfig.xml is usually best, although I will sometimes
issue a commit at the very end of the indexing run.
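In SolrJ terms, that pattern looks something like this (a sketch only; the
autocommit itself is configured in solrconfig.xml):

  import java.util.List;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class IndexRun {
    static void index(HttpSolrServer solr, List<SolrInputDocument> docs) throws Exception {
      for (SolrInputDocument doc : docs) {
        solr.add(doc);   // no commit per document; autocommit handles intermediate commits
      }
      solr.commit();     // optional single commit at the very end of the run
    }
  }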
Several things about this thread aren't making sense. First of
all your commitw
Are you mixing SolrCloud and old-style master/slave?
There was a bug a while ago (search the JIRA) where
replication was copying the entire index unnecessarily,
but I think that was fixed by 4.3.
Best
Erick
On Tue, Jul 23, 2013 at 6:33 AM, xiaoqi wrote:
>
> hi,all
>
> i have two solr ,one is ma
2.1 billion documents (including deleted documents) per Lucene index, but
essentially per Solr shard as well.
But don’t even think about going that high. In fact, don't plan on going
above 100 million unless you do a proof of concept that validates that you
get acceptable query and update perf
Perfect thanks so much. You just cleared up the other little bit, i.e. when
the SpellingQueryConverter is used/not used and why you might implement
your own.
Thanks again.
On Tue, Jul 23, 2013 at 6:48 PM, Dyer, James
wrote:
> You've got it. The only other thing is that "spellcheck.q" does not
Currently I am using Solr 3.5.x and I push updates to Solr via a queue (ActiveMQ)
and perform a hard commit every 30 minutes (since my index is relatively
big, around 30 million documents). I am thinking of using soft commit to
implement NRT search, but I am worried about the reliability.
For ex: If I
You've got it. The only other thing is that "spellcheck.q" does not analyze
anything. The whole purpose of this is to allow you to just send raw keywords
to be spellchecked. This is handy if you have a complex "q" parameter (say,
you're using local params, etc) and the SpellingQueryConverter
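For example (the collection name is from this thread; the query itself is
illustrative):
http://localhost:8981/solr/articles/select?q=markup_texts:(Perfrm%20HVC)&spellcheck=true&spellcheck.q=Perfrm%20HVC
Here the raw keywords in spellcheck.q are checked as-is, regardless of how
complex q is.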
still 2.1 billion documents?
Thanks James. That's it! Now:
http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0
returns:
collation "perform hvac" (hits: 4; corrections: Perfrm -> perform, HVC -> hvac)
collation "performed hvac" (hits: 4; corrections: Perfrm -> performed, HVC -> hvac)
If you have time, I'm still slightly unclear on the field element in the
spellcheck configu
For people who have the same issue: it was solved by adding
  text
in the requestHandler "/update/extract" in solrconfig.xml:
  last_modified
  ignored_
  text
  yyyy-MM-dd
So there is no need to add the content in SolrJ:
p.setParam("literal.text", handler.toString());
R
I don't believe you can specify more than 1 field on "df" (default field).
What you want, I think, is "qf" (query fields), which is available only if
using dismax/edismax.
http://wiki.apache.org/solr/SearchHandler#df
http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields.29
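For example (field names illustrative):
.../select?defType=edismax&q=some+terms&qf=title^10+content
searches both fields, boosting matches in title.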
James Dyer
I
Hello Solr users,
Question regarding processing a lot of docs returned from a query: I
potentially have millions of documents returned back from a query. What is
the common design to deal with this?
Two ideas I have are:
- create a client service that is multithreaded to handle this
- use the Sol
Thanks Alan and Shawn. Just installed Solr 4.4, and no longer experiencing
the issue.
Thanks! :)
On Tue, Jul 23, 2013 at 7:21 AM, Shawn Heisey wrote:
> On 7/23/2013 7:50 AM, Alan Woodward wrote:
> > Can you try upgrading to the just-released 4.4? Solr.xml persistence
> had all kinds of bugs i
Hi all,
I'm testing SolrCloud (version 4.3.1) with 2 shards and 1 external ZooKeeper.
All is running OK: documents are indexed into the 2 different shards, and a
select *:* gives me all documents.
Now I'm trying to add/index a new document via SolrJ using CloudSolrServer.
The code:
Try tacking &maxCollationTries=0 to the URL and see if the collation returns.
If you get a collation, then try the same URL with the collation as the "q"
parameter. Does that get results?
My suspicion here is that you are assuming that "markup_texts" is the default
search field for "/select" b
Hi James,
If I try:
http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0&maxCollationTries=0
I get the same result:
status 0, QTime 7 (spellcheck=true, q=Perfrm HVC, rows=0)
suggestions for "Perfrm" (numFound 3, startOffset 0, endOffset 6):
  perform (freq 4), performed (freq 1), performance (freq 3)
suggestions for "HVC" (numFound 2, startOffset 7, endOffset 10):
  hvac (freq 4), have (freq 5)
correctlySpelled: false
However, you're right that m
I think the best bet here would be a ping-like handler that would simply
return the state of only this box in the cluster:
Something like /admin/state which would return
"down","active","leader","recovering"
I'm not really sure where to begin however. Any ideas?
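(One possible starting point, sketched against the 4.x SolrJ API - untested,
collection name illustrative - is to read the replica states out of the
ZooKeeper cluster state; filtering to the local node is then straightforward:

  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.common.cloud.ClusterState;
  import org.apache.solr.common.cloud.Replica;
  import org.apache.solr.common.cloud.Slice;

  public class NodeStates {
    public static void main(String[] args) throws Exception {
      CloudSolrServer solr = new CloudSolrServer("localhost:2181");
      solr.connect();
      ClusterState state = solr.getZkStateReader().getClusterState();
      for (Slice slice : state.getSlices("collection1")) {
        for (Replica replica : slice.getReplicas()) {
          // "state" is active/down/recovering; leadership is tracked separately
          System.out.println(replica.getName() + " -> " + replica.getStr("state"));
        }
      }
      solr.shutdown();
    }
  }
)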
jim
On Mon, Jul 22, 2013 at 12:5
Hi James,
I get the following response for that query:
status 0, QTime 8 (spellcheck=true, q=Perfrm HVC, rows=0)
suggestions for "Perfrm" (numFound 3, startOffset 0, endOffset 6):
  perform (freq 4), performed (freq 1), performance (freq 3)
suggestions for "HVC" (numFound 2, startOffset 7, endOffset 10):
  hvac (freq 4), have (freq 5)
correctlySpelled: false
Thanks
Brendan
Thanks
Brendan
On Tue, Jul 23, 2013 at 3:19 PM, Dyer, James
wrote:
> For this query:
>
>
> http://localhost:8981/s
Hi,
Thanks for your suggestions. I'll be able to provide answers to a few of
your questions right now; the rest I'll answer after some time. It takes
around 150k to 200k queries before it goes down again after restarting it.
In a typical query we are returning around 20 fields. Memory utilization
peak
For this query:
http://localhost:8981/solr/articles/select?indent=true&q=Perfrm%20HVC&rows=0
...do you get anything back in the spellcheck response? Is it correcting the
individual words and not giving collations? Or are you getting no individual
word suggestions also?
James Dyer
Ingram Cont
Hi All,
I have an IndexBasedSpellChecker component configured as follows (note the
field parameter is set to the spellcheck field):
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_spell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="field">spellcheck</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <float name="thresholdTokenFrequency">.0001</float>
  </lst>
</searchComponent>
with the corresponding f
I'm not sure I understand, Erick. I don't have a "text" field in my schema;
"title" and "content" are both legal fields.
On Tue, Jul 23, 2013 at 5:15 AM, Erick Erickson wrote:
> this isn't doing what you think.
> title^10 content
> is actually parsed as
>
> text:title^10 text:content
>
> where
On 23 July 2013 21:52, Chris Hostetter wrote:
>
>
> : Can anyone remove this spammer please?
>
> The recent influx is not confined to a single user, or a single list. Nor
> is there a clear course of action just yet, since the senders in question
> are all legitimate subscribers who have been act
: Ok thanks, I just wanted the know is it possible to ignore boost value or
: not during score calculation and as you said its not.
: Now I would have to focus on nutch to fix the issue and not to send boost=0
: to Solr.
the index time boosts are encoded in field norms -- if you want to ignore
th
Oh cool! I'm glad it at least seemed to work. Can you post your
configuration of the field type and report from Solr's logs what "maxLevels"
value is used for this field? It is logged the first time you use
the field type.
Maybe there isn't a limit under 10B after all. Some quick'n'dirty
calcu
Hi Alistair,
You probably need a commit, and not an optimize.
Which version of Solr are you running against? The 4.0 releases have more
complications, but generally sending a commit will do. Not sure if GSearch
sends one, only partly because I never was able to make it work. :)
Michael Della Bi
Elodie: I just tested your configs (as close as i could get since i don't
have the com.kelkoo classes) using the current HEAD of the 4x branch and
had no problems with the entity includes.
What Java version/vendor are you using?
Are you using the provided Jetty or your own servlet container
What are the dangers of trying to use a range of 10 billion? Simply a
slower index time? Or will I get inaccurate results?
I have tried it on a very small sample of documents, and it seemed to
work. I could spend some time this week trying to get a more robust (and
accurate) dataset loaded to play
The use case is to prevent the necessity to download something else
(zookeeper) when everything needed to run it is (likely) present in the
Solr distribution already.
Maybe we don't need to start Jetty, maybe we can start Zookeeper with an
extra script in the Solr codebase.
At present, if you are
July 2013, Apache Solr™ 4.4 available
The Lucene PMC is pleased to announce the release of Apache Solr 4.4
Solr is the popular, blazing fast, open source NoSQL search platform
from the Apache Lucene project. Its major features include powerful
full-text search, hit highlighting, faceted search, d
DirectSolrSpellChecker does not prepare any kind of dictionary. It just uses
the term dictionary from the indexed field. So what you are trying to do is
impossible.
You would think it would be possible with IndexBasedSpellChecker because it
creates a dictionary as a sidecar lucene index. But
Solr doesn't support any kind of short-circuting the original query and
returning the results of the corrected query or collation. You just re-issue
the query in a second request. This would be a nice feature to add though.
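A sketch of that two-request pattern in SolrJ (untested; the method shape is
illustrative):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.client.solrj.response.SpellCheckResponse;

  public class RequeryOnCollation {
    static QueryResponse search(HttpSolrServer solr, String userQuery) throws Exception {
      SolrQuery q = new SolrQuery(userQuery);
      q.set("spellcheck", true);
      q.set("spellcheck.collate", true);
      QueryResponse rsp = solr.query(q);
      SpellCheckResponse sc = rsp.getSpellCheckResponse();
      if (rsp.getResults().getNumFound() == 0 && sc != null && sc.getCollatedResult() != null) {
        rsp = solr.query(new SolrQuery(sc.getCollatedResult()));  // re-issue with the collation
      }
      return rsp;
    }
  }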
James Dyer
Ingram Content Group
(615) 213-4311
-Original Message-
: Can anyone remove this spammer please?
The recent influx is not confined to a single user, or a single list. Nor
is there a clear course of action just yet, since the senders in question
are all legitimate subscribers who have been active members of the
community.
There is an open issue to
Here is my fieldtype:
My input for indexing at analysis section of Solr admin page:
{| style="text-align: left; width: 50%; table-layout: fixed;"
Hello Chris,
Thank you for your help.
I checked differences between my files and your test files but I didn't
find bugs in my files.
All my files are in the same directory: collection1/conf
=> schema.xml content:
]>
Thanks for your comment, Erick.
When I use *server.add(doc);* everything is fine (but it takes a long time
to hard commit every single doc), so I am sure docs are uniquely indexed.
Maybe I shouldn't call *server.commit();* at all from the SolrJ code, so Solr
would use the autoCommit/autoSoftCommit configu
Are you actually seeing that output from the WikipediaTokenizerFactory??
Really? Even if you use the Solr Admin UI analysis page?
You should just see the text tokens plus the URLs for links.
-- Jack Krupansky
-Original Message-
From: Furkan KAMACI
Sent: Tuesday, July 23, 2013 10:53 A
If you use WikipediaTokenizer it will tag different wiki elements with
different types (you can see it in the admin UI).
So then follow up with TypeTokenFilter to keep only the types you care
about, and I think it will do what you want.
On Tue, Jul 23, 2013 at 7:53 AM, Furkan KAMACI wrote:
> Hi
There was also a bug in the lazy loading of multivalued fields at one point
recently in Solr 4.2
https://issues.apache.org/jira/browse/SOLR-4589
"4.x + enableLazyFieldLoading + large multivalued fields + varying fl =
pathological CPU load & response time"
Do you use multivalued fields very he
Here is a paper that I found useful:
http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf
On Tue, Jul 23, 2013 at 10:42 AM, Furkan KAMACI wrote:
> Thanks for your comments.
>
> 2013/7/23 Tommaso Teofili
>
>> if you need a specialized algorithm for detecting blogposts plagiarism /
Hi;
I have indexed Wikipedia data with the Solr DIH. However, when I look at the
data that is indexed in Solr I see something like this as well:
{| style="text-align: left; width: 50%; table-layout: fixed;" border="0"
|- valign="top"
| style="width: 50%"|
:*[[Ubuntu]]
:*[[Fedora]]
:*[[Mandriva]]
:*[[Linux Mint]]
Thanks for your comments.
2013/7/23 Tommaso Teofili
> if you need a specialized algorithm for detecting blogposts plagiarism /
> quotations (which are different tasks IMHO) I think you have 2 options:
> 1. implement a dedicated one based on your features / metrics / domain
> 2. try to fine tune
if you need a specialized algorithm for detecting blogposts plagiarism /
quotations (which are different tasks IMHO) I think you have 2 options:
1. implement a dedicated one based on your features / metrics / domain
2. try to fine tune an existing algorithm that is flexible enough
If I were to do
Curious what the use case is for this? Zookeeper is not an HTTP
service so loading it in Jetty by itself doesn't really make sense. I
also think this creates more work for the Solr team especially since
setting up a production ensemble shouldn't take more than a few
minutes once you have the nodes
On 7/23/2013 7:50 AM, Alan Woodward wrote:
> Can you try upgrading to the just-released 4.4? Solr.xml persistence had all
> kinds of bugs in 4.3, which should have been fixed now.
The 4.4.0 release has been finalized and uploaded, but the download link
hasn't been changed yet because the mirror
Thanks Mikhail,
I'll go for your EdgeNGramTokenFilter suggestion.
-
Kind regards,
Paul
The JSON keys within the "highlighting" object are the document IDs, and
then the keys within those objects are the highlighted field names.
Again, I repeat my question: Exactly why is it difficult to deserialize?
Seems simple enough.
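For instance, a response fragment shaped roughly like this (IDs and values
illustrative):
  "highlighting": { "doc42": { "content": ["... an <em>example</em> snippet ..."] } }
maps document ID -> field name -> snippet array.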
-- Jack Krupansky
-Original Message-
From: Mysur
Hi there,
My Solr is being fed by Fedora GSearch and when uploading a new resource, the
Collection is optimized but not current so the new resource can't be found. I
have to go to the Core Admin page and Optimize it from there, in order to make
the collection current. Is there anything I should
On 7/23/2013 3:33 AM, Furkan KAMACI wrote:
> Sometimes a huge part of a document may exist in another document. As like
> in student plagiarism or quotation of a blog post at another blog post.
> Does Solr/Lucene or its libraries (UIMA, OpenNLP, etc.) has any class to
> detect it?
Solr is designed
Do you need all of the fields loaded every time and are they stored? Maybe
there is a document with gigantic content that you don't actually need but
it gets deserialized anyway. Try the lazy loading
setting enableLazyFieldLoading in solrconfig.xml;
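it lives in the <query> section:
  <bool name="enableLazyFieldLoading">true</bool>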
Regards,
Alex.
Personal website: http://www.oute
One classic approach is to simply use the full text of the suspect text as
well as bigrams and trigrams (phrases) from that text with "OR" operators.
The top results will be the documents that most closely "match" the subject
text. That provides a visual set of similar results. You will then have t
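A rough SolrJ sketch of assembling such a query (analysis, stopwords and
weighting omitted; names illustrative - default OR semantics assumed):

  import org.apache.solr.client.solrj.SolrQuery;

  public class SimilarityQuery {
    public static SolrQuery build(String suspectText) {
      String[] w = suspectText.split("\\s+");
      StringBuilder q = new StringBuilder();
      for (String word : w) q.append(word).append(' ');   // single terms
      for (int i = 0; i + 1 < w.length; i++)              // bigram phrases
        q.append('"').append(w[i]).append(' ').append(w[i + 1]).append("\" ");
      for (int i = 0; i + 2 < w.length; i++)              // trigram phrases
        q.append('"').append(w[i]).append(' ')
         .append(w[i + 1]).append(' ').append(w[i + 2]).append("\" ");
      return new SolrQuery(q.toString().trim());
    }
  }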
Did you look at:
*) $deleteDocById
*) $deleteDocByQuery
*) deletedPkQuery
Just search for delete on https://wiki.apache.org/solr/DataImportHandler
If you tried all of those, maybe you need to explain your problem in more
specific details.
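For the delete-a-row case specifically, a hedged example of the third option
(table and column names hypothetical, assuming soft deletes via a flag):
deletedPkQuery="SELECT id FROM MyDoc WHERE deleted = 1 AND LastModificationTime > '${dataimporter.last_index_time}'"
on the entity in data-config.xml.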
Regards,
Alex.
Personal website: http://www.outerthou
Can you try upgrading to the just-released 4.4? Solr.xml persistence had all
kinds of bugs in 4.3, which should have been fixed now.
Alan Woodward
www.flax.co.uk
On 23 Jul 2013, at 13:36, Ali, Saqib wrote:
> Hello all,
>
> Every time I issue a SPLITSHARD using Collections API, the zkHost att
There is no such thing as a "qf filter" - "qf" is simply a list of names of
fields to search for the terms from the query, "q", as well as boost
factors. Filtering is done with "filter queries" - "fq".
-- Jack Krupansky
-Original Message-
From: Mysurf Mail
Sent: Tuesday, July 23, 201
Hi,
Use fq, not qf. It needs to be indexed. Filtering is like searching
without scoring.
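For example (field and value from this thread, query illustrative):
.../select?q=user+search+terms&fq=CreatedBy:giraffe
The fq clause restricts the result set without influencing scores.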
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Tue, Jul 23, 2013 at 9:39 AM, Mysurf Mail wrote:
> I am probably using it wrong.
> http:
I am probably using it wrong.
http://
...:8983/solr/vault10k/select?q=*%3A*&defType=edismax&qf=CreatedBy%BLABLA
returns all rows.
It neglects my qf filter.
Should I even use qf for filtering with edismax?
(It doesn't say that in the doc
http://wiki.apache.org/solr/ExtendedDisMax#qf_.28Query_Fields
Can anyone remove this spammer please?
On Tue, Jul 23, 2013 at 4:47 AM, wrote:
>
> Hi! http://mackieprice.org/cbs.com.network.html
>
>
But I don't want it to be searched on.
Let's say the user name is "giraffe".
I do want the filter to be "where created by = giraffe",
but when the user searches for his name, I want only documents with the name
"Giraffe".
Since it is indexed, wouldn't it return all rows created by him?
Thanks.
On Tue,
Moreover, you may want to use fq=CreatedBy:user1 for filtering.
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Tue, Jul 23, 2013 at 9:28 AM, Raymond Wiker wrote:
> Simple: the field needs to be "indexed" in order to search (or
Simple: the field needs to be "indexed" in order to search (or filter) on
it.
On Tue, Jul 23, 2013 at 3:26 PM, Mysurf Mail wrote:
> I want to restrict the returned results to be only the documents that were
> created by the user.
> I then load to the index the createdBy attribute and set it to
I want to restrict the returned results to be only the documents that were
created by the user.
I then load the createdBy attribute into the index and set it to
indexed="false" stored="true".
Then I want to filter by "CreatedBy", so I use the dashboard, check
edismax and add
I check edismax a
Assumptions:
* you currently have two choices to start Zookeeper: run it embedded
within Solr, or download it from the ZooKeeper site and start it
independently.
* everything you need to run ZooKeeper (embedded or not) is included
within the Solr distribution
Assuming I've got the above righ
Hello all,
Every time I issue a SPLITSHARD using Collections API, the zkHost attribute
in the solr.xml goes missing. I have to manually edit the solr.xml to add
zkHost after every SPLITSHARD.
Any thoughts on what could be causing this?
Thanks.
I have tried post.jar and it works when I set the literal.id in solrconfig.xml.
I can't pass the id with post.jar (-Dparams=literal.id=abc) because I get an
error: "could not find or load main class .id=abc".
On 20. Jul 2013, at 7:05 PM, Andreas Owen wrote:
> path was set text wasn't, but it do
I am updating my solr index using deltaQuery and deltaImportQuery
attributes in data-config.xml.
In my condition I write
where MyDoc.LastModificationTime > '${dataimporter.last_index_time}'
then after I add a row I trigger an update using data-config.xml.
Now, sometimes I delete a row.
How ca
this isn't doing what you think.
title^10 content
is actually parsed as
text:title^10 text:content
where "text" is my default search field.
assuming title is a field. If you look a little
farther up the debug output you'll see that.
You probably want
title:content^100 or some such?
Erick
On
First a minor nit: the server.add(doc, time) is a hard commit, not a soft one.
But as for the rest of it: when you add your 70 docs, do they all have the
same id (i.e. the uniqueKey field)? If so, there will be only one document,
the last one, since all the earlier ones will be overwritten.
Not quite sure why you
Hi,
On Tue, Jul 23, 2013 at 8:02 AM, Erick Erickson wrote:
> Neil:
>
> Here's a must-read blog about why allocating more memory
> to the JVM than Solr requires is a Bad Thing:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> It turns out that you actually do yourself
Hi Cosimo,
Very simple: Oracle Java 1.7 is your best bet. If you have a large heap
and are seeing STW pauses, try G1 - we've been using it and have been
happy with it.
Ciao,
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm
On Tue, Jul 2
It can be done by extending LuceneQParser/SolrQueryParser; see
http://wiki.apache.org/solr/SolrPlugins#QParserPlugin
There is newTermQuery(Term); it should be overridden and delegate to
the newPrefixQuery() method.
Overall, I suggest you consider using EdgeNGramTokenFilter at index time,
and then search
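A rough, untested sketch of the parser-override route against the 4.x API
(the plugin name and the default field "text" are illustrative):

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.Query;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.QParser;
  import org.apache.solr.search.QParserPlugin;
  import org.apache.solr.search.SolrQueryParser;
  import org.apache.solr.search.SyntaxError;

  public class PrefixingQParserPlugin extends QParserPlugin {
    @Override
    public void init(NamedList args) {}

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
      return new QParser(qstr, localParams, params, req) {
        @Override
        public Query parse() throws SyntaxError {
          SolrQueryParser p = new SolrQueryParser(this, "text") {
            @Override
            protected Query newTermQuery(Term term) {
              // every plain term becomes a prefix query, i.e. term*
              return newPrefixQuery(new Term(term.field(), term.text()));
            }
          };
          return p.parse(qstr);
        }
      };
    }
  }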
Neil:
Here's a must-read blog about why allocating more memory
to the JVM than Solr requires is a Bad Thing:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
It turns out that you actually do yourself harm by allocating more
memory to the JVM than it really needs. Of course
To add to what Erick said, that *quantifying* is hugely important!
How do you measure your search relevance improvements?
How are you currently measuring it?
How will you see, after you apply any changes, whether relevance was
improved and how much?
How will you know whether, even test queries you
There has been a _ton_ of work since 4.0, and 4.4 will be out in a day
or two. I suspect the best advice is to try 4.4...
Best
Erick
On Mon, Jul 22, 2013 at 2:54 PM, Michael Long wrote:
> I'm seeing random crashes in solr 4.0 but I don't have anything to go on
> other than "IllegalStateException
Another thing I've seen people do is something like
text:(test AND pdf)^10 text:(test pdf).
so docs with both terms in the text field get boosted a lot, but docs
with either one will still get found.
But as Jack says, you have to demonstrate a problem before you propose
a solution.
You say " a l
Thanks!
On 23 July 2013 12:19, Markus Jelsma wrote:
> Eeh, here's the other one: https://issues.apache.org/jira/browse/SOLR-1712
>
>
> -Original message-
>> From:Markus Jelsma
>> Sent: Tuesday 23rd July 2013 13:18
>> To: solr-user@lucene.apache.org
>> Subject: RE: facet.maxcount ?
>>
>>
Ah, I think I misread your question. So your question is actually: how to make
Solr embed highlighting into the doc response itself. I'm not aware of such
functionality. This is why you have the "highlighting" section in your
response.
On Tue, Jul 23, 2013 at 2:30 PM, Dmitry Kan wrote:
> You just ne
You just need to specify the emphasizing tag in hl params by adding
something like this to your query:
&hl.fl=content&hl.simple.pre=<b>&hl.simple.post=<%2Fb>
Check the solr admin page, the querying item, it shows the constructed
query, so you don't need to guess!
Regards,
Dmitry
On Mon, Jul 22
My client has an installation with 3 different clients using the same Solr
index. These clients all append a * wildcard suffix in the query: user
enters "abc def" while search is performed against (abc* def*).
In order to move away from this way of searching, we'd like to move the
clients away from
Eeh, here's the other one: https://issues.apache.org/jira/browse/SOLR-1712
-Original message-
> From:Markus Jelsma
> Sent: Tuesday 23rd July 2013 13:18
> To: solr-user@lucene.apache.org
> Subject: RE: facet.maxcount ?
>
> Hi - No but there are two unresolved issues about this topic:
>
Hi - No but there are two unresolved issues about this topic:
https://issues.apache.org/jira/browse/SOLR-4411
https://issues.apache.org/jira/browse/SOLR-4411
Cheers
-Original message-
> From:Jérôme Étévé
> Sent: Tuesday 23rd July 2013 12:58
> To: solr-user@lucene.apache.org
> Subject: f
Hi all happy Solr users!
I was wondering if it's possible to have some sort of facet.maxcount equivalent?
In short, that would exclude from the facet any term (or query) that
matches at least facet.maxcount times.
That facet.maxcount would probably significantly improve the
performance of reques
Hi all,
I have two Solrs, one master and one replica; before, I used them
under version 3.5 and it worked fine.
When I upgraded to version 4.3, I found that when the replica Solr copies the
index from the master, it cleans the current index and copies the new version
into its own folder. The slave can't search durin
Actually I need a specialized algorithm. I want to use that algorithm to
detect duplicate blog posts.
2013/7/23 Tommaso Teofili
> Hi,
>
> I think you may leverage and/or improve the MLT component [1].
>
> HTH,
> Tommaso
>
> [1] : http://wiki.apache.org/solr/MoreLikeThis
>
>
> 2013/7/23 Furkan KAMACI
>
Hi,
Can anyone help with how to reference the solrcore.properties uploaded into
ZooKeeper?
Hi,
I think you may leverage and/or improve the MLT component [1].
HTH,
Tommaso
[1] : http://wiki.apache.org/solr/MoreLikeThis
2013/7/23 Furkan KAMACI
> Hi;
>
> Sometimes a huge part of a document may exist in another document. As like
> in student plagiarism or quotation of a blog post at another b
Hi;
Sometimes a huge part of a document may exist in another document, as in
student plagiarism or quotation of a blog post in another blog post.
Do Solr/Lucene or their libraries (UIMA, OpenNLP, etc.) have any class to
detect it?
I'm trying to index an Oracle Database 10g XE using Solr's Data Import Handler.
My data-config.xml looks like this
My schema.xml looks like this -
Now when I try to index it, Solr is not able to read the columns of the
table and
How do I cast datetimeoffset(7) to a Solr date?
On Tue, Jul 23, 2013 at 11:11 AM, Mysurf Mail wrote:
> Ahaa
> I deleted the data folder and now I get
> Invalid Date String:'2010-01-01 00:00:00 +02:00'
> I need to cast it to solr. as I read it in the schema using
>
> stored="true" required="true"