Sven
In my data-config.xml I have the following
In my schema.xml I have
And in my solrconfig.xml I have
data-config.xml
dismax
Please unsubscribe me from the mailing list.
Shalin Shekhar Mangar wrote:
On Mon, Feb 8, 2010 at 9:47 PM, Xavier Schepler <
xavier.schep...@sciences-po.fr> wrote:
Hey,
I'm thinking about using dynamic fields.
I need one or more user-specific fields in my schema, for example,
"concept_user_*", and I will have maybe more than 200 users.
I understand that Tika is able to index PDF content: is that true? I tried to
post a PDF from local and I've seen another document in the solr/admin schema
browser, but when I search only the document id is available; the document
doesn't seem to be indexed. Do I need other products to index PDF content?
On Tue, Feb 9, 2010 at 2:43 PM, Xavier Schepler <
xavier.schep...@sciences-po.fr> wrote:
> Shalin Shekhar Mangar wrote:
>
> On Mon, Feb 8, 2010 at 9:47 PM, Xavier Schepler <
>> xavier.schep...@sciences-po.fr> wrote:
>>
>>
>>
>>> Hey,
>>>
>>> I'm thinking about using dynamic fields.
>>>
>>> I n
Shalin Shekhar Mangar wrote:
On Tue, Feb 9, 2010 at 2:43 PM, Xavier Schepler <
xavier.schep...@sciences-po.fr> wrote:
Shalin Shekhar Mangar wrote:
On Mon, Feb 8, 2010 at 9:47 PM, Xavier Schepler <
xavier.schep...@sciences-po.fr> wrote:
Hey,
I'm thinking about using
Hi,
I am having problems getting the delta-import to work for my schema.
Following what I have found in the list, JIRA, and the wiki, the
configuration below should just work, but it doesn't.
The SQL generated in the delta query is correct, the times
I don't use any garbage collection parameters.
/Tim
2010/2/8 Simon Rosenthal :
> What garbage collection parameters is the JVM using? The memory will not
> always be freed immediately after an event like unloading a core or starting
> a new searcher.
>
> 2010/2/8 Tim Terlegård
>
>> To me it d
OK, I'm going ahead (maybe :)).
I tried another curl command to send the file from remote:
http://mysolr:/solr/update/extract?literal.id=8514&stream.file=files/attach-8514.pdf&stream.contentType=application/pdf
and the behaviour has changed: now I get an error in the Solr log file:
HTTP St
If I unload the core and then click "Perform GC" in jconsole nothing
happens. The 8 GB RAM is still used.
If I load the core again and then run the query with the sort fields,
then jconsole shows that the memory usage immediately drops to 1 GB
and then rises to 8 GB again as it caches the stuff.
Which version of Solr/Lucene are you using?
Can you run Lucene's CheckIndex tool (java -ea:org.apache.lucene
org.apache.lucene.index.CheckIndex /path/to/index) and then post the
output?
Have you altered any of IndexWriter's defaults (via solrconfig.xml)?
Eg the termIndexInterval?
Mike
On Mon, F
Hi all,
I need logic in Solr to join two fields in a query.
I indexed two fields: id and body (text type).
5 rows are indexed:
id=1 : text= nokia samsung
id=2 : text= sony vaio nokia samsung
id=3 : text= vaio nokia
etc..
I am searching with "q=id:1" and it returns results perfectly, returning "n
Try this:
deltaImportQuery="select id, bytes from attachment where application =
'MYAPP' and id = '${dataimporter.delta.id}'"
Be aware that the names are case-sensitive. If the id comes as 'ID',
this will not work.
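For context, a minimal data-config.xml sketch showing how deltaQuery and deltaImportQuery fit together (the table and columns come from the snippet above; the last_modified column and the dataSource settings are assumptions for illustration):

<dataConfig>
  <dataSource driver="org.h2.Driver" url="jdbc:h2:./data/db"/>  <!-- illustrative values -->
  <document>
    <!-- deltaQuery returns the primary keys of changed rows;
         deltaImportQuery then fetches each row by ${dataimporter.delta.id} -->
    <entity name="attachment"
            query="select id, bytes from attachment where application = 'MYAPP'"
            deltaQuery="select id from attachment where application = 'MYAPP'
                        and last_modified > '${dataimporter.last_index_time}'"
            deltaImportQuery="select id, bytes from attachment where application = 'MYAPP'
                              and id = '${dataimporter.delta.id}'"/>
  </document>
</dataConfig>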
On Tue, Feb 9, 2010 at 3:15 PM, Jorg Heymans wrote:
> Hi,
>
> I am having probl
You can also try:
URL urlo = new URL(url); // ensure that the url has wt=javabin in it
NamedList namedList = new JavaBinCodec().unmarshal(urlo.openConnection().getInputStream());
QueryResponse response = new QueryResponse(namedList, null);
On Mon, Feb 8, 2010 at 11:49 PM, Jason Rutherglen
wrote
> I am searching for "nokia" and it lists results 1, 2, 3 with a short
> description field. There is a link on the search list (like Google);
> clicking on the link performs a new search (opening the doc from the
> index), for this search
>
> I want to join two fields:
> id:1 + queryString ("nokia samsung") t
Hi,
I'd like to know if it's possible to have a Solr server with a schema and, let's
say, 10 fields indexed.
I now want to replicate this whole index to another Solr server which has a
slightly different schema.
There are 6 additional fields; these fields change the sort order for a product
which ba
Hi Ahmet,
Thank you very much, my problem is solved.
With regards,
On Tue, Feb 9, 2010 at 5:38 PM, Ahmet Arslan wrote:
>
> > I am searching for "nokia" and it lists results 1, 2, 3
> > with a short
> > description field.
> > There is a link on the search list (like Google); clicking on
> > the link perfo
Hi,
I'm trying to improve the search box on our website by adding an
autosuggest field. The dataset is a set of properties in the world
(mostly Europe) and the search box is intended to be filled with a
country, region, or city name. To do this I've created a separate,
simple core with one do
With the new sort by function in 1.5
(http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function), will it now be
possible to include the ExternalFileField value in the sort formula? If so, we
could sort on last bid price or last bid time without updating the document
itself.
However, to displ
I have been using distributed search with haproxy but noticed that I am
suffering a little from TCP connections building up waiting for the OS-level
close/timeout:
netstat -a
...
tcp6 1 0 10.0.16.170%34654:53789 10.0.16.181%363574:8893
CLOSE_WAIT
tcp6 1 0 10.0.16.170%34654
> I'm trying to improve the search box on our website by
> adding an autosuggest field. The dataset is a set of
> properties in the world (mostly Europe) and the search box is
> intended to be filled with a country, region, or city name.
> To do this I've created a separate, simple core with one
>
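Ahmet's reply is cut off above; for what it's worth, one common recipe for this kind of autosuggest core (an assumption on my part, not necessarily what he went on to suggest) is an edge n-gram field type, so partial input matches name prefixes:

<fieldType name="autosuggest" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>  <!-- whole name as one token -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "par" then matches "Paris"; gram sizes are illustrative -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>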
Hi,
I did not try this, but could you not read the URL client side and pass it to
SolrJ as a ContentStream?
ContentStream urlStream = new ContentStreamBase.URLStream(new URL("http://my.site/file.html"));
req.addContentStream(urlStream);
--
Jan Høydahl - search architect
Cominvent AS - www.cominvent.com
On 2/9/2010 2:57 PM, Ahmet Arslan wrote:
I'm trying to improve the search box on our website by
adding an autosuggest field. The dataset is a set of
properties in the world (mostly Europe) and the search box is
intended to be filled with a country, region, or city name.
To do this I've created a
NOTE: Please start a new email thread for a new topic (See
http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking)
Your strategy could work. You might want to look into dedicated entity
extraction frameworks like
http://opennlp.sourceforge.net/
http://nlp.stanford.edu/software/CRF-NER.shtml
Indeed, that made it work. Looking back at the documentation, it's all there,
but one needs to read every single line with care :-)
2010/2/9 Noble Paul നോബിള് नोब्ळ्
> try this
>
> deltaImportQuery="select id, bytes from attachment where application =
> 'MYAPP' and id = '${dataimporter.delta.id}
Much more efficient to tag documents with language at index time. Look for
language identification tools such as
http://www.sematext.com/products/language-identifier/index.html or
http://ngramj.sourceforge.net/ or
http://lucene.apache.org/nutch/apidocs-1.0/org/apache/nutch/analysis/lang/Languag
You may also want to play with other highlighting parameters to select how much
text to highlight, how many fragments, etc. See
http://wiki.apache.org/solr/HighlightingParameters
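As a sketch, such parameters can also be set as request-handler defaults in solrconfig.xml (the handler name, field, and values below are illustrative assumptions):

<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <str name="hl.fl">text</str>            <!-- field(s) to highlight -->
    <str name="hl.snippets">3</str>         <!-- max fragments per field -->
    <str name="hl.fragsize">100</str>       <!-- fragment size in characters -->
    <str name="hl.maxAnalyzedChars">51200</str>  <!-- how much text to analyze -->
  </lst>
</requestHandler>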
--
Jan Høydahl - search architect
Cominvent AS - www.cominvent.com
On 9. feb. 2010, at 13.08, Ahmet Arslan
Hi,
Index replication in Solr makes an exact copy of the original index.
Is it not possible to add the 6 extra fields to both instances?
An alternative to replication is to feed two independent Solr instances -> full
control :)
Please elaborate on your specific use case if this is not useful answ
Tim,
The GC just automagically works right?
:)
There have been issues around thread locals in Lucene. The main code for
core management is CoreContainer, which I believe is fairly easy to
digest. If there's an issue you may find it there.
Jason
2010/2/9 Tim Terlegård :
> If I unload the core and
Hello Everyone,
I have a field in my Solr schema which stores emails. The way I want the
emails to be tokenized is like this:
if the email address is abc.def@alpha-xyz.com,
users should be able to search on
1. abc.def@alpha-xyz.com (whole address)
2. abc
3. def
4. alpha-xyz
Which tokenizer should
Thanks Lance and Michael,
We are running Solr 1.3.0.2009.09.03.11.14.39 (Complete version info from
Solr admin panel appended below)
I tried running CheckIndex (with the -ea switch) on one of the shards.
CheckIndex also produced an ArrayIndexOutOfBoundsException on the larger
segment contai
I tried your suggestion, Hoss, but committing to the new coordinator
core doesn't change the indexVersion and therefore the ETag value isn't
changed.
I opened a new JIRA issue for this
http://issues.apache.org/jira/browse/SOLR-1765
Thanks,
Charlie
-----Original Message-----
From: Chris Hostett
Yes, the term count reported by CheckIndex is the total number of unique terms.
It indeed looks like you are exceeding the unique term count limit --
16777214 * 128 (= the default term index interval) is 2147483392 which
is mighty close to max/min 32 bit int value. This makes sense,
because Check
I opened a Lucene issue w/ patch to try:
https://issues.apache.org/jira/browse/LUCENE-2257
Tom let me know if you're able to test this... thanks!
Mike
On Tue, Feb 9, 2010 at 2:09 PM, Michael McCandless
wrote:
> Yes, the term count reported by CheckIndex is the total number of unique
> term
Thanks Michael,
I'm not sure I understand. CheckIndex reported a negative number:
-16777214.
But in any case we can certainly try running CheckIndex from a patched
Lucene. We could also run a patched Lucene on our dev server.
Tom
Yes, the term count reported by CheckIndex is the total
I attached a patch to the issue that may fix it.
Maybe start by running CheckIndex first?
Mike
On Tue, Feb 9, 2010 at 2:56 PM, Tom Burton-West wrote:
>
> Thanks Michael,
>
> I'm not sure I understand. CheckIndex reported a negative number:
> -16777214.
>
> But in any case we can certainly try
On Tue, Feb 9, 2010 at 2:56 PM, Tom Burton-West wrote:
> I'm not sure I understand. CheckIndex reported a negative number:
> -16777214.
Right, we are overflowing the positive ints, which wraps around to the
smallest int (-2.1 billion); dividing that by 128 gives ~ -16777216.
Lucene has an array
I know this is not Drupal-specific, but I thought this question may be more
about the Solr query.
For instance, I pulled down Lucid Imagination's Solr install, just like the
Apache Solr install, and ran the example Solr and loaded the documents from
the exampledocs.
I can go to:
http://localhost:8983/solr/ad
Hi,
To match 1, 2, 3, 4 below you could use a fieldtype based on TextField, with
just a simple WordDelimiterFilterFactory. However, this would also match abc-def,
def.alpha, xyz-com and a...@def, because all punctuation is treated the same.
To avoid this, you could do some custom handling of "-", "."
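A minimal sketch of such a field type (the attribute values are illustrative assumptions):

<fieldType name="email" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splits abc.def@alpha-xyz.com into abc, def, alpha, xyz, com;
         preserveOriginal also keeps the whole address as a token -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

A query like "alpha-xyz" gets the same treatment at query time and so matches through its subwords.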
Hi,
I'm using the /itas requestHandler, and would like to add spell-check
suggestions to the output.
I have spell-check configured and working in the XML response writer, but
nothing is output in Velocity. Debugging the JSON $response object, I cannot
find any representation of spellcheck r
Hello,
One of the commercial search platforms I work with has the concept of
'document vectors', which are 1-gram and 2-gram phrases and their
associated tf/idf weights on a 0-1 scale, i.e. ["banana pie", 0.99]
means banana pie is very relevant for this document.
During the ingest/indexing proces
> I've been looking at the Solr TermVectorComponent
> (http://wiki.apache.org/solr/TermVectorComponent) and it
> seems to have
> something similar to this, but it looks to me like this is
> a component
> that is processed at query time (?) and is limited to
> 1-gram terms.
If you use it, it can give
We are using Solr 1.4 in a multi-core setup with replication.
Whenever we write to the master we get the following exception:
java.lang.RuntimeException: after flush: fdx size mismatch: 1285 docs vs 0
length in bytes of _gqg.fdx file exists?=false
at
org.apache.lucene.index.StoredFieldsWriter.cl
The class was added in 2007 and hasn't changed. I don't know if anyone uses it.
Presumably sort-by-function will use it.
On Tue, Feb 9, 2010 at 5:59 AM, Jan Høydahl / Cominvent
wrote:
> With the new sort by function in 1.5
> (http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function), will it
Thank you Ahmet, this is exactly what I was looking for. Looks like
the shingle filter can produce 3+-gram terms as well, that's great.
I'm going to try this with both western and CJK language tokenizers
and see how it turns out.
On Tue, Feb 9, 2010 at 5:07 PM, Ahmet Arslan wrote:
>> I've been l
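For reference, a field type sketch using the shingle filter (the gram size and analyzer chain are illustrative assumptions); with termVectors="true" on a field of this type, TermVectorComponent can report tf/idf for the shingled phrases as well:

<fieldType name="shingled_text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit the original unigrams plus 2- and 3-word shingles -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>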
Hi All,
I'm trying to create an index of documents where I am trying to associate each
document with a set of related keywords, each with an individual boost value
that I compute externally.
eg:
Document Title: Democrats
related keywords:
liberal: 4.0
politics: 1.5
obama: 2.0
A couple of minor problems:
The qt parameter (cue tee) selects the request handler that processes the q
(query) parameter. I think you mean 'qf':
http://wiki.apache.org/solr/DisMaxRequestHandler#qf_.28Query_Fields.29
Another problem with atomID, atomId, atomid: Solr field names are
case-sensitive. I don't kn
stream.file= means read a local file from the server that Solr runs
on; it has to be a complete path that works on that server. To upload
the file over HTTP you have to use @filename to have curl open it.
That path has to work from the machine you run curl on, and relative
paths work.
Also, Tika d
This goes through the Apache Commons HTTP client library:
http://hc.apache.org/httpclient-3.x/
We used 'balance' at another project and did not have any problems.
On Tue, Feb 9, 2010 at 5:54 AM, Ian Connor wrote:
> I have been using distributed search with haproxy but noticed that I am
> sufferi
Hello *, quick question: what would I have to change in the query
parser to allow wildcarded terms to go through text analysis?
To select the whole string, I think you want hl.fragmenter=regex and
a regex pattern that matches your entire strings:
http://www.lucidimagination.com/search/document/CDRG_ch07_7.9?q=highlighter+multi-valued
This will let you select the entire string field. But I don't know how
to avoid the non-
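As a sketch, the relevant parameters would look something like the following as request-handler defaults (the pattern and limit are illustrative assumptions; tune them to your data):

<lst name="defaults">
  <str name="hl">true</str>
  <str name="hl.fragmenter">regex</str>
  <!-- treat the whole field value as one fragment -->
  <str name="hl.regex.pattern">.*</str>
  <str name="hl.regex.maxAnalyzedChars">10000</str>
</lst>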
That's what I was going to look up :)
The nutch thing works reasonably well. It comes with a training
database from various languages. It had some UTF-8 problems in the
files. The trick here is to come up with a balanced volume of text for
all languages so that one language's patterns do not overw
The admin/form.jsp is supposed to prepopulate fl= with '*,score' which
means bring back all fields and the calculated relevance score.
This is the Drupal search, decoded. I changed the %2B to + signs for
readability. Have a look at the filter query fq= and the facet date
range.
Also, in Solr 1.4
We need more information. How big is the index in disk space? How many
documents? How many fields? What's the schema? What OS? What Java
version?
Do you run this on a local hard disk or is it over an NFS mount?
Does this software commit before shutting down?
If you run with asserts on do you get
Thank you! It works very well.
I think that the field type you suggested will also index words like DOT, AT,
and com.
In order to prevent these words from getting indexed, I have changed the
field type to
On Wed, Feb 10, 2010 at 10:09 AM, Lance Norskog wrote:
>
> Thanks for the pointer to ngramj (LGPL license), which then leads to
> another contender, http://tcatng.sourceforge.net/ (BSD license). The
> latter would make a great DIH Transformer that could go into contrib/
> (hint hint).
>
>