I was doing some reading on the new features and whatnot, and I am interested
in upgrading. I have a few questions though:
1) The index seems to have changed. Can I reuse the current index or should
I reindex the data? I read some things about optimizing the index and
whatnot, but I am not clear
Hello,
I am looking into indexing two data sources. One of those is a standard
website and the other is a Sharepoint site. The problem is that I have no
direct database access. Normally I would just use the DIH and get what I
need from the DB. I do have a java DAO (data access object) class that I
Perhaps I was a little confusing...
Normally when I have DB access, I do a regular indexing process using DIH.
For these two sources, I do not have direct DB access. I can only view the
two sources like any end-user would.
I do have a java class that can get the information that I need. That clas
That would certainly work.
Just as a general thing, how would one go about indexing Sharepoint content
anyway? I heard about the Sharepoint connector for Lucene but I know nothing
about it. Is there a standard best practice method?
Also, what are your thoughts on extending the DIH? Is that recomm
Hello,
I have an index with lots of different types of documents. One of those
types basically contains extracts of PDF docs. Some of those PDFs can have
1000+ pages, so there would be a lot of stuff to search through.
I am experiencing really terrible performance when querying. My whole index
h
Hello,
I have a case where if I search for the word "windows", I get results
containing both "windows" and "window" (and probably other things like
"windowing" etc.). Is there a way to find exact matches only?
The field in which I am searching is a text field, which as I understand
causes this b
Hello Erick,
Thanks for the reply. I am a little confused by this whole stemming thing.
What exactly does it refer to?
Basically, I already have a field which is essentially a collection of many
other fields (done using copyField). This field is a text field. So what
you're saying is to have a d
Hello all,
I'm just wondering what the benefits/consequences are of using shards or
merging all the cores into a single core. Personally I have tried both, but
my document set is not large enough that I can actually test performance and
whatnot.
What is a better approach of implementing a search
Hello,
I'm trying to implement a "Related Articles" feature within my search
application using the mlt handler.
To give you a little background information, my Solr index contains a single
core that is created by merging 10+ other cores. Within this core is my main
data item known as an "article
I don't quite understand what you mean by that. Did you mean TermVector
Components?
Also, I did some more digging and I found some messages on this mailing list
about filtering. From what I understand, using the standard query handler
(solr/select/?q=...) with a qt parameter allows you to filter
Hi all,
Not sure how good my title is, but here is a (hopefully) better explanation
on what I mean.
I am indexing a set of articles from a DB. Each article has an author. The
author is saved in the DB as an author ID, which is a number.
There is another table in the DB with more relevant i
ent to separate fields
> author_fname
> author_lname
> author_email
>
> so you would get details like
>
> John
> Doe
> j...@doe.com
>
>
>
> On Wed, Jul 29, 2009 at 7:39 PM, ahammad wrote:
>>
>> Hi all,
>>
>> Not sure ho
Hello all,
I've been having this issue for a while now. I am indexing a Sybase
database. Everything is fantastic, except that there is 1 column that I can
never get back. I don't have direct database access via Sybase client, but I
was able to extract the data using some Java code.
The field is
. The import, however, fails for every single row, which is
impossible. I am positive that there is data in that column.
Any other suggestions?
Cheers
ahammad wrote:
>
> Hello all,
>
> I've been having this issue for a while now. I am indexing a Sybase
> database. Everything
g?
>
> On Fri, Jul 31, 2009 at 6:31 PM, ahammad wrote:
>>
>> Hello,
>>
>> I tried it using the debug and verbose parameters in the address bar.
>> This
>> is what appears in the logs:
>>
>> INFO: Starting Full Import
>
Hello,
I have a MultiCore setup with 3 cores. I am trying to merge the indexes of
core1 and core2 into core3. I looked at the wiki but I'm somewhat unclear on
what needs to happen.
This is what I used:
http://localhost:9085/solr/core3/admin/?action=mergeindexes&core=core3&indexDir=/solrHome/cor
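For comparison, here is a minimal sketch of how a merge request is usually addressed. It assumes the CoreAdmin handler is mounted at the common `/admin/cores` path under the Solr webapp (this depends on `adminPath` in solr.xml), and that `indexDir` is repeated once per source index. The index directories below are hypothetical placeholders, not the paths from the original message.

```java
import java.util.List;

class MergeIndexesUrl {
    // Builds a CoreAdmin mergeindexes URL: one "core" target, one indexDir per source index.
    static String build(String solrBase, String targetCore, List<String> indexDirs) {
        StringBuilder url = new StringBuilder(solrBase)
                .append("/admin/cores?action=mergeindexes&core=")
                .append(targetCore);
        for (String dir : indexDirs) {
            url.append("&indexDir=").append(dir);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Hypothetical index locations for core1 and core2.
        List<String> dirs = List.of("/path/to/core1/data/index", "/path/to/core2/data/index");
        System.out.println(build("http://localhost:9085/solr", "core3", dirs));
    }
}
```

Note that the request goes to the CoreAdmin handler rather than to the admin page of core3 itself.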
Fri, Aug 7, 2009 at 10:45 PM, ahammad wrote:
>
>>
>> Hello,
>>
>> I have a MultiCore setup with 3 cores. I am trying to merge the indexes
>> of
>> core1 and core2 into core3. I looked at the wiki but I'm somewhat unclear
>> on
>> what needs t
Hello,
I have been using multicore/shards for the past 5 months or so with no
problems at all. I just added another core to my Solr server, but for some
reason I can never get the shards working when that specific core is
anywhere in the URL (either in the shards list or the base URL).
HTTP Stat
t for this
new core, it is set to "threadID". Changing that to id fixed the problem.
Shalin Shekhar Mangar wrote:
>
> On Tue, Aug 18, 2009 at 9:01 PM, ahammad wrote:
>
>> HTTP Status 500 - null java.lang.NullPointerException at
>>
>> org.apache
Hello,
Is it possible to add a prefix to the data in a Solr field? For example,
right now, I have a field called "id" that gets data from a DB through the
DataImportHandler. The DB returns a 4-character string like "ag5f". Would it
be possible to add a prefix to the data that is received?
In thi
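One possible way to do this is a small custom transformer. The sketch below assumes the DataImportHandler convention that a transformer class can simply expose a `transformRow(Map<String, Object> row)` method and be named in the entity's `transformer` attribute. The class name and the "kb-" prefix are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class IdPrefixTransformer {
    // Hypothetical prefix; replace with whatever the application needs.
    static final String PREFIX = "kb-";

    // Called once per row; prepends the prefix to the "id" column if it is present.
    public Object transformRow(Map<String, Object> row) {
        Object id = row.get("id");
        if (id != null) {
            row.put("id", PREFIX + id);
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", "ag5f");
        System.out.println(new IdPrefixTransformer().transformRow(row));
    }
}
```

In a real deployment the class would likely need to be declared public and placed on Solr's classpath so that the handler can load it.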
Hello,
I'm wondering if it's possible to make Solr use a Nutch index. I used Nutch
to crawl some pages and I now have an index with about 2000 documents. I
want to explore the features of Solr, and since both Nutch and Solr are
based off Lucene, I assume that there is some way to integrate them w
Thanks for your reply Andrzej. I am very interested in learning more about
this and I cannot wait to check it out. Nutch is extremely good on its own,
but I want to know what else can be done with the Nutch/Solr combo.
Cheers
Andrzej Bialecki wrote:
>
> Tony Wang wrote:
>> I heard Nutch 1.0 wi
Hello,
I've never used Solr before, but I believe that it will suit my current
needs with indexing information from a database.
I downloaded and extracted Solr 1.3 to play around with it. I've been
looking at the following tutorials:
http://www.ibm.com/developerworks/java/library/j-solr-update/i
he index (inside Solr or not).
>
> It is possible that LuSql might be a preferable alternative to
> Solr/DataImportHandler, depending on your requirements.
>
> LuSql: http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
>
> Disclaimer: I am the author of LuSq
Hello Erik,
I'm interested in attending the Webinar. I just have some questions to
verify whether or not I am fit to attend...
1) How will it be carried out? What software or application would I need?
2) Do I have to have any experience or can I attend for the purpose of
learning about Solr?
Th
ion)
line for? It seems to me like a sort of filter. What if I don't want to
filter anything and just want to index all the rows?
Cheers
Noble Paul നോബിള് नोब्ळ् wrote:
>
> On Mon, Apr 20, 2009 at 7:15 PM, ahammad wrote:
>>
>> Hello,
>>
>> I've never us
Hello,
I finally was able to run a full import on an Oracle database. According to
the statistics, it looks like it fetched all the rows from the table.
However, when I go into /data, there is nothing in there.
This is my data-config.xml file:
Excuse the error in the title. It should say "missing Lucene index"
Cheers
ahammad wrote:
>
> Hello,
>
> I finally was able to run a full import on an Oracle database. According
> to the statistics, it looks like it fetched all the rows from the table.
> However,
Hello all,
Is it possible for Solr to assign a unique number to every document?
For example, let's say that I am indexing from several databases with
different data structures. The first one has a unique field called artID,
and the second database has a unique field called SRNum. If I want to ha
Did you define all the fields that you used in schema.xml?
Ci-man wrote:
>
> I am using MS SQL server and want to index a table.
> I setup my data-config like this:
>
>
> autoCommit="true"
> driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>
Hello,
I'm trying to index data from a Sybase DB, but when I attempt to do a full
import, it fails. This is in the log:
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.AbstractMethodError:
com.sybase.jdbc2.jdbc.SybConnection.setHoldability(I)V
Hello,
How would I go about creating an aggregate entry? Does it go in the
data-config.xml file?
Also, out of curiosity, how can I access the UUIDField variable? It may be
required for something else.
Cheers
Erik Hatcher wrote:
>
>
> On Apr 28, 2009, at 9:49 AM, ahammad wrote:
Hello all,
I am trying to index directly from an Oracle DB. This is what appears in the
stack trace:
SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: select * from ARTICLE Processing Document # 1
at
org.apache.solr.handler
/pass and
tried it on the server...
Erik Hatcher wrote:
>
> Did you move the Oracle JDBC driver to the other machine also?
>
> Erik
>
> On May 26, 2009, at 11:37 AM, ahammad wrote:
>
>>
>> Hello all,
>>
>> I am trying to index directly from a
thandler-1.4-dev.jar from the nightly builds in the
{solrHome}/core1/lib directory (I only need it for the first core). Is there
something else I need to do for it to work?
I don't recall doing an additional step when I did this a few weeks ago on
my local machine.
Any help is appreci
I have a multicore setup as well, and when I query something, I do it through
core0, then specify both core0 and core1 in the "shards" parameter.
However, I don't have identical indices. The results I get back are
basically an addition of both cores' results.
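As a rough sketch of that setup: the request goes to one core, and every participating core is listed in the "shards" parameter without the "http://" scheme. The host, port, and query below are assumptions for illustration only.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

class ShardsQueryUrl {
    // Builds a select URL against baseCore that fans out to every listed shard.
    static String build(String baseCore, List<String> shards, String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        // Shard addresses are comma-separated and carry no URL scheme.
        String shardList = String.join(",", shards);
        return "http://" + baseCore + "/select?q=" + q + "&shards=" + shardList;
    }

    public static void main(String[] args) {
        // Assumed addresses; core0 receives the request and is also one of the shards.
        List<String> shards = List.of("localhost:8983/solr/core0", "localhost:8983/solr/core1");
        System.out.println(build("localhost:8983/solr/core0", shards, "title:solr"));
    }
}
```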
Good luck, please reply to this m
when you query across
multiple shards?
KennyN wrote:
>
> Thanks for the reply ahammad, that helps. Are you specifying them both in
> a URL, or in the <str name="shards">localhost:8983/solr/core0,localhost:8983/solr/core1</str>
> like I have?
>
> I should add that I
t? If that is correct, where in
the source folders would the ClobTransformer.class file go?
Thanks.
Noble Paul നോബിള് नोब्ळ्-2 wrote:
>
> I guess it is better to copy the ClobTransformer.class alone and use
> the old Solr1.3 DIH
>
>
>
>
>
> On Tue, May 26,
ther than ClobTransformer.class. put that jar into
> solr.home/lib
>
> On Wed, May 27, 2009 at 6:10 PM, ahammad wrote:
>>
>> Hmmm, that's probably a good idea...although it does not explain how my
>> current local setup works.
>>
>> Can you please explain how this is done?
case the request to datasource stays at 1
regardless. Looks like it tries once and fails, then it terminates the
process...
Regards
Noble Paul നോബിള് नोब्ळ्-2 wrote:
>
> no need to rename .
>
> On Wed, May 27, 2009 at 6:50 PM, ahammad wrote:
>>
>> Would I nee
Hello,
In the solrconfig.xml file, there is a property:
<dataDir>${solr.data.dir:./solr/data}</dataDir>
Try setting something else in here and see what happens...I'm not sure how
solr works with Ubuntu, but it's worth a shot...
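In case the syntax is unclear: `${solr.data.dir:./solr/data}` reads as "use the solr.data.dir property if it is set, otherwise fall back to ./solr/data". The sketch below only imitates that lookup to show the idea; it is not Solr's own substitution code, and the class name and the /var/solr/data path are made up.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PropertyDefault {
    // Matches ${name} or ${name:default}.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    // Replaces each placeholder with the property value, or with its default if unset.
    static String resolve(String text, Properties props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String value = props.getProperty(name, m.group(2)); // default may be null
            if (value == null) {
                throw new IllegalArgumentException("No property or default for " + name);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolve("${solr.data.dir:./solr/data}", props));
        props.setProperty("solr.data.dir", "/var/solr/data"); // hypothetical override
        System.out.println(resolve("${solr.data.dir:./solr/data}", props));
    }
}
```

So either passing `-Dsolr.data.dir=...` at startup or editing the default after the colon should change where the index is written.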
Tim Haughton wrote:
>
> OK, I spoke too soon.
>
> When you tried it on
Hello,
I have a field type of "text" in my collection called "question".
When I query for the word "customer", for example, in the "question" field (i.e.,
q=question:customer), the first document with the highest score shows up,
but does not contain the word customer at all.
Instead, it contains the
Hello,
I have a MultiCore install of solr with 2 cores with different schemas and
such. Querying directly using http request and/or the solr interface works
very well for my purposes.
I want to have a proper search interface though, so I have some code that
basically acts as a link between the s
and send it to
> Solr using SolrJ. What's the name of that class... MapSolrParams, I
> believe.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message
>> From: ahammad
>> To: solr-user@lucen
he results of my query, I get null for everything.
However, the result of sdl.getNumFound() is correct, so I know that both
cores are being accessed.
Is there a difference with how SolrJ handles multicore requests?
Disclaimer: The code
ahammad wrote:
>
> Hello,
>
> I have a Mult
Sorry for the additional message, the disclaimer was missing.
Disclaimer: The code that was used was taken from the following site:
http://e-mats.org/2008/04/using-solrj-a-short-guide-to-getting-started-with-solrj/
.
ahammad wrote:
>
> Hello,
>
> I played around some more w
Hello,
I am trying to install a patch for Solr
(https://issues.apache.org/jira/browse/SOLR-284) but I'm not sure how to do
it in Windows.
I have a copy of the nightly build, but I don't know how to proceed. I
looked at the HowToContribute wiki for patch installation instructions, but
there are n
/asf/lucene/solr/trunk': could
not connect to server (http://svn.apache.org)" when I try. Something tells
me that my proxy is blocking the connection. If that is the case, then I
don't think that I can do a checkout. Do you have any other alternatives?
Thanks again for the input.
a
When I go to the source and I input the command, I get:
bash: patch: command not found
Thanks
Koji Sekiguchi-2 wrote:
>
> ahammad wrote:
>> Thanks for the suggestions:
>>
>> Koji: I am aware of Cygwin. The problem is I am not sure how to do the
>> whole
>>
Hello,
I've recently started using this handler to index MS Word and PDF files.
When I set ext.extract.only=true, I get back all the metadata that is
associated with that file.
If I want to index, I need to set ext.extract.only=false. If I want to index
all that metadata along with the contents,
Hello,
I can index rich documents like pdf for instance that are on the filesystem.
Can we use ExtractingRequestHandler to index files that are accessible on a
website?
For example, there is a file that can be reached like so:
http://www.sub.myDomain.com/files/pdfdocs/testfile.pdf
How would I g
Hello,
I'm not sure what the best way is to do this, but I have done something
identical.
I have the same requirements, ie several datasources. I also used SolrJ and
jsp for this. The way I ended up doing it was to create a multi core
environment, one core per datasource. When I do a query acros
ription etc simultaneously. You can copy
all those things to the text field and then search on the text field, which
contains all the information that you wanted to search on.
joe_coder wrote:
>
> Thanks ahammad for the quick reply.
>
> As suggested, I am trying out multi core w
I have a Solr core that retrieves data from an Oracle DB. The DB table has a
few columns, one of which is a Blob that represents a PDF document. In order
to retrieve the actual content of the PDF file, I wrote a Blob transformer
that converts the Blob into the PDF file, and subsequently reads it u
ta-imports.
I hope that I explained this issue properly. I am really stuck on this. Any
help would be highly appreciated.
--
ahammad wrote:
>
> I have a Solr core that retrieves data from an Oracle DB. The DB table has
> a few columns, one of which is a Blob tha
Hello,
I am not reusing the context object. The remaining part of the code takes in
a "Blob" object, converts it to a FileInputStream, and reads the contents
using PDFBox. It does not deal with anything related to Solr.
The Transformer doesn't even execute the remaining part of the code. It
does
I had the same problem as you last year, i.e. indexing stuff from different
sources with different characteristics. The way I approached it is by
setting up a multi-core environment, with each core representing one type of
data. Within each core, I had a "data type" sort of field that would define
In our deployment, we thought that complications might arise when attempting
to hit the Solr server with addresses of too many cores. For instance, we
have 15+ cores running at the moment. In the worst case, we will have to use
all 15+ addresses of all the cores to search all our data. What we
eve